Nov 12, 2018

Let your data shine: What is data scrubbing?

Data scrubbing, or cleansing, is the process to clean all the things that make data dirty and unviable for use in business intelligence (BI) software and data analytics. Follow this data scrubbing checklist and let your data shine.

Thomas LaMonte, Content Analyst

Here’s a compliment sure to make any business blush: “I bet your database cleans up nice. What a great opportunity for data scrubbing!”

In my experience, this moment resolves itself with either high fives or a puzzled look that says: “What is data scrubbing?”

I’m glad you asked: Data scrubbing, or cleansing, is the act of checking business data for inaccuracies, duplicates, and outdated or incomplete entries, then cleaning out everything that makes the data dirty and unviable for use in business intelligence (BI) software and data analytics.

Data isn’t that different from things you clean up on a regular basis: dishes, laundry, your dog who’s ransacked the garbage can. Data scrubbing, in fact, has more in common with giving your dog a bath than anything else, because data, like most dogs, tries its best to avoid getting clean.

If you put off giving your dog a bath, you’ll have a smelly (albeit relieved) canine. Neglect cleaning your data and the consequences are much worse: You risk ruining your business.

Sorry, messy folks: People with well-kept, organized data have a competitive advantage over you.

Kissmetrics estimates that businesses lose at least 20 percent of revenue because of poor quality data. Forty-one percent of companies cite inconsistent (dirty) data across technologies including CRM, marketing tools, and BI as their biggest challenge.

Using dirty data for analytics is like putting gasoline in a diesel car. Not only will your business run less efficiently, you also risk corroding the engine.

If low-quality data is used for data analysis, you’ll make poor decisions with expensive consequences.

Skipping a data scrubbing regimen should feel as uncomfortable as going days without a shower. Yet only 16 percent of companies characterize the data they are using as “very good,” meaning data that is grounded in integrity, accuracy, and security.

Data scrubbing to rectify poor-quality data can be the most exhausting and difficult part of data analysis, but it’s also the most important.

In fact, the most successful data-driven organizations are obsessed with keeping their data spotless. Maintaining data quality is not just about avoiding the repercussions of poor-quality data; it’s just as much about using clean data to glean insights for business innovation.

What is data scrubbing | cleaning | cleansing?

Everyone has some dirty data in need of a dusting off. So how do we clean it all up? A quick Google search pulls up the following activities:

  • Data scrubbing

  • Data cleaning

  • Data cleansing

So what separates these three? Is it the type of brush used, or maybe the level of hazmat protection needed? I bet cleansing must call for some essential oils, right?

The good news is that while there are nuances that separate these terms, they refer to the exact same process and strategies in the context of purifying data for use in analytics.

Whether you call it data scrubbing, cleaning, or cleansing, it refers to:

  1. The process of modifying, amending, merging, and removing corrupt, incomplete, outdated, or inaccurate data.

  2. The strategies to secure, make compliant, and enrich data to add value to the business.
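
As a rough illustration of the first point, here is a minimal Python sketch of a single scrubbing pass. Everything in it is an assumption made for the example rather than a prescribed method: the record fields (`name`, `email`, `updated`), the one-year freshness cutoff, and the idea of treating a shared email address as the sign of a duplicate.

```python
from datetime import date


def scrub(records, today, max_age_days=365):
    """Return cleaned records: normalized, complete, current, and de-duplicated."""
    cleaned = {}
    for rec in records:
        # Amend: normalize formatting so equivalent values compare equal.
        email = (rec.get("email") or "").strip().lower()
        name = " ".join((rec.get("name") or "").split()).title()
        updated = rec.get("updated")

        # Remove incomplete or obviously inaccurate entries.
        if not name or "@" not in email or updated is None:
            continue

        # Remove outdated entries (hypothetical one-year cutoff by default).
        if (today - updated).days > max_age_days:
            continue

        # Merge duplicates: keep the most recently updated record per email.
        current = cleaned.get(email)
        if current is None or updated > current["updated"]:
            cleaned[email] = {"name": name, "email": email, "updated": updated}
    return list(cleaned.values())


# Made-up sample data for illustration only.
records = [
    {"name": "ada  lovelace", "email": " Ada@Example.com ", "updated": date(2018, 10, 1)},
    {"name": "Ada Lovelace", "email": "ada@example.com", "updated": date(2018, 11, 1)},
    {"name": "Grace Hopper", "email": "", "updated": date(2018, 11, 5)},
    {"name": "Alan Turing", "email": "alan@example.com", "updated": date(2015, 1, 1)},
    {"name": "Katherine Johnson", "email": "kj@example.com", "updated": date(2018, 9, 15)},
]

result = scrub(records, today=date(2018, 11, 12))
for row in result:
    print(row)
```

In this made-up sample, the two Ada Lovelace entries collapse into the newer one, the entry without an email is dropped as incomplete, and the 2015 entry is dropped as outdated. Real scrubbing rules depend on your own data and governance policies.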

Why does data scrubbing matter?

On average it costs:

  • $1 to prevent a duplicate

  • $10 to correct a duplicate

  • $100 to store a duplicate data asset if left untreated

It’s just good business sense to clean up after your data before the mess grows even bigger. Presently, 50 percent of IT budgets are spent on data rehabilitation.
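
To make the $1, $10, and $100 figures above concrete, here is a small Python sketch that scales them to a batch of duplicates. The per-record costs come from the list above; the count of 5,000 duplicates is a hypothetical number chosen only for illustration.

```python
# Per-duplicate costs quoted in the list above.
COST_PREVENT = 1      # dollars to prevent a duplicate
COST_CORRECT = 10     # dollars to correct a duplicate
COST_UNTREATED = 100  # dollars if the duplicate is left untreated


def duplicate_costs(n_duplicates):
    """Return the total cost of each approach for n_duplicates records."""
    return {
        "prevent": n_duplicates * COST_PREVENT,
        "correct": n_duplicates * COST_CORRECT,
        "untreated": n_duplicates * COST_UNTREATED,
    }


# Hypothetical database with 5,000 duplicate records.
costs = duplicate_costs(5_000)
for approach, total in costs.items():
    print(f"{approach}: ${total:,}")
```

Under that assumption, prevention comes to $5,000, correction to $50,000, and leaving the duplicates untreated to $500,000, which is why catching problems early pays off.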

The other issue is that almost every human on earth is creating astronomical amounts of data. The internet of things (IoT) is a major contributor. You can barely get a coffee without creating your weight in megabytes.

From food trucks to Walmart, every business, regardless of size, should therefore practice data cleaning, because they are all producing and collecting a glut of information. Data governance strategy (the policies in place to mandate how data is managed in your organization) is a key priority for a business of any size.

Data governance strategy brightens data quality

Of course, I’d say that your business should undertake a strict regimen of data scrubbing for all the aforementioned reasons, and that data governance policies should inform your data scrubbing strategy and execution. But I worry this advice alone is not enough to make an impact and would result in failure.

Instead, I recommend prioritizing data scrubbing as an always-running background process. Commit to data scrubbing to the same degree that you commit to picking up after yourself and your things in your personal life.

 RECOMMENDATION:  Use the following data scrubbing checklist to kick off your data governance initiative, leverage good data quality for business progress, or triple-check that you’re cleaning all those hard-to-reach places where data hides in your organization.
