How important is “Data Profiling” in a BI project?

Data profiling, in other words identifying the data sources and checking the quality of the data, is part of any BI project. But how important is it, and how much does it affect the success of a BI project? That is what I'm going to discuss in this post.

It is well known that a high percentage of BI projects fail for various reasons, something Gartner has reported as well. One of those reasons is that the project runs beyond its estimated time plan and budget. The project then becomes a financial loss for the implementing company, which is left with no option other than stopping it.

If you are about to implement a BI solution for a client, at the very beginning you know next to nothing about the client's business or data. In most cases the client will assure you that the data is nice and clean and that it's just a matter of pulling the data and visualizing it. Trust me, it's a trap. Don't fall into it. Apart from a very few scenarios, that is almost never the case. So what should we do to overcome this problem?

Ideal Solution: Get to know the business and the data before you estimate.

Well, this is the ideal way to overcome the problem and start the project. Before you discuss pricing you need to know the effort the project requires, and to estimate that effort you need to know the business and the data. For that you will need to sit down with your client and discuss their business. Then you should request access to their data sources and check the quality of the data together with them. Depending on the size of the project, this could take anywhere from 2 days to 2 weeks. But how practical is it? Normally we have to provide an estimate in response to an RFP document that contains no information about the data or its quality. So what can we do in such cases?

If not: Estimate for Data Profiling

Including an estimate for data profiling is critical in any BI project, irrespective of its size. In this period you need to identify not only the sources, fields and calculations but also the problematic areas in the data, at least at a high level. I say high level because some data issues will only be identified during the development stage anyway. Nevertheless, you should try to identify as many data quality issues inside the data sources as possible. Therefore make sure your estimate includes a considerable effort for this. I personally believe that 3%-4% of your total project duration must be allocated to it.
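To give an idea of what a high-level profiling pass can look like, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function name `profile_rows`, the list of markers treated as blank, and the sample columns `customer_id` and `country` are all hypothetical and not taken from any real client data. For each field it counts the rows, the blank values, the distinct values, and how many values occur more than once.

```python
from collections import Counter


def profile_rows(rows, blank_values=("", "NULL", "N/A")):
    """Return per-column counts of rows, blanks, distinct and repeated values.

    `rows` is a list of dicts (for example from csv.DictReader).
    `blank_values` is an assumed list of markers that mean "no data".
    """
    profile = {}
    for row in rows:
        for column, value in row.items():
            stats = profile.setdefault(
                column, {"rows": 0, "blank": 0, "values": Counter()}
            )
            stats["rows"] += 1
            text = "" if value is None else str(value).strip()
            if text in blank_values:
                stats["blank"] += 1
            else:
                stats["values"][text] += 1
    return {
        column: {
            "rows": s["rows"],
            "blank": s["blank"],
            "distinct": len(s["values"]),
            # number of distinct values that appear more than once
            "duplicated": sum(1 for n in s["values"].values() if n > 1),
        }
        for column, s in profile.items()
    }


# Hypothetical sample data, only to show the shape of the report.
sample = [
    {"customer_id": "1", "country": "LK"},
    {"customer_id": "2", "country": ""},
    {"customer_id": "2", "country": "lk"},
    {"customer_id": "3", "country": "N/A"},
]
report = profile_rows(sample)
for column, stats in report.items():
    print(column, stats)
```

Even on this tiny sample the report points at typical issues: a repeated `customer_id`, missing countries, and the same country written in two different ways ("LK" and "lk" count as two distinct values). On a real project you would run something similar against each source and walk through the findings with the client.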

Estimate for Data Cleansing

Okay, once you have identified the data issues in the data profiling stage, what's next? You will need to put some effort into cleaning the data, and it is not something you can take lightly. The cleansing takes considerable effort, whether it is done on your side or the client takes responsibility for it. Even if the client takes on the responsibility of cleaning up the data for you, you might have to wait until they finish, and it's up to you to decide whether you are going to charge them for that waiting period or not. I would estimate 4%-5% of the total project duration for this.

Have adequate buffer time to resolve data issues.

As I mentioned previously, no matter how much effort you put into identifying data quality issues during the data profiling period, problems will always come up during development! The reason is that we only get to know some of those data issues when we load data into the production environment, or maybe the UAT environment. In most cases the Dev and QA environments hold only a subset of the actual data and do not cover all of the business data. Even when that is not the case, you will still come across some bad data when you load data into Dev and QA.

Hence always anticipate data issues in every phase of the project and keep adequate buffer time to resolve them. Most importantly, a data issue can be a blocker for you, and you might have to wait, doing nothing, until the client checks and fixes the data. I personally believe that 4%-5% of your total estimate must be allocated to these kinds of data issues. However, all these numbers are very subjective and depend heavily on the number of data sources and the quality of those sources.
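To see what these percentages mean in practice, here is a small Python sketch of the arithmetic. The function name `data_effort_budget` and the 200-day project are hypothetical, chosen only for illustration; the shares are the rough, subjective ranges mentioned in this post (3%-4% for profiling, 4%-5% for cleansing, 4%-5% for buffer).

```python
def data_effort_budget(total_days):
    """Return (low, high) day ranges for each data-related allowance.

    The percentage ranges are the rough, subjective figures from this
    post, not an industry standard.
    """
    shares = {
        "profiling": (3, 4),
        "cleansing": (4, 5),
        "buffer": (4, 5),
    }
    return {
        phase: (total_days * low / 100, total_days * high / 100)
        for phase, (low, high) in shares.items()
    }


# Hypothetical 200-day project.
budget = data_effort_budget(200)
total_low = sum(low for low, _ in budget.values())
total_high = sum(high for _, high in budget.values())

for phase, (low, high) in budget.items():
    print(f"{phase}: {low:g} to {high:g} days")
print(f"total: {total_low:g} to {total_high:g} days")
```

For a 200-day project that works out to 6 to 8 days of profiling, 8 to 10 days of cleansing and 8 to 10 days of buffer, so roughly 22 to 28 days (11% to 14% of the project) reserved purely for dealing with data quality.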

Thank you very much for reading, and put your comment below if you have any thoughts about this. Cheers!
