Data profiling, in other words identifying the sources and checking the quality of the data, is a part of any BI project. But how important is it, and how much does it affect the success of a BI project? That is what I’m going to discuss in this post.
It is well known that a high percentage of BI projects fail for various reasons, something Gartner has reported as well. One of those reasons is that the project goes beyond its estimated timeline and budget. The project then becomes a financial loss for the implementing company, which is left with no option but to stop it.
If you are to implement a BI solution for a client, at the very beginning you know next to nothing about the client’s business or its data. But in most cases the client will assure you that the data is nice and clean and that it’s just a matter of pulling the data and visualizing it. Trust me, it’s a trap. Don’t fall into it. Apart from a very few scenarios, that is never the case. So what should we do to overcome this problem?
Ideal Solution: Get to know the business and the data before you estimate.
Well, this is the ideal way to overcome the problem and start the project. Before discussing pricing you need to know the effort the project requires, and to estimate the effort you need to know the business and the data. For that you will need to sit with your client and discuss their business. Then you should request access to their data sources and check the quality of the data together with them. Depending on the size of the project, this could take between 2 days and 2 weeks. But how practical is it? Normally we have to provide an estimate in response to an RFP document that contains no information about the data or its quality. So what can we do in such cases?
If not: Estimate for Data Profiling
Including an estimate for data profiling is critical in any BI project, irrespective of its size. In this period you need to identify not only the sources, fields, and calculations but also the problematic areas in the data, at least at a high level. I say high level because some data issues will only be identified during the development stage anyway. Nevertheless, you should try to identify as many data quality issues inside the data sources as possible. Therefore make sure your estimate includes considerable effort for this. I personally believe that 3%-4% of your total project duration must be allocated to it.
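To give a feel for what a high-level check can look like, here is a minimal sketch of a first profiling pass. It assumes the client hands over a CSV extract. The helper name `profile_rows`, the column names, and the sample rows are all made up for illustration, and a real profiling exercise would cover far more (data types, value ranges, referential integrity, and so on).

```python
import csv
import io
from collections import Counter


def profile_rows(rows, key_column):
    """Return simple quality counts for dict rows (hypothetical helper)."""
    rows = list(rows)
    columns = list(rows[0].keys()) if rows else []

    def filled(value):
        # Treat None, empty, and whitespace-only values as missing.
        return bool((value or "").strip())

    # Number of missing values per column.
    missing = {c: sum(1 for r in rows if not filled(r[c])) for c in columns}
    # Number of distinct non-missing values per column.
    distinct = {
        c: len({r[c].strip() for r in rows if filled(r[c])}) for c in columns
    }
    # Key values that appear on more than one row.
    key_counts = Counter(r[key_column].strip() for r in rows if filled(r[key_column]))
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    return {
        "row_count": len(rows),
        "missing": missing,
        "distinct": distinct,
        "duplicate_keys": duplicate_keys,
    }


# Made-up sample extract: one missing name, one missing country, one repeated id.
SAMPLE_CSV = """customer_id,name,country
1,Alice,LK
2,,LK
2,Bob,
3,Carol,US
"""

report = profile_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)), key_column="customer_id")
for section, value in report.items():
    print(section, value)
```

Even a rough report like this, run per source during the profiling period, gives you something concrete to walk through with the client.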
Estimate for Data Cleansing
Okay, once you have identified the data issues in the data profiling stage, what’s next? You will need to put some effort into cleaning the data, and it is not something you can take lightly. The cleaning takes considerable effort, whether it is done on your side or the client takes responsibility for it. Even if the client takes responsibility for cleaning up the data, you might have to wait until they finish, and it’s up to you to decide whether to charge them for that waiting period. I would estimate 4%-5% of the total project duration for this.
Have adequate buffer time to resolve data issues.
As I mentioned previously, no matter how much effort you put into identifying data quality issues during the profiling period, problems will always come up during development! The reason is that we only get to know some of those data issues when we load data into the production environment, or maybe the UAT environment. In most cases the Dev and QA environments hold a subset of the actual data and do not cover all business data. Even when that is not the case, you will come across some bad data when you load data into Dev and QA.
Hence always anticipate data issues in every phase of the project and keep adequate buffer time to resolve them. Most importantly, those data issues might block you, and you might have to wait, doing nothing, until the client checks and fixes the data. I personally believe that 4%-5% of your total estimate must be allocated for these kinds of data issues. However, all these numbers are very subjective and depend heavily on the number of data sources and their quality.
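To see what the percentages suggested in this post add up to, here is a small worked sketch. It assumes all three ranges are taken as shares of the same total estimate, and the 100-day project is a made-up figure. The names `ALLOCATION_PERCENT` and `data_effort_days` are only for illustration.

```python
# Percentage ranges (low, high) of the total estimate, taken from this post.
ALLOCATION_PERCENT = {
    "data profiling": (3, 4),
    "data cleansing": (4, 5),
    "data issue buffer": (4, 5),
}


def data_effort_days(total_days):
    """Return (low, high) days per activity plus a combined 'total' entry."""
    effort = {
        name: (total_days * low / 100, total_days * high / 100)
        for name, (low, high) in ALLOCATION_PERCENT.items()
    }
    # Sum the three activities; the right-hand side is evaluated before
    # the 'total' key is added, so it is not counted twice.
    effort["total"] = (
        sum(low for low, _ in effort.values()),
        sum(high for _, high in effort.values()),
    )
    return effort


PROJECT_DAYS = 100  # hypothetical project size
estimate = data_effort_days(PROJECT_DAYS)
for activity, (low, high) in estimate.items():
    print(f"{activity}: {low:g} to {high:g} days")
```

For a 100-day project this works out to roughly 11 to 14 days spent on data-related work. Treat it as a starting point and adjust it to the number and quality of the sources involved.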
Thank you very much for reading, and leave a comment below if you have any thoughts about this. Cheers!