Robert, the Customer Success Head at GlobalMart had requested two summary reports :
- Count of customers by State
- Count of orders by State
The format suggested is :
(Numbers shown are for representation purposes only)
Rajiv, the newbie Business Analyst prepared the report within minutes and sent across to Robert.
However, the next day Robert came back to Rajiv, sounding a bit upset about the quality of report received.
Comments from Robert :
- The number of customers in California is incorrect
- The orders by state report is entirely wrong
- Why are blanks in Profit and Sales columns?
- Why is there no customers for the Order ID : CA-2017-114413 ?
- The number formatting of Sales and Profit columns is not as expected
- I wanted to create an year wise product addition report. I was expecting something like this :
But, what I am getting is this :
Robert highlighted the above issues with Rajiv's manager and has issued a stern statement :
All the reports going to customers should be of top notch quality. Take whatever steps necessary to fix the issues and ensure they don't repeat in future.
Rajiv, visibly frightened and mostly clueless about what could be all possible quality issues and what are those necessary steps went to Sharon, the Lead Business Analyst for help.
Let's check out the below conversation between the two of them and figure out the problems and how to tackle them.
Rajiv : Hi Sharon, Hope you are doing good. I'm in a very bad situation right now. Need urgent help.
Sharon : Hi Rajiv. You look visibly tensed. Please calm down. I can help you.
Rajiv : Robert had asked me some simple reports. I am pretty confident about my Excel skills. So, I prepared them quickly and sent across. However, it turns out my reports have so called quality issues and he has given a stern warning to ensure all necessary steps are taken to ensure no further bad reports.
I don't understand what does he mean by quality issues and what should be the necessary steps? He seems pretty upset about our reports quality and wants a permanent fix.
Sharon : Hmm. I get it. You are facing the classic data quality issues faced by many organizations in their reporting processes.
Rajiv : What do I do about it? Here is my worked out Excel sheet. Please help.
Sharon : Thanks for the notebook. Let's dive right into it, learn what do we mean by quality issues, how to find and fix them.
Rajiv : I'm ready. Let's begin.
Sharon : First things first, get a first hand idea about what do we mean by data quality issues. Check out the hint to learn more.
Rajiv : Woah! I've been so ignorant about such important concepts all this while. This changes my entire perspective of looking at data. I've been always like jumping into doing things with data without even realizing it can have so many kinds of issues!
Sharon : Glad that you realized this. Data Quality causes millions of dollars of losses to companies every year. Planning and executing effective data quality treatment strategies is critical to building trust with management and most importantly customers.
Rajiv : Without doubt. I think I should revisit my report and try again with this knowledge.
Sharon: Great job Rajiv!
Before we conclude, a short note on one more type of data quality issue which is important but is little subjective to decide :
- Analyst friendliness
Rajiv: What does this mean?
Sharon: Let's take an example : Check the products table snapshot here :
You as Data Analyst have been asked to find out the count of products, by size. How will you approach this question?
Rajiv: Well I have no clue how to approach this question, Sharon!
Sharon: Checking the dataset carefully, you will find the sizes column having values as comma separated values. You would ideally like to use the sizes column in rows and drop product_id column in Values in a pivot. But, you can't do it directly because the data is not in a format friendly to your needs as a Data Analyst.
What if somehow we got the data unpivoted on the sizes as shown below (only for 2nd record)?
Keeping data in this way would surely make your life easy. That's what we call Analyst Friendliness of data.
Rajiv: How do you look for this kind of issue?
Sharon: The only way of doing this would be to check with business for the kind of insights they are looking for data and then checking with the datasets to see if there are column(s) which are making it difficult to perform analyst friendly analysis.
Rajiv: That's a wonderful learning Sharon. Thank you for all your help.