Rajiv's SQL skills got well known among the Management of GlobalMart. Robert, the Customer Success Team Head wanted to see how SQL fares better than Excel. So, he dropped a mail to Rajiv with some requests. Let's check his mail.
Hi Rajiv,
Got to know you have picked up SQL skills off late. I'm so happy to see your growth as a Data Professional. I've heard SQL works much faster and better compared to Excel. I have been receiving a report on a weekly basis but the work has traditionally been done solely in Excel. I found the process to be slow and time consuming. I was wondering if you could redo the same using SQL?
I am looking for the customer details report as shown below :
Make a note of the following requirements :
- Need First Name and Last Name separated as shown
- Print customer names in format as follows : If name is Alan Barnes then print A. Barnes
- Also print customer names in format as follows : If name is Alan Barnes then print Barnes A.
- Need State names in uppercase
- I've observed whitespace and spelling issues in state names. Please ensure the state names show correctly in the report
- All contact numbers starting with +1 should be converted to purely number format as shown above. Ex. If contact number is +1-582-300-5006, it should be printed as 5823005006. Leave any other formats as it is.
Looking forward to see what SQL can do.
Regards,
Robert
Rajiv on one side was happy to see his skills getting appreciated by senior management. However, he's in a fix now.
The task looked easily achievable in Excel. He has worked on this report before. However, knowing the capabilities of SQL, he would definitely like to give it a try.
The only problem is, He doesn't know how to tackle this in SQL. So, he reached out to his mentor Sharon for help.
Let's spend some time going through their conversation and figure out what skills Rajiv would need to solve this new challenge.
Rajiv : Hi Sharon, I'm glad that you introduced me to SQL. I'm getting noticed by management. But, that also brings new challenges for me. Robert has sent across a request for a report which I could easily do in Excel. However, with SQL in hand, I too feel it can be done much faster. Just that I'll need a bit of your support. I forwarded Robert's mail to you yesterday. Hope you could check it.
Sharon : Yes, I did. The good news is, you don't have to learn anything new to approach this task as long as your logic is strong enough. The functions more or less remain the same. You just need to string them together to achieve your goals. Let's first quickly see popular text manipulation functions which can help you. Check the hint below :
Rajiv : Damn! These look exactly the same as what I did in Excel
Sharon : Didn't I tell you? It's just like the language to express yourself changed from A to B. The logic isn't going anywhere
Rajiv : I'd definitely like to give it a try now. Here's my approach :
- first_name : Fetch position of whitespace and extract text from beginning till one position before whitespace
- last_name : Fetch position of whitespace and extract text from position after whitespace till the end
- name_format_1 : After extracting first name and last name, I can extract first character from the last name. But then, no idea how to combine them?
- name_format_2 : Seems like same way as name_format_1 but reversed in the end
- state : I think we can use TRIM() and REPLACE()
- contact_number : This one's a bit tricky. Will need more time to think
Sharon : Great approaches Rajiv. If you don't know how to combine multiple texts, all you have to do is to ask Google.
Check the below snapshot : Try the same and experiment with both CONCAT() function and the + operator. Remember, all syntax we are learning here applies to SQL Server. While the concepts may remain the same, the exact functions/operators may differ for other SQL providers like MySQL or Oracle. You may need to do some googling to get to the right results.
Rajiv : Ahh. How could I forget Google? Yes. Let me try creating columns one by one
Rajiv : Hi Sharon, Here's what I tried for fetching first_name
-- Step 1: Find the position of white space in the name
SELECT customer_name,
charindex(' ',customer_name) AS pos
FROM mskl_customers
-- Step 2 : Extract the characters from the start to the position 1 less than the position of ' '
SELECT customer_name,
LEFT(customer_name, CHARINDEX(' ',customer_name,1)-1) AS first_name
FROM mskl_customers
Sharon : Bingo! Few suggestions. The function CHARINDEX() also has a 3rd parameter which decides where you would like to start for searching a character
-- Consider the text : ORD-101-US
-- To extract 101, we'd like to find position of 1st -
-- Option 1 :
SELECT CHARINDEX('-','ORD-101-US') AS output
-- Option 2 :
SELECT CHARINDEX('-','ORD-101-US',1) AS output -- Here 1 is redundant
-- To extract US, we'd like to find the position of 2nd -
SELECT CHARINDEX('-','ORD-101-US',5) AS output
-- the above will start looking for '-' from character position 5
Also, I'd like you to explore the function SUBSTRING() and see if you can fetch the same result for first_name.
Rajiv : Sure, Let me try
Sharon : Hi Rajiv, have you tried fetching the 2nd column, last_name?
Rajiv : I am trying. Have written the below code till now :
-- Step 1 : fetch position of ' ' character
SELECT customer_name,
SUBSTRING(customer_name,1,CHARINDEX(' ',customer_name,1)-1) AS position
FROM mskl_customers
-- Step 2 : Start fetching from 1 position after ' ' character till the end
SELECT customer_name,
SUBSTRING(customer_name,CHARINDEX(' ',customer_name,1)+1,________) AS first_name
FROM mskl_customers
I am struggling in the part where I've given ____________. I can't give a fixed length there.
Sharon : Think! If you take total length of the name and subtract position of whitespace (which incidentally is same as the length of the name till the whitespace), you'll get the length of the remaining part of the name
Rajiv : Ahh! Got it. Here's my modified code :
-- Step 1 : fetch position of ' ' character
select customer_name,
substring(customer_name,1,charindex(' ',customer_name,1)-1) as position
from mskl_customers
-- Step 2 : Start fetching from 1 position after ' ' character till the end
select customer_name,
substring(customer_name,charindex(' ',customer_name,1)+1, len(customer_name)-charindex(' ',customer_name,1)) as last_name
from mskl_customers
Sharon : This is damn good! I'm sure you'll be able to write the code for columns name_format_1 and name_format_2
Rajiv : I'd surely want to try and get back. Do check my code below in a while
Sharon : Rajiv, you are doing really well with text processing. I don't think fixing issues in State column should be a problem for you now.
Rajiv : I tried below code. Do you think it is good?
select
case when trim(state) = 'calfornia' then upper('California')
when trim(state) = 'masachusets' then upper('massachusetts')
else upper(trim(state)) end as state
from mskl_customers
Sharon : Yes, very much. How did you know only California and Massachusetts have spelling issues?
Rajiv : I used below query to get unique list of states and did a visual check
select distinct state
from mskl_customers
order by state
Sharon : Good one!
Sharon : Now comes the final one : contact_number
Rajiv : Here the problem is the varying formats in which phone numbers are stored. This is confusing.
Sharon : That's how real world is. You may not always expect very neat and clean data. First figure out what different formats exist?
Rajiv : On visual check, I discovered the following different formats :
- 582-262-8228
- +1 206-431-9519
- 05822627759
- 582 262 4172
Sharon : But why worry about this? What's your ask?
Rajiv : This one.
All contact numbers starting with +1 should be converted to purely number format as shown above. Ex. If contact number is +1-582-300-5006, it should be printed as 5823005006. Leave any other formats as it is.
Sharon : So, let's only focus on fixing this then.
Rajiv : All right. This is what I have in mind. Check for a given phone number if it starts with +. If yes, then remove first 3 characters and then replace ‘-’ in the remaining text with ‘’
Sharon : That's it! You are done.
Rajiv : That gives me more confidence now. Let me try the code and get back to you.