Data Harmonization and ETL Validation Using Databricks on Azure (with Lab)
Learning objectives
- Clean source files dynamically
- Create business metrics
- Validate the metric generation process
Overview
Harmonizing data and validating ETL using a unified analytics platform on Azure.
Story
PrimerInsurance & Co is a popular insurance brand in India and other countries, with subscribers in more than 400 cities. It is currently best known for its auto insurance plans, which range from individual to group insurance. At a senior management meeting where the CXOs and CFOs discussed the company's future strategy, one problem they all foresaw was stagnation in the company's year-over-year (YoY) growth. In response, the company has also started a second-hand car business, in which it buys and sells used cars.
PrimerInsurance faces a significant challenge in managing and analyzing the large amount of data generated by its application and website. The data comes from various sources, such as cars, customers, policies, sales, and car claims.
The PrimerInsurance data team has been using multiple data processing and analysis tools, such as Excel, SQL, Hadoop, and Jupyter Notebooks, to manage this data. However, the team has realized that working across these different tools has become a bottleneck in its data processing and analysis workflows, which delays decision-making. It also makes collaboration within the data team less effective.
Their data also comes from multiple sources, and these sources do not all carry the same information. Hence there is a strong need to clean, harmonize, and standardize the data.
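To make the harmonization requirement concrete, here is a minimal sketch of bringing records from two sources into one shared schema. It uses plain Python rather than the lab's platform so it can run anywhere. The source names ("website", "app"), the column names, and the sample rows are all hypothetical and are not taken from the PrimerInsurance data.

```python
# Hypothetical mappings: each source labels the same fields differently.
COLUMN_MAPS = {
    "website": {"cust_id": "customer_id", "PolicyNo": "policy_number", "premium_amt": "premium"},
    "app": {"CustomerID": "customer_id", "policy_no": "policy_number", "Premium": "premium"},
}

# Hypothetical shared schema that every harmonized record must follow.
TARGET_COLUMNS = ["customer_id", "policy_number", "premium"]


def harmonize(record, source):
    """Rename a source record's keys to the shared schema and standardize values."""
    mapping = COLUMN_MAPS[source]
    renamed = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Fill columns the source does not provide so every record has the same shape.
    row = {column: renamed.get(column) for column in TARGET_COLUMNS}
    # Standardize: trim and uppercase policy numbers, cast premiums to float.
    if row["policy_number"] is not None:
        row["policy_number"] = str(row["policy_number"]).strip().upper()
    if row["premium"] is not None:
        row["premium"] = float(row["premium"])
    # Keep track of where each record came from.
    row["source"] = source
    return row


# Sample rows (made up for illustration); the app row has no premium field.
website_rows = [{"cust_id": 101, "PolicyNo": " ab-123 ", "premium_amt": "1500.50"}]
app_rows = [{"CustomerID": 102, "policy_no": "cd-456"}]

harmonized = [harmonize(r, "website") for r in website_rows] + [
    harmonize(r, "app") for r in app_rows
]
```

After this step, every record has the same keys regardless of origin, and a field that a source does not supply is left as an explicit None rather than silently missing.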
To address these challenges, PrimerInsurance is looking for a unified analytics platform with advanced analytics capabilities that brings all of its data processing and analysis into one place. The platform will provide a single interface, replacing all of the existing tools, and will allow the data teams to work more efficiently.
With a unified platform, the downstream teams can easily scale up or down to meet changing data requirements without having to switch between multiple tools or platforms.
Additionally, the platform will integrate data from different sources into a single location, making it easier to analyze and generate insights.
You are tasked with identifying the unified analytics platform and building a proof of concept (POC) that addresses the needs of PrimerInsurance & Co.