Design and Implement a Reliable ETL Pipeline for WePlay Sports Using Databricks
Learning objectives
- Design and Implement Data Storage
- Design and Implement a Data Processing solution
- Design and Implement a Data Governance solution
Overview
This project validates your skills in designing and building a reliable data pipeline solution on the cloud.
Story
WePlay is a sports analytics company that provides data solutions to its clients, typically sports clubs across the world. It offers data-driven strategies to these clubs. It has recently started providing similar solutions to IPL teams, helping the team manager and team captain decide on a strategy for the upcoming season. These clubs would like to see the power of the solutions before committing to a full-scale implementation.
Congratulations! You and your team have been assigned to build a POC for WePlay. Demonstrating a quality outcome will help your firm win a very high-value contract from IPL clubs.
- WePlay has gathered a large volume of data from past matches as a data dump and made it available to you.
- WePlay has also recently started monitoring real-time match data across its network so that it can give real-time insights to change the play on the go.
- The structure and schema of the data vary across sources and therefore have to be investigated carefully.
- WePlay Management has asked you to build a reliable and scalable data solution and demonstrate a POC (proof of concept) before moving on to a full-length pipeline implementation.
- WePlay insisted on a cost-optimal solution without a lot of technology overhead that, where possible, makes maximum use of unified platform solutions.
- They have also highlighted that all data assets must be discoverable, easy to access, highly secure, and governed by fine-grained access control.
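Since the brief notes that structure and schema vary across sources, a first investigation step could be to profile a sample from each source and compare the results. The following is a minimal sketch in plain Python, not part of the WePlay material: the sample records, the field names (`match_id`, `team`, `runs`, `over`), and the helper functions `infer_schema` and `compare_schemas` are all hypothetical and only illustrate the idea.

```python
from typing import Any, Dict, List


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each field name to the sorted, '|'-joined Python type names seen for it."""
    seen: Dict[str, set] = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: "|".join(sorted(types)) for field, types in seen.items()}


def compare_schemas(left: Dict[str, str], right: Dict[str, str]) -> Dict[str, List[str]]:
    """Report fields missing on either side and shared fields whose types differ."""
    left_fields, right_fields = set(left), set(right)
    shared = left_fields & right_fields
    return {
        "only_in_left": sorted(left_fields - right_fields),
        "only_in_right": sorted(right_fields - left_fields),
        "type_mismatch": sorted(f for f in shared if left[f] != right[f]),
    }


# Hypothetical samples: one from the historical dump, one from the real-time feed.
historical = [
    {"match_id": 1, "team": "A", "runs": 180},
    {"match_id": 2, "team": "B", "runs": 165},
]
live = [{"match_id": "3", "team": "A", "over": 4, "runs": 32}]

report = compare_schemas(infer_schema(historical), infer_schema(live))
print(report)
```

In this made-up sample the report flags `over` as present only in the real-time feed and `match_id` as stored with different types in the two sources, which is the kind of discrepancy that would need to be resolved before the sources are combined in one pipeline.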