
Advanced Data Analysis Using PySpark

general
pyspark
dataframe-processing

Learning objectives

  • To be able to launch a Databricks workspace
  • To be able to attach a cluster and create a new notebook
  • To be able to mount data in DBFS in preparation for analysis
  • To showcase PySpark skills by analyzing data and offering relevant business insights

Overview

This scenario challenges you to demonstrate beginner-level proficiency with PySpark on the Databricks platform.

Story

Globalmart is a leading electronics retailer with a diverse range of products and brands. Globalmart has recently seen an uptick in sales, driven by a boom in online purchases. Globalmart's senior management would like to leverage this data to make informed decisions.

As a member of Globalmart's Data Engineering team, you are tasked with using PySpark to analyze large datasets efficiently. Your goal is to develop a comprehensive report that answers the following questions:

  • Classifying and identifying customers to target with promotions
  • Tracking the major drivers of product returns
  • Producing a monthly sales report by product category

Answering the ad-hoc queries requested by stakeholders will allow them to make data-driven decisions. It's your task to analyze the data and answer these questions.

Note: You'll need to sign up for a free Databricks Community Edition account. Check here for the required steps.