
Before You Start

  • You can access the datasets at the following location:
AccountName: "xxxx"
ContainerName: "xxxx"
AccountKey: "xxxx"

 

  • You need Databricks with PySpark to answer the business stakeholders' questions. To set up the Databricks PySpark environment and work with the data in it, watch the following video.
  • Once you have created the workspace, you can use the following code to mount the data and create Spark DataFrames, as shown in the video:
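container = "xx"
storage_account_name = "xx"
storage_account_key = "xxx"

# You can use your own workspace name
dbutils.fs.mount(
    # Placeholder; for Azure Blob Storage this is typically
    # "wasbs://<container>@<storage-account>.blob.core.windows.net"
    source = "xxx",
    # mount_point is required; "/mnt/<your-mount-name>" is a hypothetical example path
    mount_point = "/mnt/<your-mount-name>",
    # Placeholder key; for account-key access it is typically
    # "fs.azure.account.key.{}.blob.core.windows.net"
    extra_configs = {"xxxx".format(storage_account_name): storage_account_key})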

Now that you have DataFrames for your datasets, it's time to answer the business stakeholders' questions.

 

NOTE: Write only PySpark and nothing else; any other submission will be deemed invalid.

Answer the following stakeholder questions:

  • Write code to detect data quality issues in the customer files. If you find any, add code to clean them as well.
  • Write code to calculate each customer's Recency (time since their last purchase) and Age (time since their first purchase).
  • Write code to calculate each customer's purchase frequency.
  • Write code to find each customer's average transaction value.
  • Write code to find customer lifetime value (CLV), taking the profit margin as 30% and the retention period as the customer's age since first purchase (Reference).
  • Write code to find the average number of days between consecutive purchases for each customer.
  • Provide a count of repeat customers by store
  • Identify the store with the highest transaction value for each product.
    • Example: SKU-57643 is highest in store 5380.
  • Which month has the highest number of transactions for each product?
  • Assume that WeDistro has announced a special promotional offer on SKU-17941 and expects a surge in customer purchases. WeDistro has therefore mandated that all stores maintain at least 600 units of this product. Given this scenario and the latest inventory data available to you, determine which stores need to place an order and for how many units. (Make sure you account for lead times and for a store's current stock across its various locations.)
  • For each product, identify the store with the shortest lead time from order to in-store inventory availability.
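Several of the questions above ask for customer-level metrics: Recency, Age, frequency, average transaction value, and average days between purchases. All of them come from grouping transactions by customer. The sketch below shows that logic in plain Python so you can check it without a Spark cluster; your submission must still be PySpark. The record layout `(customer_id, purchase_date, amount)`, the sample data, and the `reference_date` parameter are hypothetical. Recency and Age are measured up to `reference_date`, which you might set to the latest transaction date in the dataset.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2023, 1, 1), 100.0),
    ("C1", date(2023, 1, 11), 50.0),
    ("C1", date(2023, 1, 31), 150.0),
    ("C2", date(2023, 1, 21), 80.0),
]


def customer_metrics(rows, reference_date):
    """Per-customer Recency, Age, frequency, ATV and average days between purchases."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in rows:
        by_customer[customer_id].append((purchase_date, amount))

    metrics = {}
    for customer_id, purchases in by_customer.items():
        dates = sorted(d for d, _ in purchases)
        n = len(purchases)
        total = sum(amount for _, amount in purchases)
        metrics[customer_id] = {
            # Days from the most recent purchase to the reference date.
            "recency_days": (reference_date - dates[-1]).days,
            # Days from the first purchase to the reference date.
            "age_days": (reference_date - dates[0]).days,
            "frequency": n,
            "avg_transaction_value": total / n,
            # Mean of consecutive gaps equals (last - first) / (n - 1);
            # undefined for a single purchase.
            "avg_days_between": (dates[-1] - dates[0]).days / (n - 1) if n > 1 else None,
        }
    return metrics


metrics = customer_metrics(transactions, reference_date=date(2023, 2, 1))
print(metrics["C1"])
# {'recency_days': 1, 'age_days': 31, 'frequency': 3, 'avg_transaction_value': 100.0, 'avg_days_between': 15.0}
```

In PySpark, the same aggregation maps onto `groupBy("customer_id").agg(...)` using `F.max`, `F.min`, `F.count`, `F.avg`, and `F.datediff` from `pyspark.sql.functions`. The actual column names depend on the customer and transaction files.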
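For the CLV question, one common simple formulation works in two steps. It first computes the customer's value per period (average transaction value × purchase frequency). It then multiplies that value by the retention period and the profit margin. Treat this as an assumption and check the Reference for the exact definition the exercise expects. The function name `clv` and the sample numbers are hypothetical, and the sketch is plain Python rather than PySpark.

```python
def clv(avg_transaction_value, purchase_frequency, retention_period, profit_margin=0.30):
    """Simple CLV sketch: value per period x retention period x profit margin.

    purchase_frequency (purchases per period) and retention_period must use the
    same time unit, e.g. purchases per year and years retained.
    """
    value_per_period = avg_transaction_value * purchase_frequency
    return value_per_period * retention_period * profit_margin


# Hypothetical customer: average transaction 100.0, 4 purchases per year, 2.5 years retained.
print(round(clv(100.0, 4, 2.5), 2))
# 300.0
```

The exercise sets the retention period equal to the customer's age since first purchase. Before you multiply, express purchase frequency and age in the same time unit, for example both per year.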
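Three of the questions share one pattern: aggregate per group, then keep the top row for each product. They are the store with the highest transaction value per product, the month with the most transactions per product, and the store with the shortest lead time per product. The plain-Python sketch below makes two assumptions. It uses hypothetical `(sku, store_id, transaction_value)` rows. It also reads "transaction value" as the total value per store; if the question means the single largest transaction, take the maximum instead of the sum.

```python
from collections import defaultdict

# Hypothetical transaction rows: (sku, store_id, transaction_value).
sales = [
    ("SKU-A", 101, 20.0),
    ("SKU-A", 102, 35.0),
    ("SKU-A", 101, 30.0),
    ("SKU-B", 103, 12.0),
    ("SKU-B", 101, 9.0),
]


def top_store_per_product(rows):
    """Return {sku: (store_id, total_value)} for each product's highest-value store."""
    # Step 1: aggregate transaction value per (sku, store) pair.
    totals = defaultdict(float)
    for sku, store_id, value in rows:
        totals[(sku, store_id)] += value

    # Step 2: keep the store with the largest total for each sku (first seen wins ties).
    best = {}
    for (sku, store_id), total in totals.items():
        if sku not in best or total > best[sku][1]:
            best[sku] = (store_id, total)
    return best


print(top_store_per_product(sales))
# {'SKU-A': (101, 50.0), 'SKU-B': (103, 12.0)}
```

In PySpark, the usual way to keep the top row per group is a window. Order it with `Window.partitionBy("sku").orderBy(F.desc("total"))`, number the rows with `F.row_number()`, and filter on row number 1. Use `F.rank()` instead if you want to keep ties. For the shortest lead time, order ascending. For the busiest month, count transactions per `(sku, month)` instead of summing values.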
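The repeat-customer question depends on what counts as a repeat customer. The sketch below assumes it means a customer with two or more transactions at the same store. The `(store_id, customer_id)` rows are hypothetical, and the code is plain Python rather than PySpark.

```python
from collections import defaultdict

# Hypothetical transaction rows: (store_id, customer_id), one per purchase.
visits = [
    (1, "C1"), (1, "C1"), (1, "C2"),
    (2, "C2"), (2, "C2"), (2, "C3"), (2, "C3"), (2, "C4"),
]


def repeat_customers_by_store(rows):
    """Count customers with two or more purchases at each store."""
    purchases = defaultdict(int)
    for store_id, customer_id in rows:
        purchases[(store_id, customer_id)] += 1

    counts = defaultdict(int)
    for (store_id, _customer_id), n in purchases.items():
        if n > 1:
            counts[store_id] += 1
    return dict(counts)


print(repeat_customers_by_store(visits))
# {1: 1, 2: 2}
```

Stores with no repeat customers do not appear in the result. The PySpark equivalent takes three steps. Run `groupBy("store_id", "customer_id").count()`, keep rows where `F.col("count") > 1`, then run a second `groupBy("store_id").count()`.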
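The SKU-17941 reorder question reduces to a shortfall per store: the mandated 600 units minus the stock already on hand and the stock that will arrive in time. The plain-Python sketch below rests on three assumptions:

- `on_hand` is the store's stock summed across its locations.
- `pending` lists open shipments as hypothetical `(units, days_until_arrival)` pairs derived from order dates and lead times.
- `horizon_days` is a hypothetical cut-off for when the promotion starts. Only shipments arriving within the horizon count toward the target.

```python
TARGET_UNITS = 600  # minimum SKU-17941 stock mandated by WeDistro

# Hypothetical latest inventory snapshot per store for SKU-17941.
inventory = {
    101: {"on_hand": 250, "pending": [(100, 3), (200, 12)]},
    102: {"on_hand": 700, "pending": []},
    103: {"on_hand": 590, "pending": [(50, 20)]},
}


def units_to_order(stores, horizon_days, target=TARGET_UNITS):
    """Return {store_id: units} for stores that fall short of the target."""
    orders = {}
    for store_id, stock in stores.items():
        # Only shipments that arrive within the horizon count toward the target.
        arriving = sum(units for units, eta in stock["pending"] if eta <= horizon_days)
        shortfall = target - (stock["on_hand"] + arriving)
        if shortfall > 0:
            orders[store_id] = shortfall
    return orders


print(units_to_order(inventory, horizon_days=7))
# {101: 250, 103: 10}
```

A fuller answer would also check whether a new order placed today can arrive before the promotion, given each store's lead time.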
Upload the code notebooks that contain your answers to all of the questions above.