Databricks Data Engineer Skill Path
An end-to-end, industry-focused skill path for building the essential skills of a Databricks Data Engineer
Pre-Requisites
Key Highlights
✅ Big Data analysis on datasets larger than 7 GB
✅ Aligned to Databricks Data Engineer Associate Certification
✅ SQL and Python optimization for efficient data analysis
Skill Path
A Practical Sense to SQL Query Optimization
This module dives deep into writing queries as efficiently as possible, saving compute resources and cost for an organization
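As a small illustration of the kind of optimization this module refers to, the sketch below compares a query plan before and after adding an index. It uses Python's built-in sqlite3 module rather than Databricks, and the table name `orders`, the index name `idx_orders_customer`, and the helper `query_plan` are hypothetical names chosen for this example only.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return [row[-1] for row in rows]


# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

sql = "SELECT order_id, amount FROM orders WHERE customer_id = ?"

# Without an index, the filter has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filter column, SQLite can search instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the second plan should mention the index while the first should not. Other engines expose the same idea through their own plan-inspection commands.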
Advanced Data wrangling and Python code optimization
This module dives deep into writing modular code for challenging data tasks and working with complex data structures like JSON
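A minimal sketch of what modular code for nested JSON might look like is shown below. The function names `flatten` and `parse_records` and the sample record are assumptions made for this illustration, not material from the course.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are.
            flat[full_key] = value
    return flat


def parse_records(raw_json):
    """Parse a JSON array of objects and flatten each one."""
    return [flatten(record) for record in json.loads(raw_json)]


# Hypothetical sample input.
raw = (
    '[{"id": 1, "user": {"name": "Ada", "address": {"city": "London"}},'
    ' "tags": ["a", "b"]}]'
)
rows = parse_records(raw)
print(rows)
```

Keeping the flattening logic separate from the parsing step makes each piece easy to test and reuse on its own.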
Big Data Foundations
This module introduces you to the world of big data storage and processing, and explains why monolithic systems are a poor design choice for big data
Significance of Cloud
This module introduces you to the need for using the cloud and how it solves the modern-day data problems in the industry
Working with Object storage on Cloud
This module gives you a practical sense of data lakes, and when and where to use them. It also introduces you to the cloud SDK and how to use it to programmatically ingest data from data lakes
Databases connectivity on Cloud
This module introduces you to various databases on the cloud and how to connect to them. It also dives deep into ingesting data from these databases
Need for ETL for Organizations
This module dives deep into the need for ETL Pipelines
Introduction to Databricks
This module details the need for a unified analytics platform like Databricks and how to use it to tackle Data + AI challenges. We will look into the Databricks architecture and how it can be set up in Azure Databricks. We will also cover the different types of clusters needed for various analytical workloads
Data Analysis using PySpark on Databricks Part 1
This module dives deep into the analysis of data using PySpark
Data Analysis using PySpark on Databricks Part 2
This module dives deep into the analysis of data using Spark SQL
Data Analysis on Databricks - Practice Set
Data Harmonization using Databricks
This is an industry-inspired project to harmonize data from heterogeneous upstream systems
Introduction to Delta Lake in Databricks
This module details the need for Delta Lake and its advantages over a data lake and a data warehouse. We will also look into how to create and work with data in a Delta Lake
Advanced Delta Lake Operation in Databricks
This module dives deep into designing a Delta Lake for industry use and optimizing the serving layer for efficient queries
Building Batch ETL pipeline in Databricks using Medallion Architecture
This is an industry-inspired project
Structured Streaming in Databricks
This module gives an in-depth exploration of processing live data streams, building scalable streaming applications, and integrating with various data sources.
Efficient Streaming Pipelines using Auto Loaders in Databricks
This module on Auto Loaders in Databricks dives deep into reducing manual dependencies and increasing the efficiency of a streaming pipeline. It also covers the implementation of scalable and fault-tolerant streaming pipelines using Auto Loaders
Declarative ETL using Delta Live Tables in Databricks
This module is about accelerating the ETL process by building scalable data pipelines using Delta Live Tables in Databricks
Data Quality Tests using Delta Live Tables in Databricks
This module helps you to understand how Delta Live Tables enable automated data quality checks to ensure accurate, reliable data.
ETL Pipeline orchestration using Workflows in Databricks
This module details the need for orchestration when building data pipelines at scale. It also covers the use of Workflows in Databricks for efficient orchestration of pipelines
Introduction to Data Governance using Unity Catalog in Databricks
Cost Optimization with Spark Performance tuning in Databricks
Certification Bundle
This module will allow you to be battle-ready for taking the Databricks Data Engineer Associate certification
Capstone Project
This is an end-to-end project covering skills from across the path. You can choose and work on projects from the catalog