Set up a GitHub repo, create a Python virtual environment, generate 260k+ rows of realistic e-commerce data, and configure your project files.
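The data-generation step can be sketched in plain Python with the standard library. Column names, catalog values, and row counts here are illustrative assumptions; a seeded generator keeps the output reproducible:

```python
import csv
import random
from datetime import datetime, timedelta

# Hypothetical catalog values -- adjust to match your own schema.
CATEGORIES = ["electronics", "apparel", "home", "beauty", "sports"]
PAYMENT_METHODS = ["card", "paypal", "gift_card"]

def generate_orders(n_rows, seed=42, start=datetime(2023, 1, 1)):
    """Yield n_rows synthetic e-commerce order records as dicts."""
    rng = random.Random(seed)  # seeded for reproducible output
    for i in range(n_rows):
        order_ts = start + timedelta(minutes=rng.randint(0, 365 * 24 * 60))
        yield {
            "order_id": f"ORD-{i:07d}",
            "customer_id": f"CUST-{rng.randint(1, 25_000):05d}",
            "category": rng.choice(CATEGORIES),
            "quantity": rng.randint(1, 5),
            "unit_price": round(rng.uniform(5.0, 500.0), 2),
            "payment_method": rng.choice(PAYMENT_METHODS),
            "order_ts": order_ts.isoformat(),
        }

def write_csv(path, n_rows=260_000):
    """Write the synthetic orders to a CSV file with a header row."""
    rows = generate_orders(n_rows)
    first = next(rows)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=first.keys())
        writer.writeheader()
        writer.writerow(first)
        writer.writerows(rows)
```

Real projects often swap the hand-rolled catalog for a library like Faker, but a seeded `random.Random` keeps the dataset fully reproducible across runs.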
Create an Azure account and resource group, configure Data Lake with medallion containers, spin up a Databricks cluster, upload sample data, and secure connections.
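Account and resource-group creation happens in the portal or CLI, but the medallion containers can be created programmatically. A minimal sketch using the `azure-storage-blob` package (the connection string comes from your storage account's access keys; the lazy import keeps the constants usable without the SDK installed):

```python
# Medallion container names used throughout the project.
MEDALLION_CONTAINERS = ["bronze", "silver", "gold"]

def create_medallion_containers(connection_string):
    """Create the three medallion containers if they do not already exist.

    Requires the azure-storage-blob package.
    """
    # Imported lazily so the module loads without the Azure SDK present.
    from azure.storage.blob import BlobServiceClient
    from azure.core.exceptions import ResourceExistsError

    service = BlobServiceClient.from_connection_string(connection_string)
    for name in MEDALLION_CONTAINERS:
        try:
            service.create_container(name)
        except ResourceExistsError:
            pass  # idempotent: skip containers that already exist
```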
Mount Azure Data Lake Storage (ADLS) in Databricks, build Bronze tables, define schemas, add metadata, run quality checks, and partition data. Philosophy: store raw data as-is while preserving lineage.
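The mount itself is Databricks-specific (`dbutils.fs.mount` with an OAuth config), but the Bronze philosophy — keep payloads untouched, attach lineage — can be shown in plain Python. Field names prefixed with `_` are an assumed metadata convention, not a Databricks requirement:

```python
from datetime import datetime, timezone

def to_bronze(raw_records, source_file):
    """Wrap raw records with ingestion metadata, leaving payloads untouched.

    Bronze philosophy: store each record exactly as received and attach
    lineage columns so every row traces back to its source file.
    """
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {**record,
         "_source_file": source_file,   # lineage: where the row came from
         "_ingested_at": ingested_at,   # lineage: when it landed
         "_row_number": i}              # position within the source file
        for i, record in enumerate(raw_records)
    ]

def null_check(records, required_columns):
    """Simple Bronze-level quality check: count nulls per required column."""
    return {
        col: sum(1 for r in records if r.get(col) in (None, ""))
        for col in required_columns
    }
```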
Install DBT, build clean Silver models, implement quality tests, document lineage, and standardize formats. Philosophy: clean, conform, validate.
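In DBT the Silver models are SQL files with YAML-declared tests; this Python sketch mirrors the same clean-conform-validate logic so the intent is visible without a dbt project. Field names are assumptions to align with your Bronze schema:

```python
from datetime import datetime

def clean_order(raw):
    """Standardize one Bronze order record into Silver shape:
    trimmed keys, lowercased emails, typed dates, rounded amounts."""
    return {
        "order_id": raw["order_id"].strip().upper(),
        "email": raw["email"].strip().lower(),
        "order_date": datetime.strptime(raw["order_ts"][:10], "%Y-%m-%d").date(),
        "amount": round(float(raw["amount"]), 2),
    }

def not_null_test(records, column):
    """DBT-style generic test: fail if any value in `column` is null."""
    bad = [r for r in records if r.get(column) is None]
    return {"test": f"not_null_{column}", "passed": not bad, "failures": len(bad)}
```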
Create aggregated Gold tables, build KPI metrics, run RFM and cohort analysis, and design dashboards. Philosophy: business-ready data for direct reporting.
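The RFM computation at the heart of the Gold layer reduces to three per-customer aggregates. A minimal sketch over `(customer_id, order_date, amount)` tuples, with quintile scoring left to the Gold model itself:

```python
from datetime import date

def rfm_values(orders, as_of):
    """Compute per-customer RFM (recency/frequency/monetary) values.

    recency_days: days since the customer's most recent order, as of `as_of`
    frequency:    total order count
    monetary:     total spend
    """
    acc = {}
    for customer_id, order_date, amount in orders:
        rec = acc.setdefault(customer_id,
                             {"last_order": order_date, "frequency": 0, "monetary": 0.0})
        rec["last_order"] = max(rec["last_order"], order_date)
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        cid: {
            "recency_days": (as_of - rec["last_order"]).days,
            "frequency": rec["frequency"],
            "monetary": round(rec["monetary"], 2),
        }
        for cid, rec in acc.items()
    }
```

Cohort analysis follows the same shape: group customers by first-order month instead of aggregating by recency.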
Use Great Expectations for validation, set up automated tests, create data-quality scorecards, and add anomaly detection. Philosophy: trust but verify.
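Great Expectations ships these checks as `expect_column_values_to_*` methods on a validator; this dependency-free sketch shows the underlying pattern — each expectation returns a success flag plus an unexpected count, and the results roll up into a scorecard:

```python
def expect_values_between(records, column, low, high):
    """Expectation-style check: every value in `column` falls in [low, high]."""
    bad = [r[column] for r in records if not (low <= r[column] <= high)]
    return {"expectation": f"{column}_between_{low}_{high}",
            "success": not bad, "unexpected_count": len(bad)}

def expect_not_null(records, column):
    """Expectation-style check: no nulls in `column`."""
    bad = [r for r in records if r.get(column) is None]
    return {"expectation": f"{column}_not_null",
            "success": not bad, "unexpected_count": len(bad)}

def scorecard(results):
    """Roll a list of expectation results up into a pass-rate scorecard."""
    passed = sum(1 for r in results if r["success"])
    return {"checks": len(results), "passed": passed,
            "score": round(passed / len(results), 2)}
```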
Track experiments with MLflow, build segmentation and churn models, add recommendations, and monitor predictions. Philosophy: start simple, iterate fast, deploy confidently.
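"Start simple" can mean a rule-based baseline over the RFM values before any trained classifier. The segment thresholds below are hypothetical, and the MLflow helper (lazy-imported, requires the `mlflow` package) records one tracked run:

```python
def segment(rec):
    """Rule-based segmentation over RFM values -- a deliberately simple
    first iteration with hypothetical thresholds."""
    if rec["recency_days"] <= 30 and rec["frequency"] >= 5:
        return "loyal"
    if rec["recency_days"] > 90:
        return "at_risk"
    return "active"

def churn_labels(rfm, recency_threshold_days=90):
    """Baseline churn heuristic: churned = no order within the window.
    This is the simple model to beat before iterating on a classifier."""
    return {cid: rec["recency_days"] > recency_threshold_days
            for cid, rec in rfm.items()}

def log_baseline_run(params, metrics):
    """Record one experiment run in MLflow (requires the mlflow package)."""
    import mlflow
    with mlflow.start_run(run_name="churn-baseline"):
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```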
Automate jobs with Databricks, orchestrate workflows, add monitoring dashboards, and set up alerts. Philosophy: automate everything, alert intelligently.
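Orchestration in Databricks is a Jobs API payload: tasks with explicit dependencies, a cron schedule, and failure notifications. A sketch of a Jobs API 2.1 payload chaining the medallion steps (notebook paths, cron timing, and the email address are placeholders):

```python
# Databricks Jobs API 2.1 payload: bronze -> silver -> gold, nightly at 02:00 UTC.
job_config = {
    "name": "ecommerce-medallion-pipeline",
    "tasks": [
        {"task_key": "bronze_ingest",
         "notebook_task": {"notebook_path": "/Pipelines/bronze_ingest"}},
        {"task_key": "silver_transform",
         "depends_on": [{"task_key": "bronze_ingest"}],
         "notebook_task": {"notebook_path": "/Pipelines/silver_transform"}},
        {"task_key": "gold_aggregate",
         "depends_on": [{"task_key": "silver_transform"}],
         "notebook_task": {"notebook_path": "/Pipelines/gold_aggregate"}},
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?",
                 "timezone_id": "UTC"},
    "email_notifications": {"on_failure": ["you@example.com"]},
}
```

The `depends_on` edges are what make this an orchestrated DAG rather than three independent jobs: a Silver failure blocks Gold and fires the alert.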
Implement GitHub Actions, automated tests, environment management, and deployment pipelines. Philosophy: test early, deploy with confidence.
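A minimal workflow sketch for the CI side — job names and the dependency/test steps are assumptions to adapt to your repo layout:

```yaml
# .github/workflows/ci.yml -- runs the test suite on every push and PR.
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest tests/
```

Environment management and deployment typically layer on top of this as separate jobs gated by branch or environment protection rules.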
Write a README, add architecture diagrams, polish the docs, draft resume bullets, and assemble the final project showcase.