This project demonstrates an end-to-end healthcare data pipeline on Azure, built with Azure Data Factory, Azure Data Lake Storage, Azure Databricks, and Azure SQL Database.
- Azure Data Factory (Orchestration)
- Azure Data Lake Storage (Raw & Processed Data)
- Azure Databricks (Transformation/Analytics)
- Azure SQL Database (Reporting Layer)
- Infrastructure as Code (Bicep)
- Python (ETL scripts, Databricks jobs)
- pipelines/ – Azure Data Factory pipeline definitions
- src/ – ETL Python scripts
- infra/ – Bicep templates for infrastructure deployment
- notebooks/ – Databricks notebooks for analysis
- config/ – Pipeline configuration
- tests/ – Unit tests
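The ETL scripts in src/ might follow a pattern like the minimal sketch below. The field names (patient_id, visit_date), the clean_records helper, and the date format are illustrative assumptions, not taken from this repository:

```python
import csv
import io
from datetime import datetime

def clean_records(raw_rows):
    """Normalize raw patient rows: parse visit dates to ISO 8601 and
    drop rows missing a patient_id. Field names are hypothetical."""
    cleaned = []
    for row in raw_rows:
        if not row.get("patient_id"):
            continue  # skip rows we cannot attribute to a patient
        row["visit_date"] = datetime.strptime(
            row["visit_date"], "%m/%d/%Y"
        ).date().isoformat()  # e.g. "03/15/2024" -> "2024-03-15"
        cleaned.append(row)
    return cleaned

# Example: cleaning a small in-memory CSV extract
raw_csv = "patient_id,visit_date\nP001,03/15/2024\n,01/02/2024\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))
print(clean_records(rows))  # the row with no patient_id is dropped
```

The same cleaning function can be unit-tested under tests/ and invoked from a Databricks job without modification, since it operates on plain dictionaries.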
- Deploy the infrastructure:

  az deployment sub create --location <region> --template-file infra/main.bicep --parameters infra/parameters.json

- Configure Azure Data Factory using pipelines/adf_pipeline.json.
- Run the ETL scripts locally or as Databricks jobs.
- Explore and analyze data using provided notebooks.
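A script meant to run both locally and as a Databricks job typically reads its settings from config/ and falls back to defaults when the file is absent. A minimal sketch, assuming a JSON config file; the file name (pipeline_config.json) and the keys shown are hypothetical:

```python
import json
import os

def load_config(path="config/pipeline_config.json"):
    """Load pipeline settings from a JSON file, falling back to
    defaults so the script also runs locally without the file.
    The file name and keys here are hypothetical."""
    defaults = {"input_path": "raw/", "output_path": "processed/"}
    if os.path.exists(path):
        with open(path) as f:
            defaults.update(json.load(f))  # file values override defaults
    return defaults

cfg = load_config()
print(cfg["input_path"], "->", cfg["output_path"])
```

Keeping configuration out of the code this way lets the same script target different storage paths in development and production runs.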