STEFANOVIVAS/azure-databricks-project

Azure Formula1 project

Azure Formula1 project implements a data pipeline that consumes data from the Ergast API and makes F1 driver and constructor standings available for Business Intelligence consumption. The pipeline runs on Microsoft Azure, with ADLS Gen2 as the data lake, Databricks/Spark as the data transformation framework, and Data Factory as the orchestrator.

Table of contents

Data Architecture Diagram

Ergast API Table Schema

How it works

Data Project Overview

The table below lists the Ergast API files used in this project and their formats. Because the formats differ, each file type needs a different read approach in the Spark API.

| File Name    | Format                            |
|--------------|-----------------------------------|
| Races        | CSV                               |
| Constructors | Single Line JSON                  |
| Drivers      | Single Line Nested JSON           |
| Results      | Single Line JSON                  |
| Pit Stops    | Multi Line JSON                   |
| Lap Times    | Split CSV Files                   |
| Qualifying   | Split Multi Line JSON Files       |
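
As a rough illustration of how the formats in the table map onto Spark reader settings, the sketch below keeps one entry per file and builds a `DataFrameReader` from it. The dictionary keys, the `read_source` helper, and the assumption that the Races CSV carries a header row are hypothetical and not taken from this repository; `multiLine` and `header` are standard Spark reader options. Split files are read by pointing `path` at the folder that contains them.

```python
from typing import Any, Dict, Optional

# Reader settings per source file, following the format table above.
# Names and the header assumption are illustrative, not the project's own.
SOURCE_FORMATS: Dict[str, Dict[str, Any]] = {
    "races":        {"format": "csv",  "options": {"header": True}},
    "constructors": {"format": "json", "options": {}},
    "drivers":      {"format": "json", "options": {}},  # nested fields need a nested schema
    "results":      {"format": "json", "options": {}},
    "pit_stops":    {"format": "json", "options": {"multiLine": True}},
    "lap_times":    {"format": "csv",  "options": {}},  # path is a folder of split files
    "qualifying":   {"format": "json", "options": {"multiLine": True}},  # folder as well
}


def read_source(spark, name: str, path: str, schema: Optional[Any] = None):
    """Read one source with the settings registered for it.

    `spark` is an existing SparkSession; `schema` may be a StructType or a
    DDL string. Both are supplied by the caller.
    """
    cfg = SOURCE_FORMATS[name]
    reader = spark.read.format(cfg["format"]).options(**cfg["options"])
    if schema is not None:
        reader = reader.schema(schema)
    return reader.load(path)
```

Keeping the per-file differences in data rather than in separate code paths is one possible design; separate notebooks per file would work equally well.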

Data Ingestion Requirements

  • Ingest all 8 files into the data lake.
  • Ingested data must have the schema applied.
  • Ingested data must have audit columns.
  • Ingested data must be stored in a columnar format.
  • Ingested data must be queryable via SQL.
  • Ingestion logic must handle incremental loads.
  • Join the key information required for reporting to create a new table.
  • Join the key information required for analysis to create a new table.
  • Transformed tables must have audit columns.
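
The requirements above (explicit schema, audit columns, columnar storage, incremental load) can be sketched for a single file as follows. This is a minimal sketch under stated assumptions: the column list is a hypothetical subset of the races file, the audit column names `ingestion_date` and `data_source` are invented for illustration, Parquet stands in for "columnar format", and incremental loading is approximated with Spark's dynamic partition overwrite. The project itself may use a different approach.

```python
from typing import Dict

# Hypothetical subset of the races columns; adjust to the real file.
RACES_COLUMNS: Dict[str, str] = {
    "raceId": "INT",
    "year": "INT",
    "round": "INT",
    "circuitId": "INT",
    "name": "STRING",
    "date": "DATE",
}


def to_ddl(columns: Dict[str, str]) -> str:
    """Build a DDL schema string; backticks guard names that are SQL keywords."""
    return ", ".join(f"`{name}` {dtype}" for name, dtype in columns.items())


def add_audit_columns(df, data_source: str):
    """Append the (assumed) audit columns to a Spark DataFrame."""
    # Imported lazily so this module loads without pyspark installed.
    from pyspark.sql.functions import current_timestamp, lit

    return (
        df.withColumn("ingestion_date", current_timestamp())
          .withColumn("data_source", lit(data_source))
    )


def ingest_races(spark, source_path: str, target_path: str, data_source: str) -> None:
    """Read races with an explicit schema and write it as partitioned Parquet."""
    # With dynamic mode, an overwrite replaces only the partitions present
    # in the incoming DataFrame, so re-running a load does not duplicate rows.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    df = (
        spark.read
        .schema(to_ddl(RACES_COLUMNS))
        .option("header", True)  # assumption: the CSV has a header row
        .csv(source_path)
    )
    df = add_audit_columns(df, data_source)
    df.write.mode("overwrite").partitionBy("year").parquet(target_path)
```

Once written, the Parquet output can be registered as a table or temporary view so that it is available to SQL queries.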

Reporting requirements

  • Driver standings for each year.
  • Constructor standings for each year.
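
One way to express the driver standings requirement in SQL is shown below, as a sketch only. It assumes a hypothetical joined table `race_results` with columns `race_year`, `driver_name`, `points`, and `finish_position`; none of these names come from this repository. Drivers are ranked within each year by total points, with number of wins as the tie-breaker.

```python
# Standings per year: total points, wins, and rank within the season.
# Table and column names are assumptions made for this example.
DRIVER_STANDINGS_SQL = """
SELECT race_year,
       driver_name,
       SUM(points) AS total_points,
       COUNT(CASE WHEN finish_position = 1 THEN 1 END) AS wins,
       RANK() OVER (
           PARTITION BY race_year
           ORDER BY SUM(points) DESC,
                    COUNT(CASE WHEN finish_position = 1 THEN 1 END) DESC
       ) AS driver_rank
FROM race_results
GROUP BY race_year, driver_name
ORDER BY race_year, driver_rank
"""


def driver_standings(spark):
    """Run the standings query against an existing SparkSession."""
    return spark.sql(DRIVER_STANDINGS_SQL)
```

Constructor standings would follow the same pattern, grouping by a team column instead of `driver_name`.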

Analysis Requirements

  • Most dominant drivers over the years.
  • Most dominant constructors over the years.
  • Visualize the outputs.

Prerequisites

  • Microsoft Azure account
  • Azure Databricks Service
  • Azure Data Factory
