This README outlines the steps implemented in the Python script and the tools used for the analysis in this project.
The project analyzes TransBorder Freight Data using the CRISP-DM framework. The analysis involves data cleaning, exploration, transformation, and visualization of freight transportation data.
The following tools and libraries were used in this project:
- Python: Primary programming language for the analysis.
- Pandas: For data manipulation and cleaning.
- NumPy: For numerical computations.
- Matplotlib and Seaborn: For data visualization.
- Jupyter Notebook: To organize and execute the Python code interactively.
- Import necessary libraries.
- Load monthly TransBorder Freight datasets into Pandas DataFrames.
- Merge these monthly datasets into yearly datasets for efficient analysis.
- Handle missing values by either filling or removing them based on context.
- Rename columns for consistency and readability.
- Filter out irrelevant data entries.
- Perform type conversions (e.g., converting columns to appropriate data types).
- Create additional features or columns that provide meaningful insights (e.g., aggregating freight data by region).
- Generate summary statistics (mean, median, etc.) for key variables.
- Create visualizations to understand trends and patterns, such as:
- Line plots for freight trends over time.
- Bar charts for top trading partners.
- Heatmaps for correlation analysis.
- Analyze the visualizations and statistical outputs to identify trends and anomalies.
- Summarize key findings in text and visual form.
- Save cleaned and transformed datasets to new CSV files.
- Export generated visualizations as image files for reporting.
- Ensure you have Python installed on your system (version 3.7 or higher).
- Install the required Python libraries using pip:
pip install pandas numpy matplotlib seaborn
- Open the Jupyter Notebook file in your preferred editor (e.g., JupyterLab, VSCode).
- Execute each cell sequentially to perform the analysis.
- Review the outputs, including visualizations and exported files.
- Ensure all raw datasets are stored in the appropriate directory as specified in the script.
- Modify file paths and other configurations based on your system setup.
For questions or issues, please reach out.