Analyze Walmart's retail sales data end-to-end: from cleaning raw data using Python to answering business-critical questions using MySQL. This project provides actionable insights into product performance, sales volume trends, customer ratings, payment behavior, and branch efficiency.
This is a hands-on SQL + Python analytics project using real-world Walmart data.
- Data is cleaned and transformed using Python (pandas)
- Key fields like sales volume, customer behavior, and profitability margins are standardized
- Cleaned data is stored as walmart_clean.csv
- SQL queries (MySQL) explore retail KPIs, customer trends, and branch performance
- Python (pandas) – Data cleaning and preprocessing
- MySQL – SQL-based analytics and business intelligence
- Jupyter Notebook – For step-by-step Python execution
- Kaggle Dataset – Walmart Sales Dataset
git clone https://github.com/your-username/walmart-sales-analysis.git
cd walmart-sales-analysis
pip install pandas
Run the provided Jupyter Notebook:
walmart_data_cleaning.ipynb
This notebook performs the data cleaning steps (handling missing values, correcting types, removing duplicates) and exports a clean version of the dataset as walmart_clean.csv.
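The cleaning steps named above can be sketched roughly as follows. This is not the notebook's actual code: the `clean_sales` helper, the toy rows, the `$` prefix on prices, and the day/month/year date format are all assumptions made for illustration.

```python
import io
import pandas as pd

# Invented sample rows standing in for the raw file (not real Walmart data).
# Assumptions: prices carry a "$" prefix and dates are day/month/two-digit year.
RAW_CSV = """invoice_id,Branch,City,category,unit_price,quantity,date,time,payment_method,rating,profit_margin
1,A,Austin,Health and beauty,$74.69,7,05/01/19,13:08:00,Ewallet,9.1,0.48
1,A,Austin,Health and beauty,$74.69,7,05/01/19,13:08:00,Ewallet,9.1,0.48
2,B,Dallas,Electronic accessories,$15.28,,08/03/19,10:29:00,Cash,9.6,0.48
3,C,Houston,Home and lifestyle,$46.33,7,03/03/19,13:23:00,Credit card,7.4,0.33
"""


def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates and missing values, then fix types."""
    df = raw.copy()
    # Normalize headers so they match the lowercase schema column names.
    df.columns = df.columns.str.strip().str.lower()
    # Remove exact duplicate rows.
    df = df.drop_duplicates()
    # Drop rows that are missing the fields needed for sales calculations.
    df = df.dropna(subset=["unit_price", "quantity"])
    # Correct types: strip the currency symbol, cast numbers, parse dates.
    df["unit_price"] = (
        df["unit_price"].astype(str).str.replace("$", "", regex=False).astype(float)
    )
    df["quantity"] = df["quantity"].astype(int)
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")
    return df.reset_index(drop=True)


clean = clean_sales(pd.read_csv(io.StringIO(RAW_CSV)))
print(clean[["invoice_id", "unit_price", "quantity", "date"]])
```

Calling `clean.to_csv("walmart_clean.csv", index=False)` on the result would write the export file without the pandas index column.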
Create the table schema in MySQL:
CREATE TABLE walmart_clean (
invoice_id VARCHAR(20),
branch VARCHAR(10),
city VARCHAR(50),
category VARCHAR(50),
unit_price FLOAT,
quantity INT,
date DATE,
time TIME,
payment_method VARCHAR(20),
rating FLOAT,
profit_margin FLOAT
);
Then import the walmart_clean.csv file using MySQL Workbench or your preferred import tool.
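If a script is preferred over a GUI import, a small loader built on Python's DB-API is one option. The sketch below is an assumption, not part of the repository: `load_csv` is a hypothetical helper, and the demo runs against an in-memory SQLite database only so that it is self-contained. With a MySQL driver such as mysql-connector-python or PyMySQL, you would pass that driver's connection and keep the default `%s` placeholder.

```python
import csv
import io
import sqlite3

# Column order matches the CREATE TABLE statement above.
COLUMNS = [
    "invoice_id", "branch", "city", "category", "unit_price", "quantity",
    "date", "time", "payment_method", "rating", "profit_margin",
]


def load_csv(conn, csv_file, placeholder="%s"):
    """Hypothetical helper: insert every CSV row into walmart_clean.

    `conn` is any DB-API 2.0 connection; `placeholder` is the driver's
    parameter marker ("%s" for common MySQL drivers, "?" for sqlite3).
    Returns the number of rows inserted.
    """
    reader = csv.DictReader(csv_file)
    rows = [tuple(row[col] for col in COLUMNS) for row in reader]
    marks = ", ".join([placeholder] * len(COLUMNS))
    sql = f"INSERT INTO walmart_clean ({', '.join(COLUMNS)}) VALUES ({marks})"
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


# Demo with SQLite as a stand-in for MySQL; the two rows are invented.
SCHEMA = """CREATE TABLE walmart_clean (
    invoice_id VARCHAR(20), branch VARCHAR(10), city VARCHAR(50),
    category VARCHAR(50), unit_price FLOAT, quantity INT, date DATE,
    time TIME, payment_method VARCHAR(20), rating FLOAT, profit_margin FLOAT
)"""
sample = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "1,A,Austin,Food and beverages,10.5,3,2019-01-05,13:08:00,Ewallet,9.1,0.48\n"
    "2,B,Dallas,Fashion accessories,20.0,5,2019-01-06,18:30:00,Cash,6.2,0.33\n"
)
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
inserted = load_csv(conn, sample, placeholder="?")
total_units = conn.execute("SELECT SUM(quantity) FROM walmart_clean").fetchone()[0]
print(inserted, total_units)
```

For the real file, open it with `open("walmart_clean.csv", newline="")` and pass the file object in place of `sample`.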
This project answers real-world analytics questions relevant to Walmart and retail operations:
- 📦 What are the top-selling categories by units sold?
- 🏪 Which categories perform best at each branch?
- 🕒 What are the busiest times of day and days of the week?
- ⭐ Do higher-rated categories correlate with higher sales?
- 💰 Which branches are most efficient in terms of sales volume and profit margin?
- 💳 What are the most popular payment methods and average basket size?
- ⚠️ Which categories are underperforming in both units and customer ratings?
- 📆 What are the monthly sales trends (seasonality)?
- 🧾 What is the average basket size per branch?
- 🔁 How are unit sales changing year-over-year by branch?
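Two of the questions above (top-selling categories and average basket size per branch) can be sketched as plain aggregate queries. These are illustrative only and are not taken from queries.sql. The SQL uses only `SUM`, `AVG`, and `GROUP BY`, so it should also run on MySQL; SQLite is used here purely as a self-contained stand-in. The toy table keeps just four of the schema's columns, the rows are invented, and "basket size" is assumed to mean average units per invoice.

```python
import sqlite3

# Units sold per category, highest first.
TOP_CATEGORIES = """
SELECT category, SUM(quantity) AS units_sold
FROM walmart_clean
GROUP BY category
ORDER BY units_sold DESC
"""

# Assumed definition of basket size: average units per invoice row.
BASKET_BY_BRANCH = """
SELECT branch, AVG(quantity) AS avg_units_per_invoice
FROM walmart_clean
GROUP BY branch
ORDER BY branch
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE walmart_clean ("
    "invoice_id VARCHAR(20), branch VARCHAR(10), category VARCHAR(50), quantity INT)"
)
# Invented rows for demonstration only.
conn.executemany(
    "INSERT INTO walmart_clean VALUES (?, ?, ?, ?)",
    [
        ("1", "A", "Food and beverages", 5),
        ("2", "A", "Fashion accessories", 1),
        ("3", "B", "Food and beverages", 4),
        ("4", "B", "Electronic accessories", 3),
    ],
)

top = conn.execute(TOP_CATEGORIES).fetchall()
basket = conn.execute(BASKET_BY_BRANCH).fetchall()
print(top)
print(basket)
```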
- 🔝 Food & Beverages consistently leads in total units sold.
- 🕒 Sales peak during evenings and weekends, especially in Branch B.
- ⭐ Higher-rated categories tend to sell more units, showing a positive link between customer satisfaction and demand.
- 💳 Ewallet is the most preferred payment method across branches.
- ⚠️ Fashion Accessories shows both low units and low ratings, making it a clear candidate for product review or promotions.
- 📆 Branch C experienced a year-over-year decline in unit sales, highlighting areas needing operational focus.
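A year-over-year comparison like the one behind the Branch C finding could be computed along these lines. This is a minimal pandas sketch under stated assumptions: the branches, years, and quantities below are invented and do not reproduce the project's results, and the project itself may compute this in SQL instead.

```python
import pandas as pd

# Invented figures for illustration; not the project's data or results.
sales = pd.DataFrame(
    {
        "branch": ["A", "A", "B", "B", "C", "C"],
        "date": pd.to_datetime(
            ["2022-03-01", "2023-03-01", "2022-06-10",
             "2023-06-10", "2022-09-15", "2023-09-15"]
        ),
        "quantity": [100, 120, 80, 80, 50, 40],
    }
)

# Total units per branch and calendar year, one column per year.
units = (
    sales.assign(year=sales["date"].dt.year)
    .groupby(["branch", "year"])["quantity"]
    .sum()
    .unstack("year")
)

# Percentage change in units from 2022 to 2023 for each branch.
yoy_pct = ((units[2023] - units[2022]) / units[2022] * 100).round(1)
print(yoy_pct)
```

A negative value marks a branch whose unit sales fell from one year to the next.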
📦 walmart-sales-analysis/
├── walmart_data_cleaning.ipynb # Python notebook for cleaning and exporting data
├── walmart_clean.csv # Cleaned dataset exported from notebook
├── queries.sql # MySQL queries answering business questions
└── README.md # This documentation
Contributions are welcome! Fork the repo, make your changes, and open a pull request.