Forecasting demand for bus journeys 15 days in advance using historical booking data, search trends, and enriched holiday features.
This repository contains my top-performing solutions for the RedBus DataDecode Hackathon 2025, hosted by Analytics Vidhya in collaboration with redBus. The challenge was to forecast route-level seat demand for a specific date of journey (DOJ), 15 days in advance, using historical booking and search data.
📈 Final Rank: 21
🏆 Recognized for analytical rigor and creative feature engineering.
Predict total seats booked for each route on a specific date of journey, using data available 15 days in advance.
👉 Click to visit the detailed problem statement: RedBus DataDecode Hackathon 2025
- Impact of holidays, weekends, school calendars, and wedding seasons.
- Regional variability in holiday effects.
- Day-of-week and temporal search trends.
- Noise in user search behavior and booking delays.
redBus-DataDecode_rank21_Sol/
├── BestSol/
│ ├── 15.06.25_v4.ipynb <- Final best performing solution
│ ├── submission.csv <- Corresponding submission file
│ ├── LB Standings .png <- Proof of leaderboard rank
├── AnotherSol/
│ ├── 22.06.25_v1.ipynb <- Alternate solution with holiday enrichment
│ ├── submission_with_holidays.csv
│ ├── holiday_dates.csv <- Holiday data scraped manually (2023–2025)
│
├── Data/
│ ├── train.zip <- train data
│ ├── test.csv <- test data
│
├── Model Architecture.png <- Model Block Diagram
└── README.md <- This file
Visual pipeline of the solution:
- 🧠 ML Models: LightGBM and XGBoost base models (with fine-tuned hyperparameters), combined by a Ridge meta-learner
- 📅 Temporal Features: DOJ & DOI-based time deltas, week/day flags
- 📊 Feature Engineering:
  - Route frequency statistics
  - Search-to-booking ratios
  - Holiday influence (via custom calendar CSV)
  - Long weekend detection
- 🔍 External Data: Manually curated holiday calendar (India-wide)
- 📦 Tools: Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib
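
A few of the features listed above can be sketched in plain Python. This is a minimal illustration, not the notebook code: the function names, the "three or more consecutive days off" rule for a long weekend, and the zero fallback when there are no bookings are all assumptions made for the example. The same logic carries over to a pandas DataFrame.

```python
from datetime import date, timedelta


def is_day_off(d, holidays):
    """Treat Saturdays, Sundays, and listed holidays as days off."""
    return d.weekday() >= 5 or d in holidays


def long_weekend_length(d, holidays):
    """Length of the run of consecutive days off containing d (0 on a working day)."""
    if not is_day_off(d, holidays):
        return 0
    start = d
    while is_day_off(start - timedelta(days=1), holidays):
        start -= timedelta(days=1)
    end = d
    while is_day_off(end + timedelta(days=1), holidays):
        end += timedelta(days=1)
    return (end - start).days + 1


def journey_features(doj, holidays, searches, bookings):
    """Build a small feature dict for one date of journey (DOJ)."""
    run = long_weekend_length(doj, holidays)
    return {
        "day_of_week": doj.weekday(),           # Monday = 0 ... Sunday = 6
        "is_weekend": int(doj.weekday() >= 5),
        "is_holiday": int(doj in holidays),
        "is_long_weekend": int(run >= 3),       # assumed threshold
        # Assumed fallback: 0.0 when no bookings have been made yet.
        "search_to_booking": searches / bookings if bookings else 0.0,
    }


# Example: 15 August 2025 falls on a Friday, so Fri-Sun form a 3-day break.
holidays = {date(2025, 8, 15)}
features = journey_features(date(2025, 8, 16), holidays, searches=120, bookings=40)
```

In a real pipeline the `holidays` set would be loaded from the holiday calendar CSV rather than written inline.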
| Solution | Description | Notes |
|---|---|---|
| ✅ BestSol | Final best solution with tuned features and minimal leakage | Notebook · CSV |
| ✅ AnotherSol | Used enriched holiday features from custom calendar | Notebook · CSV · Holidays CSV |
Both models were designed with strict 15-day-ahead forecasting logic, avoiding future leakage and optimizing for RMSE.
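
One way to enforce the 15-day-ahead rule is to drop every observation recorded fewer than 15 days before the journey before building features. The sketch below assumes a hypothetical row layout (`doj`, `observed_on`, `seats`) with made-up values; it illustrates the idea and is not the exact filter used in the notebooks.

```python
from datetime import date

HORIZON_DAYS = 15  # forecast is made this many days before the journey


def visible_at_forecast_time(rows, horizon=HORIZON_DAYS):
    """Keep only observations recorded at least `horizon` days before the DOJ."""
    return [r for r in rows if (r["doj"] - r["observed_on"]).days >= horizon]


# Illustrative rows only: cumulative seats seen on different observation dates.
rows = [
    {"doj": date(2025, 3, 20), "observed_on": date(2025, 3, 1), "seats": 12},   # 19 days out
    {"doj": date(2025, 3, 20), "observed_on": date(2025, 3, 5), "seats": 30},   # 15 days out
    {"doj": date(2025, 3, 20), "observed_on": date(2025, 3, 10), "seats": 55},  # 10 days out
]

safe_rows = visible_at_forecast_time(rows)  # the 10-days-out row is dropped
```

Filtering first and engineering features second keeps any statistic computed later from seeing bookings or searches that would not exist at prediction time.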
RMSE (Root Mean Squared Error)
Used to evaluate model predictions against actual seats booked.
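
RMSE is the square root of the mean of the squared differences between actual and predicted values. A minimal standard-library sketch follows; the function name and the sample numbers are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors of 2, -2, and 3 seats give sqrt((4 + 4 + 9) / 3).
score = rmse([10, 20, 30], [12, 18, 33])
```

Because the errors are squared before averaging, a few badly missed high-demand dates raise the score more than many small misses.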
- Finished in the top 25 (rank 21) among 694 teams.
- Published on GitHub so that the results can be reproduced.
- redBus and Analytics Vidhya for hosting the challenge.
- Public sources for Indian holiday data (e.g., digitalpathsala.com, official state calendars).
- The GitHub and Analytics Vidhya (AV) communities for discussions and guidance.
If you find this work helpful or have any questions, feel free to reach out:
