🛫 Flight Price Forecasting in India 🛬
This project carries out a complete data analysis workflow: thorough data cleaning, followed by exploratory analysis, and finally a machine learning model that predicts the price of a flight ticket from the flight's characteristics.
It implements three different models (RandomForest, XGBoost and CatBoost), compares their performance, and applies light hyperparameter tuning.
The final goal is to select the best-performing model and use it to predict prices for a small set of flights the model has not seen. These predictions are then validated: they are checked for consistency with the input characteristics and compared against similar historical flights.
🧩 Features
- Data cleaning
- Exploratory Data Analysis (EDA)
- Data Visualization
- Feature Engineering
- Machine Learning Models: RandomForest, XGBoost, CatBoost
- Hyperparameter Optimization
- Forecasting and Validation
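The data cleaning step could start with a per-column quality summary and a few basic fixes. The sketch below is only an illustration, assuming a pandas DataFrame with hypothetical `airline` and `price` columns; the function names are made up for this example and the project's actual cleaning rules may differ.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, missing values and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),
    })


def basic_clean(df: pd.DataFrame, target: str = "price") -> pd.DataFrame:
    """Drop exact duplicates and rows without a target price, then trim text columns."""
    out = df.drop_duplicates().dropna(subset=[target]).copy()
    for col in out.columns:
        if pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.strip()
    return out.reset_index(drop=True)


# Toy data (illustrative only): one duplicate row, one missing price, stray whitespace.
raw = pd.DataFrame({
    "airline": ["Vistara", "Vistara", " SpiceJet", "Air India "],
    "price": [30000.0, 30000.0, None, 4500.0],
})
report = data_quality_report(raw)
clean = basic_clean(raw)
print(report)
print(clean)
```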
⁉️ Analyzed Variables
- Airline
- Origin and destination
- Flight date and departure time
- Class (Economy/Business)
- Number of stops
- Flight duration in minutes
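Scikit-learn's RandomForest requires numeric inputs, so the categorical variables above need an encoding before training. One possible encoding is sketched below with pandas; the column names (`source_city`, `destination_city`, `departure_time`, `stops`, `class`, `duration_min`) and the stop labels (`zero`, `one`, `two_or_more`) are assumptions, not necessarily those of the project's dataset, and `engineer_features` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical raw rows; column names and label values are assumptions.
raw = pd.DataFrame({
    "airline": ["Vistara", "Air India", "SpiceJet"],
    "source_city": ["Delhi", "Mumbai", "Kolkata"],
    "destination_city": ["Mumbai", "Kolkata", "Delhi"],
    "departure_time": ["Morning", "Afternoon", "Night"],
    "stops": ["zero", "one", "two_or_more"],
    "class": ["Business", "Economy", "Economy"],
    "duration_min": [130, 155, 140],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn the raw flight variables into an all-numeric feature matrix."""
    out = df.copy()
    # Merge origin and destination into one route label, e.g. "Delhi_Mumbai".
    out["route"] = out["source_city"] + "_" + out["destination_city"]
    # Stops are ordered, so an integer code keeps that order.
    out["stops"] = out["stops"].map({"zero": 0, "one": 1, "two_or_more": 2})
    # Class has two values: 1 for Business, 0 for Economy.
    out["is_business"] = (out["class"] == "Business").astype(int)
    out = out.drop(columns=["source_city", "destination_city", "class"])
    # One-hot encode the remaining nominal variables.
    return pd.get_dummies(out, columns=["airline", "route", "departure_time"], dtype=int)


features = engineer_features(raw)
print(features.columns.tolist())
```

Ordinal codes suit `stops` because the order carries meaning, while one-hot columns avoid implying an order between airlines, routes or time slots.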
🤯 Technologies Used
- Python 3.13
- Pandas
- NumPy
- Scikit-learn
- XGBoost
- CatBoost
- Matplotlib
- Seaborn
📚 Project Structure
- Problem Definition
- Library Imports and Data Loading
- Data Quality Report
- Exploratory Data Analysis
- Correlation Analysis
- Required Transformations
- Model Selection and Creation
- Evaluation Metrics
- Hyperparameter Optimization (Grid Search)
- Example of Price Prediction
- Final Conclusion
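The grid optimization step can be sketched with scikit-learn's `GridSearchCV`, which fits one model per parameter combination and cross-validation fold and keeps the best one. The grid values, the synthetic data and the choice of MAE as the scoring metric are assumptions for illustration; the project's actual grid is not listed in this README.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the engineered features and ticket prices (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 5000 + 20000 * X[:, 0] + 3000 * X[:, 1] + rng.normal(0.0, 500.0, size=200)

# A deliberately small grid: 2 x 2 = 4 combinations, each fitted on 3 folds.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes scores, so MAE is negated
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Cross-validated MAE:", -search.best_score_)
```

With the default `refit=True`, `search.best_estimator_` is retrained on all the data with the winning parameters and can be used directly for predictions.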
📊 Some Visualizations
Project information
- Category: Data Analysis | Machine Learning
- Tools: Python, Pandas, Seaborn, Scikit-Learn
- Project Date: August 2025
- Project URL: https://github.com/CarlosACalvo/Prediccion_Precios_Vuelo
🏅 Model Comparison
| Model | MSE (₹²) | R² | MAE (₹) |
|---|---|---|---|
| Random Forest | 1.30e+07 | 0.975 | 2,200.99 |
| XGBoost | 1.69e+07 | 0.967 | 2,733.70 |
| CatBoost | 9.86e+06 | 0.981 | 1,788.83 |
CatBoost achieved the best performance on all three metrics:
- Lowest Mean Squared Error (MSE): 9.86e+06
- Highest Coefficient of Determination (R²): 0.981
- Lowest Mean Absolute Error (MAE): ₹1,788.83
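The three metrics in the comparison table can be computed directly from their definitions. This standard-library sketch uses the same formulas as scikit-learn's `mean_squared_error`, `r2_score` and `mean_absolute_error` for a single target; the function name `regression_metrics` and the sample prices are hypothetical.

```python
from statistics import fmean


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Return MSE, R² and MAE for paired true and predicted prices."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = fmean(e * e for e in errors)          # mean squared error, in ₹²
    mae = fmean(abs(e) for e in errors)         # mean absolute error, in ₹
    mean_true = fmean(y_true)
    ss_res = sum(e * e for e in errors)         # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true prices are identical")
    r2 = 1 - ss_res / ss_tot                    # share of price variance explained
    return {"MSE": mse, "R2": r2, "MAE": mae}


metrics = regression_metrics([4000, 6000, 30000], [4500, 5500, 29000])
print(metrics)
```

Because MSE is in squared rupees, its square root is easier to read: for CatBoost's reported MSE of 9.86e+06 the RMSE is roughly ₹3,140, on the same scale as its MAE of ₹1,788.83. RMSE exceeding MAE reflects that some errors are noticeably larger than the typical one.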
🧙🏻‍♂️ Prediction Examples
| Route | Airline | Class | Predicted Price |
|---|---|---|---|
| Delhi_Mumbai | Vistara | Business | ₹30,558.62 |
| Mumbai_Kolkata | Air India | Economy | ₹4,464.27 |
| Kolkata_Delhi | SpiceJet | Economy | ₹5,988.87 |
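When a model is trained on one-hot encoded features (an assumption here; CatBoost can also take categorical columns directly), a small batch of new flights like the ones in this table produces only some of the dummy columns. A common fix is to reindex the new rows to the training column layout, as in this pandas sketch; `TRAIN_COLUMNS` and `encode_new_flights` are hypothetical names and the flight values are illustrative.

```python
import pandas as pd

# Hypothetical feature layout captured at training time (illustrative subset).
TRAIN_COLUMNS = [
    "stops", "duration_min", "is_business",
    "airline_Air India", "airline_SpiceJet", "airline_Vistara",
    "route_Delhi_Mumbai", "route_Kolkata_Delhi", "route_Mumbai_Kolkata",
]


def encode_new_flights(new: pd.DataFrame, train_columns: list[str]) -> pd.DataFrame:
    """One-hot encode unseen flights and align them to the training column layout."""
    encoded = pd.get_dummies(new, columns=["airline", "route"], dtype=int)
    # Missing dummy columns are added as zeros; columns unknown to the model are dropped.
    return encoded.reindex(columns=train_columns, fill_value=0)


# One new flight with illustrative values.
new_flights = pd.DataFrame({
    "airline": ["Vistara"],
    "route": ["Delhi_Mumbai"],
    "stops": [0],
    "duration_min": [130],
    "is_business": [1],
})
X_new = encode_new_flights(new_flights, TRAIN_COLUMNS)
print(X_new)
# With a fitted model (not shown here), prices would come from model.predict(X_new).
```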
💵 Reasonableness Check of Predicted Prices
🛫 Validating flight: Delhi_Mumbai - Vistara - Business - Morning - ₹30,558.62
- Historical minimum price: ₹7,153.00
- Historical maximum price: ₹90,281.00
- Historical mean price: ₹48,902.89
🛫 Validating flight: Mumbai_Kolkata - Air India - Economy - Afternoon - ₹4,464.27
- Historical minimum price: ₹4,457.00
- Historical maximum price: ₹21,273.00
- Historical mean price: ₹7,573.19
🛫 Validating flight: Kolkata_Delhi - SpiceJet - Economy - Night - ₹5,988.87
- Historical minimum price: ₹3,999.00
- Historical maximum price: ₹22,327.00
- Historical mean price: ₹6,086.34
➡️ All three predicted prices are considered reasonable: each falls between the historical minimum and maximum for similar flights and below the historical mean, which is consistent with the characteristics of these flights. ✅
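The reasonableness check can be expressed as two small functions: one summarizing the prices of matching historical flights, and one testing a prediction against that summary. This is a sketch assuming a pandas DataFrame with a `price` column and hypothetical filter columns (`route`, `airline`, `class`); the function names and the toy history are illustrative, not the project's code or data.

```python
import pandas as pd


def historical_summary(history: pd.DataFrame, filters: dict[str, str]) -> dict[str, float]:
    """Min, max and mean price of past flights whose columns match every filter value."""
    mask = pd.Series(True, index=history.index)
    for column, value in filters.items():
        mask &= history[column] == value
    prices = history.loc[mask, "price"]
    if prices.empty:
        raise ValueError(f"no historical flights match {filters}")
    return {"min": float(prices.min()), "max": float(prices.max()), "mean": float(prices.mean())}


def check_prediction(predicted: float, summary: dict[str, float]) -> dict[str, bool]:
    """Apply the two checks used above: inside the historical range, below the mean."""
    return {
        "in_range": summary["min"] <= predicted <= summary["max"],
        "below_mean": predicted < summary["mean"],
    }


# Toy history (illustrative only, not the project's data).
history = pd.DataFrame({
    "route": ["Delhi_Mumbai"] * 3 + ["Kolkata_Delhi"],
    "airline": ["Vistara"] * 3 + ["SpiceJet"],
    "class": ["Business"] * 3 + ["Economy"],
    "price": [20000.0, 45000.0, 70000.0, 6000.0],
})
summary = historical_summary(
    history, {"route": "Delhi_Mumbai", "airline": "Vistara", "class": "Business"}
)
print(summary, check_prediction(30558.62, summary))
```

Passing the Delhi_Mumbai figures reported above (minimum ₹7,153.00, maximum ₹90,281.00, mean ₹48,902.89) to `check_prediction` with the predicted ₹30,558.62 gives `True` for both checks.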