flight-price-prediction
SDAIA Bootcamp project 2 - web scraping/linear regression.
This project predicts ticket prices for upcoming flights to help customers choose the best time to travel and the cheapest flight to their destination. A random forest regression model forecasts flight prices from data scraped from Kayak.
Table of Contents
- Proposal
- MVP
- Scraping
- Analysis and Results
- Presentation
- Mobile App
- Authors
Project Proposal
The project proposal can be found here.
Project MVP
The project MVP can be found here.
Scraping
The Kayak Scraper Notebook can be found here.
Here's a demo of the scraper in action (played at 2x speed):
The scraped data can be found here.
In total, the data consists of 55,363 rows and 7 columns.
Analysis and Results
The project notebook can be found here.
Selected features are:
- Source (4 Sources were selected for this project)
- Destination (4 Destinations were selected for this project)
- Total Stops
- Average Price per Airline
- Duration
- Price (Target)
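As an illustration of how a feature like "Average Price per Airline" could be derived, here is a minimal, dependency-free sketch. The row layout and the key names (`airline`, `price`, `airline_avg_price`) are assumptions made for this example and are not taken from the project notebook, which may compute the feature differently.

```python
from collections import defaultdict


def add_airline_average(rows):
    """Attach each airline's mean price to every row of that airline.

    `rows` is a list of dicts with hypothetical keys "airline" and "price".
    """
    # Accumulate [sum of prices, number of rows] per airline.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = totals[row["airline"]]
        bucket[0] += row["price"]
        bucket[1] += 1

    averages = {airline: s / n for airline, (s, n) in totals.items()}

    # Return new dicts so the input rows are left unchanged.
    return [{**row, "airline_avg_price": averages[row["airline"]]} for row in rows]


# Made-up rows for illustration only; not from the scraped data.
sample = [
    {"airline": "Airline A", "price": 100.0},
    {"airline": "Airline A", "price": 300.0},
    {"airline": "Airline B", "price": 500.0},
]
enriched = add_airline_average(sample)
```

Because this feature is built from the target column, one common precaution is to compute the averages on the training split only and then map them onto the test split, so that test prices do not leak into the features.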
Correlation of features:
Experimenting with different models:
The final selected model is the random forest regression model with:
| Metric | Score |
| --- | --- |
| MAE | 61.87 |
| MSE | 40409.87 |
| RMSE | 201.02 |
Therefore, on average, the final model's predicted ticket prices are off by about $61.87 (the mean absolute error).
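For reference, the three reported metrics can be computed from paired actual and predicted prices as in the sketch below. The function name `regression_metrics` and the sample prices are hypothetical and only illustrate the definitions; the project notebook may use a library implementation instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, and RMSE for paired actual/predicted prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Made-up prices for illustration only; not from the scraped data.
actual = [300.0, 450.0, 1200.0, 520.0]
predicted = [320.0, 430.0, 900.0, 500.0]
metrics = regression_metrics(actual, predicted)
```

In this toy example a single large miss (1200 vs. 900) pushes the RMSE well above the MAE. A similar effect may explain why the reported RMSE (201.02) is much larger than the reported MAE (61.87): squaring the errors weights a few large misses heavily.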
The final model can be found here.
Presentation
The presentation can be found here.
Mobile App
We've also developed an Android app that shows the average estimated price for a selected route and month, based on our scraped data.
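The lookup the app performs can be pictured roughly as follows. This is only a sketch under assumptions: the key names (`source`, `destination`, `month`, `price`), the placeholder city codes, and the function `average_price` are hypothetical, and the app's actual implementation is not described here.

```python
from statistics import mean


def average_price(rows, source, destination, month):
    """Mean price of all rows matching a route and month, or None if none match."""
    prices = [
        row["price"]
        for row in rows
        if row["source"] == source
        and row["destination"] == destination
        and row["month"] == month
    ]
    return mean(prices) if prices else None


# Made-up rows with placeholder city codes; not from the scraped data.
flights = [
    {"source": "AAA", "destination": "BBB", "month": 1, "price": 400.0},
    {"source": "AAA", "destination": "BBB", "month": 1, "price": 600.0},
    {"source": "AAA", "destination": "BBB", "month": 2, "price": 500.0},
    {"source": "BBB", "destination": "AAA", "month": 1, "price": 450.0},
]
january_estimate = average_price(flights, "AAA", "BBB", 1)
missing_route = average_price(flights, "AAA", "CCC", 1)
```

Returning `None` for a route and month with no data lets the caller show a "no estimate available" message instead of a misleading number.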
Below, a demo of the mobile app is shown:
Authors
- Meshal Alamr
- Norah Alkhalifah