Cab-Fare-Prediction
GitHub folder contains:
- R code of project in ‘.R format’: Cab Fare Prediction Using R.R
- Python code of project in ‘.ipynb format’: Cab Fare Prediction Using Python.ipynb
- Project report: Cab Fare Prediction.pdf
- Problem Statement.pdf
- Saved model trained on the entire training dataset in Python: cab_fare_xgboost_model.rar
- Saved model trained on the entire training dataset in R: final_Xgboost_model_using_R.rar
- Predictions on the test dataset in CSV format: predictions_xgboost.csv
Problem Statement
The objective of this project is to predict the cab fare amount from the following attributes in the dataset:
pickup_datetime - timestamp value indicating when the cab ride started.
pickup_longitude - float for longitude coordinate of where the cab ride started.
pickup_latitude - float for latitude coordinate of where the cab ride started.
dropoff_longitude - float for longitude coordinate of where the cab ride ended.
dropoff_latitude - float for latitude coordinate of where the cab ride ended.
passenger_count - an integer indicating the number of passengers in the cab ride.
This is a regression problem, since the fare amount to be predicted is a continuous value.
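To make the attribute types concrete, here is a minimal sketch that parses one CSV row into typed fields using only the Python standard library. The field names come from the list above. The timestamp layout ("YYYY-MM-DD HH:MM:SS UTC"), the sample row, and the names `Ride` and `parse_ride` are assumptions made for illustration and are not taken from the project code.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Ride:
    """One cab ride, with the attribute types described above (hypothetical helper)."""
    pickup_datetime: datetime
    pickup_longitude: float
    pickup_latitude: float
    dropoff_longitude: float
    dropoff_latitude: float
    passenger_count: int


def parse_ride(row: dict) -> Ride:
    """Convert one CSV row (all strings) into typed fields."""
    # Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS UTC".
    stamp = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S UTC")
    return Ride(
        pickup_datetime=stamp.replace(tzinfo=timezone.utc),
        pickup_longitude=float(row["pickup_longitude"]),
        pickup_latitude=float(row["pickup_latitude"]),
        dropoff_longitude=float(row["dropoff_longitude"]),
        dropoff_latitude=float(row["dropoff_latitude"]),
        # Go through float first so values such as "1.0" are also accepted.
        passenger_count=int(float(row["passenger_count"])),
    )


# Made-up sample row, for illustration only.
SAMPLE_CSV = (
    "pickup_datetime,pickup_longitude,pickup_latitude,"
    "dropoff_longitude,dropoff_latitude,passenger_count\n"
    "2015-01-27 13:08:24 UTC,-73.97,40.76,-73.98,40.74,1\n"
)

rides = [parse_ride(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rides[0])
```

Parsing into explicit types early makes malformed rows fail loudly, which may help with the outlier and missing value analysis steps.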
All the steps implemented in this project
- Data Pre-processing.
- Data Visualization.
- Outlier Analysis.
- Missing value Analysis.
- Feature Selection.
- Correlation analysis.
- Chi-Square test.
- Analysis of Variance (ANOVA) Test.
- Multicollinearity Test.
- Feature Scaling.
- Splitting into Train and Validation Dataset.
- Hyperparameter Optimization.
- Model Development
I. Linear Regression
II. Ridge Regression
III. Lasso Regression
IV. Decision Tree
V. Random Forest
- Improve Accuracy
a) Algorithm Tuning
b) Ensembles: XGBoost for Regression
- Finalize Model
a) Predictions on validation dataset
b) Create standalone model on entire training dataset
c) Save model for later use
- Python Code
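The following is a minimal, dependency-free sketch of the last few steps listed above: splitting into train and validation sets, fitting a model, predicting on the validation set, refitting on the entire training data, and saving the model. It is not the project's notebook code. It swaps in a one-feature least-squares linear regression in place of XGBoost so that it runs with the standard library alone, and it uses synthetic data with a hypothetical `distance` feature that is not one of the dataset's columns.

```python
import math
import pickle
import random

random.seed(0)

# Synthetic stand-in data: (trip distance, fare). NOT the project's dataset;
# "distance" is a hypothetical feature used only for illustration.
data = []
for _ in range(200):
    distance = random.uniform(0.5, 20.0)
    fare = 2.5 + 1.8 * distance + random.gauss(0.0, 0.5)
    data.append((distance, fare))

# Split into train and validation sets (80/20).
random.shuffle(data)
cut = int(0.8 * len(data))
train, valid = data[:cut], data[cut:]


def fit_line(rows):
    """Ordinary least squares for fare = intercept + slope * distance."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def predict(model, x):
    """Predicted fare for a single distance value."""
    return model["intercept"] + model["slope"] * x


def rmse(model, rows):
    """Root mean squared error of the model on the given rows."""
    total = sum((predict(model, x) - y) ** 2 for x, y in rows)
    return math.sqrt(total / len(rows))


# a) Fit on the training split and score predictions on the validation split.
model = fit_line(train)
valid_rmse = rmse(model, valid)
print(f"validation RMSE: {valid_rmse:.3f}")

# b) Refit a standalone model on the entire training dataset.
final_model = fit_line(data)

# c) Save the model for later use, then load it back.
blob = pickle.dumps(final_model)
restored = pickle.loads(blob)
```

Here the model is serialized to bytes in memory; writing it to disk with `pickle.dump` and an open file handle works the same way. With a real XGBoost model the overall flow would be similar, but the fitting and saving calls would differ.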