Miniproject 1

Introduction

In this project I was assigned to create a model that predicts the fuel consumption of a cruise ship based on data from several sensors. The data were provided as CSV files of raw sensor outputs. The units of fuel consumption are left open.

Data preprocessing

Each CSV file contains two columns: time and value. Time is stored in .NET DateTime.Ticks format and needs to be converted to a human-readable format so that the data from all sensors can be aggregated properly.
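The tick conversion described above can be sketched with the standard library alone. A .NET tick is a 100-nanosecond interval counted from 0001-01-01 00:00:00; the function name `ticks_to_datetime` is illustrative and not taken from the project notebooks, and time zone handling is left out on purpose.

```python
from datetime import datetime, timedelta

# .NET DateTime.Ticks counts 100-nanosecond intervals since 0001-01-01 00:00:00.
TICKS_PER_MICROSECOND = 10
DOTNET_EPOCH = datetime(1, 1, 1)


def ticks_to_datetime(ticks: int) -> datetime:
    """Convert .NET ticks to a naive datetime (sub-microsecond precision is dropped)."""
    return DOTNET_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


# 621355968000000000 ticks corresponds to the Unix epoch.
print(ticks_to_datetime(621355968000000000))  # 1970-01-01 00:00:00
```

Integer division keeps the arithmetic exact, which matters because tick values are too large to be represented exactly as floats.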

Defining the target variable

To determine the optimal time window for predicting fuel consumption, I aggregated the records by day and counted them. That gave me 54 days with records from all sensors. The minimum number of records per day for a single sensor was 5894. I chose a prediction window of 1 hour. I will consider the fuel density and fuel volumetric flow to be constant.
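The per-day record count described above could look roughly like the sketch below. It assumes each sensor has already been loaded as a list of datetime objects; the sensor names and the helper names `records_per_day` and `days_covered_by_all` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def records_per_day(timestamps):
    """Count how many records fall on each calendar day."""
    return Counter(ts.date() for ts in timestamps)


def days_covered_by_all(sensor_timestamps):
    """Map each day present in every sensor to its smallest per-sensor record count."""
    per_sensor = [records_per_day(ts) for ts in sensor_timestamps.values()]
    common_days = set.intersection(*(set(counts) for counts in per_sensor))
    return {day: min(counts[day] for counts in per_sensor) for day in sorted(common_days)}


# Toy data with hypothetical sensor names.
sensors = {
    "fuel_flow": [datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 1), datetime(2020, 1, 2, 0)],
    "speed": [datetime(2020, 1, 1, 0), datetime(2020, 1, 3, 0)],
}
coverage = days_covered_by_all(sensors)
print(coverage)  # {datetime.date(2020, 1, 1): 1}
```

The number of keys in the result is the number of days covered by all sensors, and the smallest value is the minimum number of records per day for any one sensor.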

Feature extraction

Most features are represented as mean values over the one-hour time window. For GPS data a mean would not make much sense, so the GPS data are represented by the minimum and maximum values in each interval. The GPS values also need to be converted to numeric values. Signs are added according to the following table.

Direction	Sign
North	+
South	-
East	+
West	-
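A minimal sketch of the sign convention in the table, followed by the minimum and maximum summary per window. The raw GPS format is not described here, so the sketch assumes each reading is available as an unsigned magnitude in decimal degrees plus a direction name; `signed_coordinate` and `window_extremes` are illustrative names.

```python
# Sign per direction, taken from the table above.
DIRECTION_SIGN = {"North": 1.0, "South": -1.0, "East": 1.0, "West": -1.0}


def signed_coordinate(magnitude: float, direction: str) -> float:
    """Apply the hemisphere sign to an unsigned coordinate magnitude."""
    return DIRECTION_SIGN[direction] * abs(magnitude)


def window_extremes(values):
    """Summarise one time window of a GPS coordinate by its minimum and maximum."""
    return min(values), max(values)


# Assumed input format: (magnitude in decimal degrees, direction name).
readings = [(12.5, "South"), (12.1, "South"), (11.9, "South")]
latitudes = [signed_coordinate(m, d) for m, d in readings]
print(window_extremes(latitudes))  # (-12.5, -11.9)
```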

Joining the fuel consumption table with the sensor data gives a final dataset with 682 data points.
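The hourly aggregation and the join could be sketched as follows. This assumes an inner join, meaning an hour is kept only if the target and every feature have a value for it; the join type actually used is not stated. The helper names and the toy values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_mean(records):
    """Average (timestamp, value) records within each one-hour window."""
    buckets = defaultdict(list)
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}


def inner_join(target, features):
    """Keep only the hours present in the target and in every feature table."""
    hours = set(target)
    for table in features.values():
        hours &= set(table)
    return [
        {"hour": h, "target": target[h], **{name: t[h] for name, t in features.items()}}
        for h in sorted(hours)
    ]


# Toy records; the second hour has no speed reading, so the join drops it.
fuel = hourly_mean([
    (datetime(2020, 1, 1, 0, 10), 2.0),
    (datetime(2020, 1, 1, 0, 50), 4.0),
    (datetime(2020, 1, 1, 1, 5), 5.0),
])
speed = hourly_mean([(datetime(2020, 1, 1, 0, 30), 10.0)])
rows = inner_join(fuel, {"speed": speed})
print(rows)
```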

Preparing data for training models

I decided not to shuffle the data before splitting. Keeping the subsets chronological keeps the validation and testing metrics more relevant to use on new data. The training subset consists of 477 data points. The validation and testing subsets consist of 102 and 103 data points respectively.
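A chronological split can be sketched as plain slicing of the time-ordered rows. The 70/15/15 proportions below are an assumption; they are not stated in the text, although with integer rounding they reproduce the reported subset sizes for 682 rows.

```python
def chronological_split(rows, train_pct=70, val_pct=15):
    """Split time-ordered rows into train/validation/test without shuffling."""
    n = len(rows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # the remainder, i.e. the most recent rows
    return train, val, test


rows = list(range(682))  # stand-in for 682 hourly data points, oldest first
train, val, test = chronological_split(rows)
print(len(train), len(val), len(test))  # 477 102 103
```

Because the test subset holds the most recent rows, its metrics approximate how the model would behave on data recorded after training.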

Standardization

Since the first model to train is ridge regression, I use feature standardization. I experimented with both "MinMax" normalization and standardization, each benefiting a different model.
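The two scaling options can be sketched for a single feature column as below. Fitting the scaling parameters on the training subset only and reusing them for validation and test data is an assumption about good practice, not something the text confirms; all function names are illustrative.

```python
from statistics import mean, pstdev


def fit_standardizer(column):
    """Learn mean and population standard deviation from the training column."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        sigma = 1.0  # constant column: avoid division by zero
    return mu, sigma


def standardize(column, mu, sigma):
    """Shift and scale so that the training column has zero mean and unit variance."""
    return [(x - mu) / sigma for x in column]


def fit_minmax(column):
    """Learn the minimum and the range from the training column."""
    lo, hi = min(column), max(column)
    span = hi - lo if hi > lo else 1.0
    return lo, span


def minmax_scale(column, lo, span):
    """Map the training range onto [0, 1]; later data may fall outside it."""
    return [(x - lo) / span for x in column]


train_col, val_col = [1.0, 2.0, 3.0], [4.0]
mu, sigma = fit_standardizer(train_col)
print(standardize(val_col, mu, sigma))
lo, span = fit_minmax(train_col)
print(minmax_scale(val_col, lo, span))  # [1.5]
```

The validation value 4.0 maps to 1.5 under min-max scaling, which shows that values outside the training range are not clipped to [0, 1].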

Model selection and training

I decided to use RMSE as the evaluation metric. The R2 score was also used for its interpretability.
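For reference, both metrics can be written in a few lines using their standard definitions. The toy values are made up for illustration.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the units of the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred))      # 0.5
print(r2_score(y_true, y_pred))  # 0.8
```

RMSE depends on the scale of the target, while R2 is scale-free: 1 means a perfect fit and 0 means no better than always predicting the mean.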

Ridge regression

Ridge regression was chosen as the baseline model, since a linear model is the simplest one. On top of that, ridge regression adds a regularization parameter. The best validation RMSE and R2 were 0.3257 and 0.7459 respectively. This result was obtained with a regularization parameter alpha of 15.3878.
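To illustrate how alpha can be chosen by validation RMSE, here is a deliberately simplified sketch with a single feature, where the ridge solution has a closed form. The project itself uses many features and most likely a library implementation; the candidate alphas and the toy data below are invented for the example.

```python
from math import sqrt


def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge fit for one feature; the intercept is not penalised."""
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    w = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    return w, y_mean - w * x_mean


def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_alpha(x_train, y_train, x_val, y_val, alphas):
    """Return (validation RMSE, alpha) for the candidate with the lowest validation RMSE."""
    scored = []
    for alpha in alphas:
        w, b = fit_ridge_1d(x_train, y_train, alpha)
        scored.append((rmse(y_val, [w * v + b for v in x_val]), alpha))
    return min(scored)


x_train, y_train = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8]
x_val, y_val = [4.0, 5.0], [3.6, 4.4]
best_rmse, best_alpha = select_alpha(x_train, y_train, x_val, y_val, [0.0, 0.1, 1.0, 10.0])
print(best_alpha, round(best_rmse, 4))
```

With alpha set to 0 the fit reduces to ordinary least squares, so the search compares the unregularized model against increasingly shrunk ones.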

SVM

SVM was chosen as the second model for its capability to explore non-linear relations. The best validation RMSE was 0.1261 and the validation R2 was 0.9617. This result was obtained with a regularization parameter C of 9.7667.

This model was the better of the two, so it was evaluated on the test dataset. The obtained RMSE was 0.1613 and R2 was 0.9080.

Conclusion

The best model was the SVM regression model. An R2 score of 0.9080 is quite good, although it can be expected that model performance will drop on newer data. Possible options to further improve model performance are dimensionality reduction and adding the distance between the starting and ending GPS coordinates in each time window. The dimensionality reduction could be done using PCA, as the features were not filtered and some of them are almost certainly correlated (e.g. trackDegreeMagnetic and trackDegreeTrue).
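One possible way to compute the proposed distance feature is the haversine great-circle formula, sketched below. It assumes the coordinates are already signed decimal degrees and uses a mean Earth radius of 6371 km; neither the formula choice nor the function name comes from the project.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in signed decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```

Applied to the first and last GPS fix of each one-hour window, this would give a straight-line displacement, which underestimates the distance sailed whenever the ship changes course within the hour.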

Ton201/ML_Miniproject1
