
Long term prediction

Pony Biam! edited this page May 18, 2020 · 10 revisions

A simple long-term predictive LightGBM model can be found in this notebook. The model was trained on one year of data (2016) to predict the following year (2017).

longterm-split
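
A minimal sketch of the year-based split described above, using pandas. The column names (`timestamp`, `building_id`, `meter_reading`) and the sample rows are assumptions for illustration; the notebook may organise the data differently.

```python
import pandas as pd

# Hypothetical hourly readings spanning both years (column names are assumptions).
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-01-01 00:00", "2016-12-31 23:00",
        "2017-01-01 00:00", "2017-12-31 23:00",
    ]),
    "building_id": [0, 0, 0, 0],
    "meter_reading": [10.0, 12.0, 11.0, 13.0],
})

# Train on 2016 and evaluate on 2017. Splitting by year (no shuffling)
# keeps every training row strictly earlier than every test row.
train = readings[readings["timestamp"].dt.year == 2016]
test = readings[readings["timestamp"].dt.year == 2017]
```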

Features

Simple feature engineering was performed, guided by the exploratory data analysis (EDA) of the meter readings, which showed:

  • Healthcare, Food sales and services, and Utility usages show the highest meter reading values.
  • The hotwater meter shows the highest meter reading values.
  • Monthly behaviour (meter-reading median) shows higher readings in the warm season.
  • Hourly behaviour (meter-reading median) shows higher values from 06:00 to 19:00.
  • Weekday behaviour: readings drop during weekends.

The following sections list the features that were selected, transformed and created.

Selection

The following features were selected from each data set:

  • Building metadata
    • Building ID*
    • Site ID*
    • Primary space usage
    • Building size (sqft)
  • Weather data
    • Timestamp*
    • Site ID*
    • Air temperature
  • Meter reading data
    • Timestamp*
    • Building ID*
    • meter
    • meter reading (target)
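
The three data sets above can be combined into one table before training. The sketch below assumes that the starred columns act as join keys; this is an interpretation, not something the page states. All column names and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of the three data sets.
metadata = pd.DataFrame({
    "building_id": [1, 2],
    "site_id": ["A", "B"],
    "primaryspaceusage": ["Healthcare", "Office"],
    "sqft": [50000.0, 12000.0],
})
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 00:00"]),
    "site_id": ["A", "B"],
    "airTemperature": [3.5, 21.0],
})
meters = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00"] * 3),
    "building_id": [1, 1, 2],
    "meter": ["electricity", "hotwater", "electricity"],
    "meter_reading": [120.0, 300.0, 40.0],
})

# Assumption: building_id links readings to metadata, and
# (site_id, timestamp) links the result to the weather data.
data = (
    meters
    .merge(metadata, on="building_id", how="left")
    .merge(weather, on=["site_id", "timestamp"], how="left")
)
```

Left joins keep every meter reading even when the metadata or weather row is missing, at the cost of introducing missing values that the model then has to handle.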

Transformation

The following features were transformed:

  • primaryspaceusage categories (16) were reduced to healthcare, food sales and services, utility and other
  • meter categories (8) were preserved
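
One possible way to reduce the usage categories with pandas is sketched below. The exact label spellings in the data set are an assumption, as is the `"Other"` label.

```python
import pandas as pd

# Usages kept as their own level. The exact spellings are assumptions.
KEPT_USAGES = ["Healthcare", "Food sales and service", "Utility"]

usage = pd.Series(
    ["Healthcare", "Office", "Utility", "Education", "Food sales and service"],
    name="primaryspaceusage",
)

# Keep values found in KEPT_USAGES and replace everything else with "Other".
usage_reduced = usage.where(usage.isin(KEPT_USAGES), "Other").astype("category")
```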

Creation

The following features were created:

  • month
  • day of the week
  • hour of the day
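
These calendar features can be derived from the timestamp with the pandas `.dt` accessor. A minimal sketch follows; the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 06:00", "2016-07-17 19:00"]),
})

df["month"] = df["timestamp"].dt.month          # 1 to 12
df["dayofweek"] = df["timestamp"].dt.dayofweek  # Monday=0, Sunday=6
df["hour"] = df["timestamp"].dt.hour            # 0 to 23
```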

Final features

  • Timestamp*
  • Site ID
  • Building ID
  • Month
  • Hour
  • Day of the week
  • Usage (4 levels: healthcare, food, utility, other)
  • Building size (sqft)
  • Air temperature
  • Meter (8 levels)
  • Meter reading / target

Parameters

The parameters for this model were not systematically tuned; they were adjusted by hand to perform better than the defaults.

  • "objective": "regression"
  • "metric": "rmse"
  • "random_state": 55
  • "learning_rate": 0.01 (default 0.1)
  • "max_bin": 761 (default 255)
  • "num_leaves": 2197 (default 31)
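
As a sketch, the parameters above could be passed to LightGBM's training API as follows. The function name `train_longterm_model` and the `num_boost_round` value are hypothetical; the page does not state how many boosting rounds were used.

```python
# Values listed above; every other parameter stays at its LightGBM default.
params = {
    "objective": "regression",
    "metric": "rmse",
    "random_state": 55,
    "learning_rate": 0.01,
    "max_bin": 761,
    "num_leaves": 2197,
}


def train_longterm_model(X_train, y_train, num_boost_round=100):
    """Fit a LightGBM regressor with the parameters above.

    num_boost_round is a placeholder value, not taken from the source.
    """
    # Imported here so the params dict can be inspected without LightGBM installed.
    import lightgbm as lgb

    train_set = lgb.Dataset(X_train, label=y_train)
    return lgb.train(params, train_set, num_boost_round=num_boost_round)
```

A learning rate this low usually needs more boosting rounds to converge, and a large `num_leaves` lets each tree fit very fine-grained structure, which raises the risk of overfitting.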

Results

As expected, performance was poor for this model: R2 is negative for every meter except gas (Table 1). It can be used as a baseline for more complex models.

longterm-plot1
Figure 1: real and predicted meter_reading values from the long-term model vs. timestamp.

longterm-plot2
Figure 2: meter_reading values predicted by the long-term model vs. real values.

Meter         RMSE       RMSLE   CVRMSE     MBE         R2
all           3973.1273  3.9732  784.6818   -239.0092   -0.1916
electricity   1586.6895  3.9438  1020.5691  -993.3422   -17.64
water         1601.3708  5.3568  1105.8349  -1073.7617  -16.2551
chilledwater  6538.3303  3.5299  968.2689   -152.569    -0.0244
hotwater      2673.591   4.332   656.2291   -317.8803   -0.2952
gas           6486.7614  5.3152  380.5833   -0.7305     0.0171
steam         6032.6299  3.0625  380.4094   -14.9194    -0.3608
solar         1668.19    6.1226  5199.5686  -5195.5277  -641.3858
irrigation    1657.8213  6.6368  1020.9721  -946.6227   -6.1193

Table 1: metrics for the long-term model, calculated for all meters together and for each meter separately.
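
For reference, the sketch below computes these five metrics with NumPy using common textbook definitions. These definitions are assumptions and may differ from the ones used in the notebook: CVRMSE is taken as RMSE divided by the mean of the real values, in percent; MBE is taken as the unnormalised mean of (predicted − real); and negative predictions are clipped to zero before the logarithm in RMSLE. The function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, RMSLE, CVRMSE, MBE and R2 under the assumed definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    error = y_pred - y_true  # assumed sign convention: predicted minus real

    rmse = np.sqrt(np.mean(error ** 2))
    # Assumption: negative predictions are clipped to 0 so log1p is defined.
    log_error = np.log1p(np.clip(y_pred, 0, None)) - np.log1p(y_true)
    rmsle = np.sqrt(np.mean(log_error ** 2))
    cvrmse = 100 * rmse / np.mean(y_true)  # percent of the mean real value
    mbe = np.mean(error)
    r2 = 1 - np.sum(error ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

    return {"RMSE": rmse, "RMSLE": rmsle, "CVRMSE": cvrmse, "MBE": mbe, "R2": r2}


# Toy example: every prediction is one unit too high.
metrics = regression_metrics([2.0, 4.0], [3.0, 5.0])
```

Under this definition of R2, a negative value means the predictions fit worse than a constant equal to the mean of the real values.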
