Software used in MSc Data Science Dissertation Project: "Limitations of Spatiotemporal Bus Prediction in a Town with Infrequent Buses".
This project assumes Python 3.7 and Anaconda are already installed.
The `environment.yml` file contains the full list of installed packages.
All code requires bus data from Trapeze Group, which I am not free to share openly.
Assuming you have independent access to this data, the following sequence is recommended:
- Use `pipeline/bournemouth_input/data_reader.py` to load and parse the data and convert it into stop events.
- Use the various scripts in `pipeline/feature_engineering/*` to add features and derive new files, such as time series and correlation files. Recommended order:
  - `filter_rate_and_overtakes.py`
  - `train_validate_test.py`
  - `add_features.py`
  - `add_geo_features.py`
  - `add_prev_next.py`
  - `add_offsets.py`
  - others as needed.
- Explore and test the various methods using the Jupyter Notebooks in the `Data Exploration` folder. Some files of note:
  - Exploratory Data Analysis
    - `IEA\Stop_events EDA2.ipynb`: early investigations of stop events, mostly simple patterns.
    - `IEA\just segments.ipynb`: looking at segment-based statistics.
    - `IEA\Contour plots.ipynb`: looking at the wider context.
  - Spatiotemporal Models
    - `short term\Correlation Coefficients.ipynb`: looking at correlations for exogenous modelling.
    - `short term\Correlation Coefficients2.ipynb`: looking at correlations for exogenous modelling.
    - `short term\first predictons.ipynb`: using exogenous models.
    - `short term\brute force predictors.ipynb`: wrapper models.
    - `GPS-speed\GPS-speed.ipynb`: using GPS speed to estimate durations.
- Run `pipeline/models/Exogenous Models.py` to generate the most positive results.
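The script steps in the sequence above can be chained. Below is a minimal sketch of a hypothetical runner that is not part of the repository. It assumes each script runs as `python <script>` from the repository root with no command-line arguments and handles its own input and output files. Check that assumption against each script before relying on it. The notebook step is interactive, so the runner leaves it out.

```python
# Hypothetical runner for the recommended pipeline order (not part of the repo).
# Assumes every script runs as `python <script>` with no arguments.
import shlex
import subprocess
import sys
from pathlib import Path

FEATURE_DIR = Path("pipeline/feature_engineering")

# Order taken from the README; extra feature scripts go before the model step.
STEPS = [
    Path("pipeline/bournemouth_input/data_reader.py"),
    FEATURE_DIR / "filter_rate_and_overtakes.py",
    FEATURE_DIR / "train_validate_test.py",
    FEATURE_DIR / "add_features.py",
    FEATURE_DIR / "add_geo_features.py",
    FEATURE_DIR / "add_prev_next.py",
    FEATURE_DIR / "add_offsets.py",
    Path("pipeline/models/Exogenous Models.py"),
]


def build_commands(repo_root="."):
    """Return one [interpreter, absolute script path] command per step, in order."""
    root = Path(repo_root).resolve()
    return [[sys.executable, str(root / step)] for step in STEPS]


def run_pipeline(repo_root=".", dry_run=True):
    """Print (dry run) or execute each step from repo_root, stopping at the first failure."""
    cwd = str(Path(repo_root).resolve())
    for cmd in build_commands(repo_root):
        if dry_run:
            # shlex.quote keeps paths with spaces, e.g. "Exogenous Models.py", readable
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            # check=True raises CalledProcessError if a script exits non-zero
            subprocess.run(cmd, check=True, cwd=cwd)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

The dry run only prints the commands. Calling `run_pipeline(dry_run=False)` would execute them in order and stop at the first script that exits with an error. Any further feature-engineering scripts ("others as needed") would be inserted into `STEPS` before the final model step.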
Licensed for non-commercial use only. See `Licence.md`.