Skip to content

zhiweit/anomaly-detection-ml

Repository files navigation

Anomaly Detection Project

Classify whether a user is anomalous or normal using supervised and unsupervised models. This is best viewed in google colab here as all the visualisations and outputs are preserved in the interactive python notebook.

ROC AUC Curve

roc-auc

Problem and Task

Given users' ratings (0-5) of items, identify whether a user is anomalous or not. The data given consists of 3 columns (user, item, rating) e.g. user 3 gives rating 5 for item 1. The dataset is imbalanced as there are few anomalous users compared to normal ones. There are three phases of the project across three weeks where data with labels for previous phase are released as well along with new test set. We are recommended to try at least two supervised methods and one unsupervised method, and be ranked the best team in terms of performance (ROC AUC) to score well for this project.

Approach

Supervised Methods

  • Logistic Regression
  • KNN Classifier
  • Random Forest
  • XGBoost Classifier
  • Neural Networks

Unsupervised Methods

  • Autoencoder
  • Isolation Forest
  • Local Outlier Factor (LOF)

Improvements

  • Feature engineering
    • IQR of user's ratings
    • no. of items rated/not rated
    • no. of items rated neutral
    • fsti: The ratio between the number of items rated by the user and the total number of items in the recommender system.
    • fsmaxrti: The ratio between the number of items rated by the user with maximum score and the total number of items in the recommender system.
    • fsminrti: The ratio between the number of items rated by the user with minimum score and the total number of items in the recommender system.
    • fspi: The ratio between the number of popular items rated by the user and the total number of popular items, K, in the recommender system.
    • fspii: The ratio between the number of popular items rated by the user and the total number of items rated by the user.
  • Models
    • Catboost
    • SVM
  • Dealing with data imbalance
    • mixup approach data augmentation for DNN
    • SMOTE
    • class weights

About

Anomaly detection using supervised and unsupervised machine learning

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published