Skip to content

[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

@AtanasGruev

Description

@AtanasGruev

Describe the bug
Simulating live model deployment of the standard multivariate model DefaultDetector (i.e. DetectorEnsemble of VAE and RRCF) by means of the TSADEvaluator leads to periodic re-training. Initially, TSADEvaluator's default_retrain_kwargs() method ensures that train_config for training the DetectorEnsemble is an instance of DetectorEnsembleTrainConfig. However, after passing down re-training from DetectorEnsemble to the individual models, no care is taken to ensure that the train_config for the DefaultDetector will be an instance of the same class. Instead, train_config for training the DetectorEnsemble that is the DefaultDetector is of type dict which leads to the bug reported.

Most likely this is related to TSADEvaluator mismatch between full_train_kwargs and full_retrain_kwargs, as the lines

full_train_kwargs = self.default_train_kwargs()
full_train_kwargs.update(train_kwargs)

do not ensure that
train_result = self._train_model(train_vals, **full_train_kwargs)
utilizes the correct train_config.

To Reproduce
Bug has been identified by going over the tutorial on "Multivariate Time Series Anomaly Detection" for Merlion v2.0.2, section "Model Inference and Quantitative Evaluation" (see https://opensource.salesforce.com/Merlion/v2.0.2/tutorials/anomaly/2_AnomalyMultivariate.html#Model-Inference-and-Quantitative-Evaluation). When performing "Sliding Window Evaluation" with TSADEvaluator, the ensemble fails at re-training the DefaultDetector model due to the bug reported.

Expected behavior
Successful re-training of the DefaultDetector model as part of DetectorEnsemble models when using TSADEvaluator.

Screenshots
A screenshot of the resulting error stack trace is attached.
DetectorEnsemble Sliding Window Evaluation

Desktop

  • OS: Ubuntu 24.04 LTS
  • Merlion Version: 2.0.2
  • Python Version: 3.9.18
  • openjdk-11-jdk installed as per docs.

Additional context
At the re-train trigger

if t >= t_next and not cur_train.is_empty() and not cur_test.is_empty():
the following call sequence occurs:

  1. merlion.evaluate.anomaly.TSADEvaluator's get_predict() invokes merlion.evaluate.base.EvaluatorBase's get_predict(). The latter contains the re-training logic.
  2. When re-training is initiated, self.model is an instance of merlion.models.ensemble.anomaly.DetectorEnsemble. Consequently, EvaluatorBase's _train_model() invokes DetectorEnsemble's train().
  3. merlion.models.ensemble.anomaly.DetectorEnsemble inherits from merlion.models.ensemble.base.EnsembleBase and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, DetectorEnsemble's train() actually calls DetectorBase's train().
  4. Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
  5. After executing
    train_cfgs = train_config.per_model_train_configs
    train_cfgs becomes List[dict].
  6. TSADEvaluator's get_predict() is invoked at the first iteration of
    for i, (model, cfg, pr_cfg) in enumerate(zip(self.models, train_cfgs, pr_cfgs)):
    try:
    train_kwargs = dict(train_config=cfg, anomaly_labels=anomaly_labels, post_rule_train_config=pr_cfg)
    train_scores, valid_scores = TSADEvaluator(model=model, config=eval_cfg).get_predict(
    train_vals=train, test_vals=valid, train_kwargs=train_kwargs, post_process=True
    )
    which is responsible for re-training the first ensemble model which is an instance of merlion.models.defaults.DefaultDetector. At this moment, train_kwargs['train_config'] is of type dict. Effectively, TSADEvaluator's get_predict() invokes EvaluatorBase's get_predict().
  7. EvaluatorBases get_predict() invokes EvaluatorBase's _train_model(). The latter invokes merlion.models.defaults.DefaultDetector's train(). At this moment, train_config is of type dict.
  8. self.model is set to be a DetectorEnsemble of VAE and RRCF, and DefaultDetector's train() invokes LayeredDetector's train(). merlion.models.layers.LayeredDetector inherits from merlion.models.layers.LayeredModel and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, LayeredDetector's train() invokes DetectorBase's train(). At this moment, train_config is of type dict.
  9. Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
  10. train_config is required to be an instance of DetectorEnsembleTrainConfig. We see that this is not the case. The error occurs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions