[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' is a DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

**Describe the bug**
Simulating live model deployment of the standard multivariate model `DefaultDetector` (i.e. a `DetectorEnsemble` of VAE and RRCF) with `TSADEvaluator` triggers periodic re-training. Initially, `TSADEvaluator`'s `default_retrain_kwargs()` method ensures that the `train_config` used to re-train the outer `DetectorEnsemble` is an instance of `DetectorEnsembleTrainConfig`. However, once re-training is passed down from the `DetectorEnsemble` to its individual models, nothing ensures that the `train_config` handed to the `DefaultDetector` is also an instance of that class. Instead, the `train_config` that reaches the `DetectorEnsemble` wrapped by the `DefaultDetector` is a plain `dict`, which causes the reported error.
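The core failure mode can be reproduced without Merlion. The sketch below uses hypothetical stand-ins (`EnsembleTrainConfigSketch`, `ensemble_train_sketch`), not Merlion's actual classes: code that reads `train_config.valid_frac` works for a config object but raises `AttributeError` when a plain `dict` carrying the same key is passed instead.

```python
from dataclasses import dataclass


@dataclass
class EnsembleTrainConfigSketch:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig (only valid_frac modeled)."""
    valid_frac: float


def ensemble_train_sketch(train_config):
    """Hypothetical stand-in for an ensemble _train() that uses attribute access."""
    return train_config.valid_frac


print(ensemble_train_sketch(EnsembleTrainConfigSketch(valid_frac=0.2)))  # 0.2

try:
    # A dict has the key "valid_frac" but not the attribute valid_frac.
    ensemble_train_sketch({"valid_frac": 0.2})
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'valid_frac'
```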

Most likely this is related to a mismatch in `TSADEvaluator` between `full_train_kwargs` and `full_retrain_kwargs`: the lines https://github.com/salesforce/Merlion/blob/085ef8a69e5dcdfb9dcaa394cc21e087cccbb8f0/merlion/evaluate/base.py#L191-L192
do not ensure that https://github.com/salesforce/Merlion/blob/085ef8a69e5dcdfb9dcaa394cc21e087cccbb8f0/merlion/evaluate/base.py#L202 uses the correct `train_config`.
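The merge at base.py#L191-L192 follows the pattern "take the defaults, then `dict.update` with the caller's kwargs", and `dict.update` does no type checking: afterwards, `train_config` is whatever object the caller supplied, even if a default had been a typed config. A minimal sketch with hypothetical stand-ins (`EnsembleTrainConfigSketch`, `default_kwargs_sketch`), not Merlion's actual defaults:

```python
from dataclasses import dataclass


@dataclass
class EnsembleTrainConfigSketch:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig."""
    valid_frac: float = 0.0


def default_kwargs_sketch():
    """Hypothetical defaults holding a typed config object."""
    return {"train_config": EnsembleTrainConfigSketch(valid_frac=0.0)}


# Caller-supplied kwargs where train_config arrives as a plain dict.
caller_kwargs = {"train_config": {"valid_frac": 0.0}}

# Same merge pattern as: full = defaults(); full.update(caller_kwargs)
merged = default_kwargs_sketch()
merged.update(caller_kwargs)

print(type(merged["train_config"]).__name__)  # dict: the typed default was replaced
```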

**To Reproduce**
The bug was identified while working through the "Multivariate Time Series Anomaly Detection" tutorial for Merlion v2.0.2, section "Model Inference and Quantitative Evaluation" (see https://opensource.salesforce.com/Merlion/v2.0.2/tutorials/anomaly/2_AnomalyMultivariate.html#Model-Inference-and-Quantitative-Evaluation). When performing the "Sliding Window Evaluation" with `TSADEvaluator`, the ensemble fails to re-train the `DefaultDetector` model because of this bug.

**Expected behavior**
Successful re-training of the `DefaultDetector` model as a member of a `DetectorEnsemble` when using `TSADEvaluator`.
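One possible workaround, sketched with hypothetical stand-ins rather than Merlion's API and not the upstream fix: normalize `train_config` at the start of the ensemble's training routine, turning a `dict` (or `None`) into the expected config type and ignoring keys the config does not define. The field `per_model_train_configs` and all default values here are illustrative assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional


@dataclass
class EnsembleTrainConfigSketch:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig; defaults are arbitrary."""
    valid_frac: float = 0.0
    per_model_train_configs: Optional[List[Any]] = None


def coerce_train_config(train_config) -> EnsembleTrainConfigSketch:
    """Return a config object whether given an instance, a dict, or None."""
    if isinstance(train_config, EnsembleTrainConfigSketch):
        return train_config
    if train_config is None:
        return EnsembleTrainConfigSketch()
    if isinstance(train_config, dict):
        known = {f.name for f in fields(EnsembleTrainConfigSketch)}
        # Keep only keys the config defines; anything else is dropped here.
        return EnsembleTrainConfigSketch(**{k: v for k, v in train_config.items() if k in known})
    raise TypeError(f"unsupported train_config type: {type(train_config).__name__}")


cfg = coerce_train_config({"valid_frac": 0.1, "post_rule_train_config": None})
print(cfg.valid_frac)  # 0.1
```

The sketch silently drops unrecognized keys such as a `post_rule_train_config` entry; whether such keys should be dropped, routed elsewhere, or rejected is a design decision for an actual fix.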

**Screenshots**
A screenshot of the resulting error stack trace is attached. 
<img width="819" alt="DetectorEnsemble Sliding Window Evaluation" src="https://github.com/user-attachments/assets/3c85a747-3391-4540-9e2e-f8721e03d2ed">


**Desktop**
 - OS: Ubuntu 24.04 LTS
 - Merlion Version: 2.0.2
 - Python Version: 3.9.18
 - openjdk-11-jdk installed as per docs.

**Additional context**
At the re-train trigger https://github.com/salesforce/Merlion/blob/085ef8a69e5dcdfb9dcaa394cc21e087cccbb8f0/merlion/evaluate/base.py#L230, the following call sequence occurs:

1) `merlion.evaluate.anomaly.TSADEvaluator`'s `get_predict()` invokes `merlion.evaluate.base.EvaluatorBase`'s `get_predict()`. The latter contains the re-training logic. 
2) When re-training is initiated, `self.model` is an instance of `merlion.models.ensemble.anomaly.DetectorEnsemble`. Consequently, `EvaluatorBase`'s `_train_model()` invokes `DetectorEnsemble`'s `train()`.
3) `merlion.models.ensemble.anomaly.DetectorEnsemble` inherits from `merlion.models.ensemble.base.EnsembleBase` and `merlion.models.anomaly.base.DetectorBase`. Only the latter has a `train()` method. Therefore, `DetectorEnsemble`'s `train()` actually calls `DetectorBase`'s `train()`.
4) Using `call_with_accepted_kwargs`, `DetectorBase`'s `train()` invokes `DetectorEnsemble`'s `_train()`.
5) After executing https://github.com/salesforce/Merlion/blob/085ef8a69e5dcdfb9dcaa394cc21e087cccbb8f0/merlion/models/ensemble/anomaly.py#L139, `train_cfgs` becomes a `List[dict]`.
6) `TSADEvaluator`'s `get_predict()` is invoked in the first iteration of the loop at https://github.com/salesforce/Merlion/blob/085ef8a69e5dcdfb9dcaa394cc21e087cccbb8f0/merlion/models/ensemble/anomaly.py#L159-L164, which re-trains the first ensemble member, an instance of `merlion.models.defaults.DefaultDetector`. At this point, `train_kwargs['train_config']` is a `dict`. `TSADEvaluator`'s `get_predict()` in turn invokes `EvaluatorBase`'s `get_predict()`.
7) `EvaluatorBase`'s `get_predict()` invokes `EvaluatorBase`'s `_train_model()`, which invokes `merlion.models.defaults.DefaultDetector`'s `train()`. At this point, `train_config` is a `dict`.
8) `DefaultDetector`'s `train()` sets `self.model` to a `DetectorEnsemble` of VAE and RRCF and invokes `LayeredDetector`'s `train()`. `merlion.models.layers.LayeredDetector` inherits from `merlion.models.layers.LayeredModel` and `merlion.models.anomaly.base.DetectorBase`. Only the latter has a `train()` method, so `LayeredDetector`'s `train()` is `DetectorBase`'s `train()`. At this point, `train_config` is still a `dict`.
9) Using `call_with_accepted_kwargs`, `DetectorBase`'s `train()` invokes `DetectorEnsemble`'s `_train()`.
10) `DetectorEnsemble`'s `_train()` requires `train_config` to be an instance of `DetectorEnsembleTrainConfig`, but here it is a `dict`, so accessing `train_config.valid_frac` fails with `AttributeError: 'dict' object has no attribute 'valid_frac'`.
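The dispatch in the steps above can be modeled with small stand-in classes. Everything below is hypothetical: the `*Sketch` classes only mimic the roles of Merlion's classes, and `call_with_accepted_kwargs_sketch` imitates what the steps attribute to `call_with_accepted_kwargs` (forwarding only the keyword arguments the target accepts), not its real implementation. Because only the detector base defines `train()`, both the outer ensemble and the layered wrapper resolve to it, and nothing on the path converts the per-model `dict` into a config object.

```python
import inspect
from dataclasses import dataclass
from typing import Optional


def call_with_accepted_kwargs_sketch(fn, **kwargs):
    """Hypothetical helper: call fn with only the keyword arguments its signature names."""
    accepted = inspect.signature(fn).parameters
    return fn(**{k: v for k, v in kwargs.items() if k in accepted})


@dataclass
class EnsembleTrainConfigSketch:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig."""
    valid_frac: float = 0.0
    per_model_train_configs: Optional[list] = None


class DetectorBaseSketch:
    # The only class in this hierarchy that defines train().
    def train(self, train_data, train_config=None, anomaly_labels=None):
        # train_config is forwarded as-is; its type is never checked.
        return call_with_accepted_kwargs_sketch(
            self._train, train_data=train_data, train_config=train_config, anomaly_labels=anomaly_labels
        )


class EnsembleBaseSketch:
    """No train() here, so the MRO falls through to DetectorBaseSketch.train."""


class LayeredModelSketch:
    """No train() here either."""


class DetectorEnsembleSketch(EnsembleBaseSketch, DetectorBaseSketch):
    def __init__(self, models):
        self.models = models

    def _train(self, train_data, train_config):
        # Attribute access, as in the reported error: fails if train_config is a dict.
        self.valid_frac = train_config.valid_frac
        # Per-model configs are passed on unchanged, dicts included.
        cfgs = train_config.per_model_train_configs or [None] * len(self.models)
        return [m.train(train_data, train_config=cfg) for m, cfg in zip(self.models, cfgs)]


class DefaultDetectorSketch(LayeredModelSketch, DetectorBaseSketch):
    """Simplified layered wrapper around an inner ensemble."""

    def __init__(self):
        self.model = DetectorEnsembleSketch(models=[])

    def _train(self, train_data, train_config):
        # Hands whatever it received straight to the inner ensemble's _train().
        return self.model._train(train_data, train_config)


outer = DetectorEnsembleSketch(models=[DefaultDetectorSketch()])
cfg = EnsembleTrainConfigSketch(valid_frac=0.0, per_model_train_configs=[{"valid_frac": 0.0}])
print(DetectorEnsembleSketch.train is DetectorBaseSketch.train)  # True
try:
    outer.train([1.0, 2.0, 3.0], train_config=cfg)
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'valid_frac'
```

With typed per-model configs instead of dicts, the same call completes, so in this model the failure comes solely from the type of the forwarded object.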



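Excerpt from https://github.com/salesforce/Merlion/blob/085ef8a69e5dcdfb9dcaa394cc21e087cccbb8f0/merlion/models/ensemble/anomaly.py#L159-L164:

```python
for i, (model, cfg, pr_cfg) in enumerate(zip(self.models, train_cfgs, pr_cfgs)):
    try:
        # cfg is one entry of train_cfgs, which is a dict at this point (step 5)
        train_kwargs = dict(train_config=cfg, anomaly_labels=anomaly_labels, post_rule_train_config=pr_cfg)
        train_scores, valid_scores = TSADEvaluator(model=model, config=eval_cfg).get_predict(
            train_vals=train, test_vals=valid, train_kwargs=train_kwargs, post_process=True
        )
        # ... (the matching except clause lies outside the linked lines)
```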

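Excerpt from https://github.com/salesforce/Merlion/blob/085ef8a69e5dcdfb9dcaa394cc21e087cccbb8f0/merlion/evaluate/base.py#L191-L192:

```python
full_train_kwargs = self.default_train_kwargs()
full_train_kwargs.update(train_kwargs)
```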