Description
Describe the bug
Simulating live deployment of the standard multivariate model `DefaultDetector` (i.e. a `DetectorEnsemble` of VAE and RRCF) with `TSADEvaluator` triggers periodic re-training. Initially, `TSADEvaluator`'s `default_retrain_kwargs()` method ensures that the `train_config` used to train the `DetectorEnsemble` is an instance of `DetectorEnsembleTrainConfig`. However, once re-training is passed down from the `DetectorEnsemble` to the individual models, nothing ensures that the `train_config` for the `DefaultDetector` is an instance of the same class. Instead, the `train_config` used to train the `DetectorEnsemble` inside the `DefaultDetector` is a plain `dict`, which causes the reported bug.
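A stdlib-only sketch of the failure mode: code that reads `train_config.per_model_train_configs` works on a config object but not on a `dict` holding the same settings. `MockEnsembleTrainConfig` and `mock_ensemble_train` are hypothetical stand-ins, not Merlion classes; in this mock the failure surfaces as an `AttributeError`, and the exception in the attached stack trace may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MockEnsembleTrainConfig:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig (not Merlion's class)."""
    per_model_train_configs: Optional[List[dict]] = None


def mock_ensemble_train(train_config):
    """Mimics the attribute access the ensemble performs on its train config."""
    # Attribute access works on the config object, but not on a plain dict.
    return train_config.per_model_train_configs


# Outer ensemble: the evaluator supplies a typed config, so this succeeds.
print(mock_ensemble_train(MockEnsembleTrainConfig(per_model_train_configs=[{}, {}])))  # [{}, {}]

# Nested DefaultDetector: the config arrives as a dict, so this fails.
try:
    mock_ensemble_train({})
except AttributeError as err:
    print(f"AttributeError: {err}")  # AttributeError: 'dict' object has no attribute 'per_model_train_configs'
```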
Most likely this is related to a `TSADEvaluator` mismatch between `full_train_kwargs` and `full_retrain_kwargs`, as lines 191 to 192 of `merlion/evaluate/base.py` (commit 085ef8a)

```python
full_train_kwargs = self.default_train_kwargs()
full_train_kwargs.update(train_kwargs)
```

do not ensure that the call at line 202 of the same file

```python
train_result = self._train_model(train_vals, **full_train_kwargs)
```

receives a correctly typed `train_config`.
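A stdlib-only sketch of why a merge like the one at lines 191 to 192 cannot restore the type: `dict.update()` keeps whatever value the caller passed, whatever its type. `MockEnsembleTrainConfig` and `default_train_kwargs()` below are hypothetical stand-ins, and the assumption that the defaults hold a typed config is for illustration only; if they hold none, the caller's `dict` simply passes through unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MockEnsembleTrainConfig:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig (not Merlion's class)."""
    per_model_train_configs: Optional[List[dict]] = None


def default_train_kwargs():
    # Assumption for illustration: the defaults carry a correctly typed config.
    return {"train_config": MockEnsembleTrainConfig()}


# What the nested call passes in: the per-model config is a plain dict.
train_kwargs = {"train_config": {}}

# Same pattern as lines 191-192: the caller's value wins, type and all.
full_train_kwargs = default_train_kwargs()
full_train_kwargs.update(train_kwargs)

print(type(full_train_kwargs["train_config"]).__name__)  # dict
```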
To Reproduce
The bug was identified while working through the tutorial "Multivariate Time Series Anomaly Detection" for Merlion v2.0.2, section "Model Inference and Quantitative Evaluation" (see https://opensource.salesforce.com/Merlion/v2.0.2/tutorials/anomaly/2_AnomalyMultivariate.html#Model-Inference-and-Quantitative-Evaluation). When performing "Sliding Window Evaluation" with `TSADEvaluator`, the ensemble fails while re-training the `DefaultDetector` model because of the reported bug.
Expected behavior
The `DefaultDetector` model is re-trained successfully as a member of a `DetectorEnsemble` when using `TSADEvaluator`.
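One hypothetical way to reach this behavior is to normalize `train_config` right before the ensemble reads its attributes. The sketch below is not Merlion's code or an upstream fix; `coerce_train_config` and `MockEnsembleTrainConfig` are illustrative names, and the mock only models the `per_model_train_configs` field mentioned in this report.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class MockEnsembleTrainConfig:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig (not Merlion's class)."""
    per_model_train_configs: Optional[List[dict]] = None


def coerce_train_config(
    cfg: Union[None, dict, MockEnsembleTrainConfig],
) -> MockEnsembleTrainConfig:
    """Hypothetical helper: turn None or a dict into a typed config object."""
    if cfg is None:
        return MockEnsembleTrainConfig()
    if isinstance(cfg, dict):
        # Unknown keys raise TypeError here instead of being silently dropped.
        return MockEnsembleTrainConfig(**cfg)
    # Already a config object: pass it through untouched.
    return cfg


cfg = coerce_train_config({"per_model_train_configs": [{}, {}]})
print(cfg.per_model_train_configs)  # [{}, {}]
```

Passing the `dict` straight to the constructor means a misspelled or unsupported key fails loudly, which keeps a misconfigured re-training run visible rather than quietly falling back to defaults.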
Screenshots
A screenshot of the resulting error stack trace is attached.

Desktop
- OS: Ubuntu 24.04 LTS
- Merlion Version: 2.0.2
- Python Version: 3.9.18
- openjdk-11-jdk installed as per docs.
Additional context
At the re-train trigger in `merlion/evaluate/base.py`, line 230 (commit 085ef8a),

```python
if t >= t_next and not cur_train.is_empty() and not cur_test.is_empty():
```

the call chain is as follows:

- `merlion.evaluate.anomaly.TSADEvaluator`'s `get_predict()` invokes `merlion.evaluate.base.EvaluatorBase`'s `get_predict()`. The latter contains the re-training logic.
- When re-training is initiated, `self.model` is an instance of `merlion.models.ensemble.anomaly.DetectorEnsemble`. Consequently, `EvaluatorBase`'s `_train_model()` invokes `DetectorEnsemble`'s `train()`.
- `merlion.models.ensemble.anomaly.DetectorEnsemble` inherits from `merlion.models.ensemble.base.EnsembleBase` and `merlion.models.anomaly.base.DetectorBase`. Only the latter has a `train()` method. Therefore, `DetectorEnsemble`'s `train()` actually calls `DetectorBase`'s `train()`.
- Using `call_with_accepted_kwargs`, `DetectorBase`'s `train()` invokes `DetectorEnsemble`'s `_train()`.
- After executing `merlion/models/ensemble/anomaly.py`, line 139 (commit 085ef8a),

  ```python
  train_cfgs = train_config.per_model_train_configs
  ```

  `train_cfgs` becomes `List[dict]`.
- `TSADEvaluator`'s `get_predict()` is invoked in the first iteration of the following loop (`merlion/models/ensemble/anomaly.py`, lines 159 to 164, commit 085ef8a), which re-trains the first ensemble model, an instance of `merlion.models.defaults.DefaultDetector`:

  ```python
  for i, (model, cfg, pr_cfg) in enumerate(zip(self.models, train_cfgs, pr_cfgs)):
      try:
          train_kwargs = dict(train_config=cfg, anomaly_labels=anomaly_labels, post_rule_train_config=pr_cfg)
          train_scores, valid_scores = TSADEvaluator(model=model, config=eval_cfg).get_predict(
              train_vals=train, test_vals=valid, train_kwargs=train_kwargs, post_process=True
          )
  ```

  At this moment, `train_kwargs['train_config']` is of type `dict`.
- Effectively, `TSADEvaluator`'s `get_predict()` invokes `EvaluatorBase`'s `get_predict()`.
- `EvaluatorBase`'s `get_predict()` invokes `EvaluatorBase`'s `_train_model()`. The latter invokes `merlion.models.defaults.DefaultDetector`'s `train()`. At this moment, `train_config` is of type `dict`.
- Here `self.model` is a `DetectorEnsemble` of VAE and RRCF, and `DefaultDetector`'s `train()` invokes `LayeredDetector`'s `train()`.
- `merlion.models.layers.LayeredDetector` inherits from `merlion.models.layers.LayeredModel` and `merlion.models.anomaly.base.DetectorBase`. Only the latter has a `train()` method. Therefore, `LayeredDetector`'s `train()` invokes `DetectorBase`'s `train()`. At this moment, `train_config` is of type `dict`.
- Using `call_with_accepted_kwargs`, `DetectorBase`'s `train()` invokes `DetectorEnsemble`'s `_train()`, which requires `train_config` to be an instance of `DetectorEnsembleTrainConfig`. This is not the case, so the error occurs.
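The last steps of this chain can be reproduced in miniature with stdlib Python. Every class and function below is a hypothetical mock named after the Merlion component it imitates; `call_with_accepted_kwargs_sketch` is a simplified stand-in for `call_with_accepted_kwargs`, assumed here to forward only the keyword arguments the target's signature accepts. The sketch shows two mechanisms from the list above: method resolution falling through to the base class's `train()`, and a `dict` reaching a `_train()` that requires a config object. The `LayeredDetector` hop is left out, since per the steps above it passes the same `dict` along.

```python
import inspect


def call_with_accepted_kwargs_sketch(fn, **kwargs):
    """Simplified stand-in: forward only the kwargs that fn's signature accepts."""
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(**kwargs)
    return fn(**{k: v for k, v in kwargs.items() if k in params})


class MockEnsembleTrainConfig:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig."""

    def __init__(self, per_model_train_configs=None):
        self.per_model_train_configs = per_model_train_configs


class MockEnsembleBase:
    """Mock base that defines no train() of its own."""


class MockDetectorBase:
    def train(self, train_data, train_config=None, anomaly_labels=None):
        # The only train() in the hierarchy; it delegates to the subclass's _train().
        return call_with_accepted_kwargs_sketch(
            self._train,
            train_data=train_data,
            train_config=train_config,
            anomaly_labels=anomaly_labels,
        )


class MockDetectorEnsemble(MockEnsembleBase, MockDetectorBase):
    def _train(self, train_data, train_config=None):
        # Mirrors the type requirement described in this report.
        if not isinstance(train_config, MockEnsembleTrainConfig):
            raise TypeError(f"expected MockEnsembleTrainConfig, got {type(train_config).__name__}")
        return "trained"


# MRO lookup: train() is found on MockDetectorBase, not on the ensemble classes.
print(MockDetectorEnsemble.train is MockDetectorBase.train)  # True
print(MockDetectorEnsemble().train([], train_config=MockEnsembleTrainConfig()))  # trained
try:
    MockDetectorEnsemble().train([], train_config={})
except TypeError as err:
    print(err)  # expected MockEnsembleTrainConfig, got dict
```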