Releases: dssg/triage
Mango Ataulfo (Patch 5)
Mango Ataulfo (Patch 4)
Improvements
- The experiment summary report now verifies whether the DB credentials are stored as environment variables or in a database.yaml file.
- The experiment_hash and run_id of an experiment are now always printed to the log file, instead of only when logging is at debug level.
- The summary of the run is added as log messages at the end of each Triage run.
- The ExperimentReport object includes default values if the user doesn’t specify them. By default, Performance is set to recall@1_pct, and Bias is set to tpr_disparity across all groups. If the experiment doesn’t contain evaluations for these defaults, Triage will automatically use one of the available metrics instead.
- The model grid preset has been updated.
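The default-and-fallback behaviour described for the ExperimentReport could be sketched roughly as follows. The helper name pick_metric and the rule of falling back to the first available metric are assumptions made for illustration; the notes above only say that one of the available metrics is used.

```python
DEFAULT_PERFORMANCE_METRIC = "recall@1_pct"
DEFAULT_BIAS_METRIC = "tpr_disparity"


def pick_metric(requested, default, available):
    """Hypothetical helper: choose which metric a report should use.

    requested: metric named by the user, or None if not specified
    default:   the report default (e.g. recall@1_pct)
    available: metrics for which the experiment has evaluations
    """
    if requested is not None:
        return requested
    if default in available:
        return default
    if not available:
        raise ValueError("the experiment has no evaluations to report on")
    # Assumption: fall back to the first available metric.
    return available[0]
```

For example, pick_metric(None, DEFAULT_PERFORMANCE_METRIC, ["precision@100_abs"]) falls back to precision@100_abs because the default was never evaluated.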
Bug Fixes
- Set the pandas display precision used in the Postmodeling experiment summary report.
- Set the pandas display precision used in visualizations, taking it from the display settings of the audit tutorial notebook.
- Fixed writing crosstabs to the DB.
Documentation
- Postmodeling tutorial notebook postmodeling_analysis_example_dojo_mh.ipynb with the current functions of the postmodeling analysis.
Mango Ataulfo (Patch 3)
This patch fixes 2 bugs found in Triage’s Colab tutorial when using Python versions ≥ 3.12.
Bug Fixes
- Arguments written in scientific notation in random.randint calls have been changed to integers, since Python versions ≥ 3.12 enforce strict integer arguments.
- Removed the plotly.express package, which wasn’t being used by any module and requires newer versions of pandas and numpy that aren’t compatible with Triage’s requirements.
- Changed the required version of the plotly package.
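To illustrate the random.randint fix: a literal in scientific notation such as 1e6 is a float, and from Python 3.12 onward random.randint raises a TypeError for float arguments. A minimal sketch of the kind of change involved is shown below; the names MAX_SEED and draw_seed are hypothetical and not taken from Triage.

```python
import random

# 1e6 is a float; converting it once keeps the call valid on Python >= 3.12.
# Writing the literal as 1_000_000 would work equally well.
MAX_SEED = int(1e6)


def draw_seed(rng):
    """Draw an integer seed in the closed interval [0, MAX_SEED]."""
    return rng.randint(0, MAX_SEED)
```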
Mango Ataulfo (Patch 2)
In version 5.5.1, we fixed a bug where files stored in /tmp during matrix creation were not being deleted after matrix creation was completed. However, Triage’s unit tests use the /tmp folder to store temporary mock matrices and objects; the new matrix generation script impacted the unit test execution. This patch refines the fix in 5.5.1 by explicitly detecting when the storage class is an S3 bucket and deleting only the files downloaded from S3, ensuring unit tests run successfully.
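The S3-only cleanup described above can be sketched as follows. The classes FileSystemStorage and S3Storage and the function cleanup_downloads are hypothetical stand-ins for illustration, not Triage's actual storage classes or cleanup code.

```python
import os


class FileSystemStorage:
    """Hypothetical stand-in for a local project storage class."""


class S3Storage:
    """Hypothetical stand-in for an S3-backed project storage class."""


def cleanup_downloads(storage, downloaded_paths):
    """Delete temporary copies only when they were downloaded from S3.

    Returns the list of paths that were actually removed.
    """
    if not isinstance(storage, S3Storage):
        # Local storage: files in /tmp may be the real artifacts
        # (as in the unit tests), so leave them untouched.
        return []
    removed = []
    for path in downloaded_paths:
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Restricting deletion to an explicit list of downloaded paths, rather than clearing a whole directory, is what keeps unrelated files in /tmp safe.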
Bug Fixes
- /tmp cleanup now occurs only if the project’s storage is S3, allowing unit tests to run without unintended file deletions.
- Added a check to detect when entities in the protected groups and cohort data frames have diverged. If detected, an explicit error message is shown.
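A check for diverging entities, like the one mentioned above, might look like the sketch below. It compares plain collections of entity ids rather than data frames, and the function name check_entities_match is an assumption made for illustration.

```python
def check_entities_match(cohort_entity_ids, protected_group_entity_ids):
    """Raise an explicit error if the two sets of entity ids differ."""
    cohort = set(cohort_entity_ids)
    protected = set(protected_group_entity_ids)
    if cohort != protected:
        only_cohort = sorted(cohort - protected)
        only_protected = sorted(protected - cohort)
        raise ValueError(
            "Entities in the cohort and protected groups have diverged: "
            f"only in cohort: {only_cohort}; "
            f"only in protected groups: {only_protected}"
        )
```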
New functionality
- Crosstabs can now also be generated directly from matrices instead of database tables (run_crosstabs_from_matrix), accommodating potential differences between the two.
Mango Ataulfo (Patch 1)
When saving Triage’s generated output in S3, we use the /tmp folder to temporarily save the CSV and gz files that Triage uses for creating feature matrices and trained models. This version fixes a bug where gz files downloaded from S3 were not deleted, and it now stores all of Triage’s generated output in /tmp/{user}.
Bug Fixes
- Fixed the non-deletion of gz files associated with feature matrices and trained models when downloaded from S3
Mango Ataulfo
This new version introduces the LinearRanker to the set of rankers available in Triage. It generates scores by computing a weighted sum of features, using a user-defined set of weights.
New functionality
- Added LinearRanker to the available options for simple rankers. This ranker allows users to define weighted features, enabling the creation of baseline models that can mimic existing solutions
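The scoring idea behind a linear ranker, a weighted sum of features with user-defined weights, can be illustrated with a few lines of plain Python. This is a sketch of the technique only, not Triage's LinearRanker class; the function name linear_scores and the feature names in the usage note are hypothetical.

```python
def linear_scores(rows, weights):
    """Score each row as the weighted sum of its features.

    rows:    list of dicts mapping feature name -> numeric value
    weights: dict mapping feature name -> user-defined weight
    Features without a weight do not contribute to the score.
    """
    return [sum(w * row[name] for name, w in weights.items()) for row in rows]
```

With weights {"prior_events": 2, "days_since_last": -1}, a row with prior_events=3 and days_since_last=0 scores 6, so entities can then be ranked by sorting on these scores.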
Bug Fixes
- Fixed the path for the add_predictions script on the CLI
- Added the jupyter package to requirements. The Experiment Summary Report task added in version 5.4.0 requires ipykernel and nbconvert, which come as part of the jupyter package
- Fixed display precision option in the experiment summary report template notebook
- Fixed mkdocs and mkdocstrings to generate Triage documentation
Documentation
- Updated the Postmodeling section of the documentation on GitHub Pages
Revolution Noodle
This new version separates the Postmodeling analysis into two phases. The first phase generates an experiment summary report that allows the user to do a general sanity check of the experiment setup before moving on to Model selection. The second phase takes care of the Model analysis of a subset of models of interest, e.g., crosstabs, list analysis, etc.
New functionality
- Subsets. Previously, subsets were generated by querying all existing entities, which significantly slowed down the experiment when the space of entities (and the subset) was large. As a fix, subsets are now forced to be a subset of the relevant cohort rather than of all available entities, and the cohort_name is included in the subset hash.
- Experiment summary report. After each experiment run, Triage can generate a Jupyter notebook that summarizes the experiment outputs. This can be used to verify whether the experiment generated the intended outputs and to identify any initial errors.
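To show why including the cohort name in the subset hash matters, here is a minimal sketch in which the same subset definition produces different hashes for different cohorts. The function subset_hash, the MD5-over-JSON scheme, and the example config keys are assumptions for illustration; Triage's actual hashing may differ.

```python
import hashlib
import json


def subset_hash(subset_config, cohort_name):
    """Hash a subset definition together with the cohort it is drawn from."""
    payload = json.dumps(
        {"subset": subset_config, "cohort_name": cohort_name},
        sort_keys=True,  # stable key order gives a reproducible hash
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Because the cohort name is part of the hashed payload, a subset computed against one cohort is not mistaken for the same subset computed against another.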
Bug Fixes
- When predicting forward with no labels in the matrix, a default label of 0 is now added
- Package Dependencies. Upgraded scikit-learn version, and specified a compatible numpy version to ensure support for Python 3.9+
- Temporary files created for generating the CSV matrices are now stored in /tmp instead of /tmp/triage_ouptut/matrices
- Fixed bugs in Colab tutorial
Documentation
- Added documentation in Postmodeling section related to the Experiment summary report
- Updated the Colab tutorial to reflect the new Experiment Report Summary