Commit 24c8019

Remove support for non-entity_id grouping for features (#887)

* remove hard-coded reference to entity_id
* remove groups feature key and bump config version to v8
* remove non-entity groups from tests
1 parent 873c3ca commit 24c8019

33 files changed, +171 -359 lines changed
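
The commit message above comes down to two config changes. `config_version` goes from `'v7'` to `'v8'`. The `groups` key is also removed from every `feature_aggregations` entry, so features are always aggregated by `entity_id`. Below is a minimal sketch of that migration on an already-parsed config dict. `upgrade_v7_to_v8` is a hypothetical name and is not Triage's own upgrade code; the project's upgrade notes are in `experiments/upgrade-to-v8.md`. The sketch also reports which non-`entity_id` groups lose their features.

```python
import copy
from typing import Any, Dict, List, Tuple


# Hypothetical helper: a sketch of the v7 -> v8 change, not Triage's upgrade code.
def upgrade_v7_to_v8(config: Dict[str, Any]) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Return an upgraded copy of `config` plus the (prefix, group) pairs that were dropped."""
    if config.get("config_version") != "v7":
        raise ValueError(f"expected a v7 config, got {config.get('config_version')!r}")

    upgraded = copy.deepcopy(config)  # leave the caller's dict untouched
    upgraded["config_version"] = "v8"

    dropped = []
    for aggregation in upgraded.get("feature_aggregations", []):
        # v8 has no `groups` key: features are always grouped by entity_id
        groups = aggregation.pop("groups", ["entity_id"])
        for group in groups:
            if group != "entity_id":
                # features computed for this group are no longer generated in v8
                dropped.append((aggregation.get("prefix", "<no prefix>"), group))
    return upgraded, dropped


v7 = {
    "config_version": "v7",
    "feature_aggregations": [
        {"prefix": "risks", "groups": ["entity_id"]},
        {"prefix": "results", "groups": ["entity_id", "zip_code"]},
    ],
}
v8, dropped = upgrade_v7_to_v8(v7)
print(v8["config_version"])  # v8
print(dropped)               # [('results', 'zip_code')]
```

Returning the dropped groups means a `zip_code`-level feature does not vanish silently. Any query or model that relied on it has to be revisited.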

.python-version.current

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-triage-3.6.2
+triage-3.9.10

README.md

Lines changed: 3 additions & 3 deletions

@@ -25,7 +25,7 @@ Triage is designed to:
 
 To install Triage, you need:
 
-- Python 3.6
+- Python 3.7+
 - A PostgreSQL 9.6+ database with your source data (events,
 geographical data, etc) loaded.
 - **NOTE**: If your database is PostgreSQL 11+ you will get some
@@ -35,7 +35,7 @@ To install Triage, you need:
 Services's S3), to store the needed matrices and models for your
 experiments
 
-We recommend starting with a new python virtual environment (with Python 3.6 or greater) and pip installing triage there.
+We recommend starting with a new python virtual environment and pip installing triage there.
 ```bash
 $ virtualenv triage-env
 $ . triage-env/bin/activate
@@ -106,7 +106,7 @@ example:
 
 (pyenv) installed
 
-(python-3.6.2) installed
+(python-3.9.10) installed
 
 (virtualenv) installed
 
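
The README now requires Python 3.7+, and the pinned pyenv environment moves from 3.6.2 to 3.9.10. To make a script fail fast on an older interpreter, you could use a guard along these lines. It is a hypothetical snippet that Triage does not ship, and `MIN_PYTHON` simply mirrors the README.

```python
import sys

# Mirrors the README's "Python 3.7+" requirement (hypothetical constant, not from Triage).
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) part of version_info is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def require_supported_python() -> None:
    """Exit with a readable message instead of failing later on newer syntax or APIs."""
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise SystemExit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")


print(python_is_supported((3, 9, 10)))  # True  (new pinned version)
print(python_is_supported((3, 6, 2)))   # False (old pinned version)
```

Call `require_supported_python()` at the top of an entry-point script. Comparing `(major, minor)` tuples avoids string comparisons, which would wrongly rank "3.10" below "3.7".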

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions

@@ -103,6 +103,7 @@ nav:
 - Testing Feature Configuration: experiments/feature-testing.md
 - Running an Experiment: experiments/running.md
 - Upgrading an Experiment:
+- v7 -> v8: experiments/upgrade-to-v8.md
 - v6 -> v7: experiments/upgrade-to-v7.md
 - v5 -> v6: experiments/upgrade-to-v6.md
 - v3/v4 -> v5: experiments/upgrade-to-v5.md

docs/sources/dirtyduck/dirty_duckling.md

Lines changed: 3 additions & 4 deletions

@@ -133,10 +133,12 @@ If you wish, you can check the content of the file with `cat
 experiments/dirty-ducking.yaml`
 
 ```yaml
-config_version: 'v7'
+config_version: 'v8'
 
 model_comment: 'dirtyduck-quickstart'
 
+random_seed: 1234
+
 temporal_config:
 label_timespans: ['3months']
 
@@ -170,9 +172,6 @@ feature_aggregations:
 
 intervals: ['all']
 
-groups:
-- 'entity_id'
-
 model_grid_preset: 'quickstart'
 
 scoring:
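
The quickstart config now declares `config_version: 'v8'`, gains a `random_seed`, and no longer lists `groups` under `feature_aggregations`. A rough pre-flight lint for leftover v7 settings could look like the sketch below. `find_v8_problems` is hypothetical and is not Triage's own validation. The `"example"` prefix in the sample dicts is illustrative, not taken from the quickstart file.

```python
from typing import Any, Dict, List


def find_v8_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means no obvious v7 leftovers."""
    problems = []
    if config.get("config_version") != "v8":
        problems.append(f"config_version is {config.get('config_version')!r}, expected 'v8'")
    for index, aggregation in enumerate(config.get("feature_aggregations", [])):
        if "groups" in aggregation:
            # Grouping is now always by entity_id, so this key should be deleted.
            name = aggregation.get("prefix", f"#{index}")
            problems.append(f"feature_aggregations[{name}] still defines 'groups'")
    return problems


quickstart = {
    "config_version": "v8",
    "random_seed": 1234,
    "feature_aggregations": [{"prefix": "example", "intervals": ["all"]}],
}
stale = {
    "config_version": "v7",
    "feature_aggregations": [{"prefix": "example", "intervals": ["all"], "groups": ["entity_id"]}],
}
print(find_v8_problems(quickstart))  # []
print(find_v8_problems(stale))
```

Collecting every problem rather than raising on the first one lets a single run list all the edits a v7 file needs.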

docs/sources/dirtyduck/eis.md

Lines changed: 4 additions & 17 deletions

@@ -74,7 +74,7 @@ First the usual stuff. Note that we are changing `model_comment` and
 *hash* that differentiates models and model groups).
 
 ```yaml
-config_version: 'v7'
+config_version: 'v8'
 
 model_comment: 'eis: 01'
 random_seed: 23895478
@@ -223,9 +223,6 @@ in [inspections prioritization](inspections.md):
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-
 -
 prefix: 'risks'
 from_obj: 'semantic.events'
@@ -247,10 +244,6 @@ in [inspections prioritization](inspections.md):
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-- 'zip_code'
-
 -
 prefix: 'results'
 from_obj: 'semantic.events'
@@ -270,9 +263,6 @@ in [inspections prioritization](inspections.md):
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-
 -
 prefix: 'inspection_types'
 from_obj: 'semantic.events'
@@ -291,9 +281,6 @@ in [inspections prioritization](inspections.md):
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-- 'zip_code'
 ```
 
 We specify that we want to use all possible feature-group combinations for training:
@@ -513,7 +500,7 @@ The only differences between this experiment config file and the
 previous are in the `user_metadata` section:
 
 ```yaml
-config_version: 'v7'
+config_version: 'v8'
 
 model_comment: 'eis: 02'
 random_seed: 23895478
@@ -942,8 +929,8 @@ models_dates_join_query: |
 #features_query must join models_dates_join_query with 1 or more features table using as_of_date
 features_query: |
 select m.model_id, m.as_of_date, f4.entity_id, f4.results_entity_id_1month_result_fail_avg, f4.results_entity_id_3month_result_fail_avg, f4.results_entity_id_6month_result_fail_avg,
-f2.inspection_types_zip_code_1month_type_canvass_sum, f3.risks_zip_code_1month_risk_high_sum, f4.results_entity_id_6month_result_pass_avg,
-f3.risks_entity_id_all_risk_high_sum, f2.inspection_types_zip_code_3month_type_canvass_sum, f4.results_entity_id_6month_result_pass_sum,
+f2.inspection_types_entity_id_1month_type_canvass_sum, f3.risks_entity_id_1month_risk_high_sum, f4.results_entity_id_6month_result_pass_avg,
+f3.risks_entity_id_all_risk_high_sum, f2.inspection_types_entity_id_3month_type_canvass_sum, f4.results_entity_id_6month_result_pass_sum,
 f2.inspection_types_entity_id_all_type_canvass_sum
 from features.inspection_types_aggregation_imputed as f2
 inner join features.risks_aggregation_imputed as f3 using (entity_id, as_of_date)
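
Features are now aggregated only by `entity_id`, so a hand-written `features_query` that selects columns built for another group (here `zip_code`) has to change, as the last eis.md hunk does. The column names in that query suggest a `<prefix>_<group>_<interval>_<rest>` pattern. The heuristic below assumes that pattern. The regex, `PREFIXES`, and `stale_group_columns` are all hypothetical, not part of Triage.

```python
import re
from typing import Iterable, List

# Assumption: feature columns look like <prefix>_<group>_<interval>_<rest>,
# inferred from names such as "risks_entity_id_1month_risk_high_sum".
PREFIXES = ("inspection_types", "risks", "results")  # prefixes used in the eis.md config
INTERVAL = r"(?:\d+(?:day|week|month|y)|all)"        # e.g. 1month, 1y, all


def stale_group_columns(query: str, prefixes: Iterable[str] = PREFIXES) -> List[str]:
    """Return feature columns in `query` whose group segment is not entity_id."""
    prefix_alt = "|".join(re.escape(p) for p in prefixes)
    # The lazy group (\w+?) stops at the first "_<interval>_" it can find.
    pattern = re.compile(rf"\b(?:{prefix_alt})_(?P<group>\w+?)_{INTERVAL}_\w+")
    return [m.group(0) for m in pattern.finditer(query) if m.group("group") != "entity_id"]


# The two lines removed from features_query in the eis.md hunk above.
old_lines = (
    "f2.inspection_types_zip_code_1month_type_canvass_sum, f3.risks_zip_code_1month_risk_high_sum, "
    "f4.results_entity_id_6month_result_pass_avg, f3.risks_entity_id_all_risk_high_sum, "
    "f2.inspection_types_zip_code_3month_type_canvass_sum, f4.results_entity_id_6month_result_pass_sum"
)
print(stale_group_columns(old_lines))
# ['inspection_types_zip_code_1month_type_canvass_sum', 'risks_zip_code_1month_risk_high_sum', 'inspection_types_zip_code_3month_type_canvass_sum']
```

This is only a text scan. Swapping `zip_code` for `entity_id` in a column name, as the commit does, selects a different feature, so the query's meaning changes too.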

docs/sources/dirtyduck/inspections.md

Lines changed: 4 additions & 21 deletions

@@ -181,7 +181,7 @@ The config file for this first experiment is located in
 [inspections_baseline.yaml](https://github.com/dssg/triage/blob/master/example/dirtyduck/experiments/inspections_baseline.yaml).
 
 The first lines of the experiment config file specify the config-file
-version (`v7` at the moment of writing this tutorial), a comment
+version (`v8` at the moment of writing this tutorial), a comment
 (`model_comment`, which will end up as a value in the
 `triage_metadata.models` table), and a list of user-defined metadata
 (`user_metadata`) that can help to identify the resulting model
@@ -197,7 +197,7 @@ overwritten or incorrectly used), and if you add the
 different label definitions will belong to different model groups.
 
 ```yaml
-config_version: 'v7'
+config_version: 'v8'
 
 model_comment: 'inspections: baseline'
 random_seed: 23895478
@@ -371,9 +371,6 @@ feature_aggregations:
 
 intervals: ['all']
 
-groups:
-- 'entity_id'
-
 feature_group_definition:
 prefix:
 - 'inspections'
@@ -732,7 +729,7 @@ smart enough to use the previous tables and matrices instead of
 generating them from scratch.
 
 ```yaml
-config_version: 'v7'
+config_version: 'v8'
 
 model_comment: 'inspections: basic ML'
 
@@ -792,9 +789,6 @@ feature_aggregations:
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-
 -
 prefix: 'risks'
 from_obj: 'semantic.events'
@@ -816,10 +810,6 @@ feature_aggregations:
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-- 'zip_code'
-
 -
 prefix: 'results'
 from_obj: 'semantic.events'
@@ -839,9 +829,6 @@ feature_aggregations:
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-
 -
 prefix: 'inspection_types'
 from_obj: 'semantic.events'
@@ -860,10 +847,6 @@ feature_aggregations:
 
 intervals: ['1month', '3month', '6month', '1y', 'all']
 
-groups:
-- 'entity_id'
-- 'zip_code'
-
 ```
 
 And as stated, we will train some Decision Trees, in particular we are
@@ -1177,7 +1160,7 @@ back to this problem in the Early Warning Systems.
 Ok, let's add a more complete experiment. First the usual generalities.
 
 ```yaml
-config_version: 'v7'
+config_version: 'v8'
 
 model_comment: 'inspections: advanced'
 