-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Feat/params templating in cmd. #10885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feat/params templating in cmd. #10885
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #10885 +/- ##
==========================================
+ Coverage 90.68% 91.02% +0.34%
==========================================
Files 504 505 +1
Lines 39795 41323 +1528
Branches 3141 3292 +151
==========================================
+ Hits 36087 37615 +1528
- Misses 3042 3064 +22
+ Partials 666 644 -22 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I'm hesitant to support this, since with your proposal, it means:
Could you please expand more on that why you can't use a script? if model_type == "region":
DATA_DIR = "this"
else:
DATA_DIR = "that" If you are not aware, dvc already supports stage-level vars, so you could do this: stages:
test:
foreach:
- model_type: region
data_type: full
do:
wdir: ../
vars:
- data-version.yaml
cmd: python dummy_script.py --data-dir ${DATA_DIR} --output data/${item.model_type}_level/${item.data_type}/
params:
- data-version.yaml:
- DATA_DIR But stage-level vars cannot be interpolated. If you need different vars, I suggest not using |
I wasn't aware of stage level vars, this wasn't mentioned in the docs. foreach is something that we have to use. We will have to duplicate so many stages and it just isn't feasible. Now the reason we cannot change the script can be plenty. In my case I was using azcopy to download a dir. This is why I needed to access the parameter. The parameter isn't static either as the data dir is used for reproducibility. We version control in blob the data dir. So when we want to update and download the new data, we update the toml file so that the dvc repro detects the change. I think the tests are pretty comprehensive. If there are specific scenarios that we need to look out for I think we should test them. I limited the param namespace to cmd only since I don't believe it should be used for outs or metrics. For cmd you cannot expect to have a wrapping script every time. I agree that I could create a bash script and through a relative path point to it but it isn't that developer friendly. This is more readable IMO. I wish I knew about the stage vars. Could have reused that functionality. What is stopping us from supporting interpolated stage vars? |
I thought about this more. vars should be static in nature. param namespace is only meant for cmd so that it is more developer friendly. We don't want to use param for our outs or metrics because it isn't static in nature. So it makes sense to have param and vars separate! The change here only ensures param is used in cmd and nowhere else as it is meant to be. I still think vars can be interpolated as it is still static in nature. |
…o do this so that foreach/matrix templating works on templated params
…a reserved keyword.
… allow param interpolation in cmd.
757cef3
to
4b44613
Compare
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. #5465
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
This PR now allows you to do templating with on cmd.
Before templating is only done through vars, foreach and matrix. param keys loaded in params and only used for dependency calculation. However, there are often times when you would like to pass in the param to the cmd. This feature allows you to do that.
The reason I had to do this is because I had a requirement where I needed to pass in a param key value which is dynamically loaded based on foreach stage.
Example:
As you can see here, we dynamically load a param file. Loading dynamically works. However, I cannot access the DATA_DIR key which is required for the function to work. Now you can suggest to change the script, but that might not always work. In the example it is possible, but in the real scenario I faced, this just wasn't possible.
Now, with the new changes. This is possible.
The scenario I encountered is best explain in this test
tests/func/parsing/test_params_templating.py::TestParamsErrorHandling::test_foreach_with_dynamic_params_and_output