@legendof-selda legendof-selda commented Oct 9, 2025

This PR allows templating in cmd using params.
Previously, templating was only possible through vars, foreach and matrix; keys listed under params were loaded and used only for dependency calculation. However, there are often times when you would like to pass a param to the cmd. This feature allows you to do that.

I needed this because I had to pass to cmd the value of a param key that is loaded dynamically based on the foreach stage.

Example:

stages:
  test:
    foreach:
      - model_type: region
        data_type: full
    do:
      params:
        - data/${item.model_type}_level/data-version.toml:
            - DATA_DIR
      wdir: ../
      cmd: python dummy_script.py --data-dir ${params.DATA_DIR} --output data/${item.model_type}_level/${item.data_type}/

$ dvc repro data/dvc.yaml
ERROR: failed to parse '[email protected]' in 'data\dvc.yaml': Could not find 'params.DATA_DIR'

As you can see, the param file path is resolved dynamically, and that part works. However, I cannot access the DATA_DIR key, which the command requires. Now you can suggest to change the script, but that might not always work. In the example it is possible, but in the real scenario I faced, this just wasn't possible.

With the new changes, this is possible.

The scenario I encountered is best explained by this test: tests/func/parsing/test_params_templating.py::TestParamsErrorHandling::test_foreach_with_dynamic_params_and_output
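
To illustrate the intended behavior, here is a minimal sketch in plain Python. It is not DVC's actual implementation: the function name `render_cmd`, the regex, and the sample value `v3/region` are assumptions made for illustration. It only shows how `${item.*}` and `${params.*}` placeholders in cmd could be substituted once the param file has been loaded, and how a missing key leads to a "Could not find" error like the one above.

```python
import re

# Matches ${namespace.key} placeholders such as ${item.model_type}
# or ${params.DATA_DIR}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def render_cmd(template: str, namespaces: dict) -> str:
    """Substitute ${ns.key} placeholders; raise if a key cannot be found."""

    def replace(match: re.Match) -> str:
        ns, key = match.group(1), match.group(2)
        try:
            return str(namespaces[ns][key])
        except KeyError:
            raise KeyError(f"Could not find '{ns}.{key}'") from None

    return PLACEHOLDER.sub(replace, template)


item = {"model_type": "region", "data_type": "full"}
# Hypothetical value, standing in for what data-version.toml would contain.
params = {"DATA_DIR": "v3/region"}

cmd = render_cmd(
    "python dummy_script.py --data-dir ${params.DATA_DIR} "
    "--output data/${item.model_type}_level/${item.data_type}/",
    {"item": item, "params": params},
)
print(cmd)
```

Without the `params` namespace in the lookup table, the same call raises the "Could not find 'params.DATA_DIR'" error, which mirrors the failure shown in the example.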

@github-project-automation github-project-automation bot moved this to Backlog in DVC Oct 9, 2025
@legendof-selda legendof-selda changed the title Feat/params templating Feat/params templating in cmd. Oct 9, 2025

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 94.71459% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.02%. Comparing base (2431ec6) to head (4b44613).
⚠️ Report is 141 commits behind head on main.

Files with missing lines                       Patch %   Lines
dvc/parsing/context.py                         79.74%    14 Missing and 2 partials ⚠️
dvc/parsing/__init__.py                        93.90%    4 Missing and 1 partial ⚠️
tests/func/parsing/test_params_templating.py   98.70%    2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10885      +/-   ##
==========================================
+ Coverage   90.68%   91.02%   +0.34%     
==========================================
  Files         504      505       +1     
  Lines       39795    41323    +1528     
  Branches     3141     3292     +151     
==========================================
+ Hits        36087    37615    +1528     
- Misses       3042     3064      +22     
+ Partials      666      644      -22     

@skshetry
Collaborator

skshetry commented Oct 9, 2025

I'm hesitant to support this, since your proposal means:

  • There are going to be multiple sources for parametrization, and
  • we'd have two levels of interpolation within the same stage. First params would have to be resolved, and then the rest would have to be resolved separately, taking the resolved params into account. (Stage-level vars do not support interpolation.)
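
The two levels of interpolation described in the second bullet can be sketched roughly as follows. This is a simplified illustration in plain Python, not DVC's resolver: `resolve`, `resolve_stage`, the flat `stage` dictionary and the in-memory `FAKE_FILES` mapping are all hypothetical stand-ins, and the value `region-v3` is made up.

```python
import re

# Matches ${dotted.key} placeholders.
PLACEHOLDER = re.compile(r"\$\{([\w.]+)\}")


def resolve(text: str, context: dict) -> str:
    """Replace each ${dotted.key} with its value from a flat context dict."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), text)


# Hypothetical stand-in for reading param files from disk.
FAKE_FILES = {
    "data/region_level/data-version.toml": {"DATA_DIR": "region-v3"},
}


def resolve_stage(stage: dict, item: dict) -> str:
    context = {f"item.{key}": value for key, value in item.items()}

    # Level 1: the params file path may itself depend on the foreach item,
    # so it has to be resolved before any param value can be loaded.
    params_file = resolve(stage["params_file"], context)
    for key in stage["param_keys"]:
        context[f"params.{key}"] = FAKE_FILES[params_file][key]

    # Level 2: cmd can only be resolved once the params are known.
    return resolve(stage["cmd"], context)


stage = {
    "params_file": "data/${item.model_type}_level/data-version.toml",
    "param_keys": ["DATA_DIR"],
    "cmd": "python dummy_script.py --data-dir ${params.DATA_DIR}",
}
resolved = resolve_stage(stage, {"model_type": "region", "data_type": "full"})
print(resolved)
```

The ordering constraint is the point of the sketch: the second pass cannot start until the first pass has finished, which is the extra complexity being weighed here.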

> Now you can suggest to change the script, but that might not always work. In the example it is possible, but in the real scenario I faced, this just wasn't possible.

Could you please expand on why you can't use a script?

if model_type == "region":
	DATA_DIR = "this"
else:
	DATA_DIR = "that"

If you are not aware, dvc already supports stage-level vars, so you could do this:

stages:
  test:
    foreach:
      - model_type: region
        data_type: full
    do:
      wdir: ../
      vars:
      - data-version.yaml
      cmd: python dummy_script.py --data-dir ${DATA_DIR} --output data/${item.model_type}_level/${item.data_type}/
      params:
        - data-version.yaml:
            - DATA_DIR

But stage-level vars cannot be interpolated. If you need different vars per stage, I suggest not using foreach .. do in that case, and defining the stages individually instead.

@legendof-selda
Author

I wasn't aware of stage-level vars; this wasn't mentioned in the docs. foreach is something that we have to use. Without it we would have to duplicate many stages, which just isn't feasible.

There can be plenty of reasons why the script cannot be changed. In my case I was using azcopy to download a directory, which is why I needed access to the parameter. The parameter isn't static either, because the data dir is used for reproducibility: we version the data dir in blob storage. When we want to update and download the new data, we update the toml file so that dvc repro detects the change.

I think the tests are pretty comprehensive. If there are specific scenarios that we need to look out for I think we should test them.

I limited the param namespace to cmd only, since I don't believe it should be used for outs or metrics. For cmd you cannot expect to have a wrapping script every time. I agree that I could create a bash script and point to it through a relative path, but that isn't very developer-friendly. This is more readable IMO.

I wish I had known about stage vars; I could have reused that functionality. What is stopping us from supporting interpolated stage vars?

@legendof-selda
Author

I thought about this more. vars should be static in nature. The param namespace is only meant for cmd, to make it more developer-friendly. We don't want to use param for outs or metrics because it isn't static in nature. So it makes sense to keep param and vars separate! The change here only ensures that param is used in cmd and nowhere else, as intended. I still think vars could be interpolated, since they would remain static in nature.
