dvc.api.get_url(): ignores request to avoid source control and crashes #10608

@rgoya

Bug Report

Description

When calling

dvc.api.get_url(path,
                repo,
                config={"core": {"no_scm": True}}
                )

dvc still attempts to find a Git repository and raises an exception when it doesn't find it.

We have a monorepo with dvc repositories tracked by git on different paths. During interactive work and development, users interact with both dvc and source control management. In some cases, tests and applications must run in an isolated environment that contains no git information; the isolated environment still contains all the required dvc configuration and internal files.

In such cases, we would like our code to access dvc information programmatically through the API, e.g. using the dvc.api.get_url() function to get the S3 path to the remote file. Given that the isolated environment no longer depends on git, but the .dvc/config file is kept and does not contain no_scm = True, we attempted to use the config parameter to request that no SCM be expected (by using config={"core": {"no_scm": True}}).

However, even though config={"core": {"no_scm": True}} is instructing dvc.api.get_url() to avoid checking for SCM, it still fails with:

dvc.scm.SCMError: /tmp/test_repo is not a git repository

Reproduce

  1. cd into a folder under SCM, e.g. /path/to/test_repo
  2. dvc init --subdir
  3. dvc config core.remote s3
  4. dvc remote add -d s3 "s3://fake-bucket/path"
  5. touch test_file.txt
  6. dvc add test_file.txt
  7. Test with Python:

     import dvc.api
     url = dvc.api.get_url("test_file.txt", "/path/to/test_repo/")
     print(url)

     This prints s3://fake-bucket/path/files/md5/d4/1d8cd98f00b204e9800998ecf8427e
  8. Move the repository to an untracked folder: mv /path/to/test_repo/ /tmp/
  9. Test with Python:

     import dvc.api
     url = dvc.api.get_url("test_file.txt", "/tmp/test_repo/")
     print(url)

     This raises SCMError: /tmp/test_repo is not a git repository
  10. Try to avoid the SCM check:

     import dvc.api
     url = dvc.api.get_url("test_file.txt", "/tmp/test_repo/", config={"core": {"no_scm": True}})
     print(url)

     This still raises SCMError: /tmp/test_repo is not a git repository

Expected

Step 10 in the reproduction above should successfully avoid checking for SCM and return the corresponding URL, as in step 7.

Step 9 in the reproduction above should likely still return the URL of the file, given that it doesn't require git to do so.
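
Steps 9 and 10 can be checked together with a small helper. This is only a sketch: probe_get_url is a hypothetical name that is not part of dvc, and the function to call is passed in as an argument so the helper itself does not import dvc.

```python
from typing import Callable, Dict, Tuple


def probe_get_url(get_url: Callable[..., str], path: str, repo: str) -> Dict[str, Tuple[str, str]]:
    """Call get_url with and without the no_scm override and record each outcome.

    probe_get_url is a hypothetical helper for this report, not a dvc function.
    """
    attempts = {
        "default": {},  # step 9: no config override
        "no_scm": {"config": {"core": {"no_scm": True}}},  # step 10
    }
    results = {}
    for label, kwargs in attempts.items():
        try:
            results[label] = ("ok", get_url(path, repo, **kwargs))
        except Exception as exc:  # broad on purpose: we only report what happened
            results[label] = ("error", f"{type(exc).__name__}: {exc}")
    return results


# Usage against the moved repository from step 8 (requires dvc to be installed):
#   import dvc.api
#   print(probe_get_url(dvc.api.get_url, "test_file.txt", "/tmp/test_repo/"))
```

With the behavior described in this report, both entries would be expected to come back as errors; with the expected behavior, at least the "no_scm" entry would hold the URL.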

Diagnosis and possible fix

During the execution of dvc.api.get_url(), there is a call to Repo.open(), which receives all provided parameters (including config) together with two fixed parameters, subrepos=True and uninitialized=True.

Repo.open() then reaches a call to _get_remote_config(url), which internally calls Repo(url), and this last call tries to find the SCM.

The call to _get_remote_config(url) ignores the parameters passed to dvc.api.get_url(). Forwarding them (e.g. calling _get_remote_config(url, *args, **kwargs)) appears to fix the problem (a fix is submitted in #10719).
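
The mechanism described above can be illustrated with a toy model. Nothing here is dvc code: ToyRepo, the two helper functions and open_repo are hypothetical stand-ins, and ToyRepo simply behaves as if the directory is never inside a git repository.

```python
class SCMError(Exception):
    """Stand-in for the error raised when no git repository is found."""


class ToyRepo:
    """Toy stand-in for a repo object; NOT dvc's Repo class."""

    def __init__(self, url, config=None):
        self.url = url
        self.config = config or {}
        # Simulate a directory outside git: only no_scm avoids the error.
        if not self.config.get("core", {}).get("no_scm", False):
            raise SCMError(f"{url} is not a git repository")


def get_remote_config_dropping(url, **kwargs):
    # Mirrors the reported behavior: the caller's options are ignored.
    ToyRepo(url)
    return {}


def get_remote_config_forwarding(url, **kwargs):
    # Mirrors the proposed fix: the caller's options are passed along.
    ToyRepo(url, **kwargs)
    return {}


def open_repo(url, helper, **kwargs):
    # The helper runs before the repo is opened with the caller's options,
    # so an error inside the helper hides the effect of those options.
    helper(url, **kwargs)
    return ToyRepo(url, **kwargs)
```

In this model, open_repo("/tmp/test_repo", get_remote_config_dropping, config={"core": {"no_scm": True}}) raises SCMError even though no_scm is set, while the same call with get_remote_config_forwarding returns a ToyRepo.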

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.59.1 (conda)
---------------------------
Platform: Python 3.10.13 on macOS-15.3.2-x86_64-i386-64bit
Subprojects:
	dvc_data = 3.16.9
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.40.2
	scmrepo = 3.3.10
Supports:
	gdrive (pydrive2 = 1.15.3),
	http (aiohttp = 3.11.13, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.11.13, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2025.3.0, boto3 = 1.37.1),
	ssh (sshfs = 2023.4.1)

Additional Information (if any):
