Bug Report
Description
When calling dvc.api.get_url(path, repo, config={"core": {"no_scm": True}}), dvc still attempts to find a Git repository and raises an exception when it doesn't find one.
We have a monorepo with several dvc repositories, tracked by git, at different paths. During interactive work and development, users interact with both dvc and source control management. In some cases, tests and applications must run in an isolated environment that does not contain git information; the isolated environment still contains all the required dvc configuration and internal files.
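The SCMError reported below means that no Git repository was found at or above the project directory, which is exactly the situation in such an isolated environment. The following is a simplified sketch of that kind of lookup; find_git_root and its ceiling parameter are hypothetical helpers written for illustration, not part of dvc or scmrepo, and real Git discovery handles more cases (GIT_DIR, bare repositories, safe.directory).

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_git_root(start: str, ceiling: Optional[str] = None) -> Optional[str]:
    """Return the nearest directory at or above `start` that has a .git entry.

    Hypothetical, simplified approximation of Git's repository discovery.
    If `ceiling` is given, the search does not go above that directory.
    """
    path = Path(start).resolve()
    stop = Path(ceiling).resolve() if ceiling is not None else None
    for candidate in (path, *path.parents):
        # .git is a directory in a normal clone and a file in worktrees/submodules.
        if (candidate / ".git").exists():
            return str(candidate)
        if stop is not None and candidate == stop:
            break
    return None


# Demo: a dvc subdir inside a git monorepo, and a copy moved outside of it.
with tempfile.TemporaryDirectory() as tmp:
    mono = Path(tmp, "mono")
    (mono / ".git").mkdir(parents=True)
    (mono / "test_repo" / ".dvc").mkdir(parents=True)
    isolated = Path(tmp, "test_repo")
    (isolated / ".dvc").mkdir(parents=True)

    print(find_git_root(str(mono / "test_repo"), ceiling=tmp))  # path of "mono"
    print(find_git_root(str(isolated), ceiling=tmp))  # None
```

The ceiling argument only keeps the demo deterministic when the temporary directory happens to live inside some other checkout.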
In such cases, we would like our code to access dvc information programmatically through the API, e.g. using the dvc.api.get_url() function to get the s3 path of the remote file. Since the isolated environment no longer depends on git, but the .dvc/config file is kept and does not contain no_scm = True, we attempted to use the config parameter to tell dvc not to expect an SCM (by passing config={"core": {"no_scm": True}}).
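The override we expected can be modeled as a recursive merge in which values passed through config take precedence over values read from .dvc/config. This is a minimal sketch of that expectation only: merge_config is a hypothetical helper, not dvc's own merge code, and the dict standing in for .dvc/config is an assumed, simplified representation of the file used in the reproduction.

```python
import copy


def merge_config(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively.

    Hypothetical helper: values from `overrides` win, nested dicts are merged
    key by key, and neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Assumed stand-in for .dvc/config in the reproduction (no no_scm entry).
file_config = {
    "core": {"remote": "s3"},
    "remote": {"s3": {"url": "s3://fake-bucket/path"}},
}

effective = merge_config(file_config, {"core": {"no_scm": True}})
print(effective["core"])  # {'remote': 's3', 'no_scm': True}
```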
However, even though config={"core": {"no_scm": True}} instructs dvc.api.get_url() to avoid checking for SCM, it still fails with:
dvc.scm.SCMError: /tmp/test_repo is not a git repository
Reproduce
1. cd into a folder under SCM, e.g. /path/to/test_repo
2. dvc init --subdir
3. dvc config core.remote s3
4. dvc remote add -d s3 "s3://fake-bucket/path"
5. touch test_file.txt
6. dvc add test_file.txt
7. Test with python:
import dvc.api
url = dvc.api.get_url("test_file.txt", "/path/to/test_repo/")
print(url)
prints s3://fake-bucket/path/files/md5/d4/1d8cd98f00b204e9800998ecf8427e
8. Move the repository to an untracked folder: mv /path/to/test_repo/ /tmp/
9. Test with python:
import dvc.api
url = dvc.api.get_url("test_file.txt", "/tmp/test_repo/")
print(url)
this raises SCMError: /tmp/test_repo is not a git repository
10. Try to avoid the SCM check:
import dvc.api
url = dvc.api.get_url("test_file.txt", "/tmp/test_repo/", config={"core": {"no_scm": True}})
print(url)
this still raises SCMError: /tmp/test_repo is not a git repository
Expected
Step 10 in the reproduction above should successfully avoid checking for SCM and return the corresponding file path as in step 7.
Step 9 in the reproduction above should likely still return the URL of the file, given that it doesn't require git to do so.
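For reference, the URL printed in step 7 can be rebuilt from the file contents and the remote URL alone, with no Git information involved. This sketch assumes DVC 3's remote layout (files/md5/ followed by the MD5 digest split after its first two characters) and that the digest is taken over the raw file contents; expected_remote_url is a hypothetical helper, not part of dvc.api.

```python
import hashlib


def expected_remote_url(remote_url: str, content: bytes) -> str:
    """Build a file's remote URL under the assumed files/md5/<2>/<30> layout."""
    digest = hashlib.md5(content).hexdigest()
    return f"{remote_url.rstrip('/')}/files/md5/{digest[:2]}/{digest[2:]}"


# test_file.txt was created with `touch`, so its contents are empty.
print(expected_remote_url("s3://fake-bucket/path", b""))
# s3://fake-bucket/path/files/md5/d4/1d8cd98f00b204e9800998ecf8427e
```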
Diagnosis and possible fix
During the execution of dvc.api.get_url(), there is a call to Repo.open(), to which all provided parameters are passed, including config, as well as two fixed parameters, subrepos=True and uninitialized=True.
Repo.repo() then calls _get_remote_config(url), which internally calls Repo(url), and this last call tries to find the SCM.
The call to _get_remote_config(url) ignores the parameters passed to dvc.api.get_url(). Forwarding these parameters (e.g. calling _get_remote_config(url, *args, **kwargs)) appears to fix the problem (submitting a fix here: #10719).
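The call chain described above can be reduced to a toy model. Everything in it is hypothetical (SCMError here is a local stand-in, and ToyRepo, GIT_TRACKED, toy_get_remote_config, toy_open_repo and the forward_kwargs switch are not dvc code); it only illustrates why dropping config on the inner Repo(url) call makes the no_scm override ineffective, and how forwarding the arguments changes that. It is not the patch in #10719.

```python
class SCMError(Exception):
    """Local stand-in for dvc.scm.SCMError in this toy model."""


# Toy assumption: directories that still have Git metadata around them.
GIT_TRACKED = {"/path/to/test_repo"}


class ToyRepo:
    """Hypothetical stand-in for dvc.repo.Repo that only models the SCM check."""

    def __init__(self, root, config=None):
        self.root = root
        self.config = config or {}
        no_scm = bool(self.config.get("core", {}).get("no_scm"))
        if not no_scm and root not in GIT_TRACKED:
            raise SCMError(f"{root} is not a git repository")


def toy_get_remote_config(url, **kwargs):
    # Opens a second repo object just to read its configuration.
    return ToyRepo(url, **kwargs).config


def toy_open_repo(url, forward_kwargs, **kwargs):
    if forward_kwargs:
        toy_get_remote_config(url, **kwargs)  # proposed: caller's config forwarded
    else:
        toy_get_remote_config(url)  # reported: caller's config dropped here
    # The result is unused in this toy; real code would merge it into the config.
    return ToyRepo(url, **kwargs)


cfg = {"core": {"no_scm": True}}
try:
    toy_open_repo("/tmp/test_repo", forward_kwargs=False, config=cfg)
except SCMError as exc:
    print("reported:", exc)  # reported: /tmp/test_repo is not a git repository
print("proposed:", toy_open_repo("/tmp/test_repo", forward_kwargs=True, config=cfg).root)
```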
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 3.59.1 (conda)
---------------------------
Platform: Python 3.10.13 on macOS-15.3.2-x86_64-i386-64bit
Subprojects:
dvc_data = 3.16.9
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.10
Supports:
gdrive (pydrive2 = 1.15.3),
http (aiohttp = 3.11.13, aiohttp-retry = 2.8.3),
https (aiohttp = 3.11.13, aiohttp-retry = 2.8.3),
s3 (s3fs = 2025.3.0, boto3 = 1.37.1),
ssh (sshfs = 2023.4.1)
Additional Information (if any):