
Conversation

Adam-D-Lewis commented Jun 29, 2025

Description

Adds a conda-lock check command that verifies a conda-lock file is compatible with the environment spec. One intent of this PR is that the command runs fast enough to be reasonably usable as a pre-commit hook.

Closes #479
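
For example, it could be wired up as a local pre-commit hook roughly like this (a sketch: the hook id and file pattern below are illustrative assumptions; only the conda-lock check command itself comes from this PR):

```yaml
# .pre-commit-config.yaml (hypothetical hook id and file pattern)
repos:
  - repo: local
    hooks:
      - id: conda-lock-check
        name: conda-lock check
        entry: conda-lock check
        language: system
        files: (pyproject\.toml|environment\.ya?ml|conda-lock\.yml)$
        pass_filenames: false
```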

netlify bot commented Jun 29, 2025

Deploy Preview for conda-lock ready!

🔨 Latest commit: db67efd
🔍 Latest deploy log: https://app.netlify.com/projects/conda-lock/deploys/68609388f6589800083a5b89
😎 Deploy Preview: https://deploy-preview-816--conda-lock.netlify.app

Adam-D-Lewis (Author) commented Jun 29, 2025

I'm seeing some surprising, though perhaps expected, behavior. When I run conda-lock -f pyproject.toml on this file

```toml
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.dependencies]
mailgun = { version = "*", source = "pypi" }  # only depends on requests

[tool.poetry.group.dev.dependencies]
requests = "<2.32"

[tool.conda-lock]
channels = ['conda-forge']
platforms = ['linux-64']
```

and I get a lock file like

```yaml
...
- name: requests
  version: 2.31.0
  manager: conda
  platform: linux-64
  dependencies:
    certifi: '>=2017.4.17'
    charset-normalizer: '>=2,<4'
    idna: '>=2.5,<4'
    python: '>=3.7'
    urllib3: '>=1.21.1,<3'
  url: https://conda.anaconda.org/conda-forge/noarch/requests-2.31.0-pyhd8ed1ab_0.conda
  hash:
    md5: a30144e4156cdbb236f99ebb49828f8b
    sha256: 9f629d6fd3c8ac5f2a198639fe7af87c4db2ac9235279164bfe0fcb49d8c4bad
  category: main
  optional: false
...
- name: requests
  version: 2.32.4
  manager: pip
  platform: linux-64
  dependencies:
    certifi: '>=2017.4.17'
    charset-normalizer: '>=2,<4'
    idna: '>=2.5,<4'
    urllib3: '>=1.21.1,<3'
  url: https://files.pythonhosted.org/packages/7c/e4/56027c4a6b4ae70ca9de302488c5ca95ad4a39e190093d6c1a8ace08341b/requests-2.32.4-py3-none-any.whl
  hash:
    sha256: 27babd3cda2a6d50b30443204ee89830707d396671944c998b5975b031ac2b2c
  category: main
  optional: false
```

Requests is listed twice, which is expected, but I would expect the category on the conda-managed requests to be dev, not main. Because of this behavior, running conda-lock check as currently implemented in this PR yields:

ERROR:conda_lock.check_lockfile:For platform linux-64, conda-lock.yml has root package "requests" which is not in the lock specification or does not match the constraints: None. Run conda-lock lock to update the lockfile.

This occurs because the lockfile and environment spec are checked per category (main separately from dev) and per manager (conda-managed separately from pip-managed dependencies). When checking the main-category, conda-managed packages in the lockfile, requests appears as a root package (no main-category, conda-managed package depends on it), but the spec contains no conda-managed requests under the main category.
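
To make this concrete, here's a rough sketch of that per-partition check (illustrative names and data, not the actual code in this PR):

```python
# Illustrative sketch: the lockfile is partitioned by (category, manager), and
# a "root package" is one that no other package in the same partition depends on.

# main-category, conda-managed slice of the lockfile: name -> dependency names
lockfile_main_conda = {
    "requests": {"certifi", "charset-normalizer", "idna", "python", "urllib3"},
    "certifi": set(),
    "charset-normalizer": set(),
    "idna": set(),
    "urllib3": set(),
}

# mailgun is pip-managed, so the main-category conda spec is empty
spec_main_conda: set[str] = set()

all_dep_names = {dep for deps in lockfile_main_conda.values() for dep in deps}
root_pkgs = set(lockfile_main_conda) - all_dep_names  # {"requests"}

# requests is a root package with no matching spec entry -> the reported error
extra = root_pkgs - spec_main_conda
print(extra)  # {'requests'}
```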

This could be worked around in this PR, but ideally I'd like some feedback from @maresb first.

maresb (Contributor) commented Jun 29, 2025

Thanks so much @Adam-D-Lewis for putting this together!

Reviewing this will take several blocks of time adding up to several hours before I'll be able to understand the details, but I intend to make at least one iteration this week.

In the meantime, I'll immediately address the unexpected result you observe. I really appreciate your analysis. What conda-lock is currently doing looks buggy to me on two levels:

  1. requests was not disambiguated between conda and pip (I'm of the opinion that it should be conda, as explained in my comment on #813, "Duplicate packages in lockfile with different versions")
  2. it failed to put requests into the dev category. Currently conda-lock internally uses the "v2prelim" lockfile format allowing category to be a list. (This is my joint work with @srilman, so I understand this aspect much better than the pip logic.) In order to serialize to the v1 lockfile format understood by micromamba, any package containing multiple categories will be replicated into several entries that are identical, apart from their category value:

```python
def to_v1(self) -> list[LockedDependencyV1]:
    """Convert a v2 dependency into a list of v1 dependencies.
    In case a v2 dependency might contain multiple categories, but a v1 dependency
    can only contain a single category, we represent multiple categories as a list
    of v1 dependencies that are identical except for the `category` field. The
    `category` field runs over all categories."""
    package_entries_per_category = [
        LockedDependencyV1(
            name=self.name,
            version=self.version,
            manager=self.manager,
            platform=self.platform,
            dependencies=self.dependencies,
            url=self.url,
            hash=self.hash,
            category=category,
            source=self.source,
            build=self.build,
            optional=category != "main",
        )
        for category in sorted(self.categories)
    ]
    return package_entries_per_category
```

It would be good to understand what's going wrong here. Shall we move this investigation also into #813?

In general terms, I think it'll be necessary fairly soon to fix or redo the flawed pip logic. I've made several passes at trying to understand the original intentions, adding more and more type hints along the way. This PR might be pushing the limits of what can be done before that happens. (I'll have to study it in order to evaluate this.) Also, if you've incidentally managed to figure out any pieces of how the pip logic works, that would be incredibly valuable.

Is this PR functioning as expected in non-pip cases?

Thanks again, I really appreciate all your work!

maresb (Contributor) commented Jul 4, 2025

Hi @Adam-D-Lewis, I had a closer look at the code. I appreciate the tests and the clear intent, but I have some reservations about the approach. In particular, the current strategy requires implementing a lot of delicate helper functions that are extremely difficult to make robust and correct. I'm worried that the work required to make things robust would be a huge effort. I think this feature would be more successful and maintainable if we cleverly leverage existing capabilities and, if possible, completely avoid the need to introduce fragile code.

Some examples:

  • `lockfile_root_pkg_names = all_lockfile_pkgs - all_dependency_names`

    Maybe I'm missing something, but I don't think "root packages" are relevant for anything. We care about verifying the full lockspec, and some of the packages in the lockspec often appear as dependencies of other packages in the lockspec. So by only checking root packages, I think you're simply omitting the lockspec packages that don't get classified as root packages.

  • `constraint = SpecifierSet(convert_poetry_to_pep440(constraints.version))`

    This implicitly assumes (1) that conda MatchSpecs are PEP 440 compliant, and (2) that your convert_poetry_to_pep440 correctly parses them. I don't know off the top of my head any examples that break these assumptions, but I'm somewhat confident that I could break them if I tried. (See the first sketch after this list.)

  • `extra_packages = lockfile_root_pkg_names - spec_pkgs_dict.keys()`

    I think it's true that if there exists an extra package, then there must also exist an extra root package. But this isn't immediately obvious to me and requires a lot of reasoning. It would be a lot more intuitive to me if we instead simply tested whether there are any lockfile packages not in the transitive closure of the lockspec packages. (Transitive closure is jargon for taking the lockspec packages and recursively adding all dependencies; see the second sketch below.)
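
As a first candidate for breaking assumption 1 in the second bullet (a quick, untested sketch): conda version specs allow "|" to express alternatives, which PEP 440 has no way to represent:

```python
from packaging.specifiers import InvalidSpecifier, SpecifierSet

# Conda version specs may use "|" for alternatives (e.g. "1.7.*|1.8.*"),
# but PEP 440 specifier sets have no "or" operator, so parsing fails:
try:
    SpecifierSet("1.7.*|1.8.*")
except InvalidSpecifier as exc:
    print(f"not PEP 440 compliant: {exc}")
```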
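
And here's a minimal sketch of the transitive-closure test from the last bullet (the function name and data shapes are hypothetical, not code from this PR):

```python
from collections import deque

def extra_lockfile_packages(
    lockspec_names: set[str],
    lockfile_deps: dict[str, set[str]],  # lockfile package -> its dependency names
) -> set[str]:
    """Return lockfile packages outside the transitive closure of the lockspec:
    start from the lockspec packages, recursively add all of their dependencies,
    and report whatever is never reached as extra."""
    reached: set[str] = set()
    queue = deque(name for name in lockspec_names if name in lockfile_deps)
    while queue:
        name = queue.popleft()
        if name in reached:
            continue
        reached.add(name)
        queue.extend(dep for dep in lockfile_deps.get(name, ()) if dep not in reached)
    return set(lockfile_deps) - reached
```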

So before trying to implement by hand all this really subtle and fragile stuff, I'd like to check if there is any really simple way to achieve the same result.

For example, if on a consistent environment I run

```sh
micromamba install --dry-run --json [lockspec]
```

then it returns almost immediately with

```json
{
    "dry_run": true,
    "message": "All requested packages already installed",
    "prefix": "~/micromamba/envs/env-name",
    "success": true
}
```

If we could make this work, then we'd outsource all the difficult stuff to micromamba. And we already have code in conda-lock for --update to generate a fake metadata-only environment where we could attempt this. (I haven't tested this yet, so I'm not sure what complications may arise.)
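
If that pans out, the core of the check could be as thin as the following sketch (the wrapper function and the fake-environment prefix are assumptions; only the micromamba flags and the JSON "message" field come from the experiment above):

```python
import json
import subprocess

def lockspec_already_satisfied(prefix: str, specs: list[str]) -> bool:
    """Sketch: ask micromamba whether installing the lockspec specs into the
    environment at `prefix` (possibly a fake, metadata-only environment as
    generated for --update) would be a no-op."""
    result = subprocess.run(
        ["micromamba", "install", "--dry-run", "--json", "--prefix", prefix, *specs],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:  # e.g. the solve failed outright
        return False
    report = json.loads(result.stdout)
    return report.get("message") == "All requested packages already installed"
```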

Unfortunately, unlike micromamba, conda in my initial experimentation always initiates a full solve. So I'm not sure whether this approach is feasible when using the conda executable. But maybe we could make it work somehow, and if not, perhaps we could restrict our initial implementation to require mamba/micromamba and in the meantime open a feature request with conda.

Another interesting question is whether extra packages should be pruned. After all, an environment with extra packages is still consistent with the lockspec(*). My initial instinct is to default to failing when extra packages are present, but to include an --allow-extra-packages flag to disable the extra-packages check.

(*) Interesting and important edge case where this is not completely true: Conda packages can have run_constrained constraints that literally constrain the versions of packages that are not dependencies and may or may not be installed in any given environment.
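
(For reference, a recipe declares these roughly like so in meta.yaml; the package and pin here are made up:)

```yaml
requirements:
  run:
    - python >=3.9
  run_constrained:
    # constrains pyqt *if* it gets installed, without depending on it
    - pyqt >=5.12
```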
