This repository was archived by the owner on Sep 2, 2025. It is now read-only.

[CT-1105] [Feature] Enable Snowflake imports (stages) in addition to packages for model config #245

@lostmygithubaccount

Description

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Community slack discussion: https://getdbt.slack.com/archives/C03QUA7DWCW/p1661529180748549

Currently, Python models in dbt on Snowflake let you specify packages in the config as a list of Python packages to include. Through this method, Snowflake is limited to the set of packages in its Anaconda channel: https://repo.anaconda.com/pkgs/snowflake/

Custom packages, along with other scenarios (storing ML model artifacts, reading arbitrary config files), are enabled via the IMPORTS clause of a stored procedure. There is a corresponding add_import method in the Snowpark Python API: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/_autosummary/snowflake.snowpark.html#module-snowflake.snowpark
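
For reference, a minimal sketch of what this looks like with the Snowpark API directly (connection_parameters and the stage path are illustrative):

    from snowflake.snowpark import Session

    # connection_parameters is assumed to be defined elsewhere
    session = Session.builder.configs(connection_parameters).create()

    # Register a staged file as an import for stored procedures and UDFs
    # created from this session
    session.add_import("@mystage/myfile.py")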

We should enable this for Snowflake in the config, so:

    dbt.config(
        materialized = "table",
        packages = ["holidays"],
        imports = ["@mystage/myfile.py"]
    )
or in YAML:

- name: mymodel
  config:
    packages:
      - numpy
      - pandas
      - scikit-learn
    imports:
      - "@mystage/file1"
      - "@myotherstage/file2"

The expectation is that these files are available for use from the working directory of the Python process.
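
To illustrate end to end, a hypothetical Python model under this proposal (file1.py and its make_predictions helper are made up for the example):

    def model(dbt, session):
        dbt.config(
            materialized = "table",
            imports = ["@mystage/file1.py"],
        )
        # Per the expectation above, the staged file can be imported from the
        # working directory of the Python process
        import file1
        return file1.make_predictions(session)  # hypothetical helper returning a DataFrame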

Describe alternatives you've considered

Allowing arbitrary object storage for use in the dbt DAG raises tracking/lineage and governance questions. Say you've used an ML model binary, uploaded manually to a stage, to make predictions that power a key metric or dashboard; something goes wrong, but the model was deleted. Is it traceable?

If you're using Python files and importing them (effectively package management), are they version-controlled with git? How do you manage the code?

I think allowing this raises broader questions, but enabling it as-is seems relatively low-effort and high-impact, so it's worth doing sooner rather than later. We can then think about standardizing across backends in another iteration.

Who will this benefit?

Snowpark users of dbt Python models on Snowflake. This is a critical feature for using Python packages outside Snowflake's Anaconda channel, and for other scenarios like machine learning.

Are you interested in contributing this feature?

Maybe ;) no, it requires writing SQL and not just Python ;)

Anything else?

No response
