Merged
61 commits
9c230bd
SNOW-2011595 vendored urllib3 - mask urls before putting them into logs
sfc-gh-mmishchenko Apr 1, 2025
8aff235
SNOW-2011595: Added filters for urllib3 leaks
sfc-gh-fpawlowski Apr 4, 2025
50dbeb6
SNOW-2011595: Added formatter for urllib3 leaks
sfc-gh-fpawlowski Apr 4, 2025
57525b9
SNOW-2011595: Removed explicit masking
sfc-gh-fpawlowski Apr 4, 2025
53eb4ff
SNOW-2011595: Description update
sfc-gh-fpawlowski Apr 4, 2025
c4556ec
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-fpawlowski Apr 4, 2025
d7deb36
SNOW-2011595: Leaks tests updated
sfc-gh-fpawlowski Apr 4, 2025
eb4b8cb
SNOW-2011595: External libraries setup added.
sfc-gh-fpawlowski Apr 4, 2025
b107acf
SNOW-2011595: External libraries setup added.
sfc-gh-fpawlowski Apr 4, 2025
1cbe181
SNOW-2011595: Imports fixed
sfc-gh-fpawlowski Apr 4, 2025
dc56b02
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-fpawlowski Apr 4, 2025
c3266c0
SNOW-2011595: Detailed module path specified
sfc-gh-fpawlowski Apr 4, 2025
d14cb01
SNOW-2011595: Test push
sfc-gh-fpawlowski Apr 4, 2025
b6c1277
SNOW-2011595: Made filter to mask secrets
sfc-gh-fpawlowski Apr 6, 2025
e98d270
SNOW-2011595: Merged main
sfc-gh-fpawlowski Apr 6, 2025
7344c3f
SNOW-2011595: Added more test logs
sfc-gh-fpawlowski Apr 6, 2025
9cff666
SNOW-2011595: Rolledback workflow
sfc-gh-fpawlowski Apr 6, 2025
b4be608
SNOW-2011595: Added more test logs
sfc-gh-fpawlowski Apr 6, 2025
186043b
SNOW-2011595: Made filters applied to children loggers as well
sfc-gh-fpawlowski Apr 7, 2025
f96ad88
SNOW-2011595: Added docs
sfc-gh-fpawlowski Apr 8, 2025
bdf60f2
SNOW-2011595: Added checks for connection pool usage
sfc-gh-fpawlowski Apr 8, 2025
f2ec811
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-fpawlowski Apr 8, 2025
2e5460c
SNOW-2011595: Made test_put with aws similar to azure to trigger conn…
sfc-gh-fpawlowski Apr 8, 2025
ee99611
Merge remote-tracking branch 'origin/SNOW-2011595-snowflake-connector…
sfc-gh-fpawlowski Apr 8, 2025
cb1071e
SNOW-2011595: Added better checks for azure and aws calls
sfc-gh-fpawlowski Apr 8, 2025
e6cc6a2
SNOW-2011595: Make test run first
sfc-gh-fpawlowski Apr 8, 2025
f7697c0
SNOW-2011595: Make debug logs gh actions
sfc-gh-fpawlowski Apr 8, 2025
d1fa778
SNOW-2011595: Run only selected job
sfc-gh-fpawlowski Apr 8, 2025
b7002ed
SNOW-2011595: Run only selected job
sfc-gh-fpawlowski Apr 8, 2025
c5e20f9
SNOW-2011595: Run only selected job
sfc-gh-fpawlowski Apr 8, 2025
bddb333
SNOW-2011595: Run only selected job
sfc-gh-fpawlowski Apr 8, 2025
2111904
SNOW-2011595: Reverted unit drop
sfc-gh-fpawlowski Apr 9, 2025
5af4123
SNOW-2011595: Reverted unit drop
sfc-gh-fpawlowski Apr 9, 2025
5244d2b
SNOW-2011595: Reverted unit drop
sfc-gh-fpawlowski Apr 9, 2025
fefa9a5
SNOW-2011595: Reverted unit drop
sfc-gh-fpawlowski Apr 9, 2025
6d2419c
SNOW-2011595: Test push
sfc-gh-fpawlowski Apr 9, 2025
a61c30d
SNOW-2011595: Fixed to have logs for aws
sfc-gh-fpawlowski Apr 9, 2025
94a56b6
SNOW-2011595: Fixed to have logs for aws
sfc-gh-fpawlowski Apr 9, 2025
941d259
SNOW-2011595: Fixed to have logs for aws
sfc-gh-fpawlowski Apr 9, 2025
ab21027
SNOW-2011595: Run single test on gh actions
sfc-gh-fpawlowski Apr 9, 2025
6964602
SNOW-2011595: Added role to connections to preprod. Added correct tes…
sfc-gh-fpawlowski Apr 9, 2025
02fbbd7
SNOW-2011595: Reverted changes in gh actions
sfc-gh-fpawlowski Apr 9, 2025
ceea3ea
SNOW-2011595: Made sure no regression happens for aws
sfc-gh-fpawlowski Apr 9, 2025
7eee1be
Merge branch 'main' into fpawlowski/SNOW-2011595-snowflake-connector-…
sfc-gh-fpawlowski Apr 9, 2025
b221a41
SNOW-2011595: Run formatting
sfc-gh-fpawlowski Apr 9, 2025
ad13d12
SNOW-2011595: Comments cleanup
sfc-gh-fpawlowski Apr 9, 2025
0511204
SNOW-2011595: Fixed role not present on jenkins
sfc-gh-fpawlowski Apr 9, 2025
7dbb47c
SNOW-2011595: Fixed role not present on jenkins
sfc-gh-fpawlowski Apr 9, 2025
223fe6b
Fpawlowski/snow 2011595 snowflake connector for python logging presig…
sfc-gh-fpawlowski Apr 9, 2025
8217a22
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-fpawlowski Apr 9, 2025
1ce8e15
SNOW-2011595: Unified attributes access for old driver
sfc-gh-fpawlowski Apr 9, 2025
e7de9e4
Merge remote-tracking branch 'origin/SNOW-2011595-snowflake-connector…
sfc-gh-fpawlowski Apr 9, 2025
fd6632b
SNOW-1886670: Fix spacing
sfc-gh-fpawlowski Apr 9, 2025
30d9557
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-fpawlowski Apr 10, 2025
e571de3
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-fpawlowski Apr 11, 2025
fe5f780
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-mmishchenko Apr 14, 2025
8d8e13f
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-mmishchenko Apr 14, 2025
bee08e4
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-mmishchenko Apr 15, 2025
be1ac3a
SNOW-2011595: Review comments added
sfc-gh-fpawlowski Apr 15, 2025
db1fe28
Merge remote-tracking branch 'origin/SNOW-2011595-snowflake-connector…
sfc-gh-fpawlowski Apr 15, 2025
0b7bb42
Merge branch 'main' into SNOW-2011595-snowflake-connector-for-python-…
sfc-gh-mmishchenko Apr 15, 2025
2 changes: 1 addition & 1 deletion .github/workflows/build_test.yml
Expand Up @@ -20,7 +20,7 @@ on:
description: "Test scenario tags"

concurrency:
# older builds for the same pull request numer or branch should be cancelled
# older builds for the same pull request number or branch should be cancelled
cancel-in-progress: true
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

1 change: 1 addition & 0 deletions DESCRIPTION.md
Expand Up @@ -21,6 +21,7 @@ Source code is also available at: https://github.com/snowflakedb/snowflake-conne
- Added `check_arrow_conversion_error_on_every_column` connection property that can be set to `False` to restore previous behaviour in which driver will ignore errors until it occurs in the last column. This flag's purpose is to unblock workflows that may be impacted by the bugfix and will be removed in later releases.
- Lower log levels from info to debug for some of the messages to make the output easier to follow.
- Allow the connector to inherit a UUID4 generated upstream, provided in statement parameters (field: `requestId`), rather than automatically generate a UUID4 to use for the HTTP Request ID.
- Improved logging in urllib3, boto3, botocore - ensured data masking even after a future migration to the externally owned library.
- Fix expired S3 credentials update and increment retry when expired credentials are found.
- Added `client_fetch_threads` experimental parameter to better utilize threads for fetching query results.

3 changes: 3 additions & 0 deletions src/snowflake/connector/__init__.py
Expand Up @@ -12,6 +12,8 @@
import logging
from logging import NullHandler

from snowflake.connector.externals_utils.externals_setup import setup_external_libraries

from .connection import SnowflakeConnection
from .cursor import DictCursor
from .dbapi import (
Expand Down Expand Up @@ -44,6 +46,7 @@
from .version import VERSION

logging.getLogger(__name__).addHandler(NullHandler())
setup_external_libraries()


@wraps(SnowflakeConnection.__init__)
18 changes: 1 addition & 17 deletions src/snowflake/connector/azure_storage_client.py
Expand Up @@ -5,7 +5,7 @@
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from logging import Filter, getLogger
from logging import getLogger
from random import choice
from string import hexdigits
from typing import TYPE_CHECKING, Any, NamedTuple
Expand Down Expand Up @@ -37,22 +37,6 @@ class AzureLocation(NamedTuple):
MATDESC = "x-ms-meta-matdesc"


class AzureCredentialFilter(Filter):
LEAKY_FMT = '%s://%s:%s "%s %s %s" %s %s'

def filter(self, record):
if record.msg == AzureCredentialFilter.LEAKY_FMT and len(record.args) == 8:
record.args = (
record.args[:4] + (record.args[4].split("?")[0],) + record.args[5:]
)
return True


getLogger("snowflake.connector.vendored.urllib3.connectionpool").addFilter(
AzureCredentialFilter()
)


class SnowflakeAzureRestClient(SnowflakeStorageClient):
def __init__(
self,
4 changes: 2 additions & 2 deletions src/snowflake/connector/cursor.py
Expand Up @@ -910,8 +910,8 @@ def execute(
_exec_async: Whether to execute this query asynchronously.
_no_retry: Whether or not to retry on known errors.
_do_reset: Whether or not the result set needs to be reset before executing query.
_put_callback: Function to which GET command should call back to.
_put_azure_callback: Function to which an Azure GET command should call back to.
_put_callback: Function to which PUT command should call back to.
_put_azure_callback: Function to which an Azure PUT command should call back to.
_put_callback_output_stream: The output stream a PUT command's callback should report on.
_get_callback: Function to which GET command should call back to.
_get_azure_callback: Function to which an Azure GET command should call back to.
Empty file.
27 changes: 27 additions & 0 deletions src/snowflake/connector/externals_utils/externals_setup.py
@@ -0,0 +1,27 @@
from __future__ import annotations

from snowflake.connector.logging_utils.filters import (
    SecretMaskingFilter,
    add_filter_to_logger_and_children,
)

MODULES_TO_MASK_LOGS_NAMES = [
    "snowflake.connector.vendored.urllib3",
    "botocore",
    "boto3",
]
# TODO: after migration to the external urllib3 from the vendored one (SNOW-2041970),
#  we should change filters here immediately to the below module's logger:
#  MODULES_TO_MASK_LOGS_NAMES = [ "urllib3", ... ]


def add_filters_to_external_loggers():
    for module_name in MODULES_TO_MASK_LOGS_NAMES:
        add_filter_to_logger_and_children(module_name, SecretMaskingFilter())


def setup_external_libraries():
    """
    Assures proper setup and injections before any external libraries are used.
    """
    add_filters_to_external_loggers()
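Reviewer note: the reason the helper walks the logger hierarchy is that a filter attached to a logger is only consulted for records created on that exact logger. The sketch below uses only the standard library to show this; `CountingFilter`, `attach_to_logger_and_children`, and the `demo_lib` logger names are hypothetical stand-ins, not code from this PR. It also suggests that a child logger created after the helper has run would not be covered, which may be worth keeping in mind for libraries that are imported lazily.

```python
import logging


class CountingFilter(logging.Filter):
    """Hypothetical filter that only counts the records it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.seen = 0

    def filter(self, record: logging.LogRecord) -> bool:
        self.seen += 1
        return True


def attach_to_logger_and_children(base_name: str, flt: logging.Filter) -> None:
    """Simplified stand-in for add_filter_to_logger_and_children."""
    candidates = [logging.getLogger(base_name)]
    for name, obj in list(logging.root.manager.loggerDict.items()):
        # loggerDict also holds PlaceHolder objects, which are not loggers.
        if name.startswith(base_name + ".") and isinstance(obj, logging.Logger):
            candidates.append(obj)
    for logger in candidates:
        if flt not in logger.filters:
            logger.addFilter(flt)


parent = logging.getLogger("demo_lib")
child = logging.getLogger("demo_lib.pool")
parent.addHandler(logging.NullHandler())
parent.setLevel(logging.DEBUG)

# A filter on the parent logger alone is not consulted for child records.
parent_only = CountingFilter()
parent.addFilter(parent_only)
child.debug("logged on the child")

# Attaching the same instance to every existing logger in the hierarchy.
everywhere = CountingFilter()
attach_to_logger_and_children("demo_lib", everywhere)
child.debug("logged on the child again")

# A logger created afterwards does not carry the filter.
late_child = logging.getLogger("demo_lib.created_later")
late_child.debug("logger created after the helper ran")
```

In this sketch `parent_only` never sees a record, while `everywhere` sees exactly the one record logged on `demo_lib.pool` after the helper ran.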
Empty file.
72 changes: 72 additions & 0 deletions src/snowflake/connector/logging_utils/filters.py
@@ -0,0 +1,72 @@
from __future__ import annotations

import logging

from snowflake.connector.secret_detector import SecretDetector


def add_filter_to_logger_and_children(
    base_logger_name: str, filter_instance: logging.Filter
) -> None:
    # Ensure the base logger exists and apply filter
    base_logger = logging.getLogger(base_logger_name)
    if filter_instance not in base_logger.filters:
        base_logger.addFilter(filter_instance)

    all_loggers_pairs = logging.root.manager.loggerDict.items()
    for name, obj in all_loggers_pairs:
        if not name.startswith(base_logger_name + "."):
            continue

        if not isinstance(obj, logging.Logger):
            continue  # Skip placeholders

        if filter_instance not in obj.filters:
            obj.addFilter(filter_instance)


class SecretMaskingFilter(logging.Filter):
    """
    A logging filter that masks sensitive information in log messages using the SecretDetector utility.

    This filter is designed for scenarios where you want to avoid applying SecretDetector globally
    as a formatter on all logging handlers. Global masking can introduce unnecessary computational
    overhead, particularly for internal logs where secrets are already handled explicitly.
    It would also be easy to bypass unintentionally, simply by adding a neighbouring handler to a logger
    without SecretDetector set as its formatter.

    On the other hand, libraries or submodules often do not have any handler attached, so formatting can't be
    configured at that level, while attaching a new handler for that purpose can cause unintended or duplicated log output.

    ⚠ Important:
        - Logging filters do **not** propagate down the logger hierarchy.
          To apply this filter across a hierarchy, use the `add_filter_to_logger_and_children` utility.
        - This filter causes **early formatting** of the log message (`record.getMessage()`),
          meaning `record.args` are merged into `record.msg` prematurely.
          If you rely on `record.args`, ensure this is the **last** filter in the chain.

    Notes:
        - The filter directly modifies `record.msg` with the masked version of the message.
        - It clears `record.args` to prevent re-formatting and ensure safe message output.

    Example:
        logger.addFilter(SecretMaskingFilter())
        handler.addFilter(SecretMaskingFilter())
    """

    def filter(self, record: logging.LogRecord) -> bool:
        try:
            # Format the message as it would be
            message = record.getMessage()

            # Run masking on the whole message
            masked_data = SecretDetector.mask_secrets(message)
            record.msg = masked_data.masked_text
        except Exception as ex:
            record.msg = SecretDetector.create_formatting_error_log(
                record, "EXCEPTION - " + str(ex)
            )
        finally:
            record.args = ()  # Avoid re-applying formatting

        return True  # allow all logs through
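Reviewer note: the early-formatting behaviour described in the docstring can be illustrated with a minimal sketch built on the standard library only. `DemoMaskingFilter`, `ListHandler`, the `demo_masking.connectionpool` logger, and the single regex are assumptions made for illustration; the real filter delegates to `SecretDetector.mask_secrets`, which applies more patterns. The fallback branch here deliberately avoids `record.asctime`, since that attribute is set by `Formatter.format` rather than when the record is created.

```python
import logging
import re

# Hypothetical stand-in for SecretDetector.mask_secrets: it masks only the
# value of an X-Amz-Signature query parameter.
SIGNATURE_PATTERN = re.compile(r'(X-Amz-Signature=)[^&\s"]+', re.IGNORECASE)


class DemoMaskingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        try:
            # Merge record.args into the message, then mask the final text.
            record.msg = SIGNATURE_PATTERN.sub(r"\1****", record.getMessage())
        except Exception as ex:
            # record.asctime is set by Formatter.format, not at record
            # creation, so this fallback does not read it.
            record.msg = f"{record.levelname} - EXCEPTION - {ex}"
        finally:
            record.args = ()  # the message is already fully formatted
        return True


class ListHandler(logging.Handler):
    """Collects formatted messages so the result can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


demo_logger = logging.getLogger("demo_masking.connectionpool")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_handler = ListHandler()
demo_logger.addHandler(demo_handler)
demo_logger.addFilter(DemoMaskingFilter())

# The secret arrives through record.args, as in urllib3-style log calls.
demo_logger.debug(
    '%s://%s:%s "%s %s"',
    "https",
    "bucket.example.com",
    443,
    "GET",
    "/file?X-Amz-Signature=abcdef0123456789&X-Amz-Expires=3600",
)
```

Because masking runs on the merged message, the secret is caught even though it was passed as a format argument rather than being part of the format string, which is the case the removed `AzureCredentialFilter` handled by matching one specific format.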
71 changes: 47 additions & 24 deletions src/snowflake/connector/secret_detector.py
Expand Up @@ -10,11 +10,18 @@
import logging
import os
import re
from typing import NamedTuple

MIN_TOKEN_LEN = os.getenv("MIN_TOKEN_LEN", 32)
MIN_PWD_LEN = os.getenv("MIN_PWD_LEN", 8)


class MaskedMessageData(NamedTuple):
is_masked: bool = False
masked_text: str | None = None
error_str: str | None = None


class SecretDetector(logging.Formatter):
AWS_KEY_PATTERN = re.compile(
r"(aws_key_id|aws_secret_key|access_key_id|secret_access_key)\s*=\s*'([^']+)'",
Expand Down Expand Up @@ -48,21 +55,31 @@ class SecretDetector(logging.Formatter):
flags=re.IGNORECASE,
)

SECRET_STARRED_MASK_STR = "****"

@staticmethod
def mask_connection_token(text: str) -> str:
return SecretDetector.CONNECTION_TOKEN_PATTERN.sub(r"\1\2****", text)
return SecretDetector.CONNECTION_TOKEN_PATTERN.sub(
r"\1\2" + f"{SecretDetector.SECRET_STARRED_MASK_STR}", text
)

@staticmethod
def mask_password(text: str) -> str:
return SecretDetector.PASSWORD_PATTERN.sub(r"\1\2****", text)
return SecretDetector.PASSWORD_PATTERN.sub(
r"\1\2" + f"{SecretDetector.SECRET_STARRED_MASK_STR}", text
)

@staticmethod
def mask_aws_keys(text: str) -> str:
return SecretDetector.AWS_KEY_PATTERN.sub(r"\1='****'", text)
return SecretDetector.AWS_KEY_PATTERN.sub(
r"\1=" + f"'{SecretDetector.SECRET_STARRED_MASK_STR}'", text
)

@staticmethod
def mask_sas_tokens(text: str) -> str:
return SecretDetector.SAS_TOKEN_PATTERN.sub(r"\1=****", text)
return SecretDetector.SAS_TOKEN_PATTERN.sub(
r"\1=" + f"{SecretDetector.SECRET_STARRED_MASK_STR}", text
)

@staticmethod
def mask_aws_tokens(text: str) -> str:
Expand All @@ -81,17 +98,17 @@ def mask_private_key_data(text: str) -> str:
)

@staticmethod
def mask_secrets(text: str) -> tuple[bool, str, str | None]:
def mask_secrets(text: str) -> MaskedMessageData:
"""Masks any secrets. This is the method that should be used by outside classes.

Args:
text: A string which may contain a secret.

Returns:
The masked string.
The masked string data in MaskedMessageData.
"""
if text is None:
return (False, None, None)
return MaskedMessageData()

masked = False
err_str = None
Expand Down Expand Up @@ -119,7 +136,20 @@ def mask_secrets(text: str) -> tuple[bool, str, str | None]:
masked_text = str(ex)
err_str = str(ex)

return masked, masked_text, err_str
return MaskedMessageData(masked, masked_text, err_str)

@staticmethod
def create_formatting_error_log(
original_record: logging.LogRecord, error_message: str
) -> str:
return "{} - {} {} - {} - {} - {}".format(
original_record.asctime,
original_record.threadName,
"secret_detector.py",
"sanitize_log_str",
original_record.levelname,
error_message,
)

def format(self, record: logging.LogRecord) -> str:
"""Wrapper around logging module's formatter.
Expand All @@ -134,25 +164,18 @@ def format(self, record: logging.LogRecord) -> str:
"""
try:
unsanitized_log = super().format(record)
masked, sanitized_log, err_str = SecretDetector.mask_secrets(
masked, optional_sanitized_log, err_str = SecretDetector.mask_secrets(
unsanitized_log
)
# Added to comply with type hints (Optional[str] is not accepted for str)
sanitized_log = optional_sanitized_log or ""

if masked and err_str is not None:
sanitized_log = "{} - {} {} - {} - {} - {}".format(
record.asctime,
record.threadName,
"secret_detector.py",
"sanitize_log_str",
record.levelname,
err_str,
)
sanitized_log = self.create_formatting_error_log(record, err_str)

except Exception as ex:
sanitized_log = "{} - {} {} - {} - {} - {}".format(
record.asctime,
record.threadName,
"secret_detector.py",
"sanitize_log_str",
record.levelname,
"EXCEPTION - " + str(ex),
sanitized_log = self.create_formatting_error_log(
record, "EXCEPTION - " + str(ex)
)

return sanitized_log
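Reviewer note: switching the return type of `mask_secrets` from a plain tuple to a `NamedTuple` keeps positional unpacking working, which is why `format()` can still unpack three values. A minimal sketch, assuming a single hypothetical password rule (`PASSWORD_PATTERN` and `mask_demo` are illustrative names, not part of the connector):

```python
from __future__ import annotations

import re
from typing import NamedTuple


class MaskedMessageData(NamedTuple):
    is_masked: bool = False
    masked_text: str | None = None
    error_str: str | None = None


# Hypothetical single-rule masker; the real SecretDetector applies several patterns.
PASSWORD_PATTERN = re.compile(r"(password\s*=\s*)\S+", re.IGNORECASE)


def mask_demo(text: str | None) -> MaskedMessageData:
    if text is None:
        return MaskedMessageData()  # all defaults: (False, None, None)
    masked_text, count = PASSWORD_PATTERN.subn(r"\1****", text)
    return MaskedMessageData(count > 0, masked_text, None)


# New-style access by field name.
result = mask_demo("connecting with password=hunter2secret")

# Old-style positional unpacking still works for existing callers.
masked, text, err = mask_demo("nothing to hide")
```

Callers that index or unpack the result keep working, while new code such as the filter can read `masked_text` by name.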
26 changes: 16 additions & 10 deletions test/integ/conftest.py
Expand Up @@ -164,16 +164,22 @@ def init_test_schema(db_parameters) -> Generator[None]:

This is automatically called per test session.
"""
ret = db_parameters
with snowflake.connector.connect(
user=ret["user"],
password=ret["password"],
host=ret["host"],
port=ret["port"],
database=ret["database"],
account=ret["account"],
protocol=ret["protocol"],
) as con:
connection_params = {
"user": db_parameters["user"],
"password": db_parameters["password"],
"host": db_parameters["host"],
"port": db_parameters["port"],
"database": db_parameters["database"],
"account": db_parameters["account"],
"protocol": db_parameters["protocol"],
}

# Role may be needed when running on preprod, but is not present on Jenkins jobs
optional_role = db_parameters.get("role")
if optional_role is not None:
connection_params.update(role=optional_role)

with snowflake.connector.connect(**connection_params) as con:
con.cursor().execute(f"CREATE SCHEMA IF NOT EXISTS {TEST_SCHEMA}")
yield
con.cursor().execute(f"DROP SCHEMA IF EXISTS {TEST_SCHEMA}")
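Reviewer note: the optional-role handling in this fixture boils down to building the keyword arguments first and adding `role` only when it is configured. A small sketch of that pattern, where `build_connection_params` is a hypothetical helper name and the parameter values are made up:

```python
def build_connection_params(db_parameters: dict) -> dict:
    """Copy the required keys and add 'role' only if it is configured."""
    required = ("user", "password", "host", "port", "database", "account", "protocol")
    params = {key: db_parameters[key] for key in required}

    # 'role' may be absent (for example on some CI jobs), so it stays optional.
    role = db_parameters.get("role")
    if role is not None:
        params["role"] = role
    return params


base = {
    "user": "u",
    "password": "p",
    "host": "h",
    "port": 443,
    "database": "d",
    "account": "a",
    "protocol": "https",
}
without_role = build_connection_params(base)
with_role = build_connection_params({**base, "role": "tester"})
```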
21 changes: 20 additions & 1 deletion test/integ/test_large_result_set.py
@@ -1,10 +1,12 @@
#!/usr/bin/env python
from __future__ import annotations

import logging
from unittest.mock import Mock

import pytest

from snowflake.connector.secret_detector import SecretDetector
from snowflake.connector.telemetry import TelemetryField

NUMBER_OF_ROWS = 50000
Expand Down Expand Up @@ -111,8 +113,9 @@ def test_query_large_result_set_n_threads(

@pytest.mark.aws
@pytest.mark.skipolddriver
def test_query_large_result_set(conn_cnx, db_parameters, ingest_data):
def test_query_large_result_set(conn_cnx, db_parameters, ingest_data, caplog):
"""[s3] Gets Large Result set."""
caplog.set_level(logging.DEBUG)
sql = "select * from {name} order by 1".format(name=db_parameters["name"])
with conn_cnx() as cnx:
telemetry_data = []
Expand Down Expand Up @@ -161,3 +164,19 @@ def test_query_large_result_set(conn_cnx, db_parameters, ingest_data):
"Expected three telemetry logs (one per query) "
"for log type {}".format(field.value)
)

aws_request_present = False
expected_token_prefix = "X-Amz-Signature="
for line in caplog.text.splitlines():
if expected_token_prefix in line:
aws_request_present = True
# getattr is used to stay compatible with old driver - before SECRET_STARRED_MASK_STR was added
assert (
expected_token_prefix
+ getattr(SecretDetector, "SECRET_STARRED_MASK_STR", "****")
in line
), "connectionpool logger is leaking sensitive information"

assert (
aws_request_present
), "AWS URL was not found in logs, so it can't be assumed that no leaks happened in it"
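Reviewer note: the two assertions above check both that a signed URL was actually logged and that every such line carries the mask. The same logic can be sketched as a standalone helper; `check_masked` and the sample log lines are hypothetical and only illustrate the shape of the check:

```python
from __future__ import annotations


def check_masked(log_text: str, prefix: str, mask: str = "****") -> tuple[bool, list[str]]:
    """Return (prefix_seen, leaking_lines) for the captured log text."""
    prefix_seen = False
    leaking = []
    for line in log_text.splitlines():
        if prefix in line:
            prefix_seen = True
            # A line is considered safe only if the prefix is followed by the mask.
            if prefix + mask not in line:
                leaking.append(line)
    return prefix_seen, leaking


sample_log = "\n".join(
    [
        'DEBUG pool "GET /a?X-Amz-Signature=****&X-Amz-Expires=3600"',
        'DEBUG pool "GET /b?X-Amz-Signature=deadbeef&X-Amz-Expires=3600"',
        "DEBUG unrelated line",
    ]
)
seen, leaks = check_masked(sample_log, "X-Amz-Signature=")
```

Requiring `seen` to be true guards against a vacuous pass: if no signed URL is logged at all, the absence of leaks proves nothing.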