FileSource not accepting S3 endpoints as path #4993

@ShaktidharK1997

Description

Expected Behavior

Return a FileSource that is connected to the S3 endpoint.

Current Behavior

  File "C:\Users\shakt\anaconda3\envs\feast_test_env\Lib\site-packages\feast\inference.py", line 180, in update_feature_views_with_inferred_features_and_entities
    _infer_features_and_entities(
  File "C:\Users\shakt\anaconda3\envs\feast_test_env\Lib\site-packages\feast\inference.py", line 230, in _infer_features_and_entities
    provider.get_table_column_names_and_types_from_data_source(
  File "C:\Users\shakt\anaconda3\envs\feast_test_env\Lib\site-packages\feast\infra\passthrough_provider.py", line 526, in get_table_column_names_and_types_from_data_source
    return self.offline_store.get_table_column_names_and_types_from_data_source(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shakt\anaconda3\envs\feast_test_env\Lib\site-packages\feast\infra\offline_stores\offline_store.py", line 390, in get_table_column_names_and_types_from_data_source
    return data_source.get_table_column_names_and_types(config=config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shakt\anaconda3\envs\feast_test_env\Lib\site-packages\feast\infra\offline_stores\file_source.py", line 181, in get_table_column_names_and_types
    schema = ParquetDataset(path, **kwargs).schema
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shakt\anaconda3\envs\feast_test_env\Lib\site-packages\pyarrow\parquet\core.py", line 1348, in __init__
    finfo = filesystem.get_file_info(path_or_paths)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\\_fs.pyx", line 590, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow\\error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\\error.pxi", line 92, in pyarrow.lib.check_status
OSError: [WinError 123] Failed querying information for path 'C:/Users/shakt/Documents/GIT/feast-artifact/feature_repo/s3:/bucket/flights.parquet'. 
Detail: [Windows error 123] The filename, directory name, or volume label syntax is incorrect.
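
The path in the error suggests that the `s3://` URI was treated as a relative local path and joined onto the feature repo directory. The sketch below reproduces that string with `pathlib` alone and shows one possible guard. `resolve_source_path` is a hypothetical helper written for illustration, not Feast's actual code; `PureWindowsPath` is used so the result is the same on any platform.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

# Values taken from the traceback above.
REPO_PATH = "C:/Users/shakt/Documents/GIT/feast-artifact/feature_repo"
S3_URI = "s3://bucket/flights.parquet"

# Naive join: "s3:" is not a drive letter, so the URI is handled as a
# relative path, appended to the repo directory, and "//" collapses to "/".
mangled = (PureWindowsPath(REPO_PATH) / S3_URI).as_posix()


def resolve_source_path(repo_path: str, path: str) -> str:
    """Hypothetical guard: only join local, relative paths onto repo_path."""
    parsed = urlparse(path)
    # A scheme plus a netloc (e.g. s3://bucket/...) marks a remote URI.
    # A Windows path such as C:/data has a one-letter "scheme" but no netloc.
    if parsed.scheme and parsed.netloc:
        return path
    candidate = PureWindowsPath(path)
    if candidate.is_absolute():
        return candidate.as_posix()
    return (PureWindowsPath(repo_path) / candidate).as_posix()


if __name__ == "__main__":
    print("naive join:", mangled)
    print("guarded   :", resolve_source_path(REPO_PATH, S3_URI))
    print("relative  :", resolve_source_path(REPO_PATH, "data/flights.parquet"))
```

With a guard of this kind, remote URIs would reach `ParquetDataset` unchanged, while relative local paths would still resolve against the repo directory.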

Steps to reproduce

  1. Create a MinIO object store as a Docker container exposing port 9000.
  2. Run the code below to connect to that MinIO container as a FileSource:
from feast import FileSource
from feast.data_format import ParquetFormat

bucket_name = "bucket"
file_name = "flights.parquet"
# Plain http, since the MinIO container is used without SSL (use_ssl=False)
s3_endpoint = "http://localhost:9000"

# Define the data source for flight data
flight_stats_source = FileSource(
    path=f"s3://{bucket_name}/{file_name}",
    timestamp_field="FlightDate",
    file_format=ParquetFormat(),
    s3_endpoint_override=s3_endpoint,
)

Specifications

  • Version: 0.43.0
  • Platform: Windows
  • Subsystem:

Possible Solution

Revert to the previous code, as mentioned in #4753.
