Introduce Polars for dumping and loading data #457

Open: wants to merge 16 commits into base: master


hirosassa
Collaborator

@hirosassa hirosassa commented Mar 10, 2025

fixes #304

This PR introduces polars into gokart.
The implementation is currently very rough, and I have not checked its correctness yet.
Feel free to comment on the software architecture and implementation.

@hirosassa hirosassa marked this pull request as draft March 10, 2025 22:04
@hirosassa hirosassa changed the title from "draft: temp implementation" to "draft: Introduce Polars for dumping and loading data" Mar 10, 2025
except pd.errors.EmptyDataError:
    return pd.DataFrame()
except ImportError:
    import polars as pl
Contributor

Hmm, it is a little confusing to fall back to polars when an ImportError occurs.

I think we need a global feature flag to switch dataframe frameworks.
For backward compatibility, pandas would be the default, and we would add a configuration function like the following.

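from typing import Literal

DATAFRAME_FRAMEWORK = 'pandas'  # pandas stays the default for backward compatibility


def setup_dataframe_framework(framework: Literal['pandas', 'polars']) -> None:
    # Rebind the module-level flag instead of creating a local variable.
    global DATAFRAME_FRAMEWORK
    if framework == 'polars':
        try:
            import polars  # noqa: F401  (only checks that polars is installed)
        except ImportError:
            raise RuntimeError(...)
    DATAFRAME_FRAMEWORK = framework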

We can then switch the implementation according to the flag.

In addition, we could probably select the processor class in the same way, though I have not checked that this works...
This would keep each CsvFileProcessor simple.

if DATAFRAME_FRAMEWORK == 'pandas':
    CsvFileProcessor = PandasCsvFileProcessor
elif DATAFRAME_FRAMEWORK == 'polars':
    CsvFileProcessor = PolarsCsvFileProcessor
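As a rough illustration of the idea above, the flag-based switch could look something like the following. This is only a sketch under assumptions: the class names come from the comment, but their bodies, the `load` method, and the `select_csv_processor` helper are hypothetical and are not gokart's actual implementation. The dataframe libraries are imported lazily so that the module itself can be imported with only one of them installed.

```python
DATAFRAME_FRAMEWORK = 'pandas'  # assumed module-level flag, pandas by default


class PandasCsvFileProcessor:
    """Hypothetical pandas-backed processor (not gokart's real class body)."""

    def load(self, path: str):
        import pandas as pd  # imported lazily, only when this backend is used

        try:
            return pd.read_csv(path)
        except pd.errors.EmptyDataError:
            return pd.DataFrame()


class PolarsCsvFileProcessor:
    """Hypothetical polars-backed processor (not gokart's real class body)."""

    def load(self, path: str):
        import polars as pl  # imported lazily, only when this backend is used

        return pl.read_csv(path)


def select_csv_processor(framework: str) -> type:
    """Map the flag value to a processor class, rejecting unknown values."""
    processors = {
        'pandas': PandasCsvFileProcessor,
        'polars': PolarsCsvFileProcessor,
    }
    if framework not in processors:
        raise ValueError(f'unknown dataframe framework: {framework!r}')
    return processors[framework]


# Callers keep using a single name, whichever backend is selected.
CsvFileProcessor = select_csv_processor(DATAFRAME_FRAMEWORK)
```

With this shape, each processor only deals with one library, and the choice between them is made in exactly one place.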

Contributor

import os
from typing import Protocol, Type


class IFeature(Protocol):
    def run(self) -> None: ...


class Feature1:
    def __init__(self): ...
    def run(self):
        print('feature1')


class Feature2:
    def __init__(self): ...
    def run(self):
        print('feature2')


Feature: Type[IFeature]
if os.environ.get('FEATURE') == '1':
    Feature = Feature1
elif os.environ.get('FEATURE') == '2':
    Feature = Feature2
else:
    raise ValueError("Invalid FEATURE environment variable value. Please set it to '1' or '2'.")


Feature().run()

❯ uv run foo.py
Traceback (most recent call last):
  File "gokart/foo.py", line 27, in <module>
    raise ValueError("Invalid FEATURE environment variable value. Please set it to '1' or '2'.")
ValueError: Invalid FEATURE environment variable value. Please set it to '1' or '2'.
❯ FEATURE=1 uv run foo.py
feature1
❯ FEATURE=2 uv run foo.py
feature2

Switching the class via an environment variable works!

Collaborator Author

@hirosassa hirosassa Mar 15, 2025

@hiro-o918 Thanks for the comment. I applied your suggestion. I think it looks fine!

@hirosassa hirosassa marked this pull request as ready for review March 22, 2025 08:49
Collaborator Author

@hirosassa hirosassa left a comment

comments added

Comment on lines +10 to +11
pl = pytest.importorskip('polars', reason='polars required')
pl_testing = pytest.importorskip('polars.testing', reason='polars required')
Collaborator Author

skip if polars is not installed

Comment on lines +12 to +13
polars_installed = importlib.util.find_spec('polars') is not None
pytestmark = pytest.mark.skipif(polars_installed, reason='polars installed, skip pandas tests')
Collaborator Author

skip if polars is installed
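The two skip rules above are complementary: the polars tests are skipped when polars is missing, and the pandas tests are skipped when polars is present, so each environment runs exactly one of the two suites. A minimal stdlib-only sketch of that decision follows. `is_installed` and `suites_to_run` are hypothetical helper names used for illustration; the PR itself expresses the same conditions through `pytest.importorskip` and `pytest.mark.skipif`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def suites_to_run(polars_installed: bool) -> list[str]:
    """Mirror the two skip rules: exactly one suite runs per environment."""
    # polars tests are skipped when polars is missing (importorskip);
    # pandas tests are skipped when polars is present (skipif).
    return ['polars'] if polars_installed else ['pandas']


print(suites_to_run(is_installed('polars')))
```

One consequence of this design is that covering both backends requires two test runs, one with the polars extra installed and one without.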

Comment on lines +33 to +34
- name: Test with tox for polars extra
  run: uvx --with tox-uv tox run -e ${{ matrix.tox-env }}-polars
Collaborator Author

add the test run with polars extra

Comment on lines +3 to +4
labels =
    polars = py{39,310,311,312,313}-polars
Collaborator Author

labels with extra branching
ref: tox-dev/tox#2406

Comment on lines +12 to +13
extras =
    polars: polars
Collaborator Author

if the polars label is specified, add polars to dependencies
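To make the tox settings above more concrete, here is a small sketch of which environment names the brace pattern stands for and when the `polars: polars` line applies. This is a simplified re-implementation for illustration only, not tox's own parser: `expand_envlist` and `extras_for` are hypothetical names, and the sketch ignores whitespace, nesting, and other syntax that tox supports.

```python
import itertools
import re


def expand_envlist(pattern: str) -> list[str]:
    """Expand tox-style `{a,b}` groups into concrete environment names."""
    # Split into literal text and brace groups: 'py', '{39,310}', '-polars'.
    parts = re.split(r'(\{[^{}]*\})', pattern)
    choices = [p[1:-1].split(',') if p.startswith('{') else [p] for p in parts]
    return [''.join(combo) for combo in itertools.product(*choices)]


def extras_for(env_name: str) -> list[str]:
    """`polars: polars` applies only when the env name has a `polars` factor."""
    return ['polars'] if 'polars' in env_name.split('-') else []


for env in expand_envlist('py{39,310,311,312,313}-polars'):
    print(env, extras_for(env))
```

An environment such as `py312` has no `polars` factor, so `extras_for('py312')` returns an empty list and the pandas-only tests run there.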

@hirosassa hirosassa changed the title from "draft: Introduce Polars for dumping and loading data" to "Introduce Polars for dumping and loading data" Mar 22, 2025
@hirosassa hirosassa requested a review from hiro-o918 March 29, 2025 06:29
@@ -21,6 +20,17 @@
logger = getLogger(__name__)


try:
    import polars as pl
Contributor

I would prefer raising an exception instead of ignoring the ValueError.
Users will never see ValueError('GOKART_DATAFRAME_FRAMEWORK_POLARS_ENABLED is not set. Use pandas as dataframe framework.') because it is ignored on L31.

import os

DATAFRAME_FRAMEWORK = os.getenv('GOKART_DATAFRAME_FRAMEWORK', 'pandas')

if DATAFRAME_FRAMEWORK == 'polars':
    try:
        import polars
    except ImportError:
        raise ValueError('please install polars to use polars as a framework of dataframe for gokart')
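The same check can be wrapped in a function so that it is easy to exercise with different environments. The sketch below is an assumption-laden illustration, not the code merged in the PR: `resolve_dataframe_framework` and `SUPPORTED_FRAMEWORKS` are hypothetical names, and rejecting unknown values is an extra step that the suggestion above does not include. Only the `GOKART_DATAFRAME_FRAMEWORK` variable name and the error message come from the comment.

```python
import importlib.util
import os
from typing import Mapping

SUPPORTED_FRAMEWORKS = ('pandas', 'polars')  # assumed set of valid values


def resolve_dataframe_framework(environ: Mapping[str, str] = os.environ) -> str:
    """Read the framework from the environment and fail loudly on bad setups."""
    framework = environ.get('GOKART_DATAFRAME_FRAMEWORK', 'pandas')
    if framework not in SUPPORTED_FRAMEWORKS:
        # Extra validation, not part of the reviewer's snippet.
        raise ValueError(f'unsupported GOKART_DATAFRAME_FRAMEWORK: {framework!r}')
    if framework == 'polars' and importlib.util.find_spec('polars') is None:
        # Raise instead of silently falling back to pandas.
        raise ValueError('please install polars to use polars as a framework of dataframe for gokart')
    return framework


print(resolve_dataframe_framework({}))
```

Passing the environment as a mapping keeps the function free of global state, which makes the error path straightforward to test.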

Collaborator Author

@hirosassa hirosassa Mar 29, 2025

Thanks for your suggestion. fixed in 79839e1

@hirosassa hirosassa requested a review from hiro-o918 March 29, 2025 21:37
Successfully merging this pull request may close these issues.

[Feature Request] Using Polars for loading and dumping data
2 participants