-
Notifications
You must be signed in to change notification settings - Fork 63
Introduce Polars
for dumping and loading data
#457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Polars
for dumping and loading data
gokart/file_processor.py
Outdated
except pd.errors.EmptyDataError: | ||
return pd.DataFrame() | ||
except ImportError: | ||
import polars as pl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, it is a little confused to fallback to polars when import error occurred.
In my opinion, we need to introduce some global feature flags to switch dataframe frameworks.
For backward compatibility, we set pandas in default, then add a configure function like the follwoing.
DATAFRAME_FRAMEWORK = 'pandas'
def setup_dataframe_framework(framework: Literal['pandas', 'polars']):
if framework == 'polars':
try
import polars
except ImportError:
raise RuntimeError(...)
DATAFRAME_FRAMEWORK = framework
According to the flag, we can switch the implementation.
In addition, we can probably set processor class, though I do not check this work...
This would make each CsvFileProcessor simple
if DATAFRAME_FRAMEWORK == 'pandas'
CsvFileProcessor = PandasCsvFileProcessor
elif DATAFRAME_FRAMEWORK == 'polars'
CsvFileProcessor = PolarsCsvFileProcessor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import os
from typing import Protocol, Type
class IFeature(Protocol):
def run(self) -> None: ...
class Feature1:
def __init__(self): ...
def run(self):
print('feature1')
class Feature2:
def __init__(self): ...
def run(self):
print('feature2')
Feature: Type[IFeature]
if os.environ.get('FEATURE') == '1':
Feature = Feature1
elif os.environ.get('FEATURE') == '2':
Feature = Feature2
else:
raise ValueError("Invalid FEATURE environment variable value. Please set it to '1' or '2'.")
Feature().run()
❯ uv run foo.py
Traceback (most recent call last):
File "gokart/foo.py", line 27, in <module>
raise ValueError("Invalid FEATURE environment variable value. Please set it to '1' or '2'.")
ValueError: Invalid FEATURE environment variable value. Please set it to '1' or '2'.
❯ FEATURE=1 uv run foo.py
feature1
❯ FEATURE=2 uv run foo.py
feature2
Switching class by an environment variable works!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hiro-o918 Thanks for the comment. I applied your suggestion. I think it looks fine!
fae4c7c
to
a9caea4
Compare
a9caea4
to
665489a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments added
pl = pytest.importorskip('polars', reason='polars required') | ||
pl_testing = pytest.importorskip('polars.testing', reason='polars required') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip if polars
is not installed
polars_installed = importlib.util.find_spec('polars') is not None | ||
pytestmark = pytest.mark.skipif(polars_installed, reason='polars installed, skip pandas tests') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip if polars
is installed
- name: Test with tox for polars extra | ||
run: uvx --with tox-uv tox run -e ${{ matrix.tox-env }}-polars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the test run with polars
extra
labels = | ||
polars = py{39,310,311,312,313}-polars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labels with extra branching
ref: tox-dev/tox#2406
extras = | ||
polars: polars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the polars
label is specified, add polars
to dependencies
a53d414
to
94e9fc8
Compare
Polars
for dumping and loading dataPolars
for dumping and loading data
bc23775
to
5395556
Compare
gokart/file_processor.py
Outdated
@@ -21,6 +20,17 @@ | |||
logger = getLogger(__name__) | |||
|
|||
|
|||
try: | |||
import polars as pl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to raise an exception instead of ignoring the ValueError.
users never find ValueError('GOKART_DATAFRAME_FRAMEWORK_POLARS_ENABLED is not set. Use pandas as dataframe framework.')
since it is ignored on L31.
DATAFRAME_FRAMEWORK = os.getenv('GOKART_DATAFRAME_FRAMEWORK', 'pandas')
if GOKART_DATAFRAME_FRAMEWORK == 'polars'
try:
import polars
except ImportError:
raise ValueError('please install polars to use polars as a framework of dataframe for gokart')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion. fixed in 79839e1
… env var as polars
5319642
to
79839e1
Compare
4ca0d09
to
b200fd9
Compare
fixes #304
In this PR, I introduced
polars
into gokart.Currently the implementation is super rough and I didn't check the correctness.
Feel free to leave your comment for this software architecture / implementation.