@ahuang11
Contributor

Serializing a dataframe to YAML caused excessive token bloat:

Data Overview: - input_tokens: 80
  frequency: 1
- input_tokens: 231
  frequency: 1
- input_tokens: 1207
  frequency: 1
- input_tokens: 1324
  frequency: 1
- input_tokens: 1353
  frequency: 1
- input_tokens: 1452
  frequency: 1
- input_tokens: 1539
  frequency: 1
- input_tokens: 1687
  frequency: 1
- input_tokens: 1745
  frequency: 1
- input_tokens: 1895
  frequency: 1
- input_tokens: 1993
  frequency: 1
- input_tokens: 2066
  frequency: 1
- input_tokens: 2115
  frequency: 1
- input_tokens: 2155
  frequency: 1
- input_tokens: 2302
  frequency: 1
- input_tokens: 2316
  frequency: 1
- input_tokens: 2541
  frequency: 1
- input_tokens: 2553
  frequency: 1
- input_tokens: 3342
  frequency: 1
- input_tokens: 3534
  frequency: 1
- input_tokens: 5086
  frequency: 1

Then, sometimes I get:


    return yaml.dump(df.to_dict('index'), default_flow_style=False, allow_unicode=True, sort_keys=False)
                     ^^^^^^^^^^^^^^^^^^^
  File "/Users/ahuang/miniconda3/envs/lumen/lib/python3.12/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahuang/miniconda3/envs/lumen/lib/python3.12/site-packages/pandas/core/frame.py", line 2183, in to_dict
    return to_dict(self, orient, into=into, index=index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahuang/miniconda3/envs/lumen/lib/python3.12/site-packages/pandas/core/methods/to_dict.py", line 242, in to_dict
    raise ValueError("DataFrame index must be unique for orient='index'.")
ValueError: DataFrame index must be unique for orient='index'.
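
A minimal sketch of how this error can arise and one possible way around it, assuming a small hypothetical frame with a repeated index label (the column names are borrowed from the output above; the index values are made up for illustration). `orient='index'` keys the result by index label, so duplicate labels cannot be represented. Resetting the index first, or switching to `orient='records'`, sidesteps that.

```python
import pandas as pd

# Hypothetical frame: the index label 0 appears twice, as can happen
# after concatenating or filtering frames without resetting the index.
df = pd.DataFrame(
    {"input_tokens": [80, 231, 1207], "frequency": [1, 1, 1]},
    index=[0, 0, 1],
)

# orient="index" needs unique labels because they become dict keys.
try:
    df.to_dict("index")
    raised = False
except ValueError:
    raised = True

# Option 1: replace the index with a fresh 0..n-1 range before converting.
by_position = df.reset_index(drop=True).to_dict("index")

# Option 2: drop the index entirely and emit a list of row dicts.
as_records = df.to_dict("records")
```

Whether discarding the original index is acceptable depends on whether it carries meaning for the reader of the serialized output.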

I think to_markdown might be the best option.
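
A rough sketch of why a table layout is more compact, using only the standard library. The two helpers below (`as_yaml_like` and `as_markdown_table`) are hypothetical stand-ins that mimic the shape of `yaml.dump` and `DataFrame.to_markdown` output; they are not those functions, and character count is used only as a crude proxy for token count. The values are the ones from the Data Overview output above.

```python
# input_tokens values copied from the Data Overview output; frequency is 1 for each.
values = [80, 231, 1207, 1324, 1353, 1452, 1539, 1687, 1745, 1895, 1993,
          2066, 2115, 2155, 2302, 2316, 2541, 2553, 3342, 3534, 5086]
rows = [{"input_tokens": v, "frequency": 1} for v in values]


def as_yaml_like(rows):
    """Emit one 'key: value' line per cell, so column names repeat on every row."""
    lines = []
    for row in rows:
        for i, (key, value) in enumerate(row.items()):
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)


def as_markdown_table(rows):
    """Emit a pipe table, so column names appear once in the header."""
    columns = list(rows[0])
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(row[c]) for c in columns) + " |" for row in rows]
    return "\n".join([header, divider, *body])


yaml_text = as_yaml_like(rows)
table_text = as_markdown_table(rows)
print(len(yaml_text), len(table_text))
```

The saving comes from stating each column name once instead of once per row, so it grows with the number of rows. A real tokenizer would give different absolute numbers than this character count.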

@ahuang11 ahuang11 requested a review from philippjfr October 23, 2025 23:55
codecov bot commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 47.91%. Comparing base (77e4af2) to head (55307db).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lumen/ai/utils.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1463      +/-   ##
==========================================
+ Coverage   47.90%   47.91%   +0.01%     
==========================================
  Files         122      122              
  Lines       20799    20791       -8     
==========================================
  Hits         9963     9963              
+ Misses      10836    10828       -8     


ai = [
'griffe', 'nbformat', 'duckdb >= 1.2.0', 'pyarrow', 'instructor >=1.6.4', 'pydantic >=2.8.0', 'pydantic-extra-types', 'panel-graphic-walker[kernel] >=0.6.4',
- 'markitdown', 'semchunk', 'tiktoken', 'chardet', "panel-material-ui >=0.4.0"
+ 'markitdown', 'semchunk', 'tiktoken', 'chardet', "panel-material-ui >=0.4.0", "tabulate"
Member

I don't see tabulate used anywhere.
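
For context, pandas treats tabulate as an optional dependency that `DataFrame.to_markdown` imports at call time, so a package can rely on it without ever importing it directly. That this is the reason for the addition here is an assumption; the thread does not say so. A small sketch of checking for such an optional dependency, where `has_optional_dependency` is a hypothetical helper name:

```python
import importlib.util


def has_optional_dependency(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# to_markdown is expected to fail with an ImportError when this prints False.
print(has_optional_dependency("tabulate"))
```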

@philippjfr philippjfr merged commit b35cab8 into main Oct 27, 2025
12 checks passed
@philippjfr philippjfr deleted the fix_data_display branch October 27, 2025 16:14