feat(parser): Add automatic delimiter and header detection #62
Merged
This commit introduces a new module for sniffing CSV dialect properties such as delimiter, quote character, and header row. To enable this, the CsvViewParser has been refactored to be more flexible. It now uses a pluggable 'source' for line retrieval, allowing it to parse from either a Neovim buffer or an arbitrary array of strings (e.g., sample lines). This change also involved making `async_chunksize`, `comments`, and `max_lookahead` explicit parameters in the parser's methods, rather than relying solely on internal options.
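The pluggable source idea can be illustrated with a small sketch. It is written in Python for brevity rather than the plugin's Lua, and the names `LineSource`, `source_from_lines`, and `parse_records` are hypothetical, not the plugin's API. Quoting and multi-line fields are deliberately ignored.

```python
from typing import Callable, Iterator, Optional, Sequence

# A "source" maps a 1-based line number to its text, or None past the end.
# (Hypothetical shape; the real parser's source interface may differ.)
LineSource = Callable[[int], Optional[str]]


def source_from_lines(lines: Sequence[str]) -> LineSource:
    """Wrap an in-memory list of strings (e.g. sample lines) as a source."""
    def get(lnum: int) -> Optional[str]:
        return lines[lnum - 1] if 1 <= lnum <= len(lines) else None
    return get


def parse_records(source: LineSource, delimiter: str = ",",
                  comments: Sequence[str] = ("#",)) -> Iterator[list[str]]:
    """Yield the fields of each non-comment line; quoting is ignored here."""
    lnum = 1
    while (line := source(lnum)) is not None:
        if not any(line.startswith(c) for c in comments):
            yield line.split(delimiter)
        lnum += 1
```

Because the parser only sees the callable, a buffer-backed source and a sample-line source are interchangeable, which is what lets a sniffer reuse the same parsing code on a handful of sample lines.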
The `delimiter` option now supports a `fallbacks` array in config. This array allows the parser to automatically detect the delimiter if no filetype-specific delimiter is configured. This significantly improves handling of diverse CSV files without manual setup.
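As a rough illustration of how detection over a `fallbacks` list might work, the Python sketch below scores each candidate by how consistently it splits the sample. The scoring formula, the tie-breaking rule, and the function name `detect_delimiter` are assumptions made for this example, not the plugin's actual implementation (which is in Lua), and quoted fields are not handled.

```python
from statistics import pvariance
from typing import Optional, Sequence


def detect_delimiter(sample: Sequence[str], fallbacks: Sequence[str],
                     comments: Sequence[str] = ("#",)) -> Optional[str]:
    """Return the fallback whose per-record field counts are most consistent.

    Hypothetical scoring: mean field count divided by (1 + variance).
    Ties go to the candidate listed first in `fallbacks`.
    """
    # Comment lines and empty lines do not count as records.
    records = [ln for ln in sample
               if ln and not any(ln.startswith(c) for c in comments)]
    best, best_score = None, 0.0
    for delim in fallbacks:
        counts = [ln.count(delim) + 1 for ln in records]
        if not counts or max(counts) < 2:
            continue  # this delimiter never splits a line
        mean = sum(counts) / len(counts)
        score = mean / (1.0 + pvariance(counts))
        if score > best_score:
            best, best_score = delim, score
    return best
```

For example, a sample whose records all split into three fields on `;` but only occasionally contain a `,` would resolve to `;`, since the comma's field counts vary from line to line.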
…lation
- Fix parser line advancement to handle multi-line fields properly
- Skip comment lines when calculating field consistency scores
- Use accurate record count instead of total line count for variance
- Add explicit CSV filetype delimiter in config
The `opts.parser.delimiter.default` option has been deprecated. This change adds backward compatibility by mapping the `default` value to `opts.parser.delimiter.ft.csv`. Users are advised to migrate to using `opts.parser.delimiter.fallbacks` or configuring filetype-specific delimiters via the `ft` table. A deprecation warning will be displayed when the deprecated option is used.
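The backward-compatibility mapping can be pictured as a small options migration. The sketch below uses a plain Python dict in place of the plugin's Lua options table; `migrate_delimiter_opts` is a hypothetical name, and the choice to let an explicit `ft.csv` value win over the legacy `default` is an assumption the source does not state.

```python
import warnings
from copy import deepcopy


def migrate_delimiter_opts(opts: dict) -> dict:
    """Map deprecated parser.delimiter.default onto parser.delimiter.ft.csv."""
    opts = deepcopy(opts)  # leave the caller's table untouched
    delim = opts.setdefault("parser", {}).setdefault("delimiter", {})
    if "default" in delim:
        warnings.warn(
            "parser.delimiter.default is deprecated; use "
            "parser.delimiter.fallbacks or parser.delimiter.ft instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicit ft.csv setting wins over the legacy value.
        delim.setdefault("ft", {}).setdefault("csv", delim.pop("default"))
    return opts
```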
The header detection algorithm has been significantly improved to be more robust and accurate. The new approach analyzes each column independently, aggregating evidence from two heuristics:
1. Type Mismatch: Assesses if the header candidate's type differs from the inferred data type of the rest of the column.
2. Length Deviation: Checks if the header candidate's string length is an outlier compared to the column's data.
A scoring system based on these heuristics is used to determine if the first non-comment row is a header.
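A minimal sketch of per-column voting with these two heuristics follows, in Python rather than the plugin's Lua. The weights, the two-standard-deviation outlier threshold, the numeric-only type inference, and the name `looks_like_header` are all assumptions chosen for illustration; the actual scoring in the sniffer module may differ.

```python
from statistics import mean, pstdev
from typing import Sequence


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def looks_like_header(rows: Sequence[Sequence[str]]) -> bool:
    """Guess whether rows[0] is a header by voting column by column."""
    if len(rows) < 2:
        return False
    candidate, data = rows[0], rows[1:]
    score = 0
    for col, cell in enumerate(candidate):
        values = [r[col] for r in data if col < len(r)]
        if not values:
            continue
        # Heuristic 1: type mismatch (text sitting over an all-numeric column).
        if all(_is_number(v) for v in values):
            score += -1 if _is_number(cell) else 1
        # Heuristic 2: length deviation (candidate length is an outlier).
        lengths = [len(v) for v in values]
        spread = pstdev(lengths)
        if abs(len(cell) - mean(lengths)) > 2 * max(spread, 1.0):
            score += 1
    return score > 0
```

Here `rows` is assumed to already exclude comment lines, so `rows[0]` plays the role of the first non-comment row.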
The `view.header_lnum` option now defaults to `true` and supports automatic header detection. This change refactors the CSV dialect detection logic, including delimiter, quote character, and header line number. Previously, this logic was intertwined within the parser. Now, dedicated utility functions (`util.resolve_delimiter`, `util.resolve_quote_char`, `util.resolve_header_lnum`) handle the resolution, leveraging the `sniffer` module. The `CsvViewParser` and `CsvView` instances now receive the resolved dialect parameters explicitly, leading to a cleaner and more modular design. The `sniffer` module's public interface was also updated to expose buffer-level detection functions.
… better docs
- Add auto-detection support (header_lnum = true, now default)
- Enhance documentation with clear value explanations
- Add command-line aliases 'auto' and 'none' for better UX
4e5ebfc to 9ebb198
0a7b21a to 918efa7
Introduces `GUIDE.md` to provide detailed documentation on csvview.nvim's features, configuration options, and API. This new guide aims to improve user understanding and ease of use. Updates `README.md` to remove redundant information and point to the new guide.
This was referenced Jul 21, 2025
Closed
Closed
Summary
Implements comprehensive CSV dialect detection with automatic delimiter detection, quote character detection, and header row identification.
Key Features
Auto-detect Delimiter
Command-line
" For unknown file formats, let auto-detection work
:CsvViewEnable
Configurations
How Auto-Detection Works
- If the file matches the `ft` rules (e.g., `.csv` → comma), use that delimiter
- Otherwise, try the delimiters in `fallbacks` order
Auto-detect Header
Command-line Header Options
Configurations
How Header Auto-Detection Works