feat(parser): Add automatic delimiter and header detection #62

hat0uma · 2025-07-13T13:18:52Z

Summary

Implements CSV dialect detection: automatic delimiter detection, quote character detection, and header row identification.

Key Features

Auto-detect Delimiter

Command-line

" For unknown file formats, let auto-detection work
:CsvViewEnable

Configurations

-- Default configuration with auto-detection
{
  parser = {
    delimiter = {
      ft = {
        csv = ",",        -- Always use comma for .csv files
        tsv = "\t",       -- Always use tab for .tsv files
      },
      fallbacks = {       -- Try these delimiters in order for other files
        ",",              -- Comma (most common)
        "\t",             -- Tab
        ";",              -- Semicolon
        "|",              -- Pipe
        ":",              -- Colon
        " ",              -- Space
      },
    },
  },
}

How Auto-Detection Works

1. If the file type matches an `ft` rule (e.g., .csv → comma), use that delimiter.
2. Otherwise, test each delimiter in `fallbacks`, in order.
3. Score each delimiter by how consistent the field count is across lines.
4. Select the delimiter with the highest score.
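
The scoring step above can be sketched in plain Lua. This is a minimal illustration, not the plugin's actual sniffer: the function names (`count_fields`, `score_delimiter`, `detect_delimiter`) and the scoring formula (mean field count divided by one plus the variance) are assumptions chosen for clarity, and quoted fields and multi-line records are ignored.

```lua
-- Hypothetical sketch: score candidate delimiters by field-count consistency.
-- Not csvview.nvim's real implementation; quoting is deliberately ignored.

--- Count fields in a line by counting plain-text occurrences of the delimiter.
local function count_fields(line, delim)
  local count, pos = 1, 1
  while true do
    local s, e = string.find(line, delim, pos, true) -- plain find, no patterns
    if not s then
      return count
    end
    count = count + 1
    pos = e + 1
  end
end

--- Score a delimiter: more fields and less variance across lines score higher.
local function score_delimiter(lines, delim)
  local counts, sum = {}, 0
  for _, line in ipairs(lines) do
    if line ~= "" then
      local n = count_fields(line, delim)
      counts[#counts + 1] = n
      sum = sum + n
    end
  end
  if #counts == 0 then
    return 0
  end
  local mean = sum / #counts
  if mean <= 1 then
    return 0 -- the delimiter never splits any line
  end
  local variance = 0
  for _, n in ipairs(counts) do
    variance = variance + (n - mean) ^ 2
  end
  variance = variance / #counts
  return mean / (1 + variance)
end

--- Pick the best delimiter; ties go to the earlier entry in `fallbacks`.
local function detect_delimiter(lines, fallbacks)
  local best, best_score = nil, 0
  for _, delim in ipairs(fallbacks) do
    local score = score_delimiter(lines, delim)
    if score > best_score then
      best, best_score = delim, score
    end
  end
  return best
end
```

With this sketch, `detect_delimiter({ "a;b;c", "1;2;3" }, { ",", "\t", ";" })` selects the semicolon, because it is the only candidate that splits every line into the same number of fields. It returns `nil` when no candidate splits anything.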

Auto-detect Header

Command-line Header Options

:CsvViewEnable header_lnum=auto  " Auto-detect header (default)
:CsvViewEnable header_lnum=1     " First line as header
:CsvViewEnable header_lnum=none  " No header line

Configurations

-- Default configuration with auto-detection
{
  view = {
    header_lnum = true,  -- Auto-detect header (default)
    sticky_header = {
      enabled = true,
      separator = "─",  -- Separator line character
    },
  },
}

How Header Auto-Detection Works

1. Find the first non-comment line and treat it as the header candidate.
2. Analyze each column independently using two heuristics:
   - Type Mismatch: If the first row contains text while data rows are numeric, it's likely a header
   - Length Deviation: If the first row's text length differs significantly from data rows, it's likely a header
3. Combine evidence from all columns to make the final decision.
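
A rough Lua sketch of the two heuristics follows. Everything here is an assumption made for illustration: the function names, the weights (1 for a type mismatch, 0.5 for a length deviation), the 80% numeric threshold, and the two-standard-deviation cutoff are not taken from the plugin.

```lua
-- Hypothetical sketch of per-column header heuristics; thresholds are illustrative.

local function is_numeric(s)
  return tonumber(s) ~= nil
end

--- Score one column: `candidate` is the first-row cell, `cells` the data cells.
local function score_column(candidate, cells)
  if #cells == 0 then
    return 0
  end
  local numeric, total_len = 0, 0
  for _, cell in ipairs(cells) do
    if is_numeric(cell) then
      numeric = numeric + 1
    end
    total_len = total_len + #cell
  end
  local score = 0
  -- Type mismatch: a text cell on top of a mostly numeric column.
  if not is_numeric(candidate) and numeric / #cells >= 0.8 then
    score = score + 1
  end
  -- Length deviation: candidate length far from the column's mean length.
  local mean_len = total_len / #cells
  local variance = 0
  for _, cell in ipairs(cells) do
    variance = variance + (#cell - mean_len) ^ 2
  end
  local stddev = math.sqrt(variance / #cells)
  if math.abs(#candidate - mean_len) > 2 * math.max(stddev, 1) then
    score = score + 0.5
  end
  return score
end

--- rows: array of records (arrays of string fields); rows[1] is the candidate.
local function looks_like_header(rows)
  if #rows < 2 then
    return false -- nothing to compare the candidate against
  end
  local total = 0
  for col, candidate in ipairs(rows[1]) do
    local cells = {}
    for r = 2, #rows do
      cells[#cells + 1] = rows[r][col] or ""
    end
    total = total + score_column(candidate, cells)
  end
  return total >= 1
end
```

In this sketch a single strong signal is enough: the text `age` above a numeric column makes `looks_like_header` return `true`. A file whose first row is numeric like the rest yields `false`.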

This commit introduces a new module for sniffing CSV dialect properties such as delimiter, quote character, and header row. To enable this, the CsvViewParser has been refactored to be more flexible. It now uses a pluggable 'source' for line retrieval, allowing it to parse from either a Neovim buffer or an arbitrary array of strings (e.g., sample lines). This change also involved making `async_chunksize`, `comments`, and `max_lookahead` explicit parameters in the parser's methods, rather than relying solely on internal options.
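
To illustrate the idea of a pluggable line source, here is a minimal Lua sketch. The interface (`line_count`, `get_line`) and the helper `count_non_comment_lines` are hypothetical names, not the plugin's API. A buffer-backed source would presumably expose the same two functions by wrapping `vim.api.nvim_buf_get_lines`; only the array-backed variant is shown so the example runs outside Neovim.

```lua
-- Hypothetical sketch of a pluggable line source; method names are assumptions.

--- Build a source backed by a plain array of strings (e.g. sample lines).
local function array_source(lines)
  return {
    line_count = function()
      return #lines
    end,
    get_line = function(lnum) -- 1-based; returns nil when out of range
      return lines[lnum]
    end,
  }
end

--- A consumer that only talks to the source interface, never to a buffer.
--- `comments` is a list of comment prefixes passed in explicitly.
local function count_non_comment_lines(source, comments)
  local count = 0
  for lnum = 1, source.line_count() do
    local line = source.get_line(lnum)
    local is_comment = false
    for _, prefix in ipairs(comments) do
      if line:sub(1, #prefix) == prefix then
        is_comment = true
        break
      end
    end
    if not is_comment then
      count = count + 1
    end
  end
  return count
end
```

Because the consumer depends only on the two source functions, the same code path can serve both a full buffer and a handful of sample lines taken for sniffing.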

The `delimiter` option now supports a `fallbacks` array in config. This array allows the parser to automatically detect the delimiter if no filetype-specific delimiter is configured. This significantly improves handling of diverse CSV files without manual setup.

…lation
- Fix parser line advancement to handle multi-line fields properly
- Skip comment lines when calculating field consistency scores
- Use accurate record count instead of total line count for variance
- Add explicit CSV filetype delimiter in config

The `opts.parser.delimiter.default` option has been deprecated. This change adds backward compatibility by mapping the `default` value to `opts.parser.delimiter.ft.csv`. Users are advised to migrate to using `opts.parser.delimiter.fallbacks` or configuring filetype-specific delimiters via the `ft` table. A deprecation warning will be displayed when the deprecated option is used.
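
A possible migration, shown as a configuration fragment in the same style as the examples above. The semicolon and the particular `fallbacks` list are illustrative values only.

```lua
-- Deprecated form (per the commit message, still accepted and mapped to ft.csv):
-- { parser = { delimiter = { default = ";" } } }

-- Possible replacement: pin the delimiter per filetype and list fallbacks
{
  parser = {
    delimiter = {
      ft = {
        csv = ";",        -- Replaces the old `default` value for .csv files
      },
      fallbacks = {       -- Tried in order for other files
        ";",
        ",",
        "\t",
      },
    },
  },
}
```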

The header detection algorithm has been significantly improved to be more robust and accurate. The new approach analyzes each column independently, aggregating evidence from two heuristics:
1. Type Mismatch: Assesses if the header candidate's type differs from the inferred data type of the rest of the column.
2. Length Deviation: Checks if the header candidate's string length is an outlier compared to the column's data.

A scoring system based on these heuristics is used to determine if the first non-comment row is a header.

The `view.header_lnum` option now defaults to `true` and supports automatic header detection. This change refactors the CSV dialect detection logic, including delimiter, quote character, and header line number. Previously, this logic was intertwined within the parser. Now, dedicated utility functions (`util.resolve_delimiter`, `util.resolve_quote_char`, `util.resolve_header_lnum`) handle the resolution, leveraging the `sniffer` module. The `CsvViewParser` and `CsvView` instances now receive the resolved dialect parameters explicitly, leading to a cleaner and more modular design. The `sniffer` module's public interface was also updated to expose buffer-level detection functions.

… better docs
- Add auto-detection support (header_lnum = true, now default)
- Enhance documentation with clear value explanations
- Add command-line aliases 'auto' and 'none' for better UX

Introduces `GUIDE.md` to provide detailed documentation on csvview.nvim's features, configuration options, and API. This new guide aims to improve user understanding and ease of use. Updates `README.md` to remove redundant information and point to the new guide.

hat0uma added 10 commits July 13, 2025 20:37

docs: update README

a06075a

feat(config,docs): Improve header_lnum config with auto-detection and…

10dee2e


fix(test): Specify header_lnum false in sticky header test

7d80233

docs: update README

9ebb198

hat0uma force-pushed the feat/auto_detect_delimiter branch from 4e5ebfc to 9ebb198 Compare July 20, 2025 19:32

docs: update README

918efa7

hat0uma force-pushed the feat/auto_detect_delimiter branch from 0a7b21a to 918efa7 Compare July 20, 2025 19:52

hat0uma changed the title ~~feat(parser): auto detect delimiter~~ feat(parser): Add automatic delimiter and header detection Jul 20, 2025

docs: update README

373b6b5

hat0uma merged commit bfd95ed into main Jul 21, 2025
5 checks passed

hat0uma deleted the feat/auto_detect_delimiter branch July 21, 2025 07:53

github-actions bot mentioned this pull request Jul 21, 2025

chore(main): release 1.3.0 #46

Open

This was referenced Jul 21, 2025

Alloy customizing header #61

Closed

Auto detect separator #52

Closed
