handle text encodings besides ASCII and UTF-8 #16

@jtmoon79

Description

Problem

Only ASCII-encoded and UTF-8-encoded files can be processed. Log files in other encodings are read, but s4 finds no text in them.

For example, on a Windows 11 host, s4 printed nothing for 28 of the 128 .log files under C:\Windows. A spot check of a few of those files showed they were UTF-16 encoded.

PS> Get-ChildItem -Filter '*.log' -File -Path "C:\Windows" -Recurse -ErrorAction SilentlyContinue `
   | Select-Object -ExpandProperty FullName `
   | s4.exe - --summary
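One way to repeat the spot check mechanically is to look for a byte-order mark (BOM) at the start of each file. The Rust sketch below assumes the files carry a BOM. Windows tools commonly write one for UTF-16, but a BOM is optional, so BOM-less files would need a heuristic such as counting NUL bytes. The names `Bom`, `sniff_bom`, and `sniff_file` are hypothetical and are not part of s4.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Encodings that can be identified by a byte-order mark (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
    None,
}

/// Classify a byte prefix by its BOM. UTF-32LE is checked before UTF-16LE
/// because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
pub fn sniff_bom(prefix: &[u8]) -> Bom {
    match prefix {
        [0xFF, 0xFE, 0x00, 0x00, ..] => Bom::Utf32Le,
        [0x00, 0x00, 0xFE, 0xFF, ..] => Bom::Utf32Be,
        [0xEF, 0xBB, 0xBF, ..] => Bom::Utf8,
        [0xFF, 0xFE, ..] => Bom::Utf16Le,
        [0xFE, 0xFF, ..] => Bom::Utf16Be,
        _ => Bom::None,
    }
}

/// Read up to the first four bytes of a file and classify them.
pub fn sniff_file(path: &str) -> io::Result<Bom> {
    let mut buf = [0u8; 4];
    let mut file = File::open(path)?;
    let mut filled = 0;
    // A single read() may return fewer bytes than requested, so keep
    // reading until the buffer is full or end-of-file is reached.
    while filled < buf.len() {
        let n = file.read(&mut buf[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    Ok(sniff_bom(&buf[..filled]))
}

fn main() {
    // Pass file paths as arguments; prints the detected BOM for each one.
    for path in std::env::args().skip(1) {
        match sniff_file(&path) {
            Ok(bom) => println!("{:?}\t{}", bom, path),
            Err(e) => eprintln!("{}: {}", path, e),
        }
    }
}
```

The ordering of the match arms matters: a UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE by BOM alone, which is an accepted ambiguity of BOM sniffing.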

Solution

Handle other text encodings.

Handling UTF-16 and UTF-32 would be satisfactory.
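Once the encoding and byte order are known, decoding UTF-16 and UTF-32 into text needs only the Rust standard library. The sketch below assumes the BOM has already been stripped and the whole input is in memory; `decode_utf16` and `decode_utf32` are hypothetical helper names, not s4 functions. Invalid sequences become U+FFFD rather than aborting the decode.

```rust
/// Decode UTF-16 bytes (BOM already removed) into a String.
/// A trailing odd byte is ignored by chunks_exact.
pub fn decode_utf16(bytes: &[u8], little_endian: bool) -> String {
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| {
            let arr = [pair[0], pair[1]];
            if little_endian {
                u16::from_le_bytes(arr)
            } else {
                u16::from_be_bytes(arr)
            }
        })
        .collect();
    // Joins surrogate pairs; unpaired surrogates become U+FFFD.
    String::from_utf16_lossy(&units)
}

/// Decode UTF-32 bytes (BOM already removed) into a String.
/// Trailing bytes that do not form a full 4-byte unit are ignored.
pub fn decode_utf32(bytes: &[u8], little_endian: bool) -> String {
    bytes
        .chunks_exact(4)
        .map(|quad| {
            let arr = [quad[0], quad[1], quad[2], quad[3]];
            let v = if little_endian {
                u32::from_le_bytes(arr)
            } else {
                u32::from_be_bytes(arr)
            };
            // from_u32 rejects surrogates and values above U+10FFFF.
            char::from_u32(v).unwrap_or(char::REPLACEMENT_CHARACTER)
        })
        .collect()
}

fn main() {
    // "hi" encoded as UTF-16LE without a BOM.
    let utf16le = [0x68, 0x00, 0x69, 0x00];
    println!("{}", decode_utf16(&utf16le, true)); // prints "hi"
}
```

If a file is processed in fixed-size chunks rather than all at once, a code unit or a UTF-16 surrogate pair can straddle a chunk boundary, so the leftover bytes would have to be carried into the next chunk instead of being dropped as they are here.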

Metadata

Assignees

No one assigned

Labels

difficult: A difficult problem; a major coding effort or difficult algorithm to perfect
enhancement: New feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
