charset: remove dependency on x/net for parsing html #669

Merged (2 commits) on May 11, 2025

gabriel-vasile
Owner

@gabriel-vasile gabriel-vasile commented May 9, 2025

charset: remove dependency on x/net for parsing html
remove dependency on xml for parsing xml

pkg: github.com/gabriel-vasile/mimetype/internal/charset
            │  /tmp/prev   │              /tmp/curr              │
            │    sec/op    │   sec/op     vs base                │
FromHTML-4    1732.5n ± 1%   429.2n ± 0%  -75.23% (p=0.000 n=10)
FromXML-4      829.8n ± 0%   215.2n ± 1%  -74.07% (p=0.000 n=10)
FromPlain-4    192.8n ± 1%   192.8n ± 0%        ~ (p=0.670 n=10)
geomean        652.0n        261.1n       -59.95%

            │   /tmp/prev    │              /tmp/curr               │
            │      B/op      │    B/op     vs base                  │
FromHTML-4    4400.00 ± 0%     16.00 ± 0%  -99.64% (p=0.000 n=10)
FromXML-4      576.00 ± 0%     32.00 ± 0%  -94.44% (p=0.000 n=10)
FromPlain-4     0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                    ²               -94.13%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

            │  /tmp/prev   │              /tmp/curr               │
            │  allocs/op   │ allocs/op   vs base                  │
FromHTML-4    6.000 ± 0%     2.000 ± 0%  -66.67% (p=0.000 n=10)
FromXML-4     9.000 ± 0%     5.000 ± 0%  -44.44% (p=0.000 n=10)
FromPlain-4   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                  ²               -43.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

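Previously we depended on the x/net and xml packages for parsing. Without
them, FromHTML and FromXML run roughly 75% faster and allocate over 94%
fewer bytes per operation, per the benchmarks above. FromPlain is unchanged.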
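
To illustrate the general idea of reading a charset declaration without a full HTML parser, here is a minimal sketch. It is not the code merged in this PR: `metaCharset` is a hypothetical name, and the scanning rules are simplified assumptions (it does not skip comments or scripts, and it makes one lowercased copy of the input, which a tuned implementation would likely avoid).

```go
package main

import (
	"bytes"
)

// metaCharset is a hypothetical helper, not the function from this PR.
// It scans raw HTML for the first <meta ...> tag that contains "charset="
// and returns the lowercased value, or "" when no such tag is found.
func metaCharset(html []byte) string {
	// Work on a lowercased copy so tag and attribute matching is
	// case-insensitive. This allocates once per call.
	rest := bytes.ToLower(html)
	for {
		i := bytes.Index(rest, []byte("<meta"))
		if i < 0 {
			return ""
		}
		rest = rest[i+len("<meta"):]

		// The tag body runs up to the next '>'.
		end := bytes.IndexByte(rest, '>')
		if end < 0 {
			return ""
		}
		tag := rest[:end]

		j := bytes.Index(tag, []byte("charset="))
		if j < 0 {
			// No charset in this tag; keep looking after it.
			rest = rest[end:]
			continue
		}

		// Drop an opening quote or space, then cut at the first delimiter.
		val := bytes.TrimLeft(tag[j+len("charset="):], `"' `)
		if k := bytes.IndexAny(val, "\"' ;/"); k >= 0 {
			val = val[:k]
		}
		return string(val)
	}
}
```

Calling `metaCharset([]byte("<meta charset=\"UTF-8\">"))` returns `utf-8`. The same function also handles the older `http-equiv` form, because it only looks for `charset=` inside the tag body.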
@gabriel-vasile gabriel-vasile changed the title from "use regular functions for detection" to "charset: remove dependency on x/net for parsing html" May 10, 2025
@gabriel-vasile gabriel-vasile marked this pull request as ready for review May 11, 2025 01:27
@gabriel-vasile gabriel-vasile merged commit 8b03458 into master May 11, 2025
4 checks passed
@gabriel-vasile gabriel-vasile deleted the charset branch May 11, 2025 01:28