-
Notifications
You must be signed in to change notification settings - Fork 47
Remove emoji identifiers #178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some unicode characters or character groups lead to a large increase in parser size. This change halves the size of the generated parser file.
It should be. :) |
Ok, I added the parser changes too. Makes sense with CI now enabled. |
|
You need to regenerate the parser with |
|
For the record, dropping emojis not only halves the parser size but also the large state count which directly affects performance. So I would argue that it's worth it. (Having said that, I just did a quick test and the change did not affect either compiled parser size or parsing speed...) |
I regenerated with
For me the compiled parser size is also cut in half (julia.so 5.2mb -> 2.5mb). Speed difference is much less noticeable, about 10% faster on my end. |
|
Yeah, that's now a problem with CI. It'll be probably easiest to wait until a maintainer (not me) updates the whole shebang to the latest versions. |
|
@savq are you still maintaining the parser here? |
|
There are three options:
|
|
Hi. Thanks for the PR @ChrHorn On the ABI: It hasn't been updated because there hasn't been much activity on the repo, but there's no blockers AFAIK. On the PR itself: I feel like deliberately not parsing a part of the language isn't ideal, but having a smaller parser should be an option. We'd already discussed this in #144 and I still think generating two parsers would be the happy solution. So we'd have to generate a |
|
I have already created a fork for nvim-treesitter where I will make that change; feel free to pull in the ABI 15 (including CI) changes from there if you are interested. |
Agreed that it's not ideal, but the smaller parser is very much worth it in my opinion. At least temporarily, until it is fixed upstream. Could also open another upstream issue. tree-sitter/tree-sitter#3496 was closed and focused on WASM, but I suspect it's just a cascading issue stemming from this one. |
I thought a bit more about this and I think I agree. People using tree-sitter for code analysis stuff are not spamming their code with emojis, so removing emoji identifiers is fine. @clason I see that this change (along the ABI update and #177) are already done in tree-sitter-grammars/tree-sitter-julia. Should I merge that instead and create a new release here? |
|
Yes, feel free to cherry-pick the commits (or make a master->master PR for rebasing), they should apply cleanly here (in their order). |
Not sure how much Emoji variable names are used, but this cuts the size of the parser.c file in half (45MB -> 24MB), so it might be worthy the tradeoff.
Regenerated parser is not committed.