Skip to content

gumbo: Make sure to use the char* pointer as the hashmap item #3524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 30, 2025

Conversation

flavorjones
Copy link
Member

What problem is this PR intended to solve?

Previously we were passing a character string of variable length, but always copying 8 bytes into the item. It's not really a problem if the length of the string is less than 8, but it's more serious if the length is longer than 8.

Fixes #3500
Fixes #3508

Have you included adequate test coverage?

I've added test coverage that will fail under valgrind.

Does this change affect the behavior of either the C or the Java implementations?

The issue is only with the C (gumbo) implementation.

Previously we were passing a character string of variable length, but
always copying 8 bytes into the item. It's not really a problem if the
length of the string is less than 8, but it's more serious if the
length is longer than 8.

Fixes #3500
Fixes #3508
@flavorjones flavorjones force-pushed the flavorjones/gumbo-hashmap-heap-fix branch from 91fcf54 to a17dec4 Compare June 30, 2025 17:01
@flavorjones flavorjones merged commit ada4708 into main Jun 30, 2025
271 of 280 checks passed
@flavorjones flavorjones deleted the flavorjones/gumbo-hashmap-heap-fix branch June 30, 2025 18:52
@stevecheckoway
Copy link
Contributor

I only took a quick look (I'm on vacation without a computer) so I could be missing something. It looks like this is using the address of a local variable as the value in the hash set.

I'm assuming that the implementation is going to be storing a copy of that address (I'm basing that on the sizeof(char *) in the call to create the hash set; I haven't looked at the actual API) and using it in the lookup. If I'm correct about that, then I think this is undefined behavior.

I can take a closer look next week if that would be helpful.

Cheers from Athens!

@flavorjones
Copy link
Member Author

flavorjones commented Jun 30, 2025

@stevecheckoway Thanks, not urgent at all. I'd love your eyes but not worth interrupting vacation for.

The prescribed usage of the hashmap library is to use a fixed-length struct as the "item". I've cheated here a bit and am using the fixed-length pointer value as the "item". The char* points at the string allocated for the attribute name, and so I'm pretty sure it's OK to reference this string during tag tokenization.

The hashmap is thrown away in emit_current_tag, so the lifetime of the strings is longer than the lifetime of the hashmap. I think it's OK? But would appreciate your eyeballs when you're back.

@flavorjones flavorjones added this to the v1.19.0 milestone Jul 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants