work around occasional segfault from accessing nonexistent last unigram #12

jonmay · 2013-12-28T07:13:05Z

The last call to this code block is for the probability of word unigram_count, i.e., the n+1th of n unigrams. In some cases this causes a segfault, presumably related to whatever is happening with mmap when the FindBlanks is initialized. I thought the right thing to do would be to just break one iter sooner but that caused other issues and I didn't have time to debug. This workaround is quite a kludge but avoids the issue.

I'm posting it as a request with the implicit request that you'll see what I'm getting at here and know much more quickly than me if this is a problem. I should point out that I didn't test with closed-class arpas; could be this is for those cases (and you reserve id 0 for unk so in that case you really do go from 1 to n?).

Sorry this is so slapdash; I hope to be able to isolate specifically why I get the segfault serializing some arpas but not all but I also would love it if this rang some bell for you.

kpu · 2013-12-30T19:01:47Z

Thanks for the heads up. You are correct; I've checked in a "clean" fix in 0eee261.

There's actually two annoying off-by-one situations happening.

The first is that closed vocabularies actually get one more unigram () but open vocabularies do not. That's supposed to be addressed by trie_sort.cc:188 before this code runs.

The second is that trie records point to the beginning of the block of extensions. The end pointer is found by reading the next entry. So even the last unigram needs a next entry, hence inserting a unigram off the end. The way I did that was to have this loop extend to one more unigram. But then the blank manager stuff wasn't setup to do that as you observed. The fix is to have the loop handle the normal range of unigrams then explicitly insert the blank for the pointer afterwards.

jonmay · 2013-12-30T22:22:49Z

great, thanks. I'll try it out to confirm no more segfault.

Incidentally, i wonder why the segfault behavior is so rare -- before the
fix the unaddressed memory is accessed for every open vocab lm and yet this
is the first time i ever came across this. even when running with valgrind
the behavior was rare.

On Mon, Dec 30, 2013 at 11:01 AM, Kenneth Heafield <[email protected]

wrote:

Thanks for the heads up. You are correct; I've checked in a "clean" fix in
0eee261 0eee261.

There's actually two annoying off-by-one situations happening.

The first is that closed vocabularies actually get one more unigram () but
open vocabularies do not. That's supposed to be addressed by
trie_sort.cc:188 before this code runs.

The second is that trie records point to the beginning of the block of
extensions. The end pointer is found by reading the next entry. So even the
last unigram needs a next entry, hence inserting a unigram off the end. The
way I did that was to have this loop extend to one more unigram. But then
the blank manager stuff wasn't setup to do that as you observed. The fix is
to have the loop handle the normal range of unigrams then explicitly insert
the blank for the pointer afterwards.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/12#issuecomment-31361869
.

"Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir
de la faire plus courte." -- Pascal

work around occasional segfault from accessing nonexistent last unigram

04263cc

kpu added a commit that referenced this pull request Dec 30, 2013

Fix segfault reported by jonmay (#12)

0eee261

kpu closed this Dec 30, 2013

Gavin90s mentioned this pull request Mar 11, 2021

core dump occurred when load LM model #328

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

work around occasional segfault from accessing nonexistent last unigram #12

work around occasional segfault from accessing nonexistent last unigram #12

jonmay commented Dec 28, 2013

Uh oh!

kpu commented Dec 30, 2013

Uh oh!

jonmay commented Dec 30, 2013

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

work around occasional segfault from accessing nonexistent last unigram #12

work around occasional segfault from accessing nonexistent last unigram #12

Conversation

jonmay commented Dec 28, 2013

Uh oh!

kpu commented Dec 30, 2013

Uh oh!

jonmay commented Dec 30, 2013

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants