
feat: Support nested inline footnotes #617

Merged 7 commits on Apr 21, 2025

Conversation

moben
Contributor

@moben moben commented Feb 21, 2025

Motivation

I have a book (Wyrd Sisters by Terry Pratchett) that has nested footnotes like this (dropped text and formatted for clarity):

<p class="footnote">
  <sup>
    <a id="pnXYZfn1"/>
    <a href="../chapter.xhtml#ipnXYZfn1">20</a>
  </sup>
  Initial footnote text
  <sup>
    <a id="ipnXYZfn2"/>
    <a href="../chapter.xhtml#pnXYZfn2">21</a>
  </sup>
  and more text.
  <sup>
    <a id="ipnXYZfn3"/>
    <a href="../chapter.xhtml#pnXYZfn3">22</a>
  </sup>
</p>

<p class="footnote">
  <sup>
      <a id="pnXYZfn2"/>
      <a href="../chapter.xhtml#ipnXYZfn2">21</a>
  </sup>
  First nested footnote.
</p>

<p class="footnote">
  <sup>
      <a id="pnXYZfn3"/>
      <a href="../chapter.xhtml#ipnXYZfn3">22</a>
  </sup>
  Second nested footnote.
</p>

In master, pnXYZfn2 and pnXYZfn3 don't show up as inline footnotes.

Change

Adds nested footnotes after their "parent" footnote instead of skipping them when inline footnotes are enabled.

Screenshots

Nested Footnote

Nested Footnote with Delayed Footnotes

Both footnote 21 and 22 are present, despite the lines referencing them being on different pages.

Still no footnotes in the non-inline footnotes

With inline footnotes enabled, footnotes in their original location (at the end of the book in this case) don't generate duplicate inline footnotes. This behavior is preserved. (I think this is the reason they were skipped completely before?)

Testing

I did some manual testing with the books that I own that have many footnotes of different formats.



@Frenzie
Member

Frenzie commented Feb 22, 2025

That sounds like a good idea.

@poire-z
Contributor

poire-z commented Feb 22, 2025

Dunno if it's a good idea :) adding complexity, nesting, the risk of infinite loops or worse layout... to something already quite fragile, for some extremely rare use case :/

With inline footnotes enabled, footnotes in their original location (at the end of the book in this case) don't generate duplicate inline footnotes. This behavior is preserved. (I think this is the reason they were skipped completely before?)

Dunno about the reason, it's mostly code we inherited and I just hacked around. And we probably never have witnessed your use case, so if anything, not adding footnote to footnotes pages is probably not doing something that doesn't make sense and not taking any risk with loops/reentrant stuff and re-displaying footnotes in footnotes possibly infinitely.

Having said that, after having contemplated your only-30-lines changed for more than an hour, I believe it may be quite safe (maybe by chance? taking the least intrusive path).
Tell me if I understand this right (all this is my code, but a bit old and I'm no longer really comfortable with how all this works):

lvrend.cpp: ok, even if we are in an end-of-book footnote line of text, we gather links in that line and associate them to the pagecontext line (we didn't before). Shouldn't hurt if we do nothing with them.

lvpagesplitter.cpp:

  1. when adding a normal line (or a group of consecutive break-avoid lines), we go looking at the links gathered and associated to these normal lines. For each line:
  2. if the line is from an end-of-book footnote line of text, we don't do the following, so no inline-bottom footnotes stuff added for such lines/pages. Otherwise:
  3. We get all the links in that line and store them as notes. See (A) below.
  4. For each of these links (items in notes):
  5. we check if it is empty, already shown on current page, or having its associated actual_footnote (I already have forgotten what this is all about :)) already shown on current page, and if any of that, we skip it and the following. Otherwise:
    (New stuff starting here)
  6. For each footnote object associated to that link, we look at each of its lines
  7. For each of these lines, we look at each of its links
  8. For each link, and unlike 5) above, we check if it is already present in notes (and not in cur_page_seen_footnotes like in 5) which may be fine - See (B) below.). We also check if its associated actual_footnote is present in notes, and if it is empty. If any of that, we ignore it. Otherwise:
  9. we append it to notes - and it will be processed soon when we go back looping at 4), ensuring in 5) the checks we did differently in 8). See (C) below.
    (New stuff ending here)
  10. we process to handle the regular footnote we got at 5)

So, if on a line we get footnote number 20 and 21 - and if 20 has nested footnotes 33 and 34, we will show at the bottom of pages: 20, 21, 33 and 34. I would expect to see 20, 33, 34 and only then 21, but I'm fine with the other (given how rare this is). (Dunno if inserting at j+1 instead of add()/appending would just solve that without any side effect.)

(B) (C) Somehow, it feels that adding them to notes instead of directly processing them, which I found initially strange, makes it all safer. We avoid a lot of checks and duplicated code, and if there are circular references among the footnotes from that main text line and its nested footnotes, the fact that we check their presence in notes will avoid that: they will be processed only once in that bit of code.
If there are circular references across lines of main text, they will be handled by the cur_page_seen_footnotes.indexOf(note) one like we actually do (it may still cause redundant footnotes on consecutive pages if many footnotes have the same nested footnotes, but that can already happen with main line of text having the same footnote).
In general, we consider a nested footnote just like an original line footnote, but because of the checks for its presence in notes, we avoid recursion/duplication, which feels like the by-chance-or-not good idea :)
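
To make the walkthrough above concrete, here is a minimal, self-contained sketch of the "append nested notes to notes and let the outer loop pick them up" idea. It is not the crengine code: Note, collectNested and std::vector are hypothetical stand-ins for LVFootNote, the loop in lvpagesplitter.cpp and LVFootNoteList, and everything about pages and delayed footnotes is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a footnote: an id plus the footnotes that
// the lines of its own text link to.
struct Note {
    int id;
    std::vector<Note*> links;
};

// Expands `notes` in place. Nested notes are appended only when they are
// not already in `notes`, so each reachable note is visited exactly once,
// even if footnotes link back to each other.
inline void collectNested(std::vector<Note*>& notes) {
    for (std::size_t j = 0; j < notes.size(); ++j) {
        Note* note = notes[j];  // copy the pointer: push_back may reallocate
        assert(note != nullptr);
        for (Note* nested : note->links) {
            bool alreadyListed =
                std::find(notes.begin(), notes.end(), nested) != notes.end();
            if (!alreadyListed)
                notes.push_back(nested);  // processed later by the outer loop
        }
    }
}
```

With footnote 20 linking to 21 and 22, and both of those linking back to 20, a line that references only 20 ends up with notes holding 20, 21, 22: the back-links are found in notes and skipped, so the loop terminates.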

(A) about LVFootNoteList * notes{line->getLinks()};

class LVFootNoteList : public LVArray<LVFootNote*> {
public:
    LVFootNoteList() {}
};

LVArray( const LVArray & v )
{
    _size = _count = v._count;
    if ( _size ) {
        _array = new T[_size];
        for (int i=0; i<_count; i++)
            _array[i] = v._array[i];
    } else {
        _array = NULL;
    }
}

I'm not super comfortable with these types - and in the past I had some surprises (memory leaks, or crash while attempting to free an already freed thingie) while working on lvpagesplitter, so it may be why I stuck with long chains for accessing stuff like line->getLinks()->get(j) instead of using intermediate variables.
In your case, I believe LVFootNoteList * notes{line->getLinks()}; will make a copy (on the stack) of that array. Am I right ? Or is it just a pointer to that same object?
Often for nothing as in 99.999% of cases, we won't find any nested footnote (so small performance loss, which may be nothing in the grand scheme of what happens while splitting pages, dunno).
But when we do, it will be added to that copy (instead of the main one).
Dunno if destroy'ing that copy will destroy the item - and if later destroying the original could cause issues.
And if destroying nested notes object could cause issues.
Also dunno if we could just not need a copy and append it to the original one (it feels like they are not re-used).
Also dunno if I'm just talking crap :)
@benoit-pierre : can you give a third quick thought about this ? If it is safe and I'm just too worried ?

If I got that right and we all think it is fine and we go on with that, I may suggest more and more accurate comments (the 2 comments you added feel a bit wrong), so it gets easier to get back into what all this is about.

@moben
Contributor Author

moben commented Feb 22, 2025

I think you got it all correct.

Another reason why I think it's quite safe is that the book I've mainly used for testing actually has infinite recursion and this patch handles that. Note that all footnotes link back from their number to the origin location, so footnote 21 and 22 link back to footnote 20. If we didn't check correctly for duplicate entries, we'd get another instance of footnote 20 after 21 and 22.

because of the checks for its presence in notes, we avoid recursion/duplication, which feels like the by-chance-or-not good idea :)

Yes, that was intentional. But what's more critical and what I tried to explain in the comment:
If we only checked cur_page_seen_footnotes then we could get duplicates when footnotes are split across pages, because of the aforementioned back-links.

In an earlier version of this patch I checked cur_page_seen_footnotes (and had the new notes loop after the continue for delayed footnotes). Then, if e.g. footnote 22 ended up on the next page, you'd get footnote 20 again because it wasn't in cur_page_seen_footnotes (of the next page). Collecting all nested footnotes and then deciding to potentially delay them avoids this.

I'll review the comments and try to make that clearer.

LVFootNoteList * notes{line->getLinks()};

This just calls the copy constructor of LVFootNoteList*, so it just copies the pointer. I can

  1. change it to LVFootNoteList * notes = line->getLinks();
  2. just duplicate the getLinks() call everywhere

if you prefer. (1) is probably more the style of the current code anyway, I missed that when reviewing it myself before creating the PR.

Edit

Forgot about this:

So, if on a line we get footnote number 20 and 21 - and if 20 has nested footnotes 33 and 34, we will show at the bottom of pages: 20, 21, 33 and 34. I would expect to see 20, 33, 34 and only then 21, but I'm fine with the other (given how rare this is). (Dunno if inserting at j+1 instead of add()/appending would just solve that without any side effect.)

I'll look into that, inserting at j+1 sounds better.

@poire-z
Contributor

poire-z commented Feb 22, 2025

LVFootNoteList * notes{line->getLinks()};

This just calls the copy constructor of LVFootNoteList*, so it just copies the pointer.

Does it ? Does it not call the snippet of lvarray.h I mentioned, actually creating a new array referencing all the items of the original one ? (Genuinely asking, I'm not really mastering C++ :/)

I can
1. change it to LVFootNoteList * notes = line->getLinks();
2. just duplicate the getLinks() call everywhere
if you prefer. (1) is probably more the style of the current code anyway

Dunno, is 1) then different from your original one, copying/referencing the same pointer to the object ?
Depends on what you all think, but I naively prefer (1) if we are all certain hacking the original array causes no issue because we don't reuse it (which I think is true).

@benoit-pierre
Contributor

LVFootNoteList * notes{line->getLinks()};

This just calls the copy constructor of LVFootNoteList*, so it just copies the pointer.

Does it ? Does it not call the snippet of lvarray.h I mentioned, actually creating a new array referencing all the items of the original one ? (Genuinely asking, I'm not really mastering C++ :/)

I can

  1. change it to LVFootNoteList * notes = line->getLinks();
  2. just duplicate the getLinks() call everywhere
    if you prefer. (1) is probably more the style of the current code anyway

Dunno, is 1) then different from your original one, copying/referencing the same pointer to the object ? Depends on what you all think, but I naively prefer (1) if we are all certain hacking the original array causes no issue because we don't reuse it (which I think is true).

The LVFootNoteList * notes{line->getLinks()}; syntax is C++ specific and equivalent to LVFootNoteList * notes = line->getLinks();. Better stick to the latter: more readable and standard.

@moben
Contributor Author

moben commented Feb 22, 2025

These two just assign the pointer and, to the best of my knowledge, are identical:

LVFootNoteList * notes = line->getLinks();
LVFootNoteList * notes{line->getLinks()};

These two are also identical, but invoke the copy constructor that you linked:

LVFootNoteList notes = *line->getLinks();
LVFootNoteList notes{*line->getLinks()};

The reason I generally use {} is that there are some cases with numeric types where it's safer so I just always use it. See https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-list. Just forgot to adjust to the style here, and in this case it makes 0 difference.
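
A small sketch of the distinction being discussed, for anyone who wants to check it outside crengine. List here is a hypothetical stand-in for LVFootNoteList with a copy constructor that counts its calls; the function names are made up for the illustration. Initializing a pointer with =, {} or () only copies the pointer, while initializing an object from the dereferenced pointer runs the copy constructor.

```cpp
#include <cassert>
#include <vector>

// Counts how often List's copy constructor has run.
static int copyCount = 0;

// Hypothetical stand-in for LVFootNoteList.
struct List {
    std::vector<int> items;
    List() = default;
    List(const List& other) : items(other.items) { ++copyCount; }
};

// Pointer forms: all three name the same object, no List is copied.
inline int copiesForPointerForms(List* original) {
    int before = copyCount;
    List* a = original;
    List* b{original};
    List* c(original);
    a->items.push_back(42);  // visible through b, c and original
    assert(b == original && c == original);
    return copyCount - before;
}

// Value forms: each one builds a separate List via the copy constructor.
inline int copiesForValueForms(List* original) {
    int before = copyCount;
    List x = *original;
    List y{*original};
    x.items.push_back(7);  // changes only the copy
    (void)y;
    return copyCount - before;
}
```

Under these assumptions, copiesForPointerForms reports 0 copies and the element it adds shows up in the original list, while copiesForValueForms reports 2 copies and leaves the original untouched.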

Edit: Didn't refresh, so I missed @benoit-pierre's reply before sending mine 😅

@moben
Contributor Author

moben commented Feb 22, 2025

So, if on a line we get footnote number 20 and 21 - and if 20 has nested footnotes 33 and 34, we will show at the bottom of pages: 20, 21, 33 and 34. I would expect to see 20, 33, 34 and only then 21, but I'm fine with the other (given how rare this is). (Dunno if inserting at j+1 instead of add()/appending would just solve that without any side effect.)

I need to do some more testing on this, will update later.

@poire-z
Contributor

poire-z commented Feb 22, 2025

Thanks for your answers.
I realize that all along I didn't notice the { } in LVFootNoteList * notes{line->getLinks()} ...
and I was reading it with ( ) as LVFootNoteList * notes(line->getLinks()).

So, what would happen with LVFootNoteList * notes(line->getLinks()) (minus an added or removed * until we get no type error :)) ? Would the LVArray constructor I mentioned be called ?

for ( int nn=0; nn<nested_line->getLinks()->length(); nn++ ) {
    LVFootNote * nested_note = nested_line->getLinks()->get(nn);
    if ( notes->indexOf(nested_note) >= 0 )
        continue; // Already referenced (recursively) on this page
Contributor

A bit imprecise: here we don't look at the page, only at the current line's list of notes, which may or may not end up on this page. Suggestion (for my future rereading):
// Already referenced among the current line's notes

Contributor Author

Applied

Comment on lines 1183 to 1190
// Collect all nested footnotes before deciding if we delay the current note
// e.g. Footnote 3 links to 4, which links to 3
// We need to see the whole chain to decide to not add footnote 3 twice
// If we collected nested footnotes after `addFootnoteToPage(note_4)`, we might have
// decided to delay footnote 4. Then, we're invoked again on the next page and think
// we haven't seen footnote 3 yet (from the 4->3 link) and emit it again.
// Collecting all nested footnotes (and skipping duplicates) gives us the same result
// regardless of if some are delayed or not.
Contributor

When reading this earlier, the problem and solution of delaying the current note was not what I thought was the problem we're handling here. And it has no impact on how we decide if we delay or not (we just look at the current note), so this had confused me quite a bit.
But I understand it was what you were solving and how you came to doing this that way, but from a blank state of mind, it is not obvious why this is written :)

I would write:

// Collect all nested footnotes and add them to the current line's list of footnote links
// (avoiding duplicates) so we can just process them as if they were regular notes on that line.
//           (^which is what I took some time to get to understand)
// Additionally, this helps .... (your stuff about the footnote chains, recursion, delayed... if you think it's worth
// mentioning - for me it's just implied, the same thing as with duplicate notes on the same line, so not
// really needed to understand what we're doing, but your choice :))

Contributor Author

Thanks, will update the comment accordingly. Good idea to mention the big picture of what's going on, I lost that a bit.

I think the second part is still relevant because while it's implied that we avoid duplicates from the same line, that applies to the original line. The problem I'm trying to describe arises because we also go over all the lines in the (nested) footnotes. But we want to avoid duplicates in the context of the original line, and so we have to ensure we handle all of them here and check for duplicates. If we did (even part of) the recursion into nested notes after delaying footnotes to another page, we wouldn't know anymore which ones are duplicates in the original context.

It became clearer to me with how you phrased it, very helpful 👍

Contributor Author

Added your version of the first half.

I shrunk the second half. I hit the case I described earlier when testing extreme cases where I added more footnotes and nesting, but I think this was in an earlier version of this feature that did not correctly detect when not to show nested footnotes inside the actual footnotes, so the problem I described here can't happen.

What could still happen though is if we hit delayed footnotes. Then we just flush out the notes that we accumulated via pushDelayedFootnotes, which just calls addFootnoteToPage and doesn't check for nested or duplicate notes again. It still makes sense to mention here that we rely on doing these checks (per line) here.

@moben
Copy link
Contributor Author

moben commented Feb 22, 2025

So, if on a line we get footnote number 20 and 21 - and if 20 has nested footnotes 33 and 34, we will show at the bottom of pages: 20, 21, 33 and 34. I would expect to see 20, 33, 34 and only then 21, but I'm fine with the other (given how rare this is). (Dunno if inserting at j+1 instead of add()/appending would just solve that without any side effect.)

I implemented this now and did some testing with additional levels of nesting and more footnotes on one page. It should work like you described now and does indeed make more sense.

It couldn't be plain j+1 though because otherwise we'd end up with 20, 34, 33, 21, but the fix was simple enough to not introduce much complexity imo.
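
A rough sketch of why plain j+1 reverses the nested notes and how a per-note counter fixes it. This is an illustration under assumptions, not the merged code: Note, expandNested and the keepSourceOrder flag are hypothetical, std::vector stands in for LVFootNoteList, and the counter plays the role of the nested-notes count mentioned in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a footnote and the footnotes it links to.
struct Note {
    int id;
    std::vector<Note*> links;
};

// Inserts nested notes right after their parent instead of appending them.
// keepSourceOrder == false always inserts at j+1, which reverses siblings;
// keepSourceOrder == true offsets by the number already inserted for note j.
inline void expandNested(std::vector<Note*>& notes, bool keepSourceOrder) {
    for (std::size_t j = 0; j < notes.size(); ++j) {
        Note* note = notes[j];  // copy the pointer: insert may reallocate
        assert(note != nullptr);
        std::size_t nbNested = 0;
        for (Note* nested : note->links) {
            if (std::find(notes.begin(), notes.end(), nested) != notes.end())
                continue;  // already among the current line's notes
            std::size_t pos = j + 1 + (keepSourceOrder ? nbNested : 0);
            notes.insert(notes.begin() + static_cast<std::ptrdiff_t>(pos), nested);
            ++nbNested;
        }
    }
}
```

For a line referencing 20 and 21, where 20 nests 33 and 34, the counter variant yields 20, 33, 34, 21, and the plain j+1 variant yields 20, 34, 33, 21. Because a nested note is placed directly after its parent, deeper levels are expanded when the outer loop reaches them and land right after the note that references them.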

//
// This needs to happen before we decide if the notes are added to this page or delayed
// so delayed footnotes does not have to check for nested footnotes and duplicates again.
int num_nested_notes = 0;
Contributor

To my French brain, "num" sounds more like "numero" (meaning the index of a note) than "number" (what you mean, "nombre" in French). I'd rather see it written as "nb" which avoids the possible confusion.

Fine with your comments (except the blank line :)). Maybe wrap the previous ones shorter so you get a small third line and your next lines stand out better as a new paragraph. (Yes, I dislike "Generic web browser paragraph style" for my books and for my code :))

Contributor Author

done, thanks again for the review.

I actually never thought about it that way, but in my native German "num" would also read as "Nummer", i.e. index. Maybe the programming part of my brain is more compartmentalized to English, though that makes it hard to talk about code in any other language :)

@@ -8915,7 +8915,7 @@ void renderBlockElementEnhanced( FlowState * flow, ldomNode * enode, int x, int
// See if there are links to footnotes in that line, and add
// a reference to it so page splitting can bring the footnotes
// text on this page, and then decide about page split.
Contributor

Please add (so we know why the diff if we compare to the other similar handling like in legacy rendering):
// (We do this also if we are isFootNoteBody so we can handle footnotes nested in footnotes.)

Contributor

@poire-z poire-z left a comment


Thanks. Let it sit for a day or two before merging, in case of nested afterthoughts :)

@moben
Contributor Author

moben commented Feb 23, 2025

Thanks again. I'll be off for a few weeks to hopefully get some reading done 😁


Slightly off-topic here, but not sure where else to ask (gitter?)

I have some more elaborate inline footnote improvements that I'm testing, but it's still pretty work-in-progress so I didn't open a PR yet. Specifically, I'm looking at porting the footnote extension logic for not perfectly formatted books from base/cre.cpp:_isLinkToFootnote down to crengine because in many of my books the inline footnotes are cut off currently.

I got it working well in here, but it needs a lot of cleanup and I haven't found the correct place to put this logic yet (and need to adjust base afterwards to call this and keep the logic in one place).

Would you prefer a draft PR early for this sort of change to discuss the design or immediate concerns? (will still be some weeks until I have time again)

@poire-z
Contributor

poire-z commented Feb 23, 2025

I'm fine without anything to review for weeks :) (I'm also pretty busy).

I am really not fond of bringing this heuristics work from cre.cpp to crengine...
I explained my thoughts in koreader/koreader#11451 (comment) and the various links linked in its first part.
When a publisher has done some crappy job, we may have to live with it for that book.
(Even if you're thinking only about the extension, it doesn't feel it's in the spirit of what crengine should do: it's dumb, it follows rules, CSS rules, hints, and render. I'm also not sure that where you make that happen - early when rendering a block element, you can fully know about what comes next.)

@moben
Contributor Author

moben commented Apr 10, 2025

Hi, just checking if there's anything I can do to push this over the line. For what it's worth I've not found any problems with it so far, and I've been doing a bunch more testing with more books and elaborate footnotes with this branch combined with #618.

Thanks again for your thoughts and hints regarding the footnote extension topic by the way.

@poire-z
Contributor

poire-z commented Apr 14, 2025

It's fine and approved.
I'm just waiting for your 2 other PRs to settle in case there's some going back to this one.
Then we can merge the 3 before bumping crengine.
But if you want it merged now to rebase the other PRs, no pb, just tell me.

moben added 7 commits April 15, 2025 19:52
Instead of at the end.

        For example[1] like this[3]

        ---
        1 Some note with nested[2] note
        2 Nested note
        3 Other note

instead of

        For example[1] like this[3]

        ---
        1 Some note with nested[2] note
        3 Other note
        2 Nested note
I hit the described case in testing degenerate cases, but I think this
was in an earlier version of this feature that did not get the detection
for not showing nested footnotes inside the actual footnotes right.

What could still happen though is if we hit delayed footnotes. Then we
just flush out the notes that we accumulated and don't check for nested
ones or duplicates. It still makes sense to mention here that we rely on
doing these checks (per line) here.
For easier comparison to legacy handling
@moben moben force-pushed the footnotes_nested branch from c222267 to fb2ddd6 Compare April 15, 2025 23:02
@poire-z poire-z merged commit c304df1 into koreader:master Apr 21, 2025
1 check passed