Conversation

@valadaptive (Contributor) commented Aug 24, 2025

Rustybuzz has gone pretty much dormant, and HarfRust is its unofficial successor. It lives under the HarfBuzz organization, and instead of ttf-parser, it uses read-fonts from Fontations (which Swash also uses, albeit a different version).

A few tweaks to the code here let us get rid of ttf-parser as well and use Skrifa (a higher-level library built atop read-fonts) instead.

Once the version of Skrifa used in Swash is bumped, this should consolidate dependencies even further.

The last consumer of ttf-parser is fontdb, which uses an ancient version of it and apparently "isn't maintained anymore".

I can't see any difference in the Hebrew snapshot tests, unless VS Code's LFS diff previews are broken or something. It looks like these tests just always fail locally.

@valadaptive (Contributor, Author) commented Aug 24, 2025

Shaping performance is currently ~25-40% worse with HarfRust on text_shaping_benchmarks. This mostly seems to be due to the cost of constructing a ShapePlan: parsing the script and feature lists is far more expensive in HarfRust than in rustybuzz. Here's the Samply profile.

Should we be caching the ShapePlans? (Since the user_features array contains text ranges, it doesn't seem like we can reuse a ShapePlan to shape different pieces of text.)

(/cc @dfrg)

@jackpot51 (Member) commented

I agree with switching to harfrust. I will need to run lots of tests first, and I'll be busy for the next couple of weeks.

@nicoburns (Contributor) commented

> Should we be caching the ShapePlans? (Since the user_features array contains text ranges, it doesn't seem like we can reuse a ShapePlan to shape different pieces of text.)

I'm not exactly an expert, but every time this comes up, caching ShapePlans is mentioned as "a very important optimisation", so I think you should definitely be doing that.

See also:

  • Some API changes harfbuzz/harfrust#57: the HarfRust PR which discusses the design behind its API
  • Cache HarfRust structs linebender/parley#406: the Parley PR adding caching to its HarfRust usage. Note that all of ShaperData, ShaperInstance, and ShapePlan are cached, and that the cache is a smallish LRU cache (currently 16 entries), the idea being that there are typically only a few unique fonts (or configurations of fonts) in use in any given run of text
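The small-LRU idea described above can be sketched like this (a std-only illustration; `LruCache` and the `(font_id, script)` key are made up for the example and are not Parley's or HarfRust's actual types). A linear scan is fine at this size:

```rust
use std::collections::VecDeque;

/// Minimal fixed-capacity LRU cache in the spirit described above.
struct LruCache<K: PartialEq, V> {
    entries: VecDeque<(K, V)>,
    capacity: usize,
}

impl<K: PartialEq, V> LruCache<K, V> {
    fn new(capacity: usize) -> Self {
        Self { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Return the cached value for `key`, building and inserting it on a miss.
    fn get_or_insert_with(&mut self, key: K, make: impl FnOnce() -> V) -> &V {
        if let Some(pos) = self.entries.iter().position(|(k, _)| *k == key) {
            // Hit: move the entry to the front (most recently used).
            let hit = self.entries.remove(pos).unwrap();
            self.entries.push_front(hit);
        } else {
            if self.entries.len() == self.capacity {
                // Evict the least recently used entry.
                self.entries.pop_back();
            }
            self.entries.push_front((key, make()));
        }
        &self.entries.front().unwrap().1
    }
}

fn main() {
    // Keyed by a stand-in for (font id, script); strings stand in for plans.
    let mut cache: LruCache<(u32, &str), String> = LruCache::new(16);
    cache.get_or_insert_with((0, "Latn"), || "plan A".to_string());
    cache.get_or_insert_with((0, "Zzzz"), || "plan B".to_string());
    // Second lookup is a hit and does not rebuild the plan.
    let plan = cache.get_or_insert_with((0, "Latn"), || unreachable!());
    println!("{plan}");
}
```

With only ~16 entries, the linear scan plus move-to-front is typically cheaper than maintaining a hash map, and it keeps the handful of hot font configurations resident.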

@valadaptive (Contributor, Author) commented

Maybe I still don't get how the shape plan API works, because the presence of the features array seems to indicate that each shape plan is meant to shape just one run of text. Sure, if you don't pass any features or each feature's span is infinite (as is the case here), then you can reuse them. But if an API consumer did genuinely want to pass in features that are only enabled for certain spans of text, then each shape plan would be tied to those features and the span indices to which they apply.

@nicoburns (Contributor) commented

> Maybe I still don't get how the shape plan API works, because the presence of the features array seems to indicate that each shape plan is meant to shape just one run of text. Sure, if you don't pass any features or each feature's span is infinite (as is the case here), then you can reuse them. But if an API consumer did genuinely want to pass in features that are only enabled for certain spans of text, then each shape plan would be tied to those features and the span indices to which they apply.

I think the idea is that it's common to have large amounts of text with exactly the same font, features, etc. It's also common to have a few styles that are switched between, for example "switch into bold (or italic) for one word/sentence and then back to regular text" or even "switch into the heading style for a run of text and then back to the body style". So if you cache a few ShapePlans then there's a good chance that you'll be able to reuse one of the existing ones for the next text you come to shape.

If you make that cache persistent, then you can also likely reuse that cache across frames where e.g. you change the text content but keep the same styles.

@dfrg commented Aug 27, 2025

> Maybe I still don't get how the shape plan API works, because the presence of the features array seems to indicate that each shape plan is meant to shape just one run of text. Sure, if you don't pass any features or each feature's span is infinite (as is the case here), then you can reuse them. But if an API consumer did genuinely want to pass in features that are only enabled for certain spans of text, then each shape plan would be tied to those features and the span indices to which they apply.

The short answer is that you can reuse a plan as long as the feature sets are the same with regard to tag, value, and whether or not the feature is global. That is, the actual indices of the range-limited features don't matter when constructing a shape plan. This is the same behavior as HarfBuzz.

edit: the reason is that non-global features require mask bits to be allocated and those allocations are fixed in the plan.
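A toy illustration of that rule (the `Feature` and `PlanFeature` types here are invented for the example; they are not harfrust's actual API): two range-limited features with the same tag and value produce the same plan key even though their ranges differ.

```rust
/// Hypothetical stand-in for a shaping feature as passed by the caller.
#[derive(Clone, Copy)]
struct Feature {
    tag: [u8; 4],
    value: u32,
    start: u32,
    end: u32,
}

/// The part of a feature that a shape plan actually depends on.
#[derive(PartialEq, Eq, Debug)]
struct PlanFeature {
    tag: [u8; 4],
    value: u32,
    is_global: bool,
}

fn plan_feature(f: &Feature) -> PlanFeature {
    PlanFeature {
        tag: f.tag,
        value: f.value,
        // Only *whether* the feature covers the whole buffer matters to the
        // plan; the concrete indices of a limited range do not.
        is_global: f.start == 0 && f.end == u32::MAX,
    }
}

fn main() {
    let a = Feature { tag: *b"smcp", value: 1, start: 3, end: 9 };
    let b = Feature { tag: *b"smcp", value: 1, start: 40, end: 55 };
    // Different ranges, same plan key: a cached plan built for `a`
    // can shape text that uses `b`.
    assert_eq!(plan_feature(&a), plan_feature(&b));
}
```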

@valadaptive (Contributor, Author) commented

> The short answer is that you can reuse a plan as long as the feature sets are the same with regard to tag, value, and whether or not the feature is global. That is, the actual indices of the range-limited features don't matter when constructing a shape plan. This is the same behavior as HarfBuzz.
>
> edit: the reason is that non-global features require mask bits to be allocated and those allocations are fixed in the plan.

This would be a great thing to put in the documentation!

@jackpot51 (Member) commented

HarfRust recently had a new release; I'd recommend updating this PR to use it, as it includes performance improvements.

@jackpot51 (Member) commented

Thanks! This next week I will be evaluating this.

@jackpot51 jackpot51 self-assigned this Sep 7, 2025
@jackpot51 jackpot51 moved this to Beta in COSMIC Epoch 1 Sep 7, 2025
@valadaptive (Contributor, Author) commented

I've updated HarfRust, which provides a ~3% perf boost. Much more significantly, I've added a shape_plan_cache to go along with the shape_run_cache. This provides massive speedups (3-4x) compared to current main:

Current
ShapeLine/ASCII Fast Path
                        time:   [1.5013 ms 1.5034 ms 1.5064 ms]
                        change: [+244.72% +245.83% +247.13%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

ShapeLine/BiDi Processing
                        time:   [2.9482 ms 2.9506 ms 2.9531 ms]
                        change: [+187.90% +188.80% +189.48%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

ShapeLine/Layout Heavy  time:   [3.2056 ms 3.2113 ms 3.2192 ms]
                        change: [+283.52% +285.92% +288.05%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

ShapeLine/Combined Stress
                        time:   [17.280 ms 17.303 ms 17.328 ms]
                        change: [+240.02% +240.59% +241.23%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

BidiParagraphs/ASCII    time:   [2.8834 µs 2.8856 µs 2.8879 µs]
                        change: [-0.2879% -0.0953% +0.1065%] (p = 0.36 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe

BidiParagraphs/Mixed    time:   [33.798 µs 33.828 µs 33.861 µs]
                        change: [-3.6776% -3.5828% -3.4934%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  16 (16.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
This PR w/ `shape_plan_cache`

ShapeLine/ASCII Fast Path
                        time:   [440.47 µs 441.37 µs 442.42 µs]
                        change: [-70.930% -70.832% -70.748%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking ShapeLine/BiDi Processing: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s, enable flat sampling, or reduce sample count to 60.
ShapeLine/BiDi Processing
                        time:   [1.0094 ms 1.0103 ms 1.0111 ms]
                        change: [-65.820% -65.778% -65.736%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

ShapeLine/Layout Heavy  time:   [828.27 µs 829.95 µs 831.82 µs]
                        change: [-74.280% -74.207% -74.145%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

ShapeLine/Combined Stress
                        time:   [5.0084 ms 5.0170 ms 5.0280 ms]
                        change: [-71.070% -71.006% -70.933%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

BidiParagraphs/ASCII    time:   [2.8644 µs 2.8662 µs 2.8684 µs]
                        change: [-0.6308% -0.4049% -0.1998%] (p = 0.00 < 0.05)
                        Change within noise threshold.

BidiParagraphs/Mixed    time:   [35.050 µs 35.106 µs 35.186 µs]
                        change: [+3.2408% +3.3620% +3.4938%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Not sure how I feel about adding another cache that the user has to clear on their own, or whether it's "cheating" the benchmark to use it.

It looks like the most granular shaping level that's publicly exposed to the API consumer is line-level (via BufferLine::layout), so we could move the shape plan cache to ShapeBuffer and clear it after every shaped line. This results in worse performance, though: 700µs/iter for "ShapeLine/ASCII Fast Path" (vs 450µs/iter for the current shape_plan_cache), and 1.7ms/iter for "ShapeLine/BiDi Processing" (vs 1ms/iter). If the privacy of APIs were more carefully considered, and we didn't need to guarantee that laying out single lines over and over won't blow up the cache and cause unbounded memory consumption, we could clear the cache at layout-level granularity, which should amortize the cost much better.

In general, caching seems to be a bit of a mess currently. There's shape_run_cache, which the user needs to clear manually, but also font_matches_cache, which has a fixed capacity of 256 (which could turn into a performance cliff). Maybe it's better to just use LRU or LFU caches everywhere and let the user configure their capacities? Although for UI purposes, an "increment cache generation and prune" operation does make sense, since we can always run it after each frame...
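The "increment cache generation and prune" idea could look roughly like this (an illustrative, std-only sketch, not cosmic-text's actual API): each entry records the generation in which it was last used, and a per-frame trim drops everything that wasn't touched.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Generation-based cache: entries unused for a full frame are pruned.
struct GenerationCache<K, V> {
    generation: u64,
    entries: HashMap<K, (u64, V)>,
}

impl<K: Hash + Eq, V> GenerationCache<K, V> {
    fn new() -> Self {
        Self { generation: 0, entries: HashMap::new() }
    }

    fn get_or_insert_with(&mut self, key: K, make: impl FnOnce() -> V) -> &V {
        let generation = self.generation;
        let entry = self
            .entries
            .entry(key)
            .or_insert_with(|| (generation, make()));
        entry.0 = generation; // mark as used in the current generation
        &entry.1
    }

    /// Run once per frame: drop entries not used since the last trim,
    /// then start a new generation.
    fn trim(&mut self) {
        let current = self.generation;
        self.entries.retain(|_, (used, _)| *used == current);
        self.generation += 1;
    }
}

fn main() {
    let mut cache: GenerationCache<&str, u32> = GenerationCache::new();
    cache.get_or_insert_with("body", || 1);
    cache.get_or_insert_with("heading", || 2);
    cache.trim(); // both survive: used this generation
    cache.get_or_insert_with("body", || 1);
    cache.trim(); // "heading" was not used this frame and is dropped
    assert_eq!(cache.entries.len(), 1);
}
```

This bounds memory to what the last frame actually used while keeping hot entries warm across frames, at the cost of requiring the user to call `trim` at a sensible point.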

@jackpot51 (Member) commented

Shape run cache is optional. We had a shape plan cache, but it was removed for performance and memory usage reasons. So long as it won't grow forever and generally increases performance (try the UHDR sample linked in the README), it should be OK.

@jackpot51 (Member) commented

I agree with improving caching control generally.

@valadaptive (Contributor, Author) commented

I put a simple "least recently added" VecDeque cache into the ShapeBuffer struct. It stores the 6 most recently added shape plans and uses harfrust::ShapePlanKey<'_> to check whether any of them match the current shaper options. harfrust::ShapePlanKey<'_> does not implement Hash and has a lifetime parameter, so it's intended to be ephemeral and cannot key a HashMap, but it works well for checking equality against the (at most 6) shape plans in our cache. This approach avoids making the user clear the cache themselves, and seems to be slightly faster than hashing.

At first, I tried storing just the single most recently used shape plan, but we shape whitespace separately, and its script will always be Zzzz (unknown). Shaping e.g. Latin text would therefore ping-pong between a shape plan with the Latn script (for a word) -> a shape plan with the Zzzz script (for the space) -> Latn again, and so on. I chose 6 plans because that's what seems to be necessary for the "BiDi Processing" benchmark.

I added a benchmark which shapes sample/hello.txt, and that is slower with the VecDeque than with the HashMap-based approach: it seems to attempt shaping with hundreds of different fonts, easily overflowing the cache. IMO, the root cause of the slowdown is that we're shaping the same piece of text hundreds of times during font fallback.
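The cache described above can be sketched as follows (a simplified, std-only sketch; `PlanKey` and `Plan` are placeholders for the real harfrust types, and the linear equality scan mirrors the no-hashing approach):

```rust
use std::collections::VecDeque;

#[derive(PartialEq)]
struct PlanKey {
    script: [u8; 4], // placeholder for the full set of shaper options
}

struct Plan; // placeholder for a compiled shape plan

const CAPACITY: usize = 6;

/// "Least recently added": entries keep insertion order and are never
/// reordered on a hit; the oldest entry is dropped when the cache is full.
struct ShapePlanCache {
    plans: VecDeque<(PlanKey, Plan)>,
}

impl ShapePlanCache {
    fn new() -> Self {
        Self { plans: VecDeque::with_capacity(CAPACITY) }
    }

    fn get_or_build(&mut self, key: PlanKey, build: impl FnOnce() -> Plan) -> &Plan {
        let idx = match self.plans.iter().position(|(k, _)| *k == key) {
            Some(pos) => pos, // hit: reuse the plan without reordering
            None => {
                if self.plans.len() == CAPACITY {
                    self.plans.pop_front(); // drop the oldest plan
                }
                self.plans.push_back((key, build()));
                self.plans.len() - 1
            }
        };
        &self.plans[idx].1
    }
}

fn main() {
    let mut cache = ShapePlanCache::new();
    // Latin word -> whitespace (Zzzz) -> Latin again: the second Latn
    // lookup is a hit, avoiding the ping-pong a 1-entry cache would have.
    cache.get_or_build(PlanKey { script: *b"Latn" }, || Plan);
    cache.get_or_build(PlanKey { script: *b"Zzzz" }, || Plan);
    cache.get_or_build(PlanKey { script: *b"Latn" }, || unreachable!());
}
```

Skipping the move-to-front step avoids shuffling the deque on every hit, which is a reasonable trade-off when the whole cache fits in a handful of entries.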

@WatchMkr WatchMkr added this to the beta milestone Sep 8, 2025
@jackpot51 (Member) commented

Are there further optimizations you want to try, or is this ready for merge?

@valadaptive (Contributor, Author) commented

This is ready to merge now. If I can think of any more optimizations, I'll leave them for the future.

@jackpot51 jackpot51 merged commit 2610c86 into pop-os:main Sep 9, 2025
1 of 2 checks passed
@WatchMkr WatchMkr moved this from Beta to Complete in COSMIC Epoch 1 Sep 15, 2025