Add complex condition finder to the closure segmenter. #171
Conversation
This is used to assign conditions to unmapped (fallback) glyphs that fail to have conditions detected by regular closure analysis. The complex detection tries to find a minimal purely disjunctive condition for each unmapped glyph. This condition is not the true condition for the glyph, but is a minimal purely disjunctive superset of the true condition.

Currently this is done as a post-merging step; however, I'm ultimately aiming to have this done pre-merging and to make merging capable of incrementally applying complex condition detection as needed.

This adds a new field to the segmenter config to control how unmapped glyphs are handled. There are three options:

- Move them to the initial font (existing default behaviour).
- Place them in an always loaded patch.
- Run the newly added complex condition detection and place them in more granular patches based on that.

Lastly, no tests have been added yet. Those are coming next.
docs/experimental/closure_glyph_segmentation_complex_conditions.md
> Also from the closure analysis run by the segmenter we may have discovered some partial conditions for glyphs. These can be incorporated as a starting point into the complex condition analysis.

> Furthermore we can reduce the amount of segments we need to test by checking which segments can interact in some way
Or COLR, or glyf component, or MATH, or ... ?
So this one is a bit speculative and definitely needs some further research to justify and be sure it holds in general (I'll update the doc to note that), but as far as I can tell from some initial investigation, COLR, glyph components, and MATH can only do one-to-many glyph substitutions. One-to-many glyph subs should only result in disjunctive conditions. Example:
`a_acute -> {a, acute}`, `e_acute -> {e, acute}` would result in `acute` being required when (`a_acute` or `e_acute`) is present. GSUB is the only table, as far as I know, that enables many-to-many subs, which is required for a conjunctive condition to show up.
That may be true, but this goes back to the lack of visibility of the closure mechanic. Most non-GSUB dependencies are post-GSUB, so a non-purely-disjunctive GSUB substitution could transfer onto any downstream dependency. That wouldn't matter if you had the graph, because you can tell what is upstream or downstream from whatever else. But if you don't have the graph you can't tell, just from looking at GSUB, what else is being affected by those GSUB lookups.
This in particular is an area where I think some more research/thinking is needed. I'm not 100% confident in its validity, but my instinct is that things which are downstream of GSUB won't matter, at least in the context of this particular analysis. Here's my rough thinking:
- For segments which don't interact with GSUB: the closure analysis will always surface those segments as a disjunctive condition for the glyphs regardless of what's going on in GSUB (since anything in the closure of a segment is always considered at minimum a disjunctive condition).
- Those glyphs may still have other conditions which come from GSUB or GSUB + downstream stuff, but those conditions should only involve segments that interact with GSUB, since to trigger the GSUB substitutions you have to start with segments that interact with it. (Note: this is the point I'm not 100% confident about; it definitely needs more research.)
- So then in this analysis you start with the partial disjunctive conditions, and set "all segments" to only the GSUB-interacting ones.

The good news is that even if this isn't 100% always true, we can still take this approach and then at the very end do an additional conditions check against all segments. If we find there are still additional conditions, then the analysis can be repeated using all segments instead of the GSUB-interacting ones.
Ultimately this is just a very light version of actual dependency analysis, and once we have a dependency graph integrated, that would be the better approach.
Updated this part to note the GSUB approach is speculative and also added a note that you could instead do real dependency analysis.
> As described above this approach can be slow since it processes glyphs one at a time. Improvements to performance can be made by processing glyphs in a batch. This can be done with a recursive approach where after each segment test the set of input glyphs gets split into those that require the tested segment and those that don't. Each of the splits spawns […]
There are also potential compromises to be made here that increase the efficiency of both the algorithm and the patch table by decreasing specificity. If you determine that segment A is relevant to glyph x and that segment B is relevant to glyph y, one can treat {A, B} as relevant to both. One might regret that depending on what happens next (how much relevance one combines by going down that road), but you can reduce glyph-specific processing and glyph-specific patch table entries in this way.
Some thoughts on this:

On the (substantial) positive side: as far as I can tell this does accomplish what it says on the tin (modified by the caveats I raised in the code comments). This is better than I had expected we could do with just the closure mechanic.

Shading into some concerns, and setting aside questions of encoder runtime performance, this still strikes me as a less than ideal place to address complex conditions, that place being after segmentation rather than before, when the conditions could affect what is grouped with what in the first place. In this implementation we're basically doing this analysis at this point because there is no alternative -- you need large-ish established groupings to bring it into the realm of acceptable performance. So with the fonts that the current baseline encoder was struggling with, the result will be much better and more specific, but instead of putting many glyphs in "patch 0" (or whatever it's called) we'll now have many add-on patch entries for those glyphs, probably grouped together somewhat using dependency similarity heuristics. Definitely better, but still (arguably?) a bit janky.

The question all of this processing is trying to answer -- what glyphs (represented by segments in this analysis) are dependent ancestors of what other glyphs -- is a more or less trivial question if one had access to the dependency graph. So why not just use a dependency graph, which is relatively cheap to produce (on the order of one or two closure runs)? Well ...
Point 4 seems like the most legitimate objection. After I wrote the dependency extractor I spent a long time looking at a blank screen trying to come up with a good collection of cooperative heuristics before I had to move on to other projects. In the context of this PR that would be equivalent to: maybe we don't want to address complex conditions earlier in processing because we don't understand how to take advantage of that. On the other hand:
I guess all of this is a long-winded explanation of why I think it would make sense to revisit the dependency graph option rather than committing to exclusive reliance on the closure mechanic.
Force-pushed 777839a to 7a12b19.
I find the "superset" terminology counter-intuitive. It's a subset of the terms in the complex condition. Since it's being treated as a disjunction, it's also less rather than more inclusive of segments. (That is, if all the segments relevant to the complex condition were found, more segments would be included in the disjunction, and therefore more glyphs.) So what is it a superset of?
The result of the analysis is not guaranteed to include all segments in the true condition, only that the found condition is a superset of the true condition, comprised only of segments from the true condition. This updates the doc to remove the claim that found conditions will contain all true-condition segments. Additionally this reworks the 'Why this works' section to be a bit more rigorous.
Force-pushed 7a12b19 to b83bcb1.
It's a superset in terms of activation: it will activate whenever the complex condition would, plus it may activate in other cases where the complex condition wouldn't have.
So the plan is to move this analysis to before merging so that discovered conditions can participate in the merging process. At the moment I'm sorting out the details of the incremental invalidation and reprocessing that this introduces. As for performance, it remains to be seen, but my expectation is that doing the analysis before merging won't significantly increase the computational cost, since the additional analysis performed during merging is incremental.
Completely agree that we want to incorporate dependency analysis. Among other advantages, there are definitely some significant performance gains versus the pure closure approach. I haven't spent as much time as I'd like looking at it, but this is currently my thinking (which lines up pretty closely with some of your suggestions above):
I have a few things I'd like to wrap up with the current implementation, for example moving complex condition handling to pre-merge and handling overlapping scripts. After that, though, we are in a good position to start investigating incorporating a dependency graph into the segmenter.
Ah, OK, yes, I think I can see that.

Sounds like we're in very similar places then. This is good stuff.
Mention true dependency analysis as an alternative option, note that the GSUB scoping is speculative.
This finder can locate the set of segments that are involved in any complex conditions for glyphs (conditions which have a mix of conjunction and disjunction). We can then use that set of segments to form a purely disjunctive condition which is a superset of the true complex condition. This superset condition can then be used for patches and will not violate the closure requirement. Ultimately this allows us to assign narrower conditions to glyphs that would otherwise be placed in the always loaded fallback patch.
This is an initial pass at an implementation, but more work will be needed:
The complex condition analysis step is currently optional and enabled via a new segmenter config setting:
`unmapped_glyph_handling`