Add complex condition finder to the closure segmenter. #171
Conversation
This is used to assign conditions to unmapped (fallback) glyphs that fail to have conditions detected by regular closure analysis. The complex detection tries to find a minimal purely disjunctive condition for each unmapped glyph. This condition is not the true condition for the glyph, but is a minimal purely disjunctive superset of the true condition.

Currently this is done as a post-merging step; however, I'm ultimately aiming to have this done pre-merging and to make merging capable of incrementally applying complex condition detection as needed.

This adds a new field to the segmenter config to control how unmapped glyphs are handled. There are three options:

- Move them to the initial font (existing default behaviour).
- Place them in an always loaded patch.
- Run the newly added complex condition detection and place them in more granular patches based on that.

Lastly, no tests have been added yet. Those are coming next.
docs/experimental/closure_glyph_segmentation_complex_conditions.md
> Also from the closure analysis run by the segmenter we may have discovered some partial conditions for glyphs. These can be incorporated as a starting point into the complex condition analysis.

> Furthermore we can reduce the amount of segments we need to test by checking which segments can interact in some way
Or COLR, or glyf component, or MATH, or ... ?
So this one is a bit speculative and definitely needs some further research to justify and be sure it holds in general (I'll update the doc to note that), but as far as I can tell from some initial investigation, COLR, glyph components, and MATH can only do one-to-many glyph substitutions. One-to-many glyph subs should only result in disjunctive conditions. Example:
`a_acute -> {a, acute}`, `e_acute -> {e, acute}` would result in `acute` being required when (`a_acute` or `e_acute`) is present. GSUB is the only table, as far as I know, that enables many-to-many subs, which is required for a conjunctive condition to show up.
That may be true, but this goes back to the lack of visibility of the closure mechanic. Most non-GSUB dependencies are post-GSUB, so a non-purely-disjunctive GSUB substitution could transfer onto any downstream dependency. That wouldn't matter if you had the graph, because you can tell what is upstream or downstream from whatever else. But if you don't have the graph you can't tell, just from looking at GSUB, what else is being affected by those GSUB lookups.
This in particular is an area where I think some more research/thinking is needed. I'm not 100% confident in its validity, but my instinct is that things which are downstream of GSUB won't matter, at least in the context of this particular analysis. Here's my rough thinking:
- For segments which don't interact with GSUB: the closure analysis will always surface those segments as a disjunctive condition for the glyphs regardless of what's going on in GSUB (since anything in the closure of a segment is always considered at minimum a disjunctive condition).
- Those glyphs may still have other conditions which come from GSUB or GSUB + downstream stuff, but those conditions should only involve segments that interact with GSUB, since to trigger the GSUB substitutions you have to start with segments that interact with it. (Note: this is the point I'm not 100% confident about; it definitely needs more research.)
- So then in this analysis you start with the partial disjunctive conditions, and set "all segments" to only the GSUB-interacting ones.

The good news is that even if this isn't 100% always true, we can still take this approach and then at the very end do an additional conditions check against all segments. If we find there are still additional conditions, then the analysis can be repeated using all segments instead of the GSUB-interacting ones.
Ultimately this is just a very light version of actual dependency analysis, and once we have a dependency graph integrated, that would be the better approach.
Updated this part to note the GSUB approach is speculative and also added a note that you could instead do real dependency analysis.
> As described above this approach can be slow since it processes glyphs one at a time. Improvements to performance can be made by processing glyphs in a batch. This can be done with a recursive approach where after each segment test the set of input glyphs gets split into those that require the tested segment and those that don't. Each of the splits spawns […]
There are also potential compromises to be made here that increase the efficiency of both the algorithm and the patch table by decreasing specificity. If you determine that segment A is relevant to glyph x and that segment B is relevant to glyph y, one can treat {A, B} as relevant to both. One might regret that depending on what happens next (how much relevance one combines by going down that road), but you can reduce glyph-specific processing and glyph-specific patch table entries in this way.
Some thoughts on this:

On the (substantial) positive side: as far as I can tell this does accomplish what it says on the tin (modified by the caveats I raised in the code comments). This is better than I had expected we could do with just the closure mechanic.

Shading into some concerns, and setting aside questions of encoder runtime performance, this still strikes me as a less than ideal place to address complex conditions, that place being after segmentation rather than before, when the conditions could affect what is grouped with what in the first place. In this implementation we're basically doing this analysis at this point because there is no alternative -- you need large-ish established groupings to bring it into the realm of acceptable performance. So with the fonts that the current baseline encoder was struggling with, the result will be much better and more specific, but instead of putting many glyphs in "patch 0" (or whatever it's called) we'll now have many add-on patch entries for those glyphs, probably grouped together somewhat using dependency similarity heuristics. Definitely better, but still (arguably?) a bit janky.

The question all of this processing is trying to answer -- what glyphs (represented by segments in this analysis) are dependent ancestors of what other glyphs -- is a more or less trivial question if one had access to the dependency graph. So why not just use a dependency graph, which is relatively cheap to produce (on the order of one or two closure runs)? Well ...
Point 4 seems like the most legitimate objection. After I wrote the dependency extractor I spent a long time looking at a blank screen trying to come up with a good collection of cooperative heuristics before I had to move on to other projects. In the context of this PR that would be equivalent to: maybe we don't want to address complex conditions earlier in processing because we don't understand how to take advantage of that. On the other hand:
I guess all of this is a long-winded explanation of why I think it would make sense to revisit the dependency graph option rather than committing to exclusive reliance on the closure mechanic.
Force-pushed 777839a to 7a12b19.
I find the "superset" terminology counter-intuitive. It's a subset of the terms in the complex condition. Since it's being treated as a disjunction, it's also less rather than more inclusive of segments. (That is, if all the segments relevant to the complex condition were found, more segments would be included in the disjunction, and therefore more glyphs.) So what is it a superset of?
The result of the analysis is not guaranteed to include all segments in the true condition, only that the found condition is a superset of the true condition, comprised only of segments from the true condition. This updates the doc to remove the claim that found conditions will contain all true-condition segments. Additionally this reworks the 'Why this works' section to be a bit more rigorous.
Force-pushed 7a12b19 to b83bcb1.
It's a superset in terms of activation: it will activate whenever the complex condition would, plus it may activate in other cases where the complex condition wouldn't have.
So the plan is to move this analysis to before merging so that discovered conditions can participate in the merging process. At the moment I'm sorting out the details of the incremental invalidation and reprocessing that this introduces. As for performance, it remains to be seen, but my expectation is that doing the analysis before merging won't significantly increase the computational cost, since the additional analysis performed during merging is incremental.
Completely agree that we want to incorporate dependency analysis. Among other advantages, there are definitely some significant performance gains versus the pure closure approach. I haven't spent as much time as I'd like looking at it, but this is currently my thinking (which lines up pretty closely with some of your suggestions above):
I have a few things I'd like to wrap up with the current implementation, for example moving complex condition handling to pre-merge and handling overlapping scripts. After that, though, we are in a good position to start investigating incorporating a dependency graph into the segmenter.
Ah, OK, yes, I think I can see that.

Sounds like we're in very similar places then. This is good stuff.
Mention true dependency analysis as an alternative option, note that the GSUB scoping is speculative.
This finder can locate the set of segments that are involved in any complex conditions for glyphs (conditions which have a mix of conjunction and disjunction). We can then use that set of segments to form a purely disjunctive condition which is a superset of the true complex condition. This superset condition can then be used for patches and will not violate the closure requirement. Ultimately this allows us to assign narrower conditions to glyphs that would otherwise be placed in the always loaded fallback patch.
This is an initial pass at an implementation, but more work will be needed:
The complex condition analysis step is currently optional and enabled via a new segmenter config setting:
`unmapped_glyph_handling`