
Commit 734e5f3

Add some documentation on the complex condition finder implementation.
1 parent 6dc1e04 commit 734e5f3


6 files changed (+236, -68 lines changed)


docs/experimental/closure_glyph_segmentation.md

Lines changed: 15 additions & 5 deletions
@@ -2,7 +2,7 @@
 
 Author: Garret Rieger
 Date: Jan 27, 2025
-Updated: Oct 27, 2025
+Updated: Dec 17, 2025
 
 ## Introduction
 
@@ -69,9 +69,10 @@ and remaining areas for development in this particular approach:
 * Support for merging segmentations involving multiple overlapping scripts is not yet implemented
 (for example creating a segmentation that supports Chinese and Japanese simultaneously).
 
-* [Multi segment analysis](#multi-segment-dependencies): the current implementation only does single
-segment analysis which in some cases leaves sizable fallback glyph sets. How to implement multi
-segment analysis is an open question and more development is needed.
+* [Multi segment analysis](#multi-segment-dependencies): the current implementation uses an approach that
+approximates multi segment analysis by finding minimal disjunctive conditions, which are supersets of the true multi
+segment conditions. See:
+[closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md).
 
 * Input segmentation generation: the glyph segmentation process starts with an existing
 codepoint/feature based segmentation. Good results can be achieved by starting with one input
@@ -252,7 +253,7 @@ that minimizes overall cost.
 
 ## Multi Segment Dependencies
 
-Note: this section is somewhat speculative as this functionality has not yet been implemented.
+Note: this section is somewhat speculative as this functionality has not yet been fully implemented.
 More research and exploration is definitely needed.
 
 The Segmenting Glyphs Based on Closure Analysis procedure places any glyphs whose conditions aren't
@@ -291,6 +292,15 @@ needed to reduce the amount of combinations to test. Some suggestions:
 * The performance of a segmentation is likely driven solely by the high frequency code points. So
 divide the font into a high frequency set and low frequency set of code points, where a more
 extensive multi segment dependency check is done for only the high frequency segments.
+
+As an alternative, a simpler approach to the problem is to limit the scope to just finding the segments that appear in a
+multi segment condition. If we know the segments involved, then the disjunction of them will be a superset of the true
+underlying condition. This superset condition can be used in place of the true condition without violating the closure
+requirement. This is the approach currently used in the segmenter implementation. This procedure is discussed in more
+detail in [closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md). The
+advantage of this approach is that it's much less computationally costly than multi segment analysis. The downside is
+that these superset conditions will activate more frequently than the true conditions, so the associated patches may be
+loaded in cases where they are not actually needed.
 
 ## Examples
 
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
# Complex Condition Finding in Closure Glyph Keyed Segmenter

Author: Garret Rieger
Date: Dec 17, 2025

## Introduction

Before reading this document, it is recommended to first review the [closure glyph
segmentation](./closure_glyph_segmentation.md) document.

In closure glyph segmentation, the closure analysis step is capable of locating glyph activation conditions that are
either fully disjunctive (e.g. A or B or C) or fully conjunctive (e.g. A and B and C). It is not capable of finding
conditions that are a mix of conjunction and disjunction (e.g. (A and B) or (B and C)). These are referred to as
complex conditions. By default, glyphs with complex conditions are assigned to a patch that is always loaded, since the
true conditions are not known.

This document describes an algorithm that can be used to find the complete set of segments which are part of the
complex condition for a glyph. If the set of segments present in a condition is known, then we can form a purely
disjunctive condition from those segments which is guaranteed to be a superset of the true condition. That is, it
will always activate at least when the true condition would. This property allows the superset condition to be used for
a patch in place of the true condition without violating the closure requirement.

For example, if we had a glyph with an activation condition of ((A and B) or (B and C)), then this process will find the
set of segments {A, B, C}, which would form the superset condition (A or B or C). In a segmentation we could then have a
patch with condition (A or B or C) which loads the glyph, and this would satisfy the closure requirement. In the future,
if we decide to develop an analysis to find the true condition, then the segment set found by this process could be used
to narrow down the search space to only those segments involved in the condition.
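
A minimal C++ sketch of the superset property in this example, for illustration only (it is not code from the segmenter). It assumes a condition is stored in disjunctive normal form as a list of conjunctive terms over small integer segment ids (A = 0, B = 1, C = 2); the names `Condition`, `Evaluate`, `SegmentsIn` and `SupersetHolds` are hypothetical. It collects every segment that appears in the condition and checks by brute force that the resulting disjunction is true whenever the original condition is.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical representation: a condition in disjunctive normal form, i.e. an
// OR of AND-terms, where each term is a set of segment ids.
// ((A and B) or (B and C)) with A = 0, B = 1, C = 2 is {{0, 1}, {1, 2}}.
using Term = std::set<int>;
using Condition = std::vector<Term>;

// Returns true if the segments selected by `mask` (bit i set => segment i is
// present) satisfy at least one term of `c`.
bool Evaluate(const Condition& c, std::uint32_t mask) {
  for (const Term& term : c) {
    bool all_present = true;
    for (int s : term) {
      if ((mask & (1u << s)) == 0) {
        all_present = false;
        break;
      }
    }
    if (all_present) return true;
  }
  return false;
}

// The purely disjunctive superset: every segment that appears in any term.
std::set<int> SegmentsIn(const Condition& c) {
  std::set<int> segments;
  for (const Term& term : c) segments.insert(term.begin(), term.end());
  return segments;
}

// Checks, over every combination of `num_segments` segments, that the
// disjunction over SegmentsIn(c) is true whenever `c` itself is true.
bool SupersetHolds(const Condition& c, int num_segments) {
  assert(num_segments >= 0 && num_segments < 32);
  const std::set<int> segments = SegmentsIn(c);
  for (std::uint32_t mask = 0; mask < (1u << num_segments); ++mask) {
    bool disjunction = false;
    for (int s : segments) {
      if (mask & (1u << s)) disjunction = true;
    }
    if (Evaluate(c, mask) && !disjunction) return false;
  }
  return true;
}
```

For the example, `SegmentsIn` yields {0, 1, 2}, i.e. (A or B or C), and `SupersetHolds({{0, 1}, {1, 2}}, 3)` is true. Enumerating every combination is only practical for a handful of segments, and in practice the true condition is not known, so the algorithm below relies on closure tests rather than on evaluating a formula.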

## Foundations

The algorithm is based on the following assertions:

1. For any complex activation condition of a glyph, a disjunction over all segments appearing in that condition
will always activate at least when the original condition does.

2. Given some fully disjunctive condition, we can verify that the condition is sufficient to meet the glyph closure
requirement for a glyph with the following procedure: compute the glyph closure of the union of all segments except
those in the condition. If the glyph does not appear in this closure, then the condition satisfies the closure
requirement for that glyph. This is called the "additional conditions" check.

3. The glyph closure of all segments will include the glyphs that we are analyzing.

4. Suppose a glyph has some true activation condition, and we compute the glyph closure of some combination of
segments. Adding or removing a segment that is not part of the activation condition will have no effect on whether or
not the glyph appears in the closure output.
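
Assertions (2) to (4) can be illustrated with a toy model. In this hypothetical C++ sketch, `ClosureOracle` stands in for a real glyph closure and only reports whether the glyph under analysis is reached from a set of segments; `DnfOracle` builds such an oracle from a known condition purely so the check can be exercised. None of these names come from the segmenter.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using SegmentSet = std::set<int>;

// Hypothetical stand-in for a glyph closure: returns true if the glyph under
// analysis appears in the closure of `segments`.
using ClosureOracle = std::function<bool(const SegmentSet& segments)>;

// Toy closure for testing: the glyph is reached when every segment of at least
// one term is present, e.g. {{0, 1}, {1, 2}} models (A and B) or (B and C).
ClosureOracle DnfOracle(std::vector<SegmentSet> terms) {
  return [terms](const SegmentSet& input) {
    for (const SegmentSet& term : terms) {
      bool all_present = true;
      for (int s : term) all_present = all_present && input.count(s) > 0;
      if (all_present) return true;
    }
    return false;
  };
}

// The "additional conditions" check: a disjunctive `condition` is sufficient
// if the closure of all segments *except* those in `condition` does not
// contain the glyph.
bool PassesAdditionalConditionsCheck(const ClosureOracle& glyph_in_closure,
                                     const SegmentSet& all_segments,
                                     const SegmentSet& condition) {
  assert(glyph_in_closure(all_segments));  // Assertion (3).
  SegmentSet others;
  for (int s : all_segments) {
    if (condition.count(s) == 0) others.insert(s);
  }
  return !glyph_in_closure(others);
}
```

With the true condition (A and B) or (C and D) (ids 0 to 3, plus an unrelated segment 4), the condition (A or C) passes because removing A and C breaks both terms, while (A or B) fails because (C and D) is still reachable. Adding or removing segment 4 never changes whether the glyph is reached, as assertion (4) states.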

## The Algorithm

For each glyph with a complex condition, we can use the above to find the complete set of segments which are part of
the glyph's complex condition. A condition which is a disjunction across these segments will satisfy the closure
requirement for that glyph.

### Finding a Sub Condition

The algorithm works by identifying a single sub condition at a time. This section describes the algorithm for
finding a single sub condition.

Inputs:

* Segments to exclude from the analysis.
* `glyph` to analyze.

Algorithm:

1. Start with a set of all segments except those to be excluded, called `to_test`.
2. Initialize a second set of segments, `required`, to the empty set.
3. Remove a segment `s` from `to_test` and compute the glyph closure of `to_test U required`.
4. If `glyph` is not found in the closure, then add `s` to `required`.
5. If `to_test` is empty, then return the sub condition `required`.
6. Otherwise, go back to step 3.
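
The steps above can be sketched in C++ as follows. This is an illustrative reading of the procedure, not the segmenter's implementation: closure is modeled by a hypothetical `ClosureOracle`, `DnfOracle` is a test-only toy, and segments are removed from `to_test` in ascending id order (this document does not specify an order).

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using SegmentSet = std::set<int>;
// Hypothetical closure stand-in: is the glyph in the closure of `segments`?
using ClosureOracle = std::function<bool(const SegmentSet& segments)>;

// Toy closure for testing, built from a known condition in DNF.
ClosureOracle DnfOracle(std::vector<SegmentSet> terms) {
  return [terms](const SegmentSet& input) {
    for (const SegmentSet& term : terms) {
      bool all_present = true;
      for (int s : term) all_present = all_present && input.count(s) > 0;
      if (all_present) return true;
    }
    return false;
  };
}

// Sketch of "Finding a Sub Condition". Segments are removed in ascending id
// order; the returned `required` set is a single conjunctive sub condition.
SegmentSet FindSubCondition(const ClosureOracle& glyph_in_closure,
                            const SegmentSet& all_segments,
                            const SegmentSet& excluded) {
  // Step 1: every segment that isn't excluded starts in `to_test`.
  SegmentSet to_test;
  for (int s : all_segments) {
    if (excluded.count(s) == 0) to_test.insert(s);
  }
  assert(glyph_in_closure(to_test) && "glyph must be reachable initially");
  // Step 2.
  SegmentSet required;
  // Steps 3 to 6: remove one segment at a time and recompute the closure.
  while (!to_test.empty()) {
    const int s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;
    input.insert(required.begin(), required.end());
    if (!glyph_in_closure(input)) {
      // The glyph is unreachable without `s`, so `s` is part of the condition.
      required.insert(s);
    }
  }
  return required;
}
```

For ((A and B) or (B and C)) over segments A to D (ids 0 to 3), removing A still leaves (B and C) reachable, while removing B and then C makes the glyph unreachable, so this sketch returns {B, C}, one conjunctive sub condition. A different removal order can return {A, B} instead.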

### Finding the Complete Condition

This section describes the algorithm that finds the complete condition. It uses `Finding a Sub Condition`.

Inputs:

* `glyph` to analyze.

Algorithm:

1. Initialize a set of segments, `condition`, to the empty set.
2. Execute the `Finding a Sub Condition` algorithm with `condition` as the excluded set.
3. Union the returned set into `condition`.
4. Compute the glyph closure of all segments except those in `condition`.
5. If `glyph` is found in the closure, then more conditions still exist. Go back to step 2.
6. Return the complete condition, `condition`.
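
Continuing the same hypothetical toy model, the complete procedure wraps `FindSubCondition`; the block repeats the earlier helpers so that it compiles on its own. `FindCompleteCondition`, `ClosureOracle` and `DnfOracle` are illustrative names, not the segmenter's API.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using SegmentSet = std::set<int>;
// Hypothetical closure stand-in: is the glyph in the closure of `segments`?
using ClosureOracle = std::function<bool(const SegmentSet& segments)>;

// Toy closure for testing, built from a known condition in DNF.
ClosureOracle DnfOracle(std::vector<SegmentSet> terms) {
  return [terms](const SegmentSet& input) {
    for (const SegmentSet& term : terms) {
      bool all_present = true;
      for (int s : term) all_present = all_present && input.count(s) > 0;
      if (all_present) return true;
    }
    return false;
  };
}

// "Finding a Sub Condition", removing segments in ascending id order.
SegmentSet FindSubCondition(const ClosureOracle& glyph_in_closure,
                            const SegmentSet& all_segments,
                            const SegmentSet& excluded) {
  SegmentSet to_test;
  for (int s : all_segments) {
    if (excluded.count(s) == 0) to_test.insert(s);
  }
  assert(glyph_in_closure(to_test) && "glyph must be reachable initially");
  SegmentSet required;
  while (!to_test.empty()) {
    const int s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;
    input.insert(required.begin(), required.end());
    if (!glyph_in_closure(input)) required.insert(s);
  }
  return required;
}

// "Finding the Complete Condition": keep collecting sub conditions until the
// additional conditions check no longer reaches the glyph.
SegmentSet FindCompleteCondition(const ClosureOracle& glyph_in_closure,
                                 const SegmentSet& all_segments) {
  SegmentSet condition;  // Step 1.
  while (true) {
    // Steps 2 and 3: find a sub condition that avoids `condition`, merge it in.
    const SegmentSet sub =
        FindSubCondition(glyph_in_closure, all_segments, condition);
    condition.insert(sub.begin(), sub.end());
    // Step 4: closure of every segment outside `condition`.
    SegmentSet rest;
    for (int s : all_segments) {
      if (condition.count(s) == 0) rest.insert(s);
    }
    // Step 5: if the glyph is still reachable, another sub condition exists.
    if (!glyph_in_closure(rest)) return condition;  // Step 6.
  }
}
```

For (A and B) or (C and D) plus an unrelated segment E (ids 0 to 4), the first pass finds {C, D}, the additional conditions check still reaches the glyph through (A and B), the second pass finds {A, B}, and the loop stops with {A, B, C, D}. Segment E never enters `required`.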

### Initial Font

Any time a closure operation is executed by the above two algorithms, it's necessary to union the subset definition
for the initial font into the closure input. That's because the closure of the initial font affects what's reachable
by the segments.
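
A small hypothetical sketch of this rule: the wrapper below unions a fixed initial-font input into every closure request before delegating to the underlying closure. Modeling the initial font's subset definition as a set of pseudo-segment ids, and the names `ClosureOracle` and `WithInitialFont`, are assumptions made only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

using SegmentSet = std::set<int>;
// Hypothetical closure stand-in: is the glyph in the closure of `segments`?
using ClosureOracle = std::function<bool(const SegmentSet& segments)>;

// Wraps `raw` so that every closure input also includes the initial font.
// The initial font's subset definition is modeled, hypothetically, as a fixed
// set of pseudo-segment ids that are unioned into each request.
ClosureOracle WithInitialFont(ClosureOracle raw, SegmentSet initial_font) {
  assert(raw != nullptr);
  return [raw = std::move(raw), initial_font = std::move(initial_font)](
             const SegmentSet& segments) {
    SegmentSet input = segments;
    input.insert(initial_font.begin(), initial_font.end());
    return raw(input);
  };
}
```

Without the wrapper, a glyph that needs both something in the initial font and segment 1 looks unreachable from every combination of segments, which would break assertion (3) and leave the search with nothing to find.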

### Why this works

* Any segments which are not part of the true condition will not impact the glyph's presence in the closure (assertion
(4)). As a result they will never be moved into the `required` set and will not be returned by `Finding a Sub
Condition`. Thus any segments returned by `Finding a Sub Condition` are part of the true condition.

* Each iteration of `Finding a Sub Condition` is guaranteed to select at least one segment, since we know that the
initial closure always starts with the glyph in it, and the closure of no segments does not contain the glyph. So
at some point during the algorithm the glyph must be found to not be present. In the first iteration this is a result
of assertion (3). For subsequent iterations this is guaranteed by the "additional conditions" check prior to starting
the iteration.

* Since all segments returned from `Finding a Sub Condition` are excluded from future calls, and each call returns at
least one segment, there will be a finite number of `Finding a Sub Condition` executions, each returning only segments
that are part of the true condition.

* Lastly, the algorithm terminates only once the additional conditions check finds no additional conditions,
guaranteeing that the superset disjunctive condition we have found satisfies the closure requirement (assertion (2)).

## Making it More Performant

As described above, this approach can be slow since it processes glyphs one at a time. Performance can be improved by
processing glyphs in a batch. This can be done with a recursive approach where, after each segment test, the set of
input glyphs is split into those that require the tested segment and those that don't. Each of the splits spawns a new
recursion (if there is at least one glyph in the split).
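
One way the batched recursion could look, as a hypothetical C++ sketch (the segmenter's own `Task`-based implementation is not reproduced here): `GlyphClosure` is an assumed stand-in that returns every glyph reached from a set of segments, and each recursion step tests one segment for all of its glyphs at once before splitting them. Segments are taken in ascending id order, and `FindSubConditionsBatch` and `FindSubConditions` are illustrative names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>

using SegmentSet = std::set<int>;
using GlyphSet = std::set<int>;
// Hypothetical closure stand-in: returns every glyph reached from `segments`.
using GlyphClosure = std::function<GlyphSet(const SegmentSet& segments)>;
// Maps each sub condition found to the glyphs that share it.
using SubConditions = std::map<SegmentSet, GlyphSet>;

// One recursion step of a batched "Finding a Sub Condition": test a single
// segment for all `glyphs` at once, then split them by whether they needed it.
void FindSubConditionsBatch(const GlyphClosure& closure, const GlyphSet& glyphs,
                            SegmentSet to_test, const SegmentSet& required,
                            SubConditions& out) {
  if (glyphs.empty()) return;  // Only non-empty splits recurse.
  if (to_test.empty()) {
    out[required].insert(glyphs.begin(), glyphs.end());
    return;
  }
  const int s = *to_test.begin();
  to_test.erase(to_test.begin());
  SegmentSet input = to_test;
  input.insert(required.begin(), required.end());
  const GlyphSet reached = closure(input);

  GlyphSet needs_s, does_not_need_s;
  for (int g : glyphs) {
    if (reached.count(g) > 0) {
      does_not_need_s.insert(g);
    } else {
      needs_s.insert(g);
    }
  }
  SegmentSet required_with_s = required;
  required_with_s.insert(s);
  FindSubConditionsBatch(closure, needs_s, to_test, required_with_s, out);
  FindSubConditionsBatch(closure, does_not_need_s, to_test, required, out);
}

// Entry point: every glyph must be reachable from `candidates` (assertion (3)).
SubConditions FindSubConditions(const GlyphClosure& closure,
                                const GlyphSet& glyphs,
                                const SegmentSet& candidates) {
  const GlyphSet reached = closure(candidates);
  for (int g : glyphs) {
    assert(reached.count(g) > 0 && "glyph unreachable from candidates");
  }
  SubConditions out;
  FindSubConditionsBatch(closure, glyphs, candidates, SegmentSet{}, out);
  return out;
}
```

With a toy closure where glyphs 10 and 12 need (A and B) and glyph 11 needs C (ids 0 to 2), `FindSubConditions` maps {A, B} to {10, 12} and {C} to {11}. Glyphs 10 and 12 travel through the recursion together, so each closure computed for them is shared rather than repeated per glyph.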

Also, the closure analysis run by the segmenter may already have discovered some partial conditions for glyphs. These
can be incorporated as a starting point into the complex condition analysis.

Furthermore, we can reduce the number of segments we need to test by checking which segments can interact in some way
with the GSUB table. Segments that don't interact with GSUB can't be part of a conjunctive condition, so these can
always be found via the standard closure analysis procedures. The search can then be limited to just the set of segments
which interact with GSUB and were not identified during regular closure analysis.

## Integrating into the Segmentation Algorithm

Initially, the complex condition analysis has been added as a final step after merging. If unmapped glyphs are present
after merging, then the complex condition analysis is run on those glyphs and the fallback patch is replaced with one or
more patches based on the results of the complex condition analysis.

However, ideally complex condition analysis would be run before merging so that the patches it generates can participate
in the merging process. This will require incremental updates to the complex condition analysis results, but that should
be straightforward. Implementing this is planned for the near future.

ift/encoder/complex_condition_finder.cc

Lines changed: 5 additions & 1 deletion
@@ -16,8 +16,12 @@ using absl::StatusOr;
 using common::GlyphSet;
 using common::SegmentSet;
 
+// For more information on this process see the explanation in:
+// ../../docs/experimental/closure_glyph_segmentation_complex_conditions.md
+
 namespace ift::encoder {
 
+// One unit of work for the analysis. One segment from to_be_tested will be checked.
 struct Task {
 // These segments are already fully analyzed and should be excluded from this
 // analysis.
@@ -260,7 +264,7 @@ static SegmentSet NonEmptySegments(
 return segments;
 }
 
-StatusOr<btree_map<SegmentSet, GlyphSet>> FindComplexConditionsFor(
+StatusOr<btree_map<SegmentSet, GlyphSet>> FindMinimalDisjunctiveConditionsFor(
 const RequestedSegmentationInformation& segmentation_info,
 const GlyphConditionSet& glyph_condition_set,
 GlyphClosureCache& closure_cache, GlyphSet glyphs) {

ift/encoder/complex_condition_finder.h

Lines changed: 14 additions & 1 deletion
@@ -9,8 +9,21 @@
 
 namespace ift::encoder {
 
+// Finds the minimal purely disjunctive conditions that activate each
+// provided glyph. Returns a map from each condition to the activated
+// glyphs.
+//
+// Takes a glyph condition set which will be used as a starting point.
+//
+// A minimal purely disjunctive condition is the complete set of segments
+// which appear in the true activation condition for a glyph. The
+// purely disjunctive version is a superset of the true condition and
+// will activate at least whenever the true condition would.
+//
+// For example, if a glyph has the true condition (a and b) or (b and c),
+// this will find the condition (a or b or c).
 absl::StatusOr<absl::btree_map<common::SegmentSet, common::GlyphSet>>
-FindComplexConditionsFor(
+FindMinimalDisjunctiveConditionsFor(
 const RequestedSegmentationInformation& segmentation_info,
 const GlyphConditionSet& glyph_condition_set,
 GlyphClosureCache& closure_cache, common::GlyphSet glyphs);
