19 changes: 14 additions & 5 deletions docs/experimental/closure_glyph_segmentation.md
@@ -2,7 +2,7 @@

Author: Garret Rieger
Date: Jan 27, 2025
Updated: Oct 27, 2025
Updated: Dec 17, 2025

## Introduction

@@ -69,9 +69,10 @@ and remaining areas for development in this particular approach:
* Support for merging segmentations involving multiple overlapping scripts is not yet implemented
(for example creating a segmentation that supports Chinese and Japanese simultaneously).

* [Multi segment analysis](#multi-segment-dependencies): the current implementation only does single
segment analysis which in some cases leaves sizable fallback glyph sets. How to implement multi
segment analysis is an open question and more development is needed.
* [Multi segment analysis](#multi-segment-dependencies): the current implementation utilizes an approach which
approximates multi segment analysis by finding superset disjunctive conditions for multi segment
conditions. See:
[closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md).

* Input segmentation generation: the glyph segmentation process starts with an existing
codepoint/feature based segmentation. Good results can be achieved by starting with one input
@@ -252,7 +253,7 @@ that minimizes overall cost.

## Multi Segment Dependencies

Note: this section is somewhat speculative as this functionality has not yet been implemented.
Note: this section is somewhat speculative as this functionality has not yet been fully implemented.
More research and exploration is definitely needed.

The Segmenting Glyphs Based on Closure Analysis procedure places any glyphs whose conditions aren't
@@ -291,6 +292,14 @@ needed to reduce the amount of combinations to test. Some suggestions:
* The performance of a segmentation is likely driven solely by the high frequency code points. So
divide the font into a high frequency set and low frequency set of code points. Where a more
extensive multi segment dependency check is done for only the high frequency segments.

As an alternative, a simpler approach to the problem is to limit the scope to just finding conditions which are a superset
of the true condition. This superset condition can be used in place of the true condition without violating the closure
requirement. This is the approach currently used in the segmenter implementation. This procedure is discussed in more
detail in [closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md). The
advantage of this approach is that it's much less computationally costly than multi segment analysis. The downside is that
these superset conditions will activate more frequently than the true conditions and thus may be loaded in cases where
they are not actually needed.
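For a rough sense of that downside, here is a minimal C++ sketch under a toy model (an assumption, not the segmenter's cost model): segments are numbered, a condition is an OR of AND-clauses, and every combination of loaded segments is counted once. `IsActive` and `CountActivations` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): segments are numbered 0..n-1 and a condition is an
// OR of AND-clauses, each clause being a set of segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

// A condition is active if every segment of at least one clause is loaded.
bool IsActive(const Condition& condition, const SegmentSet& loaded) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(loaded.begin(), loaded.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

// Counts how many of the 2^n possible sets of loaded segments activate the
// condition, weighting every loaded set equally.
int CountActivations(const Condition& condition, int num_segments) {
  int count = 0;
  for (unsigned mask = 0; mask < (1u << num_segments); ++mask) {
    SegmentSet loaded;
    for (int s = 0; s < num_segments; ++s) {
      if (mask & (1u << s)) loaded.insert(s);
    }
    if (IsActive(condition, loaded)) ++count;
  }
  return count;
}
```

With A, B and C numbered 0, 1 and 2, the true condition `(A and B) or (B and C)` is active for 3 of the 8 combinations, while the superset conditions `(B)` and `(A or C)` are active for 4 and 6 respectively.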

## Examples

155 changes: 155 additions & 0 deletions docs/experimental/closure_glyph_segmentation_complex_conditions.md
@@ -0,0 +1,155 @@

# Complex Condition Finding in Closure Glyph Keyed Segmenter

Author: Garret Rieger
Date: Dec 17, 2025

## Introduction

Before reading this document, it is recommended to first review the
[closure glyph segmentation](./closure_glyph_segmentation.md) document. This document borrows concepts and terms from it.

In closure glyph segmentation the closure analysis step is capable of locating glyph activation conditions that are
either fully disjunctive (e.g. `(A or B or C)`) or fully conjunctive (e.g. `(A and B and C)`). It is not capable of finding
conditions that are a mix of conjunction and disjunction (e.g. `(A and B) or (B and C)`). These are referred to as complex
conditions. By default, glyphs with complex conditions are assigned to a patch that is always loaded, since the true
conditions are not known.
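The three shapes can be told apart mechanically once a condition is written as an OR of AND-clauses. The sketch below assumes the condition is already in minimal form (no clause contains another); `ConditionKind` and `Classify` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy representation (assumption): an OR of AND-clauses over segment ids.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

enum class ConditionKind { kDisjunctive, kConjunctive, kComplex };

ConditionKind Classify(const Condition& condition) {
  // Every clause is a single segment: (A or B or C). This also covers (A).
  bool all_single = true;
  for (const SegmentSet& clause : condition) {
    if (clause.size() != 1) all_single = false;
  }
  if (all_single) return ConditionKind::kDisjunctive;
  // A single clause with several segments: (A and B and C).
  if (condition.size() == 1) return ConditionKind::kConjunctive;
  // Anything else mixes the two, e.g. (A and B) or (B and C).
  return ConditionKind::kComplex;
}
```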

This document describes an algorithm which can be used to find purely disjunctive conditions which are supersets of
complex conditions. A superset condition is one that will activate at least whenever the true condition would. This
property allows the superset condition to be used for a patch in place of the true condition without violating the
closure requirement.

For a given complex condition there typically exists more than one possible superset disjunctive condition. The
algorithm will find one of them, but not necessarily the smallest one. The found superset condition will only contain
segments which appear in the original condition.

For example, if we had a glyph with an activation condition of `((A and B) or (B and C))`, then this process will find
one of the possible superset conditions such as `(A or C)`, `(A or B)`, `(B or C)`, or `(B)`. In a segmentation we could
then have a patch with the found condition which loads the glyph, and this would satisfy the closure requirement.
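The superset claim in this example can be checked by brute force in a toy model (hypothetical helpers, not the segmenter's API): a disjunctive candidate, given as the set of segments it ORs together, is a superset if it fires for every combination of loaded segments that fires the true condition.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): a condition is an OR of AND-clauses over segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

bool IsActive(const Condition& condition, const SegmentSet& loaded) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(loaded.begin(), loaded.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

// True if the disjunctive `candidate` fires whenever `true_condition` does,
// checked over all 2^n combinations of loaded segments 0..n-1.
bool IsSuperset(const Condition& true_condition, const SegmentSet& candidate,
                int num_segments) {
  for (unsigned mask = 0; mask < (1u << num_segments); ++mask) {
    SegmentSet loaded;
    bool candidate_fires = false;
    for (int s = 0; s < num_segments; ++s) {
      if (!(mask & (1u << s))) continue;
      loaded.insert(s);
      if (candidate.count(s)) candidate_fires = true;
    }
    if (IsActive(true_condition, loaded) && !candidate_fires) return false;
  }
  return true;
}
```

`(A)` on its own is not a superset: loading just B and C activates the true condition without loading A.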

## Foundations

The algorithm is based on the following assertions:

1. Given some fully disjunctive condition for a glyph, we can verify that the condition is a superset of the true
condition for the glyph and meets the closure requirement by the following procedure: compute a glyph closure of the
union of all segments except for those in the condition. If the glyph does not appear in this closure, then the
condition satisfies the closure requirement for that glyph and is a superset of the true condition. This is called
the “additional conditions” check.

2. The glyph closure of all segments includes the glyph that we are analyzing.

3. We have a glyph which has some true activation condition. If we compute a glyph closure of some combination of
segments, then adding a segment which is not part of the activation condition to, or removing it from, the glyph closure
input will have no effect on whether or not the glyph appears in the closure output.

4. The closure of no segments contains only glyphs from the initial font.
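In a toy model where the closure reaches a glyph iff some AND-clause of its condition is contained in the closure input (an assumption standing in for a real glyph closure), the "additional conditions" check from assertion (1) needs a single closure per candidate instead of an enumeration of every combination of segments. `PassesAdditionalConditionsCheck` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): OR of AND-clauses over numbered segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

// Stand-in for a glyph closure: the glyph is reached iff some clause of its
// condition is fully contained in the closure input.
bool GlyphInClosure(const Condition& condition, const SegmentSet& input) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(input.begin(), input.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

// Assertion (1): close over every segment NOT in the disjunctive candidate.
// The toy closure is monotone, so if the glyph is absent here, no activating
// combination of segments can avoid the candidate.
bool PassesAdditionalConditionsCheck(const Condition& glyph_condition,
                                     const SegmentSet& all_segments,
                                     const SegmentSet& candidate) {
  SegmentSet rest;
  for (int s : all_segments) {
    if (!candidate.count(s)) rest.insert(s);
  }
  return !GlyphInClosure(glyph_condition, rest);
}
```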

## The Algorithm

For a glyph with a complex condition we can use the above to find a superset disjunctive condition for that
glyph's complex condition. The found condition will satisfy the closure requirement for that glyph.

### Finding a Sub Condition

The algorithm works by identifying a single sub condition at a time. This section describes the algorithm for
finding a single sub condition.

Inputs:

* Segments to exclude from the analysis.
* `glyph` to analyze.

Algorithm:

1. Start with a set of all segments except those to be excluded, called `to_test`.
2. Initialize a second set of segments, `sub_condition`, to the empty set.
3. Remove a segment `s` from `to_test` and compute the glyph closure of `to_test U sub_condition`.
4. If `glyph` is not found in the closure then add `s` to `sub_condition`.
5. If `to_test` is empty, then return the sub condition `sub_condition`.
6. Otherwise, go back to step 3.
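A minimal C++ sketch of `Finding a Sub Condition`, assuming the same toy closure model (the glyph is reached iff one of its condition's AND-clauses is contained in the closure input). Segments are tested in ascending order; the algorithm allows any order. `GlyphInClosure` and `FindSubCondition` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): OR of AND-clauses over numbered segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

// Stand-in for a glyph closure: the glyph is reached iff some clause of its
// condition is fully contained in the closure input.
bool GlyphInClosure(const Condition& condition, const SegmentSet& input) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(input.begin(), input.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

SegmentSet FindSubCondition(const Condition& glyph_condition,
                            const SegmentSet& all_segments,
                            const SegmentSet& excluded) {
  // Step 1: every non-excluded segment starts out in `to_test`.
  SegmentSet to_test;
  for (int s : all_segments) {
    if (!excluded.count(s)) to_test.insert(s);
  }
  SegmentSet sub_condition;  // Step 2.
  while (!to_test.empty()) {  // Steps 5-6.
    // Step 3: remove a segment and close over what remains.
    int s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;
    input.insert(sub_condition.begin(), sub_condition.end());
    // Step 4: if the glyph disappears, `s` is needed.
    if (!GlyphInClosure(glyph_condition, input)) sub_condition.insert(s);
  }
  return sub_condition;
}
```

For `(A and B) or (B and C)` with A, B, C numbered 0, 1, 2 and nothing excluded, removing A leaves `{B, C}`, which still reaches the glyph; removing B and then C each drops the glyph, so the sub condition is `{B, C}`.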

### Finding the Complete Condition

This section describes the algorithm which finds the complete condition. It utilizes `Finding a Sub Condition`.

Inputs:

* `glyph` to analyze.

Algorithm:

1. Initialize a set of segments `condition` to the empty set.
2. Execute the `Finding a Sub Condition` algorithm with `condition` as the excluded set.
3. Union the returned set into `condition`.
4. Compute the glyph closure of all segments except those in `condition`.
5. If `glyph` is found in the closure, then more sub conditions still exist. Go back to step 2.
6. Return the complete condition, `condition`.
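Continuing the toy model (an assumption, with hypothetical helper names), the complete condition loop can be sketched as a `do`/`while` whose loop condition is the additional conditions check from steps 4 and 5:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): OR of AND-clauses over numbered segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

bool GlyphInClosure(const Condition& condition, const SegmentSet& input) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(input.begin(), input.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

// All segments except those in `excluded`.
SegmentSet Without(const SegmentSet& all, const SegmentSet& excluded) {
  SegmentSet out;
  for (int s : all) {
    if (!excluded.count(s)) out.insert(s);
  }
  return out;
}

SegmentSet FindSubCondition(const Condition& c, const SegmentSet& all,
                            const SegmentSet& excluded) {
  SegmentSet to_test = Without(all, excluded);
  SegmentSet sub_condition;
  while (!to_test.empty()) {
    int s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;
    input.insert(sub_condition.begin(), sub_condition.end());
    if (!GlyphInClosure(c, input)) sub_condition.insert(s);
  }
  return sub_condition;
}

SegmentSet FindCondition(const Condition& c, const SegmentSet& all) {
  SegmentSet condition;  // Step 1.
  do {
    // Steps 2-3: find a sub condition among the segments not yet found.
    SegmentSet sub = FindSubCondition(c, all, condition);
    if (sub.empty()) break;  // Cannot happen when assertions (2)-(4) hold.
    condition.insert(sub.begin(), sub.end());
    // Steps 4-5: repeat while the glyph is reachable without `condition`.
  } while (GlyphInClosure(c, Without(all, condition)));
  return condition;  // Step 6.
}
```

For the purely disjunctive `(A or B or C)` each call returns a single segment, so three calls are needed. For `(A and B and C)` the sketch returns `{A, B, C}`, a valid superset that is larger than the minimal `(A)`, consistent with the note that the found condition is not necessarily the smallest.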

### Initial Font

Any time a closure operation is executed by the above two algorithms it's necessary to union the subset definition for
the initial font into the closure input. This is required because the closure of the initial font affects what's
reachable by the segments.
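The effect can be shown in the toy model by treating the initial font as an extra id (`kInitFont = 99`, a hypothetical stand-in for the initial font's subset definition) that is unioned into every closure input. Consider a glyph reached by `(INIT and A) or (B and C)`:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): OR of AND-clauses over numbered segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

// Hypothetical id standing in for the initial font's subset definition.
constexpr int kInitFont = 99;

// Stand-in closure that always unions the initial font into its input.
bool GlyphInClosure(const Condition& condition, SegmentSet input,
                    const SegmentSet& init_font) {
  input.insert(init_font.begin(), init_font.end());
  for (const SegmentSet& clause : condition) {
    if (std::includes(input.begin(), input.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

SegmentSet Without(const SegmentSet& all, const SegmentSet& excluded) {
  SegmentSet out;
  for (int s : all) {
    if (!excluded.count(s)) out.insert(s);
  }
  return out;
}

SegmentSet FindSubCondition(const Condition& c, const SegmentSet& all,
                            const SegmentSet& excluded,
                            const SegmentSet& init_font) {
  SegmentSet to_test = Without(all, excluded);
  SegmentSet sub_condition;
  while (!to_test.empty()) {
    int s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;
    input.insert(sub_condition.begin(), sub_condition.end());
    if (!GlyphInClosure(c, input, init_font)) sub_condition.insert(s);
  }
  return sub_condition;
}

SegmentSet FindCondition(const Condition& c, const SegmentSet& all,
                         const SegmentSet& init_font) {
  SegmentSet condition;
  do {
    SegmentSet sub = FindSubCondition(c, all, condition, init_font);
    if (sub.empty()) break;
    condition.insert(sub.begin(), sub.end());
  } while (GlyphInClosure(c, Without(all, condition), init_font));
  return condition;
}
```

With the initial font unioned in, the sketch finds `(A or B or C)`. Passing an empty initial font instead yields `(B or C)`; a client that loads only A would then not receive the glyph, even though the initial font plus A reach it, which violates the closure requirement.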

### Why this works

Here we show this procedure is guaranteed to find a disjunctive superset of a glyph's true condition which includes
only segments from the true condition, when the glyph is not already in the initial font:

* For each call to `Finding a Sub Condition`, `glyph` will be in the closure of all non-excluded segments. For the first
call this is guaranteed by assertion (2). For subsequent calls this is guaranteed by the "additional conditions" check
which gates execution.

* Any segments which are not part of the true condition will not impact the glyph's presence in the closure (assertion
(3)). Further, by the previous point we know that at the start of `Finding a Sub Condition` the closure of
`to_test U sub_condition` contains `glyph`, and this stays true after every step: a tested segment is only dropped when
`glyph` remains in the closure without it, and is otherwise kept by adding it to `sub_condition`. Thus testing a segment
which is not part of the true condition will never result in `glyph` missing from the closure, so that segment won't be
added to `sub_condition`. Therefore `Finding a Sub Condition` will only ever return segments that are part of the true
condition.

* `Finding a Sub Condition` will always return at least one segment: if `sub_condition` is still the empty set when the
last segment is tested, then the closure will be on no segments and will not have `glyph` in it. This is a result of
assertion (4) and the premise that `glyph` is not already in the initial font, so the last segment is added to
`sub_condition`.

* Since all segments returned from `Finding a Sub Condition` are excluded from future calls, and each call returns at
least one segment of the true condition, there can be at most as many calls as there are segments in the true condition.

* Lastly, the algorithm terminates only once the additional conditions check finds no additional conditions,
guaranteeing we have found a superset disjunctive condition (assertion (1)).
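The argument can also be spot-checked mechanically. The sketch below uses the same toy closure model (monotone, empty initial font, hypothetical helper names). It runs the complete algorithm on every condition built from distinct non-empty AND-clauses over three and four segments, and counts results that either mention a segment outside the true condition or fail to fire for some combination of segments that activates it. This exercises the toy model only; it is not a proof about real font closures.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): OR of AND-clauses, monotone closure, no init font.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

bool GlyphInClosure(const Condition& condition, const SegmentSet& input) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(input.begin(), input.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

SegmentSet Without(const SegmentSet& all, const SegmentSet& excluded) {
  SegmentSet out;
  for (int s : all) {
    if (!excluded.count(s)) out.insert(s);
  }
  return out;
}

SegmentSet FindSubCondition(const Condition& c, const SegmentSet& all,
                            const SegmentSet& excluded) {
  SegmentSet to_test = Without(all, excluded);
  SegmentSet sub_condition;
  while (!to_test.empty()) {
    int s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;
    input.insert(sub_condition.begin(), sub_condition.end());
    if (!GlyphInClosure(c, input)) sub_condition.insert(s);
  }
  return sub_condition;
}

SegmentSet FindCondition(const Condition& c, const SegmentSet& all) {
  SegmentSet condition;
  do {
    SegmentSet sub = FindSubCondition(c, all, condition);
    if (sub.empty()) break;
    condition.insert(sub.begin(), sub.end());
  } while (GlyphInClosure(c, Without(all, condition)));
  return condition;
}

SegmentSet MaskToSet(unsigned mask, int num_segments) {
  SegmentSet out;
  for (int s = 0; s < num_segments; ++s) {
    if (mask & (1u << s)) out.insert(s);
  }
  return out;
}

int CountViolations(int num_segments) {
  const unsigned num_subsets = 1u << num_segments;
  const SegmentSet all = MaskToSet(num_subsets - 1, num_segments);
  int violations = 0;
  // Bit k of `pick` selects the clause MaskToSet(k + 1, num_segments).
  for (unsigned long long pick = 1; pick < (1ull << (num_subsets - 1));
       ++pick) {
    Condition condition;
    SegmentSet mentioned;
    for (unsigned k = 0; k + 1 < num_subsets; ++k) {
      if (!(pick & (1ull << k))) continue;
      condition.push_back(MaskToSet(k + 1, num_segments));
      mentioned.insert(condition.back().begin(), condition.back().end());
    }
    const SegmentSet found = FindCondition(condition, all);
    // Only segments from the true condition may appear in the result.
    bool ok = std::includes(mentioned.begin(), mentioned.end(), found.begin(),
                            found.end());
    // The result must fire for every activating combination of segments.
    for (unsigned mask = 0; ok && mask < num_subsets; ++mask) {
      const SegmentSet loaded = MaskToSet(mask, num_segments);
      bool fires = false;
      for (int s : found) fires = fires || loaded.count(s) > 0;
      if (GlyphInClosure(condition, loaded) && !fires) ok = false;
    }
    if (!ok) ++violations;
  }
  return violations;
}
```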

## Making it More Performant

As described above this approach can be slow since it processes glyphs one at a time. Improvements to performance
can be made by processing glyphs in a batch. This can be done with a recursive approach where after each segment test
the set of input glyphs gets split into those that require the tested segment and those that don't. Each of the splits spawns
a new recursion (if there is at least one glyph in the split).

> Review comment (Contributor): There are also potential compromises to be made here on increasing the efficiency of both the algorithm and the patch table by decreasing the specificity. If you determine that segment A is relevant to glyph x and that segment B is relevant to glyph y, one can treat A, B as relevant to both. One might regret that depending on what happens next (how much relevance one combines by going down the road), but you can reduce glyph-specific processing and glyph-specific patch table entries in this way.
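A minimal C++ sketch of the batched recursion for one round of `Finding a Sub Condition`, under the toy closure model (an assumption). Glyphs in a batch have seen identical test outcomes, so they share one `sub_condition`. In a real segmenter one closure per tested segment would report membership for the whole batch; here membership is simulated per glyph. `FindSubConditionsBatched` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy model (assumption): OR of AND-clauses over numbered segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

bool GlyphInClosure(const Condition& condition, const SegmentSet& input) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(input.begin(), input.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

void FindSubConditionsBatched(const std::map<int, Condition>& conditions,
                              const std::set<int>& glyphs, SegmentSet to_test,
                              const SegmentSet& sub_condition,
                              std::map<int, SegmentSet>& out) {
  if (to_test.empty()) {
    for (int gid : glyphs) out[gid] = sub_condition;
    return;
  }
  int s = *to_test.begin();
  to_test.erase(to_test.begin());
  SegmentSet input = to_test;
  input.insert(sub_condition.begin(), sub_condition.end());

  // Split the batch into glyphs that require `s` and glyphs that don't.
  std::set<int> requires_s, rest;
  for (int gid : glyphs) {
    if (GlyphInClosure(conditions.at(gid), input)) {
      rest.insert(gid);
    } else {
      requires_s.insert(gid);
    }
  }
  // Each non-empty split continues in its own recursion.
  if (!requires_s.empty()) {
    SegmentSet with_s = sub_condition;
    with_s.insert(s);
    FindSubConditionsBatched(conditions, requires_s, to_test, with_s, out);
  }
  if (!rest.empty()) {
    FindSubConditionsBatched(conditions, rest, to_test, sub_condition, out);
  }
}
```

For glyph 10 with `(A and B) or (B and C)`, glyph 11 with `(A or B or C)` and glyph 12 with `(A and B and C)`, the recursion splits glyph 12 off after testing A and glyph 10 off after testing B, yielding the same sub conditions as the one-glyph-at-a-time version: `{B, C}`, `{C}` and `{A, B, C}`.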

Also from the closure analysis run by the segmenter we may have discovered some partial conditions for glyphs. These
can be incorporated as a starting point into the complex condition analysis.

More substantial performance improvements could be realized by using a dependency graph generated from the font. This
could come in two forms:

1. Determine which segments interact with GSUB in some way and use that to scope the analysis. Segments that don't
interact with GSUB can be discovered via regular closure analysis, as they will only ever have disjunctive conditions.
After completing a scoped analysis, a final additional conditions check against all segments could be used to ensure
we have actually arrived at a superset condition. This is more speculative and would need more research to validate the
approach.

2. Use an actual dependency graph, even if it's not fully accurate, to generate a set of initial activation
conditions. Then, as needed when the additional conditions check fails, we could find any additional segments for the
superset condition using this process.
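Both starting points mentioned above (partial conditions from the closure analysis, and initial conditions from a possibly inaccurate dependency graph) amount to seeding `condition` before the loop in `Finding the Complete Condition`, and only running `Finding a Sub Condition` while the additional conditions check still fails. A sketch under the toy closure model, with `FindConditionFromSeed` as a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): OR of AND-clauses over numbered segments.
using SegmentSet = std::set<int>;
using Condition = std::vector<SegmentSet>;

bool GlyphInClosure(const Condition& condition, const SegmentSet& input) {
  for (const SegmentSet& clause : condition) {
    if (std::includes(input.begin(), input.end(), clause.begin(),
                      clause.end())) {
      return true;
    }
  }
  return false;
}

SegmentSet Without(const SegmentSet& all, const SegmentSet& excluded) {
  SegmentSet out;
  for (int s : all) {
    if (!excluded.count(s)) out.insert(s);
  }
  return out;
}

SegmentSet FindSubCondition(const Condition& c, const SegmentSet& all,
                            const SegmentSet& excluded) {
  SegmentSet to_test = Without(all, excluded);
  SegmentSet sub_condition;
  while (!to_test.empty()) {
    int s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;
    input.insert(sub_condition.begin(), sub_condition.end());
    if (!GlyphInClosure(c, input)) sub_condition.insert(s);
  }
  return sub_condition;
}

// Starts from `seed` and only searches for more segments while the glyph is
// still reachable without the current condition.
SegmentSet FindConditionFromSeed(const Condition& c, const SegmentSet& all,
                                 const SegmentSet& seed) {
  SegmentSet condition = seed;
  while (GlyphInClosure(c, Without(all, condition))) {
    SegmentSet sub = FindSubCondition(c, all, condition);
    if (sub.empty()) break;  // Cannot happen when assertions (2)-(4) hold.
    condition.insert(sub.begin(), sub.end());
  }
  return condition;
}
```

For `(A and B) or (B and C)`, a seed of `{B}` already passes the check and is returned with no sub condition search, while an incomplete seed of `{A}` is topped up to `{A, B, C}`. A seed segment outside the true condition still yields a valid superset, but the result then loses the guarantee of containing only segments from the true condition.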

## Integrating into the Segmentation Algorithm

Initially, the complex condition analysis has been added as a final step after merging. If unmapped glyphs are present
after merging, then the complex condition analysis is run on those glyphs and the fallback patch is replaced with one or
more patches based on the results of the complex condition analysis.
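One way to picture the replacement step, as a hypothetical sketch rather than what `FindFallbackGlyphConditions` actually does: glyphs whose found superset conditions are identical can share one patch keyed on that condition, giving one patch per distinct condition in place of the single fallback patch. `GroupIntoPatches` and `GlyphIds` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <set>

using SegmentSet = std::set<int>;  // a disjunctive condition: OR of segments
using GlyphIds = std::set<int>;

// Hypothetical: one patch per distinct found condition, holding every
// previously unmapped glyph that was assigned that condition.
std::map<SegmentSet, GlyphIds> GroupIntoPatches(
    const std::map<int, SegmentSet>& found_conditions) {
  std::map<SegmentSet, GlyphIds> patches;
  for (const auto& [gid, condition] : found_conditions) {
    patches[condition].insert(gid);
  }
  return patches;
}
```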

However, ideally complex condition analysis would be run before merging so that the patches it generates can participate
in the merging process. This will require incremental updates to the complex condition analysis results, but that should
be straightforward. Implementing this is planned for the near future.



21 changes: 21 additions & 0 deletions ift/encoder/BUILD
@@ -21,6 +21,7 @@ cc_library(
"//ift/proto",
"//ift/feature_registry:feature_registry",
"//util:segmentation_plan_cc_proto",
"//util:common_cc_proto",
"@abseil-cpp//absl/container:btree",
"@abseil-cpp//absl/container:flat_hash_map",
"@abseil-cpp//absl/container:flat_hash_set",
@@ -60,6 +61,8 @@ cc_library(
"glyph_condition_set.h",
"glyph_groupings.h",
"glyph_groupings.cc",
"complex_condition_finder.h",
"complex_condition_finder.cc",
"candidate_merge.cc",
"candidate_merge.h",
"patch_size_cache.h",
@@ -185,6 +188,24 @@ cc_test(
],
)

cc_test(
name = "complex_condition_finder_test",
size = "medium",
srcs = [
"complex_condition_finder_test.cc",
],
data = [
"//common:testdata",
"//ift:testdata",
],
deps = [
":segmentation_context",
":encoder",
"//common",
"@googletest//:gtest_main",
],
)

cc_test(
name = "closure_glyph_segmenter_test",
size = "medium",
44 changes: 34 additions & 10 deletions ift/encoder/closure_glyph_segmenter.cc
@@ -379,7 +379,7 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
}

return CodepointToGlyphSegments(face, initial_segment, subset_definitions,
merge_groups, false);
merge_groups, PATCH);
}

StatusOr<std::vector<Merger>> ToMergers(
@@ -394,12 +394,37 @@ StatusOr<std::vector<Merger>> ToMergers(
return mergers;
}

static StatusOr<GlyphSegmentation> ToFinalSegmentation(
SegmentationContext& context,
UnmappedGlyphHandling unmapped_glyph_handling) {
if (unmapped_glyph_handling == FIND_CONDITIONS) {
// TODO(garretrieger): this analysis should be performed prior to merging
// so that the found conditions can participate in merging. To make this
// performant we'll need to add support for incrementally recomputing
// complex conditions that are affected by merges.
//
// The good news here is that when we do a segment merge of the generated
// complex activation conditions, that will naturally fix the unmapped
// nature of the relevant glyphs. However, changes to segments may
// also invalidate the complex conditions and require incremental
// reprocessing.
//
// Roughly, during invalidation and subsequent incremental closure
// analysis we may re-identify unmapped glyphs. These would then
// need to be invalidated and reprocessed by the complex condition finder.
TRYV(context.glyph_groupings.FindFallbackGlyphConditions(
context.SegmentationInfo(), context.glyph_condition_set,
context.glyph_closure_cache));
}

return context.ToGlyphSegmentation();
}

StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
hb_face_t* face, SubsetDefinition initial_segment,
const std::vector<SubsetDefinition>& subset_definitions,
btree_map<SegmentSet, MergeStrategy> merge_groups,
bool place_fallback_in_init) const {

UnmappedGlyphHandling unmapped_glyph_handling) const {
for (const auto& [segments, strategy] : merge_groups) {
if (strategy.UseCosts()) {
TRYV(CheckForDisjointCodepoints(subset_definitions, segments));
@@ -442,7 +467,8 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
// if requested any remaining fallback glyphs are also moved into the init
// font.
GlyphSet fallback_glyphs = context.glyph_groupings.FallbackGlyphs();
if (place_fallback_in_init && !fallback_glyphs.empty()) {
if (unmapped_glyph_handling == MOVE_TO_INIT_FONT &&
!fallback_glyphs.empty()) {
VLOG(0) << "Moving " << fallback_glyphs.size()
<< " fallback glyphs into the initial font." << std::endl;
SubsetDefinition new_def = context.SegmentationInfo().InitFontSegment();
@@ -452,7 +478,7 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(

if (merge_groups.empty()) {
// No merging will be needed so we're done.
return context.ToGlyphSegmentation();
return ToFinalSegmentation(context, unmapped_glyph_handling);
}

// ### Iteratively merge segments and incrementally reprocess affected data.
@@ -499,7 +525,7 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
merger.LogMergedSizeHistogram();
}

return context.ToGlyphSegmentation();
return ToFinalSegmentation(context, unmapped_glyph_handling);
}

const auto& [merged_segment_index, modified_gids] = *merged;
@@ -609,10 +635,8 @@ StatusOr<SegmentationCost> ClosureGlyphSegmenter::TotalCost(
}

Status ClosureGlyphSegmenter::FallbackCost(
hb_face_t* original_face, const GlyphSegmentation& segmentation,
uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size
) const {

hb_face_t* original_face, const GlyphSegmentation& segmentation,
uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size) const {
GlyphSet all_glyphs = segmentation.InitialFontGlyphClosure();
for (const auto& [_, gids] : segmentation.GidSegments()) {
all_glyphs.union_set(gids);
14 changes: 8 additions & 6 deletions ift/encoder/closure_glyph_segmenter.h
@@ -10,6 +10,7 @@
#include "ift/encoder/segmentation_context.h"
#include "ift/encoder/subset_definition.h"
#include "ift/freq/probability_calculator.h"
#include "util/common.pb.h"

namespace ift::encoder {

@@ -55,7 +56,7 @@ class ClosureGlyphSegmenter {
hb_face_t* face, SubsetDefinition initial_segment,
const std::vector<SubsetDefinition>& subset_definitions,
absl::btree_map<common::SegmentSet, MergeStrategy> merge_groups,
bool place_fallback_in_init) const;
UnmappedGlyphHandling unmapped_glyph_handling) const;

/*
* Generates a segmentation context for the provided segmentation input.
@@ -76,12 +77,13 @@ class ClosureGlyphSegmenter {
const freq::ProbabilityCalculator& probability_calculator) const;

/*
* Computes the total cost of the fallback patch (expected number of bytes transferred)
* Computes the total cost of the fallback patch (expected number of bytes
* transferred)
*/
absl::Status FallbackCost(
hb_face_t* original_face, const GlyphSegmentation& segmentation,
uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size
) const;
absl::Status FallbackCost(hb_face_t* original_face,
const GlyphSegmentation& segmentation,
uint32_t& fallback_glyphs_size,
uint32_t& all_glyphs_size) const;

private:
uint32_t brotli_quality_;