w3c · garretrieger · Dec 5, 2025 · Dec 17, 2025 · Dec 17, 2025 · Dec 18, 2025
diff --git a/docs/experimental/closure_glyph_segmentation.md b/docs/experimental/closure_glyph_segmentation.md
@@ -2,7 +2,7 @@
 
 Author: Garret Rieger  
 Date: Jan 27, 2025  
-Updated: Oct 27, 2025
+Updated: Dec 17, 2025
 
 ## Introduction
 
@@ -69,9 +69,10 @@ and remaining areas for development in this particular approach:
 * Support for merging segmentations involving multiple overlapping scripts is not yet implemented
   (for example creating a segmentation that supports Chinese and Japanese simultaneously).
 
-* [Multi segment analysis](#multi-segment-dependencies): the current implementation only does single
-  segment analysis which in some cases leaves sizable fallback glyph sets. How to implement multi
-  segment analysis is an open question and more development is needed.
+* [Multi segment analysis](#multi-segment-dependencies): the current implementation utilizes an approach which
+  approximates multi segment analysis by finding superset disjunctive conditions for multi segment
+  conditions. See:
+  [closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md).
 
 * Input segmentation generation: the glyph segmentation process starts with an existing
   codepoint/feature based segmentation. Good results can be achieved by starting with one input
@@ -252,7 +253,7 @@ that minimizes overall cost.
 
 ## Multi Segment Dependencies
 
-Note: this section is somewhat speculative as this functionality has not yet been implemented.
+Note: this section is somewhat speculative as this functionality has not yet been fully implemented.
 More research and exploration is definitely needed.
 
 The Segmenting Glyphs Based on Closure Analysis procedure places any glyphs whose conditions aren't
@@ -291,6 +292,14 @@ needed to reduce the amount of combinations to test. Some suggestions:
 * The performance of a segmentation is likely driven solely by the high frequency code points. So
   divide the font into a high frequency set and low frequency set of code points. Where a more
   extensive multi segment dependency check is done for only the high frequency segments.
+
+As an alternative, a simpler approach to the problem is to limit the scope to just finding conditions which are a superset
+of the true condition. This superset condition can be used in place of the true condition without violating the closure
+requirement. This is the approach currently used in the segmenter implementation. This procedure is discussed in more
+detail in [closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md). The
+advantage of this approach is that it's much less computationally costly than multi segment analysis. The downside is that
+these superset conditions will activate more frequently than the true conditions and thus may be loaded in cases where
+they are not actually needed.
 
 ## Examples
 

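Reviewer sketch (not part of the change): the superset property described in the paragraph added above can be checked by brute force on a toy example. The code below is a minimal illustration, assuming conditions are modelled as predicates over a bitmask of active segments. The names `Condition`, `IsSuperset`, `ExtraActivations`, and the example predicates are hypothetical and do not come from the segmenter code. The true condition used is `((A and B) or (B and C))`, the example from the new complex conditions document.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical model: a condition maps a bitmask of active segments to
// "does the patch activate?".
using Condition = std::function<bool(uint32_t)>;

// True if `candidate` activates for every segment combination in which
// `truth` activates. Assumes num_segments < 32.
bool IsSuperset(const Condition& truth, const Condition& candidate,
                unsigned num_segments) {
  for (uint32_t mask = 0; mask < (1u << num_segments); ++mask) {
    if (truth(mask) && !candidate(mask)) return false;
  }
  return true;
}

// Counts the segment combinations where `candidate` activates but `truth`
// does not, i.e. the loads that were not actually needed.
unsigned ExtraActivations(const Condition& truth, const Condition& candidate,
                          unsigned num_segments) {
  unsigned extra = 0;
  for (uint32_t mask = 0; mask < (1u << num_segments); ++mask) {
    if (candidate(mask) && !truth(mask)) ++extra;
  }
  return extra;
}

constexpr uint32_t A = 1u << 0, B = 1u << 1, C = 1u << 2;

// Toy true condition: (A and B) or (B and C).
bool TrueCondition(uint32_t m) {
  return ((m & A) && (m & B)) || ((m & B) && (m & C));
}
bool AOrC(uint32_t m) { return (m & A) || (m & C); }
bool OnlyB(uint32_t m) { return (m & B) != 0; }
bool OnlyA(uint32_t m) { return (m & A) != 0; }
```

Under this model `AOrC` and `OnlyB` are both valid superset conditions, while `OnlyA` is not, because it misses the case where only B and C are active. `ExtraActivations` makes the stated downside concrete: of the eight segment combinations, `AOrC` activates unnecessarily in three and `OnlyB` in one.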
diff --git a/docs/experimental/closure_glyph_segmentation_complex_conditions.md b/docs/experimental/closure_glyph_segmentation_complex_conditions.md
@@ -0,0 +1,155 @@
+
+# Complex Condition Finding in Closure Glyph Keyed Segmenter
+
+Author: Garret Rieger  
+Date: Dec 17, 2025
+
+## Introduction
+
+Before reading this document it is recommended to first review the
+[closure glyph segmentation](./closure_glyph_segmentation.md) document. This document borrows concepts and terms from it.
+
+In closure glyph segmentation the closure analysis step is capable of locating glyph activation conditions that are
+either fully disjunctive or fully conjunctive (e.g. `(A or B or C)` or `(A and B and C)`). It is not capable of finding
+conditions that are a mix of conjunction and disjunction (e.g. `(A and B) or (B and C)`). These are referred to as complex
+conditions. By default glyphs with complex conditions are assigned to a patch that is always loaded, since the true
+conditions are not known.
+
+This document describes an algorithm which can be used to find purely disjunctive conditions which are supersets of
+complex conditions. A superset condition is one that will activate at least whenever the true condition would. This
+property allows the superset condition to be used for a patch in place of the true condition without violating the
+closure requirement.
+
+For a given complex condition there typically exists more than one possible superset disjunctive condition. The
+algorithm will find one of them, but not necessarily the smallest one. The found superset condition will always
+contain only segments which appear in the original condition.
+
+For example, if we had a glyph with an activation condition of `((A and B) or (B and C))` then this process will find one
+of the possible superset conditions such as `(A or C)`, `(A or B)`, `(B or C)`, or `(B)`. In a segmentation we could
+then have a patch with the found condition which loads the glyph and this would satisfy the closure requirement.
+
+## Foundations
+
+The algorithm is based on the following assertions:
+
+1. Given some fully disjunctive condition for a glyph, we can verify that the condition is a superset of the true
+   condition for the glyph and meets the closure requirement by the following procedure: compute a glyph closure of the
+   union of all segments except for those in the condition. If the glyph does not appear in this closure, then the
+   condition satisfies the closure requirement for that glyph and is a superset of the true condition. This is called
+   the “additional conditions” check.
+
+2. The glyph closure of all segments includes the glyph that we are analyzing.
+
+3. We have a glyph which has some true activation condition. If we compute a glyph closure of some combination of
+   segments, then adding or removing a segment, which is not part of the activation condition, to the glyph closure input
+   will have no effect on whether or not the glyph appears in the closure output.
+
+4. The closure of no segments contains only glyphs from the initial font.
+
+## The Algorithm
+
+For a glyph with a complex condition we can use the above to find a superset disjunctive condition for that
+glyph's complex condition. These conditions will satisfy the closure requirement for each glyph.
+
+### Finding a Sub Condition
+
+The algorithm works by identifying a single sub condition at a time. This section describes the algorithm for
+finding a single sub condition.
+
+Inputs:
+
+* Segments to exclude from the analysis.
+* `glyph` to analyze.
+
+Algorithm:
+
+1. Start with a set of all segments except those to be excluded, called `to_test`.
+2. Initialize a second set of segments, `sub_condition`, to the empty set.
+3. Remove a segment `s` from `to_test` and compute the glyph closure of `to_test U sub_condition`.
+4. If `glyph` is not found in the closure then add `s` to `sub_condition`.
+5. If `to_test` is empty, then return the sub condition `sub_condition`.
+6. Otherwise, go back to step 3.
+
+### Finding the Complete Condition
+
+This section describes the algorithm which finds the complete condition. It utilizes `Finding a Sub Condition`.
+
+Inputs:
+
+* `glyph` to analyze.
+
+Algorithm:
+
+1. Initialize a set of segments `condition` to the empty set.
+2. Execute the `Finding a Sub Condition` algorithm with `condition` as the excluded set.
+3. Union the returned set into `condition`.
+4. Compute the glyph closure of all segments except those in `condition`.
+5. If `glyph` is found in the closure, then more sub conditions still exist. Go back to step 2.
+6. Return the complete condition, `condition`.
+
+### Initial Font
+
+Any time a closure operation is executed by the above two algorithms it's necessary to union the subset definition for
+the initial font into the closure input. This is required because the closure of the initial font affects what's
+reachable by the segments.
+
+### Why this works
+
+Here we show this procedure is guaranteed to find a disjunctive superset of a glyph's true condition which includes
+only segments from the true condition, when the glyph is not already in the initial font:
+
+* For each call to `Finding a Sub Condition` glyph will be in the closure of all non-excluded segments. For the first
+  call this is guaranteed by assertion (2). For subsequent calls this is guaranteed by the "additional conditions" check
+  which gates execution.
+
+* Any segments which are not part of the true condition will not impact the glyph's presence in the closure (assertion
+  (3)). Further, by the previous point we know that at the start of `Finding a Sub Condition` the closure of all
+  non-excluded segments will contain `glyph`. Thus testing a segment which is not part of the true condition will never
+  result in `glyph` missing from the closure, and that segment won't be added to `sub_condition`. Therefore
+  `Finding a Sub Condition` will only ever return segments that are part of the true condition.
+
+* `Finding a Sub Condition` will always return at least one segment: if when the last segment is tested `sub_condition`
+  is still the empty set, then the closure will be on no segments and will not have glyph in it. This is a result of
+  assertion (4) and the premise that glyph is not already in the initial font. As a consequence the returned
+  `sub_condition` will always have at least one segment in it.
+
+* Since all returned segments from `Finding a Sub Condition` are excluded from future calls, and each call returns
+  only segments that are part of the true condition, there will be a finite number of `Finding a Sub Condition` executions.
+
+* Lastly, the algorithm terminates only once the additional conditions check finds no additional conditions,
+  guaranteeing we have found a superset disjunctive condition (assertion (1)).
+
+## Making it More Performant
+
+As described above this approach can be slow since it processes glyphs one at a time. Improvements to performance
+can be made by processing glyphs in a batch. This can be done with a recursive approach where after each segment test 
+the set of input glyphs gets split into those that require the tested segment and those that don't. Each of the splits spawns
+a new recursion (if there is at least one glyph in the split).
+
+Also from the closure analysis run by the segmenter we may have discovered some partial conditions for glyphs. These
+can be incorporated as a starting point into the complex condition analysis.
+
+More substantial performance improvements could be realized by using a dependency graph generated from the font. This
+could come in two forms:
+
+1. Determine what segments interact with GSUB in some way and use that to scope the analysis. Segments that don't
+   interact with GSUB can be discovered via regular closure analysis as they will only ever have disjunctive conditions.
+   After completing a scoped analysis, a final additional conditions check against all segments could be used to ensure
+   we have actually arrived at a superset condition. This is more speculative and would need more research to validate the
+   approach.
+
+2. Using an actual dependency graph, even if it's not fully accurate, to generate a set of initial activation
+   conditions. Then, as needed when the additional conditions check fails, we could find any additional segments for the
+   superset condition using this process.
+
+## Integrating into the Segmentation Algorithm
+
+Initially the complex condition analysis has been added as a final step after merging. If after merging unmapped glyphs
+are present, then the complex condition analysis is run on those glyphs and the fallback patch is replaced with one or
+more patches based on the results of complex condition analysis.
+
+However, ideally complex condition analysis would be run before merging so that the patches it generates can participate
+in the merging process. This will require incremental updates to the complex condition analysis results, but that should
+be straightforward. Implementing this is planned for the near future.
+
+
+
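Reviewer sketch (not part of the change): the two procedures described in the new document, `Finding a Sub Condition` and `Finding the Complete Condition`, can be written out roughly as follows. This is a minimal illustration under stated assumptions, not the code in `complex_condition_finder.cc`. Segments are modelled as integers, and the glyph closure is replaced by a hypothetical callback `glyph_in_closure` that reports whether the analyzed glyph appears in the closure of a set of segments. The callback is assumed to already union in the initial font. The empty-result guard in `FindCondition` is an added safety check for inputs that violate the document's premises.

```cpp
#include <cassert>
#include <functional>
#include <set>

// Hypothetical stand-ins for the segmenter's real types.
using SegmentSet = std::set<int>;
using InClosureFn = std::function<bool(const SegmentSet&)>;

// "Finding a Sub Condition": remove segments one at a time and keep each
// segment whose removal makes the glyph drop out of the closure.
SegmentSet FindSubCondition(const SegmentSet& all_segments,
                            const SegmentSet& excluded,
                            const InClosureFn& glyph_in_closure) {
  SegmentSet to_test;
  for (int s : all_segments) {
    if (!excluded.count(s)) to_test.insert(s);
  }
  SegmentSet sub_condition;
  while (!to_test.empty()) {
    int s = *to_test.begin();
    to_test.erase(to_test.begin());
    // Closure input is `to_test U sub_condition`.
    SegmentSet input = to_test;
    input.insert(sub_condition.begin(), sub_condition.end());
    if (!glyph_in_closure(input)) sub_condition.insert(s);
  }
  return sub_condition;
}

// "Finding the Complete Condition": accumulate sub conditions until the
// additional conditions check passes.
SegmentSet FindCondition(const SegmentSet& all_segments,
                         const InClosureFn& glyph_in_closure) {
  SegmentSet condition;
  while (true) {
    SegmentSet sub = FindSubCondition(all_segments, condition, glyph_in_closure);
    if (sub.empty()) break;  // Guard: glyph unreachable or in the initial font.
    condition.insert(sub.begin(), sub.end());
    SegmentSet remaining;
    for (int s : all_segments) {
      if (!condition.count(s)) remaining.insert(s);
    }
    // Additional conditions check: stop once the glyph is unreachable
    // without the segments in `condition`.
    if (!glyph_in_closure(remaining)) break;
  }
  return condition;
}

// Toy closure for a glyph whose true condition is (A and B) or (B and C),
// with A = 0, B = 1, C = 2.
bool ExampleInClosure(const SegmentSet& s) {
  return (s.count(0) && s.count(1)) || (s.count(1) && s.count(2));
}
```

With the toy closure, `FindCondition({0, 1, 2}, ExampleInClosure)` yields `{1, 2}`, i.e. `(B or C)`. This is a valid superset but not the smallest one (`(B)` alone would do), matching the document's statement that the result is not necessarily minimal. The result depends on the order in which segments are tested.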
diff --git a/ift/encoder/BUILD b/ift/encoder/BUILD
@@ -21,6 +21,7 @@ cc_library(
         "//ift/proto",
         "//ift/feature_registry:feature_registry",
         "//util:segmentation_plan_cc_proto",
+        "//util:common_cc_proto",
         "@abseil-cpp//absl/container:btree",
         "@abseil-cpp//absl/container:flat_hash_map",
         "@abseil-cpp//absl/container:flat_hash_set",
@@ -60,6 +61,8 @@ cc_library(
         "glyph_condition_set.h",
         "glyph_groupings.h",
         "glyph_groupings.cc",
+        "complex_condition_finder.h",
+        "complex_condition_finder.cc",
         "candidate_merge.cc",
         "candidate_merge.h",
         "patch_size_cache.h",
@@ -185,6 +188,24 @@ cc_test(
     ],
 )
 
+cc_test(
+    name = "complex_condition_finder_test",
+    size = "medium",
+    srcs = [
+        "complex_condition_finder_test.cc",
+    ],
+    data = [
+        "//common:testdata",
+        "//ift:testdata",
+    ],
+    deps = [
+        ":segmentation_context",
+        ":encoder",
+        "//common",
+        "@googletest//:gtest_main",
+    ],
+)
+
 cc_test(
     name = "closure_glyph_segmenter_test",
     size = "medium",

diff --git a/ift/encoder/closure_glyph_segmenter.cc b/ift/encoder/closure_glyph_segmenter.cc
@@ -379,7 +379,7 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
   }
 
   return CodepointToGlyphSegments(face, initial_segment, subset_definitions,
-                                  merge_groups, false);
+                                  merge_groups, PATCH);
 }
 
 StatusOr<std::vector<Merger>> ToMergers(
@@ -394,12 +394,37 @@ StatusOr<std::vector<Merger>> ToMergers(
   return mergers;
 }
 
+static StatusOr<GlyphSegmentation> ToFinalSegmentation(
+    SegmentationContext& context,
+    UnmappedGlyphHandling unmapped_glyph_handling) {
+  if (unmapped_glyph_handling == FIND_CONDITIONS) {
+    // TODO(garretrieger): this analysis should be performed prior to merging
+    // so that the found conditions can participate in merging. To make this
+    // performant we'll need to add support for incrementally recomputing
+    // complex conditions that are affected by merges.
+    //
+    // The good news here is that when we do a segment merge of the generated
+    // complex activation conditions that will naturally fix the unmapped
+    // nature of the relevant glyphs. However, changes to segments may
+    // also invalidate the complex conditions and require incremental
+    // reprocessing.
+    //
+    // Roughly, during invalidation and subsequent incremental closure
+    // analysis we may re-identify unmapped glyphs. These would then
+    // need to be invalidated and reprocessed by the complex condition finder.
+    TRYV(context.glyph_groupings.FindFallbackGlyphConditions(
+        context.SegmentationInfo(), context.glyph_condition_set,
+        context.glyph_closure_cache));
+  }
+
+  return context.ToGlyphSegmentation();
+}
+
 StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
     hb_face_t* face, SubsetDefinition initial_segment,
     const std::vector<SubsetDefinition>& subset_definitions,
     btree_map<SegmentSet, MergeStrategy> merge_groups,
-    bool place_fallback_in_init) const {
-
+    UnmappedGlyphHandling unmapped_glyph_handling) const {
   for (const auto& [segments, strategy] : merge_groups) {
     if (strategy.UseCosts()) {
       TRYV(CheckForDisjointCodepoints(subset_definitions, segments));
@@ -442,7 +467,8 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
   // if requested any remaining fallback glyphs are also moved into the init
   // font.
   GlyphSet fallback_glyphs = context.glyph_groupings.FallbackGlyphs();
-  if (place_fallback_in_init && !fallback_glyphs.empty()) {
+  if (unmapped_glyph_handling == MOVE_TO_INIT_FONT &&
+      !fallback_glyphs.empty()) {
     VLOG(0) << "Moving " << fallback_glyphs.size()
             << " fallback glyphs into the initial font." << std::endl;
     SubsetDefinition new_def = context.SegmentationInfo().InitFontSegment();
@@ -452,7 +478,7 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
 
   if (merge_groups.empty()) {
     // No merging will be needed so we're done.
-    return context.ToGlyphSegmentation();
+    return ToFinalSegmentation(context, unmapped_glyph_handling);
   }
 
   // ### Iteratively merge segments and incrementally reprocess affected data.
@@ -499,7 +525,7 @@ StatusOr<GlyphSegmentation> ClosureGlyphSegmenter::CodepointToGlyphSegments(
         merger.LogMergedSizeHistogram();
       }
 
-      return context.ToGlyphSegmentation();
+      return ToFinalSegmentation(context, unmapped_glyph_handling);
     }
 
     const auto& [merged_segment_index, modified_gids] = *merged;
@@ -609,10 +635,8 @@ StatusOr<SegmentationCost> ClosureGlyphSegmenter::TotalCost(
 }
 
 Status ClosureGlyphSegmenter::FallbackCost(
-      hb_face_t* original_face, const GlyphSegmentation& segmentation,
-      uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size
-    ) const {
-
+    hb_face_t* original_face, const GlyphSegmentation& segmentation,
+    uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size) const {
   GlyphSet all_glyphs = segmentation.InitialFontGlyphClosure();
   for (const auto& [_, gids] : segmentation.GidSegments()) {
     all_glyphs.union_set(gids);

diff --git a/ift/encoder/closure_glyph_segmenter.h b/ift/encoder/closure_glyph_segmenter.h
@@ -10,6 +10,7 @@
 #include "ift/encoder/segmentation_context.h"
 #include "ift/encoder/subset_definition.h"
 #include "ift/freq/probability_calculator.h"
+#include "util/common.pb.h"
 
 namespace ift::encoder {
 
@@ -55,7 +56,7 @@ class ClosureGlyphSegmenter {
       hb_face_t* face, SubsetDefinition initial_segment,
       const std::vector<SubsetDefinition>& subset_definitions,
       absl::btree_map<common::SegmentSet, MergeStrategy> merge_groups,
-      bool place_fallback_in_init) const;
+      UnmappedGlyphHandling unmapped_glyph_handling) const;
 
   /*
    * Generates a segmentation context for the provided segmentation input.
@@ -76,12 +77,13 @@ class ClosureGlyphSegmenter {
       const freq::ProbabilityCalculator& probability_calculator) const;
 
   /*
-   * Computes the total cost of the fallback patch (expected number of bytes transferred)
+   * Computes the total cost of the fallback patch (expected number of bytes
+   * transferred)
    */
-  absl::Status FallbackCost(
-      hb_face_t* original_face, const GlyphSegmentation& segmentation,
-      uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size
-    ) const;
+  absl::Status FallbackCost(hb_face_t* original_face,
+                            const GlyphSegmentation& segmentation,
+                            uint32_t& fallback_glyphs_size,
+                            uint32_t& all_glyphs_size) const;
 
  private:
   uint32_t brotli_quality_;