diff --git a/docs/experimental/closure_glyph_segmentation.md b/docs/experimental/closure_glyph_segmentation.md index bc4703b..d351bc8 100644 --- a/docs/experimental/closure_glyph_segmentation.md +++ b/docs/experimental/closure_glyph_segmentation.md @@ -2,7 +2,7 @@ Author: Garret Rieger Date: Jan 27, 2025 -Updated: Oct 27, 2025 +Updated: Dec 17, 2025 ## Introduction @@ -69,9 +69,10 @@ and remaining areas for development in this particular approach: * Support for merging segmentations involving multiple overlapping scripts is not yet implemented (for example creating a segmentation that supports Chinese and Japanese simultaneously). -* [Multi segment analysis](#multi-segment-dependencies): the current implementation only does single - segment analysis which in some cases leaves sizable fallback glyph sets. How to implement multi - segment analysis is an open question and more development is needed. +* [Multi segment analysis](#multi-segment-dependencies): the current implementation utilizes an approach which + approximates multi segment analysis by finding superset disjunctive conditions for multi segment + conditions. See: + [closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md). * Input segmentation generation: the glyph segmentation process starts with an existing codepoint/feature based segmentation. Good results can be achieved by starting with one input @@ -252,7 +253,7 @@ that minimizes overall cost. ## Multi Segment Dependencies -Note: this section is somewhat speculative as this functionality has not yet been implemented. +Note: this section is somewhat speculative as this functionality has not yet been fully implemented. More research and exploration is definitely needed. The Segmenting Glyphs Based on Closure Analysis procedure places any glyphs whose conditions aren't @@ -291,6 +292,14 @@ needed to reduce the amount of combinations to test. 
Some suggestions: * The performance of a segmentation is likely driven solely by the high frequency code points. So divide the font into a high frequency set and low frequency set of code points. Where a more extensive multi segment dependency check is done for only the high frequency segments. + +As an alternative, a simpler approach to the problem is to limit the scope to just finding conditions which are a superset +of the true condition. This superset condition can be used in place of the true condition without violating the closure +requirement. This is the approach currently used in the segmenter implementation. This procedure is discussed in more +detail in [closure_glyph_segmentation_complex_conditions.md](./closure_glyph_segmentation_complex_conditions.md). The +advantage to this approach is it's much less computationally costly than multi segment analysis. The downside is these +superset conditions will activate more frequently than the true conditions and thus may be loaded in cases where they +are not actually needed. ## Examples diff --git a/docs/experimental/closure_glyph_segmentation_complex_conditions.md b/docs/experimental/closure_glyph_segmentation_complex_conditions.md new file mode 100644 index 0000000..0d23852 --- /dev/null +++ b/docs/experimental/closure_glyph_segmentation_complex_conditions.md @@ -0,0 +1,155 @@ + +# Complex Condition Finding in Closure Glyph Keyed Segmenter + +Author: Garret Rieger +Date: Dec 17, 2025 + +## Introduction + +Before reading this document it is recommended to first review the +[closure glyph segmentation](./closure_glyph_segmentation.md) document. This document borrows concepts and terms from it. + +In closure glyph segmentation the closure analysis step is capable of locating glyph activation conditions that are +either fully disjunctive or fully conjunctive (eg. `(A or B or C)`). It is not capable of finding conditions that are a mix +of conjunction and disjunction (eg. `(A and B) or (B and C)`). 
These are referred to as complex conditions. By default +glyphs with complex conditions are assigned to a patch that is always loaded, since the true conditions are not known. + +This document describes an algorithm which can be used to find purely disjunctive conditions which are supersets of +complex conditions. A superset condition is one that will activate at least whenever the true condition would. This +property allows the superset condition to be used for a patch in place of the true condition without violating the +closure requirement. + +For a given complex condition there typically exists more than one possible superset disjunctive condition. The +algorithm will find one of them, but not necessarily the smallest one. The found superset condition will always +contain only segments which appear in the original condition. + +For example, if we had a glyph with an activation condition of `((A and B) or (B and C))` then this process will find one +of the possible superset conditions such as `(A or C)`, `(A or B)`, `(B or C)`, or `(B)`. In a segmentation we could +then have a patch with the found condition which loads the glyph and this would satisfy the closure requirement. + +## Foundations + +The algorithm is based on the following assertions: + +1. Given some fully disjunctive condition for a glyph, we can verify that the condition is a superset of the true + condition for the glyph and meets the closure requirement by the following procedure: compute a glyph closure of the + union of all segments except for those in the condition. If the glyph does not appear in this closure, then the + condition satisfies the closure requirement for that glyph and is a superset of the true condition. This is called + the “additional conditions” check. + +2. The glyph closure of all segments includes the glyph that we are analyzing. + +3. We have a glyph which has some true activation condition. 
If we compute a glyph closure of some combination of + segments, then adding a segment which is not part of the activation condition to the glyph closure input, or removing + one from it, will have no effect on whether or not the glyph appears in the closure output. + +4. The closure of no segments contains only glyphs from the initial font. + +## The Algorithm + +For a glyph with a complex condition we can use the above to find a superset disjunctive condition for that +glyph's complex condition. These conditions will satisfy the closure requirement for each glyph. + +### Finding a Sub Condition + +The algorithm works by identifying a single sub condition at a time. This section describes the algorithm for +finding a single sub condition. + +Inputs: + +* Segments to exclude from the analysis. +* `glyph` to analyze. + +Algorithm: + +1. Start with a set of all segments except those to be excluded, called `to_test`. +2. Initialize a second set of segments, `sub_condition`, to the empty set. +3. Remove a segment `s` from `to_test` and compute the glyph closure of `to_test U sub_condition`. +4. If `glyph` is not found in the closure then add `s` to `sub_condition`. +5. If `to_test` is empty, then return the sub condition `sub_condition`. +6. Otherwise, go back to step 3. + +### Finding the Complete Condition + +This section describes the algorithm which finds the complete condition. It utilizes `Finding a Sub Condition`. + +Inputs: + +* `glyph` to analyze. + +Algorithm: + +1. Initialize a set of segments `condition` to the empty set. +2. Execute the `Finding a Sub Condition` algorithm with `condition` as the excluded set. +3. Union the returned set into `condition`. +4. Compute the glyph closure of all segments except those in `condition`. +5. If `glyph` is found in the closure, then more sub conditions still exist. Go back to step 2. +6. Return the complete condition, `condition`. 
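As a rough illustration, the two procedures above could be sketched in C++ as follows. This is a minimal sketch and not the segmenter's implementation: the real glyph closure is replaced by a hypothetical `InClosureFn` oracle built by `ToyClosure` from an already known condition, the initial font is assumed to be empty (so its subset definition is not unioned into the closure input), and all names (`Segment`, `ToyClosure`, `Without`, `FindSubCondition`, `FindCompleteCondition`) are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using Segment = int;
using SegmentSet = std::set<Segment>;

// Hypothetical stand-in for a real glyph closure: returns true if the glyph
// being analyzed appears in the closure of the given segments.
using InClosureFn = std::function<bool(const SegmentSet&)>;

// Builds a toy oracle from a known true condition in disjunctive normal form,
// e.g. {{0, 1}, {1, 2}} means (s0 and s1) or (s1 and s2). Assumption: the
// initial font is empty, so it contributes nothing to the closure.
InClosureFn ToyClosure(std::vector<SegmentSet> dnf) {
  return [dnf](const SegmentSet& input) -> bool {
    for (const SegmentSet& conjunction : dnf) {
      if (std::includes(input.begin(), input.end(), conjunction.begin(),
                        conjunction.end())) {
        return true;
      }
    }
    return false;
  };
}

// Returns the segments of `from` that are not in `remove`.
SegmentSet Without(const SegmentSet& from, const SegmentSet& remove) {
  SegmentSet out;
  for (Segment s : from) {
    if (remove.count(s) == 0) out.insert(s);
  }
  return out;
}

// "Finding a Sub Condition": tests each non-excluded segment once, in
// ascending order, and keeps those whose removal drops the glyph from the
// closure.
SegmentSet FindSubCondition(const SegmentSet& all, const SegmentSet& excluded,
                            const InClosureFn& glyph_in_closure) {
  SegmentSet to_test = Without(all, excluded);
  SegmentSet sub_condition;
  while (!to_test.empty()) {
    Segment s = *to_test.begin();
    to_test.erase(to_test.begin());
    SegmentSet input = to_test;  // closure input is to_test U sub_condition
    input.insert(sub_condition.begin(), sub_condition.end());
    if (!glyph_in_closure(input)) sub_condition.insert(s);
  }
  return sub_condition;
}

// "Finding the Complete Condition": repeats the sub condition search until
// the additional conditions check passes.
SegmentSet FindCompleteCondition(const SegmentSet& all,
                                 const InClosureFn& glyph_in_closure) {
  assert(glyph_in_closure(all));            // assertion (2)
  assert(!glyph_in_closure(SegmentSet{}));  // glyph is not in the initial font
  SegmentSet condition;
  while (true) {
    SegmentSet sub = FindSubCondition(all, condition, glyph_in_closure);
    // Defensive only: under the assumptions above `sub` is never empty.
    if (sub.empty()) return condition;
    condition.insert(sub.begin(), sub.end());
    // Additional conditions check: closure of everything except `condition`.
    if (!glyph_in_closure(Without(all, condition))) return condition;
  }
}
```

With segments `A = 0`, `B = 1`, `C = 2` plus an unrelated segment `3`, and a true condition of `(A and B) or (B and C)`, calling `FindCompleteCondition({0, 1, 2, 3}, ToyClosure({{0, 1}, {1, 2}}))` yields `{1, 2}`, that is `(B or C)`, because this sketch tests segments in ascending order. A different test order can yield `(A or B)` instead.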
+ +### Initial Font + +Any time a closure operation is executed by the above two algorithms it's necessary to union the subset definition for +the initial font into the closure input. This is required because the closure of the initial font affects what's +reachable by the segments. + +### Why this works + +Here we show this procedure is guaranteed to find a disjunctive superset of a glyph's true condition which includes +only segments from the true condition, when the glyph is not already in the initial font: + +* For each call to `Finding a Sub Condition` glyph will be in the closure of all non-excluded segments. For the first + call this is guaranteed by assertion (2). For subsequent calls this is guaranteed by the "additional conditions" check + which gates execution. + +* Any segments which are not part of the true condition will not impact the glyph's presence in the closure (assertion + (3)). Further by the previous point we know that at the start of `Finding a Sub Condition` the closure of all + non-excluded segments will contain glyph. Thus testing a segment which is not part of the true condition will never + result in glyph missing from the closure, and won't be added to `sub_condition`. Therefore `Finding a Sub Condition` + will only ever return segments that are part of the true condition. + +* `Finding a Sub Condition` will always return at least one segment: if when the last segment is tested `sub_condition` + is still the empty set, then the closure will be on no segments and will not have glyph in it. This is a result of + assertion (4) and the premise that glyph is not already in the initial font. As a consequence the returned + `sub_condition` will always have at least one segment in it. + +* Since all returned segments from `Finding a Sub Condition` are excluded from future calls, there will be a finite + number of `Finding a Sub Condition` executions which return only segments part of the true condition. 
+ +* Lastly, the algorithm terminates only once the additional conditions check finds no additional conditions, + guaranteeing we have found a superset disjunctive condition (assertion (1)). + +## Making it More Performant + +As described above, this approach can be slow since it processes glyphs one at a time. Improvements to performance +can be made by processing glyphs in a batch. This can be done with a recursive approach where after each segment test +the set of input glyphs gets split into those that require the tested segment and those that don't. Each of the splits spawns +a new recursion (if there is at least one glyph in the split). + +Also, from the closure analysis run by the segmenter we may have discovered some partial conditions for glyphs. These +can be incorporated as a starting point into the complex condition analysis. + +More substantial performance improvements could be realized by using a dependency graph generated from the font. This +could come in two forms: + +1. Determine what segments interact with GSUB in some way and use that to scope the analysis. Segments that don't + interact with GSUB can be discovered via regular closure analysis as they will only ever have disjunctive conditions. + After completing a scoped analysis, a final additional conditions check against all segments could be used to ensure + we have actually arrived at a superset condition. This is more speculative and would need more research to validate the + approach. + +2. Use an actual dependency graph, even if it's not fully accurate, to generate a set of initial activation + conditions. Then, as needed when the additional conditions check fails, we could find any additional segments for the + superset condition using this process. + +## Integrating into the Segmentation Algorithm + +Initially the complex condition analysis has been added as a final step after merging. 
If after merging unmapped glyphs +are present, then the complex condition analysis is run on those glyphs and the fallback patch is replaced with one or +more patches based on the results of complex condition analysis. + +However, ideally complex condition analysis would be run before merging so that the patches it generates can participate +in the merging process. This will require incremental updates to the complex condition analysis results, but that should +be straightforward. Implementing this is planned for the near future. + + + diff --git a/ift/encoder/BUILD b/ift/encoder/BUILD index 6886f74..c92838e 100644 --- a/ift/encoder/BUILD +++ b/ift/encoder/BUILD @@ -21,6 +21,7 @@ cc_library( "//ift/proto", "//ift/feature_registry:feature_registry", "//util:segmentation_plan_cc_proto", + "//util:common_cc_proto", "@abseil-cpp//absl/container:btree", "@abseil-cpp//absl/container:flat_hash_map", "@abseil-cpp//absl/container:flat_hash_set", @@ -60,6 +61,8 @@ cc_library( "glyph_condition_set.h", "glyph_groupings.h", "glyph_groupings.cc", + "complex_condition_finder.h", + "complex_condition_finder.cc", "candidate_merge.cc", "candidate_merge.h", "patch_size_cache.h", @@ -185,6 +188,24 @@ cc_test( ], ) +cc_test( + name = "complex_condition_finder_test", + size = "medium", + srcs = [ + "complex_condition_finder_test.cc", + ], + data = [ + "//common:testdata", + "//ift:testdata", + ], + deps = [ + ":segmentation_context", + ":encoder", + "//common", + "@googletest//:gtest_main", + ], +) + cc_test( name = "closure_glyph_segmenter_test", size = "medium", diff --git a/ift/encoder/closure_glyph_segmenter.cc b/ift/encoder/closure_glyph_segmenter.cc index 009ea40..e211de4 100644 --- a/ift/encoder/closure_glyph_segmenter.cc +++ b/ift/encoder/closure_glyph_segmenter.cc @@ -379,7 +379,7 @@ StatusOr ClosureGlyphSegmenter::CodepointToGlyphSegments( } return CodepointToGlyphSegments(face, initial_segment, subset_definitions, - merge_groups, false); + merge_groups, PATCH); } StatusOr> 
ToMergers( @@ -394,12 +394,37 @@ StatusOr> ToMergers( return mergers; } +static StatusOr ToFinalSegmentation( + SegmentationContext& context, + UnmappedGlyphHandling unmapped_glyph_handling) { + if (unmapped_glyph_handling == FIND_CONDITIONS) { + // TODO(garretrieger): this analysis should be performed prior to merging + // so that the found conditions can participate in merging. To make this + // performant we'll need to add support for incrementally recomputing + // complex conditions that are affected by merges. + // + // The good news here is that when we do a segment merge of the generated + // complex activation conditions that will naturally fix the unmapped + // nature of the relevant glyphs. However, changes to segments may + // also invalidate the complex conditions and require incremental + // reprocessing. + // + // Roughly, during invalidation and subsequent incremental closure + // analysis we may re-identify unmapped glyphs; these would then + // need to be invalidated and reprocessed by the complex condition finder. + TRYV(context.glyph_groupings.FindFallbackGlyphConditions( + context.SegmentationInfo(), context.glyph_condition_set, + context.glyph_closure_cache)); + } + + return context.ToGlyphSegmentation(); +} + StatusOr ClosureGlyphSegmenter::CodepointToGlyphSegments( hb_face_t* face, SubsetDefinition initial_segment, const std::vector& subset_definitions, btree_map merge_groups, - bool place_fallback_in_init) const { - + UnmappedGlyphHandling unmapped_glyph_handling) const { for (const auto& [segments, strategy] : merge_groups) { if (strategy.UseCosts()) { TRYV(CheckForDisjointCodepoints(subset_definitions, segments)); @@ -442,7 +467,8 @@ StatusOr ClosureGlyphSegmenter::CodepointToGlyphSegments( // if requested any remaining fallback glyphs are also moved into the init // font. 
GlyphSet fallback_glyphs = context.glyph_groupings.FallbackGlyphs(); - if (place_fallback_in_init && !fallback_glyphs.empty()) { + if (unmapped_glyph_handling == MOVE_TO_INIT_FONT && + !fallback_glyphs.empty()) { VLOG(0) << "Moving " << fallback_glyphs.size() << " fallback glyphs into the initial font." << std::endl; SubsetDefinition new_def = context.SegmentationInfo().InitFontSegment(); @@ -452,7 +478,7 @@ StatusOr ClosureGlyphSegmenter::CodepointToGlyphSegments( if (merge_groups.empty()) { // No merging will be needed so we're done. - return context.ToGlyphSegmentation(); + return ToFinalSegmentation(context, unmapped_glyph_handling); } // ### Iteratively merge segments and incrementally reprocess affected data. @@ -499,7 +525,7 @@ StatusOr ClosureGlyphSegmenter::CodepointToGlyphSegments( merger.LogMergedSizeHistogram(); } - return context.ToGlyphSegmentation(); + return ToFinalSegmentation(context, unmapped_glyph_handling); } const auto& [merged_segment_index, modified_gids] = *merged; @@ -609,10 +635,8 @@ StatusOr ClosureGlyphSegmenter::TotalCost( } Status ClosureGlyphSegmenter::FallbackCost( - hb_face_t* original_face, const GlyphSegmentation& segmentation, - uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size - ) const { - + hb_face_t* original_face, const GlyphSegmentation& segmentation, + uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size) const { GlyphSet all_glyphs = segmentation.InitialFontGlyphClosure(); for (const auto& [_, gids] : segmentation.GidSegments()) { all_glyphs.union_set(gids); diff --git a/ift/encoder/closure_glyph_segmenter.h b/ift/encoder/closure_glyph_segmenter.h index 64758ba..4646b7f 100644 --- a/ift/encoder/closure_glyph_segmenter.h +++ b/ift/encoder/closure_glyph_segmenter.h @@ -10,6 +10,7 @@ #include "ift/encoder/segmentation_context.h" #include "ift/encoder/subset_definition.h" #include "ift/freq/probability_calculator.h" +#include "util/common.pb.h" namespace ift::encoder { @@ -55,7 +56,7 @@ class 
ClosureGlyphSegmenter { hb_face_t* face, SubsetDefinition initial_segment, const std::vector& subset_definitions, absl::btree_map merge_groups, - bool place_fallback_in_init) const; + UnmappedGlyphHandling unmapped_glyph_handling) const; /* * Generates a segmentation context for the provided segmentation input. @@ -76,12 +77,13 @@ class ClosureGlyphSegmenter { const freq::ProbabilityCalculator& probability_calculator) const; /* - * Computes the total cost of the fallback patch (expected number of bytes transferred) + * Computes the total cost of the fallback patch (expected number of bytes + * transferred) */ - absl::Status FallbackCost( - hb_face_t* original_face, const GlyphSegmentation& segmentation, - uint32_t& fallback_glyphs_size, uint32_t& all_glyphs_size - ) const; + absl::Status FallbackCost(hb_face_t* original_face, + const GlyphSegmentation& segmentation, + uint32_t& fallback_glyphs_size, + uint32_t& all_glyphs_size) const; private: uint32_t brotli_quality_; diff --git a/ift/encoder/closure_glyph_segmenter_test.cc b/ift/encoder/closure_glyph_segmenter_test.cc index 508427d..43025c5 100644 --- a/ift/encoder/closure_glyph_segmenter_test.cc +++ b/ift/encoder/closure_glyph_segmenter_test.cc @@ -399,11 +399,69 @@ if ((s0 OR s1 OR s2 OR s3)) then p5 )"); } +TEST_F(ClosureGlyphSegmenterTest, UnmappedGlyphs_FindConditions) { + auto segmentation = segmenter.CodepointToGlyphSegments( + noto_nastaliq_urdu.get(), {}, + {{0x20}, {0x62a}, {0x62b}, {0x62c}, {0x62d}}, {}, FIND_CONDITIONS); + ASSERT_TRUE(segmentation.ok()) << segmentation.status(); + + ASSERT_TRUE(segmentation->UnmappedGlyphs().empty()) + << segmentation->UnmappedGlyphs().ToString(); + + ASSERT_EQ(segmentation->ToString(), + R"(initial font: { gid0 } +p0: { gid1 } +p1: { gid3, gid9, gid155 } +p2: { gid4, gid10, gid156 } +p3: { gid5, gid6, gid11, gid157 } +p4: { gid158 } +p5: { gid12, gid13, gid24, gid30, gid38, gid39, gid57, gid59, gid62, gid68, gid139, gid140, gid153, gid172 } +p6: { gid47, gid64, 
gid73, gid74, gid75, gid76, gid77, gid83, gid111, gid149, gid174, gid190, gid191 } +p7: { gid14, gid33, gid60, gid91, gid112, gid145, gid152 } +if (s0) then p0 +if (s1) then p1 +if (s2) then p2 +if (s3) then p3 +if (s4) then p4 +if ((s1 OR s2)) then p5 +if ((s3 OR s4)) then p7 +if ((s1 OR s2 OR s3 OR s4)) then p6 +)"); +} + +TEST_F(ClosureGlyphSegmenterTest, UnmappedGlyphs_FindConditions_IsFallback) { + // Here the found conditions are equal to the fallback segment, this ensures + // everything works properly in this case. + auto segmentation = segmenter.CodepointToGlyphSegments( + noto_nastaliq_urdu.get(), {}, {{0x62a}, {0x62b}, {0x62c}, {0x62d}}, {}, + FIND_CONDITIONS); + ASSERT_TRUE(segmentation.ok()) << segmentation.status(); + ASSERT_TRUE(segmentation->UnmappedGlyphs().empty()) + << segmentation->UnmappedGlyphs().ToString(); + ASSERT_EQ(segmentation->ToString(), + R"(initial font: { gid0 } +p0: { gid3, gid9, gid155 } +p1: { gid4, gid10, gid156 } +p2: { gid5, gid6, gid11, gid157 } +p3: { gid158 } +p4: { gid12, gid13, gid24, gid30, gid38, gid39, gid57, gid59, gid62, gid68, gid139, gid140, gid153, gid172 } +p5: { gid47, gid64, gid73, gid74, gid75, gid76, gid77, gid83, gid111, gid149, gid174, gid190, gid191 } +p6: { gid14, gid33, gid60, gid91, gid112, gid145, gid152 } +if (s0) then p0 +if (s1) then p1 +if (s2) then p2 +if (s3) then p3 +if ((s0 OR s1)) then p4 +if ((s2 OR s3)) then p6 +if ((s0 OR s1 OR s2 OR s3)) then p5 +)"); +} + TEST_F(ClosureGlyphSegmenterTest, UnmappedGlyphs_FallbackSegmentMovedToInitFont) { auto segmentation = segmenter.CodepointToGlyphSegments( noto_nastaliq_urdu.get(), {}, {{0x62a}, {0x62b}, {0x62c}, {0x62d}}, - btree_map{}, true); + btree_map{}, MOVE_TO_INIT_FONT); ASSERT_TRUE(segmentation.ok()) << segmentation.status(); ASSERT_EQ(segmentation->UnmappedGlyphs().size(), 0); @@ -1088,7 +1146,7 @@ TEST_F(ClosureGlyphSegmenterTest, MultipleMergeGroups) { {'m'}, {'n'}, {'o'}}, - merge_groups, false); + merge_groups, PATCH); 
ASSERT_TRUE(segmentation.ok()) << segmentation.status(); std::vector expected_segments = { @@ -1184,7 +1242,7 @@ TEST_F(ClosureGlyphSegmenterTest, MultipleMergeGroups_InitFontMove) { {'m'}, {'n'}, {'o'}}, - merge_groups, false); + merge_groups, PATCH); ASSERT_TRUE(segmentation.ok()) << segmentation.status(); // Only segments from the groups that set a threshold are eligible to be moved @@ -1261,7 +1319,7 @@ TEST_F(ClosureGlyphSegmenterTest, MultipleMergeGroups_CompositesRespectGroups) { {'i'}, {'j'}, }, - merge_groups, false); + merge_groups, PATCH); ASSERT_TRUE(segmentation.ok()) << segmentation.status(); // f + i would normally be a good merge, but here it's skipped since it @@ -1302,7 +1360,7 @@ TEST_F(ClosureGlyphSegmenterTest, MultipleMergeGroups_Heuristic) { {'i'}, {'j'}, }, - merge_groups, false); + merge_groups, PATCH); ASSERT_TRUE(segmentation.ok()) << segmentation.status(); // f + i would normally be a good merge, but here it's skipped since it @@ -1413,7 +1471,7 @@ if (s2) then p2 {{0, 1}, *MergeStrategy::CostBased(std::move(frequencies), 75, 3)}, {{2}, MergeStrategy::None()}, }, - false); + PATCH); ASSERT_TRUE(segmentation.ok()) << segmentation.status(); ASSERT_EQ(segmentation->ToString(), @@ -1455,7 +1513,7 @@ TEST_F(ClosureGlyphSegmenterTest, MultipleMergeGroups_PreGrouping) { {'h'}, {'i'}, }, - merge_groups, false); + merge_groups, PATCH); ASSERT_TRUE(segmentation.ok()) << segmentation.status(); // d, a are above the pregrouping threshold so aren't grouped. 
diff --git a/ift/encoder/complex_condition_finder.cc b/ift/encoder/complex_condition_finder.cc new file mode 100644 index 0000000..3971377 --- /dev/null +++ b/ift/encoder/complex_condition_finder.cc @@ -0,0 +1,310 @@ +#include "ift/encoder/complex_condition_finder.h" + +#include + +#include "absl/log/log.h" +#include "absl/status/status.h" +#include "common/int_set.h" +#include "ift/encoder/glyph_closure_cache.h" +#include "ift/encoder/requested_segmentation_information.h" +#include "ift/encoder/types.h" + +using absl::btree_map; +using absl::flat_hash_map; +using absl::Status; +using absl::StatusOr; +using common::GlyphSet; +using common::SegmentSet; + +// For more information on this process see the explanation in: +// ../../docs/experimental/closure_glyph_segmentation_complex_conditions.md + +namespace ift::encoder { + +// One unit of work for the analysis. One from to_be_tested will be checked. +struct Task { + // These segments are already fully analyzed and should be excluded from this + // analysis. + SegmentSet excluded; + + // These segments have been determined to be part of a sub condition. + SegmentSet sub_condition; + + // These segments have not yet been tested. + SegmentSet to_be_tested; + + // The set of glyphs in scope for analysis. + GlyphSet glyphs; +}; + +struct Context { + const SegmentSet all_segments; + const RequestedSegmentationInformation* segmentation_info; + GlyphClosureCache* glyph_closure_cache; + std::vector queue; + + public: + Status ScheduleInitialTasks( + GlyphSet glyphs, + flat_hash_map existing_conditions) { + if (glyphs.intersects(segmentation_info->InitFontGlyphs())) { + return absl::InvalidArgumentError( + "Can't analyze glyphs that are in the init font."); + } + + // Each existing condition will map to one initial task that excludes the + // existing condition from the analysis. 
+ for (const auto& [segments, glyph_sub_group] : existing_conditions) { + if (!TRY(InClosure(segments, glyph_sub_group))) { + return absl::InvalidArgumentError( + "The glyphs of existing conditions must be in the closure of " + "condition segments."); + } + TRYV(ScheduleExistingConditionTask(segments, glyph_sub_group, glyphs)); + } + + if (glyphs.empty()) { + return absl::OkStatus(); + } + + if (!TRY(InClosure(all_segments, glyphs))) { + return absl::InvalidArgumentError( + "glyphs to analyze must be in the closure of all segments."); + } + + // If any glyphs remain that do not have existing conditions these are + // covered by a task with no excluded segments. + queue.push_back(Task{ + .excluded = {}, + .sub_condition = {}, + .to_be_tested = all_segments, + .glyphs = glyphs, + }); + + return absl::OkStatus(); + } + + Status ProcessQueue(btree_map& glyph_to_conditions) { + // TODO(garretrieger): to reduce runtime of this analysis the processing of + // the queue could be parallelized by using a threadpool to run tasks. The + // tasks are fully independent so this should be straightforward. + while (!queue.empty()) { + Task next = std::move(queue.back()); + queue.pop_back(); + TRYV(RunAnalysisTask(next, glyph_to_conditions)); + } + return absl::OkStatus(); + } + + private: + // Returns true if all glyphs are in the closure of segments. 
+ StatusOr<bool> InClosure(const SegmentSet& segments, const GlyphSet& glyphs) { + GlyphSet closure = TRY(SegmentClosure(segments)); + return glyphs.is_subset_of(closure); + } + + StatusOr<std::pair<GlyphSet, SegmentSet>> HasAdditionalConditions( + const SegmentSet& segments, const GlyphSet& glyphs) { + SegmentSet except = all_segments; + except.subtract(segments); + GlyphSet closure_glyphs = TRY(SegmentClosure(except)); + closure_glyphs.intersect(glyphs); + return std::make_pair(closure_glyphs, std::move(except)); + } + + Status ScheduleExistingConditionTask(const SegmentSet& condition, + const GlyphSet& condition_glyphs, + GlyphSet& all_glyphs) { + // We need to check if there are any additional conditions, + // if there aren't there is no need to schedule the analysis. + auto [glyphs_with_additional_conditions, except] = + TRY(HasAdditionalConditions(condition, condition_glyphs)); + + if (glyphs_with_additional_conditions.empty()) { + return absl::OkStatus(); + } + + queue.push_back(Task{ + .excluded = condition, + .sub_condition = {}, + .to_be_tested = except, + .glyphs = glyphs_with_additional_conditions, + }); + all_glyphs.subtract(condition_glyphs); + + return absl::OkStatus(); + } + + // Each analysis step checks one segment to see for which glyphs that segment + // is relevant. The supplied task data structure gives the specific state + // around which the segment is tested. + // + // To test a segment a closure is run without the segment being tested: + // - For in-scope glyphs which appear in the closure the test segment is not + // relevant for these glyphs. + // - For in-scope glyphs which do not appear in the closure the test segment is + // relevant for these glyphs. + // + // Based on the analysis results up to two more analysis steps are spawned (one + // for glyphs where segment is relevant, the other where it is not relevant) + // to test the next segment. + // + // Once all segments are tested the resulting sub condition segments + // are recorded in glyph_to_conditions. 
Lastly, the non-relevant segments are checked to see + // if additional conditions are present, if they are another analysis task is + // queued to discover the additional conditions. + Status RunAnalysisTask( + Task task, btree_map& glyph_to_conditions) { + if (task.glyphs.empty()) { + // Nothing left to check. + return absl::OkStatus(); + } + + if (task.to_be_tested.empty()) { + return RecordSubCondition(task, glyph_to_conditions); + } + + segment_index_t test_segment = *task.to_be_tested.min(); + task.to_be_tested.erase(test_segment); + + SegmentSet closure_segments = task.sub_condition; + closure_segments.union_set(task.to_be_tested); + GlyphSet closure_glyphs = TRY(SegmentClosure(closure_segments)); + + GlyphSet needs_test_segment = task.glyphs; + needs_test_segment.subtract(closure_glyphs); + GlyphSet doesnt_need_test_segment = task.glyphs; + doesnt_need_test_segment.intersect(closure_glyphs); + + queue.push_back(Task{ + .excluded = task.excluded, + .sub_condition = task.sub_condition, + .to_be_tested = task.to_be_tested, + .glyphs = doesnt_need_test_segment, + }); + + task.sub_condition.insert(test_segment); + queue.push_back(Task{ + .excluded = task.excluded, + .sub_condition = task.sub_condition, + .to_be_tested = task.to_be_tested, + .glyphs = needs_test_segment, + }); + + return absl::OkStatus(); + } + + // A sub condition has been found, record it and kick off any + // further analysis needed for additional conditions. 
+ Status RecordSubCondition( + Task task, btree_map& glyph_to_conditions) { + for (glyph_id_t gid : task.glyphs) { + glyph_to_conditions[gid].union_set(task.sub_condition); + } + + // We have identified a sub condition for glyphs, however as usual + // there may be remaining additional conditions which we need to + // check for + task.excluded.union_set(task.sub_condition); + auto [additional_condition_glyphs, remaining] = + TRY(HasAdditionalConditions(task.excluded, task.glyphs)); + + // Anything left in glyphs has additional conditions, recurse again to + // analyze them further + queue.push_back(Task{ + .excluded = task.excluded, + .sub_condition = {}, + .to_be_tested = remaining, + .glyphs = additional_condition_glyphs, + }); + return absl::OkStatus(); + } + + SubsetDefinition CombinedDefinition(const SegmentSet& segments) { + SubsetDefinition def; + for (segment_index_t s : segments) { + def.Union(segmentation_info->Segments().at(s).Definition()); + } + return def; + } + + StatusOr SegmentClosure(const SegmentSet& segments) { + SubsetDefinition closure_def = CombinedDefinition(segments); + // Init font subset definition must be part of the closure input + // since it contributes to reachability of things. 
+ closure_def.Union(segmentation_info->InitFontSegment()); + return glyph_closure_cache->GlyphClosure(closure_def); + } +}; + +static flat_hash_map<SegmentSet, GlyphSet> ExistingConditions( + const GlyphConditionSet& glyph_condition_set, const GlyphSet& glyphs, + btree_map<glyph_id_t, SegmentSet>& glyph_to_conditions) { + flat_hash_map<SegmentSet, GlyphSet> existing_conditions; + for (glyph_id_t gid : glyphs) { + SegmentSet or_segments = glyph_condition_set.ConditionsFor(gid).or_segments; + if (or_segments.empty()) { + continue; + } + existing_conditions[or_segments].insert(gid); + glyph_to_conditions[gid].union_set(or_segments); + } + return existing_conditions; +} + +static SegmentSet NonEmptySegments( + const RequestedSegmentationInformation& segmentation_info) { + SegmentSet segments; + for (segment_index_t s = 0; s < segmentation_info.Segments().size(); s++) { + if (segmentation_info.Segments().at(s).Definition().Empty()) { + continue; + } + segments.insert(s); + } + return segments; +} + +StatusOr<btree_map<SegmentSet, GlyphSet>> FindSupersetDisjunctiveConditionsFor( + const RequestedSegmentationInformation& segmentation_info, + const GlyphConditionSet& glyph_condition_set, + GlyphClosureCache& closure_cache, GlyphSet glyphs) { + VLOG(0) << "Analyzing " << glyphs.size() + << " unmapped glyphs with the complex condition detector."; + + // TODO(garretrieger): we should see which unicodes (and thus which segments) + // may interact with the GSUB table. Any segments which don't interact with + // GSUB will already have relevant conditions discovered via the standard + // closure analysis. Only segments which interact with GSUB may be part of + // complex conditions (since complex conditions require at least one 'AND' + // which only GSUB can introduce). As a result we can exclude any segments + // with no GSUB interaction from this analysis which should significantly + // speed things up. 
+ Context context{ + .all_segments = NonEmptySegments(segmentation_info), + .segmentation_info = &segmentation_info, + .glyph_closure_cache = &closure_cache, + .queue = {}, + }; + + // We may already have some partial conditions generated for the fallback + // glyphs, preload these into the output and schedule the initial tasks + // excluding those segments. + btree_map glyph_to_conditions; + flat_hash_map existing_conditions = + ExistingConditions(glyph_condition_set, glyphs, glyph_to_conditions); + TRYV(context.ScheduleInitialTasks(std::move(glyphs), existing_conditions)); + + TRYV(context.ProcessQueue(glyph_to_conditions)); + + btree_map grouped_out; + for (const auto& [gid, segments] : glyph_to_conditions) { + grouped_out[segments].insert(gid); + } + + VLOG(0) << "Found " << grouped_out.size() + << " new conditions for the unmapped glyphs."; + + return grouped_out; +} + +} // namespace ift::encoder \ No newline at end of file diff --git a/ift/encoder/complex_condition_finder.h b/ift/encoder/complex_condition_finder.h new file mode 100644 index 0000000..635500c --- /dev/null +++ b/ift/encoder/complex_condition_finder.h @@ -0,0 +1,33 @@ +#ifndef IFT_ENCODER_COMPLEX_CONDITION_FINDER_H_ +#define IFT_ENCODER_COMPLEX_CONDITION_FINDER_H_ + +#include "absl/status/statusor.h" +#include "common/int_set.h" +#include "ift/encoder/glyph_closure_cache.h" +#include "ift/encoder/glyph_condition_set.h" +#include "ift/encoder/requested_segmentation_information.h" + +namespace ift::encoder { + +// Finds superset purely disjunctive conditions that activate each +// provided glyph. Returns a map from each condition to the activated +// glyphs. +// +// Takes a glyph condition set which will be used as a starting point. +// +// A superset purely disjunctive condition will activate at least +// whenever the true condition would. It will only ever include segments +// that appear in the true condition. There are typically multiple +// possible superset conditions. 
This will find one of them. +// +// For example if a glyph has the true condition (a and b) or (b and c) +// this could find the condition (a or c). +absl::StatusOr> +FindSupersetDisjunctiveConditionsFor( + const RequestedSegmentationInformation& segmentation_info, + const GlyphConditionSet& glyph_condition_set, + GlyphClosureCache& closure_cache, common::GlyphSet glyphs); + +} // namespace ift::encoder + +#endif // IFT_ENCODER_COMPLEX_CONDITION_FINDER_H_ \ No newline at end of file diff --git a/ift/encoder/complex_condition_finder_test.cc b/ift/encoder/complex_condition_finder_test.cc new file mode 100644 index 0000000..929a387 --- /dev/null +++ b/ift/encoder/complex_condition_finder_test.cc @@ -0,0 +1,249 @@ +#include "ift/encoder/complex_condition_finder.h" + +#include "absl/container/btree_map.h" +#include "common/font_data.h" +#include "common/int_set.h" +#include "gtest/gtest.h" +#include "ift/encoder/closure_glyph_segmenter.h" +#include "ift/encoder/requested_segmentation_information.h" +#include "ift/encoder/segmentation_context.h" +#include "ift/encoder/types.h" +#include "ift/freq/probability_bound.h" + +using absl::btree_map; +using absl::StatusOr; +using common::FontData; +using common::GlyphSet; +using common::hb_face_unique_ptr; +using common::make_hb_face; +using common::SegmentSet; +using ift::freq::ProbabilityBound; + +namespace ift::encoder { + +class ComplexConditionFinderTest : public ::testing::Test { + protected: + ComplexConditionFinderTest() + : roboto(make_hb_face(nullptr)), segmenter(1, 1) { + roboto = from_file("common/testdata/Roboto-Regular.ttf"); + } + + hb_face_unique_ptr from_file(const char* filename) { + hb_blob_t* blob = hb_blob_create_from_file_or_fail(filename); + if (!blob) { + assert(false); + } + FontData result(blob); + hb_blob_destroy(blob); + return result.face(); + } + + SubsetDefinition CombinedDefinition(const SegmentationContext& context, + const SegmentSet& segments) { + SubsetDefinition def; + for (segment_index_t s 
: segments) { + def.Union(context.SegmentationInfo().Segments().at(s).Definition()); + } + return def; + } + + StatusOr SegmentClosure(SegmentationContext& context, + const SegmentSet& segments) { + SubsetDefinition closure_def = CombinedDefinition(context, segments); + return context.glyph_closure_cache.GlyphClosure(closure_def); + } + + SegmentationContext TestContext(bool basic_closure_analysis) { + auto context = *segmenter.InitializeSegmentationContext( + roboto.get(), {'f'}, + { + /* 0 */ {{0x54}, ProbabilityBound::Zero()}, + /* 1 */ {{0x6C}, ProbabilityBound::Zero()}, + /* 2 */ {{0x6E}, ProbabilityBound::Zero()}, + /* 3 */ {{0x13C}, ProbabilityBound::Zero()}, + /* 4 */ {{0x146}, ProbabilityBound::Zero()}, + /* 5 */ {{0x21A}, ProbabilityBound::Zero()}, + /* 6 */ {{0xF6C3}, ProbabilityBound::Zero()}, + /* 7 */ {{0x69}, ProbabilityBound::Zero()}, + }); + + if (!basic_closure_analysis) { + // initialization populates the basic conditions, clear those + // out so we can control them. + context.glyph_condition_set.InvalidateGlyphInformation( + {748, 756, 782}, {0, 1, 2, 3, 4, 5, 6}); + } + + return context; + } + + hb_face_unique_ptr roboto; + ClosureGlyphSegmenter segmenter; + + // Expected complex conditions: + // + // 0xF6C3, 0x54, 0x21A => g782 + // 0xF6C3, 0x6C, 0x13C => g748 + // 0xF6C3, 0x6E, 0x146 => g756 + btree_map expected = { + {{6, 1, 3}, {748}}, + {{6, 2, 4}, {756}}, + {{6, 0, 5}, {782}}, + }; +}; + +TEST_F(ComplexConditionFinderTest, FindConditions) { + SegmentationContext context = TestContext(false); + + auto r = FindSupersetDisjunctiveConditionsFor(context.SegmentationInfo(), + context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + 756, + 782, + }); + ASSERT_TRUE(r.ok()) << r.status(); + ASSERT_EQ(expected, *r); + + // Verify that the closure requirement is met. If all segments from + // the minimal condition are excluded then the mapped gid should not + // appear in the closure.
+ const SegmentSet all = {0, 1, 2, 3, 4, 5, 6}; + for (const auto& [segments, gids] : *r) { + SegmentSet except = all; + except.subtract(segments); + + GlyphSet closure = *SegmentClosure(context, except); + ASSERT_FALSE(closure.intersects(gids)); + + closure = *SegmentClosure(context, segments); + ASSERT_TRUE(gids.is_subset_of(closure)); + } +} + +TEST_F(ComplexConditionFinderTest, FindConditions_Partial) { + SegmentationContext context = TestContext(false); + + auto r = FindSupersetDisjunctiveConditionsFor(context.SegmentationInfo(), + context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + }); + ASSERT_TRUE(r.ok()) << r.status(); + expected.erase(SegmentSet{6, 0, 5}); + expected.erase(SegmentSet{6, 2, 4}); + ASSERT_EQ(expected, *r); +} + +TEST_F(ComplexConditionFinderTest, FindConditions_IncompleteExistingCondition) { + SegmentationContext context = TestContext(false); + + context.glyph_condition_set.AddOrCondition(748, 6); + auto r = FindSupersetDisjunctiveConditionsFor(context.SegmentationInfo(), + context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + }); + ASSERT_TRUE(absl::IsInvalidArgument(r.status())) << r.status(); +} + +TEST_F(ComplexConditionFinderTest, FindConditions_GlyphsNotInClosure) { + SegmentationContext context = TestContext(false); + + auto r = FindSupersetDisjunctiveConditionsFor( + context.SegmentationInfo(), context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + 40 // this is not in the full closure. 
+ }); + ASSERT_TRUE(absl::IsInvalidArgument(r.status())) << r.status(); +} + +TEST_F(ComplexConditionFinderTest, + FindConditions_WithExistingConditions_FromClosureAnalysis) { + SegmentationContext context = TestContext(true); + + auto r = FindSupersetDisjunctiveConditionsFor(context.SegmentationInfo(), + context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + 756, + 782, + }); + ASSERT_TRUE(r.ok()) << r.status(); + ASSERT_EQ(expected, *r); +} + +TEST_F(ComplexConditionFinderTest, FindConditions_WithExistingConditions) { + SegmentationContext context = TestContext(false); + + context.glyph_condition_set.AddOrCondition(748, 1); + context.glyph_condition_set.AddOrCondition(748, 6); + + auto r = FindSupersetDisjunctiveConditionsFor(context.SegmentationInfo(), + context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + 756, + 782, + }); + ASSERT_TRUE(r.ok()) << r.status(); + ASSERT_EQ(expected, *r); +} + +TEST_F(ComplexConditionFinderTest, + FindConditions_WithExistingConditions_NoAdditionalConditions) { + SegmentationContext context = TestContext(false); + + context.glyph_condition_set.AddOrCondition(748, 1); + context.glyph_condition_set.AddOrCondition(748, 3); + context.glyph_condition_set.AddOrCondition(748, 6); + + auto r = FindSupersetDisjunctiveConditionsFor(context.SegmentationInfo(), + context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + 756, + 782, + }); + ASSERT_TRUE(r.ok()) << r.status(); + ASSERT_EQ(expected, *r); +} + +TEST_F(ComplexConditionFinderTest, FindConditions_RejectsInitFontGlyphs) { + SegmentationContext context = TestContext(false); + + auto r = FindSupersetDisjunctiveConditionsFor( + context.SegmentationInfo(), context.glyph_condition_set, + context.glyph_closure_cache, + { + 748, + 74, // f - in the init closure + }); + ASSERT_TRUE(absl::IsInvalidArgument(r.status())) << r.status(); +} + +TEST_F(ComplexConditionFinderTest, FindConditions_ClosureRespectsInitFont) { + SegmentationContext 
context = TestContext(false); + + auto r = FindSupersetDisjunctiveConditionsFor( + context.SegmentationInfo(), context.glyph_condition_set, + context.glyph_closure_cache, + { + 446, // fi ligature - combines i with f from the init font + }); + ASSERT_TRUE(r.ok()) << r.status(); + expected = { + {{7}, {446}}, + }; + ASSERT_EQ(expected, *r); +} + +} // namespace ift::encoder \ No newline at end of file diff --git a/ift/encoder/glyph_groupings.cc b/ift/encoder/glyph_groupings.cc index e3eca25..49a4a4f 100644 --- a/ift/encoder/glyph_groupings.cc +++ b/ift/encoder/glyph_groupings.cc @@ -6,6 +6,7 @@ #include "common/int_set.h" #include "common/try.h" #include "ift/encoder/activation_condition.h" +#include "ift/encoder/complex_condition_finder.h" #include "ift/encoder/glyph_closure_cache.h" #include "ift/encoder/glyph_condition_set.h" #include "ift/encoder/glyph_partition.h" @@ -231,6 +232,35 @@ Status GlyphGroupings::GroupGlyphs( return absl::OkStatus(); } +Status GlyphGroupings::FindFallbackGlyphConditions( + const RequestedSegmentationInformation& segmentation_info, + const GlyphConditionSet& glyph_condition_set, + GlyphClosureCache& closure_cache) { + GlyphSet fallback_glyphs = FallbackGlyphs(); + btree_map complex_conditions = + TRY(FindSupersetDisjunctiveConditionsFor(segmentation_info, + glyph_condition_set, + closure_cache, fallback_glyphs)); + + or_glyph_groups_.erase(fallback_segments_); + RemoveConditionAndGlyphs( + ActivationCondition::or_segments(fallback_segments_, 0)); + unmapped_glyphs_.clear(); + fallback_segments_.clear(); + + for (const auto& [s, g] : complex_conditions) { + or_glyph_groups_[s].union_set(g); + ActivationCondition c = ActivationCondition::or_segments(s, 0); + // There may be existing glyphs at this specific condition, so union into + // it. 
+ UnionConditionAndGlyphs(c, g); + } + VLOG(0) + << "Unmapped glyphs patch removed and replaced with found conditions."; + + return absl::OkStatus(); +} + Status GlyphGroupings::RecomputeCombinedConditions( const GlyphConditionSet& glyph_condition_set) { // To minimize the amount of work we need to do we first detect which segments diff --git a/ift/encoder/glyph_groupings.h b/ift/encoder/glyph_groupings.h index 7a52fe6..360f67b 100644 --- a/ift/encoder/glyph_groupings.h +++ b/ift/encoder/glyph_groupings.h @@ -173,6 +173,14 @@ class GlyphGroupings { const GlyphConditionSet& glyph_condition_set, GlyphClosureCache& closure_cache, const common::GlyphSet& glyphs); + // Perform a more detailed analysis to try and find more granular conditions + // for fallback glyphs. Will replace the fallback glyphs with any found + // conditions. + absl::Status FindFallbackGlyphConditions( + const RequestedSegmentationInformation& segmentation_info, + const GlyphConditionSet& glyph_condition_set, + GlyphClosureCache& closure_cache); + // Converts this grouping into a finalized GlyphSegmentation. 
absl::StatusOr ToGlyphSegmentation( const RequestedSegmentationInformation& segmentation_info) const; @@ -211,6 +219,14 @@ class GlyphGroupings { } } + void UnionConditionAndGlyphs(ActivationCondition condition, + common::GlyphSet glyphs) { + conditions_and_glyphs_[condition].union_set(glyphs); + for (segment_index_t s : condition.TriggeringSegments()) { + triggering_segment_to_conditions_[s].insert(condition); + } + } + void RemoveConditionAndGlyphs(ActivationCondition condition) { conditions_and_glyphs_.erase(condition); for (segment_index_t s : condition.TriggeringSegments()) { diff --git a/ift/encoder/segmentation_context.cc b/ift/encoder/segmentation_context.cc index e1619ff..4f3fb26 100644 --- a/ift/encoder/segmentation_context.cc +++ b/ift/encoder/segmentation_context.cc @@ -19,7 +19,7 @@ namespace ift::encoder { Status SegmentationContext::ValidateSegmentation( const GlyphSegmentation& segmentation) const { - IntSet visited; + GlyphSet visited; const auto& initial_closure = segmentation.InitialFontGlyphClosure(); for (const auto& [id, gids] : segmentation.GidSegments()) { for (glyph_id_t gid : gids) { @@ -35,12 +35,15 @@ Status SegmentationContext::ValidateSegmentation( } } - IntSet full_minus_initial = segmentation_info_.FullClosure(); + GlyphSet full_minus_initial = segmentation_info_.FullClosure(); full_minus_initial.subtract(initial_closure); if (full_minus_initial != visited) { + GlyphSet missing = full_minus_initial; + missing.subtract(visited); return absl::FailedPreconditionError( - "Not all glyphs in the full closure have been placed."); + "Not all glyphs in the full closure have been placed. 
Missing: " + + missing.ToString()); } return absl::OkStatus(); diff --git a/util/BUILD b/util/BUILD index c3b3e60..a4f246f 100644 --- a/util/BUILD +++ b/util/BUILD @@ -35,6 +35,12 @@ proto_library( ], ) +cc_proto_library( + name = "common_cc_proto", + deps = [":common_proto"], + visibility = ["//visibility:public"], +) + proto_library( name = "segmenter_config_proto", srcs = ["segmenter_config.proto"], diff --git a/util/closure_glyph_keyed_segmenter_util.cc b/util/closure_glyph_keyed_segmenter_util.cc index e2e611a..6136572 100644 --- a/util/closure_glyph_keyed_segmenter_util.cc +++ b/util/closure_glyph_keyed_segmenter_util.cc @@ -53,8 +53,9 @@ ABSL_FLAG(bool, include_initial_codepoints_in_config, true, ABSL_FLAG(bool, output_segmentation_analysis, true, "If set an analysis of the segmentation will be output to stderr."); -ABSL_FLAG(bool, output_fallback_glyph_count, false, - "If set the number of fallback glyphs in the segmentation will be output."); +ABSL_FLAG( + bool, output_fallback_glyph_count, false, + "If set the number of fallback glyphs in the segmentation will be output."); ABSL_FLAG( int, verbosity, 0, @@ -67,9 +68,9 @@ using absl::Status; using absl::StatusOr; using absl::StrCat; using common::CodepointSet; -using common::GlyphSet; using common::FontData; using common::FontHelper; +using common::GlyphSet; using common::hb_face_unique_ptr; using common::SegmentSet; using google::protobuf::TextFormat; @@ -165,14 +166,14 @@ static void AddTableKeyedSegments( } } -static Status OutputFallbackGlyphCount( - hb_face_t* original_face, - const ClosureGlyphSegmenter& segmenter, - const GlyphSegmentation& segmentation) { +static Status OutputFallbackGlyphCount(hb_face_t* original_face, + const ClosureGlyphSegmenter& segmenter, + const GlyphSegmentation& segmentation) { uint32_t num_fallback_glyphs = segmentation.UnmappedGlyphs().size(); uint32_t fallback_glyphs_size = 0; uint32_t all_glyphs_size = 0; - TRYV(segmenter.FallbackCost(original_face, segmentation, 
fallback_glyphs_size, all_glyphs_size)); + TRYV(segmenter.FallbackCost(original_face, segmentation, fallback_glyphs_size, + all_glyphs_size)); GlyphSet all_glyphs; for (const auto& [_, gids] : segmentation.GidSegments()) { @@ -180,9 +181,9 @@ static Status OutputFallbackGlyphCount( } uint32_t num_glyphs = all_glyphs.size() + num_fallback_glyphs; - std::cout << "num_fallback_glyphs, " << num_fallback_glyphs << ", " << num_glyphs - << ", " << fallback_glyphs_size << ", " << all_glyphs_size - << std::endl; + std::cout << "num_fallback_glyphs, " << num_fallback_glyphs << ", " + << num_glyphs << ", " << fallback_glyphs_size << ", " + << all_glyphs_size << std::endl; return absl::OkStatus(); } @@ -201,14 +202,15 @@ static Status Main(const std::vector args) { std::vector segments; btree_map merge_groups = - TRY(config_util.ConfigToMergeGroups(config, font_codepoints, font_features, segments)); + TRY(config_util.ConfigToMergeGroups(config, font_codepoints, + font_features, segments)); ClosureGlyphSegmenter segmenter( config.brotli_quality(), config.brotli_quality_for_initial_font_merging()); GlyphSegmentation segmentation = TRY(segmenter.CodepointToGlyphSegments( font.get(), init_segment, segments, merge_groups, - config.move_fallback_glyphs_into_initial_font())); + config.unmapped_glyph_handling())); if (absl::GetFlag(FLAGS_output_segmentation_plan)) { SegmentationPlan plan = segmentation.ToSegmentationPlanProto(); diff --git a/util/common.proto b/util/common.proto index 2b2437b..8d2b355 100644 --- a/util/common.proto +++ b/util/common.proto @@ -34,4 +34,17 @@ message DesignSpace { message AxisRangeProto { float start = 1; float end = 2; +} + +// This enum is used to specify how unmapped glyphs are handled in the produced segmentation. +enum UnmappedGlyphHandling { + // All unmapped glyphs will be placed into a patch which is always loaded. + PATCH = 0; + + // All unmapped glyphs will be moved into the initial font. 
+ MOVE_TO_INIT_FONT = 1; + + // Advanced condition analysis will be used (expensive) to discover the conditions + // for unmapped glyphs. + FIND_CONDITIONS = 2; } \ No newline at end of file diff --git a/util/segmenter_config.proto b/util/segmenter_config.proto index ace6ed2..a7f5855 100644 --- a/util/segmenter_config.proto +++ b/util/segmenter_config.proto @@ -40,9 +40,9 @@ message SegmenterConfig { // initial font, and hence be always available. SegmentProto initial_segment = 1; - // When set any glyphs that would be in the fallback patch (ie. glyphs that are always loaded) - // are instead moved into the initial font. - bool move_fallback_glyphs_into_initial_font = 2 [default = true]; + // Specifies what to do with unmapped glyphs. Unmapped glyphs are glyphs for which the analysis + // is unable to determine activation conditions. + UnmappedGlyphHandling unmapped_glyph_handling = 2 [default = MOVE_TO_INIT_FONT]; // If enabled then the generated segmentation plan will include a set of table keyed segments. // One table keyed segment will be generated per merge group, including the auto generated