planner: maintain a map of column indices by name in DataSource #64204

henrybw · 2025-10-31T02:00:16Z

What problem does this PR solve?

Issue Number: ref #63856

Problem Summary: Profiling the time spent in the optimizer repeatedly executing, for 2 minutes, a query on a table like this, but with many more indexes (500) and columns (6,500), reveals that 34% of the time now is spent in expression.IndexCol2Col(), due to this function being passed a slice of columns that the function has to search linearly with a for loop. Instead of scanning this slice, we should be using a map to reduce lookups of column by ID from O(N) to O(1).

What changed and how does it work?

Similar to #64053, this is a preliminary PR that establishes the foundation for replacing the O(N) loop with an O(1) map lookup. The goal is to eventually add a colIdxsByName map[string]int parameter to expression.IndexCol2Col() that can be supplied by the logical data source.

Add a new model.GetColIdxMapping() function that builds this map from a slice of model.ColumnInfo structs, along with a TestColIdxMapping() test case for this new function.
- There are two reasons why this function is in model: (1) tests in the util/ranger package depend on expression.IndexInfo2PrefixCols() and will need to build this map too, and (2) it is next to the definition of the model.ColumnInfo that it operates on.
Add a ColIdxsByName map to the logical data source in the planner.
Refactor all places in the planner that append columns to ds.Columns and ds.Schema() directly to instead call a new AppendColumn() function that also ensures the column is tracked by the ds.ColIdxsByName map.
- The only exception is PruneColumns(), which has been updated to also remove the pruned column from ds.ColIdxsByName and update the affected column indices in the map.
Add a new TestDataSourceColumnMaps test case that ensures that AppendColumn() maintains coherency between ds.Columns and ds.ColIdxsByName, and also ensures that AppendTableCol() maintains coherency between ds.TblCols and ds.TblColsByID.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

tiprow · 2025-10-31T02:00:35Z

Hi @henrybw. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2025-10-31T02:21:30Z

Codecov Report

❌ Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.5241%. Comparing base (bdd2b6f) to head (eb145bd).
⚠️ Report is 19 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #64204        +/-   ##
================================================
+ Coverage   72.7334%   73.5241%   +0.7907%     
================================================
  Files          1859       1860         +1     
  Lines        503870     505223      +1353     
================================================
+ Hits         366482     371461      +4979     
+ Misses       115127     111646      -3481     
+ Partials      22261      22116       -145

Flag	Coverage Δ
integration	`41.8379% <90.6250%> (?)`
unit	`72.6008% <96.8750%> (+0.3017%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`52.8700% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`46.4181% <ø> (+0.0328%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

hawkingrei · 2025-10-31T03:22:18Z

/retest

Copilot

Pull Request Overview

This PR introduces a ColIdxsByName map to track column name to index mappings across various planner data structures, maintaining consistency between column slices and their name-based lookups. The changes also add an AppendColumn helper method to ensure these mappings stay synchronized when columns are added.

Key changes:

Added ColIdxsByName map field to DataSource, LogicalIndexScan, and PhysicalTableScan structs
Introduced AppendColumn helper method to maintain synchronization between columns and name mappings
Added GetColIdxsByName utility function to build name-to-index maps
Implemented map updates during column pruning operations

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
pkg/meta/model/column.go	Adds `GetColIdxsByName` utility function to build column name-to-index maps
pkg/meta/model/column_test.go	Tests the new `GetColIdxsByName` function
pkg/planner/core/operator/logicalop/logical_datasource.go	Adds `ColIdxsByName` field, implements `AppendColumn` helper, and updates column pruning logic to maintain map consistency
pkg/planner/core/operator/logicalop/logical_index_scan.go	Adds `ColIdxsByName` field to LogicalIndexScan struct
pkg/planner/core/operator/physicalop/physical_table_scan.go	Adds `ColIdxsByName` field and ensures it's cloned/copied when creating PhysicalTableScan instances
pkg/planner/core/logical_plan_builder.go	Updates DataSource construction to use `AppendColumn` helper and initialize `ColIdxsByName` map
pkg/planner/core/planbuilder.go	Initializes `ColIdxsByName` when building PhysicalIndexLookUpReader
pkg/planner/core/find_best_task.go	Populates `ColIdxsByName` when converting to index scan
pkg/planner/core/exhaust_physical_plans.go	Passes through `ColIdxsByName` when constructing table scan tasks
pkg/planner/core/planbuilder_test.go	Adds test to verify `AppendColumn` maintains consistency between columns and name mappings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/planner/core/operator/logicalop/logical_datasource.go

hawkingrei · 2025-10-31T03:30:40Z

/retest

D3Hunter

meta part lgtm

i remember search from map is slower compared with iterating slice, when the number of items is small, as hashing itself have overhead

pkg/meta/model/column.go

D3Hunter · 2025-11-04T03:33:47Z

/approve

elsa0520

LGTM

ti-chi-bot · 2025-11-04T03:48:52Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, elsa0520, hawkingrei, terry1purcell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/meta/OWNERS~~ [D3Hunter]
~~pkg/planner/OWNERS~~ [elsa0520,hawkingrei,terry1purcell]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

winoros · 2025-11-04T03:53:07Z

pkg/planner/core/operator/logicalop/logical_datasource.go

 	Columns       []*model.ColumnInfo
 	DBName        ast.CIStr

+	ColIdxsByName map[string]int


You can use a map map[int]int to store the mapping: id -> offset.
And then use the tableInfo.Columns[IndexColumn.Offset].ID to get the needed information.

I think the map[string]int will be hard to extend for other usage.

This map is used to cooperate with expression.Schema and other related stuff, but it's defined outside of it.
It's easy to forget to update it when pruning columns and adding columns.

As an example, I don't see that we update it during column pruning.

You can use a map map[int]int to store the mapping: id -> offset. And then use the tableInfo.Columns[IndexColumn.Offset].ID to get the needed information.

Hmm, doing this would require us to also pass the table info as an argument to IndexInfo2PrefixCols()/IndexInfo2Cols()...

I'm open to the idea, though. It might mean that we wouldn't need to maintain this mapping after all, which would make things a lot simpler.

I'll try your approach and see how it goes.

As an example, I don't see that we update it during column pruning.

We should be also updating it during column pruning (see lines 201-204 added to logical_datasource.go), unless I missed a spot.

That said, I think you make a good point here:

It's easy to forget to update it when pruning columns and adding columns.

henrybw · 2025-11-04T03:57:17Z

/hold

Copilot

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-04T03:57:34Z

pkg/planner/core/operator/logicalop/logical_datasource.go

+			delete(ds.ColIdxsByName, ds.Columns[i].Name.L)
+			for j := i + 1; j < len(ds.Columns); j++ {
+				ds.ColIdxsByName[ds.Columns[j].Name.L] = j - 1
+			}


The nested loop updating ColIdxsByName for all subsequent columns on every deletion results in O(n²) complexity when pruning multiple columns. Consider rebuilding the map once after all deletions are complete, or use a more efficient data structure update strategy.

winoros · 2025-11-04T03:58:09Z

pkg/planner/core/operator/logicalop/logical_datasource.go

 	Columns       []*model.ColumnInfo
 	DBName        ast.CIStr

+	ColIdxsByName map[string]int


As an example, I don't see that we update it during column pruning.

ti-chi-bot · 2025-11-04T03:58:33Z

[LGTM Timeline notifier]

Timeline:

2025-11-04 00:52:54.788109857 +0000 UTC m=+145024.231139736: ☑️ agreed by terry1purcell.
2025-11-04 03:47:36.452027529 +0000 UTC m=+155505.895057398: ☑️ agreed by hawkingrei.
2025-11-04 03:58:31.949692206 +0000 UTC m=+156161.392722075: ✖️🔁 reset by winoros.

winoros · 2025-11-04T04:01:11Z

🤔But this function's usage is limited to one specific case.
It also sounds good to me if it's possible to build it when we use it.

AilinKid · 2025-11-04T05:39:09Z

pkg/planner/core/operator/physicalop/physical_table_scan.go

 	ts := PhysicalTableScan{
 		Table:           ds.TableInfo,
 		Columns:         ds.Columns,
+		ColIdxsByName:   ds.ColIdxsByName,


why this place didn't use the clone？

I only cloned in places where ds.Columns was also cloned, but I might have misunderstood when it's necessary to clone. I will revisit this.

AilinKid · 2025-11-04T05:42:44Z

pkg/planner/core/operator/logicalop/logical_index_scan.go


 	Index          *model.IndexInfo
 	Columns        []*model.ColumnInfo
+	ColIdxsByName  map[string]int


i saw the adding of ColIdxsByName into the PhysicalTableScan and LogicalIndexScan, but any usage of them?

This would be used when calling expression.IndexInfo2PrefixCols(), which would come in a follow-up PR.

qw4990 · 2025-11-05T08:29:15Z

/retest

tiprow · 2025-11-05T08:29:40Z

@qw4990: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2025-11-05T09:31:15Z

@henrybw: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
idc-jenkins-ci-tidb/unit-test	`eb145bd`	link	true	`/test unit-test`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

henrybw · 2025-11-06T05:02:37Z

🤔But this function's usage is limited to one specific case. It also sounds good to me if it's possible to build it when we use it.

The problem with building it only when we use it is that the cost of building the map outweighs the benefits of using it, if we only build it and use it once.

That said, your suggestion to instead use the table info for finding the correlated columns might be a better approach than trying to maintain a map in the data source.

terry1purcell · 2025-11-06T19:02:16Z

/hold

Only holding to avoid an accidental merge (since already 2 positive reviews). You can remove the hold without prior authorization from me. Thank you

hawkingrei · 2025-11-11T14:55:06Z

I think we can add some more Go benchmark data to verify the performance improvement here.

planner: maintain a map of column indices by name in DataSource

5a57bc0

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 31, 2025

hawkingrei requested a review from Copilot October 31, 2025 03:19

Copilot AI reviewed Oct 31, 2025

View reviewed changes

pkg/planner/core/operator/logicalop/logical_datasource.go Outdated Show resolved Hide resolved

D3Hunter reviewed Oct 31, 2025

View reviewed changes

pkg/meta/model/column.go Outdated Show resolved Hide resolved

henrybw added 3 commits October 31, 2025 12:23

don't shadow the loop induction variable

1078e5e

GetColIdxsByName -> GetColIdxMapping

21d6381

TestColIdxsByName -> TestColIdxMapping

eb145bd

terry1purcell requested a review from D3Hunter November 4, 2025 00:51

terry1purcell approved these changes Nov 4, 2025

View reviewed changes

ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 4, 2025

ti-chi-bot bot added the approved label Nov 4, 2025

hawkingrei approved these changes Nov 4, 2025

View reviewed changes

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 4, 2025

hawkingrei requested a review from Copilot November 4, 2025 03:47

elsa0520 approved these changes Nov 4, 2025

View reviewed changes

winoros reviewed Nov 4, 2025

View reviewed changes

ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 4, 2025

Copilot AI reviewed Nov 4, 2025

View reviewed changes

winoros requested changes Nov 4, 2025

View reviewed changes

ti-chi-bot bot removed the lgtm label Nov 4, 2025

AilinKid reviewed Nov 4, 2025

View reviewed changes

planner: maintain a map of column indices by name in DataSource #64204

Are you sure you want to change the base?

planner: maintain a map of column indices by name in DataSource #64204

Uh oh!

Conversation

henrybw commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Check List

Release note

Uh oh!

tiprow bot commented Oct 31, 2025

Uh oh!

codecov bot commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

hawkingrei commented Oct 31, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

hawkingrei commented Oct 31, 2025

Uh oh!

D3Hunter left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

D3Hunter commented Nov 4, 2025

Uh oh!

elsa0520 left a comment

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot bot commented Nov 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

henrybw commented Nov 4, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot bot commented Nov 4, 2025

[LGTM Timeline notifier]

Uh oh!

winoros commented Nov 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qw4990 commented Nov 5, 2025

Uh oh!

tiprow bot commented Nov 5, 2025

Uh oh!

ti-chi-bot bot commented Nov 5, 2025

Uh oh!

henrybw commented Nov 6, 2025

henrybw commented Oct 31, 2025 •

edited

Loading

codecov bot commented Oct 31, 2025 •

edited

Loading