-
Notifications
You must be signed in to change notification settings - Fork 6.1k
planner: maintain a map of column indices by name in DataSource #64204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hi @henrybw. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #64204 +/- ##
================================================
+ Coverage 72.7334% 73.5241% +0.7907%
================================================
Files 1859 1860 +1
Lines 503870 505223 +1353
================================================
+ Hits 366482 371461 +4979
+ Misses 115127 111646 -3481
+ Partials 22261 22116 -145
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a ColIdxsByName map to track column name to index mappings across various planner data structures, maintaining consistency between column slices and their name-based lookups. The changes also add an AppendColumn helper method to ensure these mappings stay synchronized when columns are added.
Key changes:
- Added
ColIdxsByNamemap field toDataSource,LogicalIndexScan, andPhysicalTableScanstructs - Introduced
AppendColumnhelper method to maintain synchronization between columns and name mappings - Added
GetColIdxsByNameutility function to build name-to-index maps - Implemented map updates during column pruning operations
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/meta/model/column.go | Adds GetColIdxsByName utility function to build column name-to-index maps |
| pkg/meta/model/column_test.go | Tests the new GetColIdxsByName function |
| pkg/planner/core/operator/logicalop/logical_datasource.go | Adds ColIdxsByName field, implements AppendColumn helper, and updates column pruning logic to maintain map consistency |
| pkg/planner/core/operator/logicalop/logical_index_scan.go | Adds ColIdxsByName field to LogicalIndexScan struct |
| pkg/planner/core/operator/physicalop/physical_table_scan.go | Adds ColIdxsByName field and ensures it's cloned/copied when creating PhysicalTableScan instances |
| pkg/planner/core/logical_plan_builder.go | Updates DataSource construction to use AppendColumn helper and initialize ColIdxsByName map |
| pkg/planner/core/planbuilder.go | Initializes ColIdxsByName when building PhysicalIndexLookUpReader |
| pkg/planner/core/find_best_task.go | Populates ColIdxsByName when converting to index scan |
| pkg/planner/core/exhaust_physical_plans.go | Passes through ColIdxsByName when constructing table scan tasks |
| pkg/planner/core/planbuilder_test.go | Adds test to verify AppendColumn maintains consistency between columns and name mappings |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/retest |
D3Hunter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta part lgtm
i remember search from map is slower compared with iterating slice, when the number of items is small, as hashing itself have overhead
|
/approve |
elsa0520
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: D3Hunter, elsa0520, hawkingrei, terry1purcell The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| Columns []*model.ColumnInfo | ||
| DBName ast.CIStr | ||
|
|
||
| ColIdxsByName map[string]int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use a map map[int]int to store the mapping: id -> offset.
And then use the tableInfo.Columns[IndexColumn.Offset].ID to get the needed information.
I think the map[string]int will be hard to extend for other usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This map is used to cooperate with expression.Schema and other related stuff, but it's defined outside of it.
It's easy to forget to update it when pruning columns and adding columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, I don't see that we update it during column pruning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use a map
map[int]intto store the mapping: id -> offset. And then use thetableInfo.Columns[IndexColumn.Offset].IDto get the needed information.
Hmm, doing this would require us to also pass the table info as an argument to IndexInfo2PrefixCols()/IndexInfo2Cols()...
I'm open to the idea, though. It might mean that we wouldn't need to maintain this mapping after all, which would make things a lot simpler.
I'll try your approach and see how it goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, I don't see that we update it during column pruning.
We should be also updating it during column pruning (see lines 201-204 added to logical_datasource.go), unless I missed a spot.
That said, I think you make a good point here:
It's easy to forget to update it when pruning columns and adding columns.
|
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| delete(ds.ColIdxsByName, ds.Columns[i].Name.L) | ||
| for j := i + 1; j < len(ds.Columns); j++ { | ||
| ds.ColIdxsByName[ds.Columns[j].Name.L] = j - 1 | ||
| } |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested loop updating ColIdxsByName for all subsequent columns on every deletion results in O(n²) complexity when pruning multiple columns. Consider rebuilding the map once after all deletions are complete, or use a more efficient data structure update strategy.
| Columns []*model.ColumnInfo | ||
| DBName ast.CIStr | ||
|
|
||
| ColIdxsByName map[string]int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, I don't see that we update it during column pruning.
[LGTM Timeline notifier]Timeline:
|
|
🤔But this function's usage is limited to one specific case. |
| ts := PhysicalTableScan{ | ||
| Table: ds.TableInfo, | ||
| Columns: ds.Columns, | ||
| ColIdxsByName: ds.ColIdxsByName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this place didn't use the clone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only cloned in places where ds.Columns was also cloned, but I might have misunderstood when it's necessary to clone. I will revisit this.
|
|
||
| Index *model.IndexInfo | ||
| Columns []*model.ColumnInfo | ||
| ColIdxsByName map[string]int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i saw the adding of ColIdxsByName into the PhysicalTableScan and LogicalIndexScan, but any usage of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be used when calling expression.IndexInfo2PrefixCols(), which would come in a follow-up PR.
|
/retest |
|
@qw4990: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@henrybw: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
The problem with building it only when we use it is that the cost of building the map outweighs the benefits of using it, if we only build it and use it once. That said, your suggestion to instead use the table info for finding the correlated columns might be a better approach than trying to maintain a map in the data source. |
|
/hold Only holding to avoid an accidental merge (since already 2 positive reviews). You can remove the hold without prior authorization from me. Thank you |
|
I think we can add some more Go benchmark data to verify the performance improvement here. |
What problem does this PR solve?
Issue Number: ref #63856
Problem Summary: Profiling the time spent in the optimizer repeatedly executing, for 2 minutes, a query on a table like this, but with many more indexes (500) and columns (6,500), reveals that 34% of the time now is spent in
expression.IndexCol2Col(), due to this function being passed a slice of columns that the function has to search linearly with a for loop. Instead of scanning this slice, we should be using a map to reduce lookups of column by ID from O(N) to O(1).What changed and how does it work?
Similar to #64053, this is a preliminary PR that establishes the foundation for replacing the O(N) loop with an O(1) map lookup. The goal is to eventually add a
colIdxsByName map[string]intparameter toexpression.IndexCol2Col()that can be supplied by the logical data source.model.GetColIdxMapping()function that builds this map from a slice ofmodel.ColumnInfostructs, along with aTestColIdxMapping()test case for this new function.model: (1) tests in theutil/rangerpackage depend onexpression.IndexInfo2PrefixCols()and will need to build this map too, and (2) it is next to the definition of themodel.ColumnInfothat it operates on.ColIdxsByNamemap to the logical data source in the planner.ds.Columnsandds.Schema()directly to instead call a newAppendColumn()function that also ensures the column is tracked by theds.ColIdxsByNamemap.PruneColumns(), which has been updated to also remove the pruned column fromds.ColIdxsByNameand update the affected column indices in the map.TestDataSourceColumnMapstest case that ensures thatAppendColumn()maintains coherency betweends.Columnsandds.ColIdxsByName, and also ensures thatAppendTableCol()maintains coherency betweends.TblColsandds.TblColsByID.Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.