Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Oct 30, 2025

Description

Pushdown top 10 usenull=true bytes by method to nested terms aggregation:

{
  "size": 0,
  "aggs": {
    "topBy": {
      "terms": {
        "field": "method"
      },
      "aggs": {
        "topField": {
          "terms": {
            "field": "bytes",
            "size": 10,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}

And pushdown rare 10 usenull=true bytes by method to:

{
  "size": 0,
  "aggs": {
    "topBy": {
      "terms": {
        "field": "method"
      },
      "aggs": {
        "topField": {
          "terms": {
            "field": "bytes",
            "size": 10,
            "order": {
              "_count": "asc"
            }
          }
        }
      }
    }
  }
}

Additional, rename metrics to measure, rename rule file names.

Related Issues

Resolves #4671

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@LantaoJin LantaoJin added the enhancement New feature or request label Oct 30, 2025
@LantaoJin LantaoJin marked this pull request as ready for review October 30, 2025 15:36
@LantaoJin LantaoJin changed the title Optimize the top rare commands to nested aggregation Pushdown the top rare commands to nested aggregation Oct 30, 2025
yuancu
yuancu previously approved these changes Oct 31, 2025
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[2], expr#5=[<=($t3, $t4)], proj#0..2=[{exprs}], $condition=[$t5])
EnumerableWindow(window#0=[window(partition {0} order by [2] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1},count=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1},count=COUNT()), RARE_TOP->rare 2 state by gender, PROJECT->[gender, state, count], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"gender":{"terms":{"field":"gender.keyword","size":10000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]},"aggregations":{"state":{"terms":{"field":"state.keyword","size":2,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"asc"},{"_key":"asc"}]}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
Copy link
Collaborator

@yuancu yuancu Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[non-blocking] It's advised to use rare_terms aggregation in place of terms aggregation with ascending count order: https://docs.opensearch.org/latest/aggregations/bucket/rare-terms/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace rare in an followup PR?

Copy link
Collaborator

@qianheng-aws qianheng-aws Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to push down limit into the size of the first level term agg? I guess it will produce 10000 * 2 rows in the end if there is enough buckets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You guessed right. In general, neither 65535 nor 10000 is precise if the first group has 100,000 keys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not proper to fix it in this PR. @qianheng-aws Can you submit another fix for limit pushdown on nested aggregate: Pushdown to first tier + keep EnumerableLimit.

Comment on lines 1895 to 1896

//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unintentional insertion of empty comments?

Copy link
Member Author

@LantaoJin LantaoJin Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted.

EnumerableCalc(expr#0..3=[{inputs}], expr#4=[2], expr#5=[<=($t3, $t4)], proj#0..2=[{exprs}], $condition=[$t5])
EnumerableWindow(window#0=[window(partition {0} order by [2] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1},count=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1},count=COUNT()), RARE_TOP->rare 2 state by gender, PROJECT->[gender, state, count], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"gender":{"terms":{"field":"gender.keyword","size":10000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]},"aggregations":{"state":{"terms":{"field":"state.keyword","size":2,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"asc"},{"_key":"asc"}]}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
Copy link
Collaborator

@qianheng-aws qianheng-aws Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to push down limit into the size of the first level term agg? I guess it will produce 10000 * 2 rows in the end if there is enough buckets.

rowCount,
RelMdUtil.guessSelectivity(((FilterDigest) operation.digest()).condition()));
case LIMIT -> Math.min(rowCount, ((LimitDigest) operation.digest()).limit());
case RARE_TOP -> Math.min(
Copy link
Collaborator

@qianheng-aws qianheng-aws Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it accuracy to set the row count to rare/top's number directly? Shouldn't be rare/top's number * estimated buckets' size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// Because we'd like to push down LIMIT even when the fetch in LIMIT is greater than
// dRows.
case LIMIT -> dRows = Math.min(dRows, ((LimitDigest) operation.digest()).limit()) - 1;
case RARE_TOP -> dRows = Math.min(dRows, ((RareTopDigest) operation.digest()).number()) - 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Optimize the top rare commands to nested aggregation

3 participants