[BUG] Optimize the top rare commands to nested aggregation #4671

@LantaoJin

What is the bug?
The current top and rare commands are implemented with window functions. Because the aggregation has a bucket size limit and paginated aggregation is not supported yet, the result can be incorrect. For example:

> source = nginx_logs
   | top bytes by method
   | head;
   
fetched rows / total rows = 10/10
+--------+-------+-------+
| method | bytes | count |
|--------+-------+-------|
| DELETE | 81    | 13    |
| DELETE | 74    | 12    |
| DELETE | 51    | 10    |
| DELETE | 71    | 10    |
| DELETE | 88    | 10    |
| DELETE | 95    | 10    |
| DELETE | 44    | 9     |
| DELETE | 49    | 9     |
| DELETE | 52    | 9     |
| DELETE | 53    | 9     |
+--------+-------+-------+

But the same index contains far more distinct (method, bytes) combinations:

> source = nginx_logs
   | dedup method, bytes
   | stats count();

fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 10804   |
+---------+
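One way to picture the failure is the toy model below. It is only a sketch: it assumes the pushed-down aggregation returns at most a fixed number of (method, bytes) buckets in key order, and that the ranking then runs over that truncated set. The function names, the sample rows, and the truncation order are all assumptions made for illustration, not taken from the plugin.

```python
from collections import Counter
from itertools import islice


def top_k_per_group(counts, k):
    """Rank values inside each group by descending count and keep the first k."""
    per_group = {}
    for (group, value), n in counts.items():
        per_group.setdefault(group, []).append((value, n))
    return {
        group: sorted(values, key=lambda vn: (-vn[1], vn[0]))[:k]
        for group, values in per_group.items()
    }


def top_via_truncated_buckets(rows, bucket_limit, k):
    """Count (group, value) pairs, keep only the first `bucket_limit` buckets
    in key order (assumed truncation behaviour), then rank what is left."""
    counts = Counter(rows)
    kept = dict(islice(sorted(counts.items()), bucket_limit))
    return top_k_per_group(kept, k)


# Made-up rows: two methods, two byte sizes each.
rows = [("DELETE", 1)] * 3 + [("DELETE", 2)] * 2 + [("GET", 7)] * 5 + [("GET", 8)]

full = top_k_per_group(Counter(rows), k=1)
truncated = top_via_truncated_buckets(rows, bucket_limit=2, k=1)
```

With a bucket limit of 2, only the DELETE buckets survive the truncation, so GET never reaches the ranking step even though it has the most frequent value.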

We can optimize top into a nested terms aggregation:

{
  "size": 0,
  "aggs": {
    "topBy": {
      "terms": {
        "field": "method"
      },
      "aggs": {
        "topField": {
          "terms": {
            "field": "bytes",
            "size": 10
          }
        }
      }
    }
  }
}

And optimize rare into a nested terms aggregation that orders the inner buckets by ascending count:

{
  "size": 0,
  "aggs": {
    "topBy": {
      "terms": {
        "field": "method"
      },
      "aggs": {
        "topField": {
          "terms": {
            "field": "bytes",
            "size": 10,
            "order": {
              "_count": "asc"
            }
          }
        }
      }
    }
  }
}
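The two request bodies above differ only in the inner order clause, so they can be produced by one helper. The following is a minimal sketch, not the plugin's implementation: build_top_rare_agg and flatten_buckets are hypothetical names, and the flattening step assumes the usual terms aggregation response shape (aggregations, buckets, key, doc_count).

```python
def build_top_rare_agg(by_field, field, size=10, rare=False):
    """Build the nested terms aggregation body for top (default) or rare."""
    inner_terms = {"field": field, "size": size}
    if rare:
        # rare wants the least frequent values first
        inner_terms["order"] = {"_count": "asc"}
    return {
        "size": 0,
        "aggs": {
            "topBy": {
                "terms": {"field": by_field},
                "aggs": {"topField": {"terms": inner_terms}},
            }
        },
    }


def flatten_buckets(response, by_field, field):
    """Turn the nested buckets of a response into flat rows."""
    rows = []
    for outer in response["aggregations"]["topBy"]["buckets"]:
        for inner in outer["topField"]["buckets"]:
            rows.append(
                {by_field: outer["key"], field: inner["key"], "count": inner["doc_count"]}
            )
    return rows


top_query = build_top_rare_agg("method", "bytes")
rare_query = build_top_rare_agg("method", "bytes", rare=True)
```

Here top_query and rare_query reproduce the two JSON bodies shown above, and flatten_buckets yields rows with the same columns as the top output (method, bytes, count).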

How can one reproduce the bug?

  1. Create a dataset that produces at least 1000 distinct buckets for the chosen grouping
  2. Run top or rare on it and observe the incorrect results
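For step 1, a synthetic dataset can be generated with a short script. This is a sketch under assumptions: the method list beyond DELETE, the value ranges, and the helper names are invented for illustration, and the output is newline-delimited JSON in the bulk request format, intended for an index such as nginx_logs.

```python
import json
import random


def make_docs(methods, n_byte_values, extra=5000, seed=0):
    """Emit every (method, bytes) pair once so the bucket count is guaranteed,
    then add random repeats so the per-bucket counts differ."""
    rng = random.Random(seed)
    docs = [{"method": m, "bytes": b} for m in methods for b in range(n_byte_values)]
    docs += [
        {"method": rng.choice(methods), "bytes": rng.randrange(n_byte_values)}
        for _ in range(extra)
    ]
    return docs


def to_bulk_ndjson(docs):
    """Serialize documents as action/source line pairs for a bulk request."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# 4 methods x 300 byte values = 1200 distinct (method, bytes) buckets.
docs = make_docs(["DELETE", "GET", "POST", "PUT"], n_byte_values=300)
payload = to_bulk_ndjson(docs)
```

The payload can then be sent to the bulk endpoint of the target index before running top or rare against it.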

Do you have any additional context?
#4278

Labels

PPL (Piped processing language), bug (Something isn't working), pushdown (pushdown related issues)
