Open
Labels
PPL, bug, pushdown
Description
What is the bug?
The `top` and `rare` commands are currently implemented using window functions. Because aggregation has a bucket size limit and paginated aggregation is not yet supported, the result can be incorrect. For example:
> source = nginx_logs
   | top bytes by method
   | head;
   
fetched rows / total rows = 10/10
+--------+-------+-------+
| method | bytes | count |
|--------+-------+-------|
| DELETE | 81    | 13    |
| DELETE | 74    | 12    |
| DELETE | 51    | 10    |
| DELETE | 71    | 10    |
| DELETE | 88    | 10    |
| DELETE | 95    | 10    |
| DELETE | 44    | 9     |
| DELETE | 49    | 9     |
| DELETE | 52    | 9     |
| DELETE | 53    | 9     |
+--------+-------+-------+
But the same index contains 10804 distinct (method, bytes) combinations:
> source = nginx_logs
   | dedup method, bytes
   | stats count();
fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 10804   |
+---------+
We can optimize `top` by pushing it down to a nested terms aggregation:
{
  "size": 0,
  "aggs": {
    "topBy": {
      "terms": {
        "field": "method"
      },
      "aggs": {
        "topField": {
          "terms": {
            "field": "bytes",
            "size": 10
          }
        }
      }
    }
  }
}
And optimize `rare` to a nested terms aggregation ordered by ascending count:
{
  "size": 0,
  "aggs": {
    "topBy": {
      "terms": {
        "field": "method"
      },
      "aggs": {
        "topField": {
          "terms": {
            "field": "bytes",
            "size": 10,
            "order": {
              "_count": "asc"
            }
          }
        }
      }
    }
  }
}
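To make the intended semantics concrete, here is a minimal Python sketch that emulates what the two nested terms aggregations above would compute on in-memory rows. The function name `nested_terms` and the sample rows are hypothetical, and breaking ties by ascending key is an assumption made only to keep the output deterministic. The sketch also keeps every group, whereas the outer `terms` in the JSON above sets no `size` and would therefore return only the default 10 groups.

```python
from collections import Counter, defaultdict


def nested_terms(rows, by, field, size=10, rare=False):
    """Emulate a terms aggregation on `by` with a terms sub-aggregation on `field`.

    Hypothetical helper for illustration only, not the plugin's code.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[by]][row[field]] += 1

    result = {}
    for group, counter in counts.items():
        if rare:
            # Mirrors "order": {"_count": "asc"}; ties broken by key (assumption).
            ordered = sorted(counter.items(), key=lambda kv: (kv[1], kv[0]))
        else:
            # Mirrors the default descending count order; ties broken by key (assumption).
            ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        result[group] = ordered[:size]
    return result


# Made-up sample rows using the field names from the queries above.
rows = [
    {"method": "GET", "bytes": 10},
    {"method": "GET", "bytes": 10},
    {"method": "GET", "bytes": 20},
    {"method": "DELETE", "bytes": 81},
    {"method": "DELETE", "bytes": 81},
    {"method": "DELETE", "bytes": 81},
    {"method": "DELETE", "bytes": 74},
]

top = nested_terms(rows, "method", "bytes", size=1)
rare = nested_terms(rows, "method", "bytes", size=1, rare=True)
```

Each group is ranked independently, so a group with many distinct values cannot crowd out the others, which is the property the window-based implementation loses once the bucket limit is hit.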
How can one reproduce the bug?
- Create a dataset that produces at least 1000 distinct buckets for some grouping
- Run `top` or `rare` on that grouping and observe incorrect results
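One way to build such a dataset is sketched below. It generates documents with 1200 distinct (method, bytes) pairs and formats them as a bulk-API NDJSON payload. The method list, the value ranges, and the helper names are assumptions chosen for illustration. Only the index name `nginx_logs` and the field names come from the queries above. Whether 1200 buckets is enough to trigger the bug may depend on the cluster's bucket settings.

```python
import json

# Assumed set of HTTP methods; the issue's data only shows DELETE explicitly.
METHODS = ["GET", "POST", "PUT", "DELETE"]


def make_docs(n_bytes=300):
    """Return docs covering len(METHODS) * n_bytes distinct (method, bytes) pairs."""
    docs = []
    for method in METHODS:
        for b in range(n_bytes):
            # Repeat each pair 1 to 3 times so the per-pair counts differ.
            docs.extend({"method": method, "bytes": b} for _ in range(1 + b % 3))
    return docs


def to_bulk_ndjson(docs, index="nginx_logs"):
    """Format docs as bulk-API NDJSON: one action line followed by one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


docs = make_docs()
distinct_pairs = {(d["method"], d["bytes"]) for d in docs}
payload = to_bulk_ndjson(docs)
```

The resulting `payload` string can be sent to the `_bulk` endpoint, after which `source = nginx_logs | top bytes by method` can be compared against the expected per-method counts.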
Do you have any additional context?
#4278
Status
In progress