[Algorithm] Drain cache look up optimization, evaluate accuracy impacts

@Liangshumin now has a prototype that uses a cache lookup to speed up Drain significantly. I moved the lookup to after masking because:

- I intend to ingest raw logs, and raw logs carry unique timestamps, so a lookup before masking would always miss :)

The algorithm sped up by at least 40%, since tree traversal time drops to almost negligible. (The remaining cost is masking, which is to be tackled by divide and conquer in this [task](https://github.com/SkyAPM/aiops-engine-for-skywalking/issues/10).)
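To make the ordering concrete, here is a minimal sketch of a cache keyed on the masked line, placed in front of the tree search. It is not the prototype's code: `mask`, `CachedMatcher`, the two regexes, and the `tree_search` callable are all hypothetical stand-ins, and the real masking rules are likely more involved.

```python
import re
from typing import Callable, Dict

# Hypothetical masking rules (assumptions, not the project's real masks):
# ISO-like timestamps first, then any remaining standalone integers.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")
NUMBER = re.compile(r"\b\d+\b")


def mask(line: str) -> str:
    """Replace variable parts so that similar raw lines become identical."""
    line = TIMESTAMP.sub("<TS>", line)
    return NUMBER.sub("<NUM>", line)


class CachedMatcher:
    """Looks up the masked line in a dict before falling back to the tree."""

    def __init__(self, tree_search: Callable[[str], int]) -> None:
        # tree_search stands in for the Drain tree traversal; it takes a
        # masked line and returns a cluster id.
        self._tree_search = tree_search
        self._cache: Dict[str, int] = {}
        self.hits = 0
        self.misses = 0

    def match(self, raw_line: str) -> int:
        masked = mask(raw_line)  # mask first, so timestamps cannot cause misses
        cluster_id = self._cache.get(masked)
        if cluster_id is not None:
            self.hits += 1
            return cluster_id
        self.misses += 1
        cluster_id = self._tree_search(masked)  # slow path: traverse the tree
        self._cache[masked] = cluster_id
        return cluster_id
```

Two raw lines that differ only in timestamp and user id produce the same masked key, so the second one skips the tree. One thing to check when evaluating accuracy: a hit returns whatever cluster id was stored earlier, so if the tree path also does bookkeeping (for example updating cluster sizes or evicting clusters), the cached path may drift from it.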

We should keep this optimization in mind and conduct further testing. If it's stable, we should probably contribute it back upstream, as it's a general-purpose optimization.

This thread tracks our testing and theoretical evaluation of the optimization, in case unwanted side effects emerge.

We also need to evaluate the choice of cache size to limit memory usage; it should most likely be near the max_cluster limit.
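One way to bound memory is a least-recently-used cache with a fixed capacity. The sketch below is only an illustration of that idea, assuming an LRU policy is acceptable; `LRUCache` and `max_size` are hypothetical names, and the right capacity still has to come from measurement.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity map from masked line to cluster id (hypothetical helper)."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._data: "OrderedDict[str, int]" = OrderedDict()

    def get(self, key: str) -> Optional[int]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: int) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict the least recently used key

    def __len__(self) -> int:
        return len(self._data)
```

Note that several distinct masked lines can map to the same cluster, so the number of cache keys can exceed the number of clusters. A capacity near the max_cluster limit is therefore a starting point to test rather than a guaranteed fit.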

