[Algorithm] Drain cache look up optimization, evaluate accuracy impacts #12

@Superskyyy

@Liangshumin now has a prototype that uses a cache lookup to speed up Drain significantly. I moved the lookup to after masking because:

  • I intend to ingest raw logs, and each raw log line carries a unique timestamp, so a cache keyed on unmasked lines would always miss :)
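A minimal sketch of why the lookup has to come after masking. The regex mask, the plain dict cache, and the sample log lines are all hypothetical stand-ins for illustration; none of them is taken from the prototype or from Drain's API.

```python
import re

# Hypothetical mask: replaces ISO-like timestamps with a placeholder token.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?")


def mask(line: str) -> str:
    """Replace every timestamp in the line with a fixed token."""
    return TIMESTAMP.sub("<TIMESTAMP>", line)


def count_hits(lines, key_fn):
    """Count cache hits when each line is keyed by key_fn(line)."""
    cache = {}
    hits = 0
    for line in lines:
        key = key_fn(line)
        if key in cache:
            hits += 1
        else:
            # Stand-in for the cluster id a real tree search would return.
            cache[key] = len(cache)
    return hits


# Made-up sample lines: identical messages, unique timestamps.
raw_logs = [
    "2023-01-01 10:00:00.123 connection closed",
    "2023-01-01 10:00:01.456 connection closed",
    "2023-01-01 10:00:02.789 connection closed",
]

hits_raw = count_hits(raw_logs, lambda s: s)  # keyed on the raw line
hits_masked = count_hits(raw_logs, mask)      # keyed on the masked line
```

Keyed on the raw lines, every lookup misses because the timestamps differ. Keyed on the masked lines, only the first lookup misses and the rest skip the tree search.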

The algorithm is at least 40% faster, since cache hits reduce tree traversal time to almost negligible. (What remains is the masking task, which becomes a divide-and-conquer problem.)

We should keep this optimization in mind and conduct further testing. If it proves stable, we should probably contribute it back upstream, since it is a general-purpose optimization.

This thread tracks our testing and theoretical evaluation, in case unwanted side effects emerge.

We also need to evaluate the cache size to limit memory usage; it should most likely be near the max_cluster limit.
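One way to bound the cache is a least-recently-used (LRU) policy. The sketch below assumes the cache maps masked lines to cluster ids; the class name `BoundedTemplateCache` and its `max_size` parameter are hypothetical and are not part of the prototype or of Drain.

```python
from collections import OrderedDict


class BoundedTemplateCache:
    """LRU cache from masked log line to cluster id (illustrative only)."""

    def __init__(self, max_size: int):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, masked_line):
        """Return the cached cluster id, or None on a miss."""
        cluster_id = self._entries.get(masked_line)
        if cluster_id is not None:
            # Mark the entry as most recently used.
            self._entries.move_to_end(masked_line)
        return cluster_id

    def put(self, masked_line, cluster_id):
        """Insert or refresh an entry, evicting the oldest if over capacity."""
        self._entries[masked_line] = cluster_id
        self._entries.move_to_end(masked_line)
        if len(self._entries) > self.max_size:
            # popitem(last=False) removes the least recently used entry.
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

Usage would be along the lines of `cache = BoundedTemplateCache(max_size=1024)`, calling `get` before the tree search and `put` after it. A cache keyed this way holds distinct masked lines rather than templates, so the number of entries can exceed the number of clusters; that is one thing to measure when picking the size.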

Metadata

Labels

  • Algorithm (The work is on the algorithm side)
  • analysis: log
  • enhancement (New feature or request)
  • upstream (An issue that could be submitted to upstream repos first)

Projects

Status

Done
