A CPU memory leak is observed when running inference on GPU, even when NativeOps is not used (verified by removing libllm_sharp_ops.so).
CPU memory grows continuously during inference. The diagnostics from torch.Tensor.TotalCount and torch.Tensor.PeakCount remain stable across multiple chat turns. GPU memory is also stable, so no GPU memory leak is observed.
I profiled the program with valgrind (massif and memcheck), but the logs show no obvious clues about the source of the leak.
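For reference, the per-turn CPU growth can be confirmed independently of the tensor counters by polling the process resident set size between chat turns. This is a minimal Linux-only sketch (the `rss_kb` helper is hypothetical, not part of llm-sharp or TorchSharp); a steadily rising RSS delta while `torch.Tensor.TotalCount` stays flat points at native allocations outside the managed tensor accounting:

```python
def rss_kb() -> int:
    """Return the current process resident set size in kB (Linux /proc)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # /proc reports the value in kB
    raise RuntimeError("VmRSS not found; this /proc-based check is Linux-only")

# Sample RSS around a single inference turn.
before = rss_kb()
# ... run one chat/inference turn here ...
after = rss_kb()
print(f"RSS delta for this turn: {after - before} kB")
```

Repeating this over many turns and plotting the deltas makes the leak rate visible even when massif attributes no single large allocation site.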
massif.out.gz
vgdump.gz