Conversation
This comment was marked as outdated.
This comment was marked as outdated.
Greptile SummaryThis PR introduces a new buffer pool-based storage backend with several critical implementation bugs that must be fixed before merging. Critical bugs identified:
These are not edge cases — Confidence Score: 1/5
Sequence DiagramsequenceDiagram
participant Caller
participant WrappedSegment
participant BufferStorage
participant VecBufferPoolHandle
participant VecBufferPool
participant LPMap
Caller->>WrappedSegment: fetch(offset, buf, len)
WrappedSegment->>BufferStorage: get_buffer(offset❌, len, segment_id)
note over WrappedSegment,BufferStorage: BUG: should be absolute file offset,<br/>not segment-relative offset
BufferStorage->>VecBufferPoolHandle: get_block(offset, len, segment_id)
VecBufferPoolHandle->>VecBufferPool: acquire_buffer(segment_id, offset, len, 5)
VecBufferPool->>LPMap: acquire_block(segment_id)
alt Block already cached
LPMap-->>VecBufferPool: cached buffer ptr
else Not cached — fetch from disk
VecBufferPool->>VecBufferPool: pread(fd, buffer, len, offset)
VecBufferPool->>LPMap: set_block_acquired(segment_id, buffer)
note over VecBufferPool,LPMap: BUG: recycle() called without mutex_<br/>before reaching this point (line 185)
LPMap-->>VecBufferPool: placed buffer ptr
end
VecBufferPool-->>VecBufferPoolHandle: buffer ptr
VecBufferPoolHandle-->>BufferStorage: buffer ptr
BufferStorage-->>WrappedSegment: buffer ptr
WrappedSegment->>WrappedSegment: memmove(buf, buffer + offset❌, len)
note over WrappedSegment: BUG: offset applied twice → OOB if offset≥len
WrappedSegment-->>Caller: len
|
This comment was marked as resolved.
This comment was marked as resolved.
Co-authored-by: yinzefeng.yzf <yinzefeng.yzf@alibaba-inc.com>
Changed 'your-org' placeholder to 'alibaba' in the git clone command to reflect the correct repository URL.
…BUTING (#150) - README.md: remove spurious space in align=" center" → align="center" (logo was not centered on GitHub due to invalid HTML attribute value) - CONTRIBUTING.md: correct Python prerequisite from '>= 3.9' to '3.10 - 3.12' to match pyproject.toml classifiers and CI matrix (cp310, cp312)
* docs: remove x.com in join us * docs: join discord with widget
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
* feat: linux ci with github runner * fix: add CFLAGS with -march=native * fix: fix linux x86 ci * fix: with zen3 in x86 ci * feat: nightly report with github-runner * feat: refactor ci workflow * feat: rename workflow * feat: rename job * feat: update ci badge * feat: remove python coverage
* add Jina Embeddings v5 support * fix ruff format for jina_embedding_function.py
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
| for (int i = 0; i < retry; i++) { | ||
| lp_map_.recycle(free_buffers_); | ||
| found = free_buffers_.try_dequeue(buffer); | ||
| if (found) { | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
recycle() called without holding mutex_ in acquire_buffer()
The header file (buffer_pool.h:76) documents that recycle() must be called under lock. However, at line 185, lp_map_.recycle(free_buffers_) is invoked without holding mutex_. The mutex is only acquired later at line 206 for set_block_acquired().
This creates a race window where concurrent threads can race during the eviction and placement phases. One thread can validate and evict a block while another thread is attempting to acquire and place the same block.
Wrap the recycle call with the mutex:
for (int i = 0; i < retry; i++) {
{
std::lock_guard<std::mutex> lock(mutex_);
lp_map_.recycle(free_buffers_);
}
found = free_buffers_.try_dequeue(buffer);
if (found) break;
}| ~VecBufferPool() { | ||
| close(fd_); | ||
| } |
There was a problem hiding this comment.
VecBufferPool destructor leaks all pooled buffers
The destructor only closes the file descriptor but never frees the char* buffers. All buffers allocated via ailego_malloc() in init() (line 158) and enqueued into free_buffers_ are never deallocated. Additionally, any buffers currently held in lp_map_.entries_[*].buffer are also leaked on destruction. This causes a permanent memory leak every time the buffer pool is destroyed.
The destructor should drain both the free pool and any buffers still held in the LPMap:
~VecBufferPool() {
// Free all buffers in the free list
char *buf = nullptr;
while (free_buffers_.try_dequeue(buf)) {
ailego_free(buf);
}
// Free any buffers still pinned in the map
for (size_t i = 0; i < lp_map_.entry_num(); ++i) {
char *b = lp_map_.evict_block(i);
if (b) ailego_free(b);
}
close(fd_);
}
resolve #64