feat: storage v2 binlog data source #52
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: shaoting-huang
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
1ac7af1 to dc6b468
a2889d6 to b755020
Although milvus-storage provides an FFI interface so that Spark can read and write binlogs in the Storage V2 format, Milvus 2.6 currently uses the milvus-storage packed format, which has no manifest file. Therefore, to be compatible with the Milvus 2.6 binlog format through the FFI reader interface, we need to implement a manifest builder that builds a manifest file from the segment info and the Milvus schema.

Introduces a new Spark data source for reading Milvus Storage V2 binlog data. Key additions include:
- MilvusStorageV2DataSource: Spark DataSourceV2 implementation for accessing V2 binlogs.
- binlogv2 module: handles manifest generation, binlog grouping, and Parquet metadata reading.
- FFI and JNI integration via milvus-storage to interface with native binlog parsing libraries.
- Utility packages (serde, schema, etc.) to support serialization, schema mapping, and type conversion.
- Test suites covering manifest building, native integration, and source loading with Spark SQL logic.
- Updated build.sbt and added spark_submit_demo.sh for native linking and demo execution.
- Registered new DataSource in META-INF/services.

This enhancement enables Spark to efficiently read Milvus V2 binlog files and makes the connector compatible with the latest Milvus storage format.

Signed-off-by: shaoting-huang <[email protected]>
b755020 to ab0a6f5
da08492 to df0dfd0
Although milvus-storage provides an FFI interface that allows Spark to read and write binlogs in the Storage v2 format, the current Milvus 2.6 version uses the milvus-storage packed format, which does not include a manifest file.
To ensure compatibility with the Milvus 2.6 binlog format when using the FFI reader interface, we need to implement a manifest builder that generates a manifest file based on the segment information and Milvus schema.
This commit introduces a new Spark data source for reading Milvus Storage V2 binlog data. Key additions include:
- `MilvusStorageV2DataSource`: Spark DataSourceV2 implementation for accessing V2 binlogs.
- `binlogv2` module: handles manifest generation, binlog grouping, and Parquet metadata reading.
- FFI and JNI integration via `milvus-storage` to interface with native binlog parsing libraries.
- Utility packages (`schema`, etc.) to support schema mapping and type conversion.
- Updated `build.sbt` and added `spark_submit_demo.sh` for native linking and demo execution.
- Registered new DataSource in `META-INF/services`.

This enhancement enables Spark to efficiently read Milvus V2 binlog files and makes the connector compatible with the latest Milvus storage format.
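For readers unfamiliar with the manifest-builder idea described above, the following is a rough, language-neutral sketch in Python rather than the connector's own code. It only illustrates the general shape of the step: take segment info (binlog paths per column group) plus a schema, and emit a manifest that tells a reader which files hold which columns. Every name here (`build_manifest`, `schema_fields`, `binlog_paths`, `group_to_fields`) and the JSON layout are hypothetical; the real milvus-storage manifest format and the `binlogv2` module's API are not shown in this PR description and may differ substantially.

```python
import json
from typing import Dict, List


def build_manifest(
    schema_fields: Dict[int, str],
    binlog_paths: Dict[int, List[str]],
    group_to_fields: Dict[int, List[int]],
) -> str:
    """Return a JSON manifest mapping column groups to columns and files.

    Hypothetical inputs (stand-ins for segment info and the Milvus schema):
      schema_fields:   field ID -> column name
      binlog_paths:    column-group ID -> binlog file paths
      group_to_fields: column-group ID -> field IDs stored in that group
    The output layout is illustrative only, not the real manifest format.
    """
    column_groups = []
    for group_id in sorted(binlog_paths):
        field_ids = group_to_fields.get(group_id, [])
        # Fail early if segment info references a field the schema lacks.
        unknown = [fid for fid in field_ids if fid not in schema_fields]
        if unknown:
            raise ValueError(
                f"group {group_id} references unknown fields {unknown}"
            )
        column_groups.append(
            {
                "group_id": group_id,
                "columns": [schema_fields[fid] for fid in field_ids],
                # Lexicographic sort is a simplification for determinism.
                "paths": sorted(binlog_paths[group_id]),
            }
        )
    return json.dumps({"version": 1, "column_groups": column_groups}, indent=2)


if __name__ == "__main__":
    print(
        build_manifest(
            schema_fields={100: "id", 101: "vector"},
            binlog_paths={0: ["seg/0/2", "seg/0/1"], 1: ["seg/1/1"]},
            group_to_fields={0: [100], 1: [101]},
        )
    )
```

Validating field IDs against the schema while building keeps a mismatch between segment info and schema from surfacing later as a native read error; whether the actual implementation does this is not stated in the PR.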
Support ability: