pan-x-c (Collaborator) commented Sep 4, 2024

Description

Split the dataset files into small pieces and process them in different batches to avoid exceeding the memory limit of Ray.
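
As a rough illustration of the idea (not the code in this PR), the sketch below splits a JSON Lines file into pieces with a bounded number of records and then hands the pieces to a callback a few at a time. The names `split_jsonl`, `process_in_batches`, `max_lines`, and `batch_size` are hypothetical and chosen only for this example. In a Ray setting the callback could, for instance, pass each batch of paths to `ray.data.read_json`, but that wiring is an assumption and is left out here.

```python
import json
import os
from typing import Callable, List


def split_jsonl(src_path: str, out_dir: str, max_lines: int) -> List[str]:
    """Split a JSON Lines file into pieces of at most `max_lines` records.

    Hypothetical helper for illustration; returns the piece paths in order.
    """
    if max_lines <= 0:
        raise ValueError("max_lines must be positive")
    os.makedirs(out_dir, exist_ok=True)
    pieces: List[str] = []
    out = None
    count = 0
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # ignore blank lines
                if out is None or count == max_lines:
                    # Current piece is full (or none is open): start a new one.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"part-{len(pieces):05d}.jsonl")
                    out = open(path, "w", encoding="utf-8")
                    pieces.append(path)
                    count = 0
                out.write(line if line.endswith("\n") else line + "\n")
                count += 1
    finally:
        if out is not None:
            out.close()
    return pieces


def process_in_batches(
    pieces: List[str],
    batch_size: int,
    process_batch: Callable[[List[str]], None],
) -> int:
    """Call `process_batch` on consecutive groups of at most `batch_size` pieces.

    Only one group is handed over at a time, so the callback never has to
    hold the whole dataset. Returns the number of batches processed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_batches = 0
    for start in range(0, len(pieces), batch_size):
        process_batch(pieces[start:start + batch_size])
        n_batches += 1
    return n_batches


def count_records(paths: List[str]) -> int:
    """Example per-batch work: count the JSON records in the given pieces."""
    total = 0
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            total += sum(1 for line in f if json.loads(line) is not None)
    return total
```

Smaller values of `max_lines` and `batch_size` lower the peak memory per batch at the cost of more files and more scheduling overhead, so both would need tuning against the actual memory limit.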

This PR is marked as stale because there has been no activity for 21 days. Remove the stale label or add a new comment, or this PR will be closed in 3 days.

Close this stale PR.

yxdyc (Collaborator) commented Dec 12, 2024

Cc: @pan-x-c, @chenyushuo

When available, please add the new rule that accounts for Ray's auto-split feature to this PR, and resolve the conflicts for code review.

Additionally, we need to incorporate the streaming_load_json patch into the main branch to align with our 2.0 paper.

Successfully merging this pull request may close these issues.

[Feat] Support explicit FusedOP that allows for the configuration and application of multiple operators in smaller, manageable batches