Commit e62400f

Merge pull request #53 from TogetherCrew/feat/mediawiki-batch-ingestion
feat: added max 10 parallel processing!
2 parents 74b2348 + cc09908 commit e62400f

1 file changed: +1 -1 lines changed

hivemind_etl/mediawiki/etl.py

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ def load(self, documents: list[Document]) -> None:
         batch_size = 1000
         batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
 
-        with ThreadPoolExecutor() as executor:
+        with ThreadPoolExecutor(max_workers=10) as executor:
             # Submit all batch processing tasks
             future_to_batch = {
                 executor.submit(ingestion_pipeline.run_pipeline, batch): i
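
The hunk above is cut off mid-expression, so here is a minimal, self-contained sketch of the pattern it appears to use: split the documents into batches, submit each batch to a thread pool capped at 10 workers, and collect results as they finish. Without `max_workers`, `ThreadPoolExecutor` picks a default based on the machine's CPU count (CPU count plus 4, capped at 32, in Python 3.8 and later), so the explicit cap makes the concurrency predictable. The names `run_pipeline` and `load_in_batches` are hypothetical stand-ins, not the repository's code; `run_pipeline` only counts documents so the example can run without the real ingestion pipeline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_pipeline(batch: list[str]) -> int:
    """Hypothetical stand-in for ingestion_pipeline.run_pipeline."""
    # The real pipeline would ingest the batch; here we just count it.
    return len(batch)


def load_in_batches(
    documents: list[str],
    batch_size: int = 1000,
    max_workers: int = 10,
) -> dict[int, int]:
    """Process documents in fixed-size batches with bounded parallelism."""
    batches = [
        documents[i:i + batch_size]
        for i in range(0, len(documents), batch_size)
    ]

    results: dict[int, int] = {}
    # At most `max_workers` batches run at once; the rest wait in the queue.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its batch index for reporting.
        future_to_batch = {
            executor.submit(run_pipeline, batch): i
            for i, batch in enumerate(batches)
        }
        for future in as_completed(future_to_batch):
            i = future_to_batch[future]
            # result() re-raises any exception raised inside the worker.
            results[i] = future.result()
    return results


if __name__ == "__main__":
    docs = [f"doc-{n}" for n in range(2500)]
    print(load_in_batches(docs))
```

Because every batch is submitted up front, the cap limits how many batches run concurrently, not how many are queued; the batch list itself is still held in memory in full.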
