Collaborator

@rootfs rootfs commented Sep 28, 2025

  • Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/)
  • Add unified error handling and configuration loading systems
  • Implement dual-path architecture for traditional and LoRA models
  • Add comprehensive FFI layer with memory safety

Maintains backward compatibility while enabling future model integrations.

refactor: Implement modular candle-binding architecture 

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No


Signed-off-by: OneZero-Y <[email protected]>
@rootfs rootfs requested a review from Xunzhuo as a code owner September 28, 2025 12:40

netlify bot commented Sep 28, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit a48b0be
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68dbcf9933a669000858d5e3
😎 Deploy Preview https://deploy-preview-266--vllm-semantic-router.netlify.app

github-actions bot commented Sep 28, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/classifiers/lora/intent_lora.rs
  • candle-binding/src/classifiers/lora/intent_lora_test.rs
  • candle-binding/src/classifiers/lora/mod.rs
  • candle-binding/src/classifiers/lora/parallel_engine.rs
  • candle-binding/src/classifiers/lora/pii_lora.rs
  • candle-binding/src/classifiers/lora/pii_lora_test.rs
  • candle-binding/src/classifiers/lora/security_lora.rs
  • candle-binding/src/classifiers/lora/security_lora_test.rs
  • candle-binding/src/classifiers/lora/token_lora.rs
  • candle-binding/src/classifiers/lora/token_lora_test.rs
  • candle-binding/src/classifiers/mod.rs
  • candle-binding/src/classifiers/traditional/batch_processor.rs
  • candle-binding/src/classifiers/traditional/batch_processor_test.rs
  • candle-binding/src/classifiers/traditional/mod.rs
  • candle-binding/src/classifiers/traditional/modernbert_classifier.rs
  • candle-binding/src/classifiers/traditional/modernbert_classifier_test.rs
  • candle-binding/src/classifiers/unified.rs
  • candle-binding/src/classifiers/unified_test.rs
  • candle-binding/src/core/config_loader.rs
  • candle-binding/src/core/config_loader_test.rs
  • candle-binding/src/core/mod.rs
  • candle-binding/src/core/similarity.rs
  • candle-binding/src/core/tokenization.rs
  • candle-binding/src/core/unified_error.rs
  • candle-binding/src/core/unified_error_test.rs
  • candle-binding/src/ffi/classify.rs
  • candle-binding/src/ffi/classify_test.rs
  • candle-binding/src/ffi/init.rs
  • candle-binding/src/ffi/memory.rs
  • candle-binding/src/ffi/memory_safety.rs
  • candle-binding/src/ffi/memory_safety_test.rs
  • candle-binding/src/ffi/mod.rs
  • candle-binding/src/ffi/similarity.rs
  • candle-binding/src/ffi/state_manager.rs
  • candle-binding/src/ffi/tokenization.rs
  • candle-binding/src/ffi/types.rs
  • candle-binding/src/ffi/validation.rs
  • candle-binding/src/model_architectures/config.rs
  • candle-binding/src/model_architectures/lora/bert_lora.rs
  • candle-binding/src/model_architectures/lora/bert_lora_test.rs
  • candle-binding/src/model_architectures/lora/lora_adapter.rs
  • candle-binding/src/model_architectures/lora/mod.rs
  • candle-binding/src/model_architectures/mod.rs
  • candle-binding/src/model_architectures/model_factory.rs
  • candle-binding/src/model_architectures/model_factory_test.rs
  • candle-binding/src/model_architectures/routing.rs
  • candle-binding/src/model_architectures/routing_test.rs
  • candle-binding/src/model_architectures/traditional/base_model.rs
  • candle-binding/src/model_architectures/traditional/base_model_test.rs
  • candle-binding/src/model_architectures/traditional/bert.rs
  • candle-binding/src/model_architectures/traditional/bert_test.rs
  • candle-binding/src/model_architectures/traditional/mod.rs
  • candle-binding/src/model_architectures/traditional/modernbert.rs
  • candle-binding/src/model_architectures/traditional/modernbert_test.rs
  • candle-binding/src/model_architectures/traits.rs
  • candle-binding/src/model_architectures/unified_interface.rs
  • candle-binding/src/model_architectures/unified_interface_test.rs
  • candle-binding/src/test_fixtures.rs
  • candle-binding/src/utils/memory.rs
  • candle-binding/src/utils/mod.rs
  • candle-binding/Cargo.toml
  • candle-binding/src/lib.rs

📁 config

Owners: @rootfs
Files changed:

  • config/config.yaml

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/build-run-test.mk
  • tools/make/common.mk
  • tools/make/rust.mk


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Collaborator Author

rootfs commented Sep 28, 2025

@OneZero-Y @Xunzhuo Let's have the following resolved before merging

  • Add more candle unit tests
  • Verify API accuracy
  • Ensure semantic-router uses the right binding API
  • Remove legacy comments and code

@rootfs rootfs added this to the v0.1 milestone Sep 28, 2025
feat:unit tests for candle refactoring

Signed-off-by: OneZero-Y <[email protected]>
@rootfs
Collaborator Author

rootfs commented Sep 30, 2025

@OneZero-Y since we are now working on the feature branch, how about using this branch for both the refactoring and the new embedding models?

@rootfs rootfs mentioned this pull request Oct 5, 2025
@OneZero-Y
Contributor

@rootfs OK, I'll continue the embedding model work on this branch.

@rootfs
Collaborator Author

rootfs commented Oct 9, 2025

@OneZero-Y that's great! I'll switch to this work as soon as I can.

Comment on lines +89 to +104
let handles = vec![
    self.spawn_intent_task(texts_owned.clone(), Arc::clone(&intent_results)),
    self.spawn_pii_task(texts_owned.clone(), Arc::clone(&pii_results)),
    self.spawn_security_task(texts_owned, Arc::clone(&security_results)),
];

// Wait for all threads to complete
for handle in handles {
    handle.join().map_err(|_| {
        let unified_err = concurrency_error(
            "thread join",
            "Failed to join parallel classification thread",
        );
        candle_core::Error::from(unified_err)
    })?;
}


This could be simplified a bit. Something like

let intent_handle = thread::spawn(|| intent_task(texts)); // slice is fine, no need to own the data.
let pii_handle = ... same
let security_handle = ... same

let intent_results = intent_handle.join()?; // map_err omitted
let pii_results = pii_handle.join()?;
let security_results = security_handle.join()?;
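
One caveat on passing a slice: `std::thread::spawn` requires a `'static` closure, so borrowing `texts` directly only compiles with scoped threads. Below is a minimal sketch of that variant using `std::thread::scope`. The three task functions, their return types, and the `String` error are hypothetical stand-ins, not the signatures used in this PR.

```rust
use std::any::Any;
use std::thread;

// Hypothetical stand-ins for the three classification tasks.
fn intent_task(texts: &[&str]) -> Vec<usize> {
    texts.iter().map(|t| t.len()).collect()
}

fn pii_task(texts: &[&str]) -> Vec<bool> {
    texts.iter().map(|t| t.contains('@')).collect()
}

fn security_task(texts: &[&str]) -> Vec<bool> {
    texts
        .iter()
        .map(|t| t.to_lowercase().contains("ignore previous"))
        .collect()
}

// join() only fails if the thread panicked; map that to a plain message here.
fn join_err(_: Box<dyn Any + Send>) -> String {
    "failed to join classification thread".to_string()
}

/// Runs the three tasks in parallel on borrowed input.
fn classify_all(texts: &[&str]) -> Result<(Vec<usize>, Vec<bool>, Vec<bool>), String> {
    // Scoped threads may borrow `texts` because the scope joins them before returning.
    let (intent, pii, security) = thread::scope(|s| {
        let intent = s.spawn(|| intent_task(texts));
        let pii = s.spawn(|| pii_task(texts));
        let security = s.spawn(|| security_task(texts));
        (intent.join(), pii.join(), security.join())
    });

    Ok((
        intent.map_err(join_err)?,
        pii.map_err(join_err)?,
        security.map_err(join_err)?,
    ))
}
```

Each handle returns its result directly, so the shared `Arc` result holders from the original diff are not needed in this shape.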


Since we're on the topic of threads - you may like some of the abstractions that the rayon crate provides

Collaborator Author

thank you @ivarflakstad

Comment on lines +162 to +168
pub fn parallel_detect(&self, texts: &[&str]) -> Result<Vec<PIIResult>> {
    let mut results = Vec::new();
    for text in texts {
        results.push(self.detect_pii(text)?);
    }
    Ok(results)
}


If you want this to be in parallel you could do something like

// add `use rayon::prelude::*;` at top of file
// collect() gathers the per-item Results into a single Result<Vec<PIIResult>>
texts.par_iter().map(|text| self.detect_pii(text)).collect()
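
A note on error handling in this pattern: `?` cannot propagate out of a closure passed to `map`, so the usual approach is to let the closure return a `Result` and collect into `Result<Vec<_>, _>`. The sketch below shows this with standard-library iterators only. `PiiResult`, `detect_pii`, and the `String` error type are hypothetical stand-ins for the PR's types.

```rust
#[derive(Debug, PartialEq)]
struct PiiResult {
    has_pii: bool,
}

// Hypothetical detector; it rejects empty input so the error path is visible.
fn detect_pii(text: &str) -> Result<PiiResult, String> {
    if text.is_empty() {
        return Err("empty input".to_string());
    }
    Ok(PiiResult {
        has_pii: text.contains('@'),
    })
}

fn detect_all(texts: &[&str]) -> Result<Vec<PiiResult>, String> {
    // Collecting an iterator of Result<T, E> into Result<Vec<T>, E>
    // yields the first Err encountered, or all Ok values in order.
    texts.iter().map(|text| detect_pii(text)).collect()
}
```

With rayon, swapping `.iter()` for `.par_iter()` should work the same way, assuming the detector can be shared across threads (`Sync`) and the item and error types are `Send`. When several items fail, rayon does not guarantee which error is returned.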


Though I'm starting to suspect that what you actually want, for the long term, is an async runtime.

Collaborator Author

@ivarflakstad thanks for looking into this. On a separate note, for async to run most efficiently, would you help check whether the locking is done the right way?


Sure :)
Are you thinking about any specific locks in particular? (pr is fairly large 😉 )

Collaborator Author

thank you @ivarflakstad

classify_text currently runs while holding a lock, which could cost us performance. Would you share your ideas? Thanks
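
To make the concern concrete, here is one common way to shrink such a critical section, offered as a sketch rather than a description of the current code. It assumes inference only needs `&self`, so the lock can guard just the slot that holds the model: the caller clones an `Arc` under a short read lock and runs inference after the guard is released. `Classifier`, `State`, and the length-based "classification" are hypothetical.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical classifier; the key assumption is that inference takes &self.
struct Classifier {
    threshold: usize,
}

impl Classifier {
    fn classify_text(&self, text: &str) -> bool {
        text.len() > self.threshold
    }
}

// Hypothetical state holder: the lock protects the slot, not the inference call.
struct State {
    classifier: RwLock<Option<Arc<Classifier>>>,
}

impl State {
    fn new() -> Self {
        State {
            classifier: RwLock::new(None),
        }
    }

    // Writers take the lock briefly to install or replace the model.
    fn init(&self, threshold: usize) {
        *self.classifier.write().unwrap() = Some(Arc::new(Classifier { threshold }));
    }

    fn classify_text(&self, text: &str) -> Option<bool> {
        // The read guard is a temporary and is dropped at the end of this statement,
        // so only the Arc clone happens under the lock.
        let model = self.classifier.read().unwrap().clone()?;
        // Inference runs without holding any lock.
        Some(model.classify_text(text))
    }
}
```

Concurrent callers then contend only for the brief `Arc` clone. If the model needs `&mut self` during inference, this approach does not apply as written.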

4 participants