[WIP] refactor: Implement modular candle-binding architecture (#254) #266
base: main
Conversation
refactor: Implement modular candle-binding architecture

- Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/)
- Add unified error handling and configuration loading systems
- Implement dual-path architecture for traditional and LoRA models
- Add comprehensive FFI layer with memory safety

Maintains backward compatibility while enabling future model integrations.

Signed-off-by: OneZero-Y <[email protected]>
@OneZero-Y @Xunzhuo Let's have the following resolved before merging
feat: unit tests for candle refactoring

Signed-off-by: OneZero-Y <[email protected]>
@OneZero-Y now that we're working on the feature branch, how about using this branch for both the refactoring and the new embedding models?

@rootfs OK, I'll advance the embedding model on this branch.

@OneZero-Y that's great! I'll switch to this work as soon as I can.
```rust
let handles = vec![
    self.spawn_intent_task(texts_owned.clone(), Arc::clone(&intent_results)),
    self.spawn_pii_task(texts_owned.clone(), Arc::clone(&pii_results)),
    self.spawn_security_task(texts_owned, Arc::clone(&security_results)),
];

// Wait for all threads to complete
for handle in handles {
    handle.join().map_err(|_| {
        let unified_err = concurrency_error(
            "thread join",
            "Failed to join parallel classification thread",
        );
        candle_core::Error::from(unified_err)
    })?;
}
```
This could be simplified a bit. Something like:

```rust
let intent_handle = thread::spawn(|| intent_task(texts)); // slice is fine, no need to own the data
let pii_handle = ...;     // same
let security_handle = ...; // same

let intent_results = intent_handle.join()?; // map_err omitted
let pii_results = pii_handle.join()?;
let security_results = security_handle.join()?;
```
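Plain `thread::spawn` requires `'static` captures, so borrowing a slice as suggested needs `std::thread::scope`. A minimal sketch of that pattern, with hypothetical stand-in task functions (`intent_task`, `pii_task`, `security_task` here are illustrative, not the PR's actual signatures):

```rust
use std::thread;

// Hypothetical stand-ins for the real classifier tasks.
fn intent_task(texts: &[&str]) -> Vec<String> {
    texts.iter().map(|t| format!("intent:{t}")).collect()
}
fn pii_task(texts: &[&str]) -> Vec<String> {
    texts.iter().map(|t| format!("pii:{t}")).collect()
}
fn security_task(texts: &[&str]) -> Vec<String> {
    texts.iter().map(|t| format!("security:{t}")).collect()
}

fn main() {
    let texts = ["hello", "world"];
    // `thread::scope` lets spawned threads borrow `texts` directly,
    // so no cloning into owned Strings is needed.
    let (intent, pii, security) = thread::scope(|s| {
        let intent_handle = s.spawn(|| intent_task(&texts));
        let pii_handle = s.spawn(|| pii_task(&texts));
        let security_handle = s.spawn(|| security_task(&texts));
        (
            intent_handle.join().expect("intent thread panicked"),
            pii_handle.join().expect("pii thread panicked"),
            security_handle.join().expect("security thread panicked"),
        )
    });
    assert_eq!(intent[0], "intent:hello");
    assert_eq!(pii.len(), 2);
    assert_eq!(security[1], "security:world");
}
```

Scoped threads are guaranteed to finish before `thread::scope` returns, which is why the borrow of `texts` is sound.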
Since we're on the topic of threads, you may like some of the abstractions that the rayon crate provides.
thank you @ivarflakstad
```rust
pub fn parallel_detect(&self, texts: &[&str]) -> Result<Vec<PIIResult>> {
    let mut results = Vec::new();
    for text in texts {
        results.push(self.detect_pii(text)?);
    }
    Ok(results)
}
```
If you want this to run in parallel you could do something like:

```rust
// add `use rayon::prelude::*;` at the top of the file
texts.par_iter().map(|text| self.detect_pii(text)).collect()
```

Note that `collect()` can gather an iterator of `Result`s into a single `Result<Vec<_>>`, short-circuiting on the first error, so no `?` is needed inside the closure.
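The collect-into-`Result` behavior this suggestion relies on can be sketched with plain iterators (swap `iter` for rayon's `par_iter` to parallelize without changing the shape; `detect_pii` below is a hypothetical stand-in):

```rust
// Hypothetical stand-in for the real detector.
fn detect_pii(text: &str) -> Result<String, String> {
    if text.is_empty() {
        Err("empty input".to_string())
    } else {
        Ok(format!("ok:{text}"))
    }
}

fn main() {
    // All items succeed: collect yields Ok(Vec<..>).
    let good: Result<Vec<String>, String> =
        ["alice", "bob"].iter().map(|t| detect_pii(t)).collect();
    assert!(good.is_ok());
    assert_eq!(good.unwrap().len(), 2);

    // One item fails: collect short-circuits to the first Err.
    let bad: Result<Vec<String>, String> =
        ["alice", ""].iter().map(|t| detect_pii(t)).collect();
    assert!(bad.is_err());
}
```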
Though I'm starting to suspect that what you actually want, for the long term, is an async runtime.
@ivarflakstad thanks for looking into this. On a separate note, for async to run most efficiently, would you help look at whether the locking is done the right way?
Sure :)
Are you thinking about any specific locks in particular? (pr is fairly large 😉 )
thank you @ivarflakstad
`classify_text` is currently protected by a lock. This could cost us performance; would you share your ideas? Thanks.
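If `classify_text` only reads model state, one common way to reduce contention is to replace a `Mutex` with an `RwLock`, so concurrent readers don't serialize each other. A minimal sketch under that assumption (the `Classifier` type and `classify_text` signature here are illustrative, not the PR's actual ones):

```rust
use std::sync::RwLock;
use std::thread;

// Hypothetical stand-in for the classifier state.
struct Classifier {
    label: String,
}

impl Classifier {
    // Read-only inference: takes &self, so it can run under a read lock.
    fn classify_text(&self, text: &str) -> String {
        format!("{}:{}", self.label, text)
    }
}

fn main() {
    let clf = RwLock::new(Classifier { label: "intent".to_string() });

    // Multiple readers classify concurrently; a Mutex would serialize
    // them even though nothing is mutated.
    thread::scope(|s| {
        for text in ["a", "b", "c"] {
            let clf = &clf;
            s.spawn(move || {
                let guard = clf.read().expect("lock poisoned");
                let out = guard.classify_text(text);
                assert!(out.starts_with("intent:"));
            });
        }
    });
}
```

This only helps if inference really is `&self`; if the model mutates internal buffers per call, per-thread model instances or an async-aware lock would be the alternatives.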
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Release Notes: Yes/No