[ChatQnA Core] Ollama Integration #890
base: main
Conversation
Update packages to fix vulnerability
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
Change opt
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
Have not checked the config files and test files in detail yet
download_huggingface_model(config.EMBEDDING_MODEL_ID, config._CACHE_DIR)
download_huggingface_model(config.RERANKER_MODEL_ID, config._CACHE_DIR)
download_huggingface_model(config.LLM_MODEL_ID, config._CACHE_DIR)
elif config.MODEL_BACKEND == "ollama":
Thinking out loud: can we use "runtime" instead of "backend", or a better name? The reason being that Ollama can have an OpenVINO backend too, though that is not the case currently.
Sure, we can rename that.
Changes updated.
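For illustration, a minimal sketch of what the dispatch could look like after the rename to "runtime", mirroring the snippet above; MODEL_RUNTIME, OpenVINORuntime, and OllamaRuntime are assumed names, not the actual identifiers in this PR.

def create_runtime(config):
    # Hypothetical: the config flag now selects a "runtime" rather than a "backend".
    if config.MODEL_RUNTIME == "openvino":
        # HuggingFace weights are only needed for the OpenVINO path.
        download_huggingface_model(config.EMBEDDING_MODEL_ID, config._CACHE_DIR)
        download_huggingface_model(config.RERANKER_MODEL_ID, config._CACHE_DIR)
        download_huggingface_model(config.LLM_MODEL_ID, config._CACHE_DIR)
        return OpenVINORuntime(config)
    if config.MODEL_RUNTIME == "ollama":
        return OllamaRuntime(config)
    raise ValueError(f"Unsupported model runtime: {config.MODEL_RUNTIME}")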
search_kwargs={"k": 3, "score_threshold": 0.5}, search_type=search_method | ||
search_kwargs={ | ||
"k": 3, | ||
"fetch_k": fetch_k, |
Do we have any empirical numbers on the quality of retrieval with the threshold vs. top-k strategy?
Both are fine; for now I am aligning with what the modular version does to keep the design the same.
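For reference, a sketch of the two retrieval configurations being compared, assuming a LangChain vector store; the "mmr" search type and the fetch_k default are assumptions, since the diff only shows a search_method variable.

from langchain_core.vectorstores import VectorStore


def build_retrievers(vector_store: VectorStore, fetch_k: int = 20):
    # Previous behaviour: only return chunks whose similarity score clears 0.5.
    threshold_retriever = vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"k": 3, "score_threshold": 0.5},
    )
    # Behaviour aligned with the modular pipeline: draw the final 3 chunks from a
    # larger fetch_k candidate pool.
    topk_retriever = vector_store.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 3, "fetch_k": fetch_k},
    )
    return threshold_retriever, topk_retriever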
else:
    raise ValueError(f"Unsupported model backend: {config.MODEL_BACKEND}")

embedding, llm, reranker = backend_instance.init_models()
What about the initialization of the LLM, embedding, and reranker in test mode?
In test mode, the initial design is that model download, conversion, and initialization are mocked via pytest. So in test mode we are not actually going to download the model or do the conversion and initialization; those processes are mocked.
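An illustrative pytest sketch of that mocking approach; the helper names are taken from the diff, but the test itself and the use of bare MagicMock stand-ins (rather than the PR's actual patch targets) are assumptions.

from unittest.mock import MagicMock


def test_models_are_mocked_in_test_mode():
    # Stand-ins for the download/convert helpers and the runtime backend; in the
    # real test suite these would be patched via pytest monkeypatch/unittest.mock.
    mock_download = MagicMock(name="download_huggingface_model")
    mock_convert = MagicMock(name="convert_model")
    backend_instance = MagicMock(name="backend_instance")
    backend_instance.init_models.return_value = (MagicMock(), MagicMock(), MagicMock())

    # The code under test calls init_models(); the mock returns fake
    # embedding/llm/reranker objects without touching any real model files.
    embedding, llm, reranker = backend_instance.init_models()

    mock_download.assert_not_called()
    mock_convert.assert_not_called()
    assert all(m is not None for m in (embedding, llm, reranker))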
if not model_id:
    raise ValueError(f"{model_name} must not be an empty string.")

def _validate_backend_settings(self):
The validate function is common across backends, which makes ongoing maintenance harder. Is it instead possible to have validation as part of a backend-specific configuration class?
Yes, we will try to figure this out.
Changes updated. Implemented a validator class that groups these validators under the corresponding runtime. In future, a new runtime can have its validator implemented under that class.
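An illustrative sketch of grouping per-runtime validation under one class; the class and method names are assumptions, not the exact implementation in this PR.

class RuntimeValidator:
    """Collects validation logic per supported runtime."""

    @staticmethod
    def validate_openvino(config):
        # OpenVINO needs all three HuggingFace model IDs.
        for name, model_id in [
            ("EMBEDDING_MODEL_ID", config.EMBEDDING_MODEL_ID),
            ("RERANKER_MODEL_ID", config.RERANKER_MODEL_ID),
            ("LLM_MODEL_ID", config.LLM_MODEL_ID),
        ]:
            if not model_id:
                raise ValueError(f"{name} must not be an empty string.")

    @staticmethod
    def validate_ollama(config):
        # Ollama only needs the LLM model ID in this sketch.
        if not config.LLM_MODEL_ID:
            raise ValueError("LLM_MODEL_ID must not be an empty string.")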
# Set the `OLLAMA_MODELS` to store the Ollama models
os.environ['OLLAMA_MODELS'] = f"{self.cache_dir}/ollama_models"

try:
Is the check for an already active server part of Ollama's own handling? If not, should we add some logic for that? Also ensure no zombie process is left behind with respect to cleanup, etc.
The Ollama server is only started once, at the beginning of the application. The process is spawned within the container, so when the container is terminated the process is terminated as well.
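A minimal sketch of that startup path, assuming the default Ollama port 11434; the guard for an already-running server reflects the reviewer's suggestion and is not necessarily what the PR implements.

import os
import subprocess
import urllib.error
import urllib.request


def start_ollama_server(cache_dir: str, host: str = "http://127.0.0.1:11434") -> None:
    # Store pulled models under the application cache directory.
    os.environ["OLLAMA_MODELS"] = f"{cache_dir}/ollama_models"

    # Reuse an already-running server instead of spawning a second one.
    try:
        with urllib.request.urlopen(host, timeout=2) as resp:
            if resp.status == 200:
                return
    except (urllib.error.URLError, OSError):
        pass

    # Spawn the server once; it lives and dies with the container, so no separate
    # cleanup is needed when the container terminates.
    subprocess.Popen(["ollama", "serve"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)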
from datetime import datetime, timezone
import ollama
import os
import re
Is it used?
Yes, re is used on line 91, in the exception block, to strip the ANSI codes from the Ollama server error output. This gives a better view of the error if anything goes wrong when starting the Ollama server.
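A small sketch of that kind of ANSI cleanup; the exact pattern used in the PR may differ.

import re

# Matches SGR colour/style escape sequences such as "\x1b[31m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def clean_server_error(raw: bytes) -> str:
    """Strip ANSI colour codes from Ollama server stderr for readable error logs."""
    return ANSI_ESCAPE.sub("", raw.decode("utf-8", errors="replace")).strip()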
@router.get("/ollama-model", tags=["Model API"], summary="Get OLLAMA model metadata")
async def get_ollama_model_metadata(model_id: str = ""):
    """
What if model_id is ""? Are we setting it to a non-empty value in the config file?
No, that is the default value for the endpoint. This is just a utility route specific to Ollama for getting or showing model metadata. Instead of going into the container and running the 'ollama ps' or 'ollama show <model_id>' CLI commands, I provide these routes for debug and utility purposes.
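A minimal sketch of such a utility route, assuming the ollama Python client's show() and list() calls; the PR's actual route and error handling are likely richer.

import ollama
from fastapi import APIRouter

router = APIRouter()


@router.get("/ollama-model", tags=["Model API"], summary="Get OLLAMA model metadata")
async def get_ollama_model_metadata(model_id: str = ""):
    """Return metadata for one Ollama model, or list all local models if no id is given."""
    if model_id:
        # Equivalent of `ollama show <model_id>` on the CLI.
        return ollama.show(model_id)
    # Equivalent of `ollama list` on the CLI.
    return ollama.list()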
model_path = os.path.join(cache_dir, model_id)

if os.path.isdir(model_path):
    logger.info(f"Optimized {model_id} exists in {cache_dir}. Skipping conversion...")
What if it was not converted correctly? Is folder presence alone sufficient? Check whether you can place a marker at the end of a successful conversion, or add a checksum-based check.
uiTag: "core_1.2.2" | ||
pullPolicy: IfNotPresent | ||
tags: | ||
ui: "core_1.3.0" |
What changed in the UI?
An nginx image upgrade and a path change from /usr/ to /opt/ for storing the nginx config, plus fixes for UI package vulnerabilities.
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
ChatQnA modular changes are already merged; please remove them @hteeyeoh
Done. Resolved the conflict.
Description
Integrate the Ollama framework into ChatQnA Core. Users can choose to download models from Ollama and serve them using Ollama.
Fixes # (issue)
Any Newly Introduced Dependencies
How Has This Been Tested?
Tested on a local development system and a development Kubernetes cluster. Unit tests have been included using the pytest framework.
Checklist: