Conversation

Contributor

@hteeyeoh hteeyeoh commented Sep 3, 2025

Description

Integrate the Ollama framework into ChatQnA Core. Users can choose to download models from Ollama and serve them using Ollama.

Fixes # (issue)

Any Newly Introduced Dependencies

  • ollama binary
  • langchain-ollama

How Has This Been Tested?

Tested on a local development system and a development Kubernetes cluster. Unit tests have been included using the pytest framework.

Checklist:

  • I agree to use the APACHE-2.0 license for my code changes.
  • I have not introduced any 3rd party components incompatible with APACHE-2.0.
  • I have not included any company confidential information, trade secret, password or security token.
  • I have performed a self-review of my code.

14pankaj and others added 16 commits August 4, 2025 16:00
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
Update packages to fix vulnerability

Signed-off-by: Yeoh, Hoong Tee <[email protected]>
Contributor

@bharagha bharagha left a comment


Have not checked the config files and test files in detail yet

download_huggingface_model(config.EMBEDDING_MODEL_ID, config._CACHE_DIR)
download_huggingface_model(config.RERANKER_MODEL_ID, config._CACHE_DIR)
download_huggingface_model(config.LLM_MODEL_ID, config._CACHE_DIR)
elif config.MODEL_BACKEND == "ollama":
Contributor

Thinking out loud: can we use "runtime" instead of "backend", or some better name? The reason is that Ollama can have an OpenVINO backend too, though that is not the case currently.

Contributor Author

Sure, we can rename that.

Contributor Author

Changes updated.
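
Roughly, the renamed selection could look like the sketch below. Only download_huggingface_model and the config attributes come from the diff above; the MODEL_RUNTIME name, the "openvino" branch value, and the OllamaRuntime helper are illustrative, not the exact code in this PR.

def prepare_models(config):
    # Dispatch model preparation based on the selected runtime.
    if config.MODEL_RUNTIME == "openvino":
        download_huggingface_model(config.EMBEDDING_MODEL_ID, config._CACHE_DIR)
        download_huggingface_model(config.RERANKER_MODEL_ID, config._CACHE_DIR)
        download_huggingface_model(config.LLM_MODEL_ID, config._CACHE_DIR)
    elif config.MODEL_RUNTIME == "ollama":
        # Hypothetical helper that pulls the LLM through the Ollama server instead.
        OllamaRuntime(config).pull_models()
    else:
        raise ValueError(f"Unsupported model runtime: {config.MODEL_RUNTIME}")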

search_kwargs={"k": 3, "score_threshold": 0.5}, search_type=search_method
search_kwargs={
"k": 3,
"fetch_k": fetch_k,
Contributor

Do we have any empirical numbers on retrieval quality with the threshold vs. top-k strategy?

Contributor Author

Both are fine; for now I am trying to align with what the modular version does to keep the design the same.
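
For context, the two strategies being compared map onto the generic LangChain as_retriever call roughly as follows; vector_store, fetch_k, and search_method are assumed from the surrounding code and may not match the exact names in this PR.

if search_method == "similarity_score_threshold":
    # Threshold strategy: return at most 3 chunks scoring above 0.5.
    retriever = vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"k": 3, "score_threshold": 0.5},
    )
else:
    # Top-k / MMR-style strategy: widen the candidate pool with fetch_k, then keep 3.
    retriever = vector_store.as_retriever(
        search_type=search_method,
        search_kwargs={"k": 3, "fetch_k": fetch_k},
    )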

else:
raise ValueError(f"Unsupported model backend: {config.MODEL_BACKEND}")

embedding, llm, reranker = backend_instance.init_models()
Contributor

What about the initialization of the LLM, embedding, and reranker in test mode?

Contributor Author

In test mode, the initial design is that model download, conversion, and initialization are mocked via pytest. So in test mode we do not actually download the model or perform the conversion and initialization; those processes are mocked.
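
As a rough illustration of that mocking approach (the module path app.models and the function names below are hypothetical, not the actual test code):

from unittest.mock import MagicMock, patch

def test_model_setup_is_mocked():
    # No real download, conversion, or initialization happens; everything is patched.
    with patch("app.models.download_huggingface_model") as download, \
         patch("app.models.convert_model") as convert, \
         patch("app.models.init_models",
               return_value=(MagicMock(), MagicMock(), MagicMock())) as init:
        embedding, llm, reranker = init()  # stand-ins, never touch the network
        download.assert_not_called()
        convert.assert_not_called()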

if not model_id:
raise ValueError(f"{model_name} must not be an empty string.")

def _validate_backend_settings(self):
Contributor

The validate function is common across backends, so maintenance etc. becomes tougher on a continuous basis. Is it instead possible to have validation as part of a backend-specific configuration class?

Contributor Author

Yes, we can; I will try to figure this out.

Contributor Author

Changes updated. I implemented a validator class that groups those validators under the corresponding runtime. In the future, a new runtime can have its validators implemented under that class.
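
The idea is roughly the following; class and method names, and which model IDs each runtime requires, are illustrative rather than the exact implementation in this PR.

class RuntimeValidator:
    def __init__(self, config):
        self.config = config

    def _validate_model_id(self, model_id, model_name):
        if not model_id:
            raise ValueError(f"{model_name} must not be an empty string.")

    def validate_openvino(self):
        # Assumed: the OpenVINO runtime needs all three models.
        self._validate_model_id(self.config.EMBEDDING_MODEL_ID, "EMBEDDING_MODEL_ID")
        self._validate_model_id(self.config.RERANKER_MODEL_ID, "RERANKER_MODEL_ID")
        self._validate_model_id(self.config.LLM_MODEL_ID, "LLM_MODEL_ID")

    def validate_ollama(self):
        # Assumed: the Ollama runtime only serves the LLM.
        self._validate_model_id(self.config.LLM_MODEL_ID, "LLM_MODEL_ID")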

# Set the `OLLAMA_MODELS` to store the Ollama models
os.environ['OLLAMA_MODELS'] = f"{self.cache_dir}/ollama_models"

try:
Contributor

Is the check for an already-active server part of Ollama's own handling? If not, should we add some logic for that? Also ensure no zombie process is left behind with respect to cleanup, etc.

Contributor Author

The Ollama server starts only once, at the beginning of the application. The process is spawned within the container, so when the container is terminated the process is terminated as well.
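
For illustration, the startup could look something like this; the already-running check via ollama.list() is a suggestion rather than the exact code in this PR.

import os
import subprocess

import ollama

def start_ollama_server(cache_dir):
    # Store pulled models under the shared cache directory.
    os.environ["OLLAMA_MODELS"] = f"{cache_dir}/ollama_models"
    try:
        ollama.list()  # succeeds only if a server is already listening
        return None    # already active, nothing to spawn
    except Exception:
        # Spawned inside the container, so it goes away when the container does.
        return subprocess.Popen(["ollama", "serve"])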

from datetime import datetime, timezone
import ollama
import os
import re
Contributor

Is it used?

Contributor Author

Yes, re is used in line 91, in the exception block, to clean the ANSI codes from the Ollama server error output. This gives a better view of the error if anything goes wrong when starting the Ollama server.
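
The cleanup described is a common pattern along these lines (not necessarily the exact regex used at line 91):

import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[a-zA-Z]")

def strip_ansi(text):
    # Remove ANSI color/control sequences so the Ollama server error logs cleanly.
    return ANSI_ESCAPE.sub("", text)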


@router.get("/ollama-model", tags=["Model API"], summary="Get OLLAMA model metadata")
async def get_ollama_model_metadata(model_id: str = ""):
"""
Contributor

What if model_id is ""? Are we setting it to a non-null value in the config file?

Contributor Author

No, this is the default value for the endpoint. This is just a utility route, specific to Ollama, to get or show model metadata details. Instead of going into the container and running the 'ollama ps' or 'ollama show <model_id>' CLI commands, I provide these routes for debugging and utility purposes.
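
A hedged sketch of that utility route, using the public ollama Python client; the actual response shape in the PR may differ.

import ollama
from fastapi import APIRouter, HTTPException

router = APIRouter()

@router.get("/ollama-model", tags=["Model API"], summary="Get OLLAMA model metadata")
async def get_ollama_model_metadata(model_id: str = ""):
    """Return metadata for one model, or the full model list when no id is given."""
    try:
        if not model_id:
            return ollama.list()      # roughly the `ollama list` CLI output
        return ollama.show(model_id)  # roughly `ollama show <model_id>`
    except Exception as err:
        raise HTTPException(status_code=404, detail=str(err))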

model_path = os.path.join(cache_dir, model_id)

if os.path.isdir(model_path):
logger.info(f"Optimized {model_id} exists in {cache_dir}. Skipping conversion...")
Contributor

What if it was not converted correctly? Is the mere presence of a folder sufficient?
Check if you can place a marker at the end of a successful conversion, or add a checksum-based check.
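
For example, a marker written only after a successful conversion would make the skip check more robust (illustrative only):

import os

_MARKER = ".conversion_complete"

def conversion_is_complete(cache_dir, model_id):
    # Require both the folder and the sentinel written after a successful conversion.
    model_path = os.path.join(cache_dir, model_id)
    return os.path.isdir(model_path) and os.path.isfile(os.path.join(model_path, _MARKER))

def mark_conversion_complete(cache_dir, model_id):
    with open(os.path.join(cache_dir, model_id, _MARKER), "w") as marker_file:
        marker_file.write("ok\n")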

uiTag: "core_1.2.2"
pullPolicy: IfNotPresent
tags:
ui: "core_1.3.0"
Contributor

What changed in the UI?

Contributor Author

An nginx image upgrade and a path change from /usr/ to /opt/ for storing the nginx config, plus fixes for UI package vulnerabilities.

madhuri-rai07 previously approved these changes Sep 26, 2025
@madhuri-rai07
Contributor

ChatQnA modular changes are already merged, please remove them @hteeyeoh

@hteeyeoh
Contributor Author

ChatQnA modular changes are already merged, please remove them @hteeyeoh

Done. Resolved the conflict.
