Hi,
When using Ollama and passing "keep_alive" in "language_model_params", the model is still loaded with Ollama's default keep_alive of 5 minutes instead of the requested value.
```python
import os

import langextract as lx

# input_text, prompt and examples are defined elsewhere in the calling code.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    language_model_type=lx.inference.OllamaLanguageModel,
    model_id="qwen2.5:14b",
    model_url=os.getenv("OLLAMA_HOST", "http://localhost:11434"),
    temperature=0.3,
    fence_output=False,
    use_schema_constraints=False,
    max_char_buffer=5000,
    language_model_params={
        "num_ctx": 8192,
        "keep_alive": 10 * 60,  # 10 minutes (in seconds)
        "timeout": 10 * 60,     # 10 minutes (in seconds)
    },
)
```
To verify (assuming the model was not already in memory), run the following; it shows the model is kept loaded for only 5 minutes:
```shell
ollama ps
```
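To check the same thing from code instead of the CLI, here is a minimal sketch that reads Ollama's GET /api/ps endpoint (the HTTP counterpart of ollama ps) and reports how many minutes remain before each loaded model is unloaded. It assumes the server is reachable at OLLAMA_HOST, as in the call above, and that each entry in the response's "models" list carries a "name" and an RFC 3339 "expires_at" timestamp; fetch_ps, parse_expires_at and remaining_minutes are hypothetical helper names.

```python
import json
import os
import re
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_expires_at(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2024-06-04T14:38:31.83753-07:00'."""
    # Drop fractional seconds (Go may emit up to 9 digits) and map 'Z' to an
    # explicit UTC offset so datetime.fromisoformat() accepts it on older Pythons.
    ts = re.sub(r"\.\d+", "", ts).replace("Z", "+00:00")
    return datetime.fromisoformat(ts)


def remaining_minutes(
    ps_response: Dict[str, Any], now: Optional[datetime] = None
) -> Dict[str, float]:
    """Map each loaded model name to the minutes left before Ollama unloads it."""
    now = now or datetime.now(timezone.utc)
    return {
        m["name"]: (parse_expires_at(m["expires_at"]) - now).total_seconds() / 60
        for m in ps_response.get("models", [])
    }


def fetch_ps(host: Optional[str] = None) -> Dict[str, Any]:
    """GET /api/ps from a running Ollama server (assumed reachable at host)."""
    host = host or os.getenv("OLLAMA_HOST", "http://localhost:11434")
    with urllib.request.urlopen(f"{host}/api/ps") as resp:
        return json.load(resp)
```

Right after an extraction run, remaining_minutes(fetch_ps()) should report roughly 5 minutes for qwen2.5:14b while the bug is present, rather than the requested 10.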
In the Ollama.py file, keep_alive appears to be placed inside the "options" parameter, but the Ollama API documentation lists it as a top-level request parameter, so the payload should be:
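```python
from typing import Any

# keep_alive sits at the top level of the /api/generate request body,
# next to "options", rather than inside it.
payload: dict[str, Any] = {
    'model': model,
    'prompt': prompt,
    'system': system,
    'stream': False,
    'raw': raw,
    'keep_alive': keep_alive,
    'options': options,
}
```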
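One way to apply this on the provider side is to pull keep_alive out of the user-supplied parameters before the remainder is forwarded as model options. The helper below is a hypothetical sketch, not the library's actual code: build_generate_payload and TOP_LEVEL_KEYS are illustrative names, and it assumes params holds only Ollama model options plus keep_alive (client-side settings such as timeout would need to be handled separately). It uses the same payload keys as above.

```python
from typing import Any, Dict

# Hypothetical: keys that /api/generate expects at the top level of the
# request body rather than inside "options". Only keep_alive is handled here.
TOP_LEVEL_KEYS = ("keep_alive",)


def build_generate_payload(
    model: str,
    prompt: str,
    params: Dict[str, Any],
    system: str = "",
    raw: bool = False,
) -> Dict[str, Any]:
    """Route keep_alive to the top level and everything else into options."""
    options = {k: v for k, v in params.items() if k not in TOP_LEVEL_KEYS}
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "system": system,
        "stream": False,
        "raw": raw,
        "options": options,
    }
    for key in TOP_LEVEL_KEYS:
        if key in params:
            # e.g. 600 (seconds) or a duration string such as "10m"
            payload[key] = params[key]
    return payload
```

For example, build_generate_payload("qwen2.5:14b", text, {"num_ctx": 8192, "keep_alive": 600}) puts "keep_alive": 600 at the top level and leaves {"num_ctx": 8192} under "options". Ollama interprets a numeric keep_alive as seconds and also accepts duration strings such as "10m".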