
Cannot search with Ollama #11

@raffaem

Description

I have set it up with Ollama.

Here is my model_config.py:

import os

"""
This module defines the configuration for the language model (LLM) and embedding models.
Attributes:
    api_key (str): The API key, loaded from an environment variable (here 'GOOGLE_API_KEY', with 'DUMMY' as the fallback for local models).
    model_config (dict): A dictionary containing configuration parameters for LLM and embedding models.
        Keys:
            - "llm_model_name" (str): Name of the LLM model to use.
            - "llm_type" (str): Type of the LLM provider (e.g., "openai"). "local" is for lmstudio, 
            for ollama and other local models use "others" with base_url updated in openai_compatible.
            - If you using others llm type, then check the openai_compatible url dict for others key, you can generally 
            find it by "googling YOUR provider name openai api base compatilble url"
            - "llm_tools" (list): List of tools or plugins to use with the LLM.
            - "llm_kwargs" (dict): Additional keyword arguments for LLM initialization.
                - "temperature" (float): Sampling temperature for generation.
                - "max_tokens" (int or None): Maximum number of tokens to generate.
                - "timeout" (int or None): Timeout for API requests.
                - "max_retries" (int): Maximum number of retries for failed requests.
                - "api_key" (str): API key for authentication.
            - "embedding_model_name" (str): Name of the embedding model to use.
            - "embed_mode" (str): Embedding mode or backend.
            - "cross_encoder_name" (str): Name of the cross-encoder model for reranking.
"""
############## PORT and HOST SETTINGS
PORT_NUM_SEARXNG = 8085
PORT_NUM_APP = 8000
HOST_APP = "localhost"
HOST_SEARXNG = "localhost"
###############

## USER INPUTS NEEDED
# For open-source/local models you can leave these as 'DUMMY' (for both llm and embed); otherwise use the respective provider's key.
llm_api_key = os.environ.get(
    "GOOGLE_API_KEY", "DUMMY"
)  # either paste the LLM key for your provider (for instance, Google) here directly or export it in the env; 'DUMMY' for local
embed_api_key = os.environ.get(
    "GOOGLE_API_KEY", "DUMMY"
)  # either paste the embedder key for your provider (for instance, Google) here directly or export it in the env; 'DUMMY' for local

model_config = {
    # Name of the LLM model to use. For local models, use the model name served by your local server.
    "llm_model_name": "gpt-oss",
    # LLM provider type: choose from 'google', 'local', 'groq', 'openai', or 'others'
    # (for 'others', the base URL needs to be updated in the `openai_compatible` dictionary below).
    # Make sure to update the api_key variable above to match the provider.
    # "local" is for LM Studio; for Ollama and other local models, use "others" with the base_url updated in openai_compatible.
    # You can generally find the URL by googling "YOUR PROVIDER (for example, ollama) openai api compatible base url".
    "llm_type": "others",
    # List of tools or plugins to use with the LLM, if any. Set to None if not used.
    "llm_tools": None,
    # Additional keyword arguments for LLM initialization.
    "llm_kwargs": {
        "temperature": 0.1,  # Sampling temperature for generation.
        "max_tokens": None,  # Maximum number of tokens to generate (None for default).
        "timeout": None,  # Timeout for API requests (None for default).
        "max_retries": 2,  # Maximum number of retries for failed requests.
        "api_key": llm_api_key,  # API key for authentication.
    },
    # Name of the embedding model to use.
    # For Google, use their embedding model names. For local/HuggingFace, use the model path or name.
    # Tested models can be found at https://github.com/michaelfeil/infinity?tab=readme-ov-file#supported-tasks-and-models-by-infinity
    "embedding_model_name": "mixedbread-ai/mxbai-embed-large-v1",
    "embed_kwargs": {},  # optional additional kwargs for embedding model initialization
    # Embedding backend: 'google' for Google, 'infinity_emb' for local/HuggingFace models.
    "embed_mode": "infinity_emb",
    # Name of the cross-encoder model for reranking, typically a HuggingFace model.
    "cross_encoder_name": "BAAI/bge-reranker-base",
}

# NO CHANGE NEEDED UNLESS A PROVIDER CHANGES ITS BASE URL, OR YOU WANT TO USE A DIFFERENT PROVIDER UNDER "others"
openai_compatible = {
    "google": "https://generativelanguage.googleapis.com/v1beta/openai/",
    "local": "http://127.0.0.1:1234/v1",
    "groq": "https://api.groq.com/openai/v1",
    "openai": "https://api.openai.com/v1",
    "others": "http://localhost:11434/v1",  # Ollama default port
}
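
The "others" endpoint itself is reachable (the log below shows 200 OK from port 11434). For reference, a minimal direct smoke test, as a sketch assuming Ollama is serving gpt-oss on its default port:

import requests

# Sanity check: talk to the "others" base_url directly, bypassing the app.
base_url = "http://localhost:11434/v1"
resp = requests.post(
    f"{base_url}/chat/completions",
    json={
        "model": "gpt-oss",  # must match "llm_model_name" above
        "messages": [{"role": "user", "content": "Say OK."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

If this prints a normal reply, plain chat completions work and the problem is isolated to the structured-output path.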

But when I try to run a search:

>>> import requests
>>> res = requests.post("http://localhost:8000/web-search", json={"query": "When was Napoleon born?"})
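
The POST itself returns HTTP 200 even though the agent fails internally (see the last log line below), so the failure is only visible in the response body; its exact schema is app-specific:

>>> res.status_code
200
>>> res.json()  # inspect the body for the reported error; exact shape is app-specific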

In the Coexist interface I get:

2025-09-28 10:39:48.742 | INFO     | logging:callHandlers:1737 | 🔍 PROFILER [When was Napoleon born?...]: Starting query_agent - Generating search queries from user input
2025-09-28 10:40:05.446 | INFO     | logging:callHandlers:1737 | HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-28 10:40:05.472 | WARNING  | logging:callHandlers:1737 | Structured output failed: Structured Output response does not have a 'parsed' field nor a 'refusal' field. Received message:

content='' additional_kwargs={'parsed': None, 'refusal': None} response_metadata={'token_usage': {'completion_tokens': 82, 'prompt_tokens': 111, 'total_tokens': 193, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'gpt-oss', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-436', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None} id='run--899d3716-46dd-43c1-b631-9d74b53584c5-0' usage_metadata={'input_tokens': 111, 'output_tokens': 82, 'total_tokens': 193, 'input_token_details': {}, 'output_token_details': {}}. Falling back to prompt-based extraction.
2025-09-28 10:41:29.505 | INFO     | logging:callHandlers:1737 | HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-28 10:41:29.508 | ERROR    | logging:callHandlers:1737 | Both structured and prompt-based extraction failed: 'str' object has no attribute 'text'
2025-09-28 10:41:29.508 | ERROR    | logging:callHandlers:1737 | Error generating search response for query 'When was Napoleon born?': not enough values to unpack (expected 3, got 0)
2025-09-28 10:41:29.508 | INFO     | logging:callHandlers:1737 | ⏱️  PROFILER [When was Napoleon born?...]: Completed query_agent in 100.767s - Failed with error
2025-09-28 10:41:29.508 | INFO     | logging:callHandlers:1737 | 127.0.0.1:59581 - "POST /web-search HTTP/1.1" 200
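
The failing step appears to be structured output: the first completion comes back with content='' even though 82 completion tokens were generated (gpt-oss is a reasoning model, so its output may be going into a channel the parser does not read), so 'parsed' is never populated and the prompt-based fallback then crashes. A direct probe of the endpoint, as a sketch (the fields mirror the OpenAI chat completions API; whether Ollama's compatibility layer accepts a json_schema response_format depends on the Ollama version):

import json
import requests

# Hypothetical probe: does the endpoint honor an OpenAI-style json_schema
# response_format? (Support in Ollama's compatibility layer is version-dependent.)
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "gpt-oss",
        "messages": [{"role": "user", "content": "When was Napoleon born?"}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "answer",
                "schema": {
                    "type": "object",
                    "properties": {"date": {"type": "string"}},
                    "required": ["date"],
                },
            },
        },
    },
    timeout=120,
)
content = resp.json()["choices"][0]["message"]["content"]
print(json.loads(content))  # raises ValueError if content is empty or not JSON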
