I have set it up with Ollama.
Here is my model_config.py:
"""
This module defines the configuration for the language model (LLM) and embedding models.

Attributes:
    api_key (str): The OpenAI API key, loaded from the environment variable 'OPENAI_API_KEY'.
    model_config (dict): A dictionary containing configuration parameters for LLM and embedding models.
        Keys:
        - "llm_model_name" (str): Name of the LLM model to use.
        - "llm_type" (str): Type of the LLM provider (e.g., "openai"). "local" is for LM Studio;
          for Ollama and other local models use "others" with base_url updated in openai_compatible.
          If you are using the "others" llm type, check the openai_compatible URL dict for the
          "others" key; you can generally find the URL by googling
          "YOUR_PROVIDER_NAME openai api base compatible url".
        - "llm_tools" (list): List of tools or plugins to use with the LLM.
        - "llm_kwargs" (dict): Additional keyword arguments for LLM initialization.
            - "temperature" (float): Sampling temperature for generation.
            - "max_tokens" (int or None): Maximum number of tokens to generate.
            - "timeout" (int or None): Timeout for API requests.
            - "max_retries" (int): Maximum number of retries for failed requests.
            - "api_key" (str): API key for authentication.
        - "embedding_model_name" (str): Name of the embedding model to use.
        - "embed_mode" (str): Embedding mode or backend.
        - "cross_encoder_name" (str): Name of the cross-encoder model for reranking.
"""
import os
############## PORT and HOST SETTINGS
PORT_NUM_SEARXNG = 8085
PORT_NUM_APP = 8000
HOST_APP = "localhost"
HOST_SEARXNG = "localhost"
###############
## USER INPUTS NEEDED
# For open-source/local models you can replace these with 'DUMMY' (for both llm and embed); otherwise use the respective provider's key.
llm_api_key = os.environ.get(
    "GOOGLE_API_KEY", "DUMMY"
)  # either paste the LLM key for your provider (for instance, Google) here directly or export it in the env, else keep the dummy value for local models
embed_api_key = os.environ.get(
    "GOOGLE_API_KEY", "DUMMY"
)  # either paste the embedder key for your provider (for instance, Google) here directly or export it in the env, else keep the dummy value for local models
model_config = {
    # Name of the LLM model to use. For local models, use the model name served by your local server.
    "llm_model_name": "gpt-oss",
    # LLM provider type: choose from 'google', 'local', 'groq', 'openai', or 'others'
    # (for 'others', the base url needs to be updated in the `openai_compatible` dictionary below).
    # Make sure to update the api_key variable above to match the provider.
    # "local" is for LM Studio; for Ollama and other local models use "others" with base_url updated in openai_compatible.
    # You can generally find it by googling "YOUR_PROVIDER_NAME (example: ollama) openai api base compatible url".
    "llm_type": "others",
    # List of tools or plugins to use with the LLM, if any. Set to None if not used.
    "llm_tools": None,
    # Additional keyword arguments for LLM initialization.
    "llm_kwargs": {
        "temperature": 0.1,  # Sampling temperature for generation.
        "max_tokens": None,  # Maximum number of tokens to generate (None for default).
        "timeout": None,  # Timeout for API requests (None for default).
        "max_retries": 2,  # Maximum number of retries for failed requests.
        "api_key": llm_api_key,  # API key for authentication.
    },
    # Name of the embedding model to use.
    # For Google, use their embedding model names. For local/HuggingFace, use the model path or name.
    # Tested models can be found at https://github.com/michaelfeil/infinity?tab=readme-ov-file#supported-tasks-and-models-by-infinity
    "embedding_model_name": "mixedbread-ai/mxbai-embed-large-v1",
    "embed_kwargs": {},  # Optional additional kwargs for embedding model initialization.
    # Embedding backend: 'google' for Google, 'infinity_emb' for local/HuggingFace models.
    "embed_mode": "infinity_emb",
    # Name of the cross-encoder model for reranking, typically a HuggingFace model.
    "cross_encoder_name": "BAAI/bge-reranker-base",
}
# NO CHANGE NEEDED UNLESS THE PROVIDER CHANGES THE BASE URLS, OR YOU WANT TO USE A DIFFERENT PROVIDER UNDER "others"
openai_compatible = {
    "google": "https://generativelanguage.googleapis.com/v1beta/openai/",
    "local": "http://127.0.0.1:1234/v1",
    "groq": "https://api.groq.com/openai/v1",
    "openai": "https://api.openai.com/v1",
    "others": "http://localhost:11434/v1",  # Ollama default port
}
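As a sanity check that the "others" base_url actually reaches Ollama's OpenAI-compatible endpoint, a minimal chat-completion call can be made directly. This is just a sketch, assuming Ollama is running on its default port 11434 and a model named gpt-oss has been pulled:

import requests

# Minimal check against Ollama's OpenAI-compatible endpoint
# (assumes the default port and that `gpt-oss` is available locally).
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "gpt-oss",
        "messages": [{"role": "user", "content": "Say hi"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

If this prints a reply, the endpoint and model name in the config are reachable.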
But when I try to run a search:
>>> import requests
>>> res = requests.post("http://localhost:8000/web-search", json={"query": "When was Napoleon born?"})
In the Coexist interface I get:
2025-09-28 10:39:48.742 | INFO | logging:callHandlers:1737 | 🔍 PROFILER [When was Napoleon born?...]: Starting query_agent - Generating search queries from user input
2025-09-28 10:40:05.446 | INFO | logging:callHandlers:1737 | HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-28 10:40:05.472 | WARNING | logging:callHandlers:1737 | Structured output failed: Structured Output response does not have a 'parsed' field nor a 'refusal' field. Received message:
content='' additional_kwargs={'parsed': None, 'refusal': None} response_metadata={'token_usage': {'completion_tokens': 82, 'prompt_tokens': 111, 'total_tokens': 193, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'gpt-oss', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-436', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None} id='run--899d3716-46dd-43c1-b631-9d74b53584c5-0' usage_metadata={'input_tokens': 111, 'output_tokens': 82, 'total_tokens': 193, 'input_token_details': {}, 'output_token_details': {}}. Falling back to prompt-based extraction.
2025-09-28 10:41:29.505 | INFO | logging:callHandlers:1737 | HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-28 10:41:29.508 | ERROR | logging:callHandlers:1737 | Both structured and prompt-based extraction failed: 'str' object has no attribute 'text'
2025-09-28 10:41:29.508 | ERROR | logging:callHandlers:1737 | Error generating search response for query 'When was Napoleon born?': not enough values to unpack (expected 3, got 0)
2025-09-28 10:41:29.508 | INFO | logging:callHandlers:1737 | ⏱️ PROFILER [When was Napoleon born?...]: Completed query_agent in 100.767s - Failed with error
2025-09-28 10:41:29.508 | INFO | logging:callHandlers:1737 | 127.0.0.1:59581 - "POST /web-search HTTP/1.1" 200
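For what it's worth, the 'parsed'/'refusal' shape in the warning looks like LangChain's structured-output path, so the first failure can probably be reproduced outside the app with a minimal sketch. This assumes langchain_openai is the client in use; SearchQueries below is a hypothetical schema standing in for the app's real one:

from pydantic import BaseModel
from langchain_openai import ChatOpenAI


class SearchQueries(BaseModel):
    """Hypothetical schema, stands in for the app's actual one."""
    queries: list[str]


llm = ChatOpenAI(
    model="gpt-oss",
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="DUMMY",
    temperature=0.1,
)

structured = llm.with_structured_output(SearchQueries)
# With gpt-oss served through Ollama this can come back as None when the model
# writes its answer to the reasoning channel and leaves `content` empty,
# which might explain the content='' in the warning above.
print(structured.invoke("Generate three web search queries about Napoleon's birth date."))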