
Get ollama to work in the fastapi server #1

@pchalasani


Context: there is a FastAPI-based server with various endpoints defined. Two endpoints are of interest here:

  • agent/query -- sends a query to a langroid agent that uses an OpenAI LLM
  • agent-ollama/query -- same, except the agent's LLM is served via ollama (llama2); see the sketch after this list
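
Roughly, the setup looks like this (a minimal sketch, not the repo's actual code; the route bodies and the "ollama/llama2" chat-model name are assumptions about how langroid is wired to ollama here):

import langroid as lr
import langroid.language_models as lm
from fastapi import FastAPI

app = FastAPI()

@app.post("/agent/query")
def agent_query(question: str) -> str:
    # Agent backed by the default OpenAI LLM.
    agent = lr.ChatAgent(lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig()))
    return agent.llm_response(question).content

@app.post("/agent-ollama/query")
def agent_ollama_query(question: str) -> str:
    # Same agent, but the LLM is llama2 served by a local ollama
    # instance (the "ollama/" prefix routes to localhost:11434).
    agent = lr.ChatAgent(
        lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig(chat_model="ollama/llama2"))
    )
    return agent.llm_response(question).content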

The first endpoint works fine both locally and on Google Cloud Run; however, the second (ollama) endpoint works locally but not on Google Cloud Run.

To test locally:

make build
make up
pytest -xs tests/test_fastapi.py::test_agent_ollama_query

both tests pass (one test runs against the "real" container endpoint, the other exercises the code directly).
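
For reference, the two checks might look roughly like this (a hypothetical sketch; the actual test file, import path, and request shape may differ):

import os

import httpx
from fastapi.testclient import TestClient

from main import app  # hypothetical import path for the server app

def test_agent_ollama_query_direct():
    # Exercises the endpoint code directly, in-process.
    resp = TestClient(app).post(
        "/agent-ollama/query", params={"question": "What is 2+2?"}
    )
    assert resp.status_code == 200

def test_agent_ollama_query_container():
    # Hits the "real" endpoint: the local container, or the Cloud Run
    # URL when LANGROID_BASE_URL is set.
    base_url = os.getenv("LANGROID_BASE_URL", "http://localhost:8080")
    resp = httpx.post(
        f"{base_url}/agent-ollama/query",
        params={"question": "What is 2+2?"},
        timeout=120,
    )
    assert resp.status_code == 200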

However, when I push to Google Cloud Run, the ollama endpoint test fails with a connection error. To build and push to Google Cloud Run:

make gserver
make gpush
make gdeploy
LANGROID_BASE_URL=<url of cloud-run app> pytest -xs tests/test_fastapi.py::test_agent_ollama_query

NOTE: in the Makefile you will need to adjust variables such as the GCP project id and server name to match yours, since you will not have authorization to push to the langroid project.

Some possibilities to look into:

  • maybe we need two separate containers, one running ollama and one running our fastapi server?
  • maybe we can get it to work with a single container, but I am missing some settings? (see the sketch after this list)
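
For the single-container route, the usual pattern is to start ollama in the background and then serve the FastAPI app on Cloud Run's injected $PORT, since Cloud Run routes external traffic to a single port per container. A hypothetical entrypoint sketch (file names, module path, and timings are all assumptions, not the repo's code):

import os
import subprocess
import time

import httpx

# Start ollama in the background; by default it listens on localhost:11434.
ollama = subprocess.Popen(["ollama", "serve"])

# Wait until ollama answers before taking traffic; a connection error
# like the one above is what you'd see if the app queries ollama before
# it is up (or if ollama was never started inside the image).
for _ in range(60):
    try:
        httpx.get("http://localhost:11434", timeout=1)
        break
    except httpx.TransportError:
        time.sleep(1)

# Cloud Run injects PORT; serve the FastAPI app on it.
port = os.environ.get("PORT", "8080")
subprocess.run(["uvicorn", "main:app", "--host", "0.0.0.0", "--port", port])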
