
Get ollama to work in the fastapi server #1

@pchalasani


Context: there is a FastAPI-based server with various endpoints defined. Two endpoints are of interest here:

  • agent/query -- sends a query to a langroid agent that uses an OpenAI LLM
  • agent-ollama/query -- same, except the agent's LLM is served via ollama (llama2); see the sketch after this list
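
Roughly, the setup looks like this (a minimal sketch, not the repo's actual code; the route bodies and the "ollama/llama2" chat-model name are assumptions about how langroid is wired to ollama here):

import langroid as lr
import langroid.language_models as lm
from fastapi import FastAPI

app = FastAPI()

@app.post("/agent/query")
def agent_query(question: str) -> str:
    # Agent backed by the default OpenAI LLM.
    agent = lr.ChatAgent(lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig()))
    return agent.llm_response(question).content

@app.post("/agent-ollama/query")
def agent_ollama_query(question: str) -> str:
    # Same agent, but the LLM is llama2 served by a local ollama
    # instance (the "ollama/" prefix routes to localhost:11434).
    agent = lr.ChatAgent(
        lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig(chat_model="ollama/llama2"))
    )
    return agent.llm_response(question).content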

The first endpoint works fine both locally and on Google Cloud Run; however, the second (ollama) endpoint works locally but not on Google Cloud Run.

To test locally:

make build
make up
pytest -xs tests/test_fastapi.py::test_agent_ollama_query

both tests pass (one test runs against the "real" container endpoint, the other exercises the code directly).
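
For reference, the two checks might look roughly like this (a hypothetical sketch; the actual test file, import path, and request shape may differ):

import os

import httpx
from fastapi.testclient import TestClient

from main import app  # hypothetical import path for the server app

def test_agent_ollama_query_direct():
    # Exercises the endpoint code directly, in-process.
    resp = TestClient(app).post(
        "/agent-ollama/query", params={"question": "What is 2+2?"}
    )
    assert resp.status_code == 200

def test_agent_ollama_query_container():
    # Hits the "real" endpoint: the local container, or the Cloud Run
    # URL when LANGROID_BASE_URL is set.
    base_url = os.getenv("LANGROID_BASE_URL", "http://localhost:8080")
    resp = httpx.post(
        f"{base_url}/agent-ollama/query",
        params={"question": "What is 2+2?"},
        timeout=120,
    )
    assert resp.status_code == 200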

However, when I push to Google Cloud Run, the ollama endpoint test fails with a connection error. To build and push to Google Cloud Run:

make gserver
make gpush
make gdeploy
LANGROID_BASE_URL=<url of cloud-run app> pytest -xs tests/test_fastapi.py::test_agent_ollama_query

NOTE: in the Makefile you will need to adjust variables such as the GCP project id and server name to match yours, since you will not have authorization to push to the langroid project.

Some possibilities to look into:

  • maybe we need two separate containers, one running ollama and one running our fastapi server?
  • maybe we can get it to work with a single container, but I am missing some settings? (see the sketch after this list)
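
For the single-container route, the usual pattern is to start ollama in the background and then serve the FastAPI app on Cloud Run's injected $PORT, since Cloud Run routes external traffic to a single port per container. A hypothetical entrypoint sketch (file names, module path, and timings are all assumptions, not the repo's code):

import os
import subprocess
import time

import httpx

# Start ollama in the background; by default it listens on localhost:11434.
ollama = subprocess.Popen(["ollama", "serve"])

# Wait until ollama answers before taking traffic; a connection error
# like the one above is what you'd see if the app queries ollama before
# it is up (or if ollama was never started inside the image).
for _ in range(60):
    try:
        httpx.get("http://localhost:11434", timeout=1)
        break
    except httpx.TransportError:
        time.sleep(1)

# Cloud Run injects PORT; serve the FastAPI app on it.
port = os.environ.get("PORT", "8080")
subprocess.run(["uvicorn", "main:app", "--host", "0.0.0.0", "--port", port])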
