Description
Context: there is a FastAPI-based server with various endpoints defined. Two endpoints are of interest here:
- agent/query -- this allows a query to a langroid agent using an OpenAI LLM
- agent-ollama/query -- this is similar, except that the agent uses an LLM via ollama (llama2)
The first endpoint works fine both locally and on Google Cloud Run; the second (ollama) endpoint works locally but not on Google Cloud Run.
When testing locally, i.e.
make build
make up
pytest -xs tests/test_fastapi.py::test_agent_ollama_query
both tests pass (one test is against the "real" container endpoint, the other is against the code directly).
However, when I push to Google Cloud Run, the ollama endpoint test fails with a connection error. To build and push to Google Cloud Run, do:
make gserver
make gpush
make gdeploy
LANGROID_BASE_URL=<url of cloud-run app> pytest -xs tests/test_fastapi.py::test_agent_ollama_query
NOTE: in the Makefile you will need to adjust variables such as the GCP project ID and server name to match yours, since you will not have authorization to push to the langroid project.
Some possibilities to look into:
- maybe we need 2 separate containers, one running ollama, one running our fastapi server?
- maybe we can get it to work with just one container, but I am missing some settings?
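To narrow down where the connection fails, it may help to check from inside the deployed container whether anything is listening where the server expects ollama. The following is a minimal diagnostic sketch, not part of the repo: the function name `ollama_reachable` and the `OLLAMA_BASE_URL` environment variable are hypothetical names used for illustration, and 11434 is assumed because it is ollama's default port.

```python
import os
import socket
from urllib.parse import urlparse


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host/port in base_url succeeds.

    This only checks that something accepts connections there; it does not
    verify that the listener is ollama or that a model is loaded.
    """
    parsed = urlparse(base_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 11434  # ollama's default port (assumed here)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # OLLAMA_BASE_URL is a hypothetical variable name, not one the repo defines.
    url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    print(f"{url} reachable: {ollama_reachable(url)}")
```

If this reports the address as unreachable inside the Cloud Run container but reachable in the local container, that would suggest the ollama process is not running (or not listening on that address) in the deployed image, rather than a problem in the FastAPI endpoint itself.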