# TrueFoundry

TrueFoundry provides an enterprise-ready [AI Gateway](https://www.truefoundry.com/ai-gateway) that brings governance and observability to agentic frameworks like LangChain. The TrueFoundry AI Gateway serves as a unified interface for LLM access, providing:

- **Unified API Access**: Connect to 250+ LLMs (OpenAI, Claude, Gemini, Groq, Mistral) through one API
- **Low Latency**: Sub-3ms internal latency with intelligent routing and load balancing
- **Enterprise Security**: SOC 2, HIPAA, and GDPR compliance with RBAC and audit logging
- **Quota and Cost Management**: Token-based quotas, rate limiting, and comprehensive usage tracking
- **Observability**: Full request/response logging, metrics, and traces with customizable retention

## Prerequisites

Before integrating LangChain with TrueFoundry, ensure you have:

1. **TrueFoundry Account**: A [TrueFoundry account](https://www.truefoundry.com/register) with at least one model provider configured. Follow the [quick start guide](https://docs.truefoundry.com/gateway/quick-start) to set one up.
2. **Personal Access Token**: Generate a token by following the [TrueFoundry token generation guide](https://docs.truefoundry.com/gateway/authentication).

## Quickstart

You can connect to TrueFoundry's unified LLM gateway through the `ChatOpenAI` interface:

- Set the `base_url` to your TrueFoundry gateway endpoint (explained below)
- Set the `api_key` to your TrueFoundry [Personal Access Token (PAT)](https://docs.truefoundry.com/gateway/authentication#personal-access-token-pat)
- Use the model name exactly as it appears in the gateway's unified code snippet

### Installation

```bash
pip install langchain-openai
```

### Basic Setup

Connect to TrueFoundry by pointing LangChain's `ChatOpenAI` model at the gateway:

```python
import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key=os.environ["TRUEFOUNDRY_API_KEY"],
    base_url=os.environ["TRUEFOUNDRY_GATEWAY_BASE_URL"],
    model="openai-main/gpt-4o",  # similarly, you can call any model from any configured provider
)

llm.invoke("What is the meaning of life, the universe, and everything?")
```

The request is routed through your TrueFoundry gateway to the specified model provider. TrueFoundry automatically handles rate limiting, load balancing, and observability.
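
Because the gateway exposes an OpenAI-compatible API, the rest of the `ChatOpenAI` surface works unchanged. As a minimal sketch, reusing the same environment variables and model name as above (assumptions about your gateway setup), streaming also flows through the gateway:

```python
import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key=os.environ["TRUEFOUNDRY_API_KEY"],
    base_url=os.environ["TRUEFOUNDRY_GATEWAY_BASE_URL"],
    model="openai-main/gpt-4o",
)

# Chunks are streamed back through the gateway as the model generates them
for chunk in llm.stream("Write a haiku about observability."):
    print(chunk.content, end="", flush=True)
```

Streamed requests appear in the gateway's logs and metrics like any other call.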

### LangGraph Integration
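
The same gateway connection works inside a LangGraph node (this example additionally requires `pip install langgraph`):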

```python
import os

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import MessagesState, StateGraph


# Define your LangGraph workflow
def call_model(state: MessagesState):
    model = ChatOpenAI(
        api_key=os.environ["TRUEFOUNDRY_API_KEY"],
        base_url=os.environ["TRUEFOUNDRY_GATEWAY_BASE_URL"],
        # Copy the exact model name from the gateway
        model="openai-main/gpt-4o",
    )
    response = model.invoke(state["messages"])
    return {"messages": [response]}


# Build the workflow
workflow = StateGraph(MessagesState)
workflow.add_node("agent", call_model)
workflow.set_entry_point("agent")
workflow.set_finish_point("agent")

app = workflow.compile()

# Run the agent through TrueFoundry
result = app.invoke({"messages": [HumanMessage(content="Hello!")]})
print(result["messages"][-1].content)
```
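
Tool calling passes through the gateway in the same way, since it is part of the OpenAI-compatible API. A minimal sketch, assuming the same environment variables as above; the `get_weather` tool is a made-up stub for illustration:

```python
import os

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"It is sunny in {city}."  # hypothetical stub, not a real weather lookup


llm = ChatOpenAI(
    api_key=os.environ["TRUEFOUNDRY_API_KEY"],
    base_url=os.environ["TRUEFOUNDRY_GATEWAY_BASE_URL"],
    model="openai-main/gpt-4o",
)

# The model decides whether to call the tool; any tool calls come back
# as structured data on the response message
response = llm.bind_tools([get_weather]).invoke("What's the weather in Paris?")
print(response.tool_calls)
```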

## Observability and Governance

With the Metrics Dashboard, you can monitor and analyze:

- **Performance Metrics**: Track key latency metrics like request latency, Time to First Token (TTFT), and Inter-Token Latency (ITL) with P99, P90, and P50 percentiles
- **Cost and Token Usage**: Gain visibility into your application's costs with detailed breakdowns of input/output tokens and the associated expenses for each model
- **Usage Patterns**: Understand how your application is being used with detailed analytics on user activity, model distribution, and team-based usage
- **Rate Limiting & Load Balancing**: Configure limits, distribute traffic across models, and set up fallbacks (a client-side sketch follows this list)
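
Fallbacks and load balancing are typically configured centrally in the gateway. For comparison, here is a minimal client-side sketch using LangChain's `with_fallbacks`; both model names are assumptions and should match models configured in your gateway:

```python
import os

from langchain_openai import ChatOpenAI

gateway_kwargs = {
    "api_key": os.environ["TRUEFOUNDRY_API_KEY"],
    "base_url": os.environ["TRUEFOUNDRY_GATEWAY_BASE_URL"],
}

primary = ChatOpenAI(model="openai-main/gpt-4o", **gateway_kwargs)
# Hypothetical fallback model name; use one configured in your gateway
backup = ChatOpenAI(model="anthropic-main/claude-3-5-sonnet", **gateway_kwargs)

# If the primary call raises an error, the same input is retried on the backup
llm = primary.with_fallbacks([backup])
llm.invoke("What is the meaning of life, the universe, and everything?")
```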

## Support

For questions, issues, or support:

- **Documentation**: [https://docs.truefoundry.com/](https://docs.truefoundry.com/)