A sophisticated Agent Memory API powered by Redis VectorSet.
Intelligent memory storage, retrieval, and analysis using Redis VectorSet and OpenAI embeddings. This project focuses entirely on advanced memory capabilities for AI agents.
This project provides four main interfaces for different use cases:
# Interactive chat mode
python cli.py
# Single query mode
python cli.py "Remember that I like pizza"
# Start the web server
python web_app.py
# Access web interface at http://localhost:5001
# Use REST API endpoints like /api/memory, /api/chat
from langgraph_memory_agent import LangGraphMemoryAgent
agent = LangGraphMemoryAgent()
response = agent.run("What restaurants have I been to?")
# Start the MCP server for Claude Desktop and other MCP clients
python mcp_server.py
# Or use the setup script for easy configuration
python scripts/setup_mcp_server.py
- Start Redis 8:
docker run -d --name redis -p 6379:6379 redis:8
- Install dependencies:
pip install -r requirements.txt
- Set OpenAI API key:
cp .env.example .env # Edit .env and add: OPENAI_API_KEY=your_key_here
- Choose your interface:
- CLI: python cli.py
- Web: python web_app.py → visit http://localhost:5001
- Programmatic: import LangGraphMemoryAgent in your code
- MCP: python scripts/setup_mcp_server.py → use with Claude Desktop or other MCP clients
- LangGraph Workflow: Intelligent tool orchestration for complex memory operations
- Vector Memory Storage: Semantic similarity search using Redis VectorSet API
- Contextual Grounding: Automatic resolution of relative time/location references
- Question Answering: AI-powered answers with confidence indicators and supporting evidence
- Memory Management: Store, search, update, and delete memories with full CRUD operations
- Web Interface: Clean, minimalist web UI with responsive design
- CLI Interface: Simple command-line interface for storing and recalling memories
- Context Management: Set and track current context for memory grounding
# Using Docker (recommended)
docker run -d --name redis -p 6379:6379 redis:8
# Or using Homebrew (macOS)
brew install redis
redis-server --port 6379
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Copy the example environment file and add your OpenAI API key:
cp .env.example .env
Edit .env and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
Make sure your virtual environment is activated:
source venv/bin/activate # On Windows: venv\Scripts\activate
python cli.py
python cli.py "Remember that I like pizza"
python web_app.py
Then visit http://localhost:5001 for the web interface.
# Set up MCP server for Claude Desktop
python scripts/setup_mcp_server.py
# Or run the server directly
python mcp_server.py
The MCP server exposes memory capabilities as standardized tools for Claude Desktop and other MCP-compatible clients. See MCP_SERVER_README.md for detailed setup instructions.
Run the entire stack with Docker:
# Quick start with Docker Compose
docker-compose up -d
# Or use the Makefile
make docker-run
For development:
# Development mode with hot reload
make docker-dev
This project includes modern development tools:
# Install development dependencies
make install-dev
# Run all quality checks
make check
# Format code
make format
# Run tests
make test
# See all available commands
make help
remem/
├── cli.py # CLI entry point
├── web_app.py # Web API entry point
├── mcp_server.py # MCP server entry point
├── requirements.txt # Python dependencies
├── pyproject.toml # Modern Python packaging
├── Dockerfile # Container configuration
├── docker-compose.yml # Multi-service deployment
├── Makefile # Development commands
├── api/ # FastAPI application
├── memory/ # Core memory system
├── clients/ # External service clients
├── llm/ # LLM management
├── scripts/ # Utility scripts
├── tests/ # Test suite
└── docs/ # Documentation
The memory agent uses a sophisticated multi-layer architecture:
- Vector Storage: Uses Redis VectorSet for semantic memory storage
- Embeddings: Configurable embedding providers (OpenAI, Ollama) with multiple model support
- Contextual Grounding: Converts relative references (today, here) to absolute ones
- Confidence Analysis: Sophisticated question answering with confidence scoring
- Memory Tools: LangChain tools that wrap core memory operations
- Tool Integration: Seamless integration with LangGraph workflow
- LangGraph Orchestration: Intelligent tool selection and multi-step reasoning
- Workflow Nodes: Analyzer → Memory Agent → Tools → Synthesizer
- Complex Reasoning: Handles multi-step questions and sophisticated analysis
- Developer APIs: Core memory operations for agent developers
- Chat API: Conversational interface using LangGraph workflow
- Web Interface: Clean, responsive UI for memory management
- store_memory - Store new memories with contextual grounding
- search_memories - Find relevant memories using vector similarity search
- answer_with_confidence - Answer questions with sophisticated confidence analysis
- format_memory_results - Format memory search results for display
- set_context - Set current context (location, activity, people) for better memory grounding
- get_memory_stats - Get statistics about stored memories
- analyze_question_type - Analyze what type of question the user is asking
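For orientation, here is a minimal sketch of exercising these tools through the agent's run() entry point shown earlier. The prompts are illustrative; which tool the LangGraph workflow actually invokes is decided by the agent, not by the caller.
from langgraph_memory_agent import LangGraphMemoryAgent

agent = LangGraphMemoryAgent()

# The workflow routes each request to the appropriate tool:
# store_memory for new facts, search_memories / answer_with_confidence for questions,
# set_context for grounding information.
agent.run("Remember that I had pasta carbonara at Mario's last night")
agent.run("I'm currently in Jakarta working on the Redis project")
print(agent.run("What did I eat at Mario's?"))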
- Python 3.8+
- Redis 8 (with VectorSet support)
- OpenAI API key
- Internet connection for API calls
The Memory Agent provides a comprehensive RESTful API designed for two primary use cases:
- Developer Memory APIs - Core memory operations for agent developers
- Chat Application API - Conversational interface for demo/UI applications
http://localhost:5001
Currently no authentication is required. The API is designed for local development and testing.
All memory operations require a vectorstore_name path parameter. The examples below show the old API structure for reference, but the current API uses:
- POST /api/memory/{vectorstore_name} instead of POST /api/memory
- POST /api/memory/{vectorstore_name}/search instead of POST /api/memory/search
- etc.
See the API Migration Guide section below for complete current API structure.
These endpoints provide core memory operations for developers building agents.
Store a new memory with optional contextual grounding.
POST /api/memory
Request Body:
{
"text": "I went to Mario's Italian Restaurant and had amazing pasta carbonara",
"apply_grounding": true
}
Parameters:
- text (string, required): Memory text to store
- apply_grounding (boolean, optional): Whether to apply contextual grounding (default: true)
Response:
{
"success": true,
"memory_id": "memory:550e8400-e29b-41d4-a716-446655440000",
"message": "Memory stored successfully"
}
Search for memories using vector similarity.
POST /api/memory/search
Request Body:
{
"query": "Italian restaurant food",
"top_k": 5,
"filter": "optional_filter_expression"
}
Parameters:
- query (string, required): Search query text
- top_k (integer, optional): Number of results to return (default: 5)
- filter (string, optional): Filter expression for Redis VSIM command
Response:
{
"success": true,
"query": "Italian restaurant food",
"memories": [
{
"id": "memory:550e8400-e29b-41d4-a716-446655440000",
"text": "I went to Mario's Italian Restaurant and had amazing pasta carbonara",
"score": 0.952,
"formatted_time": "2024-01-15 18:30:00",
"tags": ["restaurant", "food", "Mario's"]
}
],
"count": 1
}
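As a rough illustration, here is a Python sketch of the store-then-search flow using the requests library. It assumes the server is running locally on port 5001 and uses the vectorstore-scoped paths from the migration guide below with the default memories vectorstore; adjust the paths if your deployment still uses the older routes.
import requests

BASE = "http://localhost:5001"

# Store a memory via the current vectorstore-scoped route
# ("memories" is the default vectorstore name suggested in the migration guide)
store = requests.post(f"{BASE}/api/memory/memories",
                      json={"text": "I went to Mario's Italian Restaurant and had amazing pasta carbonara",
                            "apply_grounding": True})
store.raise_for_status()

# Search it back with vector similarity
search = requests.post(f"{BASE}/api/memory/memories/search",
                       json={"query": "Italian restaurant food", "top_k": 5})
for memory in search.json().get("memories", []):
    print(f"{memory['score']:.3f}  {memory['text']}")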
Answer questions using sophisticated confidence analysis and structured responses.
⭐ This endpoint calls memory_agent.answer_question() directly for the highest quality responses.
POST /api/memory/answer
Request Body:
{
"question": "What did I eat at Mario's restaurant?",
"top_k": 5,
"filter": "optional_filter_expression"
}
Parameters:
- question (string, required): Question to answer
- top_k (integer, optional): Number of memories to retrieve for context (default: 5)
- filter (string, optional): Filter expression for Redis VSIM command
Response:
{
"success": true,
"question": "What did I eat at Mario's restaurant?",
"type": "answer",
"answer": "You had amazing pasta carbonara at Mario's Italian Restaurant.",
"confidence": "high",
"reasoning": "Memory directly mentions pasta carbonara at Mario's with specific details",
"supporting_memories": [
{
"id": "memory:550e8400-e29b-41d4-a716-446655440000",
"text": "I went to Mario's Italian Restaurant and had amazing pasta carbonara",
"relevance_score": 95.2,
"timestamp": "2024-01-15 18:30:00",
"tags": ["restaurant", "food", "Mario's"]
}
]
}
Confidence Levels:
- high: Memories directly and clearly answer the question with specific, relevant information
- medium: Memories provide good information but may be incomplete or somewhat indirect
- low: Memories are tangentially related or don't provide enough information to answer confidently
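One way a client might act on these confidence levels, sketched in Python against the endpoint path documented above (swap in the vectorstore-scoped route if your deployment uses it):
import requests

resp = requests.post("http://localhost:5001/api/memory/answer",
                     json={"question": "What did I eat at Mario's restaurant?", "top_k": 5})
result = resp.json()

# Branch on the reported confidence before trusting the answer
if result.get("confidence") == "high":
    print(result["answer"])
elif result.get("confidence") == "medium":
    print(f"Possibly: {result['answer']} "
          f"(based on {len(result.get('supporting_memories', []))} memories)")
else:
    print("Not enough stored memories to answer confidently.")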
Get statistics about stored memories and system information.
GET /api/memory
Response:
{
"success": true,
"memory_count": 42,
"vector_dimension": 1536,
"vectorset_name": "memories",
"embedding_model": "text-embedding-ada-002",
"redis_host": "localhost",
"redis_port": 6379,
"timestamp": "2024-01-15T18:30:00Z"
}
Set and get current context for memory grounding.
POST /api/memory/context
Request Body:
{
"location": "Jakarta, Indonesia",
"activity": "working on Redis project",
"people_present": ["John", "Sarah"],
"weather": "sunny",
"mood": "focused"
}
Parameters:
- location (string, optional): Current location
- activity (string, optional): Current activity
- people_present (array, optional): List of people present
- Additional fields will be stored as environment context
Response:
{
"success": true,
"message": "Context updated successfully",
"context": {
"location": "Jakarta, Indonesia",
"activity": "working on Redis project",
"people_present": ["John", "Sarah"],
"environment": {
"weather": "sunny",
"mood": "focused"
}
}
}
GET /api/memory/context
Response:
{
"success": true,
"context": {
"temporal": {
"date": "2024-01-15",
"time": "18:30:00"
},
"spatial": {
"location": "Jakarta, Indonesia",
"activity": "working on Redis project"
},
"social": {
"people_present": ["John", "Sarah"]
},
"environmental": {
"weather": "sunny",
"mood": "focused"
}
}
}
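A small Python sketch of the set-then-read context flow, assuming a local server on port 5001; extra fields such as weather are grouped under the environment context as shown in the responses above.
import requests

BASE = "http://localhost:5001"

# Set the working context that grounding will draw on
requests.post(f"{BASE}/api/memory/context",
              json={"location": "Jakarta, Indonesia",
                    "activity": "working on Redis project",
                    "people_present": ["John", "Sarah"],
                    "weather": "sunny"})  # extra fields land in the environment bucket

# Read it back, grouped into temporal / spatial / social / environmental
print(requests.get(f"{BASE}/api/memory/context").json()["context"])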
Delete a specific memory by ID.
DELETE /api/memory/{memory_id}
Response:
{
"success": true,
"message": "Memory 550e8400-e29b-41d4-a716-446655440000 deleted successfully",
"memory_id": "550e8400-e29b-41d4-a716-446655440000"
}
Delete all memories from the system.
DELETE /api/memory
Response:
{
"success": true,
"message": "Successfully cleared all memories",
"memories_deleted": 42,
"vectorset_existed": true
}
These endpoints implement Minsky's K-lines cognitive model for advanced memory operations, providing mental state construction and reasoning capabilities.
Construct a mental state (K-line) by recalling relevant memories for a specific query or context.
POST /api/klines/recall
Request Body:
{
"query": "restaurant preferences for dinner",
"top_k": 5,
"filter": "optional_filter_expression",
"use_llm_filtering": false
}
Parameters:
- query (string, required): Query to construct mental state around
- top_k (integer, optional): Number of memories to include (default: 5)
- filter (string, optional): Filter expression for Redis VSIM command
- use_llm_filtering (boolean, optional): Apply LLM-based relevance filtering (default: false)
Response:
{
"success": true,
"query": "restaurant preferences for dinner",
"mental_state": "Here's what I remember that might be useful:\n1. I prefer Italian restaurants with good pasta (from 2024-01-15 18:30:00, 95.2% similar)\n Tags: restaurant, food, preferences",
"memories": [
{
"id": "memory:550e8400-e29b-41d4-a716-446655440000",
"text": "I prefer Italian restaurants with good pasta",
"score": 0.952,
"formatted_time": "2024-01-15 18:30:00",
"tags": ["restaurant", "food", "preferences"],
"relevance_reasoning": "Directly relates to restaurant preferences for dining"
}
],
"memory_count": 1,
"llm_filtering_applied": false
}
LLM Filtering Enhancement:
When use_llm_filtering: true, the system applies the same intelligent relevance filtering used in the /ask endpoint:
- Each memory is evaluated by an LLM for actual relevance to the query
- Only memories deemed relevant are included in the mental state
- Adds relevance_reasoning to each memory explaining why it was kept
- Response includes original_memory_count and filtered_memory_count for transparency
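A hedged Python sketch of a recall request with LLM filtering enabled, using the path documented above (substitute /api/klines/{vectorstore_name}/recall for the current vectorstore-scoped route):
import requests

resp = requests.post("http://localhost:5001/api/klines/recall",
                     json={"query": "restaurant preferences for dinner",
                           "top_k": 10,
                           "use_llm_filtering": True})
result = resp.json()

# Only memories the LLM judged relevant remain, each annotated with relevance_reasoning
print(result["mental_state"])
for memory in result.get("memories", []):
    print(memory["text"], "->", memory.get("relevance_reasoning"))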
Answer questions using K-line construction and sophisticated reasoning with confidence analysis.
POST /api/klines/ask
Request Body:
{
"question": "What restaurants should I try for dinner tonight?",
"top_k": 5,
"filter": "optional_filter_expression"
}
Parameters:
- question (string, required): Question to answer
- top_k (integer, optional): Number of memories to retrieve for context (default: 5)
- filter (string, optional): Filter expression for Redis VSIM command
Response:
{
"success": true,
"question": "What restaurants should I try for dinner tonight?",
"type": "answer",
"answer": "Based on your preferences, I'd recommend trying Italian restaurants with good pasta, as you've expressed a preference for this type of cuisine.",
"confidence": "medium",
"reasoning": "Your memory indicates a preference for Italian restaurants with good pasta",
"supporting_memories": [
{
"id": "memory:550e8400-e29b-41d4-a716-446655440000",
"text": "I prefer Italian restaurants with good pasta",
"relevance_score": 95.2,
"timestamp": "2024-01-15 18:30:00",
"tags": ["restaurant", "food", "preferences"],
"relevance_reasoning": "Directly relates to restaurant preferences for dining"
}
]
}
This endpoint provides a conversational interface using the LangGraph workflow for complex multi-step reasoning.
POST /api/chat
Request Body:
{
"message": "What restaurants have I been to and what did I eat at each?"
}
Parameters:
- message (string, required): User message/question
Response:
{
"success": true,
"message": "What restaurants have I been to and what did I eat at each?",
"response": "Based on your memories, you've been to Mario's Italian Restaurant where you had pasta carbonara. The LangGraph workflow analyzed your memories and found this information with high confidence."
}
Key Differences from /api/memory/answer:
- Uses full LangGraph workflow with tool orchestration
- Can perform multi-step reasoning and complex conversations
- More flexible but potentially higher latency
- Best for conversational UIs and complex queries
GET /api/health
Response:
{
"status": "healthy",
"service": "LangGraph Memory Agent API",
"timestamp": "2024-01-15T18:30:00Z"
}
All endpoints return consistent error responses:
{
"error": "Description of the error",
"success": false
}
Common HTTP status codes:
- 400 - Bad Request (missing required parameters)
- 404 - Not Found (memory ID not found)
- 500 - Internal Server Error
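One possible client-side pattern for handling these errors, sketched in Python; the call_memory_api helper is illustrative, not part of this project.
import requests

def call_memory_api(path, payload):
    """Illustrative helper: surface both HTTP errors and the API's own success flag."""
    resp = requests.post(f"http://localhost:5001{path}", json=payload)
    body = resp.json()
    if resp.status_code != 200 or not body.get("success", False):
        raise RuntimeError(body.get("error", f"HTTP {resp.status_code}"))
    return body

try:
    call_memory_api("/api/memory/search", {})  # missing required "query" -> 400
except RuntimeError as exc:
    print("API error:", exc)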
# Store a memory
curl -X POST http://localhost:5001/api/memory \
-H "Content-Type: application/json" \
-d '{"text": "I love pizza with pepperoni"}'
# Search for memories
curl -X POST http://localhost:5001/api/memory/search \
-H "Content-Type: application/json" \
-d '{"query": "pizza", "top_k": 3}'
# Answer a question
curl -X POST http://localhost:5001/api/memory/answer \
-H "Content-Type: application/json" \
-d '{"question": "What kind of pizza do I like?"}'
# Set context first
curl -X POST http://localhost:5001/api/memory/context \
-H "Content-Type: application/json" \
-d '{
"location": "New York",
"activity": "dining",
"people_present": ["Alice", "Bob"]
}'
# Store memory (will be grounded with context)
curl -X POST http://localhost:5001/api/memory \
-H "Content-Type: application/json" \
-d '{"text": "We had an amazing dinner here"}'
# Complex conversational query
curl -X POST http://localhost:5001/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "Tell me about all my restaurant experiences and what I learned about food preferences"}'
# Store memories
Memory Agent> remember "I went to the mall to visit New Planet store"
Memory Agent> remember "Had lunch at Olive Garden with Sarah"
# Search memories
Memory Agent> recall "I'm going to the mall"
Memory Agent> recall "Where did I eat?"
# Ask questions
Memory Agent> ask "How much did I spend on my laptop?"
Memory Agent> ask "What did I eat with Sarah?"
# Start the web UI
python web_app.py
# Open browser to http://localhost:5001
# Use the clean, modern interface to:
# - Store memories with the text area
# - Search memories with similarity scores
# - Ask questions and get AI-powered answers
Use /api/memory/search when you need:
- Raw vector similarity search results
- Multiple memory candidates for further processing
- Building your own confidence analysis
Use /api/memory/answer when you need:
- High-quality question answering with confidence scores
- Structured responses with supporting evidence
- Single-step question answering
Use /api/chat when you need:
- Multi-step reasoning and complex conversations
- Tool orchestration and workflow management
- Conversational UI interfaces
- /api/memory/search: Fastest, single vector search
- /api/memory/answer: Medium, includes LLM analysis for confidence
- /api/chat: Slowest, full LangGraph workflow with potentially multiple LLM calls
When apply_grounding: true (default), the system will:
- Convert relative time references ("yesterday", "last week") to absolute dates
- Resolve location context ("here", "this place") using current context
- Add people context based on current social setting
Set apply_grounding: false for raw memory storage without context resolution.
The filter parameter supports Redis VectorSet filter syntax:
# Filter by exact match
"filter": ".category == \"work\""
# Filter by range
"filter": ".priority >= 5"
# Multiple conditions
"filter": ".category == \"work\" and .priority >= 5"
# Array containment
"filter": ".tags in [\"important\", \"urgent\"]"
All memory operations now require a vectorstore_name path parameter for proper memory isolation:
Current API Structure:
- Store memory: POST /api/memory/{vectorstore_name}
- Search memories: POST /api/memory/{vectorstore_name}/search
- Get memory info: GET /api/memory/{vectorstore_name}
- Set context: POST /api/memory/{vectorstore_name}/context
- Get context: GET /api/memory/{vectorstore_name}/context
- Delete memory: DELETE /api/memory/{vectorstore_name}/{memory_id}
- Clear all: DELETE /api/memory/{vectorstore_name}/all
- K-lines recall: POST /api/klines/{vectorstore_name}/recall
- K-lines ask: POST /api/klines/{vectorstore_name}/ask
Example: Use memories as the default vectorstore name:
curl -X POST http://localhost:5001/api/memory/memories \
-H "Content-Type: application/json" \
-d '{"text": "I love pizza"}'
The API provides proper memory isolation with vectorstore-specific operations and cleaner RESTful endpoints.
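To illustrate the isolation, here is a Python sketch that stores memories under two hypothetical vectorstore names (agent_alice and agent_bob are made up for this example) and searches only one of them; a local server on port 5001 is assumed.
import requests

BASE = "http://localhost:5001"

# Each vectorstore name is an isolated memory space
requests.post(f"{BASE}/api/memory/agent_alice", json={"text": "Alice prefers window seats"})
requests.post(f"{BASE}/api/memory/agent_bob", json={"text": "Bob is allergic to peanuts"})

# Searching one vectorstore never returns the other's memories
resp = requests.post(f"{BASE}/api/memory/agent_alice/search",
                     json={"query": "seating preferences", "top_k": 3})
print(resp.json().get("memories"))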
The system now includes a comprehensive Configuration Management API that allows runtime configuration of all system components:
- GET /api/config - Get current system configuration
- PUT /api/config - Update configuration settings
- POST /api/config/test - Test configuration changes without applying them
- POST /api/config/reload - Reload configuration and restart memory agent
- Redis: host, port, database, vector set name
- Embedding: provider (OpenAI/Ollama), model, dimensions, API settings
- OpenAI: API key, models, temperature, embedding settings (deprecated)
- LangGraph: model selection, temperature, system prompts
- Memory Agent: search parameters, grounding settings, validation
- Web Server: host, port, debug mode, CORS settings
curl -X PUT http://localhost:5001/api/config \
-H "Content-Type: application/json" \
-d '{
"redis": {
"host": "redis.example.com",
"port": 6379,
"db": 1
}
}'
# Test the configuration first
curl -X POST http://localhost:5001/api/config/test \
-H "Content-Type: application/json" \
-d '{"redis": {"host": "redis.example.com", "port": 6379}}'
# Apply changes
curl -X POST http://localhost:5001/api/config/reload
# Configure Ollama embedding provider
curl -X PUT http://localhost:5001/api/config \
-H "Content-Type: application/json" \
-d '{
"embedding": {
"provider": "ollama",
"model": "nomic-embed-text",
"dimension": 768,
"base_url": "http://localhost:11434"
}
}'
# Test embedding configuration
curl -X POST http://localhost:5001/api/config/test \
-H "Content-Type: application/json" \
-d '{
"embedding": {
"provider": "ollama",
"model": "nomic-embed-text",
"dimension": 768,
"base_url": "http://localhost:11434"
}
}'
# Configure OpenAI embedding provider
curl -X PUT http://localhost:5001/api/config \
-H "Content-Type: application/json" \
-d '{
"embedding": {
"provider": "openai",
"model": "text-embedding-3-small",
"dimension": 1536,
"api_key": "your-openai-api-key"
}
}'
See CONFIG_API.md for complete documentation and examples.
Run the test scripts to verify everything works:
# Test core memory functionality
python test_memory_agent.py
# Test configuration management API
python test_config_api.py
See setup_redis.md for detailed Redis setup instructions.
MIT License