Caution
Actively Refactoring
Flemma (formerly Claudius) is in the middle of a large-scale rename and architecture refresh. Expect new functionality, renamed modules, and occasional breaking changes while the project settles. Pin a commit if you need a steady target.
Flemma turns Neovim into a first-class AI workspace. It gives .chat buffers streaming conversations, reusable prompt templates, attachment support, cost tracking, and ergonomic commands for the three major providers: Anthropic Claude, OpenAI, and Google Vertex AI.
Q: What is this and who is it for?
A: Flemma is not a coding assistant. I [
@StanAngeloff] created Flemma as my AI workspace for everything else. [continued]
Flemma is for the technical writers, researchers, creators, and tinkerers, for those who occasionally get in hot water and need advice. It's for everyone who wants to experiment with AI.
Q: Why Flemma and not X or Y? (where X = Claude Workbench, Y = ChatGPT, etc.)
A: The terminal and Neovim are where I spend most of my time. I needed a plug-in that would maximize my productivity and let me experiment with multiple models. I needn't worry about [continued]
…accidentally pressing <C-R> and refreshing the page midway through a prompt (or <C-W> trying to delete a word)… or Chrome sending a tab to sleep whilst I had an unsaved session… or having to worry about whether files I shared with Claude Workbench were stored on some Anthropic server indefinitely. I can be fast! I can be reckless! I can tinker! I can use my Vim keybindings and years of muscle memory!
If I have an idea, it's a buffer away. Should I want to branch off and experiment, I'd duplicate the .chat file and go in a different direction. Is the conversation getting too long? I'd summarize a set of instructions and start with them in a new .chat file, then share them each time I need a fresh start. Need backups or history? I have Git for that.
Q: What can I use Flemma for?
A: Flemma is versatile - I'm personally using it mostly professionally and occasionally for personal tasks. Over the last 6+ months since Flemma was created, I've used it to [continued]
- Write countless technical documents, from PRDs (Product Requirements Document), AKM (Architecture Knowledge Management), infrastructure and architecture diagrams with Mermaid, detailed storyboards for LMS content, release notes, FR (Functional Requirements), etc.
- Write detailed software design documents using Figma designs as input and the cheap OCR capabilities of Gemini Flash to annotate them, then the excellent reasoning capabilities of Gemini Pro to generate storyboards and interaction flows.
- Record video sessions which I later transcribed using Whisper and then turned into training materials using Flemma.
- Generate client-facing documentation from very technical input, stripping it of technical jargon and making it accessible to a wider audience.
- Create multiple SOW (Statement of Work) documents for clients.
- Keep track of evolving requirements and decisions by maintaining a long history of meeting minutes.
- Collect large swaths of emails, meeting minutes, Slack conversations, Trello cards, and distill them into actionable tasks and project plans.
- As a tool for other AI agents - generate prompts for Midjourney, Reve, etc. and even prompts that I'd feed to different
.chatbuffers in Flemma.
There really is no limit to what you can do with Flemma - if you can write it down and reason about it, you can use Flemma to help you with it.
On a personal level, I've used Flemma to generate bedtime stories with recurring characters for my kids, made small financial decisions based on collected evidence, asked for advice on how to respond to difficult situations, consulted (usual disclaimer, blah blah) it for legal advice and much more.
Flemma can also be a playground for coding experiments - it can help with the occasional small task. I've personally used it to generate Awk scripts, small Node.js jobs, etc. Flemma is not a coding assistant or agent. It's not pretending to be one and it'll never be one. You should keep your Codex, Claude Code, etc. for that purpose - and they'll do a great job at it.
- Multi-provider chat – work with Claude, OpenAI, and Vertex models through one command tree while keeping prompts in plain
.chatbuffers. .chatediting tools – get markdown folding, visual rulers,<thinking>highlighting, and message text objects tuned for chat transcripts.- Structured templates – combine Lua or JSON frontmatter, inline
{{ expressions }}, andinclude()helpers to assemble prompts without leaving Neovim. - Context attachments – reference local files with
@./path; Flemma handles MIME detection and surfaces warnings when a provider can’t ingest the asset. - Reasoning visibility – stream Vertex thinking blocks into the buffer, expose OpenAI reasoning effort in lualine, and strip thought traces from the history sent back to models.
- Usage reporting – per-request and session notifications show token totals and costs using the bundled pricing tables.
- Presets and hooks – store favourite provider configurations, run
on_request_*callbacks, auto-write finished chats, and recall the latest usage notification when auditing work. - Contributor tooling – toggle structured logs, drop into the project’s Nix dev shell, and run the bundled headless tests without extra setup.
Flemma works with any plugin manager. With lazy.nvim you only need to declare the plugin – opts = {} triggers require("flemma").setup({}) automatically:
{
"Flemma-Dev/flemma.nvim",
opts = {},
}For managers that do not wire opts, call require("flemma").setup({}) yourself after the plugin is on the runtime path.
| Requirement | Why it matters |
|---|---|
| Neovim 0.11 or newer | Uses Tree-sitter folding APIs introduced in 0.11 and relies on vim.fs helpers. |
curl |
Streaming is handled by spawning curl with Server-Sent Events enabled. |
| Markdown Tree-sitter grammar | Flemma registers .chat buffers to reuse the markdown parser for syntax highlighting and folding. |
file CLI (optional but recommended) |
Provides reliable MIME detection for @./path attachments. When missing, extensions are used as a best effort. |
| Provider | Environment variable | Notes |
|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY |
|
| OpenAI | OPENAI_API_KEY |
Supports GPT‑5 family, including reasoning effort settings. |
| Google Vertex AI | VERTEX_AI_ACCESS_TOKEN or service-account credentials |
Requires additional configuration (see below). |
Linux keyring setup (Secret Service)
When environment variables are absent Flemma looks for secrets in the Secret Service keyring. Store them once and every Neovim instance can reuse them:
secret-tool store --label="Claude API Key" service anthropic key api
secret-tool store --label="OpenAI API Key" service openai key api
secret-tool store --label="Vertex AI Service Account" service vertex key api project_id your-gcp-projectVertex AI service-account flow
- Create a service account in Google Cloud and grant it the Vertex AI user role.
- Download its JSON credentials and either:
- export them via
VERTEX_SERVICE_ACCOUNT='{"type": "..."}', or - store them in the Secret Service entry above (the JSON is stored verbatim).
- export them via
- Ensure the Google Cloud CLI is on your
$PATH; Flemma shells out togcloud auth application-default print-access-tokenwhenever it needs to refresh the token. - Set the project/location in configuration or via
:Flemma switch vertex gemini-2.5-pro project_id=my-project location=us-central1.
[!NOTE] If you only supply
VERTEX_AI_ACCESS_TOKEN, Flemma uses that token until it expires and skipsgcloud.
-
Configure the plugin:
require("flemma").setup({})
-
Create a new file that ends with
.chat. Flemma only activates on that extension. -
Type a message, for example:
@You: Turn the notes below into a short project update. - Added Vertex thinking budget support. - Refactored :Flemma command routing. - Documented presets in the README.
-
Press Ctrl-] (normal or insert mode) or run
:Flemma send. Flemma freezes the buffer while the request is streaming and shows@Assistant: Thinking.... -
When the reply finishes, a floating notification lists token counts and cost for the request and the session.
Cancel an in-flight response with Ctrl-c or :Flemma cancel.
Tip
Legacy commands (:FlemmaSend, :FlemmaCancel, …) still work but forward to the new command tree with a deprecation notice.
```lua
release = {
version = "v25.10-1",
focus = "command presets and UI polish",
}
notes = [[
- Presets appear first in :Flemma switch completion.
- Thinking tags have dedicated highlights.
- Logging toggles now live under :Flemma logging:*.
]]
```
@System: You turn engineering notes into concise changelog entries.
@You: Summarise {{release.version}} with emphasis on {{release.focus}} using the points below:
{{notes}}
@Assistant:
- Changelog bullets...
- Follow-up actions...
<thinking>
Model thoughts stream here and auto-fold.
</thinking>- Frontmatter sits on the first line and must be fenced with triple backticks. Lua and JSON parsers ship with Flemma; you can register more via
flemma.frontmatter.parsers.register("yaml", parser_fn). - Messages begin with
@System:,@You:, or@Assistant:. The parser is whitespace-tolerant and handles blank lines between messages. - Thinking blocks appear only in assistant messages. Vertex AI models stream
<thinking>sections; Flemma folds them automatically and keeps dedicated highlights for the tags and body.
| Fold level | What folds | Why |
|---|---|---|
| Level 3 | The frontmatter block | Keep templates out of the way while you focus on chat history. |
| Level 2 | <thinking>...</thinking> |
Reasoning traces are useful, but often secondary to the answer. |
| Level 1 | Each message | Collapse long exchanges without losing context. |
Toggle folds with your usual mappings (za, zc, etc.). The fold text shows a snippet of the hidden content so you know whether to expand it.
Between messages, Flemma draws a ruler using the configured ruler.char and highlight. This keeps multi-step chats legible even with folds open.
Inside .chat buffers Flemma defines:
]m/[m– jump to the next/previous message header.im/am(configurable) – select the inside or entire message as a text object. Thinking blocks are skipped so yankingimnever includes<thinking>sections unintentionally.- Buffer-local mappings for send/cancel default to
<C-]>and<C-c>in normal mode. Insert-mode<C-]>stops insert, sends, and re-enters insert when the response finishes.
Disable or remap these through the keymaps section (see Configuration reference).
Use the single entry point :Flemma {command}. Autocompletion lists every available sub-command.
| Command | Purpose | Example |
|---|---|---|
:Flemma send [key=value …] |
Send the current buffer. Optional callbacks run before/after the request. | :Flemma send on_request_start=stopinsert on_request_complete=startinsert! |
:Flemma cancel |
Abort the active request and clean up the spinner. | |
:Flemma switch … |
Choose or override provider/model parameters. | See below. |
:Flemma message:next / :Flemma message:previous |
Jump through message headers. | |
:Flemma logging:enable / :…:disable / :…:open |
Toggle structured logging and open the log file. | |
:Flemma notification:recall |
Reopen the last usage/cost notification. | |
:Flemma import |
Convert Anthropics Claude Workbench code snippets into .chat format. |
:Flemma switch(no arguments) opens twovim.ui.selectpickers: first provider, then model.:Flemma switch openai gpt-5 temperature=0.3changes provider, model, and overrides parameters in one go.:Flemma switch vertex project_id=my-project location=us-central1 thinking_budget=4096demonstrates long-form overrides. Anything that looks likekey=valueis accepted; unknown keys are passed to the provider for validation.
Define reusable setups under the presets key. Preset names must begin with $; completions prioritise them above built-in providers.
require("flemma").setup({
presets = {
["$fast"] = "vertex gemini-2.5-flash temperature=0.2",
["$review"] = {
provider = "claude",
model = "claude-sonnet-4-5",
max_tokens = 6000,
},
},
})Switch using :Flemma switch $fast or :Flemma switch $review temperature=0.1 to override individual values.
| Provider | Defaults | Extra parameters | Notes |
|---|---|---|---|
| Claude | claude-sonnet-4-0 |
Standard max_tokens, temperature, timeout, connect_timeout. |
Supports text, image, and PDF attachments. |
| OpenAI | gpt-5 |
reasoning=<low|medium|high> toggles reasoning effort. When set, lualine includes the reasoning level and Flemma keeps your configured max_tokens aligned with OpenAI’s completion limit automatically. |
Cost notifications include reasoning tokens. |
| Vertex AI | gemini-2.5-pro |
project_id (required), location (default global), thinking_budget enables streamed <thinking> traces. |
thinking_budget ≥ 1 activates Google’s experimental thinking output; set to 0 or nil to disable. |
The full model cataloguel (including pricing) is in lua/flemma/models.lua. You can access it from Neovim with:
:lua print(vim.inspect(require("flemma.provider.config").models))Flemma’s prompt pipeline runs through three stages: parse, evaluate, and send. Errors at any stage surface via diagnostics before the request leaves your editor.
- Place a fenced block on the first line (
```luaor```json). - Return a table of variables to inject into the template environment.
- Errors (syntax problems, missing parser) block the request and show in a detailed notification with filename and line number.
```lua
recipient = "QA team"
notes = [[
- Verify presets list before providers.
- Check spinner no longer triggers spell checking.
- Confirm logging commands live under :Flemma logging:*.
]]
```Use {{ expression }} inside any non-assistant message. Expressions run in a sandbox that exposes:
- Standard Lua libs (
string,table,math,utf8). vim.fn(fnamemodify,getcwd) andvim.fs(normalize,abspath).- Variables returned from frontmatter.
Outputs are converted to strings. Tables are JSON-encoded automatically.
@You: Draft a short update for {{recipient}} covering:
{{notes}}Errors in expressions are downgraded to warnings. The request still sends, and the literal {{ expression }} remains in the prompt so you can see what failed.
Call include("relative/or/absolute/path") inside frontmatter or an expression to inline another template fragment. Includes are evaluated in isolation (they do not inherit your variables) and support their own {{ }} and @./ references.
Guards in place:
- Relative paths resolve against the file that called
include(). - Circular includes raise a descriptive error with the include stack.
- Missing files or read errors raise warnings that block the request.
Flemma groups diagnostics by type in the notification shown before sending:
- Frontmatter errors (blocking) – malformed code, unknown parser, include issues.
- Expression warnings (non-blocking) – runtime errors during
{{ }}evaluation. - File reference warnings (non-blocking) – missing files, unsupported MIME types.
If any blocking error occurs the buffer becomes modifiable again and the request is cancelled before hitting the network.
Embed local context with @./relative/path (or @../up-one/path). Flemma handles:
- Resolving the path against the
.chatfile (after decoding URL-escaped characters like%20). - Detecting the MIME type via
fileor the extension fallback. - Streaming the file in the provider-specific format.
Examples:
@You: Critique @./patches/fix.lua;type=text/x-lua.
@You: OCR this screenshot @./artifacts/failure.png.
@You: Compare these specs: @./specs/v1.pdf and @./specs/v2.pdf.Trailing punctuation such as . or ) is ignored so you can keep natural prose. To coerce a MIME type, append ;type=<mime> as in the Lua example above.
| Provider | Text files | Images | PDFs | Behaviour when unsupported |
|---|---|---|---|---|
| Claude | Embedded as plain text parts | Uploaded as base64 image parts | Sent as document parts | The literal @./path is kept and a warning is shown. |
| OpenAI | Embedded as text parts | Sent as image_url entries with data URLs |
Sent as file objects |
Unsupported types become plain text with a diagnostic. |
| Vertex AI | Embedded as text parts | Sent as inlineData |
Sent as inlineData |
Falls back to text with a warning. |
If a file cannot be read or the provider refuses its MIME type, Flemma warns you (including line number) and continues with the raw reference so you can adjust your prompt.
Each completed request emits a floating report that names the provider/model, lists input/output tokens (reasoning tokens are counted under ⊂ thoughts), and – when pricing is enabled – shows the per-request and cumulative session cost derived from lua/flemma/models.lua. Token accounting persists for the lifetime of the Neovim instance; call require("flemma.state").reset_session() if you need to zero the counters without restarting. pricing.enabled = false suppresses the dollar amounts while keeping token totals for comparison.
Flemma keeps the most recent notification available via :Flemma notification:recall, which helps when you close the floating window before capturing the numbers. Logging lives in the same subsystem: toggle it with :Flemma logging:enable / :Flemma logging:disable and open the log file (~/.local/state/nvim/flemma.log or your stdpath("cache")) through :Flemma logging:open whenever you need the redacted curl command and streaming trace.
Configuration keys map to dedicated highlight groups:
| Key | Applies to |
|---|---|
highlights.system |
System messages (FlemmaSystem) |
highlights.user |
User messages (FlemmaUser) |
highlights.assistant |
Assistant messages (FlemmaAssistant) |
highlights.user_lua_expression |
{{ expression }} fragments |
highlights.user_file_reference |
@./path fragments |
highlights.thinking_tag |
<thinking> / </thinking> tags |
highlights.thinking_block |
Content inside thinking blocks |
Each value accepts a highlight name, a hex colour string, or a table of highlight attributes ({ fg = "#ffcc00", bold = true }).
Role markers inherit role_style (comma-separated GUI attributes) so marker styling tracks your message colours.
Set signs.enabled = true to place signs for each message line. Each role (system, user, assistant) can override the character and highlight. Signs default to using the message highlight colour.
While a request runs Flemma appends @Assistant: Thinking... with an animated braille spinner. The line is flagged as non-spellable so spell check integrations stay quiet. Once streaming starts, the spinner is removed and replaced with the streamed content.
Add the bundled component to show the active model (and reasoning effort when set):
require("lualine").setup({
sections = {
lualine_x = {
{ "flemma", icon = "🧠" },
"encoding",
"filetype",
},
},
})The component only renders in chat buffers. Switching providers or toggling OpenAI reasoning effort causes Flemma to refresh lualine automatically.
Flemma works without arguments, but every option can be overridden:
require("flemma").setup({
provider = "claude",
model = nil, -- provider default
parameters = {
max_tokens = 4000,
temperature = 0.7,
timeout = 120,
connect_timeout = 10,
vertex = {
project_id = nil,
location = "global",
thinking_budget = nil,
},
openai = {
reasoning = nil, -- "low" | "medium" | "high"
},
},
presets = {},
highlights = {
system = "Special",
user = "Normal",
assistant = "Comment",
user_lua_expression = "PreProc",
user_file_reference = "Include",
thinking_tag = "Comment",
thinking_block = "Comment",
},
role_style = "bold,underline",
ruler = { char = "━", hl = "NonText" },
signs = {
enabled = false,
char = "▌",
system = { char = nil, hl = true },
user = { char = "▏", hl = true },
assistant = { char = nil, hl = true },
},
notify = require("flemma.notify").default_opts,
pricing = { enabled = true },
text_object = "m",
editing = {
disable_textwidth = true,
auto_write = false,
},
logging = {
enabled = false,
path = vim.fn.stdpath("cache") .. "/flemma.log",
},
keymaps = {
enabled = true,
normal = {
send = "<C-]>",
cancel = "<C-c>",
next_message = "]m",
prev_message = "[m",
},
insert = {
send = "<C-]>",
},
},
})Additional notes:
editing.auto_write = truewrites the buffer after each successful request or cancellation.- Set
text_object = falseto disable the message text object entirely. notify.default_optsexposes floating-window appearance (timeout, width, border, title).logging.enabled = truestarts the session with logging already active.
Quick steps – Export the TypeScript snippet in Claude, paste it into Neovim, then run :Flemma import.
Flemma can turn Claude Workbench exports into ready-to-send .chat buffers. Follow the short checklist above when you only need a reminder; the full walkthrough below explains each step and the safeguards in place.
Before you start
:Flemma importdelegates to the current provider. Keep Claude active (:Flemma switch claude) so the importer knows how to interpret the snippet.- Use an empty scratch buffer –
Flemma importoverwrites the entire buffer with the converted chat.
Export from Claude Workbench
- Navigate to https://console.anthropic.com/workbench and open the saved prompt you want to migrate.
- Click Get code in the top-right corner, then switch the language dropdown to TypeScript. The importer expects the
anthropic.messages.create({ ... })call produced by that export. - Press Copy code; Claude copies the whole TypeScript example (including the
import Anthropic from "@anthropic-ai/sdk"header).
Convert inside Neovim
- In Neovim, paste the snippet into a new buffer (or delete any existing text first).
- Run
:Flemma import. The command:- Scans the buffer for
anthropic.messages.create(...). - Normalises the JavaScript object syntax and decodes it as JSON.
- Emits a system message (if present) and rewrites every Workbench message as
@You:/@Assistant:lines. - Switches the buffer's filetype to
chatso folds, highlights, and keymaps activate immediately.
- Scans the buffer for
Troubleshooting
- If the snippet does not contain an
anthropic.messages.createcall, the importer aborts with “No Claude API call found”. - JSON decoding errors write both the original snippet and the cleaned JSON to
flemma_import_debug.login your temporary directory (e.g./tmp/flemma_import_debug.log). Open that file to spot mismatched brackets or truncated copies. - Nothing happens? Confirm Claude is the active provider – other providers currently do not ship an importer.
The repository provides a Nix shell so everyone shares the same toolchain:
nix developInside the shell you gain convenience wrappers:
flemma-fmt– runnixfmt,stylua, andprettieracross the repo.flemma-amp– open the Amp CLI, preconfigured for this project.flemma-codex– launch the OpenAI Codex helper.
Run the automated tests with:
make testThe suite boots headless Neovim via tests/minimal_init.lua and executes Plenary+Busted specs in tests/flemma/, printing detailed results for each spec so you can follow along.
To exercise the plugin without installing it globally:
nvim --cmd "set runtimepath+=`pwd`" \
-c 'lua require("flemma").setup({})' \
-c ':edit scratch.chat'Note
Almost every line of code in Flemma has been authored through AI pair-programming tools (Aider, Amp, and Codex). Traditional contributions are welcome – just keep changes focused, documented, and tested.
- Nothing happens when I send: confirm the buffer name ends with
.chatand the first message starts with@You:or@System:. - Frontmatter errors: notifications list the exact line and file. Fix the error and resend; Flemma will not contact the provider until the frontmatter parses cleanly.
- Attachments ignored: ensure the file exists relative to the
.chatfile and that the provider supports its MIME type. Use;type=to override when necessary. - Vertex refuses requests: double-check
parameters.vertex.project_idand authentication. Rungcloud auth application-default print-access-tokenmanually to ensure credentials are valid. - Keymaps clash: disable built-in mappings via
keymaps.enabled = falseand register your own:Flemmacommands.
Happy prompting!
