🖇️ feat: Send Attachments Directly to Providers #9088

Open: 7 commits to merge into base branch dev


@dustinhealy (Collaborator) commented Aug 16, 2025

Summary

This pull request adds support for sending attachments directly to LLM providers through the new multimodal 'Upload to Provider' button in the AttachFileMenu component. The supported endpoints are Anthropic, OpenAI, AzureOpenAI, and Google, along with Agents built on those endpoints.

All four providers now support PDF attachments in addition to their existing image support. Google conversations additionally support video and audio attachments when used with compatible multimodal models. To enable this, new encoding and formatting utilities have been introduced for these file types, and the agent client has been updated to categorize, process, and validate all supported attachments. Two new SVGs have also been added to display attached video and audio files in chat.
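
As a rough illustration of the categorization step described above, here is a minimal sketch. The function names, type names, and endpoint keys are hypothetical and are not taken from this PR's code; only the provider/file-type matrix (images and PDFs everywhere, video and audio on Google only) comes from the summary.

```typescript
// Hypothetical names throughout; only the support matrix reflects the PR summary.
type Provider = 'anthropic' | 'openAI' | 'azureOpenAI' | 'google';
type AttachmentKind = 'image' | 'document' | 'video' | 'audio';

// Map a MIME type to a coarse attachment kind, or null if unrecognized.
function kindFromMime(mimeType: string): AttachmentKind | null {
  if (mimeType.startsWith('image/')) return 'image';
  if (mimeType === 'application/pdf') return 'document';
  if (mimeType.startsWith('video/')) return 'video';
  if (mimeType.startsWith('audio/')) return 'audio';
  return null;
}

// Which kinds each provider accepts for direct attachment.
const SUPPORTED_KINDS: Record<Provider, ReadonlySet<AttachmentKind>> = {
  anthropic: new Set<AttachmentKind>(['image', 'document']),
  openAI: new Set<AttachmentKind>(['image', 'document']),
  azureOpenAI: new Set<AttachmentKind>(['image', 'document']),
  google: new Set<AttachmentKind>(['image', 'document', 'video', 'audio']),
};

// Return the attachment kind if the provider can take it directly, else null.
function categorizeAttachment(provider: Provider, mimeType: string): AttachmentKind | null {
  const kind = kindFromMime(mimeType);
  return kind !== null && SUPPORTED_KINDS[provider].has(kind) ? kind : null;
}
```

A caller could route a `null` result to the existing upload path (file search or OCR) instead of attaching the file directly.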

A small change in PR #14 of the agents repo supports the new audio and video file types for Google conversations.

For more specific information about the changes made, the constituent PRs adding support for each endpoint can be referenced individually:

Change Type

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have made pertinent documentation changes
  • My changes do not introduce new warnings
  • Local unit tests pass with my changes
  • A pull request for updating the documentation has been submitted.

@dustinhealy changed the title from "WIP: Direct Provider Uploads" to "WIP: Send Attachments Directly to Providers" on Aug 16, 2025
@dustinhealy force-pushed the feat/direct-provider-upload-v2 branch from e7412bd to 2c09ca4 on August 17, 2025 07:30
dustinhealy and others added 4 commits August 18, 2025 05:50
* feat: implement Anthropic native PDF support with document preservation

- Add comprehensive debug logging throughout PDF processing pipeline
- Refactor attachment processing to separate image and document handling
- Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
- Fix critical bugs in stream handling and parameter passing
- Add streamToBuffer utility for proper stream-to-buffer conversion
- Remove api/agents submodule from repository

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* chore: remove out of scope formatting changes

* fix: stop duplication of file in chat on end of response stream

* chore: bring back file search and ocr options

* chore: localize upload to provider string in file menu

* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support

* feat: add cache point for pdfs processed by anthropic endpoint since they are unlikely to change and should benefit from caching

* feat: combine Upload Image into Upload to Provider since they both perform direct upload and change provider upload icon to reflect multimodal upload

* feat: add citations support according to docs

* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now

* refactor: change upload logic so anthropic endpoint isn't exempted from normal upload path using Agents for consistency with the rest of the upload logic

* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs

* chore: remove client specific handling since the direct provider stuff is handled by the agent client

* feat: handle documents in AgentClient so no need for change to agents repo

* chore: removed unused changes

* chore: remove auto generated comments from OG commit

* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)

* fix: reintroduce role check to fix render error because of undefined value for Content Part

* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent

---------

Co-authored-by: Andres Restrepo <[email protected]>
Co-authored-by: Claude <[email protected]>
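
For readers unfamiliar with what the commits above produce, a base64 PDF content block for Anthropic might look like the sketch below. The field names follow the shape of Anthropic's Messages API as publicly documented (`document` block, `cache_control`, `citations`), to the best of my understanding. The helper `buildPdfBlock` and its options are hypothetical and are not this PR's actual implementation.

```typescript
// Shape of a PDF document block as described in Anthropic's Messages API docs.
interface AnthropicDocumentBlock {
  type: 'document';
  source: { type: 'base64'; media_type: 'application/pdf'; data: string };
  cache_control?: { type: 'ephemeral' };
  citations?: { enabled: boolean };
}

// Hypothetical helper: wraps an already base64-encoded PDF in a content block.
function buildPdfBlock(
  base64Pdf: string,
  opts: { cache?: boolean; citations?: boolean } = {},
): AnthropicDocumentBlock {
  const block: AnthropicDocumentBlock = {
    type: 'document',
    source: { type: 'base64', media_type: 'application/pdf', data: base64Pdf },
  };
  // PDFs rarely change between turns, so a cache point can be worthwhile.
  if (opts.cache) block.cache_control = { type: 'ephemeral' };
  // Citations are opt-in per document.
  if (opts.citations) block.citations = { enabled: true };
  return block;
}
```

The block would sit alongside the text part in the user message's `content` array.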
* refactor: change references from direct upload to direct attach to better reflect functionality

Since we now send attachments to the provider with a base64 encoding strategy rather than the Files/File API, the "upload" nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers anyway, even if we later introduce direct upload support.

* feat: add upload to provider option for openai (and agent) ui

* chore: move anthropic pdf validator over to packages/api

* feat: simple pdf validation according to openai docs

* feat: add provider agnostic validatePdf logic to start handling multiple endpoints

* feat: add handling for openai specific documentPart formatting

* refactor: move require statement to proper place at top of file

* chore: add in openAI endpoint for the rest of the document handling logic

* feat: add direct attach support for azureOpenAI endpoint and agents

* feat: add pdf validation for azureOpenAI endpoint

* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint

* refactor: consolidate Upload to Provider vs Upload image logic for clarity

* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now
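
A provider-agnostic PDF check of the kind these commits describe could be sketched as follows. The names `validatePdf` and `isDocumentSupportedEndpoint` appear in the commit messages, but the signatures, return shape, and size limits here are assumptions for illustration. In particular, the byte limits are placeholders, not values taken from any provider's documentation.

```typescript
// Endpoints covered at this point in the PR (Google is added by a later commit).
type DocumentEndpoint = 'anthropic' | 'openAI' | 'azureOpenAI';

// Placeholder limits for illustration only; consult each provider's docs.
const MAX_PDF_BYTES: Record<DocumentEndpoint, number> = {
  anthropic: 32 * 1024 * 1024,
  openAI: 32 * 1024 * 1024,
  azureOpenAI: 32 * 1024 * 1024,
};

// Assumed signature: narrows an arbitrary endpoint string to a supported one.
function isDocumentSupportedEndpoint(endpoint: string): endpoint is DocumentEndpoint {
  return Object.prototype.hasOwnProperty.call(MAX_PDF_BYTES, endpoint);
}

// Every PDF file begins with the ASCII bytes "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Assumed return shape: a validity flag plus an optional error message.
function validatePdf(bytes: Uint8Array, endpoint: string): { isValid: boolean; error?: string } {
  if (!isDocumentSupportedEndpoint(endpoint)) {
    return { isValid: false, error: `Endpoint ${endpoint} does not accept documents` };
  }
  if (bytes.length < PDF_MAGIC.length || !PDF_MAGIC.every((b, i) => bytes[i] === b)) {
    return { isValid: false, error: 'File does not start with a PDF header' };
  }
  if (bytes.length > MAX_PDF_BYTES[endpoint]) {
    return { isValid: false, error: 'PDF exceeds the size limit for this endpoint' };
  }
  return { isValid: true };
}
```

Keeping the limits in one table means adding an endpoint is a one-line change plus any provider-specific checks.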
* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint

* feat: add proper pdf formatting for google endpoints (requires PR #14 in agents)

* feat: add multimodal support for google endpoint attachments

* feat: add audio file svg

* fix: refactor attachments logic so multi-attachment messages work properly

* feat: add video file svg

* fix: allows for followup questions of uploaded multimodal attachments

* fix: remove incorrect final message filtering that was breaking Attachment component rendering
… picked up due to case insensitivity in dir name
@dustinhealy force-pushed the feat/direct-provider-upload-v2 branch from c9173f2 to 08ddf5e on August 18, 2025 12:51
@dustinhealy changed the title from "WIP: Send Attachments Directly to Providers" to "🖇️ feat: Send Attachments Directly to Providers" on Aug 18, 2025
@dustinhealy marked this pull request as ready for review August 18, 2025 14:01
* refactor: move audio encode over to TS

* refactor: audio encoding now functional in LC again

* refactor: move video encode over to TS

* refactor: move document encode over to TS

* refactor: video encoding now functional in LC again

* refactor: document encoding now functional in LC again

* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider

* feat: only accept pdfs if responses api is enabled for openai convos