High quality resources & applications for LLMs, multi-modal models and VectorDBs
[New: PDF and Office file parsing and upload] All-scenario GPT assistant for Android, activated via the volume keys for voice interaction, with support for web access, photo capture, templates, and parsing of PDF and Office documents.
The most advanced Web UI for AI chat
Cool experiments at the intersection of Computer Vision and Sports ⚽🏃
SGPT is a command-line tool that provides a convenient way to interact with OpenAI models, enabling users to run queries, generate shell commands and produce code directly from the terminal.
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
Convert a screenshot to a working Flutter app.
A versatile multi-modal chat application that enables users to develop custom agents, create images, leverage visual recognition, and engage in voice interactions. It integrates seamlessly with local LLMs and commercial models like OpenAI, Gemini, Perplexity, and Claude, and lets users converse with uploaded documents and websites.
Maintained version of bettergpt. An amazing UI for OpenAI's ChatGPT (Website + Windows + MacOS + Linux). https://discord.gg/2CKfAbAJrH
Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦
GPT-4 Vision Chatbot examples
ChatGPT wrapper in your TTY
GPT 4 Turbo Vision with Chainlit
This sample project integrates OpenAI's GPT-4 Vision, with advanced image recognition capabilities, and DALL·E 3, the state-of-the-art image generation model, with the Chat completions API. This powerful combination allows for simultaneous image creation and analysis.
Language instructions to mycobot using GPT-4V
Curated resources about automated GUI computer use via LLMs. Highly opinionated; the focus is on quality over quantity.
This tool offers an interactive way to analyze and understand your screenshots using OpenAI's GPT-4 Vision API. Capture any part of your screen and engage in a dialogue with ChatGPT to uncover detailed insights, ask follow-up questions, and explore visual data in a user-friendly format.
Using an Azure OpenAI deployment of GPT-4 Turbo with Vision to analyse out-of-stock situations in a fictitious retail shop.