Implement Image support #36
First, thanks for this work. At a quick glance, I got a c06-image error. Also, this example does not seem to have a correct base64 image. Anyway, let me play with it to get a better gist of the changes. I wanted to add it but didn’t have the time, so this is excellent. If it aligns with the API, I will squash and merge. I might make post-commits for API changes if realignments are needed. But this looks to be a great start. Thanks.
btw, thanks for the context for
This could be seen as an asymmetry between a remote URL and a local file, but I understand your point that LLM services accept remote URLs and obviously cannot handle local files. Therefore, the goal is to have the

I see your point. However, providing a true local file reference at the

In that vein, I’m not sure if the

Either way, this doesn’t take away from the great work done here; this is awesome. Just making sure we consider all angles.
Hi @jeremychone,

Thank you for checking this out. You were correct that the example wasn't compiling. I fixed it and renamed it to use the prefix c07. Also, to make it faster to verify how this code works, I added it to the Cargo examples, so it can be started with:

cargo run --example images

As you noticed, I provided a fake Base64 string in the example. I didn't want to focus on how to properly read and encode an image, because that is out of scope for this example and would require installing some extra dev-dependencies. I also didn't want to include any picture in the project (with possible licensing problems, etc.). I switched this example to a URL and use an image provided by Wikipedia, so the example should run as is, without modification.

Looking at your other commits and the issue list, I got the same feeling: there are currently more important features to handle. Honestly, I thought it would be a harder job than it was. I was unsure where to put

I would be cautious about including file support inside the library. As I stated in the comments, this introduces another point of failure and, more importantly, a new dependency. Implementation of file handling by

I think the more important matter is that not all APIs support both image sources, and that a content type is required.

After merging, I was thinking of working on proper error handling for the most common AI providers. I saw your comments that wrong types don't raise any warning or error. If you have ideas for improvements, let me know; I would like to help with that for this feature in the future.

Unless something critical needs to be fixed, I would like to keep the changes in the current pull request to a minimum.
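For readers who want to try the Base64 variant instead of the URL one, the encoding step mentioned above can be done without extra dependencies. The following is a minimal, std-only sketch of standard padded Base64 (RFC 4648); the names `base64_encode` and `encode_file` are hypothetical helpers for illustration and are not part of this pull request or of the genai crate. In a real project a maintained crate would likely be a better choice.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Standard Base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes raw bytes as a padded Base64 string.
/// Hypothetical helper, not an API of the genai crate.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32;
        // missing bytes in the last chunk are treated as zero.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        // The first two output characters are always present.
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);

        // The last two are replaced by '=' padding when the chunk is short.
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Reads a local file and returns its contents as Base64.
/// Any I/O failure is returned to the caller, which keeps error
/// handling on the user side.
pub fn encode_file(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?;
    Ok(base64_encode(&bytes))
}
```

The resulting string is the kind of payload a Base64 image source would carry, together with a content type supplied by the caller.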
I also had my doubts. I started from an approach where
Thanks for the feedback. I hear you on the file part. I might differ, but I am okay with deferring the decision for now. So, here is the plan.
Then, I will need to decide if that warrants a genai 0.2.0 jump or not, depending on how much API change it involves. After that, I will probably squash all my commits after yours in the

Hope this makes sense. We might differ on the File issue, but that can come later. It should not be structural to the API anyway. Again, thanks for the great work.
This PR has been merged manually at 59f0b14 (GitHub did not pick up the merge, but it was). @AdamStrojek Thank you for the great work. I made some subsequent commits to add tests and some minor API modifications.
This pull request addresses a significant gap in the GenAI crate by adding initial image support for chat requests.
Key updates:
ChatMessage structure updated to support ContentPart, enabling more flexibility.

Testing and results:
I've successfully tested the API calls with various chat requests:
In all cases, the Large Language Model responded accurately to the requests. However, I encountered difficulties adhering strictly to your proposed interface due to differences in how each API handles images. As a result, I extracted a common interface that requires providing MIME/Content-Type for each image.
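To illustrate why a common interface ends up requiring a content type, here is a minimal sketch. The type and variant names below mirror the identifiers mentioned in this description (ContentPart, an image source that is either a URL or Base64), but their exact shape is an assumption made for illustration and may not match the definitions in this pull request. The helper `image_url_form` is hypothetical: it shows how a Base64 image can be turned into a `data:` URL (RFC 2397) for an API that accepts images in URL form, which is only possible when the content type is known.

```rust
/// Where the image data comes from.
/// Illustrative shape only; the crate's real definition may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ImageSource {
    /// A remote URL that the provider fetches itself.
    Url(String),
    /// Image bytes already encoded as Base64.
    Base64(String),
}

/// One part of a chat message.
/// Illustrative shape only; the crate's real definition may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    Image {
        /// MIME type such as "image/png"; required for every image.
        content_type: String,
        source: ImageSource,
    },
}

/// Hypothetical adapter helper: renders an image part as a single URL string.
/// A remote URL is passed through; Base64 data becomes a `data:` URL, which
/// needs the content type. Returns None for non-image parts.
pub fn image_url_form(part: &ContentPart) -> Option<String> {
    match part {
        ContentPart::Text(_) => None,
        ContentPart::Image {
            source: ImageSource::Url(url),
            ..
        } => Some(url.clone()),
        ContentPart::Image {
            content_type,
            source: ImageSource::Base64(data),
        } => Some(format!("data:{content_type};base64,{data}")),
    }
}
```

An API that instead takes the media type and the Base64 data as separate fields would read `content_type` and the payload directly, so carrying the content type on every image covers both styles.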
Future possibilities:
This interface could be used later to enable PDF support for the Anthropic API.
Remaining work:
Two aspects require further attention:
TryFrom<File/PathBuf> for ContentPart, so handling this error would be on the user side
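
As a rough idea of what handling this on the user side could look like, here is a std-only sketch. `LocalImage` and `LocalImageError` are hypothetical names invented for this example; they are not types from this pull request. The conversion guesses the content type from the file extension (a simplification, since an extension can be wrong) and surfaces read failures to the caller instead of hiding them in the library.

```rust
use std::convert::TryFrom;
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Errors a caller has to handle when loading a local image.
/// Hypothetical type for illustration.
#[derive(Debug)]
pub enum LocalImageError {
    /// The file could not be read.
    Io(io::Error),
    /// The extension does not map to a known image content type.
    UnsupportedExtension(String),
}

impl fmt::Display for LocalImageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LocalImageError::Io(err) => write!(f, "cannot read image: {err}"),
            LocalImageError::UnsupportedExtension(ext) => {
                write!(f, "unsupported image extension: {ext:?}")
            }
        }
    }
}

impl std::error::Error for LocalImageError {}

/// Raw image bytes plus the content type inferred from the file name.
/// Hypothetical type for illustration.
#[derive(Debug)]
pub struct LocalImage {
    pub content_type: &'static str,
    pub bytes: Vec<u8>,
}

impl TryFrom<&Path> for LocalImage {
    type Error = LocalImageError;

    fn try_from(path: &Path) -> Result<Self, Self::Error> {
        // Infer the content type from the (lowercased) extension.
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .unwrap_or("")
            .to_ascii_lowercase();
        let content_type = match ext.as_str() {
            "png" => "image/png",
            "jpg" | "jpeg" => "image/jpeg",
            "gif" => "image/gif",
            "webp" => "image/webp",
            _ => return Err(LocalImageError::UnsupportedExtension(ext)),
        };
        // Reading can fail; the error is returned rather than swallowed.
        let bytes = fs::read(path).map_err(LocalImageError::Io)?;
        Ok(LocalImage { content_type, bytes })
    }
}
```

A caller would then Base64-encode `bytes` and build the image part with `content_type`, deciding for itself how to react to each error variant.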