Add OpenAI, process PDF sample #165
👋 Thanks for contributing @elbruno! We will review the pull request and get back to you soon.
Pull Request Overview
This PR adds a new PDF file processing sample project that uses Semantic Kernel with OpenAI, and updates the solution to include it.
- Introduces the `OpenAI-FileProcessing-Pdf-01` project to extract structured contract data from PDFs.
- Updates the solution file to include the new PDF project.
- Provides a sample `Program.cs` with a top-level await and a data model for deserializing contract details.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| 03-CoreGenerativeAITechniques/src/OpenAI-FileProcessing-Pdf-01/Program.cs | Implements PDF ingestion, chat history setup, and JSON deserialization into Contract model |
| 03-CoreGenerativeAITechniques/src/OpenAI-FileProcessing-Pdf-01/OpenAI-FileProcessing-Pdf-01.csproj | Defines the new .NET 9.0 console project and dependencies |
| 03-CoreGenerativeAITechniques/src/CoreGenerativeAITechniques.sln | Adds solution entries for the new PDF project |
Comments suppressed due to low confidence (1)
03-CoreGenerativeAITechniques/src/OpenAI-FileProcessing-Pdf-01/Program.cs:53
- Consider adding unit or integration tests around this PDF→chat pipeline to validate JSON output (e.g., mock the `GetChatMessageContentAsync` call and verify `Contract` deserialization).

    var response = await chatService.GetChatMessageContentAsync(history, executionSettings);
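One way to make the pipeline testable without calling the model is to put a small seam in front of the chat call. The sketch below is only an illustration under stated assumptions: `ContractStub`, `ContractPipeline`, `ContractPipelineChecks`, and the `askModel` delegate are hypothetical names that do not appear in the PR, and the real `Contract` model is not shown in this review, so the two properties here are placeholders.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical stand-in for the PR's Contract model, which this review does not show.
public sealed class ContractStub
{
    public string? Seller { get; set; }
    public string? Buyer { get; set; }
}

public static class ContractPipeline
{
    // askModel is a hypothetical seam: production code would wrap
    // chatService.GetChatMessageContentAsync and return response.Content.
    public static async Task<ContractStub?> ExtractAsync(Func<Task<string?>> askModel)
    {
        string? json = await askModel();
        if (string.IsNullOrWhiteSpace(json))
        {
            return null; // nothing to deserialize
        }

        return JsonSerializer.Deserialize<ContractStub>(json);
    }
}

public static class ContractPipelineChecks
{
    // A framework-free check; in a real test project this would be a test method.
    public static async Task RunAsync()
    {
        // Canned model reply, so no network call is made.
        var contract = await ContractPipeline.ExtractAsync(
            () => Task.FromResult<string?>("{\"Seller\":\"Alice\",\"Buyer\":\"Bob\"}"));

        if (contract is null || contract.Seller != "Alice" || contract.Buyer != "Bob")
        {
            throw new InvalidOperationException("Unexpected deserialization result.");
        }
    }
}
```

With a seam like this, the same check could be moved into whichever test framework the repository adopts, with the delegate replaced by a mock of the chat service.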
    Console.WriteLine(response.Content);
    Console.WriteLine("---");

    var contract = JsonSerializer.Deserialize<Contract>(response.ToString());
Copilot AI, Jun 4, 2025
You’re passing response.ToString() into the deserializer, which may not be the raw JSON. Use response.Content instead to deserialize the actual payload.
Suggested change:

    - var contract = JsonSerializer.Deserialize<Contract>(response.ToString());
    + var contract = JsonSerializer.Deserialize<Contract>(response.Content);
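Whichever string is passed in, the deserialization step can fail: `JsonSerializer.Deserialize<T>(string)` throws `ArgumentNullException` for a null input and `JsonException` for text that is not valid JSON. The sketch below shows one defensive way to handle both cases. `ContractParser` and the two-property `Contract` class are assumptions made for illustration; the PR's actual model is not shown in this review.

```csharp
using System;
using System.Text.Json;

// Hypothetical model: the PR's real Contract class likely has more fields.
public sealed class Contract
{
    public string? Seller { get; set; }
    public string? Buyer { get; set; }
}

public static class ContractParser
{
    // Model output often uses camelCase keys, so match property names case-insensitively.
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true
    };

    // Returns null when the content is missing or is not valid JSON.
    public static Contract? TryParse(string? content)
    {
        if (string.IsNullOrWhiteSpace(content))
        {
            return null;
        }

        try
        {
            return JsonSerializer.Deserialize<Contract>(content, Options);
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```

In the sample this would be called as `ContractParser.TryParse(response.Content)`, with the caller deciding how to report a null result.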
    var kernel = builder.Build();
    var chatService = kernel.GetRequiredService<IChatCompletionService>();

    var filePath = Path.Combine(Directory.GetCurrentDirectory(), "docs", "real-state-contract-1.pdf");
Copilot AI, Jun 4, 2025
[nitpick] The filename uses "real-state" but the domain term is "real-estate"; consider renaming the file and references to real-estate-contract-1.pdf for clarity.
Suggested change:

    - var filePath = Path.Combine(Directory.GetCurrentDirectory(), "docs", "real-state-contract-1.pdf");
    + var filePath = Path.Combine(Directory.GetCurrentDirectory(), "docs", "real-estate-contract-1.pdf");
    Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "12 Files", "12 Files", "{E224737A-CFFB-4292-8F73-A543A0387938}"
    EndProject
Copilot AI, Jun 4, 2025
This solution entry references a "12 Files" folder without a project file; it won’t build. Remove or update this stub entry to avoid confusion.
Suggested change:

    - Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "12 Files", "12 Files", "{E224737A-CFFB-4292-8F73-A543A0387938}"
    - EndProject
This pull request adds new projects for video generation and PDF file processing, updates the solution file to include them, and implements a program that extracts structured data from PDFs using Semantic Kernel with OpenAI. The most important changes are summarized below:
Solution Updates
- Added the solution folders `11 Video Generation` and `12 Files`, along with their respective project files (`VideoGeneration-AzureSora-01.csproj` and `OpenAI-FileProcessing-Pdf-01.csproj`).
- Added build configurations (`Debug` and `Release`) for the newly added projects.
New PDF File Processing Project
- Added the `OpenAI-FileProcessing-Pdf-01` project targeting .NET 9.0, with dependencies on `Microsoft.Extensions.Configuration.UserSecrets` and `Microsoft.SemanticKernel`.
- Implemented `Program.cs` to process real estate contracts in PDF format. The program uses OpenAI's GPT model to extract structured data (e.g., seller, buyer, property details) and outputs it in JSON format.