Introduce set of built-in Enrichers #6957
Pull Request Overview
This PR introduces a suite of built-in enricher classes for the data ingestion pipeline, leveraging AI chat models to enhance document chunks with additional metadata. These enrichers provide semantic analysis capabilities including summarization, sentiment analysis, keyword extraction, content classification, and image alternative text generation.
Key changes:
- Added five new enricher implementations that process ingestion chunks/documents using AI chat clients
- Added a test suite covering all five enrichers
- Added a `ToListAsync` utility method that converts async enumerables to lists in tests
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Utils/IAsyncEnumerableExtensions.cs | Added ToListAsync helper method for converting async enumerables to lists in tests |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Processors/SummaryEnricherTests.cs | Test suite for summary text generation enricher |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Processors/SentimentEnricherTests.cs | Test suite for sentiment analysis enricher |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Processors/KeywordEnricherTests.cs | Test suite for keyword extraction enricher |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Processors/ClassificationEnricherTests.cs | Test suite for content classification enricher |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Processors/AlternativeTextEnricherTests.cs | Test suite for image alternative text generation enricher |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Microsoft.Extensions.DataIngestion.Tests.csproj | Added reference to shared TestChatClient test utility |
| src/Libraries/Microsoft.Extensions.DataIngestion/Processors/SummaryEnricher.cs | Enricher implementation that generates summary text for chunks |
| src/Libraries/Microsoft.Extensions.DataIngestion/Processors/SentimentEnricher.cs | Enricher implementation that analyzes sentiment (Positive/Negative/Neutral/Unknown) |
| src/Libraries/Microsoft.Extensions.DataIngestion/Processors/KeywordEnricher.cs | Enricher implementation that extracts keywords from chunk content |
| src/Libraries/Microsoft.Extensions.DataIngestion/Processors/ImageAlternativeTextEnricher.cs | Enricher implementation that generates alternative text descriptions for images |
| src/Libraries/Microsoft.Extensions.DataIngestion/Processors/ClassificationEnricher.cs | Enricher implementation that classifies chunks into predefined categories |
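The sentiment enricher (Positive/Negative/Neutral/Unknown) and the classification enricher both choose from a fixed set of labels, and the keyword enricher returns a list, so raw model replies need checking before they are stored. A minimal Python sketch, assuming the model is asked to reply with a bare label or a comma-separated keyword list; `normalize_label`, `parse_keywords`, and the use of `Unknown` as a fallback for classification are hypothetical, not taken from the PR's C# code.

```python
from typing import Iterable, List

SENTIMENTS = ("Positive", "Negative", "Neutral", "Unknown")


def normalize_label(response: str, allowed: Iterable[str], fallback: str) -> str:
    """Map a raw model reply onto one of the allowed labels.

    Matching ignores case plus surrounding whitespace, quotes, periods and
    exclamation marks; any reply that is not exactly one allowed label is
    replaced by `fallback`.
    """
    cleaned = response.strip().strip(".!\"'").strip()
    for label in allowed:
        if cleaned.casefold() == label.casefold():
            return label
    return fallback


def parse_keywords(response: str, max_keywords: int) -> List[str]:
    """Split a (hypothetically) comma-separated reply into distinct keywords."""
    if max_keywords <= 0:
        raise ValueError("max_keywords must be positive")
    seen = set()
    keywords: List[str] = []
    for part in response.split(","):
        word = part.strip()
        key = word.casefold()
        # Skip empty entries and case-insensitive duplicates.
        if word and key not in seen:
            seen.add(key)
            keywords.append(word)
        if len(keywords) == max_keywords:
            break
    return keywords
```

Falling back to a sentinel label, rather than storing whatever the model said, keeps metadata values predictable for downstream filters; whether to fall back or throw is a design choice this sketch does not settle.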
Co-authored-by: Copilot <[email protected]>
- use ChatOptions.Instructions
- validate the responses
…e prompt message to better handle wordCount = 1
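The commit notes above concern how prompts are built: instructions are presumably passed through `ChatOptions.Instructions` (a property of Microsoft.Extensions.AI's `ChatOptions`) instead of being added to the conversation, and a prompt was adjusted to better handle `wordCount = 1`. The actual prompt text is not shown here, so the following Python sketch is hypothetical: `ChatRequest`, `build_summary_request`, and the wording are illustrative only, assuming a summary limited to `word_count` words. It keeps the instructions separate from the chunk text and avoids the ungrammatical "1 words".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatRequest:
    """Hypothetical request shape: instructions kept apart from user content,
    analogous to setting ChatOptions.Instructions instead of adding a system message."""
    instructions: str
    user_text: str


def build_summary_request(chunk_text: str, word_count: int) -> ChatRequest:
    """Build a summary request whose instructions read correctly for any word_count >= 1."""
    if word_count < 1:
        raise ValueError("word_count must be at least 1")
    if word_count == 1:
        # Singular phrasing avoids the ungrammatical "at most 1 words".
        limit = "at most one word"
    else:
        limit = f"at most {word_count} words"
    instructions = (
        f"Summarize the text provided by the user in {limit}. "
        "Reply with the summary only."
    )
    return ChatRequest(instructions=instructions, user_text=chunk_text)
```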
@stephentoub I believe I've addressed all your blocking concerns. Could you PTAL? Thanks!
… 10.0.0 (#73)

Updated [Microsoft.Extensions.Http.Resilience](https://github.com/dotnet/extensions) from 9.10.0 to 10.0.0.

<details>
<summary>Release notes</summary>

_Sourced from [Microsoft.Extensions.Http.Resilience's releases](https://github.com/dotnet/extensions/releases)._

## 10.0.0

## What's Changed

* Give FunctionInvokingChatClient span a more OTELy name by @verdie-g in dotnet/extensions#6911
* Update repository branding from 9.10 to 10.0 by @Copilot in dotnet/extensions#6907
* Clean up local function names in AIFunctionFactory by @Copilot in dotnet/extensions#6909
* Update OpenTelemetryChatClient to output data on all tools by @stephentoub in dotnet/extensions#6906
* Update ToChatResponse{Async} to also factor in AuthorName by @stephentoub in dotnet/extensions#6910
* add support for background responses by @SergeyMenshykh in dotnet/extensions#6854
* Fix `METGEN004` error message: print return type in `ErrorInvalidMethodReturnType` by @eduherminio in dotnet/extensions#6905
* Fix OpenTelemetryChatClient failing on unknown content types by @stephentoub in dotnet/extensions#6915
* Add support for Connector ID and other follow ups by @jozkee in dotnet/extensions#6881
* Update AI lib changelogs by @stephentoub in dotnet/extensions#6920
* Merge internal changes by @joperezr in dotnet/extensions#6921
* Add Workstream, Stage, and PackageValidationBaselineVersion metadata to ServiceDiscovery libraries by @Copilot in dotnet/extensions#6919
* Add back Uri ctor to HostedMcpServerTool by @jozkee in dotnet/extensions#6926
* Set DisableNETStandardCompatErrors in ServiceDiscovery libraries by @eerhardt in dotnet/extensions#6927
* Update Package validation baseline version to 9.10.0 by @Copilot in dotnet/extensions#6922
* [main] Update dependencies from dotnet/arcade by @dotnet-maestro[bot] in dotnet/extensions#6802
* Extend service discovery to support Consul-based DNS lookups: by @bart-vmware in dotnet/extensions#6914
* Update AsOpenAIResponseItems to roundtrip User AIContent ResponseItems by @stephentoub in dotnet/extensions#6931
* Special-case AIContent returned from AIFunctionFactory.Create AIFunctions to not be serialized by @stephentoub in dotnet/extensions#6935
* Preserve function content in `SummarizingChatReducer` by @MackinnonBuck in dotnet/extensions#6908
* Tool reduction by @MackinnonBuck in dotnet/extensions#6781
* Fix coalescing of TextReasoningContent with ProtectedData by @stephentoub in dotnet/extensions#6936
* Doc updates by @gewarren in dotnet/extensions#6930
* Support DisplayNameAttribute for name resolution in AI libraries by @Copilot in dotnet/extensions#6942
* Fix EquivalenceEvaluator MaxOutputTokens to meet Azure OpenAI minimum requirement by @Copilot in dotnet/extensions#6948
* Support DefaultValueAttribute in AIFunctionFactory parameter handling by @Copilot in dotnet/extensions#6947
* Bump vite from 6.3.6 to 6.4.1 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript by @dependabot[bot] in dotnet/extensions#6938
* Introduce Microsoft.Extensions.DataIngestion.Abstractions by @adamsitnik in dotnet/extensions#6949
* Update to latest schema version (accepted by MCP registry) by @joelverhagen in dotnet/extensions#6956
* Introduce IngestionChunkWriter build on top of MEVD by @adamsitnik in dotnet/extensions#6951
* Update AI Chat Web dependencies by @MackinnonBuck in dotnet/extensions#6955
* Add AITool -> OpenAI.Responses.ResponseTool conversion utility by @rogerbarreto in dotnet/extensions#6958
* Update AI changelogs for 9.10.1 by @stephentoub in dotnet/extensions#6950
* Add Name property to OtelMessage to store ChatMessage.AuthorName per OpenTelemetry semantic conventions by @Copilot in dotnet/extensions#6953
* Fix serialization of UserInputRequest/ResponseContent by @stephentoub in dotnet/extensions#6962
* Expose building blocks for external service discovery implementations by @bart-vmware in dotnet/extensions#6946
* Bump validator from 13.15.0 to 13.15.20 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript by @dependabot[bot] in dotnet/extensions#6974
* Add eng/sdl-tsa-vars.config for TSA integration by @Copilot in dotnet/extensions#6980
* Add CodeInterpreterToolCall/ResultContent content types by @stephentoub in dotnet/extensions#6964
* Update to 1.38 of the otel genai standard convention by @stephentoub in dotnet/extensions#6981
* Introduce set of built-in Enrichers by @adamsitnik in dotnet/extensions#6957
* Allow ChatOptions.ConversationId to be an OpenAI conversation ID with Responses by @stephentoub in dotnet/extensions#6960
* Fix warning breaking official build, enable warningAsError in all pipelines by @ericstj in dotnet/extensions#6988
* Introduce HeaderChunker by @adamsitnik in dotnet/extensions#6979
* Introduce Markdown readers by @adamsitnik in dotnet/extensions#6969
* Add usage telemetry for aieval dotnet tool by @shyamnamboodiripad in dotnet/extensions#6773
* Update to OpenAI 2.6.0 by @stephentoub in dotnet/extensions#6996
* Don't specify MaxOutputTokens for EquivalenceEvaluator by @shyamnamboodiripad in dotnet/extensions#7006
* Fix Assert.Throws to validate parameter names by @stephentoub in dotnet/extensions#7007

... (truncated)

Commits viewable in [compare view](dotnet/extensions@v9.10.0...v10.0.0).
</details>