Decouple IngestionDocumentReader from IngestionPipeline constructor#7454
Decouple IngestionDocumentReader from IngestionPipeline constructor#7454Copilot wants to merge 3 commits into
Conversation
…(IAsyncEnumerable<IngestionDocument>) overload Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/4dc3f0c2-40aa-445e-9392-fa3e254d2d05 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
adamsitnik
left a comment
There was a problem hiding this comment.
@copilot fetch latest changes from https://github.com/dotnet/extensions/tree/data-ingestion-preview2 and solve the merge conflicts
…pilot/extend-ingestionpipeline-with-new-method # Conflicts: # src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md # src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs # src/ProjectTemplates/Microsoft.Extensions.AI.Templates/templates/AIChatWeb-CSharp/AIChatWeb-CSharp.Web/Services/Ingestion/DataIngestor.cs # test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs # test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.A.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs # test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.A_aoai_aais.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs # test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb._defaults.verified/aichatweb/Services/Ingestion/DataIngestor.cs # test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.o_q.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs # test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.oai_aais.verified/aichatweb/Services/Ingestion/DataIngestor.cs
Done — fetched the latest from Key resolutions:
|
adamsitnik
left a comment
There was a problem hiding this comment.
@copilot address my feedback
| IngestionDocumentReader reader = new MarkdownReader(); | ||
|
|
||
| using IngestionPipeline<string> pipeline = new(reader, CreateChunker(), CreateWriter()); | ||
| using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter()); |
There was a problem hiding this comment.
IngestionPipeline is no longer generic, update all readme files
| using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter()); | |
| using IngestionPipeline pipeline = new(CreateChunker(), CreateWriter()); |
There was a problem hiding this comment.
Fixed in the latest commit — updated all three README files (Markdig, MarkItDown, and DataIngestion) to use IngestionPipeline without the generic type parameter, and also updated the prose description in the DataIngestion README from IngestionPipeline<T> to IngestionPipeline.
Done — all README files now use non-generic |
🎉 Good job! The coverage increased 🎉
Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1485806&view=codecoverage-tab |
IngestionPipeline<T>required anIngestionDocumentReaderat construction time, making it impossible to use with in-memory or programmatically created documents.Changes
IngestionPipeline<T>constructor — removedreaderparameter; pipeline now only requireschunkerandwriterProcessAsync(IAsyncEnumerable<IngestionDocument>, CancellationToken)processes documents directly without any file-system dependencyProcessAsync(IngestionDocumentReader, DirectoryInfo, ...)andProcessAsync(IngestionDocumentReader, IEnumerable<FileInfo>, ...)now takereaderas a mandatory first argumentDiagnosticsConstants— addedProcessDocuments/ProcessDocumentactivity names for the new overload's tracingCanProcessDocumentsWithoutReaderdemonstrating direct document ingestionDataIngestor.csand all 5 integration-test snapshotsMicrosoft.Extensions.DataIngestion,MarkItDown, andMarkdigdocsUsage without a reader
Usage with a reader (file system)
Microsoft Reviewers: Open in CodeFlow