
Decouple IngestionDocumentReader from IngestionPipeline constructor #7454

Draft: Copilot wants to merge 3 commits into data-ingestion-preview2 from copilot/extend-ingestionpipeline-with-new-method


Copilot AI commented Apr 7, 2026


IngestionPipeline<T> required an IngestionDocumentReader at construction time, making it impossible to use with in-memory or programmatically created documents.

Changes

  • IngestionPipeline<T> constructor: removed the reader parameter; the pipeline now requires only a chunker and a writer
  • New overload: ProcessAsync(IAsyncEnumerable<IngestionDocument>, CancellationToken) processes documents directly, with no file-system dependency
  • File-system overloads: ProcessAsync(IngestionDocumentReader, DirectoryInfo, ...) and ProcessAsync(IngestionDocumentReader, IEnumerable<FileInfo>, ...) now take the reader as a mandatory first argument
  • DiagnosticsConstants: added ProcessDocuments / ProcessDocument activity names for tracing the new overload
  • Tests: updated existing tests for the new signatures; added CanProcessDocumentsWithoutReader, which demonstrates direct document ingestion
  • Template + snapshots: updated DataIngestor.cs and all 5 integration-test snapshots
  • READMEs / CHANGELOG: updated the Microsoft.Extensions.DataIngestion, MarkItDown, and Markdig docs

Usage without a reader

using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());

IngestionDocument document = new("my-doc-id");
document.Sections.Add(new IngestionDocumentSection());
document.Sections[0].Elements.Add(new IngestionDocumentParagraph("In-memory content."));

await foreach (IngestionResult result in pipeline.ProcessAsync(new[] { document }.ToAsyncEnumerable()))
{
    Console.WriteLine($"{result.DocumentId}: {result.Succeeded}");
}

Usage with a reader (file system)

IngestionDocumentReader reader = new MarkdownReader();
using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());

await foreach (IngestionResult result in pipeline.ProcessAsync(reader, new DirectoryInfo("docs"), "*.md"))
{
    Console.WriteLine($"{result.DocumentId}: {result.Succeeded}");
}

…(IAsyncEnumerable<IngestionDocument>) overload

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/4dc3f0c2-40aa-445e-9392-fa3e254d2d05

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
@github-actions (bot) added the area-ai-templates (Microsoft.Extensions.AI.Templates) label on Apr 7, 2026
@adamsitnik added the area-data-ingestion label and removed the area-ai-templates (Microsoft.Extensions.AI.Templates) label on Apr 7, 2026
@adamsitnik added this to the Data Ingestion Preview 2 milestone on Apr 7, 2026

@adamsitnik left a comment


@copilot fetch latest changes from https://github.com/dotnet/extensions/tree/data-ingestion-preview2 and solve the merge conflicts

…pilot/extend-ingestionpipeline-with-new-method

# Conflicts:
#	src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md
#	src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
#	src/ProjectTemplates/Microsoft.Extensions.AI.Templates/templates/AIChatWeb-CSharp/AIChatWeb-CSharp.Web/Services/Ingestion/DataIngestor.cs
#	test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
#	test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.A.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs
#	test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.A_aoai_aais.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs
#	test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb._defaults.verified/aichatweb/Services/Ingestion/DataIngestor.cs
#	test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.o_q.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs
#	test/ProjectTemplates/Microsoft.Extensions.AI.Templates.IntegrationTests/Snapshots/aichatweb/aichatweb.oai_aais.verified/aichatweb/Services/Ingestion/DataIngestor.cs

Copilot AI commented Jun 29, 2026


> @copilot fetch latest changes from https://github.com/dotnet/extensions/tree/data-ingestion-preview2 and solve the merge conflicts

Done — fetched the latest from data-ingestion-preview2 and resolved all merge conflicts in commit b862662.

Key resolutions:

  • CHANGELOG.md: accepted the deletion from the base branch
  • IngestionPipeline.cs: combined both changes, removing the generic <T> parameter (from the base branch's non-generic refactor) and keeping the reader out of the constructor and on the ProcessAsync overloads (from this PR)
  • IngestionPipelineTests.cs: resolved 4 conflict blocks and fixed the auto-merged PipelineWorksWithEmbeddingGenerator and CanProcessDocumentsWithoutReader tests to use the non-generic types
  • Template + 5 snapshots: DataIngestor.cs now uses new IngestionPipeline(chunker, writer), with the reader passed to ProcessAsync

Copilot AI requested a review from adamsitnik June 29, 2026 12:41

@adamsitnik left a comment


@copilot address my feedback

  IngestionDocumentReader reader = new MarkdownReader();

- using IngestionPipeline<string> pipeline = new(reader, CreateChunker(), CreateWriter());
+ using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());


IngestionPipeline is no longer generic; update all README files.

Suggested change
- using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());
+ using IngestionPipeline pipeline = new(CreateChunker(), CreateWriter());


Fixed in the latest commit — updated all three README files (Markdig, MarkItDown, and DataIngestion) to use IngestionPipeline without the generic type parameter, and also updated the prose description in the DataIngestion README from IngestionPipeline<T> to IngestionPipeline.

Copilot AI commented Jun 29, 2026


> @copilot address my feedback

Done — all README files now use non-generic IngestionPipeline.

@dotnet-comment-bot


‼️ Found issues ‼️

| Project | Coverage Type | Expected (%) | Actual (%) |
|---|---|---|---|
| Microsoft.Extensions.Diagnostics.Testing | Line | 99 | 98.65 🔻 |
| Microsoft.Extensions.Telemetry | Line | 93 | 91.95 🔻 |
| Microsoft.Extensions.AI | Line | 89 | 88.54 🔻 |
| Microsoft.Extensions.AI | Branch | 89 | 88.57 🔻 |
| Microsoft.Extensions.AI.OpenAI | Line | 75 | 62.89 🔻 |
| Microsoft.Extensions.AI.OpenAI | Branch | 75 | 50.41 🔻 |
| Microsoft.Extensions.DataIngestion.Abstractions | Branch | 75 | 72.73 🔻 |
| Microsoft.Extensions.DataIngestion.MarkItDown | Line | 75 | 4.46 🔻 |
| Microsoft.Extensions.DataIngestion.MarkItDown | Branch | 75 | 0 🔻 |
| Microsoft.Extensions.Diagnostics.ResourceMonitoring | Line | 99 | 96.03 🔻 |
| Microsoft.Extensions.Diagnostics.ResourceMonitoring | Branch | 99 | 94.39 🔻 |
| Microsoft.Extensions.Diagnostics.ResourceMonitoring.Kubernetes | Line | 99 | 97.73 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Dns | Line | 75 | 68.32 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Abstractions | Line | 75 | 42.11 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Abstractions | Branch | 75 | 42.86 🔻 |
| Microsoft.Extensions.ServiceDiscovery | Line | 75 | 68.11 🔻 |
| Microsoft.Extensions.ServiceDiscovery | Branch | 75 | 71.43 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Yarp | Line | 75 | 73.85 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Yarp | Branch | 75 | 70 🔻 |
| Microsoft.Extensions.VectorData.Abstractions | Line | 75 | 37.39 🔻 |
| Microsoft.Extensions.VectorData.Abstractions | Branch | 75 | 22.73 🔻 |

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

| Project | Expected (%) | Actual (%) |
|---|---|---|
| Microsoft.Gen.BuildMetadata | 97 | 100 |
| Microsoft.Gen.MetadataExtractor | 57 | 73 |
| Microsoft.Gen.MetricsReports | 67 | 69 |
| Microsoft.Extensions.AI.Abstractions | 82 | 85 |
| Microsoft.Extensions.AI.Evaluation.NLP | 0 | 78 |
| Microsoft.Extensions.Caching.Hybrid | 82 | 89 |
| Microsoft.Extensions.DataIngestion | 75 | 87 |
| Microsoft.Extensions.DataIngestion.Markdig | 75 | 90 |
| Microsoft.Extensions.Http.Resilience | 97 | 100 |

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1485806&view=codecoverage-tab
