feat(azure_blob sink): add append blob support via blob_type option by Danielku15 · Pull Request #25627 · vectordotdev/vector

Danielku15 · 2026-06-15T16:31:11Z

Summary

Note

This PR overlaps with #25545.
I am keeping this PR as a draft so I can add tag support once the other PR is merged; append blobs will also need tagging/metadata support. Feel free to review it in the meantime.

Adds blob_type: append support to the azure_blob sink, implementing #19397.

The default behavior (blob_type: block) is unchanged. When blob_type: append is set, each flush appends to a stable-named Azure Append Blob rather than creating a new uniquely-named blob per batch. This is the natural model for continuous log streaming where you want a single growing file per time window.

Key design decisions:

Type-aware defaults: blob_time_format defaults to %Y-%m-%d (daily rotation) and blob_append_uuid defaults to false for append type, matching the expected append use case. Both can still be overridden.
EAFP append flow: try append_block first (hot path = 1 API call for an existing blob), create the blob on 404, retry. A 409 Conflict on create is swallowed — it means a concurrent writer created the blob first.
Azure hard limit enforcement: batch.max_bytes is automatically defaulted to 4 MiB when blob_type: append is configured and the setting is not explicitly set. Values above 4 MiB are rejected at startup with a clear error.
Compressed append blobs produce concatenated compressed frames (one per batch). Decompressors that support multi-stream formats (gunzip, zstd -d) handle these correctly.
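The EAFP append flow described above can be sketched against an in-memory stand-in. This is only an illustration under stated assumptions: `MockStore`, `StoreError`, and `append_eafp` are hypothetical names, the code is synchronous, and it does not use the Azure SDK or reflect the PR's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for the HTTP statuses named above.
#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound, // models a 404: the blob does not exist yet
    Conflict, // models a 409: the blob already exists
}

/// Hypothetical in-memory stand-in for a container of append blobs.
#[derive(Default)]
struct MockStore {
    blobs: HashMap<String, Vec<u8>>,
    calls: usize, // counts simulated API calls
}

impl MockStore {
    fn append_block(&mut self, key: &str, data: &[u8]) -> Result<(), StoreError> {
        self.calls += 1;
        match self.blobs.get_mut(key) {
            Some(blob) => {
                blob.extend_from_slice(data);
                Ok(())
            }
            None => Err(StoreError::NotFound),
        }
    }

    fn create(&mut self, key: &str) -> Result<(), StoreError> {
        self.calls += 1;
        if self.blobs.contains_key(key) {
            return Err(StoreError::Conflict);
        }
        self.blobs.insert(key.to_string(), Vec::new());
        Ok(())
    }
}

/// Try the append first; create the blob only when the append reports "not found".
fn append_eafp(store: &mut MockStore, key: &str, data: &[u8]) -> Result<(), StoreError> {
    match store.append_block(key, data) {
        Ok(()) => Ok(()),
        Err(StoreError::NotFound) => {
            match store.create(key) {
                // A conflict means another writer created the blob first, which is fine.
                Ok(()) | Err(StoreError::Conflict) => {}
                Err(other) => return Err(other),
            }
            store.append_block(key, data)
        }
        Err(other) => Err(other),
    }
}
```

In this model the first flush to a new blob costs three calls (failed append, create, append), and every later flush costs one.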

Vector configuration

Minimal append blob configuration:

sinks:
  my_append_logs:
    type: azure_blob
    inputs: [...]
    connection_string: "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
    container_name: "logs"
    blob_prefix: "app/%F/"
    blob_type: append
    encoding:
      codec: json

This produces a single blob per day (e.g. app/2024-07-18/2024-07-18.log) and appends each batch to it. Defaults are: blob_time_format: "%Y-%m-%d", blob_append_uuid: false, batch.max_bytes: 4194304.
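To make the naming concrete, here is a minimal sketch of how the rendered parts could combine into the blob name from the example above. `blob_name` is a hypothetical helper, the inputs are assumed to be already rendered (no strftime handling here), and the `{time}-{uuid}` layout for the UUID case is an assumption rather than something stated in this PR.

```rust
/// Hypothetical helper: joins already-rendered parts into a blob name.
/// `uuid` is `None` when `blob_append_uuid` is false (the append default).
fn blob_name(
    rendered_prefix: &str,
    rendered_time: &str,
    uuid: Option<&str>,
    extension: &str,
) -> String {
    let stem = match uuid {
        // Assumed layout when a UUID is appended: "<time>-<uuid>".
        Some(id) => format!("{rendered_time}-{id}"),
        None => rendered_time.to_string(),
    };
    format!("{rendered_prefix}{stem}.{extension}")
}
```

Without a UUID, every batch in the same time window maps to the same name, which is what lets successive flushes land in one blob.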

Explicit batch size and custom rotation:

sinks:
  my_append_logs:
    type: azure_blob
    inputs: [...]
    connection_string: "..."
    container_name: "logs"
    blob_prefix: "app/"
    blob_type: append
    blob_time_format: "%Y-%m-%d-%H"   # hourly rotation
    batch:
      max_bytes: 2097152              # 2 MiB per block
    encoding:
      codec: json

How did you test this PR?

Unit tests (cargo test --no-default-features --features sinks-azure_blob sinks::azure_blob): 27 tests pass, including new tests for:
- Append blob request building with daily time format and no UUID
- Compressed append blob request building
- Stable key generation (no UUID, empty time format)
- UUID override in append mode
- Hourly rotation via custom blob_time_format
- Default blob_type is block
- Config parsing of blob_type = "append"
- blob_type: append with no explicit batch.max_bytes succeeds at startup (C1 regression test)
- blob_type: append with batch.max_bytes > 4 MiB fails at startup with a max_bytes exceeds error
- Direct validate().limit_max_bytes() rejection for oversized values
Integration tests (cargo vdev int test azure): tested against Azurite (local Azure emulator) covering:
- Append blob reuses the same blob across multiple batches
- Content ordering is preserved across flushes
- JSON encoding with content-type verification
- Default daily rotation (blob name changes at day boundary)
- Multiple forced flushes all land in a single blob

Change Type

Is this a breaking change?

Yes
No

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

Closes: AppendBlob support for azure_blob sink #19397

Notes

Please read our Vector contributor resources.
Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
Some CI checks run only after we manually approve them.
- We recommend adding a pre-push hook, please see this template.
- Alternatively, we recommend running the following locally before pushing to the remote branch:
  - make fmt
  - make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
  - make test
After a review is requested, please avoid force pushes to help us review incrementally.
- Feel free to push as many commits as you want. They will be squashed into one before merging.
- For example, you can run git merge origin master and git push.
If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.


Danielku15 · 2026-06-15T17:33:27Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5734e634c


chatgpt-codex-connector · 2026-06-15T17:46:34Z

+        if self.blob_type == AzureBlobType::Append && batch.max_bytes.is_none() {
+            batch.max_bytes = Some(APPEND_BLOB_MAX_BLOCK_BYTES);


Respect omitted append batch max_bytes

For append configs that set any other [batch] field but omit max_bytes (for example batch.timeout_secs), BatchConfig's per-field serde default has already populated batch.max_bytes with the bulk default of 10_000_000. This condition is therefore false, and the later limit_max_bytes(4 MiB) rejects the config at startup even though the user did not explicitly configure max_bytes, contradicting the new append-mode default.
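The failure mode described here can be illustrated with a small sketch. It does not model Vector's real `BatchConfig` or serde behavior; `resolve_max_bytes`, `BlobType`, and `BULK_DEFAULT_MAX_BYTES` are hypothetical names, and the point is only that the append default must be decided from what the user wrote, before any generic default is merged in.

```rust
const APPEND_BLOB_MAX_BLOCK_BYTES: usize = 4 * 1024 * 1024;
const BULK_DEFAULT_MAX_BYTES: usize = 10_000_000; // bulk default quoted in the review comment

#[derive(Clone, Copy, PartialEq)]
enum BlobType {
    Block,
    Append,
}

/// `user_max_bytes` must be what the user wrote, before generic defaults are applied.
fn resolve_max_bytes(blob_type: BlobType, user_max_bytes: Option<usize>) -> Result<usize, String> {
    match (blob_type, user_max_bytes) {
        // Unset in append mode: fall back to the append block limit.
        (BlobType::Append, None) => Ok(APPEND_BLOB_MAX_BLOCK_BYTES),
        // Explicitly set too high in append mode: reject.
        (BlobType::Append, Some(n)) if n > APPEND_BLOB_MAX_BLOCK_BYTES => Err(format!(
            "batch.max_bytes ({}) exceeds the append block limit ({})",
            n, APPEND_BLOB_MAX_BLOCK_BYTES
        )),
        (_, Some(n)) => Ok(n),
        (BlobType::Block, None) => Ok(BULK_DEFAULT_MAX_BYTES),
    }
}
```

If the generic default has already replaced `None` with `Some(10_000_000)` by the time this runs, the append branch rejects a value the user never set, which is the behavior the comment reports.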


chatgpt-codex-connector · 2026-06-15T17:46:34Z

+                AzureBlobType::Append => {
+                    append_blob(
+                        &blob_client.append_blob_client(),
+                        request.blob_data,
+                        request.content_type,
+                        request.content_encoding,
+                    )
+                    .await


Serialize append writes per blob

When blob_type = append, these calls still go through the existing sink driver with configurable/adaptive request concurrency, so two batches targeting the same partition can execute append_blob at the same time. Azure orders appended blocks by the order the service receives them, not the original event order, so under request.concurrency > 1 (or after adaptive concurrency ramps up) flushes to the same blob can be persisted out of order; append mode needs per-blob serialization or an equivalent ordering guard.
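One possible shape for a per-blob guard is a registry of locks keyed by blob name, sketched below with the standard library only. `BlobLocks` is a hypothetical type; real sink code is async and would presumably need an async-aware mutex, and the registry here never evicts old keys.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical registry handing out one lock per blob key.
#[derive(Default)]
struct BlobLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl BlobLocks {
    /// Returns the lock shared by every writer targeting `blob_key`.
    fn lock_for(&self, blob_key: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().expect("lock registry poisoned");
        map.entry(blob_key.to_string()).or_default().clone()
    }
}
```

A writer would hold the guard from `lock_for(key).lock()` for the duration of its append. Mutual exclusion alone only prevents overlapping appends to one blob; preserving batch order additionally requires that writers acquire the lock in batch order.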


chatgpt-codex-connector · 2026-06-15T17:46:34Z

+    match append_client
+        .append_block(RequestContent::from(data.to_vec()), data_len, None)
+        .instrument(info_span!("request").or_current())
+        .await


Avoid retrying non-idempotent appends

This append mutates the blob, but the request is still wrapped in the existing retry policy, which retries timeouts and Azure 5xx/429 errors by cloning and replaying the same AzureBlobRequest. If Azure commits the block and the client then observes a timeout or transient response error, the retry path appends the same batch again; append mode needs an append-position/ETag condition or retries disabled for unsafe cases.


Danielku15 · Jun 16, 2026

Overstated. Vector is explicitly at-least-once (guarantees.md: duplicates are "possible", exactly-once is unsupported). No Vector sink implements append-position/ETag idempotency; CloudWatch's sequence token is service-provided, not a replicable pattern. Implementing Azure append_position would need stateful per-blob offset tracking, which is incompatible with the stateless service + partitioned stream driver and inconsistent with every peer sink. Also, the retry logic only retries 5xx/429 (not None/timeouts; see config.rs:474), so the window is narrow (Azure commits, then returns 5xx).

Plan: document in the blob_type field docs + changelog that append is at-least-once and a retried flush may re-append on rare server errors; note request.retry_attempts = 0 for at-most-once. I'd push back on building idempotency here as out of scope.
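For readers following the thread, the append-position condition under discussion can be modeled in a few lines. This is a toy model, not the Azure SDK: `MockAppendBlob` and `append_at` are hypothetical, and the expected offset stands in for the value a client would send in Azure's `x-ms-blob-condition-appendpos` request header.

```rust
#[derive(Debug, PartialEq)]
enum AppendError {
    /// The blob's current length did not match the caller's expected offset.
    PositionMismatch { actual: usize },
}

/// Hypothetical in-memory append blob that enforces a position condition.
#[derive(Default)]
struct MockAppendBlob {
    bytes: Vec<u8>,
}

impl MockAppendBlob {
    /// Appends only if the blob is exactly `expected_offset` bytes long.
    /// Returns the new length on success.
    fn append_at(&mut self, expected_offset: usize, data: &[u8]) -> Result<usize, AppendError> {
        if self.bytes.len() != expected_offset {
            return Err(AppendError::PositionMismatch { actual: self.bytes.len() });
        }
        self.bytes.extend_from_slice(data);
        Ok(self.bytes.len())
    }
}
```

A replayed request with the same expected offset is rejected instead of duplicating data, but the caller has to remember the offset per blob between requests, which is the per-blob state the reply above argues is out of scope.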

chatgpt-codex-connector · 2026-06-15T17:46:34Z

+    content_type: &str,
+    content_encoding: Option<&str>,
+) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
+    let data_len = data.len() as u64;


Enforce the limit on encoded append payloads

The startup check limits the batcher's pre-encoding event size, but Azure enforces the append-block limit on the serialized/compressed body length used here. With blob_type: append and JSON logs containing many escapable characters (or gzip overhead around an incompressible near-limit batch), Vector can produce data.len() > 4 MiB and send it to append_block, causing Azure to reject the whole batch despite the config validation.


Danielku15 · Jun 16, 2026

Consistent with the dominant pattern. Kinesis firehose (4 MB), streams (5 MB), pubsub (10 MB), loki (1 MB), and cloudwatch batch (1 MB) all rely solely on pre-encoding batch.max_bytes. With the default gzip, encoded ≤ pre-encoding, so a 4 MiB pre-encoding limit leaves headroom. The risk exists only with compression disabled and escape-heavy JSON.
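The escape-heavy case mentioned in this exchange can be shown with a small sketch. `json_escape` is a deliberately minimal, hypothetical escaper (quotes, backslashes, and newlines only), and `check_encoded_len` is a hypothetical post-encoding guard; neither is taken from Vector's codecs.

```rust
const APPEND_BLOB_MAX_BLOCK_BYTES: usize = 4 * 1024 * 1024;

/// Minimal JSON string escaper: handles only quotes, backslashes, and newlines.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 2);
    out.push('"');
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}

/// Hypothetical guard applied to the bytes that would actually be sent.
fn check_encoded_len(encoded: &[u8]) -> Result<(), String> {
    if encoded.len() > APPEND_BLOB_MAX_BLOCK_BYTES {
        return Err(format!(
            "encoded payload is {} bytes, above the {} byte append block limit",
            encoded.len(),
            APPEND_BLOB_MAX_BLOCK_BYTES
        ));
    }
    Ok(())
}
```

A 3 MiB message consisting only of quote characters is under the limit before encoding but roughly doubles once escaped, so it passes a pre-encoding check and fails this one.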

chatgpt-codex-connector · 2026-06-15T17:46:34Z

+    /// - `append`: each batch appends to the same blob.
+    ///   `blob_append_uuid` defaults to `false`; `blob_time_format` defaults to `%Y-%m-%d`.
+    ///   Multiple batches within the same time window write to the same blob.


Document SAS permissions for appends

For SAS-authenticated configs, the existing connection-string guidance only tells users to grant Read/Create permissions, which can pass healthcheck and blob creation but is insufficient for Append Block. This new append option should document that SAS tokens also need Add or Write permission; otherwise users following the component docs get 403s on every flush in append mode.


chatgpt-codex-connector · 2026-06-15T17:46:34Z

+    let request_metadata = request_metadata_builder.build(&payload);
+    let request = request_options.build_request(metadata, request_metadata, payload);
+
+    let expected_date = Utc::now().format("%Y-%m-%d").to_string();


Capture one timestamp in date-sensitive tests

This assertion calls Utc::now() after build_request, which already formatted its key using an earlier Utc::now(). If the test runs across a UTC date boundary between those calls, the generated key legitimately contains the previous day while the expected value uses the new day, creating a rare but avoidable flaky failure; capture the time window once or inject a clock into the request builder test.
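The suggested fix, passing one captured instant to both the key builder and the expectation, can be sketched with the standard library alone. `utc_day_number` and `build_key` are hypothetical helpers, and the `day-N` key format is a stand-in for the real strftime-based name.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Number of whole UTC days since the Unix epoch for the given instant.
fn utc_day_number(now: SystemTime) -> u64 {
    now.duration_since(UNIX_EPOCH)
        .expect("clock is before the Unix epoch")
        .as_secs()
        / 86_400
}

/// Hypothetical key builder that takes the clock reading as a parameter
/// instead of reading the current time internally.
fn build_key(prefix: &str, now: SystemTime) -> String {
    format!("{prefix}day-{}", utc_day_number(now))
}
```

A test then reads the clock once, or uses a fixed instant, and derives both the key and the expected value from that single reading, so a date rollover between two clock reads cannot make them disagree.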


feat(azure_blob sink): add append blob support via blob_type option · 0ae378f

github-actions Bot added the "domain: sinks" label · Jun 15, 2026

chore(azure_blob sink): fix lints, add changelog author, and regenerate docs · e5734e6

github-actions Bot added the "domain: external docs" and "docs review on hold" labels · Jun 15, 2026

chatgpt-codex-connector Bot reviewed · Jun 15, 2026

chore(azure_blob sink): address append blob review feedback (batch default, ordering, docs) · 4029daa

Danielku15 wants to merge 3 commits into vectordotdev:master from Danielku15:feature/append-blob
