feat(azure_blob sink): add append blob support via blob_type option#25627
feat(azure_blob sink): add append blob support via blob_type option#25627Danielku15 wants to merge 3 commits into
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5734e634c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if self.blob_type == AzureBlobType::Append && batch.max_bytes.is_none() { | ||
| batch.max_bytes = Some(APPEND_BLOB_MAX_BLOCK_BYTES); |
There was a problem hiding this comment.
Respect omitted append batch max_bytes
For append configs that set any other [batch] field but omit max_bytes (for example batch.timeout_secs), BatchConfig's per-field serde default has already populated batch.max_bytes with the bulk default of 10_000_000. This condition is therefore false, and the later limit_max_bytes(4 MiB) rejects the config at startup even though the user did not explicitly configure max_bytes, contradicting the new append-mode default.
Useful? React with 👍 / 👎.
| AzureBlobType::Append => { | ||
| append_blob( | ||
| &blob_client.append_blob_client(), | ||
| request.blob_data, | ||
| request.content_type, | ||
| request.content_encoding, | ||
| ) | ||
| .await |
There was a problem hiding this comment.
Serialize append writes per blob
When blob_type = append, these calls still go through the existing sink driver with configurable/adaptive request concurrency, so two batches targeting the same partition can execute append_blob at the same time. Azure orders appended blocks by the order the service receives them, not the original event order, so under request.concurrency > 1 (or after adaptive concurrency ramps up) flushes to the same blob can be persisted out of order; append mode needs per-blob serialization or an equivalent ordering guard.
Useful? React with 👍 / 👎.
| match append_client | ||
| .append_block(RequestContent::from(data.to_vec()), data_len, None) | ||
| .instrument(info_span!("request").or_current()) | ||
| .await |
There was a problem hiding this comment.
Avoid retrying non-idempotent appends
This append mutates the blob, but the request is still wrapped in the existing retry policy, which retries timeouts and Azure 5xx/429 errors by cloning and replaying the same AzureBlobRequest. If Azure commits the block and the client then observes a timeout or transient response error, the retry path appends the same batch again; append mode needs an append-position/ETag condition or retries disabled for unsafe cases.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Overstated. Vector is explicitly at-least-once (guarantees.md: duplicates "possible," exactly-once unsupported). No Vector sink implements append-position/ETag idempotency — CloudWatch's sequence token is service-provided, not a replicable pattern. Implementing Azure append_position needs stateful per-blob offset tracking, incompatible with the stateless service + partitioned stream driver, and inconsistent with every peer sink. Also the retry logic only retries 5xx/429 (not None/timeouts — config.rs:474), so the window is narrow (Azure commits, then returns 5xx).
Plan: document in the blob_type field docs + changelog that append is at-least-once and a retried flush may re-append on rare server errors; note request.retry_attempts = 0 for at-most-once. I'd push back on building idempotency here as out of scope.
| content_type: &str, | ||
| content_encoding: Option<&str>, | ||
| ) -> Result<(), Box<dyn std::error::Error + Send + Sync>> { | ||
| let data_len = data.len() as u64; |
There was a problem hiding this comment.
Enforce the limit on encoded append payloads
The startup check limits the batcher's pre-encoding event size, but Azure enforces the append-block limit on the serialized/compressed body length used here. With blob_type: append and JSON logs containing many escapable characters (or gzip overhead around an incompressible near-limit batch), Vector can produce data.len() > 4 MiB and send it to append_block, causing Azure to reject the whole batch despite the config validation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Consistent with the dominant pattern. Kinesis firehose (4 MB), streams (5 MB), pubsub (10 MB), loki (1 MB), cloudwatch batch (1 MB) all rely solely on pre-encoding batch.max_bytes. With the default gzip, encoded ≤ pre-encoding, so 4 MiB pre-encoding is headroom. Risk only with compression disabled + escape-heavy JSON.
| /// - `append`: each batch appends to the same blob. | ||
| /// `blob_append_uuid` defaults to `false`; `blob_time_format` defaults to `%Y-%m-%d`. | ||
| /// Multiple batches within the same time window write to the same blob. |
There was a problem hiding this comment.
Document SAS permissions for appends
For SAS-authenticated configs, the existing connection-string guidance only tells users to grant Read/Create permissions, which can pass healthcheck and blob creation but is insufficient for Append Block. This new append option should document that SAS tokens also need Add or Write permission; otherwise users following the component docs get 403s on every flush in append mode.
Useful? React with 👍 / 👎.
| let request_metadata = request_metadata_builder.build(&payload); | ||
| let request = request_options.build_request(metadata, request_metadata, payload); | ||
|
|
||
| let expected_date = Utc::now().format("%Y-%m-%d").to_string(); |
There was a problem hiding this comment.
Capture one timestamp in date-sensitive tests
This assertion calls Utc::now() after build_request, which already formatted its key using an earlier Utc::now(). If the test runs across a UTC date boundary between those calls, the generated key legitimately contains the previous day while the expected value uses the new day, creating a rare but avoidable flaky failure; capture the time window once or inject a clock into the request builder test.
Useful? React with 👍 / 👎.
…fault, ordering, docs)
Summary
Note
This PR overlaps with with #25545
I kept this PR as draft to update it with the tag support once the other PR is merged. append blobs will also need tagging/metadata support. Feel free to review this PR already.
Adds
blob_type: appendsupport to theazure_blobsink, implementing #19397.The default behavior (
blob_type: block) is unchanged. Whenblob_type: appendis set, each flush appends to a stable-named Azure Append Blob rather than creating a new uniquely-named blob per batch. This is the natural model for continuous log streaming where you want a single growing file per time window.Key design decisions:
blob_time_formatdefaults to%Y-%m-%d(daily rotation) andblob_append_uuiddefaults tofalsefor append type, matching the expected append use case. Both can still be overridden.append_blockfirst (hot path = 1 API call for an existing blob), create the blob on 404, retry. A 409 Conflict on create is swallowed — it means a concurrent writer created the blob first.batch.max_bytesis automatically defaulted to 4 MiB whenblob_type: appendis configured and the setting is not explicitly set. Values above 4 MiB are rejected at startup with a clear error.gunzip,zstd -d) handle these correctly.Vector configuration
Minimal append blob configuration:
This produces a single blob per day (e.g.
app/2024-07-18/2024-07-18.log) and appends each batch to it. Defaults are:blob_time_format: "%Y-%m-%d",blob_append_uuid: false,batch.max_bytes: 4194304.Explicit batch size and custom rotation:
How did you test this PR?
cargo test --no-default-features --features sinks-azure_blob sinks::azure_blob): 27 tests pass, including new tests for:blob_time_formatblob_typeisblockblob_type = "append"blob_type: appendwith no explicitbatch.max_bytessucceeds at startup (C1 regression test)blob_type: appendwithbatch.max_bytes > 4 MiBfails at startup with amax_bytes exceedserrorvalidate().limit_max_bytes()rejection for oversized valuescargo vdev int test azure): tested against Azurite (local Azure emulator) covering:Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.