|
| 1 | +--- |
| 2 | +title: Batch Processor (deprecated) |
| 3 | +redirect_from: |
| 4 | + - /sdk/telemetry/spans/batch-processor/ |
| 5 | +sidebar_order: 10 |
| 6 | +--- |
| 7 | + |
| 8 | +<Alert level="warning"> |
| 9 | + The BatchProcessor is deprecated. Please use the [Telemetry Buffer](/sdk/telemetry/telemetry-buffer/) instead. |
| 10 | +</Alert> |
| 11 | + |
| 12 | +<Alert> |
| 13 | + This document uses key words such as "MUST", "SHOULD", and "MAY" as defined in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) to indicate requirement levels. |
| 14 | +</Alert> |
| 15 | + |
| 16 | +# BatchProcessor (deprecated) |
| 17 | + |
| 18 | +This section covers the initial specification of the BatchProcessor, which some SDKs use as a reference when implementing logs. This exists only as a reference until we fully spec out the [telemetry buffer](/sdk/telemetry/telemetry-buffer/) across all platforms. |
| 19 | + |
| 20 | +## Overview |
| 21 | + |
| 22 | +The BatchProcessor batches spans and logs into one envelope to reduce the number of HTTP requests. When an SDK implements span streaming or logs, it **MUST** use a BatchProcessor, which is similar to [OpenTelemetry's Batch Processor](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md). The BatchProcessor holds logs and finished spans in memory and batches them together into envelopes. It uses a combination of time and size-based batching. At the time of writing, the BatchProcessor only handles spans and logs, but an SDK **MAY** use it for other telemetry data in the future. |
| 23 | + |
| 24 | +## Specification |
| 25 | + |
| 26 | +Whenever the SDK finishes a span or captures a log, it **MUST** put it into the BatchProcessor. The SDK **MUST NOT** put unfinished spans into the BatchProcessor. |
| 27 | + |
| 28 | +The BatchProcessor **MUST** start a timeout of 5 seconds when the SDK adds the first span or log. When the timeout expires, the BatchProcessor **MUST** forward all spans and logs to the transport, no matter how many items it contains. The SDK **MAY** choose a different value for the timeout, but it **MUST NOT** exceed 30 seconds, because longer timeouts can lead to problems with the span buffer on the backend, which uses a time interval of 60 seconds for determining segments for spans. The BatchProcessor **SHOULD** only start a new timeout when it has spans or logs to send, which avoids running the timeout unnecessarily. |
| 29 | + |
| 30 | +The BatchProcessor **MUST** forward all items to the transport when the spans or logs it contains exceed 1 MiB in size. The SDK **MAY** choose a different value for the max batch size, keeping the [envelope max sizes](/sdk/data-model/envelopes/#size-limits) in mind. To manage the BatchProcessor's memory footprint, the SDK **MUST** serialize each span or log and calculate its size based on the serialized JSON bytes. As serialization is expensive, the BatchProcessor **SHOULD** keep track of the serialized spans and logs and pass these to the envelope to avoid serializing multiple times. |
| 31 | + |
| 32 | +When the BatchProcessor forwards all spans or logs to the transport, it **MUST** reset its timeout and remove all spans and logs. The SDK **MUST** apply filtering and sampling before adding spans or logs to the BatchProcessor. The SDK **MUST** apply rate limits to spans and logs after they leave the BatchProcessor to send as much data as possible by dropping data as late as possible. |
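
The time-based and size-based rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an SDK implementation: the class shape, the `send` callback (standing in for building an envelope and handing it to the transport), and all parameter names are hypothetical. It assumes filtering and sampling already happened before `add` is called and that rate limits are applied after `send`. Reaching the size limit is treated as a trigger, which matches the "1 MiB - 1 byte plus another span" scenario.

```python
import json
import threading
from typing import Callable, List, Optional

FLUSH_TIMEOUT_SECONDS = 5.0    # default timeout from the spec
MAX_TIMEOUT_SECONDS = 30.0     # the timeout MUST NOT exceed this
MAX_BATCH_BYTES = 1024 * 1024  # 1 MiB


class BatchProcessor:
    """Hypothetical sketch; names are illustrative, not a real SDK API."""

    def __init__(self, send: Callable[[List[bytes]], None],
                 timeout: float = FLUSH_TIMEOUT_SECONDS,
                 max_bytes: int = MAX_BATCH_BYTES) -> None:
        if timeout > MAX_TIMEOUT_SECONDS:
            raise ValueError("timeout must not exceed 30 seconds")
        self._send = send
        self._timeout = timeout
        self._max_bytes = max_bytes
        self._lock = threading.Lock()
        self._items: List[bytes] = []
        self._size = 0
        self._timer: Optional[threading.Timer] = None
        self._generation = 0  # lets a late timer callback detect it is stale

    def add(self, item: dict) -> None:
        """Add a finished span or captured log (already filtered and sampled)."""
        # Serialize once; the bytes are reused for the size check and the batch.
        payload = json.dumps(item).encode("utf-8")
        with self._lock:
            self._items.append(payload)
            self._size += len(payload)
            if self._size < self._max_bytes:
                if self._timer is None:  # first item: start the timeout
                    self._start_timer_locked()
                return
            batch = self._take_locked()  # size limit reached
        self._send(batch)

    def flush(self) -> None:
        """Forward everything in memory, e.g. on flush, close, or background."""
        with self._lock:
            batch = self._take_locked()
        if batch:
            self._send(batch)

    def _start_timer_locked(self) -> None:
        self._generation += 1
        timer = threading.Timer(self._timeout, self._on_timeout,
                                args=(self._generation,))
        timer.daemon = True
        self._timer = timer
        timer.start()

    def _on_timeout(self, generation: int) -> None:
        with self._lock:
            if self._timer is None or generation != self._generation:
                return  # a flush already handled this batch
            batch = self._take_locked()
        if batch:
            self._send(batch)

    def _take_locked(self) -> List[bytes]:
        # Reset the timeout and remove all items; no new timer until the next add.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._items, self._size = self._items, [], 0
        return batch
```

Sending happens outside the lock so that a slow transport does not block callers adding new items. The timer is only started when the first item arrives, so an empty processor never runs a timeout.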
| 33 | + |
| 34 | +The BatchProcessor **MUST** forward all spans and logs in memory to the transport to avoid data loss in the following scenarios: |
| 35 | + |
| 36 | +1. When the user calls `SentrySDK.flush()`, the BatchProcessor **MUST** forward all data in memory to the transport, and only then **SHOULD** the transport flush the data. |
| 37 | +2. When the user calls `SentrySDK.close()`, the BatchProcessor **MUST** forward all data in memory to the transport. SDKs **SHOULD** keep their existing closing behavior. |
| 38 | +3. When the application shuts down gracefully, the BatchProcessor **SHOULD** forward all data in memory to the transport. The transport **SHOULD** keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call a transport `flush`. This is mostly relevant for mobile SDKs already subscribed to these hooks, such as [applicationWillTerminate](https://developer.apple.com/documentation/uikit/uiapplicationdelegate/applicationwillterminate(_:)) on iOS. |
| 39 | +4. When the application moves to the background, the BatchProcessor **SHOULD** forward all the data in memory to the transport and stop the timer. The transport **SHOULD** keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call the transport `flush`. This is mostly relevant for mobile SDKs. |
| 40 | +5. Mobile SDKs **MUST** minimize data loss when sudden process terminations occur. Refer to the [Mobile Telemetry Buffer](/sdk/telemetry/telemetry-buffer/mobile-telemetry-buffer) section for more details. |
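
The ordering in scenarios 1 and 4 can be illustrated with a short Python sketch. Everything here is hypothetical: `RecordingTransport` is a stand-in that only records calls, and `Client` keeps a plain list in place of the BatchProcessor's memory. It shows only the order of operations, not real transport behavior.

```python
from typing import List


class RecordingTransport:
    """Stand-in transport that records the order in which it is called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def send(self, batch: List[bytes]) -> None:
        self.calls.append(f"send:{len(batch)}")

    def flush(self) -> None:
        self.calls.append("flush")


class Client:
    """Hypothetical client; `_buffer` stands in for the BatchProcessor."""

    def __init__(self, transport: RecordingTransport) -> None:
        self._transport = transport
        self._buffer: List[bytes] = []

    def capture(self, payload: bytes) -> None:
        self._buffer.append(payload)

    def _forward_all(self) -> None:
        batch, self._buffer = self._buffer, []
        if batch:
            self._transport.send(batch)

    def flush(self) -> None:
        self._forward_all()      # 1. forward all data in memory first
        self._transport.flush()  # 2. only then flush the transport

    def on_background(self) -> None:
        # Forward the data; calling the transport's flush is not required.
        self._forward_all()
```

Calling `capture` twice and then `flush` records `send:2` followed by `flush` on the transport, while `on_background` records only a `send` call.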
| 41 | + |
| 42 | +The detailed specification is written in the [Gherkin syntax](https://cucumber.io/docs/gherkin/reference/). The specification uses spans as an example, but the same applies to logs or any other future telemetry data. |
| 43 | + |
| 44 | + |
| 45 | +```Gherkin |
| 46 | +Scenario: No spans in the BatchProcessor, 1 span added |
| 47 | + Given no spans in the BatchProcessor |
| 48 | + When the SDK finishes 1 span |
| 49 | + Then the SDK puts this span into the BatchProcessor |
| 50 | + And starts a timeout of 5 seconds |
| 51 | + And doesn't forward the span to the transport |
| 52 | +
|
| 53 | +Scenario: Span added before the timeout expires |
| 54 | + Given span A in the BatchProcessor |
| 55 | + And 4.9 seconds pass |
| 56 | + When the SDK finishes span B |
| 57 | + Then the SDK adds span B to the BatchProcessor |
| 58 | + And doesn't reset the timeout |
| 59 | + And doesn't forward the spans A and B in the BatchProcessor to the transport |
| 60 | +
|
| 61 | +Scenario: Timeout expires and no spans or logs to send |
| 62 | + Given no spans in the BatchProcessor |
| 63 | + When the timeout expires |
| 64 | + Then the BatchProcessor does nothing |
| 65 | + And doesn't start a new timeout |
| 66 | +
|
| 67 | +Scenario: Spans with size of 1 MiB - 1 byte added, timeout expires |
| 68 | + Given spans with size of 1 MiB - 1 byte in the BatchProcessor |
| 69 | + When the timeout expires |
| 70 | + Then the SDK adds all the spans to one envelope |
| 71 | + And forwards them to the transport |
| 72 | + And resets the timeout |
| 73 | + And clears the BatchProcessor |
| 74 | +
|
| 75 | +Scenario: Spans with size of 1 MiB - 1 byte added within 4.9 seconds |
| 76 | + Given spans with size of 1 MiB - 1 byte in the BatchProcessor |
| 77 | + When the SDK finishes another span and puts it into the BatchProcessor |
| 78 | + Then the BatchProcessor puts all spans into one envelope |
| 79 | + And forwards the envelope to the transport |
| 80 | + And resets the timeout |
| 81 | + And clears the BatchProcessor |
| 82 | +
|
| 83 | +Scenario: Unfinished spans |
| 84 | + Given no span is in the BatchProcessor |
| 85 | + When the SDK starts a span but doesn't finish it |
| 86 | + Then the BatchProcessor is empty |
| 87 | +
|
| 88 | +Scenario: Span filtered out |
| 89 | + Given no span is in the BatchProcessor |
| 90 | + When the SDK finishes a span |
| 91 | + And the span is filtered out |
| 92 | + Then the BatchProcessor is empty |
| 93 | +
|
| 94 | +Scenario: Span not sampled |
| 95 | + Given no span is in the BatchProcessor |
| 96 | + When the SDK finishes a span |
| 97 | + And the span is not sampled |
| 98 | + Then the BatchProcessor is empty |
| 99 | +
|
| 100 | +Scenario: 1 span added, application crashes |
| 101 | + Given 1 span in the BatchProcessor |
| 102 | + When the SDK detects a crash |
| 103 | + Then the SDK does nothing with the items in the BatchProcessor |
| 104 | + And loses the spans in the BatchProcessor |
| 105 | +
|
| 106 | +``` |
| 107 | + |
| 108 | +<PageGrid /> |