Component(s)
receiver/googlecloudpubsub
What happened?
Description
The googlecloudpubsub receiver splits logs over 4MB after the removal of the raw_text encoding in version 0.132.0. That's the gist of it.
First off, thank you all for your work here!
We're exporting Cloud Run logs via this receiver. GCP regularly emits some logs that are a little larger than 4 megabytes. Up until version 0.131.0, the googlecloudpubsub receiver was able to export each of these logs without splitting them, using the raw_text encoding.
In version 0.132.0 (see PR #41813), the raw_text encoding was removed, and the guidance is to use the text_encoding extension instead.
Example config change
Old config:
receivers:
  googlecloudpubsub:
    project: "${env:PROJECT_ID}"
    subscription: "projects/${env:PROJECT_ID}/subscriptions/${env:PUBSUB_SUBSCRIPTION}"
    encoding: raw_text
New config:
extensions:
  text_encoding:
    encoding: utf-8
receivers:
  googlecloudpubsub:
    project: "${env:PROJECT_ID}"
    subscription: "projects/${env:PROJECT_ID}/subscriptions/${env:PUBSUB_SUBSCRIPTION}"
    encoding: text_encoding
service:
  extensions: [text_encoding]
With this change, I'm now observing that a log bigger than 4MB is split up into multiple log records, resulting in malformed JSON and drastically reducing the usefulness of the log.
Steps to Reproduce
I created a Docker Compose config that reproduces this issue using the Pub/Sub emulator, a Python client, and a large JSON payload. I put some effort into making it easy to run; please let me know if you run into issues: https://github.com/dantheman39/otel-pubsub-debugging
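For reference, a minimal sketch of a publisher that triggers the behavior (not the exact script in the linked repo; the topic name, payload shape, and size constants are illustrative, and it assumes the emulator is reachable at pubsub:8085 and that the topic and subscription already exist):

import json
import os

from google.cloud import pubsub_v1

# Point the client at the Pub/Sub emulator instead of real GCP.
os.environ.setdefault("PUBSUB_EMULATOR_HOST", "pubsub:8085")

PROJECT_ID = os.environ.get("PROJECT_ID", "fake-project")
TOPIC_ID = os.environ.get("PUBSUB_TOPIC", "test-topic")  # illustrative name

# Build one JSON document comfortably over 4MB, loosely modeled on a
# Cloud Run audit log with a long list of environment variables.
payload = {
    "protoPayload": {
        "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
        "response": {
            "env": [{"name": f"VAR_{i}", "value": "x" * 4096} for i in range(1500)]
        },
    }
}
data = json.dumps(payload).encode("utf-8")
assert len(data) > 4 * 1024 * 1024  # single payload, larger than 4MB

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

# Pub/Sub accepts messages up to 10MB, so this is delivered as ONE message;
# the collector then emits it as multiple log records after the encoding change.
future = publisher.publish(topic_path, data=data)
print("published message id:", future.result())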
Expected Result
Logs that are above 4MB aren't split into multiple messages.
Actual Result
Single logs above 4MB in size are split into multiple messages.
Collector version
0.135.0
Environment information
Environment
OS: Seen in Docker (on macOS) and on Ubuntu. I can give more specifics if requested.
OpenTelemetry Collector configuration
This config reproduces the issue; see the linked repo.
extensions:
  text_encoding:
    encoding: utf-8
receivers:
  googlecloudpubsub:
    endpoint: "pubsub:8085"
    insecure: true
    project: "${env:PROJECT_ID}"
    subscription: "projects/${env:PROJECT_ID}/subscriptions/${env:PUBSUB_SUBSCRIPTION}"
    encoding: text_encoding
processors:
  batch: {}
  resource/env:
    attributes:
      - key: deployment.environment
        value: "local"
        action: upsert
exporters:
  debug:
    use_internal_logger: false
    verbosity: detailed
service:
  extensions: [text_encoding]
  pipelines:
    logs:
      receivers: [googlecloudpubsub]
      processors: [batch, resource/env]
      exporters: [debug]
Log output
Snippets, since these are large:
Logs {"resource logs": 2, "log records": 2}
ResourceLog #0
Resource SchemaURL:
Resource attributes:
-> deployment.environment: Str(local)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-09-18 21:35:37.781243625 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "message": "Execution fake-job-79gc8 has completed successfully."
    },
    "serviceName": "run.googleapis.com",
    "methodName": "/Jobs.RunJob",
    "resourceName": "namespaces/fake-project/executions/fake-job-79gc8",
    "response": {
      "metadata": {
        "name": "fake-job-79gc8",
        "namespace": "845684099668",
---TRUNCATED----
ResourceLog #1
Resource SchemaURL:
Resource attributes:
-> deployment.environment: Str(local)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-09-18 21:35:37.781243625 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(
  {
    "name": "VAR_11",
    "value": "VAR_11"
  },
  {
    "name": "VAR_12",
    "value": "VAR_12"
  },
  {
    "name": "VAR_13",
    "value": "VAR_13"
  },
  {
    "name": "VAR_14",
    "value": "VAR_14"
  },
  {
    "name": "VAR_15",
    "value": "VAR_15"
  }
],
Additional context
No response