googlecloudpubsub receiver splits logs over 4MB after removal of raw_text encoding in version 132

### Component(s)

receiver/googlecloudpubsub

### What happened?

## Description

Since the `raw_text` encoding was removed in version 0.132, the `googlecloudpubsub` receiver splits logs larger than 4MB into multiple log records.

First off, thank you all for your work here!

We're exporting Cloud Run logs via this receiver. GCP regularly emits some logs that are a little larger than 4MB. Through version 0.131, the `googlecloudpubsub` receiver exported each of these logs as a single record, without splitting, using the `raw_text` encoding.

In version 0.132 (see PR https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/41813), the `raw_text` encoding was removed, and the guidance is to use the `text_encoding` extension instead.

<details>
  <summary>Example config change</summary>

Old config:

```yaml
receivers:
  googlecloudpubsub:
    project: "${env:PROJECT_ID}"
    subscription: "projects/${env:PROJECT_ID}/subscriptions/${env:PUBSUB_SUBSCRIPTION}"
    encoding: raw_text
```

New config:

```yaml
extensions:
  text_encoding:
    encoding: utf-8

receivers:
  googlecloudpubsub:
    project: "${env:PROJECT_ID}"
    subscription: "projects/${env:PROJECT_ID}/subscriptions/${env:PUBSUB_SUBSCRIPTION}"
    encoding: text_encoding

service:
  extensions: [text_encoding]
```
</details>

With this change, I'm now observing that logs larger than 4MB are split into multiple log records. Each fragment is malformed JSON on its own, which drastically reduces the usefulness of the log.
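To illustrate why the split makes the log unusable, here is a minimal sketch that is independent of the collector. It assumes the split happens at a fixed 4 MiB byte boundary, which I have not confirmed from the receiver source. `build_payload`, `split_at`, and `is_valid_json` are hypothetical helper names, and the payload shape only mimics the `VAR_n` entries visible in the log output.

```python
import json

# Assumed split boundary; the receiver's actual split point is not confirmed.
FOUR_MIB = 4 * 1024 * 1024


def build_payload(min_bytes: int) -> bytes:
    """Build a JSON document larger than min_bytes (hypothetical payload shape)."""
    # Each serialized entry is at least 35 bytes, so this count overshoots min_bytes.
    count = min_bytes // 35 + 1
    env = [{"name": f"VAR_{i}", "value": f"VAR_{i}"} for i in range(count)]
    return json.dumps({"env": env}).encode("utf-8")


def split_at(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most size bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def is_valid_json(chunk: bytes) -> bool:
    """Return True if chunk parses as a complete JSON document."""
    try:
        json.loads(chunk)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


payload = build_payload(FOUR_MIB)
chunks = split_at(payload, FOUR_MIB)

print("payload parses:", is_valid_json(payload))
print("chunk count:", len(chunks))
print("chunks parse:", [is_valid_json(c) for c in chunks])
```

The whole payload parses, but no individual chunk does, so anything downstream that parses the log body as JSON fails on every fragment.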

## Steps to Reproduce

I created a Docker Compose config that reproduces this issue using the Pub/Sub emulator, a Python client, and a large JSON payload. I put in some effort to make it easy to run; please let me know if you run into issues: https://github.com/dantheman39/otel-pubsub-debugging

## Expected Result

Logs that are above 4MB aren't split into multiple messages.

## Actual Result

Single logs above 4MB in size are split into multiple messages.

### Collector version

0.135.0

### Environment information

## Environment
OS: Seen on Docker (on macOS) and on Ubuntu. Can give more specifics if requested.


### OpenTelemetry Collector configuration

This configuration reproduces the issue; see the linked repo.

```yaml
extensions:
  text_encoding:
    encoding: utf-8

receivers:
  googlecloudpubsub:
    endpoint: "pubsub:8085"
    insecure: true
    project: "${env:PROJECT_ID}"
    subscription: "projects/${env:PROJECT_ID}/subscriptions/${env:PUBSUB_SUBSCRIPTION}"
    encoding: text_encoding

processors:
  batch: {}
  resource/env:
    attributes:
      - key: deployment.environment
        value: "local"
        action: upsert

exporters:
  debug:
    use_internal_logger: false
    verbosity: detailed

service:
  extensions: [text_encoding]
  pipelines:
    logs:
      receivers: [googlecloudpubsub]
      processors: [batch, resource/env]
      exporters: [debug]
```

### Log output

Snippets, since these are large:

```shell
Logs	{"resource logs": 2, "log records": 2}
ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> deployment.environment: Str(local)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-09-18 21:35:37.781243625 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "message": "Execution fake-job-79gc8 has completed successfully."
    },
    "serviceName": "run.googleapis.com",
    "methodName": "/Jobs.RunJob",
    "resourceName": "namespaces/fake-project/executions/fake-job-79gc8",
    "response": {
      "metadata": {
        "name": "fake-job-79gc8",
        "namespace": "845684099668",
---TRUNCATED----



ResourceLog #1
Resource SchemaURL:
Resource attributes:
     -> deployment.environment: Str(local)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-09-18 21:35:37.781243625 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(
                  {
                    "name": "VAR_11",
                    "value": "VAR_11"
                  },
                  {
                    "name": "VAR_12",
                    "value": "VAR_12"
                  },
                  {
                    "name": "VAR_13",
                    "value": "VAR_13"
                  },
                  {
                    "name": "VAR_14",
                    "value": "VAR_14"
                  },
                  {
                    "name": "VAR_15",
                    "value": "VAR_15"
                  }
                ],
```

### Additional context

_No response_

### Tip

<sub>[React](https://github.blog/news-insights/product-news/add-reactions-to-pull-requests-issues-and-comments/) with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding `+1` or `me too`, to help us triage it. Learn more [here](https://opentelemetry.io/community/end-user/issue-participation/).</sub>
