Refactor stanzareceiver into a helper package (1/2) #2306

djaglowski · 2021-02-08T20:12:29Z

Link to tracking Issue:
This PR partially addresses the following issues:

Resolves: Refactor stanzareceiver to allow splitting into separate receivers #2265
Related: Add filelog receiver #2268, Make sure regex, json, timestamp, severity parser operators are supported for all log receivers. #2282.

Description:

The main idea here is to convert stanzareceiver into a helper package for building various other stanza-based receivers. Each of these other receivers will only vary by input operator. Functionality pulled out of stanzareceiver was moved into a new filelogreceiver. stanzareceiver should most likely be renamed and/or moved, but is left in its previous package for this initial PR.

stanzareceiver defines an interface called LogReceiverType which each stanza-based receiver must implement and pass to stanzareceiver.NewFactory(LogReceiverType) component.ReceiverFactory.

With this interface, each stanza-based receiver should need only a small amount of work to become fully functional. Support for parsing operations, emission from stanza's internal pipeline, and conversion to pdata format is handled in the helper package, so that these are standardized across the full set of stanza-based receivers.
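As a rough illustration of the pattern described above, here is a minimal, self-contained sketch. It is not the code in this PR: `ReceiverConfig`, `OperatorConfig`, `PipelineConfig`, `ReceiverFactory`, `BuildPipeline`, and the `Type()` method are simplified stand-ins invented for the example. Only the names `LogReceiverType`, `NewFactory`, and `Decode` come from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real collector and stanza types.
type ReceiverConfig map[string]interface{} // stands in for configmodels.Receiver
type OperatorConfig map[string]interface{} // one stanza operator's settings
type PipelineConfig []OperatorConfig       // stands in for pipeline.Config

// LogReceiverType is what each stanza-based receiver implements.
// Decode is named in the PR description; Type is an assumption.
type LogReceiverType interface {
	Type() string
	Decode(cfg ReceiverConfig) (PipelineConfig, error)
}

// ReceiverFactory is a simplified stand-in for component.ReceiverFactory.
type ReceiverFactory struct {
	logType LogReceiverType
}

// NewFactory wraps a receiver-specific type with the shared helper logic.
func NewFactory(t LogReceiverType) *ReceiverFactory {
	return &ReceiverFactory{logType: t}
}

// Type reports the receiver type supplied by the implementation.
func (f *ReceiverFactory) Type() string { return f.logType.Type() }

// BuildPipeline is the shared path: every receiver's config goes through
// its own Decode, and the helper handles everything after that.
func (f *ReceiverFactory) BuildPipeline(cfg ReceiverConfig) (PipelineConfig, error) {
	return f.logType.Decode(cfg)
}

// fileLog is the receiver-specific part: a type name and a Decode.
type fileLog struct{}

func (fileLog) Type() string { return "filelog" }

func (fileLog) Decode(cfg ReceiverConfig) (PipelineConfig, error) {
	ops, ok := cfg["operators"].([]OperatorConfig)
	if !ok {
		return nil, errors.New("operators must be a list of operator configs")
	}
	return PipelineConfig(ops), nil
}

func main() {
	factory := NewFactory(fileLog{})
	cfg := ReceiverConfig{"operators": []OperatorConfig{
		{"type": "file_input"},
		{"type": "regex_parser"},
	}}
	p, err := factory.BuildPipeline(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(factory.Type(), "pipeline has", len(p), "operators")
}
```

The point of the sketch is the division of labor: the receiver contributes a type name and a `Decode`, and everything downstream lives behind the factory.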

Next Steps
Input operators are not yet isolated to the top level of the configuration. The end goal is:

filelog:
  include: [ receiver/stanzareceiver/testdata/simple.log ]
  start_at: beginning
  operators:
    - type: regex_parser
      regex: '^(?P<time>\d{4}-\d{2}-\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
      timestamp:
        parse_from: time
        layout: '%Y-%m-%d'
      severity:
        parse_from: sev

but the current state is still:

filelog:
  operators:
    - type: file_input
      include: [ receiver/stanzareceiver/testdata/simple.log ]
      start_at: beginning
    - type: regex_parser
      regex: '^(?P<time>\d{4}-\d{2}-\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
      timestamp:
        parse_from: time
        layout: '%Y-%m-%d'
      severity:
        parse_from: sev

The primary requirement of #2265 is to promote the input operator to the top level of the receiver config. This will be the focus of the next PR. This PR is mostly concerned with splitting up the package. The configuration changes might be a little messy, so I wanted to address those separately.
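One way to picture the promotion is as a translation from the end-goal shape back into the operator list that stanza consumes today. The sketch below is only an assumption about how that could work, not the planned implementation: `OperatorConfig` and `toPipeline` are hypothetical names, and configs are modeled as plain maps.

```go
package main

import "fmt"

// OperatorConfig is a hypothetical stand-in for one stanza operator's settings.
type OperatorConfig map[string]interface{}

// toPipeline takes a receiver config whose input settings sit at the top
// level and returns the operator list with the input operator first.
// Every top-level key except "operators" is treated as an input setting.
func toPipeline(inputType string, receiverCfg map[string]interface{}) []OperatorConfig {
	input := OperatorConfig{"type": inputType}
	var rest []OperatorConfig
	for key, value := range receiverCfg {
		if key == "operators" {
			if ops, ok := value.([]OperatorConfig); ok {
				rest = ops
			}
			continue
		}
		input[key] = value
	}
	return append([]OperatorConfig{input}, rest...)
}

func main() {
	// Mirrors the end-goal filelog config from the description.
	cfg := map[string]interface{}{
		"include":  []string{"receiver/stanzareceiver/testdata/simple.log"},
		"start_at": "beginning",
		"operators": []OperatorConfig{
			{"type": "regex_parser"},
		},
	}
	for i, op := range toPipeline("file_input", cfg) {
		fmt.Println(i, op["type"])
	}
}
```

With this shape, each receiver would only need to name its input operator type, and the user-facing config would no longer mention `file_input` at all.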

On the subject of configuration: the interface defined by stanzareceiver has a method Decode(configmodels.Receiver) (pipeline.Config, error), which is, in my opinion, much too loosely defined. Too much responsibility is delegated to each stanza-based receiver. The main reason it is left this way for now is that stanza operators do not currently use mapstructure for config unmarshaling. There is currently a workaround in place, but once stanza operators are migrated to mapstructure, more responsibility for unmarshaling should be extracted back into the helper package, and this interface method should end up a lot cleaner. I'm planning to look into this in the next PR.

Open questions (which can be addressed in this PR or the next):

Should the helper package be completely standalone, or does it belong in receivercreator or similar?
If the helper package should be standalone, what should it be called? (probably not stanzareceiver)

Temporarily removed functionality
This functionality will be implemented in the near future. There is some design to do on how exactly this should work when used by multiple receivers:

Offsets database (tracked by Design and implement persistence mechanism for log receivers #2287)
Plugins (tracked as item on Add basic log collection capabilities #2264)

Testing:
Unit tests are roughly the same as before. A few cases were dropped because they no longer applied. Certainly more tests will be added as this pattern is solidified.

Testbed scenario is unchanged and still passing:

> make run-tests
./runtests.sh
=== RUN   TestLog10kDPS
=== RUN   TestLog10kDPS/OTLP
... (abbreviated)
=== RUN   TestLog10kDPS/Stanza
... (abbreviated)
--- PASS: TestLog10kDPS (30.73s)
    --- PASS: TestLog10kDPS/OTLP (15.32s)
    --- PASS: TestLog10kDPS/Stanza (15.41s)
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/testbed/tests_unstable_exe    31.406s
# Test PerformanceResults
Started: Mon, 08 Feb 2021 13:35:08 -0500

Test                                    |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items|
----------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:|
Log10kDPS/OTLP                          |PASS  |     15s|    19.9|    20.6|         39|         47|    149900|        149900|
Log10kDPS/Stanza                        |PASS  |     15s|    28.4|    29.3|         40|         48|    150000|        150000|

Total duration: 31s

codecov · 2021-02-08T20:27:39Z

Codecov Report

Merging #2306 (5b4172c) into main (ffce884) will decrease coverage by 0.03%.
The diff coverage is 95.12%.

@@            Coverage Diff             @@
##             main    #2306      +/-   ##
==========================================
- Coverage   90.63%   90.59%   -0.04%     
==========================================
  Files         400      401       +1     
  Lines       19895    19899       +4     
==========================================
- Hits        18031    18028       -3     
- Misses       1411     1414       +3     
- Partials      453      457       +4

Flag	Coverage Δ
integration	`69.33% <ø> (+0.06%)`	⬆️
unit	`89.41% <95.12%> (-0.04%)`	⬇️


Impacted Files	Coverage Δ
receiver/stanzareceiver/converter.go	`100.00% <ø> (ø)`
receiver/stanzareceiver/factory.go	`92.59% <92.59%> (-7.41%)`	⬇️
receiver/filelogreceiver/filelog.go	`100.00% <100.00%> (ø)`
receiver/stanzareceiver/config.go	`71.42% <100.00%> (ø)`
receiver/stanzareceiver/receiver.go	`100.00% <100.00%> (ø)`
exporter/signalfxexporter/dimensions/requests.go	`82.35% <0.00%> (-9.81%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data


tigrannajaryan · 2021-02-09T22:37:24Z

> Should the helper package be completely standalone, or does it belong in receivercreator or similar?
>
> If the helper package should be standalone, what should it be called? (probably not stanzareceiver)

Maybe place in internal/logreceiverhelper for now.

tigrannajaryan · 2021-02-10T15:53:49Z

Please resolve the conflicts.


tigrannajaryan · 2021-02-10T15:57:27Z

@djaglowski thanks a lot for working on this. You will unblock the rest of the logs issues. I know @pmm-sumo is also eager to work on some.


djaglowski · 2021-02-10T16:05:55Z

Of course. I'm eager to move it forward. Will have the followup PR asap.

tigrannajaryan

LGTM

tigrannajaryan · 2021-02-10T22:44:33Z

Will merge after the release. Something is wrong with CircleCI building the release, waiting for that.


djaglowski added 3 commits February 8, 2021 14:03

WIP - extracting individual receiver from stanzareceiver

fdd92fb

WIP - most tests passing

bf93cea

Enable all former unit tests

600d238

djaglowski added 4 commits February 8, 2021 15:50

Clean up lint

80c24eb

Remove unused method

6680010

Fix lint, actually

1451135

Fix group order

5525016

djaglowski marked this pull request as ready for review February 8, 2021 21:46

djaglowski requested review from a team and tigrannajaryan February 8, 2021 21:46

github-actions bot assigned owais Feb 8, 2021

bogdandrutu assigned tigrannajaryan and unassigned owais Feb 9, 2021

tigrannajaryan reviewed Feb 9, 2021


djaglowski added 3 commits February 10, 2021 10:26

Address PR feedback except converter

5f60076

Make converter stateless

69dd912

Remove version from LogReceiverType interface

9856a1e

djaglowski mentioned this pull request Feb 10, 2021

Move and rename stanzareceiver #2321

Closed

Merge branch 'main' into stanza-helper

f325272

tigrannajaryan reviewed Feb 10, 2021


djaglowski added 2 commits February 10, 2021 10:58

Clean up mock log receiver type

ae90fa6

Merge branch 'stanza-helper' of github.com:observIQ/opentelemetry-col…

f984739

…lector-contrib into stanza-helper

tigrannajaryan approved these changes Feb 10, 2021


Tidy mod files

5b4172c

tigrannajaryan merged commit 339dd56 into open-telemetry:main Feb 11, 2021

djaglowski mentioned this pull request Feb 11, 2021

Refactor stanzareceiver to allow splitting into separate receivers #2265

Closed

djaglowski deleted the stanza-helper branch February 11, 2021 14:16

tigrannajaryan added the spec:logs label Feb 13, 2021

kisieland referenced this pull request in kisieland/opentelemetry-collector-contrib Mar 16, 2021

Remove contribtest from circleCI, start using github action (#2306)

7a6598f

Signed-off-by: Bogdan Drutu <[email protected]>
