More reliable response body size and `data_received` metric emission #117

imiric · 2021-11-16T12:56:56Z

This is related to #90, but doesn't really fix the issue as the current behavior is correct. See this comment.

Essentially Frame being nil is a valid state for responses to preflight/OPTIONS requests, so the fix in b2052bb is all we need to do for handling this case.

What this PR does is:

move the data_received metric emission to the LoadingFinished event handler, which according to this comment is more reliable. This results in more accurate data_received metric values.

Before, on main (54cd293) with the script from this comment:
```
data_received....................: 17 MB  737 kB/s
```
After, on a9d9868:
```
data_received....................: 39 MB  754 kB/s
```
I confirmed this is more accurate by comparing with the Chrome DevTools Network tab.

This change also produces less error fetching response body warning events, though it doesn't eliminate them entirely, except when running the test suite.
enhance the logging so that we can see the type of request when Frame is nil. In all cases I've seen with the script from Tom these are preflight/OPTIONS requests which don't have an associated FrameID.

robingustafsson

LGTM, I'll let @inancgumus comment on the synchronization part :)

inancgumus

Wow, a lot of work, good job. Minor requests.

common/response.go

imiric · 2021-12-24T16:27:36Z

After some more research into this, for a reason I'm unable to explain, we just don't receive a ResponseReceived CDP event for some requests. 😕

We get RequestWillBeSent and LoadingFinished, but between those we should also get ResponseReceived, which happens for most request flows. This is not only the case for preflight/OPTIONS requests, but some standard GET or POST requests as well. I haven't noticed a pattern where it happens for certain requests or URLs only; it can basically happen at any point with the Vistaprint script.

So it's not a matter of synchronizing how events are processed (thankfully! 😅), but the fact we just don't get the ResponseReceived event.

I did a hacky fallback so that we at least don't panic when this happens, but metrics for these requests will be less useful.

I'll pick this back up when I'm back from vacation, but in the meantime @inancgumus, if you have some time, please test the Vistaprint script and any others you have on this branch and compare it to main. From my tests error fetching response body is much less frequent (it's not printed anymore in tests when running with go test -v), and the data_received metric seems more accurate now (I compared it with the DevTools Network tab), though it's probably not 100% accurate.

inancgumus · 2021-12-29T10:20:55Z

From my tests, error fetching response body is much less frequent

It was displayed 16 times on the main branch:

# main
$ go test ./tests -v -count=1 |& grep 'error fetching response body' | wc -l
16

I'm no longer observing these error log messages on this branch:

# fix/90-nil-request-frame
$ go test ./tests -v -count=1 |& grep 'error fetching response body' | wc -l
0

Using Tom's original script, I see the errors three times on the main branch:

# main
$ XK6_BROWSER_LOG=debug xk6 run -q ../tests/tom.js |& grep 'error fetching response body' | wc -l
3

However, I see them twice on this branch:

# fix/90-nil-request-frame
$ XK6_BROWSER_LOG=debug xk6 run -q ../tests/tom.js |& grep 'error fetching response body' | wc -l
2

I wrote an improved version of Tom's script that reaches almost to the checkout page:

Using it, I see the errors 5 times on the main branch.
However, I see them 23 times on this branch.
I rebased Fix/49 cannot find context with specified id #169 onto this branch, used the modified script, and saw the errors 100 times.
However, when I used the modified script only on Fix/49 cannot find context with specified id #169, I saw the errors 29 times.

Here's the modified script.

import { sleep, group } from 'k6';
import launcher from 'k6/x/browser';

import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.1.0/index.js';

export const options = {};

let browser, context, page;

const pauseMin = 0;
const pauseMax = 1;

export default function() {
  initialize();

  navigateToHomepage();
  clickBusinessCards();
  clickBrowseDesigns();
  clickDesign();
  clickCustomizeDesign();
  clickNext();
  continueWithoutBack();
  customerReviewed();
  addToCart();
  continuePastRecommendations();
  continueToCart();
  goToCheckout();
}

function initialize() {
  browser = launcher.launch('chromium', {
    args: [],
    debug: false,
    devtools: false,
    env: {},
    executablePath: null,
    headless: false,
    ignoreDefaultArgs: [],
    proxy: {},
    slowMo: '500ms',
    timeout: '30s',
  });

  context = browser.newContext({
    ignoreHTTPSErrors: true,
    screen: {width: 1920, height: 1080},
    viewport: {width: 1920, height: 1080},
  });

  page = context.newPage();
}

function navigateToHomepage() {
  console.log('Starting navigateToHomepage...');

  group("Navigate to Homepage", () => {
    page.goto('https://vistaprint.ca', { waitUntil: 'networkidle' });
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickBusinessCards() {
  console.log('Starting clickBusinessCards...');

  group("Click Business Cards", () => {
    page.click("//span[text()='Business Cards']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickBrowseDesigns() {
  console.log('Starting clickBrowseDesigns...');

  group("Click Browse Designs", () => {
    page.waitForSelector("//a[text()='Browse designs']");
    page.click("//a[text()='Browse designs']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickDesign() {
  console.log('Starting clickDesign...');

  const designOption = page.waitForSelector(`li.design-tile >> nth=0`);
  console.log(designOption.textContent());
  
  group("Select Design Option", () => {
    page.click(`li.design-tile >> nth=0`)
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickCustomizeDesign() {
  console.log('Starting clickCustomizeDesign...');

  
  group("Click Customize Design", () => {
    page.click("//a[text()='Customize this design']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickNext() {
  console.log('Starting clickNext...');

  group("Click Next", () => {
    page.click("//button[text()='Next']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function continueWithoutBack() {
  console.log('Starting continueWithoutBack...');

  group("Click Continue Without Back", () => {
    page.click("//button[text()='Continue without back']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function customerReviewed() {
  console.log('Starting customerReviewed...');

  page.click("//input[@value='customerReviewed']");

  sleep(5);
  
  group("Click Customer Reviewed", () => {
    page.click("//button[text()='Continue']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function addToCart() {
  console.log('Starting addToCart...');

  group("Click Add to Cart", () => {
    page.click("//button[text()='Add to cart']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function continuePastRecommendations() {
  console.log('Starting continuePastRecommendations...');

  group("Click Continue (Recommendations)", () => {
    page.waitForSelector("//a[text()='Continue']");
    page.click("//a[text()='Continue']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function continueToCart() {
  console.log('Starting continueToCart...');

  group("Click Continue to Cart", () => {
    page.click("//a[text()='Continue to cart']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function goToCheckout() {
  console.log('Starting goToCheckout...');

  group("Click Checkout", () => {
    page.click("//a[text()='Checkout']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

imiric · 2022-01-13T11:55:03Z

@inancgumus It's difficult to judge the improvement based on that Vistaprint script since it's a very dynamic site and we use random sleeps in the script you're testing with, so the amount of requests will be different each time. Moreover, I see the error fetching response body message for blob URLs (e.g. blob:https://www.vistaprint.ca/1a44f750-8ddd-4d3b-a454-8f1ee270946e), which we should probably ignore for determining body size like we do for data URLs, since they're internal URLs used only in the browser.

This PR isn't meant to completely eliminate the error fetching response body message, caused by an underlying No data found for resource with given identifier (-32000) error, for which we don't have a fix yet. What it does is move response metric emission to the LoadingFinished event handler, which results in more accurate data_received metrics (you can confirm this within Chrome DevTools, or by packet capture).

E.g. on main running that example script:

data_received....................: 17 MB  737 kB/s

And on this branch rebased on current main:

data_received....................: 39 MB  754 kB/s

That said, it's surprising that you get a ~4x increase of the error on this branch, or that you have that many to begin with. My difference is +/- ~5, and usually not more than 10 in total on either branch.

I'll run some more tests and add ignoring of blob URLs.

inancgumus · 2022-01-13T13:00:39Z

It's difficult to judge the improvement based on that Vistaprint script since it's a very dynamic site and we use random sleeps in the script you're testing with, so the amount of requests will be different each time.
That said, it's surprising that you get a ~4x increase of the error on this branch, or that you have that many to begin with. My difference is +/- ~5, and usually not more than 10 in total on either branch.

I agree 👍 I was measuring this in an earlier version of the main branch, things probably have changed since then 🤷

we should probably ignore for determining body size like we do for data URLs, since they're internal URLs used only in the browser

👍

imiric · 2022-01-14T12:01:11Z

@inancgumus This is ready for review. Please test again when compared to main.

Like I mentioned above, comparing the amount of error fetching response body messages is unreliable, since the Vistaprint test script is so dynamic. For example, sometimes running it on main would fail after ~30s with the error cannot call function on expression and would finish at the Click Add to Cart stage, while running it on this branch would run for longer (~1m), but still fail at the Click Continue (Recommendations) stage, so it makes sense that more error messages would be produced. I also reverted the removal of checking Content-Length which should make us rely less on CDP if the header exists.

Instead of focusing on the message count, you can confirm that data_received reporting is more accurate now, and that the message is gone when running the test suite.

I'm frankly unsure if we should merge this as it's difficult to judge the improvement or whether it's not breaking anything else, but from a metrics standpoint it seems like an improvement.

imiric · 2022-02-04T09:07:44Z

@inancgumus WDYT about merging this?

See ChromeDevTools/devtools-protocol#44 (comment) There is a race condition here where sometimes we receive the LoadingFinished event before we finish processing ResponseReceived, so we don't set the Response on the Request and req.response is nil in onLoadingFinished resulting in a nil pointer panic, but I wasn't able to synchronize this properly.

Ideally we shouldn't need to do this, and with the previous change of emitting response metrics on LoadingFinished, fetchBody() is more reliable.

imiric · 2022-02-04T09:23:06Z

I pushed a rebase on main.

A smaller test that shows the improvement is the fillform.js example I've been testing for grafana/k6#4469.

When running on current main (0f2ba32):

$ XK6_BROWSER_LOG=info XK6_HEADLESS=1 xk6 run examples/fillform.js
[...]
WARN[0001] error fetching response body for 'https://test.k6.io/': No data found for resource with given identifier (-32000)  category=cdp elapsed="0 ms" goroutine=75
ERRO[0003] unknown command or event "Page.frameScheduledNavigation"
ERRO[0003] unknown command or event "Page.frameClearedScheduledNavigation"
WARN[0003] error fetching response body for 'https://test.k6.io/my_messages.php': No data found for resource with given identifier (-32000)  category=cdp elapsed="0 ms" goroutine=75
ERRO[0007] unknown command or event "Page.frameScheduledNavigation"
ERRO[0007] unknown command or event "Page.frameClearedScheduledNavigation"
WARN[0007] error fetching response body for 'https://test.k6.io/login.php': No data found for resource with given identifier (-32000)  category=cdp elapsed="0 ms" goroutine=75

And on this branch (3295ede):

$ XK6_BROWSER_LOG=info XK6_HEADLESS=1 xk6 run examples/fillform.js
[...]
ERRO[0003] unknown command or event "Page.frameScheduledNavigation"
ERRO[0003] unknown command or event "Page.frameClearedScheduledNavigation"
ERRO[0007] unknown command or event "Page.frameScheduledNavigation"
ERRO[0007] unknown command or event "Page.frameClearedScheduledNavigation"

inancgumus · 2022-02-04T09:33:43Z

OK let me review this today.

imiric · 2022-02-04T10:04:15Z

And to confirm that the data_received metric is now more accurate, here's a simple example script:

// simple.js
import launcher from 'k6/x/browser';
import { sleep } from 'k6';

export default function() {
  const browser = launcher.launch('chromium', {
    headless: true,
  });
  const context = browser.newContext();
  const page = context.newPage();

  page.goto('https://test.k6.io/', { waitUntil: 'networkidle' });
  sleep(1);  // waitFor workaround

  page.close();
  browser.close();
}

Running this on main (0f2ba32) with XK6_BROWSER_LOG=info xk6 run -o json=out-main.json examples/simple.js and then filtering out the data_received samples:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url)' out-main.json
{"type":"Point","data":{"time":"2022-02-04T06:44:31.082861+01:00","value":174,"tags":{"from_cache":"false","from_prefetch_cache":"false","from_service_worker":"false","group":"","method":"GET","proto":"http/1.1","scenario":"default","status":"200","url":"https://test.k6.io/"}},"metric":"data_received"}
{"type":"Point","data":{"time":"2022-02-04T06:44:31.287852+01:00","value":5076,"tags":{"from_cache":"false","from_prefetch_cache":"false","from_service_worker":"false","group":"","method":"GET","proto":"http/1.1","scenario":"default","status":"200","url":"https://test.k6.io/static/css/site.css"}},"metric":"data_received"}
{"type":"Point","data":{"time":"2022-02-04T06:44:31.87362+01:00","value":10112,"tags":{"from_cache":"false","from_prefetch_cache":"false","from_service_worker":"false","group":"","method":"GET","proto":"http/1.1","scenario":"default","status":"200","url":"https://test.k6.io/static/js/prisms.js"}},"metric":"data_received"}

Only the sizes:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | {url: .data.tags.url, received: .data.value}' out-main.json
{"url":"https://test.k6.io/","received":174}
{"url":"https://test.k6.io/static/css/site.css","received":5076}
{"url":"https://test.k6.io/static/js/prisms.js","received":10112}

Summing up:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | .data.value' out-main.json | paste -sd+ | bc
15362

Comparing this to fix/90-nil-request-frame (3295ede) with XK6_BROWSER_LOG=info xk6 run -o json=out-117.json examples/simple.js:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | {url: .data.tags.url, received: .data.value}' out-117.json
{"url":"https://test.k6.io/","received":11277}
{"url":"https://test.k6.io/static/css/site.css","received":5076}
{"url":"https://test.k6.io/static/js/prisms.js","received":10112}

Sum:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | .data.value' out-117.json | paste -sd+ | bc
26465

To confirm which one is correct:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | .data.tags.url' out-main.json | xargs -I{} sh -c 'echo {}; curl -is {} | wc --bytes'
https://test.k6.io/
11277
https://test.k6.io/static/css/site.css
5076
https://test.k6.io/static/js/prisms.js
10112

😄

inancgumus

Overall, it's good. Glad that we can finally get rid of these ancient error messages!

But I'd like to separate k6 concerns from the browser concerns as much as we can.

sampleTags := k6stats.IntoSampleTags(&tags)
k6stats.PushIfNotDone(m.ctx, state.Samples, k6stats.ConnectedSamples{
	...
})

So, would you like to move this part of the code to another function like:

func (m *NetworkManager) emitResponseMetrics(resp *Response) {
	...
	emitK6Metrics(...)
	...
}

func emitK6Metrics(...) {
	sampleTags := k6stats.IntoSampleTags(&tags)
	k6stats.PushIfNotDone( ... )
	...
	k6stats.PushIfNotDone( ... )
	...
}

I know what you're going to say: "I thought you were against creating functions that are not used by multiple functions". I'm not against them as long as they are contextual and serve as a chance to make things better. I see it as a first step in refactoring and simplifying the extension code later.

common/network_manager.go

common/response.go

common/network_manager.go

common/response.go

inancgumus · 2022-02-04T10:08:38Z

Thanks for the short jq tutorial as well, nice work! 😄 👍

inancgumus · 2022-02-04T10:16:24Z

Btw, are we going to have a test for this?

imiric · 2022-02-04T10:19:52Z

I'm not sure how to test this TBH without an E2E test like that script.

Hhmm though we might be able to get the metric for some httpmultibin endpoint and assert on that?

I'll look into it.

inancgumus · 2022-02-04T10:24:47Z

Hhmm though we might be able to get the metric for some httpmultibin endpoint and assert on that?

It sounds like a plan!

common/response.go

imiric · 2022-02-04T11:09:54Z

I'd like to separate k6 concerns from the browser concerns as much as we can.

I think that makes sense, but this function is not the place to do it. All of these metrics are "k6 metrics". The custom ones we emit is mostly done from onPageLifecycle().

inancgumus · 2022-02-04T11:12:18Z

I don't mean that but never mind, no big deal.

imiric · 2022-02-04T11:43:34Z

@inancgumus A test for this would be similar to e6db54c, and we decided to revert that because of waitFor* issues (#200). So let's wait until that is fixed to add a test for both #40 and this. It would likely be the same test, actually.

Let me know if there's anything pending here, otherwise let's merge it.

(Sorry for the last force push... It was only for adding a test in f29064e.)

inancgumus · 2022-02-04T11:51:51Z

Fair enough, let's merge it 🚀 Maybe create an issue for testing both, so we don't forget? And maybe add a note next to grafana/k6#4469.

imiric requested review from inancgumus and robingustafsson November 16, 2021 12:56

imiric force-pushed the fix/90-nil-request-frame branch from 46d16b5 to e6d688f Compare November 16, 2021 16:23

imiric changed the base branch from main to fix/109-panic-nil-networkidlectx November 16, 2021 16:23

robingustafsson approved these changes Nov 17, 2021

View reviewed changes

inancgumus suggested changes Nov 18, 2021

View reviewed changes

common/response.go Outdated Show resolved Hide resolved

common/response.go Outdated Show resolved Hide resolved

common/response.go Outdated Show resolved Hide resolved

imiric force-pushed the fix/109-panic-nil-networkidlectx branch 2 times, most recently from 2f50e4d to fa39ea5 Compare November 18, 2021 13:51

Base automatically changed from fix/109-panic-nil-networkidlectx to main November 18, 2021 14:10

imiric mentioned this pull request Nov 18, 2021

Fix data race on Response.body #124

Merged

imiric force-pushed the fix/90-nil-request-frame branch from e6d688f to 8718977 Compare November 18, 2021 14:42

imiric mentioned this pull request Nov 18, 2021

Request.frame is sometimes nil #90

Closed

imiric force-pushed the fix/90-nil-request-frame branch from 8718977 to 405ee49 Compare December 24, 2021 16:03

imiric requested a review from inancgumus December 24, 2021 16:27

inancgumus assigned imiric Jan 3, 2022

imiric mentioned this pull request Jan 14, 2022

Negative browser_first_meaningful_paint metric #191

Closed

imiric force-pushed the fix/90-nil-request-frame branch from 405ee49 to a9d9868 Compare January 14, 2022 11:19

imiric marked this pull request as ready for review January 14, 2022 12:01

Ivan Mirić added 3 commits February 4, 2022 10:16

Remove relying on Content-Length header for determining body size

7a2858d

Ideally we shouldn't need to do this, and with the previous change of emitting response metrics on LoadingFinished, fetchBody() is more reliable.

Skip redirect responses when returning response body size

0013d43

imiric force-pushed the fix/90-nil-request-frame branch from 650cc5f to 3295ede Compare February 4, 2022 09:18

inancgumus approved these changes Feb 4, 2022

View reviewed changes

common/network_manager.go Outdated Show resolved Hide resolved

common/response.go Show resolved Hide resolved

common/network_manager.go Outdated Show resolved Hide resolved

common/network_manager.go Outdated Show resolved Hide resolved

common/response.go Outdated Show resolved Hide resolved

imiric force-pushed the fix/90-nil-request-frame branch from 3295ede to 3369bfc Compare February 4, 2022 11:01

inancgumus reviewed Feb 4, 2022

View reviewed changes

common/response.go Outdated Show resolved Hide resolved

Ivan Mirić added 2 commits February 4, 2022 12:05

Enhance NetworkManager and Response logging

b26247d

Fallback to default values for Response metrics if Response is nil

f47e144

imiric force-pushed the fix/90-nil-request-frame branch from 3369bfc to 49ae1bd Compare February 4, 2022 11:05

imiric force-pushed the fix/90-nil-request-frame branch from 49ae1bd to 286eafa Compare February 4, 2022 11:17

Ignore blob URLs for emitting response metrics

f29064e

imiric force-pushed the fix/90-nil-request-frame branch from 286eafa to f29064e Compare February 4, 2022 11:42

imiric merged commit 136e7f6 into main Feb 4, 2022

imiric deleted the fix/90-nil-request-frame branch February 4, 2022 11:58

imiric mentioned this pull request Feb 4, 2022

waitForXXX are not working grafana/k6#4469

Closed

imiric mentioned this pull request Apr 1, 2022

Fix negative metric emission #288

Merged

More reliable response body size and data_received metric emission #117

More reliable response body size and data_received metric emission #117

Uh oh!

Conversation

imiric commented Nov 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

robingustafsson left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

inancgumus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

imiric commented Dec 24, 2021

Uh oh!

inancgumus commented Dec 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

imiric commented Jan 13, 2022

Uh oh!

inancgumus commented Jan 13, 2022

Uh oh!

imiric commented Jan 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

imiric commented Feb 4, 2022

Uh oh!

imiric commented Feb 4, 2022

Uh oh!

inancgumus commented Feb 4, 2022

Uh oh!

imiric commented Feb 4, 2022

Uh oh!

inancgumus left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

inancgumus commented Feb 4, 2022

Uh oh!

inancgumus commented Feb 4, 2022

Uh oh!

imiric commented Feb 4, 2022

Uh oh!

inancgumus commented Feb 4, 2022

Uh oh!

Uh oh!

imiric commented Feb 4, 2022

Uh oh!

inancgumus commented Feb 4, 2022

Uh oh!

imiric commented Feb 4, 2022

Uh oh!

inancgumus commented Feb 4, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

More reliable response body size and `data_received` metric emission #117

More reliable response body size and `data_received` metric emission #117

imiric commented Nov 16, 2021 •

edited

Loading

robingustafsson left a comment •

edited

Loading

inancgumus commented Dec 29, 2021 •

edited

Loading

imiric commented Jan 14, 2022 •

edited

Loading

inancgumus left a comment •

edited

Loading