Skip to content
This repository was archived by the owner on Jan 30, 2025. It is now read-only.

Conversation

imiric
Copy link
Contributor

@imiric imiric commented Nov 16, 2021

This is related to #90, but doesn't really fix the issue as the current behavior is correct. See this comment.

Essentially Frame being nil is a valid state for responses to preflight/OPTIONS requests, so the fix in b2052bb is all we need to do for handling this case.

What this PR does is:

  • move the data_received metric emission to the LoadingFinished event handler, which according to this comment is more reliable. This results in more accurate data_received metric values.

    Before, on main (54cd293) with the script from this comment:

    data_received....................: 17 MB  737 kB/s
    

    After, on a9d9868:

    data_received....................: 39 MB  754 kB/s
    

    I confirmed this is more accurate by comparing with the Chrome DevTools Network tab.

    This change also produces less error fetching response body warning events, though it doesn't eliminate them entirely, except when running the test suite.

  • enhance the logging so that we can see the type of request when Frame is nil. In all cases I've seen with the script from Tom these are preflight/OPTIONS requests which don't have an associated FrameID.

@imiric imiric force-pushed the fix/90-nil-request-frame branch from 46d16b5 to e6d688f Compare November 16, 2021 16:23
@imiric imiric changed the base branch from main to fix/109-panic-nil-networkidlectx November 16, 2021 16:23
Copy link
Collaborator

@robingustafsson robingustafsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'll let @inancgumus comment on the synchronization part :)

Copy link
Collaborator

@inancgumus inancgumus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, a lot of work, good job. Minor requests.

@imiric imiric force-pushed the fix/109-panic-nil-networkidlectx branch 2 times, most recently from 2f50e4d to fa39ea5 Compare November 18, 2021 13:51
Base automatically changed from fix/109-panic-nil-networkidlectx to main November 18, 2021 14:10
@imiric imiric force-pushed the fix/90-nil-request-frame branch from e6d688f to 8718977 Compare November 18, 2021 14:42
@imiric imiric force-pushed the fix/90-nil-request-frame branch from 8718977 to 405ee49 Compare December 24, 2021 16:03
@imiric
Copy link
Contributor Author

imiric commented Dec 24, 2021

After some more research into this, for a reason I'm unable to explain, we just don't receive a ResponseReceived CDP event for some requests. 😕

We get RequestWillBeSent and LoadingFinished, but between those we should also get ResponseReceived, which happens for most request flows. This is not only the case for preflight/OPTIONS requests, but some standard GET or POST requests as well. I haven't noticed a pattern where it happens for certain requests or URLs only; it can basically happen at any point with the Vistaprint script.

So it's not a matter of synchronizing how events are processed (thankfully! 😅), but the fact we just don't get the ResponseReceived event.

I did a hacky fallback so that we at least don't panic when this happens, but metrics for these requests will be less useful.

I'll pick this back up when I'm back from vacation, but in the meantime @inancgumus, if you have some time, please test the Vistaprint script and any others you have on this branch and compare it to main. From my tests error fetching response body is much less frequent (it's not printed anymore in tests when running with go test -v), and the data_received metric seems more accurate now (I compared it with the DevTools Network tab), though it's probably not 100% accurate.

@imiric imiric requested a review from inancgumus December 24, 2021 16:27
@inancgumus
Copy link
Collaborator

inancgumus commented Dec 29, 2021

From my tests, error fetching response body is much less frequent

It was displayed 16 times on the main branch:

# main
$ go test ./tests -v -count=1 |& grep 'error fetching response body' | wc -l
16

I'm no longer observing these error log messages on this branch:

# fix/90-nil-request-frame
$ go test ./tests -v -count=1 |& grep 'error fetching response body' | wc -l
0

Using Tom's original script, I see the errors three times on the main branch:

# main
$ XK6_BROWSER_LOG=debug xk6 run -q ../tests/tom.js |& grep 'error fetching response body' | wc -l
3

However, I see them twice on this branch:

# fix/90-nil-request-frame
$ XK6_BROWSER_LOG=debug xk6 run -q ../tests/tom.js |& grep 'error fetching response body' | wc -l
2

I wrote an improved version of Tom's script that reaches almost to the checkout page:

Here's the modified script.
import { sleep, group } from 'k6';
import launcher from 'k6/x/browser';

import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.1.0/index.js';

export const options = {};

let browser, context, page;

const pauseMin = 0;
const pauseMax = 1;

export default function() {
  initialize();

  navigateToHomepage();
  clickBusinessCards();
  clickBrowseDesigns();
  clickDesign();
  clickCustomizeDesign();
  clickNext();
  continueWithoutBack();
  customerReviewed();
  addToCart();
  continuePastRecommendations();
  continueToCart();
  goToCheckout();
}

function initialize() {
  browser = launcher.launch('chromium', {
    args: [],
    debug: false,
    devtools: false,
    env: {},
    executablePath: null,
    headless: false,
    ignoreDefaultArgs: [],
    proxy: {},
    slowMo: '500ms',
    timeout: '30s',
  });

  context = browser.newContext({
    ignoreHTTPSErrors: true,
    screen: {width: 1920, height: 1080},
    viewport: {width: 1920, height: 1080},
  });

  page = context.newPage();
}

function navigateToHomepage() {
  console.log('Starting navigateToHomepage...');

  group("Navigate to Homepage", () => {
    page.goto('https://vistaprint.ca', { waitUntil: 'networkidle' });
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickBusinessCards() {
  console.log('Starting clickBusinessCards...');

  group("Click Business Cards", () => {
    page.click("//span[text()='Business Cards']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickBrowseDesigns() {
  console.log('Starting clickBrowseDesigns...');

  group("Click Browse Designs", () => {
    page.waitForSelector("//a[text()='Browse designs']");
    page.click("//a[text()='Browse designs']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickDesign() {
  console.log('Starting clickDesign...');

  const designOption = page.waitForSelector(`li.design-tile >> nth=0`);
  console.log(designOption.textContent());
  
  group("Select Design Option", () => {
    page.click(`li.design-tile >> nth=0`)
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickCustomizeDesign() {
  console.log('Starting clickCustomizeDesign...');

  
  group("Click Customize Design", () => {
    page.click("//a[text()='Customize this design']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function clickNext() {
  console.log('Starting clickNext...');

  group("Click Next", () => {
    page.click("//button[text()='Next']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function continueWithoutBack() {
  console.log('Starting continueWithoutBack...');

  group("Click Continue Without Back", () => {
    page.click("//button[text()='Continue without back']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function customerReviewed() {
  console.log('Starting customerReviewed...');

  page.click("//input[@value='customerReviewed']");

  sleep(5);
  
  group("Click Customer Reviewed", () => {
    page.click("//button[text()='Continue']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function addToCart() {
  console.log('Starting addToCart...');

  group("Click Add to Cart", () => {
    page.click("//button[text()='Add to cart']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function continuePastRecommendations() {
  console.log('Starting continuePastRecommendations...');

  group("Click Continue (Recommendations)", () => {
    page.waitForSelector("//a[text()='Continue']");
    page.click("//a[text()='Continue']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function continueToCart() {
  console.log('Starting continueToCart...');

  group("Click Continue to Cart", () => {
    page.click("//a[text()='Continue to cart']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

function goToCheckout() {
  console.log('Starting goToCheckout...');

  group("Click Checkout", () => {
    page.click("//a[text()='Checkout']");
  });

  sleep(randomIntBetween(pauseMin, pauseMax));
}

@imiric
Copy link
Contributor Author

imiric commented Jan 13, 2022

@inancgumus It's difficult to judge the improvement based on that Vistaprint script since it's a very dynamic site and we use random sleeps in the script you're testing with, so the amount of requests will be different each time. Moreover, I see the error fetching response body message for blob URLs (e.g. blob:https://www.vistaprint.ca/1a44f750-8ddd-4d3b-a454-8f1ee270946e), which we should probably ignore for determining body size like we do for data URLs, since they're internal URLs used only in the browser.

This PR isn't meant to completely eliminate the error fetching response body message, caused by an underlying No data found for resource with given identifier (-32000) error, for which we don't have a fix yet. What it does is move response metric emission to the LoadingFinished event handler, which results in more accurate data_received metrics (you can confirm this within Chrome DevTools, or by packet capture).

E.g. on main running that example script:

data_received....................: 17 MB  737 kB/s

And on this branch rebased on current main:

data_received....................: 39 MB  754 kB/s

That said, it's surprising that you get a ~4x increase of the error on this branch, or that you have that many to begin with. My difference is +/- ~5, and usually not more than 10 in total on either branch.

I'll run some more tests and add ignoring of blob URLs.

@inancgumus
Copy link
Collaborator

It's difficult to judge the improvement based on that Vistaprint script since it's a very dynamic site and we use random sleeps in the script you're testing with, so the amount of requests will be different each time.
That said, it's surprising that you get a ~4x increase of the error on this branch, or that you have that many to begin with. My difference is +/- ~5, and usually not more than 10 in total on either branch.

I agree 👍 I was measuring this in an earlier version of the main branch, things probably have changed since then 🤷

we should probably ignore for determining body size like we do for data URLs, since they're internal URLs used only in the browser

👍

@imiric
Copy link
Contributor Author

imiric commented Jan 14, 2022

@inancgumus This is ready for review. Please test again when compared to main.

Like I mentioned above, comparing the amount of error fetching response body messages is unreliable, since the Vistaprint test script is so dynamic. For example, sometimes running it on main would fail after ~30s with the error cannot call function on expression and would finish at the Click Add to Cart stage, while running it on this branch would run for longer (~1m), but still fail at the Click Continue (Recommendations) stage, so it makes sense that more error messages would be produced. I also reverted the removal of checking Content-Length which should make us rely less on CDP if the header exists.

Instead of focusing on the message count, you can confirm that data_received reporting is more accurate now, and that the message is gone when running the test suite.

I'm frankly unsure if we should merge this as it's difficult to judge the improvement or whether it's not breaking anything else, but from a metrics standpoint it seems like an improvement.

@imiric imiric marked this pull request as ready for review January 14, 2022 12:01
@imiric
Copy link
Contributor Author

imiric commented Feb 4, 2022

@inancgumus WDYT about merging this?

Ivan Mirić added 3 commits February 4, 2022 10:16
See ChromeDevTools/devtools-protocol#44 (comment)

There is a race condition here where sometimes we receive the LoadingFinished
event before we finish processing ResponseReceived, so we don't set the Response
on the Request and req.response is nil in onLoadingFinished resulting in a nil
pointer panic, but I wasn't able to synchronize this properly.
Ideally we shouldn't need to do this, and with the previous change of
emitting response metrics on LoadingFinished, fetchBody() is more
reliable.
@imiric imiric force-pushed the fix/90-nil-request-frame branch from 650cc5f to 3295ede Compare February 4, 2022 09:18
@imiric
Copy link
Contributor Author

imiric commented Feb 4, 2022

I pushed a rebase on main.

A smaller test that shows the improvement is the fillform.js example I've been testing for grafana/k6#4469.

When running on current main (0f2ba32):

$ XK6_BROWSER_LOG=info XK6_HEADLESS=1 xk6 run examples/fillform.js
[...]
WARN[0001] error fetching response body for 'https://test.k6.io/': No data found for resource with given identifier (-32000)  category=cdp elapsed="0 ms" goroutine=75
ERRO[0003] unknown command or event "Page.frameScheduledNavigation"
ERRO[0003] unknown command or event "Page.frameClearedScheduledNavigation"
WARN[0003] error fetching response body for 'https://test.k6.io/my_messages.php': No data found for resource with given identifier (-32000)  category=cdp elapsed="0 ms" goroutine=75
ERRO[0007] unknown command or event "Page.frameScheduledNavigation"
ERRO[0007] unknown command or event "Page.frameClearedScheduledNavigation"
WARN[0007] error fetching response body for 'https://test.k6.io/login.php': No data found for resource with given identifier (-32000)  category=cdp elapsed="0 ms" goroutine=75

And on this branch (3295ede):

$ XK6_BROWSER_LOG=info XK6_HEADLESS=1 xk6 run examples/fillform.js
[...]
ERRO[0003] unknown command or event "Page.frameScheduledNavigation"
ERRO[0003] unknown command or event "Page.frameClearedScheduledNavigation"
ERRO[0007] unknown command or event "Page.frameScheduledNavigation"
ERRO[0007] unknown command or event "Page.frameClearedScheduledNavigation"

@inancgumus
Copy link
Collaborator

OK let me review this today.

@imiric
Copy link
Contributor Author

imiric commented Feb 4, 2022

And to confirm that the data_received metric is now more accurate, here's a simple example script:

// simple.js
import launcher from 'k6/x/browser';
import { sleep } from 'k6';

export default function() {
  const browser = launcher.launch('chromium', {
    headless: true,
  });
  const context = browser.newContext();
  const page = context.newPage();

  page.goto('https://test.k6.io/', { waitUntil: 'networkidle' });
  sleep(1);  // waitFor workaround

  page.close();
  browser.close();
}

Running this on main (0f2ba32) with XK6_BROWSER_LOG=info xk6 run -o json=out-main.json examples/simple.js and then filtering out the data_received samples:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url)' out-main.json
{"type":"Point","data":{"time":"2022-02-04T06:44:31.082861+01:00","value":174,"tags":{"from_cache":"false","from_prefetch_cache":"false","from_service_worker":"false","group":"","method":"GET","proto":"http/1.1","scenario":"default","status":"200","url":"https://test.k6.io/"}},"metric":"data_received"}
{"type":"Point","data":{"time":"2022-02-04T06:44:31.287852+01:00","value":5076,"tags":{"from_cache":"false","from_prefetch_cache":"false","from_service_worker":"false","group":"","method":"GET","proto":"http/1.1","scenario":"default","status":"200","url":"https://test.k6.io/static/css/site.css"}},"metric":"data_received"}
{"type":"Point","data":{"time":"2022-02-04T06:44:31.87362+01:00","value":10112,"tags":{"from_cache":"false","from_prefetch_cache":"false","from_service_worker":"false","group":"","method":"GET","proto":"http/1.1","scenario":"default","status":"200","url":"https://test.k6.io/static/js/prisms.js"}},"metric":"data_received"}

Only the sizes:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | {url: .data.tags.url, received: .data.value}' out-main.json
{"url":"https://test.k6.io/","received":174}
{"url":"https://test.k6.io/static/css/site.css","received":5076}
{"url":"https://test.k6.io/static/js/prisms.js","received":10112}

Summing up:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | .data.value' out-main.json | paste -sd+ | bc
15362

Comparing this to fix/90-nil-request-frame (3295ede) with XK6_BROWSER_LOG=info xk6 run -o json=out-117.json examples/simple.js:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | {url: .data.tags.url, received: .data.value}' out-117.json
{"url":"https://test.k6.io/","received":11277}
{"url":"https://test.k6.io/static/css/site.css","received":5076}
{"url":"https://test.k6.io/static/js/prisms.js","received":10112}

Sum:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | .data.value' out-117.json | paste -sd+ | bc
26465

To confirm which one is correct:

$ jq -c '. | select(.type=="Point" and .metric == "data_received" and .data.tags.url) | .data.tags.url' out-main.json | xargs -I{} sh -c 'echo {}; curl -is {} | wc --bytes'
https://test.k6.io/
11277
https://test.k6.io/static/css/site.css
5076
https://test.k6.io/static/js/prisms.js
10112

😄

Copy link
Collaborator

@inancgumus inancgumus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it's good. Glad that we can finally get rid of these ancient error messages!

But I'd like to separate k6 concerns from the browser concerns as much as we can.

sampleTags := k6stats.IntoSampleTags(&tags)
k6stats.PushIfNotDone(m.ctx, state.Samples, k6stats.ConnectedSamples{
	...
})

So, would you like to move this part of the code to another function like:

func (m *NetworkManager) emitResponseMetrics(resp *Response) {
	...
	emitK6Metrics(...)
	...
}

func emitK6Metrics(...) {
	sampleTags := k6stats.IntoSampleTags(&tags)
	k6stats.PushIfNotDone( ... )
	...
	k6stats.PushIfNotDone( ... )
	...
}

I know what you're going to say: "I thought you were against creating functions that are not used by multiple functions". I'm not against them as long as they are contextual and serve as a chance to make things better. I see it as a first step in refactoring and simplifying the extension code later.

@inancgumus
Copy link
Collaborator

Thanks for the short jq tutorial as well, nice work! 😄 👍

@inancgumus
Copy link
Collaborator

Btw, are we going to have a test for this?

@imiric
Copy link
Contributor Author

imiric commented Feb 4, 2022

I'm not sure how to test this TBH without an E2E test like that script.

Hhmm though we might be able to get the metric for some httpmultibin endpoint and assert on that?

I'll look into it.

@inancgumus
Copy link
Collaborator

Hhmm though we might be able to get the metric for some httpmultibin endpoint and assert on that?

It sounds like a plan!

@imiric imiric force-pushed the fix/90-nil-request-frame branch from 3295ede to 3369bfc Compare February 4, 2022 11:01
@imiric imiric force-pushed the fix/90-nil-request-frame branch from 3369bfc to 49ae1bd Compare February 4, 2022 11:05
@imiric
Copy link
Contributor Author

imiric commented Feb 4, 2022

I'd like to separate k6 concerns from the browser concerns as much as we can.

I think that makes sense, but this function is not the place to do it. All of these metrics are "k6 metrics". The custom ones we emit is mostly done from onPageLifecycle().

@inancgumus
Copy link
Collaborator

I don't mean that but never mind, no big deal.

@imiric imiric force-pushed the fix/90-nil-request-frame branch from 49ae1bd to 286eafa Compare February 4, 2022 11:17
@imiric imiric force-pushed the fix/90-nil-request-frame branch from 286eafa to f29064e Compare February 4, 2022 11:42
@imiric
Copy link
Contributor Author

imiric commented Feb 4, 2022

@inancgumus A test for this would be similar to e6db54c, and we decided to revert that because of waitFor* issues (#200). So let's wait until that is fixed to add a test for both #40 and this. It would likely be the same test, actually.

Let me know if there's anything pending here, otherwise let's merge it.

(Sorry for the last force push... It was only for adding a test in f29064e.)

@inancgumus
Copy link
Collaborator

Fair enough, let's merge it 🚀 Maybe create an issue for testing both, so we don't forget? And maybe add a note next to grafana/k6#4469.

@imiric imiric merged commit 136e7f6 into main Feb 4, 2022
@imiric imiric deleted the fix/90-nil-request-frame branch February 4, 2022 11:58
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants