Skip to content

Conversation

@Furisto
Copy link
Member

@Furisto Furisto commented Jul 9, 2022

Description

Introduce limiting of new connections created per minute. For more info see the RFC.

Related Issue(s)

Related to #10651

How to test

  • Open a workspace in the preview environment
  • Annotate the pod with gitpd.io/netConnLimit=true
  • This should add additonal nftable rules to the network namespace of the pod (ssh into the node and then use nsenter -t workspacekitPid -n and nft list ruleset)
  • Metrics can be obtained with kubectl port-forward ds/ws-daemon 9500:9500, then get the metrics with curl -XGET localhost:9500/metrics
  • Run roblox scanner

Release Notes

Limit the rate at which network connections can be made by a workspace 

Werft options:

  • /werft with-preview

rule := fmt.Sprintf("add rule ip gitpod ratelimit ip protocol tcp ct state new meter connmeter { ip daddr & 0.0.0.0 timeout 1m limit rate over %d/minute } counter drop", limit)

// the go library does not support meters, so use nft instead
cmd := exec.CommandContext(ctx, "/usr/sbin/nft", rule)
Copy link
Contributor

@aledbf aledbf Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the behavior when the version of nft on the host is different? (this is a known issue with k8s kubelet)

Edit: is kube-proxy not kubelet. Sorry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what kind of issues have been observed in the context of kubelet? I was not able to find anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aledbf Are you ok with merging this or would you like to discuss it further?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should merge this without knowing the behavior of different nftable versions and if there is any conflict with the CNI bandwidth plugin.

Did you test this change in an ephemeral cluster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace preview

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used nftables to replace slirp4net, if there is an issue with versions, would we have more fundamental issues with workspace networking?

@aledbf I like the idea of testing with an ephemeral cluster. Can you add a related test here, so we can have a repeatable way to do in the future?

@Furisto let's leave on-hold till we've had a chance to test in an ephemeral cluster (to test for impacts to CNI bandwidth plugin).

@Furisto Furisto marked this pull request as ready for review July 11, 2022 19:39
@Furisto Furisto requested review from a team July 11, 2022 19:39
@github-actions github-actions bot added team: delivery Issue belongs to the self-hosted team team: workspace Issue belongs to the Workspace team labels Jul 11, 2022
Copy link
Contributor

@sagor999 sagor999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
adding hold in case if @aledbf comment requires resolving prior to merging
feel free to remove it.

/hold

Copy link
Contributor

@jenting jenting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I would ❤️ to have the release note for this one.

@Furisto Furisto marked this pull request as draft July 17, 2022 18:10
rule := fmt.Sprintf("add rule ip gitpod ratelimit ip protocol tcp ct state new meter connmeter { ip daddr & 0.0.0.0 timeout 1m limit rate over %d/minute } counter drop", limit)

// the go library does not support meters, so use nft instead
cmd := exec.CommandContext(ctx, "/usr/sbin/nft", rule)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function runs before pivot_root, I believe it is safe to specify the nftable bin with the complete path at this stage. Am I right? If yes, please add a comment that why it is safe to run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that the user may be able to override this bin with a custom image.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not enter the mount namespace of the workspace, only the network namespace. The binary is part of ws-daemon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 Can I ask you to add a comment about that? Why? I think this is a very important security point that is difficult to see from the code.

func addConnectionLimitRule(limit int) error {
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()
rule := fmt.Sprintf("add rule ip gitpod ratelimit ip protocol tcp ct state new meter connmeter { ip daddr & 0.0.0.0 timeout 1m limit rate over %d/minute } counter drop", limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use set syntax instead of meter as official documentation says here?
https://wiki.nftables.org/wiki-nftables/index.php/Meters

Note that the meter keyword is obsolete, the dynamic set and map syntax is now preferred for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can. Thanks for the hint. I also will update this rule to use a named counter, so that we can scrape metrics about dropped packets more easily.

@roboquat roboquat added size/XL and removed size/L labels Jul 28, 2022
@stale
Copy link

stale bot commented Aug 10, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Copy link
Member

@geropl geropl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed ws-manager-api per code owners: change LGTM

@Furisto
Copy link
Member Author

Furisto commented Aug 23, 2022

/werft run

👍 started the job as gitpod-build-fo-connlimit.32
(with .werft/ from main)

@Furisto Furisto force-pushed the fo/connlimit branch 3 times, most recently from 5cd2b02 to 4f415dc Compare August 23, 2022 12:54
@Furisto
Copy link
Member Author

Furisto commented Aug 23, 2022

/werft run with-clean-slate-deployment

👍 started the job as gitpod-build-fo-connlimit.37
(with .werft/ from main)

@Furisto
Copy link
Member Author

Furisto commented Aug 23, 2022

@utam0k PTAL

Copy link
Contributor

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@Furisto
Copy link
Member Author

Furisto commented Aug 24, 2022

/unhold

@roboquat roboquat merged commit 0a9e6f7 into main Aug 24, 2022
@roboquat roboquat deleted the fo/connlimit branch August 24, 2022 12:14
@roboquat roboquat added the deployed: webapp Meta team change is running in production label Aug 29, 2022
@roboquat roboquat added the deployed: workspace Workspace team change is running in production label Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deployed: webapp Meta team change is running in production deployed: workspace Workspace team change is running in production feature: connection-limiting release-note size/XXL team: delivery Issue belongs to the self-hosted team team: webapp Issue belongs to the WebApp team team: workspace Issue belongs to the Workspace team

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

10 participants