

@mumoshu
Collaborator

@mumoshu mumoshu commented Jul 4, 2021

This is a POC of a GitHub Webhook Delivery Forwarder, for demonstration purposes.

GitHub has recently released the Webhook Deliveries API. It is usually used to browse historical webhook events on a repository or organization webhook and to trigger redeliveries of those events via the API.

This POC exploits the API to poll historical webhook events and then forward unseen events as webhook requests to the target URL. The end result is that you can use actions-runner-controller's webhook-based autoscaler without exposing the webhook server to the Internet.

| GitHub | <----Poll----| This Forwarder |----HTTP POST----> | Webhook-based Autoscaler |----Update----> HRA, RunnerDeployment
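The poll-and-forward flow in the diagram can be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not the code in this pull request: `DeliverySource`, `Forwarder`, `fakeSource`, and `runDemo` are hypothetical names, `DeliverySource` stands in for the "list hook deliveries" and "get hook delivery" API calls, seen delivery IDs are kept only in memory, and the forwarded request sets the `X-GitHub-Event` header on the assumption that the receiver dispatches on it, as GitHub webhook receivers typically do.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Delivery is a hypothetical, minimal view of one webhook delivery.
type Delivery struct {
	ID      int64
	Event   string // e.g. "check_run" or "push"
	Payload []byte // original JSON request body
}

// DeliverySource stands in for the "list hook deliveries" and "get hook delivery" calls.
type DeliverySource interface {
	ListDeliveryIDs() ([]int64, error)
	GetDelivery(id int64) (Delivery, error)
}

// Forwarder re-sends every not-yet-seen delivery to Target as an HTTP POST.
type Forwarder struct {
	Source DeliverySource
	Target string
	Seen   map[int64]bool // must be non-nil; kept in memory only, so lost on restart
}

// PollOnce runs one polling round and reports how many deliveries it forwarded.
func (f *Forwarder) PollOnce() (int, error) {
	ids, err := f.Source.ListDeliveryIDs()
	if err != nil {
		return 0, err
	}
	forwarded := 0
	for _, id := range ids {
		if f.Seen[id] {
			continue // already forwarded in an earlier round
		}
		d, err := f.Source.GetDelivery(id)
		if err != nil {
			return forwarded, err
		}
		req, err := http.NewRequest(http.MethodPost, f.Target, bytes.NewReader(d.Payload))
		if err != nil {
			return forwarded, err
		}
		req.Header.Set("Content-Type", "application/json")
		req.Header.Set("X-GitHub-Event", d.Event)
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return forwarded, err
		}
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			return forwarded, fmt.Errorf("delivery %d: target returned %s", id, resp.Status)
		}
		f.Seen[id] = true // mark as seen only after a successful forward
		forwarded++
	}
	return forwarded, nil
}

type fakeSource []Delivery // in-memory DeliverySource for the demo
func (s fakeSource) ListDeliveryIDs() ([]int64, error) {
	ids := make([]int64, len(s))
	for i, d := range s {
		ids[i] = d.ID
	}
	return ids, nil
}

func (s fakeSource) GetDelivery(id int64) (Delivery, error) {
	for _, d := range s {
		if d.ID == id {
			return d, nil
		}
	}
	return Delivery{}, fmt.Errorf("delivery %d not found", id)
}

// runDemo polls twice against a local server; returns per-round counts and received events.
func runDemo() (int, int, []string) {
	var received []string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		received = append(received, r.Header.Get("X-GitHub-Event")) // implicit 200 OK
	}))
	f := &Forwarder{Target: srv.URL, Seen: map[int64]bool{}, Source: fakeSource{
		{ID: 1, Event: "check_run", Payload: []byte(`{"action":"created"}`)},
		{ID: 2, Event: "push", Payload: []byte(`{}`)},
	}}
	first, _ := f.PollOnce()
	second, _ := f.PollOnce() // nothing new, so nothing is forwarded
	srv.Close()               // waits for in-flight handlers before received is read
	return first, second, received
}

func main() { fmt.Println(runDemo()) }
```

In the demo, the first round forwards both deliveries and the second forwards nothing, because both IDs are already marked as seen. A real forwarder would call `PollOnce` on a timer and would need to persist the seen set to avoid re-forwarding everything after a restart.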

This is important to me, as one of my original motivations for using self-hosted runners was to avoid exposing a webhook server to the Internet just to trigger CI builds.

actions-runner-controller and self-hosted runners did allow me to trigger CI builds without exposing a server. But if you chose the webhook-based autoscaler for faster autoscaling, you had to expose another one: the autoscaler's own webhook server. With this feature, you get both without exposing any server to the Internet at all.

Usage

To forward webhook events that occur on a repository (for example, actions-runner-controller/mumoshu-actions-test), run it like this:

go run ./pkg/githubwebhookdeliveryforwarder/cmd \
  --rule '{"from":["githuborg1/repo1","githuborg2"],"to":"http://foo-github-webhook-server.actions-runner-system.svc.cluster.local:80/","hook":{"config":{"url":"https://blackhole.valid_domain_you_own.com/"},"events":["check_run","push"]}}'

Given the above config, the forwarder first looks for a repository hook already installed on the repository githuborg1/repo1. If none is found, it creates one from the config specified under the hook field, in this case {"config":{"url":"https://blackhole.valid_domain_you_own.com/"},"events":["check_run","push"]}. The config field in the hook corresponds to the same field in the [create repository hook API](https://docs.github.com/en/rest/reference/repos#create-a-repository-webhook). You usually only need to specify the config `url` and `events`.
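To make the `--rule` format concrete, here is a hedged Go sketch that decodes the example rule and applies the find-or-create step described above. The type and function names (`Rule`, `Hook`, `HookConfig`, `memStore`, `ensureHook`, `kind`, `parseRule`) are hypothetical, `memStore` is an in-memory stand-in for the GitHub hooks API, and two behaviors are assumptions rather than documented behavior of this PR: an existing hook counts as a match when its config `url` equals the rule's, and a `from` entry without a slash is treated as an organization.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// HookConfig models only the config field the example uses; the create
// webhook API accepts more (e.g. content_type, secret), ignored here.
type HookConfig struct {
	URL string `json:"url"`
}

// Hook mirrors the "hook" object in the rule: config plus subscribed events.
type Hook struct {
	Config HookConfig `json:"config"`
	Events []string   `json:"events"`
}

// Rule mirrors the JSON passed to --rule in the example above.
type Rule struct {
	From []string `json:"from"` // "owner/repo" or "org"
	To   string   `json:"to"`   // URL deliveries are forwarded to
	Hook Hook     `json:"hook"`
}

func parseRule(s string) (Rule, error) {
	var r Rule
	if err := json.Unmarshal([]byte(s), &r); err != nil {
		return Rule{}, fmt.Errorf("invalid --rule: %w", err)
	}
	return r, nil
}

// kind guesses the hook type (assumption: a slash means a repository).
func kind(target string) string {
	if strings.Contains(target, "/") {
		return "repository"
	}
	return "organization"
}

// memStore is a hypothetical in-memory stand-in for the hooks API,
// keyed by repository or organization name.
type memStore map[string][]Hook

// ensureHook reuses a hook with the same config URL (assumed matching rule)
// and reports whether it had to create a new one.
func (m memStore) ensureHook(target string, want Hook) bool {
	for _, h := range m[target] {
		if h.Config.URL == want.Config.URL {
			return false
		}
	}
	m[target] = append(m[target], want)
	return true
}

const exampleRule = `{"from":["githuborg1/repo1","githuborg2"],"to":"http://foo-github-webhook-server.actions-runner-system.svc.cluster.local:80/","hook":{"config":{"url":"https://blackhole.valid_domain_you_own.com/"},"events":["check_run","push"]}}`

func main() {
	r, err := parseRule(exampleRule)
	if err != nil {
		panic(err)
	}
	store := memStore{}
	for round := 1; round <= 2; round++ {
		for _, target := range r.From {
			created := store.ensureHook(target, r.Hook)
			fmt.Printf("round %d: %s hook on %s created=%v\n", round, kind(target), target, created)
		}
	}
}
```

Running it creates both hooks in the first round and reuses them in the second, so restarting the forwarder would not pile up duplicate hooks.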

Note that this feature has one limitation: you still need a "dummy" webhook configuration on GitHub. Without one, GitHub never attempts any webhook deliveries, so there is nothing for the forwarder to poll.

As explained above, you can specify the hook config via the --rule flag to let the forwarder create the webhook configuration on your behalf. You still need a valid config url in it, though, and it is recommended to point it at a blackhole or logging-only HTTP server that returns 200 for webhook requests sent from GitHub. Again, this is required so that GitHub attempts to deliver webhook requests in the first place.

In theory, the webhook configuration can point to any valid HTTP endpoint. But I strongly suggest running a very fast HTTP server on a subdomain of your own.

As the phrase "blackhole or logging-only" suggests, the HTTP server doesn't need to "process" requests; all it needs to do is return HTTP responses to webhook requests. Using a dedicated server also keeps this traffic from needlessly stress-testing your other production servers.

Implementation Notes

This pull request doesn't include the chart and kustomize config changes needed to actually deploy it as a Kubernetes Deployment, but they should be relatively easy to add.

Note that this POC currently depends on a fork of go-github v36 that can be found at mumoshu/go-github@b399073 (and later mumoshu/go-github@0a3739c). It is a temporary fork and I'm going to submit a pull request to go-github soon.

Lastly, this could eventually become a dedicated project outside of actions-runner-controller. But I wanted to start it as a companion project so that we can avoid over-abstraction and leaky abstractions, and deliver value to actions-runner-controller users sooner.

@sledigabel
Contributor

Hi @mumoshu,

This looks super interesting!
I have a few questions around this just to make sure I understand.

  1. To make it fairly responsive (with a user experience similar to the webhook server), you'd need frequent polling. Won't this cause the API calls to get throttled pretty quickly?
  2. How is this different from the "classic" polling of the job queue? From a conceptual point of view, it would still poll and create runners based on job requests?

Rather than a replacement, I see this work as complementary to the webhook server: basically a safety net ensuring that every webhook gets handled, whether in real time or deferred. Combined with the autoscaler policy based on %busy + job queue, there shouldn't be any job left behind with this?

I'm sure there's something I'm missing here :-) I'd love to be in a position where no webhook server is exposed and we still get a good responsiveness overall in the autoscaling process.

@mumoshu
Collaborator Author

mumoshu commented Jul 12, 2021

@sledigabel Hey! Thanks for your comments.

> To make it fairly responsive (with a user experience similar to the webhook server), you'd need frequent polling. Won't this cause the API calls to get throttled pretty quickly?

I think you're mostly right. It depends on your load.

Each call to the list hook deliveries API returns up to 100 deliveries, and each call to the get hook delivery API returns the payload of a single delivery. Assuming a GitHub API rate limit of 15000 calls per hour, you can theoretically scale up to roughly 15000 webhook events per hour per forwarder instance.
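The estimate above can be checked with a little arithmetic. If each delivery costs one get call and each list call returns up to 100 deliveries, forwarding E events costs E + ceil(E/100) calls, so the exact ceiling under a 15000 calls/hour budget is slightly below 15000. The Go sketch below also models idle polling: the function `maxEventsPerHour` and its `pollsPerHour` parameter are hypothetical, assuming one list call per polling round even when nothing new has arrived.

```go
package main

import "fmt"

// maxEventsPerHour returns the largest number of deliveries e that fit in an
// hourly budget of limit API calls, assuming:
//   - each delivery costs one "get hook delivery" call, and
//   - deliveries are listed pageSize at a time (one "list hook deliveries"
//     call per page), but at least pollsPerHour list calls are made anyway
//     because the forwarder polls on a fixed interval.
func maxEventsPerHour(limit, pageSize, pollsPerHour int) int {
	best := 0
	for e := 0; e <= limit; e++ {
		listCalls := (e + pageSize - 1) / pageSize // ceil(e / pageSize)
		if listCalls < pollsPerHour {
			listCalls = pollsPerHour
		}
		if e+listCalls > limit {
			break // total cost only grows with e, so no larger e fits
		}
		best = e
	}
	return best
}

func main() {
	fmt.Println(maxEventsPerHour(15000, 100, 0))    // ignoring idle polls
	fmt.Println(maxEventsPerHour(15000, 100, 720))  // polling every 5 seconds
	fmt.Println(maxEventsPerHour(15000, 100, 3600)) // polling every second
}
```

Under these assumptions it prints 14851, 14280, and 11400: polling once per second consumes 3600 of the 15000 calls before a single event is forwarded, which is the throttling trade-off raised in the question.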

A forwarder instance can receive and forward deliveries for multiple repository and organization hooks, but you can also run one instance per repository for horizontal scalability. In the end, your theoretical limit is approximately 15000 webhook events per hour per repository, assuming each instance has its own API rate limit budget. If that's too few, you can still expose a real webhook server to the Internet.

> How is this different from the "classic" polling of the job queue? From a conceptual point of view, it would still poll and create runners based on job requests?

Assuming the "classic" polling of the job queue means TotalNumberOfQueuedAndInProgressWorkflowRuns, it works only for repository-wide runners. In contrast, the webhook-based autoscaler works uniformly for both repository-wide and organizational runners. The biggest downside of the webhook-based autoscaler (to me) was the need to expose it to the Internet. The delivery forwarder solves that.

> Rather than replacing it, I see this work as very complementary to the webhook server, ensuring that every webhook has been handled, whether real time or deferred, more like a safety net basically.

I was thinking of this as a way to use the webhook-based autoscaler without exposing the server to the Internet :) But yes, it can be used to make it more resilient, too.

> On top of the autoscaler policy based on %busy + job queue, there shouldn't be any job left behind with this?

I believe so :)

@mumoshu
Collaborator Author

mumoshu commented Jul 14, 2021

The only side effect of adding this to actions-runner-controller is the upgrade to a new version of go-github, which involves some import path changes.

I'd like to merge this now to avoid potential conflicts with future pull requests.
I named the webhook delivery forwarder's package hookdeliveryforwarder. The name isn't stable yet. It should work, but it isn't supported. Use at your own risk!

Please read pkg/hookdeliveryforwarder/README.md for more information.

@mumoshu mumoshu merged commit f858e2e into master Jul 14, 2021
@mumoshu mumoshu deleted the github-webhook-delivery-forwarder-poc branch July 14, 2021 01:18

3 participants