Skip to content

Conversation

@m8vago
Copy link
Contributor

@m8vago m8vago commented Aug 23, 2023

Rework agent connection handling. From now on there are two types of tokens.

  • Install tokens: Agents receive this as the GRPC_TOKEN env. They exchange it on the first run to a connection token.
  • Connection tokens: These are the ones, which are used for regular connections.

During an update, agents gets a new token.

Also there is a workaround for legacy agents, whom are getting shutdown and removed by the newly updated agent, so they do not stuck in a crashloop.

@m8vago m8vago requested a review from a team as a code owner August 23, 2023 11:07
@m8vago m8vago marked this pull request as draft August 23, 2023 11:07
@m8vago m8vago force-pushed the feat/oneshot-agent-token branch 2 times, most recently from 74e1383 to 8fe6d45 Compare August 23, 2023 14:34
@m8vago m8vago force-pushed the feat/oneshot-agent-token branch from 8fe6d45 to 387dac6 Compare August 23, 2023 14:47
@github-actions github-actions bot added pr:feat source:agent The scope of the issue or pull request is agent. source:web The scope of the issue or pull request is web. lang:golang lang:typescript lang:sql source:proto Protobuf related changes. and removed pr:feat labels Aug 23, 2023
@m8vago m8vago marked this pull request as ready for review August 23, 2023 16:16
Copy link
Collaborator

@polaroi8d polaroi8d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not tested yet, add some questions and reviews. I started commenting on all Golang error handling but after a few I stopped, we start sentences with lowercase letters.

@polaroi8d
Copy link
Collaborator

When I attempted to update my currently running agent, I encountered a segmentation violation.

{"level":"debug","containerID":"15459d1e3593cbc01dde6e65fee6895589a71e156dc3ad8cba9215d8d88c1aa9","time":"2023-08-24T13:52:25Z","message":"Created new dagent"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xe9a3ae]

goroutine 290 [running]:
github.com/dyrector-io/dyrectorio/golang/pkg/dagent/update.RewriteUpdateErrors({0x0, 0x0})
	/__w/dyrectorio/dyrectorio/golang/pkg/dagent/update/update.go:214 +0x8e
github.com/dyrector-io/dyrectorio/golang/pkg/dagent/update.SelfUpdate({0x126fa70, 0xc0004215f0}, {0xc0007201d0, 0x6}, 0xa?)
	/__w/dyrectorio/dyrectorio/golang/pkg/dagent/update/update.go:116 +0x111
github.com/dyrector-io/dyrectorio/golang/internal/grpc.executeUpdate({0x126fa70, 0xc0004215f0}, 0xc00032e000?, 0xc0009022a0?)
	/__w/dyrectorio/dyrectorio/golang/internal/grpc/grpc.go:591 +0x69
created by github.com/dyrector-io/dyrectorio/golang/internal/grpc.grpcProcessCommand
	/__w/dyrectorio/dyrectorio/golang/internal/grpc/grpc.go:231 +0x66a

@m8vago
Copy link
Contributor Author

m8vago commented Aug 28, 2023

When I attempted to update my currently running agent, I encountered a segmentation violation.

{"level":"debug","containerID":"15459d1e3593cbc01dde6e65fee6895589a71e156dc3ad8cba9215d8d88c1aa9","time":"2023-08-24T13:52:25Z","message":"Created new dagent"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xe9a3ae]

goroutine 290 [running]:
github.com/dyrector-io/dyrectorio/golang/pkg/dagent/update.RewriteUpdateErrors({0x0, 0x0})
	/__w/dyrectorio/dyrectorio/golang/pkg/dagent/update/update.go:214 +0x8e
github.com/dyrector-io/dyrectorio/golang/pkg/dagent/update.SelfUpdate({0x126fa70, 0xc0004215f0}, {0xc0007201d0, 0x6}, 0xa?)
	/__w/dyrectorio/dyrectorio/golang/pkg/dagent/update/update.go:116 +0x111
github.com/dyrector-io/dyrectorio/golang/internal/grpc.executeUpdate({0x126fa70, 0xc0004215f0}, 0xc00032e000?, 0xc0009022a0?)
	/__w/dyrectorio/dyrectorio/golang/internal/grpc/grpc.go:591 +0x69
created by github.com/dyrector-io/dyrectorio/golang/internal/grpc.grpcProcessCommand
	/__w/dyrectorio/dyrectorio/golang/internal/grpc/grpc.go:231 +0x66a

I have found the problem. Looks like in func RewriteUpdateErrors() it is not handled, what happens when the err is nil. I guess the old agent got a shutdown message before it could run into that part of the code or something. I will look for a workaround on the crux's side.

@m8vago
Copy link
Contributor Author

m8vago commented Aug 29, 2023

@polaroi8d Pushed the fixes for the legacy agent update.
If you want to test it you should escape some lines in the agent.service.ts's agentVersionSupported() function. Specifically:

    if (this.configService.get<string>('NODE_ENV') !== PRODUCTION) {
      return true
    }

Also be aware of that the agents are force pulling, so they are going to overwrite your locally built image during the updating process. :(
To fix that i've copied the GRPC_TOKEN from the new container's envs and started it with docker run.

@codecov-commenter
Copy link

codecov-commenter commented Aug 30, 2023

Codecov Report

Patch coverage: 8.99% and project coverage change: -1.34% ⚠️

Comparison is base (5c2ff48) 25.46% compared to head (468e1b2) 24.13%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #796      +/-   ##
===========================================
- Coverage    25.46%   24.13%   -1.34%     
===========================================
  Files           60       62       +2     
  Lines         5709     5954     +245     
===========================================
- Hits          1454     1437      -17     
- Misses        4152     4411     +259     
- Partials       103      106       +3     
Files Changed Coverage Δ
golang/internal/config/secret.go 24.67% <0.00%> (-31.58%) ⬇️
golang/internal/dogger/dogger.go 0.00% <ø> (ø)
golang/internal/grpc/grpc.go 0.00% <0.00%> (ø)
golang/pkg/crane/k8s/secret.go 0.00% <0.00%> (ø)
golang/pkg/crane/k8s/secret_store.go 0.00% <0.00%> (ø)
golang/pkg/dagent/config/config.go 0.00% <0.00%> (ø)
golang/pkg/dagent/update/update.go 39.01% <13.15%> (-5.80%) ⬇️
golang/pkg/dagent/config/secret.go 37.17% <37.50%> (-62.83%) ⬇️
golang/internal/config/jwt.go 79.24% <100.00%> (-7.64%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@m8vago m8vago force-pushed the feat/oneshot-agent-token branch from 49258d5 to 6e761b0 Compare August 30, 2023 11:07
@m8vago m8vago force-pushed the feat/oneshot-agent-token branch from 6e761b0 to db6915c Compare August 30, 2023 11:08
Copy link
Contributor

@robot9706 robot9706 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏗

Copy link
Contributor

@nandor-magyar nandor-magyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted small things, otherwise looks good, tested in Kubernetes, it works.

@m8vago
Copy link
Contributor Author

m8vago commented Aug 31, 2023

Noted small things, otherwise looks good, tested in Kubernetes, it works.

Thanks.

@m8vago m8vago merged commit e129d59 into develop Sep 1, 2023
@m8vago m8vago deleted the feat/oneshot-agent-token branch September 1, 2023 09:17
chandhuDev pushed a commit to chandhuDev/dyrectorio that referenced this pull request Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lang:golang lang:sql lang:typescript pr:feat source:agent The scope of the issue or pull request is agent. source:proto Protobuf related changes. source:web The scope of the issue or pull request is web.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants