
[CI][Infra] B200 Smoke Test Periodic Job Success Depends on Docker Pull Time #163786

@nWEIdia


🐛 Describe the bug

Hi,

As stated in the title, on the B200 runner, and specifically for this periodic smoke test job, we are getting:

Error: The provided token has expired.

I retried past GitHub Actions runs. For example, https://github.com/pytorch/pytorch/actions/runs/17965956992/job/51106912669 passed on retry, while its initial attempt failed: https://github.com/pytorch/pytorch/actions/runs/17965956992/attempts/1

Note that the retry had to rerun the whole workflow, including the build. See the retry history of the initial GitHub Actions run:
https://github.com/pytorch/pytorch/actions/runs/17952091497/job/51081277758

What the failed GitHub Actions runs had in common was a Docker pull that took more than an hour. Could this be exceeding some token expiration time?
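To make the question concrete, here is a minimal Python sketch of the suspected failure mode. It assumes, hypothetically, that a single credential is issued when the job starts and stays valid for a fixed lifetime; the one-hour lifetime, the step names, and the helpers `first_expired_step` and `job` are illustrative assumptions, not taken from the actual workflow.

```python
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: one credential is minted when the job starts (t = 0 s) and stays
# valid for a fixed lifetime. One hour is a hypothetical value for illustration;
# it is not read from the PyTorch CI configuration.
ASSUMED_TOKEN_LIFETIME_S = 60 * 60


@dataclass
class Step:
    name: str
    duration_s: float
    needs_token: bool  # whether the step calls a service that validates the token


def first_expired_step(
    steps: List[Step], lifetime_s: float = ASSUMED_TOKEN_LIFETIME_S
) -> Optional[str]:
    """Return the first token-using step that starts after the token expired.

    Steps run back to back, so each step starts at the sum of the durations
    before it. A step that starts before expiry is treated as safe even if it
    runs past it (a simplification).
    """
    elapsed_s = 0.0
    for step in steps:
        if step.needs_token and elapsed_s >= lifetime_s:
            return step.name
        elapsed_s += step.duration_s
    return None


def job(pull_minutes: float) -> List[Step]:
    # Hypothetical job layout: only the docker pull duration varies.
    return [
        Step("login", 5, needs_token=True),
        Step("docker pull", pull_minutes * 60, needs_token=True),
        Step("smoke tests", 15 * 60, needs_token=False),
        Step("upload artifacts", 60, needs_token=True),
    ]


if __name__ == "__main__":
    print(first_expired_step(job(pull_minutes=20)))  # None
    print(first_expired_step(job(pull_minutes=70)))  # upload artifacts
```

Under this model, a 20-minute pull finishes well inside the window, while a 70-minute pull pushes every later token-using step past expiry. If the real failures fit this pattern, the error would surface in a step after the pull; that is a hypothesis to check against the job logs, not a confirmed diagnosis.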

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @pytorch/pytorch-dev-infra @ptrblck @eqy @tinglvv @atalman @huydhn @ZainRizvi @drisspg

Versions

TOT

Metadata

Labels

module: ci, module: docker, triaged

Projects

Status

Done
