Problems when uploading files #141

@Kenny-Ch

During training, I found that wandb suddenly could not upload wandb-metadata.json. After training, I tried to upload the files with wandb sync and got the following errors.

wandb sync wandb/run-20240826_190835-g7b6iqjc/
Find logs at: /home/JIng/kenny/Project/personal_copilot/training/wandb/debug-cli.JIng.log
Syncing: http://localhost:8080/charly/personal-code-copilot/runs/g7b6iqjc ... wandb: ERROR Error uploading "code/train.py": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-metadata.json": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-summary.json": CommError, <Response [507]>
wandb: ERROR Error uploading "conda-environment.yaml": CommError, <Response [507]>
wandb: ERROR Error uploading "output.log": CommError, <Response [507]>
wandb: ERROR Error uploading "requirements.txt": CommError, <Response [507]>
wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>
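
As a side note, 507 is the standard HTTP status "Insufficient Storage", so a full disk on the server side is one possible cause, though that is only a guess. Below is a minimal sketch for grouping the failures above by status code and checking free space. The regex is written against the output shown here, and the path passed to `free_gib` is a placeholder, not a known wandb storage location.

```python
import re
import shutil
from http import HTTPStatus

# Matches entries such as:
#   wandb: ERROR Error uploading "output.log": CommError, <Response [507]>
# The pattern is an assumption based on the output pasted in this issue.
UPLOAD_ERROR_RE = re.compile(
    r'Error uploading "([^"]+)": CommError, <Response \[(\d{3})\]>'
)


def summarize_upload_errors(text):
    """Group failed file names by HTTP status code."""
    failures = {}
    for name, code in UPLOAD_ERROR_RE.findall(text):
        failures.setdefault(int(code), []).append(name)
    return failures


def describe_status(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


def free_gib(path):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    sample = (
        'wandb: ERROR Error uploading "output.log": CommError, <Response [507]>\n'
        'wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>\n'
    )
    for code, names in summarize_upload_errors(sample).items():
        print(code, describe_status(code), names)
    # "." is a placeholder; point this at the volume the server writes to.
    print(f"free space: {free_gib('.'):.1f} GiB")
```

For a local server running in a container, the free-space check would have to run against the volume the server actually writes to, which this sketch does not try to locate.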

I also get an error when running wandb verify:

Default host selected: http://localhost:8080
Find detailed logs for this test at: /tmp/tmp5033o82e/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................Traceback (most recent call last):
  File "/home/JIng/miniconda3/envs/starcode-3b/bin/wandb", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/cli/cli.py", line 2960, in verify
    url_success, url = wandb_verify.check_graphql_put(api, host)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/sdk/verify/verify.py", line 400, in check_graphql_put
    contents = read_file.read()
               ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'read'

Here are some error logs I found in /var/log:

./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:34:12.313204066Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058451284Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058625177Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 33:FlatRunsMigrator paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 33:FlatRunsMigrator paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:12.317314097Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:36:12.317093934Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:37:12.316296925Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:38:12.315714385Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla.log:{"level":"ERROR","time":"2024-08-29T05:32:51.071134593Z","info":{"program":"gorilla","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":59},"data":{"dd.service":"gorilla","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b","http":{"url":"http://192.168.104.9/oidc/auth","method":"GET","headers":{"Host":"192.168.104.9","Connection":"close","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0","Accept-Encoding":"gzip, deflate","Accept-Language":"zh,en-US;q=0.9,en;q=0.8","X-Original-Uri":"/system-admin/static/css/main.c9951160.css.map","X-Forwarded-For":"192.168.104.9"}}},"message":"Not logged in","dd.trace_id":"10464612527120353434","error":{"kind":"*errors.errorString","message":"Not logged in"}}
./mysql.log:2024-08-29T05:33:11.670654Z 27 [Note] Aborted connection 27 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670658Z 22 [Note] Aborted connection 22 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670773Z 25 [Note] Aborted connection 25 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670709Z 23 [Note] Aborted connection 23 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670743Z 21 [Note] Aborted connection 21 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670767Z 24 [Note] Aborted connection 24 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670656Z 28 [Note] Aborted connection 28 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670788Z 17 [Note] Aborted connection 17 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670797Z 26 [Note] Aborted connection 26 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670889Z 20 [Note] Aborted connection 20 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670895Z 19 [Note] Aborted connection 19 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670958Z 18 [Note] Aborted connection 18 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.768660Z 7 [Note] Aborted connection 7 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194361Z 15 [Note] Aborted connection 15 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194462Z 8 [Note] Aborted connection 8 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194516Z 11 [Note] Aborted connection 11 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194523Z 9 [Note] Aborted connection 9 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194478Z 13 [Note] Aborted connection 13 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
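
Most of the lines above repeat the same few messages, so a tally is easier to read than the raw dump. Here is a rough sketch, assuming each line has the `filename:payload` shape shown above (as produced by a recursive grep) and that the gorilla payloads are single-line JSON objects. The function name is made up for illustration, and non-JSON lines such as the mysql entries are skipped.

```python
import json
from collections import Counter


def count_error_messages(lines):
    """Count ERROR-level messages in 'filename:{json}' formatted log lines."""
    counts = Counter()
    for line in lines:
        # Split off the "./gorilla-glue.log" style prefix at the first colon.
        _, sep, payload = line.strip().partition(":")
        if not sep or not payload.startswith("{"):
            continue  # not a JSON record (for example, the mysql.log lines)
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # truncated or malformed record
        if record.get("level") == "ERROR":
            counts[record.get("message", "")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: pipe the grep output in on stdin.
    for message, n in count_error_messages(sys.stdin).most_common():
        print(f"{n:4d}  {message}")
```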

Here is the debug bundle:
debug.zip
