@DesmonDay (Contributor) commented Oct 10, 2024

PR types

New features

PR changes

Others

Description

  1. Support sharding stage1 v2 for unified checkpoint.
  2. Refactor the unified checkpoint (UC) code.

@paddle-bot (bot) commented Oct 10, 2024

Thanks for your contribution!

@CLAassistant commented Oct 10, 2024

CLA assistant check
All committers have signed the CLA.

@codecov (bot) commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 11.13549% with 1620 lines in your changes missing coverage. Please review.

Project coverage is 52.84%. Comparing base (81ffc78) to head (dbd13df).
Report is 259 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/trainer/unified_checkpoint/utils.py | 12.07% | 364 Missing ⚠️ |
| ...p/trainer/unified_checkpoint/unified_checkpoint.py | 11.34% | 297 Missing ⚠️ |
| ...ddlenlp/trainer/unified_checkpoint/load_dynamic.py | 9.44% | 259 Missing ⚠️ |
| ...r/unified_checkpoint/sharding_split_param_utils.py | 7.97% | 173 Missing ⚠️ |
| ...nlp/trainer/unified_checkpoint/check_completion.py | 9.37% | 145 Missing ⚠️ |
| ...dlenlp/trainer/unified_checkpoint/async_handler.py | 11.32% | 141 Missing ⚠️ |
| paddlenlp/trainer/unified_checkpoint/load_local.py | 12.12% | 116 Missing ⚠️ |
| ...rainer/unified_checkpoint/load_save_single_card.py | 15.32% | 116 Missing ⚠️ |
| paddlenlp/utils/nested.py | 14.28% | 6 Missing ⚠️ |
| paddlenlp/trainer/training_args.py | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9240      +/-   ##
===========================================
+ Coverage    52.78%   52.84%   +0.06%     
===========================================
  Files          661      669       +8     
  Lines       106945   107240     +295     
===========================================
+ Hits         56450    56671     +221     
- Misses       50495    50569      +74     
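
The percentages in the coverage diff can be reproduced from the Hits and Misses rows. The snippet below is only a sketch of that arithmetic: `coverage_percent` is a hypothetical helper, and truncating (rather than rounding) to two decimals is an assumption chosen because it matches the figures printed in the report.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return hits / lines as a percentage, truncated to two decimals.

    Truncation is an assumption made to match the report above;
    it is not taken from Codecov's documentation.
    """
    if lines <= 0:
        raise ValueError("lines must be positive")
    # Integer arithmetic in hundredths of a percent avoids float drift.
    return (hits * 10000 // lines) / 100


# Figures copied from the coverage diff (develop vs. #9240).
base_hits, base_misses = 56450, 50495
head_hits, head_misses = 56671, 50569

# Hits + Misses should equal the Lines row for each side.
base_lines = base_hits + base_misses
head_lines = head_hits + head_misses

base = coverage_percent(base_hits, base_lines)
head = coverage_percent(head_hits, head_lines)

# Difference in hundredths of a percentage point.
delta_hundredths = (head_hits * 10000 // head_lines) - (base_hits * 10000 // base_lines)

print(f"base={base}% head={head}% delta=+{delta_hundredths / 100}%")
```

Both sums match the Lines row (106945 and 107240), so the table is internally consistent.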

@ZHUI ZHUI requested review from DrownFish19 and ZHUI October 11, 2024 06:42
return state_dict, shard_file, sharded_index


def load_unified_optimizer_split_param(args, model, optimizer, resume_from_checkpoint):
Collaborator

Isn't most of this function's logic similar to load_unified_optimizer_locally?

Contributor Author

This is still in early development; it will be revised later.

@DesmonDay DesmonDay changed the title from "[Unified Checkpoint] Add split param" to "[WIP][Unified Checkpoint] Add split param" Oct 12, 2024
@DesmonDay DesmonDay force-pushed the add_split_param branch 2 times, most recently from 3abfe71 to 9bce15b