create API jit::Module::deepcopy(device) #106521

jiayisuse · 2023-08-03T03:14:24Z

Summary:
Previously, we copied a meta merge and used it as a skeleton for device-to-device (d2d) merge replication. However, some models, such as prospector, have a CPU op, LongIndex, that takes a long time to load, which makes the meta merge copy expensive.

Modify jit::Module::deepcopy() to allow copying to a device. This simplifies user code and removes unnecessary copies such as the tempfile and the meta merge.
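
To illustrate the idea only, here is a toy sketch in plain Python of a deep copy that takes an optional target device. It is not the PyTorch implementation: `ToyTensor`, `ToyModule`, and the device strings are hypothetical stand-ins, and the assumption is simply that passing a device places every copied tensor on that device, while omitting it keeps each tensor where it was.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor: some values plus a device tag."""
    data: List[float]
    device: str = "cpu"

    def to(self, device: str) -> "ToyTensor":
        # Always returns a fresh copy of the values, tagged with `device`.
        return ToyTensor(list(self.data), device)


@dataclass
class ToyModule:
    """Hypothetical stand-in for a module tree with parameters and submodules."""
    params: Dict[str, ToyTensor] = field(default_factory=dict)
    children: Dict[str, "ToyModule"] = field(default_factory=dict)

    def deepcopy(self, device: Optional[str] = None) -> "ToyModule":
        # One pass over the tree: each tensor is copied straight to its
        # destination, with no intermediate skeleton or temporary file.
        copied = ToyModule()
        for name, tensor in self.params.items():
            target = tensor.device if device is None else device
            copied.params[name] = tensor.to(target)
        for name, child in self.children.items():
            copied.children[name] = child.deepcopy(device)
        return copied
```

In this sketch, `src.deepcopy("cuda:1")` returns an independent replica whose tensors are all tagged `cuda:1`, while `src.deepcopy()` behaves like an ordinary deep copy and leaves device tags unchanged.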

pytorch-bot · 2023-08-03T03:14:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106521

✅ No Failures

As of commit ce80834:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2023-08-03T03:15:49Z

This pull request was exported from Phabricator. Differential Revision: D47973149

facebook-github-bot · 2023-08-07T18:06:06Z

This pull request was exported from Phabricator. Differential Revision: D47973149

Summary:
Pull Request resolved: pytorch#106521

Previously, we copied a meta merge and used it as a skeleton for device-to-device (d2d) merge replication. However, some models, such as prospector, have a CPU op, LongIndex, that takes a long time to load, which makes the meta merge copy expensive.

Modify jit::Module::deepcopy() to allow copying to a device. This simplifies user code and removes unnecessary copies such as the tempfile and the meta merge.

Test Plan:

## 96GB Prospector

### Loading

**Before:**
I0802 09:34:25.094766 129758 ModelManagerBase.cpp:1088 req:00007f12f9e10920] Loaded 966767825_1382 in 348838 ms (48854 ms of IO) memory used 103091239504 byte(s)

**After:**
I0802 09:42:06.447378 3471149 ModelManagerBase.cpp:1101 req:00007fe771610920] Loaded 966767825_1382 in 215768 ms (33238 ms of IO) memory used 103085785104 byte(s)

### Accuracy

P802409613

### 600GB HSTU

962077327_11

Loaded 962077327_11 in 969158 ms (680962 ms of IO) memory used 721660906784 byte(s)

THRIFT_THREADS=30 SIGRID_MAX_RSS_SIZE_BYTES=751619276800 SUBMOD_TO_DEVICE='' PYTORCH_PREDICTOR_ENABLE_XL_FORMAT_V2=true PYTORCH_PREDICTOR_ENABLE_XL_FORMAT_V2_INPLACE_LOADING=true GIF_LOAD_LOCAL_NET=true RUN_D2H_TOGETHER_WITH_EXECUTION_STREAM=true EVENT_POOL_ENABLE_BLOCKING_SYNC=false ENABLE_DEPLOY_INPLACE_LOADING=true TGIF_REPLICATE_MERGE_BY_TEMPFILE=true USE_STATIC_PATH=1 USE_ALL_TO_ONE_OP=false MAX_NUM_ADS=10240 REQUEST_BATCHING_PARAM_OVERRIDE="max_batch_size|${MAX_NUM_ADS};batch_time_us|50000" NUM_DEPLOY_INTERPRETER=32 NUM_DESER_AND_REMOTE_RO_CPU_WORKERS=20 MODULE_NUM_WORKERS_PER_GPU='merge|8;remote|0' MODEL_ID=962077327 SNAPSHOT_ID=11 SERVER_PORT=7456 CUDA_VISIBLE_DEVICES_FOR_PREDICTOR="0,1,5,6" CPU_NUMA_NODES_FOR_PREDICTOR="0,2" ENABLE_THRIFT_WARMUP=false hpc/inference/scripts/gif/prospector/1_cards/launch_gpu_sigrid_predictor_task_0.sh

Differential Revision: D47973149

fbshipit-source-id: babbfdfff5d6785a74edc6fd79367341cde310de

facebook-github-bot · 2023-08-07T18:25:21Z

This pull request was exported from Phabricator. Differential Revision: D47973149

davidberard98

LGTM!

jiayisuse · 2023-08-07T21:50:46Z

@pytorchbot merge

pytorchmergebot · 2023-08-07T21:53:40Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Summary:
Previously, we copied a meta merge and used it as a skeleton for device-to-device (d2d) merge replication. However, some models, such as prospector, have a CPU op, LongIndex, that takes a long time to load, which makes the meta merge copy expensive.

Modify jit::Module::deepcopy() to allow copying to a device. This simplifies user code and removes unnecessary copies such as the tempfile and the meta merge.

Pull Request resolved: pytorch#106521
Approved by: https://github.com/davidberard98

pytorch-bot bot added the release notes: jit release notes category label Aug 3, 2023

facebook-github-bot added the fb-exported label Aug 3, 2023

jiayisuse requested review from houseroad, zyan0 and davidberard98 August 3, 2023 04:43

jiayisuse force-pushed the export-D47973149 branch from 0b63ad3 to e840647 Compare August 7, 2023 18:06

jiayisuse force-pushed the export-D47973149 branch from e840647 to ce80834 Compare August 7, 2023 18:25

davidberard98 approved these changes Aug 7, 2023

davidberard98 added the topic: not user facing topic category label Aug 7, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 7, 2023

pytorchmergebot added the merging label Aug 7, 2023

jiayisuse changed the title ~~[tgif] replicate merge with no overhead~~ create API jit::Module::deepcopy(device) Aug 7, 2023

pytorchmergebot added Merged and removed merging labels Aug 8, 2023

pytorchmergebot closed this in 8ef7512 Aug 8, 2023
