Skip to content

Conversation

jiayisuse
Copy link
Contributor

@jiayisuse jiayisuse commented Aug 3, 2023

Summary:
Before we copy a meta merge, and use it as a skeleton to do d2d merge replication. However some models like prospector has CPU op LongIndex which takes quite long time to load. That makes the meta merge copy expensive.

Modify jit::Module::deepcopy() to allow device copy. It simplifies user code and removes all unnecessary copies like tempfile, meta merge

@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Aug 3, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106521

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ce80834:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D47973149

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D47973149

Summary:
Pull Request resolved: pytorch#106521

Before we copy a meta merge, and use it as a skeleton to do d2d merge replication. However some models like prospector has CPU op LongIndex which takes quite long time to load. That makes the meta merge copy expensive.

Modify jit::Module::deepcopy() to allow device copy. It simplifies user code and removes all unnecessary copies like tempfile, meta merge

Test Plan:
## 96GB Prospector
### Loading
**Before:
I0802 09:34:25.094766 129758 ModelManagerBase.cpp:1088 req:00007f12f9e10920] Loaded 966767825_1382 in 348838 ms (48854 ms of IO) memory used 103091239504 byte(s)

**After:
I0802 09:42:06.447378 3471149 ModelManagerBase.cpp:1101 req:00007fe771610920] Loaded 966767825_1382 in 215768 ms (33238 ms of IO) memory used 103085785104 byte(s)

### Accuracy
P802409613

### 600GB HSTU
962077327_11
Loaded 962077327_11 in 969158 ms (680962 ms of IO) memory used 721660906784 byte(s)

THRIFT_THREADS=30 SIGRID_MAX_RSS_SIZE_BYTES=751619276800 SUBMOD_TO_DEVICE='' PYTORCH_PREDICTOR_ENABLE_XL_FORMAT_V2=true PYTORCH_PREDICTOR_ENABLE_XL_FORMAT_V2_INPLACE_LOADING=true GIF_LOAD_LOCAL_NET=true RUN_D2H_TOGETHER_WITH_EXECUTION_STREAM=true EVENT_POOL_ENABLE_BLOCKING_SYNC=false ENABLE_DEPLOY_INPLACE_LOADING=true TGIF_REPLICATE_MERGE_BY_TEMPFILE=true USE_STATIC_PATH=1 USE_ALL_TO_ONE_OP=false MAX_NUM_ADS=10240 REQUEST_BATCHING_PARAM_OVERRIDE="max_batch_size|${MAX_NUM_ADS};batch_time_us|50000" NUM_DEPLOY_INTERPRETER=32 NUM_DESER_AND_REMOTE_RO_CPU_WORKERS=20 MODULE_NUM_WORKERS_PER_GPU='merge|8;remote|0' MODEL_ID=962077327 SNAPSHOT_ID=11 SERVER_PORT=7456 CUDA_VISIBLE_DEVICES_FOR_PREDICTOR="0,1,5,6" CPU_NUMA_NODES_FOR_PREDICTOR="0,2" ENABLE_THRIFT_WARMUP=false hpc/inference/scripts/gif/prospector/1_cards/launch_gpu_sigrid_predictor_task_0.sh

Differential Revision: D47973149

fbshipit-source-id: babbfdfff5d6785a74edc6fd79367341cde310de
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D47973149

Copy link
Contributor

@davidberard98 davidberard98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@davidberard98 davidberard98 added the topic: not user facing topic category label Aug 7, 2023
@jiayisuse
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 7, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jiayisuse jiayisuse changed the title [tgif] replicate merge with no overhead create API jit::Module::deepcopy(device) Aug 7, 2023
Cyril-Anto pushed a commit to Cyril-Anto/pytorch that referenced this pull request Aug 17, 2023
Summary:
Before we copy a meta merge, and use it as a skeleton to do d2d merge replication. However some models like prospector has CPU op LongIndex which takes quite long time to load. That makes the meta merge copy expensive.

Modify jit::Module::deepcopy() to allow device copy. It simplifies user code and removes all unnecessary copies like tempfile, meta merge
Pull Request resolved: pytorch#106521
Approved by: https://github.com/davidberard98
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged release notes: jit release notes category topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants