-
Notifications
You must be signed in to change notification settings - Fork 25.2k
create API jit::Module::deepcopy(device) #106521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106521
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit ce80834: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This pull request was exported from Phabricator. Differential Revision: D47973149 |
0b63ad3
to
e840647
Compare
This pull request was exported from Phabricator. Differential Revision: D47973149 |
Summary: Pull Request resolved: pytorch#106521 Before we copy a meta merge, and use it as a skeleton to do d2d merge replication. However some models like prospector has CPU op LongIndex which takes quite long time to load. That makes the meta merge copy expensive. Modify jit::Module::deepcopy() to allow device copy. It simplifies user code and removes all unnecessary copies like tempfile, meta merge Test Plan: ## 96GB Prospector ### Loading **Before: I0802 09:34:25.094766 129758 ModelManagerBase.cpp:1088 req:00007f12f9e10920] Loaded 966767825_1382 in 348838 ms (48854 ms of IO) memory used 103091239504 byte(s) **After: I0802 09:42:06.447378 3471149 ModelManagerBase.cpp:1101 req:00007fe771610920] Loaded 966767825_1382 in 215768 ms (33238 ms of IO) memory used 103085785104 byte(s) ### Accuracy P802409613 ### 600GB HSTU 962077327_11 Loaded 962077327_11 in 969158 ms (680962 ms of IO) memory used 721660906784 byte(s) THRIFT_THREADS=30 SIGRID_MAX_RSS_SIZE_BYTES=751619276800 SUBMOD_TO_DEVICE='' PYTORCH_PREDICTOR_ENABLE_XL_FORMAT_V2=true PYTORCH_PREDICTOR_ENABLE_XL_FORMAT_V2_INPLACE_LOADING=true GIF_LOAD_LOCAL_NET=true RUN_D2H_TOGETHER_WITH_EXECUTION_STREAM=true EVENT_POOL_ENABLE_BLOCKING_SYNC=false ENABLE_DEPLOY_INPLACE_LOADING=true TGIF_REPLICATE_MERGE_BY_TEMPFILE=true USE_STATIC_PATH=1 USE_ALL_TO_ONE_OP=false MAX_NUM_ADS=10240 REQUEST_BATCHING_PARAM_OVERRIDE="max_batch_size|${MAX_NUM_ADS};batch_time_us|50000" NUM_DEPLOY_INTERPRETER=32 NUM_DESER_AND_REMOTE_RO_CPU_WORKERS=20 MODULE_NUM_WORKERS_PER_GPU='merge|8;remote|0' MODEL_ID=962077327 SNAPSHOT_ID=11 SERVER_PORT=7456 CUDA_VISIBLE_DEVICES_FOR_PREDICTOR="0,1,5,6" CPU_NUMA_NODES_FOR_PREDICTOR="0,2" ENABLE_THRIFT_WARMUP=false hpc/inference/scripts/gif/prospector/1_cards/launch_gpu_sigrid_predictor_task_0.sh Differential Revision: D47973149 fbshipit-source-id: babbfdfff5d6785a74edc6fd79367341cde310de
e840647
to
ce80834
Compare
This pull request was exported from Phabricator. Differential Revision: D47973149 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary: Before we copy a meta merge, and use it as a skeleton to do d2d merge replication. However some models like prospector has CPU op LongIndex which takes quite long time to load. That makes the meta merge copy expensive. Modify jit::Module::deepcopy() to allow device copy. It simplifies user code and removes all unnecessary copies like tempfile, meta merge Pull Request resolved: pytorch#106521 Approved by: https://github.com/davidberard98
Summary:
Before we copy a meta merge, and use it as a skeleton to do d2d merge replication. However some models like prospector has CPU op LongIndex which takes quite long time to load. That makes the meta merge copy expensive.
Modify jit::Module::deepcopy() to allow device copy. It simplifies user code and removes all unnecessary copies like tempfile, meta merge