@kurtamohler
Contributor

@kurtamohler kurtamohler commented Feb 12, 2025

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Feb 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2780

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 9 Unrelated Failures

As of commit f1bb16d with merge base 1ed5d29:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kurtamohler added a commit that referenced this pull request Feb 12, 2025
ghstack-source-id: 9ee2183
Pull Request resolved: #2780
@facebook-github-bot facebook-github-bot added the CLA Signed label Feb 12, 2025
@kurtamohler kurtamohler requested a review from vmoens February 12, 2025 00:56
@github-actions

github-actions bot commented Feb 12, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.5890s 0.5038s 1.9849 Ops/s 1.8706 Ops/s $\textbf{\color{#35bf28}+6.11\%}$
test_transformed 1.1177s 0.9817s 1.0186 Ops/s 0.9628 Ops/s $\textbf{\color{#35bf28}+5.80\%}$
test_serial 1.5971s 1.4943s 0.6692 Ops/s 0.6354 Ops/s $\textbf{\color{#35bf28}+5.32\%}$
test_parallel 1.4269s 1.3370s 0.7479 Ops/s 0.7442 Ops/s $\color{#35bf28}+0.50\%$
test_step_mdp_speed[True-True-True-True-True] 0.2428ms 29.9276μs 33.4140 KOps/s 32.0825 KOps/s $\color{#35bf28}+4.15\%$
test_step_mdp_speed[True-True-True-True-False] 50.7640μs 17.6357μs 56.7031 KOps/s 54.5803 KOps/s $\color{#35bf28}+3.89\%$
test_step_mdp_speed[True-True-True-False-True] 76.3930μs 17.0073μs 58.7983 KOps/s 56.8459 KOps/s $\color{#35bf28}+3.43\%$
test_step_mdp_speed[True-True-True-False-False] 34.0530μs 9.9699μs 100.3020 KOps/s 96.8473 KOps/s $\color{#35bf28}+3.57\%$
test_step_mdp_speed[True-True-False-True-True] 85.5610μs 32.1652μs 31.0895 KOps/s 30.2941 KOps/s $\color{#35bf28}+2.63\%$
test_step_mdp_speed[True-True-False-True-False] 45.5250μs 19.5731μs 51.0905 KOps/s 49.6501 KOps/s $\color{#35bf28}+2.90\%$
test_step_mdp_speed[True-True-False-False-True] 64.3900μs 18.8143μs 53.1511 KOps/s 51.1985 KOps/s $\color{#35bf28}+3.81\%$
test_step_mdp_speed[True-True-False-False-False] 36.7390μs 11.7903μs 84.8158 KOps/s 81.0025 KOps/s $\color{#35bf28}+4.71\%$
test_step_mdp_speed[True-False-True-True-True] 0.1161ms 36.0579μs 27.7332 KOps/s 28.6299 KOps/s $\color{#d91a1a}-3.13\%$
test_step_mdp_speed[True-False-True-True-False] 0.6569ms 21.4698μs 46.5770 KOps/s 45.4024 KOps/s $\color{#35bf28}+2.59\%$
test_step_mdp_speed[True-False-True-False-True] 46.4370μs 18.8128μs 53.1554 KOps/s 51.2048 KOps/s $\color{#35bf28}+3.81\%$
test_step_mdp_speed[True-False-True-False-False] 62.7470μs 11.7593μs 85.0389 KOps/s 81.8969 KOps/s $\color{#35bf28}+3.84\%$
test_step_mdp_speed[True-False-False-True-True] 73.4770μs 35.6852μs 28.0228 KOps/s 27.1754 KOps/s $\color{#35bf28}+3.12\%$
test_step_mdp_speed[True-False-False-True-False] 76.3920μs 23.1645μs 43.1696 KOps/s 41.9474 KOps/s $\color{#35bf28}+2.91\%$
test_step_mdp_speed[True-False-False-False-True] 82.0440μs 20.6125μs 48.5141 KOps/s 47.0402 KOps/s $\color{#35bf28}+3.13\%$
test_step_mdp_speed[True-False-False-False-False] 44.2130μs 13.6409μs 73.3087 KOps/s 71.1866 KOps/s $\color{#35bf28}+2.98\%$
test_step_mdp_speed[False-True-True-True-True] 87.8740μs 34.1020μs 29.3238 KOps/s 28.4461 KOps/s $\color{#35bf28}+3.09\%$
test_step_mdp_speed[False-True-True-True-False] 53.9710μs 21.5250μs 46.4577 KOps/s 44.9097 KOps/s $\color{#35bf28}+3.45\%$
test_step_mdp_speed[False-True-True-False-True] 79.9470μs 21.3621μs 46.8119 KOps/s 44.3989 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_step_mdp_speed[False-True-True-False-False] 57.2160μs 13.1276μs 76.1756 KOps/s 72.8750 KOps/s $\color{#35bf28}+4.53\%$
test_step_mdp_speed[False-True-False-True-True] 99.3460μs 35.3406μs 28.2961 KOps/s 27.1057 KOps/s $\color{#35bf28}+4.39\%$
test_step_mdp_speed[False-True-False-True-False] 83.3960μs 23.1264μs 43.2406 KOps/s 41.5521 KOps/s $\color{#35bf28}+4.06\%$
test_step_mdp_speed[False-True-False-False-True] 2.7428ms 23.3602μs 42.8079 KOps/s 40.9190 KOps/s $\color{#35bf28}+4.62\%$
test_step_mdp_speed[False-True-False-False-False] 83.9270μs 14.8916μs 67.1520 KOps/s 64.0049 KOps/s $\color{#35bf28}+4.92\%$
test_step_mdp_speed[False-False-True-True-True] 0.1086ms 37.3690μs 26.7601 KOps/s 26.2692 KOps/s $\color{#35bf28}+1.87\%$
test_step_mdp_speed[False-False-True-True-False] 70.1400μs 24.9225μs 40.1244 KOps/s 38.5776 KOps/s $\color{#35bf28}+4.01\%$
test_step_mdp_speed[False-False-True-False-True] 0.6237ms 23.4416μs 42.6593 KOps/s 41.4975 KOps/s $\color{#35bf28}+2.80\%$
test_step_mdp_speed[False-False-True-False-False] 41.1570μs 14.8807μs 67.2010 KOps/s 64.9131 KOps/s $\color{#35bf28}+3.52\%$
test_step_mdp_speed[False-False-False-True-True] 0.1072ms 38.9303μs 25.6869 KOps/s 24.8621 KOps/s $\color{#35bf28}+3.32\%$
test_step_mdp_speed[False-False-False-True-False] 80.8710μs 26.4983μs 37.7383 KOps/s 36.2677 KOps/s $\color{#35bf28}+4.05\%$
test_step_mdp_speed[False-False-False-False-True] 68.7990μs 25.2139μs 39.6606 KOps/s 36.9781 KOps/s $\textbf{\color{#35bf28}+7.25\%}$
test_step_mdp_speed[False-False-False-False-False] 65.0320μs 16.5897μs 60.2785 KOps/s 58.3402 KOps/s $\color{#35bf28}+3.32\%$
test_values[generalized_advantage_estimate-True-True] 10.3817ms 10.0454ms 99.5480 Ops/s 99.9330 Ops/s $\color{#d91a1a}-0.39\%$
test_values[vec_generalized_advantage_estimate-True-True] 28.1957ms 26.2634ms 38.0758 Ops/s 40.9153 Ops/s $\textbf{\color{#d91a1a}-6.94\%}$
test_values[td0_return_estimate-False-False] 0.2343ms 0.2018ms 4.9560 KOps/s 4.8538 KOps/s $\color{#35bf28}+2.10\%$
test_values[td1_return_estimate-False-False] 25.1352ms 24.6175ms 40.6215 Ops/s 40.2550 Ops/s $\color{#35bf28}+0.91\%$
test_values[vec_td1_return_estimate-False-False] 28.4080ms 26.4014ms 37.8768 Ops/s 40.5721 Ops/s $\textbf{\color{#d91a1a}-6.64\%}$
test_values[td_lambda_return_estimate-True-False] 44.2228ms 36.4371ms 27.4446 Ops/s 28.2492 Ops/s $\color{#d91a1a}-2.85\%$
test_values[vec_td_lambda_return_estimate-True-False] 28.2533ms 26.3450ms 37.9578 Ops/s 40.5583 Ops/s $\textbf{\color{#d91a1a}-6.41\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.8968ms 8.6957ms 114.9995 Ops/s 117.3580 Ops/s $\color{#d91a1a}-2.01\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.3436ms 1.8322ms 545.8043 Ops/s 515.4643 Ops/s $\textbf{\color{#35bf28}+5.89\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5490ms 0.3735ms 2.6774 KOps/s 2.6864 KOps/s $\color{#d91a1a}-0.33\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 46.3727ms 43.3072ms 23.0909 Ops/s 24.0383 Ops/s $\color{#d91a1a}-3.94\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.4751ms 3.4919ms 286.3735 Ops/s 287.8738 Ops/s $\color{#d91a1a}-0.52\%$
test_dqn_speed[False-None] 6.4522ms 1.4222ms 703.1349 Ops/s 683.7624 Ops/s $\color{#35bf28}+2.83\%$
test_dqn_speed[False-backward] 2.0512ms 1.9202ms 520.7831 Ops/s 499.5255 Ops/s $\color{#35bf28}+4.26\%$
test_dqn_speed[True-None] 1.2064ms 0.5002ms 1.9994 KOps/s 1.9144 KOps/s $\color{#35bf28}+4.44\%$
test_dqn_speed[True-backward] 0.9991ms 0.9288ms 1.0767 KOps/s 1.0518 KOps/s $\color{#35bf28}+2.37\%$
test_dqn_speed[reduce-overhead-None] 0.7739ms 0.5030ms 1.9880 KOps/s 1.9594 KOps/s $\color{#35bf28}+1.46\%$
test_dqn_speed[reduce-overhead-backward] 1.1124ms 0.9640ms 1.0374 KOps/s 1.0146 KOps/s $\color{#35bf28}+2.24\%$
test_ddpg_speed[False-None] 3.3583ms 2.9251ms 341.8658 Ops/s 333.7260 Ops/s $\color{#35bf28}+2.44\%$
test_ddpg_speed[False-backward] 4.2962ms 4.0945ms 244.2281 Ops/s 242.3230 Ops/s $\color{#35bf28}+0.79\%$
test_ddpg_speed[True-None] 1.7664ms 1.2702ms 787.2523 Ops/s 774.1939 Ops/s $\color{#35bf28}+1.69\%$
test_ddpg_speed[True-backward] 3.3506ms 2.2111ms 452.2690 Ops/s 445.1883 Ops/s $\color{#35bf28}+1.59\%$
test_ddpg_speed[reduce-overhead-None] 1.5777ms 1.2646ms 790.7659 Ops/s 775.2056 Ops/s $\color{#35bf28}+2.01\%$
test_ddpg_speed[reduce-overhead-backward] 2.2374ms 2.1925ms 456.0900 Ops/s 445.0945 Ops/s $\color{#35bf28}+2.47\%$
test_sac_speed[False-None] 8.8745ms 8.3129ms 120.2957 Ops/s 116.9101 Ops/s $\color{#35bf28}+2.90\%$
test_sac_speed[False-backward] 13.0525ms 11.3359ms 88.2155 Ops/s 88.8143 Ops/s $\color{#d91a1a}-0.67\%$
test_sac_speed[True-None] 2.8664ms 2.1595ms 463.0803 Ops/s 453.5782 Ops/s $\color{#35bf28}+2.09\%$
test_sac_speed[True-backward] 5.9498ms 4.6878ms 213.3207 Ops/s 243.1382 Ops/s $\textbf{\color{#d91a1a}-12.26\%}$
test_sac_speed[reduce-overhead-None] 2.9292ms 2.4187ms 413.4428 Ops/s 407.9553 Ops/s $\color{#35bf28}+1.35\%$
test_sac_speed[reduce-overhead-backward] 4.7891ms 4.5075ms 221.8502 Ops/s 212.5592 Ops/s $\color{#35bf28}+4.37\%$
test_redq_speed[False-None] 14.7224ms 13.8565ms 72.1683 Ops/s 70.9438 Ops/s $\color{#35bf28}+1.73\%$
test_redq_speed[False-backward] 24.4232ms 23.6951ms 42.2028 Ops/s 41.5333 Ops/s $\color{#35bf28}+1.61\%$
test_redq_speed[True-None] 7.4073ms 6.4622ms 154.7458 Ops/s 152.1167 Ops/s $\color{#35bf28}+1.73\%$
test_redq_speed[True-backward] 14.7572ms 13.9573ms 71.6473 Ops/s 70.9159 Ops/s $\color{#35bf28}+1.03\%$
test_redq_speed[reduce-overhead-None] 7.9625ms 6.4728ms 154.4938 Ops/s 176.2773 Ops/s $\textbf{\color{#d91a1a}-12.36\%}$
test_redq_speed[reduce-overhead-backward] 14.1757ms 13.4024ms 74.6136 Ops/s 70.2325 Ops/s $\textbf{\color{#35bf28}+6.24\%}$
test_redq_deprec_speed[False-None] 15.4133ms 13.1953ms 75.7848 Ops/s 66.5082 Ops/s $\textbf{\color{#35bf28}+13.95\%}$
test_redq_deprec_speed[False-backward] 20.4321ms 18.8988ms 52.9133 Ops/s 51.3530 Ops/s $\color{#35bf28}+3.04\%$
test_redq_deprec_speed[True-None] 4.7610ms 4.2904ms 233.0811 Ops/s 204.0893 Ops/s $\textbf{\color{#35bf28}+14.21\%}$
test_redq_deprec_speed[True-backward] 9.8971ms 8.8888ms 112.5011 Ops/s 98.9816 Ops/s $\textbf{\color{#35bf28}+13.66\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.6656ms 3.9260ms 254.7127 Ops/s 221.7288 Ops/s $\textbf{\color{#35bf28}+14.88\%}$
test_redq_deprec_speed[reduce-overhead-backward] 10.5260ms 8.4288ms 118.6409 Ops/s 108.7096 Ops/s $\textbf{\color{#35bf28}+9.14\%}$
test_td3_speed[False-None] 9.3776ms 8.0595ms 124.0771 Ops/s 119.4437 Ops/s $\color{#35bf28}+3.88\%$
test_td3_speed[False-backward] 16.8976ms 10.8503ms 92.1630 Ops/s 93.9997 Ops/s $\color{#d91a1a}-1.95\%$
test_td3_speed[True-None] 2.0099ms 1.8231ms 548.5256 Ops/s 537.1766 Ops/s $\color{#35bf28}+2.11\%$
test_td3_speed[True-backward] 3.4552ms 3.4015ms 293.9902 Ops/s 290.1626 Ops/s $\color{#35bf28}+1.32\%$
test_td3_speed[reduce-overhead-None] 1.9225ms 1.8156ms 550.7915 Ops/s 542.4585 Ops/s $\color{#35bf28}+1.54\%$
test_td3_speed[reduce-overhead-backward] 3.5007ms 3.4272ms 291.7829 Ops/s 286.4324 Ops/s $\color{#35bf28}+1.87\%$
test_cql_speed[False-None] 40.0323ms 37.0227ms 27.0105 Ops/s 26.6548 Ops/s $\color{#35bf28}+1.33\%$
test_cql_speed[False-backward] 55.6179ms 47.5679ms 21.0226 Ops/s 20.5623 Ops/s $\color{#35bf28}+2.24\%$
test_cql_speed[True-None] 16.9709ms 15.9369ms 62.7475 Ops/s 60.4257 Ops/s $\color{#35bf28}+3.84\%$
test_cql_speed[True-backward] 24.2674ms 22.7847ms 43.8890 Ops/s 41.2339 Ops/s $\textbf{\color{#35bf28}+6.44\%}$
test_cql_speed[reduce-overhead-None] 16.3739ms 15.9793ms 62.5811 Ops/s 58.2273 Ops/s $\textbf{\color{#35bf28}+7.48\%}$
test_cql_speed[reduce-overhead-backward] 24.3321ms 23.4013ms 42.7326 Ops/s 42.0462 Ops/s $\color{#35bf28}+1.63\%$
test_a2c_speed[False-None] 8.2248ms 7.2093ms 138.7104 Ops/s 135.1945 Ops/s $\color{#35bf28}+2.60\%$
test_a2c_speed[False-backward] 15.7850ms 14.3788ms 69.5468 Ops/s 69.4387 Ops/s $\color{#35bf28}+0.16\%$
test_a2c_speed[True-None] 4.1304ms 3.7475ms 266.8426 Ops/s 264.0410 Ops/s $\color{#35bf28}+1.06\%$
test_a2c_speed[True-backward] 11.3674ms 10.3872ms 96.2719 Ops/s 97.2574 Ops/s $\color{#d91a1a}-1.01\%$
test_a2c_speed[reduce-overhead-None] 4.4938ms 3.7934ms 263.6141 Ops/s 239.6879 Ops/s $\textbf{\color{#35bf28}+9.98\%}$
test_a2c_speed[reduce-overhead-backward] 11.0960ms 10.4195ms 95.9741 Ops/s 93.5128 Ops/s $\color{#35bf28}+2.63\%$
test_ppo_speed[False-None] 8.9626ms 7.7524ms 128.9918 Ops/s 129.6851 Ops/s $\color{#d91a1a}-0.53\%$
test_ppo_speed[False-backward] 16.0680ms 15.1896ms 65.8346 Ops/s 64.8109 Ops/s $\color{#35bf28}+1.58\%$
test_ppo_speed[True-None] 4.4780ms 4.1396ms 241.5688 Ops/s 233.2973 Ops/s $\color{#35bf28}+3.55\%$
test_ppo_speed[True-backward] 10.4944ms 10.1341ms 98.6764 Ops/s 93.3222 Ops/s $\textbf{\color{#35bf28}+5.74\%}$
test_ppo_speed[reduce-overhead-None] 4.4430ms 4.1437ms 241.3282 Ops/s 241.3303 Ops/s $-0.00\%$
test_ppo_speed[reduce-overhead-backward] 11.2091ms 10.1468ms 98.5531 Ops/s 99.3585 Ops/s $\color{#d91a1a}-0.81\%$
test_reinforce_speed[False-None] 7.4966ms 6.5328ms 153.0728 Ops/s 151.7590 Ops/s $\color{#35bf28}+0.87\%$
test_reinforce_speed[False-backward] 10.7685ms 9.7497ms 102.5676 Ops/s 101.2958 Ops/s $\color{#35bf28}+1.26\%$
test_reinforce_speed[True-None] 3.5007ms 3.0776ms 324.9306 Ops/s 317.7691 Ops/s $\color{#35bf28}+2.25\%$
test_reinforce_speed[True-backward] 9.6743ms 9.0811ms 110.1194 Ops/s 109.7324 Ops/s $\color{#35bf28}+0.35\%$
test_reinforce_speed[reduce-overhead-None] 3.8063ms 3.1085ms 321.7034 Ops/s 322.2558 Ops/s $\color{#d91a1a}-0.17\%$
test_reinforce_speed[reduce-overhead-backward] 11.7254ms 9.3011ms 107.5143 Ops/s 110.0838 Ops/s $\color{#d91a1a}-2.33\%$
test_iql_speed[False-None] 34.7680ms 32.8328ms 30.4573 Ops/s 29.5647 Ops/s $\color{#35bf28}+3.02\%$
test_iql_speed[False-backward] 65.0678ms 45.5399ms 21.9588 Ops/s 21.5555 Ops/s $\color{#35bf28}+1.87\%$
test_iql_speed[True-None] 12.4584ms 11.6937ms 85.5163 Ops/s 88.8452 Ops/s $\color{#d91a1a}-3.75\%$
test_iql_speed[True-backward] 34.8051ms 23.2530ms 43.0052 Ops/s 44.9321 Ops/s $\color{#d91a1a}-4.29\%$
test_iql_speed[reduce-overhead-None] 12.4304ms 11.2652ms 88.7692 Ops/s 86.9755 Ops/s $\color{#35bf28}+2.06\%$
test_iql_speed[reduce-overhead-backward] 23.0999ms 21.9336ms 45.5922 Ops/s 43.0394 Ops/s $\textbf{\color{#35bf28}+5.93\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.3829ms 4.7735ms 209.4912 Ops/s 206.0385 Ops/s $\color{#35bf28}+1.68\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6977ms 0.5072ms 1.9717 KOps/s 1.9349 KOps/s $\color{#35bf28}+1.90\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.9137ms 0.4868ms 2.0542 KOps/s 2.0093 KOps/s $\color{#35bf28}+2.23\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.5341ms 4.6013ms 217.3295 Ops/s 213.4611 Ops/s $\color{#35bf28}+1.81\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2012ms 0.4989ms 2.0045 KOps/s 1.9702 KOps/s $\color{#35bf28}+1.74\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6955ms 0.4747ms 2.1065 KOps/s 2.0520 KOps/s $\color{#35bf28}+2.66\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.1612ms 1.6519ms 605.3696 Ops/s 604.4422 Ops/s $\color{#35bf28}+0.15\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.3270ms 1.5673ms 638.0404 Ops/s 636.7318 Ops/s $\color{#35bf28}+0.21\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.1262ms 4.7151ms 212.0857 Ops/s 215.2076 Ops/s $\color{#d91a1a}-1.45\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 3.5701ms 0.6471ms 1.5454 KOps/s 1.5453 KOps/s $+0.01\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9080ms 0.6212ms 1.6097 KOps/s 1.6012 KOps/s $\color{#35bf28}+0.53\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 9.9169ms 4.7485ms 210.5923 Ops/s 217.7414 Ops/s $\color{#d91a1a}-3.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0095ms 0.5072ms 1.9717 KOps/s 1.9543 KOps/s $\color{#35bf28}+0.89\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2.1367ms 0.5422ms 1.8444 KOps/s 2.0431 KOps/s $\textbf{\color{#d91a1a}-9.72\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.6238ms 4.5304ms 220.7294 Ops/s 219.2464 Ops/s $\color{#35bf28}+0.68\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0270ms 0.5004ms 1.9984 KOps/s 1.9700 KOps/s $\color{#35bf28}+1.44\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7268ms 0.4773ms 2.0951 KOps/s 2.0670 KOps/s $\color{#35bf28}+1.36\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.4031ms 4.7258ms 211.6030 Ops/s 207.3108 Ops/s $\color{#35bf28}+2.07\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1836ms 0.6412ms 1.5596 KOps/s 1.5201 KOps/s $\color{#35bf28}+2.59\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8624ms 0.6233ms 1.6044 KOps/s 1.5851 KOps/s $\color{#35bf28}+1.21\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.4361ms 4.2361ms 236.0637 Ops/s 232.3491 Ops/s $\color{#35bf28}+1.60\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.3981ms 2.3443ms 426.5653 Ops/s 459.4563 Ops/s $\textbf{\color{#d91a1a}-7.16\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 5.5842ms 1.4687ms 680.8930 Ops/s 655.1387 Ops/s $\color{#35bf28}+3.93\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4338s 12.8870ms 77.5975 Ops/s 235.7103 Ops/s $\textbf{\color{#d91a1a}-67.08\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.5532ms 2.3070ms 433.4641 Ops/s 419.5867 Ops/s $\color{#35bf28}+3.31\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 4.4027ms 1.4479ms 690.6413 Ops/s 773.4347 Ops/s $\textbf{\color{#d91a1a}-10.70\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 5.6995ms 4.4200ms 226.2429 Ops/s 223.9909 Ops/s $\color{#35bf28}+1.01\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.9845ms 2.3885ms 418.6795 Ops/s 375.5053 Ops/s $\textbf{\color{#35bf28}+11.50\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.4027ms 1.5834ms 631.5370 Ops/s 620.4692 Ops/s $\color{#35bf28}+1.78\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 14.0932ms 11.8024ms 84.7284 Ops/s 83.1138 Ops/s $\color{#35bf28}+1.94\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 16.2814ms 14.6220ms 68.3902 Ops/s 68.3803 Ops/s $\color{#35bf28}+0.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 22.3633ms 20.6124ms 48.5145 Ops/s 47.4747 Ops/s $\color{#35bf28}+2.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 15.7978ms 14.7077ms 67.9916 Ops/s 67.5384 Ops/s $\color{#35bf28}+0.67\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 22.7219ms 20.6097ms 48.5209 Ops/s 47.2897 Ops/s $\color{#35bf28}+2.60\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 17.6229ms 15.9259ms 62.7906 Ops/s 61.2827 Ops/s $\color{#35bf28}+2.46\%$
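
For anyone cross-checking these rows: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below is not the benchmark bot's code; the helper names `to_ops_per_second` and `percent_change` are hypothetical, and it assumes throughput is always reported as either `Ops/s` or `KOps/s`, as in the tables in this thread.

```python
# Hypothetical helpers for re-deriving the "Change" column from a table row.
# Assumption: throughput strings look like "1.9849 Ops/s" or "1.0242 KOps/s".

UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1_000.0}


def to_ops_per_second(text: str) -> float:
    """Convert a throughput string to plain operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(pr_ops: str, head_ops: str) -> float:
    """Relative change of the PR throughput against Repo HEAD, in percent."""
    pr = to_ops_per_second(pr_ops)
    head = to_ops_per_second(head_ops)
    return (pr - head) / head * 100.0


if __name__ == "__main__":
    # test_simple (CPU): 1.9849 Ops/s vs 1.8706 Ops/s on Repo HEAD
    print(f"{percent_change('1.9849 Ops/s', '1.8706 Ops/s'):+.2f}%")
    # test_rb_populate[...ListStorage-SamplerWithoutReplacement-400] (CPU)
    print(f"{percent_change('77.5975 Ops/s', '235.7103 Ops/s'):+.2f}%")
```

Normalizing units before dividing matters for rows where the two columns use different prefixes (one in `Ops/s`, the other in `KOps/s`). Single-run differences of a few percent are within the spread suggested by the Max vs. Mean columns, so only the large swings (for example the -67.08% row) are likely worth a rerun.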

@github-actions

github-actions bot commented Feb 12, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}24$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.8762s 0.7894s 1.2667 Ops/s 1.2459 Ops/s $\color{#35bf28}+1.67\%$
test_transformed 1.4386s 1.3542s 0.7385 Ops/s 0.7175 Ops/s $\color{#35bf28}+2.93\%$
test_serial 2.3287s 2.2430s 0.4458 Ops/s 0.4380 Ops/s $\color{#35bf28}+1.80\%$
test_parallel 1.9422s 1.8480s 0.5411 Ops/s 0.5395 Ops/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-True-True-True-True] 0.2416ms 40.1204μs 24.9250 KOps/s 25.8519 KOps/s $\color{#d91a1a}-3.59\%$
test_step_mdp_speed[True-True-True-True-False] 0.1543ms 22.0895μs 45.2704 KOps/s 44.1611 KOps/s $\color{#35bf28}+2.51\%$
test_step_mdp_speed[True-True-True-False-True] 61.0610μs 21.4149μs 46.6964 KOps/s 44.9525 KOps/s $\color{#35bf28}+3.88\%$
test_step_mdp_speed[True-True-True-False-False] 0.1294ms 12.5578μs 79.6315 KOps/s 78.9107 KOps/s $\color{#35bf28}+0.91\%$
test_step_mdp_speed[True-True-False-True-True] 0.2251ms 41.8311μs 23.9056 KOps/s 24.0566 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[True-True-False-True-False] 0.2355ms 25.0313μs 39.9500 KOps/s 40.0780 KOps/s $\color{#d91a1a}-0.32\%$
test_step_mdp_speed[True-True-False-False-True] 0.1251ms 24.0594μs 41.5638 KOps/s 41.3365 KOps/s $\color{#35bf28}+0.55\%$
test_step_mdp_speed[True-True-False-False-False] 80.4510μs 15.0421μs 66.4801 KOps/s 67.6377 KOps/s $\color{#d91a1a}-1.71\%$
test_step_mdp_speed[True-False-True-True-True] 0.2023ms 44.4385μs 22.5030 KOps/s 22.5600 KOps/s $\color{#d91a1a}-0.25\%$
test_step_mdp_speed[True-False-True-True-False] 0.1098ms 27.1866μs 36.7828 KOps/s 37.6579 KOps/s $\color{#d91a1a}-2.32\%$
test_step_mdp_speed[True-False-True-False-True] 0.1992ms 24.2162μs 41.2947 KOps/s 41.3087 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[True-False-True-False-False] 0.1952ms 15.0723μs 66.3469 KOps/s 67.1004 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[True-False-False-True-True] 0.1249ms 45.7876μs 21.8400 KOps/s 21.5285 KOps/s $\color{#35bf28}+1.45\%$
test_step_mdp_speed[True-False-False-True-False] 0.2219ms 29.4456μs 33.9609 KOps/s 34.2761 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[True-False-False-False-True] 0.1088ms 26.1082μs 38.3021 KOps/s 37.9444 KOps/s $\color{#35bf28}+0.94\%$
test_step_mdp_speed[True-False-False-False-False] 53.7910μs 17.2349μs 58.0219 KOps/s 58.9807 KOps/s $\color{#d91a1a}-1.63\%$
test_step_mdp_speed[False-True-True-True-True] 0.1057ms 43.8436μs 22.8083 KOps/s 22.7127 KOps/s $\color{#35bf28}+0.42\%$
test_step_mdp_speed[False-True-True-True-False] 0.1186ms 26.1930μs 38.1782 KOps/s 36.5899 KOps/s $\color{#35bf28}+4.34\%$
test_step_mdp_speed[False-True-True-False-True] 0.4293ms 27.5028μs 36.3599 KOps/s 35.6232 KOps/s $\color{#35bf28}+2.07\%$
test_step_mdp_speed[False-True-True-False-False] 0.4012ms 16.6113μs 60.1999 KOps/s 60.1278 KOps/s $\color{#35bf28}+0.12\%$
test_step_mdp_speed[False-True-False-True-True] 83.0610μs 45.8103μs 21.8291 KOps/s 21.7047 KOps/s $\color{#35bf28}+0.57\%$
test_step_mdp_speed[False-True-False-True-False] 0.1244ms 29.5703μs 33.8178 KOps/s 33.4128 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[False-True-False-False-True] 3.0545ms 30.2443μs 33.0641 KOps/s 33.7109 KOps/s $\color{#d91a1a}-1.92\%$
test_step_mdp_speed[False-True-False-False-False] 55.1920μs 18.9582μs 52.7477 KOps/s 53.7996 KOps/s $\color{#d91a1a}-1.96\%$
test_step_mdp_speed[False-False-True-True-True] 0.4595ms 46.8496μs 21.3449 KOps/s 20.9324 KOps/s $\color{#35bf28}+1.97\%$
test_step_mdp_speed[False-False-True-True-False] 80.5420μs 31.4743μs 31.7720 KOps/s 31.6459 KOps/s $\color{#35bf28}+0.40\%$
test_step_mdp_speed[False-False-True-False-True] 0.4428ms 29.6152μs 33.7664 KOps/s 32.8761 KOps/s $\color{#35bf28}+2.71\%$
test_step_mdp_speed[False-False-True-False-False] 0.4182ms 18.6248μs 53.6920 KOps/s 53.2570 KOps/s $\color{#35bf28}+0.82\%$
test_step_mdp_speed[False-False-False-True-True] 0.1065ms 49.1525μs 20.3449 KOps/s 19.7671 KOps/s $\color{#35bf28}+2.92\%$
test_step_mdp_speed[False-False-False-True-False] 0.4284ms 34.0012μs 29.4107 KOps/s 29.7479 KOps/s $\color{#d91a1a}-1.13\%$
test_step_mdp_speed[False-False-False-False-True] 70.3210μs 31.7879μs 31.4585 KOps/s 31.5794 KOps/s $\color{#d91a1a}-0.38\%$
test_step_mdp_speed[False-False-False-False-False] 0.4184ms 21.2224μs 47.1201 KOps/s 48.2248 KOps/s $\color{#d91a1a}-2.29\%$
test_values[generalized_advantage_estimate-True-True] 26.3961ms 25.3887ms 39.3875 Ops/s 40.3660 Ops/s $\color{#d91a1a}-2.42\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1039s 2.9772ms 335.8860 Ops/s 326.4275 Ops/s $\color{#35bf28}+2.90\%$
test_values[td0_return_estimate-False-False] 0.1089ms 79.3327μs 12.6051 KOps/s 12.5377 KOps/s $\color{#35bf28}+0.54\%$
test_values[td1_return_estimate-False-False] 55.7383ms 55.1770ms 18.1235 Ops/s 18.1291 Ops/s $\color{#d91a1a}-0.03\%$
test_values[vec_td1_return_estimate-False-False] 1.2872ms 1.0904ms 917.0557 Ops/s 921.4353 Ops/s $\color{#d91a1a}-0.48\%$
test_values[td_lambda_return_estimate-True-False] 88.0211ms 87.3718ms 11.4453 Ops/s 11.3782 Ops/s $\color{#35bf28}+0.59\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.4540ms 1.0811ms 924.9574 Ops/s 927.4437 Ops/s $\color{#d91a1a}-0.27\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 25.9358ms 24.7211ms 40.4513 Ops/s 40.4429 Ops/s $\color{#35bf28}+0.02\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0133ms 0.7466ms 1.3394 KOps/s 1.3305 KOps/s $\color{#35bf28}+0.67\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8338ms 0.6635ms 1.5071 KOps/s 1.4983 KOps/s $\color{#35bf28}+0.58\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.8791ms 1.4782ms 676.5163 Ops/s 675.6817 Ops/s $\color{#35bf28}+0.12\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8609ms 0.6828ms 1.4646 KOps/s 1.4625 KOps/s $\color{#35bf28}+0.14\%$
test_dqn_speed[False-None] 6.8736ms 1.5000ms 666.6824 Ops/s 657.6725 Ops/s $\color{#35bf28}+1.37\%$
test_dqn_speed[False-backward] 2.2746ms 2.1071ms 474.5787 Ops/s 467.1781 Ops/s $\color{#35bf28}+1.58\%$
test_dqn_speed[True-None] 0.9855ms 0.5640ms 1.7731 KOps/s 1.7646 KOps/s $\color{#35bf28}+0.48\%$
test_dqn_speed[True-backward] 1.3926ms 1.2294ms 813.4186 Ops/s 878.2627 Ops/s $\textbf{\color{#d91a1a}-7.38\%}$
test_dqn_speed[reduce-overhead-None] 1.0013ms 0.5812ms 1.7205 KOps/s 1.7016 KOps/s $\color{#35bf28}+1.11\%$
test_dqn_speed[reduce-overhead-backward] 1.1098ms 1.0710ms 933.7493 Ops/s 1.0242 KOps/s $\textbf{\color{#d91a1a}-8.83\%}$
test_ddpg_speed[False-None] 3.1971ms 2.8045ms 356.5658 Ops/s 339.0082 Ops/s $\textbf{\color{#35bf28}+5.18\%}$
test_ddpg_speed[False-backward] 4.6854ms 4.2063ms 237.7371 Ops/s 237.0096 Ops/s $\color{#35bf28}+0.31\%$
test_ddpg_speed[True-None] 1.7524ms 1.3542ms 738.4242 Ops/s 737.5208 Ops/s $\color{#35bf28}+0.12\%$
test_ddpg_speed[True-backward] 2.7205ms 2.5722ms 388.7768 Ops/s 384.9920 Ops/s $\color{#35bf28}+0.98\%$
test_ddpg_speed[reduce-overhead-None] 1.8872ms 1.3651ms 732.5674 Ops/s 736.3898 Ops/s $\color{#d91a1a}-0.52\%$
test_ddpg_speed[reduce-overhead-backward] 2.1778ms 2.0441ms 489.2246 Ops/s 484.4310 Ops/s $\color{#35bf28}+0.99\%$
test_sac_speed[False-None] 8.2653ms 7.8607ms 127.2158 Ops/s 123.2594 Ops/s $\color{#35bf28}+3.21\%$
test_sac_speed[False-backward] 11.4819ms 11.0443ms 90.5441 Ops/s 88.4043 Ops/s $\color{#35bf28}+2.42\%$
test_sac_speed[True-None] 2.2016ms 1.8649ms 536.2198 Ops/s 532.6524 Ops/s $\color{#35bf28}+0.67\%$
test_sac_speed[True-backward] 3.7318ms 3.5703ms 280.0885 Ops/s 264.7850 Ops/s $\textbf{\color{#35bf28}+5.78\%}$
test_sac_speed[reduce-overhead-None] 17.8748ms 10.7489ms 93.0329 Ops/s 92.2282 Ops/s $\color{#35bf28}+0.87\%$
test_sac_speed[reduce-overhead-backward] 1.7578ms 1.6359ms 611.3005 Ops/s 538.3335 Ops/s $\textbf{\color{#35bf28}+13.55\%}$
test_redq_speed[False-None] 7.7444ms 7.3325ms 136.3785 Ops/s 131.6441 Ops/s $\color{#35bf28}+3.60\%$
test_redq_speed[False-backward] 11.7276ms 11.1617ms 89.5924 Ops/s 84.5584 Ops/s $\textbf{\color{#35bf28}+5.95\%}$
test_redq_speed[True-None] 2.5140ms 2.3335ms 428.5421 Ops/s 423.4157 Ops/s $\color{#35bf28}+1.21\%$
test_redq_speed[True-backward] 4.2949ms 4.0840ms 244.8609 Ops/s 229.2762 Ops/s $\textbf{\color{#35bf28}+6.80\%}$
test_redq_speed[reduce-overhead-None] 2.6271ms 2.3591ms 423.8894 Ops/s 417.8576 Ops/s $\color{#35bf28}+1.44\%$
test_redq_speed[reduce-overhead-backward] 4.4774ms 4.0992ms 243.9522 Ops/s 231.5309 Ops/s $\textbf{\color{#35bf28}+5.36\%}$
test_redq_deprec_speed[False-None] 9.1134ms 8.8502ms 112.9923 Ops/s 110.0794 Ops/s $\color{#35bf28}+2.65\%$
test_redq_deprec_speed[False-backward] 12.6766ms 11.9346ms 83.7898 Ops/s 80.7747 Ops/s $\color{#35bf28}+3.73\%$
test_redq_deprec_speed[True-None] 2.8603ms 2.6514ms 377.1611 Ops/s 370.9209 Ops/s $\color{#35bf28}+1.68\%$
test_redq_deprec_speed[True-backward] 4.5835ms 4.3269ms 231.1121 Ops/s 218.5049 Ops/s $\textbf{\color{#35bf28}+5.77\%}$
test_redq_deprec_speed[reduce-overhead-None] 2.8735ms 2.6603ms 375.9017 Ops/s 371.0497 Ops/s $\color{#35bf28}+1.31\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.6164ms 4.4088ms 226.8205 Ops/s 217.0766 Ops/s $\color{#35bf28}+4.49\%$
test_td3_speed[False-None] 7.9588ms 7.8108ms 128.0277 Ops/s 125.7139 Ops/s $\color{#35bf28}+1.84\%$
test_td3_speed[False-backward] 10.7958ms 10.2091ms 97.9516 Ops/s 94.7920 Ops/s $\color{#35bf28}+3.33\%$
test_td3_speed[True-None] 1.7488ms 1.6857ms 593.2114 Ops/s 588.9647 Ops/s $\color{#35bf28}+0.72\%$
test_td3_speed[True-backward] 3.4070ms 3.2232ms 310.2526 Ops/s 292.8115 Ops/s $\textbf{\color{#35bf28}+5.96\%}$
test_td3_speed[reduce-overhead-None] 70.6002ms 26.6230ms 37.5615 Ops/s 38.8010 Ops/s $\color{#d91a1a}-3.19\%$
test_td3_speed[reduce-overhead-backward] 1.5518ms 1.3883ms 720.3013 Ops/s 642.4019 Ops/s $\textbf{\color{#35bf28}+12.13\%}$
test_cql_speed[False-None] 16.8288ms 16.4048ms 60.9576 Ops/s 59.1787 Ops/s $\color{#35bf28}+3.01\%$
test_cql_speed[False-backward] 22.4314ms 21.6812ms 46.1230 Ops/s 44.9561 Ops/s $\color{#35bf28}+2.60\%$
test_cql_speed[True-None] 3.5883ms 3.2956ms 303.4354 Ops/s 299.8305 Ops/s $\color{#35bf28}+1.20\%$
test_cql_speed[True-backward] 6.3381ms 5.6648ms 176.5282 Ops/s 173.7244 Ops/s $\color{#35bf28}+1.61\%$
test_cql_speed[reduce-overhead-None] 19.2723ms 12.8665ms 77.7210 Ops/s 77.6966 Ops/s $\color{#35bf28}+0.03\%$
test_cql_speed[reduce-overhead-backward] 1.9855ms 1.8399ms 543.5179 Ops/s 477.7304 Ops/s $\textbf{\color{#35bf28}+13.77\%}$
test_a2c_speed[False-None] 3.3523ms 3.1151ms 321.0196 Ops/s 296.6177 Ops/s $\textbf{\color{#35bf28}+8.23\%}$
test_a2c_speed[False-backward] 6.5981ms 6.0207ms 166.0929 Ops/s 152.6426 Ops/s $\textbf{\color{#35bf28}+8.81\%}$
test_a2c_speed[True-None] 1.5169ms 1.3539ms 738.5880 Ops/s 707.8065 Ops/s $\color{#35bf28}+4.35\%$
test_a2c_speed[True-backward] 3.0348ms 2.8846ms 346.6719 Ops/s 334.5149 Ops/s $\color{#35bf28}+3.63\%$
test_a2c_speed[reduce-overhead-None] 14.1426ms 8.2469ms 121.2571 Ops/s 122.6579 Ops/s $\color{#d91a1a}-1.14\%$
test_a2c_speed[reduce-overhead-backward] 1.5849ms 1.4499ms 689.6845 Ops/s 676.6520 Ops/s $\color{#35bf28}+1.93\%$
test_ppo_speed[False-None] 3.8517ms 3.6049ms 277.4024 Ops/s 269.4138 Ops/s $\color{#35bf28}+2.97\%$
test_ppo_speed[False-backward] 7.2971ms 6.7624ms 147.8758 Ops/s 145.0848 Ops/s $\color{#35bf28}+1.92\%$
test_ppo_speed[True-None] 1.6102ms 1.4135ms 707.4530 Ops/s 703.7091 Ops/s $\color{#35bf28}+0.53\%$
test_ppo_speed[True-backward] 3.7720ms 3.2446ms 308.2039 Ops/s 302.3008 Ops/s $\color{#35bf28}+1.95\%$
test_ppo_speed[reduce-overhead-None] 1.1276ms 0.9700ms 1.0309 KOps/s 1.0416 KOps/s $\color{#d91a1a}-1.02\%$
test_ppo_speed[reduce-overhead-backward] 1.7391ms 1.5729ms 635.7830 Ops/s 613.7617 Ops/s $\color{#35bf28}+3.59\%$
test_reinforce_speed[False-None] 2.4825ms 2.2270ms 449.0326 Ops/s 413.0426 Ops/s $\textbf{\color{#35bf28}+8.71\%}$
test_reinforce_speed[False-backward] 3.7128ms 3.3358ms 299.7746 Ops/s 281.3354 Ops/s $\textbf{\color{#35bf28}+6.55\%}$
test_reinforce_speed[True-None] 1.5238ms 1.3155ms 760.1452 Ops/s 761.1327 Ops/s $\color{#d91a1a}-0.13\%$
test_reinforce_speed[True-backward] 3.1959ms 3.0395ms 328.9998 Ops/s 326.3374 Ops/s $\color{#35bf28}+0.82\%$
test_reinforce_speed[reduce-overhead-None] 16.1173ms 9.2060ms 108.6252 Ops/s 111.9801 Ops/s $\color{#d91a1a}-3.00\%$
test_reinforce_speed[reduce-overhead-backward] 1.7899ms 1.6564ms 603.7051 Ops/s 598.0257 Ops/s $\color{#35bf28}+0.95\%$
test_iql_speed[False-None] 9.4443ms 9.0176ms 110.8944 Ops/s 107.4418 Ops/s $\color{#35bf28}+3.21\%$
test_iql_speed[False-backward] 13.6829ms 13.0157ms 76.8304 Ops/s 74.5224 Ops/s $\color{#35bf28}+3.10\%$
test_iql_speed[True-None] 2.6769ms 2.2429ms 445.8612 Ops/s 424.4653 Ops/s $\textbf{\color{#35bf28}+5.04\%}$
test_iql_speed[True-backward] 5.6043ms 4.9463ms 202.1721 Ops/s 194.8266 Ops/s $\color{#35bf28}+3.77\%$
test_iql_speed[reduce-overhead-None] 0.4792s 12.5705ms 79.5514 Ops/s 96.4550 Ops/s $\textbf{\color{#d91a1a}-17.52\%}$
test_iql_speed[reduce-overhead-backward] 2.2365ms 2.0852ms 479.5763 Ops/s 502.1528 Ops/s $\color{#d91a1a}-4.50\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8210ms 6.1219ms 163.3489 Ops/s 160.9980 Ops/s $\color{#35bf28}+1.46\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7732ms 0.2687ms 3.7217 KOps/s 3.0872 KOps/s $\textbf{\color{#35bf28}+20.55\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6838ms 0.2456ms 4.0715 KOps/s 3.0227 KOps/s $\textbf{\color{#35bf28}+34.70\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3788ms 5.8498ms 170.9455 Ops/s 168.9649 Ops/s $\color{#35bf28}+1.17\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.1080ms 0.3719ms 2.6887 KOps/s 3.8356 KOps/s $\textbf{\color{#d91a1a}-29.90\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8214ms 0.3396ms 2.9449 KOps/s 4.1569 KOps/s $\textbf{\color{#d91a1a}-29.16\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7127ms 1.4337ms 697.4881 Ops/s 785.6862 Ops/s $\textbf{\color{#d91a1a}-11.23\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.7033ms 1.2512ms 799.2188 Ops/s 833.6573 Ops/s $\color{#d91a1a}-4.13\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3832ms 5.9843ms 167.1048 Ops/s 163.4684 Ops/s $\color{#35bf28}+2.22\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1969ms 0.5146ms 1.9432 KOps/s 2.0863 KOps/s $\textbf{\color{#d91a1a}-6.86\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9452ms 0.4653ms 2.1490 KOps/s 2.2882 KOps/s $\textbf{\color{#d91a1a}-6.08\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 9.6755ms 5.9941ms 166.8303 Ops/s 168.1982 Ops/s $\color{#d91a1a}-0.81\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.2232ms 0.3681ms 2.7168 KOps/s 3.3413 KOps/s $\textbf{\color{#d91a1a}-18.69\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 1.0913ms 0.2601ms 3.8446 KOps/s 3.1096 KOps/s $\textbf{\color{#35bf28}+23.64\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2394ms 5.8083ms 172.1668 Ops/s 169.4250 Ops/s $\color{#35bf28}+1.62\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9005ms 0.3250ms 3.0773 KOps/s 3.2102 KOps/s $\color{#d91a1a}-4.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5208ms 0.2815ms 3.5518 KOps/s 3.1458 KOps/s $\textbf{\color{#35bf28}+12.91\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4182ms 6.0121ms 166.3319 Ops/s 164.5024 Ops/s $\color{#35bf28}+1.11\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.3772ms 0.4801ms 2.0828 KOps/s 2.4240 KOps/s $\textbf{\color{#d91a1a}-14.08\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7243ms 0.4601ms 2.1737 KOps/s 2.5647 KOps/s $\textbf{\color{#d91a1a}-15.25\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.9082ms 5.3241ms 187.8245 Ops/s 182.0119 Ops/s $\color{#35bf28}+3.19\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.6557ms 2.0842ms 479.8003 Ops/s 434.3639 Ops/s $\textbf{\color{#35bf28}+10.46\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.2099ms 1.2011ms 832.5597 Ops/s 794.3557 Ops/s $\color{#35bf28}+4.81\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4632s 14.6264ms 68.3697 Ops/s 184.7674 Ops/s $\textbf{\color{#d91a1a}-63.00\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 10.7940ms 2.2390ms 446.6244 Ops/s 445.4647 Ops/s $\color{#35bf28}+0.26\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.3038ms 1.1669ms 856.9882 Ops/s 832.5244 Ops/s $\color{#35bf28}+2.94\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.0417ms 5.6251ms 177.7752 Ops/s 31.4631 Ops/s $\textbf{\color{#35bf28}+465.03\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.8847ms 2.1551ms 464.0067 Ops/s 443.6897 Ops/s $\color{#35bf28}+4.58\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.6608ms 1.4154ms 706.5220 Ops/s 732.9477 Ops/s $\color{#d91a1a}-3.61\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.0242ms 12.7447ms 78.4641 Ops/s 73.8197 Ops/s $\textbf{\color{#35bf28}+6.29\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 18.3065ms 16.6602ms 60.0231 Ops/s 59.0032 Ops/s $\color{#35bf28}+1.73\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 17.9287ms 17.4309ms 57.3694 Ops/s 55.0850 Ops/s $\color{#35bf28}+4.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.0784ms 17.0496ms 58.6522 Ops/s 58.2126 Ops/s $\color{#35bf28}+0.76\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 17.9498ms 17.3806ms 57.5354 Ops/s 54.7835 Ops/s $\textbf{\color{#35bf28}+5.02\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 19.2188ms 18.0653ms 55.3546 Ops/s 52.7185 Ops/s $\textbf{\color{#35bf28}+5.00\%}$
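The Change column appears to be the relative difference between the PR's throughput ("Ops") and the throughput on the repo HEAD; the rows above are consistent with that reading. A minimal sketch of the arithmetic, using three rows from the table (the helper name `percent_change` is made up for illustration, and both values must be in the same unit, Ops/s or KOps/s):

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops/s on the PR, Ops/s on repo HEAD), copied from the table rows above.
rows = {
    "test_cql_speed[False-None]": (60.9576, 59.1787),
    "test_rb_populate[...-ListStorage-SamplerWithoutReplacement-400]": (68.3697, 184.7674),
    "test_rb_populate[...PrioritizedReplayBuffer-ListStorage-None-400]": (177.7752, 31.4631),
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {percent_change(ops_pr, ops_head):+.2f}%")
```

The two largest swings (-63.00% and +465.03%) both come from rows whose Max time or HEAD throughput is far from the neighbouring rows, so they may reflect run-to-run noise rather than an effect of this change.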

Collaborator

@vmoens vmoens left a comment

Nice feature!
Do you think we could use Spec.enumerate() as a default? Presumably, envs that don't have a list of actions are also those where enumerate would fail.

@kurtamohler
Contributor Author

kurtamohler commented Feb 13, 2025

Ah I didn't know about enumerate! Yeah that would be a great way to implement it. It looks like the enumerate methods for Categorical/OneHot/etc. ignore the mask if it is set. So I think I should add an argument that will make it use the mask. Then I think ChessEnv and other envs that use action spec masks won't even need a specialized impl of all_actions
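To make the idea concrete, here is a rough sketch of a mask-aware `enumerate` and an `all_actions` default built on top of it. It uses plain Python lists instead of tensors so it runs without torchrl installed; `ToyCategorical` and the free function `all_actions` are hypothetical stand-ins, not the classes or signatures in this PR.

```python
from typing import List, Optional


class ToyCategorical:
    """Hypothetical stand-in for a categorical action spec (not torchrl's class)."""

    def __init__(self, n: int, mask: Optional[List[bool]] = None) -> None:
        if mask is not None and len(mask) != n:
            raise ValueError("mask length must equal the number of actions")
        self.n = n
        self.mask = mask

    def enumerate(self, use_mask: bool = False) -> List[int]:
        """List every action index; drop masked-out ones only if use_mask is set."""
        actions = list(range(self.n))
        if use_mask and self.mask is not None:
            actions = [a for a, allowed in zip(actions, self.mask) if allowed]
        return actions


def all_actions(spec: ToyCategorical) -> List[int]:
    """Assumed default: the legal actions are the masked enumeration of the spec."""
    return spec.enumerate(use_mask=True)


spec = ToyCategorical(5, mask=[True, False, True, False, False])
print(spec.enumerate())   # mask ignored, as in the current behaviour described above
print(all_actions(spec))  # mask applied
```

Keeping `use_mask=False` as the default leaves existing callers of `enumerate` unaffected, which matches the signature shown in the diff.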

@vmoens
Collaborator

vmoens commented Feb 13, 2025

> Ah I didn't know about enumerate! Yeah that would be a great way to implement it. It looks like the enumerate methods for Categorical/OneHot/etc. ignore the mask if it is set. So I think I should add an argument that will make it use the mask. Then I think ChessEnv and other envs that use action spec masks won't even need a specialized impl of all_actions

Oh yes they should definitely use the mask!

Have you seen the mask transform by the way? It automatically sets the mask for the action spec
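As a rough illustration of what "automatically sets the mask" can mean, the sketch below wraps a toy environment and copies a mask entry from each step's output into the action spec. Every name here (`ToySpec`, `ToyEnv`, `MaskSync`, the `"action_mask"` key) is hypothetical and chosen for the example; this is not torchrl's transform or its API.

```python
from typing import Dict, List


class ToySpec:
    """Hypothetical action spec that only stores a mask of legal actions."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.mask: List[bool] = [True] * n

    def update_mask(self, mask: List[bool]) -> None:
        if len(mask) != self.n:
            raise ValueError("mask length must equal the number of actions")
        self.mask = list(mask)


class ToyEnv:
    """Hypothetical env in which each action may be taken only once."""

    def __init__(self, n: int) -> None:
        self.action_spec = ToySpec(n)
        self.used = [False] * n

    def step(self, action: int) -> Dict[str, List[bool]]:
        self.used[action] = True
        # The env reports which actions remain legal after this step.
        return {"action_mask": [not u for u in self.used]}


class MaskSync:
    """Wrapper that copies the mask entry into the action spec after each step."""

    def __init__(self, env: ToyEnv, mask_key: str = "action_mask") -> None:
        self.env = env
        self.mask_key = mask_key

    def step(self, action: int) -> Dict[str, List[bool]]:
        out = self.env.step(action)
        self.env.action_spec.update_mask(out[self.mask_key])
        return out


env = MaskSync(ToyEnv(3))
env.step(1)
print(env.env.action_spec.mask)
```

With the spec kept in sync this way, a mask-aware enumeration of the spec would always reflect the current legal actions.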

kurtamohler added a commit to kurtamohler/torchrl that referenced this pull request Feb 13, 2025
ghstack-source-id: 9ee2183
Pull Request resolved: pytorch#2780
[ghstack-poisoned]
kurtamohler added a commit that referenced this pull request Feb 13, 2025
ghstack-source-id: 7abf9d4
Pull Request resolved: #2780
@kurtamohler
Contributor Author

> Have you seen the mask transform by the way?

Nope, I hadn't seen it until now. That's cool!

@kurtamohler kurtamohler changed the title [Feature] Add EnvBase.all_actions and impl for ChessEnv [Feature] Add EnvBase.all_actions Feb 13, 2025
-    def enumerate(self) -> torch.Tensor:
+    def enumerate(self, use_mask: bool = False) -> torch.Tensor:
+        if use_mask:
+            raise NotImplementedError
Contributor Author

@kurtamohler kurtamohler Feb 13, 2025

I left this unimplemented for OneHot for now, just because I want to focus on getting MCTS working with ChessEnv. I'll submit a followup PR in the near future to add it--or if you'd rather I just add it to this PR, I'm happy to do that
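For reference, one way the one-hot case could look in a follow-up, sketched with plain lists so it runs without torch: enumerate the rows of an identity matrix and keep only those whose index is allowed by the mask. The function name `enumerate_one_hot` and its signature are assumptions for illustration, not the eventual implementation.

```python
from typing import List, Optional


def enumerate_one_hot(n: int, mask: Optional[List[bool]] = None) -> List[List[int]]:
    """Return all one-hot vectors of length n, optionally filtered by a mask."""
    # Row i of the identity matrix is the one-hot encoding of action i.
    rows = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
    if mask is None:
        return rows
    if len(mask) != n:
        raise ValueError("mask length must equal the number of actions")
    return [row for row, allowed in zip(rows, mask) if allowed]


print(enumerate_one_hot(3))
print(enumerate_one_hot(3, mask=[True, False, True]))
```

A tensor version would follow the same shape: build the identity matrix, then index its rows with the boolean mask.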

Collaborator

@vmoens vmoens left a comment

LGTM!

@vmoens vmoens added the enhancement New feature or request label Feb 14, 2025
@vmoens vmoens merged commit f1bb16d into gh/kurtamohler/3/base Feb 14, 2025
64 of 74 checks passed
@vmoens vmoens deleted the gh/kurtamohler/3/head branch February 14, 2025 08:21
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request

4 participants