
[User model][tracker] Improve compilation of PPO model (Stable-baselines3) #93697

@msaroufim

Description


Setup

pip install stable-baselines3[extra]

Repro

from stable_baselines3 import PPO
import torchdynamo
import time

# Compile the entire PPO training run with TorchDynamo + Inductor.
@torchdynamo.optimize("inductor")
def train():
    model = PPO("MlpPolicy", "CartPole-v1").learn(10_000)

# Time the compiled training run.
tic = time.time()
train()
toc = time.time()
print(toc - tic)
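
One way to narrow this down (not part of the original report, just a sketch that assumes SB3's ActorCriticPolicy forward signature) is to compile only the policy's forward pass instead of the whole .learn() loop, so graph breaks coming from the model itself can be separated from the ones triggered by the training-loop machinery:

import torch
import torchdynamo
from stable_baselines3 import PPO

# Build the model eagerly; only the policy forward pass is compiled below.
model = PPO("MlpPolicy", "CartPole-v1")

@torchdynamo.optimize("inductor")
def policy_forward(obs):
    # SB3's ActorCriticPolicy.forward returns (actions, values, log_probs).
    return model.policy(obs)

# CartPole-v1 observations are 4-dimensional; use a batch of one for the check.
obs = torch.zeros(1, 4)
print(policy_forward(obs))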

Running the repro currently fails with lots of assertion errors.
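
To make the logs below more actionable, the full traceback of the first failure can be captured with just the standard library (train is the repro function above; treat this as an illustrative sketch rather than part of the original report):

import traceback

try:
    train()
except Exception:
    # Dump the complete stack trace of the first failure so the failing
    # Dynamo frame can be pasted into the logs below. Depending on the
    # installed version, setting the TORCHDYNAMO_VERBOSE=1 environment
    # variable may also produce a more detailed internal trace.
    traceback.print_exc()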

Logs

Current state of things (feel free to edit as we fix things):
#93697 (comment)

cc @ezyang @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @mruberry @rgommers @wconstab @zou3519 @aakhundov @soumith @ngimel


Labels

dynamo-must-fix, dynamo-user-empathy-day, module: dynamo, oncall: pt2, triaged
