This will likely require its own device function in the future, as I suspect it will be implemented similarly to conv_forward, i.e. just looping over the inputs. Can this reuse conv_forward/backward?
Also, should there just be a single pool method that takes an enum { MaxPool, AvgPool, ... }?
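For illustration, here is a minimal CPU sketch of what a single enum-driven pool method could look like. It assumes a single-channel, row-major input with no padding, and the names `PoolMode` and `pool_forward` are hypothetical (not part of the existing code). It shows that pooling uses the same window loop nest as a direct convolution and differs only in the per-window reduction, which is why one entry point with a mode enum seems plausible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical pooling modes; names are illustrative only.
enum class PoolMode { Max, Avg };

// Hypothetical CPU reference for a single-channel 2D pooling forward pass.
// Input: row-major h x w array. Window: k x k. Stride: s. No padding.
std::vector<float> pool_forward(const std::vector<float>& in,
                                std::size_t h, std::size_t w,
                                std::size_t k, std::size_t s,
                                PoolMode mode) {
    assert(k >= 1 && s >= 1 && k <= h && k <= w && in.size() == h * w);
    const std::size_t out_h = (h - k) / s + 1;
    const std::size_t out_w = (w - k) / s + 1;
    std::vector<float> out(out_h * out_w);

    // Same loop nest as a direct convolution: for each output position,
    // walk the k x k input window it covers.
    for (std::size_t oy = 0; oy < out_h; ++oy) {
        for (std::size_t ox = 0; ox < out_w; ++ox) {
            float acc = (mode == PoolMode::Max)
                            ? -std::numeric_limits<float>::infinity()
                            : 0.0f;
            for (std::size_t ky = 0; ky < k; ++ky) {
                for (std::size_t kx = 0; kx < k; ++kx) {
                    const float v = in[(oy * s + ky) * w + (ox * s + kx)];
                    // Only the reduction depends on the mode; a convolution
                    // would instead accumulate v * weight here.
                    acc = (mode == PoolMode::Max) ? std::max(acc, v) : acc + v;
                }
            }
            if (mode == PoolMode::Avg) {
                acc /= static_cast<float>(k * k);  // mean over the window
            }
            out[oy * out_w + ox] = acc;
        }
    }
    return out;
}
```

A real kernel would also need batch/channel dimensions and padding, and max pooling would typically record (or recompute) the argmax positions for the backward pass, which is one place where it diverges from conv_backward.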