I couldn't find it in the docs, and perhaps I'm overcomplicating it, but it would be great to understand how to use vision models with GoLLM.
For example, how would you provide an image to Claude Sonnet 4, or to a vision-capable Ollama model, for analysis or conversion to text?