webui: Client-side implementation of tool calling (with two tools) #18059
Conversation
Hey, thanks a lot for your contribution! Will take a closer look at this when also reviewing and testing #17487. These are in a way intertwined changes and we need to be thoughtful when merging them to the master branch.
This approach is interesting, but for a cleaner and more extensible codebase, it would be better to implement proper MCP protocol support first. This means providing small example MCP servers in Python (calculator, web browsing, etc.); then nothing would prevent us from adding local JavaScript execution as an additional MCP tool, using the same unified context management architecture.
MCP is a much higher bar to clear, and I don't see it as a replacement for this. Client-side tools immediately benefit everyone, whereas MCP is a much more advanced, much more niche technology. It certainly gets lots of hype because it can do cool things, but having tools that exist purely in the chat app lets everyone instantly give their models access to a code interpreter and calculator with no additional configuration, no additional services, etc. One could even see MCP as a subset of this client-side tool registry: there could be an MCP tool which lets models interact with configured MCP servers to discover which tools they offer and then run those tools through the MCP tool plugin. Presenting tools over MCP that are just built into the client feels like only having a hammer and seeing every problem as a nail. Tool calls are the first-level abstraction here, and MCP is an abstraction on top of that. There's no need to use MCP to access tool calls that are part of the application itself; MCP is meant for connecting with external systems.
I think there's a misunderstanding. I'm not suggesting we force the local JS tool through MCP: that would indeed be overengineering.
Mine does exactly that? I recommend watching the video titled "Code interpreter and calculator, including the model recovering from a syntax error in its first code interpreter attempt", in which the model calls multiple tools in response to a single prompt. Note that this PR does not implement a limit on the number of tool calls a model can make, instead leaving that up to the model and the user's patience.
I agree that would make sense
This is where you've lost me... this PR provides a loosely coupled tool registry. I designed it to be very modular and extensible. People should be able to add additional tools without editing the settings dialog box, without editing the agentic loop -- without editing anything outside of their tool, other than adding an import statement for the registry to be aware of the new tool.
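A minimal sketch of the self-registering shape described above; the identifiers (`ToolDefinition`, `registerTool`, `toolRegistry`) are hypothetical, not the PR's actual code:

```ts
// Illustrative sketch of a self-registering tool module; these names are
// hypothetical and not necessarily the ones used in this PR.

export interface ToolDefinition {
  name: string;                          // tool name exposed to the model
  description: string;                   // sent in the tools schema
  parameters: Record<string, unknown>;   // JSON Schema for the arguments
  settings?: Record<string, unknown>;    // per-tool defaults, e.g. { timeoutMs: 5000 }
  execute(args: Record<string, unknown>): Promise<string>;
}

export const toolRegistry = new Map<string, ToolDefinition>();

export function registerTool(tool: ToolDefinition): void {
  toolRegistry.set(tool.name, tool);
}

// A tool module registers itself as a side effect of being imported, so
// enabling a new tool is a single import statement elsewhere:
//
//   import './tools/my-new-tool';
registerTool({
  name: 'calculator',
  description: 'Evaluate a mathematical expression and return the result.',
  parameters: {
    type: 'object',
    properties: { expression: { type: 'string' } },
    required: ['expression'],
  },
  async execute(args) {
    // Placeholder: a real tool would parse and evaluate the expression.
    return `result of ${String(args['expression'])}`;
  },
});
```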
You're right, I missed the recursive loop in processToolCallsAndContinue. That's actually really clean.
In practice, I really enjoy being able to run/benchmark my models on a code interpreter to study algorithms or accurately solve complex math problems. But realistically, this remains useful for a minority of users.
Same here, I haven't had time to test your PR yet. The goal is to think about what we should implement in llama.cpp. A JS sandbox is obviously one of the ultimate toys for an LLM, and it's clear we already had the idea to do it, but there's been a lot of refactoring and more urgent things. Did you see that we already have a button to execute an HTML/JS code block directly? It doesn't have context enrichment capability, though; we didn't want to complicate the code. But if we properly standardize context management, it will be simple to add a "virtual MCP tool" that can be enabled with a checkbox, in the right place in the code. I need to study and test what you've done.

Technically, what I had in mind for a good JS sandbox is an editing layer that works intuitively with all models (what Anthropic Claude does on their site), slightly modified as "create_file / str_replace / view / exec_file", where the output enriches the context. And there you have a scientific tool in the browser; you don't even need a calculator, the LLM produces it :)

CORS can be bypassed by manipulating the browser, but that's really not great for security. In fact, we're limited to 127.0.0.1 (the immediate local client) but also, which is very powerful, the origin site hosting the SPA, meaning the llama.cpp backend itself. And there, imagine what we can do :)
I was thinking along these same lines earlier today, but it is important to tie the code interpreter state to the message tree, so that there is no cross-contamination between branches of messages or if the user reverts to an earlier point in the message tree (by clicking edit or regenerate). I have some ideas for how this could be done, but I haven't had time to fully explore it, and figured that might be a good follow-up PR if this one were merged. |
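One way such tree-scoped interpreter state could look -- a sketch under the assumption that messages have stable ids, not something this PR implements:

```ts
// Illustrative: key interpreter state by message id and copy the parent's
// state when a new branch is created, so sibling branches never observe
// each other's mutations.

const interpreterState = new Map<string, Map<string, unknown>>();

function forkStateForBranch(
  parentMsgId: string,
  childMsgId: string
): Map<string, unknown> {
  const inherited = new Map(interpreterState.get(parentMsgId) ?? []);
  interpreterState.set(childMsgId, inherited); // copy-on-branch
  return inherited;
}
```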
One thought around this that's worth noting... I don't run …, but changing …
I have a complete MCP backend implementation if you're curious. It exposes the same OpenAI-compatible interface and handles everything in the background, like industry-standard LLM websites. I just need to add Anthropic API compatibility (not much to change). And it manages a Podman sandbox for producing, debugging, and running code in any language: clone this repository, add this feature, compile, push, or send me a link. The tools are "isomorphic" to Claude's; with GLM 4.5 Air / GLM 4.6, it offers the same level of quality!

Real-life use case: from my phone and a simple OpenAI-compatible client, I cloned, patched, and recompiled the Jellyfin server (.NET). It gave me a .DLL (for Linux!), and I patched my other server, all locally, in 10 minutes, using GLM 4.5 Air. Doing it manually would have taken me several hours because I've never worked with .NET before.

I wanted to attach it to the MCP client later, along with a sandboxed Firefox browser, but we need to start with small, simple, ready-to-use examples, and your JavaScript sandbox would have higher priority; we need to add it to the MCP client as a checkbox :)

Integrating a basic MCP relay/proxy/bridge in pure C++ into the backend shouldn't be daunting for those who have already integrated a routing system like "llama-swap" into the same binary. It's just another cpp-httplib job!
This PR allows the webui to give models access to two tools: a calculator and a code interpreter. The calculator is a simple expression calculator, used to enhance math abilities. The code interpreter runs arbitrary JavaScript in a (relatively isolated) Web Worker and returns the output to the model, which can be used for more advanced analysis.
This PR also lays the groundwork for a modular tool system, such that one could easily imagine adding a Canvas tool or a Web Search tool.
When an assistant message emits tool calls, the web UI executes each requested tool, appends the results to the conversation, and continues generation, repeating until the model responds without further tool calls.
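A rough sketch of that loop; `processToolCallsAndContinue` is the name mentioned in the review discussion, but the body below (and the `sendChatCompletion` helper) is an illustration, not the PR's actual implementation. It reuses the hypothetical `toolRegistry` from the registry sketch above:

```ts
// Sketch of the recursive tool loop; illustrative only.

interface ToolCall { id: string; name: string; arguments: string }
interface ChatMessage { role: string; content: string; tool_call_id?: string }

// Assumed helper that performs one /v1/chat/completions round trip.
declare function sendChatCompletion(
  messages: ChatMessage[]
): Promise<{ assistantMessage: ChatMessage; toolCalls?: ToolCall[] }>;

async function processToolCallsAndContinue(messages: ChatMessage[]): Promise<void> {
  const response = await sendChatCompletion(messages);
  messages.push(response.assistantMessage);

  if (!response.toolCalls?.length) return; // final answer, stop recursing

  for (const call of response.toolCalls) {
    const tool = toolRegistry.get(call.name);
    const result = tool
      ? await tool.execute(JSON.parse(call.arguments))
      : `unknown tool: ${call.name}`;
    messages.push({ role: 'tool', content: result, tool_call_id: call.id });
  }

  // Let the model see the tool results and continue generating.
  await processToolCallsAndContinue(messages);
}
```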
Included tools
- Calculator: a simple expression calculator, used to enhance math abilities.
- Code interpreter: runs arbitrary JavaScript in a Web Worker and returns the console output + the final evaluated value, with improved error reporting (line/column/snippet).
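A minimal sketch of how such a Web Worker interpreter with a timeout might look; the worker script and message shape here are assumptions for illustration:

```ts
// Illustrative Web Worker sandbox: captures console output plus the final
// evaluated value, and terminates the worker on timeout.

function runInWorker(code: string, timeoutMs: number): Promise<string> {
  const workerSrc = `
    self.onmessage = (e) => {
      const logs = [];
      console.log = (...args) => logs.push(args.join(' '));
      try {
        const value = eval(e.data);          // model-authored JavaScript
        self.postMessage({ logs, output: String(value) });
      } catch (err) {
        self.postMessage({ logs, output: 'error: ' + err });
      }
    };
  `;
  const blob = new Blob([workerSrc], { type: 'text/javascript' });
  const worker = new Worker(URL.createObjectURL(blob));

  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      worker.terminate();                    // hard stop for runaway code
      resolve('error: execution timed out');
    }, timeoutMs);
    worker.onmessage = (e) => {
      clearTimeout(timer);
      worker.terminate();
      resolve([...e.data.logs, e.data.output].join('\n'));
    };
    worker.postMessage(code);
  });
}
```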
UX changes
Tool calls are shown inline within the assistant message (arguments + result + timing) to avoid extra message bubbles.
Configuration & extensibility
The settings dialog gains a Tools section (toggles + per-tool fields like timeout), and defaults are derived from tool registrations.
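Building on the registry sketch above, deriving defaults from registrations could look roughly like this (illustrative, assuming each tool carries a `settings` object):

```ts
// Initial settings derived from whatever tools registered themselves, so
// adding a tool never requires editing the settings dialog.

function deriveToolDefaults(): Record<string, unknown> {
  const defaults: Record<string, unknown> = {};
  for (const tool of toolRegistry.values()) {
    defaults[`tool_${tool.name}_enabled`] = false;      // opt-in by default
    for (const [key, value] of Object.entries(tool.settings ?? {})) {
      defaults[`tool_${tool.name}_${key}`] = value;     // e.g. timeoutMs
    }
  }
  return defaults;
}
```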
Tests
Tests cover reactivity/regressions, etc. These tests were created as bugs were encountered. I would be perfectly fine with throwing most of them away, but I figured there was no harm in including them.
Videos
Calculator tool
Screen.Recording.2025-12-15.at.8.10.04.AM.mov
Code Interpreter tool
Screen.Recording.2025-12-15.at.8.11.08.AM.mov
Code interpreter and calculator, including the model recovering from a syntax error in its first code interpreter attempt
Screen.Recording.2025-12-15.at.8.14.48.AM.mov
Demonstrating how tool calling works for an Instruct model
Screen.Recording.2025-12-15.at.8.12.32.AM.mov
Demonstrating how the regenerate button will correctly treat the entire response as one message, instead of regenerating just the last segment after the last tool call.
Screen.Recording.2025-12-15.at.8.39.48.AM.mov
Deleting an entire response
Screen.Recording.2025-12-15.at.8.53.48.AM.mov
Screenshots
New Settings Screen for Tools
Known Bugs