
Conversation

@coder543

coder543 commented Dec 15, 2025

This PR allows the webui to give models access to two tools: a calculator and a code interpreter. The calculator is a simple expression calculator, used to enhance math abilities. The code interpreter runs arbitrary JavaScript in a (relatively isolated) Web Worker and returns the output to the model, enabling more advanced analysis.
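Roughly, the kind of constrained evaluation the calculator needs looks like the following sketch (illustrative only, not the exact code in this PR; the whitelist and the new Function approach are simplifications, and a real implementation may use a proper parser):

```ts
// Illustrative sketch of a constrained expression calculator (not the PR's code).
// Only digits, basic operators, parentheses, and whitelisted Math.* names are accepted.
const ALLOWED_NAMES = new Set(['sqrt', 'abs', 'pow', 'sin', 'cos', 'tan', 'log', 'min', 'max', 'PI', 'E']);

export function evaluateExpression(expr: string): number {
  // Reject anything outside the expected math syntax up front.
  if (!/^[0-9+\-*\/%(),.\sA-Za-z_]+$/.test(expr)) {
    throw new Error('Expression contains unsupported characters');
  }
  // Every identifier must be one of the whitelisted Math members.
  for (const name of expr.match(/[A-Za-z_]+/g) ?? []) {
    if (!ALLOWED_NAMES.has(name)) {
      throw new Error(`Unknown identifier: ${name}`);
    }
  }
  // Rewrite bare names to Math.* and evaluate in an otherwise empty strict scope.
  const rewritten = expr.replace(/[A-Za-z_]+/g, (m) => `Math.${m}`);
  const result = new Function(`"use strict"; return (${rewritten});`)() as unknown;
  if (typeof result !== 'number' || !Number.isFinite(result)) {
    throw new Error('Expression did not evaluate to a finite number');
  }
  return result;
}
```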

This PR also lays the groundwork for a modular tool system, such that one could easily imagine adding a Canvas tool or a Web Search tool.

AI Disclosure: I spent about 8 hours yesterday developing this with significant assistance from AI. I'm perfectly capable of writing this kind of frontend code myself when I have the time, but this was just a fun project. I'm sharing this PR because the result generally works well, even though I have not yet had time to ensure that all of the code meets my quality standards. I chose to share it anyway because tool calling is an essential feature that has been missing so far, and this implementation results in an elegant, effective user experience. I may have more time to carefully review the code changes in the near future, in which case I will update this description and the PR as needed, but I figured there was no harm in making this available in case other people are interested in having tool calling in their llama-server webui.

When an assistant message emits tool calls, the web UI...

  • Executes any enabled tools locally in the browser
  • Persists the results as role: tool messages linked via tool_call_id (including execution duration)
  • Automatically continues generation with a follow-up completion request that includes the tool outputs (the message shapes involved are sketched below)
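These steps map onto the OpenAI-compatible chat format that llama-server already speaks. A hypothetical round trip, with invented values rather than the exact objects the webui builds:

```ts
// Hypothetical messages for one tool round trip (values invented for illustration).
const messages = [
  { role: 'user', content: 'What is 289 / 3?' },
  {
    role: 'assistant',
    content: null,
    tool_calls: [
      {
        id: 'call_1', // the id the follow-up tool message must reference
        type: 'function',
        function: { name: 'calculator', arguments: '{"expression":"289 / 3"}' },
      },
    ],
  },
  {
    // Persisted by the webui and linked back to the originating call.
    role: 'tool',
    tool_call_id: 'call_1',
    content: '96.33333333333333',
  },
];

// A second /v1/chat/completions request is then sent with this extended messages
// array (plus the tool definitions), so the model can continue from the tool output.
```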

Included tools

  • Calculator: evaluates a constrained math expression syntax (operators + selected Math.* functions/constants).
  • Code Interpreter (JavaScript): runs arbitrary JS in a Web Worker with a configurable timeout, capturing console
    output + the final evaluated value, with improved error reporting (line/column/snippet); a sketch of the Web
    Worker approach follows this list.
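The Web Worker approach, reduced to an illustrative sketch (names and details here are simplified and are not the PR's actual code):

```ts
// Illustrative sketch: run model-provided JS in a Web Worker with a timeout,
// capturing console output and the final evaluated value (not the PR's actual code).
export function runInWorker(code: string, timeoutMs: number): Promise<string> {
  const workerSource = `
    const logs = [];
    console.log = (...args) => logs.push(args.join(' '));
    onmessage = (e) => {
      try {
        const value = eval(e.data);
        postMessage({ logs, value: String(value) });
      } catch (err) {
        postMessage({ logs, error: String(err) });
      }
    };
  `;
  const url = URL.createObjectURL(new Blob([workerSource], { type: 'text/javascript' }));
  const worker = new Worker(url);

  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      worker.terminate(); // hard stop for runaway code
      reject(new Error(`Timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    worker.onmessage = (e) => {
      clearTimeout(timer);
      worker.terminate();
      const { logs, value, error } = e.data;
      resolve(error ? `Error: ${error}` : [...logs, `=> ${value}`].join('\n'));
    };
    worker.postMessage(code);
  });
}
```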

UX changes

  • Collapses assistant→tool→assistant chains into a single assistant “reasoning” thread and renders tool calls inline
    (arguments + result + timing) to avoid extra message bubbles.
    • This is probably where most of the complexity in this PR lives, but it is essential to a good UX. The simplest possible implementation created a message bubble as the model started reasoning, then a separate bubble for each tool call, then another bubble as the model continued reasoning, and so on; it was essentially unusable. Having the UI layer collapse all of these related messages into one continuous message mirrors the experience users expect. A rough sketch of the grouping idea follows this list.
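The grouping idea, reduced to a sketch with invented types (the real rendering code is more involved, since it also has to handle streaming and regeneration):

```ts
// Illustrative sketch: merge assistant -> tool -> assistant chains into one display
// unit so the UI renders a single continuous bubble with tool calls shown inline.
type ChatMessage =
  | { role: 'user' | 'assistant'; content: string }
  | { role: 'tool'; tool_call_id: string; content: string };

type DisplayTurn =
  | { kind: 'user'; content: string }
  | { kind: 'assistant'; segments: ChatMessage[] }; // assistant text + inline tool results

export function collapseTurns(messages: ChatMessage[]): DisplayTurn[] {
  const turns: DisplayTurn[] = [];
  for (const msg of messages) {
    const last = turns[turns.length - 1];
    if (msg.role === 'user') {
      turns.push({ kind: 'user', content: msg.content });
    } else if (last && last.kind === 'assistant') {
      last.segments.push(msg); // extend the current assistant turn
    } else {
      turns.push({ kind: 'assistant', segments: [msg] });
    }
  }
  return turns;
}
```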

Configuration & extensibility

  • Introduces a small tool registry so tools self-register with their schema + settings; the Settings UI auto-populates
    a Tools section (toggles + per-tool fields like timeout), and defaults are derived from tool registrations. A rough
    sketch of the registry shape follows.
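A rough sketch of what such a registry can look like (hypothetical interface; the real registry.ts may differ in names and details):

```ts
// Illustrative sketch of the self-registration pattern (the real registry.ts may differ).
// Tool modules call registerTool() at import time; the Settings UI and the agentic
// loop only ever consume the registry, never individual tools.
export interface ToolDefinition {
  name: string;                        // also used as the function name advertised to the model
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the tool's arguments
  settings?: { enabledByDefault: boolean; fields?: Record<string, number | string> };
  execute(args: Record<string, unknown>): Promise<string>;
}

const tools = new Map<string, ToolDefinition>();

export function registerTool(tool: ToolDefinition): void {
  tools.set(tool.name, tool);
}

export function enabledTools(enabled: Record<string, boolean>): ToolDefinition[] {
  return [...tools.values()].filter(
    (t) => enabled[t.name] ?? t.settings?.enabledByDefault ?? false
  );
}
```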

Tests

  • Adds unit + browser/e2e coverage for interpreter behavior, inline tool rendering, timeout settings UI, streaming
    reactivity/regressions, etc. These tests were created when bugs were encountered. I would be perfectly fine with throwing most of them away, but I figured there was no harm in including them.

Videos

Calculator tool

Screen.Recording.2025-12-15.at.8.10.04.AM.mov

Code Interpreter tool

Screen.Recording.2025-12-15.at.8.11.08.AM.mov

Code interpreter and calculator, including the model recovering from a syntax error in its first code interpreter attempt

Screen.Recording.2025-12-15.at.8.14.48.AM.mov

Demonstrating how tool calling works for an Instruct model

Screen.Recording.2025-12-15.at.8.12.32.AM.mov

Demonstrating how the regenerate button will correctly treat the entire response as one message, instead of regenerating just the last segment after the last tool call.

Screen.Recording.2025-12-15.at.8.39.48.AM.mov

Deleting an entire response

Screen.Recording.2025-12-15.at.8.53.48.AM.mov

Screenshots

New Settings Screen for Tools

(screenshot)

Known Bugs

  1. The delete confirmation dialog reports the number of underlying messages that will be deleted, but the user would expect to be deleting only "one" message.
  2. Sometimes the server reports an error in the input stream after a tool call; I haven't been able to reproduce this reliably.

@allozaur
Collaborator

Hey, thanks a lot for your contribution! Will take a closer look at this when also reviewing and testing #17487. These changes are intertwined to some extent, and we need to be thoughtful when merging changes to the master branch.

@ServeurpersoCom
Collaborator

This approach is interesting, but for a cleaner and more extensible codebase, it would be better to implement proper MCP protocol support first. This means providing small example MCP servers in Python (calculator, web browsing, etc.), and then nothing would prevent us from adding local JavaScript execution as an additional MCP tool, using the same unified context management architecture.
#17487 implements a full MCP client with protocol-compliant transport layers (WebSocket + Streamable HTTP), proper JSON-RPC 2.0 messaging, and an agentic orchestrator that can work with any MCP server.

@coder543
Author

coder543 commented Dec 16, 2025

MCP is a much higher bar to clear, and I don't see it as a replacement for this. Client-side tools immediately benefit everyone, whereas MCP is a much more advanced, much more niche technology. It certainly gets lots of hype because it can do cool things, but having tools that exist purely in the chat app allows everyone to instantly give their models access to a code interpreter and calculator with no additional configuration, no additional services, etc.

One could even see MCP as a subset of this client side tool registry: there could be an MCP tool which lets models interact with configured MCP servers to discover which tools they offer and then run those tools through the MCP tool plugin.

Presenting tools over MCP that are just built into the client feels like only having a hammer and seeing every problem as a nail. Tool calls are the first level abstraction here, and MCPs are an abstraction on top of that. There’s no need to use MCP to access the tool calls that are part of the application itself; MCP is meant for connecting with external systems.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 16, 2025

I think there's a misunderstanding. I'm not suggesting we force local JS tools through MCP: that would indeed be over-engineering.
My point is about code integration and agentic orchestration architecture. Your PR handles one tool execution cycle, but doesn't implement a true agentic loop (multi-turn tool->LLM->tool->LLM with state management, context protection, turn limits, etc.).
Rather than building two separate orchestrators (one for JS tools, one for external integrations later), we could have one unified agentic loop that accepts tools from any source, your JS Workers included.
The architecture question is: do we want modular tool plugins that plug into a shared orchestrator, or tightly coupled per-tool-type orchestration logic and higher code complexity?

@coder543
Author

multi-turn tool->LLM->tool->LLM

Mine does exactly that? I recommend looking at the video titled "Code interpreter and calculator, including the model recovering from a syntax error in its first code interpreter attempt", in which the model calls multiple tools in response to a single prompt. This PR does not implement a limit on the number of tool calls a model can make, though, instead leaving that up to the model and the user's patience.
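For context, the loop in question is the usual tool -> LLM -> tool cycle. A generic sketch, not this PR's processToolCallsAndContinue, with complete() and executeTool() as hypothetical stand-ins and a round cap included only to show where such a limit would go:

```ts
// Generic sketch of a multi-turn agentic loop (not this PR's implementation).
type ToolCall = { id: string; function: { name: string; arguments: string } };
type Message = { role: string; content?: string | null; tool_call_id?: string; tool_calls?: ToolCall[] };

// Hypothetical stand-ins for the chat completion request and local tool dispatch.
declare function complete(messages: Message[]): Promise<Message>;
declare function executeTool(call: ToolCall): Promise<string>;

export async function runAgenticLoop(messages: Message[], maxRounds = Infinity): Promise<Message[]> {
  for (let round = 0; round < maxRounds; round++) {
    const reply = await complete(messages); // one /v1/chat/completions call
    messages.push(reply);
    const calls = reply.tool_calls ?? [];
    if (calls.length === 0) break;          // model finished without requesting tools
    for (const call of calls) {
      const result = await executeTool(call);
      messages.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
  return messages;
}
```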

we could have one unified agentic loop that accepts tools from any source

I agree that would make sense

The architecture question is: do we want modular tool plugins that plug into a shared orchestrator, or tightly coupled per-tool-type orchestration logic and higher code complexity?

This is where you've lost me... this PR provides a loosely coupled tool registry (tools/server/webui/src/lib/services/tools/registry.ts). The agentic loop is not aware of any specific tools. The tools are defined in their own separate modules, like the calculator (tools/server/webui/src/lib/services/tools/calculator.ts).

I designed it to be very modular and extensible. People should be able to add additional tools without editing the settings dialog box, without editing the agentic loop -- without editing anything outside of their tool, other than adding an import statement for the registry to be aware of the new tool. Adding an import statement in tools/server/webui/src/lib/services/tools/index.ts is the only shared code that has to be touched when adding a new tool here. Otherwise, tools are completely independent of the existing code base.
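To make the "only an import" point concrete, here is what a hypothetical new tool module could look like, assuming a registerTool() helper along the lines described above (the tool itself is invented for illustration):

```ts
// Hypothetical new tool: current UTC time. With a registry like the one sketched above,
// the only shared change needed is importing this module from tools/index.ts so that
// the registerTool() call below runs as a side effect.
import { registerTool } from './registry';

registerTool({
  name: 'current_time',
  description: 'Returns the current date and time in UTC (ISO 8601).',
  parameters: { type: 'object', properties: {}, required: [] },
  settings: { enabledByDefault: true },
  async execute(): Promise<string> {
    return new Date().toISOString();
  },
});
```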

@ServeurpersoCom
Collaborator

You're right, I missed the recursive loop in processToolCallsAndContinue. That's actually really clean.
Genuine question though: beyond calculator and sandboxed code execution, what can client-side JS tools realistically do that's useful for LLMs? Everything interesting (filesystem access, web scraping, external APIs, database queries, RAG) hits either CORS restrictions or browser security boundaries.
That's the core reason I went with MCP servers as backend processes. Not because of protocol preference, but because the browser sandbox fundamentally limits what tools can accomplish.

@ServeurpersoCom
Collaborator

In practice, I really enjoy being able to run/benchmark my models on a code interpreter to study algorithms or accurately solve complex math problems. But realistically, this remains useful for a minority of users.
Most production use cases need external integrations: databases, RAG, filesystems, APIs, web scraping. That's where the browser sandbox becomes the bottleneck. And shipping a pre-configured Python or binary MCP server is really not a problem. Users click "enable" in the UI, it launches .exe in the background, done. Zero manual configuration needed

@coder543
Author

I agree that CORS does limit things pretty significantly, and that's one reason I see value in having MCP (even though I consider that more difficult precisely because of the requirement for something outside of the browser).

To directly address the question, one important tool that doesn't need CORS that I haven't implemented is a Canvas. A shared context that the model can edit and update, which can render as an HTML document view, a table, a markdown document, etc. This could be done entirely in a client-side tool, although that would require some kind of frontend support so that the plugin has somewhere to put the canvas content.

Another tool that such a plugin could provide without CORS would be a memory system, where the LLM is allowed to view and edit "memories" that persist between conversations.

FWIW, some public APIs will allow CORS from anything:

(screenshot)

But, I agree that APIs like this are few and far between.

In an ideal world, I think APIs that are intended for agentic usage would ensure that their CORS allows being called directly from any origin.

The good news here is that if someone wants to use a CORS proxy, this tool registry plugin architecture has no trouble supporting a tool that calls APIs through the CORS-friendly proxy. Similarly, one could imagine an MCP tool in this registry which uses an MCP proxy to speak to MCPs that don't set their CORS headers in the right way.

But, I have no issue with the MCP concept being integrated more tightly than this tool registry – it could be considered a special tool that the current plugin architecture isn't suitable for. (Although I can envision ways to work around that if we really wanted to stick to just the plugin architecture in this PR.)

I'm not sure how these two PRs should be reconciled. I'm just presenting a solution that I think is minimally disruptive while bringing some real value. Giving LLMs access to a code interpreter is almost like giving them a superpower when they're dealing with challenging math-heavy problems. Whether llama.cpp adopts my PR or not doesn't really matter to me as long as we can get these two client-side tools into webui somehow. webui has become my default local LLM interface over the past several months, but the LLMs really struggle without these basic tools.

If people want to merge this PR: great. If people want to take pieces from it: that's great too. If people decide they don't want any part of this PR, I'm also fine with that, but I have been enjoying using this PR on my own system.

I have not had time to review or test your PR, so it could be great, and maybe it is the better solution – I do not know the answer.

@ServeurpersoCom
Collaborator

Same here, I haven't had time to test your PR yet. The goal is to think about what we should implement in llama.cpp. A JS sandbox is obviously one of the ultimate toys for an LLM, and we already had the idea to do it, but there's been a lot of refactoring and more urgent things. Did you see that we already have a button to execute an HTML/JS code block directly? It doesn't have context enrichment capability, though; we didn't want to complicate the code. But if we properly standardize context management, it will be simple to add a "virtual MCP tool" that can be enabled with a checkbox, in the right place in the code. I need to study and test what you've done. Technically, what I had in mind for a good JS sandbox is a layer for editing that works very well intuitively with all models (what Anthropic Claude does on their site), slightly modified as "create_file / str_replace / view / exec_file" and the output enriches the context. And there you have a scientific tool in the browser, you don't even need a calculator, the LLM produces it :)

CORS can be bypassed by manipulating the browser, but that's really not great for security. In practice, we're limited to 127.0.0.1 (the immediate local client), but also, and this is very powerful, to the origin site hosting the SPA, meaning the llama.cpp backend itself. And there, imagine what we can do :)

@coder543
Author

Technically, what I had in mind for a good JS sandbox is a layer for editing that works very well intuitively with all models (what Anthropic Claude does on their site), slightly modified as "create_file / str_replace / view / exec_file" and the output enriches the context. And there you have a scientific tool in the browser, you don't even need a calculator, the LLM produces it :)

I was thinking along these same lines earlier today, but it is important to tie the code interpreter state to the message tree, so that there is no cross-contamination between branches of messages or if the user reverts to an earlier point in the message tree (by clicking edit or regenerate). I have some ideas for how this could be done, but I haven't had time to fully explore it, and figured that might be a good follow-up PR if this one were merged.

@coder543
Author

coder543 commented Dec 16, 2025

And shipping a pre-configured Python or binary MCP server is really not a problem. Users click "enable" in the UI, it launches .exe in the background, done. Zero manual configuration needed

One thought around this that's worth noting... I don't run llama-server on my laptop. I have a moderately powerful server where models run, but it is headless. Unless llama-server itself proxies requests to these helper servers, there can be difficulties with directing the API calls to the right place. If the helper is not listening on 0.0.0.0, calls may not reach it at all. If the firewall only allows the llama-server ports through, the helpers may be unreachable even if they are listening on 0.0.0.0.

If anything is going to be auto-configured and auto-launched, it might be worth having llama-server take care of directing requests appropriately. But, at that point... I would almost suggest these should be service-native tool calls. With the major LLM APIs, you can basically pass an argument that says to allow the code interpreter tool (or the web search tool, etc.), and then the service processes those tool calls without involving your client application at all, and that's one solution that I think would be valid for what we're describing for MCP support. Then any application that is using llama-server could reap the benefits of these generic tool calls without having to implement tool calling support themselves.

But, changing llama-server is more invasive than just changing the webui frontend, so I can see how that might be a higher friction route.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 16, 2025

I have a complete MCP backend implementation if you're curious. It exposes the same OpenAI-compatible interface and handles everything in the background, like the industry-standard LLM websites. I just need to add Anthropic API compatibility (not much to change). It also manages a Podman sandbox for producing, debugging, and running code in any language: clone this repository, add this feature, compile, push, or send me a link. The tools are "isomorphic" to Claude's; with GLM 4.5 Air / GLM 4.6 it offers the same level of quality! Real-life use case: from my phone and a simple OpenAI-compatible client, I cloned, patched, and recompiled the Jellyfin server (.NET). It gave me a .DLL (for Linux!), and I patched my other server, all locally, in 10 minutes, using GLM 4.5 Air. Doing it manually would have taken me several hours because I've never worked with .NET before.

I wanted to attach it to the MCP client later, along with a sandboxed Firefox browser, but we need to start with small, simple, ready-to-use examples. Your JavaScript sandbox would have higher priority; we need to add it to the MCP client as a checkbox :)

Integrating a basic MCP relay/proxy/bridge in pure C++ into the backend shouldn't be daunting for those who have already integrated a routing system like "llama-swap" into the same binary. It's just a job for the already-integrated cpp-httplib!
