Chat LLM is a Swiss Army knife for LLM-powered tasks: a simple, zero-dependency tool for chatting with a large language model (LLM) that also provides advanced features such as agent orchestration, context management, memory systems, task automation, and enterprise-grade monitoring.
It works seamlessly with both cloud-based LLM services (e.g., OpenAI GPT, Groq, OpenRouter) and locally hosted LLMs (e.g., llama.cpp, LM Studio, Ollama).
Chat LLM is accessible via the terminal or through its minimalist web interface.
Current Version: 2.2.0 (Development)
To run Chat LLM, ensure that Node.js (v18 or higher) or Bun is installed.
./chat-llm.js
To obtain quick responses, pipe a question directly:
echo "Top travel destinations in Indonesia?" | ./chat-llm.js
For specific tasks:
echo "Translate 'thank you' into German" | ./chat-llm.js
Chat LLM also includes a minimalist front-end web interface. To launch it, specify the environment variable HTTP_PORT, for example:
HTTP_PORT=5000 ./chat-llm.js
Then, open a web browser and go to localhost:5000.
Chat LLM v2 introduces powerful multi-purpose agent capabilities:
Specialized agents for different tasks:
- Researcher - Information gathering and synthesis
- Coder - Programming and debugging
- Writer - Content creation and editing
- Analyst - Data analysis and insights
- Tutor - Educational explanations
- Solver - Problem-solving methodology
- Support - Customer service
./chat-llm.js agent-list # View all agents
./chat-llm.js agent-activate coder # Activate agent
./chat-llm.js agent-stats # View statistics
Work with custom data and knowledge bases:
./chat-llm.js context-create research # Create context
./chat-llm.js context-list # List contexts
./chat-llm.js context-activate research # Activate context
Pre-built templates for common tasks with variable substitution and conditionals:
./chat-llm.js prompt-list # View templates
./chat-llm.js prompt-render analysis # Display template
Persistent conversation memory with automatic summarization:
./chat-llm.js memory-list # List conversations
./chat-llm.js memory-stats # Memory usage
Queue tasks, manage workflows, and batch process:
./chat-llm.js task-list # View tasks
./chat-llm.js task-stats # Queue statistics
Built-in sentiment analysis, request logging, and statistics:
./chat-llm.js sentiment "text" # Analyze sentiment
./chat-llm.js stats # Request statistics
./chat-llm.js export json # Export logs
Chat LLM v2.2 introduces enterprise-grade monitoring, webhooks, and observability:
Export system metrics in Prometheus or JSON format for monitoring dashboards:
./chat-llm.js metrics-summary # Quick metrics overview
./chat-llm.js metrics-export prometheus # Prometheus format
./chat-llm.js metrics-export json # JSON format
Metrics include:
- System uptime and performance
- Request/response statistics
- Cache hit/miss ratios
- Agent usage patterns
- Memory and resource utilization
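To make the Prometheus option concrete, here is a minimal sketch of rendering a metrics snapshot in the Prometheus text exposition format. The chatllm_* metric names and the snapshot object are illustrative assumptions, not the tool's actual output:
// Hypothetical sketch: render a metrics snapshot in Prometheus text exposition format.
// The metric names and the snapshot object are illustrative, not Chat LLM's real output.
function toPrometheus(metrics) {
  const lines = [];
  for (const [name, value] of Object.entries(metrics)) {
    lines.push(`# TYPE chatllm_${name} gauge`);
    lines.push(`chatllm_${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

const snapshot = { uptime_seconds: 3600, requests_total: 42, cache_hit_ratio: 0.8 };
console.log(toPrometheus(snapshot));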
Trigger HTTP webhooks for events and integrate with external systems:
./chat-llm.js webhook-list # List webhooks
./chat-llm.js webhook-register event url # Register webhook
./chat-llm.js webhook-stats # Delivery statistics
Features:
- Event-based webhooks with pattern matching
- Retry logic with exponential backoff
- HMAC signatures for security
- Delivery tracking and logs
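As a rough sketch of how a delivery pipeline with HMAC signing and exponential backoff can work (the header name, secret handling, and retry counts below are assumptions for illustration, not Chat LLM's actual implementation):
// Illustrative sketch only: sign a webhook payload with HMAC-SHA256 and retry
// failed deliveries with exponential backoff. Header name and retry parameters
// are assumptions, not Chat LLM's actual code.
const crypto = require('node:crypto');

async function deliverWebhook(url, payload, secret, attempts = 3) {
  const body = JSON.stringify(payload);
  const signature = crypto.createHmac('sha256', secret).update(body).digest('hex');
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'X-Signature': signature },
        body,
      });
      if (res.ok) return true;
    } catch (err) {
      // network error: fall through to backoff
    }
    if (attempt < attempts - 1) {
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000)); // 1s, 2s, 4s, ...
    }
  }
  return false;
}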
Comprehensive test suites for quality assurance:
node tests/e2e-tests.js # End-to-end tests
node tests/test-suite.js # Unit tests
For detailed v2 features and examples, see QUICK_START.md and DEVELOPMENT.md.
For v2.2 development roadmap, see V2_2_DEVELOPMENT_PLAN.md.
Chat LLM automatically caches responses for 24 hours to avoid repeated calls to your LLM API. Cached results live in ./cache and are reused instantly in the terminal and web UI. This keeps latency low and saves API credits when you revisit the same prompt during a debugging or evaluation session.
./chat-llm.js cache-stats # Inspect memory/disk cache usage
./chat-llm.js cache-clear # Purge all cached responses
./chat-llm.js config-get caching.enabled
./chat-llm.js config-set caching.enabled false # Disable caching
./chat-llm.js config-set caching.enabled true # Re-enable caching
When caching is disabled via the config command, Chat LLM immediately falls back to live responses without touching the cache. Re-enabling restores the 24-hour TTL without restarting the app.
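Conceptually, the disk cache behaves like a TTL-bounded key/value store keyed by a hash of the prompt. A minimal sketch of that idea (the hashing scheme, file names, and directory layout here are illustrative assumptions, not Chat LLM's actual on-disk format):
// Minimal sketch of a prompt-keyed disk cache with a 24-hour TTL.
// The hashing scheme and file layout are illustrative assumptions.
const fs = require('node:fs');
const path = require('node:path');
const crypto = require('node:crypto');

const CACHE_DIR = './cache';
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

function cachePath(prompt) {
  const key = crypto.createHash('sha256').update(prompt).digest('hex');
  return path.join(CACHE_DIR, `${key}.json`);
}

function readCache(prompt) {
  const file = cachePath(prompt);
  if (!fs.existsSync(file)) return null;
  const entry = JSON.parse(fs.readFileSync(file, 'utf8'));
  return Date.now() - entry.timestamp < TTL_MS ? entry.response : null;
}

function writeCache(prompt, response) {
  fs.mkdirSync(CACHE_DIR, { recursive: true });
  fs.writeFileSync(cachePath(prompt), JSON.stringify({ timestamp: Date.now(), response }));
}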
Activating an agent (./chat-llm.js agent-activate coder) now feeds that persona’s system prompt straight into every chat request, so the CLI/web UI instantly adopts the tone and capabilities of the selected specialist. The active context (context-create, context-activate) is summarized and injected as another system message, giving the LLM a compact view of your tagged data and uploaded documents before it answers.
./chat-llm.js agent-list
./chat-llm.js agent-activate researcher
./chat-llm.js context-create customer-success
./chat-llm.js context-activate customer-success
./chat-llm.js memory-list
./chat-llm.js memory-stats
Terminal and web sessions persist into the new Memory Manager (./memory/), so memory-list can replay full transcripts even across restarts. Each cache hit is streamed via the existing delegate, logged with metadata (agent, context, model), and recorded in memory so you can audit automated workflows.
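In other words, each request is assembled from the active persona and context before the user turn is appended. A hedged sketch of that message assembly (the function name and object fields are hypothetical, not the tool's internals):
// Hypothetical sketch of how a chat request could be assembled from the active
// agent persona and the summarized context; names are illustrative only.
function buildMessages(activeAgent, activeContext, history, userInput) {
  const messages = [];
  if (activeAgent) {
    messages.push({ role: 'system', content: activeAgent.systemPrompt });
  }
  if (activeContext) {
    messages.push({ role: 'system', content: `Context summary:\n${activeContext.summary}` });
  }
  return [...messages, ...history, { role: 'user', content: userInput }];
}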
The Prompt Manager ships with battle-tested templates for analysis, coding, research, and more. You can now render them directly from the CLI with inline variables:
./chat-llm.js prompt-run analysis data="Q4 sales dipped 14%" focus="root-cause, mitigation"
Combine this with prompt-list and prompt-render to inspect or extend the templates before handing the generated instructions to the chat runtime.
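Under the hood, this kind of rendering amounts to substituting named variables into a stored prompt. A minimal sketch of the idea (the {{variable}} syntax shown here is an assumption, not necessarily the tool's exact template grammar):
// Sketch of simple {{variable}} substitution; the template syntax is assumed.
function renderTemplate(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    (name in vars ? String(vars[name]) : match));
}

const template = 'Analyze the following data: {{data}}. Focus on: {{focus}}.';
console.log(renderTemplate(template, {
  data: 'Q4 sales dipped 14%',
  focus: 'root-cause, mitigation',
}));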
Chat LLM is capable of conversing in multiple languages beyond English. It consistently responds in the same language as the question posed. Additionally, it supports seamless language switching between queries, as illustrated in the following example:
>> Which planet in our solar system is the largest?
Jupiter is the largest planet in our solar system.
>> ¿Y el más caliente?
Venus es el planeta más caliente, con hasta 475 grados Celsius.
The continuous integration workflows for Chat LLM include evaluation tests in English, Spanish, German, French, Italian, and Indonesian. All tests are conducted with LLMs that have a context window of at least 128K tokens.
Supported local LLM servers include llama.cpp, Jan, Ollama, LocalAI, LM Studio, and Msty.
To utilize llama.cpp locally with its inference engine, load a quantized model like Llama-3.2 3B or Gemma-3 4B. Then set the LLM_API_BASE_URL environment variable:
/path/to/llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf
export LLM_API_BASE_URL=http://127.0.0.1:8080/v1
To use Jan with its local API server, refer to its documentation. Load a model like Llama-3.2 3B or Gemma-3 4B, and set the following environment variables:
export LLM_API_BASE_URL=http://127.0.0.1:1337/v1
export LLM_CHAT_MODEL='llama3-8b-instruct'
To use Ollama locally, load a model and configure the environment variable LLM_API_BASE_URL:
ollama pull llama3.2
export LLM_API_BASE_URL=http://127.0.0.1:11434/v1
export LLM_CHAT_MODEL='llama3.2'
For LocalAI, initiate its container and adjust the environment variable LLM_API_BASE_URL:
docker run -ti -p 8080:8080 localai/localai llama-3.2-3b-instruct:q4_k_m
export LLM_API_BASE_URL=http://localhost:8080/v1
For LM Studio, pick a model (e.g., Llama-3.2 3B). Next, go to the Developer tab, select the model to load, and click the Start Server button. Then, set the LLM_API_BASE_URL environment variable, noting that the server by default runs on port 1234:
export LLM_API_BASE_URL=http://127.0.0.1:1234/v1
For Msty, choose a model (e.g., Llama-3.2 3B) and ensure the local AI is running. Go to the Settings menu, under Local AI, and note the Service Endpoint (which defaults to port 10002). Then set the LLM_API_BASE_URL environment variable accordingly:
export LLM_API_BASE_URL=http://127.0.0.1:10002/v1
Supported LLM services include Cerebras, Deep Infra, DeepSeek, Fireworks, Glama, Groq, Hyperbolic, Mistral, Nebius, Novita, OpenAI, OpenRouter, and Together.
For configuration specifics, refer to the relevant section. The quality of answers can vary based on the model's performance.
Cerebras:
export LLM_API_BASE_URL=https://api.cerebras.ai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="llama3.1-8b"
Deep Infra:
export LLM_API_BASE_URL=https://api.deepinfra.com/v1/openai
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
DeepSeek:
export LLM_API_BASE_URL=https://api.deepseek.com/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="deepseek-chat"
Fireworks:
export LLM_API_BASE_URL=https://api.fireworks.ai/inference/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="accounts/fireworks/models/qwen3-30b-a3b"
Glama:
export LLM_API_BASE_URL=https://glama.ai/api/gateway/openai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL='ministral-3b-2410'
Groq:
export LLM_API_BASE_URL=https://api.groq.com/openai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="llama-3.1-8b-instant"
Hyperbolic:
export LLM_API_BASE_URL=https://api.hyperbolic.xyz/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
Mistral:
export LLM_API_BASE_URL=https://api.mistral.ai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="mistral-small-latest"
Nebius:
export LLM_API_BASE_URL=https://api.studio.nebius.ai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
Novita:
export LLM_API_BASE_URL=https://api.novita.ai/v3/openai
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/llama-3.1-8b-instruct"
OpenAI:
export LLM_API_BASE_URL=https://api.openai.com/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="gpt-5-nano"
OpenRouter:
export LLM_API_BASE_URL=https://openrouter.ai/api/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/llama-3.1-8b-instruct"
Together:
export LLM_API_BASE_URL=https://api.together.xyz/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
If there is a text file containing pairs of User and Assistant messages, it can be evaluated with Chat LLM:
User: Which planet is the largest?
Assistant: The largest planet is /Jupiter/.
User: and the smallest?
Assistant: The smallest planet is /Mercury/.
Assuming the above content is in qa.txt, executing the following command will initiate a multi-turn conversation with the LLM, asking questions sequentially and verifying answers using regular expressions:
./chat-llm.js qa.txt
For additional examples, please refer to the tests/ subdirectory.
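The evaluation format is simple enough to sketch: each User line becomes a prompt, and the /.../ fragments in the following Assistant line become regular expressions that the reply must match. A rough illustration of that parsing and checking logic (the askLLM helper and the strict line pairing are assumptions, not the tool's internals):
// Rough sketch of the qa.txt convention: ask each User line, then check the
// reply against the /regex/ fragments in the Assistant line. The askLLM helper
// is hypothetical.
const fs = require('node:fs');

async function evaluate(file, askLLM) {
  const lines = fs.readFileSync(file, 'utf8').split('\n').filter(Boolean);
  for (let i = 0; i < lines.length - 1; i += 2) {
    const question = lines[i].replace(/^User:\s*/, '');
    const expected = lines[i + 1].match(/\/([^/]+)\//g) || [];
    const answer = await askLLM(question);
    for (const pattern of expected) {
      const regex = new RegExp(pattern.slice(1, -1), 'i');
      console.log(regex.test(answer) ? 'PASS' : 'FAIL', question);
    }
  }
}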
Chat LLM v2 includes a comprehensive configuration system for managing application settings and profiles:
# Get configuration values
./chat-llm.js config-get models.temperature
./chat-llm.js config-get caching.enabled
# Set configuration values
./chat-llm.js config-set models.temperature 0.8
./chat-llm.js config-set caching.ttl 3600000
# List available profiles
./chat-llm.js config-list
Default configuration includes:
- Model settings (temperature, max tokens)
- Caching behavior (TTL: 24 hours)
- Logging configuration
- API timeout and retry settings
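The dotted keys accepted by config-get and config-set resolve into a nested settings object. A minimal sketch of how such dotted-path access can be implemented (illustrative only, not the tool's actual config module):
// Illustrative sketch of dotted-path config access, e.g. "models.temperature".
function getConfig(config, dottedKey) {
  return dottedKey.split('.').reduce(
    (node, key) => (node == null ? undefined : node[key]), config);
}

function setConfig(config, dottedKey, value) {
  const keys = dottedKey.split('.');
  const last = keys.pop();
  let node = config;
  for (const key of keys) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[last] = value;
}

const config = { models: { temperature: 0.7 }, caching: { enabled: true, ttl: 86400000 } };
setConfig(config, 'models.temperature', 0.8);
console.log(getConfig(config, 'models.temperature')); // 0.8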
Intelligent response caching reduces API calls and improves performance:
# View cache statistics
./chat-llm.js cache-stats
# Clear cache
./chat-llm.js cache-clear
The cache automatically stores responses with a 24-hour TTL and can be enabled/disabled via configuration.
All requests are logged automatically for monitoring and debugging:
# View statistics
./chat-llm.js stats
# Export logs
./chat-llm.js export json > logs.json
./chat-llm.js export csv > logs.csv
Logs include:
- Request timestamps
- Operation types
- Response times
- Request/response content (truncated)
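To make the export concrete, a single JSON log entry could look roughly like the following. The field names are illustrative assumptions, not a guaranteed schema:
// Illustrative shape of one exported log entry; field names are assumptions.
const exampleLogEntry = {
  timestamp: '2025-12-08T10:15:00.000Z',
  operation: 'chat',
  durationMs: 412,
  request: 'Which planet is the largest?',      // truncated for long inputs
  response: 'Jupiter is the largest planet...', // truncated for long outputs
  metadata: { agent: 'researcher', model: 'llama-3.1-8b-instant' },
};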
Built-in sentiment analysis for understanding user input and conversation tone:
./chat-llm.js sentiment "This is amazing!"Returns sentiment classification (positive, negative, neutral) with scores.
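A lexicon-based scorer is one common way to produce such a classification with zero dependencies; the word lists and thresholds below are illustrative, not the tool's actual lexicon:
// Minimal sketch of lexicon-based sentiment scoring; word lists and thresholds
// are illustrative assumptions.
const POSITIVE = new Set(['amazing', 'great', 'good', 'love', 'excellent']);
const NEGATIVE = new Set(['bad', 'terrible', 'hate', 'awful', 'poor']);

function analyzeSentiment(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let score = 0;
  for (const word of words) {
    if (POSITIVE.has(word)) score += 1;
    if (NEGATIVE.has(word)) score -= 1;
  }
  const label = score > 0 ? 'positive' : score < 0 ? 'negative' : 'neutral';
  return { label, score };
}

console.log(analyzeSentiment('This is amazing!')); // { label: 'positive', score: 1 }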
Test the UI and features without API credentials:
LLM_DEMO_MODE=1 HTTP_PORT=5000 ./chat-llm.js
Demo mode simulates intelligent responses for testing and development.
Chat LLM v2 is built with the following components:
- Core: Zero-dependency chat interface
- Cache: Automatic response caching (memory + disk)
- Config: Settings and profile management
- Logger: Request tracking and analytics
- Monitor: Performance metrics collection
- Tools: Sentiment analysis and utilities
- Agents: Multi-purpose agent orchestration
- Context: Custom data and knowledge management
- Prompts: Advanced template system
- Memory: Conversation history and persistence
- Tasks: Workflow and batch processing
All components are optional and can be disabled via configuration for minimal resource usage.
- README.md - Overview and quick start guide (this file)
- QUICK_START.md - Quick reference for v2 features
- DEVELOPMENT.md - Development guide and architecture details
- API_REFERENCE.md - Complete API documentation for all modules
- EXAMPLES.md - Practical examples and real-world use cases
- ROADMAP.md - Current development roadmap and enhancements
- FUTURE_FEATURES.md - Future feature proposals and ideas
- RELEASE_NOTES_V2.md - Version 2 release notes
- tests/ - Evaluation tests in multiple languages
Chat LLM v2 includes comprehensive improvements:
All public APIs validate inputs with descriptive error messages:
// Example: Type checking and validation
if (typeof text !== 'string' || text.trim().length === 0) {
throw new TypeError('Text must be a non-empty string');
}
Automatic limits prevent memory overflow:
- Request Logger: 10,000 in-memory logs
- Response Cache: 1,000 memory entries
- Performance Monitor: Configurable limit (default: 10,000)
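One simple way to enforce such a cap is to drop the oldest entries whenever the buffer exceeds its limit. The class below is a sketch of that pattern; only the 10,000-entry default mirrors the figures listed above:
// Sketch of a capped in-memory log buffer that drops the oldest entries.
// The class is illustrative; only the 10,000-entry cap mirrors the default above.
class BoundedLog {
  constructor(maxEntries = 10000) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }

  push(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.maxEntries) {
      this.entries.splice(0, this.entries.length - this.maxEntries); // drop oldest
    }
  }
}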
All modules include comprehensive error handling:
try {
const result = await operation();
logger.logRequest('operation', input, result, duration);
} catch (error) {
logger.logRequest('operation', input, error.message, duration, { error: true });
throw error;
}
Built-in metrics tracking:
- P95/P99 latency percentiles
- Cache hit rates
- Memory usage monitoring
- Operation statistics
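Percentile latencies such as P95 and P99 are typically computed from a sorted sample of recorded durations. A minimal nearest-rank sketch (the sampling and interpolation details are simplified assumptions):
// Sketch of nearest-rank percentile calculation over recorded durations (ms);
// interpolation details are simplified.
function percentile(durations, p) {
  if (durations.length === 0) return 0;
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [120, 95, 300, 210, 180, 90, 400, 150];
console.log('p95:', percentile(latencies, 95));
console.log('p99:', percentile(latencies, 99));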
Contributions are welcome! Please:
- Read the DEVELOPMENT.md guide
- Review FUTURE_FEATURES.md for feature ideas
- Check existing issues and PRs
- Follow the code quality standards
- Add tests for new features
- Update documentation
- Issues: Report bugs or request features on GitHub
- Discussions: Share ideas and ask questions
- Examples: Check EXAMPLES.md for common patterns
- API Docs: See API_REFERENCE.md for detailed API info
See LICENSE file for details.
Chat LLM v2 is built with zero external dependencies, leveraging only Node.js built-in modules for maximum portability and minimal overhead.
Version: 2.0.0
Last Updated: December 8, 2025
Maintainer: yonikashi432