
Commit faea2bc

Joseph Poon authored and committed
More hopper support
1 parent 195f36a commit faea2bc

File tree

7 files changed: +245 -153 lines changed


Tools/llm-eval-compare/README.md

Lines changed: 26 additions & 122 deletions
@@ -6,153 +6,57 @@ A Python-based tool for comparing LLM evaluation results across multiple models,
 
 ### 1. Connect to Hopper Cluster
 
-The backend runs on the Hopper cluster. You need to forward a port from Hopper to your local machine.
-**⚠️ IMPORTANT**: Each user must use a **different Hopper port** to avoid conflicts.
-
-#### Option A: Use the helper script from this repo (Recommended)
-
 ```bash
 ./hopper_connect.sh <your-username>
 ```
 
-This script:
-- Finds a free port on Hopper (8000-8999)
-- Connects to Hopper via SSH with port forwarding from **your localhost:8000 to the free port on Hopper**
-- Later, the frontend (on your local machine) will send HTTP requests to the backend (on Hopper) through this SSH tunnel.
-
-#### Option B: Manual SSH port forwarding
+This script automatically:
+- Finds a free local port on your machine
+- Finds a free remote port on Hopper
+- Sets up SSH port forwarding
+- Saves port numbers to `.env` file (this file is automatically updated by the scripts)
 
-```bash
-# Find a free port that other users aren't using
-ssh -L 8000:localhost:8000 <your-username>@hopper3.nus.edu.sg
-```
+**⚠️ Important:**
 
-**Keep this SSH session open** while using the tool.
+- **If you run the script multiple times** - it updates `.env` with new port numbers each time
+- **Always use the latest SSH session** created by the most recent script run for starting your backend container
 
 ### 2. Start Backend on Hopper
 
-SSH into Hopper and
-1. cd into `/scratch_aisg/SPEC-SF-AISG/tools/compare`
-2. There should be 3 things there: a sqsh file, a shell script, and a copy of this repository.
-3. Run the script with `./llm-eval.sh`
+In the **same SSH session** opened by `hopper_connect.sh`:
 
-**What happens under the hood:**
-- Enroot creates a container from the `.sqsh` image (contains all Python packages the backend needs)
-- Mounts the code directory (`llm-eval-compare`); the container will run the backend code later
-- Mounts the data directory, hardcoded to `/scratch_aisg/SPEC-SF-AISG/xb.yong/results`
-- Sets environment variables so the backend knows where to find data
-- The backend validates data on startup and prints status
+```bash
+cd /scratch_aisg/SPEC-SF-AISG/tools/compare
+./llm-eval.sh
+```
 
-### 3. Run Frontend Locally
+The `LLM_EVAL_PORT` environment variable is already set correctly in this session.
 
-Now start another terminal window from this repository's home directory:
-**⚠️ IMPORTANT**: This application uses PySide6 and certain libraries to fetch images from HuggingFace, so you are required to install them **on your machine**. See `requirements-frontend.txt`. If you have a suitable venv already, consider using it.
+### 3. Run Frontend Locally
 
+In a **new terminal** window:
 
 ```bash
-# OPTIONAL: For safety, install these packages in a venv
-# This application uses PySide6 and certain libraries to fetch images from HuggingFace.
+# Install dependencies (first time only)
 pip install -r requirements-frontend.txt
 
-# Set the API URL to match your forwarded port
-export LLM_EVAL_API_URL=http://localhost:8000
-# Use the LOCAL port that you used for forwarding. If you used the bash script `llm-eval.sh`, the port is 8000
-
-# Run the GUI
-python -m frontend.gui
+# Run the frontend
+./start_frontend.sh
 ```
 
-The GUI will connect to the backend through your SSH tunnel.
+The frontend reads port numbers from the `.env` file automatically. Port numbers are saved there by `hopper_connect.sh`.
 
 ## User Preferences
 
-Customize field display and UI buttons by editing:
-
-```
-frontend/config/user_preferences.yaml
-```
-
-### How User Preferences Work
-
-Preferences are scoped hierarchically (later overrides earlier):
-
-1. **Global preferences** - Apply to all tasks/files
-2. **Task-specific preferences** - Override global for specific tasks
-3. **File pattern preferences** - Override for files matching patterns
-
-### Common Customizations
-
-**Hide a field globally:**
-```yaml
-global:
-  image_url:
-    display_mode: hidden
-```
-
-**Show a field for specific tasks:**
-```yaml
-by_task:
-  mmlu_pro:
-    options:
-      display_mode: tree
-      priority: 15
-```
-
-**Configure UI buttons:**
-```yaml
-ui_buttons:
-  default:
-    - field: prompt_text
-      label: Query
-      enabled: true
-      priority: 10
-    # Add more buttons...
-```
-
-**Note**: If you have preferences that should be defaults for everyone, let us know and we can add them to the repository.
-
-## Project Structure
-
-For developers who need to modify the codebase:
-
-```
-backend/
-├── core/                     # Core implementation
-│   ├── handlers/             # Endpoint business logic
-│   │   └── runs.py           # Functions like load_runs(), convert_to_run_details()
-│   ├── adapters/             # Data access layer
-│   │   ├── filesystem.py     # Read from local filesystem
-│   │   └── s3.py             # Read from S3 (future)
-│   └── models/               # Data structures
-│       ├── run_detail.py     # RunDetailEnhanced class
-│       └── field_registry.py # Field metadata system
-├── main.py                   # FastAPI application
-└── utils/                    # Utilities
-```
-
-**Key locations:**
-- **Endpoint logic**: `backend/core/handlers/runs.py`
-- **Data access**: `backend/core/adapters/filesystem.py`
-- **Data structures**: `backend/core/models/run_detail.py`
-- **Field configuration**: `backend/config/field_registry.yaml`
-- **UI preferences**: `frontend/config/user_preferences.yaml`
+Customize field display and UI buttons by editing `frontend/config/user_preferences.yaml`.
 
 ## Troubleshooting
 
 **Backend not responding:**
-- Check SSH tunnel is active: `ps aux | grep "ssh -L"`
-- Verify backend is running on cluster
-- Test connection: `curl http://localhost:8000/api/health`
+- Verify SSH tunnel is active: `ps aux | grep "ssh -L"`
+- Check backend container is running on cluster
+- The error dialog shows debugging info including port numbers
 
 **Port conflicts:**
-- Each user must use a different local port
-- Use `hopper_connect.sh` to automatically find a free port
-
-**Data not found:**
-- Verify data directory path is correct
-- Check enroot container has proper mount: `--mount /path/to/data:/app/data:ro`
-- Verify `LLM_EVAL_DATA_ROOT=/app/data` is set
-
-## License
-
-MIT License
+- `hopper_connect.sh` automatically finds free ports
+- Ports are saved in `.env` file
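The new README says `hopper_connect.sh` finds free local and remote ports before opening the tunnel. As a reviewer's sketch of that port-probing step (the real script may scan differently; `find_free_port` is an illustrative name, not from the repo):

```python
import socket

def find_free_port(start: int = 8000, end: int = 8999) -> int:
    """Return the first TCP port in [start, end] that is free to bind.

    A sketch of the free-port scan hopper_connect.sh is described as
    performing; the actual script's probing logic may differ.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port already in use, try the next one
    raise RuntimeError(f"no free port in {start}-{end}")

print(find_free_port())
```

The tunnel itself would then be opened with `ssh -L <local>:localhost:<remote> <user>@hopper3.nus.edu.sg`, as in the manual instructions this commit removes from the README.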

Tools/llm-eval-compare/backend/main.py

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1010,10 +1010,23 @@ async def get_memory_metrics():
     print(f"PYTHONPATH: {os.getenv('PYTHONPATH', 'not set')}", file=sys.stderr)
     print(f"LLM_EVAL_DATA_ROOT: {os.getenv('LLM_EVAL_DATA_ROOT', 'not set')}", file=sys.stderr)
 
+    # Port MUST be configured via LLM_EVAL_PORT environment variable (required, no default)
+    port_str = os.getenv("LLM_EVAL_PORT")
+    if not port_str:
+        print("ERROR: LLM_EVAL_PORT environment variable is required but not set", file=sys.stderr)
+        print("Please set LLM_EVAL_PORT to the port you want the backend to listen on", file=sys.stderr)
+        sys.exit(1)
+    try:
+        port = int(port_str)
+    except ValueError:
+        print(f"ERROR: LLM_EVAL_PORT must be a valid integer, got: {port_str}", file=sys.stderr)
+        sys.exit(1)
+    print(f"Starting server on port {port}", file=sys.stderr)
+
     uvicorn.run(
         app,
         host="0.0.0.0",
-        port=8000,
+        port=port,
         log_level="info",
         access_log=True
     )
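The validation added to `backend/main.py` can be exercised in isolation. This sketch factors it into a helper (`resolve_port` is our name for illustration, not a function in the repo):

```python
import os
import sys

def resolve_port(env_var: str = "LLM_EVAL_PORT") -> int:
    """Require a valid integer port in the environment, exiting otherwise,
    mirroring the checks this commit adds before uvicorn.run()."""
    port_str = os.getenv(env_var)
    if not port_str:
        print(f"ERROR: {env_var} environment variable is required but not set",
              file=sys.stderr)
        sys.exit(1)
    try:
        return int(port_str)
    except ValueError:
        print(f"ERROR: {env_var} must be a valid integer, got: {port_str}",
              file=sys.stderr)
        sys.exit(1)

os.environ["LLM_EVAL_PORT"] = "8123"  # simulate the export done by hopper_connect.sh
print(resolve_port())  # prints 8123
```

Failing fast here (rather than defaulting to 8000) is what makes the per-user port scheme safe: a container started without the tunnel's port can never silently bind a port another user is forwarding.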

Tools/llm-eval-compare/docker-compose.yml

Lines changed: 13 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,12 @@
-# ⚠️ BEFORE RUNNING: You MUST set the LLM_EVAL_DATA_ROOT_HOST environment variable
-# Example: export LLM_EVAL_DATA_ROOT_HOST=/path/to/seahelm-v4-instruct
+# ⚠️ BEFORE RUNNING: You MUST set these environment variables:
+#   - LLM_EVAL_DATA_ROOT_HOST: Path to your data directory on the host
+#   - LLM_EVAL_PORT: Port for the FastAPI server to listen on
 #
-# This is required - the compose file will fail to parse without it.
+# Example:
+#   export LLM_EVAL_DATA_ROOT_HOST=/path/to/seahelm-v4-instruct
+#   export LLM_EVAL_PORT=8000
+#
+# These are required - the backend will fail to start without them.
 
 services:
   backend:
@@ -11,7 +16,7 @@ services:
     container_name: llm-eval-compare-backend
     mem_limit: 256m
     ports:
-      - "8000:8000"
+      - "${LLM_EVAL_PORT}:${LLM_EVAL_PORT}"
     volumes:
       # ⚠️ REQUIRED: Mount your data directory containing model evaluation results
       # Set LLM_EVAL_DATA_ROOT_HOST environment variable before running docker-compose
@@ -30,6 +35,8 @@ services:
       # Host data root path (from user's environment variable) - used for clipboard copying
      # This converts /app/data/... to /path/to/host/... when copying file paths
       - LLM_EVAL_DATA_ROOT_HOST=${LLM_EVAL_DATA_ROOT_HOST}
+      # Port for the FastAPI server (REQUIRED - must be set before running docker-compose)
+      - LLM_EVAL_PORT=${LLM_EVAL_PORT}
       # Optional: Enable memory metrics logging to file (for plotting/analysis)
       # Commented out for production deployment (metrics still available via API endpoint)
       # - LLM_EVAL_MEMORY_LOG=/app/metrics/memory.jsonl
@@ -44,7 +51,8 @@ services:
     restart: "no"  # Fail fast if data validation fails
     healthcheck:
       # Use lightweight health endpoint (includes memory metrics)
-      test: ["CMD", "python", "-c", "import requests; r=requests.get('http://localhost:8000/api/health', timeout=5); assert r.status_code==200 and r.json().get('status')=='ok'"]
+      # Note: Healthcheck uses the port from LLM_EVAL_PORT env var (must be set)
+      test: ["CMD", "sh", "-c", "port=${LLM_EVAL_PORT}; python -c \"import requests; r=requests.get(f'http://localhost:{port}/api/health', timeout=5); assert r.status_code==200 and r.json().get('status')=='ok'\""]
       interval: 30s
       timeout: 10s
       retries: 3

Tools/llm-eval-compare/frontend/gui.py

Lines changed: 23 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -337,12 +337,29 @@ def main():
 
     ds = RestDataSource(API_BASE_URL)
     if not ds.ping():
-        QMessageBox.critical(None, "Error",
-            f"API server is not responding at {API_BASE_URL}.\n\n"
-            "To connect to a remote backend, set:\n"
-            "  export LLM_EVAL_API_URL=http://your-cluster-host:8000\n\n"
-            "For local development:\n"
-            "  cd backend && python main.py")
+        # Try to get remote port info from environment for better error message
+        remote_port = os.getenv("LLM_EVAL_REMOTE_PORT", "unknown")
+        local_port = os.getenv("LLM_EVAL_LOCAL_PORT", "unknown")
+
+        error_msg = (
+            f"Backend API server is not responding at {API_BASE_URL}.\n\n"
+            f"Debugging info:\n"
+            f"  • Local port (SSH tunnel): {local_port}\n"
+            f"  • Remote port (on cluster): {remote_port}\n\n"
+            "Troubleshooting steps:\n"
+            "1. Verify SSH tunnel is active:\n"
+            "   - Check that hopper_connect.sh session is still running\n"
+            "   - Try: ps aux | grep 'ssh -L'\n\n"
+            f"2. Verify backend is running on cluster:\n"
+            f"   - SSH to cluster and check container is running\n"
+            f"   - Backend should be listening on port {remote_port}\n\n"
+            f"3. Test SSH tunnel manually:\n"
+            f"   curl http://localhost:{local_port}/api/health\n\n"
+            "4. Restart connection:\n"
+            "   ./hopper_connect.sh <username>"
+        )
+
+        QMessageBox.critical(None, "Backend Connection Error", error_msg)
         sys.exit(1)
 
     window = MainWindow(data_source=ds)
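The new dialog reads two environment variables that `hopper_connect.sh` is expected to export. The fallback behaviour can be checked without PySide6 via a small sketch (`tunnel_debug_info` is our illustrative helper, not a function in the repo):

```python
import os

def tunnel_debug_info() -> str:
    """Summarize tunnel ports the way the new error dialog does,
    falling back to "unknown" when hopper_connect.sh has not exported them."""
    remote_port = os.getenv("LLM_EVAL_REMOTE_PORT", "unknown")
    local_port = os.getenv("LLM_EVAL_LOCAL_PORT", "unknown")
    return (f"Local port (SSH tunnel): {local_port}\n"
            f"Remote port (on cluster): {remote_port}")

os.environ.pop("LLM_EVAL_REMOTE_PORT", None)  # simulate a missing tunnel
os.environ.pop("LLM_EVAL_LOCAL_PORT", None)
print(tunnel_debug_info())
```

Defaulting to `"unknown"` rather than raising keeps the dialog usable even when the user bypassed `hopper_connect.sh` and set up the tunnel by hand.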

Tools/llm-eval-compare/frontend/image_loader.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,15 +18,15 @@
 os.environ.setdefault('HF_DATASETS_CACHE', os.path.expanduser('~/.cache/huggingface/datasets'))
 
 try:
-    from image_dataloaders import (
+    from .image_dataloaders import (
         load_cvqa_image,
         load_marvl_image,
         load_xm3600_image,
         load_mathvista_image,
         load_world_cuisine_image,
         is_world_cuisine_dataset_loaded,
     )
-    from image_dataloaders.marvl import is_marvl_dataset_loaded, load_marvl_dataset_full
+    from .image_dataloaders.marvl import is_marvl_dataset_loaded, load_marvl_dataset_full
     from PIL import Image
     from io import BytesIO
     DATALOADERS_AVAILABLE = True
