
Commit faea2bc

Joseph Poon authored and committed
More hopper support
1 parent 195f36a commit faea2bc

File tree

7 files changed: +245 -153 lines changed


Tools/llm-eval-compare/README.md

Lines changed: 26 additions & 122 deletions
@@ -6,153 +6,57 @@ A Python-based tool for comparing LLM evaluation results across multiple models,
 
 ### 1. Connect to Hopper Cluster
 
-The backend runs on the Hopper cluster. You need to forward a port from Hopper to your local machine.
-**⚠️ IMPORTANT**: Each user must use a **different Hopper port** to avoid conflicts.
-
-#### Option A: Use the helper script from this repo (Recommended)
-
 ```bash
 ./hopper_connect.sh <your-username>
 ```
 
-This script:
-- Finds a free port on Hopper (8000-8999)
-- Connects to Hopper via SSH with port forwarding from **your localhost:8000 to the free port on Hopper**
-- Later, the frontend (on your local machine) will send HTTP requests to the backend (on Hopper) through this SSH tunnel.
-
-#### Option B: Manual SSH port forwarding
+This script automatically:
+- Finds a free local port on your machine
+- Finds a free remote port on Hopper
+- Sets up SSH port forwarding
+- Saves port numbers to `.env` file (this file is automatically updated by the scripts)
 
-```bash
-# Find a free port that other users aren't using
-ssh -L 8000:localhost:8000 <your-username>@hopper3.nus.edu.sg
-```
+**⚠️ Important:**
 
-**Keep this SSH session open** while using the tool.
+- **If you run the script multiple times** - it updates `.env` with new port numbers each time
+- **Always use the latest SSH session** created by the most recent script run for starting your backend container
 
 ### 2. Start Backend on Hopper
 
-SSH into Hopper and
-1. cd into `/scratch_aisg/SPEC-SF-AISG/tools/compare`
-2. There should be 3 things there: a sqsh file, a shell script, and a copy of this repository.
-3. Run the script with `./llm-eval.sh`
+In the **same SSH session** opened by `hopper_connect.sh`:
 
-**What happens under the hood:**
-- Enroot creates a container from the `.sqsh` image (contains all Python packages the backend needs)
-- Mounts the code directory (`llm-eval-compare`); the container will run the backend code later
-- Mounts the data directory, hardcoded to `/scratch_aisg/SPEC-SF-AISG/xb.yong/results`
-- Sets environment variables so the backend knows where to find data
-- The backend validates data on startup and prints status
+```bash
+cd /scratch_aisg/SPEC-SF-AISG/tools/compare
+./llm-eval.sh
+```
 
-### 3. Run Frontend Locally
+The `LLM_EVAL_PORT` environment variable is already set correctly in this session.
 
-Now start another terminal window from this repository's home directory:
-**⚠️ IMPORTANT**: This application uses PySide6 and certain libraries to fetch images from HuggingFace, so you are required to install them **on your machine**. See `requirements-frontend.txt`. If you have a suitable venv already, consider using it.
+### 3. Run Frontend Locally
 
+In a **new terminal** window:
 
 ```bash
-# OPTIONAL: For safety, install these packages in a venv
-# This application uses PySide6 and certain libraries to fetch images from HuggingFace.
+# Install dependencies (first time only)
 pip install -r requirements-frontend.txt
 
-# Set the API URL to match your forwarded port
-export LLM_EVAL_API_URL=http://localhost:8000
-# Use the LOCAL port that you used for forwarding. If you used the bash script `llm-eval.sh`, the port is 8000
-
-# Run the GUI
-python -m frontend.gui
+# Run the frontend
+./start_frontend.sh
 ```
 
-The GUI will connect to the backend through your SSH tunnel.
+The frontend reads port numbers from the `.env` file automatically. Port numbers are saved there by `hopper_connect.sh`.
 
 ## User Preferences
 
-Customize field display and UI buttons by editing:
-
-```
-frontend/config/user_preferences.yaml
-```
-
-### How User Preferences Work
-
-Preferences are scoped hierarchically (later overrides earlier):
-
-1. **Global preferences** - Apply to all tasks/files
-2. **Task-specific preferences** - Override global for specific tasks
-3. **File pattern preferences** - Override for files matching patterns
-
-### Common Customizations
-
-**Hide a field globally:**
-```yaml
-global:
-  image_url:
-    display_mode: hidden
-```
-
-**Show a field for specific tasks:**
-```yaml
-by_task:
-  mmlu_pro:
-    options:
-      display_mode: tree
-      priority: 15
-```
-
-**Configure UI buttons:**
-```yaml
-ui_buttons:
-  default:
-    - field: prompt_text
-      label: Query
-      enabled: true
-      priority: 10
-    # Add more buttons...
-```
-
-**Note**: If you have preferences that should be defaults for everyone, let us know and we can add them to the repository.
-
-## Project Structure
-
-For developers who need to modify the codebase:
-
-```
-backend/
-├── core/                     # Core implementation
-│   ├── handlers/             # Endpoint business logic
-│   │   └── runs.py           # Functions like load_runs(), convert_to_run_details()
-│   ├── adapters/             # Data access layer
-│   │   ├── filesystem.py     # Read from local filesystem
-│   │   └── s3.py             # Read from S3 (future)
-│   └── models/               # Data structures
-│       ├── run_detail.py     # RunDetailEnhanced class
-│       └── field_registry.py # Field metadata system
-├── main.py                   # FastAPI application
-└── utils/                    # Utilities
-```
-
-**Key locations:**
-- **Endpoint logic**: `backend/core/handlers/runs.py`
-- **Data access**: `backend/core/adapters/filesystem.py`
-- **Data structures**: `backend/core/models/run_detail.py`
-- **Field configuration**: `backend/config/field_registry.yaml`
-- **UI preferences**: `frontend/config/user_preferences.yaml`
+Customize field display and UI buttons by editing `frontend/config/user_preferences.yaml`.
 
 ## Troubleshooting
 
 **Backend not responding:**
-- Check SSH tunnel is active: `ps aux | grep "ssh -L"`
-- Verify backend is running on cluster
-- Test connection: `curl http://localhost:8000/api/health`
+- Verify SSH tunnel is active: `ps aux | grep "ssh -L"`
+- Check backend container is running on cluster
+- The error dialog shows debugging info including port numbers
 
 **Port conflicts:**
-- Each user must use a different local port
-- Use `hopper_connect.sh` to automatically find a free port
-
-**Data not found:**
-- Verify data directory path is correct
-- Check enroot container has proper mount: `--mount /path/to/data:/app/data:ro`
-- Verify `LLM_EVAL_DATA_ROOT=/app/data` is set
-
-## License
-
-MIT License
+- `hopper_connect.sh` automatically finds free ports
+- Ports are saved in `.env` file
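The new README says `hopper_connect.sh` finds free local and remote ports before opening the tunnel. As a reviewer's sketch of that port-probing step (the real script may scan differently; `find_free_port` is an illustrative name, not from the repo):

```python
import socket

def find_free_port(start: int = 8000, end: int = 8999) -> int:
    """Return the first TCP port in [start, end] that is free to bind.

    A sketch of the free-port scan hopper_connect.sh is described as
    performing; the actual script's probing logic may differ.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port already in use, try the next one
    raise RuntimeError(f"no free port in {start}-{end}")

print(find_free_port())
```

The tunnel itself would then be opened with `ssh -L <local>:localhost:<remote> <user>@hopper3.nus.edu.sg`, as in the manual instructions this commit removes from the README.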

Tools/llm-eval-compare/backend/main.py

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1010,10 +1010,23 @@ async def get_memory_metrics():
     print(f"PYTHONPATH: {os.getenv('PYTHONPATH', 'not set')}", file=sys.stderr)
     print(f"LLM_EVAL_DATA_ROOT: {os.getenv('LLM_EVAL_DATA_ROOT', 'not set')}", file=sys.stderr)
 
+    # Port MUST be configured via LLM_EVAL_PORT environment variable (required, no default)
+    port_str = os.getenv("LLM_EVAL_PORT")
+    if not port_str:
+        print("ERROR: LLM_EVAL_PORT environment variable is required but not set", file=sys.stderr)
+        print("Please set LLM_EVAL_PORT to the port you want the backend to listen on", file=sys.stderr)
+        sys.exit(1)
+    try:
+        port = int(port_str)
+    except ValueError:
+        print(f"ERROR: LLM_EVAL_PORT must be a valid integer, got: {port_str}", file=sys.stderr)
+        sys.exit(1)
+    print(f"Starting server on port {port}", file=sys.stderr)
+
     uvicorn.run(
         app,
         host="0.0.0.0",
-        port=8000,
+        port=port,
         log_level="info",
         access_log=True
     )
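The validation added to `backend/main.py` can be exercised in isolation. This sketch factors it into a helper (`resolve_port` is our name for illustration, not a function in the repo):

```python
import os
import sys

def resolve_port(env_var: str = "LLM_EVAL_PORT") -> int:
    """Require a valid integer port in the environment, exiting otherwise,
    mirroring the checks this commit adds before uvicorn.run()."""
    port_str = os.getenv(env_var)
    if not port_str:
        print(f"ERROR: {env_var} environment variable is required but not set",
              file=sys.stderr)
        sys.exit(1)
    try:
        return int(port_str)
    except ValueError:
        print(f"ERROR: {env_var} must be a valid integer, got: {port_str}",
              file=sys.stderr)
        sys.exit(1)

os.environ["LLM_EVAL_PORT"] = "8123"  # simulate the export done by hopper_connect.sh
print(resolve_port())  # prints 8123
```

Failing fast here (rather than defaulting to 8000) is what makes the per-user port scheme safe: a container started without the tunnel's port can never silently bind a port another user is forwarding.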

Tools/llm-eval-compare/docker-compose.yml

Lines changed: 13 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,12 @@
-# ⚠️ BEFORE RUNNING: You MUST set the LLM_EVAL_DATA_ROOT_HOST environment variable
-# Example: export LLM_EVAL_DATA_ROOT_HOST=/path/to/seahelm-v4-instruct
+# ⚠️ BEFORE RUNNING: You MUST set these environment variables:
+#   - LLM_EVAL_DATA_ROOT_HOST: Path to your data directory on the host
+#   - LLM_EVAL_PORT: Port for the FastAPI server to listen on
 #
-# This is required - the compose file will fail to parse without it.
+# Example:
+#   export LLM_EVAL_DATA_ROOT_HOST=/path/to/seahelm-v4-instruct
+#   export LLM_EVAL_PORT=8000
+#
+# These are required - the backend will fail to start without them.
 
 services:
   backend:
@@ -11,7 +16,7 @@ services:
     container_name: llm-eval-compare-backend
     mem_limit: 256m
     ports:
-      - "8000:8000"
+      - "${LLM_EVAL_PORT}:${LLM_EVAL_PORT}"
     volumes:
       # ⚠️ REQUIRED: Mount your data directory containing model evaluation results
       # Set LLM_EVAL_DATA_ROOT_HOST environment variable before running docker-compose
@@ -30,6 +35,8 @@ services:
       # Host data root path (from user's environment variable) - used for clipboard copying
      # This converts /app/data/... to /path/to/host/... when copying file paths
       - LLM_EVAL_DATA_ROOT_HOST=${LLM_EVAL_DATA_ROOT_HOST}
+      # Port for the FastAPI server (REQUIRED - must be set before running docker-compose)
+      - LLM_EVAL_PORT=${LLM_EVAL_PORT}
       # Optional: Enable memory metrics logging to file (for plotting/analysis)
       # Commented out for production deployment (metrics still available via API endpoint)
       # - LLM_EVAL_MEMORY_LOG=/app/metrics/memory.jsonl
@@ -44,7 +51,8 @@ services:
     restart: "no"  # Fail fast if data validation fails
     healthcheck:
       # Use lightweight health endpoint (includes memory metrics)
-      test: ["CMD", "python", "-c", "import requests; r=requests.get('http://localhost:8000/api/health', timeout=5); assert r.status_code==200 and r.json().get('status')=='ok'"]
+      # Note: Healthcheck uses the port from LLM_EVAL_PORT env var (must be set)
+      test: ["CMD", "sh", "-c", "port=${LLM_EVAL_PORT}; python -c \"import requests; r=requests.get(f'http://localhost:{port}/api/health', timeout=5); assert r.status_code==200 and r.json().get('status')=='ok'\""]
       interval: 30s
       timeout: 10s
       retries: 3

Tools/llm-eval-compare/frontend/gui.py

Lines changed: 23 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -337,12 +337,29 @@ def main():
 
     ds = RestDataSource(API_BASE_URL)
     if not ds.ping():
-        QMessageBox.critical(None, "Error",
-            f"API server is not responding at {API_BASE_URL}.\n\n"
-            "To connect to a remote backend, set:\n"
-            "  export LLM_EVAL_API_URL=http://your-cluster-host:8000\n\n"
-            "For local development:\n"
-            "  cd backend && python main.py")
+        # Try to get remote port info from environment for better error message
+        remote_port = os.getenv("LLM_EVAL_REMOTE_PORT", "unknown")
+        local_port = os.getenv("LLM_EVAL_LOCAL_PORT", "unknown")
+
+        error_msg = (
+            f"Backend API server is not responding at {API_BASE_URL}.\n\n"
+            f"Debugging info:\n"
+            f"  • Local port (SSH tunnel): {local_port}\n"
+            f"  • Remote port (on cluster): {remote_port}\n\n"
+            "Troubleshooting steps:\n"
+            "1. Verify SSH tunnel is active:\n"
+            "   - Check that hopper_connect.sh session is still running\n"
+            "   - Try: ps aux | grep 'ssh -L'\n\n"
+            f"2. Verify backend is running on cluster:\n"
+            f"   - SSH to cluster and check container is running\n"
+            f"   - Backend should be listening on port {remote_port}\n\n"
+            f"3. Test SSH tunnel manually:\n"
+            f"   curl http://localhost:{local_port}/api/health\n\n"
+            "4. Restart connection:\n"
+            "   ./hopper_connect.sh <username>"
+        )
+
+        QMessageBox.critical(None, "Backend Connection Error", error_msg)
         sys.exit(1)
 
     window = MainWindow(data_source=ds)
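The new dialog reads two environment variables that `hopper_connect.sh` is expected to export. The fallback behaviour can be checked without PySide6 via a small sketch (`tunnel_debug_info` is our illustrative helper, not a function in the repo):

```python
import os

def tunnel_debug_info() -> str:
    """Summarize tunnel ports the way the new error dialog does,
    falling back to "unknown" when hopper_connect.sh has not exported them."""
    remote_port = os.getenv("LLM_EVAL_REMOTE_PORT", "unknown")
    local_port = os.getenv("LLM_EVAL_LOCAL_PORT", "unknown")
    return (f"Local port (SSH tunnel): {local_port}\n"
            f"Remote port (on cluster): {remote_port}")

os.environ.pop("LLM_EVAL_REMOTE_PORT", None)  # simulate a missing tunnel
os.environ.pop("LLM_EVAL_LOCAL_PORT", None)
print(tunnel_debug_info())
```

Defaulting to `"unknown"` rather than raising keeps the dialog usable even when the user bypassed `hopper_connect.sh` and set up the tunnel by hand.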

Tools/llm-eval-compare/frontend/image_loader.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,15 +18,15 @@
 os.environ.setdefault('HF_DATASETS_CACHE', os.path.expanduser('~/.cache/huggingface/datasets'))
 
 try:
-    from image_dataloaders import (
+    from .image_dataloaders import (
         load_cvqa_image,
         load_marvl_image,
         load_xm3600_image,
         load_mathvista_image,
         load_world_cuisine_image,
         is_world_cuisine_dataset_loaded,
     )
-    from image_dataloaders.marvl import is_marvl_dataset_loaded, load_marvl_dataset_full
+    from .image_dataloaders.marvl import is_marvl_dataset_loaded, load_marvl_dataset_full
     from PIL import Image
     from io import BytesIO
     DATALOADERS_AVAILABLE = True
