SemRay - AI-Powered Semantic Analysis for IDA Pro

SemRay is an IDA Pro plugin that leverages Google's Gemini AI to provide semantic analysis of binary code. It suggests meaningful function names, detailed comments, and descriptive variable renames based on each function's surrounding context: its callers, callees, cross-references, and referenced literals.

Features

Intelligent Function Naming: Generate concise, descriptive function names that encode role and domain (e.g., crc32_checksum, parse_http_header)
Comprehensive Comments: Automatically create detailed multi-line comments explaining function behavior
Variable Renaming: Suggest meaningful names for local variables and function arguments
Context-Aware Analysis: Analyzes callers, callees, and cross-references to understand function relationships
Flexible Analysis Modes:
- Analyze single functions
- Analyze all functions in context
- Analyze functions within N levels of call depth
Multiple Content Modes: Choose between decompiled C code or raw assembly for LLM analysis
Optional CodeDumper Integration: Enhanced context discovery with virtual calls, jump tables, and PTN provenance annotations
Interactive UI: Review and selectively apply suggested changes through an intuitive tabbed interface

Requirements

Essential

IDA Pro 7.6+ with Python 3 and PyQt5 support
Hex-Rays Decompiler (for decompilation mode and PTN analysis)
Python Libraries:
```
pip install google-genai pydantic
```
Google AI API Key: Required for Gemini API access

Installation

1. Plugin Installation

Copy the plugin directory to your IDA plugins folder:

```
# Linux
cp -r semray ~/.idapro/plugins/semray

# Windows (cmd; xcopy /E /I copies the whole directory tree, including subfolders)
xcopy /E /I semray "C:\Users\YourName\AppData\Roaming\Hex-Rays\IDA Pro\plugins\semray"

# macOS
cp -r semray ~/Library/Application\ Support/Hex-Rays/IDA\ Pro/plugins/semray
```

Alternatively, you can place it directly in the IDA installation's plugins directory:

```
# Example for Linux
cp -r semray /opt/ida-pro/plugins/semray
```

2. Python Dependencies

Install required Python libraries in IDA's Python environment:

```
# If using system Python (ensure it matches IDA's Python version)
pip install google-genai pydantic

# If using IDA's bundled Python
/path/to/ida/python3 -m pip install google-genai pydantic
```
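To confirm that these libraries (and PyQt5) are visible to the interpreter IDA actually uses, you can run a check like the one below in IDA's Python console, the input line at the bottom of the Output window. This is a sketch built on `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, not part of SemRay, and `google.genai` is the import name of the google-genai package.

```python
import importlib.util

# Import names of the packages SemRay needs (google-genai imports as google.genai).
REQUIRED_MODULES = ("PyQt5", "pydantic", "google.genai")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that this interpreter cannot find."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For a dotted name, find_spec imports the parent package first
            # and raises if that parent (here "google") is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules() or "none")
```

An empty result means all three imports should succeed inside IDA; anything listed needs installing into IDA's Python, not just your shell's.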

3. API Key Configuration

Set your Google AI API key as an environment variable:

```
# Linux/macOS - Add to ~/.bashrc or ~/.zshrc
export GOOGLE_API_KEY="your-api-key-here"

# Windows - Persist as a user environment variable (visible to newly started processes)
setx GOOGLE_API_KEY "your-api-key-here"
```

To obtain a Google AI API key:

Visit Google AI Studio
Sign in with your Google account
Create a new API key
Copy and set it as the GOOGLE_API_KEY environment variable

4. Verify Installation

Start IDA Pro and check the output window for:

```
Initializing SemRay (Google AI Semantic Analysis) plugin.
SemRay: CodeDumper integration enabled.  # (if CodeDumper is available)
SemRay (Google AI Semantic Analysis) initialized successfully.
```

Usage

Quick Start

Navigate to a function in IDA Pro (Disassembly or Pseudocode view)
Right-click to open the context menu
Select SemRay Analysis from the menu
Choose your analysis mode:
- Analyze CURRENT Func Only: Analyzes only the selected function
- Analyze ALL Funcs in Context: Analyzes the function plus all callers/callees in context
- Analyze Current + N Levels: Analyzes functions within N depth levels

Configuration Prompts

When you trigger an analysis, you'll be prompted for:

Content Mode: Choose between:
- Decompiled: Uses Hex-Rays decompiled C pseudocode (recommended)
- Assembly: Uses raw disassembly (useful when decompilation fails)
Context Depths:
- Caller Depth: How many levels of calling functions to include (default: 1)
- Callee Depth: How many levels of called functions to include (default: 1)
- Analysis Depth: (Depth-limited mode only) How many function levels to analyze
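Caller and callee depths can be read as hop limits for a breadth-first walk of the call graph in each direction. The following is a minimal sketch of that idea under stated assumptions: the call graph is a hypothetical in-memory dict, and `CALL_GRAPH`, `within_depth`, and `context_functions` are illustrative names rather than SemRay's API. The plugin itself gathers these edges from IDA cross-references (and from CodeDumper, when available).

```python
from collections import deque

# Hypothetical call graph (caller -> callees). The real plugin derives these
# edges from IDA cross-references; a dict keeps the sketch self-contained.
CALL_GRAPH = {
    "main": ["parse_args", "run"],
    "parse_args": [],
    "run": ["read_config", "process"],
    "read_config": [],
    "process": ["crc32_checksum"],
    "crc32_checksum": [],
}


def invert(graph):
    """Turn a caller -> callees map into a callee -> callers map."""
    callers = {name: [] for name in graph}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


def within_depth(start, edges, max_depth):
    """Breadth-first walk along `edges`, at most `max_depth` hops from `start`.

    Returns the functions reached, excluding `start` itself.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # hop limit reached; do not expand further
        for nxt in edges.get(name, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def context_functions(target, caller_depth=1, callee_depth=1):
    """The target plus callers and callees within the requested depths."""
    callers = within_depth(target, invert(CALL_GRAPH), caller_depth)
    callees = within_depth(target, CALL_GRAPH, callee_depth)
    return {target} | callers | callees


print(sorted(context_functions("run")))
```

With both depths at 1, `context_functions("run")` yields `run`, its caller `main`, and its callees `read_config` and `process`; raising `callee_depth` to 2 also pulls in `crc32_checksum`. Each extra level can grow the context quickly, which is why small depths are the default.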

Analysis Workflow

Context Collection: The plugin gathers code, call graphs, cross-references, and literals
LLM Processing: Sends the context to Google's Gemini model for semantic analysis
Results Display: Opens a tabbed UI showing suggestions for each function
Review & Apply:
- Review each suggestion
- Uncheck items you don't want to apply
- Click Apply Selected to update your IDB
- Click Close to dismiss without changes

Results UI

The results window displays tabs for each analyzed function, showing:

Suggested Function Name: With reasoning and evidence
Suggested Comment: Multi-line documentation of function behavior
Variable Renames: Original → New name mappings with explanations

Each suggestion has a checkbox; uncheck it to exclude that suggestion when changes are applied.

Batch Analysis

You can also analyze multiple functions at once:

Go to Edit → Plugins → SemRay (Google AI Semantic Analysis)
Enter comma-separated function names or addresses:
```
sub_401000, parse_header, 0x402340
```
Choose content mode and context depths
Review and apply results
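Entries in the batch list are either function names or hexadecimal addresses. A minimal parsing sketch, assuming `0x`-prefixed tokens are addresses and everything else is a name; `parse_batch_input` is a hypothetical helper, not part of SemRay.

```python
def parse_batch_input(text):
    """Split comma-separated input into addresses (int) and names (str).

    Hypothetical helper: tokens starting with 0x are parsed as hexadecimal
    addresses (a malformed one such as 0xZZ raises ValueError); all other
    tokens are kept as names to be resolved later.
    """
    items = []
    for raw in text.split(","):
        token = raw.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        if token.lower().startswith("0x"):
            items.append(int(token, 16))  # int() accepts the 0x prefix in base 16
        else:
            items.append(token)
    return items


print(parse_batch_input("sub_401000, parse_header, 0x402340"))
```

Inside IDA, the name entries would still need resolving to addresses, for example with `idc.get_name_ea_simple`, which returns `idc.BADADDR` for unknown names.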

How It Works

Context Building

SemRay builds rich context for the LLM by collecting:

Code Content:
- Decompiled C pseudocode (via Hex-Rays)
- Or raw disassembly with labels and addresses
Call Graph:
- Direct calls
- Indirect calls
- Virtual calls (with CodeDumper)
- Jump tables (with CodeDumper)
- Tail calls
Semantic Hints:
- String literals referenced in functions
- Large constant values
- Function prototypes/signatures
PTN Annotations (with CodeDumper):
- Provenance tracking of data flows
- Virtual table analysis
- Enhanced cross-reference context
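As a rough illustration of the semantic hints, string literals and large constants can be pulled out of pseudocode text with regular expressions. This is an assumption-laden sketch: SemRay may collect literals through IDA's APIs rather than by parsing text, `semantic_hints` is a hypothetical helper, and the `0x10000` threshold for a "large" constant is arbitrary.

```python
import re

# Hypothetical text-based extraction; the plugin may use IDA APIs instead.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')
# Integer literal with an optional suffix such as u, LL, i64, or ui64.
INT_LITERAL = re.compile(r"\b(0[xX][0-9A-Fa-f]+|\d+)(?:[uU]?i64|[uUlL]+)?\b")


def to_int(token):
    """Parse a hexadecimal (0x-prefixed) or decimal literal."""
    return int(token, 16) if token.lower().startswith("0x") else int(token, 10)


def semantic_hints(pseudocode, min_constant=0x10000):
    """Return (string literals, sorted unique constants >= min_constant)."""
    strings = STRING_LITERAL.findall(pseudocode)
    # Blank out string contents so digits inside them are not read as constants.
    code_only = STRING_LITERAL.sub('""', pseudocode)
    constants = {to_int(tok) for tok in INT_LITERAL.findall(code_only)}
    return strings, sorted(c for c in constants if c >= min_constant)


sample = 'v1 = crc ^ 0xEDB88320u; if (n > 1024) puts("bad checksum"); return 305419896;'
print(semantic_hints(sample))
```

A constant such as `0xEDB88320` (the reflected CRC-32 polynomial) is exactly the kind of hint that lets the model propose a name like `crc32_checksum`.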

LLM Processing

The plugin sends a carefully crafted prompt to Google's Gemini model that includes:

Persona: "You are an expert reverse engineer"
Naming Contract: Rules for meaningful, non-generic names
Call Graph: Relationships between functions
Code Context: All relevant source code
Schema Enforcement: Structured JSON output via response schema
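The prompt parts listed above can be concatenated into a single request string. This sketch assumes a simple Markdown-style section layout; the section titles, the `PERSONA` and `NAMING_CONTRACT` wording, and `build_prompt` are hypothetical, and SemRay's real prompt text in semray.py will differ. Schema enforcement does not appear in the text: it is passed to the API separately as a response schema.

```python
# Hypothetical prompt wording; SemRay's real text lives in semray.py.
PERSONA = "You are an expert reverse engineer."
NAMING_CONTRACT = (
    "Names must encode role and domain (e.g. crc32_checksum, parse_http_header). "
    "Do not use generic names such as tmp, foo, helper, or var5."
)


def build_prompt(edges, code_by_func):
    """Assemble persona, naming contract, call graph, and code into one prompt.

    edges: iterable of (caller, callee) name pairs.
    code_by_func: mapping of function name -> pseudocode or assembly text.
    """
    graph = "\n".join(f"{caller} -> {callee}" for caller, callee in edges)
    code = "\n\n".join(
        f"// ---- {name} ----\n{text}" for name, text in code_by_func.items()
    )
    sections = [
        PERSONA,
        "## Naming contract\n" + NAMING_CONTRACT,
        "## Call graph\n" + graph,
        "## Code\n" + code,
    ]
    return "\n\n".join(sections)


print(build_prompt([("main", "run")], {"run": "int run(void) { return 0; }"}))
```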

The LLM analyzes the code holistically and provides:

Function names that encode purpose and domain
Detailed comments explaining behavior
Evidence-based reasoning for each suggestion
Variable renames that clarify intent
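SemRay validates the model's JSON against Pydantic schemas. To keep the example dependency-free, the sketch below swaps in standard-library dataclasses for the same job; the field names (`original_name`, `function_name`, `comment`, `reasoning`, `variable_renames`) are assumptions, not the plugin's actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariableRename:
    original: str
    new: str
    reason: str = ""


@dataclass
class FunctionSuggestion:
    original_name: str
    function_name: str
    comment: str
    reasoning: str = ""
    variable_renames: List[VariableRename] = field(default_factory=list)


def parse_response(raw):
    """Parse the model's JSON into dataclasses; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of function suggestions")
    suggestions = []
    for entry in data:
        if not isinstance(entry, dict):
            raise ValueError("each suggestion must be a JSON object")
        try:
            renames = [VariableRename(**r) for r in entry.get("variable_renames", [])]
            suggestions.append(FunctionSuggestion(
                original_name=entry["original_name"],
                function_name=entry["function_name"],
                comment=entry["comment"],
                reasoning=entry.get("reasoning", ""),
                variable_renames=renames,
            ))
        except (KeyError, TypeError) as exc:
            # Missing required keys, unknown keys, or non-object renames.
            raise ValueError(f"malformed suggestion: {exc!r}") from exc
    return suggestions
```

Unlike Pydantic, dataclasses do not check field types, so this sketch validates structure only. Raising a single `ValueError` type for every malformed case lets the caller report one error in the Output window instead of handling JSON, key, and type errors separately.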

Name Validation

The plugin filters out generic/unhelpful names using regex patterns:

Rejects: generic names such as var5, tmp, foo, bar, helper, unused
Accepts: only names that match none of these generic patterns
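A regex-based filter in this spirit might look like the following. The patterns are illustrative assumptions built from the examples above (plus IDA's `sub_XXXX` auto-names), and `is_meaningful_name` is a hypothetical helper; the plugin's actual patterns in semray.py may be stricter or looser.

```python
import re

# Illustrative patterns only; the plugin's real regexes may differ.
GENERIC_NAME_PATTERNS = [
    re.compile(r"(var|arg|v|a)\d+", re.IGNORECASE),  # var5, arg2, v12, a1
    re.compile(r"(tmp|temp|foo|bar|baz|helper|unused)\d*", re.IGNORECASE),
    re.compile(r"sub_[0-9a-f]+", re.IGNORECASE),  # IDA auto-generated names
]
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_meaningful_name(name):
    """True if `name` is a C identifier and matches no generic pattern."""
    if not IDENTIFIER.fullmatch(name):
        return False
    return not any(p.fullmatch(name) for p in GENERIC_NAME_PATTERNS)


candidates = ["var5", "tmp", "helper", "sub_401000", "crc32_checksum", "parse_http_header"]
print([n for n in candidates if is_meaningful_name(n)])
```

Using `fullmatch` rather than `search` keeps names such as `tmp_buffer_len` acceptable while still rejecting a bare `tmp`.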

IDB Updates

When you apply changes, the plugin:

Sets function comments (with word wrapping)
Renames functions (with collision checking)
Renames local variables (using Hex-Rays API)
Marks affected functions as dirty
Refreshes pseudocode views automatically
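Two of these steps, comment wrapping and collision checking, are pure string logic and can be sketched without IDA. In this sketch the 80-column width and the `_2`, `_3` suffix scheme are assumptions, and `wrap_comment` and `unique_name` are hypothetical helpers; inside IDA, `existing` would be the names already in the database, and the results would be passed to calls such as `idc.set_func_cmt` and `ida_name.set_name`.

```python
import textwrap


def wrap_comment(text, width=80):
    """Wrap each line of `text` to `width` columns, preserving blank lines."""
    return "\n".join(
        textwrap.fill(line, width) if line.strip() else ""
        for line in text.split("\n")
    )


def unique_name(desired, existing):
    """Return `desired`, or the first free `desired_N` (N >= 2) if it is taken."""
    if desired not in existing:
        return desired
    suffix = 2
    while f"{desired}_{suffix}" in existing:
        suffix += 1
    return f"{desired}_{suffix}"


print(wrap_comment("Computes the CRC-32 checksum of a buffer using the reflected polynomial 0xEDB88320.", 40))
print(unique_name("parse_header", {"parse_header", "parse_header_2"}))
```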

Configuration

Model Selection

By default, SemRay uses gemini-flash-latest for speed and cost-efficiency. To change the model, edit semray.py:

```
DEFAULT_GEMINI_MODEL = "gemini-flash-latest"  # or "gemini-pro-latest"
MODELS_TO_REGISTER = [DEFAULT_GEMINI_MODEL]
```

Default Depths

Customize default analysis depths:

```
DEFAULT_CONTEXT_CALLER_DEPTH = 1
DEFAULT_CONTEXT_CALLEE_DEPTH = 1
DEFAULT_ANALYSIS_DEPTH = 1
```

Cross-Reference Types

Control which reference types are considered (when using CodeDumper):

```
DEFAULT_XREF_TYPES = {
    'direct_call',
    'indirect_call',
    'data_ref',
    'immediate_ref',
    'tail_call_push_ret',
    'virtual_call',
    'jump_table',
}
```

Safety Settings

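The plugin sets the blocking threshold to BLOCK_NONE for the four harm categories below, effectively disabling Gemini's content filtering so that reverse engineering content is not blocked. Adjust the thresholds in semray.py if needed: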

```
DEFAULT_SAFETY_SETTINGS = [
    types.SafetySetting(category='HARM_CATEGORY_HATE_SPEECH', threshold='BLOCK_NONE'),
    types.SafetySetting(category='HARM_CATEGORY_DANGEROUS_CONTENT', threshold='BLOCK_NONE'),
    types.SafetySetting(category='HARM_CATEGORY_HARASSMENT', threshold='BLOCK_NONE'),
    types.SafetySetting(category='HARM_CATEGORY_SEXUALLY_EXPLICIT', threshold='BLOCK_NONE'),
]
```

CodeDumper Integration

SemRay can optionally use the CodeDumper plugin for enhanced capabilities:

With CodeDumper

Advanced virtual call resolution via v-table analysis
Jump table detection and analysis
Detailed cross-reference reasons
PTN (Provenance Tracking Network) annotations showing data flow
More comprehensive context discovery

Without CodeDumper

Falls back to standard IDA API functions
Basic call graph analysis
Direct and indirect call tracking
Fully functional but with less contextual information

The plugin automatically detects CodeDumper and enables integration if available.

Troubleshooting

Plugin Not Loading

Check IDA Output Window for error messages:

"PyQt5 not found": Install PyQt5 in IDA's Python environment
"pydantic not found": Install pydantic (pip install pydantic)
"google-genai not found": Install google-genai (pip install google-genai)

API Key Issues

"GOOGLE_API_KEY environment variable not set":

Verify the environment variable is set: echo $GOOGLE_API_KEY (Linux/macOS), echo %GOOGLE_API_KEY% (Windows cmd), or $env:GOOGLE_API_KEY (Windows PowerShell)
Restart IDA Pro after setting the variable
Check for typos in the variable name
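To confirm that IDA itself sees the key, rather than only your terminal, you can run a check like the one below in IDA's Python console. IDA started from a desktop launcher or the macOS Finder/Dock may not inherit variables exported in ~/.bashrc or ~/.zshrc. The helper `api_key_status` is hypothetical, and it reports only the key's length, never its value.

```python
import os


def api_key_status(environ=None):
    """Describe whether GOOGLE_API_KEY is visible, without printing the key."""
    env = os.environ if environ is None else environ
    key = env.get("GOOGLE_API_KEY", "")
    if not key:
        return "GOOGLE_API_KEY is not set in this process"
    if key != key.strip():
        return "GOOGLE_API_KEY has leading or trailing whitespace"
    return f"GOOGLE_API_KEY is set ({len(key)} characters)"


print(api_key_status())
```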

Empty or Blocked Responses

"Google AI response was empty or blocked":

Check Google AI API quota/billing
Review safety settings if content is being filtered
Try a simpler function first to verify API connectivity

Decompilation Failures

"Decompilation FAILED":

Ensure Hex-Rays decompiler is installed and licensed
Try Assembly mode instead of Decompiled mode
Some functions may not decompile due to code complexity

Variable Rename Failures

Variables not renamed:

Ensure the function can be decompiled
Check that variable names match exactly (case-sensitive)
Some variables may be compiler-generated and cannot be renamed

Performance Issues

Slow analysis:

Reduce caller/callee depth (try 1 or 2 instead of higher values)
Analyze fewer functions at once
Use "Analyze CURRENT Func Only" for individual functions
Consider using gemini-flash-latest instead of gemini-pro-latest

Best Practices

Start Small: Begin with single function analysis to verify setup and understand results
Iterative Refinement: Analyze high-level functions first, then drill down into details
Context Balance: More context improves accuracy but increases cost and time
- Depth 1-2: Fast, good for focused analysis
- Depth 3+: Slower, better for understanding complex relationships
Review Carefully: Always review suggestions before applying them; AI can make mistakes
Backup Your IDB: Keep backups before applying large batch changes
Use Decompiled Mode: Generally provides better results than assembly
Check Naming Contract: Ensure suggested names follow your team's conventions

Architecture

Plugin Structure

```
semray/
├── semray.py              # Main plugin file
├── ida-plugin.json        # IDA plugin metadata
└── codedump/              # Optional CodeDumper integration
    ├── codedump.py        # Context discovery utilities
    ├── micro-analyzer.py  # Micro-architectural analysis
    └── ptn_utils.py       # PTN provenance tracking
```

Key Components

Configuration (Lines 127-173): Constants, API settings, models
Data Models (Lines 186-219): Pydantic schemas for validation
Context Builder (Lines 337-496): Gathers code, call graphs, semantics
Analysis Orchestrator (Lines 501-647): Manages the analysis pipeline
UI Components (Lines 650-908): PyQt5 widgets for results display
IDA Integration (Lines 911-1207): Actions, hooks, plugin lifecycle

Execution Flow

```
User Action (Right-click menu)
    ↓
CtxActionHandler.activate()
    ↓
async_call() orchestrates:
    ↓
1. build_context_material() [IDA main thread]
    ↓
2. Construct LLM prompt with context
    ↓
3. do_google_ai_analysis() [Background thread]
    ↓
4. Parse & validate JSON response
    ↓
5. do_show_ui() [UI thread]
    ↓
User reviews and clicks "Apply Selected"
    ↓
_perform_ida_updates() [IDA main thread]
    ↓
IDB updated, views refreshed
```
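The hand-off between threads, together with the changelog's "concurrent analysis prevention", can be sketched with the standard library. Assumptions: `start_analysis` is a hypothetical stand-in for part of `async_call()`, a non-blocking lock rejects overlapping runs, and a single worker thread performs the LLM request. In IDA, `on_done` would typically marshal its result back to the main thread, for example with `ida_kernwin.execute_sync(callback, ida_kernwin.MFF_WRITE)`; SemRay's actual mechanism may differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One background worker for LLM requests, plus a lock that rejects overlap.
_analysis_lock = threading.Lock()
_executor = ThreadPoolExecutor(max_workers=1)


def start_analysis(context, analyze, on_done):
    """Run analyze(context) on the worker thread and pass the result to on_done.

    Returns a Future, or None if an analysis is already in progress.
    """
    if not _analysis_lock.acquire(blocking=False):
        return None  # concurrent analysis prevention

    def worker():
        try:
            on_done(analyze(context))
        finally:
            _analysis_lock.release()  # a plain Lock may be released by another thread

    try:
        return _executor.submit(worker)
    except RuntimeError:
        _analysis_lock.release()  # executor already shut down; don't leak the lock
        raise
```

Acquiring the lock in the calling thread, before submitting, means a second click is rejected immediately instead of queueing behind the first request.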

License

This plugin is provided as-is for reverse engineering and security research purposes.

Contributing

Contributions are welcome! Key areas for improvement:

Support for additional LLM providers (OpenAI, Claude, etc.)
Enhanced prompt engineering for better results
Additional context extraction strategies
UI/UX improvements
Performance optimizations

Changelog

Current Version

Initial release with Google AI (Gemini) integration
Support for decompiled and assembly analysis modes
Optional CodeDumper integration
Interactive UI for reviewing suggestions
Concurrent analysis prevention
Comprehensive error handling and fallbacks

Support

For issues, questions, or feature requests, please check the output window in IDA Pro for diagnostic information and error messages.

Credits

Built on IDA Pro's powerful reverse engineering platform
Leverages Google's Gemini AI for semantic understanding
Integrates with CodeDumper plugin for enhanced context (optional)
Uses Pydantic for robust data validation

19h/ida-semray
