Safe Redact is a privacy-focused document redaction tool that automatically detects and redacts sensitive information in PDF and DOCX documents.
100% Local Processing • No Server Required • Smart Detection • Privacy First
- Automatic Detection: Uses regex patterns and ML-based detection to find sensitive information
- Multiple Detection Methods: Combines pattern matching with machine learning for high accuracy
- Customizable Patterns: Add your own predefined words and custom detection rules
- Language Support: Detects content in multiple languages and scripts (Latin, Chinese, Arabic, etc.)
- PDF Files: Full support for text-based PDFs
- DOCX Files: Microsoft Word document support
- Metadata Extraction: View and remove document metadata and hidden content
- Image Extraction: Extract and download embedded images from documents
- Interactive Review: Review and confirm detected entities before redaction
- Manual Selection: Click and drag to manually select sensitive areas
- Bulk Operations: Confirm or reject multiple entities at once
- Real-time Preview: See exactly what will be redacted
- 100% Local Processing: All processing happens in your browser
- No Server Required: No data is sent to external servers
- No Tracking: No analytics or telemetry
- Open Source: Fully auditable code
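The review features listed above (interactive review, bulk confirm or reject) imply some per-entity status that gates redaction. A minimal sketch of such a model follows. The type and function names (`DetectedEntity`, `setStatus`, `entitiesToRedact`) are hypothetical and chosen for illustration; they are not taken from the Safe Redact codebase.

```typescript
// Hypothetical review model; names and fields are assumptions for illustration.
type ReviewStatus = "pending" | "confirmed" | "rejected";

interface DetectedEntity {
  id: string;
  kind: string; // e.g. "email", "ssn"
  text: string;
  status: ReviewStatus;
}

// Bulk operation: return a new array where every entity whose id is listed
// gets the given status. The input array is left unchanged.
function setStatus(
  entities: DetectedEntity[],
  ids: string[],
  status: ReviewStatus
): DetectedEntity[] {
  return entities.map((e) =>
    ids.indexOf(e.id) !== -1 ? { ...e, status } : e
  );
}

// Only confirmed entities would be passed on to the redaction step.
function entitiesToRedact(entities: DetectedEntity[]): DetectedEntity[] {
  return entities.filter((e) => e.status === "confirmed");
}
```

Keeping the update immutable makes it straightforward to drive a preview from the current list, since each bulk action yields a fresh array.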
Safe Redact automatically detects the following types of sensitive information in both PDF and DOCX documents:
- Social Security Numbers (SSN): US Social Security Numbers in various formats
- Email Addresses: RFC 5322 compliant email addresses
- Phone Numbers:
- US phone numbers (various formats)
- International phone numbers
- China mobile numbers (11-digit format)
- China landline numbers
- Credit Card Numbers:
- Generic credit cards (13-19 digits with Luhn validation)
- Visa cards (13 or 16 digits)
- Mastercard (16 digits)
- American Express (15 digits)
- Discover cards
- China UnionPay cards
- Dates:
- MM/DD/YYYY format
- YYYY-MM-DD format
- Month DD, YYYY format
- Chinese date format (YYYYεΉ΄MMζDDζ₯)
- DD/MM/YYYY format (International)
- Chinese National ID: 18-digit ID with validation
- Passports:
- Chinese Passport (current and legacy formats)
- US Passport
- Network Information:
- IPv4 addresses
- IPv6 addresses
- URLs (HTTP/HTTPS)
- Cryptocurrency: Bitcoin addresses
- Custom Predefined Words: User-defined sensitive terms
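Two entries in the list above mention validation: generic credit cards (Luhn) and the 18-digit Chinese National ID. The sketch below shows what such checks could look like. It is illustrative only: the function names and the candidate regex are assumptions, and the README does not say which ID checks are performed, so the ISO 7064 MOD 11-2 check character used here is an assumed interpretation.

```typescript
// Illustrative only: names and the candidate regex are assumptions,
// not Safe Redact's actual implementation.

// Luhn checksum over a 13-19 digit string (spaces and hyphens are ignored).
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk right to left, doubling every second digit.
    let d = Number(digits.charAt(digits.length - 1 - i));
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Find runs of 13-19 digits (optionally separated by single spaces or
// hyphens) and keep only those that pass the Luhn check.
function findCardNumbers(text: string): string[] {
  const candidates: string[] = text.match(/\b(?:\d[ -]?){12,18}\d\b/g) || [];
  return candidates.filter((c) => passesLuhn(c));
}

// Verify the check character of an 18-character Chinese resident ID
// using ISO 7064 MOD 11-2 (weights 2^(17-i) mod 11).
function hasValidChineseIdChecksum(id: string): boolean {
  if (!/^\d{17}[\dXx]$/.test(id)) return false;
  const weights = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    sum += Number(id.charAt(i)) * weights[i];
  }
  return "10X98765432".charAt(sum % 11) === id.charAt(17).toUpperCase();
}
```

Running a checksum after the regex match is what keeps arbitrary long digit runs (order numbers, reference IDs) from being flagged as card numbers.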
When the "Sanitize Document" option is enabled, Safe Redact removes metadata and hidden content to prevent information leakage.
The following elements are removed from PDF documents:
Metadata:
- Title, Author, Subject, Keywords
- Creator, Producer
- Creation Date, Modification Date
- All other metadata fields
Hidden Content:
- Comments and annotations (all types)
- Markup annotations (highlights, underlines, strikeouts, etc.)
- Stamps and file attachments
- Multimedia content (sound, video)
- Form fields (optional)
- Embedded files
- JavaScript actions
- Optional Content Groups (PDF layers)
The following elements are removed from DOCX documents:
Metadata:
- Core Properties: Title, Author, Subject, Keywords, Creator, Last Modified By, Dates, Category, Content Status
- App Properties: Application name, Version, Company, Manager, Template
- Custom Properties: All custom metadata
Hidden Content:
- Comments and comment references
- Track Changes/Revisions (insertions, deletions, moves, formatting changes)
- Bookmarks
- Custom XML data
- Document Settings: Revision identifiers (RSIDs), proof errors, document protection
- VBA/Macros and macro data
- Headers and footers (optional)
- Embedded objects and files (optional)
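Several of the DOCX items above (tracked changes, comment references, bookmarks, proof errors, RSIDs) live as markup inside the main document XML. As a rough sketch of what removing them involves, the function below scrubs that markup from an XML string. It assumes the XML has already been read out of the DOCX archive, and it uses regular expressions for brevity; a real implementation would more likely use an XML parser. The function name `scrubDocumentXml` is hypothetical.

```typescript
// Hypothetical helper; a simplified, regex-based sketch, not Safe Redact's code.
// Input: the text of a WordprocessingML part such as word/document.xml.
function scrubDocumentXml(xml: string): string {
  return (
    xml
      // Self-closing revision markers (e.g. inside <w:rPr>) carry author and date.
      .replace(/<w:(?:ins|del)\b[^>]*\/>/g, "")
      // Tracked deletions: drop the wrapper together with the deleted content.
      .replace(/<w:del\b[^>]*>[\s\S]*?<\/w:del>/g, "")
      // Tracked insertions: keep the inserted content, drop the wrapper tags.
      .replace(/<\/?w:ins\b[^>]*>/g, "")
      // Comment anchors and references left in the body text.
      .replace(/<w:comment(?:RangeStart|RangeEnd|Reference)\b[^>]*\/>/g, "")
      // Bookmarks and proofing-error markers.
      .replace(/<w:(?:bookmarkStart|bookmarkEnd|proofErr)\b[^>]*\/>/g, "")
      // Revision identifier attributes such as w:rsidR and w:rsidRDefault.
      .replace(/\s+w:rsid\w*="[^"]*"/g, "")
  );
}
```

This covers only a subset of revision markup (moves and formatting changes are not handled). Separate parts such as the comments file, document properties, and any VBA project would also need to be dropped from the archive along with their relationship entries.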
- Node.js 18 or higher
- npm or yarn
- Clone the repository:
    git clone https://github.com/zhendong/safe-redact.git
    cd safe-redact
- Install dependencies:
    npm install
- Start the development server:
    npm run dev
The application will be available at http://localhost:5173
Build for production:
    npm run build
The built files will be in the dist/ directory.
Run the test suite:
    npm test
- React 19 + TypeScript: UI framework with type safety
- Vite: Fast build tool and dev server
- MuPDF: PDF parsing, rendering, and manipulation
- PizZip + Mammoth: DOCX processing
- Transformers.js: ML-based entity detection (optional)
- Tailwind CSS: Utility-first styling
- All document processing happens 100% locally in your browser
- No data is sent to external servers
- Documents never leave your device
- Open source and auditable
Contributions are welcome!
- Report bugs and issues
- Suggest new features
- Improve documentation
- Submit pull requests
- Add test cases
MIT License - see LICENSE file for details