A robust and feature-rich tool for searching text patterns across multiple file types. This advanced search utility goes beyond simple grep-like functionality by supporting PDFs, Microsoft Office documents, images (OCR), archives, and more.
- Multi-Format Support: Search across plain text, PDFs, Word documents, Excel spreadsheets, images (with OCR), and archives
- High Performance: Multi-threaded searching for faster results
- Beautiful UI: Rich terminal interface with progress bars and colorful output
- Advanced Filtering: Regular expressions, case sensitivity, and whole-word matching
- Smart Detection: Automatically detects available search capabilities based on installed libraries
- Detailed Results: Clear presentation of search matches with context
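
The filtering options listed above (regular expressions, case sensitivity, whole-word matching) can be sketched with Python's standard `re` module. This is a minimal illustration, not the code in search.py: `build_pattern` and `find_matches` are hypothetical names, and the way the real tool combines these options is an assumption.

```python
import re


def build_pattern(term, use_regex=False, case_sensitive=True, whole_word=False):
    """Compile a search term into a regular expression (hypothetical helper)."""
    # Treat the term as literal text unless regex mode is requested.
    body = term if use_regex else re.escape(term)
    if whole_word:
        # \b anchors the match at word boundaries on both sides.
        body = r"\b(?:" + body + r")\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)


def find_matches(pattern, text):
    """Return (line_number, line) pairs for every line containing a match."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    pattern = build_pattern("bug", case_sensitive=False, whole_word=True)
    # "debugging" is skipped because "bug" is not a whole word there.
    print(find_matches(pattern, "Bug found\ndebugging\nno issue"))
```

Escaping the term in non-regex mode keeps characters such as `.` or `*` from being interpreted as pattern syntax.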
# Clone the repository
git clone https://github.com/yourusername/advanced-file-search.git
cd advanced-file-search
# Install dependencies
pip install -r requirements.txt
# Install additional format support as needed
pip install PyPDF2 pdfplumber # PDF support
pip install python-docx # Word document support
pip install openpyxl xlrd # Excel spreadsheet support
pip install easyocr Pillow # OCR for images
pip install chardet # Advanced encoding detection

python search.py "search term" # Search in current directory
python search.py "function main" /path/to/code # Search in specific directory

python search.py "TODO" --include "*.py" # Search only Python files
python search.py "error" --exclude "*.log" # Exclude log files
python search.py --regex "function.*main" . # Regular expression search
python search.py "bug" --no-case-sensitive # Case-insensitive search
python search.py "test" --max-workers 8 # Use 8 threads

python search.py --capabilities # Show available search capabilities
python search.py --extensions # Show all supported file extensions
python search.py --lib-purposes # Show purposes of all used libraries

advanced-file-search/
├── search.py # Main script
├── pyproject.toml # Project configuration
├── requirements.txt # Dependencies
├── LICENSE # MIT License
├── README.md # This file
├── test_search.py # Test suite
└── docs/ # Documentation
    ├── CHANGELOG.md # Version history
    └── CONTRIBUTING.md # Contribution guidelines
# Run all tests
pytest
# Run specific test file
pytest test_search.py
# Run with verbose output
pytest -v test_search.py

- Text Files: .py, .js, .html, .css, .md, .json, .xml, .yml, .txt, etc.
- PDF Documents: .pdf
- Word Documents: .docx, .doc, .rtf
- Spreadsheets: .xlsx, .xls, .xlsm, .xlsb, .ods
- Images (with OCR): .png, .jpg, .jpeg, .gif, .bmp, .tiff, etc.
- Archives: .zip, .tar, .gz, .bz2, .7z, etc.
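
Which of the formats above can actually be searched depends on the optional libraries that are installed. The sketch below shows one way such detection could work, using `importlib.util.find_spec` to test for a module without importing it. The `OPTIONAL_MODULES` mapping and the function names are assumptions built from the install commands in this README, not the tool's actual internals.

```python
from importlib.util import find_spec

# Assumed mapping from a format to the modules that enable it.
# Note that the import name can differ from the pip package name.
OPTIONAL_MODULES = {
    "pdf": ["PyPDF2", "pdfplumber"],
    "word": ["docx"],            # installed as python-docx
    "excel": ["openpyxl", "xlrd"],
    "ocr": ["easyocr", "PIL"],   # PIL is provided by Pillow
    "encoding": ["chardet"],
}


def is_installed(module_name):
    """Return True if the module can be located, without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def detect_capabilities(modules=OPTIONAL_MODULES):
    """Map each format to the list of its supporting modules that were found."""
    return {
        fmt: [name for name in names if is_installed(name)]
        for fmt, names in modules.items()
    }


if __name__ == "__main__":
    for fmt, found in detect_capabilities().items():
        status = ", ".join(found) if found else "not available"
        print(f"{fmt}: {status}")
```

An empty list for a format means none of its supporting modules were found. Whether a format needs one or all of its listed modules is left open here.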
- Use --include patterns to limit search to specific file types
- Use --exclude patterns to skip irrelevant files
- Adjust --max-workers based on your CPU cores and file count
- Consider file size limits (files over 100MB are skipped by default)
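
To make the tips above concrete, here is a rough sketch of how include/exclude patterns, a size limit, and a worker count could interact. It is an illustration under stated assumptions: all function names are hypothetical, patterns are assumed to be shell-style globs matched against the file name, and 100MB is taken to mean 100 * 1024 * 1024 bytes.

```python
import fnmatch
import os
from concurrent.futures import ThreadPoolExecutor

MAX_FILE_SIZE = 100 * 1024 * 1024  # assumed byte value of the 100MB default


def matches_patterns(name, include=None, exclude=None):
    """Apply glob-style include and exclude patterns to a file name."""
    if include and not any(fnmatch.fnmatch(name, p) for p in include):
        return False
    if exclude and any(fnmatch.fnmatch(name, p) for p in exclude):
        return False
    return True


def should_search(path, include=None, exclude=None, max_size=MAX_FILE_SIZE):
    """Return True if the file passes the name filters and the size limit."""
    if not matches_patterns(os.path.basename(path), include, exclude):
        return False
    try:
        return os.path.getsize(path) <= max_size
    except OSError:
        # Unreadable or missing files are skipped rather than raising.
        return False


def default_workers(file_count):
    """Pick a worker count bounded by CPU cores and the number of files."""
    cpu_count = os.cpu_count() or 1
    return max(1, min(cpu_count, file_count))


def search_all(paths, search_one, max_workers=None):
    """Run search_one on every path in a thread pool, preserving input order."""
    paths = list(paths)
    workers = max_workers or default_workers(len(paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(search_one, paths))
```

Filtering before dispatch keeps large or irrelevant files from ever occupying a worker, which is why narrowing the file set usually helps more than raising the thread count.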
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request