fScan is a flexible and powerful file content search tool built in Python. It supports multiple file formats—text and binary—including .txt, .md, .json, .pdf, .docx, and more. With customizable search strategies such as regex, fuzzy matching, and semantic search, fScan helps you find exactly what you're looking for—fast and intelligently.

alaamer12/fScan

Advanced File Search Tool

A robust and feature-rich tool for searching text patterns across multiple file types. This advanced search utility goes beyond simple grep-like functionality by supporting PDFs, Microsoft Office documents, images (OCR), archives, and more.

🌟 Features

  • Multi-Format Support: Search across plain text, PDFs, Word documents, Excel spreadsheets, images (with OCR), and archives
  • High Performance: Multi-threaded searching for faster results
  • Beautiful UI: Rich terminal interface with progress bars and colorful output
  • Advanced Filtering: Regular expressions, case sensitivity, and whole-word matching
  • Smart Detection: Automatically detects available search capabilities based on installed libraries
  • Detailed Results: Clear presentation of search matches with context
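
The "Smart Detection" feature can be pictured with a short sketch. This is not fScan's actual code: the function name `detect_capabilities` and the label-to-module table are assumptions made for illustration. It only checks whether the optional packages named in the installation section are importable, without importing them.

```python
from importlib.util import find_spec

# Label -> import name. The labels are hypothetical; the import names are the
# ones used by the optional packages listed under Installation
# (python-docx is imported as "docx", Pillow as "PIL").
OPTIONAL_MODULES = {
    "PDF (PyPDF2)": "PyPDF2",
    "PDF (pdfplumber)": "pdfplumber",
    "Word (python-docx)": "docx",
    "Excel (openpyxl)": "openpyxl",
    "Excel (xlrd)": "xlrd",
    "OCR (easyocr)": "easyocr",
    "Images (Pillow)": "PIL",
    "Encoding detection (chardet)": "chardet",
}


def detect_capabilities(modules=OPTIONAL_MODULES):
    """Return {label: bool}, True when the backing module can be imported."""
    return {label: find_spec(name) is not None for label, name in modules.items()}


if __name__ == "__main__":
    for label, available in detect_capabilities().items():
        print(f"{label:30s} {'available' if available else 'missing'}")
```

Using `find_spec` rather than a real `import` keeps the check cheap, which matters for heavy packages such as OCR backends.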

📋 Installation

# Clone the repository
git clone https://github.com/yourusername/advanced-file-search.git
cd advanced-file-search

# Install dependencies
pip install -r requirements.txt

# Install additional format support as needed
pip install PyPDF2 pdfplumber  # PDF support
pip install python-docx        # Word document support
pip install openpyxl xlrd      # Excel spreadsheet support
pip install easyocr Pillow     # OCR for images
pip install chardet            # Advanced encoding detection

📊 Usage

Basic Usage

python search.py "search term"                  # Search in current directory
python search.py "function main" /path/to/code  # Search in specific directory

Advanced Options

python search.py "TODO" --include "*.py"        # Search only Python files
python search.py "error" --exclude "*.log"      # Exclude log files
python search.py --regex "function.*main" .     # Regular expression search
python search.py "bug" --no-case-sensitive      # Case-insensitive search
python search.py "test" --max-workers 8         # Use 8 threads
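
As a rough illustration of what these options could mean, the sketch below turns a search term into a compiled pattern and applies include/exclude globs to file names. It is a sketch under assumptions, not the tool's implementation: `build_pattern`, `is_selected`, and their parameters are hypothetical names, and only the standard `re` and `fnmatch` modules are used.

```python
import fnmatch
import re


def build_pattern(term, regex=False, case_sensitive=True, whole_word=False):
    """Compile a search term into a regular expression object."""
    # Literal terms are escaped so characters like "." match themselves.
    body = term if regex else re.escape(term)
    if whole_word:
        # \b anchors the match to word boundaries on both sides.
        body = rf"\b(?:{body})\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)


def is_selected(filename, include=None, exclude=None):
    """Apply glob-style include/exclude lists to a file name."""
    if include and not any(fnmatch.fnmatch(filename, p) for p in include):
        return False
    if exclude and any(fnmatch.fnmatch(filename, p) for p in exclude):
        return False
    return True


if __name__ == "__main__":
    pattern = build_pattern("bug", case_sensitive=False)
    print(bool(pattern.search("Known BUG in parser")))
    print(is_selected("app.py", include=["*.py"]))
```

Escaping literal terms and treating only `regex=True` input as a pattern avoids surprises when a plain search string happens to contain regex metacharacters.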

Information Commands

python search.py --capabilities                # Show available search capabilities
python search.py --extensions                  # Show all supported file extensions
python search.py --lib-purposes                # Show purposes of all used libraries

🗂️ Folder Structure

advanced-file-search/
├── search.py             # Main script
├── pyproject.toml        # Project configuration
├── requirements.txt      # Dependencies
├── LICENSE               # MIT License
├── README.md             # This file
├── test_search.py        # Test suite
└── docs/                 # Documentation
    ├── CHANGELOG.md      # Version history
    └── CONTRIBUTING.md   # Contribution guidelines

🧪 Running Tests

# Run all tests
pytest

# Run specific test file
pytest test_search.py

# Run with verbose output
pytest -v test_search.py

📦 Supported File Types

  • Text Files: .py, .js, .html, .css, .md, .json, .xml, .yml, .txt, etc.
  • PDF Documents: .pdf
  • Word Documents: .docx, .doc, .rtf
  • Spreadsheets: .xlsx, .xls, .xlsm, .xlsb, .ods
  • Images (with OCR): .png, .jpg, .jpeg, .gif, .bmp, .tiff, etc.
  • Archives: .zip, .tar, .gz, .bz2, .7z, etc.
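
One simple way to route a file to the right handler is a lookup by extension. The sketch below is an assumption about how such a lookup might be organized, not a description of fScan's internals; the category names and the `classify` helper are hypothetical, and the table contains only the extensions listed above (the "etc." entries are left out).

```python
from pathlib import Path

# Extensions copied from the list above; category keys are illustrative.
CATEGORIES = {
    "text": {".py", ".js", ".html", ".css", ".md", ".json", ".xml", ".yml", ".txt"},
    "pdf": {".pdf"},
    "word": {".docx", ".doc", ".rtf"},
    "spreadsheet": {".xlsx", ".xls", ".xlsm", ".xlsb", ".ods"},
    "image": {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff"},
    "archive": {".zip", ".tar", ".gz", ".bz2", ".7z"},
}


def classify(path):
    """Return the category for a path, or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()  # only the final suffix, case-insensitive
    for category, extensions in CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["Report.PDF", "backup.tar.gz", "Makefile"]:
        print(name, "->", classify(name))
```

Note that `Path.suffix` returns only the last extension, so `backup.tar.gz` is classified by `.gz`.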

📊 Performance Tips

  • Use --include patterns to limit search to specific file types
  • Use --exclude patterns to skip irrelevant files
  • Adjust --max-workers based on your CPU cores and file count
  • Consider file size limits (files over 100MB are skipped by default)
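
The interaction between worker count and the size limit can be sketched as follows. This is a simplified plain-text version written for illustration only: `search_file`, `search_tree`, and `MAX_FILE_SIZE` are hypothetical names, the 100MB limit is assumed to mean 100 * 1024 * 1024 bytes, and the real tool's handling of PDFs, Office files, and archives is not modeled.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed interpretation of the "100MB" default mentioned above.
MAX_FILE_SIZE = 100 * 1024 * 1024


def search_file(path, term):
    """Return (path, line numbers containing term); unreadable files give no hits."""
    hits = []
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as fh:
            for lineno, line in enumerate(fh, start=1):
                if term in line:
                    hits.append(lineno)
    except OSError:
        pass
    return path, hits


def search_tree(root, term, max_workers=4, max_size=MAX_FILE_SIZE):
    """Search every file under root that is no larger than max_size bytes."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getsize(full) <= max_size:
                    paths.append(full)
            except OSError:
                continue  # file vanished or is not accessible
    # Threads suit this workload because most of the time is spent on file I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: search_file(p, term), paths)
        return {path: hits for path, hits in results if hits}


if __name__ == "__main__":
    for path, lines in search_tree(".", "TODO", max_workers=8).items():
        print(path, lines)
```

Filtering by size before submitting work keeps oversized files from occupying a worker at all, which is why narrowing the file set tends to help more than adding threads.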

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
