A web scraper using Crawl4AI to find real estate developers who play golf, with contact information extraction (email, phone) and a Streamlit dashboard for data visualization. This project implements a FAISS-based vector knowledge graph for storing and querying entities and relationships.
- Install the package: `pip install -e .`
- Run the scraper: `python -m construction_scraper scrape`
- Launch the dashboard: `python -m construction_scraper dashboard`
- construction_scraper/: Main package
  - core/: Core data models and structures
  - knowledge_graph/: FAISS-based vector knowledge graph implementation
  - scrapers/: Web scrapers for collecting data
  - utils/: Utility functions and helpers
  - web/: Web dashboard for data visualization
- tests/: Test files
- docs/: Documentation
- scripts/: Utility scripts
- data/: Data directory (created automatically)
- Advanced Data Collection:
  - Uses Crawl4AI for efficient, LLM-friendly web crawling
  - Automatically follows relevant links to discover more information
  - Identifies real estate developers with golf connections
- Contact Information Extraction:
  - Extracts email addresses using regex pattern matching
  - Identifies phone numbers in various formats
  - Links contacts to specific developer profiles
- FAISS Vector Knowledge Graph:
  - Stores entities and relationships in a structured knowledge graph
  - Enables semantic similarity search using FAISS vector embeddings
  - Maintains persistent data storage using SQLite
  - Performs network analysis to identify key influencers
  - Visualizes connections between developers, golf entities, and companies
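
The contact extraction step described above can be illustrated with a minimal sketch. The patterns and the `extract_contacts` and `normalize_phone` helpers are assumptions made for illustration and are not taken from the project's code; the phone pattern only covers common North American formats.

```python
import re

# Illustrative patterns only; the project's actual expressions may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional +1 prefix, optional parentheses around the area code,
# and space, dot, or hyphen separators.
PHONE_RE = re.compile(
    r"(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}"
)


def normalize_phone(raw: str) -> str:
    """Strip formatting and keep the last ten digits of a matched number."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]


def extract_contacts(developer: str, text: str) -> dict:
    """Return de-duplicated emails and phones found in text, tied to a developer."""
    emails = sorted({match.lower() for match in EMAIL_RE.findall(text)})
    phones = sorted({normalize_phone(match) for match in PHONE_RE.findall(text)})
    return {"developer": developer, "emails": emails, "phones": phones}


if __name__ == "__main__":
    page_text = "Reach Jane at Jane.Doe@example.com or (555) 123-4567."
    print(extract_contacts("Jane Doe", page_text))
```

Normalizing phone numbers to digits before de-duplicating means the same number written in two formats is stored once per developer.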
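
The SQLite persistence and influencer analysis mentioned above could look roughly like the following sketch. The table layout and the `open_graph`, `add_entity`, `add_relationship`, and `top_influencers` names are hypothetical, chosen for illustration; the FAISS embedding index is left out, and "influence" is approximated here by simple degree (the number of relationships an entity takes part in), which may differ from the analysis the project actually performs.

```python
import sqlite3


def open_graph(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the (hypothetical) graph tables if missing."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS entities (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            kind TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS relationships (
            source_id INTEGER NOT NULL REFERENCES entities(id),
            target_id INTEGER NOT NULL REFERENCES entities(id),
            relation TEXT NOT NULL,
            UNIQUE (source_id, target_id, relation)
        );
        """
    )
    return conn


def add_entity(conn: sqlite3.Connection, name: str, kind: str) -> int:
    """Insert an entity if it is new and return its row id."""
    conn.execute(
        "INSERT OR IGNORE INTO entities (name, kind) VALUES (?, ?)", (name, kind)
    )
    row = conn.execute("SELECT id FROM entities WHERE name = ?", (name,)).fetchone()
    return row[0]


def add_relationship(
    conn: sqlite3.Connection, source_id: int, target_id: int, relation: str
) -> None:
    """Record a directed edge between two entities, ignoring duplicates."""
    conn.execute(
        "INSERT OR IGNORE INTO relationships VALUES (?, ?, ?)",
        (source_id, target_id, relation),
    )


def top_influencers(conn: sqlite3.Connection, limit: int = 5) -> list:
    """Rank entities by degree, breaking ties alphabetically by name."""
    return conn.execute(
        """
        SELECT e.name, COUNT(*) AS degree
        FROM entities AS e
        JOIN relationships AS r ON e.id IN (r.source_id, r.target_id)
        GROUP BY e.id
        ORDER BY degree DESC, e.name ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_graph()
    jane = add_entity(conn, "Jane Doe", "developer")
    club = add_entity(conn, "Pine Valley Golf Club", "golf")
    add_relationship(conn, jane, club, "member_of")
    print(top_influencers(conn))
```

Passing a file path instead of `":memory:"` would keep the graph between runs; call `conn.commit()` after writes in that case.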
See the docs/ directory for detailed documentation, including:
- DEVELOPMENT.md: Detailed architecture and implementation notes
License: MIT