⚡ Thordata: Global Proxy Network for AI & Web Data

Residential, Mobile, ISP & Datacenter proxies, plus Scraping APIs
— built for AI pipelines, growth teams, and large-scale web data collection.

Website • Docs • Dashboard • Python SDK (PyPI)

🧩 Product Overview

Thordata provides a full‑stack web data platform:

1. Proxy Network (Core)

| Product | Description |
| --- | --- |
| Residential Proxy | City‑level IP rotation for difficult targets. |
| Mobile Proxy | 4G/5G carrier IPs for mobile‑only experiences. |
| Static ISP Proxy | Static, ISP‑grade IPs with high trust. |
| Datacenter Proxy | High‑bandwidth IPs for bulk crawling. |
| Datacenter ISP Proxy | Blended ISP + DC routes for performance & trust. |

All proxies are exposed via a simple HTTP/HTTPS gateway.
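
As a rough illustration of what "HTTP/HTTPS gateway" means in practice, the sketch below routes a request through an authenticated proxy using only the Python standard library. The host, port, and credential values are placeholders and the helper names are made up for this example; the real gateway address and username format come from the Thordata dashboard and docs, not from this snippet.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def fetch_via_proxy(url: str, proxy_url: str, timeout: float = 15.0) -> str:
    """Fetch `url` through the proxy and return the response body as text."""
    # The same gateway is used for both plain HTTP and HTTPS targets.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Placeholder values only (assumptions, not real Thordata endpoints):
# proxy_url = build_proxy_url("USERNAME", "PASSWORD", "gateway.example.com", 8000)
# print(fetch_via_proxy("http://httpbin.org/ip", proxy_url))
```

Percent-encoding the credentials keeps characters such as `@` or `:` in a password from breaking the proxy URL.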

2. Scraping APIs

| API | Description |
| --- | --- |
| SERP API | Real‑time Google/Bing/Yandex/DuckDuckGo search results with rich options. |
| Universal Scraper | JS‑rendered HTML/PNG from any URL, bypassing antibot systems. |
| Web Scraper API | Task‑based scraping using pre‑built spiders from the Web Scraper Store. |

3. Data Layer (In Progress)

| Product | Description |
| --- | --- |
| Datasets | Ready‑to‑use web datasets for AI training and analytics. |
| Integrations | RAG pipelines, vector databases, MCP toolchains, and more. |

⚙️ SDKs & Core Clients

Official SDKs and low‑level clients for accessing Thordata products.

thordata-python-sdk
Modern Python SDK (published as thordata-sdk) with sync & async clients for: Residential / Datacenter / Mobile proxies, SERP API, Universal Scraper API, and Web Scraper task management.

Planned:

thordata-node-sdk — Node.js SDK for proxies + SERP

thordata-go-sdk — Go SDK for proxy & scraping workloads

🤖 AI & LLM Integrations

Tools and examples that connect Thordata with AI agents, RAG pipelines, and model tool ecosystems.

thordata-cookbook
A collection of end‑to‑end recipes:
- RAG data pipeline with Universal Scraper → HTML cleaning → Markdown
- Web QA Agent: question → SERP search → page scraping → LLM answer
- MCP tools: expose search_web, search_news, read_website, extract_links to LLMs
- GitHub repository intelligence and app‑store review analysis

thordata-langchain-tools
LangChain tools powered by Thordata:
- ThordataSerpTool — real‑time web search via SERP API
- ThordataScrapeTool — universal single‑page scraping with optional JS rendering

thordata-web-qa-agent
CLI Web Q&A agent: question → Thordata SERP → Universal Scraper → HTML cleaning → OpenAI answer.
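
Several of the pipelines above include an "HTML cleaning" step between scraping and the LLM. This README does not say how the repositories implement it, so the following is only a minimal standard-library sketch, assuming that dropping script/style content and collapsing whitespace is sufficient. `TextExtractor` and `clean_html` are illustrative names, not part of thordata-cookbook or the SDK.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style>, and <noscript>."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def clean_html(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The output of `clean_html` is plain text; converting headings and links to Markdown would need additional handling per tag.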

🌍 Proxy & Network Examples

Quick‑start examples for using Thordata's proxy network.

thordata-proxy-examples
Minimal examples showing how to:
- Send HTTP requests via Thordata Residential / Mobile / Datacenter proxies (Python + curl)
- Configure basic geo‑targeting (country / city‑level)
- Run concurrent IP checks and simple health monitoring
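
The concurrent IP check mentioned above could be structured roughly as follows. This is a sketch, not the code from thordata-proxy-examples: `fetch` is a caller-supplied function (hypothetical here) that performs one request through the proxy and returns the exit IP as a string, which keeps the concurrency logic independent of any particular HTTP client.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def check_once(fetch: Callable[[], str]) -> dict:
    """Run one check and record success, exit IP, and elapsed seconds."""
    start = time.perf_counter()
    try:
        ip = fetch()
        return {"ok": True, "ip": ip, "seconds": time.perf_counter() - start}
    except Exception as exc:  # a failed check is data, not a crash
        return {"ok": False, "error": str(exc), "seconds": time.perf_counter() - start}


def run_checks(fetch: Callable[[], str], n: int = 10, workers: int = 5) -> list:
    """Run `n` checks across a thread pool and return all results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(check_once, fetch) for _ in range(n)]
        return [f.result() for f in futures]


def summarize(results: list) -> dict:
    """Summarize total checks, successes, and distinct exit IPs seen."""
    ok = [r for r in results if r["ok"]]
    return {
        "total": len(results),
        "ok": len(ok),
        "unique_ips": len({r["ip"] for r in ok}),
    }
```

With a rotating proxy, `unique_ips` close to `ok` suggests rotation is working; a low success count is a simple health signal.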

Planned:

thordata-proxy-docker — dockerized local forward proxy using Thordata credentials

📰 Google & SERP Examples

SERP‑based examples focused on Google and news use cases.

google-news-scraper
Full‑featured CLI example for engine=google_news, supporting:
- q (query), hl (language), gl (country)
- topic_token, publication_token, section_token, story_token, so
- CSV export of structured news results via Thordata SERP API

google-play-reviews-rag
Google Play app reviews analysis + RAG: fetch reviews via Thordata Web Scraper, build an embeddings index, and answer questions about user sentiment.
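
For the google-news-scraper options listed above, a request-parameter builder and CSV export might be sketched like this. The parameter names (`q`, `hl`, `gl`, `topic_token`, and so on) come from the list above; the result columns in `FIELDS` and both function names are assumptions made for illustration, since the actual response schema is not described here.

```python
import csv
import io

# Hypothetical result columns; adjust to the real SERP response fields.
FIELDS = ["title", "source", "link", "date"]


def build_params(q: str, hl=None, gl=None, **tokens) -> dict:
    """Build a google_news parameter dict, omitting options left as None."""
    params = {"engine": "google_news", "q": q}
    optional = {"hl": hl, "gl": gl, **tokens}
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


def to_csv(rows, fields=FIELDS) -> str:
    """Serialize result dicts to CSV text, filling missing columns with ''."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only known columns so unexpected keys do not raise.
        writer.writerow({f: row.get(f, "") for f in fields})
    return buf.getvalue()
```

Writing to an in-memory buffer first makes the export easy to test; the same text can then be written to a file opened with `newline=""`.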

Planned:

Generic Google web search examples (engine=google)

Google Maps / Play / Shopping examples as separate repositories

📚 Tutorials & Notebooks

Hands‑on guides and notebooks to help you build data pipelines on top of Thordata.

Most of these live in thordata-cookbook:

notebooks/rag/rag_openai_research.ipynb —
Prepare dynamic HTML content (e.g. OpenAI Research) for RAG by scraping, cleaning, and exporting to Markdown.
notebooks/devtools/github_repo_intel.ipynb —
Use Web Scraper API spiders to collect GitHub repository metadata (stars, issues, contributors, languages) into a Pandas DataFrame.
notebooks/ai/web_qa_agent_with_thordata.ipynb —
End‑to‑end "Web Q&A Agent": question → SERP search → Universal Scraper → HTML cleaning → LLM answer.

All notebooks support:

Live mode — call Thordata APIs and cache results under data/
Offline mode — reuse cached HTML/JSON without consuming credits
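
The live/offline split can be pictured as a thin caching wrapper around whatever function calls the API. The sketch below is an assumption about how such a wrapper could work, not the notebooks' actual code: `fetch` is a hypothetical caller-supplied function returning JSON-serializable data, and cache files are named by a hash of the request key.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir, key: str) -> Path:
    """Map a request key (e.g. a URL) to a stable JSON file path."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.json"


def fetch_cached(key: str, fetch, cache_dir="data", live: bool = True):
    """Live mode: call `fetch` and store the result. Offline mode: read the cache only."""
    path = cache_path(cache_dir, key)
    if live:
        result = fetch(key)  # this is the call that would consume credits
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result
    if not path.exists():
        raise FileNotFoundError(f"No cached result for {key!r}; rerun in live mode")
    return json.loads(path.read_text(encoding="utf-8"))
```

Failing loudly in offline mode when nothing is cached avoids silently falling back to a paid API call.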

🚀 Quick Start (Python)

Install the SDK:

pip install thordata-sdk

1. Initialize the client

from thordata import ThordataClient

client = ThordataClient(
    scraper_token="YOUR_SCRAPER_TOKEN",
    public_token="YOUR_PUBLIC_TOKEN",
    public_key="YOUR_PUBLIC_KEY",
)

2. Send a request via the proxy network

resp = client.get("http://httpbin.org/ip")
print(resp.json())  # → see your Thordata exit IP

3. Run a SERP search

from thordata import Engine

results = client.serp_search(
    query="Thordata proxy network",
    engine=Engine.GOOGLE,
    num=5,
    # Pass engine‑specific params via **kwargs, e.g.:
    # engine="google_news", location="United States"
)

print("Organic results:", len(results.get("organic", [])))

4. Universal Scraper (HTML)

html = client.universal_scrape(
    url="https://www.thordata.com",
    js_render=True,
    output_format="html",
)
print(html[:500])

📚 Further Resources

For more advanced examples (Web Scraper tasks, async high‑concurrency, RAG pipelines, MCP tools), see:

SDK examples → thordata-python-sdk/examples
Cookbook scripts & notebooks → thordata-cookbook

🤝 Community & Support

Dashboard: https://www.thordata.com/
Docs: https://doc.thordata.com
Python SDK: https://github.com/Thordata/thordata-python-sdk
Cookbook: https://github.com/Thordata/thordata-cookbook
Support: [email protected]

If you are building something interesting on top of Thordata (RAG pipelines, AI agents, dashboards), feel free to open an issue and share your project — we are happy to feature selected community examples.

Thordata powers the proxy network and web data pipelines behind modern AI.

Last updated: 2025‑12‑01
