Thordata‑readme is the official GitHub developer portal for the Thordata data collection and proxy platform. It gives a one-stop overview of the platform's core open-source SDKs, tutorials, and integration links.

Thordata/Thordata

⚡ Thordata: Global Proxy Network for AI & Web Data

Residential, Mobile, ISP & Datacenter proxies, plus Scraping APIs
— built for AI pipelines, growth teams, and large-scale web data collection.

Website | Docs | Dashboard | Python SDK (PyPI)


🧩 Product Overview

Thordata provides a full‑stack web data platform:

1. Proxy Network (Core)

| Product | Description |
| --- | --- |
| Residential Proxy | City‑level IP rotation for difficult targets. |
| Mobile Proxy | 4G/5G carrier IPs for mobile‑only experiences. |
| Static ISP Proxy | Static, ISP‑grade IPs with high trust. |
| Datacenter Proxy | High‑bandwidth IPs for bulk crawling. |
| Datacenter ISP Proxy | Blended ISP + DC routes for performance & trust. |

All proxies are exposed via a simple HTTP/HTTPS gateway.
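
Because the proxies sit behind an HTTP/HTTPS gateway, any HTTP client that accepts a proxy URL should be able to use them. The following is a minimal standard-library sketch of that idea, not the official client. The host `proxy.example.invalid`, the port `9999`, and the credentials are placeholders chosen for illustration; take the real values from your dashboard.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return an HTTP proxy URL with percent-encoded credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def make_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder gateway and credentials (assumptions, not real Thordata values).
PROXY_URL = build_proxy_url("USERNAME", "p@ss:word", "proxy.example.invalid", 9999)
opener = make_opener(PROXY_URL)

# With real values, this line would send the request through the gateway:
# print(opener.open("http://httpbin.org/ip", timeout=10).read().decode())
```

Percent-encoding the credentials matters when a password contains characters such as `@` or `:`, which would otherwise be parsed as part of the URL structure.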

2. Scraping APIs

| API | Description |
| --- | --- |
| SERP API | Real‑time Google/Bing/Yandex/DuckDuckGo search results with rich options. |
| Universal Scraper | JS‑rendered HTML/PNG from any URL, bypassing antibot systems. |
| Web Scraper API | Task‑based scraping using pre‑built spiders from the Web Scraper Store. |

3. Data Layer (In Progress)

| Product | Description |
| --- | --- |
| Datasets | Ready‑to‑use web datasets for AI training and analytics. |
| Integrations | RAG pipelines, vector databases, MCP toolchains, and more. |

⚙️ SDKs & Core Clients

Official SDKs and low‑level clients for accessing Thordata products.

  • thordata-python-sdk
    Modern Python SDK (published as thordata-sdk) with sync & async clients for: Residential / Datacenter / Mobile proxies, SERP API, Universal Scraper API, and Web Scraper task management.

Planned:

  • thordata-node-sdk — Node.js SDK for proxies + SERP
  • thordata-go-sdk — Go SDK for proxy & scraping workloads

🤖 AI & LLM Integrations

Tools and examples that connect Thordata with AI agents, RAG pipelines, and model tool ecosystems.

  • thordata-cookbook
    A collection of end‑to‑end recipes:

    • RAG data pipeline with Universal Scraper → HTML cleaning → Markdown
    • Web QA Agent: question → SERP search → page scraping → LLM answer
    • MCP tools: expose search_web, search_news, read_website, extract_links to LLMs
    • GitHub repository intelligence and app‑store review analysis
  • thordata-langchain-tools
    LangChain tools powered by Thordata:

    • ThordataSerpTool — real‑time web search via SERP API
    • ThordataScrapeTool — universal single‑page scraping with optional JS rendering
  • thordata-web-qa-agent
    CLI Web Q&A agent: question → Thordata SERP → Universal Scraper → HTML cleaning → OpenAI answer.
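
The "HTML cleaning → Markdown" step appears in both the RAG recipe and the Web Q&A agent above. As a rough illustration of what such a step involves, here is a standard-library sketch that keeps headings, paragraphs, and list items and drops scripts and styles. It is an assumption about the general technique, not the cookbook's actual implementation, and `MarkdownExtractor` and `html_to_markdown` are hypothetical names.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Convert a small subset of HTML (h1-h3, p, li) into Markdown-like text."""

    SKIP = {"script", "style", "noscript"}
    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []       # finished text blocks, in document order
        self._buf = []         # text fragments of the block being built
        self._prefix = ""      # Markdown prefix for the current block
        self._skip_depth = 0   # > 0 while inside script/style/noscript

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.PREFIX:
            self.flush()
            self._prefix = self.PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.PREFIX:
            self.flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buf.append(data)

    def flush(self):
        """Emit the buffered text as one block with collapsed whitespace."""
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(self._prefix + text)
        self._buf, self._prefix = [], ""


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block tag
    return "\n\n".join(parser.blocks)


SAMPLE = (
    "<html><head><style>p{}</style></head><body><h1>Title</h1>"
    "<p>Hello <b>world</b>.</p><script>x()</script>"
    "<ul><li>One</li></ul></body></html>"
)
markdown = html_to_markdown(SAMPLE)
```

A production pipeline would likely also handle links, tables, and nested lists; this sketch only shows the shape of the cleaning step.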


🌍 Proxy & Network Examples

Quick‑start examples for using Thordata's proxy network.

  • thordata-proxy-examples
    Minimal examples showing how to:
    • Send HTTP requests via Thordata Residential / Mobile / Datacenter proxies (Python + curl)
    • Configure basic geo‑targeting (country / city‑level)
    • Run concurrent IP checks and simple health monitoring
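
The concurrent IP check mentioned above could be structured roughly as follows. This is a sketch under stated assumptions rather than code from thordata-proxy-examples: `check_exit_ips` is a hypothetical helper, and `fetch_ip` stands in for whatever function performs one request through the proxy and returns the exit IP as a string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def check_exit_ips(fetch_ip: Callable[[], str], attempts: int = 10, workers: int = 5) -> dict:
    """Call fetch_ip concurrently and summarize successes, failures, and unique IPs."""

    def safe_call(_: int):
        try:
            return fetch_ip()
        except Exception as exc:  # a failed check counts against proxy health
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(safe_call, range(attempts)))

    ips = [o for o in outcomes if isinstance(o, str)]
    return {
        "ok": len(ips),
        "failed": attempts - len(ips),
        "unique_ips": sorted(set(ips)),
    }


def fake_fetch_ip() -> str:
    """Stand-in for a real proxied request; returns a documentation-range IP."""
    return "203.0.113.7"


report = check_exit_ips(fake_fetch_ip, attempts=4, workers=2)
```

In real use, `fetch_ip` would wrap a proxied request to an IP echo endpoint such as the `http://httpbin.org/ip` URL used in the Quick Start.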

Planned:

  • thordata-proxy-docker — dockerized local forward proxy using Thordata credentials

📰 Google & SERP Examples

SERP‑based examples focused on Google and news use cases.

  • google-news-scraper
    Full‑featured CLI example for engine=google_news, supporting:

    • q (query), hl (language), gl (country)
    • topic_token, publication_token, section_token, story_token, so
    • CSV export of structured news results via Thordata SERP API
  • google-play-reviews-rag
    Google Play app reviews analysis + RAG: fetch reviews via Thordata Web Scraper, build an embeddings index, and answer questions about user sentiment.
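
To make the google-news-scraper parameter list and CSV export more concrete, here is a small sketch of how a CLI might validate the parameters and serialize results. The parameter names come from the list above; the helper names and the `title`/`link` result fields are assumptions for illustration and may not match the actual repository or API response.

```python
import csv
import io

# Parameter names as listed for engine=google_news.
ALLOWED_PARAMS = (
    "q", "hl", "gl", "topic_token", "publication_token",
    "section_token", "story_token", "so",
)


def build_news_params(**options: str) -> dict:
    """Validate option names and return a request-parameter dict."""
    unknown = set(options) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    params = {"engine": "google_news"}
    params.update({k: v for k, v in options.items() if v})  # drop empty values
    return params


def results_to_csv(rows: list, fields: list) -> str:
    """Serialize result dicts to CSV text, keeping only the requested fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({f: row.get(f, "") for f in fields})
    return buf.getvalue()


params = build_news_params(q="ai", hl="en", gl="us")
# Hypothetical result rows; real field names depend on the SERP API response.
sample_rows = [{"title": "A", "link": "http://x", "extra": 1}]
csv_text = results_to_csv(sample_rows, ["title", "link"])
```

Rejecting unknown parameter names early gives a clearer error than sending a request that the API would silently ignore or refuse.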

Planned:

  • Generic Google web search examples (engine=google)
  • Google Maps / Play / Shopping examples as separate repositories

📚 Tutorials & Notebooks

Hands‑on guides and notebooks to help you build data pipelines on top of Thordata.

Most of these live in thordata-cookbook:

  • notebooks/rag/rag_openai_research.ipynb
    Prepare dynamic HTML content (e.g. OpenAI Research) for RAG by scraping, cleaning, and exporting to Markdown.

  • notebooks/devtools/github_repo_intel.ipynb
    Use Web Scraper API spiders to collect GitHub repository metadata (stars, issues, contributors, languages) into a Pandas DataFrame.

  • notebooks/ai/web_qa_agent_with_thordata.ipynb
    End‑to‑end "Web Q&A Agent": question → SERP search → Universal Scraper → HTML cleaning → LLM answer.

All notebooks support:

  • Live mode — call Thordata APIs and cache results under data/
  • Offline mode — reuse cached HTML/JSON without consuming credits
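
The live/offline split can be pictured as a thin caching wrapper around whatever function performs the API call. The sketch below is one possible shape for it, not the notebooks' actual code: `cached_fetch`, the hashed file naming, and the `fetch` callable are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path, live: bool) -> str:
    """Live mode: call fetch and store the body. Offline mode: read the stored body."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"{key}.html"
    if live:
        body = fetch(url)  # this is the call that would consume credits
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(body, encoding="utf-8")
        return body
    if not path.exists():
        raise FileNotFoundError(f"no cached copy of {url} in {cache_dir}")
    return path.read_text(encoding="utf-8")


def no_network(url: str) -> str:
    """Fetcher that proves offline mode never reaches the API."""
    raise RuntimeError("offline mode must not call the API")


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    live_body = cached_fetch("https://example.com/", lambda u: "<html>v1</html>", data_dir, live=True)
    offline_body = cached_fetch("https://example.com/", no_network, data_dir, live=False)
```

Raising on a cache miss in offline mode, instead of silently falling back to a live call, keeps credit usage predictable.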

🚀 Quick Start (Python)

Install the SDK:

pip install thordata-sdk

1. Initialize the client

from thordata import ThordataClient

client = ThordataClient(
    scraper_token="YOUR_SCRAPER_TOKEN",
    public_token="YOUR_PUBLIC_TOKEN",
    public_key="YOUR_PUBLIC_KEY",
)

2. Send a request via the proxy network

resp = client.get("http://httpbin.org/ip")
print(resp.json())  # → see your Thordata exit IP

3. Run a SERP search

from thordata import Engine

results = client.serp_search(
    query="Thordata proxy network",
    engine=Engine.GOOGLE,
    num=5,
    # Engine-specific params go in **kwargs, e.g. location="United States".
    # To target another engine, change the engine argument, e.g. engine="google_news".
)

print("Organic results:", len(results.get("organic", [])))
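
The snippet above only counts the organic results. A next step might be to print them in ranked order, as in this sketch. It relies on the `organic` key shown above; the `title` and `link` field names are assumptions about the response shape, so check them against a real response first.

```python
def summarize_organic(results: dict, limit: int = 5) -> list:
    """Return one 'rank. title <link>' line per organic result, up to limit."""
    lines = []
    for rank, item in enumerate(results.get("organic", [])[:limit], start=1):
        title = item.get("title", "(no title)")  # assumed field name
        link = item.get("link", "")              # assumed field name
        lines.append(f"{rank}. {title} <{link}>")
    return lines


# Hand-written sample standing in for a real serp_search response.
sample_results = {
    "organic": [
        {"title": "Thordata", "link": "https://www.thordata.com"},
        {"link": "https://example.com"},
    ]
}
summary = summarize_organic(sample_results)
```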

4. Universal Scraper (HTML)

html = client.universal_scrape(
    url="https://www.thordata.com",
    js_render=True,
    output_format="html",
)
print(html[:500])

📚 Further Resources

For more advanced examples (Web Scraper tasks, async high‑concurrency, RAG pipelines, MCP tools), see:

  • SDK examples → thordata-python-sdk/examples
  • Cookbook scripts & notebooks → thordata-cookbook

🤝 Community & Support

If you are building something interesting on top of Thordata (RAG pipelines, AI agents, dashboards), feel free to open an issue and share your project — we are happy to feature selected community examples.


Thordata powers the proxy network and web data pipelines behind modern AI.

Last updated: 2025‑12‑01

