Stars
java10000 / SeimiCrawler
Forked from zhegexiaohuozi/SeimiCrawler
An agile, distributed crawler framework.
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).