Taming the Web. Inspired by the cowboy's dexterity in capturing wild horses, Cowurl lets you capture and manage URLs from the vastness of cyberspace.
Use it when your automated scraping CLI tools are blocked by web application firewalls (WAFs) that only allow access from a real browser.
- Automatic URL Collection
- Automatic Pagination
- Real-time Output Cleaning
- Persistent Storage
- Scraping Control
- Smart Auto-Stop
- Easy Output Management
- Download or clone this repository.
- Open Chrome and navigate to `chrome://extensions`.
- Enable "Developer mode" in the top right corner.
- Click "Load unpacked".
- Select the folder where you saved the Cowurl extension files.
- Start URL: Enter the initial URL from which you want to start scraping. If the site uses pagination, make sure the URL includes the page parameter (e.g., `https://example.com/search?page=1` or `https://example.com/items?offset=SPG:0`).
- Max Page: Specify the maximum number of pages to scrape.
- Main Tag: Enter the primary HTML tag containing the elements you want to extract (e.g., `div`, `a`, `li`).
- Lock Selector (attr/class/id): Use this selector to filter the Main Tag elements.
  - `attr:attribute_name` (e.g., `attr:href` to filter for elements that have an `href` attribute)
  - `class:class_name` (e.g., `class:product-item` to filter for elements with the `product-item` class)
  - `id:id_name` (e.g., `id:main-content` to filter for the element with the `main-content` ID)
  - Leave blank if no specific filter is needed.
- Get Value From (attr/class/id): Define how values are extracted from the filtered elements.
  - `attr:attribute_name` (e.g., `attr:src` to get the value of the `src` attribute)
  - `class:class_name` (e.g., `class:title` to get the text of a child element with the `title` class)
  - `id:id_name` (e.g., `id:product-link` to get the text of a child element with the `product-link` ID)
- Output Cleaner (regex): (Optional) Enter a regular expression to clean up the extracted output. It is applied to each scraped item before the item is displayed and saved. For example, `(?<=images/)(.*?)(?=\.png)` extracts the text between `images/` and `.png`; for a URL such as `https://example.com/images/cat.png`, that is the filename without its extension (`cat`).
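Conceptually, Start URL and Max Page describe a sequence of page URLs to visit. The following TypeScript sketch is a hypothetical illustration, not Cowurl's own code: it assumes the page number is a plain integer query parameter that grows by one per page, and it does not model the `offset=SPG:0` form. The helper name `buildPageUrls` is invented for this example.

```typescript
// Hypothetical sketch: expand a Start URL into the page URLs to visit,
// assuming an integer query parameter (e.g. page=1) that grows by one per page.
// Cowurl's real pagination logic may differ; offset=SPG:0 is not modelled.
function buildPageUrls(startUrl: string, param: string, maxPage: number): string[] {
  // Group 1 captures "?page=" or "&page=", group 2 the current page number.
  // `param` is assumed to contain no regex metacharacters.
  const pattern = new RegExp(`([?&]${param}=)(\\d+)`);
  const match = pattern.exec(startUrl);
  if (match === null) {
    // Assumption: without a page parameter, only the start URL is visited.
    return [startUrl];
  }
  const first = parseInt(match[2], 10);
  const urls: string[] = [];
  for (let page = first; page < first + maxPage; page++) {
    // A replacer function avoids "$1" + digit being misread as a group reference.
    urls.push(startUrl.replace(pattern, (_whole: string, prefix: string) => prefix + page));
  }
  return urls;
}

console.log(buildPageUrls("https://example.com/search?page=1", "page", 3));
```

Starting from the number already in the Start URL, rather than from 1, lets a scrape resume mid-way through a result set; whether Cowurl does the same is not stated in this README.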
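The Main Tag, Lock Selector, and Get Value From fields map naturally onto CSS selectors and DOM lookups. The following TypeScript sketch illustrates that mapping under stated assumptions; it is not the extension's actual implementation. `parseSpec`, `lockSelector`, `getValue`, and the `ElementLike` interface are hypothetical names, and class, attribute, and ID names are assumed to need no CSS escaping.

```typescript
// Hypothetical model of the "kind:name" syntax used by Lock Selector and
// Get Value From; an illustration, not Cowurl's own code.
type SpecKind = "attr" | "class" | "id";

interface FieldSpec {
  kind: SpecKind;
  name: string;
}

// Minimal structural stand-in for a DOM element; a real Element should fit this shape.
interface ElementLike {
  getAttribute(name: string): string | null;
  querySelector(selectors: string): ElementLike | null;
  textContent: string | null;
}

// Parse "attr:href", "class:title" or "id:main-content". Blank input means "not set".
function parseSpec(spec: string): FieldSpec | null {
  const trimmed = spec.trim();
  if (trimmed === "") return null;
  const sep = trimmed.indexOf(":");
  const kind = sep === -1 ? "" : trimmed.slice(0, sep);
  const name = trimmed.slice(sep + 1);
  if ((kind !== "attr" && kind !== "class" && kind !== "id") || name === "") {
    throw new Error(`Expected attr:, class: or id: plus a name, got "${spec}"`);
  }
  return { kind: kind as SpecKind, name };
}

// Combine Main Tag and Lock Selector into one CSS selector.
function lockSelector(mainTag: string, lock: FieldSpec | null): string {
  if (lock === null) return mainTag; // blank Lock Selector: no filter
  if (lock.kind === "attr") return `${mainTag}[${lock.name}]`;
  if (lock.kind === "class") return `${mainTag}.${lock.name}`;
  return `${mainTag}#${lock.name}`;
}

// Read one value from a matched element according to Get Value From.
// class:/id: return the text of the first matching descendant, or null if none.
function getValue(el: ElementLike, from: FieldSpec): string | null {
  if (from.kind === "attr") return el.getAttribute(from.name);
  const selector = from.kind === "class" ? `.${from.name}` : `#${from.name}`;
  const child = el.querySelector(selector);
  return child === null ? null : child.textContent;
}
```

In a page context, these helpers could be combined as `document.querySelectorAll(lockSelector("div", parseSpec("class:product-item")))`, reading each match with `getValue`. The structural `ElementLike` type exists only so the logic can be tried outside a browser.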
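To make the Output Cleaner example concrete, here is a minimal TypeScript sketch of a cleaning step. It is an assumption-based illustration, not Cowurl's code: the function name `cleanItem` is hypothetical, and keeping only the first match (or the unchanged item when nothing matches) is a guess about behaviour this README does not specify. A backslash typed into the field, as in `\.png`, must be doubled inside a TypeScript string literal.

```typescript
// Hypothetical sketch of the Output Cleaner step.
// Assumptions: a blank pattern or a non-matching item leaves the item
// unchanged, and only the first match is kept.
function cleanItem(item: string, pattern: string): string {
  if (pattern.trim() === "") return item;
  // new RegExp throws a SyntaxError if the pattern is invalid.
  const match = new RegExp(pattern).exec(item);
  return match === null ? item : match[0];
}

const url = "https://cdn.example.com/assets/images/blue-mug.png";
// The field value (?<=images/)(.*?)(?=\.png) written as a string literal:
console.log(cleanItem(url, "(?<=images/)(.*?)(?=\\.png)")); // blue-mug
```

Because `.*?` stops at the first `.png` after `images/`, a nested path such as `images/2024/cat.png` yields `2024/cat` rather than just `cat`.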
Contributions are welcome! Please feel free to open an issue or create a pull request.
Created with ❤️ by [xcapri/Tegalsec]