brought up this tool months back but I haven't had time to play with it yet. It covers the following:
- Recursively traverse website sub-pages
- Handle dynamic JavaScript-based content
- Bypass common web scraping blockers
- Extract clean, structured data for AI/ML applications
It says it had a notebook, but when I followed the link to GitHub, the page no longer exists. I have asked the author where we can find the notebook, as it could be helpful.
TOP HINT: Make sure you limit your crawl. I didn't, and burned through my 500 free-tier credits in one sitting. You can limit how deep you go by setting a limit. Read up on it before using, but basically, all you need is
`params = {"limit": 3}` # 3 being the depth
Also make sure you use other limiters, such as restricting crawls of external links. Again, read the docs.
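To make it clearer what these limiters actually do, here is a rough sketch in plain Python. It is not the tool's API: `crawl`, `max_depth`, `max_pages`, `allow_external` and the `FAKE_SITE` link map are all names I made up for illustration, and the "site" is just an in-memory dictionary, so nothing is fetched and no credits are spent. The idea is only to show how a depth cap, a page cap and an external-link filter each cut down the number of pages visited.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "site": each URL maps to the links found on that page.
FAKE_SITE = {
    "https://example.com/": [
        "https://example.com/a",
        "https://example.com/b",
        "https://other.example.org/x",  # external link
    ],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/b": ["https://example.com/b/1"],
    "https://example.com/a/1": ["https://example.com/a/1/deep"],
}


def crawl(start_url, get_links, max_depth=3, max_pages=10, allow_external=False):
    """Breadth-first crawl that stops at max_depth and max_pages.

    get_links is any callable returning the links found on a URL
    (an assumption for this sketch; a real crawler would fetch the page).
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # in a paid service, this is where a credit would go

        if depth >= max_depth:
            continue  # do not follow links any deeper than the cap

        for link in get_links(url):
            if link in seen:
                continue
            if not allow_external and urlparse(link).netloc != start_host:
                continue  # skip links that leave the starting host
            seen.add(link)
            queue.append((link, depth + 1))

    return visited


def fake_links(url):
    # Look up links in the hypothetical site map; unknown pages have none.
    return FAKE_SITE.get(url, [])


if __name__ == "__main__":
    for page in crawl("https://example.com/", fake_links, max_depth=1):
        print(page)
```

With `max_depth=1` the sketch only visits the start page and the pages it links to directly on the same host. Dropping the caps or setting `allow_external=True` makes the visited list grow quickly, which is how credits disappear in one sitting.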
Enjoy