Great article on Firecrawl
brought up this tool months back but I haven't had time to play with it yet.
It covers the following:
  • Recursively traverse website sub-pages
  • Handle dynamic JavaScript-based content
  • Bypass common web scraping blockers
  • Extract clean, structured data for AI/ML applications
It says it has a notebook, but following the link to GitHub, the page no longer exists. I have asked the author where we can find the notebook, as it could be helpful.
The GitHub repo for Firecrawl is at https://github.com/mendableai/firecrawl/tree/main
TOP HINT: Ensure you limit your crawl. I didn't, and burned through the 500 credits on the free tier in one sitting. You can cap how much gets crawled by setting a limit. Read up on it before using, but basically, all you need is
`params = {"limit": 3}` # 3 being the maximum number of pages to crawl; as far as I can tell from the docs, depth is a separate option
Also ensure you use the other limiters, such as the option that controls whether external links are crawled. Again, read the docs.
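Here is a rough sketch of how those limiters could be bundled together before kicking off a crawl. Treat it as an illustration under assumptions, not the official way to do it: `build_crawl_params` is a made-up helper, and the key names `limit`, `maxDepth`, and `allowExternalLinks` are what I understand the crawl options to be called, so check them against the docs for the SDK version you have installed.

```python
from typing import Any, Dict, Optional


def build_crawl_params(
    limit: int,
    max_depth: Optional[int] = None,
    allow_external: bool = False,
) -> Dict[str, Any]:
    """Build a conservative crawl-options dict.

    Hypothetical helper, not part of Firecrawl. The key names are
    assumptions to verify against the Firecrawl docs.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")

    # Assumption: "limit" caps the total number of pages crawled,
    # and "allowExternalLinks" controls following links off the site.
    params: Dict[str, Any] = {
        "limit": limit,
        "allowExternalLinks": allow_external,
    }

    # Assumption: "maxDepth" caps how many levels deep the crawl goes.
    if max_depth is not None:
        if max_depth < 0:
            raise ValueError("max_depth cannot be negative")
        params["maxDepth"] = max_depth

    return params


params = build_crawl_params(limit=3, max_depth=2)
print(params)

# Assumed usage with the firecrawl-py SDK (signature may differ by version):
# from firecrawl import FirecrawlApp
# app = FirecrawlApp(api_key="YOUR_API_KEY")
# result = app.crawl_url("https://example.com", params=params)
```

Starting with a tiny limit like this and raising it once the output looks right is a cheap way to avoid burning credits on a first run.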
Enjoy
Tom Welsh
AI Developer Accelerator
skool.com/ai-developer-accelerator