A collection of resources for building low-latency, scalable web crawlers on Apache Storm®

Apache StormCrawler is an open source SDK for building distributed web crawlers based on Apache Storm®. The project is licensed under the Apache License v2 and consists of a collection of reusable resources and components, written mostly in Java.

The aim of Apache StormCrawler is to help build web crawlers that are low-latency and scalable.

Apache StormCrawler is a library and collection of resources that developers can leverage to build their own crawlers. The good news is that doing so can be pretty straightforward! Have a look at the Getting Started section for more details.

Apart from the core components, we provide some external resources that you can reuse in your project, such as our spout and bolts for OpenSearch® or a ParserBolt that uses Apache Tika® to parse various document formats.

Apache StormCrawler is well suited to use cases where the URLs to fetch and parse arrive as streams, but it is also an appropriate solution for large-scale recursive crawls, particularly where low latency is required. The project is used in production by many organisations and is actively developed and maintained.
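
As a rough illustration of the two crawl styles mentioned above, the following plain-Java sketch contrasts them. It does not use any Storm or StormCrawler classes: `CrawlModes`, `streamCrawl`, `recursiveCrawl` and the in-memory `web` map are hypothetical names made up for this example, and the map lookup merely stands in for real fetching and parsing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Conceptual sketch only: plain Java, no Storm or StormCrawler APIs. */
public class CrawlModes {

    /** Streaming style: each incoming URL is processed once; outlinks are not followed. */
    public static List<String> streamCrawl(List<String> incoming, Map<String, List<String>> web) {
        List<String> fetched = new ArrayList<>();
        for (String url : incoming) {
            if (web.containsKey(url)) { // stand-in for a successful fetch and parse
                fetched.add(url);
            }
        }
        return fetched;
    }

    /** Recursive style: outlinks found while parsing are fed back into the frontier. */
    public static List<String> recursiveCrawl(List<String> seeds, Map<String, List<String>> web) {
        Queue<String> frontier = new ArrayDeque<>(seeds);
        Set<String> seen = new HashSet<>(seeds);
        List<String> fetched = new ArrayList<>();
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            List<String> outlinks = web.get(url); // hypothetical "page": its list of links
            if (outlinks == null) {
                continue; // unknown page, nothing to fetch
            }
            fetched.add(url);
            for (String link : outlinks) {
                if (seen.add(link)) { // enqueue only URLs not seen before
                    frontier.add(link);
                }
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        // Tiny made-up link graph used in place of the real web.
        Map<String, List<String>> web = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c"),
                "c", List.of("a"));
        System.out.println(streamCrawl(List.of("a"), web));
        System.out.println(recursiveCrawl(List.of("a"), web));
    }
}
```

In the streaming style the set of URLs is decided upstream, so the crawler only has to keep up with the input. In the recursive style the crawler grows its own frontier and must remember which URLs it has already seen.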

The Presentations page contains links to some recent presentations made about this project.

*

He said to her: "If you won't do it for others, then who's gonna do it for you?"

Justin Robertson - "Love Movement (Ulrich Schnauss Remix)"


***

Justin Robertson
James at Headphonesex noticed that I've been revisiting Ulrich Schnauss lately and sent me two songs. Both are beautiful but Schnauss' remix of Justin Robertson's gorgeous "Love Movement" sort of stunned me and my friend Jordan into silence for a moment. Some songs, though it's rare, draw an emotional reaction out of me. This is the stuff that I really love.

Justin Robertson Presents Revtone is hard to come by in the U.S. but I guess it's more accessible in the U.K. I could buy a used CD on Amazon, which I hate doing. And this stuff doesn't exist on any of those freebie download sites like Limewire. It's frustrating to be limited to U.S. releases and infrequent imports. So I'm thankful for my international buddies who keep me up to speed on all the latest and greatest.
Ulrich Schnauss
One more thing: While I was talking to my 9-year-old cousin Stephen the other day about music, he told me that he doesn't know who Billy Idol is. Granted, he's of the Xbox generation and Billy Idol hasn't entirely secured a solid foothold in music history the way, say, Bob Dylan has, but still... should I be surprised by this?

So this is for little Stephen. xo

Billy Idol - "Flesh For Fantasy (Remix)"







JW: thanks so much
JZ: make sure your feet face forward