
Implement non-concurrent crawler #12

@let4be


For broad web crawling we probably do not need any concurrency within a single job, which means we can save a fair amount of resources and annoy site owners less.
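For illustration, here is a minimal sketch of what "no concurrency within a single job" could mean: the job awaits each fetch before taking the next URL, so it never holds more than one connection to a site. All types and names below are hypothetical, not the crate's actual API.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified job state - a stand-in for the crawler's real types.
struct Job {
    frontier: VecDeque<String>,
}

impl Job {
    /// Stubbed fetch; a real implementation would use the crawler's HTTP client.
    async fn fetch(&self, url: &str) -> Result<String, std::io::Error> {
        Ok(format!("<body of {url}>"))
    }
}

/// Non-concurrent job loop: exactly one request in flight at any time.
async fn run_job_sequential(mut job: Job) -> Result<(), std::io::Error> {
    while let Some(url) = job.frontier.pop_front() {
        // Await each fetch before dequeuing the next URL - no per-job concurrency,
        // so the target site only ever sees one request from us at a time.
        let _body = job.fetch(&url).await?;
    }
    Ok(())
}
```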
Additionally, I'm considering using this approach in a so-called "breeder" - a dedicated non-concurrent web crawler whose purpose is to:

  • download && parse robots.txt, while:
      • resolving redirects
      • resolving additional DNS requests (if any), as long as they fall within the same addr_key - see Async channel based DNS resolver #14
  • HEAD the index page to figure out whether there are any redirects (if allowed by robots.txt)
  • jobs that resolved all DNS (within our restrictions) and successfully HEADed the index page are considered "breeded" (see the sketch after this list)
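A rough sketch of what one breeding pass could look like, assuming the steps above run strictly one after another. `fetch_robots`, `resolve_within_addr_key` and `head_index` are hypothetical placeholders for the real robots.txt fetcher, the channel-based DNS resolver from #14, and the HTTP client:

```rust
use std::net::IpAddr;

/// Hypothetical, simplified stand-ins for the crawler's real types.
struct BreederJob {
    domain: String,
    addr_key: String,
}

struct RobotsTxt {
    allow_index: bool,
}

enum BreedOutcome {
    /// robots.txt parsed, DNS resolved within addr_key, index HEAD succeeded.
    Breeded { robots: RobotsTxt, addrs: Vec<IpAddr> },
    /// Job is dropped before it ever reaches the regular crawler.
    Rejected(String),
}

// The three breeding steps, stubbed out; real implementations would use the
// crawler's HTTP client and the channel-based DNS resolver from #14.
async fn fetch_robots(_domain: &str) -> Result<RobotsTxt, String> {
    Ok(RobotsTxt { allow_index: true })
}
async fn resolve_within_addr_key(job: &BreederJob) -> Result<Vec<IpAddr>, String> {
    // Real impl: resolve any extra names and keep only those matching job.addr_key.
    let _ = &job.addr_key;
    Ok(Vec::new())
}
async fn head_index(_domain: &str) -> Result<(), String> {
    Ok(())
}

/// One sequential breeding pass per job - no concurrency inside the job.
async fn breed(job: &BreederJob) -> BreedOutcome {
    // 1. Download and parse robots.txt, resolving redirects along the way.
    let robots = match fetch_robots(&job.domain).await {
        Ok(r) => r,
        Err(e) => return BreedOutcome::Rejected(format!("robots.txt: {e}")),
    };
    // 2. Resolve any additional DNS requests, but only within the same addr_key.
    let addrs = match resolve_within_addr_key(job).await {
        Ok(a) => a,
        Err(e) => return BreedOutcome::Rejected(format!("dns: {e}")),
    };
    // 3. HEAD the index page (only if robots.txt allows it) to surface redirects.
    if robots.allow_index {
        if let Err(e) = head_index(&job.domain).await {
            return BreedOutcome::Rejected(format!("index HEAD: {e}"));
        }
    }
    BreedOutcome::Breeded { robots, addrs }
}
```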

All jobs extracted from the JobQ will be added to a breeder first, and only then (if they survive the breeding process) to a typical web crawler with a StaticDnsResolver (the breeder and the regular web crawler will have quite different rules and settings).
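The wiring could then look roughly like the sketch below, reusing the `breed` / `BreederJob` types from the previous sketch. The JobQ is modeled here as a tokio mpsc channel, and `StaticDnsResolver` / `crawl` are stubs standing in for whatever the real components end up being:

```rust
use std::net::IpAddr;

/// Hypothetical resolver that only answers from a fixed, pre-resolved address set
/// produced during breeding - the regular crawler never does live DNS lookups.
struct StaticDnsResolver {
    addrs: Vec<IpAddr>,
}

impl StaticDnsResolver {
    fn new(addrs: Vec<IpAddr>) -> Self {
        Self { addrs }
    }
}

/// Stub for the regular (concurrent) web crawler.
async fn crawl(_job: BreederJob, _resolver: StaticDnsResolver) {}

/// Pull jobs from the JobQ, breed them, and only hand the survivors to the crawler.
async fn pipeline(mut job_q: tokio::sync::mpsc::Receiver<BreederJob>) {
    while let Some(job) = job_q.recv().await {
        match breed(&job).await {
            BreedOutcome::Breeded { addrs, .. } => {
                // Reuse the addresses resolved during breeding via StaticDnsResolver.
                crawl(job, StaticDnsResolver::new(addrs)).await;
            }
            BreedOutcome::Rejected(reason) => {
                // Job did not survive breeding - it never reaches the regular crawler.
                eprintln!("dropped during breeding: {reason}");
            }
        }
    }
}
```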

Labels: enhancement (New feature or request)