README.md: 2 additions & 12 deletions
@@ -12,19 +12,14 @@ Apache StormCrawler (Incubating) is an open source collection of resources for b
 
 NOTE: These instructions assume that you have [Apache Maven](https://maven.apache.org/install.html) installed. You will need to install [Apache Storm 2.6.2](http://storm.apache.org/) to run the crawler.
 
-StormCrawler requires Java 11 or above.
+StormCrawler requires Java 11 or above. To execute tests, it requires you to have a locally installed and working Docker environment.
 
 DigitalPebble's [Ansible-Storm](https://github.com/DigitalPebble/ansible-storm) repository contains resources to install Apache Storm using Ansible. Alternatively, this [stormcrawler-docker](https://github.com/DigitalPebble/stormcrawler-docker) project should help you run Apache Storm on Docker.
 
 Once Storm is installed, the easiest way to get started is to generate a new StormCrawler project following the instructions below:
 
-### First, build the Stormcrawler codebase
 ```shell
-mvn install
-```
-### Then, generate a project using the locally installed archetype
 You'll be asked to enter a groupId (e.g. com.mycompany.crawler), an artefactId (e.g. stormcrawler), a version, a package name and details about the user agent to use.
@@ -35,11 +30,6 @@ Alternatively if you can't or don't want to use the Maven archetype above, you c
 
 Have a look at the code of the [CrawlTopology class](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/src/main/java/CrawlTopology.java), the [crawler-conf.yaml](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/crawler-conf.yaml) file as well as the files in [src/main/resources/](https://github.com/apache/incubator-stormcrawler/tree/master/archetype/src/main/resources/archetype-resources/src/main/resources), they are all that is needed to run a crawl topology : all the other components come from the core module.
 
-#### Archetype Notes
-
-While you will always be able to build StormCrawler from source we are working towards getting our first release out under the Apache Software Foundation.
-Once this happens, generating StormCrawler projects will not require you to install the Maven archetype from source.
-
 ## Getting help
 
 The [WIKI](https://github.com/apache/incubator-stormcrawler/wiki) is a good place to start your investigations but if you are stuck please use the tag [stormcrawler](http://stackoverflow.com/questions/tagged/stormcrawler) on StackOverflow or ask a question in the [discussions](https://github.com/apache/incubator-stormcrawler/discussions) section.
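
As a rough illustration of the prerequisites named in this change (Maven, Java 11 or above, and a working Docker environment for running tests), the shell sketch below checks a local setup. It is not part of StormCrawler: the function names `java_major` and `check_prereqs` are hypothetical, and the sketch assumes that `java -version` prints a quoted version string such as `"11.0.2"`.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with StormCrawler.
# Print the major version for a Java version string, handling both the
# legacy "1.8.0_292" scheme and the modern "11.0.2" scheme.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
  esac
}

# Hypothetical helper: check for Maven, Java 11+, and a reachable Docker daemon.
check_prereqs() {
  command -v mvn >/dev/null 2>&1 || { echo "Maven not found on PATH" >&2; return 1; }
  command -v java >/dev/null 2>&1 || { echo "Java not found on PATH" >&2; return 1; }

  # java -version writes to stderr; take the first quoted string on the version line.
  version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$version")
  [ "$major" -ge 11 ] 2>/dev/null || {
    echo "Java 11 or above is required, found: $version" >&2
    return 1
  }

  # Docker is only needed to execute the tests, so report it as a warning.
  docker info >/dev/null 2>&1 || echo "Warning: Docker is not available; tests will not run" >&2

  echo "Prerequisites look OK (Java $major)"
}
```

After sourcing the file, run `check_prereqs`. A missing Docker daemon is reported as a warning only, since the README requires Docker just for executing tests.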