File: CHANGELOG (47 additions & 0 deletions)
@@ -1,3 +1,50 @@
+========
+1.8.0
+========
+---------------
+Overview
+---------------
+This release is huge! Spark-NLP made the leap into Spark 2.4.0, even with the challenge of not having everyone yet on board there (i.e. Zeppelin doesn't yet support it).
+In this version we release three new NLP annotators. Two for dependency parsing processes and one for contextual deep learning based spell checking.
+We also significantly improved OCR functionality, fine-tuning capabilities and general output performance, particularly on tesseract.
+Finally, there's plenty of bug fixes and improvements in the word embeddings field, along with performance boosts and reduced disk IO.
+Feel free to shoot us with any feedback you have! Particularly on your Spark 2.4.x experience.
+
+---------------
+New Features
+---------------
+* Built on top of Spark 2.4.0
+* Dependency Parser annotator allows for sentence relationship encoding
+* Typed Dependency Parser annotator allows for labeling relationships within dependency tags
+* ContextSpellChecker is our first Deep Learning based Spell Checker that evaluates context and not only tokens
+
+---------------
+Enhancements
+---------------
+* More OCR parameters exposed for further fine tuning, including preferred methods priority and page segmentation modes
+* OCR now has a setting setSplitPages() which allows setting whether to output one page per row or the entire document instead
+* Improved word embeddings performance when working in local filesystems
+* Reduced the amount of disk IO when working with Word Embeddings
+* All python notebooks improved for better readability and better documentation
+* Simplified PySpark interface API
+* CoNLLGenerator utility class which helps building CoNLL-2003 files for NER training
+* EmbeddingsHelper now allows reading word embeddings files directly from s3a:// paths
+
+---------------
+Bugfixes
+---------------
+* Solved race-condition issues in regards of cluster usage of RocksDB index for embeddings
@@ -43,14 +45,14 @@ Use either one of the following options
 
 * Add the following Maven Coordinates to the interpreter's library list
 ```
-com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.3
+com.johnsnowlabs.nlp:spark-nlp_2.11:1.8.0
 ```
 * Add path to pre-built jar from [here](#pre-compiled-spark-nlp-and-spark-nlp-ocr) in the interpreter's library list making sure the jar is available to driver path
 
 ### Python in Zeppelin
 Apart from previous step, install python module through pip
 ```
-pip install spark-nlp==1.7.3
+pip install spark-nlp==1.8.0
 ```
 Configure Zeppelin properly, use cells with %spark.pyspark or any interpreter name you chose.
 
@@ -61,7 +63,7 @@ An alternative option would be to set `SPARK_SUBMIT_OPTIONS` (zeppelin-env.sh) a
 ## Python without explicit Spark installation
 If you installed pyspark through pip, you can install sparknlp through pip as well
 ```
-pip install spark-nlp==1.7.3
+pip install spark-nlp==1.8.0
 ```
 Then you'll have to create a SparkSession manually, for example:
 ```
@@ -87,7 +89,7 @@ export PYSPARK_PYTHON=python3
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook
 
-pyspark --packages JohnSnowLabs:spark-nlp:1.7.3
+pyspark --packages JohnSnowLabs:spark-nlp:1.8.0
 ```
 
 Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
-<p><span class="label label-warning">2018 Nov 11st - Update!</span> 1.7.3 Released! Word embeddings decoupled from annotators, better Windows and improved cluster support</p>
-<p><span class="label label-danger">Apache Spark 2.4.x not yet supported</span></p>
+<p><span class="label label-warning">2018 Nov 21st - Update!</span> 1.8.0 Released! Dependency Parser, new Spell Checker, Spark 2.4.0, performance boosts and more!</p>
 Since we are dealing with small amounts of data, we put in practice LightPipelines.
 </p>
 <p>
-<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.7.3/example/src/TrainViveknSentiment.scala" target="_blank"> Take me to code!</a>
+<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.8.0/example/src/TrainViveknSentiment.scala" target="_blank"> Take me to code!</a>
-<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.7.3/python/example/vivekn-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
+<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.8.0/python/example/vivekn-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
 Each of these sentences will be used for giving a score to text
 </p>
 </p>
-<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.7.3/python/example/dictionary-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
+<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.8.0/python/example/dictionary-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
 approach to use the same pipeline for tagging external resources.
 </p>
 <p>
-<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.7.3/python/example/crf-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
+<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.8.0/python/example/crf-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
 and it will leverage batch-based distributed calls to native TensorFlow libraries during prediction.
 </p>
 <p>
-<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.7.3/python/example/dl-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
+<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.8.0/python/example/dl-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
 </p>
 </div>
 <div>
@@ -211,7 +211,7 @@ <h4 id="text-notebook" class="section-block"> Simple Text Matching</h4>
 This annotator is an Annotator Model and does not require training.
 </p>
 <p>
-<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.7.3/python/example/text-matcher/extractor.ipynb" target="_blank"> Take me to notebook!</a>
+<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.8.0/python/example/text-matcher/extractor.ipynb" target="_blank"> Take me to notebook!</a>
-<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.7.3/python/example/model-downloader/ModelDownloaderExample.ipynb" target="_blank"> Take me to notebook!</a>
+<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.8.0/python/example/model-downloader/ModelDownloaderExample.ipynb" target="_blank"> Take me to notebook!</a>