CHANGELOG: 54 additions & 0 deletions
@@ -1,3 +1,57 @@
========
2.0.2
========
---------------
Overview
---------------
Thank you for joining us in this exciting Spark NLP year! We continue to make progress towards a better performing library, both in speed and in accuracy.
This release focuses strongly on the quality and stability of the library, making sure it works well in most cluster environments
and improving compatibility across systems. Word Embeddings continue to be improved for better performance and a lower memory footprint.
The Context Spell Checker continues to receive enhancements in concurrency and in its usage of Spark. Finally, TensorFlow-based annotators
have been significantly improved by refactoring the serialization design. Help us with feedback and we'll welcome any issue reports!

---------------
New Features
---------------
* NerCrf annotator now has an includeConfidence param that adds confidence scores for predictions to the annotation metadata (see the sketch below)
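
  A minimal sketch of enabling it on a pretrained model; the setter name is assumed
  from the usual Spark ML set<Param> convention, and the column names are placeholders:

    import com.johnsnowlabs.nlp.annotators.ner.crf.NerCrfModel

    // Hypothetical usage: load a pretrained CRF NER tagger and request confidence scores
    val nerTagger = NerCrfModel.pretrained()
      .setInputCols("sentence", "token", "pos", "word_embeddings")  // assumed column names
      .setOutputCol("ner")
      .setIncludeConfidence(true)  // scores are written into each annotation's metadata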

---------------
Enhancements
---------------
* Cluster mode performance improved in TensorFlow annotators by serializing internal information to bytes
* Doc2Chunk annotator added new params startCol, startColByTokenIndex, failOnMissing and lowerCase, allowing better chunking of documents (see the sketch after this list)
* All annotations that derive from sentence or chunk types now contain metadata referring to the sentence or chunk ID they belong to
* ContextSpellChecker now creates a window around the token to improve computation performance
* Improved WordEmbeddings matching accuracy by trying alternative case-sensitive tokens
* WordEmbeddings won't load twice if already loaded
* WordEmbeddings can use embeddingsRef if a source was not provided, improving reuse of embeddings within a pipeline
* WordEmbeddings new param includeEmbeddings allows annotators not to save the entire embeddings source along with them
* Contrib TensorFlow dependencies now only load when necessary
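
  A sketch of the new Doc2Chunk params; the set<Param> setter names are assumed from
  the usual Spark ML convention, and the column names are hypothetical:

    import com.johnsnowlabs.nlp.Doc2Chunk

    val doc2Chunk = new Doc2Chunk()
      .setInputCols("document")
      .setOutputCol("chunk")
      .setChunkCol("target_text")      // column holding the text to turn into a chunk
      .setStartCol("target_begin")     // hypothetical column with the chunk's start position
      .setStartColByTokenIndex(true)   // interpret the start position as a token index
      .setFailOnMissing(false)         // do not fail rows where the chunk text cannot be found
      .setLowerCase(true)              // match the chunk text case-insensitively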

---------------
Bugfixes
---------------
* Added missing Symmetric Delete pretrained model
* Fixed a broken param name in Normalizer (thanks @RobertSassen)
* Fixed Cloudera cluster support
* Fixed concurrent access in ContextSpellChecker in high partition number use cases and LightPipelines
* Fixed POS dataset creator to better handle corrupted pairs
* Fixed a bug in Word Embeddings not matching exact case-sensitive tokens in some scenarios
* Fixed OCR Tess4J initialization problems in concurrent scenarios

---------------
Models and Pipelines
---------------
* Renaming of models and pipelines (work in progress)
* Better output column naming in pipelines

---------------
Developer API
---------------
* Unified more of the WordEmbeddings interface with dimension params and individual setters
* Improved unit tests for better compatibility on Windows

README.md: 15 additions & 15 deletions

@@ -43,14 +43,14 @@ Take a look at our official spark-nlp page: http://nlp.johnsnowlabs.com/ for use
## Apache Spark Support

- Spark-NLP *2.0.1* has been built on top of Apache Spark 2.4.0
+ Spark-NLP *2.0.2* has been built on top of Apache Spark 2.4.0

Note that Spark is not backwards compatible with Spark 2.3.x, so models and environments might not work.

If you are still stuck on Spark 2.3.x feel free to use [this assembly jar](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-2.3.2-nlp-assembly-1.8.0.jar) instead. Support is limited.
For the OCR module, [this](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-2.3.2-nlp-ocr-assembly-1.8.0.jar) is the assembly for Spark `2.3.x`.
@@ -224,7 +224,7 @@ Use either one of the following options
* Add the following Maven Coordinates to the interpreter's library list (see the example below)

```bash
- com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.1
+ com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2
```

* Add the path to the pre-built jar from [here](#pre-compiled-spark-nlp-and-spark-nlp-ocr) in the interpreter's library list, making sure the jar is available to the driver path
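
The Maven coordinate above can also be resolved at runtime when building a SparkSession programmatically. A minimal sketch; the app name and master below are placeholders, not part of the original instructions:

```scala
import org.apache.spark.sql.SparkSession

// Resolve spark-nlp from Maven at startup via spark.jars.packages
val spark = SparkSession.builder()
  .appName("spark-nlp-starter")   // placeholder application name
  .master("local[*]")             // placeholder master; point this at your cluster
  .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2")
  .getOrCreate()
```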