Commit 934b678

Changelog
1 parent 0991ad5 commit 934b678

1 file changed

+54
-0
lines changed

CHANGELOG

Lines changed: 54 additions & 0 deletions
@@ -1,3 +1,57 @@
========
2.0.2
========
---------------
Overview
---------------
Thank you for joining us in this exciting Spark NLP year! We continue to make progress towards a better-performing library, in both speed and accuracy.
This release focuses strongly on the quality and stability of the library, making sure it works well in most cluster environments
and improving compatibility across systems. Word Embeddings continue to be improved for better performance and a lower memory footprint.
Context Spell Checker continues to receive enhancements in concurrency and in its usage of Spark. Finally, TensorFlow-based annotators
have been significantly improved by refactoring the serialization design. Please help us with feedback; we welcome any issue reports!

---------------
New Features
---------------
* The NerCrf annotator now has an includeConfidence param that includes confidence scores for predictions in the metadata

---------------
Enhancements
---------------
* Improved cluster mode performance in TensorFlow annotators by serializing internal information to bytes
* Doc2Chunk annotator has new params startCol, startColByTokenIndex, failOnMissing and lowerCase, which allow better chunking of documents
* All annotations that derive from sentence or chunk types now contain metadata referring to the ID of the sentence or chunk they belong to
* ContextSpellChecker now creates a window around the token to improve computation performance
* Improved WordEmbeddings matching accuracy by trying alternative case-sensitive tokens
* WordEmbeddings won't load twice if already loaded
* WordEmbeddings can use embeddingsRef if source was not provided, improving reuse of embeddings in a pipeline
* New WordEmbeddings param includeEmbeddings allows annotators not to save the entire embeddings source along with them
* Contrib TensorFlow dependencies now only load if necessary

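The case-fallback idea behind the WordEmbeddings matching improvement can be illustrated with a small sketch. This is not Spark NLP's implementation: the function name `lookup_with_case_fallback`, the candidate order (exact, lowercase, uppercase, capitalized) and the toy vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Vector = List[float]


def lookup_with_case_fallback(
    token: str, embeddings: Dict[str, Vector]
) -> Optional[Vector]:
    """Return the vector for `token`, trying case variants if the exact form is absent.

    Hypothetical helper: the candidate order below is an assumption,
    not the order Spark NLP uses internally.
    """
    candidates = [token, token.lower(), token.upper(), token.capitalize()]
    for candidate in candidates:
        vector = embeddings.get(candidate)
        if vector is not None:
            return vector
    return None  # no case variant of the token is in the vocabulary


# Toy vocabulary with made-up vectors, for illustration only.
toy_embeddings = {"spark": [0.1, 0.2], "NLP": [0.3, 0.4]}

exact = lookup_with_case_fallback("NLP", toy_embeddings)       # exact match
fallback = lookup_with_case_fallback("Spark", toy_embeddings)  # found via lowercase
missing = lookup_with_case_fallback("Hadoop", toy_embeddings)  # no variant present
```

Trying the exact form first keeps case-sensitive matches authoritative; the other variants are consulted only when the exact token is missing.
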
---------------
Bugfixes
---------------
* Added missing Symmetric Delete pretrained model
* Fixed a broken param name in Normalizer (thanks @RobertSassen)
* Fixed Cloudera cluster support
* Fixed concurrent access in ContextSpellChecker in high-partition-number use cases and LightPipelines
* Fixed POS dataset creator to better handle corrupted pairs
* Fixed a bug in Word Embeddings not matching exact case-sensitive tokens in some scenarios
* Fixed OCR Tess4J initialization problems in concurrent scenarios

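As an illustration of tolerating corrupted pairs when building a POS dataset, the sketch below parses a line of `token|TAG` pairs and skips any pair that does not yield both a non-empty token and a non-empty tag. The `|` delimiter, the function name `parse_pos_line` and the skip-instead-of-fail policy are assumptions for illustration, not a description of the library's actual fix.

```python
from typing import List, Tuple


def parse_pos_line(line: str, delimiter: str = "|") -> List[Tuple[str, str]]:
    """Parse whitespace-separated `token<delimiter>TAG` pairs, skipping malformed ones.

    Hypothetical helper: the format and the skipping policy are assumptions.
    """
    pairs = []
    for item in line.split():
        # Split on the last delimiter so a token that itself contains
        # the delimiter is kept whole.
        token, sep, tag = item.rpartition(delimiter)
        if not sep or not token or not tag:
            continue  # corrupted pair: missing delimiter, token or tag
        pairs.append((token, tag))
    return pairs


# Made-up sample line containing three corrupted items: "|VB", "sat|" and "broken".
sample = "The|DT cat|NN |VB sat| on|IN broken mat|NN"
parsed = parse_pos_line(sample)
```

Skipping a bad pair rather than raising keeps one malformed entry from discarding the rest of the sentence.
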
---------------
Models and Pipelines
---------------
* Renaming of models and pipelines (work in progress)
* Better output column naming in pipelines

---------------
Developer API
---------------
* Further unified the WordEmbeddings interface with dimension params and individual setters
* Improved unit tests for better compatibility on Windows
* Python embeddings moved to sparknlp.embeddings

========
2.0.1
========
