Spark NLP 4.2.8: Patch release
Overview
Spark NLP 4.2.8 comes with some important bug fixes and improvements. We highly recommend updating to this latest version if you are using Spark NLP 4.2.x.
As always, we would like to thank our community for their feedback, questions, and feature requests.
Bug Fixes & Improvements
- Fix the issue with optional keys (labels) in metadata when using XXXForSequenceClassification annotators. Keys that were previously rendered as Some(neg) -> 0.13602075 now appear as neg -> 0.13602075, in harmony with all the other classifiers. #13396
before 4.2.8:
+-----------------------------------------------------------------------------------------------+
|label |
+-----------------------------------------------------------------------------------------------+
|[{category, 0, 87, pos, {sentence -> 0, Some(neg) -> 0.13602075, Some(pos) -> 0.8639792}, []}] |
|[{category, 0, 47, neg, {sentence -> 0, Some(neg) -> 0.7505674, Some(pos) -> 0.24943262}, []}] |
|[{category, 0, 17, pos, {sentence -> 0, Some(neg) -> 0.31065974, Some(pos) -> 0.6893403}, []}] |
|[{category, 0, 71, neg, {sentence -> 0, Some(neg) -> 0.5079189, Some(pos) -> 0.4920811}, []}] |
+-----------------------------------------------------------------------------------------------+
4.2.8 and later:
+-----------------------------------------------------------------------------------+
|label |
+-----------------------------------------------------------------------------------+
|[{category, 0, 87, pos, {sentence -> 0, neg -> 0.13602075, pos -> 0.8639792}, []}] |
|[{category, 0, 47, neg, {sentence -> 0, neg -> 0.7505674, pos -> 0.24943262}, []}] |
|[{category, 0, 17, pos, {sentence -> 0, neg -> 0.31065974, pos -> 0.6893403}, []}] |
|[{category, 0, 71, neg, {sentence -> 0, neg -> 0.5079189, pos -> 0.4920811}, []}] |
+-----------------------------------------------------------------------------------+
- Introduce a config to skip LightPipeline validation for inputCols on the Python side for projects depending on Spark NLP. This toggle should only be used for specific annotators that do not follow the convention of predefined inputAnnotatorTypes and outputAnnotatorType. #13402
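For code that has to read classifier metadata produced both before and after the metadata key fix above, a small normalization step can hide the difference. The following is a minimal sketch in plain Python, not part of Spark NLP: the helper names normalize_label_keys and label_scores are hypothetical, and it assumes metadata arrives as a dict of string keys and string values, with sentence being the only non-label key (as in the tables above).

```python
import re

# Matches keys that were rendered from a Scala Option, e.g. "Some(neg)".
_SOME_KEY = re.compile(r"^Some\((.*)\)$")


def normalize_label_keys(metadata):
    """Return a copy of a metadata dict with "Some(x)" keys rewritten to "x".

    Hypothetical helper: keys that do not match the pattern are kept as-is.
    """
    normalized = {}
    for key, value in metadata.items():
        match = _SOME_KEY.match(key)
        normalized[match.group(1) if match else key] = value
    return normalized


def label_scores(metadata, non_label_keys=("sentence",)):
    """Extract per-label scores as floats from a metadata dict.

    Assumption: every key other than those in non_label_keys is a label
    whose value is a string holding a float.
    """
    return {
        key: float(value)
        for key, value in normalize_label_keys(metadata).items()
        if key not in non_label_keys
    }


# Metadata shaped like the first row of each table above.
old_style = {"sentence": "0", "Some(neg)": "0.13602075", "Some(pos)": "0.8639792"}
new_style = {"sentence": "0", "neg": "0.13602075", "pos": "0.8639792"}

print(label_scores(old_style))
print(label_scores(new_style))
```

Both calls yield the same label-to-score mapping, so downstream code does not need to know which version produced the annotations.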
Documentation
- TF Hub & HuggingFace to Spark NLP
- Models Hub with new models
- Spark NLP documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
- Spark NLP Workshop notebooks
- Spark NLP publications
- Spark NLP in Action
- Spark NLP training certification notebooks for Google Colab and Databricks
- Spark NLP Display for visualization of different types of annotations
- Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
Installation
Python
#PyPI
pip install spark-nlp==4.2.8
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8
M1
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8
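The package coordinates above differ only in the artifact suffix, so scripts that launch Spark on several kinds of hardware can build them programmatically. This is a minimal sketch under stated assumptions: spark_nlp_coordinate is a hypothetical helper, and "cpu" is a name chosen here for the default, suffix-less artifact.

```python
# Artifact suffixes taken from the commands above. "cpu" is a hypothetical
# label for the default artifact, which has no suffix.
_VARIANT_SUFFIX = {"cpu": "", "gpu": "-gpu", "m1": "-m1", "aarch64": "-aarch64"}


def spark_nlp_coordinate(variant="cpu", version="4.2.8", scala="2.12"):
    """Build the Maven coordinate string for a Spark NLP artifact."""
    if variant not in _VARIANT_SUFFIX:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(_VARIANT_SUFFIX)}"
        )
    return f"com.johnsnowlabs.nlp:spark-nlp{_VARIANT_SUFFIX[variant]}_{scala}:{version}"


for name in ("cpu", "gpu", "m1", "aarch64"):
    print(name, spark_nlp_coordinate(name))
```

The resulting string can be passed to --packages on the command line, or set as the spark.jars.packages property when a SparkSession is built by hand.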
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>4.2.8</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>4.2.8</version>
</dependency>
spark-nlp-m1:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-m1_2.12</artifactId>
<version>4.2.8</version>
</dependency>
spark-nlp-aarch64:
<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64 -->
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>4.2.8</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.2.8.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-4.2.8.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-m1-assembly-4.2.8.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-4.2.8.jar
What's Changed
- Updated VIT model hub cards to remove duplicate entities by @ahmedlone127 in #13281
- Models hub by @maziyarpanahi in #13340
- 2023-01-13-finmapper_wikipedia_parentcompanies_en (#13341) by @josejuanmartinez in #13342
- Models hub legal by @josejuanmartinez in #13343
- Models hub finance by @josejuanmartinez in #13345
- Models hub internal by @Cabir40 in #13351
- added 4.2.7 HC RN by @Cabir40 in #13353
- Add new demos 29 by @agsfer in #13355
- Updated compat table jsl ocr by @albertoandreottiATgmail in #13356
- Models Hub v2.9.0 by @pabla in #13361
- Update head.html by @agsfer in #13367
- FEATURE NMH-140: Add the "Copy S3 URI" to existing documents [skip-test] by @pabla in #13368
- Update programmingLanguageSwitcherScalaPython.js by @agsfer in #13370
- Added Visual NLP 4.2.1 release notes by @albertoandreottiATgmail in #13381
- Update release notes 1 by @albertoandreottiATgmail in #13384
- Finance NLP 1.6.0 by @josejuanmartinez in #13385
- Legal NLP 1.6.0 by @josejuanmartinez in #13386
- Update release notes 1 by @albertoandreottiATgmail in #13387
- Docs/nlp lab4.6.2 by @rpranab in #13394
- Update tabs in docs by @agsfer in #13395
- Legal 1.6.0 additional model by @josejuanmartinez in #13398
- Finance NLP 1.6.0 by @josejuanmartinez in #13399
- [skip ci] Create PR 4.2.7-healthcare-docs-debe9225c540f2f95c464ed9e4be42807e431106-18 by @jsl-builder in #13372
- Fixed some md files by @Damla-Gurbaz in #13400
- Uptade ocr cards by @aymanechilah in #13407
- 428 release candidate by @maziyarpanahi in #13406
Full Changelog: 4.2.7...4.2.8