
Commit 4379df7

Authored by maziyarpanahi, dcecchini, danilojsl, DevinTDHa, and josejuanmartinez
Spark NLP 4.2.7 release candidate (#13290)
* Removed duplicated method definition (#13280): removes the duplicated definition of method `setWeightedDistPath` from `ContextSpellCheckerApproach`
* SPARKNLP-703 Fix Finisher outputAnnotatorType issue (#13282)
  - adds a control to avoid loading the `outputAnnotatorType` attribute when components don't override it
  - adds validation for when a PipelineModel is part of the stages
* SPARKNLP-667: Fix indexing issue for custom pattern (#13283)
  - fixes patterns with lookaheads/lookbehinds that have zero-width matches, for which indexes would not be calculated correctly
  - resolves some warnings
  - refactors tokenizer tests and adds a new index alignment check
* SPARKNLP-708 Enable embeddings output in LightPipeline.fullAnnotate (#13284)
* Bump version to 4.2.7
* SPARKNLP-667: Added try-catch block for custom pattern/char (#13291): if a user-provided pattern/char cannot be applied, a message is logged instead of an exception being thrown
* Enable dropInvalid in reading photos
* Disable the `assemble an image input` unit test: it fails randomly with either a `javax.imageio.IIOException: Unsupported Image Type` or a bad assert on `annotationImage.height`, which suggests something at the OS/file-system level, since a re-try passes
* SPARKNLP-713 Modify default values in GraphExtraction (#13305)
  - modifies the default values of explodeEntities and mergeEntities
  - refactors the GraphFinisher tests
  - adds a warning message for empty paths
* Fix links for APIs in Open Source (#13312)
* Update 2022-09-27-finassertion_time_en.md
* Update 2022-08-17-finner_orgs_prods_alias_en_3_2.md
* Update 2022-08-17-legner_orgs_prods_alias_en_3_2.md
* Update fin/leg clf models' benchmarks (#13276)
* Release note for 4.5.0, including gif (#13301)
* Databricks installation instructions update (#13261)
* Update 2022-09-27-legassertion_time_en.md
* Input/output images (#13310)
* [skip test] Fix links for APIs in Open Source
* SPARKNLP-715 Fix sentence index computation (#13318)
* Update CHANGELOG for 4.2.7 [run doc]
* Update Scala and Python APIs
* Release Spark NLP 4.2.7 on Conda [skip test]

Co-authored-by: Maziyar Panahi <[email protected]>
Co-authored-by: David Cecchini <[email protected]>
Co-authored-by: Danilo Burbano <[email protected]>
Co-authored-by: Devin Ha <[email protected]>
Co-authored-by: Jose J. Martinez <[email protected]>
Co-authored-by: Bünyamin Polat <[email protected]>
Co-authored-by: rpranab <[email protected]>
Co-authored-by: pranab <[email protected]>
Co-authored-by: Jiri Dobes <[email protected]>
Co-authored-by: Lev <[email protected]>
Co-authored-by: diatrambitas <JSL.Git2018>
Co-authored-by: github-actions <[email protected]>
Parent: 4ce624d · Commit: 4379df7

File tree

1,441 files changed (+5893, -5607 lines)


CHANGELOG

Lines changed: 16 additions & 0 deletions
@@ -1,3 +1,19 @@
+========
+4.2.7
+========
+----------------
+Bug Fixes & Enhancements
+----------------
+* Fix `outputAnnotatorType` issue in pipelines with the `Finisher` annotator. This change adds `outputAnnotatorType` to `AnnotatorTransformer` to avoid loading the `outputAnnotatorType` attribute when a stage in the pipeline does not use it
+* Fix the wrong sentence index calculation in metadata by annotators in the pipeline when the `setExplodeSentences` param was set to `true` in the SentenceDetector annotator
+* Fix the issue in `Tokenizer` when a custom pattern is used with lookaheads/lookbehinds and it has zero-width matches. This led to indexes not being calculated correctly
+* Fix missing output of embeddings in the `.fullAnnotate()` method when the `parseEmbeddings` param was set to `True/true`
+* Fix broken links to the Python API pages, as the generation of the PyDocs was slightly changed in a previous release. This makes the Python APIs accessible from the Annotators and Transformers pages like before
+* Change default values of the `explodeEntities` and `mergeEntities` parameters to `true`
+* Better error handling when there are empty paths/relations in the `GraphExtraction` annotator. A new message will better guide the user on how to configure `GraphExtraction` to output meaningful relationships
+* Removed the duplicated definition of method `setWeightedDistPath` from `ContextSpellCheckerApproach`
+
+
 ========
 4.2.6
 ========
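
Below are a few illustrative sketches of the code paths this release touches. They are editor-added examples, not part of the commit: all column names, sample data, and model names are assumptions, and the snippets follow the public Spark NLP Python API for 4.x. First, the `Finisher` fix (SPARKNLP-703): a minimal pipeline that ends in a `Finisher`, the kind of stage that previously triggered the `outputAnnotatorType` loading issue.

```python
import sparknlp
from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import Tokenizer
from pyspark.ml import Pipeline

spark = sparknlp.start()

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

# Finisher is a plain transformer with no outputAnnotatorType of its own;
# 4.2.7 stops trying to load that attribute for such stages.
finisher = Finisher().setInputCols(["token"]).setOutputCols(["finished_tokens"])

data = spark.createDataFrame([["Spark NLP 4.2.7 fixes the Finisher issue."]]).toDF("text")
model = Pipeline(stages=[document_assembler, tokenizer, finisher]).fit(data)
model.transform(data).select("finished_tokens").show(truncate=False)
```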
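For the sentence-index fix (SPARKNLP-715), a sketch of `SentenceDetector` with `setExplodeSentences(True)`, the configuration whose metadata indexes were computed incorrectly before; the sample text is illustrative:

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector
from pyspark.ml import Pipeline

spark = sparknlp.start()

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentences = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") \
    .setExplodeSentences(True)  # one output row per detected sentence

data = spark.createDataFrame([["First sentence. Second sentence."]]).toDF("text")
result = Pipeline(stages=[document, sentences]).fit(data).transform(data)

# With the fix, each exploded row carries the right sentence index in its metadata.
result.select("sentence.metadata").show(truncate=False)
```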
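For the `Tokenizer` fix (SPARKNLP-667), a sketch of a custom split pattern built from zero-width lookarounds; the regex is illustrative and the annotator would plug into a pipeline like the ones above. Since #13291, a pattern that cannot be applied is logged rather than thrown:

```python
from sparknlp.annotator import Tokenizer

# Zero-width pattern: a lookbehind plus a lookahead that splits between a
# digit and a letter without consuming any characters (illustrative only).
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token") \
    .setSplitPattern("(?<=\\d)(?=[A-Za-z])")
```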
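For the `LightPipeline.fullAnnotate` fix (SPARKNLP-708), a sketch that enables embeddings parsing; `glove_100d` is one public pretrained model, used here only as an example:

```python
import sparknlp
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, WordEmbeddingsModel
from pyspark.ml import Pipeline

spark = sparknlp.start()

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
embeddings = WordEmbeddingsModel.pretrained("glove_100d") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

empty = spark.createDataFrame([[""]]).toDF("text")
model = Pipeline(stages=[document, token, embeddings]).fit(empty)

# parse_embeddings=True asks fullAnnotate to include the embedding vectors;
# before 4.2.7 they were missing from the output.
light = LightPipeline(model, parse_embeddings=True)
result = light.fullAnnotate("Spark NLP ships word embeddings.")[0]
print(result["embeddings"][0].embeddings[:5])
```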
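For the `GraphExtraction` changes (SPARKNLP-713), a sketch showing the two parameters whose defaults flipped to `true`; the input columns assume upstream token and NER stages that are omitted here:

```python
from sparknlp.annotator import GraphExtraction

graph = (GraphExtraction()
    .setInputCols(["document", "token", "ner"])  # assumes upstream document, token, and NER stages
    .setOutputCol("graph")
    .setMergeEntities(True)     # now the default: merges neighboring entity tokens
    .setExplodeEntities(True))  # now the default: extracts paths between entity pairs
```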
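Finally, for "Enable dropInvalid in reading photos": the commit message does not show the call site, but `dropInvalid` is the standard option of Spark's built-in image data source, which skips undecodable files instead of failing the read (an assumption that this is the option the commit refers to):

```python
# Assumes the `spark` session from the sketches above; the path is illustrative.
image_df = spark.read.format("image") \
    .option("dropInvalid", True) \
    .load("path/to/images/")
image_df.select("image.origin", "image.height", "image.width").show(truncate=False)
```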

README.md

Lines changed: 46 additions & 46 deletions
@@ -139,8 +139,8 @@ Take a look at our official Spark NLP page: [http://nlp.johnsnowlabs.com/](http:
 - Easy TensorFlow integration
 - GPU Support
 - Full integration with Spark ML functions
-- +6150+ pre-trained models in +200 languages!
-- +1840 pre-trained pipelines in +200 languages!
+- +8500 pre-trained models in +200 languages!
+- +3200 pre-trained pipelines in +200 languages!
 - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.

 ## Requirements
@@ -152,7 +152,7 @@ To use Spark NLP you need the following requirements:

 **GPU (optional):**

-Spark NLP 4.2.6 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+Spark NLP 4.2.7 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:

 - NVIDIA® GPU drivers version 450.80.02 or higher
 - CUDA® Toolkit 11.2
@@ -168,7 +168,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==4.2.6 pyspark==3.2.3
+$ pip install spark-nlp==4.2.7 pyspark==3.2.3
 ```

 In Python console or Jupyter `Python3` kernel:
@@ -213,7 +213,7 @@ For more examples, you can visit our dedicated [repository](https://github.com/J

 ## Apache Spark Support

-Spark NLP *4.2.6* has been built on top of Apache Spark 3.2 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
+Spark NLP *4.2.7* has been built on top of Apache Spark 3.2 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:

 | Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x |
 |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
@@ -247,7 +247,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github

 ## Databricks Support

-Spark NLP 4.2.6 has been tested and is compatible with the following runtimes:
+Spark NLP 4.2.7 has been tested and is compatible with the following runtimes:

 **CPU:**

@@ -291,7 +291,7 @@ NOTE: Spark NLP 4.0.x is based on TensorFlow 2.7.x which is compatible with CUDA

 ## EMR Support

-Spark NLP 4.2.6 has been tested and is compatible with the following EMR releases:
+Spark NLP 4.2.7 has been tested and is compatible with the following EMR releases:

 - emr-6.2.0
 - emr-6.3.0
@@ -329,23 +329,23 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
 ```sh
 # CPU

-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
 ```

 The `spark-nlp` has been published to the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp).

 ```sh
 # GPU

-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.6
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.6
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.6
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7

 ```

@@ -354,11 +354,11 @@ The `spark-nlp-gpu` has been published to the [Maven Repository](https://mvnrepo
 ```sh
 # AArch64

-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.6
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.6
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.6
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7

 ```

@@ -367,11 +367,11 @@ The `spark-nlp-aarch64` has been published to the [Maven Repository](https://mvn
 ```sh
 # M1

-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.6
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.6
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.6
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7

 ```

@@ -383,7 +383,7 @@ The `spark-nlp-m1` has been published to the [Maven Repository](https://mvnrepos
 spark-shell \
   --driver-memory 16g \
   --conf spark.kryoserializer.buffer.max=2000M \
-  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
 ```

 ## Scala
@@ -399,7 +399,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.12</artifactId>
-    <version>4.2.6</version>
+    <version>4.2.7</version>
 </dependency>
 ```

@@ -410,7 +410,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>4.2.6</version>
+    <version>4.2.7</version>
 </dependency>
 ```

@@ -421,7 +421,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>4.2.6</version>
+    <version>4.2.7</version>
 </dependency>
 ```

@@ -432,7 +432,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-m1_2.12</artifactId>
-    <version>4.2.6</version>
+    <version>4.2.7</version>
 </dependency>
 ```

@@ -442,28 +442,28 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2

 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.6"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.7"
 ```

 **spark-nlp-gpu:**

 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.6"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.7"
 ```

 **spark-nlp-aarch64:**

 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.6"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.7"
 ```

 **spark-nlp-m1:**

 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.6"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.7"
 ```

 Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -483,7 +483,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
 Pip:

 ```bash
-pip install spark-nlp==4.2.6
+pip install spark-nlp==4.2.7
 ```

 Conda:
@@ -511,7 +511,7 @@ spark = SparkSession.builder \
     .config("spark.driver.memory","16G")\
     .config("spark.driver.maxResultSize", "0") \
     .config("spark.kryoserializer.buffer.max", "2000M")\
-    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6")\
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7")\
     .getOrCreate()
 ```

@@ -579,7 +579,7 @@ Use either one of the following options
 - Add the following Maven Coordinates to the interpreter's library list

 ```bash
-com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
 ```

 - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is available to driver path
@@ -589,7 +589,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
 Apart from the previous step, install the python module through pip

 ```bash
-pip install spark-nlp==4.2.6
+pip install spark-nlp==4.2.7
 ```

 Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -614,7 +614,7 @@ The easiest way to get this done on Linux and macOS is to simply install `spark-
 $ conda create -n sparknlp python=3.8 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==4.2.6 pyspark==3.2.3 jupyter
+$ pip install spark-nlp==4.2.7 pyspark==3.2.3 jupyter
 $ jupyter notebook
 ```

@@ -630,7 +630,7 @@ export PYSPARK_PYTHON=python3
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
 ```

 Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
@@ -655,7 +655,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
 # -s is for spark-nlp
 # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
 # by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.6
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.7
 ```

 [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP pretrained pipelines.
@@ -676,7 +676,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
 # -s is for spark-nlp
 # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
 # by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.6
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.7
 ```

 [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP pretrained pipeline.
@@ -694,9 +694,9 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi

 3. In `Libraries` tab inside your cluster you need to follow these steps:

-3.1. Install New -> PyPI -> `spark-nlp==4.2.6` -> Install
+3.1. Install New -> PyPI -> `spark-nlp==4.2.7` -> Install

-3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6` -> Install
+3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7` -> Install

 4. Now you can attach your notebook to the cluster and use Spark NLP!

@@ -744,7 +744,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
       "spark.kryoserializer.buffer.max": "2000M",
       "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
       "spark.driver.maxResultSize": "0",
-      "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6"
+      "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7"
     }
 }]
 ```
@@ -753,7 +753,7 @@ A sample of AWS CLI to launch EMR cluster:

 ```.sh
 aws emr create-cluster \
---name "Spark NLP 4.2.6" \
+--name "Spark NLP 4.2.7" \
 --release-label emr-6.2.0 \
 --applications Name=Hadoop Name=Spark Name=Hive \
 --instance-type m4.4xlarge \
@@ -817,7 +817,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
 --enable-component-gateway \
 --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
 --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
---properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+--properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
 ```

 2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI.
@@ -856,7 +856,7 @@ spark = SparkSession.builder \
     .config("spark.kryoserializer.buffer.max", "2000m") \
     .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") \
     .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") \
-    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6") \
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7") \
     .getOrCreate()
 ```

@@ -870,7 +870,7 @@ spark-shell \
 --conf spark.kryoserializer.buffer.max=2000M \
 --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
 --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
---packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+--packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
 ```

 **pyspark:**
@@ -883,7 +883,7 @@ pyspark \
 --conf spark.kryoserializer.buffer.max=2000M \
 --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
 --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
---packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.6
+--packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
 ```

 **Databricks:**
@@ -1147,12 +1147,12 @@ spark = SparkSession.builder \
     .config("spark.driver.memory","16G")\
     .config("spark.driver.maxResultSize", "0") \
     .config("spark.kryoserializer.buffer.max", "2000M")\
-    .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.6.jar")\
+    .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.7.jar")\
     .getOrCreate()
 ```

 - You can download provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases), please pay attention to pick the one that suits your environment depending on the device (CPU/GPU) and Apache Spark version (3.0.x, 3.1.x, 3.2.x, and 3.3.x)
-- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/spark-nlp-assembly-4.2.6.jar`)
+- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/spark-nlp-assembly-4.2.7.jar`)

 Example of using pretrained Models and Pipelines in offline:

build.sbt

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ name := getPackageName(is_m1, is_gpu, is_aarch64)

 organization := "com.johnsnowlabs.nlp"

-version := "4.2.6"
+version := "4.2.7"

 (ThisBuild / scalaVersion) := scalaVer

conda/meta.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,15 +1,15 @@
 package:
   name: "spark-nlp"
-  version: 4.2.6
+  version: 4.2.7

 app:
   entry: spark-nlp
   summary: Natural Language Understanding Library for Apache Spark.

 source:
-  fn: spark-nlp-4.2.6.tar.gz
-  url: https://files.pythonhosted.org/packages/75/6f/4d93b4ac48117284d00efe44b9f5b2f9ff46bd8e94f00b92efc99a68a427/spark-nlp-4.2.6.tar.gz
-  sha256: 0c5c682ebddfb7e1e76610b9b7c46c3ab102219b7c5ff1278cd5aa81fc57bc0e
+  fn: spark-nlp-4.2.7.tar.gz
+  url: https://files.pythonhosted.org/packages/1d/e0/c123346f12e9d312c0b6bfecbd96db9e899882e01bc1adb338349d9e1088/spark-nlp-4.2.7.tar.gz
+  sha256: 071f5b06ae10319cffe5a4fa22586a5b269800578e8a74de912abf123fd01bdf
 build:
   noarch: generic
   number: 0
