📢 Spark NLP 6.1.3: NerDL Graph Checker, Reader2Doc Enhancements, Ranking Finisher

We are pleased to announce Spark NLP 6.1.3, introducing a new graph validation annotator for NER training, enhancements to Reader2Doc for flexible document handling, and a new ranking finisher for AutoGGUFReranker outputs. This release focuses on improving training robustness, document processing flexibility, and retrieval ranking capabilities.

🔥 Highlights

New NerDLGraphChecker annotator to validate NER training graphs before training starts.
Reader2Doc enhancements with options for consolidated output and filtering.
New AutoGGUFRerankerFinisher for ranking, filtering, and normalizing reranker outputs.

🚀 New Features & Enhancements

Named Entity Recognition (NER)

NerDLGraphChecker:
A new annotator that validates whether a suitable NerDL graph is available for a given training dataset before embeddings are computed or training starts. This helps avoid wasted computation in custom training scenarios. (Link to notebook)

Must be placed before embedding or NerDLApproach annotators.
Requires token and label columns in the dataset.
Automatically extracts embedding dimensions from the pipeline to validate graph compatibility.
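
The kind of check described above can be sketched in plain Python. This is an illustration of the idea only, not the annotator's implementation. The graph file-name pattern (`blstm_{tags}_{embeddings dim}_{lstm size}_{chars}.pb`), the matching rule, and the helper names `parse_graph_name` and `find_suitable_graph` are all assumptions made for this example.

```python
import re
from typing import Iterable, NamedTuple, Optional


class GraphSpec(NamedTuple):
    n_tags: int
    embeddings_dim: int
    n_chars: int


# Assumed naming pattern: blstm_{tags}_{embeddings dim}_{lstm size}_{chars}.pb
GRAPH_NAME = re.compile(r"blstm_(\d+)_(\d+)_(\d+)_(\d+)\.pb$")


def parse_graph_name(name: str) -> Optional[GraphSpec]:
    """Extract the capacities encoded in a graph file name, if it matches."""
    match = GRAPH_NAME.search(name)
    if match is None:
        return None
    tags, dim, _lstm_size, chars = (int(g) for g in match.groups())
    return GraphSpec(tags, dim, chars)


def find_suitable_graph(graph_files: Iterable[str], n_tags: int,
                        embeddings_dim: int, n_chars: int) -> str:
    """Return the first graph that fits the dataset, or fail before training."""
    for name in graph_files:
        spec = parse_graph_name(name)
        if spec is None:
            continue
        # Assumed rule: enough tags and chars, and an exact embedding dimension.
        if (spec.n_tags >= n_tags
                and spec.embeddings_dim == embeddings_dim
                and spec.n_chars >= n_chars):
            return name
    raise ValueError(
        f"No graph for tags={n_tags}, dim={embeddings_dim}, chars={n_chars}"
    )


available = ["blstm_10_100_128_100.pb", "blstm_10_768_128_100.pb"]
chosen = find_suitable_graph(available, n_tags=9, embeddings_dim=768, n_chars=80)
```

Failing at this point is cheap: the only inputs are the label set, the character set, and the embedding dimension, so no embeddings need to be computed first.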

Document Processing

Reader2Doc Enhancements:
New configuration options provide more control over output formatting:

outputAsDocument: Concatenates all sentences into a single document.
excludeNonText: Filters out non-textual elements (e.g., tables, images) from the document.
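
To make the effect of the two options concrete, here is a minimal plain-Python sketch of the behaviour they describe. It does not call Spark NLP. The element dictionaries, the `elementType` values, the space separator, and the function name `to_documents` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical element types treated as non-textual in this sketch.
NON_TEXT_TYPES = {"Table", "Image"}


def to_documents(elements: List[Dict[str, str]],
                 output_as_document: bool = False,
                 exclude_non_text: bool = False) -> List[str]:
    """Mimic the two options on a list of parsed elements."""
    if exclude_non_text:
        # Drop tables, images, and similar elements before building output.
        elements = [e for e in elements if e["elementType"] not in NON_TEXT_TYPES]
    texts = [e["content"] for e in elements]
    if output_as_document:
        # Concatenate everything into one document (separator is an assumption).
        return [" ".join(texts)] if texts else []
    return texts


elements = [
    {"elementType": "Title", "content": "Quarterly Report"},
    {"elementType": "Table", "content": "Q1 | Q2"},
    {"elementType": "NarrativeText", "content": "Sales rose."},
]
single = to_documents(elements, output_as_document=True, exclude_non_text=True)
separate = to_documents(elements)
```

With both options on, `single` holds one string built from the title and the narrative text. With both off, `separate` keeps one entry per element, including the table.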

Ranking & Retrieval

AutoGGUFRerankerFinisher:
A finisher for processing AutoGGUFReranker outputs, adding advanced ranking and filtering capabilities (Link to notebook):

Top-k document selection.
Score threshold filtering.
Min-max score normalization (0–1 range).
Sorting by relevance score.
Rank assignment in metadata while preserving document structure.
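
The listed operations can be sketched in plain Python as follows. This is not the finisher's code or API. The function name `finish_ranking`, its parameters, the dictionary layout, and the order in which the steps are applied (threshold, normalize, sort, top-k, rank) are assumptions for illustration.

```python
from typing import Dict, List, Optional


def finish_ranking(docs: List[Dict], top_k: Optional[int] = None,
                   min_score: Optional[float] = None,
                   normalize: bool = False) -> List[Dict]:
    """Filter, normalize, sort, and rank scored documents (illustrative only)."""
    # 1. Score threshold filtering on the raw scores.
    kept = [d for d in docs if min_score is None or d["score"] >= min_score]

    # 2. Min-max normalization to the 0-1 range.
    if normalize and kept:
        lo = min(d["score"] for d in kept)
        hi = max(d["score"] for d in kept)
        span = hi - lo
        kept = [
            # If all scores are equal, map them to 1.0 to avoid dividing by zero.
            {**d, "score": (d["score"] - lo) / span if span else 1.0}
            for d in kept
        ]

    # 3. Sort by relevance score, highest first.
    ranked = sorted(kept, key=lambda d: d["score"], reverse=True)

    # 4. Top-k selection.
    if top_k is not None:
        ranked = ranked[:top_k]

    # 5. Attach a 1-based rank and leave the other fields untouched.
    return [{**d, "rank": i} for i, d in enumerate(ranked, start=1)]


docs = [
    {"text": "a", "score": 0.2},
    {"text": "b", "score": 0.9},
    {"text": "c", "score": 0.5},
    {"text": "d", "score": 0.05},
]
result = finish_ranking(docs, top_k=2, min_score=0.1, normalize=True)
```

Here the threshold is applied before normalization, so the 0-1 range is computed over the surviving documents only. Swapping those two steps would change the normalized scores, which is why the order is called out as an assumption.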

🐛 Bug Fixes

None.

❤️ Community Support

Slack: Live discussion with the Spark NLP community and team
GitHub: Bug reports, feature requests, and contributions
Discussions: Share ideas and engage with other community members
Medium: Spark NLP technical articles
JohnSnowLabs Medium: Official blog
YouTube: Spark NLP tutorials and demos

Installation

Python

pip install spark-nlp==6.1.3

Spark Packages

spark-nlp on Apache Spark 3.0.x–3.4.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.3

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.3

Apple Silicon (M1 & M2)

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.3

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.3

Maven

spark-nlp:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>6.1.3</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>6.1.3</version>
</dependency>

spark-nlp-silicon:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>6.1.3</version>
</dependency>

spark-nlp-aarch64:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>6.1.3</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 6.1.2...6.1.3
