JohnSnowLabs
diff --git a/.sync/ignoreFiles b/.sync/ignoreFiles
Lines changed: 352 additions & 0 deletions
diff --git a/CHANGELOG b/CHANGELOG
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,352 @@
+docs
+docs/*
+*/docs
+*/docs/*
+**/docs/**
+docs/**
+**/*.min.js
+**/*.js
+**/*.py
+python/**
+**/python/**
+/python/tensorflow/
+/python/tensorflow/*
+
+target
+target/*
+/target
+/target/*
+*/target/*
+
+*.json
+
+### Eclipse ###
+
+.metadata
+bin/
+tmp/
+*.tmp
+*.bak
+*.swp
+*~.nib
+local.properties
+.settings/
+.loadpath
+.recommenders
+PubMed*
+*cache_pretrained*
+*.crc
+*.sst
+_SUCCESS*
+*stages*
+*auxdata*
+# External tool builders
+.externalToolBuilders/
+
+# Locally stored "Eclipse launch configurations"
+*.launch
+
+# PyDev specific (Python IDE for Eclipse)
+*.pydevproject
+
+# CDT-specific (C/C++ Development Tooling)
+.cproject
+
+# Java annotation processor (APT)
+.factorypath
+
+# PDT-specific (PHP Development Tools)
+.buildpath
+
+# sbteclipse plugin
+.target
+
+# Tern plugin
+.tern-project
+
+# TeXlipse plugin
+.texlipse
+
+# STS (Spring Tool Suite)
+.springBeans
+
+# Code Recommenders
+.recommenders/
+
+# Scala IDE specific (Scala & Java development for Eclipse)
+.cache-main
+.scala_dependencies
+.worksheet
+
+### Eclipse Patch ###
+# Eclipse Core
+.project
+
+# JDT-specific (Eclipse Java Development Tools)
+.classpath
+
+### Intellij ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+# User-specific stuff:
+.idea/**/workspace.xml
+.idea/**/tasks.xml
+.idea/dictionaries
+
+# Sensitive or high-churn files:
+.idea/**/dataSources/
+.idea/**/dataSources.ids
+.idea/**/dataSources.xml
+.idea/**/dataSources.local.xml
+.idea/**/sqlDataSources.xml
+.idea/**/dynamic.xml
+.idea/**/uiDesigner.xml
+
+# Gradle:
+.idea/**/gradle.xml
+.idea/**/libraries
+
+# CMake
+cmake-build-debug/
+
+# Mongo Explorer plugin:
+.idea/**/mongoSettings.xml
+
+## File-based project format:
+*.iws
+
+## Plugin-specific files:
+
+# IntelliJ
+/out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+### Intellij Patch ###
+# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
+
+*.iml
+# modules.xml
+# .idea/misc.xml
+# *.ipr
+
+# Sonarlint plugin
+.idea/sonarlint
+
+### Intellij+all ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+# User-specific stuff:
+
+# Sensitive or high-churn files:
+
+# Gradle:
+
+# CMake
+
+# Mongo Explorer plugin:
+
+## File-based project format:
+
+## Plugin-specific files:
+
+# IntelliJ
+
+# mpeltonen/sbt-idea plugin
+
+# JIRA plugin
+
+# Cursive Clojure plugin
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+
+### Intellij+all Patch ###
+# Ignores the whole idea folder
+# See https://github.com/joeblau/gitignore.io/issues/186 and https://github.com/joeblau/gitignore.io/issues/360
+
+.idea/
+
+### Java ###
+# Compiled class file
+*.class
+
+# Log file
+*.log
+
+# BlueJ files
+*.ctxt
+
+# Mobile Tools for Java (J2ME)
+.mtj.tmp/
+
+# Package Files #
+*.jar
+*.war
+*.ear
+*.zip
+*.tar.gz
+*.rar
+
+# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
+hs_err_pid*
+
+### Python ###
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+python/lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+docs/vendor/
+
+# Frontend
+docs/_frontend/node_modules
+docs/_frontend/static
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+
+### SBT ###
+# Simple Build Tool
+# http://www.scala-sbt.org/release/docs/Getting-Started/Directories.html#configuring-version-control
+
+dist/*
+lib_managed/
+src_managed/
+project/boot/
+project/plugins/project/
+.history
+.lib/
+
+### Scala ###
+
+# End of https://www.gitignore.io/api/sbt,java,scala,python,eclipse,intellij,intellij+all
+
+### Local ###
+tmp_pipeline/
+tmp_symspell/
+test-output-tmp/
+spark-warehouse/
+/python/python.iml
+test_crf_pipeline/
+test_*_pipeline/
+*metastore_db*
+python/src/
+python/tensorflow/bert/models/**
+**/.DS_Store
+**/tmp_*
+docs/_site/**
+docs/.sass-cache/**
+tst_shortcut_sd/
+src/*/resources/*.classes
+/word_segmenter_metrics/
+/special_class.ser
+.bsp/sbt.json
+python/docs/_build/**
+python/docs/reference/_autosummary/**
@@ -1,3 +1,27 @@
+========
+4.2.0
+========
+----------------
+New Features & Enhancements
+----------------
+* **NEW:** Introducing **Wav2Vec2ForCTC** annotator in Spark NLP 🚀. `Wav2Vec2ForCTC` can load `Wav2Vec2` models for the Automatic Speech Recognition (ASR) task. Wav2Vec2 is a multi-modal model that combines speech and text. It's the first multi-modal model of its kind we welcome in Spark NLP. This annotator is compatible with all the models trained/fine-tuned by using `Wav2Vec2ForCTC` for **PyTorch** or `TFWav2Vec2ForCTC` for **TensorFlow** models in HuggingFace 🤗 (https://github.com/JohnSnowLabs/spark-nlp/pull/12767)
+* **NEW:** Introducing **TapasForQuestionAnswering** annotator in Spark NLP 🚀. `TapasForQuestionAnswering` can load TAPAS Models with a cell selection head and optional aggregation head on top for question-answering tasks on tables (linear layers on top of the hidden-states output to compute logits and optional logits_aggregation), e.g. for SQA, WTQ or WikiSQL-supervised tasks. TAPAS is a BERT-based model specifically designed (and pre-trained) for answering questions about tabular data. This annotator is compatible with all the models trained/fine-tuned by using `TapasForQuestionAnswering` for **PyTorch** or `TFTapasForQuestionAnswering` for **TensorFlow** models in HuggingFace 🤗
+* **NEW:** Introducing **CamemBertForTokenClassification** annotator in Spark NLP 🚀. `CamemBertForTokenClassification` can load CamemBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `CamembertForTokenClassification` for PyTorch or `TFCamembertForTokenClassification` for TensorFlow in HuggingFace 🤗
+(https://github.com/JohnSnowLabs/spark-nlp/pull/12752)
+* Implementing `setTestDataset` to evaluate metrics on an external dataset during training of Text Classifiers in Spark NLP. As in `NerDLApproach`, metrics are calculated on each epoch. The feature has been added to the following multi-class/multi-label text classifier annotators: `ClassifierDLApproach`, `SentimentDLApproach`, and `MultiClassifierDLApproach` (https://github.com/JohnSnowLabs/spark-nlp/pull/12796)
+* Refactoring `EntityRuler` annotator inference to be up to 24x faster, especially when used with a long list of labels/entities. We speed up the inference process by implementing the Aho-Corasick algorithm to match patterns in a string. This requires changes when using `EntityRuler`, described in https://github.com/JohnSnowLabs/spark-nlp/pull/12634
+* Add support for S3 storage in the `cache_folder` where models are downloaded, extracted, and loaded from. Previously, only local file systems, HDFS, and DBFS were supported. This new feature is especially useful for users on Kubernetes clusters with no access to HDFS or any other distributed file systems (https://github.com/JohnSnowLabs/spark-nlp/pull/12707)
+* Implementing `lookaround` functionalities in `DocumentNormalizer` annotator. Currently, `DocumentNormalizer` has both `lookahead` and `lookbehind` functionalities. To extend support for more complex normalizations, especially in clinical text, we are introducing the `lookaround` feature (https://github.com/JohnSnowLabs/spark-nlp/pull/12735)
+* Implementing `setReplaceEntities` param in `NerOverwriter` annotator to replace all the NER labels (entities) with the given new labels (entities) (https://github.com/JohnSnowLabs/spark-nlp/pull/12745)
+
+----------------
+Bug Fixes
+----------------
+* Fix a bug in generating the NerDL graph by using TF v2. The previous graph generated by the `TFGraphBuilder` annotator resulted in an exception when the length of the sequence was 1. This issue has been resolved and the new graphs created by `TFGraphBuilder` won't have this issue anymore (https://github.com/JohnSnowLabs/spark-nlp/pull/12636)
+* Fix a bug introduced in the 4.0.0 release in Transformer-based Word Embeddings annotators. In the 4.0.0 release, the following annotators were migrated to BatchAnnotate to improve their performance, especially on GPU. However, the migration introduced a bug in sentence indices which, when these annotators were combined with SentenceEmbeddings for Text Classification tasks (ClassifierDLApproach, SentimentDLApproach, and MultiClassifierDLApproach), resulted in low accuracy: AlbertEmbeddings, CamemBertEmbeddings, DeBertaEmbeddings, DistilBertEmbeddings, LongformerEmbeddings, RoBertaEmbeddings, XlmRoBertaEmbeddings, and XlnetEmbeddings (https://github.com/JohnSnowLabs/spark-nlp/pull/12641)
+* Add support for lists of questions and contexts in LightPipeline. Previously, only one context and question at a time were supported in LightPipeline for Question Answering annotators. We have added support to `fullAnnotate` and `annotate` to receive two lists of questions and contexts (https://github.com/JohnSnowLabs/spark-nlp/pull/12653)
+* Fix division by zero exception in the `GPT2Transformer` annotator when the `setDoSample` param was set to true (https://github.com/JohnSnowLabs/spark-nlp/pull/12661)
+
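Reviewer note on the `EntityRuler` entry above: the changelog names the Aho-Corasick algorithm but does not show it. The following is a minimal, self-contained Python sketch of that technique, not the Spark NLP implementation. The function names `build_automaton` and `find_entities` and the sample patterns are hypothetical and chosen only for illustration. It builds a trie over all patterns, adds failure links with a breadth-first pass, and then scans the text once, which is why matching cost does not grow with the number of patterns the way a per-pattern search would.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from a {pattern: label} dict (illustrative names)."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # fail[state] -> state for the longest proper suffix that is in the trie
    out = [[]]    # out[state] -> (pattern, label) pairs that end at this state

    # Phase 1: insert every pattern into the trie.
    for pattern, label in patterns.items():
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((pattern, label))

    # Phase 2: breadth-first pass to set failure links and merge outputs.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Patterns ending at the suffix state also end here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_entities(text, automaton):
    """Return (start, end, pattern, label) for every match, in a single pass over text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until some state accepts ch, or we reach the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern, label in out[state]:
            matches.append((i - len(pattern) + 1, i + 1, pattern, label))
    return matches


if __name__ == "__main__":
    # Hypothetical patterns and labels, for illustration only.
    automaton = build_automaton({"he": "A", "she": "B", "hers": "C"})
    for match in find_entities("ushers", automaton):
        print(match)
```

Overlapping matches are all reported ("she" and "he" both end at the same character), so a caller that wants non-overlapping entities would still need to choose among them, for example by preferring the longest match.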
 ========
 4.1.0
 ========