Spark NLP 5.1.1: ONNX support for more features!

Maziyar Panahi
Published in spark-nlp
5 min read · Sep 15, 2023

Introducing ONNX Support for MPNet, AlbertForTokenClassification, AlbertForSequenceClassification, AlbertForQuestionAnswering transformers, access to full vectors in Word2VecModel, Doc2VecModel, WordEmbeddingsModel annotators, 460+ new ONNX models, and bug fixes

📢 Overview

Spark NLP 5.1.1 🚀 comes with new ONNX support for MPNet, AlbertForTokenClassification, AlbertForSequenceClassification, and AlbertForQuestionAnswering annotators, a new getVectors feature in Word2VecModel, Doc2VecModel, and WordEmbeddingsModel annotators, 460+ new ONNX models for MPNet and BERT transformers, and bug fixes!

We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 18,800 free and truly open-source models & pipelines. 🎉

🔥 New Features & Enhancements

  • NEW: Introducing support for ONNX Runtime in MPNet embedding annotator
  • NEW: Introducing support for ONNX Runtime in AlbertForTokenClassification annotator
  • NEW: Introducing support for ONNX Runtime in AlbertForSequenceClassification annotator
  • NEW: Introducing support for ONNX Runtime in AlbertForQuestionAnswering annotator
  • NEW: Implement the getVectors feature in Word2VecModel, Doc2VecModel, and WordEmbeddingsModel annotators. This feature gives access to all tokens and their vectors in the loaded models.
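
As a rough illustration of the getVectors feature, here is a minimal Python sketch. It assumes the Python wrapper exposes getVectors() with no arguments and that it returns a Spark DataFrame of tokens and vectors. The helper names preview_vectors and demo are hypothetical and not part of Spark NLP.

```python
def preview_vectors(model, n=5):
    """Return the first n rows of a model's token/vector table.

    Assumption: `model` exposes getVectors() returning a Spark DataFrame,
    as described in this release for Word2VecModel, Doc2VecModel,
    and WordEmbeddingsModel.
    """
    return model.getVectors().limit(n).collect()


def demo():
    """Hypothetical end-to-end usage; needs Spark NLP 5.1.1 and network access."""
    # Imports are local so preview_vectors() can be defined without Spark installed.
    import sparknlp
    from sparknlp.annotator import Word2VecModel

    sparknlp.start()  # starts (or reuses) a Spark session with Spark NLP loaded
    model = Word2VecModel.pretrained()  # downloads the annotator's default model
    for row in preview_vectors(model, n=3):
        print(row)
```

Calling demo() on a machine with Spark NLP installed should print a few rows of the table. The limit() call keeps the preview small, since a pretrained vocabulary can be large.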

🐛 Bug Fixes

  • Fix saving and loading of Whisper models
  • Fix saving ONNX models on Windows

❤️ Community support

  • Slack: live discussion with the Spark NLP community and the team
  • GitHub: bug reports, feature requests, and contributions
  • Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
  • Medium: Spark NLP articles
  • JohnSnowLabs: official Medium
  • YouTube: Spark NLP video tutorials

Installation

Python

#PyPI
pip install spark-nlp==5.1.1
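
To confirm that the pinned version was installed, a small standard-library check like the following can help. This is only a sketch: installed_version is a hypothetical helper, and it reads the package metadata rather than importing Spark NLP.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="spark-nlp"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed spark-nlp version, or None when it is not installed.
print(installed_version())
```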

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.1

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.1

Apple Silicon (M1 & M2)

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.1

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.1
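
The package coordinates above all follow one pattern: group ID, artifact name with an optional hardware suffix, Scala version, and release version. The sketch below builds them in Python. The function name package_coordinate and the VARIANTS tuple are illustrative assumptions, not part of Spark NLP.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"
# "" is the default CPU build; the other entries match the sections above.
VARIANTS = ("", "gpu", "silicon", "aarch64")


def package_coordinate(variant="", version="5.1.1"):
    """Build the --packages coordinate for a given Spark NLP build variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    artifact = "spark-nlp" + (f"-{variant}" if variant else "")
    return f"{GROUP_ID}:{artifact}_{SCALA_VERSION}:{version}"


# Example: print the pyspark command line for the GPU build.
print(f"pyspark --packages {package_coordinate('gpu')}")
```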

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>5.1.1</version>
</dependency>

spark-nlp-gpu:

<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>5.1.1</version>
</dependency>

spark-nlp-silicon:

<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>5.1.1</version>
</dependency>

spark-nlp-aarch64:

<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>5.1.1</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 5.1.0...5.1.1
