---
layout: docs
header: true
seotitle:
title: Dependency Parsing
permalink: docs/en/tasks/dependency_parsing
key: docs-tasks-dependency-parsing
modify_date: "2024-09-28"
show_nav: true
sidebar:
  nav: sparknlp
---

**Dependency Parsing** is a syntactic analysis task that focuses on the grammatical structure of sentences. It identifies the dependencies between words, showing how they relate to one another grammatically. Spark NLP provides pretrained dependency parsing models that can accurately analyze sentence structures, enabling various applications in natural language processing.

Dependency parsing models process input sentences and generate a structured representation of word relationships. Common use cases include:

- **Grammatical Analysis:** Understanding the grammatical structure of sentences for better comprehension.
- **Information Extraction:** Identifying key relationships and entities in sentences for tasks like knowledge graph construction.

By using Spark NLP dependency parsing models, you can build efficient systems that analyze and understand sentence structure accurately.
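
Conceptually, the structured representation a parser produces can be stored as one head reference per token. The sketch below is a minimal plain-Python illustration (independent of Spark NLP, using a made-up three-word sentence) of how head indices encode the dependency tree:

```python
# A dependency parse as parallel lists: each token points to its head.
# Index 0 is a virtual ROOT node (a common convention, e.g. in CoNLL-U).
tokens = ["ROOT", "She", "reads", "books"]
heads  = [None,   2,     0,       2]  # "reads" is the root; "She" and "books" depend on it

def arcs(tokens, heads):
    """Return (dependent, head) pairs, skipping the virtual ROOT entry."""
    return [(tokens[i], tokens[h]) for i, h in enumerate(heads) if h is not None]

print(arcs(tokens, heads))
# [('She', 'reads'), ('reads', 'ROOT'), ('books', 'reads')]
```

A real parser produces exactly this kind of head assignment for each sentence, which Spark NLP exposes through its annotation columns, as shown in the pipeline example below.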

## Picking a Model

When selecting a dependency parsing model, consider factors such as the **language of the text** and the **complexity of sentence structures**. Some models may be optimized for specific languages or types of text. Evaluate whether you need **detailed syntactic parsing** or a more **general analysis** based on your application.

Explore the available dependency parsing models at [Spark NLP Models](https://sparknlp.org/models) to find the one that best fits your requirements.

#### Recommended Models for Dependency Parsing Tasks

- **General Dependency Parsing:** Consider models such as [`dependency_conllu_en_3_0`](https://sparknlp.org/2022/06/29/dependency_conllu_en_3_0.html){:target="_blank"} for analyzing English sentences. You can also explore language-specific models tailored for non-English languages.

Choosing the appropriate model ensures you produce accurate syntactic structures that suit your specific language and use case.

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPython.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP
spark = sparknlp.start()

# Document Assembler: converts raw text into a document format suitable for processing
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Sentence Detector: splits text into individual sentences
sentenceDetector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# Tokenizer: breaks sentences into tokens (words)
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

# Part-of-Speech Tagger: tags each token with its respective POS (pretrained model)
posTagger = PerceptronModel.pretrained() \
    .setInputCols(["token", "sentence"]) \
    .setOutputCol("pos")

# Dependency Parser: analyzes the grammatical structure of each sentence
dependencyParser = DependencyParserModel.pretrained() \
    .setInputCols(["sentence", "pos", "token"]) \
    .setOutputCol("dependency")

# Typed Dependency Parser: assigns typed labels to the dependencies
typedDependencyParser = TypedDependencyParserModel.pretrained() \
    .setInputCols(["token", "pos", "dependency"]) \
    .setOutputCol("labdep")

# Create a pipeline that includes all the stages
pipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    posTagger,
    dependencyParser,
    typedDependencyParser
])

# Sample input data (a DataFrame with one text example)
df = spark.createDataFrame(
    [["Dependencies represent relationships between words in a Sentence"]]
).toDF("text")

# Run the pipeline on the input data
result = pipeline.fit(df).transform(df)

# Show the dependency parsing results
result.select("dependency.result").show(truncate=False)

+---------------------------------------------------------------------------------+
|result                                                                           |
+---------------------------------------------------------------------------------+
|[ROOT, Dependencies, represents, words, relationships, Sentence, Sentence, words]|
+---------------------------------------------------------------------------------+
```
```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Document Assembler: converts raw text into a document format for NLP processing
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Sentence Detector: splits the input text into individual sentences
val sentenceDetector = new SentenceDetector()
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

// Tokenizer: breaks sentences into individual tokens (words)
val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

// Part-of-Speech Tagger: tags each token with its respective part of speech (pretrained model)
val posTagger = PerceptronModel.pretrained()
  .setInputCols(Array("token", "sentence"))
  .setOutputCol("pos")

// Dependency Parser: analyzes the grammatical structure of each sentence
val dependencyParser = DependencyParserModel.pretrained()
  .setInputCols(Array("sentence", "pos", "token"))
  .setOutputCol("dependency")

// Typed Dependency Parser: assigns typed labels to the dependencies
val typedDependencyParser = TypedDependencyParserModel.pretrained()
  .setInputCols(Array("token", "pos", "dependency"))
  .setOutputCol("labdep")

// Create a pipeline that includes all stages
val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  posTagger,
  dependencyParser,
  typedDependencyParser
))

// Sample input data (a DataFrame with one text example)
val df = Seq("Dependencies represent relationships between words in a Sentence").toDF("text")

// Run the pipeline on the input data
val result = pipeline.fit(df).transform(df)

// Show the dependency parsing results
result.select("dependency.result").show(truncate = false)

+---------------------------------------------------------------------------------+
|result                                                                           |
+---------------------------------------------------------------------------------+
|[ROOT, Dependencies, represents, words, relationships, Sentence, Sentence, words]|
+---------------------------------------------------------------------------------+
```
</div>
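
In the pipeline above, the `dependency` column holds each token's head word and `labdep` its typed relation. After running the pipeline, you could zip the `token.result`, `dependency.result`, and `labdep.result` arrays into readable triples. The sketch below does this in plain Python with hand-written illustrative values (the `nsubj`/`dobj` labels are assumptions, not actual model output):

```python
# Hand-written illustrative values; in practice these parallel arrays come from
# the pipeline's "token", "dependency", and "labdep" output columns.
tokens = ["Dependencies", "represent", "relationships"]
heads  = ["represent", "ROOT", "represent"]   # head word for each token
labels = ["nsubj", "root", "dobj"]            # typed relation (illustrative)

# Zip the parallel arrays into (dependent, relation, head) triples
triples = list(zip(tokens, labels, heads))
for dep, rel, head in triples:
    print(f"{dep} --{rel}--> {head}")
# Dependencies --nsubj--> represent
# represent --root--> ROOT
# relationships --dobj--> represent
```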

## Try Real-Time Demos!

If you want to see the outputs of dependency parsing models in real time, visit our interactive demos:

- **[Grammar Analysis & Dependency Parsing](https://huggingface.co/spaces/abdullahmubeen10/sparknlp-grammar-analysis-and-dependency-parsing){:target="_blank"}** – An interactive demo for visualizing dependencies in sentences.

## Useful Resources

Want to dive deeper into dependency parsing with Spark NLP? Here are some curated resources to help you get started and explore further:

**Articles and Guides**
- *[Mastering Dependency Parsing with Spark NLP and Python](https://www.johnsnowlabs.com/supercharge-your-nlp-skills-mastering-dependency-parsing-with-spark-nlp-and-python/){:target="_blank"}*

**Notebooks**
- *[Extract part-of-speech tags and perform dependency parsing on a text](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/GRAMMAR_EN.ipynb#scrollTo=syePZ-1gYyj3){:target="_blank"}*
- *[Typed Dependency Parsing with NLU](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/nlu/colab/component_examples/dependency_parsing/NLU_typed_dependency_parsing_example.ipynb){:target="_blank"}*