I use the following pipeline with BioBERT Sentence Embeddings.

24/08/08 03:19:13.581 [task-result-getter-3] WARN o.a.spark.scheduler.TaskSetManager - Lost task 7.2 in stage 10.0 (TID 370) (10.0.0.12 executor 4): org.apache.spark.SparkException: Failed to execute user defined function (LSHModel$$Lambda$5263/1056329262: (struct<type:tinyint,size:int,indices:array,values:array>) => array<struct<type:tinyint,size:int,indices:array,values:array>>)
The exception is still raised even when I use sent_roberta_base.
I finally found the root cause. Some texts in the dataset contain a . character, so each of them is viewed as 2 sentences.
For those texts, the output column (sentence_embeddings) of BertSentenceEmbeddings and RoBertaSentenceEmbeddings is an array of size 2.
DocumentSimilarityRankerApproach.train() flattens sentence_embeddings.embeddings, which makes the dimension 1536 (768 * 2) instead of 768.

The solution in my case is to set custom bounds for SentenceDetector.
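
To make the dimension mismatch concrete, here is a minimal plain-Python sketch of the arithmetic. It does not call Spark NLP; the helper names (flatten_embeddings, find_multi_sentence_docs) and the zero-filled vectors are hypothetical and only stand in for the flattening described above.

```python
from itertools import chain

EMBEDDING_DIM = 768  # size of one sentence embedding, as reported in this thread


def flatten_embeddings(sentence_embeddings):
    """Concatenate the per-sentence vectors of one document into a single flat vector."""
    return list(chain.from_iterable(sentence_embeddings))


def find_multi_sentence_docs(docs):
    """Return the indices of documents that do not have exactly one sentence embedding."""
    return [i for i, emb in enumerate(docs) if len(emb) != 1]


# Hypothetical documents: one inner vector per detected sentence.
one_sentence_doc = [[0.0] * EMBEDDING_DIM]
two_sentence_doc = [[0.0] * EMBEDDING_DIM, [0.0] * EMBEDDING_DIM]

flat_one = flatten_embeddings(one_sentence_doc)
flat_two = flatten_embeddings(two_sentence_doc)

print(len(flat_one))  # 768
print(len(flat_two))  # 1536

# A check like this, run before training, would point at the rows that break the ranker.
offending = find_multi_sentence_docs([one_sentence_doc, two_sentence_doc])
print(offending)  # [1]
```

In Spark NLP itself, SentenceDetector exposes setCustomBounds and setUseCustomBoundsOnly, which is presumably what "custom bounds" refers to here. The goal is for every document to produce exactly one sentence, so the flattened vector stays at 768.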