SynapseML version
synapseml_2.12:1.0.13
System information
- Language version (e.g. python 3.8, scala 2.12): python 3.12
- Spark Version (e.g. 3.2.3): 3.4.4, 3.5.1
- Spark Platform (e.g. Synapse, Databricks): plain Spark (no platform), Databricks
Describe the problem
When using AzureSearchWriter in SynapseML to update a GeographyPoint field in Azure AI Search, the writer sends the location value as a JSON string rather than a proper GeoJSON object. Azure AI Search expects a valid spatial object, so the request fails with HTTP 400.
Code to reproduce issue
from pyspark.sql import SparkSession, functions as F, types as T
import synapse.ml.services.search
AZURE_SEARCH_ADMIN_KEY = "api_key"
AZURE_SEARCH_SERVICE = "ai-search"
INDEX_NAME = "demo-geoindex"
spark = (
    SparkSession.builder
    .appName("synapseml-to-azure-search")
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.13")
    .getOrCreate()
)
indexJson = r'''
{
  "name": "demo-geoindex",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": true, "retrievable": true },
    { "name": "location", "type": "Edm.GeographyPoint", "searchable": false, "filterable": true, "retrievable": true, "sortable": true }
  ]
}
'''.strip()
# location as STRING containing GeoJSON (lon, lat)
doc_schema = T.StructType([
    T.StructField("id", T.StringType(), False),
    T.StructField("location", T.StringType(), False),
])
geojson = '{"type":"Point","coordinates":[-0.1276, 51.5072]}'
df = (spark.createDataFrame([("1", geojson)], schema=doc_schema)
      .withColumn("searchAction", F.lit("upload")))
df.show(1)
df.writeToAzureSearch(
    subscriptionKey=AZURE_SEARCH_ADMIN_KEY,
    serviceName=AZURE_SEARCH_SERVICE,
    indexJson=indexJson,
    actionCol="searchAction",
    batchSize="1"
)
Repro without Spark (curl, fails with HTTP 400):
curl -s \
"https://<ai-search>.search.windows.net/indexes/demo-geoindex/docs/index?api-version=2023-07-01-Preview" \
-H "api-key: <api_key>" \
-H "Content-Type: application/json" \
-d '{"value":[{"id":"1","location":"{\"type\":\"Point\",\"coordinates\":[-0.1276, 51.5072]}","@search.action":"upload"}]}'
Expected request (accepted by Azure AI Search):
curl -s \
"https://<ai-search>.search.windows.net/indexes/demo-geoindex/docs/index?api-version=2023-07-01-Preview" \
-H "api-key: <api_key>" \
-H "Content-Type: application/json" \
-d '{"value":[{"id":"3","location":{"type":"Point","coordinates":[-0.1276, 51.5072]},"@search.action":"upload"}]}'
Other info / logs
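The difference between the failing and the expected payload can be shown with plain Python: the writer double-encodes the GeoJSON, so `location` arrives as a JSON string, whereas the service expects a nested object. A minimal illustration (no Spark or Azure involved):

```python
import json

geojson = '{"type":"Point","coordinates":[-0.1276, 51.5072]}'

# What the writer currently sends: location is a JSON *string*
bad = {"value": [{"id": "1", "location": geojson, "@search.action": "upload"}]}

# What Azure AI Search expects: location is a JSON *object*
good = {"value": [{"id": "1", "location": json.loads(geojson), "@search.action": "upload"}]}

# Serializing `bad` produces the escaped "{\"type\":\"Point\"...}" value
# seen in the 400 response; serializing `good` produces a nested object.
print(json.dumps(bad))
print(json.dumps(good))
```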
25/09/16 19:15:04 WARN HandlingUtils: got error 400: Bad Request on https://ai-search-m-se.search.windows.net/indexes/demo-geoindex/docs/index?api-version=2023-07-01-Preview {"value":[{"id":"1","location":"{\"type\":\"Point\",\"coordinates\":[-0.1276, 51.5072]}","@search.action":"upload"}]}
25/09/16 19:15:04 INFO SynapseMLLogging: finished sending to https://ai-search-m-se.search.windows.net/indexes/demo-geoindex/docs/index?api-version=2023-07-01-Preview took (248ms)
25/09/16 19:15:04 INFO CodeGenerator: Code generated in 13.930916 ms
25/09/16 19:15:04 INFO CodeGenerator: Code generated in 11.4245 ms
25/09/16 19:15:04 INFO CodeGenerator: Code generated in 20.387 ms
25/09/16 19:15:04 INFO CodeGenerator: Code generated in 6.13 ms
25/09/16 19:15:04 ERROR Executor: Exception in task 10.0 in stage 3.0 (TID 21)
org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (functions$$$Lambda$3203/0x000000700207c000: (struct<response:string,status:struct<protocolVersion:struct<protocol:string,major:int,minor:int>,statusCode:int,reasonPhrase:string>>, struct<col1:string>) => struct<response:string,status:struct<protocolVersion:struct<protocol:string,major:int,minor:int>,statusCode:int,reasonPhrase:string>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:217)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ScalaUDF_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1004)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1004)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2298)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.RuntimeException: Service Exception:
[{"error":{"code":"","message":"The request is invalid. Details: The value specified for the spatial property was not valid. You must specify a valid spatial value."}},[[HTTP,1,1],400,Bad Request]]
for input:
[{"value":[{"id":"1","location":"{\"type\":\"Point\",\"coordinates\":[-0.1276, 51.5072]}","@search.action":"upload"}]}]
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.$anonfun$checkForErrors$1(AzureSearch.scala:156)
at scala.Option.map(Option.scala:230)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.checkForErrors(AzureSearch.scala:153)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.$anonfun$prepareDF$23(AzureSearch.scala:356)
at org.apache.spark.injections.UDFUtils$$anon$2.call(UDFUtils.scala:29)
at org.apache.spark.sql.functions$.$anonfun$udf$93(functions.scala:5438)
... 22 more
What component(s) does this bug affect?
- area/cognitive: Cognitive project
- area/core: Core project
- area/deep-learning: DeepLearning project
- area/lightgbm: Lightgbm project
- area/opencv: Opencv project
- area/vw: VW project
- area/website: Website
- area/build: Project build system
- area/notebooks: Samples under notebooks folder
- area/docker: Docker usage
- area/models: models related issue
What language(s) does this bug affect?
- language/scala: Scala source code
- language/python: Pyspark APIs
- language/r: R APIs
- language/csharp: .NET APIs
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/synapse: Azure Synapse integrations
- integrations/azureml: Azure ML integrations
- integrations/databricks: Databricks integrations