---
date:
-   created: 2025-04-30
+   created: 2025-7-26
authors:

  - matt_powers
  - kelly
title: Geospatial Data on Iceberg - The Lakehouse Advantage
---

- <<<<<<< HEAD
- =======
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
@@ -29,32 +27,12 @@ title: Geospatial Data on Iceberg - The Lakehouse Advantage
under the License.
-->

- <<<<<<< HEAD
- <<<<<<< HEAD
- >>>>>>> 14074d417b ([DOCS] putting metadata at top)
- # Geospatial Data on Iceberg: The Lakehouse Advantage
-
- This post delves into the benefits of Lakehouse architecture for spatial tables
- and differentiate its approach from standard data warehouses and data lakes.
- =======
This post discusses the benefits of Lakehouse architecture for spatial
tables, comparing the Lakehouse approach to standard data warehouses and data lakes.
- >>>>>>> 3128e022f4 (blog refinements)
- =======
- TODO: Rework intro
-
- This post discusses the benefits of Lakehouse architecture for spatial
- tables, comparing the Lakehouse approach to that of data warehouses and data lakes.
- >>>>>>> bad3190a21 (formatting fixes)

While spatial data requires different types of metadata and optimizations,
it _doesn't_ require entirely different file formats.

- #### Key Points
-
- * Geospatial Data has native support in Apache Parquet and Apache Iceberg.
- * TODO
-
Recent advancements, specifically the addition of native support for geometry/geography types to
Apache Parquet and the Apache Iceberg V3 specification, enable the spatial data community
to fully integrate with and leverage the benefits of Lakehouse architectures.
@@ -319,12 +297,8 @@ and `Polygon` geometries for purchases tied to an approximate region.

## Joining tables containing spatial and non-spatial data

- <<<<<<< HEAD
- Let's discuss how to join the customers and customer_purchases tables.
- =======
Let's discuss how to use Sedona to join the non-spatial data
in the `customers` table with the spatial data in the `customer_purchases` table.
- >>>>>>> bad3190a21 (formatting fixes)

```py
customers = sedona.table("local.db.customers")
@@ -601,8 +575,7 @@ filter. The required process involves reading the entire existing dataset, apply
the modification to all records in memory, and then rewriting the whole modified
dataset back to storage, overwriting the original.

- This highlights key data
- lake disadvantages: the inefficiency of the full read/rewrite cycle, and the
+ This highlights key data lake disadvantages: the inefficiency of the full read/rewrite cycle, and the
critical lack of atomicity in the overwrite step, which risks data corruption
or loss if the write operation fails partway through.