### Example Credentials File
To run these functions, you must provide identities that can operate on the producer, consumer, or mesh accounts. These can be configured in a credentials file for simplicity, with the following structure:

```
{
  ...
}
```

This file includes the following identities:

* **Mesh** - Administrative identity used to configure and manage central Data Mesh objects like catalogs and shared tables. This identity is required for initializing the Data Mesh infrastructure.
* **ProducerAdmin** - Administrative identity used to set up an account as a data producer. This identity is only used to enable an AWS account on initial setup.
* **ConsumerAdmin** - Administrative identity used to set up an account as a data consumer. This identity is only used to enable an AWS account on initial setup.
* **Producer** - Identity used for day-to-day producer tasks such as `create-data-product`, `approve-access-request` and `modify-subscription`. In general, you should use the pre-installed `DataMeshProducer` user or users who are part of the `DataMeshProducerGroup` in the Producer AWS Account.
* **Consumer** - Identity used for day-to-day consumer tasks such as `request-access` and `import-subscription`. In general, you should use the pre-installed `DataMeshConsumer` user or users who are part of the `DataMeshConsumerGroup` in the Consumer AWS Account.

For the example usage scripts, you can configure a file on your filesystem and reference this file through the `CredentialsFile` environment variable. For the `cli`, you can provide the path to this file using the `--credentials-file` argument. Please make sure not to add this file to any publicly shared resources such as git forks of the codebase!
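As a sketch only, the snippet below shows one way a script might resolve a persona from the file referenced by `CredentialsFile`. The per-identity field names (`AccessKeyId`, `SecretAccessKey`) are assumptions for illustration and are not confirmed by this section, since the full file structure is shown above the identity list.

```python
# Illustrative sketch: load one persona's credentials from the file named
# by the CredentialsFile environment variable. The "AccessKeyId" and
# "SecretAccessKey" field names are assumed, not taken from this README.
import json
import os


def load_identity(identity_name: str) -> dict:
    """Return the credential block for one persona, e.g. 'Producer' or 'Mesh'."""
    with open(os.environ['CredentialsFile']) as f:
        credentials = json.load(f)
    return credentials[identity_name]
```

A script could then call `load_identity('Producer')` before constructing its AWS session for day-to-day producer tasks.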
You can also use [examples/0\_setup\_central\_account.py](examples/0_setup_central_account.py) as an example to build your own application.

If you get an error that looks like:

```
An error occurred (AccessDeniedException) when calling the PutDataLakeSettings operation: User: arn:aws:iam::<account>:user/<user> is not authorized to perform: lakeformation:PutDataLakeSettings on resource: arn:aws:lakeformation:us-east-1:<account>:catalog:<account> with an explicit deny in an identity-based policy
```

this probably means that you have attached the `AWSLakeFormationDataAdmin` IAM policy to your user, which prevents you from setting data lake permissions.

### Step 1.1 - Enable an AWS Account as a Producer
You must configure an account to act as a Producer in order to offer data shares to other accounts. This is an administrative task that is run once per AWS Account. The configured credentials must have AdministratorAccess as well as Lake Formation Data Lake Admin. To set up an account as a Producer, run:
The above Steps 1.1 and 1.2 can be run for any number of accounts that you require to act as Producers or Consumers. You can also use [examples/0\_5\_setup\_account\_as.py](examples/0_5_setup_account_as.py) as an example to build your own application. If you want to provision an account as both Producer _and_ Consumer, use `account_type='both'` in the above call to `bootstrap_account()`.
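The `account_type` values can be sketched with a small helper like the one below. This helper is hypothetical and not part of `data_mesh_util`; the value `'both'` is confirmed by the text above, while `'producer'` and `'consumer'` are assumed from the names of Steps 1.1 and 1.2.

```python
# Hypothetical helper, not part of the library: choose the account_type value
# passed to bootstrap_account(). 'both' is stated in this README; 'producer'
# and 'consumer' are assumed from the step names.
def pick_account_type(as_producer: bool, as_consumer: bool) -> str:
    if as_producer and as_consumer:
        return 'both'
    if as_producer:
        return 'producer'
    if as_consumer:
        return 'consumer'
    raise ValueError('Account must be enabled as a producer, a consumer, or both')
```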
### Step 2: Create a Data Product
Data products can be created from one or more Glue tables, and the API provides a variety of configuration options to control how they are exposed. To create a data product:

```python
from data_mesh_util import DataMeshProducer as dmp

...

table_name = 'The Table Name'
domain_name = 'The name of the Domain which the table should be tagged with'
data_product_name = 'If you are publishing multiple tables, the product name to be used for all'
cron_expr = 'daily'
crawler_role = 'IAM Role that the created Glue Crawler should run as - calling identity must have iam::PassRole on the ARN'
create_public_metadata = True if 'Use value True to allow any user to see the shared object in the data mesh otherwise False' else False

...
```

You can also use [examples/1\_create\_data\_product.py](examples/1_create_data_product.py) as an example to build your own application.
By default, a data product replicates Glue Catalog metadata from the Producer's account into the Data Mesh account. The new tables created in the Data Mesh account are shared back to the Producer account through a new database and resource link, which lets the Producer change objects in the mesh from within their own account.
Alternatively, some customers may wish to have a single version of their table metadata that resides only within the Data Mesh, for example when datasets are prepared specifically for sharing. In this case, the `create-data-product` request allows the version of the Table in the Data Mesh to be the only master copy, transparently shared back to the producer. To use this option, instead use API `migrate_tables_to_mesh`:

```python
...

data_mesh_producer.migrate_tables_to_mesh(
    source_database_name=database_name,
    table_name_regex=table_name,
    domain=domain_name,
    data_product_name=data_product_name,
    create_public_metadata=True,
    sync_mesh_catalog_schedule=cron_expr,
    sync_mesh_crawler_role_arn=crawler_role
)
```

Upon completion, you will see that the table in the Producer AWS Account has been replaced with a Resource Link shared from the Mesh Account. Your producer Account may now be able to query data both from within the data mesh and from their own account, but the security Principal used for Data Mesh Utils may require additional permissions to use Athena or other query services.
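After migration, the producer queries its resource link exactly as if it were a local table. The sketch below illustrates this with placeholder database and table names (not values from this project); the commented-out boto3 portion assumes valid AWS credentials plus Athena and Lake Formation `SELECT` permissions.

```python
# Illustrative sketch: the resource link in the producer account points at the
# mesh table, so standard SQL against the local name resolves to shared data.
# Database and table names below are placeholders.
def build_producer_query(database: str, table: str) -> str:
    # Quote both identifiers so names with unusual characters still parse.
    return f'SELECT * FROM "{database}"."{table}" LIMIT 10'


query = build_producer_query('my_shared_db', 'customer_orders')

# Running the query requires AWS credentials and the Lake Formation grants
# described above, so it is shown here commented out:
# import boto3
# athena = boto3.client('athena')
# athena.start_query_execution(
#     QueryString=query,
#     ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},
# )
```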
### Step 3: Request access to a Data Product Table