Commit 489153d

2 parents e0e3c05 + 9231fa5 commit 489153d

1 file changed

+92
-7
lines changed

README.md

Lines changed: 92 additions & 7 deletions
@@ -77,7 +77,7 @@ This functionality is presented to customers as a Python library to allow maximu
7777

7878
### Example Credentials File
7979

80-
For the example usage scripts, you can configure a file on your filesystem with the following structure, which includes Access and Secret Keys for each of the personas used to demonstrate the functionality. You then reference this file in the examples through the `CredentialsFile` environment variable.
80+
To run these functions, you must provide identities that can operate on the producer, consumer, or mesh accounts. These can be configured in a credentials file for simplicity, with the following structure:
8181

8282
```
8383
{
@@ -110,7 +110,15 @@ For the example usage scripts, you can configure a file on your filesystem with
110110
}
111111
```
112112

113-
Please make sure not to add this file to any publicly shared resources such as git forks of the codebase!
113+
This file includes the following identities:
114+
115+
* **Mesh** - Administrative identity used to configure and manage central Data Mesh objects like catalogs and shared tables. This identity is required for initializing the Data Mesh infrastructure.
116+
* **ProducerAdmin** - Administrative identity used to set up an account as a data producer. This identity is only used to enable an AWS account on initial setup.
117+
* **ConsumerAdmin** - Administrative identity used to set up an account as a data consumer. This identity is only used to enable an AWS account on initial setup.
118+
* **Producer** - Identity used for day-to-day producer tasks such as `create-data-product`, `approve-access-request` and `modify-subscription`. In general, you should use the pre-installed `DataMeshProducer` user or those users who are part of the `DataMeshProducerGroup` in the Producer AWS Account.
119+
* **Consumer** - Identity used for day-to-day consumer tasks such as `request-access` and `import-subscription`. In general, you should use the pre-installed `DataMeshConsumer` user or those users who are part of the `DataMeshConsumerGroup` in the Consumer AWS Account.
120+
121+
For the example usage scripts, you can configure a file on your filesystem and reference it through the `CredentialsFile` environment variable. For the `cli`, you can provide the path to this file using the `--credentials-file` argument. Please make sure not to add this file to any publicly shared resources such as git forks of the codebase!
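
As a quick sanity check before running the examples, the following minimal sketch loads the credentials file and reports which of the identities listed above are absent. It assumes the file is JSON and that the identity names appear as top-level keys; `load_credentials` and `EXPECTED_IDENTITIES` are hypothetical names and are not part of the library.

```python
import json
import os

# Identity names taken from the list above. Treating them as top-level JSON
# keys is an assumption about the file layout, not a documented contract.
EXPECTED_IDENTITIES = ("Mesh", "ProducerAdmin", "ConsumerAdmin", "Producer", "Consumer")


def load_credentials(path=None):
    """Load the credentials file and list any expected identities it lacks.

    If no path is given, fall back to the CredentialsFile environment variable.
    """
    path = path or os.environ.get("CredentialsFile")
    if not path:
        raise ValueError("Set the CredentialsFile environment variable or pass a path")
    with open(path, "r") as f:
        credentials = json.load(f)
    # Report missing identities instead of failing: a consumer-only user,
    # for example, may legitimately configure just one identity.
    missing = [name for name in EXPECTED_IDENTITIES if name not in credentials]
    return credentials, missing
```

The helper returns the missing names rather than raising, since not every task needs all five identities.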
114122

115123
## Getting Started
116124

@@ -154,8 +162,22 @@ mesh_admin = dmu.DataMeshAdmin(
154162
mesh_admin.initialize_mesh_account()
155163
```
156164

165+
or
166+
167+
```
168+
./data-mesh-cli install-mesh-objects --credentials-file <my credentials file> ...
169+
```
170+
157171
You can also use [examples/0\_setup\_central\_account.py](examples/0_setup_central_account.py) as an example to build your own application.
158172

173+
If you get an error that looks like:
174+
175+
```
176+
An error occurred (AccessDeniedException) when calling the PutDataLakeSettings operation: User: arn:aws:iam::<account>:user/<user> is not authorized to perform: lakeformation:PutDataLakeSettings on resource: arn:aws:lakeformation:us-east-1:<account>:catalog:<account> with an explicit deny in an identity-based policy
177+
```
178+
179+
This probably means that you have attached the `AWSLakeFormationDataAdmin` IAM policy to your user. The explicit deny reported in the error prevents you from setting data lake permissions.
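
If you wrap the setup call in your own script, a small check like the one below can help tell this specific failure apart from other access errors. This is only a sketch that inspects the error text shown above; `is_explicit_deny` is a hypothetical helper, and the exact wording of AWS error messages may change.

```python
def is_explicit_deny(error_message, action="lakeformation:PutDataLakeSettings"):
    """Return True if the message reports an explicit deny for the given action."""
    return (
        "AccessDeniedException" in error_message
        and action in error_message
        and "explicit deny" in error_message
    )


message = (
    "An error occurred (AccessDeniedException) when calling the PutDataLakeSettings "
    "operation: User: arn:aws:iam::<account>:user/<user> is not authorized to perform: "
    "lakeformation:PutDataLakeSettings on resource: "
    "arn:aws:lakeformation:us-east-1:<account>:catalog:<account> "
    "with an explicit deny in an identity-based policy"
)
if is_explicit_deny(message):
    print("Check the IAM policies attached to this user for an explicit deny")
```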
180+
159181
### Step 1.1 - Enable an AWS Account as a Producer
160182

161183
You must configure an account to act as a Producer in order to offer data shares to other accounts. This is an administrative task that is run once per AWS Account. The configured credentials must have AdministratorAccess as well as Lake Formation Data Lake Admin. To set up an account as a Producer, run:
@@ -197,6 +219,13 @@ mesh_macros.bootstrap_account(
197219
account_credentials=producer_credentials
198220
)
199221
```
222+
223+
or
224+
225+
```
226+
./data-mesh-cli enable-account --credentials-file <credentials-file> --account-type producer ...
227+
```
228+
200229
You can also use [examples/0\_5\_setup\_account\_as.py](examples/0_5_setup_account_as.py) as an example to build your own application.
201230

202231
### Step 1.2: Enable an AWS Account as a Consumer
@@ -235,12 +264,17 @@ mesh_macros.bootstrap_account(
235264
account_credentials=consumer_credentials
236265
)
237266
```
267+
or
268+
269+
```
270+
./data-mesh-cli enable-account --credentials-file <credentials-file> --account-type consumer ...
271+
```
238272

239273
The above Steps 1.1 and 1.2 can be run for any number of accounts that you require to act as Producers or Consumers. You can also use [examples/0\_5\_setup\_account\_as.py](examples/0_5_setup_account_as.py) as an example to build your own application. If you want to provision an account as both Producer _and_ Consumer, then use `account_type='both'` in the above call to `bootstrap_account()`.
240274

241275
### Step 2: Create a Data Product
242276

243-
Creating a data product replicates Glue Catalog metadata from the Producer's account into the Data Mesh account, while leaving the source storage at rest within the Producer. The data mesh objects are shared back to the Producer account to enable local control without accessing the data mesh. Data Products can be created from Glue Catalog Databases or one-or-more Tables, but all permissions are managed at Table level. Producers can run this as many times as they require. To create a data product:
277+
Data products can be created from one-or-more Glue tables, and the API provides a variety of configuration options to allow you to control how they are exposed. To create a data product:
244278

245279
```python
246280
from data_mesh_util import DataMeshProducer as dmp
@@ -265,7 +299,7 @@ table_name = 'The Table Name'
265299
domain_name = 'The name of the Domain which the table should be tagged with'
266300
data_product_name = 'If you are publishing multiple tables, the product name to be used for all'
267301
cron_expr = 'daily'
268-
crawler_role = 'IAM Role that the created Glue Crawler should run as'
302+
crawler_role = 'IAM Role that the created Glue Crawler should run as - calling identity must have iam:PassRole on the ARN'
269303
create_public_metadata = True if 'Use value True to allow any user to see the shared object in the data mesh otherwise False' else False
270304

271305
data_mesh_producer.create_data_products(
@@ -277,13 +311,38 @@ data_mesh_producer.create_data_products(
277311
sync_mesh_catalog_schedule=cron_expr,
278312
sync_mesh_crawler_role_arn=crawler_role,
279313
expose_data_mesh_db_name=None,
280-
expose_table_references_with_suffix=None
314+
expose_table_references_with_suffix=None,
315+
use_original_table_name=None
281316
)
282317
```
318+
or
319+
320+
```
321+
./data-mesh-cli create-data-product --credentials-file <credentials-file> --source-database-name <database> --table-regex <regular expression matching tables> ...
322+
```
283323

284324
You can also use [examples/1\_create\_data\_product.py](examples/1_create_data_product.py) as an example to build your own application.
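
Because tables are selected with a regular expression, it can be worth previewing which names a pattern would pick up before creating the data product. The sketch below does this locally against a list of table names. It assumes match-from-start semantics (`re.match`); the library's actual matching rules may differ, and `preview_table_matches` is a hypothetical helper.

```python
import re


def preview_table_matches(table_names, table_regex):
    """Return the table names that the given regular expression would select."""
    pattern = re.compile(table_regex)
    # Assumption: the pattern is anchored at the start of the name only.
    return [name for name in table_names if pattern.match(name)]


tables = ["orders", "orders_archive", "customers"]
print(preview_table_matches(tables, r"orders.*"))
```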
285325

286-
Please note that upon creation of a data product, you will see a new Database and Table created in the Data Mesh Account, and this Database and Table have been shared back to the producer AWS Account using Resource Access Manager (RAM). Your producer Account may now be able to query data both from within the data mesh and from their own account, but the security Principal used for Data Mesh Utils may require additional permissions to use Athena or other query services.
326+
By default, a data product replicates Glue Catalog metadata from the Producer's account into the Data Mesh account. The new tables created in the Data Mesh account are shared back to the Producer account through a new database and resource link, which lets the Producer change objects in the mesh from within their own Account.
327+
328+
Alternatively, some customers may wish to have a single version of their table metadata that resides only within the Data Mesh, for example when datasets are prepared specifically for sharing. In this case, the version of the Table in the Data Mesh can be the only master copy, transparently shared back to the producer. To use this option, use the `migrate_tables_to_mesh` API instead:
329+
330+
```
331+
...
332+
333+
data_mesh_producer.migrate_tables_to_mesh(
334+
source_database_name=database_name,
335+
table_name_regex=table_name,
336+
domain=domain_name,
337+
data_product_name=data_product_name,
338+
create_public_metadata=True,
339+
sync_mesh_catalog_schedule=cron_expr,
340+
sync_mesh_crawler_role_arn=crawler_role
341+
)
342+
343+
```
344+
345+
Upon completion, you will see that the table in the Producer AWS Account has been replaced with a Resource Link shared from the Mesh Account. Your producer Account may now be able to query data both from within the data mesh and from their own account, but the security Principal used for Data Mesh Utils may require additional permissions to use Athena or other query services.
287346

288347
### Step 3: Request access to a Data Product Table
289348

@@ -319,10 +378,15 @@ subscription = data_mesh_consumer.request_access_to_product(
319378
)
320379
print(subscription.get('SubscriptionId'))
321380
```
381+
or
382+
383+
```
384+
./data-mesh-cli request-access --credentials-file <credentials-file> --database-name <database> --tables <table1, table2, table3> --request-permissions <list of permissions requested, including INSERT, SELECT, DESCRIBE, UPDATE, DELETE> ...
385+
```
322386

323387
You can also use [examples/2\_consumer\_request\_access.py](examples/2_consumer_request_access.py) as an example to build your own application.
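
Before submitting a request, you may want to validate the table list and the requested permissions locally. The following sketch checks permissions against the five values named in the CLI help text above. It assumes comma-separated input strings; `parse_request` and `ALLOWED_PERMISSIONS` are hypothetical names, not part of the library.

```python
# Permission names as listed for --request-permissions above.
ALLOWED_PERMISSIONS = {"INSERT", "SELECT", "DESCRIBE", "UPDATE", "DELETE"}


def parse_request(tables, permissions):
    """Split comma-separated tables and permissions, rejecting unknown permissions."""
    table_list = [t.strip() for t in tables.split(",") if t.strip()]
    requested = [p.strip().upper() for p in permissions.split(",") if p.strip()]
    unknown = sorted(set(requested) - ALLOWED_PERMISSIONS)
    if unknown:
        raise ValueError(f"Unsupported permissions: {', '.join(unknown)}")
    return table_list, requested


print(parse_request("table1, table2, table3", "select, describe"))
```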
324388

325-
### Step 4: Grant Access to the Consumer
389+
### Step 4: Grant or Deny Access to the Consumer
326390

327391
In this step, you will grant or deny the permissions requested by the Consumer:
328392

@@ -368,6 +432,20 @@ approval = data_mesh_producer.approve_access_request(
368432
grantable_permissions=grantable_permissions,
369433
decision_notes=approval_notes
370434
)
435+
436+
# or deny access request
437+
approval = data_mesh_producer.deny_access_request(
438+
request_id=subscription_id,
439+
decision_notes="no way"
440+
)
441+
```
442+
443+
or
444+
445+
```
446+
./data-mesh-cli approve-subscription --credentials-file <credentials-file> --request_id <request id> --notes <notes with the approval> ...
447+
448+
./data-mesh-cli deny-subscription --credentials-file <credentials-file> --request_id <request id> --decision-notes <notes for the denial>
371449
```
372450

373451
You can also use [examples/3\_grant\_data\_product\_access.py](examples/3_grant_data_product_access.py) as an example to build your own application.
@@ -400,6 +478,13 @@ data_mesh_consumer.finalize_subscription(
400478
subscription_id=subscription_id
401479
)
402480
```
481+
482+
or
483+
484+
```
485+
./data-mesh-cli import-subscription --credentials-file <credentials-file> --subscription_id <subscription request id> ...
486+
```
487+
403488
You can also use [examples/4\_finalize\_subscription.py](examples/4_finalize_subscription.py) as an example to build your own application.
404489

405490
---
