Commit c888721 — Merge pull request #1 from unisoncomputing/architecture-docs: Add architecture and security docs
2 parents: f0f7ec3 + 52013f5 · 2 files changed, +248 −0 lines
docs/architecture/README.md

# Unison Cloud architecture

At a high level, the system architecture for Unison Cloud looks like this:

```mermaid
flowchart LR
    dev("🧑‍💻 Developer<br/>ucm, @unison/cloud library")
    browser("🌐 Client<br/>browser, curl, external service, etc")
    internet("🌐 Public internet")
    subgraph data-plane["☁️ Data plane (your infra)"]
        reverse-proxy("Reverse proxy<br/>(optional)<br/>Nginx, Caddy, etc")
        subgraph nimbus["Nimbus instance"]
            nimbus-public@{ shape: comment, label: "Public HTTP port" }
            nimbus-gossip@{ shape: brace-r, label: "Gossip port" }
        end
        other-nimbus-instances@{ shape: processes, label: "Other nimbus instances" }
        storage[("Storage (DynamoDB)")]
        blobstore[("Blobs (S3)")]
        secrets[("Secret store (Vault)")]
        outbound-proxy["Outbound proxy<br/>(optional)"]
    end
    control-plane("☁️ Control plane<br/>(Unison Computing infra)")
    dev -->|Cloud.run auth| control-plane
    dev -->|Cloud.run| reverse-proxy
    reverse-proxy --> nimbus-public
    browser -->|HTTP request to user-deployed service| reverse-proxy
    nimbus --> storage
    nimbus --> blobstore
    nimbus --> secrets
    nimbus --->|Persistent WebSocket| control-plane
    nimbus --> outbound-proxy --> internet
    nimbus-gossip <--> other-nimbus-instances
```

The server side of Unison Cloud is divided between the [control plane](#control-plane) and the [data plane](#data-plane). The **control plane** is in charge of authentication/authorization and coordination, while the **data plane** runs user jobs/services and houses user data.
To understand the connections between these components, it may be helpful to see a diagram of the sequence of events when a developer deploys a Unison Cloud web service and another user then accesses it:

```mermaid
sequenceDiagram
    box rgba(128,128,128, 0.1) Control Plane<br/>#40;Unison Computing infra#41;
        participant cloud-api as cloud-api
    end
    actor dev as developer (via cloud client lib)
    box rgba(128,128,128, 0.1) Data Plane #40;your infra#41;
        participant nimbus
    end
    actor consumer as service consumer (browser, external service, etc)
    dev->>+cloud-api: deploy service to env `task-app-prod` (auth token)
    cloud-api->>dev: `task-app-prod` access confirmed (scoped token)
    dev->>+nimbus: Deploy service (scoped token, code)
    nimbus->>dev: Service deployed! (service URL)
    consumer->>nimbus: GET /s/task-app/tasks
    nimbus->>consumer: ["dishes", "laundry"]
```
## Control plane

The control plane is in charge of authentication/authorization and coordination of Unison Cloud clusters. It typically runs on centralized Unison Computing infrastructure (but contact us if you have other needs) and does not have access to the user/application secrets, code, or data that are housed in the data plane.

Interactions with the control plane go through an HTTP API that you may see referred to as `cloud-api`.

### Authentication and authorization

The control plane manages two separate types of **authentication**:

- Nimbus nodes requesting to join a cluster
- Developers or CI/CD servers submitting jobs, deploying services, creating databases, etc. These requests typically come from the [cloud client][cloud-client].

While **authentication** uses long-lived credentials, **authorization** for job submissions, service deployments, etc. is granted via narrowly-scoped per-request tokens.

See [the security guide][auth] for details about authentication and authorization.
### Cluster membership and orchestration

One of the primary functions of the control plane is to keep track of cluster membership. As each Nimbus node starts up, it establishes a persistent connection to the control plane, registering with a location ID and address (URI). The control plane then broadcasts a message to the other connected Nimbus nodes to inform them about the new cluster member. It sends periodic health checks to each Nimbus instance and alerts other nodes if one disconnects or fails health checks.

As the source of truth on cluster membership, the control plane also orchestrates cluster operations. For example, it informs each node which [daemons][daemons] it should run.

### Cluster events

The control plane broadcasts messages to each connected Nimbus node to keep it updated on cluster events. In many cases these events are used to optimize or invalidate local caches. Events include:

- members joining, leaving, or failing health checks (mentioned above)
- [Environment][Environment] changes, such as a [Config][env-config] value changing
- user service changes, such as a new implementation being assigned to a [service name][ServiceName]
## Data plane

### Nimbus

Nimbus instances are the primary workers of a Unison Cloud cluster. They run user-submitted jobs (via [Cloud.submit][Cloud.submit]) and services (via [Cloud.deploy][Cloud.deploy]), supporting the [Remote ability][Remote] for distributed programs.
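As a rough illustration of what "running a job" means here, the sketch below submits a small computation to the cluster via the cloud client. This is illustrative only: it assumes the `Cloud.submit` and `Environment.default` entry points linked above, and the surrounding boilerplate may differ from the current `@unison/cloud` release.

```
-- Illustrative sketch; exact signatures may differ from the current
-- @unison/cloud release.
myJob : '{Remote} Nat
myJob = do
  -- an ordinary Unison computation, shipped to the cluster as
  -- content-addressed code
  List.foldLeft (+) 0 (List.range 1 101)

-- Submit the job to the default environment. Nimbus receives the code
-- along with a scoped token issued by the control plane.
runIt = Cloud.submit Environment.default() myJob
```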
Nimbus is itself implemented in Unison and benefits greatly from the Unison programming language's support for distributed computation:

- Values and entire programs/services can be serialized and sent to another node to distribute computation.
- Thanks to content-addressed code, there are never runtime dependency conflicts or name collisions, even in a shared cluster.
- User code is sandboxed with fine-grained control over which operations are permitted.
Nimbus serves a few different types of requests:

- Authenticated developer requests to run jobs, deploy services, and use other functionality provided by the [cloud client][cloud-client]. These arrive via the public HTTP port.
- Unauthenticated requests to user-deployed services, which arrive via the public HTTP port under dedicated `/s/` (service name) and `/h/` (service hash) endpoints. The underlying user-defined web service may implement its own authentication, but Nimbus itself considers these endpoints to be public.
- Nimbus cluster peer requests, which arrive via the gossip port. These are requests supporting distributed computation, such as "run this program in a new thread", "cancel this thread", "atomically modify this mutable reference", and "read this promise and send me the result when it is available".

For more details on Nimbus authentication and ports, see the [security documentation][security].
### Transactional storage

The transactional storage provider of the data plane supports the [Storage][Storage] ability. Currently the only supported backend is [DynamoDB][DynamoDB].
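For a flavor of what this looks like from user code, here is a sketch of a page-visit counter written against the Storage ability. The names below (`Table`, `Database.default`, `Db.transact`, `tryRead`, `write`) are illustrative and may not match the current `@unison/cloud` release exactly.

```
-- Illustrative sketch; names and signatures may differ from the
-- current @unison/cloud release.
visits : Table Text Nat
visits = Table "visits"

recordVisit : Text ->{Storage, Exception, Random} ()
recordVisit page =
  db = Database.default()
  Db.transact db do
    -- the read-modify-write runs atomically inside the transaction
    current = Optional.getOrElse 0 (tryRead visits page)
    write visits page (current + 1)
```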
### Blob store

The blob store provider of the data plane supports the [Blobs][Blobs] ability. Currently, blob stores with an [S3][s3]-compatible API are supported.
### Secrets

The secrets provider of the data plane supports the [Environment][Environment] and [Environment.Config][Environment.Config] abilities. Currently [Vault][vault] is supported, but we plan to support other secrets providers as requested.

### Outbound proxy

The data plane can optionally be configured with an outbound network proxy that will be used for all user [{Http}][Http] and [{Tcp}][Tcp] requests. This may be your existing corporate proxy, or it could be a custom [squid][squid] proxy that only lets user code connect to vetted domains/ports/IPs/etc.
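As a sketch of that allow-list approach, a minimal squid configuration that only lets user code reach a couple of vetted domains over TLS might look like this (the domains and ports are placeholders):

```
# /etc/squid/squid.conf — sketch; domains and ports are placeholders
http_port 3128

# Only these destination domains are reachable by user code
acl vetted_domains dstdomain .example-api.com .partner-service.net
# Only standard TLS traffic
acl safe_ports port 443

http_access allow vetted_domains safe_ports
# Everything else is refused
http_access deny all
```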
[auth]: security.md
[Blobs]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/Blobs
[cloud-client]: https://share.unison-lang.org/@unison/cloud
[Cloud.deploy]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/terms/Cloud/deploy
[Cloud.submit]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/terms/Cloud/submit
[daemons]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/Daemon
[DynamoDB]: https://docs.aws.amazon.com/dynamodb/
[Environment]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/Environment
[Environment.Config]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/Environment/Config
[env-config]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/Environment/Config
[Http]: https://share.unison-lang.org/@unison/http/code/releases/5.0.2/latest/types/client/Http
[Remote]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/Remote
[security]: security.md
[ServiceName]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/ServiceName
[s3]: https://aws.amazon.com/s3/
[squid]: https://www.squid-cache.org/
[Storage]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/Storage
[Tcp]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/types/provisional/Remote/Tcp
[vault]: https://www.hashicorp.com/en/products/vault

docs/architecture/security.md

# Unison Cloud security

This document provides a security overview of Unison Cloud. It will be helpful to first familiarize yourself with the [Unison Cloud architecture][architecture].

## User (developer) authentication

Before allowing a developer (or a CI/CD job) to create an environment, submit a job, or deploy a service, we must validate their identity.

[Unison Share][Share] acts as the authentication server for Unison Cloud. It implements [OAuth2](https://datatracker.ietf.org/doc/html/rfc6749) with the [PKCE extension](https://www.oauth.com/oauth2-servers/pkce/), and a subset of the [OpenID Connect Core](https://openid.net/specs/openid-connect-core-1_0.html) and [OpenID Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) specifications. Its open-source auth implementation can be found [here][share-auth], and its current JSON Web Key Set can be found at its [.well-known/jwks.json endpoint](https://api.unison-lang.org/.well-known/jwks.json).
## User (developer) authorization

While **authentication** uses long-lived OAuth credentials, **authorization** for job submissions, service deployments, etc. is granted via narrowly-scoped per-request JWTs. The developer first requests a request token from the control plane, which verifies that they are authorized to act on the requested environment/service/etc. The control plane then provides the developer with a narrowly-scoped request token along with a redirect to the data plane. The data plane verifies the request token and then performs the requested operation.
An example sequence of requests and responses when a developer deploys a Unison Cloud web service and another user then sends the service a request might look like this:

```mermaid
sequenceDiagram
    box rgba(128,128,128, 0.1) Control Plane<br/>#40;Unison Computing infra#41;
        participant cloud-api as cloud-api
    end
    actor dev as developer (via cloud client lib)
    box rgba(128,128,128, 0.1) Data Plane #40;your infra#41;
        participant nimbus
    end
    actor consumer as service consumer (browser, external service, etc)
    dev->>+cloud-api: deploy service to env `task-app-prod` (auth token)
    cloud-api->>dev: `task-app-prod` access confirmed (scoped token)
    dev->>+nimbus: Deploy service (scoped token, code)
    nimbus->>dev: Service deployed! (service URL)
    consumer->>nimbus: GET /s/task-app/tasks
    nimbus->>consumer: ["dishes", "laundry"]
```
## Cluster member authentication

When a Nimbus member connects to the control plane to join the cluster, it authenticates using a random token generated by the control plane at cluster creation time. At any time, a cluster administrator can make an authenticated request to the control plane to generate new tokens and revoke old ones.

## Service authentication and authorization

When a developer deploys a Unison Cloud service via [Cloud.deploy], by default the service is public and does _not_ perform any authentication. If developers want to secure the service, they can do so in their Unison code, for example by using the [@unison/auth library][unison-auth].
## Network security

### VPC/firewall

It's best practice to run Nimbus nodes (and as much of the data plane as practical) within a VPC or limited-access network. Nimbus nodes initiate requests to the control plane and establish a persistent WebSocket connection on startup, so there is no need for them to accept incoming connections from the control plane or anyone else.
### Nimbus gossip port

The most important security consideration when deploying Nimbus is:

**The Nimbus gossip port is trusted and should remain private.**

Nimbus assumes that requests coming in through its gossip port are from Nimbus peers, and trusts them. Nimbus instances need to be able to talk to their peers' gossip ports, but nobody else should need to. You may want to deploy a small sidecar proxy (such as [Envoy][envoy]) next to each Nimbus instance to gate access to its gossip port.

If this is a concern for you and you aren't in a good position to secure the gossip port via other means, reach out to Unison Cloud support to discuss options.
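If a sidecar proxy is more machinery than you want, host-level firewall rules can provide similar gating. Below is a sketch using nftables; the gossip port (9090) and peer subnet (10.0.0.0/24) are placeholders for your actual deployment values.

```
# Sketch only: the gossip port and peer subnet are placeholders.
table inet nimbus {
  chain input {
    type filter hook input priority 0; policy accept;

    # Gossip port: only cluster peers may connect
    tcp dport 9090 ip saddr 10.0.0.0/24 accept
    tcp dport 9090 drop
  }
}
```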
### Nimbus public HTTP port

The Nimbus public HTTP port is considered safe to expose publicly. It handles:

- HTTP requests to user-deployed Unison Cloud services. These are public or implement their own auth. See [service auth][service-auth].
- Developer requests from the [cloud client][cloud-client] to run a job, deploy a service, etc. These validate narrowly-scoped tokens generated by the control plane. See [User (developer) authorization][developer-authorization].

You'll likely want to deploy a reverse proxy/load balancer in front of Nimbus's public port to distribute service requests across nodes and to terminate TLS.

While it's referred to as the "public" HTTP port, there is no _need_ to make it publicly accessible. If your requests should only come from other services within a VPC or from clients connected to your VPN, then there's no reason to expose your instances beyond that network.
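As a sketch of the reverse-proxy setup, an nginx configuration that terminates TLS and distributes requests across the public HTTP ports of several Nimbus nodes might look like the following. The upstream addresses, port, hostname, and certificate paths are all placeholders.

```
# Sketch only: upstream addresses, port 8080, hostname, and cert
# paths are placeholders for your deployment.
upstream nimbus_public {
    server nimbus-1.internal:8080;
    server nimbus-2.internal:8080;
    server nimbus-3.internal:8080;
}

server {
    listen 443 ssl;
    server_name cloud.example.com;

    ssl_certificate     /etc/ssl/certs/cloud.example.com.pem;
    ssl_certificate_key /etc/ssl/private/cloud.example.com.key;

    location / {
        # Load-balance service and developer requests across nodes
        proxy_pass http://nimbus_public;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```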
## User-provided code execution

Like Hadoop/Spark clusters, Kubernetes/Nomad clusters, and many other cloud offerings, Nimbus nodes run user-provided code in submitted jobs and deployed services. Let's be real: at some level this is remote code execution as a service. However, Nimbus is able to lean on the strengths of the Unison programming language to mitigate the risks of running arbitrary code.
### Content-addressed code

All code in Unison (and by extension user code submitted to Nimbus) is content-addressed. When a user (or cluster peer) sends a computation to a Nimbus node, they don't say that they want to run `Environment.Config.expect`; they specify that they want to run the function with the hash `01bga3jq5u8ev85lsatbqip9hkrgr4jtr4li520b0r5gpvmi9el30`. If the receiving node already has a definition for `01bga3jq5u8ev85lsatbqip9hkrgr4jtr4li520b0r5gpvmi9el30`, it can proceed with the computation. If it does _not_ know the definition that hashes to `01bga3jq5u8ev85lsatbqip9hkrgr4jtr4li520b0r5gpvmi9el30`, it will ask for a definition, and upon receiving the requested definition it verifies that the provided code matches the specified hash.

This unique approach provides significant benefits:

- Two versions of `Environment.Config.expect` never collide. They aren't identified by the string `Environment.Config.expect` in a mutable bag of strings known as a classpath. User code specifies the _exact_ version of `Environment.Config.expect` via its hash, and the runtime is never aware that two distinct definitions happened to both be named `Environment.Config.expect` at different points in time or in different codebases.
- A malicious user cannot pollute the runtime code namespace with hijacked implementations of terms. Imagine that a malicious user wanted to override `Environment.Config.expect` to send the key/value to a remote server before returning them. They may succeed in finding a Nimbus node without a definition for `Environment.Config.expect`, and they could even get that node to request the definition! But then their efforts fall apart: if they provide anything other than the actual implementation, the definition will no longer hash to `01bga3jq5u8ev85lsatbqip9hkrgr4jtr4li520b0r5gpvmi9el30`, and code from other users that calls `Environment.Config.expect` will be blissfully ignorant of any definition with a different hash.
### Code sandboxing

Addressing code by its content isn't enough to make it safe to run arbitrary user code. For example, you wouldn't want a malicious user to be able to submit a Unison Cloud job that looked like:

```
Cloud.submit Environment.default() do
  doBadStuff = coerceAbilities do
    getEnv "AWS_SECRET_ACCESS_KEY"
  doBadStuff()
```

Luckily, the Unison programming language supports fine-grained code sandboxing. Before Nimbus runs any user-provided code (via job submission, service deployment, etc.), it runs the code through [reflection.Value.validateSandboxed][Value.validateSandboxed]. By default this disallows all code that performs IO or uses reflection to dynamically load code/values. It does allow a small number of builtins that are tracked as potential sandbox candidates, such as `toDebugText`, which returns a textual representation of an arbitrary value (in practice this one is harmless, albeit not particularly useful, since the Nimbus runtime doesn't have user-provided names for definitions). If you'd like a different set of sandbox rules for your cluster, contact Unison Cloud support and we can make the sandbox rules configurable.
[architecture]: README.md
[cloud-client]: https://share.unison-lang.org/@unison/cloud
[Cloud.deploy]: https://share.unison-lang.org/@unison/cloud/code/releases/21.2.0/latest/terms/Cloud/deploy
[developer-authorization]: #user-developer-authorization
[envoy]: https://www.envoyproxy.io/
[Value.validateSandboxed]: https://share.unison-lang.org/@unison/base/code/releases/6.5.0/latest/terms/reflection/Value/validateSandboxed
[service-auth]: #service-authentication-and-authorization
[Share]: https://share.unison-lang.org/
[share-auth]: https://github.com/unisoncomputing/share-api/tree/main/share-auth
[unison-auth]: https://share.unison-lang.org/@unison/auth
