A super lightweight cloud management tool designed with deep learning applications in mind.
Built with the belief that managing cloud resources should be as easy as:
```python
import cloud
cloud.connect()
train_my_network()
cloud.down()
```

We welcome all contributions, suggestions, and use-cases. Reach out to us over GitHub or at [email protected] with ideas!
Sort of stable:

```bash
sudo pip install dl-cloud
```

Bleeding edge:

```bash
git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud
```

See `configs/cloud.toml-*` for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).
Place your completed configuration file (named `cloud.toml`) in either the filesystem root `/` or `$HOME`. Otherwise, provide the full path to the file in `$CLOUD_CFG`.
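The lookup described above could be sketched roughly as follows. The helper name `find_cloud_config` and the precedence order (`$CLOUD_CFG` first, then `$HOME`, then the root) are assumptions made for illustration; the library's actual search order may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_cloud_config(env: Mapping[str, str] = os.environ,
                      root: str = "/") -> Optional[Path]:
    """Return the first existing config path, or None if none is found.

    Hypothetical helper: checks $CLOUD_CFG, then $HOME/cloud.toml, then
    <root>/cloud.toml. The ordering is an assumption, not the library's.
    """
    candidates = []
    if env.get("CLOUD_CFG"):
        candidates.append(Path(env["CLOUD_CFG"]))  # explicit full path
    if env.get("HOME"):
        candidates.append(Path(env["HOME"]) / "cloud.toml")
    candidates.append(Path(root) / "cloud.toml")

    for path in candidates:
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    print(find_cloud_config())
```

Running the sketch prints the path that would be picked up on the current machine, or `None` if no `cloud.toml` is present in any of the three locations.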
If your `cloud.toml` uses GCP as the provider, the library reads from the GCP instance metadata APIs. If you want to configure it for Google Cloud Build, use:
```toml
is_gcb = true
zone = '{{DESIRED_ZONE}}'
```

On a GPU instance:

```python
import cloud
cloud.connect()

# gpu instances have a dedicated GPU so we don't need to worry
# about preemption or acquiring/releasing accelerators online.
while True:
    # train your model or w/e; break out of the loop once training is done
    pass

cloud.down()  # stop the instance (does not delete instance)
```

On an instance that uses preemptible TPUs:

```python
import cloud
cloud.connect()

tpu = cloud.instance.tpu.get(preemptible=True)  # acquire an accelerator
while True:
    if not tpu.usable:
        tpu.delete(background=True)  # release the accelerator in the background
        tpu = cloud.instance.tpu.get(preemptible=True)  # acquire a new accelerator
    else:
        # train your model or w/e; break out of the loop once training is done
        pass

cloud.down()  # release all resources, then stop the instance (does not delete instance)
```

`cloud.connect()` takes or creates a cloud.Instance object and sets cloud.instance to it.
| returns | desc. |
|---|---|
| cloud_env | a cloud.Instance. |
`cloud.down()` calls cloud.instance.down().
`cloud.delete(confirm)` calls cloud.instance.delete(confirm).
`cloud.Resource` represents a cloud resource that can be started, stopped, and deleted.
| properties | desc. |
|---|---|
| `name` | str, name of the instance |
| `usable` | bool, whether this resource is usable |

| methods | desc. |
|---|---|
| `up(background=False)` | start an existing stopped resource |
| `down(background=False)` | stop the resource. Note: this should not necessarily delete this resource |
| `delete(background=False)` | delete this resource |
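As a rough illustration of the interface listed above, here is a toy in-memory resource. `FakeResource` and its `state` attribute are hypothetical, and treating `background=True` as "run on a thread and return immediately" is an assumption about the flag, not documented library behavior.

```python
import threading


class FakeResource:
    """Toy stand-in exposing the properties and methods listed above."""

    def __init__(self, name):
        self.name = name        # str, name of the resource
        self.state = "stopped"  # hypothetical: "stopped", "running" or "deleted"

    @property
    def usable(self):
        # Assumption: a resource is usable only while it is running.
        return self.state == "running"

    def _transition(self, new_state, background):
        def work():
            self.state = new_state

        if not background:
            work()
            return None
        thread = threading.Thread(target=work)
        thread.start()
        return thread  # the caller may join() to wait for completion

    def up(self, background=False):
        return self._transition("running", background)

    def down(self, background=False):
        # Stopping keeps the resource around so it can be started again.
        return self._transition("stopped", background)

    def delete(self, background=False):
        return self._transition("deleted", background)


if __name__ == "__main__":
    resource = FakeResource("worker-0")
    resource.up()
    print(resource.name, resource.usable)
```

Returning the thread from the background path is one possible design; it lets a caller fire off a slow `delete` and keep working, then `join()` later if it needs to be sure the operation finished.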
An object representing a cloud instance with a set of Resources that can be allocated/deallocated.
| properties | desc. |
|---|---|
| `resource_managers` | list of ResourceManagers |

| methods | desc. |
|---|---|
| `down(background=False, delete_resources=True)` | stop this instance and optionally delete all managed resources |
| `delete(background=False, confirm=True)` | delete this instance with optional user confirmation |
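A minimal sketch of how `down(delete_resources=True)` might cascade through the resource managers before the instance stops. `ToyInstance`, `ToyManager`, and the `log` list are hypothetical names used only for illustration, and the `background` argument is omitted to keep the sketch short.

```python
class ToyManager:
    """Hypothetical manager that only tracks resource names."""

    def __init__(self, names):
        self.resources = list(names)


class ToyInstance:
    def __init__(self, resource_managers):
        self.resource_managers = resource_managers
        self.running = True
        self.log = []  # records what happened, for inspection

    def down(self, delete_resources=True):
        if delete_resources:
            for manager in self.resource_managers:
                # Iterate over a copy because the list is mutated in the loop.
                for name in list(manager.resources):
                    manager.resources.remove(name)
                    self.log.append(f"deleted {name}")
        self.running = False  # the instance is stopped, not deleted
        self.log.append("stopped instance")


instance = ToyInstance([ToyManager(["tpu-0", "tpu-1"])])
instance.down()
print(instance.log)  # ['deleted tpu-0', 'deleted tpu-1', 'stopped instance']
```

Passing `delete_resources=False` in this sketch leaves the managed resources in place and only stops the instance.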
Class for managing the creation and maintenance of cloud.Resources.
| properties | desc. |
|---|---|
| `instance` | cloud.Instance instance owning this resource manager |
| `resource_cls` | cloud.Resource type, the class of the resource to be managed |
| `resources` | list of cloud.Resources, managed resources |

| methods | desc. |
|---|---|
| `__init__(instance, resource_cls)` | `instance`: the cloud.Instance object operating this ResourceManager; `resource_cls`: the cloud.Resource class this object manages |
| `add(*args, **kwargs)` | add an existing resource to this manager |
| `remove(*args, **kwargs)` | remove an existing resource from this manager |
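One way to picture the role of `resource_cls` is a manager that forwards the arguments of `add` to that class. This is a sketch under stated assumptions: `ToyResource` and `ToyResourceManager` are hypothetical, forwarding `*args, **kwargs` to `resource_cls` is a guess at what `add` does, and `remove` is simplified here to take a resource name.

```python
class ToyResource:
    """Hypothetical resource identified only by its name."""

    def __init__(self, name):
        self.name = name


class ToyResourceManager:
    def __init__(self, instance, resource_cls):
        self.instance = instance          # the owning instance (unused in this toy)
        self.resource_cls = resource_cls  # class used to wrap managed resources
        self.resources = []

    def add(self, *args, **kwargs):
        # Assumption: arguments describe an existing resource and are
        # forwarded to resource_cls to build the wrapper object.
        resource = self.resource_cls(*args, **kwargs)
        self.resources.append(resource)
        return resource

    def remove(self, name):
        # Assumption: resources are looked up by name.
        self.resources = [r for r in self.resources if r.name != name]


manager = ToyResourceManager(instance=None, resource_cls=ToyResource)
manager.add("tpu-0")
manager.add(name="tpu-1")
manager.remove("tpu-0")
print([r.name for r in manager.resources])  # ['tpu-1']
```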
A cloud.Instance object for AWS EC2 instances.
A cloud.Instance object for Microsoft Azure instances.
Our GCPInstance requires that your instances have gcloud installed and properly authenticated so that `gcloud alpha compute tpus create test_name` runs without issue.
A cloud.Instance object for Google Cloud instances.
| properties | desc. |
|---|---|
| `tpu` | cloud.TPUManager, a resource manager for this instance's TPUs |
| `resource_managers` | list of owned cloud.ResourceManagers |

| methods | desc. |
|---|---|
| `__init__(collect_existing_tpus=True, **kwargs)` | `collect_existing_tpus`: bool, whether to add existing TPUs to this manager; `**kwargs`: passed to cloud.Instance's initializer |
Resource class for TPU accelerators.
| properties | desc. |
|---|---|
| `ip` | str, IP address of the TPU |
| `preemptible` | bool, whether this TPU is preemptible or not |
| `details` | dict {str: str}, properties of this TPU |

| methods | desc. |
|---|---|
| `up(background=False)` | start this TPU |
| `down(background=False)` | stop this TPU |
| `delete(background=False)` | delete this TPU |
ResourceManager class for TPU accelerators.
| properties | desc. |
|---|---|
| `names` | list of str, names of the managed TPUs |
| `ips` | list of str, IPs of the managed TPUs |

| methods | desc. |
|---|---|
| `__init__(instance, collect_existing=True)` | `instance`: the cloud.GCPInstance object operating this TPUManager; `collect_existing`: bool, whether to add existing TPUs to this manager |
| `clean(background=True)` | delete all managed TPUs with unhealthy states |
| `get(preemptible=True)` | get an available TPU, or create one using up() if none exist |
| `up(preemptible=True, background=False)` | allocate and manage a new instance of resource_cls |
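The get-or-create behavior of `get` and the cleanup done by `clean` can be simulated without any cloud access. The sketch below is a toy model, not the library's implementation: `ToyTPU`, `ToyTPUManager`, and the `healthy` flag are hypothetical, and matching on `preemptible` inside `get` is an assumption.

```python
import itertools


class ToyTPU:
    def __init__(self, name, preemptible=True):
        self.name = name
        self.preemptible = preemptible
        self.healthy = True  # hypothetical flag, set to False when "preempted"

    @property
    def usable(self):
        return self.healthy


class ToyTPUManager:
    def __init__(self):
        self.resources = []
        self._ids = itertools.count()

    @property
    def names(self):
        return [tpu.name for tpu in self.resources]

    def up(self, preemptible=True):
        # Allocate a new TPU and start managing it.
        tpu = ToyTPU(f"tpu-{next(self._ids)}", preemptible)
        self.resources.append(tpu)
        return tpu

    def get(self, preemptible=True):
        # Reuse a usable TPU if one exists, otherwise allocate a new one.
        for tpu in self.resources:
            if tpu.usable and tpu.preemptible == preemptible:
                return tpu
        return self.up(preemptible=preemptible)

    def clean(self):
        # Drop every managed TPU that is no longer healthy.
        self.resources = [tpu for tpu in self.resources if tpu.usable]


manager = ToyTPUManager()
tpu = manager.get(preemptible=True)
completed = 0
for step in range(6):
    if step == 3:
        tpu.healthy = False  # simulate a preemption of the current TPU
    if not tpu.usable:
        manager.clean()
        tpu = manager.get(preemptible=True)  # acquire a replacement
    else:
        completed += 1  # stand-in for one training step

print(manager.names, completed)  # ['tpu-1'] 5
```

The loop loses exactly one step to the simulated preemption: the step on which the TPU is found unusable is spent replacing it, and training resumes on the next iteration.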