Releases: aws/aws-parallelcluster

AWS ParallelCluster v2.4.1

29 Jul 10:42
8f5359f

We're excited to announce the release of AWS ParallelCluster 2.4.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Docs

New docs are available here: https://docs.aws.amazon.com/parallelcluster/latest/ug/

Enhancements

  • Add support for ap-east-1 region (Hong Kong)
  • Add the option to specify the instance type used when building custom AMIs with pcluster createami
  • Speed up cluster creation by starting compute nodes together with the master node
  • Enable ASG CloudWatch metrics for the ASG managing compute nodes
  • Install Intel MPI 2019u4 on Amazon Linux, CentOS 7 and Ubuntu 16.04
  • Upgrade Elastic Fabric Adapter (EFA) to version 1.4.1, which supports Intel MPI
  • Run all node daemons and cookbook recipes in isolated Python virtualenvs. This allows our code to always run with the required Python dependencies and solves all conflicts and runtime failures that were being caused by user packages installed in the system Python
  • Torque:
    • Process nodes added to or removed from the cluster in batches in order to speed up cluster scaling
    • Scale up only if required CPU/nodes can be satisfied
    • Scale down if pending jobs have unsatisfiable CPU/nodes requirements
    • Add support for jobs in hold/suspended state (this includes job dependencies)
    • Automatically terminate and replace faulty or unresponsive compute nodes
    • Add retries in case of failures when adding or removing nodes
    • Add support for ncpus reservation and multi nodes resource allocation (e.g. -l nodes=2:ppn=3+3:ppn=6)
    • Optimize Torque global configuration to react faster to dynamic cluster scaling
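
The multi-node request example in the Torque list above (nodes=2:ppn=3+3:ppn=6) can be read as a sum of chunks. As a rough illustration only (this is not ParallelCluster's own code), the Python sketch below computes the node and CPU totals for such a request. It assumes every chunk has the form count[:ppn=N] with a numeric count, and that ppn defaults to 1 when omitted. The function name parse_nodes_spec is hypothetical.

```python
def parse_nodes_spec(spec):
    """Return (total_nodes, total_cpus) for a Torque-style nodes spec.

    Simplified sketch: handles only `count[:ppn=N]` chunks joined by `+`.
    Hostname-based chunks and other node properties are not interpreted.
    """
    total_nodes = 0
    total_cpus = 0
    for chunk in spec.split("+"):
        parts = chunk.split(":")
        count = int(parts[0])
        ppn = 1  # assumed default: one processor per node when ppn is omitted
        for prop in parts[1:]:
            if prop.startswith("ppn="):
                ppn = int(prop[len("ppn="):])
        total_nodes += count
        total_cpus += count * ppn
    return total_nodes, total_cpus


# 2 nodes with 3 CPUs each plus 3 nodes with 6 CPUs each
print(parse_nodes_spec("2:ppn=3+3:ppn=6"))  # (5, 24)
```

Under these assumptions the example request needs 5 nodes and 24 CPUs in total, which is the kind of figure a scaling decision would have to satisfy.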

Changes

  • Update EFA installer to a new version. Note that this changes the location of mpicc and mpirun. To avoid breaking existing code, we recommend loading the modulefile with module load openmpi and running which mpicc wherever the full path is required
  • Eliminate Launch Configuration and use Launch Templates in all the regions
  • Torque: upgrade to version 6.1.2
  • Run all ParallelCluster daemons with Python 3.6 in a virtualenv. Daemon code now supports Python >= 3.5
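
Because the EFA installer update moves mpicc and mpirun, scripts that hardcode their paths may break. One possible way to avoid this in a Python helper is to look the tool up on PATH at run time, which mirrors what which mpicc does in the shell. The sketch below assumes that module load openmpi has already been run in the shell that starts the script. The function name resolve_tool is hypothetical.

```python
import shutil


def resolve_tool(name):
    """Return the absolute path of `name` found on PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        # Assumption: a missing tool usually means the modulefile was not loaded.
        raise FileNotFoundError(
            f"{name} not found on PATH; was the openmpi module loaded?"
        )
    return path
```

A build script could then call resolve_tool("mpicc") instead of embedding a fixed directory.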

Bug Fixes

  • Fix issue with sanity check at creation time that was preventing clusters from being created in private subnets
  • Fix pcluster configure when relative config path is used
  • Make FSx Substack depend on ComputeSecurityGroupIngress so that FSx creation does not start before the security group allows traffic within itself
  • Restore correct value for filehandle_limit that was getting reset when setting memory_limit for EFA
  • Torque: fix compute nodes locking mechanism to prevent job scheduling on nodes being terminated
  • Restore logic that was automatically adding compute nodes identity to SSH known_hosts file
  • Slurm: fix issue that was causing the ParallelCluster daemons to fail when the cluster is stopped and an empty compute nodes file is imported in Slurm config

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster v2.4.0

11 Jun 15:26
1c53ad5

We're excited to announce the release of AWS ParallelCluster 2.4.0.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Docs

New docs are available here: https://docs.aws.amazon.com/parallelcluster/latest/ug/

Enhancements

  • Add support for EFA on CentOS 7, Amazon Linux and Ubuntu 16.04
  • Add support for Ubuntu in China region cn-northwest-1
  • SGE:
    • process nodes added to or removed from the cluster in batches in order to speed up cluster scaling.
    • scale up only if required slots/nodes can be satisfied
    • scale down if pending jobs have unsatisfiable CPU/nodes requirements
    • add support for jobs in hold/suspended state (this includes job dependencies)
    • automatically terminate and replace faulty or unresponsive compute nodes
    • add retries in case of failures when adding or removing nodes
    • configure scheduler to handle rescheduling and cancellation of jobs running on failing or terminated nodes
  • Slurm:
    • scale up only if required slots/nodes can be satisfied
    • scale down if pending jobs have unsatisfiable CPU/nodes requirements
    • automatically terminate and replace faulty or unresponsive compute nodes
    • decrease SlurmdTimeout to 120 seconds to speed up replacement of faulty nodes
  • Automatically replace compute instances that fail initialization and dump logs to shared home directory.
  • Dynamically fetch compute instance type and cluster size in order to support updates in scaling daemons
  • Always use full master FQDN when mounting NFS on compute nodes. This solves some issues occurring with some networking setups and custom DNS configurations
  • Show the version and status of each cluster in the output of pcluster list
  • Remove double quoting of the post_install args
  • awsbsub: use override option to set the number of nodes rather than creating multiple JobDefinitions
  • Add support for AWS_PCLUSTER_CONFIG_FILE env variable to specify pcluster config file
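
To illustrate how an environment variable such as AWS_PCLUSTER_CONFIG_FILE might fit into config file lookup, here is a minimal Python sketch. The precedence shown (explicit command-line value first, then the environment variable, then a default) and the default location ~/.parallelcluster/config are assumptions made for the example, not a statement of how pcluster is implemented. The function name resolve_config_path is hypothetical.

```python
import os

# Assumed default location, used only for this illustration.
DEFAULT_CONFIG = os.path.join("~", ".parallelcluster", "config")


def resolve_config_path(cli_value=None, environ=None):
    """Pick a config file path: CLI value, then env variable, then default."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return os.path.expanduser(cli_value)
    env_value = environ.get("AWS_PCLUSTER_CONFIG_FILE")
    if env_value:
        return os.path.expanduser(env_value)
    return os.path.expanduser(DEFAULT_CONFIG)
```

With this ordering, exporting AWS_PCLUSTER_CONFIG_FILE changes the file used by every invocation in that shell, while an explicit per-command value would still win.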

Changes

  • Update Open MPI library to version 3.1.4 on CentOS 7, Amazon Linux and Ubuntu 16.04. This also changes the default openmpi path to /opt/amazon/efa/bin/ and the openmpi module name to openmpi/3.1.4
  • Set soft and hard ulimit on open files to 10000 for all supported OSs
  • For a better security posture, we're removing AWS credentials from the parallelcluster config file. Credentials can now be set up following the canonical procedure used for the AWS CLI
  • When using FSx or EFS do not enforce in sanity check that the compute security group is open to 0.0.0.0/0
  • When updating an existing cluster, the same template version is now used, no matter the pcluster cli version
  • SQS messages that fail to be processed in sqswatcher are now re-queued only 3 times and not forever
  • Reset nodewatcher idletime to 0 when the host becomes essential for the cluster (because of min size of ASG or because there are pending jobs in the scheduler queue)
  • SGE: a node is considered as busy when in one of the following states "u", "C", "s", "d", "D", "E", "P", "o". This allows a quick replacement of the node without waiting for the nodewatcher to terminate it.
  • Do not update DynamoDB table on cluster updates in order to avoid hitting strict API limits (1 update per day).
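
The SGE busy-state rule in the list above can be sketched in a few lines of Python. This is a hedged illustration, not the nodewatcher's actual code: it assumes the scheduler reports a host's state as a string of single-character flags (for example "au") and treats the node as busy if any flag is one of those listed. The names BUSY_STATES and is_node_busy are hypothetical.

```python
# Flags taken from the list above: "u", "C", "s", "d", "D", "E", "P", "o"
BUSY_STATES = frozenset("uCsdDEPo")


def is_node_busy(state):
    """Return True if any flag in the state string is one of BUSY_STATES."""
    return any(flag in BUSY_STATES for flag in state)
```

For instance, is_node_busy("au") is true because of the "u" flag, while an empty state string is not considered busy.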

Bug Fixes

  • Fix issue that was preventing Torque from being used on CentOS 7
  • Start node daemons at the end of instance initialization. The time spent for post-install script and node initialization is not counted as part of node idletime anymore.
  • Fix issue which was causing an additional and invalid EBS mount point to be added in case of multiple EBS
  • Install Slurm libpmpi/libpmpi2 that is distributed in a separate package since Slurm 17
  • pcluster ssh command now works for clusters with use_public_ips = false
  • Slurm: add "BeginTime", "NodeDown", "Priority" and "ReqNodeNotAvail" to the pending reasons that trigger a cluster scaling
  • Add a timeout on remote commands execution so that the daemons are not stuck if the compute node is unresponsive
  • Fix an edge case that was causing the nodewatcher to hang forever in case the node had become essential to the cluster during a call to self_terminate.
  • Fix pcluster start/stop commands when used with an awsbatch cluster
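
The timeout on remote command execution mentioned above follows a common pattern: bound how long a command may run so that the caller is never blocked by an unresponsive host. The Python sketch below shows that pattern with a local subprocess standing in for the remote call. How the ParallelCluster daemons actually implement it is not described in these notes, and the function name run_with_timeout is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run `command`; return its exit code, or None if it timed out."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,  # the child is killed if this elapses
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# A command that finishes quickly returns its exit code.
print(run_with_timeout([sys.executable, "-c", "pass"], 30))
```

A caller can treat a None result as "node unresponsive" and move on to the next node instead of waiting indefinitely.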

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster 2.3.1

03 Apr 08:54
47b8751

We're excited to announce the release of AWS ParallelCluster 2.3.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Enhancements

  • Add support for FSx Lustre with Amazon Linux. When using a custom AMI, the kernel must be >= 4.14.104-78.84.amzn1.x86_64
  • Slurm
    • set compute nodes to DRAIN state before removing them from cluster. This prevents the scheduler from submitting a job to a node that is being terminated.
    • dynamically adjust max cluster size based on ASG settings
    • dynamically change the number of configured FUTURE nodes based on the actual nodes that join the cluster. The max size of the cluster seen by the scheduler always matches the max capacity of the ASG.
    • process nodes added to or removed from the cluster in batches. This speeds up cluster scaling which is able to react with a delay of less than 1 minute to variations in the ASG capacity.
    • add support for job dependencies and pending reasons. The cluster won't scale up if the job cannot start due to an unsatisfied dependency.
    • set ReturnToService=1 in scheduler config in order to recover instances that were initially marked as down due to a transient issue.
  • Validate FSx parameters. Fixes #896.
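
For custom AMIs, the minimum kernel requirement stated for FSx Lustre on Amazon Linux (>= 4.14.104-78.84.amzn1.x86_64) can be checked before building. The Python sketch below is a simplified comparison that looks only at the five leading numeric fields of the release string. It is not the RPM version comparison algorithm, and the names MIN_KERNEL, kernel_key and kernel_supports_fsx are hypothetical.

```python
import re

MIN_KERNEL = "4.14.104-78.84.amzn1.x86_64"  # minimum stated in the notes above


def kernel_key(release):
    """Extract the leading numeric fields, e.g. (4, 14, 104, 78, 84)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(group) for group in match.groups())


def kernel_supports_fsx(release, minimum=MIN_KERNEL):
    """Return True if `release` is at least `minimum` (numeric fields only)."""
    return kernel_key(release) >= kernel_key(minimum)
```

On a running instance the release string could be obtained with platform.release() from the Python standard library and passed to kernel_supports_fsx.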

Changes

  • Slurm - Upgrade version to 18.08.6.2
  • NVIDIA - update drivers to version 418.56
  • CUDA - update toolkit to version 10.0
  • Increase default EBS volume size from 15GB to 17GB
  • Disable updates to FSx file systems, because updating most parameters would cause the file system, and all its data, to be deleted

Bug Fixes

  • Cookbook wasn't fetched when custom_ami parameter specified in the config
  • Cfn-init is now fetched from us-east-1. This bug affected non-alinux custom AMIs in regions other than us-east-1.
  • Account limit check not done for SPOT or AWS Batch Clusters
  • Account limit check falls back to master subnet. Fixes #910.
  • Boto3 upperbound removed

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster 2.2.1

28 Feb 11:41
9e76b10

We're excited to announce the release of AWS ParallelCluster 2.2.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Features

  • Support for FSx Lustre with CentOS 7
  • Check AWS EC2 account limits before starting cluster creation
  • Allow users to force job deletion with SGE scheduler

Changes

  • Set default value to compute for placement_group option
  • pcluster ssh: use private IP when the public one is not available
  • pcluster ssh: now works also when stack is not completed as long as the master IP is available

Bugfixes

  • awsbsub: fix file upload with absolute path
  • pcluster ssh: fix issue that was preventing the command from working correctly when stack status is UPDATE_ROLLBACK_COMPLETE
  • Fix block device conversion to correctly attach EBS nvme volumes
  • Wait for Torque scheduler initialization before completing master node setup
  • pcluster version: now works also when no ParallelCluster config is present
  • Improve nodewatcher daemon logic to detect if a SGE compute node has running jobs

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster 2.1.1

08 Jan 14:04
1e0fbc6

We're excited to announce the release of AWS ParallelCluster 2.1.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Features

  • Support for AWS Beijing Region (cn-north-1) and Ningxia Region (cn-northwest-1)

Bugfixes

  • No longer schedule jobs on compute nodes that are terminating

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster v2.1.0

07 Jan 22:15

We're excited to announce the release of AWS ParallelCluster 2.1.0.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Features

  • Support for Elastic File System (EFS)
  • AWS Batch Multinode Parallel support
  • Support for RAID 0 and 1 EBS Volumes
  • Support for AWS Stockholm Region (eu-north-1)

Bugfixes

  • No longer schedule jobs on compute nodes that are terminating

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster v2.0.2

20 Nov 00:04

We're excited to announce the release of AWS ParallelCluster 2.0.2.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Features

  • Support for new GovCloud region us-gov-east-1

Bugfixes

  • Fix regression with shared_dir parameter in the cluster configuration section.
  • Fixed issue with jq that prevented customers from using extra_json
  • Fixed issue with awscli version on ubuntu1404

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster v2.0.1

20 Nov 00:03

We're excited to announce the release of AWS ParallelCluster 2.0.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Bugfixes

  • Fix pcluster configure and pcluster createami commands

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

AWS ParallelCluster v2.0.0

19 Nov 22:21

We are happy to launch AWS ParallelCluster!

AWS ParallelCluster is an enhanced and productized version of CfnCluster.

Moving from CfnCluster to AWS ParallelCluster

If you are a previous CfnCluster user, we encourage you to start using and creating new clusters only with AWS ParallelCluster. Clusters created with CfnCluster can continue to be managed with CfnCluster, and they can coexist with clusters created with AWS ParallelCluster.

The configuration file from cfncluster can be used with AWS ParallelCluster. To read more about the differences see: https://aws-parallelcluster.readthedocs.io/en/latest/getting_started.html#moving-from-cfncluster-to-aws-parallelcluster

Installation

How to install?

sudo pip install aws-parallelcluster

Features

  • AWS Batch Integration
  • Support for creating custom AMIs
  • Multiple EBS Volume support

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192

CfnCluster v1.6.1

30 Oct 21:41
f9ff393

We are happy to announce the availability of CfnCluster v1.6.1.

How to update?

sudo pip install --upgrade cfncluster

Features:

  • Fix a bug in cfncluster configure introduced in 1.6.0

Need help / have a feature request?
CfnCluster Issues tracker on GitHub: https://github.com/awslabs/cfncluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192