
Releases: aws/aws-parallelcluster

AWS ParallelCluster v3.14.0

30 Sep 12:13

We're excited to announce the release of AWS ParallelCluster 3.14.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Include drivers for P6e-GB200 and P6-B200 instances. ParallelCluster sets up the Slurm topology plugin to handle P6e-GB200 UltraServers. See the Limitations section for important additional setup requirements.
  • Support the prioritized and capacity-optimized-prioritized allocation strategies. These allow users to prioritize subnets for instance placement to optimize cost and performance.
  • Add build-image support for Amazon Linux 2023 AMIs based on kernel 6.12 (in addition to 6.1).
  • Support DCV on Amazon Linux 2023.
  • Echo chef-client logs to the instance console when a node fails to bootstrap. This helps investigate bootstrap failures in cases where CloudWatch logs are not available.

LIMITATIONS

  • P6e-GB200 instances are only tested on Amazon Linux 2023, Ubuntu 22.04 and Ubuntu 24.04.
  • Using IMEX on P6e-GB200 requires additional setup. Please refer to the dedicated tutorial in our public documentation.
  • P6-B200 instances are only tested on Amazon Linux 2023, RHEL 8 & 9, Rocky 8 & 9, Ubuntu 22.04 and Ubuntu 24.04.
  • GPU health checks are not recommended for instances with more than 320 GB of GPU memory (such as p6-b200.48xlarge). Health check duration can exceed 10 minutes, potentially causing job failures and significantly reducing job throughput.
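To act on the GPU health check limitation, GPU health checks can be turned off for the affected compute resource in the cluster configuration. The following is a minimal sketch, not an official example: it writes a partial configuration to a hypothetical file gpu-healthcheck-fragment.yaml, the queue and compute resource names are hypothetical, and it assumes the HealthChecks/Gpu/Enabled setting under Scheduling/SlurmQueues/ComputeResources. Verify the parameter path in the ParallelCluster configuration reference before merging the fragment into a full cluster configuration.

```shell
set -euo pipefail

# Write a PARTIAL cluster configuration (not deployable on its own).
# Queue and compute resource names are hypothetical placeholders.
cat > gpu-healthcheck-fragment.yaml <<'EOF'
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: gpu-queue
      ComputeResources:
        - Name: p6b200
          Instances:
            - InstanceType: p6-b200.48xlarge
          # Assumed setting: skip the GPU health check for this compute
          # resource, since it can run longer than 10 minutes here.
          HealthChecks:
            Gpu:
              Enabled: false
EOF

echo "Wrote gpu-healthcheck-fragment.yaml"
```

Scoping the setting to one compute resource, rather than the whole queue, keeps health checks active for any smaller GPU instances in the same queue.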

CHANGES

  • Install nvidia-imex for all OSs except Amazon Linux 2.
  • Remove UnkillableStepTimeout from slurm.conf and let Slurm set this value.
  • Upgrade Python runtime used by Lambda functions to Python 3.12 (from 3.9). See Lambda Documentation for important information about Python 3.9 EOL: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
  • Support encryption of the EFS file system used for the head node internal shared storage via the new configuration parameter HeadNode/SharedStorageEfsSettings/Encrypted.
  • Add a validator that warns against using non-GPU instances with DCV.
  • Upgrade Slurm to version 24.11.6 (from 24.05.8).
  • Upgrade EFA installer to 1.43.2 (from 1.41.0).
    • Efa-driver: efa-2.17.2-1
    • Efa-config: efa-config-1.18-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-2.1.0-5
    • Rdma-core: rdma-core-58.0-1
    • Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6-11
  • Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
  • Upgrade NVIDIA driver to version 570.172.08 (from 570.86.15) for all OSs except Amazon Linux 2.
  • Upgrade CUDA Toolkit to version 12.8.1 (from 12.8.0) for all OSs except Amazon Linux 2.
  • Upgrade DCGM to version 4.4.1 (from 3.3.6) for all OSs except Amazon Linux 2.
  • Upgrade Python to 3.12.11 (from 3.12.8) for all OSs except Amazon Linux 2.
  • Upgrade Python to 3.9.23 (from 3.9.20) for Amazon Linux 2.
  • Upgrade Intel MPI Library to 2021.16.0 (from 2021.13.1).
  • Upgrade DCV to version 2024.0-19030.
  • Upgrade the official ParallelCluster Amazon Linux 2023 AMIs to kernel 6.12 (from 6.1).
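The new HeadNode/SharedStorageEfsSettings/Encrypted parameter can be sketched as a partial head node configuration. This minimal example writes it to a hypothetical file headnode-efs-fragment.yaml; the instance type and subnet ID are placeholders, and it assumes the head node internal shared storage is switched to EFS through the existing HeadNode/SharedStorageType parameter. Check the configuration reference for the exact accepted values.

```shell
set -euo pipefail

# Write a PARTIAL head node configuration; merge it into a full cluster config.
# InstanceType and SubnetId are placeholders.
cat > headnode-efs-fragment.yaml <<'EOF'
HeadNode:
  InstanceType: c5.xlarge
  Networking:
    SubnetId: subnet-0123456789abcdef0
  # Assumption: internal shared storage uses EFS instead of EBS.
  SharedStorageType: Efs
  SharedStorageEfsSettings:
    # Encrypt the EFS file system backing the internal shared storage.
    Encrypted: true
EOF

echo "Wrote headnode-efs-fragment.yaml"
```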

BUG FIXES

  • Prevent build-image stack deletion failures by deploying a global role that automatically deletes the build-image stack after images either succeed or fail the build.
    The role is meant to exist even after the stack has been deleted. See #5914.
  • Fix an issue where Security Group validation failed when a rule contained both IPv4 ranges (IpRanges) and security group references (UserIdGroupPairs).
  • Fix build-image failure on Rocky 9, occurring when the parent image does not ship the latest kernel version on the latest Rocky minor version.
  • Fix a cluster ID mismatch that causes cluster update failures when Slurm accounting is used.
  • Fix a race condition in CloudWatch Agent startup that could cause node bootstrap failures.

DEPRECATIONS

  • The configuration parameter LoginNodes/Pools/Ssh/KeyName is deprecated and will be removed in a future release. The CLI now returns a warning message when it is used in the cluster configuration.
    See #6811.
  • Ubuntu 20.04 is no longer supported.

AWS ParallelCluster v3.13.2

24 Jun 21:40
9fe28d4

We're excited to announce the release of AWS ParallelCluster 3.13.2

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

3.13.2

BUG FIXES

  • Fix a bug which may cause update-cluster and update-compute-fleet to fail when compute resources reference an expired Capacity Reservation that is no longer accessible via EC2 APIs.
  • Fix build-image failure on Rocky 9, occurring when the parent image does not ship the latest kernel version. See #6874.

AWS ParallelCluster v3.13.1

04 Jun 20:53

We're excited to announce the release of AWS ParallelCluster 3.13.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

  • Upgrade Slurm to version 24.05.8.
  • Upgrade EFA installer to 1.41.0 (from 1.38.1).
    • Efa-driver: efa-2.15.0-1
    • Efa-config: efa-config-1.18-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-2.1.0-1
    • Rdma-core: rdma-core-57.0-1
    • Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6
  • Upgrade amazon-efs-utils to version 2.3.1 (from 2.1.0) for non-Amazon Linux AMIs.
  • Support DCV in us-isob-east-1 and us-iso-east-1.
  • Support FSx for Lustre and FSx for NetApp ONTAP in us-isob-east-1 and us-iso-east-1.
  • Ensure kernel consistency throughout ParallelCluster image build by pinning at the beginning and unpinning at completion.

BUG FIXES

  • Fix a bug in the installation of ARM Performance Library that caused build-image to fail in isolated environments.
  • Fix a bug that was preventing the script 'update_directory_service_password.sh' from updating the AD password.

AWS ParallelCluster v3.13.0

01 Apr 20:39

We're excited to announce the release of AWS ParallelCluster 3.13.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

DEPRECATIONS

  • This is the last ParallelCluster release supporting Ubuntu 20.04,
    as Ubuntu 20.04 reaches End of Standard Support in May 2025.

ENHANCEMENTS

  • Add support for Ubuntu 24.04.
  • Add support for ap-southeast-7 region.
  • Disable the unused services cups and wpa_supplicant in official ParallelCluster AMIs to improve security.

CHANGES

  • Upgrade Slurm to version 24.05.7.
  • Upgrade NVIDIA driver to version 570.86.15 (from 550.127.08) for all OSs except AL2.
  • Upgrade CUDA Toolkit to version 12.8.0 (from 12.4.1) for all OSs except AL2.
  • Upgrade Python to 3.12.8 (from 3.9.20) for all OSs except AL2.
  • On Ubuntu 22.04, install the NVIDIA driver with the same compiler version used to compile the kernel.
  • Upgrade aws-cfn-bootstrap to version 2.0-33.
  • Upgrade EFA installer to 1.38.0 (from 1.36.0).
    • Efa-driver: efa-2.13.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-54.0-1
    • Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
  • Upgrade amazon-efs-utils to version 2.1.0.
  • Remove third-party cookbooks: apt-7.5.22 and pyenv-4.2.3.
  • Upgrade third-party cookbook dependencies:
    • line-4.5.21 (from line-4.5.13)
    • nfs-5.1.5 (from nfs-5.1.2)
    • openssh-2.11.14 (from openssh-2.11.12)
    • yum-7.4.20 (from yum-7.4.13)
    • yum-epel-5.0.8 (from yum-epel-5.0.2)
  • Upgrade Pmix to 5.0.6 (from 5.0.3).
  • Upgrade ARM PL to version 24.10 (from 23.10).
  • Upgrade Python to version 3.12.8 (from 3.9.17) in Lambda layer and installer.
  • Upgrade NodeJS to version 20.18.3 (from 18.20.3) in Lambda layer and installer.
  • Remove generation of DSA keys for login nodes, because DSA is unsupported in OpenSSH 9.7+.
  • Set instance ID and instance type information in Slurm upon compute nodes launch.
  • Install NVIDIA drivers without the option 'no-cc-version-check', which is now deprecated in the NVIDIA installer.
  • Add a validator to enforce a maximum of 10 login node pools.
  • Update the default root volume size to 45 GB.
  • Increase HeadNodeBootstrapTimeout by 5 minutes, making it 35 minutes in total.

BUG FIXES

  • Remove usage of cfn-init for compute node bootstrapping to reduce node scale up time.
  • Fix an issue causing compute node bootstrap failure when a proxy is used.
  • On Ubuntu 22.04, install the NVIDIA driver with the same compiler version used to compile the kernel,
    to prevent installation failures.
  • Fix the execution of the aws-parallelcluster-node package override only on the head node during update.
  • Fix an issue where containerized jobs executed through Pyxis/Enroot in a multi-user environment (integrated with Active Directory) would fail.
  • Fix usage of authselect causing node bootstrap failures on Rocky 9.5+ when directory service is used.

AWS ParallelCluster v3.12.0

18 Dec 22:10

We're excited to announce the release of AWS ParallelCluster 3.12.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add the new build-image configuration section Build/Installation to turn NVIDIA software and Lustre client installation on or off. By default, build-image does not install NVIDIA software (although it is included in the official ParallelCluster AMIs) and does install the Lustre client.
  • The CLI commands export-cluster-logs and export-image-logs now export logs by default to the default ParallelCluster bucket, or to the CustomS3Bucket if one is specified in the config.
  • Extend Amazon DCV support to Ubuntu 22.04 on ARM instances.
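A build-image configuration that uses the new Build/Installation section might look like the sketch below. It is a minimal example written to a hypothetical file build-image-config.yaml; the parent AMI ID and build instance type are placeholders, and the NvidiaSoftware/Enabled and LustreClient/Enabled key names are assumptions to confirm against the build image configuration reference.

```shell
set -euo pipefail

# Write a minimal build-image configuration; ParentImage and InstanceType are placeholders.
cat > build-image-config.yaml <<'EOF'
Build:
  InstanceType: c5.xlarge
  ParentImage: ami-0123456789abcdef0
  Installation:
    # Not installed by build-image unless explicitly enabled.
    NvidiaSoftware:
      Enabled: true
    # Installed by default; set to false to skip the Lustre client.
    LustreClient:
      Enabled: true
EOF

echo "Wrote build-image-config.yaml"
```

The file could then be passed to `pcluster build-image --image-configuration build-image-config.yaml --image-id <your-image-id>`.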

CHANGES

  • Upgrade NVIDIA driver to version 550.127.08 (from 550.90.07). This addresses a known issue from NVIDIA.
  • Upgrade Amazon DCV to version 2024.0-18131.
    • server: 2024.0-18131-1
    • xdcv: 2024.0.631-1
    • gl: 2024.0.1078-1
    • web_viewer: 2024.0-18131-1
  • Upgrade EFA installer to 1.36.0.
    • Efa-driver: efa-2.13.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-54.0-1
    • Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
  • Auto-restart slurmctld on failure.
  • Upgrade mysql-community-client to version 8.0.39.
  • Remove support for Python 3.7 and 3.8, which have reached end of life.

BUG FIXES

  • Fix an issue where changes in the sequence of custom action scripts were not detected during cluster updates.
  • Add missing permissions for the ParallelCluster API to create the service-linked roles for Elastic Load Balancing and Auto Scaling, which are required to deploy login nodes.
  • Fix an issue in how the region is retrieved when managing volumes, so that Local Zones are handled correctly.
  • Fix an issue where adding EFS filesystems with AccessPointIds during an update would fail.
  • Fix an issue where, when using the ParallelCluster API (PCAPI), a cluster update could fail when updating a parameter that is not of type String (e.g. MaxCount).
  • When mounting an external OpenZFS, it is no longer required to set the outbound rules for ports 111, 2049, 20001, 20002, 20003.

AWS ParallelCluster v3.11.1

21 Oct 16:54
c877343

We're excited to announce the release of AWS ParallelCluster 3.11.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

  • Pyxis is now disabled by default and must be enabled manually, as described in the product documentation.
  • Upgrade Python runtime to version 3.12 in ParallelCluster Lambda Layer.
  • Remove the setuptools version pin that restricted it to versions prior to 70.0.0.
  • Upgrade libjwt to version 1.17.0.

BUG FIXES

  • Fix an issue in the way we configure the Pyxis Slurm plugin in ParallelCluster that can lead to job submission failures.
    See #6459.
  • Add missing permissions required by login nodes to the public template of policies.

AWS ParallelCluster v3.11.0

26 Sep 18:26

We're excited to announce the release of AWS ParallelCluster 3.11.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for custom actions on login nodes.
  • Allow DCV connection to login nodes.
  • Add support for ap-southeast-3 region.
  • Add security groups to login node network load balancer.
  • Add AllowedIps configuration for login nodes.
  • Add new configuration SharedStorage/EfsSettings/AccessPointId to specify an optional EFS access point for a mount.
  • Allow up to 10 login node pools.
  • Install Enroot and Pyxis in the official ParallelCluster AMIs.
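As a sketch of the new SharedStorage/EfsSettings/AccessPointId option, the commands below write a partial shared storage configuration to a hypothetical file efs-access-point-fragment.yaml. The mount directory, storage name, file system ID, and access point ID are placeholders. The EncryptionInTransit line is an assumption based on the EFS mount helper requiring TLS for access point mounts; confirm the requirement in the ParallelCluster documentation.

```shell
set -euo pipefail

# Write a PARTIAL SharedStorage configuration; all names and IDs are placeholders.
cat > efs-access-point-fragment.yaml <<'EOF'
SharedStorage:
  - MountDir: /shared
    Name: shared-efs
    StorageType: Efs
    EfsSettings:
      FileSystemId: fs-0123456789abcdef0
      # Mount the file system through this EFS access point.
      AccessPointId: fsap-0123456789abcdef0
      # Assumed requirement: access point mounts use TLS.
      EncryptionInTransit: true
EOF

echo "Wrote efs-access-point-fragment.yaml"
```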

CHANGES

  • [BREAKING] The loginNodes field returned by the API DescribeCluster and the CLI command describe-cluster
    has been changed from a dictionary to an array to support multiple pools of login nodes.
    This change breaks backward compatibility, making these operations incompatible with clusters deployed with older versions.
  • Upgrade Slurm to 23.11.10 (from 23.11.7).
  • Upgrade Pmix to 5.0.3 (from 5.0.2).
  • Upgrade EFA installer to 1.34.0.
    • Efa-driver: efa-2.10.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-52.0-1
    • Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.3-11
  • Upgrade NVIDIA driver to version 550.90.07 (from 535.183.01).
  • Upgrade CUDA Toolkit to version 12.4.1 (from 12.2.2).
  • Upgrade Python to 3.9.20 (from 3.9.19).
  • Upgrade Intel MPI Library to 2021.13.1.769 (from 2021.12.1.8).

BUG FIXES

  • Fix validator EfaPlacementGroupValidator so that it does not suggest configuring a Placement Group when Capacity Blocks are used.
  • Fix occasional cluster creation failures by ensuring that FSx for Lustre file systems are created after security group rules.
  • Fix cluster deletion failure when placement group is enabled.
  • Fix issue with login nodes being marked unhealthy when restricting SSH access.
  • Fix retrieve_supported_regions so that it gets the correct S3 URL.
  • Fix describe_images to use pagination.
  • Fix the "No route tables found" error when specifying a default VPC subnet in LoginNodes/Networking/SubnetIds.

AWS ParallelCluster v3.10.1

08 Jul 20:05

We're excited to announce the release of AWS ParallelCluster 3.10.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

BUG FIXES

  • Fix image build failure in China regions.

AWS ParallelCluster v3.10.0

27 Jun 21:42

We're excited to announce the release of AWS ParallelCluster 3.10.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add new configuration section Scheduling/SlurmSettings/ExternalSlurmdbd to connect the cluster to an external Slurmdbd.
  • Allow build-image to be run in an isolated network.
  • Add support for Amazon Linux 2023.
  • Add support for price-capacity-optimized as an AllocationStrategy.
  • Add validator to prevent the use of Placement Groups with Capacity Blocks.
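Two of the enhancements above, ExternalSlurmdbd and the price-capacity-optimized allocation strategy, can be sketched together in a partial Scheduling section. This minimal example writes to a hypothetical file scheduling-fragment.yaml; the slurmdbd host, subnet, queue, and compute resource names are placeholders. It assumes ExternalSlurmdbd takes Host and Port (6819 being the usual slurmdbd port), and that price-capacity-optimized is used with Spot capacity. A working external slurmdbd setup may need further settings (for example a shared Munge key) that are not shown here.

```shell
set -euo pipefail

# Write a PARTIAL Scheduling configuration; host, subnet, and names are placeholders.
cat > scheduling-fragment.yaml <<'EOF'
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    # Connect the cluster to an externally managed slurmdbd.
    ExternalSlurmdbd:
      Host: slurmdbd.example.internal
      Port: 6819
  SlurmQueues:
    - Name: spot-queue
      CapacityType: SPOT
      AllocationStrategy: price-capacity-optimized
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
      ComputeResources:
        - Name: mixed
          # Multiple instance types with matching vCPU counts let the
          # allocation strategy choose among Spot pools.
          Instances:
            - InstanceType: c5.xlarge
            - InstanceType: c5a.xlarge
EOF

echo "Wrote scheduling-fragment.yaml"
```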

CHANGES

  • CentOS 7 is no longer supported.
  • Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
  • Upgrade munge to version 0.5.16 (from 0.5.15).
  • Upgrade Pmix to 5.0.2 (from 4.2.9).
  • Upgrade third-party cookbook dependencies:
    • apt-7.5.22 (from apt-7.5.14)
    • openssh-2.11.12 (from openssh-2.11.3)
  • Remove third-party cookbook: selinux-6.1.12.
  • Upgrade EFA installer to 1.32.0.
    • Efa-driver: efa-2.8.0-1
    • Efa-config: efa-config-1.16-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.21.0-1
    • Rdma-core: rdma-core-50.0-1
    • Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.2-12
  • Upgrade NVIDIA driver to version 535.183.01 (from 535.154.05).
  • Upgrade Python to 3.9.19 (from 3.9.17).
  • Upgrade Intel MPI Library to 2021.12.1.8 (from 2021.9.0.43482).

BUG FIXES

  • Fix Data Repository Associations configuration to make AutoExportPolicy and AutoImportPolicy optional.
  • Fix an issue during cluster deletion so that compute fleet cleanup completes when instances are in either the shutting-down or terminated state.
    This avoids cluster deletion failures for instance types with longer termination cycles.
  • Allow the CloudWatch dashboard to be enabled and alarms to be disabled in the Monitoring section of the cluster config.
  • Allow ParallelCluster Custom Resource to suppress validators using PclusterCluster/SuppressValidators.
  • Remove /etc/profile.d/pcluster.sh so that it is not executed at every user login and
    cfn_bootstrap_virtualenv is not added to the PATH environment variable.
  • Fix ParallelCluster API spec by replacing field failureReason with failures in DescribeCluster response.
  • Fix ParallelCluster API spec by adding the missing CloudFormation stack statuses:
    IMPORT_*, REVIEW_IN_PROGRESS and UPDATE_FAILED.
  • Fix an issue that prevented cluster updates from including EFS filesystems with encryption in transit.
  • Fix an issue that prevented slurmctld and slurmdbd services from restarting on head node reboot when
    EFS is used for shared internal data.
  • On Ubuntu systems, remove the default logrotate configuration for cloud-init log files that clashed with the
    configuration coming from ParallelCluster.
  • Fix image build failure with RHEL 8.10 or newer.

AWS ParallelCluster v3.9.3

19 Jun 12:19

We're excited to announce the release of AWS ParallelCluster 3.9.3

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for FSx Lustre as a shared storage type in us-iso-east-1.

BUG FIXES

  • Remove cloud_dns from the SlurmctldParameters in the Slurm config to avoid Slurm fanout issues.
    This parameter is not required, since IP addresses are set at instance launch.