Fix KeyError when PBS commands return FQDN hostnames in autoscaler #86
The autoscaler was failing with a `KeyError` when the PBS commands `qmgr list sched` and `qmgr list server` returned FQDN hostnames instead of short hostnames. This occurred when the scheduler host was returned as a fully qualified domain name (e.g., `headnode.internal.cloudapp.net`) but the server host dictionary was keyed by short hostnames (e.g., `headnode`).

The error manifested as:
Root Cause:
The `read_schedulers()` function was using the raw hostname values from PBS output to create and look up entries in the server dictionary. When PBS returned mixed hostname formats (FQDN for schedulers, short names for servers), the lookup would fail.

Solution:
Modified the hostname handling in `read_schedulers()` to consistently use short hostnames for both dictionary creation and lookup:

1. Server dictionary creation - Extract the short hostname when creating the server lookup dictionary.
2. Scheduler lookup - Extract the short hostname from the scheduler host before lookup.
This ensures consistent hostname format handling regardless of whether PBS returns FQDN or short hostnames, while maintaining full backward compatibility with existing deployments.
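
For illustration, the normalization described above can be sketched as follows. This is a minimal, standalone sketch and not the actual `read_schedulers()` code: the helper names `short_hostname` and `match_schedulers_to_servers`, and the `server_host`, `sched_host`, and `name` keys, are hypothetical stand-ins for however the parsed `qmgr` output is represented in the autoscaler.

```python
from typing import Dict, List


def short_hostname(host: str) -> str:
    """Return the part of a hostname before the first dot.

    "headnode.internal.cloudapp.net" and "headnode" both map to "headnode".
    """
    return host.strip().split(".", 1)[0]


def match_schedulers_to_servers(
    sched_rows: List[Dict[str, str]],
    server_rows: List[Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Pair each scheduler row with its server row by short hostname.

    The row layout and key names are assumptions made for this sketch.
    """
    # Key the server dictionary by short hostname at creation time.
    servers_by_host = {
        short_hostname(row["server_host"]): row for row in server_rows
    }

    matched = {}
    for sched in sched_rows:
        # Normalize the scheduler host the same way before the lookup, so an
        # FQDN from one command matches a short name from the other.
        key = short_hostname(sched["sched_host"])
        matched[key] = servers_by_host[key]
    return matched


# Mixed formats: short name for the server, FQDN for the scheduler.
servers = [{"server_host": "headnode", "name": "server-a"}]
scheds = [{"sched_host": "headnode.internal.cloudapp.net", "name": "default"}]
result = match_schedulers_to_servers(scheds, servers)
```

Normalizing on both sides means the lookup no longer depends on which format each command returns. One caveat of splitting on the first dot is that a host reported as a raw IP address would be truncated to its first octet, so this sketch assumes PBS reports hostnames rather than addresses.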
Testing:
Added comprehensive unit tests covering:
Fixes #85.