awscli2 ignores SIGTSTP, breaks shell job control (only from pyinstaller) #5478

@salewski

Describe the bug

The aws command from awscli2 seems to be ignoring the SIGTSTP signal. This
violates the user's expectation when working interactively in the shell because
the process cannot be easily suspended (CTRL-Z) and resumed. Effectively, this
behavior breaks job control.

This is a regression from the awscli version 1 behavior, which works with
standard Unix job control.

SDK version number

This is using the awscli2 program downloaded on 2020-08-13 from:

    $ aws --version
    aws-cli/2.0.40 Python/3.7.3 Linux/4.19.0-9-amd64 exe/x86_64.debian.10

Platform/OS/Hardware/Device

What are you running the cli on?

    $ uname -srmvo
    Linux 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux

To Reproduce (observed behavior)

You'll need two terminal emulator windows. The first ("window A" below) is for
normal user activity; the second ("window B" below) is to see what is happening
and to allow us to easily "rescue" the first window when it inevitably hangs.

There are four steps, but only the first two are needed to reproduce the issue.

  • Step 1 of 4 - setup
  • Step 2 of 4 - hang terminal window A
  • Step 3 of 4 (optional) - rescue with SIGSTOP
  • Step 4 of 4 (optional) - rescue with SIGCONT

Steps 3 and 4 describe ways of dealing with the issue once encountered, and give
us an opportunity to show some additional detail.

Step 1 of 4 - setup

In window A, we note our terminal and then just run 'aws help'. That lands the
user in the pager (less).

    $ tty
    /dev/pts/52

    $ aws help
    [output to pager (less)]

In window B, use the ps command to take a look at the process tree from window A.

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   S+     0:00      \_ aws help
     7981 pts/52   S+     0:00          \_ less

So far, so good. No problems yet.

Note that the plus (+) signs in the STAT column indicate that the aws help
process (and friends) are in the foreground process group.
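
The "+" marker can be cross-checked numerically: ps can print each process's
process group ID (PGID) next to the terminal's foreground process group ID
(TPGID), and a process is in the foreground group when the two match. The
following is only a sketch; the helper name foreground_pids is hypothetical and
was not part of the original test session.

```shell
# Hypothetical helper: read "PID PGID TPGID" triples on stdin and print the
# PIDs whose process group is the terminal's foreground process group, i.e.
# the processes that ps marks with "+" in the STAT column.
foreground_pids() {
    awk '$2 == $3 { print $1 }'
}

# Example use from window B (the tty path is the one from this report):
#
#   ps -o pid= -o pgid= -o tpgid= -t /dev/pts/52 | foreground_pids
```

For the tree shown in Step 1, this would be expected to list the two aws
processes and less, but not the parent bash.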

Step 2 of 4 - hang terminal window A

In window A, press CTRL-Z to suspend the process; the terminal session will be
effectively hung:

    CTRL-Z
    [terminal is hung]

In window B, again use ps to look at the process tree from window A. This time
we can see that only two of the three processes involved have been stopped (as
indicated by the T for them in the STAT column). The plus (+) signs in the
STAT column tell us that the aws help process (and friends) are still in the
foreground process group. The process group is "half suspended":

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   T+     0:00      \_ aws help
     7981 pts/52   T+     0:00          \_ less
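
One way to dig into why PID 7971 stays in state S would be to look at how each
process disposes of SIGTSTP. On Linux, /proc/PID/status exposes the SigIgn
(ignored) and SigCgt (caught) signal sets as hexadecimal bitmasks. The sketch
below decodes such a mask. It was not run as part of this report, the helper
name sig_in_mask is hypothetical, and it assumes SIGTSTP is signal 20 (true on
Linux x86_64; confirm with `kill -l TSTP`).

```shell
# Hypothetical helper: report whether signal number $2 is set in the
# hexadecimal mask $1 (as found in the SigIgn/SigCgt fields of
# /proc/PID/status on Linux). Bit N-1 of the mask corresponds to signal N.
sig_in_mask() {
    if [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# Example use against the PIDs from the listing above (assumes Linux and
# that SIGTSTP is signal 20):
#
#   for pid in 7971 7972 7981; do
#       ign=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status")
#       echo "$pid ignores SIGTSTP: $(sig_in_mask "$ign" 20)"
#   done
```

Running the same check against the SigCgt field would show whether a process
catches SIGTSTP with a handler rather than ignoring it outright.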

Step 3 of 4 (optional) - rescue with SIGSTOP

In window B, we can force the top-level aws process (PID 7971) to stop by
sending it a SIGSTOP signal (recall that SIGSTOP cannot be caught, blocked, or
ignored):

    $ kill -SIGSTOP 7971

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss+    0:00 bash
     7971 pts/52   T      0:00  \_ aws help
     7972 pts/52   T      0:00      \_ aws help
     7981 pts/52   T      0:00          \_ less

You can see that the aws process (and friends) are no longer in the foreground
process group. The single + in the STAT column shows that the parent shell
(bash) is back in control.

Furthermore, the shell in window A is again usable:

    $ jobs
    [1]+  Stopped                 aws help
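
Picking out the process that needs the SIGSTOP by eye is easy for a single
tree, but the selection can also be sketched as a small filter over ps output:
any process that is in the foreground process group ("+" in STAT) yet not
stopped (STAT does not start with "T") is a candidate. The helper name below is
hypothetical and the snippet is an illustration, not part of the original
session.

```shell
# Hypothetical helper: read "PID STAT" pairs on stdin and print the PIDs that
# are in the foreground process group (STAT contains "+") but are not stopped
# (STAT does not start with "T").
unstopped_foreground_pids() {
    awk '$2 ~ /\+/ && $2 !~ /^T/ { print $1 }'
}

# Example use from window B while window A is hung (the tty path is the one
# from this report):
#
#   ps -o pid= -o stat= -t /dev/pts/52 | unstopped_foreground_pids |
#       while read -r pid; do kill -STOP "$pid"; done
```

Applied to the "half suspended" listing from Step 2, the filter would select
only PID 7971.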

Step 4 of 4 (optional) - rescue with SIGCONT

In window A, bring the suspended aws help process group back into the
foreground. This will resume the pager. Press CTRL-Z to again hang the
terminal:

    $ fg
    [resumed pager (less)]

    CTRL-Z
    [terminal is hung (again)]

In window B, confirm the state looks like the hung scenario described above:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   T+     0:00      \_ aws help
     7981 pts/52   T+     0:00          \_ less

Now rather than send SIGSTOP to force the process group to suspend, instead
send two SIGCONT signals to the already-stopped processes to resume them:

    $ kill -SIGCONT 7972
    $ kill -SIGCONT 7981

And again confirm the state of the process tree. We can see that the entire aws help
process group is running in the foreground again; it is no longer in a "half suspended" state:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   S+     0:00      \_ aws help
     7981 pts/52   S+     0:00          \_ less

At this point, the pager in window A is again usable. You can type 'q' to exit
out of 'less' and otherwise continue using the terminal window.

Expected behavior

Standard Unix job control could be used with aws (awscli2).

Pressing CTRL-Z while reading the docs would suspend the aws help/pager process
group and return control to the shell. The user would then be able to jump back
into the pager using the standard Unix job control features of the shell.

Logs/output

Nothing much relevant. All logging precedes the triggering of the hung behavior,
and no additional log messages are recorded beyond that point when performing
the above steps. But FWIW, below are some snippets as captured by running:

    $ aws --debug help 2> stderr.log

The top and bottom of the debug log:

    2020-08-14 11:46:32,351 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/2.0.40 Python/3.7.3 Linux/4.19.0-9-amd64 exe/x86_64.debian.10
    2020-08-14 11:46:32,351 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['--debug', 'help']
    2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_timestamp_parser at 0x7f2716c3f158>
    2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function register_uri_param_handler at 0x7f27175d8b70>
    2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_binary_formatter at 0x7f2716bfe400>
    2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x7f2717534b70>
    2020-08-14 11:46:32,353 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function attach_history_handler at 0x7f2716d83f28>
    2020-08-14 11:46:32,353 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_json_file_cache at 0x7f2716dbe8c8>
    ...
    2020-08-14 11:46:32,365 - MainThread - botocore.hooks - DEBUG - Event doc-relateditems-start.aws: calling handler <bound method CLIDocumentEventHandler.doc_relateditems_start of <awscli.clidocs.ProviderDocumentEventHandler object at 0x7f2716b73a20>>
    2020-08-14 11:46:32,365 - MainThread - botocore.hooks - DEBUG - Event doc-relateditem.aws.aws help topics: calling handler <bound method CLIDocumentEventHandler.doc_relateditem of <awscli.clidocs.ProviderDocumentEventHandler object at 0x7f2716b73a20>>
    2020-08-14 11:46:32,432 - MainThread - awscli.help - DEBUG - Running command: ['groff', '-m', 'man', '-T', 'ascii']
    2020-08-14 11:46:32,445 - MainThread - awscli.help - DEBUG - Running command: ['less']

Additional context

Empirical testing suggests there are actually two different flavors of this
issue: one that affects aws help output, and one that affects the output of AWS
API service calls.

The former is what is described above, and is "the worse" of the two both
because a) it is the more common use case (for the author) for using aws with
job control; and b) there is no easy workaround for it.

[UPDATE (2020-08-20): There is a workaround: build from the git 'v2'
branch and invoke the aws python script directly; avoid invoking the aws
binary executable from the pyinstaller-created installer. See comments
below, especially this one.]

The second has a workaround, which is to use the pager to consume all of the
service output data before attempting to suspend the pager. While clumsy, this
workaround can be performed "inline" in the shell session; it can be applied
without the need to open additional terminal windows and go hunting for PIDs to
which we would then send signals (as described above).

Note that for the second flavor the bug works differently depending on the
cli_pager setting[0]. With the default settings it will be aws that
launches less, and when that is the case the "consume all data before
suspending" workaround does not apply. Even after consuming all of the data, the
aws process will go into the "half suspended" state if you press CTRL-Z in
the pager.

[0] For the above tests, the --no-paginate command line option is not
equivalent to setting the cli_pager option to an empty value in
~/.aws/config. That seems like a different bug, though.

With cli_pager set to an empty value, it would be the user piping the output
of aws to less. The less invocation is part of the same process group,
so is still susceptible to hanging the terminal. But in this arrangement, having
the pager consume all of the output data from the AWS service call allows the
aws command to exit. When that happens, the only process left in the
foreground process group will be the pager; at that point job control will work
as expected. The user can suspend the pager, resume it, etc.

Step 1 of 3 - disable paging of service data output

In ~/.aws/config, add this setting to the relevant profile:

    cli_pager =

Step 2 of 3 - pipe data to pager

In terminal window A, invoke a command that triggers an AWS API service call
and emits a decent amount of data (enough for multiple pages in your pager
app). Here we happen to be using iam list-policies for that purpose:

    $ aws iam list-policies | less

In window B, use ps to look at the state of the process tree from window A:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
    10135 pts/52   S+     0:00  \_ aws iam list-policies
    10137 pts/52   S+     0:00  |   \_ aws iam list-policies
    10136 pts/52   S+     0:00  \_ less

So far, so good. Note that less is a direct descendant of the shell (bash)
rather than of the aws command.

Note: If you were to press CTRL-Z at this point, then window A would
effectively be hung as described above. Don't do that here.

Step 3 of 3 - use pager to slurp-up all data

In window A, tell your pager to "jump to the end". This will have the effect of
consuming all of the data from the AWS API service call, as emitted by the aws
command. In less, this can be done by issuing the command 0G (zero
capital-gee).

In window B, again look at the process tree:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
    10136 pts/52   S+     0:00  \_ less

Notice that the aws process has completed its work and exited. The only
process still in the foreground process group is the pager. At this point, you
can suspend the pager (CTRL-Z) and it will work as expected; it will not
hang the terminal.
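
The same reasoning (the pager is safe to suspend once aws has exited) suggests
a variation that avoids the manual "jump to the end" step: let the command
finish into a temporary file first, then page the file. The wrapper below is an
untested sketch of that idea, not something tried in this report; the function
name run_then_page is hypothetical, and it assumes cli_pager is set to an empty
value as in Step 1.

```shell
# Hypothetical wrapper: run a command to completion with its output captured
# in a temporary file, then open that file in the pager. Because the command
# has already exited when the pager starts, the pager is the only process
# left in the foreground process group.
run_then_page() {
    tmp=$(mktemp) || return 1
    "$@" > "$tmp"
    status=$?
    ${PAGER:-less} "$tmp"
    rm -f "$tmp"
    return "$status"
}

# Example use (assumes cli_pager is set to an empty value as in Step 1):
#
#   run_then_page aws iam list-policies
```

The trade-off is that nothing is displayed until the service call has
completed, and pressing CTRL-Z while aws itself is still running would
presumably still hit the hang described above.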
