Better feedback when agent isn't running after ungraceful exit #344

@tegefaulkes

Specification

Currently, when you use the CLI to make calls to an agent, it checks the existing status file in the node path and uses it for the connection information. While the agent is running, this file contains useful information about the running agent. When the agent stops gracefully, the file reports that it is stopped.

However, when the agent crashes and fails to shut down gracefully, the status file is left as it was at the moment of the crash and is never updated. This leaves us in a situation where the status file reports a running agent but we can't connect to it. The CLI then prints the following:

ERROR:polykey.PolykeyClient.WebSocketClient:ErrorWebSocketConnectionLocal: WebSocket Connection local error - WebSocket could not open due to internal error
ERROR:polykey.PolykeyClient.WebSocketClient.WebSocketConnection 0:ErrorWebSocketConnectionLocal: WebSocket Connection local error - WebSocket could not open due to internal error
ErrorPolykeyCLIUnexpectedError: An unexpected error occured - Thrown 'ErrorWebSocketConnectionLocal'
  cause: ErrorWebSocketConnectionLocal: WebSocket could not open due to internal error

Until now this has been the expected behaviour, but the feedback looks worse than the actual problem, which is simply that the node is not running. We need better feedback for this scenario.

So we need the following changes.

  1. If a WebSocket client fails to connect, we need to return a nicer error without all of this error logging from the logger.
  2. If we take the connection info from the status file but fail to connect with those details, we need to show the nicer connection failure message, report that the status file was incorrect, and attempt to correct the status file.
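
The two changes above could be sketched roughly as follows. This is only an illustration, not the existing Polykey code: the `AgentStatus` type, the `ConnectionFailureReport` type and the `reportConnectionFailure` function are all hypothetical names, and the real status file format may differ. The idea is that a failed connection is turned into a single user-facing message, plus a flag saying whether the status file should be corrected.

```typescript
// Hypothetical, simplified view of the status file; the real format may differ.
type AgentStatus =
  | { status: 'LIVE'; host: string; port: number }
  | { status: 'DEAD' };

// What the CLI would act on after a failed connection: one message for the
// user, and whether the status file looks stale and should be corrected.
type ConnectionFailureReport = {
  message: string;
  statusFileStale: boolean;
};

// Called only after a connection attempt has already failed.
function reportConnectionFailure(
  status: AgentStatus,
  nodePath: string,
): ConnectionFailureReport {
  if (status.status === 'DEAD') {
    // The status file already says the agent is stopped; nothing to correct.
    return {
      message: `Agent at ${nodePath} is not running.`,
      statusFileStale: false,
    };
  }
  // The status file claims a running agent, but we could not reach it.
  return {
    message:
      `Could not connect to agent at ${status.host}:${status.port}. ` +
      `The status file in ${nodePath} reports a running agent but appears ` +
      `to be stale; the agent may have exited ungracefully.`,
    statusFileStale: true,
  };
}
```

A failed connection does not always mean the status file is stale (the agent could be temporarily unreachable), so a real implementation would probably want a second check before rewriting the file.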

Additional context

Related: #198 (comment)
Related: #198

Tasks

  1. Clean up the error reporting if we fail to connect with a WebSocket. We shouldn't get a bunch of ERROR level logs; we should catch the connection failure and report it directly with a more nicely formatted error.
  2. We need a more specific error reported if we failed to connect with details taken from a status file with the --node-path option.
  3. We need to clean up the status file if we determine it to be stale and orphaned.
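
For task 3, one possible way to decide that a status file is stale and orphaned is sketched below. It assumes the status file records the agent's process ID, which this issue does not confirm; the `StatusFile` type and the `reconcileStatus` function are hypothetical names. The liveness check is passed in as a function so the decision logic stays independent of how the process is probed.

```typescript
// Hypothetical status file shape; assumes the agent's PID is recorded.
type StatusFile =
  | { status: 'LIVE'; pid: number }
  | { status: 'DEAD' };

// Returns the status that should be written back, and whether the recorded
// status was stale (LIVE on disk, but the recorded process no longer exists).
function reconcileStatus(
  recorded: StatusFile,
  isProcessAlive: (pid: number) => boolean,
): { corrected: StatusFile; wasStale: boolean } {
  if (recorded.status === 'LIVE' && !isProcessAlive(recorded.pid)) {
    // Orphaned status file: the agent died without updating it.
    return { corrected: { status: 'DEAD' }, wasStale: true };
  }
  // Either already DEAD, or the recorded process still exists.
  return { corrected: recorded, wasStale: false };
}
```

In Node.js, `process.kill(pid, 0)` can serve as the probe, since signal `0` only tests whether the process exists. PIDs can be reused by unrelated processes, so a live PID alone does not prove the agent is still running.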
