Scheduler dies when trying to start instance with invalid state #126

@Metallion

Problem

The following scenario:

  1. Start an LXC instance and wait until running.
  2. Reboot executor machine.

Now the LXC container is stopped but OpenVDC thinks it's running.

  3. Start the instance.

Result:

Feb 20 14:38:24 ci openvdc-scheduler[2806]: 2017-02-20 14:38:24 [FATAL] github.com/axsh/openvdc/api/instance_service.go:86 BUGON: Detected un-handled state instance_id=i-0000000000 state=state:RUNNING created_at:<seconds:1487314564 nanos:237858284 >
Feb 20 14:38:24 ci systemd[1]: openvdc-scheduler.service: main process exited, code=exited, status=1/FAILURE
Feb 20 14:38:24 ci systemd[1]: Unit openvdc-scheduler.service entered failed state.
Feb 20 14:38:24 ci systemd[1]: openvdc-scheduler.service failed.

The openvdc-scheduler service dies.

# systemctl status openvdc-scheduler
● openvdc-scheduler.service - OpenVDC scheduler
   Loaded: loaded (/usr/lib/systemd/system/openvdc-scheduler.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 14:38:24 JST; 6min ago
  Process: 2806 ExecStart=/opt/axsh/openvdc/bin/openvdc-scheduler (code=exited, status=1/FAILURE)
 Main PID: 2806 (code=exited, status=1/FAILURE)

Suggested solution

  • On executor start, OpenVDC should check that all instances are in their expected state. If they are not, they should be brought to the states OpenVDC expects them to be in.

  • When start is called on an instance that OpenVDC thinks is "RUNNING", first check which state the instance is actually in. Then switch the recorded state to the correct one and run the start command from that state.

  • Make sure the scheduler never dies, no matter what state the instance is in when start is called.

Other suggestions welcome. ^_^
