We have hit this issue multiple times: when the framework is restarted while brokers are starting up, reconciliation fails to register an endpoint on those brokers because the framework does not recognise TASK_STARTING as a valid state.
The broker eventually ends up in the following state, reported as running but with no endpoint recorded on its task:
{
  "id": "0",
  ...
  "task": {
    "id": "kafka-0-20a281db-069a-4885-96c2-53a6cc3db252",
    "slaveId": "c30ad8fa-8a52-45fb-bcf0-29e22140c8a3-S24",
    "executorId": "kafka-0-02690702-ae31-415e-8a0d-44d85d9636d1",
    "hostname": "localhost",
    "attributes": {},
    "state": "running"
  },
  ...
}
The framework already knows the hostname and port the broker will start on (from the Mesos offer it receives), so wouldn't it make sense to set the endpoint when launching the task, rather than adding it after the task has started?
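
To make the proposal concrete, here is a minimal sketch of the idea in Python. It is not the framework's actual code: `Offer`, `Task`, `launch_task`, `on_status_update`, and the port `9092` are all hypothetical names and values chosen for illustration. The point is only that the endpoint is derived from the offer at launch time, so a later status update (including one for TASK_STARTING seen during reconciliation) changes the state but never has to add the endpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    """Hypothetical stand-in for the part of a Mesos offer we need."""
    hostname: str


@dataclass
class Task:
    """Hypothetical stand-in for the broker's task record."""
    id: str
    hostname: str
    state: str = "starting"
    endpoint: Optional[str] = None


def launch_task(task_id: str, offer: Offer, port: int) -> Task:
    # Hostname and port are already known at launch, so record the
    # endpoint immediately instead of waiting for a later status update.
    return Task(
        id=task_id,
        hostname=offer.hostname,
        endpoint=f"{offer.hostname}:{port}",
    )


def on_status_update(task: Task, mesos_state: str) -> Task:
    # Only the state is updated here; the endpoint was set at launch.
    # TASK_STARTING is accepted as a valid state (assumed behaviour).
    if mesos_state == "TASK_STARTING":
        task.state = "starting"
    elif mesos_state == "TASK_RUNNING":
        task.state = "running"
    return task


# Example: the framework restarts mid-startup and reconciles the task.
task = launch_task("kafka-0", Offer(hostname="localhost"), port=9092)
on_status_update(task, "TASK_STARTING")
on_status_update(task, "TASK_RUNNING")
```

With this ordering, a restart between launch and TASK_RUNNING cannot leave a running broker without an endpoint, because nothing in the status-update path is responsible for creating it.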