Maniphest T49982

Retry GCS on 503 errors
Confirmed, Normal, DESIGN

Assigned To
None
Authored By
Sybren A. Stüvel (sybren)
Nov 10 2016, 9:01 AM
Tags
  • Pillar
Subscribers
Francesco Siddi (fsiddi)
Pablo Vazquez (pablovazquez)
Sybren A. Stüvel (sybren)

Description

Google Cloud Storage (GCS) sometimes throws 503 Service Temporarily Unavailable errors at us. We might want to retry once or twice after such a reply, to see if the service has come back up. However, we should consider this carefully, given that:

  • we probably want to pause for a second or two between retries, to give GCS time to come up and service us again, and
  • during this delay the Pillar process is sleeping, blocking others from performing requests on it.

If we do this the wrong way, a hiccup at GCS could cause all Pillar processes to hang in a retry loop, effectively DoSing ourselves.
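
To make the trade-off concrete, here is a minimal sketch of a bounded retry in Python. It is not Pillar code: `ServiceUnavailable` and `call_with_retry` are hypothetical names, and the exception is a stand-in for whatever the GCS client actually raises on a 503. The point is that both the number of retries and the pause are capped, so the worst-case time a process spends blocked is `max_retries * delay` seconds.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the error the GCS client raises on a 503."""


def call_with_retry(func, max_retries=2, delay=1.0, sleep=time.sleep):
    """Call func(), retrying at most max_retries times on ServiceUnavailable.

    The process sleeps `delay` seconds between attempts, so it is blocked
    for at most max_retries * delay seconds before giving up.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ServiceUnavailable:
            if attempt == max_retries:
                # Out of retries: re-raise so the caller can fail fast
                # instead of hanging in a retry loop.
                raise
            sleep(delay)
```

With the defaults above, a request that keeps hitting 503 fails after roughly two seconds of sleeping. Whether that is acceptable depends on how many Pillar processes can be blocked at the same time, which is the design question this task is about.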

Probably related to T48956.

Event Timeline

Sybren A. Stüvel (sybren) created this task. Nov 10 2016, 9:01 AM
Sybren A. Stüvel (sybren) updated the task description.