Anonymous Git clones were offline for several hours this morning (SSH and HTTP clones remained functional). Several days ago we removed the client inactivity timeout from our backend HAProxy configuration to prevent long-running RPCs from being severed incorrectly. It turns out that some Git connections misbehave and sit idle, staying open without transferring any data. Without the timeout in place, these connections eventually filled every slot in HAProxy’s session limit, so every subsequent connection was dropped. We have reinstated the timeout and will carefully review any further changes we need to make here to ensure this does not happen again.
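To make the failure mode concrete, here is a toy model of a proxy with a fixed number of session slots. It is only a sketch, not HAProxy's actual behavior: the function `count_dropped` and all of its parameters (`session_limit`, `idle_timeout`, `stuck_every`) are hypothetical names chosen for illustration, and time is counted in abstract ticks with one new connection arriving per tick.

```python
from typing import List, Optional


def count_dropped(session_limit: int, idle_timeout: Optional[int],
                  ticks: int, stuck_every: int = 10) -> int:
    """Count connections rejected because every session slot was taken.

    One connection arrives per tick. Every `stuck_every`-th connection goes
    idle and never closes on its own; all others finish after one tick.
    With idle_timeout=None, stuck connections hold their slot forever.
    """
    # Each open session is stored as the tick at which it will be released,
    # or None if nothing will ever release it.
    sessions: List[Optional[int]] = []
    dropped = 0
    for now in range(ticks):
        # Free the slots of sessions that have finished or timed out.
        sessions = [end for end in sessions if end is None or end > now]

        if now % stuck_every == 0:
            # A misbehaving connection: only the idle timeout can reap it.
            end = None if idle_timeout is None else now + idle_timeout
        else:
            # A healthy connection: done by the next tick.
            end = now + 1

        if len(sessions) >= session_limit:
            dropped += 1  # no free slot, so the connection is refused
        else:
            sessions.append(end)
    return dropped


if __name__ == "__main__":
    print("no timeout:  ", count_dropped(5, None, ticks=100))
    print("with timeout:", count_dropped(5, 5, ticks=100))
```

In this model, with no timeout the stuck connections pile up until all five slots are held, and from then on every new connection is refused. With a timeout, each stuck connection is reaped before the next one arrives and nothing is dropped.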
At the same time, our monitoring of anonymous Git clones is lacking, so we were not notified of the problem as soon as we should have been. I’m now putting in place more robust monitoring for the Git protocol so that we are notified promptly if problems occur in the future.
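As a rough idea of what such a check could look like (this is a minimal sketch, not the monitoring we are actually deploying), the probe below opens a `git://` connection, sends the same `git-upload-pack` request an anonymous clone starts with, and confirms that the server answers with a well-formed pkt-line. The function names, the default timeout, and the host and repository path in the usage line are all assumptions made for illustration.

```python
import socket

GIT_DAEMON_PORT = 9418  # default port for the git:// protocol


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a Git pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read n bytes, or fewer if the peer closes the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def git_daemon_is_healthy(host: str, repo_path: str,
                          port: int = GIT_DAEMON_PORT,
                          timeout: float = 10.0) -> bool:
    """Return True if the server starts a ref advertisement for repo_path."""
    request = pkt_line(
        b"git-upload-pack " + repo_path.encode() + b"\0host=" + host.encode() + b"\0"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            header = recv_exact(sock, 4)
            # The reply must begin with a 4-hex-digit pkt-line length.
            if len(header) != 4 or any(c not in b"0123456789abcdefABCDEF" for c in header):
                return False
            length = int(header, 16)
            if length <= 4:
                return True  # flush-pkt or empty line: the server responded
            payload = recv_exact(sock, length - 4)
    except OSError:
        # Covers refused connections, resets, and timeouts.
        return False
    # Treat an explicit error packet from the server as a failed check.
    return len(payload) == length - 4 and not payload.startswith(b"ERR ")


if __name__ == "__main__":
    # Hypothetical host and repository path; substitute real values.
    print(git_daemon_is_healthy("git.example.com", "/project.git"))
```

Run on a schedule from outside the load balancer, a check like this would fail when connections are being dropped or are hanging, which is the condition we missed this morning. The timeout matters here: a probe that waits forever on a stuck connection would never report the failure.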
I’m very sorry about the problems this morning. We take all matters of availability very seriously and will do everything we can to make sure these problems do not arise again.