Service degradation and high latency after database failover

Operational

Components

Website, API, Git Operations, Container Registry, GitLab Pages, CI/CD - Hosted runners on Linux, Background Processing, GitLab Customers Portal, Support Services, packages.gitlab.com, version.gitlab.com

Locations

Google Compute Engine, Azure, Zendesk, AWS



August 14, 2019 18:05 UTC
[Resolved] Issue closed.

August 14, 2019 15:43 UTC
[Monitoring] We are still monitoring GitLab.com. API and web request latencies remain good, and CI pending job queues continue to improve.

August 14, 2019 15:15 UTC
[Monitoring] We are continuing to monitor the recovery of GitLab.com. CI pending job queues are continuing to recover. Web and API request latencies are back at normal levels.

August 14, 2019 14:52 UTC
[Monitoring] The failed DB node has been re-added and we are monitoring recovery. We are fixing an issue with our monitoring now before we call an all clear.

August 14, 2019 14:02 UTC
[Identified] We've repaired the failed database node and added it back into the load balancer rotation. We've also created an additional read-replica, and once it's ready we'll add it too. Latency continues to be an issue.

August 14, 2019 12:47 UTC
[Identified] We are facing degraded performance on GitLab.com. We are working to restore the failed database node. We are tracking the issue in gitlab.com/gitlab-com/gl-infra/production/issues/1054.

August 14, 2019 12:05 UTC
[Identified] We are facing errors and latency in API requests. We are investigating our database read-replica load balancing mechanism, which is not distributing the load as expected when a node leaves the rotation. We are tracking the issue in gitlab.com/gitlab-com/gl-infra/production/issues/1054.

August 14, 2019 11:30 UTC
[Identified] We are investigating our database read-replica load balancing mechanism, which is not distributing the load as expected when a node leaves the rotation. We are tracking the issue in gitlab.com/gitlab-com/gl-infra/production/issues/1054.

August 14, 2019 10:49 UTC
[Identified] We have found performance problems with another read-only database node and are removing it from the cluster. We are tracking the issue in gitlab.com/gitlab-com/gl-infra/production/issues/1054.

August 14, 2019 10:28 UTC
[Identified] We are facing errors and latency in API requests. We are currently restoring the failed database node. We are tracking the issue in gitlab.com/gitlab-com/gl-infra/production/issues/1054.

August 14, 2019 09:54 UTC
[Identified] Some requests to the database are slow. We are currently restoring the failed database node. We are tracking the issue in gitlab.com/gitlab-com/gl-infra/production/issues/1054.

August 14, 2019 09:21 UTC
[Identified] We have identified a database failover and are currently mitigating the side effects. We are tracking the issue in gitlab.com/gitlab-com/gl-infra/production/issues/1054.

August 14, 2019 08:53 UTC
[Investigating] Most of our services experienced a brief downtime, and we are currently investigating.
