Between May 19, 2020 22:47 PST and May 20, 2020 02:51 PST, we experienced an incident impacting several video services.
Session-creation API calls and archive requests failed intermittently on both Enterprise and Standard environments. Other services that were unavailable as a result of this incident included Broadcast, SIP and Session Monitoring, and the logging service to the Account Portal and Developer Tools, including Inspector and Playground.
Most services were recovered at 00:58 PST. The logging service to the Account Portal and Developer Tools was recovered at 02:51 PST.
Ongoing sessions, and sessions created before the incident began that required no further API calls, were not impacted.
One of the servers hosting the non-persistent database used to provide Video API services experienced a kernel error. As a result, the server triggered an automatic failover to two master databases, which is the expected behavior. One of the masters became operational, but the other experienced a configuration problem and remained unavailable.
Consequently, all services requiring interaction with the impacted server were affected. Services interacting with the operational master were set up successfully and were not affected.
Most services were recovered after a migration away from the current data center provider. The Account Portal and Developer Tools were fully recovered once the servers hosting these services were able to access the new database nodes.