Resolved April 03, 2019 at 3:49 pm.
Canvas announced that from 7:00 to 7:34 pm on April 1, 2019, all users taking or attempting to take a quiz encountered an error. Instructure provided the following information regarding the service disruption.
All users taking—or trying to take—a quiz between 00:00 and 00:34 UTC encountered an error when we began writing quiz events to a non-existent database partition.
A process that normally creates a new database partition each month for quiz events to write to failed to run, and the failure went unnoticed. When the clock struck 00:00 UTC on April 1st, every quiz that tried to write events to the non-existent partition failed, which caused this error.
After ten minutes, the initial cause was identified and our DevOps team began manually creating the partition that the automated process had failed to create. The partition was in place and quiz events were writing to it properly by 00:34 UTC, restoring full functionality to quizzes. We also deployed a fix to ensure our automated process properly creates new partitions at the beginning of each month.
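As an illustration only, one way to make monthly partition creation harder to miss is to create partitions a month or more ahead of time with statements that are safe to run repeatedly. The sketch below is not Instructure's actual fix. It assumes PostgreSQL-style declarative range partitioning, and the table name quiz_events, the naming scheme, and all function names are hypothetical.

```python
from datetime import date
from typing import List


def month_start(d: date, offset: int = 0) -> date:
    """Return the first day of the month `offset` months after d's month."""
    index = d.year * 12 + (d.month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def partition_name(table: str, start: date) -> str:
    """Hypothetical naming scheme: <table>_YYYY_MM."""
    return f"{table}_{start:%Y_%m}"


def create_partition_sql(table: str, today: date, months_ahead: int = 2) -> List[str]:
    """Build idempotent DDL for the current month and the next `months_ahead` months.

    Assumes PostgreSQL declarative partitioning on a date or timestamp column.
    IF NOT EXISTS makes each statement safe to re-run on every scheduled job.
    """
    statements = []
    for offset in range(months_ahead + 1):
        start = month_start(today, offset)
        end = month_start(today, offset + 1)
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {partition_name(table, start)} "
            f"PARTITION OF {table} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
    return statements


if __name__ == "__main__":
    for stmt in create_partition_sql("quiz_events", date(2019, 3, 15)):
        print(stmt)
```

Creating partitions ahead of time means a single missed run does not cause an outage at the month boundary, because a later run can still create the partition before it is needed.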
We were surprised by how abruptly this failure appeared, but because it occurred when a number of engineers were still readily available, it was identified and resolved quickly. Because we hadn’t considered this possibility, we had neither monitoring nor checks to help us head it off. Both are being examined as options to proactively alert us when database partitions will be required soon but don’t exist yet.
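A check of the kind described here could, for example, compare the partitions that will be needed within some lookahead window against those that already exist. The following is a minimal sketch under stated assumptions, not the monitoring Instructure adopted. The naming scheme <table>_YYYY_MM, the table name quiz_events, the 14-day window, and the function names are all hypothetical, and the list of existing partitions is assumed to come from a separate catalog query.

```python
from datetime import date, timedelta
from typing import Iterable, List


def required_partitions(table: str, now: date, lookahead_days: int) -> List[str]:
    """Names of the monthly partitions needed from now through now + lookahead_days."""
    last = now + timedelta(days=lookahead_days)
    names = []
    year, month = now.year, now.month
    while (year, month) <= (last.year, last.month):
        names.append(f"{table}_{year:04d}_{month:02d}")
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


def missing_partitions(
    table: str, existing: Iterable[str], now: date, lookahead_days: int = 14
) -> List[str]:
    """Return required partition names that are absent from `existing`."""
    have = set(existing)
    return [n for n in required_partitions(table, now, lookahead_days) if n not in have]


if __name__ == "__main__":
    # In practice `existing` would be read from the database catalog.
    gaps = missing_partitions("quiz_events", ["quiz_events_2019_03"], date(2019, 3, 25))
    if gaps:
        print("ALERT: missing partitions:", ", ".join(gaps))
```

Run on a schedule, a check like this would raise an alert days before the month boundary rather than at the moment writes begin to fail.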
We apologize for the trouble this caused for you and your users. We understand any challenges encountered while using Canvas are frustrating and impact your ability to serve your students and teachers. We do not take lightly the trust you have placed in us. We will continue to apply the best technology and processes to minimize future issues.