I just wanted to take a few minutes to comment on some of the recent issues we’ve had over at the UbuntuForums. Last Thursday around 4pm EST, our main database server went into a “crashed” state. More than 2000 queries had been lingering without completing for over 8000 seconds, which caused resource issues; the mysqld process became unresponsive and would not shut down cleanly. After the crash we realized we had some inconsistent data in the database, and a check/repair was run to try to correct the issues.
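As a rough illustration of how a pile-up like this can be spotted, here is a minimal Python sketch that filters process-list rows for queries that have been running longer than a threshold. It is not the tooling used during the outage. The rows are assumed to be dictionaries keyed by the column names that MySQL's SHOW FULL PROCESSLIST reports (Id, Command, Time, Info); the function name, the sample data, and the 8000-second default are all hypothetical.

```python
def long_running(rows, threshold_seconds=8000):
    """Return the rows for queries that have run longer than the threshold.

    `rows` is assumed to be an iterable of dicts shaped like the output of
    SHOW FULL PROCESSLIST (keys: Id, Command, Time, Info). Idle connections
    (Command == "Sleep") are ignored. Results are sorted longest-first.
    """
    stuck = [
        row for row in rows
        if row["Command"] == "Query" and row["Time"] > threshold_seconds
    ]
    return sorted(stuck, key=lambda row: row["Time"], reverse=True)


# Hypothetical sample data, for illustration only.
sample_rows = [
    {"Id": 101, "Command": "Query", "Time": 8420, "Info": "SELECT ..."},
    {"Id": 102, "Command": "Sleep", "Time": 9000, "Info": None},
    {"Id": 103, "Command": "Query", "Time": 12, "Info": "SELECT ..."},
    {"Id": 104, "Command": "Query", "Time": 9100, "Info": "UPDATE ..."},
]

stuck_ids = [row["Id"] for row in long_running(sample_rows)]
```

With the sample data above, only connections 104 and 101 qualify: 102 is idle and 103 has only been running for a few seconds.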
On Sunday evening around 11:30pm EST the forums went down again, this time with table corruption on our two largest tables: the post table (around 6.5 million rows) and the thread table (over 1 million rows). I’m assuming everything wasn’t OK after the crash on Thursday, even though the checks said it was. Monday was spent cleaning up the tables, fixing corruption issues, etc. Unfortunately, because of the massive size of our tables, each check and repair takes upwards of 4-5 hours, and this had to be repeated a few times before the issues were mostly resolved. We’ll be monitoring data integrity over the next couple of days to ensure everything stays consistent.
When the forums were ready to come back online last night around 7pm EST, we ran into issues with the amount of traffic the site was getting (benefits of being popular, eh?). To ease the pressure on our main database server we have enabled the “slave” feature in vBulletin, which sends some queries to a slave server to reduce load on the master. So far this has been working pretty well, except for a few hiccups that were easily corrected by changing MySQL variables.
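The general idea behind a master/slave split can be sketched in a few lines of Python. This is only an illustration of read/write routing under simple assumptions, not how vBulletin implements its slave feature: the function name, the set of read-only verbs, and the column names in the example queries are hypothetical.

```python
# Statements starting with one of these verbs are treated as read-only.
# This set is an assumption made for the sketch, not an exhaustive list.
READ_ONLY_VERBS = {"SELECT", "SHOW"}


def choose_server(sql: str) -> str:
    """Pick which server a statement should go to.

    Read-only statements go to the slave; everything else (writes, and
    anything we cannot classify, such as an empty string) goes to the master.
    """
    words = sql.split()
    verb = words[0].upper() if words else ""
    return "slave" if verb in READ_ONLY_VERBS else "master"


# Hypothetical queries against the post and thread tables.
read_target = choose_server("select * from post where threadid = 1")
write_target = choose_server("  UPDATE thread SET views = views + 1")
```

Defaulting to the master for anything unrecognized is the conservative choice here. A real setup would also need to account for replication lag, since a read sent to the slave immediately after a write may not see that write yet.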
James Troup from Canonical was a great help making sure the proxies were stable last night, and he will be getting us additional hardware to help with our explosive growth, as we have outgrown our current servers. James, thank you for your help; it is appreciated.
Until things stabilize we have disabled a couple of popular features on the forums, namely the “Thank you” and the “Solved Post” features. We hope to re-enable them soon. I expect we’ll have some rough patches ahead of us until the new hardware is in place, and we look forward to getting things more stable in the near future. We’ll also be implementing a better way of communicating with our users when the forums are offline; expect an announcement on the forums about that soon. We appreciate everyone’s patience during this time.
I also posted this on the forums here.