• Infrastructure Update

      Hi, everyone. I wanted to run through a few of the things our engineers have been working on behind the scenes.

      Our #1 priority has been bringing your blogs back to 100% uptime. We project the graph below on our wall, showing us exactly how many error messages we are serving instead of real pages.

      We live and die by this chart, and we’ve been pulling nights and weekends working to remove the bottlenecks that have made “over capacity” errors routine. The list of systems we’ve tackled includes everything from networking equipment to database queries to the deep dark insides of the Linux kernel. (If working on these kinds of challenges gets your pulse up, we want to hear from you.)

      As we break through our past bottlenecks, we are simultaneously faced with growth like we’ve never seen before. Our challenge is not only to support the current audience of 55 million, but also to get ahead of more than 250 million new pageviews each week.

      Still, we’ve made incredible progress in improving our infrastructure. We’re not out of the woods yet, but we expect these coming weeks to be much smoother than the past few rocky months. We’re incredibly confident in our ability to scale to serve all of the visitors to your awesome blogs.

      A couple of specific issues I want to cover:

      The Queue feature has been disabled more often than not over the last two weeks. We hit an incredibly difficult problem with the way the Queue processes handle their publishing step, which forced us to unwind and rewrite a big chunk of our publishing routine. The Queues have been completely restored for the last couple of days, and we don’t foresee any more issues.

      This morning we suffered an outage for nearly an hour starting at about 7 a.m. EST. The proxy server that handles the vast majority of incoming requests failed, and its automatic backup also failed because of an unrelated networking issue. It took us longer than we had hoped to correct the multiple causes, but we’ve put systems into place to make sure that even another double failure won’t cause an outage like this in the future.

      If you ever run into issues, please don’t hesitate to contact support. And you can check known issues via our Twitter feed or Help page.

      Thank you for your patience and support through all of this. To make up for it, we have some absolutely epic product updates around the corner.