Here is the trailer to our Server Roundup submission; obviously, we take fun seriously!
Yesterday Uptime Institute announced that AOL won the 2nd annual Server Roundup:
AOL won back-to-back years for overall tally of servers removed. The global Web services company decommissioned 8,253 servers in calendar year 2012. This resulted in (gross) total savings of almost $3 million from reduced utility costs, maintenance and recovery of asset resale/scrap. Environmental benefits were seen in the reduction of more than 16,000 tons of carbon emissions, according to AOL.
This is a great way to recognize two consecutive years of great work by my team on driving down power-hungry, legacy servers. I continue to talk about and prioritize our challenges with reducing technical debt, and the results acknowledged with this award demonstrate the impact we are having. (See Technical Debt, Part 1 and Technical Debt, Part 2.)
This year we are targeting roughly 8,000 servers that are older than our standard 4-year refresh window, most of which are also well above our desired watts-per-server goal.
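As a rough sketch of how a decommissioning target list like this might be assembled, the snippet below filters an inventory by the two criteria mentioned: age beyond the refresh window and power draw above the watts-per-server goal. The inventory records, field names, and thresholds here are hypothetical illustrations, not our actual tooling or figures.

```python
from datetime import date, timedelta

# Hypothetical inventory records; real data would come from an asset database.
inventory = [
    {"host": "web-001", "deployed": date(2007, 5, 1), "watts": 450},
    {"host": "web-002", "deployed": date(2011, 3, 15), "watts": 220},
    {"host": "db-003", "deployed": date(2008, 9, 30), "watts": 510},
]

REFRESH_WINDOW = timedelta(days=4 * 365)  # standard 4-year refresh window
WATTS_TARGET = 300                        # desired watts/server (assumed number)

today = date(2013, 1, 1)

# Candidates for decommissioning: past the refresh window AND power-hungry.
candidates = [
    s for s in inventory
    if today - s["deployed"] > REFRESH_WINDOW and s["watts"] > WATTS_TARGET
]

for s in candidates:
    print(s["host"], s["watts"])
```

In practice a list like this is only the starting point; as noted above, the tedious work is in the discovery, research, and data migration that follow.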
Congratulations and thank you to our application engineers, data center personnel, and the numerous folks involved in the often tedious tasks of discovery, research, and data migration. We are both saving money and building an energy-conscious culture within our technology operations.
In January, during our Technologies All-Hands, I had a 5-minute segment to share a story about innovation and human creativity. I thought I’d convert it to an online version in this post. I hope you enjoy it.
Let’s talk briefly about anthropology…
I recently read a book by Bill Bryson entitled “At Home: A Short History of Private Life”. In it, Bryson writes about a rectory built in 1851 that he and his wife purchased in England. While the premise of the book is to trace the roots of the various rooms and components of a home, and how the 19th-century rectory influenced our modern homes, what I want to share with you today is the theme running through many of the stories Bryson relates about creativity, innovation, and improvement.
It is really astounding how creative humans can be, and how much advancement in technology, living conditions, health, and general quality of life took place in the 18th and early 19th centuries through the Industrial Revolution.
Consider the discovery or invention of gas lights, indoor plumbing, refrigeration, steamships, photography, anesthesia, electricity, mass-produced bars of soap, and push-along lawn mowers. Every one of these improvements started with a bold new way of looking at a problem and someone having the courage to try something new. Many of the inventors died poor, cast out by society or never seeing the value of their ideas recognized within their lifetimes.
Take, for example, Captain James Cook. Scurvy killed nearly half of every crew on a long voyage; an estimated 2 million sailors died of it between 1500 and 1850. Cook discovered that giving citrus juice to sailors prevented scurvy, and on his 1768-1771 circumnavigation of the globe not a single person died of it. Although the British awarded Cook the Copley Medal for his findings, the Navy didn’t adopt them for another generation, when it finally began supplying sailors with citrus juice.
Sometimes the methods of invention were dangerous. Karl Scheele discovered 8 elements, receiving credit for none of them in his lifetime. Part of his discovery method included tasting every substance he worked with, and in 1786 he was found slumped over his workbench, dead from an accidental overdose.
- Data center strategy
- Change management toolset and processes
- Simplify our network
- Improve our application deployment tools and methods
Back in March of this year I made a post about technical debt that made the case for the accumulation of technologies as one way we create new technical debt in Operations. I’m happy to report that the phrase “technical debt” is now firmly embedded in our lexicon, and I often hear people using it as part of a justification to change portions of our ecosystem. The acceptance of this concept further strengthens the importance of having a way to identify and address technical debt. After all, we’re only talking about it because we want to slow the accumulation of this debt and pay it down.
Throughout the course of 2012 we have reduced or virtualized our MySQL footprint by over 25%. You might not expect MySQL to be something I lump into technical debt, given that this is still a core competency and a technology we continue to recommend and support. We started the year with nearly 7,100 instances, many of them supporting overbuilt environments and running on dedicated servers. If left unchecked, there is no doubt this becomes tomorrow’s debt problem.
We won Uptime Institute’s inaugural Server Roundup contest for removing about 9,500 servers from our data centers. This year we’ve kept a similar pace and we could remove north of 8,500 additional servers. I still count over 8,000 servers that are over 4 years old, so this will continue in 2013 as we look to refresh technologies and lower our power consumption.
In this last quarter of the year we’ve made two huge decisions related to legacy technologies and technical debt. We’ve negotiated agreements for maintenance, software licenses, and new hardware that provide us a time horizon for replacing technology that powered critical components of AOL’s ISP business for the past 2 decades. This will be a tremendous amount of work extended over many months. I’m excited about our willingness to tackle this aging technology, which has become a liability, and to improve our service offerings in ways that the current technology stack limits.
Looking ahead to 2013, we’ve already kicked off focused efforts to revamp our approach to networking and revise our long-term data center strategy. We are looking at combining our out-of-band and in-band services to reduce our networking equipment footprint. We are keeping pace with advances in Software-Defined Networking (SDN) to evaluate where we can take advantage of commodity hardware capable of line-speed, rules-based networking decisions. We continue to enhance our private cloud offerings with the expectation of improving collapse ratios from bare-metal servers. Each month we’re paying the interest on our technical debt plus a bit more that is being applied to the principal amount. A great beginning…
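For readers unfamiliar with the term, a collapse ratio is simply the number of bare-metal servers whose workload is absorbed by each virtualization host. A minimal sketch with made-up numbers (purely illustrative, not our actual figures):

```python
# Hypothetical consolidation figures, for illustration only.
bare_metal_retired = 240    # physical servers whose workload moved to the cloud
virtualization_hosts = 20   # hypervisor hosts absorbing that workload

# Collapse ratio: physical servers consolidated per virtualization host.
collapse_ratio = bare_metal_retired / virtualization_hosts
print(f"Collapse ratio: {collapse_ratio:.0f}:1")
```

Improving this ratio is what pays down the debt: the same workload runs on fewer, newer, more power-efficient machines.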
I’m happy to share that we launched production traffic on one of our micro data centers (μDC) last week during my Technology Operations all-hands.
My colleague Michael Manos has shared details about our micro data center work in past posts: AOL’s Data Center Independence Day and AOL Micro-DC adds new capability. At the time of those posts we were concentrating on building new infrastructure capabilities and extending the great work we’ve done with AOL’s cloud environment. The micro data center approach provided us new options for solving our hosting challenges. As we brainstormed about the multitude of ways we could take advantage of these new enclosures, I challenged the team to move a portion of our production traffic to a μDC. To make this meaningful I wanted to pick a significant web property to move, giving credence to the technology we’ve enclosed in the μDC cabinets. What better property to target than our flagship portal www.aol.com?
Now up to 25% of all consumer requests to http://www.aol.com are served from an outdoor μDC located on a concrete pad at our Virginia campus. While this helps to prove the technology we’ve bundled into the enclosure, it also provided a great test of the enclosure’s weatherproofing. Throughout Hurricane Sandy the micro data center was pounded with rain and heavy winds; it performed admirably, and we experienced no issues with the external enclosure. The graph below shows the percentage of traffic served from the μDC over time. The fluctuations are due to shifting traffic patterns throughout the day based on geo-location.
Only minor configuration changes were required to install the application on 131 virtual machines, which is a credit to the flexible design built by our development team. The only significant difference between aol.com running in our traditional data centers versus our micro data center is the use of virtualized instances of MySQL. Serving aol.com proves that we can take any web product built to run in multiple data centers and easily extend it horizontally to our μDC.
We have other uses for micro data centers in the works, and demonstrating this milestone of serving consumers’ web requests as part of our http://www.aol.com ecosystem means we’re well on our way to seeing them come to fruition.
Updated: December 22, 2012
Here is the link to the AWS SES session from the conference.
Original Post below: October 16, 2012
I’m honored to make a cameo at Amazon’s AWS re:Invent conference this coming November. I’ll be participating in a session related to AWS’ Simple Email Service (SES). Chris Wheeler (AWS) and I have been talking for some time about anti-spam efforts and collaborative ways we could improve both AOL’s email service and AWS delivery to AOL consumers. The SES service follows many of AOL’s recommended best practices for email delivery, and it should be an informative forum.
I’m going to talk a bit about how AOL defines spam and highlight some best practices our anti-spam team recommends for email senders. If you happen to be at Amazon Web Services’ first customer/partner conference, please stop by.
I really find value in spending time talking to the individuals in my organization, not just about the project of the moment, but about their ideas, thoughts, and frustrations. I’ve found that by knowing more about people’s passions and hobbies, I develop a better understanding of their motivations and the thinking behind the work they do. I also value it when my bosses take time to learn a bit about me and who I am, and I’m hoping others will feel the same.
I’m writing this on my return trip to Virginia after spending a few days in Northern California, where I was meeting with 4 individuals from different teams in my org. The genesis of my trip was to spend time with each of them following recent org changes I’ve made, to assure them that their thoughts are both encouraged and expected, and that their involvement is crucial to getting the next steps right. These talks are similar to many other conversations I have with my staff, though usually more informal and happenstance. There are so many valuable things I learn from these talks, and the feedback I get indicates individuals appreciate the time and information I’m able to share in these settings.
Today I’m excited to announce that I’ll be adding a slightly more formal approach to these conversations. I’m kicking off what I call “The 1% Club”. Over the next 12 months I’m committing to spend at least 1 hour with every member of my organization, in small groups that each represent between 1% and 2% of my staff. That works out to 50 separate 1-hour meetings with 4-8 people at a time, a size that allows a more intimate conversation than team meetings generally provide. Additionally, I’ll be hand-crafting each attendee list so that each group consists of people who don’t normally get to work closely together.
I intend this to be a loosely structured conversation, but I will moderate the discussions and prefer to mostly listen rather than talk. I’ll also be mindful of the extrovert tendency to dominate the session and reserve the right to halt conversation on a topic so we get to hear from everyone. I am going to identify a few key messages in each meeting and collect and collate them to share with the larger org. Topics discussed can help inform our future efforts, identify and uncover obstacles to overcome, and ensure we’re doing a better job connecting the business roadmap to the work we’re all busy doing. We won’t talk about individual career development, at least in specific terms, or confidential matters that would be best handled in our normal manager-to-employee 1-on-1s. Otherwise, the agenda is a stream-of-consciousness, fluid, in-the-moment opportunity for participants.
- Hear from employees directly on their view of the business, our central Technology Operations organization, and what is and isn’t working for them.
- Determine how connected to the larger company people believe their tasks and work are.
- Learn from peer teammates and better understand what others are working on; make introductions and build better “one team” efforts.
- Learn what motivates and energizes my staff, what they are passionate about, and what their “hot buttons” are.
I’ll follow up this post as I progress and share my thoughts on the impact this effort makes, but now I’m off to schedule a number of upcoming meetings.