A little over a year ago, every Dropbox employee received the same email. We were asked to work from home for two weeks to curb the spread of COVID-19. What we didn’t realize then was that a year later, our homes would still be offices and the way we work would have changed permanently.
For the Dropbox Infrastructure team, the March 5 email was no surprise. Our supply chain team had been monitoring the spread of COVID-19 in China and its potential impact on our datacenter hardware suppliers. We documented our initial response to the crisis in a post last April, “Engineering a disruption tolerant supply chain.”
Today, we’d like to update that story and report on what the Dropbox Infrastructure team has experienced and learned over the last year. By changing how we meet and communicate with our suppliers, we’ve been able to keep those relationships alive, stay ahead of new customer demand, build new datacenters while fully remote, and keep disruptions to our supply chain to a minimum.
Learning to work remotely with suppliers
When the initial work-from-home email went out, we immediately moved our supplier meetings to public settings. Even there, the risk to personal health was becoming more evident every day, and we would soon stop all in-person meetings.
The following week, on March 11, we held what would be our team’s last in-person meeting with our supply base, at a local restaurant. No one could imagine what was to come: Dropbox would become one of the first companies to permanently adopt a Virtual First working environment, and we would have to change the way we interface with our suppliers.
In normal circumstances we’d not only meet our suppliers in person regularly, we’d also keep a presence at their headquarters and manufacturing sites around the world. We often use the spring, summer, and fall to visit our suppliers in Taiwan, Korea, and Thailand. Locally, we typically visit our suppliers’ rack manufacturing lines at least monthly.
The days of the quick in-person coffee or “let’s get business done” lunch were over. Yet dialogue with our suppliers remains paramount to Dropbox’s supply chain. We adjusted by holding high-frequency, sometimes high-intensity virtual meetings with our suppliers. These thinner but more frequent connections have turned out to be the sweet spot for making sure suppliers are clear on Dropbox’s hardware needs and for confirming that they are performing.
We found new ways to use Zoom, running virtual manufacturing line walks and quality inspections that we’d normally have done in person before COVID-19. At first we weren’t very efficient. Virtual line walks and quality reviews, we found, require preparation with our suppliers ahead of time, both to keep the focus on the task at hand and to let our suppliers get ready to be Dropbox’s eyes in the factory. In our first few virtual walk-throughs, we learned that suppliers needed time to set up the best camera angles and views of the hardware we were examining. Getting this right often took many attempts, which made for very long reviews as we gave our suppliers room to adjust and learn in real time.
Looking back, this shouldn’t have surprised us: our suppliers’ core competency is manufacturing, not camera work. They’re makers, not influencers. But together we refined our virtual walk-throughs, and today we can happily say we’ve completed many of them successfully.
At Dropbox we pride ourselves on the relationships we have built up and down our hardware supply chain. They are one of the primary ways we ensure Dropbox can deliver hardware to our growing datacenter footprint, which matters all the more given our smaller scale compared with the other cloud customers our suppliers serve. The supply chain team knew that close communication with our supply base was our only real chance of weathering the storm clouds of supply disruption gathering over us.
Managing growing customer demand
At the same time, we promptly alerted our internal teams to the looming supply disruptions. Our capacity planning team, responsible for forecasting server demand from application owners, began planning for higher usage of Dropbox services as more people worked remotely. We kicked off a global scenario planning process to analyze the impact on Dropbox’s infrastructure: what would we do if factory shutdowns, reduced workforces, and extended lead times disrupted supply across the entire chain?
Our playbook was simple: identify and rank the levers we could pull to meet our availability goals at the lowest possible system cost, trading off CapEx against OpEx while accounting for the engineering hours each lever would take.
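One way to picture this kind of ranking is a greedy pass over candidate levers, ordered by total cost per unit of capacity gained. The sketch below is illustrative only: the `Lever` fields, the lever names and numbers, the hourly rate used to price engineering time, and the greedy rule itself are all assumptions for this example, not the model our capacity planners actually used.

```python
from dataclasses import dataclass


@dataclass
class Lever:
    """A hypothetical capacity lever; all fields are illustrative assumptions."""
    name: str
    capacity: float   # capacity gained, in servers (must be > 0)
    capex: float      # one-time cost
    opex: float       # recurring cost over the planning horizon
    eng_hours: float  # engineering effort needed to pull the lever


def total_cost(lever: Lever, hourly_rate: float) -> float:
    """CapEx plus OpEx plus engineering time priced at an assumed hourly rate."""
    return lever.capex + lever.opex + lever.eng_hours * hourly_rate


def rank_levers(levers: list[Lever], hourly_rate: float) -> list[Lever]:
    """Order levers from cheapest to most expensive per server of capacity."""
    return sorted(levers, key=lambda lv: total_cost(lv, hourly_rate) / lv.capacity)


def plan(levers: list[Lever], capacity_gap: float,
         hourly_rate: float) -> tuple[list[str], float]:
    """Pull levers in ranked order until the capacity gap is covered.

    Greedy selection is a heuristic: it is easy to explain but is not
    guaranteed to find the cheapest combination of levers.
    """
    chosen: list[str] = []
    covered = 0.0
    for lever in rank_levers(levers, hourly_rate):
        if covered >= capacity_gap:
            break
        chosen.append(lever.name)
        covered += lever.capacity
    return chosen, covered


if __name__ == "__main__":
    # Hypothetical levers and costs, for illustration only.
    candidates = [
        Lever("reallocate buffer capacity", capacity=100, capex=0, opex=50, eng_hours=10),
        Lever("buy additional racks", capacity=200, capex=1000, opex=0, eng_hours=0),
        Lever("migrate a service to public cloud", capacity=50, capex=0, opex=100, eng_hours=0),
    ]
    print(plan(candidates, capacity_gap=120, hourly_rate=10))
```

With these made-up numbers, reallocating buffer capacity costs 1.5 per server, cloud migration 2.0, and new racks 5.0, so the plan covers a 120-server gap with the first two levers. A real planning model would also weigh lead times and risk, which this sketch ignores.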
To start, we updated Dropbox’s inventory policy and planning parameters (SLAs) to ensure critical infrastructure services like Magic Pocket and Edgestore had enough free capacity to scale if needed. We also began working proactively with application owners to agree in advance on how new hardware deliveries would be allocated, with options including strategic distribution, fair share, and first in, first out (FIFO) prioritization.
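The allocation policies mentioned above behave quite differently when a delivery falls short of total demand. The sketch below compares them under stated assumptions: the `Request` fields, service names, and quantities are hypothetical; strategic distribution is modeled, as one possible interpretation, by a simple priority rank; and fair share is implemented as max-min fairness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    """A hypothetical hardware request from one application owner."""
    service: str
    servers: int    # servers requested
    submitted: int  # submission order; lower means earlier
    priority: int   # business priority; lower means more critical


def allocate_in_order(requests: list[Request], supply: int,
                      key: Callable[[Request], int]) -> dict[str, int]:
    """Fill requests one at a time, in the order given by `key`.

    Sorting by submission order gives FIFO; sorting by a priority rank is
    one way to model strategic distribution.
    """
    grants: dict[str, int] = {}
    remaining = supply
    for req in sorted(requests, key=key):
        grant = min(req.servers, remaining)
        grants[req.service] = grant
        remaining -= grant
    return grants


def allocate_fair_share(requests: list[Request], supply: int) -> dict[str, int]:
    """Max-min fair share: each service is capped at an equal split of what
    remains, and capacity that small requests don't need flows to larger ones."""
    grants: dict[str, int] = {}
    remaining = supply
    pending = sorted(requests, key=lambda r: r.servers)  # smallest requests first
    for i, req in enumerate(pending):
        share = remaining // (len(pending) - i)  # equal split among those still waiting
        grant = min(req.servers, share)
        grants[req.service] = grant
        remaining -= grant
    return grants


if __name__ == "__main__":
    requests = [
        Request("service-a", servers=50, submitted=0, priority=2),
        Request("service-b", servers=30, submitted=1, priority=1),
        Request("service-c", servers=10, submitted=2, priority=0),
    ]
    delivery = 60  # hypothetical delivery, short of the 90 servers requested
    print("FIFO:      ", allocate_in_order(requests, delivery, key=lambda r: r.submitted))
    print("Priority:  ", allocate_in_order(requests, delivery, key=lambda r: r.priority))
    print("Fair share:", allocate_fair_share(requests, delivery))
```

With a 60-server delivery against 90 servers of demand, FIFO gives the earliest requester everything it asked for and leaves the last one empty-handed, the priority order fully serves the two most critical services, and fair share satisfies the smallest request while splitting the rest evenly (10, 25, 25). Agreeing on the policy before the shortfall arrives is what keeps that conversation from becoming a negotiation under pressure.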
Finally, to manage short-term capacity risks, we identified a set of services that could be migrated to public cloud infrastructure quickly and cost-efficiently. Migrating to the public cloud isn’t as simple as it might sound, but it offloads both demand growth and unexpected surges to a much larger vendor whose business focus is on-demand capacity growth. Together, these actions helped us stabilize our forecast to suppliers and avoid knee-jerk reactions caused by the bullwhip effect, in which a disruption grows as it travels along the supply chain, the way a flick of a whip’s handle causes its tip to lash out painfully at the other end.
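To see why a stable forecast matters, here is a toy model of the bullwhip effect. It assumes that each tier orders what it just saw plus an `overreaction` fraction of the most recent change in demand. That is a deliberately simplified, hypothetical stand-in for real ordering behavior, which also involves lead times and safety stock. The tier count and demand numbers are likewise made up.

```python
def pass_upstream(demand: list[float], overreaction: float) -> list[float]:
    """Orders one tier places upstream, given the demand it observes.

    Each period's order is the observed demand plus `overreaction` times the
    latest change in demand (a hypothetical knob). Orders cannot go negative.
    """
    orders = []
    previous = demand[0]
    for d in demand:
        orders.append(max(0.0, d + overreaction * (d - previous)))
        previous = d
    return orders


def simulate(end_demand: list[float], tiers: int,
             overreaction: float) -> list[list[float]]:
    """Order streams from end-customer demand up through `tiers` supplier tiers."""
    streams = [list(end_demand)]
    for _ in range(tiers):
        streams.append(pass_upstream(streams[-1], overreaction))
    return streams


def swing(stream: list[float]) -> float:
    """Peak-to-trough range of an order stream."""
    return max(stream) - min(stream)


if __name__ == "__main__":
    demand = [100, 100, 120, 120, 120, 120]  # a single 20% step in end demand
    for k in (0.5, 0.0):
        swings = [swing(s) for s in simulate(demand, tiers=3, overreaction=k)]
        print(f"overreaction={k}: swing by tier = {swings}")
```

With `overreaction=0.5`, the 20-unit swing in end demand grows to 30, 45, and 67.5 units across three tiers; with `overreaction=0.0`, every tier sees the same 20-unit swing. Real supply chains are far messier, but the direction matches the intuition above: the steadier the forecast passed upstream, the less the whip cracks.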
These actions ensured that we’d see minimal impact to Dropbox’s capacity SLA/SLO. Most important of all, they ensured no disruption of services to our customers.
Maintaining our datacenters
While the capacity planning team did its work, the datacenter team focused on three areas: maintaining our physical infrastructure uptime SLAs, keeping our essential staff safe when they had to be onsite, and holding to our planned datacenter expansion timelines.
The first two areas were non-negotiable. Dropbox needed to remain reliable for users, especially as they moved to distributed work for an unknown length of time, and it was important to keep our team physically healthy. Eager buy-in from every team was the key factor in our success at balancing these competing goals.
We learned some new ways of working. One valuable example: a rotating skeleton crew working flexible hours proved as effective as our normal 9-to-5 schedule with full staffing. It seems counterintuitive, but our crews pulled together to make it work.
We adopted this flextime skeleton-crew model at all 10 of our datacenter sites, with exceptional results. In all, we tracked government pandemic-safety ordinances across five counties in four states. We partnered closely with our facility managers to keep Dropbox employees as safe as possible and to make sure each site had policies in place that aligned with our own company values. In 2020, we held our repair SLAs constant year over year, and we’re proud to report no workplace COVID transmission.
Growing our datacenters—remotely
Building two new datacenter sites in the midst of the pandemic was a new experience for us, and probably for everyone else who builds them. We agreed that the best option for Dropbox staff was to manage the builds completely remotely, something we’d never done before.
Software developers take it for granted that an all-remote team can deliver, but would it work for building construction?
Without our quality assurance team on site for weekly job walks, we immediately faced quality challenges. So we changed our strategy and required construction teams to upload daily or weekly photos (into our Dropbox shared folders, of course 🙂). This asynchronous visual communication throughout the project, combined with virtual walk-throughs like those we ran with suppliers, let us identify issues early and fix them before they affected schedules.
We’re happy to report: It works! We were 100% virtual with our onsite construction teams, yet we met our deadlines without sacrificing our state-of-the-art design and quality.
Lessons from the past year
Our first objective across the business was to keep living out our company value of making work human. We wanted to be reliable partners to everyone, from our suppliers to our facility managers. But of course, at a human level everyone was worried about their own and their families’ well-being, and everyone in the supply chain had their own changes to adapt to.
Over the past year, we realized we needed to be more flexible with our partners while still delivering for our customers. That meant changing not only where we work, but how. In some cases it meant swimming upstream in our supply chain to better understand the disruptions affecting our vendors and their suppliers. In others it meant changing our mindset about how meetings should work, or bringing remote work best practices from software to hardware manufacturers and datacenter construction crews.
Interested in helping us build a more resilient supply chain or flexible datacenter strategy? We’re hiring and work Virtual First, which means that remote work (outside an office) will be the primary experience for all Dropboxers. Join us!
Thanks to Harish Pushparaj, Chin Lee, and Latane Garetson for helping me capture experiences across Infrastructure.