Amazon believes the future of data centers depends on the technical problem it just solved

Over time, the technology industry has developed and implemented variations of the fat tree architecture. But the design has room for improvement. It is generally reliable, but also rigid and inefficient, and it requires elaborate wiring. As in actual, physical cables.

If you’ve ever been to a data center or server room in an office building, you’ve probably seen nests of colorful cables spilling out of metal racks. Rehder says cabling is one of the largest network costs, and Amazon’s global data centers are currently connected by 20 million kilometers of fiber-optic cables. That’s roughly the distance of 25 round trips between Earth and the Moon.

In 2012, faced with an explosion in demand for cloud computing services, a group of researchers from the University of Illinois Urbana-Champaign, including Godfrey, introduced a concept known as Jellyfish. The fixed network designs used at that time could hardly meet the growing demand, so the researchers proposed “a high-capacity network interconnect which, thanks to the adoption of a random graph topology, is naturally subject to gradual expansion.” They believed that this random approach could be more efficient and scalable than networks built using a fat tree architecture.

“We named it Jellyfish because it’s liquid,” Godfrey says. “You can randomly combine routers and switches, creating a flexible pool of network bandwidth that is very efficient.”

However, Jellyfish also introduced new challenges in terms of layout, data routing, and cabling. Godfrey argues that routing in random graphs is more challenging because there are many more varied paths that data can take from source to destination. Cabling is more challenging because the cable endpoints are chosen randomly.
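To make the idea concrete, here is a minimal Python sketch of a Jellyfish-style random topology. It is an illustration under stated assumptions, not the researchers' code: the function names (`build_random_topology`, `add_switch`, `hop_counts`) and the toy sizes are made up for this example, and real designs also have to account for servers, link capacities, and physical layout. It wires switches together at random, adds one more switch by splicing it into existing cables, and counts hops between switches.

```python
import random
from collections import deque


def build_random_topology(num_switches, ports_per_switch, seed=0):
    """Randomly cable switches until no two unlinked switches both have a free port."""
    rng = random.Random(seed)
    links = {s: set() for s in range(num_switches)}
    while True:
        # Every pair of switches that could still be joined by one more cable.
        candidates = [
            (a, b)
            for a in links
            for b in links
            if a < b
            and b not in links[a]
            and len(links[a]) < ports_per_switch
            and len(links[b]) < ports_per_switch
        ]
        if not candidates:
            return links
        a, b = rng.choice(candidates)
        links[a].add(b)
        links[b].add(a)


def add_switch(links, ports_per_switch, rng):
    """Grow the network by one switch (assumes switches are numbered 0..n-1)."""
    new = len(links)
    links[new] = set()
    while len(links[new]) + 2 <= ports_per_switch:
        # Existing cables whose two endpoints are not yet attached to the new switch.
        spliceable = [
            (a, b)
            for a in links
            for b in links[a]
            if a < b and new not in (a, b)
            and a not in links[new] and b not in links[new]
        ]
        if not spliceable:
            break
        a, b = rng.choice(spliceable)
        # Remove cable a-b, then run cables a-new and new-b in its place.
        links[a].discard(b)
        links[b].discard(a)
        for end in (a, b):
            links[end].add(new)
            links[new].add(end)
    return new


def hop_counts(links, source):
    """Breadth-first search: fewest hops from source to each reachable switch."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


if __name__ == "__main__":
    topology = build_random_topology(num_switches=20, ports_per_switch=4, seed=1)
    added = add_switch(topology, ports_per_switch=4, rng=random.Random(2))
    print("new switch", added, "is cabled to", sorted(topology[added]))
    print("hops from switch 0:", hop_counts(topology, 0))
```

Because every cable endpoint is picked at random, no two runs with different seeds produce the same wiring plan, which hints at why cabling and routing are harder here than in a fixed, layered design.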

A few years later, Google started playing with another solution: It began integrating optical circuit switching, or OCS, into its network designs. This approach uses tiny mirrors to reflect light from an input port to an output port, allowing Google to reconfigure optical cabling in real time. But this, too, adds engineering complexity as well as cost.

Courtesy of Amazon

So random

Meanwhile, Amazon was searching for the “holy grail,” says Giacomo Bernardi, one of the lead authors of the new paper, along with Amazon researchers Ratul Mahajan and Seshadhri Comandur. In an ideal world, the data network would be flat and efficient, resistant to hardware failures, random enough to maximize performance, and scalable enough to be expanded without becoming unwieldy. It would also rely on simpler, streamlined cabling rather than increasingly elaborate fiber-optic systems.

When he and his colleagues began trying to build such a network, Bernardi says, he was already obsessed with Penrose tiling, a type of aperiodic tiling named after the British physicist Roger Penrose. (Other researchers were so inspired by Penrose’s tiles that they tried to translate these patterns into error-correcting codes for quantum computers.) Bernardi wondered if Amazon could apply a similar design and create a flat “grid” by following a repeating pattern. He and his team tried to build a simulation of what this might look like.