German philosopher Fredrich Nietzsche once said that “invisible threads are the strongest bonds.” You can think of “invisible threads” as connecting related entities such as houses along a supplier’s route, or more nebulous entities such as transactions in a financial network or users in a social network.
Computer scientist Julian Shun explores these types of multifaceted but often undetectable connections using graphs in which objects are represented as points or vertices and the relationships between them are modeled using line segments, or edges.
Shun, a newly appointed associate professor in the Department of Electrical Engineering and Computer Science, designs graph algorithms that can be used to find the shortest path between homes along a supplier’s route or detect fraudulent transactions by malicious actors in a financial network.
However, as data volumes have increased, such networks have grown to encompass billions or even trillions of objects and connections. To find productive solutions, Shun creates high-performance algorithms that utilize parallel computing to quickly analyze even the most enormous graphs. Because parallel programming is extremely tough, he also develops user-friendly programming platforms that make it easier for others to write their own productive graph algorithms.
“If you search for something on a search engine or on a social network, you want to get results very quickly. If you’re trying to identify fraudulent financial transactions at a bank, you want to do it in real time to minimize damage. Parallel algorithms can speed things up by using more computing resources,” explains Shun, who is also a principal investigator at the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Such algorithms are often used in online recommendation systems. Search for a product on an e-commerce site and you’ll likely quickly see a list of related products you can add to your cart. This list is generated using graph algorithms that utilize parallelism to quickly find related items across a huge network of users and available products.
Campus Connections
As a teenager, Shun’s only experience with computers was in high school, where he studied web development. More interested in mathematics and science than in technology, he intended to specialize in one of these subjects as he began his undergraduate studies at the University of California, Berkeley.
However, during his freshman year, a friend recommended he take an introductory computer science class. Although he wasn’t sure what to expect, he decided to sign up.
“I fell in love with programming and algorithm design. I switched to computer science and never looked back,” he recalls.
This introductory computer science course was self-paced, so Shun learned most of the material on his own. He liked the logical aspects of developing algorithms and the brief feedback loop in computer science problems. Shun could enter his solutions into the computer and immediately see whether he was right or wrong. And mistakes in wrong solutions will lead him to the right answer.
“I always thought that building things was cool, and in programming you build solutions that do something useful. This appealed to me, he adds.
After graduating, Shun spent some time in industry, but soon realized he wanted to pursue an academic career. He knew that at university he would have the freedom to study problems that interested him.
Getting into the charts
He began postgraduate studies at Carnegie Mellon University, where he focused his research on applied algorithms and parallel computing.
As a student, Shun took theoretical algorithms classes and practical programming courses, but the two worlds did not connect. He wanted to conduct research that combined theory and application. Parallel algorithms worked perfectly.
“In parallel computing, you need to ensure practical applications. The goal of parallel computing is to speed things up in real life, so if the algorithms aren’t fast in practice, they’re not very useful,” he says.
At Carnegie Mellon, he encountered graph datasets, in which objects in a network are modeled as vertices connected by edges. He was attracted to the many applications of this type of data sets and the difficult problem of developing efficient algorithms to handle them.
After completing his postdoctoral fellowship at Berkeley, Shun was looking for a faculty position and decided to join MIT. He had collaborated with several MIT faculty members on research in parallel computing and was excited to join an institute with such broad expertise.
In one of his first projects after joining MIT, Shun teamed up with Department of Electrical Engineering and Computer Science professor and CSAIL fellow Saman Amarasinghe, an expert in programming languages and compilers, to develop a graph processing programming environment known as GraphIt. The easy-to-use framework that generates efficient code from high-level specifications performed about five times faster than the next best approach.
“It was a very fruitful cooperation. I wouldn’t be able to create such an effective solution if I worked alone,” he says.
Shun also expanded his research interests to include clustering algorithms, which aim to group related data points together. He and his students create parallel algorithms and frameworks to quickly solve complex clustering problems, which can be used in applications such as anomaly detection and community detection.
Dynamic problems
Recently, he and his colleagues have been focusing on dynamic problems in which data in a graph network changes over time.
When a dataset contains billions or trillions of data points, running an algorithm from scratch to make one small change can be extremely expensive from a computational point of view. He and his students design parallel algorithms that process multiple updates simultaneously, improving performance while maintaining accuracy.
However, these dynamic problems also represent one of the biggest challenges that Shun and his team have to work on. Because there aren’t many dynamic datasets to test algorithms, the team often has to generate synthetic data, which may not be realistic and can make it difficult for algorithms to perform in the real world.
Ultimately, his goal is to develop dynamic graphics algorithms that perform efficiently in practice while still meeting theoretical guarantees. Thanks to this, they will be applicable in a wide range of applications, he says.
Shun expects that research on dynamic parallel algorithms will receive even greater emphasis in the future. As data sets become larger, more complex, and change more rapidly, researchers will need to develop more efficient algorithms to keep pace.
He also expects advances in computer technology to bring new challenges as researchers will need to design new algorithms to take advantage of the properties of novel hardware.
“That’s the beauty of research — I can try to solve problems that other people haven’t solved before and contribute something useful to society,” he says.