The deep neural network models that power today's most demanding machine learning applications have become so large and complex that they are pushing the boundaries of conventional electronic computing hardware.
Photonic hardware, which can perform machine-learning computations with light, offers a faster and more energy-efficient alternative. However, there are certain types of neural network computations that a photonic device cannot perform, requiring the use of off-chip electronics or other techniques that limit speed and efficiency.
Based on a decade of research, scientists at MIT and elsewhere have developed a novel photonic chip that overcomes these obstacles. They demonstrated a fully integrated photonic processor that can perform all the key computations of a deep neural network optically on-chip.
The optical device was able to perform the key computations of a machine-learning classification task in less than half a nanosecond while achieving over 92% accuracy, performance comparable to conventional hardware.
The chip, which consists of interconnected modules that form an optical neural network, is manufactured using commercial foundry processes, which could enable the technology to be scaled and integrated into electronics.
In the long term, the photonic processor could lead to faster and more energy-efficient deep learning in computationally intensive applications such as lidar, research in astronomy and particle physics, and high-speed telecommunications.
“In many cases, what matters is not only how well the model works, but also how quickly you can get an answer. Now that we have an end-to-end system that can run a neural network optically on nanosecond timescales, we can start thinking at a higher level about applications and algorithms,” says Saumil Bandyopadhyay ’17, MEng ’18, PhD ’23, a visiting scientist in the Quantum Photonics and AI Group at the Research Laboratory of Electronics (RLE), a postdoc at NTT Research, Inc., and the lead author of a paper on the new chip.
Bandyopadhyay is joined on the paper by Alexander Sludds ’18, MEng ’19, PhD ’23; Nicholas Harris PhD ’17; Darius Bunandar PhD ’19; Stefan Krastanov, a former RLE research scientist who is now an assistant professor at the University of Massachusetts Amherst; Ryan Hamerly, a visiting scientist at RLE and senior scientist at NTT Research; Matthew Streshinsky, a former director of silicon photonics at Nokia who is now co-founder and CEO of Enosemi; Michael Hochberg, president of Periplous, LLC; and Dirk Englund, a professor in the Department of Electrical Engineering and Computer Science, principal investigator of the Quantum Photonics and Artificial Intelligence Group and of RLE, and senior author of the paper. The research is published today in Nature Photonics.
Machine learning with light
Deep neural networks consist of many interconnected layers of nodes, or neurons, that operate on input to produce an output. One of the key operations in a deep neural network involves using linear algebra to perform matrix multiplication, which transforms data as it passes from layer to layer.
However, in addition to these linear operations, deep neural networks perform nonlinear operations that help the model learn more intricate patterns. Nonlinear operations such as activation functions give deep neural networks the power to solve complex problems.
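To make the two kinds of operations concrete, here is a minimal NumPy sketch of a single network layer; the layer sizes, the bias term, and the ReLU activation are illustrative choices, not details of the photonic chip.

```python
import numpy as np

def layer(x, W, b):
    """One deep-network layer: a linear transform followed by a nonlinearity."""
    z = W @ x + b             # linear operation: matrix-vector multiplication
    return np.maximum(z, 0)   # nonlinear operation: ReLU activation

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))   # illustrative layer weights
b = rng.normal(size=4)        # illustrative bias
x = rng.normal(size=8)        # input vector
print(layer(x, W, b))
```

On the photonic processor, the matrix multiplication is what the beam-splitter meshes implement in light, while the activation is the step that previously had to leave the chip.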
In 2017, Englund's group, along with researchers in the lab of Marin Soljačić, the Cecil and Ida Green Professor of Physics, demonstrated an optical neural network on a single photonic chip that could perform matrix multiplication with light.
However, at that time the device could not perform nonlinear operations on the chip. Optical data had to be converted into electrical signals and sent to a digital processor to perform nonlinear operations.
“Nonlinearity in optics is quite a challenge because photons do not interact with each other very easily. This makes triggering optical nonlinearities very energy-intensive, so building a system that can do it in a scalable way becomes a challenge,” explains Bandyopadhyay.
They overcame this challenge by designing devices called nonlinear optical function units (NOFUs), which combine electronics and optics to implement nonlinear operations on the chip.
The researchers built an optical deep neural network on a photonic chip using three layers of devices that perform linear and nonlinear operations.
Fully integrated network
First, their system encodes the parameters of a deep neural network into light. Then, a mesh of programmable beam splitters, like the one demonstrated in the 2017 paper, performs matrix multiplication on these inputs.
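Such programmable beam-splitter meshes are typically built from 2×2 Mach-Zehnder interferometers (MZIs): each MZI applies a small unitary transform, and cascading many of them composes a larger matrix. The sketch below uses one common MZI convention; the chip's exact parameterization is not described in this article.

```python
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # ideal 50:50 beam splitter

def mzi(theta, phi):
    """2x2 Mach-Zehnder unitary in one common convention (an assumption here)."""
    inner = np.diag([np.exp(1j * theta), 1.0])   # internal phase shifter
    outer = np.diag([np.exp(1j * phi), 1.0])     # external phase shifter
    return BS @ inner @ BS @ outer

U = mzi(0.7, 1.3)
print(np.allclose(U.conj().T @ U, np.eye(2)))    # True: the transform is unitary
```

Each element is unitary, so the composed mesh is too, and tuning the phase shifters programs which matrix the light experiences.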
The data then passes to programmable NOFUs, which implement nonlinear functions by siphoning off a small amount of light to photodiodes that convert optical signals into electric current. Because this process eliminates the need for an external amplifier, it consumes very little energy.
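The article does not give the NOFU's exact transfer function, so the following is only a conceptual model: it assumes a small fraction of the light is tapped to a photodiode and that the resulting photocurrent produces a simple saturating change in the transmission of the remaining light. Both the tap ratio and the response curve are illustrative assumptions.

```python
import numpy as np

def nofu(amplitude, tap_ratio=0.1):
    """Conceptual NOFU model; the tap ratio and the saturating
    transmission curve are assumptions, not the device's measured response."""
    tapped = tap_ratio * np.abs(amplitude) ** 2     # power diverted to the photodiode
    transmission = 1.0 / (1.0 + tapped)             # assumed electro-optic modulation
    return np.sqrt(1.0 - tap_ratio) * amplitude * transmission

x = np.linspace(0.0, 3.0, 7)
print(nofu(x))   # output grows sublinearly with input: a nonlinear activation
```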
“We stay in the optical domain the whole time, until the very end, when we want to read out the answer. This allows us to achieve very low latency,” says Bandyopadhyay.
This low latency enabled them to efficiently train a deep neural network on the chip, a process known as in situ training that typically consumes enormous amounts of energy on digital hardware.
“This is particularly useful in systems where domain-specific optical signals are processed, such as navigation or telecommunications, but also in systems where you want to learn in real time,” he says.
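The article does not detail the training procedure, but one common way to train hardware in situ is with perturbation-based (zeroth-order) updates, which require only forward passes through the physical system. In the sketch below, a toy function stands in for the chip's forward pass; every name and value is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def chip_forward(params, x):
    """Stand-in for the photonic forward pass (a toy function, not the chip's physics)."""
    return np.tanh(params @ x)

def loss(params, x, target):
    return np.sum((chip_forward(params, x) - target) ** 2)

params = rng.normal(size=(2, 3))
x = rng.normal(size=3)
target = np.array([0.5, -0.5])

eps, lr = 1e-3, 1e-2
for step in range(500):
    delta = rng.choice([-1.0, 1.0], size=params.shape)   # random +/-1 perturbation
    # Simultaneous-perturbation gradient estimate: just two forward passes.
    g = (loss(params + eps * delta, x, target)
         - loss(params - eps * delta, x, target)) / (2 * eps) * delta
    params -= lr * g

print(loss(params, x, target))   # far below its starting value
```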
The photonic system achieved over 96% accuracy during training tests and over 92% accuracy during inference, which is comparable to conventional hardware. Additionally, the chip performs key computations in less than half a nanosecond.
“This work shows that computing, in essence the mapping of inputs to outputs, can be compiled onto new architectures of linear and nonlinear physics that enable fundamentally different scaling laws of computation versus the effort required,” Englund says.
The entire circuit was manufactured using the same infrastructure and foundry processes used to produce CMOS computer chips. This could enable large-scale chip production using proven techniques that introduce very few errors into the manufacturing process.
Bandyopadhyay says the main goal of future work will be to scale the device and integrate it with real-world electronics such as cameras and telecommunications systems. Additionally, researchers want to explore algorithms that can leverage the benefits of optics for faster system training and improved energy efficiency.
This research was funded in part by the U.S. National Science Foundation, the U.S. Air Force Office of Scientific Research, and NTT Research.