MIT researchers teach AI models to interpret graphs

To accelerate and improve decision-making in a dynamic global marketplace, enterprises can deploy generative artificial intelligence models that help summarize and interpret the charts that often populate market summaries and financial reports.

However, even the latest vision-language models sometimes struggle with this task because it requires integrating visual, numerical, and linguistic understanding. A company investing in a cutting-edge model may still receive misleading or incomplete information.

To close this performance gap, researchers at MIT and the MIT-IBM Computing Research Lab have developed a multifaceted resource for AI users, designed specifically to teach vision-language models (VLMs) how to interpret graphs effectively.

They used a novel data generation method to build a state-of-the-art dataset, called ChartNet, with more than a million different graphs. The dataset also encodes many of the visual, linguistic, and numerical components of each graph image, which enables models to reason reliably about the information in the graph.

By enabling open-source models to better compete with their commercial counterparts, ChartNet can help small businesses with limited budgets more easily leverage AI. The open-source dataset can be used to improve the capabilities of AI models for tasks such as analyzing business trends and interpreting scientific data.

“We designed ChartNet as a one-stop shop for graphing, covering essentially everything an AI model and the practitioners who train that model might need. We hope our work will motivate researchers to achieve state-of-the-art performance using smaller models that don’t require infinite computation,” says Jovana Kondic, an electrical engineering and computer science (EECS) graduate student at MIT and lead author of the ChartNet paper.

She was joined on the paper by a number of co-authors from MIT, the MIT-IBM Computing Research Lab and IBM Research, including Pengyuan Li, a research associate at IBM Research; Dhiraj Joshi, senior scientist at IBM Research; Isaac Sanchez, software engineer at IBM Research; Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, director of the MIT-IBM Computing Research Lab, and senior research fellow at the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Rogerio Feris, principal scientist and manager at the MIT-IBM Computing Research Lab. The research results will be presented at the IEEE Computer Vision and Pattern Recognition Conference.

A dataset bottleneck

Scientists have made great strides in developing generative artificial intelligence models that excel in natural language processing and reasoning about natural images. However, less work has focused on interpreting the complex multimodal data contained in charts, Kondic says.

But for companies large and small in almost every industry, understanding charts is a critical task.

“The financial industry thrives on graphs. If vision-language models can extract information from graphs, such as trend descriptions, it facilitates many of the workflows that take place downstream,” says Joshi.

The lack of high-quality training data is a major bottleneck hindering the development of VLMs that can accurately interpret graphs. Many datasets contain a limited number of graph images downloaded from the internet and often lack the scale and additional information needed to help the model interpret the underlying data.

“A vision-language model, unlike our brains, may need to see thousands of examples during training to reliably recognize something in the form of a line graph,” Kondic says.

Scientists have tried to overcome these shortcomings by generating synthetic data. Synthetic data is artificially generated by algorithms to mimic the statistical properties of real data.

The ChartNet dataset contains more than one million high-quality chart images, each paired with the corresponding code used to generate the chart, a text description, and a table containing numerical information. Additionally, each data point contains question-and-answer pairs that train the model to correctly answer questions about the graph image.

“These additional data modes guide the model to combine and align the various information encoded in the graph image,” says Kondic.
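As a rough illustration of how those aligned modalities might sit together in one record, here is a minimal sketch. The class name, field names, prompts, and file path are hypothetical and are not taken from ChartNet's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ChartSample:
    """One hypothetical record; field names are illustrative, not ChartNet's schema."""
    image_path: str      # rendered chart image
    plotting_code: str   # code that generated the chart
    description: str     # text summary of the chart
    table: list          # underlying numerical data, one dict per row
    qa_pairs: list = field(default_factory=list)  # (question, answer) tuples


def to_training_pairs(sample: ChartSample) -> list:
    """Turn each modality into a (prompt, target) pair tied to the same image."""
    pairs = [
        ("Write code that reproduces this chart.", sample.plotting_code),
        ("Describe this chart.", sample.description),
        ("Extract the data table from this chart.", repr(sample.table)),
    ]
    pairs.extend(sample.qa_pairs)
    return pairs


# Made-up example values, for illustration only.
example = ChartSample(
    image_path="charts/revenue_0001.png",
    plotting_code="plt.bar(['Q1', 'Q2'], [10.0, 12.5])",
    description="Revenue rose from 10.0 in Q1 to 12.5 in Q2.",
    table=[{"quarter": "Q1", "revenue": 10.0}, {"quarter": "Q2", "revenue": 12.5}],
    qa_pairs=[("Which quarter had higher revenue?", "Q2")],
)
training_pairs = to_training_pairs(example)
```

Because every pair points back to the same image, a model trained on such records sees the chart's code, description, table, and questions as different views of one underlying graph.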

Data generation

To build ChartNet, researchers created a two-step process for generating synthetic data.

First, their automated system translates any pre-existing set of chart images into code. The system then iteratively extends this code to change various aspects of each chart, such as the chart type, data values, topic, colors, etc.

“We can start with a single graph, which we use as a seed, and develop hundreds of extensions to it. In this way, we managed to build a dataset with over a million different images,” explains Kondic.
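The seed-and-extend idea can be sketched in a few lines. This is only a toy stand-in under stated assumptions: the article does not describe how the researchers' system rewrites code, so the sketch uses random perturbations of a hypothetical chart specification (`SEED_SPEC`, `vary`, `to_code`, and `augment` are all invented names) and emits matplotlib scripts as strings.

```python
import random

# Hypothetical seed chart, standing in for code recovered from a real chart image.
SEED_SPEC = {
    "chart_type": "bar",
    "title": "Quarterly revenue",
    "labels": ["Q1", "Q2", "Q3", "Q4"],
    "values": [10.0, 12.5, 11.0, 14.0],
    "color": "tab:blue",
}

CHART_TYPES = ["bar", "plot", "scatter"]  # matplotlib.pyplot function names
COLORS = ["tab:blue", "tab:orange", "tab:green"]


def vary(spec, rng):
    """Return a copy of the spec with chart type, color, and values changed."""
    new = dict(spec)
    new["chart_type"] = rng.choice(CHART_TYPES)
    new["color"] = rng.choice(COLORS)
    new["values"] = [round(v * rng.uniform(0.5, 1.5), 2) for v in spec["values"]]
    return new


def to_code(spec):
    """Emit a small matplotlib script (as text) for one chart spec."""
    return (
        "import matplotlib.pyplot as plt\n"
        f"plt.{spec['chart_type']}({spec['labels']!r}, {spec['values']!r}, "
        f"color={spec['color']!r})\n"
        f"plt.title({spec['title']!r})\n"
        "plt.savefig('chart.png')\n"
    )


def augment(seed, n, rng_seed=0):
    """Iteratively extend the seed: each variant builds on the previous one."""
    rng = random.Random(rng_seed)
    variants, current = [], seed
    for _ in range(n):
        current = vary(current, rng)
        variants.append(current)
    return variants


variants = augment(SEED_SPEC, 5)
scripts = [to_code(v) for v in variants]
```

Keeping the specification separate from the emitted script means the data table for each variant is known exactly, which is what allows code, table, and image to stay aligned.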

They also implemented an automated quality control process to keep the quality of the synthetic data high. This process verifies that the code is executable and that the rendered graph images are correct and clean.

“We don’t just want to generate a variety of samples. We also want the information to be presented in an understandable way,” she says.
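The executability half of such a check could look like the sketch below. It is an assumption-laden illustration, not the researchers' pipeline: the function name, the expected output file name `chart.png`, and the timeout are all hypothetical, and it only confirms that a script runs and writes a non-empty file. Judging whether the rendered image is correct and clean would need a separate step that is not shown here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_quality_check(script: str, output_name: str = "chart.png",
                         timeout: float = 30.0) -> bool:
    """Run a chart script in a scratch directory and confirm it produced output."""
    with tempfile.TemporaryDirectory() as workdir:
        script_path = Path(workdir) / "render.py"
        script_path.write_text(script)
        try:
            result = subprocess.run(
                [sys.executable, str(script_path)],
                cwd=workdir,           # relative output paths land in the scratch dir
                capture_output=True,   # keep the script's own output quiet
                timeout=timeout,       # drop scripts that hang
            )
        except subprocess.TimeoutExpired:
            return False
        output = Path(workdir) / output_name
        # Keep the sample only if the script exited cleanly and wrote a non-empty file.
        return result.returncode == 0 and output.is_file() and output.stat().st_size > 0


# Stand-in scripts that avoid a plotting dependency, for illustration only.
good_script = "open('chart.png', 'wb').write(b'placeholder image bytes')\n"
bad_script = "raise RuntimeError('broken plotting code')\n"
kept = [s for s in (good_script, bad_script) if passes_quality_check(s)]
```

Running each script in its own temporary directory and subprocess keeps a failing or misbehaving sample from affecting the rest of the batch.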

ChartNet also includes a selection of chart samples annotated by experts. This provides access to additional chart types and supporting data that are guaranteed to be valid.

A practitioner can use the annotated data to fine-tune an existing VLM, further improving performance in a specific application, adds Joshi.

With ChartNet, small open-source models consistently outperformed much larger commercial models.

“Many previous training datasets focused solely on answering simple graph questions. With ChartNet, we have tried to go beyond that by generating data that supports all aspects of robust graph understanding,” says Kondic.

In the future, the researchers plan to further expand ChartNet by including data with additional levels of complexity. They also hope to incorporate feedback from the research community.

This research was funded in part by the MIT-IBM Computing Research Lab.