
Protecting MLLMs against misleading visualizations


This repository contains the implementation of the arXiv preprint "Protecting multimodal LLMs against misleading visualizations". The code is released under an Apache 2.0 license.

Contact person: Jonathan Tonglet

UKP Lab | TU Darmstadt

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

Abstract

We assess the vulnerability of multimodal large language models to misleading visualizations - charts that distort the underlying data using techniques such as truncated or inverted axes, leading readers to draw inaccurate conclusions that may support misinformation or conspiracy theories. Our analysis shows that these distortions severely harm multimodal large language models, reducing their question-answering accuracy by up to 34.8 percentage points compared to non-misleading visualizations and lowering it to the level of the random baseline. To mitigate this vulnerability, we introduce six inference-time methods that improve the performance of MLLMs on misleading visualizations while preserving their accuracy on non-misleading ones. The most effective approach involves (1) extracting the underlying data table and (2) using a text-only large language model to answer questions based on the table. This method improves performance on misleading visualizations by 15.4 to 19.6 percentage points.

tl;dr

  • Misleading visualizations are charts that distort the underlying data, leading readers to inaccurate interpretations 📊
    • Distortions include truncated and inverted axes, 3D effects, and inconsistent tick intervals
    • Misleading visualizations negatively affect the performance of human readers on QA tasks. What about MLLMs?
  • MLLMs are very vulnerable to misleading visualizations too ⚠️
    • their QA performance drops to the level of the random baseline
    • up to a 65.5 percentage point decrease in accuracy compared to non-misleading visualization datasets like ChartQA


  • We propose six inference-time correction methods to improve performance on misleading visualizations 🛠️
    • the best method is to extract the table using the MLLM, then answer with an LLM using the table only
    • this improves performance by up to 19.6 percentage points


Environment

Follow these instructions to recreate the environment used for all our experiments.

$ conda create --name misviz python=3.9
$ conda activate misviz
$ pip install -r requirements.txt
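To verify the installation, here is a quick sanity check (this assumes PyTorch and Hugging Face Transformers are among the pinned requirements, as the MLLMs below rely on them):

$ python -c "import torch, transformers; print(torch.__version__, torch.cuda.is_available())"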

Datasets


The following script will prepare the datasets, including downloading the real-world images.

$ python src/dataset_preparation.py

Quick start

The following commands let you evaluate the performance of MLLMs on misleading and non-misleading visualizations, with or without one of the six correction methods proposed in the paper. Some correction methods require intermediate steps, such as extracting the axes or the table, or redrawing the visualization.


Evaluate a multimodal LLM on one or more datasets

$ python src/question_answering.py --datasets calvi-chartom-real_world-vlat --model internvl2.5/8B/

The --datasets argument expects a string of dataset names separated by -. By default, available datasets are calvi, chartom, real_world, and vlat.

The --model argument expects a string in the format model_name/model_size/ (see the parsing sketch after the table). By default, the following models are available:

Name                Available sizes         🤗 models
internvl2.5         2B, 4B, 8B, 26B, 38B    Link
ovis 1.6            9B, 27B                 Link
llava-v1.6-vicuna   7B, 13B                 Link
qwen2vl             2B, 7B                  Link
chartinstruction    13B                     Link
chartgemma          3B                      Link
tinychart           3B                      Link
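
For illustration, both argument formats can be parsed with plain string operations. This is a minimal sketch of the expected formats, not necessarily the exact parsing used in src/question_answering.py:

# Dataset names are separated by "-" (underscores within a name are kept)
datasets = "calvi-chartom-real_world-vlat".split("-")
# ['calvi', 'chartom', 'real_world', 'vlat']

# Model specs follow model_name/model_size/
model_name, model_size = "internvl2.5/8B/".rstrip("/").split("/")
# ('internvl2.5', '8B')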

If you want to use TinyChart, you need to copy this folder and place it in the root folder of this repo.

If you want to use ChartInstruction, you need to copy this folder and place it in the root folder of this repo.

Generate metadata (table, axes)

$ python src/chart2metadata.py --datasets calvi-chartom-real_world-vlat --model internvl2.5/8B/
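
Conceptually, this step prompts the MLLM to transcribe the chart into structured metadata. Below is a purely hypothetical example of what one extracted record could look like; the actual output format is defined in src/chart2metadata.py:

# Hypothetical metadata for one visualization (illustrative only)
metadata = {
    "table": [["Year", "Sales"], ["2020", "40"], ["2021", "55"]],
    "x_axis": {"label": "Year", "ticks": ["2020", "2021"]},
    "y_axis": {"label": "Sales", "ticks": ["0", "20", "40", "60"]},
}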

Redraw a visualization based on the extracted table

$ python src/table2code.py --datasets calvi-chartom-real_world-vlat --model qwen2.5/7B/
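
For intuition, redrawing amounts to generating plotting code from the extracted table. A hand-written example of the kind of matplotlib code a model might produce (the data values are hypothetical):

import matplotlib.pyplot as plt

# Hypothetical extracted table: one category column, one value column
categories = ["A", "B", "C"]
values = [12.0, 15.5, 14.2]

fig, ax = plt.subplots()
ax.bar(categories, values)
ax.set_ylim(bottom=0)  # start the y-axis at zero, avoiding a truncated-axis distortion
fig.savefig("redrawn.png")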

Evaluation

Finally, evaluate the accuracy of the models:

$ python src/evaluate.py --results_folder results_qa --output_file results_qa.csv
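
To inspect the scores, the output CSV can be loaded with pandas (the column names depend on src/evaluate.py, so check the header of results_qa.csv):

import pandas as pd

# Load the accuracy table produced by evaluate.py
df = pd.read_csv("results_qa.csv")
print(df.head())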

Citation

If you find this work relevant to your research or use this code in your work, please cite our paper as follows:

@article{tonglet2025misleadingvisualizations,
  title={Protecting multimodal LLMs against misleading visualizations},
  author={Tonglet, Jonathan and Tuytelaars, Tinne and Moens, Marie-Francine and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2502.XXXX},
  year={2025}
}

Disclaimer

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
