The Gen-AI Art of Storytelling: Using LIDA to Interact with LLMs for Data Visualization 📈

KARTHIK
3 min read · Oct 12, 2023

In this blog, I will walk you through LIDA, a library that generates data visualizations from questions written in plain, human-readable language.

SYNOPSIS

  1. What is LIDA
  2. Architecture of LIDA
  3. What can LIDA do?
  4. Limitations

What is LIDA?

LIDA: Automatic Generation of Visualizations and Infographics using Large Language Models

LIDA is grammar agnostic: it works with any programming language and visualization library, such as Matplotlib and Seaborn, and with several large language model providers, including OpenAI, Azure OpenAI, PaLM, Cohere, and Hugging Face.

Architecture of LIDA

Image taken from https://github.com/microsoft/lida

LIDA comprises four components:

  1. Summarizer
  2. Goal Explorer
  3. Viz Generator
  4. Infographer

Summarizer:

The Summarizer is the first step in LIDA, run after you initialize the language model you want to use for the interaction. It reads the uploaded data and produces a compact summary of each column.

It will give:

  1. Column names
  2. Unique values
  3. Maximum value
  4. Minimum value
  5. Standard deviation
  6. Samples of the data of each column

For example, take the popular Iris dataset: it has 150 rows covering three species, Setosa, Versicolor, and Virginica.

After you feed the data into LIDA, it explores the data automatically and returns the outputs listed above.

For Iris, the summary lists the columns Sepal_length, Sepal_width, Petal_length, and Petal_width, each with its unique values, maximum, minimum, standard deviation, and a few sample values, which is enough for a basic understanding of the data.
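To make the summary concrete, here is a minimal sketch of the kind of per-column statistics listed above, computed with only the Python standard library on a handful of rows in the Iris layout. This is not LIDA's implementation; the helper name `summarize_columns` and the field names in the result are assumptions made for illustration.

```python
import csv
import io
import statistics

# A handful of rows in the Iris layout (illustrative, not the full 150-row dataset).
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize_columns(csv_text, n_samples=3):
    """Hypothetical helper: per-column statistics in the spirit of a data summary."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        info = {
            "num_unique_values": len(set(values)),
            "samples": values[:n_samples],
        }
        if all(_is_number(v) for v in values):
            numbers = [float(v) for v in values]
            info.update(
                dtype="number",
                min=min(numbers),
                max=max(numbers),
                std=statistics.stdev(numbers),  # sample standard deviation
            )
        else:
            info["dtype"] = "category"
        summary[name] = info
    return summary


summary = summarize_columns(SAMPLE_CSV)
for column, info in summary.items():
    print(column, info)
```

A compact dictionary like this is small enough to place inside an LLM prompt, which is why a summary step comes before any question is asked about the data.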

Goal Explorer:

The second part of the architecture is the Goal Explorer. Once the basic summary of the data is ready, the Goal Explorer generates a few visualization questions that fit the context of the data.

Continuing the Iris example, the Goal Explorer picks up the context of the data and suggests questions for visualization, such as:

  1. What is the distribution of Petal_width?
  2. What is the relation between Sepal_length and Petal_length?

We can also give it our own question for analysis.
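The idea behind goal generation can be sketched as two steps: build a prompt from the data summary, then parse the model's reply into structured goals. The sketch below makes no real LLM call; the reply is a hard-coded stand-in, and the names `Goal`, `build_goal_prompt`, and `parse_goals`, as well as the prompt wording, are assumptions for illustration rather than LIDA's actual internals.

```python
import json
from dataclasses import dataclass


@dataclass
class Goal:
    """Hypothetical container for one suggested visualization goal."""
    question: str
    visualization: str
    rationale: str


def build_goal_prompt(summary, n=2):
    """Compose a prompt asking a language model for n visualization goals."""
    columns = ", ".join(summary["columns"])
    return (
        f"The dataset '{summary['name']}' has the columns: {columns}.\n"
        f"Suggest {n} questions that a chart could answer. "
        "Reply as a JSON list of objects with the keys "
        "'question', 'visualization', and 'rationale'."
    )


def parse_goals(reply_text):
    """Turn a JSON reply into Goal objects."""
    return [Goal(**item) for item in json.loads(reply_text)]


summary = {
    "name": "iris",
    "columns": ["Sepal_length", "Sepal_width", "Petal_length", "Petal_width"],
}
prompt = build_goal_prompt(summary, n=2)

# Stand-in for a model reply; a real run would send `prompt` to an LLM.
fake_reply = json.dumps([
    {
        "question": "What is the distribution of Petal_width?",
        "visualization": "histogram of Petal_width",
        "rationale": "Shows how petal widths are spread across the flowers.",
    },
    {
        "question": "What is the relation between Sepal_length and Petal_length?",
        "visualization": "scatter plot of Sepal_length vs Petal_length",
        "rationale": "Shows whether longer sepals go with longer petals.",
    },
])

goals = parse_goals(fake_reply)
for goal in goals:
    print(goal.question, "->", goal.visualization)
```

Asking for JSON keeps the reply machine-readable, so each goal can be passed straight to the next step.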

Viz Generator:

Once a prompt or question is given, the Viz Generator visualizes the data based on its data types and returns the chart together with the code that produced it.

It can also explain the chart, which helps when it is not obvious what the data is trying to convey.

There is also a section for checking whether the generated code is correct. Clicking the Evaluate option makes LIDA evaluate the code it generated and return a detailed analysis.

Another section, Recommendation, suggests and generates similar kinds of charts based on the data.
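Because the chart arrives together with its code, a host application has to run that code and notice when it fails. The sketch below shows one simple way to do that with the standard library; it is an illustration of the idea, not how LIDA executes or evaluates code. The `plot(rows)` convention, the helper `run_generated_code`, and both code strings are assumptions, and the "chart" is a plain dictionary so that no plotting library is needed.

```python
import traceback

# Stand-in for code an LLM might return. Real generated code would call a
# plotting library such as Matplotlib or Seaborn.
GENERATED_CODE = '''
def plot(rows):
    counts = {}
    for row in rows:
        counts[row["species"]] = counts.get(row["species"], 0) + 1
    return {"type": "bar", "x": list(counts), "y": list(counts.values())}
'''

# A faulty variant that refers to a column that does not exist.
BROKEN_CODE = '''
def plot(rows):
    return {"type": "bar", "x": [row["variety"] for row in rows]}
'''


def run_generated_code(code, rows):
    """Execute generated code and return (chart, error_message).

    Caution: exec runs arbitrary code, so only use it on trusted input
    or inside a sandbox.
    """
    namespace = {}
    try:
        exec(code, namespace)             # defines plot() inside namespace
        chart = namespace["plot"](rows)   # call the generated function
        return chart, None
    except Exception:
        # Keep only the final traceback line as a short error message.
        return None, traceback.format_exc().strip().splitlines()[-1]


rows = [
    {"species": "setosa"},
    {"species": "setosa"},
    {"species": "virginica"},
]

chart, error = run_generated_code(GENERATED_CODE, rows)
print(chart, error)

chart_bad, error_bad = run_generated_code(BROKEN_CODE, rows)
print(chart_bad, error_bad)
```

The captured error message is the kind of feedback that could be sent back to the model when asking it to repair its own code.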

Infographer:

Given a visualization, the Infographer generates a data-faithful infographic. This method should be considered experimental; it uses Stable Diffusion models from the Peacasso library. This part of the application is still a work in progress.

What can LIDA do?

  1. Data Summarization
  2. Goal Generation
  3. Data Visualization
  4. Visualization Editing
  5. Visualization Explanation
  6. Visualization Evaluation and Repair
  7. Visualization Recommendation
  8. Infographic Generation* [WIP]
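The first three capabilities can be chained in a few lines of Python. The sketch below follows the usage shown in the LIDA README as I understand it (`Manager`, `llm`, `summarize`, `goals`, `visualize`); treat the exact signatures as assumptions and check them against the version you install. The wrapper function `run_lida_pipeline` and its defaults are my own, and running it requires `pip install lida` plus an API key for the chosen provider.

```python
def run_lida_pipeline(csv_path, provider="openai", library="seaborn"):
    """Summarize a file, propose goals, and generate charts with LIDA.

    Assumes LIDA is installed and an API key for the provider is available
    (for OpenAI, typically the OPENAI_API_KEY environment variable).
    """
    # Imported here so this file can be loaded without LIDA installed.
    from lida import Manager, llm

    lida = Manager(text_gen=llm(provider))

    summary = lida.summarize(csv_path)      # 1. data summarization
    goals = lida.goals(summary, n=2)        # 2. goal generation
    charts = lida.visualize(                # 3. data visualization
        summary=summary,
        goal=goals[0],
        library=library,
    )
    return summary, goals, charts
```

A call such as `run_lida_pipeline("iris.csv")` (the file name is a placeholder) would return the summary, the suggested goals, and the generated charts; each chart is expected to carry the code that produced it.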

Limitations

  1. LIDA works best with large LLMs such as OpenAI's models, and its performance depends on the LLM you choose to work with.
  2. It expects the data to be in .csv or .json format and assumes the data can be loaded into a pandas DataFrame.
  3. LIDA is still under exploration, and its performance on larger datasets may improve in the future.
