Leveraging LLMs for Data Analysis and Visualization

We are going to build a lightweight data analyst agent that reads a CSV file, reasons about its contents, and writes Python code to generate charts and summary statistics. This is useful for developers and analysts who need to turn raw data into visual insights without writing boilerplate pandas or matplotlib code by hand.

What you'll need

Python 3.10 or newer
pip install openai pandas matplotlib
An Oxlo.ai API key from https://portal.oxlo.ai

Step 1: Set up the Oxlo.ai client

First, import the OpenAI SDK and instantiate the client pointing at Oxlo.ai. Because Oxlo.ai exposes an OpenAI-compatible API, the only change from a stock OpenAI setup is the base URL.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY")
)
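
Note that os.environ.get returns None when the variable is missing, so a forgotten key only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name require_api_key and its environ parameter are illustrative assumptions, not part of the OpenAI SDK or of Oxlo.ai.

```python
import os


def require_api_key(var_name="OXLO_API_KEY", environ=None):
    """Return the API key from the environment, or raise a clear error.

    `environ` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before creating the client."
        )
    return key
```

The result can then be passed as api_key when constructing the client, for example api_key=require_api_key().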

Step 2: Craft the system prompt

The system prompt constrains the model to return only executable Python. Keeping the response format strict makes parsing more reliable, although models occasionally wrap code in markdown fences anyway, which Step 4 handles.

SYSTEM_PROMPT = """You are a Python data analyst.
The user will provide a CSV file path, a preview of the data, and a question.
Write only valid Python code that:
1. Reads the CSV into a pandas DataFrame using the variable csv_path
2. Performs the analysis or visualization requested
3. Saves any plot to 'output_chart.png' with tight layout
4. Prints key numeric results to stdout
Do not include markdown fences, explanations, or import statements for pandas or matplotlib. These are already available in the execution environment.
"""

Step 3: Load data and query the model

I will create a sample sales CSV so the tutorial is self-contained. Then I will build a helper that sends the file path, schema preview, and user question to the model.

import pandas as pd

# Create a self-contained sample dataset
sample_csv = "sales.csv"
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "revenue": [12000, 15000, 13000, 17000, 16000, 19000],
    "costs": [8000, 9500, 8200, 11000, 10000, 11500]
})
df.to_csv(sample_csv, index=False)

def build_prompt(csv_path, question, df):
    schema = f"Columns: {list(df.columns)}\nTypes:\n{df.dtypes}\nPreview:\n{df.head(3).to_string()}"
    return f"CSV path: {csv_path}\n{schema}\nQuestion: {question}"

user_message = build_prompt(sample_csv, "Plot revenue vs costs as a grouped bar chart and print total profit.", df)

response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ],
)
generated_code = response.choices[0].message.content
print(generated_code)
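
Before running any generated code, it helps to know what answer the sample data should produce, so a wrong result from the model is easy to spot. The following is a small hand check in plain Python using the same six months of figures as sales.csv; the variable names are only for illustration.

```python
# Same values as the sample sales.csv created above.
revenue = [12000, 15000, 13000, 17000, 16000, 19000]
costs = [8000, 9500, 8200, 11000, 10000, 11500]

# Total profit is total revenue minus total costs.
total_profit = sum(revenue) - sum(costs)
print(total_profit)  # 33800
```

If the generated script prints a different total profit for this dataset, the generated code is worth reading before trusting its chart.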

Step 4: Execute the generated visualization code

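The model returns its code as a plain string that may contain accidental markdown fences. I strip those and execute the code in a namespace pre-populated with pandas, matplotlib, and the CSV path. Keep in mind that exec with a custom namespace is not a sandbox: the generated code still has full access to Python builtins and the file system, so only run it where that is acceptable.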

import matplotlib.pyplot as plt

def run_generated_code(code_string, csv_path):
    cleaned = code_string.replace("```python", "").replace("```", "").strip()
    namespace = {
        "pd": pd,
        "plt": plt,
        "csv_path": csv_path
    }
    exec(cleaned, namespace)
    return namespace

run_generated_code(generated_code, sample_csv)
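
Plain string replacement works for the common case, but it leaves any surrounding prose in place when the model adds a sentence before or after the fenced code. As a hedged alternative, the sketch below pulls out the contents of the first fenced block with a regular expression and checks that the result parses before anything is executed. The helper names extract_code and check_syntax are hypothetical and not part of any library.

```python
import ast
import re

# Matches an opening fence (optionally tagged, e.g. with "python"), captures
# everything up to the next closing fence.
_FENCE_RE = re.compile(r"```[\w+-]*[ \t]*\n(.*?)```", re.DOTALL)


def extract_code(text):
    """Return the body of the first fenced block, or the whole text if unfenced."""
    match = _FENCE_RE.search(text)
    code = match.group(1) if match else text
    return code.strip()


def check_syntax(code):
    """Raise SyntaxError early if the code does not parse; return it otherwise."""
    ast.parse(code)
    return code
```

Calling check_syntax(extract_code(generated_code)) before exec turns a malformed response into an immediate SyntaxError rather than a partially executed script. It only checks that the code parses, not that it is safe or correct.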

Step 5: Wrap it in a reusable agent class

Now I will package the prompt builder, API call, and executor into a reusable class. Oxlo.ai's flat request-based pricing keeps the cost predictable even when you pass long data previews or iterate across multiple questions.

class DataVizAgent:
    def __init__(self, api_key, model="deepseek-v3.2"):
        self.client = OpenAI(
            base_url="https://api.oxlo.ai/v1",
            api_key=api_key
        )
        self.model = model
        self.system_prompt = SYSTEM_PROMPT

    def analyze(self, csv_path, question):
        df = pd.read_csv(csv_path)
        schema = f"Columns: {list(df.columns)}\nTypes:\n{df.dtypes}\nPreview:\n{df.head(3).to_string()}"
        user_message = f"CSV path: {csv_path}\n{schema}\nQuestion: {question}"

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": user_message},
            ],
        )
        code = response.choices[0].message.content
        cleaned = code.replace("```python", "").replace("```", "").strip()

        namespace = {"pd": pd, "plt": plt, "csv_path": csv_path}
        exec(cleaned, namespace)
        # Close all figures so the next question does not draw onto this chart
        plt.close("all")
        return namespace

Run it

Instantiate the agent and run two different questions against the same CSV. The flat per-request pricing on Oxlo.ai means you can experiment without watching token counters. See https://oxlo.ai/pricing for plan details.

agent = DataVizAgent(api_key=os.environ["OXLO_API_KEY"])

# Run 1
agent.analyze("sales.csv", "Plot revenue vs costs as a grouped bar chart and print total profit.")

# Run 2
agent.analyze("sales.csv", "Create a pie chart of revenue share by month and print the best month.")

After running the second query, stdout shows something like:

Best month: Jun with revenue 19000

And output_chart.png is written to disk.
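
If the printed results need to be used programmatically rather than just read in the terminal, one option is to capture stdout while the generated code runs. This is a minimal sketch using only the standard library; run_and_capture is a hypothetical helper and is not wired into the agent class above.

```python
import contextlib
import io


def run_and_capture(code, namespace):
    """Execute code in the given namespace and return everything it printed."""
    buffer = io.StringIO()
    # redirect_stdout temporarily routes print() output into the buffer.
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()
```

For example, run_and_capture("print(a + 1)", {"a": 1}) returns the string "2\n". The same caveat applies as with a bare exec: the code runs with full access to the host environment.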

Wrap-up

Two ways to extend this. First, add multi-turn memory so the agent refines previous charts based on follow-up questions. Second, swap in kimi-k2.6 or qwen-3-32b for reasoning-heavy statistical workloads, since Oxlo.ai hosts both under the same flat request pricing.
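
As a starting point for the multi-turn idea, here is a minimal sketch of a message history that replays earlier questions and generated code on each request. The Conversation class and its max_turns cap are assumptions made for illustration; only the role/content message shape comes from the chat completions calls used earlier.

```python
class Conversation:
    """Keeps earlier turns so follow-up questions can refer to previous charts."""

    def __init__(self, system_prompt, max_turns=5):
        self.system_prompt = system_prompt
        self.max_turns = max_turns  # hypothetical cap on remembered turns
        self.turns = []  # list of (user_message, assistant_code) pairs

    def messages_for(self, user_message):
        """Build the messages list: system prompt, remembered turns, new question."""
        messages = [{"role": "system", "content": self.system_prompt}]
        recent = self.turns[-self.max_turns:] if self.max_turns > 0 else []
        for user, assistant in recent:
            messages.append({"role": "user", "content": user})
            messages.append({"role": "assistant", "content": assistant})
        messages.append({"role": "user", "content": user_message})
        return messages

    def record(self, user_message, assistant_code):
        """Store a completed turn after the model has responded."""
        self.turns.append((user_message, assistant_code))
```

In the agent, messages_for(user_message) would replace the hard-coded two-message list, and record would be called with the returned code after each request. Capping the history keeps the prompt from growing without bound.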