Recently I became interested in Multi-Agent Systems (MAS) and wanted to implement one myself, even a simple one, just to get the gist of it. Building it myself also has the advantage of being highly customizable to my needs.
I wanted to implement a MAS from scratch, without relying on libraries like LangChain or CrewAI. To my surprise, I only found one tutorial that does this, and it uses a MAS for a summarization task. I wanted a more data-related application, such as accessing databases and creating plots, and that's what we're going to do.
For this tutorial, we'll build a MAS where the user asks the system, in natural language, for information from a database, and the system returns a dataframe containing the data as well as a visualization of it. This system can be thought of as an assistant for people who lack the knowledge to create insights programmatically. Instead of relying on third parties, they can simply ask the system for information that may be contained in the database. The code base is hosted on GitHub.
The system is composed of two agents: one that writes SQL queries to access a SQLite database and retrieve data from fires that happened in Brazil in 2024. The other agent creates a script to plot the retrieved data on a map. In this particular case, the agents don’t talk directly to each other, but rather use external tools (which we are going to implement) to access the database and to create and run the script. The pipeline is shown in Figure 1 below.
I intend to make this post self-contained, so I'll go through every step, from data acquisition to the agent-generated visualization. Here is our roadmap:
- Create a SQLite database from a CSV file.
- Code the agents, their roles, and their prompts.
- Run the application to analyze the queried information.
In this project, we will investigate wildfires in Brazil. The data on all registered wildfires in the country in 2024 can be found on the Instituto Nacional de Pesquisas Espaciais (INPE) website. The files are in CSV format, and we will load them into a SQLite database. If you want to visualize the data beyond the terminal, I recommend SQLiteStudio.
To turn the CSV into a SQLite database, first open a command prompt and start sqlite3, passing the name of the database file you want to create (I'm using Windows 11, but the steps are analogous on Linux):
sqlite3 brazilian_fires_2024.db
Then change the mode to CSV and import the data:
.mode csv
.import C:\\Users\\hrmel\\Desktop\\focos_br_todos-sats_2024.csv fires_2024
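If you prefer to stay in Python, the same import can be done with pandas and sqlite3. The sketch below uses a tiny hand-made dataframe as a stand-in for the INPE export, and an in-memory database so it runs anywhere:

```python
import sqlite3

import pandas as pd

# A tiny stand-in for the INPE export; in practice you would load the real
# file with pd.read_csv("focos_br_todos-sats_2024.csv").
df = pd.DataFrame({
    "latitude": ["-3.117"],
    "longitude": ["-60.025"],
    "municipio": ["MANAUS"],
    "risco_fogo": ["0.9"],
})

# Use "brazilian_fires_2024.db" instead of ":memory:" to persist the database.
con = sqlite3.connect(":memory:")
df.to_sql("fires_2024", con, if_exists="replace", index=False)

# Sanity check: the table is now queryable with plain SQL.
rows = con.execute("SELECT COUNT(*) FROM fires_2024").fetchone()[0]
```

This route is convenient when you want to clean or retype columns with pandas before they ever reach the database.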
Now we have the database from which the information will be retrieved. It should look like this in SQLiteStudio:
The column names are in Portuguese, so let me translate them and explain what they mean:
- latitude and longitude: same words as in English, measured in degrees
- data_pas: date and time of the event
- satelite: name of the satellite used to acquire the data
- pais: country; only contains Brazilian data
- estado: Brazilian state
- municipio: city (municipio covers both city and town)
- bioma: name of the location's biome
- numero dias sem chuva: number of days without rain
- precipitação: precipitation
- risco fogo: risk of fire
- id area industrial: whether or not it is an industrial area
- frp: fire radiative power, which measures the energy released from burning vegetation per unit of time
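If you would rather work with English column names, you could rename them in pandas before loading the data into SQLite. This is an optional sketch with a hypothetical mapping based on the translations above; the exact column names in the INPE export may differ slightly:

```python
import pandas as pd

# Hypothetical Portuguese-to-English mapping based on the columns above;
# check the actual names in your CSV before using it.
translations = {
    "data_pas": "date_time",
    "satelite": "satellite",
    "pais": "country",
    "estado": "state",
    "municipio": "city",
    "bioma": "biome",
    "risco_fogo": "fire_risk",
}

# Small example frame containing two of the original columns.
df = pd.DataFrame({"municipio": ["MANAUS"], "risco_fogo": ["0.9"]})
df = df.rename(columns=translations)
```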
The model we use here is DeepSeek, because its API is great and it is free. To use it, create an account on OpenRouter, search for the free model, select it (DeepSeek R1 Zero (free)), and create your API key.
First we’ll build a class that will become our most basic agent.
import json
import time
from typing import Optional

import requests
import yaml


class Agent():
    """
    Base Agent class. The config.yaml file only contains the OpenRouter API key.
    """
    def __init__(self, role: str, config_file: Optional[str], model: str = "deepseek/deepseek-chat:free"):
        self.role = role
        self.model = model
        self.config_file = config_file
        self.get_credentials()
        self.role_prompt = ""

    def set_role_prompt(self):
        raise NotImplementedError("Each Agent must have its role prompt implemented.")

    def generate_prompt(self, query: str, context: Optional[str]):
        raise NotImplementedError("Each Agent must define its own prompt to the agent.")

    def post_query(self, prompt: str):
        """
        Post queries. Sometimes the free model APIs do not return any text, so we retry a few
        times to ensure we get an answer from the model.
        """
        attempt = 1
        max_tries = 5
        initial_wait = 1
        self.data = {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}]
        }
        print(f"{self.role} is thinking. Attempt {attempt}.")
        while attempt < max_tries:
            response = requests.post(self.API_URL, json=self.data, headers=self.headers)
            if response.status_code == 200:
                response_json = response.json()
                if response_json and "choices" in response_json and len(response_json["choices"][0]["message"]["content"]) != 0:
                    with open(f"agent_{self.role}_response.json", "w") as f:
                        json.dump(response_json, f)
                    return response
                else:
                    print("Empty response. Trying again.")
            else:
                print("Failed to fetch data from API. Status Code:", response.status_code)
            time.sleep(initial_wait * (2 ** attempt))
            attempt += 1
        return response

    def get_credentials(self):
        if self.config_file:
            with open(self.config_file, 'r') as file:
                self.config = yaml.safe_load(file)
            API_KEY = self.config["openrouter"]["access_key"]
            self.API_URL = 'https://openrouter.ai/api/v1/chat/completions'
            self.headers = {
                'Authorization': f'Bearer {API_KEY}',
                'Content-Type': 'application/json'
            }
Now let’s look at it in more detail:
def __init__(self, role: str, config_file: Optional[str], model: str = "deepseek/deepseek-chat:free"):
    self.role = role
    self.model = model
    self.config_file = config_file
    self.get_credentials()
    self.role_prompt = ""
Each agent has a role: “Manager”, “Data Scientist”, “SQL Specialist”, and so on. This is useful for organizing your agents, as well as for keeping track of which agent is doing what in more complex systems.
The model used, as discussed above, will be DeepSeek, because it is great and it is free.
The configuration file contains the API key and is a typical YAML file. For this tutorial, it will only contain the access key you have just created in OpenRouter:
openrouter:
access_key: "yourkeyhere"
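Before wiring the file into an agent, you can quickly verify it parses the way get_credentials expects. This is a minimal check that loads the same structure from a string for illustration:

```python
import yaml

# Same structure as the config.yaml above, parsed from a string for illustration.
config = yaml.safe_load('openrouter:\n  access_key: "yourkeyhere"')
api_key = config["openrouter"]["access_key"]
```

In practice you would pass the path of the real config.yaml to `yaml.safe_load(open(path))` and check that the `openrouter` and `access_key` keys are present.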
The next two methods define the agent’s role description and the more general prompt for the tasks it should perform. Both will be covered in more depth when we create the individual agents for SQL query creation and script creation. Since each agent has a different role and different instructions, these methods are implemented by the agent subclasses themselves.
def set_role_prompt(self):
    raise NotImplementedError("Each Agent must have its role prompt implemented.")

def generate_prompt(self, query: str, context: Optional[str]):
    raise NotImplementedError("Each Agent must define its own prompt to the agent.")
The next method gets the API key from the config.yaml file and sets up the credentials to access the API.
def get_credentials(self):
    if self.config_file:
        with open(self.config_file, 'r') as file:
            self.config = yaml.safe_load(file)
        API_KEY = self.config["openrouter"]["access_key"]
        self.API_URL = 'https://openrouter.ai/api/v1/chat/completions'
        self.headers = {
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        }
At last, the method that posts the query to the model and gets its response:
def post_query(self, prompt: str):
    """
    Post queries. Sometimes the free model APIs do not return any text, so we retry a few
    times to ensure we get an answer from the model.
    """
    attempt = 1
    max_tries = 5
    initial_wait = 1
    self.data = {
        "model": self.model,
        "messages": [{"role": "user", "content": prompt}]
    }
    print(f"{self.role} is thinking. Attempt {attempt}.")
    while attempt < max_tries:
        response = requests.post(self.API_URL, json=self.data, headers=self.headers)
        if response.status_code == 200:
            response_json = response.json()
            if response_json and "choices" in response_json and len(response_json["choices"][0]["message"]["content"]) != 0:
                with open(f"agent_{self.role}_response.json", "w") as f:
                    json.dump(response_json, f)
                return response
            else:
                print("Empty response. Trying again.")
        else:
            print("Failed to fetch data from API. Status Code:", response.status_code)
        time.sleep(initial_wait * (2 ** attempt))
        attempt += 1
    return response
Now let’s build each agent of our system by writing their roles and specific prompts.
class DataRetrieverAgent(Agent):
    def __init__(self, role, config_file):
        super().__init__(role, config_file)
        self.set_role_prompt()

    def set_role_prompt(self):
        self.role_prompt = """
        You only create sql queries to access a database to retrieve information about the query.
        """

    def prompt(self, query: str, context: Optional[str] = None):
        generated_prompt = f"""
        Considering that {self.role_prompt}, create a query to access a database to answer {query}.
        Your answer should contain only the necessary query for the task and nothing else. Pass the query
        as a string and do not use an SQL block to format it.
        """
        return generated_prompt
The first agent is the DataRetrieverAgent, which creates SQL queries so we can retrieve data from the database we just created. Its prompt contains very specific instructions. This is necessary because LLMs tend to be verbose, and if we want to use them programmatically, with their answers as input to code or to other LLMs, they must answer concisely. “Your answer should contain only the necessary query for the task and nothing else.” means it should not say “Here is what I think the query should be…” or something along those lines. “Pass the query as a string and do not use an SQL block to format it.” means we can use this exact string to access the database ourselves, or pass it over to the script-writer agent so it can incorporate the query into the script.
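To make this concrete, here is what a generated prompt looks like for a toy query. This standalone sketch mirrors the f-string template of DataRetrieverAgent without the API plumbing (the role prompt is shortened here):

```python
# Mirrors the template in DataRetrieverAgent (role_prompt shortened for brevity).
role_prompt = "you only create sql queries to access a database"

def build_sql_prompt(query: str) -> str:
    return f"""
    Considering that {role_prompt}, create a query to access a database to answer {query}.
    Your answer should contain only the necessary query for the task and nothing else. Pass the query
    as a string and do not use an SQL block to format it.
    """

prompt = build_sql_prompt("how many fires occurred in July?")
```

The user's question and the role description are simply interpolated into the instruction template, which is then sent as a single user message.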
The second and last agent we’ll use in this tutorial is the PythonPlotterAgent, responsible for creating a script that plots the data retrieved from the database using the SQL query produced by the previous agent.
class PythonPlotterAgent(Agent):
    def __init__(self, role, config_file):
        super().__init__(role, config_file)
        self.set_role_prompt()

    def set_role_prompt(self):
        self.role_prompt = """
        You create python scripts that plot the data given to you in a Pandas dataframe.
        """

    def generate_prompt(self, query: str, context: Optional[str] = None):
        generated_prompt = f"""
        Considering that {self.role_prompt}, create a python script to plot the data to answer {query}.
        In addition to the task, use the data printed here to evaluate the best kind of plot to use.
        Only write the script and nothing else in plain text, without a python interpreter, but properly formatted.
        The plot must contain a map using plotly. The example data is: {context}.
        """
        return generated_prompt
In this specific case, I forced it to use Plotly out of personal preference. Sometimes it suggested using Folium, so to enforce consistency I asked for a Plotly visualization.
Since the agents only know how to predict words, we need tools to actually do things. First we create a basic Tool class:
class Tool():
    def __init__(self):
        self.description = ""

    def task(self):
        raise NotImplementedError("Each tool must have its own task.")
Each tool does something (its task) and has a description. We’ll create two tools. The first one, AccessDatabaseTool, has access to the database:
class AccessDatabaseTool(Tool):
    def __init__(self, database: str, save: bool = True, name_df: str = ""):
        self.database = database
        self.con = sqlite3.connect(self.database)
        self.description = "Accesses an SQL database containing information about the wildfires in Brazil."
        self.save = save
        self.name_df = name_df

    def task(self, query: str):
        """
        Execute `query` and return the result as a dataframe.
        """
        df = pd.read_sql(query, con=self.con)
        if self.save:
            df.to_csv(f"{self.name_df}")
        return df
Its task is to access a database and put the information retrieved by a query into a Pandas dataframe. The second tool writes and saves Python scripts:
class PythonScriptCreatorTool(Tool):
    def __init__(self, file_name):
        self.file_name = file_name
        self.description = "Creates a Python script file and executes it."

    def task(self, script):
        """
        Write the script to a .py file and execute it.
        """
        with open(self.file_name, "w") as script_file:
            script_file.write(script)
        os.system(f"python {self.file_name}")
For now, the description attribute is purely for code organization, since it won’t be used directly. However, it can be passed to the LLM so it knows which tools are available and what they are used for. This helps with automation, since the LLM can then choose which tools to use based on what they do.
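For instance, the descriptions could be concatenated into a prompt so the LLM can pick a tool by name. This is only a sketch of the idea, not part of this tutorial's pipeline; the registry and helper function here are hypothetical:

```python
# Hypothetical registry: tool name -> description, mirroring the two tools above.
tools_registry = {
    "AccessDatabaseTool": "Accesses an SQL database containing information about the wildfires in Brazil.",
    "PythonScriptCreatorTool": "Creates a Python script file and executes it.",
}

def build_tool_selection_prompt(task: str) -> str:
    # List every tool with its description, then ask for a single tool name back.
    listing = "\n".join(f"- {name}: {desc}" for name, desc in tools_registry.items())
    return (
        f"You have access to the following tools:\n{listing}\n"
        f"Answer with only the name of the tool best suited for this task: {task}"
    )

prompt = build_tool_selection_prompt("retrieve all fires registered in July")
```

The model's one-word answer could then be used to look up and invoke the chosen tool from the registry.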
Now we proceed to program the workflow described in Figure 1.
def main():
    table_name = "fires_2024"  # name of the table in the database
    script_name = "cities_fire_07.csv"  # name of the csv file where the information from the database will be stored

    # used tools
    database_tool = tools.AccessDatabaseTool("C:\\Users\\hrmel\\Desktop\\Brazilian Fires\\brazilian_fires_2024.db", save=False)
    script_creator_tool = tools.PythonScriptCreatorTool("create_map_fires.py")

    # extract the column names to pass to the agent, so it will
    # know which ones to use to create the query
    columns_general_info = database_tool.task(query="PRAGMA table_info(fires_2024)")
    columns_names = [row for row in columns_general_info["name"]]

    query_to_sql_agent = f"""In which unique cities is the risk of fire above 0.5 in July?
    I also want their longitudes and latitudes.
    Aggregate fires in the same city using the average of all risk of fire
    indices and an average of latitude and longitude. There should be only
    a single row for any given city, containing the average latitude,
    longitude and risk of fire. All columns are of text type,
    so change types if necessary, the table's name is {table_name},
    and the column names are {columns_names}"""

    query_to_python_script_agent = f"""Use the data passed to you to create a nice visualization of
    the cities with risk of fire above 0.5 in July.
    Write a python script in plain text showing the coordinates of the cities on a map using points
    that change their color as the risk of fire grows.
    Assume a csv file named {script_name} with the proper columns
    is provided to the script as the data input.
    """

    # initialize the agents
    sql_agent = DataRetrieverAgent(role="SQL Query Creator", config_file="config.yaml")
    python_agent = PythonPlotterAgent(role="Python Script Writer", config_file="config.yaml")

    # generate what is going to be passed to the SQL agent based on the query and
    # the agent's own instructions
    generated_prompt = sql_agent.prompt(query=query_to_sql_agent)

    # post the prompt to the agent and get a response
    response_from_sql_agent = sql_agent.post_query(generated_prompt)

    # parse the response as json so we can iterate over it
    response_from_sql_agent = response_from_sql_agent.json()

    # this is how we access the agent's answer itself
    sql_query = response_from_sql_agent["choices"][0]["message"]["content"]

    # access the database again, this time using the query provided by the agent
    database_tool = tools.AccessDatabaseTool("C:\\Users\\hrmel\\Desktop\\Brazilian Fires\\brazilian_fires_2024.db", save=True, name_df=f"{script_name}")

    # the agent's response always starts and ends with ", so we remove it
    response_from_database = database_tool.task(query=sql_query.strip('"'))

    # print some rows as a sanity check
    print("First rows of the dataframe retrieved from the database.")
    print(response_from_database.head())

    # generate the prompt for the python agent based on its instructions
    # and the question provided to it
    generated_prompt = python_agent.generate_prompt(query=query_to_python_script_agent, context=response_from_database.iloc[:10, :])
    response_from_python_agent = python_agent.post_query(prompt=generated_prompt)
    response_from_python_agent = response_from_python_agent.json()
    response_from_python_agent = response_from_python_agent["choices"][0]["message"]["content"]

    # Python scripts come wrapped in ```python ... ``` blocks, so we strip the fences
    response_from_python_agent = response_from_python_agent.strip()
    response_from_python_agent = response_from_python_agent.removeprefix("```python").removesuffix("```")

    script_creator_tool.task(script=response_from_python_agent)
Note that we need to access the message from the agent inside its response using
["choices"][0]["message"]["content"]
This is done so because the actual response from the agent comes in this form
{
"id": "gen-1741898901-OhMxflJAT1exziizk4VP",
"provider": "Targon",
"model": "deepseek/deepseek-chat",
"object": "chat.completion",
"created": 1741898901,
"choices": [
{
"logprobs": {
"content": [
{
"token": "token_id:1",
"logprob": -0.04900964525480277
},
...
],
"refusal": []
},
"index": 0,
"message": {
"role": "assistant",
"content": "\"SELECT municipio, AVG(CAST(latitude AS FLOAT)) AS avg_latitude, AVG(CAST(longitude AS FLOAT)) AS avg_longitude, AVG(CAST(frp AS FLOAT)) AS avg_frp FROM fires_2024 WHERE STRFTIME('%m', data_pas) = '07' AND CAST(frp AS FLOAT) > 50 GROUP BY municipio;\"",
"refusal": null
}
}
],
"usage": {
"prompt_tokens": 241,
"completion_tokens": 85,
"total_tokens": 326
}
}
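Accessing `["choices"][0]["message"]["content"]` directly will raise a KeyError or IndexError if the response is malformed, which does happen with free-tier APIs. Here is a small defensive helper you may want to use; it is an optional addition, not part of the original pipeline:

```python
def extract_content(response_json: dict) -> str:
    """Pull the message content out of an OpenRouter-style response,
    returning an empty string when the structure is not as expected."""
    try:
        return response_json["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return ""

# Minimal example mimicking the response structure shown above.
sample = {"choices": [{"message": {"content": "SELECT 1;"}}]}
```

The caller can then check for an empty string and retry, instead of crashing mid-pipeline.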
At the end, you’ll have the code provided by the agent and a map showing the information you wanted. Here we asked “In which unique cities is the risk of fire above 0.5 in July?”, and the final code and resulting map are below.
import pandas as pd
import plotly.express as px

# Load the data
df = pd.read_csv("cities_fire_07.csv")

# Filter cities with fire risk above 0.5
df_filtered = df[df["avg_risco_fogo"] > 0.5]

# Create a map with points colored by fire risk
fig = px.scatter_mapbox(df_filtered,
                        lat="avg_latitude",
                        lon="avg_longitude",
                        color="avg_risco_fogo",
                        hover_name="municipio",
                        zoom=3,
                        mapbox_style="open-street-map",
                        title="Cities with Fire Risk Above 0.5 in July")

# Show the plot
fig.show()
Since I asked for plotly specifically, we obtained a visualization where we can zoom in and out. If the user has some coding expertise, they can also change the code directly to change cosmetic aspects of the visualization or add other small details.
In another attempt, I asked for a risk of fire vs. longitude line plot in addition to the map:
query_to_python_script_agent = f"""Use the data passed to you to create a visualization of
the cities with risk of fire above 0.5 in July.
Write a python script showing the coordinates of the cities on a map using points
that change their color as the risk of fire grows. Also show
the risk of fire varying with longitude in a line plot. You can aggregate the longitudes
into integers using the average risk of fire. Use an interactive map with OpenStreetMap.
Assume a csv file named {script_name} with the proper columns
is provided to the script as the data input.
"""
And the resulting plots are the ones in the image below.