# Data Collection

Mesa's data collection system gathers, organizes, and exports simulation data. The `DataCollector` class tracks model-level variables, agent-level variables, and custom table rows throughout a simulation run.

## Imports

```python { .api }
from mesa import DataCollector
from typing import Any, Callable, Dict
import pandas as pd
```

## DataCollector Class

The `DataCollector` class is the central data collection component for Mesa models. It supports several reporter types and returns the collected data as pandas DataFrames.

```python { .api }
class DataCollector:
    """
    Data collection system for Mesa models with multiple reporter types.

    DataCollector enables systematic collection of model-level statistics,
    agent-level data, per-agent-type data, and custom table data throughout
    simulation runs.

    Attributes:
        model_reporters: Dictionary of model-level variable reporters
        agent_reporters: Dictionary of agent-level variable reporters
        agenttype_reporters: Dictionary of agent-type-level variable reporters
        model_vars: Dictionary of collected model variables
        tables: Dictionary of custom data tables
    """

    def __init__(self,
                 model_reporters=None,
                 agent_reporters=None,
                 agenttype_reporters=None,
                 tables=None):
        """
        Initialize DataCollector with reporter configurations.

        Parameters:
            model_reporters: Dict mapping variable names to model attribute names or functions
            agent_reporters: Dict mapping variable names to agent attribute names or functions
            agenttype_reporters: Dict mapping agent classes to dicts of variable names
                and agent attribute names or functions
            tables: Dict mapping table names to lists of column names
        """
        ...

    def collect(self, model):
        """
        Collect data from the model for the current step.

        This method should be called at each simulation step to gather
        data according to the configured reporters.

        Parameters:
            model: The Mesa model instance to collect data from
        """
        ...

    def add_table_row(self, table_name, row, ignore_missing=False):
        """
        Add a row to a custom data table.

        Parameters:
            table_name: Name of the table to add data to
            row: Dictionary of column names to values
            ignore_missing: If True, columns missing from the row are filled
                with None; if False, a missing column raises an error
        """
        ...

    def get_model_vars_dataframe(self):
        """
        Get model-level data as a pandas DataFrame.

        Returns:
            DataFrame with model variables over time, one row per collection
        """
        ...

    def get_agent_vars_dataframe(self):
        """
        Get agent-level data as a pandas DataFrame.

        Returns:
            DataFrame with agent variables over time, indexed by step and agent ID
        """
        ...

    def get_agenttype_vars_dataframe(self, agent_type):
        """
        Get agent-type-level data as a pandas DataFrame.

        Parameters:
            agent_type: The agent class to get data for

        Returns:
            DataFrame with variables for agents of that type over time,
            indexed by step and agent ID
        """
        ...

    def get_table_dataframe(self, table_name):
        """
        Get custom table data as a pandas DataFrame.

        Parameters:
            table_name: Name of the table to retrieve

        Returns:
            DataFrame containing the custom table data
        """
        ...
```

## Reporter Types

### Model Reporters

Model reporters collect model-level statistics at each simulation step. Each reporter is a function or lambda that takes the model, a bound method of the model, or the name of a model attribute.

```python { .api }
# Model reporter examples
model_reporters = {
    # Function that takes model as argument
    "Total Agents": lambda model: len(model.agents),

    # Model attribute access
    "Current Step": "steps",
    "Is Running": "running",

    # Complex calculations
    "Average Wealth": lambda m: m.agents.agg("wealth", lambda vals: sum(vals) / len(vals) if vals else 0),

    # Custom function with multiple operations
    # (the returned dict is stored as a single value in one column)
    "Population Stats": lambda m: {
        "total": len(m.agents),
        "active": len(m.agents.select(lambda a: a.active)),
        "inactive": len(m.agents.select(lambda a: not a.active))
    }
}
```

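The difference between attribute-name reporters and function reporters can be pictured with a small stand-alone sketch. It is a simplified illustration, not Mesa's implementation: `resolve_reporter` and `toy_model` are hypothetical names, and the real collector handles more cases.

```python
from types import SimpleNamespace


def resolve_reporter(model, reporter):
    """Return the value a reporter yields for ``model`` (illustrative only)."""
    if isinstance(reporter, str):
        # Attribute name: look the value up on the model.
        return getattr(model, reporter, None)
    if callable(reporter):
        # Function or lambda: call it with the model as its only argument.
        return reporter(model)
    raise TypeError(f"unsupported reporter: {reporter!r}")


# Hypothetical stand-in for a model with two attributes.
toy_model = SimpleNamespace(steps=3, agents=["a", "b"])
reporters = {"Current Step": "steps", "Total Agents": lambda m: len(m.agents)}

# One collection pass produces one value per reporter name.
row = {name: resolve_reporter(toy_model, rep) for name, rep in reporters.items()}
print(row)  # {'Current Step': 3, 'Total Agents': 2}
```

Each collection pass yields one such row, which is why every reporter name becomes a column in the model-level DataFrame.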
### Agent Reporters

Agent reporters collect data from individual agents at each step. They specify which agent attributes or computed values to track.

```python { .api }
# Agent reporter examples
agent_reporters = {
    # Direct attribute access
    "Wealth": "wealth",
    "Position": "pos",
    "Energy": "energy",

    # Computed properties
    "Neighborhood Size": lambda agent: len(agent.get_neighbors()) if hasattr(agent, 'get_neighbors') else 0,

    # Guarded access for attributes that only some agents have
    "Social Network Size": lambda agent: len(agent.social_network) if hasattr(agent, 'social_network') else 0
}
```

### Agent Type Reporters

Agent type reporters collect data only from agents of a specific class at each step. The dictionary maps each agent class to its own set of reporters, and each reporter is an attribute name or a function of the agent, as with agent reporters. Per-type aggregates such as counts or averages can then be computed from the resulting DataFrame or in a model reporter.

```python { .api }
# Agent type reporter examples
# (Predator and Prey stand for agent classes defined elsewhere)
agenttype_reporters = {
    # Variables collected only from Predator agents
    Predator: {
        "Energy": "energy",
        "Hunting Success": "hunting_success"
    },

    # Variables collected only from Prey agents
    Prey: {
        "Energy": "energy",
        "Low Energy": lambda agent: agent.energy < 10
    }
}
```

### Custom Tables

Custom tables hold rows that do not fit the one-row-per-step time-series format, such as individual events. Each table is declared as a list of column names, and rows are added explicitly with `add_table_row`.

```python { .api }
# Custom table specification: table name -> list of column names
tables = {
    "interactions": ["step", "agent_1", "agent_2", "interaction_type", "outcome"],
    "events": ["step", "event_type", "location", "participants", "data"]
}
```

## Usage Examples

### Basic Data Collection Setup

```python { .api }
from mesa import Agent, Model, DataCollector

class WealthAgent(Agent):
    def __init__(self, model, wealth=100):
        super().__init__(model)
        self.wealth = wealth
        self.transactions = 0

    def step(self):
        # Simple wealth transfer; agents with nothing to give skip their turn
        if self.wealth > 0 and len(self.model.agents) > 1:
            other = self.random.choice([a for a in self.model.agents if a != self])
            transfer = self.random.randint(1, min(10, self.wealth))
            self.wealth -= transfer
            other.wealth += transfer
            self.transactions += 1

class WealthModel(Model):
    def __init__(self, n_agents=100, initial_wealth=100):
        super().__init__()

        # Set up data collection
        self.datacollector = DataCollector(
            model_reporters={
                "Total Wealth": lambda m: m.agents.agg("wealth", sum),
                "Gini Coefficient": self.calculate_gini,
                "Active Agents": lambda m: len(m.agents.select(lambda a: a.wealth > 0))
            },
            agent_reporters={
                "Wealth": "wealth",
                "Transactions": "transactions"
            }
        )

        # Create agents (they register themselves with the model)
        for _ in range(n_agents):
            WealthAgent(self, wealth=initial_wealth)

        self.running = True

    def calculate_gini(self):
        """Calculate Gini coefficient of wealth distribution."""
        wealth_values = sorted(self.agents.get("wealth"))
        n = len(wealth_values)
        total = sum(wealth_values)
        # Avoid division by zero for an empty population or zero total wealth
        if n == 0 or total == 0:
            return 0

        cumsum = sum((i + 1) * wealth for i, wealth in enumerate(wealth_values))
        return (2 * cumsum) / (n * total) - (n + 1) / n

    def step(self):
        # Collect first, so each row describes the state at the start of the step
        self.datacollector.collect(self)
        self.agents.shuffle_do("step")

# Run simulation and collect data
model = WealthModel(n_agents=50)
for i in range(100):
    model.step()

# Get collected data
model_data = model.datacollector.get_model_vars_dataframe()
agent_data = model.datacollector.get_agent_vars_dataframe()

print("Model data shape:", model_data.shape)
print("Agent data shape:", agent_data.shape)
print("Final Gini coefficient:", model_data["Gini Coefficient"].iloc[-1])
```

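The Gini formula used in `calculate_gini` can be checked in isolation on two small distributions whose answers are easy to verify by hand. The function below is a stand-alone restatement of that calculation for plain lists. The name `gini` and the sample values are chosen for illustration.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    # Weight each value by its 1-based rank in ascending order.
    weighted = sum((i + 1) * v for i, v in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))  # 0.0  (perfect equality)
print(gini([0, 0, 0, 4]))  # 0.75 (one agent holds everything)
```

For n agents, the maximum value is (n - 1) / n, which is why the second case gives 0.75 rather than 1.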
### Advanced Data Collection with Custom Tables

```python { .api }
from mesa import Agent, Model, DataCollector

class SocialAgent(Agent):
    def __init__(self, model, cooperation_level=0.5):
        super().__init__(model)
        self.cooperation_level = cooperation_level
        self.reputation = 0.5
        self.old_reputation = self.reputation  # Baseline for reputation_events
        self.interactions_this_step = []

    def interact(self, other):
        """Interact with another agent."""
        # Determine cooperation
        cooperate = self.random.random() < self.cooperation_level
        other_cooperates = self.random.random() < other.cooperation_level

        # Record interaction
        interaction_data = {
            "step": self.model.steps,
            "agent_1": self.unique_id,
            "agent_2": other.unique_id,
            "agent_1_cooperates": cooperate,
            "agent_2_cooperates": other_cooperates,
            "outcome": "mutual_cooperation" if cooperate and other_cooperates
                      else "mutual_defection" if not cooperate and not other_cooperates
                      else "mixed"
        }

        # Add to model's interaction table
        self.model.datacollector.add_table_row("interactions", interaction_data)

        # Update reputation based on outcome
        if cooperate and other_cooperates:
            self.reputation += 0.1
            other.reputation += 0.1
        elif cooperate and not other_cooperates:
            self.reputation -= 0.05
            other.reputation += 0.05

    def step(self):
        # Interact with random other agents
        num_interactions = self.random.randint(1, 3)
        for _ in range(num_interactions):
            if len(self.model.agents) > 1:
                other = self.random.choice([a for a in self.model.agents if a != self])
                self.interact(other)

class SocialModel(Model):
    def __init__(self, n_agents=50):
        super().__init__()

        # Set up comprehensive data collection
        self.datacollector = DataCollector(
            model_reporters={
                "Average Cooperation": lambda m: m.agents.agg("cooperation_level",
                    lambda vals: sum(vals) / len(vals)),
                "Average Reputation": lambda m: m.agents.agg("reputation",
                    lambda vals: sum(vals) / len(vals)),
                "High Cooperators": lambda m: len(m.agents.select(lambda a: a.cooperation_level > 0.7)),
                "Low Cooperators": lambda m: len(m.agents.select(lambda a: a.cooperation_level < 0.3))
            },
            agent_reporters={
                "Cooperation Level": "cooperation_level",
                "Reputation": "reputation"
            },
            # Agent class -> reporters applied to each agent of that class
            agenttype_reporters={
                SocialAgent: {
                    "Cooperation Level": "cooperation_level",
                    "Reputation": "reputation"
                }
            },
            # Table name -> list of column names
            tables={
                "interactions": ["step", "agent_1", "agent_2", "agent_1_cooperates",
                                 "agent_2_cooperates", "outcome"],
                "reputation_events": ["step", "agent_id", "old_reputation",
                                      "new_reputation", "cause"]
            }
        )

        # Create agents
        for _ in range(n_agents):
            cooperation = self.random.uniform(0.1, 0.9)
            SocialAgent(self, cooperation_level=cooperation)

        self.running = True

    def step(self):
        self.datacollector.collect(self)
        self.agents.shuffle_do("step")

        # Add reputation change events to custom table
        for agent in self.agents:
            if agent.reputation != agent.old_reputation:
                self.datacollector.add_table_row("reputation_events", {
                    "step": self.steps,
                    "agent_id": agent.unique_id,
                    "old_reputation": agent.old_reputation,
                    "new_reputation": agent.reputation,
                    "cause": "interaction"
                })
            agent.old_reputation = agent.reputation

# Run simulation
model = SocialModel(n_agents=30)
for i in range(50):
    model.step()

# Analyze collected data
model_data = model.datacollector.get_model_vars_dataframe()
agent_data = model.datacollector.get_agent_vars_dataframe()
interactions_data = model.datacollector.get_table_dataframe("interactions")
reputation_events = model.datacollector.get_table_dataframe("reputation_events")

print("Model evolution:")
print(model_data[["Average Cooperation", "Average Reputation"]].head(10))

print("\nInteraction summary:")
print(interactions_data["outcome"].value_counts())

# Select the last collected step from the (Step, AgentID) index
print("\nFinal agent states:")
final_step = agent_data.index.get_level_values("Step").max()
print(agent_data.xs(final_step, level="Step").describe())
```

### Multi-Agent Type Data Collection

```python { .api }
from mesa import Agent, Model, DataCollector

class Predator(Agent):
    def __init__(self, model, energy=100):
        super().__init__(model)
        self.energy = energy
        self.hunting_success = 0

    def step(self):
        self.energy -= 2
        if self.energy <= 0:
            self.remove()

class Prey(Agent):
    def __init__(self, model, energy=50):
        super().__init__(model)
        self.energy = energy
        self.escapes = 0

    def step(self):
        self.energy -= 1
        if self.energy <= 0:
            self.remove()

class EcosystemModel(Model):
    def __init__(self, n_predators=10, n_prey=50):
        super().__init__()

        # Multi-type data collection
        self.datacollector = DataCollector(
            model_reporters={
                "Total Population": lambda m: len(m.agents),
                "Predator-Prey Ratio": lambda m: (
                    len(m.agents.select(agent_type=Predator)) /
                    max(1, len(m.agents.select(agent_type=Prey)))
                ),
                "Ecosystem Stability": self.calculate_stability
            },
            agent_reporters={
                "Energy": "energy",
                "Type": lambda a: type(a).__name__
            },
            # Agent class -> reporters applied to each agent of that class
            agenttype_reporters={
                Predator: {
                    "Energy": "energy",
                    "Hunting Success": "hunting_success"
                },
                Prey: {
                    "Energy": "energy",
                    "Escapes": "escapes"
                }
            }
        )

        # Create ecosystem
        for _ in range(n_predators):
            Predator(self, energy=100)

        for _ in range(n_prey):
            Prey(self, energy=50)

        self.running = True

    def calculate_stability(self):
        """Calculate ecosystem stability metric."""
        predators = len(self.agents.select(agent_type=Predator))
        prey = len(self.agents.select(agent_type=Prey))

        if predators == 0 or prey == 0:
            return 0.0

        # Stability based on balanced populations
        total = predators + prey
        predator_ratio = predators / total

        # Optimal ratio around 0.2 (20% predators)
        optimal_ratio = 0.2
        stability = 1.0 - abs(predator_ratio - optimal_ratio) / optimal_ratio
        return max(0.0, stability)

    def step(self):
        self.datacollector.collect(self)
        self.agents.shuffle_do("step")

        # Check for extinction
        predators = len(self.agents.select(agent_type=Predator))
        prey = len(self.agents.select(agent_type=Prey))

        if predators == 0 or prey == 0:
            self.running = False

# Run ecosystem simulation
model = EcosystemModel(n_predators=8, n_prey=40)
for i in range(200):
    model.step()
    if not model.running:
        break

# Analyze multi-type data
model_data = model.datacollector.get_model_vars_dataframe()
predator_data = model.datacollector.get_agenttype_vars_dataframe(Predator)
prey_data = model.datacollector.get_agenttype_vars_dataframe(Prey)

print("Ecosystem dynamics:")
print(model_data[["Total Population", "Predator-Prey Ratio", "Ecosystem Stability"]].tail(10))

# Per-type frames hold one row per (Step, AgentID); aggregate them per step
print("\nPredator population (count and mean energy per step):")
print(predator_data.groupby(level="Step")["Energy"].agg(["count", "mean"]).tail(5))

print("\nPrey population (count and mean energy per step):")
print(prey_data.groupby(level="Step")["Energy"].agg(["count", "mean"]).tail(5))
```

## Data Export and Analysis

### Exporting Data

```python { .api }
# Export data to files
model_df = model.datacollector.get_model_vars_dataframe()
agent_df = model.datacollector.get_agent_vars_dataframe()

# Save to CSV
model_df.to_csv("model_data.csv")
agent_df.to_csv("agent_data.csv")

# Save to other formats
# (to_parquet needs an optional engine such as pyarrow or fastparquet)
model_df.to_parquet("model_data.parquet")
# Flatten the (Step, AgentID) index so each JSON record is self-describing
agent_df.reset_index().to_json("agent_data.json", orient="records")

# Custom table data
if "interactions" in model.datacollector.tables:
    interactions_df = model.datacollector.get_table_dataframe("interactions")
    interactions_df.to_csv("interactions.csv")
```

### Data Analysis Integration

```python { .api }
import pandas as pd

# Get data for analysis
# (assumes the collector reports "Total Agents" at model level and
# "Wealth" and "Transactions" at agent level)
model_data = model.datacollector.get_model_vars_dataframe()
agent_data = model.datacollector.get_agent_vars_dataframe()

# Time series analysis
model_data["Population_Change"] = model_data["Total Agents"].diff()
model_data["Population_Growth_Rate"] = model_data["Population_Change"] / model_data["Total Agents"].shift(1)

# Agent-level analysis
# Calculate summary statistics by agent
agent_summary = agent_data.groupby(level="AgentID").agg({
    "Wealth": ["mean", "std", "min", "max"],
    "Transactions": "sum"
})

# Time-based agent analysis
# Get agent wealth over time
agent_wealth_over_time = agent_data.reset_index().pivot(
    index="Step", columns="AgentID", values="Wealth"
)

# Calculate correlation between agents
wealth_correlations = agent_wealth_over_time.corr()

print("Model summary statistics:")
print(model_data.describe())

print("\nAgent summary statistics:")
print(agent_summary.head())
```

## Best Practices

### Performance Optimization

```python { .api }
# Efficient reporter functions
def efficient_model_reporter(model):
    """Example of efficient model reporter."""
    # Collect all needed data in one pass
    agent_data = model.agents.get(["wealth", "energy", "active"])
    wealth_values = [d[0] for d in agent_data]
    energy_values = [d[1] for d in agent_data]
    active_count = sum(1 for d in agent_data if d[2])

    return {
        "total_wealth": sum(wealth_values),
        "avg_wealth": sum(wealth_values) / len(wealth_values) if wealth_values else 0,
        "total_energy": sum(energy_values),
        "active_agents": active_count
    }

# Conditional data collection
def conditional_collector(model):
    """Only collect data when needed."""
    if model.steps % 10 == 0:  # Collect every 10 steps
        return model.calculate_complex_metric()
    return None  # Other steps record None
```

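A reporter that returns `None` on skipped steps still produces one entry per collected step, so the sparse series usually needs filtering before analysis. A minimal standard-library sketch of that clean-up follows. `expensive_metric` and `conditional_reporter` are hypothetical stand-ins that take a step number instead of a model.

```python
def expensive_metric(step):
    """Hypothetical stand-in for a costly model-level calculation."""
    return step * step


def conditional_reporter(step, every=10):
    """Return the metric on every ``every``-th step, otherwise None."""
    return expensive_metric(step) if step % every == 0 else None


# One entry per step, mostly None, mirroring what a collector would store.
collected = {step: conditional_reporter(step) for step in range(1, 31)}

# Keep only the steps where the metric was actually computed.
sampled = {step: value for step, value in collected.items() if value is not None}
print(sampled)  # {10: 100, 20: 400, 30: 900}
```

With a DataFrame, calling `dropna()` on the corresponding column gives the same result.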
### Memory Management

```python { .api }
# For long-running simulations, periodically export and clear data
class LongRunningModel(Model):
    def __init__(self):
        super().__init__()
        self.datacollector = DataCollector(
            model_reporters={"Population": lambda m: len(m.agents)}
        )
        self.export_interval = 1000

    def step(self):
        self.datacollector.collect(self)
        self.agents.shuffle_do("step")

        # Export and clear data periodically
        if self.steps % self.export_interval == 0:
            self.export_data()
            # Replace the collector so the exported batch can be garbage-collected
            self.datacollector = DataCollector(
                model_reporters={"Population": lambda m: len(m.agents)}
            )

    def export_data(self):
        """Export the current batch to a numbered CSV file."""
        df = self.datacollector.get_model_vars_dataframe()
        filename = f"data_batch_{self.steps // self.export_interval}.csv"
        df.to_csv(filename)
```
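
The export-and-clear pattern does not depend on Mesa and can be sketched with the standard library alone. In the sketch below, `BatchedRecorder` is a hypothetical helper that buffers rows and writes them to numbered CSV files. The temporary directory, column names, and interval are illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path


class BatchedRecorder:
    """Buffer (step, value) rows and flush them to numbered CSV files."""

    def __init__(self, out_dir, interval):
        self.out_dir = Path(out_dir)
        self.interval = interval
        self.rows = []
        self.batches_written = 0

    def record(self, step, value):
        self.rows.append((step, value))
        # Flush whenever a full interval of steps has been recorded.
        if step % self.interval == 0:
            self.flush()

    def flush(self):
        if not self.rows:
            return
        self.batches_written += 1
        path = self.out_dir / f"data_batch_{self.batches_written}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["step", "population"])
            writer.writerows(self.rows)
        self.rows.clear()  # Free the memory held by the exported batch


out_dir = tempfile.mkdtemp()
recorder = BatchedRecorder(out_dir, interval=10)
for step in range(1, 26):
    recorder.record(step, value=100 - step)

written = sorted(p.name for p in Path(out_dir).glob("*.csv"))
print(written)  # ['data_batch_1.csv', 'data_batch_2.csv']
```

Rows recorded after the last full interval stay in the buffer, so a final `flush()` at the end of a run is needed to avoid losing them.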