# Data Aggregation

Compute and track partition attributes including election results, demographic data, and structural properties. Updater functions automatically calculate district-level summaries whenever partitions change.
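
To make the updater idea concrete, here is a minimal toy model of "recompute summaries when the partition changes". It is only a sketch and not gerrychain's implementation: `ToyPartition`, `flip`, and `district_sizes` are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict


class ToyPartition:
    """Hypothetical stand-in for a partition; not gerrychain's implementation."""

    def __init__(
        self,
        assignment: Dict[str, int],
        updaters: Dict[str, Callable[["ToyPartition"], Any]],
    ) -> None:
        self.assignment = assignment
        self.updaters = updaters
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        # Compute each updater at most once per partition, on first access
        if key not in self._cache:
            self._cache[key] = self.updaters[key](self)
        return self._cache[key]

    def flip(self, changes: Dict[str, int]) -> "ToyPartition":
        # A changed assignment yields a new partition with an empty cache,
        # so every updater is recomputed for the new districts
        return ToyPartition({**self.assignment, **changes}, self.updaters)


def district_sizes(partition: ToyPartition) -> Dict[int, int]:
    """Example updater: number of nodes assigned to each district."""
    sizes: Dict[int, int] = {}
    for district in partition.assignment.values():
        sizes[district] = sizes.get(district, 0) + 1
    return sizes


before = ToyPartition({"a": 1, "b": 1, "c": 2}, {"sizes": district_sizes})
after = before.flip({"b": 2})
print(before["sizes"], after["sizes"])
```

Each partition carries its own results, so values read from `before` are unaffected by the change that produced `after`.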

## Capabilities

### Election Data Handling

Track and analyze election results across districts with automatic vote tallying and percentage calculations.

```python { .api }
class Election:
    def __init__(
        self,
        name: str,
        columns: Union[Dict[str, str], List[str]],
        alias: str = None
    ) -> None:
        """
        Create election updater for tracking vote data by district.

        Parameters:
        - name (str): Name identifier for this election
        - columns (Union[Dict[str, str], List[str]]): Either dict mapping party names to column names, or list of column names that serve as both party names and columns
        - alias (str, optional): Alternative name for accessing results

        Returns:
        None
        """
```

When used as an updater, an Election instance returns an ElectionResults object for the partition:

```python { .api }
class ElectionResults:
    def percents(self, party: str) -> Tuple[float, ...]:
        """
        Get vote percentages for a party across all districts.

        Parameters:
        - party (str): Party name

        Returns:
        Tuple[float, ...]: Vote percentages by district
        """

    def counts(self, party: str) -> Tuple[int, ...]:
        """
        Get raw vote counts for a party across all districts.

        Parameters:
        - party (str): Party name

        Returns:
        Tuple[int, ...]: Vote counts by district
        """

    @property
    def totals_for_party(self) -> Dict[str, Dict[int, float]]:
        """
        Get vote totals organized by party and district.

        Returns:
        Dict[str, Dict[int, float]]: Party -> District -> votes
        """

    @property
    def totals(self) -> Dict[int, int]:
        """
        Get total votes by district.

        Returns:
        Dict[int, int]: District -> total votes
        """
```

Usage example:

```python
from gerrychain import GeographicPartition
from gerrychain.updaters import Election

# Set up election tracking
election = Election("SEN18", ["SEN18D", "SEN18R"])  # List format
# Or: election = Election("SEN18", {"Democratic": "SEN18D", "Republican": "SEN18R"})  # Dict format

# Use in partition (`graph` is a gerrychain Graph loaded elsewhere)
partition = GeographicPartition(
    graph,
    assignment="district",
    updaters={"SEN18": election}
)

# Access results
election_results = partition["SEN18"]  # Returns ElectionResults object
dem_votes = election_results.counts("SEN18D")  # Tuple of counts by district
dem_percents = election_results.percents("SEN18D")  # Tuple of percentages by district
total_votes = election_results.totals  # Dict[district_id, total_votes]
```
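
The tallying that an election updater performs can be sketched in plain Python. This is an illustration of the arithmetic only, not the library's code: the precinct data, the `tally_election` helper, and the choice to report shares as fractions of the district total are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical precinct data: vote columns per node, plus a district assignment
nodes = {
    "p1": {"SEN18D": 120, "SEN18R": 80},
    "p2": {"SEN18D": 60, "SEN18R": 140},
    "p3": {"SEN18D": 90, "SEN18R": 110},
}
assignment = {"p1": 1, "p2": 1, "p3": 2}


def tally_election(
    nodes: Dict[str, Dict[str, int]],
    assignment: Dict[str, int],
    columns: List[str],
) -> Tuple[Dict[str, Dict[int, int]], Dict[int, int], Dict[str, Dict[int, float]]]:
    """Sum each party's votes per district, then derive totals and shares."""
    totals_for_party: Dict[str, Dict[int, int]] = {c: {} for c in columns}
    for node, district in assignment.items():
        for column in columns:
            by_district = totals_for_party[column]
            by_district[district] = by_district.get(district, 0) + nodes[node][column]

    districts = sorted(set(assignment.values()))
    totals = {d: sum(totals_for_party[c].get(d, 0) for c in columns) for d in districts}
    # Share of the district total; NaN when a district has no votes at all
    shares = {
        c: {
            d: totals_for_party[c].get(d, 0) / totals[d] if totals[d] else float("nan")
            for d in districts
        }
        for c in columns
    }
    return totals_for_party, totals, shares


totals_for_party, totals, shares = tally_election(nodes, assignment, ["SEN18D", "SEN18R"])
print(totals_for_party["SEN18D"], totals, shares["SEN18D"])
```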

### Generic Data Tallying

Aggregate arbitrary numeric data by district using flexible tally functions.

```python { .api }
class Tally:
    def __init__(
        self,
        columns: Union[str, List[str]],
        alias: str = None
    ) -> None:
        """
        Create tally updater for summing data by district.

        Parameters:
        - columns (Union[str, List[str]]): Column name(s) to sum
        - alias (str, optional): Alternative name for accessing results

        Returns:
        None
        """

class DataTally:
    def __init__(
        self,
        columns: Union[str, List[str]],
        alias: str = None
    ) -> None:
        """
        Generic data tally with additional processing options.

        Parameters:
        - columns (Union[str, List[str]]): Column name(s) to aggregate
        - alias (str, optional): Alternative name for accessing results

        Returns:
        None
        """
```

Usage example:

```python
from gerrychain import GeographicPartition
from gerrychain.updaters import Tally

# Set up demographic tallies (`graph` is a gerrychain Graph loaded elsewhere)
partition = GeographicPartition(
    graph,
    assignment="district",
    updaters={
        "population": Tally("TOTPOP"),
        "vap": Tally("VAP"),  # Voting age population
        "minority_pop": Tally(["BVAP", "HVAP", "ASIANVAP"]),
        "households": Tally("households")
    }
)

# Access tallied data for each district
for district_id in partition.parts:
    district_pop = partition["population"][district_id]
    minority_pop = partition["minority_pop"][district_id]
```
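
As a rough sketch of what a tally computes, the following plain-Python function sums one or several node columns per district. The node data and the `tally` helper are hypothetical and only mirror the behavior described above; they are not the library's implementation.

```python
from typing import Dict, List, Union

# Hypothetical node attributes and district assignment
nodes = {
    "p1": {"TOTPOP": 1000, "BVAP": 200, "HVAP": 100},
    "p2": {"TOTPOP": 1500, "BVAP": 300, "HVAP": 50},
    "p3": {"TOTPOP": 800, "BVAP": 100, "HVAP": 400},
}
assignment = {"p1": 1, "p2": 2, "p3": 2}


def tally(
    nodes: Dict[str, Dict[str, float]],
    assignment: Dict[str, int],
    columns: Union[str, List[str]],
) -> Dict[int, float]:
    """Sum one or more node columns over the nodes of each district."""
    if isinstance(columns, str):
        columns = [columns]
    result: Dict[int, float] = {}
    for node, district in assignment.items():
        # With several columns, each node contributes the sum of its values
        node_value = sum(nodes[node][column] for column in columns)
        result[district] = result.get(district, 0) + node_value
    return result


population = tally(nodes, assignment, "TOTPOP")
minority_vap = tally(nodes, assignment, ["BVAP", "HVAP"])
print(population, minority_vap)
```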

### Structural Properties

Track graph-theoretic and geometric properties of partitions.

```python { .api }
def cut_edges(partition: Partition) -> Set[Tuple[NodeId, NodeId]]:
    """
    Find edges that cross district boundaries.

    Parameters:
    - partition (Partition): Partition to analyze

    Returns:
    Set[Tuple[NodeId, NodeId]]: Set of edges crossing districts
    """

def cut_edges_by_part(partition: Partition) -> Dict[DistrictId, Set[Tuple[NodeId, NodeId]]]:
    """
    Find cut edges grouped by district.

    Parameters:
    - partition (Partition): Partition to analyze

    Returns:
    Dict[DistrictId, Set[Tuple[NodeId, NodeId]]]: Cut edges by district
    """

def county_splits(
    partition: Partition,
    county_column: str = "county"
) -> Dict[str, int]:
    """
    Count number of districts each county is split across.

    Parameters:
    - partition (Partition): Partition to analyze
    - county_column (str): Column name for county identifiers

    Returns:
    Dict[str, int]: Number of districts per county
    """

def boundary_nodes(partition: Partition) -> Set[NodeId]:
    """
    Find all nodes on district boundaries.

    Parameters:
    - partition (Partition): Partition to analyze

    Returns:
    Set[NodeId]: Set of nodes on district boundaries
    """

def exterior_boundaries(partition: Partition) -> Set[Tuple[NodeId, NodeId]]:
    """
    Find edges on the exterior boundary of the partition.

    Parameters:
    - partition (Partition): Partition to analyze

    Returns:
    Set[Tuple[NodeId, NodeId]]: Exterior boundary edges
    """

def interior_boundaries(partition: Partition) -> Set[Tuple[NodeId, NodeId]]:
    """
    Find edges on interior boundaries between districts.

    Parameters:
    - partition (Partition): Partition to analyze

    Returns:
    Set[Tuple[NodeId, NodeId]]: Interior boundary edges
    """

def flows_from_changes(
    changes: Dict[NodeId, DistrictId],
    pop_col: str = "population"
) -> Dict[Tuple[DistrictId, DistrictId], float]:
    """
    Calculate population flows from partition changes.

    Parameters:
    - changes (Dict[NodeId, DistrictId]): Node assignment changes
    - pop_col (str): Population column name

    Returns:
    Dict[Tuple[DistrictId, DistrictId], float]: Flow between district pairs
    """

class CountySplit:
    def __init__(self, county_column: str = "county") -> None:
        """
        Track county splits across districts.

        Parameters:
        - county_column (str): Column name for county data

        Returns:
        None
        """
```

Usage example:

```python
from gerrychain import GeographicPartition
from gerrychain.updaters import cut_edges, county_splits

# Track structural properties (`graph` is a gerrychain Graph loaded elsewhere)
partition = GeographicPartition(
    graph,
    assignment="district",
    updaters={
        "cut_edges": cut_edges,
        "county_splits": county_splits
    }
)

# Access properties
num_cut_edges = len(partition["cut_edges"])

# Keep only counties that span more than one district
split_counties = {
    county: count
    for county, count in partition["county_splits"].items()
    if count > 1
}
```
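
The two structural quantities used above can be computed by hand on a small graph, which may help when checking results. The sketch below uses plain Python data structures; the edge list, the `county_of` mapping, and the helper names `find_cut_edges` and `districts_per_county` are hypothetical and not part of the library.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]

# Hypothetical 4-node cycle graph with two districts and two counties
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
assignment = {"a": 1, "b": 1, "c": 2, "d": 2}
county_of = {"a": "Adams", "b": "Adams", "c": "Adams", "d": "Baker"}


def find_cut_edges(edges: Iterable[Edge], assignment: Dict[Hashable, int]) -> Set[Edge]:
    """Edges whose endpoints are assigned to different districts."""
    return {(u, v) for u, v in edges if assignment[u] != assignment[v]}


def districts_per_county(
    assignment: Dict[Hashable, int], county_of: Dict[Hashable, str]
) -> Dict[str, int]:
    """Number of distinct districts that contain nodes from each county."""
    seen: Dict[str, Set[int]] = {}
    for node, district in assignment.items():
        seen.setdefault(county_of[node], set()).add(district)
    return {county: len(districts) for county, districts in seen.items()}


cut = find_cut_edges(edges, assignment)
splits = districts_per_county(assignment, county_of)
print(cut, splits)
```

Here Adams county has nodes in both districts, so it counts as split, while Baker lies entirely within district 2.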

### Complete Updater Example

Example showing comprehensive data tracking in a real analysis workflow:

```python
import numpy as np

from gerrychain import GeographicPartition, Graph
from gerrychain.updaters import Election, Tally, cut_edges, county_splits

# Load data
graph = Graph.from_file("precincts.shp")

# Set up comprehensive updaters
partition = GeographicPartition(
    graph,
    assignment="district",
    updaters={
        # Demographics
        "population": Tally("TOTPOP"),
        "vap": Tally("VAP"),
        "white_pop": Tally("WVAP"),
        "black_pop": Tally("BVAP"),
        "hispanic_pop": Tally("HVAP"),

        # Elections
        "SEN18": Election("SEN18", ["SEN18D", "SEN18R"]),
        "GOV18": Election("GOV18", ["GOV18D", "GOV18R"]),
        "PRES16": Election("PRES16", ["PRES16D", "PRES16R"]),

        # Structure
        "cut_edges": cut_edges,
        "county_splits": county_splits,

        # Economic data
        "median_income": Tally("median_income"),  # Sum of precinct values, not a district median
        "poverty_count": Tally("poverty_count")  # Raw count; rates need a custom updater
    }
)

# Use in analysis
for district in partition.parts:
    print(f"District {district}:")
    print(f"  Population: {partition['population'][district]:,}")
    # BVAP is a voting-age count, so divide by VAP rather than total population
    print(f"  % Black VAP: {100 * partition['black_pop'][district] / partition['vap'][district]:.1f}%")

    # ElectionResults exposes per-party and per-district totals as dicts
    sen = partition["SEN18"]
    dem_pct = 100 * sen.totals_for_party["SEN18D"][district] / sen.totals[district]
    print(f"  Senate Dem %: {dem_pct:.1f}%")

    # A cut edge touches this district if either endpoint is assigned to it
    district_cut_edges = [
        (u, v) for u, v in partition["cut_edges"]
        if district in (partition.assignment[u], partition.assignment[v])
    ]
    print(f"  Cut edges: {len(district_cut_edges)}")
    print()

# Track changes over a Markov chain
# (`chain` is assumed to be a MarkovChain built elsewhere from this partition)
populations = []
cut_edge_counts = []

for state in chain:
    populations.append(list(state["population"].values()))
    cut_edge_counts.append(len(state["cut_edges"]))

# Analyze distributions
print(f"Population std dev: {np.std(populations[-1]):.0f}")
print(f"Avg cut edges: {np.mean(cut_edge_counts):.1f}")
```

### Custom Updater Functions

Examples of creating custom updater functions for specialized analysis:

```python
from gerrychain import GeographicPartition
from gerrychain.metrics import polsby_popper, schwartzberg
from gerrychain.updaters import Tally


def minority_vap_percent(partition):
    """Calculate minority VAP share (a fraction between 0 and 1) by district."""
    result = {}
    for district in partition.parts:
        total_vap = partition["vap"][district]
        minority_vap = (
            partition["black_pop"][district]
            + partition["hispanic_pop"][district]
            + partition["asian_pop"][district]
        )
        # Guard against empty districts to avoid division by zero
        result[district] = minority_vap / total_vap if total_vap > 0 else 0
    return result


def compactness_scores(partition):
    """Calculate multiple compactness measures."""
    return {
        "polsby_popper": polsby_popper(partition),
        "schwartzberg": schwartzberg(partition)
    }


# Use custom updaters (`graph` is a gerrychain Graph loaded elsewhere)
partition = GeographicPartition(
    graph,
    assignment="district",
    updaters={
        "population": Tally("TOTPOP"),
        "vap": Tally("VAP"),
        "black_pop": Tally("BVAP"),
        "hispanic_pop": Tally("HVAP"),
        "asian_pop": Tally("ASIANVAP"),
        "minority_vap_pct": minority_vap_percent,
        "compactness": compactness_scores
    }
)
```
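
Because a custom updater is just a callable that takes a partition and reads other updaters through `partition[...]`, it can be checked in isolation before it is attached to a real partition. The sketch below assumes a hypothetical `population_deviation` updater and uses a plain dict as a stand-in for the partition; neither is part of the library.

```python
from typing import Dict, Mapping


def population_deviation(partition: Mapping[str, Dict[int, float]]) -> Dict[int, float]:
    """Relative deviation of each district's population from the mean district size."""
    populations = partition["population"]
    ideal = sum(populations.values()) / len(populations)
    return {district: (pop - ideal) / ideal for district, pop in populations.items()}


# A plain dict stands in for a partition here, since this updater only needs
# partition["population"]; in real use it would read a registered tally instead.
fake_partition = {"population": {1: 900, 2: 1000, 3: 1100}}
deviation = population_deviation(fake_partition)
print(deviation)
```

Testing the function against a stand-in like this keeps the arithmetic separate from graph loading and partition setup.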

## Types

```python { .api }
UpdaterFunction = Callable[[Partition], Any]
DistrictId = int
NodeId = Union[int, str]
VoteData = Dict[str, int]  # Party -> vote count
PercentageData = Dict[str, float]  # Party -> percentage
```