# Zonal Statistics

Calculate summary statistics of raster values aggregated to vector geometries. Zonal statistics summarize the raster values that fall within each vector feature, typically a polygon. Both numerical and categorical rasters are supported, with options for custom statistics, pixel-inclusion rules, and output format.

## Capabilities

### Primary Functions

Calculate summary statistics for vector geometries overlaid on raster data.

```python { .api }
def zonal_stats(vectors, raster, layer=0, band=1, nodata=None, affine=None,
                stats=None, all_touched=False, categorical=False,
                category_map=None, add_stats=None, zone_func=None,
                raster_out=False, prefix=None, geojson_out=False,
                boundless=True, progress=False, **kwargs):
    """
    Calculate zonal statistics for vector geometries.

    Parameters:
    - vectors: Path to vector source or geo-like python objects
    - raster: Path to raster file or numpy array (requires affine if ndarray)
    - layer: Vector layer to use, by name or number (default: 0)
    - band: Raster band number, counting from 1 (default: 1)
    - nodata: Override raster nodata value (default: None)
    - affine: Affine transform, required for numpy arrays (default: None)
    - stats: Statistics to calculate, as a list or string; None uses the defaults (default: None)
    - all_touched: Include every pixel the geometry touches instead of only pixels whose center falls inside it (default: False)
    - categorical: Enable categorical analysis mode (default: False)
    - category_map: Dict mapping raster values to category names (default: None)
    - add_stats: Dict mapping statistic names to custom functions (default: None)
    - zone_func: Function to apply to zone array before stats (default: None)
    - raster_out: Include masked array in output (default: False)
    - prefix: Prefix for output dictionary keys (default: None)
    - geojson_out: Return GeoJSON features with stats as properties (default: False)
    - boundless: Allow features beyond raster extent (default: True)
    - progress: Show a progress bar; requires tqdm to be installed (default: False)

    Returns:
    List of dictionaries with statistics for each feature
    (GeoJSON features when geojson_out=True)
    """

def gen_zonal_stats(vectors, raster, layer=0, band=1, nodata=None, affine=None,
                    stats=None, all_touched=False, categorical=False,
                    category_map=None, add_stats=None, zone_func=None,
                    raster_out=False, prefix=None, geojson_out=False,
                    boundless=True, **kwargs):
    """
    Generator version of zonal statistics.

    Parameters are identical to zonal_stats(), except that there is no
    progress parameter.

    Returns:
    Generator yielding dictionaries with statistics for each feature
    """

def raster_stats(*args, **kwargs):
    """
    Deprecated alias for zonal_stats. Will be removed in version 1.0.
    Use zonal_stats instead.
    """
```

### Available Statistics

Default and extended statistics options for numerical analysis.

```python { .api }
DEFAULT_STATS = ["count", "min", "max", "mean"]

VALID_STATS = [
    "count",     # Number of valid pixels
    "min",       # Minimum value
    "max",       # Maximum value
    "mean",      # Mean value
    "sum",       # Sum of values
    "std",       # Standard deviation
    "median",    # Median value
    "majority",  # Most frequent value
    "minority",  # Least frequent value
    "unique",    # Count of unique values
    "range",     # max - min
    "nodata",    # Count of nodata pixels
    "nan"        # Count of NaN pixels
]

# Percentiles supported via "percentile_N" format (e.g., "percentile_95")
```
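
To illustrate how these names fit together, the following sketch validates a statistics specification in plain Python. It is a simplified stand-in rather than the library's own code: `normalize_stats` and `parse_percentile` are hypothetical helpers, and the treatment of space-separated strings is an assumption (the `"ALL"` shortcut comes from the usage examples in this document).

```python
DEFAULT_STATS = ["count", "min", "max", "mean"]
VALID_STATS = DEFAULT_STATS + [
    "sum", "std", "median", "majority", "minority",
    "unique", "range", "nodata", "nan",
]


def parse_percentile(stat):
    """Return the numeric part of a "percentile_N" name as a float."""
    if not stat.startswith("percentile_"):
        raise ValueError("name must start with 'percentile_'")
    q = float(stat[len("percentile_"):])  # raises ValueError if not numeric
    if not 0.0 <= q <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    return q


def normalize_stats(stats=None):
    """Turn None, a string, or a list into a validated list of names."""
    if stats is None:
        return list(DEFAULT_STATS)
    if isinstance(stats, str):
        # Assumption: "ALL" selects everything; other strings split on whitespace.
        stats = list(VALID_STATS) if stats == "ALL" else stats.split()
    for name in stats:
        if name.startswith("percentile_"):
            parse_percentile(name)  # validates the numeric suffix
        elif name not in VALID_STATS:
            raise ValueError(f"unknown statistic: {name!r}")
    return list(stats)


print(normalize_stats("min max percentile_95"))
# ['min', 'max', 'percentile_95']
```

Validating up front means a misspelled statistic name fails before any raster data is read.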

### Statistical Utility Functions

Helper functions for statistics processing and validation.

```python { .api }
def check_stats(stats, categorical):
    """
    Validate and normalize statistics parameter.

    Parameters:
    - stats: Statistics specification (str, list, or None)
    - categorical: Whether in categorical mode

    Returns:
    Tuple of (validated_stats_list, run_count_flag)
    """

def get_percentile(stat):
    """
    Extract percentile value from statistic name.

    Parameters:
    - stat: Statistic name starting with "percentile_"

    Returns:
    Float percentile value (0-100)

    Raises:
    ValueError if invalid percentile format or value
    """

def stats_to_csv(stats):
    """
    Convert statistics list to CSV format.

    Parameters:
    - stats: List of statistics dictionaries

    Returns:
    CSV string with headers and data
    """
```
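
The conversion from a list of statistics dictionaries to CSV can be sketched with the standard library alone. `rows_to_csv` is a hypothetical helper written for illustration, not the library's `stats_to_csv`; it assumes flat dictionaries and builds the header from the sorted union of all keys.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of flat statistics dictionaries to a CSV string."""
    # Use the union of keys so features with missing statistics still fit.
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys and None values become empty cells
    return buffer.getvalue()


rows = [
    {"count": 4, "mean": 2.5},
    {"count": 0, "mean": None},  # e.g. a feature with no valid pixels
]
print(rows_to_csv(rows))
```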

### Categorical Analysis

Functions for analyzing categorical raster data with value mappings.

```python { .api }
def remap_categories(category_map, stats):
    """
    Remap category keys using category mapping.

    Parameters:
    - category_map: Dict mapping raster values to category names
    - stats: Statistics dictionary with numeric keys

    Returns:
    Statistics dictionary with remapped category names as keys
    """
```
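
The remapping step amounts to renaming dictionary keys. The sketch below uses a hypothetical `remap_keys` helper and assumes that raster values missing from the mapping keep their original numeric key; the library's behavior for unmapped values is not stated in this document.

```python
def remap_keys(category_map, counts):
    """Rename pixel-value keys to category names, leaving unmapped keys as-is."""
    return {category_map.get(value, value): n for value, n in counts.items()}


category_map = {1: "Forest", 2: "Water"}
pixel_counts = {1: 500, 2: 50, 7: 3}  # 7 has no entry in category_map

print(remap_keys(category_map, pixel_counts))
# {'Forest': 500, 'Water': 50, 7: 3}
```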

### Geometry Processing

Functions for converting geometries to raster-compatible formats, plus a dictionary helper used by the majority and minority statistics.

```python { .api }
def rasterize_geom(geom, like, all_touched=False):
    """
    Rasterize geometry to boolean mask array.

    Parameters:
    - geom: GeoJSON geometry dictionary
    - like: Raster object with desired shape and transform
    - all_touched: Include every pixel the geometry touches instead of only
      pixels whose center falls inside it

    Returns:
    Boolean numpy array mask
    """

def boxify_points(geom, rast):
    """
    Convert Point/MultiPoint geometries to box polygons for rasterization.
    Creates boxes 99% of cell size, centered on raster cells.

    Parameters:
    - geom: Point or MultiPoint geometry
    - rast: Raster object for cell size calculation

    Returns:
    MultiPolygon geometry with box polygons
    """

def key_assoc_val(d, func, exclude=None):
    """
    Return key associated with value returned by function.
    Used for majority/minority statistics.

    Parameters:
    - d: Dictionary to search
    - func: Function to apply to values (min, max, etc.)
    - exclude: Value to exclude from search

    Returns:
    Key associated with function result
    """
```
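
The idea behind `key_assoc_val` can be sketched as follows: count how often each pixel value occurs, then return the value whose count is the maximum (majority) or minimum (minority). `key_for` is a hypothetical helper; it omits the `exclude` parameter, and its tie-breaking rule is an assumption of this sketch.

```python
from collections import Counter


def key_for(counts, func):
    """Return the key whose count equals func(counts.values())."""
    target = func(counts.values())
    # On ties, the first key in insertion order wins in this sketch.
    return next(key for key, n in counts.items() if n == target)


pixels = [3, 1, 3, 2, 3, 1]
counts = Counter(pixels)         # value 3 occurs 3x, 1 occurs 2x, 2 occurs 1x

majority = key_for(counts, max)  # 3
minority = key_for(counts, min)  # 2
print(majority, minority)
```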

## Usage Examples

### Basic Zonal Statistics

```python
from rasterstats import zonal_stats

# Calculate default statistics for polygons
stats = zonal_stats("watersheds.shp", "elevation.tif")
print(stats[0])  # {'count': 1000, 'min': 100.0, 'max': 500.0, 'mean': 350.2}

# Specify custom statistics
stats = zonal_stats("polygons.shp", "temperature.tif",
                    stats=['min', 'max', 'mean', 'std', 'percentile_95'])

# Use all available statistics
stats = zonal_stats("zones.shp", "data.tif", stats="ALL")

# Show progress bar for large datasets (requires pip install rasterstats[progress])
stats = zonal_stats("large_dataset.shp", "raster.tif", progress=True)
```

### Categorical Analysis

```python
from rasterstats import zonal_stats

# Analyze land cover categories
category_map = {1: 'Forest', 2: 'Water', 3: 'Urban', 4: 'Agriculture'}
stats = zonal_stats("counties.shp", "landcover.tif",
                    categorical=True, category_map=category_map)
print(stats[0])  # {'Forest': 500, 'Water': 50, 'Urban': 200, 'Agriculture': 250}
```

### Custom Statistics Functions

```python
import numpy as np
from rasterstats import zonal_stats

def coefficient_of_variation(masked_array):
    # Ratio of standard deviation to mean over the unmasked pixels
    return np.std(masked_array) / np.mean(masked_array)

def pixel_density(masked_array, feature_properties):
    # Two-argument form: also receives the feature's properties
    area_km2 = feature_properties.get('area_km2', 1.0)
    return masked_array.count() / area_km2

custom_stats = {
    'cv': coefficient_of_variation,
    'density': pixel_density
}

# Custom statistics are reported under their add_stats keys ('cv', 'density').
# List only built-in names in `stats`; custom names are not valid there.
stats = zonal_stats("polygons.shp", "population.tif",
                    add_stats=custom_stats, stats=['mean'])
```

### Working with NumPy Arrays

```python
import numpy as np
from affine import Affine
from rasterstats import zonal_stats

# Create sample data
raster_data = np.random.rand(100, 100) * 1000
# North-up grid: top-left corner at (0, 100), 1x1 cells
transform = Affine.translation(0, 100) * Affine.scale(1, -1)

# Define polygons as GeoJSON
polygons = [{
    'type': 'Feature',
    'geometry': {
        'type': 'Polygon',
        'coordinates': [[(10, 10), (50, 10), (50, 50), (10, 50), (10, 10)]]
    },
    'properties': {'id': 1}
}]

stats = zonal_stats(polygons, raster_data, affine=transform,
                    stats=['count', 'mean', 'std'])
```
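
To make the role of the transform concrete, the sketch below reproduces the center-point rule (the `all_touched=False` default) in plain Python for the grid and square polygon used above. `pixel_center` and `in_box` are hypothetical helpers; the library rasterizes geometries itself and may treat edge cases differently, so this is an illustration of the rule rather than a description of its implementation.

```python
def pixel_center(col, row, origin_x=0.0, origin_y=100.0, cell=1.0):
    """World coordinates of a pixel center on a north-up grid."""
    # Same result as applying translation(origin) * scale(cell, -cell)
    # to the point (col + 0.5, row + 0.5).
    return origin_x + (col + 0.5) * cell, origin_y - (row + 0.5) * cell


def in_box(x, y, xmin=10, ymin=10, xmax=50, ymax=50):
    """True if the point lies inside the example's square polygon."""
    return xmin <= x <= xmax and ymin <= y <= ymax


selected = [
    (col, row)
    for row in range(100)
    for col in range(100)
    if in_box(*pixel_center(col, row))
]

rows = sorted({row for _, row in selected})
cols = sorted({col for col, _ in selected})
print(len(selected), (rows[0], rows[-1]), (cols[0], cols[-1]))
# 1600 (50, 89) (10, 49)
```

Because the y-scale is negative, the polygon's y-range of 10 to 50 maps to array rows 50 through 89, not rows 10 through 49.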

### GeoJSON Output

```python
from rasterstats import zonal_stats

# Return results as GeoJSON features with statistics as properties
features = zonal_stats("watersheds.shp", "rainfall.tif",
                       geojson_out=True, prefix="rain_",
                       stats=['mean', 'sum', 'count'])

# Each feature retains original geometry and properties plus statistics
print(features[0]['properties'])
# {'name': 'Watershed A', 'rain_mean': 45.2, 'rain_sum': 45200, 'rain_count': 1000}
```

### Advanced Options

```python
import numpy as np
from rasterstats import zonal_stats

# Include raster arrays in output
stats = zonal_stats("polygons.shp", "elevation.tif", raster_out=True)
mini_array = stats[0]['mini_raster_array']    # Masked numpy array
mini_affine = stats[0]['mini_raster_affine']  # Corresponding transform

# Custom zone processing function
def apply_mask(zone_array):
    # Only analyze pixels at or above the threshold (values below 100 are masked)
    return np.ma.masked_where(zone_array < 100, zone_array)

stats = zonal_stats("zones.shp", "data.tif", zone_func=apply_mask)

# Handle features extending beyond raster
stats = zonal_stats("large_polygons.shp", "small_raster.tif",
                    boundless=True)  # The default; allows partial coverage
```

## Types

```python { .api }
from typing import Union, List, Dict, Any, Optional, Callable
from numpy import ndarray, ma
from affine import Affine

StatsSpec = Union[str, List[str], None]
CategoryMapping = Dict[Union[int, float], str]
CustomStats = Dict[str, Callable[[ma.MaskedArray], Union[int, float]]]
ZoneFunction = Callable[[ma.MaskedArray], Optional[ma.MaskedArray]]
ZonalOutput = Dict[str, Union[int, float, None, ma.MaskedArray, Affine]]
```