# Data Analysis

Statistical analysis, time series computation, and performance metrics for transit operations. This module provides comprehensive analysis capabilities for routes, stops, trips, and system-wide metrics across multiple dates and time periods.

## Route Analysis

### Route Statistics

```python { .api }
def compute_route_stats_0(trip_stats_subset, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute route statistics for a trip subset.

    Parameters:
    - trip_stats_subset (DataFrame): Subset of trip statistics
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Route statistics
    """

def compute_route_stats(feed, trip_stats_subset, dates, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute route statistics for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats_subset (DataFrame): Trip statistics subset
    - dates (list): List of dates to analyze
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Route statistics with date index
    """
```
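
The `headway_start_time` and `headway_end_time` parameters bound the window used for headway calculations. As a rough illustration of the idea, and not the library's actual implementation, the plain-Python sketch below averages the gaps between consecutive departures that fall inside the window. The helper names `to_seconds` and `mean_headway_minutes`, the inclusive window bounds, and the sample departures are all assumptions made for this example.

```python
def to_seconds(timestr):
    """Convert an HH:MM:SS string to seconds past midnight."""
    h, m, s = (int(x) for x in timestr.split(":"))
    return 3600 * h + 60 * m + s


def mean_headway_minutes(departures, start="07:00:00", end="19:00:00"):
    """Mean gap in minutes between consecutive departures within [start, end]."""
    lo, hi = to_seconds(start), to_seconds(end)
    # Keep only departures inside the window (bounds assumed inclusive here)
    secs = sorted(t for t in map(to_seconds, departures) if lo <= t <= hi)
    if len(secs) < 2:
        return None  # a headway needs at least two departures
    gaps = [b - a for a, b in zip(secs, secs[1:])]
    return sum(gaps) / len(gaps) / 60


# Hypothetical departures for one route; only three fall inside the window
departures = ["06:30:00", "07:00:00", "07:20:00", "07:50:00", "19:30:00"]
print(mean_headway_minutes(departures))  # 25.0
```

Widening the window, for example to `start="06:00:00"`, pulls the 06:30 departure into the calculation and changes the mean, which is why the two parameters matter when comparing routes.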

### Route Time Series

```python { .api }
def compute_route_time_series_0(trip_stats_subset, date_label='20010101', freq='5Min', *, split_directions=False):
    """
    Compute route time series for a trip subset.

    Parameters:
    - trip_stats_subset (DataFrame): Trip statistics subset
    - date_label (str): Date label for the time series
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Route time series
    """

def build_zero_route_time_series(feed, date_label='20010101', freq='5Min', *, split_directions=False):
    """
    Build a zero-filled route time series template.

    Parameters:
    - feed (Feed): GTFS feed object
    - date_label (str): Date label for the time series
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Zero-filled route time series
    """

def compute_route_time_series(feed, trip_stats_subset, dates, freq='5Min', *, split_directions=False):
    """
    Compute route time series for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats_subset (DataFrame): Trip statistics subset
    - dates (list): List of dates to analyze
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Route time series with hierarchical columns
    """
```
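
The `freq` parameter sets the width of the time bins, and the zero-filled template ensures that bins with no service still appear in the result. The sketch below shows that pattern in plain Python: it builds an all-zero set of bins for one day, then counts trip start times into them. The function names, the use of integer minutes for bin labels, and the wrapping of times past `24:00:00` onto the same day are assumptions of this example, not a description of the library's internals.

```python
def build_zero_bins(freq_minutes=5):
    """Return {bin_start_minute: 0} covering one 24-hour day."""
    return {start: 0 for start in range(0, 24 * 60, freq_minutes)}


def count_trip_starts(start_times, freq_minutes=5):
    """Count HH:MM:SS start times into fixed-width bins, keeping empty bins at 0."""
    bins = build_zero_bins(freq_minutes)
    for t in start_times:
        h, m, _ = (int(x) for x in t.split(":"))
        # GTFS times may exceed 24:00:00; this sketch wraps them onto the same day
        minute = (60 * h + m) % (24 * 60)
        bins[minute - minute % freq_minutes] += 1
    return bins


# Hypothetical trip start times
bins = count_trip_starts(["07:01:00", "07:04:59", "07:05:00", "25:02:00"])
print(bins[7 * 60], bins[7 * 60 + 5], bins[60])  # 2 1 1
```

Starting from a zero template rather than from the observed data means every route or stop ends up with the same number of rows, which makes series for different dates straightforward to align.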

## Stop Analysis

### Stop Statistics

```python { .api }
def compute_stop_stats_0(stop_times_subset, trip_subset, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute stop statistics for data subsets.

    Parameters:
    - stop_times_subset (DataFrame): Stop times subset
    - trip_subset (DataFrame): Trip subset
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Stop statistics
    """

def compute_stop_stats(feed, dates, stop_ids=None, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute stop statistics for specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze
    - stop_ids (list, optional): Specific stop IDs to analyze
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Stop statistics with date index
    """

def compute_stop_activity(feed, dates):
    """
    Mark stops as active or inactive on specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze

    Returns:
    - DataFrame: Stop activity indicators by date
    """
```
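
To picture what an activity table of the kind returned by `compute_stop_activity` might contain, the sketch below marks each stop with 1 or 0 per date. It assumes a stop counts as active on a date when at least one trip running that day visits it; that rule, the `stop_activity` helper, and the sample trips are assumptions of this example rather than documented library behavior.

```python
def stop_activity(trip_dates, trip_stops, dates):
    """Return {stop_id: {date: 0 or 1}} from per-trip service dates and stop lists."""
    all_stops = sorted({s for stops in trip_stops.values() for s in stops})
    # Start with every stop inactive on every date
    activity = {s: {d: 0 for d in dates} for s in all_stops}
    for trip_id, stops in trip_stops.items():
        active_dates = trip_dates.get(trip_id, set())
        for d in dates:
            if d in active_dates:
                for s in stops:
                    activity[s][d] = 1
    return activity


# Hypothetical data: dates on which each trip runs, and the stops it visits
trip_dates = {"t1": {"20230101", "20230102"}, "t2": {"20230102"}}
trip_stops = {"t1": ["A", "B"], "t2": ["B", "C"]}

activity = stop_activity(trip_dates, trip_stops, ["20230101", "20230102"])
print(activity["C"])  # {'20230101': 0, '20230102': 1}
```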

### Stop Time Series

```python { .api }
def compute_stop_time_series_0(stop_times_subset, trip_subset, freq='5Min', date_label='20010101', *, split_directions=False):
    """
    Compute stop time series for data subsets.

    Parameters:
    - stop_times_subset (DataFrame): Stop times subset
    - trip_subset (DataFrame): Trip subset
    - freq (str): Frequency for time series sampling
    - date_label (str): Date label for the time series
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Stop time series
    """

def build_zero_stop_time_series(feed, date_label='20010101', freq='5Min', *, split_directions=False):
    """
    Build a zero-filled stop time series template.

    Parameters:
    - feed (Feed): GTFS feed object
    - date_label (str): Date label for the time series
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Zero-filled stop time series
    """

def compute_stop_time_series(feed, dates, stop_ids=None, freq='5Min', *, split_directions=False):
    """
    Compute stop time series for specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze
    - stop_ids (list, optional): Specific stop IDs to analyze
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Stop time series with hierarchical columns
    """
```

## Trip Analysis

### Trip Statistics and Operations

```python { .api }
def get_active_services(feed, date):
    """
    Get list of service IDs active on a specific date.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str): Date in YYYYMMDD format

    Returns:
    - list: Service IDs active on the date
    """

def compute_trip_activity(feed, dates):
    """
    Mark trips as active or inactive on specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze

    Returns:
    - DataFrame: Trip activity indicators by date
    """

def compute_busiest_date(feed, dates):
    """
    Get the date with maximum number of active trips.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze

    Returns:
    - str: Date with maximum active trips
    """

def compute_trip_stats(feed, route_ids=None, *, compute_dist_from_shapes=False):
    """
    Compute comprehensive trip statistics.

    Parameters:
    - feed (Feed): GTFS feed object
    - route_ids (list, optional): Specific route IDs to analyze
    - compute_dist_from_shapes (bool): Calculate distances from shapes

    Returns:
    - DataFrame: Trip statistics including distances, durations, speeds
    """

def name_stop_patterns(feed):
    """
    Assign stop pattern names to trips based on stop sequences.

    Parameters:
    - feed (Feed): GTFS feed object

    Returns:
    - DataFrame: Trips with assigned stop pattern names
    """

def locate_trips(feed, date, times):
    """
    Get trip positions at specified times.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str): Date in YYYYMMDD format
    - times (list): List of times in HH:MM:SS format

    Returns:
    - DataFrame: Trip positions and status at specified times
    """

def build_route_timetable(feed, route_id, dates):
    """
    Build a route timetable showing departure times at stops.

    Parameters:
    - feed (Feed): GTFS feed object
    - route_id (str): Route ID to build timetable for
    - dates (list): List of dates in YYYYMMDD format

    Returns:
    - DataFrame: Route timetable with stops and departure times
    """

def build_stop_timetable(feed, stop_id, dates):
    """
    Build a stop timetable showing all arrivals/departures.

    Parameters:
    - feed (Feed): GTFS feed object
    - stop_id (str): Stop ID to build timetable for
    - dates (list): List of dates in YYYYMMDD format

    Returns:
    - DataFrame: Stop timetable with trip arrivals and departures
    """
```
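
Finding the busiest date reduces to counting active trips per date and taking the maximum. The sketch below shows that reduction on a hand-written trip activity mapping. The `busiest_date` helper, the tie-breaking rule (the earliest date in the input list wins), and the sample data are assumptions of this example.

```python
def busiest_date(trip_dates, dates):
    """Return the date in `dates` with the most active trips (first one wins ties)."""
    counts = {
        d: sum(1 for active in trip_dates.values() if d in active)
        for d in dates
    }
    # max() returns the first maximal element, so ties go to the earliest listed date
    return max(dates, key=lambda d: counts[d])


# Hypothetical mapping from trip ID to the dates on which the trip runs
trip_dates = {
    "t1": {"20230101", "20230102"},
    "t2": {"20230102", "20230103"},
    "t3": {"20230102"},
}
print(busiest_date(trip_dates, ["20230101", "20230102", "20230103"]))  # 20230102
```

Running the heavier statistics only on the busiest date is a common way to get a representative picture of peak service without processing every date in the feed.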

## Feed-Level Analysis

### Feed Statistics

```python { .api }
def compute_feed_stats_0(feed, trip_stats_subset, *, split_route_types=False):
    """
    Compute feed-level statistics for a trip subset.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats_subset (DataFrame): Trip statistics subset
    - split_route_types (bool): Split statistics by route type

    Returns:
    - DataFrame: Feed-level statistics
    """

def compute_feed_stats(feed, trip_stats, dates, *, split_route_types=False):
    """
    Compute feed-level statistics for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats (DataFrame): Trip statistics
    - dates (list): List of dates to analyze
    - split_route_types (bool): Split statistics by route type

    Returns:
    - DataFrame: Feed statistics with date index
    """

def compute_feed_time_series(feed, trip_stats, dates, freq='5Min', *, split_route_types=False):
    """
    Compute feed-level time series for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats (DataFrame): Trip statistics
    - dates (list): List of dates to analyze
    - freq (str): Frequency for time series sampling
    - split_route_types (bool): Split by route type

    Returns:
    - DataFrame: Feed time series with hierarchical columns
    """
```
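
The `split_route_types` flag controls whether feed-level figures are reported as one total or broken down by route type. The sketch below illustrates that choice on a few hand-written trip records. The `summarize` helper, the record fields (`route_type`, `distance`, `duration`), and the output keys are hypothetical and are not the library's column names.

```python
from collections import defaultdict


def summarize(trips, split_route_types=False):
    """Aggregate trip records into one total or one summary per route type."""
    groups = defaultdict(list)
    for trip in trips:
        # Group everything under a single key unless a breakdown is requested
        key = trip["route_type"] if split_route_types else "all"
        groups[key].append(trip)
    return {
        key: {
            "num_trips": len(rows),
            "service_distance": sum(r["distance"] for r in rows),
            "service_duration": sum(r["duration"] for r in rows),
        }
        for key, rows in groups.items()
    }


# Hypothetical trips: route_type 3 is bus and 1 is subway in the GTFS specification
trips = [
    {"route_type": 3, "distance": 10.0, "duration": 0.5},
    {"route_type": 3, "distance": 5.0, "duration": 0.25},
    {"route_type": 1, "distance": 20.0, "duration": 0.5},
]
print(summarize(trips))
print(summarize(trips, split_route_types=True))
```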

## Usage Examples

### Basic Route Analysis

```python
import gtfs_kit as gk

# Load feed and compute trip statistics
feed = gk.read_feed('gtfs.zip', dist_units='km')
trip_stats = gk.compute_trip_stats(feed)

# Analyze routes for specific dates
dates = ['20230101', '20230102', '20230103']
route_stats = gk.compute_route_stats(feed, trip_stats, dates)

# Generate route time series
route_ts = gk.compute_route_time_series(feed, trip_stats, dates, freq='15Min')
```

### Stop Performance Analysis

```python
# Reuses `gk` and `feed` from the Basic Route Analysis example

# Compute stop statistics with custom headway period
stop_stats = gk.compute_stop_stats(
    feed,
    dates=['20230101'],
    headway_start_time='06:00:00',
    headway_end_time='22:00:00',
    split_directions=True
)

# Generate stop time series
stop_ts = gk.compute_stop_time_series(
    feed,
    dates=['20230101'],
    freq='10Min'
)
```

### System-Wide Analysis

```python
# Reuses `gk`, `feed`, `trip_stats`, and `dates` from the Basic Route Analysis example

# Find the busiest operating day
busiest_date = gk.compute_busiest_date(feed, dates)

# Compute feed-level statistics
feed_stats = gk.compute_feed_stats(feed, trip_stats, dates, split_route_types=True)

# Generate system-wide time series
feed_ts = gk.compute_feed_time_series(feed, trip_stats, dates)
```

Functions that take a `dates` argument accept a list of `YYYYMMDD` date strings. The headway window is set with `headway_start_time` and `headway_end_time`, time series resolution with `freq`, and breakdowns with `split_directions` or `split_route_types`.