# Utilities

Constants, helper functions, and miscellaneous utilities for GTFS data manipulation. The module covers configuration constants, calendar operations, feed information and quality assessment, feed filtering, and general-purpose helpers.

## Constants

### GTFS Reference Data

```python { .api }
GTFS_REF: pd.DataFrame
```

Reference DataFrame containing GTFS table and column specifications with data types, requirements, and validation rules.

```python { .api }
DTYPE: dict
```

Data types dictionary for Pandas CSV reads, based on the GTFS reference specifications.

```python { .api }
FEED_ATTRS: list
```

List of primary feed attributes for all standard GTFS tables: `['agency', 'stops', 'routes', 'trips', 'stop_times', 'calendar', 'calendar_dates', 'fare_attributes', 'fare_rules', 'shapes', 'frequencies', 'transfers', 'feed_info', 'attributions']`.

### Unit Constants

```python { .api }
DIST_UNITS: list
```

Valid distance units: `['ft', 'mi', 'm', 'km']`.

```python { .api }
WGS84: str
```

WGS84 coordinate reference system identifier: `'EPSG:4326'`.

### Visualization Constants

```python { .api }
COLORS_SET2: list
```

ColorBrewer 8-class Set2 palette: a list of hex color codes suited to categorical data visualization.

```python { .api }
STOP_STYLE: dict
```

Default Leaflet `circleMarker` style parameters for displaying stops on maps.

## Calendar Operations

### Date Management

```python { .api }
def get_dates(feed, *, as_date_obj=False):
    """
    Get all dates for which the feed is valid.

    Parameters:
    - feed (Feed): GTFS feed object
    - as_date_obj (bool): Return datetime.date objects instead of YYYYMMDD strings

    Returns:
    - list: Valid dates as YYYYMMDD strings (or datetime.date objects); empty if the feed has no calendar information
    """

def subset_dates(feed, dates):
    """
    Subset the given dates to those within the feed's service period.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): YYYYMMDD date strings to filter

    Returns:
    - list: The given dates that also appear in get_dates(feed)
    """
```
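
To make the notion of a feed's dates concrete, here is a minimal sketch in plain Python. It assumes, without confirming it against gtfs_kit's source, that the valid dates form the contiguous span from the earliest to the latest date mentioned in `calendar` (`start_date`, `end_date`) and `calendar_dates` (`date`). Tables are modeled as lists of dicts, and `service_date_span` and `subset_to_span` are hypothetical names.

```python
import datetime as dt


def service_date_span(calendar, calendar_dates):
    """Hypothetical sketch: every YYYYMMDD date between the earliest and latest
    dates mentioned in the calendar and calendar_dates tables (lists of dicts)."""
    bounds = [row["start_date"] for row in calendar]
    bounds += [row["end_date"] for row in calendar]
    bounds += [row["date"] for row in calendar_dates]
    if not bounds:
        return []  # no calendar information at all
    # YYYYMMDD strings sort chronologically, so min/max work directly on them
    start = dt.datetime.strptime(min(bounds), "%Y%m%d").date()
    end = dt.datetime.strptime(max(bounds), "%Y%m%d").date()
    num_days = (end - start).days
    return [(start + dt.timedelta(days=d)).strftime("%Y%m%d") for d in range(num_days + 1)]


def subset_to_span(span, dates):
    """Keep only the requested dates that fall inside the span, preserving order."""
    valid = set(span)
    return [d for d in dates if d in valid]
```

Under this assumption, a calendar running from 20230102 to 20230105 plus an added service on 20230107 yields the six dates 20230102 through 20230107.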

### Week Operations

```python { .api }
def get_week(feed, k, *, as_date_obj=False):
    """
    Get the kth Monday-Sunday week (or an initial segment thereof) of the feed's service period.

    Parameters:
    - feed (Feed): GTFS feed object
    - k (int): Week number, counted from 1 (k=1 is the first week)
    - as_date_obj (bool): Return datetime.date objects instead of YYYYMMDD strings

    Returns:
    - list: Dates in the specified week; empty if the feed has fewer than k weeks
    """

def get_first_week(feed, *, as_date_obj=False):
    """
    Get the first Monday-Sunday week of the feed's service period; equivalent to get_week(feed, 1).

    Parameters:
    - feed (Feed): GTFS feed object
    - as_date_obj (bool): Return datetime.date objects instead of YYYYMMDD strings

    Returns:
    - list: Dates in the first week
    """
```
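
A Monday-Sunday week lookup can be sketched as follows. This hypothetical helper (`kth_week`) is not gtfs_kit's code; it numbers weeks from 1, assumes the input is a sorted, gap-free list of YYYYMMDD strings such as a feed's span of dates, and assumes week 1 begins on the first Monday in that list.

```python
import datetime as dt


def kth_week(dates, k):
    """Hypothetical sketch of a 1-indexed Monday-Sunday week lookup.

    dates: sorted, contiguous list of YYYYMMDD strings.
    Returns the kth week starting on a Monday (possibly truncated at the end
    of the span), or [] if there are fewer than k such weeks.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    as_dates = [dt.datetime.strptime(d, "%Y%m%d").date() for d in dates]
    # Position of the first Monday (weekday() == 0) in the span, if any
    first_monday = next((i for i, d in enumerate(as_dates) if d.weekday() == 0), None)
    if first_monday is None:
        return []
    start = first_monday + 7 * (k - 1)
    return dates[start:start + 7]  # shorter than 7 days if the span ends mid-week
```

For a span starting on Sunday 20230101, week 1 runs from Monday 20230102 to Sunday 20230108.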

## Helper Functions

### Date and Time Utilities

```python { .api }
def datestr_to_date(x, format_str='%Y%m%d', *, inverse=False):
    """
    Convert between date strings and datetime.date objects.

    Parameters:
    - x: Date string, or datetime.date object if inverse
    - format_str (str): Date format string
    - inverse (bool): If True, convert a datetime.date to a string

    Returns:
    - datetime.date or str: Converted date
    """

def timestr_to_seconds(x, *, inverse=False, mod24=False):
    """
    Convert an HH:MM:SS time string to seconds past midnight, or the reverse.
    As GTFS allows, the hours may exceed 23.

    Parameters:
    - x: Time string in HH:MM:SS format, or number of seconds if inverse
    - inverse (bool): If True, convert seconds to a time string
    - mod24 (bool): Take the number of seconds modulo 24 hours (when inverse, before converting)

    Returns:
    - int or str: Seconds or time string
    """

def timestr_mod24(timestr):
    """
    Take the hours of a time string modulo 24.

    Parameters:
    - timestr (str): Time string in HH:MM:SS format

    Returns:
    - str: Time string in the same format, with the hours taken modulo 24
    """

def weekday_to_str(weekday, *, inverse=False):
    """
    Convert between weekday numbers and lowercase weekday names.

    Parameters:
    - weekday: Weekday number (0=Monday, ..., 6=Sunday), or weekday name if inverse
    - inverse (bool): If True, convert a weekday name to its number

    Returns:
    - str or int: Weekday name (e.g. 'monday') or number
    """
```
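
The time and weekday conversions above amount to simple arithmetic and lookups. The sketch below is a hypothetical re-implementation for illustration (the `_sketch` names are not part of gtfs_kit); it accepts hours past 23, as GTFS does for trips that run past midnight, and assumes well-formed input without error handling.

```python
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def timestr_to_seconds_sketch(x, *, inverse=False, mod24=False):
    """Hypothetical: HH:MM:SS <-> seconds past midnight (hours may exceed 23)."""
    if not inverse:
        hours, minutes, seconds = (int(part) for part in x.split(":"))
        total = 3600 * hours + 60 * minutes + seconds
        return total % (24 * 3600) if mod24 else total
    total = int(x)
    if mod24:
        total %= 24 * 3600  # wrap before formatting
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def timestr_mod24_sketch(timestr):
    """Hypothetical: same HH:MM:SS format with the hours taken modulo 24."""
    hours, minutes, seconds = (int(part) for part in timestr.split(":"))
    return f"{hours % 24:02d}:{minutes:02d}:{seconds:02d}"


def weekday_to_str_sketch(weekday, *, inverse=False):
    """Hypothetical: 0 -> 'monday', ..., 6 -> 'sunday', or the reverse."""
    return WEEKDAYS.index(weekday) if inverse else WEEKDAYS[weekday]
```

For example, a 25:30:00 departure is 91800 seconds past midnight, or 01:30:00 on the next calendar day.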

### Geometric Utilities

```python { .api }
def get_segment_length(linestring, p, q=None):
    """
    Get the distance along a LineString between the projections of two points.

    Parameters:
    - linestring: Shapely LineString
    - p: Shapely Point, projected onto the line to give the start position
    - q (optional): Shapely Point, projected onto the line to give the end position; if None, measure from the start of the line to the projection of p

    Returns:
    - float: Segment length in the linestring's native coordinate units
    """

def is_metric(dist_units):
    """
    Check if distance units are metric.

    Parameters:
    - dist_units (str): Distance units string

    Returns:
    - bool: True if dist_units is 'm' or 'km'
    """

def get_convert_dist(dist_units_in, dist_units_out):
    """
    Get a function that converts distances between units.

    Parameters:
    - dist_units_in (str): Input distance units (one of DIST_UNITS)
    - dist_units_out (str): Output distance units (one of DIST_UNITS)

    Returns:
    - function: Function mapping a distance in dist_units_in to the same distance in dist_units_out
    """
```
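
A distance converter of the kind `get_convert_dist` returns reduces to a ratio of metres-per-unit factors. This sketch uses the exact international definitions (1 ft = 0.3048 m, 1 mi = 1609.344 m); the function names are hypothetical and the approach is an assumption, not gtfs_kit's actual code.

```python
# Metres per unit for each distance unit in DIST_UNITS (exact international definitions)
METERS_PER_UNIT = {"ft": 0.3048, "mi": 1609.344, "m": 1.0, "km": 1000.0}


def get_convert_dist_sketch(dist_units_in, dist_units_out):
    """Hypothetical: return a function converting distances between two units."""
    for units in (dist_units_in, dist_units_out):
        if units not in METERS_PER_UNIT:
            raise ValueError(f"Distance units must be one of {list(METERS_PER_UNIT)}")
    # Convert to metres, then from metres to the output units, in one factor
    factor = METERS_PER_UNIT[dist_units_in] / METERS_PER_UNIT[dist_units_out]
    return lambda d: d * factor


def is_metric_sketch(dist_units):
    """Hypothetical: metric means metres or kilometres."""
    return dist_units in ("m", "km")
```

For example, `get_convert_dist_sketch("mi", "km")(1)` gives 1.609344 (up to floating-point rounding).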

### Data Utilities

```python { .api }
def almost_equal(f, g):
    """
    Check if two DataFrames are equal after sorting their column names and values and resetting their indices.

    Parameters:
    - f (DataFrame): First DataFrame
    - g (DataFrame): Second DataFrame

    Returns:
    - bool: True if the DataFrames are almost equal
    """

def is_not_null(df, col_name):
    """
    Check if a DataFrame has the given column and that column contains at least one non-null value.

    Parameters:
    - df (DataFrame): DataFrame to check
    - col_name (str): Column name to check

    Returns:
    - bool: True if the column exists and has at least one non-null value
    """

def get_max_runs(x):
    """
    Find the runs of the maximum value in a sequence of numbers.

    Parameters:
    - x: Array-like of numbers

    Returns:
    - ndarray: Pairs (start index, end index + 1), one per run of the maximum value
    """

def get_peak_indices(times, counts):
    """
    Find the first longest period during which the counts are at their maximum.

    Parameters:
    - times: Increasing array of times in seconds past midnight
    - counts: Array of counts at those times (same nonzero length as times)

    Returns:
    - ndarray: Pair of indices (i, j) such that counts[x] is maximal for all i <= x < j
    """

def make_ids(n, prefix='id_'):
    """
    Generate n unique, sequentially labelled ID strings.

    Parameters:
    - n (int): Number of IDs to generate
    - prefix (str): Prefix for IDs

    Returns:
    - list: List of unique ID strings
    """

def longest_subsequence(seq, mode='strictly', order='increasing', key=None, *, index=False):
    """
    Find the longest monotonic subsequence of a sequence.

    Parameters:
    - seq: Input sequence
    - mode (str): 'strictly' for strict monotonicity, 'non' for non-strict
    - order (str): 'increasing' or 'decreasing'
    - key: Optional key function applied to elements before comparison
    - index (bool): Return indices instead of values

    Returns:
    - list: Longest subsequence, or its indices
    """
```
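
`longest_subsequence` can be understood through the classic patience-sorting algorithm for longest increasing subsequences, which runs in O(n log n). The sketch below is a swapped-in textbook technique, not gtfs_kit's implementation; it mirrors the documented `mode`, `order`, `key`, and `index` parameters, and it handles decreasing order by negating keys, so it assumes numeric keys.

```python
from bisect import bisect_left, bisect_right


def longest_subsequence_sketch(seq, mode="strictly", order="increasing", key=None, *, index=False):
    """Hypothetical O(n log n) patience-sorting sketch, not gtfs_kit's code.

    mode: 'strictly' or 'non'; order: 'increasing' or 'decreasing'.
    Decreasing order is handled by negating keys, so keys must be numeric.
    """
    if not seq:
        return []
    key = key or (lambda x: x)
    sign = 1 if order == "increasing" else -1
    vals = [sign * key(x) for x in seq]
    # bisect_left lets an equal value replace a tail (strict);
    # bisect_right lets an equal value extend a subsequence (non-strict)
    find = bisect_left if mode == "strictly" else bisect_right
    tails = []                # tails[k]: smallest tail of a subsequence of length k + 1
    tail_idx = []             # tail_idx[k]: position in seq of that tail
    prev = [None] * len(seq)  # back-pointers for reconstruction
    for i, v in enumerate(vals):
        k = find(tails, v)
        if k == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[k] = v
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else None
    # Follow back-pointers from the tail of the longest subsequence
    positions = []
    i = tail_idx[-1]
    while i is not None:
        positions.append(i)
        i = prev[i]
    positions.reverse()
    return positions if index else [seq[j] for j in positions]
```

For example, `longest_subsequence_sketch([3, 1, 2, 5, 4])` returns `[1, 2, 4]`, while `mode="non"` on `[1, 2, 2, 1, 3]` keeps the repeated value and returns `[1, 2, 2, 3]`.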

### Time Series Utilities

```python { .api }
def get_active_trips_df(trip_times):
    """
    Count the trips active at each time.

    Parameters:
    - trip_times (DataFrame): Trip times with start_time and end_time columns in seconds past midnight

    Returns:
    - Series: Number of active trips, indexed by the times at which trips start or end
    """

def combine_time_series(time_series_dict, kind, *, split_directions=False):
    """
    Combine multiple time series DataFrames into one DataFrame with hierarchical columns.

    Parameters:
    - time_series_dict (dict): Dictionary mapping indicator names to time series DataFrames
    - kind (str): Type of time series, 'route' or 'stop'
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Combined time series
    """

def downsample(time_series, freq):
    """
    Downsample a time series to a lower frequency.

    Parameters:
    - time_series (DataFrame): Input time series
    - freq (str): Target frequency, as a Pandas frequency string

    Returns:
    - DataFrame: Downsampled time series
    """

def unstack_time_series(time_series):
    """
    Unstack hierarchical time series columns.

    Parameters:
    - time_series (DataFrame): Hierarchical time series

    Returns:
    - DataFrame: Unstacked time series
    """

def restack_time_series(unstacked_time_series):
    """
    Restack an unstacked time series.

    Parameters:
    - unstacked_time_series (DataFrame): Unstacked time series

    Returns:
    - DataFrame: Restacked time series
    """
```
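
Counting active trips is a sweep over start and end events. The sketch below is hypothetical (plain Python instead of Pandas, and `count_active_trips` is not a gtfs_kit name); it assumes each trip is active from its start time up to, but not including, its end time.

```python
from collections import defaultdict


def count_active_trips(trip_times):
    """Hypothetical sweep-line sketch, not gtfs_kit's implementation.

    trip_times: iterable of (start_time, end_time) pairs in seconds past midnight.
    A trip counts as active on the half-open interval [start_time, end_time).
    Returns {time: number of active trips} for every time a trip starts or ends.
    """
    deltas = defaultdict(int)
    for start, end in trip_times:
        deltas[start] += 1  # trip becomes active
        deltas[end] -= 1    # trip stops being active
    counts, active = {}, 0
    for t in sorted(deltas):
        active += deltas[t]
        counts[t] = active
    return counts
```

With the half-open convention, a trip ending at the same second another begins does not inflate the count at that moment.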

### HTML and GeoJSON Utilities

```python { .api }
def make_html(d):
    """
    Convert a dictionary to a two-column HTML table of keys and values.

    Parameters:
    - d (dict): Dictionary to convert

    Returns:
    - str: HTML string
    """

def drop_feature_ids(collection):
    """
    Remove the 'id' attribute, if present, from each Feature in a GeoJSON FeatureCollection.

    Parameters:
    - collection (dict): GeoJSON FeatureCollection

    Returns:
    - dict: Collection without feature IDs
    """
```
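
Both helpers are simple dictionary transformations. The following hypothetical sketch shows one way to write them with the standard library; it does not reproduce gtfs_kit's exact HTML markup, and escaping the cell contents and copying the collection are design choices of this sketch.

```python
import copy
import html


def make_html_sketch(d):
    """Hypothetical: render a dict as a two-column HTML table (keys, values)."""
    rows = "".join(
        f"<tr><td>{html.escape(str(k))}</td><td>{html.escape(str(v))}</td></tr>"
        for k, v in d.items()
    )
    return f"<table>{rows}</table>"


def drop_feature_ids_sketch(collection):
    """Hypothetical: copy of a FeatureCollection with each Feature's 'id' removed."""
    result = copy.deepcopy(collection)  # leave the input untouched
    for feature in result.get("features", []):
        feature.pop("id", None)  # no error if a Feature has no id
    return result
```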

## Feed Information and Quality Assessment

### Feed Metadata

```python { .api }
def list_fields(feed, table=None):
    """
    Describe the columns of a GTFS table, or of all tables if none is given.

    Parameters:
    - feed (Feed): GTFS feed object
    - table (str, optional): Specific table to describe

    Returns:
    - DataFrame: One row per table column, with counts of values, non-null values, and unique values, plus minimum and maximum values
    """

def describe(feed, sample_date=None):
    """
    Get feed indicators and summary values, e.g. the number of routes.

    Parameters:
    - feed (Feed): GTFS feed object
    - sample_date (str, optional): YYYYMMDD date for date-specific indicators, e.g. the number of routes active on that date

    Returns:
    - DataFrame: Indicators, with columns 'indicator' and 'value'
    """

def assess_quality(feed):
    """
    Assess feed quality using various indicators, e.g. the number of trips missing shapes.

    Parameters:
    - feed (Feed): GTFS feed object

    Returns:
    - DataFrame: Quality indicators, with columns 'indicator' and 'value'
    """
```

### Feed Modification

```python { .api }
def convert_dist(feed, new_dist_units):
    """
    Convert the feed's distances (shape_dist_traveled values) to new units.

    Parameters:
    - feed (Feed): GTFS feed object
    - new_dist_units (str): Target distance units (one of DIST_UNITS)

    Returns:
    - Feed: Resulting feed with converted distance units
    """

def create_shapes(feed, *, all_trips=False):
    """
    Create shapes for trips by connecting their stops with straight lines.

    Parameters:
    - feed (Feed): GTFS feed object
    - all_trips (bool): If True, create new shapes for all trips, replacing the old shapes; otherwise, only for trips without a shape

    Returns:
    - Feed: Resulting feed with updated shapes and trips tables
    """
```
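
The straight-line approach behind `create_shapes` can be sketched on plain data structures. In this hypothetical version, `stop_times` is a list of dicts, `stops` maps each `stop_id` to `(lon, lat)`, trips with identical stop patterns share a shape, and shape IDs follow an invented `shape_N` scheme; none of these choices are taken from gtfs_kit.

```python
from collections import defaultdict


def shapes_from_stops(stop_times, stops):
    """Hypothetical sketch: straight-line shapes connecting each trip's stops.

    stop_times: list of dicts with trip_id, stop_id, stop_sequence
    stops: dict mapping stop_id -> (stop_lon, stop_lat)
    Returns (shape point rows, {trip_id: shape_id}); trips with the same stop
    pattern share one shape.
    """
    patterns = defaultdict(list)
    for row in stop_times:
        patterns[row["trip_id"]].append((row["stop_sequence"], row["stop_id"]))
    shape_rows, trip_to_shape, shape_for_pattern = [], {}, {}
    for trip_id in sorted(patterns):
        # Stops in stop_sequence order define the trip's pattern
        pattern = tuple(stop_id for _, stop_id in sorted(patterns[trip_id]))
        if pattern not in shape_for_pattern:
            shape_id = f"shape_{len(shape_for_pattern)}"  # hypothetical ID scheme
            shape_for_pattern[pattern] = shape_id
            for seq, stop_id in enumerate(pattern):
                lon, lat = stops[stop_id]
                shape_rows.append({"shape_id": shape_id, "shape_pt_sequence": seq,
                                   "shape_pt_lon": lon, "shape_pt_lat": lat})
        trip_to_shape[trip_id] = shape_for_pattern[pattern]
    return shape_rows, trip_to_shape
```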

## Feed Filtering and Restriction

### Trip-Based Filtering

```python { .api }
def restrict_to_trips(feed, trip_ids):
    """
    Build a new feed restricted to the given trips and the stops, routes, shapes, etc. they use.

    Parameters:
    - feed (Feed): GTFS feed object (not modified)
    - trip_ids (list): Trip IDs to retain

    Returns:
    - Feed: New feed restricted to the specified trips
    """

def restrict_to_routes(feed, route_ids):
    """
    Build a new feed restricted to the given routes and the stops, trips, shapes, etc. they use.

    Parameters:
    - feed (Feed): GTFS feed object (not modified)
    - route_ids (list): Route IDs to retain

    Returns:
    - Feed: New feed restricted to the specified routes
    """

def restrict_to_agencies(feed, agency_ids):
    """
    Build a new feed restricted to the routes of the given agencies and the stops, trips, shapes, etc. they use.

    Parameters:
    - feed (Feed): GTFS feed object (not modified)
    - agency_ids (list): Agency IDs to retain

    Returns:
    - Feed: New feed restricted to the specified agencies
    """
```
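
Restriction cascades from the selected entities to everything they reference. This hypothetical sketch shows the route case on tables modeled as lists of dicts; a real feed would also need its shapes, calendars, agencies, transfers, and parent stations restricted, which are omitted here.

```python
def restrict_to_routes_sketch(tables, route_ids):
    """Hypothetical sketch of cascading restriction over GTFS tables (lists of dicts).

    Keeps the given routes, their trips, those trips' stop_times, and the stops
    those stop_times use. Returns new lists; the input tables are not modified.
    """
    route_ids = set(route_ids)
    routes = [r for r in tables["routes"] if r["route_id"] in route_ids]
    trips = [t for t in tables["trips"] if t["route_id"] in route_ids]
    trip_ids = {t["trip_id"] for t in trips}
    stop_times = [st for st in tables["stop_times"] if st["trip_id"] in trip_ids]
    stop_ids = {st["stop_id"] for st in stop_times}
    stops = [s for s in tables["stops"] if s["stop_id"] in stop_ids]
    return {"routes": routes, "trips": trips, "stop_times": stop_times, "stops": stops}
```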

### Temporal and Spatial Filtering

```python { .api }
def restrict_to_dates(feed, dates):
    """
    Build a new feed restricted to the trips active on at least one of the given dates.

    Parameters:
    - feed (Feed): GTFS feed object (not modified)
    - dates (list): YYYYMMDD date strings to retain

    Returns:
    - Feed: New feed restricted to the specified dates
    """

def restrict_to_area(feed, area):
    """
    Build a new feed restricted to the trips with at least one stop in the area, along with their stops, routes, stop times, etc.

    Parameters:
    - feed (Feed): GTFS feed object (not modified)
    - area: Shapely Polygon or MultiPolygon defining the area

    Returns:
    - Feed: New feed restricted to the specified geographic area
    """
```
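
Deciding which trips run on a date follows the GTFS calendar rules: a `calendar` row applies when the date lies between `start_date` and `end_date` and that weekday's flag is 1, and then `calendar_dates` exceptions add (`exception_type` 1) or remove (`exception_type` 2) service. The sketch below applies those rules to tables modeled as lists of dicts; the function names are hypothetical, and this is not necessarily how gtfs_kit computes active trips.

```python
import datetime as dt

DAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def active_service_ids(calendar, calendar_dates, date):
    """Service IDs running on a YYYYMMDD date under the GTFS calendar rules.

    calendar rows: service_id, monday..sunday (1/0), start_date, end_date
    calendar_dates rows: service_id, date, exception_type (1 added, 2 removed)
    """
    weekday = DAY_COLUMNS[dt.datetime.strptime(date, "%Y%m%d").weekday()]
    # YYYYMMDD strings compare chronologically, so string comparison is safe here
    active = {
        row["service_id"]
        for row in calendar
        if row["start_date"] <= date <= row["end_date"] and row[weekday] == 1
    }
    for row in calendar_dates:
        if row["date"] != date:
            continue
        if row["exception_type"] == 1:
            active.add(row["service_id"])
        elif row["exception_type"] == 2:
            active.discard(row["service_id"])
    return active


def trips_active_on_dates(trips, calendar, calendar_dates, dates):
    """Hypothetical helper: IDs of trips running on at least one of the dates."""
    services = set()
    for date in dates:
        services |= active_service_ids(calendar, calendar_dates, date)
    return {t["trip_id"] for t in trips if t["service_id"] in services}
```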

## Advanced Analysis

### Screen Line Analysis

```python { .api }
def compute_screen_line_counts(feed, screen_lines, dates, segmentize_m=5, *, include_testing_cols=False):
    """
    Find the trips active on the given dates that cross the screen lines, and record each crossing.

    Parameters:
    - feed (Feed): GTFS feed object
    - screen_lines: Collection of LineString geometries
    - dates (list): YYYYMMDD dates to analyze
    - segmentize_m (float): Segmentization distance in meters
    - include_testing_cols (bool): Include debugging columns

    Returns:
    - DataFrame: One row per crossing of a screen line by a trip, with details such as the crossing time and direction
    """
```
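
The core geometric question in screen line analysis is whether a trip's path crosses a line, and in which direction. The sketch below swaps in a standard 2D orientation (cross product) test on planar coordinates, such as a projected metric CRS; it is not gtfs_kit's implementation, it ignores touching and collinear cases, and the function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when b lies left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def crossing_direction(p1, p2, q1, q2):
    """Hypothetical: does path segment p1 -> p2 properly cross screen line q1 -> q2?

    Returns 1 for a crossing from the right of the screen line to its left,
    -1 for left to right, and 0 if the segments do not properly intersect.
    """
    d1 = cross(q1, q2, p1)  # side of the screen line on which p1 lies
    d2 = cross(q1, q2, p2)
    d3 = cross(p1, p2, q1)  # sides of the path segment for the screen line ends
    d4 = cross(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return 1 if d1 < 0 else -1
    return 0


def count_crossings(path, screen_line):
    """Directions (+1/-1) of every crossing of a polyline path over a screen line."""
    q1, q2 = screen_line
    directions = [crossing_direction(a, b, q1, q2) for a, b in zip(path, path[1:])]
    return [d for d in directions if d != 0]
```

For a north-pointing screen line from (0, -1) to (0, 1), a path going from (1, 0) west to (-1, 0) crosses right to left and yields 1.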

### Stop Time Operations

```python { .api }
def get_stop_times(feed, date=None):
    """
    Get the feed's stop_times, optionally restricted to trips active on a date.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str, optional): Service date (YYYYMMDD)

    Returns:
    - DataFrame: Stop times data
    """

def append_dist_to_stop_times(feed):
    """
    Calculate the shape_dist_traveled field of stop_times, in the feed's distance units.

    Parameters:
    - feed (Feed): GTFS feed object

    Returns:
    - Feed: Resulting feed with updated stop_times
    """

def get_start_and_end_times(feed, date=None):
    """
    Get the earliest departure time and latest arrival time in the feed's stop_times.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str, optional): YYYYMMDD date to restrict to

    Returns:
    - tuple: (earliest_departure, latest_arrival) as time strings
    """
```
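
Finding the earliest departure and latest arrival needs care, because GTFS times can exceed 24:00:00 and may be written as H:MM:SS, so plain string comparison can misorder them. This hypothetical sketch compares times as seconds past midnight and skips blank values; the data layout (a list of dicts) is an assumption.

```python
def to_seconds(timestr):
    """HH:MM:SS (or H:MM:SS) to seconds past midnight; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in timestr.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def start_and_end_times(stop_times):
    """Hypothetical sketch: earliest departure_time and latest arrival_time.

    Rows with a missing (None or empty) time are skipped.
    """
    departures = [st["departure_time"] for st in stop_times if st.get("departure_time")]
    arrivals = [st["arrival_time"] for st in stop_times if st.get("arrival_time")]
    # Compare numerically: as strings, "10:00:00" would sort before "7:05:00"
    return min(departures, key=to_seconds), max(arrivals, key=to_seconds)
```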

### Timetable Generation

```python { .api }
def build_route_timetable(feed, route_id, dates):
    """
    Build a timetable for a route on the given dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - route_id (str): Route ID to build timetable for
    - dates (list): YYYYMMDD date strings to include

    Returns:
    - DataFrame: Columns of trips and stop_times plus a date column, sorted by date and then by each trip's first departure time
    """

def build_stop_timetable(feed, stop_id, dates):
    """
    Build a timetable for a stop on the given dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - stop_id (str): Stop ID to build timetable for
    - dates (list): YYYYMMDD date strings to include

    Returns:
    - DataFrame: Columns of trips and stop_times plus a date column, sorted by date and then by departure time
    """
```
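
A route timetable is easiest to read trip by trip, in order of first departure. The sketch below assumes that ordering for a single date and uses hypothetical names; it is not gtfs_kit's code, and it assumes each trip's first stop has a `departure_time`.

```python
from collections import defaultdict


def to_seconds(timestr):
    """HH:MM:SS (or H:MM:SS) to seconds past midnight; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in timestr.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def order_timetable(stop_times):
    """Hypothetical sketch: order one day's stop_times rows for a timetable.

    Rows are grouped by trip_id and sorted by stop_sequence within each trip;
    trips are then ordered by the departure_time of their first stop.
    """
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)
    trips = [sorted(rows, key=lambda r: r["stop_sequence"]) for rows in by_trip.values()]
    trips.sort(key=lambda rows: to_seconds(rows[0]["departure_time"]))
    return [row for rows in trips for row in rows]
```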

## Package Metadata

```python { .api }
__version__: str
```

Package version string: `"10.3.0"`.

## Usage Examples

### Working with Constants

```python
import gtfs_kit as gk

# Check available distance units
print(gk.DIST_UNITS)  # ['ft', 'mi', 'm', 'km']

# Use GTFS reference data
gtfs_spec = gk.GTFS_REF
print(gtfs_spec[gtfs_spec['table'] == 'routes'])

# Use colors for visualization
colors = gk.COLORS_SET2
```

### Calendar Operations

```python
import gtfs_kit as gk

# Read a feed and get all of its service dates
feed = gk.read_feed('gtfs.zip', dist_units='km')
all_dates = gk.get_dates(feed)

# Get first week of service
first_week = gk.get_first_week(feed, as_date_obj=True)

# Get the second week of service (weeks are numbered from 1)
week_2 = gk.get_week(feed, 2)
```

### Feed Analysis and Quality Assessment

```python
# Get a table of feed indicators, some specialized to the sample date
description = gk.describe(feed, sample_date='20230101')
print(description)

# Assess feed quality; the result is a table of indicators, not a single score
quality = gk.assess_quality(feed)
print(quality)

# Describe the columns of the routes table
field_info = gk.list_fields(feed, table='routes')
```

### Feed Filtering

```python
from shapely.geometry import Polygon

# Each restrict_to_* function returns a new feed, so reassign the result;
# the original feed is left unchanged
route_ids = ['route_1', 'route_2']
filtered_feed = gk.restrict_to_routes(feed, route_ids)

# Restrict to specific dates
dates = ['20230101', '20230102', '20230103']
filtered_feed = gk.restrict_to_dates(filtered_feed, dates)

# Restrict to geographic area
bbox = Polygon([(-122.5, 37.7), (-122.3, 37.7), (-122.3, 37.8), (-122.5, 37.8)])
filtered_feed = gk.restrict_to_area(filtered_feed, bbox)
```

### Timetable Generation

```python
# Build a timetable for one route on one date
route_timetable = gk.build_route_timetable(feed, 'route_1', ['20230101'])

# Build a timetable for one stop on one date
stop_timetable = gk.build_stop_timetable(feed, 'stop_123', ['20230101'])
```

The utilities module provides essential infrastructure for GTFS data manipulation, analysis, and quality assurance workflows.