# gtfs-kit - Python GTFS Analysis Library

A comprehensive Python library for analyzing General Transit Feed Specification (GTFS) data in memory without requiring a database.

## Package Information

- **Package Name**: gtfs-kit
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install gtfs-kit`
- **Version**: 9.0.0
- **Type**: Transit data analysis library
- **License**: MIT
- **Documentation**: https://mrcagney.github.io/gtfs_kit_docs

## Core Imports

```python
import gtfs_kit as gk
from gtfs_kit import Feed
import pandas as pd
import geopandas as gpd
```

{ .api }
**Primary Class**
- `Feed` - Core class representing a GTFS dataset with validation, analysis, and manipulation capabilities

**Constants**
- `GTFS_REF` (pd.DataFrame) - GTFS specification reference table
- `DTYPE` (dict) - Optimized data types for CSV parsing
- `DIST_UNITS` (list) - Valid distance units: ["ft", "mi", "m", "km"]
- `WGS84` (str) - WGS84 coordinate system: "EPSG:4326"
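
The four distance units listed above relate to each other by fixed factors. The following is a minimal sketch of converting between them in plain Python; the names `METERS_PER_UNIT` and `convert_distance` are hypothetical helpers for illustration and are not part of gtfs-kit.

```python
# Exact conversion factors to meters for the units in DIST_UNITS.
METERS_PER_UNIT = {"ft": 0.3048, "mi": 1609.344, "m": 1.0, "km": 1000.0}


def convert_distance(value: float, from_units: str, to_units: str) -> float:
    """Convert a distance between two of the supported units (hypothetical helper)."""
    for units in (from_units, to_units):
        if units not in METERS_PER_UNIT:
            raise ValueError(f"Unsupported distance units: {units!r}")
    # Go through meters as the common base unit.
    return value * METERS_PER_UNIT[from_units] / METERS_PER_UNIT[to_units]


print(convert_distance(5, "km", "m"))  # 5000.0
```

Routing every conversion through a single base unit keeps the table small: four factors cover all twelve unit pairs.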

## Basic Usage

### Reading and Creating Feeds

```python
# Read from local path or URL
feed = gk.read_feed("path/to/gtfs.zip", dist_units="km")

# Access GTFS tables as pandas DataFrames
stops = feed.stops
routes = feed.routes
trips = feed.trips
stop_times = feed.stop_times

# Write feed to file
feed.write("output/path", ndigits=6)
```
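
To make the "one table per GTFS file" idea concrete, here is a rough standard-library sketch that loads a tiny, made-up GTFS zip into in-memory tables. It is not how gtfs-kit is implemented (gtfs-kit returns pandas DataFrames); the helper `read_tables` and the sample data are assumptions for illustration only.

```python
import csv
import io
import zipfile

# Build a tiny GTFS-like zip in memory (made-up sample data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon\nS1,Main St,-36.85,174.76\n",
    )
    zf.writestr(
        "routes.txt",
        "route_id,route_short_name,route_type\nR1,10,3\n",
    )


def read_tables(zip_bytes: bytes) -> dict:
    """Load every .txt member of a GTFS zip into a list of row dicts."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".txt"):
                # utf-8-sig drops a byte-order mark if the file has one.
                text = zf.read(name).decode("utf-8-sig")
                tables[name[:-4]] = list(csv.DictReader(io.StringIO(text)))
    return tables


tables = read_tables(buffer.getvalue())
print(tables["stops"][0]["stop_name"])  # Main St
```

Every value stays a string in this sketch; typed parsing is the kind of work the `DTYPE` constant suggests gtfs-kit handles for you.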

### Quick Analysis

```python
# Get feed overview
description = gk.describe(feed, sample_date="20240315")

# Get valid service dates
dates = gk.get_dates(feed)

# Find busiest service day
busiest_date = gk.compute_busiest_date(feed, dates)

# Basic route statistics
trip_stats = gk.compute_trip_stats(feed, route_ids=None)
route_stats = gk.compute_route_stats(feed, trip_stats, [busiest_date],
                                   "07:00:00", "19:00:00")
```
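
The two time strings passed to `compute_route_stats` bound the window used for headway statistics. As a hedged illustration of what a headway window means, the sketch below averages the gaps between consecutive departures inside a window. The helper `mean_headway_minutes` is hypothetical, and gtfs-kit's exact headway definition may differ.

```python
from typing import Optional


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds; GTFS allows hours >= 24."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def mean_headway_minutes(departures: list, start: str, end: str) -> Optional[float]:
    """Mean gap in minutes between consecutive departures within [start, end]."""
    lo, hi = to_seconds(start), to_seconds(end)
    times = sorted(t for t in map(to_seconds, departures) if lo <= t <= hi)
    if len(times) < 2:
        return None  # a headway needs at least two departures
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps) / 60


# Made-up departures from one stop; only three fall inside the window.
departures = ["06:30:00", "07:00:00", "07:20:00", "07:50:00", "19:30:00"]
print(mean_headway_minutes(departures, "07:00:00", "19:00:00"))  # 25.0
```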

## Architecture

### Feed Class Structure

The `Feed` class contains all GTFS tables as pandas DataFrames:

- **Core Tables**: `agency`, `stops`, `routes`, `trips`, `stop_times`
- **Calendar**: `calendar`, `calendar_dates`
- **Optional**: `shapes`, `frequencies`, `transfers`, `fare_attributes`, `fare_rules`, `feed_info`, `attributions`
- **Metadata**: `dist_units` property for distance measurements

### Key Concepts

- **In-Memory Processing**: All operations use pandas/GeoPandas without databases
- **Geospatial Support**: Built-in conversion to GeoDataFrames for spatial analysis
- **Time Series Analysis**: Functions for analyzing service patterns over time at the route, stop, and feed level
- **Validation**: Checks a feed against the GTFS specification and reports errors and, optionally, warnings
- **Visualization**: Integration with Folium for interactive transit maps

## Capabilities

### [Feed Operations](./feed-operations.md)
Core Feed class operations, reading/writing feeds, and basic data manipulation.

```python { .api }
# Feed I/O and basic operations
read_feed(path_or_url: str, dist_units: str = "km") -> Feed
list_feed(path: str) -> pd.DataFrame

# Feed class methods
Feed.copy() -> Feed
Feed.write(path: str, ndigits: int = 6) -> None
Feed.__eq__(other: Feed) -> bool
```

### [Validation](./validation.md)
Comprehensive GTFS specification validation and data quality assessment.

```python { .api }
# Main validation functions
validate(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list | pd.DataFrame
assess_quality(feed: Feed) -> pd.DataFrame

# Individual table validation
check_agency(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list
check_routes(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list
check_stops(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list
check_trips(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list
check_stop_times(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list
```
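
To give a feel for what a table-level check might collect, here is a minimal plain-Python sketch that flags stops with missing or out-of-range coordinates. The helper name `check_stop_coordinates` and the four-element problem layout (`[type, message, table, rows]`) are assumptions made for illustration; consult the validation page for the format gtfs-kit actually returns.

```python
def check_stop_coordinates(stops: list) -> list:
    """Return problems as [type, message, table, rows] (assumed layout)."""
    bad_rows = []
    for i, stop in enumerate(stops):
        try:
            lat, lon = float(stop["stop_lat"]), float(stop["stop_lon"])
            ok = -90 <= lat <= 90 and -180 <= lon <= 180
        except (KeyError, ValueError):
            ok = False  # missing column or non-numeric value
        if not ok:
            bad_rows.append(i)
    if bad_rows:
        return [["error", "Invalid stop_lat or stop_lon", "stops", bad_rows]]
    return []


# Made-up rows: the second is out of range, the third has an empty latitude.
stops = [
    {"stop_id": "S1", "stop_lat": "-36.85", "stop_lon": "174.76"},
    {"stop_id": "S2", "stop_lat": "95.0", "stop_lon": "174.80"},
    {"stop_id": "S3", "stop_lat": "", "stop_lon": "174.90"},
]
problems = check_stop_coordinates(stops)
print(problems)  # [['error', 'Invalid stop_lat or stop_lon', 'stops', [1, 2]]]
```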

### [Data Analysis](./data-analysis.md)
Statistical analysis, feed summaries, and computational functions for transit metrics.

```python { .api }
# Feed analysis and statistics
describe(feed: Feed, sample_date: str) -> pd.DataFrame
summarize(feed: Feed, table: str) -> pd.DataFrame
compute_feed_stats(feed: Feed, trip_stats: pd.DataFrame, dates: list[str], *, split_route_types: bool = False) -> pd.DataFrame

# Trip and route analysis
compute_trip_stats(feed: Feed, route_ids: list[str] | None, *, compute_dist_from_shapes: bool = False) -> pd.DataFrame
compute_route_stats(feed: Feed, trip_stats_subset: pd.DataFrame, dates: list[str],
                   headway_start_time: str, headway_end_time: str, *, split_directions: bool = False) -> pd.DataFrame
```
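
As a rough picture of the per-trip metrics such functions derive, the sketch below computes stop count, duration, and average speed for one trip from its stop times. The helper `trip_stats` and the sample rows are hypothetical, and the columns gtfs-kit actually reports may differ. Note that GTFS times may exceed `24:00:00` for trips that run past midnight.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds; GTFS allows hours >= 24."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def trip_stats(stop_times: list, distance_km: float) -> dict:
    """Summarize one trip: stop count, duration in hours, average speed."""
    ordered = sorted(stop_times, key=lambda st: int(st["stop_sequence"]))
    start = to_seconds(ordered[0]["departure_time"])
    end = to_seconds(ordered[-1]["arrival_time"])
    duration_h = (end - start) / 3600
    speed = distance_km / duration_h if duration_h > 0 else None
    return {"num_stops": len(ordered), "duration_h": duration_h, "speed_kmh": speed}


# Made-up trip that crosses midnight, hence the 24:10:00 arrival.
stop_times = [
    {"stop_sequence": "1", "arrival_time": "23:40:00", "departure_time": "23:40:00"},
    {"stop_sequence": "2", "arrival_time": "23:55:00", "departure_time": "23:56:00"},
    {"stop_sequence": "3", "arrival_time": "24:10:00", "departure_time": "24:10:00"},
]
stats = trip_stats(stop_times, distance_km=12.0)
print(stats)  # {'num_stops': 3, 'duration_h': 0.5, 'speed_kmh': 24.0}
```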

### [Geospatial Operations](./geospatial.md)
Geographic analysis, spatial filtering, and coordinate system transformations.

```python { .api }
# Aliases used in these signatures: np = numpy, sg = shapely.geometry

# Geometric conversions and spatial data
geometrize_stops(stops: pd.DataFrame, *, use_utm: bool = False) -> gpd.GeoDataFrame
geometrize_shapes(shapes: pd.DataFrame, *, use_utm: bool = False) -> gpd.GeoDataFrame

# Spatial analysis functions
compute_bounds(feed: Feed, stop_ids: list[str] | None) -> np.ndarray
compute_convex_hull(feed: Feed, stop_ids: list[str] | None) -> sg.Polygon
compute_centroid(feed: Feed, stop_ids: list[str] | None) -> sg.Point

# Area-based filtering
get_stops_in_area(feed: Feed, area: sg.Polygon) -> pd.DataFrame
restrict_to_area(feed: Feed, area: sg.Polygon) -> Feed
```
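
The bounds and area-filtering ideas can be sketched without any geospatial dependencies. The example below computes a bounding box over stops and filters stops by a rectangular area. The helpers `stop_bounds` and `stops_in_box` are hypothetical, the `(min_lon, min_lat, max_lon, max_lat)` ordering is an assumption, and gtfs-kit itself works with Shapely polygons rather than plain rectangles.

```python
def stop_bounds(stops: list) -> tuple:
    """Bounding box of the stops as (min_lon, min_lat, max_lon, max_lat)."""
    lons = [float(s["stop_lon"]) for s in stops]
    lats = [float(s["stop_lat"]) for s in stops]
    return (min(lons), min(lats), max(lons), max(lats))


def stops_in_box(stops: list, box: tuple) -> list:
    """Keep the stops whose coordinates fall inside a rectangular area."""
    min_lon, min_lat, max_lon, max_lat = box
    return [
        s for s in stops
        if min_lon <= float(s["stop_lon"]) <= max_lon
        and min_lat <= float(s["stop_lat"]) <= max_lat
    ]


# Made-up stops for illustration.
stops = [
    {"stop_id": "S1", "stop_lon": "174.70", "stop_lat": "-36.90"},
    {"stop_id": "S2", "stop_lon": "174.80", "stop_lat": "-36.85"},
    {"stop_id": "S3", "stop_lon": "174.76", "stop_lat": "-36.80"},
]
print(stop_bounds(stops))  # (174.7, -36.9, 174.8, -36.8)
inside = stops_in_box(stops, (174.75, -36.86, 174.85, -36.79))
print([s["stop_id"] for s in inside])  # ['S2', 'S3']
```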

### [Time Series Analysis](./time-series.md)
Time-based analysis, service frequency computation, and temporal patterns.

```python { .api }
# Time series computation
compute_route_time_series(feed: Feed, trip_stats_subset: pd.DataFrame, dates: list[str],
                         freq: str, *, split_directions: bool = False) -> pd.DataFrame
compute_stop_time_series(feed: Feed, dates: list[str], stop_ids: list[str] | None,
                        freq: str, *, split_directions: bool = False) -> pd.DataFrame
compute_feed_time_series(feed: Feed, trip_stats: pd.DataFrame, dates: list[str],
                        freq: str, *, split_route_types: bool = False) -> pd.DataFrame

# Time series utilities
downsample(time_series: pd.DataFrame, freq: str) -> pd.DataFrame
combine_time_series(time_series_dict: dict, kind: str, *, split_directions: bool = False) -> pd.DataFrame
```
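
Conceptually, a service time series counts events per time bin, and downsampling merges adjacent bins into coarser ones. The sketch below shows both steps in plain Python. The helpers `count_departures` and `downsample_counts` are hypothetical, the bin width is given in minutes rather than as a pandas frequency string, and wrapping times past midnight back onto a 24-hour day is an assumption of this sketch.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds; GTFS allows hours >= 24."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def count_departures(departures: list, freq_minutes: int) -> list:
    """Count departures per bin of freq_minutes across one 24-hour day."""
    counts = [0] * (24 * 60 // freq_minutes)
    for t in departures:
        minute = (to_seconds(t) // 60) % (24 * 60)  # wrap times past midnight
        counts[minute // freq_minutes] += 1
    return counts


def downsample_counts(counts: list, factor: int) -> list:
    """Merge every `factor` adjacent bins by summing their counts."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


# Made-up departures; 25:15:00 means 01:15 on the following day.
departures = ["07:05:00", "07:20:00", "07:50:00", "08:10:00", "25:15:00"]
half_hourly = count_departures(departures, 30)
hourly = downsample_counts(half_hourly, 2)
print(hourly[7], hourly[8], hourly[1])  # 3 1 1
```

Summing is the natural aggregation for counts; rate-like indicators such as speed would need a weighted average instead.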

### [Data Cleaning](./data-cleaning.md)
Data cleaning, transformation, and feed modification functions.

```python { .api }
# Comprehensive cleaning
clean(feed: Feed) -> Feed
drop_zombies(feed: Feed) -> Feed
clean_ids(feed: Feed) -> Feed
clean_times(feed: Feed) -> Feed

# Feed restrictions and filtering
restrict_to_routes(feed: Feed, route_ids: list[str]) -> Feed
restrict_to_agencies(feed: Feed, agency_ids: list[str]) -> Feed
restrict_to_dates(feed: Feed, dates: list[str]) -> Feed

# Data aggregation
aggregate_routes(feed: Feed, by: str, route_id_prefix: str = "route_") -> Feed
aggregate_stops(feed: Feed, by: str, stop_id_prefix: str = "stop_") -> Feed
```
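
Restricting a feed is more than filtering one table, because GTFS tables reference each other by ID. The following plain-Python sketch shows the cascading idea for a route restriction. The helper `restrict_tables_to_routes` and the dict-of-lists feed are assumptions for illustration and do not reflect how gtfs-kit implements `restrict_to_routes`.

```python
def restrict_tables_to_routes(feed: dict, route_ids: set) -> dict:
    """Keep only the rows reachable from the given routes (hypothetical helper)."""
    routes = [r for r in feed["routes"] if r["route_id"] in route_ids]
    trips = [t for t in feed["trips"] if t["route_id"] in route_ids]
    trip_ids = {t["trip_id"] for t in trips}
    # Stop times follow trips, and stops follow the remaining stop times.
    stop_times = [st for st in feed["stop_times"] if st["trip_id"] in trip_ids]
    stop_ids = {st["stop_id"] for st in stop_times}
    stops = [s for s in feed["stops"] if s["stop_id"] in stop_ids]
    return {"routes": routes, "trips": trips, "stop_times": stop_times, "stops": stops}


# Made-up feed with two routes that share stop S2.
feed = {
    "routes": [{"route_id": "R1"}, {"route_id": "R2"}],
    "trips": [
        {"trip_id": "T1", "route_id": "R1"},
        {"trip_id": "T2", "route_id": "R2"},
    ],
    "stop_times": [
        {"trip_id": "T1", "stop_id": "S1"},
        {"trip_id": "T1", "stop_id": "S2"},
        {"trip_id": "T2", "stop_id": "S2"},
        {"trip_id": "T2", "stop_id": "S3"},
    ],
    "stops": [{"stop_id": "S1"}, {"stop_id": "S2"}, {"stop_id": "S3"}],
}
small = restrict_tables_to_routes(feed, {"R1"})
print([s["stop_id"] for s in small["stops"]])  # ['S1', 'S2']
```

The input feed is left untouched and a new set of tables is returned, mirroring the `-> Feed` return types in the signatures above.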