
tessl/pypi-gtfs-kit

A Python library for analyzing GTFS feeds with comprehensive data validation, statistical analysis, and geospatial operations.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/gtfs-kit@9.0.x

To install, run `npx @tessl/cli install tessl/pypi-gtfs-kit@9.0.0`

# gtfs-kit - Python GTFS Analysis Library

A comprehensive Python library for analyzing General Transit Feed Specification (GTFS) data in memory, without requiring a database.

## Package Information

- **Package Name**: gtfs-kit
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install gtfs-kit`
- **Version**: 9.0.0
- **Type**: Transit data analysis library
- **License**: MIT
- **Documentation**: https://mrcagney.github.io/gtfs_kit_docs

## Core Imports

```python { .api }
import gtfs_kit as gk
from gtfs_kit import Feed
import pandas as pd
import geopandas as gpd

# Aliases used in the geospatial signatures below
import numpy as np
import shapely.geometry as sg
```

**Primary Class**

- `Feed` - Core class representing a GTFS dataset with validation, analysis, and manipulation capabilities

**Constants**

- `GTFS_REF` (pd.DataFrame) - GTFS specification reference table
- `DTYPE` (dict) - Data types used when parsing GTFS CSV files
- `DIST_UNITS` (list) - Valid distance units: ["ft", "mi", "m", "km"]
- `WGS84` (str) - WGS84 coordinate system: "EPSG:4326"
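The `dist_units` of a feed names which of the `DIST_UNITS` its distances are measured in. As a minimal sketch of what those units mean, the hypothetical helper `convert_dist` below (not part of gtfs-kit's documented API) converts between them using the standard factors to metres.

```python
# Hypothetical helper: metres per unit for each entry of DIST_UNITS.
METRES_PER_UNIT = {
    "ft": 0.3048,
    "mi": 1609.344,
    "m": 1.0,
    "km": 1000.0,
}


def convert_dist(value: float, units_in: str, units_out: str) -> float:
    """Convert a distance between two of the units listed in DIST_UNITS."""
    for units in (units_in, units_out):
        if units not in METRES_PER_UNIT:
            raise ValueError(f"Invalid distance units: {units!r}")
    # Convert via metres so each unit needs only one factor.
    return value * METRES_PER_UNIT[units_in] / METRES_PER_UNIT[units_out]


miles_in_km = convert_dist(1, "mi", "km")
```

Routing every conversion through metres keeps the table to one factor per unit instead of one per pair of units.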

## Basic Usage

### Reading and Writing Feeds

```python
import gtfs_kit as gk

# Read a feed from a local path or URL; dist_units gives the units of its distance measurements
feed = gk.read_feed("path/to/gtfs.zip", dist_units="km")

# Access GTFS tables as pandas DataFrames
stops = feed.stops
routes = feed.routes
trips = feed.trips
stop_times = feed.stop_times

# Write the feed to a file, rounding decimals to 6 places
feed.write("output/path", ndigits=6)
```
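Because a GTFS feed is a zip archive of CSV text files, loading it in memory amounts to reading each member into a DataFrame. The sketch below illustrates that idea with the standard library and pandas only; `load_gtfs_tables` is a hypothetical helper, and reading every column as a string is a simplifying assumption (the `DTYPE` constant suggests gtfs-kit uses its own per-column types).

```python
from __future__ import annotations

import io
import zipfile

import pandas as pd


def load_gtfs_tables(zip_bytes: bytes) -> dict[str, pd.DataFrame]:
    """Read every .txt member of an in-memory GTFS zip into a DataFrame."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue
            # "stops.txt" (or "feed/stops.txt") becomes the table name "stops"
            table_name = name.rsplit("/", 1)[-1][: -len(".txt")]
            with zf.open(name) as f:
                # Read everything as strings so IDs such as "007" keep leading zeros
                tables[table_name] = pd.read_csv(f, dtype=str)
    return tables


# Build a tiny two-table feed in memory for demonstration
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("stops.txt", "stop_id,stop_name,stop_lat,stop_lon\n007,Main St,-36.85,174.76\n")
    zf.writestr("routes.txt", "route_id,route_short_name,route_type\nR1,1,3\n")

tables = load_gtfs_tables(buffer.getvalue())
print(sorted(tables))               # ['routes', 'stops']
print(tables["stops"].stop_id[0])   # 007
```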

### Quick Analysis

```python
# Assumes `feed` was loaded as in the previous example

# Get a feed overview for a sample date (YYYYMMDD)
description = gk.describe(feed, sample_date="20240315")

# List the dates for which the feed is valid
dates = gk.get_dates(feed)

# Find the busiest service day (the date with the most active trips)
busiest_date = gk.compute_busiest_date(feed, dates)

# Basic trip and route statistics
trip_stats = gk.compute_trip_stats(feed, route_ids=None)
route_stats = gk.compute_route_stats(
    feed,
    trip_stats,
    [busiest_date],
    headway_start_time="07:00:00",
    headway_end_time="19:00:00",
)
```
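To show what "busiest date" means in terms of the calendar tables, here is a stdlib-only sketch that counts the trips active on each date and picks the maximum. It assumes simplified `calendar` records (weekday flags plus a date range) and `calendar_dates` exceptions where 1 adds and 2 removes service, as in the GTFS reference; all names and the toy data are hypothetical, and breaking ties in favour of the earliest listed date is this sketch's own choice.

```python
from datetime import date, datetime, timedelta

# service_id, weekday flags Mon..Sun, start_date, end_date (simplified calendar)
calendar = [
    ("WEEKDAY", (1, 1, 1, 1, 1, 0, 0), "20240311", "20240317"),
    ("WEEKEND", (0, 0, 0, 0, 0, 1, 1), "20240311", "20240317"),
]
# service_id, date, exception_type (1 = service added, 2 = service removed)
calendar_dates = [("WEEKEND", "20240315", 1)]
# trip_id -> service_id
trip_services = {"t1": "WEEKDAY", "t2": "WEEKDAY", "t3": "WEEKEND", "t4": "WEEKEND", "t5": "WEEKEND"}


def active_services(d: str) -> set:
    """Service IDs running on date d (YYYYMMDD)."""
    weekday = datetime.strptime(d, "%Y%m%d").weekday()  # Monday == 0
    services = {
        service_id
        for service_id, flags, start, end in calendar
        # YYYYMMDD strings compare in chronological order
        if flags[weekday] and start <= d <= end
    }
    for service_id, exception_date, exception_type in calendar_dates:
        if exception_date == d:
            if exception_type == 1:
                services.add(service_id)
            elif exception_type == 2:
                services.discard(service_id)
    return services


def count_active_trips(d: str) -> int:
    services = active_services(d)
    return sum(1 for service_id in trip_services.values() if service_id in services)


def busiest_date(dates: list) -> str:
    # max() returns the first date reaching the highest count
    return max(dates, key=count_active_trips)


dates = [(date(2024, 3, 11) + timedelta(days=i)).strftime("%Y%m%d") for i in range(7)]
print(busiest_date(dates))  # 20240315: weekday trips plus the added weekend service
```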

## Architecture

### Feed Class Structure

The `Feed` class contains all GTFS tables as pandas DataFrames:

- **Core Tables**: `agency`, `stops`, `routes`, `trips`, `stop_times`
- **Calendar**: `calendar`, `calendar_dates`
- **Optional**: `shapes`, `frequencies`, `transfers`, `fare_attributes`, `fare_rules`, `feed_info`, `attributions`
- **Metadata**: `dist_units` property for distance measurements
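The table-per-attribute layout can be pictured as a small container class. This is a hypothetical sketch (`MiniFeed` is not part of gtfs-kit) that assumes each table is either a DataFrame or `None` when absent and that `dist_units` must be one of `DIST_UNITS`; its `copy()` and `__eq__` only mirror the documented `Feed` methods in spirit.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

import pandas as pd

DIST_UNITS = ["ft", "mi", "m", "km"]


@dataclass(eq=False)
class MiniFeed:
    dist_units: str
    agency: pd.DataFrame | None = None
    stops: pd.DataFrame | None = None
    routes: pd.DataFrame | None = None
    trips: pd.DataFrame | None = None
    stop_times: pd.DataFrame | None = None
    calendar: pd.DataFrame | None = None
    calendar_dates: pd.DataFrame | None = None
    shapes: pd.DataFrame | None = None

    def __post_init__(self):
        if self.dist_units not in DIST_UNITS:
            raise ValueError(f"dist_units must be one of {DIST_UNITS}")

    def copy(self) -> MiniFeed:
        # Copy every table so edits to the copy leave the original untouched
        kwargs = {}
        for f in fields(self):
            value = getattr(self, f.name)
            kwargs[f.name] = value.copy() if isinstance(value, pd.DataFrame) else value
        return MiniFeed(**kwargs)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, MiniFeed):
            return NotImplemented
        for f in fields(self):
            a, b = getattr(self, f.name), getattr(other, f.name)
            if isinstance(a, pd.DataFrame) or isinstance(b, pd.DataFrame):
                # Tables must both be present and hold identical data
                if not (isinstance(a, pd.DataFrame) and isinstance(b, pd.DataFrame) and a.equals(b)):
                    return False
            elif a != b:  # dist_units strings, or None vs None for absent tables
                return False
        return True


feed = MiniFeed(dist_units="km", stops=pd.DataFrame({"stop_id": ["A", "B"]}))
clone = feed.copy()
print(clone == feed)  # True
```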

### Key Concepts

- **In-Memory Processing**: All operations use pandas/GeoPandas without databases
- **Geospatial Support**: Built-in conversion to GeoDataFrames for spatial analysis
- **Time Series Analysis**: Functions for analyzing service patterns over time
- **Validation**: Checking feeds against the GTFS specification, reporting problems as errors and warnings
- **Visualization**: Integration with Folium for interactive transit maps

## Capabilities

### [Feed Operations](./feed-operations.md)

Core Feed class operations, reading/writing feeds, and basic data manipulation.

```python { .api }
# Feed I/O and basic operations
read_feed(path_or_url: str, dist_units: str = "km") -> Feed
list_feed(path: str) -> pd.DataFrame

# Feed class methods
Feed.copy() -> Feed
Feed.write(path: str, ndigits: int = 6) -> None
Feed.__eq__(other: Feed) -> bool
```
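`list_feed` inspects a feed without loading its tables. A rough stdlib-plus-pandas equivalent for a zipped feed is sketched below; `list_zip_feed` is hypothetical, and returning one row per file with `file_name` and uncompressed `file_size` (in bytes) columns is an assumption of this sketch.

```python
import io
import zipfile

import pandas as pd


def list_zip_feed(source) -> pd.DataFrame:
    """List the files in a zipped GTFS feed with their uncompressed sizes in bytes."""
    with zipfile.ZipFile(source) as zf:
        rows = [
            {"file_name": info.filename, "file_size": info.file_size}
            for info in zf.infolist()
            if not info.is_dir()
        ]
    return pd.DataFrame(rows, columns=["file_name", "file_size"])


# Build a tiny feed in memory and list it
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("agency.txt", "agency_id,agency_name\nA1,Metro\n")
    zf.writestr("stops.txt", "stop_id\nS1\nS2\n")
buffer.seek(0)

listing = list_zip_feed(buffer)
print(listing)
```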

### [Validation](./validation.md)

Validation of feeds against the GTFS specification, plus data quality assessment.

```python { .api }
# Main validation functions
validate(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list | pd.DataFrame
assess_quality(feed: Feed) -> pd.DataFrame

# Individual table validation (as_df selects a list or a DataFrame of problems)
check_agency(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list | pd.DataFrame
check_routes(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list | pd.DataFrame
check_stops(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list | pd.DataFrame
check_trips(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list | pd.DataFrame
check_stop_times(feed: Feed, *, as_df: bool = True, include_warnings: bool = True) -> list | pd.DataFrame
```
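The validators report problems instead of raising exceptions. A minimal sketch of a stops check in that style is below. It assumes each problem is a `[type, message, table, rows]` record and that `as_df=True` turns the list into a DataFrame; `check_stops_sketch` and its three rules are illustrative only, not gtfs-kit's actual checks.

```python
import pandas as pd


def check_stops_sketch(stops: pd.DataFrame, *, as_df: bool = True, include_warnings: bool = True):
    """Collect simple problems in a stops table (illustrative rules only)."""
    problems = []

    # Error: stop_id values must be unique
    dup = stops["stop_id"].duplicated(keep=False)
    if dup.any():
        problems.append(["error", "Repeated stop_id", "stops", stops.index[dup].tolist()])

    # Error: coordinates must lie in valid latitude/longitude ranges (NaN also fails)
    bad = ~(stops["stop_lat"].between(-90, 90) & stops["stop_lon"].between(-180, 180))
    if bad.any():
        problems.append(["error", "Invalid stop_lat/stop_lon", "stops", stops.index[bad].tolist()])

    # Warning in this sketch: GTFS makes stop_name only conditionally required
    missing = stops["stop_name"].isna()
    if missing.any():
        problems.append(["warning", "Missing stop_name", "stops", stops.index[missing].tolist()])

    if not include_warnings:
        problems = [p for p in problems if p[0] == "error"]
    if as_df:
        return pd.DataFrame(problems, columns=["type", "message", "table", "rows"])
    return problems


stops = pd.DataFrame({
    "stop_id": ["S1", "S2", "S2"],
    "stop_name": ["Main St", None, "Park Ave"],
    "stop_lat": [-36.85, -36.86, 95.0],
    "stop_lon": [174.76, 174.77, 174.78],
})
print(check_stops_sketch(stops))
```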

### [Data Analysis](./data-analysis.md)

Statistical analysis, feed summaries, and computational functions for transit metrics.

```python { .api }
# Feed analysis and statistics
describe(feed: Feed, sample_date: str) -> pd.DataFrame
summarize(feed: Feed, table: str) -> pd.DataFrame
compute_feed_stats(feed: Feed, trip_stats: pd.DataFrame, dates: list[str], *, split_route_types: bool = False) -> pd.DataFrame

# Trip and route analysis
compute_trip_stats(feed: Feed, route_ids: list[str] | None, *, compute_dist_from_shapes: bool = False) -> pd.DataFrame
compute_route_stats(feed: Feed, trip_stats_subset: pd.DataFrame, dates: list[str],
                    headway_start_time: str, headway_end_time: str, *, split_directions: bool = False) -> pd.DataFrame
```
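`compute_trip_stats` summarises each trip from its `stop_times`, and route statistics such as headways build on those per-trip summaries. The sketch below shows the core pandas aggregation on toy data. It assumes zero-padded `HH:MM:SS` times (so they sort correctly) and defines mean headway as the average gap between consecutive trip start times inside a window; `to_seconds`, `mean_headway_min` and the output column names are this sketch's own and may differ from gtfs-kit's.

```python
import pandas as pd


def to_seconds(t: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 23) to seconds after midnight."""
    h, m, s = (int(x) for x in t.split(":"))
    return 3600 * h + 60 * m + s


stop_times = pd.DataFrame({
    "trip_id": ["t1", "t1", "t1", "t2", "t2", "t3", "t3"],
    "stop_sequence": [1, 2, 3, 1, 2, 1, 2],
    "departure_time": ["07:00:00", "07:10:00", "07:25:00", "07:20:00", "07:40:00", "07:50:00", "08:05:00"],
})
trips = pd.DataFrame({"trip_id": ["t1", "t2", "t3"], "route_id": ["R1", "R1", "R1"]})

# One row per trip: first/last departure and number of stops
st = stop_times.sort_values(["trip_id", "stop_sequence"])
trip_stats = (
    st.groupby("trip_id")
    .agg(
        start_time=("departure_time", "first"),
        end_time=("departure_time", "last"),
        num_stops=("stop_sequence", "count"),
    )
    .reset_index()
    .merge(trips, on="trip_id")
)
trip_stats["duration_min"] = (
    trip_stats["end_time"].map(to_seconds) - trip_stats["start_time"].map(to_seconds)
) / 60


def mean_headway_min(trip_stats: pd.DataFrame, start: str = "07:00:00", end: str = "09:00:00") -> pd.Series:
    """Mean gap in minutes between consecutive trip starts per route within [start, end]."""
    starts = trip_stats["start_time"].map(to_seconds)
    in_window = trip_stats[(starts >= to_seconds(start)) & (starts <= to_seconds(end))]
    return (
        in_window.assign(start_s=in_window["start_time"].map(to_seconds))
        .sort_values("start_s")
        .groupby("route_id")["start_s"]
        .apply(lambda s: s.diff().mean() / 60)
    )


print(trip_stats)
print(mean_headway_min(trip_stats))  # R1: gaps of 20 and 30 minutes average to 25.0
```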

### [Geospatial Operations](./geospatial.md)

Geographic analysis, spatial filtering, and coordinate system transformations.

```python { .api }
# Geometric conversions and spatial data
geometrize_stops(stops: pd.DataFrame, *, use_utm: bool = False) -> gpd.GeoDataFrame
geometrize_shapes(shapes: pd.DataFrame, *, use_utm: bool = False) -> gpd.GeoDataFrame

# Spatial analysis functions
compute_bounds(feed: Feed, stop_ids: list[str] | None) -> np.ndarray
compute_convex_hull(feed: Feed, stop_ids: list[str] | None) -> sg.Polygon
compute_centroid(feed: Feed, stop_ids: list[str] | None) -> sg.Point

# Area-based filtering
get_stops_in_area(feed: Feed, area: sg.Polygon) -> pd.DataFrame
restrict_to_area(feed: Feed, area: sg.Polygon) -> Feed
```
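To make the spatial summaries concrete, here is a stdlib-only sketch of bounds, convex hull and centroid for a set of stop coordinates. It assumes bounds ordered as `[min_lon, min_lat, max_lon, max_lat]`, uses Andrew's monotone chain for the hull and the shoelace formula for the hull's centroid, and treats lon/lat as planar coordinates, which is only a fair approximation over small areas. gtfs-kit itself returns Shapely/NumPy objects; these helper names are hypothetical.

```python
from __future__ import annotations


def bounds(points: list[tuple[float, float]]) -> list[float]:
    """[min_lon, min_lat, max_lon, max_lat] of (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return [min(lons), min(lats), max(lons), max(lats)]


def cross(o, a, b) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point, which repeats the other chain's first
    return lower[:-1] + upper[:-1]


def polygon_centroid(vertices):
    """Centroid of a simple polygon via the shoelace formula."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        c = x0 * y1 - x1 * y0
        area2 += c
        cx += (x0 + x1) * c
        cy += (y0 + y1) * c
    if area2 == 0:
        raise ValueError("Need at least three non-collinear points")
    return (cx / (3 * area2), cy / (3 * area2))


stop_coords = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
hull = convex_hull(stop_coords)
print(bounds(stop_coords), hull, polygon_centroid(hull))
```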

### [Time Series Analysis](./time-series.md)

Time-based analysis, service frequency computation, and temporal patterns.

```python { .api }
# Time series computation
compute_route_time_series(feed: Feed, trip_stats_subset: pd.DataFrame, dates: list[str],
                          freq: str, *, split_directions: bool = False) -> pd.DataFrame
compute_stop_time_series(feed: Feed, dates: list[str], stop_ids: list[str] | None,
                         freq: str, *, split_directions: bool = False) -> pd.DataFrame
compute_feed_time_series(feed: Feed, trip_stats: pd.DataFrame, dates: list[str],
                         freq: str, *, split_route_types: bool = False) -> pd.DataFrame

# Time series utilities
downsample(time_series: pd.DataFrame, freq: str) -> pd.DataFrame
combine_time_series(time_series_dict: dict, kind: str, *, split_directions: bool = False) -> pd.DataFrame
```
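The time series functions bin activity into intervals of frequency `freq`, and `downsample` coarsens such a series. As a rough illustration only (not gtfs-kit's implementation or output format), the sketch below counts trip starts in 30-minute bins for one assumed service date, then downsamples to hourly bins; for a count indicator, downsampling is a sum of the finer bins. GTFS times are converted by hand because they can exceed `24:00:00` for service running past midnight.

```python
import pandas as pd


def gtfs_time_to_seconds(t: str) -> int:
    """Seconds after midnight for an HH:MM:SS GTFS time; hours may be 24 or more."""
    h, m, s = (int(x) for x in t.split(":"))
    return 3600 * h + 60 * m + s


# Trip start times on one service day (assumed data); 24:10:00 falls after midnight
start_times = ["07:05:00", "07:20:00", "07:40:00", "08:15:00", "24:10:00"]
service_date = pd.Timestamp("2024-03-15")

# Anchor each GTFS time to the service date to get real timestamps
starts = service_date + pd.to_timedelta(
    [gtfs_time_to_seconds(t) for t in start_times], unit="s"
)

# Count trip starts per 30-minute bin; empty bins in between get 0
counts_30 = pd.Series(1, index=starts).resample("30min").sum()

# Downsample the counts to hourly bins by summing
counts_60 = counts_30.resample("60min").sum()

print(counts_60.head(2))
```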

### [Data Cleaning](./data-cleaning.md)

Data cleaning, transformation, and feed modification functions.

```python { .api }
# Comprehensive cleaning
clean(feed: Feed) -> Feed
drop_zombies(feed: Feed) -> Feed
clean_ids(feed: Feed) -> Feed
clean_times(feed: Feed) -> Feed

# Feed restrictions and filtering
restrict_to_routes(feed: Feed, route_ids: list[str]) -> Feed
restrict_to_agencies(feed: Feed, agency_ids: list[str]) -> Feed
restrict_to_dates(feed: Feed, dates: list[str]) -> Feed

# Data aggregation
aggregate_routes(feed: Feed, by: str, route_id_prefix: str = "route_") -> Feed
aggregate_stops(feed: Feed, by: str, stop_id_prefix: str = "stop_") -> Feed
```
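Restricting a feed means filtering its tables consistently so that no dangling references remain. A minimal pandas sketch of the idea behind `restrict_to_routes` follows. It assumes only the `routes`, `trips`, `stop_times` and `stops` tables matter and ignores parent stations, calendars and shapes, which a full implementation would also have to handle; `restrict_tables_to_routes` is a hypothetical name.

```python
from __future__ import annotations

import pandas as pd


def restrict_tables_to_routes(
    tables: dict[str, pd.DataFrame], route_ids: list[str]
) -> dict[str, pd.DataFrame]:
    """Keep only the given routes and the trips, stop times and stops they use."""
    routes = tables["routes"][tables["routes"]["route_id"].isin(route_ids)]
    trips = tables["trips"][tables["trips"]["route_id"].isin(routes["route_id"])]
    stop_times = tables["stop_times"][tables["stop_times"]["trip_id"].isin(trips["trip_id"])]
    # Stops no longer visited by any remaining trip are dropped
    stops = tables["stops"][tables["stops"]["stop_id"].isin(stop_times["stop_id"])]
    return {
        "routes": routes.reset_index(drop=True),
        "trips": trips.reset_index(drop=True),
        "stop_times": stop_times.reset_index(drop=True),
        "stops": stops.reset_index(drop=True),
    }


tables = {
    "routes": pd.DataFrame({"route_id": ["R1", "R2"]}),
    "trips": pd.DataFrame({"trip_id": ["t1", "t2", "t3"], "route_id": ["R1", "R1", "R2"]}),
    "stop_times": pd.DataFrame({
        "trip_id": ["t1", "t1", "t2", "t2", "t3", "t3"],
        "stop_id": ["A", "B", "A", "B", "B", "C"],
    }),
    "stops": pd.DataFrame({"stop_id": ["A", "B", "C"]}),
}
small = restrict_tables_to_routes(tables, ["R1"])
print(small["stops"])  # stop C is gone because only route R2 served it
```

Filtering from routes down to stops in dependency order is what keeps every remaining foreign key valid.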