
# Data Analysis

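Statistical analysis, time series computation, and performance metrics for transit operations. This module computes statistics and time series for routes, stops, trips, and the feed as a whole, over one or more service dates. Functions whose names end in `_0` operate on a single, already-filtered data subset. Their unsuffixed counterparts take a `feed` plus a list of `dates` and compute results for each date.

## Route Analysis

### Route Statistics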

```python { .api }
def compute_route_stats_0(trip_stats_subset, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute route statistics for a trip subset.

    Parameters:
    - trip_stats_subset (DataFrame): Subset of trip statistics
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Route statistics
    """

def compute_route_stats(feed, trip_stats_subset, dates, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute route statistics for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats_subset (DataFrame): Trip statistics subset
    - dates (list): List of dates to analyze
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Route statistics with date index
    """
```
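
Headway figures in the route statistics are limited to the window between `headway_start_time` and `headway_end_time`. The standard-library sketch below illustrates that idea on a plain list of trip departure times for one route. The helper names are hypothetical, and this is not gtfs_kit's implementation. gtfs_kit works on DataFrames and may group differently, for example per direction when `split_directions=True`.

```python
def to_seconds(hms: str) -> int:
    """Convert a GTFS HH:MM:SS string (hours may exceed 23) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def mean_headway_minutes(departures, start="07:00:00", end="19:00:00"):
    """Mean gap in minutes between consecutive departures inside [start, end].

    Hypothetical helper for illustration. Returns None when fewer than two
    departures fall inside the window.
    """
    lo, hi = to_seconds(start), to_seconds(end)
    times = sorted(t for t in map(to_seconds, departures) if lo <= t <= hi)
    if len(times) < 2:
        return None
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps) / 60


departures = ["06:50:00", "07:00:00", "07:10:00", "07:30:00", "19:30:00"]
print(mean_headway_minutes(departures))  # 15.0
```

Departures outside the window (06:50 and 19:30 here) are ignored, so early-morning and late-evening gaps do not inflate the average.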

### Route Time Series

```python { .api }
def compute_route_time_series_0(trip_stats_subset, date_label='20010101', freq='5Min', *, split_directions=False):
    """
    Compute route time series for a trip subset.

    Parameters:
    - trip_stats_subset (DataFrame): Trip statistics subset
    - date_label (str): Date label for the time series
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Route time series
    """

def build_zero_route_time_series(feed, date_label='20010101', freq='5Min', *, split_directions=False):
    """
    Build a zero-filled route time series template.

    Parameters:
    - feed (Feed): GTFS feed object
    - date_label (str): Date label for the time series
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Zero-filled route time series
    """

def compute_route_time_series(feed, trip_stats_subset, dates, freq='5Min', *, split_directions=False):
    """
    Compute route time series for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats_subset (DataFrame): Trip statistics subset
    - dates (list): List of dates to analyze
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Route time series with hierarchical columns
    """
```
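
The time series functions group activity into bins of width `freq`. `build_zero_route_time_series` supplies an all-zero template so that routes with no service still appear in the output. The sketch below imitates that pattern with the standard library. It starts from a zero-filled template and counts, per route, the trips in service during each bin. Several things are assumptions made for brevity:

- bin widths are whole minutes
- trips are clipped at midnight rather than carried into the next day
- the helper names are hypothetical

gtfs_kit also reports more indicators than this single count.

```python
def to_seconds(hms: str) -> int:
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def zero_series(route_ids, freq_min=5):
    """Zero-filled template: one list of (24 h / freq) bins per route."""
    n_bins = 24 * 60 // freq_min
    return {route_id: [0] * n_bins for route_id in route_ids}


def trips_in_service(trips, route_ids, freq_min=5):
    """Count, per route and bin, the trips whose [start, end) interval overlaps the bin.

    `trips` holds (route_id, start_time, end_time) tuples; every route_id must
    appear in `route_ids`. Service from 24:00:00 onward is clipped off.
    """
    series = zero_series(route_ids, freq_min)
    width = freq_min * 60
    n_bins = 24 * 60 // freq_min
    day_end = 24 * 3600
    for route_id, start, end in trips:
        s, e = to_seconds(start), min(to_seconds(end), day_end)
        if e <= s:
            continue  # zero-length trip, or one that starts after midnight
        # Bins from the one containing s up to the one containing e - 1
        for b in range(s // width, min((e - 1) // width, n_bins - 1) + 1):
            series[route_id][b] += 1
    return series


trips = [("R1", "07:00:00", "07:12:00"), ("R1", "07:05:00", "07:06:00"), ("R2", "08:00:00", "08:05:00")]
ts = trips_in_service(trips, ["R1", "R2", "R3"])
print(ts["R1"][84:88])  # [1, 2, 1, 0]  (bins starting 07:00, 07:05, 07:10, 07:15)
```

Route `R3` has no trips but still gets a full row of zeros. That is the reason for starting from a template instead of only recording bins that saw service.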

## Stop Analysis

### Stop Statistics

```python { .api }
def compute_stop_stats_0(stop_times_subset, trip_subset, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute stop statistics for data subsets.

    Parameters:
    - stop_times_subset (DataFrame): Stop times subset
    - trip_subset (DataFrame): Trip subset
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Stop statistics
    """

def compute_stop_stats(feed, dates, stop_ids=None, headway_start_time='07:00:00', headway_end_time='19:00:00', *, split_directions=False):
    """
    Compute stop statistics for specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze
    - stop_ids (list, optional): Specific stop IDs to analyze
    - headway_start_time (str): Start time for headway calculations
    - headway_end_time (str): End time for headway calculations
    - split_directions (bool): Split statistics by direction

    Returns:
    - DataFrame: Stop statistics with date index
    """

def compute_stop_activity(feed, dates):
    """
    Mark stops as active or inactive on specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze

    Returns:
    - DataFrame: Stop activity indicators by date
    """
```
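
`compute_stop_activity` returns a per-date indicator for each stop. The standard-library sketch below assumes a stop counts as active on a date when at least one trip running that date visits it. It also assumes the active trip IDs for each date are already known, for instance from calendar logic like `get_active_services`. The function name and input shapes are hypothetical.

```python
from collections import defaultdict


def stop_activity(stop_times, active_trips_by_date, stop_ids):
    """Return {stop_id: {date: 1 or 0}}.

    stop_times: iterable of (trip_id, stop_id) pairs
    active_trips_by_date: {date: set of trip_ids running that date}
    """
    trips_at_stop = defaultdict(set)
    for trip_id, stop_id in stop_times:
        trips_at_stop[stop_id].add(trip_id)
    return {
        stop_id: {
            # 1 if any trip serving the stop is active on this date
            date: int(bool(trips_at_stop[stop_id] & active))
            for date, active in active_trips_by_date.items()
        }
        for stop_id in stop_ids
    }


stop_times = [("t1", "A"), ("t1", "B"), ("t2", "B"), ("t2", "C")]
active = {"20230101": {"t1"}, "20230102": {"t2"}}
print(stop_activity(stop_times, active, ["A", "C"]))
# {'A': {'20230101': 1, '20230102': 0}, 'C': {'20230101': 0, '20230102': 1}}
```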

### Stop Time Series

```python { .api }
def compute_stop_time_series_0(stop_times_subset, trip_subset, freq='5Min', date_label='20010101', *, split_directions=False):
    """
    Compute stop time series for data subsets.

    Parameters:
    - stop_times_subset (DataFrame): Stop times subset
    - trip_subset (DataFrame): Trip subset
    - freq (str): Frequency for time series sampling
    - date_label (str): Date label for the time series
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Stop time series
    """

def build_zero_stop_time_series(feed, date_label='20010101', freq='5Min', *, split_directions=False):
    """
    Build a zero-filled stop time series template.

    Parameters:
    - feed (Feed): GTFS feed object
    - date_label (str): Date label for the time series
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Zero-filled stop time series
    """

def compute_stop_time_series(feed, dates, stop_ids=None, freq='5Min', *, split_directions=False):
    """
    Compute stop time series for specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze
    - stop_ids (list, optional): Specific stop IDs to analyze
    - freq (str): Frequency for time series sampling
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Stop time series with hierarchical columns
    """
```
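
The `_0` stop functions handle one day's data, while `compute_stop_time_series` covers every date in `dates`. The sketch below mirrors that split with the standard library. A single-day function bins the departures of that day's active trips per stop, and a multi-date wrapper calls it once per date and keys the results by date. Several things here are assumptions for illustration:

- counting departures per stop is a simplification of the indicators gtfs_kit reports
- the names and input shapes are hypothetical
- departures at or after 24:00:00 are dropped

```python
from collections import defaultdict


def to_seconds(hms: str) -> int:
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def stop_visits_one_day(stop_times, active_trip_ids, freq_min=5):
    """Count departures per stop in each bin for one day's active trips.

    stop_times: list of (trip_id, stop_id, departure_time) tuples.
    Stops without departures are omitted; times from 24:00:00 on are dropped.
    """
    n_bins = 24 * 60 // freq_min
    width = freq_min * 60
    series = defaultdict(lambda: [0] * n_bins)
    for trip_id, stop_id, departure_time in stop_times:
        if trip_id not in active_trip_ids:
            continue
        b = to_seconds(departure_time) // width
        if b < n_bins:
            series[stop_id][b] += 1
    return dict(series)


def stop_visits_by_date(stop_times, active_trips_by_date, freq_min=5):
    """Run the single-day count once per date and key the results by date."""
    return {
        date: stop_visits_one_day(stop_times, active, freq_min)
        for date, active in active_trips_by_date.items()
    }


stop_times = [("t1", "A", "07:01:00"), ("t1", "B", "07:09:00"), ("t2", "A", "07:04:00"), ("t2", "C", "07:20:00")]
active = {"20230101": {"t1", "t2"}, "20230102": {"t1"}}
ts = stop_visits_by_date(stop_times, active)
print(ts["20230101"]["A"][84], ts["20230102"]["A"][84])  # 2 1
```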

## Trip Analysis

### Trip Statistics and Operations

```python { .api }
def get_active_services(feed, date):
    """
    Get list of service IDs active on a specific date.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str): Date in YYYYMMDD format

    Returns:
    - list: Service IDs active on the date
    """

def compute_trip_activity(feed, dates):
    """
    Mark trips as active or inactive on specified dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze

    Returns:
    - DataFrame: Trip activity indicators by date
    """

def compute_busiest_date(feed, dates):
    """
    Get the date with maximum number of active trips.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to analyze

    Returns:
    - str: Date with maximum active trips
    """

def compute_trip_stats(feed, route_ids=None, *, compute_dist_from_shapes=False):
    """
    Compute comprehensive trip statistics.

    Parameters:
    - feed (Feed): GTFS feed object
    - route_ids (list, optional): Specific route IDs to analyze
    - compute_dist_from_shapes (bool): Calculate distances from shapes

    Returns:
    - DataFrame: Trip statistics including distances, durations, speeds
    """

def name_stop_patterns(feed):
    """
    Assign stop pattern names to trips based on stop sequences.

    Parameters:
    - feed (Feed): GTFS feed object

    Returns:
    - DataFrame: Trips with assigned stop pattern names
    """

def locate_trips(feed, date, times):
    """
    Get trip positions at specified times.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str): Date in YYYYMMDD format
    - times (list): List of times in HH:MM:SS format

    Returns:
    - DataFrame: Trip positions and status at specified times
    """

def build_route_timetable(feed, route_id, dates):
    """
    Build a route timetable showing departure times at stops.

    Parameters:
    - feed (Feed): GTFS feed object
    - route_id (str): Route ID to build timetable for
    - dates (list): List of dates in YYYYMMDD format

    Returns:
    - DataFrame: Route timetable with stops and departure times
    """

def build_stop_timetable(feed, stop_id, dates):
    """
    Build a stop timetable showing all arrivals/departures.

    Parameters:
    - feed (Feed): GTFS feed object
    - stop_id (str): Stop ID to build timetable for
    - dates (list): List of dates in YYYYMMDD format

    Returns:
    - DataFrame: Stop timetable with trip arrivals and departures
    """
```
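
Which services run on a date is governed by the GTFS calendar rules. A service runs if `calendar.txt` covers the date and has that weekday's flag set. `calendar_dates.txt` can then add the service on a specific date (`exception_type` 1) or remove it (`exception_type` 2). The sketch below applies those rules to plain dictionaries with the standard library, as an illustration of what `get_active_services` resolves. The function name and row format are assumptions, not gtfs_kit's code, and weekday flags are shown as integers rather than the strings read from the CSV files.

```python
from datetime import date as Date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def active_service_ids(calendar, calendar_dates, yyyymmdd):
    """Return sorted service_ids active on a YYYYMMDD date."""
    day = Date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), int(yyyymmdd[6:]))
    weekday = WEEKDAYS[day.weekday()]  # Monday == 0
    # YYYYMMDD strings compare correctly as plain strings
    active = {
        row["service_id"]
        for row in calendar
        if row["start_date"] <= yyyymmdd <= row["end_date"] and row[weekday] == 1
    }
    for row in calendar_dates:
        if row["date"] != yyyymmdd:
            continue
        if row["exception_type"] == 1:    # service added on this date
            active.add(row["service_id"])
        elif row["exception_type"] == 2:  # service removed on this date
            active.discard(row["service_id"])
    return sorted(active)


weekdays_only = {"service_id": "WK", "start_date": "20230101", "end_date": "20231231",
                 **dict(zip(WEEKDAYS, [1, 1, 1, 1, 1, 0, 0]))}
exceptions = [
    {"service_id": "WK", "date": "20230102", "exception_type": 2},
    {"service_id": "HOL", "date": "20230102", "exception_type": 1},
]
print(active_service_ids([weekdays_only], exceptions, "20230102"))  # ['HOL']
print(active_service_ids([weekdays_only], exceptions, "20230103"))  # ['WK']
```

Here 2 January 2023 is a Monday treated as a holiday. The regular weekday service is removed and a holiday service is added in its place.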

## Feed-Level Analysis

### Feed Statistics

```python { .api }
def compute_feed_stats_0(feed, trip_stats_subset, *, split_route_types=False):
    """
    Compute feed-level statistics for a trip subset.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats_subset (DataFrame): Trip statistics subset
    - split_route_types (bool): Split statistics by route type

    Returns:
    - DataFrame: Feed-level statistics
    """

def compute_feed_stats(feed, trip_stats, dates, *, split_route_types=False):
    """
    Compute feed-level statistics for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats (DataFrame): Trip statistics
    - dates (list): List of dates to analyze
    - split_route_types (bool): Split statistics by route type

    Returns:
    - DataFrame: Feed statistics with date index
    """

def compute_feed_time_series(feed, trip_stats, dates, freq='5Min', *, split_route_types=False):
    """
    Compute feed-level time series for multiple dates.

    Parameters:
    - feed (Feed): GTFS feed object
    - trip_stats (DataFrame): Trip statistics
    - dates (list): List of dates to analyze
    - freq (str): Frequency for time series sampling
    - split_route_types (bool): Split by route type

    Returns:
    - DataFrame: Feed time series with hierarchical columns
    """
```
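
Feed-level statistics roll trip statistics up to the whole feed, or to each route type when `split_route_types=True`. As a rough illustration, the sketch below sums trip counts, durations, and distances per group and derives an average service speed. The function name and the keys (`duration_h`, `distance`, `service_speed`, and so on) are hypothetical and are not claimed to match gtfs_kit's output columns.

```python
from collections import defaultdict


def feed_stats(trip_stats, split_route_types=False):
    """Aggregate per-trip rows into per-group totals.

    Each row is a dict with 'route_type', 'duration_h' (hours) and 'distance'.
    """
    groups = defaultdict(list)
    for row in trip_stats:
        key = row["route_type"] if split_route_types else "all"
        groups[key].append(row)
    out = {}
    for key, rows in groups.items():
        hours = sum(r["duration_h"] for r in rows)
        dist = sum(r["distance"] for r in rows)
        out[key] = {
            "num_trips": len(rows),
            "service_duration": hours,
            "service_distance": dist,
            # Total distance over total time, not the mean of per-trip speeds
            "service_speed": dist / hours if hours else None,
        }
    return out


rows = [
    {"route_type": 3, "duration_h": 0.5, "distance": 10.0},
    {"route_type": 3, "duration_h": 1.5, "distance": 30.0},
    {"route_type": 0, "duration_h": 1.0, "distance": 25.0},
]
print(feed_stats(rows)["all"]["num_trips"])  # 3
print(feed_stats(rows, split_route_types=True)[3]["service_speed"])  # 20.0
```

Dividing total distance by total duration weights long trips more heavily than averaging per-trip speeds would, which is usually what a system-wide speed figure should express.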

## Usage Examples

### Basic Route Analysis

```python
import gtfs_kit as gk

# Load feed and compute trip statistics
feed = gk.read_feed('gtfs.zip', dist_units='km')
trip_stats = gk.compute_trip_stats(feed)

# Analyze routes for specific dates (YYYYMMDD strings covered by the feed's calendar)
dates = ['20230101', '20230102', '20230103']
route_stats = gk.compute_route_stats(feed, trip_stats, dates)

# Generate route time series
route_ts = gk.compute_route_time_series(feed, trip_stats, dates, freq='15Min')
```
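
Trip statistics summarize each trip with quantities such as distance, duration, and speed. The standard-library sketch below shows the basic arithmetic for one trip. It assumes the trip's stop times carry `shape_dist_traveled` in the feed's distance units, and the function name is hypothetical. GTFS times may run past 24:00:00 for trips that continue after midnight, so the sketch converts times to seconds rather than parsing them as clock times.

```python
def to_seconds(hms: str) -> int:
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def trip_summary(stop_times):
    """Duration (hours), distance and speed for one trip's stop-time rows."""
    rows = sorted(stop_times, key=lambda r: r["stop_sequence"])
    seconds = to_seconds(rows[-1]["arrival_time"]) - to_seconds(rows[0]["departure_time"])
    duration_h = seconds / 3600
    distance = rows[-1]["shape_dist_traveled"] - rows[0]["shape_dist_traveled"]
    speed = distance / duration_h if duration_h > 0 else None
    return {"duration": duration_h, "distance": distance, "speed": speed}


late_trip = [
    {"stop_sequence": 2, "arrival_time": "24:20:00", "departure_time": "24:20:00", "shape_dist_traveled": 15.0},
    {"stop_sequence": 1, "arrival_time": "23:50:00", "departure_time": "23:50:00", "shape_dist_traveled": 0.0},
]
print(trip_summary(late_trip))  # {'duration': 0.5, 'distance': 15.0, 'speed': 30.0}
```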

### Stop Performance Analysis

```python
# Continues from the previous example: `gk` and `feed` are already defined

# Compute stop statistics with custom headway period
stop_stats = gk.compute_stop_stats(
    feed,
    dates=['20230101'],
    headway_start_time='06:00:00',
    headway_end_time='22:00:00',
    split_directions=True,
)

# Generate stop time series
stop_ts = gk.compute_stop_time_series(
    feed,
    dates=['20230101'],
    freq='10Min',
)
```

### System-Wide Analysis

```python
# Reuses `feed`, `trip_stats`, and `dates` from the basic route analysis example

# Find the busiest operating day
busiest_date = gk.compute_busiest_date(feed, dates)

# Compute feed-level statistics
feed_stats = gk.compute_feed_stats(feed, trip_stats, dates, split_route_types=True)

# Generate system-wide time series
feed_ts = gk.compute_feed_time_series(feed, trip_stats, dates)
```
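
`compute_busiest_date` picks the date with the most active trips. If per-trip activity flags like those from `compute_trip_activity` are already available, the selection reduces to a sum per date followed by a max. The sketch below uses a hypothetical name and plain dictionaries in place of a DataFrame. It breaks ties in favour of the first date listed, which may differ from gtfs_kit's tie-breaking.

```python
def busiest_date(trip_activity, dates):
    """Return the date with the most active trips.

    trip_activity: {trip_id: {date: 1 or 0}}. max() keeps the first date
    listed when several dates tie; an empty `dates` list raises ValueError.
    """
    totals = {d: sum(flags.get(d, 0) for flags in trip_activity.values()) for d in dates}
    return max(dates, key=totals.__getitem__)


activity = {
    "t1": {"20230101": 1, "20230102": 1},
    "t2": {"20230101": 0, "20230102": 1},
    "t3": {"20230101": 1, "20230102": 1},
}
print(busiest_date(activity, ["20230101", "20230102"]))  # 20230102
```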

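Multi-date functions take `dates` as a list of YYYYMMDD strings. Time series functions set their bin size through the `freq` frequency string (for example `'5Min'` or `'15Min'`). Statistics functions restrict headway calculations to the window between `headway_start_time` and `headway_end_time`.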