
# Utilities

Constants, helper functions, and miscellaneous utilities for GTFS data manipulation. This module includes configuration constants, calendar operations, feed information functions, and various utility functions.

## Constants

### GTFS Reference Data

```python { .api }
GTFS_REF: pd.DataFrame
```

Reference DataFrame containing GTFS table and column specifications with data types, requirements, and validation rules.

```python { .api }
DTYPE: dict
```

Data types dictionary for Pandas CSV reads based on GTFS reference specifications.

```python { .api }
FEED_ATTRS: list
```

List of primary feed attributes for all standard GTFS tables: `['agency', 'stops', 'routes', 'trips', 'stop_times', 'calendar', 'calendar_dates', 'fare_attributes', 'fare_rules', 'shapes', 'frequencies', 'transfers', 'feed_info', 'attributions']`.

### Unit Constants

```python { .api }
DIST_UNITS: list
```

Valid distance units: `['ft', 'mi', 'm', 'km']`.

```python { .api }
WGS84: str
```

WGS84 coordinate reference system identifier: `'EPSG:4326'`.

### Visualization Constants

```python { .api }
COLORS_SET2: list
```

ColorBrewer 8-class Set2 palette: a list of hex color codes suited to categorical data visualization.

```python { .api }
STOP_STYLE: dict
```

Default Leaflet circleMarker style parameters for stop visualization on maps.

## Calendar Operations

### Date Management

```python { .api }
def get_dates(feed, *, as_date_obj=False):
    """
    Get all valid service dates for the feed.

    Parameters:
    - feed (Feed): GTFS feed object
    - as_date_obj (bool): Return as datetime.date objects instead of strings

    Returns:
    - list: List of valid service dates
    """

def subset_dates(feed, dates):
    """
    Subset dates to those within feed's service period.

    Parameters:
    - feed (Feed): GTFS feed object
    - dates (list): List of dates to filter

    Returns:
    - list: Filtered dates within feed service period
    """
```
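As a rough illustration of the idea behind `subset_dates`, the sketch below filters candidate dates against a service period using only the standard library. The helper name `subset_dates_sketch` and the hard-coded service dates are hypothetical stand-ins, and the bounds-based filtering is an assumption; this is not the library's implementation.

```python
def subset_dates_sketch(service_dates, candidates):
    """Keep the candidate YYYYMMDD strings that fall inside the service period.

    Hypothetical helper for illustration only; the library's logic may differ.
    """
    if not service_dates:
        return []
    first, last = min(service_dates), max(service_dates)
    # YYYYMMDD strings sort chronologically, so string comparison is enough.
    return [d for d in candidates if first <= d <= last]


service = ["20230101", "20230102", "20230103"]
kept = subset_dates_sketch(service, ["20221231", "20230102", "20230104"])
print(kept)  # ['20230102']
```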

### Week Operations

```python { .api }
def get_week(feed, k, *, as_date_obj=False):
    """
    Get the kth Monday-Sunday week of feed service period.

    Parameters:
    - feed (Feed): GTFS feed object
    - k (int): Week number (0-indexed)
    - as_date_obj (bool): Return as datetime.date objects

    Returns:
    - list: List of dates in the specified week
    """

def get_first_week(feed, *, as_date_obj=False):
    """
    Get the first Monday-Sunday week of feed service period.

    Parameters:
    - feed (Feed): GTFS feed object
    - as_date_obj (bool): Return as datetime.date objects

    Returns:
    - list: List of dates in the first week
    """
```
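To make the "kth Monday-Sunday week" idea concrete, here is a minimal sketch that works on a plain list of dates. It assumes the dates are sorted and consecutive and that `k` is 0-indexed as documented above; `kth_week_sketch` is a hypothetical name, not a library function.

```python
from datetime import date, timedelta


def kth_week_sketch(dates, k):
    """Return the k-th (0-indexed) Monday-Sunday week from `dates`.

    Assumes `dates` is a sorted list of consecutive datetime.date objects.
    The final week may be shorter than seven days if the period ends mid-week.
    """
    # Locate the first Monday in the service period (weekday() == 0).
    monday_positions = [i for i, d in enumerate(dates) if d.weekday() == 0]
    if not monday_positions:
        return []
    start = monday_positions[0] + 7 * k
    return dates[start:start + 7]


# Three weeks of service starting on Sunday 2023-01-01.
service_dates = [date(2023, 1, 1) + timedelta(days=i) for i in range(21)]
first_week = kth_week_sketch(service_dates, 0)
print(first_week[0], first_week[-1])  # 2023-01-02 2023-01-08
```

Slicing past the end of the list yields a shorter (or empty) list, so the sketch needs no explicit bounds check.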

## Helper Functions

### Date and Time Utilities

```python { .api }
def datestr_to_date(x, format_str='%Y%m%d', *, inverse=False):
    """
    Convert between date strings and datetime.date objects.

    Parameters:
    - x: Date string or datetime.date object
    - format_str (str): Date format string
    - inverse (bool): If True, convert date to string

    Returns:
    - datetime.date or str: Converted date
    """

def timestr_to_seconds(x, *, inverse=False, mod24=False):
    """
    Convert time strings to seconds since midnight.

    Parameters:
    - x: Time string in HH:MM:SS format or seconds
    - inverse (bool): If True, convert seconds to time string
    - mod24 (bool): Apply modulo 24 hours

    Returns:
    - int or str: Seconds or time string
    """

def timestr_mod24(timestr):
    """
    Apply modulo 24 hours to time string.

    Parameters:
    - timestr (str): Time string in HH:MM:SS format

    Returns:
    - int: Hours modulo 24
    """

def weekday_to_str(weekday, *, inverse=False):
    """
    Convert between weekday numbers and strings.

    Parameters:
    - weekday: Weekday number (0=Monday) or string
    - inverse (bool): If True, convert string to number

    Returns:
    - int or str: Weekday number or string
    """
```
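GTFS permits times past `24:00:00` for trips that run after midnight of the service day, which is why a modulo-24 option is useful. The following standard-library sketch shows the arithmetic involved; the function names are hypothetical and the code is an illustration of the conversion, not the library's implementation.

```python
SECONDS_PER_DAY = 24 * 3600


def timestr_to_seconds_sketch(timestr, mod24=False):
    """Convert an H:MM:SS or HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in timestr.split(":"))
    total = hours * 3600 + minutes * 60 + seconds
    # Optionally wrap times past midnight back into a single 24-hour day.
    return total % SECONDS_PER_DAY if mod24 else total


def seconds_to_timestr_sketch(seconds):
    """Convert seconds since midnight back to a zero-padded HH:MM:SS string."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(timestr_to_seconds_sketch("25:30:00"))              # 91800
print(timestr_to_seconds_sketch("25:30:00", mod24=True))  # 5400
print(seconds_to_timestr_sketch(91800))                   # 25:30:00
```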

### Geometric Utilities

```python { .api }
def get_segment_length(linestring, p, q=None):
    """
    Get length of LineString segment.

    Parameters:
    - linestring: Shapely LineString
    - p (float): Start position along line
    - q (float, optional): End position along line

    Returns:
    - float: Segment length
    """

def is_metric(dist_units):
    """
    Check if distance units are metric.

    Parameters:
    - dist_units (str): Distance units string

    Returns:
    - bool: True if metric units
    """

def get_convert_dist(dist_units_in, dist_units_out):
    """
    Get distance conversion function.

    Parameters:
    - dist_units_in (str): Input distance units
    - dist_units_out (str): Output distance units

    Returns:
    - function: Distance conversion function
    """
```
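A "conversion function factory" like `get_convert_dist` can be pictured as follows. This is a minimal sketch under the assumption that each unit is converted through meters; the names `METERS_PER_UNIT`, `is_metric_sketch`, and `get_convert_dist_sketch` are hypothetical, and the library may implement this differently.

```python
# Exact meter equivalents for the four documented distance units.
METERS_PER_UNIT = {"ft": 0.3048, "mi": 1609.344, "m": 1.0, "km": 1000.0}


def is_metric_sketch(dist_units):
    """Return True for the metric units 'm' and 'km'."""
    return dist_units in {"m", "km"}


def get_convert_dist_sketch(units_in, units_out):
    """Return a function that converts a distance from units_in to units_out."""
    for units in (units_in, units_out):
        if units not in METERS_PER_UNIT:
            raise ValueError(f"Unknown distance units: {units!r}")
    factor = METERS_PER_UNIT[units_in] / METERS_PER_UNIT[units_out]
    return lambda distance: distance * factor


km_to_mi = get_convert_dist_sketch("km", "mi")
print(round(km_to_mi(10), 3))  # 6.214
```

Returning a closure means the unit lookup and validation happen once, which suits applying the converter to a whole column of distances.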

### Data Utilities

```python { .api }
def almost_equal(f, g):
    """
    Check if two DataFrames are almost equal.

    Parameters:
    - f (DataFrame): First DataFrame
    - g (DataFrame): Second DataFrame

    Returns:
    - bool: True if DataFrames are almost equal
    """

def is_not_null(df, col_name):
    """
    Check if DataFrame column has non-null values.

    Parameters:
    - df (DataFrame): DataFrame to check
    - col_name (str): Column name to check

    Returns:
    - bool: True if column has non-null values
    """

def get_max_runs(x):
    """
    Get maximum run lengths in array.

    Parameters:
    - x: Array-like input

    Returns:
    - ndarray: Maximum run lengths
    """

def get_peak_indices(times, counts):
    """
    Get indices of peak values in time series.

    Parameters:
    - times: Array of time values
    - counts: Array of count values

    Returns:
    - ndarray: Indices of peaks
    """

def make_ids(n, prefix='id_'):
    """
    Generate n unique ID strings.

    Parameters:
    - n (int): Number of IDs to generate
    - prefix (str): Prefix for IDs

    Returns:
    - list: List of unique ID strings
    """

def longest_subsequence(seq, mode='strictly', order='increasing', key=None, *, index=False):
    """
    Find longest subsequence in sequence.

    Parameters:
    - seq: Input sequence
    - mode (str): Comparison mode ('strictly', 'non')
    - order (str): Order ('increasing', 'decreasing')
    - key: Key function for comparison
    - index (bool): Return indices instead of values

    Returns:
    - list: Longest subsequence or indices
    """
```
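For readers unfamiliar with the longest-subsequence problem that `longest_subsequence` addresses, the sketch below finds a longest increasing subsequence with a simple quadratic dynamic program. It covers only the increasing case, with a `strict` flag standing in for the `mode` parameter; the function name and the choice of algorithm are assumptions for illustration, not the library's method.

```python
def longest_increasing_subsequence_sketch(seq, strict=True):
    """Return one longest increasing subsequence of `seq` (O(n^2) time)."""
    n = len(seq)
    if n == 0:
        return []
    best_len = [1] * n   # best_len[i]: longest subsequence ending at i
    previous = [-1] * n  # previous[i]: index preceding i in that subsequence
    for i in range(n):
        for j in range(i):
            in_order = seq[j] < seq[i] if strict else seq[j] <= seq[i]
            if in_order and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                previous[i] = j
    # Walk back from the position where the longest subsequence ends.
    i = max(range(n), key=best_len.__getitem__)
    result = []
    while i != -1:
        result.append(seq[i])
        i = previous[i]
    return result[::-1]


print(longest_increasing_subsequence_sketch([1, 3, 2, 4, 3, 5]))  # [1, 3, 4, 5]
```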

### Time Series Utilities

```python { .api }
def get_active_trips_df(trip_times):
    """
    Get active trips from trip times DataFrame.

    Parameters:
    - trip_times (DataFrame): Trip times data

    Returns:
    - Series: Active trips indicator
    """

def combine_time_series(time_series_dict, kind, *, split_directions=False):
    """
    Combine multiple time series into one DataFrame.

    Parameters:
    - time_series_dict (dict): Dictionary of time series
    - kind (str): Type of time series
    - split_directions (bool): Split by direction

    Returns:
    - DataFrame: Combined time series
    """

def downsample(time_series, freq):
    """
    Downsample time series to lower frequency.

    Parameters:
    - time_series (DataFrame): Input time series
    - freq (str): Target frequency

    Returns:
    - DataFrame: Downsampled time series
    """

def unstack_time_series(time_series):
    """
    Unstack hierarchical time series columns.

    Parameters:
    - time_series (DataFrame): Hierarchical time series

    Returns:
    - DataFrame: Unstacked time series
    """

def restack_time_series(unstacked_time_series):
    """
    Restack unstacked time series.

    Parameters:
    - unstacked_time_series (DataFrame): Unstacked time series

    Returns:
    - DataFrame: Restacked time series
    """
```

### HTML and GeoJSON Utilities

```python { .api }
def make_html(d):
    """
    Convert dictionary to HTML representation.

    Parameters:
    - d (dict): Dictionary to convert

    Returns:
    - str: HTML string
    """

def drop_feature_ids(collection):
    """
    Remove feature IDs from GeoJSON collection.

    Parameters:
    - collection (dict): GeoJSON FeatureCollection

    Returns:
    - dict: Collection without feature IDs
    """
```
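Removing feature IDs from a GeoJSON FeatureCollection amounts to dropping the optional top-level `id` member of each feature. A minimal sketch follows, assuming the input should be left untouched and a copy returned; `drop_feature_ids_sketch` is a hypothetical name and the copy-versus-mutate choice is an assumption.

```python
import copy


def drop_feature_ids_sketch(collection):
    """Return a copy of a GeoJSON FeatureCollection with feature 'id' keys removed."""
    result = copy.deepcopy(collection)  # leave the caller's dict unchanged
    for feature in result.get("features", []):
        feature.pop("id", None)  # 'id' is optional in GeoJSON, so tolerate its absence
    return result


stops = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "stop_123",
            "properties": {"stop_name": "Main St"},
            "geometry": {"type": "Point", "coordinates": [-122.4, 37.75]},
        }
    ],
}
cleaned = drop_feature_ids_sketch(stops)
print("id" in cleaned["features"][0])  # False
```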

## Feed Information and Quality Assessment

### Feed Metadata

```python { .api }
def list_fields(feed, table=None):
    """
    Describe GTFS table fields and their specifications.

    Parameters:
    - feed (Feed): GTFS feed object
    - table (str, optional): Specific table to describe

    Returns:
    - DataFrame: Field descriptions and specifications
    """

def describe(feed, sample_date=None):
    """
    Get comprehensive feed indicators and summary values.

    Parameters:
    - feed (Feed): GTFS feed object
    - sample_date (str, optional): Date for date-specific metrics

    Returns:
    - dict: Feed description with key indicators
    """

def assess_quality(feed):
    """
    Assess feed quality using various indicators.

    Parameters:
    - feed (Feed): GTFS feed object

    Returns:
    - dict: Quality assessment scores and indicators
    """
```

### Feed Modification

```python { .api }
def convert_dist(feed, new_dist_units):
    """
    Convert feed distance units to new units.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)
    - new_dist_units (str): Target distance units

    Returns:
    - Feed: Feed with converted distance units
    """

def create_shapes(feed, *, all_trips=False):
    """
    Create shapes by connecting stop coordinates for trips.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)
    - all_trips (bool): Create shapes for all trips vs only those without shapes

    Returns:
    - Feed: Feed with generated shapes
    """
```

## Feed Filtering and Restriction

### Trip-Based Filtering

```python { .api }
def restrict_to_trips(feed, trip_ids):
    """
    Restrict feed to specific trips and related entities.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)
    - trip_ids (list): Trip IDs to retain

    Returns:
    - Feed: Feed restricted to specified trips
    """

def restrict_to_routes(feed, route_ids):
    """
    Restrict feed to specific routes and related entities.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)
    - route_ids (list): Route IDs to retain

    Returns:
    - Feed: Feed restricted to specified routes
    """

def restrict_to_agencies(feed, agency_ids):
    """
    Restrict feed to specific agencies and related entities.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)
    - agency_ids (list): Agency IDs to retain

    Returns:
    - Feed: Feed restricted to specified agencies
    """
```
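"Restrict to routes and related entities" implies a cascade: keeping a route means keeping its trips, and keeping a trip means keeping its stop times. The sketch below shows that cascade on plain lists of dicts standing in for GTFS tables. The `tables` structure and the function name are hypothetical, and a real feed has more related tables (stops, shapes, calendars) than this illustration handles.

```python
def restrict_to_routes_sketch(tables, route_ids):
    """Return a new dict of tables filtered to the given routes and their dependents."""
    keep_routes = set(route_ids)
    routes = [r for r in tables["routes"] if r["route_id"] in keep_routes]
    trips = [t for t in tables["trips"] if t["route_id"] in keep_routes]
    # Stop times reference trips, so filter them by the surviving trip IDs.
    keep_trips = {t["trip_id"] for t in trips}
    stop_times = [s for s in tables["stop_times"] if s["trip_id"] in keep_trips]
    return {**tables, "routes": routes, "trips": trips, "stop_times": stop_times}


tables = {
    "routes": [{"route_id": "route_1"}, {"route_id": "route_2"}],
    "trips": [
        {"trip_id": "t1", "route_id": "route_1"},
        {"trip_id": "t2", "route_id": "route_2"},
    ],
    "stop_times": [
        {"trip_id": "t1", "stop_id": "stop_123"},
        {"trip_id": "t2", "stop_id": "stop_456"},
    ],
}
restricted = restrict_to_routes_sketch(tables, ["route_1"])
print([s["stop_id"] for s in restricted["stop_times"]])  # ['stop_123']
```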

### Temporal and Spatial Filtering

```python { .api }
def restrict_to_dates(feed, dates):
    """
    Restrict feed to specific service dates.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)
    - dates (list): Dates to retain

    Returns:
    - Feed: Feed restricted to specified dates
    """

def restrict_to_area(feed, area):
    """
    Restrict feed to stops and related entities within geographic area.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)
    - area: Shapely Polygon or MultiPolygon defining the area

    Returns:
    - Feed: Feed restricted to specified geographic area
    """
```

## Advanced Analysis

### Screen Line Analysis

```python { .api }
def compute_screen_line_counts(feed, screen_lines, dates, segmentize_m=5, *, include_testing_cols=False):
    """
    Compute transit line crossing counts at screen lines.

    Parameters:
    - feed (Feed): GTFS feed object
    - screen_lines: Collection of LineString geometries
    - dates (list): Dates to analyze
    - segmentize_m (float): Segmentization distance in meters
    - include_testing_cols (bool): Include debugging columns

    Returns:
    - DataFrame: Screen line crossing counts by route and time period
    """
```

### Stop Time Operations

```python { .api }
def get_stop_times(feed, date=None):
    """
    Get stop_times DataFrame optionally filtered by date.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str, optional): Filter by service date (YYYYMMDD)

    Returns:
    - DataFrame: Stop times data
    """

def append_dist_to_stop_times(feed):
    """
    Calculate and append shape_dist_traveled to stop_times.

    Parameters:
    - feed (Feed): GTFS feed object (modified in-place)

    Returns:
    - Feed: Feed with updated stop_times
    """

def get_start_and_end_times(feed, date=None):
    """
    Get first departure and last arrival times for the feed.

    Parameters:
    - feed (Feed): GTFS feed object
    - date (str, optional): Specific date to analyze

    Returns:
    - tuple: (earliest_departure, latest_arrival) as time strings
    """
```
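Finding the first departure and last arrival looks trivial, but GTFS time strings may have unpadded hours or exceed `24:00:00`, so comparing them as plain strings can give the wrong answer. The sketch below compares by seconds instead. The row dicts and the function name are hypothetical stand-ins for a stop_times table; this is not the library's implementation.

```python
def start_and_end_times_sketch(stop_times):
    """Return (earliest_departure, latest_arrival) as time strings, or (None, None)."""

    def to_seconds(timestr):
        hours, minutes, seconds = (int(part) for part in timestr.split(":"))
        return hours * 3600 + minutes * 60 + seconds

    departures = [row["departure_time"] for row in stop_times if row.get("departure_time")]
    arrivals = [row["arrival_time"] for row in stop_times if row.get("arrival_time")]
    if not departures or not arrivals:
        return (None, None)
    # Compare numerically: as strings, "9:00:00" would sort after "25:10:00".
    return (min(departures, key=to_seconds), max(arrivals, key=to_seconds))


rows = [
    {"arrival_time": "9:00:00", "departure_time": "9:05:00"},
    {"arrival_time": "07:55:00", "departure_time": "08:00:00"},
    {"arrival_time": "25:10:00", "departure_time": "25:10:00"},
]
print(start_and_end_times_sketch(rows))  # ('08:00:00', '25:10:00')
```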

### Timetable Generation

```python { .api }
def build_route_timetable(feed, route_id, dates):
    """
    Build timetable for a specific route.

    Parameters:
    - feed (Feed): GTFS feed object
    - route_id (str): Route ID to build timetable for
    - dates (list): Dates to include in timetable

    Returns:
    - DataFrame: Route timetable with stop times
    """

def build_stop_timetable(feed, stop_id, dates):
    """
    Build timetable for a specific stop.

    Parameters:
    - feed (Feed): GTFS feed object
    - stop_id (str): Stop ID to build timetable for
    - dates (list): Dates to include in timetable

    Returns:
    - DataFrame: Stop timetable with arrival/departure times
    """
```

## Package Metadata

```python { .api }
__version__: str
```

Package version string: "10.3.0".

## Usage Examples

### Working with Constants

```python
import gtfs_kit as gk

# Check available distance units
print(gk.DIST_UNITS)  # ['ft', 'mi', 'm', 'km']

# Use GTFS reference data
gtfs_spec = gk.GTFS_REF
print(gtfs_spec[gtfs_spec['table'] == 'routes'])

# Use colors for visualization
colors = gk.COLORS_SET2
```

### Calendar Operations

```python
# Load a feed, then get all of its service dates
feed = gk.read_feed('gtfs.zip', dist_units='km')
all_dates = gk.get_dates(feed)

# Get first week of service
first_week = gk.get_first_week(feed, as_date_obj=True)

# Get specific week
week_3 = gk.get_week(feed, 2)  # Third week (0-indexed)
```

### Feed Analysis and Quality Assessment

```python
# Get comprehensive feed description
description = gk.describe(feed, sample_date='20230101')
print(description)

# Assess feed quality (the result is a dict of indicators, not a single score)
quality = gk.assess_quality(feed)
print(f"Quality assessment: {quality}")

# List field specifications
field_info = gk.list_fields(feed, table='routes')
```

### Feed Filtering

```python
from shapely.geometry import Polygon

# Create a copy for filtering
filtered_feed = feed.copy()

# Restrict to specific routes; keep the returned Feed rather than
# relying only on in-place modification
route_ids = ['route_1', 'route_2']
filtered_feed = gk.restrict_to_routes(filtered_feed, route_ids)

# Restrict to date range
dates = ['20230101', '20230102', '20230103']
filtered_feed = gk.restrict_to_dates(filtered_feed, dates)

# Restrict to geographic area (longitude, latitude pairs)
bbox = Polygon([(-122.5, 37.7), (-122.3, 37.7), (-122.3, 37.8), (-122.5, 37.8)])
filtered_feed = gk.restrict_to_area(filtered_feed, bbox)
```

### Timetable Generation

```python
# Build route timetable
route_timetable = gk.build_route_timetable(feed, 'route_1', ['20230101'])

# Build stop timetable
stop_timetable = gk.build_stop_timetable(feed, 'stop_123', ['20230101'])
```

The utilities module provides essential infrastructure for GTFS data manipulation, analysis, and quality assurance workflows.