Description
Describe the problem
Currently, modeling a single conceptual trip (e.g., the 8:15 AM train from City A to City B) that operates on a complex, fragmented calendar requires creating multiple unique service_ids. For each of these service_ids, a separate, nearly identical row must be created in trips.txt.
This leads to significant data redundancy and several issues:
- The trips.txt file (and consequently stop_times.txt) becomes unnecessarily large, because one conceptual journey is split into many trip_ids.
- The information that these separate trip entries all represent the same recurring service is lost. This is especially problematic for services that have a consistent identity (like a named train line) but a very fragmented schedule.
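The core issue is that calendar.txt enforces a "one service_id per one continuous date range" rule: each service_id can appear in only one row, so it gets exactly one date range and one set of day-of-week flags. This doesn't align with the operational reality of many transport services, especially in rail.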
Use cases
Consider a train with a complex, non-contiguous schedule common in European rail systems:
"Runs: Aug 31-Sep 4 (daily); Sep 19-27 (on Mon, Wed, Fri, Sat); Oct 6-25 (on Mon, Wed, Fri, Sat, Sun)"
To model this with calendar.txt today, a producer must create 3 unique service_ids and 3 corresponding rows in trips.txt, even though it's the exact same train service.
Current trips.txt:
| route_id | service_id | trip_id | ... |
|---|---|---|---|
| route_123 | service_A | trip_001 | ... |
| route_123 | service_B | trip_002 | ... |
| route_123 | service_C | trip_003 | ... |
This is inefficient. The desired state is to define this entire complex schedule under a single service_id and have only one corresponding trip entry.
Proposed solution
Modify the GTFS specification to allow multiple entries for the same service_id in calendar.txt under one strict condition:
The [start_date, end_date] ranges for any given service_id MUST NOT overlap.
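A feed validator could enforce this rule with a sort-and-sweep per service_id. The sketch below is hypothetical (`find_overlaps` and `parse_gtfs_date` are illustrative names, not part of any existing validator) and assumes rows are dicts such as `csv.DictReader` produces. Because GTFS treats both start_date and end_date as inclusive, two ranges that share a single day are reported as overlapping.

```python
from collections import defaultdict
from datetime import date


def parse_gtfs_date(value: str) -> date:
    # GTFS dates are written as YYYYMMDD, e.g. "20230831".
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))


def find_overlaps(rows):
    """Hypothetical validator for the proposed non-overlap rule.

    `rows` are calendar.txt rows as dicts (e.g. from csv.DictReader).
    Returns (service_id, earlier_range, later_range) for every range that
    overlaps an earlier range of the same service_id.
    """
    ranges = defaultdict(list)
    for row in rows:
        ranges[row["service_id"]].append(
            (parse_gtfs_date(row["start_date"]), parse_gtfs_date(row["end_date"]))
        )

    problems = []
    for service_id, spans in ranges.items():
        spans.sort()  # order by start_date, then end_date
        furthest = spans[0]  # the range reaching furthest into the future so far
        for current in spans[1:]:
            # Both bounds are inclusive, so sharing a day counts as an overlap.
            if current[0] <= furthest[1]:
                problems.append((service_id, furthest, current))
            if current[1] > furthest[1]:
                furthest = current
    return problems
```

Sorting first keeps the check at O(n log n) per service_id instead of comparing every pair of rows, and tracking the furthest-reaching range also catches a short range nested inside a longer one.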
When a consumer application parses the feed, it should treat all entries for a single service_id as a logical UNION (OR). A trip associated with that service_id is considered active if the date falls within any of its defined date ranges and matches the day-of-week flags for that specific range.
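On the consumer side, the union rule can be sketched as a single `any(...)` over the rows of one service_id. This is a minimal illustration under stated assumptions: `is_service_active` is a hypothetical helper, the rows passed in are already filtered to a single service_id, flag values are the strings "0"/"1" as read from CSV, and calendar_dates.txt exceptions are deliberately left out. The key point is that each row's day-of-week flags apply only within that row's own date range.

```python
from datetime import date

# Column order matches date.weekday(): 0 = Monday ... 6 = Sunday.
WEEKDAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"]


def parse_gtfs_date(value: str) -> date:
    # GTFS dates are written as YYYYMMDD, e.g. "20230831".
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))


def is_service_active(service_rows, day: date) -> bool:
    """Hypothetical helper: union (OR) over all rows of one service_id.

    A row matches only if `day` is inside that row's inclusive range AND the
    same row's day-of-week flag for `day` is "1".
    calendar_dates.txt exceptions are not applied here.
    """
    column = WEEKDAY_COLUMNS[day.weekday()]
    return any(
        parse_gtfs_date(row["start_date"]) <= day <= parse_gtfs_date(row["end_date"])
        and row[column] == "1"
        for row in service_rows
    )


# The three train_815_service rows from the use case, as string-valued dicts.
rows = [
    dict(zip(WEEKDAY_COLUMNS, "1111111"), start_date="20230831", end_date="20230904"),
    dict(zip(WEEKDAY_COLUMNS, "1010110"), start_date="20230919", end_date="20230927"),
    dict(zip(WEEKDAY_COLUMNS, "1010111"), start_date="20231006", end_date="20231025"),
]
print(is_service_active(rows, date(2023, 9, 22)))  # True: Friday in the second range
print(is_service_active(rows, date(2023, 9, 24)))  # False: Sunday in the second range
```

Note that Sunday 2023-10-08 would return True, because the third range sets sunday to 1. Merging all rows' flags before checking the date would therefore give wrong answers.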
Example of the proposed calendar.txt:
Using the schedule from the use case, calendar.txt would look like this:
| service_id | monday | tuesday | wednesday | thursday | friday | saturday | sunday | start_date | end_date |
|---|---|---|---|---|---|---|---|---|---|
| train_815_service | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 20230831 | 20230904 |
| train_815_service | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 20230919 | 20230927 |
| train_815_service | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 20231006 | 20231025 |
Consequence for trips.txt:
This would allow the trips.txt file to be simplified to a single, logical entry:
| route_id | service_id | trip_id | ... |
|---|---|---|---|
| route_123 | train_815_service | trip_815 | ... |
This solution directly addresses the problem of data duplication for producers while remaining relatively simple for consumers to implement, as the logic is a straightforward union of non-overlapping date ranges. It is far more concise than listing potentially hundreds of individual dates in calendar_dates.txt for services that run in fragmented multi-week blocks.
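To put a number on the comparison with calendar_dates.txt, the three proposed calendar.txt rows for train_815_service can be expanded into the explicit per-date rows a producer would otherwise publish (exception_type 1 means service is added on that date). The converter below is a hypothetical sketch, not an existing tool; it relies on the non-overlap rule, so no date is emitted twice.

```python
import csv
import io
from datetime import date, timedelta

# Column order matches date.weekday(): 0 = Monday ... 6 = Sunday.
WEEKDAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"]

# The proposed calendar.txt rows from the use case.
CALENDAR_TXT = """\
service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
train_815_service,1,1,1,1,1,1,1,20230831,20230904
train_815_service,1,0,1,0,1,1,0,20230919,20230927
train_815_service,1,0,1,0,1,1,1,20231006,20231025
"""


def parse_gtfs_date(value: str) -> date:
    # GTFS dates are written as YYYYMMDD, e.g. "20230831".
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))


def to_calendar_dates(calendar_text: str) -> str:
    """Hypothetical converter: expand every calendar.txt row into explicit
    calendar_dates.txt rows with exception_type 1 (service added)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["service_id", "date", "exception_type"])
    for row in csv.DictReader(io.StringIO(calendar_text)):
        day = parse_gtfs_date(row["start_date"])
        end = parse_gtfs_date(row["end_date"])
        while day <= end:  # end_date is inclusive
            if row[WEEKDAY_COLUMNS[day.weekday()]] == "1":
                writer.writerow([row["service_id"], day.strftime("%Y%m%d"), 1])
            day += timedelta(days=1)
    return out.getvalue()


calendar_dates = to_calendar_dates(CALENDAR_TXT)
print(len(calendar_dates.splitlines()) - 1)  # 25 data rows, versus 3 calendar.txt rows
```

For this roughly two-month example the ratio is 3 rows to 25; a schedule spanning a full timetable year would widen the gap accordingly.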
Additional information
No response