[Spain] Add GTFS feeds from NAP #118

Open
Robot8A opened this issue Mar 9, 2024 · 25 comments
Labels
data sources adding, removing, modifying or discussing problems with specific data sources

Comments

@Robot8A

Robot8A commented Mar 9, 2024

The Spanish NAP (National Access Point) is available at https://nap.mitma.es/. It mixes feeds from operating companies and regional governments, so some preprocessing may be needed to avoid duplicates. The NAP only has feeds from the companies and governments that choose to release them, so there are gaps in coverage.

@derhuerst added the "data sources" label (adding, removing, modifying or discussing problems with specific data sources) on Mar 19, 2024
@Chaikney

Chaikney commented Oct 2, 2024

I am having a quick look at this now. The NAP page shows 114 data sources and the es.json file appears to have 110 sources. (Crude count on "name" excluding the 2 maintainers.)
Before I go deeper into trying to find what the other 4 are, are there criteria that might be excluding some of those sources? (I thought it might exclude the ferries / maritimo, but the ferries are there.)
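
For anyone repeating that crude count, a minimal Python sketch follows. It assumes feeds/es.json is a JSON object with a "maintainers" list and a "sources" list whose entries each carry a "name"; that layout is inferred from the count described above rather than checked against the repository, and the sample data is made up.

```python
import json


def count_sources(feed_doc: dict) -> int:
    """Count feed entries that have a "name", ignoring the maintainers list."""
    return sum(1 for source in feed_doc.get("sources", []) if "name" in source)


# Tiny inline stand-in for feeds/es.json (assumed structure, invented values).
sample = json.loads("""
{
  "maintainers": [{"name": "Maintainer A"}, {"name": "Maintainer B"}],
  "sources": [
    {"name": "feed-one", "type": "http", "url": "https://example.org/one.zip"},
    {"name": "feed-two", "type": "http", "url": "https://example.org/two.zip"}
  ]
}
""")

print(count_sources(sample))  # prints 2
```

Counting only under "sources" avoids having to subtract the maintainers by hand.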

@jbruechert
Collaborator

I think they might not have existed when the list was put together. If you can find a machine readable list from the NAP, it would be great to have a script to generate feeds/es.json, like we have for France and Austria.

@Robot8A
Author

Robot8A commented Oct 2, 2024

There is an API; its documentation is here: https://nap.transportes.gob.es/Account/InstruccionesAPI

@Altonss
Collaborator

Altonss commented Oct 11, 2024

Ouigo ES is currently in es.json but doesn't seem to be working 🤔 Should I open a separate issue?
Iryo also seems to be missing currently :)

@Chaikney

"Ouigo ES is currently in es.json but seems not working 🤔 Should I open a separate issue?"
The Ouigo file is not getting fetched correctly, or the data it returns is bad?
"Iryo seems also to be missing currently :)"
Iryo is not in that NAP list, true. I wonder if we wait for NAP to add it (Iryo are a new provider I think) or if we can track down their data first...

"it would be great to have a script to generate feeds/es.json, like we have for France and Austria."
Agreed this would be the optimal solution.
Between the API and the example scripts I might be able to hack something together but I am short on time and experience, so I am not committing to producing anything usable.

@Altonss
Collaborator

Altonss commented Oct 11, 2024

> The Ouigo file is not getting fetched correctly, or the data it returns is bad?

I don't know what the exact issue is, but no train between Madrid and Barcelona is shown...

@GerdC

GerdC commented Jan 2, 2025

FGC realtime data https://dadesobertes.fgc.cat/explore/?sort=modified&refine.keyword=gtfs+realtime

@Altonss
Collaborator

Altonss commented Jan 8, 2025

Is it possible that URLs change over time? For example, we should have AECFA Slots, but nothing appears in Transitous: https://nap.transportes.gob.es/Files/Detail/920

@jbruechert
Collaborator

> Is it possible that URLs change over time, for example we should have AECFA Slots, but nothing appears in Transitous : https://nap.transportes.gob.es/Files/Detail/920 ?

It would probably be good to be able to generate the list from the NAP automatically to catch such cases, like we already do for France.

@Altonss
Collaborator

Altonss commented Jan 9, 2025

> Is it possible that URLs change over time, for example we should have AECFA Slots, but nothing appears in Transitous : https://nap.transportes.gob.es/Files/Detail/920 ?

The file url used by Transitous seems to be fine, and the data correct too. Any idea on why the flights do not show up on the map and routing?

@jbruechert
Collaborator

I can't check it on the phone right now, but the first step is to see whether the post-processed feed (https://routing.spline.de/gtfs/es_AECFA-slots.gtfs.zip) looks reasonable, and if not, what the import log on the CI says.

@jbruechert
Collaborator

The stop time syntax is invalid in the feed, which causes no data to actually be imported.
See https://gtfs-validator-results.mobilitydata.org/b9479fe1-0ae8-423c-acd8-a07580fcb54a/report.html

@Altonss
Collaborator

Altonss commented Jan 9, 2025

> The stop time syntax is invalid in the feed, which causes no data to actually be imported. See https://gtfs-validator-results.mobilitydata.org/b9479fe1-0ae8-423c-acd8-a07580fcb54a/report.html

The CI/gtfsclean didn't complain about it? Should we add a fix to accept/convert from HH:MM?
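
To make the HH:MM question concrete, here is a minimal Python sketch of such a conversion. It is not gtfsclean or gtfsparser code: the function name `normalize_gtfs_time` is hypothetical, and it assumes that a missing seconds field can safely be read as `:00`.

```python
import re

# Hours may exceed 23 in GTFS (trips running past midnight), so allow up to 3 digits.
_TIME_RE = re.compile(r"^\s*(\d{1,3}):([0-5]\d)(?::([0-5]\d))?\s*$")


def normalize_gtfs_time(value: str) -> str:
    """Return the time as HH:MM:SS, accepting H:MM, HH:MM, H:MM:SS and HH:MM:SS."""
    match = _TIME_RE.match(value)
    if match is None:
        raise ValueError(f"unparseable GTFS time: {value!r}")
    hours, minutes, seconds = match.groups()
    # Seconds are optional in the input; assume zero when they are absent.
    return f"{int(hours):02d}:{minutes}:{seconds or '00'}"


print(normalize_gtfs_time("07:30"))  # prints 07:30:00
```

Values that do not look like a time at all still raise an error, so they are not silently turned into something valid.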

@jbruechert
Collaborator

jbruechert commented Jan 9, 2025

Currently all the entries in es.json have "fix": true, because manually looking through them was too much work for that number of feeds.
The problem is that this option simply deletes every entry it can't fix automatically, and apparently this issue is one of those cases. That's exactly why "fix": true is not the default: it's easy to miss that this is happening.

I think it would be fine to extend gtfsclean to handle this, but I'd try to contact the feed producer first, since it's much easier to fix on their side, and otherwise every consumer of the feed will need this fix.

@felixguendling
Contributor

felixguendling commented Jan 9, 2025

The time parsing only reads HH:MM anyway in MOTIS, so there should be no problem with this syntax.

https://github.com/motis-project/nigiri/blob/master/src/loader/gtfs/parse_time.cc

But probably quoted time fields won't work (haven't tried it) - that's something we could improve if necessary.

@jbruechert
Collaborator

In this case it's failing earlier on our side (gtfsclean), which we still need to filter other invalid stuff (wrong coordinates, too fast trips etc.)

@Altonss
Collaborator

Altonss commented Feb 19, 2025

> In this case it's failing earlier on our side (gtfsclean), which we still need to filter other invalid stuff (wrong coordinates, too fast trips etc.)

I think it is failing even earlier, in https://github.com/public-transport/gtfsparser: all stop times are formatted as HH:MM and are therefore, I think, added directly to DroppedStopTimes, which leads to gtfsclean dropping them all.

@felixguendling
Contributor

HH:MM is something MOTIS should be able to work with - so maybe just disable fix?

@jbruechert
Collaborator

jbruechert commented Feb 20, 2025

Currently it still goes through gtfsclean anyway. We could add an option to skip that, but then we lose filtering of fast trips for that feed, which could break things.

I think making the gtfsparser more lenient will be the best solution.

@jbruechert
Collaborator

Did this now, the next problem is

routes.txt:2 - Expected integer for field 'route_type', found 'TransporteAereo'

This feed must have seen zero testing or validation, I think it might be a case for an external data cleanup project.

@Altonss
Collaborator

Altonss commented Feb 20, 2025

> Did this now, the next problem is
>
> routes.txt:2 - Expected integer for field 'route_type', found 'TransporteAereo'
>
> This feed must have seen zero testing or validation, I think it might be a case for an external data cleanup project.

Wow this feed is amazingly bad 🤣 It would need a dedicated cleanup script at this point :/

@Altonss
Collaborator

Altonss commented Feb 20, 2025

> Did this now, the next problem is
>
> routes.txt:2 - Expected integer for field 'route_type', found 'TransporteAereo'

This error is new; I have a version from January that has the correct 1100 type :(

@jbruechert
Collaborator

jbruechert commented Feb 20, 2025

So after some more fiddling around, it is salvageable, but only with custom search-and-replace actions and gtfsparser hacks.

In case someone wants to clean it up, I have had good experiences with the free CI on gitlab.com. It lets you get a stable link to the latest artifact, so you can set up a cron job that runs a number of sed commands there without much work, and add the output link here. Obviously anything else that provides a stable link works as well.

Manually cleaned up version
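
A minimal Python sketch of the kind of search-and-replace cleanup described above, as an alternative to sed. The function name `fix_route_types` and the sample rows are hypothetical; the mapping from 'TransporteAereo' to 1100 is taken from the January version of the feed mentioned in this thread. The feed's other problems are not handled here.

```python
import csv
import io

# Non-numeric route_type values seen in the feed, mapped to the numeric type
# the earlier version of the feed used.
ROUTE_TYPE_FIXES = {"TransporteAereo": "1100"}


def fix_route_types(routes_txt: str) -> str:
    """Rewrite routes.txt content, replacing known bad route_type values."""
    reader = csv.DictReader(io.StringIO(routes_txt))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Leave values that are not in the mapping untouched.
        row["route_type"] = ROUTE_TYPE_FIXES.get(row["route_type"], row["route_type"])
        writer.writerow(row)
    return out.getvalue()


sample_routes = (
    "route_id,route_short_name,route_type\n"
    "R1,AB123,TransporteAereo\n"
    "R2,CD456,1100\n"
)
print(fix_route_types(sample_routes))
```

Parsing the file as CSV rather than replacing raw text means only the route_type column is touched, even if the same word appears in a route name.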

@felixguendling
Contributor

Maybe before putting in the effort to write a custom script, it would make sense to at least try to get in touch with the feed authors, with a wish list of the bugs that would be most important to fix to get a usable feed. Maybe they are just not aware that their feed is effectively useless in the state they publish it.

@felixguendling
Contributor

I found it quite useful that the official validator produces a static report URL that I can put into the email with the note that this validator is the official one by the standardization organization itself, so it has some "authority".

https://gtfs-validator.mobilitydata.org/


7 participants