The API of the Zürich airport website has changed, which made my first program useless. As I found out, the new website makes requests using the POST method instead of GET. Further, the returned data is HTML and not JSON.
The new webcrawler is now openly available on github:
[ read the article about the old code below ]
I'm not going to talk around the topic, but get straight to the details. For everyone who is not exactly fond of what I did, here's the important stuff.
The airport websites all have a page with a timetable of the arriving/departing flights. In order for the page to work, they have methods (APIs) for getting the data from their server to the website. All I did was use these "openly" accessible APIs to pull down the flight data.
The hidden API
The site I analyzed was: timetable.engadin-airport.ch
It was a bit tricky at first, since it is not a static or semi-static website, so the view-source trick doesn't work here.
When I looked at the network traffic (using the Firefox developer tools), I quickly found that there is a GET request every minute. While the response body does not contain anything like HTML or JSON, the response header sure does.
Basically we just need to send a request and save only the headers using
wget --save-headers http://timetable.engadin-airport.ch/airtrack/timetable.php
The rest is a bit of awk magic, and the JSON file is ready to be used for whatever you want to do.
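The awk step itself isn't shown here, so as an illustration, here is a hedged Python sketch of the same idea: scan the saved-headers dump for a header value that parses as JSON. The header name carrying the payload and the sample data below are assumptions, not the actual site's output.

```python
import json

def extract_json_from_headers(raw_headers):
    """Scan a `wget --save-headers` dump for a header value that is JSON.

    The header name carrying the payload is unknown here, so we simply
    try the first header value that looks like a JSON object or array.
    """
    for line in raw_headers.splitlines():
        if ":" not in line:
            continue  # status line or blank separator
        _, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("{") or value.startswith("["):
            try:
                return json.loads(value)
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON payload found in headers")

# Fabricated example dump (the real header name and layout may differ):
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    'X-Timetable: {"flights": [{"no": "LX 123"}]}\r\n'
    "\r\n"
)
data = extract_json_from_headers(sample)
```

The original pipeline did the equivalent with awk on the saved file; the Python version just makes the filtering logic explicit.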
Here's what you've been waiting for: the actual code. I put the scripts into a cronjob, so the server sends an update on the flight data via a Telegram bot.
(The to-telegram.py script would be used to generate human-readable text to forward via text message.)
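To give a feel for what a script like to-telegram.py does, here is a minimal sketch that turns a list of flight records into a readable message. The field names ('time', 'no', 'dest') are assumptions for illustration; the real script may use different keys.

```python
def flights_to_text(flights):
    """Render flight dicts as one line each, ready to send as a message.

    Keys 'time', 'no' and 'dest' are assumed field names, not the
    actual schema used by the airport's API.
    """
    lines = [f"{f['time']}  {f['no']}  -> {f['dest']}" for f in flights]
    return "\n".join(lines)

msg = flights_to_text([
    {"time": "10:15", "no": "LX 123", "dest": "ZRH"},
    {"time": "11:40", "no": "EDW 58", "dest": "SMV"},
])
```

The resulting string can then be handed to any Telegram bot library as the message body.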
The hidden API
The site I analyzed was: flughafen-zuerich.ch/departures-and-arrivals
The Zurich Airport was a bit different. It has a great web API for its mobile website, which even offers a filter for "spotter" flights. Again I analyzed the web traffic, and the source I looked for wasn't even hidden much. There is a GET request going out to a URL that even has the word webapi in it. But have a look for yourself.
This time we don't even need to dig into the response headers. The transmitted "webpage" already contains pure JSON.
Another job for wget. You might have seen the /1/ part in the request URL. That means there are a lot of pages to download. Also, I only wanted the spotter-relevant flight data. The web API has a normal mode with only civil flights, and a spotter mode, where the civil flights plus the interesting flights are present. So I need to get:
- all civil flight arrivals
- all civil flight departures
- all spotter flight arrivals
- all spotter flight departures
That's around 33 requests per category. I download all the pages and merge them into three large JSON files for later processing.
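The merge step can be sketched in a few lines of Python: each downloaded page is assumed to be a JSON array of flight objects, and all pages of a category get concatenated into one list. The page layout is an assumption; the real API might nest the list under a key.

```python
import json

def merge_pages(pages):
    """Concatenate the flight lists from all downloaded pages.

    Each element of `pages` is the raw text of one page, assumed to be
    a JSON array of flight objects.
    """
    merged = []
    for raw in pages:
        merged.extend(json.loads(raw))
    return merged

# Fabricated page contents standing in for the ~33 downloaded files:
pages = ['[{"no": "LX 1"}]', '[{"no": "LX 2"}, {"no": "LX 3"}]']
all_flights = merge_pages(pages)
```

In practice the pages would come from a wget loop over the /1/, /2/, ... URLs, then be merged and written back out as one JSON file per category.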
(The spotter-sort.py script compares the standard timetable against the spotter timetable and outputs only the differences.)
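The core of that comparison is a set difference keyed on some flight identifier. Here is a hedged sketch of what spotter-sort.py might do; the key name 'no' and the sample records are assumptions.

```python
def spotter_only(standard, spotter, key="no"):
    """Return flights present in the spotter timetable but missing from
    the standard one, compared by an assumed flight-number key."""
    seen = {f[key] for f in standard}
    return [f for f in spotter if f[key] not in seen]

# Fabricated timetables: the spotter list contains one extra flight.
extras = spotter_only(
    [{"no": "LX 123"}],
    [{"no": "LX 123"}, {"no": "HB-ABC"}],
)
```

Whatever survives the filter is exactly the spotter-relevant traffic, which is what gets forwarded onward.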