Tuesday, August 26, 2008

WOA there! What happens when a Website unwittingly becomes a Web Service...

Ryanair is in the news for canceling tickets which are booked through "screen-scraper" websites. Apparently it cancels up to 450 such tickets a day.

What the "screen-scraper" websites are doing is using the Ryanair website as a Web Service. "Screen-scraper" was the old phrase, conjuring up an image of a green screen mainframe. But, bang-up-to-date, this story is a nice example of "WOA" (Web Oriented Architecture) where the Ryanair Website unwittingly becomes part of the "Global SOA", to be used by applications.

Is "WOA" really new? I urge everyone to read this Byte article from Jon Udell in 1996, 12 years ago. Part of the title says it all: every website is a software component. "A powerful capability for ad hoc distributed computing arises naturally from the architecture of the Web."

The fact that the Ryanair site is being used as a software component, even though Ryanair expressly do not want it to be used in this way, shows the power of WOA. You literally can't hold it back.

Or can you? In Irish law, there is a precedent for this. The full details are in this story by Eoin Licken in the Irish Times Archive from 1998. I've pasted snippets below:


Irish companies putting information on Websites should stipulate terms and conditions for how their sites are used, following the dismissal of the State's first prosecution for unauthorised accessing of data earlier this year.

The case also highlights the dilemma faced by online information providers: how to limit access to valuable information in a medium designed for fast information transfer. Last April, Mr Alister Kidd, managing director of Touchtel, became the first person to be charged with unauthorised access to data under the 1991 Criminal Damage Act. The prosecution followed a complaint by Kompass Ireland, which runs an online database of company information, that Mr Kidd had found a way to bypass its site's technical restrictions and download company information from the database more quickly. Kompass says Mr Kidd wrote a computer program to automatically download records of company information every five seconds, a technique it calls "harvesting". He was traced via the address of the computer he used to download the records.

...

[But the case was thrown out because there were no "Terms of Use" guidelines on the site. Check the Irish Times article for full details.]

....

Legal sources say they were surprised the case arose at all, and the major lesson from it is the need for terms and conditions on Websites. Mr Kidd says the lesson is: "If you've got a site, specify what the usage is for."

However, not everyone is satisfied with the need for explicit terms and conditions on Websites. Mr Alex French of Medianet, Touchtel's Internet service provider at the time, says the need for disclaimers to prevent unauthorised access is "akin to requiring shops to put a `You may not break into this shop' sign up at night". He says the case has a profound impact for the Internet community in Ireland. The inspector in charge of the Garda Computer Crimes Unit says the issues surrounding access to data are still not clear. "If you are prepared to put information in the public arena you're inviting public access," says Insp Eugene Gallagher, but he adds: "It's unclear if someone comes in the window instead of the door."


Now Ryanair has such a Terms of Use statement in place. It is unclear whether Ryanair is taking any practical countermeasures to block this automated access to its Website. Presumably the screen-scrapers are using tools such as Mechanize, which simulate a browser. One obvious solution is for Ryanair to add CAPTCHAs. Another obvious solution [especially coming from the CTO of a Gateway vendor!] is to deploy a Gateway in front of the Ryanair site to detect and block the automated site usage.
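
To make "detect the automated site usage" a little more concrete, here is a minimal sketch of one crude signal a Gateway could use: counting how many requests a single client makes inside a sliding time window. This is purely illustrative and not a description of what Ryanair or any Gateway product does. The class name `RateDetector`, the method `is_automated`, and the thresholds are all hypothetical, and I am assuming the client can be identified by something like its IP address.

```python
from collections import defaultdict, deque


class RateDetector:
    """Flags a client that makes too many requests within a sliding time window.

    Hypothetical sketch: the names and thresholds are assumptions for
    illustration, not taken from any real Gateway product.
    """

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client id (assumed here to be an IP address or similar)
        # to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def is_automated(self, client_id: str, now: float) -> bool:
        """Record a request at time `now` and report whether the client
        has exceeded the allowed number of requests in the window."""
        hits = self._hits[client_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# A client fetching a page every five seconds, like the "harvesting"
# program in the Kompass case, trips a limit of 5 requests per minute
# on its sixth request.
detector = RateDetector(max_requests=5, window_seconds=60.0)
flags = [detector.is_automated("scraper", t) for t in range(0, 30, 5)]
```

A rate check like this is easy to evade (a scraper can slow down or spread its requests across many addresses), which is why it would realistically be only one signal among several, alongside things like CAPTCHAs.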