Mr. Gorbachev, bring down this wall!
Posted on September 25th, 2013
You may have heard the term “paywall”: it’s when a web site limits the amount of content that you can see unless you sign up with them for a fee. This typically happens after you’ve viewed a predetermined number of articles, and that number is reset on a daily, weekly, or monthly basis (depending on their setup).
All of Toronto’s major daily newspapers have put up paywalls, including the Toronto Star, Toronto Sun, Globe and Mail, and National Post.
And they’re all just awful.
Much hoopla was made about a developer who bypassed the New York Times paywall a couple of years ago, yet little (if anything) has changed since. David Hayes, the developer who cracked the NYT paywall, claims it took him a lunch hour to write the bookmarklet that bypasses the newspaper’s paywall.
A couple of days ago, when Sarah was hitting the Star’s paywall, I decided to take a quick look at what would be involved in getting around it. Twenty minutes later I had bypassed the paywalls of all of the above papers, including the New York Times (before I’d read anything on the topic, I should add). It took another 30 minutes to produce a small, generic site script that makes the dewalling process just a little easier and faster.
I’m not blowing my own horn here. I’m no super genius and this “hack” could be accomplished by anyone with rudimentary web development experience. In fact, both Hayes’ code and my own are almost unnecessary; with a few extra steps, you can bypass these paywalls with no extra software or crazy hacking skills. Chances are good that you already know how to do it.
I can see some extra benefit to a utility that would assist in automatically navigating the paywall beyond the first article — so that you could click on the web page links instead of having to load article by article — but this was more of a proof-of-concept thing, and the proven concept is that paywalls are unfortunately simple to defeat.
I’m not currently posting my dewalling code publicly. However, I will detail why this problem exists, and what the papers can do to fix it (if you’re from any of the aforementioned newspapers, feel free to give me a shout).
So Why Are Developers So Dumb?
I don’t think they are :) And to be honest, I totally get why things were done this way.
When a typical web browser grabs the web page you request, it sends out some limited information for the listening web server on the other end. This includes listing the browser’s capabilities (what kinds of content it can handle), specifying what it’s looking for (usually the URL of the web page), and cookies.
The receiving web server has that, plus an IP address, to identify an individual reader over the internet.
The IP isn’t unique to you; it’s unique to your internet connection, which may be shared by many devices (through the internet box thing, a.k.a. residential router, in your home). Browser capabilities can’t be assumed to be unique either, again because of that shared internet connection thing. And cookies can be cleared with the click of a button.
Given these limitations, how are web developers supposed to identify unique readers while ensuring that other legitimate readers can still access the site?
Better to err on the side of caution and just use cookies, sometimes along with IP, rather than accidentally block readers. Paywalls are necessarily leaky.
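To make the leak concrete, here is a minimal sketch in Python of how a cookie-based meter might work. This isn’t any paper’s actual code; the cookie name `article_views`, the limit of 10, and both function names are made up for illustration.

```python
from __future__ import annotations

FREE_ARTICLES = 10  # hypothetical limit per reset period


def handle_request(cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Decide what to serve based only on the cookie the browser sent.

    Returns (what to serve, cookies the server asks the browser to store).
    """
    try:
        views = int(cookies.get("article_views", "0"))
    except ValueError:
        views = 0  # a missing or mangled cookie looks like a brand new reader
    if views >= FREE_ARTICLES:
        return "paywall", {"article_views": str(views)}
    return "article", {"article_views": str(views + 1)}


def read_articles(n: int, cookies: dict[str, str]) -> list[str]:
    """Simulate a browser requesting n articles and storing cookies as told."""
    served = []
    for _ in range(n):
        result, new_cookies = handle_request(cookies)
        cookies.update(new_cookies)
        served.append(result)
    return served
```

Start with an empty cookie jar (`jar = {}`) and `read_articles(11, jar)` serves ten articles and then the wall. Call `jar.clear()`, the code equivalent of that click of a button, and the next request is served as if the reader had never visited.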
So What Should They Do?
This is a tough one.
It’s tough because it puts the limitations of technology up against corporate culture and profits.
What this really calls for is reflection on how the papers profit from their content, and to me it’s an all-or-nothing proposition.
One option is for the papers to go all-in and make certain articles, sections, features, etc. fully pay-only. That means having to log in to access them; otherwise it’s an excerpt, or some sort of teaser, for the general hoi polloi.
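A rough sketch of what “fully pay-only” means on the server side, in Python. The function name, the `is_subscriber` flag, and the 50-word excerpt length are all assumptions for the sake of the example; how a real site decides who is logged in is out of scope here.

```python
def render_article(body: str, is_subscriber: bool, excerpt_words: int = 50) -> str:
    """Return the full article for subscribers and only a teaser otherwise."""
    if is_subscriber:
        return body
    # The rest of the text is never sent, so there is nothing in the
    # browser for a script or a cleared cookie to uncover.
    excerpt = " ".join(body.split()[:excerpt_words])
    return excerpt + " ... [subscribe to read the rest]"
```

The important part is where the decision happens: the cut is made before anything leaves the server, rather than sending the whole article and hiding it in the browser.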
Another, more Zuckerbergian option is to offer access in exchange for personal information. I’m not necessarily averse to this, but it also requires a content lock-down of some sort.
The current paywall solution is somewhere just above both of these, being easily circumventable but still acting as a deterrent to the average web user.
I would gravitate towards the nothing end of the scale with a nag solution where on every X views of an article, the non-subscribed reader receives a temporary pop-over message suggesting that they subscribe. IP address on the server could be used to determine how often to do this — it seems unlikely that shared connections would all be connecting to the same content source, and even so, all it would produce is a nagging reminder that people really do like the content. It’d be sort of like a local rating system with an option to subscribe.
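Something like the following Python sketch is what I have in mind for the nag counter. The class name, the value of 5 for “X”, and the in-memory dictionary are placeholders; a real site would presumably keep the counts in whatever storage it already uses and expire them over time.

```python
from __future__ import annotations

NAG_EVERY = 5  # hypothetical value for "X"


class NagCounter:
    """Count article views per IP and flag every Nth view for a nag."""

    def __init__(self, nag_every: int = NAG_EVERY) -> None:
        self.nag_every = nag_every
        self.views: dict[str, int] = {}

    def should_nag(self, ip: str, is_subscriber: bool) -> bool:
        """Record one article view and report whether to show the pop-over."""
        if is_subscriber:
            return False  # subscribers are never nagged or counted
        count = self.views.get(ip, 0) + 1
        self.views[ip] = count
        return count % self.nag_every == 0
```

With `NagCounter(nag_every=3)`, the third and sixth views from the same address return `True` and everything in between returns `False`. Nobody is ever blocked, so a household sharing one IP just sees the reminder a little more often.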
Beyond that, there could be a mild nag on every article view for non-subscribed users. This starts to get close to being just plain old-fashioned inline advertising, which would be the next solution before nothing at all (full, free access to everything).
Of course, since the papers have full control over their sites, there’s theoretically no limit on how inline advertising could be accomplished. There’s the always classy Toronto Sun wall-to-wall background…
…but if that’s not the newspaper’s style, I’m sure there are other and more elegant approaches.
Ultimately, the decision is whether or not to lock away content. Logins are reliable, which is why they’re so popular. Identifying users without them is inherently unreliable. Either content can be locked away completely, or it should be assumed to be open to everyone. The seemingly in-between paywall solution is actually in the second family, for the reasons I explained earlier.
Astute web developers will point out that other mechanisms are available to bypass some of these limitations: Flash shared objects, or persistent browser databases. While these are a step beyond simple cookies, both are easily deleted as part of most modern browsers’ cache management. In other words, they’re not much better than anything mentioned so far.
Browsers impose these limits to provide a level of privacy protection, and without requiring readers to manually enter additional information like a username and password, it’s tough if not impossible to pinpoint an individual human being. Without this exactness, any paywall or content blocking system is bound to be flimsy. The solution, at least at the present time, won’t involve technology; it’ll require high-level decisions about what will be locked away from the general public and what won’t.