Monday, August 13, 2018

Paywalls and Redirects

It is well known that paywalls can often be circumvented by editing the client-side code so that the content is no longer hidden. But what about when that code is server-side? Here is a quick "no tools" way of getting around certain kinds of paywalls.

Disclaimer: don't get around paywalls. It's rude. Pay content providers for content.

Anyway... here is the idea. You are on a site with mixed paywall and non-paywall content. You're looking around, trying the typical methods: finding the element to delete or the flag to set that will make the content visible. You're comparing the application logic for displaying the paywall vs. non-paywall stuff. No luck. In my case, I am presented with this URL, which shows a paywall error on the page:

awesome.com/pants

I wondered whether the content had been cached, so that I could view it with an in-site search (in Google, "site:awesome.com/pants [OPTIONAL_SEARCH_TERM]"). Bingo! There are links like:
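If you'd rather build that search link than type it, here is a small Python sketch. It assumes Google's standard https://www.google.com/search?q=... URL format; the function name site_search_url is hypothetical, and the optional term maps to [OPTIONAL_SEARCH_TERM] above.

```python
from urllib.parse import urlencode


def site_search_url(site_path: str, term: str = "") -> str:
    """Build a Google search URL restricted to site_path with the site: operator."""
    # An empty optional term is dropped so the query is just "site:<path>".
    query = f"site:{site_path} {term}".strip()
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("awesome.com/pants"))
print(site_search_url("awesome.com/pants", "pants01"))
```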

awesome.com/pants/pants01
awesome.com/pants/pants02

I can navigate to these pages and view the "pants" content. But what if I want to view "socks" content? Even though only the "pants" page was cached by Google, it's all I need, because this page happens to link to other pages behind the paywall. But what about pages that aren't linked from anywhere? Since I now have an example of the site's URL naming convention, I can use the root of a URL as a hint and craft the URLs for all the other pages as well. For example:

awesome.com/kneesocks

This suggests that the following pages exist behind the paywall:

awesome.com/kneesocks/kneesocks01
awesome.com/kneesocks/kneesocks02
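A minimal Python sketch of the URL-crafting step, assuming the convention inferred from the pants01/pants02 examples holds everywhere: each page name repeats the last path segment of the root URL, followed by a two-digit, zero-padded number starting at 01. The function name candidate_urls and the page count are hypothetical; the site doesn't say how many pages each section has.

```python
def candidate_urls(root: str, count: int) -> list[str]:
    """Guess page URLs under root, assuming the <section>/<section>NN pattern."""
    root = root.rstrip("/")
    # The last path segment ("kneesocks") doubles as the page-name prefix.
    section = root.rsplit("/", 1)[-1]
    # Assumed convention: two-digit, zero-padded page numbers starting at 01.
    return [f"{root}/{section}{n:02d}" for n in range(1, count + 1)]


for url in candidate_urls("awesome.com/kneesocks", 2):
    print(url)
```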

And so it does! Between the links on these pages and the formulaic, predictable URLs, I can navigate around the rest of the paywall. Additionally, a mistaken URL very helpfully redirects to the right page as long as the last path segment is correct:

awesome.com/pants/kneesocks03  >  [301 Moved Permanently]  >  awesome.com/kneesocks/kneesocks03

Suggestion to the site for mitigating this risk: enforce authorization on the server for every page that needs it, both on the URL that issues the redirect and on the page it redirects to.
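To make that suggestion concrete, here is a hypothetical Python sketch of a request handler. It is not the site's actual code, which isn't visible from the outside. canonical_path() guesses at the redirect rule observed above (rebuild the directory from the letters of the last path segment), and handle() checks access against the canonical page before deciding whether to redirect or serve content. The function names, the set of paywalled paths, and the choice of a 403 status are all assumptions.

```python
import re


def canonical_path(path: str) -> str:
    """Hypothetical redirect rule: rebuild '/<section>/<sectionNN>' from the last segment."""
    last = path.rstrip("/").rsplit("/", 1)[-1]
    match = re.fullmatch(r"([a-z]+)\d+", last)
    # Paths whose last segment isn't "<letters><digits>" are left unchanged.
    return f"/{match.group(1)}/{last}" if match else path


def handle(path: str, is_subscriber: bool, paywalled: set[str]) -> tuple[int, str]:
    """Return (status, location_or_body) for a request to path."""
    canonical = canonical_path(path)
    # Authorize against the page that would ultimately be served, before
    # deciding whether to redirect, so the old URL and the new URL are
    # protected identically.
    if canonical in paywalled and not is_subscriber:
        return 403, "subscription required"
    if canonical != path:
        return 301, canonical  # would be sent as the Location header
    return 200, f"content of {canonical}"


PAYWALLED = {"/kneesocks/kneesocks03"}
print(handle("/pants/kneesocks03", is_subscriber=False, paywalled=PAYWALLED))
print(handle("/pants/kneesocks03", is_subscriber=True, paywalled=PAYWALLED))
```

With this ordering, a non-subscriber who requests /pants/kneesocks03 is refused outright instead of being redirected to a page that then loads.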