Today I investigate Spiffy Co., my nickname for a company that sells licensed, proprietary software. Their intention is that only registered customers can log in to access downloads they've already paid for. Unfortunately, their paywall is very flimsy.
Checking out their robots.txt file, I noticed there was nothing blocking that particular directory (/downloads) from search engine crawlers. So I search "site:spiffyco.com" to see what kind of view the public has. It doesn't exactly return what I'm looking for, though, since the results are just manuals in a subdirectory of /downloads. So I go to the manuals directory and try to navigate up a level to see if it will list a directory with downloads and ... nope, no listing. It returns me to the paywall.
At this point, I decide to guess. A pretty intuitive starting place seems to be /downloads/downloads/, since that's probably what I would do if I were trying to organize the site. Behold, here is the list of .exes. Yep. Just by a guess, and thinking "how would I have done that?". No coding magic here!
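If you want to automate that kind of guessing, it only takes a few lines. A minimal sketch with the Requests library, where the base URL and the candidate paths are my own placeholders, and the "Index of" check is just one rough way to spot an open directory listing:

import requests

base = 'http://spiffyco.com'   # placeholder for the real site
candidates = ['/downloads/', '/downloads/downloads/', '/downloads/files/', '/files/downloads/']
for path in candidates:
    r = requests.get(base + path, allow_redirects=False)
    # a 200 with a directory index, rather than a redirect back to the paywall, is the giveaway
    if r.status_code == 200 and 'Index of' in r.text:
        print('open directory listing:', path)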
Moral of story: Similar to the homework-help.net post below, you cannot count on directories being secure just because nothing links to them.
Disclaimer: It is illegal to install and use licensed software that you haven't paid for. Don't try this at home.
This blog is about the educational (and sometimes entertainment) value of simple hacks. For active vulnerabilities, real names are concealed.
Monday, October 14, 2013
Saturday, August 10, 2013
Homework-help.net's Customer Identities
No, their real name is not homework-help.net.
It's one of many sites where students can get help with homework. They have a privacy policy. It's quite inaccurate. Improper permissions on their log files have made it easy to find out information about the students using the service.
Anyway, this is part of my investigation into a popular open source logging service. Searching for the log file's name and an IP address from a local university led me to the log page from a few years ago, so I switched the date in the log title and sure enough it gave me this month's traffic information. The timestamps in the file names are predictable, because they are generated by the service.
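To make the idea concrete, here is a minimal sketch of iterating predictable, date-stamped log names. The log directory and the filename pattern are made-up stand-ins, since I'm not naming the logging package or the real site:

import datetime
import requests

base = 'http://homework-help.net/logs/'        # hypothetical log directory
day = datetime.date.today()
for _ in range(30):
    # hypothetical filename pattern; the real package uses its own predictable scheme
    name = 'access-%s.html' % day.strftime('%Y-%m-%d')
    r = requests.get(base + name)
    if r.status_code == 200:
        print('readable log:', name)
    day -= datetime.timedelta(days=1)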
The log file contains information like:
- search terms that lead to the site
- time ranges spent on the site
- IP addresses of visitors, what time the IP address accessed the site
- specific pages accessed
- referrer links
- browser, OS, etc. for each IP
- even window dimensions
- etc.
What metadata can be obtained from these things? Not much, considering how often the data is logged. But because the logs were stored in a directory with read access as well (bad admin! NO! BAD!), there were timestamps showing when the files were last modified. Then a script was written to check up on the file once in a while, see the latest visitor (or at least the subset of visitors over a few hours), and correlate the timestamps within the file against those modification times. The jackpot happens when only one new visitor shows up in that time frame, because then all the new data in the diff between the old and current log file can be attributed to that one person. For example, even though no timestamps are associated with the web search terms, if there's only one new IP/person, they must have been the one to make the search. I'd like to experiment with this more to make sure it's accurate, but the logic seems sound, and it worked for my purposes.
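Roughly, that script boils down to something like the following. The log URL, the polling interval, and the assumption that the IP is the first field on each line are all simplifications of what I actually did:

import time
import requests

log_url = 'http://homework-help.net/logs/access-2013-08.html'   # hypothetical log URL
previous = set(requests.get(log_url).text.splitlines())

while True:
    time.sleep(3600)                                   # check back every so often
    current = set(requests.get(log_url).text.splitlines())
    new_lines = current - previous
    new_ips = {line.split()[0] for line in new_lines if line.strip()}   # assumes the IP is the first field
    if len(new_ips) == 1:
        # jackpot: every new entry (searches, pages, referrers) belongs to this one visitor
        print('all new activity attributable to', new_ips.pop())
    previous = current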
Anyway, this one new customer was interacting with a directory I couldn't navigate to from the log file directory. Of course I am instantly curious, because the directory's named something like /7df83kic. So I browse to that area, which also has public read permissions.
Whoa, buckets of uploaded user files! I check around on the site, and the only time you can upload an assignment is when you sign up for the service. What's more, these are the original documents, meaning all the metadata is included, whether it's a .pdf, .doc, .docx, etc. So in addition to the teacher's email, class name, university name, and so on, there is, for example, the author name from the Microsoft Word license when you check out the document's properties. Or a username.
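If you want to see for yourself what an original .docx leaks, here's a quick sketch using the python-docx library. The filename is a placeholder, and PDFs or old binary .doc files would need different tooling:

from docx import Document   # pip install python-docx

doc = Document('uploaded_assignment.docx')   # placeholder filename
props = doc.core_properties
# the same fields Word shows in the document's Properties dialog
print('author:        ', props.author)
print('last modified: ', props.last_modified_by)
print('created:       ', props.created)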
The site's remediation plan was to change the name of the directory, but it would be much better to correct the file permissions (since the traffic log is still public and shows the new file upload location...)
Moral of Story: There are reasons Facebook caches photos that you upload, and that cached version is what gets displayed. All those little hidden bits of metadata (like facial-detection tags in Picasa and the names you've added) in the photo disappear when the picture is saved in a different format, compressed, what have you. Another interesting example is code hidden in pictures, but that's a story for another time. In essence, don't serve the original uploaded files, and sanitize everything from form data to uploads.
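For images, the re-encode-on-upload idea looks roughly like this with the Pillow library. This is just a sketch of the general technique, not how Facebook actually does it:

from PIL import Image   # pip install Pillow

original = Image.open('user_upload.jpg')       # placeholder filename
clean = Image.new(original.mode, original.size)
clean.putdata(list(original.getdata()))        # copy only the pixels; EXIF and other metadata are left behind
clean.save('sanitized.jpg')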
UPDATE as of August 5, 2018: there are a lot of other sites with this same vulnerability, where this particular logging library sets its log files to public by default. I recently came across another situation where the logs revealed someone unsuccessfully trying to use this appserv vulnerability (discussed here), by setting the appserv_root URL parameter to malicioussite.com/attack.gif, a gif which presumably has some PHP in it that would be executed during the PHP include. Most of this was only visible in the 404 part of the log because the attacker attempted the attack a few times with various typos.
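Spotting those attempts in a saved copy of the log is trivial. A sketch, where the filename and the assumption that the parameter shows up verbatim in each logged URL are mine:

import re

# look for remote-file-include attempts against the appserv_root parameter
pattern = re.compile(r'appserv_root=\S+', re.IGNORECASE)
with open('traffic_log.txt') as log:           # placeholder filename
    for line in log:
        if pattern.search(line):
            print(line.strip())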
Saturday, July 27, 2013
Client-Controlled Subscriber Content
Site, for this post I shall dub you fun-trails-for-hiking.com. Here is another case where paid subscriber content can be reached with a little tweaking of the page in the browser. These cases come up often.
Here's the idea - they say you must be a premium subscriber to see a larger area of the map. Instead you change
<div id="trailBox" class="topoSml">
to
<div id="trailBox" class="topoLrg"> or <div id="trailBox" class="topoMed">
When coding on a team, it's really helpful to use descriptive variable names (wasn't it the author of "Clean Code" who said that if your variable names work, you don't need comments explaining the code?). Nowhere else on the page do the words "topoLrg" or "topoMed" appear, but because the site looks well put together and professional, it's reasonable to infer that map sizes other than "small" are named similarly.
Wednesday, June 19, 2013
"Prominent Political TV Host" Poll Center
I'm surprised this trick still works; I bothered them about it around a year ago, and it still works as of a few days ago. In fairness, the site has a disclaimer that says "These polls are not scientific. Only one count is counted per visitor". This turns out not to be the case. Anyway, in my usual style, it's simple JavaScript edits with a few goals in mind for the poll center of this site:
To Do:
1. Sway the results of closed archived polls
2. Make the poll results add up to less than 100%
3. View the results of a poll before it is aired on the show
You can see this as a product of poor data sanitization. Messing around with the Firebug Firefox extension causes permanent data changes on the server that no one seems to notice or care about. So if you're a developer, may this be a lesson you never have to learn in production - sanitize input on both the client side and the server side.
1. Change archived poll results
Simply change loadArchivePoll to loadPoll. That's it.
<div class="Pollwrap">
<div style="padding: 5px; cursor:pointer; border-bottom: 1px solid #cccccc;" class="pollArchiveItemOff" id="pollArchiveItem0_0" onmouseover="" onmouseout="changeBackgroundClassArchivePoll('0','0', 'Off');" onclick="loadArchivePoll('0','9337','0');"><span style="display:block;">Make a prediction - will Attorney General Eric Holder resign?<br>May 30, 2013</span></div>
</div>
2. Polls that add up to less than 100%
This effect is easy to trigger. If the radio button's value attribute is edited to something that isn't one of the survey's given choices, the vote is still submitted but isn't counted toward any answer, so the displayed percentages no longer add up to 100%.
<input type="radio" value="Fake value." name="surveyAnswer">
3. View the results first
Okay, I'm leaving this one as an exercise for you! Trust me, it's as trivial as the other two hacks. Hint: take a poll legitimately, and figure out where the form takes you to view the results.
Tuesday, June 18, 2013
Client Side (Instead of the Dark Side)
It's astonishing how much power there is in simply looking at and editing web page source! Anything much more than that seems to get you on the news, so you can stay on the bright side: the client side. Granted, it depends on the contest or website, but every once in a while you get a gem like this:
answersTxt[3] = "<div id=\"answer-input-3\" class=\"answer-result \"><span></span></div>";
answersImg[4] = "<div id=\"answer-final-4\" class=\"contest-answer result hasImage\"><img src=\"/sites/files/bottle.jpg\" alt=\"Picture4\" /></div>";
Yes, there are arrays called answersTxt[] and answersImg[].
This was a matching game where pictures were matched to text. The administrators have remediated this... they have started hashing the names of the pictures to avoid future shenanigans.
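Their fix makes sense: if the served picture names can't be predicted or matched back to the answers, the arrays in the page source stop being an answer key. A rough sketch of that kind of naming scheme, purely my own illustration and not their code:

import hashlib
import os

def hashed_name(original_name):
    # salt + hash so the served filename reveals nothing about the original name
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + original_name.encode()).hexdigest()
    return digest + os.path.splitext(original_name)[1]

print(hashed_name('bottle.jpg'))   # e.g. '3f1a9...e2c.jpg'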
Where Did Wiki Scanner Go?
A guy named Virgil created a scandalous project in the late 2000s called WikiScanner. All it did was look at anonymous Wikipedia edits from certain IP ranges to figure out which companies were editing their own Wikipedia pages (obviously the IP addresses were cross-referenced against WHOIS records or something similar).
It doesn't seem to exist anymore; I think it got taken down. All I found is a poor-quality copycat with injectable debugging statements:
Table 'wikiscanneres_wiki.org3' doesn't exist
SELECT name, ip_from from org3 where '2433878598' between ip_from and ip_to order by ip_from DESC limit 1
Yikes, TMI, error code...
Anyway, this is easy to replicate yourself using the Requests library in Python. Get a sample IP from the network you'd like to know more about, and take a walk around the "block" (where xxx.xx.xx.xxx is that IP):

import requests

for i in range(0, 256):
    r = requests.get('http://en.wikipedia.org/wiki/Special:Contributions/xxx.xx.xx.' + str(i))
Refine the crawler to look for related articles, with keywords that are relevant to your interest.
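One possible refinement along those lines, as a sketch - the keyword list and the simple substring check are just placeholders for whatever you actually care about:

import requests

keywords = ['Acme Corp', 'acmecorp.com']   # hypothetical terms of interest
for i in range(0, 256):
    url = 'http://en.wikipedia.org/wiki/Special:Contributions/xxx.xx.xx.' + str(i)
    r = requests.get(url)
    if any(k.lower() in r.text.lower() for k in keywords):
        print(url)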
FB's Creepy Friend-Suggest:
Automatically Lists All Friends of a Contact
Consider a Facebook account created with a Gmail account that has only one contact. Despite settings meant to keep the friends list private, this new account will get friend suggestions for all of the contact's friends, assuming the contact's email address is associated with a Facebook account.
More testing is needed to see if the suggestions still happen when the contact has a private profile and has chosen not to make their friends list publicly visible.
Premium Content Control
Most news sites display their premium content briefly in the browser before snatching it away again, allowing enough time for the user to stop the page from loading and view the full article.
Some (e.g. the WSJ) deliberately offer a "first click is free" deal where content is viewable if the site is reached by clicking a Google News result.
Exhibit A from page source:
<!--added for registration bypass2 October 3,2002-->
Then, where content is not available, we have this tag (along with scripts that load the no-content version instead of the article):
<meta name="GOOGLEBOT" content="unavailable_after: 17-Jun-2014 21:49:00 EDT" />
A blog post from 2009 mentioned that this has a limit of five articles, but that doesn't seem to be true. As long as you can find the article's link through a search, it works.
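If you want to check whether the free pass keys off the referrer, here's a minimal sketch. The article URL is a placeholder, and the idea that a Google Referer header is what the site looks for is my assumption about the mechanism, not something confirmed here:

import requests

article_url = 'http://example.com/some-premium-article'    # placeholder URL
headers = {'Referer': 'https://www.google.com/'}           # pretend we arrived from a Google result
r = requests.get(article_url, headers=headers)
print(len(r.text))   # compare against the length of a fetch without the Referer header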
How this came about: I very much like the WSJ, but I am annoyed when LinkedIn news posts are blocked because they link to premium content.