Saturday, August 10, 2013

Homework-help.net's Customer Identities

No, their real name is not homework-help.net.

It's one of many sites where students can get help with homework. They have a privacy policy. It's quite inaccurate. Improper permissions on their log file have made it easy to find out information about the students using the service.

Anyway, this is part of my investigation into a popular open source logging service. Searching for the log file's name and an IP address from a local university led me to the log page from a few years ago, so I switched the date in the log title and sure enough it gave me this month's traffic information. The timestamps in the file names are predictable, because they are generated by the service.

The log file contains information like:

- search terms that lead to the site
- time ranges spent on the site
- IP addresses of visitors, what time the IP address accessed the site
- specific pages accessed
- referrer links
- browser, OS, etc. for each IP
- even window dimensions
- etc.

What metadata can be obtained from these things? Not much, considering how often the data is logged. But because the logs were stored in a directory with read access as well (bad admin! NO! BAD!), there were timestamps on when the files were last modified. Then a script was written to go check up on the file once in a while to see the latest visitor (or at least a subset of visitors for a few hours), and then map the timestamps within the file with others. Really, the jackpot only happens when you get exactly one new visitor in that time frame, so that all the new data in the diff between the old and current log file can be attributed to that one person. For example, even though no timestamps are associated with their web search terms, if there's only one new IP/person, they must have been the one to make the search. I'd like to experiment with this and make sure it's accurate, but the logic seems sound, and it worked for my purposes.
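To make the diff logic concrete, here is a minimal Python sketch that works on two snapshots of a log that have already been saved as text. It is not the script I used. The function names are made up for illustration, and it assumes the log is line-oriented with an IPv4 address on each visitor line, which the real log format may not match.

```python
import re

# Assumption: visitor lines contain a plain IPv4 address somewhere in them.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def new_lines(old_snapshot, new_snapshot):
    """Return the lines that appear in the new snapshot but not in the old one."""
    seen = set(old_snapshot.splitlines())
    return [line for line in new_snapshot.splitlines() if line not in seen]


def single_new_visitor(old_snapshot, new_snapshot):
    """Return (ip, added_lines) if exactly one previously unseen IP shows up.

    Returns None when zero or several new IPs appear, because the new
    lines can then no longer be attributed to a single person.
    """
    old_ips = set(IP_PATTERN.findall(old_snapshot))
    added = new_lines(old_snapshot, new_snapshot)
    fresh_ips = {ip for line in added for ip in IP_PATTERN.findall(line)} - old_ips
    if len(fresh_ips) != 1:
        return None
    return fresh_ips.pop(), added
```

When the function returns a result, every added line (including ones with no IP or timestamp of their own, like search terms) gets pinned on that single address. A returning visitor active in the same window would break that assumption, which is part of why I'd still like to test it properly.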

Anyway, this one new customer was interacting with a directory I couldn't navigate to from the log file directory. Of course I am instantly curious, because the directory's named something like /7df83kic. So I browse to that area, which also has public read permissions.

Whoa, buckets of uploaded user files! I check around on the site, and the only time you can upload an assignment is when you sign up for the service. What's more, these are the original documents, meaning all metadata is included, whether it be a .PDF, .doc, .docx, etc. So in addition to the teacher's email, class name, university name, etc., there are, for example, the author names on the Microsoft Word licenses when you check out the properties. Or a username. (Random example from the interwebs on the left.)

The site's remediation plan was to change the name of the directory, but it would be much better to correct the file permissions (since the traffic log is still public and shows the new file upload location...)
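For anyone wanting to check their own server, here is a rough Python sketch that lists everything under a directory that is readable by "other" users according to the POSIX mode bits. The function name is my own, and this only looks at filesystem permissions. It says nothing about the web server's configuration (directory listings, access rules), which would need a separate check.

```python
import os
import stat


def world_readable(root):
    """Return a sorted list of paths under root whose mode has the other-read bit set."""
    exposed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that a dangling symlink does not raise an error
            mode = os.lstat(path).st_mode
            if mode & stat.S_IROTH:
                exposed.append(path)
    return sorted(exposed)
```

Running it against the log directory and the upload directory would show whether renaming a folder actually changed anything about who can read it.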

Moral of Story: There are reasons Facebook caches photos that you upload, and that cached version is what gets displayed. All those little hidden bits of metadata (like facial-detection tags in Picasa and the names you've added) in that photo disappear when the picture is saved in a different format, compressed, what have you. Or, another interesting example is code hidden in pictures, but that's another story. In essence, don't use original upload data, and sanitize everything from form data to uploads.
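As one small illustration of sanitizing an upload, the Python sketch below blanks the author fields in a .docx file using only the standard library. It relies on the fact that a .docx is a zip archive whose core properties live in docProps/core.xml, with the author stored in dc:creator and cp:lastModifiedBy. The function name is hypothetical, and this is nowhere near a complete scrubber: comments, tracked changes, other property files, and the .doc and PDF formats are all left alone.

```python
import io
import re
import zipfile

# Matches the opening tag, the text inside, and the closing tag of each author field.
AUTHOR_FIELDS = re.compile(
    rb"(<(dc:creator|cp:lastModifiedBy)\b[^>]*>).*?(</\2>)", re.DOTALL
)


def scrub_docx_authors(docx_bytes):
    """Return a copy of a .docx archive with author names emptied in docProps/core.xml."""
    src = zipfile.ZipFile(io.BytesIO(docx_bytes))
    out_buf = io.BytesIO()
    with zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                # Keep the tags themselves, drop only the text between them
                data = AUTHOR_FIELDS.sub(rb"\1\3", data)
            dst.writestr(item, data)
    return out_buf.getvalue()
```

The idea is to store and serve only the scrubbed copy, never the bytes the user originally uploaded.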

UPDATE as of August 5, 2018: there are a lot of other sites with this same vulnerability, where this particular logging library sets its log files to public by default. I recently came across another situation where the logs revealed someone unsuccessfully trying to use this appserv vulnerability (discussed here), by setting the appserv_root URL parameter to malicioussite.com/attack.gif, a gif which presumably has some PHP in it that would be executed during the PHP include. Most of this was only visible in the 404 part of the log because the attacker was attempting the attack a few times with various typos.