4 Coding Errors
I was reading the news late last night and came across a new study put out by the NSA, Homeland Security, Microsoft and Symantec. The BBC article was entitled “Dangerous Coding Errors Revealed.” What I found interesting is that the top four problems listed are all directly related to web applications. The principles can be applied offline as well, but they hit especially close to home with web apps. I thought I’d run down these four and go into a bit of detail on how MediaWiki handles each one.
- Improper Input Validation – Every good application developer knows that if they do not sanitize their user input, it will come back to bite them. I used to have a sign on my desk at work that said “Trusting your users to only put a number in the age field was your second mistake. The first was not filtering all user input.” This stemmed from (true story) an internally developed application that failed to check user input for validity before handling it. When I began testing it, the developers were concerned that I was putting inappropriate input into their application. They asked me, “Why would you do that? A user will only put X in Y field.” I replied, “You trust your end users?” The unfortunate truth is that we as application developers cannot trust our end users. We know that a number goes in the age field, but we can’t assume our end user grasps that. Not to mention those who will purposefully put in bad data to try to break your system. MediaWiki handles this very nicely by routing input through the WebRequest class rather than reading the raw $_GET and $_POST superglobals. This way, when someone calls getIntOrNull(), they know damn well that they are getting an integer or they are getting a null.
- Improper Encoding or Escaping of Output – Any time you have output, especially user-driven output, you must make sure you’re escaping all the right things. With MediaWiki and other web apps, this means running any and all user-supplied output through htmlspecialchars() before injecting it into the HTML. Failure to do this allows your user to provide arbitrary markup that gets added to the page, which is a major security risk. In addition, you need to make sure you’re sending data in the encoding your user expects. With MediaWiki, this means that when we say we’re sending UTF-8, we need to actually send it, and vice versa. Failure to make our actual output match our declared output will generate warnings at best and complete failure at worst. (As I was telling my Comp Sci TA today: I long for the day that Unicode support is truly universal and we don’t have to _think_ about how to handle it.)
- Failure to Preserve SQL Query Structure – This all goes back to proper input validation. If you don’t validate your user input, you’re already opening yourself up to some rather icky holes, and putting that input into SQL makes matters worse. At best, you’ll generate some errors. At worst, you’re exposing your data to malicious users who want to either farm private data or hose your database. Either way, this input cannot be allowed through unescaped. Many times the fix is as simple as escaping your quotes, and occasionally a % if you’re doing LIKE statements. I don’t know of a single place in MediaWiki where user input being added to SQL isn’t first run through addQuotes() or escapeLike(), respectively. If any such places exist, they need to be tracked down and fixed, of course.
- Failure to Preserve Web Page Structure – This all goes back to user-generated input. When you have a static HTML page, there’s no issue with page structure (assuming you generated an appropriate structure to begin with), as the user has no control over what your page contents are, much less how they’re formatted and arranged. However, when you begin accepting user input, the entire equation changes, and doubly so when the majority of the page content is user-generated. Simple text is fine, provided you escaped it (see point 2). What happens when you allow arbitrary HTML, or even a subset, as in MediaWiki’s case? Which tags are allowed? Which ones are denied? How do you deal with the situation if a user puts in markup that breaks your page? Do you allow overriding of non-content areas (the various images at the top right for protection/featured/etc. come to mind)? MediaWiki handles all of these things (as far as I know, it’s all Parser stuff).
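To make the first three points a bit more concrete, here is a minimal sketch, written in Python rather than MediaWiki's own PHP. None of this is MediaWiki code: `get_int_or_none`, `render_comment`, `escape_like` and `find_titles` are hypothetical helpers, and the `page` table with its `title` column is a made-up schema for illustration. `get_int_or_none` only imitates the idea behind getIntOrNull(), `html.escape` stands in for htmlspecialchars(), and for SQL the sketch swaps in parameter binding (via `sqlite3`) instead of manual quote escaping, with a hand-rolled LIKE escaper alongside it.

```python
import html
import sqlite3


def get_int_or_none(params, name):
    """Return params[name] as an int, or None if it is missing or not an integer.

    `params` is assumed to be a plain dict of request parameters (strings).
    """
    raw = params.get(name)
    if raw is None:
        return None
    raw = raw.strip()
    # Allow one optional leading sign, then require ASCII digits only.
    digits = raw[1:] if raw[:1] in ("+", "-") else raw
    if digits.isascii() and digits.isdigit():
        return int(raw)
    return None


def render_comment(user_text):
    """Escape user text before placing it inside HTML markup."""
    # html.escape converts &, <, > and (with quote=True) both quote characters.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


def escape_like(term, escape_char="\\"):
    """Escape LIKE wildcards so user text is matched literally."""
    # Escape the escape character first so it is not doubled up afterwards.
    for ch in (escape_char, "%", "_"):
        term = term.replace(ch, escape_char + ch)
    return term


def find_titles(conn, prefix):
    """Return titles starting with `prefix`, treating the prefix as literal text."""
    pattern = escape_like(prefix) + "%"
    # The value is bound as a parameter, so quotes in it cannot alter the query.
    rows = conn.execute(
        "SELECT title FROM page WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,),
    )
    return [row[0] for row in rows]
```

The common thread is that the caller never touches the raw value: they either get a checked integer or nothing, the HTML only ever sees escaped text, and the database receives user data as a bound value rather than as part of the query string.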
I rather enjoyed this article, actually; I just wish it went into more depth on the individual issues. Do any of the ones on the list stand out to you guys? Some don’t apply as heavily in PHP-based web applications, of course.
