4 Coding Errors

I was reading the news late last night and came across a new study put out by the NSA, Homeland Security, Microsoft and Symantec. The BBC article was entitled “Dangerous Coding Errors Revealed.” I found it interesting that the top four problems listed are all directly related to web applications. The principles apply offline as well, but they hit especially close to home with web apps. I thought I’d run down these four and go into a bit of detail about how MediaWiki handles each one.

  1. Improper Input Validation – Every good application developer knows that if they do not sanitize their user input, it will come back to bite them. I used to have a sign on my desk at work that said “Trusting your users to only put a number in the age field was your second mistake. The first was not filtering all user input.” This stemmed from (true story) an internally developed application that failed to check user input for validity before handling it. When I began testing it, the developers were concerned that I was putting inappropriate input into their application. They asked me, “Why would you do that? A user will only put X in field Y.” I replied, “You trust your end users?” The unfortunate truth is that we as application developers cannot trust our end users. We know that a number goes in the age field, but we can’t assume our end users grasp that, not to mention those who will purposely put in bad data to try to break the system. MediaWiki handles all of this nicely through its WebRequest class rather than working with the raw $_GET and $_POST superglobals. This way, when someone calls getIntOrNull(), they know damn well that they are getting either an integer or a null.
  2. Improper Encoding or Escaping of Output – Any time you have output, especially user-driven output, you must make sure you’re escaping all the right things. With MediaWiki and other web apps, this means running any and all user output through htmlspecialchars() before injecting it into the HTML. Failing to do this lets your users add arbitrary markup to the page. Major security risk. :) In addition, you need to make sure you’re sending data in the encoding your users expect. With MediaWiki, this means that when we say we’re sending UTF-8, we actually need to send it, and when we send UTF-8, we need to say so. If the actual output doesn’t match what we declared, we get warnings at best and complete failure at worst. (As I was saying to my Comp Sci TA today: I long for the day that Unicode support is truly universal and we don’t have to _think_ about how to handle it.)
  3. Failure to Preserve SQL Query Structure – This all goes back to proper input validation. If you don’t validate your user input, you’re already opening yourself up to some rather icky holes, and using that input within SQL makes matters worse still. At best, you’ll generate some errors. At worst, you’re exposing your data to malicious users who want to either farm private data or hose your database. Either way, this input cannot be allowed through as-is. Many times, the fix is as simple as escaping your quotes, plus the wildcards % and _ if you’re doing LIKE statements. I don’t know of a single place in MediaWiki where user input headed into SQL isn’t first run through addQuotes() or escapeLike() (for quoted values and LIKE patterns, respectively). Any such places need to be tracked down and fixed, of course.
  4. Failure to Preserve Web Page Structure – This all goes back to user-generated input. With a static HTML page, page structure is not an issue (assuming you generated an appropriate structure to begin with), because the user has no control over your page contents, much less how they’re formatted and arranged. However, once you begin accepting user input, and doubly so when most of the page content is user-generated, the entire equation changes. Simple text is fine, provided you escape it (see point 2). But what happens when you allow arbitrary HTML, or even a subset of it, as MediaWiki does? Which tags are allowed? Which ones are denied? How do you deal with a user who enters markup that breaks your page? Do you allow users to override non-content areas (the various icons at the top right for protected/featured/etc. pages come to mind)? MediaWiki handles all of these; as far as I know, it’s all Parser stuff. :)
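Points 1 through 3 can be sketched in a few lines. This is a minimal illustration in Python rather than PHP, and it is not MediaWiki’s code: get_int_or_none() is a hypothetical stand-in for the idea behind WebRequest::getIntOrNull(), escape_output() uses Python’s html.escape() in place of htmlspecialchars(), and instead of quoting values by hand the way addQuotes() does, it binds them as sqlite3 query parameters (a swapped-in technique). The `page` table and its rows are made up for the demo.

```python
import html
import re
import sqlite3
from typing import List, Mapping, Optional

# Point 1: validate input instead of trusting it.
_INT_RE = re.compile(r"-?[0-9]+")  # optional minus sign, ASCII digits only


def get_int_or_none(params: Mapping[str, str], name: str) -> Optional[int]:
    """Return params[name] as an int, or None if it is absent or not a plain integer.

    Hypothetical helper in the spirit of WebRequest::getIntOrNull(); not MediaWiki code.
    """
    raw = params.get(name)
    if raw is None or not _INT_RE.fullmatch(raw.strip()):
        return None
    return int(raw.strip())


# Point 2: escape user-supplied text before it goes into HTML.
def escape_output(text: str) -> str:
    """Escape &, <, >, " and ' so user text cannot inject markup."""
    return html.escape(text, quote=True)


# Point 3: keep the SQL query structure intact.
def escape_like(term: str, esc: str = "\\") -> str:
    """Escape the escape char and the LIKE wildcards % and _ so the term matches literally."""
    return (
        term.replace(esc, esc + esc)  # must run first, or we'd double-escape below
        .replace("%", esc + "%")
        .replace("_", esc + "_")
    )


def find_titles(conn: sqlite3.Connection, user_term: str) -> List[str]:
    """Substring search on titles; the user text is bound as a parameter, never spliced in."""
    pattern = "%" + escape_like(user_term) + "%"
    rows = conn.execute(
        "SELECT title FROM page WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,),
    )
    return [row[0] for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical `page` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (title TEXT)")
    conn.executemany(
        "INSERT INTO page (title) VALUES (?)",
        [("100% Cotton",), ("1000 Cotton",), ("Robert'); DROP TABLE page;--",)],
    )
    return conn


if __name__ == "__main__":
    print(get_int_or_none({"age": "forty"}, "age"))
    print(escape_output('<b onclick="x()">hi</b>'))
    conn = demo_connection()
    print(find_titles(conn, "0% C"))
    print(find_titles(conn, "'); DROP"))
```

Searching for `0% C` returns only “100% Cotton”. Without escape_like(), the % would act as a wildcard and “1000 Cotton” would match too. The `'); DROP` search simply finds the row containing that text, and the table survives. Point 4 (preserving page structure when some HTML is allowed) needs a real parser with a tag whitelist and tag balancing, which is beyond a short sketch.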

I rather enjoyed this article, actually; I just wish it went into more depth on the individual issues. Do any of the ones on the list stand out to you guys? Some don’t apply as heavily to PHP-based web applications, of course.

Making MediaWiki friendly

Just before the New Year, I took part in a round-table discussion with Andrew Lih and Liam Wyatt over at Wikipedia Weekly. We talked about a lot of things, but the first part was by far my favorite. We got into a rather extensive discussion about usability (specifically the Stanton Grant) and the problems MediaWiki faces in this area. Here, I’d like to go into more detail about the wiki markup.

Back in 2005, I began contributing to Wikipedia. One of the reasons I began working on the project was my love for free/open source software, with free/open content being a natural extension of that. The second reason was that it was ridiculously easy. The markup was still (relatively) simple at the time and presented no major barrier to contribution. Since then, Wikipedia has grown massively, and MediaWiki has had to grow with it. ParserFunctions have since arrived, as have many other additions to the markup. I’ve never learned ParserFunctions, and I can’t memorize template parameters. Asking our newbies to do this is unrealistic and puts an insurmountable barrier in the way of their contributions.

Part of the problem is the lack of a WYSIWYG editor (as many people have blogged, many times). That comes back to the fact that the wiki markup A) has no formal grammar (see bug 7 and the Markup spec project), and B) hasn’t grown in a manageable and predictable way. Over the years, more features and tags have been tacked on, as I said above. Hopefully the Stanton Grant will make a difference in these areas.

What problems do you have with using MediaWiki? What would you like to see made easier?

Politically correct

Greetings from South Carolina. I’m visiting family for a few days, it being the holidays and all. While driving through town this evening, my brother and I were flipping through radio stations and briefly stopped on talk radio. The host was rehashing a story in the news (major thanks to anyone who can find it for me online; it’s midnight and it’s been a long day) about a woman whose job involves dealing with the public (the exact job escapes me). The story revolved around her being fired for refusing to follow company protocol.

According to the story, the company requires employees to use the politically correct “Happy Holidays” when dealing with customers. Fair enough, and pretty much standard these days. However, the woman was highly offended at being forced to, in her eyes, diminish the value of Christmas (she is a Christian) by putting it on equal footing with the other holidays of the season. From her perspective, she should be allowed to say “Merry Christmas,” and bah humbug to those who do not celebrate this most joyous of holidays. Consequently, she was fired for failing to adhere to company policy. I for one say good for her employer for sticking to his policies. Sadly, the story continues that the woman is now pressing charges and seeking damages (on what grounds, I dunno). No doubt she will get some level of damages…

In any case, this really says a lot about how people view this season, and political correctness seems to crop up every year. The defenders of Christmas in this supposed “War on Christmas” cry foul at absolutely any mention of a holiday besides their own, refusing to acknowledge any of the season’s other celebrations.

Get over yourselves! I’m not one to be passing judgement, but this is ridiculous. Someone saying “Happy Holidays” instead of “Merry Christmas” is not demeaning Christmas. It isn’t pretending Christmas doesn’t exist, nor is it trying to make Christmas any less important. It is simply a non-denominational greeting for a time of year when many people celebrate, religious and secular, Christian and non-Christian alike. Those who see it as a put-down of Christmas need to remember, realize, and accept that there are in fact celebrations in December besides their own. People who don’t celebrate Christmas, regardless of their faith (or lack thereof), have every right to celebrate the holiday of their choosing without being lumped in with Christmas, which is exactly what the more inclusive “Happy Holidays” avoids. And those of you who think a “Merry Christmas” puts down your non-Christmas holiday of choice shouldn’t be offended either.

In short: political correctness sucks and creates needless issues. Happy December, everyone.

[EDIT: Just found the story, though I’m not too sure about the source. Anyone got a better link?]