Anyone Can Edit

Free content, free software, and just about anything else free

Archive for January, 2009

Why I do what I do

Posted by Chad on Thursday, January 29, 2009

I was talking with a friend in my logic class, and he was asking me how long I had been programming. I told him, and went on to mention some of the things I’ve done. I talked a bit about MediaWiki and the volunteer development work I did there. He asked me why I enjoyed developing for free. To me, it’s all about doing your part. I just happen to do my part by developing software rather than writing articles or taking pictures. I’ve got extremely liberal ideas about content and its accessibility. To me: quality information (be it text, images, video, music) should be as readily available as possible to everyone.

I take special note of the word quality. When trying to balance quality against quantity, I tend to give heavier weight to quality. 2,000,000 articles are useless when they’re unorganized, unverified and poorly written. They’d be as useful as putting the world’s literary collections into a giant pile, after ripping all of the pages out and shuffling them around. The same goes for images. Poorly organized, licensed and documented, they too turn into a degenerate mess.

A second ideal that I’ve gained respect for is cultural significance. We need to be generating content that is both of high quality (as high as possible) and of high cultural use. So when we take up the subject of sexually explicit photographs, I see a quick distinction. Unlike older photos and paintings that cannot be obtained now (we can only take pictures of the Mona Lisa, we can’t make a brand new one), pictures of things in our everyday lives, be it houses, cars, or people (clothed or nude), can be taken any day of the week. I think I put it best earlier, when I said on Foundation-l that “Commons is meant to be a collection of freely-licensed media, not a dumping ground for all media that happens to be free.”

At some point, a line has to be drawn. We are not a hosting service; we are a collection of free content. A free image is not necessarily a good image. When asked what the difference was between being a collection and being a dumping ground, I followed with:

Emphasis on usefulness. We’re about providing free content, and I would hope being culturally significant would still be a priority. I always considered that a major point in inclusionism/deletionism debates. Are we remaining culturally relevant? Talking about pop culture as well as historical events, places, customs, etc. Providing information about naked people, their habits, customs, fetishes even: I consider this culturally relevant. Hosting a picture looking up a girl’s skirt is hardly culture, and is borderline voyeurism.

If we’re a dumping ground, of course none of this matters at all.

Have priorities changed over the years? Is this in fact the direction our community wants to go in?

Underage sex is illegal, even for a cop

Posted by Chad on Thursday, January 29, 2009

Not tech related at all, just felt like writing this one up. Found out a few hours ago (via VCU e-mail) that our police chief has been arrested in Chesterfield County on charges of soliciting underage sex. The Washington Post has already reported on the story, as have several local publications. Of course, VCU has already done the expected: placed him on administrative leave and issued a statement saying that Chesterfield’s finest have VCU’s full cooperation.

For what it’s worth: the Chesterfield police have been (and this is no secret) using Craigslist and other such services to run sting operations in recent months, trying to catch pedophiles and such. Wouldn’t the police chief of VCU know better? *sigh* And to think this week was going to end without some mess in Richmond about something remarkably stupid.

[Just to clarify the situation: Chesterfield is a county about 20 minutes south of the city of Richmond. Virginia is fun in that our incorporated cities don't exist within counties like they do pretty much everywhere else. Richmond has its own police. VCU has its own fully certified and trained police force, separate from the city's.]

MediaWikiPerformAction

Posted by Chad on Saturday, January 24, 2009

Quite possibly one of the coolest hooks to exist in MediaWiki. I’ve been playing with this for the past few days, and it really is a powerful hook. It fires very near the beginning of a MediaWiki request, so it gives you high-level access to the early objects before MediaWiki has begun its output. The hook receives:

  • $output – The global OutputPage object, allowing you to control all aspects of page output before the action executes. If you’re halting execution, you can provide your own output.
  • $article – The Article object, which could be Article, ImagePage, or CategoryPage (potentially a SpecialPage or child?)
  • $title – Your Title object. Very easy to allow for Title swapping at a very high, pre-output level.
  • $user – The current User object
  • $request – Just the WebRequest so you don’t have to call the global yourself.
  • $wiki – The MediaWiki object. Not a huge amount of use that I’ve seen yet, but it is helpful to have on hand :)

I’m really starting to see what all this baby can do :)
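For anyone who hasn’t played with hooks before, here’s the general idea as a rough sketch. It’s written in Python rather than PHP, and none of it is MediaWiki’s actual code: `HookRegistry`, `Output`, `perform_action` and `maintenance_notice` are all names I made up for illustration. The one assumption it leans on is that a handler returning false means “I’ve handled this, skip the default action.”

```python
class HookRegistry:
    """Hypothetical stand-in for a hook system; not MediaWiki's API."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers.setdefault(name, []).append(handler)

    def run(self, name, *args):
        # Call handlers in registration order. Stop and report False
        # as soon as one of them returns False.
        for handler in self._handlers.get(name, []):
            if handler(*args) is False:
                return False
        return True


class Output:
    """Hypothetical output buffer that handlers can write to."""

    def __init__(self):
        self.body = []

    def add(self, text):
        self.body.append(text)


def perform_action(hooks, output, request):
    # Extensions get the first look at the request, before any output.
    if not hooks.run("PerformAction", output, request):
        return  # a handler took over and supplied its own output
    output.add("default view of " + request["title"])


def maintenance_notice(output, request):
    # Example handler: replace the edit action with a notice.
    if request.get("action") == "edit":
        output.add("Editing is disabled during maintenance.")
        return False
    return True
```

Register `maintenance_notice` under `"PerformAction"` and an edit request gets the notice instead of the default page, while every other request falls through untouched.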

4 Coding Errors

Posted by Chad on Wednesday, January 14, 2009

I was reading the news late last night, and I came across a new study put out by the NSA, Homeland Security, Microsoft and Symantec. The BBC article was entitled “Dangerous Coding Errors Revealed.” I found it interesting, though, that the top four problems listed are all directly related to web applications. Although the principles can be applied offline as well, they hit especially close to home with web apps. I thought I’d run down these four and go into a bit of detail as to how MediaWiki handles each one.

  1. Improper Input Validation – Every good application developer knows that if they do not sanitize their user input, it will come back to bite them. I used to have a sign on my desk at work that said “Trusting your users to only put a number in the age field was your second mistake. The first was not filtering all user input.” This stemmed from (true story) an application developed internally that failed to check user input for validity before handling it. When I began testing it, the developers were concerned that I was putting inappropriate input into their application. They asked me “Why would you do that? A user will only put X in Y field.” I replied, “You trust your end users?” The unfortunate matter is that we as application developers cannot trust our end users. We know that a number goes in the age field, but we can’t assume our end user grasps that. Not to mention those who will purposefully put in bad data to try and break your system. MediaWiki handles this all very nicely by going through the WebRequest class rather than reading the raw $_GET and $_POST superglobals. This way, when someone calls getIntOrNull(), they know damn well that they are getting an integer or they are getting a null.
  2. Improper Encoding or Escaping of Output – Anytime you have output, especially when it’s user-driven output, you must ensure that you’re escaping all the right things. With MediaWiki and other webapps, this means running any and all user output through htmlspecialchars() before injecting it into the HTML. Failure to do this will allow your user to provide arbitrary markup that will get added to the page. Major security risk :). In addition, you need to make sure you’re sending data in the encoding your user expects. With MediaWiki, this means that when we say we’re sending UTF-8, we need to actually send it. And vice versa. Failure to make our actual output coincide with our declared output will generate warnings at best, and complete failure at worst (as I was saying to my Comp Sci TA today: I long for the day that Unicode support is truly universal and we don’t have to _think_ about how to handle it).
  3. Failure to Preserve SQL Query Structure – This all goes back to proper input validation. If you don’t validate your user input, you’re already opening yourself up to some rather icky holes. Putting that unvalidated input into SQL is even more dangerous. At best, you’ll generate some errors. At worst, you’re exposing your data to malicious users, those that would want to either farm private data or hose your database. Either way, this input cannot be allowed. And many times, it’s as simple as escaping your quotes and occasionally a % if you’re doing LIKE statements. I don’t know a single place in MediaWiki where user input that is being added to SQL isn’t first run through addQuotes() or escapeLike(), respectively. Any such places need to be tracked down and fixed, of course.
  4. Failure to Preserve Web Page Structure – This all goes back to user-generated input. When you have a static HTML page, there’s no issue with page structure (this is assuming you generated an appropriate structure to begin with) as the user has no control over what your page contents are, much less how they’re formatted and arranged. However, when you begin accepting user input (doubly so when the majority of the page content is user-generated), the entire equation changes. Simple text is fine, provided you escaped it (see point 2). What happens when you allow arbitrary HTML, or even a subset, as in MediaWiki’s case? Which tags are allowed? Which ones are denied? How do you deal with the situation if a user puts in markup that breaks your page? Do you allow overriding of non-content areas (the various images at the top right for protection/featured/etc. come to mind)? All these things MediaWiki handles (as far as I know, it’s all Parser stuff :).
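To make the first three points a bit more concrete, here’s a minimal sketch. It’s in Python, not PHP, and it is not how MediaWiki implements any of this: `get_int_or_none`, `render_comment`, `escape_like` and `find_titles` are names I invented, and the `page` table is a made-up example. Note also that the SQL part uses bound parameters (sqlite3’s `?` placeholders) instead of manual quoting like addQuotes(); the goal is the same, keeping user input from changing the structure of the query.

```python
import html
import sqlite3


def get_int_or_none(params, name):
    """Return params[name] as an int, or None if missing or not an integer."""
    raw = params.get(name, "").strip()
    # Allow one optional leading minus sign, then ASCII digits only.
    digits = raw[1:] if raw.startswith("-") else raw
    if digits.isascii() and digits.isdigit():
        return int(raw)
    return None


def render_comment(text):
    """Escape user-supplied text before dropping it into HTML."""
    return "<p>" + html.escape(text, quote=True) + "</p>"


def escape_like(term, escape_char="\\"):
    """Escape LIKE wildcards so the term is matched literally."""
    # The escape character itself must be handled first.
    for ch in (escape_char, "%", "_"):
        term = term.replace(ch, escape_char + ch)
    return term


def find_titles(conn, prefix):
    """Prefix search on a hypothetical page table, value bound as a parameter."""
    pattern = escape_like(prefix) + "%"
    rows = conn.execute(
        "SELECT title FROM page WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,),
    )
    return [row[0] for row in rows]
```

The pattern is the same in all three: the caller never handles the raw value directly. It asks for an integer and gets an integer or nothing, it asks for HTML and gets escaped text, and it asks for a query and the input only ever travels as data.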

I rather enjoyed this article actually; I just wish it went into more depth on the individual issues. Do any of the ones on the list stand out to you guys? Some don’t apply as heavily in PHP-based web applications, of course.