Anyone Can Edit

Free content, free software, and just about anything else free

Nearing full power

Posted by Chad on Thursday, November 20, 2008

ForeignApiRepo has gotten a bit of work in the past week. I cleaned up some long-standing work I had lying around, and as a result got the thumb caching integrated with the normal thumb paths (i.e., in /thumbs/, where they belong). As of tonight, we have findBySha1() functionality as well.

This means local wikis using a foreign API repository (like Commons or enwiki) can make use of the sha1-based dupe checks. Score one for catching up on functionality that the rest of the FileRepo has had for a while now ;-)
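To illustrate the general idea of a sha1-based dupe check, here is a minimal Python sketch. It is not the ForeignApiRepo code; the `ShaIndex` class and its method names are hypothetical, and a real repo would look hashes up in its own storage (or over the API) rather than in an in-memory dict.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, List


class ShaIndex:
    """Hypothetical in-memory index mapping a content hash to file names."""

    def __init__(self) -> None:
        self._by_hash: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, name: str, content: bytes) -> None:
        # Files with identical bytes always produce the same SHA-1 digest.
        self._by_hash[hashlib.sha1(content).hexdigest()].append(name)

    def find_by_sha1(self, content: bytes) -> List[str]:
        # Return the names of every known file whose content hashes the same.
        digest = hashlib.sha1(content).hexdigest()
        return list(self._by_hash.get(digest, []))


index = ShaIndex()
index.add("Sunset.jpg", b"same bytes")
index.add("Sunset_copy.jpg", b"same bytes")
print(index.find_by_sha1(b"same bytes"))
```

Because the lookup key is the hash of the content, a duplicate upload is caught even when it arrives under a different file name.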

Now if only I could make mass-lookups not suck…

[UPDATE: I had to remove the findBySha1 functionality temporarily because it broke normal image metadata fetching. The reversion is here, if anyone cares to help debug what went wrong.]

7 Responses to “Nearing full power”

  1. pfctdayelise says:

    Yay! Glad to hear you’re making some progress. :)

  2. Tgr says:

    It is already a very useful feature, but it would be nice if it could handle timeouts better. Right now if a call to the Commons API fails, the page gets stuck with a red link, and it won’t try to get the image again on the next page load. This means that it’s pretty much impossible to get a page with lots of images to show up right on days when Commons is slow. If no information were stored for unsuccessful lookups, then after enough reloads, eventually all images would get stored and show up right…

  3. Chad says:

    RE: Tgr

    I see what you’re saying. Unfortunately, the Http class (whose get() method I am using) doesn’t handle timeouts well in general. It just fails to return data after timing out.

    Not to mention, if you don’t have cURL, it will attempt to get the data via file_get_contents(), which is almost certain to fail.

    Thoughts?

  4. Tgr says:

    Re

    If the question still stands (I haven’t used the foreign repo feature since then, so I don’t know if this is still an issue): what happens is that the call times out, and the page shows red links and PHP error messages saying the thumbs/ file cannot be read. The actual file is thumbs// so I’m guessing empty strings are getting cached for the file sizes. If caching could simply be aborted when Http::get returns empty, that would solve the problem. (OTOH, this would mean that the site would make calls to the foreign repo on every page load. This is a good thing when the problem is that Commons is overloaded or the connection is too slow, so the site can get a few images on every call but not a hundred at the same time; but it is unnecessary extra traffic when there is some permanent error, and the “can’t download” status actually should get cached.)
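The "abort caching when the fetch returns empty" idea can be sketched as follows. This is a language-neutral illustration in Python, not the actual ForeignApiRepo code: `fetch_image_info`, the `http_get` callable, and the dict used as a cache are all assumptions standing in for Http::get and the real cache.

```python
from typing import Callable, Dict, Optional


def fetch_image_info(
    title: str,
    http_get: Callable[[str], Optional[str]],
    cache: Dict[str, str],
) -> Optional[str]:
    """Return info for `title`, caching only non-empty responses."""
    if title in cache:
        return cache[title]
    data = http_get(title)
    if not data:
        # Empty string or None: treat it as a failed lookup and store
        # nothing, so the next page load asks the foreign repo again.
        return None
    cache[title] = data
    return data
```

With this shape a timed-out lookup never leaves an empty entry behind, at the cost of one extra remote call per failed image on every page load.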

  5. Chad says:

    True. I suppose it would make sense to say:

    if ( cant-get-data ) {
        // don't bother asking again for an hour
    }
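One possible reading of that pseudocode is a negative cache: remember the failure, but only for an hour. The Python sketch below is an illustration under assumptions, not MediaWiki code; `LookupCache`, `FAILURE_TTL`, and the injectable `clock` are hypothetical names, and `http_get` stands in for Http::get.

```python
import time
from typing import Callable, Dict, Optional, Tuple

FAILURE_TTL = 3600  # seconds; the "hour" from the pseudocode


class LookupCache:
    """Caches successes indefinitely and failures for FAILURE_TTL seconds."""

    def __init__(
        self,
        http_get: Callable[[str], Optional[str]],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._http_get = http_get
        self._clock = clock
        # title -> (data, expiry); expiry is None for successful lookups.
        self._entries: Dict[str, Tuple[Optional[str], Optional[float]]] = {}

    def get(self, title: str) -> Optional[str]:
        entry = self._entries.get(title)
        if entry is not None:
            data, expires = entry
            if expires is None or self._clock() < expires:
                return data  # a cached hit, or a failure still within its TTL
        data = self._http_get(title)
        if data:
            self._entries[title] = (data, None)
            return data
        # Remember the failure so we stop asking until the TTL runs out.
        self._entries[title] = (None, self._clock() + FAILURE_TTL)
        return None
```

Passing the clock in as a parameter is only there to make the expiry behaviour easy to exercise without waiting an hour.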

  6. Tgr says:

    That depends. If a page has many images and either the local wiki or the foreign repo cannot handle the load, then a few images will download and the rest will time out. On the next page load, if the local wiki requests them again, some more will download; after a few reloads, everything will work fine. If, on the other hand, there is some sort of permanent error, then perpetually requesting the images would waste time and resources on both sides.

    Maybe the best solution would be to check whether any new images were downloaded, and if not, pause queries for a while; but that seems difficult to implement.
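A rough Python sketch of that "pause when nothing new arrives" heuristic, purely as an illustration: `ProgressBackoff`, `pause_seconds`, and `load_page` are hypothetical names, `http_get` stands in for the remote call, and a real wiki would need shared storage rather than an in-process dict.

```python
import time
from typing import Callable, Dict, List, Optional


class ProgressBackoff:
    """Retries missing images on each load, pausing if a load makes no progress."""

    def __init__(
        self,
        http_get: Callable[[str], Optional[str]],
        pause_seconds: float = 3600,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._http_get = http_get
        self._pause = pause_seconds
        self._clock = clock
        self.cache: Dict[str, str] = {}
        self.paused_until = 0.0

    def load_page(self, titles: List[str]) -> Dict[str, str]:
        missing = [t for t in titles if t not in self.cache]
        if missing and self._clock() >= self.paused_until:
            fetched = 0
            for title in missing:
                data = self._http_get(title)
                if data:
                    self.cache[title] = data
                    fetched += 1
            if fetched == 0:
                # Nothing new came back: assume a lasting problem and
                # stop querying the foreign repo for a while.
                self.paused_until = self._clock() + self._pause
        return {t: self.cache[t] for t in titles if t in self.cache}
```

Under load, each reload picks up a few more images and never triggers the pause; only a load that fetches nothing at all does.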

  7. Chad says:

    Could perhaps do it on a per-request basis. Use a static $disabled parameter that could be triggered if a request times out. This provides a nice failure for a single user without damaging subsequent requests that may well complete without issue.
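A sketch of the per-request flag, written in Python for illustration only. The `ForeignLookup` class is hypothetical, and it assumes a timeout surfaces as a `TimeoutError`, which is not how the Http class discussed above reports it. In PHP a static variable is discarded at the end of each request; a long-running Python process has no such boundary, so the sketch adds an explicit `reset()`.

```python
from typing import Callable, Optional


class ForeignLookup:
    # Class-level flag playing the role of the static $disabled variable.
    disabled = False

    @classmethod
    def get(cls, title: str, http_get: Callable[[str], str]) -> Optional[str]:
        if cls.disabled:
            # An earlier lookup in this request timed out; skip the rest.
            return None
        try:
            return http_get(title)
        except TimeoutError:
            cls.disabled = True
            return None

    @classmethod
    def reset(cls) -> None:
        """Call at the start of each request to clear the flag."""
        cls.disabled = False
```

After the first timeout, the remaining images on that page fail fast instead of each waiting out its own timeout, and the next request starts with a clean slate.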
