21 March, 2007

strigi in kdebase

So, Jos van den Oever (yes, the guy working on Strigi) blogged about porting the KFilePlugins to the new Strigi-based infrastructure. Those plugins extract meta-data from files; for a picture, that means things like resolution and creation date.

This allows Nepomuk integration, less code duplication, and more speed.

Now, there have been some earlier looks at Strigi's performance, where it proved to be 40 times faster at indexing files than the second-best... Of course, the competition has improved since. But how does the new Strigi meta-data extraction compare to the old KFileMetaInfo in terms of speed?

I tested this on all png files on my system, using the following two commands:

For KDE 3: time locate png|grep \\.png$|xargs kfile --av
For KDE 4: time locate png|grep \\.png$|xargs xmlindexer

The output of those is comparable, though Strigi's xmlindexer emits its results as XML. Both print the meta information to the command line, showing things like dimensions, color depth or comments.
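If you want to repeat the measurement a few times without babysitting the terminal, a small harness along these lines could do it. This is only a sketch: the names `PIPELINES`, `time_pipeline` and `report` are made up for illustration, and unlike my runs it throws the output away instead of printing it to the terminal, which may well change the timings.

```python
import subprocess
import time
from statistics import mean

# The two pipelines from this post, written as shell command lines.
# The grep pattern is quoted here; otherwise they match the commands above.
PIPELINES = {
    "KDE 3 (kfile)": r"locate png | grep '\.png$' | xargs kfile --av",
    "KDE 4 (xmlindexer)": r"locate png | grep '\.png$' | xargs xmlindexer",
}


def time_pipeline(command: str, runs: int = 3) -> list[float]:
    """Run a shell command `runs` times; return wall-clock seconds per run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Assumption: output is discarded, so terminal rendering is not measured.
        subprocess.run(
            command,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


def report(label: str, durations: list[float]) -> str:
    """Format the average of a list of run times as a one-line summary."""
    return f"{label}: average {mean(durations):.1f} s over {len(durations)} runs"
```

Calling `report(name, time_pipeline(command))` for each entry in `PIPELINES` would give one summary line per tool. Be warned that it takes as long as the real runs do.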

Now, on my dual-core 2.2 GHz system with 3 GB of RAM, kfile takes about 40 minutes to complete (average over 3 runs). xmlindexer takes an average of 24 seconds. Yes, seconds. That's almost 100x as fast...

xmlindexer (strigi):
real 0m27.262s
real 0m27.056s
real 0m19.328s
Average 24 sec

kfile (kde3):
real 45m5.689s
real 37m14.158s
real 37m17.515s
Average 39m52sec
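For anyone who wants to check the arithmetic, here is a small sketch that recomputes the averages and the speedup from the raw `real` values listed above. The helper name `parse_real` is my own invention, and the results may differ slightly from the rounded figures quoted in the text.

```python
import re
from statistics import mean


def parse_real(value: str) -> float:
    """Convert a `time`-style duration such as '45m5.689s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# The `real` times reported above, three runs per tool.
XMLINDEXER_RUNS = ["0m27.262s", "0m27.056s", "0m19.328s"]
KFILE_RUNS = ["45m5.689s", "37m14.158s", "37m17.515s"]

xmlindexer_avg = mean(parse_real(r) for r in XMLINDEXER_RUNS)
kfile_avg = mean(parse_real(r) for r in KFILE_RUNS)
speedup = kfile_avg / xmlindexer_avg

print(f"xmlindexer average: {xmlindexer_avg:.1f} s")
print(f"kfile average:      {kfile_avg / 60:.1f} min")
print(f"speedup:            {speedup:.0f}x")
```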

So, how could this be so much faster? File cache used by Strigi? I first ran xmlindexer (27 seconds), then kfile (45 minutes) and then, after a few hours (during the night), I ran both tests twice again, with a 1 hour interval. Yes, I did use my dual-core in the meantime, mostly watching a movie. So the results aren't exactly scientific, but hey - we're talking 100 times faster here. A movie won't have such a huge impact... And the file cache might have an impact, but if it does, it can't explain the difference. The second and third runs of kfile were indeed faster, but still orders of magnitude behind Strigi (which got faster as well).

According to Jos, the speed difference could have two explanations: either kfile does something wrong and KFileMetaInfo isn't that slow, or - well, it is... (I'll have a look at this, btw, expect a second blog about it). It could just be that the png indexer in KFileMetaInfo hasn't seen much work, and most of the speedup could have been gained by rewriting it - so I'll have to check other plugins as well to see how they compare. And currently, the new infrastructure isn't as complete as KFileMetaInfo was - many plugins haven't been ported yet, and there is no KIO integration.

Still, seeing new technology being this closely integrated in KDE, and finding out what it can do makes for some interesting statistics, don't you think? I'll try to test more plugins, and figure out if there is something wrong with kfile. And Konsole, btw, as it crashes every time I try the KDE3 test in it... Maybe the unlimited history gets filled up a bit too fast/much?