26 March, 2007

Strigi performance II

Let's check the performance of some other plugins. The strigi png extractor was clearly faster than the KFileMetaInfo based one, but how about the other fileformats?

So, I went looking for a fileformat which was abundantly available on my pc, and supported by both strigi and kfile. Not much choice, but in the end, xpm turned out to be interesting. Again, xmlindexer was a lot faster. It took xmlindexer 13 seconds to extract the meta-data from all xpm files on my disk, while kfile needed almost 2 minutes (average over 3 runs again).

Now I'm really wondering why the difference is so big. So I went to sysprof to figure out what was happening. It can show you what an app is spending its time on.

Looking at the results, it seems kfile spends only a very small percentage of its time actually reading the metainfo. Most of its time, around 50%, it's trying to figure out what mimetype the file is, using kmimetype! So that's the slow part... In the xpm case, a bit more than 3% was spent reading meta-data, while with the png plugin, it was less than 0.2%!

So the speed difference between kfile and strigi's xmlindexer wasn't really in reading meta-data itself, it was mostly in figuring out the mimetype in KDE, which strigi does much more efficiently.

What does this say about strigi performance? Well, we're back to square one. There is no reliable way to correct for the slowness of kmimetype, so I can't figure out how much faster (or slower, but I think that's not very likely) the strigi meta-data extractors are compared to kfilemetainfo. At least we know that if someone writes a strigi analyzer for it, figuring out the mimetype will be a lot faster in KDE 4...

And of course, strigi could use its database for meta-data - which might give a big speedup after all. And it will allow things like special directory listings based on some property of the data.

21 March, 2007

strigi in kdebase

So, Jos van den Oever (yes, the guy working on Strigi) blogged about porting the KFilePlugins to the new strigi-based infrastructure. Those plugins extract meta-data from files: for a picture, that means things like resolution, creation date and such.

This allows Nepomuk integration, less code duplication, and more speed.

Now there have been some insights into Strigi's performance, where it proved to be 40 times faster at indexing files compared to the second-best... Of course, the competition has improved since. But how does the new strigi meta-data extraction compare to the old KFileMetaInfo in terms of speed?

I tested this on all png files on my system, using the following two commands:

For KDE 3: time locate png|grep \\.png$|xargs kfile --av
For KDE 4: time locate png|grep \\.png$|xargs xmlindexer

The output of those is comparable, though the xml indexer from strigi extracts stuff in xml form. Both put out the meta information to the commandline, showing things like dimensions, color depth or comments.
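If you want to repeat this kind of test yourself, here is a rough sketch of a variant that copes with spaces in file names. Note the assumptions: it uses find instead of locate (so it doesn't depend on the locate database), it stores the file list first so that walking the disk is not part of the timing, and list_pngs and run_on_list are names I made up for the sketch. They are not part of strigi or KDE.

```shell
#!/bin/bash
# Sketch only: list_pngs and run_on_list are hypothetical helpers.

# Print every *.png file below a directory, NUL-separated, so that
# names containing spaces or newlines stay intact.
list_pngs() {
    find "$1" -type f -name '*.png' -print0
}

# Feed a stored NUL-separated file list to the given command via xargs.
run_on_list() {
    list=$1
    shift
    xargs -0 "$@" < "$list"
}

# Example (assumes xmlindexer and kfile are installed and on PATH):
#   list_pngs "$HOME" > /tmp/pngs.list
#   time run_on_list /tmp/pngs.list xmlindexer > /dev/null
#   time run_on_list /tmp/pngs.list kfile --av > /dev/null
```

Sending the output to /dev/null keeps the terminal out of the measurement, which is a choice of this sketch and not how the numbers in this post were taken.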

Now, on my dualcore 2.2 GHz system with 3 GB of RAM, kfile takes 40 minutes to complete (average over 3 runs). Xmlindexer takes an average of 24 seconds. Yes, seconds. That's almost 100x as fast...

xmlindexer (strigi):
real 0m27.262s
real 0m27.056s
real 0m19.328s
Average 24 sec

kfile (kde3):
real 45m5.689s
real 37m14.158s
real 37m17.515s
Average 39m52sec
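
For anyone who wants to redo the averages, a small sketch in shell and awk. It assumes the "real" values are in the NmS.SSSs form printed above; to_seconds and average are hypothetical helper names, nothing that ships with time or strigi.

```shell
#!/bin/bash
# Sketch only: to_seconds and average are hypothetical helpers.

# Convert a "real" value such as 45m5.689s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Average any number of "real" values; prints seconds with one decimal.
average() {
    for t in "$@"; do
        to_seconds "$t"
    done | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

# Example:
#   average 0m27.262s 0m27.056s 0m19.328s
```

Dividing the two averages then gives the speedup factor.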

So, how could this be so much faster? Filecache used by strigi? I first ran xmlindexer (27 seconds), then kfile (45 minutes) and then, after a few hours (during the night), I ran both tests twice again, with a 1 hour interval. Yes, I did use my dualcore in the meantime, mostly watching a movie. So the results aren't exactly scientific, but hey - we're talking a 100 times faster here. A movie won't have such a huge impact... And filecache might have an impact, but even if it does, it can't explain the differences. The second and third runs of kfile were indeed faster, but still orders of magnitude behind strigi (which got faster as well).

According to Jos, the speed difference could have two reasons: kfile does something wrong, and KFileMetaInfo isn't that slow, or - well, it is... (I'll have a look at this, btw, expect a second blog about it). It could just be that the png indexer in KFileMetaInfo hasn't seen much work, and most of the speedup could have been gained by rewriting it - so I'll have to check other plugins as well to see how they compare. And currently, the new infrastructure isn't as complete as KFileMetaInfo was - many plugins haven't been ported yet, and there is no KIO integration.

Still, seeing new technology being this closely integrated in KDE, and finding out what it can do makes for some interesting statistics, don't you think? I'll try to test more plugins, and figure out if there is something wrong with kfile. And Konsole, btw, as it crashes every time I try the KDE3 test in it... Maybe the unlimited history gets filled up a bit too fast/much?