31 December, 2007

Last Performance Blog

According to a reply on my previous blog, I might as well test whether having my hamster run in his wheel would increase drawing speed, as long as I don't use valgrind and cachegrind.

Well, I've never used those tools before, but hey. Let's give it a try.

Valgrind and Dolphin (resizing all the time).
Well, valgrind talked about memory leaks. I get that. The details, however, don't tell me much (of course, that's an understatement).

==21180== LEAK SUMMARY:
==21180== definitely lost: 8,383 bytes in 342 blocks.


The output for each individual leak looks like this:
==21180== 216 bytes in 1 blocks are definitely lost in loss record 175 of 257
==21180== at 0x4021765: malloc (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==21180== by 0x56BB885: _XimOpenIM (in /usr/lib/libX11.so.6.2.0)
==21180== by 0x56B88CF: _XimRegisterIMInstantiateCallback (in /usr/lib/libX11.so.6.2.0)
==21180== by 0x5699517: XRegisterIMInstantiateCallback (in /usr/lib/libX11.so.6.2.0)
==21180== by 0x503F73D: QXIMInputContext::QXIMInputContext() (qximinputcontext_x11.cpp:361)
==21180== by 0x503E4A5: QInputContextFactory::create(QString const&, QObject*) (qinputcontextfactory.cpp:120)
==21180== by 0x4B928F1: QApplication::inputContext() const (qapplication.cpp:4541)
==21180== by 0x4BD5046: QWidget::inputContext() (qwidget.cpp:245)
==21180== by 0x4C051CC: QWidget::destroy(bool, bool) (qwidget_x11.cpp:853)
==21180== by 0x4BD7D8F: QWidget::~QWidget() (qwidget.cpp:1264)
==21180== by 0x4EADB04: QLineEdit::~QLineEdit() (qlineedit.cpp:357)
==21180== by 0x4E76C8A: QComboBox::setLineEdit(QLineEdit*) (qcombobox.cpp:1599)


Above is with QtCurve as style, btw.
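For anyone who, like me, stares at these records: memcheck prints the stack innermost-first. The "at" line is the allocator (malloc here), and each "by" line below it is the function that called the one above it. Below is a minimal Python sketch, assuming the exact record format shown above, that pulls out the leaked size and the first caller that isn't an allocator. The helper names (summarize_loss_record, first_caller) are made up for this post and are not part of valgrind. For the record above, it points at _XimOpenIM in libX11, which was reached from Qt's XIM input context code while a QLineEdit was being destroyed.

```python
import re

# Frames treated as "the allocator itself". The operator new spellings are an
# assumption about how C++ allocations appear in memcheck output.
ALLOCATORS = {"malloc", "calloc", "realloc",
              "operator new(unsigned int)", "operator new(unsigned long)"}

HEADER_RE = re.compile(r"([\d,]+) bytes in ([\d,]+) blocks are definitely lost")
# "at 0xADDR: function (in /lib/path)" or "by 0xADDR: function (file.cpp:line)"
FRAME_RE = re.compile(r"(?:at|by) 0x[0-9A-Fa-f]+: (.+?) \((?:in )?([^()]*)\)\s*$")


def summarize_loss_record(lines):
    """Return (bytes, blocks, [(function, location), ...]) for one loss record."""
    lost_bytes = blocks = None
    frames = []
    for line in lines:
        text = re.sub(r"^==\d+==\s*", "", line)  # drop the ==PID== prefix
        header = HEADER_RE.search(text)
        if header:
            lost_bytes = int(header.group(1).replace(",", ""))
            blocks = int(header.group(2).replace(",", ""))
            continue
        frame = FRAME_RE.search(text)
        if frame:
            frames.append((frame.group(1), frame.group(2)))
    return lost_bytes, blocks, frames


def first_caller(frames):
    """First frame that is not an allocator, i.e. the code that asked for the memory."""
    for function, location in frames:
        if function not in ALLOCATORS:
            return function, location
    return None
```

Feeding it the record above gives 216 bytes in 1 block, and first_caller returns ('_XimOpenIM', '/usr/lib/libX11.so.6.2.0'). Grouping many records by that first caller is one way to see whose code is responsible for most of the "definitely lost" total.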

Well, I think that would explain it all if I just had more brains. Unfortunately, I don't, so let's go on to kcachegrind. Again with QtCurve, which, as I mentioned before, draws a bit faster than Oxygen.

KCachegrind
Well, what can I say. Lovely graphs, that's for sure. As far as I can tell, this tool shows where the app is spending its time. I guess the 100% for ld-2.7.so doesn't mean that's what the app does all the time.
The < cycle 13 > entry, which I see listed under libQtCore.so.4.4.0, doesn't explain much either. Digging down (this KCachegrind tool is pretty cool, even though I don't really understand it), I seem to see a lot of calls to QWidgetBackingStore?!? And there are over 2.4 million calls to isPaintOrScrollDoneEvent, but the % spent there (if that's what the % means) is just 1.5%... Hmmm, again - big ?????

Or is the second value, Self, the one which shows how often a function is called? Sounds like it could be true. In that case, at least 12% of Dolphin's time (when resizing all the time, of course) is spent in various libc-2.7.so calls.
The funny thing is that the calls to the libc-2.7.so functions all seem to come from the same Qt function: QListView::paintEvent < cycle 13 >. And then some KFileItemDelegate::paint things.
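For what it's worth, here is how KCachegrind's two cost columns work, as far as I understand them now. "Incl." is the cost of a function plus everything it calls. "Self" is the cost spent in the function's own code only. The number of calls is a separate "Called" column. That would most likely explain the 100% for ld-2.7.so: the dynamic loader runs before main, so it sits at the root of the call tree and everything is included under it.

Below is a toy Python sketch of that arithmetic. The function names echo this post, but the call tree and costs are invented. It also ignores recursion, which KCachegrind handles by grouping functions into "<cycle N>" entries like the cycle 13 above.

```python
# Toy call tree: function -> (self cost, list of callees).
# Names echo the profile discussed above; the structure and costs are made up.
CALL_TREE = {
    "entry point (ld-2.7.so)": (1, ["main"]),
    "main": (2, ["QListView::paintEvent", "event loop, other"]),
    "QListView::paintEvent": (10, ["memcpy (libc-2.7.so)"]),
    "memcpy (libc-2.7.so)": (12, []),
    "event loop, other": (75, []),
}


def inclusive_cost(fn, tree):
    """Self cost of fn plus the inclusive cost of everything it calls.

    Assumes a plain tree; real profiles with recursion need cycle handling.
    """
    self_cost, callees = tree[fn]
    return self_cost + sum(inclusive_cost(callee, tree) for callee in callees)


total = inclusive_cost("entry point (ld-2.7.so)", CALL_TREE)
for fn, (self_cost, _) in CALL_TREE.items():
    incl = inclusive_cost(fn, CALL_TREE)
    print(f"{fn:24} Incl. {100 * incl / total:5.1f}%   Self {100 * self_cost / total:5.1f}%")
```

In this toy, the root comes out at 100.0% Incl. but only 1.0% Self. QListView::paintEvent gets 22.0% Incl. but only 10.0% Self, because its inclusive figure contains the 12% spent in the libc call underneath it. A high Incl. therefore says "something below here is expensive", while a high Self says "this function itself is expensive".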

This could mean QListView is the problem (didn't I say something about QScrollBar? Does that have anything to do with QListView?). Or does this mean my hamster simply didn't run fast enough?

I'm sorry. This does not make anything clear to me. As I said before, I'm going to wait for someone actually knowledgeable in this area to say something.

A commenter on my previous blog did use callgrind, btw, but his findings were not very clear (to him, I mean, they were obviously not clear to me anyway).

Luckily I also received a few private mails, ranging from "Xephyr does indeed weird stuff, Qt4 apps draw much faster in it than in plain X" to "go on blogging about this". Which I won't, btw, as I never intended to go this far. Sure, many ppl seem to find it interesting, but the complaints about "don't whine, do something", while not making me happy, have a point. I can't do anything, so someone who actually CAN should take over. Or not, in which case - well. Shit happens. Maybe just use GTK apps, then. I had to install a bunch of GTK libs for KCachegrind anyway, for some weird reason:
libgnomecanvas, libgnome, libbonoboui, gnome-keyring, libgnomeui, graphviz, gail, gconf, gnome-vfs and libbonobo (including a bunch of errors about missing gtk things and other stuff, even though kcachegrind works fine)
Why KCachegrind needed that stuff: please let me know. I'd love to get rid of it again.

Edit: aah, probably optional (but apparently compiled-in) dependencies of Graphviz... Weird, the Graphviz page doesn't mention them, so I still think it's faulty... So much for Arch Linux being mean & lean.

Oh, and Happy new year for those who already are in 2008 when they read this ;-)

Edit2: And the Gwenview author, Aurélien Gâteau, just emailed me to ask me to test some improvements he made to the panning in Gwenview 4 in response to my blogs. So I tested them, and I am happy to tell you all that the slow panning in Gwenview in KDE 4 is totally gone - the KDE 4 version is now as fast as the KDE 3 version! Thank you, Aurélien!

Edit3: made the post sound less like I'll be crying all night because someone said bad things - after all, that's not the point, and I don't want to sound like I won't blog anymore if ppl are mean...

30 December, 2007

performance and Qt 4.4 again

My previous blog resulted in a large number of reactions, many of which warrant a reply from me.

I blogged about bad drawing performance, and the big question I had was: what was causing it.

Firstly, a few videos were posted which showed the issue:
http://bw.uwcs.co.uk/kde3_resize.ogg
http://bw.uwcs.co.uk/kde4_resize.ogg

I hope the person posting them won't mind the bandwidth sucking....

Now, I ended my previous blog by mentioning that this issue seemed to be a bug in Qt, more precisely in QScrollBar, based on the profiling one reader did in the blog before that. But in the comments section another issue came up. It was first mentioned by Benji, who discovered to his surprise that his 3-year-old laptop with integrated (Intel) graphics performed better than his NVidia GeForce 8800GTS... Quite a clue as to where the issue might be, I'd say. An article about the upcoming (now released) NVidia driver mentioned quite a huge improvement in XRENDER performance, so that might make a difference. I've compiled and installed the driver, and unfortunately, there is no visible difference. I can show the graph, but just take my word for it: Konqi still takes between 10% and 20% CPU when scrolling (that's on a dual core, so it's actually between 20% and 40% of one 2.4 GHz AMD core...). Resizing still barely goes over 5 frames a second (mostly a lot less).

So it probably isn't the NVidia driver - somebody with an old ATI Radeon (pretty good FOSS drivers available) also has the issue. But I do wonder about the apparently good Intel performance. More convincing against the NVidia-does-it theory is a comment from Rui, who mentions sub-standard Qt 4 performance on Mac OS X:
Hi, FWIW, on Mac OS X, the last.fm client, which bundles some version of Qt4, is also a lot slower redrawing itself when resizing than, say, a Finder window, even if said Finder window is displaying the new coverflow view of files.

I'll try getting some numbers later. Maybe checking Qt4's performance on Windows would be constructive too. By checking on other systems we are eliminating a whole bunch of probable places where the slowdown could be occurring. This kind of performance problem is quite difficult to measure, especially because there are so many variables going on under (system frameworks) and above (the app itself) Qt.


So, where are we at? A short overview:

- It is not the Oxygen style.
- It is not Qt doublebuffering.
- It is not 3D/compositing stuff.
- It is not the (lack of) alien widgets.
- It is most likely not the XRENDER performance of the NVidia drivers (but maybe Intel does something right?)

So, we're back at Qt again. Why does Qt 4 drawing seem to perform so much worse on some hardware? It could be that Qt 4 is much more dependent on hardware acceleration - and if that's not done properly by the drivers, drawing suffers. I wouldn't know. But if it IS drivers, the new NVidia driver should probably fix it - and it doesn't. I know there is more to drivers than just XRENDER, so I should probably let others talk about that. Actually, that would be a good thing - maybe a graphics guru (our own Graphics Ninja perhaps?) could chime in, say something?


----DISCLAIMER
Now I also have a disclaimer, as my previous blog also resulted in a comment blessing me with the name 'whistleblower', and someone else wasn't too happy with the "this is Qt's fault".

I think I was clear enough on this, but I want to repeat it: I don't think this is a huge issue. Sure, it looks bad, but that's about it. The faster application startup we have in KDE 4 far outweighs the slower drawing. Performance is much more than drawing, and imho having to wait for an application to start is much more annoying.

Secondly, my measurements are of course entirely unscientific. After all, a difference one can see with the naked eye must be relatively large to really count. And it might be that there isn't one cause but several. Maybe it is a combination of the double-buffering, bad drivers, the Oxygen style and the lack of alien widgets. But I have tried resizing Dolphin with the QtCurve style and Qt 4.4 (aka alien widgets) with the new NVidia driver - and I could still clearly see it draw. Oh, and turning the double-buffering off leads to 100% CPU usage when resizing even slightly, so that won't help either ;-)

Third, maybe it isn't Qt's fault at all - it might be that the way Qt4 works (more hardware acceleration, more use of animations, transparency, etc) just exposes bad performance in X.org and/or graphics drivers. Seriously, I find that very likely. Seeing how many Qt developers seem to care about performance, it seems unlikely they would release something which is so much slower than Qt 3. So it is very well possible it works fine on THEIR hardware. Apparently, the FOSS Intel drivers fare pretty well, so maybe they all have these nice MacBooks and such ;-)

Last, I didn't file a bug. First because I'm still trying to figure out what exactly causes this, and secondly because I'm simply not knowledgeable enough to start profiling apps. And also because this process seems to work rather well - with the help of the many readers of this blog, I feel we're getting closer to finding the cause of the issue.

So, while I look forward to a comment from someone who can tell me he/she found the issue, this is in no way a blocker for KDE 4.0 or a horrible thing for Qt 4.x. What I would do is recommend to the KWin developers to set KWin to NOT display content in resizing windows by default. That wouldn't look incredibly cool, but it would look far better than horribly slow resizing windows. Lubos? Rivo? Are you guys reading this?

29 December, 2007

performance and Qt 4.4

And again I blog about performance in KDE 4. In my last blog about that, I did express the hope the Trolls would be able to fix the issue. So, today, I tried to test the cool alien widgets which are supposed to fix flicker-when-resizing! To cut right to the chase, no, it didn't work. Blegh.

I know I'm a sucker for speed. I want my computer to wait for me, not the other way around. I mean, a machine which can easily do millions of computations a second should be fast enough to keep up with my freakingly slow brain, right? It can take up to 1/10th of a second for a signal to travel from one end of the human brain to the other - that's not even the same league as my PC is playing in... Not to mention it has two cores and 3 GB of ram to go with that insane clockspeed ;-)

So I have played with kernel patches (Con Kolivas' work is and has always been amazing in that regard). I've tried Gentoo (aw, yes, I did...). With newer GCC, X.org and Linux I/O and process scheduler versions, my KDE got much smoother. Not perfect, no - but my biggest gripe, application startup speed, well - that's covered now. KDE 3.5.x didn't do badly, but if you see the difference with KDE 4 - you're gonna be surprised. Really.

Unfortunately, as I found out some time ago, drawing performance has gone down the drain in KDE 4. Now, that issue is less important. Having to wait for an application to start up is annoying - you actually have to wait. If an application redraws slowly, that means resizing a window won't look good. That's about it. It wouldn't be a huge deal, if it weren't for all the cool eyecandy in KDE 4 (Oxygen, animations). Really, it doesn't fit very well if your apps look bloody cool but resizing causes epileptic seizures... Besides, it's not only the resizing, but also the panning of an image in Gwenview and the autoscroll feature in Konqi. The latter manages to use up to 40% of one CPU core, and the former also redraws visibly slowly.

As a non-hacker, I'm not sure all issues I find are even related - but hey, one can try to figure out the issue anyway. So, here are suggested reasons for bad performance:
- Oxygen style is slow. Yep, it's a bit slower than for example QtCurve, but not much. Not THE problem.
- Qt doublebuffering is slowing everything down. Nope, disabling it makes the application flicker horribly, but it won't be any faster.
- 3D stuff is slow. Sure is, but I ran the apps in KDE 3, no compositing. Not the issue.

So, now I wonder if the slowness could be fixed by the Alien Widgets, which have arrived in Qt 4.4 snapshots.

I downloaded a Qt 4.4 snapshot (today's snapshot, from 29-12-2007), compiled and installed it, and recompiled KDE with it. As I already said - no improvement, at least not as far as I could tell. Sure, with QtCurve the issue did get better, but still - if the KDE 3 Dolphin resizes almost entirely smoothly while the KDE 4 one paints visibly (I could count the number of repaints when resizing if I wanted to) - not good.

The panning in Gwenview isn't faster either, but I guess that doesn't have much to do with the alien widgets thing. I couldn't test konqi scrolling, it doesn't load webpages with Qt 4.4 ;-)

So, it's not the alien widgets that will come to rescue KDE 4 from bad drawing performance... I currently see one possible reason for the bad speed. An anonymous reader pointed out in my last blog that QScrollBar isn't that fast:
It seems related to QScrollBar.
Profiling konqueror in auto-scroll mode, I obtain 50% (!!) of the time spent just repainting the scrollbar.

I found no bug reports for this in TT's bugtracker (if the anonymous reader is reading this, maybe he/she should file a bug?). I'm afraid KDE 4.0 and 4.1 will have this issue - IF this gets fixed, it most likely will take until KDE 4.2 is out for the fix to get into users' hands. Unless the Trolls decide it's important enough for a bugfix release, of course... Personally, I hope so, this really looks bad.


Further info:
I use the following additional cmake-options in my .kdesvn-buildrc:
-DKDE4_BUILD_TESTS:BOOL=OFF -DCMAKE_BUILD_TYPE=release
-> that should lead to no debug symbols, if I understood some recent threads properly.
I've tried several styles: Cleanlooks (included with Qt), QtCurve and the default Oxygen. Yes, Oxygen is a bit slower with its cool background gradient, but it's not like the other ones are smooth anyway.

One last thing, of course all this might be my fault. I think I do know a little about compiling and stuff, but I'm not a code writer, and profiling is scary for me. So maybe I've made a horrible mistake, and the KDE 4.0 packages my distribution will provide soon after the release (love the rolling release schedule in Arch) will be as fast as a fox. And I don't mean FireFox, as that must be the most horribly slow painting application we have on linux (and windows and Mac OS X).

Let's hope so.

24 December, 2007

I wrote about the performance of KDE 4 some time ago, and I'd like to revisit it.

After a few threads about debug builds and release builds on kde-core-devel, I figured out I was wrong in my previous entry: KDE 4 couldn't easily be built without debugging symbols. So my build WAS a debug build. But now it's possible to build it clean and fast, and I did indeed see an increase in performance when trying it.

Apps start up even faster. It's rather weird to see an application which always took a noticeable delay to start up now pop up like it's a basic calculator ;-)

Drawing is also a bit faster, but still clearly not as fast as in KDE 3. Resizing Dolphin on KDE 3 is almost entirely smooth - very unlike the rather unpleasant experience on KDE 4. Gwenview is a bit faster. It no longer visibly redraws an image when you drag it, but it clearly isn't as smooth as the KDE 3 version. It IS faster at loading large images though: a 6400x3200 image loaded in a snap (!) while the KDE 3 Gwenview made me think it got stuck...

And here's a new Konqi-auto-scrolling picture. It starts with XPLANET (nice 20, yes), then 5 seconds pause, and I start scrolling the Konqi from KDE 3. You don't see a visible difference in CPU usage... When I increase the speed of the scrolling, there are some very small spikes but generally the CPU is still barely used.



The picture for KDE 4 is different - approximately 10% of my cpu is needed to scroll a page slowly, which increases to 20% when I increase the speed of scrolling.

Notice that these numbers DO represent a decrease compared to my previous test, which showed twice as high CPU usages in KDE 4. So it might be no more debugging, or some performance work - either way, Konqi got twice as fast already ;-)

As discussed in the other blog post, this is all with a dual-core 2.4 GHz AMD, 3 GB RAM, proprietary NVidia drivers (GeForce 6600) and NO compositing. Tests are run in a KDE 3 environment. Yes, double-buffering is on in Qt4 - as it should be, it looks horrible without it... The whole area flashes when scrolling.

One person mentioned a potential scrolling bug in Qt4, which makes it redraw the whole area when scrolling. This could indeed explain the flickering without double-buffering and the performance problems, I guess. Hopefully a new Qt version fixes this, I found quite a few scroll-speed related bugs in the TT bugtracker.

I think I can conclude the performance issues with drawing are not as bad as I depicted in my previous blog, and it's very well possible they can and will be fixed. Meanwhile, KDE 4 performs amazingly well in many other areas - most noticeably application startup.

EDIT: With some help (see comments) I was able to compile QtCurve for KDE 4, and tried it. Some argued the slow drawing when resizing might be due to Oxygen being slow, and that turned out to be at least half true. Dolphin resizes faster with QtCurve - but the KDE 3 version is still much smoother.

16 December, 2007

Performance 3

Hi all,

After some raving blogs about how memory-efficient and fast KDE 4 is, I decided to test something myself.

I felt there were regressions in drawing performance in some apps, and I tried to quantify that a little bit. Of course, it's entirely unscientific, but the pattern should be clear... For those who wonder, as far as I can tell, debugging is off in my build and my hardware is pretty nice - a 2.4 GHz dual-core AMD with 3 GB RAM. It was all done in a KDE3 environment.

Procedure:
1. Start Ksysguard (the kde 4 version of course, as it looks much cooler), and minimize.
2. Do some Stuff.
3. Use Ksnapshot on Ksysguard ;-)
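If you'd rather have numbers than screenshots, here is a minimal sketch of measuring the same thing by hand on Linux, assuming /proc is available. It samples a process's CPU time twice and divides by the elapsed wall-clock time. It reads utime and stime, which are fields 14 and 15 of /proc/<pid>/stat and are counted in clock ticks (see the proc(5) man page). The result is a percentage of ONE core. That is also why a graph of the whole machine shows half of that figure on my dual core. The helper names are made up, and I'm not claiming KSysGuard works this way.

```python
import os
import time


def cpu_ticks(pid):
    """utime + stime of a process, in clock ticks, read from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only split what follows the last ')'. rest[0] is field 3 (state).
    rest = data[data.rindex(")") + 2:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(pid, interval=5.0):
    """Average CPU use of pid over interval seconds, as % of one core."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    start_ticks, start = cpu_ticks(pid), time.monotonic()
    time.sleep(interval)
    used_seconds = (cpu_ticks(pid) - start_ticks) / hz
    return 100.0 * used_seconds / (time.monotonic() - start)


def to_whole_machine(percent_of_one_core, cores):
    """Convert '% of one core' into '% of all cores', as a whole-machine graph shows it."""
    return percent_of_one_core / cores
```

Usage: call cpu_percent() with Konqueror's PID while it autoscrolls. Then, for example, to_whole_machine(40, 2) gives 20.0, so 40% of one core shows up as 20% on a dual-core graph.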

I tried Konqueror and its autoscrolling feature (shift-down-arrow) and Gwenview (drag a big picture around). Here are the results:



As you can see, Konqi in KDE 3 uses barely any CPU over the baseline when autoscrolling. The KDE 4 version, on the other hand, uses quite a bit of CPU.



Here I pan (is that the right word?) the image around. As you see, Gwenview 4 easily uses 100% cpu (one core) and is visually very laggy, too. The KDE 3 version looks entirely smooth, and indeed doesn't get to use a cpu core fully ever. Again, a performance regression.

Well, what can I say. There sure is still a lot to optimize, and maybe these issues can be fixed very easily. After the positive blogs about memory usage I became rather sceptical - I mean, with so much code changed and so many new features - could KDE 4 really be that much faster? Now I didn't test memory usage, which might very well indeed be much better - but at least these two apps got slower in the rendering/painting department, so here's the balance ;-)

Other remarks: yes, KSysGuard uses 20% CPU when drawing 4 sensors full-screen. Awful indeed, but the KDE 3 version doesn't do much better, and when minimized the CPU usage drops to almost nothing. Besides, it looks damn sexy, doesn't it?
The big spikes marked 'xplanet' - you guessed it, that's xplanet, which updates my desktop picture every 5 minutes using some pretty big planetary pictures - so it takes a while.



One more thing. I played with the colors in Oxygen again. And in my humble opinion (those who know me know I'm rarely humble, but whatever) Oxygen rocks, it really does. It is one of those rare themes which look amazingly good in a wide variety of color schemes. Despite the fact there are a few unfinished things (but that's true for almost all of KDE 4) it is imho the best theme ever made. Original (it looks as much like Vista or Mac OS X as those look like Windows 95) and brilliantly smooth.

The best thing about Oxygen is some quality it shares with Dolphin: it grows on you. At first, Dolphin didn't appeal to me - I love konqi as a filemanager. And I wasn't impressed with Oxygen much either, I normally prefer very glossy themes like liquid or Polyester. Oxygen, I thought, was kind'a boring.

After using it for a while, I've grown very dissatisfied with any theme on KDE 3 - I find Domino, one of the most modern KDE 3 themes, bearable, but that's about it. And I find myself using Dolphin more and more, to the point that I installed the KDE 3 version on my main system...

So, to those bashing Oxygen, Dolphin and Kickoff - wait until you've used it for a while. You'll probably discover you don't wanna go back...

more ramblings about vision and future ;-)

In the comments section on OSnews someone noted how it would be very hard to gain marketshare for FOSS due to the dominance (and use thereof) of Microsoft. I thought my reaction was worth posting as a blog ;-)

Indeed. Uptake of Linux will be slow, and won't happen at all if Linux doesn't offer advantages to both developers and users. We need to be clearly better than Microsoft and Apple. We need to out-innovate them and bring the latest technology at the earliest possible moment.

Yet it happens to be that we're particularly good at these things, and as our ecosystem grows, we grow faster and stronger. FOSS development techniques, unlike proprietary development models, scale pretty well.

As more and more companies are joining FOSS one way or another, we will cross some threshold where a FOSS system will have such clear advantages it will be impossible to ignore. Lower costs, better availability and more capabilities, along with a healthier ecosystem with more competition and smaller, more innovative companies.

In the long run, I don't think MS and the other proprietary vendors will be able to stop it - unless, UNLESS they can do so through legal and political barriers. But the tide is turning against them - Europe is slowly getting committed to FOSS, and it is keeping a close eye on abuse of marketpower - which is the only real asset Microsoft has against FOSS.


The above is why I think what is going on with KDE 4 is crucial. I've blogged about that before, and I also talk a lot about these things on forums... In one of these posts, I wrote about Novell and how they are (imho) wasting resources (re)writing stuff for YaST in GTK. (probably worth reading before continuing. I didn't want to paste all my posts here...)

And one of the comments was that I was trolling. Now, I agree I wasn't very nice. Saying Gnome release announcements feel like a timemachine from the KDE 3.3 era - a bit harsh, I know. And also of course an exaggeration - there is regularly stuff in there which was not in KDE 5 years ago. Anyway. I did reply on it, but I think Segedunum put it much better:
Ahhhh, the emotive question of open source desktops. You can only go so far before pointing out the truth, and when you do, others are simply not going to like it -).

Aaah well. Time will tell what happens. At least we have big plans.

Another thing, as my previous dwarfhamster died (did I blog about that?) and I went through the mandatory period of mourning, I got a new one. This time not from Animal Protection (or whatever you'd call those guys in english, too lazy to look it up) but from a pet store. I'm not too fond of those, as they generally don't treat animals too well, but OK, it was a gift by a loved one ;-)

So the question to y'all: how should I name him? Not Tux or something silly, nor those horrible names like 'sniffy' or Blacky or whatever painfully cliché names ppl seem to come up with.