30 December, 2007

performance and Qt 4.4 again

My previous blog resulted in a large number of reactions, many of which warrant a reply from me.

I blogged about bad drawing performance, and the big question I had was: what is causing it?

Firstly, a few videos were posted which showed the issue:
http://bw.uwcs.co.uk/kde3_resize.ogg
http://bw.uwcs.co.uk/kde4_resize.ogg

I hope the person posting them won't mind the bandwidth sucking....

Now, I ended my previous blog by mentioning that this issue seemed to be a bug in Qt, and more precisely in QScrollBar, based on the profiling one reader did in the blog before that. But in the comments section another issue came up, first mentioned by Benji, who discovered to his surprise that his 3-year-old laptop with integrated (Intel) graphics performed better than his NVidia GeForce 8800GTS... Quite a clue as to where the issue might be, I'd say. An article talking about the upcoming (now out) NVidia driver mentioned quite a huge improvement in XRENDER performance, so that might make a difference. I've compiled and installed the driver and, unfortunately, there is no visible difference. I can show the graph, but just take my word: Konqi still takes between 10% and 20% CPU when scrolling (that's with a dual-core, so it's actually between 20% and 40% of a 2.4 GHz AMD core...). Resizing still barely goes over 5 frames a second (mostly a lot less).

So it probably isn't the NVidia driver - somebody with an old ATI Radeon (pretty good FOSS drivers available) also has the issue. But I do wonder about the apparently good Intel performance. More convincing against the NVidia-does-it case is a comment from Rui, who mentions sub-standard Qt 4 performance on Mac OS X:
Hi, FWIW, on Mac OS X, the last.fm client, which bundles some version of Qt4, is also a lot slower redrawing itself when resizing than, say, a Finder window, even if said Finder window is displaying the new coverflow view of files.

I'll try getting some numbers later. Maybe checking Qt4's performance on Windows would be constructive too. By checking on other systems we eliminate a whole bunch of probable places where the slowdown could be occurring. This kind of performance problem is quite difficult to measure, especially because there are so many variables below (system frameworks) and above (the app itself) Qt.


So, where are we at? A short overview:

- It is not the Oxygen style.
- It is not Qt doublebuffering.
- It is not 3D/compositing stuff.
- It is not the (lack of) alien widgets.
- It is most likely not the XRENDER performance of the NVidia drivers (but maybe Intel does something right?)

So, we're back at Qt again. Why does Qt 4 drawing seem to perform so much worse on some hardware? It could be that Qt 4 is much more dependent on hardware acceleration - and if that's not done properly by the drivers, drawing suffers. I wouldn't know. But if it IS drivers, the new NVidia driver should probably fix it - and it doesn't. I know there is more to drivers than just XRENDER, so I should probably let others talk about that. Actually, that would be a good thing - maybe a graphics guru (our own Graphics Ninja perhaps?) could chime in, say something?


----DISCLAIMER
Now I also have a disclaimer, as my previous blog also resulted in a comment blessing me with the name 'whistleblower', and someone else wasn't too happy with the "this is Qt's fault" claim.

I think I was clear enough on this, but I want to repeat it: I don't think this is a huge issue. Sure, it looks bad, but that's about it. The faster application startup we have in KDE 4 far outweighs the slower drawing. Performance is much more than drawing, and imho having to wait for an application to start is much more annoying.

Secondly, my measurements are of course entirely unscientific. After all, a difference one can see with the naked eye must be relatively large to really count. And it might be that there isn't one cause but several. Maybe it is a combination of the double-buffering, bad drivers, the Oxygen style and the lack of alien widgets. But I have tried resizing dolphin with the QtCurve style and Qt 4.4 (aka alien widgets) with the new NVidia driver - and I could still clearly see it draw. Oh, and turning the double-buffering off leads to 100% CPU usage when resizing even a slight bit, so that won't help either ;-)

Third, maybe it isn't Qt's fault at all - it might be that the way Qt4 works (more hardware acceleration, more use of animations, transparency, etc) just exposes bad performance in X.org and/or graphics drivers. Seriously, I find that very likely. Seeing how many Qt developers seem to care about performance, it seems unlikely they would release something which is so much slower than Qt 3. So it is very well possible it works fine on THEIR hardware. Apparently, the FOSS Intel drivers fare pretty well, so maybe they all have these nice MacBooks and such ;-)

Last, I didn't file a bug. First because I'm still trying to figure out what exactly causes this, and secondly because I'm simply not knowledgeable enough to start profiling apps. And also because this process seems to work rather well - with the help of the many readers of this blog, I feel we're getting closer to finding the cause of the issue.

So, while I look forward to a comment from someone who can tell me he/she found the issue, this is in no way a blocker for KDE 4.0 or a horrible thing for Qt 4.x. What I would do is recommend to the KWin developers to set KWin to NOT display content in resizing windows by default. That wouldn't look incredibly cool, but it would look far better than horribly slow resizing windows. Lubos? Rivo? Are you guys reading this?

33 comments:

  1. Out of curiosity, could you try for me how it performs in Xephyr? (Yes, really.)

    I recently had a case with Qt 4 where some drawing stuff performed rather horribly on my main screen (powered by nVidia), and inside a Xephyr instance it suddenly went super-smooth. Totally baffling.

  2. > The faster application startup we have in KDE 4 far outweighs the slower drawing.

    I disagree. My applications start up, when the desktop initializes, while I'm sipping a cup of tea. It's pretty uninteresting, if it takes a few seconds per application. On the other hand, the substandard drawing performance of Qt/KDE is a royal pain in the ass, using the desktop all day.

  3. I have seen a similar problem in KDE3. Using an ATI radeon 9200 (opensource drivers) and konqueror, this http://www.planeshift.it/ scrolls extremely slowly (about 1mm/s), while the same website with an nvidia ti4200 (closed-source drivers) scrolls very fast. My guess is that the large background pixmaps trigger some driver bugs.

  4. Xephyr: I've been trying to do something with it, but just gave up. It doesn't come packaged on Arch Linux, and installing it myself clearly is a royal pain in the ass. You claim that Qt 4 apps perform much better inside Xephyr? Can you retry, are you sure?

    If it is true, I think it is clear there is some weird bottleneck somewhere in the underlying stuff in X.org...

    About application startup speed: A user starts apps all the time. And it happens more often than you think - for example, once systemsettings has been started, switching from one settings panel to another is much faster than with KDE 3 and KControl. The drawing, on the other hand, only really hurts when resizing a window.

    About the planeshift.it page, it uses a lot of CPU here as well. I guess it's just a heavy page. I used the same page (dot.kde.org) in all my tests, btw...

  5. About the Nvidia driver, this page:
    http://www.nvidia.com/object/linux_display_ia32_169.07.html
    claims "Fixed problems scrolling ARGB X drawables in Qt.", so maybe you didn't try the very latest release?

  6. I sure did try the 169.07 driver, according to nvidia-settings it is what I'm running right now ;-)

    Didn't find any problems in there either. DRI works fine etc etc...

    So IF the new NVidia driver fixes XRENDER performance, XRENDER cannot be the reason for the bad drawing performance.

    Unless I've got some settings wrong, but then again, Qt 3 performs fine, as do GTK apps like the nvidia-settings thing.

  7. I have experienced very strange performance issues.
    I remember way in the past, when the first Kororaa CD came out with XGL alpha. This was a Live CD and everything was very fast. Then I installed compiz and it was fast too. This was on an Nvidia Geforce 6600 and, I *think*, still a single-core AMD at 2 GHz.

    After KDE 4 Beta came out, I downloaded the KDE sources, compiled them, and it was slow. VERY slow. Moving windows was a pain in the ass, and widgets also. I thought that KDE was to blame. Then I tried to install compiz fusion, but it was also dead slow. From the 160 FPS that I had a long time ago, compiz showed only 10. The only things that I changed since that time were that I installed new Nvidia drivers (100.14.09) and a new processor (dual-core 2.4 GHz), and I got 2 monitors instead of one. I am still running a 32 bit system. I tried guides on the internet for running composite WMs, but I still can't find the source of the problems. Running on one monitor improves performance a little.

    The latest KDE 4 builds are getting better and better, but I still can't imagine working with them on a daily basis. Moving windows is now quite normal, but choosing something in a drop-down menu is an adventure ;).

    Maybe you could try compiz or some other new window manager to see if it is also slow. I know this isn't a post about KWin. But maybe the source of the problem is the same.

  8. > ... NVidia driver mentioned quite a huge improvement in XRENDER performance, so that might make a difference

    As far as I saw, only the scaling got accelerated, and not even for all cards (no change here with my fx5700). However, Xorg's scaling isn't very useful because it leaves transparent borders where none should be. Qt4 used it some time ago, but removed it before I could complain :)

    For stylers Qt4 has some pitfalls which should be avoided. TT seems to be allergic to 32 bit pixmaps: Qt4 offers no way to create them without unnecessary QImage transformations. Without those transformations it will be around ten times faster (luckily fixable with a subclass). TT's opinion is that pixmaps should use the depth of the system; imho that's just ridiculous. Tiling is also a problem: because of a bug in Xorg < 7.0 or an older nvidia driver (I got contradictory reasons for this), they won't let Xorg do the tiling, which would be up to 25 times faster.

    So the Trolls are indeed at least partly to blame for the performance.

  9. I too am worried about the apparent bad drawing performance of Qt4 in general and especially about the combination with nvidia linux drivers.

    So I did a small test.

    I added 'Option "NoRenderExtension" "true"' to xorg.conf and the resizing performance of Qt4 apps seemed to improve considerably. Unfortunately several rendering bugs appeared, but it's a strange effect nevertheless.

    'Option "RenderAccel" "false"' without '"NoRenderExtension" "true"' did not seem to change the performance in Qt4 apps, but made other applications much slower. This is slightly strange too.

  10. You can try setting the environment variable QT_X11_NO_XRENDER=1 to quickly achieve the same effect as disabling the render extension from xorg.conf. E.g. run "QT_X11_NO_XRENDER=1 designer".
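
    For repeated tests with several applications, the same idea can be scripted. The following is only a minimal sketch: the helper names env_without_xrender and launch_without_xrender are made up for illustration, and "designer" is simply the example application named above.

```python
import os
import subprocess


def env_without_xrender(base_env=None):
    """Return a copy of the environment with QT_X11_NO_XRENDER=1 added."""
    env = dict(os.environ if base_env is None else base_env)
    env["QT_X11_NO_XRENDER"] = "1"
    return env


def launch_without_xrender(command):
    """Start `command` (a list such as ["designer"]) with the variable set."""
    return subprocess.Popen(command, env=env_without_xrender())


# Example use (commented out because it needs the application to be installed):
# launch_without_xrender(["designer"]).wait()
```

    The variable is only set for the child process, so the rest of the session keeps using XRender and a side-by-side comparison stays possible.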

  11. @Marko: The issues you describe have to do with compositing, and are not related to what is going on here, afaik. I myself wouldn't recommend compositing yet, many drivers aren't ready for it - yet.

    @Michael: Well, I wouldn't know much about the stuff you write about, naturally - let's hope a TT developer can elaborate...

  12. Just installed the latest KDE4 Windows snapshot from http://download.cegit.de/kde-windows/installer/

    Resizing kwrite's configuration dialog feels quite slow, too. Resizing Qt Designer's main window also seems to be slower than a native Windows app.

  13. I don't know much about such things, but I see big differences: when I resize konqi showing the planet, konqi uses about 20% and Xorg about 40% CPU (2.2 GHz dual core).
    With file managing and only a few items, konqi uses just 10% while Xorg uses even more: 45%, and that's of 2.2 GHz!

    So I think it could be Xorg's fault, because sometimes the app really uses just about 1% and Xorg 45%.
    But maybe the app just gives Xorg too much work.
    (As far as I know, Xorg should mainly just communicate with the kernel and the graphics card. What does it need 2.2 GHz for?!)

    Scrolling works fine for me (15% Xorg, konqi 2% usage).

    But the redraw with rescale is ugly.

  14. QT_X11_NO_XRENDER=1 does, like disabling doublebuffering, make things ugly - but it doesn't seem to visibly help drawing speed (or, again, the difference is too small to be very noticeable - and resizing is still far from smooth) :(

  15. On Ubuntu 7.10, when using compiz (with default settings), window contents aren't shown while resizing. Instead there's a simple effect: the window extends in a translucent blue rectangle, similar to rubber-band selection in a file manager.

    I think this effect is great because resize becomes very fluid, natural and fast, instead of watching the apps struggling to redraw to keep up.

    Here's to hoping that such an effect also gets added to kwin.

  16. btw.. the resizing slowness might come from kwin not updating the size constantly but only a few times per second...

    I remember having read something about this early this year.
    Have you compared kwin3 and kwin4 with the same app/toolkit?

  17. @knuckles: that is the option in KWin to not show the content of resizing windows, which I asked to have set as default in my latest (or was it my previous?) blog...

    @Gizmo: As I ran all the tests in KDE 3, there should be no difference between KDE 3 and KDE 4 apps...

  18. I've written an app with Qt4, and I use it on Linux and Windows. This app is "skinnable" using stylesheets.

    When I use a skin with bitmaps and gradients, I've noticed there is a HUGE difference between X and Windows: on Windows, redrawing widgets (like just text in QLabels) is smooth, while on X I just can't use the skin.

  19. Again, the intel drivers are smooth on the planeshift page too, so imho it's rather clear that the nvidia/ati drivers aren't accelerating something they should.
    What I think you should do, is profile the x-server, and see what is eating your cpu (use valgrind or something).

  20. Jos, can you check to see what process is using all the CPU? If it's X, then that would tend to mean poor drivers, or a bug there. If it's Konq, then that would probably mean the problem is in Qt.

    It would be nice if a KDE dev could profile this, because it shouldn't be very difficult to track it down at least to a general area since everything is open source.

  21. @last anonymous:
    Top gives rather weird results, imho... When I run it while resizing dolphin, I get this:
    Total CPU usage:
    40% user; 15% system; 46% idle
    X: 62% CPU
    Dolphin: 43% CPU

    Of course, it fluctuates, but this is generally the picture. I do have two cores, which I guess complicates stuff, but it seems both X and Dolphin are rather busy. I don't know why it all adds up rather weird...
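
    One possible reading of these numbers, offered as an assumption rather than a diagnosis: in its default mode, top reports each process's %CPU relative to a single core, while the summary line is averaged over all cores. Under that assumption the figures roughly agree, as this small sketch shows (the function name is made up for illustration):

```python
def share_of_whole_machine(per_process_percent, cores):
    """Convert per-process %CPU (relative to one core) into a share of all cores."""
    return sum(per_process_percent) / cores


# Figures quoted in the comment above.
processes = {"X": 62.0, "Dolphin": 43.0}
cores = 2
summary_busy = 40.0 + 15.0  # user + system from the summary line

combined = share_of_whole_machine(processes.values(), cores)
print(f"X + Dolphin: {combined:.1f}% of the whole machine")
print(f"top summary (user + system): {summary_busy:.1f}%")
```

    The two processes together come to 52.5% of the machine against 55% busy in the summary, which leaves only a small remainder for everything else that was running.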

  22. See what happens with the vesa driver. Software xrender is often faster than a buggy hardware implementation.

  23. Hmm, interesting. Using stock OpenSuse 10.3, with the included KDE4 games I think I'm seeing the same issue that you're talking about. X and the game are using 50-60% cpu apiece, and redrawing is quite slow. I remember that before I installed the proprietary NVidia drivers X was using 100% and it was taking 5-10 seconds to redraw a single time after resizing. Resizing Konqueror shows a similar 60% X cpu usage, but Konqueror only takes around 20-30%. And it repaints nearly immediately so it looks smooth.

    I suspect that what I'm seeing is that the resizing operation in X is taking up about half of a core, and then the repainting of the app is finishing cleanly in KDE3 apps, while the KDE4 apps are maxing out the rest of the core even while repainting slowly.

    That means that Qt4 is just very slow at painting operations for some reason, which I guess is what you've been saying for several blog posts now. Not too much help from me...

  24. @Robert: vesa is equally slow. The KDE 3 version is still almost smooth...

  25. Ok, let's put it this way: is anyone seeing *better* rendering performance in Qt4 than in Qt3? If so, what are your specs?

    TT claim to run benchmarks and claim that Qt4 is faster, so there must be someone out there somewhere who can duplicate their findings!

  26. nvidia and slow window resizing:
    http://www.nvnews.net/vbulletin/archive/index.php/t-98277.html

  27. You have all this time to download package X and driver Y and you don't have time to run your application in valgrind? WTF! The point of a profiler is so that you don't waste time (days, in your sad case) GUESSING as to what the problem is.

    Until you care to run valgrind and use the GUI tool kcachegrind let me add to your list of what isn't causing a problem. Having 2GB or 4GB of ram doesn't seem to make a difference. When I plug in my ipod it doesn't seem to make a difference. I spent 2 whole days re-installing on a different hd, but it didn't make much of a difference. I tried putting my cat on top of my computer, but sorry that didn't do much either. If my monitor is upside down the fps seem to be the same. I tried taking snow from outside and cooling down my computer, but again no difference. Tomorrow I hope to get drunk and then see if it *looks* faster. I'll let you know. Until then how about profiling?

  28. I don't know if this is the same problem, but my experience with KDE 4 is that the drawing is a little bit jumpy (excuse my English). I did not measure the CPU usage, but I assume this is because of high CPU usage. It is definitely not as smooth as KDE 3. But I am using an NVidia 6200, so it could be some bad interaction between Qt/KDE 4 and the binary driver.

  29. For the guy complaining about profiling: I only have the beta 2 code running here, but this is what I got in 2 apps. Unfortunately nothing really pops out to me. gray_cubic_to and the shift functions seem to take up quite a bit in both, but maybe that's expected.

    15.5%: shift(QBezier,QBezier,double,double)
    7.5%: gray_cubic_to
    5.9%: QPainterPath::computeBoundingRect()
    4.3%: gray_render_line
    3.5%: gray_convert_glyph
    3%: gray_render_scanline
    2.7%: QVector[QPainterPath::Element]::append(QPainterPath::Element)


    13%: cycle14, in libQtCore.so. Not sure if that's helpful or not, but it's called from qt_syncBackingStore and QImage::save most of the time.
    13%: something from libpng.so
    13%: gray_cubic_to
    13%: something from libz.so
    5.7%: shift(QBezier,QBezier,double,double)
    5.7%: gray_render_line
    5.2%: destStoreARGB32(QRasterBuffer,int,int,pointer,int)
    4.9%: something from libz.so
    3.6%: qt_gradient_pixel
    3.3%: destFetchARGB32
    3%: fetchRadialGradient
    2.3%: QPainterPath::computeBoundingRect()
    1.9%: something from libz
    1.9%: gray_render_scanline
    1.6%: comp_func_SourceOver[QSSEIntrinsics]
    1.3%: gray_convert_glyph
    1%: convert_ARGB_PM_to_ARGB
    1%: QXmlStreamReaderPrivate::parse
    .8%: QVector[QPainterPath::Element]::append


    Running a simpler app, qtconfig, doesn't show this problem nearly as badly. I'm not sure if it's just simple enough not to show it or if it does something differently that avoids the problem. Its top functions are mostly in libc, libx11, and libxcb.

    cycle6 from libQtCore.so, called from qt_syncBackingStore and postEventSourceDispatch most often, PolygonRegion(QPoint,int,int), QTesselatorPrivate::collectAndSortVertices(QPointF,int),
    QPainterPath::toSubpathPolygons, and QStroker::joinPoints were the top libQtGui.so functions.
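
    To see which kind of work dominates, the entries of the first list above can be summed per group. This is only a rough sketch: the grouping is a hypothetical one based purely on function names (it is not an official Qt categorisation), and the bucket names are made up for illustration.

```python
from collections import defaultdict

# Percentages and names copied from the first profile list above.
FIRST_PROFILE = [
    (15.5, "shift(QBezier,QBezier,double,double)"),
    (7.5, "gray_cubic_to"),
    (5.9, "QPainterPath::computeBoundingRect()"),
    (4.3, "gray_render_line"),
    (3.5, "gray_convert_glyph"),
    (3.0, "gray_render_scanline"),
    (2.7, "QVector[QPainterPath::Element]::append(QPainterPath::Element)"),
]


def bucket(name):
    """Hypothetical grouping by name only (an assumption, not Qt's own taxonomy)."""
    if name.startswith("gray_"):
        return "gray_* functions"
    if "QPainterPath" in name or "QBezier" in name:
        return "path/bezier"
    return "other"


def totals(profile):
    """Sum the percentages per bucket, rounded to one decimal place."""
    sums = defaultdict(float)
    for percent, name in profile:
        sums[bucket(name)] += percent
    return {group: round(value, 1) for group, value in sums.items()}


for group, percent in sorted(totals(FIRST_PROFILE).items(), key=lambda kv: -kv[1]):
    print(f"{percent:5.1f}%  {group}")
```

    With these numbers the path and bezier entries add up to 24.1% and the gray_* entries to 18.3%, so the listed functions account for a bit over 40% of the first profile.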

  30. From the NVIDIA link:
    "It's not the allocation that's slow. After allocating the new pixmap, the server copies pixels from the parent window into the new pixmap. For X.org 7.2 and earlier, this used XCopyArea but this turned out to be a security problem for certain windows, so they switched to the RENDER extension's Composite call instead for X.org 7.3. Since pixmaps start out in sysmem by default, this falls back to software *every single time the window is resized*."

    So testing against xorg 7.2 would rule this out.

  31. > About application startup speed: A user starts apps all the time.

    Yes - when the major libraries and other files are all already in the system's memory/cache. What application startup time is mostly about is getting the darn files from disk, please don't forget that.


    > While the drawing only really hurts when resizing a window.

    The problem is that it hurts (not only) my sense of aesthetics. I don't _like_ to use applications that feel sluggish, and that is the impression I get if the code isn't fast enough to perform basic operations without flickering or other drawing issues.

  32. Application startup still matters when most libraries are loaded. Most apps in KDE 4 seem to take just milliseconds to start, while in KDE 3 it often takes half a second or more.

    There are exceptions, but I rarely see the bouncing cursor in KDE 4, while I see it almost every time I start an application in KDE 3.

    To me, this matters more than drawing speed - even though I would love to see that one fixed as well.

  33. This bad scroll performance is still present in qt-4.4.3+. I tried different graphics cards (7000VE, 9250, 9550) with the radeon driver and compared qt3/qt4 scrolling.

