02 May, 2014

Teacups and storms!

Warning: wall-of-text blog. Don't read!

I see quite a few misconceptions and a fair bit of unhappiness around the new search infrastructure in KDE. While a series of small patches has probably fixed most of the problems by now, if you want to know what happened, read on. This is based on a mail I sent to the openSUSE-KDE list a few days ago, and on what I understood from the developers. I'm not exactly a coder myself, so I'm sure some minor errors have crept in. I'd be happy to fix them!

Why?

For starters, I would strongly suggest you read this article on the Dot to get a bit more of the background. Some people wondered if the new search got enough testing. We did quite a big testing push for this release: see this call to action and the live CD I kept updated. There was social media stuff, too, of course.

If you've read the article on the Dot, you will understand why the new search (Baloo is the technical name, which I'll avoid using) came to be: it is orders of magnitude faster and more reliable, and otherwise it simply reuses most of the previous infrastructure, so it should not really introduce new issues. For almost all users, it should be a big improvement over the previous version.

New issues?

This turned out to be true and wrong at the same time. The new search was so much more efficient at indexing that it could totally clog up the disk on certain systems. Depending on the kernel version, settings and hardware, the reading and writing could overwhelm Linux's I/O system and slow the entire computer to a crawl until indexing was done.

Usually, this would not take long, as the new search indexes so fast: a few minutes is enough for many gigabytes of data. But certain files cannot be handled properly by the indexer, among them text files over 20 megabytes. They take a long time to index. When it detects such a file, Search puts it on a 'bad list' and does not index it again.

Note that the old search had these same issues, but they were less noticeable among its general performance problems: it frequently ate lots of CPU anyway, mostly because its database (Virtuoso) was simply not well suited to desktop use!

As the new indexer processes files in batches of 40 (for performance reasons), it needs some time to detect a problematic file. In 4.13.0, a batch that does not finish within 5 minutes times out; the indexer then retries each half of the batch to see which half contains the bad file, splits that half in two again, and so on until it knows exactly which file is faulty. You can imagine this can take a while. The 5-minute timeout has now been shortened to about 2 minutes, so this should go faster, and the indexer has been improved to deal better with these files. Of course, very few people have such large text files lying around, but those who do got bitten painfully, and are probably happy that this has been mostly (but not completely...) resolved.
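The bisection described above can be sketched in Python. This is a minimal, hypothetical illustration, not Baloo's actual code: `index_batch` stands in for whatever the indexer does with a batch and is assumed to return `False` when the batch hits the timeout, and all function names are made up for this example. Each `False` costs one full timeout, which is why the sketch counts them.

```python
from typing import Callable, List, Set

BATCH_SIZE = 40  # batch size mentioned in the post

# Hypothetical callback: returns True if the batch finished in time,
# False if it hit the timeout.
IndexFn = Callable[[List[str]], bool]


def bisect_batch(files: List[str], index_batch: IndexFn, bad_list: Set[str]) -> int:
    """Index one batch, splitting it in half whenever it times out.

    A single file that times out on its own goes on `bad_list` and is
    skipped from then on. Returns the number of timed-out attempts.
    """
    pending = [f for f in files if f not in bad_list]  # never retry known-bad files
    if not pending or index_batch(pending):
        return 0
    if len(pending) == 1:
        bad_list.add(pending[0])  # found the culprit
        return 1
    mid = len(pending) // 2
    return (1
            + bisect_batch(pending[:mid], index_batch, bad_list)
            + bisect_batch(pending[mid:], index_batch, bad_list))


def index_all(files: List[str], index_batch: IndexFn, bad_list: Set[str]) -> int:
    """Split `files` into batches of BATCH_SIZE; return the total timeouts."""
    return sum(
        bisect_batch(files[i:i + BATCH_SIZE], index_batch, bad_list)
        for i in range(0, len(files), BATCH_SIZE)
    )


if __name__ == "__main__":
    files = [f"file{i}.txt" for i in range(40)]
    slow = {"file17.txt"}  # pretend this is a huge text file

    def fake_index(batch: List[str]) -> bool:
        return not any(f in slow for f in batch)

    bad: Set[str] = set()
    print(index_all(files, fake_index, bad), sorted(bad))  # 6 ['file17.txt']
    print(index_all(files, fake_index, bad))  # 0: the bad file is now skipped
```

For this example file, finding the culprit takes six timeouts: about half an hour with a 5-minute timeout, or about 12 minutes with a 2-minute one.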

Lastly, on laptops indexing stops immediately when the power cord is unplugged, so as not to drain your battery.

I hope the users who ran into trouble (often relatively early adopters) understand why we released Search: for all testers and developers, it worked and provided huge benefits. Why make users suffer by not releasing if you have something so much better?

About testing

If you use a repository like openSUSE's KDE:Current or a distribution that was quick to ship this release, you may have bumped into these and other issues. Certainly the developer, Vishesh, and everybody who helped testing (including myself) wanted to try out everything to make sure it was stable. I installed this on my work laptop, despite the risk, so I could test it in a realistic scenario. And I've sent various screenshots of issues, including performance problems, to Vishesh, who promptly fixed them. Few people are as responsive, responsible and hard-working as he is.

Which makes it all the more frustrating that a lot of people are loudly yelling at him and others for their work. I understand that some people have no time to test. I accept that. But then THEY should accept that this might mean their scenario, be it hardware, usage, or configuration, is NOT TESTED. Unfortunately, we can't delay releasing until every user has tested all our software; that would never happen. If it works for all testers, we release it. There is no sane alternative.

If you have a problem with that, it is the Universe you have a beef with, not us. This is the reality of volunteer work you're not paying for.

Configuration

Some users were unhappy that most of the configuration options of Search had disappeared. As most of the effort went into making the new search as fast and unobtrusive as possible, the UI received little love. The most important thing, the ability to exclude some or all user data from indexing, is there. Indexing can be disabled by simply adding your home folder to the 'exclude' list: at that point there is nothing left to index, so Search won't index anything.
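A minimal Python sketch of why excluding your home folder switches indexing off, assuming a simple rule in which a file is indexed only if it lies inside an included folder and inside none of the excluded folders. The rule, the function names and the paths are illustrative assumptions; Search's real configuration handling may differ in the details, for example for folders nested inside excluded ones.

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_inside(path: str, folder: str) -> bool:
    """True if `path` is `folder` itself or somewhere below it (Python 3.9+)."""
    return PurePosixPath(path).is_relative_to(folder)


def should_index(path: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Assumed rule: inside some included folder and inside no excluded folder."""
    if not any(is_inside(path, f) for f in include):
        return False
    return not any(is_inside(path, f) for f in exclude)


if __name__ == "__main__":
    home = "/home/anna"  # hypothetical user
    print(should_index(home + "/notes.txt", [home], []))                  # True
    print(should_index(home + "/notes.txt", [home], [home + "/Videos"]))  # True
    print(should_index(home + "/notes.txt", [home], [home]))              # False: home excluded
```

Because every user file sits below the home folder, putting that folder on the exclude list makes the check fail for all of them.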

Few users needed more, for example the ability to configure specifically which files are indexed and which are not. This is available in the configuration files, and Lindsay, one of the people unhappy about the lack of configurability, has made a GUI available: here. She had something nice to say about the code ;-).

You can see the GUI in the gif on the right. I expect that the group of users who need these advanced features but are not capable of installing this KCM by hand is extremely small, and in a newer release perhaps an 'Advanced' button or something similar can be added. If you want to read more about this, the discussion in this Google Plus thread may be interesting, especially the comment from Thomas Pfeiffer at the end.

On or off by default?

Search has been enabled in Plasma Desktop for quite a while now, and the new Search had proven itself to all testers to be far superior to the previous release. There seemed to be no reason not to enable it. Moreover, search is something you'd expect in 2014: every competitor ships it by default, and most users use it without even being aware it exists. The more technical users who know about the various search technologies are usually savvy enough to turn it off if they want to.

Seriously, Vishesh is convinced that a Raspberry Pi running Search could index a terabyte hard drive over USB in less than 15 minutes. Honestly, he told me 5 minutes, but I just can't believe that, so I tell people 15... It'd be fun if somebody could try this out and tell us how long it really takes.
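For a sense of scale, the claim can be turned into the throughput it would imply. This Python snippet is a back-of-the-envelope upper bound that assumes a decimal terabyte (10^12 bytes) and that every byte has to be read; an indexer that only reads part of each file would need less.

```python
TERABYTE = 10**12  # assumption: decimal terabyte, in bytes


def required_throughput(total_bytes: int, minutes: float) -> float:
    """Bytes per second needed to read `total_bytes` in `minutes`."""
    return total_bytes / (minutes * 60)


if __name__ == "__main__":
    for minutes in (5, 15):
        gb_per_s = required_throughput(TERABYTE, minutes) / 10**9
        print(f"{minutes} minutes: {gb_per_s:.2f} GB/s")
    # 5 minutes: 3.33 GB/s
    # 15 minutes: 1.11 GB/s
```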

Why can't I choose?

Some users on the openSUSE KDE list complained in various shapes and forms about a 'lack of choice'. Right now there is no second search infrastructure for Qt/KDE applications, so until somebody writes one, it seems rather pointless to ask developers to put in extra work and create extra complexity just to be able to swap out Search for something else.

What about KNotes?

Quite a few users had trouble with KNotes: it didn't import their notes properly and lacks some functionality compared to the old version. Why did this get shipped?

This is, like many things, a matter of resources. The old version could no longer be maintained. At that point, we could either drop it or rewrite it so that it can be easily maintained. A developer was willing to do the latter, so that is what happened. The rewrite loses some functionality, and it clearly has a bug: it does not always import all notes properly. This WAS tested, and it worked for the developers and at least for the people who tested, so I guess some users were just unlucky. Or did not help test...

Again, the same applies: if you had helped test, the importing could have been more robust. In this case, too, I know the developer who did this work personally, and he is extremely responsive to bug reports. If you had no time, that is fair, but then accept that we don't get paid to test and can't pay people to test, so this is all we could do. We would love to get 100 extra paid developers and another 100 full-time testers. And pink unicorns.

Of course, if you think we made the wrong choice and should simply have dropped KNotes, you can remove it yourself and tell yourself that KDE did not have enough volunteers to maintain the old version and KNotes is no more. That, too, sucks, and you can blame any random person (just pick a colleague sitting next to you in the office, for example) for not stepping up and maintaining KNotes. The alternative is to be happy that somebody at least did the work to keep it around, and to hope that they will find time to add the missing functionality rather than being discouraged by all the negativity and dropping the app completely...

Again, I'm sorry for the issues. So are many KDE developers. But we can't change the world as it is, and yelling at us doesn't help. On the contrary: there is little reason to put in your free time when people just yell at you for doing something for them for free.

Documentation

With the new Search being, well, new, not much documentation has been updated yet. I've put in a few hours to update the online documentation here. If you are missing anything, please add it!


If you have questions, ask in the comment section. But if you comment, at least be reasonable and realistic. We can't bend reality, so don't expect us to. And be fair: put in some effort to understand what we do and why before you complain. I don't think that is too much to ask.

As Bruce Lee said: A wise man can learn more from a foolish question than a fool can learn from a wise answer.

I hope the above helps answer some questions!