26 June, 2013

Debugging a system without a desktop

Sometimes, the Linux desktop bites you in the ass: after booting up, all you get is a command line. I've run various bleeding-edge distros over the years, including Cooker, Unstable, Factory and unmasked stuff, and I've built my own kernels, desktops and more.

Seeing a request for help today made me decide to write down the steps I go through when my favorite desktop doesn't appear in the morning... I bet there's plenty to improve, so tips are welcome!

General approach

I try to work quickly from the bottom up: test the lower pieces of the stack first and move up until something breaks. Digging through the logs will most likely turn up the issue eventually, but I've found that firing off a few quick commands to catch common problems is often faster than immediately combing through logs... So: basic things first, then dive into the logs.

Step 1: common issues

Of course, I first think back to what I recently changed or updated, as that's the likely culprit. But to keep this generic, let's assume that doesn't tell us anything. Then it's time to look at a few common causes of problems. Log in on a console with the root user name and password.
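If you don't remember what changed, the package manager usually does. Here's a minimal sketch, assuming either a Debian-style /var/log/dpkg.log or an RPM database (rpm -qa --last lists installed packages newest first). The function name recent_packages and the DPKG_LOG override are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the most recently installed or upgraded packages.
# Assumes a Debian-style dpkg log (path can be overridden via DPKG_LOG)
# or an RPM database (rpm -qa --last sorts by install time, newest first).
recent_packages() {
  local n=${1:-20}
  local log=${DPKG_LOG:-/var/log/dpkg.log}
  if [ -r "$log" ]; then
    # dpkg writes one line per action; keep installs and upgrades, newest last
    grep -E ' (install|upgrade) ' "$log" | tail -n "$n"
  elif command -v rpm >/dev/null 2>&1; then
    rpm -qa --last | head -n "$n"
  else
    echo "no dpkg log or rpm database found" >&2
    return 1
  fi
}
```

Run recent_packages 20 as root and compare the dates with when things stopped working.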

Disk space

First up for me are disk space and the contents of /tmp. A full disk can lead to the weirdest problems, which from an end user's perspective often make no sense. So check how much space there is on your devices:
df -h
and check the output for drives that are (nearly) 100% full:
Filesystem       Size   Used   Avail   Use%   Mounted on
devtmpfs         7.7G   8.0K    7.7G     1%   /dev
tmpfs            7.7G   5.2M    7.7G     1%   /dev/shm
tmpfs            7.7G   319M    7.4G     5%   /run
/dev/sda1          2G   704M    1.2G    63%   /boot
/dev/sda2         20G    14G    5.0G    74%   /
tmpfs            7.7G      0    7.7G     0%   /sys/fs/cgroup
tmpfs            7.7G   139M    7.6G     2%   /tmp
tmpfs            7.7G   319M    7.4G     5%   /var/lock
tmpfs            7.7G   319M    7.4G     5%   /var/run
/dev/sda3        213G    74G    139G    35%   /home

Obviously, if a drive seems full (less than 100 MB free), it's time to clean up. I trust you know cd [directory], rm [filename] and rm -rf [directory] well enough to get this done...
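To spot nearly full filesystems without eyeballing the table, the df output can be filtered. A minimal sketch: low_space is a hypothetical helper, and it assumes POSIX-format output from df -Pk (1024-byte blocks, one line per filesystem) and mount points without spaces in them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print filesystems with less than MIN MiB available
# (default 100). Reads `df -Pk` output on stdin: column 4 is available
# space in KiB, column 6 the mount point. Mount points with spaces break it.
low_space() {
  local min_mib=${1:-100}
  awk -v min="$min_mib" 'NR > 1 && $4 < min * 1024 {
    printf "%s: %d MiB free\n", $6, $4 / 1024
  }'
}
```

Run it as df -Pk | low_space 100 to list every mount point with less than 100 MiB available. On GNU systems, du -xsh /var/* | sort -h then shows which directories take up the most room.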

Temporary directory

All looks good here, so on to the next check. If /tmp is not mounted on a RAM disk (in my case it is tmpfs, which a reboot empties anyway), just clear everything in there to make sure nothing in it is causing issues. rm -rf /tmp/* will do the trick. Yes, that's taking a cannon to a fly, but applications should not store important data in /tmp (hence the name), so this is safe™. Also, check the permissions of /tmp, just to be sure:
ls -la / | grep tmp
drwxrwxrwt 30 root root 760 26.06.2013 15:37 tmp/

Yes, /tmp should be readable and writable by anyone, and the final t (the sticky bit) makes sure users can only delete their own files in there, so the permission string rwxrwxrwt is important. If it is anything else, execute
chmod 1777 /tmp, remove the contents of /tmp again with rm -rf /tmp/* and try again.
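The permission check and the clean-up can be done in one go. A sketch with a hypothetical fix_tmp helper, assuming GNU stat (for stat -c %a) and GNU find. Note that rm -rf /tmp/* does not match dotfiles, so it leaves stale X lock files such as /tmp/.X0-lock behind, and those can stop X from starting; find -mindepth 1 -delete removes them too. Only run this from a console with no graphical session running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure DIR (default /tmp) has mode 1777, i.e.
# rwx for everyone plus the sticky bit, then empty it, dotfiles included.
# Assumes GNU stat (-c %a prints the octal mode) and GNU find.
fix_tmp() {
  local dir=${1:-/tmp}
  local mode
  mode=$(stat -c %a "$dir") || return 1
  if [ "$mode" != 1777 ]; then
    echo "$dir has mode $mode, resetting to 1777"
    chmod 1777 "$dir"
  fi
  # Unlike rm -rf "$dir"/*, this also deletes hidden files like .X0-lock
  find "$dir" -mindepth 1 -delete
}
```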

Update the system

Third, I always quickly run zypper refresh && zypper update. If an update broke it, let's see if a newer update can fix it ;-)
Of course, use the commands appropriate for your distribution: apt-get update && apt-get upgrade, pacman -Syu, and so on.
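If you hop between distributions, a tiny wrapper can pick the right command. A sketch only: update_system and the DRY_RUN switch are made-up names, and the function simply checks which of zypper, apt-get or pacman is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the usual update command for whichever package
# manager is installed. With DRY_RUN=1 it only prints the command.
update_system() {
  local cmd
  if command -v zypper >/dev/null 2>&1; then
    cmd="zypper refresh && zypper update"
  elif command -v apt-get >/dev/null 2>&1; then
    cmd="apt-get update && apt-get upgrade"
  elif command -v pacman >/dev/null 2>&1; then
    cmd="pacman -Syu"
  else
    echo "no known package manager found" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$cmd"
  else
    sh -c "$cmd"   # needs root
  fi
}
```

DRY_RUN=1 update_system shows what would run; without it, call the function as root.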

Step 2: does X work?

Second is seeing if the graphical system (X.org/xorg/X11, or just 'X') still works. Type
startx /usr/bin/xterm
This will start a simple terminal in the graphical environment and nothing else. As failsafe as it gets... If it works, go to step 3; if not, see below.

X not working?

Now it's time to check the output of the startx command. Any hints there as to what's wrong? If not, look in the X.org log file, /var/log/Xorg.0.log (less /var/log/Xorg.0.log will do), and see where you have errors. The Xorg log is long, but it has a neat way of showing errors: the code between parentheses tells you what kind of message the line contains. Example output:
[199842.753] (WW) Falling back to old probe method for fbdev
[199842.753] (II) Loading sub module "fbdevhw"
[199842.753] (EE) Failed to load module "modesetting" (module does not exist, 0)

First comes a timestamp, then the code (WW is a warning, II is for your information; look for EE, meaning error!) and then the message itself, which is what you try to solve. In this case, a module doesn't exist, which suggests xorg didn't upgrade properly or got partially uninstalled. Search your package database for X11 packages and see if any possibly important ones are missing:
zypper se x11
Loading repository data...
Reading installed packages...

S | Name                          | Summary              | Type
--+-------------------------------+----------------------+-----------
[cut for simplicity]
  | xorg-x11-driver-input         | [cut for simplicity] | package
i | xorg-x11-driver-video         | [cut for simplicity] | package
  | xorg-x11-driver-video         | [cut for simplicity] | srcpackage
  | xorg-x11-driver-video-nouveau | FOSS nVidia driver   | package
[cut for simplicity]
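Instead of paging through the whole log, you can pull out just the error lines. A minimal sketch with a hypothetical xorg_errors helper; it anchors on the timestamp because the legend of markers near the top of the log typically contains the text "(EE) error" as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the (EE) lines of an X.org log
# (default /var/log/Xorg.0.log). Requiring the "[ 123.456] " timestamp
# first skips the marker legend at the top, which also mentions (EE).
xorg_errors() {
  local log=${1:-/var/log/Xorg.0.log}
  grep -E '^\[ *[0-9]+\.[0-9]+\] \(EE\)' "$log"
}
```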


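In this output, xorg-x11-driver-input has no i (installed) in the first column, so the input driver might be what's missing. Or, if you have NVidia hardware and don't use the proprietary driver, well, nouveau is what you need, and it isn't installed here either!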

For users of NVidia and AMD hardware, a likely culprit is the proprietary driver. You're in for a world of pain; I'd just remove it and re-install it once your graphical system works again. For NVidia (the only one I have experience with), I usually remove every package with nvidia in its name and install the nouveau driver mentioned above. Find more Xorg debugging solutions on this excellent Fedora wiki page. Once startx works, I'd reboot.
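Before removing anything, it can help to see which graphics driver the kernel actually loaded. A sketch that reads /proc/modules (the same data lsmod shows); gpu_modules is a made-up name, and the module list only covers the common NVidia and AMD drivers (nvidia*, nouveau, fglrx, radeon, amdgpu).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list loaded kernel modules belonging to common
# NVidia/AMD graphics drivers. Reads /proc/modules by default; the first
# field of each line is the module name.
gpu_modules() {
  local modfile=${1:-/proc/modules}
  awk '$1 ~ /^(nvidia|nouveau|fglrx|radeon|amdgpu)/ { print $1 }' "$modfile"
}
```

If both nvidia and nouveau show up, the two drivers are fighting over the card.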

Step 3: does the desktop work?

Third is finding out whether your desktop (Plasma Desktop, GNOME Shell etc.) works. Exit the xterm (note that the mouse pointer has to hover over the window before you can type in it) and now run startx without the xterm argument:
startx
Because you're root, you have a relatively 'clean' user account (you should never use the root account for your daily desktop; we're just testing here). If it doesn't work, you could try removing the desktop settings of the root account. For a KDE Plasma Desktop that means (on some distributions the folder is ~/.kde instead):
rm -rf ~/.kde4
Note that I again assume you don't actually USE this account, so its settings are inconsequential. Don't do this on a normal user account: you'll lose lots of settings, recent files and even data!

If the problem is in the user settings, you should by now be in your desktop of choice, logged in as root! If that is indeed the case, skip to step 4. If not, let's find out what is wrong.

Desktop doesn't come up?

OK, step back. xterm did work, but Plasma/Shell/XFCE don't? Let's see if we can start them FROM an xterm and catch a glimpse of how far they get!
startx xterm
Back in the terminal? OK, let's start Plasma Desktop as an example. Type
startkde
and watch KDE attempt to start. You should now see some messages, and hopefully you can take it from there. Going any deeper doesn't really fit this blog post; Google is your friend from here onwards...
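Those messages can scroll by quickly, so it helps to keep a copy to search through later. A sketch, assuming bash (it uses PIPESTATUS); run_logged and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, show its output (stdout and stderr)
# on screen and save a copy to LOG. Returns the command's own exit status,
# not tee's, via bash's PIPESTATUS array.
run_logged() {
  local log=$1
  shift
  "$@" 2>&1 | tee "$log"
  return "${PIPESTATUS[0]}"
}
```

From the xterm: run_logged ~/startkde.log startkde, then less ~/startkde.log once it has failed.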

Step 4: User config broken!

Now we know approximately where the problem is: the graphical system works and the desktop works, so it's in the user settings. Time to dig there. I'd say it's 90% certain the issue is in the configuration of your desktop, although an application could also be botching things. Let's start with startx and xterm again. Log in on the console as your normal user and type
startx xterm and then startkde (or the equivalent for your desktop). Observe the failure and see if it tells you anything. If that doesn't help, next up is the Real Hard Work: finding out which setting in your ~/.kde4 configuration folder is the culprit.

I suggest backing up your config folder first:
cp -a ~/.kde4 ~/kde4-config-backup
Then go in and randomly nuke files and folders until the problem goes away. OK, random is a bit harsh: I'd start by removing ~/.kde4/share/config, and if that doesn't solve it, put it back and remove ~/.kde4/share/apps instead. One of these two most likely contains the problem...
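Renaming is gentler than nuking, because nothing is lost if you guessed wrong. A sketch with two hypothetical helpers, disable_path and restore_path, that move a folder aside and back. Since KDE recreates missing folders with defaults on the next login, restore_path parks that fresh copy as .fresh instead of deleting it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: rename a settings folder out of the way, test,
# then put it back.
disable_path() {
  # Refuse to overwrite an earlier .disabled copy
  if [ -e "$1.disabled" ]; then
    echo "$1.disabled already exists" >&2
    return 1
  fi
  mv -- "$1" "$1.disabled"
}

restore_path() {
  if [ ! -e "$1.disabled" ]; then
    echo "no $1.disabled to restore" >&2
    return 1
  fi
  # KDE may have recreated the folder with defaults; keep it as .fresh
  if [ -e "$1" ]; then
    rm -rf -- "$1.fresh"
    mv -- "$1" "$1.fresh"
  fi
  mv -- "$1.disabled" "$1"
}
```

For example: log out of the desktop, disable_path ~/.kde4/share/config, then log in again. If the problem persists, log out, restore_path ~/.kde4/share/config and try ~/.kde4/share/apps next.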

More?

In the end, you'll have to find and solve the problem by googling and simply some hard work... The above is the guide I personally follow, essentially a heuristic built on lots of fixing. I bet lots of readers have tips and tricks that are far more helpful, and I'll happily update this post to include them...

Hugs and good luck!