Why We Tell You To Reboot: Redux

I recently wrote an article, entitled Why We Tell You To Reboot, describing a Final Cut Pro bug that, after I had gone to great troubleshooting lengths, I was ultimately able to solve only with a simple reboot. Shortly after posting it, I received a single comment from a fellow admin and blogger:

"You really tell people to reboot for no particular reason?

I don’t believe we should accept that standard from OS X, and what kind of an IT person are you if you’re blindly proposing solutions without any reasoning to back them up?"

What kind, indeed.

I posted a response to the comment that explained my position in a nutshell, but I thought the question was worth a follow-up, both for thoroughness and for readers who didn't happen by the comments section of the previous article or who may have had a similar reaction.

Reboot!

When a user comes to me with a problem, my primary goal is to fix the problem and get the user working again. Typically what happens is that we have a little discussion about what's going on. Once I feel I have a good handle on the symptoms, often the next words out of my mouth are, "Did you reboot?" If the answer is no, and the situation permits, I will recommend that they do so. Rebooting is often my first step in troubleshooting.

I believe (though I can't be completely sure) that my commenter took issue with the approach because blindly telling the user to reboot to solve problems gives the sysadmin no information about what those problems are and what caused them. But in fact, as I'll demonstrate in a moment, it's not blind, and it does tell us one important thing: rebooting either solves or doesn't solve the problem. This in and of itself can be crucial troubleshooting information if there is a deeper problem at work.

But the fact of the matter is that, probably 80% of the time, there is no deeper issue. Rebooting routinely fixes problems that have no other practical solution, such as the one I described in my article. Moreover, it gives the end user a troubleshooting method that is likely to achieve the desired result, getting the system or an application back to a working condition, without the need for admin intervention. This is win-win: it saves both the user and me time and energy and, by showing whether or not a reboot helps, still provides valuable troubleshooting information.

I would even argue that rebooting should almost always be the first step in troubleshooting. When a user comes to me with a problem, I have no idea what they've been doing on that system. I have no idea how many or which applications they have open. I have no idea what sorts of preferences they've set. There's simply no way for me to reliably predict the state the user has put their machine in, and thus whether this is a system-level or a user-level problem. The only way for me to get things back to some semblance of a known, working state is to reboot the system. Rebooting has myriad benefits, not the least of which are clearing stale caches, recreating network connections, and freeing up RAM and disk space. In fact, it seems almost crazy to proceed with most troubleshooting without first rebooting.

You may have noticed that I keep saying that rebooting should almost always be the first troubleshooting step. That "almost" is there because, obviously, there are times when rebooting is not a good first step, primarily when a user stands to lose work by rebooting. If an application is hung and the user hasn't saved their document, for instance, I don't tell them to reboot. Rebooting would be bad in this instance. Also, I usually try other troubleshooting methods first on my own systems, with which I am better acquainted (though this is often to my own detriment, since rebooting would have been quicker and easier, as happened with the Final Cut bug). Another instance in which I avoid a reboot is when there is a persistent problem that is not solved, or is only temporarily solved, by a reboot. Then I do need to get on that system and attempt to understand the problem. And that's what I do.
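
For the programmers in the audience, the exceptions above boil down to a short triage rule. Here is a rough Python sketch of it. The function name, its two parameters, and the returned labels are all hypothetical and purely illustrative; they don't come from any real tool, and real-world judgment is messier than two booleans.

```python
def first_step(unsaved_work: bool, recurs_after_reboot: bool) -> str:
    """Pick a first troubleshooting step (illustrative sketch only)."""
    if unsaved_work:
        # Rebooting would cost the user their work, so deal with that first.
        return "recover work first"
    if recurs_after_reboot:
        # A reboot has already failed to fix this for good: time to dig in.
        return "investigate"
    # The common case: get the machine back to a known, working state.
    return "reboot"
```

With nothing unsaved and no history of the problem coming back, first_step(False, False) returns "reboot", which is the case I see most of the time.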

But, because Mac OS X is very reliable on the whole, these instances are extremely rare in my experience. The majority of problems are minor and are easily and permanently rectified by a simple reboot. I stand behind that recommendation, and any search of Mac troubleshooting articles will reveal that the advice is almost universal. That's because it works.

So, hopefully it's clear by now that I'm not "blindly proposing solutions without any reasoning to back them up." Hopefully it's clear now that there are a lot of good reasons to try rebooting as a first troubleshooting step.

And hopefully it's clear that the kind of sysadmin I am is the kind that likes to get his users back up and running again with a minimum of friction.