Infrastructure

There are a bunch of legacy issues at my new job. Many of them, I believe (I'm not completely sure, I'm still pretty new after all), stem from the once heavy use of IRIX and its peculiarities. We are only just reaching a point at which we can do away, once and for all, with that platform. It's a very complex ecosystem we're dealing with here. Complex, and in some spots delicate. But what surprises me more than a little is that, despite the fact that my new job is at one of the most respected institutions in the country — and one of the more advanced computer labs as well — I find myself facing many of the same issues I tackled in my last position. Those challenges have a great deal to do with creating a simpler, more elegant, more efficient computing environment. And, though the user base has changed dramatically — we're dealing with a much more technically sophisticated group of professionals here, not students — and the technological and financial resources are much more vast, the basic goals remain the same, as do many of the steps to accomplishing those goals. And the one thing those steps all have in common, at least at this stage of the game, is infrastructure.

What makes infrastructure so crucial? And why is it so often overlooked?

Infrastructure is key to creating efficient, and therefore more productive, work environments, whether you're in art, banking, science, you name it. If your tools are easy and efficient to use you can work faster and make fewer mistakes. This allows you to accomplish more in a shorter period of time. Productivity goes up. Simple. Infrastructure is a lot like a kitchen. If it's laid out intelligently and intuitively you can cook marvels; if it's not you get burned.

Infrastructure, for our intents and purposes, is the back-end of the computing environment. Everything you do between computers — that is, every interaction that takes place between your computer and another computer, be it file server, authentication server, web server, what have you — relies on some sort of infrastructure. I'm referring to network infrastructure here, to be sure, but also to the processes for adding and accessing computer resources in any given facility. How, for instance, do we roll out new workstations? Or update to the latest operating system?

Typically, it is Systems Administrators, the very people who know (or should know) the importance of infrastructure, who work between computers the most. We of all people should know how important a solid infrastructure is, even for the simple act of basic troubleshooting: if your infrastructure is solid and predictable, the paths you have to troubleshoot are fewer and simpler, making your job easier and making you better at it at the same time. Yet infrastructure, time and again, is left to stagnate for a variety of reasons.

I'd like to enumerate a few of those reasons, at least the ones I suspect factor most strongly:

  1. Infrastructure is difficult: Infrastructure planning, like, say, interface design, is complicated, and implementing it successfully often requires numerous iterations, coordinated effort and a willingness to change.
  2. Infrastructure requires coordination: Infrastructure changes often require re-educating the user base and creating a collective understanding as well as clear policies on how things are supposed to work.
  3. Infrastructure is not sexy (to most people): The benefits of infrastructure reorganization are often not immediately apparent, or even immediately beneficial for that matter. You might not see the benefits until long after a reorganization.
  4. Infrastructure can be expensive: If an infrastructure requires a major overhaul, the cost can be high. Couple that with less-than-immediate benefits and you often meet with a great deal of resistance from the money people, who feel that they'd be better served buying new workstations than a faster switch.
  5. Change is scary: You know it is.

I've been extraordinarily lucky in that I've been able to learn about infrastructure in a sheltered environment, that of the Education sector, which allowed me copious downtime (unheard of elsewhere) and a forgiving user base. (Students! Pfft!) I'm still pretty lucky in that A) I'm working somewhere where people, for the most part, get it; and B) I have some time, i.e. I'm not just being brought in as a consultant. This last bit is really fortunate because it affords me both ample opportunity to gain an understanding of the environment I'm trying to change and the time in which to change it. This is not to say that this sort of thing can't be done in consulting. But it's certainly a much harder sell, and one I'm glad I don't really have to make to such a degree.

Still, with all that, I've got my work cut out for me.

When I first arrived on the scene, The New Lab was using (actually, still is to a large extent) NIS for user authentication. Now this is something I know a bit about, and I can tell you (if you even remember what NIS is anymore) NIS is very, very passé. And for good reason. NIS is like the Web 1.0 of user authentication: it uses flat files rather than databases and is extremely cumbersome and inflexible. Moreover, it is not well-suited to cross-platform operation. It is completely end-of-life and obsolete. To continue to invest in NIS is silly. So one of my first duties was to build an Open Directory server, which relies on numerous databases, each suited to authentication for a given platform. The OD server will be both easier to use (creating users is a breeze) and more capable than any NIS server could ever hope to be (by allowing cross-platform integration down the line, if desired). But until now, for some reason, no one had done this. Partly, maybe, it's just inertia: NIS works fine enough, it's not that big a problem. And maybe it's partly happening now because this is something I just happen to know a lot about and can make happen quickly and effectively. Because of my background, I also see it as a huge problem: by slowing down the user creation process, you're hindering productivity. And not just physical productivity, but mental productivity. If I have to spend twenty minutes creating a user, not only have I wasted that time on something trivial, but I've expended far too much mental energy on a task that should be simple. And this makes it more difficult to get back to Work That Matters. Again, the beauty of being on staff is that I have time to introduce this gradually. To gradually switch machines over to the new server. To gradually get the user base used to the slightly new way of doing things before we move on to the next item up for bid.
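
To make the flat-file complaint a little more concrete, here is a minimal Python sketch, not what The New Lab actually runs. It assumes the NIS passwd map follows the standard seven-field, colon-delimited passwd format. The sample users, the helper names (parse_passwd_line, next_free_uid) and the starting UID of 1000 are all made up for illustration. Even something as basic as finding an unused UID means reading and parsing every record by hand:

```python
from dataclasses import dataclass


@dataclass
class PasswdEntry:
    """One record from a passwd-style flat file (seven colon-separated fields)."""
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Parse a single passwd-format line into a PasswdEntry."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, password, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


def next_free_uid(entries, start=1000):
    """Return the lowest UID >= start that no existing entry uses."""
    used = {entry.uid for entry in entries}
    uid = start
    while uid in used:
        uid += 1
    return uid


# Hypothetical sample data, invented for this example.
SAMPLE_MAP = """\
alice:x:1000:100:Alice Example:/home/alice:/bin/tcsh
bob:x:1001:100:Bob Example:/home/bob:/bin/bash
carol:x:1003:100:Carol Example:/home/carol:/bin/bash
"""

entries = [parse_passwd_line(line) for line in SAMPLE_MAP.splitlines()]
print(next_free_uid(entries))  # prints 1002
```

A directory server handles this kind of bookkeeping for you, which is a big part of why creating users there feels like a breeze by comparison.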

So far, so good.

I've talked to my co-workers as well, and they're all primed to make changes happen. That's really good. We're talking about all kinds of things: re-mapping the network, asset management with DHCP, redoing the scheduling system, and others I can't even think of right now. User authentication took years at my old job. It was, in many ways, a much more complex network than this new one (we don't manage an external network, thank God). But this place has its own set of complexities and challenges, and though the authentication server is basically done, there is a whole host of things I could see happening in the realm of infrastructure. And they're all right there... See them? Just over the horizon.

Should be fun.

There are a few basic things I like to keep in mind when preparing for and making major infrastructure changes. These are the types of problems I look for that give me purpose:

  1. Repeat offenders: What problems crop up again and again on the user side? What questions get asked over and over? These are indicators that something is functioning sub-optimally, or that a process could be more intuitive.
  2. Personal frustration: What parts of my job are frustrating or infuriating to me? These are usually indicative of a problem, as I tend to get frustrated with things that don't work well. Either that or I need more coffee.
  3. Redundant errors: Is there a process that tends to yield mistakes on a regular basis? If so it could probably use some automation or clarification at some point. Sometimes all you need is a clear policy or workflow.
  4. Long-term problems: Is there something that everyone complains about, but that just "never gets fixed?" Betcha ten bucks it's an infrastructure problem.
  5. The workflow: How do people in the facility currently work? What's the pipeline for getting things done? Are they spending their time working on tech issues when they should be working on production? How could this be easier?

There are probably more, but these are the general things I'm thinking about when considering infrastructure changes. And the better I can understand the people and the technology in a facility, the more informed my decisions can be with regard to those changes.
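
The "repeat offenders" check is the easiest of these to rough out in code. What follows is only a sketch under some loud assumptions: that help requests get logged somewhere as (category, description) pairs, that a threshold of three repeats is worth a look, and that the sample tickets below (entirely invented) resemble anything real. The function name repeat_offenders is hypothetical too.

```python
from collections import Counter


def repeat_offenders(tickets, min_count=3):
    """Return (category, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(category for category, _description in tickets)
    return [(category, n) for category, n in counts.most_common() if n >= min_count]


# Hypothetical help-desk log, invented for this example.
SAMPLE_TICKETS = [
    ("password-reset", "forgot password after vacation"),
    ("printer", "cannot find the third-floor printer"),
    ("password-reset", "locked out of workstation"),
    ("license", "render license expired"),
    ("printer", "print job stuck in queue"),
    ("password-reset", "password expired"),
    ("printer", "wrong default printer"),
    ("password-reset", "new account cannot log in"),
]

for category, count in repeat_offenders(SAMPLE_TICKETS):
    print(f"{category}: {count}")
# prints:
# password-reset: 4
# printer: 3
```

Anything that keeps landing at the top of a list like this is a decent candidate for an infrastructure fix rather than yet another one-off repair.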

Finally, there are some basic notions I keep in mind when proceeding with infrastructure changes:

  1. Simplify: The simpler solution is almost always best, both for the admin and the user. Building a simple solution to a problem is often exceedingly difficult and, I might point out, not necessarily simple on the back-end. But a simple workflow is an efficient one and is usually my prime directive.
  2. Centralize: It's important to know when to centralize. Not everything benefits from centralization, obviously. If it did we'd all be using terminals. Or web apps. For everything. But properly centralizing the right resources can have a dramatic effect on the productivity of a facility.
  3. Distribute: Some resources should be distributed rather than (or in addition to being) centralized. Some things will need redundancy and failover, particularly resources that are crucial to the operation of the facility.
  4. Educate: Change doesn't work if no one knows about it. It's important to explain to users what's changing and also why. Though I've met with resistance to changes that would actually make a user's job easier (this is typical), making users aware of what is changing and why is the first step in getting them to see the light.

It's true that infrastructure changes can be a bit of a drag. They are difficult. They're hard to justify. They piss people off. But in the end they make everything work better. And as SysAdmins, who are probably more intimate with a facility's resources than anyone, we stand to gain as much as our users (if not more!). And they stand to gain quite a bit. It's totally win-win.