Three Platforms, One Server Part 7: Testing...

So, our primary authentication server, which will be used to enable network home accounts for our entire internal network — Mac, Windows and Linux — is up and running. I've installed it in our server room, put it on the KVM, and switched over all the workstations. So far, so good. (My fingers are so crossed they hurt.)

For the Windows quota requirements we went with the first solution mentioned in this article, and outlined in detail here: Windows roaming profiles are on a separate, quota-enabled volume on the authentication server. This — and the Windows implementation in general — is my biggest concern, and will require a great deal of testing. Summer Session has begun, and this will give us the opportunity to test on live subjects as a limited pool of students begins using the systems for work again.

There are a couple of immediate benefits of this new system. For one, users can now change their password lab-wide from any Mac in the lab. Now, users don't tend to do this much, but they can if they want to, and it's easy as pie. But as an admin, the most exciting benefit to all this is the ease with which new users can now be created. In the past, creating a new user was a multi-step, multi-computer process that involved coordination between multiple sysadmins: Admin 1 gets the user info, creates the Linux and Windows accounts and hands it off to admin 2, who then creates the Mac account and uploads the Mac skel. This process was lengthy and extremely error-prone. Using the new system, user creation is a breeze. One single, easy-to-use script on the authentication server pretty much does it all. Any sysadmin — or even a trained assistant — can do it. Just log in to the authentication server, run the script, and you're basically done. It's fast, easy and accurate. Which is really the whole reason we're doing all this. I'm pretty happy about it.

There are a few minor details I need to work out. The main one is file sharing. In addition to being an authentication server, our original Mac server is also a file server. On it there are numerous shares for various purposes — one for staff, one for students, etc.. At this point I'm thinking of keeping the file server and the authentication server separate. This will reduce the load on both, and mitigate the need to migrate any data. The catch is that user authentication data on the file server will need to match that of the new authentication server. This should be a simple matter of changing the old server's role to "Connected to a Directory System" and pointing it at the new authentication server. I've never done this though, and it's complicated by the fact that the old server is running Panther whereas the new one is running Tiger. Probably the best thing to do will be to wipe the old server (yes, I have a backup) and put Tiger on it, then redo the shares. But I'll probably test with Panther first, just to see what happens. I'll let you know.

In any case, this is, again, a minor problem that shouldn't affect the overall plan much and should be relatively easy to figure out and implement. It's just one of those little things you forget about when you're drawing up your master plan for world conquest. "Oh yeah... What about the file server?..."

Otherwise, the conversion is on and seems to be going smoothly so far. Once we get into serious testing I'll post the results. Hopefully this will all be done in the next few weeks and the conversion will finally be complete.

Three Platforms, One Server Part 6: More Windows Quota Problems

Of course I knew it was too good to be true. I've found the first fatal flaw in my plan to unify authentication on the internal network. It goes back to the Windows quotas problem I studied some time ago, and to which I'd thought I'd found a solution.

I won't go into great detail about the problems and various solutions to the Windows roaming profile issue. I've already written plenty on it, and the previous posts outline it all fairly well. I will say that the intended solution was to provide Windows roaming profile quotas by setting them locally on each workstation. But, last week, as we moved forward with the plan, one of my fellow sysadmins, who is far more capable with Windows than I happen to be, pointed out the fact that certain applications (i.e. Photoshop, Maya, etc.) need a certain amount of disk space for temp files and what-not in order to operate. Setting small local quotas effectively keeps these applications from running properly.

We are currently testing a few other scenarios in which quotas for Windows roaming profiles can be implemented to our satisfaction:

  1. The Authentication Server (Mac OS X Server 10.4.6) has a separate volume for Windows roaming profile storage and that drive has quotas enabled (this was the original plan). The drawback to this is that the user's home account data is stored separately for Mac and Linux — which keep this data on the home account server — than for Windows, which would store this data on a reserved volume. The other drawback is the fact that the client machine is not aware of the quotas until the user logs out and the Windows client attempts to upload the new data, at which point Windows issues an error if the user has exceeded his quota. But the user is not warned about quota violations until logout, and this could cause some minor problems.
  2. The Windows Server continues to host roaming profiles and determine quotas for Windows users, but gets user authentication information through a connection to our Mac Authentication Server. The drawback here is that we have to continue using the Windows Server, which we don't really want to do, though it does seem to give us a slightly higher level of control than does the Mac Server. This method is also complicated by the fact that we are currently running Windows Server 2000, which does not include native authentication to LDAP, so third-party solutions would be required. This method could also complicate user creation.
  3. Through some combination of Windows and Mac Servers, we convince the Windows roaming profiles to be situated on the Temp volume of the local workstations, rather than the default location on the "C" drive in the "Documents and Settings" folder, and then set quotas for the Temp volume. I'm skeptical that this is even possible. Windows seems to be hard-coded in ways that make specifying the location of roaming profiles anywhere other than "C:\Documents and Settings" impossible. So this last option seems the least likely to succeed, though if it did it would match the way Mac and Linux behave much more closely. And if we could figure out how to do this without the Windows server, it would be almost ideal.

(Actually, I just thought of a problem with this last method: The Windows Temp drives get erased every Friday. If users happen to be working during the deletion period, what happens to their roaming profiles? The same thing happens on the Mac, but the deletion script on Mac does not delete work owned by the currently logged in user. Can such a scenario be implemented on Windows?)

Most likely we will go with the first solution. We already know that it works. It's a little extra effort when creating new users, but that's totally scriptable. We plan to do user creation from a single script from now on anyway, so these extra few steps, once incorporated into our user creation script, won't really be a big deal at all. The only other problem with this is that our replica will now need to sync the Windows roaming profile volume as well if it's to work as a proper fallback system. This, too, should not be terribly difficult to accomplish. Overall, this solution is less elegant than the original one, but it should be workable. Hopefully, Windows Vista will mitigate a lot of these problems. (Yes, that was totally a joke. Please chuckle softly to yourself and move on.)

I guess what amazes me still is how contrary to other operating systems the Windows OS is. Everything we're doing here can be done the same way on Mac as it can on Linux, or virtually any other *NIX system: home accounts can be read directly from a network disk, their locations can be specified, and therefore, all this quota nonsense is unnecessary. On Windows, roaming profiles apparently must be downloaded to the client machine (an unbelievably stupid requirement), and the location of said profiles apparently must always be on the root drive in the "Documents and Settings" folder. I guess these are the ways in which Windows continues to force people to use Microsoft products (I can almost hear Bill Gates whispering in my ear, "Wouldn't it all be easier if you just used Windows Server?") But for software that's become the dominant standard in both the business and personal markets, Windows sure seems non-standard in baffling and infuriating ways. Though this may be how Microsoft has managed to stay on top all these years, I still, perhaps naively, believe that some day, if they don't change this strategy, it will hurt them. Frankly, though I'm sure they're out there, I don't know a single sysadmin that likes Windows. Can you blame them?

Three Platforms, One Server Part 5: Away We Go!

So it's the last week during which students have access to the lab, and that means I can finally implement my plan to unify internal network user authentication. Finally! I'm so jazzed. I've been waiting for months (well, years, really) for the chance to do this, and it's here at last.

The general outline of what I'll be doing over the next few days goes something like this:

  • Backup my current Mac server (for safety)
  • Build my master authentication server
  • Backup a clone of the clean server install
  • Configure the new server with:
    • Users and Groups
    • Home account automounting
    • Home account sharing to SMB (for Windows Roaming Profiles)
    • A skel account for Windows users (to live on the home account server)
    • Other share points
  • Create a replica of the new master server

What I did today:
Well, it's amazing how long a base install of Tiger Server can take. I've pretty much been doing that all day. Not that I'm so incompetent that I can't install the software in seconds flat, but software updates take forever and a day. Planning and getting drives to do all this on was a bit of an effort too. Plus I just wanted to make sure I did it right the first time, so I went slow, gave myself the day. I'm also making clones of everything along the way, for building my replica, and in case I goof and need to start over. So that takes a while. I guess I'm just saying that I'm taking my time with this, 'cause I want it to be as perfect as possible from the get-go.

By Monday we should have:

  • A base install of Tiger Server 10.4.6 with requisite Software Updates on a firewire drive
  • A backup of our old Mac Server
  • A new Tiger 10.4.6 authentication server that's configured to host Mac, Windows and Linux users
  • A replica of same

We will spend part of next week pointing all our workstations at the new server. The Windows machines will be the biggest pain as 1) they are running Windows, and 2) they need local quotas set (which could really be just a subset of point 1, but whatever). The reason for all this quota nonsense, you ask? Well, for the answer, you'll just have to read the previous posts on the matter. Suffice to say, I'm hoping the quota setting nonsense will be the worst part of this job, which it should if all goes according to plan, which, I'm sure you're aware, it rarely does.

Finally, I wanted to mention this quote that I read on Daring Fireball, by someone called John Gall, author of Systemantics, as it really jives with a lot of the stuff I've been thinking about with regards to the lab:

“A complex system that works is invariably found to have evolved from a simple system that worked….A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system.”
— John Gall

Next week should be interesting. I'll keep you posted.

Three Platforms, One Server Part 4: Redundancy

One of the major hurdles in our server unification project, mentioned in Part 1 of this series, is that of redundancy. In the old paradigm, each platform's users were hosted by a separate server. Mac users authenticated to a Mac Server, Windows users to a Windows Server, and Linux users to an NIS server. While this is exactly what we're trying to avoid by hosting all users on a single server, it does have one advantage over this new approach: built-in redundancy. That is, if one of our authentication servers fails, only the users on the platform hosted by said server are affected. For example, if our Windows Server fails, Windows users cannot login, but Mac users and Linux users can. In our new model, where all authentication for all platforms is hosted by a single server, if that server fails, no user can log in anywhere.

Servers are made to handle lots of different tasks and to keep running and doing their jobs under extreme conditions. To a certain extent, that is the very nature of being a server. To serve. Hence the name. So servers need and tend to be very robust. Nevertheless, they do go down from time to time. That's just life. But in the world of organizations that absolutely must have constant, 24 hour, 'round-the-clock uptime, this unavoidable fact of life is simply unacceptable. Fortunately for me I do not inhabit such a world. But, also fortunately for me, this notion of constant uptime has provided solutions to the problem of servers crashing. And while I probably won't lose my job if a server crashes periodically, and no one is going to lose millions of dollars from the down-time, no SysAdmin likes it when he has to tell his users to go home for the night while he rebuilds the server. It just sucks. So we all do our best to keep key systems like servers available as much as possible. It's just part of the deal.

So how are we going to do this? Well, one of the reasons I decided to use a Mac for this project is that it has built-in server replication for load balancing, and, yes failover. We're not too concerned with the load balancing; failover is what we're after. Failover is essentially a backup database that is a replica of a primary database, and that takes over in the case of a failure of the primary database. Mac Server has this built-in, and from what I read, it should be fairly easy to set up. Which is exactly what we're about to do.

The first thing we need is our primary server. This is the main server. The one that gets used 99% of the time (hopefully). We have this (or at least a test version of it) built already as discussed in Part 1. What we need next is what is called the replica. The replica is another Mac OSX Server machine that is set to be an "Open Directory Replica," rather than an "Open Directory Master."

So I've built a plain old, vanilla, Mac Server, and set it initially to be a Standalone Server. I've given it an IP address, and done the requisite OS and security upgrades. (Oy! What a pain!) In the Server Admin application, I set the new server to be an "Open Directory Replica." I'll be asked for some information here. Mainly, I'll need to tell this replica what master server to replicate. Specifically I'm asked to provide the following at the outset:

IP address of Open Directory master:

Root password on Open Directory master:

Domain administrator's short name on master:

Domain administrator's password on master:

(The domain administrator, by the way, is the account used to administer the LDAP database on the master.)

Once I fill in these fields I'll get a progress bar, and then, once the replica is established, I'm basically done. There are a few settings I can tweak. For instance, I can set up secure communications between the server with SSL. But for my purposes, this would be overkill. I'm pretty much going with the out-of-the-box experience at this point. So for setup, that should be it. Setting up a replica is pretty easy stuff.

Establishing the Replica: Could it Be Any Easier?

(click for larger view)

Now here comes the fun part: testing. What happens if our primary server goes offline? Will the replica take over authentication services? Really? I'd like to be sure. What I'm going to do now is test the behavior of the Master/Replica servers to make sure it acts as intended. The best way I know to do this is to simulate a real-world crash. So I am binding one of my clients to my Master server, with Replica in place. Then I'm going to pull the plug. In theory, users should still be able to login to the bound client. Let's try it...

Bang! It works! I'm a bit surpsrised; last time I tried it, years ago, it (or I) failed. This time, though, it worked. We bound a client to the Master, our mother-ship server. Authentication worked as expected. (We knew we were bound to the new server because the passwords are different.) And then we killed it. We killed the master and logged out. There was some beachballing at logout. But after a few minutes -- like two or three, not a long wait at all -- we were able to complete logout, and then log right back in as though nothing had happened. I tell you, it was a thing of beauty.

So let's briefly recap where we've been and what's left to do.

Where we've been:

  • We've built our Mama Server. Our authentication server for the entire lab.
  • We've figured out how to migrate our users to Mama, and how to handle the required password change.
  • We've solved the inherent problems with Windows clients and figured out a few solutions for handling them involving quotas and various roaming profile locations.
  • We've built and tested the operation of the Open Directory Replica, and it is good.

What's left to do:

  • Well, honestly, not a whole Hell of a lot.
  • The next step, really, is real-world testing. We have a basic model of how our servers and clients should be configured, and it's basically working. To really test this, we'll need to take some actual clients from the lab and set them up to use the new system.
  • Stress testing (i.e. seeing if we can break the system, how it holds up under load, etc.) would also be good, and might be something to do over Winter break a bit, and definitely in the Summer. To do this, we'll need to set up several client systems, and get users (guinea pigs) to do some real work on them all at the same time.
  • Once stress testing is done, if all is well, I'm pretty sure we can go ahead and implement the change. I can't foresee any other problems.

So I'm at a stopping point. There's not much else I can do until the break, at which point I'll be sure and post my test results.

Hope to see you then!

Three Platforms, One Server Part 3: Another Quota Solution

Another solution to the problem of quotas occurred to me today: local quotas. Since the Windows workstation copies everything in the roaming profile from the server to the local machine at login, and uses that data as the user works, it actually makes a lot more sense for quotas to happen on the local workstation itself. This way, rather then becoming aware of quotas at logout only, the workstation is always aware of the state of the user's home account quota, because that quota exists on the volume where the profile actively lives.

Doing this also prevents the error message at logout due to exceeding a server-side quota. In this new, local-quota scenario, the user is alerted the instant he exceeds his quota and can rectify the situation immediately. But if he doesn't, no problem. As long as the local quota and the server-side quota match, he'll never run into the problem of being unable to save his settings to the server, since he'll never be able to exceed his quota anyway.

It turns out that, unlike on Mac OS X, setting quotas for NTFS volumes is incredibly easy. In fact, it's a simple matter of viewing the properties of the drive on which you want to set quotas.


Windows Quota Settings: Easy as Pie
(click for larger view)

Here, go to the Quotas tab and click "Enable quota management" and "Deny disk space to users exceeding quota limit," set the limits, and you're basically done. The administrator will have no quotas, and any new user will have the quotas specified in this panel. You may, however, want to set certain users (particularly other admin or local users) to have larger quotas, or none at all. Again, exceptionally easy: Click that badly named, but ever-useful little "Quota Entries..." button and you'll get a list of local users and their quotas.


Windows Quota Entries: Edit Specific User Quotas
(click for larger view)

Here, you can set quotas for specific users. Initially, you'll only see local users. But after any network user logs in, you'll also see them listed as well.

The more I think about it, the more I like the idea of local Windows quotas. Using them does away with all the problems mentioned in earlier posts, and may help with a lot of the quota-related problems users have with our current, Windows-server-based system. Also, this would allow me to store all profile-related stuff in the same place for Windows as for Mac and Linux -- on our home account RAID -- as I'd wanted to do in the first place. And, in general, it should be much easier -- or at least not too difficult -- to maintain.

A last note on the timing of this project: I just spoke with my boss about the fact that I'm way ahead of schedule and have been thinking about implementing our unified password server over the Winter break. Lately I've had some misgivings about doing this, and he echoed those sentiments. His take was basically, "If it ain't broke, don't fix it." Always sage advice. And as gung-ho as I am to see this working, giving it some time and implementing it over the Summer will give me more time to plan this more carefully and test it more thoroughly, so that we can iron out problems at the beginning of the school year -- when students' work is not quite so in-progress -- rather than at the end -- when students are trying to finish their theses. This seems like a wise approach, and at this point I'm leaning toward erring on the side of caution.

Finally, credit where due: In-depth information on NTFS quotas can be found on this Microsoft page, which is where I lifted the images in this post 'cause I'm too lazy to figure out how to get good screen captures in Windows, and 'cause those images described perfectly what I wanted to show. I believe the images are in the public domain and, therefore, legally usable in my post, but if I'm wrong, I apologize, and if I need to take them down, someone can let me know in the comments section.