Upgrading older systems

Every couple of weeks somebody asks me whether it makes sense to move from an older Sun SPARC machine to a new x86 machine running either Linux or Solaris. My usual answer is that it depends: mainly on what the thing is supposed to do, how it fits with the other stuff you have, and who looks after it.

This week, however, I want to look at various aspects of that question and see if it’s possible to offer a clearer and less general answer.

In most cases the driving force for this comes from the fact that support costs on older hardware move in the opposite direction from purchase costs for warrantied hardware of comparable or greater processing power - but the primary issue in making that change comes from the need for architectural change imposed because the current version of what you have in place has moved too far upscale relative to your requirements.

Basically, as your hardware gets older your monthly support costs become an ever increasing fraction of the cost of replacing it with comparably powerful new gear - new gear that typically comes with a year or more of warranty coverage as good or better than the coverage you’re paying for on the old gear. At the same time, however, the fact that the applications still work on existing hardware means that the replacement hardware comes from much lower down the relative performance scale - and adopting it can often require an architecture change.

On net, therefore, the incentives sometimes work for change, and sometimes against - and sometimes they collide in unpleasant ways.

One former client has an application, for example, that’s both mission critical to his business and essentially unchanged for over ten years. It currently runs on a pair of Sun 450s from the Solaris 2.7 days - and while he could get better performance at near zero incremental cost by swapping in a pair of otherwise retired Xeon servers running Linux, he has no incentive to take any risk because the thing’s success makes it invisible to user management and he stopped paying Sun anything in about 2003 when he put a decommissioned, but fully functional, 450 away as a backup.

Another has huge incentives to change - but can’t for internal political reasons. On the incentive side they’re currently paying IBM more per quarter in support on a couple of P690s and a shared disk store than it would cost to replace them with brand new T5440s complete with three years of gold support - and on the negative side their original ERP decisions were so bad (customization and best of breed) that the company nearly went under getting anything working, and so user management will now reliably throw panic hissies if anybody in IT so much as thinks about making any changes.

And yet, the simple bottom line is that both are going to have to change - it’s just a fact of life: costs change, gear becomes obsolete, old software becomes a drag on positive change, and failure risks increase as equipment goes past its engineered end of life.


Some general considerations for small systems change


Let’s assume a scenario under which the older system you’re thinking about upgrading is relatively small, support costs are high, and you can’t obviously transfer its workload to some other, larger, machine with adequate idle capacity.

Specifically, let’s assume you have a Sun 490 from a few years ago (4 x 1.8GHz USIV, 16GB, 4 x 73GB) that’s still under support and runs an engineering document and database application critical to everyone from R&D to the people handling customer warranty claims.

It works, but the hardware is getting old - and support costs seem outrageous relative to the nominal cost of PC style servers: your predecessor signed up for full 24 x 7 Gold level support at nearly $8,000 per year - about 10% of the nominal list price when he bought it - and lots of people claim you can get ten PC servers for that: one for every six weeks in support costs.
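The arithmetic behind that claim is easy to check. The sketch below uses only the figures quoted above ($8,000 per year, about 10% of list, ten servers); the function names are mine, and the "ten servers" figure is the claim being examined rather than a price quote. Ten servers a year works out to one every 5.2 weeks, which is in the same neighbourhood as the "six weeks" people quote.

```python
# Figures taken from the text; everything derived below is simple arithmetic.
ANNUAL_SUPPORT_USD = 8_000      # "nearly $8,000 per year"
SUPPORT_SHARE_OF_LIST = 0.10    # "about 10% of the nominal list price"
CLAIMED_PC_SERVERS = 10         # "ten PC servers for that" (a claim, not a quote)


def implied_list_price(annual_support: float, share: float) -> float:
    """List price implied by a support fee quoted as a share of list."""
    return annual_support / share


def implied_server_price(annual_support: float, servers: int) -> float:
    """Per-server price implied by 'N servers for one year of support'."""
    return annual_support / servers


def weeks_of_support_per_server(servers: int, weeks_per_year: float = 52.0) -> float:
    """How many weeks of support fees pay for one such server."""
    return weeks_per_year / servers


if __name__ == "__main__":
    print("implied list price:", implied_list_price(ANNUAL_SUPPORT_USD, SUPPORT_SHARE_OF_LIST))
    print("implied PC server price:", implied_server_price(ANNUAL_SUPPORT_USD, CLAIMED_PC_SERVERS))
    print("weeks of support per server:", weeks_of_support_per_server(CLAIMED_PC_SERVERS))
```

Note what the implied server price says about the comparison: the claim only holds for servers at the very bottom of the PC price range.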

This is, in other words, Red Hat’s dream scenario - the primary one their anti-Sun campaign targets, and the one in which you’re supposed to believe that buying a free Linux from them will give you better performance for less money.

In this situation the key things to consider are:

  1. your tolerance for system failure;
  2. your tolerance for security (in the PC sense) risk;
  3. constraints on future change opportunities;
  4. I/O limitations and storage growth rates; and,
  5. staffing related issues.

Note that application level SPARC compatibility is not directly an issue - any application can be either migrated or replaced if the incentives for doing it justify the risk and costs involved. It’s easier, of course, to upgrade to binary compatible HW/OS combinations, but that’s a cost/benefit issue, not an absolute.

The failure tolerance issue comes down to this: SPARC (and Power) systems are built to higher quality standards than x86 ones - and that’s true whether you’re comparing at the low end, mid range, or high end in each category. As a result the issue here is whether you care about the quality you’re paying for with that 490.

It’s a low end machine for SPARC but to match the quality in the x86 world you have to go to the higher end stuff: typically Compaq’s Proliant line, and that costs more than a new SPARC machine would. To make hardware savings, therefore, you have to be willing to accept a higher risk of hardware failure - so this comes down to how much of that you can tolerate.

All management speak aside, this is ultimately a gut call: my own rule of thumb being that if your users can see a cost difference between eight hours a year in downtime and two, then sticking with the higher end gear will be the right thing to do even if that cost difference seems smaller than the hardware savings.
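If you want a number to argue about before making that gut call, a break-even calculation is a reasonable starting point. The sketch below is a minimal one: the eight and two hour figures come from the rule of thumb above, while the hourly cost and the hardware savings are hypothetical inputs you would have to supply. It gives only the naive break-even point; the rule of thumb is deliberately more conservative than that, since it favours the higher end gear even when the visible downtime cost looks smaller than the savings.

```python
def downtime_cost_difference(hours_cheap: float, hours_premium: float,
                             cost_per_hour: float) -> float:
    """Extra annual downtime cost of the cheaper gear, in dollars."""
    return (hours_cheap - hours_premium) * cost_per_hour


def break_even_cost_per_hour(annual_hardware_savings: float,
                             hours_cheap: float = 8.0,
                             hours_premium: float = 2.0) -> float:
    """Hourly downtime cost at which the extra downtime eats the savings."""
    extra_hours = hours_cheap - hours_premium
    if extra_hours <= 0:
        raise ValueError("cheaper gear must have more downtime than premium gear")
    return annual_hardware_savings / extra_hours


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    savings = 6_000.0      # assumed annual hardware savings in dollars
    hourly_cost = 500.0    # assumed cost of one hour of downtime in dollars
    print("break-even $/hour:", break_even_cost_per_hour(savings))
    print("extra downtime cost:", downtime_cost_difference(8.0, 2.0, hourly_cost))
```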

The security (in the PC sense) issue is this: you only care about the risk of attacks that work or could work - meaning attacks that exploit code or process vulnerabilities in ways that can be directed against you. Since every OS and application has code vulnerabilities, and every process involves people and/or networking, the determining factor is how high the exploit barrier is.

In the x86 world exploits are virtually synonymous with vulnerabilities, but because this isn’t true for PPC or SPARC the barriers there are much higher - witness, for example, Apple’s transition from a company that could build a security reputation while ignoring vulnerabilities on PPC to an x86 maker that’s rapidly losing its reputation for security despite obsessive patching.

Again the question is one of comparing risks to possible costs and other consequences: basically, the worse the consequences of a successful attack could be for you, the further you want to stay away from x86 - and if there’s a genuinely compelling reason to use x86 in a high value situation, bite the bullet on porting your application to OpenBSD and have security experts go over your code line by line as part of that process.

The opportunity cost issue on software change is one of the hardest to get your head around. The question is at what point change now starts to significantly drive up the cost of future change. In the obvious version of this you make a change decision today, and tomorrow’s vendor announcement means you’ve spent the money buying the wrong thing - but the more interesting, and more subtle, version is that you spend your change budget (including non dollar spending like stressing out user management’s tolerance for change) and tomorrow one of your people comes up with a new idea that you really want to implement but can’t - and one thing I’ll guarantee you is that nobody on your staff will really buy into your reasons for saying no.

This is where the option of doing nothing as long as possible really shines: the maxim about a tax delayed being a tax unpaid works here - if it’s Unix, and it works today, leaving it alone will pretty much guarantee that it works tomorrow - and, in these kinds of situations, that can be a good thing.

In contrast to opportunity costs, the storage issue is dead simple: those 73GB disks in the 490 can be upgraded to 146GB at minor cost, but going beyond that means either getting an external JBOD or trading off significant new costs against performance. Either way, once volumes get much past 4 x 146GB, the fact is that new gear with terabyte disk sets and full warranties usually combines lower cost with lower risk and higher performance relative to adding disk to old systems.
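To put rough numbers on that, the sketch below works out the raw internal capacity before and after the disk upgrade and estimates how long a given growth rate takes to fill it. The disk counts and sizes come from the text; the current usage and growth rate are hypothetical inputs, and the calculation ignores mirroring, RAID overhead and file system overhead, all of which shorten the real runway.

```python
import math


def raw_capacity_gb(disks: int, size_gb: float) -> float:
    """Raw capacity of the internal disks, ignoring mirroring and overhead."""
    return disks * size_gb


def years_until_full(used_gb: float, capacity_gb: float,
                     annual_growth: float) -> float:
    """Years of compound growth before used_gb reaches capacity_gb."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive, e.g. 0.30 for 30%")
    if used_gb >= capacity_gb:
        return 0.0
    return math.log(capacity_gb / used_gb) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    current = raw_capacity_gb(4, 73)      # the 490 as configured
    upgraded = raw_capacity_gb(4, 146)    # after the minor-cost disk upgrade
    print("current raw GB:", current)
    print("upgraded raw GB:", upgraded)
    # Hypothetical usage and growth figures, for illustration only.
    print("years of runway:", years_until_full(200.0, upgraded, 0.30))
```

If the runway this produces is shorter than the time you expect to keep the machine, the external JBOD versus new gear question is already on the table.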

And, finally, there are staffing issues. People will tell you that switching from Solaris to Linux will make it easier to find qualified staff, but that isn’t true. Unix skills are usually easily transferable: if your current staff can keep their hands off that 490 running Solaris, they can probably keep their hands off a Linux replacement machine too - and, similarly, if you can hire someone who can get Linux set up and running properly, the chances are that Solaris won’t give them any trouble either.

Conversely, if your staff reports that 490 as unreliable, the one thing you can be assured of is that they’re causing those failures - and not only will they do the same thing to a Linux replacement, but whatever root cause (usually a manager whose skillset doesn’t match the technology) is driving this will also limit your ability to retain any new people you bring in with better skills.

Thus the positive bottom line on staffing is that if your shop is working well, there won’t be anything scary about transitioning between Linux and Solaris - in either direction.

Conversely, if what you’ve got is a skills-technology mismatch you have two choices: change the people, or change the technology - and do it before you change anything else because not facing up to the issue condemns you to a long and slow death by a thousand failures.

So what’s the real bottom line on all of this? Support costs may be a lever for getting people thinking about change, and technology continuation may have value for you, but in the end these kinds of decisions almost always come down to intangibles: guesses about future risks and opportunities, not the small dollars involved in support contracts.


 
