Friday, May 22, 2009

Trying to visit

And you thought

YOU had problems

Customer ID: 1*******

Contract ID: 8******

Dear John Glenn,

The web server that hosts your site has been offline due to an APC power failure affecting three server racks. While all the other affected servers came back online, this web server did not recognize its RAID subsystem.

RAID stands for "Redundant Array of Independent Disks" and is a technology that employs the simultaneous use of two or more hard disk drives to achieve greater levels of reliability and performance.

Your website is stored twice across the RAID system, on different hard drives, so if one of the hard drives fails your website will continue to run. The failed hard drive is replaced and the data that was on it is copied again from the other drives within the RAID. This is known as rebuilding the RAID, and it normally happens seamlessly, without any effect on the web hosting server or your website. Replacing drives is a daily task in our data centers and is standard for large data storage systems such as those used in the web hosting environment.
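For readers who want to see the mirroring and rebuild idea in miniature, here is a minimal sketch in Python. It assumes a simple two-disk mirror (RAID 1) modeled in memory; the `Mirror` class and its method names are hypothetical illustrations, not anything 1&1 or a RAID vendor uses. It shows writes landing on both disks, reads surviving a single failure, and a rebuild copying every block from the survivor onto the replacement.

```python
class Mirror:
    """Hypothetical in-memory model of a two-disk mirror (RAID 1).

    Each "disk" is a list of blocks; None marks a failed disk."""

    def __init__(self, num_blocks):
        self.disks = [[b""] * num_blocks, [b""] * num_blocks]

    def write(self, index, data):
        # Every write goes to each healthy disk, so the data is stored twice.
        for disk in self.disks:
            if disk is not None:
                disk[index] = data

    def read(self, index):
        # Any healthy copy will do, so reads keep working after one failure.
        for disk in self.disks:
            if disk is not None:
                return disk[index]
        raise IOError("no healthy disk left")

    def fail(self, n):
        # Simulate disk n dying.
        self.disks[n] = None

    def replace_and_rebuild(self, n):
        # Copy every block from a surviving disk onto the new, blank disk.
        survivors = [d for d in self.disks if d is not None]
        if not survivors:
            raise IOError("nothing left to rebuild from")
        self.disks[n] = list(survivors[0])
```

Usage: create `m = Mirror(4)`, call `m.write(0, b"index.html")`, then `m.fail(0)`; `m.read(0)` still returns the data, and `m.replace_and_rebuild(0)` restores the second copy. Note that this toy rebuild cannot fail halfway, which is exactly the case the letter describes next; that is why a separate backup still matters.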

In this instance, we replaced the failed drive with a new drive and the RAID started to rebuild. During the rebuild, the process failed, corrupting all the data within the RAID set. This should not happen, and we have open tickets with the RAID manufacturer to understand what went wrong in this case and to ensure that it can be prevented in the future.

Our system administrators do not rely on the RAID system as our only source of backup. We run a rolling backup of the live system to external backup servers to ensure that in a case like this we have a restore solution.

After the RAID corruption occurred, our engineers analyzed the situation and found that the only remaining solution was to recover the data from our backup systems. At this point the RAID was reinitialized so that it was ready to receive data; this process alone takes several hours.

We are currently copying and restoring the data from our backup systems to the web hosting server that your site runs from. The restore process takes time and will not be finished before Saturday afternoon.

Since the system problems began we have had a dedicated team of administrators working around the clock to monitor the copying of data from our backups and to ensure that all settings are restored so that your website will run again.

We apologize for any inconvenience and thank you for your patience. We will update you again as soon as there is additional information available.

Sincerely, 1&1 Internet Inc.

For the record, 1&1 normally has excellent uptime.

Also "For the Record": RAID assemblies are useless if the box in which they are installed "goes away." Yes, it happens. Given that nothing is perfect (except you and me, and I'm not sure about you <g>), backup media is a must, and that media needs to be proven. Was the WRITE error-free? If it is a tape, has anyone tested a READ on a drive other than the one on which the tape was written?
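One way to "prove" a disk-based backup is to read it back and compare a checksum against the original. The sketch below assumes file-based backups and uses Python's standard `hashlib` and `shutil`; the function names `sha256_of` and `backup_and_verify` are hypothetical. It only confirms that the copy reads back correctly on the same machine; the tape question above, reading on a different drive, still requires an actual test restore on other hardware.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_and_verify(source, dest):
    """Copy source to dest, then read dest back and compare checksums.

    Returns True only if the backup reads back identical to the source."""
    expected = sha256_of(source)
    shutil.copyfile(source, dest)
    return sha256_of(dest) == expected


if __name__ == "__main__":
    # Demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "site.tar")
        dst = os.path.join(workdir, "site.bak")
        with open(src, "wb") as f:
            f.write(b"website data" * 1000)
        print("backup verified:", backup_and_verify(src, dst))
```

Comparing digests rather than raw bytes keeps the check cheap to repeat later: store the digest alongside the backup and recompute it whenever the media is tested.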

Nothing is simple. It's our job as risk managers to anticipate 99% of the potential "gotchas."

John Glenn, MBCI, SRP
Enterprise Risk Management/Business Continuity practitioner
Ft. Lauderdale FL
Planner @

Friday, May 15, 2009

Ripple effect

Headline: GM, Chrysler to cut dealerships 20 percent

Who is hurt?

Obviously the owners of the dealerships - they have to unload inventory, either by selling it to potential new-car buyers or by wholesaling it to surviving dealerships.

Dealership employees - more folks looking for jobs and, meanwhile, going on unemployment. Will 26 weeks be enough? Are sales "sales," or are sales "product specific"? In other words, are sales skills portable and, if they are, are there jobs to which displaced sales people can migrate?

Media, both newspaper and electronic, will lose advertising revenue.

Even "The Web" will be hit.

No advertising means less revenue for ad agencies, media, and Web developers.

All that can mean more layoffs and more people on the market and on the dole.

Unemployed people buy less.

Unemployed people default on their mortgages and, unless a mortgage holder is unusually astute, the property eventually will be foreclosed. Unemployed renters will be forced out of their quarters.

Unemployed people, according to recent studies, get sick more often and, since most lack health insurance - ever priced COBRA coverage? - by the time they get medical attention, the malady has progressed to a critical - and more costly to cure - level. Since there is no insurance, the public will have to pick up the bill - as long as the funds are available.

What about the "uninsurable"?

Government Medi-this and Medi-that budgets will be strained; even before the major hit on the budgets, the Federal government already had raised what users pay while delivering less product. (That's akin to Coca-Cola shrinking its 8-ounce bottle to 7 ounces and raising the cost-per-bottle 10%. Trouble is, while we can drink water from the tap - the price of which also is increasing - we have no substitute for Medi-whatever other than private insurance, and that, as suggested earlier, usually is not available to the unemployed.)

There is a Hebrew expression that, roughly translated, goes: A sin begets a sin; a positive act begets a positive act. Permit a paraphrase: Unemployment begets unemployment; employment begets employment.

Where is employment to be found?

The government became, and perhaps remains, the "employer of last resort." The FDR administration created multiple federal make-work agencies (e.g., WPA, CCC) and programs to put Americans to work during the Great Depression. Not being a financial historian, I have to ask whether the US then had a debt as huge (in dollars-of-the-era) as today's. It took a world war to extract the world from the depression.

Even as I suggest that the government - at all levels - might properly be the "employer of last resort," I read and hear that programs are being abandoned and that governments such as Washington, DC's are trying to tighten their belts. Translation: more unemployment and less taxable income.

What about famous or infamous - your choice - "bailout money"?

Properly controlled, which the first giveaways were not, that approach may - operative word is "may" - be the best option for recovery. Will "The World" cooperate? If the recession/depression is worldwide, recovery also must be worldwide.

The point of all the above for risk managers is that we must look at The Big Picture and consider the consequences of a risk not only on our own little world - be it the organization or the family - but on those with whom we are closely or loosely related. We also need to look outside our own little world to see how those with whom we are closely or loosely related are doing.

We can see the impact of a relatively minor hiccup in the economy - dealership closings - on the overall economy.

The impact of any risk rarely is limited to a small anything, be it an organization or a geography.

Risk managers must understand this and attempt to impress this fact upon management.

John Glenn, MBCI, SRP
Enterprise Risk Management/Business Continuity
Planner @