Many of us don’t think too much about Oracle Support services – they are there and they sit on a treasure trove of information, but beyond that, what value are they? My company may be paying several hundred thousand dollars a year for the privilege of having that OSS login. Let me tell you, though: for anyone who’s only dabbled with severity 2/3/4 service requests, the Rolls-Royce of Oracle Support comes out the moment you escalate to or create a severity 1 request.
For those not following along, a severity 1 request says “help! my database is down”. That database is mission critical to my business/function and without it, our clients/customers/users face catastrophic issues. So it’s serious, and you’d better be serious about it before declaring a severity 1 emergency.
Case in point: a client had experienced serious, unrecoverable corruption of several ASM disk groups, including the one holding the 11gR2 clusterware components (voting disk, OCR, etc.). One node that had rebooted had attempted to come up and appeared to have done so successfully with a workaround or two – yet in reality, it had not joined the cluster, and its ASM and RDBMS instances were operating almost independently of the other node’s instances. It was not pretty.
What was worse, the surviving node was operational, with ASM and database instances running as expected, except that they were relying on cached data for the clusterware and ASM disk groups that had failed. One poke with a pen and it would have fallen over, causing a long, unplanned and quite unpleasant outage.
The architecture of the cluster is a topic for another day.
So, engaging Oracle Support and escalating my severity 2 service request to a severity 1, I was pleasantly surprised to get a phone call from the support analyst just as I was updating the request with some additional information. The analyst was knowledgeable, understood all too well the risks facing my environment, and quickly pointed me to two very useful documents that detailed the steps to resolve my issue. She also concurred that this situation was one breath away from an outage and made no bones about reiterating the importance of a total outage to resolve the underlying issue.
Long story short, a total outage was arranged and within 90 minutes the application was brought down, the resolution implemented and the application returned to full operational status. Cluster services behaved themselves and it’s been an uneventful period since.
So, yes, Oracle Support is worth every penny. Even if you are an experienced DBA (23 years for me), you’ll come across a weird situation from time to time that becomes a wonderful learning experience.
Note 1062983.1, you are the best! To Judy, the support analyst, I am grateful for your experience and knowledge.
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems (Doc ID 1062983.1)
OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) (Doc ID 428681.1)