On Monday 30th May, the service that provisions our Cisco VMRs (Cisco TelePresence Conductor, or Conductor for short) was reported offline at 00:03. Our Cisco VMRs are those that begin with chair or meet. Point-to-point calling and our V-Connect (Pexip) platform were unaffected by this issue.
The following events subsequently took place to remedy the issue. All times are BST (GMT+1).
- A UCi2i engineer went onsite to diagnose the problem. It was found to be a hardware fault, so the Cisco support contract was invoked and a ticket was logged with TAC at 05:20.
- Further support and log information was provided to Cisco, who diagnosed a faulty memory module and a faulty hard drive. They agreed to send replacement parts and an engineer to site, but advised that they would not have the parts until the following morning (today), Hong Kong time.
- At this stage it was agreed that swapping out the parts would resolve the problem, as the server was fluctuating between online and offline states.
- Cisco engineers arrived onsite at the Hong Kong data centre at 07:50 (not in the morning Hong Kong time as they had advised, because they were waiting on parts).
- At 08:19 the defective parts were replaced and the systems brought back online.
- When the system came back online, it was noted that another hard drive, in slot 4 of the system, was marked as ‘Predictive Failure’. At this point, instability surfaced despite the replacement parts having been installed, and Conductor locked up.
- At 09:27 Conductor returned to service for another 45 minutes before crashing again.
- At 10:35, to ensure that video services continued, a decision was made to migrate the Cisco-based virtual meeting rooms over to our newer Pexip platform. This allowed services to function, albeit with a different look and feel.
- At 10:36 an announcement on our website and company Twitter profile stated that there was a problem and we were working on a solution.
- At 10:53 we followed this up with an email to all partners informing them of the issue and requesting that they inform their customers (http://uci2i.cmail20.com/t/j-e-kjidkuy-fenog-r/).
- At 11:07 Cisco requested that we escalate this, as their onsite engineer needed further assistance to investigate a resolution to the problem.
- At 11:32, with the system failing again, we decided to invoke our BCDR for this system.
- BCDR was completed at 12:20 (Conductor requires a manual restoration, hence the time taken) and the Pexip services were withdrawn for Cisco VMRs.
- At 12:25 UCi2i began asking customers with issues to verify that services were restored.
We would like to take this opportunity to apologize to all our partners and customers for this outage and any inconvenience caused.
If you would like to discuss this further, please contact your local sales contact or email firstname.lastname@example.org. Alternatively, if you feel you are still having issues, please contact our support team at email@example.com.