As previously communicated, the storage maintenance incident for Private Cloud in the data center was caused by a misconfiguration during a routine VSAN storage scale-up procedure, which in turn resulted from incorrect procedural guidance given by VergeIO support staff. Because of this misconfiguration, the storage array began improperly reallocating and deleting storage blocks, causing a service outage shortly after the scale-up process started. At that point, the DC engaged VergeIO support to determine the cause.
VergeIO identified that, due to the misconfiguration during the scale-up process, the reallocation was actively deleting sequential blocks of data. The DC stopped the scale-up process as soon as this was recognized. From there, they began scrubbing the VSAN to identify discrepancies between the on-disk data and the metadata, which was untouched by the reallocation. This took longer than expected and extended the downtime; however, the scrubbing was essential to restore the system to a usable state while minimizing further damage to the data on the VSAN.
At the end of the data scrubbing process, the DC found that the corruption was substantially more widespread than initially thought. Because the reallocation deleted data sequentially across striped data blocks, the deleted blocks were distributed evenly across the array.
As a result, while we do have disk images for existing VMs, they are not expected to be bootable, and the DC expects that the majority of files stored on the disks have some level of corruption. Given the extent of the data loss, iSync.io has begun rebuilding all servers from scratch and recovering data from our remote backup systems. This will take time, and we cannot give an ETA at this point, but we will keep our temporary continuity solution in place until our production environment is back to 100%.
We are working with the datacenter to put mitigations in place going forward, including separate storage arrays for disaster recovery, to prevent this from happening again. We will also send out a formal root cause analysis as soon as possible.
Please know that our team is working all hands on deck to get things back to normal as quickly as possible. Telephone and ticket support response times will be very limited while we work through the recovery process.
Our customers have grown to expect excellence from iSync.io. Many of you have reached out with kind words, and some of our resellers have even offered to jump in and help. We appreciate your patience and support.
Restoring service is our top priority, and your continued understanding is very much appreciated.
We will continue to post updates here: https://isync.io/billing/index.php/announcements
Saturday, February 8, 2025