
On Tuesday, January 29, at 3:48 PM Eastern, Desire2Learn issued the following service bulletin:

Critical Infrastructure issue impacting SaaS facility
Incident: The D2L Network Operations Center (NOC) advises that an infrastructure event in your SaaS facility may be impacting access for clients.
End User Impact: End users may be unable to access their Learning Environment or may experience slow performance.
Status: SaaS Operations and vendors are fully engaged and working on this issue.
We sincerely apologize for the impact this disruption may have caused your organization. We will follow up with further updates as they become available, or within the next hour.

UPDATES:

January 30, 2013

10:30am ET Update:

At this time we would like to provide a bit more visibility into what we have been experiencing over the past 24 hours.

As part of the major next-generation infrastructure project we began in our SaaS facilities in the fall of 2012, our migration into this new environment required many changes.

One of these changes required a sophisticated process for migrating data to our new enterprise storage solution. We decided that it would serve our clients best to migrate this data over time, with the help of our vendors, using technologies purpose-built for live migration. This approach avoided the long, multi-day maintenance windows that would otherwise be needed to transfer such large volumes of file data. This "file virtualization" technology ("ARX") was intended to allow seamless use of both the source and destination storage during the migration, with no impact to users.

We have determined that the issues currently being experienced lie within the ARX technology. The impact differs from customer to customer. For customers whose data migration we had not yet begun, and for customers whose migration we had completed, we were able to remove the ARX solution from their environment, fully restoring service.

For customers who are in the midst of their data migration, the ARX cannot be removed, so we have initiated a separate restoration process that applies to a portion of customers. This process involves a configuration change to the internal format of metadata within the ARX. The change has improved service for the clients for whom the process has completed. However, this configuration change takes time to process; we are targeting completion by noon EST today, and clients should see ongoing improvement as it progresses.

A small group of customers is affected by one additional issue, caused by a recommended course of action that did not produce the expected results. We are currently backing this change out and are considering a new course of action. We will contact these affected customers directly.

In our next update at noon, we will clarify potential resolution times for each client.

08:00am ET Update:
  • The plan recommended by our vendors has been partially implemented, resulting in normal performance for some clients. This is still being actively monitored.
  • Other clients, for whom the solution is only partially implemented, are still seeing some performance impact; we anticipate steady improvement to normal performance by 12:00 PM ET.
  • We understand these events have negatively impacted some of our customers and your end users. Reported experiences include being unable to log in or facing long delays when logging in, and being unable to access content or experiencing slow performance when accessing it.
  • We will provide a preliminary root cause analysis within two hours followed by a more formal root cause analysis as soon as possible.
  • Please be assured that the best resources from D2L and representatives from all of our vendors have been working around the clock and will continue to do so until the issue is resolved.
01:45am ET Update:
  • Our vendors have provided us with a recommended course of action to resolve this issue.
  • We are currently implementing the plan, which is expected to take 6-10 hours to complete fully.
  • Performance and availability of the site should improve over the next 4 hours; however, your users may experience the following:
      • Access to sites will improve.
      • Access to content, and the upload and download of files, may still be impacted until the full plan is implemented.
  • Please note that a full complement of resources (Development, SaaS, Support, Management and Executive) is working around the clock to resolve the problem.
  • Once this plan is implemented, the full team of Services staff will verify all customer sites.
  • Our next update will be at 8:00 AM EST.
12:30am ET Update:
  • Testing of our proposed solution continues prior to implementation. We will move forward with measures shortly. We will provide another update within the hour.

January 29, 2013

10:50pm ET Update:
  • D2L management has selected proposed solutions from our vendors, and these are now being tested prior to implementation. We will move forward with measures this evening. We will provide another update within the hour.
9:50pm ET Update:
  • Our vendors are vetting possible solutions to the issue. We will move forward with measures this evening. We will provide another update within the hour.
8:50pm ET Update:
  • Internal teams and vendors are continuing to troubleshoot and are working on this issue at the highest priority.
  • The next update will be in an hour, or sooner if the situation changes.
7:50pm ET Update:

  • Troubleshooting continues with all relevant internal groups and our vendors.
  • Possible solutions are currently being reviewed.
  • We continue to investigate this issue with the highest urgency and priority.
  • The next update will be within the hour, or sooner as updates become available.

6:50pm ET Update:

  • We have held a further escalation call with the highest levels of our vendors' organizations. Our next update will be within the next hour, or sooner as updates become available.
  • We are currently reviewing options to restore service to an acceptable level.
  • We continue to investigate this issue with the highest urgency and priority.

5:50pm ET Update:

  • We continue to investigate the issue and have escalated it to the highest levels of our vendors' organizations. Our next update will be within the next hour, or sooner as updates become available.
  • We have completed numerous troubleshooting steps and are currently recycling the application servers.
  • We continue to investigate this issue with the highest urgency and priority.

4:45pm ET Update:
  • SaaS Operations and vendors are continuing to troubleshoot the issue.
    We sincerely apologize for the impact this disruption may have caused your organization. We will follow up with further updates as they become available, or within the next hour.

Desire2Learn Customer Service & Support