Our Catalog of Ideas, Sweat, Inspiration, observations and Mea Culpas for IT that just didn't work out, aka: It works Awesome Except it Didn't Work (for a moment, at least).
Dear SureMail™ Customers,
Some users may be experiencing periodic slowness in their email connection this morning.
We wanted to let all users know that we are aware of the issue, are troubleshooting it, and will determine a full fix ASAP. We will have a status update on the resolution within the next 2 hours.
If you have any questions, do not hesitate to contact us at 1-800-882-8701 x1 or by email.
-Your SureTech™ Solutions Team
Dear SureMail™ Customers,
We are happy to report that the SureMail slowness issues some users experienced earlier today have been resolved.
The problem was caused by an update to 1 of our 5 anti-virus engines triggering a condition on 2 of our Exchange 2010 mailbox servers that resulted in a service outage for mailboxes located on those servers. The condition was first reported at 10:45 AM EST and, due to the complex nature of the issue, it took until 11:23 AM EST to resume normal mailbox access on one of the servers and until 11:38 AM EST to resume normal mailbox access on the second server. There may have been some lag time for some users to feel the full effects of the fix.
Going forward, we have fixed the way these updates are triggered. Furthermore, we are adding several system monitors so that we can detect and resolve this type of issue right away should it come up again.
If you are still experiencing any delays or issues with your email, please contact our HelpDesk support at 1-800-882-8701 x1 or by email so we can help you right away.
-Your SureTech™ Solutions Team
We experienced a load-balancing issue on one of our application firewalls today, which resulted in some clients' networks being unable to access our hosted Exchange servers. It did not affect all users or any inbound mail coming to our servers. The issue was identified at 6:25 AM and was resolved at 7:45 AM.
We are taking corrective actions to develop a monitor that can detect the problem proactively in the future.
Should you have any questions about the above, please let us know by email.
- Your SureTech.com Solutions Team
As part of our continued efforts to provide a secure SureMail environment we updated the configuration of a firewall on the Exchange Server today at approximately 4:20 AM EDT. This update went smoothly and our monitoring indicated no issues with the new configuration.
However, some SureMail clients did begin to experience connectivity issues at that time; this was due to a load-balancing problem caused by our updates. This issue was resolved at 8:20 AM EDT. No mail was lost; only connectivity to local clients was affected.
We are currently updating our monitoring to detect this type of load-balancing issue in the future to prevent further connection issues of this kind.
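For readers curious what such a check might look like, here is a minimal sketch in Python. It assumes each path through the load balancer can be probed on its own; the function name and the path labels ("firewall-a", "firewall-b") are hypothetical and do not describe our actual monitoring. The idea is that a check reporting only "up" or "down" can miss the case where some paths work and others do not, so the sketch reports that case separately.

```python
def classify_paths(probe_results: dict[str, bool]) -> str:
    """Classify service health from per-path probe results (True = probe succeeded)."""
    if not probe_results:
        return "unknown"  # nothing was probed, so no claim can be made
    healthy = sum(probe_results.values())
    if healthy == len(probe_results):
        return "ok"
    if healthy == 0:
        return "down"
    # Some paths work and some do not: the load-balancing case described above.
    return "partial"


# Hypothetical probe results: one path answers, the other does not.
print(classify_paths({"firewall-a": True, "firewall-b": False}))  # partial
```

A "partial" result is the one worth alerting on here, since a single probe that happens to take a working path would still succeed.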
Should you have any questions about the above, please let us know by email.
- Your SureTech.com Solutions Team
Friday afternoon, June 17th, our server administrators noticed that the hard disk on one of our web servers was showing signs of potential failure. In the process of transferring to a new web server using our backup drive, we discovered the backup drive was compromised as well. This double failure caused our sites to be down intermittently between 4:30 pm Friday and 2 am Saturday as we restored data from our most recent backups. This is the first hardware failure to impact operations in 8 years.
We're happy that no backup data was lost, though please check any posts or event updates made between Friday, June 17 at 3 am and Saturday, June 18 at 2 am, as some of these edits may not have been retained. We apologize for any inconvenience this causes. If you have any questions, please do not hesitate to contact us.
Sincerely,
Your Solutions Team at SureTech.com
We experienced a new failure mode today with one of our application firewalls. It started returning errors to some customer requests from the Internet at approximately 11:46 AM EST. The issue, while seriously affecting some customer organizations, was not detected by our multiple monitoring systems. However, based on some issues we were able to see, we restarted the affected application firewall at 12:09 PM EST. The resulting re-convergence of load balancing affected the other application firewall starting at approximately 12:12 PM EST and ending by 12:19 PM EST. All services returned to normal production availability via the originally affected application firewall by 12:22 PM EST. The aggregate time during which any customer organizations were affected by the issue was 36 minutes.
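The 36-minute figure can be checked with a short calculation. The sketch below is only an illustration: it assumes the first firewall counts as affected from 11:46 AM until normal service at 12:22 PM, uses a placeholder date, and counts overlapping windows once. The helper names are ours.

```python
from datetime import datetime

DAY = "2000-01-01"  # placeholder date; only the clock times come from the notice


def at(clock: str) -> datetime:
    """Parse a clock time such as '11:46 AM' on the placeholder day."""
    return datetime.strptime(f"{DAY} {clock}", "%Y-%m-%d %I:%M %p")


# (start, end) windows during which some customers were affected.
windows = [
    (at("11:46 AM"), at("12:22 PM")),  # originally affected firewall (assumed span)
    (at("12:12 PM"), at("12:19 PM")),  # other firewall during re-convergence
]


def aggregate_minutes(windows):
    """Minutes covered by the union of the windows, counting overlaps once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(windows):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += (cur_end - cur_start).total_seconds() / 60
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).total_seconds() / 60
    return int(total)


print(aggregate_minutes(windows))  # 36
```

Because the second window falls entirely inside the first, it adds nothing to the total, which is why the aggregate equals the span from 11:46 AM to 12:22 PM.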
We are taking corrective actions to detect the memory fragmentation issue that caused inbound requests to fail on the affected application firewall. We will update our monitoring systems to alert us of this issue prior to inbound requests being rejected, so that we can remediate the issue without customer organizations being affected.
Should you have any questions about the above, please let us know by email.
- Your SureTech.com Solutions Team
Your MS Exchange service has fully recovered from Tuesday's outage. All data has been restored to your mailbox with no loss of data.
The issue we experienced was precipitated by a problem with an HP Storage Area Network that caused a 12-drive enclosure to fail. Our work to recover from this was followed by an additional drive failure during the rebuild of the degraded array that hosted your mailbox, causing a catastrophic failure in the array.
Please note we regularly deal with upgrades, maintenance and occasional hardware failures transparently and without affecting your service. In this case, however, all data and data redundancy in the production environment was lost, causing us to rely on our disaster recovery backup and log systems.
According to this procedure we made your mailboxes available in a 'dial-tone' configuration where you were able to send and receive emails online, but not able to work with older data offline. Then, the original mailbox data was restored, and the 'dial-tone' emails were merged into the mailboxes providing a full data recovery as of 6:30am Wednesday (yesterday).
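As a rough illustration of the merge step, the Python sketch below combines a restored mailbox with the messages that arrived during the 'dial-tone' period. It is a simplified model, not the Exchange tooling we used: the Message type, its fields, and the rule of keeping one copy per message ID are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str  # hypothetical unique identifier for a message
    received: str    # ISO 8601 timestamp, so strings sort chronologically
    subject: str


def merge_dial_tone(restored, dial_tone):
    """Combine restored mail with dial-tone mail, keeping one copy per message_id."""
    merged = {m.message_id: m for m in restored}
    for m in dial_tone:
        # Keep the restored copy if the same message exists in both sets.
        merged.setdefault(m.message_id, m)
    return sorted(merged.values(), key=lambda m: m.received)


restored = [Message("a1", "2000-01-03T09:00:00", "Older mail from backup")]
dial_tone = [Message("b2", "2000-01-04T10:30:00", "Mail received during recovery")]
for msg in merge_dial_tone(restored, dial_tone):
    print(msg.received, msg.subject)
```

The point of the sketch is only that nothing from either set is dropped: older mail comes from the restore, newer mail from the dial-tone mailbox.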
We fully realize the interruption this caused and will make additional changes to improve our ability to survive a similar enclosure failure in the future without a similar (or ideally any) service interruption.
We appreciate your patience and cooperation during the resolution of this issue. We continue to work to improve the way we manage all our services, including during emergencies, and appreciate your feedback.
As always, if you have any questions or suggestions, or need support, please drop us a line by email or call us at 609-688-1111
- The Solutions Team
We strive to ensure that all our products are reliable and consistent. Whenever services are interrupted we work hard to get to the bottom of the cause and solutions so that such an event does not happen again.
On Monday, December 28 we experienced an outage for selected clients from 8:58 a.m. to 9:57 a.m. EST.
The underlying cause:
An error in the Storage Area Network (SAN) supporting mailboxes hosted on the MAIL34 server removed client access to the mailboxes. We troubleshot the issue and were able to bring the SAN and server back online within an hour.
Steps taken to prevent reoccurrence:
We have implemented additional monitoring of the SAN in order to be informed quickly of this specific condition, so that if this issue ever occurs again, the downtime associated will be much shorter.
We are researching this issue further in an effort to eliminate the possibility of it occurring again.
Network Stability:
Overall, the entire SureMail™ environment has enjoyed a 99.902% availability rate over the past 365 days, along with very few scheduled maintenance periods. The MAIL34 mailbox server has experienced an availability rate of 99.906% since it was brought into production approximately 7 months ago.
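To put those percentages in more familiar terms, the Python sketch below converts an availability rate into the hours of downtime it implies. It is a back-of-the-envelope calculation only: it treats "approximately 7 months" as 210 days, which is our assumption, and the function name is illustrative.

```python
HOURS_PER_DAY = 24


def downtime_hours(availability_percent: float, days: float) -> float:
    """Hours of unavailability implied by an availability percentage over a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * days * HOURS_PER_DAY


environment = downtime_hours(99.902, 365)
mail34 = downtime_hours(99.906, 7 * 30)  # assumes 7 months is about 210 days

print(f"SureMail environment: {environment:.1f} hours")  # 8.6 hours
print(f"MAIL34 server: {mail34:.1f} hours")              # 4.7 hours
```

In other words, 99.902% over a full year corresponds to roughly eight and a half hours of total unavailability.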
Should you have any questions about the above, please let us know by email.
Best regards,
- The Technical Support Team at SureTech.com
We strive to ensure that all our products are reliable and consistent. Whenever services are interrupted we work hard to get to the bottom of the cause and solutions so that such an event does not happen again.
On Tuesday, July 22 we experienced an outage for selected clients from approximately 5 a.m. to 11 a.m. EST.
The underlying cause:
An error in the behavior of clustering services took a number of mailbox stores offline, which prevented access to those mailboxes. The same event also introduced inconsistencies into the log files that are generated for these mailbox stores, which made bringing them back online a lengthy process with some element of risk. Once we had taken steps to ensure that incoming mail would continue to be accepted by our incoming mail servers, we made copies of all affected mailbox stores to ensure that existing data was secure before beginning the process of rebuilding the mailbox stores. The rebuild process is resource intensive, and to minimise the downtime for our customers we allocated additional hardware resources to the recovery process. Recovery of mailboxes began 3 hours after the initial problem and was complete 9 hours later. Other dependent services were brought up on completion of this work.
Steps taken to prevent reoccurrence:
Should you have any questions about the above, please let us know by email.
Best regards,
- The Technical Support Team at SureTech.com
At 1:49 AM EDT on 6/30/09, a brief power interruption in our Data Center appears to have severely damaged one of the four UPSs in one of our racks. (This UPS had not exhibited any symptoms of issues going into the power interruption.) The damaged UPS cut half of the rack's AC power supply. The infrastructure in that rack was designed to continue to function in this type of partial power outage, but several limitations in this design were exposed yesterday, resulting in the queuing of all inbound email for organizations using the Ultimate Anti-Spam Protector option, an outage of BlackBerry service, and issues with one of our two infrastructure monitoring systems.
At 8:42 AM, we re-routed inbound email from the queues to the Ultimate Anti-Spam Protector service, and the inbound email resumed processing. Due to a configuration issue with the re-routing, some organizations' inbound email was 'bounced' back to the email sender, instead of being successfully delivered. We were able to restore most email service by 9:30 AM. Some isolated issues with email and BlackBerry service remained until everything was fully resolved at 12:40 PM.
To prevent this type of issue in the future, we have taken corrective actions so that the Ultimate Anti-Spam Protector processing and BlackBerry services will continue functioning during a partial power outage of this kind. We are in the process of updating the affected infrastructure monitoring system so that it too will operate properly during this type of issue. We are also replacing the affected UPS with a model that will provide our monitoring system with more diagnostic information, to help reduce the probability of a UPS-caused AC power outage occurring again.
We apologize for the service interruption, and we will build on the corrective actions we have already taken, as we continue to strive to provide the highest possible service level on a proactive basis. If you have any questions or concerns, please do not hesitate to contact us by email or at 1-800-882-8701.
- The SureTech.com Solutions Team
On June 22, 2009, we experienced a serious outage on our SureDesk™ systems. While attempting a minor stability upgrade, our system admins encountered an unfortunate, irreversible bug that crashed the connection service to our SureDesk™ Gold environment.
As it happens, we also had a parallel upgrade standing by for release this weekend, which we were able to move up to today and include when we restored service.
Service was down from 7:30am to 3:07pm, and we sincerely regret the inconvenience to all affected SureDesk™ Gold users. Going forward, we have adjusted our upgrade policy for bugs to take fewer risks while system upgrades are also being rolled out. Please note SureDesk™ Platinum users were not affected. Our Gold services don’t have a fully redundant failover standby, which contributed to the difficulties in restoring service we saw today.
Also, please note that in addition to policy changes, we are streamlining workarounds in case this were to happen again (which we do NOT expect), including old-school “Terminal Services” access and streamlined local synchronization of SureFiles™. Feel free to contact us for more information.
On the good-news, Toot-Toot side, you should find a number of benefits from the upgrade now that we have suffered through the service interruption:
General reliability and performance improvements:
· Multi-monitor support
· Additional intelligent printing and reliability
· Graphics and color resolution improvements - certain video, such as youtube.com, now works better on the SureDesk™
Thanks for your patience and please let us know if we can do anything to be of help or if you need help restoring or upgrading your connection.
We strive to ensure that all our products are reliable and consistent. Whenever services are interrupted we work hard to get to the bottom of the cause and solutions so that such an event does not happen again.
On Monday, August 2 we experienced an outage for selected clients from 3:27 p.m. to 5:20 p.m. EST.
The underlying cause:
The partial outage today was caused by a problem with one of the application firewalls. It failed in such a way as to 'lock' those sessions that had been using it, and it required an on-site intervention to correct the issue.
Steps taken to prevent reoccurrence:
We have taken action to prevent this failure mode from happening again, and also to enable remotely correcting this issue so that if this issue ever occurs again, the downtime associated will be much shorter.
Network Stability:
Overall, the entire SureMail™ environment has enjoyed a 99.902% availability rate over the past 365 days, along with very few scheduled maintenance periods. The MAIL34 mailbox server has experienced an availability rate of 99.906% since it was brought into production approximately 7 months ago.
Should you have any questions about the above, please let us know by email.
Best regards,
- The Technical Support Team at SureTech.com
Xobni, which is inbox spelled backwards, is an absolutely terrific plug-in for Microsoft Outlook except for the small fact that it doesn't work...
American-based, for American customers - and email.
That's pretty much the price of excellent service these days. If you outsource your service to a place that doesn't care about your customers ...