On 1/26/20 at around 11:34 AM EST, we were informed that the site had become extremely slow to respond.
Max - Last Sunday at 11:34 AM
Can it be that the server has to handle a lot right now? Because responses are really slow right now (and I mean even slower than my connection normally is)
After this initial message, we noticed that our servers had gone completely offline. Over the next few days, the systems would come online for a short period, go offline again for a longer period, and then repeat the cycle. At first we could not get onsite staff to look for a possible hardware problem, and during the brief periods of connectivity we could not diagnose any issue through our router interfaces or through Proxmox.
During this time, we attempted the troubleshooting steps we could perform remotely, including gateway resets and router reboots. None of these resolved the problem.
We were eventually able to get onsite staff to look at the problem, and they found no obvious fault. We reseated all networking connections, and the issue appears to have been resolved since then. We are unsure of the exact cause, but we suspect a power supply problem with our HP ProCurve switch. We will update this postmortem if any new information is discovered.
We have added many new monitoring systems through Datadog so that we are informed sooner when an incident happens and can handle it better. Some of the Datadog stats are shown on the homepage of this status site; the rest are available on our public Datadog dashboard, which is linked at the top of this status site.
The following is some more information on the switch issue:
DJ Electro - Today at 7:53 AM
my thinking was when I connected over Wifi, I couldn't even reach local sites (192.168.1.1 timed out). Soooo it couldn't be the modem since I would still be able to reach local addresses, therefore it must be the switch because if that died then it would make sense I would drop connections to all addresses.
Now our switch does this weird thing sometimes where if theres a big voltage drop which can happen if theirs a power grid problem or if the cable is yanked or pulled to the wrong direction, the switch will power cycle instead of doing what anything else does during a brownout. So my guess was that the cable was pulled wayyyy off and so every single time it would complete the boot cycle it would just power cycle again and we were sort of stuck in an infinite loop but I checked the modem and I reseated the power plug on the switch.
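The reasoning in the message above (if even the local gateway is unreachable, suspect the switch rather than the modem) can be sketched in code. This is only an illustration, not a tool that was run during the incident. The function names `can_connect` and `localize_fault` are hypothetical, and the sketch assumes the gateway at 192.168.1.1 answers TCP on port 80.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if the host answers a TCP connection attempt in time.

    Assumption: the target serves (or at least actively refuses) TCP on
    the given port. A refused connection still proves the host answered.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # The host replied, the port is just closed.
        return True
    except OSError:
        # Timeouts, unreachable networks, and DNS failures all land here.
        return False


def localize_fault(gateway_reachable: bool, external_reachable: bool) -> str:
    """Narrow down where a connectivity fault most likely sits."""
    if not gateway_reachable:
        # Local addresses fail too, so the problem is before the modem.
        return "local network (switch or cabling)"
    if not external_reachable:
        # The LAN works but nothing beyond it does.
        return "upstream (modem or ISP)"
    return "no fault detected"
```

For example, `localize_fault(can_connect("192.168.1.1"), can_connect("example.com"))` would point at the local network whenever the gateway itself times out, which matches what was observed over Wifi.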
Thanks for sticking with us,
EH Administration