How to Survive a Problem That Won’t Sit Still
I was recently faced with a problem that most IT pros can relate to, and I thought I’d share it. I’ve been working on an obscure Active Directory secure channel problem on a production server that intermittently decides to lose its domain trust. It’s one of those problems that comes and goes. To make matters worse, it’s the kind of problem where all hell breaks loose when it breaks, but it’s not broken all the time. No one knows when it’s going to break, and the only temporary “fix” is to reboot the server. Once you reboot, however, you no longer have a problem to troubleshoot.
I’m now in a constant battle with users. They won’t let me leave it broken for any period of time, yet they want the problem fixed. Something that seems obvious to me, accepting some short-term pain (leaving it broken) for long-term gain (permanently fixing the problem), just doesn’t register with this set of users. It’s driving me crazy! Anyway, I always try to squeeze some good out of situations like this, so I decided to document a few steps that might help others track down intermittent problems of this kind.
One of the big problems in my case was that the users had been dealing with this for so long that they were fed up. They wanted the problem fixed, and fixed NOW. On the flip side, they wanted 100% uptime, and when the problem occurred the server would be rebooted immediately, not allowing…