Monthly Archives: April 2014

Code Screams

Martin Fowler explains a code smell as “a surface indication that usually corresponds to a deeper problem in the system”. He explains that there are subtleties and that “The best smells are something that’s easy to spot and most of time lead you to really interesting problems.”

Code screams are my description of something similar, but different: a code scream is a behavioral indication of a deeper problem in the system. It differs from a smell in that it is behavioral rather than structural. For example, a scream is a user interaction action that takes an unusually long time to complete whereas a smell is a method that has more than a dozen lines of code.

Code screams are almost always found in live production systems because reality is much less predictable than anyone can imagine and, as good software developers, we’ve already accounted for all the predictable situations. Thus the only way to hear the code screaming is to have full spectrum software analytics live on the production system.

As a small example of what I’m talking about, here’s one of our production systems screaming for help: the response time goes way up, brown out start happening…

Response time goes up

A normal thought would be “overloaded” with too much data coming in, but the incoming data rate is unchanged:

Incoming data rate is unchanged

 

With further digging, we discovered that the code was in pain because a DDOS attack was underway on our upstream network provider causing very slow TCP streams and thus too many open and pending sockets, hence the servers couldn’t accept additional requests. That’s the essence of a code scream: a behavioral indication (massive increase in response time) of a deeper problem in the system (inability to handle alive but very slow TCP streams).

The software psychologist action plan is to incorporate full spectrum software analytics and alerting on your production systems and then pay attention when they start screaming.