Genuinely 24x7, highly transactional environment with low tolerance for downtime.
Microservices have to be carefully considered.
A set of cascading failures across a suite of microservices contributed to 30+ hours of on-call work in a single weekend.
We came up with a checklist of operational requirements, dozens of which were failing on each service.
This is a distillation of learnings from that time!
The applications I support are typically provided by third parties, and so making them more supportable is hard. But that’s what support tickets are for.
Or public education…
No support team should have to guess how software is set up. Have organisational standards that declare:
Special mention for this beauty buried deep in a file that changes every version:
<!-- Uncomment this to enable this particularly useful configuration
<some>
<arbitrary>
<xml>Make stuff work here</xml>
</arbitrary>
</some>
-->
It is particularly difficult to template this, or to remove specific lines, in a way that works across releases.
An AuthenticationException when a user mistypes their password is NOT exceptional: don't log a stack trace for it. Put it in an audit log if need be!
Log the client address from X-Forwarded-For or similar, not the intermediary IP.
Michael Nygard wrote the book on making software operable.
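As a rough illustration of the two logging points above (an expected authentication failure goes to an audit log without a stack trace, and the logged address comes from X-Forwarded-For rather than the proxy), here is a minimal Python sketch. Everything named in it is hypothetical and only for illustration: the `audit` and `app` logger names, the `AuthenticationError` class standing in for AuthenticationException, and the `check_password`, `client_ip` and `handle_login` helpers. It also assumes the header was set by a proxy you trust, since a client can send its own X-Forwarded-For value.

```python
import logging
from typing import Mapping

# Hypothetical logger names; handlers and levels are left to the deployment.
audit_log = logging.getLogger("audit")
app_log = logging.getLogger("app")


class AuthenticationError(Exception):
    """Hypothetical stand-in for the AuthenticationException mentioned above."""


def client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """Return the left-most X-Forwarded-For entry, else the peer address.

    Assumption: the header was added by a trusted proxy or load balancer.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    first = lowered.get("x-forwarded-for", "").split(",")[0].strip()
    return first or peer_ip


def check_password(user: str, password: str, known: Mapping[str, str]) -> None:
    """Hypothetical credential check; raises on a mismatch."""
    if known.get(user) != password:
        raise AuthenticationError(user)


def handle_login(user: str, password: str, headers: Mapping[str, str],
                 peer_ip: str, known: Mapping[str, str]) -> bool:
    ip = client_ip(headers, peer_ip)
    try:
        check_password(user, password, known)
    except AuthenticationError:
        # Expected event: one audit line, no stack trace.
        audit_log.info("login failed user=%s client_ip=%s", user, ip)
        return False
    except Exception:
        # Genuinely unexpected: here a stack trace is warranted.
        app_log.exception("login error user=%s client_ip=%s", user, ip)
        raise
    audit_log.info("login ok user=%s client_ip=%s", user, ip)
    return True
```

With a header of `203.0.113.7, 10.0.0.1` and a peer address of `10.0.0.1`, the audit line records `203.0.113.7`, which is the address a support team actually needs when tracing a user's complaint.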
Jez Humble and Dave Farley regularly make the point that configuration management is one of the foundations of Continuous Delivery and have some great advice:
In Doing microservices right, Sam Newman talks about how to avoid some of the common pitfalls of microservices. Building Microservices is currently in Early Release.