Things I have been thinking about when releasing a new feature in a B2B SaaS environment:
- Logs - If a customer reports a problem, can we track down the point at which it occurred? In a complex system with multiple layers, working out when and where an issue happened can be troublesome. Logs leave breadcrumbs that paint a better picture of what’s happening. Tools like Datadog can then ingest these logs and generate metrics from them. GCP does this too, but it’s not as nice.
- Feature Flags - Feature flags can split traffic between a new feature and the previous one, or cut the new one off entirely. If it’s a completely new feature and the interaction is via endpoints, I am not entirely sure it makes sense. In a B2C consumer-facing product, one can hide a button or a menu if a feature is broken, and the rest of the experience still works for the end user. How this applies to endpoints is something I need to think about further. Changing an existing feature in the backend sometimes requires a change to the DB schema, which might be incompatible with the previous one. It’s possible to keep both schemas at the same time and revert to the stable path if the feature is broken. I am just not sure it warrants the trouble.
- Gradual rollouts - Most IaaS providers (e.g. GCP, AWS) support gradual rollouts. After a release is made, a percentage of the traffic can be pointed to the new revision. This allows one to test the waters with a small share of the traffic before fully committing.
- Error Handler - Tools like Sentry are essential these days. They report crashes when they happen and group problems of the same nature, so we can see how often each one occurs. They also provide details such as the crash stack trace, so it’s easier to pinpoint the issue.
- Observability - IaaS providers let you set alerts at different levels of the stack: the API gateway, queues, the database, and other parts (e.g. Redis). Typical signals are SQL reads/writes, a queue subscriber’s ingestion rate, the number of nacked (negatively acknowledged) messages per period of time, etc. You can then average this data over time and apply other manipulations, so that the alerts that fire are more likely to be true positives.
- Heartbeats - A service that pings a feature and expects an outcome. From outside to inside: hitting an endpoint every hour and expecting a 200 as a response. Postman lets you create a battery of requests and run it every X minutes; if something is wrong, it reports it. From inside to outside: a cron job starts some work and, when it’s done, sends a POST to an external service. The external service expects to be pinged every hour and can raise an alert when a ping does not arrive.
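
To make the Logs point concrete, here is a minimal sketch of structured logging with Python's standard `logging` module. It assumes that every log line carries a `request_id` and a `layer` field so one customer request can be followed across layers; those field names, `JsonFormatter`, and `build_logger` are made up for illustration and are not tied to any particular log platform.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so a log tool can parse it."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical fields, passed through `extra=` when logging.
            "layer": getattr(record, "layer", None),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("export-service")
    # The same request_id is reused by every layer that touches the request.
    log.info("export started", extra={"request_id": "req-123", "layer": "api"})
    log.info("rows written", extra={"request_id": "req-123", "layer": "db"})
```

Filtering on a single `request_id` then gives the trail of breadcrumbs for one customer report, whichever layer emitted each line.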
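
For the Feature Flags and Gradual rollouts points, the sketch below shows one common way to split traffic at the application level: hash the customer ID into a stable bucket and compare it with a rollout percentage. This is an assumption about how such a flag could work, not how any specific flag service or cloud provider implements it; the flag name `new-export` and all function names are hypothetical.

```python
import hashlib


def bucket(customer_id: str, flag_name: str) -> int:
    """Map a customer to a stable bucket in [0, 100) for a given flag."""
    key = f"{flag_name}:{customer_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(customer_id: str, flag_name: str, rollout_percent: int) -> bool:
    """Return True if this customer falls inside the rollout percentage."""
    return bucket(customer_id, flag_name) < rollout_percent


def handle_request(customer_id: str, rollout_percent: int) -> str:
    # Customers inside the rollout get the new path; everyone else
    # stays on the stable path. Setting rollout_percent to 0 cuts the
    # new feature off entirely.
    if is_enabled(customer_id, "new-export", rollout_percent):
        return "new"
    return "stable"
```

Because the bucket depends only on the flag name and the customer ID, a given B2B customer consistently sees the same path while the percentage stays fixed, which matters more for API consumers than for a hidden button.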
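
The inside-to-outside heartbeat can be sketched from the receiving side as a small "dead man's switch". The class below is a hypothetical, in-memory stand-in for the external service: it records the last ping and reports the job as overdue once the expected interval plus a grace period has passed. The names and the five-minute grace period are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class HeartbeatMonitor:
    """Track pings from a job and flag it when one is overdue."""

    def __init__(self, expected_interval: timedelta,
                 grace: timedelta = timedelta(minutes=5)):
        self.expected_interval = expected_interval
        self.grace = grace
        self.last_ping: Optional[datetime] = None

    def ping(self, now: datetime) -> None:
        """Called when the cron job's POST arrives."""
        self.last_ping = now

    def is_overdue(self, now: datetime) -> bool:
        """True if no ping has arrived within interval + grace."""
        if self.last_ping is None:
            return True  # never heard from the job at all
        return now - self.last_ping > self.expected_interval + self.grace


if __name__ == "__main__":
    monitor = HeartbeatMonitor(expected_interval=timedelta(hours=1))
    monitor.ping(datetime.now(timezone.utc))
    print(monitor.is_overdue(datetime.now(timezone.utc)))
```

The grace period is a design choice: without it, a job that finishes a few seconds late would trigger a false alarm.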