SaaS apps do not fail because a server explodes. They fail because the stack has become too complex to debug and maintain.
The project is built in such a way that it is impossible to reproduce bugs locally. After jumping through dozens of nested files and multiple microservices, you still can't figure out where the bug is coming from. Or you don't have a reliable way to run database migrations, so they fail all the time. Or maybe no one knows where anything is or how it works, because the person who built it left six months ago and everything is so convoluted that nobody understands it.
This is what kills productivity. This is what kills projects.
Be a minimalist
On every app I work on, I try to keep the stack as lean as I can.
Take this blog for example. It is just HTML files. Nothing else. Nothing fancy. And it just works. I can edit anything I want at any time. I have version control with Git and it scales because I have a CDN. I do not need anything else. It is just me working on this alone, so why would I need a complex system?
Before I add anything, I ask:
- Can I run it locally?
- Can I easily replace it?
- Can I reuse something I already have?
- Can I remove it in one sprint without breaking everything?
- Does it give more value than its full cost, including complexity, migration risk and maintenance burden?
Building a resilient system takes years of failures
I had a client who wanted maximum redundancy. We set up multi-cloud failover so if AWS died, Azure would take over.
In the end, the card on file expired, both accounts got suspended, and the whole setup fell over.
Hidden coupling is everywhere. Billing, identity, DNS, CI, secrets managers, etc. I can do my best but I won't catch them all.
I have also seen provider lock-in hit teams hard. A vendor gets acquired, prices jump, or an account gets suspended. So today, I build in a way that lets me leave easily.
Optimize for recovery, not for fantasy uptime
Across my career, hardware has rarely caused the worst incidents. Human mistakes, bad deploys, provider lockouts and architecture mistakes did.
I cannot predict every outage. I can control how fast I recover.
I want incidents to be boring and recovery to be boring.
The stack I trust by default
For small teams, this setup has worked for me for a long time:
- One oversized server
- Postgres for primary data
- Files on local disk
- Containerized app deployment
I oversize the machine on purpose. Extra RAM gives room for spikes, leaks and debugging without panic. It also stops premature optimization work that does not matter yet.
A boring server with too much headroom beats a clever setup with no margin.
Use Postgres for more than tables
Most of the time, I need a database, a key-value store and a queue. Turns out, Postgres can already do all of this and so much more.
- Normal relational tables for core data
- Materialized views for caching
- Unlogged tables for fast transient data
- A Postgres-backed queue library such as pg-boss
- Scheduled jobs with the pg_cron extension
- Built-in full-text search, which is more than enough for most apps
- Append-only tables for audit trails, events, received webhooks and logs
- Time-series analytics with partitioning, indexes and rollups
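The queue item above deserves a concrete sketch, because it is the pattern people doubt the most. The core trick is `FOR UPDATE SKIP LOCKED`, which lets many workers poll the same table without blocking each other or claiming the same job twice. The table name and columns below are hypothetical, and the worker assumes a psycopg-style connection:

```python
# Hypothetical jobs table; adapt columns to your schema.
CREATE_JOBS_SQL = """
CREATE TABLE IF NOT EXISTS jobs (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    status     text  NOT NULL DEFAULT 'pending',
    created_at timestamptz NOT NULL DEFAULT now()
);
"""

# FOR UPDATE SKIP LOCKED: concurrent workers skip rows another worker
# has already locked, so each pending job is claimed exactly once.
CLAIM_SQL = """
UPDATE jobs
SET    status = 'running'
WHERE  id = (
    SELECT id FROM jobs
    WHERE  status = 'pending'
    ORDER  BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT  1
)
RETURNING id, payload;
"""

def claim_next_job(conn):
    """Atomically claim one pending job; returns (id, payload) or None."""
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
    conn.commit()
    return row
```

Libraries like pg-boss wrap exactly this pattern, with retries and dead-letter handling on top.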
Eventually, I might move to Kafka, Redis, Elasticsearch or ClickHouse but Postgres works really well for a surprisingly long time.
In practice, fewer systems mean fewer moving parts and fewer places to look at when I debug something.
Everything should be boring
My rule is simple: everything should be boring. There's nothing exciting about using Postgres. It's easy to explain to people and it won't look shiny on my CV.
But a boring stack is just much easier to debug.
I recreate the production environment with its data from a backup and reproduce the issue locally.
My backup plan is also dead simple:
- Run regular `pg_dump`
- Incremental backups to object storage with restic
Then I can run restore drills on any machine.
Rebuild on a fresh server, restore files and database and verify the app boots end-to-end.
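The whole plan fits in a handful of commands. Here is a sketch of it as a small Python wrapper; the database name, dump path and restic repository are hypothetical placeholders:

```python
import subprocess

DB_NAME = "app"                             # hypothetical database name
DUMP_PATH = "/backups/app.dump"             # local dump target
RESTIC_REPO = "s3:s3.example.com/backups"   # hypothetical object-storage repo

def backup_commands():
    """The two commands the backup plan runs, in order."""
    dump = ["pg_dump", "--format=custom", f"--file={DUMP_PATH}", DB_NAME]
    push = ["restic", "-r", RESTIC_REPO, "backup", "/backups"]
    return [dump, push]

def restore_commands(snapshot="latest"):
    """Commands for a restore drill on a fresh machine."""
    pull = ["restic", "-r", RESTIC_REPO, "restore", snapshot, "--target", "/"]
    load = ["pg_restore", "--clean", "--create", "-d", "postgres", DUMP_PATH]
    return [pull, load]

def run_backup():
    for cmd in backup_commands():
        subprocess.run(cmd, check=True)
```

Because restic deduplicates snapshots, the push stays cheap even when the dump itself is large, and the restore drill is the same two commands on any machine.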
Deploy fast
I care a lot about CI speed. If deploys take half an hour, the team only gets a handful of deploys in a day.
My deploy path is equally boring every single time.
Build a Docker container with everything, including the frontend bundle and any other assets. No need for S3 or anything complicated; the CDN will cache it anyway.
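A multi-stage build keeps this single-image approach tidy. This is only a sketch: the Node frontend, Python backend and directory layout are assumptions, not a prescription.

```dockerfile
# Stage 1: build the frontend bundle (hypothetical Node setup).
FROM node:22 AS frontend
WORKDIR /app
COPY frontend/ .
RUN npm ci && npm run build   # emits static files into /app/dist

# Stage 2: the app image that actually ships (hypothetical Python backend).
FROM python:3.12-slim
WORKDIR /srv
COPY backend/ .
RUN pip install --no-cache-dir -r requirements.txt
# Bake the frontend bundle into the same image; the CDN caches it from here.
COPY --from=frontend /app/dist ./static
CMD ["python", "-m", "app"]
```

One image, one artifact to version, one thing to roll back.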
To microservice or not
I do not like microservices, but they make sense when:
- Different services need different runtimes
- A critical ingestion path must stay isolated
Most of the other reasons I've heard are organisational issues or shortcomings in the architecture. At this stage, your team is too small for "coordination overhead", "domain-driven design" or the worst reason to use microservices, "scaling".
Still, I keep things simple with HTTP API calls. No Kafka or PubSub unless I really need it.
I choose boring protocols because boring protocols fail in obvious ways and have tools with decades of development to monitor and debug them.
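"Simple HTTP API calls" can be as plain as the standard library plus a timeout and a retry. The URL, payload shape and retry policy below are hypothetical defaults, not a universal recipe:

```python
import json
import time
import urllib.request

def backoff_schedule(retries=3, base=0.5):
    """Exponential backoff delays in seconds: 0.5, 1.0, 2.0, ..."""
    return [base * (2 ** i) for i in range(retries)]

def call_service(url, payload, timeout=5.0):
    """POST JSON to another service, retrying on transient network errors."""
    body = json.dumps(payload).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    last_err = None
    for delay in backoff_schedule():
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return json.load(resp)
        except OSError as err:  # network errors fail in obvious ways
            last_err = err
            time.sleep(delay)
    raise last_err
```

When this fails, the error is a timeout or a status code, and every proxy, load balancer and packet sniffer ever built knows how to show it to you.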
Get off the cloud
This one is controversial but hear me out. It actually makes sense.
I am not against cloud. I am against cloud complexity theater that teams adopt before they have a real need.
There's the classic mistake of over-engineering too early with Kubernetes and full cloud ceremony, but there's another more subtle one.
The memory- and CPU-starved PaaS that forces you to deal with fake scaling issues.
Under-provisioned platforms distort priorities. I have seen teams waste time on caps, cold starts and workarounds instead of shipping product.
People misread platform limits as architecture needs and add caches and service splits they never needed.
None of that is needed if you get one HUGE server to begin with. I can get a fully managed server with 32 cores, 256 GB of RAM and 2 TB of storage for €199/month.
Hardware failures rarely happen. Usually the app goes down because a human messed something up. Even if the server itself fails, recovery from backups is so simple that the whole app can be redeployed somewhere else.
But there's still a place for innovation. Strong engineering is boring by default and innovative by exception. When we innovate, it has to be justified, calculated and easy to undo.