For me, the most interesting concept in Erlang/BEAM is that partial recovery is built in from the ground up. When an unexpected state is encountered, instead of either killing the entire process or trying to proceed and risking corruption, you just roll back to a known good state, at the most granular level possible. This idea was researched many years ago under the name "microreboots" (associated with "crash-only software"), but only Erlang/BEAM made it a first-class concept in a production system.
You still have to be careful with supervision trees and parts of the tree restarting. For example, your system might work fine if the whole Erlang operating system process is suddenly killed and restarted, but start corrupting data if only parts of the Erlang process tree are restarted. Erlang gives you a good model to work with these problems but it doesn't allow you to completely turn off your brain. If you walk in thinking that you can just let things restart and everything will be fine, you might end up getting burnt.
> You still have to be careful with supervision trees and parts of the tree restarting [...] Erlang gives you a good model to work with these problems but it doesn't allow you to completely turn off your brain.
Erlang gives architects the tools to restart as little or as much of the tree as they like, so I hope they have their brains fully engaged when working on the infrastructure that underlies their projects. For complex projects, it's vital to think long and hard about state interactions and sub-system dependencies, but the upside for Erlang is that this infrastructure is separated from sequential code via behaviors, and if the organization is big enough, the behaviors will be owned by a dedicated infrastructure team (or person) and consumed by product teams, with clear demarcations of responsibilities.
> When an unexpected state is encountered, instead of either killing the entire process or trying to proceed and risking corruption, you just roll back to a known good state, at the most granular level possible.
> but only Erlang/BEAM made it a first-class concept in a production system.
In most languages that have exceptions, you don't have the same guarantees: values are not immutable, so anything that was mutated before the exception stays mutated. The language can roll back the stack using exceptions, but it can't roll back the state.
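To make that concrete, here is a small Python sketch of the problem. The `apply_transfer` function and the account names are made up for illustration; the point is only that catching the exception unwinds the stack while the half-finished mutation stays in place.

```python
def apply_transfer(accounts, src, dst, amount):
    # Hypothetical helper: mutates the shared dict in place.
    accounts[src] -= amount            # the debit happens first...
    if dst not in accounts:
        # ...and then we fail before the matching credit.
        raise KeyError(f"unknown account: {dst}")
    accounts[dst] += amount


accounts = {"alice": 100, "bob": 50}
try:
    apply_transfer(accounts, "alice", "carol", 30)
except KeyError:
    pass  # the stack has been unwound, but the debit is not undone

print(accounts)  # {'alice': 70, 'bob': 50}
```

After the `except`, 30 units have vanished from `alice` even though the transfer "failed", which is exactly the state the exception could not roll back.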
The BEAM runtime, and every language that targets it, including Erlang, does not allow mutation (ETS and company excepted). This means that on the BEAM you can not only roll back the stack but also roll back the state safely. This is part of what the poster meant by "the most granular level possible".
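As a rough analogy only, the same idea can be imitated in Python by never mutating the state and having each step return a new value. This is not BEAM code and Python does not enforce any of it; `transfer` and `handle` are hypothetical names, and `MappingProxyType` is used just to make accidental writes fail.

```python
from types import MappingProxyType


def transfer(accounts, src, dst, amount):
    # Work on a private copy; the caller's mapping is never touched.
    updated = dict(accounts)
    updated[src] -= amount
    updated[dst] += amount             # raises KeyError if dst is unknown
    return MappingProxyType(updated)   # read-only view of the new state


def handle(state, message):
    # On failure the half-built copy is simply dropped and the previous
    # state, which nobody could have mutated, is still valid.
    try:
        return transfer(state, *message)
    except KeyError:
        return state


state = MappingProxyType({"alice": 100, "bob": 50})
state = handle(state, ("alice", "carol", 30))  # fails, old state kept
state = handle(state, ("alice", "bob", 30))    # succeeds, new state returned
print(dict(state))  # {'alice': 70, 'bob': 80}
```

Because the old value is never modified, "rolling back" is just a matter of continuing to use it. On the BEAM that property holds for ordinary values by construction rather than by discipline.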
Maybe their idea is that you can have a thread that processes work from a queue and catch any exceptions thrown during that processing and just continue processing other work.
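Something like the following Python sketch, presumably. The `worker` function, the `None` shutdown sentinel, and the sample jobs are all assumptions made for illustration.

```python
import queue
import threading


def worker(jobs, results, errors):
    # Pull jobs until the None sentinel arrives; a failing job is recorded
    # and skipped instead of killing the thread.
    while True:
        job = jobs.get()
        if job is None:
            break
        try:
            results.append(job())
        except Exception as exc:
            errors.append(exc)


jobs = queue.Queue()
results, errors = [], []
thread = threading.Thread(target=worker, args=(jobs, results, errors))
thread.start()

for job in (lambda: 1 + 1, lambda: 1 / 0, lambda: "still running"):
    jobs.put(job)
jobs.put(None)  # ask the worker to stop
thread.join()

print(results)      # [2, 'still running']
print(len(errors))  # 1
```

The loop keeps going after the failed job, but nothing here undoes whatever shared state that job mutated before it raised, so it gives you the stack rollback without the state rollback.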