A bug occurs when our code does something unexpected and puts our data into a state we don't expect. To debug a bug, we need clues about how we ended up in this unexpected state. The current, buggy state of our data tells us that something is wrong, but it might not explain how it happened. The real clues are in the state of the data as it was just before the bug occurred. We need to look backwards through time and this is why we preserve our data record with a two-step data classification process.
Classify all data as memorable data, which must not be deleted or overwritten because it records the story of how the state changed over time.
Downgrade some exceptional memorable data to forgettable data, which can be deleted and overwritten because it tells us nothing about how the state changed.
By initially classifying all data as memorable, and only downgrading certain exceptions, we ensure that any past state of our data is preserved for future debugging.
There is no definitive list of which data is memorable and which is forgettable and this is why the classification strategy is conservative. Data must be shown to be forgettable to be downgraded. An example of forgettable data is a user's first name; it can change value but any specific value does not tell the story of how and why the state got to where it is.
It is acceptable for forgettable data to be lost so using database DELETE
and UPDATE
statements is also acceptable. This is not acceptable for memorable data and we have some strategies to preserve it.
Soft deleting rows instead of using DELETE
statements.
Versioning rows by creating new rows to replace old rows.
Leave null values in new rows that are placeholders for future data.
Soft-deletion is achieved by adding a deleted_at
timestamp to each table that contains memorable data. The timestamp is set when the row is soft-deleted and our code can filter out these “deleted” entities.
Versioning is achieved by having multiple rows for the same business entity and the most recent row is the definitive version.
id amount created_at --- ------ ---------- 123 100.00 2019-01-10 124 105.00 2019-01-11 125 200.00 2019-01-12
Leaving null values in a row means the row can be updated if no memorable data is overwritten.
CREATE TABLE invoice id: int amount: float sent_at: timestamp, default: null paid_at: timestamp, default: null INSERT INTO invoice (amount) VALUES (100.00); UPDATE invoice SET sent_at = now() WHERE id = 123; UPDATE invoice SET paid_at = now() WHERE id = 123;
These strategies preserve an unbroken record of how our data ended up in the state it is in. Without them, bugs take longer to debug or are undebuggable.