Preserving our data record

A bug occurs when our code does something unexpected and puts our data into a state we don't expect. To debug a bug, we need clues about how we ended up in this unexpected state. The current, buggy state of our data tells us that something is wrong, but it might not explain how it happened. The real clues are in the state of the data as it was just before the bug occurred. We need to look backwards through time and this is why we preserve our data record with a two-step data classification process.

  1. Classify all data as memorable data, which must not be deleted or overwritten because it records the story of how the state changed over time.

  2. Downgrade some exceptional memorable data to forgettable data, which can be deleted and overwritten because it tells us nothing about how the state changed.

By initially classifying all data as memorable, and only downgrading certain exceptions, we ensure that any past state of our data is preserved for future debugging.

There is no definitive list of which data is memorable and which is forgettable and this is why the classification strategy is conservative. Data must be shown to be forgettable to be downgraded. An example of forgettable data is a user's first name; it can change value but any specific value does not tell the story of how and why the state got to where it is.

It is acceptable for forgettable data to be lost so using database DELETE and UPDATE statements is also acceptable. This is not acceptable for memorable data and we have some strategies to preserve it.

Soft-deletion is achieved by adding a deleted_at timestamp to each table that contains memorable data. The timestamp is set when the row is soft-deleted and our code can filter out these “deleted” entities.

Versioning is achieved by having multiple rows for the same business entity and the most recent row is the definitive version.

 id  amount  created_at
---  ------  ----------
123  100.00  2019-01-10
124  105.00  2019-01-11
125  200.00  2019-01-12

Leaving null values in a row means the row can be updated if no memorable data is overwritten.

  id: int
  amount: float
  sent_at: timestamp, default: null
  paid_at: timestamp, default: null

INSERT INTO invoice (amount) VALUES (100.00);
UPDATE invoice SET sent_at = now() WHERE id = 123;
UPDATE invoice SET paid_at = now() WHERE id = 123;

These strategies preserve an unbroken record of how our data ended up in the state it is in. Without them, bugs take longer to debug or are undebuggable.