
Fibonacci snapshots

I had been thinking about buying a NAS and went for a DIY Linux server instead. Quite a "waste of time", but I like taming servers, and it is a way to keep some dexterity with Linux administration. The last time I set up a Samba server was 13 years ago.

The contraption's main server uses the controversial BTRFS filesystem in RAID 1 (*): one internal disk, one external. The mirror server is in another physical location, with an EXT4 filesystem, just to be sure :) and obviously the partitions are encrypted. I get a daily notification about files deleted or changed in the NAS.

But why BTRFS? First, because of the snapshots. Everyone has, at some point, needed or wanted to do the equivalent of

cd ~/.yesterday

to undo some mishap. Not only humans but also machines make mistakes: e.g. the Google Drive mirror program once "ate" the target folder due to some obscure bug, and copying it back from a snapshot avoided a 100GB download.

Another reason to use BTRFS or ZFS is the protection against data corruption, the dreaded "bit rot". It is a serious problem (**), and nobody should be using filesystems without protection against bit rot, not even in phones. (The same goes for RAM: every device with multiple gigabytes of memory should be using ECC RAM.)

Going back to snapshots. My "NAS" takes snapshots every hour, but it does not make much sense to keep all of them. When we need to go back in time, we think in broader units of measure: one hour, one day, one month, one year. It is unlikely that we need to go back exactly 11 months, 25 days and 15 hours. The idea is to keep some relevant snapshots, following a Time Machine-like progression, though with more resolution.

The Fibonacci series is, in my view, a nice representation of this subjective effect of coarser resolution at growing magnitudes. For example, in Scrum we estimate the effort of a task using "weights" chosen from the sequence 2, 3, 5, 8, 13, 20, 40, 100. The heavier the task, the bigger the uncertainty: it is not possible to anticipate whether a 40 task is actually a 39 or a 43. On the other hand, a sequence like 2, 4, 8, 16, 32, 64... is already too coarse.

The beauty of a (nearly) geometric series like Fibonacci's is the equal relative resolution at every scale. Even if we compute the series in hours (or minutes, or high tides), the snapshots older than one week will be distributed roughly as: 1 week old, 2, 3, 5, 8...
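A tiny sketch of that property (the function name is my own): consecutive Fibonacci terms grow by an almost constant factor, the golden ratio, so the relative spacing of the ranges does not depend on the unit chosen.

```python
def fib_terms(n):
    """Return the first n Fibonacci terms, starting at 1, 1."""
    terms = [1, 1]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])
    return terms

terms = fib_terms(20)
phi = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618

# Far into the series, every ratio between consecutive terms is
# within 0.1% of the golden ratio, whatever unit the terms measure.
ratios = [terms[i + 1] / terms[i] for i in range(10, 19)]
print(all(abs(r - phi) / phi < 0.001 for r in ratios))  # True
```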

So all that was left was to write the script that prunes the snapshots, keeping only the "relevant" ones according to a Fibonacci series of time.

The script should work equally well whether it is run many times a day, once a day, or once a week. To achieve this:

1) The script must start the series from the most recent snapshot, not from the system clock. (Just suppose the NAS was turned off for three months. Or worse, the system clock was incorrectly set 10 years into the future.)

2) Within each "age" range, the oldest snapshot is kept. With any other criterion, every snapshot would be pruned before it had a chance to move to the next range.

The strategy above has an issue: from time to time it leaves a gap — a time range without any snapshot — though it never happens in neighbouring ranges. The gap positions depend on the script execution frequency. If the script is run every week, the first gap may show up at the 5th week (and also at the 25th, 125th, etc.)

The gap is not desirable, and a behavior that changes depending on how often the script runs is even less desirable. It took me some time to see the problem, and more time to find the solution, which is:

3) The newest snapshot of each range is also kept, to guarantee that there is at least one snapshot left in the range when the oldest one is promoted to the next range.
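The three rules can be condensed into a small, filesystem-free sketch (function and variable names are my own; the real script below works on directory names instead of bare timestamps):

```python
def snapshots_to_delete(timestamps):
    """Given snapshot timestamps (seconds), return the set to prune."""
    if not timestamps:
        return set()
    newest = max(timestamps)  # rule 1: anchor at the newest snapshot
    delete = set()
    a, b = 1, 1
    dirty = True
    while dirty:
        dirty = False
        a, b = b, a + b
        # this range covers snapshots between a and b hours old
        x0, x1 = newest - b * 3600, newest - a * 3600
        in_range = [t for t in timestamps if x0 < t <= x1]
        if any(t <= x0 for t in timestamps):
            dirty = True  # older ranges still to visit
        if in_range:
            dirty = True
            # rules 2 and 3: keep the oldest and the newest of the range
            keepers = {min(in_range), max(in_range)}
            delete.update(t for t in in_range if t not in keepers)
    return delete

# Hourly snapshots over two days; only a Fibonacci-ish handful survive.
h = 3600
ts = [i * h for i in range(48)]
kept = sorted((47 * h - t) // h for t in set(ts) - snapshots_to_delete(ts))
print(kept)  # ages kept, in hours:
# [0, 1, 2, 3, 4, 5, 7, 8, 12, 13, 20, 21, 33, 34, 47]
```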

Finally, my script uses the snapshot's own name to determine its age, and does not touch any snapshot with an unexpected name format, e.g. a snapshot that I might have made manually. Here is the script:

#!/usr/bin/env python3

import time
import datetime
import os

mask = "%Y-%m-%d:%H:%M:%S"
folder = "/nas/snapshots"
remove_cmd = "btrfs subvolume delete /nas/snapshots/"

def hr(timestamp):
    timestamp = time.localtime(timestamp)
    return time.strftime("%Y-%m-%d %H:%M:%S", timestamp)

items = {}
newest_general = 0

for item in os.listdir(folder):
    try:
        items[item] = \
            datetime.datetime.strptime(item, mask).timestamp()
        newest_general = max(items[item], newest_general)
        # print("file %s time %s" % (item, hr(items[item])))
    except ValueError:
        print("Ignored file %s" % item)

# print("now is %s" % hr(time.time()))

# generate Fibonacci sequence in hours
dirty = True
a = 1
b = 1
while dirty:
    dirty = False
    a, b = b, a + b
    x0 = newest_general - b * 3600
    x1 = newest_general - a * 3600
    print("%d..%d - between %s and %s" % (a, b, hr(x0), hr(x1)))
    oldest = float("inf")
    newest = 0
    candidates = {}
    oldest_candidate = None
    newest_candidate = None

    for k in items:
        if items[k] <= x0:
            # there are snapshots older than this range; keep iterating
            dirty = True
        elif items[k] > x1:
            # newer than this range; nothing to do
            pass
        else:
            # candidate found within this range
            dirty = True
            candidates[k] = items[k]
            if items[k] < oldest:
                oldest = items[k]
                oldest_candidate = k
            if items[k] > newest:
                newest = items[k]
                newest_candidate = k

    if not candidates:
        continue
    print("    keeping %s" % oldest_candidate)
    del candidates[oldest_candidate]
    if newest_candidate != oldest_candidate:
        print("    keeping %s" % newest_candidate)
        del candidates[newest_candidate]

    for k in candidates:
        print("    deleting file %s" % k)
        os.system(remove_cmd + k)


(*) Kosher RAID 1 does mirroring of disks in a layer below the filesystem. BTRFS's raid1 mode works differently: it guarantees that every piece of data lives on at least two disks. Any number of disks, even of dissimilar sizes, can be part of the filesystem. The usual raid1 setup is 2 to 4 similarly-sized disks. For high availability, at least 3 disks with good spare space are recommended; otherwise the loss of a disk puts BTRFS in a "degraded mode" that is read-only.

(**) All disks (even SSDs) have a small rate of unrecoverable, undetectable errors (UBER), something around 1 bit for every 10^14 bits read. Since disks bigger than one terabyte (10^13 bits) are commonplace these days, data corruption will take place after just a few full reads of the disk. The main problem is not the corruption itself, but not knowing where it happened. Filesystems like BTRFS and ZFS keep data checksums for corruption detection. If they are operating with redundant disks ("RAID"), the error is silently corrected.
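A back-of-the-envelope check of those figures. The 4 TB disk size is my own example; the UBER of 10^-14 (one bad bit per 10^14 bits read) is the consumer-disk figure cited above.

```python
uber = 1e-14                         # unrecoverable bit error rate
disk_bytes = 4 * 10 ** 12            # an assumed 4 TB disk
bits_read = disk_bytes * 8           # bits in one full read of the disk
expected_errors = bits_read * uber   # expected undetected bad bits
print(round(expected_errors, 2))     # 0.32
```

Roughly one silently corrupted bit for every three full reads of the disk — and without checksums, no way to tell where it landed.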