Site menu VolatileFS: A volatile filesystem for computation caching

VolatileFS: A volatile filesystem for computation caching

Andrew Clausen (translator), EPx (writer), Rik Van Riel. The core ideas in this article arose from an online conversation with Andrew Clausen and Rik van Riel.

0) Reducing latency by caching expensive computation.

Latency is the primary aspect of computers' performance that users care about. While throughput is a more popular measure of performance, latency is more important. Taking a random example, consider an ADSL link versus a dial-up link. The user experience of ADSL is much better, being about 4 or 5 times faster (ADSL is about 300kbps compared to 56kbps dial-up). Users enjoy this speed not so much because they can download more and bigger files, but rather because they need to wait less time to receive information after requesting it. (It's worth mentioning that TCP/IP exacerbates latency problems with the handshakes in TCP and HTTP.)

Desktop computers have ever increasing amounts of RAM and CPU available to them. Yet, the latency users experience has not improved, as applications have also become more complicated. In the past, users opened text files in "text mode". Today, PDFs and web pages are full of detailed figures and proportionally spaced tables, rendered with CPU-intensive intensive algorithms. Naturally, applications "hesitate" before displaying documents, which users find annoying ("in text mode, I can open my files in a flash!")

This article proposes a solution to this problem: caching computed data. While caches are already common for storing copies of data near users (as in CPU caches and web proxies), they have not been used to cache processing. While caching a rendered web page might involve storing a lot of data, the abundance of RAM makes this tradeoff worthwhile for users.

1) How and where to store computed data efficiently

Any application can store computed data in RAM or temporary files. But doing this indiscriminately might make things worse. Perhaps it is quicker to recompute some data than to read it back in - especially if other programs are involved in heavy I/O. Or perhaps other running programs have low demand for CPU but high demand for RAM. The problem is that one program has a narrow view of the computer, and is in a weak position to know what the best global strategy is.

The first idea raised by Clausen is to delegate such decisions to the virtual memory manager (VMM) in the operating system. The VMM already makes similar decisions, such as how big caches should be, what data should be cached in memory or written to a swap file.

The next problem is: how can processes inform the VMM about computed data?

2) A volatile temporary file system ("volatilefs")

One possible way applications could inform the operating system about computed data is through a volatile temporary file system, which we baptised "volatilefs". Such a file system would be similar to the "tmpfs" that appears in Linux and other Unix systems, except that the VMM would automatically delete infrequently opened files. This approach is quite simple, and does not require any new APIs.

3) How applications should use volatilefs

Because volatilefs can delete closed files at any time, applications can store computed data with the following strategy:

[comment: the following discussion looks too complicated IMHO. Ideally we want a solution where the above strategy works well without any changes.]

This brings us to a potential problem: what if an application abuses volatilefs, keeping all its files open so that VMM can't evict any of them? One possibility would be to remove the guarantee that "open files can't disappear". The VMM could delete the file and invalidate its file descriptors. Programs attempting to read or write from the file would return errors.

Obviously, programs should deal graciously with such errors. In particular, they should treat such errors as a sign the VMM has limited space available, and avoid using the cache for some time.

A lot of file handling code presumes that an open file in a local filesystem can not fail. If further investigation points that this guarantee should not be broke even in volatilefs, another strategy would be to swap out the open files instead of deleting them. The application will take a performance hit while reading the file, but at least it does not fail. (See Appendix.)

So, although volatilefs does not need a new API, programs would need to be more careful than usual when reading files from the cache. In the above scenario, the mmap() system call is problematic. AFAIK, mmap() errors with NFS or SMB result in SIGBUS. We could require programs that use volatilefs to handle such SIGBUS signals cleanly, but I'm not sure if this is a good idea:

Another option would be to disallow mmap() calls to volatilefs files. Perhaps this is a reasonable option, because files in the cache would tend to be read and written in one pass without any intermediate data needing to be stored.

4) And how should the VMM decide which files should be deleted?

The VMM needs to balance volatilefs memory consumption with other memory consumption requirements. Inevitably, some volatilefs files will need to be evicted. But which ones? The heart of a good volatilefs implementation would be its eviction algorithm, which should ideally take the following factors into account:

5) How should the computational cost of a file be calculated?

Determining the computational cost of a file is a complicated problem, as it includes RAM, CPU and I/O costs. It would have to be done by each application, because only it knows how the data was computed. One option would be to measure time elapsed. Even simpler would be to use a predefined "weight" for each type of data. Time elapse has the advantage that it measures latency which is what we are trying to minimize. Both approaches still have these two problems:

It would be much more interesting if volatilefs could infer the computational cost of data on its own to eliminate these complications.

An acceptable (not perfect) way to measure the cost of the object is to measure the wall clock time between creation and closing of the file. Bigger time almost always means that the object was more expensive to compute.

Of course, for this to work, the applications should comply with another volatilefs "idiom": they must create the file as soon as the computation begins, i keep it open during the computation, and close it when it finishes. Also, it implies that the application is honest (does not keep the file open for a longer time than needed)

6) Other open questions

APPENDIX: other comments from people

Also, some comments:
 * too many open files: I think we can just swap volatilefs out, if that
makes applications' lives easier.  That said, interrupting a pickle
operation shouldn't be a big deal.
 * read/write vs mmap(): Pickling can be a pain: imagine having to
implement pickle for Mozilla's internal representaiton of a web page!
That said, I can't see any elegant way applications could make good use
of mmap() either.  They'd need to use a special memory allocator that
allocates some objects to the mmap'd space, and other objects elsewhere.
Moreover, you would need to be very careful that there be no references
from non-mmaped pointers to mapped ones when you do close(2).
       Perhaps a cheap trick would be to store an bitmapped image of
the page, and display that while recomputing the underlying data
structures?  At least you could start reading the page instantly,
and could determine if it really is the right page.  (And continue
hitting the back button if necessary.)


Andrew [Clausen]
Alguns comentarios por um user-level weenie :P
(supondo que esse volatilefs nao seja um hoax :D)

- o caching é para ser valido somente durante a execuçao do programa?
ou entre execuções tambem? Aplicaçoes (ou instancias) diferentes podem
compartilhar dados cacheados? Quais as implicacoes de segurança disso?
- tem certeza que é melhor guardar dados em memória  em vez de disco?
Memoria nao é tao barato e abundante assim, é facil abrir umas 3/4
aplicacoes e começar a swappar. Com o volatilefs entao, imagina o caos:
thrashing de VM <-> swap e volatilefs <-> recomputacao e re-save.
- muitas vezes dados computados nao sao um simples bloco de memoria
contiguo. Na verdade acho que na maioria das vezes nao é. É muito mais
comum que seja uma estrutura de dado complexa, que é montada e alocada
aos poucos, num emaranhado de ponteiros. Naturalmente isso tambem pode
ser cacheado com grandes ganhos, como é visivel no APT (e nao no yum),
que cacheia os arquivos de listas de pacotes + lista de pacotes
instalados + metadados em um arquivo mmap()eado, com os ponteiros
sendo convertidos/substituidos por offsets dentro de um bloco
contiguo. Mas isso é muito trabalhoso e por isso a maioria dos
programadores nao usa. Se fosse possivel abstrair grande parte desse
trabalho em uma API de alto-nivel, mudaria a situaçao
consideravelmente, sem tirar a possibilidade de cachear dados em
- como se supoe que uma pagina html seria cacheada? A maioria dos
dados uteis que eu queira cachear vao estar estruturados e espalhados
na memoria, como no exemplo do APT. Nao conheco o codigo de nenhum
renderer de html, mas imagino que os dados esetejam em uma arvore (ou
um grafo qualquer), com imagens, widgets e outras coisas penduradas.
Nao é algo facil de se serializar a nao ser que o toolkit usado
suporte serializacao (como no Cocoa). Fazer a serializacao ou cachear
a mao seria um parto e considerando o pouco tempo em que os dados
sobreviviriam no cache (uma sessao de browser passa por dezenas de
paginas e mesmo sem caching, isso ja usa muita memoria), nao é algo
que valha a pena fazer.

Em suma, a ideia parece boa a principio, mas nao sei se é pratica.

Alfredo [Kojima]
> Creio que quando se fala em cache de rendering HTML, seria o resultado
> finalzão (código Display Postscript ou o bitmap da tela), e não a árvore

O problema de cachear bitmap é que usa uma quantidade muito grande de
memoria e nao é a forma que os renderers de HTML trabalham, caso
contrario nao teria como tratar hyperlinks, widgets, redimensionamento

Algo como o volatilefs certamente tem diversos usos, mas deve ser mais
produtivo nao limitar a discussao a esse tipo de cacheamento.

> HTML. Realmente cachear algo no estilo grafo ou árvore, cheio de ponteiros,
> não tinha mesmo passado pela minha cabeça, e só seria prático armazenar
> dentro do esquema que você falou (área contíngua criada via mmap(), como um
> heap particular só para o objeto em questão).

Isso parece ser uma boa ideia (usar um arquivo como heap pra funcoes
de alocacao de memoria padrao). Deve ser possivel elaborar mais e sair
com uma API de cacheamento em disco simples de usar. Vou pensar um
pouco mais e talvez ripar a ideia :D

Alfredo [Kojima]