Monday, February 9, 2009

Casting magic against segfaults

The problem

The Sys::Mmap module has been around for years, but it has a few issues. For example, let's take this piece of code:

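use Sys::Mmap;
# '+<' opens read-write without truncating ('+>' would empty the file first)
open my $fd, '+<', 'filename' or die "Cannot open filename: $!";
mmap my $var, -s $fd, PROT_READ|PROT_WRITE, MAP_SHARED, $fd, 0;
$var = 'Foobar';
munmap $var;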

First of all, it's simply not user-friendly. mmap takes 6 arguments in a weird order, and uses weird constants. Also, munmap shouldn't be necessary: variables should dispose of themselves when they go out of scope.

But more importantly, this program does not do what you think it does, though the only hint of that is an Invalid argument exception when doing munmap. During the assignment, the link between the mapping and the variable is lost, so nothing is written to the file. Worse yet, this can even lead to a segfault in some circumstances.

Ouch!
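To make the failure mode concrete, here is a rough C analogy. It is only a sketch, not what perl does internally: the "variable" is a plain pointer, and file_sees_write and DEMO_LEN are made-up names. It assumes a POSIX system. Writing through the pointer changes the file; pointing the variable at a freshly allocated buffer, which is roughly what an assignment can end up doing, leaves the file untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_LEN 16  /* size of the demo file in bytes (arbitrary) */

/*
 * Creates `path`, maps it, and stores "Foobar" either in place (through the
 * mapping) or by pointing the variable at a fresh buffer.
 * Returns 1 if the file starts with "Foobar" afterwards, 0 if it does not,
 * and -1 if a system call or allocation fails. The file is removed again.
 */
int file_sees_write(const char *path, int in_place)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, DEMO_LEN) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, DEMO_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    char *var = map;      /* the "variable" starts out linked to the map */
    char *fresh = NULL;

    if (in_place) {
        memcpy(var, "Foobar", 6);       /* writes through to the file */
    } else {
        fresh = malloc(7);
        if (fresh != NULL) {
            memcpy(fresh, "Foobar", 7);
            var = fresh;                /* link lost: var no longer points into the map */
        }
    }

    /* Flush the mapping, then read the file back through the descriptor. */
    char head[6] = {0};
    int ok = in_place || fresh != NULL;
    ok = ok && msync(map, DEMO_LEN, MS_SYNC) == 0;
    ok = ok && pread(fd, head, sizeof head, 0) == (ssize_t)sizeof head;
    int seen = ok ? (memcmp(head, "Foobar", 6) == 0) : -1;

    free(fresh);
    munmap(map, DEMO_LEN);
    close(fd);
    unlink(path);
    return seen;
}
```

Calling file_sees_write(path, 1) reports that the file saw the write, while file_sees_write(path, 0) reports that it did not, even though the variable holds "Foobar" in both cases.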

Tying things up?

The documentation clearly says that you shouldn't do this (or anything else that changes the length of the variable), but IMHO this hole shouldn't be left open in the first place, if only because it is extremely counterintuitive (and thus a maintenance nightmare). Modules should fail more gracefully than this.

Sys::Mmap offers a tied interface as compensation, but this didn't work out. The tied interface is indeed safe, but it creates another problem.

Every time it is read, it copies the whole file into the variable. Every time the variable is modified, it writes the whole new value to the file, even if the change only affects a single byte.

Ouch!

Obviously, that doesn't scale at all. One user of the module reported a 10-20 times slowdown of his program after converting to ties. That's not a workable solution.

The solution

Perl has a powerful but rarely used feature called magic. (Its rare use by module authors is indicated by the fact that the prototypes of the magic virtual table as documented in perlguts aren't even complete: they lack pTHX_'s.) The perl core uses it to implement magic variables such as $!, and also ties (surprise, surprise). It offers 8 hooks into different stages of handling a variable, the three most important being fetching (svt_get), storing (svt_set) and destruction (svt_free).
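If the idea of a hook table is unfamiliar, the following toy model in plain C may help. It is not Perl's API: toy_var, toy_vtbl and the toy_* functions are invented for illustration, and only three of the hooks are modelled. The point is just the shape: a variable carries a pointer to a table of callbacks, and the code that reads, writes or destroys the variable calls whichever callbacks are present.

```c
#include <assert.h>
#include <stddef.h>

struct toy_var;

/* Hypothetical hook table; a NULL entry means "no hook installed". */
struct toy_vtbl {
    int (*on_get)(struct toy_var *);   /* run before a read  (cf. svt_get)  */
    int (*on_set)(struct toy_var *);   /* run after a write  (cf. svt_set)  */
    int (*on_free)(struct toy_var *);  /* run at destruction (cf. svt_free) */
};

struct toy_var {
    int value;
    const struct toy_vtbl *vtbl;  /* NULL for a plain, non-magical variable */
    int set_calls;                /* bookkeeping for the demo */
    int freed;
};

/* Read the variable, giving the get hook a chance to run first. */
int toy_fetch(struct toy_var *v)
{
    assert(v != NULL);
    if (v->vtbl && v->vtbl->on_get)
        v->vtbl->on_get(v);
    return v->value;
}

/* Write the variable, then run the set hook. */
void toy_store(struct toy_var *v, int value)
{
    assert(v != NULL);
    v->value = value;
    if (v->vtbl && v->vtbl->on_set)
        v->vtbl->on_set(v);
}

/* Destroy the variable, running the free hook. */
void toy_destroy(struct toy_var *v)
{
    assert(v != NULL);
    if (v->vtbl && v->vtbl->on_free)
        v->vtbl->on_free(v);
}

static int count_set(struct toy_var *v)  { v->set_calls++; return 0; }
static int mark_freed(struct toy_var *v) { v->freed = 1;   return 0; }

/* A table with no get hook, only set and free hooks. */
static const struct toy_vtbl toy_mmap_vtbl = { NULL, count_set, mark_freed };
```

A variable initialised with &toy_mmap_vtbl has count_set run after each toy_store and mark_freed run on toy_destroy, while toy_fetch simply returns the value because no get hook is installed.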

In my case, I didn't need get magic, but I did use set and free magic. Freeing the variable is not that interesting (simply unmapping the variable), but setting it is. This function is called just after every write to the variable:

static int mmap_write(pTHX_ SV* var, MAGIC* magic) {
    struct mmap_info* info = (struct mmap_info*) magic->mg_ptr;
    /* If the string buffer no longer points at the mapping, the link was broken. */
    if (SvPVX(var) != info->address) {
        if (ckWARN(WARN_SUBSTR))
            warn("Writing directly to a memory mapped file is not recommended");
        /* Copy the new value into the map, drop the stray buffer, and relink. */
        Copy(SvPVX(var), info->address, MIN(SvLEN(var) - 1, info->length), char);
        SvPV_free(var);
        reset_var(var, info);
    }
    return 0;
}

The function checks whether the variable is still linked to the map. If it isn't, it copies the new value into the map, frees the stray buffer and restores the link. As copying is potentially expensive, it issues a warning if warnings (or actually, 'substr' warnings) are enabled.
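The same check-copy-relink logic can be tried outside of perl with a small stand-alone model. Everything here is hypothetical (toy_scalar, toy_map_info, toy_assign, toy_after_store, toy_reset); an ordinary char array plays the part of the mapped region, and the copy length is simplified to the string length rather than the allocated length minus one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not Perl's SV or the module's real struct. */
struct toy_map_info {
    char  *address;   /* start of the "mapped" region */
    size_t length;    /* size of that region in bytes */
};

struct toy_scalar {
    char  *buf;       /* current string buffer */
    size_t cur;       /* bytes of string data in buf */
    int    owns_buf;  /* nonzero if buf was malloc'ed and must be freed */
};

/* Point the scalar (back) at the mapping; plays the role of reset_var. */
static void toy_reset(struct toy_scalar *var, const struct toy_map_info *info)
{
    var->buf = info->address;
    var->cur = info->length;
    var->owns_buf = 0;
}

/* Simulate an assignment that replaces the buffer. Returns 0, or -1 on OOM. */
int toy_assign(struct toy_scalar *var, const char *value)
{
    size_t n = strlen(value);
    char *fresh = malloc(n + 1);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, value, n + 1);
    if (var->owns_buf)
        free(var->buf);
    var->buf = fresh;
    var->cur = n;
    var->owns_buf = 1;
    return 0;
}

/*
 * Set hook: meant to run after every store. Returns 1 if it had to repair
 * the link, 0 if the scalar was still pointing at the mapping.
 */
int toy_after_store(struct toy_scalar *var, const struct toy_map_info *info)
{
    assert(var != NULL && info != NULL);
    if (var->buf == info->address)
        return 0;                        /* in-place write, nothing to do */

    size_t n = var->cur < info->length ? var->cur : info->length;
    memcpy(info->address, var->buf, n);  /* copy the new value into the map */
    if (var->owns_buf)
        free(var->buf);                  /* release the stray buffer */
    toy_reset(var, info);                /* restore the link */
    return 1;
}
```

After toy_assign the scalar points at a private buffer; one call to toy_after_store copies as much of the new value as fits into the region and points the scalar back at it. A value shorter than the region overwrites only its leading bytes and leaves the rest as it was.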

There is no perfect solution to this problem, but getting a friendly warning is undeniably better than getting a segmentation fault or data loss.

Anyway, you can find Sys::Mmap::Simple here. It offers more goodies, such as portability to Windows, a greatly simplified interface, and built-in thread synchronization.