Saturday 16 February 2013

Automated Large Object Swapping Part I


 

[Image: a bird - "I don't like that title!" - Creative Commons]
I hate that title! But it does describe what has been keeping me up nights for a week now. The challenge: make swapping large objects to disk completely automatic, without using too much disk and without losing efficiency.


I have a system which hits these goals now; it is far too complex for a single Nerds Central post, so I will discuss it over a few posts. This one is an introduction and a description of the algorithms used.

The challenge comes when Sonic Field is performing very large renders. However, I believe the solution is general purpose. 

The particular piece which caused all the trouble was 'Further Into The Caverns'. You see, I have made a new memory management system to deal with vocoding. The idea I used for that was to keep track of all the SFData objects (audio data, in effect) via weak references. When the total number of live SFData objects went above a particular level, some were written to disk. The simple approach worked on the idea that 'disks are big and cheap, so we will just keep adding to the end of the file'. This worked fine for small renders with big memory requirements, but for long running renders it was not so good: I would have needed over a terabyte of storage for Further Into The Caverns.

Whilst all this is about audio in my case, I suspect the problem is a more general one. I suspect that any large scale computation could hit similar issues and may well benefit from my work. Here is a general description of the challenge.

  1. A program is performing very large computations on very large (multi-megabyte) in memory objects.
  2. Reading and writing objects from disk is very slow compared to storing in memory so should be avoided.
  3. Memory is insufficient to store the peak level of live large objects the program uses.
  4. Disk is large, but the amount of data used by the system will consume all the disk if disk space is not re-used.
  5. The program/system should handle the memory management automatically.
  6. Using normal operating system supplied swap technology does not work well.
This last point is something I tend to see a lot with Java. Because the JVM is one process with a huge heap in which objects are constantly being moved around, regular OS-style swapping just cannot cope. A JVM has to fit in RAM or it will thrash the machine.

Trial And Error
I wish I could say I designed this system from scratch and it worked first time. The reality is that many different ideas and designs went by the wayside before I got one which worked well. I will not bore you with all the failures; I believe it is sufficient to say that the approach I took is born of hard kicks and a lot of stress testing.

Object Life Cycle
Sonic Field's model has a particular object life cycle which makes it possible to perform the disk swap technique without a significant performance impact where swapping is not occurring. This is achieved by wrapping large objects in memory manager objects when the former are not actively involved in an algorithm's tight loop. Thus the overhead of the extra indirection (of the memory manager) is not incurred inside tight loops:
  1. Creation
  2. Filling with data
  3. Wrapping in memory manager object
  4. In RAM Storage
  5. Retrieval
  6. Unwrapping - retain wrapper
  7. Algorithmic manipulation (read only)
  8. Return to 3, or
  9. Garbage collected
  10. Wrapper garbage collected
The key is that large objects are unwrapped when in use. However, when not actively being used in an algorithm, they are referenced only from the wrapper. The memory manager can store the large object on disk and replace the reference in the wrapper with information on where the object is stored on disk. When the object is requested for an algorithm again, it can be retrieved from disk.
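To make the life cycle concrete, here is a minimal sketch of the wrapper idea. The names here (SwapFile, SwappableAudioData, unwrap, swapOut) are illustrations for this post, not the actual Sonic Field classes:

import java.io.IOException;

// Illustrative swap-file interface - a stand-in, not the real Sonic Field code.
interface SwapFile
{
    long write(double[] data) throws IOException;     // returns the slot position
    double[] read(long position) throws IOException;  // reads a slot back in
}

// The wrapper: holds either the large object itself or where it lives on disk.
final class SwappableAudioData
{
    private double[] data;           // non-null whilst the data is resident in RAM
    private long filePosition = -1;  // valid whilst the data is swapped out

    SwappableAudioData(double[] data)
    {
        this.data = data;
    }

    // Unwrap for use inside a tight loop, reading back from disk if necessary.
    synchronized double[] unwrap(SwapFile swap) throws IOException
    {
        if (data == null)
        {
            data = swap.read(filePosition);
            filePosition = -1;
        }
        return data;
    }

    // Called by the memory manager when heap headroom runs low.
    synchronized void swapOut(SwapFile swap) throws IOException
    {
        if (data != null)
        {
            filePosition = swap.write(data);
            data = null;  // drop the only strong reference to the big array
        }
    }
}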

When To Store
This was a tricky one! It turned out that the difference between the maximum heap and the available heap (as reported by the Java Runtime object) is the best way of triggering the swapping out of objects. Just before a new large object is allocated, the memory manager checks whether the maximum heap, less the current heap use, less the (approximate) size of the object, is below a threshold. If it is, all currently in-memory large objects are scheduled for swapping out. The details of this swapping out scheduling are complex and I will cover them in another post. There are other points at which the threshold is checked as well, though the pre-allocation one is the most important.
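In code, the pre-allocation check amounts to something like the sketch below. The threshold value, class and method names are assumptions for illustration; only the Runtime calls are the real JVM API:

// Sketch of the pre-allocation headroom check using the standard Runtime calls.
final class HeapGuard
{
    private static final long LOW_WATER_MARK = 256L * 1024 * 1024; // illustrative, needs tuning

    // Call just before allocating a large object of roughly 'size' bytes.
    static boolean shouldSwapOut(long size)
    {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();  // heap currently in use
        long headroom = rt.maxMemory() - used - size;    // what would be left afterwards
        return headroom < LOW_WATER_MARK;
    }
}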

[Image: "I'm all sort of torn up and fragmentary." - Creative Commons]
Non Fragmenting Swap File Algorithm
All this sounds fine, but it really is not enough. A simple application of this idea was more than capable of consuming an entire 1 terabyte disk! The problem is fragmentation (if you are interested in the subject - here is a good place to start looking).

Let me explain in a little bit more detail:
Consider that I write out ten 1 gigabyte objects. Now my swap file contains 10 slots, each of which is approximately 1 gigabyte long (there are a few header bytes as well). Now - the JVM garbage collector may eventually garbage collect the wrapper object which contains the information about one of those objects. This is detected by the memory manager when a weak reference to that wrapper returns null from its .get() method. OK! Now we have a slot that is free to be used again.
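The bookkeeping for a slot can be sketched roughly as follows, reusing the SwappableAudioData wrapper from the earlier sketch. The fields shown are my guess at the minimum needed, not the real code:

import java.lang.ref.WeakReference;

// Sketch of per-slot bookkeeping. A slot is free once the wrapper which owned
// it has been garbage collected, which its weak reference reveals.
final class Slot
{
    long position;  // offset of the slot in the swap file
    long length;    // size of the slot in bytes
    private final WeakReference<SwappableAudioData> owner;

    Slot(long position, long length, SwappableAudioData owner)
    {
        this.position = position;
        this.length = length;
        this.owner = owner == null ? null : new WeakReference<>(owner);
    }

    boolean isFree()
    {
        return owner == null || owner.get() == null;
    }
}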

Unfortunately, this 1 gigabyte slot might well become filled with a 10 megabyte object. We could then create a new slot out of the remaining 990 megabytes, but we have made sure that if we were then to need another gigabyte, it would have to go on the end of the swap file, thus increasing swap usage.

It turns out that this process of fragmentation just keeps happening. Rather than (as I had originally thought) levelling off so that the swap file grows asymptotically towards a maximum, the reality is that the file just grows and grows until whatever medium it is on fills up. Clearly a better algorithm is required.

The next obvious step is slot merging, whereby if two slots next to each other are found to be empty (their weak references to the file info objects have been cleared) they can be merged to create a bigger slot. There is no doubt this helped, but not enough to run Further Into The Caverns in less than 250G of file space. I became ambitious and wanted to run that render on my MacBook without an external drive, which meant getting the swap down below 100G or so (it has a 250G SSD, and it is not good for SSDs to be filled completely).
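A merge pass over the slot list (ordered by file position, with slots assumed to be contiguous in the file) can be sketched like this, building on the Slot class above:

import java.util.ArrayList;
import java.util.List;

// Sketch of merging adjacent free slots.
final class SlotMerger
{
    static List<Slot> mergeFree(List<Slot> slots)
    {
        List<Slot> merged = new ArrayList<>();
        for (Slot slot : slots)
        {
            Slot last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.isFree() && slot.isFree())
            {
                // Two adjacent free slots: grow the earlier one to cover both.
                last.length += slot.length;
            }
            else
            {
                merged.add(slot);
            }
        }
        return merged;
    }
}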

So, an even better algorithm was still needed. I could have used one like the buddy algorithm (linked above) or something based on the idea that Sonic Field objects come in similar sizes. However, due to the point of indirection between wrapper objects and the swap file, a near perfect option exists.

Compaction - Simple and Effective
[Image: a rubbish compacting lorry - "Squash it, move it, dump it." - Creative Commons]
Yep - garbage is better off squashed - makes it easier to deal with and move around.

Let us think of the swap file as a set of slots; those with stuff in I will represent as [*] and empty ones (i.e. space which can be reused) as [ ].

[*][ ][*][ ][ ][*]
 1  2  3  4  5  6

If we swap 3 and 2 we get:

[*][*][ ][ ][ ][*]
 1  3  2  4  5  6


Now we can merge 2, 4 and 5:

[*][*][       ][*]
 1  3  2  4  5  6


Finally, we swap the new large empty slot with 6:

[*][*][*][       ]
 1  3  6  2  4  5 


With a single pass of the swap file we have moved all the empty space to one large slot at the end. Next time something is written to the file the big slot at the end can be split into two.
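A single pass of compaction over the slot list can be sketched as below. Here moveData() stands in for the actual disk copy and is an assumption on my part, and the sketch assumes wrappers locate their data through the Slot objects (the point of indirection mentioned earlier), so updating a slot's position is enough:

import java.util.ArrayList;
import java.util.List;

// Sketch of a single-pass compaction: live slots are copied down towards the
// start of the file and all the free space ends up in one slot at the end.
final class Compactor
{
    interface DataMover
    {
        void moveData(long from, long to, long length); // copy bytes within the swap file
    }

    static List<Slot> compact(List<Slot> slots, DataMover mover, long fileLength)
    {
        List<Slot> compacted = new ArrayList<>();
        long writePosition = 0;
        for (Slot slot : slots)
        {
            if (!slot.isFree())
            {
                if (slot.position != writePosition)
                {
                    // Copy the live data down into the free space below it.
                    mover.moveData(slot.position, writePosition, slot.length);
                    slot.position = writePosition;  // the wrapper finds it via the slot
                }
                compacted.add(slot);
                writePosition += slot.length;
            }
        }
        // Everything from writePosition to the end of the file is one free slot.
        if (writePosition < fileLength)
        {
            compacted.add(new Slot(writePosition, fileLength - writePosition, null));
        }
        return compacted;
    }
}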

The Final Result
Moving slots is an expensive operation because it means copying data from one part of the disk to another. As a result, running the full compaction algorithm every time we need to find some space in the swap file is not such a good idea. It turns out that merging adjacent free slots, and splitting slots when not all of one is used, can keep the swap file from growing for a while. Eventually, though, it gets badly fragmented and starts growing quickly. The compromise I have found works OK (but I am sure could be tweaked to improve it) is to use the fast merge/split approach most of the time, but at random intervals - on average every 100 times the memory manager looks for space - to perform a full compaction. I went for the random interval approach to ensure that no cycles develop between loops in a patch and the memory algorithms.
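The random trigger itself can be as simple as this sketch (the class and method names are, again, just for illustration):

import java.util.Random;

// Sketch of the randomised trigger: on average one space request in a hundred
// performs a full compaction; the rest take the cheap merge/split path.
final class CompactionPolicy
{
    private static final Random RANDOM = new Random();

    static boolean fullCompactionDue()
    {
        // An expected interval of 100 calls with no fixed cycle that the
        // loops in a patch could fall into step with.
        return RANDOM.nextInt(100) == 0;
    }
}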

Did it work? Yes - Further Into The Caverns renders with just 5.8 gigabytes of swap file!
