Sunday, 29 November 2015

Work Steeling, Lazy, Parallel Task Schedular in Jython

Performance, and we are cooking.
Sonic Field is heavily parallel, but I have always wanted to make the used of parallelism really easy. 

The schedular has gone though a lot of changes over the years; this new work steeling feature, I believe, really fixes a lot of the deficiencies in the previous design without adding and complexity to the user experience.

So let's start by an explanation of the schedular in general. This is a task based lazy schedular. We create a closure around a 'task' and then return a 'SuperFuture' for it. The thing about SuperFuture is that it does not do anything until it has to. Unlike some Future based task modules, this one does not compute stuff as it turns up; rather it computes stuff as it is needed. 

When I restarted work on this a few days again I realised I had forgotten how the previous, simpler version worked. To avoid this mistake again, I have very, very heavily commented the code. So, rather than duplicate everything, I leave the code and comments to explain the rest.

Note - this code is all AGPL 3 - please respect copyright.

# For Copyright and License see LICENSE.txt and COPYING.txt in the root directory
import threading
import time
from java.util.concurrent import Callable, Future, ConcurrentLinkedQueue, \
                                 ConcurrentHashMap, Executors, TimeUnit
from java.util.concurrent.locks import ReentrantLock

from java.lang import System
from java.lang import ThreadLocal
from java.lang import Thread
from java.util import Collections
from java.util.concurrent import TimeUnit
from java.util.concurrent.atomic import AtomicLong, AtomicBoolean

"""
Work Stealing Lazy Task Scheduler By Dr Alexander J Turner

Basic Concepts:
===============

- Tasks are closures. 
- The computation of the task is called 'execution'.
- They can be executed in any order* on any thread.
- The data they contain is immutable.
- Making a task able to be executed is called submission.
- Execution is lazy**.
- Tasks are executed via a pool of threads and optionally one 'main' thread.
- Ensuring the work of a task is complete and acquiring the result (if any)
  is called 'getting' the task.

Scheduling Overview:
====================

* Any order, means that subitted tasks can be executed in any order though there
can be an order implicit in their dependencies.

IE taskA can depend on the result of taskB. Therefore.
- taskA submits taskA for execution.
- taskB submits taskC, taskD and taskE for execution.
- taskB now 'gets' the results of taskC, taskD and taskE

In the above case it does not matter which order taskC, taskD and taskE are
executed.

** Lazy execution means that executing a task is not done always at task 
submission time. Tasks are submitted if one of the following conditions is 
met.
- the maximum permitted number of non executing tasks has been reached. See
  'the embrace of meh' below.
- the thread submitting the task 'gets' and of the tasks which have not yet
  been submitted.
- a thread would is in the process of 'getting' a task but would block waiting
  for the task to finish executing. In this results in 'work stealing' (see
  below) where any other pending tasks for any threads any be executed by the
  thread which would otherwise block.

Embrace of meh
==============

Meh, as in a turm for not caring or giving up. This is a form of deadlock in
pooled future based systems where the deadlock is causes by a circular 
dependency involving the maximum number of executors in the pool rather than a
classic mutex deadlock. Consider this scenario:

- There are only two executors X and Y
- taskA executes on X
- taskA sumbits and then 'gets' taskB
- taskB executes on Y
- taskB submits and then 'gets' taskC
- taskC cannot execute because X and Y are blocked
- taskB cannot progress because it is waiting for taskC
- taskA cannot progress because it is waiting for taskB
- the executors cannt free to execute task as they are blocked by taskA and 
  taskB
- Meh

The solution used here is a soft upper limit to the number of tasks submitted
to the pool of executors. When that upper limit is reached, new tasks are 
not submitted for asynchronous execution but are executed immediately by the
thread which submits them. This prevents exhaustion of the available executors
and therefore prevents the embrace of meh.

Exact computation of the number of running executors is non tricky with the
code used here (java thread pools etc). Therefore, the upper limit is 'soft'.
In other words, sometimes more executors are used than the specified limit. This
is not a large issue here because the OS scheduler simply time shares between 
the executors - which are in fact native threads.

The alternative of an unbounded upper limit to the number of executors in not
viable; every simple granular synthesis or parallel FFT work can easily 
exhaust the maximum number of available native threads on modern machines. Also,
current operating systems are not optimised for time sharing between huge
numbers of threads. Finally, there is a direct link between the number of
threads running and the amount of memory used. For all these reasons, a soft
limited thread pool with direct execution works well.

Work Stealing
=============
Consider:
- taskA threadX submits taskB
- taskA threadX gets taskB
- taskB threadY submits taskC and taskD
- taskB threadY gets taskC
- there are no more executors so taskC is executed on threadY
- at this point taskC is pending execution and taskA threadX is waiting for the
- result of taskB on threadY.
- threadX can the stop waiting for taskB and 'steal' taskC.

Note that tasks can be executed in any order on any thread.

"""

# CONSTANTS AND CONTROL VARIABLES
# ===============================

# The maximum number of executors. Note that in general the system
# gets started by some 'main' thread which is then used for work as 
# well, so the total number of executors is often one more than this 
# number. Also note that this is a soft limit, synchronisation between
# submissions for execution is weak so it is possible for more executors
# to scheduled.
SF_MAX_CONCURRENT = int(System.getProperty("sython.threads"))

# Tracks the number of current queue but not executing tasks
SF_QUEUED         = AtomicLong()

# Tracks the number of executors which are sleeping because they are blocked
# and are not currently stealing work
SF_ASLEEP         = AtomicLong()

# Marks when the concurrent system came up to make logging more human readable
SF_STARTED        = System.currentTimeMillis()

# Causes scheduler operation to be logged
TRACE             = str(System.getProperty("sython.trace")).lower()=="true"

# A thread pool used for the executors 
SF_POOL    = Executors.newCachedThreadPool()

# A set of tasks which might be available for stealing. Use a concurrent set so
# that it shares information between threads in a stable and relatively 
# efficient way. Note that a task being in this set does not guarantee it is
# not being executed. A locking flag on the 'superFuture' task management
# objects disambiguates this to prevent double execution. 
SF_PENDING = Collections.newSetFromMap(ConcurrentHashMap(SF_MAX_CONCURRENT*128,0.75,SF_MAX_CONCURRENT))

# EXECUTION
# =========

# Define the logger method as more than pass only is tracing is turned on
if TRACE:
    # Force 'nice' interleaving when logging from multiple threads
    SF_LOG_LOCK=ReentrantLock()
    print "Thread\tQueue\tAsleep\tTime\tMessage..."
    def cLog(*args):
        SF_LOG_LOCK.lock()
        print "\t".join(str(x) for x in [Thread.currentThread().getId(),SF_QUEUED.get(),SF_ASLEEP.get(),(System.currentTimeMillis()-SF_STARTED)] + list(args))
        SF_LOG_LOCK.unlock()
else:
    def cLog(*args):
        pass

cLog( "Concurrent Threads: " + SF_MAX_CONCURRENT.__str__())
    
# Decorates ConcurrentLinkedQueue with tracking of total (global) number of
# queued elements. Also remaps the method names to be closer to python lists
class sf_safeQueue(ConcurrentLinkedQueue):
    # Note that this is actually the reverse of a python pop, this is actually
    # equivalent to [1,2,3,4,5].pop(0).
    def pop(self):
        SF_QUEUED.getAndDecrement()
        r = self.poll()
        SF_PENDING.remove(r)
        return r
    
    def append(self, what):
        SF_QUEUED.getAndIncrement()
        SF_PENDING.add(what)
        self.add(what)

# Python implements Callable to alow Python closers to be executed in Java
# thread pools
class sf_callable(Callable):
    def __init__(self,toDo):
        self.toDo=toDo
    
    # This method is that which causes a task to be executed. It actually
    # executes the Python closure which defines the work to be done
    def call(self):
        cLog("FStart",self.toDo)
        ret=self.toDo()
        cLog("FDone")
        return ret

# Holds the Future created by submitting a sf_callable to the SF_POOL for
# execution. Note that this is also a Future for consistency, but its work
# is delegated to the wrapped future. 
class sf_futureWrapper(Future):
    def __init__(self,toDo):
        self.toDo   = toDo

    def __iter__(self):
        return iter(self.get())
    
    def isDone(self):
        return self.toDo.isDone()
    
    def get(self):
        return self.toDo.get()

# Also a Future (see sf_futureWrapper) but these execute the python closer
# in the thread which calls the constructor. Therefore, the results is available
# when the constructor exits. These are the primary mechanism for preventing
# The Embrace Of Meh.
class sf_getter(Future):
    def __init__(self,toDo):
        self.toDo=toDo
        cLog("GStart",self.toDo)
        self.result=self.toDo()
        cLog("GDone")

    def isDone(self):
        return True

    def get(self):
        return self.result

# Queues of tasks which have not yet been submitted are thread local. It is only
# when a executor thread would become blocked that we go to work stealing. This
# class managed that thread locallity.
# TODO, should this, can this, go to using Python lists rather than concurrent
# linked queues.
class sf_taskQueue(ThreadLocal):
    def initialValue(self):
        return sf_safeQueue()

# The thread local queue of tasks which have not yet been submitted for
# execution
SF_TASK_QUEUE=sf_taskQueue()

# The main coordination class for the schedular. Whilst it is a future
# it actually deligates execution to sf_futureWrapper and sf_getter objects
# for synchronous and asynchronous operation respectively
class sf_superFuture(Future):

    # - Wrap the closure (toDo) which is the actual task (work to do)
    # - Add that task to the thread local queue by adding self to the queue
    #   thus this object is a proxy for the task.
    # - Initialise a simple mutual exclusion lock.
    # - Mark this super future as not having been submitted for execution. This
    #   is part of the mechanism which prevents work stealing resulting in a
    #   task being executed twice.
    def __init__(self,toDo):
        self.toDo=toDo
        queue=SF_TASK_QUEUE.get()
        queue.append(self)
        self.mutex=ReentrantLock()
        self.submitted=False

    # Used by work stealing to submit this task for immediate execution on the
    # the executing thread. The actual execution is delegated to an sf_getter
    # which executes the task in its constructor. This (along with submit) use
    # the mutex to manage the self.submitted field in a thread safe way. No
    # two threads can execute submit a super future more than once because
    # self.submitted is either true or false atomically across all threads. 
    # The lock has the other effect of synchronising memory state across cores
    # etc.
    def directSubmit(self):
        # Ensure this cannot be executed twice
        self.mutex.lock()
        if self.submitted:
            self.mutex.unlock()
            return
        self.submitted=True
        self.mutex.unlock()
        # Execute
        self.future=sf_getter(self.toDo)
    
    # Normal (non work stealing) submition of this task. This might or might not
    # result in immediate execution. If the total number of active executors is
    # at the limit then the task will execute in the calling thread via a
    # sf_getter (see directSubmit for more details). Otherwise, the task is 
    # submitted to the execution pool for asynchronous execution.
    #
    # It is important to understand that this method is not called directly
    # but is called via submitAll. submitAll is the method which subits tasks
    # from the thread local queue of pending tasks.
    def submit(self):
        # Ensure this cannot be submitted twice
        self.mutex.lock()
        if self.submitted:
            self.mutex.unlock()
            return
        self.submitted=True
        self.mutex.unlock()
        
        # See if we have reached the parallel execution soft limit
        count=SF_POOL.getActiveCount()
        cLog("Submit")
        if count<SF_MAX_CONCURRENT:
            # No, so submit to the thread pool for execution
            task=sf_callable(self.toDo)
            self.future=sf_futureWrapper(SF_POOL.submit(task))
        else:
            # Yes, execute in the current thread
            self.future=sf_getter(self.toDo)
        cLog("Submitted")

    # Submit all the tasks in the current thread local queue of tasks. This is
    # lazy executor. This gets called when we need results. 
    def submitAll(self):
        queue=SF_TASK_QUEUE.get()
        while(len(queue)):
            queue.pop().submit()

    # The point of execution in the lazy model. This method is what consumers
    # of tasks call to get the task executed and retrieve the result (if any).
    # This therefore acts as the point of synchronisation. This method will not
    # return until the task wrapped by this super future has finished executing.
    #
    # A note on stealing. Consider that we steal taskA. TaskA then invokes
    # get() on taskB. taskB is not completed. The stealing can chain here; 
    # whilst waiting for taskB to complete the thread can just steal another
    # task and so on. This is why we can use the directSubmit for stolen tasks.  
    def get(self):
        cLog( "Submit All")
        # Submit the current thread local task queue
        self.submitAll()
        cLog( "Submitted All")
        t=System.currentTimeMillis()
        c=1
        # There is a race condition where by the submitAll has been called
        # which recursively causes another instance of get on this super future
        # which results in there being no tasks to submit but we get to this 
        # point before self.future has been set. This tends to resolve itself
        # very quickly as the empty calls to submit all do not do very much work 
        # so the origianl progresses and sets the future. Rather than a complex
        # and potentially brittle locking system, we just spin waiting for the
        # future to be set. This works fine on my Mac as it only ever seems to 
        # spin once, so the cost is pretty much the same as a locking approach
        # basically, one quantum. If this starts to spin a lot in the future
        # a promise/future approach could be used. 
        while not hasattr(self,"future"):
            c+=1
            Thread.yield()
        t=System.currentTimeMillis()-t
        cLog( "Raced: ", c ,t)
        
        # This is where the work stealing logic starts
        # isDone() tells us if the thread would block if get() were called on
        # the future. We will try to work steal if the thread would block so as
        # not to 'waste' the thread. This if block is setting up the
        # log/tracking information
        if not self.future.isDone():
            SF_ASLEEP.getAndIncrement();
            cLog("Sleep")
            nap=True
        else:
            nap=False
            cLog("Get")
        # back control increasing backoff of the thread trying to work steal.
        # This is not the most efficient solution but as Sonic Field tasks are
        # generally large, this approach is OK for the current use. We back of
        # between 1 and 100 milliseconds. At 100 milliseconds we just stick at
        # polling every 100 milliseconds.
        back=1
        # This look steal work until the get() on the current task will 
        # not block
        while not self.future.isDone():
            # Iterate over the global set of pending super futures
            # Note that the locking logic in the super futures ensure 
            # no double execute.
            it=SF_PENDING.iterator()
            while it.hasNext():
                try:
                    # reset back (the back-off sleep time)
                    back=1
                    toSteal=it.next()
                    it.remove()
                    cLog("Steal",toSteal.toDo)
                    # Track number of tasks available to steal for logging
                    SF_ASLEEP.getAndDecrement();
                    # The stollen task must be performed in this thread as 
                    # this is the thread which would otherwise block so is
                    # availible for further execution
                    toSteal.directSubmit()
                    # Now we manage state of 'nap' which is used for logging
                    # based on if we can steal more (i.e. would still block)
                    # Nap also controls the thread back off
                    if self.future.isDone():
                        nap=False
                    else:
                        nap=True
                        SF_ASLEEP.getAndIncrement();
                except Exception, e:
                    # All bets are off
                    cLog("Failed to Steal",e.getMessage())
                    # Just raise and give up
                    raise
            # If the thread would block again or we are not able to steal as
            # nothing pending then we back off. 
            if nap==True:
                if back==1:
                    cLog("Non Pending")
                Thread.sleep(back)
                back+=1
                if back>100:
                    back=100
        if nap:
            SF_ASLEEP.getAndDecrement();
        # To get here we know this get will not block
        r = self.future.get()
        cLog("Wake")
        # Return the result if any
        return r

    # If the return of the get is iterable then we delegate to it so that 
    # this super future appears to be its embedded task
    def __iter__(self):
        return iter(self.get())
            
    # Similarly for resource control for the + and - reference counting 
    # semantics for SF_Signal objects
    def __pos__(self):
        obj=self.get()
        return +obj

    def __neg__(self):
        obj=self.get()
        return -obj

# Wrap a closure in a super future which automatically 
# queues it for future lazy execution
def sf_do(toDo):
    return sf_superFuture(toDo)
    
# An experimental decorator approach equivalent to sf_do
def sf_parallel(func):
    def inner(*args, **kwargs): #1
        return func(*args, **kwargs) #2
    return sf_do(inner)

# Shut the execution pool down. This waits for it to shut down
# but if the shutdown takes longer than timeout then it is 
# forced down.
def shutdown_and_await_termination(pool, timeout):
    pool.shutdown()
    try:
        if not pool.awaitTermination(timeout, TimeUnit.SECONDS):
            pool.shutdownNow()
            if not pool.awaitTermination(timeout, TimeUnit.SECONDS):
                print >> sys.stderr, "Pool did not terminate"
    except InterruptedException, ex:
        pool.shutdownNow()
        Thread.currentThread().interrupt()

# The default shutdown for the main pool  
def shutdownConcurrnt():
    shutdown_and_await_termination(SF_POOL, 5)

Sunday, 1 November 2015

Scary Halloween From Sonic Field

I am a very lucky person! My piece of music 'Fever' was plaid at a a Danish Halloween event for thousands of people to hear.

Many, many thanks to Henrik Graham who, most politely, asked permission to use my piece Fever for the music for a presentation at Halloween in Måløv which is a Danish halloween festival help at a park in Måløv Denmark.

There is the rather excited email I received from Henrik last night (the night of halloween).

" Hi Alexander Just want to drop you a note thanking you for the opportunity to share your music with 5.000 guest at Halloween in Måløv. It created an awesome ‘sound-installation’ in the park. Everyone I talked to thought it was spot on and provided the experience of parading through the park a new spooky dimension.

I’ll let you know once the pictures of the 2015 event is posted on the website: http://www.halloweenimaaloev.dk/ TV2 (one of 2 major Danish broadcasting companies) visited and made a TV spot that was broadcasts yesterday – you can actually hear your music at the beginning and at the very end of the clip 43-45 sec. just before the end at 45 sec. See and listen here: http://www.tv2lorry.dk/artikel/285660

If you interested in the parade here is the link to the programme: http://www.halloweenimaaloev.dk/assets/halloweenkort_ruten_2015_web.pdfthe The PA was setup to cover post 24 to 32. Below I’ve marked the PA setup with red dots app. 4kw Turbosound setup.

The blue dots maks the route through the park.
In a sec I’ll be emailing the organizer asking them to pay tribute to you by posting a thank you note on their website. Would you like anything special or will something like “Thank you to Dr. Alexander J Turner…” do? All the Best Henrik Graham "


Here is Fever on youtube:


Also see see it and music I have made on my 'Greatest Hits' page (note the irony).