Montag, 5. November 2012

Scala Closures as HyperGraphDB Transactions

Transactions in HyperGraphDB follow an MVCC (multi-version concurrency control) model where a transaction can be aborted midway simply because there's a conflict with another transaction. Unlike a locking model, where a thread waits until it can acquire a lock and perform the required operation, MVCC transactions never wait and deadlocks never occur. On the other hand, conflicts are not uncommon and one must be prepared to deal with them. A conflict can be detected either while the transaction is running or during commit time. Regardless, the way to handle a conflict is simply to retry the transaction and do so until it succeeds.

In this short post, I will show how to write proper HyperGraphDB transactions in Scala. At the end, you can take the result transact generic method and use it as a global utility in your projects. Let's start with a basic template and build from there. Here's a first attempt at encapsulating code in a database transaction:

    var graph:HyperGraph = HGEnvironment.get("/tmp/hgdbinstance")
    graph.getTransactionManager().beginTransaction()
    graph.add("Hello World.")
    graph.getTransactionManager().commit()

The code between beginTransaction and endTransaction here is irrelevant - it can anything reading and writing to the database. The first and obvious problem with the above is that it ignore exceptions that may occur during the transaction. So a second approximation might look like this:
    
    var graph:HyperGraph = HGEnvironment.get("/tmp/hgdbinstance")
    graph.getTransactionManager().beginTransaction()
    try {
      graph.add("Hello World.")
      graph.getTransactionManager().commit()
    }
    catch {
      case e => {
        graph.getTransactionManager().abort();
      }
    }
Now, as I mentioned above any database operation as well as the final commit may fail due to a conflict with another transaction. If that happens, the transaction must be repeated. Since we can't know in advance how many times such a conflict can occur, we have to repeat the transaction in a loop. Hence the code grows to the following:
    
    var done = false
    while (!done) {
        graph.getTransactionManager().beginTransaction()
        try {
          graph.add("Hello World.")
          graph.getTransactionManager().commit()
          done = true
        }
        catch {
          case e => {
            graph.getTransactionManager().abort();
            if (!graph.getStore().getTransactionFactory().canRetryAfter(e))
              throw e
          }
        }
    }
Since Scala doesn't have continue/break in loop, we use a boolean to break out. The logic here is simply: we try to execute our task, commit and exit the loop. In case of an exception, we must loop again when we're dealing with a conflict. The canRetryAfter method tells us precisely that. Its implementations basically checks for a few types of exceptions that indicate conflict and that may depend on the storage layer. So, the above will loop as many times as takes for the transaction to succeed. What's the guarantee that it will ever succeed (you may ask)? The guarantee is that the database will select which transaction to abort at random. Obviously, if you have a really long running transaction in a highly concurrent environment, it may take a really long time to succeed. While one can construct such a situation artificially, it is unlikely to occur in practice. Also, it must be noted that MVCC transactions can conflict only if they both read and write. A read-only or a write-only transaction will always succeed regardless of how long it is.

So far so good. This is already a lot boilerplate code for what will frequently be a couple of db operations. The HyperGraphDB API lets you encapsulate your transaction in a java.util.Concurrent.Callable so you don't have to write all that code:

        
    graph.getTransactionManager().transact(new Callable[Object] {
      def call():Object = {
         graph.add("Hello World.")      }
    })
But we are using Scala which already has a much nicer syntax for closures than Runnable or Callable. So we'd prefer to write something like this instead:
        
    transact(Unit => {
        graph.add("Hello World.")
    })

And the transact method could just be a utility that delegates to the HyperGraphDB Callable-based API. Except that doesn't work. The problem with that strategy was what actually prompted the writing this blog post. The reason is that Scala compiler uses Java exceptions to implement return statements in the middle of a function block. So if a the function you pass to transact has some logic with a return statement before the end of the function, it will throw an exception of type NonLocalReturnControl. However that exception is caught by the HyperGraphAPI which considers it abnormal and the transaction fails. For more, see the this forum thread:  http://www.scala-lang.org/node/6759. Instead of debating whether HyperGraphDB is at fault here for catching all throwables, we can "just" reimplement the transaction loop in Scala taking that exception into consideration. Here is the full version:
        
  def transact[T](code:Unit => T):T = {
    var result:T = null.asInstanceOf[T]
    var done = false
    var commit = true
    while (!done) {
      graph.getTransactionManager().beginTransaction()
      try {
        result = code()
        commit = true;
      } 
      catch {
        case e:HGUserAbortException => {
          commit = false
          done = true
        }
        case e:scala.runtime.NonLocalReturnControl[T] => {            
            result = e.value
            commit = true;
        }
        case e => {
          commit = false          
          if (!graph.getStore().getTransactionFactory().canRetryAfter(e))
            throw e
        }
      }     
      finally try {
        graph.getTransactionManager().endTransaction(commit)
        done = done || commit
      } 
      catch {
        case e =>  {
          if (!graph.getStore().getTransactionFactory().canRetryAfter(e))
            throw e
        }        
      }
    }
    return result
  }

You can put this method in whatever utility package you have and call this transaction method instead of the HyperGraphDB one. Let me make a few observation before leaving:

  • Note the rather ugly initialization of the local result variable. This is the only way Scala would type check because otherwise it doesn't know at compile time what should the default value of a generic parameter T be. 
  • The logic is governed by two booleans: commit and done. Commit says whether we should commit or abort, and done says whether we should exit from the loop.
  • The HGUserAbortException is part of the HyperGraphDB API and part of the contract of executing transactions. User code may throw this exception to cause the transaction to be aborted. This is the only way you can abort a transaction due to application logic when you are inside this transact method. Otherwise, transact has no way of knowing if the transaction was legitimately aborted or something went wrong.
If you have question about this code or any other aspect of MVCC transactions in HyperGraphDB, don't hesitate to post a comment or ask on the discussion forum.

5 Kommentare:

  1. thanks for the contribution!

    Some thoughts on how to get a more idiomatic scala solution:
    - one of the paradigma shifts of scala is to avoid mutable variables and statements, preferring expressions and immutable values. The advantage is more reliable code, especially in multithreaded environment.
    I can be wrong, but as far as I can see, using one of the following two constructs, can avoid both, mutable variables and the while loop.
    http://stackoverflow.com/a/3063092
    or
    http://stackoverflow.com/a/3064195
    I currently don't have access to my PC to dig into and try it out myself

    - if the above doesn't work:
    assigning null to the return value as a means to describe success or failure of some operation is considered a strong anti-pattern in scala, since null can be a valid value, which can be seen here:
    scala> null.asInstanceOf[Int]
    res0: Int = 0

    The idiomatic alternative would be to use the Option[T] wrapper. Preferably if possible, usage as the return type of def transact. So usage would be like this:
    var result : Option[T] = None
    ...
    ..

    try {
    result = Option(code())
    ...

    if transaction return type should stay T not Option[T]
    return result.getOrElse(someDefaultValue)

    Besides avoiding null, there are other reasons why Options are a worth the indirection, i.e. they can take part in for comprehensions / any monadic operations.


    Will think about how to rewrite asap.

    best
    ingvar

    AntwortenLöschen
  2. Hey

    I'm not assigning null as a means to describe success. I have boolean variables for that. I'm assigning null only because Scala requires me to assign something here. Basically, while the type is generic and unknown at compile time, Scala asks me to provide an appropriate initial value because it doesn't know what that could be (given that the type is unknown). But I have the same problem..I don't the type either, so I can't provide a value :)

    AntwortenLöschen
  3. Sidenote concerning NonLocalReturn: it has to be dealt with, but the case can and should be avoided already when writing scala code, because usage of exceptions have a non-negligable performance impact, amongst other reasons.
    Idiomatic scala is not supposed to use return at all. Either put result value in the last line of a method, or, in order achieve early returns, one uses functions like find or collectFirst. Quick test in the REPL:
    fast:
    (1 to 1000000000).find(i => i*3 > 100)
    slower:
    (1 to 1000000000).find(i => i > 100000*3)

    additional ref: this answer on stackoverflow demonstrates how to avoid complex while /for loops with early returns: http://stackoverflow.com/a/9707994

    AntwortenLöschen
  4. Idiomatic scala versions of transact can be found here:
    https://gist.github.com/4055504

    Implementation is expression based, uses recursion instead of while loop, and is tail recursion optimized (no stackoverflow/outofmemory, performance equivalent to while loop).
    It uses functional Either as abstraction, an Either[T,R] can be either a Left(someThrowable) or Right(someR), where conventionally Right is reserved for sucess, Left for failures.

    There is a pure version without null, casts or any mutable variables, which returns the Either. This allows client code to distinguish occured errors - or just ignore any exceptions but stay type safe and avoid null/casts.
    The less pure version returns directly T. Returns null casted to T in case of abortion or failure.

    @ Boris: how do you test for conflicting transactions?

    AntwortenLöschen
  5. It's true. This freelancing is a good opportunity for the experts in Android application development. I have been working at an in-house job for the last three years as an android developer, but I am looking for a second source of income. Therefore, I think I can work as a freelancer side-by-side. Can you help me find a good platform to work with for the long term?

    AntwortenLöschen