scala and hypergraphdb<br />
A shift of perspective - hacking HypergraphDB HGHandle (posted 2012-12-04, updated 2013-01-02)<br />
<h1>
The Symptom - inverted perspectives</h1>
The JVM is a great platform. Java the language, on the other hand, is pretty limited and suffers from a serious boilerplate syndrome: you have to write tons of redundant code. The actual intention of the code is often buried deep in that boilerplate. HypergraphDB suffers from that Java syndrome too.<br />
Furthermore, although HGHandle and HGLink are the central elements of HypergraphDB, almost no functionality is encapsulated in them. There certainly are good reasons for that design, which I do not want to get into here, but as a consequence the usage patterns of many Java frameworks, and of HypergraphDB in particular, are kind of "inverted", in my opinion.<br />
<br />
This blog post is part of a loose series of posts about Scala hacks that make HypergraphDB more intuitive to use by encapsulating common operations right into the objects you deal with most. Today it's about handles; the following post will focus on links.
<br />
<br />
Well, so currently we have code like this:
<br />
<pre><code>
HGALGenerator alGen = new DefaultALGenerator(graph, null,
null, true,
true, false);
HGTraversal trav= new HGDepthFirstTraversal(startHandle, alGen);
while (trav.hasNext()){
Pair<HGHandle,HGHandle> pair = trav.next();
System.out.println(graph.get(pair.getSecond()));
}
</code></pre>
<br />
Question:<br />
Unless you are an expert, how long did it take you to understand what this is about? Probably until "HGTraversal" indicated that it has to do with traversals. How long to notice that little "startHandle" variable? Well, that is where the traversal starts. I wonder, why not start from where it starts?<br />
Why not simply <i>startHandle.traverse(args...)</i> ?<br />
<br />
This is just one example where Java's limited toolset forces developers to write code that confronts users with a lot of unnecessary stuff for our brains to filter out. This eats a lot of karma better spent on the actual tasks!<br />
Correction: of course those parameters are not unnecessary at all, but most of them don't change often. So there should be defaults that can be overridden on demand.<br />
To improve the situation in Java a bit, we could at least define some convenience methods on new implementations of HGHandle or HGLink, respectively. However, most of the time we get our HGHandle / HGLink objects out of HypergraphDB classes such as HGQuery. Hence those classes would need to return the new handle/link implementations, which requires the HyperGraphDB community to agree on the need for this and on the implied code changes. Even then, that wouldn't reduce the complexity of operations such as traversals, since Java does not let us define default parameters. That is where Scala comes in.<br />
<br />
With the hacks presented below, the same traversal shrinks to one line of code, while all functionality stays accessible in the form of overridable default parameters. In addition, being a method on handle objects, it corresponds better to the intuitive understanding of a traversal, namely that a graph traversal starts from a given start handle. So among other things, we get <i>startHandle.traverse(args...).</i><br />
<br />
<h1>
Remedies </h1>
<h2>
Quick demo of used scala tools</h2>
With scala there are several new mechanisms to simplify code, help reuse of code and augment existing code that one has no control of. Here is a quick demo of the scala features used for the handle hacks presented further below.<br />
<br />
<h3>
named default parameters </h3>
Define it like this:<br />
<pre><code>def fun(actualParam: Int, param:String = "default " ):String = param + actualParam</code></pre>
<br />
Use it like that:
<br />
<pre><code>fun(4) </code></pre>
or
<br />
<pre><code>fun(5, param="i'm overriding defaults")
</code></pre>
<br />
<h3>
implicit parameters</h3>
Define it like this:<br />
<pre><code>def fun(actualParam: Int)(implicit param:String ):String = param + actualParam </code></pre>
<br />
Use it like that:
<br />
<pre><code>implicit val nameDoesntMatter:String = "default"
// ... later, wherever the implicit value is in scope:
fun(5)</code></pre>
<br />
Note that we get a similar effect as with named default parameters here, but an implicit value is available to every method in scope that declares an implicit parameter of that type. Furthermore, there should be only one implicit value of a given type in scope; otherwise the compiler reports an ambiguity. <br />
<br />
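To make the override rule concrete, here is a minimal sketch. The names <i>greet</i> and <i>defaultGreeting</i> are made up for illustration: the implicit parameter is filled in from scope, but can always be passed explicitly.<br />
<pre><code>object ImplicitDemo {
  // Hypothetical method: `greeting` comes from implicit scope unless passed explicitly.
  def greet(n: Int)(implicit greeting: String): String = greeting + n

  implicit val defaultGreeting: String = "default "

  val fromScope: String = greet(5)              // filled in with defaultGreeting
  val overridden: String = greet(5)("custom ")  // explicit argument wins
}
</code></pre>
This is the same mechanism the handle hacks rely on when a default graph is supplied implicitly and another graph is passed by hand.<br />
<br />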
<h3>
implicit conversions </h3>
Define it like this:<br />
<pre><code>object StringHack {
implicit def string2put(s:String) = new { def print = println(s) }
}
</code></pre>
<br />
Use it like that:
<br />
<pre><code>import StringHack._
"someString".print
</code></pre>
<br />
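Before turning to the real API, the three tools can be combined in one self-contained sketch. Everything here is a stand-in invented for illustration (<i>Handle</i>, <i>ToyGraph</i>, <i>RichHandle</i> and the parameters <i>maxDepth</i> and <i>sibling</i>); none of it is HyperGraphDB code, and it assumes Scala 2.10 or later. It only shows the shape of the idea: an implicit conversion adds <i>traverse</i> to a handle, named defaults hide the rarely changed options, and the graph arrives as an implicit parameter.<br />
<pre><code>import scala.language.implicitConversions

object HandleSketch {
  // Stand-ins for illustration only; these are NOT HyperGraphDB classes.
  final case class Handle(id: Int)
  final class ToyGraph(val edges: Map[Handle, List[Handle]])

  class RichHandle(start: Handle) {
    // Depth-first traversal starting at `start`.
    // Rarely changed options have defaults; the graph is taken from implicit scope.
    def traverse(maxDepth: Int = Int.MaxValue, sibling: Handle => Boolean = _ => true)
                (implicit graph: ToyGraph): List[Handle] = {
      var seen = Set(start)
      def visit(h: Handle, depth: Int): List[Handle] =
        if (depth >= maxDepth) Nil
        else graph.edges.getOrElse(h, Nil).flatMap { next =>
          if (seen(next) || !sibling(next)) Nil
          else { seen += next; next :: visit(next, depth + 1) }
        }
      visit(start, 0)
    }
  }

  // Implicit conversion: any Handle gains the methods of RichHandle.
  implicit def richHandle(h: Handle): RichHandle = new RichHandle(h)

  implicit val graph: ToyGraph = new ToyGraph(Map(
    Handle(1) -> List(Handle(2), Handle(3)),
    Handle(2) -> List(Handle(4))))

  val all = Handle(1).traverse()                  // all defaults
  val shallow = Handle(1).traverse(maxDepth = 1)  // override one default by name
}
</code></pre>
The call site reads from the start handle outwards, which is the "shift of perspective" this post is after.<br />
<br />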
<h2>
Enriching any HGHandle </h2>
The following table shows some common operations. On the left is the current HypergraphDB API used from Java. On the right is the identical operation in Scala, using some implicit conversions.
<br />
Notes: <br />
- often parameters on the left don't even show up on the right. However, you can always override them when needed, as shown below.
<br />
- the ".h", ".hh", and ".hhh" are actually hacks described in a <a href="http://scalahypergraph.blogspot.de/2012/09/tutorial-how-to-run-and-hack.html">previous post</a><br />
- code for demos and implementation can be found below. <br />
<br />
<table border="1" style="text-align: left;">
<tbody>
<tr>
<td><b>current HGDB API - Java</b></td>
<td><b>scala api</b></td>
</tr>
<tr>
<td><pre><code>
HyperGraph graph = new HyperGraph("/home/.../bje");</code></pre>
</td>
<td><pre><code>
implicit val graph = new HyperGraph("/home/.../bje")
// defining a default graph. Can always be overridden.
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
HGHandle h = graph.add("hello");
HGHandle h2 = hg.assertAtom(graph, 5);
HGHandle h3 = graph.getHandle(5);
</code></pre>
</td>
<td><pre><code>
val h = "hello".hhh
val h2 = 5.hh
val h3 = 5.h
//add 5 to otherGraph (override defaults)
val h4 = 5.hhh(otherGraph)
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
// DEREFERENCING
System.out.println(graph.get(h));
String s1 = graph.get(h);
String s2 = ((String) graph.get(h)).toUpperCase();
int ten = 5 + (Integer) graph.get(graph.add(5));
</code></pre>
</td>
<td><pre><code>
println(h.d)
val s1 :String = h.d
val s2 = h.d[String].toUpperCase
val ten = 5 + 5.hhh.d[Int]
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
// LINKS
HGHandle wo = graph.add("World");
HGHandle pLi = graph.add(new HGPlainLink(h, wo));
HGHandle aRel = graph.add(new HGRel("hw", h, wo));
</code></pre>
</td>
<td><pre><code>
val wo = "World".hhh
val pLink = h <-> wo
val aRel = h rel ("hw", wo)
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
// TYPES & TYPE BASED QUERYING
graph.getTypeSystem().getClassForType(graph.getType(aRel));
</code></pre>
</td>
<td><pre><code>
aRel.getType
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
int j = 1;
List<HGHandle> intH = new ArrayList<HGHandle>(10);
// we'll need those later
for (int i = j; i<11; i++){
intH.add(graph.add(i));
}
</code></pre>
</td>
<td><pre><code>
val one2ten = (1 to 10).map(_.hhh)
// we'll need those later
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
// print all atoms of same type as given atom a
HGHandle a = intH.get(0);
HGHandle intTypeHandle = graph.getType(a);
List typealikes = hg.getAll(graph, hg.type(intTypeHandle));
for (Object hh : typealikes){
System.out.println(hh);
}
</code></pre>
</td>
<td><pre><code>
one2ten.head.typeAlikes.
foreach(hh => println(hh.d[Int]))
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
// queries very often combine a type constraint with some other constraint:
System.out.println("\n print all greater than 4");
List gt4 = hg.findAll(graph,hg.and(hg.type(intTypeHandle), hg.gt(4)));
for (Object hh : gt4){
System.out.println(graph.get((HGHandle)hh));
}
</code></pre>
</td>
<td><pre><code>
1.h.queryOnSameType(hg.gt(4)).foreach(hh => println(hh.d[Int]))
// or
hg.getAll(graph, 1.h.sameTypeQC).foreach(println)
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
// preparing some links to traverse
for (int i = 0; i < intH.size()-1; i++){
graph.add(new HGPlainLink(intH.get(i), intH.get(i+1)));
}
</code></pre>
</td>
<td><pre><code>
// link each handle to its successor
one2ten.zip(one2ten.tail).
foreach{case (handle, next) => handle <-> next}
</code></pre>
</td>
</tr>
<tr>
<td><pre><code>
HGALGenerator alGen = new DefaultALGenerator(graph, null,
null, true,
true, false);
HGTraversal trav= new HGDepthFirstTraversal(intH.get(0), alGen);
boolean continueIt = true;
while (trav.hasNext() && continueIt){
Pair<HGHandle,HGHandle> pair = trav.next();
System.out.println(graph.get(pair.getSecond()));
}
</code></pre>
</td>
<td><pre><code>
1.h.traverse().foreach(p => println(p.getSecond.d[Int]))
// parameters hidden as named defaults can be overridden.
// Traversing only numbers smaller than 6:
1.h.traverse(sibling = hg.lt(6)).
foreach(p => println(p.getSecond.d[Int]))
// Traversing Links of type L & atoms of type A
1.h.typeTraverse[HGRel, Integer]().
map(p => p.getSecond.d[Int] ).
foreach(println)
</code></pre>
</td>
</tr>
</tbody>
</table>
<br />
<br />
<div style="text-align: center;">
Here is the code implementing the implicits:
</div>
<div class="gistLoad" data-id="3168761" id="gist-3168761">
Loading ....
</div>
<!----->
<script src="https://raw.github.com/moski/gist-Blogger/master/public/gistLoader.js" type="text/javascript"></script><!----->
Scala Closures as HyperGraphDB Transactions (posted 2012-11-05)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Transactions in HyperGraphDB follow an MVCC (multi-version concurrency control) model where a transaction can be aborted midway simply because there's a conflict with another transaction. Unlike a locking model, where a thread waits until it can acquire a lock and perform the required operation, MVCC transactions never wait and deadlocks never occur. On the other hand, conflicts are not uncommon and one must be prepared to deal with them. A conflict can be detected either while the transaction is running or during commit time. Regardless, the way to handle a conflict is simply to retry the transaction and do so until it succeeds.<br />
<br />
In this short post, I will show how to write proper HyperGraphDB transactions in Scala. At the end, you can take the resulting generic <i>transact</i> method and use it as a global utility in your projects. Let's start with a basic template and build from there. Here's a first attempt at encapsulating code in a database transaction:<br />
<br />
<pre><code> var graph:HyperGraph = HGEnvironment.get("/tmp/hgdbinstance")
graph.getTransactionManager().beginTransaction()
graph.add("Hello World.")
graph.getTransactionManager().commit()
</code></pre>
<br />
The code between <i>beginTransaction</i> and <i>commit</i> here is irrelevant - it can be anything that reads from and writes to the database. The first and obvious problem with the above is that it ignores exceptions that may occur during the transaction. So a second approximation might look like this:
<br />
<pre><code>
var graph:HyperGraph = HGEnvironment.get("/tmp/hgdbinstance")
graph.getTransactionManager().beginTransaction()
try {
graph.add("Hello World.")
graph.getTransactionManager().commit()
}
catch {
case e => {
graph.getTransactionManager().abort();
}
}
</code></pre>
Now, as I mentioned above, any database operation as well as the final commit may fail due to a conflict with another transaction. If that happens, the transaction must be repeated. Since we can't know in advance how many times such a conflict will occur, we have to repeat the transaction in a loop. Hence the code grows to the following:
<br />
<pre><code>
var done = false
while (!done) {
graph.getTransactionManager().beginTransaction()
try {
graph.add("Hello World.")
graph.getTransactionManager().commit()
done = true
}
catch {
case e => {
graph.getTransactionManager().abort();
if (!graph.getStore().getTransactionFactory().canRetryAfter(e))
throw e
}
}
}
</code></pre>
Since Scala has no built-in continue/break statements for loops, we use a boolean to break out. The logic here is simple: we try to execute our task, commit and exit the loop. In case of an exception, we must loop again if we're dealing with a conflict. The <i>canRetryAfter</i> method tells us precisely that. Its implementation basically checks for a few types of exceptions that indicate a conflict and that may depend on the storage layer. So, the above will loop as many times as it takes for the transaction to succeed. What's the guarantee that it will ever succeed (you may ask)? The guarantee is that the database will select which transaction to abort at random. Obviously, if you have a really long running transaction in a highly concurrent environment, it may take a really long time to succeed. While one can construct such a situation artificially, it is unlikely to occur in practice. Also, it must be noted that MVCC transactions can conflict only if they both read <b>and</b> write. A read-only or a write-only transaction will always succeed regardless of how long it is.<br />
<br />
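Stripped of the database calls, the retry pattern itself can be sketched in a few self-contained lines. <i>ConflictException</i>, <i>canRetry</i> and <i>retrying</i> are hypothetical stand-ins invented for this sketch (they play the roles of a storage-level conflict and of <i>canRetryAfter</i>), and the begin/abort calls are left out on purpose.<br />
<pre><code>object RetrySketch {
  // Hypothetical stand-in for a storage-level conflict; not a HyperGraphDB class.
  final class ConflictException extends RuntimeException("conflict")

  // Plays the role of canRetryAfter: only conflicts are worth retrying.
  def canRetry(t: Throwable): Boolean = t.isInstanceOf[ConflictException]

  // Runs `body` until it completes; retries on conflicts, rethrows anything else.
  def retrying[T](body: => T): T = {
    var result: Option[T] = None
    while (result.isEmpty) {
      try result = Some(body)
      catch { case t: Throwable => if (!canRetry(t)) throw t }
    }
    result.get
  }

  // Usage: the first two attempts conflict, the third one succeeds.
  var attempts = 0
  val value: Int = retrying {
    attempts += 1
    if (attempts < 3) throw new ConflictException
    42
  }
}
</code></pre>
Using an <i>Option</i> for the result doubles as the loop flag, so no separate boolean is needed in this reduced form.<br />
<br />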
So far so good. This is already a lot of boilerplate code for what will frequently be a couple of db operations. The HyperGraphDB API lets you encapsulate your transaction in a <i>java.util.concurrent.Callable</i> so you don't have to write all that code:<br />
<br />
<pre><code>
import java.util.concurrent.Callable

graph.getTransactionManager().transact(new Callable[Object] {
def call():Object = {
graph.add("Hello World.") }
})
</code></pre>
But we are using Scala which already has a much nicer syntax for closures than Runnable or Callable. So we'd prefer to write something like this instead:
<br />
<pre><code>
transact(Unit => {
graph.add("Hello World.")
})
</code></pre>
<br />
And the <i>transact</i> method could just be a utility that delegates to the HyperGraphDB Callable-based API. Except that doesn't work. The problem with that strategy is what actually prompted the writing of this blog post. The reason is that the Scala compiler uses Java exceptions to implement return statements in the middle of a function block. So if the function you pass to transact has some logic with a return statement before the end of the function, it will throw an exception of type <i>NonLocalReturnControl</i>. However, that exception is caught by the HyperGraphDB API, which considers it abnormal, and the transaction fails. For more, see this forum thread: <a href="http://www.scala-lang.org/node/6759">http://www.scala-lang.org/node/6759</a>. Instead of debating whether HyperGraphDB is at fault here for catching all throwables, we can "just" reimplement the transaction loop in Scala taking that exception into consideration. Here is the full version:
<br />
<pre><code>
def transact[T](code:Unit => T):T = {
var result:T = null.asInstanceOf[T]
var done = false
var commit = true
while (!done) {
graph.getTransactionManager().beginTransaction()
try {
result = code()
commit = true;
}
catch {
case e:HGUserAbortException => {
commit = false
done = true
}
case e:scala.runtime.NonLocalReturnControl[T] => {
result = e.value
commit = true;
}
case e => {
commit = false
if (!graph.getStore().getTransactionFactory().canRetryAfter(e))
throw e
}
}
finally try {
graph.getTransactionManager().endTransaction(commit)
done = done || commit
}
catch {
case e => {
if (!graph.getStore().getTransactionFactory().canRetryAfter(e))
throw e
}
}
}
return result
}
</code></pre>
<br />
You can put this method in whatever utility package you have and call this transact method instead of the HyperGraphDB one. Let me make a few observations before leaving:<br />
<br />
<ul style="text-align: left;">
<li>Note the rather ugly initialization of the local result variable. It is there to make Scala type check, because otherwise the compiler doesn't know at compile time what the default value of a generic parameter T should be. </li>
<li>The logic is governed by two booleans: commit and done. Commit says whether we should commit or abort, and done says whether we should exit from the loop.</li>
<li>The HGUserAbortException is part of the HyperGraphDB API and part of the contract of executing transactions. User code may throw this exception to cause the transaction to be aborted. This is the only way you can abort a transaction due to application logic when you are inside this <i>transact</i> method. Otherwise, transact has no way of knowing if the transaction was legitimately aborted or something went wrong.</li>
</ul>
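To see why a catch-all API trips over early returns, the effect can be reproduced without any database. This is a minimal sketch for Scala 2; <i>runCatchingAll</i> and <i>lookup</i> are hypothetical names, and <i>runCatchingAll</i> only mimics an API that catches every Throwable around user code.<br />
<pre><code>object NonLocalReturnSketch {
  // Stand-in for an API that catches every Throwable around user code.
  def runCatchingAll[T](body: () => T): Either[Throwable, T] =
    try Right(body()) catch { case t: Throwable => Left(t) }

  // The `return` inside the closure is compiled to a thrown NonLocalReturnControl,
  // which runCatchingAll intercepts before it can reach this method.
  def lookup(xs: List[Int]): String = {
    val outcome = runCatchingAll { () =>
      if (xs.isEmpty) return "early exit"
      "normal exit"
    }
    outcome match {
      case Left(t) if t.isInstanceOf[scala.runtime.NonLocalReturnControl[_]] => "intercepted"
      case Left(t)  => throw t
      case Right(v) => v
    }
  }
}
</code></pre>
With an empty list the early return never reaches <i>lookup</i>: the wrapper sees it as a failure, which is the situation the dedicated <i>NonLocalReturnControl</i> case in <i>transact</i> is there to handle.<br />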
<div>
If you have questions about this code or any other aspect of MVCC transactions in HyperGraphDB, don't hesitate to post a comment or ask on the <a href="http://groups.google.com/group/hypergraphdb" target="_blank">discussion forum</a>.</div>
</div>
why hypergraphDB rocks - and why users are still scared off (posted 2012-10-14, updated 2012-10-26)<br />
<br />
HyperGraphDB is cool. The <a href="http://www.hypergraphdb.org/index">main page</a> already gives an excellent overview, but look in particular at the example applications at the bottom: its expressivity is so high that things as different as artificial intelligence with neural networks, Prolog unification over properly typed HGDB atoms as a fact base, and semantic web triple stores or OWL databases can be naturally hosted in a HyperGraphDB. There is much more; these are just a few examples.<br />
<br />
However, the user base of HyperGraphDB appears to be rather small, which is a pity. Also, there are plenty of interesting things that could be done, but we are lacking manpower.<br />
<br />
This blog post investigates why this is the case and how to overcome it. One point I do not want to go into here is improving usability; that will be treated in upcoming posts.<br />
<br />
<br />
People who consider HyperGraphDB for a particular project generally have several other options to choose from. They invest a rather limited amount of time and brainpower in each option during that process. If the concepts, the design, and their usefulness for the job do not sink in after a reasonable amount of time, it is just another option dropped in favor of the others.<br />
<br />
<br />
HyperGraphDB brings in many uncommon concepts that, in addition, differ from the standard versions of these concepts in several respects.<br />
Wait, you could also slightly reword that last phrase into: <br />
"hypergraphDB is the Mother of all the databases that make things the most different from the mainstream":<br />
<ul>
<li>it is not a relational database, but some form of NoSQL</li>
<li>it's not only a graph database, but <i>also</i> an object-oriented database </li>
<li>it's not just a graph database, but a hypergraph database</li>
<li>its hypergraphs are tuple-based, not pair-of-sets-based </li>
<li>its hyperedges don't have two sides, but they <i>can</i> be directed anyway</li>
<li>it has its own type system, which is so mighty that it cannot be fully expressed in Java's type system. </li>
<li>as a general rule of thumb, Java's typed objects get mapped to corresponding HGDB typed atoms and vice versa. But... that does not always hold true.</li>
<li>links, too, are fully typed in the type hierarchy and can point to other links. In fact, nodes are special cases of edges.</li>
</ul>
<br />
<br />
<b><i>Ok, it gets clearer now, why people have problems adopting hypergraphDB! </i></b><br />
<b><i> </i></b> <br />
Some explanations regarding the graph/hypergraph chaos:<br />
A typical graph is nice and simple because there are nodes and edges, where each edge connects a pair of nodes. It is hard <i>not</i> to understand a typical graph. It sinks in fast.<br />
<a href="https://en.wikipedia.org/wiki/Hypergraph">Hypergraphs</a> are a generalization, i.e. they are not restricted to binary relationships. I would intuitively understand that <i>not-only-binary</i> aspect in two possible ways: allowing more than one node on each side of an edge, i.e. pairs of sets of nodes (directed), or allowing any number of nodes with no sides at all (undirected).<br />
Here too, HypergraphDB doesn't stick to the general rule: a HGDB link is not a pair of sets of nodes but <i>one</i> tuple of nodes, and yet it can be directed anyway!<br />
So, in a way, there are now two reasons why it no longer fits the common understanding of "a graph". <br />
<br />
<b><i>Wait, so how can it be directed without having two sides? </i></b><br />
That is possible because, unlike a set, a tuple has a notion of order, and in your very own definition of HGDB links you can hard-code the meaning of positions in the tuple. That is, the tuple approach lets you express more than just "direction". This is illustrated in the example below.<br />
<br />
<b><i>Ok, but why is it worth it to make it that way, and not the standard way?</i></b><br />
The short answer is that neither graphs nor normal hypergraphs are expressive enough for a big class of problems. <br />
Maybe an example helps to illustrate why it is useful to have n-ary relationships, why the tuple approach in particular is interesting, and why it is helpful to have fully typed atoms.<br />
<br />
<i><b>Ok then, give me an example why you need that stuff!</b></i><br />
I pick biology here because it is my domain, but you can find situations everywhere in which modeling with binary relationships either simplifies too much or implies splitting up "one thing" into many smaller ones. This means the representation of your domain entity (forgive me if I get the terminology wrong) is fragmented into many nodes and edges. This can be a good thing too (in some situations you would call it decoupling), but often it just doesn't make sense to separate what belongs together. Therefore, things are often simplified until they fit the system. It doesn't have to be that way.<br />
<br />
Ok, example. Graphs are useful for modeling enzyme reactions, where one typical graph representation is <br />
substrate ⇌ product<br />
<br />
where the edge itself is simply understood as the enzyme. Actually, this is the first example of why the oversimplification forced by binary graphs hurts: the edge now has a twofold meaning, being at the same time a reaction and an enzyme (the two often correlate well, but not always).<br />
<br />
Ok, let's illustrate. Understanding the details doesn't matter here, just look at the picture. It is interesting to note that this example, glycolysis, is what happens billions of times in each cell of your body, every hour or so. It is one of the most important ways in which chemical energy is converted to biologically useful forms of energy, starting from glucose. You would not exist without it:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Glycolysis2.svg/2000px-Glycolysis2.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="296" src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Glycolysis2.svg/2000px-Glycolysis2.svg.png" width="640" /></a></div>
<br />
<br />
<br />
Even without understanding anything, there are several spots where you can see visually, that the above graph simplification could never come near the truth and that generally it would be a huge mess to model anything like that with a binary graph or even a regular hypergraph:<br />
<ul>
<li>in some reactions there are not only substrates, products and the (omitted) enzyme, but also cofactors such as ATP or NAD (energy intermediates) or magnesium. Cofactors can change state in both directions too. This is critical for knowing in which direction the whole pathway runs. </li>
<li>some reactions are reversible, others are not (this correlates to some extent with the cofactors involved and the kind of state change)</li>
<li>even when only considering substrates and products, some reactions are not binary, see the three-way arrow in the lower right corner.</li>
<li>although it is convenient to separate substrates/products from enzymes and cofactors, these are generally in a continuous flux of interconversion, limited only by the stoichiometry of the elements actually present, and of course by thermodynamics. </li>
</ul>
Now, with graphs this is tricky to model; you would basically need to encode it in probably several dozen if not hundreds of nodes and edges. With normal, untyped hypergraphs it would also be very difficult if not impossible to model one reaction in one hyperedge, since there are different types of things on each side of the hyperedge whose role in that particular reaction you must distinguish: organic molecules that are substrates and products such as glucose, cofactors such as ATP, and enzymes such as hexokinase.<br />
Furthermore, you would still be unable to represent n-ary reactions such as in the case of F6BP-aldolase. Note that it is both n-ary and directed (one 6-carbon molecule is converted to two different 3-carbon molecules). <br />
Afaik, when your system has limited expressivity, as binary graphs do, you have to make a trade-off between simplification and an exploding number of nodes and edges. Obviously, simplification has a price in the expressivity of the model. The question is hence which questions can still be answered with your simplified model.<br />
Some interesting example questions for metabolic models like that:<br />
How can the production of a specific desired product be maximized? How can the back-reaction of that desired product be avoided without disturbing production? How can production be optimized? Which substrate should the cell be fed with? What are the rate-limiting reactions in the pathway, and which limiting factor causes them? Which products accumulate when a particular enzyme is knocked out, or is the knock-out simply compensated for by another reaction (btw an interesting property of biological systems called robustness)?<br />
For these questions, you would need a lot of information about the enzymes and a variety of parameters in the cell, such as the concentration of each molecule species and how it changes. For example, cofactors are commonly just simplified away, but you would also need to keep track of the concentrations of ATP/ADP and NAD+/NADH2 and of how they are changed by the reactions involving them. At second glance, you'd also need to know how much phosphate is bound to which kinds of molecules (in the first reaction, for example, glucose is activated by one of those yellow circles of phosphate that is split off from ATP). This matters because, when ATP is lacking, most but not all of them can easily be converted back to ATP when the phosphorylation state in the cell is low. These are just some improvised examples of why it is bad to be forced to oversimplify. That is probably also one of the reasons why graphs are not used as much in bioinformatics as one would expect. <br />
<br />
<br />
<br />
<i><b>So ok, modelling is complicated, in whatever field you are in. How is hypergraphDB different?</b></i><br />
I speculate here that a single HypergraphDB hyperedge can be designed such that it accommodates one entire glycolysis reaction. Whether that is wise in a particular case is another question.<br />
The main limitation is that you could not define a single link with several variable-length groups of arguments. Just as with Java varargs, you could only have one variable-length argument, in the final position. But there are no varargs here.<br />
The Java class would have to implement HGLink and, the ugly part, would need to map specific positions in the link to particular fields in the class. Hence, we create a HypergraphDB link in which the positions have reserved meanings, each with its respective type constraint on the allowed atoms, i.e.<br />
position 1: type of the reaction<br />
position 2 : key enzyme <br />
position 3 to 4: other involved enzymes <br />
position 5 to 10: substrates<br />
position 10 to 12: cofactors (in specific state)<br />
position 13 to 15: intermediary enzyme-substrate complexes<br />
position 16 to 20: products<br />
position 21 to xy: rate constants, Km values etc, pH and temperature optima metrics<br />
<br />
This link would probably encompass at least half of all reactions, but more suitable would be to have a small type hierarchy of reaction links, where the positions are mapped to parameters of particular reaction types. Analogously, one could encode the n-ary directed reactions.<br />
<br />
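As a rough, self-contained sketch of what "positions with reserved meanings" could look like in code: the types below are stand-ins invented for illustration (<i>Atom</i>, <i>ReactionLink</i>), and the layout is deliberately much smaller than the list above. A real version would implement HyperGraphDB's HGLink and hold HGHandle targets instead. Storing the number of substrates is just one assumed way to tell the groups apart.<br />
<pre><code>object ReactionLinkSketch {
  // Stand-in for a typed atom; not a HyperGraphDB class.
  final case class Atom(name: String)

  // One ordered tuple of targets; the positions carry the meaning.
  // Simplified layout (an assumption for this sketch):
  //   0: enzyme, then nSubstrates substrates, then the products.
  final class ReactionLink(nSubstrates: Int, targets: Vector[Atom]) {
    require(nSubstrates >= 1 && targets.length > 1 + nSubstrates,
      "need an enzyme, at least one substrate and at least one product")

    def arity: Int = targets.length
    def targetAt(i: Int): Atom = targets(i)

    def enzyme: Atom = targets(0)
    def substrates: Vector[Atom] = targets.slice(1, 1 + nSubstrates)
    def products: Vector[Atom] = targets.drop(1 + nSubstrates)
  }

  // The aldolase step: one 6-carbon substrate, two 3-carbon products.
  val aldolaseStep = new ReactionLink(1, Vector(
    Atom("aldolase"),
    Atom("fructose-1,6-bisphosphate"),
    Atom("glyceraldehyde-3-phosphate"),
    Atom("dihydroxyacetone phosphate")))
}
</code></pre>
The link stays a single tuple, yet it is directed and n-ary, because the reader of the tuple knows which positions are substrates and which are products.<br />
<br />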
<br />
Now, with the HypergraphDB type system, you could also reflect that all sorts of molecules are interconverted constantly, also across the oversimplifying roles we spoke about above (substrate vs enzyme etc). To achieve this, you would need to define a type hierarchy whose type definitions carry the information about what the things actually are: polymers are a specific sequence of monomers (you have a field in the corresponding Java class), and monomers are specific compounds of elements in specific amounts, with specific bonds among them. Hence the type enzyme would be a subtype of protein, which in turn would hold a specific sequence of amino acids, which in turn are composed of specific functional groups and elements. This way it could be reflected which amino acids or even compounds are released into the cell when a particular enzyme is degraded (enzymes are indeed constantly formed and degraded, because this allows negative feedback and prevents malfunctioning enzymes from forming and doing damage). In terms of HypergraphDB, that would mean that one or several atoms of a given type are erased and other atoms are created, probably in a transaction, such that no mass just vanishes or appears out of thin air.<br />
<br />
Tutorial: how to run and hack HyperGraphDB in the SBT console (posted 2012-09-17, updated 2012-09-30)<br />
In this post I'd like to show how to use <a href="http://www.hypergraphdb.org/index">hypergraphdb</a> in a scripting-like environment, using <a href="http://www.scala-sbt.org/">SBT</a>, the Scala build tool. This is useful to quickly try out HypergraphDB and Scala. <br />
Finally there are two examples of small but already quite useful hacks:<br />
<ul>
<li>extending HGHandle with a simple dereferencing method</li>
<li>extending any Object with three succinct ways of getting their handles</li>
</ul>
<ol>
<li><h2>
Setup</h2>
</li>
<ol>
<li>create directory hgdbTesting (arbitrary name)</li>
<li>create subdirectory hgdbTesting/lib</li>
<li>create subdirectory hgdbTesting/berkeleyDbDir</li>
<li>download: <a href="http://www.hypergraphdb.org/maven/org/hypergraphdb/hgdb/1.2/hgdb-1.2.jar">hypergraphDB-jar</a> and place it into hgdbTesting/lib</li>
<li>download <a href="http://hypergraphdb.googlecode.com/files/hypergraphdb-1.2-beta.tar.gz">hypergraphDB-archive</a> then extract lib/je-5.0.34.jar and lib/hgbdbje-1.2.jar and place them into hgdbTesting/lib</li>
<li>download and set up sbt, either via step 1 or 2</li>
<ol>
<li>from <a href="http://www.scala-sbt.org/">here</a> & follow install instructions </li>
<li>or in the shell:</li>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="4" wrap="SOFT">wget http://typesafe.artifactoryonline.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.12.0/sbt-launch.jar -O ~/bin/sbt-launch.jar
echo 'java -Xmx512M -jar `dirname $0`/sbt-launch.jar "$@"' > ~/bin/sbt
chmod u+x ~/bin/sbt
</textarea>
</ol>
<li>while in hgdbTesting, run <i>~/bin/sbt</i>. This first run sets up sbt, automatically downloading a minimal scala distribution and storing its data in ~/.sbt. There should be some info status reports, then you should end up in the sbt interactive console.</li>
</ol>
<li><h2>
Scripting HypergraphDB</h2>
</li>
<ol>
<li>In the sbt interactive console, type "console" to enter the scala interpreter. This brings up an enhanced version of the standard scala REPL (read eval print loop). We use it here since it handles some non-obvious classpath and build path issues. In my experience it is also faster and more reliable than the scala support in eclipse or intellij.</li>
<li>Start typing scala code:</li>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="6" wrap="SOFT">import org.hypergraphdb._
val config = new HGConfiguration
val graph = new HyperGraph("/home/ingvar/berkeleyDBTemp") // any writable directory works, e.g. the berkeleyDbDir created during setup
val res0 = graph.add("hello ")
val bla:String = graph.get(res0)
// console prints bla: String = "hello "
</textarea>
</ol>
<ol>
</ol>
<br />
<li><h2>
Hacking HypergraphDB</h2>
</li>
<ol>
<li>The first hack is an extension to HGHandle. It allows dereferencing a given handle without calling graph.get, using a new method d ("dereference") that is parametrized by the type. If you use several HyperGraph instances at once, you still have the option to override the "implicit parameter" graphImplicit, defined in step 4 below.</li>
<ol>
<li>enter: "<i>:paste</i>"</li>
<li>then copy and paste the following three lines:</li>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="3" wrap="SOFT">implicit def richHandle (handle:HGHandle) = new{
def d[T](implicit graph:HyperGraph):T = graph.get(handle).asInstanceOf[T]
}
</textarea>
<li>press enter then Ctrl-D to terminate paste mode.</li>
<li>For this hack to work, we have to define an implicit value of type HyperGraph, i.e. the default HyperGraph instance that is used wherever a method has an "implicit parameter" of that type. Here we simply assign it the <i>graph</i> that we defined before.</li>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="1" wrap="SOFT">implicit val graphImplicit = graph</textarea>
<ol>instead of:</ol>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="1" wrap="SOFT">(String) graph.get(res0)</textarea>
<ol>we can now use:</ol>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="1" wrap="SOFT">val d:String = res0.d</textarea>
<ol>or</ol>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="1" wrap="SOFT">val d = res0.d[String] </textarea>
</ol>
<li>The second hack is an extension to any Object. It adds simple shorthands for three common ways of obtaining the handle of a given object. The first, "h", retrieves the handle of an object that is already stored and loaded in the cache.
The second, "hh", retrieves the existing handle if there is one and otherwise stores a new atom. The third, "hhh", stores the atom without that check.</li>
<ol>
<li>enter ":paste"</li>
<li>copy & paste this:</li>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="5" wrap="SOFT">implicit def pimpAny[T](any: T)(implicit graph: HyperGraph) = new {
def h: HGHandle = graph.getHandle(any)
  def hh: HGHandle = HGQuery.hg.assertAtom(graph, any)
def hhh: HGHandle = graph.add(any)
}</textarea>
</ol>
<li>press enter then Ctrl-D to terminate paste mode.</li>
<li>now we can obtain handles by writing:</li>
<textarea cols="85" name="SOFT" readonly="TRUE" rows="3" wrap="SOFT">"a".h
5.hh
true.hhh
</textarea>
</ol>
<li>Finish by typing <i>graph.close</i>. Leave the scala REPL and return to the sbt console with <i>Ctrl-D</i>, then enter <i>exit</i>.</li>
<br />
<ol>
</ol>
</ol>
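Both hacks above use "new { ... }", which creates a structural type whose methods are called via reflection. A possible variation, sketched here under assumptions, is to use small named wrapper classes instead. To keep the sketch runnable without the HypergraphDB jars, Handle and MiniGraph are hypothetical in-memory stand-ins and not part of the HypergraphDB API; with the real library you would use HGHandle and HyperGraph in their place.<br />

```scala
import scala.language.implicitConversions

object HandleDemo {
  // Hypothetical stand-ins, NOT the HypergraphDB API: a handle is just an index here.
  case class Handle(id: Int)

  class MiniGraph {
    private val atoms = scala.collection.mutable.ArrayBuffer.empty[Any]
    def add(atom: Any): Handle = { atoms += atom; Handle(atoms.size - 1) }
    def get(h: Handle): Any = atoms(h.id)
    def getHandle(atom: Any): Option[Handle] = {
      val i = atoms.indexOf(atom)
      if (i >= 0) Some(Handle(i)) else None
    }
  }

  // Named wrapper class instead of `new { ... }`: no reflective calls needed.
  class RichHandle(handle: Handle) {
    def d[T](implicit graph: MiniGraph): T = graph.get(handle).asInstanceOf[T]
  }
  implicit def richHandle(handle: Handle): RichHandle = new RichHandle(handle)

  class RichAtom(atom: Any) {
    // reuse the existing handle if the atom is already stored, else store it
    def hh(implicit graph: MiniGraph): Handle =
      graph.getHandle(atom).getOrElse(graph.add(atom))
    // always store a new atom
    def hhh(implicit graph: MiniGraph): Handle = graph.add(atom)
  }
  implicit def richAtom(atom: Any): RichAtom = new RichAtom(atom)

  // The default graph picked up by every implicit parameter of type MiniGraph.
  implicit val graphImplicit: MiniGraph = new MiniGraph

  val h1: Handle = "hello ".hhh     // stores the atom
  val s: String = h1.d[String]      // dereferences it again
  val h2: Handle = "hello ".hh      // finds the existing atom instead of storing a second one
}
```

The usage is the same as in the tutorial; only the definition of the wrappers differs.<br />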
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7411464060074870759.post-57077640338127538302012-07-28T05:08:00.000-07:002012-07-28T05:08:48.688-07:00interesting scala features<h1>
</h1>
<h2>
General</h2>
Scala
is a statically typed, advanced but mature OO-functional hybrid
language. It allows writing very concise and elegant code and is fully
compatible with Java, but offers numerous possibilities absent in Java, far beyond mere syntactic sugar.<br />
Using a mechanism called
implicit conversions, scala can extend existing code on demand, i.e. add
methods to classes and interconvert types, without requiring any change
to the extended classes.<br />
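As a minimal sketch of such an implicit conversion (assuming scala 2.10 or later; the names RichWord, toRichWord and shout are made up for this illustration), the following adds a method to java.lang.String without touching it:<br />

```scala
import scala.language.implicitConversions

object ImplicitDemo {
  // Wrapper that carries the extra method; java.lang.String itself stays untouched.
  class RichWord(s: String) {
    def shout: String = s.toUpperCase + "!"
  }

  // The compiler applies this conversion when a String is asked for a method it lacks.
  implicit def toRichWord(s: String): RichWord = new RichWord(s)

  // Compiled as toRichWord("hello").shout
  val greeting: String = "hello".shout
}
```

The conversion only applies where toRichWord is in scope, so the extension stays local to the code that asks for it.<br />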
<br />
<br />
<br />
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=7411464060074870759" name="Functions_as_first-class_objects"></a>Functions as first-class objects</h2>
In
scala, functions are first-class objects: they can be passed in as
arguments and returned as the result of a method or another function. Methods
can be converted to function objects on demand. Scala's standard
collection library provides map, flatMap, filter, foreach, forall and
groupBy, among many others. These are prominent examples of
higher-order functions, an extremely useful
feature that is missing in java. They can be used on Java Collections,
Iterators or Iterables, hence they can be used everywhere with
HyperGraphDB, given just one import ("import collection.JavaConversions._").
Using closures, partially applied functions / currying and liberal use
of curly braces, control structures and operators can be implemented in
a way indistinguishable from built-in ones. Automatic resource management
could be implemented as "with (resource) do {function}". Since in scala
there are no operators, only methods, and you are free to use fancy
symbols, one could also write "µ someResource | function". <br />
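A small sketch of these ideas, using only plain scala collections and the JDK. The names applyTwice, double and using are made up for this illustration; "using" is one common way to write the resource management idea mentioned above (often called the loan pattern), not a built-in construct:<br />

```scala
object FunctionsDemo {
  // A method that takes a function as its argument.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A method, converted to a function value on demand.
  def double(n: Int): Int = n * 2
  val doubleFn: Int => Int = double

  // Higher-order functions from the collection library.
  val evensSquared: List[Int] = List(1, 2, 3, 4).filter(_ % 2 == 0).map(n => n * n)

  // Loan pattern: reads like a control structure, but is an ordinary method
  // with two parameter lists. The resource is closed even if body throws.
  def using[R <: java.io.Closeable, A](resource: R)(body: R => A): A =
    try body(resource) finally resource.close()

  val firstLine: String =
    using(new java.io.BufferedReader(new java.io.StringReader("first\nsecond"))) { reader =>
      reader.readLine()
    }
}
```

The curly braces around the second argument are what make the call to using look like a language construct.<br />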
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=7411464060074870759" name="Haskell-style_pattern_matching"></a>Haskell-style pattern matching</h2>
Scala
allows pattern matching as found in functional languages, during which
variables are bound to parts of deconstructed objects. For example: <br />
<pre class="prettyprint"><span class="typ">List</span><span class="pun">(</span><span class="lit">1</span><span class="pun">,</span><span class="lit">2</span><span class="pun">,</span><span class="lit">3</span><span class="pun">,</span><span class="lit">4</span><span class="pun">)</span><span class="pln"> match </span><span class="pun">{</span><span class="pln">
</span><span class="kwd">case</span><span class="pln"> </span><span class="typ">List</span><span class="pun">(</span><span class="pln">_</span><span class="pun">,</span><span class="pln">a</span><span class="pun">,</span><span class="pln">_</span><span class="pun">,</span><span class="pln">b</span><span class="pun">)</span><span class="pln"> </span><span class="kwd">if</span><span class="pln"> a </span><span class="pun"><</span><span class="pln"> b </span><span class="pun">=></span><span class="pln"> println</span><span class="pun">(</span><span class="str">"hello "</span><span class="pun">+</span><span class="pln"> a</span><span class="pun">)</span><span class="pln">
</span><span class="kwd">case</span><span class="pln"> _ </span><span class="pun">=></span><span class="pln"> println</span><span class="pun">(</span><span class="str">"not found"</span><span class="pun">);</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="com">//prints "hello 2"</span></pre>
By
providing companion objects with extractor methods, existing code can
participate in such deep matches, again without touching the extended
code. <br />
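For a class you do not own, the extractor can live in any object. The following sketch (HostPort and describe are names made up for this illustration) lets java.net.URI take part in pattern matching:<br />

```scala
import java.net.URI

object ExtractorDemo {
  // Extractor object for a JDK class; java.net.URI itself is not modified.
  object HostPort {
    def unapply(uri: URI): Option[(String, Int)] =
      if (uri.getHost == null) None else Some((uri.getHost, uri.getPort))
  }

  def describe(uri: URI): String = uri match {
    case HostPort(host, port) if port > 0 => host + " on port " + port
    case HostPort(host, _)                => host + " on the default port"
    case _                                => "no host"
  }
}
```

URI.getPort returns -1 when no port is given, which is why the guard checks for a positive port.<br />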
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=7411464060074870759" name="for_comprehensions"></a>for comprehensions</h2>
for
comprehensions are more than syntactic sugar for
higher-order functions; they also allow for very comfortable usage of
advanced constructs such as monads, functors and applicative functors.
With the JavaConversions import, they can be used with java collections directly.<br />
<br />
<pre class="prettyprint"><span class="pln">val result </span><span class="pun">=</span><span class="pln"> </span><span class="kwd">for</span><span class="pun">(</span><span class="pln"> e </span><span class="pun"><-</span><span class="pln"> employees</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> e</span><span class="pun">.</span><span class="pln">age </span><span class="pun">></span><span class="pln"> </span><span class="lit">25</span><span class="pun">;</span><span class="pln">
salary </span><span class="pun">=</span><span class="pln"> e</span><span class="pun">.</span><span class="pln">age </span><span class="pun">*</span><span class="pln"> </span><span class="lit">100</span><span class="pun">;</span><span class="pln">
c </span><span class="pun"><-</span><span class="pln"> companies</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> c</span><span class="pun">.</span><span class="pln">region </span><span class="pun">==</span><span class="pln"> </span><span class="str">"DA"</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> c</span><span class="pun">.</span><span class="pln">name </span><span class="pun">==</span><span class="pln"> e</span><span class="pun">.</span><span class="pln">companyName</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> c</span><span class="pun">.</span><span class="pln">avgSalary </span><span class="pun"><</span><span class="pln"> salary
</span><span class="pun">)</span><span class="pln">
</span><span class="kwd">yield</span><span class="pln"> </span><span class="pun">(</span><span class="pln"> e</span><span class="pun">.</span><span class="pln">name</span><span class="pun">,</span><span class="pln"> c</span><span class="pun">.</span><span class="pln">name</span><span class="pun">,</span><span class="pln"> salary </span><span class="pun">-</span><span class="pln"> c</span><span class="pun">.</span><span class="pln">avgSalary </span><span class="pun">)</span><span class="pln">
</span><span class="com">// result is now a collection of tuples (employee name, company name, salary surplus)</span></pre>
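To make the example above runnable, here is a sketch with made-up data. The case classes Employee and Company and their sample values are assumptions for this illustration. The second value shows roughly what the compiler turns the for comprehension into:<br />

```scala
object ForDemo {
  case class Employee(name: String, age: Int, companyName: String)
  case class Company(name: String, region: String, avgSalary: Int)

  val employees = List(Employee("Ann", 30, "Acme"), Employee("Bob", 20, "Acme"))
  val companies = List(Company("Acme", "DA", 2500), Company("Initech", "XX", 1000))

  val result = for {
    e <- employees
    if e.age > 25
    salary = e.age * 100
    c <- companies
    if c.region == "DA"
    if c.name == e.companyName
    if c.avgSalary < salary
  } yield (e.name, c.name, salary - c.avgSalary)

  // Roughly the same computation written with higher-order functions directly.
  val desugared = employees
    .withFilter(e => e.age > 25)
    .map(e => (e, e.age * 100))
    .flatMap { case (e, salary) =>
      companies
        .withFilter(c => c.region == "DA" && c.name == e.companyName && c.avgSalary < salary)
        .map(c => (e.name, c.name, salary - c.avgSalary))
    }
}
```

Both values contain the single tuple ("Ann", "Acme", 500): Bob is filtered out by age, Initech by region.<br />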
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=7411464060074870759" name="macros_&_reflection"></a>macros & reflection</h2>
The upcoming scala release 2.10 introduces macros for code generation
and a new reflection system that provides "mirrors", i.e. access
to the same information that the compiler sees. <br />
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=7411464060074870759" name="higher-kinded_types"></a>higher-kinded types</h2>
Scala's type system has <a href="http://stackoverflow.com/questions/6246719/what-is-a-higher-kinded-type-in-scala" rel="nofollow">higher-kinded types</a>, i.e. it can accommodate hypergraphDB's type constructors of type constructors of type constructors... <br />
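A generic sketch of what "higher-kinded" means in code, not tied to hypergraphDB's type system: the type parameter F[_] below stands for a type constructor such as List or Option, not for a plain type. Functor and incrementAll are names chosen for this illustration:<br />

```scala
object KindDemo {
  // F[_] is a type constructor parameter: Functor abstracts over List, Option, ...
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }

  val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }

  // Written once, works for every container that has a Functor.
  def incrementAll[F[_]](fa: F[Int])(functor: Functor[F]): F[Int] =
    functor.map(fa)(_ + 1)
}
```

On scala 2.10 to 2.12 the compiler may ask for "import scala.language.higherKinds" with a feature warning; the code compiles either way.<br />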
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=7411464060074870759" name="traits_/_mixin_composition"></a>traits / mixin composition</h2>
Traits
are interfaces with implementations, including mechanisms to a) refer to
the object they are mixed into and b) deal with multiple mixed-in traits. Mixing in may occur at class definition, but
also at instantiation, even for a class one does not have control over. <br />
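A short sketch of these three points, with names (Greeter, Loud, Polite, Person) made up for this illustration:<br />

```scala
object TraitDemo {
  trait Greeter {
    def name: String
    def greet: String = "hello " + name   // an implementation inside the "interface"
  }

  // a) Self-type: this trait may refer to members of the object it is mixed into.
  trait Loud { self: Greeter =>
    def shout: String = greet.toUpperCase
  }

  // b) Several mixed-in traits: `super` is resolved by the mixin order (linearization).
  trait Polite extends Greeter {
    override def greet: String = super.greet + ", pleased to meet you"
  }

  class Person(val name: String) extends Greeter

  // Mixin at instantiation, without changing the class Person.
  val p = new Person("ada") with Polite with Loud
}
```

Here p.greet goes through Polite first, and p.shout uses whatever greet the final object provides.<br />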
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=7411464060074870759" name="advanced_concepts_of_functional_programming"></a>advanced concepts of functional programming</h2>
Scala
has mechanisms that make it possible, or at least more practical, to use
advanced techniques of functional programming. Using traits and implicits,
scala allows for ad-hoc polymorphism aka typeclasses as
found in haskell (<a href="http://www.youtube.com/watch?v=sVMES4RZF-8" rel="nofollow">good intro video for humans</a>
). Typeclasses allow further decoupling of components. Scala also
makes it much easier to use monads, functors and applicative functors. As
mentioned before, for comprehensions provide a language-level feature
that can be used directly with any monad. This is powerful not only
with collections, but also with any sort of container or computation. For
example, Option[T] reflects as type information that a value may be absent; it avoids NPEs by allowing you to "flatMap over" options ( <a href="http://www.youtube.com/watch?v=Mw_Jnn_Y5iA" rel="nofollow">good intro video for humans</a> ).Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7411464060074870759.post-87315056584591674132012-07-28T04:50:00.000-07:002012-07-28T14:56:14.647-07:00IntroHi,
I'm a biologist and a self-taught IT guy with a strong interest in the scala language and <a href="http://hypergraphdb.org/">hypergraphDB</a>. They are both cool, check them out.<br />
This blog is about my explorations of scala, of hypergraphDB, and of a scala wrapper for hypergraphDB.<br />
<br />Unknownnoreply@blogger.com0