Archive for February, 2008

Did You Mean: Compass?

Wednesday, February 27th, 2008

I just finished adding spell-check support to Compass (available in the nightly builds), and it turned out to be a really cool and easy-to-use feature. To enable spell checking in Compass, all you need to do is set the following property:

compass.engine.spellcheck.enable=true

Next, in the code that performs the search, the following can be done:

CompassQuery query = session.queryBuilder().queryString("fiv").toQuery();
CompassHits hits = query.hits();
System.out.println("Original Query: " + hits.getQuery());
if (hits.getSuggestedQuery().isSuggested()) {
    System.out.println("Did You Mean: " + hits.getSuggestedQuery());
}

Naturally, there are many ways to configure and control this feature, which are all explained in the reference docs, but this is the gist of it.
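To give a feel for what a "did you mean" suggestion involves, here is a minimal, self-contained sketch that picks the dictionary word closest to a misspelled term by edit distance. It is not how Compass or Lucene implement spell checking; the SuggestSketch class, its methods, and the tiny dictionary are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Toy "did you mean" helper; NOT the Compass/Lucene implementation.
final class SuggestSketch {

    // Classic Levenshtein edit distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest dictionary word, or the term itself if it is already known.
    static String suggest(String term, List<String> dictionary) {
        if (dictionary.contains(term)) {
            return term;
        }
        String best = term;
        int bestDistance = Integer.MAX_VALUE;
        for (String word : dictionary) {
            int d = distance(term, word);
            if (d < bestDistance) {
                bestDistance = d;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("five", "four", "seven");
        System.out.println("Did You Mean: " + suggest("fiv", dictionary));
    }
}
```

With this hypothetical dictionary, the term fiv is one edit away from five, so that is the word suggested.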

1000 Tests Mark for Compass

Tuesday, February 26th, 2008

Compass just hit the 1000 tests mark, which is quite a landmark in my not so biased opinion ;). Check out the Bamboo test report here, and the Clover one here.

P.S. Clover clearly indicates that there is some more work left to be done, but all in all, it's pretty good.

Dune and Scalability

Friday, February 22nd, 2008

As I posted before, I am reading the wonderful Dune book again. Another quote from Dune reminded me of problems we have when trying to build a scalable application:

Kynes looked at Jessica, said: “The newcomer to Arrakis frequently underestimates the importance of water here. You are dealing, you see, with the Law of the Minimum.”

She heard the testing quality in his voice, said, “Growth is limited by the necessity which is present in the least amount. And, naturally, the least favorable condition controls the growth rate.”

“It’s rare to find members of a Great House aware of planetological problems,” Kynes said. “Water is the least favorable condition for life on Arrakis. And remember that growth itself can produce unfavorable conditions unless treated with extreme care.”

So true. When we build an application and notice some performance problems, we first need to try and nail down the “least favorable condition”. For example, if we have a messaging system talking to a database, and we notice a bottleneck in the messaging system, we first need to tackle it.

But “growth itself can produce unfavorable conditions”. We might double the capacity of our messaging system, only to find that the database has become the next bottleneck, so the system as a whole grows by a much smaller factor.

Worse still, the “unfavorable conditions” can be far more severe: we may find that our whole architecture simply cannot scale any further, and we have to re-architect it to meet our needs. Extreme care, Kynes said, and he is right…
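The Law of the Minimum maps neatly onto a pipeline: end-to-end throughput is capped by the slowest stage. Here is a minimal sketch of that idea; the stage names and the capacities (think messages per second) are invented numbers, not measurements from any real system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the "Law of the Minimum"; all stage names and capacities are invented.
final class BottleneckSketch {

    // End-to-end throughput is capped by the slowest stage.
    static int systemThroughput(Map<String, Integer> capacities) {
        int min = Integer.MAX_VALUE;
        for (int capacity : capacities.values()) {
            min = Math.min(min, capacity);
        }
        return min;
    }

    // Name of the stage that currently limits the system.
    static String bottleneck(Map<String, Integer> capacities) {
        String name = null;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : capacities.entrySet()) {
            if (entry.getValue() < min) {
                min = entry.getValue();
                name = entry.getKey();
            }
        }
        return name;
    }

    public static void main(String[] args) {
        Map<String, Integer> stages = new LinkedHashMap<>();
        stages.put("messaging", 1000);
        stages.put("database", 1200);
        System.out.println(bottleneck(stages) + " limits us to " + systemThroughput(stages));

        stages.put("messaging", 2000); // double the messaging capacity
        System.out.println(bottleneck(stages) + " limits us to " + systemThroughput(stages));
    }
}
```

With these numbers, doubling the messaging capacity raises overall throughput from 1000 to only 1200, because the database takes over as the limiting stage.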

Stay At Home Servers

Thursday, February 21st, 2008

I really thought that this was a prank. Turns out there is a real product behind it. Funny.

Improved Boosting with “all” Property

Wednesday, February 20th, 2008

Well, I just finished implementing a really cool feature in Compass supporting specific boosting for terms in the all property. This is best shown with an example:

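import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableProperty;

@Searchable
public class Article {

    // Boosted, so matches on the title weigh more than matches on the content.
    @SearchableProperty(boost = 5.0f)
    String title;

    // No explicit boost, so the default applies.
    @SearchableProperty
    String content;
}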

In this case we mark the title as more “important” than the content. This means that if Article a has the token london in its title, and Article b has the token london in its content, then for the search title:london OR content:london, a will rank higher than b.

This is really nice, but when searching, people usually search on the all field of Compass. The all field is a special field that allows searching across all the different fields of Article (its performance has improved considerably in 2.0). The main problem with the all field (up until now) was that it did not take boosting into account. Well, now it does :).

What does it mean? With current Compass versions (without the new all boost feature), if we search for london, Article a and Article b will rank the same. With the new all boost feature, a will rank much higher than b.
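To make the ranking difference concrete, here is a toy scoring sketch. It is a deliberate simplification and not the real Compass or Lucene scoring formula: a document's score for a term on the all field is simply the sum of the boosts of the fields in which the term occurs. The AllBoostSketch class and everything in it are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of boost-aware "all" field scoring; NOT the real Lucene/Compass scoring formula.
final class AllBoostSketch {

    static final class Field {
        final String text;
        final float boost;

        Field(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    // Scores a term against the "all" field: every occurrence adds the boost of the
    // field it came from. With useBoost == false each occurrence counts as 1.0f,
    // which mimics the behaviour before the new feature.
    static float scoreAll(Map<String, Field> doc, String term, boolean useBoost) {
        float score = 0f;
        for (Field field : doc.values()) {
            for (String token : field.text.toLowerCase().split("\\s+")) {
                if (token.equals(term)) {
                    score += useBoost ? field.boost : 1.0f;
                }
            }
        }
        return score;
    }

    // Builds a document with the same boosts as the Article mapping: title 5.0, content default.
    static Map<String, Field> article(String title, String content) {
        Map<String, Field> doc = new LinkedHashMap<>();
        doc.put("title", new Field(title, 5.0f));
        doc.put("content", new Field(content, 1.0f));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Field> a = article("london calling", "a post about music");
        Map<String, Field> b = article("travel notes", "a weekend in london");
        System.out.println("without boost: a=" + scoreAll(a, "london", false)
                + " b=" + scoreAll(b, "london", false));
        System.out.println("with boost:    a=" + scoreAll(a, "london", true)
                + " b=" + scoreAll(b, "london", true));
    }
}
```

In this simplified model both articles tie when boosts are ignored, while with boosts applied a scores 5.0 against 1.0 for b.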

This feature is available in Compass 2.0 M3. Take the nightlies for a spin.

Compass 2.0.0 M2 Released

Tuesday, February 19th, 2008

Compass version 2.0.0 M2 has just been released. This is another great milestone in the 2.0 release train.

The main feature in this release is support for Lucene 2.3. A lot of work has gone into integrating the new Lucene features into Compass in the simplest and most performant way. This release should show some very nice performance improvements.

The main enhancement in this release is in the area of transactions. Compass now makes use of the Lucene transaction API, with completely rewritten Read Committed transaction isolation support. Also, the batch insert transaction isolation level has been renamed to the lucene transaction isolation level; it is now fully transactional and allows all the different operations (CRUD as well as search).

The main difference between the two transaction isolation levels is now whether changes made during an ongoing transaction are visible to later operations within that same transaction. With Read Committed, if for example something is deleted from the index, it will no longer show up within the same transaction. With the lucene transaction isolation level, the deleted item will still be visible to search operations done within the same transaction.
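The visibility difference can be sketched with a small toy model. It does not use or mirror the real Compass API; the IsolationSketch class, its enum, and its methods are invented purely to illustrate the behaviour described above, under the assumption that with the lucene level changes only become visible at commit.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Toy model of in-transaction visibility; it does not use or mirror the real Compass API.
final class IsolationSketch {

    enum Isolation { READ_COMMITTED, LUCENE }

    private final Set<String> committed = new TreeSet<>();
    private final Set<String> pendingDeletes = new HashSet<>();
    private final Isolation isolation;

    IsolationSketch(Isolation isolation, String... initialDocs) {
        this.isolation = isolation;
        for (String doc : initialDocs) {
            committed.add(doc);
        }
    }

    // Records a delete inside the ongoing transaction.
    void delete(String doc) {
        pendingDeletes.add(doc);
    }

    // What a search sees while the transaction is still open.
    Set<String> search() {
        Set<String> visible = new TreeSet<>(committed);
        if (isolation == Isolation.READ_COMMITTED) {
            visible.removeAll(pendingDeletes); // own changes are visible right away
        }
        return visible; // LUCENE: changes only become visible after commit
    }

    // Applies the pending deletes to the committed state.
    void commit() {
        committed.removeAll(pendingDeletes);
        pendingDeletes.clear();
    }

    public static void main(String[] args) {
        for (Isolation isolation : Isolation.values()) {
            IsolationSketch tx = new IsolationSketch(isolation, "doc1", "doc2");
            tx.delete("doc1");
            System.out.println(isolation + " before commit: " + tx.search());
            tx.commit();
            System.out.println(isolation + " after commit:  " + tx.search());
        }
    }
}
```

In this model a search before commit still returns doc1 under LUCENE but not under READ_COMMITTED; after commit both agree.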

In most cases, the lucene transaction isolation level is more than enough, but for systems that work with Compass as the main API (as opposed to working with ORM integration) the Lucene transaction can be easily used as well.

There are new parameters that can control different aspects of the two transaction isolations, so I highly recommend reading this section again.

I was debating with myself which transaction isolation should now be the default one. Currently, Compass still has the read committed isolation level as the default one (since it seems to yield slightly better performance), but the lucene transaction isolation level can certainly yield better performance under certain conditions. For long running transactions, lucene transaction isolation level is preferred. It would be great for people who try out the new version to give both of them a go and maybe post back some numbers?

There are more features in this release, which can be found in the release notes. Enjoy!

Great quote from Dune

Tuesday, February 19th, 2008

I am reading the wonderful Dune book again and wanted to share the following quote:

Many have marked the speed with which Muad’Dib learned the necessities of Arrakis. The Bene Gesserit, of course, know the basis of the speed. For the others, we can say that Muad’Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It is shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad’Dib knew that every experience carries a lesson.

So true…

Time to rewrite DBMS, says Ingres founder

Monday, February 18th, 2008

In a paper titled The end of an architectural era (It’s time for a complete rewrite), Mike Stonebraker, Ingres founder and a Postgres architect, with a group of academics said that modern use of computers renders many features of mainstream DBMS obsolete.

They have argued that DBMS designs such as Oracle and SQL Server come from an age when online transaction processing (OLTP) dominated and required techniques such as multi threading and transaction locking. They said that modern transactions - entered via web pages - do not need these expensive processing overheads and DBMS should, therefore, be re-designed without them. Persistent storage such as disks is also seen as unnecessary and could be replaced by geographically dispersed RAM storage.

Stonebraker and his group also advocate abandoning SQL because they see no need for a separate data manipulation language. Data manipulation, they said, can be performed with other tasks using languages such as Ruby. They describe a prototype DBMS called H-Store that embodies these ideas.

This paper is a very interesting read, and it basically acknowledges what DataGrid providers such as GigaSpaces and Tangosol have been advocating for a long time. One comment I do have is that Mr. Stonebraker and the rest of the group fail to take this architecture to the next level by integrating such a solution into the “application tier”. I guess this is mainly due to the “remote database” concept that is inherent in working with databases.

What do I mean by that? Very simple. Once we have our data stored in memory, we can bring it “into” our application tier. This means that the operations we perform are actually done in memory, without even leaving our “vm”. Naturally, the next question is what to do about partitioning. Well, the idea is to route the processing of data to the partition that holds most (if not all) of the data it requires; whatever data is not local can still be accessed in a remote, “clustered” manner.
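As a rough illustration of routing work to the partition that owns the data, here is a minimal sketch. It is not the GigaSpaces API: the RoutingSketch class, the hash-based routing rule, and the use of a customer id as the routing key are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "grid": each partition is a plain map; names and routing rule are invented.
final class RoutingSketch {

    private final List<Map<String, Integer>> partitions = new ArrayList<>();

    RoutingSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<String, Integer>());
        }
    }

    // Maps a routing key to the partition that owns its data.
    int partitionFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), partitions.size());
    }

    // Runs the update inside the owning partition, so the data never has to move.
    void addOrder(String customerId, int amount) {
        Map<String, Integer> local = partitions.get(partitionFor(customerId));
        Integer current = local.get(customerId);
        local.put(customerId, (current == null ? 0 : current) + amount);
    }

    // Reads are routed the same way, so they hit the same partition as the writes.
    int totalFor(String customerId) {
        Integer total = partitions.get(partitionFor(customerId)).get(customerId);
        return total == null ? 0 : total;
    }

    public static void main(String[] args) {
        RoutingSketch grid = new RoutingSketch(3);
        grid.addOrder("alice", 10);
        grid.addOrder("alice", 5);
        System.out.println("alice lives in partition " + grid.partitionFor("alice")
                + ", total = " + grid.totalFor("alice"));
    }
}
```

Because reads and writes for the same key always land on the same partition, the processing for one customer can run entirely against local, in-memory data.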

Another interesting point is the replacement of SQL with better ways to query for data. For one, the simplest approach is to define our queries based on the objects we work with. For example, create a “template” of an Order whose processed flag is set to false. Advanced queries can be based on dynamic languages such as Ruby and Groovy, which is exactly what I have been hacking on in GigaSpaces for our upcoming version (more information can be found here).
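The template idea can be sketched in a few lines. This is an illustration only and not the GigaSpaces API: the Order class, its fields, and the rule that a null field in the template acts as a wildcard are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy "query by template" matcher; the Order class and matching rule are illustrative only.
final class TemplateSketch {

    static final class Order {
        final String id;         // null in a template means "match any id"
        final Boolean processed; // null in a template means "match any state"

        Order(String id, Boolean processed) {
            this.id = id;
            this.processed = processed;
        }
    }

    // A null template field is a wildcard; a non-null field must be equal.
    static boolean matches(Order template, Order candidate) {
        return (template.id == null || template.id.equals(candidate.id))
                && (template.processed == null || template.processed.equals(candidate.processed));
    }

    // Returns the ids of all orders that match the template, in input order.
    static List<String> findIds(Order template, List<Order> orders) {
        List<String> ids = new ArrayList<>();
        for (Order order : orders) {
            if (matches(template, order)) {
                ids.add(order.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("o1", true), new Order("o2", false), new Order("o3", false));
        Order unprocessed = new Order(null, false); // template: processed == false
        System.out.println(findIds(unprocessed, orders));
    }
}
```

The query is expressed with the same class the application already works with, so there is no separate query language to learn for the simple cases.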

It's great to see this movement starting to happen within the database world.

Embedded TopLink Essentials (Glassfish)

Monday, February 18th, 2008

Compass 2.0 M1 supports an embedded mode when working with TopLink Essentials (whose development is part of the Glassfish project).

Here are the simple steps needed to enable Compass with TopLink Essentials:

First, add the following to your persistence.xml file:

<persistence-unit name="test" transaction-type="RESOURCE_LOCAL">
  <provider>oracle.toplink.essentials.PersistenceProvider</provider>
  <properties>    
    <!-- ... (other properties) -->
    <property name="toplink.session.customizer" 
         value="org.compass.gps.device.jpa.embedded.toplink.CompassSessionCustomizer" />
  </properties>
</persistence-unit>

This will enable Compass within TopLink Essentials, basically going over all the mapped JPA classes and adding them to Compass automatically if they have the @Searchable annotation.

Now, if we want to completely index the database based on the mappings, we can execute the following code:

TopLinkHelper.getCompassGps(emf).index();

Last, if we want to perform a search, we can simply obtain the Compass instance and perform it. Here is the code:

Compass compass = TopLinkHelper.getCompass(emf);

That is it. Simple, no? Compass now comes with support for embedded OpenJPA, Hibernate, and TopLink (as well as EclipseLink, which is very similar to TopLink). More information on the integration can be found in the reference manual.

Overview Section on Compass Site

Thursday, February 14th, 2008

I have finally managed to put some content into the overview section on the new Compass site. I tried to nail down Compass's main features in a concise manner, and wanted to ask you, Compass users, what you think about it. Is there something missing? Are less important things overemphasized? Thanks in advance! Here is a direct link.