Hibernate Search/Lucene
It seems that Hibernate has upgraded its support for Lucene integration. The first attempt at supporting full text search was pretty basic, and it looks like the upcoming version will have better support for it (some more information can be found here). Naturally, with Hibernate releasing such a library, the obvious question is where Compass fits into the picture. Let me start by answering some of the arguments raised by Emmanuel Bernard on the Lucene mailing list:
[Emmanuel Bernard] 1. not Yet Another API to deal with your domain model. If you already use an ORM (JPA or Hibernate), you are familiar with those APIs. Using compass implies that you have to use a different set of API to play with the object lifecycle (CRUD). Hibernate Search is integrated with the org.hibernate.Query interface, and all the CUD operations on the index are triggered from the Hibernate CUD operations.
Compass has integrated with Hibernate’s lifecycle event mechanism since version 0.5. It supports Hibernate versions 3.0.x through 3.2.x. This support is provided through the Compass GPS infrastructure, and CUD operations are mirrored to the search engine automatically. As for read / search operations, personally, I really don’t see the difference between using the Compass API for searching (with the many benefits it gives) and the Hibernate API (which is an extension on top of the original Hibernate API).
[Emmanuel Bernard] 2. Metadata are minimal and fit particularly well through annotations, so you don’t have yet another XML representation of the same domain model (Compass might now have annotations support, you’ll have to check)
Compass has supported annotations since version 0.9, and they are as minimal as possible, even simpler than the Hibernate ones. With Compass, you can also use XML mapping definitions instead of, or in combination with, the annotation support. Many developers prefer XML mappings over annotations, and with Compass they have this option.
[Emmanuel Bernard] 3. it’s all about managed objects (ie managed by the Session or the EntityManager)
Hibernate Search gives you back objects managed by the Session, so any change made to them will (by default) be synchronized with the database, this is the normal behavior of an ORM, but is not what you have from a Compass search. This approach fits well with the JBoss Seam approach of having all the application around the domain model and EJB 3.0
Compass allows you to map your domain model to the search engine. You can easily get the objects back from the hit results, and load the respective object from the Hibernate Session or Entity Manager (this can easily be abstracted away and used throughout the application with a few lines of code). This is actually one of the main benefits of Compass over the current Hibernate implementation: you don’t hit the database when searching and displaying search results. Moreover, you do not have to use objects in order to display the results; you can use Resources instead (which are Compass’s abstraction on top of the Lucene Document).
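To make the "few lines of code" remark a bit more concrete, here is a minimal sketch of such an abstraction. It does not use the real Compass or Hibernate classes: SearchHit and ManagedHitLoader are hypothetical names invented for this illustration, and the loader function stands in for whatever lookup the application uses (for example a call to the Hibernate Session by entity class and identifier).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-in for a search hit: the mapping alias plus the
// identifier that was stored in the index for that object.
final class SearchHit {
    final String alias;
    final Long id;

    SearchHit(String alias, Long id) {
        this.alias = alias;
        this.id = id;
    }
}

// Hypothetical helper that turns index hits into managed objects.
final class ManagedHitLoader {

    // Resolves each hit through the supplied loader (assumed to delegate to
    // the ORM). Hits whose object no longer exists, signalled here by the
    // loader returning null, are skipped.
    static <T> List<T> loadManaged(List<SearchHit> hits,
                                   BiFunction<String, Long, T> loader) {
        List<T> result = new ArrayList<>();
        for (SearchHit hit : hits) {
            T entity = loader.apply(hit.alias, hit.id);
            if (entity != null) {
                result.add(entity);
            }
        }
        return result;
    }
}
```

Note that this sketch issues one lookup per hit. A page that only displays results can skip the loader entirely and render straight from the index.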
[Emmanuel Bernard] 4. Not too much abstraction. From what I’ve heard, Compass borrow a lot of its design / classnames from Hibernate/Spring/Lucene. Compass tries to abstract those 3 technologies (at least Hibernate and Lucene), by providing its own infrastructure. What I am trying to do with Hibernate Search is to keep the abstraction as light as possible. For advanced Lucene query you’ll have to use pure Lucene APIs, which is possible / natural with Hibernate Search
Compass’s main API does use the same programming model as other ORM tools (Hibernate did not invent SessionFactory and Session, after all). The main drive behind using a similar API is to make it simple for users to adopt Compass (not necessarily within an environment that uses an ORM tool), and the fact that the model applies well to Compass. Naturally, Compass does borrow a lot of Lucene semantics, but adds some enhancements on top of them (for example, Resource, which maps to the Lucene Document, is also identifiable and is associated with an alias/mapping definition). As for Spring, Compass does not really abstract it, but integrates with it in order to simplify the development process when working within a Spring environment (similar to what Spring provides for Hibernate). Last, Compass allows the user to work directly with Lucene classes where needed (for example, getting IndexReaders and Searchers).
[Emmanuel Bernard] I do not think that all your object properties belong to the Index, and some of them will be put in the index with information degradation (ie store year/month rather than the whole date). So I do not believe there is a bidirectional relationship between your domain model and your index documents (for size, efficiency and accuracy purpose). For that matter, Compass cannot really truly index your database backed domain model and give back the object to you. Hibernate Search can because it delegates the object hydration to Hibernate Core.
First, Compass can truly index your full domain model, but I agree that many times this is not required. Compass gives full control over what gets saved into the index, what does not, and in which format. Compass allows the user to work only with Resources when displaying search results, and also to work in a pure mode where unmarshalling is not supported. But many times the user would still like to work with the domain model, even though it is a degraded view of it, and Compass allows that by creating as much of the domain model as possible according to the mapping definitions.
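The "degraded view" both sides refer to can be illustrated with the year/month example from the quote. The following is only a sketch in plain Java: DegradedDateField is a hypothetical name, it is not a Compass converter, and defaulting the missing day to the first of the month is an assumption made for the illustration.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of a property that is indexed with less
// precision than the domain model holds.
final class DegradedDateField {

    private static final DateTimeFormatter YEAR_MONTH =
            DateTimeFormatter.ofPattern("uuuu-MM");

    // Value written to the index: only the year and the month survive.
    static String toIndexValue(LocalDate date) {
        return YearMonth.from(date).format(YEAR_MONTH);
    }

    // Best-effort reconstruction from the index. The day was never stored,
    // so this sketch assumes the first day of the month.
    static LocalDate fromIndexValue(String value) {
        return YearMonth.parse(value, YEAR_MONTH).atDay(1);
    }
}
```

An object rebuilt this way is good enough for displaying or sorting search results, but it is not equal to the one held in the database, which is why loading the real object by id remains an option when the full state matters.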
So, what are Compass’s features compared to Hibernate Lucene/Search? Actually, this is a difficult question to answer, since the Hibernate library has something like 5% of all of Compass’s features. First and foremost, you can use Compass within a Hibernate managed environment, but you can also use Compass where Hibernate is not used, with different ORM tools (JPA, OJB, JDO), and standalone. If we focus on a scenario where Hibernate is used, here is a short and by no means complete list of features: transactions and atomicity of transactions (get ready for index corruption with Hibernate), much more performant, automatic indexing of the domain model based on ORM mappings and Compass mappings, the ‘all’ property, component mapping (allowing a related class to be indexed into the same Resource/Document), sub index hashing, query builder and filter builder APIs, contract mappings, Lucene Analyzer granularity down to the Property/Field level, built in highlighter support, built in support for Lucene extended analyzers and custom analyzers, declarative configuration over all of Compass’s features and many of Lucene’s, extensive support for converters (with dynamic language support as well), XML and Resource level mappings (in combination with OSEM), Lucene caching support, and many more (I probably need another blog post for this).
In the end, I think this is good news. It has been realized that full text search makes a lot of sense in many applications, and we can see Hibernate responding. The response, I am guessing, came from user demand, which means that users require it. Up until now, Compass has been almost on its own in simplifying full text support within applications, and the publicity of Hibernate is only going to help Compass.
November 20th, 2006 at 3:29 am
OK, thanks for correcting me on my Compass misconceptions.
A couple of comments:
There is a way in Hibernate Annotation to plug an XML metadata facility to replace/override annotations. So I guess when someone will need the feature, it should be straightforward.
About retrieving managed objects.
I do believe it is a better model, but that’s pure opinion.
Writing the small wrapper you’re talking about is trivial until you start thinking about the 1+n problem. I have worked around it somehow, and plan to go beyond that when the fetch profile is exposed in the Hibernate API.
You do not have to reach the DB with Hibernate if it is used in conjunction with the second level cache or, even better, thanks to the combined use of the persistence context.
I really like the idea of having your data in your database, if you want a local copy of your DB, then fine, the retrieval “by id”, should be just as fast as a Lucene index object reconstruction.
I think you’ve got a point: Compass is about indexing whatever you have, placing Compass as the central API for object/resource retrieval.
Hibernate / Hibernate Search is about using an ORM and indexing/searching the domain model. The angle is quite different in its philosophy.
The idea of giving a degraded version of the domain model is something I want to avoid for sure (unless asked explicitly through a query). If at some point I support object hydration from the index, I’ll make sure to use the lazy loading mechanism of Hibernate to make it transparent to the end user.
When you say atomic transaction, do you refer to full 2PC including the DB transaction (or other resources), or just the indexes?
I believe the latter.
Regarding the transactional behavior of the Lucene indexes, I’d rather see that resolved by the Lucene project (there are some improvements going on, I believe). Most people I have discussed this with are perfectly happy with the current behavior of Lucene, especially since I don’t think there is a true Lucene XAResource out there, so a full 2PC including multiple heterogeneous resources is out of reach.
Comparing Hibernate Search and Compass is for sure unfair: give me some time ;-)
More seriously, I’m driven by use cases and user demand, and you can cover a bunch of use cases already with Hibernate Search. Some of the features you’ve listed are on my radar, I don’t see a clear strong use case for some others, and I do have some features of my own that I want to explore first.
“much more performant” Maybe, but why? :-)
Competition is good, and like I said, the user demand came from people having checked Compass. Both cover the problem through different angles.
Cheers
Emmanuel
November 20th, 2006 at 9:46 am
Compass with the Hibernate module still allows a user to work with the main data stored in the database. The same programming model exposed by Hibernate Search can easily be achieved with Compass. So, there is no problem using Compass in such a way.
Regarding the philosophy part, it ties with my previous statement. Compass is open and flexible enough to be used in many different ways, one of them is similar to how Hibernate Search works.
Compass adds support for atomic transactions on top of Lucene, which is the core of its performance benefits and fast updates. There is more information in the Compass reference manual. The next version will also come with an XAResource implementation (though not supporting recovery).
I agree that Hibernate Search is in its infancy, and that competition is very good. Compass is driven by user demand as well, and I am guessing that many of its features will be requested of Hibernate Search as well. Anyhow, I just wanted to make things clear with respect to the post on the Lucene mailing list.
November 21st, 2006 at 12:31 am
Posted a long comment on JL: http://www.javalobby.org/java/forums/m92065159.html#92065159
November 21st, 2006 at 12:40 am
The other thing I like about Compass, compared to my admittedly light understanding of Hibernate’s search, is that it doesn’t try to hide the underlying Lucene index. Ultimately, all of the Lucene functionality is available (Filters, HitCollectors, etc).
The impression I got looking at the Hibernate API was that it was trying to keep the Lucene index “hidden”.
Is this a fair assessment?
November 23rd, 2006 at 11:48 pm
The assessment is unfair. The Bridge mechanism gives you access to the Document literally.
The Query you pass in is literally the Lucene Query.
I still need to open:
- the directory providers, so that you can easily access your index (even though direct Lucene access is possible at that time)
- the query result manipulation, which I haven’t touched yet (besides object return)