Lucene Summit: Erik Hatcher

Erik Hatcher is the co-author of Lucene in Action, so this presentation could probably have been termed a keynote. Erik is working with Lucene and Solr in quite a few projects, and some of the displays are fairly interesting. Most of these projects are listed under patacriticism.org, though there is another called Juxta that also uses Lucene, mostly for data analysis.

Rossetti Archive

Available at rossettiarchive.org
Uses Lucene for full-text search
Has multiple types of digitized objects - poems, paintings, etc
Scholars demanded a case-sensitive version - accomplished with two separate indexes - a checkbox chooses the index
Metadata is mixed - some objects have a lot, some don't, and much of it is based on crappy XML formats
"More like this" function that creates a query from an object's data
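
The two-index approach to case sensitivity can be sketched roughly as follows. This is a toy illustration in plain Python, not the archive's actual Lucene code: the sample documents, the tokenizer and the names build_index and search are all made up for the example. One index keeps tokens as written, the other lowercases them, and a flag (standing in for the checkbox) picks which one to query.

```python
import re
from collections import defaultdict

def tokenize(text, case_sensitive):
    # Split on word characters; lowercase only for the case-insensitive index.
    tokens = re.findall(r"\w+", text)
    return tokens if case_sensitive else [t.lower() for t in tokens]

def build_index(docs, case_sensitive):
    # Toy inverted index: token -> set of document ids.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text, case_sensitive):
            index[token].add(doc_id)
    return index

# Hypothetical sample documents, not taken from the archive.
docs = {
    "a": "The Rose in bloom",
    "b": "she rose at dawn",
}

# Two separate indexes over the same documents.
indexes = {
    True: build_index(docs, case_sensitive=True),
    False: build_index(docs, case_sensitive=False),
}

def search(term, case_sensitive=False):
    # The flag plays the role of the checkbox: it only selects the index.
    index = indexes[case_sensitive]
    key = term if case_sensitive else term.lower()
    return sorted(index.get(key, set()))

print(search("Rose", case_sensitive=True))
print(search("Rose"))
```

The cost of this design is indexing everything twice, but the query side stays trivial: the same search runs against whichever index the user picked.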

Nines and Collex

A scholarly portal for 19th century objects.
Available at www.nines.org/collex
Interface uses idea of facets/sets/collections - add and subtract facets to limit your collection - keyword not really necessary
facets are intersections between sets
objects can be in multiple sets
counts shown are specific to the current constraints - dynamic - cached to speed up
can invert any set - example: everything in a category I haven't collected/tagged
username, collection - just another facet - everything is a set - sets can be objects
there is currently an index for the archive and a separate index for user data
docsets used for users
currently writing an update handler for Solr to make dynamic/user data easier to handle - update single fields
hopeful to allow member archives to use the connections users make within collex
atom exports one possibility
open-source project at sourceforge but not really general at this point
Ruby on Rails frontend with Solr on the backend. MySQL is also used for some user data, may contain index in future
RDF -> Solr right now. member archives submit the RDF format or are converted to it
future: move more things to MySQL so displays can be built faster without pulling information from the index itself (limit Lucene/Solr to searching)
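
The "everything is a set" idea from the notes above can be sketched in a few lines. This is only an illustration with plain Python sets, not how Collex or Solr implements it: the sample objects, the "collector" facet name and the functions build_sets, apply_constraints and facet_counts are assumptions made up for the example. Each facet value is a set of object ids, a user's collection is just one more set, constraints are intersected (or subtracted when inverted), and counts are recomputed against the current result.

```python
from collections import defaultdict

def build_sets(objects, collections):
    # Every (field, value) pair becomes a named set of object ids.
    sets = defaultdict(set)
    for obj_id, fields in objects.items():
        for field, value in fields.items():
            sets[(field, value)].add(obj_id)
    # A user's collection is treated as just another facet.
    for user, obj_ids in collections.items():
        sets[("collector", user)] |= set(obj_ids)
    return dict(sets)

def apply_constraints(all_ids, sets, constraints):
    # Each constraint is (facet_key, inverted): intersect, or subtract if inverted.
    result = set(all_ids)
    for key, inverted in constraints:
        members = sets.get(key, set())
        result = result - members if inverted else result & members
    return result

def facet_counts(sets, current):
    # Counts are specific to the current constraints; empty facets are dropped.
    return {key: len(members & current)
            for key, members in sets.items() if members & current}

# Hypothetical sample data.
objects = {
    "obj1": {"genre": "poetry", "archive": "rossetti"},
    "obj2": {"genre": "poetry", "archive": "other"},
    "obj3": {"genre": "painting", "archive": "rossetti"},
}
collections = {"alice": {"obj1"}}

sets = build_sets(objects, collections)

# "All poetry that alice has NOT collected": one plain and one inverted constraint.
constraints = [(("genre", "poetry"), False), (("collector", "alice"), True)]
current = apply_constraints(objects.keys(), sets, constraints)
counts = facet_counts(sets, current)

print(sorted(current))
print(counts)
```

Recomputing the counts on every change is the expensive part, which is presumably why the notes mention caching.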

There was more information in the presentation but I found it hard to summarize some of it. I found the Collex interface concept very interesting. Everything is a constraint or limit, and you can easily add or invert a constraint. It’s also easy to add things to a personal collection, and parts of the personal collection then become facets/constraints themselves. He’s really using all of the metadata (archive and user) to its full extent. He also has more plans, including “exhibits” where people can “curate collections”. These collections themselves can then become objects in the index and so on. This project will be something to watch.