Lucene Summit: Erik Hatcher
Erik is the co-author of Lucene in Action, and this presentation could probably have been billed as a keynote. He is working with Lucene and Solr on quite a few projects, and some of the displays are fairly interesting. Most of these projects are listed at patacriticism.org; another, Juxta, also uses Lucene, though mostly for data analysis.
Rossetti Archive
- Available at rossettiarchive.org
- Uses Lucene for full text search
- Has multiple types of digitized objects - poems, paintings, etc
- Scholars demanded a case-sensitive version - accomplished with two separate indexes - a checkbox chooses the index
- Metadata is mixed - some objects have a lot, some don't, and much of it is based on crappy XML formats
- "More like this" function that creates a query from object data
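The talk did not go into how the "more like this" query is built, so the following is only a minimal sketch of the general idea: take an object's text, pick its most distinctive terms, and turn them into a query string. The function names, the stopword list, the `text` field name, and the TF-IDF weighting are all assumptions for illustration, not the archive's actual code. Lucene ships its own MoreLikeThis helper, which a real implementation would more likely use.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not Lucene's default).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def more_like_this_query(doc, corpus, field="text", max_terms=3):
    """Build a Lucene-style OR query from the most distinctive terms of doc.

    Terms are ranked by term frequency in doc times a smoothed inverse
    document frequency over corpus; ties are broken alphabetically.
    """
    tf = Counter(tokenize(doc))
    doc_tokens = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)

    def score(term):
        df = sum(1 for tokens in doc_tokens if term in tokens)
        idf = math.log((n_docs + 1) / (df + 1)) + 1
        return tf[term] * idf

    top = sorted(tf, key=lambda t: (-score(t), t))[:max_terms]
    return f"{field}:({' OR '.join(top)})"


if __name__ == "__main__":
    corpus = [
        "the blessed damozel leaned out",
        "a painting of a damozel",
        "sonnet on a painting",
    ]
    print(more_like_this_query(corpus[0], corpus))
```

The resulting string could then be handed to a query parser; terms that appear in many objects score lower and tend to drop out of the query.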
Nines and Collex
- A scholarly portal for 19th century objects.
- Available at www.nines.org/collex
- Interface uses idea of facets/sets/collections - add and subtract facets to limit your collection - keyword not really necessary
- facets are intersections between sets
- objects can be in multiple sets
- counts shown are specific to the current constraints - dynamic - cached to speed things up
- can invert any set - example: everything in a category I haven't collected/tagged
- username, collection - just another facet - everything is a set - sets can be objects
- there is an index for the archive and a separate index for user data currently
- docsets used for users
- currently writing an update handler for Solr to make dynamic/user data easier to handle - update single fields
- hopeful to allow member archives to use the connections users make within collex
- Atom exports are one possibility
- open-source project at SourceForge but not really general at this point
- Ruby on Rails frontend with Solr on the backend. MySQL is also used for some user data, may contain index in future
- RDF -> Solr right now. member archives submit the RDF format or are converted to it
- future: move more things to MySQL so displays can be built faster without pulling information from the index itself (limit Lucene/Solr to searching)
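To make the "everything is a set" idea above concrete, here is a small sketch using plain Python sets. It is not how Collex or Solr implements it; the object ids, the `field:value` facet strings, and the function names are all made up for illustration. It shows adding a constraint (intersection), inverting one (set difference), and recomputing facet counts for the current result.

```python
from collections import Counter

# Hypothetical objects: id -> set of facet values. Users and collections
# are just more facets, as in the notes above.
OBJECTS = {
    "poem1": {"genre:poetry", "archive:rossetti", "user:erik"},
    "poem2": {"genre:poetry", "archive:rossetti"},
    "paint1": {"genre:painting", "archive:rossetti", "user:erik"},
    "letter1": {"genre:letter", "archive:other"},
}


def facet_set(facet):
    """Return the ids of all objects carrying the given facet value."""
    return {oid for oid, facets in OBJECTS.items() if facet in facets}


def apply_constraints(include=(), exclude=()):
    """Intersect the included facet sets, then subtract the excluded ones."""
    result = set(OBJECTS)
    for facet in include:
        result &= facet_set(facet)
    for facet in exclude:
        result -= facet_set(facet)
    return result


def facet_counts(result):
    """Count facet values over the current result only (the dynamic counts)."""
    counts = Counter()
    for oid in result:
        counts.update(OBJECTS[oid])
    return counts


if __name__ == "__main__":
    # Everything in the Rossetti archive that user "erik" has not collected.
    print(apply_constraints(include=["archive:rossetti"], exclude=["user:erik"]))
    # Counts shown next to each facet once "genre:poetry" is selected.
    print(facet_counts(apply_constraints(include=["genre:poetry"])))
```

Because the counts depend on the current constraints, they have to be recomputed whenever a facet is added or removed, which is presumably why caching matters in the real system.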
There was more information in the presentation but I found it hard to summarize some of it. I found the Collex interface concept very interesting. Everything is a constraint or limit, and you can easily add or invert a constraint. It’s also easy to add things to a personal collection, and parts of the personal collection then become facets/constraints themselves. He’s really using all of the metadata (archive and user) to its full extent. He also has more plans, including “exhibits” where people can “curate collections”. These collections themselves can then become objects in the index and so on. This project will be something to watch.