SlideSix Improves Search With Seeker

Posted By: todd sharp | Posted At: April 6, 2009 8:56 AM | Posted In: ColdFusion, SlideSix


I pushed my weekly SlideSix build into production on Saturday night. It included a number of miscellaneous bug fixes for the management console and a minor enhancement to the presentation viewer: users can now control the advance speed with a slider control in the viewer menu. The big addition this week, though, is a powerful search engine called Lucene. Thanks to this open source engine, user searches are drastically more accurate, and quicker too.

I chose Lucene over ColdFusion's built-in Verity searching for a few reasons. First, and most importantly, I wanted whatever option I chose to be scalable. Verity limits how large collections can be (depending on your ColdFusion license), and I didn't want to worry about that limit at any point in the future. I also wanted something that was easy to implement. With Lucene, implementation was extremely simple thanks to (yep, you guessed it) a project called Seeker by Ray Camden, the king of ColdFusion open source. I should mention that his birthday is Wednesday and he has a wishlist.

As always, feedback is welcome. Please try out the new search and let me know what you think.

Comments (8)

Joshua Siok: This appears to be Apache-specific. What alternatives to Verity are there for Windows/IIS?

todd sharp: @Joshua - No, Lucene is not Apache-only. It is a project of the Apache Software Foundation, which also produces the Apache web server. Lucene can be used with any web server (I'm using IIS7 myself). The only requirement is that you drop the Lucene JAR into your CF classpath so that you can call its Java objects. Ray's code wraps those calls in a series of custom tags to make things easy.

Dave Phipps: If you are already running CF 8.0.1, you may not even need to install the Lucene JAR; I noticed that our OS X version already has it in the classpath! I'm not sure whether it is installed on Windows or Linux, but Verity doesn't work at all on OS X, so this may be why Adobe added it. Either way, Seeker is another great piece of code from Ray.

Avi Rappoport, Searchtools.com: I'm very interested in your experiences adding Lucene (via Seeker). I'm sure a lot of CF people will want to take a look.

Are you indexing the text within the slides? I think not, but it's hard to tell. And if you'd like some help with design patterns for search results, especially for no-match results, I'll offer you some free consulting, just to make the user experience better.

todd sharp: Hey Avi, I am indexing the slide text, slide titles, slideshow titles, slideshow abstracts, and all slideshow tags to give searches the best chance of a match. I plan to write a more in-depth blog post explaining how I implemented Lucene in the near future.

The only issue with slide text is that I haven't been extracting it since the site launched (I only started saving it a few months ago), and I cannot always obtain the text, depending on the original file format.

Can you tell me more about the design patterns? I'm not familiar with them (I only recently started working with search engines).
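To make the multi-field approach described above concrete, here is a minimal Java sketch under stated assumptions. It is not Lucene's API and not how SlideSix actually indexes content: the class `SlideIndexSketch`, its methods, and the field names are hypothetical. It merges whatever fields a slideshow has into one searchable text, skips empty fields (such as slide text that could not be extracted), and counts how many query terms match. Lucene itself does far more, including per-field indexing, analyzers, and relevance scoring.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: NOT Lucene's API and NOT SlideSix's code.
public class SlideIndexSketch {

    // Join every non-empty field into one searchable string. Fields that are
    // null or blank (e.g. slide text that could not be extracted) are skipped.
    static String combineFields(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String value : fields.values()) {
            if (value == null || value.isBlank()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(value);
        }
        return sb.toString();
    }

    // Naive tokenizer: lowercase, then split on anything that is not an
    // ASCII letter or digit (so non-ASCII letters act as separators here).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Count how many query terms appear anywhere in the combined fields.
    static int matchCount(Map<String, String> fields, String query) {
        List<String> docTokens = tokenize(combineFields(fields));
        int count = 0;
        for (String term : tokenize(query)) {
            if (docTokens.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("slideshowTitle", "Intro to ColdFusion");
        fields.put("abstract", "A beginner's tour of CFML");
        fields.put("tags", "coldfusion cfml");
        fields.put("slideText", null); // text could not be extracted
        System.out.println(matchCount(fields, "CFML tour")); // prints 2
    }
}
```

Because each field only adds to the pool of searchable terms, a slideshow with no extracted slide text can still match on its title, abstract, or tags.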

Dave Phipps: Sorry to post this here, but could someone take a look at these two URLs:

http://www.methodist.org.uk/
and
http://www.methodist.org.uk/index.cfm?fuseaction=home.gsearch

Try entering a search such as:

"Towards three new districts"

Include the quotes. The first URL displays a Lucene (Seeker) based search, and the second uses a Google Custom Search with no annotations. We are evaluating the two options for the client. I would prefer Seeker because I can control the index, but the results are very different and the client prefers the Google option. Why do the Seeker scores mostly come out as 0? Does anyone have tips for improving the quality and relevance of the search results? I have been trying to read up on the Lucene docs, but it's all a little over my head! I just want to index the content (a mix of database and file-based) and provide relevant results.

Cheers,

Dave

P.S. If Ray is reading this, could you add a new forum to your site for discussing Seeker?

Avi Rappoport, Searchtools.com: Well, OK, they're not called "design patterns" as such. And I don't seem to have any proper articles on them; this material is all in the talks I've been giving lately.

So you've inspired me, which is good. Would you be interested in some free consulting from me to help you improve the usability of your search results?


In the meantime, here are a couple of useful articles:

http://www.uie.com/articles/search_results/

http://www.useit.com/alertbox/reading_pattern.html...

http://googleblog.blogspot.com/2009/02/eye-trackin...

http://www.searchtools.com/analysis/how-people-sea...

http://www.searchtools.com/guide/nomatches.html

Raymond Camden: Hi Dave. Either something is wrong on your display page, or this is a new bug in Seeker. The scores should not all be 0. Please contact me directly and we can work together to figure it out. My email is ray at camdenfamily dot com.