Building A Better Search For SlideSix - Part 1
Posted By : todd sharp Posted At : October 22, 2009 8:47 AM Posted In: Java, ColdFusion, SlideSix
After finishing up a few outstanding fixes and enhancements for SlideSix the other day, I decided to give search a bit of love. Well, I didn't quite decide it on my own: a gentle reminder from Sean Corfield that the existing search wasn't working all that well prompted me to act. Regardless, I learned a few things about search that I'm going to share over a few blog posts under this alliteration-littered title.
Search is one of those website features that isn't very glamorous. It either works (the way people expect it to) or it doesn't (and people get pissed). Most users probably don't realize (or care) how complex it is behind the scenes. Good search should focus on two things: speed and accuracy. Of course there's more to it, but in the end those are what really matter.
Developers have a few options when it comes to search. You can go the SQL route, which looks something like this:
<cfquery name="qResults" datasource="#variables.dsn#">
select *
from slideshows
where name like <cfqueryparam value="%#arguments.searchString#%" cfsqltype="cf_sql_varchar" />
<!--- etc... --->
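</cfquery>
It works, but it isn't efficient. Because the pattern starts with a wildcard, the database generally can't use an index on the name column and has to scan every row, comparing raw substrings. Start adding more columns and joined tables to that query and things get ugly really quickly.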
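To make that cost concrete, here is a small Java sketch of roughly what a name like '%term%' query does, run over an in-memory list instead of a real table. The class name, method name and sample titles are hypothetical, and it assumes a case-insensitive comparison, which in a real database depends on the column's collation. Every row is visited on every query, and matching is by raw substring rather than by word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LikeScan {
    // Rough stand-in for SQL "where name like '%term%'" (hypothetical helper).
    // Assumes a case-insensitive collation, so both sides are lowercased.
    static List<String> likeSearch(List<String> names, String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        // Full scan: every row is examined on every query, O(rows).
        for (String name : names) {
            // Raw substring match: not aware of word boundaries.
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical slideshow titles, for illustration only.
        List<String> names = List.of(
                "Intro to Java", "JavaScript Tricks", "ColdFusion 9 and Solr");
        System.out.println(likeSearch(names, "java"));
        // prints [Intro to Java, JavaScript Tricks]
    }
}
```

Note that a search for "java" also returns the JavaScript deck, so this approach can cost accuracy as well as speed.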
The more efficient way is to use a library that helps you index your content into a collection and then search against that collection. There are many options for such software. ColdFusion has long shipped with built-in support for Verity. To use Verity in ColdFusion you use the <cfcollection>, <cfindex> and <cfsearch> tags to (yes, you guessed it) build collections, index content and search against those collections. Verity is pretty powerful, but it does have limitations. More importantly, though, Verity was acquired in 2005 and likely won't be around for many more years.
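The idea behind collection-based search is an inverted index: content is analyzed once, at indexing time, into a map from each term to the documents that contain it, so answering a query becomes a few map lookups instead of a scan of every row. The Java sketch below shows that idea in miniature. It is not Lucene's (or Verity's) API; the class, its methods, the crude tokenizer and the sample titles are all hypothetical, and real engines add stemming, stop words, relevance scoring and on-disk storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndex {
    // term -> sorted set of ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Very crude analyzer (an assumption, far simpler than Lucene's):
    // lowercase, then split on anything that isn't a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing cost is paid once per document, not once per query.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: intersect the posting sets of every query term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "Intro to Java");
        index.add(2, "JavaScript Tricks");
        index.add(3, "ColdFusion 9 and Solr");
        System.out.println(index.search("java"));            // prints [1]
        System.out.println(index.search("coldfusion solr")); // prints [3]
    }
}
```

Because terms are matched as whole words, a search for "java" finds the "Intro to Java" deck but not "JavaScript Tricks", unlike a substring LIKE.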
ColdFusion 9 added Solr, an "open source enterprise search server based on Lucene Java". Those who've upgraded to CF9 can now take advantage of Solr/Lucene by using those same <cfcollection>, <cfindex> and <cfsearch> tags that they've grown to know and love (hate?). Lucene is free, open source and powerful. Companies like Disney and LinkedIn (among many others) use Lucene.
For SlideSix I decided to go the Lucene route, since I needed a flexible architecture that gave me granular control over searching and indexing, plus room to scale for when SlideSix gets bigger than those other craptastic slide-sharing imitators on the web. Since I'm not yet on CF9, I decided to use Seeker, a custom tag wrapper for Lucene built by a guy you've probably heard of if you've ever written a line of ColdFusion. Seeker is beautifully easy: download it and drop it into your project, download Lucene and put it in your CF classpath, and you're good to go. You could easily modify the tags to load Lucene dynamically using JavaLoader, but since I had access to the classpath I decided to be lazy and just go with it. I should mention that on CF9 you could probably skip installing Lucene altogether, since CF9 already ships with a Lucene JAR, but I haven't fully tested that to make sure there are no version compatibility issues.
I'll wrap up this first post here. In future parts we'll look at some of the lessons I've learned while working with Lucene. I should note that although I'm rolling with Seeker, these lessons will likely apply to most other search libraries, since they'll focus on best practices. Stay tuned, and be sure to drop a comment if you have questions or specific things you'd like me to address.


