
Faceted Search with Lucene

For a new project I wanted to have a look at faceted navigation. But first of all, just what is a facet? This is what Wikipedia has to say:

Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering available information. A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order. Each facet typically corresponds to the possible values of a property common to a set of digital objects.

Hmmm, right. Let's translate that from Geek. What they mean is that an object, or document in Lucene terms, can have a number of attributes. For example, let's say we have a Fruit object. There are different species of fruit, such as Apples, Oranges and Bananas, and each species can have different varieties: Granny Smith, Cox and so on in the case of Apples. Fruit also has a price per kilogramme. So we have three facets: Species, Variety and Price. Each facet has a number of values. If we click on Apple we will only see facet values related to Apples; varieties such as “Jaffa”, which is an orange, will be filtered out. For price we generally want to offer ranges, say 0-50 cents, 50 cents to $1 and so on. More of that later.
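To make the fruit example concrete, here is a minimal sketch in plain Java. It does not use Lucene or Bobo at all: the FruitFacets class, its sample data and the 50 cent bucket width are all made up for illustration. It shows the two things a faceted search does, namely filtering by a selected value and counting the remaining values of another facet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the fruit example; this is not Lucene or Bobo code. */
final class FruitFacets {

    /** One "document" carrying the three facets from the text. */
    static final class Fruit {
        final String species;
        final String variety;
        final int priceCents; // price per kilogramme, in cents

        Fruit(String species, String variety, int priceCents) {
            this.species = species;
            this.variety = variety;
            this.priceCents = priceCents;
        }
    }

    /** Made-up sample data. */
    static List<Fruit> sampleData() {
        return Arrays.asList(
                new Fruit("Apple", "Granny Smith", 120),
                new Fruit("Apple", "Granny Smith", 45),
                new Fruit("Apple", "Cox", 95),
                new Fruit("Orange", "Jaffa", 80));
    }

    /** Keeps only the fruit whose species equals the selected facet value. */
    static List<Fruit> selectSpecies(List<Fruit> fruits, String species) {
        List<Fruit> kept = new ArrayList<Fruit>();
        for (Fruit f : fruits) {
            if (f.species.equals(species)) {
                kept.add(f);
            }
        }
        return kept;
    }

    /** Counts the fruit under each variety: the facet values and their hit counts. */
    static Map<String, Integer> countVarieties(List<Fruit> fruits) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Fruit f : fruits) {
            Integer old = counts.get(f.variety);
            counts.put(f.variety, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Maps a non-negative price to a half-open range label; the 50 cent width is an assumption. */
    static String priceRange(int priceCents) {
        int lower = (priceCents / 50) * 50;
        return lower + "-" + (lower + 50) + " c";
    }

    public static void main(String[] args) {
        List<Fruit> apples = selectSpecies(sampleData(), "Apple");
        System.out.println(countVarieties(apples));
        System.out.println(priceRange(45));
    }
}
```

After selecting Apple, the variety counts contain only Granny Smith and Cox; Jaffa has disappeared because no apple has that variety. A price of exactly 50 cents lands in the 50-100 bucket here, which is one arbitrary way of settling the boundary.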

Accepted wisdom says that if you want facets you need the Solr extension to Lucene. However, if you want to embed search in your own application, embedding Solr is not well supported. All of the Solr goodies are built on top of Lucene, so it is possible to roll your own faceting. Indeed there are moves to roll the Solr faceting features into the Lucene core: the grouping support added in Lucene 3.2 is a first step in this direction.

Another solution exists in the form of Bobo, a library developed by LinkedIn which powers their faceted people search. This article provides a simple demonstration of how to use this library. No knowledge of the source documents is needed for this to work. I provide an example based on French avalanche reports, but the navigation will work on any source data without code changes because we discover the indexed fields programmatically.

The example was also a chance to use the SpringSource Tool Suite (STS) version of Eclipse. I've worked with Spring for 5 years, the last 4 on a long-term project. That project used vanilla Eclipse and Maven and I'd never really looked at STS. I've no excuse. So the first thing I did was download STS and create a basic Spring MVC project. This gave me a standard Maven layout along with a Controller and a View JSP, as well as the Spring infrastructure.

HomeController.java

HomeController is generated by STS. I added my own controller method. The first thing this does is open a Lucene index in a directory under “/tmp” (a Unixism). It reads all the field names that are present in the index and adds these to the model. These are our facets. Frequently you will generate your facet list from a different source and it will be a subset of the indexed fields. In our example every indexed field is a facet.

@Controller
public class HomeController {
 
	private static final Logger logger = LoggerFactory
			.getLogger(HomeController.class);
 
	@RequestMapping(value = "/", method = RequestMethod.GET)
	public String search(HttpServletRequest request, Locale locale, Model model)
			throws Exception {
 
		Directory index = new NIOFSDirectory(new File("/tmp/indexDirectory"));
		IndexSearcher searcher = new IndexSearcher(index, true);
		IndexReader reader = searcher.getIndexReader();
		Collection<String> fieldNames = reader.getFieldNames(FieldOption.ALL);
		model.addAttribute("fieldNames", fieldNames);

We now set up a facet handler for each field and add these to the BoboIndexReader. We ask for only the first 10 hits, starting at offset 0.

		List<FacetHandler<?>> handlers = new ArrayList<FacetHandler<?>>();
		for (String fieldName : fieldNames) {
			logger.debug("adding facet for {}", fieldName);
			handlers.add(new SimpleFacetHandler(fieldName));
		}
		BoboIndexReader boboReader = BoboIndexReader.getInstance(reader,
				handlers);
 
		BrowseRequest br = new BrowseRequest();
		br.setCount(10);
		br.setOffset(0);
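To show what a count and an offset select from a longer hit list, here is a minimal plain Java sketch. ResultWindow and its page method are hypothetical names of mine, not part of Bobo; Bobo does the windowing itself when it runs the browse request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing what a count/offset window selects; not part of Bobo. */
final class ResultWindow {

    /** Returns at most count items starting at offset, or an empty list past the end. */
    static <T> List<T> page(List<T> hits, int offset, int count) {
        if (offset < 0 || count < 0) {
            throw new IllegalArgumentException("offset and count must not be negative");
        }
        int from = Math.min(offset, hits.size());
        // Use long arithmetic so a very large count cannot overflow.
        int to = (int) Math.min((long) from + count, (long) hits.size());
        return new ArrayList<T>(hits.subList(from, to));
    }
}
```

With 25 hits, a count of 10 and an offset of 20 give the last 5 hits, so the second and third pages of results would use offsets 10 and 20.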

This is our facet filter. We loop over the query parameters, each of which is treated as a facet filter such as Activity=Skiing. For each one we create a browse selection from the name/value pair and add it to the browse request. At the same time we rebuild the query string to pass to the view.

 
		StringBuilder facetQuery = new StringBuilder();
		// BrowseSelection: A selection or filter to be applied, e.g.
		// Activity=Skiing
		Map<String, String[]> map = request.getParameterMap();
 
		for (Map.Entry<String, String[]> entry : map.entrySet()) {
			// remember previous queries
			facetQuery.append(entry.getKey());
			facetQuery.append("=");
			facetQuery.append(entry.getValue()[0]);
			facetQuery.append("&");
 
			BrowseSelection sel = new BrowseSelection(entry.getKey());
			sel.addValue(entry.getValue()[0]);
			br.addSelection(sel);
		}// for
		model.addAttribute("facetQuery", facetQuery.toString());
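The loop above appends parameter names and values to the query string as they arrive. A facet value containing a space or an ampersand would break the resulting link, so one option is to URL-encode each part before concatenating. The following is a minimal sketch of that idea; FacetQueryString and build are hypothetical names, and you would need to check that the view does not encode the string a second time.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper that rebuilds a query string from request parameters. */
final class FacetQueryString {

    /** Encodes each name and its first value, ending every pair with '&'. */
    static String build(Map<String, String[]> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String[] values = entry.getValue();
            if (values == null || values.length == 0) {
                continue; // nothing to remember for this parameter
            }
            sb.append(URLEncoder.encode(entry.getKey(), "UTF-8"));
            sb.append('=');
            sb.append(URLEncoder.encode(values[0], "UTF-8"));
            sb.append('&');
        }
        return sb.toString();
    }
}
```

Like the controller, this keeps only the first value of each parameter and leaves a trailing ampersand so that the view can append the next name=value pair directly.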

Now we specify how the facets we are interested in should be displayed. If the facet has been passed in the query string we exclude it from this list. Here we include all facets with at least one item and we order by descending number of hits.

		for (String fieldName : fieldNames) {
			if (!map.containsKey(fieldName)) {
				FacetSpec facetSpec = new FacetSpec();
				facetSpec.setMinHitCount(1);
				facetSpec.setOrderBy(FacetSortSpec.OrderHitsDesc);
				br.setFacetSpec(fieldName, facetSpec);
			}
		}
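To illustrate what a minimum hit count of 1 and a descending hit order mean for the values of one facet, here is a plain Java sketch. FacetOrdering is a hypothetical name and this is not how Bobo is implemented. In particular, breaking ties alphabetically is my own assumption.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "min hit count" plus "order by hits descending". */
final class FacetOrdering {

    /** Drops values below minHitCount and sorts the rest by hit count, highest first. */
    static List<Map.Entry<String, Integer>> orderHitsDesc(Map<String, Integer> counts,
            int minHitCount) {
        List<Map.Entry<String, Integer>> kept = new ArrayList<Map.Entry<String, Integer>>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minHitCount) {
                // Copy the entry so the result does not depend on the source map.
                kept.add(new AbstractMap.SimpleImmutableEntry<String, Integer>(
                        e.getKey(), e.getValue()));
            }
        }
        Collections.sort(kept, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byHits = b.getValue().compareTo(a.getValue());
                // Assumption: equal counts are ordered alphabetically by value.
                return byHits != 0 ? byHits : a.getKey().compareTo(b.getKey());
            }
        });
        return kept;
    }
}
```

Given counts of Skiing 5, Snowshoeing 5, Climbing 2 and Hiking 0, a minimum of 1 removes Hiking and leaves Skiing, Snowshoeing and Climbing in that order.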

Now we perform the query and add the facet map to the model for later display. We finish by handing off to the showFacets view. If you look in the Spring application-context XML file you will see that we have configured JSP as the view handler and that views are found under WEB-INF/views.

 
		// perform browse
		Browsable browser = new BoboBrowser(boboReader);
		BrowseResult result;
		try {
			result = browser.browse(br);
 
			// search query result
			int totalHits = result.getNumHits();
			BrowseHit[] nhits = result.getHits();
 
			model.addAttribute("facets", result.getFacetMap());
 
		} catch (Exception e) {
			// log the failure; the view is still rendered, without facets
			logger.error("facet browse failed", e);
		}
 
		reader.close();
		searcher.close();
		return "showFacets";
	}
 
}

showFacets.jsp

Here is the view JSP: showFacets.jsp. Normally I do views in Velocity, so this was a chance to use the JSTL taglib, which I've been using a lot in Liferay projects recently. We'll use the JSTL core tags as well as the Spring taglib.

<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<%@ taglib prefix="spring" uri="http://www.springframework.org/tags"%>
<%@ page session="false"%>
<html>
<head>
<title>Simple Facet Navigation</title>
<style type="text/css">
body {
margin: 0px;
font-family: Verdana, Arial, Helvetica, sans-serif;
color: #444;
}
th, td {
    border-right: 15px solid transparent;
}
</style>
</head>

We start by displaying all the potential facet names:

 
<body>
	<h1>Facet Navigation Example</h1>
	<h2>Field Names</h2>
	<ol>
		<c:forEach items="${fieldNames}" var="field">
			<li><c:out value="${field}" /></li>
		</c:forEach>
	</ol>
 

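And here we show the facets, which you can click on to narrow down the search. A value with only one hit is shown as plain text, since selecting it could not narrow the results any further.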

<h1>Navigation</h1>
<p>Click on a link to narrow your search:-</p>
<table>
	<tr>
		<c:forEach items="${facets}" var="facet">
			<td valign="top" align="left">
				<b><c:out value="${facet.key}" /></b><br/>
 
				<c:forEach items="${facet.value.facets}" var="field">
				    <c:choose>
				    <c:when test="${field.hitCount == 1}" >
						<c:out value="${field.value}" /> (<c:out value="${field.hitCount}" />)<br/>
				    </c:when>
				    <c:otherwise>
				        <spring:url var="url" value="/?${facetQuery}${facet.key}=${field.value}" htmlEscape="true" />
					<a href="${url}" ><c:out value="${field.value}" /> (<c:out value="${field.hitCount}" />)</a><br/>
				    </c:otherwise>
				</c:choose>
				</c:forEach>
				</td>
		</c:forEach>
	</tr>
</table>
<a href='<spring:url value="/" htmlEscape="true" />'>Clear</a>
<p>
(C)2011 <a href="http://www.abcseo.com/">David B George</a>
</p>
</body>
</html>

Source Code

You will need to download the Bobo jar separately and install it in your Maven repository. See the script in the ./jars subdirectory.

facets.tar.gz

tech/search/facetted-search-with-lucene.txt · Last modified: 2011/08/30 20:01 by davidof