
Solr Cores and Multiple Indexes

If you are dealing with multiple sources that have different data models, you need to decide whether your search application should use a single index or several. The problem is similar to denormalizing data in a traditional database. A single index may suffer from namespace collisions between fields and can lower the quality of document scoring. The main benefit of separate indexes is scalability of both update and query performance, and multiple indexes also make each schema easier to read and tune. However, for fewer than around a million documents it may not be worth even the small effort involved.

Multiple indexes can be set up with the Solr cores (multicore) feature introduced in Solr 1.3. Start by creating a solr.xml file in the Solr home directory (solr.home). In this example we are indexing two sources: data from Liferay and data from an Oracle products database.

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="liferay" instanceDir="liferay">
      <property name="dataDir" value="/opt/solr-instance/liferay/data" />
    </core>
    <core name="products" instanceDir="products">
      <property name="dataDir" value="/opt/solr-instance/products/data" />
    </core>
  </cores>
</solr>
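To double-check which core maps to which data directory before restarting Solr, a file like the one above can be read with the JDK's built-in DOM parser. This is a minimal sketch, not how Solr itself loads solr.xml. The class SolrXmlCores and the method coreDataDirs are hypothetical names, and the sketch only looks at the nested property elements used in this example (in practice you would read the file from solr.home instead of using an inline string).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SolrXmlCores {

    /** Maps each core name to the value of its nested dataDir property (null if absent). */
    public static Map<String, String> coreDataDirs(String solrXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse solr.xml", e);
        }
        Map<String, String> dataDirs = new LinkedHashMap<>();
        NodeList cores = doc.getElementsByTagName("core");
        for (int i = 0; i < cores.getLength(); i++) {
            Element core = (Element) cores.item(i);
            String dataDir = null;
            // Look for <property name="dataDir" value="..."/> inside this <core>
            NodeList props = core.getElementsByTagName("property");
            for (int j = 0; j < props.getLength(); j++) {
                Element prop = (Element) props.item(j);
                if ("dataDir".equals(prop.getAttribute("name"))) {
                    dataDir = prop.getAttribute("value");
                }
            }
            dataDirs.put(core.getAttribute("name"), dataDir);
        }
        return dataDirs;
    }

    public static void main(String[] args) {
        String solrXml = "<solr persistent=\"true\" sharedLib=\"lib\">"
                + "<cores adminPath=\"/admin/cores\">"
                + "<core name=\"liferay\" instanceDir=\"liferay\">"
                + "<property name=\"dataDir\" value=\"/opt/solr-instance/liferay/data\"/></core>"
                + "<core name=\"products\" instanceDir=\"products\">"
                + "<property name=\"dataDir\" value=\"/opt/solr-instance/products/data\"/></core>"
                + "</cores></solr>";
        // Prints {liferay=/opt/solr-instance/liferay/data, products=/opt/solr-instance/products/data}
        System.out.println(coreDataDirs(solrXml));
    }
}
```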

We give the two cores logical names, each with its own instance subdirectory, and explicitly tell Solr where each core's data directory is located. Each core's Admin and Data Import URLs are then reached by including the core name in the URL path.

(Note: in this example Solr runs under a Liferay Tomcat instance on port 8080.)
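Because the core name becomes a path segment, the per-core URLs follow a simple pattern. The sketch below builds them, assuming Solr is reachable at http://localhost:8080/solr (as in this example), that each core's DataImportHandler is registered as /dataimport in its solrconfig.xml (an assumption; use whatever name your config declares), and that the CoreAdmin handler sits at the adminPath /admin/cores declared in solr.xml. The class and method names are hypothetical.

```java
public class CoreUrls {

    // Assumption: Solr runs in Tomcat on port 8080 under the /solr context
    public static final String BASE = "http://localhost:8080/solr";

    /** Admin page of a single core: the core name follows the /solr context. */
    public static String adminUrl(String core) {
        return BASE + "/" + core + "/admin/";
    }

    /** Assumes DataImportHandler is registered as /dataimport in the core's solrconfig.xml. */
    public static String dataImportUrl(String core, String command) {
        return BASE + "/" + core + "/dataimport?command=" + command;
    }

    /** CoreAdmin handler, mapped by adminPath="/admin/cores" in solr.xml (shared, not per core). */
    public static String coreStatusUrl() {
        return BASE + "/admin/cores?action=STATUS";
    }

    public static void main(String[] args) {
        System.out.println(coreStatusUrl());
        for (String core : new String[] {"liferay", "products"}) {
            System.out.println(adminUrl(core));
            System.out.println(dataImportUrl(core, "full-import"));
        }
    }
}
```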

Next, create the directory structure and configuration files for each core, e.g.:

liferay
  conf
    solrconfig.xml
    schema.xml
  data
    index
    spellchecker

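The tree above can be created by hand or, as in the following sketch, with java.nio.file. It assumes the Solr home is /opt/solr-instance (matching the dataDir values in solr.xml) and uses hypothetical names (CoreLayout, createCoreDirs). It only creates empty directories; solrconfig.xml and schema.xml still have to be copied into each conf directory, for example from the example configuration shipped with Solr.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CoreLayout {

    /** Creates conf/, data/index/ and data/spellchecker/ under solrHome/<core>. */
    public static void createCoreDirs(Path solrHome, String core) {
        Path instanceDir = solrHome.resolve(core);
        try {
            // createDirectories also creates missing parents and ignores existing dirs
            Files.createDirectories(instanceDir.resolve("conf"));
            Files.createDirectories(instanceDir.resolve("data").resolve("index"));
            Files.createDirectories(instanceDir.resolve("data").resolve("spellchecker"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: solr.home is /opt/solr-instance unless a path is passed on the command line
        Path solrHome = Paths.get(args.length > 0 ? args[0] : "/opt/solr-instance");
        for (String core : new String[] {"liferay", "products"}) {
            createCoreDirs(solrHome, core);
        }
    }
}
```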
One more thing: you will have to copy any JARs required for indexing into the solr-instance/lib directory (its location is set by the sharedLib attribute in solr.xml). For example, to index PDF documents with Tika you will need:

 solr/lib
   apache-solr-dataimporthandler-extras-4.0-dev.jar
   pdfbox-1.1.0.jar
   fontbox-1.1.0.jar
   tika-core-0.6.jar
   tika-parsers-0.6.jar

Alternatively, you can put these JARs in the Tomcat lib directory, which may cause fewer problems with different class loaders.

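You can search both cores at once with a sharded (distributed) query URL. Each entry in the shards parameter has the form host[:port]/path, without the http:// prefix: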

http://myhost.com/solr/liferay/select?shards=localhost/solr/products,localhost/solr/liferay&q=moteur+drive
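The sharded URL is just a select request on one core with a comma-separated shards parameter. The sketch below assembles it; the names ShardedQuery, shardsParam and selectUrl are hypothetical, and only the query text is URL-encoded (the shard list here contains only characters that are legal in a query string). If Solr listens on port 8080, as in this example, the port has to be part of each shard address, which is what the main method assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class ShardedQuery {

    /** Joins shard addresses (host[:port]/path, no http:// prefix) with commas. */
    public static String shardsParam(List<String> shards) {
        return String.join(",", shards);
    }

    /** Builds a select URL on one core that asks Solr to fan out to all listed shards. */
    public static String selectUrl(String coreUrl, List<String> shards, String query) {
        try {
            // Only q is encoded: spaces become '+', as in "moteur+drive"
            return coreUrl + "/select?shards=" + shardsParam(shards)
                    + "&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Assumption: both cores are served by the Tomcat instance on port 8080
        List<String> shards = Arrays.asList("localhost:8080/solr/products", "localhost:8080/solr/liferay");
        System.out.println(selectUrl("http://localhost:8080/solr/liferay", shards, "moteur drive"));
    }
}
```

Running main prints http://localhost:8080/solr/liferay/select?shards=localhost:8080/solr/products,localhost:8080/solr/liferay&q=moteur+drive. Any core can receive the request; Solr forwards it to every shard in the list and merges the results.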

and the equivalent SolrJ code in a JSP page:

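<%@page import="org.apache.solr.client.solrj.SolrServer"%>
<%@page import="org.apache.solr.client.solrj.SolrQuery"%>
<%@page import="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer"%>
<%@page import="org.apache.solr.client.solrj.response.QueryResponse"%>
...
<%
// Query one core; the shards parameter makes Solr merge results from both cores
SolrServer solrServer = new CommonsHttpSolrServer("http://localhost/solr/liferay");
SolrQuery solrQuery = new SolrQuery("moteur drive");
solrQuery.setParam("shards", "localhost/solr/products,localhost/solr/liferay");
QueryResponse queryResponse = solrServer.query(solrQuery); // merged hits: queryResponse.getResults()
%>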


tech/search/solr-cores-and-multiple-indexes.txt · Last modified: 2010/06/16 10:37 by davidof