Wednesday, September 7, 2011

Drupal and Solr Integration

The basic Apache Solr integration is rather simple.  Download the Solr Integration module from:
http://drupal.org/project/apachesolr

Install and enable the module.  Follow the directions in the included README, since you need to copy the schema.xml, protwords.txt, and solrconfig.xml files into Solr's conf directory so Solr can talk the same "language" as Drupal.
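If you would rather script the copy step, here is a minimal Python sketch.  The function name and both directory arguments are my own placeholders, not anything the module provides; where the three files live depends on your module version, so check the README for the real paths.  It keeps a .bak copy of any file it overwrites.

```python
import shutil
from pathlib import Path

# The three files the README asks you to copy.
CONFIG_FILES = ("schema.xml", "protwords.txt", "solrconfig.xml")


def copy_solr_config(module_dir, solr_conf_dir):
    """Copy the Drupal-provided config files into Solr's conf directory.

    Both arguments are hypothetical paths supplied by the caller.
    Any file that would be overwritten is first saved as <name>.bak.
    """
    module_dir = Path(module_dir)
    solr_conf_dir = Path(solr_conf_dir)
    copied = []
    for name in CONFIG_FILES:
        source = module_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"{source} not found; check the README")
        target = solr_conf_dir / name
        if target.exists():
            # Keep Solr's original file around in case you need to roll back.
            shutil.copy2(target, target.with_name(name + ".bak"))
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

Restart Solr after copying so it picks up the new schema.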

Configuring Solr Integration Module
Once you have installed Solr, enabled the module, and cron has run, some data should be in Solr.  Configuring the module will make your search results much more useful.  These steps are for Drupal 7; they are a little different for Drupal 6.

Go to Configuration > Search and Metadata > Apache Solr Search
  • Edit the search environment to point to your Solr server
  • Under Behavior on empty search, pick the last radio item which talks about showing the first page of results.  This is best for testing purposes so you can see results without filtering on terms.
  • Also enable the spellchecker, since it may help "find" results
Go to the Search Index tab
  • If it doesn't say 100% of your site has been indexed, click run cron to send some documents.  Also check the number of documents in the index to see if anything has been sent
The Enabled Filters tab is where you define which content types will be indexed

You can also edit the content bias and search field tabs to change result weighting once you get the basic search running.
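If you want to double-check the document count from outside Drupal, you can ask Solr directly.  Here is a small Python sketch; the base URL http://localhost:8983/solr is an assumption (use whatever you entered in the search environment), and the function names are my own.  It sends a match-all query with zero rows and reads numFound from the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(solr_base):
    """Build a select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_base.rstrip('/')}/select?{params}"


def parse_num_found(payload):
    """Pull the total document count out of a Solr JSON response."""
    return json.loads(payload)["response"]["numFound"]


def indexed_document_count(solr_base="http://localhost:8983/solr"):
    """Ask a running Solr server how many documents are in its index.

    The default base URL is an assumption; adjust it for your server.
    """
    with urlopen(count_url(solr_base)) as response:
        return parse_num_found(response.read().decode("utf-8"))
```

If the count is 0 after cron has run, recheck the search environment URL before changing anything else.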

Finally go to Configuration > Search Settings
  • Make sure Apache Solr Search is enabled under modules
  • Make sure Apache Solr Search is set to be the default engine
  • Save your settings

One final tweak is to enable the blocks you want to see, so go to Structure > Blocks
  • Scroll down till you see the Disabled section
  • Pick some of the Apache Solr items there and assign them to your sidebar
You should see items such as:
  • Apache Solr environment: localhost server : Current Search
  • Apache Solr Core: Sorting
Yours may look a little different depending on which other modules are installed, but they should look very similar.

Additional research (homework) to do on your own...take a look at different modules that can help extend Solr:
  • Apache Solr Attachments
  • Apache Solr Autocomplete
  • Facet API
  • Search API
  • Search API Solr


Apache Nutch Integration
Another interesting idea is to add Nutch to webcrawl websites and import their data into Solr:
http://nutch.apache.org/

This part gets a little trickier because Nutch has its own schema it wants you to put into schema.xml, so you end up having to merge two different schemas to get a final result.  I ended up using only a few of the suggested schema changes and added the following line to schema.xml:
<copyField source="id" dest="path"/>

This told Solr to copy the id field to a new field called path (which Drupal looks for to display search results).


I then edited the solrindex-mapping.xml file for Nutch as follows:

<mapping>
        <fields>
                <!--field dest="site" source="site"/-->
                <field dest="type" source="site"/>
                <field dest="title" source="title"/>
                <field dest="host" source="host"/>
                <field dest="segment" source="segment"/>
                <field dest="boost" source="boost"/>
                <field dest="digest" source="digest"/>
                <field dest="tstamp" source="tstamp"/>
                <field dest="id" source="url"/>
                <field dest="body" source="content"/>
                <copyField source="url" dest="url"/>
        </fields>
        <uniqueKey>id</uniqueKey>
</mapping>
 
This correctly created search documents in the Solr index, in the format the Apache Solr module was looking for.  Also note that I mapped the "site" field to "type", so when you turn on facets in Apache Solr, the Drupal content types get faceted along with the site you are webcrawling.  This was a personal choice; you can categorize any other way.
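To make the renaming easier to picture, here is a rough Python sketch of what the mapping and the copyField line do to a single crawled page.  This is only an illustration of the idea, not how Nutch or Solr implement it; the function names and the sample page values are made up, and the mapping is trimmed to four fields.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the mapping above (four of the <field> entries).
MAPPING_XML = """
<mapping>
  <fields>
    <field dest="type" source="site"/>
    <field dest="title" source="title"/>
    <field dest="id" source="url"/>
    <field dest="body" source="content"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
"""


def apply_mapping(mapping_xml, nutch_doc):
    """Rename Nutch fields to Solr fields according to the <field> entries."""
    root = ET.fromstring(mapping_xml.strip())
    solr_doc = {}
    for field in root.iter("field"):
        source, dest = field.get("source"), field.get("dest")
        if source in nutch_doc:
            solr_doc[dest] = nutch_doc[source]
    return solr_doc


def apply_copy_field(solr_doc, source, dest):
    """Mimic the schema.xml copyField: duplicate one field into another."""
    result = dict(solr_doc)
    if source in result:
        result[dest] = result[source]
    return result


# Made-up sample page, just to show the shape of the result.
page = {
    "site": "example.com",
    "title": "Example page",
    "url": "http://example.com/page",
    "content": "Hello from the crawler",
}
solr_doc = apply_copy_field(apply_mapping(MAPPING_XML, page), "id", "path")
print(solr_doc)
```

The result has type, title, id, and body from the mapping, plus a path field holding the same URL as id, which is the field Drupal looks for when displaying results.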


I realize this is a mashed up entry with several different directions, but I wanted to give a little flavor of all the ways you can integrate Solr into Drupal and do it with data from multiple sources.  For additional reading, here is a link to lots of projects I found on Drupal:

http://drupal.org/node/343467#other-documentation

Next entry I want to talk about how I used Solr's Data Import Handler to index data from RSS/blogs, MySQL and XML files.
