The Solr configuration files packaged with this module are laid out to make customization as easy as possible. The “core files” with the base configuration for the Solr server are schema.xml and solrconfig.xml. These should never be edited directly, because they will have to be updated whenever a future version of the Search API Solr search module changes them (though this shouldn't happen too often).
The other files, however, contain only some default settings or documentation to help you customize your Solr server. These files will change only rarely, and when they do, updating your copies should be either unnecessary or trivial. You can therefore fill and edit them with custom settings specific to your site's needs. For the format of these files and what you can do with them, see the documentation comments included in them, or the official Solr wiki. The three *_extra*.xml files are included into schema.xml and solrconfig.xml when they are read, thus allowing you to easily add settings to them.
Remember: After changing any configuration, you will always have to restart your Solr server for the changes to take effect!
A few examples for possible customizations follow.
- Changing the Solr type of a field
- Changing the language of a fulltext field
- Creating a text type for partial matching
- Using the correct Lucene version
Changing the Solr type of a field
The schema.xml file contains, for most data types, several alternatives that aren't used by default. For example, for fulltext fields there are text (the default), text_ws, text_und and edge_n2_kw_text; for (long) integers, there are long (used by default), slong and tlong.
If you want to use such a type for one of your indexed fields, it's pretty easy. First, find out the internal name Solr uses for the field. This is the internal Search API field name (which can be seen, e.g., in the source code of the index's Fields tab) prefixed by a few letters and an underscore. For example, the Solr field name for the body text of a node is tm_body:value.
Then just put the following inside of schema_extra_fields.xml:
<fields>
<field name="FIELD" type="TYPE" indexed="true" stored="true" multiValued="(true|false)" />
</fields>
For the right multiValued (and perhaps other) settings, it's easiest to look inside the schema.xml file for the <dynamicField> declaration with the prefix matching your field, and copy all its settings except for name and type.
So, for example, to change the Solr type of the node's body text to text_ws, use:
<fields>
<field name="tm_body:value" type="text_ws" indexed="true" stored="true" multiValued="true" termVectors="true" />
</fields>
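The same pattern applies to the other data types. As a further sketch, the following would switch an integer field to the slong type mentioned above. The field name is_comment_count is used purely for illustration, and the multiValued value is an assumption: copy the actual settings from the matching <dynamicField> declaration in your schema.xml.

```xml
<fields>
  <!-- Illustrative only: verify multiValued (and any other attributes)
       against the <dynamicField> declaration for this prefix in schema.xml. -->
  <field name="is_comment_count" type="slong" indexed="true" stored="true" multiValued="false" />
</fields>
```

If you override several fields, all <field> elements go inside the same <fields> element.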
Sadly, due to restrictions of Solr itself, replacing the Solr type used for a certain Search API data type altogether is not possible. If you want to do that, you will have to change it manually for all fields of that type, though you can use dynamic fields for at least a little help: e.g., if you want to replace the type for the fields is_comment_count
Changing the language of a fulltext field
By default, all text fields in Solr will use English stemming. If you want to use stemming for a different language (or make other modifications), you'll have to create a new type with these settings and then configure the relevant fields to be indexed with this type. (How to do the latter was already explained above: just add field definitions, using your custom type, for some or all fields with the tm_* prefix.)
To add the custom text type, just copy the definition of the text type from schema.xml to schema_extra_types.xml. The type definition is the block starting with <fieldType name="text" and ending with the next </fieldType> (about 54 lines in total). Then edit the copy in schema_extra_types.xml to your liking.
First, change the identifier (in name="text" right at the beginning) to some other identifier that is not already in use, e.g., text_fr for French text. (An example for German is already included in the schema_extra_types.xml file; just uncomment it to use it.) You can use any identifier you like, though, so iwflksxf is also fine.
Then replace the two occurrences of "English" in the definition with the language of your choice. See the Solr wiki for a list of supported languages.
If you want to use several languages at once on this Solr server, and therefore can't just fill synonyms.txt, protwords.txt, etc., with settings for your language, you can also point the type to new, language-specific files instead of these default files. Just replace the respective file names in the definition.
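Put together, a language-specific type might look roughly like the abbreviated sketch below. It is not the full definition from schema.xml (which is longer and should remain your real starting point); it only shows the places the instructions above touch. The <types> wrapper, the exact filter chain, and the file names stopwords_fr.txt, synonyms_fr.txt and protwords_fr.txt are assumptions made for illustration.

```xml
<types>
  <!-- A new identifier that is not used by any other type. -->
  <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <!-- Optional: hypothetical language-specific file instead of the default one. -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt" />
      <filter class="solr.LowerCaseFilterFactory" />
      <!-- First occurrence of the language, changed from "English". -->
      <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords_fr.txt" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms_fr.txt" ignoreCase="true" expand="true" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt" />
      <filter class="solr.LowerCaseFilterFactory" />
      <!-- Second occurrence of the language, changed from "English". -->
      <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords_fr.txt" />
    </analyzer>
  </fieldType>
</types>
```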
To add more than one type, just copy one or more additional type definitions after the closing </fieldType> of the first one.
Finally, just add the <field> definitions using the new type(s) to schema_extra_fields.xml as described above.
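For instance, assuming you named your custom type text_fr, indexing the node body with it could look like this (the other attributes mirror the tm_body:value example shown earlier):

```xml
<fields>
  <!-- text_fr is the custom type defined in schema_extra_types.xml. -->
  <field name="tm_body:value" type="text_fr" indexed="true" stored="true" multiValued="true" termVectors="true" />
</fields>
```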
Creating a text type for partial matching
(For actually using that type for your fields, again, see above.)
By default, the Solr search module doesn't support partial (or substring) matching. E.g., when searching for "break", items containing "breakpoint" (or "unbreakable") aren't found. This default was selected since it returns more reliable results that don't just contain the search keys by accident, and since it will perform better for larger data sets. Also, stemming already takes care of some of these queries (see also Solr's notes about stemming).
However, on many sites users will expect partial matches to be returned. Luckily, Solr already comes equipped with text analysis tools to easily implement this for your server: the solr.NGramFilterFactory and the solr.EdgeNGramFilterFactory filters. The difference is that, with the latter, only partial matches at the beginning (or, optionally, at the end) of words will be found, while the former will find all substrings contained in a word. Which of these you want to use depends on your specific use case and site. The procedure is nearly identical in both cases, though:
First, copy a text type definition to schema_extra_types.xml and change the identifier, as described above.
Then, add the following line to the type definition after the first occurrence of "solr.SnowballPorterFilterFactory" (inside of the <analyzer type="index"> element; not after the second occurrence):
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
If you want partial matches inside of words to be found, too, simply remove the "Edge" part from that line. In this case, you should also remove both occurrences of the solr.WordDelimiterFilterFactory filter: for each one, remove everything from the <filter that precedes that string up to the "greater than" sign (>) coming after it.
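With "Edge" removed, the added line would read as follows (same gram sizes as above; adjust them to your needs):

```xml
<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25" />
```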
Now, after also adding the field definitions, restarting your Solr server and re-indexing your content, partial matches should be found with searches on your site.
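As a rough, abbreviated sketch of where the added line ends up, a prefix-matching type could look like the following. The identifier text_partial, the <types> wrapper and the shortened filter chain are assumptions for illustration; in practice, start from the full text type definition in schema.xml.

```xml
<types>
  <!-- text_partial is a hypothetical identifier; any unused name works. -->
  <fieldType name="text_partial" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
      <!-- Added line: indexes word prefixes of 2 to 25 characters. -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <!-- No n-gram filter here: the query term is compared against the indexed prefixes. -->
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    </analyzer>
  </fieldType>
</types>
```

Applying the n-gram filter only at index time is what the instructions above describe: the filter goes after the first SnowballPorterFilterFactory occurrence, not the second.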
Using the correct Lucene version
Starting with Solr 3.x, it is possible (and mandatory) to specify the version of Lucene your Solr server should use. Since the module developers cannot know which version of Solr their users will be running, the default config files contain defaults for Solr 3.5 or Solr 4.0 (depending on config version), which will also work for all later versions of the same major version (i.e., 3 or 4).
However, for best performance, the latest bug fixes, etc., you should definitely use the latest version available to your server, which will be the version of Solr itself. This setting can easily be changed in the solrcore.properties file provided with the config files. Just change the value after the equals sign (=) in the line that starts with solr.luceneMatchVersion=. The format to use is as follows: first LUCENE_, then the major and minor version number you want to specify, without anything in between. So, for example, if you are using a Solr 4.2 server, the line in solrcore.properties should look like this:
solr.luceneMatchVersion=LUCENE_42
Never use versions higher than that of your Solr server, as Solr will then refuse to start.
Caution: You should also keep in mind that for some minor version updates, the format of the config files can change. This is especially the case for Solr 3.6. This means that you cannot set this value to version 3.6 or later and still use the default config files provided with this module. That's also why the default setting for the 3.x configs is 3.5: it is the latest version that will work with the provided 3.x config files.
If you are using Solr 3.6 or higher (but still 3.x), you should either leave the setting unchanged at LUCENE_35; or try to upgrade to Solr 4.x; or, if you are an advanced Solr user, use the correct Solr version and adapt the config files accordingly.