Channel: drupal.org - Site administrators
Viewing all articles
Browse latest Browse all 426

Common pitfalls

This page lists the pitfalls that new users most commonly encounter, in the hope that fewer people will fall into them in the future.

When you want to use Views to create a search page with a fulltext search, only use the "Search: Fulltext search" filter (or contextual filter) for the keywords input! Specifically for the filter, also make sure that "Use as" is switched to "Search keys", not to "Search filter".

"Search keywords" are special in the Search API, compared to normal filters, in that they are parsed into separate words (unless you are using the "Single term" parse mode) that will all be searched separately. Normal filters, even on fulltext fields (= fields indexed as type "Fulltext"), will search for entered phrases as a whole, as if the keywords were put in quotes. Furthermore, only proper keywords will influence the relevance of results, if you are using this mechanism for sorting – filters won't do that.

So, even if you only want fulltext searches on a single field – if you want "normal" fulltext search behavior, use the "Search: Fulltext search" filter!
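For developers, the same distinction also exists when building a search in code instead of Views. The following is only a minimal sketch, assuming the Search API 7.x query functions (search_api_query() with keys() and condition()); the function names starting with MODULE, the index ID and the field name are placeholders, and the 'terms' parse mode is assumed to be the one that splits the input into words.

```php
<?php
/**
 * Runs a keyword search: the input is parsed into separate words.
 *
 * $index_id is a placeholder for the machine name of one of your indexes.
 */
function MODULE_keyword_search($index_id, $keywords) {
  // Assumption: 'terms' is the parse mode that splits the input into words.
  $query = search_api_query($index_id, array('parse mode' => 'terms'));
  // Proper search keys; according to the text above, only these influence
  // the relevance of results.
  $query->keys($keywords);
  return $query->execute();
}

/**
 * Runs a search with a normal filter: the input is matched as one phrase.
 */
function MODULE_phrase_filter_search($index_id, $field, $phrase) {
  $query = search_api_query($index_id);
  // A normal filter, even on a fulltext field, searches the phrase as a whole.
  $query->condition($field, $phrase);
  return $query->execute();
}
```

With the first function, a search for "red apple" would look for both words separately; with the second, it would behave as if the input had been put in quotes.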

Problems with non-ASCII characters

These problems are highly backend-specific, as the Search API itself doesn't specify or implement any constraints or special treatment of characters. (However, if you have the Transliteration module enabled, a Transliteration processor will become available which can help alleviate most problems, regardless of service class. Note, though, that any data returned from the search server, such as facets, would then be in transliterated form, too.)

For the Database search service class, the issue "#1144620: Fix problems with umlauts, accented characters, etc." contains a discussion of that topic. In the course of that issue, a patch was also committed to the module which should solve the problem on MySQL servers. For other types of SQL servers, however, patches are still needed.

For Solr searches, Solr should already treat non-ASCII characters properly. However, English stemming is applied by default, which might lead to wrong results for other languages. See the Solr module's documentation for how to fix this.

Changes in related entities don't lead to re-indexing

Through the use of the "Add related entities" form on an index's Fields tab, it is possible to index the fields of entities (or other structures) related to your indexed items. For example, you could index the names of taxonomy terms contained in a node's taxonomy reference field. Or – e.g., for access control – the user roles of the node's author.

However, when you now change the name of a taxonomy term (or the roles of a user), you'll notice that the nodes that reference that term (or user) aren't marked as "dirty" and, subsequently, aren't re-indexed. As a result, the fields containing the related data become stale. Sadly, this is very hard to solve in the Search API, so a solution to this problem could still take a while. (See #2007692: Changes in related entities don't lead to proper re-indexing for a discussion of this issue.)

There are a few custom workarounds available which you can use for your site:

  • Probably the easiest and most comfortable to implement would be to use the Rules integration of the Search API to automatically re-index (or mark as "dirty") items when their related entities change. (The rules to create for this of course depend on your specific setup.)
  • If you are (or employ) a developer: Use custom code to do the same. In hook_X_update(), just call search_api_track_item_change() with the appropriate (indexed) item type and IDs. (See below for an example.)
  • If such changes occur only very rarely, and if the site is rather small and only maintained by you, you can also just manually re-save all affected items if such a change occurs.

An example of doing this in custom code (when you have the names of related taxonomy terms indexed for a node index) follows:

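<?php
/**
 * Implements hook_taxonomy_term_update().
 */
function MODULE_taxonomy_term_update($term) {
  // Only react if the indexed value (the term's name) has actually changed.
  if ($term->name !== $term->original->name) {
    // Mark all nodes referencing this term as "dirty".
    search_api_track_item_change('node', taxonomy_select_nodes($term->tid, FALSE));
  }
}
?>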

Placing the above into a module file (and replacing MODULE with the module's identifier) will automatically mark nodes which reference a term as "dirty" when the term's name changes.
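The other case mentioned above, indexing the roles of a node's author, could be handled in a similar way. The following is only a sketch under a few assumptions: the index is a node index, $account->original holds the unchanged user (as set by user_save()), and the nodes of an author can be found via the uid column of the node table. As before, MODULE has to be replaced with the module's identifier.

```php
<?php
/**
 * Implements hook_user_update().
 */
function MODULE_user_update(&$edit, $account, $category) {
  // Assumption: the unchanged user object is available on $account->original.
  if (!isset($account->original)) {
    return;
  }
  // Compare role IDs, ignoring the implicit "authenticated user" role.
  $ignore = array(DRUPAL_AUTHENTICATED_RID);
  $old = array_diff(array_keys($account->original->roles), $ignore);
  $new = array_diff(array_keys($account->roles), $ignore);
  if (array_diff($old, $new) || array_diff($new, $old)) {
    // Collect the IDs of all nodes authored by this user.
    $nids = db_select('node', 'n')
      ->fields('n', array('nid'))
      ->condition('n.uid', $account->uid)
      ->execute()
      ->fetchCol();
    if ($nids) {
      // Mark those nodes as "dirty" so that they get re-indexed.
      search_api_track_item_change('node', $nids);
    }
  }
}
```

For users with a very large number of nodes, marking all of them at once might be expensive, so this approach is probably best suited to smaller sites.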

Having "Index items immediately" disabled can lead to leaks of confidential data

The "Node access" data alteration, which automatically filters out node results that the current user shouldn't be able to access, works with the indexed state of the entity. The same is true for manually set filters (e.g., in Views on the "Published" field) and most other access control mechanisms.
However, if the index's "Index items immediately" setting is disabled, changed items will (usually) not be indexed until the next cron run, which means the data in the index will be outdated until then. Since the data of the results shown to the user usually comes from the database, not from the search index, data which the user shouldn't see might be displayed to them in search results. However, this will be the case only for very specific setups:

  • The item must have been accessible previously and only later become inaccessible.
  • When the item becomes inaccessible, some data must be added that end users shouldn't see. (Otherwise, only data they could already see before will be shown to them.)
  • The "secret" data must be in a field that will be displayed in the search results (or could end up in an excerpt shown with the results).

If this setup applies to your site, it is strongly recommended that you enable the "Index items immediately" option for the index in question. (Using Rules to immediately index items only when such a change occurs is also possible, if the load on the server would otherwise be too high. However, keep in mind that Solr's commit behavior might prevent this from working as expected.)
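On sites with several indexes, it can help to check in code which of them have the option disabled. The following is only a sketch, assuming that the setting is stored under the 'index_directly' key of an index's options and that search_api_index_load_multiple() returns all enabled indexes when called as shown; the function name is a placeholder.

```php
<?php
/**
 * Returns the machine names of enabled indexes that don't index immediately.
 */
function MODULE_indexes_without_immediate_indexing() {
  $machine_names = array();
  // Assumption: passing FALSE as the first argument loads all indexes that
  // match the given conditions.
  $indexes = search_api_index_load_multiple(FALSE, array('enabled' => 1));
  foreach ($indexes as $index) {
    // Assumption: 'index_directly' holds the "Index items immediately" setting.
    if (empty($index->options['index_directly'])) {
      $machine_names[] = $index->machine_name;
    }
  }
  return $machine_names;
}
```

The returned list only shows where the option is off; whether that is actually a problem still depends on the three conditions listed above.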

If you are using Solr, enabling the server's "Retrieve result data from Solr" option might also be a way to prevent this from happening: as long as the new data isn't indexed, the search will then show the old data instead of the version with the confidential content added. This will not work for data in Field API fields shown in Views results (due to restrictions imposed by the Field API) – so this is only an option if you aren't using Views, or if the confidential data won't be in a Field API field.

