Scalability of the Taxonomy Browser module: Restricting number of terms

Two very large sites that we manage for clients experienced several instances of outages recently.

On the first site, the symptom was a long running query and many other queries locked as a result.

Upon digging deeper, we found out that this was caused by the taxonomy browser module. This is a useful module for drilling down into a site's category structure for nodes that match certain criteria.

The query in question attempts to count the number of nodes in the selected terms. It works well when a few terms are selected. That is until the site's visitors decide that they want to see things with tens of terms intersected from various vocabularies.

This site has 4200+ terms, and 112,000+ entries in the term_data table.

But for this site, even with tagging vocabularies excluded from taxonomy browser's settings, it resulted in "the query from hell", like so:

SELECT COUNT(n.nid) FROM node n 
  INNER JOIN term_node tn0 ON n.nid = tn0.nid 
  INNER JOIN term_node tn1 ON n.nid = tn1.nid 
  INNER JOIN term_node tn2 ON n.nid = tn2.nid 
  INNER JOIN term_node tn3 ON n.nid = tn3.nid 
  INNER JOIN term_node tn4 ON n.nid = tn4.nid 
  INNER JOIN term_node tn5 ON n.nid = tn5.nid 
  INNER JOIN term_node tn6 ON n.nid = tn6.nid 
  INNER JOIN term_node tn7 ON n.nid = tn7.nid 
  INNER JOIN term_node tn8 ON n.nid = tn8.nid 
  INNER JOIN term_node tn9 ON n.nid = tn9.nid 
  INNER JOIN term_node tn10 ON n.nid = tn10.nid 
  INNER JOIN term_node tn11 ON n.nid = tn11.nid 
  INNER JOIN term_node tn12 ON n.nid = tn12.nid 
  INNER JOIN term_node tn13 ON n.nid = tn13.nid 
  INNER JOIN term_node tn14 ON n.nid = tn14.nid 
  INNER JOIN term_node tn15 ON n.nid = tn15.nid 
  INNER JOIN term_node tn16 ON n.nid = tn16.nid 
  INNER JOIN term_node tn17 ON n.nid = tn17.nid 
  INNER JOIN term_node tn18 ON n.nid = tn18.nid 
  INNER JOIN term_node tn19 ON n.nid = tn19.nid 
  INNER JOIN term_node tn20 ON n.nid = tn20.nid 
  INNER JOIN term_node tn21 ON n.nid = tn21.nid 
  INNER JOIN term_node tn22 ON n.nid = tn22.nid 
  INNER JOIN term_node tn23 ON n.nid = tn23.nid 
  INNER JOIN term_node tn24 ON n.nid = tn24.nid 
  INNER JOIN term_node tn25 ON n.nid = tn25.nid 
  INNER JOIN term_node tn26 ON n.nid = tn26.nid 
  INNER JOIN term_node tn27 ON n.nid = tn27.nid  
WHERE tn0.tid IN (1) 
  AND tn1.tid IN (4) 
  AND tn2.tid IN (484) 
  AND tn3.tid IN (17) 
  AND tn4.tid IN (33) 
  AND tn5.tid IN (18) 
  AND tn6.tid IN (19) 
  AND tn7.tid IN (34) 
  AND tn8.tid IN (38) 
  AND tn9.tid IN (20) 
  AND tn10.tid IN (21) 
  AND tn11.tid IN (22) 
  AND tn12.tid IN (40) 
  AND tn13.tid IN (23) 
  AND tn14.tid IN (25) 
  AND tn15.tid IN (24) 
  AND tn16.tid IN (29) 
  AND tn17.tid IN (32) 
  AND tn18.tid IN (41) 
  AND tn19.tid IN (52) 
  AND tn20.tid IN (27) 
  AND tn21.tid IN (51) 
  AND tn22.tid IN (26) 
  AND tn23.tid IN (39) 
  AND tn24.tid IN (43) 
  AND tn25.tid IN (42) 
  AND tn26.tid IN (35) 
  AND tn27.tid IN (28)

No wonder this query was running for 41 minutes when we killed it.

On the other site, this were worse, with MySQL's processlist showing many instances of the same query above, from taxonomy_browser_count_nodes():

No wonder the server quickly went to 100% CPU utilization!

In order to prevent users from selecting too many terms and bringing the site to its knees, we implemented the following validator to limit the number of terms selected to just 5.

The exact override depends on whether you have the settings show drop down lists, or checkboxes, as well as how many vocabularies are shown on the taxonomy browser search page.

For drop down lists, use the following:

/*
 * Add a validator to the taxonomy form
 */
function custom_form_taxonomy_browser_form_alter(&$form, &$form_state) {
  $form['#validate'] = 
    array_merge(array('custom_taxonomy_browser_validate'), $form['#validate']);
}

/*
 * Validate that we don't have more than 5 terms
 */
function custom_taxonomy_browser_validate($form_id, $form_state) {
  $count = 0;
  foreach($form_state['values']['taxonomy'] as $vocab => $data) {
    $count += count($data);
  }

  if ($count > 5) {
    form_set_error('taxonomy', t('Too many selections (@count).', 
      array('@count' => $count)));
  }
}

For checkboxes, users can select a lot of terms more, so we override the default validator, and add it in our validation function:

/*
 * Add a validator to the taxonomy form
 */
function custom_form_taxonomy_browser_form_alter(&$form, &$form_state) {
  $form['#validate'] = array('custom_taxonomy_browser_validate');
}

/*
 * Validate that we don't have more than X terms
 */
function custom_taxonomy_browser_validate($form_id, $form_state) {
  $max_terms = 5;
  $count = 0;
  foreach($form_state['values']['taxonomy'] as $vocab => $data) {
    foreach($data as $tid => $sel) {
      if ($sel > 0) {
        $count++;
      }
    }
  }

  // Check the number of selected terms
  if ($count > $max_terms) {
    form_set_error('taxonomy', 
      t('Too many selections. You selected @count, but only @max_terms are allowed.', 
      array('@count' => $count, '@max_terms' => $max_terms)));
    return;
  }

  // Check that the terms selected do have nodes
  $include_children = $form_state['values']['include_children'];
  $tids = _taxonomy_browser_get_tid_list($form_state['values']['taxonomy'], 
    $include_children);
  $operator = $form_state['values']['operator'] ? 'or' : 'and';

  if (!taxonomy_browser_count_nodes($tids, $operator, 0, $node_type)) {
    form_set_error('taxonomy', t('No posts match your criteria.'));
    return;
  }
}

The reason we do that is that if we left the other validator, it will run and it will call taxonomy_browser_count_nodes() with a lot of terms to find out whether there are results or not before doing the redirect. This check in itself kills the server.

Changing the term_data table to InnoDB is also an option. It would not make the long running query any faster, but will potentially prevent other queries from locking waiting for it to finish. But unless something else warrants it for this table, we will stick with MyISAM for the time being.

Contents: 

Comments

#698398

It is not really a patch. It is a form alter, so portable and out of the module's code.

I submitted an issue for this in #698398 many months ago. No response from the maintainer.