
Sphinx Search 2.0.5

Here we will install Sphinx Search (sphinxsearch) version 2.0.5 on Ubuntu 12.04 Server from the command line. Once finished, I will make sphinxsearch work together with my website as a real-world example. The plan is to download Sphinx Search from their website, install the package, set up automated indexing, and finally get it working within our PHP project. Now let's get moving...

First get on your Ubuntu 12.04 command line and become root.

sudo su

Now we need to download the package sphinxsearch 2.0.5 from the Sphinx Search site.

cd
wget http://sphinxsearch.com/files/sphinxsearch_2.0.5-release-0ubuntu11~precise2_amd64.deb
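
The package above is an amd64 build, so it only installs on a 64-bit system. The following is a minimal sketch for checking that before going further. The helper name deb_arch is made up for this example; dpkg --print-architecture is the standard way to ask dpkg for the native architecture.

```shell
#!/bin/sh
# Sketch: compare the architecture named in a .deb filename with this machine's.
# deb_arch is a hypothetical helper for this article, not part of dpkg.

# Print the architecture field of a Debian package filename (name_version_arch.deb).
deb_arch() {
  base=${1##*/}                 # drop any leading directory
  base=${base%.deb}             # drop the extension
  printf '%s\n' "${base##*_}"   # keep what follows the last underscore
}

pkg='sphinxsearch_2.0.5-release-0ubuntu11~precise2_amd64.deb'
want=$(deb_arch "$pkg")

# Only ask dpkg if it is available (it is on Debian and Ubuntu).
if command -v dpkg >/dev/null 2>&1; then
  have=$(dpkg --print-architecture)
  if [ "$want" = "$have" ]; then
    echo "architecture matches: $have"
  else
    echo "package is built for $want, this system is $have" >&2
  fi
fi
```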

Sphinx Search 2.0.5 depends on a package called libpq5 that you might not have on your server yet, so install the dependency first and then the deb package:

# install dependency
apt-get install libpq5

# install deb package
dpkg -i sphinxsearch_2.0.5-release-0ubuntu11~precise2_amd64.deb

Now we have Sphinx Search 2.0.5 installed on our Ubuntu 12.04 Server and are ready to index some of our MySQL data. We also want to set up an automated cron job that indexes daily during off hours, so that our index is always up to date with the new content of the day.

Here are the programs we need to get the job done:

/usr/bin/searchd
/usr/bin/indexer
/usr/bin/search
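
Before continuing, it can help to confirm that the three programs really landed where we expect them. This is only a sketch; missing_executables is a hypothetical helper name, and it simply tests each path with the shell's -x check.

```shell
#!/bin/sh
# Sketch: list any of the given paths that are not executable files.
# missing_executables is a made-up helper name for this article.
missing_executables() {
  for f in "$@"; do
    if [ ! -x "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Prints nothing when all three programs are installed.
missing_executables /usr/bin/searchd /usr/bin/indexer /usr/bin/search
```

An empty result means the package installed all three tools.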

Our imaginary setup consists of a blog website sitting in a home directory. For the purposes of this example, we have the following folder structure:

/home/andrehonsberg/web/sphinx/etc

The above line shows that we have an etc directory inside a sphinx directory, which is where we store our config file:

cat /home/andrehonsberg/web/sphinx/etc/sphinx.conf

Below is our configuration file that Sphinx Search will use to index the data we want indexed. Our purpose is to allow full text search for our blog content so the configuration below covers what such a setup might look like.

source andrehonsbergcom
{
    type                            = mysql    
    sql_host                        = localhost
    sql_user                        = mysqluser
    sql_pass                        = mysqlpass
    sql_db                          = mysqldatabase
    sql_port                        = 3306

    sql_query_range = SELECT MIN(id), MAX(id) FROM posts
    sql_range_step  = 128
    sql_query       = SELECT id, created, modified, title, content, tags, short_description, author_id FROM posts WHERE id>=$start AND id<=$end
}

index andrehonsbergcom {
    source = andrehonsbergcom
    path = /home/andrehonsberg/web/sphinx/sphinx
    morphology = stem_en
    min_word_len = 3
    min_prefix_len = 0
}

searchd {
    compat_sphinxql_magics = 0
    port = 3313
    log = /home/andrehonsberg/web/logs/searchd.log
    query_log = /home/andrehonsberg/web/logs/query.log
    pid_file = /home/andrehonsberg/web/logs/searchd.pid
    max_matches = 10000
}
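
The configuration above points at several directories (for the index files, the logs and the pid file) that have to exist before indexer and searchd can write to them. Here is a rough sketch that reads those paths out of a config file and creates the parent directories. The function names conf_dirs and make_conf_dirs are invented for this example, and the parsing assumes simple "key = value" lines like the ones shown above.

```shell
#!/bin/sh
# Sketch: create the directories that path, log, query_log and pid_file point into.
# conf_dirs and make_conf_dirs are hypothetical helpers, not Sphinx tools.

# Print the unique parent directories of the path-like settings in a sphinx.conf.
conf_dirs() {
  sed -n -E 's/^[[:space:]]*(path|log|query_log|pid_file)[[:space:]]*=[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\2/p' "$1" |
    while IFS= read -r p; do dirname "$p"; done | sort -u
}

# Create each of those directories if it does not exist yet.
make_conf_dirs() {
  conf_dirs "$1" | while IFS= read -r d; do mkdir -p "$d"; done
}

CONF=/home/andrehonsberg/web/sphinx/etc/sphinx.conf
if [ -f "$CONF" ]; then
  make_conf_dirs "$CONF"
fi
```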

Now that our configuration file is ready and spells out exactly what we want sphinxsearch to index, we can move on to the actual indexing. Still as the root user, run the following to index your data:

/usr/bin/indexer --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf --all
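
A quick way to see whether the indexer wrote anything is to look for the index files next to the path prefix from the config. The sketch below assumes that a built disk index includes at least the .sph (header), .spi (word list) and .spd (document list) files; index_built is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed only if the main index files exist for a given path prefix.
# index_built is a made-up helper; the extension list is an assumption about
# which files indexer produces for a disk index.
index_built() {
  for ext in sph spi spd; do
    [ -f "$1.$ext" ] || return 1
  done
}

# The prefix is the "path" value from the index section of our config.
if index_built /home/andrehonsberg/web/sphinx/sphinx; then
  echo "index files present"
else
  echo "index files missing"
fi
```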

Now the data selected by the MySQL query in the config file is indexed. Next we set up sphinxsearch to start when the server boots, in case we need to reboot or something else restarts the server. As root, do the following:

nano /etc/rc.local

Now add the following command right before the last line, which reads exit 0, so that the file looks like this:

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

/usr/bin/searchd --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf

exit 0

Make sure to replace andrehonsberg with your own directory. The line we added starts the search daemon at boot using the configuration file we created earlier.
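
If you prefer not to edit rc.local by hand, the same change can be scripted. The following is only a sketch: add_before_exit is a hypothetical helper that inserts a line before the final exit 0 and does nothing if the line is already there. It writes the result back with cat so the file keeps its execute bit.

```shell
#!/bin/sh
# Sketch: insert a command before "exit 0" in an rc.local style file, once.
# add_before_exit is a made-up helper name for this article.
add_before_exit() {
  file=$1
  line=$2
  # Nothing to do if the exact line is already in the file.
  if grep -qxF -e "$line" "$file"; then
    return 0
  fi
  tmp=$(mktemp)
  awk -v line="$line" '
    /^exit 0[ \t]*$/ && !seen { print line; print ""; seen = 1 }
    { print }
    END { if (!seen) print line }
  ' "$file" > "$tmp"
  cat "$tmp" > "$file"   # overwrite in place so permissions are preserved
  rm -f "$tmp"
}

# Usage (as root):
#   add_before_exit /etc/rc.local \
#     '/usr/bin/searchd --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf'
```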

Now let us test our installation by starting the sphinxsearch daemon. Run the following to start searchd:

/usr/bin/searchd --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf
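
To confirm the daemon actually stayed up, one option is to look at the pid file named in the searchd section of the config. This is a minimal sketch with a hypothetical helper, pid_alive, which checks that the file holds a number and that a process with that id exists. It should be run as root or as the user that started searchd, since kill -0 can fail on other users' processes.

```shell
#!/bin/sh
# Sketch: report whether the process recorded in a pid file is alive.
# pid_alive is a made-up helper name for this article.
pid_alive() {
  pidfile=$1
  [ -r "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  case $pid in
    ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
  esac
  kill -0 "$pid" 2>/dev/null   # signal 0 only tests for existence
}

# The path is the pid_file value from our searchd section.
if pid_alive /home/andrehonsberg/web/logs/searchd.pid; then
  echo "searchd is running"
else
  echo "searchd is not running"
fi
```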

Once we have searchd running, we can test whether the index works by doing the following:

/usr/bin/search -c /home/andrehonsberg/web/sphinx/etc/sphinx.conf mysql

The result should look similar to what I have below. Your data has different content, so your matches will differ; the output below is only there to show what to expect.

Sphinx 2.0.5-id64-release (r3308)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/home/andrehonsberg/web/sphinx/etc/sphinx.conf'...
index 'andrehonsbergcom': query 'mysql ': returned 15 matches of 15 total in 0.024 sec

displaying matches:
1. document=120, weight=4660
2. document=125, weight=4655
3. document=6, weight=4645
4. document=115, weight=4645
5. document=100, weight=4634
6. document=93, weight=3660
7. document=99, weight=3645
8. document=60, weight=2609
9. document=118, weight=1645
10. document=105, weight=1634
11. document=10, weight=1624
12. document=7, weight=1609
13. document=117, weight=1609
14. document=119, weight=1609
15. document=108, weight=1579

words:
1. 'mysql': 15 documents, 82 hits
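
The document numbers in that output are the id values from our SQL query, so they can be used to look the rows up in MySQL. As a small sketch, the hypothetical helper match_ids below pulls just the ids out of output shaped like the listing above; it assumes the "N. document=ID, weight=W" line format shown here.

```shell
#!/bin/sh
# Sketch: extract document ids from the output of the search command line tool.
# match_ids is a made-up helper; it relies on the line format shown above.
match_ids() {
  sed -n -E 's/^[0-9]+\. document=([0-9]+), weight=[0-9]+.*$/\1/p'
}

# Demo on two lines copied from the sample output; prints 120 and 125.
printf '%s\n' '1. document=120, weight=4660' '2. document=125, weight=4655' | match_ids

# Real usage would pipe the tool's output through the helper:
#   /usr/bin/search -c /home/andrehonsberg/web/sphinx/etc/sphinx.conf mysql | match_ids
```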

Now that we know everything is working we have 2 things left to do. First, we want to make sure that our indexer runs every day so that the new data entered into our MySQL database is always indexed. If you have more traffic or need faster indexing, you can of course always make the indexer interval smaller or even run it right after publishing. For the purpose of this article I will set the site up to run the indexer every day around midnight.

The command that we need to run to index the results while searchd is already running is the following:

/usr/bin/indexer --rotate --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf --all
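
If the indexer is also triggered right after publishing, two runs could end up overlapping. One simple guard, sketched below, is to wrap the command in a lock based on mkdir, which either creates the directory or fails. The name run_locked, the lock path in the usage line and the exit code 75 are all choices made for this example, and a lock left behind by a killed run has to be removed by hand.

```shell
#!/bin/sh
# Sketch: run a command only if no other run currently holds the lock directory.
# run_locked is a made-up helper name for this article.
run_locked() {
  lockdir=$1
  shift
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "lock $lockdir is held, skipping this run" >&2
    return 75
  fi
  # Remember the command's exit status, then always release the lock.
  "$@" && status=0 || status=$?
  rmdir "$lockdir"
  return "$status"
}

# Usage (the lock path is only an example):
#   run_locked /tmp/sphinx-indexer.lock /usr/bin/indexer --rotate \
#     --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf --all
```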

Now we need to run the above command from cron so do the following as root:

crontab -e

Add the following line to the end of the crontab file:

# run the indexer for andrehonsberg.com at midnight
0 0 * * * /usr/bin/indexer --rotate --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf --all
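
The crontab entry can also be added without opening an editor. The sketch below uses a hypothetical filter, with_cron_line, that passes an existing crontab through unchanged and appends the job only when it is not already present, so running it twice does not create a duplicate entry.

```shell
#!/bin/sh
# Sketch: append a cron job to a crontab read from stdin, unless it is already there.
# with_cron_line is a made-up helper name for this article.
with_cron_line() {
  awk -v line="$1" '
    $0 == line { found = 1 }
    { print }
    END { if (!found) print line }
  '
}

# Usage (as root); crontab -l prints the current table, crontab - installs the new one:
#   JOB='0 0 * * * /usr/bin/indexer --rotate --config /home/andrehonsberg/web/sphinx/etc/sphinx.conf --all'
#   crontab -l 2>/dev/null | with_cron_line "$JOB" | crontab -
```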

Now we have finally fully set up our indexing to the point where we do not need to worry further. Our indexer runs at midnight every day and we made sure that if our server goes down the search daemon starts back up without our interaction. As promised, I will now go over how to use Sphinx Search 2.0.5 in our PHP code on Ubuntu 12.04 Server.

First we need to retrieve the sphinxapi class for PHP. The scope of our PHP example is simple and tries to search articles for keywords. I will include a snippet below to show how you could go about searching with the API.

<?php
include(DIR_CLASSES . 'sphinxapi.class.php');
$out = '';
$rowsOnPage         = $config['SPHINX_ROWS_PER_PAGE'];
$sphinxOffset       = 0;
$sphinxNumRows      = $config['SPHINX_ROWS_RETURNED'];
$sphinxHost         = $config['SPHINX_HOST'];
$sphinxPort         = $config['SPHINX_PORT'];
$sphinxIndexName    = $config['SPHINX_INDEX_NAME'];
$sphinxNumRowsAll   = 100;
$searchQuery        = (!empty($_POST['search_q'])) ? trim(strip_tags($_POST['search_q'])) : null;
$q                  = '';

if (isset($_SESSION['prev_q']) && isset($_GET['page'])) {
  $q = $_SESSION['prev_q'];
  unset($_SESSION['prev_q']);
}
else {
  $q = $searchQuery;
}

$pageNum            = (isset($_GET['page'])) ? (int)$_GET['page'] : 1;
$sphinxNumRowsAll   = $pageNum * $sphinxNumRows;
$s                  = new SphinxClient();

$s->SetServer($sphinxHost, $sphinxPort);
$s->SetLimits(($pageNum - 1) * $sphinxNumRows, $sphinxNumRows, $sphinxNumRowsAll);
$s->SetMatchMode( SPH_MATCH_ANY );
$s->SetSortMode( SPH_SORT_RELEVANCE );

$res                = $s->Query($q, $sphinxIndexName);
$_SESSION['prev_q'] = $q;

$db = new Database(DBHOST, DBUSER, DBPASS, DBNAME, 1);

$pgOut              = '';
$pgLinksOut         = '';
$pgLinkRange        = 4;
$pgPages            = 0;
$pgStats            = '';

if ($res !== FALSE) {
  // we have results
  if (!empty($res['matches'])) {    
    $pgPages = ceil($res['total_found'] / $rowsOnPage);
    $pgLinksOut = '<div class="pagination_container">';
    $pgLinksOut .= '<span class="ui-corner-all pagination-links"><b>' . $res['total_found'] . '</b> article(s) on <b>' . $pgPages . '</b> page(s)</span> ';
    if ($pageNum > 1) {
      $pgLinksOut .= '<span class="ui-corner-all pagination-links"><a href="'.WEB_ROOT.'search/1">&lt;&lt; First</a></span> ';
      $pgPrevPage = $pageNum - 1;
      $pgLinksOut .= '<span class="ui-corner-all pagination-links"><a href="'.WEB_ROOT.'search/' . $pgPrevPage . '" rel="prev">&lt; Prev</a></span> ';
    }

    // loops through the rest of the links
    for ($pgX = ($pageNum - $pgLinkRange); $pgX < (($pageNum + $pgLinkRange) + 1); $pgX++) {
      if (($pgX > 0) && ($pgX <= $pgPages)) {
        // we have a valid page number
        if ($pgX == $pageNum) {
          // we are on current page
          $pgLinksOut .= '<span class="ui-corner-all pagination-links-current"><b>' . $pgX . '</b></span> ';
        } else {
          // not current page ... make link
          $pgLinksOut .= '<span class="ui-corner-all pagination-links"><a href="'.WEB_ROOT.'search/' . $pgX . '">' . $pgX . '</a></span> ';
        }
      }
    }
    if ($pageNum != $pgPages) {
      // we are not on the last page
      $pgNextPage = $pageNum + 1;
      $pgLinksOut .= '<span class="ui-corner-all pagination-links"><a href="' . WEB_ROOT . 'search/' . $pgNextPage . '" rel="next">Next &gt;</a></span> ';
      $pgLinksOut .= '<span class="ui-corner-all pagination-links"><a href="' . WEB_ROOT . 'search/' . $pgPages . '">Last &gt;&gt;</a></span> ';
    }

    $pgLinksOut .= '</div>';

    foreach ($res['matches'] as $match => $matchInfo) {
      .......
    }
  }
  else {
    $out .= "<div class='alert-box'>No search results found</div>";
  }
}
else {
  $out .= "<div class='alert-box'>No search results found</div>";
}
$db->close();
?>
<div id="search-results">
  <?php echo $pgLinksOut; ?>
  <div class="category-name">
    <h2>Search Results for: <?php echo htmlspecialchars((string)$searchQuery); ?></h2>
  </div>
  <?php echo $out; ?>
  <?php echo $pgLinksOut; ?>
</div>
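
One thing to keep in mind with the PHP code above: the API client does not read sphinx.conf at all. It only connects to searchd over the host and port passed to SetServer, so the SPHINX_PORT value has to match the port in the searchd section of the config (3313 in our example). As a quick sketch, the hypothetical helper searchd_port below prints that port from a config file so the two can be compared; it assumes a simple "port = N" line inside the searchd block.

```shell
#!/bin/sh
# Sketch: print the port configured in the searchd section of a sphinx.conf.
# searchd_port is a made-up helper name for this article.
searchd_port() {
  awk '
    /^[ \t]*searchd[ \t]*[{]?[ \t]*$/ { in_searchd = 1 }
    in_searchd && /^[ \t]*port[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")   # drop "port = "
      sub(/[ \t]+$/, "")         # drop trailing blanks
      print
      exit
    }
    in_searchd && /[}]/ { in_searchd = 0 }
  ' "$1"
}

# Usage:
#   searchd_port /home/andrehonsberg/web/sphinx/etc/sphinx.conf
```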

This completes the article. We have learned how to set up Sphinx Search (sphinxsearch) 2.0.5 on Ubuntu 12.04, how to keep the indexes fresh, and how to bring the search daemon back up automatically after a reboot. We also covered how to use Sphinx in our PHP projects. Hope this was helpful... Peace

Benino
Friday the 23rd of August, 2013 at 09:22:31
How do you set the --config option when using the php api?

For example:

/usr/bin/search -c /home/andrehonsberg/web/sphinx/etc/sphinx.conf mysql

If I make this call without the -c option, it tries to use the config file at /etc/sphinxsearch/sphinx.conf and fails.

Andre Honsberg
Friday the 23rd of August, 2013 at 12:04:17
You have your own config file somewhere, in which you set the port searchd runs on and your MySQL options. When you run search on the command line, the -c switch needs the path to that config file. Before you can search the data from PHP or from the command line, you need to index it with: /usr/bin/indexer --config /your/path/to/your/sphinx/sphinx.conf --all . The --all switch tells the indexer to index all the indexes defined in your config. Once your data is indexed, you can start searchd and tell it where your config is like so: /usr/bin/searchd --config /your/path/to/your/sphinx/sphinx.conf . Now searchd is running with the same config your indexer used, and search -c /path/to/your/config... will work since the data is indexed. In PHP, make sure the connection settings (host and port) match your config; then you can use the API. Check the PHP code above, and enter the values manually in the right fields if you need to.