Lucene Number Range Search – Integers & Floats (with Zend_Search_Lucene)

I came across an interesting problem this afternoon that took me about an hour to unravel. Lucene’s range search in Zend Framework isn’t as straightforward as you’d expect when the field holds a number, whether an integer or a float.

Ignoring for a moment that I’m using the less performant Zend_Search_Lucene instead of Solr, let me illustrate the problem.

The code below simply adds a field ‘price’ with the value ‘9.99’ ($9.99) to a document in a given Lucene index.

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('price', '9.99'));
$this->index->addDocument($doc);

Using Luke (the Lucene Index Toolbox), a query to find the above document might look like:

price:9.99

This returns any document whose ‘price’ field has the value ‘9.99’.

Now, how about if we want to list a number of documents based on a price range?

The query for such a search would typically use Lucene’s range search syntax:

price:[9.00 TO 10.00]

You’d expect that the above would return all documents with a field ‘price’ of value between $9 and $10 – right?  Wrong!

Let’s simplify this further.  How about if we opted to stick with whole numbers:

price:[9 TO 20]

This time, we’d expect the query to return all documents with a field ‘price’ of value between $9 and $20 – right?  Wrong!

Range searches are strings! The solution…

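Range searches are performed on strings only, so you must store each integer or floating point number as a string whose alphabetical order matches its numeric order. In the scheme used below, the price 9.99 is converted to cents and zero-padded to seven digits, so it must be stored as 0000999.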
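
To see why the earlier queries misbehave, here is a minimal sketch in plain PHP (no Zend classes involved). The helper name `termInRange` is hypothetical; it only mimics an inclusive string range check with `strcmp()`, on the assumption that the index compares terms character by character.

```php
<?php
// Hypothetical helper: inclusive range check using string comparison only,
// which is how a term-based range search sees the values.
function termInRange(string $term, string $lower, string $upper): bool
{
    return strcmp($term, $lower) >= 0 && strcmp($term, $upper) <= 0;
}

// Numerically 9 < 15 < 20, but as strings "15" sorts before "9"
// because the character '1' comes before '9'.
var_dump(termInRange('15', '9', '20'));                // bool(false)

// With every value padded to the same width, the string order
// matches the numeric order and the check behaves as expected.
var_dump(termInRange('0000015', '0000009', '0000020')); // bool(true)
```

The first call fails for the same reason price:[9 TO 20] does: the lower bound ‘9’ already sorts after the upper bound ‘20’.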

It is not enough to simply place a zero (‘0’) or a letter in front of the number to convert it to a string. The resulting strings must sort in the same order as the numbers they represent. For example, this is what an alpha-numeric sort looks like if you make the mistake of only placing a zero in front of the number:

1.  0100          $100
2.  01234567      $1,234,567
3.  09            $9
4.  09100000      $9,100,000
5.  099           $99
6.  09999         $9,999

Now, if you ensure the number is always the same length then the alpha-numeric sort will work as expected.  The following is correct:

1.  0000001       $1
2.  0000100       $100
3.  0002000       $2,000
4.  6543210       $6,543,210
5.  7000000       $7,000,000
6.  9999999       $9,999,999
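
The fixed-width rule can be checked with a short sketch. The helper `padToWidth` is a hypothetical name, and the width of 7 is simply the one used in the list above; the code sorts the padded strings as strings and compares the result with a numeric sort of the same values.

```php
<?php
// Hypothetical helper: left-pad a non-negative whole number to a fixed width.
function padToWidth(int $number, int $width = 7): string
{
    return str_pad((string) $number, $width, '0', STR_PAD_LEFT);
}

$prices = [2000, 1, 9999999, 100, 7000000, 6543210];

// Encode every price, then sort the encoded terms as plain strings.
$terms = array_map('padToWidth', $prices);
sort($terms, SORT_STRING);

// Sort the raw numbers numerically and encode them afterwards.
sort($prices, SORT_NUMERIC);
$expected = array_map('padToWidth', $prices);

// Both orderings agree because every term has the same length.
var_dump($terms === $expected); // bool(true)
print_r($terms);
```

Any number with more than seven digits would break the scheme again, so the width has to be chosen for the largest value you expect to index.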

So, how do you do this?  The same concept applies whether you’re using PHP or any other language.

Part 1:  Ensure you store the number (integer or floating point) as a string when you add the document to the index:

function convertNumberToStringForRangeSearch($number)
{
    // shifts the decimal point two places, eg 9.99 => 999
    // multiply and round BEFORE casting to int: (int) '9.99' * 100 would give 900
    $number = (int) round((float) $number * 100);
    // pads number with zeros eg 999 => 0000999
    return str_pad((string) $number, 7, "0", STR_PAD_LEFT);
}

$doc = new Zend_Search_Lucene_Document();
// $value is the string '0000999'
$value = $this->convertNumberToStringForRangeSearch('9.99');
$doc->addField(Zend_Search_Lucene_Field::Text('price', $value));
$this->index->addDocument($doc);
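
The converter can be hardened a little. The sketch below is a hypothetical stricter variant (the name `encodePriceForRangeSearch` and the exceptions are my own, not part of Zend_Search_Lucene). It assumes prices have at most two decimal places, and it rejects values that would break the string ordering. In PHP a cast binds more tightly than `*`, so the multiplication has to happen before the conversion to an integer.

```php
<?php
// Hypothetical stricter converter; assumes at most two decimal places.
function encodePriceForRangeSearch($price, int $width = 7): string
{
    if (!is_numeric($price)) {
        throw new InvalidArgumentException('Price must be numeric.');
    }

    // Multiply first, then round, so float noise such as 998.9999999
    // is rounded to 999 instead of being truncated to 998.
    $cents = (int) round(((float) $price) * 100);

    if ($cents < 0) {
        // A leading minus sign would break the string ordering.
        throw new InvalidArgumentException('Negative prices are not supported.');
    }

    $term = (string) $cents;
    if (strlen($term) > $width) {
        // Longer terms would no longer sort correctly against shorter ones.
        throw new InvalidArgumentException('Price does not fit in the field width.');
    }

    return str_pad($term, $width, '0', STR_PAD_LEFT);
}

echo encodePriceForRangeSearch('9.99'), "\n"; // 0000999
echo encodePriceForRangeSearch(20), "\n";     // 0002000
echo encodePriceForRangeSearch('4.2'), "\n";  // 0000420
```

With a width of 7 and two implied decimal places, the largest price this sketch accepts is 99999.99.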

Part 2:  Ensure that you also convert the numbers to strings when you construct your query.  The range query for documents with a ‘price’ from ‘9.99’ to ‘20’ will now look like:

price:[0000999 TO 0002000]

When you’re constructing the above query, keep things easy and reuse the same method above! For example:

$luceneQuery = 'price:['
    .$this->convertNumberToStringForRangeSearch('9.99')
    .' TO '
    .$this->convertNumberToStringForRangeSearch('20')
    .']';
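
If you build range queries in several places, the concatenation can be wrapped in a small function. This is only a sketch: `toPriceTerm` and `buildPriceRangeQuery` are hypothetical names, the field name defaults to ‘price’ as in this post, and swapping reversed bounds is my own design choice rather than anything Lucene requires.

```php
<?php
// Hypothetical helpers; the names are illustrative, not part of Zend_Search_Lucene.
function toPriceTerm($price, int $width = 7): string
{
    // Price in cents, zero-padded to a fixed width.
    $cents = (int) round(((float) $price) * 100);
    return str_pad((string) $cents, $width, '0', STR_PAD_LEFT);
}

function buildPriceRangeQuery($min, $max, string $field = 'price'): string
{
    $lower = toPriceTerm($min);
    $upper = toPriceTerm($max);

    // Swap reversed bounds so the range is never empty by accident.
    if (strcmp($lower, $upper) > 0) {
        [$lower, $upper] = [$upper, $lower];
    }

    // Square brackets make both bounds inclusive in Lucene query syntax.
    return sprintf('%s:[%s TO %s]', $field, $lower, $upper);
}

echo buildPriceRangeQuery('9.99', '20'), "\n"; // price:[0000999 TO 0002000]
```

Keeping the encoding in one function means the indexing code and the query code cannot drift apart on width or rounding.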

Just to be complete, the standard (exact match) query will now look like this; note that the square brackets belong to range queries only:

price:0000999

Footnote: I haven’t actually tried the code with the floating point number as my implementation was strictly integer related 😉

About the Author
Brett is the Lead Web Developer at BBC.com working on a number of products, such as the BBC International Homepage, News, Sport, Travel and the back-end work on the iPhone and iPad applications.

Posted in PHP, Zend Framework, Zend_Search_Lucene
5 comments on “Lucene Number Range Search – Integers & Floats (with Zend_Search_Lucene)”
  1. César says:

    Thanks, helped me.

  2. salman says:

    Hi Brett! It works great for both int and float, but the only catch is decimal points, where it doesn’t work. For example, if you search for a price from $1 to $5, it also returns values like $5.1 up to $5.9. So if you are trying to search in cents, e.g. $4.2 to $4.5, it also returns values like 4.0, 4.1, 4.2, 4.3 and so on up to 4.9. I hope you get the idea. I will be glad if you notify me of any progress regarding this.
    Sincerely, Salman.
    By the way, thanks a lot for this post, you have saved me a lot of time 🙂

  3. Darshita says:

    Is this working for Zend Search Lucene also? Because it’s not working for me.

  4. Mark says:

    Absolute lifesaver, thx!

  5. Slawoj says:

    as Mark said, absolute lifesaver, thank you very much 🙂
