Sunday, February 7, 2010

A Weird Little CouchDB Lucene Bug


Yesterday, I was able to get all of my recipe search Cucumber scenarios passing. I am not quite done with couchdb-lucene though:
cstrom@whitefall:~/repos/eee-code$ cucumber ./features/site.feature:31
Feature: Site

So that I may explore many wonderful recipes and see the meals in which they were served
As someone interested in cooking
I want to be able to easily explore this awesome site

Scenario: Exploring food categories (e.g. Italian) from the homepage # ./features/site.feature:31
Given 25 yummy meals # features/step_definitions/site.rb:1
And 50 Italian recipes # features/step_definitions/recipe_search.rb:131
And 10 Breakfast recipes # features/step_definitions/recipe_search.rb:131
When I view the site's homepage # features/step_definitions/site.rb:87
And I click the Italian category # features/step_definitions/site.rb:131
Then I should see 20 results # features/step_definitions/recipe_search.rb:260
expected following output to contain a <table td a/> tag:
...
<p class="no-results">
No results matched your search. Please refine your search
</p>

...
(Spec::Expectations::ExpectationNotMetError)
./features/step_definitions/recipe_search.rb:261:in `/^I should see (\d+) results$/'
./features/site.feature:38:in `Then I should see 20 results'
And I should see 2 pages of results # features/step_definitions/recipe_search.rb:264
When I click the site logo # features/step_definitions/site.rb:112
Then I should see the homepage # features/step_definitions/site.rb:163
And I click the Breakfast category # features/step_definitions/site.rb:131
Then I should see 10 results # features/step_definitions/recipe_search.rb:260
And I should see no more pages of results # features/step_definitions/site.rb:167

Failing Scenarios:
cucumber ./features/site.feature:31 # Scenario: Exploring food categories (e.g. Italian) from the homepage

1 scenario (1 failed)
12 steps (1 failed, 6 skipped, 5 passed)
0m5.227s
I am using couchdb-lucene to present a list of recipes in a given category (Italian, Vegetarian, etc.). The categories are presented as part of the master layout, which is why this particular scenario is part of the overall "site" feature. At any rate, to resolve this failure, I should only need to add categories to my couchdb-lucene design document:
{
  "_id": "_design/recipes",
  "_rev": "1-93d99ffe0bddccd4b1aa521f3a569a50",
  "fulltext": {
    "all": {
      "index": "function(rec) { // couchdb-lucene index code here }",
      "analyzer": "perfield:{default:\"porter\"}"
    }
  }
}
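Because the `index` value above is JavaScript source stored inside a JSON string, one way to keep it readable is to write the function normally and serialize it when building the design document. A minimal sketch in plain JavaScript (Node), assuming a hypothetical `buildDesignDoc` helper that is not part of couchdb-lucene. `_rev` is omitted here; an update to an existing document would need the current revision:

```javascript
// Hypothetical helper: wraps an index function into a design document.
// Only the field names (_id, fulltext, all, index, analyzer) and the
// analyzer string come from the design document shown above.
function buildDesignDoc(indexFn) {
  return {
    _id: "_design/recipes",
    fulltext: {
      all: {
        // Function.prototype.toString() yields the function's source text.
        index: indexFn.toString(),
        analyzer: 'perfield:{default:"porter"}'
      }
    }
  };
}

// Placeholder index function. It is only serialized here, never called,
// so the couchdb-lucene Document class does not need to exist in Node.
const indexFn = function(rec) {
  var doc = new Document();
  return doc;
};

const designDoc = buildDesignDoc(indexFn);
const designDocJson = JSON.stringify(designDoc, null, 2);
console.log(designDocJson);
```

`JSON.stringify` takes care of escaping the quotes and newlines in the function source, which is the fiddly part of editing the string by hand.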
Something like this ought to do it:
function(rec) {
  if (rec.type == 'Recipe' && rec.published) {
    var doc = new Document();

    // Other indexing code
    doc.add((rec['tag_names'] || []).join(' '), {"field":"category"});

    return doc;
  }
}
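The `join` expression in the function above is ordinary JavaScript, so its effect can be checked outside of couchdb-lucene. A small sketch for Node, assuming a hypothetical `categoryText` helper and made-up recipe objects:

```javascript
// Hypothetical helper mirroring the expression used in the index function:
// fall back to an empty array when tag_names is missing, then join.
function categoryText(rec) {
  return (rec['tag_names'] || []).join(' ');
}

console.log(categoryText({ tag_names: ['italian', 'vegetarian'] })); // prints: italian vegetarian
console.log(categoryText({ tag_names: ['breakfast'] }));             // prints: breakfast
console.log(categoryText({}) === '');                                // prints: true
```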
If a recipe is both Italian and vegetarian, then it will have two entries in the tag_names array. I join them in the index function so that "italian vegetarian" is added to the index as a single field value, which Lucene will then tokenize into individual terms. With that I ought to be done:
cstrom@whitefall:~/repos/eee-code$ cucumber ./features/site.feature:31
Feature: Site

So that I may explore many wonderful recipes and see the meals in which they were served
As someone interested in cooking
I want to be able to easily explore this awesome site

Scenario: Exploring food categories (e.g. Italian) from the homepage # ./features/site.feature:31
Given 25 yummy meals # features/step_definitions/site.rb:1
And 50 Italian recipes # features/step_definitions/recipe_search.rb:131
And 10 Breakfast recipes # features/step_definitions/recipe_search.rb:131
When I view the site's homepage # features/step_definitions/site.rb:87
And I click the Italian category # features/step_definitions/site.rb:131
Then I should see 20 results # features/step_definitions/recipe_search.rb:260
expected following output to contain a <table td a/> tag:
...
<p class="no-results">
No results matched your search. Please refine your search
</p>
...
(Spec::Expectations::ExpectationNotMetError)
./features/step_definitions/recipe_search.rb:261:in `/^I should see (\d+) results$/'
./features/site.feature:38:in `Then I should see 20 results'
And I should see 2 pages of results # features/step_definitions/recipe_search.rb:264
When I click the site logo # features/step_definitions/site.rb:112
Then I should see the homepage # features/step_definitions/site.rb:163
And I click the Breakfast category # features/step_definitions/site.rb:131
Then I should see 10 results # features/step_definitions/recipe_search.rb:260
And I should see no more pages of results # features/step_definitions/site.rb:167

Failing Scenarios:
cucumber ./features/site.feature:31 # Scenario: Exploring food categories (e.g. Italian) from the homepage

1 scenario (1 failed)
12 steps (1 failed, 6 skipped, 5 passed)
1m2.832s
Bah! That is the exact same failure I had before. What's worse is that it is now taking over a minute to fail where before it was 5 seconds. I am sure that worked in couchdb-lucene 0.2, so what gives? Checking out the couchdb-lucene log, I find a whole bunch of:
...
2010-02-07 15:27:35,610 WARN [http://localhost:5984//eee-test/recipes/all] 2009-04-22-italian_50 caused TypeError: Cannot find default value for object. (unnamed script#27)
2010-02-07 15:27:35,625 WARN [http://localhost:5984//eee-test/recipes/all] 2009-04-22-breakfast_1 caused TypeError: Cannot find default value for object. (unnamed script#27)
2010-02-07 15:27:35,642 WARN [http://localhost:5984//eee-test/recipes/all] 2009-04-22-breakfast_2 caused TypeError: Cannot find default value for object. (unnamed script#27)
2010-02-07 15:27:35,658 WARN [http://localhost:5984//eee-test/recipes/all] 2009-04-22-breakfast_3 caused TypeError: Cannot find default value for object. (unnamed script#27)
...
Ew.

I am not quite sure how I could be causing a TypeError. The first thing to check is the type of the tag_names (categories) array:
function(rec) {
  if (rec.type == 'Recipe' && rec.published) {
    var doc = new Document();

    // Other indexing code
    log.info("typeof: " + typeof(rec['tag_names']));
    log.info("constructor: " + rec['tag_names'].constructor);
    log.info("instanceof Array?: " + (rec['tag_names'] instanceof Array));

    doc.add((rec['tag_names'] || []).join(' '), {"field":"category"});

    return doc;
  }
}
For an array like rec['tag_names'], the typeof operator should return "object", the constructor property should be the Array function, and the value ought to be an instance of Array. What I find after re-running the Cucumber scenario is:
2010-02-07 15:50:02,570 INFO [JSLog] typeof: object
2010-02-07 15:50:02,570 INFO [JSLog] constructor: undefined
2010-02-07 15:50:02,570 INFO [JSLog] instanceof Array?: false
2010-02-07 15:50:02,572 WARN [http://localhost:5984//eee-test/recipes/all] 2009-04-22-breakfast_10 caused TypeError: Cannot find default value for object. (unnamed script#35)
Odd, the step definition that is responsible for creating these recipes is adding tag names as arrays:
Given /^(\d+) (.+) recipes$/ do |count, keyword|
  date = Date.new(2009, 4, 22)

  (1..count.to_i).each do |i|
    permalink = date.to_s + "-" + keyword.downcase.gsub(/\W/, '_') + "_" + i.to_s

    recipe = {
      :title => "#{keyword} recipe #{i}",
      :summary => "recipe summary",
      :date => date,
      :preparations => [
        { 'ingredient' => { 'name' => 'ingredient' } }
      ],
      :tag_names => [keyword.downcase],
      :type => 'Recipe',
      :published => true
    }

    RestClient.put "#{@@db}/#{permalink}",
      recipe.to_json,
      :content_type => 'application/json'
  end
end
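For comparison, here is what the same three checks report in a standard JavaScript engine (Node, in this sketch) when JSON like the step definition's output is parsed with `JSON.parse`. The JSON literal is an assumption: a hand-written approximation of what `recipe.to_json` would send for the first Italian recipe, trimmed to the relevant fields, not something captured from the test run.

```javascript
// Hand-written approximation of one recipe document (an assumption,
// not captured from the actual test run).
const body = '{"title":"Italian recipe 1","tag_names":["italian"],"type":"Recipe","published":true}';
const rec = JSON.parse(body);

// The same three checks that the index function logs.
const checks = {
  typeofValue: typeof rec['tag_names'],
  constructorIsArray: rec['tag_names'].constructor === Array,
  instanceofArray: rec['tag_names'] instanceof Array
};

console.log(checks);
// typeofValue is "object", and both Array checks are true
```

If the stored JSON really looks like this, the document itself describes a genuine array, which would point at whatever hands the value to the index function rather than at the data.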
Something is going wrong, but I do not think that it is in my code, especially since this same code worked with couchdb-lucene 0.2. I doubt I'll be able to figure out how a JSON array is being mapped badly into the JavaScript engine (or couchdb-lucene; I am not sure where the problem really is). I will submit a bug report, but first... a workaround.

I may not have an Array here, but I do have an object with a length. I can iterate over its indexed properties, which is an awful lot like treating it as an Array:
function(rec) {
  if (rec.type == 'Recipe' && rec.published) {
    var doc = new Document();

    // Other indexing code

    // Guard against recipes without tags, as the join version did
    var tags = rec['tag_names'] || [];
    for (var i = 0; i < tags.length; i++) {
      doc.add(tags[i], {"field":"category"});
    }

    return doc;
  }
}
That workaround actually works:
cstrom@whitefall:~/repos/eee-code$ cucumber ./features/site.feature:31
Feature: Site

So that I may explore many wonderful recipes and see the meals in which they were served
As someone interested in cooking
I want to be able to easily explore this awesome site

Scenario: Exploring food categories (e.g. Italian) from the homepage # ./features/site.feature:31
Given 25 yummy meals # features/step_definitions/site.rb:1
And 50 Italian recipes # features/step_definitions/recipe_search.rb:131
And 10 Breakfast recipes # features/step_definitions/recipe_search.rb:131
When I view the site's homepage # features/step_definitions/site.rb:87
And I click the Italian category # features/step_definitions/site.rb:131
Then I should see 20 results # features/step_definitions/recipe_search.rb:260
And I should see 2 pages of results # features/step_definitions/recipe_search.rb:264
When I click the site logo # features/step_definitions/site.rb:112
Then I should see the homepage # features/step_definitions/site.rb:163
And I click the Breakfast category # features/step_definitions/site.rb:131
Then I should see 10 results # features/step_definitions/recipe_search.rb:260
And I should see no more pages of results # features/step_definitions/site.rb:167

1 scenario (1 passed)
12 steps (12 passed)
0m6.870s
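The workaround only needs a `length` property and index access, so it works on anything array-like, not just a true Array. A sketch in plain JavaScript (Node): the prototype-less `fakeTags` object is only a stand-in that reproduces the three log readings from earlier (I do not know what couchdb-lucene actually passes in), and `collectCategories` is a hypothetical helper.

```javascript
// Stand-in for the mystery value: an array-like object with no prototype,
// so it has no constructor and no Array methods such as join.
const fakeTags = Object.create(null);
fakeTags[0] = 'italian';
fakeTags[1] = 'vegetarian';
fakeTags.length = 2;

console.log(typeof fakeTags);            // prints: object
console.log(fakeTags.constructor);       // prints: undefined
console.log(fakeTags instanceof Array);  // prints: false

// Hypothetical helper using the same index loop as the workaround.
function collectCategories(tags) {
  const categories = [];
  for (let i = 0; i < tags.length; i++) {
    categories.push(tags[i]);
  }
  return categories;
}

const collected = collectCategories(fakeTags);
console.log(collected.join(' '));        // prints: italian vegetarian
```

Adding each tag as its own category field, as the workaround does, also avoids depending on `join` being available at all.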
After another quick fix, I am down to 3 failing scenarios, all tied to a reduce for which I deservedly caught grief last year. I am not quite sure how to deal with that other than disabling the reduce limit in the CouchDB config. I will mull that over and tackle it tomorrow.

Day #7
