WCI Search Overload Syndrome
Recently, one of our customers asked us to help them improve the search experience for their WCI 6.5 portal users, and before I could even think about the problem, the first thing that popped into my mind was this:
http://www.youtube.com/watch?v=i1AwFY6MuwE&NR=1
This commercial is so telling of the search-overloaded world we live in, where we are constantly searching for information because the results are so handy. On some days, I even feel like Nicholas Carr and want to throw out my laptop! Well ... maybe after a few more searches.
After "researching" this syndrome on the internet, I came back to focus on this customer's issue. In general, their portal users were finding that relevant content either did not show up at all or showed up further down the list of search results than expected. While most of the issues were solved by repairing the search index and tuning other search features, one quirky scenario kept returning unexpected results when users performed a multi-word search.
For instance, if a portal user searched for 'Lincoln Abraham' instead of 'Abraham Lincoln', the search results returned something like this:
1. Biography of Lincoln, Abraham
2. 16th President: Lincoln, Abraham
...
12. Abraham Lincoln
Most users of Google and other search engines would expect result #12 to show up first. In WCI, we could set a best bet to force #12 to the top of the search results, but there were too many cases like this for this customer. The answer to this problem lies in the algorithm used by the portal for the default search mode. I recalled that I had come across this algorithm in a document some years ago but couldn't remember where. Ironically, I found it not by searching the internet but on my backup drive. It was in the Search Server section of the good ole Plumtree 4.5WS Deployment Guide; I have no idea why I still have it saved, but I'm thankful I do. I'm not sure if or where this is in the current documentation, but I'll save you some trouble by providing the section here: Plumtree Search Engine
Before I jump into showing how to fix the problem, let's quickly review the seemingly simple search functionality of the portal. I'm sure you're saying: "What's there to review? ... I know how to search ... I just type in some search terms and click the search button". While that's true, and that's what most of your portal users are doing, there are more options available there to narrow the search results, which are definitely useful!
Including the default mode that most users use, the portal search actually supports the following 3 modes for queries:
- Default Mode - a list of search terms without any of the reserved characters or words used by the other 2 modes listed below.
- Internet Mode - uses the include and exclude operators '+' and '-'. The '+' operator requires the term to appear in each result, and the '-' operator excludes results that contain the term.
- Query Operator Mode - uses a set of reserved search keywords to create complex query combinations. Here's a list of the reserved words:
  - AND, OR, NOT
  - NEAR, NEAR/N - finds words that are near each other in a phrase. The default proximity is 25 words, but NEAR/10 can be used to check within 10 words instead
  - <ORDER> - requires the terms to appear in the order given
  - <WORD> - finds an exact match on a word, with case-sensitivity
  - <ALL>, <ANY>
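As a rough illustration of how a query string maps to one of these three modes, here is a minimal JavaScript sketch. The function name classifySearchMode is hypothetical, and two details are my own assumptions rather than documented behavior: reserved words are matched case-insensitively, and query operator words are checked before the '+' and '-' operators. The real search server may well decide differently.

```javascript
// Reserved words for Query Operator Mode, as listed above.
const OPERATOR_WORDS = ["AND", "OR", "NOT", "NEAR"];
const BRACKET_OPERATORS = ["<ORDER>", "<WORD>", "<ALL>", "<ANY>"];

// Hypothetical helper: guess which mode a query string would fall into.
// ASSUMPTION: case-insensitive matching, and Query Operator Mode is
// checked before Internet Mode. Neither is confirmed by the documentation.
function classifySearchMode(query) {
  const tokens = query.trim().split(/\s+/).filter(Boolean);

  const usesQueryOperator = tokens.some((token) => {
    const upper = token.toUpperCase();
    return (
      OPERATOR_WORDS.includes(upper) ||
      /^NEAR\/\d+$/.test(upper) ||
      BRACKET_OPERATORS.some((op) => upper.includes(op))
    );
  });
  if (usesQueryOperator) {
    return "query-operator";
  }

  // A leading '+' or '-' followed by at least one character marks Internet Mode.
  const usesInternetOperator = tokens.some((token) => /^[+-]./.test(token));
  if (usesInternetOperator) {
    return "internet";
  }

  return "default";
}
```

With this sketch, 'Lincoln Abraham' is classified as default, '+Lincoln -Abraham' as internet, and 'Lincoln NEAR/10 Abraham' as query-operator.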
If the search server gets a query string that contains a reserved term for either of the latter 2 modes, then it automatically uses the corresponding mode to conduct the search. Otherwise, if there's no reserved word, it uses the default mode. All modes support the wildcard character (*) and double quotes for phrase searches (e.g., "President Abraham Lincoln").

Now that we've covered the query operator mode, it'll be easier to understand the algorithm for the default mode. To illustrate it, we'll use the same example as before. When the user searches for “Lincoln Abraham”, the search server interprets this as a default mode query since there are no operators in the query string. Here's the actual 3-part algorithm that the search server uses to perform the default-mode query:
NOTE: The first part of the query ranks higher, so the results for part 1 will show up in the search results before the results for part 2.
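To make the ranking note concrete, here is a small JavaScript sketch of how results from several query parts could be combined so that earlier parts always come first. The function name mergeRankedParts, the de-duplication of repeated results, and the choice of which titles match which part are all hypothetical; they only illustrate the ordering rule and are not taken from the search server itself.

```javascript
// Hypothetical illustration: combine per-part result lists so that
// everything matched by an earlier part ranks ahead of later parts.
// ASSUMPTION: a document matched by several parts is listed only once,
// at the position of the first part that matched it.
function mergeRankedParts(partResults) {
  const seen = new Set();
  const merged = [];
  for (const results of partResults) {
    for (const title of results) {
      if (!seen.has(title)) {
        seen.add(title);
        merged.push(title);
      }
    }
  }
  return merged;
}

// Made-up matches for the query 'Lincoln Abraham'. The titles come from
// the example earlier in this post; the part assignments are invented.
const exampleOrder = mergeRankedParts([
  [], // part 1: nothing matched
  ["Biography of Lincoln, Abraham", "16th President: Lincoln, Abraham"], // part 2
  ["Biography of Lincoln, Abraham", "Abraham Lincoln"], // part 3
]);
```

Under these made-up assignments, 'Abraham Lincoln' lands last even though most users would want it first, which is the kind of behavior the customer was seeing.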
So, in other words, the default search mode is just a fixed query operator mode search under the covers. With this revelation about the default search behavior, the question still lingers ... how do we fix this customer's problem? Well, after some testing, it turned out that the default search was causing problems for this customer because of the <ORDER> keyword in part 2 of the default mode algorithm. There are a few options for fixing this:
Option 1: Upgrade to 10gR3, which doesn't seem to have this problem
Option 2: Write a custom Programmable Event Interface that modifies the query string (see PEI in the WCI UI Customization Guide)
Option 3: Replace the default search in the layout with custom search banner code
Options 2 and 3 both need to convert the query from a default mode query to a custom query operator mode query before submitting it to the search server. The custom query algorithm is the same as the default mode algorithm, minus the <ORDER> operator. For this customer using WCI 6.5, option 3 actually gives us the simplest and least intrusive way to fix this issue. We did not end up implementing the solution for the customer as they chose to do this on their own. But here's a quick stab at how the code would be put together:
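In the base page layout file, we find the existing search UI and replace it with HTML along these lines: a visible text field (fake_q_id) where the user types the search, a hidden field (q_id) that carries the query actually sent to the search server, and a call to a JavaScript method, customizeSearchString, before the search is submitted.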
Also, the pseudo-code for the JavaScript method mentioned above (customizeSearchString) would be something like:
1) Get the current value of the fake_q_id text field
2) Tokenize the text string entered by the user in the fake_q_id field
3) Don't convert if there's only 1 token (single-word query) or if any of the tokens is a reserved word for search
4) Using the tokens, construct a replacement query that doesn't use the <ORDER> keyword.
5) Set the constructed query on the hidden q_id field
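The steps above can be sketched in JavaScript roughly as follows. Treat this as a sketch under stated assumptions, not the code the customer ended up with. In particular, buildCustomQuery simply joins the tokens with AND as a placeholder; the real replacement query should follow the default mode algorithm minus <ORDER>, which I'm not reproducing here. The helper names (tokenize, isReservedToken, buildCustomQuery, convertQuery) are hypothetical, reserved words are matched case-insensitively to stay on the safe side, and queries containing double quotes are left untouched so that phrase searches are not broken apart.

```javascript
// Reserved words from the mode list earlier in this post.
const RESERVED_WORDS = ["AND", "OR", "NOT", "NEAR"];
const RESERVED_BRACKET_OPERATORS = ["<ORDER>", "<WORD>", "<ALL>", "<ANY>"];

// Split the text typed by the user into whitespace-separated tokens.
function tokenize(text) {
  return text.trim().split(/\s+/).filter(Boolean);
}

// True if the token should stop us from converting the query.
// ASSUMPTION: case-insensitive match; leading '+'/'-' and double quotes
// also count, so Internet Mode and phrase queries pass through untouched.
function isReservedToken(token) {
  const upper = token.toUpperCase();
  return (
    RESERVED_WORDS.includes(upper) ||
    upper.startsWith("NEAR/") ||
    RESERVED_BRACKET_OPERATORS.some((op) => upper.includes(op)) ||
    /^[+-]./.test(token) ||
    token.includes('"')
  );
}

// ASSUMPTION: placeholder replacement query that avoids <ORDER>.
// Swap in the default mode algorithm without <ORDER> here.
function buildCustomQuery(tokens) {
  return tokens.join(" AND ");
}

// Convert a default mode query; leave everything else as typed.
function convertQuery(text) {
  const tokens = tokenize(text);
  if (tokens.length < 2 || tokens.some(isReservedToken)) {
    return text;
  }
  return buildCustomQuery(tokens);
}

// Read the visible field, write the converted query to the hidden field.
// `doc` is the page's document object (passed in so it is easy to test).
function customizeSearchString(doc) {
  const typed = doc.getElementById("fake_q_id").value;
  doc.getElementById("q_id").value = convertQuery(typed);
  return true; // let the search submission continue
}
```

In the page itself, this would be called as customizeSearchString(document) just before the search is submitted. Keeping the conversion in a separate convertQuery function makes it easy to try different replacement queries without touching the form wiring.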
And that's it! This JavaScript function essentially performs the conversion of the default mode query to the custom one we needed. As for the customer we supported, they have implemented a similar solution and the pilot users have been pleased! At the end of the day, it's all about making the portal as intuitive and as productive as possible for your users. And now that we understand the portal's search functionality better, it should be easier for us to do exactly that.
Happy searching :)
Comments
mattchiste on March 29, 2010