User:Jnweiger/Wiki search

Written July 19, 2010, in the hope it is outdated soon.

Google search prototype

Matt Ehle established http://en.opensuse.org/MediaWiki:GoogleSearch. This does exactly what I'd expect from a decent search engine. The site is marked as a non-profit, so the advertising is gone now.

The Google search is more of a temporary solution until we can get Lucene in play. We have some technical barriers to getting Lucene up and running (BNC#625677). Matt is the driving force here again. Thanks!

Why the default wiki search sucks

There are a number of complaints on our mailing lists.

Examples of issues

  • Try to find the e-mail address of an openSUSE user. For example, search for a Board member like Alan Clark, who is definitely not hiding. Not even the SUSE-internal tel* tools can find his e-mail address.
  • Search for 'SUSE_ASNEEDED': nothing is found, because the one page, /Packaging/Fixing, has not been migrated. The Google search also includes the old-en wiki, and locates the page.
    --> Issues:
    • A user would not know that there is an old wiki with lots of valuable content.
    • SUSE_ASNEEDED is more often the answer than the question. I'd like to get some pointers in the right direction when searching for 'linker library order symbol lookup "undefined reference to"'


  • Search for 'video'. It falsely claims, for example:
    No page title matches
    There is no page titled "video". You can create this page.

    --> Issues:
    • The wording suggests this is a general truth. There is no hint that the list of selected namespaces makes any difference.
    • "I know that a page titled "video" exists, please tell me in which namespace it is." This type of question cannot be asked. (Answer in this case: 'HCL:Video').
  • Search for "nvidia" and note that the first article shown is something about Compiz, with the proper one nowhere in sight.
    --> Issues:
    • User might give up browsing the results too early.
    • User might get distracted with a result that appears 'good enough'.
  • I looked for 'Ambassadors' with no result. The small notification that only some namespaces are searched is easy to miss, all the more so for an ordinary user in front of the wiki.
    --> Issues:
    • A user might not know "What's a namespace?"
    • A user might not know "What's the difference between Main and openSUSE?"
    • "etc."
    • The need to teach this to our users is an entry barrier.
  • The official workaround is to create a portal page containing all the words that a user might want to search for.
    --> Issues:
    • It is hard to predict what a user might want to search for. We could try to use the past (the list of failed searches) as a predictor for the future, but this is low-fun, never-ending work.
    • It will cause a proliferation of portal pages once this concept is well known, thus spoiling the effect of a reduced hit count in the long run. (Okay, I am a pessimist here.)
  • Searches for page titles need to be exact. 'bugreport', 'bug report', 'report a bug' all fail to match a page title; only 'report a Bug' currently succeeds.
    --> Issues:
    • User is guided to a random page content match, and may believe we have nothing better.
    • User may learn over time that the search engine 'randomly fails' without learning how to avoid this situation.
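
As a rough illustration of the 'video' and 'report a Bug' items above, here is a toy Python model of an exact, namespace-restricted title lookup next to a more forgiving one. This is not MediaWiki code. The page list, the default namespace selection and all function names are assumptions made up for the example; only 'HCL:Video' comes from the text above.

```python
# Toy model only: this is not how MediaWiki is implemented.
# The page list is invented for illustration; only "HCL:Video" is from the text.
PAGES = [
    ("HCL", "Video"),
    ("Main", "Report a Bug"),  # namespace assumed for the example
]

DEFAULT_NAMESPACES = {"Main"}  # assumed default namespace selection


def exact_title_lookup(query, namespaces=DEFAULT_NAMESPACES):
    """Exact title match, restricted to the selected namespaces."""
    return [f"{ns}:{title}" for ns, title in PAGES
            if ns in namespaces and title == query]


def normalize(text):
    """Lower-case and drop whitespace, so 'Report a Bug' becomes 'reportabug'."""
    return "".join(text.lower().split())


def find_title_anywhere(query):
    """Case-insensitive substring match on titles, across all namespaces."""
    needle = normalize(query)
    return [f"{ns}:{title}" for ns, title in PAGES
            if needle in normalize(title)]


if __name__ == "__main__":
    print(exact_title_lookup("Video"))          # misses: HCL is not selected
    print(find_title_anywhere("video"))         # reports the namespace too
    print(find_title_anywhere("report a bug"))  # case no longer matters
```

Note that reordered words such as 'bug report' still fail in this sketch; catching those would need similarity-based matching rather than plain substrings.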

Suggestions

  • Work on Page title matches.
    • Allow page title matches on all namespaces by default.
    • Review the code to see whether we can easily allow substring, case-insensitive matches, or even similarity-based matches.
  • Work on feeding relevance data into the search engine.
    • Associate each namespace with a default relevance.
    • Some metric of hit count (this month and ever since).
    • Some metric of edit frequency.
    • Number of inbound and outbound links.
    • Number of different contributors.
    • Number of stars in some rating.
  • Work on search algorithms.
    • Allow for typos and run-together/split words with a small penalty in relevance. (Ouch, that is hard.)
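
To make the relevance suggestions a bit more concrete, here is a minimal Python sketch of how a per-namespace default relevance, a few page metrics and a typo penalty could be combined into one score. Everything in it is an assumption for illustration: the weights, the field names, the formula and the use of difflib for similarity. It is not what Lucene or MediaWiki actually do.

```python
import difflib
import math
from dataclasses import dataclass

# Assumed default relevance per namespace; the numbers are made up.
NAMESPACE_WEIGHT = {"Main": 1.0, "HCL": 0.8, "User": 0.3}


@dataclass
class Page:
    namespace: str
    title: str
    hits_this_month: int = 0
    inbound_links: int = 0
    contributors: int = 1


def title_similarity(query, title):
    """1.0 for a case-insensitive exact match, lower for near misses."""
    return difflib.SequenceMatcher(None, query.lower(), title.lower()).ratio()


def relevance(query, page, typo_penalty=0.8):
    """Combine title similarity, namespace weight and page metrics."""
    sim = title_similarity(query, page.title)
    if sim < 1.0:
        sim *= typo_penalty  # small penalty for anything but an exact match
    # Logarithms keep very popular pages from drowning out everything else.
    popularity = (math.log1p(page.hits_this_month)
                  + math.log1p(page.inbound_links)
                  + math.log1p(page.contributors))
    ns_weight = NAMESPACE_WEIGHT.get(page.namespace, 0.5)
    return ns_weight * sim * (1.0 + popularity)


def rank(query, pages):
    """Return pages sorted by descending relevance for the query."""
    return sorted(pages, key=lambda p: relevance(query, p), reverse=True)


if __name__ == "__main__":
    pages = [Page("User", "Video"), Page("HCL", "Video", hits_this_month=40)]
    for page in rank("vidoe", pages):  # note the typo in the query
        print(page.namespace, page.title, round(relevance("vidoe", page), 3))
```

A real engine would also have to score page content, not just titles, and the weights would need tuning against the list of failed searches.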