🔍 Search Engine

Developed, as part of a team, a search engine for a university course catalogue and a corpus of Reuters articles. Implemented its core components from the ground up, including index construction, live query completion, and query expansion.

Introduction

For CSI4107 (Information Retrieval), I developed a search engine and various retrieval modules. This project was completed in a team of two.

The search engine could search over two distinct corpora: the uOttawa course catalogue and a collection of Reuters articles. Either corpus could be indexed and queried with one of two retrieval models: the Vector Space Model (VSM) or the Boolean model. As the user types a query, live suggestions are offered to help complete it. Once a query is submitted, related terms are suggested for query expansion.
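The write-up does not show how the index was built, so the following is only a rough illustration of how one inverted index can serve both retrieval models: a Boolean AND query intersects postings lists, while the VSM path ranks documents by tf-idf cosine similarity. The tokenizer, the `InvertedIndex` class, and the sample documents are hypothetical and are not the project's actual code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split on non-alphanumeric characters (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index that supports both Boolean and vector space retrieval."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> ids of docs containing it
        self.tf: dict[str, Counter] = {}  # doc id -> raw term frequencies
        for doc_id, text in docs.items():
            counts = Counter(tokenize(text))
            self.tf[doc_id] = counts
            for term in counts:
                self.postings[term].add(doc_id)
        self.n_docs = len(docs)

    def boolean_and(self, query: str) -> set[str]:
        """Boolean model: return the ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def idf(self, term: str) -> float:
        """Inverse document frequency, log(N / df); 0 for unseen terms."""
        df = len(self.postings.get(term, set()))
        return math.log(self.n_docs / df) if df else 0.0

    def _weights(self, counts: Counter) -> dict[str, float]:
        """tf-idf weight for every term in a bag of words."""
        return {term: tf * self.idf(term) for term, tf in counts.items()}

    def vsm_rank(self, query: str) -> list[tuple[str, float]]:
        """Vector space model: rank documents by cosine similarity of tf-idf vectors."""
        q = self._weights(Counter(tokenize(query)))
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        ranked = []
        for doc_id, counts in self.tf.items():
            d = self._weights(counts)
            d_norm = math.sqrt(sum(w * w for w in d.values()))
            dot = sum(w * d.get(term, 0.0) for term, w in q.items())
            if dot > 0:  # a positive dot product implies both norms are non-zero
                ranked.append((doc_id, dot / (q_norm * d_norm)))
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up sample documents, for illustration only.
docs = {
    "d1": "information retrieval with boolean and vector space models",
    "d2": "data structures and algorithms",
    "d3": "retrieval of information from text",
}
index = InvertedIndex(docs)
print(sorted(index.boolean_and("information retrieval")))
print(index.vsm_rank("boolean retrieval"))
```

Keeping the postings and the term frequencies in one structure is what lets a single index answer both kinds of query; a real system would also normalize tokens (stemming, stopword removal) consistently for both paths.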

Demo

Below is a demonstration of the final application.


Modules

Query completion module

This module builds on the bigram language model module to offer explicit query completion suggestions to the user. For the last word typed, the bigram index keeps only the top three most likely next words, and these are presented in the UI. The suggestions appear in a separate widget; clicking one appends that word to the query (and to the query text box).
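A minimal sketch of this idea, assuming the bigram counts come from plain adjacent word pairs in the indexed text: count how often each word follows another, keep only the three most frequent followers per word, and look up the last word the user typed. The function names and the sample titles are hypothetical; the project's bigram module may score or smooth candidates differently.

```python
import re
from collections import Counter, defaultdict

TOP_K = 3  # the write-up keeps only the top 3 next words


def words(text: str) -> list[str]:
    """Lowercase tokenizer shared by indexing and lookup."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_completion_index(texts: list[str], k: int = TOP_K) -> dict[str, list[str]]:
    """Count word bigrams, then keep only the k most frequent followers of each word."""
    followers: dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        tokens = words(text)
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return {prev: [w for w, _ in counts.most_common(k)] for prev, counts in followers.items()}


def suggest(index: dict[str, list[str]], partial_query: str) -> list[str]:
    """Suggest next words based on the last word typed so far."""
    tokens = words(partial_query)
    return index.get(tokens[-1], []) if tokens else []


def accept(partial_query: str, suggestion: str) -> str:
    """What clicking a suggestion does: append the word to the query text."""
    return f"{partial_query.rstrip()} {suggestion}".strip()


# Made-up course-style titles, for illustration only.
titles = [
    "introduction to information retrieval",
    "information retrieval systems",
    "advanced information retrieval",
    "information theory",
    "quantum information theory",
    "information extraction",
    "information security",
]
completions = build_completion_index(titles)
print(suggest(completions, "advanced Information"))
```

Truncating to the top three at build time keeps the completion index small and makes each keystroke a single dictionary lookup, at the cost of discarding rarer continuations.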

Global query expansion using WordNet

This module performs query expansion using an external resource, WordNet. Python’s Natural Language Toolkit (NLTK) provides a simple API for querying WordNet, so wiring it into our project was straightforward. In most cases, we give the user the option of applying the expansion. The exception is when a term has a large number of synonyms (more than 10): at that point, expansion would sacrifice too much precision, so we do not offer it. We hypothesize that this gives users good insight into how their query behaves while letting them opt out of the expansion should they wish. We also ensured that expansion works with both the Boolean and VSM models, because terms are expanded before they are transformed into a search expression.
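The sketch below shows one way to keep expansion independent of the retrieval model, assuming NLTK with the WordNet corpus installed (`nltk.download("wordnet")`). `wn.synsets` and `Synset.lemma_names` are real NLTK calls; the helper names, the group representation, and the Boolean syntax produced by `to_boolean` are hypothetical. Synonym lookup is passed in as a function so the logic can be exercised with a made-up stub instead of WordNet.

```python
from typing import Callable

MAX_SYNONYMS = 10  # above this many synonyms, the expansion is not offered


def wordnet_synonyms(term: str) -> list[str]:
    """Collect WordNet lemma names for a term (needs nltk and nltk.download("wordnet"))."""
    from nltk.corpus import wordnet as wn  # imported lazily so the rest runs without NLTK

    names = {
        lemma.replace("_", " ").lower()
        for synset in wn.synsets(term)
        for lemma in synset.lemma_names()
    }
    names.discard(term.lower())
    return sorted(names)


def expansion_offers(
    terms: list[str],
    synonyms_fn: Callable[[str], list[str]] = wordnet_synonyms,
    max_synonyms: int = MAX_SYNONYMS,
) -> dict[str, list[str]]:
    """Synonyms to offer the user, skipping terms with none or with too many."""
    offers = {}
    for term in terms:
        synonyms = synonyms_fn(term)
        if 0 < len(synonyms) <= max_synonyms:
            offers[term] = synonyms
    return offers


def expand(terms: list[str], accepted: dict[str, list[str]]) -> list[list[str]]:
    """Each query term becomes a group: the term plus the synonyms the user accepted."""
    return [[term, *accepted.get(term, [])] for term in terms]


def to_boolean(groups: list[list[str]]) -> str:
    """Boolean model: OR within a group, AND between groups; multi-word lemmas are quoted."""
    def fmt(word: str) -> str:
        return f'"{word}"' if " " in word else word

    return " AND ".join("(" + " OR ".join(fmt(w) for w in group) + ")" for group in groups)


def to_vsm(groups: list[list[str]]) -> list[str]:
    """VSM: flatten the groups into one bag of query terms."""
    return [word for group in groups for word in group]


# Made-up stub standing in for WordNet: "cheap" has too many synonyms to offer.
stub = {"car": ["auto", "automobile", "motorcar"], "cheap": [f"syn{i}" for i in range(12)]}
offers = expansion_offers(["car", "cheap"], synonyms_fn=lambda t: stub.get(t, []))
groups = expand(["car", "cheap"], offers)  # here the user accepts every offer
print(to_boolean(groups))
print(to_vsm(groups))
```

Because expansion produces plain term groups first, each model only needs its own small translation step, which mirrors the point above that expansion happens before any search expression is built.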

UI Improvements

While this was not a module per se, it still required considerable effort, as we had to accommodate a number of new workflows and elements. In particular, we had to create widgets for query expansion, as well as confirmation dialogs informing users of implicit expansions. We also needed a mechanism for users to give relevance feedback, both marking a result as relevant and retracting that judgement if needed. Finally, we added the topic filtering selection box and the logic to restrict Reuters search results to the selected topic.
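The state behind relevance feedback and topic filtering can be pictured as a small session object. This is a hypothetical sketch, assuming judgements are tracked per query as sets of document ids and that each Reuters article carries a set of topic labels; the class, method names, ids, and labels are illustrative, not the project's code.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class SearchSession:
    """Hypothetical UI-side state for relevance feedback and topic filtering."""

    relevant: dict[str, set[str]] = field(default_factory=dict)  # query -> doc ids marked relevant
    topic: Optional[str] = None  # selected Reuters topic, or None for no filter

    def mark_relevant(self, query: str, doc_id: str) -> None:
        """Record that the user declared a result relevant for this query."""
        self.relevant.setdefault(query, set()).add(doc_id)

    def unmark_relevant(self, query: str, doc_id: str) -> None:
        """Retract a judgement; a no-op if the document was never marked."""
        judged = self.relevant.get(query)
        if judged is not None:
            judged.discard(doc_id)

    def filter_by_topic(self, results: Iterable[str], doc_topics: dict[str, set[str]]) -> list[str]:
        """Keep only results tagged with the selected topic, preserving ranking order."""
        if self.topic is None:
            return list(results)
        return [doc_id for doc_id in results if self.topic in doc_topics.get(doc_id, set())]
```

Filtering after ranking keeps the retrieval models untouched; the selection box only changes which ranked results are shown.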