Ranking & Ranking Algorithms and Relevance Feedback
Ranking of query is one of the fundamental problems in information retrieval (IR),^{}the scientific/engineering discipline behind search engines. ^{}Given a query q and a collection D of documents that match the query, the problem is to rank, that is, sort, the documents in D according to some criterion so that the “best” results appear early in the result list displayed to the user. Ranking in terms of information retrieval is an important concept in computer science and is used in many different applications such as search engine queries and recommender systems.A majority of search engines use ranking algorithms to provide users with accurate and relevant results.^{}
Ranking models
Ranking functions are evaluated by a variety of means; one of the simplest is determining the precision of the first k topranked results for some fixed k; for example, the proportion of the top 10 results that are relevant, on average over many queries.
IR models can be broadly divided into three types: Boolean models or BIR, Vector Space Models, and Probabilistic Models.^{} Various comparisons between retrieval models can be found in the literature.^{}
Boolean Models
Boolean Model or BIR is a simple baseline query model where each query follows the underlying principles of relational algebra with algebraic expressions and where documents are not fetched unless they completely match with each other. Since the query is either fetch the document (1) or doesn’t fetch the document (0), there is no methodology to rank them.
Vector Space Model
Since the Boolean Model only fetches complete matches, it doesn’t address the problem of the documents being partially matched. The Vector Space Model solves this problem by introducing vectors of index items each assigned with weights. The weights are ranged from positive (if matched completely or to some extent) to negative (if unmatched or completely oppositely matched) if documents are present. Term Frequency – Inverse Document Frequency (tfidf) is one of the most popular techniques where weights are terms (e.g. words, keywords, phrases etc.) and dimensions is number of words inside corpus.
The similarity score between query and document can be found by calculating cosine value between query weight vector and document weight vector using cosine similarity. Desired documents can be fetched by ranking them according to similarity score and fetched top k documents which has the highest scores or most relevant to query vector.
Probabilistic Model
In probabilistic model, probability theory has been used as a principal means for modeling the retrieval process in mathematical terms. The probability model of information retrieval was introduced by Maron and Kuhns in 1960 and further developed by Roberston and other researchers. According to Spack Jones and Willett (1997): The rationale for introducing probabilistic concepts is obvious: IR systems deal with natural language, and this is too far imprecise to enable a system to state with certainty which document will be relevant to a particular query.
Evaluation Measures
The most common measures of evaluation are precision, recall, and fscore. They are computed using unordered sets of documents. These measures must be extended, or new measures must be defined, in order to evaluate the ranked retrieval results that are standard in modern search engines. In a ranked retrieval context, appropriate sets of retrieved documents are naturally given by the top k retrieved documents. For each such set, precision and recall values can be plotted to give a precisionrecall curve.^{[14]}
Precision
Precision measures the exactness of the retrieval process. If the actual set of relevant documents is denoted by I and the retrieved set of documents is denoted by O, then the precision is given by:
Recall
Recall is a measure of completeness of the IR process. If the actual set of relevant documents is denoted by I and the retrieved set of documents is denoted by O, then the recall is given by:
F1 Score
F1 Score tries to combine the precision and recall measure. It is the harmonic mean of the two. If P is the precision and R is the recall then the FScore is given by:

Page Rank Algorithm
The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on the links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value. The formulae is given below:
i.e. the PageRank value for a page u is dependent on the PageRank values for each page v contained in the set B_{u} (the set containing all pages linking to page u), divided by the amount L(v) of links from page v.
Add Comment