Information Retrieval

Information Retrieval and Related Applications. TF/IDF, Cosine Similarity.

TF/IDF

Term Frequency (TF)

TF: the number of times term t occurs in a document. A common alternative normalizes by length: the number of occurrences of term t divided by the total number of terms in the document.
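A minimal Python sketch of both variants follows. The function name term_frequency, the normalize flag, and the whitespace tokenization are assumptions made for illustration; they do not come from any particular library.

```python
from collections import Counter


def term_frequency(term, tokens, normalize=False):
    """Count occurrences of `term` in `tokens`.

    With normalize=True, divide the count by the document length
    (hypothetical helper, written only for this illustration).
    """
    count = Counter(tokens)[term]
    if normalize:
        # Guard against an empty document to avoid division by zero.
        return count / len(tokens) if tokens else 0.0
    return count


# Naive whitespace tokenization, assumed here for simplicity.
doc = "the cat sat on the mat".split()

raw_tf = term_frequency("the", doc)                   # raw count
norm_tf = term_frequency("the", doc, normalize=True)  # count / document length
```

The normalized variant keeps long documents from dominating simply because they contain more words.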

Inverse Document Frequency (IDF)

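IDF: the logarithm of the number of documents in the corpus divided by the number of documents containing term t. Terms that appear in few documents therefore receive a higher weight than terms that appear almost everywhere.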
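The definition can be sketched in Python as below. The function name inverse_document_frequency and the toy corpus are illustrative assumptions, and the natural logarithm is used because the notes do not specify a base. Many practical implementations add smoothing terms, which this sketch deliberately omits.

```python
import math


def inverse_document_frequency(term, corpus):
    """Return log(N / df) for `term`.

    N is the number of documents in `corpus`; df is the number of
    documents containing `term` (hypothetical helper for illustration).
    """
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        # The unsmoothed formula is undefined for unseen terms.
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(len(corpus) / df)


# Toy corpus, assumed for the example; each document is a list of tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

idf_the = inverse_document_frequency("the", corpus)  # term in every document
idf_dog = inverse_document_frequency("dog", corpus)  # term in one document
```

Here "the" occurs in all three documents, so its IDF is log(3/3) = 0, while "dog" occurs in only one and gets log(3).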

TF-IDF
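TF-IDF is commonly defined as the product of the two quantities: the term frequency of t in a document multiplied by the inverse document frequency of t in the corpus. The sketch below assumes that definition with raw counts and an unsmoothed natural-log IDF. The function name tf_idf and the toy corpus are illustrative, not taken from a library.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document, with weight = tf * idf.

    tf is the raw count of the term in the document; idf is
    log(N / df) over the whole corpus (hypothetical helper).
    """
    n_docs = len(corpus)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in corpus for term in set(tokens))
    weights = []
    for tokens in corpus:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in counts.items()}
        )
    return weights


# Toy corpus, assumed for the example.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

weights = tf_idf(corpus)
```

A term such as "the", which appears in every document, ends up with weight 0 regardless of how often it occurs, while a term confined to a single document keeps a positive weight.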

Cosine Similarity

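The cosine of the angle between two vectors. In general the range is [-1, 1]; for vectors with non-negative components, such as term-count or TF-IDF vectors, the range is [0, 1]. The higher the value, the more similar the vectors.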

Example
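As a small illustration, cosine similarity can be computed as the dot product of two vectors divided by the product of their Euclidean lengths. The Python sketch below assumes plain lists of numbers as vectors; the function name cosine_similarity and the sample vectors are made up for the example.

```python
import math


def cosine_similarity(u, v):
    """Return dot(u, v) / (|u| * |v|) for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical term-count vectors over a three-term vocabulary.
same_direction = cosine_similarity([1, 2, 0], [2, 4, 0])  # parallel vectors
orthogonal = cosine_similarity([1, 0, 0], [0, 1, 0])      # no shared terms
partial = cosine_similarity([1, 1, 0], [1, 0, 1])         # one shared term
```

Parallel vectors score approximately 1, vectors with no terms in common score 0, and the partially overlapping pair falls in between at about 0.5. Because the measure depends only on direction, a document and a copy of it repeated twice are treated as identical.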
