Information Retrieval and Related Applications. TF/IDF, Cosine Similarity.
TF (term frequency): number of times term t occurs in a document (or alternative: that count divided by the length of the document, i.e. its total number of terms)
IDF (inverse document frequency): logarithm of (number of documents in the corpus divided by number of documents containing term t). The TF-IDF weight of t in a document is TF multiplied by IDF.
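A minimal sketch of the two definitions above, assuming documents are already tokenized into lists of terms. The function names, the toy corpus, the use of the natural logarithm, and the choice to return 0.0 for a term that appears in no document are illustrative assumptions, not part of the notes. TF here is the length-normalized alternative.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Length-normalized TF: occurrences of term / number of tokens in the document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """IDF: log(number of documents / number of documents containing term)."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        # Assumption: treat a term unseen in the corpus as weight 0
        # instead of dividing by zero.
        return 0.0
    return math.log(len(corpus) / df)


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF weight of term in one document of the corpus."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical toy corpus: three tokenized documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran", "fast"],
]

weight_the = tf_idf("the", corpus[0], corpus)  # "the" is in every document
weight_dog = tf_idf("dog", corpus[1], corpus)  # "dog" is in one document only
```

A term that occurs in every document gets IDF log(1) = 0, so its TF-IDF weight vanishes no matter how often it occurs; a term confined to few documents is weighted up.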
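Cosine similarity: cosine of the angle between two vectors, i.e. their dot product divided by the product of their lengths. Range is [0, 1] for vectors with non-negative components such as TF-IDF weights ([-1, 1] for arbitrary vectors). The higher the value, the more similar the vectors.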
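A short sketch of the cosine computation, assuming each document is represented as an equal-length list of non-negative term weights (for example TF-IDF weights over a shared vocabulary). The function name, the example vectors, and the convention of returning 0.0 when either vector is all zeros are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: the angle is undefined for a zero vector; report 0.0.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical weight vectors over a three-term vocabulary.
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]   # same direction as doc_a, twice the length
doc_c = [0.0, 0.0, 3.0]   # shares no terms with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

Because only the angle matters, doc_a and doc_b score as (approximately) maximally similar despite their different lengths, while doc_a and doc_c, which share no terms, score 0.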