The factored language model (FLM) is an extension of a conventional language model. In an FLM, each word is viewed as a vector of $k$ factors: $w_i = \{f_i^1, \ldots, f_i^k\}$. An FLM provides the probabilistic model $P(f \mid f_1, \ldots, f_N)$, where the prediction of a factor $f$ is based on $N$ parents $\{f_1, \ldots, f_N\}$. For example, if $w$ represents a word token and $t$ represents a part-of-speech tag for English, the expression $P(w_i \mid w_{i-2}, w_{i-1}, t_{i-1})$ gives a model for predicting the current word token based on a traditional n-gram model as well as the part-of-speech tag of the previous word.
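As a rough illustration of such a model, the following Python sketch estimates the probability of the current word given the two previous words and the part-of-speech tag of the previous word, using plain relative frequencies. The toy corpus, the `<s>` padding token, the choice of two preceding words, and the function names `train` and `prob` are all assumptions made for this example; no smoothing is applied, so unseen contexts simply receive probability zero.

```python
from collections import Counter

# Hypothetical toy corpus: each token is a (word, part-of-speech tag) pair.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
]

BOS = ("<s>", "<s>")  # assumed padding token for sentence starts


def train(sentences):
    """Count each word together with its parents:
    (previous-but-one word, previous word, previous tag)."""
    joint, context = Counter(), Counter()
    for sent in sentences:
        padded = [BOS, BOS] + sent
        for i in range(2, len(padded)):
            parents = (padded[i - 2][0], padded[i - 1][0], padded[i - 1][1])
            joint[(padded[i][0], parents)] += 1
            context[parents] += 1
    return joint, context


def prob(word, parents, joint, context):
    """Unsmoothed relative-frequency estimate of P(word | parents)."""
    if context[parents] == 0:
        return 0.0  # unseen context: no estimate without smoothing
    return joint[(word, parents)] / context[parents]


joint, context = train(corpus)
p_barks = prob("barks", ("the", "dog", "NOUN"), joint, context)
print(p_barks)  # "barks" follows this context in 2 of its 3 occurrences
```

Treating the parents as a single tuple keeps the sketch short, but it also shows why data sparsity becomes a problem: every additional factor in the context makes exact matches rarer.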
A major advantage of factored language models is that they allow users to specify linguistic knowledge, such as the relationship between word tokens and part-of-speech tags in English, or morphological information (stems, roots, etc.) in Arabic.
As with n-gram models, smoothing techniques are necessary for parameter estimation. In particular, generalized back-off is used in training an FLM.
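The idea behind backing off over a set of parents can be sketched in Python as follows. Because the parents of a factor are not ordered purely by distance in time, there is more than one way to shorten the context: any parent may be dropped, and the estimates from the resulting smaller contexts can be combined. The sketch below is a deliberately simplified, unnormalized illustration of that idea and not the estimation procedure used in practice: the fixed weight `alpha`, the use of `max` to combine back-off paths, the toy data, and the names `build_counts` and `score` are all assumptions for this example, and the returned scores are not proper probabilities.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy events: (word, (w_{i-2}, w_{i-1}, t_{i-1})).
events = [
    ("barks", ("the", "dog", "NOUN")),
    ("barks", ("the", "dog", "NOUN")),
    ("sleeps", ("the", "dog", "NOUN")),
    ("barks", ("a", "dog", "NOUN")),
]


def build_counts(events):
    """Count each word against every subset of its parents.
    A sub-context is keyed by the kept parent positions and their values."""
    joint, context = Counter(), Counter()
    for word, parents in events:
        n = len(parents)
        for r in range(n + 1):
            for keep in combinations(range(n), r):
                key = (keep, tuple(parents[j] for j in keep))
                joint[(word, key)] += 1
                context[key] += 1
    return joint, context


def score(word, parents, keep, joint, context, alpha=0.4):
    """Unnormalized back-off score (assumed scheme, not a true probability).
    Use the relative frequency if the event was seen with the kept parents;
    otherwise drop one parent at a time, follow every such path, and take
    the best result scaled by a fixed weight alpha."""
    key = (keep, tuple(parents[j] for j in keep))
    if joint[(word, key)] > 0:
        return joint[(word, key)] / context[key]
    if not keep:
        return 0.0  # word never observed at all
    children = [tuple(k for k in keep if k != drop) for drop in keep]
    return alpha * max(
        score(word, parents, child, joint, context, alpha) for child in children
    )


joint, context = build_counts(events)
full = (0, 1, 2)  # keep all three parents
seen = score("barks", ("the", "dog", "NOUN"), full, joint, context)
backed_off = score("sleeps", ("a", "dog", "NOUN"), full, joint, context)
print(seen, backed_off)
```

In this toy data, "sleeps" never follows the full context ("a", "dog", "NOUN"), so the score comes from the smaller context that keeps only the previous word and its tag. A real implementation would replace the fixed `alpha` with discounting and normalizing back-off weights so that the result is a valid distribution.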