Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form.
Some people view NLG as the opposite of natural language understanding. The difference can be put this way: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words.
The process of generating text can be as simple as keeping a list of canned text that is copied and pasted, possibly linked with some glue text. The results may be satisfactory in simple domains such as horoscope machines or generators of personalised business letters. However, a sophisticated NLG system needs to include stages of planning and merging of information so that the generated text looks natural and does not become repetitive. Typical stages are:
- Content determination: Deciding which salient features are worth mentioning. Methods used in this stage are related to data mining.
- Discourse planning: Overall organisation of the information to convey.
- Sentence aggregation: Merging of similar sentences to improve readability and naturalness. For example, the sentences "The next train is the Caledonian Express" and "The next train leaves Aberdeen at 10am" can be aggregated to form "The next train, which leaves at 10am, is the Caledonian Express".
- Lexicalisation: Putting words to the concepts.
- Referring expression generation: Linking words in the sentences by introducing pronouns and other means of reference.
- Syntactic and morphological realisation: This stage is the inverse of parsing: given all the information collected above, syntactic and morphological rules are applied to produce the surface string.
- Orthographic realisation: Matters like casing, punctuation, and formatting are resolved.
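As a minimal sketch of how sentence aggregation and orthographic realisation might fit together, the Python below merges facts that share a subject into a single sentence with a relative clause, then fixes casing and punctuation. The `Message` class, its `main` flag, and the function names `aggregate` and `realise` are hypothetical, invented for this illustration; real systems work over much richer representations than plain strings.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """A hypothetical, already lexicalised fact: a subject plus a verb phrase."""
    subject: str
    predicate: str
    main: bool = False  # True if this fact should become the main clause


def aggregate(messages):
    """Merge messages that share a subject into one relative-clause sentence."""
    by_subject = {}
    for msg in messages:
        by_subject.setdefault(msg.subject, []).append(msg)

    sentences = []
    for subject, group in by_subject.items():
        if len(group) == 1:
            sentences.append(f"{subject} {group[0].predicate}")
            continue
        # Use the first message flagged as main; fall back to the last one.
        main = next((m for m in group if m.main), group[-1])
        subordinate = [m.predicate for m in group if m is not main]
        relative = " and ".join(subordinate)
        sentences.append(f"{subject}, which {relative}, {main.predicate}")
    return sentences


def realise(sentence):
    """Orthographic realisation: capitalise the first letter, add a full stop."""
    return sentence[0].upper() + sentence[1:] + "."


messages = [
    Message("the next train", "is the Caledonian Express", main=True),
    Message("the next train", "leaves Aberdeen at 10am"),
]
text = " ".join(realise(s) for s in aggregate(messages))
print(text)
# The next train, which leaves Aberdeen at 10am, is the Caledonian Express.
```

Grouping by an exact subject string is a deliberate simplification: it stands in for the reasoning a real aggregation component would need to decide that two facts are about the same entity.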
The most successful applications of NLG technology to date have been data-to-text systems. Such systems generate textual summaries of numeric and other non-linguistic data; they combine NLG and data analysis. For example, a number of systems have been built which automatically generate textual weather forecasts from numerical weather prediction data.
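The following Python is a toy sketch of the data-to-text idea, not a description of any deployed forecasting system. It assumes a hypothetical input of predicted temperatures and a rain probability; the function names, the temperature thresholds, and the 0.5 rain cut-off are all illustrative choices. A simple analysis step (minimum and maximum) feeds a template, a threshold decides whether rain is worth mentioning, and a lookup maps a number to a word.

```python
def describe_temperature(celsius):
    """Lexicalisation: map a number to a word (thresholds are illustrative)."""
    if celsius < 5:
        return "cold"
    if celsius < 18:
        return "mild"
    return "warm"


def forecast(temps_c, rain_probability):
    """Turn hypothetical numeric predictions into a short textual forecast."""
    # Data analysis: reduce the series to the values worth reporting.
    low, high = min(temps_c), max(temps_c)
    sentences = [
        f"It will be {describe_temperature(high)}, "
        f"with temperatures between {low} and {high} degrees Celsius."
    ]
    # Content determination: mention rain only when it is likely enough.
    if rain_probability >= 0.5:
        sentences.append("Rain is likely.")
    return " ".join(sentences)


print(forecast([3, 7, 11, 9], 0.7))
# It will be mild, with temperatures between 3 and 11 degrees Celsius. Rain is likely.
```

This is close to the canned-text end of the spectrum: the wording is fixed by templates, and only the selection of facts and a few word choices depend on the data.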
The popular media has been especially interested in NLG systems which generate jokes (see computational humor).
- Bateman and Zock's list of NLG systems
- SIGGEN list of NLG resources
- E Reiter and R Dale (2000). Building Natural Language Generation Systems. Cambridge University Press.
- SIGGEN - ACL Special Interest Group on Generation
- Introduction - An open-ended review of the state of the art, including many references
- HALogen - general-purpose natural language generation system
- KPML - general-purpose natural language generation system