RFC 5234 supersedes RFC 4234 (which superseded RFC 2234).
An ABNF specification is a set of derivation rules, written as
rule = definition ; comment CR LF
where rule is a case-insensitive nonterminal, the definition consists of sequences of symbols that define the rule, the comment documents it, and the line ends with a carriage return and line feed.
Rule names are case-insensitive: <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all refer to the same rule. Rule names consist of a letter followed by letters, numbers, and hyphens.
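The naming rule can be checked mechanically. The following is a minimal Python sketch, not part of the ABNF standard; RULENAME, is_rulename, and same_rule are hypothetical names introduced here for illustration.

```python
import re

# A letter followed by any number of letters, digits, and hyphens.
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

def is_rulename(name: str) -> bool:
    """Return True if name has the shape of an ABNF rule name."""
    return RULENAME.fullmatch(name) is not None

def same_rule(a: str, b: str) -> bool:
    """Rule names are case-insensitive, so compare them case-folded."""
    return a.lower() == b.lower()
```

Under these assumptions, is_rulename("zip-code") holds, is_rulename("9lives") does not, and same_rule("rulename", "rUlENamE") holds.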
Angle brackets (“<”, “>”) are not required around rule names (as they are in BNF). However, they may be used to delimit a rule name in prose so that it can be told apart from the surrounding text.
ABNF is encoded in ASCII (which is seven bits) in an eight-bit field with the high bit set to zero.
Terminals are specified by one or more numeric characters.
Numeric characters may be specified as the percent sign “%”, followed by the base (b = binary, d = decimal, and x = hexadecimal), followed by the value, or a concatenation of values (indicated by “.”). For example, a carriage return is specified by %d13 in decimal or %x0D in hexadecimal. A carriage return followed by a line feed may be specified with concatenation as %d13.10.
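To make the numeric notation concrete, here is a small Python sketch that decodes such a terminal into the characters it denotes. The function name decode_num_val is hypothetical, and the sketch relies on Python's int() for digit parsing, which is more lenient than the grammar itself.

```python
BASES = {"b": 2, "d": 10, "x": 16}

def decode_num_val(token: str) -> str:
    """Return the characters denoted by a numeric terminal such as "%d13.10"."""
    if len(token) < 3 or token[0] != "%" or token[1].lower() not in BASES:
        raise ValueError(f"not a numeric value: {token!r}")
    base = BASES[token[1].lower()]
    # Each dot-separated field is one character code in the given base.
    return "".join(chr(int(field, base)) for field in token[2:].split("."))
```

With this helper, decode_num_val("%d13") and decode_num_val("%x0D") both give a carriage return, and decode_num_val("%d13.10") gives a carriage return followed by a line feed.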
Literal text is specified through the use of a string enclosed in quotation marks ("). These strings are case-insensitive and the character set used is (US-)ASCII. Therefore the string “abc” will match “abc”, “Abc”, “aBc”, “abC”, “ABc”, “AbC”, “aBC”, and “ABC”. For a case-sensitive match the explicit characters must be defined: to match “aBc” the definition will be %d97 %d66 %d99.
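The difference between a quoted literal and explicit character values can be illustrated in Python. This is a sketch under the assumption that literals contain only ASCII characters; case_variants and matches_literal are hypothetical helper names.

```python
from itertools import product

def case_variants(literal: str) -> list:
    """List every string that a quoted ABNF literal would match."""
    # Letters contribute two choices (upper and lower); other characters one.
    choices = [sorted({ch.lower(), ch.upper()}) for ch in literal]
    return ["".join(combo) for combo in product(*choices)]

def matches_literal(literal: str, text: str) -> bool:
    """Case-insensitive comparison, as for a quoted ABNF string."""
    return literal.lower() == text.lower()

# Case-sensitive form from the text: %d97 %d66 %d99
exact = "".join(chr(code) for code in (97, 66, 99))
```

Here case_variants("abc") yields the eight spellings listed above, while exact is the single string "aBc".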
A rule may be defined by listing a sequence of rule names.
To match the string “aba” the following rules could be used:
fu = %x61 ; a
bar = %x62 ; b
mumble = fu bar fu
Rule1 / Rule2
A rule may be defined by a list of alternative rules separated by a solidus (“/”).
To accept the rule fu or the rule bar the following rule could be constructed:
fubar = fu / bar
Rule1 =/ Rule2
Additional alternatives may be added to a rule through the use of “=/” between the rule name and the definition. The rule
ruleset = alt1 / alt2 / alt3 / alt4 / alt5
is equivalent to
ruleset = alt1 / alt2
ruleset =/ alt3
ruleset =/ alt4 / alt5
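The equivalence of the two forms can be sketched in Python by collecting the alternatives of each rule into a list. The function load_rules is hypothetical and deliberately naive: it assumes one rule per line and that neither “=” nor “/” appears inside a quoted string or comment.

```python
def load_rules(lines):
    """Collect the alternatives of each rule; "=/" extends an existing rule."""
    rules = {}
    for line in lines:
        name, _, definition = line.partition("=")
        name = name.strip().lower()  # rule names are case-insensitive
        if definition.startswith("/"):
            # "=/": append to the alternatives already recorded for this rule.
            rules[name].extend(a.strip() for a in definition[1:].split("/"))
        else:
            rules[name] = [a.strip() for a in definition.split("/")]
    return rules

single = load_rules(["ruleset = alt1 / alt2 / alt3 / alt4 / alt5"])
incremental = load_rules([
    "ruleset = alt1 / alt2",
    "ruleset =/ alt3",
    "ruleset =/ alt4 / alt5",
])
```

Both calls produce the same table, with ruleset mapped to the five alternatives in order.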
A range of numeric values may be specified through the use of a hyphen (“-”). The rule
OCTAL = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
is equivalent to
OCTAL = %x30-37
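As an illustration, a value range can be expanded into the characters it admits. The helper expand_range below is a hypothetical Python sketch that handles only a single range token with a valid base letter.

```python
def expand_range(token: str) -> str:
    """Expand a value range such as "%x30-37" into the characters it allows."""
    bases = {"b": 2, "d": 10, "x": 16}
    base = bases[token[1].lower()]
    # The two bounds are written in the same base and are both inclusive.
    low, high = (int(part, base) for part in token[2:].split("-"))
    return "".join(chr(code) for code in range(low, high + 1))
```

For the OCTAL rule, expand_range("%x30-37") gives the digits 0 through 7, which is the same set as the eight quoted alternatives.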
Elements may be placed in parentheses to group rules in a definition.
To match “elem fubar snafu” or “elem tarfu snafu” the following rule could be constructed:
group = elem (fubar / tarfu) snafu
To match “elem fubar” or “tarfu snafu” either of the following equivalent rules could be constructed:
group = elem fubar / tarfu snafu
group = (elem fubar) / (tarfu snafu)
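The effect of grouping on concatenation and alternation can be sketched with a few parser combinators in Python. This is an illustrative assumption, not a full ABNF implementation: each rule name (elem, fubar, tarfu, snafu) is treated as a single opaque token, and tok, seq, alt, and matches are hypothetical helpers.

```python
def tok(name):
    """Match one token equal to name (stands in for a rule)."""
    def parse(tokens, pos):
        return {pos + 1} if pos < len(tokens) and tokens[pos] == name else set()
    return parse

def seq(*parsers):
    """Concatenation: each parser continues where the previous one ended."""
    def parse(tokens, pos):
        ends = {pos}
        for p in parsers:
            ends = {e2 for e in ends for e2 in p(tokens, e)}
        return ends
    return parse

def alt(*parsers):
    """Alternation: the union of the positions reached by any alternative."""
    def parse(tokens, pos):
        return set().union(*(p(tokens, pos) for p in parsers))
    return parse

def matches(parser, tokens):
    """True if the parser can consume the whole token list."""
    return len(tokens) in parser(tokens, 0)

# group = elem (fubar / tarfu) snafu
grouped = seq(tok("elem"), alt(tok("fubar"), tok("tarfu")), tok("snafu"))
# group = (elem fubar) / (tarfu snafu)
ungrouped = alt(seq(tok("elem"), tok("fubar")), seq(tok("tarfu"), tok("snafu")))
```

With these definitions, grouped accepts "elem tarfu snafu".split() but not "elem fubar".split(), while ungrouped does the opposite.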
To indicate repetition of an element the form <a>*<b>element is used. The optional <a> gives the minimum number of elements to be included, with a default of 0. The optional <b> gives the maximum number of elements to be included, with a default of infinity. Use *element for zero or more elements, *1element for zero or one element, 1*element for one or more elements, and 2*3element for two or three elements.
To indicate an explicit number of elements the form <a>element is used and is equivalent to <a>*<a>element. Use 2DIGIT to get two numeric digits and 3DIGIT to get three numeric digits. (DIGIT is defined below under 'Core rules'. Also see zip-code in the example below.)
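The repetition prefixes map naturally onto minimum and maximum counts. The Python sketch below assumes a well-formed prefix; parse_repeat and to_regex_quantifier are hypothetical helpers, and the translation to a regular-expression quantifier is only one possible way to use the result.

```python
import re

def parse_repeat(prefix: str):
    """Return (minimum, maximum) for a repeat prefix; maximum None means unbounded."""
    if "*" in prefix:
        low, _, high = prefix.partition("*")
        return (int(low) if low else 0, int(high) if high else None)
    count = int(prefix)  # "<a>element" is shorthand for "<a>*<a>element"
    return (count, count)

def to_regex_quantifier(prefix: str) -> str:
    """Render the repeat prefix as a regular-expression quantifier."""
    low, high = parse_repeat(prefix)
    return "{%d,%s}" % (low, "" if high is None else high)

DIGIT = "[0-9]"
two_or_three_digits = re.compile(DIGIT + to_regex_quantifier("2*3"))
```

For instance, parse_repeat("1*") gives (1, None), parse_repeat("2") gives (2, 2), and two_or_three_digits fully matches "42" and "421" but not "4".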
To indicate an optional element the following constructions are equivalent:
[fubar snafu]
*1(fubar snafu)
0*1(fubar snafu)
A semicolon (“;”) starts a comment that continues to the end of the line.
Mixing the alternative operator with concatenation can be confusing, so it is recommended that grouping be used to make concatenation groups explicit.
|Rule||Formal definition||Meaning|
|ALPHA||%x41-5A / %x61-7A||Upper- and lower-case ASCII letters (A–Z, a–z)|
|DIGIT||%x30-39||Decimal digits (0–9)|
|HEXDIG||DIGIT / "A" / "B" / "C" / "D" / "E" / "F"||Hexadecimal digits (0–9, A–F, a–f)|
|WSP||SP / HTAB||space and horizontal tab|
|LWSP||*(WSP / CRLF WSP)||linear white space (past newline)|
|VCHAR||%x21-7E||visible (printing) characters|
|CHAR||%x01-7F||any ASCII character, excluding NUL|
|OCTET||%x00-FF||8 bits of data|
|CTL||%x00-1F / %x7F||controls|
|CRLF||CR LF||Internet standard newline|
|BIT||"0" / "1"||binary digit|
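Several of the core rules in the table have direct regular-expression counterparts. The Python sketch below is an approximation for illustration only: the dictionary CORE and the helper is_rule are hypothetical names, the patterns assume text strings rather than raw bytes, and OCTET is left out for that reason.

```python
import re

# Regular-expression counterparts of core rules from the table above.
CORE = {
    "ALPHA":  r"[A-Za-z]",
    "DIGIT":  r"[0-9]",
    "HEXDIG": r"[0-9A-Fa-f]",  # quoted "A".."F" match case-insensitively
    "WSP":    r"[ \t]",
    "LWSP":   r"(?:[ \t]|\r\n[ \t])*",
    "VCHAR":  r"[\x21-\x7E]",
    "CHAR":   r"[\x01-\x7F]",
    "CTL":    r"[\x00-\x1F\x7F]",
    "CRLF":   r"\r\n",
    "BIT":    r"[01]",
}

def is_rule(name: str, text: str) -> bool:
    """Return True if the whole of text matches the named core rule."""
    return re.fullmatch(CORE[name], text) is not None
```

For example, is_rule("HEXDIG", "f") holds because quoted strings are case-insensitive, while is_rule("VCHAR", " ") does not because a space is not a visible character.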
postal-address = name-part street zip-part
name-part = *(personal-part SP) last-name [SP suffix] CRLF
name-part =/ personal-part CRLF
personal-part = first-name / (initial ".")
first-name = *ALPHA
initial = ALPHA
last-name = *ALPHA
suffix = ("Jr." / "Sr." / 1*("I" / "V" / "X"))
street = [apt SP] house-num SP street-name CRLF
apt = 1*4DIGIT
house-num = 1*8(DIGIT / ALPHA)
street-name = 1*VCHAR
zip-part = town-name "," SP state 1*2SP zip-code CRLF
town-name = 1*(ALPHA / SP)
state = 2ALPHA
zip-code = 5DIGIT ["-" 4DIGIT]
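To see how the pieces of the example combine, the zip-part rule and the rules it references can be transcribed into a regular expression. This Python sketch is one possible hand translation, not generated from the grammar; the constant names and is_zip_part are hypothetical.

```python
import re

TOWN_NAME = r"[A-Za-z ]+"             # town-name = 1*(ALPHA / SP)
STATE = r"[A-Za-z]{2}"                # state = 2ALPHA
ZIP_CODE = r"[0-9]{5}(?:-[0-9]{4})?"  # zip-code = 5DIGIT ["-" 4DIGIT]

# zip-part = town-name "," SP state 1*2SP zip-code CRLF
ZIP_PART = re.compile(TOWN_NAME + ", " + STATE + " {1,2}" + ZIP_CODE + "\r\n")

def is_zip_part(line: str) -> bool:
    """Return True if line, including its CRLF, matches the zip-part rule."""
    return ZIP_PART.fullmatch(line) is not None
```

Under this translation, "Springfield, IL 62704\r\n" and "Springfield, IL  62704-1234\r\n" are accepted, while a four-digit code or a line without the trailing CRLF is rejected.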
RFC 5234 adds a warning in conjunction with the definition of LWSP as follows:
; Use of this linear-white-space rule
; permits lines containing only white
; space that are no longer legal in
; mail headers and have caused
; interoperability problems in other
; contexts.
; Do not use when defining mail
; headers and use with caution in
; other contexts.
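The concern behind this warning can be demonstrated with a short Python sketch. The pattern is a hand translation of *(WSP / CRLF WSP), and consumed_by_lwsp is a hypothetical helper that reports how much leading input the rule would absorb.

```python
import re

# LWSP = *(WSP / CRLF WSP)
LWSP = re.compile(r"(?:[ \t]|\r\n[ \t])*")

def consumed_by_lwsp(text: str) -> str:
    """Return the leading part of text that LWSP would consume."""
    return LWSP.match(text).group()

# Two folded lines that contain nothing but a single space each.
sample = "\r\n \r\n \r\nX"
absorbed = consumed_by_lwsp(sample)
```

Here absorbed is "\r\n \r\n ", so a line holding only a space is silently accepted as part of the white space, which is the situation the warning describes.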