XPath (XML Path Language) is a language for selecting nodes from an XML document. In addition, XPath may be used to compute values (strings, numbers, or boolean values) from the content of an XML document. The current version of the language is XPath 2.0, but because version 1.0 is still the more widely used version, this article describes XPath 1.0.
The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as an XPath.
XPath was originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT; subsets of the XPath query language are also used in other W3C specifications such as XML Schema and XForms.
An XPath expression is evaluated with respect to a context node. An axis specifier such as 'child' or 'descendant' specifies the direction to navigate from the context node. The node test and the predicate then filter the nodes specified by the axis specifier. For example, the node test 'A' requires that all nodes navigated to must have the label 'A'. A predicate can be used to specify that the selected nodes have certain properties, which are themselves specified by XPath expressions.
Two notations are defined; the first, known as abbreviated syntax, is more compact and allows XPaths to be written and read easily using intuitive and, in many cases, familiar characters and constructs. The full syntax is more verbose, but allows for more options to be specified, and is more descriptive if read carefully.
<A>
<B>
<C/>
</B>
</A>
For a document like the one above, the simplest XPath takes a form such as /A/B/C, which selects C elements that are children of B elements that are children of the A element that forms the outermost element of the XML document. XPath syntax is designed to mimic URI (Uniform Resource Identifier) syntax and file path syntax.
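As a minimal sketch of how such a path can be evaluated, the following uses the standard javax.xml.xpath API on the three-element document shown above. The class name SimplePathExample and the helpers parse and count are illustrative names, not part of any specification.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SimplePathExample {
    // Parses an XML string into a DOM tree (checked exceptions are wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the nodes an XPath 1.0 expression selects, with the document node as context.
    static int count(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(xml), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<A><B><C/></B></A>";
        System.out.println(count(xml, "/A/B/C")); // 1: the single C element
        System.out.println(count(xml, "/A/C"));   // 0: C is not a direct child of A
    }
}
```

The second expression returns an empty node-set because each '/' step follows the child axis only.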
More complex expressions can be constructed by specifying an axis other than the default 'child' axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression
A//B/*[1] selects the first element ('[1]'), whatever its name ('*'), that is a child ('/') of a B element that itself is a child or other, deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/'). If there are several suitable B elements in the document, this actually returns a set of all their first children.
In the full syntax, the two examples above would be written
/child::A/child::B/child::C
child::A/descendant-or-self::node()/child::B/child::*[position()=1]
Here, in each step of the XPath, the axis (e.g. child or descendant-or-self) is explicitly specified, followed by :: and then the node test, such as A or node() in the examples above.
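The equivalence of the two notations can be checked with a short sketch, again assuming the javax.xml.xpath API. The sample document and the names FullSyntaxExample and names are made up for illustration; the two expressions are the abbreviated and full forms of the same path.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FullSyntaxExample {
    // Hypothetical sample: two B elements at different depths below A.
    static final String XML = "<A><B><C/><D/></B><X><B><E/></B></X></A>";

    // Returns the names of the nodes selected by the expression,
    // evaluated with the document node as the context node.
    static List<String> names(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String abbreviated = "A//B/*[1]";
        String full = "child::A/descendant-or-self::node()/child::B/child::*[position()=1]";
        System.out.println(names(XML, abbreviated)); // first child of each B
        System.out.println(names(XML, full));        // same node-set
    }
}
```

With this sample both expressions select C and E, the first child of each of the two B elements.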
| Full Syntax | Abbreviated Syntax | Notes |
|---|---|---|
| ancestor | | |
| ancestor-or-self | | |
| attribute | @ | @abc is short for attribute::abc |
| child | | xyz is short for child::xyz |
| descendant | | |
| descendant-or-self | // | // is short for /descendant-or-self::node()/ |
| following | | |
| following-sibling | | |
| namespace | | |
| parent | .. | .. is short for parent::node() |
| preceding | | |
| preceding-sibling | | |
| self | . | . is short for self::node() |
As an example of using the attribute axis in abbreviated syntax, //a/@href selects the attribute called href in a elements anywhere in the document tree.
The expression . (an abbreviation for self::node()) is most commonly used within a predicate to refer to the currently selected node.
For example, h3[.='See also'] selects an element called h3 in the current context, whose text content is See also.
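The attribute axis and the . abbreviation can be tried out with a sketch like the one below, which assumes the javax.xml.xpath API and a small invented HTML-like document. AttributeAxisExample and values are illustrative names; h3[.='See also'] is written as //h3[.='See also'] here so that it can be evaluated from the document node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeAxisExample {
    // Hypothetical sample document, not taken from the article.
    static final String XML =
            "<html><body><h3>See also</h3><h3>Notes</h3>"
            + "<a href='help.php'>Help</a><a href='index.php'>Home</a></body></html>";

    // Returns the text content of every selected node
    // (for an attribute node this is the attribute value).
    static List<String> values(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(values(XML, "//a/@href"));            // both href values
        System.out.println(values(XML, "//h3[.='See also']"));   // only the matching h3
    }
}
```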
If the namespace prefix gs has been defined, //gs:enquiry will find all the enquiry elements in that namespace, and //gs:* will find all elements, regardless of local name, in that namespace.
Other node test formats are:
comment(): finds an XML comment node.
text(): finds a node of type text, such as the character data inside an element.
processing-instruction(): finds XML processing instructions; given an argument, as in processing-instruction('php'), it matches only processing instructions with that target.
node(): finds any node at all.
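The node tests that are not element names can be illustrated with the following sketch, assuming the javax.xml.xpath API. The sample document, with one comment, one element k and one php processing instruction, is invented for this example, as are the names NodeTestExample and count. Namespace-prefixed tests are left out because they additionally require a namespace-aware parser and a NamespaceContext.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeTestExample {
    // Hypothetical sample: a comment, an element with text, and a processing instruction.
    static final String XML = "<doc><!-- a comment --><k>hello</k><?php echo $a; ?></doc>";

    // Counts the nodes selected by the expression, with the document node as context.
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(XML, "//comment()"));                     // the comment node
        System.out.println(count(XML, "//k/text()"));                      // the text node "hello"
        System.out.println(count(XML, "//processing-instruction('php')")); // target matches
        System.out.println(count(XML, "//processing-instruction('xsl')")); // no such target
        System.out.println(count(XML, "/doc/node()"));                     // every child of doc
    }
}
```

Because the sample contains no whitespace between tags, /doc/node() selects exactly three nodes: the comment, the k element and the processing instruction.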
A predicate is an expression written in square brackets that a node must satisfy in order to be matched. For example, //a[@href='help.php'] will match an a element with an href attribute whose value is help.php. There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (i.e. that of the immediately preceding node test) and do not alter that context. All predicates must be satisfied for a match to occur.
When //a[/html/@lang='en'][@href='help.php'][1]/@target is applied to an XHTML document, it selects the value of the target attribute of the first a element, among the a children of a given parent, that has its href attribute set to help.php, provided the document's html top-level element also has a lang attribute set to en. The reference to an attribute of the top-level element in the first predicate affects neither the context of other predicates nor that of the location step itself.
Predicate order is significant, however. Each predicate 'filters' a location step's selected node-set in turn. //a[1][/html/@lang='en'][@href='help.php']/@target will find a match only if the first a child of some element in a @lang='en' document also meets @href='help.php'.
Note that //a[1] selects every a element that is the first a child of its parent, not the first a element in the document; the latter is written (//a)[1].
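The effect of predicate order, and the difference between //a[1] and (//a)[1], can be seen in a sketch such as this one. It assumes the javax.xml.xpath API; the document is an invented, namespace-free stand-in for an XHTML page, and PredicateOrderExample and values are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateOrderExample {
    // Hypothetical sample: a elements under two different parents (p and div).
    static final String XML =
            "<html lang='en'><body>"
            + "<p><a href='index.php' target='_self'/><a href='help.php' target='_blank'/></p>"
            + "<div><a href='help.php' target='_top'/></div>"
            + "</body></html>";

    // Returns the text content (here: attribute values) of the selected nodes.
    static List<String> values(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Position is applied last: first help.php link under each parent.
        System.out.println(values(XML, "//a[/html/@lang='en'][@href='help.php'][1]/@target"));
        // Position is applied first: only parents whose first a is a help.php link.
        System.out.println(values(XML, "//a[1][/html/@lang='en'][@href='help.php']/@target"));
        // First a child of every parent versus first a in the whole document.
        System.out.println(values(XML, "//a[1]/@href"));
        System.out.println(values(XML, "(//a)[1]/@href"));
    }
}
```

In this sample the first expression yields two target values and the second only one, because the first a child of the p element is not a help.php link.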
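The available operators are the path operators / and //, the union operator |, the boolean operators and and or, the arithmetic operators +, -, *, div and mod, and the comparison operators =, !=, <, >, <= and >=.
The function library includes functions that manipulate strings, numbers, booleans and node-sets.
Some of the more commonly useful functions are detailed below. For a complete description, see the W3C Recommendation document.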
contains(s1, s2): returns true if the string s1 contains the string s2.
normalize-space(string?): all leading and trailing whitespace is removed and any sequences of whitespace characters are replaced by a single space. This is very useful when the original XML may have been pretty-printed, which could make further string processing unreliable.
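A brief sketch of these two functions, assuming the javax.xml.xpath API, shows why normalize-space() matters for pretty-printed input. The sample title text and the names StringFunctionExample and evaluate are invented for illustration; the two-argument XPath.evaluate call returns the result converted to a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StringFunctionExample {
    // Hypothetical sample with the irregular whitespace that pretty-printing can leave behind.
    static final String XML = "<doc><title>\n   XPath   1.0\n  overview </title></doc>";

    // Evaluates an XPath 1.0 expression and returns its result converted to a string.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Whitespace is trimmed and collapsed to single spaces.
        System.out.println(evaluate(XML, "normalize-space(/doc/title)"));
        // The raw text has three spaces between the words, so this substring is absent.
        System.out.println(evaluate(XML, "contains(/doc/title, 'XPath 1.0')"));
        // After normalization the same test succeeds.
        System.out.println(evaluate(XML, "contains(normalize-space(/doc/title), 'XPath 1.0')"));
    }
}
```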
Expressions inside predicates can use the comparison operators =, !=, <=, <, >= and >. Boolean expressions may be combined with brackets () and the boolean operators and and or, as well as the not() function. Numeric calculations can use *, +, -, div and mod. Strings can consist of any Unicode characters.
For example, //item[@price > 2*@discount] selects items whose price attribute is greater than twice the numeric value of their discount attribute.
Entire node-sets can be combined ('unioned') using the pipe character |. Node sets that meet one or more of several conditions can be found by combining the conditions inside a predicate with 'or'.
For example, v[x or y] | w[z] will return a single node-set consisting of all the v elements that have x or y child-elements, as well as all the w elements that have z child-elements, that were found in the current context.
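The operators and the union described above can be exercised with the following sketch, assuming the javax.xml.xpath API. The sample document, its id attributes, and the names OperatorExample and ids are invented for illustration; expressions are evaluated with the root element as the context node so that the relative path v[x or y] | w[z] can be used as written.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OperatorExample {
    // Hypothetical sample; the id attributes exist only to label the results.
    static final String XML =
            "<root>"
            + "<item id='i1' price='10' discount='3'/><item id='i2' price='5' discount='4'/>"
            + "<v id='v1'><x/></v><v id='v2'><y/></v><v id='v3'/>"
            + "<w id='w1'><z/></w><w id='w2'/>"
            + "</root>";

    // Returns the id attribute of every selected element,
    // evaluating the expression with the root element as context node.
    static List<String> ids(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc.getDocumentElement(), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids(XML, "//item[@price > 2*@discount]"));             // 10 > 6 only
        System.out.println(ids(XML, "item[@price >= 5 and not(@discount < 4)]")); // combined tests
        System.out.println(ids(XML, "v[x or y] | w[z]"));                         // union of two sets
    }
}
```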
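Consider a sample XML document in which a wikimedia root element contains a projects element, each project element carries a name attribute, and each project's editions element holds edition elements that have a language attribute and contain an address as text: en.wikipedia.org, de.wikipedia.org, fr.wikipedia.org, pl.wikipedia.org, es.wikipedia.org, en.wiktionary.org, fr.wiktionary.org, vi.wiktionary.org, tr.wiktionary.org and es.wiktionary.org.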
The XPath expression
/wikimedia/projects/project/@name
selects the name attributes of all projects,
/wikimedia//editions
selects all editions of all projects,
/wikimedia/projects/project/editions/edition[@language="English"]/text()
selects the addresses of all English Wikimedia projects (the text of all edition elements whose language attribute is equal to English), and
/wikimedia/projects/project[@name="Wikipedia"]/editions/edition/text()
selects the addresses of all Wikipedias (the text of all edition elements that exist under a project element with a name attribute of Wikipedia).
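These four expressions can be run against a small stand-in document, as in the sketch below, which assumes the javax.xml.xpath API. The XML here is a shortened, assumed reconstruction with two projects and two editions each, not the article's full sample; WikimediaExample and values are illustrative names, and the string literals use single quotes, which XPath treats the same as double quotes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WikimediaExample {
    // Assumed, shortened sample with the element and attribute names used by the expressions.
    static final String XML =
            "<wikimedia><projects>"
            + "<project name='Wikipedia'><editions>"
            + "<edition language='English'>en.wikipedia.org</edition>"
            + "<edition language='German'>de.wikipedia.org</edition>"
            + "</editions></project>"
            + "<project name='Wiktionary'><editions>"
            + "<edition language='English'>en.wiktionary.org</edition>"
            + "<edition language='French'>fr.wiktionary.org</edition>"
            + "</editions></project>"
            + "</projects></wikimedia>";

    // Returns the text content of every node selected from the document node.
    static List<String> values(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(values(XML, "/wikimedia/projects/project/@name"));
        System.out.println(values(XML, "/wikimedia//editions").size() + " editions elements");
        System.out.println(values(XML,
                "/wikimedia/projects/project/editions/edition[@language='English']/text()"));
        System.out.println(values(XML,
                "/wikimedia/projects/project[@name='Wikipedia']/editions/edition/text()"));
    }
}
```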
C/C++
The Java package javax.xml.xpath has been part of Java standard edition since Java 5. Technically this is an XPath API rather than an XPath implementation, and it lets the programmer select a specific implementation that conforms to the interface.
JavaScript
