Syntax
Path Expressions
/Root node (absolute path start)
/bookstore/bookDirect child selection
//bookSelect all book nodes anywhere
.Current context node
..Parent of current node
@langAttribute named lang
node()Any node of any type
*Any element node
@*Any attribute
Basic Examples
/html/body/div # absolute path to
//input[@type='text'] # all text inputs //div[@class='main']/* # children of div.main //a/@href # all href attributes
Combining Paths
//book/title | //book/price # union of two paths //h1 | //h2 | //h3 # multiple element types
Axes
Axis Directions
child::Direct children (default axis)
parent::Direct parent
ancestor::All ancestors to root
ancestor-or-self::Ancestors + current node
descendant::All descendants
descendant-or-self::Descendants + current node
following::All nodes after current in document
following-sibling::Siblings after current
preceding::All nodes before current in document
preceding-sibling::Siblings before current
self::Current node only
attribute::Attributes of current node
namespace::Namespace nodes
Axis Examples
//div/child::p #

children of

//td/parent::tr # parent of //h2/following-sibling::p #

after

//li/ancestor::ul #
    containing

Predicates
Filtering with Predicates
//book[1] # first book element //book[last()] # last book element //book[position() < 3] # first two books //book[@lang='en'] # books with lang="en" //book[price > 30] # books with price > 30
Predicate Patterns
[n]Element at position n (1-based)
[last()]Last element
[last()-1]Second to last
[@attr]Has attribute
[@attr='val']Attribute equals value
[element]Has child element
[element='text']Child element contains text
[not(@attr)]Does not have attribute
Chained Predicates
Functions
String Functions
contains(s, sub)True if s contains sub
starts-with(s, pre)True if s starts with pre
string-length(s)Length of string
normalize-space(s)Trim and collapse whitespace
concat(a, b, ...)Join strings
substring(s, pos, len)Extract substring (1-based)
translate(s, from, to)Character-by-character replace
Numeric Functions
sum(node-set)Sum of numeric values
count(node-set)Number of nodes
floor(n)Round down
ceiling(n)Round up
round(n)Round to nearest integer
number(val)Convert to number
Function Examples
//div[contains(@class, 'active')] //a[starts-with(@href, 'https')] //p[string-length(text()) > 100] //ul[count(li) > 5]
Operators
Comparison Operators
=Equal
!=Not equal
<Less than
<=Less than or equal
>Greater than
>=Greater than or equal
Logical & Arithmetic
andLogical AND
orLogical OR
not()Logical NOT (function)
+Addition
-Subtraction
*Multiplication
divDivision
modModulus
|Union of node sets
Operator Examples
//book[price > 20 and price < 50] //item[@type='a' or @type='b'] //span[not(contains(@class, 'hidden'))]
Node Tests
Node Types
node()Any node (element, text, comment, PI)
text()Text node only
comment()Comment node only
processing-instruction()Processing instruction node
*Any element node
@*Any attribute node
element-nameElement with specific name
Node Test Examples
//p/text() # text content of

//div/comment() # comments inside

//body/node() # all child nodes of //div/* # all element children of
Boolean Functions
true()Boolean true
false()Boolean false
boolean(expr)Convert to boolean
not(expr)Negate boolean
lang(code)True if node's lang matches
Abbreviations
Short vs Long Form
(none)child:: (default axis)
@attribute::
///descendant-or-self::node()/
.self::node()
..parent::node()
[n][position()=n]
Abbreviated Examples
# These pairs are equivalent: child::div → div attribute::href → @href /descendant-or-self::node()/p → //p self::node() → . parent::node() → ..
Common Abbreviated Patterns
//div[@id='main'] # div with id="main" //table//td # all in any ../sibling # sibling via parent .//span # span descendants of context
Common Patterns
Web Scraping / Testing
//input[@name='username'] # form input by name //button[text()='Submit'] # button by text //div[contains(@class, 'error')] # element by partial class //a[contains(@href, 'login')] # link by partial href
Conditional Selection
//div[@class='item'][.//span[@class='price']] //tr[td[1]='Active'] # row where 1st cell = Active //*[contains(text(), 'Warning')] # any element with text
XPath in Python (lxml)
from lxml import html tree = html.fromstring(page_content) links = tree.xpath('//a/@href') titles = tree.xpath('//h2/text()')
XPath in Selenium
driver.find_element(By.XPATH, "//input[@id='search']") driver.find_elements(By.XPATH, "//li[@class='result']")