Overview
Every XSLT stylesheet you have written so far takes a single XML file as its input. Batch processing means applying the same transformation to an entire directory of XML files at once. This page covers the pure XSLT approach: declaring the input corpus from within the stylesheet itself using the collection() function, with no external configuration required.
There are two output shapes to know:
- Many-to-one: many XML input files produce a single HTML output. Useful for aggregate analysis — statistics, indexes, summaries across a whole corpus.
- Many-to-many: many XML input files each produce their own HTML output file. Useful for generating a set of individual pages — one per document — from a corpus.
Both share the same three-part infrastructure: a corpus variable using collection(), an xsl:initial-template entry point, and xsl:result-document to write output to named files. The only structural difference between the two shapes is whether xsl:result-document fires once or inside a loop.
Before running any batch stylesheet in oXygen, set the XML input dropdown to (None). If a document is selected there, the processor typically starts from that document and applies match templates to it instead of running xsl:initial-template, so the stylesheet appears to ignore the corpus. This is the most common source of errors in batch processing.
Variables and collection()
xsl:variable
A variable stores a value for reuse elsewhere in the stylesheet. XSLT variables are immutable — unlike Python variables, once declared the value cannot be changed. The basic syntax is:
<xsl:variable name="my-variable" as="xs:integer" select="42"/>
The name= attribute gives the variable its identifier; as= declares its type; select= provides its value. You reference a variable elsewhere in the stylesheet with a $ prefix: $my-variable.
For a corpus of XML documents the type is document-node()+:
- document-node() — the root node of a parsed XML document. This is the node above the root element — the same thing / refers to in XPath.
- + — one or more (the same quantifier as in RelaxNG schemas).
The collection() function
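The collection() function takes a URI and returns a sequence of items. In Saxon (the processor referred to later on this page), a URI that points to a directory returns the matching XML documents in it as a sequence of document nodes. The ?select= and ?recurse= options shown below are Saxon's query-parameter syntax, and a relative path is resolved against the location of the stylesheet.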
Flat directory (no subdirectories)
<xsl:variable name="my-corpus" as="document-node()+"
select="collection('./my-collection?select=*.xml')"/>
The ?select=*.xml filter is important. Without it, any non-XML file in the directory (a .DS_Store, a README.md) produces a cryptic error. The ? signals that options follow; *.xml matches any filename ending in .xml.
Recursive directory (includes nested subdirectories)
<xsl:variable name="my-corpus" as="document-node()+"
select="collection('./my-collection?recurse=yes;select=*.xml')"/>
recurse=yes tells collection() to descend into subdirectories. Multiple options are separated by semicolons. This is useful when a corpus is organized into subfolders by author, date, genre, or any other grouping.
Querying across the corpus
Once the corpus is in a variable, XPath expressions can run across all documents at once. Reference the variable with a $ prefix:
<xsl:value-of select="count($my-corpus//item)"/>
The //item steps down through all document nodes in the sequence simultaneously — no loop needed for a simple aggregate count. For output that must be computed or written per document, use xsl:for-each to iterate over the corpus variable.
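The difference between a corpus-wide query and a per-document query can be sketched as follows. This is a minimal illustration, not a full stylesheet: it assumes the $my-corpus variable declared above and documents that contain item elements, and it uses base-uri() only as a convenient label for each document.

```xml
<!-- Sketch only: place inside a stylesheet that declares $my-corpus as above -->
<xsl:template name="xsl:initial-template">
  <div>
    <!-- Corpus-wide: one number for all documents together, no loop -->
    <p>All items: <xsl:value-of select="count($my-corpus//item)"/></p>
    <!-- Per document: inside the loop the context item is one document node -->
    <xsl:for-each select="$my-corpus">
      <p>
        <xsl:value-of select="base-uri()"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="count(.//item)"/>
      </p>
    </xsl:for-each>
  </div>
</xsl:template>
```

The first count goes through the variable and so covers every document; the second starts from the context item and so covers only the document currently being visited.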
xsl:initial-template
Every stylesheet you have written so far begins by matching the root node of a single input document:
<xsl:template match="/">
...
</xsl:template>
That works because oXygen supplies a default input document via the XML dropdown. With collection(), there is no default input document — the corpus is declared inside the stylesheet, and the XML dropdown is set to (None). The root-match template therefore never fires.
The solution is xsl:initial-template: a named template the processor runs first, before any document matching:
<xsl:template name="xsl:initial-template">
<!-- entry point: query the corpus variable and write output here -->
</xsl:template>
Note that xsl:initial-template is the value of the name= attribute on an ordinary xsl:template element. The xsl: prefix inside the value is intentional: it is a name reserved by XSLT 3.0, not a typo. Named templates are not triggered by document processing; they run when explicitly called. In this case the processor calls the template automatically at startup because of the special reserved name.
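For comparison, here is a small sketch of an ordinary named template next to the reserved one. The name corpus-heading is hypothetical and chosen only for illustration; the point is that it runs solely because xsl:call-template asks for it, whereas the reserved name needs no caller.

```xml
<!-- Sketch only: "corpus-heading" is a made-up template name -->
<xsl:template name="xsl:initial-template">
  <!-- The processor starts here because of the reserved name -->
  <xsl:call-template name="corpus-heading"/>
</xsl:template>

<!-- An ordinary named template: runs only when xsl:call-template invokes it -->
<xsl:template name="corpus-heading">
  <h1>Corpus report</h1>
</xsl:template>
```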
xsl:result-document
Normally a stylesheet writes all output to the primary output tree — the single file configured in the oXygen output dropdown. With batch processing you often want to control exactly where output goes. xsl:result-document lets you write to any named file:
<xsl:result-document href="output/myfile.html" method="html">
<!-- content to write to this file -->
</xsl:result-document>
The href is an AVT (the curly-brace attribute value template syntax you already know), so the filename can be computed dynamically. Saxon will create intermediate directories (such as output/) if they do not already exist.
In a many-to-one stylesheet, xsl:result-document appears once with a hard-coded filename. In a many-to-many stylesheet, it appears inside an xsl:for-each loop so it fires once per input document.
Many-to-one output
The many-to-one pattern collects data from all documents in the corpus and writes a single HTML output — a summary page, a statistics table, an index. The xsl:result-document fires once, outside any loop, with a hard-coded filename.
Skeleton
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0"
xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:output method="html" indent="yes"/>
<!-- 1. Declare the corpus variable -->
<xsl:variable name="my-corpus" as="document-node()+"
select="collection('./my-collection?select=*.xml')"/>
<!-- 2. Named entry point -->
<xsl:template name="xsl:initial-template">
<!-- 3. Write to a single named file -->
<xsl:result-document href="output/summary.html" method="html">
<html>
<head><title>Corpus Summary</title></head>
<body>
<h1>Corpus Summary</h1>
<!-- Corpus-level aggregate: no loop needed -->
<p>Total documents: <xsl:value-of
select="count($my-corpus)"/></p>
<p>Total items: <xsl:value-of
select="count($my-corpus//item)"/></p>
<!-- Per-document breakdown: iterate with for-each -->
<table>
<tr><th>Document</th><th>Item count</th></tr>
<xsl:for-each select="$my-corpus">
<!-- Sort on the first title only: a sort key must be a single value -->
<xsl:sort select="(.//title)[1]"/>
<tr>
<td><xsl:value-of select=".//title"/></td>
<td><xsl:value-of select="count(.//item)"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
Key points
- Corpus-level counts query $my-corpus directly — //item steps across all document nodes in the sequence simultaneously. No loop needed.
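- Inside xsl:for-each, . is the current document node. Use .//item to query within that document only. When the context item is a document node, //item happens to select the same nodes, but inside a match template (where . is an element) //item jumps back to the root of the current document and ignores the element you are in, so the leading . is the safer habit. To query the entire corpus, go through the variable: $my-corpus//item.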
- xsl:sort takes select= (an expression), not match= (a pattern).
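The scoping rules can be seen side by side in a match template. The sketch below assumes that item elements sit inside list elements, which is an assumption about the data rather than something the skeleton requires, and that $my-corpus is declared as in the skeleton.

```xml
<!-- Sketch only: assumes <item> elements nested inside <list> elements -->
<xsl:template match="list">
  <!-- The context item here is one <list> element, not a document node -->
  <p>
    <xsl:text>Items in this list: </xsl:text>
    <!-- Leading dot: search only below the current <list> -->
    <xsl:value-of select="count(.//item)"/>
    <xsl:text> of </xsl:text>
    <!-- Through the variable: search every document in the corpus -->
    <xsl:value-of select="count($my-corpus//item)"/>
    <xsl:text> in the corpus</xsl:text>
  </p>
</xsl:template>
```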
Many-to-many output
The many-to-many pattern produces one output file per input document. xsl:result-document moves inside an xsl:for-each loop so it fires once per iteration. The corpus variable and xsl:initial-template are identical to the many-to-one pattern.
Skeleton
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0"
xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:output method="html" indent="yes"/>
<!-- 1. Declare the corpus variable -->
<xsl:variable name="my-corpus" as="document-node()+"
select="collection('./my-collection?select=*.xml')"/>
<!-- 2. Named entry point -->
<xsl:template name="xsl:initial-template">
<!-- 3. Loop over the corpus -->
<xsl:for-each select="$my-corpus">
<!-- 4. Write one file per document -->
<xsl:result-document href="output/doc_{position()}.html" method="html">
<html>
<head>
<title><xsl:value-of select=".//title"/></title>
</head>
<body>
<h1><xsl:value-of select=".//title"/></h1>
<xsl:apply-templates select=".//item"/>
</body>
</html>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<!-- 5. Match templates work exactly as in a single-document stylesheet -->
<xsl:template match="item">
<p><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
Key points
- xsl:result-document is inside the loop: one call per iteration, one output file per document.
- XPath expressions inside the loop should be scoped with a leading . so that they start from the current context item. Only expressions that go through $my-corpus reach the whole corpus.
- Match templates work exactly as they do in a single-document stylesheet. The push model is unchanged.
- Nothing is written to the primary output tree when all content goes through xsl:result-document. oXygen may warn about an empty result — this is not an error.
- If a document in the corpus does not contain the element you are querying, its output file is still produced but the relevant portion of the body will be empty. Batch processing does not guarantee every document contains what you are looking for.
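If empty output files are unwanted, one possible guard is to wrap xsl:result-document in xsl:if. The sketch below assumes the $my-corpus variable and the item match template from the skeleton; testing for item is just an example of "the element you are querying".

```xml
<!-- Sketch only: skips documents that contain no <item> at all -->
<xsl:for-each select="$my-corpus">
  <!-- exists() is true when the sequence has at least one node -->
  <xsl:if test="exists(.//item)">
    <xsl:result-document href="output/doc_{position()}.html" method="html">
      <html>
        <head><title><xsl:value-of select="(.//title)[1]"/></title></head>
        <body>
          <xsl:apply-templates select=".//item"/>
        </body>
      </html>
    </xsl:result-document>
  </xsl:if>
</xsl:for-each>
```

Because position() still counts the skipped documents, the output numbering will have gaps. Filtering in the loop itself, with select="$my-corpus[.//item]", avoids that, since position() then counts only the documents that pass the filter.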
Dynamic filenames
Using position() — simple and transparent
position() returns the position of the current node in the sequence being iterated. Inside an xsl:for-each over the corpus, it gives each document a unique number. Use it in an AVT inside the href:
<xsl:result-document href="output/doc_{position()}.html" method="html">
This produces doc_1.html, doc_2.html, etc. The filenames are not descriptive of their content, but the mechanism is completely transparent and easy to reason about.
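A small optional variation, sketched here under the assumption that the corpus has fewer than 1000 documents: format-number() can zero-pad the position so that the files sort correctly in a file browser (doc_001.html, doc_002.html, and so on). The loop and match templates are those of the skeleton.

```xml
<!-- Sketch only: assumes the $my-corpus variable from the skeleton -->
<xsl:for-each select="$my-corpus">
  <!-- format-number pads the position to three digits: 1 becomes 001 -->
  <xsl:result-document
      href="output/doc_{format-number(position(), '000')}.html"
      method="html">
    <html>
      <head><title>Document <xsl:value-of select="position()"/></title></head>
      <body>
        <xsl:apply-templates select=".//item"/>
      </body>
    </html>
  </xsl:result-document>
</xsl:for-each>
```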
Derived from the input filename — descriptive
To name each output file after its input file — so that
myfile.xml produces myfile.html — declare a variable
inside the loop that computes the filename from the input document's URI:
<xsl:variable name="filename"
select="substring-before(
tokenize(base-uri(), '/')[last()],
'.xml') || '.html'"/>
Then reference it in the href with an AVT:
<xsl:result-document href="output/{$filename}" method="html">
Breaking down the expression:
- base-uri(): returns the URI of the current document (for a local file, a file: URI ending in the filename)
- tokenize(..., '/')[last()]: splits the URI on / and selects the last segment, giving the bare filename
- substring-before(..., '.xml'): returns everything before .xml, stripping the extension without a regular expression
- || '.html': concatenates the new extension (|| is the XPath string concatenation operator)
Because this variable is declared inside the xsl:for-each loop, it is re-evaluated for each document in turn — each iteration gets its own value of $filename.
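Put together, the variable and the href might sit in the loop like this. It is a sketch that reuses the $my-corpus variable, the xs namespace declaration, and the item match template from the many-to-many skeleton; nothing else is assumed.

```xml
<!-- Sketch only: the derived-filename pattern inside the many-to-many loop -->
<xsl:template name="xsl:initial-template">
  <xsl:for-each select="$my-corpus">
    <!-- Declared inside the loop, so it is recomputed for each document -->
    <xsl:variable name="filename" as="xs:string"
      select="substring-before(tokenize(base-uri(), '/')[last()], '.xml') || '.html'"/>
    <xsl:result-document href="output/{$filename}" method="html">
      <html>
        <head><title><xsl:value-of select="$filename"/></title></head>
        <body>
          <xsl:apply-templates select=".//item"/>
        </body>
      </html>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>
```

One thing to watch with recurse=yes: two input files with the same name in different subfolders would map to the same output filename, and the processor will normally refuse to write two result documents to one location.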
Quick reference
The two output patterns compared
| | Many-to-one | Many-to-many |
|---|---|---|
| Entry point | xsl:initial-template | xsl:initial-template |
| Corpus variable | collection() | collection() |
| xsl:result-document | once, hard-coded filename | inside loop, dynamic filename |
| Primary output tree | empty (warning, not error) | empty (warning, not error) |
| xsl:for-each purpose | one row or section per document | one output file per document |
Common errors
| Symptom | Cause | Fix |
|---|---|---|
| Cryptic collection() error | Non-XML file in the directory | Add ?select=*.xml to the path |
| Wrong output / stylesheet ignores corpus | XML input not set to (None) in oXygen | Set XML dropdown to (None) |
| XPath in a template returns matches from outside the current element | Leading // restarts from the document root | Use .//element not //element |
| Empty result warning in oXygen | Nothing written to primary output tree | Not an error: expected when all output goes through xsl:result-document |
| Output file produced but body is empty | Document doesn't contain the queried element | Expected batch behavior; add xsl:if guard if needed |
collection() syntax
| Use case | Syntax |
|---|---|
| Flat directory, XML files only | collection('./my-dir?select=*.xml') |
| Recursive (nested subdirectories), XML only | collection('./my-dir?recurse=yes;select=*.xml') |