Goal
This document lists the matchers available in Apache Cocoon and
describes their purpose.
See also the concepts document "Using and Implementing
Matchers and Selectors".
Overview
Matchers are among the core components of Cocoon. These sitemap
components allow Cocoon to associate a purely "virtual" URI space
with a given set of instructions that describe how to generate,
transform and present the requested resource(s) to the client.
Cocoon is driven by the client request. A request typically
contains a URI, some parameters, cookies, and much more. The
request, together with the Cocoon environment, is the starting
point for deciding which sitemap instructions will be used. That
decision is made by matching an element of the request (for
example, its URI) against a pattern given as the matcher's
parameter. If the match succeeds, processing of the corresponding
pipeline starts.
As an example, consider the following sitemap snippet:
```xml
<map:match pattern="body-faq.xml">
  <map:generate src="xdocs/faq.xml"/>
  <map:transform src="stylesheets/faq2document.xsl"/>
  <map:transform src="stylesheets/document2html.xsl"/>
  <map:serialize/>
</map:match>
<map:match pattern="body-**.xml">
  <map:generate src="xdocs/{1}.xml"/>
  <map:transform src="stylesheets/document2html.xsl"/>
  <map:serialize/>
</map:match>
```
Here the two sitemap entries are mapped to different virtual URIs in
different ways, both using the default matcher (which matches the
request URI against a wildcard pattern). The first one uses an exact
match ("body-faq.xml"), meaning that only a request URI that is exactly
that string results in a successful match. The second one uses a
wildcard pattern, meaning that every request URI starting with "body-"
and ending with ".xml" satisfies the matcher: a request for a resource
such as "body-cocoon.xml" is therefore matched by the sitemap and
starts the second pipeline.
Order
It is important to understand that Cocoon uses a "first match"
approach. The request is matched against the "map:match" entries in
the order in which they appear in the sitemap: as soon as a match
succeeds, the corresponding pipeline is chosen and started. This means
that more specific patterns must appear before generic ones. In the
example above, if the two pipelines were swapped, a request for
"body-faq.xml" would never reach the FAQ pipeline, since the generic
"body-**.xml" pattern would match first (a concept well known from
router and firewall configurations).
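To make the ordering rule concrete, the fragment below is the earlier snippet with its two entries swapped. It shows the mistake, not a recommended layout: a request for "body-faq.xml" now matches "body-**.xml" first, so {1} becomes "faq" and the page is built from "xdocs/faq.xml" without the "faq2document.xsl" step.

```xml
<!-- Incorrect order: the generic pattern is listed first -->
<map:match pattern="body-**.xml">
  <!-- For body-faq.xml, {1} = "faq" -->
  <map:generate src="xdocs/{1}.xml"/>
  <map:transform src="stylesheets/document2html.xsl"/>
  <map:serialize/>
</map:match>
<!-- Never reached for body-faq.xml: the entry above matches first -->
<map:match pattern="body-faq.xml">
  <map:generate src="xdocs/faq.xml"/>
  <map:transform src="stylesheets/faq2document.xsl"/>
  <map:transform src="stylesheets/document2html.xsl"/>
  <map:serialize/>
</map:match>
```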
Tokenization
Another important feature of matchers is tokenization. Every "variable"
part of the matched pattern is captured by Cocoon and made available to
the subsequent sitemap instructions as a numbered argument. Using the
previous example once again: when a request URI such as "body-index.xml"
comes in and the second pipeline is chosen, the string matched by the
"**" wildcard, "index", is available in the sitemap as the parameter
{1}. This token can then be used in the generator's src attribute:
"xdocs/{1}.xml" is resolved to "xdocs/index.xml", and that is the file
the generator reads.
Wildcards and regular expressions
Most Cocoon matchers are built on one of two techniques:
regular expressions and wildcards.
Regular expressions (or regexps) are a well known and powerful
system for pattern matching: mastering them is outside the scope
of this document, but plenty of documentation on the topic is
available on the web.
Powerful as they are, regexps can be overkill for most typical
Cocoon use cases, where only simple matching operations are
needed. This is why Cocoon also offers a simplified pattern
matching system based on a small set of very simple rules:
-
An asterisk ('*') matches zero or more characters up to the next
occurrence of a '/' character (which is treated as a path separator).
If a string such as '/cocoon/docs/index.html' is matched against the
pattern '/*/*.html', the match is not successful: the first asterisk
matches only up to the first path separator, resulting in the "cocoon"
string, and the second asterisk cannot span "docs/index". With single
asterisks, a pattern that does match is '/*/*/*.html'.
-
A double asterisk ('**') matches zero or more characters, this time
including the path separator ('/'). In the example above, the string
would be matched by the '/**/*.html' pattern: the double asterisk also
matches the path separator and resolves to the "cocoon/docs" string.
-
As with regexps, the backslash character ('\') is used as an escape
character. The string '\*' matches an actual asterisk, while a double
backslash ('\\') matches the character '\'. A pattern such as
'**/a-\*-is-born.html' matches only strings such as
'documents/movies/a-*-is-born.html' or
'a/very/long/path/a-*-is-born.html'. It does not match a string such
as 'docs/a-star-is-born.html'.
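The three rules can be seen side by side in sitemap form. The patterns follow the examples in the list, with one assumption: inside a sitemap the URI reaches the matcher relative to the sitemap and without a leading slash, so the leading '/' is dropped here. The entries are independent illustrations (in a real sitemap the first-match rule decides which one runs), and the "xdocs/" paths are hypothetical.

```xml
<!-- '*' stops at '/': for cocoon/docs/index.html,
     {1} = "cocoon", {2} = "docs", {3} = "index" -->
<map:match pattern="*/*/*.html">
  <map:generate src="xdocs/{1}/{2}/{3}.xml"/>
  <map:serialize/>
</map:match>

<!-- '**' crosses '/': for cocoon/docs/index.html,
     {1} = "cocoon/docs", {2} = "index" -->
<map:match pattern="**/*.html">
  <map:generate src="xdocs/{1}/{2}.xml"/>
  <map:serialize/>
</map:match>

<!-- '\*' is a literal asterisk: matches movies/a-*-is-born.html
     with {1} = "movies", but not docs/a-star-is-born.html -->
<map:match pattern="**/a-\*-is-born.html">
  <map:generate src="xdocs/{1}/born.xml"/>
  <map:serialize/>
</map:match>
```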
The Matchers in Cocoon
-
Wildcard URI matcher (the default matcher): matches the URI against a wildcard pattern.
-
Regexp URI matcher: matches the URI against a full-blown regular expression.
-
Request parameter matcher: matches if the request contains the parameter whose name is given as the pattern. If the parameter exists, its value is available for later substitution.
-
Wildcard request parameter matcher: matches a wildcard pattern against the value of the configured request parameter.
-
Wildcard session parameter matcher: same as the wildcard request parameter matcher, but it matches against the value of a session parameter.
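Matchers other than the default are selected with the type attribute of map:match, which refers to a matcher declared in the map:components section of the sitemap. The fragment below is a sketch under several assumptions: the component names "regexp" and "request-parameter" are chosen for illustration, the class names follow the org.apache.cocoon.matching package of Cocoon 2.x and should be verified against the installed release, and the pipeline paths are hypothetical. It also assumes that both the regular expression's capturing group and the request parameter's value are exposed as {1}.

```xml
<map:components>
  <map:matchers default="wildcard">
    <map:matcher name="wildcard"
                 src="org.apache.cocoon.matching.WildcardURIMatcher"/>
    <map:matcher name="regexp"
                 src="org.apache.cocoon.matching.RegexpURIMatcher"/>
    <map:matcher name="request-parameter"
                 src="org.apache.cocoon.matching.RequestParameterMatcher"/>
  </map:matchers>
</map:components>

<map:pipelines>
  <map:pipeline>
    <!-- Regexp URI matcher: the capturing group becomes {1} -->
    <map:match type="regexp" pattern="^body-([a-z]+)\.xml$">
      <map:generate src="xdocs/{1}.xml"/>
      <map:transform src="stylesheets/document2html.xsl"/>
      <map:serialize/>
    </map:match>
    <!-- Request parameter matcher: matches only if the request
         has an "id" parameter; its value becomes {1} -->
    <map:match type="request-parameter" pattern="id">
      <map:generate src="xdocs/item-{1}.xml"/>
      <map:serialize/>
    </map:match>
  </map:pipeline>
</map:pipelines>
```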