regcomp(3)
NAME
regcomp, regerror, regexec, regfree - Compare string to regular expression
SYNOPSIS
#include <sys/types.h>
#include <regex.h>
int regcomp(
regex_t *preg,
const char *pattern,
int cflags );
size_t regerror(
int errcode,
const regex_t *preg,
char *errbuf,
size_t errbuf_size );
int regexec(
const regex_t *preg,
const char *string,
size_t nmatch,
regmatch_t *pmatch,
int eflags );
void regfree(
regex_t *preg );
LIBRARY
Standard C Library (libc)
STANDARDS
Interfaces documented on this reference page conform to industry standards
as follows:
regcomp(), regexec(), regerror(), regfree(): POSIX.2, XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
PARAMETERS
cflags
Specifies the options for regcomp(). The cflags parameter is the
bitwise inclusive OR of zero or more of the following options, which
are defined in the /usr/include/regex.h file.
REG_EXTENDED
Uses extended regular expressions.
REG_ICASE
Ignores case in match.
REG_NOSUB
Reports only success or failure in regexec(); does not report
subexpressions.
REG_NEWLINE
Treats newline as a special character marking the end and
beginning of lines.
pattern
Contains the basic or extended regular expression to be compiled by
regcomp().
preg
The structure that contains the compiled basic or extended regular
expression.
errcode
Identifies the error code.
errbuf
Points to the buffer where regerror() stores the message text.
errbuf_size
Specifies the size of the errbuf buffer.
string
Contains the data to be matched.
nmatch
Specifies the maximum number of elements that regexec() fills in the
pmatch array.
pmatch
Contains the array of offsets into the string parameter that match the
corresponding subexpression in the preg parameter.
eflags
Specifies the options controlling the customizable behavior of the
regexec() function. The eflags parameter modifies the interpretation of
the contents of the string parameter. The value for this parameter is
formed by bitwise inclusive ORing zero or more of the following
options, which are defined in the /usr/include/regex.h file.
REG_NOTBOL
The first character of the string pointed to by the string
parameter is not the beginning of the line. Therefore, the ^
(circumflex), when taken as a special character, does not match
the beginning of the string parameter.
REG_NOTEOL
The last character of the string pointed to by the string
parameter is not the end of the line. Therefore, the $ (dollar
sign), when taken as a special character, does not match the
end of the string parameter.
DESCRIPTION
The regcomp(), regerror(), regexec(), and regfree() functions perform
regular expression matching. The regcomp() function compiles a regular
expression and the regexec() function compares the compiled regular
expression to a string. The regerror() function returns text associated
with an error condition encountered by regcomp() or regexec(). The
regfree() function frees the internal storage allocated for the compiled
regular expression.
The regcomp() function compiles the basic or extended regular expression
specified by the pattern parameter and places the output in the preg
structure. The default regular expression type for the pattern parameter is
a basic regular expression. An application can specify extended regular
expressions with the REG_EXTENDED option.
If the REG_NOSUB option is not set in cflags, the regcomp() function sets
the re_nsub member of the preg structure to the number of parenthetic
subexpressions (delimited by \( and \) in basic regular expressions or by
() in extended regular expressions) found in pattern.
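As an illustration only (it is not part of the original reference page),
the following sketch reads the subexpression count after compiling a
pattern. The helper name count_subexpressions() is hypothetical; the
re_nsub member is the one used by the second program under EXAMPLES.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile the pattern and return the number of
 * parenthetic subexpressions that regcomp() found, or -1 if the pattern
 * does not compile.
 */
static long count_subexpressions(const char *pattern, int cflags)
{
    regex_t preg;
    long n;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;
    n = (long)preg.re_nsub;   /* set by regcomp() when REG_NOSUB is clear */
    regfree(&preg);
    return n;
}
```

With REG_EXTENDED, the pattern "(a)(b(c))" is expected to report 3
subexpressions. Without REG_EXTENDED, "(a)(b)" should report 0, because
unescaped parentheses are ordinary characters in a basic regular
expression.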
The regexec() function compares the null-terminated string in the string
parameter against the compiled basic or extended regular expression in the
preg parameter. If a match is found, the regexec() function returns a
value of 0 (zero). The regexec() function returns REG_NOMATCH if there is
no match. Any other nonzero value returned indicates an error.
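The compile and match sequence described above can be sketched as
follows. This is a minimal illustration rather than a definitive
implementation; the helper name matches() and its 1/0/-1 return
convention are assumptions made for this sketch.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if string contains a match for pattern,
 * 0 if it does not, and -1 if the pattern fails to compile or regexec()
 * reports an error other than REG_NOMATCH.
 */
static int matches(const char *pattern, const char *string, int cflags)
{
    regex_t preg;
    int rc;

    /* REG_NOSUB: only success or failure is wanted, not offsets */
    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);           /* release storage allocated by regcomp() */

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

For example, matches("ab+c", "xabbbc", REG_EXTENDED) should return 1,
whereas matches("ab+c", "xabbbc", 0) should return 0 because + is an
ordinary character in a basic regular expression.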
If the value of the nmatch parameter is 0 (zero) or if the REG_NOSUB option
was set on the call to the regcomp() function, the regexec() function
ignores the pmatch parameter. Otherwise, the pmatch parameter points to an
array of at least the number of elements specified by the nmatch parameter.
The regexec() function fills in the elements of the array pointed to by the
pmatch parameter with offsets of the substrings of the string parameter.
The elements of the pmatch array correspond to the parenthetic
subexpressions of the original pattern parameter that was specified to the
regcomp() function. The pmatch[i].rm_so member is the byte offset of the
beginning of the substring, and the pmatch[i].rm_eo member is one
greater than the byte offset of the end of the substring. Subexpression i
begins at the ith matched open parenthesis, counting from 1. The 0 (zero)
element of the array corresponds to the entire pattern. Unused elements of
the pmatch parameter, up to the value pmatch[nmatch-1], are filled with -1.
If the number of subexpressions exceeds the number specified by the nmatch
parameter (the pattern parameter itself counts as a subexpression), only
the first nmatch-1 are recorded.
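Because rm_so and rm_eo are byte offsets into the string parameter, the
text of a subexpression can be copied out with simple pointer arithmetic.
The sketch below assumes extended regular expressions and a fixed array
of 10 elements; the helper name capture() is hypothetical.

```c
#include <regex.h>
#include <string.h>

#define NSLOTS 10   /* assumed array size for this sketch */

/*
 * Hypothetical helper: copy the text matched by subexpression idx
 * (0 is the entire match) into out. Returns the length copied, or -1
 * if there was no match, the subexpression did not participate, or
 * out is too small.
 */
static int capture(const char *pattern, const char *string, size_t idx,
                   char *out, size_t outsz)
{
    regex_t preg;
    regmatch_t pm[NSLOTS];
    int len = -1;

    if (idx >= NSLOTS || outsz == 0)
        return -1;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&preg, string, NSLOTS, pm, 0) == 0 && pm[idx].rm_so != -1) {
        /* rm_eo is one past the last byte, so the difference is the length */
        size_t n = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);
        if (n < outsz) {
            memcpy(out, string + pm[idx].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }
    regfree(&preg);
    return len;
}
```

With the pattern "([a-z]+)=([0-9]+)" and the string "set width=42;",
index 0 should yield "width=42", index 1 "width", and index 2 "42".
Index 3 is an unused element, so the helper returns -1.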
When matching a basic or extended regular expression, any given parenthetic
subexpression of the pattern parameter can participate in the match of
several different substrings of the string parameter; however, it may not
match any substring even though the pattern as a whole did match. The
following rules are used to determine which substrings to report in the
pmatch parameter when matching regular expressions:
· If a subexpression in a regular expression participated in the match
several times, the offset of the last matching substring is reported
in the pmatch parameter.
· If a subexpression did not participate in a match, the byte offset in
the pmatch parameter is a value of -1.
· If subexpression i is contained in subexpression j, the offsets
reported in pmatch[i] refer to the last match of subexpression i within
the substring reported in pmatch[j].
· If subexpression i is contained in subexpression j and the byte
offsets in pmatch[j] have a value of -1, the offsets in pmatch[i]
also have a value of -1.
· If a subexpression matched a zero-length string, both offsets in the
pmatch parameter refer to the byte immediately following the
zero-length string.
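The reporting rules above can be observed with a small probe such as the
following sketch. The helper name sub_offsets() and the array size of 4
are assumptions for illustration; extended regular expressions are used.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match pattern against string and store the
 * offsets reported for subexpression idx (idx < 4). Returns 0 if the
 * pattern as a whole matched, nonzero otherwise.
 */
static int sub_offsets(const char *pattern, const char *string, size_t idx,
                       regoff_t *so, regoff_t *eo)
{
    regex_t preg;
    regmatch_t pm[4];
    int rc;

    if (idx >= 4)
        return -1;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&preg, string, 4, pm, 0);
    if (rc == 0) {
        *so = pm[idx].rm_so;   /* -1 if idx did not participate */
        *eo = pm[idx].rm_eo;
    }
    regfree(&preg);
    return rc;
}
```

Matching "(ab|cd)+" against "abcd" should report offsets 2 and 4 for
subexpression 1, the last of its two matches. Matching "(a)|(b)" against
"b" should report -1 for subexpression 1, which did not participate.
Matching "(x*)b" against "ab" should report 1 and 1 for subexpression 1,
a zero-length match.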
If the REG_NOSUB option was set in the cflags parameter in the call to the
regcomp() function and the nmatch parameter is not equal to 0 (zero) in the
call to the regexec() function, the content of the pmatch array is
unspecified.
If the REG_NEWLINE option was not set in the cflags parameter when the
regcomp() function was called, a newline character in the pattern or string
parameter is treated as an ordinary character. If the REG_NEWLINE option
was set when the regcomp() function was called, the newline character is
treated as an ordinary character, except as follows:
· A newline character in the string parameter is not matched by a .
(dot) outside of a bracket expression or by any form of a nonmatching
list.
· A ^ (circumflex) in the pattern parameter, when used to specify
expression anchoring, matches the zero-length string immediately after
a newline character in the string parameter, regardless of the setting
of the REG_NOTBOL option.
· A $ (dollar sign) in the pattern parameter, when used to specify
expression anchoring, matches the zero-length string immediately
before a newline character in the string parameter, regardless of the
setting of the REG_NOTEOL option.
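The effect of REG_NEWLINE on anchoring and on the . (dot) can be checked
with a sketch like the one below. The helper name first_match_offset() is
hypothetical, and basic regular expressions are assumed.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the byte offset at which the first match
 * begins, or -1 if there is no match or the pattern does not compile.
 */
static long first_match_offset(const char *pattern, const char *string,
                               int cflags)
{
    regex_t preg;
    regmatch_t pm;
    long off = -1;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;
    if (regexec(&preg, string, 1, &pm, 0) == 0)
        off = (long)pm.rm_so;
    regfree(&preg);
    return off;
}
```

For the two-line string "a\nb", the pattern "^b" should match at offset 2
when REG_NEWLINE is set and should not match when it is clear. The
pattern "a.b" behaves the opposite way: it should match at offset 0
without REG_NEWLINE, because the newline is then an ordinary character,
and should not match with it.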
The regerror() function returns the text associated with the specified
error code. If the regcomp() or regexec() function fails, it returns a
nonzero error code. If this return value is assigned to the errcode
parameter, the regerror() function returns the text of the associated
message.
If the errbuf_size parameter is not 0, regerror() places the generated
string into the buffer of size errbuf_size bytes pointed to by errbuf. If
the string (including the terminating null) cannot fit in the buffer,
regerror() truncates the string and null-terminates the result.
If errbuf_size is 0, regerror() ignores the errbuf parameter and returns
the size of the buffer needed to hold the generated string.
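The size query makes it possible to allocate a buffer that is exactly
large enough, so the message is never truncated. The following is a
sketch under that assumption; the helper name compile_error_text() is
hypothetical.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: try to compile pattern. If compilation fails,
 * return a malloc()ed copy of the full error message, which the caller
 * must free(). Return NULL if the pattern compiles or if no message
 * could be produced.
 */
static char *compile_error_text(const char *pattern, int cflags)
{
    regex_t preg;
    size_t need;
    char *buf;
    int rc = regcomp(&preg, pattern, cflags);

    if (rc == 0) {
        regfree(&preg);
        return NULL;
    }

    need = regerror(rc, &preg, NULL, 0);   /* errbuf_size 0: size query */
    if (need == 0)
        return NULL;

    buf = malloc(need);
    if (buf != NULL)
        regerror(rc, &preg, buf, need);    /* fits, terminating null included */
    return buf;
}
```

A caller might pass an unbalanced extended pattern such as "a(b", print
the returned text, and then free it. The wording of the message is
implementation defined.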
The regfree() function frees any memory allocated by the regcomp() function
associated with the preg parameter. An expression defined by the preg
parameter is no longer treated as a compiled basic or extended regular
expression after it is given to the regfree() function.
RETURN VALUES
Upon successful completion, the regcomp() function returns a value of 0
(zero). Otherwise, regcomp() returns an integer value indicating an error
as described below, and the contents of the preg parameter are undefined.
If the regcomp() function detects an illegal basic or extended regular
expression, it returns REG_BADPAT or an error code that more precisely
describes the error.
If the regexec() function finds a match, the function returns a value of 0
(zero). Otherwise, it returns REG_NOMATCH to indicate no match or
REG_ENOSYS to indicate that the function is not supported.
Upon successful completion, the regerror() function returns the number of
bytes needed to hold the entire generated string. This value may be greater
than the value of the errbuf_size parameter. If regerror() fails, it
returns 0 (zero) to indicate that the function is not implemented.
The regfree() function returns no value.
The following constants are defined as error return values:
REG_BADBR
The contents within the pair \{ and \} are invalid: not a number,
number too large, more than two numbers, or first number larger than
second.
REG_BADPAT
The pattern contains an invalid regular expression.
REG_BADRPT
The ?, *, or + symbols are not preceded by a valid regular expression.
REG_EBRACE
The use of a pair of \{ and \} or {} is unbalanced.
REG_EBRACK
The use of [] is unbalanced.
REG_ECOLLATE
An invalid collating element was referenced.
REG_ECTYPE
An invalid character class type was referenced.
REG_EESCAPE
The pattern contains a trailing \ (backslash).
REG_ENOSYS
The function is unsupported.
REG_EPAREN
The use of a pair of \( and \) or () is unbalanced or exceeds the
allowable range. The range is set in the _REG_SUBEXP_MAX parameter of
regex.h and is usually 49.
REG_ERANGE
An endpoint in the range expression is invalid.
REG_ESPACE
Insufficient memory space is available.
REG_ESUBREG
The number in \digit is invalid or in error.
REG_MPAREN
The pattern contains too many parenthetic subexpressions.
REG_NOMATCH
The regexec() function did not find a match.
ERRORS
These functions do not set errno to indicate an error.
EXAMPLES
The following example demonstrates how the REG_NOTBOL option can be used
with the regexec() function to find all substrings in a line that match a
pattern supplied by a user. The main() function in the example accepts two
input strings from the user. The match() function in the example uses
regcomp() and regexec() to search for matches.
#include <sys/types.h>
#include <regex.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <nl_types.h>
#include "reg_example.h"

#define SLENGTH 128

nl_catd catd;                    /* Message catalog shared with match() */

int match(char *pattern, char *string);

int main(void)
{
    char patt[SLENGTH], strng[SLENGTH];
    char *eol;

    (void)setlocale(LC_ALL, "");
    catd = catopen("reg_example.cat", NL_CAT_LOCALE);

    printf(catgets(catd,SET1,INPUT,
           "Enter a regular expression:"));
    if (fgets(patt, SLENGTH, stdin) == NULL)
        return 1;                /* No input */
    if ((eol = strchr(patt, '\n')) != NULL)
        *eol = '\0';             /* Replace newline with null */
    else
        return 1;                /* Line entered too long */

    printf(catgets(catd,SET1,COMPARE,
           "Enter string to compare\nString: "));
    if (fgets(strng, SLENGTH, stdin) == NULL)
        return 1;                /* No input */
    if ((eol = strchr(strng, '\n')) != NULL)
        *eol = '\0';             /* Replace newline with null */
    else
        return 1;                /* Line entered too long */

    return match(patt, strng);
}

int match(char *pattern, char *string)
{
    char message[SLENGTH];
    char *start_search = string;
    int error, count = 0;
    size_t msize;
    regex_t preg;
    regmatch_t pmatch;

    error = regcomp(&preg, pattern, REG_ICASE | REG_EXTENDED);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,"Additional text lost\n"));
        catclose(catd);
        return 1;
    }

    /* The first search starts at the true beginning of the line */
    error = regexec(&preg, start_search, 1, &pmatch, 0);
    while (error == 0) {
        count++;
        /* Resume after this match; rm_eo is relative to start_search */
        start_search += pmatch.rm_eo;
        if (pmatch.rm_so == pmatch.rm_eo) {  /* Zero-length match */
            if (*start_search == '\0')
                break;
            start_search++;                  /* Avoid looping forever */
        }
        /* The remaining text does not begin at the start of the line */
        error = regexec(&preg, start_search, 1, &pmatch, REG_NOTBOL);
    }

    if (error != 0 && error != REG_NOMATCH) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,
                   "Additional text lost\n"));
    } else if (count == 0)
        printf(catgets(catd,SET1,NO_MATCH,
               "No matches in string\n"));
    else
        printf(catgets(catd,SET1,MATCH,
               "There are %i matches\n"), count);

    regfree(&preg);
    catclose(catd);
    return (error != 0 && error != REG_NOMATCH);
}
The following example finds out which subexpressions in the regular
expression have matches in the string. This example uses the same main()
program as the preceding example. This example does not specify
REG_EXTENDED in the call to regcomp() and, consequently, uses basic
regular expressions, not extended regular expressions.
#define MAX_MATCH 10

int match(char *pattern, char *string)
{
    char message[SLENGTH];
    int error, count, matches_tocheck;
    size_t msize;
    regex_t preg;
    regmatch_t pmatch[MAX_MATCH];

    error = regcomp(&preg, pattern, REG_ICASE);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regcomp: %s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,
                   "Additional text lost\n"));
        catclose(catd);
        return 1;
    }

    /* pmatch[0] holds the entire match, leaving MAX_MATCH-1 slots */
    if (preg.re_nsub >= MAX_MATCH) {
        printf(catgets(catd,SET1,SUBEXPR,
               "There are %1$i subexpressions, checking %2$i\n"),
               (int)preg.re_nsub, MAX_MATCH - 1);
        matches_tocheck = MAX_MATCH - 1;
    } else {
        printf(catgets(catd,SET1,SUB_EXPR_NUM,
               "There are %i subexpressions in the regular expression\n"),
               (int)preg.re_nsub);
        matches_tocheck = (int)preg.re_nsub;
    }

    error = regexec(&preg, string, MAX_MATCH, &pmatch[0], 0);
    if (error == REG_NOMATCH) {
        printf(catgets(catd,SET1,NO_MATCH_ENT,
               "String did not contain match for entire regular expression\n"));
        regfree(&preg);
        catclose(catd);
        return 0;
    } else if (error != 0) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regexec: %s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,
                   "Additional text lost\n"));
        regfree(&preg);
        catclose(catd);
        return 1;
    } else
        printf(catgets(catd,SET1,MATCH_ENT,
               "String contained match for the entire regular expression\n"));

    /* Element 0 describes the entire match; 1 and up are subexpressions */
    for (count = 0; count <= matches_tocheck; count++) {
        if (pmatch[count].rm_so != -1) {
            printf(catgets(catd,SET1,SUB_EXPR_MATCH,
                   "Subexpression %i matched in string\n"), count);
            printf(catgets(catd,SET1,MATCH_WHERE,
                   "Match starts at %1$i. Byte after match is %2$i\n"),
                   (int)pmatch[count].rm_so, (int)pmatch[count].rm_eo);
        } else
            printf(catgets(catd,SET1,NO_MATCH_SUB,
                   "Subexpression %i had NO match\n"), count);
    }

    regfree(&preg);
    catclose(catd);
    return 0;
}
SEE ALSO
Commands: grep(1)
Standards: standards(5)