README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available in three alternative formats
from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.zip

There is a mailing list for discussion about the development of PCRE at

  pcre-dev@exim.org

Please read the NEWS file if you are upgrading from a previous release.
The contents of this README file are:

  The PCRE APIs
  Documentation for PCRE
  Contributions by users of PCRE
  Building PCRE on non-Unix systems
  Building PCRE on Unix-like systems
  Retrieving configuration information on Unix-like systems
  Shared libraries on Unix-like systems
  Cross-compiling on Unix-like systems
  Using HP's ANSI C++ compiler (aCC)
  Using Sun's compilers for Solaris
  Using PCRE from MySQL
  Making new tarballs
  Testing PCRE
  Character tables
  File manifest


The PCRE APIs
-------------

PCRE is written in C, and it has its own API. The distribution also includes a
set of C++ wrapper functions (see the pcrecpp man page for details), courtesy
of Google Inc.

In addition, there is a set of C wrapper functions that are based on the POSIX
regular expression API (see the pcreposix man page). These end up in the
library called libpcreposix. Note that this just provides a POSIX calling
interface to PCRE; the regular expressions themselves still follow Perl syntax
and semantics. The POSIX API is restricted, and does not give full access to
all of PCRE's facilities.

The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE with
an existing program that uses the POSIX API, pcreposix.h will have to be
renamed or pointed at by a link.

If you are using the POSIX interface to PCRE and there is already a POSIX regex
library installed on your system, as well as worrying about the regex.h header
file (as mentioned above), you must also take care when linking programs to
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
up the POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE with the addition of
-Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.
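
As a minimal sketch of the renaming approach, the following shell fragment
builds one -D flag for each of the four functions that pcreposix.h is assumed
to declare (regcomp, regexec, regerror, regfree). The function name
posix_rename_flags is hypothetical and not part of the distribution, and the
"configure" command is only printed, not run.

```shell
#!/bin/sh
# Build -Dname=PCREname flags for the POSIX wrapper functions.
# The function list is an assumption; check it against your pcreposix.h.
posix_rename_flags() {
  flags=""
  for fn in regcomp regexec regerror regfree; do
    flags="$flags -D$fn=PCRE$fn"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${flags# }"
}

RENAME_FLAGS=$(posix_rename_flags)

# Print the command instead of running it, so the sketch is safe to try.
echo "CFLAGS='-O2 $RENAME_FLAGS' ./configure"
```

The same flags would then have to be passed when compiling any application
that calls the renamed functions.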


Documentation for PCRE
----------------------

If you install PCRE in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre". The one that is just
called "pcre" lists all the others. In addition to these man pages, the PCRE
documentation is supplied in two other forms:

  1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
     doc/pcretest.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except
     those that summarize individual functions. The other two are the text
     forms of the section 1 man pages for the pcregrep and pcretest commands.
     These text forms are provided for ease of scanning with text editors or
     similar tools. They are installed in <prefix>/share/doc/pcre, where
     <prefix> is the installation prefix (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre/html.

Users of PCRE have contributed files containing the documentation for various
releases in CHM format. These can be found in the Contrib directory of the FTP
site (see next section).


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

There is a README file giving brief descriptions of what they are. Some are
complete in themselves; others are pointers to URLs containing relevant files.
Some of this material is likely to be well out of date. Several of the earlier
contributions provided support for compiling PCRE on various flavours of
Windows (I myself do not use Windows). Nowadays there is more Windows support
in the standard distribution, so these contributions have been archived.


Building PCRE on non-Unix systems
---------------------------------

For a non-Unix system, please read the comments in the file NON-UNIX-USE,
though if your system supports the use of "configure" and "make" you may be
able to build PCRE in the same way as for Unix-like systems. PCRE can also be
configured in many platform environments using the GUI facility provided by
CMake's cmake-gui command. This creates Makefiles, solution files, etc.

PCRE has been compiled on many different operating systems. It should be
straightforward to build PCRE on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE on Unix-like systems
----------------------------------

If you are using HP's ANSI C++ compiler (aCC), please see the special note
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.

The following instructions assume the use of the widely used "configure, make,
make install" process. There is also support for CMake in the PCRE
distribution; there are some comments about using CMake in the NON-UNIX-USE
file, though it can also be used in Unix-like systems.

To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
the file INSTALL.

Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

cd /build/pcre/pcre-xxx
/source/pcre/pcre-xxx/configure
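
The separate-directory build can be wrapped in a small shell function. This is
only a sketch: the name build_out_of_tree is hypothetical, the paths are the
placeholders used above, and it assumes that the source directory is given as
an absolute path.

```shell
#!/bin/sh
# Configure and build PCRE in a directory separate from the sources.
# src_dir should be an absolute path, because configure is invoked
# after changing into build_dir.
# Returns non-zero if the source tree has no configure script.
build_out_of_tree() {
  src_dir=$1
  build_dir=$2
  if [ ! -x "$src_dir/configure" ]; then
    echo "no configure script in $src_dir" >&2
    return 1
  fi
  mkdir -p "$build_dir" || return 1
  # Use a subshell so the caller's working directory is unchanged.
  ( cd "$build_dir" && "$src_dir/configure" && make )
}

# Example call with the placeholder paths used above:
# build_out_of_tree /source/pcre/pcre-xxx /build/pcre/pcre-xxx
```

Running the build in a subshell keeps the source tree free of generated files
and leaves the calling shell in its original directory.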

PCRE is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE
library. You can read more about them in the pcrebuild man page.

. If you want to suppress the building of the C++ wrapper library, you can add
  --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
  it will try to find a C++ compiler and C++ header files, and if it succeeds,
  it will try to build the C++ wrapper.

. If you want to make use of the support for UTF-8 Unicode character strings in
  PCRE, you must add --enable-utf8 to the "configure" command. Without it, the
  code for handling UTF-8 is not included in the library. Even when included,
  it still has to be enabled by an option at run time. When PCRE is compiled
  with this option, its input can only be either ASCII or UTF-8, even when
  running on EBCDIC platforms. It is not possible to use both --enable-utf8 and
  --enable-ebcdic at the same time.

. If, in addition to support for UTF-8 character strings, you want to include
  support for the \P, \p, and \X sequences that recognize Unicode character
  properties, you must add --enable-unicode-properties to the "configure"
  command. This adds about 30K to the size of the library (in the form of a
  property table); only the basic two-letter properties such as Lu are
  supported.

. You can build PCRE to recognize either CR or LF or the sequence CRLF or any
  of the preceding, or any of the Unicode newline sequences as indicating the
  end of a line. Whatever you specify at build time is the default; the caller
  of PCRE can change the selection at run time. The default newline indicator
  is a single LF character (the Unix standard). You can specify the default
  newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
  or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
  --enable-newline-is-any to the "configure" command, respectively.

  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
  the standard tests will fail, because the lines in the test files end with
  LF. Even if the files are edited to change the line endings, there are likely
  to be some failures. With --enable-newline-is-anycrlf or
  --enable-newline-is-any, many tests should succeed, but there may be some
  failures.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE considers to
  be the end of a line (see above). However, the caller of PCRE can restrict \R
  to match only CR, LF, or CRLF. You can make this the default by adding
  --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. When called via the POSIX interface, PCRE uses malloc() to get additional
  storage for processing capturing parentheses if there are more than 10 of
  them in a pattern. You can increase this threshold by setting, for example,

  --with-posix-malloc-threshold=20

  on the "configure" command.

. PCRE has a counter that can be set to limit the amount of resources it uses.
  If the limit is exceeded during a match, the match fails. The default is ten
  million. You can change the default by setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre_exec() can supply their own value. There is more discussion on the
  pcreapi man page.

. There is a separate counter that limits the depth of recursive function calls
  during a matching process. This also has a default of ten million, which is
  essentially "unlimited". You can change the default by setting, for example,

  --with-match-limit-recursion=500000

  Recursive function calls use up the runtime stack; running out of stack can
  cause programs to crash in strange ways. There is a discussion about stack
  sizes in the pcrestack man page.

. The default maximum compiled pattern size is around 64K. You can increase
  this by adding --with-link-size=3 to the "configure" command. You can
  increase it even more by setting --with-link-size=4, but this is unlikely
  ever to be necessary. Increasing the internal link size will reduce
  performance.

. You can build PCRE so that its internal match() function that is called from
  pcre_exec() does not call itself recursively. Instead, it uses memory blocks
  obtained from the heap via the special functions pcre_stack_malloc() and
  pcre_stack_free() to save data that would otherwise be saved on the stack. To
  build PCRE like this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE runs more slowly in this mode, but it may be
  necessary in environments with limited stack sizes. This applies only to the
  pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
  use deeply nested recursion. There is a discussion about stack sizes in the
  pcrestack man page.

. For speed, PCRE uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called dftables is compiled and run in the default C locale when
  you run "make". It builds a source file called pcre_chartables.c. If you do
  not specify this option, pcre_chartables.c is created as a copy of
  pcre_chartables.c.dist. See "Character tables" below for further information.

. It is possible to compile PCRE for use on systems that use EBCDIC as their
  character code (as opposed to ASCII) by specifying

  --enable-ebcdic

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8.

. It is possible to compile pcregrep to use libz and/or libbz2, in order to
  read .gz and .bz2 files (respectively), by specifying one or both of

  --enable-pcregrep-libz
  --enable-pcregrep-libbz2

  Of course, the relevant libraries must be installed on your system.

. It is possible to compile pcretest so that it links with the libreadline
  library, by specifying

  --enable-pcretest-libreadline

  If this is done, when pcretest's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcretest linked in this way, there may be licensing issues.

  Setting this option causes the -lreadline option to be added to the pcretest
  build. In many operating environments with a system-installed readline
  library this is sufficient. However, in some environments (e.g. if an
  unmodified distribution version of readline is in use), it may be necessary
  to specify something like LIBS="-lncurses" as well. This is because, to quote
  the readline INSTALL, "Readline uses the termcap functions, but does not link
  with the termcap or curses library itself, allowing applications which link
  with readline the to choose an appropriate library." If you get error
  messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
  this is the problem, and linking with the ncurses library should fix it.
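
As a rough illustration of how several of the options above combine, the
following shell sketch checks a list of "configure" options for the one
conflict that this README rules out (--enable-utf8 together with
--enable-ebcdic) and then prints the resulting command. The function name
check_pcre_options is hypothetical and not part of the PCRE distribution; the
command is printed rather than run.

```shell
#!/bin/sh
# Reject the option combination that cannot be used together.
# Prints the options unchanged on success; returns 1 on a conflict.
check_pcre_options() {
  has_utf8=no
  has_ebcdic=no
  for opt in "$@"; do
    case $opt in
      --enable-utf8)   has_utf8=yes ;;
      --enable-ebcdic) has_ebcdic=yes ;;
    esac
  done
  if [ "$has_utf8" = yes ] && [ "$has_ebcdic" = yes ]; then
    echo "error: --enable-utf8 and --enable-ebcdic cannot be combined" >&2
    return 1
  fi
  echo "$*"
}

# A UTF-8 build with Unicode properties, ANYCRLF newlines, and a lower
# default match limit.
OPTS=$(check_pcre_options --enable-utf8 --enable-unicode-properties \
  --enable-newline-is-anycrlf --with-match-limit=500000) &&
  echo "./configure $OPTS"
```

Checking the options before running "configure" is a convenience of this
sketch; the values chosen are examples, not recommendations.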

The "configure" script builds the following files for the basic C library:

. Makefile is the makefile that builds the library
. config.h contains build-time configuration options for the library
. pcre.h is the public PCRE header file
. pcre-config is a script that shows the settings of "configure" options
. libpcre.pc is data for the pkg-config command
. libtool is a script that builds shared and/or static libraries
. RunTest is a script for running tests on the basic C library
. RunGrepTest is a script for running tests on the pcregrep command

Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
names config.h.generic and pcre.h.generic. These are provided for those who
have to build PCRE without using "configure" or CMake. If you use "configure"
or CMake, the .generic versions are not used.

If a C++ compiler is found, the following files are also built:

. libpcrecpp.pc is data for the pkg-config command
. pcrecpparg.h is a header file for programs that call PCRE via the C++ wrapper
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". It builds two libraries, called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. If a C++ compiler was found on your system, "make" also builds the C++
wrapper library, which is called libpcrecpp, and some test programs called
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
Building the C++ wrapper can be disabled by adding --disable-cpp to the
"configure" command.

The command "make check" runs all the appropriate tests. Details of the PCRE
tests are given below in a separate section of this document.

You can use "make install" to install PCRE into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcretest
    pcregrep
    pcre-config

  Libraries (lib):
    libpcre
    libpcreposix
    libpcrecpp (if C++ support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre.pc
    libpcrecpp.pc (if C++ support is enabled)

  Header files (include):
    pcre.h
    pcreposix.h
    pcre_scanner.h      )
    pcre_stringpiece.h  ) if C++ support is enabled
    pcrecpp.h           )
    pcrecpparg.h        )

  Man pages (share/man/man{1,3}):
    pcregrep.1
    pcretest.1
    pcre.3
    pcre*.3 (lots more pages, all starting "pcre")

  HTML documentation (share/doc/pcre/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre.txt       (a concatenation of the man(3) pages)
    pcretest.txt   the pcretest man page
    pcregrep.txt   the pcregrep man page

If you want to remove PCRE from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information on Unix-like systems
---------------------------------------------------------

Running "make install" installs the command pcre-config, which can be used to
recall information about the PCRE configuration and installation. For example:

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --cflags pcre

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.
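
A minimal sketch of using pcre-config from a build script follows. It assumes
that your installed pcre-config accepts --cflags in addition to the --version
and --libs options shown above; the function name pcre_compile_command, the
PCRE_CONFIG variable, and the file name myprog.c are hypothetical. The compile
command is printed, not run.

```shell
#!/bin/sh
# Name of the configuration command; override it if pcre-config is not
# on your PATH.
PCRE_CONFIG=${PCRE_CONFIG:-pcre-config}

# Assemble a compile-and-link command for a program that uses PCRE.
# $1 is the C source file; the output name is the file name without ".c".
pcre_compile_command() {
  src=$1
  out=${src%.c}
  # --libs is described above; --cflags is assumed to be supported too.
  echo "cc $("$PCRE_CONFIG" --cflags) -o $out $src $("$PCRE_CONFIG" --libs)"
}

if command -v "$PCRE_CONFIG" >/dev/null 2>&1; then
  echo "PCRE version: $("$PCRE_CONFIG" --version)"
  pcre_compile_command myprog.c
else
  echo "$PCRE_CONFIG not found; is PCRE installed and on your PATH?" >&2
fi
```

Asking pcre-config for the flags at build time means the script does not need
to hard-code the installation prefix.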


Shared libraries on Unix-like systems
-------------------------------------

The default distribution builds PCRE as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.
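
The choice between the two flags can be expressed as a small lookup. In this
sketch the function name linkage_flag and the keywords static, shared, and
both are hypothetical; only the two "configure" flags come from the text
above. The command is printed, not run.

```shell
#!/bin/sh
# Map the desired library type to the corresponding configure flag.
linkage_flag() {
  case $1 in
    static) echo "--disable-shared" ;;   # static libraries only
    shared) echo "--disable-static" ;;   # shared libraries only
    both)   echo "" ;;                   # the default: build both kinds
    *)      echo "unknown linkage: $1" >&2; return 1 ;;
  esac
}

echo "./configure --prefix=/usr/gnu $(linkage_flag static)"
```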


Cross-compiling on Unix-like systems
------------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the dftables.c source
file is compiled and run on the local host, in order to generate the inbuilt
character tables (the pcre_chartables.c file). This will probably not work,
because dftables.c needs to be compiled with the local compiler, not the cross
compiler.

When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
by making a copy of pcre_chartables.c.dist, which is a default set of tables
that assumes ASCII code. Cross-compiling with the default tables should not be
a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
run it on the local host to make a new version of pcre_chartables.c.dist.
Then when you cross-compile PCRE this new version of the tables will be used.
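
The three steps in the previous paragraph might be scripted roughly as
follows. This is a sketch under several assumptions that are not stated in
this README: that "configure" has already been run so that config.h exists,
that dftables.c compiles with -DHAVE_CONFIG_H -I., and that the dftables
program takes the name of its output file as its only argument. The function
name regenerate_chartables and the HOST_CC variable are hypothetical.

```shell
#!/bin/sh
# Regenerate pcre_chartables.c.dist with a natively built dftables.
# Run from the top of the PCRE source tree.
# HOST_CC, the compile flags, and the dftables argument are assumptions.
regenerate_chartables() {
  host_cc=${HOST_CC:-cc}
  if [ ! -f dftables.c ] || [ ! -f pcre_chartables.c.dist ]; then
    echo "run this from the PCRE source directory" >&2
    return 1
  fi
  # Step 1: move the distributed tables out of the way, keeping a copy.
  mv pcre_chartables.c.dist pcre_chartables.c.dist.orig || return 1
  # Step 2: compile dftables with the local compiler, not the cross
  # compiler.
  "$host_cc" -DHAVE_CONFIG_H -I. -o dftables dftables.c || return 1
  # Step 3: run it on the local host to write the new default tables.
  ./dftables pcre_chartables.c.dist
}

# Example call from the PCRE source directory:
# HOST_CC=gcc regenerate_chartables
```

If the compile step fails, the original tables remain available as
pcre_chartables.c.dist.orig and can be moved back by hand.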


Using HP's ANSI C++ compiler (aCC)
----------------------------------

Unless C++ support is disabled by specifying the "--disable-cpp" option of the
"configure" script, you must include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.

Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
needed libraries fail to get included when specifying the "-AA" compiler
option. If you experience unresolved symbols when linking the C++ programs,
use the workaround of specifying the following environment variable prior to
running the "configure" script:

  CXXLDFLAGS="-lstd_v2 -lCsup_v2"


Using Sun's compilers for Solaris
---------------------------------

A user reports that the following configurations work on Solaris 9 sparcv9 and
Solaris 9 x86 (32-bit):

  Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
  Solaris 9 x86:     ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"


Using PCRE from MySQL
---------------------

On systems where both PCRE and MySQL are installed, it is possible to make use
of PCRE from within MySQL, as an alternative to the built-in pattern matching.
There is a web page that tells you how to do this:

  http://www.mysqludf.org/lib_mysqludf_preg/index.php


Making new tarballs
-------------------

The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE
------------

To test the basic PCRE library on a Unix system, run the RunTest script that is
created by the configuring process. There is also a script called RunGrepTest
that tests the options of the pcregrep command. If the C++ wrapper library is
built, three test programs called pcrecpp_unittest, pcre_scanner_unittest, and
pcre_stringpiece_unittest are also built.

Both the scripts and all the program tests are run if you run "make check" or
"make test". For other systems, see the instructions in NON-UNIX-USE.

The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the testinput files in the testdata directory in
turn, and compares the output with the contents of the corresponding testoutput
files. A file called testtry is used to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
the test files, give its number as an argument to RunTest, for example:

  RunTest 2
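
To run a handful of test files one at a time and collect the failures, a
small wrapper along the following lines may help. It is a sketch that assumes
RunTest exits with a non-zero status when a comparison fails; the function
name run_selected_tests and the RUNTEST variable are hypothetical.

```shell
#!/bin/sh
# Path to the RunTest script created by "configure"; override as needed.
RUNTEST=${RUNTEST:-./RunTest}

# Run each numbered test separately and report which ones failed.
# Assumes RunTest returns a non-zero exit status on failure.
run_selected_tests() {
  failed=""
  for n in "$@"; do
    if ! "$RUNTEST" "$n" >/dev/null 2>&1; then
      failed="$failed $n"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed:$failed"
    return 1
  fi
  echo "all passed"
}

# Example call from the build directory:
# run_selected_tests 1 2 3
```

The output of each run is discarded here; rerun a failing number directly
with RunTest to see the comparison details.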

The first test file can also be fed directly into the perltest.pl script to
check that Perl gives the same results. The only difference you should see is
in the first few lines, where the Perl version is given instead of the PCRE
version.

The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flags to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see "Character tables" below). In some
cases, this may cause failures in the second set of tests. For example, in a
locale where the isprint() function yields TRUE for characters in the range
128-255, the use of [:isascii:] inside a character class defines a different
set of characters, and this shows up in this test as a difference in the
compiled code, which is being listed for checking. Where the comparison test
output contains [\x00-\x7f] the test will contain [\x00-\xff], and similarly in
some other cases. This is not a bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.

[If you are trying to run this test on Windows, you may be able to get it to
work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
Windows versions of test 2. More info on using RunTest.bat is included in the
document entitled NON-UNIX-USE.]

The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
running "configure". This file can also be fed directly to the perltest.pl
script, provided you are running Perl 5.8 or higher.

The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.

The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
character properties. It is not run automatically unless PCRE is built with
Unicode property support. To do this you must set --enable-unicode-properties
when running "configure".

The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
property support, respectively. The eighth and ninth tests are not run
automatically unless PCRE is built with the relevant support.

The tenth test checks some internal offsets and code size features; it is run
only when the default "link size" of 2 is set (in other cases the sizes
change).

The eleventh test checks out features that are new in Perl 5.10, and the
twelfth test checks a number of internals and non-Perl features concerned with
Unicode property support. The twelfth test is not run automatically unless PCRE
is built with Unicode property support. To do this you must set
--enable-unicode-properties when running "configure".
    602 
    603 
    604 Character tables
    605 ----------------
    606 
    607 For speed, PCRE uses four tables for manipulating and identifying characters
    608 whose code point values are less than 256. The final argument of the
    609 pcre_compile() function is a pointer to a block of memory containing the
    610 concatenated tables. A call to pcre_maketables() can be used to generate a set
    611 of tables in the current locale. If the final argument for pcre_compile() is
    612 passed as NULL, a set of default tables that is built into the binary is used.
    613 
    614 The source file called pcre_chartables.c contains the default set of tables. By
    615 default, this is created as a copy of pcre_chartables.c.dist, which contains
    616 tables for ASCII coding. However, if --enable-rebuild-chartables is specified
    617 for ./configure, a different version of pcre_chartables.c is built by the
    618 program dftables (compiled from dftables.c), which uses the ANSI C character
    619 handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
    620 build the table sources. This means that the default C locale which is set for
    621 your system will control the contents of these default tables. You can change
    622 the default tables by editing pcre_chartables.c and then re-building PCRE. If
    623 you do this, you should take care to ensure that the file does not get
    624 automatically re-generated. The best way to do this is to move
    625 pcre_chartables.c.dist out of the way and replace it with your customized
    626 tables.
    627 
    628 When the dftables program is run as a result of --enable-rebuild-chartables,
    629 it uses the default C locale that is set on your system. It does not pay
    630 attention to the LC_xxx environment variables. In other words, it uses the
    631 system's default locale rather than whatever the compiling user happens to have
    632 set. If you really do want to build a source set of character tables in a
    633 locale that is specified by the LC_xxx variables, you can run the dftables
    634 program by hand with the -L option. For example:
    635 
    636   ./dftables -L pcre_chartables.c.special
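
In C terms, the distinction described above comes down to whether a program
calls setlocale() with an empty string before it uses the character handling
functions. The following is a minimal sketch of that distinction, not the
dftables source; the helper names in_default_c_locale() and
adopt_environment_locale() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 while character classification is still
   governed by the "C" locale, which is the state every C program starts in
   regardless of the LC_xxx environment variables. */
static int in_default_c_locale(void)
{
  const char *name = setlocale(LC_CTYPE, NULL);   /* query only, no change */
  assert(name != NULL);
  return strcmp(name, "C") == 0;
}

/* Hypothetical helper: switch to the locale named by the LC_xxx environment
   variables. Returns 0 if that locale is not available on the system, in
   which case the current locale is left unchanged. */
static int adopt_environment_locale(void)
{
  return setlocale(LC_ALL, "") != NULL;
}
```

A table-building program that never calls anything like
adopt_environment_locale() sees only the "C" locale through isalpha() and
friends; one that does call it picks up whatever the invoking user has set.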
    637 
    638 The first two 256-byte tables provide lower casing and case flipping functions,
    639 respectively. The next table consists of three 32-byte bit maps which identify
    640 digits, "word" characters, and white space, respectively. These are used when
    641 building 32-byte bit maps that represent character classes for code points less
    642 than 256.
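
As an illustration of the layout just described, the sketch below fills a
lower casing table, a case flipping table, and three 32-byte bit maps using
the standard character handling functions. It is a simplified sketch under
stated assumptions, not the code in pcre_maketables.c: the offset names, the
buffer length, and the functions build_sketch_tables() and in_bitmap() are
all hypothetical, and isspace() is assumed as the definition of white space.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical offsets into one concatenated block of tables. */
enum {
  LCC_OFFSET    = 0,          /* 256 bytes: lower casing           */
  FCC_OFFSET    = 256,        /* 256 bytes: case flipping          */
  CBITS_OFFSET  = 512,        /* three 32-byte bit maps follow     */
  CBIT_DIGIT    = 0,
  CBIT_WORD     = 32,
  CBIT_SPACE    = 64,
  SKETCH_LENGTH = 512 + 96
};

/* Fill a caller-supplied buffer of SKETCH_LENGTH bytes using the
   current locale's character handling functions. */
static void build_sketch_tables(unsigned char *t)
{
  int i;
  assert(t != NULL);
  memset(t, 0, SKETCH_LENGTH);
  for (i = 0; i < 256; i++) {
    unsigned char bit = (unsigned char)(1 << (i & 7));
    t[LCC_OFFSET + i] = (unsigned char)tolower(i);
    t[FCC_OFFSET + i] = (unsigned char)(islower(i) ? toupper(i) : tolower(i));
    if (isdigit(i))             t[CBITS_OFFSET + CBIT_DIGIT + i / 8] |= bit;
    if (isalnum(i) || i == '_') t[CBITS_OFFSET + CBIT_WORD  + i / 8] |= bit;
    if (isspace(i))             t[CBITS_OFFSET + CBIT_SPACE + i / 8] |= bit;
  }
}

/* Test whether code point c (0..255) is a member of a 32-byte bit map. */
static int in_bitmap(const unsigned char *map, int c)
{
  return (map[c / 8] & (1 << (c & 7))) != 0;
}
```

With a buffer t filled in this way, a lookup such as t[LCC_OFFSET + 'A']
replaces a call to tolower(), and in_bitmap(t + CBITS_OFFSET + CBIT_WORD, c)
answers whether c is a "word" character with a single byte fetch and mask.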
    643 
    644 The final 256-byte table has bits indicating various character types, as
    645 follows:
    646 
    647     1   white space character
    648     2   letter
    649     4   decimal digit
    650     8   hexadecimal digit
    651    16   alphanumeric or '_'
    652   128   regular expression metacharacter or binary zero
    653 
    654 You should not alter the set of characters that contain the 128 bit, as that
    655 will cause PCRE to malfunction.
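
The bit values listed above combine by bitwise OR, so a single byte can
record several properties of one character. The sketch below builds such a
256-byte table with those values. It is an illustration only: the macro
names, the function build_ctype_sketch(), and in particular the set of
characters treated as metacharacters are assumptions made for this example,
not a statement of what PCRE itself uses.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Bit values as listed in the table above; the names are hypothetical. */
#define CTYPE_SPACE    1
#define CTYPE_LETTER   2
#define CTYPE_DIGIT    4
#define CTYPE_XDIGIT   8
#define CTYPE_WORD    16
#define CTYPE_META   128

/* Assumed metacharacter set, for illustration only. */
static const char assumed_metachars[] = "\\*+?{^.$|()[";

/* Fill a caller-supplied 256-byte table of character type flags. */
static void build_ctype_sketch(unsigned char *flags)
{
  int i;
  assert(flags != NULL);
  for (i = 0; i < 256; i++) {
    int x = 0;
    if (isspace(i))             x |= CTYPE_SPACE;
    if (isalpha(i))             x |= CTYPE_LETTER;
    if (isdigit(i))             x |= CTYPE_DIGIT;
    if (isxdigit(i))            x |= CTYPE_XDIGIT;
    if (isalnum(i) || i == '_') x |= CTYPE_WORD;
    /* sizeof includes the terminating NUL, so binary zero is flagged too. */
    if (memchr(assumed_metachars, i, sizeof assumed_metachars) != NULL)
      x |= CTYPE_META;
    flags[i] = (unsigned char)x;
  }
}
```

In the "C" locale this gives, for example, 4 + 8 + 16 = 28 for the character
'7' (decimal digit, hexadecimal digit, alphanumeric) and 2 + 8 + 16 = 26 for
'a' (letter, hexadecimal digit, alphanumeric).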
    656 
    657 
    658 File manifest
    659 -------------
    660 
    661 The distribution should contain the following files:
    662 
    663 (A) Source files of the PCRE library functions and their headers:
    664 
    665   dftables.c              auxiliary program for building pcre_chartables.c
    666                             when --enable-rebuild-chartables is specified
    667 
    668   pcre_chartables.c.dist  a default set of character tables that assume ASCII
    669                             coding; used, unless --enable-rebuild-chartables is
    670                             specified, by copying to pcre_chartables.c
    671 
    672   pcreposix.c             )
    673   pcre_compile.c          )
    674   pcre_config.c           )
    675   pcre_dfa_exec.c         )
    676   pcre_exec.c             )
    677   pcre_fullinfo.c         )
    678   pcre_get.c              ) sources for the functions in the library,
    679   pcre_globals.c          )   and some internal functions that they use
    680   pcre_info.c             )
    681   pcre_maketables.c       )
    682   pcre_newline.c          )
    683   pcre_ord2utf8.c         )
    684   pcre_refcount.c         )
    685   pcre_study.c            )
    686   pcre_tables.c           )
    687   pcre_try_flipped.c      )
    688   pcre_ucd.c              )
    689   pcre_valid_utf8.c       )
    690   pcre_version.c          )
    691   pcre_xclass.c           )
    692   pcre_printint.src       ) debugging function that is #included in pcretest,
    693                           )   and can also be #included in pcre_compile()
    694   pcre.h.in               template for pcre.h when built by "configure"
    695   pcreposix.h             header for the external POSIX wrapper API
    696   pcre_internal.h         header for internal use
    697   ucp.h                   header for Unicode property handling
    698 
    699   config.h.in             template for config.h, which is built by "configure"
    700 
    701   pcrecpp.h               public header file for the C++ wrapper
    702   pcrecpparg.h.in         template for another C++ header file
    703   pcre_scanner.h          public header file for C++ scanner functions
    704   pcrecpp.cc              )
    705   pcre_scanner.cc         ) source for the C++ wrapper library
    706 
    707   pcre_stringpiece.h.in   template for pcre_stringpiece.h, the header for the
    708                             C++ stringpiece functions
    709   pcre_stringpiece.cc     source for the C++ stringpiece functions
    710 
    711 (B) Source files for programs that use PCRE:
    712 
    713   pcredemo.c              simple demonstration of coding calls to PCRE
    714   pcregrep.c              source of a grep utility that uses PCRE
    715   pcretest.c              comprehensive test program
    716 
    717 (C) Auxiliary files:
    718 
    719   132html                 script to turn "man" pages into HTML
    720   AUTHORS                 information about the author of PCRE
    721   ChangeLog               log of changes to the code
    722   CleanTxt                script to clean nroff output for txt man pages
    723   Detrail                 script to remove trailing spaces
    724   HACKING                 some notes about the internals of PCRE
    725   INSTALL                 generic installation instructions
    726   LICENCE                 conditions for the use of PCRE
    727   COPYING                 the same, using GNU's standard name
    728   Makefile.in             ) template for Unix Makefile, which is built by
    729                           )   "configure"
    730   Makefile.am             ) the automake input that was used to create
    731                           )   Makefile.in
    732   NEWS                    important changes in this release
    733   NON-UNIX-USE            notes on building PCRE on non-Unix systems
    734   PrepareRelease          script to make preparations for "make dist"
    735   README                  this file
    736   RunTest                 a Unix shell script for running tests
    737   RunGrepTest             a Unix shell script for pcregrep tests
    738   aclocal.m4              m4 macros (generated by "aclocal")
    739   config.guess            ) files used by libtool,
    740   config.sub              )   used only when building a shared library
    741   configure               a configuring shell script (built by autoconf)
    742   configure.ac            ) the autoconf input that was used to build
    743                           )   "configure" and config.h
    744   depcomp                 ) script to find program dependencies, generated by
    745                           )   automake
    746   doc/*.3                 man page sources for PCRE
    747   doc/*.1                 man page sources for pcregrep and pcretest
    748   doc/index.html.src      the base HTML page
    749   doc/html/*              HTML documentation
    750   doc/pcre.txt            plain text version of the man pages
    751   doc/pcretest.txt        plain text documentation of test program
    752   doc/perltest.txt        plain text documentation of Perl test program
    753   install-sh              a shell script for installing files
    754   libpcre.pc.in           template for libpcre.pc for pkg-config
    755   libpcreposix.pc.in      template for libpcreposix.pc for pkg-config
    756   libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
    757   ltmain.sh               file used to build a libtool script
    758   missing                 ) common stub for a few missing GNU programs while
    759                           )   installing, generated by automake
    760   mkinstalldirs           script for making install directories
    761   perltest.pl             Perl test program
    762   pcre-config.in          source of script which retains PCRE information
    763   pcrecpp_unittest.cc          )
    764   pcre_scanner_unittest.cc     ) test programs for the C++ wrapper
    765   pcre_stringpiece_unittest.cc )
    766   testdata/testinput*     test data for main library tests
    767   testdata/testoutput*    expected test results
    768   testdata/grep*          input and output for pcregrep tests
    769 
    770 (D) Auxiliary files for cmake support
    771 
    772   cmake/COPYING-CMAKE-SCRIPTS
    773   cmake/FindPackageHandleStandardArgs.cmake
    774   cmake/FindReadline.cmake
    775   CMakeLists.txt
    776   config-cmake.h.in
    777 
    778 (E) Auxiliary files for VPASCAL
    779 
    780   makevp.bat
    781   makevp_c.txt
    782   makevp_l.txt
    783   pcregexp.pas
    784 
    785 (F) Auxiliary files for building PCRE "by hand"
    786 
    787   pcre.h.generic          ) a version of the public PCRE header file
    788                           )   for use in non-"configure" environments
    789   config.h.generic        ) a version of config.h for use in non-"configure"
    790                           )   environments
    791 
    792 (G) Miscellaneous
    793 
    794   RunTest.bat             a script for running tests under Windows
    795 
    796 Philip Hazel
    797 Email local part: ph10
    798 Email domain: cam.ac.uk
    799 Last updated: 19 January 2010
    800