README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE library to provide an entirely new
API. The latest release of PCRE2 is always available in three alternative
formats from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip

There is a mailing list for discussion about the development of PCRE (both the
original and new APIs) at pcre-dev@exim.org. You can access the archives and
subscribe or manage your subscription here:

   https://lists.exim.org/mailman/listinfo/pcre-dev

Please read the NEWS file if you are upgrading from a previous release.
The contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. There are no C++
wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These can be found in a library called libpcre2-posix. Note that
this just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link.

If you are using the POSIX interface to PCRE2 and there is already a POSIX
regex library installed on your system, as well as worrying about the regex.h
header file (as mentioned above), you must also take care when linking programs
to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
may pick up the POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE2 with the addition of
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.

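The renaming described above can be sketched as a small shell helper that
builds the -D flags. This is only an illustration: it assumes the four standard
POSIX entry points (regcomp, regexec, regerror, regfree), and the helper name
posix_rename_flags is hypothetical, not something the distribution provides.

```shell
# Build one -Dname=PREFIXname flag for each POSIX function name given.
# posix_rename_flags is a hypothetical helper, not part of PCRE2.
posix_rename_flags() {
  prefix=$1
  shift
  flags=
  for fn in "$@"; do
    flags="$flags -D$fn=$prefix$fn"
  done
  # Drop the leading space before printing.
  printf '%s\n' "${flags# }"
}

# The same flags are needed when compiling PCRE2 and each application.
CFLAGS="-O2 $(posix_rename_flags PCRE2 regcomp regexec regerror regfree)"

# Print the command for review rather than running it.
echo "CFLAGS='$CFLAGS' ./configure"
```

Keeping the flags in one variable makes it easier to pass exactly the same
renaming to the application build as to the library build.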

Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the comments in the file
NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
"make" you may be able to build PCRE2 using autotools in the same way as for
many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

To build PCRE2 on a system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  (See also "Shared libraries" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error.

. If you do not want to make use of the support for UTF-8 Unicode character
  strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
  library, or UTF-32 Unicode character strings in the 32-bit library, you can
  add --disable-unicode to the "configure" command. This reduces the size of
  the libraries. It is not possible to configure one library with Unicode
  support, and another without, in the same configuration.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can be
  only ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
  not possible to use both --enable-unicode and --enable-ebcdic at the same
  time.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only the basic two-letter properties such as Lu are supported.
  Escape sequences such as \d and \w in patterns do not by default make use of
  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
  or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, as indicating the
  end of a line. Whatever you specify at build time is the default; the caller
  of PCRE2 can change the selection at run time. The default newline indicator
  is a single LF character (the Unix standard). You can specify the default
  newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
  --enable-newline-is-crlf, --enable-newline-is-anycrlf, or
  --enable-newline-is-any to the "configure" command, respectively.

  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
  the standard tests will fail, because the lines in the test files end with
  LF. Even if the files are edited to change the line endings, there are likely
  to be some failures. With --enable-newline-is-anycrlf or
  --enable-newline-is-any, many tests should succeed, but there may be some
  failures.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of resources it uses
  when matching a pattern. If the limit is exceeded during a match, the match
  fails. The default is ten million. You can change the default by setting, for
  example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() can supply their own value. There is more discussion on the
  pcre2api man page.

. There is a separate counter that limits the depth of recursive function calls
  during a matching process. This also has a default of ten million, which is
  essentially "unlimited". You can change the default by setting, for example,

  --with-match-limit-recursion=500000

  Recursive function calls use up the runtime stack; running out of stack can
  cause programs to crash in strange ways. There is a discussion about stack
  sizes in the pcre2stack man page.

. In the 8-bit library, the default maximum compiled pattern size is around
  64K. You can increase this by adding --with-link-size=3 to the "configure"
  command. PCRE2 then uses three bytes instead of two for offsets to different
  parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
  the same as --with-link-size=4, which (in both libraries) uses four-byte
  offsets. Increasing the internal link size reduces performance in the 8-bit
  and 16-bit libraries. In the 32-bit library, the link size setting is
  ignored, as 4-byte offsets are always used.

. You can build PCRE2 so that its internal match() function that is called from
  pcre2_match() does not call itself recursively. Instead, it uses memory
  blocks obtained from the heap to save data that would otherwise be saved on
  the stack. To build PCRE2 like this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE2 runs more slowly in this mode, but it may
  be necessary in environments with limited stack sizes. This applies only to
  the normal execution of the pcre2_match() function; if JIT support is being
  successfully used, it is not relevant. Equally, it does not apply to
  pcre2_dfa_match(), which does not use deeply nested recursion. There is a
  discussion about stack sizes in the pcre2stack man page.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called dftables is compiled and run in the default C locale when
  you obey "make". It builds a source file called pcre2_chartables.c. If you do
  not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov version 1.6 or above
  is installed, if you specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. On non-Windows systems there is support for calling external scripts during
  matching in the pcre2grep command via PCRE2's callout facility with string
  arguments. This support can be disabled by adding --disable-pcre2grep-callout
  to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default size (in bytes) of the internal buffer used by pcre2grep can be
  set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the to choose an appropriate library."
  If you get error messages about missing functions tgetstr, tgetent, tputs,
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.

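As an illustration of how several of the options above combine, the following
sketch assembles one "configure" command line. The particular selection is
only an example, not a recommendation, and the variable names OPTS and
CONFIGURE_CMD are chosen for this sketch. The command is printed rather than
run so that it can be reviewed first.

```shell
# Options taken from the list above; the selection itself is just an example.
OPTS="--enable-pcre2-16 --enable-pcre2-32"   # build all three libraries
OPTS="$OPTS --enable-jit"                    # compile error on unsupported hardware
OPTS="$OPTS --enable-newline-is-anycrlf"     # default newline: CR, LF or CRLF
OPTS="$OPTS --with-link-size=3"              # bigger patterns, slower 8-bit library
OPTS="$OPTS --with-match-limit=500000"       # lower default match limit

CONFIGURE_CMD="./configure $OPTS"

# Print the command; run it yourself from the build directory.
echo "$CONFIGURE_CMD"
```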
The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                        that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix and the pcre2grep command are also built. Running
"make" with the -j option may speed up compilation on multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

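The configure, build and test sequence just described can be sketched as a
short script. This is a minimal illustration, not part of the distribution:
the names run, build_steps, DRY_RUN and JOBS are assumptions of the sketch, as
is the job count of 4. By default it only prints the commands; set DRY_RUN=0
to execute them from the build directory.

```shell
# run: print a command, and execute it only when DRY_RUN=0.
# DRY_RUN, JOBS, run and build_steps are names chosen for this sketch.
DRY_RUN=${DRY_RUN:-1}
JOBS=4   # assumed number of parallel jobs; adjust for your machine

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then
    "$@"
  fi
}

# Stop at the first step that fails.
build_steps() {
  run ./configure &&
  run make -j "$JOBS" &&
  run make -j "$JOBS" check
}

build_steps
```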
You can use "make install" to install PCRE2 into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed. This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.

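As a sketch of how pcre2-config might be used when compiling a program, the
helper below assembles a compile command from its --cflags and --libs8 output.
The helper name compile_cmd, the source file myprog.c and the use of "cc" as
the compiler are assumptions made for the example.

```shell
# Print a compile command for a program that uses the 8-bit PCRE2 library.
# $1 is the configuration command (normally pcre2-config), $2 the source file.
# compile_cmd, myprog.c and "cc" are assumptions of this sketch.
compile_cmd() {
  tool=$1
  src=$2
  printf 'cc %s -o %s %s %s\n' \
    "$("$tool" --cflags)" "${src%.c}" "$src" "$("$tool" --libs8)"
}

if command -v pcre2-config >/dev/null 2>&1; then
  compile_cmd pcre2-config myprog.c
else
  echo "pcre2-config not found; install PCRE2 first"
fi
```

Passing the configuration command as an argument keeps the helper usable with
a pcre2-config that is installed outside the default PATH.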
The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the dftables.c source
file is compiled and run on the local host, in order to generate the inbuilt
character tables (the pcre2_chartables.c file). This will probably not work,
because dftables.c needs to be compiled with the local compiler, not the cross
compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
and run it on the local host to make a new version of pcre2_chartables.c.dist.
Then when you cross-compile PCRE2 this new version of the tables will be used.

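The restriction above can be sketched as a small wrapper that prints a
cross-compiling "configure" command and refuses --enable-rebuild-chartables.
Several things here are assumptions rather than anything this README
specifies: the helper name cross_configure_cmd, the example toolchain triplet,
the convention that the cross compiler is called <triplet>-gcc, and the use of
the generic autoconf --host option.

```shell
# cross_configure_cmd TRIPLET [configure options...]
# Prints a configure command for cross-compiling. Fails if the options
# include --enable-rebuild-chartables, because dftables has to be compiled
# with the local compiler and run on the local host.
# The helper and the <triplet>-gcc naming are assumptions of this sketch.
cross_configure_cmd() {
  triplet=$1
  shift
  for opt in "$@"; do
    if [ "$opt" = "--enable-rebuild-chartables" ]; then
      echo "error: do not use $opt when cross-compiling" >&2
      return 1
    fi
  done
  echo "CC=$triplet-gcc ./configure --host=$triplet${*:+ $*}"
}

# Hypothetical ARM toolchain; substitute the triplet you actually use.
cross_configure_cmd arm-linux-gnueabihf --disable-shared
```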
    532 
    533 Making new tarballs
    534 -------------------
    535 
    536 The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
    537 zip formats. The command "make distcheck" does the same, but then does a trial
    538 build of the new distribution to ensure that it works.
    539 
    540 If you have modified any of the man page sources in the doc directory, you
    541 should first run the PrepareRelease script before making a distribution. This
    542 script creates the .txt and HTML forms of the documentation from the man pages.
    543 
    544 
    545 Testing PCRE2
    546 ------------
    547 
    548 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
    549 There is another script called RunGrepTest that tests the pcre2grep command.
    550 When JIT support is enabled, a third test program called pcre2_jit_test is
    551 built. Both the scripts and all the program tests are run if you obey "make
    552 check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
    553 
    554 The RunTest script runs the pcre2test test program (which is documented in its
    555 own man page) on each of the relevant testinput files in the testdata
    556 directory, and compares the output with the contents of the corresponding
    557 testoutput files. RunTest uses a file called testtry to hold the main output
    558 from pcre2test. Other files whose names begin with "test" are used as working
    559 files in some tests.
    560 
    561 Some tests are relevant only when certain build-time options were selected. For
    562 example, the tests for UTF-8/16/32 features are run only when Unicode support
    563 is available. RunTest outputs a comment when it skips a test.
    564 
    565 Many (but not all) of the tests that are not skipped are run twice if JIT
    566 support is available. On the second run, JIT compilation is forced. This
    567 testing can be suppressed by putting "nojit" on the RunTest command line.
    568 
    569 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
    570 libraries that are enabled. If you want to run just one set of tests, call
    571 RunTest with either the -8, -16 or -32 option.
    572 
    573 If valgrind is installed, you can run the tests under it by putting "valgrind"
    574 on the RunTest command line. To run pcre2test on just one or more specific test
    575 files, give their numbers as arguments to RunTest, for example:
    576 
    577   RunTest 2 7 11
    578 
    579 You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
    580 end), or a number preceded by ~ to exclude a test. For example:
    581 
    582   RunTest 3-15 ~10
    583 
    584 This runs tests 3 to 15, excluding test 10. A lone ~13 runs all the tests
    585 except test 13. Whatever order the arguments are in, the tests are always run
    586 in numerical order.
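
The selection rules above can be sketched in C. This is only an illustration of
the described behaviour, not code taken from the RunTest script. The function
name select_tests and the upper limit MAX_TEST (set to 19, the highest test
number described in this file) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEST 19   /* assumption: highest test number described in this README */

/* Fill selected[0..MAX_TEST] from arguments such as "7", "3-15", "3-" or "~10".
   Returns 0 on success, or -1 if an argument cannot be parsed. */
int select_tests(int argc, const char *const *argv, bool selected[MAX_TEST + 1])
{
  bool excluded[MAX_TEST + 1] = { false };
  bool any_included = false;

  assert(argv != NULL || argc == 0);
  memset(selected, 0, (MAX_TEST + 1) * sizeof(bool));

  for (int i = 0; i < argc; i++) {
    const char *arg = argv[i];
    char *end;
    long first, last;

    if (arg[0] == '~') {                    /* ~N excludes test N */
      first = strtol(arg + 1, &end, 10);
      if (end == arg + 1 || *end != '\0' || first < 0 || first > MAX_TEST)
        return -1;
      excluded[first] = true;
      continue;
    }

    first = strtol(arg, &end, 10);          /* N, N-M or N- */
    if (end == arg || first < 0 || first > MAX_TEST) return -1;
    last = first;
    if (*end == '-') {
      const char *rest = end + 1;
      if (*rest == '\0') {
        last = MAX_TEST;                    /* "3-" means 3 to the end */
      } else {
        last = strtol(rest, &end, 10);
        if (end == rest || *end != '\0' || last < first || last > MAX_TEST)
          return -1;
      }
    } else if (*end != '\0') {
      return -1;
    }

    for (long n = first; n <= last; n++) selected[n] = true;
    any_included = true;
  }

  /* With only exclusions (or no arguments at all), start from the full set. */
  if (!any_included)
    for (int n = 0; n <= MAX_TEST; n++) selected[n] = true;

  for (int n = 0; n <= MAX_TEST; n++)
    if (excluded[n]) selected[n] = false;
  return 0;
}
```

Because the result is a table indexed by test number, walking it from 0 upwards
gives numerical order regardless of the order of the arguments.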
    587 
    588 You can also call RunTest with the single argument "list" to cause it to output
    589 a list of tests.
    590 
    591 The test sequence starts with "test 0", which is a special test that has no
    592 input file, and whose output is not checked. This is because it will be
    593 different on different hardware and with different configurations. The test
    594 exists in order to exercise some of pcre2test's code that would not otherwise
    595 be run.
    596 
    597 Tests 1 and 2 can always be run, as they expect only plain text strings (not
    598 UTF) and make no use of Unicode properties. The first test file can be fed
    599 directly into the perltest.sh script to check that Perl gives the same results.
    600 The only difference you should see is in the first few lines, where the Perl
    601 version is given instead of the PCRE2 version. The second set of tests checks
    602 auxiliary functions, error detection, and run-time flags that are specific to
    603 PCRE2. It also uses the debugging flags to check some of the internals of
    604 pcre2_compile().
    605 
    606 If you build PCRE2 with a locale setting that is not the standard C locale, the
    607 character tables may be different (see next paragraph). In some cases, this may
    608 cause failures in the second set of tests. For example, in a locale where the
    609 isprint() function yields TRUE for characters in the range 128-255, the use of
    610 [:isascii:] inside a character class defines a different set of characters, and
    611 this shows up in this test as a difference in the compiled code, which is being
    612 listed for checking. For example, where the comparison test output contains
    613 [\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
    614 cases. This is not a bug in PCRE2.
    615 
    616 Test 3 checks pcre2_maketables(), the facility for building a set of character
    617 tables for a specific locale and using them instead of the default tables. The
    618 script uses the "locale" command to check for the availability of the "fr_FR",
    619 "french", or "fr" locale, and uses the first one that it finds. If the "locale"
    620 command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
    621 the list of available locales, the third test cannot be run, and a comment is
    622 output to say why. If running this test produces an error like this:
    623 
    624   ** Failed to set locale "fr_FR"
    625 
    626 it means that the given locale is not available on your system, despite being
    627 listed by "locale". This does not mean that PCRE2 is broken. There are three
    628 alternative output files for the third test, because three different versions
    629 of the French locale have been encountered. The test passes if its output
    630 matches any one of them.
    631 
    632 Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
    633 with the perltest.sh script, and test 5 checking PCRE2-specific things.
    634 
    635 Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
    636 non-UTF mode and UTF-mode with Unicode property support, respectively.
    637 
    638 Test 8 checks some internal offsets and code size features; it is run only when
    639 the default "link size" of 2 is set (in other cases the sizes change) and when
    640 Unicode support is enabled.
    641 
    642 Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
    643 16-bit and 32-bit modes. These are tests that generate different output in
    644 8-bit mode. Each pair covers general cases and Unicode support, respectively.
    645 Test 13 checks the handling of non-UTF characters greater than 255 by
    646 pcre2_dfa_match() in 16-bit and 32-bit modes.
    647 
    648 Test 14 contains a number of tests that must not be run with JIT. They check,
    649 among other non-JIT things, the match-limiting features of the interpretive
    650 matcher.
    651 
    652 Test 15 is run only when JIT support is not available. It checks that an
    653 attempt to use JIT has the expected behaviour.
    654 
    655 Test 16 is run only when JIT support is available. It checks JIT complete and
    656 partial modes, match-limiting under JIT, and other JIT-specific features.
    657 
    658 Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
    659 the 8-bit library, without and with Unicode support, respectively.
    660 
    661 Test 19 checks the serialization functions by writing a set of compiled
    662 patterns to a file, and then reloading and checking them.
    663 
    664 
    665 Character tables
    666 ----------------
    667 
    668 For speed, PCRE2 uses four tables for manipulating and identifying characters
    669 whose code point values are less than 256. By default, a set of tables that is
    670 built into the library is used. The pcre2_maketables() function can be called
    671 by an application to create a new set of tables in the current locale. These
    672 passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
    673 compile context.
    674 
    675 The source file called pcre2_chartables.c contains the default set of tables.
    676 By default, this is created as a copy of pcre2_chartables.c.dist, which
    677 contains tables for ASCII coding. However, if --enable-rebuild-chartables is
    678 specified for ./configure, a different version of pcre2_chartables.c is built
    679 by the program dftables (compiled from dftables.c), which uses the ANSI C
    680 character handling functions such as isalnum(), isalpha(), isupper(),
    681 islower(), etc. to build the table sources. This means that the default C
    682 locale which is set for your system will control the contents of these default
    683 tables. You can change the default tables by editing pcre2_chartables.c and
    684 then re-building PCRE2. If you do this, you should take care to ensure that the
    685 file does not get automatically re-generated. The best way to do this is to
    686 move pcre2_chartables.c.dist out of the way and replace it with your customized
    687 tables.
    688 
    689 When the dftables program is run as a result of --enable-rebuild-chartables,
    690 it uses the default C locale that is set on your system. It does not pay
    691 attention to the LC_xxx environment variables. In other words, it uses the
    692 system's default locale rather than whatever the compiling user happens to have
    693 set. If you really do want to build a source set of character tables in a
    694 locale that is specified by the LC_xxx variables, you can run the dftables
    695 program by hand with the -L option. For example:
    696 
    697   ./dftables -L pcre2_chartables.c.special
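
In standard C terms, the distinction above is between a program that never
calls setlocale(), and so runs in the "C" locale whatever the environment says,
and one that calls setlocale(LC_ALL, "") to adopt the LC_xxx settings. The
following is a minimal sketch of that distinction. The helper names are
hypothetical and are not taken from dftables.c.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Return the name of the locale currently in effect for all categories.
   At program start-up this is the "C" locale, whatever LC_ALL, LC_CTYPE
   or LANG are set to in the environment. */
const char *current_locale_name(void)
{
  const char *name = setlocale(LC_ALL, NULL);   /* NULL means query only */
  assert(name != NULL);
  return name;
}

/* Adopt the locale named by the environment (the LC_xxx variables and LANG).
   Returns the new locale name, or NULL if the environment names a locale
   that is not available; in that case the current locale is left unchanged. */
const char *adopt_environment_locale(void)
{
  return setlocale(LC_ALL, "");
}
```

Functions such as isalpha() and tolower() follow whichever locale is in effect
when they are called, so tables built after adopt_environment_locale() succeeds
can differ from tables built without it.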
    698 
    699 The first two 256-byte tables provide lower casing and case flipping functions,
    700 respectively. The next table consists of three 32-byte bit maps which identify
    701 digits, "word" characters, and white space, respectively. These are used when
    702 building 32-byte bit maps that represent character classes for code points less
    703 than 256. The final 256-byte table has bits indicating various character types,
    704 as follows:
    705 
    706     1   white space character
    707     2   letter
    708     4   decimal digit
    709     8   hexadecimal digit
    710    16   alphanumeric or '_'
    711   128   regular expression metacharacter or binary zero
    712 
    713 You should not alter the set of characters that contain the 128 bit, as that
    714 will cause PCRE2 to malfunction.
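
As a rough illustration of the layout described above, the following C sketch
builds the four kinds of table from the standard character handling functions.
It is not the code of pcre2_maketables() or dftables. The struct and function
names, the order of bits within each bit map byte, and the exact set of
metacharacters are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum {
  CT_SPACE  = 1,    /* white space character */
  CT_LETTER = 2,    /* letter */
  CT_DIGIT  = 4,    /* decimal digit */
  CT_XDIGIT = 8,    /* hexadecimal digit */
  CT_WORD   = 16,   /* alphanumeric or '_' */
  CT_META   = 128   /* regular expression metacharacter or binary zero */
};

struct char_tables {
  unsigned char lower[256];     /* lower-cased value of each code point */
  unsigned char flip[256];      /* opposite-case value of each code point */
  unsigned char digit_map[32];  /* one bit per code point: decimal digits */
  unsigned char word_map[32];   /* one bit per code point: "word" characters */
  unsigned char space_map[32];  /* one bit per code point: white space */
  unsigned char types[256];     /* OR of the CT_xxx bits for each code point */
};

/* Set the bit for code point c in a 32-byte (256-bit) map.
   Assumption: bit (c % 8) of byte (c / 8). */
static void set_bit(unsigned char map[32], int c)
{
  map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if the bit for code point c is set in the map, else 0. */
int test_bit(const unsigned char map[32], int c)
{
  return (map[c / 8] >> (c % 8)) & 1;
}

/* Build all the tables using the locale that is currently in effect. */
void build_char_tables(struct char_tables *t)
{
  static const char meta[] = "\\*+?{^.$|()[";   /* assumed metacharacter set */

  assert(t != NULL);
  memset(t, 0, sizeof *t);

  for (int c = 0; c < 256; c++) {
    t->lower[c] = (unsigned char)tolower(c);
    t->flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));

    if (isspace(c))  { set_bit(t->space_map, c); t->types[c] |= CT_SPACE; }
    if (isalpha(c))  t->types[c] |= CT_LETTER;
    if (isdigit(c))  { set_bit(t->digit_map, c); t->types[c] |= CT_DIGIT; }
    if (isxdigit(c)) t->types[c] |= CT_XDIGIT;
    if (isalnum(c) || c == '_') { set_bit(t->word_map, c); t->types[c] |= CT_WORD; }
    if (c == 0 || strchr(meta, c) != NULL) t->types[c] |= CT_META;
  }
}
```

In the "C" locale, for example, the types entry for '7' comes out as 4 + 8 + 16
(decimal digit, hexadecimal digit, and word character), and the entry for '*'
is 128.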
    715 
    716 
    717 File manifest
    718 -------------
    719 
    720 The distribution should contain the files listed below.
    721 
    722 (A) Source files for the PCRE2 library functions and their headers are found in
    723     the src directory:
    724 
    725   src/dftables.c           auxiliary program for building pcre2_chartables.c
    726                            when --enable-rebuild-chartables is specified
    727 
    728   src/pcre2_chartables.c.dist  a default set of character tables that assume
    729                            ASCII coding; unless --enable-rebuild-chartables is
    730                            specified, used by copying to pcre2_chartables.c
    731 
    732   src/pcre2posix.c         )
    733   src/pcre2_auto_possess.c )
    734   src/pcre2_compile.c      )
    735   src/pcre2_config.c       )
    736   src/pcre2_context.c      )
    737   src/pcre2_dfa_match.c    )
    738   src/pcre2_error.c        )
    739   src/pcre2_find_bracket.c )
    740   src/pcre2_jit_compile.c  )
    741   src/pcre2_jit_match.c    ) sources for the functions in the library,
    742   src/pcre2_jit_misc.c     )   and some internal functions that they use
    743   src/pcre2_maketables.c   )
    744   src/pcre2_match.c        )
    745   src/pcre2_match_data.c   )
    746   src/pcre2_newline.c      )
    747   src/pcre2_ord2utf.c      )
    748   src/pcre2_pattern_info.c )
    749   src/pcre2_serialize.c    )
    750   src/pcre2_string_utils.c )
    751   src/pcre2_study.c        )
    752   src/pcre2_substitute.c   )
    753   src/pcre2_substring.c    )
    754   src/pcre2_tables.c       )
    755   src/pcre2_ucd.c          )
    756   src/pcre2_valid_utf.c    )
    757   src/pcre2_xclass.c       )
    758 
    759   src/pcre2_printint.c     debugging function that is used by pcre2test,
    760 
    761   src/config.h.in          template for config.h, when built by "configure"
    762   src/pcre2.h.in           template for pcre2.h when built by "configure"
    763   src/pcre2posix.h         header for the external POSIX wrapper API
    764   src/pcre2_internal.h     header for internal use
    765   src/pcre2_intmodedep.h   a mode-specific internal header
    766   src/pcre2_ucp.h          header for Unicode property handling
    767 
    768   sljit/*                  source files for the JIT compiler
    769 
    770 (B) Source files for programs that use PCRE2:
    771 
    772   src/pcre2demo.c          simple demonstration of coding calls to PCRE2
    773   src/pcre2grep.c          source of a grep utility that uses PCRE2
    774   src/pcre2test.c          comprehensive test program
    775   src/pcre2_printint.c     part of pcre2test
    776   src/pcre2_jit_test.c     JIT test program
    777 
    778 (C) Auxiliary files:
    779 
    780   132html                  script to turn "man" pages into HTML
    781   AUTHORS                  information about the author of PCRE2
    782   ChangeLog                log of changes to the code
    783   CleanTxt                 script to clean nroff output for txt man pages
    784   Detrail                  script to remove trailing spaces
    785   HACKING                  some notes about the internals of PCRE2
    786   INSTALL                  generic installation instructions
    787   LICENCE                  conditions for the use of PCRE2
    788   COPYING                  the same, using GNU's standard name
    789   Makefile.in              ) template for Unix Makefile, which is built by
    790                            )   "configure"
    791   Makefile.am              ) the automake input that was used to create
    792                            )   Makefile.in
    793   NEWS                     important changes in this release
    794   NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
    795   PrepareRelease           script to make preparations for "make dist"
    796   README                   this file
    797   RunTest                  a Unix shell script for running tests
    798   RunGrepTest              a Unix shell script for pcre2grep tests
    799   aclocal.m4               m4 macros (generated by "aclocal")
    800   config.guess             ) files used by libtool,
    801   config.sub               )   used only when building a shared library
    802   configure                a configuring shell script (built by autoconf)
    803   configure.ac             ) the autoconf input that was used to build
    804                            )   "configure" and config.h
    805   depcomp                  ) script to find program dependencies, generated by
    806                            )   automake
    807   doc/*.3                  man page sources for PCRE2
    808   doc/*.1                  man page sources for pcre2grep and pcre2test
    809   doc/index.html.src       the base HTML page
    810   doc/html/*               HTML documentation
    811   doc/pcre2.txt            plain text version of the man pages
    812   doc/pcre2test.txt        plain text documentation of test program
    813   install-sh               a shell script for installing files
    814   libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
    815   libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
    816   libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
    817   libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
    818   ltmain.sh                file used to build a libtool script
    819   missing                  ) common stub for a few missing GNU programs while
    820                            )   installing, generated by automake
    821   mkinstalldirs            script for making install directories
    822   perltest.sh              script for running a Perl test program
    823   pcre2-config.in          source of script which retains PCRE2 information
    824   testdata/testinput*      test data for main library tests
    825   testdata/testoutput*     expected test results
    826   testdata/grep*           input and output for pcre2grep tests
    827   testdata/*               other supporting test files
    828 
    829 (D) Auxiliary files for cmake support
    830 
    831   cmake/COPYING-CMAKE-SCRIPTS
    832   cmake/FindPackageHandleStandardArgs.cmake
    833   cmake/FindEditline.cmake
    834   cmake/FindReadline.cmake
    835   CMakeLists.txt
    836   config-cmake.h.in
    837 
    838 (E) Auxiliary files for building PCRE2 "by hand"
    839 
    840   pcre2.h.generic         ) a version of the public PCRE2 header file
    841                           )   for use in non-"configure" environments
    842   config.h.generic        ) a version of config.h for use in non-"configure"
    843                           )   environments
    844 
    845 Philip Hazel
    846 Email local part: ph10
    847 Email domain: cam.ac.uk
    848 Last updated: 01 April 2016
    849