README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE library to provide an entirely new
API. The latest release of PCRE2 is always available in three alternative
formats from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip

There is a mailing list for discussion about the development of PCRE (both the
original and new APIs) at pcre-dev@exim.org. You can access the archives and
subscribe or manage your subscription here:

   https://lists.exim.org/mailman/listinfo/pcre-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. There are no C++
wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These can be found in a library called libpcre2-posix. Note that
this just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link.
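
As a rough illustration of the "pointed at by a link" approach, the following
shell sketch creates a private include directory in which regex.h is a symbolic
link to pcre2posix.h. The function name link_regex_header and the directory
names are assumptions made for this example; nothing like it is shipped with
PCRE2.

```shell
# Hypothetical helper (not part of PCRE2): make <incdir>/regex.h a symbolic
# link to an installed pcre2posix.h, so that code which includes <regex.h>
# picks up the PCRE2 header when <incdir> is searched first.
link_regex_header() {
    header=$1      # absolute path to pcre2posix.h
    incdir=$2      # private include directory to create
    if [ ! -f "$header" ]; then
        echo "cannot find $header" >&2
        return 1
    fi
    # A relative $header would be resolved relative to $incdir, so pass an
    # absolute path.
    mkdir -p "$incdir" && ln -sf "$header" "$incdir/regex.h"
}
```

With the default installation prefix this might be called as
link_regex_header /usr/local/include/pcre2posix.h "$PWD/posix-include", after
which the program is compiled with -I"$PWD/posix-include" and linked with
libpcre2-posix.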

If you are using the POSIX interface to PCRE2 and there is already a POSIX
regex library installed on your system, as well as worrying about the regex.h
header file (as mentioned above), you must also take care when linking programs
to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
may pick up the POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE2 with the addition of
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.
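
The renaming flags can be generated rather than typed by hand. The sketch below
builds one -D option for each of the four POSIX functions (regcomp, regexec,
regerror, regfree), following the -Dregcomp=PCRE2regcomp pattern given above.
The helper name posix_rename_flags is an assumption made for this example.

```shell
# Hypothetical helper (not part of PCRE2): print the -D options that rename
# the four POSIX regex functions by giving each one a PCRE2 prefix.
posix_rename_flags() {
    flags=""
    for fn in regcomp regexec regerror regfree; do
        flags="$flags -D$fn=PCRE2$fn"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}
```

The same output has to be used for PCRE2 and for the applications that call
it, for example CFLAGS="-O2 $(posix_rename_flags)" ./configure when building
PCRE2.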


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

To build PCRE2 on a system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure
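
The two commands above can be wrapped in a small function that also creates the
build directory and passes any extra options through to "configure". This is
only a sketch; the name configure_out_of_tree is an assumption made for this
example and is not part of the distribution.

```shell
# Hypothetical helper (not part of PCRE2): run the "configure" script found
# in <srcdir> with <builddir> as the current directory. Any remaining
# arguments are passed to "configure" unchanged.
configure_out_of_tree() {
    srcdir=$1      # absolute path to the unpacked source
    builddir=$2    # directory in which the files are to be created
    shift 2
    if [ ! -x "$srcdir/configure" ]; then
        echo "no configure script in $srcdir" >&2
        return 1
    fi
    mkdir -p "$builddir" || return 1
    # The subshell keeps the caller's current directory unchanged.
    ( cd "$builddir" && "$srcdir/configure" "$@" )
}
```

For the directories used in the example above, the call would be
configure_out_of_tree /source/pcre2/pcre2-xxx /build/pcre2/pcre2-xxx, followed
by "make" in the build directory.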

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  (See also "Shared libraries" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error. If in doubt, use --enable-jit=auto, which
  enables JIT only if the current hardware is supported.

. If you are enabling JIT under SELinux you may also want to add
  --enable-jit-sealloc, which enables the use of an execmem allocator in JIT
  that is compatible with SELinux. This has no effect if JIT is not enabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only the basic two-letter properties such as Lu are supported.
  Escape sequences such as \d and \w in patterns do not by default make use of
  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
  or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern. If the limit is exceeded during a match, the
  match fails. The default is ten million. You can change the default by
  setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

  --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

  --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes. You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called dftables is compiled and run in the default C locale when
  you obey "make". It builds a source file called pcre2_chartables.c. If you do
  not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov version 1.6 or above
  is installed, if you specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480. The amount of memory
  used by pcre2grep is actually three times this number, to allow for "before"
  and "after" lines. If very long lines are encountered, the buffer is
  automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

  --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the to choose an appropriate library."
  If you get error messages about missing functions tgetstr, tgetent, tputs,
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random options bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents of
  the file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete since pcre2_match() was refactored always to use the heap (in
  a much more efficient way than before). This option is retained for backwards
  compatibility, but has no effect other than to output a warning.
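
Some of the options above constrain each other. The following sketch checks a
proposed set of "configure" arguments before the script is run. It enforces the
rule stated above that --enable-ebcdic must be accompanied by
--disable-unicode. It also rejects a selection that leaves no library width
enabled, which is an assumption of this example rather than something this file
states. The name check_configure_args is hypothetical.

```shell
# Hypothetical pre-flight check (not part of PCRE2) for "configure" arguments.
# Returns 0 if the arguments pass both checks, 1 otherwise.
check_configure_args() {
    ebcdic=no; unicode=yes          # Unicode support is on by default
    w8=yes; w16=no; w32=no          # only the 8-bit library is on by default
    for arg in "$@"; do
        case $arg in
            --enable-ebcdic)    ebcdic=yes ;;
            --disable-unicode)  unicode=no ;;
            --disable-pcre2-8)  w8=no ;;
            --enable-pcre2-16)  w16=yes ;;
            --enable-pcre2-32)  w32=yes ;;
        esac
    done
    if [ "$ebcdic" = yes ] && [ "$unicode" = yes ]; then
        echo "--enable-ebcdic also needs --disable-unicode" >&2
        return 1
    fi
    # Assumption of this sketch: a build with no library width is a mistake.
    if [ "$w8$w16$w32" = nonono ]; then
        echo "no library width is enabled" >&2
        return 1
    fi
    return 0
}
```

A wrapper script could then run check_configure_args "$@" && ./configure "$@"
so that an inconsistent selection is reported before any configuration work is
done.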

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix and the pcre2grep command are also built. Running
"make" with the -j option may speed up compilation on multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

You can use "make install" to install PCRE2 into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed. This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.
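
A build script can try both mechanisms in turn. The sketch below asks
pkg-config first and falls back to pcre2-config. The function name pcre2_flags
is an assumption made for this example, and the --cflags option of
pcre2-config is taken from the pcre2-config man page rather than from the text
above.

```shell
# Hypothetical helper (not part of PCRE2): print compiler and linker flags
# for the PCRE2 library of the given code unit width (8, 16, or 32).
pcre2_flags() {
    width=${1:-8}
    case $width in
        8|16|32) ;;
        *) echo "width must be 8, 16, or 32" >&2; return 1 ;;
    esac
    if command -v pkg-config >/dev/null 2>&1 &&
       pkg-config --exists "libpcre2-$width"; then
        # One command serves every library; only the package name changes.
        pkg-config --cflags --libs "libpcre2-$width"
    elif command -v pcre2-config >/dev/null 2>&1; then
        # pcre2-config has a separate --libs option for each width.
        pcre2-config --cflags
        pcre2-config "--libs$width"
    else
        echo "PCRE2 $width-bit library not found" >&2
        return 1
    fi
}
```

In a makefile or shell script the result might be used as
cc -o myprog myprog.c $(pcre2_flags 8), where myprog.c stands for any program
that uses the 8-bit library.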


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the dftables.c source
file is compiled and run on the local host, in order to generate the inbuilt
character tables (the pcre2_chartables.c file). This will probably not work,
because dftables.c needs to be compiled with the local compiler, not the cross
compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
and run it on the local host to make a new version of pcre2_chartables.c.dist.
Then when you cross-compile PCRE2 this new version of the tables will be used.
    561 
    562 
    563 Making new tarballs
    564 -------------------
    565 
    566 The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
    567 zip formats. The command "make distcheck" does the same, but then does a trial
    568 build of the new distribution to ensure that it works.
    569 
    570 If you have modified any of the man page sources in the doc directory, you
    571 should first run the PrepareRelease script before making a distribution. This
    572 script creates the .txt and HTML forms of the documentation from the man pages.
    573 
    574 
    575 Testing PCRE2
    576 -------------
    577 
    578 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
    579 There is another script called RunGrepTest that tests the pcre2grep command.
    580 When JIT support is enabled, a third test program called pcre2_jit_test is
    581 built. Both scripts and all the test programs are run when you give the
    582 command "make check". For other environments, see NON-AUTOTOOLS-BUILD.
    583 
    584 The RunTest script runs the pcre2test test program (which is documented in its
    585 own man page) on each of the relevant testinput files in the testdata
    586 directory, and compares the output with the contents of the corresponding
    587 testoutput files. RunTest uses a file called testtry to hold the main output
    588 from pcre2test. Other files whose names begin with "test" are used as working
    589 files in some tests.
    590 
    591 Some tests are relevant only when certain build-time options were selected. For
    592 example, the tests for UTF-8/16/32 features are run only when Unicode support
    593 is available. RunTest outputs a comment when it skips a test.
    594 
    595 Many (but not all) of the tests that are not skipped are run twice if JIT
    596 support is available. On the second run, JIT compilation is forced. This
    597 testing can be suppressed by putting "nojit" on the RunTest command line.
    598 
    599 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
    600 libraries that are enabled. If you want to run just one set of tests, call
    601 RunTest with either the -8, -16 or -32 option.
    602 
    603 If valgrind is installed, you can run the tests under it by putting "valgrind"
    604 on the RunTest command line. To run pcre2test on just one or more specific test
    605 files, give their numbers as arguments to RunTest, for example:
    606 
    607   RunTest 2 7 11
    608 
    609 You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
    610 end), or a number preceded by ~ to exclude a test. For example:
    611 
    612   RunTest 3-15 ~10
    613 
    614 This runs tests 3 to 15, excluding test 10. Just ~13 alone runs all the tests
    615 except test 13. Whatever order the arguments are in, the tests are always run
    616 in numerical order.
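
The selection rules above can be sketched in C. This is only an illustration
of the semantics described in this README, not the way RunTest (a shell
script) is implemented. The names select_tests, print_selection and MAX_TEST
are hypothetical, and the upper bound of 25 is an assumption taken from the
highest test number mentioned in this file.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEST 25  /* assumption: highest test number described in this README */

/* Fill selected[0..MAX_TEST] from arguments such as "2", "3-6", "3-" or "~10".
   Returns false if an argument cannot be parsed. Hypothetical helper. */
bool select_tests(int argc, const char *const argv[], bool selected[MAX_TEST + 1])
{
    bool excluded[MAX_TEST + 1] = { false };
    bool any_included = false;

    memset(selected, 0, (MAX_TEST + 1) * sizeof selected[0]);

    for (int i = 0; i < argc; i++) {
        const char *arg = argv[i];
        char *end;

        if (arg[0] == '~') {                    /* "~N": exclude test N */
            long n = strtol(arg + 1, &end, 10);
            if (end == arg + 1 || *end != '\0' || n < 0 || n > MAX_TEST)
                return false;
            excluded[n] = true;
            continue;
        }

        long first = strtol(arg, &end, 10);     /* "N", "N-M" or "N-" */
        if (end == arg || first < 0 || first > MAX_TEST)
            return false;
        long last = first;
        if (*end == '-') {
            const char *rest = end + 1;
            if (*rest == '\0') {
                last = MAX_TEST;                /* "N-": N to the end */
            } else {
                last = strtol(rest, &end, 10);
                if (end == rest || *end != '\0' || last < first || last > MAX_TEST)
                    return false;
            }
        } else if (*end != '\0') {
            return false;
        }

        for (long n = first; n <= last; n++)
            selected[n] = true;
        any_included = true;
    }

    /* With only exclusions given, start from the full set of tests. */
    for (int n = 0; n <= MAX_TEST; n++) {
        if (!any_included) selected[n] = true;
        if (excluded[n]) selected[n] = false;
    }
    return true;
}

/* Walking the array in index order gives the numerical ordering, whatever
   order the arguments were given in. */
void print_selection(const bool selected[MAX_TEST + 1])
{
    for (int n = 0; n <= MAX_TEST; n++)
        if (selected[n]) printf("test %d\n", n);
}
```

With the arguments "3-15" and "~10" the sketch selects tests 3 to 9 and 11 to
15. With the single argument "~13" it selects every test from 0 to MAX_TEST
except 13.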
    617 
    618 You can also call RunTest with the single argument "list" to cause it to output
    619 a list of tests.
    620 
    621 The test sequence starts with "test 0", which is a special test that has no
    622 input file, and whose output is not checked. This is because it will be
    623 different on different hardware and with different configurations. The test
    624 exists in order to exercise some of pcre2test's code that would not otherwise
    625 be run.
    626 
    627 Tests 1 and 2 can always be run, as they expect only plain text strings (not
    628 UTF) and make no use of Unicode properties. The first test file can be fed
    629 directly into the perltest.sh script to check that Perl gives the same results.
    630 The only difference you should see is in the first few lines, where the Perl
    631 version is given instead of the PCRE2 version. The second set of tests checks
    632 auxiliary functions, error detection, and run-time flags that are specific to
    633 PCRE2. It also uses the debugging flags to check some of the internals of
    634 pcre2_compile().
    635 
    636 If you build PCRE2 with a locale setting that is not the standard C locale, the
    637 character tables may be different (see next paragraph). In some cases, this may
    638 cause failures in the second set of tests. For example, in a locale where the
    639 isprint() function yields TRUE for characters in the range 128-255, the use of
    640 [:isascii:] inside a character class defines a different set of characters, and
    641 this shows up in this test as a difference in the compiled code, which is being
    642 listed for checking. For example, where the comparison test output contains
    643 [\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
    644 cases. This is not a bug in PCRE2.
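
To see whether a given locale behaves this way on your system, a small C
check along the following lines may help. It is a sketch only, using the
standard setlocale() and isprint() functions. The function name
count_high_printable is made up for this illustration and is not part of
PCRE2.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Return how many code points in the range 128-255 isprint() accepts in the
   named locale, or -1 if that locale cannot be set. The LC_CTYPE category is
   reset to "C" afterwards (the previous setting is not restored). */
int count_high_printable(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    int count = 0;
    for (int c = 128; c < 256; c++)
        if (isprint(c)) count++;

    setlocale(LC_CTYPE, "C");
    return count;
}
```

In the standard C locale the count is normally zero. A non-zero count for the
locale that was in force at build time suggests that the kind of difference
described above may appear in the output of the second test.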
    645 
    646 Test 3 checks pcre2_maketables(), the facility for building a set of character
    647 tables for a specific locale and using them instead of the default tables. The
    648 script uses the "locale" command to check for the availability of the "fr_FR",
    649 "french", or "fr" locale, and uses the first one that it finds. If the "locale"
    650 command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
    651 the list of available locales, the third test cannot be run, and a comment is
    652 output to say why. If running this test produces an error like this:
    653 
    654   ** Failed to set locale "fr_FR"
    655 
    656 it means that the given locale is not available on your system, despite being
    657 listed by "locale". This does not mean that PCRE2 is broken. There are three
    658 alternative output files for the third test, because three different versions
    659 of the French locale have been encountered. The test passes if its output
    660 matches any one of them.
    661 
    662 Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
    663 with the perltest.sh script, and test 5 checking PCRE2-specific things.
    664 
    665 Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
    666 non-UTF mode and UTF-mode with Unicode property support, respectively.
    667 
    668 Test 8 checks some internal offsets and code size features, but it is run only
    669 when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
    670 32-bit modes and for different link sizes, so there are different output files
    671 for each mode and link size.
    672 
    673 Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
    674 16-bit and 32-bit modes. These are tests that generate different output in
    675 8-bit mode. Each pair are for general cases and Unicode support, respectively.
    676 
    677 Test 13 checks the handling of non-UTF characters greater than 255 by
    678 pcre2_dfa_match() in 16-bit and 32-bit modes.
    679 
    680 Test 14 contains some special UTF and UCP tests that give different output for
    681 different code unit widths.
    682 
    683 Test 15 contains a number of tests that must not be run with JIT. They check,
    684 among other non-JIT things, the match-limiting features of the interpretive
    685 matcher.
    686 
    687 Test 16 is run only when JIT support is not available. It checks that an
    688 attempt to use JIT has the expected behaviour.
    689 
    690 Test 17 is run only when JIT support is available. It checks JIT complete and
    691 partial modes, match-limiting under JIT, and other JIT-specific features.
    692 
    693 Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
    694 the 8-bit library, without and with Unicode support, respectively.
    695 
    696 Test 20 checks the serialization functions by writing a set of compiled
    697 patterns to a file, and then reloading and checking them.
    698 
    699 Tests 21 and 22 test \C support when the use of \C is not locked out, without
    700 and with UTF support, respectively. Test 23 tests \C when it is locked out.
    701 
    702 Tests 24 and 25 test the experimental pattern conversion functions, without and
    703 with UTF support, respectively.
    704 
    705 
    706 Character tables
    707 ----------------
    708 
    709 For speed, PCRE2 uses four tables for manipulating and identifying characters
    710 whose code point values are less than 256. By default, a set of tables that is
    711 built into the library is used. The pcre2_maketables() function can be called
    712 by an application to create a new set of tables in the current locale. These are
    713 passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
    714 compile context.
    715 
    716 The source file called pcre2_chartables.c contains the default set of tables.
    717 By default, this is created as a copy of pcre2_chartables.c.dist, which
    718 contains tables for ASCII coding. However, if --enable-rebuild-chartables is
    719 specified for ./configure, a different version of pcre2_chartables.c is built
    720 by the program dftables (compiled from dftables.c), which uses the ANSI C
    721 character handling functions such as isalnum(), isalpha(), isupper(),
    722 islower(), etc. to build the table sources. This means that the default C
    723 locale that is set for your system will control the contents of these default
    724 tables. You can change the default tables by editing pcre2_chartables.c and
    725 then re-building PCRE2. If you do this, you should take care to ensure that the
    726 file does not get automatically re-generated. The best way to do this is to
    727 move pcre2_chartables.c.dist out of the way and replace it with your customized
    728 tables.
    729 
    730 When the dftables program is run as a result of --enable-rebuild-chartables,
    731 it uses the default C locale that is set on your system. It does not pay
    732 attention to the LC_xxx environment variables. In other words, it uses the
    733 system's default locale rather than whatever the compiling user happens to have
    734 set. If you really do want to build a source set of character tables in a
    735 locale that is specified by the LC_xxx variables, you can run the dftables
    736 program by hand with the -L option. For example:
    737 
    738   ./dftables -L pcre2_chartables.c.special
    739 
    740 The first two 256-byte tables provide lower casing and case flipping functions,
    741 respectively. The next table consists of three 32-byte bit maps which identify
    742 digits, "word" characters, and white space, respectively. These are used when
    743 building 32-byte bit maps that represent character classes for code points less
    744 than 256. The final 256-byte table has bits indicating various character types,
    745 as follows:
    746 
    747     1   white space character
    748     2   letter
    749     4   decimal digit
    750     8   hexadecimal digit
    751    16   alphanumeric or '_'
    752   128   regular expression metacharacter or binary zero
    753 
    754 You should not alter the set of characters that contain the 128 bit, as that
    755 will cause PCRE2 to malfunction.
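
The layout of the final table and of one of the 32-byte bit maps can be
illustrated with a short C sketch. It is not the dftables source. The enum
names, the function names, the list of metacharacters, and the choice of
least-significant-bit-first ordering within each bit map byte are all
assumptions made for this example. Only the flag values come from the list
above.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Flag values as listed above; the names are invented for this sketch. */
enum {
    CT_SPACE  = 1,    /* white space character */
    CT_LETTER = 2,    /* letter */
    CT_DIGIT  = 4,    /* decimal digit */
    CT_XDIGIT = 8,    /* hexadecimal digit */
    CT_WORD   = 16,   /* alphanumeric or '_' */
    CT_META   = 128   /* regular expression metacharacter or binary zero */
};

/* Assumed metacharacter set, for illustration only. */
static const char META_CHARS[] = "\\*+?{^.$|()[";

/* Build a 256-byte character type table using the current C locale. */
void build_ctypes(uint8_t ctypes[256])
{
    for (int c = 0; c < 256; c++) {
        uint8_t bits = 0;
        if (isspace(c))  bits |= CT_SPACE;
        if (isalpha(c))  bits |= CT_LETTER;
        if (isdigit(c))  bits |= CT_DIGIT;
        if (isxdigit(c)) bits |= CT_XDIGIT;
        if (isalnum(c) || c == '_') bits |= CT_WORD;
        if (c == 0 || strchr(META_CHARS, c) != NULL) bits |= CT_META;
        ctypes[c] = bits;
    }
}

/* A 32-byte bit map holds one bit for each code point below 256. */
void bitmap_set(uint8_t map[32], int c)
{
    map[c / 8] |= (uint8_t)(1u << (c % 8));
}

int bitmap_test(const uint8_t map[32], int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Build the bit map that identifies decimal digits. */
void build_digit_map(uint8_t map[32])
{
    memset(map, 0, 32);
    for (int c = 0; c < 256; c++)
        if (isdigit(c)) bitmap_set(map, c);
}
```

In the standard C locale with ASCII coding, this gives the letter 'a' the
value 26 (letter, hexadecimal digit, and word character), the digit '5' the
value 28, and the underscore the value 16.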
    756 
    757 
    758 File manifest
    759 -------------
    760 
    761 The distribution should contain the files listed below.
    762 
    763 (A) Source files for the PCRE2 library functions and their headers are found in
    764     the src directory:
    765 
    766   src/dftables.c           auxiliary program for building pcre2_chartables.c
    767                            when --enable-rebuild-chartables is specified
    768 
    769   src/pcre2_chartables.c.dist  a default set of character tables that assume
    770                            ASCII coding; unless --enable-rebuild-chartables is
    771                            specified, used by copying to pcre2_chartables.c
    772 
    773   src/pcre2posix.c         )
    774   src/pcre2_auto_possess.c )
    775   src/pcre2_compile.c      )
    776   src/pcre2_config.c       )
    777   src/pcre2_context.c      )
    778   src/pcre2_convert.c      )
    779   src/pcre2_dfa_match.c    )
    780   src/pcre2_error.c        )
    781   src/pcre2_extuni.c       )
    782   src/pcre2_find_bracket.c )
    783   src/pcre2_jit_compile.c  )
    784   src/pcre2_jit_match.c    ) sources for the functions in the library,
    785   src/pcre2_jit_misc.c     )   and some internal functions that they use
    786   src/pcre2_maketables.c   )
    787   src/pcre2_match.c        )
    788   src/pcre2_match_data.c   )
    789   src/pcre2_newline.c      )
    790   src/pcre2_ord2utf.c      )
    791   src/pcre2_pattern_info.c )
    792   src/pcre2_serialize.c    )
    793   src/pcre2_string_utils.c )
    794   src/pcre2_study.c        )
    795   src/pcre2_substitute.c   )
    796   src/pcre2_substring.c    )
    797   src/pcre2_tables.c       )
    798   src/pcre2_ucd.c          )
    799   src/pcre2_valid_utf.c    )
    800   src/pcre2_xclass.c       )
    801 
    802   src/pcre2_printint.c     debugging function that is used by pcre2test
    803   src/pcre2_fuzzsupport.c  function for (optional) fuzzing support
    804 
    805   src/config.h.in          template for config.h, when built by "configure"
    806   src/pcre2.h.in           template for pcre2.h when built by "configure"
    807   src/pcre2posix.h         header for the external POSIX wrapper API
    808   src/pcre2_internal.h     header for internal use
    809   src/pcre2_intmodedep.h   a mode-specific internal header
    810   src/pcre2_ucp.h          header for Unicode property handling
    811 
    812   sljit/*                  source files for the JIT compiler
    813 
    814 (B) Source files for programs that use PCRE2:
    815 
    816   src/pcre2demo.c          simple demonstration of coding calls to PCRE2
    817   src/pcre2grep.c          source of a grep utility that uses PCRE2
    818   src/pcre2test.c          comprehensive test program
    819   src/pcre2_jit_test.c     JIT test program
    820 
    821 (C) Auxiliary files:
    822 
    823   132html                  script to turn "man" pages into HTML
    824   AUTHORS                  information about the author of PCRE2
    825   ChangeLog                log of changes to the code
    826   CleanTxt                 script to clean nroff output for txt man pages
    827   Detrail                  script to remove trailing spaces
    828   HACKING                  some notes about the internals of PCRE2
    829   INSTALL                  generic installation instructions
    830   LICENCE                  conditions for the use of PCRE2
    831   COPYING                  the same, using GNU's standard name
    832   Makefile.in              ) template for Unix Makefile, which is built by
    833                            )   "configure"
    834   Makefile.am              ) the automake input that was used to create
    835                            )   Makefile.in
    836   NEWS                     important changes in this release
    837   NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
    838   PrepareRelease           script to make preparations for "make dist"
    839   README                   this file
    840   RunTest                  a Unix shell script for running tests
    841   RunGrepTest              a Unix shell script for pcre2grep tests
    842   aclocal.m4               m4 macros (generated by "aclocal")
    843   config.guess             ) files used by libtool,
    844   config.sub               )   used only when building a shared library
    845   configure                a configuring shell script (built by autoconf)
    846   configure.ac             ) the autoconf input that was used to build
    847                            )   "configure" and config.h
    848   depcomp                  ) script to find program dependencies, generated by
    849                            )   automake
    850   doc/*.3                  man page sources for PCRE2
    851   doc/*.1                  man page sources for pcre2grep and pcre2test
    852   doc/index.html.src       the base HTML page
    853   doc/html/*               HTML documentation
    854   doc/pcre2.txt            plain text version of the man pages
    855   doc/pcre2test.txt        plain text documentation of test program
    856   install-sh               a shell script for installing files
    857   libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
    858   libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
    859   libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
    860   libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
    861   ltmain.sh                file used to build a libtool script
    862   missing                  ) common stub for a few missing GNU programs while
    863                            )   installing, generated by automake
    864   mkinstalldirs            script for making install directories
    865   perltest.sh              script for running a Perl test program
    866   pcre2-config.in          source of script which retains PCRE2 information
    867   testdata/testinput*      test data for main library tests
    868   testdata/testoutput*     expected test results
    869   testdata/grep*           input and output for pcre2grep tests
    870   testdata/*               other supporting test files
    871 
    872 (D) Auxiliary files for cmake support:
    873 
    874   cmake/COPYING-CMAKE-SCRIPTS
    875   cmake/FindPackageHandleStandardArgs.cmake
    876   cmake/FindEditline.cmake
    877   cmake/FindReadline.cmake
    878   CMakeLists.txt
    879   config-cmake.h.in
    880 
    881 (E) Auxiliary files for building PCRE2 "by hand":
    882 
    883   src/pcre2.h.generic     ) a version of the public PCRE2 header file
    884                           )   for use in non-"configure" environments
    885   src/config.h.generic    ) a version of config.h for use in non-"configure"
    886                           )   environments
    887 
    888 Philip Hazel
    889 Email local part: ph10
    890 Email domain: cam.ac.uk
    891 Last updated: 17 June 2018
    892