README  2007/05/31

Oniguruma  ----   (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

http://www.geocities.jp/kosako3/oniguruma/

Oniguruma is a regular expression library.
Its distinguishing feature is that a different character encoding
can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  BSD license.


Install

 Case 1: Unix and Cygwin platform

   1. ./configure
   2. make
   3. make install

   * uninstall

     make uninstall

   * test (ASCII/EUC-JP)

     make atest

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Win32 platform (VC++)

   1. copy win32\Makefile Makefile
   2. copy win32\config.h config.h
   3. nmake

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
   4. copy win32\testc.c testc.c
   5. nmake ctest



Regular Expressions

  See doc/RE (or doc/RE.ja for Japanese).


Usage

  Include oniguruma.h in your program. (Oniguruma API)
  See doc/API for the Oniguruma API.

  To disable the definition of the UChar type (== unsigned char)
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before
  including oniguruma.h.

  To disable the definition of the regex_t type in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.
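
  The ordering described above (define the macro first, include the
  header second) can be sketched as follows. This is only an
  illustration: the section marked "stand-in" is a simplified,
  hypothetical imitation of a header guarded by these two macros, not
  the actual contents of oniguruma.h. The application-side typedefs,
  the OnigRegexStandIn type and the helper functions are made up for
  the example, and OnigUChar is assumed to be the library-side name
  of the character type.

```c
#include <assert.h>
#include <stddef.h>

/* Application-side types that already use the contested names.
   Both typedefs are hypothetical, chosen only to provoke the collision. */
typedef unsigned short UChar;
typedef struct { int dummy; } regex_t;

/* Define the escape macros BEFORE the header is included. */
#define ONIG_ESCAPE_UCHAR_COLLISION
#define ONIG_ESCAPE_REGEX_T_COLLISION

/* ---- Stand-in for #include "oniguruma.h" (simplified, hypothetical) ---- */
typedef unsigned char OnigUChar;
typedef struct OnigRegexStandIn { int dummy; } OnigRegexStandIn;
#ifndef ONIG_ESCAPE_UCHAR_COLLISION
typedef OnigUChar UChar;            /* skipped: would redefine UChar */
#endif
#ifndef ONIG_ESCAPE_REGEX_T_COLLISION
typedef OnigRegexStandIn regex_t;   /* skipped: would redefine regex_t */
#endif
/* ---- end of stand-in ---- */

/* Sanity check: the application's UChar is untouched and the
   library-side character type is still one byte wide. */
void check_types(void) {
  assert(sizeof(UChar) == sizeof(unsigned short));
  assert(sizeof(OnigUChar) == 1);
}

/* Size of the library-side character type, which remains available. */
size_t lib_uchar_size(void) {
  return sizeof(OnigUChar);
}
```

  If either #define is removed, the stand-in above no longer compiles,
  because the same name is then declared with two different types. In a
  real program the stand-in section would be replaced by the actual
  #include of oniguruma.h.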

  Example of a compile/link command line on Unix or Cygwin
  (assuming prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  To use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler command line.



Sample Programs

  sample/simple.c    minimal example (Oniguruma API)
  sample/names.c     example of the named group callback.
  sample/encode.c    example of some encodings.
  sample/listcap.c   example of the capture history.
  sample/posix.c     POSIX API sample.
  sample/sql.c       example of the variable meta characters.
                     (SQL-like pattern matching)

Test Programs
  sample/syntax.c    Perl, Java and ASIS syntax test.
  sample/crnl.c      --enable-crnl-as-line-terminator test


Source Files

  oniguruma.h        Oniguruma API header file. (public)
  onig-config.in     configuration check program template.

  regenc.h           character encodings framework header file.
  regint.h           internal definitions
  regparse.h         internal definitions for regparse.c and regcomp.c
  regcomp.c          compiling and optimization functions
  regenc.c           character encodings framework.
  regerror.c         error message function
  regext.c           extended API functions. (deluxe version API)
  regexec.c          search and match functions
  regparse.c         parsing functions.
  regsyntax.c        pattern syntax functions and built-in syntax definitions.
  regtrav.c          capture history tree data traverse functions.
  regversion.c       version info function.
  st.h               hash table functions header file
  st.c               hash table functions

  oniggnu.h          GNU regex API header file. (public)
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header file. (public)
  regposerr.c        POSIX error message function.
  regposix.c         POSIX API functions.

  enc/mktable.c      character type table generator.
  enc/ascii.c        ASCII encoding.
  enc/euc_jp.c       EUC-JP encoding.
  enc/euc_tw.c       EUC-TW encoding.
  enc/euc_kr.c       EUC-KR, EUC-CN encoding.
  enc/sjis.c         Shift_JIS encoding.
  enc/big5.c         Big5 encoding.
  enc/gb18030.c      GB18030 encoding.
  enc/koi8.c         KOI8 encoding.
  enc/koi8_r.c       KOI8-R encoding.
  enc/cp1251.c       CP1251 encoding.
  enc/iso8859_1.c    ISO-8859-1 encoding. (Latin-1)
  enc/iso8859_2.c    ISO-8859-2 encoding. (Latin-2)
  enc/iso8859_3.c    ISO-8859-3 encoding. (Latin-3)
  enc/iso8859_4.c    ISO-8859-4 encoding. (Latin-4)
  enc/iso8859_5.c    ISO-8859-5 encoding. (Cyrillic)
  enc/iso8859_6.c    ISO-8859-6 encoding. (Arabic)
  enc/iso8859_7.c    ISO-8859-7 encoding. (Greek)
  enc/iso8859_8.c    ISO-8859-8 encoding. (Hebrew)
  enc/iso8859_9.c    ISO-8859-9 encoding. (Latin-5 or Turkish)
  enc/iso8859_10.c   ISO-8859-10 encoding. (Latin-6 or Nordic)
  enc/iso8859_11.c   ISO-8859-11 encoding. (Thai)
  enc/iso8859_13.c   ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
  enc/iso8859_14.c   ISO-8859-14 encoding. (Latin-8 or Celtic)
  enc/iso8859_15.c   ISO-8859-15 encoding. (Latin-9 or West European with Euro)
  enc/iso8859_16.c   ISO-8859-16 encoding.
                     (Latin-10 or South-Eastern European with Euro)
  enc/utf8.c         UTF-8 encoding.
  enc/utf16_be.c     UTF-16BE encoding.
  enc/utf16_le.c     UTF-16LE encoding.
  enc/utf32_be.c     UTF-32BE encoding.
  enc/utf32_le.c     UTF-32LE encoding.
  enc/unicode.c      Unicode information data.

  win32/Makefile     Makefile for Win32 (VC++)
  win32/config.h     config.h for Win32



ToDo

  ? case fold flag: Katakana <-> Hiragana.
  ? add ONIG_OPTION_NOTBOS/NOTEOS. (\A, \z, \Z)
 ?? \X (== \PM\pM*)
 ?? implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS.
 ?? transmission stopper. (return ONIG_STOP from match_at())

and I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp>