Home | History | Annotate | only in /external/chromium_org/third_party/sqlite/src/ext/fts3
Up to higher level directory
NameDateSize
fts3.c05-Aug-2015121.6K
fts3.h05-Aug-2015705
fts3_aux.c05-Aug-201514.6K
fts3_expr.c05-Aug-201529.3K
fts3_hash.c05-Aug-201511.3K
fts3_hash.h05-Aug-20154K
fts3_icu.c05-Aug-20156.6K
fts3_porter.c05-Aug-201516.9K
fts3_snippet.c05-Aug-201553.1K
fts3_tokenizer.c05-Aug-201513.5K
fts3_tokenizer.h05-Aug-20156K
fts3_tokenizer1.c05-Aug-20156.5K
fts3_write.c05-Aug-201589.5K
fts3Int.h05-Aug-201514.7K
fts3speed.tcl05-Aug-20152.7K
mkfts3amal.tcl05-Aug-20153.3K
README.syntax05-Aug-20159.3K
README.tokenizers05-Aug-20155.1K
README.txt05-Aug-2015219

README.syntax

      1 
      2 1. OVERVIEW
      3 
      4   This README file describes the syntax of the arguments that may be passed to
      5   the FTS3 MATCH operator used for full-text queries. For example, if table 
      6   "t1" is an Fts3 virtual table, the following SQL query:
      7 
      8     SELECT * FROM t1 WHERE <col> MATCH <full-text query>
      9 
     10   may be used to retrieve all rows that match a specified for full-text query. 
     11   The text "<col>" should be replaced by either the name of the fts3 table 
     12   (in this case "t1"), or by the name of one of the columns of the fts3 
     13   table. <full-text-query> should be replaced by an SQL expression that 
     14   computes to a string containing an Fts3 query.
     15 
     16   If the left-hand-side of the MATCH operator is set to the name of the
     17   fts3 table, then by default the query may be matched against any column
     18   of the table. If it is set to a column name, then by default the query
     19   may only match the specified column. In both cases this may be overriden
     20   as part of the query text (see sections 2 and 3 below).
     21 
     22   As of SQLite version 3.6.8, Fts3 supports two slightly different query 
     23   formats; the standard syntax, which is used by default, and the enhanced
     24   query syntax which can be selected by compiling with the pre-processor
     25   symbol SQLITE_ENABLE_FTS3_PARENTHESIS defined.
     26 
     27     -DSQLITE_ENABLE_FTS3_PARENTHESIS
     28 
     29 2. STANDARD QUERY SYNTAX
     30 
     31   When using the standard Fts3 query syntax, a query usually consists of a 
     32   list of terms (words) separated by white-space characters. To match a
     33   query, a row (or column) of an Fts3 table must contain each of the specified
     34   terms. For example, the following query:
     35 
     36     <col> MATCH 'hello world'
     37 
     38   matches rows (or columns, if <col> is the name of a column name) that 
     39   contain at least one instance of the token "hello", and at least one 
     40   instance of the token "world". Tokens may be grouped into phrases using
     41   quotation marks. In this case, a matching row or column must contain each
     42   of the tokens in the phrase in the order specified, with no intervening
     43   tokens. For example, the query:
     44 
     45     <col> MATCH '"hello world" joe"
     46 
     47   matches the first of the following two documents, but not the second or
     48   third:
     49 
     50     "'Hello world', said Joe."
     51     "One should always greet the world with a cheery hello, thought Joe."
     52     "How many hello world programs could their be?"
     53 
     54   As well as grouping tokens together by phrase, the binary NEAR operator 
     55   may be used to search for rows that contain two or more specified tokens 
     56   or phrases within a specified proximity of each other. The NEAR operator
     57   must always be specified in upper case. The word "near" in lower or mixed
     58   case is treated as an ordinary token. For example, the following query:
     59 
     60     <col> MATCH 'engineering NEAR consultancy'
     61 
     62   matches rows that contain both the "engineering" and "consultancy" tokens
     63   in the same column with not more than 10 other words between them. It does
     64   not matter which of the two terms occurs first in the document, only that
     65   they be seperated by only 10 tokens or less. The user may also specify
     66   a different required proximity by adding "/N" immediately after the NEAR
     67   operator, where N is an integer. For example:
     68 
     69     <col> MATCH 'engineering NEAR/5 consultancy'
     70 
     71   searches for a row containing an instance of each specified token seperated
     72   by not more than 5 other tokens. More than one NEAR operator can be used
     73   in as sequence. For example this query:
     74 
     75     <col> MATCH 'reliable NEAR/2 engineering NEAR/5 consultancy'
     76 
     77   searches for a row that contains an instance of the token "reliable" 
     78   seperated by not more than two tokens from an instance of "engineering",
     79   which is in turn separated by not more than 5 other tokens from an
     80   instance of the term "consultancy". Phrases enclosed in quotes may
     81   also be used as arguments to the NEAR operator.
     82 
     83   Similar to the NEAR operator, one or more tokens or phrases may be 
     84   separated by OR operators. In this case, only one of the specified tokens
     85   or phrases must appear in the document. For example, the query:
     86 
     87     <col> MATCH 'hello OR world'
     88 
     89   matches rows that contain either the term "hello", or the term "world",
     90   or both. Note that unlike in many programming languages, the OR operator
     91   has a higher precedence than the AND operators implied between white-space
     92   separated tokens. The following query matches documents that contain the
     93   term 'sqlite' and at least one of the terms 'fantastic' or 'impressive',
     94   not those that contain both 'sqlite' and 'fantastic' or 'impressive':
     95 
     96     <col> MATCH 'sqlite fantastic OR impressive'
     97 
     98   Any token that is part of an Fts3 query expression, whether or not it is
     99   part of a phrase enclosed in quotes, may have a '*' character appended to
    100   it. In this case, the token matches all terms that begin with the characters
    101   of the token, not just those that exactly match it. For example, the 
    102   following query:
    103 
    104     <col> MATCH 'sql*'
    105 
    106   matches all rows that contain the term "SQLite", as well as those that
    107   contain "SQL".
    108 
    109   A token that is not part of a quoted phrase may be preceded by a '-'
    110   character, which indicates that matching rows must not contain the 
    111   specified term. For example, the following:
    112 
    113     <col> MATCH '"database engine" -sqlite'
    114 
    115   matches rows that contain the phrase "database engine" but do not contain
    116   the term "sqlite". If the '-' character occurs inside a quoted phrase,
    117   it is ignored. It is possible to use both the '-' prefix and the '*' postfix
    118   on a single term. At this time, all Fts3 queries must contain at least
    119   one term or phrase that is not preceded by the '-' prefix.
    120 
    121   Regardless of whether or not a table name or column name is used on the 
    122   left hand side of the MATCH operator, a specific column of the fts3 table
    123   may be associated with each token in a query by preceding a token with
    124   a column name followed by a ':' character. For example, regardless of what
    125   is specified for <col>, the following query requires that column "col1"
    126   of the table contains the term "hello", and that column "col2" of the
    127   table contains the term "world". If the table does not contain columns
    128   named "col1" and "col2", then an error is returned and the query is
    129   not run.
    130 
    131     <col> MATCH 'col1:hello col2:world'
    132 
    133   It is not possible to associate a specific table column with a quoted 
    134   phrase or a term preceded by a '-' operator. A '*' character may be
    135   appended to a term associated with a specific column for prefix matching.
    136 
    137 3. ENHANCED QUERY SYNTAX
    138 
    139   The enhanced query syntax is quite similar to the standard query syntax,
    140   with the following four differences:
    141 
    142   1) Parenthesis are supported. When using the enhanced query syntax,
    143      parenthesis may be used to overcome the built-in precedence of the
    144      supplied binary operators. For example, the following query:
    145 
    146        <col> MATCH '(hello world) OR (simple example)'
    147 
    148      matches documents that contain both "hello" and "world", and documents
    149      that contain both "simple" and "example". It is not possible to forumlate
    150      such a query using the standard syntax.
    151 
    152   2) Instead of separating tokens and phrases by whitespace, an AND operator
    153      may be explicitly specified. This does not change query processing at
    154      all, but may be used to improve readability. For example, the following
    155      query is handled identically to the one above:
    156 
    157        <col> MATCH '(hello AND world) OR (simple AND example)'
    158 
    159      As with the OR and NEAR operators, the AND operator must be specified
    160      in upper case. The word "and" specified in lower or mixed case is 
    161      handled as a regular token.
    162 
    163   3) The '-' token prefix is not supported. Instead, a new binary operator,
    164      NOT, is included. The NOT operator requires that the query specified
    165      as its left-hand operator matches, but that the query specified as the
    166      right-hand operator does not. For example, to query for all rows that
    167      contain the term "example" but not the term "simple", the following
    168      query could be used:
    169 
    170        <col> MATCH 'example NOT simple'
    171 
    172      As for all other operators, the NOT operator must be specified in
    173      upper case. Otherwise it will be treated as a regular token.
    174 
    175   4) Unlike in the standard syntax, where the OR operator has a higher
    176      precedence than the implicit AND operator, when using the enhanced
    177      syntax implicit and explict AND operators have a higher precedence
    178      than OR operators. Using the enhanced syntax, the following two
    179      queries are equivalent:
    180 
    181        <col> MATCH 'sqlite fantastic OR impressive'
    182        <col> MATCH '(sqlite AND fantastic) OR impressive'
    183 
    184      however, when using the standard syntax, the query:
    185 
    186        <col> MATCH 'sqlite fantastic OR impressive'
    187 
    188      is equivalent to the enhanced syntax query:
    189 
    190        <col> MATCH 'sqlite AND (fantastic OR impressive)'
    191 
    192      The precedence of all enhanced syntax operators, in order from highest
    193      to lowest, is:
    194 
    195        NEAR       (highest precedence, tightest grouping)
    196        NOT
    197        AND
    198        OR         (lowest precedence, loosest grouping)
    199 
    200   Using the advanced syntax, it is possible to specify expressions enclosed
    201   in parenthesis as operands to the NOT, AND and OR operators. However both
    202   the left and right hand side operands of NEAR operators must be either
    203   tokens or phrases. Attempting the following query will return an error:
    204 
    205     <col> MATCH 'sqlite NEAR (fantastic OR impressive)'
    206 
    207   Queries of this form must be re-written as:
    208 
    209     <col> MATCH 'sqlite NEAR fantastic OR sqlite NEAR impressive'
    210 

README.tokenizers

      1 
      2 1. FTS3 Tokenizers
      3 
      4   When creating a new full-text table, FTS3 allows the user to select
      5   the text tokenizer implementation to be used when indexing text
      6   by specifying a "tokenize" clause as part of the CREATE VIRTUAL TABLE
      7   statement:
      8 
      9     CREATE VIRTUAL TABLE <table-name> USING fts3(
     10       <columns ...> [, tokenize <tokenizer-name> [<tokenizer-args>]]
     11     );
     12 
     13   The built-in tokenizers (valid values to pass as <tokenizer name>) are
     14   "simple" and "porter".
     15 
     16   <tokenizer-args> should consist of zero or more white-space separated
     17   arguments to pass to the selected tokenizer implementation. The 
     18   interpretation of the arguments, if any, depends on the individual 
     19   tokenizer.
     20 
     21 2. Custom Tokenizers
     22 
     23   FTS3 allows users to provide custom tokenizer implementations. The 
     24   interface used to create a new tokenizer is defined and described in 
     25   the fts3_tokenizer.h source file.
     26 
     27   Registering a new FTS3 tokenizer is similar to registering a new 
     28   virtual table module with SQLite. The user passes a pointer to a
     29   structure containing pointers to various callback functions that
     30   make up the implementation of the new tokenizer type. For tokenizers,
     31   the structure (defined in fts3_tokenizer.h) is called
     32   "sqlite3_tokenizer_module".
     33 
     34   FTS3 does not expose a C-function that users call to register new
     35   tokenizer types with a database handle. Instead, the pointer must
     36   be encoded as an SQL blob value and passed to FTS3 through the SQL
     37   engine by evaluating a special scalar function, "fts3_tokenizer()".
     38   The fts3_tokenizer() function may be called with one or two arguments,
     39   as follows:
     40 
     41     SELECT fts3_tokenizer(<tokenizer-name>);
     42     SELECT fts3_tokenizer(<tokenizer-name>, <sqlite3_tokenizer_module ptr>);
     43   
     44   Where <tokenizer-name> is a string identifying the tokenizer and
     45   <sqlite3_tokenizer_module ptr> is a pointer to an sqlite3_tokenizer_module
     46   structure encoded as an SQL blob. If the second argument is present,
     47   it is registered as tokenizer <tokenizer-name> and a copy of it
     48   returned. If only one argument is passed, a pointer to the tokenizer
     49   implementation currently registered as <tokenizer-name> is returned,
     50   encoded as a blob. Or, if no such tokenizer exists, an SQL exception
     51   (error) is raised.
     52 
     53   SECURITY: If the fts3 extension is used in an environment where potentially
     54     malicious users may execute arbitrary SQL (i.e. gears), they should be
     55     prevented from invoking the fts3_tokenizer() function, possibly using the
     56     authorisation callback.
     57 
     58   See "Sample code" below for an example of calling the fts3_tokenizer()
     59   function from C code.
     60 
     61 3. ICU Library Tokenizers
     62 
     63   If this extension is compiled with the SQLITE_ENABLE_ICU pre-processor 
     64   symbol defined, then there exists a built-in tokenizer named "icu" 
     65   implemented using the ICU library. The first argument passed to the
     66   xCreate() method (see fts3_tokenizer.h) of this tokenizer may be
     67   an ICU locale identifier. For example "tr_TR" for Turkish as used
     68   in Turkey, or "en_AU" for English as used in Australia. For example:
     69 
     70     "CREATE VIRTUAL TABLE thai_text USING fts3(text, tokenizer icu th_TH)"
     71 
     72   The ICU tokenizer implementation is very simple. It splits the input
     73   text according to the ICU rules for finding word boundaries and discards
     74   any tokens that consist entirely of white-space. This may be suitable
     75   for some applications in some locales, but not all. If more complex
     76   processing is required, for example to implement stemming or 
     77   discard punctuation, this can be done by creating a tokenizer 
     78   implementation that uses the ICU tokenizer as part of its implementation.
     79 
     80   When using the ICU tokenizer this way, it is safe to overwrite the
     81   contents of the strings returned by the xNext() method (see
     82   fts3_tokenizer.h).
     83 
     84 4. Sample code.
     85 
     86   The following two code samples illustrate the way C code should invoke
     87   the fts3_tokenizer() scalar function:
     88 
     89       int registerTokenizer(
     90         sqlite3 *db, 
     91         char *zName, 
     92         const sqlite3_tokenizer_module *p
     93       ){
     94         int rc;
     95         sqlite3_stmt *pStmt;
     96         const char zSql[] = "SELECT fts3_tokenizer(?, ?)";
     97       
     98         rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);
     99         if( rc!=SQLITE_OK ){
    100           return rc;
    101         }
    102       
    103         sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);
    104         sqlite3_bind_blob(pStmt, 2, &p, sizeof(p), SQLITE_STATIC);
    105         sqlite3_step(pStmt);
    106       
    107         return sqlite3_finalize(pStmt);
    108       }
    109       
    110       int queryTokenizer(
    111         sqlite3 *db, 
    112         char *zName,  
    113         const sqlite3_tokenizer_module **pp
    114       ){
    115         int rc;
    116         sqlite3_stmt *pStmt;
    117         const char zSql[] = "SELECT fts3_tokenizer(?)";
    118       
    119         *pp = 0;
    120         rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);
    121         if( rc!=SQLITE_OK ){
    122           return rc;
    123         }
    124       
    125         sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);
    126         if( SQLITE_ROW==sqlite3_step(pStmt) ){
    127           if( sqlite3_column_type(pStmt, 0)==SQLITE_BLOB ){
    128             memcpy(pp, sqlite3_column_blob(pStmt, 0), sizeof(*pp));
    129           }
    130         }
    131       
    132         return sqlite3_finalize(pStmt);
    133       }
    134 

README.txt

      1 This folder contains source code to the second full-text search
      2 extension for SQLite.  While the API is the same, this version uses a
      3 substantially different storage schema from fts1, so tables will need
      4 to be rebuilt.
      5