1 2 1. OVERVIEW 3 4 This README file describes the syntax of the arguments that may be passed to 5 the FTS3 MATCH operator used for full-text queries. For example, if table 6 "t1" is an Fts3 virtual table, the following SQL query: 7 8 SELECT * FROM t1 WHERE <col> MATCH <full-text query> 9 10 may be used to retrieve all rows that match a specified for full-text query. 11 The text "<col>" should be replaced by either the name of the fts3 table 12 (in this case "t1"), or by the name of one of the columns of the fts3 13 table. <full-text-query> should be replaced by an SQL expression that 14 computes to a string containing an Fts3 query. 15 16 If the left-hand-side of the MATCH operator is set to the name of the 17 fts3 table, then by default the query may be matched against any column 18 of the table. If it is set to a column name, then by default the query 19 may only match the specified column. In both cases this may be overriden 20 as part of the query text (see sections 2 and 3 below). 21 22 As of SQLite version 3.6.8, Fts3 supports two slightly different query 23 formats; the standard syntax, which is used by default, and the enhanced 24 query syntax which can be selected by compiling with the pre-processor 25 symbol SQLITE_ENABLE_FTS3_PARENTHESIS defined. 26 27 -DSQLITE_ENABLE_FTS3_PARENTHESIS 28 29 2. STANDARD QUERY SYNTAX 30 31 When using the standard Fts3 query syntax, a query usually consists of a 32 list of terms (words) separated by white-space characters. To match a 33 query, a row (or column) of an Fts3 table must contain each of the specified 34 terms. For example, the following query: 35 36 <col> MATCH 'hello world' 37 38 matches rows (or columns, if <col> is the name of a column name) that 39 contain at least one instance of the token "hello", and at least one 40 instance of the token "world". Tokens may be grouped into phrases using 41 quotation marks. In this case, a matching row or column must contain each 42 of the tokens in the phrase in the order specified, with no intervening 43 tokens. For example, the query: 44 45 <col> MATCH '"hello world" joe" 46 47 matches the first of the following two documents, but not the second or 48 third: 49 50 "'Hello world', said Joe." 51 "One should always greet the world with a cheery hello, thought Joe." 52 "How many hello world programs could their be?" 53 54 As well as grouping tokens together by phrase, the binary NEAR operator 55 may be used to search for rows that contain two or more specified tokens 56 or phrases within a specified proximity of each other. The NEAR operator 57 must always be specified in upper case. The word "near" in lower or mixed 58 case is treated as an ordinary token. For example, the following query: 59 60 <col> MATCH 'engineering NEAR consultancy' 61 62 matches rows that contain both the "engineering" and "consultancy" tokens 63 in the same column with not more than 10 other words between them. It does 64 not matter which of the two terms occurs first in the document, only that 65 they be seperated by only 10 tokens or less. The user may also specify 66 a different required proximity by adding "/N" immediately after the NEAR 67 operator, where N is an integer. For example: 68 69 <col> MATCH 'engineering NEAR/5 consultancy' 70 71 searches for a row containing an instance of each specified token seperated 72 by not more than 5 other tokens. More than one NEAR operator can be used 73 in as sequence. For example this query: 74 75 <col> MATCH 'reliable NEAR/2 engineering NEAR/5 consultancy' 76 77 searches for a row that contains an instance of the token "reliable" 78 seperated by not more than two tokens from an instance of "engineering", 79 which is in turn separated by not more than 5 other tokens from an 80 instance of the term "consultancy". Phrases enclosed in quotes may 81 also be used as arguments to the NEAR operator. 82 83 Similar to the NEAR operator, one or more tokens or phrases may be 84 separated by OR operators. In this case, only one of the specified tokens 85 or phrases must appear in the document. For example, the query: 86 87 <col> MATCH 'hello OR world' 88 89 matches rows that contain either the term "hello", or the term "world", 90 or both. Note that unlike in many programming languages, the OR operator 91 has a higher precedence than the AND operators implied between white-space 92 separated tokens. The following query matches documents that contain the 93 term 'sqlite' and at least one of the terms 'fantastic' or 'impressive', 94 not those that contain both 'sqlite' and 'fantastic' or 'impressive': 95 96 <col> MATCH 'sqlite fantastic OR impressive' 97 98 Any token that is part of an Fts3 query expression, whether or not it is 99 part of a phrase enclosed in quotes, may have a '*' character appended to 100 it. In this case, the token matches all terms that begin with the characters 101 of the token, not just those that exactly match it. For example, the 102 following query: 103 104 <col> MATCH 'sql*' 105 106 matches all rows that contain the term "SQLite", as well as those that 107 contain "SQL". 108 109 A token that is not part of a quoted phrase may be preceded by a '-' 110 character, which indicates that matching rows must not contain the 111 specified term. For example, the following: 112 113 <col> MATCH '"database engine" -sqlite' 114 115 matches rows that contain the phrase "database engine" but do not contain 116 the term "sqlite". If the '-' character occurs inside a quoted phrase, 117 it is ignored. It is possible to use both the '-' prefix and the '*' postfix 118 on a single term. At this time, all Fts3 queries must contain at least 119 one term or phrase that is not preceded by the '-' prefix. 120 121 Regardless of whether or not a table name or column name is used on the 122 left hand side of the MATCH operator, a specific column of the fts3 table 123 may be associated with each token in a query by preceding a token with 124 a column name followed by a ':' character. For example, regardless of what 125 is specified for <col>, the following query requires that column "col1" 126 of the table contains the term "hello", and that column "col2" of the 127 table contains the term "world". If the table does not contain columns 128 named "col1" and "col2", then an error is returned and the query is 129 not run. 130 131 <col> MATCH 'col1:hello col2:world' 132 133 It is not possible to associate a specific table column with a quoted 134 phrase or a term preceded by a '-' operator. A '*' character may be 135 appended to a term associated with a specific column for prefix matching. 136 137 3. ENHANCED QUERY SYNTAX 138 139 The enhanced query syntax is quite similar to the standard query syntax, 140 with the following four differences: 141 142 1) Parenthesis are supported. When using the enhanced query syntax, 143 parenthesis may be used to overcome the built-in precedence of the 144 supplied binary operators. For example, the following query: 145 146 <col> MATCH '(hello world) OR (simple example)' 147 148 matches documents that contain both "hello" and "world", and documents 149 that contain both "simple" and "example". It is not possible to forumlate 150 such a query using the standard syntax. 151 152 2) Instead of separating tokens and phrases by whitespace, an AND operator 153 may be explicitly specified. This does not change query processing at 154 all, but may be used to improve readability. For example, the following 155 query is handled identically to the one above: 156 157 <col> MATCH '(hello AND world) OR (simple AND example)' 158 159 As with the OR and NEAR operators, the AND operator must be specified 160 in upper case. The word "and" specified in lower or mixed case is 161 handled as a regular token. 162 163 3) The '-' token prefix is not supported. Instead, a new binary operator, 164 NOT, is included. The NOT operator requires that the query specified 165 as its left-hand operator matches, but that the query specified as the 166 right-hand operator does not. For example, to query for all rows that 167 contain the term "example" but not the term "simple", the following 168 query could be used: 169 170 <col> MATCH 'example NOT simple' 171 172 As for all other operators, the NOT operator must be specified in 173 upper case. Otherwise it will be treated as a regular token. 174 175 4) Unlike in the standard syntax, where the OR operator has a higher 176 precedence than the implicit AND operator, when using the enhanced 177 syntax implicit and explict AND operators have a higher precedence 178 than OR operators. Using the enhanced syntax, the following two 179 queries are equivalent: 180 181 <col> MATCH 'sqlite fantastic OR impressive' 182 <col> MATCH '(sqlite AND fantastic) OR impressive' 183 184 however, when using the standard syntax, the query: 185 186 <col> MATCH 'sqlite fantastic OR impressive' 187 188 is equivalent to the enhanced syntax query: 189 190 <col> MATCH 'sqlite AND (fantastic OR impressive)' 191 192 The precedence of all enhanced syntax operators, in order from highest 193 to lowest, is: 194 195 NEAR (highest precedence, tightest grouping) 196 NOT 197 AND 198 OR (lowest precedence, loosest grouping) 199 200 Using the advanced syntax, it is possible to specify expressions enclosed 201 in parenthesis as operands to the NOT, AND and OR operators. However both 202 the left and right hand side operands of NEAR operators must be either 203 tokens or phrases. Attempting the following query will return an error: 204 205 <col> MATCH 'sqlite NEAR (fantastic OR impressive)' 206 207 Queries of this form must be re-written as: 208 209 <col> MATCH 'sqlite NEAR fantastic OR sqlite NEAR impressive' 210