1 /* 2 ** 2006 Oct 10 3 ** 4 ** The author disclaims copyright to this source code. In place of 5 ** a legal notice, here is a blessing: 6 ** 7 ** May you do good and not evil. 8 ** May you find forgiveness for yourself and forgive others. 9 ** May you share freely, never taking more than you give. 10 ** 11 ****************************************************************************** 12 ** 13 ** This is an SQLite module implementing full-text search. 14 */ 15 16 /* 17 ** The code in this file is only compiled if: 18 ** 19 ** * The FTS3 module is being built as an extension 20 ** (in which case SQLITE_CORE is not defined), or 21 ** 22 ** * The FTS3 module is being built into the core of 23 ** SQLite (in which case SQLITE_ENABLE_FTS3 is defined). 24 */ 25 26 /* The full-text index is stored in a series of b+tree (-like) 27 ** structures called segments which map terms to doclists. The 28 ** structures are like b+trees in layout, but are constructed from the 29 ** bottom up in optimal fashion and are not updatable. Since trees 30 ** are built from the bottom up, things will be described from the 31 ** bottom up. 32 ** 33 ** 34 **** Varints **** 35 ** The basic unit of encoding is a variable-length integer called a 36 ** varint. We encode variable-length integers in little-endian order 37 ** using seven bits * per byte as follows: 38 ** 39 ** KEY: 40 ** A = 0xxxxxxx 7 bits of data and one flag bit 41 ** B = 1xxxxxxx 7 bits of data and one flag bit 42 ** 43 ** 7 bits - A 44 ** 14 bits - BA 45 ** 21 bits - BBA 46 ** and so on. 47 ** 48 ** This is similar in concept to how sqlite encodes "varints" but 49 ** the encoding is not the same. SQLite varints are big-endian 50 ** are are limited to 9 bytes in length whereas FTS3 varints are 51 ** little-endian and can be up to 10 bytes in length (in theory). 52 ** 53 ** Example encodings: 54 ** 55 ** 1: 0x01 56 ** 127: 0x7f 57 ** 128: 0x81 0x00 58 ** 59 ** 60 **** Document lists **** 61 ** A doclist (document list) holds a docid-sorted list of hits for a 62 ** given term. Doclists hold docids and associated token positions. 63 ** A docid is the unique integer identifier for a single document. 64 ** A position is the index of a word within the document. The first 65 ** word of the document has a position of 0. 66 ** 67 ** FTS3 used to optionally store character offsets using a compile-time 68 ** option. But that functionality is no longer supported. 69 ** 70 ** A doclist is stored like this: 71 ** 72 ** array { 73 ** varint docid; 74 ** array { (position list for column 0) 75 ** varint position; (2 more than the delta from previous position) 76 ** } 77 ** array { 78 ** varint POS_COLUMN; (marks start of position list for new column) 79 ** varint column; (index of new column) 80 ** array { 81 ** varint position; (2 more than the delta from previous position) 82 ** } 83 ** } 84 ** varint POS_END; (marks end of positions for this document. 85 ** } 86 ** 87 ** Here, array { X } means zero or more occurrences of X, adjacent in 88 ** memory. A "position" is an index of a token in the token stream 89 ** generated by the tokenizer. Note that POS_END and POS_COLUMN occur 90 ** in the same logical place as the position element, and act as sentinals 91 ** ending a position list array. POS_END is 0. POS_COLUMN is 1. 92 ** The positions numbers are not stored literally but rather as two more 93 ** than the difference from the prior position, or the just the position plus 94 ** 2 for the first position. Example: 95 ** 96 ** label: A B C D E F G H I J K 97 ** value: 123 5 9 1 1 14 35 0 234 72 0 98 ** 99 ** The 123 value is the first docid. For column zero in this document 100 ** there are two matches at positions 3 and 10 (5-2 and 9-2+3). The 1 101 ** at D signals the start of a new column; the 1 at E indicates that the 102 ** new column is column number 1. There are two positions at 12 and 45 103 ** (14-2 and 35-2+12). The 0 at H indicate the end-of-document. The 104 ** 234 at I is the next docid. It has one position 72 (72-2) and then 105 ** terminates with the 0 at K. 106 ** 107 ** A "position-list" is the list of positions for multiple columns for 108 ** a single docid. A "column-list" is the set of positions for a single 109 ** column. Hence, a position-list consists of one or more column-lists, 110 ** a document record consists of a docid followed by a position-list and 111 ** a doclist consists of one or more document records. 112 ** 113 ** A bare doclist omits the position information, becoming an 114 ** array of varint-encoded docids. 115 ** 116 **** Segment leaf nodes **** 117 ** Segment leaf nodes store terms and doclists, ordered by term. Leaf 118 ** nodes are written using LeafWriter, and read using LeafReader (to 119 ** iterate through a single leaf node's data) and LeavesReader (to 120 ** iterate through a segment's entire leaf layer). Leaf nodes have 121 ** the format: 122 ** 123 ** varint iHeight; (height from leaf level, always 0) 124 ** varint nTerm; (length of first term) 125 ** char pTerm[nTerm]; (content of first term) 126 ** varint nDoclist; (length of term's associated doclist) 127 ** char pDoclist[nDoclist]; (content of doclist) 128 ** array { 129 ** (further terms are delta-encoded) 130 ** varint nPrefix; (length of prefix shared with previous term) 131 ** varint nSuffix; (length of unshared suffix) 132 ** char pTermSuffix[nSuffix];(unshared suffix of next term) 133 ** varint nDoclist; (length of term's associated doclist) 134 ** char pDoclist[nDoclist]; (content of doclist) 135 ** } 136 ** 137 ** Here, array { X } means zero or more occurrences of X, adjacent in 138 ** memory. 139 ** 140 ** Leaf nodes are broken into blocks which are stored contiguously in 141 ** the %_segments table in sorted order. This means that when the end 142 ** of a node is reached, the next term is in the node with the next 143 ** greater node id. 144 ** 145 ** New data is spilled to a new leaf node when the current node 146 ** exceeds LEAF_MAX bytes (default 2048). New data which itself is 147 ** larger than STANDALONE_MIN (default 1024) is placed in a standalone 148 ** node (a leaf node with a single term and doclist). The goal of 149 ** these settings is to pack together groups of small doclists while 150 ** making it efficient to directly access large doclists. The 151 ** assumption is that large doclists represent terms which are more 152 ** likely to be query targets. 153 ** 154 ** TODO(shess) It may be useful for blocking decisions to be more 155 ** dynamic. For instance, it may make more sense to have a 2.5k leaf 156 ** node rather than splitting into 2k and .5k nodes. My intuition is 157 ** that this might extend through 2x or 4x the pagesize. 158 ** 159 ** 160 **** Segment interior nodes **** 161 ** Segment interior nodes store blockids for subtree nodes and terms 162 ** to describe what data is stored by the each subtree. Interior 163 ** nodes are written using InteriorWriter, and read using 164 ** InteriorReader. InteriorWriters are created as needed when 165 ** SegmentWriter creates new leaf nodes, or when an interior node 166 ** itself grows too big and must be split. The format of interior 167 ** nodes: 168 ** 169 ** varint iHeight; (height from leaf level, always >0) 170 ** varint iBlockid; (block id of node's leftmost subtree) 171 ** optional { 172 ** varint nTerm; (length of first term) 173 ** char pTerm[nTerm]; (content of first term) 174 ** array { 175 ** (further terms are delta-encoded) 176 ** varint nPrefix; (length of shared prefix with previous term) 177 ** varint nSuffix; (length of unshared suffix) 178 ** char pTermSuffix[nSuffix]; (unshared suffix of next term) 179 ** } 180 ** } 181 ** 182 ** Here, optional { X } means an optional element, while array { X } 183 ** means zero or more occurrences of X, adjacent in memory. 184 ** 185 ** An interior node encodes n terms separating n+1 subtrees. The 186 ** subtree blocks are contiguous, so only the first subtree's blockid 187 ** is encoded. The subtree at iBlockid will contain all terms less 188 ** than the first term encoded (or all terms if no term is encoded). 189 ** Otherwise, for terms greater than or equal to pTerm[i] but less 190 ** than pTerm[i+1], the subtree for that term will be rooted at 191 ** iBlockid+i. Interior nodes only store enough term data to 192 ** distinguish adjacent children (if the rightmost term of the left 193 ** child is "something", and the leftmost term of the right child is 194 ** "wicked", only "w" is stored). 195 ** 196 ** New data is spilled to a new interior node at the same height when 197 ** the current node exceeds INTERIOR_MAX bytes (default 2048). 198 ** INTERIOR_MIN_TERMS (default 7) keeps large terms from monopolizing 199 ** interior nodes and making the tree too skinny. The interior nodes 200 ** at a given height are naturally tracked by interior nodes at 201 ** height+1, and so on. 202 ** 203 ** 204 **** Segment directory **** 205 ** The segment directory in table %_segdir stores meta-information for 206 ** merging and deleting segments, and also the root node of the 207 ** segment's tree. 208 ** 209 ** The root node is the top node of the segment's tree after encoding 210 ** the entire segment, restricted to ROOT_MAX bytes (default 1024). 211 ** This could be either a leaf node or an interior node. If the top 212 ** node requires more than ROOT_MAX bytes, it is flushed to %_segments 213 ** and a new root interior node is generated (which should always fit 214 ** within ROOT_MAX because it only needs space for 2 varints, the 215 ** height and the blockid of the previous root). 216 ** 217 ** The meta-information in the segment directory is: 218 ** level - segment level (see below) 219 ** idx - index within level 220 ** - (level,idx uniquely identify a segment) 221 ** start_block - first leaf node 222 ** leaves_end_block - last leaf node 223 ** end_block - last block (including interior nodes) 224 ** root - contents of root node 225 ** 226 ** If the root node is a leaf node, then start_block, 227 ** leaves_end_block, and end_block are all 0. 228 ** 229 ** 230 **** Segment merging **** 231 ** To amortize update costs, segments are grouped into levels and 232 ** merged in batches. Each increase in level represents exponentially 233 ** more documents. 234 ** 235 ** New documents (actually, document updates) are tokenized and 236 ** written individually (using LeafWriter) to a level 0 segment, with 237 ** incrementing idx. When idx reaches MERGE_COUNT (default 16), all 238 ** level 0 segments are merged into a single level 1 segment. Level 1 239 ** is populated like level 0, and eventually MERGE_COUNT level 1 240 ** segments are merged to a single level 2 segment (representing 241 ** MERGE_COUNT^2 updates), and so on. 242 ** 243 ** A segment merge traverses all segments at a given level in 244 ** parallel, performing a straightforward sorted merge. Since segment 245 ** leaf nodes are written in to the %_segments table in order, this 246 ** merge traverses the underlying sqlite disk structures efficiently. 247 ** After the merge, all segment blocks from the merged level are 248 ** deleted. 249 ** 250 ** MERGE_COUNT controls how often we merge segments. 16 seems to be 251 ** somewhat of a sweet spot for insertion performance. 32 and 64 show 252 ** very similar performance numbers to 16 on insertion, though they're 253 ** a tiny bit slower (perhaps due to more overhead in merge-time 254 ** sorting). 8 is about 20% slower than 16, 4 about 50% slower than 255 ** 16, 2 about 66% slower than 16. 256 ** 257 ** At query time, high MERGE_COUNT increases the number of segments 258 ** which need to be scanned and merged. For instance, with 100k docs 259 ** inserted: 260 ** 261 ** MERGE_COUNT segments 262 ** 16 25 263 ** 8 12 264 ** 4 10 265 ** 2 6 266 ** 267 ** This appears to have only a moderate impact on queries for very 268 ** frequent terms (which are somewhat dominated by segment merge 269 ** costs), and infrequent and non-existent terms still seem to be fast 270 ** even with many segments. 271 ** 272 ** TODO(shess) That said, it would be nice to have a better query-side 273 ** argument for MERGE_COUNT of 16. Also, it is possible/likely that 274 ** optimizations to things like doclist merging will swing the sweet 275 ** spot around. 276 ** 277 ** 278 ** 279 **** Handling of deletions and updates **** 280 ** Since we're using a segmented structure, with no docid-oriented 281 ** index into the term index, we clearly cannot simply update the term 282 ** index when a document is deleted or updated. For deletions, we 283 ** write an empty doclist (varint(docid) varint(POS_END)), for updates 284 ** we simply write the new doclist. Segment merges overwrite older 285 ** data for a particular docid with newer data, so deletes or updates 286 ** will eventually overtake the earlier data and knock it out. The 287 ** query logic likewise merges doclists so that newer data knocks out 288 ** older data. 289 ** 290 ** TODO(shess) Provide a VACUUM type operation to clear out all 291 ** deletions and duplications. This would basically be a forced merge 292 ** into a single segment. 293 */ 294 #define CHROMIUM_FTS3_CHANGES 1 295 296 #if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS3) 297 298 #if defined(SQLITE_ENABLE_FTS3) && !defined(SQLITE_CORE) 299 # define SQLITE_CORE 1 300 #endif 301 302 #include "fts3Int.h" 303 304 #include <assert.h> 305 #include <stdlib.h> 306 #include <stddef.h> 307 #include <stdio.h> 308 #include <string.h> 309 #include <stdarg.h> 310 311 #include "fts3.h" 312 #ifndef SQLITE_CORE 313 # include "sqlite3ext.h" 314 SQLITE_EXTENSION_INIT1 315 #endif 316 317 /* 318 ** Write a 64-bit variable-length integer to memory starting at p[0]. 319 ** The length of data written will be between 1 and FTS3_VARINT_MAX bytes. 320 ** The number of bytes written is returned. 321 */ 322 int sqlite3Fts3PutVarint(char *p, sqlite_int64 v){ 323 unsigned char *q = (unsigned char *) p; 324 sqlite_uint64 vu = v; 325 do{ 326 *q++ = (unsigned char) ((vu & 0x7f) | 0x80); 327 vu >>= 7; 328 }while( vu!=0 ); 329 q[-1] &= 0x7f; /* turn off high bit in final byte */ 330 assert( q - (unsigned char *)p <= FTS3_VARINT_MAX ); 331 return (int) (q - (unsigned char *)p); 332 } 333 334 /* 335 ** Read a 64-bit variable-length integer from memory starting at p[0]. 336 ** Return the number of bytes read, or 0 on error. 337 ** The value is stored in *v. 338 */ 339 int sqlite3Fts3GetVarint(const char *p, sqlite_int64 *v){ 340 const unsigned char *q = (const unsigned char *) p; 341 sqlite_uint64 x = 0, y = 1; 342 while( (*q&0x80)==0x80 && q-(unsigned char *)p<FTS3_VARINT_MAX ){ 343 x += y * (*q++ & 0x7f); 344 y <<= 7; 345 } 346 x += y * (*q++); 347 *v = (sqlite_int64) x; 348 return (int) (q - (unsigned char *)p); 349 } 350 351 /* 352 ** Similar to sqlite3Fts3GetVarint(), except that the output is truncated to a 353 ** 32-bit integer before it is returned. 354 */ 355 int sqlite3Fts3GetVarint32(const char *p, int *pi){ 356 sqlite_int64 i; 357 int ret = sqlite3Fts3GetVarint(p, &i); 358 *pi = (int) i; 359 return ret; 360 } 361 362 /* 363 ** Return the number of bytes required to encode v as a varint 364 */ 365 int sqlite3Fts3VarintLen(sqlite3_uint64 v){ 366 int i = 0; 367 do{ 368 i++; 369 v >>= 7; 370 }while( v!=0 ); 371 return i; 372 } 373 374 /* 375 ** Convert an SQL-style quoted string into a normal string by removing 376 ** the quote characters. The conversion is done in-place. If the 377 ** input does not begin with a quote character, then this routine 378 ** is a no-op. 379 ** 380 ** Examples: 381 ** 382 ** "abc" becomes abc 383 ** 'xyz' becomes xyz 384 ** [pqr] becomes pqr 385 ** `mno` becomes mno 386 ** 387 */ 388 void sqlite3Fts3Dequote(char *z){ 389 char quote; /* Quote character (if any ) */ 390 391 quote = z[0]; 392 if( quote=='[' || quote=='\'' || quote=='"' || quote=='`' ){ 393 int iIn = 1; /* Index of next byte to read from input */ 394 int iOut = 0; /* Index of next byte to write to output */ 395 396 /* If the first byte was a '[', then the close-quote character is a ']' */ 397 if( quote=='[' ) quote = ']'; 398 399 while( ALWAYS(z[iIn]) ){ 400 if( z[iIn]==quote ){ 401 if( z[iIn+1]!=quote ) break; 402 z[iOut++] = quote; 403 iIn += 2; 404 }else{ 405 z[iOut++] = z[iIn++]; 406 } 407 } 408 z[iOut] = '\0'; 409 } 410 } 411 412 /* 413 ** Read a single varint from the doclist at *pp and advance *pp to point 414 ** to the first byte past the end of the varint. Add the value of the varint 415 ** to *pVal. 416 */ 417 static void fts3GetDeltaVarint(char **pp, sqlite3_int64 *pVal){ 418 sqlite3_int64 iVal; 419 *pp += sqlite3Fts3GetVarint(*pp, &iVal); 420 *pVal += iVal; 421 } 422 423 /* 424 ** As long as *pp has not reached its end (pEnd), then do the same 425 ** as fts3GetDeltaVarint(): read a single varint and add it to *pVal. 426 ** But if we have reached the end of the varint, just set *pp=0 and 427 ** leave *pVal unchanged. 428 */ 429 static void fts3GetDeltaVarint2(char **pp, char *pEnd, sqlite3_int64 *pVal){ 430 if( *pp>=pEnd ){ 431 *pp = 0; 432 }else{ 433 fts3GetDeltaVarint(pp, pVal); 434 } 435 } 436 437 /* 438 ** The xDisconnect() virtual table method. 439 */ 440 static int fts3DisconnectMethod(sqlite3_vtab *pVtab){ 441 Fts3Table *p = (Fts3Table *)pVtab; 442 int i; 443 444 assert( p->nPendingData==0 ); 445 assert( p->pSegments==0 ); 446 447 /* Free any prepared statements held */ 448 for(i=0; i<SizeofArray(p->aStmt); i++){ 449 sqlite3_finalize(p->aStmt[i]); 450 } 451 sqlite3_free(p->zSegmentsTbl); 452 sqlite3_free(p->zReadExprlist); 453 sqlite3_free(p->zWriteExprlist); 454 455 /* Invoke the tokenizer destructor to free the tokenizer. */ 456 p->pTokenizer->pModule->xDestroy(p->pTokenizer); 457 458 sqlite3_free(p); 459 return SQLITE_OK; 460 } 461 462 /* 463 ** Construct one or more SQL statements from the format string given 464 ** and then evaluate those statements. The success code is written 465 ** into *pRc. 466 ** 467 ** If *pRc is initially non-zero then this routine is a no-op. 468 */ 469 static void fts3DbExec( 470 int *pRc, /* Success code */ 471 sqlite3 *db, /* Database in which to run SQL */ 472 const char *zFormat, /* Format string for SQL */ 473 ... /* Arguments to the format string */ 474 ){ 475 va_list ap; 476 char *zSql; 477 if( *pRc ) return; 478 va_start(ap, zFormat); 479 zSql = sqlite3_vmprintf(zFormat, ap); 480 va_end(ap); 481 if( zSql==0 ){ 482 *pRc = SQLITE_NOMEM; 483 }else{ 484 *pRc = sqlite3_exec(db, zSql, 0, 0, 0); 485 sqlite3_free(zSql); 486 } 487 } 488 489 /* 490 ** The xDestroy() virtual table method. 491 */ 492 static int fts3DestroyMethod(sqlite3_vtab *pVtab){ 493 int rc = SQLITE_OK; /* Return code */ 494 Fts3Table *p = (Fts3Table *)pVtab; 495 sqlite3 *db = p->db; 496 497 /* Drop the shadow tables */ 498 fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_content'", p->zDb, p->zName); 499 fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_segments'", p->zDb,p->zName); 500 fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_segdir'", p->zDb, p->zName); 501 fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_docsize'", p->zDb, p->zName); 502 fts3DbExec(&rc, db, "DROP TABLE IF EXISTS %Q.'%q_stat'", p->zDb, p->zName); 503 504 /* If everything has worked, invoke fts3DisconnectMethod() to free the 505 ** memory associated with the Fts3Table structure and return SQLITE_OK. 506 ** Otherwise, return an SQLite error code. 507 */ 508 return (rc==SQLITE_OK ? fts3DisconnectMethod(pVtab) : rc); 509 } 510 511 512 /* 513 ** Invoke sqlite3_declare_vtab() to declare the schema for the FTS3 table 514 ** passed as the first argument. This is done as part of the xConnect() 515 ** and xCreate() methods. 516 ** 517 ** If *pRc is non-zero when this function is called, it is a no-op. 518 ** Otherwise, if an error occurs, an SQLite error code is stored in *pRc 519 ** before returning. 520 */ 521 static void fts3DeclareVtab(int *pRc, Fts3Table *p){ 522 if( *pRc==SQLITE_OK ){ 523 int i; /* Iterator variable */ 524 int rc; /* Return code */ 525 char *zSql; /* SQL statement passed to declare_vtab() */ 526 char *zCols; /* List of user defined columns */ 527 528 /* Create a list of user columns for the virtual table */ 529 zCols = sqlite3_mprintf("%Q, ", p->azColumn[0]); 530 for(i=1; zCols && i<p->nColumn; i++){ 531 zCols = sqlite3_mprintf("%z%Q, ", zCols, p->azColumn[i]); 532 } 533 534 /* Create the whole "CREATE TABLE" statement to pass to SQLite */ 535 zSql = sqlite3_mprintf( 536 "CREATE TABLE x(%s %Q HIDDEN, docid HIDDEN)", zCols, p->zName 537 ); 538 if( !zCols || !zSql ){ 539 rc = SQLITE_NOMEM; 540 }else{ 541 rc = sqlite3_declare_vtab(p->db, zSql); 542 } 543 544 sqlite3_free(zSql); 545 sqlite3_free(zCols); 546 *pRc = rc; 547 } 548 } 549 550 /* 551 ** Create the backing store tables (%_content, %_segments and %_segdir) 552 ** required by the FTS3 table passed as the only argument. This is done 553 ** as part of the vtab xCreate() method. 554 ** 555 ** If the p->bHasDocsize boolean is true (indicating that this is an 556 ** FTS4 table, not an FTS3 table) then also create the %_docsize and 557 ** %_stat tables required by FTS4. 558 */ 559 static int fts3CreateTables(Fts3Table *p){ 560 int rc = SQLITE_OK; /* Return code */ 561 int i; /* Iterator variable */ 562 char *zContentCols; /* Columns of %_content table */ 563 sqlite3 *db = p->db; /* The database connection */ 564 565 /* Create a list of user columns for the content table */ 566 zContentCols = sqlite3_mprintf("docid INTEGER PRIMARY KEY"); 567 for(i=0; zContentCols && i<p->nColumn; i++){ 568 char *z = p->azColumn[i]; 569 zContentCols = sqlite3_mprintf("%z, 'c%d%q'", zContentCols, i, z); 570 } 571 if( zContentCols==0 ) rc = SQLITE_NOMEM; 572 573 /* Create the content table */ 574 fts3DbExec(&rc, db, 575 "CREATE TABLE %Q.'%q_content'(%s)", 576 p->zDb, p->zName, zContentCols 577 ); 578 sqlite3_free(zContentCols); 579 /* Create other tables */ 580 fts3DbExec(&rc, db, 581 "CREATE TABLE %Q.'%q_segments'(blockid INTEGER PRIMARY KEY, block BLOB);", 582 p->zDb, p->zName 583 ); 584 fts3DbExec(&rc, db, 585 "CREATE TABLE %Q.'%q_segdir'(" 586 "level INTEGER," 587 "idx INTEGER," 588 "start_block INTEGER," 589 "leaves_end_block INTEGER," 590 "end_block INTEGER," 591 "root BLOB," 592 "PRIMARY KEY(level, idx)" 593 ");", 594 p->zDb, p->zName 595 ); 596 if( p->bHasDocsize ){ 597 fts3DbExec(&rc, db, 598 "CREATE TABLE %Q.'%q_docsize'(docid INTEGER PRIMARY KEY, size BLOB);", 599 p->zDb, p->zName 600 ); 601 } 602 if( p->bHasStat ){ 603 fts3DbExec(&rc, db, 604 "CREATE TABLE %Q.'%q_stat'(id INTEGER PRIMARY KEY, value BLOB);", 605 p->zDb, p->zName 606 ); 607 } 608 return rc; 609 } 610 611 /* 612 ** Store the current database page-size in bytes in p->nPgsz. 613 ** 614 ** If *pRc is non-zero when this function is called, it is a no-op. 615 ** Otherwise, if an error occurs, an SQLite error code is stored in *pRc 616 ** before returning. 617 */ 618 static void fts3DatabasePageSize(int *pRc, Fts3Table *p){ 619 if( *pRc==SQLITE_OK ){ 620 int rc; /* Return code */ 621 char *zSql; /* SQL text "PRAGMA %Q.page_size" */ 622 sqlite3_stmt *pStmt; /* Compiled "PRAGMA %Q.page_size" statement */ 623 624 zSql = sqlite3_mprintf("PRAGMA %Q.page_size", p->zDb); 625 if( !zSql ){ 626 rc = SQLITE_NOMEM; 627 }else{ 628 rc = sqlite3_prepare(p->db, zSql, -1, &pStmt, 0); 629 if( rc==SQLITE_OK ){ 630 sqlite3_step(pStmt); 631 p->nPgsz = sqlite3_column_int(pStmt, 0); 632 rc = sqlite3_finalize(pStmt); 633 }else if( rc==SQLITE_AUTH ){ 634 p->nPgsz = 1024; 635 rc = SQLITE_OK; 636 } 637 } 638 assert( p->nPgsz>0 || rc!=SQLITE_OK ); 639 sqlite3_free(zSql); 640 *pRc = rc; 641 } 642 } 643 644 /* 645 ** "Special" FTS4 arguments are column specifications of the following form: 646 ** 647 ** <key> = <value> 648 ** 649 ** There may not be whitespace surrounding the "=" character. The <value> 650 ** term may be quoted, but the <key> may not. 651 */ 652 static int fts3IsSpecialColumn( 653 const char *z, 654 int *pnKey, 655 char **pzValue 656 ){ 657 char *zValue; 658 const char *zCsr = z; 659 660 while( *zCsr!='=' ){ 661 if( *zCsr=='\0' ) return 0; 662 zCsr++; 663 } 664 665 *pnKey = (int)(zCsr-z); 666 zValue = sqlite3_mprintf("%s", &zCsr[1]); 667 if( zValue ){ 668 sqlite3Fts3Dequote(zValue); 669 } 670 *pzValue = zValue; 671 return 1; 672 } 673 674 /* 675 ** Append the output of a printf() style formatting to an existing string. 676 */ 677 static void fts3Appendf( 678 int *pRc, /* IN/OUT: Error code */ 679 char **pz, /* IN/OUT: Pointer to string buffer */ 680 const char *zFormat, /* Printf format string to append */ 681 ... /* Arguments for printf format string */ 682 ){ 683 if( *pRc==SQLITE_OK ){ 684 va_list ap; 685 char *z; 686 va_start(ap, zFormat); 687 z = sqlite3_vmprintf(zFormat, ap); 688 if( z && *pz ){ 689 char *z2 = sqlite3_mprintf("%s%s", *pz, z); 690 sqlite3_free(z); 691 z = z2; 692 } 693 if( z==0 ) *pRc = SQLITE_NOMEM; 694 sqlite3_free(*pz); 695 *pz = z; 696 } 697 } 698 699 /* 700 ** Return a copy of input string zInput enclosed in double-quotes (") and 701 ** with all double quote characters escaped. For example: 702 ** 703 ** fts3QuoteId("un \"zip\"") -> "un \"\"zip\"\"" 704 ** 705 ** The pointer returned points to memory obtained from sqlite3_malloc(). It 706 ** is the callers responsibility to call sqlite3_free() to release this 707 ** memory. 708 */ 709 static char *fts3QuoteId(char const *zInput){ 710 int nRet; 711 char *zRet; 712 nRet = 2 + strlen(zInput)*2 + 1; 713 zRet = sqlite3_malloc(nRet); 714 if( zRet ){ 715 int i; 716 char *z = zRet; 717 *(z++) = '"'; 718 for(i=0; zInput[i]; i++){ 719 if( zInput[i]=='"' ) *(z++) = '"'; 720 *(z++) = zInput[i]; 721 } 722 *(z++) = '"'; 723 *(z++) = '\0'; 724 } 725 return zRet; 726 } 727 728 /* 729 ** Return a list of comma separated SQL expressions that could be used 730 ** in a SELECT statement such as the following: 731 ** 732 ** SELECT <list of expressions> FROM %_content AS x ... 733 ** 734 ** to return the docid, followed by each column of text data in order 735 ** from left to write. If parameter zFunc is not NULL, then instead of 736 ** being returned directly each column of text data is passed to an SQL 737 ** function named zFunc first. For example, if zFunc is "unzip" and the 738 ** table has the three user-defined columns "a", "b", and "c", the following 739 ** string is returned: 740 ** 741 ** "docid, unzip(x.'a'), unzip(x.'b'), unzip(x.'c')" 742 ** 743 ** The pointer returned points to a buffer allocated by sqlite3_malloc(). It 744 ** is the responsibility of the caller to eventually free it. 745 ** 746 ** If *pRc is not SQLITE_OK when this function is called, it is a no-op (and 747 ** a NULL pointer is returned). Otherwise, if an OOM error is encountered 748 ** by this function, NULL is returned and *pRc is set to SQLITE_NOMEM. If 749 ** no error occurs, *pRc is left unmodified. 750 */ 751 static char *fts3ReadExprList(Fts3Table *p, const char *zFunc, int *pRc){ 752 char *zRet = 0; 753 char *zFree = 0; 754 char *zFunction; 755 int i; 756 757 if( !zFunc ){ 758 zFunction = ""; 759 }else{ 760 zFree = zFunction = fts3QuoteId(zFunc); 761 } 762 fts3Appendf(pRc, &zRet, "docid"); 763 for(i=0; i<p->nColumn; i++){ 764 fts3Appendf(pRc, &zRet, ",%s(x.'c%d%q')", zFunction, i, p->azColumn[i]); 765 } 766 sqlite3_free(zFree); 767 return zRet; 768 } 769 770 /* 771 ** Return a list of N comma separated question marks, where N is the number 772 ** of columns in the %_content table (one for the docid plus one for each 773 ** user-defined text column). 774 ** 775 ** If argument zFunc is not NULL, then all but the first question mark 776 ** is preceded by zFunc and an open bracket, and followed by a closed 777 ** bracket. For example, if zFunc is "zip" and the FTS3 table has three 778 ** user-defined text columns, the following string is returned: 779 ** 780 ** "?, zip(?), zip(?), zip(?)" 781 ** 782 ** The pointer returned points to a buffer allocated by sqlite3_malloc(). It 783 ** is the responsibility of the caller to eventually free it. 784 ** 785 ** If *pRc is not SQLITE_OK when this function is called, it is a no-op (and 786 ** a NULL pointer is returned). Otherwise, if an OOM error is encountered 787 ** by this function, NULL is returned and *pRc is set to SQLITE_NOMEM. If 788 ** no error occurs, *pRc is left unmodified. 789 */ 790 static char *fts3WriteExprList(Fts3Table *p, const char *zFunc, int *pRc){ 791 char *zRet = 0; 792 char *zFree = 0; 793 char *zFunction; 794 int i; 795 796 if( !zFunc ){ 797 zFunction = ""; 798 }else{ 799 zFree = zFunction = fts3QuoteId(zFunc); 800 } 801 fts3Appendf(pRc, &zRet, "?"); 802 for(i=0; i<p->nColumn; i++){ 803 fts3Appendf(pRc, &zRet, ",%s(?)", zFunction); 804 } 805 sqlite3_free(zFree); 806 return zRet; 807 } 808 809 /* 810 ** This function is the implementation of both the xConnect and xCreate 811 ** methods of the FTS3 virtual table. 812 ** 813 ** The argv[] array contains the following: 814 ** 815 ** argv[0] -> module name ("fts3" or "fts4") 816 ** argv[1] -> database name 817 ** argv[2] -> table name 818 ** argv[...] -> "column name" and other module argument fields. 819 */ 820 static int fts3InitVtab( 821 int isCreate, /* True for xCreate, false for xConnect */ 822 sqlite3 *db, /* The SQLite database connection */ 823 void *pAux, /* Hash table containing tokenizers */ 824 int argc, /* Number of elements in argv array */ 825 const char * const *argv, /* xCreate/xConnect argument array */ 826 sqlite3_vtab **ppVTab, /* Write the resulting vtab structure here */ 827 char **pzErr /* Write any error message here */ 828 ){ 829 Fts3Hash *pHash = (Fts3Hash *)pAux; 830 Fts3Table *p = 0; /* Pointer to allocated vtab */ 831 int rc = SQLITE_OK; /* Return code */ 832 int i; /* Iterator variable */ 833 int nByte; /* Size of allocation used for *p */ 834 int iCol; /* Column index */ 835 int nString = 0; /* Bytes required to hold all column names */ 836 int nCol = 0; /* Number of columns in the FTS table */ 837 char *zCsr; /* Space for holding column names */ 838 int nDb; /* Bytes required to hold database name */ 839 int nName; /* Bytes required to hold table name */ 840 int isFts4 = (argv[0][3]=='4'); /* True for FTS4, false for FTS3 */ 841 int bNoDocsize = 0; /* True to omit %_docsize table */ 842 const char **aCol; /* Array of column names */ 843 sqlite3_tokenizer *pTokenizer = 0; /* Tokenizer for this table */ 844 845 char *zCompress = 0; 846 char *zUncompress = 0; 847 848 assert( strlen(argv[0])==4 ); 849 assert( (sqlite3_strnicmp(argv[0], "fts4", 4)==0 && isFts4) 850 || (sqlite3_strnicmp(argv[0], "fts3", 4)==0 && !isFts4) 851 ); 852 853 nDb = (int)strlen(argv[1]) + 1; 854 nName = (int)strlen(argv[2]) + 1; 855 856 aCol = (const char **)sqlite3_malloc(sizeof(const char *) * (argc-2) ); 857 if( !aCol ) return SQLITE_NOMEM; 858 memset((void *)aCol, 0, sizeof(const char *) * (argc-2)); 859 860 /* Loop through all of the arguments passed by the user to the FTS3/4 861 ** module (i.e. all the column names and special arguments). This loop 862 ** does the following: 863 ** 864 ** + Figures out the number of columns the FTSX table will have, and 865 ** the number of bytes of space that must be allocated to store copies 866 ** of the column names. 867 ** 868 ** + If there is a tokenizer specification included in the arguments, 869 ** initializes the tokenizer pTokenizer. 870 */ 871 for(i=3; rc==SQLITE_OK && i<argc; i++){ 872 char const *z = argv[i]; 873 int nKey; 874 char *zVal; 875 876 /* Check if this is a tokenizer specification */ 877 if( !pTokenizer 878 && strlen(z)>8 879 && 0==sqlite3_strnicmp(z, "tokenize", 8) 880 && 0==sqlite3Fts3IsIdChar(z[8]) 881 ){ 882 rc = sqlite3Fts3InitTokenizer(pHash, &z[9], &pTokenizer, pzErr); 883 } 884 885 /* Check if it is an FTS4 special argument. */ 886 else if( isFts4 && fts3IsSpecialColumn(z, &nKey, &zVal) ){ 887 if( !zVal ){ 888 rc = SQLITE_NOMEM; 889 goto fts3_init_out; 890 } 891 if( nKey==9 && 0==sqlite3_strnicmp(z, "matchinfo", 9) ){ 892 if( strlen(zVal)==4 && 0==sqlite3_strnicmp(zVal, "fts3", 4) ){ 893 bNoDocsize = 1; 894 }else{ 895 *pzErr = sqlite3_mprintf("unrecognized matchinfo: %s", zVal); 896 rc = SQLITE_ERROR; 897 } 898 }else if( nKey==8 && 0==sqlite3_strnicmp(z, "compress", 8) ){ 899 zCompress = zVal; 900 zVal = 0; 901 }else if( nKey==10 && 0==sqlite3_strnicmp(z, "uncompress", 10) ){ 902 zUncompress = zVal; 903 zVal = 0; 904 }else{ 905 *pzErr = sqlite3_mprintf("unrecognized parameter: %s", z); 906 rc = SQLITE_ERROR; 907 } 908 sqlite3_free(zVal); 909 } 910 911 /* Otherwise, the argument is a column name. */ 912 else { 913 nString += (int)(strlen(z) + 1); 914 aCol[nCol++] = z; 915 } 916 } 917 if( rc!=SQLITE_OK ) goto fts3_init_out; 918 919 if( nCol==0 ){ 920 assert( nString==0 ); 921 aCol[0] = "content"; 922 nString = 8; 923 nCol = 1; 924 } 925 926 if( pTokenizer==0 ){ 927 rc = sqlite3Fts3InitTokenizer(pHash, "simple", &pTokenizer, pzErr); 928 if( rc!=SQLITE_OK ) goto fts3_init_out; 929 } 930 assert( pTokenizer ); 931 932 933 /* Allocate and populate the Fts3Table structure. */ 934 nByte = sizeof(Fts3Table) + /* Fts3Table */ 935 nCol * sizeof(char *) + /* azColumn */ 936 nName + /* zName */ 937 nDb + /* zDb */ 938 nString; /* Space for azColumn strings */ 939 p = (Fts3Table*)sqlite3_malloc(nByte); 940 if( p==0 ){ 941 rc = SQLITE_NOMEM; 942 goto fts3_init_out; 943 } 944 memset(p, 0, nByte); 945 p->db = db; 946 p->nColumn = nCol; 947 p->nPendingData = 0; 948 p->azColumn = (char **)&p[1]; 949 p->pTokenizer = pTokenizer; 950 p->nNodeSize = 1000; 951 p->nMaxPendingData = FTS3_MAX_PENDING_DATA; 952 p->bHasDocsize = (isFts4 && bNoDocsize==0); 953 p->bHasStat = isFts4; 954 fts3HashInit(&p->pendingTerms, FTS3_HASH_STRING, 1); 955 956 /* Fill in the zName and zDb fields of the vtab structure. */ 957 zCsr = (char *)&p->azColumn[nCol]; 958 p->zName = zCsr; 959 memcpy(zCsr, argv[2], nName); 960 zCsr += nName; 961 p->zDb = zCsr; 962 memcpy(zCsr, argv[1], nDb); 963 zCsr += nDb; 964 965 /* Fill in the azColumn array */ 966 for(iCol=0; iCol<nCol; iCol++){ 967 char *z; 968 int n; 969 z = (char *)sqlite3Fts3NextToken(aCol[iCol], &n); 970 memcpy(zCsr, z, n); 971 zCsr[n] = '\0'; 972 sqlite3Fts3Dequote(zCsr); 973 p->azColumn[iCol] = zCsr; 974 zCsr += n+1; 975 assert( zCsr <= &((char *)p)[nByte] ); 976 } 977 978 if( (zCompress==0)!=(zUncompress==0) ){ 979 char const *zMiss = (zCompress==0 ? "compress" : "uncompress"); 980 rc = SQLITE_ERROR; 981 *pzErr = sqlite3_mprintf("missing %s parameter in fts4 constructor", zMiss); 982 } 983 p->zReadExprlist = fts3ReadExprList(p, zUncompress, &rc); 984 p->zWriteExprlist = fts3WriteExprList(p, zCompress, &rc); 985 if( rc!=SQLITE_OK ) goto fts3_init_out; 986 987 /* If this is an xCreate call, create the underlying tables in the 988 ** database. TODO: For xConnect(), it could verify that said tables exist. 989 */ 990 if( isCreate ){ 991 rc = fts3CreateTables(p); 992 } 993 994 /* Figure out the page-size for the database. This is required in order to 995 ** estimate the cost of loading large doclists from the database (see 996 ** function sqlite3Fts3SegReaderCost() for details). 997 */ 998 fts3DatabasePageSize(&rc, p); 999 1000 /* Declare the table schema to SQLite. */ 1001 fts3DeclareVtab(&rc, p); 1002 1003 fts3_init_out: 1004 sqlite3_free(zCompress); 1005 sqlite3_free(zUncompress); 1006 sqlite3_free((void *)aCol); 1007 if( rc!=SQLITE_OK ){ 1008 if( p ){ 1009 fts3DisconnectMethod((sqlite3_vtab *)p); 1010 }else if( pTokenizer ){ 1011 pTokenizer->pModule->xDestroy(pTokenizer); 1012 } 1013 }else{ 1014 *ppVTab = &p->base; 1015 } 1016 return rc; 1017 } 1018 1019 /* 1020 ** The xConnect() and xCreate() methods for the virtual table. All the 1021 ** work is done in function fts3InitVtab(). 1022 */ 1023 static int fts3ConnectMethod( 1024 sqlite3 *db, /* Database connection */ 1025 void *pAux, /* Pointer to tokenizer hash table */ 1026 int argc, /* Number of elements in argv array */ 1027 const char * const *argv, /* xCreate/xConnect argument array */ 1028 sqlite3_vtab **ppVtab, /* OUT: New sqlite3_vtab object */ 1029 char **pzErr /* OUT: sqlite3_malloc'd error message */ 1030 ){ 1031 return fts3InitVtab(0, db, pAux, argc, argv, ppVtab, pzErr); 1032 } 1033 static int fts3CreateMethod( 1034 sqlite3 *db, /* Database connection */ 1035 void *pAux, /* Pointer to tokenizer hash table */ 1036 int argc, /* Number of elements in argv array */ 1037 const char * const *argv, /* xCreate/xConnect argument array */ 1038 sqlite3_vtab **ppVtab, /* OUT: New sqlite3_vtab object */ 1039 char **pzErr /* OUT: sqlite3_malloc'd error message */ 1040 ){ 1041 return fts3InitVtab(1, db, pAux, argc, argv, ppVtab, pzErr); 1042 } 1043 1044 /* 1045 ** Implementation of the xBestIndex method for FTS3 tables. There 1046 ** are three possible strategies, in order of preference: 1047 ** 1048 ** 1. Direct lookup by rowid or docid. 1049 ** 2. Full-text search using a MATCH operator on a non-docid column. 1050 ** 3. Linear scan of %_content table. 1051 */ 1052 static int fts3BestIndexMethod(sqlite3_vtab *pVTab, sqlite3_index_info *pInfo){ 1053 Fts3Table *p = (Fts3Table *)pVTab; 1054 int i; /* Iterator variable */ 1055 int iCons = -1; /* Index of constraint to use */ 1056 1057 /* By default use a full table scan. This is an expensive option, 1058 ** so search through the constraints to see if a more efficient 1059 ** strategy is possible. 1060 */ 1061 pInfo->idxNum = FTS3_FULLSCAN_SEARCH; 1062 pInfo->estimatedCost = 500000; 1063 for(i=0; i<pInfo->nConstraint; i++){ 1064 struct sqlite3_index_constraint *pCons = &pInfo->aConstraint[i]; 1065 if( pCons->usable==0 ) continue; 1066 1067 /* A direct lookup on the rowid or docid column. Assign a cost of 1.0. */ 1068 if( pCons->op==SQLITE_INDEX_CONSTRAINT_EQ 1069 && (pCons->iColumn<0 || pCons->iColumn==p->nColumn+1 ) 1070 ){ 1071 pInfo->idxNum = FTS3_DOCID_SEARCH; 1072 pInfo->estimatedCost = 1.0; 1073 iCons = i; 1074 } 1075 1076 /* A MATCH constraint. Use a full-text search. 1077 ** 1078 ** If there is more than one MATCH constraint available, use the first 1079 ** one encountered. If there is both a MATCH constraint and a direct 1080 ** rowid/docid lookup, prefer the MATCH strategy. This is done even 1081 ** though the rowid/docid lookup is faster than a MATCH query, selecting 1082 ** it would lead to an "unable to use function MATCH in the requested 1083 ** context" error. 1084 */ 1085 if( pCons->op==SQLITE_INDEX_CONSTRAINT_MATCH 1086 && pCons->iColumn>=0 && pCons->iColumn<=p->nColumn 1087 ){ 1088 pInfo->idxNum = FTS3_FULLTEXT_SEARCH + pCons->iColumn; 1089 pInfo->estimatedCost = 2.0; 1090 iCons = i; 1091 break; 1092 } 1093 } 1094 1095 if( iCons>=0 ){ 1096 pInfo->aConstraintUsage[iCons].argvIndex = 1; 1097 pInfo->aConstraintUsage[iCons].omit = 1; 1098 } 1099 return SQLITE_OK; 1100 } 1101 1102 /* 1103 ** Implementation of xOpen method. 1104 */ 1105 static int fts3OpenMethod(sqlite3_vtab *pVTab, sqlite3_vtab_cursor **ppCsr){ 1106 sqlite3_vtab_cursor *pCsr; /* Allocated cursor */ 1107 1108 UNUSED_PARAMETER(pVTab); 1109 1110 /* Allocate a buffer large enough for an Fts3Cursor structure. If the 1111 ** allocation succeeds, zero it and return SQLITE_OK. Otherwise, 1112 ** if the allocation fails, return SQLITE_NOMEM. 1113 */ 1114 *ppCsr = pCsr = (sqlite3_vtab_cursor *)sqlite3_malloc(sizeof(Fts3Cursor)); 1115 if( !pCsr ){ 1116 return SQLITE_NOMEM; 1117 } 1118 memset(pCsr, 0, sizeof(Fts3Cursor)); 1119 return SQLITE_OK; 1120 } 1121 1122 /* 1123 ** Close the cursor. For additional information see the documentation 1124 ** on the xClose method of the virtual table interface. 1125 */ 1126 static int fts3CloseMethod(sqlite3_vtab_cursor *pCursor){ 1127 Fts3Cursor *pCsr = (Fts3Cursor *)pCursor; 1128 assert( ((Fts3Table *)pCsr->base.pVtab)->pSegments==0 ); 1129 sqlite3_finalize(pCsr->pStmt); 1130 sqlite3Fts3ExprFree(pCsr->pExpr); 1131 sqlite3Fts3FreeDeferredTokens(pCsr); 1132 sqlite3_free(pCsr->aDoclist); 1133 sqlite3_free(pCsr->aMatchinfo); 1134 sqlite3_free(pCsr); 1135 return SQLITE_OK; 1136 } 1137 1138 /* 1139 ** Position the pCsr->pStmt statement so that it is on the row 1140 ** of the %_content table that contains the last match. Return 1141 ** SQLITE_OK on success. 1142 */ 1143 static int fts3CursorSeek(sqlite3_context *pContext, Fts3Cursor *pCsr){ 1144 if( pCsr->isRequireSeek ){ 1145 pCsr->isRequireSeek = 0; 1146 sqlite3_bind_int64(pCsr->pStmt, 1, pCsr->iPrevId); 1147 if( SQLITE_ROW==sqlite3_step(pCsr->pStmt) ){ 1148 return SQLITE_OK; 1149 }else{ 1150 int rc = sqlite3_reset(pCsr->pStmt); 1151 if( rc==SQLITE_OK ){ 1152 /* If no row was found and no error has occured, then the %_content 1153 ** table is missing a row that is present in the full-text index. 1154 ** The data structures are corrupt. 1155 */ 1156 rc = SQLITE_CORRUPT; 1157 } 1158 pCsr->isEof = 1; 1159 if( pContext ){ 1160 sqlite3_result_error_code(pContext, rc); 1161 } 1162 return rc; 1163 } 1164 }else{ 1165 return SQLITE_OK; 1166 } 1167 } 1168 1169 /* 1170 ** This function is used to process a single interior node when searching 1171 ** a b-tree for a term or term prefix. The node data is passed to this 1172 ** function via the zNode/nNode parameters. The term to search for is 1173 ** passed in zTerm/nTerm. 1174 ** 1175 ** If piFirst is not NULL, then this function sets *piFirst to the blockid 1176 ** of the child node that heads the sub-tree that may contain the term. 1177 ** 1178 ** If piLast is not NULL, then *piLast is set to the right-most child node 1179 ** that heads a sub-tree that may contain a term for which zTerm/nTerm is 1180 ** a prefix. 1181 ** 1182 ** If an OOM error occurs, SQLITE_NOMEM is returned. Otherwise, SQLITE_OK. 1183 */ 1184 static int fts3ScanInteriorNode( 1185 const char *zTerm, /* Term to select leaves for */ 1186 int nTerm, /* Size of term zTerm in bytes */ 1187 const char *zNode, /* Buffer containing segment interior node */ 1188 int nNode, /* Size of buffer at zNode */ 1189 sqlite3_int64 *piFirst, /* OUT: Selected child node */ 1190 sqlite3_int64 *piLast /* OUT: Selected child node */ 1191 ){ 1192 int rc = SQLITE_OK; /* Return code */ 1193 const char *zCsr = zNode; /* Cursor to iterate through node */ 1194 const char *zEnd = &zCsr[nNode];/* End of interior node buffer */ 1195 char *zBuffer = 0; /* Buffer to load terms into */ 1196 int nAlloc = 0; /* Size of allocated buffer */ 1197 int isFirstTerm = 1; /* True when processing first term on page */ 1198 sqlite3_int64 iChild; /* Block id of child node to descend to */ 1199 1200 /* Skip over the 'height' varint that occurs at the start of every 1201 ** interior node. Then load the blockid of the left-child of the b-tree 1202 ** node into variable iChild. 1203 ** 1204 ** Even if the data structure on disk is corrupted, this (reading two 1205 ** varints from the buffer) does not risk an overread. If zNode is a 1206 ** root node, then the buffer comes from a SELECT statement. SQLite does 1207 ** not make this guarantee explicitly, but in practice there are always 1208 ** either more than 20 bytes of allocated space following the nNode bytes of 1209 ** contents, or two zero bytes. Or, if the node is read from the %_segments 1210 ** table, then there are always 20 bytes of zeroed padding following the 1211 ** nNode bytes of content (see sqlite3Fts3ReadBlock() for details). 1212 */ 1213 zCsr += sqlite3Fts3GetVarint(zCsr, &iChild); 1214 zCsr += sqlite3Fts3GetVarint(zCsr, &iChild); 1215 if( zCsr>zEnd ){ 1216 return SQLITE_CORRUPT; 1217 } 1218 1219 while( zCsr<zEnd && (piFirst || piLast) ){ 1220 int cmp; /* memcmp() result */ 1221 int nSuffix; /* Size of term suffix */ 1222 int nPrefix = 0; /* Size of term prefix */ 1223 int nBuffer; /* Total term size */ 1224 1225 /* Load the next term on the node into zBuffer. Use realloc() to expand 1226 ** the size of zBuffer if required. */ 1227 if( !isFirstTerm ){ 1228 zCsr += sqlite3Fts3GetVarint32(zCsr, &nPrefix); 1229 } 1230 isFirstTerm = 0; 1231 zCsr += sqlite3Fts3GetVarint32(zCsr, &nSuffix); 1232 1233 /* NOTE(shess): Previous code checked for negative nPrefix and 1234 ** nSuffix and suffix overrunning zEnd. Additionally corrupt if 1235 ** the prefix is longer than the previous term, or if the suffix 1236 ** causes overflow. 1237 */ 1238 if( nPrefix<0 || nSuffix<0 /* || nPrefix>nBuffer */ 1239 || &zCsr[nSuffix]<zCsr || &zCsr[nSuffix]>zEnd ){ 1240 rc = SQLITE_CORRUPT; 1241 goto finish_scan; 1242 } 1243 if( nPrefix+nSuffix>nAlloc ){ 1244 char *zNew; 1245 nAlloc = (nPrefix+nSuffix) * 2; 1246 zNew = (char *)sqlite3_realloc(zBuffer, nAlloc); 1247 if( !zNew ){ 1248 rc = SQLITE_NOMEM; 1249 goto finish_scan; 1250 } 1251 zBuffer = zNew; 1252 } 1253 memcpy(&zBuffer[nPrefix], zCsr, nSuffix); 1254 nBuffer = nPrefix + nSuffix; 1255 zCsr += nSuffix; 1256 1257 /* Compare the term we are searching for with the term just loaded from 1258 ** the interior node. If the specified term is greater than or equal 1259 ** to the term from the interior node, then all terms on the sub-tree 1260 ** headed by node iChild are smaller than zTerm. No need to search 1261 ** iChild. 1262 ** 1263 ** If the interior node term is larger than the specified term, then 1264 ** the tree headed by iChild may contain the specified term. 1265 */ 1266 cmp = memcmp(zTerm, zBuffer, (nBuffer>nTerm ? nTerm : nBuffer)); 1267 if( piFirst && (cmp<0 || (cmp==0 && nBuffer>nTerm)) ){ 1268 *piFirst = iChild; 1269 piFirst = 0; 1270 } 1271 1272 if( piLast && cmp<0 ){ 1273 *piLast = iChild; 1274 piLast = 0; 1275 } 1276 1277 iChild++; 1278 }; 1279 1280 if( piFirst ) *piFirst = iChild; 1281 if( piLast ) *piLast = iChild; 1282 1283 finish_scan: 1284 sqlite3_free(zBuffer); 1285 return rc; 1286 } 1287 1288 1289 /* 1290 ** The buffer pointed to by argument zNode (size nNode bytes) contains an 1291 ** interior node of a b-tree segment. The zTerm buffer (size nTerm bytes) 1292 ** contains a term. This function searches the sub-tree headed by the zNode 1293 ** node for the range of leaf nodes that may contain the specified term 1294 ** or terms for which the specified term is a prefix. 1295 ** 1296 ** If piLeaf is not NULL, then *piLeaf is set to the blockid of the 1297 ** left-most leaf node in the tree that may contain the specified term. 1298 ** If piLeaf2 is not NULL, then *piLeaf2 is set to the blockid of the 1299 ** right-most leaf node that may contain a term for which the specified 1300 ** term is a prefix. 1301 ** 1302 ** It is possible that the range of returned leaf nodes does not contain 1303 ** the specified term or any terms for which it is a prefix. However, if the 1304 ** segment does contain any such terms, they are stored within the identified 1305 ** range. Because this function only inspects interior segment nodes (and 1306 ** never loads leaf nodes into memory), it is not possible to be sure. 1307 ** 1308 ** If an error occurs, an error code other than SQLITE_OK is returned. 1309 */ 1310 static int fts3SelectLeaf( 1311 Fts3Table *p, /* Virtual table handle */ 1312 const char *zTerm, /* Term to select leaves for */ 1313 int nTerm, /* Size of term zTerm in bytes */ 1314 const char *zNode, /* Buffer containing segment interior node */ 1315 int nNode, /* Size of buffer at zNode */ 1316 sqlite3_int64 *piLeaf, /* Selected leaf node */ 1317 sqlite3_int64 *piLeaf2 /* Selected leaf node */ 1318 ){ 1319 int rc; /* Return code */ 1320 int iHeight; /* Height of this node in tree */ 1321 1322 assert( piLeaf || piLeaf2 ); 1323 1324 sqlite3Fts3GetVarint32(zNode, &iHeight); 1325 rc = fts3ScanInteriorNode(zTerm, nTerm, zNode, nNode, piLeaf, piLeaf2); 1326 assert( !piLeaf2 || !piLeaf || rc!=SQLITE_OK || (*piLeaf<=*piLeaf2) ); 1327 1328 if( rc==SQLITE_OK && iHeight>1 ){ 1329 char *zBlob = 0; /* Blob read from %_segments table */ 1330 int nBlob; /* Size of zBlob in bytes */ 1331 1332 if( piLeaf && piLeaf2 && (*piLeaf!=*piLeaf2) ){ 1333 rc = sqlite3Fts3ReadBlock(p, *piLeaf, &zBlob, &nBlob); 1334 if( rc==SQLITE_OK ){ 1335 rc = fts3SelectLeaf(p, zTerm, nTerm, zBlob, nBlob, piLeaf, 0); 1336 } 1337 sqlite3_free(zBlob); 1338 piLeaf = 0; 1339 zBlob = 0; 1340 } 1341 1342 if( rc==SQLITE_OK ){ 1343 rc = sqlite3Fts3ReadBlock(p, piLeaf ? *piLeaf : *piLeaf2, &zBlob, &nBlob); 1344 } 1345 if( rc==SQLITE_OK ){ 1346 rc = fts3SelectLeaf(p, zTerm, nTerm, zBlob, nBlob, piLeaf, piLeaf2); 1347 } 1348 sqlite3_free(zBlob); 1349 } 1350 1351 return rc; 1352 } 1353 1354 /* 1355 ** This function is used to create delta-encoded serialized lists of FTS3 1356 ** varints. Each call to this function appends a single varint to a list. 1357 */ 1358 static void fts3PutDeltaVarint( 1359 char **pp, /* IN/OUT: Output pointer */ 1360 sqlite3_int64 *piPrev, /* IN/OUT: Previous value written to list */ 1361 sqlite3_int64 iVal /* Write this value to the list */ 1362 ){ 1363 assert( iVal-*piPrev > 0 || (*piPrev==0 && iVal==0) ); 1364 *pp += sqlite3Fts3PutVarint(*pp, iVal-*piPrev); 1365 *piPrev = iVal; 1366 } 1367 1368 /* 1369 ** When this function is called, *ppPoslist is assumed to point to the 1370 ** start of a position-list. After it returns, *ppPoslist points to the 1371 ** first byte after the position-list. 1372 ** 1373 ** A position list is list of positions (delta encoded) and columns for 1374 ** a single document record of a doclist. So, in other words, this 1375 ** routine advances *ppPoslist so that it points to the next docid in 1376 ** the doclist, or to the first byte past the end of the doclist. 1377 ** 1378 ** If pp is not NULL, then the contents of the position list are copied 1379 ** to *pp. *pp is set to point to the first byte past the last byte copied 1380 ** before this function returns. 1381 */ 1382 static void fts3PoslistCopy(char **pp, char **ppPoslist){ 1383 char *pEnd = *ppPoslist; 1384 char c = 0; 1385 1386 /* The end of a position list is marked by a zero encoded as an FTS3 1387 ** varint. A single POS_END (0) byte. Except, if the 0 byte is preceded by 1388 ** a byte with the 0x80 bit set, then it is not a varint 0, but the tail 1389 ** of some other, multi-byte, value. 1390 ** 1391 ** The following while-loop moves pEnd to point to the first byte that is not 1392 ** immediately preceded by a byte with the 0x80 bit set. Then increments 1393 ** pEnd once more so that it points to the byte immediately following the 1394 ** last byte in the position-list. 1395 */ 1396 while( *pEnd | c ){ 1397 c = *pEnd++ & 0x80; 1398 testcase( c!=0 && (*pEnd)==0 ); 1399 } 1400 pEnd++; /* Advance past the POS_END terminator byte */ 1401 1402 if( pp ){ 1403 int n = (int)(pEnd - *ppPoslist); 1404 char *p = *pp; 1405 memcpy(p, *ppPoslist, n); 1406 p += n; 1407 *pp = p; 1408 } 1409 *ppPoslist = pEnd; 1410 } 1411 1412 /* 1413 ** When this function is called, *ppPoslist is assumed to point to the 1414 ** start of a column-list. After it returns, *ppPoslist points to the 1415 ** to the terminator (POS_COLUMN or POS_END) byte of the column-list. 1416 ** 1417 ** A column-list is list of delta-encoded positions for a single column 1418 ** within a single document within a doclist. 1419 ** 1420 ** The column-list is terminated either by a POS_COLUMN varint (1) or 1421 ** a POS_END varint (0). This routine leaves *ppPoslist pointing to 1422 ** the POS_COLUMN or POS_END that terminates the column-list. 1423 ** 1424 ** If pp is not NULL, then the contents of the column-list are copied 1425 ** to *pp. *pp is set to point to the first byte past the last byte copied 1426 ** before this function returns. The POS_COLUMN or POS_END terminator 1427 ** is not copied into *pp. 1428 */ 1429 static void fts3ColumnlistCopy(char **pp, char **ppPoslist){ 1430 char *pEnd = *ppPoslist; 1431 char c = 0; 1432 1433 /* A column-list is terminated by either a 0x01 or 0x00 byte that is 1434 ** not part of a multi-byte varint. 1435 */ 1436 while( 0xFE & (*pEnd | c) ){ 1437 c = *pEnd++ & 0x80; 1438 testcase( c!=0 && ((*pEnd)&0xfe)==0 ); 1439 } 1440 if( pp ){ 1441 int n = (int)(pEnd - *ppPoslist); 1442 char *p = *pp; 1443 memcpy(p, *ppPoslist, n); 1444 p += n; 1445 *pp = p; 1446 } 1447 *ppPoslist = pEnd; 1448 } 1449 1450 /* 1451 ** Value used to signify the end of an position-list. This is safe because 1452 ** it is not possible to have a document with 2^31 terms. 1453 */ 1454 #define POSITION_LIST_END 0x7fffffff 1455 1456 /* 1457 ** This function is used to help parse position-lists. When this function is 1458 ** called, *pp may point to the start of the next varint in the position-list 1459 ** being parsed, or it may point to 1 byte past the end of the position-list 1460 ** (in which case **pp will be a terminator bytes POS_END (0) or 1461 ** (1)). 1462 ** 1463 ** If *pp points past the end of the current position-list, set *pi to 1464 ** POSITION_LIST_END and return. Otherwise, read the next varint from *pp, 1465 ** increment the current value of *pi by the value read, and set *pp to 1466 ** point to the next value before returning. 1467 ** 1468 ** Before calling this routine *pi must be initialized to the value of 1469 ** the previous position, or zero if we are reading the first position 1470 ** in the position-list. Because positions are delta-encoded, the value 1471 ** of the previous position is needed in order to compute the value of 1472 ** the next position. 1473 */ 1474 static void fts3ReadNextPos( 1475 char **pp, /* IN/OUT: Pointer into position-list buffer */ 1476 sqlite3_int64 *pi /* IN/OUT: Value read from position-list */ 1477 ){ 1478 if( (**pp)&0xFE ){ 1479 fts3GetDeltaVarint(pp, pi); 1480 *pi -= 2; 1481 }else{ 1482 *pi = POSITION_LIST_END; 1483 } 1484 } 1485 1486 /* 1487 ** If parameter iCol is not 0, write an POS_COLUMN (1) byte followed by 1488 ** the value of iCol encoded as a varint to *pp. This will start a new 1489 ** column list. 1490 ** 1491 ** Set *pp to point to the byte just after the last byte written before 1492 ** returning (do not modify it if iCol==0). Return the total number of bytes 1493 ** written (0 if iCol==0). 1494 */ 1495 static int fts3PutColNumber(char **pp, int iCol){ 1496 int n = 0; /* Number of bytes written */ 1497 if( iCol ){ 1498 char *p = *pp; /* Output pointer */ 1499 n = 1 + sqlite3Fts3PutVarint(&p[1], iCol); 1500 *p = 0x01; 1501 *pp = &p[n]; 1502 } 1503 return n; 1504 } 1505 1506 /* 1507 ** Compute the union of two position lists. The output written 1508 ** into *pp contains all positions of both *pp1 and *pp2 in sorted 1509 ** order and with any duplicates removed. All pointers are 1510 ** updated appropriately. The caller is responsible for insuring 1511 ** that there is enough space in *pp to hold the complete output. 1512 */ 1513 static void fts3PoslistMerge( 1514 char **pp, /* Output buffer */ 1515 char **pp1, /* Left input list */ 1516 char **pp2 /* Right input list */ 1517 ){ 1518 char *p = *pp; 1519 char *p1 = *pp1; 1520 char *p2 = *pp2; 1521 1522 while( *p1 || *p2 ){ 1523 int iCol1; /* The current column index in pp1 */ 1524 int iCol2; /* The current column index in pp2 */ 1525 1526 if( *p1==POS_COLUMN ) sqlite3Fts3GetVarint32(&p1[1], &iCol1); 1527 else if( *p1==POS_END ) iCol1 = POSITION_LIST_END; 1528 else iCol1 = 0; 1529 1530 if( *p2==POS_COLUMN ) sqlite3Fts3GetVarint32(&p2[1], &iCol2); 1531 else if( *p2==POS_END ) iCol2 = POSITION_LIST_END; 1532 else iCol2 = 0; 1533 1534 if( iCol1==iCol2 ){ 1535 sqlite3_int64 i1 = 0; /* Last position from pp1 */ 1536 sqlite3_int64 i2 = 0; /* Last position from pp2 */ 1537 sqlite3_int64 iPrev = 0; 1538 int n = fts3PutColNumber(&p, iCol1); 1539 p1 += n; 1540 p2 += n; 1541 1542 /* At this point, both p1 and p2 point to the start of column-lists 1543 ** for the same column (the column with index iCol1 and iCol2). 1544 ** A column-list is a list of non-negative delta-encoded varints, each 1545 ** incremented by 2 before being stored. Each list is terminated by a 1546 ** POS_END (0) or POS_COLUMN (1). The following block merges the two lists 1547 ** and writes the results to buffer p. p is left pointing to the byte 1548 ** after the list written. No terminator (POS_END or POS_COLUMN) is 1549 ** written to the output. 1550 */ 1551 fts3GetDeltaVarint(&p1, &i1); 1552 fts3GetDeltaVarint(&p2, &i2); 1553 do { 1554 fts3PutDeltaVarint(&p, &iPrev, (i1<i2) ? i1 : i2); 1555 iPrev -= 2; 1556 if( i1==i2 ){ 1557 fts3ReadNextPos(&p1, &i1); 1558 fts3ReadNextPos(&p2, &i2); 1559 }else if( i1<i2 ){ 1560 fts3ReadNextPos(&p1, &i1); 1561 }else{ 1562 fts3ReadNextPos(&p2, &i2); 1563 } 1564 }while( i1!=POSITION_LIST_END || i2!=POSITION_LIST_END ); 1565 }else if( iCol1<iCol2 ){ 1566 p1 += fts3PutColNumber(&p, iCol1); 1567 fts3ColumnlistCopy(&p, &p1); 1568 }else{ 1569 p2 += fts3PutColNumber(&p, iCol2); 1570 fts3ColumnlistCopy(&p, &p2); 1571 } 1572 } 1573 1574 *p++ = POS_END; 1575 *pp = p; 1576 *pp1 = p1 + 1; 1577 *pp2 = p2 + 1; 1578 } 1579 1580 /* 1581 ** nToken==1 searches for adjacent positions. 1582 ** 1583 ** This function is used to merge two position lists into one. When it is 1584 ** called, *pp1 and *pp2 must both point to position lists. A position-list is 1585 ** the part of a doclist that follows each document id. For example, if a row 1586 ** contains: 1587 ** 1588 ** 'a b c'|'x y z'|'a b b a' 1589 ** 1590 ** Then the position list for this row for token 'b' would consist of: 1591 ** 1592 ** 0x02 0x01 0x02 0x03 0x03 0x00 1593 ** 1594 ** When this function returns, both *pp1 and *pp2 are left pointing to the 1595 ** byte following the 0x00 terminator of their respective position lists. 1596 ** 1597 ** If isSaveLeft is 0, an entry is added to the output position list for 1598 ** each position in *pp2 for which there exists one or more positions in 1599 ** *pp1 so that (pos(*pp2)>pos(*pp1) && pos(*pp2)-pos(*pp1)<=nToken). i.e. 1600 ** when the *pp1 token appears before the *pp2 token, but not more than nToken 1601 ** slots before it. 1602 */ 1603 static int fts3PoslistPhraseMerge( 1604 char **pp, /* IN/OUT: Preallocated output buffer */ 1605 int nToken, /* Maximum difference in token positions */ 1606 int isSaveLeft, /* Save the left position */ 1607 int isExact, /* If *pp1 is exactly nTokens before *pp2 */ 1608 char **pp1, /* IN/OUT: Left input list */ 1609 char **pp2 /* IN/OUT: Right input list */ 1610 ){ 1611 char *p = (pp ? *pp : 0); 1612 char *p1 = *pp1; 1613 char *p2 = *pp2; 1614 int iCol1 = 0; 1615 int iCol2 = 0; 1616 1617 /* Never set both isSaveLeft and isExact for the same invocation. */ 1618 assert( isSaveLeft==0 || isExact==0 ); 1619 1620 assert( *p1!=0 && *p2!=0 ); 1621 if( *p1==POS_COLUMN ){ 1622 p1++; 1623 p1 += sqlite3Fts3GetVarint32(p1, &iCol1); 1624 } 1625 if( *p2==POS_COLUMN ){ 1626 p2++; 1627 p2 += sqlite3Fts3GetVarint32(p2, &iCol2); 1628 } 1629 1630 while( 1 ){ 1631 if( iCol1==iCol2 ){ 1632 char *pSave = p; 1633 sqlite3_int64 iPrev = 0; 1634 sqlite3_int64 iPos1 = 0; 1635 sqlite3_int64 iPos2 = 0; 1636 1637 if( pp && iCol1 ){ 1638 *p++ = POS_COLUMN; 1639 p += sqlite3Fts3PutVarint(p, iCol1); 1640 } 1641 1642 assert( *p1!=POS_END && *p1!=POS_COLUMN ); 1643 assert( *p2!=POS_END && *p2!=POS_COLUMN ); 1644 fts3GetDeltaVarint(&p1, &iPos1); iPos1 -= 2; 1645 fts3GetDeltaVarint(&p2, &iPos2); iPos2 -= 2; 1646 1647 while( 1 ){ 1648 if( iPos2==iPos1+nToken 1649 || (isExact==0 && iPos2>iPos1 && iPos2<=iPos1+nToken) 1650 ){ 1651 sqlite3_int64 iSave; 1652 if( !pp ){ 1653 fts3PoslistCopy(0, &p2); 1654 fts3PoslistCopy(0, &p1); 1655 *pp1 = p1; 1656 *pp2 = p2; 1657 return 1; 1658 } 1659 iSave = isSaveLeft ? iPos1 : iPos2; 1660 fts3PutDeltaVarint(&p, &iPrev, iSave+2); iPrev -= 2; 1661 pSave = 0; 1662 } 1663 if( (!isSaveLeft && iPos2<=(iPos1+nToken)) || iPos2<=iPos1 ){ 1664 if( (*p2&0xFE)==0 ) break; 1665 fts3GetDeltaVarint(&p2, &iPos2); iPos2 -= 2; 1666 }else{ 1667 if( (*p1&0xFE)==0 ) break; 1668 fts3GetDeltaVarint(&p1, &iPos1); iPos1 -= 2; 1669 } 1670 } 1671 1672 if( pSave ){ 1673 assert( pp && p ); 1674 p = pSave; 1675 } 1676 1677 fts3ColumnlistCopy(0, &p1); 1678 fts3ColumnlistCopy(0, &p2); 1679 assert( (*p1&0xFE)==0 && (*p2&0xFE)==0 ); 1680 if( 0==*p1 || 0==*p2 ) break; 1681 1682 p1++; 1683 p1 += sqlite3Fts3GetVarint32(p1, &iCol1); 1684 p2++; 1685 p2 += sqlite3Fts3GetVarint32(p2, &iCol2); 1686 } 1687 1688 /* Advance pointer p1 or p2 (whichever corresponds to the smaller of 1689 ** iCol1 and iCol2) so that it points to either the 0x00 that marks the 1690 ** end of the position list, or the 0x01 that precedes the next 1691 ** column-number in the position list. 1692 */ 1693 else if( iCol1<iCol2 ){ 1694 fts3ColumnlistCopy(0, &p1); 1695 if( 0==*p1 ) break; 1696 p1++; 1697 p1 += sqlite3Fts3GetVarint32(p1, &iCol1); 1698 }else{ 1699 fts3ColumnlistCopy(0, &p2); 1700 if( 0==*p2 ) break; 1701 p2++; 1702 p2 += sqlite3Fts3GetVarint32(p2, &iCol2); 1703 } 1704 } 1705 1706 fts3PoslistCopy(0, &p2); 1707 fts3PoslistCopy(0, &p1); 1708 *pp1 = p1; 1709 *pp2 = p2; 1710 if( !pp || *pp==p ){ 1711 return 0; 1712 } 1713 *p++ = 0x00; 1714 *pp = p; 1715 return 1; 1716 } 1717 1718 /* 1719 ** Merge two position-lists as required by the NEAR operator. 1720 */ 1721 static int fts3PoslistNearMerge( 1722 char **pp, /* Output buffer */ 1723 char *aTmp, /* Temporary buffer space */ 1724 int nRight, /* Maximum difference in token positions */ 1725 int nLeft, /* Maximum difference in token positions */ 1726 char **pp1, /* IN/OUT: Left input list */ 1727 char **pp2 /* IN/OUT: Right input list */ 1728 ){ 1729 char *p1 = *pp1; 1730 char *p2 = *pp2; 1731 1732 if( !pp ){ 1733 if( fts3PoslistPhraseMerge(0, nRight, 0, 0, pp1, pp2) ) return 1; 1734 *pp1 = p1; 1735 *pp2 = p2; 1736 return fts3PoslistPhraseMerge(0, nLeft, 0, 0, pp2, pp1); 1737 }else{ 1738 char *pTmp1 = aTmp; 1739 char *pTmp2; 1740 char *aTmp2; 1741 int res = 1; 1742 1743 fts3PoslistPhraseMerge(&pTmp1, nRight, 0, 0, pp1, pp2); 1744 aTmp2 = pTmp2 = pTmp1; 1745 *pp1 = p1; 1746 *pp2 = p2; 1747 fts3PoslistPhraseMerge(&pTmp2, nLeft, 1, 0, pp2, pp1); 1748 if( pTmp1!=aTmp && pTmp2!=aTmp2 ){ 1749 fts3PoslistMerge(pp, &aTmp, &aTmp2); 1750 }else if( pTmp1!=aTmp ){ 1751 fts3PoslistCopy(pp, &aTmp); 1752 }else if( pTmp2!=aTmp2 ){ 1753 fts3PoslistCopy(pp, &aTmp2); 1754 }else{ 1755 res = 0; 1756 } 1757 1758 return res; 1759 } 1760 } 1761 1762 /* 1763 ** Values that may be used as the first parameter to fts3DoclistMerge(). 1764 */ 1765 #define MERGE_NOT 2 /* D + D -> D */ 1766 #define MERGE_AND 3 /* D + D -> D */ 1767 #define MERGE_OR 4 /* D + D -> D */ 1768 #define MERGE_POS_OR 5 /* P + P -> P */ 1769 #define MERGE_PHRASE 6 /* P + P -> D */ 1770 #define MERGE_POS_PHRASE 7 /* P + P -> P */ 1771 #define MERGE_NEAR 8 /* P + P -> D */ 1772 #define MERGE_POS_NEAR 9 /* P + P -> P */ 1773 1774 /* 1775 ** Merge the two doclists passed in buffer a1 (size n1 bytes) and a2 1776 ** (size n2 bytes). The output is written to pre-allocated buffer aBuffer, 1777 ** which is guaranteed to be large enough to hold the results. The number 1778 ** of bytes written to aBuffer is stored in *pnBuffer before returning. 1779 ** 1780 ** If successful, SQLITE_OK is returned. Otherwise, if a malloc error 1781 ** occurs while allocating a temporary buffer as part of the merge operation, 1782 ** SQLITE_NOMEM is returned. 1783 */ 1784 static int fts3DoclistMerge( 1785 int mergetype, /* One of the MERGE_XXX constants */ 1786 int nParam1, /* Used by MERGE_NEAR and MERGE_POS_NEAR */ 1787 int nParam2, /* Used by MERGE_NEAR and MERGE_POS_NEAR */ 1788 char *aBuffer, /* Pre-allocated output buffer */ 1789 int *pnBuffer, /* OUT: Bytes written to aBuffer */ 1790 char *a1, /* Buffer containing first doclist */ 1791 int n1, /* Size of buffer a1 */ 1792 char *a2, /* Buffer containing second doclist */ 1793 int n2, /* Size of buffer a2 */ 1794 int *pnDoc /* OUT: Number of docids in output */ 1795 ){ 1796 sqlite3_int64 i1 = 0; 1797 sqlite3_int64 i2 = 0; 1798 sqlite3_int64 iPrev = 0; 1799 1800 char *p = aBuffer; 1801 char *p1 = a1; 1802 char *p2 = a2; 1803 char *pEnd1 = &a1[n1]; 1804 char *pEnd2 = &a2[n2]; 1805 int nDoc = 0; 1806 1807 assert( mergetype==MERGE_OR || mergetype==MERGE_POS_OR 1808 || mergetype==MERGE_AND || mergetype==MERGE_NOT 1809 || mergetype==MERGE_PHRASE || mergetype==MERGE_POS_PHRASE 1810 || mergetype==MERGE_NEAR || mergetype==MERGE_POS_NEAR 1811 ); 1812 1813 if( !aBuffer ){ 1814 *pnBuffer = 0; 1815 return SQLITE_NOMEM; 1816 } 1817 1818 /* Read the first docid from each doclist */ 1819 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1820 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1821 1822 switch( mergetype ){ 1823 case MERGE_OR: 1824 case MERGE_POS_OR: 1825 while( p1 || p2 ){ 1826 if( p2 && p1 && i1==i2 ){ 1827 fts3PutDeltaVarint(&p, &iPrev, i1); 1828 if( mergetype==MERGE_POS_OR ) fts3PoslistMerge(&p, &p1, &p2); 1829 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1830 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1831 }else if( !p2 || (p1 && i1<i2) ){ 1832 fts3PutDeltaVarint(&p, &iPrev, i1); 1833 if( mergetype==MERGE_POS_OR ) fts3PoslistCopy(&p, &p1); 1834 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1835 }else{ 1836 fts3PutDeltaVarint(&p, &iPrev, i2); 1837 if( mergetype==MERGE_POS_OR ) fts3PoslistCopy(&p, &p2); 1838 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1839 } 1840 } 1841 break; 1842 1843 case MERGE_AND: 1844 while( p1 && p2 ){ 1845 if( i1==i2 ){ 1846 fts3PutDeltaVarint(&p, &iPrev, i1); 1847 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1848 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1849 nDoc++; 1850 }else if( i1<i2 ){ 1851 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1852 }else{ 1853 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1854 } 1855 } 1856 break; 1857 1858 case MERGE_NOT: 1859 while( p1 ){ 1860 if( p2 && i1==i2 ){ 1861 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1862 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1863 }else if( !p2 || i1<i2 ){ 1864 fts3PutDeltaVarint(&p, &iPrev, i1); 1865 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1866 }else{ 1867 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1868 } 1869 } 1870 break; 1871 1872 case MERGE_POS_PHRASE: 1873 case MERGE_PHRASE: { 1874 char **ppPos = (mergetype==MERGE_PHRASE ? 0 : &p); 1875 while( p1 && p2 ){ 1876 if( i1==i2 ){ 1877 char *pSave = p; 1878 sqlite3_int64 iPrevSave = iPrev; 1879 fts3PutDeltaVarint(&p, &iPrev, i1); 1880 if( 0==fts3PoslistPhraseMerge(ppPos, nParam1, 0, 1, &p1, &p2) ){ 1881 p = pSave; 1882 iPrev = iPrevSave; 1883 }else{ 1884 nDoc++; 1885 } 1886 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1887 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1888 }else if( i1<i2 ){ 1889 fts3PoslistCopy(0, &p1); 1890 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1891 }else{ 1892 fts3PoslistCopy(0, &p2); 1893 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1894 } 1895 } 1896 break; 1897 } 1898 1899 default: assert( mergetype==MERGE_POS_NEAR || mergetype==MERGE_NEAR ); { 1900 char *aTmp = 0; 1901 char **ppPos = 0; 1902 1903 if( mergetype==MERGE_POS_NEAR ){ 1904 ppPos = &p; 1905 aTmp = sqlite3_malloc(2*(n1+n2+1)); 1906 if( !aTmp ){ 1907 return SQLITE_NOMEM; 1908 } 1909 } 1910 1911 while( p1 && p2 ){ 1912 if( i1==i2 ){ 1913 char *pSave = p; 1914 sqlite3_int64 iPrevSave = iPrev; 1915 fts3PutDeltaVarint(&p, &iPrev, i1); 1916 1917 if( !fts3PoslistNearMerge(ppPos, aTmp, nParam1, nParam2, &p1, &p2) ){ 1918 iPrev = iPrevSave; 1919 p = pSave; 1920 } 1921 1922 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1923 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1924 }else if( i1<i2 ){ 1925 fts3PoslistCopy(0, &p1); 1926 fts3GetDeltaVarint2(&p1, pEnd1, &i1); 1927 }else{ 1928 fts3PoslistCopy(0, &p2); 1929 fts3GetDeltaVarint2(&p2, pEnd2, &i2); 1930 } 1931 } 1932 sqlite3_free(aTmp); 1933 break; 1934 } 1935 } 1936 1937 if( pnDoc ) *pnDoc = nDoc; 1938 *pnBuffer = (int)(p-aBuffer); 1939 return SQLITE_OK; 1940 } 1941 1942 /* 1943 ** A pointer to an instance of this structure is used as the context 1944 ** argument to sqlite3Fts3SegReaderIterate() 1945 */ 1946 typedef struct TermSelect TermSelect; 1947 struct TermSelect { 1948 int isReqPos; 1949 char *aaOutput[16]; /* Malloc'd output buffer */ 1950 int anOutput[16]; /* Size of output in bytes */ 1951 }; 1952 1953 /* 1954 ** Merge all doclists in the TermSelect.aaOutput[] array into a single 1955 ** doclist stored in TermSelect.aaOutput[0]. If successful, delete all 1956 ** other doclists (except the aaOutput[0] one) and return SQLITE_OK. 1957 ** 1958 ** If an OOM error occurs, return SQLITE_NOMEM. In this case it is 1959 ** the responsibility of the caller to free any doclists left in the 1960 ** TermSelect.aaOutput[] array. 1961 */ 1962 static int fts3TermSelectMerge(TermSelect *pTS){ 1963 int mergetype = (pTS->isReqPos ? MERGE_POS_OR : MERGE_OR); 1964 char *aOut = 0; 1965 int nOut = 0; 1966 int i; 1967 1968 /* Loop through the doclists in the aaOutput[] array. Merge them all 1969 ** into a single doclist. 1970 */ 1971 for(i=0; i<SizeofArray(pTS->aaOutput); i++){ 1972 if( pTS->aaOutput[i] ){ 1973 if( !aOut ){ 1974 aOut = pTS->aaOutput[i]; 1975 nOut = pTS->anOutput[i]; 1976 pTS->aaOutput[i] = 0; 1977 }else{ 1978 int nNew = nOut + pTS->anOutput[i]; 1979 char *aNew = sqlite3_malloc(nNew); 1980 if( !aNew ){ 1981 sqlite3_free(aOut); 1982 return SQLITE_NOMEM; 1983 } 1984 fts3DoclistMerge(mergetype, 0, 0, 1985 aNew, &nNew, pTS->aaOutput[i], pTS->anOutput[i], aOut, nOut, 0 1986 ); 1987 sqlite3_free(pTS->aaOutput[i]); 1988 sqlite3_free(aOut); 1989 pTS->aaOutput[i] = 0; 1990 aOut = aNew; 1991 nOut = nNew; 1992 } 1993 } 1994 } 1995 1996 pTS->aaOutput[0] = aOut; 1997 pTS->anOutput[0] = nOut; 1998 return SQLITE_OK; 1999 } 2000 2001 /* 2002 ** This function is used as the sqlite3Fts3SegReaderIterate() callback when 2003 ** querying the full-text index for a doclist associated with a term or 2004 ** term-prefix. 2005 */ 2006 static int fts3TermSelectCb( 2007 Fts3Table *p, /* Virtual table object */ 2008 void *pContext, /* Pointer to TermSelect structure */ 2009 char *zTerm, 2010 int nTerm, 2011 char *aDoclist, 2012 int nDoclist 2013 ){ 2014 TermSelect *pTS = (TermSelect *)pContext; 2015 2016 UNUSED_PARAMETER(p); 2017 UNUSED_PARAMETER(zTerm); 2018 UNUSED_PARAMETER(nTerm); 2019 2020 if( pTS->aaOutput[0]==0 ){ 2021 /* If this is the first term selected, copy the doclist to the output 2022 ** buffer using memcpy(). TODO: Add a way to transfer control of the 2023 ** aDoclist buffer from the caller so as to avoid the memcpy(). 2024 */ 2025 pTS->aaOutput[0] = sqlite3_malloc(nDoclist); 2026 pTS->anOutput[0] = nDoclist; 2027 if( pTS->aaOutput[0] ){ 2028 memcpy(pTS->aaOutput[0], aDoclist, nDoclist); 2029 }else{ 2030 return SQLITE_NOMEM; 2031 } 2032 }else{ 2033 int mergetype = (pTS->isReqPos ? MERGE_POS_OR : MERGE_OR); 2034 char *aMerge = aDoclist; 2035 int nMerge = nDoclist; 2036 int iOut; 2037 2038 for(iOut=0; iOut<SizeofArray(pTS->aaOutput); iOut++){ 2039 char *aNew; 2040 int nNew; 2041 if( pTS->aaOutput[iOut]==0 ){ 2042 assert( iOut>0 ); 2043 pTS->aaOutput[iOut] = aMerge; 2044 pTS->anOutput[iOut] = nMerge; 2045 break; 2046 } 2047 2048 nNew = nMerge + pTS->anOutput[iOut]; 2049 aNew = sqlite3_malloc(nNew); 2050 if( !aNew ){ 2051 if( aMerge!=aDoclist ){ 2052 sqlite3_free(aMerge); 2053 } 2054 return SQLITE_NOMEM; 2055 } 2056 fts3DoclistMerge(mergetype, 0, 0, aNew, &nNew, 2057 pTS->aaOutput[iOut], pTS->anOutput[iOut], aMerge, nMerge, 0 2058 ); 2059 2060 if( iOut>0 ) sqlite3_free(aMerge); 2061 sqlite3_free(pTS->aaOutput[iOut]); 2062 pTS->aaOutput[iOut] = 0; 2063 2064 aMerge = aNew; 2065 nMerge = nNew; 2066 if( (iOut+1)==SizeofArray(pTS->aaOutput) ){ 2067 pTS->aaOutput[iOut] = aMerge; 2068 pTS->anOutput[iOut] = nMerge; 2069 } 2070 } 2071 } 2072 return SQLITE_OK; 2073 } 2074 2075 static int fts3DeferredTermSelect( 2076 Fts3DeferredToken *pToken, /* Phrase token */ 2077 int isTermPos, /* True to include positions */ 2078 int *pnOut, /* OUT: Size of list */ 2079 char **ppOut /* OUT: Body of list */ 2080 ){ 2081 char *aSource; 2082 int nSource; 2083 2084 aSource = sqlite3Fts3DeferredDoclist(pToken, &nSource); 2085 if( !aSource ){ 2086 *pnOut = 0; 2087 *ppOut = 0; 2088 }else if( isTermPos ){ 2089 *ppOut = sqlite3_malloc(nSource); 2090 if( !*ppOut ) return SQLITE_NOMEM; 2091 memcpy(*ppOut, aSource, nSource); 2092 *pnOut = nSource; 2093 }else{ 2094 sqlite3_int64 docid; 2095 *pnOut = sqlite3Fts3GetVarint(aSource, &docid); 2096 *ppOut = sqlite3_malloc(*pnOut); 2097 if( !*ppOut ) return SQLITE_NOMEM; 2098 sqlite3Fts3PutVarint(*ppOut, docid); 2099 } 2100 2101 return SQLITE_OK; 2102 } 2103 2104 int sqlite3Fts3SegReaderCursor( 2105 Fts3Table *p, /* FTS3 table handle */ 2106 int iLevel, /* Level of segments to scan */ 2107 const char *zTerm, /* Term to query for */ 2108 int nTerm, /* Size of zTerm in bytes */ 2109 int isPrefix, /* True for a prefix search */ 2110 int isScan, /* True to scan from zTerm to EOF */ 2111 Fts3SegReaderCursor *pCsr /* Cursor object to populate */ 2112 ){ 2113 int rc = SQLITE_OK; 2114 int rc2; 2115 int iAge = 0; 2116 sqlite3_stmt *pStmt = 0; 2117 Fts3SegReader *pPending = 0; 2118 2119 assert( iLevel==FTS3_SEGCURSOR_ALL 2120 || iLevel==FTS3_SEGCURSOR_PENDING 2121 || iLevel>=0 2122 ); 2123 assert( FTS3_SEGCURSOR_PENDING<0 ); 2124 assert( FTS3_SEGCURSOR_ALL<0 ); 2125 assert( iLevel==FTS3_SEGCURSOR_ALL || (zTerm==0 && isPrefix==1) ); 2126 assert( isPrefix==0 || isScan==0 ); 2127 2128 2129 memset(pCsr, 0, sizeof(Fts3SegReaderCursor)); 2130 2131 /* If iLevel is less than 0, include a seg-reader for the pending-terms. */ 2132 assert( isScan==0 || fts3HashCount(&p->pendingTerms)==0 ); 2133 if( iLevel<0 && isScan==0 ){ 2134 rc = sqlite3Fts3SegReaderPending(p, zTerm, nTerm, isPrefix, &pPending); 2135 if( rc==SQLITE_OK && pPending ){ 2136 int nByte = (sizeof(Fts3SegReader *) * 16); 2137 pCsr->apSegment = (Fts3SegReader **)sqlite3_malloc(nByte); 2138 if( pCsr->apSegment==0 ){ 2139 rc = SQLITE_NOMEM; 2140 }else{ 2141 pCsr->apSegment[0] = pPending; 2142 pCsr->nSegment = 1; 2143 pPending = 0; 2144 } 2145 } 2146 } 2147 2148 if( iLevel!=FTS3_SEGCURSOR_PENDING ){ 2149 if( rc==SQLITE_OK ){ 2150 rc = sqlite3Fts3AllSegdirs(p, iLevel, &pStmt); 2151 } 2152 while( rc==SQLITE_OK && SQLITE_ROW==(rc = sqlite3_step(pStmt)) ){ 2153 2154 /* Read the values returned by the SELECT into local variables. */ 2155 sqlite3_int64 iStartBlock = sqlite3_column_int64(pStmt, 1); 2156 sqlite3_int64 iLeavesEndBlock = sqlite3_column_int64(pStmt, 2); 2157 sqlite3_int64 iEndBlock = sqlite3_column_int64(pStmt, 3); 2158 int nRoot = sqlite3_column_bytes(pStmt, 4); 2159 char const *zRoot = sqlite3_column_blob(pStmt, 4); 2160 2161 /* If nSegment is a multiple of 16 the array needs to be extended. */ 2162 if( (pCsr->nSegment%16)==0 ){ 2163 Fts3SegReader **apNew; 2164 int nByte = (pCsr->nSegment + 16)*sizeof(Fts3SegReader*); 2165 apNew = (Fts3SegReader **)sqlite3_realloc(pCsr->apSegment, nByte); 2166 if( !apNew ){ 2167 rc = SQLITE_NOMEM; 2168 goto finished; 2169 } 2170 pCsr->apSegment = apNew; 2171 } 2172 2173 /* If zTerm is not NULL, and this segment is not stored entirely on its 2174 ** root node, the range of leaves scanned can be reduced. Do this. */ 2175 if( iStartBlock && zTerm ){ 2176 sqlite3_int64 *pi = (isPrefix ? &iLeavesEndBlock : 0); 2177 rc = fts3SelectLeaf(p, zTerm, nTerm, zRoot, nRoot, &iStartBlock, pi); 2178 if( rc!=SQLITE_OK ) goto finished; 2179 if( isPrefix==0 && isScan==0 ) iLeavesEndBlock = iStartBlock; 2180 } 2181 2182 rc = sqlite3Fts3SegReaderNew(iAge, iStartBlock, iLeavesEndBlock, 2183 iEndBlock, zRoot, nRoot, &pCsr->apSegment[pCsr->nSegment] 2184 ); 2185 if( rc!=SQLITE_OK ) goto finished; 2186 pCsr->nSegment++; 2187 iAge++; 2188 } 2189 } 2190 2191 finished: 2192 rc2 = sqlite3_reset(pStmt); 2193 if( rc==SQLITE_DONE ) rc = rc2; 2194 sqlite3Fts3SegReaderFree(pPending); 2195 2196 return rc; 2197 } 2198 2199 2200 static int fts3TermSegReaderCursor( 2201 Fts3Cursor *pCsr, /* Virtual table cursor handle */ 2202 const char *zTerm, /* Term to query for */ 2203 int nTerm, /* Size of zTerm in bytes */ 2204 int isPrefix, /* True for a prefix search */ 2205 Fts3SegReaderCursor **ppSegcsr /* OUT: Allocated seg-reader cursor */ 2206 ){ 2207 Fts3SegReaderCursor *pSegcsr; /* Object to allocate and return */ 2208 int rc = SQLITE_NOMEM; /* Return code */ 2209 2210 pSegcsr = sqlite3_malloc(sizeof(Fts3SegReaderCursor)); 2211 if( pSegcsr ){ 2212 Fts3Table *p = (Fts3Table *)pCsr->base.pVtab; 2213 int i; 2214 int nCost = 0; 2215 rc = sqlite3Fts3SegReaderCursor( 2216 p, FTS3_SEGCURSOR_ALL, zTerm, nTerm, isPrefix, 0, pSegcsr); 2217 2218 for(i=0; rc==SQLITE_OK && i<pSegcsr->nSegment; i++){ 2219 rc = sqlite3Fts3SegReaderCost(pCsr, pSegcsr->apSegment[i], &nCost); 2220 } 2221 pSegcsr->nCost = nCost; 2222 } 2223 2224 *ppSegcsr = pSegcsr; 2225 return rc; 2226 } 2227 2228 static void fts3SegReaderCursorFree(Fts3SegReaderCursor *pSegcsr){ 2229 sqlite3Fts3SegReaderFinish(pSegcsr); 2230 sqlite3_free(pSegcsr); 2231 } 2232 2233 /* 2234 ** This function retreives the doclist for the specified term (or term 2235 ** prefix) from the database. 2236 ** 2237 ** The returned doclist may be in one of two formats, depending on the 2238 ** value of parameter isReqPos. If isReqPos is zero, then the doclist is 2239 ** a sorted list of delta-compressed docids (a bare doclist). If isReqPos 2240 ** is non-zero, then the returned list is in the same format as is stored 2241 ** in the database without the found length specifier at the start of on-disk 2242 ** doclists. 2243 */ 2244 static int fts3TermSelect( 2245 Fts3Table *p, /* Virtual table handle */ 2246 Fts3PhraseToken *pTok, /* Token to query for */ 2247 int iColumn, /* Column to query (or -ve for all columns) */ 2248 int isReqPos, /* True to include position lists in output */ 2249 int *pnOut, /* OUT: Size of buffer at *ppOut */ 2250 char **ppOut /* OUT: Malloced result buffer */ 2251 ){ 2252 int rc; /* Return code */ 2253 Fts3SegReaderCursor *pSegcsr; /* Seg-reader cursor for this term */ 2254 TermSelect tsc; /* Context object for fts3TermSelectCb() */ 2255 Fts3SegFilter filter; /* Segment term filter configuration */ 2256 2257 pSegcsr = pTok->pSegcsr; 2258 memset(&tsc, 0, sizeof(TermSelect)); 2259 tsc.isReqPos = isReqPos; 2260 2261 filter.flags = FTS3_SEGMENT_IGNORE_EMPTY 2262 | (pTok->isPrefix ? FTS3_SEGMENT_PREFIX : 0) 2263 | (isReqPos ? FTS3_SEGMENT_REQUIRE_POS : 0) 2264 | (iColumn<p->nColumn ? FTS3_SEGMENT_COLUMN_FILTER : 0); 2265 filter.iCol = iColumn; 2266 filter.zTerm = pTok->z; 2267 filter.nTerm = pTok->n; 2268 2269 rc = sqlite3Fts3SegReaderStart(p, pSegcsr, &filter); 2270 while( SQLITE_OK==rc 2271 && SQLITE_ROW==(rc = sqlite3Fts3SegReaderStep(p, pSegcsr)) 2272 ){ 2273 rc = fts3TermSelectCb(p, (void *)&tsc, 2274 pSegcsr->zTerm, pSegcsr->nTerm, pSegcsr->aDoclist, pSegcsr->nDoclist 2275 ); 2276 } 2277 2278 if( rc==SQLITE_OK ){ 2279 rc = fts3TermSelectMerge(&tsc); 2280 } 2281 if( rc==SQLITE_OK ){ 2282 *ppOut = tsc.aaOutput[0]; 2283 *pnOut = tsc.anOutput[0]; 2284 }else{ 2285 int i; 2286 for(i=0; i<SizeofArray(tsc.aaOutput); i++){ 2287 sqlite3_free(tsc.aaOutput[i]); 2288 } 2289 } 2290 2291 fts3SegReaderCursorFree(pSegcsr); 2292 pTok->pSegcsr = 0; 2293 return rc; 2294 } 2295 2296 /* 2297 ** This function counts the total number of docids in the doclist stored 2298 ** in buffer aList[], size nList bytes. 2299 ** 2300 ** If the isPoslist argument is true, then it is assumed that the doclist 2301 ** contains a position-list following each docid. Otherwise, it is assumed 2302 ** that the doclist is simply a list of docids stored as delta encoded 2303 ** varints. 2304 */ 2305 static int fts3DoclistCountDocids(int isPoslist, char *aList, int nList){ 2306 int nDoc = 0; /* Return value */ 2307 if( aList ){ 2308 char *aEnd = &aList[nList]; /* Pointer to one byte after EOF */ 2309 char *p = aList; /* Cursor */ 2310 if( !isPoslist ){ 2311 /* The number of docids in the list is the same as the number of 2312 ** varints. In FTS3 a varint consists of a single byte with the 0x80 2313 ** bit cleared and zero or more bytes with the 0x80 bit set. So to 2314 ** count the varints in the buffer, just count the number of bytes 2315 ** with the 0x80 bit clear. */ 2316 while( p<aEnd ) nDoc += (((*p++)&0x80)==0); 2317 }else{ 2318 while( p<aEnd ){ 2319 nDoc++; 2320 while( (*p++)&0x80 ); /* Skip docid varint */ 2321 fts3PoslistCopy(0, &p); /* Skip over position list */ 2322 } 2323 } 2324 } 2325 2326 return nDoc; 2327 } 2328 2329 /* 2330 ** Call sqlite3Fts3DeferToken() for each token in the expression pExpr. 2331 */ 2332 static int fts3DeferExpression(Fts3Cursor *pCsr, Fts3Expr *pExpr){ 2333 int rc = SQLITE_OK; 2334 if( pExpr ){ 2335 rc = fts3DeferExpression(pCsr, pExpr->pLeft); 2336 if( rc==SQLITE_OK ){ 2337 rc = fts3DeferExpression(pCsr, pExpr->pRight); 2338 } 2339 if( pExpr->eType==FTSQUERY_PHRASE ){ 2340 int iCol = pExpr->pPhrase->iColumn; 2341 int i; 2342 for(i=0; rc==SQLITE_OK && i<pExpr->pPhrase->nToken; i++){ 2343 Fts3PhraseToken *pToken = &pExpr->pPhrase->aToken[i]; 2344 if( pToken->pDeferred==0 ){ 2345 rc = sqlite3Fts3DeferToken(pCsr, pToken, iCol); 2346 } 2347 } 2348 } 2349 } 2350 return rc; 2351 } 2352 2353 /* 2354 ** This function removes the position information from a doclist. When 2355 ** called, buffer aList (size *pnList bytes) contains a doclist that includes 2356 ** position information. This function removes the position information so 2357 ** that aList contains only docids, and adjusts *pnList to reflect the new 2358 ** (possibly reduced) size of the doclist. 2359 */ 2360 static void fts3DoclistStripPositions( 2361 char *aList, /* IN/OUT: Buffer containing doclist */ 2362 int *pnList /* IN/OUT: Size of doclist in bytes */ 2363 ){ 2364 if( aList ){ 2365 char *aEnd = &aList[*pnList]; /* Pointer to one byte after EOF */ 2366 char *p = aList; /* Input cursor */ 2367 char *pOut = aList; /* Output cursor */ 2368 2369 while( p<aEnd ){ 2370 sqlite3_int64 delta; 2371 p += sqlite3Fts3GetVarint(p, &delta); 2372 fts3PoslistCopy(0, &p); 2373 pOut += sqlite3Fts3PutVarint(pOut, delta); 2374 } 2375 2376 *pnList = (int)(pOut - aList); 2377 } 2378 } 2379 2380 /* 2381 ** Return a DocList corresponding to the phrase *pPhrase. 2382 ** 2383 ** If this function returns SQLITE_OK, but *pnOut is set to a negative value, 2384 ** then no tokens in the phrase were looked up in the full-text index. This 2385 ** is only possible when this function is called from within xFilter(). The 2386 ** caller should assume that all documents match the phrase. The actual 2387 ** filtering will take place in xNext(). 2388 */ 2389 static int fts3PhraseSelect( 2390 Fts3Cursor *pCsr, /* Virtual table cursor handle */ 2391 Fts3Phrase *pPhrase, /* Phrase to return a doclist for */ 2392 int isReqPos, /* True if output should contain positions */ 2393 char **paOut, /* OUT: Pointer to malloc'd result buffer */ 2394 int *pnOut /* OUT: Size of buffer at *paOut */ 2395 ){ 2396 char *pOut = 0; 2397 int nOut = 0; 2398 int rc = SQLITE_OK; 2399 int ii; 2400 int iCol = pPhrase->iColumn; 2401 int isTermPos = (pPhrase->nToken>1 || isReqPos); 2402 Fts3Table *p = (Fts3Table *)pCsr->base.pVtab; 2403 int isFirst = 1; 2404 2405 int iPrevTok = 0; 2406 int nDoc = 0; 2407 2408 /* If this is an xFilter() evaluation, create a segment-reader for each 2409 ** phrase token. Or, if this is an xNext() or snippet/offsets/matchinfo 2410 ** evaluation, only create segment-readers if there are no Fts3DeferredToken 2411 ** objects attached to the phrase-tokens. 2412 */ 2413 for(ii=0; ii<pPhrase->nToken; ii++){ 2414 Fts3PhraseToken *pTok = &pPhrase->aToken[ii]; 2415 if( pTok->pSegcsr==0 ){ 2416 if( (pCsr->eEvalmode==FTS3_EVAL_FILTER) 2417 || (pCsr->eEvalmode==FTS3_EVAL_NEXT && pCsr->pDeferred==0) 2418 || (pCsr->eEvalmode==FTS3_EVAL_MATCHINFO && pTok->bFulltext) 2419 ){ 2420 rc = fts3TermSegReaderCursor( 2421 pCsr, pTok->z, pTok->n, pTok->isPrefix, &pTok->pSegcsr 2422 ); 2423 if( rc!=SQLITE_OK ) return rc; 2424 } 2425 } 2426 } 2427 2428 for(ii=0; ii<pPhrase->nToken; ii++){ 2429 Fts3PhraseToken *pTok; /* Token to find doclist for */ 2430 int iTok = 0; /* The token being queried this iteration */ 2431 char *pList = 0; /* Pointer to token doclist */ 2432 int nList = 0; /* Size of buffer at pList */ 2433 2434 /* Select a token to process. If this is an xFilter() call, then tokens 2435 ** are processed in order from least to most costly. Otherwise, tokens 2436 ** are processed in the order in which they occur in the phrase. 2437 */ 2438 if( pCsr->eEvalmode==FTS3_EVAL_MATCHINFO ){ 2439 assert( isReqPos ); 2440 iTok = ii; 2441 pTok = &pPhrase->aToken[iTok]; 2442 if( pTok->bFulltext==0 ) continue; 2443 }else if( pCsr->eEvalmode==FTS3_EVAL_NEXT || isReqPos ){ 2444 iTok = ii; 2445 pTok = &pPhrase->aToken[iTok]; 2446 }else{ 2447 int nMinCost = 0x7FFFFFFF; 2448 int jj; 2449 2450 /* Find the remaining token with the lowest cost. */ 2451 for(jj=0; jj<pPhrase->nToken; jj++){ 2452 Fts3SegReaderCursor *pSegcsr = pPhrase->aToken[jj].pSegcsr; 2453 if( pSegcsr && pSegcsr->nCost<nMinCost ){ 2454 iTok = jj; 2455 nMinCost = pSegcsr->nCost; 2456 } 2457 } 2458 pTok = &pPhrase->aToken[iTok]; 2459 2460 /* This branch is taken if it is determined that loading the doclist 2461 ** for the next token would require more IO than loading all documents 2462 ** currently identified by doclist pOut/nOut. No further doclists will 2463 ** be loaded from the full-text index for this phrase. 2464 */ 2465 if( nMinCost>nDoc && ii>0 ){ 2466 rc = fts3DeferExpression(pCsr, pCsr->pExpr); 2467 break; 2468 } 2469 } 2470 2471 if( pCsr->eEvalmode==FTS3_EVAL_NEXT && pTok->pDeferred ){ 2472 rc = fts3DeferredTermSelect(pTok->pDeferred, isTermPos, &nList, &pList); 2473 }else{ 2474 if( pTok->pSegcsr ){ 2475 rc = fts3TermSelect(p, pTok, iCol, isTermPos, &nList, &pList); 2476 } 2477 pTok->bFulltext = 1; 2478 } 2479 assert( rc!=SQLITE_OK || pCsr->eEvalmode || pTok->pSegcsr==0 ); 2480 if( rc!=SQLITE_OK ) break; 2481 2482 if( isFirst ){ 2483 pOut = pList; 2484 nOut = nList; 2485 if( pCsr->eEvalmode==FTS3_EVAL_FILTER && pPhrase->nToken>1 ){ 2486 nDoc = fts3DoclistCountDocids(1, pOut, nOut); 2487 } 2488 isFirst = 0; 2489 iPrevTok = iTok; 2490 }else{ 2491 /* Merge the new term list and the current output. */ 2492 char *aLeft, *aRight; 2493 int nLeft, nRight; 2494 int nDist; 2495 int mt; 2496 2497 /* If this is the final token of the phrase, and positions were not 2498 ** requested by the caller, use MERGE_PHRASE instead of POS_PHRASE. 2499 ** This drops the position information from the output list. 2500 */ 2501 mt = MERGE_POS_PHRASE; 2502 if( ii==pPhrase->nToken-1 && !isReqPos ) mt = MERGE_PHRASE; 2503 2504 assert( iPrevTok!=iTok ); 2505 if( iPrevTok<iTok ){ 2506 aLeft = pOut; 2507 nLeft = nOut; 2508 aRight = pList; 2509 nRight = nList; 2510 nDist = iTok-iPrevTok; 2511 iPrevTok = iTok; 2512 }else{ 2513 aRight = pOut; 2514 nRight = nOut; 2515 aLeft = pList; 2516 nLeft = nList; 2517 nDist = iPrevTok-iTok; 2518 } 2519 pOut = aRight; 2520 fts3DoclistMerge( 2521 mt, nDist, 0, pOut, &nOut, aLeft, nLeft, aRight, nRight, &nDoc 2522 ); 2523 sqlite3_free(aLeft); 2524 } 2525 assert( nOut==0 || pOut!=0 ); 2526 } 2527 2528 if( rc==SQLITE_OK ){ 2529 if( ii!=pPhrase->nToken ){ 2530 assert( pCsr->eEvalmode==FTS3_EVAL_FILTER && isReqPos==0 ); 2531 fts3DoclistStripPositions(pOut, &nOut); 2532 } 2533 *paOut = pOut; 2534 *pnOut = nOut; 2535 }else{ 2536 sqlite3_free(pOut); 2537 } 2538 return rc; 2539 } 2540 2541 /* 2542 ** This function merges two doclists according to the requirements of a 2543 ** NEAR operator. 2544 ** 2545 ** Both input doclists must include position information. The output doclist 2546 ** includes position information if the first argument to this function 2547 ** is MERGE_POS_NEAR, or does not if it is MERGE_NEAR. 2548 */ 2549 static int fts3NearMerge( 2550 int mergetype, /* MERGE_POS_NEAR or MERGE_NEAR */ 2551 int nNear, /* Parameter to NEAR operator */ 2552 int nTokenLeft, /* Number of tokens in LHS phrase arg */ 2553 char *aLeft, /* Doclist for LHS (incl. positions) */ 2554 int nLeft, /* Size of LHS doclist in bytes */ 2555 int nTokenRight, /* As nTokenLeft */ 2556 char *aRight, /* As aLeft */ 2557 int nRight, /* As nRight */ 2558 char **paOut, /* OUT: Results of merge (malloced) */ 2559 int *pnOut /* OUT: Sized of output buffer */ 2560 ){ 2561 char *aOut; /* Buffer to write output doclist to */ 2562 int rc; /* Return code */ 2563 2564 assert( mergetype==MERGE_POS_NEAR || MERGE_NEAR ); 2565 2566 aOut = sqlite3_malloc(nLeft+nRight+1); 2567 if( aOut==0 ){ 2568 rc = SQLITE_NOMEM; 2569 }else{ 2570 rc = fts3DoclistMerge(mergetype, nNear+nTokenRight, nNear+nTokenLeft, 2571 aOut, pnOut, aLeft, nLeft, aRight, nRight, 0 2572 ); 2573 if( rc!=SQLITE_OK ){ 2574 sqlite3_free(aOut); 2575 aOut = 0; 2576 } 2577 } 2578 2579 *paOut = aOut; 2580 return rc; 2581 } 2582 2583 /* 2584 ** This function is used as part of the processing for the snippet() and 2585 ** offsets() functions. 2586 ** 2587 ** Both pLeft and pRight are expression nodes of type FTSQUERY_PHRASE. Both 2588 ** have their respective doclists (including position information) loaded 2589 ** in Fts3Expr.aDoclist/nDoclist. This function removes all entries from 2590 ** each doclist that are not within nNear tokens of a corresponding entry 2591 ** in the other doclist. 2592 */ 2593 int sqlite3Fts3ExprNearTrim(Fts3Expr *pLeft, Fts3Expr *pRight, int nNear){ 2594 int rc; /* Return code */ 2595 2596 assert( pLeft->eType==FTSQUERY_PHRASE ); 2597 assert( pRight->eType==FTSQUERY_PHRASE ); 2598 assert( pLeft->isLoaded && pRight->isLoaded ); 2599 2600 if( pLeft->aDoclist==0 || pRight->aDoclist==0 ){ 2601 sqlite3_free(pLeft->aDoclist); 2602 sqlite3_free(pRight->aDoclist); 2603 pRight->aDoclist = 0; 2604 pLeft->aDoclist = 0; 2605 rc = SQLITE_OK; 2606 }else{ 2607 char *aOut; /* Buffer in which to assemble new doclist */ 2608 int nOut; /* Size of buffer aOut in bytes */ 2609 2610 rc = fts3NearMerge(MERGE_POS_NEAR, nNear, 2611 pLeft->pPhrase->nToken, pLeft->aDoclist, pLeft->nDoclist, 2612 pRight->pPhrase->nToken, pRight->aDoclist, pRight->nDoclist, 2613 &aOut, &nOut 2614 ); 2615 if( rc!=SQLITE_OK ) return rc; 2616 sqlite3_free(pRight->aDoclist); 2617 pRight->aDoclist = aOut; 2618 pRight->nDoclist = nOut; 2619 2620 rc = fts3NearMerge(MERGE_POS_NEAR, nNear, 2621 pRight->pPhrase->nToken, pRight->aDoclist, pRight->nDoclist, 2622 pLeft->pPhrase->nToken, pLeft->aDoclist, pLeft->nDoclist, 2623 &aOut, &nOut 2624 ); 2625 sqlite3_free(pLeft->aDoclist); 2626 pLeft->aDoclist = aOut; 2627 pLeft->nDoclist = nOut; 2628 } 2629 return rc; 2630 } 2631 2632 2633 /* 2634 ** Allocate an Fts3SegReaderArray for each token in the expression pExpr. 2635 ** The allocated objects are stored in the Fts3PhraseToken.pArray member 2636 ** variables of each token structure. 2637 */ 2638 static int fts3ExprAllocateSegReaders( 2639 Fts3Cursor *pCsr, /* FTS3 table */ 2640 Fts3Expr *pExpr, /* Expression to create seg-readers for */ 2641 int *pnExpr /* OUT: Number of AND'd expressions */ 2642 ){ 2643 int rc = SQLITE_OK; /* Return code */ 2644 2645 assert( pCsr->eEvalmode==FTS3_EVAL_FILTER ); 2646 if( pnExpr && pExpr->eType!=FTSQUERY_AND ){ 2647 (*pnExpr)++; 2648 pnExpr = 0; 2649 } 2650 2651 if( pExpr->eType==FTSQUERY_PHRASE ){ 2652 Fts3Phrase *pPhrase = pExpr->pPhrase; 2653 int ii; 2654 2655 for(ii=0; rc==SQLITE_OK && ii<pPhrase->nToken; ii++){ 2656 Fts3PhraseToken *pTok = &pPhrase->aToken[ii]; 2657 if( pTok->pSegcsr==0 ){ 2658 rc = fts3TermSegReaderCursor( 2659 pCsr, pTok->z, pTok->n, pTok->isPrefix, &pTok->pSegcsr 2660 ); 2661 } 2662 } 2663 }else{ 2664 rc = fts3ExprAllocateSegReaders(pCsr, pExpr->pLeft, pnExpr); 2665 if( rc==SQLITE_OK ){ 2666 rc = fts3ExprAllocateSegReaders(pCsr, pExpr->pRight, pnExpr); 2667 } 2668 } 2669 return rc; 2670 } 2671 2672 /* 2673 ** Free the Fts3SegReaderArray objects associated with each token in the 2674 ** expression pExpr. In other words, this function frees the resources 2675 ** allocated by fts3ExprAllocateSegReaders(). 2676 */ 2677 static void fts3ExprFreeSegReaders(Fts3Expr *pExpr){ 2678 if( pExpr ){ 2679 Fts3Phrase *pPhrase = pExpr->pPhrase; 2680 if( pPhrase ){ 2681 int kk; 2682 for(kk=0; kk<pPhrase->nToken; kk++){ 2683 fts3SegReaderCursorFree(pPhrase->aToken[kk].pSegcsr); 2684 pPhrase->aToken[kk].pSegcsr = 0; 2685 } 2686 } 2687 fts3ExprFreeSegReaders(pExpr->pLeft); 2688 fts3ExprFreeSegReaders(pExpr->pRight); 2689 } 2690 } 2691 2692 /* 2693 ** Return the sum of the costs of all tokens in the expression pExpr. This 2694 ** function must be called after Fts3SegReaderArrays have been allocated 2695 ** for all tokens using fts3ExprAllocateSegReaders(). 2696 */ 2697 static int fts3ExprCost(Fts3Expr *pExpr){ 2698 int nCost; /* Return value */ 2699 if( pExpr->eType==FTSQUERY_PHRASE ){ 2700 Fts3Phrase *pPhrase = pExpr->pPhrase; 2701 int ii; 2702 nCost = 0; 2703 for(ii=0; ii<pPhrase->nToken; ii++){ 2704 Fts3SegReaderCursor *pSegcsr = pPhrase->aToken[ii].pSegcsr; 2705 if( pSegcsr ) nCost += pSegcsr->nCost; 2706 } 2707 }else{ 2708 nCost = fts3ExprCost(pExpr->pLeft) + fts3ExprCost(pExpr->pRight); 2709 } 2710 return nCost; 2711 } 2712 2713 /* 2714 ** The following is a helper function (and type) for fts3EvalExpr(). It 2715 ** must be called after Fts3SegReaders have been allocated for every token 2716 ** in the expression. See the context it is called from in fts3EvalExpr() 2717 ** for further explanation. 2718 */ 2719 typedef struct ExprAndCost ExprAndCost; 2720 struct ExprAndCost { 2721 Fts3Expr *pExpr; 2722 int nCost; 2723 }; 2724 static void fts3ExprAssignCosts( 2725 Fts3Expr *pExpr, /* Expression to create seg-readers for */ 2726 ExprAndCost **ppExprCost /* OUT: Write to *ppExprCost */ 2727 ){ 2728 if( pExpr->eType==FTSQUERY_AND ){ 2729 fts3ExprAssignCosts(pExpr->pLeft, ppExprCost); 2730 fts3ExprAssignCosts(pExpr->pRight, ppExprCost); 2731 }else{ 2732 (*ppExprCost)->pExpr = pExpr; 2733 (*ppExprCost)->nCost = fts3ExprCost(pExpr); 2734 (*ppExprCost)++; 2735 } 2736 } 2737 2738 /* 2739 ** Evaluate the full-text expression pExpr against FTS3 table pTab. Store 2740 ** the resulting doclist in *paOut and *pnOut. This routine mallocs for 2741 ** the space needed to store the output. The caller is responsible for 2742 ** freeing the space when it has finished. 2743 ** 2744 ** This function is called in two distinct contexts: 2745 ** 2746 ** * From within the virtual table xFilter() method. In this case, the 2747 ** output doclist contains entries for all rows in the table, based on 2748 ** data read from the full-text index. 2749 ** 2750 ** In this case, if the query expression contains one or more tokens that 2751 ** are very common, then the returned doclist may contain a superset of 2752 ** the documents that actually match the expression. 2753 ** 2754 ** * From within the virtual table xNext() method. This call is only made 2755 ** if the call from within xFilter() found that there were very common 2756 ** tokens in the query expression and did return a superset of the 2757 ** matching documents. In this case the returned doclist contains only 2758 ** entries that correspond to the current row of the table. Instead of 2759 ** reading the data for each token from the full-text index, the data is 2760 ** already available in-memory in the Fts3PhraseToken.pDeferred structures. 2761 ** See fts3EvalDeferred() for how it gets there. 2762 ** 2763 ** In the first case above, Fts3Cursor.doDeferred==0. In the second (if it is 2764 ** required) Fts3Cursor.doDeferred==1. 2765 ** 2766 ** If the SQLite invokes the snippet(), offsets() or matchinfo() function 2767 ** as part of a SELECT on an FTS3 table, this function is called on each 2768 ** individual phrase expression in the query. If there were very common tokens 2769 ** found in the xFilter() call, then this function is called once for phrase 2770 ** for each row visited, and the returned doclist contains entries for the 2771 ** current row only. Otherwise, if there were no very common tokens, then this 2772 ** function is called once only for each phrase in the query and the returned 2773 ** doclist contains entries for all rows of the table. 2774 ** 2775 ** Fts3Cursor.doDeferred==1 when this function is called on phrases as a 2776 ** result of a snippet(), offsets() or matchinfo() invocation. 2777 */ 2778 static int fts3EvalExpr( 2779 Fts3Cursor *p, /* Virtual table cursor handle */ 2780 Fts3Expr *pExpr, /* Parsed fts3 expression */ 2781 char **paOut, /* OUT: Pointer to malloc'd result buffer */ 2782 int *pnOut, /* OUT: Size of buffer at *paOut */ 2783 int isReqPos /* Require positions in output buffer */ 2784 ){ 2785 int rc = SQLITE_OK; /* Return code */ 2786 2787 /* Zero the output parameters. */ 2788 *paOut = 0; 2789 *pnOut = 0; 2790 2791 if( pExpr ){ 2792 assert( pExpr->eType==FTSQUERY_NEAR || pExpr->eType==FTSQUERY_OR 2793 || pExpr->eType==FTSQUERY_AND || pExpr->eType==FTSQUERY_NOT 2794 || pExpr->eType==FTSQUERY_PHRASE 2795 ); 2796 assert( pExpr->eType==FTSQUERY_PHRASE || isReqPos==0 ); 2797 2798 if( pExpr->eType==FTSQUERY_PHRASE ){ 2799 rc = fts3PhraseSelect(p, pExpr->pPhrase, 2800 isReqPos || (pExpr->pParent && pExpr->pParent->eType==FTSQUERY_NEAR), 2801 paOut, pnOut 2802 ); 2803 fts3ExprFreeSegReaders(pExpr); 2804 }else if( p->eEvalmode==FTS3_EVAL_FILTER && pExpr->eType==FTSQUERY_AND ){ 2805 ExprAndCost *aExpr = 0; /* Array of AND'd expressions and costs */ 2806 int nExpr = 0; /* Size of aExpr[] */ 2807 char *aRet = 0; /* Doclist to return to caller */ 2808 int nRet = 0; /* Length of aRet[] in bytes */ 2809 int nDoc = 0x7FFFFFFF; 2810 2811 assert( !isReqPos ); 2812 2813 rc = fts3ExprAllocateSegReaders(p, pExpr, &nExpr); 2814 if( rc==SQLITE_OK ){ 2815 assert( nExpr>1 ); 2816 aExpr = sqlite3_malloc(sizeof(ExprAndCost) * nExpr); 2817 if( !aExpr ) rc = SQLITE_NOMEM; 2818 } 2819 if( rc==SQLITE_OK ){ 2820 int ii; /* Used to iterate through expressions */ 2821 2822 fts3ExprAssignCosts(pExpr, &aExpr); 2823 aExpr -= nExpr; 2824 for(ii=0; ii<nExpr; ii++){ 2825 char *aNew; 2826 int nNew; 2827 int jj; 2828 ExprAndCost *pBest = 0; 2829 2830 for(jj=0; jj<nExpr; jj++){ 2831 ExprAndCost *pCand = &aExpr[jj]; 2832 if( pCand->pExpr && (pBest==0 || pCand->nCost<pBest->nCost) ){ 2833 pBest = pCand; 2834 } 2835 } 2836 2837 if( pBest->nCost>nDoc ){ 2838 rc = fts3DeferExpression(p, p->pExpr); 2839 break; 2840 }else{ 2841 rc = fts3EvalExpr(p, pBest->pExpr, &aNew, &nNew, 0); 2842 if( rc!=SQLITE_OK ) break; 2843 pBest->pExpr = 0; 2844 if( ii==0 ){ 2845 aRet = aNew; 2846 nRet = nNew; 2847 nDoc = fts3DoclistCountDocids(0, aRet, nRet); 2848 }else{ 2849 fts3DoclistMerge( 2850 MERGE_AND, 0, 0, aRet, &nRet, aRet, nRet, aNew, nNew, &nDoc 2851 ); 2852 sqlite3_free(aNew); 2853 } 2854 } 2855 } 2856 } 2857 2858 if( rc==SQLITE_OK ){ 2859 *paOut = aRet; 2860 *pnOut = nRet; 2861 }else{ 2862 assert( *paOut==0 ); 2863 sqlite3_free(aRet); 2864 } 2865 sqlite3_free(aExpr); 2866 fts3ExprFreeSegReaders(pExpr); 2867 2868 }else{ 2869 char *aLeft; 2870 char *aRight; 2871 int nLeft; 2872 int nRight; 2873 2874 assert( pExpr->eType==FTSQUERY_NEAR 2875 || pExpr->eType==FTSQUERY_OR 2876 || pExpr->eType==FTSQUERY_NOT 2877 || (pExpr->eType==FTSQUERY_AND && p->eEvalmode==FTS3_EVAL_NEXT) 2878 ); 2879 2880 if( 0==(rc = fts3EvalExpr(p, pExpr->pRight, &aRight, &nRight, isReqPos)) 2881 && 0==(rc = fts3EvalExpr(p, pExpr->pLeft, &aLeft, &nLeft, isReqPos)) 2882 ){ 2883 switch( pExpr->eType ){ 2884 case FTSQUERY_NEAR: { 2885 Fts3Expr *pLeft; 2886 Fts3Expr *pRight; 2887 int mergetype = MERGE_NEAR; 2888 if( pExpr->pParent && pExpr->pParent->eType==FTSQUERY_NEAR ){ 2889 mergetype = MERGE_POS_NEAR; 2890 } 2891 pLeft = pExpr->pLeft; 2892 while( pLeft->eType==FTSQUERY_NEAR ){ 2893 pLeft=pLeft->pRight; 2894 } 2895 pRight = pExpr->pRight; 2896 assert( pRight->eType==FTSQUERY_PHRASE ); 2897 assert( pLeft->eType==FTSQUERY_PHRASE ); 2898 2899 rc = fts3NearMerge(mergetype, pExpr->nNear, 2900 pLeft->pPhrase->nToken, aLeft, nLeft, 2901 pRight->pPhrase->nToken, aRight, nRight, 2902 paOut, pnOut 2903 ); 2904 sqlite3_free(aLeft); 2905 break; 2906 } 2907 2908 case FTSQUERY_OR: { 2909 /* Allocate a buffer for the output. The maximum size is the 2910 ** sum of the sizes of the two input buffers. The +1 term is 2911 ** so that a buffer of zero bytes is never allocated - this can 2912 ** cause fts3DoclistMerge() to incorrectly return SQLITE_NOMEM. 2913 */ 2914 char *aBuffer = sqlite3_malloc(nRight+nLeft+1); 2915 rc = fts3DoclistMerge(MERGE_OR, 0, 0, aBuffer, pnOut, 2916 aLeft, nLeft, aRight, nRight, 0 2917 ); 2918 *paOut = aBuffer; 2919 sqlite3_free(aLeft); 2920 break; 2921 } 2922 2923 default: { 2924 assert( FTSQUERY_NOT==MERGE_NOT && FTSQUERY_AND==MERGE_AND ); 2925 fts3DoclistMerge(pExpr->eType, 0, 0, aLeft, pnOut, 2926 aLeft, nLeft, aRight, nRight, 0 2927 ); 2928 *paOut = aLeft; 2929 break; 2930 } 2931 } 2932 } 2933 sqlite3_free(aRight); 2934 } 2935 } 2936 2937 assert( rc==SQLITE_OK || *paOut==0 ); 2938 return rc; 2939 } 2940 2941 /* 2942 ** This function is called from within xNext() for each row visited by 2943 ** an FTS3 query. If evaluating the FTS3 query expression within xFilter() 2944 ** was able to determine the exact set of matching rows, this function sets 2945 ** *pbRes to true and returns SQLITE_IO immediately. 2946 ** 2947 ** Otherwise, if evaluating the query expression within xFilter() returned a 2948 ** superset of the matching documents instead of an exact set (this happens 2949 ** when the query includes very common tokens and it is deemed too expensive to 2950 ** load their doclists from disk), this function tests if the current row 2951 ** really does match the FTS3 query. 2952 ** 2953 ** If an error occurs, an SQLite error code is returned. Otherwise, SQLITE_OK 2954 ** is returned and *pbRes is set to true if the current row matches the 2955 ** FTS3 query (and should be included in the results returned to SQLite), or 2956 ** false otherwise. 2957 */ 2958 static int fts3EvalDeferred( 2959 Fts3Cursor *pCsr, /* FTS3 cursor pointing at row to test */ 2960 int *pbRes /* OUT: Set to true if row is a match */ 2961 ){ 2962 int rc = SQLITE_OK; 2963 if( pCsr->pDeferred==0 ){ 2964 *pbRes = 1; 2965 }else{ 2966 rc = fts3CursorSeek(0, pCsr); 2967 if( rc==SQLITE_OK ){ 2968 sqlite3Fts3FreeDeferredDoclists(pCsr); 2969 rc = sqlite3Fts3CacheDeferredDoclists(pCsr); 2970 } 2971 if( rc==SQLITE_OK ){ 2972 char *a = 0; 2973 int n = 0; 2974 rc = fts3EvalExpr(pCsr, pCsr->pExpr, &a, &n, 0); 2975 assert( n>=0 ); 2976 *pbRes = (n>0); 2977 sqlite3_free(a); 2978 } 2979 } 2980 return rc; 2981 } 2982 2983 /* 2984 ** Advance the cursor to the next row in the %_content table that 2985 ** matches the search criteria. For a MATCH search, this will be 2986 ** the next row that matches. For a full-table scan, this will be 2987 ** simply the next row in the %_content table. For a docid lookup, 2988 ** this routine simply sets the EOF flag. 2989 ** 2990 ** Return SQLITE_OK if nothing goes wrong. SQLITE_OK is returned 2991 ** even if we reach end-of-file. The fts3EofMethod() will be called 2992 ** subsequently to determine whether or not an EOF was hit. 2993 */ 2994 static int fts3NextMethod(sqlite3_vtab_cursor *pCursor){ 2995 int res; 2996 int rc = SQLITE_OK; /* Return code */ 2997 Fts3Cursor *pCsr = (Fts3Cursor *)pCursor; 2998 2999 pCsr->eEvalmode = FTS3_EVAL_NEXT; 3000 do { 3001 if( pCsr->aDoclist==0 ){ 3002 if( SQLITE_ROW!=sqlite3_step(pCsr->pStmt) ){ 3003 pCsr->isEof = 1; 3004 rc = sqlite3_reset(pCsr->pStmt); 3005 break; 3006 } 3007 pCsr->iPrevId = sqlite3_column_int64(pCsr->pStmt, 0); 3008 }else{ 3009 if( pCsr->pNextId>=&pCsr->aDoclist[pCsr->nDoclist] ){ 3010 pCsr->isEof = 1; 3011 break; 3012 } 3013 sqlite3_reset(pCsr->pStmt); 3014 fts3GetDeltaVarint(&pCsr->pNextId, &pCsr->iPrevId); 3015 pCsr->isRequireSeek = 1; 3016 pCsr->isMatchinfoNeeded = 1; 3017 } 3018 }while( SQLITE_OK==(rc = fts3EvalDeferred(pCsr, &res)) && res==0 ); 3019 3020 return rc; 3021 } 3022 3023 /* 3024 ** This is the xFilter interface for the virtual table. See 3025 ** the virtual table xFilter method documentation for additional 3026 ** information. 3027 ** 3028 ** If idxNum==FTS3_FULLSCAN_SEARCH then do a full table scan against 3029 ** the %_content table. 3030 ** 3031 ** If idxNum==FTS3_DOCID_SEARCH then do a docid lookup for a single entry 3032 ** in the %_content table. 3033 ** 3034 ** If idxNum>=FTS3_FULLTEXT_SEARCH then use the full text index. The 3035 ** column on the left-hand side of the MATCH operator is column 3036 ** number idxNum-FTS3_FULLTEXT_SEARCH, 0 indexed. argv[0] is the right-hand 3037 ** side of the MATCH operator. 3038 */ 3039 static int fts3FilterMethod( 3040 sqlite3_vtab_cursor *pCursor, /* The cursor used for this query */ 3041 int idxNum, /* Strategy index */ 3042 const char *idxStr, /* Unused */ 3043 int nVal, /* Number of elements in apVal */ 3044 sqlite3_value **apVal /* Arguments for the indexing scheme */ 3045 ){ 3046 const char *azSql[] = { 3047 "SELECT %s FROM %Q.'%q_content' AS x WHERE docid = ?", /* non-full-scan */ 3048 "SELECT %s FROM %Q.'%q_content' AS x ", /* full-scan */ 3049 }; 3050 int rc; /* Return code */ 3051 char *zSql; /* SQL statement used to access %_content */ 3052 Fts3Table *p = (Fts3Table *)pCursor->pVtab; 3053 Fts3Cursor *pCsr = (Fts3Cursor *)pCursor; 3054 3055 UNUSED_PARAMETER(idxStr); 3056 UNUSED_PARAMETER(nVal); 3057 3058 assert( idxNum>=0 && idxNum<=(FTS3_FULLTEXT_SEARCH+p->nColumn) ); 3059 assert( nVal==0 || nVal==1 ); 3060 assert( (nVal==0)==(idxNum==FTS3_FULLSCAN_SEARCH) ); 3061 assert( p->pSegments==0 ); 3062 3063 /* In case the cursor has been used before, clear it now. */ 3064 sqlite3_finalize(pCsr->pStmt); 3065 sqlite3_free(pCsr->aDoclist); 3066 sqlite3Fts3ExprFree(pCsr->pExpr); 3067 memset(&pCursor[1], 0, sizeof(Fts3Cursor)-sizeof(sqlite3_vtab_cursor)); 3068 3069 if( idxNum!=FTS3_DOCID_SEARCH && idxNum!=FTS3_FULLSCAN_SEARCH ){ 3070 int iCol = idxNum-FTS3_FULLTEXT_SEARCH; 3071 const char *zQuery = (const char *)sqlite3_value_text(apVal[0]); 3072 3073 if( zQuery==0 && sqlite3_value_type(apVal[0])!=SQLITE_NULL ){ 3074 return SQLITE_NOMEM; 3075 } 3076 3077 rc = sqlite3Fts3ExprParse(p->pTokenizer, p->azColumn, p->nColumn, 3078 iCol, zQuery, -1, &pCsr->pExpr 3079 ); 3080 if( rc!=SQLITE_OK ){ 3081 if( rc==SQLITE_ERROR ){ 3082 p->base.zErrMsg = sqlite3_mprintf("malformed MATCH expression: [%s]", 3083 zQuery); 3084 } 3085 return rc; 3086 } 3087 3088 rc = sqlite3Fts3ReadLock(p); 3089 if( rc!=SQLITE_OK ) return rc; 3090 3091 rc = fts3EvalExpr(pCsr, pCsr->pExpr, &pCsr->aDoclist, &pCsr->nDoclist, 0); 3092 sqlite3Fts3SegmentsClose(p); 3093 if( rc!=SQLITE_OK ) return rc; 3094 pCsr->pNextId = pCsr->aDoclist; 3095 pCsr->iPrevId = 0; 3096 } 3097 3098 /* Compile a SELECT statement for this cursor. For a full-table-scan, the 3099 ** statement loops through all rows of the %_content table. For a 3100 ** full-text query or docid lookup, the statement retrieves a single 3101 ** row by docid. 3102 */ 3103 zSql = (char *)azSql[idxNum==FTS3_FULLSCAN_SEARCH]; 3104 zSql = sqlite3_mprintf(zSql, p->zReadExprlist, p->zDb, p->zName); 3105 if( !zSql ){ 3106 rc = SQLITE_NOMEM; 3107 }else{ 3108 rc = sqlite3_prepare_v2(p->db, zSql, -1, &pCsr->pStmt, 0); 3109 sqlite3_free(zSql); 3110 } 3111 if( rc==SQLITE_OK && idxNum==FTS3_DOCID_SEARCH ){ 3112 rc = sqlite3_bind_value(pCsr->pStmt, 1, apVal[0]); 3113 } 3114 pCsr->eSearch = (i16)idxNum; 3115 3116 if( rc!=SQLITE_OK ) return rc; 3117 return fts3NextMethod(pCursor); 3118 } 3119 3120 /* 3121 ** This is the xEof method of the virtual table. SQLite calls this 3122 ** routine to find out if it has reached the end of a result set. 3123 */ 3124 static int fts3EofMethod(sqlite3_vtab_cursor *pCursor){ 3125 return ((Fts3Cursor *)pCursor)->isEof; 3126 } 3127 3128 /* 3129 ** This is the xRowid method. The SQLite core calls this routine to 3130 ** retrieve the rowid for the current row of the result set. fts3 3131 ** exposes %_content.docid as the rowid for the virtual table. The 3132 ** rowid should be written to *pRowid. 3133 */ 3134 static int fts3RowidMethod(sqlite3_vtab_cursor *pCursor, sqlite_int64 *pRowid){ 3135 Fts3Cursor *pCsr = (Fts3Cursor *) pCursor; 3136 if( pCsr->aDoclist ){ 3137 *pRowid = pCsr->iPrevId; 3138 }else{ 3139 /* This branch runs if the query is implemented using a full-table scan 3140 ** (not using the full-text index). In this case grab the rowid from the 3141 ** SELECT statement. 3142 */ 3143 assert( pCsr->isRequireSeek==0 ); 3144 *pRowid = sqlite3_column_int64(pCsr->pStmt, 0); 3145 } 3146 return SQLITE_OK; 3147 } 3148 3149 /* 3150 ** This is the xColumn method, called by SQLite to request a value from 3151 ** the row that the supplied cursor currently points to. 3152 */ 3153 static int fts3ColumnMethod( 3154 sqlite3_vtab_cursor *pCursor, /* Cursor to retrieve value from */ 3155 sqlite3_context *pContext, /* Context for sqlite3_result_xxx() calls */ 3156 int iCol /* Index of column to read value from */ 3157 ){ 3158 int rc; /* Return Code */ 3159 Fts3Cursor *pCsr = (Fts3Cursor *) pCursor; 3160 Fts3Table *p = (Fts3Table *)pCursor->pVtab; 3161 3162 /* The column value supplied by SQLite must be in range. */ 3163 assert( iCol>=0 && iCol<=p->nColumn+1 ); 3164 3165 if( iCol==p->nColumn+1 ){ 3166 /* This call is a request for the "docid" column. Since "docid" is an 3167 ** alias for "rowid", use the xRowid() method to obtain the value. 3168 */ 3169 sqlite3_int64 iRowid; 3170 rc = fts3RowidMethod(pCursor, &iRowid); 3171 sqlite3_result_int64(pContext, iRowid); 3172 }else if( iCol==p->nColumn ){ 3173 /* The extra column whose name is the same as the table. 3174 ** Return a blob which is a pointer to the cursor. 3175 */ 3176 sqlite3_result_blob(pContext, &pCsr, sizeof(pCsr), SQLITE_TRANSIENT); 3177 rc = SQLITE_OK; 3178 }else{ 3179 rc = fts3CursorSeek(0, pCsr); 3180 if( rc==SQLITE_OK ){ 3181 sqlite3_result_value(pContext, sqlite3_column_value(pCsr->pStmt, iCol+1)); 3182 } 3183 } 3184 return rc; 3185 } 3186 3187 /* 3188 ** This function is the implementation of the xUpdate callback used by 3189 ** FTS3 virtual tables. It is invoked by SQLite each time a row is to be 3190 ** inserted, updated or deleted. 3191 */ 3192 static int fts3UpdateMethod( 3193 sqlite3_vtab *pVtab, /* Virtual table handle */ 3194 int nArg, /* Size of argument array */ 3195 sqlite3_value **apVal, /* Array of arguments */ 3196 sqlite_int64 *pRowid /* OUT: The affected (or effected) rowid */ 3197 ){ 3198 return sqlite3Fts3UpdateMethod(pVtab, nArg, apVal, pRowid); 3199 } 3200 3201 /* 3202 ** Implementation of xSync() method. Flush the contents of the pending-terms 3203 ** hash-table to the database. 3204 */ 3205 static int fts3SyncMethod(sqlite3_vtab *pVtab){ 3206 int rc = sqlite3Fts3PendingTermsFlush((Fts3Table *)pVtab); 3207 sqlite3Fts3SegmentsClose((Fts3Table *)pVtab); 3208 return rc; 3209 } 3210 3211 /* 3212 ** Implementation of xBegin() method. This is a no-op. 3213 */ 3214 static int fts3BeginMethod(sqlite3_vtab *pVtab){ 3215 UNUSED_PARAMETER(pVtab); 3216 assert( ((Fts3Table *)pVtab)->nPendingData==0 ); 3217 return SQLITE_OK; 3218 } 3219 3220 /* 3221 ** Implementation of xCommit() method. This is a no-op. The contents of 3222 ** the pending-terms hash-table have already been flushed into the database 3223 ** by fts3SyncMethod(). 3224 */ 3225 static int fts3CommitMethod(sqlite3_vtab *pVtab){ 3226 UNUSED_PARAMETER(pVtab); 3227 assert( ((Fts3Table *)pVtab)->nPendingData==0 ); 3228 return SQLITE_OK; 3229 } 3230 3231 /* 3232 ** Implementation of xRollback(). Discard the contents of the pending-terms 3233 ** hash-table. Any changes made to the database are reverted by SQLite. 3234 */ 3235 static int fts3RollbackMethod(sqlite3_vtab *pVtab){ 3236 sqlite3Fts3PendingTermsClear((Fts3Table *)pVtab); 3237 return SQLITE_OK; 3238 } 3239 3240 /* 3241 ** Load the doclist associated with expression pExpr to pExpr->aDoclist. 3242 ** The loaded doclist contains positions as well as the document ids. 3243 ** This is used by the matchinfo(), snippet() and offsets() auxillary 3244 ** functions. 3245 */ 3246 int sqlite3Fts3ExprLoadDoclist(Fts3Cursor *pCsr, Fts3Expr *pExpr){ 3247 int rc; 3248 assert( pExpr->eType==FTSQUERY_PHRASE && pExpr->pPhrase ); 3249 assert( pCsr->eEvalmode==FTS3_EVAL_NEXT ); 3250 rc = fts3EvalExpr(pCsr, pExpr, &pExpr->aDoclist, &pExpr->nDoclist, 1); 3251 return rc; 3252 } 3253 3254 int sqlite3Fts3ExprLoadFtDoclist( 3255 Fts3Cursor *pCsr, 3256 Fts3Expr *pExpr, 3257 char **paDoclist, 3258 int *pnDoclist 3259 ){ 3260 int rc; 3261 assert( pCsr->eEvalmode==FTS3_EVAL_NEXT ); 3262 assert( pExpr->eType==FTSQUERY_PHRASE && pExpr->pPhrase ); 3263 pCsr->eEvalmode = FTS3_EVAL_MATCHINFO; 3264 rc = fts3EvalExpr(pCsr, pExpr, paDoclist, pnDoclist, 1); 3265 pCsr->eEvalmode = FTS3_EVAL_NEXT; 3266 return rc; 3267 } 3268 3269 /* 3270 ** After ExprLoadDoclist() (see above) has been called, this function is 3271 ** used to iterate/search through the position lists that make up the doclist 3272 ** stored in pExpr->aDoclist. 3273 */ 3274 char *sqlite3Fts3FindPositions( 3275 Fts3Expr *pExpr, /* Access this expressions doclist */ 3276 sqlite3_int64 iDocid, /* Docid associated with requested pos-list */ 3277 int iCol /* Column of requested pos-list */ 3278 ){ 3279 assert( pExpr->isLoaded ); 3280 if( pExpr->aDoclist ){ 3281 char *pEnd = &pExpr->aDoclist[pExpr->nDoclist]; 3282 char *pCsr; 3283 3284 if( pExpr->pCurrent==0 ){ 3285 pExpr->pCurrent = pExpr->aDoclist; 3286 pExpr->iCurrent = 0; 3287 pExpr->pCurrent += sqlite3Fts3GetVarint(pExpr->pCurrent,&pExpr->iCurrent); 3288 } 3289 pCsr = pExpr->pCurrent; 3290 assert( pCsr ); 3291 3292 while( pCsr<pEnd ){ 3293 if( pExpr->iCurrent<iDocid ){ 3294 fts3PoslistCopy(0, &pCsr); 3295 if( pCsr<pEnd ){ 3296 fts3GetDeltaVarint(&pCsr, &pExpr->iCurrent); 3297 } 3298 pExpr->pCurrent = pCsr; 3299 }else{ 3300 if( pExpr->iCurrent==iDocid ){ 3301 int iThis = 0; 3302 if( iCol<0 ){ 3303 /* If iCol is negative, return a pointer to the start of the 3304 ** position-list (instead of a pointer to the start of a list 3305 ** of offsets associated with a specific column). 3306 */ 3307 return pCsr; 3308 } 3309 while( iThis<iCol ){ 3310 fts3ColumnlistCopy(0, &pCsr); 3311 if( *pCsr==0x00 ) return 0; 3312 pCsr++; 3313 pCsr += sqlite3Fts3GetVarint32(pCsr, &iThis); 3314 } 3315 if( iCol==iThis && (*pCsr&0xFE) ) return pCsr; 3316 } 3317 return 0; 3318 } 3319 } 3320 } 3321 3322 return 0; 3323 } 3324 3325 /* 3326 ** Helper function used by the implementation of the overloaded snippet(), 3327 ** offsets() and optimize() SQL functions. 3328 ** 3329 ** If the value passed as the third argument is a blob of size 3330 ** sizeof(Fts3Cursor*), then the blob contents are copied to the 3331 ** output variable *ppCsr and SQLITE_OK is returned. Otherwise, an error 3332 ** message is written to context pContext and SQLITE_ERROR returned. The 3333 ** string passed via zFunc is used as part of the error message. 3334 */ 3335 static int fts3FunctionArg( 3336 sqlite3_context *pContext, /* SQL function call context */ 3337 const char *zFunc, /* Function name */ 3338 sqlite3_value *pVal, /* argv[0] passed to function */ 3339 Fts3Cursor **ppCsr /* OUT: Store cursor handle here */ 3340 ){ 3341 Fts3Cursor *pRet; 3342 if( sqlite3_value_type(pVal)!=SQLITE_BLOB 3343 || sqlite3_value_bytes(pVal)!=sizeof(Fts3Cursor *) 3344 ){ 3345 char *zErr = sqlite3_mprintf("illegal first argument to %s", zFunc); 3346 sqlite3_result_error(pContext, zErr, -1); 3347 sqlite3_free(zErr); 3348 return SQLITE_ERROR; 3349 } 3350 memcpy(&pRet, sqlite3_value_blob(pVal), sizeof(Fts3Cursor *)); 3351 *ppCsr = pRet; 3352 return SQLITE_OK; 3353 } 3354 3355 /* 3356 ** Implementation of the snippet() function for FTS3 3357 */ 3358 static void fts3SnippetFunc( 3359 sqlite3_context *pContext, /* SQLite function call context */ 3360 int nVal, /* Size of apVal[] array */ 3361 sqlite3_value **apVal /* Array of arguments */ 3362 ){ 3363 Fts3Cursor *pCsr; /* Cursor handle passed through apVal[0] */ 3364 const char *zStart = "<b>"; 3365 const char *zEnd = "</b>"; 3366 const char *zEllipsis = "<b>...</b>"; 3367 int iCol = -1; 3368 int nToken = 15; /* Default number of tokens in snippet */ 3369 3370 /* There must be at least one argument passed to this function (otherwise 3371 ** the non-overloaded version would have been called instead of this one). 3372 */ 3373 assert( nVal>=1 ); 3374 3375 if( nVal>6 ){ 3376 sqlite3_result_error(pContext, 3377 "wrong number of arguments to function snippet()", -1); 3378 return; 3379 } 3380 if( fts3FunctionArg(pContext, "snippet", apVal[0], &pCsr) ) return; 3381 3382 switch( nVal ){ 3383 case 6: nToken = sqlite3_value_int(apVal[5]); 3384 case 5: iCol = sqlite3_value_int(apVal[4]); 3385 case 4: zEllipsis = (const char*)sqlite3_value_text(apVal[3]); 3386 case 3: zEnd = (const char*)sqlite3_value_text(apVal[2]); 3387 case 2: zStart = (const char*)sqlite3_value_text(apVal[1]); 3388 } 3389 if( !zEllipsis || !zEnd || !zStart ){ 3390 sqlite3_result_error_nomem(pContext); 3391 }else if( SQLITE_OK==fts3CursorSeek(pContext, pCsr) ){ 3392 sqlite3Fts3Snippet(pContext, pCsr, zStart, zEnd, zEllipsis, iCol, nToken); 3393 } 3394 } 3395 3396 /* 3397 ** Implementation of the offsets() function for FTS3 3398 */ 3399 static void fts3OffsetsFunc( 3400 sqlite3_context *pContext, /* SQLite function call context */ 3401 int nVal, /* Size of argument array */ 3402 sqlite3_value **apVal /* Array of arguments */ 3403 ){ 3404 Fts3Cursor *pCsr; /* Cursor handle passed through apVal[0] */ 3405 3406 UNUSED_PARAMETER(nVal); 3407 3408 assert( nVal==1 ); 3409 if( fts3FunctionArg(pContext, "offsets", apVal[0], &pCsr) ) return; 3410 assert( pCsr ); 3411 if( SQLITE_OK==fts3CursorSeek(pContext, pCsr) ){ 3412 sqlite3Fts3Offsets(pContext, pCsr); 3413 } 3414 } 3415 3416 /* 3417 ** Implementation of the special optimize() function for FTS3. This 3418 ** function merges all segments in the database to a single segment. 3419 ** Example usage is: 3420 ** 3421 ** SELECT optimize(t) FROM t LIMIT 1; 3422 ** 3423 ** where 't' is the name of an FTS3 table. 3424 */ 3425 static void fts3OptimizeFunc( 3426 sqlite3_context *pContext, /* SQLite function call context */ 3427 int nVal, /* Size of argument array */ 3428 sqlite3_value **apVal /* Array of arguments */ 3429 ){ 3430 int rc; /* Return code */ 3431 Fts3Table *p; /* Virtual table handle */ 3432 Fts3Cursor *pCursor; /* Cursor handle passed through apVal[0] */ 3433 3434 UNUSED_PARAMETER(nVal); 3435 3436 assert( nVal==1 ); 3437 if( fts3FunctionArg(pContext, "optimize", apVal[0], &pCursor) ) return; 3438 p = (Fts3Table *)pCursor->base.pVtab; 3439 assert( p ); 3440 3441 rc = sqlite3Fts3Optimize(p); 3442 3443 switch( rc ){ 3444 case SQLITE_OK: 3445 sqlite3_result_text(pContext, "Index optimized", -1, SQLITE_STATIC); 3446 break; 3447 case SQLITE_DONE: 3448 sqlite3_result_text(pContext, "Index already optimal", -1, SQLITE_STATIC); 3449 break; 3450 default: 3451 sqlite3_result_error_code(pContext, rc); 3452 break; 3453 } 3454 } 3455 3456 /* 3457 ** Implementation of the matchinfo() function for FTS3 3458 */ 3459 static void fts3MatchinfoFunc( 3460 sqlite3_context *pContext, /* SQLite function call context */ 3461 int nVal, /* Size of argument array */ 3462 sqlite3_value **apVal /* Array of arguments */ 3463 ){ 3464 Fts3Cursor *pCsr; /* Cursor handle passed through apVal[0] */ 3465 assert( nVal==1 || nVal==2 ); 3466 if( SQLITE_OK==fts3FunctionArg(pContext, "matchinfo", apVal[0], &pCsr) ){ 3467 const char *zArg = 0; 3468 if( nVal>1 ){ 3469 zArg = (const char *)sqlite3_value_text(apVal[1]); 3470 } 3471 sqlite3Fts3Matchinfo(pContext, pCsr, zArg); 3472 } 3473 } 3474 3475 /* 3476 ** This routine implements the xFindFunction method for the FTS3 3477 ** virtual table. 3478 */ 3479 static int fts3FindFunctionMethod( 3480 sqlite3_vtab *pVtab, /* Virtual table handle */ 3481 int nArg, /* Number of SQL function arguments */ 3482 const char *zName, /* Name of SQL function */ 3483 void (**pxFunc)(sqlite3_context*,int,sqlite3_value**), /* OUT: Result */ 3484 void **ppArg /* Unused */ 3485 ){ 3486 struct Overloaded { 3487 const char *zName; 3488 void (*xFunc)(sqlite3_context*,int,sqlite3_value**); 3489 } aOverload[] = { 3490 { "snippet", fts3SnippetFunc }, 3491 { "offsets", fts3OffsetsFunc }, 3492 { "optimize", fts3OptimizeFunc }, 3493 { "matchinfo", fts3MatchinfoFunc }, 3494 }; 3495 int i; /* Iterator variable */ 3496 3497 UNUSED_PARAMETER(pVtab); 3498 UNUSED_PARAMETER(nArg); 3499 UNUSED_PARAMETER(ppArg); 3500 3501 for(i=0; i<SizeofArray(aOverload); i++){ 3502 if( strcmp(zName, aOverload[i].zName)==0 ){ 3503 *pxFunc = aOverload[i].xFunc; 3504 return 1; 3505 } 3506 } 3507 3508 /* No function of the specified name was found. Return 0. */ 3509 return 0; 3510 } 3511 3512 /* 3513 ** Implementation of FTS3 xRename method. Rename an fts3 table. 3514 */ 3515 static int fts3RenameMethod( 3516 sqlite3_vtab *pVtab, /* Virtual table handle */ 3517 const char *zName /* New name of table */ 3518 ){ 3519 Fts3Table *p = (Fts3Table *)pVtab; 3520 sqlite3 *db = p->db; /* Database connection */ 3521 int rc; /* Return Code */ 3522 3523 rc = sqlite3Fts3PendingTermsFlush(p); 3524 if( rc!=SQLITE_OK ){ 3525 return rc; 3526 } 3527 3528 fts3DbExec(&rc, db, 3529 "ALTER TABLE %Q.'%q_content' RENAME TO '%q_content';", 3530 p->zDb, p->zName, zName 3531 ); 3532 if( p->bHasDocsize ){ 3533 fts3DbExec(&rc, db, 3534 "ALTER TABLE %Q.'%q_docsize' RENAME TO '%q_docsize';", 3535 p->zDb, p->zName, zName 3536 ); 3537 } 3538 if( p->bHasStat ){ 3539 fts3DbExec(&rc, db, 3540 "ALTER TABLE %Q.'%q_stat' RENAME TO '%q_stat';", 3541 p->zDb, p->zName, zName 3542 ); 3543 } 3544 fts3DbExec(&rc, db, 3545 "ALTER TABLE %Q.'%q_segments' RENAME TO '%q_segments';", 3546 p->zDb, p->zName, zName 3547 ); 3548 fts3DbExec(&rc, db, 3549 "ALTER TABLE %Q.'%q_segdir' RENAME TO '%q_segdir';", 3550 p->zDb, p->zName, zName 3551 ); 3552 return rc; 3553 } 3554 3555 static const sqlite3_module fts3Module = { 3556 /* iVersion */ 0, 3557 /* xCreate */ fts3CreateMethod, 3558 /* xConnect */ fts3ConnectMethod, 3559 /* xBestIndex */ fts3BestIndexMethod, 3560 /* xDisconnect */ fts3DisconnectMethod, 3561 /* xDestroy */ fts3DestroyMethod, 3562 /* xOpen */ fts3OpenMethod, 3563 /* xClose */ fts3CloseMethod, 3564 /* xFilter */ fts3FilterMethod, 3565 /* xNext */ fts3NextMethod, 3566 /* xEof */ fts3EofMethod, 3567 /* xColumn */ fts3ColumnMethod, 3568 /* xRowid */ fts3RowidMethod, 3569 /* xUpdate */ fts3UpdateMethod, 3570 /* xBegin */ fts3BeginMethod, 3571 /* xSync */ fts3SyncMethod, 3572 /* xCommit */ fts3CommitMethod, 3573 /* xRollback */ fts3RollbackMethod, 3574 /* xFindFunction */ fts3FindFunctionMethod, 3575 /* xRename */ fts3RenameMethod, 3576 }; 3577 3578 /* 3579 ** This function is registered as the module destructor (called when an 3580 ** FTS3 enabled database connection is closed). It frees the memory 3581 ** allocated for the tokenizer hash table. 3582 */ 3583 static void hashDestroy(void *p){ 3584 Fts3Hash *pHash = (Fts3Hash *)p; 3585 sqlite3Fts3HashClear(pHash); 3586 sqlite3_free(pHash); 3587 } 3588 3589 /* 3590 ** The fts3 built-in tokenizers - "simple", "porter" and "icu"- are 3591 ** implemented in files fts3_tokenizer1.c, fts3_porter.c and fts3_icu.c 3592 ** respectively. The following three forward declarations are for functions 3593 ** declared in these files used to retrieve the respective implementations. 3594 ** 3595 ** Calling sqlite3Fts3SimpleTokenizerModule() sets the value pointed 3596 ** to by the argument to point to the "simple" tokenizer implementation. 3597 ** And so on. 3598 */ 3599 void sqlite3Fts3SimpleTokenizerModule(sqlite3_tokenizer_module const**ppModule); 3600 void sqlite3Fts3PorterTokenizerModule(sqlite3_tokenizer_module const**ppModule); 3601 #ifdef SQLITE_ENABLE_ICU 3602 void sqlite3Fts3IcuTokenizerModule(sqlite3_tokenizer_module const**ppModule); 3603 #endif 3604 3605 /* 3606 ** Initialise the fts3 extension. If this extension is built as part 3607 ** of the sqlite library, then this function is called directly by 3608 ** SQLite. If fts3 is built as a dynamically loadable extension, this 3609 ** function is called by the sqlite3_extension_init() entry point. 3610 */ 3611 int sqlite3Fts3Init(sqlite3 *db){ 3612 int rc = SQLITE_OK; 3613 Fts3Hash *pHash = 0; 3614 const sqlite3_tokenizer_module *pSimple = 0; 3615 const sqlite3_tokenizer_module *pPorter = 0; 3616 3617 #ifdef SQLITE_ENABLE_ICU 3618 const sqlite3_tokenizer_module *pIcu = 0; 3619 sqlite3Fts3IcuTokenizerModule(&pIcu); 3620 #endif 3621 3622 rc = sqlite3Fts3InitAux(db); 3623 if( rc!=SQLITE_OK ) return rc; 3624 3625 sqlite3Fts3SimpleTokenizerModule(&pSimple); 3626 sqlite3Fts3PorterTokenizerModule(&pPorter); 3627 3628 /* Allocate and initialise the hash-table used to store tokenizers. */ 3629 pHash = sqlite3_malloc(sizeof(Fts3Hash)); 3630 if( !pHash ){ 3631 rc = SQLITE_NOMEM; 3632 }else{ 3633 sqlite3Fts3HashInit(pHash, FTS3_HASH_STRING, 1); 3634 } 3635 3636 /* Load the built-in tokenizers into the hash table */ 3637 if( rc==SQLITE_OK ){ 3638 if( sqlite3Fts3HashInsert(pHash, "simple", 7, (void *)pSimple) 3639 || sqlite3Fts3HashInsert(pHash, "porter", 7, (void *)pPorter) 3640 #ifdef SQLITE_ENABLE_ICU 3641 || (pIcu && sqlite3Fts3HashInsert(pHash, "icu", 4, (void *)pIcu)) 3642 #endif 3643 ){ 3644 rc = SQLITE_NOMEM; 3645 } 3646 } 3647 3648 #ifdef SQLITE_TEST 3649 if( rc==SQLITE_OK ){ 3650 rc = sqlite3Fts3ExprInitTestInterface(db); 3651 } 3652 #endif 3653 3654 /* Create the virtual table wrapper around the hash-table and overload 3655 ** the two scalar functions. If this is successful, register the 3656 ** module with sqlite. 3657 */ 3658 if( SQLITE_OK==rc 3659 #if CHROMIUM_FTS3_CHANGES && !SQLITE_TEST 3660 /* fts3_tokenizer() disabled for security reasons. */ 3661 #else 3662 && SQLITE_OK==(rc = sqlite3Fts3InitHashTable(db, pHash, "fts3_tokenizer")) 3663 #endif 3664 && SQLITE_OK==(rc = sqlite3_overload_function(db, "snippet", -1)) 3665 && SQLITE_OK==(rc = sqlite3_overload_function(db, "offsets", 1)) 3666 && SQLITE_OK==(rc = sqlite3_overload_function(db, "matchinfo", 1)) 3667 && SQLITE_OK==(rc = sqlite3_overload_function(db, "matchinfo", 2)) 3668 && SQLITE_OK==(rc = sqlite3_overload_function(db, "optimize", 1)) 3669 ){ 3670 rc = sqlite3_create_module_v2( 3671 db, "fts3", &fts3Module, (void *)pHash, hashDestroy 3672 ); 3673 #if CHROMIUM_FTS3_CHANGES && !SQLITE_TEST 3674 /* Disable fts4 pending review. */ 3675 #else 3676 if( rc==SQLITE_OK ){ 3677 rc = sqlite3_create_module_v2( 3678 db, "fts4", &fts3Module, (void *)pHash, 0 3679 ); 3680 } 3681 #endif 3682 return rc; 3683 } 3684 3685 /* An error has occurred. Delete the hash table and return the error code. */ 3686 assert( rc!=SQLITE_OK ); 3687 if( pHash ){ 3688 sqlite3Fts3HashClear(pHash); 3689 sqlite3_free(pHash); 3690 } 3691 return rc; 3692 } 3693 3694 #if !SQLITE_CORE 3695 int sqlite3_extension_init( 3696 sqlite3 *db, 3697 char **pzErrMsg, 3698 const sqlite3_api_routines *pApi 3699 ){ 3700 SQLITE_EXTENSION_INIT2(pApi) 3701 return sqlite3Fts3Init(db); 3702 } 3703 #endif 3704 3705 #endif 3706