ndx
1.0.0
Lightweight full-text indexing and search library.
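This library was designed for a specific use case: all documents are stored on disk (IndexedDB), and documents can be dynamically added to or removed from the index.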
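Dynamic removal is usually kept cheap by only marking a document key as removed and purging it from the posting lists later. This is the approach the `remove()` comments in the example describe (`indexRemove()` marks, `indexVacuum()` purges). The following is a minimal sketch of that mark-then-vacuum idea under stated assumptions: an in-memory `Map` of term → `Map<key, termFrequency>` and whitespace tokenization. `add`, `remove`, `query` and `vacuum` are hypothetical helpers, not ndx's API or its internal layout.

```javascript
// Hypothetical in-memory inverted index: term -> Map<docKey, termFrequency>.
// This is NOT ndx's internal representation; it only illustrates lazy removal.
const index = new Map();
// Keys that are marked as removed but still referenced by posting lists.
const removed = new Set();

function add(key, text) {
  for (const term of text.toLowerCase().split(" ")) {
    let postings = index.get(term);
    if (!postings) index.set(term, (postings = new Map()));
    postings.set(key, (postings.get(key) ?? 0) + 1);
  }
}

// Removal is O(1): the key is only marked; posting lists still reference it.
function remove(key) {
  removed.add(key);
}

// Queries skip keys that are marked as removed.
function query(term) {
  const postings = index.get(term.toLowerCase()) ?? new Map();
  return [...postings.keys()].filter((key) => !removed.has(key));
}

// Vacuum walks every posting list once, drops references to removed keys,
// deletes terms whose posting lists became empty, then resets the marks.
function vacuum() {
  for (const [term, postings] of index) {
    for (const key of removed) postings.delete(key);
    if (postings.size === 0) index.delete(term);
  }
  removed.clear();
}

add("1", "Lorem ipsum dolor");
add("2", "Lorem ipsum");
remove("1");
console.log(query("lorem")); // [ '2' ]
vacuum();
console.log(index.has("dolor")); // false
```

Batching the purge (the example vacuums once more than 10 keys are marked) trades a little stale data in the index for avoiding a full walk of the posting lists on every removal.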
The query function supports only the disjunction operator. A query like `one two` works as `"one" or "two"`.
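To illustrate what "disjunction only" means, here is a hypothetical sketch (not ndx code) in which the posting sets of all query terms are unioned, so a document matches if it contains any of the terms. `buildIndex` and `disjunctiveQuery` are illustrative names; ndx additionally ranks the matching documents with BM25.

```javascript
// Hypothetical minimal inverted index: term -> Set of document keys.
// Not the ndx data structure; it only illustrates OR (disjunction) semantics.
function buildIndex(docs) {
  const index = new Map();
  for (const [key, text] of Object.entries(docs)) {
    for (const term of text.toLowerCase().split(" ")) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(key);
    }
  }
  return index;
}

// The query "one two" is split into terms and their posting sets are
// unioned: a document matches if it contains ANY of the query terms.
function disjunctiveQuery(index, query) {
  const result = new Set();
  for (const term of query.toLowerCase().split(" ")) {
    for (const key of index.get(term) ?? []) result.add(key);
  }
  return result;
}

const idx = buildIndex({ a: "one apple", b: "two pears", c: "three plums" });
console.log([...disjunctiveQuery(idx, "one two")].sort()); // [ 'a', 'b' ]
```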
The inverted index doesn't store term positions, so the query function can't search for phrases like "Super Mario".
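A hypothetical sketch of why positions matter: when postings only record which documents contain a term (and how often), two documents with the same words in a different order look identical to the index. `containsAllTerms` is an illustrative AND lookup that ndx itself does not offer; even this stricter lookup cannot tell the phrase apart from the same words scattered through a document.

```javascript
// Hypothetical position-free inverted index: term -> Map<docKey, tf>.
// Only term frequencies are kept, never the offsets of the terms.
function buildIndex(docs) {
  const index = new Map();
  for (const [key, text] of Object.entries(docs)) {
    for (const term of text.toLowerCase().split(" ")) {
      let postings = index.get(term);
      if (!postings) index.set(term, (postings = new Map()));
      postings.set(key, (postings.get(key) ?? 0) + 1);
    }
  }
  return index;
}

const index = buildIndex({
  a: "Super Mario Bros",
  b: "Mario is super",
});

// Hypothetical AND lookup (not an ndx feature): documents containing every
// query term, in any order. Without positions this is the closest
// approximation of a phrase query.
function containsAllTerms(index, query) {
  const terms = query.toLowerCase().split(" ");
  const [first, ...rest] = terms.map((t) => index.get(t) ?? new Map());
  return [...first.keys()].filter((key) => rest.every((p) => p.has(key)));
}

// "b" is a false positive for the phrase "Super Mario", but the index has
// no information that could rule it out.
console.log(containsAllTerms(index, "Super Mario")); // [ 'a', 'b' ]
```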
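There are many alternative solutions with different trade-offs that may better suit your particular use case. For a simple document search over a static dataset, I would recommend using something like FST and deploying it as an edge function (WASM).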
import { createIndex, indexAdd, indexRemove, indexVacuum } from "ndx";
import { indexQuery } from "ndx/query";
const termFilter = (term) => term.toLowerCase();
function createDocumentIndex(fields) {
  // `createIndex()` creates an index data structure.
  // The first argument specifies how many different fields we want to index.
  const index = createIndex(
    fields.length,
    // A tokenizer is a function that breaks text into words, phrases, symbols,
    // or other meaningful elements called tokens.
    (s) => s.split(" "),
    // A filter is a function that processes tokens and returns terms. Terms
    // are used in the inverted index to index documents.
    termFilter,
  );
  // `removed` is the set of document keys that are marked as removed but
  // have not been vacuumed from the index yet.
  const removed = new Set();
  // `fieldGetters` is an array with functions that will be used to retrieve
  // data from different fields.
  const fieldGetters = fields.map((f) => (doc) => doc[f.name]);
  // `fieldBoostFactors` is an array of boost factors for each field. In this
  // example all fields have identical weight.
  const fieldBoostFactors = fields.map(() => 1);
  return {
    index,
    // `add()` will add documents to the index.
    add(doc) {
      indexAdd(
        index,
        fieldGetters,
        // Document key. It can be a unique document id or a reference to the
        // document if you want to store all documents in memory.
        doc.id,
        // Document.
        doc,
      );
    },
    // `remove()` will remove documents from the index.
    remove(id) {
      // When a document is removed, its id is only marked as removed. The
      // index data structure still contains references to the removed
      // document.
      indexRemove(index, removed, id);
      if (removed.size > 10) {
        // `indexVacuum()` removes all references to removed documents from
        // the index.
        indexVacuum(index, removed);
      }
    },
    // `search()` will be used to perform queries.
    search(q) {
      return indexQuery(
        index,
        fieldBoostFactors,
        // BM25 ranking function constants:
        // BM25 k1 constant, controls non-linear term frequency normalization
        // (saturation).
        1.2,
        // BM25 b constant, controls to what degree document length normalizes
        // tf values.
        0.75,
        q,
      );
    },
  };
}
// Create a document index that will index the `content` field.
const index = createDocumentIndex([{ name: "content" }]);
const docs = [
  {
    "id": "1",
    "content": "Lorem ipsum dolor",
  },
  {
    "id": "2",
    "content": "Lorem ipsum",
  },
];
// Add documents to the index.
docs.forEach((d) => { index.add(d); });
// Perform a search query.
index.search("Lorem");
// => [{ key: "2", score: ... }, { key: "1", score: ... }]
//
// The document with id `"2"` is ranked higher because its `"content"` field
// has fewer terms than the document with id `"1"`, and BM25 length
// normalization (the `b` constant) favors shorter fields.
index.search("dolor");
// => [{ key: "1", score: ... }]
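To make the ranking explanation concrete, here is a minimal single-field BM25 sketch that reuses the `k1 = 1.2` and `b = 0.75` constants from the example. It assumes the Lucene-style IDF ln(1 + (N - n + 0.5) / (n + 0.5)) and whitespace tokenization; ndx's exact IDF variant and per-field boosting may differ, so the absolute scores will not match `indexQuery()`. Only the ordering logic is illustrated. `bm25Search` is a hypothetical helper.

```javascript
// Hypothetical single-field BM25 ranking; not the ndx implementation.
// IDF variant (an assumption): ln(1 + (N - n + 0.5) / (n + 0.5)).
function bm25Search(docs, query, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map((d) => ({
    key: d.id,
    terms: d.content.toLowerCase().split(" "),
  }));
  const N = tokenized.length;
  // Average document length, used for length normalization.
  const avgdl = tokenized.reduce((sum, d) => sum + d.terms.length, 0) / N;
  const queryTerms = query.toLowerCase().split(" ");

  const results = [];
  for (const doc of tokenized) {
    let score = 0;
    for (const q of queryTerms) {
      const tf = doc.terms.filter((t) => t === q).length;
      if (tf === 0) continue;
      // n = number of documents that contain the term.
      const n = tokenized.filter((d) => d.terms.includes(q)).length;
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // b controls how strongly |D| / avgdl dampens tf;
      // k1 controls how quickly repeated occurrences saturate.
      const norm = 1 - b + b * (doc.terms.length / avgdl);
      score += (idf * (tf * (k1 + 1))) / (tf + k1 * norm);
    }
    if (score > 0) results.push({ key: doc.key, score });
  }
  // Highest score first.
  return results.sort((x, y) => y.score - x.score);
}

const docs = [
  { id: "1", content: "Lorem ipsum dolor" },
  { id: "2", content: "Lorem ipsum" },
];
console.log(bm25Search(docs, "Lorem").map((r) => r.key)); // [ '2', '1' ]
console.log(bm25Search(docs, "dolor").map((r) => r.key)); // [ '1' ]
```

Both documents contain "lorem" once with the same IDF, so the only difference is the length factor: document "2" has 2 terms against an average of 2.5, which shrinks the denominator and raises its score.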
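The ndx library doesn't provide any tokenizers or filters. Other libraries implement tokenizers; for example, Natural has a good collection of tokenizers and stemmers.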
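Because the tokenizer and filter are plain functions (the example passes `(s) => s.split(" ")` and `termFilter` to `createIndex()`), simple ones can also be written by hand. A hypothetical sketch, assuming a runtime with ES2018 Unicode property escapes in regular expressions; it does no stemming, which is where a library like Natural helps. `tokenizer` and `termFilter` here are illustrative, not part of ndx.

```javascript
// Hypothetical tokenizer: splits on any run of characters that are not
// Unicode letters or digits and drops empty strings, so punctuation and
// repeated spaces don't produce tokens like "dolor," or "".
const tokenizer = (s) => s.split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);

// Hypothetical filter: decomposes accented characters (NFD), strips the
// combining marks and lowercases, so "Café" and "cafe" map to the same term.
const termFilter = (term) =>
  term.normalize("NFD").replace(/\p{M}/gu, "").toLowerCase();

const terms = tokenizer("Lorem ipsum, dolor!  Café...").map(termFilter);
console.log(terms); // [ 'lorem', 'ipsum', 'dolor', 'cafe' ]
```

With the example's `s.split(" ")` tokenizer, the same input would yield terms such as "ipsum," and "" that never match a clean query term.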
License: MIT