Version 1.0.0

Installation | Overview | Tokenization | Stemming | Stop Words | Search Index | Index Strategy

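Js Search: client-side search library

Js Search enables efficient client-side searches of JavaScript and JSON objects. It is ES5 compatible and does not require jQuery or any other third-party libraries.

Js Search began as a lightweight implementation of Lunr JS, offering runtime performance improvements and a smaller file size. It has since expanded to include a rich feature set, supporting stemming, stop words, and TF-IDF ranking.

Here are some JS Perf benchmarks comparing the two search libraries. (Thanks to olivernn for tuning the Lunr side for a better comparison!)

Initial building of search index
Running a search

If you are looking for a simpler, web-worker optimized JS search utility, check out js-worker-search.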

Installation

You can install using either Bower or NPM like so:

npm install js-search
bower install js-search

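Overview

At a high level, you configure Js Search by telling it which fields to index for searching, and then you add the objects to be searched.

For example, a simple use of Js Search looks like this: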

import * as JsSearch from 'js-search';

var theGreatGatsby = {
  isbn: '9781597226769',
  title: 'The Great Gatsby',
  author: {
    name: 'F. Scott Fitzgerald'
  },
  tags: ['book', 'inspirational']
};
var theDaVinciCode = {
  isbn: '0307474275',
  title: 'The DaVinci Code',
  author: {
    name: 'Dan Brown'
  },
  tags: ['book', 'mystery']
};
var angelsAndDemons = {
  isbn: '074349346X',
  title: 'Angels & Demons',
  author: {
    name: 'Dan Brown'
  },
  tags: ['book', 'mystery']
};

// 'isbn' is the field that uniquely identifies each document
var search = new JsSearch.Search('isbn');
search.addIndex('title');
search.addIndex(['author', 'name']); // nested field: author.name
search.addIndex('tags');

search.addDocuments([theGreatGatsby, theDaVinciCode, angelsAndDemons]);

search.search('The');     // [theGreatGatsby, theDaVinciCode]
search.search('scott');   // [theGreatGatsby]
search.search('dan');     // [angelsAndDemons, theDaVinciCode]
search.search('mystery'); // [angelsAndDemons, theDaVinciCode]
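
The array form passed to addIndex above points at a nested field. As a rough illustration of what such a path means, here is a minimal sketch of a nested lookup. The helper name getNestedValue is hypothetical and this is not the library's internal code, only an assumption about how a path like ['author', 'name'] maps onto a document.

```javascript
// Hypothetical helper: follows a field path into a document.
// A plain string is treated as a one-element path.
function getNestedValue(doc, path) {
  var keys = Array.isArray(path) ? path : [path];
  var value = doc;
  for (var i = 0; i < keys.length && value != null; i++) {
    value = value[keys[i]];
  }
  return value;
}

var book = {
  title: 'The Great Gatsby',
  author: { name: 'F. Scott Fitzgerald' }
};

console.log(getNestedValue(book, 'title'));               // The Great Gatsby
console.log(getNestedValue(book, ['author', 'name']));    // F. Scott Fitzgerald
console.log(getNestedValue(book, ['publisher', 'name'])); // undefined
```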

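Tokenization

Tokenization is the process of breaking text (e.g. sentences) into smaller, searchable tokens (e.g. words or parts of words). Js Search provides a basic tokenizer that should work well for English, but you can provide your own like so: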

search.tokenizer = {
  tokenize(text /* string */) {
    // Convert text to an Array of strings and return the Array
  }
};
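
As one possible way to fill in that template, the sketch below splits text on anything that is not a letter, a digit, or an apostrophe. The name wordTokenizer and the splitting rule are assumptions chosen for illustration; only the tokenize(text) shape returning an Array of strings comes from the text above.

```javascript
// Hypothetical tokenizer: splits on runs of characters that are not
// letters, digits, or apostrophes, and drops empty tokens.
var wordTokenizer = {
  tokenize: function (text) {
    return text
      .split(/[^A-Za-z0-9']+/)
      .filter(function (token) { return token.length > 0; });
  }
};

console.log(wordTokenizer.tokenize('Angels & Demons, by Dan Brown'));
// [ 'Angels', 'Demons', 'by', 'Dan', 'Brown' ]
```

An object like this could then be assigned to search.tokenizer in the same way as the template.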

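Stemming

Stemming is the process of reducing search tokens to their root (or "stem") so that searches for different forms of a word still yield results. For example, "search", "searching", and "searched" can all be reduced to the stem "search".

Js Search does not implement its own stemming library, but it does support stemming through the use of third-party libraries.

To enable stemming, use the StemmingTokenizer like so: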

var stemmer = require('porter-stemmer').stemmer;

search.tokenizer =
  new JsSearch.StemmingTokenizer(
    stemmer, // Function should accept a string param and return a string
    new JsSearch.SimpleTokenizer());
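
To make the idea of wrapping one tokenizer in another concrete, here is a minimal stand-alone sketch. Everything in it is hypothetical: toyStemmer only strips a trailing "ing" or "ed" and is no substitute for a real stemmer such as porter-stemmer, and makeStemmingTokenizer is an illustrative stand-in, not the library's StemmingTokenizer.

```javascript
// Toy stemmer, for illustration only: strips a trailing "ing" or "ed".
function toyStemmer(word) {
  return word.replace(/(ing|ed)$/, '');
}

// Hypothetical inner tokenizer: lower-cases and splits on whitespace.
var whitespaceTokenizer = {
  tokenize: function (text) {
    return text.toLowerCase().split(/\s+/).filter(Boolean);
  }
};

// Hypothetical decorator: tokenizes first, then stems every token.
function makeStemmingTokenizer(stemmer, innerTokenizer) {
  return {
    tokenize: function (text) {
      return innerTokenizer.tokenize(text).map(function (token) {
        return stemmer(token);
      });
    }
  };
}

var stemmingTokenizer = makeStemmingTokenizer(toyStemmer, whitespaceTokenizer);
console.log(stemmingTokenizer.tokenize('search searching searched'));
// [ 'search', 'search', 'search' ]
```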

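Stop Words

Stop words are very common words (e.g. a, an, and, the, of) that usually carry little semantic meaning. By default Js Search does not filter these words, but filtering can be enabled by using the StopWordsTokenizer like so: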

search.tokenizer =
  new JsSearch.StopWordsTokenizer(
    new JsSearch.SimpleTokenizer());

By default Js Search uses a slightly modified version of the Google History stop words listed at www.ranks.nl/stopwords. You can modify this list by adding or removing values on the JsSearch.StopWordsMap object like so:

JsSearch.StopWordsMap.the = false; // Do not treat "the" as a stop word
JsSearch.StopWordsMap.bob = true;  // Treat "bob" as a stop word

Note that stop words are lower case, so using a case-sensitive sanitizer may prevent some stop words from being removed.
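
The case-sensitivity caveat is easy to see in a small stand-alone sketch. The stopWords map and removeStopWords function below are hypothetical; they only mirror the true/false shape of JsSearch.StopWordsMap and are not the library's filtering code.

```javascript
// Hypothetical stop-word map with lower-case keys only.
var stopWords = { a: true, an: true, and: true, the: true, of: true };

// Keeps every token that is not flagged as a stop word.
function removeStopWords(tokens) {
  return tokens.filter(function (token) {
    return stopWords[token] !== true;
  });
}

var tokens = ['The', 'Code', 'of', 'the', 'Da', 'Vinci'];

// Case preserved: "The" does not match the lower-case key "the".
console.log(removeStopWords(tokens));
// [ 'The', 'Code', 'Da', 'Vinci' ]

// Lower-cased first: every stop word is removed.
var lowered = tokens.map(function (token) { return token.toLowerCase(); });
console.log(removeStopWords(lowered));
// [ 'code', 'da', 'vinci' ]
```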

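Configuring the search index

There are two search indices packaged with js-search.

Term frequency-inverse document frequency (or TF-IDF) is a numerical statistic intended to reflect how important a word (or words) is to a document within a corpus. The TF-IDF value increases proportionally to the number of times a word appears in the document, but is offset by how frequently the word appears across the corpus. This helps to adjust for the fact that some words (e.g. and, or, the) appear more frequently than others.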
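
The description above can be worked through with a small sketch. It assumes the textbook formulation, where the term frequency is a raw count and the inverse document frequency is ln(N / df); the exact weighting used inside Js Search may differ, and the corpus and function names here are made up for illustration.

```javascript
// Tiny corpus of already-tokenized documents (illustrative data).
var corpus = [
  ['the', 'great', 'gatsby'],
  ['the', 'davinci', 'code'],
  ['angels', 'and', 'demons']
];

// Number of times the term occurs in one document.
function termFrequency(term, doc) {
  return doc.filter(function (token) { return token === term; }).length;
}

// ln(N / df), where df is the number of documents containing the term.
function inverseDocumentFrequency(term, docs) {
  var containing = docs.filter(function (doc) {
    return doc.indexOf(term) !== -1;
  }).length;
  return containing === 0 ? 0 : Math.log(docs.length / containing);
}

function tfIdf(term, doc, docs) {
  return termFrequency(term, doc) * inverseDocumentFrequency(term, docs);
}

console.log(tfIdf('the', corpus[0], corpus));    // 1 * ln(3 / 2), about 0.405
console.log(tfIdf('gatsby', corpus[0], corpus)); // 1 * ln(3 / 1), about 1.099
```

Under these assumptions "gatsby" outweighs "the" in the first document because it appears in fewer documents of the corpus.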

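By default Js Search ranks results with TF-IDF, but this can be disabled for performance reasons if it is not required. To disable TF-IDF, specify an alternate ISearchIndex implementation like so: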

// default
search.searchIndex = new JsSearch.TfIdfSearchIndex();

// Search index capable of returning results matching a set of tokens
// but without any meaningful rank or order.
search.searchIndex = new JsSearch.UnorderedSearchIndex();

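Configuring the index strategy

There are three index strategies packaged with js-search.

PrefixIndexStrategy: indexes for prefix searches (e.g. the term "cat" is indexed as "c", "ca", and "cat", allowing prefix lookups).

AllSubstringsIndexStrategy: indexes for all substrings. In other words "c", "ca", "cat", "a", "at", and "t" all match "cat".

ExactWordIndexStrategy: indexes for exact word matches. For example "bob" will match "bob jones" (but "bo" will not).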
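
The three strategies differ only in which strings they derive from each token. The sketch below reproduces the expansions described above with hypothetical helper functions (prefixes, allSubstrings, exactWord); it is an illustration of the idea, not the library's implementation.

```javascript
// Every leading slice of the token: "cat" -> c, ca, cat.
function prefixes(token) {
  var result = [];
  for (var end = 1; end <= token.length; end++) {
    result.push(token.substring(0, end));
  }
  return result;
}

// Every distinct contiguous slice of the token.
function allSubstrings(token) {
  var result = [];
  for (var start = 0; start < token.length; start++) {
    for (var end = start + 1; end <= token.length; end++) {
      var piece = token.substring(start, end);
      if (result.indexOf(piece) === -1) {
        result.push(piece);
      }
    }
  }
  return result;
}

// Only the whole token.
function exactWord(token) {
  return [token];
}

console.log(prefixes('cat'));      // [ 'c', 'ca', 'cat' ]
console.log(allSubstrings('cat')); // [ 'c', 'ca', 'cat', 'a', 'at', 't' ]
console.log(exactWord('bob'));     // [ 'bob' ]
```

In this sketch a token of length n yields n prefixes but up to n(n+1)/2 substrings, which suggests why broader matching comes with a larger index.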

By default Js Search uses prefix indexing, but this is configurable. To replace prefix indexing, specify an alternate IIndexStrategy implementation like so:

// default
search.indexStrategy = new JsSearch.PrefixIndexStrategy();

// this index strategy is built for all substrings matches.
search.indexStrategy = new JsSearch.AllSubstringsIndexStrategy();

// this index strategy is built for exact word matches.
search.indexStrategy = new JsSearch.ExactWordIndexStrategy();