pg_search

Description

PgSearch builds named scopes that take advantage of PostgreSQL's full text search.

Read the blog post introducing PgSearch at https://tanzu.vmware.com/content/blog/pg-search-how-i-learned-to-stop-worrying-and-love-postgresql-full-text-search

Requirements

Ruby 3.1+
Active Record 7.0+
PostgreSQL 9.2+
PostgreSQL extensions for certain features

Install

$ gem install pg_search

Or add this line to your Gemfile:

gem 'pg_search'

Non-Rails projects

In addition to installing and requiring the gem, you may want to include the PgSearch rake tasks in your Rakefile. This isn't necessary for Rails projects, which gain the Rake tasks via a Railtie.

load "pg_search/tasks.rb"

Usage

To add PgSearch to an Active Record model, simply include the PgSearch::Model module.

class Shape < ActiveRecord::Base
  include PgSearch::Model
end

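Contents

Multi-search vs. search scopes
Multi-search
- Setup
- multisearchable
- More options
- Multi-search associations
- Searching in the global search index
- Chaining method calls onto the results
- Configuring multi-search
- Rebuilding search documents for a given class
- Disabling multi-search indexing temporarily
pg_search_scope
- Searching against one column
- Searching against multiple columns
- Dynamic search scopes
- Searching through associations
Searching using different search features
- :tsearch (Full Text Search)
  - Weighting
  - :prefix (PostgreSQL 8.4 and newer only)
  - :negation
  - :dictionary
  - :normalization
  - :any_word
  - :sort_only
  - :highlight
- :dmetaphone (Double Metaphone soundalike search)
- :trigram (Trigram search)
  - :threshold
  - :word_similarity
Limiting fields when combining features
Ignoring accent marks
Using tsvector columns
- Combining multiple tsvectors
Configuring ranking and ordering
- :ranked_by (Choosing a ranking algorithm)
- :order_within_rank (Breaking ties)
- PgSearch#pg_search_rank (Reading a record's rank as a Float)
- Search rank and chained scopes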

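Multi-search vs. search scopes

pg_search supports two different techniques for searching: multi-search and search scopes.

The first technique is multi-search, in which records of many different Active Record classes can be mixed together into one global search index across your entire application. Most sites that want to support a generic search page will want to use this feature.

The other technique is search scopes, which allow you to do more advanced searching against only one Active Record class. This is more useful for building things like autocompleters or filtering a list of items in a faceted search.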

Multi-search

Setup

Before using multi-search, you must generate and run a migration to create the pg_search_documents database table.

$ rails g pg_search:migration:multisearch
$ rake db:migrate

multisearchable

To add a model to the global search index for your application, call multisearchable in its class definition.

class EpicPoem < ActiveRecord::Base
  include PgSearch::Model
  multisearchable against: [:title, :author]
end

class Flower < ActiveRecord::Base
  include PgSearch::Model
  multisearchable against: :color
end

If this model already has existing records, you will need to reindex it to get those records into the pg_search_documents table. See the rebuild task below.

Whenever a record is created, updated, or destroyed, an Active Record callback fires and creates, updates, or removes the corresponding PgSearch::Document record in the pg_search_documents table. The :against option can be one or several methods which will be called on the record to generate its search text.

You can also pass a Proc or method name to call to determine whether or not a particular record should be included.

class Convertible < ActiveRecord::Base
  include PgSearch::Model
  multisearchable against: [:make, :model],
                  if: :available_in_red?
end

class Jalopy < ActiveRecord::Base
  include PgSearch::Model
  multisearchable against: [:make, :model],
                  if: lambda { |record| record.model_year > 1970 }
end

Note that the Proc or method name is called in an after_save hook. This means that you should be careful when using Time or other objects whose value changes between saves. In the following example, if the record was last saved before the published_at timestamp, it won't get listed in global search at all until it is touched again after the timestamp.

class AntipatternExample < ActiveRecord::Base
  include PgSearch::Model
  multisearchable against: [:contents],
                  if: :published?

  def published?
    published_at < Time.now
  end
end

problematic_record = AntipatternExample.create!(
  contents: "Using :if with a timestamp",
  published_at: 10.minutes.from_now
)

problematic_record.published?     # => false
PgSearch.multisearch("timestamp") # => No results

sleep 20.minutes

problematic_record.published?     # => true
PgSearch.multisearch("timestamp") # => No results

problematic_record.save!

problematic_record.published?     # => true
PgSearch.multisearch("timestamp") # => Includes problematic_record
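
The same pitfall can be reproduced without a database. The following is a minimal sketch in plain Ruby, assuming a hypothetical in-memory INDEX hash in place of the pg_search_documents table and a hypothetical Post struct whose save! takes the current time as an argument. None of these names come from pg_search; they only mimic the "condition is checked at save time" behavior described above.

```ruby
# In-memory stand-in for the pg_search_documents table (hypothetical).
INDEX = {}

Post = Struct.new(:id, :contents, :published_at) do
  def published?(now)
    published_at < now
  end

  # Mimics the after_save hook: the :if condition is evaluated only when saving.
  def save!(now)
    if published?(now)
      INDEX[id] = contents
    else
      INDEX.delete(id)
    end
    self
  end
end

start = Time.at(0)
post = Post.new(1, "Using :if with a timestamp", start + 600).save!(start)

puts INDEX.key?(1)                 # false: unpublished when it was saved
puts post.published?(start + 1200) # true: time has passed...
puts INDEX.key?(1)                 # false: ...but nothing re-ran the hook
post.save!(start + 1200)
puts INDEX.key?(1)                 # true: re-saving re-evaluates the condition
```

In a real application, one way to avoid this is to index the record unconditionally and filter on the timestamp at query time instead.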

More options

Conditionally update pg_search_documents

You can also use the :update_if option to pass a Proc or method name to call to determine whether or not a particular record should be updated.

Note that the Proc or method name is called in an after_save hook, so if you are relying on Active Record dirty flags, use *_previously_changed?.

class Message < ActiveRecord::Base
  include PgSearch::Model
  multisearchable against: [:body],
                  update_if: :body_previously_changed?
end

Specify additional attributes to be saved on the pg_search_documents table

You can specify :additional_attributes to be saved within the pg_search_documents table. For example, perhaps you are indexing a book model and an article model and want to include the author_id.

First, add a reference to author in the migration that creates the pg_search_documents table.

create_table :pg_search_documents do |t|
  t.text :content
  t.references :author, index: true
  t.belongs_to :searchable, polymorphic: true, index: true
  t.timestamps null: false
end

Then, pass the additional attribute in a lambda.

multisearchable(
  against: [:title, :body],
  additional_attributes: ->(article) { { author_id: article.author_id } }
)

This allows much faster searches without joins later on, by doing something like:

PgSearch.multisearch(params['search']).where(author_id: 2)

NOTE: You must currently call record.update_pg_search_document manually for the additional attribute to be included in the pg_search_documents table.

Multi-search associations

Two associations are built automatically. On the original record, there is a has_one :pg_search_document association pointing to the PgSearch::Document record, and on the PgSearch::Document record there is a belongs_to :searchable polymorphic association pointing back to the original record.

odyssey = EpicPoem.create!(title: "Odyssey", author: "Homer")
search_document = odyssey.pg_search_document #=> PgSearch::Document instance
search_document.searchable #=> #<EpicPoem id: 1, title: "Odyssey", author: "Homer">

Searching in the global search index

To fetch the PgSearch::Document entries for all of the records that match a given query, use PgSearch.multisearch.

odyssey = EpicPoem.create!(title: "Odyssey", author: "Homer")
rose = Flower.create!(color: "Red")
PgSearch.multisearch("Homer") #=> [#<PgSearch::Document searchable: odyssey>]
PgSearch.multisearch("Red") #=> [#<PgSearch::Document searchable: rose>]

Chaining method calls onto the results

PgSearch.multisearch returns an ActiveRecord::Relation, just like scopes do, so you can chain scope calls onto the end. This works with gems like Kaminari that add scope methods. Just like with regular scopes, the database will only receive SQL requests when necessary.

PgSearch.multisearch("Bertha").limit(10)
PgSearch.multisearch("Juggler").where(searchable_type: "Occupation")
PgSearch.multisearch("Alamo").page(3).per(30)
PgSearch.multisearch("Diagonal").find_each do |document|
  puts document.searchable.updated_at
end
PgSearch.multisearch("Moro").reorder("").group(:searchable_type).count(:all)
PgSearch.multisearch("Square").includes(:searchable)

Configuring multi-search

PgSearch.multisearch can be configured using the same options as pg_search_scope (explained in more detail below). Just set PgSearch.multisearch_options in an initializer:

PgSearch.multisearch_options = {
  using: [:tsearch, :trigram],
  ignoring: :accents
}

Rebuilding search documents for a given class

If you change the :against option on a class, add multisearchable to a class that already has records in the database, or remove multisearchable from a class in order to remove it from the index, the pg_search_documents table can become out of sync with the actual records in your other tables.

The index can also become out of sync if you ever modify records in a way that does not trigger Active Record callbacks. For instance, the #update_column instance method and the .update_all class method both skip callbacks and directly modify the database.

To remove all of the documents for a given class, you can simply delete all of the PgSearch::Document records.

PgSearch::Document.delete_by(searchable_type: "Animal")

To regenerate the documents for a given class, run:

PgSearch::Multisearch.rebuild(Product)

The rebuild method deletes all the documents for the given class before regenerating them. In some situations this may not be desirable, such as when you're using single-table inheritance and searchable_type is your base class. You can prevent rebuild from deleting your records like so:

PgSearch::Multisearch.rebuild(Product, clean_up: false)

rebuild runs inside a single transaction. To run outside of a transaction, you can pass transactional: false like so:

PgSearch::Multisearch.rebuild(Product, transactional: false)

Rebuild is also available as a Rake task, for convenience.

$ rake pg_search:multisearch:rebuild[BlogPost]

A second optional argument can be passed to specify the PostgreSQL schema search path to use, for multi-tenant databases that have multiple pg_search_documents tables. The following sets the schema search path to "my_schema" before reindexing.

$ rake pg_search:multisearch:rebuild[BlogPost,my_schema]

For models that are multisearchable :against methods that map directly to Active Record attributes, an efficient single SQL statement is run to update the pg_search_documents table all at once. However, if you call any dynamic methods in :against, then update_pg_search_document is called on the individual records, which are indexed in batches.

You can also provide a custom implementation for rebuilding the documents by adding a class method called rebuild_pg_search_documents to your model.

class Movie < ActiveRecord::Base
  include PgSearch::Model

  belongs_to :director

  def director_name
    director.name
  end

  multisearchable against: [:name, :director_name]

  # Naive approach (define only one of the two methods shown here)
  def self.rebuild_pg_search_documents
    find_each { |record| record.update_pg_search_document }
  end

  # More sophisticated approach
  def self.rebuild_pg_search_documents
    connection.execute <<~SQL.squish
      INSERT INTO pg_search_documents (searchable_type, searchable_id, content, created_at, updated_at)
        SELECT 'Movie' AS searchable_type,
               movies.id AS searchable_id,
               CONCAT_WS(' ', movies.name, directors.name) AS content,
               now() AS created_at,
               now() AS updated_at
        FROM movies
        LEFT JOIN directors
          ON directors.id = movies.director_id
    SQL
  end
end

Note: If using PostgreSQL before 9.1, replace the CONCAT_WS() function call with double-pipe concatenation, e.g. (movies.name || ' ' || directors.name). However, be aware that if any of the joined values is NULL then the final content value will also be NULL, whereas CONCAT_WS() selectively ignores NULL values.
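
The difference in NULL handling between the two SQL forms can be illustrated with a plain Ruby analogue, where nil plays the role of NULL. This is only a sketch; concat_ws and pipe_concat are hypothetical helper names invented for the illustration and do not exist in pg_search.

```ruby
# Ruby analogue of SQL CONCAT_WS: nil (NULL) arguments are skipped.
def concat_ws(separator, *values)
  values.compact.join(separator)
end

# Ruby analogue of the SQL || operator: any nil (NULL) operand makes the result nil.
def pipe_concat(*values)
  values.include?(nil) ? nil : values.join
end

p concat_ws(" ", "Jaws", nil)           # the movie name survives a missing director
p pipe_concat("Jaws", " ", nil)         # the whole content is lost
p pipe_concat("Jaws", " ", "Spielberg") # both forms agree when nothing is missing
```

With the double-pipe form, wrapping each operand in COALESCE(column, '') is the usual way to get behavior closer to CONCAT_WS().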

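Disabling multi-search indexing temporarily

If you have a large bulk operation to perform, such as importing a lot of records from an external source, you might want to speed things up by turning off indexing temporarily. You could then use one of the techniques above to rebuild the search documents off-line.

PgSearch.disable_multisearch do
  Movie.import_from_xml_file(File.open("movies.xml"))
end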

pg_search_scope

To build a search scope, use pg_search_scope. The first parameter is a scope name, and the second parameter is an options hash. The only required option is :against, which tells pg_search_scope which column or columns to search against.

Searching against one column

To search against a column, pass a symbol as the :against option.

class BlogPost < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_by_title, against: :title
end

We now have an ActiveRecord scope named search_by_title on our BlogPost model. It takes one parameter, a search query string.

BlogPost.create!(title: "Recent Developments in the World of Pastrami")
BlogPost.create!(title: "Prosciutto and You: A Retrospective")
BlogPost.search_by_title("pastrami") # => [#<BlogPost id: 1, title: "Recent Developments in the World of Pastrami">]

Searching against multiple columns

To search against more than one column, pass an Array.

class Person < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_by_full_name, against: [:first_name, :last_name]
end

Now our search query can match either or both of the columns.

person_1 = Person.create!(first_name: "Grant", last_name: "Hill")
person_2 = Person.create!(first_name: "Hugh", last_name: "Grant")

Person.search_by_full_name("Grant") # => [person_1, person_2]
Person.search_by_full_name("Grant Hill") # => [person_1]

Dynamic search scopes

Just like with Active Record named scopes, you can pass in a Proc object that returns a hash of options. For instance, the following scope takes a parameter that dynamically chooses which column to search against.

Important: The returned hash must include a :query key. Its value does not necessarily need to be dynamic. You could choose to hard-code it to a specific value if you wanted.

class Person < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_by_name, lambda { |name_part, query|
    raise ArgumentError unless [:first, :last].include?(name_part)
    {
      # Map :first / :last onto the first_name / last_name columns
      against: :"#{name_part}_name",
      query: query
    }
  }
end

person_1 = Person.create!(first_name: "Grant", last_name: "Hill")
person_2 = Person.create!(first_name: "Hugh", last_name: "Grant")

Person.search_by_name :first, "Grant" # => [person_1]
Person.search_by_name :last, "Grant" # => [person_2]

Searching through associations

It is possible to search columns on associated models. Note that if you do this, it will be impossible to speed up searches with database indexes. However, it is supported as a quick way to try out cross-model searching.

You can pass a Hash into the :associated_against option to set up searching through associations. The keys are the names of the associations and the values work just like an :against option for the other model. Right now, searching deeper than one association away is not supported. You can work around this by setting up a series of :through associations to point all the way through.

class Cracker < ActiveRecord::Base
  has_many :cheeses
  has_one :salami # needed for the `salami:` argument used below
end

class Cheese < ActiveRecord::Base
end

class Salami < ActiveRecord::Base
  include PgSearch::Model

  belongs_to :cracker
  has_many :cheeses, through: :cracker

  pg_search_scope :tasty_search, associated_against: {
    cheeses: [:kind, :brand],
    cracker: :kind
  }
end

salami_1 = Salami.create!
salami_2 = Salami.create!
salami_3 = Salami.create!

limburger = Cheese.create!(kind: "Limburger")
brie = Cheese.create!(kind: "Brie")
pepper_jack = Cheese.create!(kind: "Pepper Jack")

Cracker.create!(kind: "Black Pepper", cheeses: [brie], salami: salami_1)
Cracker.create!(kind: "Ritz", cheeses: [limburger, pepper_jack], salami: salami_2)
Cracker.create!(kind: "Graham", cheeses: [limburger], salami: salami_3)

Salami.tasty_search("pepper") # => [salami_1, salami_2]

Searching using different search features

By default, pg_search_scope uses the built-in PostgreSQL text search. If you pass the :using option to pg_search_scope, you can choose alternative search techniques.

class Beer < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_name, against: :name, using: [:tsearch, :trigram, :dmetaphone]
end

Here's an example of passing multiple :using options with additional configuration.

class Beer < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_name,
                  against: :name,
                  using: {
                    :trigram => {},
                    :dmetaphone => {},
                    :tsearch => { :prefix => true }
                  }
end

The currently implemented features are:

:tsearch - Full text search, which is built into PostgreSQL
:trigram - Trigram search, which requires the pg_trgm extension
:dmetaphone - Double Metaphone search, which requires the fuzzystrmatch extension

:tsearch (Full Text Search)

PostgreSQL's built-in full text search supports weighting, prefix searches, and stemming in multiple languages.

Weighting

Each searchable column can be given a weight of "A", "B", "C", or "D". Columns with earlier letters are weighted higher than those with later letters. So, in the following example, the title is the most important, followed by the subtitle, and finally the content.

class NewsArticle < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_full_text, against: {
    title: 'A',
    subtitle: 'B',
    content: 'C'
  }
end

You can also pass the weights in as an array of arrays, or any other structure that responds to #each and yields either a single symbol or a symbol and a weight. If you omit the weight, a default will be used.

class NewsArticle < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_full_text, against: [
    [:title, 'A'],
    [:subtitle, 'B'],
    [:content, 'C']
  ]
end

class NewsArticle < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_full_text, against: [
    [:title, 'A'],
    { subtitle: 'B' },
    :content
  ]
end
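
To make the accepted shapes of :against more concrete, here is a small plain-Ruby sketch that flattens each of the forms shown above into a list of [column, weight] pairs. The normalize_against helper is hypothetical and written only for this illustration; it is not how pg_search itself processes the option.

```ruby
# Hypothetical helper, not part of pg_search: flattens the accepted
# :against shapes into [column, weight] pairs (weight is nil when omitted).
def normalize_against(against)
  entries = against.is_a?(Hash) ? against.to_a : Array(against)
  entries.flat_map do |entry|
    case entry
    when Hash  then entry.to_a   # { subtitle: 'B' } becomes [[:subtitle, 'B']]
    when Array then [entry]      # [:title, 'A'] stays a single pair
    else [[entry, nil]]          # a bare :content has no explicit weight
    end
  end
end

p normalize_against(title: 'A', subtitle: 'B', content: 'C')
p normalize_against([[:title, 'A'], { subtitle: 'B' }, :content])
p normalize_against(:title)
```

Columns that end up with a nil weight here correspond to the "default will be used" case described above.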

:prefix (PostgreSQL 8.4 and newer only)

PostgreSQL's full text search matches on whole words by default. If you want to search for partial words, however, you can set :prefix to true. Since this is a :tsearch-specific option, you should pass it to :tsearch directly, as shown in the following example.

class Superhero < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :whose_name_starts_with,
                  against: :name,
                  using: {
                    tsearch: { prefix: true }
                  }
end

batman = Superhero.create name: 'Batman'
batgirl = Superhero.create name: 'Batgirl'
robin = Superhero.create name: 'Robin'

Superhero.whose_name_starts_with("Bat") # => [batman, batgirl]

:negation

PostgreSQL's full text search matches all search terms by default. If you want to exclude certain words, you can set :negation to true. Then any term that begins with an exclamation point (!) will be excluded from the results. Since this is a :tsearch-specific option, you should pass it to :tsearch directly, as shown in the following example.

Note that combining this with other search features can have unexpected results. For example, :trigram searches don't have a concept of excluded terms, so if you use both :tsearch and :trigram in tandem, you may still find results that contain the term you were trying to exclude.

class Animal < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :with_name_matching,
                  against: :name,
                  using: {
                    tsearch: { negation: true }
                  }
end

one_fish = Animal.create(name: "one fish")
two_fish = Animal.create(name: "two fish")
red_fish = Animal.create(name: "red fish")
blue_fish = Animal.create(name: "blue fish")

Animal.with_name_matching("fish !red !blue") # => [one_fish, two_fish]

:dictionary

PostgreSQL full text search also supports multiple dictionaries for stemming. You can learn more about how dictionaries work by reading the PostgreSQL documentation. If you use one of the language dictionaries, such as "english", then variants of words (e.g. "jumping" and "jumped") will match each other. If you don't want stemming, you should pick the "simple" dictionary, which does not do any stemming. If you don't specify a dictionary, the "simple" dictionary will be used.

class BoringTweet < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :kinda_matching,
                  against: :text,
                  using: {
                    tsearch: { dictionary: "english" }
                  }
  pg_search_scope :literally_matching,
                  against: :text,
                  using: {
                    tsearch: { dictionary: "simple" }
                  }
end

sleep = BoringTweet.create! text: "I snoozed my alarm for fourteen hours today. I bet I can beat that tomorrow! #sleep"
sleeping = BoringTweet.create! text: "You know what I like? Sleeping. That's what. #enjoyment"
sleeps = BoringTweet.create! text: "In the jungle, the mighty jungle, the lion sleeps #tonight"

BoringTweet.kinda_matching("sleeping") # => [sleep, sleeping, sleeps]
BoringTweet.literally_matching("sleeps") # => [sleeps]

:normalization

PostgreSQL supports multiple algorithms for ranking results against queries. For instance, you might want to consider overall document size or the distance between multiple search terms in the original text. This option takes an integer, which is passed directly to PostgreSQL. According to the latest PostgreSQL documentation, the supported algorithms are:

0 (the default) ignores the document length
1 divides the rank by 1 + the logarithm of the document length
2 divides the rank by the document length
4 divides the rank by the mean harmonic distance between extents
8 divides the rank by the number of unique words in document
16 divides the rank by 1 + the logarithm of the number of unique words in document
32 divides the rank by itself + 1

This integer is a bitmask, so if you want to combine algorithms, you can add their numbers together (e.g. to use algorithms 1, 8, and 32, you would pass 1 + 8 + 32 = 41).

class BigLongDocument < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :regular_search,
                  against: :text

  pg_search_scope :short_search,
                  against: :text,
                  using: {
                    tsearch: { normalization: 2 }
                  }
end

long = BigLongDocument.create!(text: "Four score and twenty years ago")
short = BigLongDocument.create!(text: "Four score")

BigLongDocument.regular_search("four score") #=> [long, short]
BigLongDocument.short_search("four score") #=> [short, long]
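
Because the value is a bitmask, combining algorithms is just a bitwise OR of the flag values listed above. The sketch below shows the arithmetic in plain Ruby; the NORMALIZATION constant, its key names, and the normalization_mask helper are made up for this illustration and are not provided by pg_search or PostgreSQL.

```ruby
# Flag values as listed in the PostgreSQL documentation quoted above.
# The key names are illustrative only.
NORMALIZATION = {
  log_length: 1,
  length: 2,
  harmonic_distance: 4,
  unique_words: 8,
  log_unique_words: 16,
  self_plus_one: 32
}.freeze

# Combines the named flags into the integer passed as :normalization.
def normalization_mask(*flags)
  flags.map { |flag| NORMALIZATION.fetch(flag) }.reduce(0, :|)
end

puts normalization_mask(:log_length, :unique_words, :self_plus_one) # prints 41
```

The resulting integer would then be passed as tsearch: { normalization: 41 }. Since each flag is a distinct power of two, OR-ing and adding give the same result.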

:any_word

Setting this attribute to true performs a search that returns all models containing any word in the search terms.

class Number < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search_any_word,
                  against: :text,
                  using: {
                    tsearch: { any_word: true }
                  }

  pg_search_scope :search_all_words,
                  against: :text
end

one = Number.create! text: 'one'
two = Number.create! text: 'two'
three = Number.create! text: 'three'

Number.search_any_word('one two three') # => [one, two, three]
Number.search_all_words('one two three') # => []
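
The :prefix, :negation, and :any_word options each correspond to a piece of PostgreSQL's tsquery syntax: `:*` for prefix matching, `!` for negation, and `|` instead of `&` for matching any term. The following plain-Ruby sketch builds a simplified tsquery string to show how the options interact. The sketch_tsquery method is hypothetical, and the SQL that pg_search actually generates is more involved than this.

```ruby
# Simplified illustration of tsquery syntax; not the SQL pg_search emits.
def sketch_tsquery(query, prefix: false, negation: false, any_word: false)
  terms = query.split.map do |term|
    negated = negation && term.start_with?("!")
    word = negated ? term.delete_prefix("!") : term
    lexeme = "'#{word.gsub("'", "''")}'" # quote the term, doubling embedded quotes
    lexeme += ":*" if prefix             # prefix match
    negated ? "!#{lexeme}" : lexeme      # exclude the term
  end
  terms.join(any_word ? " | " : " & ")   # OR for any_word, AND otherwise
end

puts sketch_tsquery("fish !red !blue", negation: true) # 'fish' & !'red' & !'blue'
puts sketch_tsquery("Bat", prefix: true)               # 'Bat':*
puts sketch_tsquery("one two three", any_word: true)   # 'one' | 'two' | 'three'
```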

:sort_only

Setting this attribute to true makes this feature available for sorting, but does not include it in the query's WHERE condition.

class Person < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search,
                  against: :name,
                  using: {
                    tsearch: { any_word: true },
                    dmetaphone: { any_word: true, sort_only: true }
                  }
end

exact = Person.create!(name: 'ash hines')
one_exact_one_close = Person.create!(name: 'ash heinz')
one_exact = Person.create!(name: 'ash smith')
one_close = Person.create!(name: 'leigh heinz')

Person.search('ash hines') # => [exact, one_exact_one_close, one_exact]

:highlight

Adding .with_pg_search_highlight after the pg_search_scope gives you access to the pg_search_highlight attribute on each object.

class Person < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :search,
                  against: :bio,
                  using: {
                    tsearch: {
                      highlight: {
                        StartSel: '<b>',
                        StopSel: '</b>',
                        MaxWords: 123,
                        MinWords: 456,
                        ShortWord: 4,
                        HighlightAll: true,
                        MaxFragments: 3,
                        FragmentDelimiter: '&hellip;'
                      }
                    }
                  }
end

Person.create!(:bio => "Born in rural Alberta, where the buffalo roam.")

first_match = Person.search("Alberta").with_pg_search_highlight.first
first_match.pg_search_highlight # => "Born in rural <b>Alberta</b>, where the buffalo roam."

The highlight option accepts all options supported by ts_headline, and uses PostgreSQL's defaults.

See the PostgreSQL documentation for details on the meaning of each option.

:dmetaphone (Double Metaphone soundalike search)

Double Metaphone is an algorithm for matching words that sound alike even if they are spelled very differently. For example, "Geoff" and "Jeff" sound identical and thus match. Currently, this is not a true double-metaphone, as only the first metaphone is used for searching.

Double Metaphone support is currently available as part of the fuzzystrmatch extension, which must be installed before this feature can be used. In addition to the extension, you must install a utility function into your database. To generate and run a migration for this, run:

$ rails g pg_search:migration:dmetaphone
$ rake db:migrate

The following example shows how to use :dmetaphone.

class Word < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :that_sounds_like,
                  against: :spelling,
                  using: :dmetaphone
end

four = Word.create! spelling: 'four'
far = Word.create! spelling: 'far'
fur = Word.create! spelling: 'fur'
five = Word.create! spelling: 'five'

Word.that_sounds_like("fir") # => [four, far, fur]

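:trigram (Trigram search)

Trigram search works by counting how many three-letter substrings (or "trigrams") match between the query and the text. For example, the string "Lorem ipsum" can be split into the following trigrams:

[" Lo", "Lor", "ore", "rem", "em ", "m i", " ip", "ips", "psu", "sum", "um ", "m  "]

Trigram search has some ability to work even with typos and misspellings in the query or text.

Trigram support is currently available as part of the pg_trgm extension, which must be installed before this feature can be used.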
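
To get a feel for how trigram similarity behaves, here is a rough plain-Ruby approximation. It assumes the behavior described in the pg_trgm documentation (text is lowercased, non-alphanumeric characters are ignored, and each word is padded with two leading spaces and one trailing space), so its output differs slightly from the simplified list above. The trigrams and similarity methods are written for this illustration only; real scores come from PostgreSQL and may differ in edge cases.

```ruby
require "set"

# Approximates pg_trgm: lowercase, split into words, pad each word with
# two leading spaces and one trailing space, then take 3-character windows.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end.to_set
end

# Shared trigrams divided by the total number of distinct trigrams.
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?

  (ta & tb).size.to_f / union.size
end

p trigrams("cat").to_a
puts similarity("Yahoo!", "Yahooo!")
puts similarity("Yahoo!", "Facebook")
```

A misspelling such as "Yahooo!" still shares most of its trigrams with "Yahoo!", while an unrelated word shares none, which is why the search tolerates typos.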

class Website < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :kinda_spelled_like,
                  against: :name,
                  using: :trigram
end

yahooo = Website.create! name: "Yahooo!"
yohoo = Website.create! name: "Yohoo!"
gogle = Website.create! name: "Gogle"
facebook = Website.create! name: "Facebook"

Website.kinda_spelled_like("Yahoo!") # => [yahooo, yohoo]

:threshold

By default, trigram searches find records which have a similarity of at least 0.3 using pg_trgm's calculations. You may specify a custom threshold if you prefer. Higher numbers match more strictly, and thus return fewer results. Lower numbers match more permissively, letting in more results. Please note that setting a trigram threshold forces a table scan, because the derived query uses the similarity() function instead of the % operator.

class Vegetable < ActiveRecord::Base
  include PgSearch::Model

  pg_search_scope :strictly_spelled_like,
                  against: :name,
                  using: {
                    trigram: {
                      threshold: 0.5
                    }
                  }

  pg_search_scope :roughly_spelled_like,
                  against: :name,
                  using: {
                    trigram: {
                      threshold: 0.1
                    }
                  }
end

cauliflower = Vegetable.create! name: "cauliflower"

Vegetable.roughly_spelled_like("couliflower") # => [cauliflower]
Vegetable.strictly_spelled_like("couliflower") # => [cauliflower]

Vegetable.roughly_spelled_like("collyflower") # => [cauliflower]
Vegetable.strictly_spelled_like("collyflower") # => []

:word_similarity

Allows you to match words in longer strings. By default, trigram searches use % or similarity() as the similarity value. Set word_similarity to true to opt for <% and word_similarity() instead. This causes the trigram search to use the similarity between the query term and the most similar word in the text.

class Sentence < ActiveRecord::Base
  include PgSearch::Model

  # Default trigram search: compares the query with the whole string
  pg_search_scope :similarity_like,
                  against: :name,
                  using: [:trigram]

  # Compares the query with the best-matching word in the string
  pg_search_scope :word_similarity_like,
                  against: :name,
                  using: {
                    trigram: {
                      word_similarity: true
                    }
                  }
end

sentence = Sentence.create! name: "Those are two words."

Sentence.similarity_like("word") # => []
Sentence.word_similarity_like("word") # => [sentence]

Limiting fields when combining features

Sometimes when doing queries that combine different features, you might want to search against only some of the fields with certain features. For example, perhaps you want to do a trigram search only against the shorter fields, so that you don't need to reduce the threshold excessively. You can specify which fields using the :only option:

class Image < ActiveRecord::Base
  include PgSearch::Model

  pg_search_scope :combined_search,
                  against: [:file_name, :short_description, :long_description],
                  using: {
                    tsearch: { dictionary: 'english' },
                    trigram: {
                      only: [:file_name, :short_description]
                    }
                  }
end

Now you can successfully retrieve an Image with file_name: 'image_foo.jpg' and long_description: 'This description is so long that it would make a trigram search fail any reasonable threshold limit' with:

Image.combined_search('reasonable') # found with tsearch
Image.combined_search('foo') # found with trigram

Ignoring accent marks

Most of the time you will want to ignore accent marks when searching. This makes it possible to find words like "piñata" when searching with the query "pinata". If you set a pg_search_scope to ignore accents, it will ignore accents in both the searchable text and the query terms.

Ignoring accents uses the unaccent extension, which must be installed before this feature can be used.

class SpanishQuestion < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :gringo_search,
                  against: :word,
                  ignoring: :accents
end

what = SpanishQuestion.create(word: "Qué")
how_many = SpanishQuestion.create(word: "Cuánto")
how = SpanishQuestion.create(word: "Cómo")

SpanishQuestion.gringo_search("Que") # => [what]
SpanishQuestion.gringo_search("Cüåñtô") # => [how_many]
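
The idea of stripping accents from both the stored text and the query can be sketched in plain Ruby with Unicode normalization. This is only an analogy: PostgreSQL's unaccent extension works from its own rules file rather than Unicode decomposition, and the strip_accents and accent_insensitive_match? helpers below are hypothetical names used for illustration.

```ruby
# Rough stand-in for unaccent: decompose characters (NFD), then drop
# the combining marks that carried the accents.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Accents are removed on both sides before comparing, ignoring case.
def accent_insensitive_match?(stored, query)
  strip_accents(stored).casecmp?(strip_accents(query))
end

puts strip_accents("piñata")                         # pinata
puts accent_insensitive_match?("Cuánto", "Cüåñtô")   # true
puts accent_insensitive_match?("Cómo", "Que")        # false
```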

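Advanced users may wish to add indexes for the expressions that pg_search generates. Unfortunately, the unaccent function supplied by this extension is not indexable (as of PostgreSQL 9.1). Thus, you may want to write your own wrapper function and use it instead. This can be configured by calling the following code, perhaps in an initializer.

PgSearch.unaccent_function = "my_unaccent"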

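Using tsvector columns

PostgreSQL lets you search against a column of type tsvector instead of using an expression. This speeds up searching dramatically, because it offloads creation of the tsvector that the tsquery is evaluated against.

To use this functionality you'll need to do a few things:

- Create a column of type tsvector that you'd like to search against. If you want to search using multiple search methods, for example tsearch and dmetaphone, you'll need a column for each.
- Create a trigger function that updates the column(s) using the expression appropriate for that type of search. See the PostgreSQL documentation for text search triggers.
- Should you have any pre-existing data in the table, update the newly created tsvector columns with the expression that your trigger function uses.
- Add the option to pg_search_scope, e.g.:

pg_search_scope :fast_content_search,
                against: :content,
                using: {
                  dmetaphone: {
                    tsvector_column: 'tsvector_content_dmetaphone'
                  },
                  tsearch: {
                    dictionary: 'english',
                    tsvector_column: 'tsvector_content_tsearch'
                  },
                  trigram: {} # trigram does not use tsvectors
                }

Please note that the :against column is only used when the tsvector_column is not present for the search type.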
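
The first three steps above can be sketched as a handful of SQL statements. The plain-Ruby helper below only builds the statements as strings, the kind a migration might pass to execute one by one. It is a sketch for the :tsearch column only, using PostgreSQL's built-in tsvector_update_trigger function; the helper name, the table and column names, and the trigger and index names are all assumptions made for this example, and a :dmetaphone column would need a different expression.

```ruby
# Builds the SQL a migration might run. All names are placeholders
# chosen for this sketch; pg_search does not provide this helper.
def tsvector_setup_sql(table:, source:, tsvector:, config: "pg_catalog.english")
  [
    # Step 1: the tsvector column to search against
    "ALTER TABLE #{table} ADD COLUMN #{tsvector} tsvector",
    # Step 2: keep the column current on every insert or update
    "CREATE TRIGGER #{table}_#{tsvector}_update BEFORE INSERT OR UPDATE ON #{table} " \
      "FOR EACH ROW EXECUTE PROCEDURE " \
      "tsvector_update_trigger(#{tsvector}, '#{config}', #{source})",
    # Step 3: backfill rows that existed before the trigger
    "UPDATE #{table} SET #{tsvector} = to_tsvector('#{config}', coalesce(#{source}, ''))",
    # Optional: a GIN index so searches on the column can use it
    "CREATE INDEX #{table}_#{tsvector}_idx ON #{table} USING gin (#{tsvector})"
  ]
end

puts tsvector_setup_sql(
  table: "posts",
  source: "content",
  tsvector: "tsvector_content_tsearch"
)
```

The text search configuration used in the trigger should match the dictionary given to the scope (here 'english'), otherwise the stored tsvector and the query would be stemmed differently.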

Combining multiple tsvectors

It's possible to search against more than one tsvector at a time. This could be useful if you want to maintain multiple search scopes but do not want to maintain separate tsvectors for each scope. For example:

pg_search_scope :search_title,
                against: :title,
                using: {
                  tsearch: {
                    tsvector_column: "title_tsvector"
                  }
                }

pg_search_scope :search_body,
                against: :body,
                using: {
                  tsearch: {
                    tsvector_column: "body_tsvector"
                  }
                }

pg_search_scope :search_title_and_body,
                against: [:title, :body],
                using: {
                  tsearch: {
                    tsvector_column: ["title_tsvector", "body_tsvector"]
                  }
                }

Configuring ranking and ordering

:ranked_by (Choosing a ranking algorithm)

By default, pg_search ranks results based on the :tsearch similarity between the searchable text and the query. To use a different ranking algorithm, you can pass a :ranked_by option to pg_search_scope.

pg_search_scope :search_by_tsearch_but_rank_by_trigram,
                against: :title,
                using: [:tsearch],
                ranked_by: ":trigram"

Note that :ranked_by uses a String to represent the ranking expression. This allows for more complex possibilities. Strings like ":tsearch", ":trigram", and ":dmetaphone" are automatically expanded into the appropriate SQL expressions.

# Weighted ranking to balance multiple approaches
ranked_by: ":dmetaphone + (0.25 * :trigram)"

# A more complex example, where books.num_pages is an integer column in the table itself
ranked_by: "(books.num_pages * :trigram) + (:tsearch / 2.0)"
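
As a rough mental model of that expansion, each ":feature" token in the string is replaced with a SQL rank expression for that feature, and everything else is left untouched. The plain-Ruby sketch below uses made-up placeholder strings in place of the real SQL; PLACEHOLDER_RANKS and expand_ranked_by are hypothetical names for this illustration and do not reflect the SQL pg_search actually generates.

```ruby
# Placeholder fragments standing in for the real rank SQL of each feature.
PLACEHOLDER_RANKS = {
  "tsearch" => "tsearch_rank_sql",
  "trigram" => "trigram_rank_sql",
  "dmetaphone" => "dmetaphone_rank_sql"
}.freeze

# Replaces every ":feature" token with its (parenthesized) rank expression.
# An unknown feature name raises KeyError.
def expand_ranked_by(expression)
  expression.gsub(/:(\w+)/) do
    "(#{PLACEHOLDER_RANKS.fetch(Regexp.last_match(1))})"
  end
end

puts expand_ranked_by(":dmetaphone + (0.25 * :trigram)")
# prints (dmetaphone_rank_sql) + (0.25 * (trigram_rank_sql))
```

Column references such as books.num_pages contain no leading colon, so they pass through unchanged and become part of the final ORDER BY expression.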

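:order_within_rank (Breaking ties)

PostgreSQL does not guarantee a consistent order when multiple records have the same value in the ORDER BY clause. This can cause trouble with pagination. Imagine a case where 12 records all have the same ranking value. If you use a pagination library such as kaminari or will_paginate to return results in pages of 10, then you would expect to see 10 of the records on page 1, and the remaining 2 records at the top of the next page, ahead of lower-ranked results.

But since there is no consistent ordering, PostgreSQL might choose to rearrange the order of those 12 records between different SQL statements. You might end up getting some of the same records from page 1 on page 2 as well, and likewise there may be records that don't show up at all.

pg_search fixes this problem by adding a second expression to the ORDER BY clause, after the :ranked_by expression explained above. By default, the tiebreaker order is ascending by id.

ORDER BY [complicated :ranked_by expression...], id ASC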
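
The effect of the tiebreaker can be shown with a small plain-Ruby simulation of the 12-record scenario. The Record struct and page helper are invented for this sketch. Shuffling the input before each call stands in for PostgreSQL returning tied rows in an arbitrary order, and sorting by rank and then id makes the pages stable regardless.

```ruby
Record = Struct.new(:id, :rank)

# Twelve records that all share the same rank, as in the scenario above.
records = (1..12).map { |id| Record.new(id, 0.5) }

# Orders by rank DESC, then id ASC, and returns the requested page.
def page(records, number, per_page: 10)
  ordered = records.sort_by { |r| [-r.rank, r.id] }
  ordered.each_slice(per_page).to_a.fetch(number - 1, [])
end

# Input order differs between the two "queries", yet the pages never overlap.
first  = page(records.shuffle, 1)
second = page(records.shuffle, 2)

p first.map(&:id)  # ids 1 through 10
p second.map(&:id) # ids 11 and 12
```

Without the id in the sort key, the split between pages would depend on the arbitrary input order, which is exactly the duplicate and missing record problem described above.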

This might not be desirable for your application, especially if you do not want old records to outrank new records. By passing :order_within_rank, you can specify an alternate tiebreaker expression. A common example would be descending by updated_at, to rank the most recently updated records first.

pg_search_scope :search_and_break_ties_by_latest_update,
                against: [:title, :content],
                order_within_rank: "blog_posts.updated_at DESC"

PgSearch#pg_search_rank (Reading a record's rank as a Float)

It may be useful or interesting to see the rank of a particular record. This can be helpful for debugging why one record outranks another. You could also use it to show some sort of relevancy value to end users of an application.

To retrieve the rank, call .with_pg_search_rank on a scope, and then call .pg_search_rank on a returned record.

shirt_brands = ShirtBrand.search_by_name("Penguin").with_pg_search_rank
shirt_brands[0].pg_search_rank #=> 0.0759909
shirt_brands[1].pg_search_rank #=> 0.0607927

Search rank and chained scopes

Each PgSearch scope generates a named subquery for the search rank. If you chain multiple scopes, PgSearch generates a ranking query for each scope, so the ranking queries must have unique names. If you need to reference the ranking query (e.g. in a GROUP BY clause), you can regenerate the subquery name with the PgSearch::Configuration.alias method by passing the name of the queried table.

shirt_brands = ShirtBrand.search_by_name("Penguin")
  .joins(:shirt_sizes)
  .group("shirt_brands.id, #{PgSearch::Configuration.alias('shirt_brands')}.rank")

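Attributions

PgSearch would not have been possible without inspiration from texticle (now renamed textacular). Thanks to Aaron Patterson for the original version and to Casebook PBC (https://www.casebook.net) for gifting the community with it!

Contributions and feedback

Please read our CONTRIBUTING guide.

We also have a Google Group for discussing pg_search and other Casebook PBC open source projects.

License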
