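This package wraps sentence-transformers (also known as sentence-BERT) directly in spaCy. You can replace the vectors provided by any spaCy model with vectors that have been tuned specifically for semantic similarity.
The models below are suggested for analysing sentence similarity, as the STS benchmark indicates. Keep in mind that sentence-transformers models are configured with a maximum sequence length of 128.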
Compatibility:
To install this package, run one of the following commands:
pip install spacy-sentence-bert
pip install git+https://github.com/MartinoMensio/spacy-sentence-bert.git
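You can install standalone spaCy packages from GitHub with pip. If you install a standalone package, you can load the language model directly with the spacy.load API, without adding a pipeline stage. This table takes the models listed in the Sentence Transformers documentation and shows some statistics along with the command to install each standalone model. If you don't want to install the standalone models, you can still use them by adding a pipeline stage (see below).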
sentence-BERT name | spaCy model name | dimensions | language | STS benchmark | standalone install |
---|---|---|---|---|---|
paraphrase-distilroberta-base-v1 | en_paraphrase_distilroberta_base_v1 | 768 | en | 81.81 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_paraphrase_distilroberta_base_v1-0.1.2.tar.gz#en_paraphrase_distilroberta_base_v1-0.1.2 |
paraphrase-xlm-r-multilingual-v1 | xx_paraphrase_xlm_r_multilingual_v1 | 768 | 50+ | 83.50 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_paraphrase_xlm_r_multilingual_v1-0.1.2.tar.gz#xx_paraphrase_xlm_r_multilingual_v1-0.1.2 |
stsb-roberta-large | en_stsb_roberta_large | 1024 | en | 86.39 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_roberta_large-0.1.2.tar.gz#en_stsb_roberta_large-0.1.2 |
stsb-roberta-base | en_stsb_roberta_base | 768 | en | 85.44 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_roberta_base-0.1.2.tar.gz#en_stsb_roberta_base-0.1.2 |
stsb-bert-large | en_stsb_bert_large | 1024 | en | 85.29 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_bert_large-0.1.2.tar.gz#en_stsb_bert_large-0.1.2 |
stsb-distilbert-base | en_stsb_distilbert_base | 768 | en | 85.16 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_distilbert_base-0.1.2.tar.gz#en_stsb_distilbert_base-0.1.2 |
stsb-bert-base | en_stsb_bert_base | 768 | en | 85.14 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_bert_base-0.1.2.tar.gz#en_stsb_bert_base-0.1.2 |
nli-bert-large | en_nli_bert_large | 1024 | en | 79.19 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_large-0.1.2.tar.gz#en_nli_bert_large-0.1.2 |
nli-distilbert-base | en_nli_distilbert_base | 768 | en | 78.69 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_distilbert_base-0.1.2.tar.gz#en_nli_distilbert_base-0.1.2 |
nli-roberta-large | en_nli_roberta_large | 1024 | en | 78.69 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_roberta_large-0.1.2.tar.gz#en_nli_roberta_large-0.1.2 |
nli-bert-large-max-pooling | en_nli_bert_large_max_pooling | 1024 | en | 78.41 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_large_max_pooling-0.1.2.tar.gz#en_nli_bert_large_max_pooling-0.1.2 |
nli-bert-large-cls-pooling | en_nli_bert_large_cls_pooling | 1024 | en | 78.29 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_large_cls_pooling-0.1.2.tar.gz#en_nli_bert_large_cls_pooling-0.1.2 |
nli-distilbert-base-max-pooling | en_nli_distilbert_base_max_pooling | 768 | en | 77.61 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_distilbert_base_max_pooling-0.1.2.tar.gz#en_nli_distilbert_base_max_pooling-0.1.2 |
nli-roberta-base | en_nli_roberta_base | 768 | en | 77.49 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_roberta_base-0.1.2.tar.gz#en_nli_roberta_base-0.1.2 |
nli-bert-base-max-pooling | en_nli_bert_base_max_pooling | 768 | en | 77.21 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_base_max_pooling-0.1.2.tar.gz#en_nli_bert_base_max_pooling-0.1.2 |
nli-bert-base | en_nli_bert_base | 768 | en | 77.12 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_base-0.1.2.tar.gz#en_nli_bert_base-0.1.2 |
nli-bert-base-cls-pooling | en_nli_bert_base_cls_pooling | 768 | en | 76.30 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_base_cls_pooling-0.1.2.tar.gz#en_nli_bert_base_cls_pooling-0.1.2 |
average_word_embeddings_glove.6B.300d | en_average_word_embeddings_glove.6B.300d | 768 | en | 61.77 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_glove.6B.300d-0.1.2.tar.gz#en_average_word_embeddings_glove.6B.300d-0.1.2 |
average_word_embeddings_komninos | en_average_word_embeddings_komninos | 768 | en | 61.56 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_komninos-0.1.2.tar.gz#en_average_word_embeddings_komninos-0.1.2 |
average_word_embeddings_levy_dependency | en_average_word_embeddings_levy_dependency | 768 | en | 59.22 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_levy_dependency-0.1.2.tar.gz#en_average_word_embeddings_levy_dependency-0.1.2 |
average_word_embeddings_glove.840B.300d | en_average_word_embeddings_glove.840B.300d | 768 | en | 52.54 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_glove.840B.300d-0.1.2.tar.gz#en_average_word_embeddings_glove.840B.300d-0.1.2 |
quora-distilbert-base | en_quora_distilbert_base | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_quora_distilbert_base-0.1.2.tar.gz#en_quora_distilbert_base-0.1.2 |
quora-distilbert-multilingual | xx_quora_distilbert_multilingual | 768 | 50+ | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_quora_distilbert_multilingual-0.1.2.tar.gz#xx_quora_distilbert_multilingual-0.1.2 |
msmarco-distilroberta-base-v2 | en_msmarco_distilroberta_base_v2 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_msmarco_distilroberta_base_v2-0.1.2.tar.gz#en_msmarco_distilroberta_base_v2-0.1.2 |
msmarco-roberta-base-v2 | en_msmarco_roberta_base_v2 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_msmarco_roberta_base_v2-0.1.2.tar.gz#en_msmarco_roberta_base_v2-0.1.2 |
msmarco-distilbert-base-v2 | en_msmarco_distilbert_base_v2 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_msmarco_distilbert_base_v2-0.1.2.tar.gz#en_msmarco_distilbert_base_v2-0.1.2 |
nq-distilbert-base-v1 | en_nq_distilbert_base_v1 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nq_distilbert_base_v1-0.1.2.tar.gz#en_nq_distilbert_base_v1-0.1.2 |
distiluse-base-multilingual-cased-v2 | xx_distiluse_base_multilingual_cased_v2 | 512 | 50+ | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_distiluse_base_multilingual_cased_v2-0.1.2.tar.gz#xx_distiluse_base_multilingual_cased_v2-0.1.2 |
stsb-xlm-r-multilingual | xx_stsb_xlm_r_multilingual | 768 | 50+ | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_stsb_xlm_r_multilingual-0.1.2.tar.gz#xx_stsb_xlm_r_multilingual-0.1.2 |
T-Systems-onsite/cross-en-de-roberta-sentence-transformer | xx_cross_en_de_roberta_sentence_transformer | 768 | en,de | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_cross_en_de_roberta_sentence_transformer-0.1.2.tar.gz#xx_cross_en_de_roberta_sentence_transformer-0.1.2 |
LaBSE | xx_LaBSE | 768 | 109 | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_LaBSE-0.1.2.tar.gz#xx_LaBSE-0.1.2 |
allenai-specter | en_allenai_specter | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_allenai_specter-0.1.2.tar.gz#en_allenai_specter-0.1.2 |
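If your model is not in this list (e.g., xlm-r-base-en-ko-nli-ststb), you can still use it with this library, but not as a standalone language. You will need to add a properly configured pipeline stage (see the nlp.add_pipe API below).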
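The package names and install commands in the table appear to follow a regular pattern, which can be sketched in a few lines of Python. This is only an observation drawn from the rows above, not code from the package: the helper names `standalone_package_name` and `standalone_install_command` are hypothetical, and a generated URL is only known to exist for the models actually listed in the table.

```python
# Sketch of the naming pattern seen in the table above.
# Both helper functions are hypothetical and exist only for illustration.

RELEASE_BASE = "https://github.com/MartinoMensio/spacy-sentence-bert/releases/download"


def standalone_package_name(sbert_name: str, lang: str = "en") -> str:
    """Derive the spaCy package name from a sentence-BERT model name."""
    # Drop any organisation prefix such as "T-Systems-onsite/".
    short_name = sbert_name.split("/")[-1]
    # Dashes become underscores; dots and letter case are kept unchanged.
    return f"{lang}_{short_name.replace('-', '_')}"


def standalone_install_command(
    sbert_name: str, lang: str = "en", version: str = "0.1.2"
) -> str:
    """Build the pip command in the same shape as the table's last column."""
    archive = f"{standalone_package_name(sbert_name, lang)}-{version}"
    return f"pip install {RELEASE_BASE}/v{version}/{archive}.tar.gz#{archive}"


if __name__ == "__main__":
    print(standalone_install_command("stsb-roberta-large"))
    print(standalone_package_name("LaBSE", lang="xx"))
```

The `lang` argument has to be supplied by the caller (`en` for English models, `xx` for multilingual ones), because the prefix cannot be inferred from the sentence-BERT name alone.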
There are different ways to load sentence-bert models.
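- spacy.load API: you need to have installed one of the models from the table above
- spacy_sentence_bert.load_model: you can load one of the models from the table above without installing the standalone packages
- nlp.add_pipe API: you can load any sentence-bert model on top of your nlp object
spacy.load API
With a standalone model installed from GitHub (e.g., from the table above, pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_roberta_large-0.1.2.tar.gz#en_stsb_roberta_large-0.1.2), you can load the model directly with the spaCy API: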
import spacy
nlp = spacy.load('en_stsb_roberta_large')
spacy_sentence_bert.load_model API
You can obtain the same result without installing the standalone model by using this method:
import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_stsb_roberta_large')
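nlp.add_pipe API
If you want to use one of the sentence embeddings on an existing Language object, you can use the nlp.add_pipe method. This also works for a language model that is not listed in the table above. Just make sure that sentence-transformers supports it.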
import spacy
nlp = spacy.blank('en')
nlp.add_pipe('sentence_bert', config={'model_name': 'allenai-specter'})
nlp.pipe_names
The first time you use a model, sentence-BERT downloads it to the folder defined by the TORCH_HOME environment variable (default ~/.cache/torch).
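To check where models would end up on a given machine, the lookup described above can be sketched as follows. This only mirrors the rule as stated here (TORCH_HOME if set, otherwise ~/.cache/torch); it is not the library's own resolution code, and `resolve_model_cache_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the download folder implied by the rule described above.

    Hypothetical helper: it reproduces the documented behaviour only and
    does not call into torch or sentence-transformers.
    """
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env
    configured = env.get("TORCH_HOME")
    if configured:
        return Path(configured).expanduser()
    # Documented default when TORCH_HOME is not set.
    return Path("~/.cache/torch").expanduser()


if __name__ == "__main__":
    print(resolve_model_cache_dir())
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to try with different settings without changing the real environment.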
Once the model is loaded, use it through spaCy's vector property and similarity method:
# get two documents
doc_1 = nlp('Hi there, how are you?')
doc_2 = nlp('Hello there, how are you doing today?')
# get the vector of the Doc, Span or Token
print(doc_1.vector.shape)
print(doc_1[3].vector.shape)
print(doc_1[2:4].vector.shape)
# or use the similarity method that is based on the vectors, on Doc, Span or Token
print(doc_1.similarity(doc_2[0:7]))
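For intuition about the number that similarity returns, the sketch below computes cosine similarity between two plain vectors, which is spaCy's default similarity estimate. It assumes this package leaves that default in place and only supplies the vectors; the function name `cosine_similarity` and the choice to return 0.0 for a zero vector are conventions of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Sketch convention: a zero vector has no direction, so report 0.0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real sentence vectors have 512 to 1024 dimensions.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because the score depends only on direction, two vectors that differ in length but point the same way score close to 1, while unrelated directions score near 0.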
Build and upload
VERSION=0.1.2
# build the standalone models (17)
./build_models.sh
# build the archive at dist/spacy_sentence_bert-${VERSION}.tar.gz
python setup.py sdist
# upload to pypi
twine upload dist/spacy_sentence_bert-${VERSION}.tar.gz