ruby-spacy

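Overview

ruby-spacy is a wrapper module for using spaCy from the Ruby programming language via PyCall. It aims to make spaCy easy and natural to use for Ruby programmers. The module covers the part of spaCy's functionality concerned with using its many language models, not with building them.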

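	Functionality
✅	Tokenization, lemmatization, sentence segmentation
✅	Part-of-speech tagging and dependency parsing
✅	Named entity recognition
✅	Syntactic dependency visualization
✅	Access to pre-trained word vectors
✅	OpenAI Chat/Completion/Embeddings API integration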

Current version: 0.2.3

Supports spaCy 3.7.0
OpenAI API integration

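Installation of Prerequisites

IMPORTANT: Make sure that the enable-shared option is enabled in your Python installation. You can use pyenv to install any version of Python you like. For instance, install Python 3.10.6 using pyenv with enable-shared as follows:

$ env CONFIGURE_OPTS="--enable-shared" pyenv install 3.10.6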

Remember to make it accessible from your working directory. It is recommended that you set global to the version of Python you just installed.

$ pyenv global 3.10.6
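
To confirm that the active interpreter really was built with enable-shared, one option is to ask Python's own sysconfig module. The following is a minimal sketch, not part of ruby-spacy: the helper name python_shared_enabled? is hypothetical, and it assumes a Unix-like system where the Py_ENABLE_SHARED config variable is reported as 1 or 0.

```ruby
require "open3"

# Hypothetical helper (not part of ruby-spacy).
# Returns true or false depending on whether the given Python reports
# Py_ENABLE_SHARED == 1, or nil if the interpreter could not be run.
def python_shared_enabled?(python = "python3")
  script = "import sysconfig; print(sysconfig.get_config_var('Py_ENABLE_SHARED'))"
  stdout, _stderr, status = Open3.capture3(python, "-c", script)
  return nil unless status.success?

  stdout.strip == "1"
rescue Errno::ENOENT
  nil # interpreter not found on PATH
end

message = case python_shared_enabled?
          when true  then "Python was built with --enable-shared"
          when false then "Python was NOT built with --enable-shared"
          else            "Could not run python3"
          end
puts message
```

If the check reports false, reinstalling Python with the CONFIGURE_OPTS setting shown above is the likely remedy.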

Then, install spaCy. If you use pip, the following command will do:

$ pip install spacy

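Install trained language models. For a start, en_core_web_sm is the most useful for basic text processing in English. However, if you want to use advanced features of spaCy, such as named entity recognition or document similarity calculation, you should also install a larger model such as en_core_web_lg.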

$ python -m spacy download en_core_web_sm
$ python -m spacy download en_core_web_lg

See spaCy: Models & Languages for other models in various languages. To install models for Japanese, for instance, you can do the following:

$ python -m spacy download ja_core_news_sm
$ python -m spacy download ja_core_news_lg

Installation of ruby-spacy

Add this line to your application's Gemfile:

gem 'ruby-spacy'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install ruby-spacy

Usage

See the examples below.

Examples

Many of the following examples are Python-to-Ruby translations of code snippets in spaCy 101.

Tokenization

→ spaCy: Tokenization

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")

doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

row = []

doc.each do |token|
  row << token.text
end

# One heading per token, so the header always matches the row width
headings = (1..row.size).to_a
table = Terminal::Table.new rows: [row], headings: headings

puts table

Output:

1	2	3	4	5	6	7	8	9	10	11
Apple	is	looking	at	buying	U.K.	startup	for	$	1	billion
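
The heading row needs one entry per token, which is 11 for this sentence. The following is a minimal sketch of deriving the headings from the token count. It uses a hard-coded token list in place of a real doc and plain tab-joined strings in place of terminal-table; the helper names are illustrative only.

```ruby
# Tokens transcribed from the example sentence; in real use they would be
# collected with doc.each { |token| ... }.
TOKENS = ["Apple", "is", "looking", "at", "buying", "U.K.",
          "startup", "for", "$", "1", "billion"].freeze

# Builds one numeric heading per token so header and row stay the same width.
def numbered_headings(tokens)
  (1..tokens.size).to_a
end

# Renders the headings and the tokens as two tab-separated lines.
def render_row_table(tokens)
  [numbered_headings(tokens).join("\t"), tokens.join("\t")].join("\n")
end

puts render_row_table(TOKENS)
```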

Part-of-speech and dependency

→ spaCy: Part-of-speech tags and dependencies

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

headings = ["text", "lemma", "pos", "tag", "dep"]
rows = []

doc.each do |token|
  rows << [token.text, token.lemma, token.pos, token.tag, token.dep]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	lemma	pos	tag	dep
Apple	Apple	PROPN	NNP	nsubj
is	be	AUX	VBZ	aux
looking	look	VERB	VBG	ROOT
at	at	ADP	IN	prep
buying	buy	VERB	VBG	pcomp
U.K.	U.K.	PROPN	NNP	dobj
startup	startup	NOUN	NN	advcl
for	for	ADP	IN	prep
$	$	SYM	$	quantmod
1	1	NUM	CD	compound
billion	billion	NUM	CD	pobj
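
Once the text, lemma, and pos values are collected as rows, ordinary Ruby collection methods are enough to work with them. The sketch below uses hard-coded rows transcribed from the example sentence instead of a live doc; the struct and helper names are made up for illustration.

```ruby
# Illustrative stand-in for the values read from each token.
TaggedToken = Struct.new(:text, :lemma, :pos)

TAGGED = [
  TaggedToken.new("Apple", "Apple", "PROPN"),
  TaggedToken.new("is", "be", "AUX"),
  TaggedToken.new("looking", "look", "VERB"),
  TaggedToken.new("at", "at", "ADP"),
  TaggedToken.new("buying", "buy", "VERB"),
  TaggedToken.new("U.K.", "U.K.", "PROPN"),
  TaggedToken.new("startup", "startup", "NOUN"),
  TaggedToken.new("for", "for", "ADP"),
  TaggedToken.new("$", "$", "SYM"),
  TaggedToken.new("1", "1", "NUM"),
  TaggedToken.new("billion", "billion", "NUM")
].freeze

# Groups token texts under their coarse part-of-speech tag.
def texts_by_pos(tokens)
  tokens.group_by(&:pos).transform_values { |group| group.map(&:text) }
end

# Keeps only the lemmas of tokens whose tag is in the given list.
def content_lemmas(tokens, keep: %w[NOUN PROPN VERB])
  tokens.select { |token| keep.include?(token.pos) }.map(&:lemma)
end

p texts_by_pos(TAGGED)
p content_lemmas(TAGGED)
```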

Part-of-speech and dependency (Japanese)

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("ja_core_news_lg")
doc = nlp.read("任天堂は1983年にファミコンを14,800円で発売した。")

headings = ["text", "lemma", "pos", "tag", "dep"]
rows = []

doc.each do |token|
  rows << [token.text, token.lemma, token.pos, token.tag, token.dep]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	lemma	pos	tag	dep
任天堂	任天堂	PROPN	名詞-固有名詞-一般	nsubj
は	は	ADP	助詞-係助詞	case
1983	1983	NUM	名詞-数詞	nummod
年	年	NOUN	名詞-普通名詞-助数詞可能	obl
に	に	ADP	助詞-格助詞	case
ファミコン	ファミコン	NOUN	名詞-普通名詞-一般	obj
を	を	ADP	助詞-格助詞	case
14,800	14,800	NUM	名詞-数詞	fixed
円	円	NOUN	名詞-普通名詞-助数詞可能	obl
で	で	ADP	助詞-格助詞	case
発売	発売	VERB	名詞-普通名詞-サ変可能	ROOT
し	する	AUX	動詞-非自立可能	aux
た	た	AUX	助動詞	aux
。	。	PUNCT	補助記号-句点	punct

Morphology

→ POS and morphology tags

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

headings = ["text", "shape", "is_alpha", "is_stop", "morphology"]
rows = []

doc.each do |token|
  # Render each morphological feature as "Key = Value", one per line
  morph = token.morphology.map do |k, v|
    "#{k} = #{v}"
  end.join("\n")
  rows << [token.text, token.shape, token.is_alpha, token.is_stop, morph]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

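Output:

text	shape	is_alpha	is_stop	morphology
Apple	Xxxxx	true	false	NounType = Prop, Number = Sing
is	xx	true	true	Mood = Ind, Number = Sing, Person = 3, Tense = Pres, VerbForm = Fin
looking	xxxx	true	false	Aspect = Prog, Tense = Pres, VerbForm = Part
at	xx	true	true
buying	xxxx	true	false	Aspect = Prog, Tense = Pres, VerbForm = Part
U.K.	X.X.	false	false	NounType = Prop, Number = Sing
startup	xxxx	true	false	Number = Sing
for	xxx	true	true
$	$	false	false
1	d	false	false	NumType = Card
billion	xxxx	true	false	NumType = Card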
Visualizing dependency

→ spaCy: Visualizers

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("en_core_web_sm")

sentence = "Autonomous cars shift insurance liability toward manufacturers"
doc = nlp.read(sentence)

dep_svg = doc.displacy(style: "dep", compact: false)

File.open(File.join("test_dep.svg"), "w") do |file|
  file.write(dep_svg)
end

Output: an SVG rendering of the dependency parse, written to test_dep.svg.

Visualizing dependency (Compact)

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("en_core_web_sm")

sentence = "Autonomous cars shift insurance liability toward manufacturers"
doc = nlp.read(sentence)

dep_svg = doc.displacy(style: "dep", compact: true)

File.open(File.join("test_dep_compact.svg"), "w") do |file|
  file.write(dep_svg)
end

Output: a compact SVG rendering of the dependency parse, written to test_dep_compact.svg.

Named entity recognition

→ spaCy: Named entities

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

rows = []

doc.ents.each do |ent|
  rows << [ent.text, ent.start_char, ent.end_char, ent.label]
end

headings = ["text", "start_char", "end_char", "label"]
table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	start_char	end_char	label
Apple	0	5	ORG
U.K.	27	31	GPE
$1 billion	44	54	MONEY
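
The start_char and end_char values are character offsets into the original sentence, with the end offset exclusive. The sketch below shows how they can be used to recover or annotate the entity text in plain Ruby. The entity rows are transcribed from the example rather than read from a live doc, and the helper names are illustrative.

```ruby
SENTENCE = "Apple is looking at buying U.K. startup for $1 billion"

# [text, start_char, end_char, label], transcribed from the example.
ENTITIES = [
  ["Apple", 0, 5, "ORG"],
  ["U.K.", 27, 31, "GPE"],
  ["$1 billion", 44, 54, "MONEY"]
].freeze

# An exclusive range recovers the entity text from its offsets.
def slice_entity(sentence, start_char, end_char)
  sentence[start_char...end_char]
end

# Wraps each entity as [text](LABEL). Working from the last entity backwards
# keeps the earlier offsets valid while the string grows.
def annotate(sentence, entities)
  text = sentence.dup
  entities.sort_by { |entity| -entity[1] }.each do |_, start_char, end_char, label|
    text[start_char...end_char] = "[#{text[start_char...end_char]}](#{label})"
  end
  text
end

ENTITIES.each do |text, start_char, end_char, _label|
  puts "#{text.inspect} == #{slice_entity(SENTENCE, start_char, end_char).inspect}"
end
puts annotate(SENTENCE, ENTITIES)
```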

Named entity recognition (Japanese)

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("ja_core_news_lg")

sentence = "任天堂は1983年にファミコンを14,800円で発売した。"
doc = nlp.read(sentence)

rows = []

doc.ents.each do |ent|
  rows << [ent.text, ent.start_char, ent.end_char, ent.label]
end

headings = ["text", "start", "end", "label"]
table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	start	end	label
任天堂	0	3	ORG
1983年	4	9	DATE
ファミコン	10	15	PRODUCT
14,800円	16	23	MONEY
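
For Japanese text it matters that these offsets count characters, not bytes. Ruby's String#[] also indexes by character for UTF-8 strings, so the same slicing works unchanged. The sketch below uses entity rows transcribed from the example; entity_text is a hypothetical helper name.

```ruby
JA_SENTENCE = "任天堂は1983年にファミコンを14,800円で発売した。"

# [text, start, end, label], transcribed from the example.
JA_ENTITIES = [
  ["任天堂", 0, 3, "ORG"],
  ["1983年", 4, 9, "DATE"],
  ["ファミコン", 10, 15, "PRODUCT"],
  ["14,800円", 16, 23, "MONEY"]
].freeze

# Character-based slicing with an exclusive end offset.
def entity_text(sentence, start_char, end_char)
  sentence[start_char...end_char]
end

JA_ENTITIES.each do |text, start_char, end_char, label|
  puts "#{label}\t#{entity_text(JA_SENTENCE, start_char, end_char)}\t(expected #{text})"
end

# The byte count is larger than the character count for multibyte text,
# which is why byte-based slicing would not line up with these offsets.
puts "characters: #{JA_SENTENCE.length}, bytes: #{JA_SENTENCE.bytesize}"
```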

Checking availability of word vectors

→ spaCy: Word vectors and similarity

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_lg")
doc = nlp.read("dog cat banana afskfsd")

rows = []

doc.each do |token|
  rows << [token.text, token.has_vector, token.vector_norm, token.is_oov]
end

headings = ["text", "has_vector", "vector_norm", "is_oov"]
table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	has_vector	vector_norm	is_oov
dog	true	7.0336733	false
cat	true	6.6808186	false
banana	true	6.700014	false
afskfsd	false	0.0	true
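
The vector_norm column is the L2 (Euclidean) norm of the token's vector, which is why an out-of-vocabulary token with an all-zero vector reports 0.0. The sketch below computes the same quantity on small made-up vectors in plain Ruby. Real model vectors have hundreds of dimensions, and l2_norm is an illustrative helper, not a ruby-spacy method.

```ruby
# L2 norm: the square root of the sum of squared components.
def l2_norm(vector)
  Math.sqrt(vector.sum { |x| x * x })
end

puts l2_norm([3.0, 4.0])      # a 3-4-5 right triangle, so the norm is 5.0
puts l2_norm([0.0, 0.0, 0.0]) # an all-zero vector has norm 0.0
```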

Similarity calculation

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("en_core_web_lg")
doc1 = nlp.read("I like salty fries and hamburgers.")
doc2 = nlp.read("Fast food tastes very good.")

puts "Doc 1: " + doc1.text
puts "Doc 2: " + doc2.text
puts "Similarity: #{doc1.similarity(doc2)}"

Output:

Doc 1: I like salty fries and hamburgers.
Doc 2: Fast food tastes very good.
Similarity: 0.7687607012190486
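
By default, spaCy computes this score as the cosine similarity between the two documents' vectors, each of which is by default the average of its token vectors. The sketch below illustrates cosine similarity in plain Ruby on made-up vectors. It is not how ruby-spacy implements the call (that is delegated to spaCy), and returning 0.0 for a zero vector is a choice made for this sketch.

```ruby
# Dot product of two equal-length vectors.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Cosine of the angle between a and b; 0.0 if either vector is all zeros.
def cosine_similarity(a, b)
  norm_product = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b))
  return 0.0 if norm_product.zero?

  dot(a, b) / norm_product
end

puts cosine_similarity([1.0, 2.0], [2.0, 4.0]) # same direction, close to 1.0
puts cosine_similarity([1.0, 0.0], [0.0, 1.0]) # orthogonal, 0.0
```

Because only the direction of the vectors matters, two documents of very different lengths can still score close to 1.0.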

Similarity calculation (Japanese)

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("ja_core_news_lg")
ja_doc1 = nlp.read("今日は雨ばっかり降って、嫌な天気ですね。")
puts "doc1: #{ja_doc1.text}"
ja_doc2 = nlp.read("あいにくの悪天候で残念です。")
puts "doc2: #{ja_doc2.text}"
puts "Similarity: #{ja_doc1.similarity(ja_doc2)}"

Output:

doc1: 今日は雨ばっかり降って、嫌な天気ですね。
doc2: あいにくの悪天候で残念です。
Similarity: 0.8684192637149641

Word vector calculation

Tokyo - Japan + France = Paris ?

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_lg")

tokyo = nlp.get_lexeme("Tokyo")
japan = nlp.get_lexeme("Japan")
france = nlp.get_lexeme("France")

query = tokyo.vector - japan.vector + france.vector

headings = ["rank", "text", "score"]
rows = []

results = nlp.most_similar(query, 10)
results.each_with_index do |lexeme, i|
  index = (i + 1).to_s
  rows << [index, lexeme.text, lexeme.score]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

rank	text	score
1	FRANCE	0.8346999883651733
2	France	0.8346999883651733
3	france	0.8346999883651733
4	PARIS	0.7703999876976013
5	paris	0.7703999876976013
6	Paris	0.7703999876976013
7	TOULOUSE	0.6381999850273132
8	Toulouse	0.6381999850273132
9	toulouse	0.6381999850273132
10	marseille	0.6370999813079834
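
The idea behind this query is plain vector arithmetic followed by a nearest-neighbour ranking. The sketch below reproduces it in pure Ruby with tiny made-up 3-dimensional vectors, in which the first axis stands for "capital city", the second for "Japan", and the third for "France". The vectors and helper names are illustrative assumptions, and most_similar here is a toy function with its own signature, not the ruby-spacy method.

```ruby
# Made-up toy vectors: [capital-ness, Japan-ness, France-ness]
TOY_VECTORS = {
  "Tokyo"  => [1.0, 1.0, 0.0],
  "Japan"  => [0.0, 1.0, 0.0],
  "France" => [0.0, 0.0, 1.0],
  "Paris"  => [1.0, 0.0, 1.0]
}.freeze

def vec_add(a, b)
  a.zip(b).map { |x, y| x + y }
end

def vec_sub(a, b)
  a.zip(b).map { |x, y| x - y }
end

# Cosine similarity; 0.0 if either vector is all zeros.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norms.zero? ? 0.0 : dot / norms
end

# Computes a - b + c for three words in the toy vocabulary.
def analogy_query(vectors, a, b, c)
  vec_add(vec_sub(vectors.fetch(a), vectors.fetch(b)), vectors.fetch(c))
end

# Ranks every word in the vocabulary by cosine similarity to the query.
def most_similar(query, vectors, n)
  vectors.map { |word, vec| [word, cosine(query, vec)] }
         .sort_by { |_, score| -score }
         .first(n)
end

query = analogy_query(TOY_VECTORS, "Tokyo", "Japan", "France")
most_similar(query, TOY_VECTORS, 3).each_with_index do |(word, score), i|
  puts "#{i + 1}\t#{word}\t#{score.round(4)}"
end
```

As in the real output, the input words themselves can rank highly, because the query vector still contains much of their direction.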

Word vector calculation (Japanese)

東京 - 日本 + フランス = パリ ?

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("ja_core_news_lg")

tokyo = nlp.get_lexeme("東京")
japan = nlp.get_lexeme("日本")
france = nlp.get_lexeme("フランス")

query = tokyo.vector - japan.vector + france.vector

headings = ["rank", "text", "score"]
rows = []

results = nlp.most_similar(query, 10)
results.each_with_index do |lexeme, i|
  index = (i + 1).to_s
  rows << [index, lexeme.text, lexeme.score]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

rank	text	score
1	パリ	0.7376999855041504
2	フランス	0.7221999764442444
3	東京	0.6697999835014343
4	ストラスブール	0.631600022315979
5	リヨン	0.5939000248908997
6	Paris	0.574400007724762
7	ベルギー	0.5683000087738037
8	ニース	0.5679000020027161
9	アルザス	0.5644999742507935
10	南仏	0.5547999739646912

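OpenAI API Integration

This feature is currently experimental. Details are subject to change. Please refer to OpenAI's API reference and Ruby OpenAI for available parameters (max_tokens, temperature, etc.).

Easily leverage GPT models within ruby-spacy by using an OpenAI API key. When constructing prompts for the Doc::openai_query method, you can incorporate the following token properties of the document. These properties are retrieved through function calls (made internally by GPT when necessary) and seamlessly integrated into your prompt. Note that function calls require gpt-4o-mini or greater. The available properties include:

surface
lemma
tag
pos (part-of-speech)
dep (dependency)
ent_type (entity type)
morphology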

GPT Prompting (Translation)

Ruby code:

require "ruby-spacy"

api_key = ENV["OPENAI_API_KEY"]
nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("The Beatles released 12 studio albums")

# default parameter values
# max_tokens: 1000
# temperature: 0.7
# model: "gpt-4o-mini"
res1 = doc.openai_query(
  access_token: api_key,
  prompt: "Translate the text to Japanese."
)
puts res1

Output:

ビートルズは12枚のスタジオアルバムをリリースしました。
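
When several queries share the same settings, it can help to build the keyword arguments in one place and fail early if the API key is missing. The following is a minimal sketch under assumptions: the helper openai_query_options is hypothetical, the defaults are the ones listed in the comments of the example above, and it assumes max_tokens, temperature, and model can be passed to openai_query as keyword arguments alongside access_token and prompt.

```ruby
# Defaults as listed in the example's comments.
OPENAI_DEFAULTS = { max_tokens: 1000, temperature: 0.7, model: "gpt-4o-mini" }.freeze

# Hypothetical helper: merges overrides into the defaults and adds the API key.
# `env` is injectable so the helper can be exercised without a real key.
def openai_query_options(prompt:, env: ENV, **overrides)
  token = env["OPENAI_API_KEY"].to_s.strip
  raise ArgumentError, "OPENAI_API_KEY is not set" if token.empty?

  unknown = overrides.keys - OPENAI_DEFAULTS.keys
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  OPENAI_DEFAULTS.merge(overrides).merge(access_token: token, prompt: prompt)
end

options = openai_query_options(
  prompt: "Translate the text to Japanese.",
  env: { "OPENAI_API_KEY" => "sk-example" }, # placeholder key for illustration
  temperature: 0.2
)
p options
# With a real doc and key, the result could then be passed on as:
#   doc.openai_query(**options)
```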

GPT Prompting (Elaboration)

Ruby code:

require "ruby-spacy"

api_key = ENV["OPENAI_API_KEY"]
nlp = Spacy
