ruby-spacy

Overview

ruby-spacy is a wrapper module for using spaCy from the Ruby programming language via PyCall. It aims to make spaCy easy and natural to use for Ruby programmers. The module covers the parts of spaCy's functionality concerned with using its many language models, not with building them.

	Functionality
✅	Tokenization, lemmatization, sentence segmentation
✅	Part-of-speech tagging and dependency parsing
✅	Named entity recognition
✅	Syntactic dependency visualization
✅	Access to pre-trained word vectors
✅	OpenAI Chat/Completion/Embeddings API integration

Current version: 0.2.3

spaCy 3.7.0 supported
OpenAI API integration

Installation of Prerequisites

IMPORTANT: Make sure that the enable-shared option is enabled in your Python installation. You can use pyenv to install any version of Python you like. For instance, install Python 3.10.6 using pyenv with enable-shared as follows:

$ env CONFIGURE_OPTS="--enable-shared" pyenv install 3.10.6

Remember to make this Python version accessible from your working directory. Setting the version you just installed as the global one is recommended:

$ pyenv global 3.10.6

Then install spaCy. If you use pip, the following command will do:

$ pip install spacy

Install trained language models. For a start, en_core_web_sm is the most useful model for basic text processing in English. However, if you want to use advanced features of spaCy, such as named entity recognition or document similarity calculation, you should also install a larger model such as en_core_web_lg.

$ python -m spacy download en_core_web_sm
$ python -m spacy download en_core_web_lg

See Spacy: Models & Languages for other models in various languages. To install models for Japanese, for instance, you can proceed as follows:

$ python -m spacy download ja_core_news_sm
$ python -m spacy download ja_core_news_lg

Installation of ruby-spacy

Add this line to your application's Gemfile:

 gem 'ruby-spacy'

And then execute:

 $ bundle install

Or install it yourself as:

 $ gem install ruby-spacy

Usage

See Examples below.

Examples

Many of the following examples are Python-to-Ruby translations of code snippets in spaCy 101. For more examples, see the examples directory.

Tokenization

→ spaCy: Tokenization

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")

doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

row = []

doc.each do |token|
  row << token.text
end

# Number the columns from 1 up to the token count (11 for this sentence)
headings = (1..row.size).to_a
table = Terminal::Table.new rows: [row], headings: headings

puts table

Output:

1	2	3	4	5	6	7	8	9	10	11
Apple	is	looking	at	buying	U.K.	startup	for	$	1	billion

Part-of-speech and Dependency

→ spaCy: Part-of-speech tags and dependencies

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

headings = ["text", "lemma", "pos", "tag", "dep"]
rows = []

doc.each do |token|
  rows << [token.text, token.lemma, token.pos, token.tag, token.dep]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	lemma	pos	tag	dep
Apple	Apple	PROPN	NNP	nsubj
is	be	AUX	VBZ	aux
looking	look	VERB	VBG	ROOT
at	at	ADP	IN	prep
buying	buy	VERB	VBG	pcomp
U.K.	U.K.	PROPN	NNP	dobj
startup	startup	NOUN	NN	advcl
for	for	ADP	IN	prep
$	$	SYM	$	quantmod
1	1	NUM	CD	compound
billion	billion	NUM	CD	pobj

Part-of-speech and Dependency (Japanese)

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("ja_core_news_lg")
doc = nlp.read("任天堂は1983年にファミコンを14,800円で発売した。")

headings = ["text", "lemma", "pos", "tag", "dep"]
rows = []

doc.each do |token|
  rows << [token.text, token.lemma, token.pos, token.tag, token.dep]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	lemma	pos	tag	dep
任天堂	任天堂	PROPN	名詞-固有名詞-一般	nsubj
は	は	ADP	助詞-係助詞	case
1983	1983	NUM	名詞-数詞	nummod
年	年	NOUN	名詞-普通名詞-助数詞可能	obl
に	に	ADP	助詞-格助詞	case
ファミコン	ファミコン	NOUN	名詞-普通名詞-一般	obj
を	を	ADP	助詞-格助詞	case
14,800	14,800	NUM	名詞-数詞	fixed
円	円	NOUN	名詞-普通名詞-助数詞可能	obl
で	で	ADP	助詞-格助詞	case
発売	発売	VERB	名詞-普通名詞-サ変可能	ROOT
し	する	AUX	動詞-非自立可能	aux
た	た	AUX	助動詞	aux
。	。	PUNCT	補助記号-句点	punct

Morphology

→ POS and morphology tags

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

headings = ["text", "shape", "is_alpha", "is_stop", "morphology"]
rows = []

doc.each do |token|
  # Render each morphological feature as "key = value", one per line
  morph = token.morphology.map do |k, v|
    "#{k} = #{v}"
  end.join("\n")
  rows << [token.text, token.shape, token.is_alpha, token.is_stop, morph]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	shape	is_alpha	is_stop	morphology
Apple	Xxxxx	true	false	NounType = Prop, Number = Sing
is	xx	true	true	Mood = Ind, Number = Sing, Person = 3, Tense = Pres, VerbForm = Fin
looking	xxxx	true	false	Aspect = Prog, Tense = Pres, VerbForm = Part
at	xx	true	true
buying	xxxx	true	false	Aspect = Prog, Tense = Pres, VerbForm = Part
U.K.	X.X.	false	false	NounType = Prop, Number = Sing
startup	xxxx	true	false	Number = Sing
for	xxx	true	true
$	$	false	false
1	d	false	false	NumType = Card
billion	xxxx	true	false	NumType = Card
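
spaCy itself serializes morphological features in the Universal Dependencies FEATS format, for example Number=Sing|Person=3, while the Ruby code in this section iterates over key-value pairs. The following is a minimal plain-Ruby sketch of how the two representations relate. The helper names parse_feats and format_feats are hypothetical and are not part of ruby-spacy.

```ruby
# Parse a FEATS string such as "Number=Sing|Person=3" into a Hash.
# An empty string or "_" (the FEATS placeholder for "no features") gives {}.
def parse_feats(feats)
  return {} if feats.nil? || feats.empty? || feats == "_"

  feats.split("|").to_h { |pair| pair.split("=", 2) }
end

# Render a feature Hash as one "key = value" pair per line,
# the same layout the morphology column is built from.
def format_feats(features)
  features.map { |key, value| "#{key} = #{value}" }.join("\n")
end

features = parse_feats("Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin")
puts format_feats(features)
```

Keeping the features in a Hash makes it easy to filter tokens, for example by checking features["Tense"] before adding a row.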

Visualizing Dependency

→ spaCy: Visualizers

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("en_core_web_sm")

sentence = "Autonomous cars shift insurance liability toward manufacturers"
doc = nlp.read(sentence)

dep_svg = doc.displacy(style: "dep", compact: false)

# Save the rendered SVG markup to a file
File.open("test_dep.svg", "w") do |file|
  file.write(dep_svg)
end

Output: an SVG dependency diagram, saved as test_dep.svg.

Visualizing Dependency (Compact)

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("en_core_web_sm")

sentence = "Autonomous cars shift insurance liability toward manufacturers"
doc = nlp.read(sentence)

dep_svg = doc.displacy(style: "dep", compact: true)

# Save the rendered SVG markup to a file
File.open("test_dep_compact.svg", "w") do |file|
  file.write(dep_svg)
end

Output: a compact SVG dependency diagram, saved as test_dep_compact.svg.

Named Entity Recognition

→ spaCy: Named entities

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")

rows = []

doc.ents.each do |ent|
  rows << [ent.text, ent.start_char, ent.end_char, ent.label]
end

headings = ["text", "start_char", "end_char", "label"]
table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	start_char	end_char	label
Apple	0	5	ORG
U.K.	27	31	GPE
$1 billion	44	54	MONEY
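
The start_char and end_char columns are character offsets into the original text, with end_char exclusive, so each entity can be recovered by slicing the input string. The following is a minimal plain-Ruby sketch of that relationship. The entities array is copied by hand from the offsets in the table and the snippet does not call ruby-spacy.

```ruby
text = "Apple is looking at buying U.K. startup for $1 billion"

# [start_char, end_char, label] triples copied by hand from the table
entities = [
  [0, 5, "ORG"],
  [27, 31, "GPE"],
  [44, 54, "MONEY"]
]

# A three-dot range excludes end_char, matching the exclusive end offset
spans = entities.map do |start_char, end_char, label|
  [text[start_char...end_char], label]
end

spans.each { |surface, label| puts "#{surface} (#{label})" }
```

The same slicing works for the Japanese example, because Ruby strings are indexed by character rather than by byte.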

Named Entity Recognition (Japanese)

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("ja_core_news_lg")

sentence = "任天堂は1983年にファミコンを14,800円で発売した。"
doc = nlp.read(sentence)

rows = []

doc.ents.each do |ent|
  rows << [ent.text, ent.start_char, ent.end_char, ent.label]
end

headings = ["text", "start", "end", "label"]
table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	start	end	label
任天堂	0	3	ORG
1983年	4	9	DATE
ファミコン	10	15	PRODUCT
14,800円	16	23	MONEY

Checking Availability of Word Vectors

→ spaCy: Word vectors and similarity

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_lg")
doc = nlp.read("dog cat banana afskfsd")

rows = []

doc.each do |token|
  rows << [token.text, token.has_vector, token.vector_norm, token.is_oov]
end

headings = ["text", "has_vector", "vector_norm", "is_oov"]
table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

text	has_vector	vector_norm	is_oov
dog	true	7.0336733	false
cat	true	6.6808186	false
banana	true	6.700014	false
afskfsd	false	0.0	true

Similarity Calculation

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("en_core_web_lg")
doc1 = nlp.read("I like salty fries and hamburgers.")
doc2 = nlp.read("Fast food tastes very good.")

puts "Doc 1: " + doc1.text
puts "Doc 2: " + doc2.text
puts "Similarity: #{doc1.similarity(doc2)}"

Output:

Doc 1: I like salty fries and hamburgers.
Doc 2: Fast food tastes very good.
Similarity: 0.7687607012190486
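
By default, spaCy computes similarity as the cosine of the angle between the two documents' vectors, where a document vector is the average of its token vectors. The following is a minimal plain-Ruby sketch of that calculation. The three-dimensional token vectors are made up for illustration (en_core_web_lg vectors have 300 dimensions), and returning 0.0 for a zero vector is a guard chosen for this sketch.

```ruby
# Average a list of equal-length token vectors into one document vector
def mean_vector(vectors)
  dims = vectors.first.size
  (0...dims).map { |i| vectors.sum { |v| v[i] } / vectors.size.to_f }
end

# Cosine similarity: dot product divided by the product of the norms
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Hypothetical token vectors, not taken from any spaCy model
doc1_tokens = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
doc2_tokens = [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

doc1_vector = mean_vector(doc1_tokens)
doc2_vector = mean_vector(doc2_tokens)

puts "Similarity: #{cosine_similarity(doc1_vector, doc2_vector)}"
```

Because the token vectors are averaged first, two documents with different words can still score close to 1 when their averages point in the same direction, as the toy vectors here do.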

Similarity Calculation (Japanese)

Ruby code:

require "ruby-spacy"

nlp = Spacy::Language.new("ja_core_news_lg")
ja_doc1 = nlp.read("今日は雨ばっかり降って、嫌な天気ですね。")
puts "doc1: #{ja_doc1.text}"
ja_doc2 = nlp.read("あいにくの悪天候で残念です。")
puts "doc2: #{ja_doc2.text}"
puts "Similarity: #{ja_doc1.similarity(ja_doc2)}"

Output:

doc1: 今日は雨ばっかり降って、嫌な天気ですね。
doc2: あいにくの悪天候で残念です。
Similarity: 0.8684192637149641

Word Vector Calculation

Tokyo - Japan + France = Paris ?

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("en_core_web_lg")

tokyo = nlp.get_lexeme("Tokyo")
japan = nlp.get_lexeme("Japan")
france = nlp.get_lexeme("France")

query = tokyo.vector - japan.vector + france.vector

headings = ["rank", "text", "score"]
rows = []

results = nlp.most_similar(query, 10)
results.each_with_index do |lexeme, i|
  index = (i + 1).to_s
  rows << [index, lexeme.text, lexeme.score]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

rank	text	score
1	FRANCE	0.8346999883651733
2	France	0.8346999883651733
3	france	0.8346999883651733
4	PARIS	0.7703999876976013
5	paris	0.7703999876976013
6	Paris	0.7703999876976013
7	TOULOUSE	0.6381999850273132
8	Toulouse	0.6381999850273132
9	toulouse	0.6381999850273132
10	marseille	0.6370999813079834
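
The query in this section is ordinary element-wise vector arithmetic followed by a nearest-neighbour search. The sketch below reproduces the idea in plain Ruby with a toy vocabulary. The two-dimensional vectors and the most_similar helper are hypothetical stand-ins for illustration, not the ruby-spacy implementation, and ranking by cosine similarity is an assumption of this sketch.

```ruby
# Toy 2-D "embeddings": first axis ~ country, second axis ~ capital-ness.
# These numbers are invented for illustration only.
vocab = {
  "Tokyo" => [1.0, 1.0],
  "Japan" => [1.0, 0.0],
  "France" => [3.0, 0.0],
  "Paris" => [3.0, 1.0],
  "Berlin" => [2.0, 1.0]
}

def subtract(a, b)
  a.zip(b).map { |x, y| x - y }
end

def add(a, b)
  a.zip(b).map { |x, y| x + y }
end

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Hypothetical helper: score every vocabulary entry against the query
# and return the `count` best [word, score] pairs, best first.
def most_similar(query, vocab, count)
  vocab
    .map { |word, vector| [word, cosine(query, vector)] }
    .max_by(count) { |_word, score| score }
end

# Tokyo - Japan + France
query = add(subtract(vocab["Tokyo"], vocab["Japan"]), vocab["France"])

results = most_similar(query, vocab, 3)
results.each_with_index do |(word, score), i|
  puts "#{i + 1}\t#{word}\t#{score.round(4)}"
end
```

In this toy space the query lands exactly on the vector for Paris. With real model vectors the input words themselves tend to rank highly, which is why FRANCE appears at the top of the table.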

Word Vector Calculation (Japanese)

東京 - 日本 + フランス = パリ ?

Ruby code:

require "ruby-spacy"
require "terminal-table"

nlp = Spacy::Language.new("ja_core_news_lg")

tokyo = nlp.get_lexeme("東京")
japan = nlp.get_lexeme("日本")
france = nlp.get_lexeme("フランス")

query = tokyo.vector - japan.vector + france.vector

headings = ["rank", "text", "score"]
rows = []

results = nlp.most_similar(query, 10)
results.each_with_index do |lexeme, i|
  index = (i + 1).to_s
  rows << [index, lexeme.text, lexeme.score]
end

table = Terminal::Table.new rows: rows, headings: headings
puts table

Output:

rank	text	score
1	パリ	0.7376999855041504
2	フランス	0.7221999764442444
3	東京	0.6697999835014343
4	ストラスブール	0.631600022315979
5	リヨン	0.5939000248908997
6	Paris	0.574400007724762
7	ベルギー	0.5683000087738037
8	ニース	0.5679000020027161
9	アルザス	0.5644999742507935
10	南仏	0.5547999739646912

OpenAI API Integration

This feature is currently experimental, and details are subject to change. For the available parameters (max_tokens, temperature, etc.), refer to OpenAI's API reference and to Ruby OpenAI.

You can use GPT models from within ruby-spacy by supplying an OpenAI API key. When constructing prompts for the Doc::openai_query method, you can incorporate the following token properties of the document. These properties are retrieved through function calls (made internally by GPT when necessary) and integrated into your prompt. Note that function calls require gpt-4o-mini or greater. The available properties include:

surface
lemma
tag
pos (part-of-speech)
dep (dependency)
ent_type (entity type)
morphology

GPT Prompting (Translation)

Ruby code:

require "ruby-spacy"

api_key = ENV["OPENAI_API_KEY"]
nlp = Spacy::Language.new("en_core_web_sm")
doc = nlp.read("The Beatles released 12 studio albums")

# default parameter values
# max_tokens: 1000
# temperature: 0.7
# model: "gpt-4o-mini"
res1 = doc.openai_query(
  access_token: api_key,
  prompt: "Translate the text to Japanese."
)
puts res1

Output:

ビートルズは12枚のスタジオアルバムをリリースしました。

GPT Prompting (Elaboration)

Ruby code:

require "ruby-spacy"

api_key = ENV["OPENAI_API_KEY"]
nlp = Spacy
