PsychWordVec
Version 1.0.0
Han-Wu-Shuang (Bruce) Bao 包寒吳霜
https://psychbruce.github.io
```r
library(PsychWordVec)
citation("PsychWordVec")  # the APA-7 format of your installed version
```

## Method 1: Install from CRAN
```r
install.packages("PsychWordVec")
```
## Method 2: Install from GitHub
```r
install.packages("devtools")
devtools::install_github("psychbruce/PsychWordVec", force = TRUE)
```
| PsychWordVec | `embed` | `wordvec` |
|---|---|---|
| Basic class | matrix | data.table |
| Row size | vocabulary size | vocabulary size |
| Column size | dimension size | 2 (variables: `word`, `vec`) |
| Advantage | faster (with matrix operations) | easier to inspect and manage |
| Function to get | `as_embed()` | `as_wordvec()` |
| Function to load | `load_embed()` | `load_wordvec()` |
Note: *Word embedding* refers to a natural language processing technique that embeds word semantics into a low-dimensional embedding matrix, with each word (actually token) quantified as a numeric vector. Users are suggested to import word vectors data as the `embed` class using the function `load_embed()`, which would automatically normalize all word vectors to the unit length 1 (see the `normalize()` function of PsychWordVec) and accelerate the running of most functions in PsychWordVec.
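A minimal sketch of converting between the two classes. Here `demodata` is assumed to be the demo word vectors bundled with the package, and the `normalize` argument and the file name in the last line are assumptions for illustration; see the help pages for the exact interfaces.

```r
library(PsychWordVec)

# wordvec (data.table) -> embed (matrix);
# the `normalize` argument is an assumption for illustration
embed = as_embed(demodata, normalize = TRUE)

# embed (matrix) -> wordvec (data.table)
wordvec = as_wordvec(embed)

# Load pretrained vectors from a local file as the embed class
# (hypothetical file name):
# wv = load_embed("GoogleNews.word2vec.RData")
```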
- `as_embed()`: from `wordvec` (data.table) to `embed` (matrix)
- `as_wordvec()`: from `embed` (matrix) to `wordvec` (data.table)
- `load_embed()`: load word embeddings data as `embed` (matrix)
- `load_wordvec()`: load word embeddings data as `wordvec` (data.table)
- `data_transform()`: transform plain-text word vectors to `wordvec` or `embed`
- `subset()`: extract a subset of `wordvec` and `embed`
- `normalize()`: normalize all word vectors to the unit length 1
- `get_wordvec()`: extract word vectors
- `sum_wordvec()`: calculate the sum vector of multiple words
- `plot_wordvec()`: visualize word vectors
- `plot_wordvec_tSNE()`: 2D or 3D visualization with t-SNE
- `orth_procrustes()`: Orthogonal Procrustes matrix alignment
- `cosine_similarity()`: `cos_sim()` or `cos_dist()`
- `pair_similarity()`: compute a similarity matrix of word pairs
- `plot_similarity()`: visualize similarities of word pairs
- `tab_similarity()`: tabulate similarities of word pairs
- `most_similar()`: find the Top-N most similar words
- `plot_network()`: visualize a (partial correlation) network graph of words
- `test_WEAT()`: WEAT and SC-WEAT with permutation test of significance
- `test_RND()`: RND with permutation test of significance
- `dict_expand()`: expand a dictionary from the most similar words
- `dict_reliability()`: reliability analysis and PCA of a dictionary
- `tokenize()`: tokenize raw text
- `train_wordvec()`: train static word embeddings
- `text_init()`: set up a Python environment for PLM
- `text_model_download()`: download PLMs from Hugging Face to local ".cache" folder
- `text_model_remove()`: remove PLMs from local ".cache" folder
- `text_to_vec()`: extract contextualized token and text embeddings
- `text_unmask()`: (deprecated; please use FMAT) fill in the blank mask(s) in a query

See the documentation (help pages) for their usage and details.
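A hedged sketch of a typical analysis workflow with some of the functions listed above. The bundled `demodata` word vectors and the exact argument names (`words`, `topn`, and the WEAT targets/attributes `T1`, `T2`, `A1`, `A2`) are assumptions; check the help pages for the actual interfaces.

```r
library(PsychWordVec)

wv = as_embed(demodata)  # use the faster matrix-based class

# Similarities of word pairs and nearest neighbors
# (argument names are assumptions):
pair_similarity(wv, words = c("king", "queen", "man", "woman"))
most_similar(wv, "king", topn = 5)

# Word Embedding Association Test (WEAT) with a permutation test;
# target/attribute argument names are assumptions:
test_WEAT(
  wv,
  T1 = c("male", "man"), T2 = c("female", "woman"),
  A1 = c("science", "math"), A2 = c("arts", "literature")
)
```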