PsychWordVec
Version 1.0.0
Han-Wu-Shuang (Bruce) Bao 包寒吳霜
https://psychbruce.github.io
```r
library(PsychWordVec)
citation("PsychWordVec")  # the APA-7 format of your installed version
```

## Method 1: Install from CRAN
```r
install.packages("PsychWordVec")
```
## Method 2: Install from GitHub
```r
install.packages("devtools")
devtools::install_github("psychbruce/PsychWordVec", force = TRUE)
```
| PsychWordVec | `embed` | `wordvec` |
|---|---|---|
| Basic class | matrix | data.table |
| Row size | vocabulary size | vocabulary size |
| Column size | dimension size | 2 (variables: `word`, `vec`) |
| Advantage | faster (with matrix operations) | easier to inspect and manage |
| Function to get | `as_embed()` | `as_wordvec()` |
| Function to load | `load_embed()` | `load_wordvec()` |
Note: *Word embedding* refers to a natural language processing technique that embeds word semantics into a low-dimensional embedding matrix, with each word (actually token) quantified as a numeric vector. Users are suggested to import word vectors data as the `embed` class using the function `load_embed()`, which would automatically normalize all word vectors to the unit length 1 (see the `normalize()` function of PsychWordVec) and accelerate the running of most functions in PsychWordVec.
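A minimal sketch of converting between the two classes. Here `demodata` is assumed to be the demo word vectors bundled with the package, and the `normalize` argument and the file name in the last line are assumptions for illustration; see the help pages for the exact interfaces.

```r
library(PsychWordVec)

# wordvec (data.table) -> embed (matrix);
# the `normalize` argument is an assumption for illustration
embed = as_embed(demodata, normalize = TRUE)

# embed (matrix) -> wordvec (data.table)
wordvec = as_wordvec(embed)

# Load pretrained vectors from a local file as the embed class
# (hypothetical file name):
# wv = load_embed("GoogleNews.word2vec.RData")
```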
- `as_embed()`: from `wordvec` (data.table) to `embed` (matrix)
- `as_wordvec()`: from `embed` (matrix) to `wordvec` (data.table)
- `load_embed()`: load word embeddings data as `embed` (matrix)
- `load_wordvec()`: load word embeddings data as `wordvec` (data.table)
- `data_transform()`: transform plain-text word vectors to `wordvec` or `embed`
- `subset()`: extract a subset of `wordvec` and `embed`
- `normalize()`: normalize all word vectors to the unit length 1
- `get_wordvec()`: extract word vectors
- `sum_wordvec()`: calculate the sum vector of multiple words
- `plot_wordvec()`: visualize word vectors
- `plot_wordvec_tSNE()`: 2D or 3D visualization with t-SNE
- `orth_procrustes()`: Orthogonal Procrustes matrix alignment
- `cosine_similarity()`: `cos_sim()` or `cos_dist()`
- `pair_similarity()`: compute a similarity matrix of word pairs
- `plot_similarity()`: visualize similarities of word pairs
- `tab_similarity()`: tabulate similarities of word pairs
- `most_similar()`: find the Top-N most similar words
- `plot_network()`: visualize a (partial correlation) network graph of words
- `test_WEAT()`: WEAT and SC-WEAT with permutation test of significance
- `test_RND()`: RND with permutation test of significance
- `dict_expand()`: expand a dictionary from the most similar words
- `dict_reliability()`: reliability analysis and PCA of a dictionary
- `tokenize()`: tokenize raw text
- `train_wordvec()`: train static word embeddings
- `text_init()`: set up a Python environment for PLM
- `text_model_download()`: download PLMs from Hugging Face to local ".cache" folder
- `text_model_remove()`: remove PLMs from local ".cache" folder
- `text_to_vec()`: extract contextualized token and text embeddings
- `text_unmask()`: (deprecated; please use FMAT) fill in the blank mask(s) in a query

See the documentation (help pages) for their usage and details.
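A hedged sketch of a typical analysis workflow with some of the functions listed above. The bundled `demodata` word vectors and the exact argument names (`words`, `topn`, and the WEAT targets/attributes `T1`, `T2`, `A1`, `A2`) are assumptions; check the help pages for the actual interfaces.

```r
library(PsychWordVec)

wv = as_embed(demodata)  # use the faster matrix-based class

# Similarities of word pairs and nearest neighbors
# (argument names are assumptions):
pair_similarity(wv, words = c("king", "queen", "man", "woman"))
most_similar(wv, "king", topn = 5)

# Word Embedding Association Test (WEAT) with a permutation test;
# target/attribute argument names are assumptions:
test_WEAT(
  wv,
  T1 = c("male", "man"), T2 = c("female", "woman"),
  A1 = c("science", "math"), A2 = c("arts", "literature")
)
```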