fouRplebsAPI

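The R package fouRplebsAPI lets researchers query the 4chan database archived by 4plebs.org. This database is the largest ongoing archive of the ephemeral posts on the imageboard 4chan. With this package, researchers can use the detailed search functionality provided by 4plebs and retrieve structured data on the communication on 4chan.

The package is based on the 4plebs API documentation.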

Citation

If fouRplebsAPI is helpful for your research, please cite it as:

Buehling, K. (2022). fouRplebsAPI: R package for accessing 4chan posts via the 4plebs.org API (Version 0.9.0). https://doi.org/10.5281/zenodo.6637440

Installation

You can install fouRplebsAPI from GitHub with:

# install.packages("devtools")
devtools::install_github("buehlk/fouRplebsAPI")

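The 4chan boards currently covered by 4plebs are:

Board name	Abbreviation
Politically Incorrect	pol
High Resolution	hr
Traditional Games	tg
Television & Film	tv
Paranormal	x
Shit 4chan Says	s4s
Auto	o
Advice	adv
Travel	trv
Flash	f
Sports	sp
My Little Politics	mlpol
Mecha & Auto	mo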

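Searching the 4chan archive

Although the package contains several functions for querying and inspecting specific 4chan posts (get_4chan_post) or threads (get_4chan_thread), researchers who want to collect data from the 4plebs archive are probably interested in gathering larger amounts of data.

The first way to collect data is to gather the latest threads on a given board. Suppose you are interested in the 20 most recent threads on the "Advice" board, without the comments attached to the opening posts. One way to query the data is: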

library(fouRplebsAPI)

recentAdv <- get_4chan_board_range(board = "adv", page_start = 1, page_stop = 2, latest_comments = FALSE)

str(recentAdv, vec.len = 1, nchar.max = 60)
# > 'data.frame':    20 obs. of  15 variables:
# >  $ thread_id          : chr  "26681983" ...
# >  $ doc_id             : chr  "12984655" ...
# >  $ num                : chr  "26681983" ...
# >  $ subnum             : chr  "0" ...
# >  $ op                 : num  1 1 ...
# >  $ timestamp          : int  1655111247 1655110365 ...
# >  $ fourchan_date      : chr  "6/13/22(Mon)5:07" ...
# >  $ name               : chr  "Anonymous" ...
# >  $ title              : logi  NA ...
# >  $ referencing_comment: logi  NA ...
# >  $ comments           : chr  "I have a very good friend. Maybe one of my "| __truncated__ ...
# >  $ poster_country     : logi  NA ...
# >  $ nreplies           : logi  NA ...
# >  $ formatted          : logi  FALSE ...
# >  $ media_link         : logi  NA ...

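A description of the output can be found in the function documentation. In theory, this function can be used to scrape a large range of the archive, although the API rate limit slows the querying process down.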
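As a small illustration of working with the returned data frame, the sketch below converts the `timestamp` column to a date-time. It assumes that `timestamp` holds Unix epoch seconds; this is not stated in this README, but it is consistent with the `fourchan_date` value shown in the output above. The data frame is a stand-in built from the two timestamp values in that output, not a live query.

```r
# Stand-in for the query result: only the two timestamp values shown above
posts <- data.frame(timestamp = c(1655111247L, 1655110365L))

# Assumption: timestamp is Unix epoch seconds; convert to UTC date-times
posts$posted_at <- as.POSIXct(posts$timestamp, origin = "1970-01-01", tz = "UTC")

format(posts$posted_at[1], "%Y-%m-%d %H:%M:%S")
# "2022-06-13 09:07:27"
```

The result is four hours ahead of the `fourchan_date` string "6/13/22(Mon)5:07", which would fit 4chan displaying US Eastern time.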

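The second way to collect 4chan data with this package is the search functionality. 4plebs allows very detailed searches with many search filters. I will only show a simple example of the data that can be collected with fouRplebsAPI.

The example I show here is a rather pleasant one, because I want to avoid the more controversial topics that 4chan, and the /pol/ board in particular, is notorious for. Researchers, for example those interested in the political communication of actors with contentious ideologies, will find it easy to adapt this example. This one, however, is about holidays.

Let's find the posts on the "Travel" board that discuss the Spanish island of Mallorca.

First, to get an initial impression of the search results, one can inspect a snippet of the first 25 posts containing the search term "mallorca" (the oldest ones, according to the message the function prints).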

mallorca_snippet <- search_4chan_snippet(boards = "trv", start_date = "2021-01-01", end_date = "2022-12-31", text = "mallorca")
# > The 1 - 25 oldest posts of the 78 total search results are shown.
# > Scraping all 78 results would take ~ 1.33 minutes.

str(mallorca_snippet, vec.len = 1, nchar.max = 60)
# > 'data.frame':    25 obs. of  15 variables:
# >  $ thread_id          : chr  "1938850" ...
# >  $ doc_id             : chr  "1113628" ...
# >  $ num                : chr  "1938924" ...
# >  $ subnum             : chr  "0" ...
# >  $ op                 : num  0 1 ...
# >  $ timestamp          : int  1610611412 1611403974 ...
# >  $ fourchan_date      : chr  "1/14/21(Thu)3:03" ...
# >  $ name               : chr  "Anonymous" ...
# >  $ title              : chr  NA ...
# >  $ referencing_comment: chr  "1938909n" ...
# >  $ comments           : chr  ">got murdered and/or raped in shitholes ove"| __truncated__ ...
# >  $ poster_country     : logi  NA ...
# >  $ nreplies           : int  NA 13 ...
# >  $ formatted          : logi  FALSE ...
# >  $ media_link         : chr  NA ...

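Note that search_4chan_snippet() also prints the total number of search results and the estimated time needed to retrieve them with search_4chan(). This estimate is based on the API limit of 5 requests per minute.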
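The printed estimates can be reproduced with simple arithmetic. The sketch below is inferred from the numbers shown in this README (1.33 minutes for 78 results, 0.33 minutes for 16 results), not taken from the package source: it assumes 25 results per request and about 20 seconds per request. The function name `estimate_minutes` and both defaults are hypothetical.

```r
# Hypothetical helper: estimated scraping time in minutes.
# Assumptions (inferred from the printed messages, not from the package code):
#   - each request returns up to 25 results
#   - requests are spaced about 20 seconds apart
estimate_minutes <- function(total_results, per_page = 25, seconds_per_request = 20) {
  n_requests <- ceiling(total_results / per_page)
  n_requests * seconds_per_request / 60
}

round(estimate_minutes(78), 2)  # 1.33
round(estimate_minutes(16), 2)  # 0.33
```

A strict pace of 5 requests per minute would correspond to 12 seconds per request, so the printed figures suggest that the estimate uses a more conservative spacing.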

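Users who are only interested in the number of results can retrieve it by setting the argument result_type to "results_num". Now we can compare the number of posts mentioning Mallorca across different time periods, for example before versus after the start of the pandemic: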

mallorca_pre <- search_4chan_snippet(boards = "trv", start_date = "2018-01-01", end_date = "2019-12-31", text = "mallorca", result_type = "results_num")
mallorca_post <- search_4chan_snippet(boards = "trv", start_date = "2020-01-01", end_date = "2021-12-31", text = "mallorca", result_type = "results_num")

data.frame("Years" = c("2018 & 2019", "2020 & 2021"),
           "Total results" = c(mallorca_pre["total_found"], mallorca_post["total_found"])
           )
# >         Years Total.results
# > 1 2018 & 2019            86
# > 2 2020 & 2021            99

It seems the island was mentioned more often while people tended to stay at home.
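To put the two counts in proportion, the relative change can be computed directly. The sketch below hard-codes the totals from the output above rather than querying the API; the variable names are illustrative only.

```r
# Totals copied from the output above (not a live query)
total_pre  <- 86  # 2018 & 2019
total_post <- 99  # 2020 & 2021

# Relative change in percent between the two periods
pct_change <- round(100 * (total_post - total_pre) / total_pre, 1)
pct_change
# 15.1
```

An increase of about 15% on counts this small is suggestive at most, since it does not account for changes in overall board activity.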

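Researchers interested in collecting more than just snippets of posts can use the function search_4chan(). Continuing the example of posts mentioning Mallorca over time, one might ask whether the image of Mallorca changed during the pandemic. Beyond simply retrieving all posts that mention the search term, one can also filter for posts that contain image data: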

mallorca_pre_pics <- search_4chan(boards = "trv", start_date = "2018-01-01", end_date = "2019-12-31", text = "mallorca", show_only = "image")
# > [1] "Approximate time: 0.33 minutes."
# Note: this call reuses the 2018-2019 date range, so the results shown below are from that period.
# For the later period, set start_date = "2020-01-01" and end_date = "2021-12-31".
mallorca_post_pics <- search_4chan_snippet(boards = "trv", start_date = "2018-01-01", end_date = "2019-12-31", text = "mallorca", show_only = "image")
# > The 1 - 16 oldest posts of the 16 total search results are shown.
# > Scraping all 16 results would take ~ 0.33 minutes.

head(mallorca_post_pics$media_link)
# > [1] "http://i.4pcdn.org/trv/1521713616876.jpg"
# > [2] "http://i.4pcdn.org/trv/1525249686528.jpg"
# > [3] "http://i.4pcdn.org/trv/1527534752103.jpg"
# > [4] "http://i.4pcdn.org/trv/1527865867839.jpg"
# > [5] "http://i.4pcdn.org/trv/1533082869505.jpg"
# > [6] "http://i.4pcdn.org/trv/1547062117808.jpg"
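The file names in these links look like upload times. The sketch below decodes them under the assumption that each file name is a Unix timestamp in milliseconds; that naming convention is not documented in this README, so treat the result as a plausibility check only. The links are copied from the output above.

```r
# Two of the media links shown above
links <- c("http://i.4pcdn.org/trv/1521713616876.jpg",
           "http://i.4pcdn.org/trv/1547062117808.jpg")

# Assumption: the file name (without extension) is a Unix timestamp in milliseconds
file_ids <- tools::file_path_sans_ext(basename(links))
uploaded <- as.POSIXct(as.numeric(file_ids) / 1000, origin = "1970-01-01", tz = "UTC")

format(uploaded, "%Y-%m-%d")
```

Under that assumption, both dates fall within the 2018 to 2019 range used in the query.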