Research on a Cyberbullying Text Detection Model Based on the SHAP Explanation Tool

LIU Dong; LIU Ruili; WENG Haiguang

2024, Vol. 30, No. 03, pp. 59-69

1. Department of Information Technology and Cybersecurity, Shanghai Police College

Foundation: Shanghai Police College Research Project (23xkx53)

Submitted: 2024-04-25

Final review: 2024-12-24

Review cycle (years): 1

Released: 2024-08-15

Published: 2024-08-15

Abstract:

To quickly identify whether text on social media platforms constitutes cyberbullying, a cyberbullying text detection model based on RoBERTa-BiGRU is proposed. First, the model uses pretrained RoBERTa to extract semantic features from the text and a BiGRU to further synthesize and refine those features. Second, the classification performance of the RoBERTa-BiGRU model is evaluated on the cyberbullying text detection dataset CB-tweets. Finally, the SHAP explanation tool is introduced to compare and analyze, at both the global and local levels, the key features identified by the model and the baseline values. Experimental results show that the RoBERTa-BiGRU model achieves higher classification accuracy. The explanation tool reveals that the keywords computed by RoBERTa-BiGRU for the four categories Age, Ethnicity, Gender, and Religion match the label topics of those categories, whereas the keywords found for the Other CB and Not CB categories are mostly rare characters and run-together words, indicating that the model has not truly learned the intrinsic differences between the Other CB and Not CB categories.

Keywords: cyberbullying; SHAP; RoBERTa; BiGRU; text detection
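The local explanations and baseline values mentioned in the abstract rest on Shapley values, which can be illustrated with a small sketch. The example below computes exact Shapley values for a toy scoring function over three tokens. It is not the authors' model or the SHAP library: the `score` function, the token weights, and the baseline of 0.10 are all hypothetical values chosen for illustration. Practical tools approximate this computation, because exact enumeration grows exponentially with the number of tokens.

```python
from itertools import combinations
from math import factorial

# Hypothetical stand-in for a classifier's score for one class.
# A real study would query the trained model instead.
TOKEN_WEIGHTS = {"you": 0.05, "dumb": 0.60, "girl": 0.25}  # assumed values
BASELINE = 0.10  # assumed score when every token is masked


def score(present):
    """Toy class score given the set of unmasked tokens."""
    s = BASELINE + sum(TOKEN_WEIGHTS[t] for t in present)
    # One interaction term so the attribution is not trivially additive.
    if "dumb" in present and "girl" in present:
        s += 0.10
    return s


def shapley_values(tokens, value_fn):
    """Exact Shapley values by enumerating every coalition of the other tokens.

    Assumes the tokens are unique, since results are keyed by token.
    """
    n = len(tokens)
    phi = {}
    for t in tokens:
        others = [u for u in tokens if u != t]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of t when added to coalition s.
                total += weight * (value_fn(s | {t}) - value_fn(s))
        phi[t] = total
    return phi


tokens = ["you", "dumb", "girl"]
phi = shapley_values(tokens, score)
for t in tokens:
    print(f"{t}: {phi[t]:+.3f}")

# Local accuracy: the attributions sum to score(all tokens) minus the baseline.
print(round(sum(phi.values()), 6), round(score(set(tokens)) - score(set()), 6))
```

In this sketch the interaction bonus is split evenly between the two tokens that trigger it, and the attributions sum to the gap between the full score and the baseline. A global view, in the sense used in the abstract, could be obtained by averaging the absolute per-token attributions over many texts.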

For the full text, visit cnki.net.

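Basic information:

CLC number: TP391.1

Citation:

[1] LIU Dong, LIU Ruili, WENG Haiguang. Research on a cyberbullying text detection model based on the SHAP explanation tool[J]. Journal of People's Public Security University of China (Science and Technology), 2024, 30(03): 59-69.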
