Resumen
(Objetivo) Cuando Rusia invadió oficialmente a Ucrania, el mundo vivió un periodo de tensión e incertidumbre. Como válvula de escape social, los canales de comunicación digital aumentaron su númerode usuarios y su actividad, lo cual generó una gran cantidad de datos. Twitter, en particular, uno de loscanales más populares para compartir información y opiniones, explotó con actividades relacionadas con el acontecimiento. Igual que con muchos otros eventos sociales, como el COVID-19, tal red social se convirtió en una de las principales fuentes de información, opinión y conocimiento. Este trabajo realizaun análisis de sentimientos de tuits ligados al conflicto armado entre Rusia y Ucrania.
(Metodología) Elconjunto de datos analizado contiene tuits desde el 1 de enero de 2022 hasta el 3 de marzo de 2022 yse recopiló utilizando hashtags vinculados con el suceso. En total, se analizaron 603 552 tuits en inglés y1664 en ruso. Para realizar la clasificación de emociones, se utilizaron la variante DistilRoBERTa y el odelo preentrenado XLM-RoBERTa-Base, respectivamente. Los tuits en inglés se clasificaron en siete emociones:ira, disgusto, miedo, alegría, neutro, tristeza y sorpresa. Los tuits rusos se clasificaron en positivos, negativosy neutros.
(Resultados) Los resultados mostraron que la mayoría de los tuits en inglés mostró miedo e iracomo sentimientos predominantes, que alcanzaron el 32,08 % y el 15,18 % del total de tuits examinados, respectivamente. En cuanto a los tuits en ruso, la mayor parte presentó polaridad negativa, con un 86,83% del total. Algunas de las frases más recurrentes en el análisis aluden al apoyo a Ucrania y piden el cese de la guerra. Asimismo, son frecuentes las frases de preocupación por la crisis, las armas y las víctimas mortales.
(Conclusión) Como era de esperar, la mayoría de la gente estaba preocupada por el conflicto armado y tanto molesta como enfadada por sus consecuencias. Futuros trabajos podrían utilizar más tuits para mejorar el análisis y aumentar el rango temporal por estudiar. También, se podría segmentar el análisis, para estudiar el sentimiento de los tuits, según diferentes agrupaciones, y compararlos con otras sociedades; por ejemplo, examinar los tuits, de acuerdo con cada país o región.
Palabras clave: análisis de sentimientos; conflicto Rusia-Ucrania; Twitter; RoBERTa.
Abstract
(Objective) The moment Russia officially invaded Ukraine, the world experienced a period of tension and uncertainty. As a social release valve digital communication, channels increased their number of users and activity, generating a large amount of data. Twitter, in particular, being one of the most popular channels for sharing information and opinions, exploded with activities related to this historical moment. And as with many other social events, such as COVID-19, this social network became one of the main sources of information, opinion, and knowledge. This paper analyzes sentiments in tweets related to the armed conflict between Russia and Ukraine.
(Methodology) The analyzed dataset contains tweets from January 1, 2022, through March 3, 2022, and was collected using event-related hashtags. In total, 603,552 tweets in English and 1,664 in Russian were analyzed. To perform emotion classification, DistilRoBERTa variant and the pre-trained XLM-RoBERTa-Base model were used, respectively. English tweets were classified into seven emotions: anger, disgust, fear, joy, neutral, sadness, and surprise. Russian tweets were classified into positive negative, and neutral polarities.
(Results) The results showed that most English tweets convey fear and anger as predominant feelings, reaching 32.08% and 15.18% of the total tweets analyzed, respectively. Regarding tweets in Russian, the majority presented negative polarity, with 86.83% of the total. Some of the most recurrent phrases in the analysis allude to support for Ukraine and call for a halt to the war. Likewise, phrases of concern for the crisis, weapons, and fatalities are recurrent.
(Conclusion) As expected, most people were concerned about the armed conflict and upset and angry about its consequences. Future works could use more tweets to improve the analysis and increase the time range to be studied. The analysis could also be segmented to study the sentiments of tweets according to different groupings and compare them with other societies, for instance, tweets could be segmented by country and analyzed accordingly.
Keywords: sentiment analysis; Russia-Ukraine conflict; Twitter; RoBERTa
Resumo
(Objetivo) Quando a Rússia invadiu oficialmente a Ucrânia, o mundo viveu um período de tensão e incerteza. Como válvula de escape social, os canais digitais de comunicação aumentaram seu númerode usuários e sua atividade, gerando uma grande quantidade de dados. O Twitter, em especial, por ser um dos canais mais populares de compartilhamento de informações e opiniões, explodiu com atividades relacionadas ao evento. E como em muitos outros eventos sociais, como a COVID-19, essa rede social se tornou uma das principais fontes de informação, opinião e conhecimento. Este artigo realiza uma análise de sentimento de tuítes relacionados ao conflito armado entre Rússia e Ucrânia.
(Metodologia) O conjunto de dados analisado contém tuítes de 1º de janeiro de 2022 a 3 de março de 2022 e foi coletado usando hashtags relacionadas ao evento. No total, foram analisados 603.552 tuítes em inglês e 1.664 em russo. Para realizar a classificação das emoções, foram utilizados os modelos pré-treinados DistilRoBERTa e XLM-RoBERTa -Base, respectivamente. Os tuítes em inglês foram categorizados em sete emoções: raiva, descontentamento, medo, alegria, neutralidade, tristeza e surpresa. Os tuítes russos foram classificados como positivos, negativos e neutros.
(Resultados) Os resultados mostraram que a maioria dos tuítes em inglês apresentou medo e raiva como sentimentos predominantes, atingindo 32,08% e 15,18% do total de tuítes analisados, respectivamente. Quanto aos tuítes em russo, a maioria apresentou polaridade negativa, com 86,83% do total. Algumas das frases mais recorrentes na análise fazem alusão ao apoio à Ucrânia e pedem o fim da guerra. Da mesma forma, as frases de preocupação com a crise, as armas e as mortes são recorrentes.
(Conclusão) Como esperado, a maioria das pessoas estava preocupada com o conflito armado e chateada e irritada com suas consequências. Trabalhos futuros poderiam usar mais tuítes para melhorar a análise e aumentar o intervalo de tempo a ser estudado. A análise também poderia ser segmentada para estudar o sentimento dos tuítes de acordo com diferentes agrupamentos e compará-los com outras sociedades; por exemplo, analisar os tuítes de acordo com cada país ou região.
Palavras-chave: análise de sentimento; conflito Rússia-Ucrânia; Twitter; RoBERTa
Introduction
On February 24, 2022, Russia offi cially decided to invade Ukraine (Mbah & Wasum, 2022). The conflict between the two countries began when Russia annexed the Crimea region in 2014. However, it was not until Russia officially decided to invade the Ukrainian territory that the world was put in check. As happens in these situations, the amount of information transmitted through communication channels increased. The Internet has made digital media one of the most widely used today (Mota & Cilento, 2020). Twitter has become one of the most popular platforms for communicating, sharing information, and expressing opinions, with millions of users worldwide interact ing with each other every second.
Given the popularity of Twitter and the relative ease of accessing its data, several studies have addressed areas related to this social media platform. One of the most popular areas is sentiment analysis. This area consists of analyzing and determining the emotions behind a text concerning some specific topic or field (Khun & Thant, 2019). Organizations widely use it to obtain information about people to understand better their sentiment or position regarding, for example, a product or service.
The classification of emotions is a widely studied and debated field, as there are different approaches to it (Imran, Daudpota, Kastrati, & Batra, 2020). The most basic classification categorized emotions into three types: positive, negative, and neutral. On the other hand, Tomkins (Tomkins, 1962), proposed a classification of eight categories for emotions: fear, disgust, shame, surprise, interest, joy, rage, and anguish. Likewise, Ekman (Ekman, 1992) advanced a classification of six categories for emotions: anger, disgust, fear, joy, sadness, and surprise. Other proposals classify emotions based on different perspectives.
Among the approaches used for sentiment analysis, Machine Learning (ML) and Deep Learning (DL), both subareas of artificial intelligence, stand out. DL techniques are also the most extensively employed thanks to their excellent performance. Natural Language Processing (NLP) is the subfield of DL that addresses human language tasks (Zhang & Lu, 2021). NLP has evolved models from recurrent neural networks RNN) to modern architectures based on attentional mechanisms. These attentional mechanisms capture the full context of sentences and allow for better inference, unlike RNNs, which suffer from lack of memory Janiesch, Zschech, & Heinrich, 2021). Two of the best-known models for NLP tasks are BERT and RoBERTa.
In this paper, a pre-trained network, specifically DistilRoBERTa, was employed to identify sentiments from a dataset of tweets related to the Russia-Ukraine war conflict. Seven emotions were under study: anger, disgust, fear, joy, neutral, sadness, and surprise. This aimed to understand people's sentiments and reactions to the armed conflict. Likewise, bar chart, line chart, and wordcloud techniques were used to visualize the results.
Methodology
Dataset Description
The corpus used for this study was the 65-day Russia-Ukraine war tweets dataset, which is available for download on Kaggle (https://www.kaggle.com/datasets/foklacu/ ukraine-war-tweets-dataset-65-days). It collects tweets related to the Russia-Ukraine conflict from January 1, 2022, to March 3, 2022. The author collected a maximum of 5000 tweets per day using the following keywords: 'ukraine war,' 'ukraine troops,' 'ukraine border,' 'ukraine NATO,' 'StandwithUkraine,' 'russian troops,' 'russian border ukraine,' and 'russia invades.' The dataset was divided into different CSV files, one for each keyword, and had a total weight of 3.78 GB.
Filtering
In this step, both English and Russian tweets were filtered for analysis. This aimed to compare the similarities and differences of analyzing each group of tweets and understand better the conflict's sentiment.
Filtering English Tweets. In this step, the tweets were filtered to obtain only those written in English by using the 'lang' column, which refers to the tweet's language. All entries with 'en' (English) value in the 'lang' column were selected. It is worth mentioning that since the resulting corpus of tweets in English was considerable, a random sample of 50% was taken from this subset for its analysis.
Filtering Russian Tweets. Similar to the previous step, the tweets were also filtered to include only those written in Russian by using the 'ru' (Russian) value in the 'lang' column again.
Preprocessing
Duplicate Removal. After obtaining only English and Russian tweets, the next step involved removing repeated tweets in each subset. Duplicate tweets were present due to users' retweeting posts, resulting in redundancy within the dataset. The drop_ duplicates() function provided by pandas was used to remove this redundancy.
Removing Mentions, URLs, and Stop Words. Some elements of the tweets were not pertinent for the analysis. Removing these depended on the nature of the dataset and the problem addressed (Pota, Ventura, Fujita, & Esposito, 2021; Jianqiang & Xiaolin, 2017). In this case, mentions, URLs, and stop words were identified as elements that did not provide relevant information. These elements could lead to classification errors, as they could bias the interpretation of the text by the model. Therefore, they needed to be subjected to data cleaning (Jianqiang & Xiaolin, 2017). Regular expression-based substitution was used to remove mentions and URLs. The NLTK Python library was employed for stop words. Hashtags were retained as they provided relevant information and helped better understanding of each tweet's sentiment. Table 1 shows examples of the tweets from each subset before and after preprocessing at this stage.
Tokenization. The final step of preprocessing was tokenization. This is nothing more than preparing the inputs to apply the model, dividing each tweet into substrings, also called tokens. This was done using the AutoTokenizer method of the transformers' library provided by Hugging Face for each subset.
Sentiment Classification
English Tweets. The English sentiment classification procedure utilized DistilRoBERTa, a pre-trained natural language processing model programmed in Python. The model and its documentation are available on the Hugging Face website (https:// huggingface.co/j-hartmann/emotion-english-distilroberta-base). As the name suggests, it is a distilled and a smaller version of the RoBERTa model. Some advantages of smaller models are less weight, less complexity, and less training time. Specifically, DistilRoBERTa requires approximately four times less training time and is approximately twice as fast in prediction as the parent model (Sanh, Debut, Chaumond, & Wolf, 2019).
The model consists of 6 layers, with 82 million parameters, 39 million fewer parameters than the large model (Sanh, Debut, Chaumond, & Wolf, 2019). It can classify the emotions of English tweets within seven possible alternatives: anger, disgust, fear, joy, neutral, sadness, and surprise (Hartmann, 2022). In other words, the model predicts Ekman's six basic emotions plus an additional neutral class. It was trained with six datasets: Crowdflower, Emotion Dataset, GoEmotions, ISEAR, MELD, and SemEval-2018, using 80% for training and 20% for validation (Hartmann, 2022).
Russian Tweets. Regarding the Russian tweets, a pre-trained model was employed for their classification. The pre-trained
XLM-RoBERTa-Base (Smetanin & Komarov, Deep transfer learning baselines for sentiment analysis in Russian, 2021) was used on the RuReviews (Smetanin & Komarov, 2019) dataset. It is a variant that has been tuned specifically for sentiment analysis in Russian texts. It can predict sentiment polarity in three categories: negative, positive, and neutral. The model has about 278 million parameters and is available in the Hugging Face model library.
Visualization
Line Chart. A line chart displays the time evolution of one or more variables. Usually, the x-axis represents a continuous variable, such as time, while the y-axis represents the variable of interest intended to be analyzed over time (Ashman & Cruthers, 2021).
Bar Chart. The bar chart, also known as bar diagram, is a graph used to visualize and compare different categories. The graph has a categorical variable on one axis and a quantitative variable on the other (Ashman & words indicate greater frequency within Cruthers, 2021; Dougherty & Ilyankou, 2021, Li, 2020). In this case, it was used to compare Neogi, Garg, Mishra, & Dwivedi, 2021). the total count of tweets classified into the dif This procedure implements the wordcloud ferent seven emotions predicted by the model.
Wordcloud. A wordcloud is a visual representation of words of a text. The words are displayed in different sizes; the larger words indicate greater frequency within the text, and smaller ones are less frequent (Neogi, Garg, Mishra, & Dwivedi, 2021). This procedure implements the wordcloud library provided by python.
Results and Discussion
English Tweets
This figure and its contents were made by the authors of this work and its authorship corresponds to them.
After cleaning the data, removing duplicates, and selecting only the tweets in English, 603,552 tweets were obtained and fed into the model as input to predict sentiment. The implementation was performed in Google Colaboratory Pro version.
As shown in Figure 2, fear is the most prevalent sentiment in English tweets, accounting for 32.08% of the total number of tweets. It is followed by anger sentiment, with 15.18%. Neutral sentiment ranks third with 10.59%, followed by feelings of sadness, surprise, joy, and disgust with 8.7%, 6.76%, 5.11%, and 0.24%, respectively. Most tweets expressed fear due to the widespread panic over a possible world war, a feeling derived from the tension between the world's most powerful nations.
Anger is a consequence of all the acts that occurred in this event. This is consistent with the behavior of other animal societies, such as chimpanzees, where anger and fear are the two primary responses (Kret, Prochazkova, Sterck, & Clay, 2020; Signe, 2000). If attacked, the local clan's first reaction is usually fear derived from the situation and followed by anger to counterattack the enemy (Karin, 2019).
As expected, the most common words were 'Russia' and 'Ukraine' in all the categories. Likewi-se, 'NATO,' 'troop,' 'Putin,' 'invade,' 'border,' 'Biden,' and 'war' were recurrent. To a lesser extent, the words 'Trump,' 'join,' 'stop,' and 'China' also appeared in the wordclouds. Focusing on the sentiments of fear and anger, the two most weigh-ted in the English corpus, in the former, tweets such as 'Invasion,' 'Weapons,' and 'Crisis' could be observed. Concerning anger, phrases alluding to stopping the war and supporting Ukraine, such as 'StopRussia,' 'Stand-WithUkraine,' and 'Suppor-tUkraine,' could be also seen, as shown in Figure 3.
The temporal analy-sis in Figure 4 shows that, in January, the number of tweets written remained relatively stable until Fe-bruary, when it increased significantly. This is because February was precisely the month Russia's invasion of Ukraine began. The highest peak of tweets can be obser-ved in the week of February 28. As in Figure 2, the trend continued as most tweets corresponded to feelings of fear and anger, while disgust was the category with the lowest number of tweets in each analyzed month.
Russian Tweets
After undergoing the cleaning pro-cesses, 1,664 Russian tweets were obtained and entered the respective model for anal-ysis. Figure 5 shows the distribution of tweets according to each polarity analyzed. It is evident how negative polarity is the most prominent of all. It represents 86.83% of the tweets in Russian. The positive polarity follows this at 16.94%, and the neutral one at 5.23%.
Supported by the wordclouds in Figure 6, it is evident how the neg-ative polarity alludes to messages against Rus-sia's invasion of Ukraine. In this group of tweets, phrases such as 'StopPu-tin' and 'StopRussianAg-gression' clearly represent people's stance against the invasion. Likewise, some phrases can be observed that allude to feelings of fear and anger, such as 'StopWar,' and 'Putin-WarCriminal.' Regarding the positive polarity, one can find messages such as 'StopRussia,' 'Zelensky' and 'NATO.' Finally, in the neutral polarity, phras-es such as 'Ukraine,' 'Russian troops,' and 'USNA-TO' can be evidenced. In general, it is evident that the analysis of tweets in English and Russian presents similar results as most tweets are oriented to fear, anger, and oppo-sition to the Russian invasion of Ukraine.
Figure 7 shows the timeline for Russian tweets. As with the English corpus, there were a few tweets in the three polarities during January. Nevertheless, from February onwards, there is evidence of a significant increase in the number of tweets.
The highest peak can be seen in the week of February 28, as was also seen with tweets in English. Likewise, tweets with negative polarity are the most weighted throughout the analyzed period.
Conclusions
This paper analyzed 603,552 tweets in English and 1,664 tweets in Russian related to the Russia-Ukraine war conflict, aiming to understand sentiments people expressed in these tweets regarding this global shaking event. DistilRoBERTa and XLM-RoBERTa-Base neural network models were used to classify the tweets. English tweets were classified into seven emotions: anger, disgust, fear, joy, neutral, sadness, and surprise. Russian tweets were categorized into positive negative, and neutral polarities. The findings show that fear and anger were the most common feelings in English tweets, with 32.08% and 15.18%, respectively. In contrast, disgust was the least recurrent in this subset of data, with 0.24%. Concerning Russian tweets, negative polarity was the most heavily weighted with 86.83% of tweets, followed by positive polarity with 16.94%. The visualizations showed that February was the month with the highest number of tweets, and the week of February 28 had the highest peak for both English and Russian tweets. The most common words were 'Russia,' 'Ukraine,' 'NATO,' 'troops,' and 'Putin,' all related to the organizations involved in the conflict and their leaders. Tweets related to fear, anger, and negativity showed messages favoring Ukraine and opposing the Russian invasion and its leaders. Also, many tweets showed concern for the crises, weapons, and fatalities caused by the conflict.
Future work could improve the analysis by using more tweets and increasing the time range to be studied. The field of analysis could also be segmented to study the tweet sentiments according to different groupings and compare them with other societies, for instance, tweets could be segmented by country and analyzed accordingly.
Conflict of Interest
The authors have no relevant financial interests in this manuscript and no other potential conflicts of interest to disclose.
Author Contribution Statement
The total contribution percentage for the conceptualization, preparation, and correction of this paper was as follows: L.R. 70% and O.C. 30%. All authors declare that they have read and approved the final version of this paper.
Data Availability Statement
The data supporting the results of this study is only available for consultation in Kaggle at the following link: https://www.kaggle.com/datasets/foklacu/ ukraine-war-tweets-dataset-65-days
References
- Ashman, M., & Cruthers, A. (2021). Tables, charts, and graphs. Advanced Professional Communication, eCampus Pressbooks.
- Dougherty, J., & Ilyankou, I. (2021). Hands-On Data Visualization. O'Reilly Media, Inc.
- Ekman, P. (1992). An argument for basic emotions. Cognition & emotion, 6, 169-200. https://doi. org/10.1080/02699939208411068
- Hartmann, J. (2022). Emotion English DistilRoBERTa-base.
-
Imran, A., Daudpota, S., Kastrati, Z., & Batra, R. (2020). Cross-cultural polarity and emotion detection using sentiment analysis and deep learning on covid-19 related tweets. IEEE Access, 8, 181074-181090. https://doi.org/10.1109/ACCESS.2020.3027350
» https://doi.org/10.1109/ACCESS.2020.3027350 -
Janiesch, C., Zschech, P., & Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31, 685-695. https://doi. org/10.1007/s12525-021-00475-2
» https://doi.org/10.1007/s12525-021-00475-2 -
Jianqiang, Z., & Xiaolin, G. (2017). Comparison Research on Text Pre-processing Methods on Twitter Sentiment Analysis. IEEE Access, 2870-2879. https://doi.org/10.1109/ACCESS.2017.2672677
» https://doi.org/10.1109/ACCESS.2017.2672677 -
Karin, E. (2019). Primate Ecology and Behavior. Retrieved from https://pressbooks-dev.oer.hawaii.edu/ explorationsbioanth/chapter/__unknown__-6/
» https://pressbooks-dev.oer.hawaii.edu/ explorationsbioanth/chapter/__unknown__-6/ -
Khun, N., & Thant, H. (2019). Visualization of Twitter Sentiment during the Period of US Banned Huawei. International Conference on Advanced Information Technologies, ICAIT 2019 (pp. 274-279). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/AITC.2019.8921014
» https://doi.org/10.1109/AITC.2019.8921014 -
Kret, M., Prochazkova, E., Sterck, E., & Clay, Z. (2020). Emotional expressions in human and non-human great apes. Neuroscience & Biobehavioral Reviews, 115. https://doi.org/10.1016/j.neubiorev.2020.01.027
» https://doi.org/10.1016/j.neubiorev.2020.01.027 -
Li, Q. (2020). Overview of Data Visualization. Embodying Data, 17-47. https://dx.doi.org/10.1007/978-981-15-5069-0_2
» https://doi.org/10.1007/978-981-15-5069-0_2 -
Mbah, R., & Wasum, D. (2022). Russian-Ukraine 2022 War: A Review of the Economic Impact of Russian-Ukraine Crisis on the USA, UK, Canada, and Europe. Advances in Social Sciences Research Journal, 9, 144-153. https://doi.org/10.14738/assrj.93.12005
» https://doi.org/10.14738/assrj.93.12005 -
Mota, F., & Cilento, I. (2020). Competence for internet use: Integrating knowledge, skills, and attitudes. Computers and Education Open, 1, 115. https://doi.org/10.1016/j.caeo.2020.100015
» https://doi.org/10.1016/j.caeo.2020.100015 -
Neogi, A., Garg, K., Mishra, R., & Dwivedi, Y. (2021). Sentiment analysis and classification of Indian farmers' protest using twitter data. International Journal of Information Management Data Insights, 1(2), 100019. https://doi.org/10.1016/j.jjimei.2021.100019
» https://doi.org/10.1016/j.jjimei.2021.100019 -
Pota, M., Ventura, M., Fujita, H., & Esposito, M. (2021). Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets. Expert Systems with Applications, 115119. https://doi.org/10.1016/j.eswa.2021.115119
» https://doi.org/10.1016/j.eswa.2021.115119 - Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv.
- Signe, P. (2000). Primate Faces and Facial Expressions. Social Research, 67, 245-271.
-
Smetanin, S., & Komarov, M. (2019). Sentiment Analysis of Product Reviews in Russian using Convolutional Neural Networks. 2019 IEEE 21st Conference on Business Informatics (CBI) (pp. 482-486). IEEE. https://doi.org/10.1109/CBI.2019.00062
» https://doi.org/10.1109/CBI.2019.00062 -
Smetanin, S., & Komarov, M. (2021). Deep transfer learning baselines for sentiment analysis in Russian. Information Processing & Management, 102484. https://doi.org/10.1016/j.ipm.2020.102484
» https://doi.org/10.1016/j.ipm.2020.102484 - Tomkins, S. (1962). Affect imagery consciousness: Volume I: The positive affects. Springer publishing company.
-
Zhang, C., & Lu, Y. (2021). Study on artificial intelligence: The state of the art and future prospects. Journal of Industrial Information Integration, 23. https://doi.org/10.1016/j.jii.2021.100224
» https://doi.org/10.1016/j.jii.2021.100224
Fechas de Publicación
-
Fecha del número
Jan-Dec 2023
Histórico
-
Recibido
26 Oct 2022 -
Acepto
20 Mar 2023