“Survey on sentiment analysis: evolution of research methods and topics” is a survey paper published in Artificial Intelligence Review. Authored by Jingfeng Cui, Zhaoxia Wang, Seng-Beng Ho, and Erik Cambria, it traces how research methods and topics in sentiment analysis have changed over the past two decades.
What is Sentiment Analysis? Sentiment analysis is a rapidly evolving field within Natural Language Processing (NLP) that focuses on mining and interpreting people’s attitudes, emotions, appraisals, and opinions from unstructured text. It can determine whether the emotions expressed in a text are positive, negative, or neutral, and can also quantify their intensity. The technique is widely applied in business, finance, politics, education, and services, where it supports decision-making by helping leaders, business people, and service providers understand what users think.
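To make the idea of polarity and intensity concrete, here is a minimal sketch of a lexicon-based scorer. It is only one simple approach and is not taken from the survey; the lexicon entries, their weights, and the names `TOY_LEXICON` and `score` are invented for illustration.

```python
# Toy lexicon mapping a word to a signed strength.
# These entries and weights are hypothetical, not from any published lexicon.
TOY_LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}


def score(text, lexicon=TOY_LEXICON):
    """Return (polarity label, intensity) for a piece of text.

    Intensity is the mean strength of the lexicon words found in the text,
    or 0.0 when none are found.
    """
    # Lowercase, split on whitespace, and strip basic punctuation.
    tokens = [token.strip(".,!?") for token in text.lower().split()]
    hits = [lexicon[token] for token in tokens if token in lexicon]
    intensity = sum(hits) / len(hits) if hits else 0.0

    if intensity > 0:
        label = "positive"
    elif intensity < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, intensity
```

A scorer this simple ignores negation, sarcasm, and context, which is one reason the field moved toward the machine learning and deep learning methods discussed in the survey.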
Unique Contribution of This Survey: While many existing literature reviews on sentiment analysis focus on techniques, methods, and applications, this study addresses a notable gap: the lack of a dedicated survey on the evolution of research methods and topics in the field. It combines keyword co-occurrence analysis with a community detection algorithm to uncover the historical development and current trends of the past two decades. This approach shows how research content and topics have changed over time and offers guidance to researchers choosing where to focus.
Methodology at a Glance: The authors systematically collected 9,714 papers from the Web of Science platform, spanning from January 2002 to January 2022. The data collection involved specific keywords such as “sentiment analy*”, “sentiment mining,” and “sentiment classification,” with careful exclusions of irrelevant topics like image or speech processing. The core of their analytical methodology involved:
- Keyword Extraction: Utilizing KeyBERT to synthesize titles, abstracts, and author keywords, filtering for representative and high-frequency terms.
- Co-occurrence Analysis: Using BibExcel to count keyword co-occurrences, revealing associations between research contents.
- Network Visualization and Community Detection: Employing Pajek with the Louvain community detection algorithm to construct the keyword co-occurrence network and divide it into sub-communities, and VOSviewer to refine the visualization.
- Evolution Analysis: Tracking keyword frequencies year-by-year using Excel to map the evolution of research methods and topics.
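The counting steps in this pipeline can be sketched in a few lines of Python. This is a stand-in for illustration, not the authors' tooling (they used BibExcel and Excel); the `PAPERS` records and the function names are hypothetical, and the keywords are assumed to have been extracted already.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: (publication year, keywords of one paper).
PAPERS = [
    (2015, ["svm", "twitter", "text classification"]),
    (2015, ["svm", "naive bayes"]),
    (2019, ["lstm", "twitter"]),
    (2019, ["lstm", "bert", "twitter"]),
]


def cooccurrence_counts(papers):
    """Count how many papers mention each unordered pair of keywords."""
    pairs = Counter()
    for _, keywords in papers:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(keywords))
        pairs.update(combinations(unique, 2))
    return pairs


def yearly_frequencies(papers):
    """Count, for each year, how many papers mention each keyword."""
    by_year = {}
    for year, keywords in papers:
        by_year.setdefault(year, Counter()).update(set(keywords))
    return by_year


pairs = cooccurrence_counts(PAPERS)
yearly = yearly_frequencies(PAPERS)
```

The weighted pairs in `pairs` form the edge list of a keyword co-occurrence network, which is the kind of input a community detection algorithm such as Louvain takes. The per-year counters in `yearly` are the raw material for tracking how a keyword rises or falls over time.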
Key Findings and Research Hotspots: The study successfully divided the sentiment analysis research landscape into six distinct communities (C1-C6), each representing a crucial research area:
- C1: Social Media Platforms (e.g., Twitter, COVID-19, topic models like LDA).
- C2: Machine Learning Methods (e.g., SVM, Naive Bayes, text classification, stock market prediction).
- C3: Natural Language Processing & Deep Learning Methods (e.g., LSTM, CNN, RNN, BERT, Aspect-Based Sentiment Analysis).
- C4: Opinion Mining & User Review Analysis (e.g., product review, sentiment lexicon, recommender systems, evaluation metrics).
- C5: Arabic Sentiment Analysis (focusing on non-English languages and linguistic complexities).
- C6: Emerging Methods (e.g., semi-supervised learning, domain adaptation, transfer learning, cross-lingual analysis).
The analysis reveals a clear evolution: deep learning technologies and social media-related studies (C1 and C3) have grown significantly and continuously, becoming more prominent than traditional machine learning after 2016. The COVID-19 pandemic also triggered a rapid increase in pandemic-related sentiment analysis studies on social media. Opinion mining (C4) dominated initially; the focus has since shifted, although user and online reviews remain highly relevant. There is also growing interest in non-English languages (C5) and in addressing data challenges through cross-domain and transfer learning (C6).
Practical Implications and Future Directions: The research offers broad practical insights, indicating that sentiment analysis can improve the understanding of user-generated content in applications ranging from product improvement to public opinion monitoring and customer satisfaction assessment. Looking ahead, the authors argue that current methods, while useful, still lack deep semantic understanding. Future research should therefore integrate human commonsense knowledge, domain knowledge, and affective reasoning rules to improve performance and give machines a deeper understanding of language. The authors also highlight continued work on fine-grained analysis, non-English language processing, and cross-domain sentiment analysis.
Limitations: The study acknowledges certain limitations, including its focus on English-language papers from the Web of Science platform, which may omit relevant research in other languages or databases. It also notes that concentrating on high-frequency keywords might have overlooked some important low-frequency ones. Future research could address these points by including more diverse databases, using more precise keywords, and exploring the interdisciplinary nature of sentiment analysis through co-citation analysis.
Reference: Cui, J., Wang, Z., Ho, S.-B., & Cambria, E. (2023). Survey on sentiment analysis: evolution of research methods and topics. Artificial Intelligence Review, 56, 8469–8510. https://doi.org/10.1007/s10462-022-10386-z

