Across recent empirical work on AI-supported academic writing, the role of large language models in literature reviewing emerges as a bounded but strategically important form of “assistive scaffolding” rather than a substitute for human scholarly judgment. The ten studies considered here do not all target literature reviews directly, yet together they offer a multi-layered view of how students and institutions actually position generative AI in writing workflows, how this positioning shapes motivation and self-efficacy, and where integrity and governance tensions concentrate (Alrefaie et al., 2026; Boillos & Idoiaga-Mondragon, 2026; Gabay et al., 2026; Karthika & Mariyam, 2026; Qiu & Zhu, 2026; Scuderi et al., 2026; Shue et al., 2026; Sun et al., 2026; Zhang & Saeed, 2026; Zhao & Lei, 2026). Reading these studies together allows a more precise specification of where AI can responsibly support literature reviews – mainly at the level of language, organization, and formative feedback – and where its use threatens the epistemic foundations of review work, especially in source work and attribution.
At the level of writing processes, several studies converge in depicting AI tools as linguistic and organizational support. In a case study of Chinese postgraduate writers in Malaysia, Zhang and Saeed (2026) show that students engage with ChatGPT-generated feedback through accepting, questioning, and rejecting suggestions, annotating drafts, and seeking external validation. Importantly, participants frame ChatGPT primarily as an instrument for linguistic refinement rather than content generation, and they display agentive and critical engagement rather than passive compliance (Zhang & Saeed, 2026). A duoethnographic account by Karthika and Mariyam (2026) similarly documents how AI feedback can be integrated into a recursive self-assessment cycle, in which writers alternate between AI-generated comments and human reflection to refine their academic discourse. Alrefaie et al. (2026) provide the most explicit model connecting AI directly to narrative review writing. In their mentor-guided framework, medical students use AI to generate and then refine review objectives, draft summaries, and rephrase text, while mentors supervise literature searching, critical appraisal, and ethical use. Students report substantial gains in confidence with PubMed and Google Scholar, and most perceive the hybrid AI–mentor approach as preparing them for future research tasks (Alrefaie et al., 2026). Synthesizing across these interventions, Gabay et al.’s (2026) scoping review shows that generative AI is most consistently deployed as assistance across the planning-to-revision span of writing, with benefits clustered around organization, fluency, efficiency, and language support, particularly for multilingual writers. Zhao and Lei’s (2026) corpus comparison adds a cautionary nuance: AI-generated abstracts differ systematically from human-written ones in their use of informal linguistic features and exhibit a more standardized, less variable style. For literature reviews, these process-focused findings suggest that AI is best conceptualized as a powerful drafting, refinement, and style-harmonization tool whose contributions must remain subordinate to human decisions about what evidence to include and how to interpret it.
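To make the corpus-comparison logic behind such findings concrete, the following minimal sketch computes a normalized informality rate for two sets of abstracts and compares their means and dispersion (the “less variable style” claim corresponds to a smaller standard deviation). The feature patterns, sample texts, and function names are illustrative assumptions for exposition only; they are not Zhao and Lei’s (2026) actual feature inventory or analysis pipeline.

```python
# Illustrative per-1,000-word frequency comparison between two corpora,
# in the spirit of corpus-based informality studies. The feature list
# and sample texts are hypothetical, not Zhao and Lei's instrument.
import re
from statistics import mean, pstdev

# Toy "informality" markers: first-person pronouns, contractions,
# and sentence-initial conjunctions. Real studies use validated lists.
INFORMAL_PATTERNS = [
    r"\b(?:I|we|We)\b",
    r"n't\b",
    r"^(?:And|But|So)\b",
]

def informality_rate(text: str) -> float:
    """Informal-feature hits per 1,000 words in a single document."""
    n_words = len(re.findall(r"\b\w+\b", text))
    n_hits = sum(
        len(re.findall(pat, text, flags=re.MULTILINE))
        for pat in INFORMAL_PATTERNS
    )
    return 1000 * n_hits / max(n_words, 1)

def corpus_profile(docs: list[str]) -> tuple[float, float]:
    """Mean and population SD of informality rates across a corpus."""
    rates = [informality_rate(d) for d in docs]
    return mean(rates), pstdev(rates)

# Each list would hold one abstract per string; these are placeholders.
human_abstracts = [
    "We argue that the data don't fully support earlier claims.",
    "But in this paper we revisit the question with new evidence.",
]
ai_abstracts = [
    "This study examines the proposed framework in detail.",
    "The findings indicate consistent improvements across conditions.",
]

for label, corpus in (("human", human_abstracts), ("AI", ai_abstracts)):
    m, sd = corpus_profile(corpus)
    print(f"{label}: {m:.1f} informal features per 1,000 words (SD {sd:.1f})")
```

A lower mean and a lower SD for the AI corpus would correspond, respectively, to the reduced informality and the stylistic standardization that Zhao and Lei (2026) report.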
The same empirical corpus also illuminates how AI use in academic writing reshapes the affective and motivational landscape in which literature reviews are produced. Sun et al. (2026), drawing on control–value and self-efficacy theory, show that acceptance of large language models among Chinese EFL graduate students positively predicts both enjoyment and anxiety, which in turn have opposing effects on self-efficacy: enjoyment enhances, whereas anxiety undermines students’ belief in their writing capability. Enjoyment acts as a complementary mediator and anxiety as a competitive mediator between LLM acceptance and self-efficacy (Sun et al., 2026). Qiu and Zhu (2026), using the UTAUT2 framework, find that effort expectancy, habit, and social influence significantly predict behavioral intention to use AI chatbots for academic writing, and that intention, habit, and facilitating conditions predict actual use; their model explains 68% of the variance in usage behavior. These results suggest that once AI tools become easy to use, socially normalized, and integrated into study habits, their use in literature review assignments will be widespread regardless of formal policies. At the same time, Boillos and Idoiaga-Mondragon (2026) show that students themselves articulate substantial worries about erosion of academic writing competence. In their lexical-class analysis of free associations, the dominant negative themes are academic ethics (AI as unreliable and grade-threatening), loss of transversal skills such as creativity and reflective thinking, and underdevelopment of argumentation, coherence, and authorship. Taken together, these studies indicate that AI-supported literature review training must address a dual challenge: leveraging the motivational and self-efficacy benefits of AI use while explicitly counteracting the tendency toward passivity and skill atrophy that students already intuit as a risk.
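The mediation terminology here can be stated compactly. In the standard sign-based typology, a mediator is complementary when its indirect effect points in the same direction as the direct effect, and competitive when the two point in opposite directions. The decomposition below uses illustrative path labels, not Sun et al.’s (2026) fitted coefficients.

```latex
% Effect of LLM acceptance (X) on writing self-efficacy (Y), split into
% a direct path c' and two indirect paths. Signs follow the pattern
% described above; the labels are illustrative, not reported values.
\[
  c \;=\; c' \;+\; \underbrace{a_{E}\,b_{E}}_{\text{via enjoyment}\;(>0)}
          \;+\; \underbrace{a_{A}\,b_{A}}_{\text{via anxiety}\;(<0)}
\]
% Complementary mediation: a_E b_E and c' share the same sign.
% Competitive mediation: a_A b_A and c' have opposite signs, so the
% anxiety path partially offsets the direct effect.
```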
Ethical, integrity, and governance questions form a third axis along which AI’s role in literature reviewing must be understood. Scuderi et al. (2026) review the legal, ethical, and practical challenges associated with AI in medical publishing, foregrounding concerns about accuracy, bias, intellectual property, data security, and potential manipulation of peer review systems. Their analysis underscores how journals are attempting to balance innovation with the need to maintain scientific rigor, transparency of AI use, and the integrity of human authorship. Shue et al. (2026) focus more narrowly on the fairness of AI-detection practices in the context of “ChatGPT-polished” scientific writing, arguing that cohort effects, baseline definitions, and technical limitations can lead to inequitable treatment of different author groups. These concerns intersect with the student perceptions documented by Boillos and Idoiaga-Mondragon (2026), who report that learners see AI both as an ethically ambiguous source of information and as a potential threat to grading fairness. At the empirical synthesis level, Gabay et al. (2026) identify hallucinations and unreliable or fabricated citations, inconsistent disclosure or attribution, overreliance in unscaffolded settings, and the limited reliability of AI-detection tools as the most recurrent risks across higher-education contexts. Zhao and Lei’s (2026) demonstration that AI writing is stylistically more standardized also has governance implications, because such homogeneity may make it harder to distinguish between legitimate language support and wholesale outsourcing of writing. For literature reviews, where evidentiary claims are mediated entirely through text and references, these integrity and governance issues point to source work and attribution as the highest-risk zones for AI involvement.
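The base-rate logic behind these fairness concerns can be made concrete with a toy calculation: even when a detector’s error rates are identical for everyone, the share of flagged texts that are false accusations differs sharply between cohorts that use AI at different rates. All numbers below are invented for illustration and are not drawn from Shue et al. (2026).

```python
# Toy Bayes'-rule illustration of the cohort effect behind AI-detection
# fairness concerns: with identical detector error rates, a cohort that
# uses AI less often accumulates a larger share of wrongful flags.

def wrongful_flag_share(base_rate: float, sensitivity: float, fpr: float) -> float:
    """Among flagged texts, the fraction that are actually human-written."""
    true_flags = base_rate * sensitivity        # AI texts correctly flagged
    false_flags = (1 - base_rate) * fpr         # human texts wrongly flagged
    return false_flags / (true_flags + false_flags)

# Same hypothetical detector (90% sensitivity, 5% false-positive rate),
# applied to two cohorts that differ only in how often they use AI.
for cohort, base_rate in (("cohort A (50% AI use)", 0.50),
                          ("cohort B (5% AI use)", 0.05)):
    share = wrongful_flag_share(base_rate, sensitivity=0.90, fpr=0.05)
    print(f"{cohort}: {share:.0%} of flags are false accusations")
# cohort A (50% AI use): 5% of flags are false accusations
# cohort B (5% AI use): 51% of flags are false accusations
```

The asymmetry arises purely from the base rates: in the low-use cohort, roughly half of all flags would fall on authors who never used AI, which is precisely the kind of inequity that makes uncalibrated detection practices hazardous.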
Against this backdrop, the ten studies collectively support a differentiated model of AI’s appropriate role in literature reviewing. At one pole lies a “linguistic–organizational” zone, in which AI can legitimately support tasks such as brainstorming alternative framings of a research question, constructing outlines, improving coherence between sections, refining sentence-level clarity, and adjusting register for disciplinary conventions (Alrefaie et al., 2026; Karthika & Mariyam, 2026; Zhang & Saeed, 2026). Here, AI’s generative capacity directly relieves mechanical burdens while leaving epistemic responsibility with the human author. At the other pole lies an “epistemic core” zone comprising comprehensive searching, source selection, data extraction, critical appraisal, and the weighing of evidence; none of the studies provide evidence that AI can perform these functions with reliability comparable to trained humans, and several explicitly document student or institutional skepticism about trusting AI in these roles (Boillos & Idoiaga-Mondragon, 2026; Gabay et al., 2026; Scuderi et al., 2026). Between these poles sits a “governance and disclosure” zone: institutions and journals require transparent reporting of AI use and must calibrate the deployment of detection tools in ways that avoid discriminatory effects (Scuderi et al., 2026; Shue et al., 2026). Within such a tripartite model, the appropriate stance is not prohibition but fine-grained task allocation: AI is permitted – indeed encouraged – in the linguistic–organizational zone, conditionally permitted and tightly supervised in edge cases near the epistemic core, and subject to explicit disclosure and process evidence in all zones.
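As a purely schematic illustration, the tripartite model can be encoded as a policy table mapping review tasks to permission levels. The task names and categories below are illustrative labels for the zones just described, not a published instrument or institutional policy.

```python
# Schematic encoding of the tripartite task-allocation model as a
# policy table. Task names and permission levels are illustrative.
from enum import Enum

class Permission(Enum):
    ENCOURAGED = "permitted, with disclosure"
    SUPERVISED = "conditional, mentor-supervised, with disclosure"
    HUMAN_ONLY = "reserved for human researchers"

REVIEW_TASK_POLICY = {
    # linguistic-organizational zone
    "brainstorm question framings": Permission.ENCOURAGED,
    "construct outlines": Permission.ENCOURAGED,
    "improve coherence and register": Permission.ENCOURAGED,
    # edge case near the epistemic core
    "summarize an already-read source": Permission.SUPERVISED,
    # epistemic core
    "search and select sources": Permission.HUMAN_ONLY,
    "extract data and appraise quality": Permission.HUMAN_ONLY,
    "weigh and synthesize evidence": Permission.HUMAN_ONLY,
}

for task, rule in REVIEW_TASK_POLICY.items():
    print(f"{task}: {rule.value}")
```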
The educational designs examined in this corpus point toward practical strategies for operationalizing such a model in graduate and advanced undergraduate training. Alrefaie et al.’s (2026) narrative review assignment exemplifies a workflow in which AI supports objective formulation and rephrasing, while mentors structure literature search, enforce ethical constraints, and ensure that students genuinely engage with primary sources. Zhang and Saeed’s (2026) and Karthika and Mariyam’s (2026) accounts suggest that when students are encouraged to interrogate rather than obey AI feedback, they can use generative tools to sharpen metacognitive awareness of their own writing decisions. Gabay et al. (2026) recommend aligning institutional policies and access conditions so that constructive uses of GenAI – especially for multilingual writers – are supported without normalizing unverified source work. Qiu and Zhu’s (2026) evidence on habit and social influence implies that such protocols must be embedded early and consistently in curricula, or else students’ established AI practices will drift away from official expectations. Finally, Boillos and Idoiaga-Mondragon’s (2026) findings on perceived losses in creativity and argumentation suggest that assessment rubrics for literature reviews should weight argument structure, evidential reasoning, and originality of synthesis more heavily than surface fluency, thereby reducing the incentive to use AI as a shortcut for “polished” but shallow work.
In sum, the current empirical literature on AI-assisted academic writing supports a disciplined, segmented conception of AI’s role in literature reviewing. Generative tools already function as de facto infrastructure for language support and process scaffolding, and – when coupled with human mentorship and explicit ethical framing – they can enhance students’ confidence with core research practices such as database searching and summarization (Alrefaie et al., 2026; Gabay et al., 2026; Sun et al., 2026; Zhang & Saeed, 2026). At the same time, persistent risks around hallucinated or fabricated citations, erosion of writing and reasoning skills, and opaque detection and authorship norms indicate that AI cannot be entrusted with the epistemic core of literature reviews or with unmonitored control over source work and attribution (Boillos & Idoiaga-Mondragon, 2026; Gabay et al., 2026; Scuderi et al., 2026; Shue et al., 2026; Zhao & Lei, 2026). The most defensible position, therefore, is an assistive rather than substitutive stance: AI may accelerate the mechanics of writing and provide valuable feedback, but the authority to decide what counts as evidence, how that evidence is interpreted, and how the review demonstrates its own rigor must remain with human researchers and the communities that evaluate their work.
References
Alrefaie, Z., Alhazimi, A., Almarabheh, A., Madkhali, T., & Elsamanoudy, A. (2026). AI-assisted, mentor-guided narrative review writing task for medical students: A novel educational strategy to enhance research and academic writing. Medical Teacher. Advance online publication. https://doi.org/10.1080/0142159X.2025.2604240
Boillos, M. M., & Idoiaga-Mondragon, N. (2026). Students’ negative perceptions of the use of artificial intelligence in academic writing: Didactic implications for higher education. Educación XX1, 29(1). https://doi.org/10.5944/educxx1.43943
Gabay, R. A. E., Funa, A. A., & Ricafort, J. D. (2026). Generative artificial intelligence (GenAI) for academic writing in higher education: A scoping review of applications, challenges, and implications. International Journal of Education in Mathematics, Science and Technology, 14(1), 200–232. https://doi.org/10.46328/ijemst.5682
Karthika, V. K., & Mariyam B, H. (2026). Exploring the impact of AI feedback in academic writing self-assessment: A duoethnographic study. Interactive Learning Environments. Advance online publication. https://doi.org/10.1080/10494820.2026.2619502
Qiu, N., & Zhu, D. (2026). Predicting postgraduates’ use behavior of AI-based chatbots for academic writing: Based on the UTAUT2 model. SAGE Open, 16(1). https://doi.org/10.1177/21582440251415288
Scuderi, G. R., Taunton, M. J., Browne, J. A., & Mont, M. A. (2026). The challenges with artificial intelligence in scientific writing. Journal of Arthroplasty, 41(2), 299–303. https://doi.org/10.1016/j.arth.2025.12.001
Shue, E., Jairath, N. K., & Hu, G. (2026). Response to “ChatGPT-polished scientific writing and artificial intelligence detection: Cohorts, baselines, fairness.” JAAD International, 24, 244–245.
Sun, F., Wang, J., Mendoza, L., & Li, H. (2026). Exploring the relationships among large language model acceptance, enjoyment, anxiety, and self-efficacy in L2 academic writing. Acta Psychologica, 263. https://doi.org/10.1016/j.actpsy.2026.106237
Zhang, K., & Saeed, M. A. (2026). Chinese EFL learners’ engagement with ChatGPT feedback on academic writing: A case study in Malaysia. Computers and Composition, 79. https://doi.org/10.1016/j.compcom.2025.102976
Zhao, N., & Lei, L. (2026). Informality features in AI-generated academic writing: A corpus-based comparison between human and AI. Journal of English for Academic Purposes, 79. https://doi.org/10.1016/j.jeap.2026.101629
