This article is a methodological audit plus a practical “how-to” guide for one of the most routinely mishandled decisions in survey research: choosing and justifying sample size. Using services marketing and management as its empirical setting, it systematically reviews how authors actually report sample size logic in recent top-journal survey papers, then turns the diagnosis into a consolidated, decision-oriented framework researchers can apply before data collection (Ali, 2026).
The study’s empirical backbone is a structured review of 551 quantitative, non-experimental, questionnaire-based survey articles published between January 2020 and June 2024 in six leading services journals (Ali, 2026). The author describes a transparent screening and coding process, recording variables such as context, analytical method, sample size, and the type of justification offered, and reports strong intercoder agreement (Cohen’s κ = 0.87) after pilot testing and independent coding (Ali, 2026). That “audit design” matters: it makes the paper more than an opinion piece, because the central claims are tied to observable reporting patterns rather than anecdotal reviewer frustration.
The headline finding is blunt: explicit and context-specific sample size justifications are uncommon. The abstract puts it at “less than 15%” of studies providing an explicit or context-specific justification (Ali, 2026). In the detailed results, the problem looks even more structural: across analytical approaches, the dominant category is “no justification,” and in the total summary it reaches 471 of 551 studies (85.5%) (Ali, 2026). Even where authors cite established tools or heuristics, reporting is often superficial: references to power software or popular guidelines appear, but typically without the parameters that would let a reader verify adequacy (alpha, target power, expected effect size, model complexity assumptions) (Ali, 2026).
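To make the contrast concrete, here is a minimal sketch of what a verifiable a-priori power calculation could look like. The alpha, power, and effect-size values are illustrative assumptions for a simple two-group comparison; they are not figures drawn from the reviewed studies or prescribed by the paper.

```python
from statsmodels.stats.power import TTestIndPower

# A-priori power analysis for a two-group mean comparison.
# Illustrative (assumed) parameters -- exactly the values a reader would need
# to see reported in order to verify adequacy:
alpha = 0.05          # Type I error rate
power = 0.80          # target power (1 - beta)
effect_size = 0.30    # expected standardized effect (Cohen's d)

n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha, power=power,
                                          ratio=1.0, alternative='two-sided')
print(f"Required n per group: {n_per_group:.1f}")  # about 175.4 -> round up to 176 per group
```

Reporting those parameters alongside the resulting N, and alongside the statistical model the calculation assumes, is precisely the verification step the review finds missing.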
The sample size distribution itself is also revealing because it complicates the usual “small-N” stereotype. Over half of the reviewed studies used samples above 400 (51.5%), with a median of 406 and a mean of 644 (inflated by some extremely large datasets, up to 30,621) (Ali, 2026). The paper’s argument is not that services research is chronically under-sampled; it is that it is chronically under-justified. That distinction is methodological gold for authors: your sample can be big and still be weakly designed if you cannot explain why it is big, what it enables (and what it does not), and which error risks you were managing (Ali, 2026).
From these patterns, the article identifies recurring failure modes that map cleanly onto what reviewers often suspect but cannot prove from a finished manuscript: not reporting any estimation procedure, applying method-specific rules without understanding their assumptions, using finite-population tables/formulas when the population is effectively infinite or undefined, and leaning on secondary or irrelevant sources as rhetorical cover (Ali, 2026). It also flags a newer risk that grows as online panels and platforms make large-N cheap: overpowered studies can turn trivial effects into “significant” effects, creating a managerial-implication problem if effect sizes and practical relevance are not foregrounded (Ali, 2026). Quick joke with teeth: “N=1,000 because Qualtrics had a discount” is not a justification; it’s a shopping story.
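The overpowering point is easy to demonstrate. The sketch below is a simulated illustration (the sample size, scale, and the 0.03-standard-deviation difference are all assumed for illustration only): with tens of thousands of panel responses, a practically trivial difference will usually clear the p < .05 bar.

```python
import numpy as np
from scipy import stats

# Simulated illustration of an overpowered design (all numbers are assumptions):
# a true difference of ~0.03 SD is practically trivial, yet with n = 30,000 per
# group it is flagged as "significant" most of the time.
rng = np.random.default_rng(2024)
n = 30_000
group_a = rng.normal(loc=5.00, scale=1.0, size=n)   # e.g., a 7-point-scale outcome
group_b = rng.normal(loc=5.03, scale=1.0, size=n)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p = {p_value:.4f}, Cohen's d = {cohens_d:.3f}")
# Typically p < .05 while d stays around 0.03: statistically significant,
# managerially meaningless unless effect size is foregrounded.
```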
The paper’s second contribution is the consolidated guideline set, framed as corrective actions directly tied to the empirical deficiencies it observed. It starts with basics that are often skipped in manuscripts: define objectives and population, then explicitly choose and justify the sampling method (probability vs non-probability), because what counts as “adequate” depends on what you are trying to generalize to (Ali, 2026). It then forces a decision that many papers blur: whether the population is finite and known or effectively infinite, because that choice determines whether population-based formulas/tables are even logically applicable (Ali, 2026). The centerpiece remains power analysis, but the emphasis is not “do it once”: it is “report the parameters,” align the estimation with the statistical model actually used, and treat the calculation as part of design transparency rather than a ceremonial citation (Ali, 2026). The guideline set also includes pragmatic steps researchers routinely forget to document, like inflating target N for non-response (Ali, 2026). Finally, it pushes a norm shift: sample size is not a mechanical threshold but a theoretically consequential design choice that affects credibility, replicability, and interpretability (Ali, 2026).
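As a rough illustration of how those steps can be documented together, the sketch below strings the finite-vs-infinite decision and the non-response inflation into one transparent calculation. It assumes a proportion-estimation goal and Cochran’s formula, and every default value (95% confidence, p = 0.5, ±5% margin of error, 25% expected non-response) is an illustrative assumption rather than a recommendation from the paper.

```python
from math import ceil

def planned_sample_size(z=1.96, p=0.5, e=0.05, population=None, nonresponse=0.25):
    """Sketch of a documented sample-size plan for estimating a proportion.

    z            critical value for the chosen confidence level (1.96 ~ 95%)
    p            assumed population proportion (0.5 is the most conservative)
    e            tolerated margin of error
    population   known population size, or None if effectively infinite/undefined
    nonresponse  expected non-response rate used to inflate the target N
    """
    n0 = (z ** 2) * p * (1 - p) / e ** 2      # Cochran's infinite-population formula
    if population is not None:                 # finite, known population: apply the correction
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0 / (1 - nonresponse))        # inflate target N for expected non-response

print(planned_sample_size())                   # undefined population -> 513 invitations
print(planned_sample_size(population=2_000))   # known population of 2,000 -> 430 invitations
```

The point is not this particular formula; it is that each input, and the decision about whether the population is finite, is written down where a reviewer can check it.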
Mini glossary of key concepts
Sample size. The number of participants or observational units included in a study, intended to support inference about a target population. If the sample is too small, statistical power drops and results become more uncertain; if it is unnecessarily large, it can waste resources and increase the risk of “statistically significant but practically trivial” findings (Ali, 2026).
Statistical power. The probability of detecting an effect of a specified size, if it truly exists, at a given alpha level; equivalently, 1 − β. Adequate power increases the chance of detecting meaningful effects and reduces Type II error risk (Ali, 2026).
Power analysis. An approach to calculating the minimum required sample size using parameters such as target power, alpha, expected effect size, and variance. The paper’s emphasis is not that power analysis should be merely mentioned, but that it should be reported with its parameters (Ali, 2026).
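For reference, one common closed-form version for comparing two group means with equal group sizes is shown below; it is an illustration of the parameters involved, not a formula singled out by the paper.

\[
n_{\text{per group}} \;\approx\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{d^{2}},
\qquad\text{e.g. } \alpha = .05,\ 1-\beta = .80,\ d = 0.30:\quad
\frac{2\,(1.96 + 0.84)^{2}}{0.30^{2}} \approx 174.
\]

Rounded up, and nudged slightly higher by the exact t-distribution calculation, this is the 175–176 per group that a power package reports for the same inputs.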
Effect size. A measure that expresses the magnitude of a relationship or difference; because large samples can make p-values “significant” too easily, reporting effect sizes is critical for judging practical importance (Ali, 2026).
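A common example is Cohen’s d for a two-group difference, shown here for illustration; the reviewed studies span many designs and effect-size metrics.

\[
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{s_1^{2} + s_2^{2}}{2}} \ \ \text{(equal group sizes)}.
\]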
Type I error (false positive). Concluding that an effect exists when it actually does not; its probability is denoted by alpha (α) (Ali, 2026).
Type II error (false negative). Missing an effect that truly exists; insufficient power and insufficient sample size increase this risk (Ali, 2026).
Underpowered study. A study in which the sample size is not sufficient to detect meaningful effects; results may look like “no effect” while actually reflecting missed effects (Ali, 2026).
Overpowered study. A situation where an excessively large sample makes even very small, practically unimportant effects statistically significant; the paper treats this as a threat to the quality of managerial inference in services research (Ali, 2026).
No justification. Failure to report any procedure or rationale for how the sample size was determined; the review finds this category to be overwhelmingly dominant (Ali, 2026).
Finite vs. infinite/undefined population. If the population size is known, it is treated as finite and appropriate formulas/tables should be selected accordingly; if the population is undefined, automatically applying the same tools can be incorrect, and the paper reports this confusion as a common misuse (Ali, 2026).
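The standard adjustment, shown here as a sketch, makes the logic visible: with an infinite-population requirement n₀ and a known population of size N,

\[
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}},
\]

so when N is very large the correction changes almost nothing, and applying a finite-population table to an undefined population quietly assumes an N that was never stated.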
Non-response inflation / oversampling. Increasing the initial target N by an expected non-response proportion so the final achieved sample does not fall below the required level; the paper frames this as a practical necessity especially for survey studies (Ali, 2026).
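The arithmetic itself is simple (the numbers below are illustrative): with a required analyzable sample n_req and an expected non-response rate r,

\[
n_{\text{invite}} = \left\lceil \frac{n_{\text{req}}}{1 - r} \right\rceil,
\qquad\text{e.g. } \frac{384}{1 - 0.25} = 512,
\]

that is, 512 invitations to end up with 384 usable responses; the paper’s expectation is that this step is reported rather than applied silently (Ali, 2026).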
Transparent reporting. Clear reporting of the sample size method, the parameters used (α, power, effect size), the rationale for choices, and any adjustments (e.g., design effect, loss/attrition); the paper’s “credibility” claim is grounded in this reporting norm (Ali, 2026).
Reference: Ali, F. (2026). Sample size practices and guidelines in services marketing survey research. The Service Industries Journal. Advance online publication. https://doi.org/10.1080/02642069.2026.2612706
