Project title: The Bayesian revolution in L2 research: An applied approach
Status: under review
Collaborators: Reza Norouzian, Michael de Miranda

The Frequentist method has long dominated quantitative L2 research (Plonsky, 2013). In recent years, however, a number of fields have begun to embrace an alternative known as the Bayesian method. Adopting an open-source approach, this article provides an applied, non-technical rationale for using Bayesian methods in L2 research. Specifically, we take three steps to achieve our goal. First, we compare the conceptual underpinnings of the Bayesian and the Frequentist methods. Second, using real as well as carefully simulated examples, we introduce and apply Bayesian methods to estimate effect sizes from t-test designs. Third, to promote the use of Bayesian methods in L2 research, we introduce a free, web-accessed, point-and-click software package developed by the first author of the present study. Additionally, we demonstrate how our Bayesian method applies to past studies. Practical and theoretical dimensions of a “Bayesian revolution” in L2 research are discussed.


Project title: A Bayesian approach to measuring evidence in L2 research: An empirical investigation
Status: under review
Collaborators: Reza Norouzian, Michael de Miranda

Null hypothesis testing has long been “the go-to analytic approach” in quantitative second-language (L2) research (Norris, 2015, p. 97). To many, however, years of reliance on this approach have resulted in a crisis of inference across the social and behavioral sciences (e.g., Rouder, Morey, Verhagen, Province, & Wagenmakers, 2016). As an alternative to null hypothesis testing, many such experts recommend Bayesian hypothesis testing. Adopting an open-source framework, the present study (a) re-evaluates the empirical findings of 418 t-tests from published L2 research using Bayesian hypothesis testing, and (b) compares the Bayesian results with their conventional, null hypothesis testing counterparts as observed in the original reports. The results show that the Bayesian and the null hypothesis testing approaches generally arrive at similar inferential conclusions. However, considerable differences arise in the rejections of the null hypothesis. Notably, in 64.06% of cases in which p-values fell between .01 and .05 (i.e., evidence to reject the null), the Bayesian analysis found the evidence in the primary studies to be only at an “anecdotal” level (i.e., insufficient evidence to reject the null). Practical implications, field-wide recommendations, and an introduction to free online software for Bayesian hypothesis testing are discussed.
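
To illustrate what an “anecdotal” level of evidence means in practice, the following Python sketch labels a Bayes factor (BF10, the evidence for the alternative relative to the null). The abstract does not state which cutoffs the study used; the thresholds below (3, 10, 30, 100) are an assumption based on a commonly cited Jeffreys-type classification, and the function name classify_bf10 is hypothetical.

```python
def classify_bf10(bf10: float) -> str:
    """Label a Bayes factor BF10 (evidence for H1 relative to H0).

    The cutoffs are an ASSUMPTION (a commonly cited Jeffreys-type scheme);
    the study summarized above may have used different ones.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return "no evidence either way"

    # Evidence for the null is treated symmetrically via 1 / BF10.
    favored = "H1" if bf10 > 1 else "H0"
    strength = bf10 if bf10 > 1 else 1.0 / bf10

    # Assumed upper bounds for each evidence category.
    cutoffs = [
        (3, "anecdotal"),
        (10, "moderate"),
        (30, "strong"),
        (100, "very strong"),
    ]
    for upper, label in cutoffs:
        if strength < upper:
            return f"{label} evidence for {favored}"
    return f"extreme evidence for {favored}"


# Hypothetical Bayes factors, not values from the study.
for bf in (2.0, 0.2, 150.0):
    print(bf, "->", classify_bf10(bf))
```

Under these assumed cutoffs, any BF10 between 1 and 3 would be labeled anecdotal, which is the kind of result the abstract reports for many t-tests with p-values between .01 and .05.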


Project title: Multiple regression in L2 research: A methodological synthesis and meta-analysis of R2 values
Status: in preparation
Collaborator: Hessam Ghanbar

Multiple regression is a family of statistics used to investigate the relationship between a set of predictors and a criterion (dependent) variable. This procedure is applicable in a variety of research contexts and data structures. Consequently, and following quantitative traditions in education and psychology (see Skidmore & Thompson, 2010), second-language (L2) researchers have turned increasingly to multiple regression. The present study employs research-synthetic techniques to describe and evaluate the use of this procedure in the field. A total of 541 regression analyses (K = 171) were coded for different models, variables, procedures, reporting practices, and overall variance explained (R2). Summary results reveal a number of inconsistencies and missed opportunities as well as a lack of transparency (see Larson-Hall & Plonsky, 2015). The distribution of R2 values (Median = .32) is described to facilitate the use and interpretation of regression analyses. We also provide specific, empirically grounded recommendations for future research.


Project title: Eta- and Partial Eta-Squared in L2 Research: A cautionary review and guide to more appropriate usage
Status: in press (available on “early view” here)
Collaborator: Reza Norouzian

Eta-squared (η2) and partial eta-squared (ηp2) are effect sizes that express the amount of variance accounted for by one or more independent variables. These indices are generally used in conjunction with ANOVA, the most commonly used statistical test in second language (L2) research (Plonsky, 2013). Consequently, it is critical that these effect sizes be applied and interpreted appropriately. The present study examines the use of these two effect sizes in L2 research. We begin by outlining the statistical and conceptual foundations of, and the distinction between, η2 and ηp2. We then review the use of these indices in a sample of published L2 research (K = 156). Among other results, we show that ηp2 values are frequently mislabeled as η2. We interpret and discuss potential causes and consequences related to the confusion surrounding these related but distinct indices. Within the context of reform efforts in quantitative L2 research, the current study seeks to respond to the recent, pointed calls for improving study quality (Plonsky, 2014) and statistical literacy (Loewen et al., 2014) in the field.
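
The distinction between the two indices can be made concrete with a short Python sketch based on their standard definitions: η2 divides the effect's sum of squares by the total sum of squares, whereas ηp2 divides it by the effect's sum of squares plus the error sum of squares. The sums of squares below are hypothetical values for a two-way between-subjects ANOVA, chosen only for illustration.

```python
def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Proportion of TOTAL variance attributable to the effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Proportion of variance attributable to the effect after
    excluding variance explained by the other effects in the model."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares for factors A, B, their interaction, and error.
ss = {"A": 20.0, "B": 30.0, "AxB": 10.0, "error": 40.0}
ss_total = sum(ss.values())

eta2_a = eta_squared(ss["A"], ss_total)
partial_eta2_a = partial_eta_squared(ss["A"], ss["error"])

print(f"eta-squared for A:         {eta2_a:.3f}")
print(f"partial eta-squared for A: {partial_eta2_a:.3f}")
```

In a one-way between-subjects design the two indices coincide, because the total sum of squares is simply the effect plus error. In multifactor designs ηp2 is never smaller than η2, so reporting a ηp2 value under the η2 label can overstate the share of total variance explained.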


Project title (tentative): A Meta-Analysis of Second Language Pragmatics Instruction
Status: in press in N. Taguchi (Ed.), The Routledge handbook of SLA and pragmatics
Collaborator: Jingyuan Zhuang

Studies of instructional effectiveness comprise one of the most prominent and longstanding lines of inquiry in the field of second language acquisition. The bulk of empirical efforts in this domain have targeted second language (L2) grammar and vocabulary. Nevertheless, there is now a substantial body of studies that examine the effects of L2 pragmatics instruction as well (see Taguchi, 2015). Jeon and Kaya (2006) meta-analyzed 13 such studies and found them to produce generally positive and moderate effects (d = .59, 95% CI [0.05, 1.13] for between-group contrasts). Since then, this domain has matured, producing a great deal of additional evidence. The present study seeks to update and extend Jeon and Kaya (2006) to provide a more precise, stable, and current estimate of the overall effectiveness of L2 pragmatics instruction. Toward these ends, following a comprehensive literature search, 50 primary studies (98 samples) were coded for substantive and methodological features as well as their corresponding effect sizes (Cohen’s d). The aggregated results revealed a large overall effect of L2 pragmatics instruction (d = 1.68 and 1.61 for between- and within-group contrasts, respectively) and considerable retention of instructional effects over time. The results also indicate several relationships between instructional effectiveness and moderators related to learning contexts, treatment features, and outcome measures. We interpret the results in relation to previous reviews of pragmatics instruction and of instructed second language acquisition in general, and we provide empirically grounded suggestions for future research as well as practice in this domain.
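
For readers less familiar with the effect size used here, the Python sketch below computes Cohen’s d for a between-group contrast from summary statistics, along with an approximate 95% confidence interval. The pooled-standard-deviation formula is the standard one; the interval relies on a common large-sample approximation of the standard error of d, which is an assumption on our part and not necessarily the procedure used in the meta-analysis. All input values are hypothetical.

```python
import math


def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def approx_ci_95(d, n_t, n_c):
    """Approximate 95% CI for d (ASSUMED large-sample standard error)."""
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d - 1.96 * se, d + 1.96 * se


# Hypothetical posttest scores: treatment vs. control group.
d = cohens_d(mean_t=78.0, mean_c=70.0, sd_t=10.0, sd_c=10.0, n_t=25, n_c=25)
lower, upper = approx_ci_95(d, n_t=25, n_c=25)
print(f"d = {d:.2f}, approximate 95% CI [{lower:.2f}, {upper:.2f}]")
```

With small samples the interval is wide, which is one reason a synthesis of 50 studies can yield a more precise and stable estimate than one based on 13.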


Project title: Meta-analysis in language testing: A second-order review and call for future research
Status: in preparation
Collaborator: Sumi Han

Researchers interested in language assessment, as in other domains of applied linguistics, have begun to turn in recent years to meta-analysis as the preferred means of quantitatively synthesizing previous research. This study presents a description and evaluation of seven such meta-analyses resulting from an extensive search. Each meta-analysis is coded and evaluated to examine a number of methodological and substantive features. The results indicate a need for improved rigor and transparency in meta-analytic practice among language testing researchers. For example, meta-analyses of language assessment generally lack comprehensive searches for primary studies; they also fall short in transparency surrounding the development and implementation of their data collection instruments. In addition to providing empirically based suggestions for improving synthetic methods, we identify a number of subdomains of language assessment that may be fruitfully explored and synthesized using a meta-analytic approach. By calling attention to these issues now, we hope to encourage more high-quality applications of meta-analysis in language testing.


Project title: Sampling and generalizability in quantitative L2 research
Status: in preparation
Presented at: 2014 SLS Symposium; Workshop on Reliability and Validity in SLA Research

Like researchers in many other fields, second language acquisition (SLA) researchers are often interested in generalizing their findings beyond the samples they collect data from. Currently, however, very little is known about the range of learner types and contexts that have been examined despite interest in generalizing findings to them. The study proposed here uses synthetic techniques to examine the extent to which concerns expressed over this issue in recent years are merited and worthy of further attention (e.g., Bigelow & Watson, 2012; Byrnes, 2013; Rose, 2005; Zhao, 2003). Specifically, I survey a body of quantitative studies published during a 6-year period (2008-2012) in six journals of second language research to extract data related to samples, demographics, and contexts. These data are then used to determine the extent to which L2 research has sampled, and might generalize to, different learner populations and contexts. The results seek to inform future research design and contribute to the incipient methodological reform movement in SLA.


Project title: The development of statistical literacy in applied linguistics graduate students
Status: in press (International Journal of Applied Linguistics)
Collaborators: Shawn Loewen, Talip Gonulal

As is true of all fields that employ quantitative methods, statistics play a crucial role in analyzing data in second language (L2) research. In fact, given that the use of statistics in L2 research has increased considerably since the L2 field’s inception (Brown, 2004; Loewen & Gass, 2009), statistical literacy appears to be an important skill for both producers and consumers of L2 research. Although there has been some investigation into statistical literacy among L2 researchers, no research to date has examined how such literacy is obtained by either professors or graduate students. In this study, we investigated the statistical literacy development of a group of L2 graduate students during a semester-long, discipline-specific quantitative research methods course. Participants completed a pre-course and a post-course survey. The results indicate that participants increased their knowledge of basic descriptive statistics and common inferential statistics to a great extent. Furthermore, participants reported that they felt more confident interpreting and using statistics. Based on these findings, recommendations for improving methodological practices in our field are provided.


Project title: The effectiveness of teaching formulaic sequences in L2 learning: A meta-analysis
Status: in preparation
Collaborators: Sumi Han, Bek Nurmukhamedov

A great deal of recent research has investigated the effectiveness of teaching L2 formulaic sequences (FSs). The overall effects of such interventions are, however, unknown, as is the role of various contextual and pedagogical features (e.g., explicitness of instruction; treatment length). To address this gap, this meta-analysis aimed to (a) examine overall study features of previous L2 FS instruction (e.g., context, treatment, instrumentation); (b) assess the overall effect of FS instruction in L2 learning; and (c) examine study features and potential moderators of the effectiveness of FS instruction. A comprehensive and systematic search yielded 29 studies with 61 samples. The sample was then coded for a number of substantive and methodological features as well as effect sizes (Cohen’s d). Weighted mean effects were found to be d = 1.67 (k = 19) for between-group (i.e., control-experimental) contrasts and d = 1.26 (k = 28) for within-group (pretest-posttest) contrasts, both large relative to other domains of L2 research (Plonsky & Oswald, forthcoming). These effects also varied across a number of moderators (e.g., explicit vs. implicit FS instruction: d = 2.96 vs. 1.10, respectively). We discuss our results in the context of L2 vocabulary pedagogy, and we provide direction for future studies seeking to test the effects of FS instruction.
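
As a rough illustration of how a weighted mean effect is obtained, the Python sketch below averages effect sizes using inverse-variance weights, so that more precise samples count for more. The abstract does not specify the weighting scheme; inverse-variance weighting is an assumption here (sample-size weighting is another common choice), and the effect sizes and variances are hypothetical rather than drawn from the 61 samples.

```python
def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean of effect sizes.

    ASSUMPTION: each effect is weighted by 1 / variance; the
    meta-analysis summarized above may have weighted differently.
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("Provide one positive variance per effect size.")
    if any(v <= 0 for v in variances):
        raise ValueError("Variances must be positive.")
    weights = [1.0 / v for v in variances]
    weighted_sum = sum(w * d for w, d in zip(weights, effects))
    return weighted_sum / sum(weights)


# Hypothetical Cohen's d values and their sampling variances.
effects = [1.0, 2.0, 1.5]
variances = [0.1, 0.2, 0.1]

print(f"weighted mean d = {weighted_mean_effect(effects, variances):.2f}")
print(f"unweighted mean d = {sum(effects) / len(effects):.2f}")
```

Here the least precise sample (variance 0.2) happens to carry the largest effect, so the weighted mean falls below the simple average. A moderator analysis such as the explicit vs. implicit comparison would apply the same calculation separately within each subgroup.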