Project title: A proposed framework for assessing research on L2 pronunciation instruction: A systematic review and meta-analysis
Status: under review
Collaborators: Kazuya Saito
Presented at: SLRF 2016, GURT 2017

We propose a framework for conceptualizing measures of instructed L2 pronunciation proficiency according to three sets of parameters: (a) the constructs being focused on (global vs. specific), (b) the scoring method (human raters vs. acoustic analyses), and (c) the type of knowledge being elicited (controlled vs. spontaneous). Using a synthetic approach, we apply this framework to the instruments found in 86 studies of L2 pronunciation instruction (see Lee, Jang & Plonsky, 2015), calculate the frequency of each measurement type, and re-examine the interaction of instructional effectiveness and measurement within the sample. According to the results, instruction is most effective when it targets learners’ monitored production of specific segmental/suprasegmental features. The efficacy of instruction remains relatively unclear when gains are measured via subjective/human judgements, especially at a spontaneous level. We discuss how the findings can improve research designs in L2 pronunciation research and, more generally, strengthen the interface between pronunciation instruction, assessment, and SLA.


Project title: The use of course grades as metrics in L2 research: A systematic review
Status: under review
Collaborators: Alan Brown, Yasser Teimouri

Much of applied linguistics research is concerned with classroom-based second-language (L2) development. Such a setting is ideal for examining the institutional ecology of L2 learning, teaching, use, policy, and assessment. Building on existing guidance for conducting L2 research, some scholars in second language acquisition (SLA) continue to call for greater attention to the operationalization of constructs, selection of valid assessments, and use of appropriate statistical analyses (e.g., Al-Hoorie and Vitta, 2018; Plonsky, 2013). Thomas (2006) and Tremblay (2011), for example, argue that the construct of proficiency, despite its near ubiquity in L2 research, is not adequately defined or validly measured, thus calling into question subsequent findings and conclusions. One cause for concern in this area is the use of impressionistic and teacher-constructed evaluations for research purposes, commonly found both as individual assessments and, in their composite form, as course grades and grade point averages (GPAs). The present study examines the use of course grades in L2 research. To do so, we adopt a synthetic approach involving systematic searching, coding, and analysis of course grades as variables in a sample of published L2 research (Plonsky, 2015). More concretely, we examine the use and justification of grades in empirical research published in four prominent journals that focus on L2 teaching and learning. We also illustrate the use of grades as a research tool by presenting the results of a meta-analysis of the relationship between grades and L2 anxiety. Based on our review, we make a number of suggestions for the use of grades and other assessment tools in educationally oriented L2 research.


Project title: Second language anxiety and achievement: A meta-analysis
Status: under review
Collaborators: Yasser Teimouri, Julia Goetze

Second-language (L2) anxiety has been the object of constant empirical and theoretical attention for several decades. As a matter of both theoretical and practical interest, much of the research in this domain has examined the relationship between anxiety and L2 achievement. The present study meta-analyzes this body of research. Following a comprehensive search, a sample of 88 reports was identified, contributing 96 independent samples (N = 18,278) from 21 countries. In the aggregate, the 204 effect sizes (i.e., correlations) reported in the primary studies yielded a mean of r = −.36 for the relationship between L2 anxiety and language achievement. Moderator analyses revealed that effect sizes varied across different types of language achievement measures, study contexts, educational levels, target languages, and anxiety types. Overall, the results of this study provide firm evidence for both the negative role of L2 anxiety in L2 learning and the moderating effects of a number of (non)linguistic variables. We discuss the findings in the context of theoretical and practical concerns, and we provide direction for future research.
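
As a rough illustration of how correlations from independent samples can be combined into a single mean, the sketch below uses Fisher's z transformation with weights of n − 3. This is one common approach and not necessarily the weighting scheme used in the study; the function name and the (r, n) pairs are hypothetical and are not data from the meta-analysis.

```python
import math

def mean_correlation(samples):
    """Fixed-effect weighted mean of correlations via Fisher's z.

    samples: list of (r, n) pairs, each with n > 3.
    Returns (mean_r, ci_low, ci_high), where the interval is a 95% CI.
    """
    weights = [n - 3 for _, n in samples]      # inverse variance of Fisher's z
    zs = [math.atanh(r) for r, _ in samples]   # r -> z
    total_w = sum(weights)
    mean_z = sum(w * z for w, z in zip(weights, zs)) / total_w
    se = 1 / math.sqrt(total_w)
    low, high = mean_z - 1.96 * se, mean_z + 1.96 * se
    # Back-transform z -> r for reporting.
    return math.tanh(mean_z), math.tanh(low), math.tanh(high)

# Hypothetical (r, n) pairs, for illustration only.
hypothetical = [(-0.30, 120), (-0.42, 85), (-0.35, 200)]
mean_r, ci_low, ci_high = mean_correlation(hypothetical)
print(f"mean r = {mean_r:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

A random-effects model, which also estimates between-study variance, would typically be preferred when effects are expected to vary across contexts, as the moderator analyses above suggest.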


Project title: The Bayesian revolution in L2 research: An applied approach
Status: accepted / in press (Language Learning)
Collaborators: Reza Norouzian, Michael de Miranda

Frequentist methods have long dominated quantitative L2 research (Plonsky, 2013). Recently, however, a number of fields have begun to embrace an alternative known as the Bayesian method (see, e.g., Kruschke, Aguinis, & Joo, 2012). Using an open-source approach, this article provides an applied, non-technical rationale for Bayesian methods in L2 research. Specifically, we take three steps to achieve our goal. First, we compare the conceptual underpinnings of Bayesian and Frequentist methods. Second, using real as well as carefully simulated examples, we introduce and apply Bayesian methods to estimate effect sizes from t-test designs. Third, to promote the use of Bayesian methods in L2 research, we introduce a free, web-accessed, point-and-click software package as well as a suite of flexible R functions. Additionally, we demonstrate Bayesian methods for secondary analysis. Practical and theoretical dimensions of a “Bayesian revolution” for L2 research are discussed.


Project title: A Bayesian approach to measuring evidence in L2 research: An empirical investigation
Status: under review
Collaborators: Reza Norouzian, Michael de Miranda

Null hypothesis testing has long been “the go-to analytic approach” in quantitative second-language (L2) research (Norris, 2015, p. 97). To many, however, years of reliance on this approach have resulted in a crisis of inference across the social and behavioral sciences (e.g., Rouder, Morey, Verhagen, Province, & Wagenmakers, 2016). As an alternative to the null hypothesis testing approach, many such experts recommend the Bayesian hypothesis testing approach. Adopting an open-source framework, the present study (a) re-evaluates the empirical findings of 418 t-tests from published L2 research using Bayesian hypothesis testing, and (b) compares the Bayesian results with their conventional, null hypothesis testing counterparts as observed in the original reports. The results show that the Bayesian and the null hypothesis testing approaches generally arrive at similar inferential conclusions. However, considerable differences arise in the rejections of the null hypothesis. Notably, in 64.06% of cases in which p-values fell between .01 and .05 (i.e., evidence to reject the null), the Bayesian analysis found the evidence in the primary studies to be only at an “anecdotal” level (i.e., insufficient evidence to reject the null). Practical implications, field-wide recommendations, and an introduction to free online software for Bayesian hypothesis testing are discussed.
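
To make the contrast concrete, the sketch below converts a t statistic into an approximate Bayes factor using the BIC (unit-information) approximation. This is a simple stand-in chosen for illustration, not the method or software used in the study. The function names, the evidence thresholds (one conventional labeling scheme), and the example t-test are all hypothetical.

```python
import math

def bf01_bic(t, n, df):
    """Approximate Bayes factor for the null over the alternative.

    Uses the BIC approximation: BF01 = sqrt(n) * (1 + t^2 / df) ** (-n / 2),
    where n is the total number of observations and df the t-test's
    degrees of freedom.
    """
    return math.sqrt(n) * (1 + t * t / df) ** (-n / 2)

def evidence_label(bf10):
    """Label evidence for the alternative using conventional cut-offs."""
    if bf10 < 1:
        return "favors the null"
    if bf10 < 3:
        return "anecdotal"
    if bf10 < 10:
        return "moderate"
    return "strong or more"

# Hypothetical independent-samples t-test: two groups of 20, t(38) = 2.1,
# which corresponds to a two-tailed p-value of roughly .04.
bf10 = 1 / bf01_bic(t=2.1, n=40, df=38)
print(f"BF10 = {bf10:.2f} ({evidence_label(bf10)})")
```

Under these assumptions, a result that is statistically significant at the .05 level yields a Bayes factor below 3, which mirrors the kind of discrepancy described above.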


Project title: L2 Grit: Perseverance and passion in L2 learning
Status: in preparation
Collaborators: Yasser Teimouri, Farhad Tabandeh
Presented at: SLRF 2017; SLA Research Methodology, 2018

As a higher-order personality trait, “grit” has been defined as a combination of perseverance and passion for long-term goals (Duckworth, Peterson, Matthews, & Kelly, 2007). Past research in social psychology has found grit to be an important predictor of success across different populations in various areas (see Duckworth, 2013; but cf. Crede et al., 2016). Since successful mastery of an L2 is highly dependent on L2 learners’ sustained effort over a long period of time (e.g., Dörnyei & Ryan, 2015; Dörnyei & Ushioda, 2015), the study of grit and its effects on students’ language achievement is immediately relevant in SLA. In the present study, a language-specific grit scale was developed and validated to measure L2 learners’ grit. We also examined the relationship between L2 grit and learners’ motivational behaviors, emotions, and language achievements. A total of 91 English University students studying English as a Foreign Language (EFL) participated. Following item analysis and various checks on internal consistency and construct validity, the results of a Principal Component Analysis (PCA) revealed two related but distinct components of the L2 grit scale: perseverance and passion. Reliability analyses also evidenced high internal consistency of the scale. A moderate, positive correlation between L2 grit and the domain-general grit scale (r = .53; Duckworth & Quinn, 2009) further substantiated the validity of L2 grit as a related but distinct construct. Correlational analyses showed consistent positive relationships between L2 grit and students’ effort (r = .45; see Taguchi, Magid, & Papi, 2009), second language willingness to communicate (L2 WTC) (r = .39; MacIntyre, Dörnyei, Clément, & Noels, 1998), and mental attentiveness in class (r = .43). L2 grit was also found to be positively related to students’ language achievements as measured by their English course grades (rs = .31-.33; i.e., Grammar course, Speaking course, Laboratory course) as well as their GPA.
Taken together, and consistent with results of past research in social psychology, L2 grit was found to be a positive factor associated with language achievements of L2 learners. Theoretical and pedagogical implications of the study are discussed in detail.


Project title: Multiple regression in L2 research: A methodological synthesis and guide to interpreting R² values
Status: in press (Modern Language Journal)
Collaborator: Hessam Ghanbar

Multiple regression is a family of statistics used to investigate the relationship between a set of predictors and a criterion (dependent) variable. This procedure is applicable in a variety of research contexts and data structures. Consequently, and following quantitative traditions in education and psychology (see Skidmore & Thompson, 2010), second-language (L2) researchers have turned increasingly to multiple regression. The present study employs research-synthetic techniques to describe and evaluate the use of this procedure in the field. A total of 541 regression analyses (K = 171) were coded for different models, variables, procedures, reporting practices, and overall variance explained (R²). Summary results reveal a number of inconsistencies and missed opportunities as well as a lack of transparency (see Larson-Hall & Plonsky, 2015). The distribution of R² values (Median = .32) is described to facilitate the use and interpretation of regression analyses. We also provide specific, empirically grounded recommendations for future research.


Project title (tentative): A Meta-Analysis of Second Language Pragmatics Instruction
Status: in press in N. Taguchi (Ed.), The Routledge handbook of SLA and pragmatics
Collaborator: Jingyuan Zhuang

Studies of instructional effectiveness comprise one of the most prominent and longstanding lines of inquiry in the field of second language acquisition. The bulk of empirical efforts in this domain have targeted second language (L2) grammar and vocabulary. Nevertheless, there is now a substantial body of studies that examine the effects of L2 pragmatics instruction as well (see Taguchi, 2015). Jeon and Kaya (2006) meta-analyzed 13 such studies and found them to produce generally positive and moderate effects (d = .59, 95% CI [0.05, 1.13] for between-group contrasts). Since then, this domain has matured, producing a great deal of additional evidence. The present study seeks to update and extend Jeon and Kaya (2006) to provide a more precise, stable, and current estimate of the overall effectiveness of L2 pragmatics instruction. Toward these ends, following a comprehensive literature search, 50 primary studies (98 samples) were coded for substantive and methodological features as well as their corresponding effect sizes (Cohen’s d). The aggregated results revealed a large overall effect of L2 pragmatics instruction (d = 1.68 and 1.61 for between- and within-group contrasts, respectively) and considerable retention of instructional effects over time. The results also indicate several relationships between instructional effectiveness and moderators related to learning contexts, treatment features, and outcome measures. We interpret the results in relation to previous reviews of pragmatics instruction and to instructed second language acquisition in general, and we provide empirically grounded suggestions for future research as well as practice in this domain.
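
For readers unfamiliar with the effect size index used here, the sketch below shows how Cohen's d is commonly computed for a between-group contrast (using the pooled standard deviation) and how several d values might be averaged with sample-size weights. The weighting scheme is an assumption for illustration and may differ from the one used in the study; the function names and all numbers are hypothetical.

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d for a treatment vs. control contrast, using the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

def weighted_mean_d(effects):
    """Sample-size weighted mean of effect sizes.

    effects: list of (d, n_total) pairs.
    """
    total_n = sum(n for _, n in effects)
    return sum(d * n for d, n in effects) / total_n

# Hypothetical posttest scores: treatment M = 78, control M = 70,
# both SD = 10, with 30 learners per group.
d = cohens_d(78, 10, 30, 70, 10, 30)
print(f"d = {d:.2f}")

# Hypothetical set of (d, total N) pairs from three primary studies.
overall = weighted_mean_d([(0.8, 60), (1.5, 40), (2.1, 25)])
print(f"weighted mean d = {overall:.2f}")
```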


Project title: Meta-analysis in language testing: A second order review and call for future research 
Status: in preparation
Collaborator: Sumi Han

Researchers interested in language assessment, as in other domains of applied linguistics, have begun to turn in recent years to meta-analysis as the preferred means of quantitatively synthesizing previous research. This study presents a description and evaluation of seven such meta-analyses resulting from an extensive search. Each meta-analysis is coded and evaluated to examine a number of methodological and substantive features. The results indicate a need for improved rigor and transparency in meta-analytic practice among language testing researchers. For example, meta-analyses of language assessment generally lack comprehensive searches for primary studies; they also fall short in transparency surrounding the development and implementation of their data collection instruments. In addition to providing empirically based suggestions for improving synthetic methods, we identify a number of subdomains of language assessment that may be fruitfully explored and synthesized using a meta-analytic approach. By calling attention to these issues now, we hope to encourage more high-quality applications of meta-analysis in language testing.


Project title: Sampling and generalizability in quantitative L2 research
Status: in preparation
Presented at: 2014 SLS Symposium; Workshop on Reliability and Validity in SLA Research

Like researchers in many other fields, second language researchers are often interested in generalizing their findings beyond the samples they collect data from. Currently, however, very little is known about the range of learner types and contexts that have been examined, despite interest in generalizing findings to them. The current study uses synthetic techniques to examine the extent to which concerns expressed over this issue in recent years are merited and worthy of further attention (e.g., Bigelow & Watson, 2012; Byrnes, 2013; Rose, 2005; Zhao, 2003). Data were collected from three sources. First, sampling, demographic, and contextual data were hand-coded for a body of approximately 600 studies published during a 6-year period across six journals. Second, the study examined and recorded demographic data as reported in 100+ meta-analyses of L2 research. The third and final data source represents a novel and potentially powerful approach to collecting synthetic data: Corpus linguistic techniques were utilized to extract demographic and contextual features from a sample of 8000+ primary reports of L2 research (L2RC; Plonsky, n.d.). The data from these sources are then used to determine the extent to which L2 research has sampled, and might or might not be able to generalize to, different learner populations and contexts. The results seek to inform future research design and contribute to the incipient methodological reform movement in applied linguistics.


Project title: The Quality and Identity of L2 Research Journals: Ratings, Bibliometrics, and Trends
Status: in preparation
Collaborators: Ryan Blair, Kelsey Boyce, Amy Kim, Fei Li, Rachel Thorson Hernández, Yiran Xu, Jingyuan Zhuang

The importance of academic journals in second-language (L2) research is evident on at least two counts. Journals are, first of all, central to the process of disseminating scientific findings. Journals are also critical on a professional level, as most L2 researchers must publish articles to advance their careers. However, not all journals are perceived as equal; some may be considered more prestigious or of higher quality and may, therefore, achieve a greater impact on the field. It is therefore critical that we understand the identity and quality of L2 research journals, yet very little research (e.g., Egbert, 2007; VanPatten & Williams, 2002) has considered these issues to date. The current study sought to explore L2 journal identity and quality, and the relationship between these constructs. To do so, a database was compiled based on three different types of sources: (1) a questionnaire eliciting L2 researchers’ perceptions of the quality of 27 journals that publish L2 research (N = 327); (2) using the same 27 journals, manual coding of different types of articles (e.g., empirical studies, review papers), data (quantitative, qualitative, mixed), research settings, and authorship patterns (K = 2,024); and (3) bibliometric and submission data such as impact factors, citation counts, and acceptance rates. Descriptive statistics were computed to explore overall quality ratings as well as publication trends found in each journal. The relationships between those patterns and quality ratings were also examined. In addition, regression models were built to determine the extent to which perceptions of journal quality could be explained as a function of journal and article features. We discuss the findings of the study in terms of ongoing debates concerning publication practices, study quality, impact factors, journal selection, and the “journal culture” in applied linguistics.


Project title: The effectiveness of teaching formulaic sequences in L2 learning: A meta-analysis
Status: in preparation
Collaborators: Sumi Han, Bek Nurmukhamedov

A great deal of recent research has investigated the effectiveness of teaching L2 formulaic sequences (FSs). The overall effects of such interventions are, however, unknown, as is the role of various contextual and pedagogical features (e.g., explicitness of instruction; treatment length). To address this gap, this meta-analysis aimed to (a) examine overall study features of previous L2 FS instruction (e.g., context, treatment, instrumentation); (b) assess the overall effect of FS instruction in L2 learning; and (c) examine study features and potential moderators of the effectiveness of FS instruction. A comprehensive and systematic search yielded 29 studies with 61 samples. The sample was then coded for a number of substantive and methodological features as well as effect sizes (Cohen’s d). Weighted mean effects were d = 1.67 (k = 19) for between-group (i.e., control-experimental) contrasts and d = 1.26 (k = 28) for within-group (pretest-posttest) contrasts, both large relative to other domains of L2 research (Plonsky & Oswald, forthcoming). These effects also varied across a number of moderators (e.g., explicit vs. implicit FS instruction: d = 2.96 vs. 1.10, respectively). We discuss our results in the context of L2 vocabulary pedagogy, and we provide direction for future studies seeking to test the effects of FS instruction.