The Reality and Necessity of Teaching Singular THEY: JALT PanSIG Presentation



iWEB Corpus

The BYU Corpora bank, which of course includes COCA, has added a new 14-billion-word corpus. I haven’t looked at it in detail yet, but it seems to have a lot of useful new search features. There is an overview here:

DDL Meta-analyses

Last year saw the publication of the first large meta-analysis of DDL (Boulton & Cobb, 2017) and it had generally positive findings for the field; for example (emphasis added):

Focusing only on the most robust results (i.e., MVs with at least 10 unique samples in both P/P and C/E designs), 70% had large effects, 25% medium, and only 5% small or negligible. The most consistent large effects showed that DDL is perhaps most appropriate in foreign language contexts for undergraduates as much as graduates, for intermediate levels as much as advanced, for general as much as specific/academic purposes, for local as much as large corpora, for hands-on concordancing as much as for paper-based exploration, for learning as much as reference, and particularly for vocabulary and lexicogrammar. Many of these findings go against common perceptions, and the elements missing from the list (e.g., skills or other language areas) are for the most part missing because there is as yet insufficient research rather than because research evidence is against them.

I bolded “vocabulary and lexicogrammar” because in the past week another meta-analysis was published. This meta-analysis looked at corpus use and vocabulary learning (Lee, Warschauer, & Lee, 2018). They found that:

[O]ur meta-analysis showed a medium-sized effect on L2 vocabulary learning, with the greatest benefits for promoting in-depth knowledge to learners who have at least intermediate L2 proficiency. Corpus use was also more effective when the concordance lines were purposely selected and provided and when learning materials were given along with hands-on corpus-use opportunities. Moreover, we found that corpus use is still effective even without prior training and remains effective regardless of the corpus type or the length of the intervention.

The details of both papers are worth digging into, especially if one is statistically minded. Furthermore, Boulton & Cobb make a number of recommendations for what kinds of studies and data could benefit the field in the future, while Lee, Warschauer, & Lee also mention that certain types of data and studies could help with the problem of small sample sizes in future meta-analyses. In other words, if you do DDL or corpus-use research, or are interested in doing it, these papers point out gaps where research could be done and would be useful.

So the gist is that both papers are excellent contributions to understanding the effects of DDL and moderating influences, and they point to future research needs.

Boulton, A., & Cobb, T. (2017). Corpus Use in Language Learning: A Meta-Analysis. Language Learning, 67(2), 348-393.

Lee, H., Warschauer, M., & Lee, J.H. (2018). The Effects of Corpus Use on Second Language Vocabulary Learning: A Multilevel Meta-analysis. Applied Linguistics.


Conference: Data-driven learning in foreign language and CLIL classes

A multilingual (CfP in 5 languages) conference on DDL will be held on September 27-28 in Torino, Italy. The conference website is available here, and the CfP can be downloaded from this page.

This is an excerpt from the English CfP (bold emphasis in the original):

Researchers and teachers are invited to submit proposals discussing theoretical implications and practical applications of DDL in foreign language classes and in CLIL contexts.
Since one of the aims of the Conference is to reach out to the teachers in primary and secondary education, participants are also invited to deliver practical workshops and/or share their experience and examples of good practices with the other participants.

Other info

  • Keynote speakers: Alex Boulton, Fanny Meunier
  • Submission deadline: May 15, 2018

    Please follow the links above for details about submissions, registration, and other conference matters.

Me and I, I and Me: Frequency effects?

Russ Mayne wrote an interesting post a few days ago on the constructions “my wife and I” and “me and my wife”. I fully agree with him regarding the ‘correctness’ of either construction and the main thrust of his post, but I did have a thought about one very specific part of his post.

Here is the pertinent part of Russ’ post:

The words ‘Me and my wife’ are in the subject position (at the start of the sentence) and so we should use the subject pronoun ‘I’

English sentences usually start with subjects. so in ‘I love you’, I is the subject. If it were the object it would change to ‘me’ such as ‘you love me’. The sentence ‘me and my wife went to the party’ seems to flaunt this rule because ‘me’ is in the subject position and so it should be I.

The problem with this argument is, were it true, the sentence ‘I and my wife went to the party’ would be a perfectly proper sentence, after all, the subject is properly ‘I’. However, ‘I and my wife’ sounds a bit off to me. So is something else is going on here?

McWhorter makes the rather bold claim that ‘me’, not ‘I’ is in fact English’s subject pronoun and that I is a rather special word that is only used when there is only one subject before the verb. Therefore ‘I went to the party’ sounds OK, and ‘me and the lads went to the party’ sounds OK, but ‘I and the lads went to the party’ doesn’t sound right because there is more than one subject. I’d never heard this argument before but I’d welcome some disconfirming evidence.

I’m not going to claim that I have disconfirming evidence, but I would like to express my skepticism and suggest that frequency effects are playing a large role (there’s also an element in play here that pronouns are more slippery than many people realize).

I also find that the construction “I and my wife” sounds a bit off. However, I doubt the notion of “I is a rather special word that is only used when there is only one subject before the verb”. Rather, because I encounter “me” used in these constructions much more frequently than “I”, that is the construction that sounds right to me.

Here is a bit of data from the BNC:





Looking at these lines, “me and my wife” is more frequent than “I and my wife”, though only a relatively small sample was found (7:3). The open “me and my *” was more frequent than “I and my *”. In this case the ratio is 244:83. To me, those 83 occurrences of “I and my *” are interesting. The construction is in use. Is the sample large enough (are enough people using it) to assume these aren’t simply mistakes or production errors? I think for the moment it is. Presumably, the people using the construction don’t find anything about it that sounds wrong, or feel a need to stop and rephrase. And, as odd as it sounds to me, I don’t really think there is anything wrong with it.
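To put those ratios in more familiar terms, here is a minimal sketch that turns the counts reported above into proportions. The helper name `share` is my own, and the numbers are simply the BNC hits quoted in this post, not a fresh query.

```python
def share(count_a, count_b):
    """Return count_a as a proportion of the combined total."""
    return count_a / (count_a + count_b)


# Counts quoted above from the BNC searches.
me_open, i_open = 244, 83  # "me and my *" vs "I and my *"
me_wife, i_wife = 7, 3     # "me and my wife" vs "I and my wife"

print(f"'I and my *' share: {share(i_open, me_open):.1%}")     # 25.4%
print(f"'I and my wife' share: {share(i_wife, me_wife):.1%}")  # 30.0%
print(f"me:I ratio (open search): {me_open / i_open:.2f}")     # 2.94
```

So “I and my *” accounts for roughly a quarter of the open-search hits: a minority pattern, but hardly a vanishing one.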

A few years ago in Slate, Gretchen McCulloch wrote:

Your sense of English as a whole is really an abstract combination of all of the idiolects that you’ve experienced over the course of your life, especially at a young and formative age. The conversations you’ve had, the books you’ve read, the television you’ve watched: all of these give you a sense of what exists out there as possible variants on the English language. The elements that you hear more commonly, or the features that you prefer for whatever reason, are the ones you latch onto as prototypical.

I think that my preference for “me and my *” over “I and my *” isn’t rooted in anything inherent about the subjective/objective properties of I/me, but stems from the frequency of my encounters (including my own production) with the construction(s). In other words, I doubt that there is a binary I=sub. pronoun and me=obj. pronoun (or vice versa) quality at work here, but that either pronoun can be sub/obj depending on context, and that my idiolect (more specifically for the matter at hand, my internal grammar about which construction is preferred in certain situations) is influenced, if not driven, by frequency effects of usage.

In summary: I prefer “me and my *”. But the corpus data shows that some other people use “I and my *”. I think that both are grammatical and that the preference for one or the other is largely a product of frequency effects on the idiolect. That is, frequency of exposure affects whether something ‘sounds right’.
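If you want to see how the two constructions are distributed in text you have on hand (your own writing, say), a rough count is easy to script. This is only a sketch under a few assumptions: the function name `count_constructions` and the sample sentence are made up for illustration, the text is plain English, and the patterns match the word strings anywhere in a sentence rather than only in subject position, so the hits still need to be read by eye.

```python
import re
from collections import Counter

# "me" may be capitalized at the start of a sentence, so that pattern
# ignores case; "I" is matched case-sensitively.
PATTERNS = {
    "me and my *": re.compile(r"\bme and my (\w+)", re.IGNORECASE),
    "I and my *": re.compile(r"\bI and my (\w+)"),
}


def count_constructions(text):
    """Count hits for each construction in a plain-text string."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(text))
    return counts


sample = "Me and my wife went out. I and my brother stayed. Then me and my dog left."
print(count_constructions(sample))
```

A proper corpus tool will give you part-of-speech tags and concordance lines as well, which a bare regular expression cannot.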

(One more complicating factor: I don’t know enough about phonology to be sure, but I wouldn’t be surprised if the majority preference for “me and my *” had a phonological aspect, too. Something along the lines of a preference for duplicating the initial “m” or avoiding the rhyme of “I and my”. Then again, I might just be showcasing my idiolectal preference for both of those things 😉 )

Open corpora on Sketch Engine: EcoLexicon

If you work with or read about corpora, you are probably familiar with Sketch Engine. If you aren’t familiar with it, it is described on its own website as “the ultimate corpus tool”, and that’s maybe not an exaggeration. You can do a ton of cool stuff with it. Sketch Engine also provides access to hundreds of ready-to-use corpora in close to a hundred languages.

However, it requires a subscription (although a 30-day trial is available for starters). This puts some people off; after all, there are a lot of free resources out there.

What some people may not realize, though, is that there are some “open corpora” on Sketch Engine that can be explored (with all of Sketch Engine’s features) without registration.


Some interesting corpora there!

What inspired me to write this post is the presence in the list of open corpora of the EcoLexicon English Corpus, which was made available earlier this year.

The EcoLexicon is an environmental knowledge base and tool developed at the University of Granada. It is described as a knowledge base for “language specialists, domain experts, and the general public. Its representations are designed to help translators, technical writers, and environmental experts who need to access and better understand specialized environmental concepts with a view to writing or translating specialized and semi-specialized texts” (San Martín et al., 2017, p. 97).

The EcoLexicon English Corpus is a collection of English texts used in the EcoLexicon project. Searches can be limited according to domain of environmental studies, type of intended user of the text, geographical variety of English, country of publication, year of publication (1973-2016), publication genre, and who edited the text.

From an ELT perspective, perhaps certain ESP settings are the most obvious place for such a corpus to be useful. But really I’m just happy such a corpus has been made available to anyone who wants to explore language concerning environmental studies, policies, and communication.

The other open corpora are valuable too! They also let you try out Sketch Engine without committing to anything or even registering.

Antonio San Martín, Melania Cabezas-García, Miriam Buendía, Beatriz Sánchez-Cárdenas, Pilar León-Araúz, Pamela Faber. 2017. Recent Advances in EcoLexicon. Dictionaries: Journal of the Dictionary Society of North America, 38(1), pp. 96-115.

New to corpus linguistics? Here are the basics

This is a great little explainer from Warren M. Tang. It covers the basics of the basics, and provides ready-to-use definitions and descriptions. He has also got a straightforward glossary of corpus types.

I get questions from colleagues about the basics of corpora sometimes, and I’m always happy to find simple reference materials that can help them out just in case my explanations aren’t clear or head off on a tangent 🙂