Me and I, I and Me: Frequency effects?

Russ Mayne wrote an interesting post a few days ago on the constructions “my wife and I” and “me and my wife”. I fully agree with him regarding the ‘correctness’ of either construction and the main thrust of his post, but I did have a thought about one very specific part of his post.

Here is the pertinent part of Russ’ post:

The words ‘Me and my wife’ are in the subject position (at the start of the sentence) and so we should use the subject pronoun ‘I’ 

English sentences usually start with subjects. so in ‘I love you’, I is the subject. If it were the object it would change to ‘me’ such as ‘you love me’. The sentence ‘me and my wife went to the party’ seems to flaunt this rule because ‘me’ is in the subject position and so it should be I.

The problem with this argument is, were it true, the sentence ‘I and my wife went to the party’ would be a perfectly proper sentence, after all, the subject is properly ‘I’. However, ‘I and my wife’ sounds a bit off to me. So is something else is going on here?

McWhorter makes the rather bold claim that ‘me’, not ‘I’ is in fact English’s subject pronoun and that I is a rather special word that is only used when there is only one subject before the verb. Therefore ‘I went to the party’ sounds OK, and ‘me and the lads went to the party’ sounds OK, but ‘I and the lads went to the party’ doesn’t sound right because there is more than one subject. I’d never heard this argument before but I’d welcome some disconfirming evidence.

I’m not going to claim that I have disconfirming evidence, but I would like to express my skepticism and suggest that frequency effects are playing a large role (it also matters here that pronouns are more slippery than many people realize).

I also find that the construction “I and my wife” sounds a bit off. However, I doubt the notion that “I is a rather special word that is only used when there is only one subject before the verb”. Rather, because I encounter “me” used in these constructions much more frequently than “I”, that is the construction that sounds right to me.

Here is a bit of data from the BNC:





Looking at these lines, “me and my wife” is more frequent than “I and my wife”, though only a relatively small sample was found (7:3). The open “me and my *” was more frequent than “I and my *”. In this case the ratio is 244:83. To me, those 83 occurrences of “I and my *” are interesting. The construction is in use. Is the sample large enough (are enough people using it) to assume these aren’t simply mistakes or production errors? I think for the moment it is. Presumably, the people using the construction don’t find anything about it that sounds wrong, or feel a need to stop and rephrase. And, as odd as it sounds to me, I don’t really think there is anything wrong with it.
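To make those ratios easier to compare, here is a minimal sketch that turns the counts quoted above into proportions. The counts are the ones reported in this post; the variable and function names are my own and purely illustrative, and this is simple arithmetic, not a significance test.

```python
# Counts reported above from the BNC searches.
counts = {
    "me and my wife": 7,
    "I and my wife": 3,
    "me and my *": 244,
    "I and my *": 83,
}


def share(a: int, b: int) -> float:
    """Return a's proportion of the combined total a + b."""
    return a / (a + b)


# Share of the "me" variant in the exact-phrase search.
wife_me_share = share(counts["me and my wife"], counts["I and my wife"])

# Share of the "me" variant in the open-slot search.
open_me_share = share(counts["me and my *"], counts["I and my *"])

# How many "me" hits there are for every "I" hit in the open-slot search.
open_ratio = counts["me and my *"] / counts["I and my *"]

print(f"'me and my wife' share: {wife_me_share:.0%}")
print(f"'me and my *' share: {open_me_share:.0%}")
print(f"'me' to 'I' ratio (open slot): {open_ratio:.1f} to 1")
```

In both searches the “me” variant makes up roughly 70 to 75 percent of the hits, which leaves the “I” variant as a clear minority but hardly a negligible one.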

A few years ago in Slate, Gretchen McCulloch wrote:

Your sense of English as a whole is really an abstract combination of all of the idiolects that you’ve experienced over the course of your life, especially at a young and formative age. The conversations you’ve had, the books you’ve read, the television you’ve watched: all of these give you a sense of what exists out there as possible variants on the English language. The elements that you hear more commonly, or the features that you prefer for whatever reason, are the ones you latch onto as prototypical.

I think that my preference for “me and my *” over “I and my *” isn’t rooted in anything inherent about the subjective/objective properties of I/me, but stems from the frequency of my encounters (including my own production) with the construction(s). In other words, I doubt that there is a binary I=sub. pronoun and me=obj. pronoun (or vice versa) quality at work here, but that either pronoun can be sub/obj depending on context, and that my idiolect (more specifically for the matter at hand, my internal grammar about which construction is preferred in certain situations) is influenced, if not driven, by frequency effects of usage.

In summary: I prefer “me and my *”. But the corpus data shows that some other people use “I and my *”. I think that both are grammatical and that the preference for one or the other is largely a product of frequency effects on the idiolect. That is, frequency of exposure affects whether something ‘sounds right’.

(One more complicating factor: I don’t know enough about phonology to be sure, but I wouldn’t be surprised if the majority preference for “me and my *” had a phonological aspect, too. Something along the lines of a preference for duplicating the initial “m” or avoiding the rhyme of “I and my”. Then again, I might just be showcasing my idiolectal preference for both of those things 😉 )


Open corpora on Sketch Engine: EcoLexicon

If you work with or read about corpora, you are probably familiar with Sketch Engine. If you aren’t familiar with it, it is described on its own website as “the ultimate corpus tool”, and that’s maybe not an exaggeration. You can do a ton of cool stuff with it. Sketch Engine also provides access to hundreds of ready-to-use corpora in close to a hundred languages.

However, it requires a subscription (although a 30-day trial is available for starters). This puts some people off; after all, there are a lot of free resources out there.

What some people may not realize, though, is that there are some “open corpora” on Sketch Engine that can be explored (with all of Sketch Engine’s features) without registration.


Some interesting corpora there!

What inspired me to write this post is the presence in the list of open corpora of the EcoLexicon English Corpus, which was made available earlier this year.

The EcoLexicon is an environmental knowledge base and tool developed at the University of Granada. It is described as a knowledge base for “language specialists, domain experts, and the general public. Its representations are designed to help translators, technical writers, and environmental experts who need to access and better understand specialized environmental concepts with a view to writing or translating specialized and semi-specialized texts” (San Martín et al., 2017, p. 97).

The EcoLexicon English Corpus is a collection of English texts used in the EcoLexicon project. Searches can be limited according to domain of environmental studies, type of intended user of the text, geographical variety of English, country of publication, year of publication (1973-2016), publication genre, and who edited the text.

From an ELT perspective, perhaps certain ESP settings are the most obvious place for such a corpus to be useful. But really I’m just happy such a corpus has been made available to anyone who wants to explore language concerning environmental studies, policies, and communication.

The other open corpora are valuable too! They also let you try out Sketch Engine without committing to anything or even registering.

Antonio San Martín, Melania Cabezas-García, Miriam Buendía, Beatriz Sánchez-Cárdenas, Pilar León-Araúz, Pamela Faber. 2017. Recent Advances in EcoLexicon. Dictionaries: Journal of the Dictionary Society of North America, 38(1), pp. 96-115.

New to corpus linguistics? Here are the basics

This is a great little explainer from Warren M. Tang. It covers the basics of the basics, and provides ready-to-use definitions and descriptions. He has also got a straightforward glossary of corpus types.

I get questions from colleagues about the basics of corpora sometimes, and I’m always happy to find simple reference materials that can help them out just in case my explanations aren’t clear or head off on a tangent 🙂

Presentation at JALT 2017: Using the SCoRE corpus

UPDATE: If the slides are not appearing properly below, they can be seen here.

UPDATE 2: I noticed that the bottom of slide #20 has been cut off. The ending of that sentence should be “… are marked as being incorrect.”

These are my slides from a presentation I gave at the JALT International Conference in Tsukuba, Japan. I talked about using SCoRE with my students and their reactions to it (and toward guided induction activities).

The main points were that my students felt SCoRE was a simple-to-use tool, and that they liked it, although it was hard to separate their feelings toward SCoRE itself from their feelings toward the guided induction approach. As an aside, my perception is that they liked the activities more, and became much quicker at doing them, as they grew more familiar with the nature of the activities. I think that if a tool like SCoRE or an approach like guided induction is going to be used, it should be used with some consistency so that students can get accustomed to it (i.e. use it habitually, not as a one-off).

Note: They are saved on a different device than the one I’m using right now, but later I will post an example of one of the guided induction worksheets I mentioned in the presentation/slides.

Calls for Papers

In addition to the burst of recent research articles, there are a couple of journals putting out CfPs for corpora-related topics.

JCADS is the Journal of Corpora & Discourse Studies. It’s new and will be publishing its first issue in the summer of 2018. The prior link is to the CfP posted on Facebook; submission info is here.

Études en Didactique des Langues (EDL) will publish a special issue on didactic (instructional) uses of corpora. The link is to a French-language page, but if you don’t read French, don’t worry, you can find a PDF of the CfP in both French and English in the list of documents. If you have trouble figuring out which document it is, the date on the document is 17/09/2017.

And finally, not a corpus-related issue, but an interesting (to me) special issue nonetheless: The Language Learning Journal has put out a CfP for an issue focusing on the use of video and other audio-visual material.



A smorgasbord of DDL journal activity

Last month, in addition to the release of new corpora, two journals released special issues dedicated to DDL/CL in language learning.

One is the open-access Language Learning & Technology. I haven’t read it yet, but the table of contents looks very interesting. The other one is Language Testing. It’s interesting to see how CL and questions of assessment interact.

Finally, though not a whole dedicated issue, ReCALL has an online first article titled ‘Unlearning overgenerated be through data-driven learning in the secondary EFL classroom’. This will be the first article I get to, as overgenerated be is a recurring issue for many of my students and I’m curious to see what the authors found.

What bounty 🙂


If the ReCALL link above isn’t working for you, here is the doi:



Spoken BNC2014 & EFCAMDAT

Two large, open-access* corpora have recently been released.

Go here to learn about and get access to the Spoken BNC2014, and go here to learn about and get access to the EF-Cambridge Open Language Database (EFCAMDAT). EFCAMDAT is a learner corpus featuring essays from adult learners of English around the world, and this is its second, bigger release. Spoken BNC2014 is brand new (to the public), and, as you may very well be aware, is being described as “the largest ever public collection of transcribed British conversations”.

I don’t have anything too special to say. These both look like terrific resources. I think all who are interested should check them out.

*with registration