Against the Capitol I met a lion, Who glared upon me, and went surly by

It is, I believe, generally accepted that who can be used to refer to animals (or at least some animals, or at least under certain conditions). Thus, Merriam-Webster’s online dictionary has this entry (I’ve outlined the key part in red):


However, Merriam-Webster’s online learner’s dictionary has this entry:


No mention that who can be used for animals. And it isn’t only MW. I checked six learner’s dictionaries and none of them said this was acceptable, or an option. This isn’t a huge deal, necessarily, but it can lead to confusion if, say, this has been taught as a ‘rule’ and then students read graded readers where the ‘rule’ is broken without any consequence (search for horse who, fish who, or monkey who in the Lextutor graded reader corpus, for example). Of course, there are tons of sources where students might encounter who being used for animals.

This came up because a student of mine had written “I don’t want a dog who is so big” and a peer suggested it should be which. And that’s FINE. It can be which :-). Or that :-). But it can also be who :-D.

For teachers who like to do consciousness-raising or language awareness activities, this kind of situation provides opportunities to discuss what to do when language you read or hear in real life doesn’t seem to match what the dictionary or grammar guide says, to analyze lines of ‘controversial’ lexicogrammar patterns, or to formulate ideas about why people choose who, which, or that in different circumstances (does it seem ok for certain animals but not others? is it special for pets? does it change the meaning/tone/nuance?).

Of course, the underlying ideas could be applied to other language questions, too. So, in general, the ‘corpus lesson’ here is that corpora can be used to explore alternatives to more conventional patterns and aid in developing greater language awareness. Corpus-use can be applied to not just learning frequent or common patterns of expression, but to expanding the ways in which learners are able to express themselves.

While I was talking about this with another teacher, it was suggested that maybe the learner’s dictionaries (and perhaps some other learner-oriented materials) don’t acknowledge who for animals as acceptable because it’s new (recent) and thus ‘non-standard’. But I have trouble seeing which of these would be considered ‘non-standard’ (in fact, I doubt that in many cases fluent English users would even notice this usage unless it were pointed out or they were looking for it). And it’s not really a recent thing, is it?

Against the Capitol I met a lion,

Who glared upon me, and went surly by

Julius Caesar, 1.3 20-21

This corpus-based analysis (Gilquin & Jacobs, 2006) of who being used with animals may also be of interest.

“I agree —.” Referencing with COCA

In the past couple weeks I have had reason to pull up frequencies or concordances in response to student questions on several occasions. For a teacher this is a pretty basic but useful application of corpora.

For instance, this past week in a business-focused ESP course we were reading through a role play which used the expression ‘I agree wholeheartedly’. A few types, or maybe lines, of questions arose. Is ‘wholeheartedly’ very common? Can I say ‘I wholeheartedly agree’? Can I say ‘I agree completely’? Can I say ‘I agree perfectly’? What if I agree but not that strongly? These didn’t arise all at once, but came about through discussion (some after I pulled up results in COCA).

Notably, to me, meaning was not an issue at all. All of the questions were regarding usage.

I ran a quick and relatively simple search on COCA using: I agree [r*]

([r*] is the PoS code for adverb)

Here are the first 20 results:


Across the entire corpus we can see that I agree wholeheartedly is the second most common formulation of the I+agree+adverb pattern. I agree completely is by far the most common. The top 4, 6 of the top 10, and 8 of the top 20 seem to be synonymous expressions of the sense of ‘100% agreement’.

We did not have time to go through each expression but the students said they appreciated seeing several examples of alternative phrasings. For the expression I agree perfectly, I actually thought it sounded a little strange when my student asked if it was ok, but here it was, attested. Only 2 hits, but after opening the concordance lines I saw how it was used and my hesitation about it evaporated. For me, this was a good reminder of how corpus reference can help me (and by extension my students) avoid over-generalizing from my own intuition.

The 9th most frequent hit, I agree in, is also notable, and my students asked specifically about it. So we took a look at its 6 concordance lines. 4 of the lines revealed the expression I agree in part and 1 revealed I agree in general; so they found some useful expressions for expressing various degrees of agreement to go along with hits such as (13) I agree basically, (18) I agree strongly, or (19) I agree somewhat.

We also briefly looked at the query: I [r*] agree

This resulted in a very different frequency list:


There are more overall hits for this pattern. I totally agree is the top hit, with I completely agree the second. Interestingly, I certainly agree is the 4th most frequent hit despite I agree certainly occurring zero times in the corpus. In fact, the I+adverb+agree pattern revealed several expressions that had no lexical correlates in COCA to the I+agree+adverb pattern. This wasn’t the point of the referencing in class, so we didn’t dwell on it and focused on the variety of expressions that can be used, but, for me, it was definitely food for thought. Agree?
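For anyone who wants to try the same two-way comparison on their own texts, here is a minimal Python sketch. It is not how COCA works internally: the sample sentences are made up for illustration, and the hand-picked `ADVERBS` set is an assumption standing in for COCA's [r*] part-of-speech tag.

```python
import re
from collections import Counter

# Hypothetical sample sentences; the class itself queried COCA.
SAMPLE = [
    "I agree completely with that.",
    "I completely agree.",
    "I certainly agree that it matters.",
    "I agree wholeheartedly.",
    "I totally agree with you.",
    "Well, I agree completely.",
]

# Assumption: a hand-picked adverb set stands in for COCA's [r*] tag.
ADVERBS = {"completely", "wholeheartedly", "totally", "certainly",
           "somewhat", "strongly"}


def count_patterns(sentences, adverbs):
    """Count 'I agree ADV' and 'I ADV agree' separately."""
    after = Counter()   # adverb follows 'agree'
    before = Counter()  # adverb precedes 'agree'
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Slide a three-word window over the sentence.
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            if a == "i" and b == "agree" and c in adverbs:
                after[c] += 1
            elif a == "i" and b in adverbs and c == "agree":
                before[b] += 1
    return after, before


after, before = count_patterns(SAMPLE, ADVERBS)
# Adverbs attested before 'agree' but never after it in this sample.
only_before = sorted(set(before) - set(after))
print(after.most_common())
print(before.most_common())
print(only_before)
```

The last list is the interesting one for class discussion: it collects adverbs that show up in one word order only, in the same way I certainly agree had no I agree certainly counterpart in COCA.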

Playing by ear and verbing by body parts: Using COCA to discover usage

Last week, a passage in the textbook for one of my classes had the expressions ‘he tunes it by ear’ and ‘they learn to play by ear’. For many of the students (not all, tho) this was a new expression, ‘[verb] by ear’, but in the context of the passage most of them figured out, on their own, that it meant the people tuning or playing instruments were able to do so without any sheet music or other aids. The textbook had a question using the expression ‘play by ear’ in the chapter review, so I felt it would pay to explore the pattern a little.

So we had this pattern, ‘[verb] by ear’. What does ‘by ear’ really mean? And what verbs could fit into that slot? Several students agreed that it meant using one’s ears to do something. That was a fine start. But would it make any sense to say ‘sleep by ear’ or ‘imagine by ear’ with senses of using one’s ears to sleep or using one’s ears to imagine? No, not really. I asked my students if they could replace ‘by ear’ with a different expression in the phrase ‘they learn to play by ear’. Pretty quickly, they came up with ‘they learn to play by listening’. That’s more like it. So ‘[verb] by ear’ means that someone is using listening skills to [verb].

At this point, I directed the students to open up the newly-mobile-friendly COCA (every one of my students has an iPad) and enter the following query under the List function: VERB by ear.

Screenshot: search area highlighted in red


Screenshot: top 20 frequency list for VERB by ear

Looking at the frequency data, it was immediately clear to them that ‘play(s)/playing/played by ear’ is, by far, the most frequent formulation of this pattern. ‘Learn(s)/learning/learned by ear’ is also quite a bit higher in frequency than other formulations of the pattern, most of which have only 1 or 2 occurrences in the corpus (note: ‘birding by ear’ is the title of a book for birdwatchers about learning to recognize birdsongs, and references to this book in the corpus make this formulation appear higher in the frequency list than one might otherwise expect).

Using this data my students and I discussed how the pattern ‘[verb] by ear’ can be used with a lot of different verbs, but in general they are safe knowing that it’s most often used with forms of play and learn.
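The same kind of count can be roughed out in Python for a small set of sentences. This is only a sketch under stated assumptions: the sentences and the tiny `LEMMAS` table are hypothetical, and the regular expression captures whatever word precedes ‘by ear’, so unlike COCA's VERB tag it cannot tell verbs from other words.

```python
import re
from collections import Counter

# Hypothetical lemma table so that play/plays/played are counted together.
LEMMAS = {
    "plays": "play", "playing": "play", "played": "play",
    "learns": "learn", "learning": "learn", "learned": "learn",
    "tunes": "tune", "tuning": "tune", "tuned": "tune",
}

# Hypothetical sample sentences, not taken from COCA.
SENTENCES = [
    "They learn to play by ear.",
    "She played by ear for years.",
    "He is learning by ear.",
    "Most fiddlers play by ear.",
    "We learned by ear at first.",
    "He tunes by ear.",
]


def words_before(sentences, body_part="ear"):
    """Count the word directly before 'by <body_part>', grouped by lemma."""
    pattern = re.compile(rf"\b([a-z]+) by {re.escape(body_part)}\b")
    counts = Counter()
    for sentence in sentences:
        for word in pattern.findall(sentence.lower()):
            counts[LEMMAS.get(word, word)] += 1
    return counts


counts = words_before(SENTENCES)
print(counts.most_common())
```

Passing a different `body_part` (for example "hand") reuses the same function for other ‘[verb] by [body part]’ phrases, though a real analysis would want a part-of-speech tagger or the corpus interface itself.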

For homework, students were assigned one phrase from a set of phrases: ‘[verb] by mouth’ or ‘[verb] by hand’ or ‘[verb] by foot’ or ‘[verb] by eye’; and they were told to do a similar analysis to what we did in class. That is, explain how the expression is used and find out what verbs commonly fit into the empty slot. Also, they were to find 3 example sentences (using concordance lines from COCA, an online learner’s dictionary, SkELL, etc.).

In our next class students who were assigned the same phrase sat together and checked/discussed their results, and then they taught members of other groups about their assigned phrase.

Overall, this activity went over very well. A big part of that is due to the BYU Corpora interface redesign. It’s so much cleaner visually, easier to navigate, much easier to search, less intimidating, and so much more mobile-friendly.

Importantly, considering other class and content needs, this did not take up a lot of (in-class) time. The activity on the first day took around 15 mins (in a 90 min class), and the activity on the second day took around 25 mins. Within the activity there were elements of individual, group, and whole class work, which helped keep it from seeming tedious.

Of course, there were other options for how to exploit the corpus, too. For example, instead of assigning expressions for the students to research, we could have had some fun by having them find expressions of the ‘[verb] by [body part]’ variety themselves, maybe discovering some surprising combinations. I suspect you can think of a few ways to positively tweak this kind of activity, too.


Frequency lists in pre-reading activities

This Saturday (6/4/16) I am showing a poster at the JALTCALL Conference in Tokyo. The poster describes a process for using AntConc, or any software that can generate frequency lists, to create wordlists that students can use during pre-reading activities. The basic idea is that students use these lists to mark the words they don’t know or don’t feel confident about, and by doing so they create differentiated/personalized study lists. As a pre-reading activity, this can help students reach (or confirm) the word knowledge percentage threshold needed for comprehensible reading of a particular text. The underlying idea is actually quite flexible and doesn’t need to follow the exact steps outlined in the poster; rather, the process can be tailored to different contexts and settings. For example, I don’t always use the process to determine exact percentages of word knowledge for each student. Instead, I might let students work in a group and research any unknown words together just before reading (especially if it’s a relatively short reading and I just want to ensure they will recognize most, if not all, of the words in a text).

If anyone is interested, below are a .pdf of the poster and a .docx of the stoplist I used in the example in the poster (I made the stoplist based mostly on frequency data found in COCA). If you want to use the stoplist with a program such as AntConc, you should save it as a .txt file first.
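For readers who would rather script the basic idea than use a concordancer, here is a minimal Python sketch. It is not the exact process from the poster: the sample text, the three-word stoplist, and the function names `frequency_list` and `coverage` are all hypothetical, and the tokenizer is a deliberately simple regular expression.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(text, stoplist=frozenset()):
    """Word frequencies for a text, skipping words on the stoplist."""
    return Counter(t for t in tokenize(text) if t not in stoplist)


def coverage(text, unknown_words):
    """Percentage of running words NOT marked as unknown by the student."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t not in unknown_words)
    return 100 * known / len(tokens)


# Hypothetical reading text and stoplist, for illustration only.
TEXT = "The cat sat on the mat. The cat saw a bat."
STOPLIST = {"the", "on", "a"}

wordlist = frequency_list(TEXT, STOPLIST)
# Suppose a student marked 'bat' as unknown on the wordlist.
percent_known = coverage(TEXT, {"bat"})
print(wordlist.most_common())
print(round(percent_known, 1))
```

Here coverage is counted over all running words, stoplist words included, on the assumption that students already know the very frequent words the stoplist removes from the wordlist.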



For some background on word knowledge percentage thresholds for reading, check out The Extensive Reading Foundation’s Guide to Extensive Reading.




SkELL: Homonymy and Polysemy

One drawback when using SkELL is that it won’t differentiate between, say, lead/lead or the various senses of ‘rat’. The word sketch function will differentiate between parts of speech, but the easy-to-read concordance lines initially generated will have the various words, meanings, and senses jumbled up. However, this drawback can be exploited for the teaching of various kinds of homonyms and polysemous words.

There are several ways to do this, but I’ll only discuss one basic approach here. Take the word ‘sweet’. Maybe you have students familiar with the taste sense of the word, as in “The berries are rather sweet and juicy”. You could show them (or have them look up) the SkELL concordance lines for ‘sweet’. Have them mark off the sentences that they recognize as referring to sweet taste. This would leave several sentences that use ‘sweet’ in different senses, and your students could discuss what ‘sweet’ might mean in those other sentences.

Screenshot: Partial SkELL output for ‘sweet’

In the screenshot above, for instance, lines 9, 10, 19, and 20 appear to be describing something about people’s personalities. Discuss with your students what it could mean to describe a person as ‘sweet’.

Alternatively, students could use a dictionary to look up all/several of the senses of ‘sweet’, and then try to categorize the SkELL sentences according to each sense.

Regardless of how exactly you approach it, there are a lot of ways to exploit this drawback for teaching and learning purposes.

Any other SkELL tips?

SkELL: Easy to use for teachers and students


In a previous post I said that presentation and design factors were barriers to corpus use by teachers. I’ll add the sense that reading concordance lines is not intuitive for most people and, although central to corpus methods, adds to a discouraging visual aspect of many concordancers. Teachers don’t want to deal with this and they especially don’t want to expose their students to it. Aren’t there some tools that don’t have such a steep learning curve, have simple menus, and won’t scare our students? Thankfully, yes there are.

One of the best tools for non-specialists (teachers and students) is the Sketch Engine for Language Learning (SkELL). Among its user-friendly features are a simple search mechanism (just input a word or phrase), a limited number of numbered output lines (40 max), lines in sentence format (not cut off at a certain number of tokens before/after the node word), and plenty of white space, which makes the output easier to read and process on screen. I’ll just go over a few straightforward ways to use SkELL.

Screenshot: SkELL’s simple menu and search

At a basic (and basic is good) level concordance lines can be used as illustrative examples of target features, lexical and grammatical. SkELL is an excellent resource for finding authentic sentences for the target word(s). One thing to keep in mind when selecting lines to use as examples is to consider what exactly you want the example for. Is it to help students understand the meaning of a word/phrase? Is it to help them understand the usage? Both? Some other skill or aspect you’re teaching? For a deeper discussion of this topic: a series of articles that discuss example sentences (particularly in dictionaries) that help learners with decoding (meaning) or encoding (usage) is included at the bottom of this post. The same principles apply to teachers wanting to use example sentences in class.

As an example, here is a screenshot showing some of the lines generated by searching for “aware”. I have outlined a sentence useful for decoding in red, and a couple sentences useful for encoding in blue.

Screenshot: some concordance lines for ‘aware’

The line “They are well informed and politically aware” is useful for decoding because of contextual clues, like ‘well informed’, which can help someone understand what ‘aware’ means. The lines “Ensure students are aware of their responsibility” and “You are probably already aware of this” are useful for encoding because they illustrate certain collocational and colligational features, such as ‘aware + of’, the high frequency use of be-verbs preceding ‘aware’, and in the latter case the use of verb + adverb preceding ‘aware’.

SkELL is also a great resource for discovering and exploring collocates. By clicking on the Word Sketch button in the top menu, a table of collocates is displayed, with the collocates separated into groups according to kinds of collocates.

Screenshot: word sketch for ‘aware’

Each collocate can be clicked, which will result in a new list of example sentences featuring the original search term and the selected collocate. This is useful for teachers, and it gives students some direct experience with a relatively straightforward, easy-to-use corpus resource.

Screenshot: several lines for ‘aware’ and its collocate ‘grow’

A few other ideas for using SkELL, though I won’t go into detail here, are creating gap-fill exercises, having students find and investigate examples/collocates (maybe each student/group of students could find and present examples of different kinds of collocates for a target word), using the sentences for translation exercises, etc.
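As one illustration of the gap-fill idea, here is a minimal Python sketch, assuming you have already copied some example sentences out of SkELL by hand (SkELL itself is not queried here). The function name `make_gap_fill` is hypothetical; the first three sentences are the ‘aware’ lines quoted earlier in this post, and the last one is a made-up distractor.

```python
import re

# Sentences copied by hand; the last one is an invented distractor.
SENTENCES = [
    "They are well informed and politically aware.",
    "Ensure students are aware of their responsibility.",
    "You are probably already aware of this.",
    "This sentence does not contain the target word.",
]


def make_gap_fill(sentences, target, gap="_____"):
    """Blank out the target word in every sentence that contains it."""
    pattern = re.compile(rf"\b{re.escape(target)}\b", re.IGNORECASE)
    return [pattern.sub(gap, s) for s in sentences if pattern.search(s)]


items = make_gap_fill(SENTENCES, "aware")
for number, item in enumerate(items, start=1):
    print(f"{number}. {item}")
```

Sentences without the target word are simply dropped, so the same function can be pointed at a longer list of copied lines.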

My final point about SkELL’s usefulness is that it is a mobile-friendly site, so even if students don’t have access to a PC in class, if they have smartphones or some other mobile device, they can use it just as well.

Have any other tips or good ideas for using SkELL? Please write a comment.


A second post of teaching-with-SkELL ideas is available.

Frankenberg-Garcia, A. (2012) “Learners’ use of corpus examples”. International Journal of Lexicography, 25/3, 273-296.

Frankenberg-Garcia, A. (2014) “The use of corpus examples for language comprehension and production”. Special Issue on Researching Uses of Corpora for Language Teaching and Learning, ReCALL, 26, 128-146.

Frankenberg-Garcia, A. (2015) “Dictionaries and encoding examples to support language production”. International Journal of Lexicography, 28/4, 490-512.

Each of the above articles is available through the author’s personal site.

Baisa, V. and Suchomel, V. (2014) “SkELL: Web Interface for English Language Learning”. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, pp. 63-70. ISSN 2336-4289. (online)


Google Images as a picture dictionary/corpus

I gave a very short presentation about using Google Images as a picture dictionary at a training at work recently. Google Images works as a picture dictionary because it is basically a picture corpus, and the image hits are ‘concordances’ that can be used like dictionary examples.

I gave examples using words such as ‘fuzzy’ and ‘rough’. Of course, students can look these up in a text-based dictionary, but even if they understand the entries they might still not know exactly how ‘fuzzy’ differs from, say, ‘hairy’, or which sense of ‘rough’ is intended in the phrase ‘rough neighborhood’. For these types of words visual data can be useful and is sort of a form of Data-driven Learning.

Visual data such as photographs can be processed very quickly, and for some learners may be more memorable. I find that Google Images works well with concrete and descriptive words. Unfortunately, the more abstract a word or phrase is, the less likely one is to get image hits that are going to be helpful in understanding the word/phrase.

But for many words, it’s great! It’s quick and simple, and it can be done on the fly or integrated into a planned activity. In the training session I showed how students could create a word profile for ‘rough’ by searching for ‘rough —’, where I supplied some of the most common collocates for rough that reflect the various senses of the word.

Using Google Images as a picture dictionary/corpus is simple enough that most people who don’t consider themselves very tech-savvy could use it to good effect, I believe.