DDL Meta-analyses

Last year saw the publication of the first large meta-analysis of DDL (Boulton & Cobb, 2017) and it had generally positive findings for the field; for example (emphasis added):

Focusing only on the most robust results (i.e., MVs with at least 10 unique samples in both P/P and C/E designs), 70% had large effects, 25% medium, and only 5% small or negligible. The most consistent large effects showed that DDL is perhaps most appropriate in foreign language contexts for undergraduates as much as graduates, for intermediate levels as much as advanced, for general as much as specific/academic purposes, for local as much as large corpora, for hands-on concordancing as much as for paper-based exploration, for learning as much as reference, and particularly for vocabulary and lexicogrammar. Many of these findings go against common perceptions, and the elements missing from the list (e.g., skills or other language areas) are for the most part missing because there is as yet insufficient research rather than because research evidence is against them.

I bolded “vocabulary and lexicogrammar” because in the past week another meta-analysis was published. This meta-analysis looked at corpus use and vocabulary learning (Lee, Warschauer, & Lee, 2018). They found that:

[O]ur meta-analysis showed a medium-sized effect on L2 vocabulary learning, with the greatest benefits for promoting in-depth knowledge to learners who have at least intermediate L2 proficiency. Corpus use was also more effective when the concordance lines were purposely selected and provided and when learning materials were given along with hands-on corpus-use opportunities. Moreover, we found that corpus use is still effective even without prior training and remains effective regardless of the corpus type or the length of the intervention.

The details of both papers are worth digging into, especially if one is statistically minded. Furthermore, Boulton & Cobb make a number of recommendations for what kinds of studies and data could benefit the field in the future, while Lee, Warschauer, & Lee also mention that certain types of data/studies could help with the problem of small sample sizes in future meta-analyses. In other words, if you do or are interested in doing DDL or corpus-use research, these papers point out gaps where research could be done and would be useful.

So the gist is that both papers are excellent contributions to understanding the effects of DDL and moderating influences, and they point to future research needs.


(Links are above)

Boulton, A., & Cobb, T. (2017). Corpus Use in Language Learning: A Meta‐Analysis. Language Learning, 67(2), 348-393.

Lee, H., Warschauer, M., & Lee, J.H. (2018). The Effects of Corpus Use on Second Language Vocabulary Learning: A Multilevel Meta-analysis. Applied Linguistics.

 


Conference: Data-driven learning in foreign language and CLIL classes

A multilingual (CfP in 5 languages) conference on DDL will be held on September 27-28 in Torino, Italy. The conference website is available here, and the CfP can be downloaded from this page.

This is an excerpt from the English CfP (bold emphasis in the original):

Researchers and teachers are invited to submit proposals discussing theoretical implications and practical applications of DDL in foreign language classes and in CLIL contexts.
Since one of the aims of the Conference is to reach out to the teachers in primary and secondary education, participants are also invited to deliver practical workshops and/or share their experience and examples of good practices with the other participants.

Other info

  • Keynote speakers: Alex Boulton, Fanny Meunier
  • Submission deadline: May 15, 2018

    Please follow the links above for details about submissions, registration, and other conference matters.

Presentation at JALT 2017: Using the SCoRE corpus

UPDATE: If the slides are not appearing properly below, they can be seen here.

UPDATE 2: I noticed that the bottom of slide #20 has been cut off. The ending of that sentence should be “… are marked as being incorrect.”

These are my slides from a presentation I gave at the JALT International Conference in Tsukuba, Japan. I talked about using SCoRE with my students and their reactions to it (and toward guided induction activities).

The main points were that my students felt SCoRE was a simple-to-use tool, and that they liked it, although it was hard to separate their feelings toward SCoRE itself from their feelings toward the guided induction approach. As an aside, my impression was that they liked the activities more, and became much quicker at doing them, as they became more familiar with the nature of the activities. I think that if a tool like SCoRE or an approach like guided induction is going to be used, it should be used with some consistency so that students can get accustomed to it (i.e. use it habitually, not as a one-off).

Note: They are saved on a different device than the one I’m using right now, but later I will post an example of one of the guided induction worksheets I mentioned in the presentation/slides.

A smorgasbord of DDL journal activity

Last month, in addition to the release of new corpora, two journals released special issues dedicated to DDL/CL in language learning.

One is the open-access Language Learning & Technology. I haven’t read it yet, but the table of contents looks very interesting. The other one is Language Testing. It’s interesting to see how CL and questions of assessment interact.

Finally, though not a whole dedicated issue, ReCALL has an online first article titled ‘Unlearning overgenerated be through data-driven learning in the secondary EFL classroom’. This will be the first article I get to, as overgenerated be is a recurring issue for many of my students and I’m curious to see what the authors found.

What bounty 🙂


UPDATE

If the ReCALL link above isn’t working for you, here is the doi: https://doi.org/10.1017/S0958344017000246

 

 

Alive (and a meta-analysis)

Well, that was a longer break from blogging than I expected. I’ve had a tremendously busy winter, but hopefully I can get back to updating more frequently.

The first item on my list is to mention that Boulton & Cobb’s meta-analysis of DDL has been published. It adds context, detail, and discussion to their earlier slides.

It’s not a quick read, but the short version is that DDL works quite well in general; there are very encouraging results and several medium-to-large effect sizes were found. Going forward, there needs to be more fine-grained research on for whom, for what, under what conditions, and for how long DDL works well. They also make some important points about what information needs to be included in the future by researchers doing quantitative work on DDL.

To not mention*, it just feels right

@anthonyteacher has a great post at his site discussing the patterns “not to VERB” and “to not VERB”. He writes about his students’ reactions to the constructions, his own view, and some findings from Google N-grams and COCA. You should read his post in full.

I basically agree with everything he says, though there is one point he makes that I would like to extend a little bit. So I’d like to highlight this paragraph from his post, and especially the statement I put in red:

All of this data tells me several things. First, “to not” is on the rise, most likely due to the fact that the ability to separate an infinitive has become more accepted and “to not” has probably rolled in through a snowball effect. Second, the placement of “not” does not necessarily imply emphasis, as can be seen in the sentences above. Third, while my speech may make some of the older generations shake their first with anger, possibly telling me I am killing English, I can now reply confidently that my speech is the vanguard of an English where “not” is as placement-fluid as “they” is gender-fluid. My speech may be a speech that is likely to boldly go where few have gone before. Or to not boldly go, because language change is really unpredictable, and this is just a tiny thing.

I chose to highlight this section because I felt that sometimes my own choices regarding placement of “not” are definitely, if not necessarily, done for emphasis, but after thinking about it I don’t think it is a matter of placing emphasis per se. Rather, it is about restricting possible meanings/uses.

Let me explain. Here are two partial lines from COCA (query terms: not to mention):

1) … He would talk only if I promised not to mention he lived in …

2) … But tours and marketing materials, not to mention data on the average student, won’t tell you if that college will …

In the first line, I, personally, would probably phrase that as “to not mention”, though not necessarily. The point is that both constructions feel natural to me. However, I can’t imagine myself saying that about the second line. To me, and the way I’m processing these constructions, the first line’s meaning is straightforward, but the second line’s meaning is based on my understanding of “not to mention” as a fixed or partially fixed expression in this instance.

In this case, the construction is not simply negating the mentioning of something (in fact, the thing in question is explicitly and necessarily mentioned/understood). Indeed, the online Cambridge Dictionary, for example, defines “not to mention” as a phrase used when you want to emphasize something that you are adding to a list.

So, generally speaking, I process “to not VERB” as basically interchangeable with “not to VERB” (with a personal preference for “to not VERB”) when the meaning is straightforward (i.e. negating the verb). But “not to VERB”, perhaps because of its associations with certain fixed expressions, seems to me to have a broader range of usage. Something like this:

“not to VERB”: can negate the verb or have idiomatic/figurative meaning and usage

“to not VERB”: restricted to negating the verb
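For anyone who wants to look for the two constructions in their own texts, the search can be sketched roughly as below. This is only a minimal illustration: the sample sentences are invented (they are not COCA data), the `kwic` helper and `PATTERNS` names are hypothetical, and a plain regular expression cannot confirm that the word after the pattern is really a verb, which a POS-tagged corpus would allow.

```python
import re

# Invented sample sentences (NOT COCA data), used only to show the search.
TEXT = (
    "He promised not to mention the address. "
    "She decided to not attend the meeting. "
    "Tours and brochures, not to mention statistics, tell you little. "
    "I try to not worry about it."
)

# Assumption: the word captured after each pattern is treated as the "VERB"
# slot. Without POS tagging this is only an approximation.
PATTERNS = {
    "not to VERB": re.compile(r"\bnot to (\w+)", re.IGNORECASE),
    "to not VERB": re.compile(r"\bto not (\w+)", re.IGNORECASE),
}


def kwic(text, pattern, width=25):
    """Return simple key-word-in-context lines for every match of pattern."""
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# Count the hits for each construction, then print the lines for inspection.
counts = {name: len(p.findall(TEXT)) for name, p in PATTERNS.items()}

for name, pattern in PATTERNS.items():
    print(name, counts[name])
    for line in kwic(TEXT, pattern):
        print("  " + line)
```

Deciding whether a given hit is a straightforward negation or an idiomatic use (like the second “not to mention” above) still has to be done by reading the lines; the script only collects them.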

Here is a sample of COCA lines from @anthonyteacher’s post:

[Image: COCA concordance lines for “to not”]

All the “to not VERB” uses here have meanings that can be understood as simply negating the verb. I suspect it would be this way throughout all the lines.

At least, until the language changes some more 😉


If I have said something glaringly, obviously wrong please tell me. Or if you have evidence of “to not VERB” used in an idiomatic/figurative way, please share it. Or if you have better choices for terminology etc. etc. etc. …

Parallel corpora and concordances

This might be tl;dr … If you are just looking for a list or some links to parallel corpora, please go to the end of this post.


In response to my presentation at this year’s ETJ Tokyo conference, where I talked about the parallel corpus and DDL tool called SCoRE, I was asked whether there were parallel corpora available in other languages. Short answer: Yes! Caveat: They are not always straightforward to use.

First of all, a quick explanation of what a parallel corpus is. It is a kind of bi- or multilingual corpus that contains text in one language aligned with translations in one or more other languages. So, for example, if I query talk * in the Japanese interface of SCoRE, I will get concordance lines in English that contain talk + any word, and concordance lines in Japanese that are translations of those English lines. These are parallel concordances.

[Image: English-Japanese parallel concordancing in SCoRE]

Here is another illustration showing a sample of a concordance from the Portuguese-English parallel corpus COMPARA. The query terms were “talk” “.*” (this is the syntax for the talk + any word search in COMPARA, quote marks included).

[Image: Parallel concordancing in COMPARA]
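To make the idea of a parallel concordance a bit more concrete, here is a minimal sketch of a "talk + any word" search over aligned sentence pairs. Everything in it is an assumption made for illustration: the English-Portuguese pairs are invented (not taken from COMPARA or SCoRE), the function names are hypothetical, and the handling of `*` as "any one word" is a simplification, not the query engine of either tool.

```python
import re

# Toy aligned data: (English source, Portuguese translation) pairs.
# These are invented examples, NOT lines from COMPARA or any other corpus.
PAIRS = [
    ("I want to talk about my trip.", "Eu quero falar sobre a minha viagem."),
    ("She likes to talk with her friends.", "Ela gosta de falar com os amigos."),
    ("He went to school.", "Ele foi para a escola."),
    ("Let's talk later.", "Vamos falar depois."),
]


def query_to_regex(query):
    """Turn a simple query such as 'talk *' into a regex ('*' = any one word)."""
    parts = [r"\w+" if tok == "*" else re.escape(tok) for tok in query.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def parallel_concordance(query, pairs):
    """Return (matched text, source sentence, translation) for each hit."""
    pattern = query_to_regex(query)
    hits = []
    for source, translation in pairs:
        match = pattern.search(source)
        if match:
            # The translation comes along only because it is aligned with
            # the source sentence; nothing is searched on the translated side.
            hits.append((match.group(0), source, translation))
    return hits


hits = parallel_concordance("talk *", PAIRS)
for matched, source, translation in hits:
    print(f"[{matched}] {source} | {translation}")
```

The key point the sketch shows is that the query runs on one language only, and the lines in the other language are simply retrieved through the alignment.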

Parallel corpora are often used in translation and contrastive studies [see McEnery, A., & Xiao, R. (2007). Parallel and comparable corpora: What is happening. Incorporating Corpora. The Linguist and the Translator, 18-31]. Although they are not used as much in language learning, there has been promising work recently, particularly (as far as I’m aware) here in Japan [see Anthony, L., Chujo, K., & Oghigian, K. (2011). A novel, web-based, parallel concordancer for use in the ESL/EFL classroom. In Corpus-based studies in language use, language learning, and language documentation (pp. 123-138). Brill.; see also Chujo, K., Kobayashi, Y., Mizumoto, A., & Oghigian, K. (2016). Exploring the Effectiveness of Combined Web-based Corpus Tools for Beginner EFL DDL. Linguistics and Literature Studies, 4(4).]

Parallel concordances can be used for activities like translation tasks, of course, but they are also useful for DDL, at least in certain situations. In my experience, having translations of English concordance lines available in students’ L1 is very helpful for both lower-proficiency students and novice DDL students. Both the content and format of concordance lines can be difficult for such students, but in both cases the L1 support offered by parallel corpora allows students to quickly grasp the meaning of the English lines, letting them focus on the context or patterns in the lines. Even if they don’t always need the L1 support to really understand the English lines, they often feel more comfortable and are more receptive to doing activities and work that they are generally unaccustomed to doing. Perhaps as they become more familiar with concordance lines they can switch to monolingual lines.

Another benefit is that they can get a sense of how differently (or similarly) concepts, ideas, or notions may be expressed in the L2 as compared to their L1. Students can pick up on shades of meaning, nuance, and usage. I’ve seen this lead to lexical development where students have commented that they found a phrase or new (and natural-sounding) way to express something they had previously expressed inaccurately due to L1 interference, or had been completely unaware of because it wasn’t covered in any traditional way (i.e. it really is something they discovered for themselves). It’s only anecdotal, but I have spoken with my students about these mini ‘light-bulb’ moments and they react very positively to them.

There can be issues, though. There needs to be some understanding of, say, the directionality and relationship of the source material to the translations, or where the translations have come from and their quality, and of course that the translation seen in a concordance line is almost certainly not the only potential/accurate way to translate the source text. And another thing to keep in mind is that students need to share a single L1 unless the corpus is multilingual with translations available for all of the students’ L1s (which would overcome one issue but possibly raise others).

But still, parallel concordances can be quite useful and make it easier for students to get involved in doing DDL work. For more info about uses and issues with parallel corpora/concordances I recommend reading ‘Frankenberg-Garcia, A. (2005). Pedagogical uses of monolingual and parallel concordances. ELT Journal, 59(3), 189-198.’


Finally, where are these parallel corpora? A simple Google search will turn up numerous parallel corpora available for download, such as the Open Parallel Corpus (OPUS), but that means you need to run your own parallel concordancing software. Something like AntPConc might be a relatively easy-to-use piece of software for this. However, even if you are comfortable running an application like AntPConc, the parallel corpora you find might not be appropriate for your students unless you are in an ESP environment with students learning language for, say, international legal or technical contexts (like the EuroParl corpus).

Alternatively, I’ve compiled a very brief list of some parallel corpora and projects that have web-based interfaces. A caution, though: I am familiar only with the English-Japanese corpora on this list. Although some of the others have been used for language learning, or designed with language learning as a goal, I cannot vouch for the pedagogic applicability or accuracy of the other language combinations here (I’ll leave that to folks who understand the languages in these corpora).

Note: All of these are combined with English

Japanese

SCoRE (and WebSCoRE); WebParaNews

Chinese

E-C Concord (more info can be found here); BFSU CQPweb has several parallel corpora (for guests the user ID and password are both test)

Korean

MOA

Thai

ETPC

Polish

PACO for EPPC

Portuguese

COMPARA (further information about COMPARA is available here)

Multilingual

Tatoeba; Linguee; Reverso Context

Learner Language

ENEJE (this parallel corpus aligns essays by Japanese EFL students with edits made by native English speakers)


I’m sure there are many more. Feel free to list others in the comments 🙂