The MERLIN corpus

 

The MERLIN corpus contains 2,286 texts written by learners of Italian, German and Czech. The texts were taken from the written examinations of recognized testing institutions. The exams test proficiency at levels A1 to C1 of the Common European Framework of Reference for Languages (CEFR).

 

Texts and test institutions

The texts are writing samples from TELC language tests (German and Italian) and from the exams of Charles University in Prague (Czech). The tasks are standardized and aligned to the Common European Framework of Reference for Languages (CEFR). ...

Both institutions, TELC as provider of the European Language Certificates and the Test Centre of the Institute of Language and Preparatory Studies at Charles University in Prague, are full members of ALTE. They offer internationally recognized language examinations in line with the high ALTE standards.

more information on the tests

more details on data preparation for annotations

 

 

The relation to the Framework of Reference - the MERLIN rating grid

To ensure an immediate relation to the CEFR, specially trained testers re-rated all exam texts using the MERLIN rating grid that was developed within the project. ...

The reliability of the ratings was verified with rigorous statistical procedures. As a result, each text in the corpus has a reliable rating profile. The profile comprises a holistic overall level and a rating for each of the individual criteria listed below:

general linguistic range | vocabulary range | vocabulary control | grammatical accuracy | coherence | sociolinguistic appropriateness | orthography

download the MERLIN rating grid

more information on the re-ratings

 

Test tasks

We provide a comprehensive overview of the test tasks by target language and CEFR level tested. ...

The level of the test may differ from the level that the learner text received in the re-ratings.

The tasks are described using a grid that was developed for this purpose by ALTE (Association of Language Testers in Europe, www.alte.org). The grid contains detailed information about each task and the specific characteristics of the intended text, e.g. topic, register and domain (author: Olaf Bärenfänger).

 

General notes on task descriptions pdf
Please cite the task descriptions as: MERLIN project, task description: <name of the task>, 2014, http://merlin-platform.eu 

 

German

A1

Informal e-mail: ask a friend for help with finding an apartment pdf
Informal e-mail: arrange an appointment with a friend to go swimming together pdf
Informal letter: congratulate on the birth of a child pdf

 

A2

Formal letter to housing office pdf
Informal letter: ask friend to take care of pet pdf
Informal letter: offer an unused ticket to a friend pdf

 

B1

Informal letter for New Year to a friend pdf
Informal letter to a friend announcing a visit pdf
Informal letter: birthday congratulations pdf

 

B2

Formal letter: ask for information at Au pair Agency pdf
Formal letter: Au pair writes letter of complaint to Agency pdf
Formal letter: apply for internship in sales department pdf

 

C1

Essay: why it's of value to learn German pdf
Online article: about sticking to one's traditions and "assimilation" in a new environment pdf
Report: about the housing situation pdf

 

Italian

A1

Informal e-mail: reschedule an appointment pdf
Informal e-mail: help a friend who is looking for work pdf

 

A2

Informal letter: go see a friend pdf
Informal letter: contact a friend after a long time pdf
Informal letter: inform friends about language course pdf

 

B1

Formal letter: inform oneself about language course pdf
Informal letter: cook with teacher pdf
Informal letter: answer to a wedding invitation pdf
Informal letter: help a friend who is looking for work after school-leaving exam pdf

 

B2

Informal letter: help someone who has problems with chats pdf
Formal letter: describe experiences with language learning pdf
Formal letter: complain about a hotel pdf
Formal letter: ask for information about International Cooking Evenings pdf
Formal letter: inform oneself about an aid project pdf
Formal letter: apply for an internship in a company pdf
Formal letter: apply for an internship in fashion sector pdf

 

Czech

A2

Informal e-mail: answering a birthday invitation pdf
Description of a photo: swimming in the sea pdf
Formal e-mail: write an email to a hotel pdf
Description of a photo: playground pdf
Description of a photo: woman sitting at the window pdf

 

B1

Informal e-mail: answer to the email of Alena, a friend pdf
Informal e-mail: answer to the email of Jana, a friend pdf
Formal e-mail: Information request, e-mail to a Tandem agency pdf

 

B2

Essay: Everywhere well but at home the best pdf
Essay: No pain no gain pdf
Essay: A friend in need is a friend indeed pdf
Essay: More people know more pdf
Essay: School is the basis of life pdf
Essay: Clothes make the man pdf

 

Available metadata

Each text in the corpus is described with the following metadata:

 

Information about the text author:
age, gender, mother tongue (L1)

Information about the test:
task ID and topic, CEFR level of the test from which the text was taken

Ratings:
Overall rating: CEFR level the text received in the re-rating

Fair CEFR level for each individual rating criterion: general linguistic range | vocabulary range | vocabulary control | grammatical accuracy | coherence | sociolinguistic appropriateness | orthography

 

You can define a subset of learner texts using these metadata in » Define a subcorpus. In this way you can analyse phenomena of the learner language for certain groups of learners.
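
The idea of selecting a subcorpus by metadata can be sketched in a few lines of Python. This is only an illustration, not the platform's implementation or API: the LearnerText class, its field names, the define_subcorpus function and the three sample records are hypothetical and merely mirror the metadata categories listed above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LearnerText:
    """Hypothetical record; fields mirror the metadata categories above."""
    text_id: str          # invented identifier, for illustration only
    target_language: str  # "Czech", "German" or "Italian"
    l1: str               # mother tongue of the author
    gender: str
    test_level: str       # CEFR level of the test
    overall_rating: str   # CEFR level assigned in the re-rating


def define_subcorpus(
    texts: Iterable[LearnerText],
    target_language: Optional[str] = None,
    l1: Optional[str] = None,
    overall_rating: Optional[str] = None,
) -> List[LearnerText]:
    """Keep the texts that match every criterion that is not None."""
    return [
        t for t in texts
        if (target_language is None or t.target_language == target_language)
        and (l1 is None or t.l1 == l1)
        and (overall_rating is None or t.overall_rating == overall_rating)
    ]


# Invented sample records, not taken from the corpus.
SAMPLE = [
    LearnerText("de_001", "German", "Russian", "female", "B1", "A2"),
    LearnerText("de_002", "German", "Polish", "male", "B1", "B1"),
    LearnerText("it_001", "Italian", "German", "female", "A2", "B1"),
]

b1_german = define_subcorpus(SAMPLE, target_language="German", overall_rating="B1")
print([t.text_id for t in b1_german])  # ['de_002']
```

Criteria left at None are ignored, so the same function covers both narrow and broad selections.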

Metadata can be displayed for each text:
» Simple search: click on "view learner info and ratings" in your search result
» Advanced search: click on the [i] in your search result
» Define a subcorpus: click on "view learner info and ratings" in your search result

 

The MERLIN corpus in figures

 

Number of texts per CEFR level of the test (test level) compared to the number of texts per CEFR level assigned in the re-rating (fair average)

 

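Language | CEFR level | Test level | Fair average
Czech    | A1         | -          | 1
Czech    | A2         | 111        | 189
Czech    | B1         | 143        | 165
Czech    | B2         | 188        | 81
Czech    | C1         | -          | 2
Italian  | A1         | 207        | 29
Italian  | A2         | 202        | 378
Italian  | B1         | 201        | 394
Italian  | B2         | 201        | 2
German   | A1         | 206        | 57
German   | A2         | 209        | 297
German   | B1         | 210        | 331
German   | B2         | 204        | 293
German   | C1         | 204        | 42
German   | C2         | -          | 4
Total    |            | 2286       | 2265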
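
The two totals can be checked against the per-level counts. The short Python sketch below copies the figures above into two dictionaries and sums each column. The variable and function names are illustrative only and are not part of the MERLIN platform.

```python
# Counts copied from the figures above: language -> CEFR level -> number of texts.
TEST_LEVEL = {
    "Czech": {"A2": 111, "B1": 143, "B2": 188},
    "Italian": {"A1": 207, "A2": 202, "B1": 201, "B2": 201},
    "German": {"A1": 206, "A2": 209, "B1": 210, "B2": 204, "C1": 204},
}
FAIR_AVERAGE = {
    "Czech": {"A1": 1, "A2": 189, "B1": 165, "B2": 81, "C1": 2},
    "Italian": {"A1": 29, "A2": 378, "B1": 394, "B2": 2},
    "German": {"A1": 57, "A2": 297, "B1": 331, "B2": 293, "C1": 42, "C2": 4},
}


def total(counts):
    """Sum the number of texts over all languages and CEFR levels."""
    return sum(sum(levels.values()) for levels in counts.values())


print(total(TEST_LEVEL))                        # 2286
print(total(FAIR_AVERAGE))                      # 2265
print(total(TEST_LEVEL) - total(FAIR_AVERAGE))  # 21
```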

 

Number of annotated texts per annotation layer

 

 

                      | Czech | German | Italian
Total number of texts | 442   | 1033   | 813
Texts that received TH1 | 440 | 1033   | 813
Texts that received EA1 | 361 | 752    | 754
Texts that received TH2 | 231 | 275    | 154
Texts that received EA2 | 198 | 258    | 85
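
To compare annotation layers across languages, the counts can be expressed as a share of each language's total number of texts. The following Python sketch does this with the figures above. The dictionary and function names are illustrative assumptions, and the layer labels are used exactly as they appear in the table.

```python
# Figures copied from the table above.
TOTAL_TEXTS = {"Czech": 442, "German": 1033, "Italian": 813}
ANNOTATED = {
    "TH1": {"Czech": 440, "German": 1033, "Italian": 813},
    "EA1": {"Czech": 361, "German": 752, "Italian": 754},
    "TH2": {"Czech": 231, "German": 275, "Italian": 154},
    "EA2": {"Czech": 198, "German": 258, "Italian": 85},
}


def coverage(layer, language):
    """Share of a language's texts that received the given annotation layer."""
    return ANNOTATED[layer][language] / TOTAL_TEXTS[language]


# Print one line per layer with the percentage for each language.
for layer, per_language in ANNOTATED.items():
    shares = ", ".join(
        f"{language}: {coverage(layer, language):.1%}" for language in per_language
    )
    print(f"{layer}: {shares}")
```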