Annotations in the MERLIN corpus

The annotation structure

The MERLIN corpus has a multilayer annotation. The texts are lemmatized and part-of-speech-tagged. In addition, a minimally corrected version of each text (the target hypothesis) was created, and specific features of the learner language have been annotated. See MERLIN for research to learn whether the individual layers result from manual or automatic (NLP) annotation.

The target hypotheses form the basis for the annotation of learner language features (L2 features). The "minimal target hypothesis" (TH1) is a minimally intervening version of the learner text that is orthographically and grammatically correct. The annotations of grammatical and orthographical learner language features (EA1) refer to TH1.

In the smaller, exploratory MERLIN core corpus, further L2 features regarding vocabulary, pragmatics, sociolinguistic appropriateness, and intelligibility have been annotated (EA2). Very often, these phenomena are not errors. These pilot annotations are exploratory in nature and should be interpreted with caution. They refer to the "extended target hypothesis" (TH2).
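The dependencies between the layers (TH1 and EA1 for every text, TH2 and EA2 only for the core corpus) can be pictured as a small data structure. The following is a minimal sketch under stated assumptions: the class and field names, the token-span representation, and the STTS-style part-of-speech tags are hypothetical illustrations, not MERLIN's actual export format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """A learner-text token with its lemma and part-of-speech tag."""
    form: str
    lemma: str
    pos: str


@dataclass
class FeatureAnnotation:
    """An L2 feature tag covering learner tokens start..end-1 (hypothetical span model)."""
    tag: str
    start: int
    end: int


@dataclass
class LearnerText:
    tokens: List[Token]
    th1: str  # minimal target hypothesis: exists for every text
    ea1: List[FeatureAnnotation] = field(default_factory=list)  # grammar/orthography, read against TH1
    th2: Optional[str] = None  # extended target hypothesis: core corpus only
    ea2: List[FeatureAnnotation] = field(default_factory=list)  # vocabulary, pragmatics, etc., read against TH2

    def __post_init__(self) -> None:
        # EA2 features are interpreted against TH2, so they cannot exist without it.
        if self.ea2 and self.th2 is None:
            raise ValueError("EA2 annotations require an extended target hypothesis (TH2)")
        # Every annotation span must point at existing learner tokens.
        for ann in self.ea1 + self.ea2:
            if not 0 <= ann.start < ann.end <= len(self.tokens):
                raise ValueError(f"span {ann.start}..{ann.end} lies outside the text")

    def in_core_corpus(self) -> bool:
        # Assumption of this sketch: only core-corpus texts carry a TH2.
        return self.th2 is not None


# Usage with the negation example "Ich habe [nicht] Zeit." from the grammar table
text = LearnerText(
    tokens=[Token("Ich", "ich", "PPER"), Token("habe", "haben", "VAFIN"),
            Token("nicht", "nicht", "PTKNEG"), Token("Zeit", "Zeit", "NN"),
            Token(".", ".", "$.")],
    th1="Ich habe keine Zeit.",
    ea1=[FeatureAnnotation("negation general", 2, 3)],
)
print(text.in_core_corpus())  # False
```

The validation in `__post_init__` encodes the rule stated above that EA2 annotations refer to TH2; any real tooling may enforce this differently or not at all.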

All L2 feature annotations were derived from various sources and are described in detail in the annotation scheme. The development and origin of the indicators on which the annotation scheme is based are documented at MERLIN for research. The MERLIN annotations followed a strict policy of reliability control; again, you can read more about this at MERLIN for research.

Excursus: Interpreting "errors" with target hypotheses

As learner language (L2) is regarded as an evolving language system in its own right, the annotations were not based merely on error coding but also took other linguistic characteristics into account.

To determine whether and to what extent a text deviates from the target language norm, there must be a clear idea of what the learner presumably intended to write. In a learner text collection (learner corpus), it is important to make this interpretation explicit so that the annotations are easier to understand and reliability problems are avoided. The MERLIN team therefore formulated target hypotheses (TH), i.e. corrected versions of the learner texts. The team followed the rules developed for the FALKO corpus and adapted them to the project's needs where necessary (cf. Reznicek/Lüdeling et al. 2012).

Target hypothesis 1 (TH1) = orthographically and grammatically correct version of the learner text

The "minimal target hypothesis" is a version of the learner text that is corrected only for orthography and grammar; it may still deviate on other levels (e.g., lexical) from what a native speaker would say. TH1 intervenes as little as possible in the learner text. TH1 was created for the whole MERLIN corpus.

Target hypothesis 2 (TH2) = lexically and pragmatically acceptable version of the learner text

The "extended target hypothesis" aims at a version of the original learner text that is acceptable to a native speaker. TH2 takes further language dimensions into account, often context-dependent phenomena such as vocabulary and pragmatics. This assessment could only be made for a smaller part of the MERLIN corpus, the core corpus, which consists of texts rated either A2 or B2 (for Italian: A2 and B1/B1+).

For examples and more details see MERLIN for research.

Annotated L2 features with examples

The following sections list the L2 features annotated in the MERLIN corpus, each illustrated with examples from the respective languages.

Grammar tags

Grammar features	Example*
word order in main clause	[Vielleicht du könntest mir bei meine Wohnungssuche helfen.] [Sollst du Wasser und Bikini mitbringen.]
word order in subordinate clause	*[wenn haben Sie Zeit,] dann bitte sagen Sie mir.
negation general	Ich habe [nicht] Zeit.; Er wird dort arbeiten [nein].
CZE: double negation	[mám] žádný čas {nemám žádný čas}; nikdo [volal] {nikdo nevolal}
verb valency: number of obligatory arguments	CZE: Petr vstává v 6 hodin. On nesnídá, protože [on] nemá hlad. GER: Er hat uns nicht gesagt, ob {er} kommen will.
agreement (subject and verb)	Jana [hast] gelesen, Jana [sind] müde
reflexive pronoun	CZE: smála [si] GER: er [entschuldigt], Laura und Ferdinand reden [sich] ITA: [se] {si} lava ogni mattina
CZE: possessive reflexive pronoun	potřebuju [moji] knihu, vidím [mého] otce
inexistent inflection (nouns, adj, verb)	adjective: ein [blaus] Himmel {blauer}; [teuerer] {teurer}; [größen] {großen / größeren} noun: das schöne [Hause], [euche] [Fahrrade] verb: Johannes [trinks] keine Milch. … meine Rechte und Pflichten zu [weißen]; Wie ich dir [gesagen] hate...
wrong inflection (nouns, pronouns, adj)	case: CZE: čte romány a chodí na [procházce]; GER: *… ich suche eine neue Wohnung in [diese] Stadt
	number: *Ich werde zwei [Woche] dort verbringen;
	gender: *Ich brauche [eine] [große] Wagen für die Möbel.
	ambiguous (number? case?): *Die Silvesternacht habe ich mit [meiner] [Kinder] verbracht.
verb: tense	GER: gestern wir [kochen] gemeinsam ITA: Mi ha domandato se [ho] fretta {Mi ha domandato se avevo fretta}
verb: voice	CZE: studenti [budou napsáni] test GER: Peter [wurde gezeigt] mir sein neues Buch; die Stadt [gründete] im Jahre 1234;
verb: mood	CZE: [Jdi] do města? GER: er [würde gehen] gestern ins Kino {ist gestern ins Kino gegangen/ging gestern} ITA: *[Stai] bene!
verb: aspect (CZE+ITA)	CZE: celý den [se naučil] {celý den učil} ITA: imperfetto instead of passato prossimo: sempre pensavo {ho sempre pensato} che voi due
verb formation (morphol.)	errors in the formation of complex predicates (i.e. analytical verb forms, predicates with modals and copulative predicates): er wird [lese]; du musst [kommst]; Diese zwei Frage richtig {zu} beantworten ist nicht einfach.; Der Buchladen [hat] in der Stadt, *Die Studentin [ist] kam in die Schule
main verb	… mit großem Interesse habe ich in XY Zeitung Ihre Anzeige {gelesen}; Ich [nehme] besoche meine Tochter.
preposition	ich warte {auf} deine Antwort; kannst du [bei] mir helfen?, *Er ist gekommen eine Stunde [vor]
article	GER: habe {die} litauische Staatsangehörigkeit; ich bringe [etwas] Geschänk ITA: *[il] mese fa siamo andati;
conjunction	er füttert den Hund, {der/welcher} nicht ihm gehört; er half mir [dass] ich aufstehe, *Karl kam [um] [für] helfen
ITA: clitic	puoi [chiamarla] {puoi chiamarmi}; ho dimenticato di [scrivere] prima {ho dimenticato di scriverlo prima}; *non { c'è } problema
part of speech error	Ich freue mich für unsere [besucht] {Besuch}; Ich bin sehr flexibel und [Mobilität] {mobil}; *Kannst du mich [Hilfe] {helfen}

* [...] = tag-relevant extract of the learner language expression; {...} = correction of the erroneous learner expression
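Because the examples in these tables share one bracket notation, they can be split into learner extracts and corrections mechanically. The following is a minimal sketch of a hypothetical helper, `parse_example`, that is not part of any MERLIN tool. It assumes that a correction belongs to a learner extract only when it follows that extract directly (separated by whitespace at most), and that a lone `{...}` marks material the learner omitted, as in "ich warte {auf} deine Antwort". When the correction is not adjacent, as in "ein [blaus] Himmel {blauer}", the sketch returns two separate entries rather than guessing the pairing.

```python
import re
from typing import List, Optional, Tuple

# Either "[learner extract]" optionally followed by whitespace and "{correction}",
# or a lone "{...}" marking material the learner left out.
NOTATION = re.compile(r"\[([^\]]*)\](?:\s*\{([^}]*)\})?|\{([^}]*)\}")


def parse_example(example: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (learner extract, correction) pairs in order of appearance.

    A missing half is None: either no adjacent correction was given,
    or the braces mark an insertion with no learner extract.
    """
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    for match in NOTATION.finditer(example):
        learner, correction, insertion = match.groups()
        if learner is not None:
            pairs.append((learner, correction))
        else:
            pairs.append((None, insertion))
    return pairs


print(parse_example("Ich freue mich für unsere [besucht] {Besuch}"))
# [('besucht', 'Besuch')]
print(parse_example("ich warte {auf} deine Antwort"))
# [(None, 'auf')]
print(parse_example("Ich habe [nicht] Zeit."))
# [('nicht', None)]
```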

Orthography

Orthographical features	Examples*
general grapheme error	GER: [libe] {liebe}, [Monart] {Monat}; [schreipt] {schreibt};[wie] {wir} ITA: [mo] {ma}; [experienza] {esperienza};
grapheme transposition	CZE: [kraští] {kratší} GER: [revelant] {relevant} ITA: *[saulti] {saluti};
CZE+ITA: diacritical marks	CZE: [kratši] {kratší}; [Váčlav] {Václav}; [ůplný] {úplný} ITA: [e] andata {è}; *[perchè] {perché}
capitalization	*[sie] waren in Frankreich, [Und] danach in Deutschland.
word boundary	CZE: [ne čekala] {nečekala}; [dolesa] {do lesa} GER: [Schlafe zimmer]; [das selbe]; [Desweiteren] ITA: [qui ndi]
abbreviation	CZE: [at.] {atd.} GER: [Sms] {SMS};
punctuation	[Er kam nicht] aber er hat sich nicht entschuldigt. Rom, Paris[,] und Berlin gefallen mir sehr.
GER+ITA: apostrophe	GER: Das ist [Mama's] Buch. ITA: d{‘}accordo

* [...] = tag-relevant extract of the learner language expression; {...} = correction of the erroneous learner expression

Intelligibility

Intelligibility of text

Deviations occur in the text.

ITA: *[Ciao Caro. Come stai? Io sto bene. Vorrei andare a trovare te in Italia. Com'è la tua città? È la città grande? O forse una vecchia città? Anche ha il mare, o in vicino? Alla sera vorrei andiamo in discoteca. Qualce volta mangiamo asieme al ristorante. C'è anche possibile per andiamo al cinema. Mi piace per guardare un film. Penso le persone in Italia sono gentili. Ma purtroppo resto solo per cinque giorni. E poi devo ritornare a casa mia. Tanti cari saluti. Un bacio. Maria]

The text is not comprehensible: more than half of its sentences are unintelligible.

CZE: *[Děkuje za E-mail, že jsi pozval k narozeniním. Tesi mě, a mám otazky: Kde bude místo oslavy? Myslím že bude u tobě, ale kde v Praze? Já ještě nejsem nebyl u tobě ... . A kdy začiná? V pátek večer asi 19 hodin? Kromě toho, kdo a jaké hosty bude při tom? Zdravím, Tomoo]

GER: *[Hallo Julia
meine Frau und Ichwollten nach Köln im ZuG in der nacht vahren wann 2.1.2011 bis 04.01.2011 Ich nehme besoche meine Tochter. Ich kanne nicht Fahren. Fragen Sie Ihre Julia. meine wollten ist gut ich Besoche Kerche und centrem. meine Hotel ist gut
viel Gruße Danke]

Intelligibility of sentence

a) The sentence manifests deviations, but it is still interpretable.

CZE: *[Určitě, dobře si využijeme ten večer.]
GER: *[Diese Wohnung sind Bitte 2,3 km von Centrum, warum ich immer fahre mit meine fahrrad, und meine Beruf ist Kürche Hilfe, immer morgens, und Abend Arbeit.]

b) The sentence is completely incomprehensible.

CZE: *[Děkuje za E-mail, že jsi pozval k narozeniním.]
GER: *[meine wollten ist gut ich Besoche Kerche und centrem.]
ITA: *[A queste cita di posto?]

* [...] = tag-relevant extract of the learner language expression; {...} = correction of the erroneous learner expression

Vocabulary

Lexical features	Examples*
formulaic sequence: collocation	CZE: [dávej na sebe pozor], [nabyla jsem dojmu], tam [se cítím jako doma] GER: … dass meine Tochter im April ein gesundes [Kind zur Welt bringt]; [Erfahrung im Umgang mit] Kindern und der Haushaltsführung; [den Teufel an die große grüne Wand malen] ITA: [ho suonato il pianoforte] - *[ho suonato] per tante tante ore [il pianoforte]
formulaic sequence: compound equivalent (ITA)	[occhiali da sole], [ferro da stiro], *[lista di desideri] {lista di nozze}
formulaic sequence: idiom	CZE: najít klíč ke štěstí, mít černé svědomí GER: etwas auf die lange Bank schieben; Morgenstund hat Gold im Mund. ITA: {non cavare un ragno dal buco}
formulaic sequence: communicative phraseologism	CZE: pokud vím, tak...; mám na mysli...; upřímně řečeno...; jak bylo řečeno výše...;přejděme k dalšímu... common places: Co se stalo, stalo se. dicta: Méně je někdy více, Vše má své výhody a nevýhody. GER: Wie geht’s, wie steht’s?; Mach dir nichts draus.; ich meine ... ; meines Erachtens ... common places (e.g. Was man hat, hat man.) dicta (geflügelte Worte) (e.g. Nicht immer, aber immer öfter.) ITA: non so che dirti ... a; scolta ...; come dico sempre ... common places: Quel che è fatto è fatto. dicta (geflügelte Worte): Non ha prezzo.
non-existing form (word / formulaic sequence)	CZE: výsledky [průžek] {?}; [trvali] čas {trávili}; urobit GER: Kaus; wer will schon Staub essen; … ist ein Menefreghista ITA: * passegere {meaning passeggiate}; bisogna mangiare una mela acida; compra milk e tomatoes
semantic error: denotation (word / formulaic sequence)	CZE: [využít si] života {užít si} života (1) [zaměstnání na celou dobu] {zaměstnání na plný úvazek} (1) GER: kauen {essen}, sich die hand mit warmem Wasser verbrennen (1) Ihr Baby [gewohnt]! (0) Das ist eine schwierige Zeit. Jetzt müssen wir alle [ins Gras beißen] {die Zähne zusammenbeißen} (0) ITA: [venire] {andare}, [imparare] {studiare} (1) Key: (1) minor deviation from meaning; (0) wrong, incomprehensible, hardly or not inferable from context
semantic error: connotation (attitude), (word / formulaic sequence)	CZE: [barák] {dům}; odejít navěky {zemřít} GER: [Köter] {Hund}, [Alter] {Vater}; ins Gras beißen {sterben} ITA: bagnarola {(vecchia) automobile/imbarcazione}; i miei vecchi {i miei genitori}
semantic error: precision (word / formulaic sequence)	a) semantically acceptable and comprehensible but unusual, not precise GER: eine [Liste] {eine Liste mit Wohnungsadressen} ITA: vi devo chiedere qualche cosa {informazione} CZE: doma má roztomilé [zvíře] {psa} b) semantically acceptable but imprecise; a specific term/sequence exists to express the same meaning GER: [eine Firma, die Bücher macht] {Verlag} ITA: [per me ci sono tante cose nuove] {per me ci sono tante novità} CZE: *vzdělání [dalo pro mě velkou pomoc] {pomohlo mi}
word formation error: derivation	CZE: odpovědání {odpověď}, opravdivý {opravdový} GER: Suchung {Suche} , [unheilsam] {unheilbar} ITA: *bracciare instead of abbracciare
word formation error: composition	CZE: životuschopný {životaschopný} GER: Sprache Kurs {Sprachkurs}, [Türhaus] {Haustür} ITA: ferro di stiro, areoporto
formulaic sequence: form error	CZE: je to [jen] příklad z mnoha {je to jen jediný příklad z mnoha} brát něco [doslova] vážně {brát vážně} / {brát doslova} [známkové] oblečení {značkové oblečení} Kdo jinému [kopá jámu], sám do ní padá. {Kdo jinému jámu kopá, sám do ní padá.} GER: etwas auf [die] Bank schieben {etwas auf die lange Bank schieben}; … ist meiner [Meinung], nicht ein großes Problem {meiner Meinung nach}; Öl ins [heiße] Feuer gießen; den Teufel an die [große] Wand malen * in Betracht [nehmen] {in Betracht ziehen} Der Apfel fällt [vom Baum nicht weit.] ITA: [carini] {miei cari} prendere due piccioni [neri] con una fava instead of prendere due piccioni con una fava la stagione d'estate {la stagione estiva} * tanti [saluti cari] {tanti cari saluti}

* [...] = tag-relevant extract of the learner language expression; {...} = correction of the erroneous learner expression

Coherence/Cohesion

Connector accuracy

GER:
*dort gibt es viele Studenten [als] die Miete nicht sehr hoch ist
* Ich will auch Istanbul besuchen, [weil] schicke mir bitte Informationen.
*[Ich fände es am besten eine Möglichkeit gäbe,] eine Unterkunft in einer Gastfamilie zu bekommen.
*[Für] was die Familien angeht, .

ITA:
*Gli rivolgo allo scopo [che] ho qualche domanda.
*La mia famiglia gioca volentieri a pallavolo, non [però] c'era nessun possibilità nella Residence"
*Il mio titolo di studio è l'insegnante e per questo motivo mi piace lavorare [anche] con i bambini, organizzare le gite e l'altro divertimento

CZE:
*Chtěla jsem se zeptat [pokud] máte parkoviště protože přijdu s autem.
*Přinese [pokud] nějaké jídlo, můžeme mít oběd.
*[Potřebovala bys pomoct,] klidně napiš.
*[Pokud] přijde-li, budu rád.

Content jumps

GER:
*Ich habe am Wochenende deine Brief bekommen. Das ist schön dass Anna in den Kindergarten und Max ist in der dritten Klasse. [Ich möchte eine Hasen haben, aber ich habe Allergie für die Haar.]
*Ich bin verheiratet und habe ich 3 Kinder. [Wir arbeiten bis 04:00 Uhr.]

CZE:
*Prázdninové kurzy češtiny jsou zajímavé. Chci se ucházet o kurz a asi stipendium. Kolik to stojí? Kdy začina kurz? [To je kouzelný, že zůstam chvilečku v Praze.] Můzeš posílat mě toho inzerát?

Reference

GER:
*meiner Küssen für [ihre] (=deine) Kinder
*Die Frage ist sehr zusammengesetzt, [es] lässt sich nicht so einfach beantworten.

ITA:
*Spero che la vostra [=tua] famiglia anche è sana
*Se glielo non [te lo] pagano devi lavorare in una ditta nel tempo libero

CZE:
*paní, jehož se ptal {jíž},
*dal jsem to jeho bratrovi {jejímu}
Budeš mít narozeniny? Jaký dárek si přejete?

Metacommunicative device

GER:
im Folgenden; zusammenfassend; erstens, zweitens, drittens; wie wir besprochen haben; Jetzt wechseln wir das Thema

ITA:
insomma; in conclusione; in primo luogo; in secondo luogo; in altre parole, in breve; inoltre; si osservi poi; si noti, in particolare, che

CZE:
zároveň; nadto; navíc; potom; především; ani – ani; jednak – jednak; popřípadě; prostě; přesněji; tedy; totiž; tudíž; vlastně

* [...] = tag-relevant extract of the learner language expression; {...} = correction of the erroneous learner expression

Sociolinguistic appropriateness

Sociolinguistic features	Examples*
salutations / complimentary closes	CZE: [Ahoj Davide]; [Dobrý den Pane ředitele hotelu] GER: [Hallo Maria]; [tschüß Herr Meier] ITA: [Ciao Francesco] [Tanti saluti, Maria]
opening / closing formulae	CZE: [S přáním hezkého dne]; [Mejte se hezký] GER: [Vielen Dank für Deinen Brief. Ich habe mich sehr gefreut.] ITA: [Aspetto la sua risposta al più presto]; [Come stai?]
inappropriate style (formality)	introducing a letter to a friend with CZE:[Ahoj pane řediteli] GER:[Sehr geehrter Marco]; [Willkommen in „Stadt X“] (task: Bericht über Wohnungsmarkt) ITA:[Egregio Andrea], [ti ringrazio cordialemente della tua gentile lettera del 12 m.s.]
inappropriate addressing (formality)	in a formal letter: CZE: Prosím, [máš] další informace pro mě? ITA: Mi [puoi] dare informazione sulle condizioni? GER: Kann ich Informationen von [euch] bekommen? in an informal letter:* CZE: Bylo by dobré, kdybyste přijela do Drážd'an. ITA: [Vi] ringrazio per la tua email. GER: *Kommen [Sie] nächste Woche mich besuchen?
ITA: lexicalised clitics (verbi procomplementari)	-CI andarci, arrivarci (arrivarci a capire) -LA contarla, farla (la fa a tutti), farla franca -LE (darle, prenderle) -NE farne (farne di tutti i colori), volerne (non volermene) -CELA (avercela, mettercela, farcela) -CENE (volercene,) (corrercene) -CISI (mettercisi) -SELA cavarsela, cercarsela, contarsela, darsela (darsela a gambe) -SENE (andarsene, fregarsene, intendersene, restarsene, rimanersene, starsene, tornarsene, venirsene (venirsene a casa) polirematiche (multiword expressions): darci dentro, dormirci sopra, mettercela tutta
ITA: personal pronoun redundancy	[A me mi piace]...; [A lui] non [gli lascio] nulla. [Ne racconta di] storie! [Mi bevo] una birra; [Mi vedo] un film
ITA: marked syntactic structures	Frasi scisse (cleft sentences): sei tu che hai detto questo; Sono le foto che mi fanno pensare alle vacanze dell'anno passato. Dislocazioni a sinistra (left dislocations): Che non sarei venuto, lo sapevi benissimo.; La spesa l’ho fatta ieri quindi oggi sono libera. Dislocazione a destra (right dislocation): Ne voglio parlare con te, dei miei problemi.; Non preoccuparti! Lo portiamo noi, il vino! c'è presentativo (presentational c'è, a special kind of cleft sentence): C'è mia cugina che ti vuole parlare.; C’è Andrea che ti cerca. Cosa gli dico?
ITA: 'che polivalente'	[Vieni qui che ti voglio dare qualcosa.] {in modo che/perché} [Ho sentito cose che non avevo fatto caso] {a cui non} [Il paese che sono stata] {in cui/dove}
GER: main clause word order after 'weil'	Ich habe Hunger, [weil es ist ja auch schon ganz schön spät.]

* [...] = tag-relevant extract of the learner language expression; {...} = correction of the erroneous learner expression

Pragmatics

Features	Examples*
direct REQUEST	CZE: Prosím, poslej mně to inzerát. Mam jenom dvě nebo tři otázky…; Neříkáš jestli oslavy bude poledne nebo večer. Čekám za odpovědaní. GER: Fragen Sie Ihre Julia.; Ruf mich bitte an. Aus ausgegebenen Gründe fördere ich mich Zurückerstattung diese Kosten.; Bitte nicht vergessen! ITA: Ne pensi e fammi sapere la tua decisione. ; Mi chiami per dirmi. Vi prego di farmi sapere se avete bisogna delle informazione ulteriore; Fatemi sapere!; Portammi il libro!
indirect REQUEST	GER: Entschuldingung. Aber möchte ich ein PostCard von Istanbul kannst du mir schinken? Ich wünsche mir aus Istambul einige Postkarte. Können Sie bitte meine Katze füttern [?] CZE: Mohl bys mi poslat ten inzerát? Chtela bych uvidět tvůj novy byt! Hodi si ti to? ITA: Potreste mandarmi la lista dei corsi al mare e la possibilità di alloggio * Puoi dirmi dove lavori adesso e che cosa fai? * Potreste organizzare un posto dove posso dormire?

* [...] = tag-relevant extract of the learner language expression; {...} = correction of the erroneous learner expression

Hint: A comprehensive overview of the annotated features is provided in the annotation scheme. To learn how to search MERLIN for annotated features, see Search and help.