SYNDATA. Corpus collection of marked syntactic constructions
SYNDATA (“syntactic data”) is a corpus collection of marked syntactic constructions in Italian, French, Spanish, English and German. Most of these structures were extracted from the CONTRAST-IT and COMPARE-IT corpora within the framework of the ICOCP research project (2011-2015).
The SYNDATA corpus collection comprises marked syntactic constructions belonging to the group of cleft constructions. Here are some representative corpus examples of these constructions:
In the SYNDATA corpus collection, each example is given with a minimal context of one to three sentences before and/or after the marked syntactic construction. For copyright reasons, we are not able to provide more context for each occurrence. However, most of these examples can easily be found on the internet or in the CONTRAST-IT or COMPARE-IT corpora.
Composition of the SYNDATA corpus collection
The following tables specify the number of corpus examples collected for each language (we also distinguish between Italian written in Italy and Italian written in Switzerland). The third column gives the name of each file. You will need these file names if you want to receive the files; for more information, see “How to get the files” below.
In the future, we plan to code each corpus example for a series of grammatical and informational properties and release a database with new data and a new interface. For the beta version of the first release, we would be grateful for any suggestions on improving the data and on the features that would be useful to code in the search interface of the database.
How to get the files
The SYNDATA corpus collection consists of a series of files (Word documents), each listing the occurrences of a specific marked syntactic construction in a given language. To receive one or more files, please email anna-maria.decesare@clutterunibas.ch with the following information:
This information is intended to give us an idea of how the data is used and how to further develop and improve it in the future.
Acknowledgments and how to quote SYNDATA
The data included in the SYNDATA corpus collection was gathered by the following SNSF collaborators:
If you use SYNDATA in your research, please acknowledge it by referring to:
De Cesare, Anna-Maria (2011-2018), Contrast-it. SYNDATA. University of Basel. https://contrast-it.philhist.unibas.ch/en/syndata/