The Classical Arabic Text Collection (CATC)

This project covers training and improvement of reading and comprehension skills in Classical Arabic for those who are at the intermediate high or advanced low stage in their learning (B1 and/or B2 of the CEFR) and who are advancing from comprehension on the sentence level to comprehension on the text level. The project’s text collection facilitates the transition from reading texts written for educational purposes to the next phase of reading authentic texts written by and for native speakers.
For this purpose we have created a collection of texts and a web-based tool for presenting these texts, equipped with various tabs presenting information on different linguistic levels. These levels will be presented after the explanation of the didactic principles.

Texts in the CATC
We have selected 20 stories from the collection Alf Layla wa Layla (1001 Nights, best known as the Arabian Nights). The stories were selected on the basis of entertainment value, diversity and length. A few short but amusing texts have been combined to constitute one text in the CATC. The full (unvowelled) text of the Alf Layla wa Layla can be downloaded in two parts: Part 1 and Part 2.
In the near future we intend to add two other collections (Murûj adh-dhahab wa-ma‘âdin al-jawâhir of al-Mas‘ûdî, Maqâmât of al-Hamadhânî), in ascending degree of difficulty.

Didactic principles
The collection of texts is treated and presented on the basis of two didactic principles covering the lexical level (vocabulary) and the syntactic level.

We want to offer the user glosses containing information about the meanings of words and some morphological information. However, we do not present glosses for all words. We assume the user has a basic vocabulary knowledge of Modern Standard Arabic. Our 'definition' of basic vocabulary knowledge is that the user knows (at least) the 2000 most frequent Arabic words from the Frequency Dictionary of Buckwalter and Parkinson. In the Vocabulary Tab we therefore present glosses only for words that are not included in that basic list.
This Basic Vocabulary List (BVL) is available as a pdf, and students are strongly encouraged to memorise these words before starting with the CATC. On the Memrise vocabulary training platform I have created a group of 4 courses containing the 2000 words of the Basic Vocabulary. This link will lead you to the first course.
If a word occurs in a text and is NOT in the BVL, it is supplied with a gloss only once, i.e. only the first time it occurs in the collection. In other words: the user is strongly advised to memorise the words presented with glosses, in order to extend their vocabulary knowledge and to proceed efficiently to the next text in the collection. With each text we supply a downloadable Excel worksheet containing the glosses of that text (download example). Furthermore, we are experimenting with a course in the webapp Memrise to help the user memorise these words. The total number of 'new' words in this subset is about 1300. This means that a selection of 20 stories from the Arabian Nights, consisting of 15,000 words, is fully covered by a vocabulary knowledge of 3300 words (2000 BVL + 1300 glosses).
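The 'gloss only once' rule described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the code behind the CATC web application: the function name assign_glosses, the toy word lists and the transliterated placeholder words are all assumptions made for this example, and the sketch compares plain word forms, whereas a real implementation would presumably compare dictionary forms (lemmas).

```python
def assign_glosses(texts, basic_vocabulary):
    """Return, for each text, the words that receive a gloss in that text.

    A word is glossed only if it is not in the basic vocabulary and has
    not been glossed before, either in an earlier text or earlier in the
    same text.
    """
    already_glossed = set()
    glosses_per_text = []
    for words in texts:
        new_glosses = []
        for word in words:
            if word in basic_vocabulary or word in already_glossed:
                continue  # known word, or glossed at its first occurrence
            already_glossed.add(word)
            new_glosses.append(word)
        glosses_per_text.append(new_glosses)
    return glosses_per_text


# Hypothetical toy data: placeholder words, not taken from the CATC or the BVL.
basic_vocabulary = {"malik", "qala", "bint"}
texts = [
    ["qala", "malik", "wazir", "jariya", "wazir"],  # first text in reading order
    ["bint", "wazir", "sayyad", "qala"],            # second text
]

glosses = assign_glosses(texts, basic_vocabulary)
print(glosses)

# Vocabulary size needed to cover the whole selection, as stated above:
# the 2000 BVL words plus about 1300 glossed words.
required_vocabulary = 2000 + 1300
print(required_vocabulary)
```

In this toy example the second text only glosses "sayyad", because "wazir" was already glossed in the first text. This is why the texts are best read in order and the glossed words memorised along the way.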

Syntactic structures:
It is my experience as a teacher of Arabic that students can run into 'syntactic hurdles' while reading sentences. These hurdles are difficulties in recognizing the various constituents of a sentence and the relationships between these constituents. I previously developed a set of codes that are inserted into sentences to help students overcome those syntactic hurdles and thus better understand what they are reading. The principles of the set of codes and hurdles are explained on a separate page.
The system of syntactic coding was developed for the collection of Media Texts (MATC – Media Arabic Text Collection), in which Modern Standard Arabic (MSA) is the language used. The syntactic coding system contains 61 different codes explaining 61 different syntactic phenomena, many of which are seldom used in Classical Arabic (CA). On the other hand, some syntactic phenomena are more frequent in CA than in MSA (e.g. the Circumstantial Clause – Hal sentence, or the use of as a negation). The set of syntactic codes is divided into three proficiency levels. Users can choose the level appropriate to their proficiency.
The syntactic coding system is thoroughly explained on this page and in this YouTube video.

How To Use Tabs
On a separate page you will find an extensive explanation of the different tabs.

Remarks for each tab regarding the CATC

Plain Text Tab
In this tab you will find the text without vowels. There is also a downloadable pdf containing both the vowelled and unvowelled version of the text. Press the Download button to download it.

Vowelled Text Tab
In this tab you will find the text with vowels. There is also a downloadable pdf containing both the vowelled and unvowelled version of the text. Press the Download button to download it.

Vocabulary Info Tab
See the 'How to Use Tabs Page' for an explanation. What is mentioned on that page about the Top Media Vocabulary (TMV) is not applicable to the CATC. Earlier on the present page you have read about the vocabulary list used in the CATC.

Syntactic Info Tab
See the remark earlier on this page concerning the fact that the syntactic coding system was not developed for Classical Arabic texts. The functioning of the system is extensively explained on the 'How to Use Tabs Page' and other pages it links to.

Vocab/Syntax Tab
Since this tab is a combination of the previous two tabs, all remarks made above also apply to this tab.

Phraseology Tab
This tab is empty in the CATC. It was developed to point out how often fixed combinations of words occur in MSA, and particularly in Media Arabic. This type of combination is far less frequent in CA. If you want to know more about this phenomenon, you can read the page about collocations.

Connectors Tab
This tab is empty in the CATC. It was developed to point out how sentences and constituents of sentences are connected in MSA, and particularly in Media Arabic. The CA texts, and certainly the 1001 Nights texts, are in most cases connected in less complicated ways. Very often the events are simply narrated in chronological, linear order.

Questions Tab
This tab is empty in the CATC.

Translation Tab
In this tab we provide an English translation of the Arabic text. The translation is not meant to be a literary translation, but a pedagogical translation in which we have tried to keep the English text very close to the Arabic source text, so that the user can easily recognise which English word represents which Arabic word. The user who wants to read a literary translation, rather than our ‘Arabic in English disguise’, can choose from various existing translations, available in almost every library.
Readers might feel uncomfortable with those passages in the Alf Layla wa-Layla stories that deal with violence, racism, sexism and sex. 'By God' in the literal translation might be felt as a curse, which it is not in Arabic.

About Us
Roel Otten, retired assistant professor of Arabic at Utrecht University, the Netherlands, selected and prepared the plain and vowelled texts and provided them with glosses, notes and literal translation. In performing these tasks he has gratefully drawn on the help and advice of those mentioned below.
The system of tabs, glosses and syntactic coding was developed by Jan Hoogland, retired assistant professor of Arabic at Radboud University, Nijmegen, the Netherlands.
The web-based application was realised by Jan Verhaar, physicist interested in languages and in Arabic in particular.
Corné Hanssen, assistant professor of Arabic and Islam at the University of Utrecht, the Netherlands, helped with useful comments.
Richard Fox and Bob Godard made language corrections in the literal translation.

Classical Arabic Text Collection by Jan Hoogland is licensed under CC BY-NC-ND 4.0