The Media Arabic Text Collection

Are you a student of Arabic?

Do you want to know more about the didactic principles of the Media Arabic Text Collection? If yes, please continue reading this page.
Do you want to start using the Media Arabic Text Collection? If yes, please click on this link to the practical tips on how to use the Media Arabic Text Collection.
There is a video on YouTube in which I explain how the MATC works. Link:

NEW: Radio fragments in the MATC

NEW: Launch of the Advanced Level of the MATC

This project covers training and improvement of reading and comprehension skills in Media Arabic for those who are at the intermediate high or advanced low stage in their learning (B1 and/or B2 of the CEFR) and who are advancing from comprehension on the sentence level to comprehension on the text level. The project’s text collection facilitates the transition from reading texts written for educational purposes to the next phase of reading authentic texts written by native speakers to convey a message to other native speakers.
For this purpose I have created a collection of texts and a web-based tool for presenting these texts, equipped with various tabs presenting information on different linguistic levels. These levels will be presented after the explanation of the didactic principles.

Didactic principles
The collection of Media Arabic texts is treated and presented on the basis of two main didactic principles and two secondary principles. The main principles cover the lexical level (vocabulary) and the syntactic level. The secondary principles cover phraseology (combinations of words) and connectors (words creating coherence in a text).

On the lexical level it is my personal experience that a limited vocabulary of 2500-2600 content words (i.e., not function words like pronouns, prepositions, etc.) is sufficient to cover 90-95% of most media reports on non-specialized topics. This vocabulary is available as a PDF and students are strongly encouraged to memorise these words. A separate page tells more about the vocabulary. The texts will provide glosses for less frequent words.
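For readers who want to check the coverage idea on a text of their own, here is a minimal sketch in Python. It is only an illustration and not part of the MATC tool: the function name `coverage`, the tiny word lists and the transliterated sample tokens are all made-up assumptions, and it presumes the text has already been split into words and reduced to dictionary forms.

```python
def coverage(tokens, known_words, function_words):
    """Return the share of content-word tokens that are in the known vocabulary."""
    # Function words (pronouns, prepositions, etc.) are left out of the count.
    content = [t for t in tokens if t not in function_words]
    if not content:
        return 0.0
    known = sum(1 for t in content if t in known_words)
    return known / len(content)


# Hypothetical toy data, not taken from the MATC vocabulary list.
function_words = {"fi", "min", "ila"}
known_words = {"ra'is", "hukuma", "ijtima'", "balad"}
tokens = ["ra'is", "fi", "ijtima'", "hukuma", "min",
          "balad", "mufawadat", "ila", "hukuma", "balad"]

share = coverage(tokens, known_words, function_words)
print(f"Coverage of content words: {share:.0%}")
```

In this toy example only one content word ("mufawadat") is missing from the known vocabulary; in the texts of the collection such a less frequent word would receive a gloss.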
On the syntactic level it is my experience as a teacher of Arabic that students can run into 'syntactic hurdles' while reading sentences. The hurdles are difficulties in recognizing the various constituents of a sentence and the relationships between these constituents. In this collection of texts I have developed a set of codes inserted into sentences to help students overcome those syntactic hurdles and thus better understand what they are reading. The principles of the set of codes and hurdles are explained on a separate page.
The set of syntactic codes is divided into three proficiency levels. Users can choose the level appropriate to their proficiency.

Secondary didactic principles:
On a more advanced level of language proficiency the learner has to acquire knowledge of phraseology, i.e., combinations of words (often collocations) or multi-word expressions (idiomatic expressions or fixed expressions). These relations between words are marked in the texts in order to raise learners' awareness of these combinations.
On another page I present a table in which I have merged the phraseology of all texts in the current collection.
Texts consist of sentences and the relationships between them (causal, temporal, etc.); the coherence of a text is created by specific words expressing these relationships. Those words are called connectors. Connectors can be conjunctions, adverbs or prepositions, but content words like ‘cause’, ‘result’, ‘next’, etc. can also be considered connectors since they can be essential for the coherence of a text. This category of content words is called 'signal words'.

Information available at various linguistic levels
Each text from the collection will be presented in different ways using separate tabs. The specific tabs are discussed on a separate page; here I will briefly mention the different types of information. Each text will appear in 10 tabs and on each tab a different type of linguistic information will be added to the text.

The first tab contains meta information about the text.
The other tabs contain linguistic information on different levels:

In the Arabic tabs, proper names of persons and geographical entities are printed in a different colour, to compensate for the orthographic hurdle posed by the absence of capital letters in Arabic.
With all these types of information, the learner will be able to pursue an in-depth study of a text, focussing on form through the various linguistic levels, and on content while reading, answering the questions or consulting the translation. To facilitate offline review, some information is available for download in Excel format (vocabulary) or in PDF format (plain text version, phraseology).

Media Arabic Text Collection
Presently the collection contains 3 subsets of texts:
All texts were retrieved from the Al Ahram website. The texts about Tunisia represent a pedagogically useful degree of depth and contain news items written about a nearby Arab country, with some detachment and without too many details. The length of the articles is mostly between 60 and 200 words. The length is mentioned in the meta-data in the About tab.
The topic of the Palestinian/Israeli conflict has been, and in the foreseeable future will remain, an important topic in Arabic media coverage and therefore should be part of this collection.
The short texts range from very short to short, sometimes consisting of only one sentence or one paragraph. These short texts can be used to maintain one’s Arabic when only limited time is available. Reading them will take just a few minutes. Short texts are available on various topics: politics, religion, migration, etc. These topics and their vocabulary can differ substantially from 'normal' news reporting.
The number of topics will grow in the near future.

All texts are authentic, retrieved from the Al Ahram website and occasionally slightly edited. Spelling has been corrected in all texts, since Egyptian spelling of MSA often does not follow the standard rules.
Texts are numbered in chronological order. They may not be very recent, although none was written before 2016. It is my experience in teaching Media Arabic that the names of leaders and politicians may change, but their manners and behaviour rarely do. The year of publication of each text is mentioned in the About tab.

On the ‘How to Use the Tabs’ page you can read how these ideas and principles were turned into a practical web-based tool. To accompany this tool I have prepared various supporting pages and resources.
After reading this rather extensive and abstract explanation I invite you to try the tool yourself. Start with the Text Selection page.
Or you might first read the more practically oriented explanation meant for students: How to use the tabs.

I am deeply grateful to Jan Verhaar for technically making this web-based tool available. Jan has his own web-based tool for the conjugation of verbs in Arabic.

Arabic Media Text Collection by Jan Hoogland is licensed under  CC BY-NC-ND 4.0