In the Media Arabic Text Collection (MATC), phraseology is one of the key components: every text has a tab specifically dedicated to it. These tabs distinguish two subcategories of phraseology: collocations and multi-word expressions (MWEs). You can read more about this on the MATC explanation page.
If you want to learn to speak and write Arabic correctly, a good command of phraseology is essential, and it is an important step towards higher levels of proficiency.
To help you benefit from the phraseology in the MATC, I have decided to collect all the phraseology data from the texts and merge them into one Excel table. Since the MATC is still growing, this database will grow accordingly. This means the downloadable file will be updated on a regular basis. Updates will be announced on the MATC News page.
So that you can get the most out of this data, I have made the original Excel database available for download. Please respect the Creative Commons Licence conditions stated at the bottom of this page.
You can download the file here.
In addition, I have made a PDF version available for download for those who are not familiar with working in Excel.
What can you do with the Phraseology file? (not exhaustive)
For 'pure' media terminology, you could filter out the SHO texts, since they cover a much wider variety of topics than the TUN and PAL texts.
Sort by the column of the first keyword or the second keyword. This way you can study the different collocations that have (for example) the noun 'ijrâ' as their first keyword, or the adjective muhimm as their second keyword. By default the file is sorted by the first keyword.
Sort by the root of the first or second keyword.
Sort or filter by 'type of phraseology' to group all multi-word expressions.
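For readers who prefer a script to Excel, the operations listed above can be sketched in plain Python. This is only a minimal illustration: the field names (arabic, source, type, key1) are labels chosen for the sketch, and the source codes and keywords assigned to the three sample rows are placeholders, not actual MATC data.

```python
# Hypothetical sample rows. Field names are labels invented for this sketch;
# the "source" and "key1" values are placeholders, not taken from the real file.
rows = [
    {"arabic": "فَتَحَ تَحْقيقًا قَضائيًّا", "source": "TUN", "type": "NOV+NA", "key1": "تحقيق"},
    {"arabic": "الأَمْن والِاسْتِقْرار", "source": "SHO", "type": "PAIR", "key1": "أمن"},
    {"arabic": "اتَّخَذَ إجْراءاتِ أَمْنٍ صارِمةً", "source": "PAL", "type": "NOV+NN+NA", "key1": "إجراء"},
]


def exclude_source(rows, prefix):
    """Drop rows whose source text code starts with the given prefix (e.g. 'SHO')."""
    return [r for r in rows if not r["source"].startswith(prefix)]


def sort_by(rows, field):
    """Sort rows by one field. This uses plain code-point order, which only
    approximates Arabic dictionary order (hamza forms sort separately)."""
    return sorted(rows, key=lambda r: r[field])


def filter_by_type(rows, type_code):
    """Keep only rows whose type of phraseology equals type_code."""
    return [r for r in rows if r["type"] == type_code]


media_only = exclude_source(rows, "SHO")   # 'pure' media terminology
by_key1 = sort_by(media_only, "key1")      # sorted by first keyword
pairs = filter_by_type(rows, "PAIR")       # one type of phraseology only

for r in by_key1:
    print(r["key1"], "|", r["arabic"])
```

Sorting by root or by second keyword would work the same way with an additional field per row.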
Description of the columns in the database
A Arabic collocation (or MWE)
B English translation
C source text
D type of phraseology (if not MWE, it is a collocation; see below for an explanation of the various categories of collocations)
E first keyword (database is sorted by this field, you can change this yourself)
F root of first keyword
G second keyword (can be empty)
H root of second keyword
Frequent collocations may occur more than once in the merged version. You can take this as an indication of their frequency and importance.
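Because repeated rows signal frequency, counting them gives a quick ranking. The sketch below assumes the Arabic column has been read into a list of strings; the list shown is a hypothetical excerpt with an invented repeat, not real counts from the database.

```python
from collections import Counter

# Hypothetical excerpt of column A; the repetition is invented for illustration.
collocations = [
    "قرار مجلس الأمن",
    "الأَمْن والِاسْتِقْرار",
    "قرار مجلس الأمن",
]


def frequency_table(items):
    """Return (phrase, count) pairs, most frequent first."""
    return Counter(items).most_common()


for phrase, count in frequency_table(collocations):
    print(count, phrase)
```

Note that this counts exact string matches only, so the same collocation written with and without vowel signs would be counted as two different entries.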
The database will by default be sorted by the first keyword. The first keyword is generally the most specific word, i.e. the lemma where you would look up the combination in a dictionary.
The second keyword of an MWE may be chosen more or less arbitrarily.
An MWE can have only one keyword if other constituents are function words.
Collocations represent the vast majority of phraseology items in the MATC.
I have added information about different categories of collocations in the column 'Type' of the table with phraseology.
To explain this I will give a short introduction about collocations.
Collocations are combinations of two words related to each other.
There exist grammatical collocations and lexical collocations. The first type consists for example of a verb and a preposition. I will not treat this type of collocation in the MATC.
The second type, lexical collocations, consists of combinations of two words related to each other on the basis of their meanings. English examples are 'to commit a crime' and 'to perpetrate a crime', which combine a verb with a noun as its object. Another example is 'a vicious crime', in which the same noun combines with an adjective.
In 'Media language' lexical collocations are very frequent. When laws are mentioned, for example, they are often violated or maintained (law as an object). The law can forbid or allow something (law as subject of a verb), and a law can be severe or controversial (law specified by an adjective).
All these lexical collocations exist in Arabic too. Based on the POS (part of speech) of their two constituents, they can be classified into different categories.
In the introduction above we have already seen:
noun as object + verb
noun as subject + verb
noun + adjective
These three types are the most frequent in Media Arabic (and probably in language in general).
But there are more categories:
noun + noun (construct phrase, iDâfa)
noun (maSdar) + preposition + noun
noun as indirect object + verb with preposition
verb + adverb
adjective + adverb
adjective + noun
Less frequent categories:
adjective (participle) + preposition + noun
noun + adverb (circumstantial accusative)
a pair of two words, more or less synonyms, frequently mentioned together
In the following section I will present a few examples for every category. For these examples I will use combinations with the two most frequent nouns in the table, qânûn and 'ijrâ', wherever these occur in the table.
NN, noun + noun (construct phrase, iDâfa)
In the table you will find two different categories, N1N2 and N2N1. N1N2 means that the first word of the construct phrase is given in the first keyword column of the table (Key1) and the second word in the second keyword column (Key2). With N2N1 it is the other way round: the second word of the construct phrase (in general the most specific word) is given in the first keyword column. This column corresponds to the dictionary lemma under which I would list the collocation. So the difference between N2N1 and N1N2 is only relevant for dictionary makers ;-).
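The difference between the two categories can be made concrete with a small sketch. The function name assign_keys is hypothetical and only restates the rule described above: the category decides which noun of the construct phrase ends up in Key1.

```python
def assign_keys(first_noun, second_noun, category):
    """Return (Key1, Key2) for a construct phrase, given its category code.

    N1N2: the first noun of the phrase goes to Key1, the second to Key2.
    N2N1: the second noun of the phrase goes to Key1, the first to Key2.
    """
    if category == "N1N2":
        return first_noun, second_noun
    if category == "N2N1":
        return second_noun, first_noun
    raise ValueError(f"not a construct-phrase category: {category}")


# Placeholder nouns, used only to show the ordering.
print(assign_keys("noun1", "noun2", "N1N2"))
print(assign_keys("noun1", "noun2", "N2N1"))
```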
PAIR, a pair of two words, more or less synonyms, frequently mentioned together
الأَمْن والِاسْتِقْرار
You will understand that collocations of different categories can be combined.
A noun can collocate with a verb (NOV, NSV, NPV) and with an adjective (NA). So this can lead to a combined collocation, for example:
NOV+NA = Verb + Noun(object of verb) + Adjective (modifying the noun)
Obviously the same can be applied to the categories NSV and NPV.
And other combinations are possible:
N1N2+NA = Noun + Noun (construct phrase) + Adjective (modifying noun1 or noun2)
N1N2+N1N2 = Noun + Noun + Noun (construct phrase of 3 components)
NA+NA = Noun + Adjective + Adjective (noun modified by 2 adjectives)
Here I will present a few examples taken from texts in the MATC:
فَتَحَ تَحْقيقًا قَضائيًّا
أستاذ العلوم السياسية
ٱنْسِحابِ قُوّاتِ ٱلِٱحْتِلالِ
قرار مجلس الأمن
المؤسسات الإقليمية والدولية
الجهود الدبلوماسية مستمرة
Combinations of more than two collocations are possible too. Here is one example of a series of three collocations: NOV+NN+NA (verb + noun + noun + adjective):
اتَّخَذَ إجْراءاتِ أَمْنٍ صارِمةً
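Combined codes such as NOV+NA or NOV+NN+NA can be split on the plus sign and looked up one by one. The sketch below covers only the codes that appear on this page. The expansions given for NOV, NSV and NPV are a reading of the category list above and should be treated as assumptions, as should the function names.

```python
# Descriptions for the codes mentioned on this page. The wording for NOV, NSV
# and NPV is inferred from the category list and is an assumption.
CODES = {
    "NOV": "noun as object + verb",
    "NSV": "noun as subject + verb",
    "NPV": "noun as indirect object + verb with preposition",
    "NA": "noun + adjective",
    "NN": "noun + noun (construct phrase)",
    "N1N2": "noun + noun (construct phrase, first noun as Key1)",
    "N2N1": "noun + noun (construct phrase, second noun as Key1)",
    "PAIR": "pair of near-synonyms",
}


def split_type(type_code):
    """Split a combined code such as 'NOV+NA' into its component codes."""
    return [part.strip() for part in type_code.split("+") if part.strip()]


def describe_type(type_code):
    """Describe each component; codes not listed above are reported as unknown."""
    return [CODES.get(part, "unknown code") for part in split_type(type_code)]


def is_combined(type_code):
    """True if the code joins two or more collocation categories."""
    return len(split_type(type_code)) > 1


print(split_type("NOV+NN+NA"))
print(describe_type("NOV+NA"))
print(is_combined("PAIR"))
```

A helper like is_combined could be used to separate simple collocations from combined ones when filtering on the type column.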