Pronoun Reference in the Quran


Anaphora occurs in a text when a referring expression (such as a pronoun) refers back to an entity, called its antecedent.

For example,

وَإِذَا طَلَّقْتُمُ ٱلنِّسَآءَ فَبَلَغْنَ [أَجَلَهُنَّ الزوجة ] فَلَا [تَعْضُلُوهُنَّ الزوجة ] أَن يَنكِحْنَ أَزْوَٰجَهُنَّ إِذَا تَرَٰضَوْا۟ بَيْنَهُم بِٱلْمَعْرُوفِ

In the verse above, the third person feminine plural pronoun 'هن' refers to the noun 'النساء' (women).

Note that in this example the word 'women' is followed by five feminine plural pronouns (the attached هن or ن) that refer to the same entity. These pronouns all co-refer with 'women', and together these co-referring expressions form a coreference chain.

Computationally, two related tasks are commonly studied:

  • Coreference resolution: finding the coreference chains in a text.
  • Pronominal anaphora resolution: given a pronoun, identifying its referent (antecedent).
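To make the relation between the two tasks concrete, here is a minimal sketch in Python. It assumes a pronominal anaphora resolver has produced one (pronoun, antecedent) pair of segment IDs per resolved pronoun; the coreference chains are then the connected groups of mentions, which a small union-find structure recovers. The function `build_chains` and all IDs are hypothetical.

```python
from collections import defaultdict


def build_chains(links):
    """Group mentions into coreference chains.

    links: iterable of (pronoun_id, antecedent_id) pairs. Mentions that are
    connected through any sequence of links end up in the same chain.
    """
    parent = {}

    def find(x):
        # Register unseen mentions, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pron, ant in links:
        root_ant, root_pron = find(ant), find(pron)
        if root_ant != root_pron:
            parent[root_pron] = root_ant

    chains = defaultdict(list)
    for mention in list(parent):
        chains[find(mention)].append(mention)
    return sorted(sorted(chain) for chain in chains.values())


# Hypothetical IDs: 1 = "women" (النساء), 2..6 = the five pronouns referring
# to it (some linked to an earlier pronoun rather than to the noun itself);
# 10 and 11 = an unrelated entity and its pronoun.
links = [(2, 1), (3, 1), (4, 3), (5, 4), (6, 1), (11, 10)]
print(build_chains(links))  # [[1, 2, 3, 4, 5, 6], [10, 11]]
```

Pronominal resolution only supplies the pairwise links; chaining them transitively is what turns those links into coreference chains.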


Annotating the Quran with Pronoun Referents

Resolving pronominal anaphora in the Qur’an leads to a better understanding of the Qur’an in general and helps in various NLP applications. In many instances, especially in narratives, the referent is mentioned much earlier than the pronoun, so tagging such pronouns with their referents makes computational analysis much easier. Consider verse 2:92:

وَلَقَدْ جَاءَكُم مُّوسَىٰ بِالْبَيِّنَاتِ
And Moses had certainly brought you clear proofs.

Here the pronoun ‘you’ refers to ‘the Children of Israel’, who are not mentioned anywhere nearby; the nearest explicit mention is nine verses earlier, in 2:83.

Moreover, since the Qur’an is the revelation of Allah to Prophet Muhammad, Allah is commonly referred to in the first person (both singular and plural), and Prophet Muhammad is commonly addressed in the second person. Verses 2:4 and 2:23 below illustrate this. Most English translators mark second person references to Prophet Muhammad by adding his name in square brackets.


وَالَّذِينَ يُؤْمِنُونَ بِمَا أُنزِلَ إِلَيْكَ وَمَا أُنزِلَ مِن قَبْلِكَ 
And who believe in what has been revealed to you, [O Muhammad], and what was revealed before you, [2:4]


وَإِن كُنتُمْ فِي رَيْبٍ مِّمَّا نَزَّلْنَا عَلَىٰ عَبْدِنَا فَأْتُوا بِسُورَةٍ مِّن مِّثْلِهِ
And if you are in doubt about what We have sent down upon Our Servant [Muhammad], then produce a surah the like thereof [2:23]

Since the Quranic style makes abundant use of pronouns, as in verse 43:37 below, tagging these pronouns can add greatly to understanding the Quran.

وَإِنَّهُمْ لَيَصُدُّونَهُمْ عَنِ السَّبِيلِ وَيَحْسَبُونَ أَنَّهُم مُّهْتَدُونَ
And indeed, they avert them from the way [of guidance] while they think that they are [rightly] guided

Annotation Scope

We have annotated the Quranic pronouns tagged as PRON in the Quranic Arabic Corpus (QAC), 24,668 instances in total. Each instance is linked to a concept from a list of concepts and, where available, to the location of its antecedent.

Maintaining an ontological list of concepts lets multiple Quranic terms converge on a single concept, which improves searching for that concept. For example, besides the word "Quran" itself, the Quran refers to the concept "Quran" with terms such as "Book", "Reminder", "Light" and "Criterion" (al-Furqan). Some of these terms are also shared with other concepts, such as "Torah" or "Gospel".
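A minimal sketch of how a concept list supports search, assuming a hypothetical many-to-many mapping from English surface terms to concept names. The assignments and the placeholder locations below are illustrative only and are not taken from the annotation.

```python
from collections import defaultdict

# Hypothetical term -> concept assignments. A term may belong to several
# concepts: "Book" can denote the Quran, the Torah or the Gospel.
TERM_CONCEPTS = {
    "Quran": {"Quran"},
    "Book": {"Quran", "Torah", "Gospel"},
    "Reminder": {"Quran"},
    "Light": {"Quran"},
    "Criterion": {"Quran"},
    "Torah": {"Torah"},
    "Gospel": {"Gospel"},
}


def invert(term_concepts):
    """Build a concept -> set-of-terms index from a term -> concepts map."""
    index = defaultdict(set)
    for term, concepts in term_concepts.items():
        for concept in concepts:
            index[concept].add(term)
    return index


def search(concept, occurrences, index):
    """Return the sorted locations where any term of `concept` occurs.

    occurrences: (location, term) pairs; the locations here are placeholders.
    """
    terms = index.get(concept, set())
    return sorted({loc for loc, term in occurrences if term in terms})


index = invert(TERM_CONCEPTS)
occurrences = [("loc1", "Book"), ("loc2", "Torah"), ("loc3", "Reminder")]
print(search("Quran", occurrences, index))  # ['loc1', 'loc3']
print(search("Torah", occurrences, index))  # ['loc1', 'loc2']
```

The second query shows the weakness of a purely term-level index: "Book" at loc1 matches both concepts. Linking each individual occurrence to a single concept ID, as this annotation does for pronouns, removes that ambiguity.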


This annotation is available through a query page, which shows the following information for an entered verse:

[Screenshot: Pronoun.PNG, query page output for an entered verse]

Statistics

The Quranic Arabic Corpus (QAC) uses three tags for pronouns:

  • PRON: personal pronouns such as “I, we, you, them, us”. In Arabic these pronouns can be separate words or attached to another word. The QAC contains 24,668 of them (83.6% of all pronouns); only 3,301 are separate words, and the rest are attached to another word, often a verb or a noun.
  • DEM: demonstrative pronouns such as “this, that, these, those”. The QAC contains 1,060 of them, only 3.6% of all pronouns in the Qur’an.
  • REL: relative pronouns such as “which, that”. The QAC contains 3,766 of them (12.8%).
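The shares above can be recomputed directly from the three counts. This short sketch does so, and also derives the number of attached PRON pronouns from the 3,301 separate ones. The three counts sum to 29,494, the total used in the tables below.

```python
# Pronoun counts per QAC tag, as given above.
QAC_PRONOUN_COUNTS = {"PRON": 24_668, "DEM": 1_060, "REL": 3_766}


def shares(counts):
    """Return the total and each tag's share of it in percent (1 decimal)."""
    total = sum(counts.values())
    return total, {tag: round(100 * n / total, 1) for tag, n in counts.items()}


total, pct = shares(QAC_PRONOUN_COUNTS)
print(total)  # 29494
print(pct)    # {'PRON': 83.6, 'DEM': 3.6, 'REL': 12.8}

# Separate versus attached personal pronouns.
separate = 3_301
attached = QAC_PRONOUN_COUNTS["PRON"] - separate
print(attached)  # 21367
```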

Of the 24,668 personal pronouns in the QAC, 13,158 (53%) have an annotated antecedent; the rest have none. Of these 13,158, 112 have an antecedent that comes after the pronoun (cataphora).

Of the 13,158 pronouns with antecedents, only 2,309 (17.5%) have the nearest preceding noun as their antecedent. Over all 24,668 pronouns, attaching each pronoun to its nearest preceding noun therefore finds the correct antecedent only about 9% of the time.
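The nearest-preceding-noun baseline can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code behind the figures above: segments are hypothetical `Segment` records in text order, "N" and "PN" are assumed to be the noun tags, and each gold antecedent is simplified to a single segment ID. The two returned ratios correspond to the two percentages reported above (over pronouns with antecedents, and over all pronouns).

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: QAC-style POS tags, with "N" (noun) and "PN" (proper noun)
# counted as candidate antecedents.
NOUN_TAGS = {"N", "PN"}


@dataclass
class Segment:
    sid: int  # segment ID, increasing in text order
    pos: str  # POS tag such as "N", "PN", "V", "PRON"


def nearest_preceding_noun(segments, i) -> Optional[int]:
    """Return the ID of the closest noun before position i, or None."""
    for j in range(i - 1, -1, -1):
        if segments[j].pos in NOUN_TAGS:
            return segments[j].sid
    return None


def evaluate(segments, gold):
    """Score the baseline against gold antecedents.

    gold maps pronoun segment ID -> antecedent segment ID, or None when the
    pronoun has no antecedent. Returns (accuracy over pronouns with an
    antecedent, accuracy over all pronouns).
    """
    correct = with_ant = total = 0
    for i, seg in enumerate(segments):
        if seg.pos != "PRON":
            continue
        total += 1
        ant = gold.get(seg.sid)
        if ant is None:
            continue
        with_ant += 1
        if nearest_preceding_noun(segments, i) == ant:
            correct += 1
    if total == 0:
        return 0.0, 0.0
    return (correct / with_ant if with_ant else 0.0), correct / total


# Toy input: pronoun 3 resolves correctly, pronoun 5 is attached to noun 4
# instead of noun 1, and pronoun 6 has no antecedent.
segs = [Segment(1, "N"), Segment(2, "V"), Segment(3, "PRON"),
        Segment(4, "N"), Segment(5, "PRON"), Segment(6, "PRON")]
gold = {3: 1, 5: 1, 6: None}
print(evaluate(segs, gold))  # (0.5, 0.3333333333333333)
```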

We found that 80% of the antecedents lie within 50 segments of the pronoun, 75% within 40 segments, 67% within 30 segments, and 55% within 20 segments.
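A distance profile like this can be computed from (pronoun, antecedent) pairs. The sketch below assumes that distance is the absolute difference between segment IDs, which works when IDs are consecutive in text order, as they are in the chapter 112 sample from pronxml112.xml. Taking `abs` also lets antecedents that follow the pronoun count by their distance. The function name `coverage` is hypothetical.

```python
from bisect import bisect_right


def coverage(distances, thresholds):
    """Fraction of antecedents lying within each distance threshold."""
    ds = sorted(distances)
    if not ds:
        return {t: 0.0 for t in thresholds}
    # bisect_right on the sorted list counts the distances that are <= t.
    return {t: bisect_right(ds, t) / len(ds) for t in thresholds}


# (pronoun segment ID, antecedent segment ID) pairs from the chapter 112
# sample in pronxml112.xml.
pairs = [(127718, 127719), (127733, 127721)]
distances = [abs(p - a) for p, a in pairs]
print(distances)                     # [1, 12]
print(coverage(distances, [5, 20]))  # {5: 0.5, 20: 1.0}
```

On the full annotation, calling `coverage(distances, [20, 30, 40, 50])` would give the kind of profile reported above.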

Next, we imposed gender and number agreement using the QAC tags. This improved the results slightly, to 2,669 matches (20.3%) with the human-tagged antecedents. When pronouns whose antecedents contain relative nouns are excluded, the nearest noun with gender and number agreement performs better, at 24.2%.
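Adding the agreement constraint to the baseline can be sketched like this, assuming each hypothetical `Segment` carries gender and number values and that an untagged feature is compatible with anything. The toy input mirrors the opening example (النساء ... فبلغن أجلهن) with hand-assigned features: the plain nearest noun for هن is أجل (masculine singular), whereas the agreement filter skips it and returns النساء.

```python
from dataclasses import dataclass
from typing import Optional

NOUN_TAGS = {"N", "PN"}  # assumption: tags treated as nouns


@dataclass
class Segment:
    sid: int
    pos: str
    gender: Optional[str] = None  # "M", "F", or None if untagged
    number: Optional[str] = None  # "S", "D", "P", or None if untagged


def agrees(a, b):
    """Two feature values agree if equal or if either one is untagged."""
    return a is None or b is None or a == b


def nearest_agreeing_noun(segments, i) -> Optional[int]:
    """Closest preceding noun whose gender and number match segments[i]."""
    pron = segments[i]
    for j in range(i - 1, -1, -1):
        seg = segments[j]
        if (seg.pos in NOUN_TAGS
                and agrees(seg.gender, pron.gender)
                and agrees(seg.number, pron.number)):
            return seg.sid
    return None


# Hand-assigned toy features: "women" (F, P), "reached" (verb),
# attached -na (F, P), "term" (M, S), attached -hunna (F, P).
segs = [
    Segment(1, "N", "F", "P"),
    Segment(2, "V"),
    Segment(3, "PRON", "F", "P"),
    Segment(4, "N", "M", "S"),
    Segment(5, "PRON", "F", "P"),
]
print(nearest_agreeing_noun(segs, 4))  # 1 (the plain nearest noun would be 4)
```

Strict agreement can also reject correct antecedents; for example, Arabic non-human plurals usually take feminine singular agreement, so a fuller system may need to relax the check for such cases.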

Because the Quran is a revelation from Allah to Prophet Muhammad (peace be upon him), a considerable share of the second person masculine singular pronouns (826 of 3,101 instances, 27%) refer to Prophet Muhammad.

Altogether, the Qur’an contains 29,494 pronouns among its 77,430 words. This ratio of 38% is far higher than the 1.5% that Barbu and Mitkov (2001) [1] reported for English technical manuals, or the 4.2% that Dimitrov et al. (2002) [2] reported for the ACE newswire corpus.

Pronouns in the Qur'an by person

Person         Total    Percent
1st person     3,903    13.3%
2nd person     6,881    23.3%
3rd person    13,933    47.2%
None/Other     4,777    16.2%
Total         29,494

Pronouns in the Qur'an by Gender

Gender         Total    Percent
Masculine     22,284    75.6%
Feminine       1,822     6.2%
None/Other     5,388    18.3%
Total         29,494

Pronouns in the Qur'an by Number

Number               Total    Percent
Singular pronouns    9,141      31%
Dual pronouns          381       1%
Plural pronouns     17,671      60%
None/Other           2,301       8%
Total               29,494

Known Issues

  • Number agreement: instead of creating separate concepts for singular and plural, a single concept is created and both singular and plural references are linked to it.

For example, in verse 2:231 وَإِذَا طَلَّقْتُمُ ٱلنِّسَآءَ فَبَلَغْنَ أَجَلَهُنَّ [فَأَمْسِكُوهُنَّ الزوجة ] بِمَعْرُوفٍ أَوْ [سَرِّحُوهُنَّ الزوجة ] بِمَعْرُوفٍ وَلَا [تُمْسِكُوهُنَّ الزوجة ] ضِرَارًا لِّتَعْتَدُوا۟ وَمَن يَفْعَلْ ذَٰلِكَ فَقَدْ ظَلَمَ نَفْسَهُۥ وَلَا تَتَّخِذُوٓا۟ ءَايَٰتِ ٱللَّهِ هُزُوًا وَٱذْكُرُوا۟ نِعْمَتَ ٱللَّهِ [عَلَيْكُمْ الذين آمنوا ] وَمَآ أَنزَلَ [عَلَيْكُم الذين آمنوا ] مِّنَ ٱلْكِتَٰبِ وَٱلْحِكْمَةِ يَعِظُكُم [بِهِۦ القرآن ] وَٱتَّقُوا۟ ٱللَّهَ وَٱعْلَمُوٓا۟ أَنَّ ٱللَّهَ بِكُلِّ شَىْءٍ عَلِيمٌ

  • Multiple pronouns: because one Quranic word may contain more than one pronoun (e.g., a subject and an object), the two need to be distinguished with different markers.

For example, in تَعْضُلُوهُنَّ below, the attached و is the subject pronoun and هن is the object pronoun:

وَإِذَا طَلَّقْتُمُ ٱلنِّسَآءَ فَبَلَغْنَ [أَجَلَهُنَّ الزوجة ] فَلَا [تَعْضُلُوهُنَّ الزوجة ] أَن يَنكِحْنَ أَزْوَٰجَهُنَّ إِذَا تَرَٰضَوْا۟ بَيْنَهُم بِٱلْمَعْرُوفِ [ذَٰلِكَ النهي عن العضل ] يُوعَظُ بِهِۦ مَن كَانَ مِنكُمْ يُؤْمِنُ بِٱللَّهِ وَٱلْيَوْمِ ٱلْءَاخِرِ ذَٰلِكُمْ أَزْكَىٰ لَكُمْ وَأَطْهَرُ وَٱللَّهُ يَعْلَمُ وَأَنتُمْ لَا تَعْلَمُونَ

  • A referent is not always a concrete entity; it can be a verb or an action.

For example: وَإِذَا طَلَّقْتُمُ ٱلنِّسَآءَ فَبَلَغْنَ [أَجَلَهُنَّ الزوجة ] فَلَا [تَعْضُلُوهُنَّ الزوجة ] أَن يَنكِحْنَ أَزْوَٰجَهُنَّ إِذَا تَرَٰضَوْا۟ بَيْنَهُم بِٱلْمَعْرُوفِ [ذَٰلِكَ النهي عن العضل ] يُوعَظُ بِهِۦ مَن كَانَ مِنكُمْ يُؤْمِنُ بِٱللَّهِ وَٱلْيَوْمِ ٱلْءَاخِرِ ذَٰلِكُمْ أَزْكَىٰ لَكُمْ وَأَطْهَرُ وَٱللَّهُ يَعْلَمُ وَأَنتُمْ لَا تَعْلَمُونَ

  • Sometimes the pronoun is adjacent to its referent, following the template <entity> pronoun ...; such pronouns are not labelled.

For example: يَٰٓأَيُّهَا ٱلَّذِينَ ءَامَنُوٓا۟ أَنفِقُوا۟ مِمَّا [رَزَقْنَٰكُم الذين آمنوا ] مِّن قَبْلِ أَن يَأْتِىَ يَوْمٌ لَّا بَيْعٌ فِيهِ وَلَا خُلَّةٌ وَلَا شَفَٰعَةٌ وَٱلْكَٰفِرُونَ هُمُ ٱلظَّٰلِمُونَ

Another example: ٱلَّذِينَ يُنفِقُونَ أَمْوَٰلَهُمْ فِى سَبِيلِ ٱللَّهِ ثُمَّ لَا يُتْبِعُونَ مَآ أَنفَقُوا۟ مَنًّا وَلَآ أَذًى لَّهُمْ أَجْرُهُمْ عِندَ رَبِّهِمْ وَلَا خَوْفٌ عَلَيْهِمْ وَلَا هُمْ يَحْزَنُونَ

  • It may be important to distinguish referents found within the Quran from those taken from books of tafsir (exegesis).
  • In the Quranic style, names matter less than the lesson of a story, so the identity of a referent may be debated in the books of tafsir.

For example, أَوْ كَٱلَّذِى مَرَّ عَلَىٰ [قَرْيَةٍ وَهِىَ بيت المقدس ] خَاوِيَةٌ عَلَىٰ عُرُوشِهَا قَالَ أَنَّىٰ يُحْىِۦ هَٰذِهِ ٱللَّهُ بَعْدَ مَوْتِهَا فَأَمَاتَهُ ٱللَّهُ مِا۟ئَةَ عَامٍ ثُمَّ بَعَثَهُۥ قَالَ كَمْ لَبِثْتَ قَالَ لَبِثْتُ يَوْمًا أَوْ بَعْضَ يَوْمٍ قَالَ بَل لَّبِثْتَ مِا۟ئَةَ عَامٍ فَٱنظُرْ إِلَىٰ طَعَامِكَ وَشَرَابِكَ لَمْ يَتَسَنَّهْ وَٱنظُرْ إِلَىٰ حِمَارِكَ وَلِنَجْعَلَكَ ءَايَةً لِّلنَّاسِ وَٱنظُرْ إِلَى ٱلْعِظَامِ كَيْفَ نُنشِزُهَا ثُمَّ نَكْسُوهَا لَحْمًا فَلَمَّا تَبَيَّنَ لَهُۥ قَالَ أَعْلَمُ أَنَّ ٱللَّهَ عَلَىٰ كُلِّ شَىْءٍ قَدِيرٌ

  • When there is a reference to an entity (say, birds), does it refer to birds in general or to the particular birds mentioned in the story?

For example: وَإِذْ قَالَ إِبْرَٰهِۦمُ رَبِّ أَرِنِى كَيْفَ تُحْىِ ٱلْمَوْتَىٰ [قَالَ الله ] أَوَلَمْ تُؤْمِن [قَالَ إبراهيم ] بَلَىٰ وَلَٰكِن لِّيَطْمَئِنَّ قَلْبِى [قَالَ الله ] فَخُذْ أَرْبَعَةً مِّنَ ٱلطَّيْرِ فَصُرْهُنَّ [إِلَيْكَ إبراهيم ] ثُمَّ ٱجْعَلْ عَلَىٰ كُلِّ جَبَلٍ مِّنْهُنَّ جُزْءًا ثُمَّ ٱدْعُهُنَّ يَأْتِينَكَ سَعْيًا وَٱعْلَمْ أَنَّ ٱللَّهَ عَزِيزٌ حَكِيمٌ

  • Anaphora annotation can also be exploited to give synonyms for difficult Quranic terms.

For example: يَٰٓأَيُّهَا ٱلَّذِينَ ءَامَنُوا۟ لَا تُبْطِلُوا۟ صَدَقَٰتِكُم بِٱلْمَنِّ وَٱلْأَذَىٰ كَٱلَّذِى يُنفِقُ مَالَهُۥ رِئَآءَ ٱلنَّاسِ وَلَا يُؤْمِنُ بِٱللَّهِ وَٱلْيَوْمِ ٱلْءَاخِرِ [فَمَثَلُهُۥ المرائي ] كَمَثَلِ [صَفْوَانٍ حجر أملس ] عَلَيْهِ تُرَابٌ [فَأَصَابَهُۥ حجر أملس ] [وَابِلٌ مطر ] [فَتَرَكَهُۥ حجر أملس ] صَلْدًا لَّا يَقْدِرُونَ عَلَىٰ شَىْءٍ مِّمَّا كَسَبُوا۟ وَٱللَّهُ لَا يَهْدِى ٱلْقَوْمَ ٱلْكَٰفِرِينَ

  • In many cases a word referring to Allah, such as 'rabb' (Lord), appears, and references to Allah then recur over many subsequent verses. Should each such pronoun be linked to that antecedent, no matter how far back it was mentioned?
  • Note the change of referent from the more specific 'O you who believe' to the generic 'husbands' in verse 2:187.

Download

The annotation is distributed as Quran-pron.zip.

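A typical XML file is structured as follows (from pronxml112.xml, chapter 112). Each pron element wraps a pronoun segment and records the segment IDs of its antecedent in ant and its concept ID in con: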

  <?xml version="1.0" encoding="utf-8"?>
  <chapter id="112">
    <verse id="1">
      <seg id="127717">قُلْ</seg>
      <pron id="1" ant="127719 127719" con="1">
        <seg id="127718">هُوَ</seg>
      </pron>
      <seg id="127719">ٱللَّهُ</seg>
      <seg id="127720">أَحَدٌ</seg>
    </verse>
    <verse id="2">
      <seg id="127721">ٱللَّهُ</seg>
      <seg id="127722">ٱل</seg>
      <seg id="127723">صَّمَدُ</seg>
    </verse>
    <verse id="3">
      <seg id="127724">لَمْ</seg>
      <seg id="127725">يَلِدْ</seg>
      <seg id="127726">وَ</seg>
      <seg id="127727">لَمْ</seg>
      <seg id="127728">يُولَدْ</seg>
    </verse>
    <verse id="4">
      <seg id="127729">وَ</seg>
      <seg id="127730">لَمْ</seg>
      <seg id="127731">يَكُن</seg>
      <seg id="127732">لَّ</seg>
      <pron id="2" ant="127721 127721" con="1">
        <seg id="127733">هُۥ</seg>
      </pron>
      <seg id="127734">كُفُوًا</seg>
      <seg id="127735">أَحَدٌۢ</seg>
    </verse>
  </chapter>
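The file can be read with Python's standard `xml.etree.ElementTree` module. The sketch below parses an abridged copy of the sample and yields one record per pronoun. `read_pronouns` is a hypothetical helper, and it assumes the two numbers in `ant` are the first and last segment IDs of the antecedent span; in this sample they are always equal, so that reading is not confirmed.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the chapter 112 sample (XML declaration omitted).
SAMPLE = """<chapter id="112">
  <verse id="1">
    <seg id="127717">قُلْ</seg>
    <pron id="1" ant="127719 127719" con="1"><seg id="127718">هُوَ</seg></pron>
    <seg id="127719">ٱللَّهُ</seg>
  </verse>
  <verse id="2">
    <seg id="127721">ٱللَّهُ</seg>
  </verse>
  <verse id="4">
    <seg id="127732">لَّ</seg>
    <pron id="2" ant="127721 127721" con="1"><seg id="127733">هُۥ</seg></pron>
  </verse>
</chapter>"""


def read_pronouns(chapter):
    """Yield one record per <pron> element of a parsed <chapter> element."""
    # Map every segment ID to its text so antecedent IDs can be resolved.
    seg_text = {int(s.get("id")): s.text for s in chapter.iter("seg")}
    for verse in chapter.iter("verse"):
        for pron in verse.iter("pron"):
            seg = pron.find("seg")
            ant = tuple(int(x) for x in pron.get("ant", "").split())
            ant_text = None
            if len(ant) == 2:  # assumed (first, last) segment of the span
                ant_text = " ".join(
                    seg_text[i] for i in range(ant[0], ant[1] + 1) if i in seg_text
                )
            con = pron.get("con")
            yield {
                "ref": f"{chapter.get('id')}:{verse.get('id')}",
                "pronoun": seg.text,
                "pron_seg": int(seg.get("id")),
                "ant": ant,
                "ant_text": ant_text,
                "con": int(con) if con else None,
            }


for rec in read_pronouns(ET.fromstring(SAMPLE)):
    print(rec["ref"], rec["pron_seg"], rec["ant"], rec["con"])
# 112:1 127718 (127719, 127719) 1
# 112:4 127733 (127721, 127721) 1
```

For a complete file, pass `ET.parse("pronxml112.xml").getroot()` instead of parsing the embedded string.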

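The concepts are maintained in a separate file, Concepts.xml. The con attribute of each pron element refers to a concept id in this file (in chapter 112, both pronouns have con="1", Allah). Some entries are: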

  <?xml version="1.0" encoding="utf-8"?>
  <concepts>
    <con id="1">
      <arabic>الله</arabic>
      <english>Allah</english>
    </con>
    <con id="2">
      <arabic>القرآن</arabic>
      <english>the Qur'an</english>
    </con>
    <con id="3">
      <arabic>المتقين</arabic>
      <english>(Muttaqun) the pious, the righteous, God fearing</english>
    </con>
    <con id="4">
      <arabic>محمد</arabic>
      <english>Prophet Muhammad</english>
    </con>
    <con id="5">
      <arabic>الكافرين</arabic>
      <english>(Kaafir) the infidels</english>
    </con>
    <!-- further con entries omitted -->
  </concepts>
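Concept IDs can be joined to the `con` attribute of each pronoun. Here is a minimal sketch, again using the standard `xml.etree.ElementTree` module and an abridged copy of the entries above. `load_concepts` and `pronouns_per_concept` are hypothetical helpers, and the list of `con` values in the usage line is made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abridged copy of the Concepts.xml entries shown above.
CONCEPTS = """<concepts>
  <con id="1"><arabic>الله</arabic><english>Allah</english></con>
  <con id="2"><arabic>القرآن</arabic><english>the Qur'an</english></con>
  <con id="4"><arabic>محمد</arabic><english>Prophet Muhammad</english></con>
</concepts>"""


def load_concepts(root):
    """Map concept ID -> (Arabic label, English label)."""
    return {
        int(c.get("id")): (c.findtext("arabic"), c.findtext("english"))
        for c in root.iter("con")
    }


def pronouns_per_concept(con_ids, concepts):
    """Count pronouns per concept, labelled in English, most frequent first."""
    counts = Counter(cid for cid in con_ids if cid is not None)
    return [(concepts[cid][1] if cid in concepts else f"concept {cid}", n)
            for cid, n in counts.most_common()]


concepts = load_concepts(ET.fromstring(CONCEPTS))
print(concepts[1][1])                             # Allah
print(pronouns_per_concept([1, 1, 4], concepts))  # [('Allah', 2), ('Prophet Muhammad', 1)]
```

Unknown IDs are kept with a placeholder label rather than dropped, so gaps in the concept list stay visible in the counts.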


References


  1. Barbu, C. and Mitkov, R. (2001). “Evaluation tool for rule-based anaphora resolution methods”. Proceedings of ACL 2001, pp. 34-41.
  2. Dimitrov, M., Bontcheva, K., Cunningham, H. and Maynard, D. (2002). “A light-weight approach to coreference resolution for named entities in text”. Proceedings of the Fourth Discourse Anaphora and Anaphor Resolution Colloquium (DAARC), Lisbon.