Verse relatedness in Ibn Kathir

Related verses from Ibn Kathir

Ibn Kathir is the author of a famous commentary on the Qur'an [1]. Although it was written over 700 years ago, it is still the most widely referenced book on the subject.

Ibn Kathir follows a particular methodology in explaining Qur'anic verses. First, he brings in other related verses when explaining a given verse. Often, when a verse covers a subject briefly, many other verses cover other aspects of the same subject. See the many examples in Understand Quran with Quran. Second, he refers to the traditions and sayings of the Prophet Muhammad -peace be upon him- (i.e., Hadith). Third, he cites the opinions of the Sahabah (i.e., companions of the Prophet) on the verse, especially those well known for their knowledge of the Quran, such as Ibn Abbas and Ibn Masoud.

Take for example verse 1:4, “Master of the Day of Judgment”, which can be related to a number of verses in other chapters, such as verse 25:26, “The Sovereignty on that day will be the True (Sovereignty) belonging to the Beneficent One”, and verse 40:16, “Whose is the Sovereignty this day? It is Allah's, the One, the Almighty.”

I have attempted to build a dataset of related verses from Tafsir Ibn Kathir. In this article I will give a description of this task. You can try this dataset using this link.

Usage of the Dataset

Test the page here. A form lets users enter a single query verse. The following information is then returned:

  • A graphical illustration of the directly and indirectly related verses. The graph is drawn using Graph Dracula.
  • The Arabic text and English translation of all directly and indirectly related verses, including the query verse.
  • The list of results is sorted by the number of roots shared between the query verse and each related verse. Note, however, that the absence of overlapping keywords does NOT necessarily indicate the absence of a relation. See for example 11:21, which is strongly related semantically to 29:25 although they share no roots. The opposite is also true: shared roots do not guarantee a relation. I am keeping a record of such verses here.
  • A link is provided with each related verse to Ibn Kathir's page of commentary on that verse.
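
The sorting by shared roots described above can be sketched as follows. This is only an illustration, not the code behind the search page: the function name and the root sets are made up, and a real implementation would need a morphological resource that supplies the roots of each verse.

```python
def rank_by_shared_roots(query_roots, related):
    """Sort related verses by how many roots they share with the query verse.

    query_roots: set of root strings for the query verse.
    related: dict mapping a verse id to that verse's set of roots.
    Returns a list of (verse_id, shared_count), highest count first.
    """
    scored = [(verse, len(query_roots & roots)) for verse, roots in related.items()]
    # Descending by overlap; ties are broken by verse id for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Placeholder roots, NOT real morphological data.
query = {"r1", "r2", "r3"}
candidates = {
    "A": {"r1", "r2", "x"},
    "B": {"y", "z"},
    "C": {"r3"},
}
ranking = rank_by_shared_roots(query, candidates)
print(ranking)
```

A verse such as "B" with zero shared roots still appears in the ranking, at the bottom, which matches the point that a lack of overlap does not mean a lack of relation.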

The Objective of this Dataset

Text similarity and text relatedness are very important applications of corpus linguistics. With the surge of textual information on the web and elsewhere, it is very hard to link similar or related information. As a result, most search engines cannot find relations between related texts when lexical matching is absent.

To enable machine learning techniques to detect similar and related texts automatically, it is important first to create a corpus of training data: a large sample of similar and related texts that have been identified as such by expert human sources. This corpus would then form a gold standard against which researchers can benchmark their machine learning algorithms.

To this end I took the Holy Quran as my text and leveraged Ibn Kathir's methodology of cross-referencing related verses to collect this dataset. My objective behind this collection is twofold: first, to widen the scope of Quranic search and allow users to search beyond keywords and lemmas; and second, to act as a gold standard for computational tasks in text mining and semantic relatedness.

Ibn Kathir does not claim to list exhaustively all verses related to the verse under discussion; rather, he mentions only a few. However, Ibn Kathir's discussion of some of those related verses brings up further verses. See for example the verses related to 11:2. Hence, my search tool includes both the directly and the indirectly related verses.

Verification and removal of non-related verses

After first collecting all the verses that Ibn Kathir mentions while explaining a given verse, I manually went through the 7,679 pairs of related verses and removed 883 pairs that looked unrelated. I have kept a log of these pairs here.

The following are some reasons why Ibn Kathir included the pairs that I removed:

  • Often a verse is related to only one verse, but Ibn Kathir widens the context and includes some adjacent verses as well. Take for example the verses related to 68:15, which tells of a person who regards the Quran as fables of the men of old. Ibn Kathir relates this verse to the 15 consecutive short verses of 74:11-26, which describe an example of such a person in detail. I have kept 8 of these 15 verses as related to 68:15.
  • When the pair is related only very loosely. Take for example the relation between 2:26, which mentions the mosquito (or gnat), and 6:44, which tells of those who forget and reject God's instruction, so that Allah opens for them the gates of provision and then seizes them with sudden punishment. These two verses appear semantically far apart. The context shows that Ibn Kathir relates them based on an opinion that the gnat in 2:26 is a parable for our world: out of greed and love of this world, human beings try to take more and more of its provision until they die, just as the gnat eats more and more until it dies. In the same way, the people of this world take more and more from it, while rejecting and forgetting Allah's instruction, until punishment suddenly befalls them. As this type of deep linkage is very difficult to identify computationally, such pairs have been dropped.

Gold Standard Dataset for semantic relatedness research and inter-annotator agreement

One objective behind creating this dataset is to help researchers in the field of computational semantic relatedness train their classifiers. An important requirement for effective algorithms is a high-quality human-annotated gold standard. This dataset of related verses has a high agreement rate of 88.5% between its two annotators.

In this setting, Ibn Kathir can be considered the first annotator: an expert on the Quran who identified 7,679 pairs of related verses. I, having fair exposure to the Quran and Tafsir, am the second annotator, who went through each pair, marked it as 'related' or 'not-related', and thus removed 883 pairs from the original list.
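
The 88.5% figure is consistent with simple percent agreement computed from the two counts given above, as this short check shows. The variable names are only illustrative. Note that this is raw agreement, not a chance-corrected measure.

```python
total_pairs = 7679     # pairs identified by Ibn Kathir (first annotator)
rejected_pairs = 883   # pairs the second annotator marked 'not-related'

# Pairs on which both annotators agree that the verses are related.
agreed_pairs = total_pairs - rejected_pairs
agreement_rate = agreed_pairs / total_pairs

print(f"{agreed_pairs} of {total_pairs} pairs agreed: {agreement_rate:.1%}")
# prints "6796 of 7679 pairs agreed: 88.5%"
```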

An Illustrative Example

To demonstrate verse relatedness, consider verse 13:7. Investigating this verse reveals the following network. This verse has 3 direct relations and 8 indirect relations. I am working on producing an interactive graphical view of the whole network. Monitor the development here.

Collecting the Dataset

Tafsir Ibn Kathir is freely available on various websites. I chose the one available at the Quran Complex website. The web rendering on this website makes it possible to apply semi-automated techniques to extract related verses. I developed Perl scripts that visit the URLs for given chapter and verse numbers and populate a database with these pairs of related verses.

Manual cleaning and correction followed. As Ibn Kathir discusses groups of verses together, the most time-consuming manual task was assigning the verses mentioned under a group to a specific single verse.

After completing the initial compilation, further cleaning was done by removing duplicates and verses that refer to themselves.
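
As a rough illustration of this cleaning step (the original work used Perl scripts and a database, which are not shown here), the following Python sketch drops self-references and exact duplicates. The function name is hypothetical, and treating pairs as ordered (source, target) tuples is an assumption.

```python
def clean_pairs(pairs):
    """Drop self-references and duplicate pairs, preserving first-seen order.

    pairs: iterable of (source, target) verse ids such as ("13:7", "35:24").
    Assumption: pairs are directed, so (a, b) and (b, a) are kept as distinct.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        if source == target:            # a verse referring to itself
            continue
        if (source, target) in seen:    # exact duplicate of an earlier pair
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


raw = [("13:7", "35:24"), ("13:7", "13:7"), ("13:7", "35:24"), ("13:7", "17:59")]
cleaned = clean_pairs(raw)
print(cleaned)
```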

Then on a second pass, I went on to find all verses indirectly related to a verse. This is done by searching the direct relations of those verses that are directly related to a given verse. See the example above.
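
The second pass can be pictured as a two-hop lookup over the direct relations. The sketch below is one possible way to do it, not the original script. The sample pairs are only the relations for 13:7 and 2:272 that this article states explicitly, and excluding the verse itself and its direct relations from the result is an assumption.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each source verse to the set of verses cited under it."""
    direct = defaultdict(set)
    for source, target in pairs:
        direct[source].add(target)
    return direct


def indirect_relations(direct, verse):
    """Verses reachable in exactly two hops from `verse`."""
    first_hop = direct.get(verse, set())
    second_hop = set()
    for neighbour in first_hop:
        second_hop |= direct.get(neighbour, set())
    # Assumption: drop the query verse and anything already directly related.
    return second_hop - first_hop - {verse}


pairs = [
    ("13:7", "2:272"), ("13:7", "17:59"), ("13:7", "35:24"),
    ("2:272", "41:46"), ("2:272", "45:15"), ("2:272", "60:8"),
]
direct = build_index(pairs)
indirect = indirect_relations(direct, "13:7")
print(sorted(indirect))
```

With only this partial sample, the result contains the three verses reached through 2:272; the full dataset yields 8 indirect relations for 13:7.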

Possible Shortcomings

Relating Short Verses

As Quranic verses vary in size, we run into two different problems: 1) long verses may contain multiple topics, so pairing the whole verse with another verse reflects only a partial relation; 2) very short verses share a single topic with their adjacent verses, so here too a one-to-one pairing with another verse is not representative. For examples, check the verses related to 10:71 or 11:97.

Similarly, we encounter cases where a verse is related to a number of verses that are themselves related to one another.

Going into Details on a single Word

For example, consider verse 11:8, which contains the word "Ummah", usually meaning "a nation". In the Quran, however, this word can have other, less frequent meanings such as "a leader" or "a short period of time". Here Ibn Kathir cites all the other verses in the Quran where this word is used to mean something other than "a nation". These verses are related not by semantic topic but by the different usages of the word "Ummah".

The same applies to the relation between 12:20 and 72:13.

Relatedness in case of stories

When many adjacent verses tell a single story, they are all inter-related and are cited as such by Ibn Kathir. See for example verse 18:65.

Another issue with stories arises when a story is supported by Hadiths that narrate the whole story with reference to many verses, which might not be directly related to the verse in context. To get a flavor of this issue, consider verse 20:40, in which Allah reminds Moses of His favors on various events of his life, from childhood until He saved him from Pharaoh's conspiracy. At this particular place, however, Ibn Kathir narrates a Hadith that summarizes the whole story of Moses with reference to verses from the Qur'an.

Need relatedness on segment level

As Quranic verses vary in size, a verse might contain more than one clause or sentence, and Ibn Kathir might have used only a portion of a verse to show relatedness. As a result, when going one level further, verses with unrelated concepts might get linked.

Consider for example relations of verse 13:7 as depicted above. This verse goes as follows:

وَيَقولُ الَّذينَ كَفَروا لَولا أُنزِلَ عَلَيهِ ءايَةٌ مِن رَبِّهِ إِنَّما أَنتَ مُنذِرٌ وَلِكُلِّ قَومٍ هادٍ

  1. " Those who disbelieve say: If only some portent were sent down upon him from his Lord!
  2. Thou art a warner only,
  3. and for every folk a guide."

Note how this single verse contains three different points. Because Ibn Kathir cites verses against these smaller segments, the indirectly related verses include some that are totally unrelated.

For example, consider verse 2:272, which relates to 13:7 on the segment "Thou art a warner only,". But 2:272 is a lengthy verse that contains other points. So, when taking the verses related to 2:272, we get 41:46, 45:15 and 60:8, which Ibn Kathir cites on different segments of 2:272 and which hence should not be related to our original verse 13:7. Since our unit of comparison is the whole verse rather than these smaller segments, this is left as a future improvement.
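
To make the problem concrete, here is a small sketch of how segment-level links would stop such leakage. It is purely illustrative: the dataset does not record segments, the segment numbers given for 2:272 and for the target verses are hypothetical, and only the segment numbering of 13:7 follows the list above.

```python
def indirect_by_segment(links, verse, segment):
    """Two-hop relations that follow links only through the cited segment.

    links: dict mapping (verse, segment) to the set of (verse, segment)
    keys it is cited with.
    """
    result = set()
    for neighbour in links.get((verse, segment), set()):
        result |= links.get(neighbour, set())
    return result


# Segment 2 of 13:7 is "Thou art a warner only,".
# All other segment numbers below are HYPOTHETICAL placeholders.
links = {
    ("13:7", 2): {("2:272", 1)},
    ("2:272", 2): {("41:46", 1), ("45:15", 1)},
    ("2:272", 3): {("60:8", 1)},
}
leaked = indirect_by_segment(links, "13:7", 2)
print(leaked)
```

Because 41:46, 45:15 and 60:8 hang off other segments of 2:272, none of them is reached from the "warner" segment of 13:7, whereas verse-level links would have connected all three.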

Take a look at the verses related to 12:109, which contains multiple topics, and observe how a particular topic leads to many branches of related verses.

Another example of two verses related at the segment level rather than at the overall verse level is the pair 12:76 and 58:11.

The following table gives the plain meaning of all the related verses for easy reference [2].

Directly related verses

Verse | Arabic text | English meaning
2:272 | لَيسَ عَلَيكَ هُدىٰهُم وَلٰكِنَّ اللَّهَ يَهدى مَن يَشاءُ وَما تُنفِقوا مِن خَيرٍ فَلِأَنفُسِكُم وَما تُنفِقونَ إِلَّا ابتِغاءَ وَجهِ اللَّهِ وَما تُنفِقوا مِن خَيرٍ يُوَفَّ إِلَيكُم وَأَنتُم لا تُظلَمونَ | The guiding of them is not thy duty (O Muhammad), but Allah guideth whom He will. And whatsoever good thing ye spend, it is for yourselves, when ye spend not save in search of Allah's Countenance; and whatsoever good thing ye spend, it will be repaid to you in full, and ye will not be wronged.
17:59 | وَما مَنَعَنا أَن نُرسِلَ بِالءايٰتِ إِلّا أَن كَذَّبَ بِهَا الأَوَّلونَ وَءاتَينا ثَمودَ النّاقَةَ مُبصِرَةً فَظَلَموا بِها وَما نُرسِلُ بِالءايٰتِ إِلّا تَخويفًا | Naught hindereth Us from sending portents save that the folk of old denied them. And We gave Thamud the she-camel - a clear portent - but they did wrong in respect of her. We send not portents save to warn.
35:24 | إِنّا أَرسَلنٰكَ بِالحَقِّ بَشيرًا وَنَذيرًا وَإِن مِن أُمَّةٍ إِلّا خَلا فيها نَذيرٌ | Lo! We have sent thee with the Truth, a bearer of glad tidings and a warner; and there is not a nation but a warner hath passed among them.

Indirectly related verses

Verse | Arabic text | English meaning
41:46 | مَن عَمِلَ صٰلِحًا فَلِنَفسِهِ وَمَن أَساءَ فَعَلَيها وَما رَبُّكَ بِظَلّٰمٍ لِلعَبيدِ | Whoso doeth right it is for his soul, and whoso doeth wrong it is against it. And thy Lord is not at all a tyrant to His slaves.
45:15 | مَن عَمِلَ صٰلِحًا فَلِنَفسِهِ وَمَن أَساءَ فَعَلَيها ثُمَّ إِلىٰ رَبِّكُم تُرجَعونَ | Whoso doeth right, it is for his soul, and whoso doeth wrong, it is against it. And afterward unto your Lord ye will be brought back.
60:8 | لا يَنهىٰكُمُ اللَّهُ عَنِ الَّذينَ لَم يُقٰتِلوكُم فِى الدّينِ وَلَم يُخرِجوكُم مِن دِيٰرِكُم أَن تَبَرّوهُم وَتُقسِطوا إِلَيهِم إِنَّ اللَّهَ يُحِبُّ المُقسِطينَ | Allah forbiddeth you not those who warred not against you on account of religion and drove you not out from your homes, that ye should show them kindness and deal justly with them. Lo! Allah loveth the just dealers.
5:115 | قالَ اللَّهُ إِنّى مُنَزِّلُها عَلَيكُم فَمَن يَكفُر بَعدُ مِنكُم فَإِنّى أُعَذِّبُهُ عَذابًا لا أُعَذِّبُهُ أَحَدًا مِنَ العٰلَمينَ | Allah said: Lo! I send it down for you. And whoso disbelieveth of you afterward, him surely will I punish with a punishment wherewith I have not punished any of (My) creatures.
11:65 | فَعَقَروها فَقالَ تَمَتَّعوا فى دارِكُم ثَلٰثَةَ أَيّامٍ ذٰلِكَ وَعدٌ غَيرُ مَكذوبٍ | But they hamstrung her, and then he said: Enjoy life in your dwelling-place three days! This is a threat that will not be belied.
13:31 | وَلَو أَنَّ قُرءانًا سُيِّرَت بِهِ الجِبالُ أَو قُطِّعَت بِهِ الأَرضُ أَو كُلِّمَ بِهِ المَوتىٰ بَل لِلَّهِ الأَمرُ جَميعًا أَفَلَم يَا۟يـَٔسِ الَّذينَ ءامَنوا أَن لَو يَشاءُ اللَّهُ لَهَدَى النّاسَ جَميعًا وَلا يَزالُ الَّذينَ كَفَروا تُصيبُهُم بِما صَنَعوا قارِعَةٌ أَو تَحُلُّ قَريبًا مِن دارِهِم حَتّىٰ يَأتِىَ وَعدُ اللَّهِ إِنَّ اللَّهَ لا يُخلِفُ الميعادَ | Had it been possible for a Lecture to cause the mountains to move, or the earth to be torn asunder, or the dead to speak, (this Qur'an would have done so). Nay, but Allah's is the whole command. Do not those who believe know that, had Allah willed, He could have guided all mankind? As for those who disbelieve, disaster ceaseth not to strike them because of what they do, or it dwelleth near their home until the threat of Allah come to pass. Lo! Allah faileth not to keep the tryst.
26:214 | وَأَنذِر عَشيرَتَكَ الأَقرَبينَ | And warn thy tribe of near kindred,
16:36 | وَلَقَد بَعَثنا فى كُلِّ أُمَّةٍ رَسولًا أَنِ اعبُدُوا اللَّهَ وَاجتَنِبُوا الطّٰغوتَ فَمِنهُم مَن هَدَى اللَّهُ وَمِنهُم مَن حَقَّت عَلَيهِ الضَّلٰلَةُ فَسيروا فِى الأَرضِ فَانظُروا كَيفَ كانَ عٰقِبَةُ المُكَذِّبينَ | And verily We have raised in every nation a messenger, (proclaiming): Serve Allah and shun false gods. Then some of them (there were) whom Allah guided, and some of them (there were) upon whom error had just hold. Do but travel in the land and see the nature of the consequence for the deniers!

Download

Here is the XML file containing these pairs. File:Kathir-verse-similarity.xml

In total there are 7,679 pairs of related verses taken from Ibn Kathir. Among them, 882 pairs are marked with 'relevance' value '0', where the pair does not seem related without reading the context in the tafsir book. Relevance '1' covers 3,718 related pairs, and a further 3,079 pairs are marked as relevance level '2', which shows a clear relation and could be suitable for machine learning algorithms (882 + 3,718 + 3,079 = 7,679).


<!--
 *  Dataset on Quranic Verse Relatedness from Tafsir Ibn Kathir (version 0.1)
*  Copyright (C) 2011 Abdul-Baquee M. Sharaf
*  License: GNU Public License
*
*  This dataset lists pairs of verses that have been identified by Ibn
*  Kathir in his Tafsir book. After collecting these pairs, two further
*  passes were made manually to brand degree of relatedness. 
*  
*  Level '0':
*  seems very loosely related and should be understood by looking
*  into the context in the tafsir book.  
* 
*	Level '1':
*	These pairs are understandable by Human reader to be related, but
*	still might be difficult for training learning algorithms
*
*	Level '2':
*	These pairs are very much related and might be suitable for taining
*	machine learning algorithms.
*
*  TERMS OF USE:
*
*  - Permission is granted to copy and distribute verbatim copies
*    of this file, but CHANGING IT IS NOT ALLOWED.
*
*  - This annotation can be used in any website or application,
*    provided its source (TextMiningTheQuran.com) is clearly
*    indicated.
*
*  - This copyright notice shall be included in all verbatim copies
*    of the text, and shall be reproduced appropriately in all works
*    derived from or containing substantial portion of this file.
*
*  Check updates at (http://TextMiningtheQuran.com)
*
*  USAGE:
* 
*	"uid"   	: incremantal ID
*	"ss"    	: source chapter number
*	"sv"		: source verse number
*	"ts"		: target chapter number
*	"tv"		: target verse number
*	"common"	: the number of common root words between the two verses
*	"relevance"	: the degree of relatedness as explained above
-->

<pma_xml_export version="1.0">
   
    <database name="related-verses">
        <!-- Table kathir -->
        <table name="kathir">
            <column name="uid">1</column>
            <column name="ss">1</column>
            <column name="sv">1</column>
            <column name="ts">1</column>
            <column name="tv">2</column>
            <column name="common">0</column>
            <column name="relevance">2</column>
        </table>
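
The record layout shown above can be read with Python's standard library, for instance as follows. This is a minimal sketch, not an official loader: the sample string repeats the first record from the excerpt, the closing `</database>` and `</pma_xml_export>` tags are added here on the assumption that the real file is well-formed, and `read_pairs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# First record from the excerpt; closing tags added (assumption) so it parses.
SAMPLE = """
<pma_xml_export version="1.0">
    <database name="related-verses">
        <table name="kathir">
            <column name="uid">1</column>
            <column name="ss">1</column>
            <column name="sv">1</column>
            <column name="ts">1</column>
            <column name="tv">2</column>
            <column name="common">0</column>
            <column name="relevance">2</column>
        </table>
    </database>
</pma_xml_export>
"""


def read_pairs(xml_text):
    """Return one dict per <table> record, with integer column values."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for table in root.iter("table"):
        row = {col.get("name"): int(col.text) for col in table.iter("column")}
        rows.append(row)
    return rows


rows = read_pairs(SAMPLE)
first = rows[0]
source = f"{first['ss']}:{first['sv']}"   # source verse as chapter:verse
target = f"{first['ts']}:{first['tv']}"   # target verse as chapter:verse
print(source, "->", target, "relevance", first["relevance"])
```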

Computational Analysis

This dataset could be very valuable for researchers experimenting with computational phrase similarity and relatedness. Note that among the 7,679 pairs of related verses, a good portion, 2,783 pairs (36%), share no common keywords.

Statistics show that of the 6,236 verses in total, the Ibn Kathir dataset assigns relations to 2,445 verses (about 39% of Quranic verses), with an average of about 3 relations per verse.
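
Figures of this kind could be recomputed from the records roughly as sketched below. The rows are toy examples in the column layout of the XML file, with placeholder "common" values. Counting only source verses as "verses with relations" is an assumption about how the 2,445 figure was obtained.

```python
def summarize(rows, total_verses=6236):
    """Basic statistics over records with 'ss', 'sv' and 'common' keys."""
    no_overlap = sum(1 for r in rows if r["common"] == 0)
    # Assumption: a verse "has relations" if it appears as a source verse.
    sources = {(r["ss"], r["sv"]) for r in rows}
    return {
        "pairs": len(rows),
        "no_common_roots": no_overlap,
        "share_no_common": no_overlap / len(rows),
        "verses_with_relations": len(sources),
        "coverage": len(sources) / total_verses,
        "avg_relations": len(rows) / len(sources),
    }


# Toy rows; the 'common' values are placeholders, not real root counts.
rows = [
    {"ss": 13, "sv": 7, "ts": 2, "tv": 272, "common": 0},
    {"ss": 13, "sv": 7, "ts": 35, "tv": 24, "common": 1},
    {"ss": 2, "sv": 272, "ts": 41, "tv": 46, "common": 0},
    {"ss": 2, "sv": 272, "ts": 45, "tv": 15, "common": 2},
]
summary = summarize(rows)
print(summary)

# The reported totals give the same ratios quoted in the text.
print(round(2783 / 7679, 2), round(2445 / 6236, 2), round(7679 / 2445, 1))
```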

Applications

Chapters related to Chapter 13

One of the applications built on this dataset finds the relatives of a Quranic chapter: given a chapter, which chapters are its closest relatives? I based the search for the relatives of a chapter on the number of cross-references between the individual verses of the chapters. Further, I distinguished the Meccan and Medinan surahs through color codes.
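
Counting cross-references between chapters might look like the sketch below. It is an illustration under assumptions rather than the application's code: references are counted in both directions, links inside the same chapter are ignored, and the pairs marked as hypothetical are made up for the example.

```python
from collections import Counter


def related_chapters(pairs, chapter):
    """Rank other chapters by the number of verse pairs linking them to `chapter`.

    pairs: iterable of ((source_chapter, source_verse), (target_chapter, target_verse)).
    """
    counts = Counter()
    for (ss, _sv), (ts, _tv) in pairs:
        if ss == chapter and ts != chapter:
            counts[ts] += 1
        elif ts == chapter and ss != chapter:
            counts[ss] += 1
    return counts.most_common()


pairs = [
    ((13, 7), (2, 272)),
    ((13, 7), (17, 59)),
    ((13, 7), (35, 24)),
    ((13, 30), (35, 24)),   # hypothetical pair
    ((13, 8), (13, 9)),     # hypothetical pair within one chapter, ignored
]
relatives = related_chapters(pairs, 13)
print(relatives)
```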

Check the application here.

Future Enhancements

The following enhancements are planned:

  • Manually remove relations that are not obvious. This requires human judgement and a well-defined strategy for deciding which verses are related.
  • Categorize relations into well-defined categories, such as part-of, antonym, exemplification, reason, justification, etc.
  • Make the unit of a related pair the segment rather than the verse.
  • Populate further related pairs from sources other than Ibn Kathir.

References

  1. [1] Tafseer Ibn Kathir.
  2. [2] http://www.textminingthequran.com/apps/similarity.php