Text Mining Hadeeth


Hadeeth (or Hadith) are the traditions of the Prophet Muhammad, peace be upon him. They include the sayings of the Prophet, which are in classical Arabic, as well as his companions' reports of incidents in his life and of his actions. Hadeeth texts play a very significant role in Islam: they complement the Quran by elaborating on the generalities it mentions. The Islamic library contains a vast number of works on Hadeeth. Text mining and computational linguistics could help establish linkages and associations between related hadeeths that would be very difficult to find through manual inspection.


key research issues

Traditionally, a Hadeeth consists of a Sanad [سند] (a long chain of narrators) and a Matn [متن] (the actual content/text of the hadeeth). Interesting computational projects can emanate from both parts of a Hadeeth. A brief introduction on this subject can be viewed here.

Early Muslim scholars established a unique science for studying the Sanad of a Hadith in order to assess its authenticity. One computational project could be to study this field and formalize some of its rules, so that the authenticity assessment can be automated (or semi-automated). Corpus-based techniques could be used to give a judgment on the sanad of a Hadeeth if a large, properly annotated corpus of "Elm ar-Rijal [علم الرجال]" is available. To evaluate the need for such automated tools, scholars from the field should be consulted to identify their current "pain points" when studying the authenticity of a given hadith. Computational linguists could then analyze these "pain points" to see how far the "pain" could be alleviated by automated systems.
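As a rough illustration of what "formalizing certain rules" might look like, the sketch below checks a chain of narrators against a small lookup table. Everything in it is an assumption made for illustration: the `NarratorInfo` fields, the grade labels, the two rules (every narrator is known and graded "reliable"; adjacent narrators have overlapping lifetimes) and the placeholder narrator names. The actual science of hadith criticism is far richer, and any real rule set would have to come from the scholars themselves.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NarratorInfo:
    grade: str       # hypothetical labels, e.g. "reliable" or "weak"
    birth_year: int  # years in one consistent calendar
    death_year: int


def check_chain(sanad: List[str], rijal: Dict[str, NarratorInfo]) -> List[str]:
    """Return a list of issues found in a chain.

    `sanad` is ordered from the latest narrator back to the earliest.
    An empty result means only that these two toy rules raised no flag.
    """
    issues = []
    # Rule 1 (illustrative): every narrator is known and graded "reliable".
    for name in sanad:
        info = rijal.get(name)
        if info is None:
            issues.append(f"{name}: not found in the narrator database")
        elif info.grade != "reliable":
            issues.append(f"{name}: graded '{info.grade}'")
    # Rule 2 (illustrative): adjacent narrators must have been alive
    # at the same time for direct transmission to be possible.
    for later, earlier in zip(sanad, sanad[1:]):
        a, b = rijal.get(later), rijal.get(earlier)
        if a is not None and b is not None and a.birth_year >= b.death_year:
            issues.append(f"{later} -> {earlier}: lifetimes do not overlap")
    return issues


# Placeholder data, invented purely for the example.
rijal = {
    "Narrator A": NarratorInfo("reliable", 100, 180),
    "Narrator B": NarratorInfo("reliable", 40, 110),
    "Narrator C": NarratorInfo("weak", 10, 60),
}
print(check_chain(["Narrator A", "Narrator B", "Narrator C"], rijal))
```

A tool of this kind would at most flag chains for a scholar's attention; it would not issue a verdict.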

Challenges

While the Qur'an is a bounded text of 114 chapters and a total of 77,430 words, the Hadeeth cannot be delimited by a fixed number. Hence, a good starting point is to choose specific authored collections, such as Bukhari, Muslim and other books of Hadeeth.

These collected digitized texts then need to be unified under a structured format, so that they are machine-readable and programs can be designed around a single representation.
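One possible shape for such a unified representation is sketched below. The field names, the `collection_x` identifier and the choice of JSON are assumptions for illustration, not an established schema. The normalization step (Unicode NFC, then removal of short-vowel marks and the tatweel) is one common way to make differently typed copies of the same Arabic text comparable; the original text is kept alongside the normalized copy.

```python
import json
import re
import unicodedata
from dataclasses import asdict, dataclass
from typing import List

# Characters dropped for search purposes: the short-vowel and related marks
# U+064B..U+0652, the superscript alef U+0670, and the tatweel U+0640.
_STRIP = re.compile("[\u064B-\u0652\u0670\u0640]")


def normalize_arabic(text: str) -> str:
    """Apply NFC, drop the marks listed above, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = _STRIP.sub("", text)
    return " ".join(text.split())


@dataclass
class HadithRecord:
    collection: str   # hypothetical identifier, e.g. "collection_x"
    number: int       # hadith number within that collection
    sanad: List[str]  # narrators, from the compiler back to the source
    matn: str         # content text as found in the digitized source
    matn_search: str  # normalized copy of the matn, used for matching


def make_record(collection: str, number: int, sanad: List[str], matn: str) -> HadithRecord:
    return HadithRecord(collection, number, list(sanad), matn, normalize_arabic(matn))


def to_json(record: HadithRecord) -> str:
    # ensure_ascii=False keeps Arabic letters readable in the output.
    return json.dumps(asdict(record), ensure_ascii=False)


# Placeholder record: the narrator names and identifier are invented.
record = make_record("collection_x", 1, ["Narrator A", "Narrator B"], "قَالَ")
print(to_json(record))
```

Keeping both `matn` and `matn_search` means that search and linkage can run on the normalized text while display and annotation still use the text exactly as published.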

Next comes the challenge of annotating hadith text with a selected set of layers. Traditional layers are part-of-speech annotation, syntactic annotation, semantic annotation, anaphora annotation and named-entity annotation. Each layer has its own challenges and complexities.

In any corpus building and annotation project, the most laborious task is the manual annotation. This involves defining a clear and concise annotation scheme along with annotation tools, then selecting suitable human annotators and training them in the scheme. Next comes the issue of measuring inter-annotator agreement and, through further training, reducing the cases of disagreement as much as possible. During this task, close coordination with expert scholars is vital in order to resolve subject-matter issues that the annotators might discover.
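One widely used measure of agreement between two annotators is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of items on which the annotators agree and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below computes it for two label sequences; the part-of-speech labels are made up for the example, and the choice of kappa over other agreement measures is an assumption.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # p_o: share of items that received the same label from both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # p_e: chance agreement from the two marginal label distributions.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented part-of-speech tags for six tokens.
annotator_1 = ["N", "N", "V", "V", "N", "P"]
annotator_2 = ["N", "V", "V", "V", "N", "P"]
print(round(cohen_kappa(annotator_1, annotator_2), 4))
```

Here the annotators agree on 5 of 6 tokens (p_o = 30/36) and chance agreement is p_e = 13/36, so kappa = 17/23, roughly 0.74.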

Throughout the project, close contact with Hadeeth scholars is mandatory. It is important that hadeeth scholars feel ownership of such a project, as they will be the potential end-users of such applications in future. Their input will be vital in all phases: requirement gathering, the annotation process, validation of results, testing and verification, etc.

Key research topics

The first step is to digitize collections of Hadeeth and make them machine-readable. Much of this material is already available in machine-readable text form on many online sites, as shown in the sub-section below. Another key decision is whether the project is intended to be in Arabic only or to include multilingual translations. For the latter option, a great effort is needed to obtain all texts in multilingual form.

As Arabic part-of-speech taggers and other automated tools are rare and not very accurate, a great effort is needed in the manual annotation of hadith text. A study needs to be carried out with Hadeeth scholars on the nature and extent of the annotation.

Once the annotated corpus with these multiple layers is available, interesting applications can be built on it. It is advisable to make any such application available online and open-source, for greater benefit and outreach.

One interesting computational project is to establish linkage between the Quran and Hadith. As a starting point, Tafseer Ibn Kathir can be investigated and a dataset built in which each Quranic aya is linked to the ahadeeth that Ibn Kathir found to be connected with it. Later, this list could be enriched from other Tafseer books such as al-Baghawi, Qurtubi, Tabari, etc.
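A minimal sketch of how such a linkage dataset might be held in memory follows. The class name `LinkIndex`, the tuple-based references and the source labels are all hypothetical, and the sample entries are placeholders rather than real links. Recording the source of each link is a design choice that lets the dataset start from one Tafseer and be enriched from others without losing provenance.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

AyaRef = Tuple[int, int]     # (sura number, aya number)
HadithRef = Tuple[str, int]  # (collection identifier, hadith number)


class LinkIndex:
    """Two-way index between ayas and hadiths, with the source of each link."""

    def __init__(self) -> None:
        self._by_aya: Dict[AyaRef, Dict[HadithRef, Set[str]]] = defaultdict(dict)
        self._by_hadith: Dict[HadithRef, Set[AyaRef]] = defaultdict(set)

    def add(self, aya: AyaRef, hadith: HadithRef, source: str) -> None:
        # The same link may be attested by several Tafseer works.
        self._by_aya[aya].setdefault(hadith, set()).add(source)
        self._by_hadith[hadith].add(aya)

    def hadiths_for(self, aya: AyaRef) -> Dict[HadithRef, Set[str]]:
        # .get avoids creating empty entries for ayas that were never linked.
        return dict(self._by_aya.get(aya, {}))

    def ayas_for(self, hadith: HadithRef) -> List[AyaRef]:
        return sorted(self._by_hadith.get(hadith, set()))


# Placeholder links, invented for the example only.
index = LinkIndex()
index.add((2, 10), ("collection_x", 7), "tafseer_1")
index.add((2, 10), ("collection_x", 7), "tafseer_2")
index.add((3, 5), ("collection_x", 7), "tafseer_1")
print(index.hadiths_for((2, 10)))
print(index.ayas_for(("collection_x", 7)))
```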

There have been a few attempts to automatically classify Hadith into topics. These studies apply text classification using machine learning algorithms. See, for example, (Alkhatib 2010 [1]), (Al-Kabi and Al-Sinjilawi 2007 [2]) and (Jbara 2010 [3]).
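To make the idea of topic classification concrete, here is a small multinomial naive Bayes classifier with add-one smoothing. It is a generic textbook technique chosen for brevity and is not claimed to be the method of the cited studies. The training "documents" are invented English keyword lists, standing in for tokenized hadith texts labelled with chapter topics.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayes:
    """Multinomial naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.class_counts: Counter = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens: List[str]) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, doc_count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / n_docs)
            for token in tokens:
                if token in self.vocab:  # unseen words are ignored
                    count = self.word_counts[label][token]
                    score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; real input would be tokenized hadith text.
train_docs = [
    ["prayer", "prostration", "mosque"],
    ["prayer", "ablution"],
    ["fasting", "ramadan", "dawn"],
    ["fasting", "sunset", "ramadan"],
]
train_labels = ["prayer", "prayer", "fasting", "fasting"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["ramadan", "dawn"]))
```

With real data, the tokens would come from a normalized and ideally stemmed or lemmatized version of the Arabic text, and accuracy would be measured on held-out hadiths.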

Hadith Collections

The following are nine renowned large collections of Hadith, listed in order of importance. I have included links to online English translations whenever available and, for each book, a well-known commentary where one exists.

No. | Collection Name | Arabic Online Version | English Online Version | Commentary Book | Arabic Online Version of Commentary
1 | Saheeh Al-Bukhari [صحيح البخاري] | al-islam.com | sunnah.com | Fathul Bari [فتح الباري] by Ibn Hajar | al-islam.com
2 | Saheeh Muslim [صحيح مسلم] | al-islam.com | sunnah.com | [شرح النووي] by An-Nawawi | al-islam.com
3 | Sunan Al-Tirmidhi [سنن الترمذي] | al-islam.com | sunnah.com (in progress) | Tohfatul Ahwadhi [تحفة الأحوذي] | al-islam.com
4 | Sunan an-Nasae [سنن النسائي] | al-islam.com | sunnah.com (in progress) | [شرح السيوطي وحاشية السندي] by As-Suyuti | al-islam.com
5 | Sunan Abu Dawoud [سنن أبو داود] | al-islam.com | CMJE.org | Aoun al-Ma'boud [عون المعبود] | al-islam.com
6 | Sunan Ibn Majah [سنن ابن ماجه] | al-islam.com | sunnah.com (in progress) | Al-Sindi [حاشية السندي] | al-islam.com
7 | Musnad Imam Ahmad [مسند الإمام أحمد] | al-islam.com | no known translation | no known commentary | n/a
8 | Moutta Imam Malik [موطأ مالك] | al-islam.com | sunnah.com | Al-Muntaqa [المنتقى] | al-islam.com
9 | Sunan ad-Darami [سنن الدارمي] | al-islam.com | no known translation | no known commentary | n/a

Another well-known hadith collection, authored later and drawing on the core collections above, is Riyad as-Salihin [رياض الصالحين], which has an English translation at sunnah.com.

Typical Chapters

The following chapter names from Sahih Muslim give an idea of the themes of a typical hadith collection.

   The Book of Faith (Kitab Al-Iman)
   The Book of Purification (Kitab Al-Taharah)
   The Book of Menstruation (Kitab Al-Haid)
   The Book of Prayers (Kitab Al-Salat)
   The Book of Zakat (Kitab Al-Zakat)
   The Book of Fasting (Kitab Al-Sawm)
   The Book of Pilgrimage (Kitab Al-Hajj)
   The Book of Marriage (Kitab Al-Nikah)
   The Book of Divorce (Kitab Al-Talaq)
   The Book of Transactions (Kitab Al-Buyu`)
   The Book Pertaining to the Rules of Inheritance (Kitab Al-Farai`d)
   The Book of Gifts (Kitab Al-Hibat)
   The Book of Bequests (Kitab Al-Wasiyya)
   The Book of Vows (Kitab Al-Nadhr)
   The Book of Oaths (Kitab Al-Aiman)
   The Book Pertaining to the Oath, for Establishing the Responsibility of Murders, Fighting, Requital and Blood-Wit (Kitab Al-Qasama wa'l-Muharaba wa'l-Qisas wa'l-Diyat)
   The Book Pertaining to Punishments Prescribed by Islam (Kitab Al-Hudud)
   The Book Pertaining to Judicial Decisions (Kitab Al-Aqdiyya)
   The Book of Jihad and Expedition (Kitab Al-Jihad wa'l-Siyar)
   The Book on Government (Kitab Al-Imara)
   The Book of Games and the Animals which May be Slaughtered and the Animals that Are to be Eaten (Kitab-us-Said wa'l-Dhaba'ih wa ma Yu'kalu min Al-Hayawan)
   The Book of Sacrifices (Kitab Al-Adahi)
   The Book of Drinks (Kitab Al-Ashriba)
   The Book Pertaining to Clothes and Decoration (Kitab Al-Libas wa'l-Zinah)
   The Book on General Behaviour (Kitab Al-Adab)
   The Book on Salutations and Greetings (Kitab As-Salam)
   The Book Concerning the Use of Correct Words (Kitab Al-Alfaz min Al-Adab wa Ghairiha)
   The Book of Poetry (Kitab Al-Sh`ir)
   The Book of Vision (Kitab Al-Ruya)
   The Book Pertaining to the Excellent Qualities of the Holy Prophet (May Peace Be Upon Him) and His Companions (Kitab Al-Fada'il)
   The Book Pertaining to the Merits of the Companions (Allah Be Pleased With Them) of the Holy Prophet (May Peace Be Upon Him) (Kitab Al-Fada'il Al-Sahabah)
   The Book of Virtue, Good Manners and Joining of the Ties of Relationship (Kitab Al-Birr was-Salat-I-wa'l-Adab)
   The Book of Destiny (Kitab-ul-Qadr)
   The Book of Knowledge (Kitab Al-`Ilm)
   The Book Pertaining to the Remembrance of Allah, Supplication, Repentance and Seeking Forgiveness (Kitab Al-Dhikr)
   The Book of Heart-Melting Traditions (Kitab Al-Riqaq)
   The Book Pertaining to Repentance and Exhortation to Repentance (Kitab Al-Tauba)
   The Book Pertaining to the Characteristics of the Hypocrites and Command Concerning Them (Kitab Sifat Al-Munafiqin Wa Ahkamihim)
   The Book Giving Description of the Day of Judgement, Paradise and Hell (Kitab Sifat Al-Qiyamah wa'l Janna wa'n-Nar)
   The Book Pertaining to Paradise, Its Description, Its Bounties and Its Intimates (Kitab Al-Jannat wa Sifat Na'imiha wa Ahliha)
   The Book Pertaining to the Turmoil and Portents of the Last Hour (Kitab Al-Fitan wa Ashrat As-Sa`ah)
   The Book Pertaining to Piety and Softening of Hearts (Kitab Al-Zuhd wa Al-Raqa'iq)
   The Book of Commentary (Kitab Al-Tafsir)

Some Hadeeth Search Sites

state of the art

Text mining of biological texts is maturing, and there exist many resources, projects, journals, software packages and conferences on the subject from which this project could benefit.

"Knowledge representation" research might also be relevant in this context, especially the research centered on the "semantic Web" and ontologies. A "semantic wiki" on the Quran and Hadith might be an interesting product that would benefit both the Muslim and non-Muslim communities, and its implementation might not be very difficult.

"Information Visualization" is gaining momentum these days as ever more digital information becomes readily available; with huge data sets, appealing visual tools bring concepts closer and make them clearer. See examples at http://www.informationisbeautiful.net/

References

  1. Alkhatib, M. (2010) "Classification of Al-Hadith Al-Shareef Using Data Mining Algorithm". Proc. of European, Mediterranean & Middle Eastern Conference on Information Systems, Abu Dhabi, 2010. Available online (PDF).
  2. Al-Kabi, M. and Al-Sinjilawi, S. (2007) "A comparative study of the efficiency of different measures to classify Arabic text". University of Sharjah Journal of Pure and Applied Sciences, Vol. 4, No. 2. Available online (PDF).
  3. Jbara, K. (2010) "Knowledge Discovery in Al-Hadith Using Text Classification Algorithm". Journal of American Science, Vol. 6(11). Available online (PDF).