During the LREC 2012 conference in Istanbul

I am Abdul Baqi M. Sharaf. I have a PhD from the University of Leeds. My thesis was titled “Annotation of Conceptual Co-reference and Text Mining the Qur’an”, and my supervisor was Eric Atwell.

I am investigating ways in which computational power can help mine the Qur’anic text. “Text mining the Quran” sounds like a job done by machines. However, the Quran is not meant for machines to understand; it is for mankind to comprehend and implement in their lives. Machines can only assist this human understanding by facilitating search and linking related things together. Thus, in this blog you will find me doing a lot of human text mining, rather than machine mining, trying to relate contemporary events and practices to the Quran and build a worldview out of that.

The Qur’an holds a great deal of interesting information: facts, correlations, patterns, and associations between facts and concepts that are difficult to discover by manual processing. Hence, we try to employ computational techniques from the fields of text mining, machine learning, natural language processing, computational linguistics and stylometry to reveal some of the hidden trends and make it easier to link scattered yet related concepts in the Qur’an.

I am interested in carrying out computational research that exploits techniques from computational fields while respecting the guidelines and methodology set by early Qur’anic scholars in their books of Tafsir, such as Ibn Jarir at-Tabari, Al-Baghawi and Ibn Kathir.

You can join my Telegram channel “Gems from the Quran”: https://telegram.me/thisisquran


Data Sets

During my PhD work, I developed two data sources (visit my Wiki for more on this):

– Related verses from Ibn Katheer. See this page for details and download options.

– Pronoun Tagging. See this page for details and download options.


“Is Computational Linguistics Useful for Quranic Studies?”, presented as Guest Speaker at a workshop on “Plagiarism Detection in Arabic” at King Abdul-Aziz University, Jeddah, on 17 March 2015. [slides]

Interview in Arabic in Al-Watan newspaper, published in the “Jumatuna Annex” on 8 November 2013. [al-watan interview]

Sharaf, Abdul-Baquee (2012) “Annotation of Conceptual Co-reference and Text Mining the Qur’an”. PhD Thesis, University of Leeds. [muhammad12phd]

Sharaf, Abdul-Baquee and Atwell, Eric (2012) “QurSim: A corpus for evaluation of relatedness in short texts”. LREC 2012. [muhammad et al – 2012b- qurSim]

Sharaf, Abdul-Baquee and Atwell, Eric (2012) “QurAna: corpus of the Quran annotated with pronominal anaphora”. LREC 2012. [muhammad et al – 2012a QurAna]

Sharaf, Abdul-Baquee and Atwell, Eric (2011) “Automatic categorization of the Quranic chapters” (التصنيف الآلي للسور القرآنية). 7th International Computing Conference in Arabic (ICCA11), 31 May – 2 June 2011, Imam Mohammed Ibn Saud University, Riyadh, KSA. In Arabic. [ICCA2011_proceedings_paper26]

Sharaf, A. et al. (2010) “NLP Projects on Arabic and the Quran at Leeds University”. Workshop on enriching Arabic digital contents, Damascus, Syria. [alesco-paper-Sharaf10]

Dukes, K., Sharaf, A. and Atwell, E. (2010) “Online Visualization of Traditional Quranic Grammar using Dependency Graphs”. Conference on The Foundations of Arab Linguistics – Sibawayhi and the Earliest Arabic Grammatical Theory, Faculty of Asian and Middle Eastern Studies, Cambridge University. [qcorpus-fal2010]

Dukes, K., Atwell, E. and Sharaf, A. (2010) “Syntactic Annotation Guidelines for the Quranic Arabic Dependency Treebank”. LREC 2010, Valletta, Malta. [qsyntax-lrec2010]

Atwell, Eric, Dukes, Kais, Sharaf, Abdul-Baquee, Habash, Nizar, et al. (2010) “Understanding the Quran: A new Grand Challenge for Computer Science and Artificial Intelligence”. Grand Challenges for Computing Research (2010), British Computer Society Workshop, Edinburgh. [UnderstandingTheQuran-Edinburgh-2010]

Sharaf, A. and Atwell, E. (2009) “A Corpus-based computational model for knowledge representation of the Qur’an”. 5th Corpus Linguistics Conference, Liverpool. [sharaf2009-cl2009]

Sharaf, Abdul-Baqi (2009) “The Qur’an Annotation for Text Mining”. PhD First Year Transfer Report, Leeds University. [firstYearTransferReport]


Here are a few applications that you may want to try:

Verse Similarity

Verse similarity from Ibn Kathir

Check character similarity of Quranic verses using the similar_text function

Check verse character similarity using the Text::Similarity::Overlap module

Check verse similarity using the TF-IDF vector similarity measure

Verse segment similarity using the Text::Similarity::Overlap module
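For readers curious how a TF-IDF verse-similarity check like the one listed above can work, here is a minimal Python sketch. It is not the code behind that application: it assumes verses are already normalised and split on whitespace, weights each term by raw count times log(N/df), and compares verses with cosine similarity. The function names and the toy "verses" are hypothetical placeholders.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict term -> weight) for each document.

    Assumes documents are already normalised and whitespace-tokenised.
    Weight = raw term count * log(N / document frequency).
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, query_index):
    """Return (index, score) of the document most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (i, cosine(vectors[query_index], vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return max(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy "verses"; real use would load the Quran text or a translation.
    verses = [
        "give alms to the poor",
        "give alms and establish prayer",
        "night follows day",
    ]
    print(most_similar(verses, 0))
```

Terms that occur in every verse get a weight of zero, so very common words contribute nothing; for Arabic text, a real pipeline would also need normalisation (for example of diacritics) before tokenising.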


Word Co-occurrence

N-gram search

Display and Visualizations

Quran Chapter relatedness

Word cloud from the Quran

Quran Concordance

Part-of-speech display of chapters

Tafsir Corpus

Pronouns and Concepts

Pronoun Reference

Quranic Pronoun concept lists


10 thoughts on “About”

  1. AL-Salam Alaikom,

    Dear brother,
    I just wanted to thank you very much for this great work you have done in mining the text of the Noble Qur’an.

    I thought of doing the same right after I finished a data mining course, but decided to search for similar work first so I wouldn’t end up reinventing the wheel.
    I also graduated from KFUPM, but as an undergraduate.

    Jazak Allah kol khair (may Allah reward you with all good), dear brother, for this great work.
    Looking forward to seeing your future work.
    Best wishes,

  2. Assalamu alaikum

    May Allah bless you for the work you are doing and grant you Paradise. Amin.

    Insha’Allah, this research will be of great use…


  3. Mashallah, I am delighted to see this. I had been thinking along the same lines and experimented with text mining on the Quran, but since I don’t have much knowledge of Arabic, I worked only with various English translations.
    It also became my MTech project.

    1. AOA Bro,

      May I know the nature of your experiments with English translations of the Quran? I am working on a methodology for QA, and I would like to know more about your work, as it might be related to mine in some way.

      Please suggest.


      1. Most of my research work can be browsed under “wiki”. Mainly, I worked on a database of “quranic pronouns” and on “similarity and relatedness” from Ibn Katheer’s tafsir.


  4. Alsalamu alikom
    I am studying data mining courses these days.
    I have read your wiki on Hadith text mining. I liked the idea and I’m interested in doing some research on it. Could you give me some advice or ideas on chains of authority (Sanad) to start with?

  5. AOA Abdul Baqi M. Sharaf, would you comment on how information technology can help in investigating thematic and contextual relationships, order and systems in the text of the Quran? You will be aware of the recent work done by many scholars. Some introductions and references are listed here:




    The problem is that describing and presenting such thematic and contextual relationships, order and systems in the text seems cumbersome to understand, which slows further work.

  6. AOA,
    You are awesome. I just finished building a Quran app (open source and free) for Android that searches queries like “alms zakat fr poor peple”, retrieves the ayats containing these words, and finds ayats related to the ones retrieved. Initially this was a tedious task: I had a book that grouped ayats together by subject. I would like to use the data you have compiled; it’s AMAZING. If I could have your permission, that would be really good, as it would greatly increase recall. My friends and I are already working on similar words for tokens in the Quran. The platform is kept general, so whenever someone gives us a translation of the Quran in any format other than uneditable_pdf and we can find a tokenizer for it, that translation can also be added to the app.
    Your work is phenomenal. May Allah bless you for it and make you ever successful.
