
A method for ontology and knowledgebase assisted text mining for diabetes discussion forum
Issa, Ahmad (2015) A method for ontology and knowledgebase assisted text mining for diabetes discussion forum. PhD thesis, University of Warwick.
PDF: WRAP_THESIS_Issa_2015.pdf - Submitted Version (5Mb)
Official URL: http://webcat.warwick.ac.uk/record=b2812650~S1
Abstract
Social media offers researchers a vast amount of unstructured text as a source for discovering hidden knowledge and insights. However, social media text poses new challenges for text mining and knowledge discovery because of its short length, temporal nature and informal language.
To identify the main requirements for analysing unstructured text in social media, this research takes as a case study a large discussion forum in the diabetes domain. It then reviews and evaluates existing text mining methods against the requirements for analysing such a domain. Using domain background knowledge to bridge the semantic gap in traditional text mining methods was identified as a key requirement for analysing text in discussion forums. Existing ontology engineering methodologies have difficulty deriving suitable domain knowledge with the appropriate breadth and depth of domain-specific concepts and a rich relationship structure. These limitations usually originate from a reliance on human domain experts.
This research developed a novel semantic text mining method. It identifies the concepts and topics being discussed and the strength of the relationships between them, and then displays the emergent knowledge from a discussion forum. The derived method has a modular design consisting of three main components: the Ontology Building Process, Semantic Annotation and Topic Identification, and Visualisation Tools. The ontology building process generates a domain ontology quickly with little need for domain experts. The topic identification component uses a hybrid system of a domain ontology and a general knowledge base for text enrichment and annotation. The visualisation methods, dynamic tag clouds and co-occurrence networks for pattern discovery, enable flexible visualisation of these results and can help uncover hidden knowledge.
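As a rough illustration of the annotation and co-occurrence ideas described above, the following minimal Python sketch tags forum posts with concepts from a small term dictionary and counts how often pairs of concepts appear in the same post. It is not the thesis's implementation: the toy ontology, the function names `annotate` and `cooccurrence`, and the whole-word matching rule are all assumptions made for illustration, and the general knowledge base used for text enrichment is not modelled.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical toy ontology: surface term -> concept label (not taken from the thesis).
TOY_ONTOLOGY = {
    "insulin": "Insulin",
    "metformin": "Metformin",
    "hypo": "Hypoglycaemia",
    "hypoglycaemia": "Hypoglycaemia",
    "blood sugar": "BloodGlucose",
    "glucose": "BloodGlucose",
}


def annotate(post, ontology=TOY_ONTOLOGY):
    """Return the set of concepts whose terms occur in the post as whole words."""
    text = post.lower()
    found = set()
    for term, concept in ontology.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.add(concept)
    return found


def cooccurrence(posts, ontology=TOY_ONTOLOGY):
    """Count the number of posts that mention each unordered pair of concepts."""
    counts = Counter()
    for post in posts:
        # Sort so each pair is always stored in the same order.
        concepts = sorted(annotate(post, ontology))
        counts.update(combinations(concepts, 2))
    return counts


if __name__ == "__main__":
    sample_posts = [
        "Had a hypo after my insulin dose",
        "Insulin and blood sugar readings",
        "Metformin lowered my glucose",
    ]
    for pair, n in cooccurrence(sample_posts).most_common():
        print(pair, n)
```

In this sketch the pair counts stand in for relationship strength; they could serve as edge weights in a co-occurrence network, and per-concept counts over time windows could drive a dynamic tag cloud.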
Applying the derived text mining method within the case study helped identify trending topics in the forum and how they change over time. The derived method performed better at semantic annotation of the text than the other systems evaluated.
The new text mining method appears to be “generalisable” to domains other than diabetes. Future studies need to confirm this and to evaluate the method's applicability to other types of social media text sources.
Item Type: | Thesis (PhD)
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software; Z Bibliography. Library Science. Information Resources > ZA Information resources
Library of Congress Subject Headings (LCSH): | Ontologies (Information retrieval), Data mining, Online social networks -- Data processing, Semantics -- Data processing
Official Date: | May 2015
Institution: | University of Warwick
Theses Department: | Warwick Manufacturing Group
Thesis Type: | PhD
Publication Status: | Unpublished
Supervisor(s)/Advisor: | Bal, Jay
Extent: | x, 217 leaves : illustrations (colour), charts
Language: | eng