The Library

Probabilistic neural topic models for text understanding

Pergola, Gabriele (2020) Probabilistic neural topic models for text understanding. PhD thesis, University of Warwick.

PDF: WRAP_Theses_Pergola_2020.pdf (Submitted Version, 2728 KB)

Official URL: http://webcat.warwick.ac.uk/record=b3735705

Abstract

Making sense of text remains one of the most fascinating open challenges, both because of and despite the vast amount of information continuously produced by recent technologies. As textual data grows, automatic approaches must also deal with the wide variety of linguistic features across different domains and contexts: user reviews, for example, may be characterised by colloquial idioms, slang or contractions, while clinical notes often contain technical jargon, typical medical abbreviations and polysemous words whose meaning strictly depends on the particular context in which they are used.

We propose to address these issues by combining topic modelling principles and models with distributional word representations. Topic models generate concise and expressive representations of large document collections by clustering words into “topics”, which can be interpreted as document decompositions; they focus on the global context of words and on their co-occurrences across the whole corpus. Distributional word representations, instead, encode the syntactic and semantic properties of words by leveraging their local contexts, and can be conveniently pre-trained both to ease model training and to encode external knowledge. Our work is a step towards bridging the gap between recent advances in topic modelling and increasingly rich distributional word representations, with the aim of addressing the aforementioned issues arising from different linguistic features across domains.

In this thesis, we first propose a hierarchical neural model inspired by topic modelling, which leverages an attention mechanism along with a novel neural cell for the fine-grained detection of sentiments and themes discussed in user reviews. Next, we present a neural topic model with adversarial training that distinguishes topics based on their high-level semantics (e.g. opinions or factual descriptions). Then, we design a probabilistic topic model specialised for the extraction of biomedical phrases, whose inference process goes beyond the limitations of traditional topic models by seamlessly combining word co-occurrence statistics with information from word embeddings. Finally, inspired by the use of entities in topic modelling [85], we design a novel masking strategy to fine-tune language models for biomedical question answering. For each of these models, we report experimental assessments supporting their efficacy across a wide variety of tasks and domains.

Item Type:

Thesis (PhD)

Subjects:

Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software

Library of Congress Subject Headings (LCSH):

Computational linguistics, Natural language processing (Computer science), Neural networks (Computer science), Sentiment analysis, Machine learning

Official Date:

December 2020

Dates:

Date	Event
December 2020	UNSPECIFIED

Institution:

University of Warwick

Theses Department:

Department of Computer Science

Thesis Type:

PhD

Publication Status:

Unpublished

Supervisor(s)/Advisor:

He, Yulan

Format of File:

pdf

Extent:

v, viii, 126 leaves : illustrations

Language:

eng


University of Warwick
Publications service & WRAP
