Robust regression on clustered data and signature based online Arabic handwriting recognition

WRAP_Theses_Wilson_Nunn_2021.pdf - Submitted Version (PDF, 1MB)

Abstract

In this thesis, we present two methodologies: one for processing time series data for use in machine learning, and a robust linear regression for clustered data. Both methodologies aim to utilise time series data in ways that have traditionally been prohibitive, owing to the ragged nature of such data (for example, series of differing lengths). To classify online Arabic handwritten characters with standard machine learning tools, we develop a dyadic iterated integral path signature approach to processing the underlying time series data. This process transforms raw online Arabic handwritten character data, given as multiple time series, into a single feature set that standard machine learning models can take as input. When applied to the Online KHATT segmented character data set, the methodology, combined with either random forests or long short-term memory (LSTM) neural networks, demonstrates a dramatic improvement in recognition performance over the best previously published result, which used hidden Markov models.
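The abstract does not spell out the feature construction. The sketch below is minimal and rests on one assumption: that "dyadic" means computing truncated signatures over dyadic subdivisions of a stroke and concatenating them. It computes the level-2 signature of a piecewise-linear pen trajectory using Chen's identity, then stacks signatures over the dyadic pieces into a fixed-length vector. The function names `signature_level2` and `dyadic_signature_features`, the truncation at level 2, and the `depth` parameter are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def signature_level2(path):
    """Level-2 truncated signature of a piecewise-linear path.

    path: array of shape (n_points, d). Returns a flat vector of length d + d*d
    holding the level-1 terms followed by the level-2 terms (row-major).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity: a linear segment has level-1 term `delta` and
        # level-2 term outer(delta, delta) / 2; the cross term uses the
        # running level-1 signature *before* it is updated.
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 += delta
    return np.concatenate([s1, s2.ravel()])


def dyadic_signature_features(path, depth=2):
    """Concatenate level-2 signatures over dyadic subdivisions of a stroke.

    Level k splits the samples into 2**k contiguous pieces (sharing endpoints),
    so the output length is (2**(depth + 1) - 1) * (d + d*d) whatever the
    number of samples in the stroke.
    """
    path = np.asarray(path, dtype=float)
    n = len(path)
    feats = []
    for k in range(depth + 1):
        edges = np.linspace(0, n - 1, 2**k + 1).round().astype(int)
        for a, b in zip(edges[:-1], edges[1:]):
            feats.append(signature_level2(path[a:b + 1]))
    return np.concatenate(feats)


if __name__ == "__main__":
    stroke = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])  # (x, y) pen samples
    print(signature_level2(stroke))
    print(dyadic_signature_features(stroke).shape)
```

Every stroke maps to a vector of the same length, whatever its number of samples. With `depth=2` and 2-D pen coordinates that is 7 pieces × 6 entries = 42 features, so the output can be passed directly to a random forest or a similar classifier. This sketch omits multi-stroke characters and higher signature levels.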

Furthermore, this processing methodology can be applied to many similar scenarios, including other online handwritten scripts and even drawings on tablets.

Secondly, with the aim of carrying out polynomial regression using the iterated integral path log-signature, we present a robust eigenvalue polynomial regression. This new form of regression is designed to significantly reduce the impact of clustered data on the fitting of a polynomial approximation to the data. The method uses knowledge of where the clusters of data lie in space, combined with the region over which we wish to obtain a robust estimate. With these, the eigenvalue-based method shows substantial improvements over standard least squares linear regression. The methodology is demonstrated to give a large decrease in the L2 error of polynomial approximations to a number of functions.
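The eigenvalue-based method is not described here in enough detail to reproduce. The sketch below therefore sets up only the baseline it is compared against: ordinary least squares polynomial fitting on a hypothetical clustered design, with the L2 error measured over a target region [-1, 1] by the trapezoidal rule. The sampling scheme, target function, noise level and names such as `clustered_sample` and `ols_poly_fit` are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def l2_error(coeffs, f, a=-1.0, b=1.0, n_grid=2001):
    """Trapezoidal-rule estimate of the L2 distance between a polynomial and f on [a, b].

    coeffs are ordered from degree 0 upwards (NumPy's polynomial convention).
    """
    x = np.linspace(a, b, n_grid)
    r2 = (P.polyval(x, coeffs) - f(x)) ** 2
    dx = (b - a) / (n_grid - 1)
    # Trapezoidal rule: half weight on the two endpoints, full weight elsewhere.
    return np.sqrt(dx * (r2.sum() - 0.5 * (r2[0] + r2[-1])))


def clustered_sample(n_cluster=950, n_spread=50, centre=0.5, width=0.02, seed=0):
    """Hypothetical design: most points in one tight cluster, a few spread over [-1, 1]."""
    rng = np.random.default_rng(seed)
    return np.concatenate([
        rng.normal(centre, width, n_cluster),
        rng.uniform(-1.0, 1.0, n_spread),
    ])


def ols_poly_fit(x, y, deg):
    """Ordinary (unweighted) least squares polynomial fit, the baseline method."""
    return P.polyfit(x, y, deg)


if __name__ == "__main__":
    f = lambda t: np.sin(3 * t)  # illustrative target function
    rng = np.random.default_rng(1)
    x_clustered = clustered_sample()
    x_uniform = rng.uniform(-1.0, 1.0, x_clustered.size)
    for name, x in [("clustered", x_clustered), ("uniform", x_uniform)]:
        y = f(x) + 0.1 * rng.normal(size=x.size)  # noisy observations
        print(name, l2_error(ols_poly_fit(x, y, 5), f))
```

Running it prints the L2 error of each least squares fit. One would typically expect the clustered design to give the larger error away from the cluster, which is the effect the robust regression is designed to reduce.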

Item Type: Thesis [via Doctoral College] (PhD)
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Z Bibliography. Library Science. Information Resources > Z004 Books. Writing. Paleography
Library of Congress Subject Headings (LCSH): Writing -- Data processing, Writing -- Statistical methods, Arabic language -- Writing, Optical character recognition, Regression analysis
Official Date: June 2021
Dates: June 2021 (event: unspecified)
Institution: University of Warwick
Theses Department: Department of Statistics
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Papavasiliou, Anastasia, 1975- ; Lyons, T. J. (Terry J.), 1953- ; Ni, Hao
Format of File: pdf
Extent: x, 95 leaves : illustrations
Language: eng
URI: https://wrap.warwick.ac.uk/162465/