
Robust regression on clustered data and signature based online Arabic handwriting recognition
Wilson-Nunn, Daniel (2021) Robust regression on clustered data and signature based online Arabic handwriting recognition. PhD thesis, University of Warwick.
PDF: WRAP_Theses_Wilson_Nunn_2021.pdf - Submitted Version (1489Kb)
Official URL: http://webcat.warwick.ac.uk/record=b3733264
Abstract
In this thesis, we present two different methodologies: one for processing time series data for use in machine learning, and the other a robust linear regression for clustered data. The foundation of both methodologies is an attempt to utilise time series data in ways that have traditionally been prohibitive, owing to the ragged nature of such data. In order to use standard machine learning tools to classify online Arabic handwritten characters, we develop a dyadic iterated integral path signature approach to processing the underlying time series data. The process developed transforms raw online Arabic handwritten character data, in the form of multiple time series, into a single set of features that can be used for machine learning. When applied to the Online KHATT segmented character data set, the methodology, combined with both random forests and long short-term memory (LSTM) neural networks, demonstrates a dramatic improvement in recognition performance over the previously published best (using hidden Markov models).
Furthermore, this processing methodology can be applied to any number of similar scenarios, including other online handwritten scripts and even drawings on tablets. Secondly, with the aim of carrying out polynomial regression using the iterated integral path log-signature, we present a robust eigenvalue polynomial regression. This new form of regression is designed to significantly reduce the impact of clustered data on the fitting of a polynomial approximation to the data. Using knowledge of the location of the clusters of data in space, combined with the region over which we wish to obtain a robust estimate, this eigenvalue-based method can be seen to offer vast improvements over standard least squares linear regression. The methodology is demonstrated to result in a large decrease in the L2 error of polynomial approximations to a number of functions.
Item Type: | Thesis (PhD)
---|---
Subjects: | P Language and Literature > P Philology. Linguistics; Q Science > QA Mathematics; Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software; Z Bibliography. Library Science. Information Resources > Z004 Books. Writing. Paleography
Library of Congress Subject Headings (LCSH): | Writing -- Data processing, Writing -- Statistical methods, Arabic language -- Writing, Optical character recognition, Regression analysis
Official Date: | June 2021
Dates: |
Institution: | University of Warwick
Theses Department: | Department of Statistics
Thesis Type: | PhD
Publication Status: | Unpublished
Supervisor(s)/Advisor: | Papavasiliou, Anastasia, 1975- ; Lyons, T. J. (Terry J.), 1953- ; Ni, Hao
Format of File: |
Extent: | x, 95 leaves : illustrations
Language: | eng