On reducing the data sparsity in collaborative filtering recommender systems

WRAP_Theses_Guan_2017.pdf - Submitted Version

Abstract

Recommender systems are among the most common software tools and techniques for generating personalized recommendations. Collaborative filtering, an effective recommender system approach, predicts a user's preference (rating) for an item based on the previous preferences of other users. However, collaborative filtering suffers from the data sparsity problem: the preference data that users provide on items are usually too few to reveal the users' true preferences, which makes the recommendation task difficult.
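
As a small illustration of the sparsity problem described above, the fraction of unobserved user-item pairs can be computed directly. This is only a sketch: the function name `sparsity` and the toy ratings are assumptions made for the example and do not come from the thesis.

```python
def sparsity(ratings, n_users, n_items):
    """Return the fraction of user-item pairs that have no observed rating."""
    return 1.0 - len(ratings) / (n_users * n_items)


# Toy data (hypothetical): observed ratings stored as {(user, item): rating}.
ratings = {(0, 0): 5.0, (0, 3): 1.0, (1, 1): 4.0, (2, 2): 2.0}

# 4 of the 12 possible pairs are observed, so two thirds are missing.
print(sparsity(ratings, n_users=3, n_items=4))
```

Real rating datasets are typically far sparser than this toy case, which is what makes predicting the missing entries difficult.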

This thesis focuses on approaches to reducing the data sparsity in collaborative filtering recommender systems. Active learning algorithms reduce sparsity in recommender systems by asking users to rate selected items when they enter the system. However, this process focuses on new users and often assumes that a user can provide ratings for any queried item, which is unrealistic and costly. In movie recommendation, for example, a user has to watch a movie selected by an active learning strategy before rating it, and may be frustrated when asked to rate a movie that he/she has not watched. This could lower the customer's confidence in, and expectations of, the recommender system. Instead, an ESVD algorithm is proposed that combines classic matrix factorization algorithms with ratings completion inspired by active learning, allowing the system to 'add' ratings automatically through learning. This general framework can be combined with different SVD-based algorithms such as SVD++, yielding the ESVD++ method. The proposed ESVD model is further explored through the MESVD approach, which learns the model iteratively to obtain more precise predictions. Two variants of the ESVD model, IESVD and UESVD, are also proposed to handle imbalanced datasets that contain more users than items or more items than users, respectively. These algorithms can be seen as pure collaborative filtering algorithms since they do not require human effort to give ratings. Experimental results show a reduction in prediction error compared with collaborative filtering algorithms (matrix factorization).
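
The general idea of combining matrix factorization with automatic ratings completion can be sketched as below. This is not the thesis's ESVD algorithm, whose details the abstract does not give. It is a minimal illustration under stated assumptions: a plain SGD matrix factorization, a hypothetical popularity-based rule for choosing which missing ratings to fill, and invented names (`train_mf`, `complete_and_retrain`, `n_fill`) and hyperparameters.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.05,
             epochs=200, seed=0):
    """Fit a basic SGD matrix factorization on {(user, item): rating}."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(ratings.values()) / len(ratings)  # global mean rating
    samples = list(ratings.items())

    def predict(u, i):
        return mu + sum(p * q for p, q in zip(P[u], Q[i]))

    for _ in range(epochs):
        rng.shuffle(samples)
        for (u, i), r in samples:
            err = r - predict(u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict


def complete_and_retrain(ratings, n_users, n_items, n_fill):
    """Fit, fill selected missing ratings with predictions, then refit."""
    predict = train_mf(ratings, n_users, n_items)

    # Assumed selection rule: the n_fill items with the most observed ratings.
    counts = [0] * n_items
    for (_, i) in ratings:
        counts[i] += 1
    popular = sorted(range(n_items), key=lambda i: -counts[i])[:n_fill]

    filled = dict(ratings)  # observed ratings are never overwritten
    for u in range(n_users):
        for i in popular:
            if (u, i) not in filled:
                filled[(u, i)] = predict(u, i)  # system-generated rating
    return filled, train_mf(filled, n_users, n_items)


# Toy data (hypothetical): 4 users, 3 items.
ratings = {(0, 0): 5.0, (0, 1): 4.0, (1, 0): 4.0, (1, 2): 2.0,
           (2, 1): 5.0, (2, 2): 1.0, (3, 0): 3.0}
filled, model = complete_and_retrain(ratings, n_users=4, n_items=3, n_fill=1)
print(len(ratings), len(filled))
print(model(3, 1))
```

The sketch runs without any user interaction, because the added ratings come from the model rather than from queries to users. Whether the extra ratings reduce prediction error depends on how the filled pairs are chosen, which is the part this example only approximates.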

Secondly, traditional active learning methods evaluate each user or item independently and consider only the benefits of the elicitations to new users or items, paying less attention to the effects on the system as a whole. In this thesis, the traditional methods are extended by proposing a novel generalized system-driven active learning framework. Specifically, it focuses on elicitations from past users instead of new users, and considers a more general scenario in which users repeatedly come back to the system, rather than only the sign-up process. In the proposed framework, ratings are elicited by combining user-focused active learning with item-focused active learning, with the aim of improving the performance of the whole system. A variety of active learning strategies are evaluated on the proposed framework. Experimental results demonstrate its effectiveness in reducing sparsity, which in turn enables improvements in system performance.

Thirdly, traditional recommender systems suggest items belonging to a single domain, so existing research on active learning applies and evaluates elicitation strategies only in a single-domain scenario. Cross-domain recommendation utilizes the knowledge derived from one or more auxiliary domains with sufficient ratings to alleviate the data sparsity in the target domain. A special case of cross-domain recommendation is multi-domain recommendation, which utilizes the knowledge shared across multiple domains to alleviate the data sparsity in all domains. A multi-domain active learning framework is proposed by combining active learning with the cross-domain collaborative filtering algorithm (RMGM) in multi-domain scenarios, in which the sparsity problem can be further alleviated by sharing knowledge among multiple sources, along with the data acquired from users. The proposed algorithms are evaluated on real-world recommender system datasets, and experimental results confirm their effectiveness.

Item Type: Thesis [via Doctoral College] (PhD)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): Recommender systems (Information filtering), Expert systems (Computer science), Information filtering systems, Data mining, Machine learning, Computer algorithms
Official Date: April 2017
Dates: April 2017 (Submitted)
Institution: University of Warwick
Theses Department: Department of Computer Science
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Li, Chang-Tsun
Format of File: pdf
Extent: vii, 141 leaves : illustrations, charts
Language: eng
URI: https://wrap.warwick.ac.uk/97978/
