Jointly Learning Aspect-Focused and Inter-Aspect Relations with Graph Convolutional Networks for Aspect Sentiment Analysis

In this paper, we explore a novel solution that constructs a heterogeneous graph for each instance by leveraging aspect-focused and inter-aspect contextual dependencies for the specific aspect, and propose an Interactive Graph Convolutional Networks (InterGCN) model for aspect sentiment analysis. Specifically, an ordinary dependency graph is first constructed for each sentence over its dependency tree. We then refine the graph by considering the syntactical dependencies between contextual words and aspect-specific words to derive the aspect-focused graph. Subsequently, the aspect-focused graph and the corresponding embedding matrix are fed into the aspect-focused GCN to capture the key aspect and contextual words. Moreover, to interactively extract the inter-aspect relations for the specific aspect, an inter-aspect GCN is adopted to model the representations learned by the aspect-focused GCN based on the inter-aspect graph, which is constructed from the relative dependencies between the aspect words and the other aspects. Hence, the model is aware of the significant contextual and aspect words when interactively learning the sentiment features for a specific aspect. Experimental results on four benchmark datasets show that our proposed model outperforms state-of-the-art methods and substantially boosts performance in comparison with BERT.


Introduction
Aspect sentiment analysis is a fine-grained sentiment analysis task that aims to identify the sentiment polarity (e.g. positive, negative, or neutral) towards a given aspect (term) in a sentence. For example, given the aspects food and service and the review sentence great food but the service is dreadful, the sentiment polarity of the aspect food is positive, while that of the aspect service is negative. That is, in aspect sentiment analysis, we need to discriminate sentiment polarities according to different aspects. The main challenge is that some aspects may contain no explicit sentiment expression.
To illustrate this challenge, we give the examples shown in Figure 1, where the key contextual words and the corresponding aspects are highlighted in the instances, paired with their polarity labels. In Figure 1(a), there is a multi-word aspect ("soup for the udon"). We can readily see that "soup" is the key word of this aspect. Thus, the syntactical dependencies between this aspect word and the contextual words need to be attended to when predicting the aspect-specific sentiment. In Figure 1(b), the aspects "toppings" and "place" are mentioned simultaneously in the sentence. There is a sufficiently clear positive sentiment word ("great") for the aspect "toppings", while for the aspect "place", which contains no sentiment expression, the sentiment polarity can still be identified thanks to the inter-aspect relations between "toppings" and "place". Hence, both aspect-focused and inter-aspect contextual relations should be considered to improve the performance of aspect sentiment analysis.
Recently, with the development of deep learning techniques, many neural network-based methods have achieved promising performance in aspect sentiment analysis (Wang et al., 2016a; Tang et al., 2016a; Chen et al., 2017; Wang et al., 2018; Zheng et al., 2020).

Figure 1: Examples of the contextual relations of different aspect words within an aspect and the sentiment relations of different aspects in a sentence. (a) Aspect-focused relations: "The soup for the udon was soy sauce and water" (positive, negative). (b) Inter-aspect relations: "Great toppings definitely a place you need to check out".

Subsequently, attention-based neural models have been widely used in this task, as they can enforce the model to focus on the given aspect (Wang et al., 2016b; Tang et al., 2016b). Most previous methods, however, generally embed aspect information into the sentence representation to learn the pertinent sentiment features for the specific aspect, which leads to a lack of capturing the inter-aspect sentiment relations for a specific aspect. Analogously, most existing graph network-based models merely consider the syntactical dependencies between the specific aspect and the context (Huang and Carley, 2019), which is insufficient to identify which contextual dependencies along with aspect-specific words are essential for the specific aspect, and also largely ignores the sentiment relations between different aspects in the sentence. Intuitively, distinct aspect words play different roles in deriving the aspect expression. Besides, there are intricate sentiment relations among different aspects in many instances. In this paper, we explore a novel solution that constructs heterogeneous graphs of sentences by enriching the contextual syntactical dependency representations of the key aspect words and leveraging the mutual sentiment relations between different aspects in the context.
Building on this, an Interactive Graph Convolutional Networks (InterGCN) model is proposed to leverage the sentiment dependencies of the context. Here, the syntactical information from the neighbors of each node is aggregated to derive the graph embeddings, so as to extract both aspect-focused and inter-aspect sentiment information for predicting the aspect-specific sentiment polarity. The main contributions of our work can be summarized as follows:
• We explore a novel solution to construct the graph for each instance, in which both aspect-focused and inter-aspect syntactical dependencies are introduced.
• An Interactive Graph Convolutional Networks model is proposed to derive aspect-specific sentiment features by interactively extracting the sentiment relations within aspect words and across different aspects in the context.
• Experimental results on four benchmark datasets show that the proposed model achieves state-of-the-art performance in aspect sentiment analysis.

Related Work
Some early works mostly use machine learning algorithms to capture the sentiment polarity based on rich features of content and syntactic structure (Pang et al., 2008; Jiang et al., 2011; Kiritchenko et al., 2014). Recently, deep learning models have achieved promising performance in aspect sentiment analysis (Tang et al., 2016a; Wang et al., 2016b; Tang et al., 2016b; Chen et al., 2017; Ma et al., 2017; Xue and Li, 2018; Li et al., 2019; Liang et al., 2019). The majority of current approaches attempt to pay more attention to the specific aspect via attention mechanisms. Wang et al. (2016b) exploited an attention mechanism to capture contextual representations by attending to the key parts of the sentence according to the given aspect. Tang et al. (2016b) proposed an attention-based memory network to store contextual words and conducted multi-hop attention to derive the sentiment representation for the aspect. Chen et al. (2017) utilized a weighted-memory mechanism to produce a tailor-made memory for different opinion aspects based on a memory network. In addition, Xue and Li (2018) utilized a gated CNN to selectively model the sentiment features according to the given aspect. Another work adopted a capsule network to construct vector-based feature representations and cluster features with an EM routing algorithm. Majumder et al. (2018) considered neighboring aspect-related information for aspect-specific sentiment analysis with memory networks. Graph convolutional networks (GCNs) have achieved promising performance in many NLP tasks (Kipf and Welling, 2017; Zhang et al., 2018; Yao et al., 2019). In aspect sentiment analysis, prior work exploited GCN to capture syntactical information and word dependencies for the specific aspect over the dependency tree of a sentence, and subsequent work proposed a GCN model over the dependency tree of the sentence to enhance the aspect feature representations learned by a Bi-directional LSTM (Bi-LSTM).
In addition, to exploit the merit of BERT (Devlin et al., 2019), a GCN model based on selective attention was proposed to extract and aggregate the most important contextual features for the aspect representation (Hou et al., 2019). The above GCN-based models, however, neither considered the specific aspect when constructing the graph of the sentence nor extracted inter-aspect sentiment relations for the specific aspect. To this end, building on the merit of GCN in aspect sentiment analysis, we explore a novel solution of constructing a syntactical dependency graph for a sentence according to the specific aspect and propose an Interactive Graph Convolutional Networks (InterGCN) model to extract both aspect-focused and inter-aspect sentiment features for the specific aspect.

Proposed Approach
As demonstrated in Figure 2, the architecture of the proposed InterGCN model mainly contains two components: 1) an aspect-focused graph convolutional network, which aims to extract the aspect-specific sentiment features based on our novel syntactical dependency graph of the sentence, and 2) an inter-aspect graph convolutional network, which is designed to derive the sentiment relations between different aspects. The feature representations captured by these two components are interactively combined to produce the sentiment features for the specific aspect. We assume a sentence with n words and two aspects, i.e. s = {w_1, w_2, ..., a_{11}, a_{12}, ..., a_{1p}, ..., a_{21}, a_{22}, ..., a_{2q}, ..., w_n}, where w_i represents the i-th contextual word and a_{ij} represents the j-th word of aspect i. Each instance contains a sentence and one or more aspects corresponding to different sentiment polarities (Positive, Negative, or Neutral), and each aspect may consist of a single word or multiple words. The aim of aspect sentiment analysis is to predict the sentiment polarity of a given aspect in a sentence.

Embedding Module
In our InterGCN model, each word embedding is a distributed representation of a word in the sentence, retrieved from the embedding lookup table V ∈ R^{m×|N|} according to the word index, where |N| is the vocabulary size. Thus, for a sentence with n words, we obtain the corresponding embedding matrix x = [x_1, x_2, ..., x_n], where x_i ∈ R^m is the word embedding of w_i and m is the dimension of the word vectors. We exploit the pre-trained GloVe (Pennington et al., 2014) and BERT (Devlin et al., 2019) embeddings to initialize the word vectors, and fine-tune them during training.
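The lookup described above can be sketched as follows. The vocabulary, dimensions, and random values are illustrative stand-ins for pre-trained GloVe/BERT vectors, not the paper's actual embeddings.

```python
import numpy as np

# Toy illustration of the embedding lookup: vocabulary size |N| = 5,
# embedding dimension m = 4. Values are random placeholders.
rng = np.random.default_rng(0)
m, vocab_size = 4, 5
V = rng.normal(size=(m, vocab_size))      # lookup table V ∈ R^{m×|N|}

def embed(word_indices):
    """Map a sentence (a list of word indices) to its embedding matrix x."""
    return np.stack([V[:, i] for i in word_indices])  # shape (n, m)

x = embed([0, 3, 1])   # embedding matrix for a 3-word sentence
```

In the full model these vectors would also be updated during training; the sketch only shows the retrieval step.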

Producing Ordinary Graphs over Dependency Tree
Inspired by previous GCN-based works, we first produce an ordinary dependency graph for each input sentence over its dependency tree, from which an adjacency matrix D ∈ R^{n×n} is derived.
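This step can be sketched as below. The dependency edges would normally come from an external parser; here they are hard-coded for a hypothetical 5-word sentence, and self-loops follow the i = j case of Algorithm 1.

```python
import numpy as np

# Sketch of the ordinary dependency-graph construction. The edge list is a
# hypothetical parser output for a 5-word toy sentence.
n = 5
edges = [(0, 1), (1, 2), (2, 4), (3, 4)]  # illustrative head-dependent pairs

D = np.zeros((n, n))
for i, j in edges:
    D[i, j] = D[j, i] = 1.0   # dependency arcs, made symmetric (undirected)
np.fill_diagonal(D, 1.0)      # self-loops (the i = j case in Algorithm 1)
```

Each sentence thus yields one n×n binary matrix D that the later refinement steps operate on.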

Refining Graphs for Specific Aspect
To highlight the specific aspect among the contextual words and capture an aspect-focused enhanced dependency graph for the sentence, we refine the graph by computing a relative position weight for each element of the adjacency matrix according to the specific aspect, where |·| is the absolute value function, p_s is the beginning position of the specific aspect, and {a^s_i} is the word set of the specific aspect. In this way, we capture the relative dependencies between the words of the specific aspect and the other contextual words.
To augment the syntactical dependencies of contextual words and produce the relations between aspect and contextual words, we integrate the aspect-focused weights and the ordinary dependency graph to derive an aspect-focused syntactical dependency adjacency matrix. Intuitively, as mentioned above, some aspects may not signal a distinct sentiment expression in the context. That is, the aspect-focused syntactical dependencies derived from the aspect alone may be insufficient for identifying the accurate sentiment relations, since the sentiment dependencies of such aspects need to be derived with the help of other aspects. Thus, to leverage the connections between multiple aspects in the sentence, we further refine the aspect-focused graph by incorporating the relative graphs of the other aspects into the aspect-focused adjacency matrix, where {a^o_i} is the word set of length l of the other aspects, and p_o for each a ∈ {a^o_i} denotes the beginning position of the corresponding aspect. The procedure of generating the adjacency matrix for each sentence by focusing on the specific aspect is depicted in Algorithm 1. Here, to enrich the dependency information of the input sentence, we construct the adjacency matrix as undirected, i.e. A^F_{i,j} = A^F_{j,i}.
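A minimal sketch of the refinement step is given below. The paper's exact position-weight formula is not reproduced in this excerpt, so the reciprocal-distance weight 1/(|i − p_s| + 1) used here is purely an illustrative assumption, as are the function names.

```python
import numpy as np

# Hedged sketch of the aspect-focused refinement. The weighting scheme
# below (reciprocal distance to the aspect) is an assumption for
# illustration only; the paper defines its own formula.
def focus_weights(n, p_s, aspect_len):
    w = np.array([1.0 / (abs(i - p_s) + 1) for i in range(n)])
    w[p_s:p_s + aspect_len] = 1.0      # aspect words keep full weight
    return w

def aspect_focused_graph(D, p_s, aspect_len):
    """Re-weight the ordinary dependency graph D toward the aspect."""
    w = focus_weights(D.shape[0], p_s, aspect_len)
    A_F = D * w[None, :]               # scale dependencies by aspect proximity
    return np.maximum(A_F, A_F.T)      # keep undirected: A_F[i,j] = A_F[j,i]
```

The final symmetrization mirrors the paper's undirected construction A^F_{i,j} = A^F_{j,i}.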

Constructing Inter-Aspect Graphs
According to Figure 1, the sentiment polarities of some aspects need to be predicted through their sentiment relations with other aspects in the sentence. Hence, we screen the aspects from the sentence and construct an inter-aspect adjacency matrix for these aspects to derive their contextual sentiment dependencies. Analogously, to capture the interactive dependencies between multiple aspects in the sentence, we also construct the inter-aspect graph of the sentence as undirected: A^Inter_{i,j} = A^Inter_{j,i}.

Algorithm 1: The procedure of constructing the aspect-focused adjacency matrix for a sentence
Input: a sequence of words s = {w_1, w_2, ..., w_n}; a set of aspect-specific words a^s = {a^s_1, a^s_2, ..., a^s_p}; a set of other-aspect words a^o = {a^o_1, a^o_2, ..., a^o_q}; the dependency tree of the sentence dependency(s)
1: for i = 1 → n; j = 1 → n do
2:   // Produce the ordinary graph over the dependency tree
3:   if dependency(w_i, w_j) ∈ dependency(s) or i = j then
4:     D_{i,j} ← 1
5:   else
6:     D_{i,j} ← 0
7:   // Refine the adjacency matrix of the graph for the specific aspect
8:   if w_i ∈ a^s and w_j ∈ a^s then
     // Leverage the relations between multiple aspects corresponding to the specific aspect
21: for a ∈ a^o do
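The inter-aspect construction described above can be sketched as follows. Since this excerpt omits the exact formula, the reciprocal-distance weight between aspect words is an illustrative assumption, as are the function and argument names.

```python
import numpy as np

# Hedged sketch of the inter-aspect graph: connect the words of the
# specific aspect with the words of every other aspect in the sentence.
# The reciprocal-distance weight is an assumption for illustration.
def inter_aspect_graph(n, specific_aspect, other_aspects):
    """specific_aspect: word indices of the given aspect;
    other_aspects: list of index lists, one per remaining aspect."""
    A = np.zeros((n, n))
    for other in other_aspects:
        for i in specific_aspect:
            for j in other:
                # symmetric (undirected) link between aspect pairs
                A[i, j] = A[j, i] = 1.0 / (abs(i - j) + 1)
    return A
```

Positions not belonging to any aspect stay disconnected, so the matrix only carries aspect-to-aspect dependencies.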

Interactive Graph Convolutional Network
In InterGCN, the aspect-focused GCN takes each aspect-focused graph and the corresponding word embedding matrix as input, and the inter-aspect GCN receives the inter-aspect graph and the hidden representations learned by the aspect-focused GCN layers to produce interactive sentiment features for the specific aspect. Each node in the l-th GCN layer is updated according to the hidden representations of its neighbors, where g^{l-1}_i = F(h^{l-1}_i) is the hidden representation evolved from the preceding GCN layer and F(·) is a position-aware transformation function utilized in a previous GCN-based work. Ã^F is the symmetrically normalized aspect-focused adjacency matrix, where D^F_i = Σ_{j=1}^{n} A^F_{i,j} is the degree of node i in A^F. The original nodes of the aspect-focused GCN layers are derived from the hidden representations of the Bi-LSTM layers, which take the word embeddings as input. Interactively, the original nodes of the inter-aspect GCN layers are generated by the aspect-focused GCN layers. After that, we successively capture the final representations of the aspect-focused and inter-aspect GCN layers, i.e. h^F and h^Inter, and combine these two final representations to extract the interactive relations between aspect-focused and inter-aspect features, where γ is the coefficient of the inter-aspect features. To highlight the significant features of the aspect words, we exploit aspect-specific masking to mask the non-aspect representations: H_mask = {0, ..., h̃_τ, ..., h̃_{τ+k−1}, ..., 0}, where h̃_t is the representation of the t-th word learned by InterGCN, τ is the beginning index of the specific aspect, and k is the length of the aspect.
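A single layer of the update described above can be sketched as below. The symmetric normalization Ã = D^{-1/2} A D^{-1/2} and the ReLU activation are common GCN choices assumed here for illustration; the excerpt does not spell out the exact form.

```python
import numpy as np

# Minimal one-layer GCN update over the aspect-focused graph. The
# degree-based symmetric normalization and ReLU are assumptions in the
# spirit of standard GCNs (Kipf and Welling, 2017).
def gcn_layer(A_F, H, W, b):
    """A_F: (n, n) aspect-focused adjacency; H: (n, d_in) node features;
    W: (d_in, d_out) weights; b: (d_out,) bias."""
    deg = A_F.sum(axis=1)                         # D^F_i = sum_j A^F_{i,j}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_norm = A_F * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W + b, 0.0)    # aggregate neighbors, ReLU
```

Stacking two such layers, with the inter-aspect GCN consuming the aspect-focused outputs, mirrors the interactive arrangement described in the text.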
Then, inspired by prior work, we adopt a retrieval-based attention mechanism to capture significant sentiment features for the specific aspect from the context representations. The final representation of the input with respect to the specific aspect is then fed into the classifier, where softmax(·) is the softmax function used to obtain the output distribution.
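The masking and retrieval steps can be sketched as below. The exact attention formulation is not given in this excerpt, so the dot-product retrieval used here follows the style of the prior GCN work the paper cites and should be read as an assumption.

```python
import numpy as np

# Hedged sketch: aspect-specific masking followed by retrieval-based
# attention over the Bi-LSTM context states. Formulation is illustrative.
def aspect_attention(H_ctx, H_gcn, tau, k):
    """H_ctx: (n, d) context states; H_gcn: (n, d) InterGCN outputs;
    tau: start index of the aspect; k: aspect length."""
    H_mask = np.zeros_like(H_gcn)
    H_mask[tau:tau + k] = H_gcn[tau:tau + k]   # zero out non-aspect positions
    scores = (H_ctx @ H_mask.T).sum(axis=1)    # retrieve relevance per word
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                       # softmax attention weights
    return alpha @ H_ctx                       # final aspect representation r
```

The returned vector would then be passed through a linear layer and softmax to produce the sentiment distribution.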

Model Training
The training objective is defined as minimizing the cross-entropy loss between the predicted and ground-truth distributions:

L = − Σ_{i=1}^{S} Σ_{j=1}^{C} ŷ_i^j log(y_i^j) + λ‖Θ‖²

where S is the number of training samples, C is the number of classes, ŷ is the ground-truth sentiment distribution, λ is the weight of the L2 regularization term, and Θ denotes all trainable parameters. The statistics of the experimental datasets are reported in Table 1.
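The objective above can be written directly in numpy. The averaging over samples and the small epsilon inside the log are implementation conveniences, not part of the paper's formula.

```python
import numpy as np

# Cross-entropy plus L2 regularization, as described in the text.
# y_hat: (S, C) one-hot ground truth; y: (S, C) predicted distributions;
# params: list of trainable parameter arrays (Θ); lam: the L2 weight λ.
def loss(y_hat, y, params, lam=1e-5):
    ce = -np.sum(y_hat * np.log(y + 1e-12)) / y.shape[0]  # averaged over S
    l2 = lam * sum(np.sum(p ** 2) for p in params)        # λ‖Θ‖²
    return ce + l2
```

With λ = 10^−5, as in the paper's settings, the regularization term is a small additive penalty on all weights.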
In our experiments, we use GloVe vectors (Pennington et al., 2014) to initialize each word as a 300-dimensional word embedding for all non-BERT models. The dimensionality of the hidden vector representations is set to 300. The number of GCN layers is set to 2, which is the optimal depth in pilot studies. The coefficient γ is set to 0.2, and the coefficient λ of the L2 regularization term is set to 10^−5. Adam is utilized as the optimizer with a learning rate of 10^−3, and the mini-batch size is 16. We randomly initialize all the weight matrices W and biases b with a uniform distribution.

Comparison Models
We compare the proposed model (InterGCN) with the following methods: SVM (Kiritchenko et al., 2014) trains an SVM classifier based on conventional feature extraction methods. TD-LSTM (Tang et al., 2016a) models bidirectional contextual features for a given aspect with LSTMs. ATAE-LSTM (Wang et al., 2016b) explores an aspect-specific attention mechanism based on LSTM.
MemNet (Tang et al., 2016b) exploits word and position attention to focus on the specific aspect via a multi-hop memory network. IAN (Ma et al., 2017) learns the interactive relationships between aspect and context representations with an interactive attention network. RAM (Chen et al., 2017) proposes a recurrent attention memory network for aspect sentiment analysis. GCAE (Xue and Li, 2018) explores a gated CNN to control the flow of features for a given aspect. MGAN (Fan et al., 2018) exploits fine-grained and coarse-grained attention mechanisms to capture the word-level interaction between aspect and context. AOA (Huang et al., 2018) utilizes an attention-over-attention model to learn the interaction between aspect words and contextual words. TNet-LF exploits a target-specific transformation component to better integrate target information into the word representations. IARM (Majumder et al., 2018) extracts the influence of neighboring aspect-related information for aspect sentiment analysis.

Table 2 shows the comparison results on four benchmark datasets, which demonstrate that the proposed InterGCN consistently outperforms all comparison models. This verifies the effectiveness of our proposed InterGCN in aspect sentiment analysis. We note that the proposed InterGCN+BERT achieves significant improvement on all datasets compared to the other BERT-based models, which indicates that InterGCN can be easily integrated with pre-trained BERT to improve the performance of aspect sentiment analysis. It is also noteworthy that both AFGCN and InterGCN perform significantly better than the previous GCN-based models (for both non-BERT and BERT variants), which fundamentally verifies the effectiveness of the novel graph-construction solution exploited in this work.
Compared with AFGCN, which only utilizes aspect-focused GCN layers, the complete InterGCN achieves better performance on all datasets. This indicates that exploiting the interaction between the features extracted by the aspect-focused and inter-aspect GCN layers can further improve the performance of aspect sentiment analysis.

Ablation Study
We conduct an ablation study to further analyze the impact of different components of InterGCN. The results are shown in Table 3. We observe that removing the "dependency tree" degrades the performance slightly, which indicates that the dependency-tree-based graph construction improves the quality of the dependency representations but is not an essential part of InterGCN. We also notice that the model without "aspect-focused" performs worst on all datasets, verifying that incorporating aspect-focused information into the model is the most important improvement for aspect sentiment analysis. In addition, removing either "aspect relations" or "inter-aspect" leads to an evident performance drop, which further indicates that extracting sentiment relations between different aspects can largely improve the performance of aspect sentiment analysis.

Impact of GCN Layers
We investigate the impact of the number of GCN layers on the performance of our proposed InterGCN. We vary the layer number from 1 to 8 and report the results in Figure 3. Overall, the 2-layer GCN achieves the best performance on all datasets, and thus we set the number of GCN layers to 2 in our experiments. Comparatively, the 1-layer InterGCN performs unsatisfactorily, which indicates that a 1-layer GCN is insufficient to derive precise syntactical dependencies of the context towards the specific aspect. Additionally, the performance of InterGCN fluctuates as the number of GCN layers increases and tends to decline when the model depth is greater than 4. This implies that naively increasing the depth of the GCN tends to weaken the learning ability of the model due to the sharp increase in model parameters.

Impact of the Coefficient of Inter-Aspect
To further analyze the effect of extracting inter-aspect relations in InterGCN, we conduct experiments with different values of γ and report the results in Figure 4. We observe that as the value of γ increases from 0 to 0.2, the performance improves steadily, which implies that appropriately incorporating the interactive features extracted by the inter-aspect GCN layers helps the aspect-focused component learn precise aspect-specific sentiment features and improves the performance of aspect sentiment analysis. However, the curve fluctuates considerably when γ is greater than 0.2 and tends to decline when γ is greater than 0.5. This indicates that over-emphasizing the inter-aspect relations of the sentence may hamper the learning of aspect-specific sentiment features.

Analysis of Multiple Aspects
To further analyze the performance improvement brought by multi-aspect sentences with the proposed InterGCN, we separate the training instances into groups according to the number of aspects in the sentences and report the training accuracy for each group in comparison with a previous GCN-based model (ASGCN-DG) on the REST14 and LAP14 datasets. The comparison results are demonstrated in Figure 5. We observe that the fitting results of the proposed InterGCN are superior to those of ASGCN-DG for all numbers of aspects. In addition, InterGCN achieves remarkable fitting results across all aspect counts, whereas the fitting of ASGCN-DG is insufficient when the number of aspects is either particularly small or particularly large. This implies that InterGCN can capture significant sentiment features for a specific aspect with the aspect-focused GCN when the number of aspects in the sentence is small. Concurrently, InterGCN can commendably discriminate the sentiment features of multi-aspect sentences with the help of inter-aspect relations.

Visualization
To qualitatively demonstrate how the proposed InterGCN improves the performance of aspect sentiment analysis, we visualize the attention weights of the two typical examples shown in Figure 1. The results of the attention weight visualization are demonstrated in Figure 6. According to Figure 6(a), the proposed InterGCN pays more attention to the key aspect word when extracting aspect-specific sentiment. In addition, the multi-aspect example shown in Figure 6(b) indicates that InterGCN can connect the related aspect when dealing with an aspect without an intuitive sentiment expression. This verifies that interactively learning the aspect-focused and inter-aspect sentiment relations can derive more precise aspect-specific sentiment features and improve the performance of aspect sentiment analysis.

Conclusion
In this paper, we explore a novel solution of constructing aspect-focused and inter-aspect dependency graphs for aspect sentiment analysis. Based on this, an Interactive Graph Convolutional Networks (InterGCN) model is proposed to extract aspect-specific sentiment features from both the aspect-focused and inter-aspect perspectives. As a result, the proposed InterGCN model can pay significant attention to the key aspect words when dealing with multi-word aspects, and connect the valuable sentiment features of related aspects when considering an aspect without a distinct sentiment expression. Experimental results on four benchmark datasets show that the proposed InterGCN outperforms state-of-the-art methods, including remarkable GCN-based models and BERT-based models.