Target-adaptive Graph for Cross-target Stance Detection

Target plays an essential role in stance detection of an opinionated review/claim, since the stance expressed in the text often depends on the target. In practice, we need to deal with targets unseen in the annotated training data. As such, detecting stance for an unknown or unseen target is an important research problem. This paper presents a novel approach that automatically identifies and adapts the target-dependent and target-independent roles that a word plays with respect to a specific target in stance expressions, so as to achieve cross-target stance detection. More concretely, we explore a novel solution of constructing heterogeneous target-adaptive pragmatics dependency graphs (TPDG) for each sentence towards a given target. An in-target graph is constructed to capture the inherent pragmatics dependencies of words for a distinct target. In addition, a cross-target graph is constructed to capture the versatility of words across all targets, boosting the learning of dominant word-level stance expressions applicable to an unknown target. A novel graph-aware model with interactive Graph Convolutional Network (GCN) blocks is developed to derive the target-adaptive graph representation of the context for stance detection. Experimental results on a number of benchmark datasets show that our proposed model outperforms state-of-the-art methods in cross-target stance detection.


INTRODUCTION
Stance detection aims to identify people's opinionated standpoint or attitude (i.e., favor, against, or none) expressed in text towards a specific target [1,9,23,25,39]. Thanks in part to the availability of data sufficiently annotated with target-dependent stance labels, previous methods achieved promising performance in target-dependent stance detection when trained and tested on the same dataset of targets [7,18]. However, in practice, it is not possible to enumerate all possible targets beforehand for training stance detection models. As such, there is an urgent need to learn a cross-target stance classifier for targets with few or no labeled data.
To illustrate the task of cross-target stance detection, we show examples in Figure 1 where the source and destination targets are paired with their corresponding sentences and stance labels. Suppose there is no annotated data for the destination target "Legalization of Abortion", i.e. "Legalization of Abortion" is unseen in the training dataset. Cross-target stance detection aims to build stance classifiers trained on features extracted from the context of source targets which might be relevant to the destination targets, so as to alleviate the sparsity or lack of annotated data for the stance detection of destination targets.
Several recent studies have addressed cross-target stance detection [32,34,37]. These methods either leverage shared features for stance detection of destination targets by modeling the topical information of source targets [32,34], or incorporate external knowledge between source and destination targets into model learning [37].

Figure 1: Examples paired with their targets and stance labels. "Source target" denotes a target labeled in the training set, whereas "Destination target" denotes a target that is unseen in the training set but occurs in the test set.

Existing methods largely focus on extracting shared information across different targets. Moreover, they only consider the contextual stance expressions in the annotated target dataset. We argue that words may play different roles when used in stance expressions for different targets. As such, it is desirable to leverage fundamental word-level pragmatics dependencies across all targets in order to improve the performance of stance detection on unknown targets.
As shown in Figure 1, in Example 1, note that the stance expressions relating to the word "equality" present opposite stances for the two targets. That is, directly employing the stance information associated with the source target for learning stance representations of the destination target may produce wrong results. The main reason is that the same word or expression may signal different stances when associated with different targets. Therefore, it is important to understand word-level pragmatics information and adapt it for different targets, which can lead to improved performance in cross-target stance detection. Here, we regard words (such as "equality") whose inherent stances are target-dependent as in-target words. Additionally, in Example 2, the colored words express the same stance regardless of the targets they are associated with, and are regarded as target-independent stance expressions. Such words can effectively boost the performance of stance detection for unknown targets; correspondingly, we regard them as cross-target words. We argue that the main challenges in stance detection are to identify these two types of words (in-target vs. cross-target) and to model the context features for stance detection based on the different word types (target-adaptive). Specifically, we develop our methodology based on the following hypotheses:
• For words that occur exclusively in stance expressions for certain targets and always convey the same stance, modeling and adapting these words to derive pragmatics information according to their associated targets can improve the learning of stance representations.
• If the occurrences of a word are evenly distributed across different targets, then the pragmatics dependencies formed by that word should be target-independent, and they will be useful for generating stance representations for any target, including unknown targets.
To better address cross-target stance detection, in this paper, we propose a novel framework to leverage the fundamental word-level pragmatics dependencies of stance expressions towards a target by constructing target-adaptive heterogeneous (syntactic dependency and pragmatics information) graphs from the in-target and the cross-target perspectives. Utilizing the interactions between different targets, the proposed framework can capture the stance information more accurately and distill the knowledge under different targets with better interpretability.
Specifically, 1) we first compute the target-specific pragmatics weights at the word level. To capture the inherent role of a word in stance expressions towards a target, we compute the word's relative occurrence frequency in the context of the target in comparison with that of other targets. We then make use of the stance information from the annotated training data to derive a stance-related pragmatics weight for the word. We next construct an in-target graph in which nodes are contextual words and edges between nodes are determined by the pragmatics information and the dependency parsing results; the weight of the edge connecting two contextual words is determined by the target-specific pragmatics weight. 2) In parallel with the target-specific pragmatics weight, we consider the distribution of each word across different known targets to discern which words provide clues for deriving stance expressions towards different targets, including unknown ones. Here, we again adopt the stance information extracted from the annotated training data to compute a pragmatics weight for each word across all targets, and each edge of another pragmatics dependency graph (called the cross-target graph) is then derived from the dependency tree and the word-level cross-target pragmatics weight.
To leverage both the in-target and cross-target pragmatics dependencies of higher-order neighborhoods in heterogeneous graphs, we propose a graph-aware model with interactive GCN blocks to capture the stance information towards a target and adapt the context representations for the target of interest. That is, in each GCN block, the information from the neighbors of each node is aggregated to generate in-target graph embeddings by modeling the in-target graph, and the cross-target graph is further utilized to enrich graph embedding learning and refine the target-adaptive contextual stance representations. The main contributions of our work can be summarized as follows:
• We are the first to study cross-target stance detection by leveraging target-adaptive pragmatics dependencies of the context based on both in-target and cross-target heterogeneous graphs. The word-level pragmatics information can be decomposed into target-dependent and target-independent components, which can subsequently be adapted for the stance detection of unknown targets.
• A novel graph-aware model with interactive GCN blocks is proposed to learn contextual graph representations, which allows the learning of more accurate stance representations for unknown targets.
• Experimental results on a number of benchmark datasets demonstrate that our proposed method outperforms state-of-the-art models in cross-target stance detection.

RELATED WORK

Stance Detection
Stance detection aims to detect the attitude of a context (e.g., a comment or review) towards a given target [6,13,17,27], which is critical to many scenarios such as argumentation mining [22], fake news detection [12], and fact checking [29]. For cross-target stance detection, the authors of [34] proposed a self-attention based neural model to transfer the shared features learned from a source target to a destination target, improving generalization in certain scenarios. To employ transferable topic knowledge from source to destination targets, Wei and Mao [32] learned latent topics with neural variational inference [16,24] to enhance text representations and adopted an adversarial training technique to learn more target-invariant representations. Zhang et al. [37] employed external semantic and emotion knowledge as a bridge to transfer knowledge across different targets and enrich the representation learning of the text and target. These works extract partially transferable stance features from source targets to destination targets, but they generally ignore the most rudimentary word-level pragmatics dependencies across different targets. Such word-level pragmatics dependencies can refine the text representation to adapt to the target by generalizing stance expressions across different targets at the principal pragmatics level.

Graph Neural Network
Graph neural networks (GNNs) have attracted increasing attention, since information in a GNN is propagated through a graph structure rather than treated as simple features [33,42]. Recently, graph neural network-based models have achieved promising performance in many NLP tasks, such as text classification [35,41], sentiment analysis [30], fake news detection [15], neural machine translation [36], and Chinese NER [4,8].

Figure 2: The architecture of the proposed target-adaptive pragmatics dependency graph (TPDG) framework.

In aspect-based sentiment analysis, graph-based models have been used to capture dependencies for the specific aspect. Tang et al. [28] proposed a dependency graph enhanced dual-transformer network for aspect sentiment analysis that supports mutual reinforcement between flat representation learning and graph-based representation learning. Zhang et al. [40] proposed an extension of the graph convolutional network tailored for relation extraction, encoding the dependency structure over the input sentence based on the dependency tree. These studies highlight the importance of decent initial weights for graphs. To leverage the target-adaptive semantic dependencies of a sentence, inspired by the success of previous GCN-based methods [38,40], we explore a novel solution of constructing a semantic dependency graph for each sentence and propose a novel graph-aware model with interactive GCN blocks to model the word-level semantic dependencies for deriving precise stance expressions from both the in-target and cross-target perspectives.

METHODOLOGY
In this section, we present our proposed Target-adaptive Pragmatics Dependency Graph (TPDG) framework for cross-target stance detection in detail. We first define the task of cross-target stance detection in Section 3.1, and then proceed to describe each of the components of our proposed framework. As shown in Figure 2, the architecture of the proposed TPDG framework contains four main components: 1) vector representation, which derives the word representations of the input context with bidirectional LSTM layers (described in Section 3.2), 2) heterogeneous graph construction, which constructs in-target and cross-target graphs and learns target-adaptive pragmatics dependencies of the context from both graphs (described in Sections 3.3 and 3.4), 3) interactive GCN blocks, which are designed to leverage the target-dependent and target-independent contextual graph representations for a given target (described in Section 3.5), and 4) stance representation, which captures the crucial clues for stance detection and outputs the final representation (described in Section 3.6).

Task Description
Given a set of annotated instances $D_s = \{(r_s^i, t_s, y_s^i)\}_{i=1}^{N_s}$ towards source targets and a set of unlabeled instances $D_d = \{(r_d^i, t_d^i)\}_{i=1}^{N_d}$ towards destination targets (there may be one or more destination targets), where $y_s^i$ is the stance label of an annotated instance of the source target $t_s$, and $N_s$ and $N_d$ are the numbers of instances towards the source and destination targets, respectively. The goal of cross-target stance detection is to model the stance features of each sentence $r_s^i$ towards the source target $t_s$ from $D_s$ and predict the stance label $y_d^i$ of each sentence $r_d^i$ towards the corresponding destination target $t_d^i$ in $D_d$.
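The data setup above can be sketched as follows; the field names and example instances are illustrative, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instance:
    sentence: str           # the opinionated text r
    target: str             # the target t it is paired with
    stance: Optional[str]   # "favor" / "against" / "none"; None when unlabeled

# Annotated source-target instances (D_s) carry stance labels;
# destination-target instances (D_d) are unlabeled at training time.
D_s = [Instance("Equal rights for all.", "Feminist Movement", "favor")]
D_d = [Instance("My body, my choice.", "Legalization of Abortion", None)]
```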

Vector Representation
For a sentence consisting of $n$ words, $r = \{w_i\}_{i=1}^{n}$, we embed each word into an $m$-dimensional embedding by mapping $x_i \in \mathbb{R}^m$ from the lookup table $X \in \mathbb{R}^{m \times |V|}$, where $V$ is the full vocabulary and $|V|$ is the vocabulary size. We thus obtain an embedding matrix $\{x_1, x_2, \dots, x_n\}$ for each sentence $r$.
Subsequently, we utilize bidirectional LSTMs to encode the input sentence into vector representations from the embedding matrix:

$h_t = \overrightarrow{\mathrm{LSTM}}(x_t) \oplus \overleftarrow{\mathrm{LSTM}}(x_t)$

where $h_t$ denotes the hidden vector representation of $x_t$ at time step $t$, and $\oplus$ represents concatenation.
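A toy sketch of the concatenation step, with fixed vectors standing in for the real forward and backward LSTM hidden states:

```python
# Toy stand-ins for the per-time-step forward and backward LSTM hidden
# states; in the actual model these come from trained LSTM cells.
forward_states  = [[0.1, 0.2], [0.3, 0.4]]   # left-to-right pass
backward_states = [[0.5, 0.6], [0.7, 0.8]]   # right-to-left pass

# h_t = forward ⊕ backward: concatenate per time step, doubling the dimension.
hidden = [f + b for f, b in zip(forward_states, backward_states)]
```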

Target-adaptive Pragmatics Weight Computation
To understand and adapt the stance expression of the context with respect to the target, a set of pivotal semantically-important words that point at the target is first discerned from the dataset, i.e., words that either play distinct roles for distinct targets or dominate the stance across all targets. Accordingly, we compute a pragmatics weight for each word from two perspectives: 1) From the in-target perspective, we define the pragmatics weight of a word based on the degree of pragmatics association between the word and the target; that is, a word is assigned a distinctive pragmatics weight for each distinct target. 2) From the cross-target perspective, the pragmatics weight of a word is determined by its frequency distribution across all targets. Here, a word with a large weight is semantically rich and can be used to derive target-adaptive stance expressions for different targets.
There are many possible ways to derive pragmatics weights for contextual words, such as word frequency [35], cosine similarity between words [14], and external knowledge [37]. However, these approaches generally focus on the relations between contextual words and are unable to distinguish the distinct roles that different contextual words play for the target. In our work, we propose a novel approach to automatically capture word weights based on the pragmatics information of the annotated source target and the importance of word occurrences in other unlabeled destination targets.
In-target Pragmatics Weight Computation. Based on the number of times each word appears over the whole corpus, the pragmatics weight of each word $w_k$ in $V$ adapting to the target is computed from $\#_{D_s}(w_k)$, the number of times $w_k$ occurs in the specific source target dataset $D_s$, and $\#_{D_o}(w_k)$, the number of times $w_k$ occurs in the other target datasets $D_o$¹, scaled by a stance-related weight $\xi_I(w_k)$. The stance-related weight $\xi_I(w_k)$ is in turn derived from $\#_{Favor}(w_k)$ and $\#_{Against}(w_k)$, the numbers of times $w_k$ appears in "favor" and "against" instances of the source target dataset, respectively, and $\#_{Favor}^{D_s}$ and $\#_{Against}^{D_s}$, the numbers of "favor" and "against" samples in the source target dataset.
Here, to capture more semantically-rich information, we only consider "favor" and "against" instances when deriving the stance-related weight, since instances with these two labels contain more definite pragmatics information.
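A minimal Python sketch of the in-target weight computation. The exact functional forms are not reproduced here; the relative-frequency ratio and the class-normalized favor/against difference below are illustrative assumptions consistent with the description above:

```python
def stance_related_weight(n_favor_w, n_against_w, n_favor_total, n_against_total):
    """Assumed form of xi_I(w_k): how skewed a word's usage is toward one
    stance, with each count normalized by its class size."""
    return abs(n_favor_w / max(n_favor_total, 1)
               - n_against_w / max(n_against_total, 1))

def in_target_weight(n_in_source, n_in_others, xi):
    """Assumed form: relative occurrence frequency in the source-target
    dataset versus other targets, scaled by the stance-related weight xi."""
    total = n_in_source + n_in_others
    return (n_in_source / total) * xi if total else 0.0

# A word seen 8 times for the source target, 2 times elsewhere,
# appearing mostly in "favor" instances of the source dataset:
xi = stance_related_weight(6, 1, 100, 100)
w = in_target_weight(8, 2, xi)
```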
Cross-target Pragmatics Weight Computation. Pragmatics information shared across different targets is significant for detecting stance towards unknown targets. Therefore, based on the in-target pragmatics weight, we leverage word occurrence information to compute the cross-target pragmatics weight for each word, using $N_T$, the number of targets, and $\#_{D_{max}}(w_k)$ and $\#_{D_{min}}(w_k)$, the occurrence counts of $w_k$ in the datasets where it occurs most and least often, respectively. Because the instances of the unknown destination targets are unlabeled, we only integrate the stance-related weight of the source target into the computation of the cross-target pragmatics weight.
In this way, we obtain a pragmatics weight for each word according to its degree of contribution to different stance expressions, from both the in-target and cross-target perspectives.
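The cross-target weight can be sketched similarly. The min/max evenness ratio below is an illustrative assumption standing in for the paper's formula: it rewards words whose occurrences are evenly distributed across targets and reuses the stance-related weight computed on the labeled source target:

```python
def cross_target_weight(counts_per_target, xi_source):
    """Assumed form: reward words whose occurrences are evenly distributed
    across targets (min/max count ratio close to 1), scaled by the
    stance-related weight xi computed on the labeled source target."""
    counts = list(counts_per_target.values())
    if not counts or max(counts) == 0:
        return 0.0
    evenness = min(counts) / max(counts)
    return evenness * xi_source

# A word spread roughly evenly across targets gets a high weight;
# a word concentrated in one target gets a low weight.
even = cross_target_weight({"FM": 10, "LA": 9, "HC": 11}, xi_source=0.5)
skew = cross_target_weight({"FM": 30, "LA": 1, "HC": 2}, xi_source=0.5)
```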

Pragmatics Dependency Graphs Construction
Based on the target-adaptive pragmatics weights learned above, this section presents how we construct the heterogeneous pragmatics dependency graphs, represented as adjacency matrices, for each sentence. Noting that a graph can preserve the global structural information of contextual words, our proposed method aims to emphasize the crucial word relations and suppress the inconsequential ones. That is, if both words of an edge have large pragmatics weights, their edge weight in the graph will be large; conversely, if one of the words has a very small pragmatics weight, the weight of their edge is vastly reduced.
To exploit syntactic dependencies, before integrating pragmatics information, we first construct a graph for each sentence over its dependency tree to capture the word dependencies of the sentence². Here, the adjacency matrix $D \in \mathbb{R}^{n \times n}$ for each sentence is derived from the dependency tree $T$ of the sentence:

$D_{i,j} = 1$ if $T(w_i, w_j)$, and $D_{i,j} = 0$ otherwise,

where $T(w_i, w_j)$ denotes that $w_i$ is connected to $w_j$ in the dependency tree of the sentence. Inspired by previous GCN-based methods, we simply assume that the dependencies between parent and child nodes in the dependency parse are symmetrical, which is widely accepted in GCN-based methods [28,38]. We therefore construct the graph as undirected to enrich the dependency information of the adjacency matrix, i.e., $D_{i,j} = D_{j,i}$, and, following [11], we also set a self-loop for each word, i.e., $D_{i,i} = 1$.
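The adjacency-matrix rules above translate directly into code; the edge-list input format is assumed to come from an off-the-shelf dependency parser:

```python
def dependency_adjacency(n, edges):
    """Build D in {0,1}^{n x n} from dependency-tree edges (i, j):
    undirected (D[i][j] = D[j][i] = 1) with a self-loop on every word."""
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = 1                      # self-loop, following [11]
    for i, j in edges:
        D[i][j] = D[j][i] = 1            # symmetric parent/child dependency
    return D

# A 3-word sentence whose parse connects word 0-1 and word 1-2:
D = dependency_adjacency(3, [(0, 1), (1, 2)])
```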
In-target Graph Construction. Here, we integrate the pragmatics information learned from the in-target perspective into the adjacency matrix derivation: the edge weight of each node pair in the in-target graph adjacency matrix $A^I \in \mathbb{R}^{n \times n}$ is obtained by scaling the dependency edge $D_{i,j}$ with the in-target pragmatics weights of the connected words. In this way, the pragmatics information towards the target is integrated into the context representation via the in-target pragmatics dependency graph structure.
Cross-target Graph Construction. Additionally, to harmonize and refine the graph structure of each sentence for adapting to different targets, we integrate the cross-target pragmatics information into the adjacency matrix $A^C \in \mathbb{R}^{n \times n}$ of the cross-target graph in the same manner. Consequently, each sentence yields two different graphs (the in-target and cross-target graphs) according to the dependency parsing results and the target-adaptive pragmatics information of the context. The in-target graph preserves and adapts the pragmatics dependencies of the contextual words according to the target; that is, even for an unknown target, we can still obtain a distinctive in-target graph towards the target. The cross-target graph, in contrast, harmonizes the target-adaptive pragmatics information of words across all targets; that is, it can act on distinct targets to derive target-adaptive stance expressions, including unknown targets with no annotated data. The procedure for generating the adjacency matrices of the in-target and cross-target pragmatics dependency graphs for each sentence is depicted in Algorithm 1.

Algorithm 1: Deriving adjacency matrices of the in-target and cross-target graphs for each sentence.
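A sketch of the edge-weighting step. The product of the endpoint weights is an illustrative assumption matching the stated behavior (two large weights yield a large edge weight; one small weight shrinks it):

```python
def weighted_adjacency(D, weights):
    """Scale each dependency edge by the pragmatics weights of its
    endpoints; D is the 0/1 dependency matrix, weights[i] is the
    in-target or cross-target pragmatics weight of word i."""
    n = len(D)
    return [[D[i][j] * weights[i] * weights[j] for j in range(n)]
            for i in range(n)]

# 3-word sentence; word 2 carries a very small pragmatics weight,
# so all of its edges are suppressed.
D = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
A_in = weighted_adjacency(D, [0.9, 0.8, 0.1])
```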

Interactive GCN Blocks
Based on the pragmatics dependency graphs learned over the dependency tree and the target-adaptive pragmatics information, we now discuss how to transfer the target-adaptive stance expressions to the destination targets. In each interactive GCN block, an in-target GCN layer and a cross-target GCN layer are assembled to interactively and adaptively learn and adjust the target-adaptive graph representations for stance detection. Each node in the $l$-th GCN block is updated according to the hidden representations of its neighborhoods, following the adjacency matrices of the in-target and cross-target graphs:

$g^l = \sigma(\tilde{A} g^{l-1} W^l + b^l)$

where $g^{l-1}$ is the hidden representation evolved from the preceding GCN block, and $\tilde{A}$ is a normalized symmetric adjacency matrix:

$\tilde{A}_{i,j} = A_{i,j} / \sqrt{E_i E_j}$

where $E_i = \sum_{j=1}^{n} A_{i,j}$ is the degree of node $i$. The original input nodes of the first GCN block are the vector representations learned by the bidirectional LSTM layers in Section 3.2, i.e., $g^0 = \{h_1, h_2, \dots, h_n\}$.
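The normalization and a single aggregation step can be sketched as follows (the learnable weight matrix and nonlinearity are omitted for brevity):

```python
import math

def normalize(A):
    """Ã[i][j] = A[i][j] / sqrt(E_i * E_j), where E_i is the degree
    (row sum) of node i -- the standard symmetric GCN normalization."""
    E = [sum(row) for row in A]
    n = len(A)
    return [[A[i][j] / math.sqrt(E[i] * E[j]) if A[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def gcn_step(A_norm, g):
    """One aggregation: each node mixes its neighbors' hidden
    representations g according to the normalized adjacency."""
    n, d = len(g), len(g[0])
    return [[sum(A_norm[i][k] * g[k][j] for k in range(n)) for j in range(d)]
            for i in range(n)]

# Two fully connected nodes (with self-loops) average each other's features.
A = [[1, 1], [1, 1]]
g = gcn_step(normalize(A), [[1.0, 0.0], [0.0, 1.0]])
```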

Stance Representations
For each instance, inspired by Zhang et al. [38], we adopt a retrieval-based attention mechanism to capture significant stance features based on the final graph representations learned by the interactive GCN blocks and the contextual vector representations derived from the bidirectional LSTM layers, where $\top$ denotes matrix transposition and $g^L$ is the final output of the GCN blocks. The final stance representation $r$ of the input instance is then formed as the attention-weighted combination of the contextual representations. After that, we adopt a fully-connected layer with softmax normalization to yield a probability distribution over stances:

$\hat{y} = \mathrm{softmax}(W_o r + b_o)$

where $\hat{y} \in \mathbb{R}^{d_p}$ is the predicted stance probability for the input instance towards the target, $d_p$ is the number of stance labels, $W_o \in \mathbb{R}^{d_p \times 2d_h}$ and $b_o \in \mathbb{R}^{d_p}$ are parameters to be learned, and $d_h$ denotes the dimensionality of the hidden representations.
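An illustrative sketch of the pooling step, with dot-product scoring standing in for the retrieval-based attention of [38] (the exact scoring function is an assumption):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(H, G):
    """Score each contextual state h_t in H by its dot products with the
    final graph representations in G (an illustrative stand-in for the
    retrieval-based attention), then pool H with the softmax weights."""
    scores = [sum(sum(ht[k] * g[k] for k in range(len(ht))) for g in G)
              for ht in H]
    alpha = softmax(scores)
    d = len(H[0])
    return [sum(alpha[t] * H[t][j] for t in range(len(H))) for j in range(d)]

H = [[1.0, 0.0], [0.0, 1.0]]   # toy BiLSTM states
G = [[1.0, 0.0], [1.0, 0.0]]   # toy graph outputs attending to dimension 0
r = attention_pool(H, G)
```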

Learning Objective
The model is trained to minimize the cross-entropy loss on the source target dataset $D_s$ via the standard gradient descent algorithm:

$\mathcal{L}(\Theta) = -\sum_{i} y_i \log \hat{y}_i + \lambda \|\Theta\|_2^2$

where $y_i$ is the ground-truth stance label distribution of instance $i$, $\hat{y}_i$ is the estimated distribution, $\Theta$ denotes all trainable parameters of the model, and $\lambda$ is the coefficient of the $L_2$ regularization term.
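The objective above, sketched for label distributions with an explicit L2 term (an illustrative transcription, not the authors' code):

```python
import math

def loss(y_true, y_pred, params, lam=1e-5):
    """Cross-entropy over the source-target instances plus an L2
    regularization term on the trainable parameters Theta."""
    ce = -sum(t * math.log(p)
              for true, pred in zip(y_true, y_pred)
              for t, p in zip(true, pred))
    l2 = lam * sum(w * w for w in params)
    return ce + l2

# One instance labeled "favor" = [1, 0, 0], predicted with probability 0.7:
L = loss([[1, 0, 0]], [[0.7, 0.2, 0.1]], params=[0.5, -0.5])
```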

EXPERIMENTAL SETUP

Experimental Data
We conduct experiments on two benchmark datasets from SemEval-2016 Task 6 [17] (Sem16) and a large-scale dataset of targeted stance detection collected from Twitter [2] (Wt-wt). The statistics of the experimental data are shown in Table 1.
Sem16. SemEval-2016 Task 6 contains a stance detection dataset of stance-bearing tweets on different targets. Following previous cross-target stance detection studies [32,34,37], we select four targets from the dataset: Feminist Movement (FM), Legalization of Abortion (LA), Hillary Clinton (HC) and Donald Trump (DT), which are categorized into two domains: Women's Rights (FM, LA) and American Politics (HC, DT). Each instance in the Sem16 dataset is classified as favor, against or none. Following [37], we also extend the Sem16 dataset by adding Trade Policy (TP) as a fifth target in the American Politics domain. Following [32], we split the labeled data of the destination target into development and test sets with a 3:7 ratio.
Wt-wt. The largest available stance detection dataset so far, consisting of 51,284 tweets discussing merger and acquisition operations between companies. Wt-wt contains 8 companies:

Evaluation Metrics
For the Sem16 dataset, following [32], we use the mean of the Macro F1-scores for favor and against (Macro F1-score) to measure the classification performance of the models. In addition, following [34], the average of the micro-averaged F1 (where large classes dominate) and the macro-averaged F1 (where small classes dominate) is computed as another evaluation metric to alleviate the imbalance of targets in the dataset: $F1_{avg} = (F1_{micro} + F1_{macro})/2$. For the Wt-wt dataset, following [2], we use the Macro F1-score over all labels to measure the performance of the models for all targets.
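The metrics can be sketched as follows; the (tp, fp, fn) count format is an illustrative assumption:

```python
def f1(tp, fp, fn):
    """Per-class F1 from true positives, false positives, false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def sem16_score(favor_counts, against_counts):
    """Sem16 metric: mean of the F1 scores of 'favor' and 'against'
    (the 'none' class is excluded), following [32]."""
    return (f1(*favor_counts) + f1(*against_counts)) / 2

def f1_avg(f1_micro, f1_macro):
    """F1_avg = (F1_micro + F1_macro) / 2, following [34]."""
    return (f1_micro + f1_macro) / 2

score = sem16_score((8, 2, 2), (6, 4, 4))   # (tp, fp, fn) per class
```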

Training Setup
The word embeddings are initialized with the pre-trained 300-dimensional word vectors from GloVe [19]. The number of GCN blocks is set to 3, which is the optimal depth in pilot experiments. The dimensionality of hidden vector representations is set to 300. The coefficient $\lambda$ of $L_2$ regularization is set to $10^{-5}$. Adam is used as the optimizer with a learning rate of $10^{-3}$, and the mini-batch size is 16. All the $W$ and $b$ of the network layers are randomly initialized with a uniform distribution³.

Comparison Models
We compare and evaluate our model against several strong related works, summarized as follows:
• BiLSTM: Utilizes two bidirectional LSTMs to learn the sentence and the target separately; the hidden representations from both directions are combined to predict the stance label.
• TextCNN-E: Extends TextCNN [10] to the cross-target stance detection task, representing the word vectors as a 3D tensor by integrating semantically and emotionally related words.
• BiCond [1]: Adopts bidirectional conditional encoding to learn both the sentence and the target representation for detecting stance expression.

EXPERIMENTAL RESULTS
This section presents how models perform on stance detection. In Section 5.1, we first present the comparison results in targeted stance detection (i.e., training and testing on the same target). Subsequently, we focus on stance detection for unknown targets in Section 5.2. Section 5.3 analyzes the generalizability of our proposed model across all targets. Section 5.4 presents an ablation study of our proposed model. Afterwards, we analyze the impact of pragmatics words (Section 5.5), destination target data size (Section 5.6) and GCN blocks (Section 5.7). Finally, Section 5.8 presents a case study.

Table 2 shows the stratified 10-fold cross-validation comparison results in targeted stance detection against several notable models, including neural network-based models (BiLSTM and BiCond), a pre-trained model (BERT), attention-based models (CrossNet and ATT-LSTM), and a graph-based model (ASGCN). Our proposed model achieves the best performance over all targets on all datasets. Specifically, the largest improvements are 12.1% on the HC target from the Sem16 dataset and 5.7% on the CVS_AET target from the Wt-wt dataset. This demonstrates that our proposed model, which leverages target-adaptive pragmatics dependencies by identifying and adapting the stance expression for each distinct target with a graph-aware model, markedly improves performance even on the simpler targeted stance detection task.

Results of Cross-target Stance Detection
Cross-target Stance Detection on Sem16. Table 3 shows the comparison results over 8 cross-target tasks on the Sem16 dataset. Compared with targeted stance detection, all previous models achieve inferior performance on all cross-target tasks, which demonstrates the challenge of cross-target stance detection. Our proposed model (TPDG) consistently outperforms all comparison models on all cross-target tasks. The largest improvements in F1-score and $F1_{avg}$ are 19.2% and 17.4% on HC→TP, which clearly verifies the superiority of our proposed model in cross-target stance detection. Lacking information about the unknown destination target, BiLSTM, BiCond and TextCNN-E perform worst overall, since they neither leverage target-specific contextual information nor learn transferable knowledge for the destination target. Analogously, BERT can leverage rich semantic information, but it still performs poorly because it ignores target-adaptive stance expressions for the unknown target. Comparatively, models that consider target information (ATT-LSTM and ASGCN) perform slightly better, since they explicitly incorporate the target information into the sentence representation. Among them, ASGCN clearly outperforms ATT-LSTM, which implies the latent superiority of graph-based models in stance detection. Additionally, appreciable performance is achieved by models that extract shared stance information for the destination targets (CrossNet, VTN and SEKT). Compared with the previously most promising models (ASGCN and SEKT), our proposed model yields markedly better performance. This indicates that simply modeling transferable in-target stance information for the destination target is insufficient for stance detection on a target without annotated data, whereas our TPDG model can leverage the significant stance expressions for each distinct target and improve cross-target stance detection by employing the word-level target-adaptive pragmatics dependencies of the context.

Table 3: Experimental results of cross-target stance detection on Sem16 dataset. FM→LA represents training on FM (source target) and testing on LA (destination target), etc. The results with ♮ are retrieved from [37].

Cross-target Stance Detection on Wt-wt.
To demonstrate the robustness of our proposed model in cross-target stance detection, following [2], we test on the unknown destination target while training on the other three targets in the Healthcare domain of the Wt-wt dataset. The results are reported in Table 4. Except for the average accuracy on the unrelated label, our proposed model achieves substantially better performance than all baselines. Compared with the previously promising graph-based model (ASGCN), our proposed model improves by 8.4% on $F1_{avg}$ and 8.0% on the weighted $F1_{avg}$, which verifies that leveraging target-adaptive pragmatics dependencies in a graph model can lead to improved cross-target stance detection results. Compared with the noteworthy cross-target model SiamNet, our proposed model improves by 7.6% on both $F1_{avg}$ and the weighted $F1_{avg}$, which further demonstrates the significant role of leveraging target-adaptive pragmatics information from both the in-target and cross-target perspectives in cross-target stance detection. Additionally, our proposed model achieves superior and more balanced accuracy across all labels. This implies that our proposed model, which identifies and adjusts target-adaptive pragmatics information with interactive GCN blocks, can lead to improved identification of different stance labels.
Cross-domain Stance Detection. Intuitively, stance expressions can be conveniently shared across targets in the same domain. However, in some extreme cases, unseen targets may occur in an unknown domain. Hence, in this section, we examine how the proposed model works in cross-domain stance detection, i.e., training on a dataset from one domain and testing on a dataset from another domain. The results are shown in Table 5. Due to the difficulty of cross-domain stance detection, the performance is inferior compared with the cross-target experiments. Despite this, our proposed TPDG model still yields promising performance and achieves notable improvements over all baselines on all cross-domain tasks. This indicates that our proposed model is effective even in the more challenging cross-domain stance detection task, thanks to leveraging target-adaptive stance information.

Generalizability Analysis
In this section, we conduct experiments over all the targets, both within and across domains, to analyze the generalizability of stance detection on the whole dataset. The comparison results are shown in Table 6; for each model, we feed all the data with various targets into the model and report stratified 10-fold cross-validation results. We can observe that our proposed model consistently achieves the best performance on all datasets when concurrently learning stance features for various targets. Intuitively, this task requires modeling features that can predict the stance labels for different targets. Hence BERT, which can leverage contextual semantic information for distinct targets, yields promising performance among the previous methods. Compared with BERT, our proposed model produces notable improvements on all datasets, which implies that deriving target-adaptive pragmatics dependencies for distinct targets with an interactive graph-aware model can learn more precise target-related stance expressions for multifarious targets in stance detection.
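The stratified 10-fold protocol used here can be sketched as follows. This is a minimal illustration with toy labels, not the paper's evaluation code; `stratified_kfold` is a hypothetical helper that distributes each label's examples round-robin so every fold mirrors the overall label distribution:

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Partition example indices into k folds, preserving the
    label distribution of the full dataset in every fold."""
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for y, idxs in by_label.items():
        rng.shuffle(idxs)
        # deal this label's examples round-robin across the folds
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds

# toy data: 70 "favor" and 30 "against" examples
labels = ["favor"] * 70 + ["against"] * 30
folds = stratified_kfold(labels, k=10)
# every fold holds 7 favor / 3 against, mirroring the 70/30 split
for fold in folds:
    assert len(fold) == 10
    assert sum(labels[i] == "favor" for i in fold) == 7
```

Each of the 10 folds then serves once as the held-out test set while the model is trained on the remaining nine.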

Ablation Study
To analyze the impact of the different components of the proposed TPDG model, we conduct experiments over different cross-target tasks on the Wt-wt dataset and report the results in Table 7. We can see that removing either "cross-target" or "in-target" degrades the performance substantially, which indicates that both in-target and cross-target stance expressions are important for detecting stance towards an unknown target. Note that the model without "dependency" (the dependency tree of the sentence) degrades considerably, and removing "pragmatics" also leads to evident performance drops. This implies that adopting either the dependency tree or the pragmatics information alone cannot adequately learn accurate stance expressions for the target. That is, leveraging target-adaptive pragmatics dependencies with interactive GCN blocks properly improves the performance of cross-target stance detection.

Figure 3: Impact of the proportion of pragmatics words (F1-score, %) on the CVS_AET, CI_ESRX, ANTM_CI, and AET_HUM tasks. In-target = proportion of pragmatics words from the in-target perspective; Cross-target = proportion of pragmatics words from the cross-target perspective.

Impact of Pragmatics Words
To further demonstrate that the pragmatics information of the contextual words enriches the graph representation towards the target, we conduct experiments on the Wt-wt dataset with different proportions of the pragmatics words derived in Section 3.3. We first sort the words by pragmatics weight and then vary the proportion from 0 to 1; the results are shown in Figure 3. We can see that removing the pragmatics information entirely (i.e. setting the proportion to 0) performs worst on all cross-target tasks, which implies that ignoring pragmatics information inhibits the model's performance in stance detection. Conversely, with any non-zero proportion of pragmatics words the performance of our proposed model improves, which verifies the significance of target-adaptive pragmatics information in learning stance expressions towards the target. Note that adopting both in-target and cross-target contextual pragmatics information is markedly better than considering only one of the two. This further implies that leveraging pragmatics information from both the in-target and cross-target perspectives can lead to substantially improved stance detection.
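The selection procedure above, keeping only the top proportion of words ranked by pragmatics weight, can be sketched as follows. The weights and words here are invented for illustration; the real weights come from the computation in Section 3.3:

```python
import math

def top_pragmatics_words(weights, proportion):
    """Keep the top `proportion` of words ranked by pragmatics
    weight; proportion 0 discards pragmatics information entirely."""
    if proportion <= 0:
        return []
    ranked = sorted(weights, key=weights.get, reverse=True)
    keep = math.ceil(proportion * len(ranked))  # at least one word if p > 0
    return ranked[:keep]

# hypothetical weights for one sentence towards a target
weights = {"merger": 0.9, "approve": 0.7, "block": 0.6, "the": 0.1, "of": 0.05}
assert top_pragmatics_words(weights, 0.0) == []
assert top_pragmatics_words(weights, 0.4) == ["merger", "approve"]
assert set(top_pragmatics_words(weights, 1.0)) == set(weights)
```

Sweeping `proportion` from 0 to 1 then reproduces the x-axis of Figure 3 for either the in-target or the cross-target weight ranking.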

Impact of Destination Target Data Size
As described in Section 3.3, the pragmatics weight may be influenced by the data size of the destination target. To provide more insight into the role of the word-level target-adaptive pragmatics weight, we further study how stance detection performance changes on different cross-target tasks as the unknown-target data size varies (with the proportion ranging from 0.1 to 1). The comparison results are shown in Figure 4. Note that the performance of the baseline (SiamNet) fluctuates drastically across different test data sizes. In contrast, although its performance decreases slightly at small data sizes (< 40%), our proposed model (TPDG) achieves markedly better and more stable performance across the different proportions of the test data. This verifies that our proposed method of computing the word-level target-adaptive pragmatics weight is applicable to unknown-target datasets of different sizes and improves the performance of cross-target stance detection.

Impact of GCN Blocks
To investigate the impact of the number of interactive GCN blocks on the performance of our proposed model, we vary the block number from 1 to 8 and show the results in Figure 5. Note that the model with 3 GCN blocks performs better overall than the other settings, and thus we set the number of GCN blocks to 3 in our model. The model with a single GCN block performs unsatisfactorily on all cross-target tasks; a possible reason is that such a shallow network structure is insufficient to exploit accurate pragmatics dependencies for the target. In addition, when the block number is greater than 3, the performance fluctuates as the number of GCN blocks increases and tends to decline once the number exceeds 5. This implies that blindly increasing the number of GCN blocks can impair the learning ability of the model due to the sharp increase in the number of model parameters.
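Stacking GCN blocks can be sketched as below. This is a minimal numpy illustration of the standard graph-convolution update H' = ReLU(Â H W) applied block after block; the row normalisation, dimensions, and random weights are illustrative assumptions, not the exact architecture of our interactive GCN blocks:

```python
import numpy as np

def gcn_block(A_hat, H, W):
    """One GCN block: neighbourhood aggregation over the normalised
    adjacency, a linear map, then ReLU: H' = ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)

def stacked_gcn(A, H, weight_list):
    """Apply len(weight_list) GCN blocks over adjacency A, after
    adding self-loops and row-normalising (a common GCN choice)."""
    A_loop = A + np.eye(A.shape[0])
    A_hat = A_loop / A_loop.sum(axis=1, keepdims=True)
    for W in weight_list:
        H = gcn_block(A_hat, H, W)
    return H

rng = np.random.default_rng(0)
n_words, dim = 6, 16
A = (rng.random((n_words, n_words)) > 0.5).astype(float)
A = np.maximum(A, A.T)                    # symmetric dependency graph
H = rng.standard_normal((n_words, dim))   # initial word representations
blocks = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(3)]  # 3 blocks
out = stacked_gcn(A, H, blocks)
assert out.shape == (n_words, dim)
```

With each extra block a word aggregates information from one hop further in the dependency graph, which is consistent with shallow stacks underfitting and very deep stacks adding parameters faster than useful signal.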

Case Study
Our pragmatics weight computation is designed to identify the roles of words in the stance expression for a distinct target from both the in-target and cross-target perspectives, which allows us to capture the significant words, paired with their corresponding weights, for each distinct target. We therefore list some crucial words paired with their pragmatics weights for different targets in Table 8. Here, CROSS_SEM and CROSS_WT denote words with cross-target word-level pragmatics weights in the Sem16 and Wt-wt datasets, respectively, while the others are derived from the in-target word-level pragmatics weight computation. We can observe that, for the in-target word-level pragmatics weights, both the high-weight and the low-weight words are quite distinct across targets, and the words with large pragmatics weights are highly related to the target and dominant in its stance expressions. This implies that the exclusive pragmatics word sets derived for distinct targets from the in-target perspective can effectively help the proposed model capture the inherent stance expression of each target. In addition, the words with cross-target word-level pragmatics weights (both CROSS_SEM and CROSS_WT) are almost all semantically rich, target-independent opinion words. More concretely, the stance expression of cross-target words is generally invariant across targets, and can thus be adopted to derive stance expressions for the unknown target.
To further illustrate how the in-target and cross-target information interact in stance detection for an unknown target, we present a case study on a typical instance by visualizing the in-target and cross-target pragmatics weights of its words. Here the ground-truth label of the instance is comment, whereas SiamNet produces the wrong prediction support, since it pays excessive attention to the word "merger", which potentially expresses a positive stance. Note that, in our method, the in-target pragmatics weights of several target-related words are large, and the pragmatics information carried by the word "merger" is still non-negligible from the in-target perspective. However, the slight influence of "merger" is suppressed in the cross-target pragmatics dependency graph owing to its small pragmatics weight. Our proposed model thus focuses on the genuinely significant words and predicts the correct label. This vividly indicates that interactively leveraging both in-target and cross-target pragmatics information can refine the crucial stance-related clues of the stance expression towards the target, so as to improve stance detection for the unknown target.

CONCLUSION
In this paper, we present a novel approach that automatically identifies and adapts the target-dependent and target-independent roles a word plays towards a target in cross-target stance detection. Specifically, we explore a novel solution of constructing target-adaptive pragmatics dependency graphs for each sentence from both the in-target and cross-target perspectives to capture the accurate role of the contextual words in stance expression. Subsequently, a novel graph-aware model with interactive GCN blocks is proposed to leverage the contextual pragmatics dependencies towards the target. Based on this, valuable target-adaptive stance expressions can be learned for stance detection on unknown targets, even from unseen domains. Experimental results on multiple benchmark datasets and multiple cross-target tasks show that our proposed model significantly outperforms state-of-the-art methods in cross-target stance detection.