You cannot accurately estimate an individual’s loss aversion using an accept-reject task

Loss aversion in prospect theory is often measured with the accept-reject task, in which participants accept or reject the chance of playing each of a series of gambles. The gambles are two-branch 50/50 gambles with varying gain and loss amounts (e.g., a 50% chance of winning $20 and a 50% chance of losing $10). Prospect theory quantifies loss aversion by scaling losses up by a parameter λ. Here we show that λ suffers from extremely poor parameter recoverability in the accept-reject task. λ cannot be reliably estimated even for a simple version of prospect theory with linear probability weighting and value functions, nor in impractically large experiments in which participants make thousands of choices. The poor recoverability is driven by a trade-off between λ and the other model parameters. However, a measure derived from these parameters is extremely well recovered, and corresponds to estimating the area of gain-loss space in which people accept gambles. This area is equivalent to the number of gambles accepted in a given choice set. That is, a simple count of accept decisions is recovered extremely reliably, but using prospect theory to make further use of exactly which gambles were accepted and which were rejected does not work.

Would you play a gamble that offered the chance to win $20 if a coin comes up heads but lose $10 if the coin comes up tails? Suppose a person is indifferent: they do not mind whether or not they participate in this lottery. In such a case, we say that this person is exhibiting loss aversion. For this individual, the pleasure of receiving $20 is equal, psychologically, to the pain of having to pay $10. The idea that losses loom larger than gains is a cornerstone of behavioural economics. Loss aversion is often offered as the explanation for various phenomena in choice, like the endowment effect (Kahneman, Knetsch, & Thaler, 1991). Loss aversion is also a core component of prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992), the leading descriptive model of choices where risks are involved. In this model, loss aversion is captured by the λ parameter. λ is the exchange rate between the psychological value of gains and losses. For example, with λ = 2 (other things equal) one would be indifferent between accepting and rejecting the coin toss example, because 20 − λ × 10 = 0. Using people's responses to a series of coin toss questions like the example here, where the gain and loss on offer in each toss are varied, is a popular method for estimating a person's λ. Here we show that the most commonly used modeling approach for estimating λ in this accept-reject task is inappropriate.
There is a growing literature questioning the theoretical and empirical aspects of loss aversion, in both risky and riskless contexts (Ert & Erev, 2013; Gal & Rucker, 2018; Yechiam, 2018). Rather than focusing on whether loss aversion exists, the present paper evaluates one of the commonly used methodologies for estimating it at the individual level from risky choices. There are multiple approaches to estimating the parameters of prospect theory, many of which have been designed with the main purpose of estimating individuals' loss aversion (Abdellaoui, Bleichrodt, & L'Haridon, 2008; Rieskamp, Busemeyer, & Mellers, 2006; Stott, 2006; Pachur & Kellen, 2013). The accept-reject task, which asks people to decide whether or not to play each in a series of coin-toss gain-loss questions like the one described above, is one popular approach. In a typical accept-reject experiment, participants are presented with a series of mixed lotteries, each offering a 50% chance of winning some amount and a 50% chance of losing some amount of money. Individuals are simply required to indicate whether they would accept or reject each lottery. In experiments that incentivize people's decisions, at least one of the lotteries is drawn at random at the end of the procedure to be played out for real outcomes.
From a practical and statistical point of view, the accept-reject task offers an elegant approach for determining how loss averse each participant is. First, the accept-reject task for estimating λ is very easy to administer. Since all lotteries offer two equally likely outcomes, the task can be explained with an analogy of a coin flip. Existing studies vary in the number of trials, but most range from 64 to 256. Thus, the accept-reject task requires relatively little cognitive effort and time from a participant. Indeed, this approach has been predominantly used in conjunction with more involved procedures, such as fMRI scanning.
In one influential study, the accept-reject task was used by Tom, Fox, Trepel, and Poldrack (2007) to show a positive correlation between activation in the dopaminergic regions of the brain and participants' behavioral loss aversion. Several authors have relied on the accept-reject approach to further test the neural antecedents of loss averse behavior (Canessa et al., 2013; Chib, De Martino, Shimojo, & O'Doherty, 2012), or to show that individual-level variations in λ parameters correspond to differences in strategic thinking (Lorains et al., 2014), oxygen deprivation (Pighin, Bonini, Savadori, Hadjichristidis, & Schena, 2014), or the ability to accurately perceive internal physiological states (Sokol-Hessner et al., 2015). Table 1 below summarizes the key properties of 15 articles (from a meta-analysis of Walasek, Stewart, & Mullett, in prep) that used the accept-reject task to estimate loss aversion among their participants. The table shows that most studies find a considerable level of loss aversion, at least at the aggregate level.
The second benefit of using the accept-reject task is the simplicity with which participants' binary decisions can be converted into λ values. We illustrate the basic logic starting with the classic value function of prospect theory.

Estimating Loss Aversion from Accept-Reject Choices
In cumulative prospect theory, the psychological value of a gain or loss x is given by

$$v(x) = \begin{cases} x^{\alpha} & \text{if } x \ge 0 \\ -\lambda(-x)^{\beta} & \text{if } x < 0 \end{cases}$$

The curvature of the value function is determined by α and β. Following Wakker (2010) and Stewart, Scheibehenne, and Pachur (2018) we set α = β, because without this constraint λ is arbitrarily defined by the choice of unit for x, and does not measure the extent to which losses loom larger than gains across different values of |x|. Further, we set α = β = 1 to reduce the number of free parameters we must estimate from the accept-reject choice data. This is also useful because it is not possible to separately estimate loss aversion and risk aversion with a typical accept-reject task.
The prospect theory value of accepting a 50/50 gamble for +$gain or −$loss is given by

$$V(\text{accept}) = w(1/2)\,\text{gain}^{\alpha} - w(1/2)\,\lambda\,\text{loss}^{\beta} \qquad (3)$$

where w(1/2) is the decision weight attached to the 50/50 chance. Thus the first term is the subjective value of the 50% chance of winning gain and the second term is the subjective value of the 50% chance of losing loss. While cumulative prospect theory allows for the separate weighting of probabilities for gains and losses, we assume that w(1/2) is the same for the gain and the loss and, for the accept-reject task, this assumption is without loss of generality. This means that λ captures all loss aversion (see Pachur & Kellen, 2013).
To turn the subjective value of accepting the gamble into the probability of accepting the gamble we, like other authors, use the exponentiated form of Luce's choice rule (e.g., Glöckner & Pachur, 2012; Rieskamp, 2008; Scheibehenne & Pachur, 2015; Stewart et al., 2018; Stott, 2006). The probability of accepting the gamble is given by

$$p(\text{accept}) = \frac{e^{\text{bias}}\, e^{\theta V(\text{accept})}}{e^{\text{bias}}\, e^{\theta V(\text{accept})} + e^{\theta V(\text{reject})}} \qquad (4)$$

V(reject) is the prospect theory value of rejecting the gamble and keeping the status quo, and has a value of 0. When V(accept) is large and positive, p(accept) will be near 1. When V(accept) is large and negative, p(accept) will be near 0. The bias parameter accounts for the general propensity to accept gambles regardless of their outcomes. This is often not included in prospect theory, but below we show that it is necessary. Setting bias = 0 reduces the model to a more usual version of prospect theory with a stochastic Luce's choice rule. The sensitivity parameter θ determines how sensitive the probability of accepting is to differences in prospect theory value, which is seen more clearly when Equation 4 is written in logistic form:

$$p(\text{accept}) = \frac{1}{1 + e^{-[\text{bias} + \theta V(\text{accept})]}} \qquad (5)$$

Equations 3 and 4 can be rearranged to give the log-odds of accepting a gamble (Tom et al., 2007; Stewart, Reimers, & Harris, 2015):

$$\log\frac{p(\text{accept})}{1 - p(\text{accept})} = \text{bias} + \theta\, w(1/2)\,\text{gain}^{\alpha} - \theta\, w(1/2)\,\lambda\,\text{loss}^{\beta} \qquad (6)$$
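The choice rule above can be sketched in a few lines of Python. This is only an illustration under the paper's simplifying assumption α = β = 1; the function names and the example values of θ, bias, and w(1/2) are hypothetical and are not taken from the original analysis. The sketch evaluates the exponentiated Luce form and the logistic form so that their equivalence can be checked numerically.

```python
import math

def v_accept(gain, loss, lam, w_half=0.5):
    """Prospect theory value of accepting a 50/50 gamble, with alpha = beta = 1."""
    return w_half * gain - w_half * lam * loss

def p_accept_luce(gain, loss, lam, theta, bias, w_half=0.5):
    """Exponentiated Luce choice rule, with V(reject) fixed at 0."""
    numerator = math.exp(bias) * math.exp(theta * v_accept(gain, loss, lam, w_half))
    return numerator / (numerator + math.exp(theta * 0.0))

def p_accept_logistic(gain, loss, lam, theta, bias, w_half=0.5):
    """The same probability written in logistic form."""
    log_odds = bias + theta * v_accept(gain, loss, lam, w_half)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Coin toss from the introduction: win $20 or lose $10, lambda = 2, no bias.
# V(accept) is 0, so the decision maker is indifferent.
p_indifferent = p_accept_luce(20, 10, lam=2.0, theta=0.3, bias=0.0)
print(p_indifferent)
```

Because V(accept) is zero in the coin toss example, the acceptance probability is one half regardless of the (arbitrary) value chosen for θ.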
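When α = β = 1, Equation 6 is in the form of a logistic regression, where the choice of accepting is predicted as a function of gain and loss:

$$\log\frac{p(\text{accept})}{1 - p(\text{accept})} = \phi_{\text{intercept}} + \phi_{\text{gain}}\,\text{gain} + \phi_{\text{loss}}\,\text{loss} \qquad (7)$$

where

$$\phi_{\text{intercept}} = \text{bias} \qquad (8)$$

$$\phi_{\text{gain}} = \theta\, w(1/2) \qquad (9)$$

$$\phi_{\text{loss}} = -\lambda\,\theta\, w(1/2) \qquad (10)$$

which means that

$$\lambda = -\frac{\phi_{\text{loss}}}{\phi_{\text{gain}}} \qquad (11)$$

Thus it is convenient to estimate prospect theory parameters using a logistic regression, with λ set by Equation 11.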
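As a rough illustration of this estimation route, the sketch below fits the three logistic regression coefficients by Newton's method using only the Python standard library, then converts them to prospect theory parameters (λ as minus the loss coefficient divided by the gain coefficient). It is not the authors' code: the function names, the fixed iteration count, and the generating values are assumptions made for the example. As a sanity check it is fed noise-free acceptance probabilities rather than binary choices, in which case the fit should return values very close to the generating ones.

```python
import math

def solve_linear(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    n = len(m)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_logistic(gains, losses, accepts, iters=25):
    """Fit log-odds = phi_intercept + phi_gain * gain + phi_loss * loss by Newton's method."""
    phi = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        info = [[0.0] * 3 for _ in range(3)]
        for g, l, y in zip(gains, losses, accepts):
            x = (1.0, g, l)
            eta = sum(coef * xi for coef, xi in zip(phi, x))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(3):
                grad[i] += (y - p) * x[i]
                for j in range(3):
                    info[i][j] += p * (1.0 - p) * x[i] * x[j]
        step = solve_linear(info, grad)
        phi = [coef + s for coef, s in zip(phi, step)]
    return tuple(phi)

def prospect_parameters(phi_intercept, phi_gain, phi_loss):
    """Convert regression coefficients to bias, theta * w(1/2), and lambda."""
    return {"bias": phi_intercept, "theta_w": phi_gain, "lambda": -phi_loss / phi_gain}

# Noise-free check on a 10 x 10 grid of gambles ($2 to $20 in $2 steps).
# The generating coefficients are hypothetical and imply lambda = 1.5.
gambles = [(g, l) for g in range(2, 21, 2) for l in range(2, 21, 2)]
true_phi = (0.1, 0.2, -0.3)
probs = [1.0 / (1.0 + math.exp(-(true_phi[0] + true_phi[1] * g + true_phi[2] * l)))
         for g, l in gambles]
fitted = fit_logistic([g for g, _ in gambles], [l for _, l in gambles], probs)
params = prospect_parameters(*fitted)
print(params)
```

With real binary choices the same function applies unchanged (pass 0/1 responses as `accepts`), although perfectly separated data would need additional safeguards that this sketch does not include.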
We can visualize how each parameter of the model plays a role in capturing the propensity to accept mixed gambles in a two-dimensional gain-loss space. We demonstrate this with the responses of four randomly sampled participants in Experiment 1 of Walasek and Stewart (2015). In their version of the accept-reject task, participants saw 64 lotteries, which were constructed by combining losses and gains ranging from $6.00 to $20.00, in $2.00 intervals. Figure 1 illustrates raw data from the four participants, with green • symbols for accept responses and red × symbols for reject responses. Visibly, these participants behaved quite reasonably in that they tended to accept only those lotteries that offered a large gain and a small loss (the bottom right portion of gain-loss space).
Fitting the model in Equation 7 to each participant's responses yields parameter values that can be used to draw each person's indifference curve. When people are indifferent (that is, equally likely to accept or reject), the log odds of accepting are zero. The equation for the indifference curve is given by setting log[p(accept)/(1 − p(accept))] = 0 and rearranging Equation 6 with α = β = 1:

$$\text{loss} = \frac{1}{\lambda}\,\text{gain} + \frac{\text{bias}}{\lambda\,\theta\, w(1/2)} \qquad (12)$$
These indifference curves, drawn in Figure 1, clearly separate the areas of acceptance and rejection for each participant. Two features of these lines are important to our argument: the slope, given by 1/λ, and the intercept, given by bias/(λ θ w(1/2)). Thus the slope is determined by the loss aversion parameter λ. However, the intercept is determined by a combination of parameters: λ, the probability weighting for one half w(1/2), the sensitivity parameter θ, and bias, the overall tendency to accept gambles irrespective of the gain and loss on offer. Setting bias = 0 forces the intercept to be zero, which is not the case for most participants. That is, the bias is necessary to account for individual differences in the overall propensity to accept or reject gambles independently of the gains and losses on offer.
To preempt the results of our simulations below, we are going to show that λ cannot be reliably estimated from accept-reject choice data. All that can be reliably estimated is the area under the indifference curve (AUIC) in Equation 12. This area corresponds to the fraction of gain-loss space in which people accept gambles. For a gain-loss space with uniformly distributed gains and losses, this corresponds quite simply to the fraction of gambles people accept. The intuition is that the indifference lines in Figure 1 can wiggle about their middles quite a bit, which changes the slope and the intercept a lot without changing the area under the line.
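The intuition about lines pivoting around their middles can be checked with a short sketch. It assumes a gain-loss square running from 0 to 20 on both axes, an indifference line with slope 1/λ and intercept bias/(λ θ w(1/2)), and λ > 0; the function names and parameter values are hypothetical. The two parameter sets below have very different slopes and intercepts, but both lines pass through a loss of 5 at a gain of 10, so the fraction of the square lying under each line is the same.

```python
def indifference_line(lam, theta_w, bias):
    """Slope and intercept of the line loss = slope * gain + intercept."""
    return 1.0 / lam, bias / (lam * theta_w)

def auic(lam, theta_w, bias, max_gain=20.0, max_loss=20.0):
    """Fraction of the gain-loss square lying under the indifference line.

    The line is clipped to the square: heights below 0 count as 0 and
    heights above max_loss count as max_loss. Assumes lam > 0.
    """
    slope, intercept = indifference_line(lam, theta_w, bias)
    g_zero = -intercept / slope              # gain at which the line crosses loss = 0
    g_full = (max_loss - intercept) / slope  # gain at which it crosses loss = max_loss
    a = min(max_gain, max(0.0, g_zero))
    b = min(max_gain, max(0.0, g_full))
    area = slope * (b * b - a * a) / 2.0 + intercept * (b - a)
    area += max_loss * (max_gain - b)        # region where the line is above the square
    return area / (max_gain * max_loss)

# Two hypothetical decision makers with theta * w(1/2) = 0.2.
steep = auic(lam=2.0, theta_w=0.2, bias=0.0)  # slope 0.50, intercept 0.0
flat = auic(lam=4.0, theta_w=0.2, bias=2.0)   # slope 0.25, intercept 2.5
print(steep, flat)
```

Both calls give an area of one quarter of the square, even though λ differs by a factor of two, which is the trade-off between λ and bias in miniature.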

Simulation 1
Our first simulation is a parameter recovery exercise, in which we generated simulated data from the model and then tried to recover the generating parameters by fitting the same model. We simulated responses in hypothetical experiments where participants must accept or reject 100 mixed gambles made up of every possible and unique combination of gains and losses ranging from $2.00 to $20.00, in $2.00 increments. We used the parameters from the logistic regression form in Equation 7 to define our generating parameter grid. Our initial parameter values consisted of φ gain (ranging from .05 to 1 in .05 increments), φ loss (ranging from -1 to -.05 in .05 increments) and φ intercept (ranging from -.2 to .2 in .02 increments). The entire parameter grid therefore consisted of all 8,400 (20 × 20 × 21) combinations of these three parameters.
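The size of the design can be verified with a minimal sketch. It assumes that φ loss is negative throughout the grid and that φ intercept moves in steps of .02, because those are the ranges that yield 20 × 20 × 21 = 8,400 combinations; the variable names are illustrative only.

```python
from itertools import product

# Generating grid for the logistic-regression-form parameters (assumed ranges).
phi_gain = [round(0.05 * i, 2) for i in range(1, 21)]           # .05, .10, ..., 1.00
phi_loss = [round(-0.05 * i, 2) for i in range(1, 21)]          # -.05, -.10, ..., -1.00
phi_intercept = [round(-0.2 + 0.02 * i, 2) for i in range(21)]  # -.20, -.18, ..., .20

grid = list(product(phi_gain, phi_loss, phi_intercept))

# Choice set: every combination of gains and losses from $2 to $20 in $2 steps.
gambles = [(gain, loss) for gain in range(2, 21, 2) for loss in range(2, 21, 2)]

# Implied loss aversion for each grid point: lambda = -phi_loss / phi_gain.
lambdas = [-pl / pg for pg, pl, _ in grid]

print(len(grid), len(gambles), min(lambdas), max(lambdas))
```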
We converted the logistic-regression-form parameters into the corresponding prospect-theory parameters in Equation 6 using Equations 8-11.
For a given set of parameter values, we simulated 50 experimental sessions. This is analogous to a situation in which 50 individuals with the same underlying preferences take part in the same experiment.
For each choice we generated p(accept) with Equation 6. We then generated a response for each gamble (either accept or reject) as a Bernoulli sample based on p(accept) for that gamble. We excluded datasets for which the standard deviation of p(accept) over all trials was smaller than .2, in order to discard parameter values that make very similar predictions over all gambles in the choice set. As a result, we were left with 6,586 unique parameter combinations and 329,300 datasets (32,930,000 individual choices). The possible values of λ from our grid included 295 unique values ranging from .05 to 19 with a mean of 1.39 and SD of 1.46.
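The data-generating step just described might look like the following sketch. The function names, the seed, and the example coefficients are hypothetical, and the population standard deviation is used for the exclusion rule, which is an assumption: the text does not say which estimator was applied.

```python
import math
import random
import statistics

def p_accept(gain, loss, phi_gain, phi_loss, phi_intercept):
    """Acceptance probability from the logistic regression form of the model."""
    log_odds = phi_intercept + phi_gain * gain + phi_loss * loss
    return 1.0 / (1.0 + math.exp(-log_odds))

def simulate_session(gambles, params, rng):
    """One simulated participant: a Bernoulli accept (1) or reject (0) per gamble."""
    return [1 if rng.random() < p_accept(g, l, *params) else 0 for g, l in gambles]

def is_informative(gambles, params, min_sd=0.2):
    """Keep parameter sets whose predicted probabilities vary enough across gambles."""
    probs = [p_accept(g, l, *params) for g, l in gambles]
    return statistics.pstdev(probs) >= min_sd

gambles = [(g, l) for g in range(2, 21, 2) for l in range(2, 21, 2)]
rng = random.Random(1)  # arbitrary seed so the sketch is reproducible

params = (0.3, -0.45, 0.0)  # phi_gain, phi_loss, phi_intercept (lambda = 1.5)
if is_informative(gambles, params):
    sessions = [simulate_session(gambles, params, rng) for _ in range(50)]
    print(sum(map(sum, sessions)) / (50 * len(gambles)))  # overall acceptance rate
```

A nearly flat parameter set such as (0.05, -0.05, 0.0) predicts acceptance probabilities close to one half for every gamble, so it would be dropped by the exclusion rule.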

Results.
We first removed 849 instances in which our model failed to converge, leaving us with 328,451 recovered parameter values. Figure 2 shows how well λ, bias, θ w(1/2), and AUIC can be recovered for a random sample of 50,000 parameter values from our grid search. To summarise, while λ, bias, and θ w(1/2) are poorly recovered, AUIC is nearly perfectly recovered. Figure 2A shows that the recoverability of λ is poor. The recoverability worsens for higher levels of loss aversion, but even for the relatively lower values of λ < 2.5, the amount of heterogeneity is non-trivial in magnitude. Consider the case of a true λ of 2.25, which is the often-used median value from Tversky and Kahneman (1992) and corresponds to pronounced loss aversion. Interpreting λ as the sole reason for rejecting mixed lotteries could lead us to believe that a person would require a gain to be approximately 2.25 times higher than a loss in order to be indifferent with respect to a given 50/50 gamble. However, the recovered λ is less than 1.99 30% of the time and more than 2.62 30% of the time. We therefore cannot determine the true λ of such a participant with any confidence.
The ability to recover generating bias is even worse, as illustrated by Figure 2B.
Similarly, in the case of θ w(1/2) (Panel C) the recovered values are very variable. The recoverability is particularly poor for high values of θ w(1/2), as indicated by the fanning out of the quantile regression fits. Appendix A further illustrates the large variance in the recovered parameters.
What does this variability in recovered values mean for loss averse behaviour?
Answering this question requires us to distinguish between two possible interpretations of loss averse behaviour in the accept-reject task. One view is that loss aversion is simply given by the λ parameter of prospect theory, but another is that loss averse behaviour is simply the overall tendency to reject mixed gambles. To capture the overall tendency to reject mixed gambles we calculated the area under the indifference curve in gain-loss space (see Appendix B). For example, individuals whose indifference curves are plotted in the lower two panels of Figure 1 appear to differ considerably in their willingness to accept mixed lotteries in the accept-reject task, but this is not reflected in the slope of their indifference curves. Instead, the difference here is driven by the bias, which contributes to the upward and downward shifts of this curve. If bias and λ (and also θ w(1/2)) capture all there is about the propensity to accept mixed lotteries, we should expect to see very good recoverability of the AUIC scores. Figure 2D illustrates the recoverability of AUIC. Large AUIC scores mean that a larger portion of the gain-loss space is covered by the decisions to accept, and therefore high AUIC scores represent less loss averse behaviour. It is clear from the plot that AUIC is very well recovered. Of note is the small cloud in the lower right corner of the plot. These values emerge in rare cases when the binary accept-reject data generated from the model probabilities are particularly extreme. In such instances, the fitted model produces an indifference curve that lies outside the bounds of the gain-loss space.
If AUIC encompasses the global tendency to accept mixed gambles, then we should also find no correlations, indicative of parameter trade-offs, between AUIC and bias, θ w(1/2), or λ. In contrast, we may expect a correlation between bias and λ if these two parameters truly trade off in the model when they both try to capture decisions to accept and reject mixed lotteries. To demonstrate this, for each possible combination of the generating bias, θ w(1/2), and λ parameters, we computed Spearman's correlation coefficients for recovered values of these parameters and AUIC. Figure 3 shows the distributions of correlation coefficients for all six possible pairings of parameters. There is a strong correlation between bias and λ. Recall that the elevation of the indifference curves in the two-dimensional gain-loss space is given by the intercept bias/(λ θ w(1/2)) and the slope 1/λ. The positive correlation coefficients therefore show that when the propensity to reject lotteries regardless of their outcomes is high, the asymmetric weighting of gains and losses is smaller. Conversely, we observe much weaker correlations between AUIC and λ and between AUIC and bias. With respect to θ w(1/2), we observe weak correlations with the other two parameters as well as with AUIC.
Our simulation rather unambiguously shows that λ suffers from extremely poor recoverability when attempts are made to estimate it from the responses in the accept-reject task. This issue seems to be mainly driven by the parameter trade-off between λ and bias in the stochastic choice rule. A measure derived from both of these parameters, which corresponds to the area under a person's indifference curve in the gain-loss space, is very well recovered. Since this measure captures the overall propensity to accept or reject mixed gambles, it could easily be replaced by simply calculating the proportion of "accept" decisions in the accept-reject task. We return to the topic of using AUIC in the accept-reject task in the Discussion.

Simulation 2
In our second simulation, we determine whether the issue of poor recoverability of λ in the accept-reject task can be alleviated with a larger number of data points. That is, we attempt to establish what would be the minimum number of experimental trials that would produce a reliable measure of prospect theory's λ. We simulated data for 50 identical

Discussion
One of the defining features of people's preferences is their attitude towards risks involving gains and losses. It is therefore important that behavioural scientists have reliable and accurate tools for measuring an individual's loss aversion. Here, we evaluated the popular accept-reject task for estimating the loss aversion parameter λ of prospect theory, in which participants must indicate whether they would accept or reject a series of 50/50 gambles. We found that λ suffers from poor recoverability. This recoverability remains poor, in absolute terms, even with more than a thousand experimental trials. Our analysis shows very clearly that the difficulties in estimating λ arise from parameter trade-offs within the model. We also showed that a composite measure derived from λ, θ w(1/2), and bias, which corresponds to the portion of the gain-loss space in which people accept gambles (see Figure 1), is well recovered. This area is equivalent to the number of gambles accepted in a choice set with uniform distributions of gains and losses. Yet, while the AUIC measure is well recovered, it would not generalise to other choice sets with different ranges of gains and losses. As such, AUIC should not be regarded as a solution to the problem of estimating λ in the accept-reject task. At the level of the individual, any attempts to estimate prospect theory's λ seem futile and can lead to serious mis-estimations.
By representing people's decisions in the two-dimensional gain-loss space, we can also show that removing the bias parameter from the exponentiated Luce's choice rule would not be a good solution to the problem of estimating λ from accept-reject task data. Both the intercept and slope of the indifference curves plotted in Figure 1 are necessary to represent people's attitudes towards mixed gambles. In other words, simplifying the model by eliminating the bias parameter would just result in poorer model fits. We checked this by fitting two models, one with bias = 0 and one with bias free to vary, for each participant in the experiments reported by Walasek and Stewart (2015). Comparing model fits using likelihood ratio tests revealed that the bias = 0 constraint results in significantly poorer model fits for a considerable proportion of the participants (ranging from 15% to 48%, see Appendix ?? for further details). Thus our argument is not that the model described in Equation 6 is wrong, but rather that data provided by the accept-reject task are not suitable for recovering underlying attitudes towards gains and losses. The problem therefore emerges from the combination of information-poor binary choices and the complexity of the model necessary to capture people's behaviour in full.
It is worth noting that bias may be particularly relevant in the context of the accept-reject task. Several authors have pointed out that this type of task confounds loss aversion with the status quo bias (Yechiam & Hochman, 2013; Ert & Erev, 2013). Indeed, if participants are given an opportunity to choose between $0 for sure and a mixed gamble, they are much more likely to choose the gamble than they are to accept it in the accept-reject version of the task. So if the responses in the accept-reject task are in large part driven by the status quo bias, then the issue of poor recoverability of λ will be worse for this particular task.
How serious are these problems? In other words, could the variability of recovered values influence results of studies in which the accept-reject task is used to estimate λ?
The problem of poor recoverability will lead to mis-estimation on the individual level. This issue is particularly problematic for studies where individual estimates of λ are correlated with some other individual difference. Take as an example Tom et al. (2007), in which the authors found a strong positive correlation between participants' λ values estimated from the accept-reject task and neural loss aversion as measured by fMRI (specifically the differential sensitivity to gains and losses in dopaminergic regions of the brain). Using the results of our grid search in Simulation 1, we can demonstrate that such correlations are unreliable when the accept-reject task is used to estimate λ. We begin by simultaneously sampling a set of "true" λ values from a normal distribution with a mean of 2.25 and standard deviation of 1.22 (taken from Lorains et al. (2014) as Tom et al. (2007) did not report the mean and standard deviation) and a set of "true" neural loss aversion values that correlate r = 0.65 with the "true" λs. Actually Tom et al. (2007) report an even higher correlation of r = 0.85, but our finding is robust across different values of r. For each "true" λ we drew another corresponding "recovered" λ from the distribution in Figure 2. We then calculated the correlation between the "recovered" λ values and the "true" neural loss aversion. We repeated this 1,000 times for sample sizes of 15, 20, 25, and 35 participants. The distribution of 1,000 correlations is plotted in Figure 5 for each sample size. The recovered correlations were considerably lower than the starting value of 0.65 (blue dotted line). The underestimation is large, with the correlations ranging from -.73 to .77 (median = .10). This result should be worrying to researchers who wish to use the accept-reject task to quantify individual differences in loss aversion.
We think that the issue of poor parameter recoverability is less severe when making estimates about a group of participants rather than about individual participants. Recall the relationship between the standard deviation of a sample and the standard error of the mean: the standard error is the standard deviation divided by the square root of the sample size. For example, the standard deviation of the recovered λs in Simulation 2, with one replication of 64 trials for each of the 50 simulated participants, is 0.15 around a true mean of λ = 1.26, so the standard error is 0.15/√50 = 0.02. Thus the mean of the population from which the sample of participants was drawn will be well estimated, given a sufficient number of participants, even though the estimate for each individual participant is poor: while the accept-reject task provides a poor estimate of any given individual's value of λ, it can offer an adequately precise estimate of the population mean.
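The standard error arithmetic in this paragraph can be reproduced directly. The helper name is hypothetical; the numbers are the ones quoted above.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation over root n."""
    return sd / math.sqrt(n)

# Recovered lambdas in Simulation 2: SD of 0.15 across 50 simulated participants.
se = standard_error(0.15, 50)
print(round(se, 2))
```

The same helper makes the scaling explicit: quadrupling the number of participants halves the standard error of the group mean, whereas the individual-level spread of 0.15 is unaffected.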
In Simulation 2, we showed that merely increasing the number of trials does not solve the problem of poor recoverability of λ for individuals. However, another possibility would be to change the number of trials more strategically, increasing the number of lotteries in the regions that are most diagnostic for determining the slope of one's indifference curve (e.g., at the edges of the gain-loss space). Even if an experimenter had a good idea about the reasonable set of lotteries for recovering λ, manipulating sets of lotteries in the accept-reject task can have serious consequences. Stewart, Canic, and Mullett (2020) and André and de Langhe (2019) both show that the set of questions used to elicit prospect theory parameters biases the estimates of those parameters. Stewart et al. (2020) show that the correlation between prospect theory parameters estimated across a random partitioning of a choice set is disturbingly low: estimated parameters do not even generalise to a new set of choices drawn from the same master set. André and de Langhe (2019) show that allowing the range of gains and losses to vary biases the estimate of λ in the accept-reject task. Thus if one is to have a measure of λ which is comparable across participants, as is required for correlating λ with other individual differences, one must use the same set of choices for all individuals. This requirement rules out generating the choices for each participant with an adaptive procedure.
Our finding that the accept-reject task is not suitable for recovering prospect theory's λ involved using Luce's choice rule as an error model to allow prospect theory to make stochastic predictions. This extension is often used and, for the accept-reject task, is equivalent to a logistic regression. We believe that the poor recoverability of λ would hold for alternative error models. Here we consider two prominent alternative approaches (see Loomes, 2005, for discussion). The first approach is to assume that people select the alternative (i.e., accept or reject) with the higher prospect theory value, but that their

Table 1. Summary of the studies using the accept-reject task to estimate λ.

[Figure: Recovered AUIC against generating AUIC.]

Appendix B
Calculating Area Under the Curve

Consider a gain-loss space square with gains running horizontally from 0 to G and losses vertically from 0 to L. The indifference line, along which the decision maker is equally likely to accept or reject the gamble, is given by Equation 12.

Model fit comparison: with and without bias
We fitted two versions of the logistic regression to individual-level choice data from Walasek and Stewart (2015). We then performed a likelihood ratio test to compare how well the two models capture each individual's responses. Table C1 summarizes the proportions of participants for whom there was a significant difference in model performance, such that the model with bias offered a better fit than the model without bias.
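A likelihood ratio test of this kind can be sketched as follows. The sketch assumes that the two models differ by exactly one free parameter (the bias), so the test statistic is compared against a chi-square distribution with one degree of freedom, whose upper tail can be written with the complementary error function. The function names and the log-likelihood values in the example are hypothetical and are not taken from the reported analysis.

```python
import math

def log_likelihood(probs, accepts):
    """Bernoulli log-likelihood of binary accept (1) / reject (0) choices."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(probs, accepts))

def likelihood_ratio_test(ll_restricted, ll_full):
    """Test statistic and p-value for nested models differing by one parameter.

    For one degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (ll_full - ll_restricted)
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value

# Hypothetical participant: the model with bias improves the log-likelihood by 3.
statistic, p_value = likelihood_ratio_test(ll_restricted=-40.0, ll_full=-37.0)
print(statistic, p_value)
```

In practice `log_likelihood` would be evaluated at the fitted probabilities of each model for a given participant, and the test repeated per participant to obtain the proportions summarised in the table.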