The nuts and bolts of core-hole constrained ab initio simulation for K-shell x-ray photoemission and absorption spectra

X-ray photoemission (XPS) and near edge x-ray absorption fine structure (NEXAFS) spectroscopy play an important role in investigating the structure and electronic structure of materials and surfaces. Ab initio simulations provide crucial support for the interpretation of complex spectra containing overlapping signatures. Approximate core-hole simulation methods based on density functional theory (DFT) such as the delta-self-consistent-field (ΔSCF) method or the transition potential (TP) method are widely used to predict K-shell XPS and NEXAFS signatures of organic molecules, inorganic materials and metal–organic interfaces at reliable accuracy and affordable computational cost. We present the numerical and technical details of our variants of the ΔSCF and TP method (coined ΔIP-TP) to simulate XPS and NEXAFS transitions. Using exemplary molecules in gas-phase, in bulk crystals, and at metal–organic interfaces, we systematically assess how practical simulation choices affect the stability and accuracy of simulations. These include the choice of exchange–correlation functional, basis set, the method of core-hole localization, and the use of periodic boundary conditions (PBC). We particularly focus on the choice of aperiodic or periodic description of systems and how spurious charge effects in periodic calculations affect the simulation outcomes. For the benefit of practitioners in the field, we discuss sensible default choices, limitations of the methods, and future prospects.


Introduction
X-ray photoelectron spectroscopy (XPS) and x-ray absorption spectroscopy (XAS, often called near edge x-ray * Author to whom any correspondence should be addressed. Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. absorption fine structure, NEXAFS, or x-ray absorption near edge structure, XANES) play an important role in the characterization of materials and surfaces [1][2][3]. XPS is routinely used to provide information on the chemical composition and the oxidation state of elements in various systems, including oxides [4], two-dimensional materials such as graphene [5,6], and metal-organic interfaces [7][8][9]. NEXAFS and, particularly, angular-or polarization-dependent NEXAFS can provide insight into the electronic structure and orientation of Principles of x-ray photoelectron spectroscopy (XPS, left) and near-edge x-ray absorption fine-structure spectroscopy (NEXAFS, right). XPS detects photoelectrons emitted from core-levels after absorption of x-ray radiation. This provides information about the chemical environment of the core-states for each atom. NEXAFS probes secondary processes that are associated with the excitation of core-electrons into unoccupied valence states. This provides information about the unoccupied valence states of the system. (b) In this paper, we calculate XPS spectra with the delta-self-consistent-field (ΔSCF) method (left) and NEXAFS spectra with the ionization-potential-corrected transition potential (ΔIP-TP) method. thin films [10], surfaces and adsorbates [11], and even liquids [12]. Both methods are particularly suitable for surface characterization: XPS is inherently surface sensitive due to the small escape depth of the photoelectrons from condensed matter and NEXAFS can make use of the same effect by choosing a suitable electron yield detection [2,13].
In core-level spectroscopy (see figure 1(a)), the sample is exposed to an x-ray beam, leading either to the emission of a photoelectron from a core-level (XPS) or the excitation of the core-electron to unoccupied electronic states (NEXAFS). For XPS, a fixed photon energy is used and the spectrum is acquired by scanning over the kinetic energy of the photoelectrons. For NEXAFS, the photon energy is varied and the energy-dependent absorption is measured. Ideally, the measured spectrum exhibits sharp signatures which can be directly assigned to originate from individual corelevels (XPS, figure 1(a) left) or from transitions between corelevels and unoccupied states (NEXAFS, figure 1(a) right). Changes in the chemical composition, molecular conformation or even orientation, or electronic structure of a sample will be reflected in the position, shape, and intensity of spectral signatures. Therefore both methods provide direct evidence of such changes.
However, the rich structure and overlapping features present in XPS and NEXAFS spectra often pose a challenge to the unambiguous assignment of all peaks. Here, the theoretical analysis and computational simulation of the electronic structure and transitions can contribute and produce additional insights. The first-principles-based simulation of core-level spectra is widely employed for this purpose. Established methods include the use of time-dependent density functional theory (DFT) [14,15], many body perturbation theory methods such as GW (for XPS) and Bethe-Salpeter equation calculations (for neutral excitations) [16,17], as well as multiconfigurational wave function methods and coupled cluster (CC) calculations [18][19][20]. In addition, more cost-effective DFT methods that approximate the effects of an excited core-hole within a ground-state DFT formalism have been proposed.
The latter methods include ΔSCF DFT [21][22][23] to simulate XPS spectra and the transition potential (TP) method [24] to simulate NEXAFS spectra. Many variations of these methods have been proposed in the past, [25][26][27] with varying uptake in the community. These approximate methods are particularly suitable to study K-shell spectroscopy, i.e. the emission or excitation of 1s electrons, because higher lying states feature additional challenges such as spin-orbit coupling and delocalization over several atoms [28,29].
Approximate core-hole excitation methods to simulate core-level spectroscopy differ in how many electrons are excited, how the core-hole constraint is imposed and localized [30], which underlying basis set is used, e.g. if all-electron local atomic basis functions or pseudopotential-plane-wave approaches are used [27,31,32], and how relativistic effects are treated. The empirical wisdom behind simulating reliable and accurate core-hole spectra is often very software-and application-specific and is rarely the focus of studies, but rather relegated to supplemental information, even as it is crucial for the achieved performance.
In this paper, we present our approach to approximate core-hole simulations and discuss methodological and technical intricacies. We analyze the dependence of simulated spectra with respect to all relevant simulation parameters to establish what considerations are required to produce reliable and converged XPS and NEXAFS spectra that are independent of numerical choices. We draw from our experience in simulating XPS and NEXAFS signatures of organic molecules adsorbed at metal interfaces [33][34][35][36][37] and highlight the challenges associated with simulating 1s XPS and K-edge NEXAFS spectra for organic molecules in gas-phase, in bulk crystals, and adsorbed at surfaces. After a brief introduction into the methodology and practical steps of XPS and NEXAFS calculations in section 2, we analyse how spectra depend on the choice of core-hole localization method (section 3.1), exchange-correlation functional (section 3.2), basis set (section 3.3), and model representation (cluster or periodic) before highlighting the challenges specific to bulk and surface calculations in section 3.5. On the latter topic, we specifically discuss the role of charging artefacts in periodic calculations. We further showcase the strength of analysing simulated NEXAFS spectra in terms of their angular and polarization dependence and by decomposing them into atomic and molecular orbital contributions to disentangle chemical shifts from final state screening effects (section 3.4).

Ab initio simulation of x-ray photoemission spectroscopy (XPS)
Principle of XPS: since the first observation of the photoelectric effect by Hertz and Hallwachs, photoemission spectroscopy has become one of the most important techniques for the chemical analysis of molecules and materials [38]. Upon exposure to x-ray radiation with frequency ν, photoelectrons with kinetic energy E kin and momentum k can be detected following the relationship: (2.1) In (2.1), E B is the binding energy of the electron and Φ is the work function of the material. A typical photoelectron spectrum plots the intensity of an electron detector signal against the binding energy (from highest to lowest binding energy). Depending on the used photon energy, the intensity I of the photoelectrons with kinetic energy E kin,k is proportional to the photoelectron cross section σ k as follows [39]: where λ in is the inelastic mean free path (IMFP), n is the number of atoms of this species per volume, A is the surface area covered by the x-ray spot, and J hν is the flux density of incoming photons of frequency ν. With few exceptions [40,41], the IMFP is mostly neglected in simulations as, within the typical kinetic energy range (a few 100 electronvolts up to 1 keV), the IMFP of photoelectrons is of the order of 1 nm, which amounts to few atomic layers from the surface [13]. The cross section σ k itself can be related to the spectral function A ii (E kin,k − hν) of photoemission of an electron with momentum k from corestate i via Fermi's golden rule and the sudden approximation [17,42]: where A(r) is the vector potential of the incident electromagnetic field and p is the momentum operator. Within many-body theory, the spectral function arises from the single particle Green's function that describes the removal or addition of particles to the system [17] and describes the frequency-dependent probability of photoemission and the associated lineshape. The positions of the poles in the Green's function correspond to the electron addition/removal energies with respect to the Fermi energy, i.e. the binding energies of the associated quasi-particles. This takes account of all relevant interaction processes, including collective screening, electron-electron scattering and relaxation effects due to the introduction of the core-hole, all of which arise from many-body correlation. For example on metals, XPS spectra typically show a characteristic asymmetry, which stems from the interaction of the photoelectron with electron-holepair excitations close to the Fermi level [43]. Full many-body calculations can provide very accurate XPS spectra that fully account for many spectral features [16,17]. Such calculations are, however, associated with high computational cost and are all but unfeasible for systems larger than several tens of atoms, and therefore remain unpractical for organic crystals or metal-organic interfaces.
In practice, the main features of XPS spectra are often simulated by effective single particle approaches. Instead of evaluating the cross section from the full spectral function, it is instead approximated by a set of single particle transitions between effective Kohn-Sham states within DFT. By assuming independent electrons [A ij (hν) = δ i j δ(hν − ε i )], we can simplify (2.3) to the well known Fermi's golden rule expression [44]: (2.4) If no selection rules are to be considered, for example if no angular dependence is required (as would be in angular resolved photoemission spectroscopy [45,46]), the matrix elements in (2.4) are often neglected as all final states are assumed to be extended plane wave states. This amounts to associating each excitation that originates from an atom-centred corestate i with the same intensity, i.e. eliminating all but n and σ k (hν) from (2.2). The result of such a calculation is a spectrum with N infinitely narrow δ-peaks where N is the number of atoms, each with the same intensity and located at the energetic position of the binding energy E B of the respective atom's electrons.
This represents an effective single particle approach that neglects lifetime broadening due to many-body effects such as electron-electron correlation. These effects can be approximatively accounted for by applying an empirical line broadening shape to the infinitely sharp binding energies. The line shape is in this case often approximated by assuming a Gaussian, Lorentzian, or convoluted Gaussian-Lorentzian (i.e. Voigt) function for each atomic peak. The summation of all peak functions leads to a spectral shape comparable to what is commonly observed in experiments. The Lorentzian broadening is justified with life-time effects of the excited states, while Gaussian broadening simulates experimental effects (e.g. analyser resolution). A simplified way to incorporate both kinds of broadening into the simulated spectra, which was chosen for this publication, is the use of a pseudo-Voigt function, which reduces the convolution needed for a proper Voigt function to a summation operation [47,48]. When using such functions, the parameters for line-width (full width at half maximum, FWHM), and Gaussian-Lorentzian ratio of the broadening are chosen according to the expected or observed experimental resolution. Using these approximations to obtain intensities and line shapes, the problem of simulating XPS spectra is reduced to finding the binding energies E B of the relevant electronic core-states. Ab initio calculation of core-level binding energies: many different approaches exist to calculate core-level binding energies including CC [49], equation-of-motion CC (EOM-CC) [20], the GW method [16,17], and real-time TD-DFT [50]. The simplest approach, however, is the use of Koopmans' theorem (2.5), which relates the ionization potential of an electronic state in Hartree-Fock theory with the negative of the relevant Hartree-Fock eigenstate [51]. Despite the fact that in DFT this relationship only holds for the highest occupied Kohn-Sham state [52,53], Kohn-Sham (KS) energies of corelevels are often used to estimate the chemical shift contribution to the binding energy, i.e. the displacement of the core-level before removal of the electron (also termed initial state effect) [54][55][56].
In most cases, however, core-hole relaxation effects (or final state effects) cannot be neglected in the prediction of binding energies. The ΔSCF DFT (or ΔSCF) method [21,22,57], which includes relaxation effects, is therefore the most common approach to simulate core-level binding energies [55,[58][59][60][61]. As shown in figure 1(b), left, the ΔSCF method calculates binding energies as energy differences between two self-consistent KS DFT calculations, namely the ground-state calculation and the core-hole excited calculation, where the electron population p c of the 1s core-state has been constrained to remove one electron: Other methods have been proposed that introduce core-holes equivalent to half an electron (so-called Slater transition state approach) [57], or two-thirds of an electron [62], although full removal of an electron has been shown to provide the best results [24]. Unfortunately, often a simple change of orbital occupation does not provide a sufficient constraint to localize the core-hole onto the atom-centred core-level [63,64]. This is particularly true if the molecule contains symmetry-equivalent atoms with degenerate core-states. Many different approaches have been proposed in the past to overcome this, such as the maximum overlap method (MOM) [60], using a localized hole in a frozen core-state [25], shifting results between symmetric MOs [65], and localization via initialization with a density calculated for an increased atomic core-charge [27]. Due to those difficulties, we will discuss the issue of core-hole localization in detail in section 3.1. An additional problem for calculations of systems with periodic boundary conditions (PBC) is the treatment of the charge introduced by the core-hole calculations. The unit cell of every periodic calculation has to be net-neutral to avoid failure of the Ewald summation. Therefore, a uniform negative charge background is introduced into the calculation of the positively charged unit cell with the core-hole, which might lead to unpredictable changes in the calculated energies. This charging issue in periodic systems has previously been recognized as problematic [66,67]. The effect of the background charge can in principle be reduced by increasing the unit cell size. Unfortunately, the necessary unit cell sizes to reduce the introduced shift to a negligible value are commonly thought to be too large for practical uses. However, while we show in section 3.3 that the absolute energies are not easily converged, we also found that the relative shifts are quite robust with varying cell size.

Ab initio simulation of NEXAFS
Principles of NEXAFS: excellent accounts of the measurement and first-principles simulation of x-ray absorption have been given previously by Hähner [68], Frati et al [1], Tanaka and Mizoguchi [69][70][71], Norman and Dreuw [72], and Guda et al [73]. We do not attempt to cover the full breadth of literature on the subject here and will focus mostly on simulation with constraint-based DFT methods. An x-ray absorption spectrum is experimentally recorded by varying the photon energy of incident electromagnetic radiation and measuring a detector signal, which is proportional to the absorption of x-rays by the sample, and therefore also proportional to the photon energy dependent cross section σ(hν). This x-ray absorption cross section originating from excitation of core-state i as given by Fermi's golden rule in the single particle and dipole approximation is [2]: where ψ f and ψ i correspond to the wave functions of the final valence and the initial core electronic states, respectively. p is the dipole operator and A(r) is the vector potential of the incident electromagnetic field. This can be further simplified by assuming a classical polarization wave with polarization along a unit vector e, leading to A(r) = eA 0 e ik·r [74]. For local coreexcitations, the dipole approximation is valid (e ik·r = 1) and yields: Therefore, the intensity of a NEXAFS spectrum is defined by the nature and overlap of initial and final states. The δ-function ensures energy conservation between the energy of the incident light and the transition energy ΔE fi . To obtain the spectral shape, the discrete δ-functions of each excitation are typically replaced by a model of the finite line shape to account for the life-time of final states and other broadening effects, in analogy to the XPS line shapes (vide infra). Simulation of NEXAFS spectra: a large variety of methods to simulate x-ray absorption spectra have been proposed including linear response based on CASSCF calculations [18], TD-DFT [15,75], and Bethe-Salpeter equation calculations [76][77][78][79]. Here we discuss NEXAFS in the context of molecular bulk and metal-organic surface systems, where the choice of method is much more limited due to the intrinsic size of and computational cost associated with such systems. A variety of more approximate DFT-based methods to calculate NEXAFS transition energies ΔE fi have sprung from the ΔSCF method. These include methods that explicitly simulate each transition between the core-hole and all unoccupied states, such as the ΔSCF method and the Slater-Janak transition-state method [57], where one or half an electron are excited from the core-level p c to the relevant valence state p v , respectively. The transition energy is given by the difference in total energy of the core-excited calculation (p c = 0.0, p v = 1.0) and the DFT ground state (p c = 1.0, p v = 0.0): (2.9) In this case, M calculations are required, where M is the number of valence states v that will be considered. This procedure needs to be repeated for all relevant core-levels N, on which the core-hole is to be localized. For K-shell excitations, N equals the number of symmetry-inequivalent atoms. The total number of DFT calculations for a single NEXAFS spectrum is therefore (N · M) + 1 (including the ground-state). Using this approach, the sheer amount of calculations can be prohibitive if the system contains a large number of atoms.
The transition potential (TP) method [80,81], is an approximation based on Slater's transition state approach [57], which circumvents such a large number of calculations. Rather than calculating each excitation explicitly as a difference between the total energy of an excited state and the total ground-state DFT energy, the transition energy is defined as the difference between the KS energies of valence (ε v ) and core-states (ε c ). These KS energies are calculated for a system, where half an electron is removed from the core-state to model the effect of the core-hole on the transition: (2.10) A single TP DFT calculation provides all KS states and therefore yields all NEXAFS transitions and the number of calculations for a spectrum is reduced to the number of atomic species N. A variant of the TP method has been proposed that introduces a full core-hole (FCH), instead of a half core-hole [82,83]: (2.11) Note that in both cases the (partial) electron removed from the core-state is not placed into the valence states and the system is not net neutral. The lack of charge neutrality, as is also the case for ΔSCF calculations of XPS binding energies, can provide a challenge for simulations of bulk and surface systems using PBC. While charge neutrality in the unit cell can be restored by the introduction of a homogeneous charge background [82], this charge background itself can lead to unphysical artefacts in the calculations [67]. Several solutions to this problem have been proposed [84,85], for example by placing the removed charge into the lowest unoccupied molecular orbital (LUMO) of the system as done in the XCH method which introduces a full core-hole [26,30]: (2.12) A comprehensive summary of different approximate core-hole constraint methods has recently been given by Michelitsch and Reuter, where the authors also introduced other previously not considered variants of the TP method [30]. For a small set of organic molecules in gas-phase, the authors compare these different approaches and find that carbon and nitrogen K-edge NEXAFS spectra (intensities and excitation energies) are best represented by the XCH and XTP methods. The latter is a variation of TP where the removed half core-hole is neutralized with half an electron placed into the LUMO: − ε c (p c = 0.5, p LUMO = 0.5). (2.13) While the TP and XTP methods seem to provide reliable and robust predictions of the relative peak positions and intensities in NEXAFS spectra [30], the XCH and ΔSCF methods, which both introduce a full core-hole, seem to provide a better description of absolute energies [24]. Whereas the half corehole constraint in TP and XTP appears to accurately account for core-hole relaxation effects in valence states, the chemical shift and core-hole screening is more accurately represented with a full core-hole in ΔSCF and XCH. This conundrum has previously already been recognized by Triguero et al [24]. They proposed a hybrid approach, where the TP (half corehole) is corrected by the ionization potential of the core state calculated with a full core-hole. Following Triguero et al [24], we call this ionization-potential-corrected TP method ΔIP-TP to distinguish it from the original TP method [80,81]. The energy of a transition between a core-level and an unoccupied valence state is calculated in ΔIP-TP by combining the results of a ΔSCF and a TP calculation (see also figure 1(b), right): (2.14) In (2.14), ε v are the KS energies of the unoccupied states from the half core-hole TP calculation, whereas E B (1s → free) is the total binding energy of the core-level from a full core-hole ΔSCF calculation as laid out in (2.6). This approach retains the accurate description of relative transition energies for each atomic center, but additionally improves not only the description of the chemical shift of each atom but also the absolute transition energies. We have employed such a hybrid approach in all our previous work on metal-organic interfaces and organic molecules, where we have achieved good agreement with experimental data [34][35][36][37]. In principle this method can also be employed on the basis of XTP instead of TP to calculate the KS-energies of the unoccupied states, as we will show below.
TP core-hole calculations are regularly performed in codes that use all-electron atomic orbital basis sets [30] or codes that use plane waves and pseudopotentials [31]. In the former case, the binding energy from ΔSCF can yield highly accurate absolute energies if well-designed basis sets with sufficiently flexible core-functions are used [27,32,60,86]. In the case of pseudopotential plane-wave codes, in order to calculate dipole matrix elements, the full all-electron wave function needs to be reconstructed using the projectoraugmented-wave method [31,70]. The frozen core adds to already existing errors that stem from the self-interaction error of approximate exchange-correlation functionals and the resulting absolute binding energies are typically not close to experiment. However, the application of a rigid global shift can still allow a fruitful comparison with the experiment. Figure 2 summarizes the practical steps involved in performing a ΔIP-TP (or ΔIP-XTP) simulation of a Kedge NEXAFS spectrum: a ground-state DFT calculation (figure 2(a)) is followed by N ΔSCF calculations (figure 2(b)) with a full core-hole to determine the binding energy for each inequivalent core-level [E B (1s → free) in (2.14)] and N TP calculations (figure 2(c)) with a half core-hole to determine the KS-energies of the unoccupied states [ε v in (2.14)]. These calculations each have to be performed N times, because the full/half core-hole has to be introduced into each of the N symmetry-inequivalent 1s states in the system. Each TP calculation first provides a self-consistently converged density and then works to accurately converge the unoccupied valence states. This approach is necessary to achieve the same convergence accuracy for the unoccupied states as for the occupied ones. Therefore, care needs to be taken that convergence thresholds are set appropriately for the former as their KS energies directly control the transition energies and the respective transition dipole matrix elements with the core-state.
Finally, the ΔSCF binding energies are added to the TP eigenvalue spectra to align the contributions originating from different atomic species onto a single energy axis and the spectra are summed up (figure 2(d)). The result is the final ab initio predicted spectrum. This spectrum can be plotted with the contributions from each core-state (as shown) or further analysed by projecting to the contributions from valence states or molecular orbitals (see section 3.4).
Both XPS and NEXAFS spectra are broadened to simulate typical experimental resolutions and ease the comparison to measured data. We chose to use a bare-bone scheme employing a minimal number of parameters and a mathematically simple implementation. While physically more meaningful peak functions exist [2], their proper application would be much more complex. The scheme used in this publication is still able to produce good agreement with experimental spectra and is superior for most practical purposes due to its simplicity of use. For the XPS spectrum, a fixed peak width and Gaussian/Lorentzian (G/L) ratio can normally be chosen for all peaks stemming from the same core-level. The spectrum is yielded as a summation over the pseudo-Voigt-functions [47,48] for each symmetry equivalent atom weighted by their abundance in the system. While the broadening function can also be chosen to be the same for the NEXAFS transitions of all atomic species, it has to change with photon energy in the NEXAFS spectrum. This approach is necessary to account for the difference in lifetime of states below and above the ionization threshold. Therefore, two different broadening schemes are employed for low and high photon energies. In the high photon energy regime, the width is larger and the G/L ratio is turned more Lorentzian, while the reverse is true for the low photon energy regime. Both regimes are linked by a linear gradient for width and G/L ratio, which crosses the ionization threshold, in line with common approaches in literature [87,88]. Systematic errors in the method and the selfinteraction errors in the exchange-correlation functional make it furthermore necessary to globally shift the simulated spectra to align with the experimental energy axis both in the XPS and NEXAFS spectrum. Angular-and polarization-resolved NEXAFS simulations: NEXAFS can not only probe the character of electronic states, but also their orientation with respect to the incident light polarization [89]. As can be seen in (2.7), the intensity depends on the orientational alignment of the direction of polarization of incident light e with respect to the vectorial change in dipole moment due to the electronic transition μ fi . Assuming linear polarized light, we can simplify the matrix elements in (2.7) as follows: (2.15) Herein, D fi are the dipole matrix elements between initial and final states i and f, respectively. Therefore, the intensity is largest if the direction of the electric field vector e and the transition dipole vector are parallel. In case of a spherical 1s core-state, this means that the intensity is largest if the light polarization aligns with the electronic dipole of the final electronic state, see figure 3(a).
For some systems, all orientational dependence of the NEXAFS transitions is likely to be averaged out in the experimental data due to random orientation within the sample, for example in the gas phase, in liquids, or in amorphous or disordered polycrystalline solids. If this is the case for the experimental data we want to compare our simulations to, the orientational dependence can also be neglected in the calculations: For molecules adsorbed on metal surfaces, however, the polarization dependence of the NEXAFS signal (the so-called dichroism) is often present in the experiment and quite useful, because it enables the extraction of structural information about the studied system [10]. To describe the polarization dependence, we can define the polarization direction e = (e x , e y , e z ) of the incident light in terms of polar ϑ and azimuthal ϕ angles with respect to the surface (see figure 3(b)): e = (e x , e y , e z ) = (sin ϑ · cos ϕ, sin ϑ · sin ϕ, cos ϑ) . (2.17) The resulting intensity for a transition between state i and state f excited by polarized light defined by the angles ϑ and ϕ would be: For some systems, the variation of the azimuthal orientation ϕ of the incident radiation relative to the substrate can also be used to extract structural information [90]. Most systems, however, and this includes most metal-organic interfaces, possess a random azimuthal orientation, or many rotational domains which cancel out the ϕ-dependence due to the symmetry of the underlying substrate [2]. For such systems, it is therefore best practice to average over the azimuthal ϕ dependence of the signal (in the x, y plane) for the corresponding simulations, yielding (2.19):

Computational details
The here presented DFT calculations have been performed with two different electronic structure software packages: the pseudopotential plane-wave code CASTEP [91] and the allelectron atomic-orbital code FHI-aims [92]. CASTEP was used to calculate NEXAFS spectra with the ΔIP-TP method, as laid out in the method section and figure 2. This approach includes ΔSCF total energy calculations and TP calculations for each atom, all of which were performed in CASTEP. The all-electron code FHI-aims was used to calculate XPS energies with the ΔSCF method to obtain comparable values free from the potential errors brought by the forced periodicity and the use of pseudopotentials in CASTEP. The raw inputs and outputs of all electronic structure calculations have been deposited All electron calculations in FHI-aims: to localize the core-hole in the ΔSCF calculation, several approaches were tested and are compared in section 3.1. These approaches employ the force_occupation_basis (FOB) and the force_occupation_projector (FOP) keywords implemented in FHI-aims by Matthias Gramzow (Fritz-Haber-Institute of the Max Planck Society, Berlin). Both keywords apply occupation constraints on KS eigenstates and use variants of the MOM to ensure that the constraint is satisfied [60]. With the FOP keyword, we define the occupation constraint directly onto a KS eigenstate. If eigenstates swap between consecutive SCF steps, the constraint moves with the eigenstate following the maximum overlap with the constrained state of the previous SCF step. With the FOB keyword, we define the core-hole occupation constraint in terms of a localized atomic orbital basis function. The occupation constraint will be enforced on the KS eigenstate that has the highest contribution from the selected core atomic orbital. During successive SCF steps, the constraint will follow the eigenstate with the highest contribution from the atomic orbital, which ensures that the population of the core-level associated with the right atom is constrained. This approach works well for localized core-states such as 1s levels of most elements. The more delocalized the target core-state is, the less reliable this approach will be as basis functions on adjacent orbitals will complement each other. We find that the FOB approach is in most cases preferable as it correctly distinguishes between symmetry-equivalent atoms and therefore eliminates the necessity of an additional calculation to break the symmetry. We tested our approach against calculations with the method proposed by Kahk and Lischner [27], where the core-hole calculation using the FOP functionality is enabled by the use of an initial wave function with reduced symmetry. This symmetry-broken wave function is created from a single SCF step with slightly increased corecharge on the relevant atom, resulting in a symmetry breaking for the desired core-state. This approach requires more effort then the FOB method, but proved more resilient when working with hybrid functionals. Alternatively, the issues of the FOB approach with hybrid functionals can be solved by combining it with a Boys localization that is implemented in FHI-aims [93] and applied before the core-hole calculation [30,94].
The basis set convergence of the atom-centered numerical orbitals native to FHI-aims was extensively tested, as described in detail in the supplementary material (https://stacks.iop.org/JPCM/33/154005/mmedia) (see tables S5-S8). To achieve a better description of the core-states, augmented near-core basis functions were introduced according to the literature [27]. These additional basis functions are s-type hydrogen-like functions with high effective nuclear charges. Taking the results of the convergence series into account, the 'tier2' basis with internal default tight settings for the numerical integration and the mentioned core-hole augmentation was chosen for the remaining calculations. The cluster calculations of the molecular crystal (in section 3.5) employed the 'tight-tier2' basis set only for the atoms in the central molecule itself, while the surrounding 16 molecules were supplied with a 'light-tier2' basis to reduce computational load. The same was true for the lower metal layers in the copper cluster calculations.
In section 3.2, different exchange correlation functionals were tested, these include PBE [95], PW91 [96], HSE06 [97], PBE0 [98], TPSS [99], SCAN [100], and xDH-PBE0 [101]. The calculations were performed with the FHI-aims versions 200408 and 200511, both of which where checked to give the same results. All structures were optimized in FHI-aims with the PBE functional, a 'tight-tier2' basis and a force threshold of 0.01 eV Å −1 . The electronic convergence settings were set to 1 × 10 −4 e Å −3 for the electron density, 1 × 10 −2 eV for the KS-eigenvalues and 1 × 10 −6 eV for the total energy. Relativistic effects were taken into account by employing the atomic ZORA functionality [92]. Higher order relativistic effects such as spin-orbit coupling could be neglected for the studied K-shell excitations and binding energies. XPS spectra were generated from the calculation results by representing each carbon 1s binding energy with a pseudo-Voigt function [47,48]. The application of this broadening scheme allows a better comparison to experimental data and enables a realistic judgement of the magnitude of differences in the XPS binding energies. The parameters of the pseudo-Voigt function were chosen to reflect typical experimental resolutions achieved in surface science experiments (FWHM = 0.7 eV, 70%/30% Gaussian/Lorentzian ratio).
Plane-wave calculations in CASTEP: the plane-wave code CASTEP was used to perform both ΔSCF and TP calculations within the ΔIP-TP approach. All calculations were carried out in PBC using on-the-fly generated pseudopotentials. The stability of the calculation results with regards to the additional parameters introduced by PBC and pseudopotentials was extensively tested. In particular the choice of pseudopotentials, the energy cut-off for the pseudopotentials, and the vacuum box size around isolated molecules (necessary due to the PBC), was investigated in detail, as presented in section 3.3. Calculations were performed using either CASTEP version 18.1 or 19.11 [91], while maintaining the same pseudopotential definitions and with a total energy per atom convergence tolerance of at least 1 × 10 −6 eV/atom. For the calculation of XPS binding energies, the ΔSCF method was used. Here, for each carbon atom an excited core-hole calculation was performed, where the electron configuration of the respective pseudopotential was modified to [1s 1 , 2s 2 , 2p 3 ], effectively localising the core-hole at this atom while moving the electron to the valence region. However, the new electron in the valence region is then removed by the introduction of a total charge of +1.0 e. Due to the PBC, this charge is compensated by a homogeneous background charge of −1.0 e, yielding a net-neutral unit cell. The core-level binding energies obtained for each carbon atom have furthermore to be corrected for the energy change of the core-electrons contained in the modified pseudopotentital [70]. NEXAFS simulations were carried out using the ELNES module in CASTEP [70], the half core-hole was included by changing the electron configuration of the pseudopotential to [1s 1.5 , 2s 2 , 2p 2.5 ]. Each (half ) core-hole calculation includes a total energy SCF calculation followed by a band structure calculation to converge the unoccupied states, and the subsequent evaluation of the dipole matrix elements using the projector augmented wave reconstruction method to reconstruct the core-electron wave functions [31]. A self-written post processing tool called MolPDOS [102] was used to postprocess the data and generate the spectra. This tool is contained in CASTEP version 18.1 and later and exists alongside other useful tools for this purpose such as OptaDOS [103]. The total NEXAFS spectrum is produced from summing all single carbon species contributions and shifting them with their corresponding core binding energies, resulting in the so-called ΔIP-TP approach.
To make the NEXAFS spectra obtained by the calculations comparable to experimental data, a photon energy dependent broadening scheme using pseudo-Voigt functions [47,48] was included. As laid out in section 2, the change in peak shape and width is meant to simulate the different life-times of excited states below and above the ionization potential. After the correction according to the XPS energies, the NEXAFS spectrum is divided into three ranges. The first range starts with the leading edge, spans the first 5 eV of the spectrum and is assigned a pseudo-Voigt FWHM of 0.75 eV and a 80%/20% Gaussian/Lorentzian ratio, while the third range starts from 15 eV above the leading edge and is assigned a FWHM of 2.0 eV and a 20%/80% G/L ratio. Both ranges are connected by an intermediate range, in which the FWHM and the G/L ratio change linearly. Every NEXAFS transition is represented by a pseudo-Voigt peak with parameters according to its position on the photon energy axis. The values mentioned here were consistently used in this publication. If comparison to experiment is desired, these broadening parameters need to be adjusted accordingly, e.g. as done in figure S6 of the supplementary material.

Results and discussion
In the following, we will discuss the calculation of x-ray absorption and photoelectron spectra for a small set of typical systems to exemplify the technical and numerical aspects, as well as the pitfalls involved in performing such calculations. We performed calculations for azupyrene (see figure 2(b)) and ethyl trifluoroacetate (ETFA, see figure 5(a)), as well as for the gas-phase azulene molecule, the azulene bulk crystal, and azulene adsorbed on a Cu(111) surface (see figure 8) [35]. ETFA shows extreme chemical shifts in its carbon 1s binding energies and has been an important reference system since the dawn of photoelectron spectroscopy [104][105][106]. Azupyrene is a polycyclic aromatic hydrocarbon, which was chosen because of its high symmetry, yielding a large number of equivalent carbon atoms. The three different systems involving the azulene molecule are used to show the versatility of our approach. A comprehensive experimental data set exists for azulene on Cu (111) which was previously used to show the feasibility of our DFT-based structures, energies and spectroscopic simulations [35,36,107].

The choice of core-hole localization method for XPS simulations
The most straightforward way to calculate XPS energies via DFT is the use of the negative KS orbital energy values ε KS according to (2.5). In addition to neglecting relaxation effects, this approach carries an additional problem: the KS-states may be delocalized, especially for highly symmetric molecules, e.g. azupyrene, which possesses the symmetry point group D 2h . In such cases, the KS states for symmetrically equivalent atoms are linear combinations of the atomic core-states (see figure 4). If the respective atoms are close enough to produce appreciable overlap, an issue is encountered: the energy of the bonding combination of the core-states is lowered while the energy of the anti-bonding combination is raised. Therefore, two different binding energies exist for two symmetrically equivalent carbon atoms (see figure 4(a)).
The delocalization of the KS-states is not only an issue for the direct use of the KS energies to model binding energies, but also in ΔSCF calculations that account for core-hole screening effects, (2.6). If the occupation of a certain core-level is changed to calculate the ionized state for the ΔSCF calculation, for example by using the FOP functionality in FHI-aims to remove an electron from a KS state, it might distribute the core-hole over multiple atoms. The result is an unphysical system with a calculated binding energy between the values of KS-states and the properly localized core-holes (see figure 4(b)). The core-hole localization, if not strictly enforced, often fails for hybrid functionals (e.g. PBE0). In this example, the localization worked for one case and produced a core-state properly localized on one carbon atom with a realistic binding energy (289.52 eV), while for the second case the core-state is still delocalized over two of the four symmetry equivalent atoms and the binding energy is unrealistically low (285.95 eV). The iso-value for all states shown is 0.1.
To avoid these issues, additional steps have to be taken to assure the proper localization of the core-hole in the calculations of the ionized state for the ΔSCF approach. In particular, it is necessary to break the symmetry of the electronic wave function of the system. In a pseudopotential code this step is straightforward, because the use of a pseudopotential with a built-in core-hole for each atom automatically breaks the symmetry and localizes the core-hole [31]. The same is true for frozen core calculations, where the ionic occupation for the core-orbitals is frozen out with a core-hole localized at the desired atom [56].
For all-electron calculations, the problem is more difficult to solve. One approach suggested by Kahk and Lischner [27,108] is to introduce a third step in addition to the ground state and the core-hole calculation. In this intermediate step, the symmetry of the wave function is broken, e.g. by introducing an increased core-charge to the atom in question. The (unphysical) wave function with broken symmetry and increased core-charge is then used as starting point for a corehole calculation with normal core-charge at all atoms [27,108]. The FOP functionality in FHI-aims is able to implement this approach. Alternatively, localization can also be achieved by using the FOB functionality in FHI-aims. In the latter, the occupation constraint on the KS state is defined by its overlap with a 1s atomic orbital basis function localized at the relevant atom. This approach in most cases automatically breaks the symmetry of the wave function during the self-consistent core-hole calculation.
It should be noted that both approaches, if the core-hole localization is successful, yield numerically identical results for the systems studied here. For most cases, we have found the FOB approach to be more stable with regard to electronic convergence and more convenient, because it does not require an additional initialization step. However, in the case of the highly symmetric azupyrene molecule, the direct use of the FOB approach failed for all hybrid functionals (see figure 4(b)). To obtain the correct binding energies when using hybrid functionals it was necessary to first break symmetry using the Boys localization [94], as proposed in the literature [30]. This approach was successful for all molecule and XC functional combinations. It appears that hybrid functionals, contrary to GGA and meta-GGA functionals, suffer much more from variational collapse during ΔSCF calculations and require a constraint that explicitly breaks symmetry to ensure core-hole localization. An additional way for solving this problem would be the use of a modified version of the ΔSCF method, such as the local SCF method [109,110] or the linear-expansion ΔSCF [102] approach. Additionally, one of the tested meta-GGA functionals (SCAN) showed unphysical behaviour with great differences in the valence spin channels caused by the core-hole. For all other functionals the overall spin density is determined by the core-hole, showing a localised difference in the core-state spin channels on the excited atom, while the valence spin states only show minute differences yielding a vanishing spin density at the other atoms. But for SCAN the wave functions of the valence spin channels differ by a lot, leading to a larger difference in the up and down electron densities and therefore the spin-density distributed over the whole molecule. This finding is consistent with the known 'over-magnetization' of SCAN reported in the literature [111].
For the unsymmetrical molecule ETFA (symmetry point group C 1 in the chosen rotational conformation), the localization poses less of a problem. Here, both tested meta-GGA functionals (SCAN and TPSS) failed when using the FOB approach for some atoms, but results could be obtained using the FOP.
The localization of the core-hole is also important for the calculation of NEXAFS spectra with all-electron DFT and was recently discussed in detail in the literature [30]. In this work, we only perform NEXAFS simulations in the plane-wave code CASTEP with core-hole excited pseudopotentials. Here, the description of the (half ) core-hole within the pseudopotential automatically ensures localization and symmetry breaking. The additional effects this core-hole excited pseudopotential exerts on XPS and NEXAFS signatures is discussed separately below.

The choice of the exchange correlation functional
To judge the influence of the exchange correlation functional on the absolute and relative C 1s XPS energies, a comprehensive study sampling 8 functionals of various types was carried out. The tested functionals included GGAs (PBE and PW91), meta-GGAs (SCAN and TPSS), hybrid functionals (PBE0, HSE06 and B3LYP), as well as the double hybrid functional xDH-PBE0. The test calculations were performed on the two gas phase molecules ETFA and azupyrene. These two molecules were chosen because they represent two different challenges when calculating XPS binding energies. ETFA (see figure 5(a)) is commonly used as example for XPS measurements and calculations due to the extreme chemical shifts of its four different carbon atoms [104][105][106]. On the other hand, azupyrene (see figure 2(b)) represents an alternative challenge in XPS calculations. It is a common organic molecule with carbon atoms in similar environments that produce slight but possibly measurable relative shifts in their C 1s binding energies. For both molecules, the calculations regarding the performance of the XC functionals were carried out using the all-electron code FHI-aims with a 'tight-tier2' core-augmented basis set.
The ETFA molecule possesses extreme chemical shifts for the C1s binding energies of up to 8 eV, leading to clearly separated peaks in a spectrum with typical experimental resolution. There is high quality experimental reference data available obtained by measurements on the gas-phase ETFA molecule [105,106]. This experimental data is compared to the spectra calculated with different XC functionals in figure 5(a) on an absolute binding energy axis. For reference, all calculated binding energies are also compiled in table S1 of the supplementary material.
On first sight, the overall performance in terms of absolute binding energies is not straightforward to extract (figure 5(a), data tabulated in table S1). However, a clearer trend emerges when the spectra are plotted as relative shifts with respect to the lowest binding energy ( figure 5(b), data tabulated in table S2). The performance of the different functionals regarding the relative shifts in the binding energies can be divided into three groups: It is apparent that the hybrid functionals perform best here, and the GGAs perform worst, while TPSS behaves more like the GGAs and SCAN is somewhere in between.
When discussing the performance on the absolute binding energy scale, things are more complex. Due to an additional global displacement of all calculated energies, the average deviation in the absolute binding energies would seem to be smallest for B3LYP and SCAN. But according to the relative shifts, the case can be made that all functionals have similar problems describing the binding energies of the carbon atoms with extreme binding energy shifts. In fact, the functionals PW91, PBE0, HSE06 and xD-PBE0 have an excellent agreement with the absolute binding energy of carbon C4, which possesses the lowest binding energy. Following this argument, the best performance for the absolute binding energies is provided by the hybrid functionals PBE0, HSE06 and xD-PBE0 with an average deviation of 0.50, 0.46, 0.50 eV and the GGA PW91 with the deviation of 0.81 eV.
By comparing absolute and relative deviations, it is obvious that the main error in these calculations is not only in the absolute energies, but already in the relative energies. Generally the hybrid functionals perform slightly better than the meta-GGAs and GGAs but their results are still quite poor compared to experimental data. Apparently all functionals have problems calculating the high binding energies resulting if the core-state is extremely descreened, e.g. for the carbon atom C1 in the CF 3 group of ETFA. For carbon atoms in more common chemical environments, e.g. C4 in ETFA, all functionals perform satisfactorily, i.e. yielding a deviation below or around 0.5 eV. It should also be noted that the use of the double hybrid functional xD-PBE0 brings no improvement relative to the regular hybrids. The better performance of hybrid functionals for K-shell XPS energies was also reported in the literature [112], while other studies found a better performance of regular GGAs [113]. However, many test sets used to judge the performance of binding energy calculations include only molecules with more moderate shifts [24,63,112,113].
To further test the performance of the XC functionals for less extreme chemical shifts, we turn to the molecule azupyrene, which is an aromatic hydrocarbon without heteroatoms. The chemical shifts exhibited by its carbon atoms are therefore more subtle and only caused by its non-alternant topology, similar to the azulene molecule already discussed in the literature [35]. To compare the performance of the functionals with regard to these more subtle shifts, only relative differences in the binding energies are analysed. These relative shifts with respect to carbon C1 (see figure 2) are compared in table 1 for the different XC functionals. For reference, the absolute binding energies can be found in table S3. Within each class of functional used, GGA, meta-GGA and hybrid, the values obtained for the relative XPS shifts are similar with maximal deviations of 0.06 eV within the hybrids and 0.09 eV in the meta-GGAs. As a caveat, the meta-GGA SCAN showed problems with the response of the valence electron states to the core-holes and produced an unphysical spin density. However, the C1s shifts are still in line with the other functionals. It should be noted that the simple approach using Koopmans' theorem also gives similar relative shifts in the binding energies (see table S4 in the supplementary material). This indicates that for the azupyrene molecule the shifts are mainly determined by the chemical environment of the initial state and that the relaxation of the core-hole is very similar for all chemical species.  figure 2(b)). Both spectra were obtained by applying the same pseudo-Voigt broadening on the calculated C 1s binding energies to mimic a typical experimental resolution. Inset: structural formula of ETFA with numbering. The color scheme for the various functionals in (c) is valid for all panels.  Figure 5(b) shows the XPS spectrum of azupyrene obtained by applying the same pseudo-Voigt broadening as used in figure 5(a) to all relative binding energies in table 1. When the results obtained with the different functionals are thereby compared against the backdrop of a typical experimental resolution, they provide almost indistinguishable results. For this reason, we chose the functional PBE for all subsequent calculations.

The choice of basis set in XPS and NEXAFS simulations
In sections 3.1 and 3.2 we have discussed the role of corehole localization and exchange-correlation functional when performing all-electron atomic orbital-based core-hole simulations of XPS binding energies. Another important numerical choice is the type and size of basis set that is used for such calculations. For atomic orbital basis sets, the importance of core augmentation functions has been extensively discussed in literature [27,32,60,86]. We also find that basis set convergence for XPS binding energies in FHI-aims is very fast and, once core augmentation functions (according to reference [27]) are added, does not require to go beyond the standard 'tight' basis set definitions. The full data for the convergence series is shown in tables S5 to S8 in the supplementary material.
Moving from an all-electron basis to a pseudopotential plane wave formalism has the benefit of eliminating the issue of core-hole localization that was discussed in section 3.1, because core-hole excited pseudopotentials account for the core-excitation only on the relevant atom in question. However, this introduces other challenges. For example, the frozen core described by the pseudopotential does not relax and therefore absolute binding energies are typically much worse than if all-electron descriptions are used. Similarly, as the corestates are not explicitly treated, the core-level wave function needs to be rebuilt to enable the calculation of transition dipole moments between core and valence states in NEXAFS [31]. Lastly, it is not clear how the choice of pseudopotential affects core-level binding energies and XAS signatures.
Different types of pseudopotentials exist varying in their degree of 'hardness', i.e. the number of states that are pseudoized and the constraints that are placed on the core region [114]. Examples include non-local norm-conserving pseudopotentials [115] and ultra-soft pseudopotentials [116]. The latter require much lower energy cutoffs and fewer plane waves to accurately represent observables, which they achieve by lifting the constraint of charge conservation in the core region such that the wave functions satisfy the generalized orthonormality condition. We compared XPS and NEXAFS spectra calculated with three different pseudopotentials for the example molecule azupyrene. Those three are based on the default settings of on-the-fly generated pseudopotentials in CASTEP and only differ in the type of projector, namely they use (for each angular momentum channel): (1) a normconserving projector, (2) a single ultrasoft projector, or (3) two ultrasoft projectors. All other parameters that define the pseudopotentials were left unchanged from the defaults. The detailed definitions are available in the raw data deposited on the NOMAD repository.
For the XPS calculations, we compared the spectra obtained with all three pseudopotential definitions with the spectrum generated by the all electron code FHI-aims. The XPS spectra of the different pseudopotentials had to be rigidly shifted by −6.1 eV to align them to the all-electron results, because the lack of electron relaxation imposed by the frozen core approximation and the introduced charge background discussed below. The peak shape, however, is almost identical for all types of projectors and the all-electron method. The visualization of this result is shown in figure S1(a) and the relative peak positions of the different carbon species are compiled in table S9 of the supplementary material. In the case of NEXAFS simulations, the same is true and the different pseudopotentials provide almost identical spectra, with the exception of slight differences in intensities at high excitation energies (see figure  S1(b) of the supplementary material). We therefore find that the main features of core-level spectra are robust with respect to the type of pseudopotential projection functions used. For all further calculations, we select the ultrasoft pseudopotential with two projectors per angular momentum channel, which is the default in CASTEP.
Another consequence of using a plane-wave pseudopotential basis set for core-hole simulations is the limitation that all calculations have to be performed using PBCs. The use of PBCs with a final state core-hole simulation approach such as ΔSCF or TP can introduce errors in the calculation of XPS binding energies. Errors can arise due to finite size effects, when unit cells are too small and core-holes are interacting across periodic images. Another issue is when the positive charge of the core-hole is not compensated. ΔSCF simulations of XPS binding energies and ΔIP-TP simulations of NEXAFS transitions introduce a net charge of +1 or +0.5 e, respectively. Unit cells in PBCs must be charge neutral, otherwise the electrostatic potential would diverge. Therefore, a net charge is typically compensated by the introduction of a uniform (negative) background charge over the whole unit cell. This is a fail safe mechanism of an Ewald summation rather than an artefact, although it does introduce erroneous physics, such as a strong dependence of the total energy on the size of the unit cell.
To calculate gas-phase molecules with PBC, the molecule is placed in a large vacuum box. Figure 6 shows the convergence behaviour of the C 1s binding energies for the gasphase ETFA and azupyrene molecules as a function of the size of this vacuum box. The box is of cubic shape with increasing size from 10 to 40 Å in all directions. It can be seen that the absolute values of core-level binding energies (figures 6(a) and (b)) vary by 2-3 eV as the box size increases and are clearly not converged even at very large vacuum box sizes. This effect can be attributed to the change in core-hole density and electrostatic interaction with the background charge and was recently studied in detail by Taucher et al [67]. However, this dependence on the box size of the absolute energies is not seen in the relative chemical shifts between the single carbon atoms within the molecules. In figures 6(c) and (d) the behaviour of the relative shifts with the vacuum box size is plotted and compared to the relative shifts obtained from the all-electron code FHI-aims (in red).
Here it is clear that the relative shifts are much less affected by the background charge. While very small box sizes lead to slight deviations in relative shifts, the peaks converge very quickly and after about 20 Å box size, no change is seen. The converged relative shifts agree well with the all-electron calculations with average deviations of 0.1 eV for the extreme shifts in ETFA and 0.01 eV for the more subtle shifts in azupyrene. This result shows that absolute binding energies are affected by a homogeneous background compensation charge, but the issue is much less severe for relative energies where careful convergence can reproduce all-electron results. In addition to this data, we compare the performance of the ΔSCF and ΔIP-TP methods with their charge neutral variants for azulene adsorbed on Cu(111) in section 3.5.
The influence of the box size was also studied for the NEX-AFS spectra calculated with the ΔIP-TP method and shown in figure S2(a) of the supplementary material. As seen with the XPS binding energies, the NEXAFS spectra are converged at a box size of 20 Å where little to no change is seen afterwards. Regarding basis set convergence, the spectra show little change in the general shape and features with increased cut-off energy (see figure S2(b)), while the overall intensity increases and is converged after 450 eV, which is a plane wave cut-off energy similar to what is typically required for intermolecular interaction energies or adsorption energies with the employed ultrasoft pseudopotentials. We complete our discussion of numerical and technical parameters of core-hole simulations by studying the influence of the exchange-correlation functional on the simulated NEXAFS spectrum. We have calculated the NEXAFS spectrum of azupyrene with three different GGA functionals: PBE, BLYP and PW91 (see figure S3 of the supplemental materials). While there are slight differences for the intensities of higher energy transitions, the spectra show little variation overall and are almost identical for the lowest energy peaks. The deviations found by us are slightly smaller than what has previously been reported [24].

Analysis of core-level spectra in terms of initial and final state contributions
Once XPS and XAS spectra are simulated, a comparison with experiment often requires a deeper analysis of the individual signatures that constitute the spectra. The decomposition in terms of the core-levels from which photoelectrons and excited electrons originate is trivial, as our approach uses separate core-hole calculations for each carbon atom (see figure 2). Therefore initial state decomposition is achieved by only summing up the transitions which originate from the 1s orbital localized at a specific carbon atom as shown in figure 7(a) for the azupyrene molecule. An analysis in terms of final states in NEXAFS simulations requires further consideration, but can be particularly appealing for organic molecules adsorbed at surfaces and in thin-films, as spectra are often interpreted in terms of the symmetry of these states (π * , σ * ). The final state decomposition for azupyrene in figure 7(b) shows the spectrum projected onto molecular orbital (MO) reference states according to their overlap with the final states f of each transition [102].
The projection coefficient c MO f is then multiplied with the cross section in (2.8) to select core-level transitions that lead to an excitation of a final state that has a contribution from the reference state MO: This approach is implemented in CASTEP via the MolPDOS tool [102], which enables such projections onto densities-ofstates. For the projection, the reference states MO have to be provided for each TP calculation in the form of a reference wave function. In the simplest case, this reference wave function is generated by the ground state calculation of exactly the same system as used for the TP calculations (as shown in figure 7(b)). In this case the projection scheme directly provides contributions of each unoccupied state to the overall NEXAFS spectrum. However, it might be beneficial to choose a different reference system. For example, if the overall system in question is a molecule adsorbed on a metal surface, the reference system can be chosen as the free standing molecular overlayer (in exactly the same structure but without the surface). The projection of the NEXAFS transitions of the whole adsorbate system on the unoccupied states of the molecular overlayer now yields information about how the molecular states are changed by the adsorption and how they still contribute in a distinct manner to the overall spectrum. This approach was already used multiple times and proved its utility for molecules adsorbed on metal surfaces both with regard to NEXAFS transitions and for the analysis of the ground-state density of states [34][35][36][37]117].

XPS and NEXAFS simulation of organic crystals and metal-organic interfaces
Up to now, we have only discussed core-level simulations of XPS and NEXAFS spectra for isolated molecules. In the following, the three different systems involving the azulene molecule are used to show the versatility of our approach and pinpoint special challenges for spectroscopic calculations of bulk and surface systems. As exemplar systems, we chose the azulene molecule both in a molecular crystal and adsorbed on the Cu(111) surface, depicted in figure 8, and compare both to the gas-phase molecule.
One additional issue of condensed systems compared to gas-phase calculations is the sampling of electronic states across the periodic crystal via Brillouin zone integration (or k-space sampling). Therefore, the convergence of both the XPS and NEXAFS calculations has to be ensured with respect to the number of k-points included. In the supplementary material (see figures S4 and S5(a)), we provide a detailed discussion of the convergence properties of the XPS and NEX-AFS spectra for the azulene molecular crystal and azulene adsorbed on Cu(111) when simulated with CASTEP. In short, both systems show rapid convergence with respect to the density of the employed Monkhorst-Pack k-grid [118]. In the case of azulene on Cu(111), we have additionally tested convergence with respect to the size of the vacuum slab that separates the surface model from its periodic image perpendicular to the surface and find that this parameter shows virtually no influence on the spectra (see figure S5(c)). Moreover, we also repeated the convergence series with regard to the cutoff energy of the plane wave basis and found that the convergence of the metal-adsorbed molecule is similar to the gas-phase molecule (see figure S5(b)).
The comparison between gas-phase azulene, the azulene molecular crystal and azulene adsorbed on Cu(111) allows us to show that all three systems can be treated with approximate core-hole simulation methods such as ΔSCF and ΔIP-TP, but also allows us to compare aperiodic simulations with FHI-aims and calculations under PBCs with CASTEP. First, XP spectra where calculated with the ΔSCF approach in both FHI-aims and CASTEP. In the case of FHI-aims, we employ an aperiodic approach (also called cluster approach) where we correctly account for the positive charge in all three systems after photoionization. On the other hand, the plane wave nature of CASTEP made a 3D periodic cell necessary, even for those systems showing a lower periodicity. The cluster approach in FHI-aims can easily simulate the isolated molecule, while the inherently 3D periodic plane-wave approach has to simulate the molecule in a large periodic vacuum box (see section 3.3 for a discussion) at significant computational overhead. For the molecular crystal and the molecule adsorbed on the surface, the plane wave code is more suitable, because it already includes the 3D/2D periodicity. In the cluster approach for the all electron calculation of the crystal and the metal-organic interface, the choice of the cluster is not straightforward. For the molecular crystal, the 16 molecules surrounding a central molecule were cut out of the periodic crystal structure and the XPS shifts of the central molecule were calculated. For the surface-adsorbed molecule, a cluster was cut-out of the 2D periodic slab. The choice of the cluster was guided by the desire to preserve the hexagonal symmetry of the surface while simultaneously providing a sufficient surface area for adsorption. To reduce the number of subsurface atoms, the cluster was truncated into a conical shape, which becomes narrower in lower-lying layers. In the literature, no strong dependence of the XPS calculations on the specific cluster termination was found [27], which we can confirm. Still, the choice of both cluster models for the molecular crystal and the surface system is somewhat arbitrary, forming a draw-back of this method. Figure 8(a) shows a comparison of the XPS calculation results for both approaches and all three systems. The overall spectra were shifted (to account for the difference in absolute binding energies due to the charge background issue within the PBCs) and a broadening scheme was applied to simulate experimental resolution. Both methods agree very well for all systems, be it the free molecule, the molecular crystal  (111). (a) Comparison of the XPS spectra, calculated both with the cluster approach in the all-electron code FHI-aims and in a periodic system with the plane-wave code CASTEP. The XPS spectra were shifted to align the peak maxima and to better show the difference in the peak shape; broadening was introduced to simulate a typical experimental resolution. Right side: structures of the used clusters (b) and periodic unit cells (c).
or the molecule adsorbed on the metal surface. This agreement is a very important result, because it proves that cluster and periodic approaches can yield compatible results, despite their respective challenges. For the systems investigated here, both approaches appear to provide excellent agreement for the obtained relative binding energies. We note, however, that the cluster calculations performed with FHI-aims, both for the molecular crystal and the molecule adsorbed on the surface were much more computationally demanding than the periodic calculations. The all-electron calculations required vastly more computational resources and proved to be less stable with regards to electronic convergence than the periodic calculations, which are straightforward as implemented in CASTEP.
The calculated spectra also provide insight into the interactions present in each system. It is apparent that the interaction between the azulene molecules in the molecular crystal exerts only a minor influence on the XPS peak, which retains its gasphase shape. The interaction between the molecule and the Cu(111) surface, on the other hand, changes the peak shape a lot, which is an indication of the increased molecule-metal interaction and was already discussed in connection to the experimental data elsewhere [35,36]. For the adsorbed azulene, we further note that coverage as modelled within PBCs is different to the low-coverage situation captured by the cluster. However, the coverage only exerts a minor influence on the XPS binding energies for this system, as was also observed in experiment.
The NEXAFS spectra of all three systems, calculated with the ΔIP-TP approach in the plane-wave code CASTEP, are compared in figure 9. The spectra were not shifted with respect to each other. The good agreement, despite the presence of the space-charge effect, is due to the ionization potential correction (ΔIP) of the NEXAFS transitions according to the C1s binding energy of each carbon. In figure 9(a) it can be seen that the spectra of free molecule and molecular crystal are very similar. For the NEXAFS spectrum of the adsorbed molecule (figure 9(b)), the angular dependence of the NEXAFS transitions with regard to the polarization vector is important. Therefore we chose to simulate three different polar angles ϑ for the polarization direction. Two of those correspond to the polarization direction parallel (90 • ) and orthogonal to the surface (0 • ), while the third is the so called magic angle (53 • ). The magic angle is the polarization angle, at which the orientation of the transition dipole moment μ fi relative to the substrate does not influence the transition intensity. It should be noted that the ideal magic angle (for our threefold-symmetric substrate) is in fact 54.7 • [2]. However, the actual magic angle encountered in an experiment is dependent on the degree of polarization of the used radiation [2,10]. The value for the degree of polarization of 0.9 used in the corresponding experiments [35] leads to a magic angle of 53 • . For all polar angles, the spectra are averaged over the azimuthal angle, due to the symmetry of the substrate (see (2.19)). The comparison between NEXAFS of gas-phase and adsorbed molecule reveals stark differences in the spectra, giving insight into the mechanism of the molecule-surface interaction. While the overall dichroism reveals that the molecule lies approximately flat on the Cu(111) surface, there is another telling detail in the spectra. The non-vanishing intensity of the C 1s to LUMO transition (291 eV) for the 90 • polar angle shows that the clear separation of σ * and π * states (as present for the free molecule) is now broken due to the hybridization between the electronic states of adsorbed molecule and metal surface [34,119]. The corresponding experimental spectra, the simulated data, and the subsequent interpretation were already reported in previous publications and show a good agreement between experiment and theory [35,36].

The effect of charge in periodic core-hole simulations
We use this opportunity of having reliable experimental data for azulene thin films and azulene monolayers adsorbed on the Cu(111) surface to show a systematic comparison of the  (111) including angular dependence on the polarization direction of the incident radiation relative to the surface or molecular plane, respectively. The NEXAFS spectra include broadening to simulate a typical experimental resolution.
ΔIP-TP method with the similar ΔIP-XTP calculations. The advantage of the XTP method is that it creates a net-neutral unit cell which does not require a spurious uniform background counter charge, which is otherwise present in the periodic ΔSCF and (ΔIP-)TP calculations and can significantly influence electrostatic properties [67]. The array of calculations making up a ΔIP-XTP simulation differs only in the third step (as shown in figure 2(c)) from the ΔIP-TP method: we perform N half core-hole (but zero charge) XTP calculations where we introduce half an electron into the lowest unoccupied state to compensate for the core-hole.
The results of the ΔIP-TP and ΔIP-XTP simulations are compared for the azulene molecule in the gas phase and adsorbed on the copper surface in figure 10. In this figure all spectra were rigidly shifted to compensate for the global shift due to the different charge state of the systems and therefore enable a better inspection of their spectral shape. Both methods produce virtually indistinguishable XPS peak shapes for the molecule adsorbed on the metal surface, while stark differences are present for the gas-phase molecule (see figure 10(a)). This fact is unsurprising, because the peculiar peak shape of the gas-phase azulene molecule is due to the localized charge distribution, as already discussed in the literature on the backdrop of experimental XPS data for thin multicrystalline films of azulene [35]. The neutral ΔSCF calculations do not remove the electron taken from the core-state, but instead promote it into the lowest unoccupied state. The occupation of the former LUMO has massive consequences for the electronic structure of the gas-phase molecule and therefore the resulting spectra are negatively influenced. On the other hand, in the case of azulene adsorbed on Cu (111), the continuous density of states of the metal surface around the Fermi level has the effect that the additional electron does not present a significant change in the electronic structure and the resulting spectra remain similar.
For the proper simulation of the NEXAFS spectra in figure 10(b), the obvious error in the neutralized ΔSCF calculations for the free molecule made it necessary to perform the ΔIP shift both for the TP and XTP simulation on the basis of the charged ΔSCF calculations. The results show that again the spectra for the molecule adsorbed on the metal surface are almost identical for ΔIP-TP and ΔIP-XTP, while there are some differences for the gas-phase molecule.
In comparison to various core-hole constraining methods, Michelitsch and Reuter have previously reported that the XTP method (note that no ΔIP correction is applied) on balance provides the best agreement with experiment for absolute energies and intensities of first and second transition energies of NEXAFS spectra for a range of isolated organic molecules [30]. We also compared the ΔIP-TP and ΔIP-XTP spectra with the experimental data for the multicrystalline azulene films from reference [37] (see figure S6 in the supplemental material). The quantitative comparison extracted from these spectra yields an average deviation of 0.13 eV for ΔIP-TP and 0.05 eV for ΔIP-XTP, when the relative shifts between the four lowest-energy peaks are compared to the experiment. The relative peak intensities for those four peaks deviate by average factors of 1.8 and 1.5 for ΔIP-TP and ΔIP-XTP, respectively. Therefore we found both ΔIP-TP and ΔIP-XTP to be in good agreement with the experiment, with ΔIP-XTP providing slightly smaller deviations both in transition energies and intensities. As can be seen by this quantitative comparison, the approximate treatment with the ΔIP-TP and ΔIP-XTP methods based on transitions with infinite lifetime and empirical lifetime broadening already provides a reasonable agreement with experiment in the near-edge region, even as many-body effects such as inelastic scattering, Auger effects, and vibronic coupling are neglected.
In the future we will provide a more systematic study of method differences on a larger set of benchmark systems, to test if the performance of the ΔIP-TP and ΔIP-XTP approaches is robust across a variety of cases including Figure 10. Core-level spectra calculated with and without background charge compensation for azulene molecule in the gas-phase and adsorbed on the Cu(111) surface. (a) Comparison of ΔSCF XP spectra calculations with a homogeneous compensation charge (blue) and with a compensation charge introduced in the lowest unoccupied state (red). (b) Comparison of the ΔIP-TP (blue) and ΔIP-XTP (green) NEXAFS spectra. Both spectra share a 0.5 e − core-hole. The TP calculation includes −0.5 e background charge (blue) while the XTP calculations compensate charge by adding half an electron into the lowest unoccupied state (green). All spectra include broadening to simulate a typical experimental resolution and were shifted for better comparison. more challenging systems, e.g. containing coherent dipolar arrangements [67].

Conclusion and future outlook
We have presented the numerical and technical details of DFTbased core-hole simulation methods to calculate 1s XPS and K-edge NEXAFS signatures of organic molecules, organic molecular crystals, and metal-organic interfaces. The introduced methods are variants of the ΔSCF method and the TP approach (ΔIP-TP). For the benefit of practitioners in the field, we have compared and contrasted key parameter choices in performing such simulations including the choice of exchange-correlation functional, basis set, and if the model is to be aperiodic or treated within PBCs. We find that all tested functionals perform well for absolute and relative XPS binding energies, as long as the shifts are not too extreme due to strong descreening. For such extreme shifts, as encountered in the ETFA molecule, hybrid functionals perform better than GGAs and meta-GGAs. For periodic calculations, the calculation of absolute energies is more difficult, however, we find that periodic and aperiodic calculations yield comparable results for relative shifts, even in the presence of the inherent space charge problem due to the use of a compensating background charge.
The choice of periodic or aperiodic description for the simulation of XPS and NEXAFS spectra is strongly dependent on the system under investigation and comes with a complex trade-off regarding advantages and disadvantages. For example, aperiodic atomic-orbital based calculations, as we have performed with the FHI-aims package, can provide accurate core-level binding energies [27] and NEXAFS transitions [30] for an isolated molecule, a molecular crystal, and a metal-adsorbed molecule, but lack robustness with respect to core-hole localization and provide substantial computational overhead for the latter two systems. More advanced approaches to core-hole localization based on excited-state orbital optimization [120] or more sophisticated constraint definition [102,109,110] may be necessary to resolve this generally. In the case of a periodic pseudopotential plane wave description, as provided within CASTEP, the core-hole localization is in-built and the computations are efficient and numerically robust. However, the employed PBC can introduce an artificial charge background and spurious electrostatics with some core-hole constraining approaches [67]. We show that both, aperiodic and periodic approaches yield virtually identical relative core-level binding energies and that the relative peak positions in XPS and NEXAFS spectra do not appear to be strongly affected, even when different charge compensation schemes are used. However, we note that other systems with strong dipoles and no Fermi level pinning may show more significant artefacts in spectra when a homogeneous charge background is present. Furthermore, the comparison of absolute binding energies between simulation and experimental data for condensed matter systems carries additional pitfalls not addressed in this publication. We will systematically assess both issues in future work.
While the here presented constraint-based core-hole simulation methods are efficient and robust, they fail to describe many critical features of core-hole spectra such as Auger effects and energy-dependent lifetime broadening. Highly accurate first-principles methods based on many-body perturbation theory [79,[121][122][123] have recently emerged to address this. Nevertheless, approximate core-hole simulation methods remain essential workhorses for highly complex spectroscopy simulations with applications ranging from in-operando electrocatalysis [124] to ultrafast dynamics [125] and pump-probe spectroscopy [126]. The efficiency of these methods enables the creation of high-volume data [112,127,128] which can be used for machine-learning-assisted spectral assignment [129][130][131]. The here presented ΔSCF and ΔIP-TP approaches as implemented in CASTEP are versatile and reliable for organic molecules in different environments and represent important tools in this context.