

# Manuscript version: Author's Accepted Manuscript

The version presented in WRAP is the author's accepted manuscript and may differ from the published version or Version of Record.

# Persistent WRAP URL:

http://wrap.warwick.ac.uk/164081

# How to cite:

Please refer to published version for the most recent bibliographic citation information. If a published version is known of, the repository item page linked to above, will contain details on accessing it.

# **Copyright and reuse:**

The Warwick Research Archive Portal (WRAP) makes this work by researchers of the University of Warwick available open access under the following conditions.

Copyright © and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable the material made available in WRAP has been checked for eligibility before being made available.

Copies of full items can be used for personal research or study, educational, or not-for-profit purposes without prior permission or charge. Provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.

# Publisher's statement:

Please refer to the repository item page, publisher's statement section, for further information.

For more information, please contact the WRAP Team at: wrap@warwick.ac.uk.

# Monitoring Power Module Solder Degradation from Heat Dissipation in Two Opposite Directions

Zedong Hu, Borong Hu, Student Member, IEEE, Li Ran, Senior Member, IEEE, Peter Tavner, Senior Member, IEEE, Hua Kong, Philip Mawby, Senior Member, IEEE, Ruizhu Wu, Member, IEEE

Abstract- Solder degradation is still a main failure mechanism for power semiconductor modules. This study proposes a monitoring method to detect the relative change in heat dissipation from a module in two opposing directions, affected by the degradation: upwards via the silicone gel and downwards via the solder layer to the heatsink. The method is based on external module package measurements, and a Condition Indicator  $\gamma$  is defined as the ratio of heat transfer rates in the two directions. The expected response of  $\gamma$  to the level of degradation is analysed for different module operating points and external environment conditions. The method is demonstrated by experiment.

*Index Terms* – Condition monitoring, solder degradation, heat dissipation paths, power module<sup>1</sup>

#### I. INTRODUCTION

IGBT power modules are widely used in transport drives, renewable generators and grid control apparatus [1]-[3]. Their reliability is important, as access to these systems can be restricted and/or unplanned downtime can be costly. Variable ambient conditions, frequently changing operating points and incessant intra-cycle temperature variation cause considerable thermal stresses, which depend on the mismatch of the coefficient of thermal expansion (CTE) between packaging materials and chips. These stresses lead to solder fatigue in the form of voids, cracks and delamination, and to bond wire lift-off. These are the two main packaging-related aging-to-failure mechanisms. In many cases solder fatigue is the primary concern, as the bond wire usually starts to age after solder degradation has reached a certain level [4]-[9]. In the study reported in [10], bond wire damage was noted only after the die-attach solder had severely degraded and the junction temperature had exceeded 200°C. This study focuses on solder degradation in the early stage, although it is believed that early bond wire damage can also happen in applications with pulsed power features. The variable operating condition adds to the complexity of health condition monitoring but also provides opportunities for more sensitive diagnosis, as will be shown in this paper.

Regarding reliability, Lai *et al.* studied the solder ageing effect of junction temperature cycles of different amplitudes [11]. Hu *et al.* developed a 2-D finite element model to evaluate the fatigue stress in the solder layer and predict the lifetime for a Si device, with results extrapolated to a SiC power module [12]. Kovačević *et al.* also treated the solder layer as the weakest part of the package structure and proposed a physics-of-failure model involving Clech's algorithm [13].

Techniques are being developed to monitor the module's solder health condition. A device temperature gradient based method was proposed in [14]. It requires an IR camera and is hard to apply in practice. Other expensive equipment has been employed to examine fault development in the laboratory, e.g., scanning acoustic tomography [15] and active thermography [16]. The semiconductor chip itself can serve as a temperature sensor by means of temperature sensitive electric parameters (TSEP) [17]-[21], such as on-state forward voltage for IGBTs and diodes, gate-source or gate-emitter voltage, saturation current and switching time (turn-on or turn-off delay). While theoretically elegant and seemingly easy to calibrate, these methods are still very rarely implemented, as they often require fast and EMI-immune electric measurements at device terminals, and the TSEP features are weakened in modules with inhomogeneous degradation of several chips in parallel [17], [22]. They can indicate a global temperature of the dies but are unlikely to indicate the most degraded and hottest chip. Although temperature determines the safety of a device, it does not immediately signify the package degradation.

For reasons to be expanded later in this paper, solder degradation affects how the power loss is dissipated and hence the external temperature distribution. This may be utilised to detect the degradation. A method by Xiang *et al.* requires only external temperatures. It assumes that all the device power loss is dissipated in the dominant direction towards the heatsink so that the change of power loss is estimated from the temperatures in this direction [23]. The estimated power loss is used to infer the chip temperature and solder degradation. As the solder degradation to be monitored is also in this direction, it is questionable whether the change of power loss can be estimated in this way since the solder degradation itself will impede the heat transfer. Wang *et al.* developed an approach of using the nonuniform 2-D case temperature distribution (ratio k) to monitor the substrate solder degradation, but it was unable to detect the die-attach solder aging [21]. The approach is restricted to monitoring a single mode of degradation. Hu *et al.* [19] adopted a similar approach. For bond wire lift-off and emitter metallization, they utilised the TSEP, the increase of $V_{\text{CE-on}}$. For mixed aging modes of the substrate solder, bond wire and emitter metallization, the measurands, case temperatures and $V_{\text{CE-on}}$, were used simultaneously for diagnosis. However, utilising the ratio k and the change of $V_{\text{CE-on}}$ to determine the aging modes is essentially related to the estimation of power loss in the downwards direction. It is still necessary to assume that the heat dissipation through the case equals the total power loss. As Hu *et al.* stated in [24], large errors could occur if other heat dissipation paths were ignored. They used a neural network as the multiple-input and multiple-output (MIMO) thermal model based on external measurements. It required a high volume of training data that may not always be available. Instrumentation was again only in the dominant direction.

This work was supported in part by the U.K. EPSRC in Project EP/P009743/1 and in part by the National Key Research and Development Project of China Under Grant 2018YFB0905800.

Zedong Hu, Borong Hu, Philip Mawby and Ruizhu Wu are with the School of Engineering, University of Warwick, Coventry CV4 7AL, U.K. (e-mail: zedong.hu.1@warwick.ac.uk; bh529@cam.ac.uk; p.a.mawby@warwick.ac.uk; ruizhu.wu@qq.com).

Li Ran is with the School of Engineering, University of Warwick, Coventry CV4 7AL, U.K., and with the State Key Laboratory of Power Transmission Equipment and System Security and New Technology, School of Electrical Engineering, Chongqing University, Chongqing 400044, China. (e-mail: l.ran@warwick.ac.uk).

Peter Tavner is with School of Engineering and Computing Sciences, Durham University, Durham, DH1 3LE, U.K. (e-mail: peter.tavner@duram.ac.uk).

Hua Kong is with Shanghai Aerospace Control Technology Institute, Minhang District, Shanghai, 201109, China. (e-mail: kongfhua@163.com).

This paper proposes a monitoring method for detecting solder layer degradation at either the die-attach or substrate locations, with no intention to differentiate them along the same heat dissipation route. It only qualifies the change of equivalent thermal resistance and thus the total level of solder degradation. When degradation makes it more difficult to dissipate heat in the intended heatsink direction, secondary heat transfer through the above-chip silicone gel increases. Industrial partners have confidentially reported that the silicone gel on top of the chips liquefied, suggesting that a significant amount of heat was dissipated in this direction in aged modules. This effect is exploited in this paper to detect the solder degradation, for better sensitivity and for confirmation with other methods. To track the split of heat flow, it is necessary to know a pair of temperature differences between the chip and the temperature measuring points above and beneath the chip (one in each heat transfer path), and the changed solder layer thermal resistance. This study develops an iterative algorithm to accurately track such information considering the change of electric operating point. A look-up table relating the solder degradation to the electric operating point and external condition is established in the calibration process. A condition monitoring method based on a range of look-up tables is then proposed. It requires calibration but is not necessarily more complicated than a TSEP method. Experiments show that the method can be applied when the converter system operates in any thermal steady state.

## II. BIDIRECTIONAL HEAT TRANSFER MODEL AND RESPONSE TO SOLDER DEGRADATION

Solder degradation increases the thermal resistance in the junction-to-baseplate (case) direction and hence the junction temperature, which may eventually exceed the limit and cause breakdown. Fig. 1 shows a typical power module with materials/elements from top to bottom: plastic housing, silicone gel, chip, die-attach solder layer, direct bonded copper (DBC), baseplate solder layer and baseplate. Fig. 1 also shows that the module is mounted on a water-cooled heatsink through thermal interface material (TIM), together with the temperature sensing points which may later be used for condition monitoring. Thermocouples (TC) are cost effective and can be used for each chip. The silicone gel has low thermal conductivity and was previously assumed to be a perfect insulator. Unidirectional heat transfer was then considered from a chip downwards through the solder layers, DBC, baseplate and TIM towards the heatsink, with the inevitable conclusion that any extra power loss due to solder degradation can be estimated from the temperatures in this direction [23].

However, a small fraction (approximately 0.5%-1.3%, see Fig.17 in Section IV.B) of the total power loss normally dissipates upwards via the silicone gel to the air inside the plastic housing. It heats up the air and drives the heat out of the housing to the ambient where the air temperature is relatively stable. The small fraction of power loss is expected to respond sensitively to the solder degradation which impedes the heat flow from the chip to baseplate. The degradation level would be seriously under-estimated if only the temperature distribution in the baseplate direction is used to estimate the change of power loss. Instead of correcting the power loss estimation, this paper proposes a new criterion to signify the degradation. This is the ratio of heat transfer rates in the upwards silicone gel and downwards baseplate directions. It is worth mentioning that the silicone gel may be replaced by another heatsink as Chen et al. [25] described referring to the double side cooled power modules. The method reported here can also be used in these cases.



Fig. 1 Typical configuration of a power module.

### A. Generic Model

Power modules usually have multiple chips and a multilayer structure as shown in Fig.1. A thermal network model can be complex [26], as shown in Fig. 2, with several heat sources and spreaders. For each chip, the equivalent thermal resistance to the reference point includes bulk thermal resistance and thermal spreading resistances.

Temperatures of the ambient  $(T_{1, \infty})$  and heatsink coolant  $(T_{2,\infty})$  will affect the split of heat transfer, but they are relatively stable in the sense that they will not be affected by the solder aging condition. In terms of Fourier's law of conduction and Newton's law of convection [27], for a single chip in a thermal steady state, its thermal network can be simplified as shown in Fig. 3.



Fig. 2 Example of a thermal network circuit model of 4-chip dies between heat spreaders, taken from [26].

$T_{\rm J}$ is the junction temperature and *P* the total power loss; the heat transfer rates are $\dot{Q}_1$ (or $P_1$) and $\dot{Q}_2$ (or $P_2$) in the two directions; $R_{\text{th}}$ is the equivalent thermal resistance between two nodes in a heat transfer path; $T_{\text{xx}}$ denotes temperature; $h_x$ is the heat transfer coefficient. $T_{1,\infty}$ (= $T_{\text{ambient}}$) and $T_{2,\infty}$ (= $T_{\text{inlet}}$) are the relatively stable external temperatures of the ambient (air) and inlet water respectively. The ambient temperature ($T_{1,\infty}$) means the air temperature in the converter enclosure. $h_1$ and $h_2$ are the external heat transfer coefficients to these reference points. The external conditions ($T_{1,\infty}$, $h_1$ and $T_{2,\infty}$, $h_2$) are assumed to be the heat dissipation boundary condition, not being affected by the degradation.



Fig. 3 Schematic of heat transfer paths in a power module.

The thermal network is an approximation because it ignores lateral heat transfer. From the point of view of a specific device, the measured $T_g$ includes the effect of other chips. In turn, this implies that some of the heat generated by the device diffuses transversely to the others. However, given the small thickness of the materials and components inside a wire bonded power module (between the measurement points), lateral heat transfer is minor [24] and will have only a secondary effect on the results of condition monitoring.

The following equations are based on the power balance and heat transfer principle for an individual device. They associate the ambient and inlet water temperatures to the split of the total power loss into heat flow rates in the two directions  $\dot{Q}_1$  and  $\dot{Q}_2$ :

$$\begin{cases} P(T_{\rm J}) = \dot{Q}_{1} + \dot{Q}_{2} \\ \dot{Q}_{1} = \frac{T_{\rm J} - T_{1,\infty}}{R_{\rm th10} + R_{\rm th11} + R_{\rm th12} + R_{\rm th13}} \\ \dot{Q}_{2} = \frac{T_{\rm J} - T_{2,\infty}}{R_{\rm th20} + R_{\rm th21} + R_{\rm th22}} \end{cases} \tag{1}$$

As the solder layer degrades, its thermal resistance increases impeding the heat transfer in its direction. An increased proportion of the heat will be dissipated in the other direction, through the silicone gel which is assumed to have a constant thermal conductivity in the temperature range of concern. A Condition Indicator  $\gamma$  is defined for a solder layer as follows:

$$\gamma = \dot{Q}_1/\dot{Q}_2 \tag{2}$$

which is reduced to

$$\gamma = \frac{T_{\rm J} - T_{1,\infty}}{T_{\rm J} - T_{2,\infty}} \times \frac{R_{\rm th2}}{R_{\rm th1}} \tag{3}$$

where qualitatively  $R_{\text{th}1}=R_{\text{th}10}+R_{\text{th}11}+R_{\text{th}12}+R_{\text{th}13}$  and  $R_{\text{th}2}=R_{\text{th}20}+R_{\text{th}21}+R_{\text{th}22}$ .

The Condition Indicator can be calculated from the junction temperature, a pair of external temperatures and the ratio of equivalent thermal resistances above and below the chip.

Suppose with solder degradation, the increment of the solder thermal resistance is  $\Delta R_{\text{th}20}$  and the increment of the junction temperature is  $\Delta T_J$  from  $T_{J0}$ .  $\gamma$  is as follows

$$\gamma = \frac{(T_{\rm J0} - T_{1,\infty}) + \Delta T_{\rm J}}{(T_{\rm J0} - T_{2,\infty}) + \Delta T_{\rm J}} \times \frac{R_{\rm th2} + \Delta R_{\rm th20}}{R_{\rm th1}} \tag{4}$$

Unlike solder degradation, bond wire and emitter metallization aging hardly change the thermal resistance [18], [19], [21], but they cause the junction temperature to increase by $\Delta T_J$ from $T_{J0}$. Therefore $\gamma$ is as follows

$$\gamma = \frac{(T_{\rm J0} - T_{1,\infty}) + \Delta T_{\rm J}}{(T_{\rm J0} - T_{2,\infty}) + \Delta T_{\rm J}} \times \frac{R_{\rm th2}}{R_{\rm th1}} \tag{5}$$

In view of (4) and (5), $\gamma$ can potentially differentiate the two aging-to-failure mechanisms. This paper investigates condition monitoring of power modules for wind turbine systems. Solder fatigue is the dominant degradation mode of power modules in wind turbine systems; therefore, solder degradation is the focus of this paper.
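As a rough numerical illustration of (4) and (5), the sketch below evaluates $\gamma$ for a healthy chip, for solder degradation and for bond wire aging. The function name `condition_indicator` and all numerical values are hypothetical, and the same $\Delta T_J$ is assumed in both aging cases purely for comparison; none of them are taken from the experiments in this paper.

```python
def condition_indicator(t_j, t1_inf, t2_inf, r_th1, r_th2):
    """Condition Indicator per (3): upward heat flow divided by downward heat flow."""
    q1 = (t_j - t1_inf) / r_th1  # upwards, via the silicone gel to the ambient
    q2 = (t_j - t2_inf) / r_th2  # downwards, via the solder layers to the coolant
    return q1 / q2


# Hypothetical values (temperatures in degC, thermal resistances in K/W).
T1_INF, T2_INF = 40.0, 20.0
R_TH1, R_TH2 = 60.0, 0.40
T_J0, D_TJ, D_RTH20 = 60.0, 10.0, 0.10

gamma_healthy = condition_indicator(T_J0, T1_INF, T2_INF, R_TH1, R_TH2)
# Solder degradation, as in (4): T_J rises and R_th2 grows by D_RTH20.
gamma_solder = condition_indicator(T_J0 + D_TJ, T1_INF, T2_INF, R_TH1, R_TH2 + D_RTH20)
# Bond wire or metallization aging, as in (5): T_J rises, R_th2 unchanged.
gamma_wire = condition_indicator(T_J0 + D_TJ, T1_INF, T2_INF, R_TH1, R_TH2)

print(gamma_healthy, gamma_wire, gamma_solder)
```

For the same junction temperature rise, the two aged values differ by the factor $(R_{\rm th2}+\Delta R_{\rm th20})/R_{\rm th2}$, which is the basis of the potential differentiation noted above.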

For a given electric operating point and environmental condition, concerning solder aging, a change of  $\gamma$  for a chip reflects a change of its thermal resistance and hence the module health condition. The external environmental conditions can be specified by  $T_{1,\infty}$ ,  $h_1$  and  $T_{2,\infty}$ ,  $h_2$  which are measurable.

This study focuses on a single chip. This is justifiable as when a solder layer starts to degrade, it is subject to progressively increasing fatigue stresses [11]. Therefore, it can be assumed that the condition of other chips' solder layers is not changing, although their condition indicators would be slightly affected due to the lateral heat transfer which is not specified in the model. Even if multiple chips degrade in the process, the algorithm can be dynamically updated to suit the gradually changing condition.

#### B. Response Analysis

For a healthy power module in a certain environment, one electric operating point corresponds to a steady-state junction temperature. Solder degradation increases the equivalent thermal resistance in the junction-to-heatsink direction and the junction temperature, which is also affected by the change of device power loss. Condition Indicator $\gamma$ as defined in (4) responds differently in 3 scenarios of the external environmental temperature specified by $T_{1,\infty}$ and $T_{2,\infty}$. $T_{J0}$ is the junction temperature without degradation.

- 1) Scenario 1:  $T_{J0}>T_{1,\infty}>T_{2,\infty}$ .  $\gamma$  increases with the solder degradation. The rise of junction temperature enhances the signature of the increased thermal resistance.
- 2) Scenario 2:  $T_{J0}>T_{1,\infty}=T_{2,\infty}$ . The first factor in (4) collapses to 1.  $\gamma$  increases only as a result of the increased thermal resistance.
- 3) Scenario 3:  $T_{J0}>T_{2,\infty}>T_{1,\infty}$ . There is a complex pattern for  $\gamma$  to vary with the level of solder degradation, depending on the electric operating point.

It is possible that  $\gamma < 0$ , if either the air in the vicinity of the power module or a lightly loaded chip is cooled by the heatsink. In this case,  $T_{1,\infty}>T_{2,\infty}$ .

$\gamma = 0$ would occur if the silicone gel were a perfect thermal insulator so that no heat would pass through it, corresponding to $\dot{Q}_1 = 0$ in (2).

 $\gamma > 0$  as long as some of the heat generated by the chip is dissipated into the ambient when  $T_{J0} > T_{1,\infty}$ .
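The sign conditions above can be checked with a minimal sketch that solves the power balance (1) for $T_J$ and then evaluates (2). For simplicity it assumes a power loss that does not depend on temperature; the function name `steady_state_gamma` and all numbers are hypothetical.

```python
def steady_state_gamma(p_loss, t1_inf, t2_inf, r_th1, r_th2):
    """Solve (1) for T_J, then return (T_J, gamma) with gamma from (2).

    Assumption for this sketch only: p_loss is independent of T_J.
    """
    g1, g2 = 1.0 / r_th1, 1.0 / r_th2  # path conductances in W/K
    t_j = (p_loss + g1 * t1_inf + g2 * t2_inf) / (g1 + g2)
    q1 = g1 * (t_j - t1_inf)  # upward heat flow
    q2 = g2 * (t_j - t2_inf)  # downward heat flow
    return t_j, q1 / q2


R_TH1, R_TH2 = 60.0, 0.40  # hypothetical total path resistances in K/W

# Heavily loaded chip, ambient warmer than coolant: T_J > T_1inf, gamma > 0.
_, gamma_heavy = steady_state_gamma(100.0, 40.0, 20.0, R_TH1, R_TH2)
# Lightly loaded chip cooled below the ambient by the heatsink: gamma < 0.
t_j_light, gamma_light = steady_state_gamma(5.0, 40.0, 20.0, R_TH1, R_TH2)
# Perfectly insulating gel (infinite upward resistance): gamma = 0.
_, gamma_insulated = steady_state_gamma(100.0, 40.0, 20.0, float("inf"), R_TH2)
# Scenario 2 (T_1inf = T_2inf): gamma reduces to R_th2 / R_th1.
_, gamma_equal = steady_state_gamma(100.0, 30.0, 30.0, R_TH1, R_TH2)

print(gamma_heavy, gamma_light, gamma_insulated, gamma_equal)
```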

Condition monitoring is achieved by observing the change of  $\gamma$  under the same load and external environmental condition. Understanding the effects of the operating point and external environmental temperatures by  $T_{1,\infty}$  and  $T_{2,\infty}$  informs us how to extract condition information, which is useful to select suitable load levels. For instance, early solder degradation may be better detected at high power levels. On the other hand, understanding the response of  $\gamma$  at light load levels may be valuable in systems such as offshore wind turbines because degradation can be addressed before the winter season when maintenance will be more difficult.

Normally most of the heat is removed in the heatsink direction. Unlike the air temperature around the power module, which is easily measurable, the temperature along the water flow in the heatsink is hard to access. Large temperature variation can occur at high power levels, resulting in different effects on the chips of a multi-chip module. In this study, the inlet water temperature is measured, and the water flow rate is set to be constant.

#### III. MONITORING METHOD FOR SOLDER DEGRADATION

The above model was used to define the Condition Indicator $\gamma$ and explain the effect of external environmental temperatures on the change of split heat transfer rates in the two directions. To detect the change and compute the Condition Indicator $\gamma$, it is necessary to capture the temperature changes caused by the degradation. Because the thermal resistances are required in the $\gamma$ computation, the temperature measurement points are set at nodes for which the equivalent thermal resistance from the chip can be estimated qualitatively. For heat transfer by conduction, the thermal resistance in the healthy condition can generally be found in the datasheet or calculated from the components and their material properties. Heat transfer by convection is usually very complex and affected by many factors. Measuring the heat transfer coefficient online would require additional complex instruments, and some points may be difficult to access. Therefore, the thermal sensing points are set in locations where heat transfers by conduction. They are respectively on the top side of the heatsink and below but close to the top surface of the silicone gel, shown as the points labelled $T_h$ and $T_g$ in Fig.1. The temperature measurement points are vertically aligned to the centres of the corresponding chips of concern. Other points shown in Fig. 1 are used for the analysis later in the paper.

For the level of degradation to be detected, the solder layer thermal resistance increases by $20{\sim}50\%$. The junction temperature rises by about 10 K and that at the module case by $1{\sim}2$ K [23]. It is assumed that the material properties are unaffected in the temperature range. It is also assumed that the silicone gel remains healthy at this early stage and its thermal properties remain the same. The heat transfer system from the junction to the measuring points can then be regarded as approximately linear, and the principle of superposition applies [21].

#### A. Calculation of Condition Indicator

In Fig.1, the measurement points aligned to the device chips are marked as $T_{gi}$ and $T_{hi}$, i=1,...4, in the 4-chip example. For each chip, heat is assumed to flow either upwards via the silicone gel or downwards to the heatsink. A set of equations similar to (1), but referenced to the temperature measurement points, is:

$$\begin{cases}
P(T_{\rm J}) = \dot{Q}_{1} + \dot{Q}_{2} \\
\dot{Q}_{1} = \frac{T_{\rm J} - T_{\rm g}}{R_{\rm th10}} \\
\dot{Q}_{2} = \frac{T_{\rm J} - T_{\rm h}}{R_{\rm th20} + R_{\rm th21}}
\end{cases} \tag{6}$$

Then Condition Indicator is defined as:

$$\gamma = \frac{T_{\rm J} - T_{\rm g}}{T_{\rm J} - T_{\rm h}} \frac{R_{\rm th20} + R_{\rm th21}}{R_{\rm th10}} \tag{7}$$

Again, suppose that the solder degradation causes the junction-to-case thermal resistance to increase by  $\Delta R_{\text{th}20}$  and the junction temperature to increase by  $\Delta T_{\text{J}}$ , then

$$\gamma = \frac{T_{\rm J0} - T_{\rm g} + \Delta T_{\rm J}}{T_{\rm J0} - T_{\rm h} + \Delta T_{\rm J}} \frac{R_{\rm th20} + R_{\rm th21} + \Delta R_{\rm th20}}{R_{\rm th10}} \tag{8}$$

The ratio of the heat transfer rates, $\gamma$, depends on the temperature differences from the junction to the measurement point of the silicone gel and to the top of the heatsink respectively, and the corresponding thermal resistances. To determine the solder degradation level, the value of $\gamma$ is to be compared with that before degradation, at the same electric operating point and external environmental condition. The difficulty is that the device junction temperature is not measurable and $\Delta R_{\text{th}20}$ changes with aging. By exploiting the temperature dependence of the power loss, a bespoke iterative algorithm is developed to compute the junction temperature under a known thermal resistance increment $\Delta R_{\text{th}20}$. As the solder layer aging can be emulated by inserting a thermal conductive pad (thermal interface material, TIM) between the baseplate and heatsink [22]-[24], a series of $\Delta R_{\text{th}20}$ values can be used in calibration (as detailed in Section IV). The bespoke iterative algorithm can then find the device junction temperature and power loss that satisfy the measured silicone gel and heatsink temperatures, and Condition Indicator $\gamma$ can be calculated. The calibration results will be collected to build up look-up tables for solder degradation monitoring (as described later).

#### B. Bespoke Iterative Algorithm

The iterative algorithm includes device thermal and electrical models to find device temperature, power loss and Condition Indicator  $\gamma$  for a given  $\Delta R_{\text{th}20}$  with silicone gel and heatsink temperatures ( $T_{\text{g}}$  and  $T_{\text{h}}$ ) measured in the thermal steady state.

The electrical model computes the power loss of the device at a junction temperature  $T_J$  and electric operating point. With a fixed DC link voltage of the converter, the latter is represented by the IGBT current  $I_c$  and gate signal  $V_{ge}$ . The forward current of the antiparallel diode is also denoted as  $I_c$ . The power loss includes both on-state and switching losses, whose calculation can be based on datasheet; deviation from module to module can be considered by adding a bias in the result. The model outputs the total power loss  $P(T_J)$  for the IGBT or diode, as shown in Fig. 4. The output will be fed to the thermal model.

The thermal model, as shown in Fig. 5, is built in terms of the heat transfer rates in the two directions and power balance in the steady state, to compute the device temperature  $T_J'$ . Measurands  $T_g$  and  $T_h$  as well as the total power loss P at  $T_J$ from the electrical model are the inputs to the thermal model with known increment of thermal resistance  $\Delta R_{th20}$ . The model outputs the estimated junction temperature  $T_J'$  and the heat transfer rates in the two directions.



Fig. 4 Electrical model of an IGBT or diode chip.



Fig. 5 Thermal model of an IGBT or diode chip.

The iterative algorithm shown in Fig.6 alternates between the two models until the difference between the junction temperature estimated by the thermal model $(T_J')$ and the junction temperature used in the electrical model $(T_J)$ falls within the given accuracy specification. In other words, it drives $T_J'-T_J$ towards zero until convergence. Then the Condition Indicator $\gamma$ is derived.

The power loss is a bridging agent to determine the device temperature, without direct effect on the Condition Indicator. Temperature measurements in the two directions are the main factors affecting the accuracy.
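A minimal sketch of this kind of iteration is given below. It is not the exact flow of Fig. 6: the electrical model is replaced by a hypothetical loss that rises linearly with $T_J$ (`power_loss`), the thermal model solves (6) for $T_J'$, and the function names, thermal resistances and measured temperatures are all assumed values for illustration.

```python
def power_loss(t_j, p_ref=100.0, k=0.2, t_ref=25.0):
    """Hypothetical electrical model: loss (W) rising linearly with T_J (degC)."""
    return p_ref + k * (t_j - t_ref)


def thermal_model(p, t_g, t_h, r_th10, r_down):
    """Solve the power balance in (6) for T_J' given the total loss p."""
    g_up, g_down = 1.0 / r_th10, 1.0 / r_down
    return (p + g_up * t_g + g_down * t_h) / (g_up + g_down)


def solve_junction(t_g, t_h, r_th10, r_n, d_rth20=0.0, tol=1e-6, max_iter=100):
    """Iterate the two models until |T_J' - T_J| < tol; return (T_J, P, gamma)."""
    r_down = r_n + d_rth20      # junction-to-heatsink resistance incl. degradation
    t_j = max(t_g, t_h)         # initial guess
    for _ in range(max_iter):
        t_j_new = thermal_model(power_loss(t_j), t_g, t_h, r_th10, r_down)
        converged = abs(t_j_new - t_j) < tol
        t_j = t_j_new
        if converged:
            break
    else:
        raise RuntimeError("junction temperature iteration did not converge")
    q1 = (t_j - t_g) / r_th10   # upward heat flow
    q2 = (t_j - t_h) / r_down   # downward heat flow
    return t_j, power_loss(t_j), q1 / q2


# Hypothetical steady-state measurements (degC) and resistances (K/W).
t_j, p, gamma = solve_junction(t_g=47.63, t_h=30.72, r_th10=40.0, r_n=0.30)
print(t_j, p, gamma)
```

With these assumed values the update is a contraction, because the slope of the loss model is small compared with the sum of the two path conductances, so a plain fixed-point iteration is sufficient. A steeper loss characteristic might need damping or a root finder instead.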

#### C. Establishing $\epsilon \gamma - \Delta R_{th20}(\gamma_n)$ Look-up Tables

In calibration, the change of thermal resistance can be known by inserting thermal conductive pads under the baseplate as stated above [22]-[24]. Another way is to use pre-aged power modules with thermal resistance measured off-line [11]. With the thermal resistance known, Condition Indicator $\gamma$ can be derived using the bespoke iterative algorithm with measurands $T_g$ and $T_h$. A $\Delta R_{th20} - \gamma$ look-up table can be established for each case of $T_{1,\infty}$, $h_1$ and $T_{2,\infty}$, $h_2$ as well as the electric operating point ($I_c$ and $V_{ge}$). However, for on-line solder degradation monitoring, $\Delta R_{th20}$ is to be determined rather than given.



Fig. 6 Bespoke iterative algorithm.

One approach is based on the $\Delta R_{\text{th}20} - \gamma$ look-up tables. With $\Delta R_{\text{th}20}$ increased in small steps from 0, the bespoke iterative algorithm, using the measured $T_g$ and $T_h$, $I_c$ and $V_{\text{ge}}$, runs a number of iterations to null $T_J' - T_J$, and then computes $\gamma$. The obtained $\gamma$ and $\Delta R_{\text{th}20}$ are then compared with the $\Delta R_{\text{th}20} - \gamma$ look-up table. The process is repeated until they match. This method requires extensive computing resources and consumes considerable time. For a whole system, multiple indices need to be obtained and this method may not be practical for real-time monitoring. Hence this paper proposes another approach to calculate $\gamma$ efficiently based upon $\epsilon \gamma - \Delta R_{\text{th}20}(\gamma_n)$ look-up tables, which is presented in depth in Section III.D. The $\epsilon \gamma - \Delta R_{\text{th}20}(\gamma_n)$ look-up table is established as follows:

First, calculate the Condition Indicator for a healthy device, denoted $\gamma_n$, as shown in Case 1 of Fig.7. $R_{\text{th10}}$ and $R_n$ respectively represent the thermal resistances from the junction to the top of the silicone gel and to the heatsink in the healthy condition.

Second, with the same electric operating point and external condition, the device in calibration has solder degradation of $\Delta R_{th20}$, and the measured temperatures change to $T_g$' and $T_h$'. The junction-to-heatsink thermal resistance is $R_n+\Delta R_{th20}$. To obtain $\epsilon\gamma$, $\Delta R_{th20}$ is deliberately set to zero. In other words, it is assumed that the device is still healthy. In this case, as shown in Case 2 of Fig.7, the Condition Indicator, denoted $\gamma'$, can still be computed using the bespoke iterative algorithm with the inputs $R_{th10}$ and $R_n+0$, $T_g$' and $T_h$', as well as $I_c$ and $V_{ge}$.

Finally, the difference of the Condition Indicator between the above two cases, $\epsilon \gamma = \gamma_n - \gamma'$, is obtained; it results from the solder degradation $\Delta R_{th20}$. In this way, a set of $\epsilon \gamma - \Delta R_{th20}(\gamma_n)$ look-up tables corresponding to varying electric operating points and external environmental conditions is obtained from the calibration process. They are to be employed in online monitoring of solder degradation, as detailed in the next subsection. For the same type of power modules, calibration can be performed on a representative sample. The initial differences between individual modules due to manufacturing tolerance will be reflected in the output of the above procedures. However, it is expected that the effect will be small with modern manufacturing consistency.



Fig. 7 Schematic of attaining  $\epsilon\gamma$ - $\Delta R_{th20}$  mapping in calibration ( $R_n = R_{th20} + R_{th21}$ ).
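The three calibration steps can be sketched as follows for a single operating point. This is an illustration under stated assumptions, not the calibration rig of Section IV: the measured $T_g$' and $T_h$' are emulated by a simple two-path network with hypothetical external resistances `R_1X` and `R_2X`, the loss model is a hypothetical linear function of $T_J$, and all names and values are assumed.

```python
P_REF, K, T_REF = 100.0, 0.2, 25.0  # hypothetical linear loss model (W, W/K, degC)
R_TH10, R_N = 40.0, 0.30            # junction to gel point / to heatsink top, K/W
R_1X, R_2X = 20.0, 0.10             # hypothetical external resistances, K/W
T1_INF, T2_INF = 40.0, 20.0         # ambient and inlet water temperatures, degC


def loss(t_j):
    return P_REF + K * (t_j - T_REF)


def fixed_point(t_up, r_up, t_down, r_down, tol=1e-9, max_iter=200):
    """Iterate T_J until the loss model and the two-path thermal model agree."""
    g_up, g_down = 1.0 / r_up, 1.0 / r_down
    t_j = max(t_up, t_down)
    for _ in range(max_iter):
        t_new = (loss(t_j) + g_up * t_up + g_down * t_down) / (g_up + g_down)
        if abs(t_new - t_j) < tol:
            return t_new
        t_j = t_new
    raise RuntimeError("iteration did not converge")


def emulate_measurement(d_rth20):
    """Stand-in for the calibration measurement: (T_g', T_h') for a given increment."""
    r_up, r_down = R_TH10 + R_1X, R_N + d_rth20 + R_2X
    t_j = fixed_point(T1_INF, r_up, T2_INF, r_down)
    t_g = t_j - (t_j - T1_INF) / r_up * R_TH10
    t_h = T2_INF + (t_j - T2_INF) / r_down * R_2X
    return t_g, t_h


def gamma_assuming_healthy(t_g, t_h):
    """gamma': indicator computed with the increment deliberately set to zero."""
    t_j = fixed_point(t_g, R_TH10, t_h, R_N)
    return ((t_j - t_g) / R_TH10) / ((t_j - t_h) / R_N)


gamma_n = gamma_assuming_healthy(*emulate_measurement(0.0))  # Case 1 of Fig. 7
table = []                                                   # (eps_gamma, d_rth20)
for d_r in [0.0, 0.025, 0.05, 0.075, 0.10]:
    gamma_prime = gamma_assuming_healthy(*emulate_measurement(d_r))  # Case 2
    table.append((gamma_n - gamma_prime, d_r))

for eps_gamma, d_r in table:
    print(f"eps_gamma = {eps_gamma:.6f}  ->  d_rth20 = {d_r:.3f} K/W")
```

In this emulation $\epsilon\gamma$ grows monotonically with $\Delta R_{th20}$, which is what allows the table to be inverted during monitoring. Whether that also holds for a real module at every operating point would need to be confirmed by the calibration data.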

#### D. Condition Monitoring Strategy

Condition Indicator  $\gamma$  is computed in real-time to detect the level of solder degradation using the pre-established  $\epsilon \gamma - \Delta R_{\text{th}20} (\gamma_n)$  look-up tables. This is achieved in a nested algorithm with two stages, as shown in Fig. 8.

In Stage A, the algorithm first assumes that $\Delta R_{\text{th}20}=0$ and computes the Condition Indicator, denoted as $\gamma'$, from the measurands $T_{\text{g}}$, $T_{\text{h}}$ and the electric operating point. The difference $\epsilon \gamma = \gamma_n - \gamma'$ is then obtained. Note that $\gamma'$ and $\gamma_n$ (the Condition Indicator of a healthy module) are for the same external environmental condition ( $T_{1,\infty}$, $h_1$ and $T_{2,\infty}$, $h_2$ ) and electric operating point ( $I_c$ and $V_{\text{ge}}$ ). If $\epsilon \gamma = 0$, the module is without solder degradation, i.e. $\Delta R_{\text{th}20}=0$. Otherwise, it is aged, and the increment of thermal resistance $\Delta R_{\text{th}20}$ is determined using the $\epsilon \gamma - \Delta R_{\text{th}20}$ ( $\gamma_n$ ) look-up table. Once $\Delta R_{\text{th}20}$ is determined, the bespoke iterative algorithm is used again in Stage B with the measurands $T_g$, $T_h$, the electric operating point and the thermal resistances ( $R_{\text{th}10}$, $R_n+\Delta R_{\text{th}20}$ ). The output includes the Condition Indicator, the internal junction temperature and the total power loss.

It may appear that  $T_{1,\infty}$ ,  $h_1$  and  $T_{2,\infty}$ ,  $h_2$  are not needed in the calculation. Indeed, once  $\Delta R_{th20}$  is known, the calculation is entirely confined between the measured temperatures  $T_g$  and  $T_h$ . However,  $T_g$  and  $T_h$  depend on the external or boundary condition and the heat source. This is why  $\Delta R_{th20}$  has to be determined using the  $\epsilon \gamma - \Delta R_{th20}$  ( $\gamma_n$ ) look-up table corresponding to the same  $T_{1,\infty}$ ,  $h_1$  and  $T_{2,\infty}$ ,  $h_2$  and the electric operating point.
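The two-stage strategy can be illustrated with the following minimal Python sketch. It is not the authors' implementation: `gamma_and_tj` is a hypothetical stand-in for the bespoke iterative algorithm, $\gamma$ is assumed to be $P_1/P_2$, the look-up table is assumed to be a list of ($\epsilon\gamma$, $\Delta R_{\text{th}20}$) pairs sorted by strictly increasing $\epsilon\gamma$ for the current operating point and external condition, and linear interpolation between table entries is an assumption of this sketch.

```python
from bisect import bisect_left


def gamma_and_tj(t_g, t_h, r_th10, r_down, loss, iters=100):
    """Hypothetical stand-in for the bespoke iterative algorithm.

    Iterates T_J so that the upwards and downwards heat flows sum to the
    temperature-dependent loss; returns (gamma, T_J) with gamma = P1/P2.
    """
    t_j = max(t_g, t_h)
    for _ in range(iters):
        t_j = (loss(t_j) + t_g / r_th10 + t_h / r_down) / (1 / r_th10 + 1 / r_down)
    return ((t_j - t_g) / r_th10) / ((t_j - t_h) / r_down), t_j


def interp_delta_r(eps, table):
    """Piecewise-linear look-up of delta_R, clamped at the table ends."""
    xs = [e for e, _ in table]
    if eps <= xs[0]:
        return table[0][1]
    if eps >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, eps)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (eps - x0) / (x1 - x0)


def monitor(t_g, t_h, gamma_n, table, r_th10, r_n, loss):
    """Return (delta_R, gamma, T_J, total loss) from measured T_g and T_h."""
    # Stage A: assume delta_R = 0 and compute the provisional gamma'.
    gamma_p, _ = gamma_and_tj(t_g, t_h, r_th10, r_n, loss)
    delta_r = interp_delta_r(gamma_n - gamma_p, table)
    # Stage B: repeat the calculation with the identified delta_R.
    gamma, t_j = gamma_and_tj(t_g, t_h, r_th10, r_n + delta_r, loss)
    return delta_r, gamma, t_j, loss(t_j)
```

In use, `table` and `gamma_n` would be selected according to the measured $T_{1,\infty}$, $h_1$, $T_{2,\infty}$, $h_2$ and electric operating point before `monitor` is called.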

Section III assumes that the packaging material properties are unaffected within the temperature range, whereas in reality they are temperature dependent [28]. The approach is nevertheless applicable because it compares the $\gamma$ signature before and after solder degradation. Because of the larger negative temperature coefficient of thermal conductivity of the silicone gel, the heat transfer rate change in the secondary direction calculated using the thermal conductivity at a relatively low temperature would be higher than the actual change. As a result, the Condition Indicator obtained as in this paper would be more sensitive, which is an advantage. This will not cause any error in estimating the degradation level as long as the calibration process is conducted using the same thermal conductivities. It should however be noted that using thermal conductivities at lower temperatures will lead to an estimated junction temperature lower than the actual value. This issue can be addressed in the future by measuring the junction temperature during the well-controlled calibration process, using e.g. TSEPs.

The proposed method can accept errors in power loss calculation, as long as the errors are consistent between calibration and the use of condition monitoring during operation. The key point of the study is that, with solder degradation, changes of the heat transfer rate should be detected in both directions. If the change is only detected in the traditional downwards direction, the inferred solder degradation would have a large error. The relative changes are more important than the absolute heat transfer rate values. It is important that the calculation is consistent with the calibration.

The proposed condition monitoring is suitable for online implementation in thermal steady state. In systems like offshore wind turbines, there are opportunities to reach such a steady state between diurnal variations, as the turbulence in the frequency range of 0.01~0.1 Hz can be largely absorbed by the moment of inertia of the blades. The thermal time constant of an IGBT power module is typically 100~200 ms.

To validate the proposed condition monitoring method, an inductively loaded half-bridge inverter test rig is built, as detailed in the next section.



Fig. 8 Condition monitoring strategy.

# IV. EXPERIMENT

#### A. Test Rig

A commercially available SEMITRANS<sup>®</sup> power module SKM50GB12T4 (1200V/62A) is used as the test specimen. The half-bridge module consists of two IGBTs and two diodes, whose datasheet forward characteristics are shown in Figs. 9(a) and (b) [29].

The IGBT has a low zero temperature coefficient (ZTC) point at about 14 A. It is easy to keep it working in the positive temperature coefficient (PTC) region. The PiN diode has a high ZTC point of 58 A, which is only slightly lower than the rated current. If a 10% margin is to be kept, the diode always works in the negative temperature coefficient (NTC) region. If the temperature of an aged chip increased by about 10 °C, the maximum change of the on-state voltage would be about 2%. The change of switching losses, usually more sensitive depending on switching frequency, would be about 4% [24]. While this needs to be considered in the iterative algorithm, the temperature distribution would be more directly affected by the change of the thermal resistance, which provides another sensible indication of the solder degradation.



Fig. 9 Device forward characteristics.

Fig. 10(a) shows the half-bridge inverter circuit. Fig. 10(b) shows the experimental test rig and a close-up of the test IGBT module with thermocouples (TCs): 4 TCs are embedded in the silicone gel and 4 more TCs are in the air inside the housing. A 200 V DC link is provided by a power supply. The inverter is controlled from a dSPACE-1103 platform using Matlab/Simulink. The amplitude (peak value) of the 50 Hz load current is controlled in steps of 10 A, from 10 A to 50 A. The load current and gate signals are recorded. The PWM switching frequency is 2550 Hz. The power module is mounted on a water-cooled heatsink (Hi-Contact 416601). The inlet water temperature is controlled by a chiller (Lauda WK4600, 6 litre/min). It is set at two different temperature levels with certain tolerance:  $T_{2,\infty} = ~27.7^{\circ}\text{C} ~(\pm 0.6^{\circ}\text{C})$  or  $T_{2,\infty} = ~36.8^{\circ}\text{C} ~(\pm 0.8^{\circ}\text{C})$ . The temperature and flow rate at the inlet are measured by a thermocouple and a flowmeter. The ambient temperature is  $T_{1,\infty} = ~20^{\circ}\text{C}$ , monitored by a thermometer.



(a) Schematic of a half-bridge inverter test rig.

(b) The actual half-bridge inverter test rig.

Fig. 10 Half-bridge inverter test rig.

In this paper, a set of miniature type-K IEC 1/0.2 mm thermocouples is used. Thermal sensors are installed in line with [22]-[24] and [30]-[32]. Four miniature thermocouples are placed at the respective measurement spots: approximately 0.8 mm below the top surface of the silicone gel, the back of the baseplate, and the top surface of the heatsink. The projected thermocouple axes are located at the centre of the chip. To measure the corresponding hotspot temperatures of the gel and internal air, 4 holes are drilled in the top of the housing, as shown in Fig. 10(b). The thermocouples are inserted into the housing through the holes. To ensure accurate positioning of the sensors implanted in the silicone gel, the work was conducted under an optical microscope with white light illumination. Embedding thermocouples in silicone gel is intended for laboratory-based experimental study only. In practical applications in the future, temperature sensors such as NTC thermistors and distributed thermal FBGs could, when needed, be pre-integrated into the packaging. The temperature is acquired at 1 Hz using an NI 9213 DAQ through LabVIEW.

Solder degradation is emulated using thermal interface material (TIM) pads inserted between the module baseplate and heatsink, similar to [22]-[24] and [32]. The pad has a thermal conductivity of 4.0 W/m·K and a thickness of 1 mm or 0.5 mm. To quantify the equivalent thermal resistance from the junction to the silicone gel measurement spot, gel of the same thickness was pasted on a chip-size plate with a heat source. A thermal insulation pad was used as an enclosure to impede heat dissipation to the environment. The top plate and gel temperatures are measured under known input power. Based on these, the equivalent junction-to-gel thermal resistance is measured to be 40.18 °C/W for the IGBT and 60.6 °C/W for the diode.

The electro-thermal model and the $\epsilon \gamma - \Delta R_{\text{th20}}(\gamma_n)$ look-up tables are obtained with the above arrangements. The equivalent junction-to-heatsink thermal resistances for the IGBT or diode corresponding to the four levels of degradation are labelled as $R_0$, $R_1$, $R_2$ and $R_3$ and listed in Table I. The scenario with $R_0$ is treated as the healthy condition, and the pad surface in contact with the heatsink is considered as the new case (baseplate) of the module. This has the same effect of increasing the total thermal resistance and will not change the general conclusion of the experimental validation.

TABLE I
JUNCTION-TO-HEATSINK THERMAL RESISTANCE IN EXPERIMENTS

| Chip type          | IGBT1 |       |       |       | Diode1 |       |       |       |
|--------------------|-------|-------|-------|-------|--------|-------|-------|-------|
| Thermal resistance | $R_0$ | $R_1$ | $R_2$ | $R_3$ | $R_0$  | $R_1$ | $R_2$ | $R_3$ |
| Value (°C/W)       | 1.54  | 1.94  | 2.26  | 2.31  | 1.87   | 2.31  | 2.71  | 2.80  |
| Normalized         | 1.00  | 1.26  | 1.46  | 1.50  | 1.00   | 1.23  | 1.45  | 1.50  |

All sensors need to be carefully calibrated before they are employed or incorporated. For the same testing system, the uncertainties of the sensors are usually consistent. The measurement accuracy can be improved by acquiring 60 samples of data and taking the average value. The silicone gel of a power module may vibrate during operation; in-built thermocouple sensors in the gel may therefore be displaced, and large temperature deviations could be measured during loaded operation. This requires evaluation for practical application. However, the temperature deviation due to vibration can be reduced for in-built sensors at thermal steady state by averaging a 60-sample set of values. In this paper, the silicone gel vibration has not significantly affected the temperature measurement results. The temperature measurements with the module in operation and not in operation were compared at a steady-state current of 50 A over a 60 s time window (60 samples). The difference in temperature standard deviation (uncertainty) between these two conditions is 0.0041°C. Therefore, the temperature measurement uncertainty caused by soft silicone gel vibration may confidently be neglected.
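The 60-sample averaging described above can be sketched as follows. This is a minimal illustration only; `window_stats` is a hypothetical helper name, and the sample standard deviation is assumed here as the measure of uncertainty.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def window_stats(samples: Sequence[float], n: int = 60) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the latest n readings.

    With the 1 Hz acquisition rate used in the experiment, n = 60
    corresponds to a 60 s window.
    """
    if len(samples) < n:
        raise ValueError(f"need at least {n} samples, got {len(samples)}")
    window = list(samples)[-n:]
    return mean(window), stdev(window)
```

Comparing the standard deviations returned for a window recorded in operation and one recorded out of operation gives the kind of difference quoted in the text.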

#### B. Experiment Results and Discussion

Two sets of experiments are conducted with the same heat transfer coefficient pair ( $h_1$  and  $h_2$ ) but different environment temperatures:  $T_{1,\infty} \approx 20^{\circ}$ C and  $T_{2,\infty} \approx 27.7^{\circ}$ C, or  $T_{1,\infty} \approx 20^{\circ}$ C and  $T_{2,\infty} \approx 36.8^{\circ}$ C, to show the effect of the thermal boundary (external environmental temperature) on the split of heat transfer rate, as discussed in Section II. With a set of pre-set junction-to-heatsink thermal resistance values and inverter currents from 10 A to 50 A, the Condition Indicator is derived.

A set of representative results regarding IGBT 1 are presented below. The environment temperatures are  $T_{1,\infty} \approx$ 20°C and  $T_{2,\infty} \approx$  36.8°C, and the peak load current is about 40 A. Note that the junction-to-heatsink thermal resistance value  $R_0$  represents the healthy condition while  $R_1$  represents a case of solder degradation.

The current through IGBT 1 is shown in Figs. 11(a) and (b) for 20 ms and this is not apparently affected by the solder degradation. The measured temperatures at different locations are shown in Figs. 11(c) and (d). Small variation ( $<1^{\circ}$ C) is noted for the ambient air temperature. The temperature of inlet water changes in each cooling cycle of the chiller, affecting temperatures at the heatsink, baseplate, and silicone gel.



Fig. 11 Results regarding IGBT1 at 40 A.

The temperatures will be used for condition monitoring at the same electric operating point. In this case, the silicone gel temperature (>58°C) is much higher than that of the ambient (20°C), particularly after degradation, suggesting that a part of the power loss is indeed dissipated through the silicone gel, rather than all in the baseplate/heatsink direction.

In spite of the water cooling cycle effect, it is possible to take a snapshot of, say, 60 seconds, during which the converter operates in a slow-changing or quasi-steady thermal state. The measurement readouts are fed into the proposed condition monitoring algorithms. Without and with solder degradation, the point-by-point temperature measurements used in the bespoke iterative algorithm are shown in Figs. 12 (a) and (b). The notation of the variables is the same as in Figs. 1 and 3. The temperature of the module baseplate ( $T_b$ ) is also measured for checking the estimation results, as shown later. In practical application, there would be no need to fit thermal sensors at the bottom of the baseplate. The thermal resistance in the heatsink direction under IGBT 1 is increased by 26%. In the computation, all thermal resistance values including the increment due to aging are known. The purpose of the computation is to show the effect of the thermal resistance change from  $R_0$  to  $R_1$ .

As a boundary condition required by the generic model, the inlet water temperature  $(T_{2, \infty})$  and the flow rate in the two cases are very close. The effect of the 0.5~1°C variation of the ambient air temperature  $(T_{1,\infty})$  is negligible because the degradation has caused the gel top temperature  $(T_g)$  to change by 5 °C while the heatsink temperature  $(T_h)$  changes only by 0.2 °C. The estimated junction temperature  $(T_J)$  and Condition Indicator  $(\gamma)$  are shown in Figs. 12 (c) and (d). Comparing the results shows that a 26% increment of the thermal resistance has caused the Condition Indicator of IGBT 1 to increase by 23.5%, with fluctuation less than 0.5%. The sensitivity of the Condition Indicator is approximately 0.36%/(°C/W) in this case. The sensitivity varies with the level of solder degradation, the electric operating point and the external environment condition, as reported at the end of this section.

As shown previously in Fig. 3, the average heat transfer rate in an individual path should be the same between any two points along the path. Based on this, the accuracy of the estimated junction temperature can be verified, because in the experiments the equivalent thermal resistances from the junction to the heatsink and from the baseplate to the heatsink are known, and the baseplate and heatsink temperatures are measurable. The heat transfer rate downwards to the heatsink should satisfy (9): the downwards heat transfer rate  $(P_2)$  derived from the estimated junction temperature  $(T_J)$, the measured heatsink temperature and the related thermal resistance should equal that calculated from the measured temperature difference between the baseplate and heatsink and the corresponding thermal resistance. Fig. 13 shows that in either the healthy or the degraded condition, the downwards heat transfer rates  $(P_2)$  derived using the two sides of (9) are approximately equal. This indicates that the estimated junction temperature and power loss are close to the real values.

$$\frac{T_{\rm J} - T_{\rm h}}{R_{\rm th20} + \Delta R_{\rm th20} + R_{\rm th21}} = \frac{T_{\rm b} - T_{\rm h}}{\Delta R_{\rm th20} + R_{\rm th21}} \tag{9}$$
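The consistency check expressed by (9) can be sketched in Python as below. The function names and the 5% relative tolerance are assumptions made for illustration; the paper only states that the two sides are approximately equal.

```python
from math import isclose
from typing import Tuple


def downward_heat_rates(t_j: float, t_b: float, t_h: float,
                        r_th20: float, r_th21: float,
                        delta_r: float = 0.0) -> Tuple[float, float]:
    """Evaluate both sides of (9) for the downwards heat transfer rate P2.

    lhs uses the estimated junction temperature T_J, rhs uses the measured
    baseplate temperature T_b; both are referred to the heatsink T_h.
    """
    lhs = (t_j - t_h) / (r_th20 + delta_r + r_th21)
    rhs = (t_b - t_h) / (delta_r + r_th21)
    return lhs, rhs


def junction_estimate_ok(lhs: float, rhs: float, rel_tol: float = 0.05) -> bool:
    """True if the two P2 values agree within rel_tol (assumed threshold)."""
    return isclose(lhs, rhs, rel_tol=rel_tol)
```

A large mismatch between the two returned values would suggest that the estimated $T_{\rm J}$, and hence the estimated power loss, is not reliable.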

Fig. 14 explains the building of an  $\epsilon \gamma - \Delta R_{\text{th}20}(\gamma_n)$  look-up table in calibration. It is a detailed illustration of Subsection III.C based on the experiment data. The Condition Indicator with no degradation is used as the reference, marked as  $\gamma_n$  in Fig. 14 (a). The  $\gamma$  value calculated for a degraded condition (i.e. the junction-to-heatsink thermal resistance is  $R_1$ ) is shown in Fig. 14 (b), with a known increment  $\Delta R_{\text{th}20}$ . However, if the thermal resistance increment is deliberately made zero in this case, namely  $\Delta R_{\text{th20}}=0$ , a provisional Condition Indicator, denoted as  $\gamma'$ , is derived from the measured temperatures in Fig. 12(b) through the bespoke iterative algorithm. This provisional result  $\gamma'$  is shown in Fig. 14 (c). Its difference from  $\gamma_n$  is shown in Fig. 14 (d) and is referred to as  $\epsilon \gamma$ , which corresponds to a certain level of solder degradation  $\Delta R_{\text{th}20}$  under the given operating point and external environment condition. A look-up table can then be constructed between  $\epsilon \gamma$  and  $\Delta R_{\text{th}20}$ , as shown in Fig. 7. For online condition monitoring as in Fig. 8,  $\epsilon\gamma$  is calculated first. Then  $\Delta R_{\text{th}20}$  is obtained using the  $\epsilon \gamma - \Delta R_{\text{th}20} (\gamma_n)$  look-up table. Finally, with  $\Delta R_{\text{th}20}$ , the true value of the Condition Indicator  $\gamma$  (Fig. 14 (b)) is obtained by going through the bespoke iterative algorithm once again.





Fig. 13 Average downwards heat transfer rate from chip or baseplate to heatsink in two cases: healthy and aged conditions.





As the chip temperature increases with solder degradation, its power loss may also increase depending on the electric operating point and device characteristics. Fig. 15 shows the increased power loss ( $\Delta P_1$  and  $\Delta P_2$ ) in the form of heat being dissipated in the two opposite directions. 10-20% of the total increased power loss  $(\Delta P)$  passes through the silicone gel. In this case, a method based on the loss estimation in the downwards direction only is likely to give large errors. The method of Xiang *et al.* [23] is taken as an example and analysed below.



Fig. 15 Increased IGBT 1 power loss ( $\Delta P$ ) dissipated in two directions.

The actual junction-to-heatsink thermal resistance  $(R_{thjh})$  can be expressed in terms of the junction-to-heatsink temperature difference  $(T_J-T_h)$  and the downwards power loss  $(P_2)$ :

$$R_{\rm thjh} = \frac{T_{\rm J} - T_{\rm h}}{P_2} \tag{10}$$

However, by the previous method, the junction temperature  $(T'_{\rm J})$  is derived from the measured downwards power loss  $(P_2)$  and is likely to be lower than the actual junction temperature (if in the PTC region). The junction-to-heatsink thermal resistance  $(R'_{\rm thjh})$  can then be written as

$$R'_{\rm thjh} = \frac{T'_{\rm J} - T_{\rm h}}{P_2} \tag{11}$$

The difference of the junction-to-heatsink thermal resistance estimated in (10) and (11) is essentially due to the junction temperature. An under or over-estimated junction temperature will cause a large error in condition monitoring. From (11), the increment of the junction-to-heatsink thermal resistance  $\Delta R'_{\text{th20}}$  of IGBT 1 can be obtained. The ratio of  $\Delta R'_{\text{th20}}$  and the actual  $\Delta R_{\text{th20}}$  under the ambient and inlet temperatures of  $T_{1,\infty} \approx 20^{\circ}$ C and  $T_{2,\infty} \approx 37.5^{\circ}$ C is shown in Fig. 16.  $\Delta R'_{\text{th20}}$  is approximately 70%~90% of the actual  $\Delta R_{\text{th20}}$ , suggesting that the thermal resistance increment estimated with the assumption that all the power loss dissipates downwards is less than the actual increment.
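The dependence on the junction temperature can be made explicit by subtracting (11) from (10). The short step below is added only as an illustration and assumes that the same measured $P_2$ and $T_{\rm h}$ are used in both expressions:

$$R_{\rm thjh} - R'_{\rm thjh} = \frac{T_{\rm J} - T_{\rm h}}{P_2} - \frac{T'_{\rm J} - T_{\rm h}}{P_2} = \frac{T_{\rm J} - T'_{\rm J}}{P_2}$$

An under-estimated junction temperature ( $T'_{\rm J} < T_{\rm J}$ ) therefore translates directly into an under-estimated thermal resistance, in proportion to the temperature error.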



Fig. 16 IGBT 1  $\Delta R'_{\text{th}20} / \Delta R_{\text{th}20}$  against actual increment of thermal resistance ( $\Delta R_{\text{th}20}$ ).

To appreciate the effect of external environment temperatures on the heat flow distribution in the two directions, the inlet water is adjusted to two temperatures:  $T_{2,\infty} \approx 28.2^{\circ}$ C and  $T_{2,\infty} \approx 37.5^{\circ}$ C. In both cases the ambient air temperature is  $T_{1,\infty} \approx 20^{\circ}$ C. Both IGBT 1 and Diode 1 are subjected to the varying increment of the junction-to-heatsink thermal resistance ( $R_{\text{thjh}}$  from  $R_0$  to  $R_3$ ) at the same load current of 40 A. Fig. 17 shows that for IGBT 1 and Diode 1, the proportions of the upwards and downwards heat transfer rates ( $P_1$  and  $P_2$ ) in the total power loss  $P$  are affected by the junction-to-heatsink thermal resistance. It also implies that a rise in the inlet water temperature leads to an increase in the percentage of the upwards heat transfer rate through the silicone gel  $(P_1/P)$  and a reduction in that downwards through the baseplate  $(P_2/P)$ . The effect of the ambient air temperature can be similarly analysed. The results show that although the environmental condition indeed affects the thermal dissipation, the proposed method can be decoupled from such effects and still effectively track the aging process. This is because the method only relies on the thermoelectrical response of the power module itself, which has a much shorter thermal time constant than the ambient and water cooling system. Fig. 17 also shows that increasing solder degradation lowers the proportion of heat flow through the baseplate and thus forces more in the other direction. This phenomenon can be amplified in the ratio of the heat transfer rates in the two opposite directions, as an indication of the solder health condition.



Fig. 17 Proportion of heat transfer rates in two directions.
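The proportions plotted in Fig. 17 follow from the two-path heat balance. A minimal Python sketch is given below; `heat_split` is a hypothetical helper, and it assumes that each path can be represented by a single lumped thermal resistance between the junction and the corresponding measurement point.

```python
from typing import Tuple


def heat_split(t_j: float, t_g: float, t_h: float,
               r_th10: float, r_thjh: float) -> Tuple[float, float]:
    """Return (P1/P, P2/P), the upwards and downwards shares of the loss.

    P1 flows from the junction to the gel measurement point through r_th10,
    P2 flows from the junction to the heatsink through r_thjh.
    """
    p1 = (t_j - t_g) / r_th10
    p2 = (t_j - t_h) / r_thjh
    total = p1 + p2
    return p1 / total, p2 / total
```

For fixed temperatures, a larger `r_thjh` reduces $P_2$ and so raises the upwards share, which is the trend seen with increasing degradation. In the real module the temperatures also shift with degradation, so this sketch only illustrates the direction of the effect.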

Similarly, a set of variables, namely the Condition Indicator  $\gamma$ , the discrepancy  $\epsilon \gamma$  and the junction temperature  $T_{\rm J}$ , are derived from experiment data for different levels of degradation, electric operating points and ambient conditions. Those for IGBT 1 and Diode 1 are shown in Figs. 18 and 19 respectively. It is clear that  $\gamma$  and  $\epsilon \gamma$  increase with the level of solder degradation, which is reflected in the junction-to-heatsink thermal resistance  $R_{\text{thjh}} = R_{\text{n}} + \Delta R_{\text{th}20}$ . A distinctive correspondence exists between  $\gamma$  or  $\epsilon\gamma$  and the increment of thermal resistance  $\Delta R_{\text{th}20}$ . Note that  $\gamma$  and  $\epsilon \gamma$  increase significantly as the thermal resistance increases from  $R_0$  to  $R_1$ , particularly with a greater temperature difference between the two external environments (e.g.  $T_{2,\infty} \approx 37.5^{\circ}$ C,  $T_{1,\infty} \approx 20^{\circ}$ C), which increases the sensitivity of the Condition Indicator. The method is adequately sensitive to distinguish solder degradation from 0% to 20%. This validates the condition monitoring method using the proposed indicator  $\gamma$ .

The assessment of the health condition is augmented by indicator  $\gamma$ . Recognizing the heat dissipation in the two opposite directions is important for estimating the increment of the junction-to-case (baseplate) thermal resistance, and this has been taken into account in the proposed generic model. With the condition monitoring strategy, there is no longer the need to measure the initial thermal resistance values for each module, which simplifies the calibration for modules within the manufacturing tolerance. The junction temperature is estimated for the safety of operation. The diagnosis does not need to distinguish between the PTC and NTC regions of the *i*-*v* curves of the IGBT and diode. In fact, the method is far less dependent on the temperature sensitivity of the device characteristics, and more directly reflects the changed internal condition of heat dissipation.

It is hoped that the proposed condition monitoring method, which makes use of the relative change of heat transfer rate in different directions, can provide useful additional information in practice. However, the method in its present form has potential limitations. For instance, it requires multi-point temperature sensing and fixing temperature sensors in the silicone gel which may be subject to the effects of vibration. Measurement errors can cause inaccuracies in the calculated results. Further investigation on sensing points reduction through modelling and internal sensor integration in the power module packaging would be necessary to improve the proposed method in these aspects.

#### V. CONCLUSION

This paper proposes a condition monitoring method for tracking the solder layer degradation in an IGBT power module. The method assesses the degradation from the split of heat dissipation between the two paths leaving the device, which is obtained by accurately measuring temperatures. A theoretical model is established to investigate the degradation response. A key variable, Condition Indicator  $\gamma$ , is proposed to indicate the change in the ratio of heat transfer rates in the opposing directions: upwards via the silicone gel and downwards via the baseplate. The proportion of heat dissipation in the two split paths depends not only on the thermal resistances but also on the chip temperature differences to the two external environments. Condition Indicator  $\gamma$  is closely related to the severity of degradation, and its response has been analysed with respect to different electric operating points and external temperature conditions.

A bespoke iterative algorithm is proposed to estimate the internal junction temperature with a known increment of the thermal resistance, and then calculate the Condition Indicator  $\gamma$ . With respect to an electric operating point and external environment condition, a set of  $\epsilon \gamma - \Delta R_{\text{th20}}(\gamma_n)$  look-up tables for different levels of solder degradation can be established in a calibration process by exploiting the bespoke iterative algorithm. Based on the look-up tables, a comprehensive evaluation of power module solder health condition can be performed. The condition monitoring method is based solely on external measurements and is not strongly reliant on device temperature dependency. IGBTs and diodes can be treated in the same way. The method is applied during the thermal steady-state operation and can be extended to multi-chip-in-parallel power modules. It is hoped that the study complements the current active research on TSEP monitoring.







#### REFERENCES

- [1] M. Liserre, R. Cárdenas, M. Molinas and J. Rodriguez, "Overview of Multi-MW Wind Turbines and Wind Parks," IEEE Trans Industrial Electronics, vol. 58, DOI: 10.1109/TIE.2010.2103910, no. 4, pp. 1081-1095, April 2011.
- [2] Z. Xu, F. Wang and Z. Liang, "Investigation of Si IGBT Operation at 200°C for Traction Applications," IEEE Transactions on Power Electronics, vol. 28, DOI: 10.1109/TPEL.2012.2217398, no. 5, pp. 2604-2615, May 2013.
- [3] P. Tavner, "How are we going to make offshore wind farms more reliable?," Supergen Wind, 2011.
- [4] D. Xiang, Li Ran, P. J. Tavner and S. Yang, "Control of a doubly fed induction generator in a wind turbine during grid fault ride-through," in IEEE Transactions on Energy Conversion, vol. 21, DOI: 10.1109/TEC.2006.875783, no. 3, pp. 652-662, Sept. 2006.
- [5] K. Ma, M. Liserre, F. Blaabjerg and T. Kerekes, "Thermal Loading and Lifetime Estimation for Power Device Considering Mission Profiles in Wind Power Converter," in IEEE Transactions on Power Electronics, vol. 30, DOI: 10.1109/TPEL.2014.2312335, no. 2, pp. 590-602, Feb. 2015.
- [6] W. Lai, M. Chen, L. Ran, H. Qin, O. Alatise, P. A. Mawby, "Study on the lifetime characteristics of power modules under power cycling conditions," IET Power Electronics, vol. 9, DOI: 10.1049/iet-pel.2015.0225, no. 5, pp. 1045-1052, Apr. 2016.

- [7] V. Smet, F. Forest, J.J. Huselstein, F. Richardeau, Z. Khatir, S. Lefebvre, and M. Berkani, "Ageing and Failure Modes of IGBT Modules in High-Temperature Power Cycling," in IEEE Transactions on Industrial Electronics, vol. 58, DOI: 10.1109/TIE.2011.2114313, no. 10, pp. 4931-4941, Oct. 2011.
- [8] S. Yang, D. Xiang, A. Bryant, P. Mawby, L. Ran, and P. Tavner, "Condition monitoring for device reliability in power electronic converters: A review," in IEEE Transactions on Power Electronics, vol. 25, no. 11, pp. 2734-2752, 2010.
- [9] M. Ciappa, "Selected failure mechanisms of modern power modules," Microelectronics Reliability, vol. 42, no. 4-5, pp. 653-667, 2002.
- [10] H. Huang and P. Mawby, "A Lifetime Estimation Technique for Voltage Source Inverter," in IEEE Transactions on Power Electronics, vol. 28, no. 8, pp. 4113-4119, 2013.
- [11] W. Lai, M. Chen, L. Ran, O. Alatise, S. Xu, and P. Mawby, "Low ΔTj stress cycle effect in IGBT power module die-attach lifetime modeling," *IEEE Transactions on Power Electronics*, vol. 31, DOI: 10.1109/TPEL.2015.2501540, no. 9, pp. 6575-6585, Sept. 2016.
- [12] B. Hu, J.O. Gonzalez, L. Ran, H. Ren, Z. Zeng, W. Lai, B. Gao, O. Alatise, H. Lu, C. Bailey, P. A. Mawby, "Failure and reliability analysis of a SiC power module based on stress comparison to a Si device," IEEE Transactions on Device and Materials Reliability, vol. 17, DOI: 10.1109/TDMR.2017.2766692, no. 4, pp. 727-737, 2017.
- [13] I. F. Kovačević, U. Drofenik and J. W. Kolar, "New physical model for lifetime estimation of power modules," The 2010 International Power Electronics Conference - ECCE ASIA-, Sapporo, DOI: 10.1109/ IPEC.2010.5543755, pp. 2106-2114, 2010.
- [14] B. Gao, F. Yang, M. Chen, L. Ran, I. Ullah, S. Xu, P. A. Mawby "A temperature gradient-based potential defects identification method for IGBT module," IEEE Transactions on Power Electronics, vol. 32, DOI: 10.1109/tpel.2016.2565701, no. 3, pp. 2227-2242, Mar 2017.
- [15] L. A. Watanabe and I. Omura, "Real-time failure imaging system under power stress for power semiconductors using scanning acoustic tomography (SAT)," Microelectronics Reliability, vol. 52, no. 9/10, pp. 2081-2086, 2012.
- [16] C. Maierhofer, M. Rollig, H. Steinfurth, M. Ziegler, M. Kreutzbruck, C. Scheuerlein, and S. Heck, "Non-destructive testing of Cu solder connections using active thermography," NDT E Int., vol. 52, pp. 103-111, 2012.
- [17] Y. Avenas, L. Dupont and Z. Khatir, "Temperature Measurement of Power Semiconductor Devices by Thermo-Sensitive Electrical Parameters-A Review," in IEEE Transactions on Power Electronics, vol. 27, DOI: 10.1109/TPEL.2011.2178433, no. 6, pp. 3081-3092, June 2012.
- [18] B. Ji, X. Song, W. Cao, V. Pickert, Y. Hu, J. W. Mackersie and G. Pierce, "In Situ Diagnostics and Prognostics of Solder Fatigue in IGBT Modules for Electric Vehicle Drives," in IEEE Transactions on Power Electronics, vol. 30, DOI: 10.1109/TPEL.2014.2318991, no. 3, pp. 1535-1543, March 2015.
- [19] Z. Hu, M. Du, K. Wei, "Online Calculation of the Increase in Thermal Resistance Caused by Solder Fatigue for IGBT Modules," in IEEE Transactions on Device and Materials Reliability, vol. 17, DOI: 10.1109/TDMR.2017.2746571, no. 6, pp. 1168-1178, Dec. 2017.
- [20] D. Brown, M. Abbas, A. Ginart, I. Ali, P. Kalgren, and G. Vachtsevanos, "Turn-off Time as An Early Indicator of Insulated Gate Bipolar Transistor Latch-up," in IEEE Transactions on Power Electronics, vol. 27, DOI: 10.1109/TPEL.2011.2159848, no. 2, pp. 479-489, Feb. 2012.
- [21] Z. Wang, B. Tian, W. Qiao and L. Qu, "Real-Time Aging Monitoring for IGBT Modules Using Case Temperature," in IEEE Transactions on Industrial Electronics, vol. 63, DOI: 10.1109/TIE.2015.2497665, no. 2, pp. 1168-1178, Feb. 2016.
- [22] B. Hu, Z. Hu, L. Ran, P. A. Mawby, C. Jia, C. Ng, P. McKeever, "Deep Learning Neural Networks for Heat-Flux Health Condition Monitoring Method of Multi-Device Power Electronics System," 2019 IEEE Energy Conversion Congress and Exposition (ECCE), Baltimore, MD, USA, DOI: 10.1109/ECCE.2019.891266, pp. 3769-3774, 2019.
- [23] D. Xiang, L. Ran, P. Tavner, A. Bryant, S. Yang and P. Mawby, "Monitoring Solder Fatigue in a Power Module Using Case-Above-Ambient Temperature Rise," in IEEE Transactions on Industry Applications, vol. 47, DOI: 10.1109/TIA.2011.2168556, no. 6, pp. 2578-2591, Nov.-Dec. 2011.

- [24] B. Hu, Z. Hu, L. Ran, C. Ng, C. Jia, P. McKeever, P. Tavner, C. Zhuang, H. Jiang, and P. Mawby, "Heat-Flux Based Condition Monitoring of Multi-chip Power Modules Using a Two-Stage Neural Network," in *IEEE Transactions on Power Electronics*, vol. 36, DOI: 10.1109/TPEL.2020.3045604, no. 7, pp. 7489-7500, July 2021.
- [25] C. Chen, F. Luo, and Y. Kang, "A Review of SiC Power Module Packaging: Layout, Material System and Integration," in CPSS Transactions on Power Electronics and Applications, vol. 2, DOI: 10.24295/CPSSTPEA.2017.00017, no. 3, pp. 170-186, Sep. 2017.
- [26] M. R. Stan, K. Skadron, M. Barcella, W. Huang, K. Sankaranarayanan, S. Velusamy, "HotSpot: A Dynamic Compact Thermal Model at the Processor-Architecture Level," in *Microelectronics Journal*, vol. 34, DOI: 10.1016/S0026-2692(03)00206-4, no. 12, pp. 1153-1165, Dec. 2003.
- [27] Y. A. Cengel and A. J. Ghajar, "Heat and Mass Transfer: Fundamentals and Applications," Fourth Edition in SI Units, *McGraw-Hill*, pp. 17-127, 2011.
- [28] A. S. Bahman, K. Ma and F. Blaabjerg, "A Lumped Thermal Model Including Thermal Coupling and Thermal Boundary Conditions for High-Power IGBT Modules," in *IEEE Trans. on Power Electronics*, vol. 33, no. 3, pp. 2518-2530, March 2018.
- [29] SKM50GB12T4 Datasheet, Jan. 2020. [Online]. Available: https://www.semikron.com/products/product-classes/igbtmodules/detail/skm50gb12t4-22892000.html/.
- [30] A. Wintrich, P. Beckedahl, "Thermal resistance of IGBT Modules - specification and modelling," Application Note AN1404, SEMIKRON, 2014.
- [31] "Transient Thermal Measurements and thermal equivalent circuit models," Infineon, Application Note AN2015, vol. 1, no. 5, pp. 10-13, Apr. 2020.
- [32] A. Mohammed, B. Hu, Z. Hu, S. Djurovic, L. Ran, M. Barnes, P. A. Mawby, "Distributed thermal monitoring of wind turbine power electronic modules using FBG sensing technology," in *IEEE Sensors Journal*, vol. 20, no. 17, pp. 9886-9894, Sept. 2020.