

#### A Thesis Submitted for the Degree of PhD at the University of Warwick

#### Permanent WRAP URL:

http://wrap.warwick.ac.uk/161795

#### Copyright and reuse:

This thesis is made available online and is protected by original copyright. Please refer to the repository record for this item for information to help you to cite it. Our policy information is available from the repository home page.

For more information, please contact the WRAP Team at: wrap@warwick.ac.uk

# Uneven Degradation and Condition Monitoring of Multi-Chip Power Modules for Wind Turbines

by

## **Borong Hu**

A thesis submitted for the degree of

Doctor of Philosophy

University of Warwick, School of Engineering

February 2021

## Contents

| List of Figures      |
|----------------------|
| List of Tables       |
| Acknowledgements     |
| Declaration          |
| List of Publications |
| Abstract             |
| Nomenclature         |

| 1     | Introduction                            |
|-------|-----------------------------------------|
| 1.1   | Background                              |
| 1.2   | Packaging of High-Power Semiconductors  |
| 1.3   | Ageing Mechanisms and Failure Modes     |
| 1.3.1 | Thermomechanical stress                 |
| 1.3.2 | Electrical stress                       |
| 1.3.3 | Ageing characteristic parameters        |
| 1.4   | Condition Monitoring                    |
| 1.4.1 | Device terminal features                |
| 1.4.2 | Built-in sensors                        |
| 1.4.3 | Algorithm-based methods                 |
| 1.5   | Motivations and Objectives              |
| 1.6   | Contributions                           |
| 1.7   | Thesis Outline                          |

| 2     | Initial Ageing Development of Power Modules under Realistic Stress Conditions |
|-------|--------------------------------------------------------------------------------|
| 2.1   | Introduction                                                                   |
| 2.2   | Experiment Techniques                                                          |
| 2.2.1 | CT analysis                                                                    |
| 2.2.2 | Power cycling test                                                             |
| 2.3   | CT Scanning Results                                                            |
| 2.3.1 | Statistical analysis of voids                                                  |
| 2.3.2 | Void growth and crack initialisation                                           |
| 2.4   | FEA Model                                                                      |
| 2.5   | Void Defects                                                                   |
| 2.5.1 | Internal void                                                                  |
| 2.5.2 | Void at solder-to-chip boundary                                                |
| 2.6   | Crack Defects                                                                  |
| 2.6.1 | Crack initialisation                                                           |
| 2.6.2 | Crack growth                                                                   |
| 2.7   | Experimental Validation                                                        |
| 2.8   | Long-Term Lifetime Estimation                                                  |
| 2.9   | Summary                                                                        |

| 3     | Lifetime and Uneven Degradation of Multi-Chip Power Modules in Wind Turbines |
|-------|-------------------------------------------------------------------------------|
| 3.1   | Introduction                                                                  |
| 3.2   | Multi-Chip Power Module                                                       |
| 3.2.1 | Device electrical characteristics                                             |
| 3.2.2 | Finite element analysis                                                       |
| 3.2.3 | Thermal network of power modules                                              |
| 3.3   | Electrothermal Modelling                                                      |
| 3.3.1 | Wind turbines                                                                 |
| 3.3.2 | Electro-thermal modelling                                                     |
| 3.4   | Lifetime …                                                                    |
| 3.4.1 |                                                                               |
| 3.4.2 |                                                                               |

| 4     | Condition Monitoring of Multi-Chip Modules with Uneven Degradation |
|-------|---------------------------------------------------------------------|
| 4.1   | Introduction                                                        |
| 4.2   | Characterization of Uneven Degradation                              |
| 4.2.1 | Electrical characterisation                                         |
| 4.2.2 | Thermal characterisation                                            |
| 4.3   | Experimental Test Rig                                               |
| 4.4   | Two-Stage Neural Network                                            |
| 4.4.1 | First stage NNs                                                     |
| 4.4.2 | Generalization and extrapolation                                    |
| 4.4.3 | Second stage NN                                                     |
| 4.5   | Experimental Results                                                |
| 4.5.1 | Power loss estimation of first stage NNs                            |
| 4.5.2 | Degradation level monitoring by second stage NN                     |
| 4.5.3 | Untrained conditions                                                |
| 4.6   | Summary                                                             |

| 5     | Advanced Measures for Condition Monitoring of Multi-Chip Modules |
|-------|-------------------------------------------------------------------|
| 5.1   | Introduction                                                      |
| 5.2   | Data Labelling for Uneven Degradation Level                       |
| 5.2.1 | Data labelling platform                                           |
| 5.2.2 | Electrical equivalence                                            |
| 5.2.3 | Thermal equivalence                                               |
| 5.3   | FBG for Temperature Measurement                                   |
| 5.3.1 | FBG sensor integration and calibration                            |
| 5.3.2 | Temperature measurement results                                   |
| 5.4   | Deep Learning for Condition Monitoring                            |
| 5.4.1 | Improved condition monitoring algorithm                           |
| 5.4.2 | NN structure                                                      |
| 5.4.3 | Condition monitoring results                                      |
| 5.5   | Summary                                                           |

| 6     | Condition Monitoring of Wind Turbine Converters Based on SCADA Data |
|-------|----------------------------------------------------------------------|
| 6.1   | Introduction                                                         |
| 6.2   | SCADA Data Based Condition Monitoring                                |
| 6.2.1 | Data pre-processing                                                  |
| 6.2.2 | Network model                                                        |
| 6.2.3 | Fault detection                                                      |
| 6.2.4 | CM with online learning                                              |
| 6.3   | Cost Function Design                                                 |
| 6.3.1 | Clustering of SCADA channels                                         |
| 6.3.2 | Principal component analysis                                         |
| 6.3.3 | Probability distribution                                             |
| 6.3.4 | 2-D joint distribution                                               |
| 6.3.5 | Probability density weights                                          |
| 6.4   | Condition Monitoring Results and Analysis                            |
| 6.4.1 | SCADA data pre-processing                                            |
| 6.4.2 | Online learning results                                              |
| 6.4.3 | Fault detection                                                      |
| 6.5   | Summary                                                              |

| 7   | Conclusions and Future Work |
|-----|-----------------------------|
| 7.1 | Conclusions                 |
| 7.2 | Future Work                 |

Bibliography

## **List of Figures**

| Figure | Caption |
|--------|---------|
| 2.10 | Distribution of void radii |
| 2.11 | Void growth after 35,000 low $\Delta T_j$ cycles |
| 2.12 | Development of an internal void, (a)/(b)/(c)/(d) before and (e)/(f)/(g)/(h) after power cycling |
| 2.13 | Development of a void close to the chip-solder boundary, (a)/(b)/(c)/(d) before and (e)/(f)/(g)/(h) after power cycling |
| 2.14 | Development of a small crack on the chip-solder boundary, (a)/(b)/(c) before and (d)/(e)/(f) after power cycling |
| 2.15 | (a) 2D symmetrical model and boundary conditions and (b) mesh around a void of 10 μm radius and a crack of 2 μm length |
| 2.16 | Transient thermal impedance from the simulation model and the product datasheet |
| 2.17 | (a) Stress and (b) accumulated inelastic strain distribution around a void of 20 μm radius |
| 2.18 | Stress and fatigue inelastic strain on the top edge of voids |
| 2.19 | Stress distribution (a) overall and (b) at the corner when a void of 10 μm radius touches the chip-solder boundary |
| 2.20 | Accumulated inelastic strain distribution (a) overall and (b) at the corner when a void of 10 μm radius touches the solder-chip boundary |
| 2.21 | (a) Maximum stress and (b) fatigue inelastic strain for voids with different attached angles |
| 2.22 | (a) Stress distribution and (b) accumulated inelastic strain distribution in the solder layer without defects |
| 2.23 | (a) Stress distribution and (b) accumulated inelastic strain distribution around a crack |
| 2.24 | (a) Stress distribution and (b) accumulated inelastic strain around an initial crack 10 μm away from the boundary |
| 2.25 | Relationship between thermal resistance and crack length |
| 2.26 | Stress and fatigue inelastic strain at the tip of cracks with different lengths |
| 2.27 | Growth rate of voids with different initial sizes |
| 2.28 | Multi-stage power cycling test |
| 2.29 | Solder material lifetime results from physics-of-failure modelling and experiments |

| Figure | Caption |
|--------|---------|
| 3.18 | Uneven degradation development of paralleled diodes with a 5% $R_{th}$ increase of diode 1 |

| Figure | Caption |
|--------|---------|
| 4.1 | Simulation results: (a) junction temperature and (b) current of the aged and parallel diodes |
| 4.2 | Ageing process of the die-attach solder layer |
| 4.3 | (a) Electrical and (b) thermal features of parallel diodes under different DLs |
| 4.4 | Heat flux distribution on (a) external surfaces and (b) baseplate bottom surface of a healthy module |
| 4.5 | Temperature distribution on the top chip surfaces |
| 4.6 | Temperature distribution on the bottom surface of the baseplate |
| 4.7 | Temperature distribution on diode 1 top surfaces with different degradation levels, (a)-(e) from $1.1R_{th0}$ to $1.5R_{th0}$ |
| 4.8 | Temperature distribution on (a) top surface of diode 1 and (b) case surface beneath diode 1 |
| 4.9 | Instrumentation points in the power module system |
| 4.10 | Multi-chip experimental rig, (a) power module and (b) heatsink |
| 4.11 | Thermal pad attached on baseplate |
| 4.12 | Complete experimental rig |
| 4.13 | Structure of a sub-NN in the first stage |
| 4.14 | Training of sub-neural networks with variable operating points |
| 4.15 | Illustration of the method by the deviation between EPL and OPL |
| 4.16 | Block diagram of the condition monitoring method |
| 4.17 | Second stage neural network structure for degradation classification |
| 4.18 | Power loss estimation results by sub-neural networks with different degradation levels and operating points |
| 4.19 | Power loss estimation from different sub-neural networks with degradation level DL1-3 and 40 A total current |
| 4.20 | Estimation error histograms for 5 sub-neural networks with 3 degradation levels: DL1-1, DL1-3 and DL1-5 |
| 4.21 | (a) Measurements under DL 3-1 and (b) diode 1 degradation level monitoring results |

| Figure | Caption |
|--------|---------|
| 4.22 | Recognition results for single-device aged conditions: (a) diode 1 degraded and (b) diode 2 degraded |
| 4.23 | Recognition results for conditions with both devices aged: (a) diode 1 classification and (b) diode 2 classification |
| 4.24 | Recognition of an intermediate degradation level: neural network training results for (a) diode 1 and (b) diode 2, and degradation recognition results for (c) diode 1 and (d) diode 2 |

| Figure 5.1: The half-bridge inverter test: (a) circuit diagram and (b) test rig |
|---|
| Figure 5.2: Temperature measurement positions on heatsink |
| Figure 5.3: Thermal pad setup for DL5 |
| Figure 5.4: Output of the test rig under 200 V/40 A |
| Figure 5.5: The layout of the uncased power module |
| Figure 5.6: Temperature distribution measured by thermal camera |
| Figure 5.7: (a) Forward characteristic and (b) reverse recovery energy of the multi-chip power module |
| Figure 5.8: (a) The diode forward characteristic and (b) reverse recovery energy at different temperatures |
| Figure 5.9: The FEA model of the single-chip module |
| Figure 5.10: Outflow heat flux on (a) top surface and (b) bottom surface of the power module |
| Figure 5.11: The ageing process of die-attach in FEA model |
| Figure 5.12: The temperature distribution on diode surface with $1.2R_{th}$: (a) practical crack modelled and (b) with a thermal pad |
| Figure 5.13: Average temperature on the chip surface under different DLs |
| Figure 5.14: The ratio of the heat dissipating through top surface under different DLs |
| Figure 5.15: (a) FBG thermal sensor design and (b) FBG array sensor instrumentation in the DUT |
| Figure 5.16: Grooved power module |
| Figure 5.17: FBG array thermal calibration |
| Figure 5.18: FBG and TC temperature measurements under static conditions of load current (a) 10 A and (b) 50 A |
| Figure 5.19: Test power module thermal network |
| Figure 5.20: Power loss calculation under (a) 10 A and (b) 50 A total current step-change conditions |
| Figure 5.21: Validation of temperature measurement based on the thermal resistance of TIM $R_{th,TIM}$ under (a) 10 A and (b) 50 A total current conditions |
| Figure 5.22: Thermal measurement under dynamic condition |
| Figure 5.23: Processing diagram of the condition monitoring method |
| Figure 5.24: The basic two-stage NN for the improved condition monitoring method |
| Figure 5.25: The DNN structure in (a) first stage and (b) second stage |
| Figure 5.26: Power loss distribution under (a) steady-state and (b) transient conditions |
| Figure 5.27: Classification results of (a) basic NN and (b) deep learning architecture |

| Figure 6.1: Condition monitoring framework |
|---|
| Figure 6.2: The network structure of (a) FCNN and (b) LSTM |
| Figure 6.3: The flowchart of the PDW calculation process |
| Figure 6.4: The hierarchical clustering dendrogram of SCADA channels |
| Figure 6.5: The histogram and CDF of the 1st PC for (a) cluster 1 and (b) cluster 2, and (c) the joint probability distribution |
| Figure 6.6: The consistency $C$ of FCNN and LSTM with four cost functions on different amounts of training data |
| Figure 6.7: The clustering results and corresponding error bar of each class |
| Figure 6.8: (a) The prediction accuracy distribution and (b) condition monitoring results of healthy WT No.1 |
| Figure 6.9: (a) The prediction accuracy distribution and (b) condition monitoring results of WT No.5 with converter fault reported on day 195 |

### List of Tables

| Table 1.1: Failure mechanisms caused by thermomechanical stress |
|---|
| Table 1.2: Failure mechanisms caused by electrical stress [29] |
| Table 1.3: The ageing characteristic parameters of the high-power IGBT device |
| Table 2.1: Average percentage growth of void area and volume |
| Table 2.2: Packaging material parameters for FEA modelling |
| Table 2.3: Parameter definitions of Anand's model |
| Table 2.4: Measured crack growth rate results and computed fatigue strain |
| Table 3.1: Packaging material properties for FEA modelling |
| Table 3.2: The parameters of the power module thermal network |
| Table 3.3: The parameters of Bayerer lifetime model |
| Table 4.1: Diode thermal resistance at different degradation levels |
| Table 4.2: Neural network definition matrix based on combination of device degradation levels |
| Table 5.1: Operating conditions |
| Table 5.2: Thermal resistance at different degradation levels |
| Table 5.3: Packaging material parameters for FEA modelling |
| Table 5.4: Thermal pad parameters |

### Acknowledgements

First and foremost, I would like to express my sincere gratitude to my supervisor Professor Li Ran. His continuous guidance has helped me all the way, from Chongqing, China to Coventry, UK, and from the first COMSOL model we built to the finished thesis. I am very lucky to have worked with him, as he has always given me every opportunity to drive myself in research. Most valuably, he has shown me the enthusiasm for research that a true researcher should have, and how to keep thinking sharply at all times.

I would like to thank Dr Chunjiang Jia, Dr Chong Ng and Paul McKeever from the Offshore Renewable Energy Catapult for sponsoring my PhD at Warwick. I am very grateful that they provided wind data, hosted me in the Blyth office and gave valuable suggestions on my research from an industrial perspective. I also thank Dr Nadia Kourra and Prof. Mark Williams from Warwick Manufacturing Group, who provided the CT equipment and pre-processed the scanning data. My thanks also go to Dr Anees Mohammed and Dr Sinisa Djurovic from the University of Manchester for fabricating the FBG sensors and for the discussions about the distributed temperature sensing scheme.

I would like to thank my colleagues in the PEATER group, especially Dr Jose Ortiz Gonzalez for revising my papers and helping me to solve experimental problems, and Dr Sylvia Konaklieva for working together on the solder scanning. It has also been great to have my friends, Dr Fan Li, Dr Ruizhu Wu, Dr Tianxiang Dai, Xuan Guo, and Dan Luo, and the great times we had in the office, the lab, restaurants, and even on Steam.

I would like to thank my friends and colleagues in Chongqing, where my research started and where we had a wonderful time. Special thanks to Prof. Hui Li, Dr Zheng Zeng, Prof. Minyou Chen and Dr Wei Lai for teaching me many of the tools and rules needed for research.

I would like to express my gratitude to my parents for supporting me throughout and respecting my every choice. I especially thank my wife, who is such a precious treasure, accompanying me in the UK, supporting my work, and colouring my life.

### Declaration

This thesis is submitted to the University of Warwick in support of my application for the degree of Doctor of Philosophy. It has been composed by myself and has not been submitted in any previous application for any degree. The work presented (including data generated and data analysis) was carried out by the author except in the cases outlined below:

The power module samples were scanned by using the X-ray CT equipment at Warwick Manufacturing Group. The scanning results were analysed by the author and presented in Chapter 2.

The FBG fibre used in Chapter 5 was fabricated and installed in collaboration with the University of Manchester. The author provided the design specification of the FBG fibre, performed the tests, and analysed the results.

Parts of this thesis have been published by the author. The details are given in the List of Publications.

### **List of Publications**

#### Chapter 2

- J1. B. Hu, S. Konaklieva, N. Kourra, M. A. Williams, L. Ran, and W. Lai, "Long-Term Reliability Evaluation of Power Modules with Low Amplitude Thermomechanical Stresses and Initial Defects," in *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 9, no. 1, pp. 602-615, Feb. 2021, doi: 10.1109/JESTPE.2019.2958737.
- C1. B. Hu, S. Konaklieva, L. Ran, N. Kourra, M. A. Williams, W. Lai, and P. Mawby, "Long Term Reliability of Power Modules with Low Amplitude Thermomechanical Stresses and Initial Defects," 2018 IEEE Energy Conversion Congress and Exposition (ECCE), Portland, OR, 2018, pp. 5831-5838, doi: 10.1109/ECCE.2018.8558137.

#### Chapter 3

- C2. B. Hu, X. Guo, S. Konaklieva, L. Ran, H. Li, C. Ng, and P. McKeever, "Lifetime Consumption of Wind Turbine Power Converter in the Whole Wind Speed Range," *The 9th International Energy Conference REMOO*, Hong Kong, 2019.

#### Chapter 4

- J2. B. Hu, S. Konaklieva, S. Xu, J. Ortiz-Gonzalez, L. Ran, C. Ng, P. McKeever, and O. Alatise, "Condition monitoring for solder layer degradation in multi-device system based on neural network," in *The Journal of Engineering*, vol. 2019, no. 17, pp. 3582-3586, Jun. 2019, doi: 10.1049/joe.2018.8025.
- J3. B. Hu, Z. Hu, L. Ran, C. Ng, C. Jia, P. McKeever, P. Tavner, C. Zhang, H. Jiang, and P. Mawby, "Heat-Flux Based Condition Monitoring of Multi-chip Power Modules Using a Two-Stage Neural Network," in *IEEE Transactions on Power Electronics*, 2021, doi: 10.1109/TPEL.2020.3045604.

#### Chapter 5

- J4. A. Mohammed, B. Hu, Z. Hu, S. Djurovic, L. Ran, M. Barnes, and P. Mawby, "Distributed Thermal Monitoring of Wind Turbine Power Electronic Modules Using FBG Sensing Technology," in *IEEE Sensors Journal*, vol. 20, no. 17, pp. 9886-9894, 1 Sept. 2020, doi: 10.1109/JSEN.2020.2992668.
- C3. B. Hu, Z. Hu, L. Ran, P. Mawby, C. Jia, C. Ng, and P. McKeever, "Deep Learning Neural Networks for Heat-Flux Health Condition Monitoring Method of Multi-Device Power Electronics System," 2019 IEEE Energy Conversion Congress and Exposition (ECCE), Baltimore, MD, USA, 2019, pp. 3769-3774, doi: 10.1109/ECCE.2019.8912666.

#### Chapter 6

- J5. B. Hu, C. Jia, C. Ng, P. McKeever, S. Lakshminarayana, B. Chen, C. Zhang, and L. Ran, "Condition Monitoring of Wind Turbine Converters Based on Limited SCADA Data," ready to submit to *IEEE Transactions on Power Electronics*.

#### Other publications

- J6. B. Hu, J. Ortiz Gonzalez, L. Ran, H. Ren, Z. Zeng, W. Lai, B. Gao, O. Alatise, H. Lu, C. Bailey, and P. Mawby, "Failure and Reliability Analysis of a SiC Power Module Based on Stress Comparison to a Si Device," in *IEEE Transactions on Device and Materials Reliability*, vol. 17, no. 4, pp. 727-737, Dec. 2017, doi: 10.1109/TDMR.2017.2766692.
- J7. Z. Zeng, W. Shao, H. Chen, B. Hu, W. Chen, H. Li, and L. Ran, "Changes and challenges of photovoltaic inverter with silicon carbide device," in *Renewable and Sustainable Energy Reviews*, vol. 78, pp. 624-639, Oct. 2017, doi: 10.1016/j.rser.2017.04.096.

- C4. B. Hu, Z. Zeng, W. Shao, Q. Ma, H. Ren, H. Li, L. Ran, and Z. Li, "Novel cooling technology to reduce thermal impedance and thermomechanical stress for SiC application," 2017 IEEE Applied Power Electronics Conference and Exposition (APEC), Tampa, FL, 2017, pp. 3063-3067, doi: 10.1109/APEC.2017.7931133.
- C5. X. Li, H. Jiang, B. Hu, H. Chen, Z. Zeng, L. Ran, and P. Mawby, "Electro-Thermal Limited Switching Frequency for Parallel Diodes," 2018 IEEE Energy Conversion Congress and Exposition (ECCE), Portland, OR, 2018, pp. 4692-4698, doi: 10.1109/ECCE.2018.8557614.

(J for journal, C for conference)

### Abstract

The powertrain conversion system in state-of-the-art wind turbines has reached power ratings of more than 10 MW. Because a single semiconductor chip has a relatively low current rating, the large power modules in turbine converters still adopt a multi-chip-in-parallel configuration, and these modules are regarded as the most vulnerable component in the turbine system. Thus, this thesis focuses on evaluating the uneven degradation of multi-chip power modules under realistic conditions and developing field-deployable condition monitoring methods for wind turbine converters.

Computed tomography scanning and finite element analysis show that two kinds of initial defects in the power module solder layer, voids and cracks, do grow quietly under low-temperature stress cycles. This thesis provides a physics-of-failure tool to estimate such defect growth dynamics and finds that a void may first transform into a crack and then grow more rapidly, leading to device failure. At converter level, an electrothermal model shows that, due to deep temperature cycling at the fundamental frequency, the machine-side converters of both fully and partially rated wind turbines consume a large amount of lifetime. Inside the multi-chip module, an asymmetrical packaging layout and initial defects can cause a lifetime difference of years between paralleled devices, while the further ageing of the weaker device is significantly accelerated.

A condition monitoring scheme for detecting such uneven degradation in a multi-chip-in-parallel system is proposed in this thesis, based on a core concept: train a network to represent the healthy state, and then use the deviation of its predictions to distinguish faulty conditions. A two-stage neural network method based only on external measurements experimentally achieves a detection rate of over 98%. Furthermore, the feasibility of the method is improved in three aspects. The labelled data for network training are generated from an inverter test rig that equivalently emulates uneven degradation. The fibre Bragg grating multi-point sensing technique provides high temperature measurement precision with immunity to electromagnetic interference. The complex operating conditions are also generalised by a deep neural network structure, which achieves an overall accuracy of more than 95% under the dynamic thermal conditions encountered in a practical wind speed profile.

Finally, based on the same concept, a field-deployable condition monitoring method is proposed to detect early-stage faults of wind turbine converters using limited and unbalanced SCADA data. A deep neural network with an optimised cost function is designed using an unsupervised approach and is enhanced by an online learning process for long-term, real-time anomaly detection. The proposed method shows robust diagnosis results and would have predicted the converter fault a few days ahead of the actual failure.

## Nomenclature

| ALT  | accelerated lifetime testing         |
|------|--------------------------------------|
| B2B  | back-to-back                         |
| CDF  | cumulative distribution function     |
| CJC  | cold junction temperature            |
| CM   | condition monitoring                 |
| CT   | computed tomography                  |
| CTE  | coefficients of thermal expansion    |
| DBC  | direct bonded copper                 |
| DFIG | doubly-fed induction generator       |
| DL   | degradation level                    |
| DNN  | deep neural network                  |
| DUT  | device under test                    |
| EEP  | estimated electrical parameters      |
| EMI  | electromagnetic interference         |
| EPL  | estimated power loss                 |
| FBG  | fibre Bragg grating                  |
| FBP  | filtered back projection             |
| FCNN | fully connected neural network       |
| FEA  | finite element analysis              |
| FRD  | freewheeling diode                   |
| HVDC | high voltage direct current          |
| IGBT | insulated gate bipolar transistor    |
| IGCT | integrated gate commutated thyristor |
| LSTM | long short-term memory               |
| MAE  | mean absolute error                  |
| MEP  | measured electrical parameters       |
| MPPT | maximum power point tracking         |
| MSE  | mean square error                    |
| MTBF | mean time between failures           |
| MTO  | metal-oxide-semiconductor turn-off thyristor |
| NN   | neural network                       |
| OPL  | measured operational power loss      |
| PC   | principal component                  |
| PCA  | principal component analysis         |
| PDW  | probability density weight           |
| PMSG | permanent magnet synchronous generator |
| ReLU | rectified linear unit                |
| SCADA | supervisory control and data acquisition |
| SEM  | scanning electron microscope         |
| TDDB | time-dependent dielectric breakdown  |
| TIM  | thermal interface material           |
| TSEP | temperature sensitive electrical parameter |
| VSC  | voltage source converter             |
| WT   | wind turbine                         |

| $A$ | accuracy |
|---|---|
| $C$ | consistency |
| $d$ | duty cycle, distance |
| $D$ | bond wire diameter |
| $D_a$ | accumulated damage |
| $\Delta\varepsilon_{in}$ | fatigue inelastic strain per cycle |
| $F$ | cumulative distribution function |
| $f$ | fundamental frequency of the current cycle |
| $f_{sw}$ | switching frequency |
| $I_{bw}$ | current through each bond wire |
| $i_c$ | converter output phase current |
| $I_{g,leak}$ | gate leakage current |
| $I_{g,on}$ | gate turn-on current |
| $I_{sc}$ | short-circuit current |
| $LC_{hrs}$ | lifetime consumption per hour |
| $LC_{MWhrs}$ | lifetime consumption per MWh generated |
| $m$ | number of data |
| $N_f$ | number of cycles to failure |
| $P_{cond,S}$ | conduction loss of switch device |
| $P_{cond,D}$ | conduction loss of freewheeling diode |
| $P_{hrs}$ | power generated per hour |
| $P_{toff}$ | turn-off losses |
| $P_{ton}$ | turn-on losses |
| $R^2$ | coefficient of determination |
| $R_{Al}$ | metallization layer resistance |
| $R_{bw}$ | bond wire equivalent resistance |
| $R_{on}$ | on-state resistance |
| $R_{on,S}$ | on-state resistance of switch device |
| $R_{on,D}$ | on-state resistance of freewheeling diode |
| $R_{th}$ | thermal resistance |
| $\Delta R_{th,a}$ | increase of normalised thermal resistance to failure |
| $R_{th,c-h}$ | case-to-heatsink thermal resistance |
| $R_{th,j-c}$ | junction-to-case thermal resistance |
| $R_{th,TIM}$ | thermal resistance of the thermal interface material |
| $\sigma$ | standard deviation |
| SA | accuracy with squared error |
| size | number of points in dataset |
| SSA | smoothed accuracy with squared error |
| $t$ | actual observation |
| $T$ | principal component |
| $T_{amb}$ | ambient temperature |
| $T_c$ | case temperature |
| $T_h$ | heatsink temperature |
| $T_{in}$ | inlet coolant temperature |
| $T_j$ | junction temperature |
| $T_{j,min}$ | minimum value of junction temperature |
| $T_{j,mean}$ | mean value of junction temperature |
| $\Delta T_j$ | amplitude of junction temperature cycling |
| $T_{out}$ | outlet coolant temperature |
| $t_{on}$ | time of heating stage |
| $u$ | batch size |
| $v$ | number of output channels |
| Var | variation |
| $V_{ce}$ | on-state voltage |
| $V_{ce,off}$ | turn-off voltage |
| $V_{ce,sat}$ | on-state saturation voltage drop |
| $V_F$ | forward voltage drop |
| $V_{ge}$ | gate-emitter voltage |
| $V_{ge,off}$ | gate turn-off voltage |
| $V_{ge,on}$ | gate turn-on voltage |
| $V_{ge,th}$ | gate-emitter threshold voltage |
| $V_{rated}$ | device rated voltage |
| $V_{th}$ | threshold voltage |
| $w$ | probability density weight |
| $W$ | weight matrix |
| $X$ | standardized data |
| Χ | SCADA data |
| $y$ | prediction of the network |

## **1** INTRODUCTION

#### 1.1 BACKGROUND

In large-scale power system applications such as wind energy and high voltage direct current (HVDC) transmission, the use of advanced high-power semiconductor devices can effectively simplify circuit topologies and reduce device failure rates and system maintenance costs. At present, the mature commercial high-power devices on the market mainly include thyristors and their derivatives, such as integrated gate commutated thyristors (IGCTs) and metal-oxide-semiconductor turn-off thyristors (MTOs), as well as insulated gate bipolar transistors (IGBTs). Among them, the IGBT, a fully controlled device switched by its gate voltage, offers a flexible control scheme and high power density, and has been regarded as the mainstream high-power device. As the power system's demand for power electronics grows, the proportion of high-power power electronic equipment continues to increase, and so does the requirement for higher reliability, namely a lower failure rate and a longer mean time between failures (MTBF).

The traditional way to improve system reliability is redundant design: by adding hot spare devices, the system can still maintain normal operation after some components fail. Another way is to reduce the failure rate of core components and hence the number of failures in the system. The failure rate of power semiconductor devices over time is generally described by the "bathtub curve" shown in Figure 1.1 [1]. According to the stage at which failures occur, it can be divided into early failure, accidental (random) failure, and ageing failure. Devices have a high failure rate in the early and late stages of service. Early failures are affected by the yield of the device production process and can be resolved through device screening and replacement, so the failure rate gradually decreases over time. In the later stages of service, the device suffers internal damage and performance degradation caused by accumulated fatigue, so the ageing failure rate gradually increases with time. Since early failures can be screened out, the failure of power semiconductor devices in service mainly occurs in the later stage, and the failure rate is closely related to the ageing process. Therefore, studying the ageing failure mechanisms and monitoring schemes of power semiconductor devices is of great significance for improving the reliability of power electronic equipment and reducing the losses caused by failure-induced shutdowns.
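In general reliability engineering, the three regions of the bathtub curve are often modelled as competing failure modes. Each mode has a Weibull hazard rate $h(t) = (\beta/\eta)(t/\eta)^{\beta-1}$, and its shape parameter $\beta$ is below one for early failure, equal to one for accidental failure and above one for ageing failure. The minimal Python sketch below sums three such hazards to reproduce the qualitative shape of Figure 1.1. The Weibull decomposition and every parameter value are illustrative assumptions, not data taken from [1].

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1), valid for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


# Hypothetical (beta, eta) pairs with time in years, chosen only to give the shape.
EARLY = (0.5, 50.0)    # beta < 1: decreasing early-failure rate
RANDOM = (1.0, 20.0)   # beta = 1: constant accidental-failure rate
WEAROUT = (5.0, 15.0)  # beta > 1: increasing ageing-failure rate


def bathtub_hazard(t: float) -> float:
    """Total failure rate, treating the three modes as independent competing risks."""
    return sum(weibull_hazard(t, beta, eta) for beta, eta in (EARLY, RANDOM, WEAROUT))


if __name__ == "__main__":
    for years in (0.1, 1.0, 5.0, 10.0, 15.0, 20.0):
        print(f"t = {years:5.1f} y   h(t) = {bathtub_hazard(years):.4f} per year")
```

With these assumed parameters, the rate falls, flattens and then rises again. The ageing term only dominates in the later years, which is the region where condition monitoring is most useful.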



Figure 1.1: Change in the failure rate of a semiconductor device over time [1]

To reduce the ageing failure rate of power semiconductor devices, industry currently adopts regular maintenance as a countermeasure: the ageing and failure of components are prevented through regular maintenance and replacement of components that have been in service for a long time. Although regular maintenance can reduce the device failure rate to a certain extent, it is relatively costly and cannot completely prevent failures. Condition monitoring is a health state evaluation method that uses sensors and signal processing technology to collect and analyse the operating characteristic parameters of a power electronic system [2]. It can detect the performance degradation of a device in the later stage of service in time, so that targeted measures can be taken to prevent failure. Compared with regular maintenance, condition monitoring can further improve the reliability and effective utilisation of equipment and reduce the overall system operation and maintenance costs.
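The core concept stated in the Abstract is to train a model to represent the healthy state and then flag deviations from its predictions. The minimal Python sketch below illustrates that concept. The thesis uses neural networks for the healthy-state model, but here an ordinary least-squares line stands in for the network. The data, the 3-sigma threshold and the function names (`fit_healthy_model`, `alarm_threshold`, `is_anomalous`) are all hypothetical.

```python
import statistics

# Hypothetical healthy-state data: load current (A) vs. measured temperature rise (K).
HEALTHY_X = [10.0, 20.0, 30.0, 40.0, 50.0]
HEALTHY_Y = [5.1, 9.9, 15.2, 19.8, 25.0]


def fit_healthy_model(xs, ys):
    """Least-squares line y = a + b*x fitted to healthy-state data only."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def alarm_threshold(model, xs, ys, k=3.0):
    """Threshold set to k population standard deviations of the healthy residuals."""
    a, b = model
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return k * statistics.pstdev(residuals)


def is_anomalous(model, threshold, x, y):
    """Flag a measurement whose deviation from the healthy prediction exceeds the threshold."""
    a, b = model
    return abs(y - (a + b * x)) > threshold


if __name__ == "__main__":
    model = fit_healthy_model(HEALTHY_X, HEALTHY_Y)
    threshold = alarm_threshold(model, HEALTHY_X, HEALTHY_Y)
    print(is_anomalous(model, threshold, 30.0, 15.1))  # False: small deviation
    print(is_anomalous(model, threshold, 30.0, 17.0))  # True: large deviation
```

Any regression model trained only on healthy data can replace the straight line without changing the detection logic. The choice of threshold trades missed detections against false alarms.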

Moreover, the bathtub curve shown in Figure 1.1 is a statistical result extracted from experimental tests of a large number of devices. The ageing process of a specific device is affected by the system layout design, operating conditions, and initial microscopic defects, which inevitably cause uneven degradation among devices. The reliability of a power electronic system depends on its weakest component. For example, the failure of a single chip can burn out an entire multi-chip power module. The maintenance and repair costs of wind turbines are considerable, especially for offshore wind farms. Thus, condition monitoring, as an effective means of detecting ageing and avoiding sudden failures, can play a very important role in all stages of the device and system service life.
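The weakest-component argument can be made concrete with a series reliability model. In this model, a multi-chip module survives only if all of its paralleled chips survive, so $R_{module}(t) = \prod_i R_i(t)$. The Python sketch below assumes Weibull chip lifetimes with hypothetical parameters and independent failures. It deliberately ignores the current redistribution that, in a real module, further accelerates the ageing of a weak chip.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability that a single chip survives to time t (Weibull model)."""
    return math.exp(-((t / eta) ** beta))


def module_reliability(t, chips):
    """Series model: the module survives only if every paralleled chip survives.

    `chips` is a list of (beta, eta) Weibull parameters, one pair per chip;
    chip failures are treated as independent.
    """
    r = 1.0
    for beta, eta in chips:
        r *= weibull_reliability(t, beta, eta)
    return r


if __name__ == "__main__":
    t = 10.0  # years
    uniform = [(5.0, 20.0)] * 4                 # four identical chips (hypothetical)
    uneven = [(5.0, 20.0)] * 3 + [(5.0, 12.0)]  # one chip with a shorter characteristic life
    print(f"uniform module: {module_reliability(t, uniform):.3f}")
    print(f"uneven module:  {module_reliability(t, uneven):.3f}")
```

In this example, the single weak chip lowers the module reliability far more than the three healthy chips together, which illustrates why uneven degradation has to be monitored chip by chip.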

At present, there has been a lot of research on the ageing failure mechanisms of power semiconductor devices, and packaging-related ageing failure is one of the most critical areas of focus. High-power devices can be divided into solder-bonding devices and press-pack devices according to their packaging type. The failure of soldered devices is mainly due to the mechanical stress caused by the mismatch in the coefficients of thermal expansion (CTEs) of the packaging materials [3-6]. Press-pack devices are more reliable than soldered modules, but their failures are mainly caused by non-uniform pressure, current, and temperature among the paralleled semiconductor chips [7-10]. In addition, long-term electrical and thermal stress can also cause semiconductor chip-related ageing failures [1, 11]. On-line condition monitoring of power devices has become another research hotspot, and a variety of monitoring methods have been proposed based on device ageing characteristics [2, 4, 12-14]. However, the condition monitoring of high-power devices still faces several challenges, namely the interaction of multiple ageing mechanisms, the insensitivity of ageing characteristic parameters, and the difficulty of signal acquisition, all of which need further research.
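The CTE mismatch mechanism can be quantified to first order by the free thermal-strain mismatch $\Delta\varepsilon = |\alpha_a - \alpha_b|\,\Delta T$ between two bonded layers. The Python sketch below uses approximate handbook CTE values and a hypothetical 40 K temperature swing. These CTE values are an assumption, and the values for a specific module will differ. The result is only an order-of-magnitude indicator, because the actual stress and inelastic strain in a joint depend on layer stiffness and geometry, which is why finite element analysis is needed.

```python
# Approximate room-temperature CTEs in ppm/K (typical handbook values, assumed here).
CTE_PPM_PER_K = {"Si": 2.6, "Al2O3": 7.0, "Cu": 17.0, "Al": 23.0}


def mismatch_strain(mat_a, mat_b, delta_t):
    """Free thermal-strain mismatch |alpha_a - alpha_b| * delta_T between two bonded layers.

    Layer stiffness and geometry are ignored, so this is only an
    order-of-magnitude indicator of the strain the joint has to accommodate.
    """
    return abs(CTE_PPM_PER_K[mat_a] - CTE_PPM_PER_K[mat_b]) * 1e-6 * delta_t


if __name__ == "__main__":
    # Chip/die-attach copper, chip/bond wire, and ceramic/copper interfaces.
    for pair in (("Si", "Cu"), ("Si", "Al"), ("Al2O3", "Cu")):
        strain = mismatch_strain(*pair, delta_t=40.0)
        print(f"{pair[0]}/{pair[1]}, dT = 40 K: {strain * 100:.3f} % strain")
```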

#### **1.2 PACKAGING OF HIGH-POWER SEMICONDUCTORS**

High-power semiconductor devices adopt modular packaging in two forms: solder-bonding and press-pack. The internal packaging structures of the two types of IGBT package are shown in Figure 1.2.



Figure 1.2: The packaging of high-power IGBT devices, (a) solder-bonding IGBT module, (b) full press-pack IGBT, and (c) disc-spring press-pack IGBT

Devices with the two types of packaging fail in different ways. A solder-bonding module becomes an open circuit after bond wire lift-off, whereas a failed press-pack device mostly manifests as a short circuit [15]. Short-circuit failure is a great advantage when devices are connected in series in high-voltage equipment, since a limited number of device failures will not affect the normal operation of the series-connected string. With redundant design, the overall reliability is high, but the cost and power consumption also increase.

The press-pack device interconnects its packaging materials by applying mechanical pressure, generally 10-20 N/mm<sup>2</sup>. The electrical connections and heatsinks are arranged at both ends of the device, which makes it easy to connect multiple devices in series. There are two press-pack structures: full press-pack and disc-spring press-pack. The full press-pack package evolved from high-power diodes and thyristors; it uses ceramic packaging technology and is well sealed. The disc-spring press-pack is ABB's patented packaging technology (ABB StakPak): built-in disc springs in parallel decouple the pressure applied to the paralleled chips, leading to an even pressure distribution. When a press-pack IGBT is installed, a layer of thermal interface material (TIM) is also applied between the device and the heatsink to reduce thermal resistance [16]. Since there is no electrical isolation between press-pack devices and their heatsinks, a non-conductive coolant such as deionised water or insulating oil is generally required. This type of packaging is typically utilised in HVDC power transmission and high-power traction drives.

The packaging layers of the solder-bonding module are interconnected by reflow soldering and wire bonding. The electrical connections are concentrated on the upper part of the module, and the heatsink is located at the bottom; the two are electrically insulated by the direct bonded copper (DBC) substrate. The module is not hermetically sealed: silicone gel is encapsulated in the module to isolate the chips from the external environment (moisture, acid, and alkali) and to improve the insulation capability. During field assembly, the module is fixed to the heatsink surface by screws. To reduce the contact thermal resistance between the module baseplate and the heatsink surface, a layer of thermal interface material (TIM) must be applied between the two surfaces during installation. This type of packaging is used in most power conversion applications, such as power supply units, electric vehicles, wind turbine drivetrain converters, and grid-tied inverters. This thesis therefore focuses on the reliability and condition monitoring of solder-bonding modules.

#### **1.3 AGEING MECHANISMS AND FAILURE MODES**

The ageing failure of power devices is the combined result of the accumulation of internal fatigue damage under stress and the influence of the external environment. The complicated multi-physics coupling produces various failure modes. According to the type of stress the device suffers, the failure mechanisms can be divided into two types: thermomechanical stress and electrical stress.

#### 1.3.1 Thermomechanical stress

The packaging materials of power semiconductor devices have different CTEs. When operating conditions change, the material interconnection interfaces are subjected to strong alternating thermomechanical stress. Under such long-term loading, creep, fatigue, and wear-out cause interconnection degradation and device failure. The ageing failure modes and failure mechanisms are summarised in Table 1.1. The thermal interface material between the power semiconductor device and the heatsink also degrades gradually under thermal cycling, which occurs in both packaging types.

| Device         | Failure modes                    | Ageing mechanisms            |
|----------------|----------------------------------|------------------------------|
| Solder-bonding | Bond wire lift-off               | Fatigue                      |
|                | Solder degradation (void, crack) | Creep, fatigue               |
|                | Metal reconstruction             | Plastic strain               |
|                | TIM degradation                  | Volatilization, displacement |
| Press-pack     | Contact surface damage           | Fretting wear                |
|                | Spring failure                   | Fatigue, stress relaxation   |
|                | TIM degradation                  | Volatilization, displacement |

Table 1.1: Failure mechanisms caused by thermomechanical stress

Under the effect of thermomechanical stress, the solder-bonding device suffers internal ageing, which mainly occurs at interconnection interfaces such as the bond wires and solder layers. Bond wire failure mainly manifests as bond wire lift-off, as shown in Figure 1.3 [2, 6]. There is a large difference between the CTEs of aluminium bond wires (23 ppm/K) and silicon chips (3 ppm/K). Under large junction temperature fluctuations, the bond wire heel on the chip surface is subjected to high reciprocating shear stress. This shear stress is likely to cause fatigue and crack initiation, and further crack propagation will eventually lead to bond wire lift-off [2, 6]. After a single wire lifts off, the remaining bond wires carry an overloaded current, which accelerates their lift-off and may also cause fusing at the top of the wire loop [5]. Most high-power devices are packaged with multiple chips in parallel, in which dozens of bond wires share the current at the same time. Detecting the loss of individual wires therefore requires extremely sensitive condition monitoring, in which measurement signals, monitoring targets, and diagnostic solutions must be considered together.
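To give a sense of scale, consider the unconstrained thermal strain mismatch between two bonded materials, $\Delta\varepsilon = |\alpha_{Al} - \alpha_{Si}|\,\Delta T_j$. The sketch below evaluates it with the CTE values quoted above and illustrative $\Delta T_j$ values. It ignores wire geometry and the elastic-plastic response of the bond, so it indicates the trend rather than the actual stress at the wire heel.

```python
ALPHA_AL = 23e-6  # CTE of the aluminium bond wire, 1/K (value quoted in the text)
ALPHA_SI = 3e-6   # CTE of the silicon chip, 1/K (value quoted in the text)


def mismatch_strain(alpha_1: float, alpha_2: float, delta_t: float) -> float:
    """Unconstrained thermal strain difference for a temperature swing delta_t (K)."""
    return abs(alpha_1 - alpha_2) * delta_t


for delta_tj in (40.0, 80.0, 120.0):  # illustrative junction temperature swings, K
    eps = mismatch_strain(ALPHA_AL, ALPHA_SI, delta_tj)
    print(f"dTj = {delta_tj:5.1f} K -> mismatch strain = {eps * 100:.2f} %")
```

The mismatch strain grows linearly with $\Delta T_j$ in this simple picture, consistent with large junction temperature swings being the main driver of wire heel fatigue.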



Figure 1.3: Bond wire lift-off, (a) diagram [2] and (b) scanning electron microscope (SEM) photo [6]

The ageing of the solder layer manifests as the growth of voids and cracks, as shown in Figure 1.4 [2, 17]. Although vacuum reflow is widely used, bubbles remain in the solder layer during soldering and form initial voids. An initial void causes stress concentration in the surrounding solder, leading to void growth and crack initiation and accelerating the device ageing process [2, 5, 17]. To increase current capability, the chip solder area and the baseplate of high-power devices are relatively large. Although the void rate of commercial products can generally be kept below 5%, some studies have shown that voids located at the boundary between the solder layer and the chip cause cracks to develop faster [18]. In addition, stress also concentrates at the corners of the solder layer, which readily creep and develop initial cracks under alternating thermal stress [19]. Stress then concentrates at the crack tip and accelerates crack propagation, reducing the effective contact area of the solder layer and increasing the thermal resistance of the device. The failure of even a single solder layer is thus a very complicated process, and a convenient yet effective modelling method is needed to evaluate it. At the same time, condition monitoring strategies should be developed around its most important signature, the increase in thermal resistance.



Figure 1.4: Solder degradation, (a) diagram [2] and (b) SEM photo [17]

Bond wire lift-off and solder layer degradation are the two most common ageing mechanisms in solder-bonding devices. Their occurrence is closely related to the rate of change and amplitude of the chip junction temperature $T_j$ during device operation. Power cycling test results show that under temperature cycling of small amplitude $\Delta T_j$, the device is mainly damaged by solder layer fatigue. As the junction-to-case thermal resistance $R_{th,j-c}$ of the device increases, $\Delta T_j$ increases accordingly, leading to bond wire lift-off within a short time [3, 20]. In addition, the metallization layer on the chip surface is a granular structure formed under vacuum conditions. Under thermal stress, metallization reconstruction causes plastic deformation of the material and bulging of surface grains, increasing both electrical resistance and surface roughness [21]. Reconstruction of the chip surface metallization and bond wire lift-off change the current distribution over the chip surface and within the chip, increasing the risk of local electrical overstress failure [22]. In practice, these two failure mechanisms interact, which makes it more difficult to identify the health state of the device accurately.
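The coupling between solder degradation and $\Delta T_j$ can be illustrated with the steady-state approximation $\Delta T_j \approx P_{loss} R_{th,j-c}$. This approximation only holds for heating pulses that are long compared with the junction-to-case thermal time constants. The loss and resistance values below are hypothetical. The loss is held constant, although in reality it also rises with temperature.

```python
def junction_swing(p_loss_w: float, r_th_jc_k_per_w: float) -> float:
    """Steady-state junction-to-case temperature rise for a long power pulse, K."""
    return p_loss_w * r_th_jc_k_per_w


P_LOSS = 1500.0      # hypothetical module loss during the on-phase, W (held constant)
R_TH_HEALTHY = 0.03  # hypothetical healthy R_th,j-c, K/W

for increase in (0.0, 0.10, 0.20):  # relative R_th,j-c growth from solder degradation
    r_th = R_TH_HEALTHY * (1.0 + increase)
    print(f"R_th,j-c +{increase:.0%}: dTj = {junction_swing(P_LOSS, r_th):.1f} K")
```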

The thermal interface material attached to the power semiconductor device also degrades under thermal cycling. Most thermal interface materials are silicone greases: mixtures of silicone oil and thermally conductive fine particles, with low thermal conductivity. Although a TIM layer is generally only 10-100 µm thick, its thermal resistance can account for more than 50% of the total thermal resistance of the system [11]. Warping and deformation of the device baseplate during thermal cycling cause pump-out and squeezing of the TIM. High temperature and thermal cycling also separate the silicone oil from the filler particles, volatilise the oil, and dry out the grease. These effects decrease the thermal conductivity of the grease. TIM degradation does not directly affect the electrical characteristics of the device, but, like solder layer ageing, it increases the thermal resistance [23, 24]. Monitoring methods that can distinguish it from solder layer ageing are therefore needed.
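The effect of grease drying can be seen from the one-dimensional conduction resistance of a uniform layer, $R = t/(kA)$: halving the conductivity $k$ doubles the layer resistance. In the sketch below, the baseplate footprint and both conductivity values (fresh and partly dried grease) are hypothetical assumptions. The layer thickness is taken from the 10-100 µm range quoted above.

```python
def conduction_resistance(thickness_m: float, conductivity_w_mk: float, area_m2: float) -> float:
    """1-D thermal resistance of a uniform layer, K/W."""
    return thickness_m / (conductivity_w_mk * area_m2)


AREA = 0.14 * 0.13  # hypothetical baseplate footprint, m^2
THICKNESS = 50e-6   # TIM thickness within the 10-100 um range quoted above, m

# Assumed conductivities: fresh grease vs. grease that has partly dried out.
for label, k in (("fresh", 1.0), ("dried", 0.5)):
    r_tim = conduction_resistance(THICKNESS, k, AREA)
    print(f"{label:5s} grease (k = {k} W/mK): R_TIM = {r_tim:.5f} K/W")
```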

The press-pack packaging system eliminates the bond wires and solder layers of solder-bonding devices. Compared with solder-bonding devices, the operating life of press-pack devices can be increased by at least an order of magnitude [25]. However, thermomechanical stress still causes internal ageing of press-pack devices, degrading the contact at the packaging interconnection interfaces. The thermal mismatch of press-pack packaging materials causes small relative sliding between contact materials; e.g., the sliding distance between the IGBT chip (Si, CTE = 3 ppm/K) and the upper metal sheet (Ag/Al, 19/23 ppm/K) in the disc-spring press-pack is about 10 µm [15]. Periodic fretting causes wear and local fatigue on the contact surface, resulting in fretting damage [26].
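An order-of-magnitude estimate of this sliding is the free expansion difference over the distance $L$ from the contact centre to its edge, $\delta \approx |\alpha_1 - \alpha_2|\,L\,\Delta T$. The chip half-width and temperature swing below are hypothetical. The estimate ignores friction and the elastic response of the stack, so it tends to over-estimate the real slip. It only shows that micrometre-scale slip follows from the CTE values quoted above; it is not a reproduction of the measurement in [15].

```python
ALPHA_SI = 3e-6                           # CTE of the IGBT chip, 1/K (from the text)
ALPHA_METAL = {"Ag": 19e-6, "Al": 23e-6}  # CTE of the upper metal sheet, 1/K (from the text)


def free_slip(alpha_1: float, alpha_2: float, half_length_m: float, delta_t_k: float) -> float:
    """Free relative displacement at the edge of a contact of half-length L, m.

    Friction and elastic constraint are ignored, so real slip is expected to be smaller.
    """
    return abs(alpha_1 - alpha_2) * half_length_m * delta_t_k


HALF_LENGTH = 6.5e-3  # hypothetical half-width of a ~13 mm chip, m
DELTA_T = 80.0        # hypothetical temperature swing, K

for metal, alpha in ALPHA_METAL.items():
    slip_um = free_slip(ALPHA_SI, alpha, HALF_LENGTH, DELTA_T) * 1e6
    print(f"Si/{metal}: slip ~ {slip_um:.1f} um per cycle")
```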

It is worth noting that press-pack devices usually package multiple chips in parallel. This makes them prone to uneven pressure distribution, which accelerates device degradation. In a press-pack IGBT, improper clamping pressure or a height difference of 1-3 µm between paralleled chips can cause a 90% pressure difference among chips [10, 27]. Uneven pressure distribution causes excessive mechanical stress and poor contact in local areas, leading to mechanical damage [8]. It also changes the current distribution among paralleled chips and increases the risk of local electrical overstress failure. In practice, the degradation failure process is often related to uneven stress distribution [28]. Attention should therefore be paid to this uniformity problem when studying ageing failure mechanisms.

#### 1.3.2 Electrical stress

In addition to packaging ageing, high-voltage devices also suffer considerable electrical stress during long-term operation. The failures caused by electrical stress are summarised in Table 1.2 [29].

| Location                      | Failure modes                                                   | Ageing mechanisms                                           |
|-------------------------------|-----------------------------------------------------------------|-------------------------------------------------------------|
| Oxide layer                   | High leakage current,<br>loss of gate control,<br>short circuit | TDDB, hot electrons,<br>negative bias temperature<br>instability |
| Silicon die                   | Device burnout,<br>loss of gate control                         | Latch-up                                                    |
| Aluminium metallization layer | Open circuit,<br>short circuit                                  | Electromigration                                            |

Table 1.2: Failure mechanisms caused by electrical stress [29]

The gate oxide dielectric of the IGBT chip can be degraded by long-term exposure to an electric field. Even when the field strength is relatively low, e.g. at the rated operating voltage, the oxide will eventually break down, a phenomenon known as time-dependent dielectric breakdown (TDDB) [1]. Gate oxide breakdown caused by TDDB resembles that caused by electrostatic discharge, except that the latter is triggered instantaneously by a high external field. Micro-defects and impurities in the gate oxide initiate TDDB and cause a small leakage current, which in turn creates new defects in the oxide. As the defects accumulate, the gate oxide eventually breaks down, as shown in Figure 1.5 [1].



Figure 1.5: Gate oxide dielectric breakdown process [1]

When the current density reaches around $10^5 \sim 10^6$ A/cm<sup>2</sup> at high temperature, the aluminium atoms in the metal layer on the chip surface undergo electromigration. Plastic deformation of the chip surface metallization in solder-bonding packages, bond wire lift-off, and poor local chip contact in press-pack modules all influence the current distribution among the chips and may cause electromigration in local areas. As shown in Figure 1.6 [1], the aluminium metallization layer on the chip surface has a granular structure. Electromigration of its surface atoms produces cavities and hillocks in the metallization layer, accelerating its degradation [1, 8, 30].
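Electromigration lifetime is commonly described by Black's equation, $MTTF = A J^{-n} \exp(E_a / k_B T)$. This model is not part of the discussion above and is included only as an illustration. The sketch below computes lifetime ratios, so the constant $A$ cancels. It uses an assumed current exponent $n = 2$, an assumed activation energy $E_a = 0.7$ eV, and a hypothetical current-crowding scenario.

```python
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K


def black_mttf_ratio(j_ratio: float, t_ref_k: float, t_k: float,
                     n: float = 2.0, ea_ev: float = 0.7) -> float:
    """MTTF relative to a reference condition according to Black's equation.

    j_ratio: local current density divided by the reference current density.
    n, ea_ev: assumed current exponent and activation energy (not from the source).
    """
    return j_ratio ** (-n) * math.exp(ea_ev / K_B * (1.0 / t_k - 1.0 / t_ref_k))


# Hypothetical case: current crowding doubles the local current density and
# raises the local metallization temperature from 398 K to 423 K.
print(f"relative MTTF: {black_mttf_ratio(2.0, 398.0, 423.0):.3f}")
```

In this model the power-law current term and the exponential temperature term compound each other. This is one way to see why local current redistribution can shorten metallization life disproportionately.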



Figure 1.6: The electromigration of the metallization layer on the chip surface [1]

#### 1.3.3 Ageing characteristic parameters

Once a power semiconductor device starts ageing, its electrothermal characteristics change as ageing progresses. By monitoring these characteristic parameters, the health state of the device can be evaluated. The different failure modes of high-power IGBT devices and their corresponding ageing characteristic parameters are shown in Table 1.3.

| Failure modes                     | Ageing characteristic parameters                                  |
|-----------------------------------|-------------------------------------------------------------------|
| Bond wire lift-off                | $V_{ce,sat}$, $R_{bw}$, switching characteristics                 |
| Solder layer degradation          | $R_{th,j-c}$                                                      |
| Gate oxide dielectric degradation | $V_{th}$, $I_{g,leak}$                                            |
| TIM degradation                   | $R_{th,c-h}$                                                      |
| Metallization layer degradation   | $V_{ce,sat}$, $R_{Al}$                                            |

Table 1.3: The ageing characteristic parameters of the high-power IGBT device

The ageing failure modes of solder-bonding devices include bond wire lift-off and solder layer degradation. When a bond wire lifts off in a high-power module with multiple chips in parallel, the resulting circuit change affects the current sharing between the paralleled chips. It also increases the on-state resistance $R_{on}$, the on-state saturation voltage drop $V_{ce,sat}$ [2, 5, 21], and the bond wire equivalent resistance $R_{bw}$ [31]. When all the bond wires of a single chip have lifted off, i.e., the module has lost a paralleled chip, the fault can potentially be detected through the overall switching characteristics of the module. Examples include the gate turn-on current $I_{g,on}$, the gate turn-on voltage $V_{ge,on}$, and the turn-off voltage $V_{ce,off}$ [31]. Degradation of the solder layer increases the junction-to-case thermal resistance $R_{th,j-c}$ and also affects the temperature gradient across the packaging structure [31].

In addition to packaging-related failure modes, gate oxide dielectric degradation, thermal interface material degradation, and metallization degradation occur in both types of power devices. Degradation of the gate oxide dielectric increases the threshold voltage $V_{th}$ and the gate leakage current $I_{g,leak}$ [1]. Degradation of the thermal interface material has a thermal effect similar to that of solder degradation, i.e., it increases the case-to-heatsink thermal resistance $R_{th,c-h}$. Degradation of the chip aluminium metallization layer manifests directly as an increase in the metallization layer resistance $R_{Al}$ and the on-state saturation voltage drop $V_{ce,sat}$.

Based on power cycling experiments, a 5% increase in $V_{ce,sat}$ or a 20% increase in $R_{th,j-c}$ is adopted as the failure criterion for IGBT devices [21]. Once the ageing characteristic parameters of a power semiconductor device exceed these criteria, the device is considered to have reached its end of life. If the sharply increased local stress continues to be applied, the device will eventually fail. This can only be avoided by replacing the flagged device according to a pre-established maintenance strategy.
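The failure criteria can be expressed as a simple check against a healthy baseline. Both $V_{ce,sat}$ and $R_{th,j-c}$ depend on load current and temperature, so the sketch assumes that the baseline and current readings were taken at the same reference operating point. The numerical values are hypothetical.

```python
from dataclasses import dataclass

VCE_LIMIT = 0.05  # 5 % increase in V_ce,sat (criterion quoted in the text)
RTH_LIMIT = 0.20  # 20 % increase in R_th,j-c (criterion quoted in the text)


@dataclass
class Reading:
    vce_sat_v: float        # on-state voltage at the reference current and temperature
    r_th_jc_k_per_w: float  # junction-to-case thermal resistance


def end_of_life(baseline: Reading, current: Reading) -> list[str]:
    """Return the criteria reached or exceeded by the current reading vs. the healthy baseline."""
    violated = []
    if (current.vce_sat_v - baseline.vce_sat_v) / baseline.vce_sat_v >= VCE_LIMIT:
        violated.append("V_ce,sat +5 %")
    if (current.r_th_jc_k_per_w - baseline.r_th_jc_k_per_w) / baseline.r_th_jc_k_per_w >= RTH_LIMIT:
        violated.append("R_th,j-c +20 %")
    return violated


healthy = Reading(vce_sat_v=1.80, r_th_jc_k_per_w=0.024)
aged = Reading(vce_sat_v=1.86, r_th_jc_k_per_w=0.030)
print(end_of_life(healthy, aged))  # only the thermal criterion is reached here
```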

#### 1.4 CONDITION MONITORING

Condition monitoring obtains ageing characteristic parameters through sensors and then identifies the health state of the device according to predefined failure criteria or diagnostic models. The ageing characteristic parameters should reflect the failure mode sensitively and be accurately measurable by field-deployable sensing techniques. Based on how the signals are collected, condition monitoring methods can be categorised into device terminal feature collection, built-in sensors, and algorithm-based methods. The three types of monitoring methods complement each other and are often used in conjunction in practical applications.

#### 1.4.1 Device terminal features

The condition monitoring method based on device terminal features, which does not involve the internal packaging of the device but only measures signals at the connection terminals, is currently the most widely used. At the terminals of an IGBT module, $V_{ce,sat}$, $V_{th}$, the switching characteristics, and the case temperature can all be collected using sensors and signal processing circuits.

The $V_{ce,sat}$ of the IGBT device can be measured using the circuit shown in Figure 1.7 [12]. Two diodes $D_1$ and $D_2$ with identical characteristics are connected in series. When the IGBT under test is turned on, $D_1$ and $D_2$ are forward biased by the current source $I_D$. When the IGBT is turned off, $D_1$ blocks the DC-bus capacitor voltage appearing as $V_{ce}$ to protect the measurement circuit. The voltage across $D_2$ is fed into the amplifier, and with $R_1 = R_2$, the output signal of the amplifier is $V_{ce,sat}$. When the current in the measured IGBT/FRD branch reverses, the forward voltage drop $V_F$ of the antiparallel freewheeling diode is captured by the same mechanism.
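The diode-compensation idea can be checked numerically under one possible reading of the circuit in Figure 1.7, which is an assumption because the schematic is not reproduced here. In that reading, the current source drives $D_2$ and $D_1$ in series into the collector, so the node above $D_2$ sits at $V_A = V_{ce,sat} + V_{D1} + V_{D2}$, and the amplifier forms $V_A - 2V_{D2}$. With matched diodes the forward drops cancel; any mismatch appears directly as measurement error. The voltages are hypothetical.

```python
def measured_vce(vce_sat: float, v_d1: float, v_d2: float) -> float:
    """Amplifier output under the assumed topology: V_A - 2*V_D2.

    V_A = vce_sat + v_d1 + v_d2 is the node voltage fed by the current source I_D.
    """
    v_a = vce_sat + v_d1 + v_d2
    return v_a - 2.0 * v_d2


VCE_TRUE = 1.850  # hypothetical on-state voltage, V

# Matched diodes: the forward drops cancel.
print(f"matched:    {measured_vce(VCE_TRUE, 0.650, 0.650):.3f} V")
# A 10 mV mismatch between D1 and D2 appears directly as error.
print(f"mismatched: {measured_vce(VCE_TRUE, 0.660, 0.650):.3f} V")
```

A 10 mV diode mismatch is of the same order as the ageing-induced change in $V_{ce,sat}$ (tens of millivolts). This is why component precision limits the accuracy of this approach.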



Figure 1.7: The circuit of measuring on-state saturation voltage drop  $V_{ce,sat}$  [12]

However, the measurement result depends on the operating point, namely the load current, and its accuracy is strongly limited by the precision of the electronic components. The change in $V_{ce,sat}$ with IGBT ageing is very small, only tens of millivolts, and is easily masked by measurement noise. Besides, $V_{ce,sat}$ is also a temperature sensitive electrical parameter (TSEP), so it is affected jointly by the junction temperature and the ageing state. To address this coupling effect, reference [14] proposes a method to measure $V_{ce,sat}$ and junction temperature separately under quasi-online conditions. To eliminate the interference of junction temperature fluctuations, reference [32] proposes monitoring $V_{ce,sat}$ at the zero temperature coefficient point of the IGBT. In a high-power IGBT device with multiple chips in parallel, the change in $V_{ce,sat}$ is related to multiple ageing failure modes, all of which can increase $V_{ce,sat}$. These include bond wire lift-off, metallization layer degradation, and gate oxide mechanical damage in press-pack devices. In practice, other ageing characteristics and monitoring methods must be combined to diagnose the device health state comprehensively.

The threshold voltage $V_{th}$ and the gate leakage current $I_{g,leak}$ are two indicators for assessing the health of the IGBT gate oxide [33]. Currently, $V_{th}$ is accessible through quasi-online or offline measurement methods [8, 33]. In addition, local ageing failure in IGBT devices with multiple chips in parallel can be monitored through changes in the gate voltage $V_{ge,on}$ during the turn-on process [34]. In a solder-bonding IGBT module with 18 chips in parallel, the module gate voltage at a fixed time point (1.2 µs after the turn-on signal is triggered) is used as the indicator $V_{GE(pre-th)}$. It monitors uneven failure of the paralleled chips, which is emulated by cutting all the bond wires of a chip. As shown in Figure 1.8, $V_{GE(pre-th)}$ increases with the number of open-circuit chips. However, this method was only evaluated in a double pulse test rig at a single fixed operating point. Given the complex operating conditions and strong electromagnetic environment of high-power power electronic systems, the method demands extremely high-precision voltage measurement and sampling timing. Besides these measurement challenges, the method only considers the complete open circuit of a single chip and cannot detect the gradual ageing of bond wires. This is a significant limitation because, in a high-power solder-bonding module, bond wire lift-off can easily cause a single chip to fail and the whole module to burn out.
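The trend in Figure 1.8 can be rationalised with a first-order RC model of the gate loop, $V_{ge}(t) = V_{on} + (V_{off} - V_{on})e^{-t/(R_g C_{ies})}$. This model is valid only while the gate voltage stays below the threshold, before the Miller plateau. The sketch assumes that each disconnected chip removes its share of the module input capacitance $C_{ies}$. The time constant then drops, and $V_{ge}$ at the fixed sampling instant rises. The model, gate resistance, per-chip capacitance and drive voltages are hypothetical assumptions, not the parameters or explanation given in [34].

```python
import math


def gate_voltage(t_s: float, n_chips: int, c_chip_f: float, r_g_ohm: float,
                 v_off: float = -15.0, v_on: float = 15.0) -> float:
    """First-order RC estimate of the module gate voltage below the threshold.

    Each connected chip contributes c_chip_f to the total input capacitance.
    """
    tau = r_g_ohm * n_chips * c_chip_f
    return v_on + (v_off - v_on) * math.exp(-t_s / tau)


T_SAMPLE = 1.2e-6  # fixed sampling instant after the turn-on command, s (from the text)
C_CHIP = 6e-9      # hypothetical input capacitance per chip, F
R_G = 15.0         # hypothetical total gate resistance, ohm

for n_open in range(4):  # number of chips with all bond wires cut
    v = gate_voltage(T_SAMPLE, 18 - n_open, C_CHIP, R_G)
    print(f"{n_open} open chip(s): V_GE(pre-th) ~ {v:.2f} V")
```

In this hypothetical example, each lost chip raises the sampled voltage by roughly 0.6 V. Around the sampling instant, however, the gate voltage is rising at nearly 9 V/µs, so 10 ns of sampling jitter already shifts the reading by about 0.1 V. This illustrates the demand for precise sampling timing.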



Figure 1.8: (a) The $V_{GE(pre-th)}$ indicator and (b) experimental measurement of the unevenly degraded module at different temperatures [34]

In addition to the turn-on characteristics, bond wire lift-off also affects $R_{bw}$, which cannot be measured directly. However, the gate turn-off voltage $V_{ge,off}$, the short-circuit current $I_{sc}$ [35], and the transconductance can reflect this behaviour through their relationship with $R_{bw}$ [36].

Temperature-induced failure is one of the most common failure mechanisms in power electronic systems. Junction temperature monitoring is therefore fundamental to lifetime estimation, condition monitoring, and reliability analysis of high-power semiconductor devices. However, the chips are packaged inside the power module, and their temperature is very difficult to measure directly. Temperature sensitive electrical parameters (TSEPs) have therefore been proposed to monitor the junction temperature indirectly. With TSEPs, electrical signals are measured at the module terminals, and the junction temperature is derived from the temperature dependence of these signals. The most common TSEPs are the forward on-state voltage, the threshold voltage $V_{th}$, the turn-on delay time, the dv/dt of the turn-off process, and the switching loss [30, 37-39]. However, their sensitivity may fall to a level at which they are no longer meaningful in a multi-chip system with uneven degradation.
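A common way to use a TSEP is sketched below. The TSEP is first calibrated at a fixed measurement current against known temperatures, e.g. in a temperature chamber. A line is then fitted to the calibration points and inverted online. The calibration points are hypothetical, and the linear fit is an assumption that holds only over a limited range.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical calibration: on-state voltage at a small sense current vs. chamber temperature.
temps_c = [25.0, 50.0, 75.0, 100.0, 125.0]
volts = [0.650, 0.595, 0.540, 0.485, 0.430]  # about -2.2 mV/K
a, b = fit_line(temps_c, volts)


def junction_temperature(v_measured: float) -> float:
    """Invert the calibration line to estimate T_j in degC."""
    return (v_measured - a) / b


print(f"sensitivity: {b * 1000:.2f} mV/K")
print(f"T_j estimate at 0.520 V: {junction_temperature(0.520):.1f} degC")
```

With a sensitivity of about 2.2 mV/K, an ageing-induced shift of only 5 mV in the parameter would be read as roughly 2.3 K of temperature change. The TSEP itself cannot tell ageing apart from heating.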

By the definition of thermal resistance, monitoring the junction-to-case thermal resistance $R_{th,j-c}$ requires the case temperature $T_c$. Monitoring the case-to-heatsink thermal resistance $R_{th,c-h}$ additionally requires the heatsink temperature $T_h$. As shown in Figure 1.9, the $T_c$ and $T_h$ of a high-power solder-bonding module can be acquired online by thermocouples, which should be installed directly below the chips [40].
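At steady state, $R_{th} = (T_{hot} - T_{cold})/P$, where $P$ is the heat flowing between the two measurement points. The sketch below applies this definition to the measurement points of Figure 1.9. The readings and loss are hypothetical, and $T_j$ is assumed to come from a TSEP. The assumption that all of the chip's loss flows straight down through the measured path is an idealisation in a multi-chip module, where heat also spreads laterally to neighbouring chips.

```python
def thermal_resistance(t_hot_c: float, t_cold_c: float, power_w: float) -> float:
    """Steady-state thermal resistance between two measurement points, K/W."""
    if power_w <= 0:
        raise ValueError("power loss must be positive")
    return (t_hot_c - t_cold_c) / power_w


# Hypothetical steady-state readings for one chip position.
P_LOSS = 400.0                     # W, e.g. estimated from measured current and voltage
T_J, T_C, T_H = 105.0, 95.0, 91.0  # degC: T_j via a TSEP, T_c and T_h from thermocouples

r_jc = thermal_resistance(T_J, T_C, P_LOSS)
r_ch = thermal_resistance(T_C, T_H, P_LOSS)
print(f"R_th,j-c = {r_jc:.4f} K/W, R_th,c-h = {r_ch:.4f} K/W")
```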



Figure 1.9: The installation of thermocouples on the high-power module [40]

For water-cooled heatsinks, the thermal resistance calculation usually takes the coolant temperature as the reference point, with the water temperature monitored at the inlet and outlet of the heatsink. In addition, optical fibre temperature sensing can be used to eliminate electromagnetic interference and insulation problems, which makes it a promising tool for field deployment.
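When the coolant is the reference, the heat removed can be cross-checked calorimetrically from the inlet and outlet temperatures, $P = \dot{m} c_p (T_{out} - T_{in})$. The sketch below assumes approximate water properties and negligible heat loss to ambient. It takes the mean coolant temperature as the reference; this choice is an assumption, and choosing the inlet or outlet instead changes the resulting value. The flow rate and temperatures are hypothetical.

```python
CP_WATER = 4186.0  # specific heat of water, J/(kg*K), approximate
RHO_WATER = 997.0  # density of water, kg/m^3, approximate


def coolant_heat(flow_l_per_min: float, t_in_c: float, t_out_c: float) -> float:
    """Heat carried away by the coolant, W (all module loss assumed to enter the water)."""
    mass_flow = flow_l_per_min / 1000.0 / 60.0 * RHO_WATER  # kg/s
    return mass_flow * CP_WATER * (t_out_c - t_in_c)


def r_th_case_to_coolant(t_case_c: float, t_in_c: float, t_out_c: float, power_w: float) -> float:
    """Case-to-coolant resistance using the mean coolant temperature as reference, K/W."""
    return (t_case_c - 0.5 * (t_in_c + t_out_c)) / power_w


p = coolant_heat(8.0, 40.0, 43.0)
print(f"heat removed: {p:.0f} W")
print(f"R_th,case-coolant: {r_th_case_to_coolant(60.0, 40.0, 43.0, p):.4f} K/W")
```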

#### 1.4.2 Built-in sensors

Sensors integrated into power device packages and semiconductor chips are commonly used to protect the normal operation of power electronic systems. Some solder-bonding power modules, such as the Infineon FF1000R17IE4, embed NTC (negative temperature coefficient) thermistors on the DBC substrate for overheat protection. Sensors integrated into the semiconductor chip can also monitor ageing characteristic parameters such as the junction temperature and the bond wire equivalent resistance $R_{bw}$. Online monitoring methods based on built-in sensors mainly include an additional resistance to monitor bond wire lift-off [2], optical fibres to measure the junction temperature [39], and an on-chip sensor to monitor the current and junction temperature [41].

Regarding bond wire lift-off, an additional resistance branch in parallel with the bond wires can be integrated into the module package to monitor changes in $R_{bw}$ and evaluate the health condition of the bond wires [2]. However, integrating additional branches may alter the distribution of package parasitic parameters and worsen current imbalance. The response time of optical fibre temperature sensing is at the microsecond level, and the measurement is not subject to electromagnetic interference. It can be used for online junction temperature monitoring, but the installation of optical fibres and their influence on device sealing and insulation still need to be studied. As shown in Figure 1.10 [41], a temperature-sensitive diode sensor integrated into the IGBT chip can monitor the current and junction temperature of a single chip online. However, it increases the number of auxiliary terminals of the device. In a high-frequency converter, the diode output is also subject to strong electromagnetic interference, so the signal-to-noise ratio can be extremely low.



Figure 1.10: The IGBT chip with integrated temperature diode sensor [41]

Conventional temperature sensors such as thermocouples (TCs), resistance temperature detectors (RTDs), platinum thermometers (PTs), and thermistors have several inherent limitations for effective temperature monitoring [42]. The principal concern is exposure to the high-EMI environment inherent to power device operation. EMI can corrupt TC readings by inducing a voltage in the wiring and by inductively heating the TC itself. RTDs, on the other hand, can provide high accuracy and have adequate immunity to electromagnetic fields. However, they tend to be large and are not always practical in high-power-density systems. When current flows through a thermistor, it generates heat, which raises the thermistor's temperature above that of its environment. This self-heating may introduce significant error if it is not corrected. In summary, electrical temperature sensing techniques are unsuitable for high-precision measurement in power modules because of the high EMI. In addition, their large size and unwieldy wiring are detrimental to the compact structure of power electronic systems and can reduce heat transfer efficiency [43, 44].
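The size of the thermistor self-heating error follows from the steady-state relation $\Delta T = P/\delta = I^2 R/\delta$. Here $\delta$ is the thermistor's dissipation constant, a datasheet quantity in W/K. The resistance and dissipation constant below are hypothetical. The point of the sketch is the quadratic dependence on the excitation current, which is why a correction or a low excitation current is needed.

```python
def self_heating_error(current_a: float, resistance_ohm: float,
                       dissipation_w_per_k: float) -> float:
    """Temperature rise of a thermistor above its surroundings: dT = I^2 * R / delta (K)."""
    return current_a ** 2 * resistance_ohm / dissipation_w_per_k


R_NTC = 5000.0  # hypothetical NTC resistance at the operating point, ohm
DELTA = 2e-3    # hypothetical dissipation constant, W/K

for i_ma in (0.1, 1.0, 3.0):  # excitation currents, mA
    err = self_heating_error(i_ma * 1e-3, R_NTC, DELTA)
    print(f"{i_ma:4.1f} mA -> self-heating error {err:.3f} K")
```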

The fibre Bragg grating (FBG) is a small, lightweight, and passive (unpowered) sensing technology that is immune to EMI in power electronic systems. It is also suitable for measuring multi-point temperature distributions, since an array of distributed sensing points can be carried on a single fibre. Figure 1.11 shows a schematic diagram of an FBG sensor. The FBG sensor is a microstructure, typically a few millimetres long, imprinted in the core of a single-mode optical fibre. It is fabricated by exposing a segment of the fibre core to a pattern of UV light. This process induces a permanent physical change, creating a periodically modulated refractive index in the silica core [45].



Figure 1.11: FBG sensor structure [45]

FBG sensing is based on wavelength-selective reflection and on the sensitivity of the modulated refractive index to variations in the thermal and mechanical conditions acting on the FBG structure [42]. When the fibre is illuminated by broadband light, each FBG reflects a narrow band centred on its pre-designed wavelength, as shown in Figure 1.12 [46]. The reflected wavelength shifts linearly with temperature, and the shift can be demodulated by an interrogator.
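The demodulation step can be sketched as follows. The Bragg condition $\lambda_B = 2 n_{eff} \Lambda$ is standard FBG theory rather than something stated above. Here $n_{eff}$ is the effective refractive index and $\Lambda$ is the grating period. The sketch assumes a linear temperature sensitivity of about 10 pm/K, a typical order of magnitude for bare silica gratings near 1550 nm. The grating wavelengths and measured peaks are hypothetical, and strain cross-sensitivity is ignored.

```python
def bragg_wavelength(n_eff: float, period_nm: float) -> float:
    """Bragg condition: lambda_B = 2 * n_eff * Lambda (nm)."""
    return 2.0 * n_eff * period_nm


SENSITIVITY_NM_PER_K = 0.010  # assumed ~10 pm/K for a bare silica FBG near 1550 nm
T_REF_C = 25.0                # temperature at which the nominal wavelengths apply

# Hypothetical three-grating array on one fibre: nominal wavelengths at T_REF_C, nm.
NOMINAL = {"FBG1": 1540.0, "FBG2": 1545.0, "FBG3": 1550.0}


def temperatures(peaks_nm: list[float]) -> dict[str, float]:
    """Assign each reflected peak to the nearest nominal grating and convert its shift to degC."""
    result = {}
    for peak in peaks_nm:
        name = min(NOMINAL, key=lambda k: abs(NOMINAL[k] - peak))
        result[name] = T_REF_C + (peak - NOMINAL[name]) / SENSITIVITY_NM_PER_K
    return result


print(f"lambda_B = {bragg_wavelength(1.447, 535.5):.1f} nm")
print({k: round(v, 1) for k, v in temperatures([1540.35, 1545.52, 1550.41]).items()})
```

Assigning each peak to the nearest nominal wavelength only works while every grating's shift stays below half the channel spacing. Here that limit is 2.5 nm, or about 250 K. This is one constraint when planning the wavelengths of a multi-point array.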



Figure 1.12: Operating principle of the FBG array sensor [46]

These highly attractive sensing features provide distinct advantages over conventional sensing techniques. Consequently, FBG technology is increasingly applied to temperature monitoring in power modules, but all reported applications use invasive methods [47-49]. This limits the application potential of FBGs in in-service modules, since they require access to module internals and complicated, intrusive installation procedures.

#### 1.4.3 Algorithm-based methods

In applications where operating points and environmental conditions change frequently, such as electric vehicles and renewable energy, the ageing characteristic parameters of power devices fluctuate accordingly. Algorithm-based methods have been proposed to track ageing characteristics accurately and to decouple them from operating points and environmental conditions.

Methods based on device terminal features and built-in sensors can only collect limited information from high-power devices, so algorithms are used to extract more health features from the measured information. Monitoring the thermal resistance of power devices requires the junction temperature $T_j$ to be measured online. Where $T_j$ cannot be measured directly, algorithm-based methods such as the transient dual interface method [50] can help evaluate the ageing state of the solder layer. They do so from the transient thermal resistance curve recorded while the device cools. Converting this curve into a structure function further yields the thermal resistance and heat capacity of each packaging material layer, allowing the health of each layer to be evaluated effectively. This method can also be applied to press-pack devices [51]. When the junction temperature is difficult to obtain, it is also possible to measure only $T_c$ and $T_h$ to derive changes in the thermal resistance of the device and the thermal interface material. This approach is especially suitable for thermal resistance monitoring of high-power multi-chip semiconductor devices [52, 53].
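The transient curve used by these methods is often described by a Foster-type fit, $Z_{th}(t) = \sum_i R_i (1 - e^{-t/\tau_i})$. During cooling after a long heating pulse, the junction temperature follows $T_j(t) = T_{ref} + P\,(R_{th} - Z_{th}(t))$ by superposition. The sketch below uses hypothetical RC pairs. It shows how growth of the term assigned to the solder layer changes the curve at intermediate times while leaving the early part almost unchanged. For simplicity only the resistance of that term is changed and its time constant is kept fixed. The conversion to a structure function (via a Foster-to-Cauer transformation) is not shown.

```python
import math


def z_th(t_s: float, foster: list[tuple[float, float]]) -> float:
    """Foster-network transient thermal impedance, K/W. foster: list of (R_i, tau_i)."""
    return sum(r * (1.0 - math.exp(-t_s / tau)) for r, tau in foster)


def cooling_curve(t_s: float, foster: list[tuple[float, float]],
                  power_w: float, t_ref_c: float) -> float:
    """Junction temperature after a steady-state heating pulse is switched off at t = 0."""
    r_th = sum(r for r, _ in foster)
    return t_ref_c + power_w * (r_th - z_th(t_s, foster))


# Hypothetical 3-term fit: chip, solder/substrate, baseplate (R in K/W, tau in s).
healthy = [(0.004, 0.001), (0.012, 0.05), (0.008, 1.0)]
aged = [(0.004, 0.001), (0.018, 0.05), (0.008, 1.0)]  # solder-layer R +50 %, tau kept fixed

for t in (1e-4, 1e-2, 0.1, 10.0):
    print(f"t = {t:>6g} s: Zth healthy = {z_th(t, healthy):.4f}, aged = {z_th(t, aged):.4f} K/W")
```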

A thermal network-based method was proposed for single-chip power modules, using external temperature measurements and the temperature dependence of device characteristics [52]. The internal thermal resistance was derived to signify any degradation that may have occurred. If extrapolated to a multi-chip-in-parallel module with measurement of the temperature distribution, the thermal network model, which is multi-input multi-output (MIMO) in nature, would take a long time to parameterize. Such a model is also difficult to solve in real time, because updating it to reflect the many possible degraded conditions can be very involved. Besides, covering the full range of operating points, including ambient effects, requires many look-up tables to build an adequate database. Therefore, such a model-based approach has not yet been developed to a level suitable for field deployment.
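To make the MIMO character concrete, the sketch below uses a deliberately simplified steady-state linear thermal network in which every chip temperature depends on all chip losses through a resistance matrix $R$. Real networks are transient (e.g. Foster or Cauer) and temperature dependent; the matrix values and function names are hypothetical. The example shows both why one degraded solder layer shifts the temperature map and why identifying $R$ needs many independent operating points.

```python
import numpy as np


def chip_temperatures(r_matrix, losses, t_coolant):
    """Steady-state chip temperatures of a linear thermal network, T = T_cool + R @ P.

    Diagonal entries of r_matrix are self-heating resistances (K/W); off-diagonal
    entries describe thermal coupling between neighbouring chips.
    """
    return t_coolant + np.asarray(r_matrix, dtype=float) @ np.asarray(losses, dtype=float)


def identify_r_matrix(loss_samples, temp_samples, t_coolant):
    """Least-squares estimate of R from k operating points (one per row).

    Each row satisfies T_k - T_cool = R @ P_k, i.e. dT = P @ R.T for the whole batch,
    so at least n linearly independent loss vectors are needed for n chips.
    """
    delta_t = np.asarray(temp_samples, dtype=float) - t_coolant
    r_transposed, *_ = np.linalg.lstsq(np.asarray(loss_samples, dtype=float), delta_t, rcond=None)
    return r_transposed.T


# Hypothetical 3-chip module (K/W) with a 20 % solder-related rise on chip 1.
r_healthy = np.array([[0.30, 0.05, 0.02],
                      [0.05, 0.30, 0.05],
                      [0.02, 0.05, 0.30]])
r_degraded = r_healthy.copy()
r_degraded[0, 0] *= 1.2

p = np.array([100.0, 100.0, 100.0])
print(chip_temperatures(r_degraded, p, 40.0) - chip_temperatures(r_healthy, p, 40.0))

# Ten random operating points are enough to recover the 3 x 3 matrix exactly here.
rng = np.random.default_rng(0)
losses = rng.uniform(50.0, 150.0, size=(10, 3))
temps = 40.0 + losses @ r_healthy.T
print(np.round(identify_r_matrix(losses, temps, 40.0), 3))
```

With many chips, transient dynamics, ambient effects and every possible degradation pattern, the number of parameters and operating points grows quickly, which is the practical difficulty described above.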

The conventional approaches face different difficulties when applied to a multi-chip power module. It is therefore worth considering the modelling capability of neural networks (NNs) to represent the time-variant thermal characteristics. As universal approximators, neural networks have been successful in multi-input multi-output (MIMO) regression, classification, and clustering problems [54-58]. They are an attractive tool for system condition monitoring as long as the monitored process is stationary. A power electronic system usually starts from a healthy condition, which gives the opportunity to capture the system response to changes in operating point and to identify the effects of the cooling setting, ambient scenarios, and systematic sensor errors. Nowadays, target systems containing power modules, such as offshore wind turbines, are usually equipped with a supervisory control and data acquisition (SCADA) system, which can be used to run the trained neural networks.
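The idea of learning the healthy behaviour first and then watching the residual can be sketched in a few lines. To keep the example short and dependency-free, a quadratic least-squares regressor stands in for the neural network; the choice of inputs and output, the data and the names are all hypothetical.

```python
import numpy as np


def quadratic_features(x):
    """Design matrix [1, x_i, x_i * x_j] for operating-point inputs (one sample per row)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[1]
    cols = [np.ones(len(x))]
    cols += [x[:, i] for i in range(n)]
    cols += [x[:, i] * x[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack(cols)


class NormalBehaviourModel:
    """Healthy-state regression model; a small stand-in for the NN described in the text."""

    def fit(self, x_healthy, y_healthy):
        features = quadratic_features(x_healthy)
        self.coef_, *_ = np.linalg.lstsq(features, np.asarray(y_healthy, dtype=float), rcond=None)
        return self

    def predict(self, x):
        return quadratic_features(x) @ self.coef_

    def residual(self, x, y_measured):
        """Measured minus predicted; stays near zero while the system remains healthy."""
        return np.asarray(y_measured, dtype=float) - self.predict(x)


# Hypothetical healthy data: inputs = (load current in A, coolant temperature in degC).
rng = np.random.default_rng(1)
x = np.column_stack([rng.uniform(0.0, 50.0, 200), rng.uniform(20.0, 40.0, 200)])
y = 5.0 + 0.2 * x[:, 0] + 0.004 * x[:, 0] ** 2 + 1.0 * x[:, 1]   # e.g. a case temperature
model = NormalBehaviourModel().fit(x, y)

# A degraded unit runs 2 K hotter at the same operating points.
print(model.residual(x[:5], y[:5] + 2.0))
```

Passing a 2D `y` (one column per sensor) gives one residual per sensor, which mirrors the MIMO mapping discussed above.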

However, it is essential to structure the NNs to appropriately map the input and output variables to detect the deviation from the healthy condition. This requires deep insight into the nature of the degradation and the expected response of the system in such circumstances. To complete the NN training cycle, it is necessary to emulate the degradation pattern and severity. Since there are many combinations, it is also useful to know the link between different cases and identify the possibility of interpolation.

Data-driven intelligent algorithms can also calibrate the collected ageing characteristic parameters and assist in evaluating the ageing status of the device. Ref. [59] used a relevance vector machine to eliminate interference factors in gate current measurements and support the extraction of dynamic current changes. Ref. [17] proposed a pattern-recognition-based method for identifying the operating health states of devices, and [60] used neural networks to differentiate uneven solder degradation in high-power devices. At the wind turbine (WT) system level, several SCADA data-based CM methods have been proposed for mechanical components [61], but none of them has been applied to WT converters. Meanwhile, there are very few studies on early-stage detection with limited data. Ref. [60] analysed the possibility of training a group of NNs to identify the degradation of a multi-device converter, which requires labelled degradation patterns to train the supervised learning model; however, such labels are not consistently available in early-stage operation. Clustering analysis and autoencoders are common unsupervised algorithms [62, 63], but they lack the capability to distinguish a fault condition from the healthy state with limited data. A deep neural network (DNN) [64] with a conventional cost function may lose generalization on limited and unbalanced data if it cannot decouple the converter health state from its dependency on the quantity of operating points.

In WT systems, the SCADA data from early-stage operation cannot cover all operating scenarios, e.g., wind speed and environmental temperature. Moreover, trial runs and debugging tests are conducted frequently, keeping the operating point below rated power, so the WT may operate mostly in the low-power region. This results in an unbalanced data distribution, which biases model training: the model becomes more accurate around operating points with abundant data than around those with little data. Data augmentation methods, such as rotation, scaling and colour shifting, have been proposed for image processing [65], but they cannot be directly applied to SCADA data. Moreover, conventional cost functions treat every data sample equally, which is not a proper metric for developing a WT converter CM method with limited and unbalanced data, since the CM method is required to capture the health state accurately regardless of the converter operating point.
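One generic way to stop abundant low-power samples from dominating the cost is to weight each sample inversely to the population of its operating-point bin. The sketch below illustrates only that common technique; it is not the cost-function optimisation proposed in Chapter 6, and the bin edges and data are hypothetical.

```python
import numpy as np


def inverse_frequency_weights(operating_point, bin_edges):
    """Per-sample weights inversely proportional to the population of each operating-point bin.

    Weights are normalised to a mean of 1, so every populated bin contributes the same
    total weight to the cost regardless of how many samples it holds.
    """
    n_bins = len(bin_edges) - 1
    idx = np.clip(np.digitize(operating_point, bin_edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    weights = 1.0 / counts[idx]
    return weights / weights.mean()


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error with per-sample weights."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sum(weights * err ** 2) / np.sum(weights))


# Hypothetical per-unit power: 8 low-power samples and 2 near rated power.
power = np.array([0.1] * 8 + [0.9] * 2)
w = inverse_frequency_weights(power, bin_edges=[0.0, 0.5, 1.0])
errors = np.array([0.0] * 8 + [1.0] * 2)          # model is only wrong at high power
print(weighted_mse(errors, 0.0, w), np.mean(errors ** 2))
```

The unweighted error hides the poor high-power fit behind many well-fitted low-power samples, whereas the weighted error gives both bins equal say.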

## 1.5 MOTIVATIONS AND OBJECTIVES

Due to the relatively low current rating of a single IGBT chip, large power modules often contain parallel chips, which are not identically packed. The reliability of high-power multi-chip IGBT modules has become a focus of study as these devices are widely used in renewable energy and grid applications. In offshore wind power conversion, realistic operation subjects the module to temperature cycling of lower amplitude than that used in accelerated lifetime tests, which ages the packaging, in particular the die-attach solder layer [20, 66]. From an application point of view, it is important to understand how the module ages from the beginning; this uncertainty has prevented a more complete lifetime model from being attained. In a converter system with multiple chips in parallel, initial defects in the die-attach solder layers of different devices gradually grow at different rates and develop into a pattern of uneven degradation [67]. Moreover, since the paralleled devices share the same terminals, it is desirable to detect ageing faults without internal measurements. These considerations necessitate the evaluation of such a degradation process and the monitoring of individual solder layer conditions later in life.

Based on the above, the key research objectives of this thesis are identified under two themes: Reliability - analyse the initial solder ageing development and uneven degradation process of the multi-chip power module under realistic stress conditions; Condition Monitoring - develop an online condition monitoring method to detect such uneven degradation and promote the method to offshore wind turbine field deployment.

## 1.6 CONTRIBUTIONS

In line with these research objectives, this thesis makes the following contributions to the reliability and condition monitoring themes:

This thesis investigates the solder ageing initialization process under realistic stress conditions. The thermomechanical stress distribution around solder layer defects is obtained by a thermomechanical finite element analysis (FEA) model, which is validated by microscale computed tomography (CT) scanning and low amplitude power cycling tests. A physics-of-failure lifetime model is proposed by combining such an FEA model with the solder material's properties. Based on these models, the thesis then simulates the void development process, void-to-crack conversion, and crack growth in terms of stress distribution and timespan.

Another contribution to the reliability theme is the evaluation of the uneven degradation process of the multi-chip power module under long-term operation. Two electrothermal simulation models are established for the fully and partially rated converters in wind turbines employing a permanent magnet synchronous generator (PMSG) and a doubly-fed induction generator (DFIG), respectively. Based on these simulation models, the converter lifetime is estimated in terms of end-of-life, per hour of operation and per megawatt-hour of energy generated. The uneven degradation process of paralleled devices in the multi-chip power module is then evaluated, considering damage accumulation and the wind speed distribution in long-term operation.

The first contribution to the condition monitoring theme is a potential method for detecting uneven solder layer degradation in a multi-chip-in-parallel system. The sensitivities of the multi-chip power module's electrothermal characteristics to uneven degradation are analysed using a multi-physics simulation model, and the external heat flux is found to be a potential condition monitoring indicator. Based on these simulation results, a two-stage neural network (NN) method is proposed to detect the uneven degradation levels: the first-stage NNs represent the electrothermal characteristics of the multi-chip system, and the second stage is trained to recognise the degradation levels. This condition monitoring method, validated by experimental results, can sensitively detect individual solder degradation in multi-chip power modules or multi-module power electronic systems, with an accuracy of more than 98%.
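The division of labour between the two stages can be illustrated with a toy example. Stage 1 is assumed to deliver, for each heat-flux sensor, the deviation between the measured flux and its healthy-state prediction; a nearest-centroid classifier stands in for the second-stage NN. The class labels, sensor count and values are hypothetical.

```python
import numpy as np


class NearestCentroidStage2:
    """Stage-2 stand-in: maps stage-1 residual vectors to a degradation label."""

    def fit(self, residuals, labels):
        residuals = np.asarray(residuals, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack([residuals[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, residuals):
        residuals = np.atleast_2d(np.asarray(residuals, dtype=float))
        # Euclidean distance from every sample to every class centroid.
        dist = np.linalg.norm(residuals[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dist, axis=1)]


def stage1_residual(measured_flux, predicted_healthy_flux):
    """Deviation of measured external heat flux from the stage-1 healthy prediction."""
    return np.asarray(measured_flux, dtype=float) - np.asarray(predicted_healthy_flux, dtype=float)


# Hypothetical two-sensor residuals for three labelled conditions.
train_res = [[0.0, 0.1], [0.1, -0.1], [5.0, 0.5], [4.8, 0.3], [0.4, 5.1], [0.2, 4.9]]
train_lab = ["healthy", "healthy", "chip1_degraded", "chip1_degraded",
             "chip2_degraded", "chip2_degraded"]
clf = NearestCentroidStage2().fit(train_res, train_lab)
print(clf.predict(stage1_residual([[104.6, 100.2]], [[100.0, 100.0]])))
```

Working on residuals rather than raw measurements is what lets the second stage ignore the operating point, since stage 1 has already absorbed its effect.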

Then three optimizations, concerning the data labelling process, temperature sensing, and network architectures, are proposed to improve the feasibility of the condition monitoring method for field deployment. Using an inverter test rig established in this thesis, an equivalent emulation of uneven degradation is used to generate labelled data for network training. Then, the fibre Bragg grating (FBG) sensing technique is employed to measure multi-point temperatures on the multi-chip converter system; it integrates multiple sensors on one fibre and achieves high measuring precision with immunity to electromagnetic interference (EMI). Finally, by utilising a deep neural network (DNN) structure, the proposed condition monitoring method can readily generalise to the transient stages during operating condition changes and to more complex combinations of operating points and environmental conditions in practice.

A novel SCADA data-based condition monitoring method is proposed to detect early-stage faults of multi-chip-in-parallel converters in offshore wind turbines. The main principle is to first design a proper network to represent the healthy state of the wind turbine and then use the network's prediction accuracy to differentiate faulty conditions from the healthy state. A DNN is designed and trained by an unsupervised representation learning approach. To improve generalization on unbalanced SCADA data, a novel solution is proposed to optimise the cost function using limited valid data. An online learning process then enables the CM method to perform long-term real-time diagnosis. A demonstration case study on a group of offshore wind turbines validates the robustness of the CM method and its ability to raise an alarm a few days ahead of an actual converter fault.
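A minimal sketch of the "prediction accuracy as health indicator" principle follows: a threshold is set from the prediction errors of a healthy reference period, the errors are smoothed, and an alarm is raised when the smoothed error stays above the threshold. The mean-plus-k-standard-deviations rule, window length and persistence count are assumptions for illustration, not the settings used in this thesis.

```python
import numpy as np


def alarm_threshold(healthy_errors, k=3.0):
    """Threshold = mean + k * std of the prediction errors in a healthy reference period."""
    e = np.asarray(healthy_errors, dtype=float)
    return e.mean() + k * e.std()


def rolling_mean(x, window):
    """Simple moving average; element i covers x[i : i + window]."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(window) / window, mode="valid")


def first_alarm_index(errors, threshold, window=24, persistence=3):
    """Index in `errors` where the smoothed error has exceeded the threshold for
    `persistence` consecutive windows, or None if that never happens."""
    smoothed = rolling_mean(errors, window)
    run = 0
    for i, value in enumerate(smoothed):
        run = run + 1 if value > threshold else 0
        if run >= persistence:
            return i + window - 1   # last raw sample inside the triggering window
    return None


# Hypothetical hourly prediction errors: healthy period, then a developing fault.
healthy = [0.9, 1.1] * 50
thr = alarm_threshold(healthy)                     # about 1.3
errors = [1.0] * 50 + [2.0] * 50
print(first_alarm_index(errors, thr, window=4, persistence=2))   # -> 52
```

Smoothing and a persistence requirement trade a short detection delay for fewer false alarms caused by isolated prediction errors.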

## 1.7 THESIS OUTLINE

After this introductory chapter, the following five chapters present the five contributions, respectively.

Chapter 2 analyses the growth process of two kinds of solder defects in an IGBT power module, voids and cracks, and provides guidelines to estimate the ageing initialization process of the module solder layer under realistic conditions with stress cycles of relatively low amplitude.

In Chapter 3 the reliability characteristics of the power converters utilised in fully rated and partially rated wind turbines are evaluated, together with the uneven degradation process of paralleled devices in a multi-chip power module under long-term wind profiles.

A potential method for detecting uneven solder layer degradation in a multi-chip-in-parallel system by heat-flux based condition monitoring using a two-stage neural network is proposed in Chapter 4.

Some additional measures for promoting the condition monitoring method proposed in Chapter 4 for field deployment are discussed in Chapter 5, from the aspects of the data labelling process, temperature sensing, and network architectures.

An unsupervised data-driven condition monitoring scheme is proposed in Chapter 6 to detect the early-stage fault of wind turbine converters based on limited and unbalanced SCADA data. The conclusions from this research and suggestions for future work are presented in Chapter 7.

# 2 INITIAL AGEING DEVELOPMENT OF POWER MODULES UNDER REALISTIC STRESS CONDITIONS

## 2.1 INTRODUCTION

Realistic operating conditions in wind turbine converters, especially on the machine side, often subject power modules to temperature cycling that ages the packaging, particularly the die-attach solder layer [66]. However, the amplitude of such temperature cycling is 30-40 °C or less, which is much lower than the range used in accelerated lifetime testing (ALT). Lifetime models therefore cannot achieve adequate estimation accuracy for low temperature cycling ranges, possibly because the ageing process under low stress cycles differs from that under ALT conditions. To improve the generalization of lifetime models to realistic conditions, it is essential to investigate the slow ageing development under such low amplitude stress cycles. Some research has shown the non-negligible effect of low stress cycles on aged modules [20, 68], but how a module ages from the very beginning is still unclear.

This chapter explores the initiation of a die-attach solder crack and its further development under realistic stress conditions with low amplitude temperature cycling. Microscopic examination [69] shows that rapid stress cycles mainly age the die-attach solder layer, because it is closer to the semiconductor chips and thus suffers higher thermomechanical stresses than the baseplate solder layer. Microscale voids inevitably exist in the solder layer because of insufficient flux activity in the material, large solder pad areas, and uneven temperature during reflow [70-73]. Figure 2.1 shows a computed tomography (CT) image of the die-attach solder layer in a brand-new IGBT. The solder porosity is kept below 3% as required by manufacturing standards, but voids are still visible. These voids can grow in long-term operation [74], so it is necessary to study the characteristics of such development.



Figure 2.1: CT image showing voids in IGBT die-attach solder layer

As a void grows, its curvature decreases, which attenuates the stress. At first glance, the initial defect would therefore not be consequential because of this self-limiting mechanism. However, a void inside the solder layer may develop into a crack at the boundary with the IGBT chip, in which case further damage may not reduce the stress concentration and the inelastic strain may increase. The purpose of this chapter is to model and illustrate the dynamics of such development.

Therefore, this chapter establishes a 2D finite element analysis (FEA) model to analyse the thermomechanical stress distribution around initial defects (voids and cracks in the die-attach solder layer) and their further development with physics-of-failure modelling. CT scanning is employed to evaluate the effects of low amplitude stress cycles and to verify the FEA model. Different patterns of initial defects are analysed. A crack formed on the solder-chip boundary is identified as the most critical situation, because the stress is not relieved as the crack grows. The growth of a crack is traced and compared with the numerical prediction. In general, a void has a smooth, rounded boundary and is located inside the solder material, whereas a crack has an acute angle at its tip and usually grows along the solder boundary, splitting the material, as shown in Figure 1.4. The formation of a crack is attributed to the transition from a void inside the solder or on the chip-solder boundary. The chapter is organised as follows: the CT scanning and power cycling tests are described in Section 2.2, and the CT scanning results are presented in Section 2.3. A 2D symmetrical FEA model for fatigue stress evaluation is presented in Section 2.4, which is verified by comparing its transient thermal impedance against the datasheet. Sections 2.5 and 2.6 analyse the thermomechanical stress behaviour of the solder layer, including the growth of voids and cracks, respectively. A lifetime prediction of an IGBT module in a grid application is presented in Section 2.8, and finally, Section 2.9 concludes this chapter.

## 2.2 EXPERIMENT TECHNIQUES

#### 2.2.1 CT analysis

This chapter focuses on the mechanism of crack formation and further development under low amplitude stress cycles which would be experienced in realistic operating conditions. CT scanning is employed in this study to inspect the microstructure of the solder layer and measure the dimensions of the voids because the voids will not initially affect the macroscopic parameters such as the junction-to-case thermal resistance  $R_{th,j-c}$ . X-ray CT utilises radiation that loses some of its energy after penetrating any material. The attenuation of the radiation depends on the property of the material and the dimension of the device under test (DUT). The data in this study is collected in the form of radiographs that are then reconstructed by filtered back projection (FBP) to create a 3-dimensional model. The reconstructed model provides information about the internal and external geometry of the DUT and can identify any defects larger than the resolution [75-77]. As shown in Figure 2.2, Zeiss Xradia 520 is the CT scanner used in this study and can achieve voxel resolution of 7.8  $\mu$ m.

The IGBT power module used in this study is a single-chip half-bridge power module with a rating of 1200V/50A, which consists of one IGBT and one diode in each leg, as shown in Figure 2.3. The semiconductor chips are soldered with 120 µm of SnAg3Cu0.5 (SAC305) solder paste onto a direct bonded copper (DBC, Copper/Al<sub>2</sub>O<sub>3</sub>/Copper) board.



Figure 2.2: (a) The Zeiss Xradia 520 and (b) the CT scanner test rig [75]



Figure 2.3: The IGBT power module, (a) sectional side and (b) packaging layout in the opened module

#### 2.2.2 Power cycling test

Accelerated lifetime testing can quickly initiate fault development [78]. To analyse the initial development of voids, the realistic stress condition is emulated in this chapter by 35,000 power cycles with a junction temperature swing $\Delta T_j=17.5$ °C and a mean junction temperature $T_{j,mean}=68$ °C. In WT converters, the junction temperature swing of the devices ranges from a few degrees up to 40 °C, as presented in the next chapter. The temperature profile for power cycling is selected to represent such conditions in WT applications, but it is also limited by the control accuracy of the test rig, which uses a TSEP (temperature-sensitive electrical parameter) method, i.e. the $V_{ce}$ measured with a precisely controlled 100 mA collector current. The ageing process is examined by CT scanning. Once a crack has been generated, its growth can be monitored through the junction-to-case thermal resistance during power cycling. In addition to the condition of $\Delta T_j$ =17.5 °C and $T_{j,mean}$ =68 °C, the results of power cycling tests with other combinations of $\Delta T_j$ and $T_{j,mean}$ are used to evaluate crack growth under low amplitude stress cycles. Furthermore, all CT and power cycling results are used to validate the FEA model later in this chapter.
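The junction temperature limits of the cycling profile and the TSEP conversion amount to simple arithmetic, sketched below. A linear calibration of $V_{ce}$ at the 100 mA sensing current is assumed, with a negative temperature coefficient of a few mV/K as is typical for the on-state voltage at low current; the calibration point and coefficient are hypothetical and would come from calibrating the actual devices.

```python
def cycle_limits(delta_tj, tj_mean):
    """Minimum and maximum junction temperature of a power cycle (degC)."""
    return tj_mean - delta_tj / 2.0, tj_mean + delta_tj / 2.0


def tsep_junction_temperature(v_ce, v_cal, t_cal, k_factor):
    """Linear TSEP conversion: T_j = t_cal + (v_ce - v_cal) / k_factor.

    v_ce, v_cal in V; k_factor in V/K (negative for V_ce at a small sensing current).
    """
    return t_cal + (v_ce - v_cal) / k_factor


print(cycle_limits(17.5, 68.0))   # low-amplitude profile of this chapter -> (59.25, 76.75)
# Hypothetical calibration: 0.600 V at 25 degC with -2 mV/K.
print(tsep_junction_temperature(0.500, 0.600, 25.0, -0.002))
```

Because the temperature swing is only 17.5 °C, a small error in the calibration coefficient becomes a noticeable fraction of the swing, which is why the achievable control accuracy constrains the profile.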

Figure 2.4 shows the power cycling test rig and circuit diagram. IGBT modules are connected in series and are permanently gated-on by individual gate drivers. The heating current is passed through a control switch to heat the devices, during which the on-state voltage  $V_{ce}$  is measured to monitor any bond-wire failure [78]. When the heating current is switched off, the devices enter the cooling phase; the  $R_{th,j-c}$  is then measured by sensing the junction temperature, the case temperature and power loss before the cooling phase [79].  $R_{th,j-c}$  and  $V_{ce}$  are normalized with respect to their initial values ( $R_{th0}$  and  $V_{ce0}$ ) of the device's brand-new state and recorded during tests.
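The $R_{th,j-c}$ evaluation and normalisation described above reduce to the arithmetic below. The temperatures and power are hypothetical, and the 20% increase in $R_{th,j-c}$ used as an end-of-life flag is a commonly used convention assumed here for illustration, not a criterion stated in this chapter.

```python
def rth_jc(t_junction, t_case, power_loss):
    """Junction-to-case thermal resistance (K/W) from temperatures just before cooling."""
    return (t_junction - t_case) / power_loss


def normalise(values, initial):
    """Express a recorded quantity relative to its brand-new value (R_th0 or V_ce0)."""
    return [v / initial for v in values]


def first_exceedance(normalised_values, limit):
    """Index of the first record above the limit, or None."""
    for i, v in enumerate(normalised_values):
        if v > limit:
            return i
    return None


r0 = rth_jc(120.0, 70.0, 100.0)                    # hypothetical brand-new value, 0.5 K/W
history = normalise([0.50, 0.55, 0.62, 0.66], r0)
print(first_exceedance(history, limit=1.2))        # assumed +20 % criterion -> 2
```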



Figure 2.4: (a) power cycling test circuit and (b) test rig

The $R_{th,j-c}$ and $V_{ce}$ results from two groups of temperature profiles, obtained by colleagues W. Lai et al. from the University of Warwick and Chongqing University [68], are shown in Figure 2.5 (a) and (b), respectively. For the results shown in Figure 2.5 (a), the IGBTs are first ALT tested with $\Delta T_j = 118^{\circ}$ C and $T_{j,mean} = 85^{\circ}$ C, and then low stress cycles are applied to the module with $\Delta T_j = 40^{\circ}$ C and $T_{j,mean} = 74^{\circ}$ C; this is used as a reference for modelling crack growth in this study. According to [14, 80, 81], crack growth in the die-attach is the main cause of solder degradation, which increases the thermal resistance. These studies support the idea that macroscopic crack growth can be measured by $R_{th,j-c}$. However, if a crack has not been created in a module under ALT with $\Delta T_j = 110^{\circ}$ C and $T_{j,mean} = 81.5^{\circ}$ C, the low amplitude stress cycles with $\Delta T_j = 40.5^{\circ}$ C and $T_{j,mean} = 60.1^{\circ}$ C apparently have no measurable effect on $R_{th,j-c}$, as shown in Figure 2.5 (b). It is therefore essential to understand how a new module develops into an aged one. This chapter models how the stress cycles drive micro voids and cracks to grow before their effect can be externally measured.



Figure 2.5: Multi-stage power cycling experiment results [68].

## 2.3 CT SCANNING RESULTS

#### 2.3.1 Statistical analysis of voids

Figure 2.6 (a) and (b) show two top-down scans taken at the middle height of the IGBT's die-attach solder layer, the first on a new module and the second after 35,000 power cycles with junction temperature swing $\Delta T_j=17.5^{\circ}$ C and mean junction temperature $T_{j,mean}$ =68 °C. Individual voids can be located in each scan and their changes tracked in terms of spherical area and volume, which are computed with the Avizo FEI (Thermo Fisher Scientific, 2018) software by processing multiple scans taken on the same sample along the thickness direction [76].

CT scanning data are affected by voxel-size measurement error [82]. To reduce this effect, any object smaller than the minimum connected-voxel size is treated as noise and excluded from the scanning results. Figure 2.7 presents a visual comparison between the images before and after the power cycling test. The voids are colour-coded according to their volume on a logarithmic scale: lighter blue ( $0.000002 - 0.0001 \text{ mm}^3$ ), darker blue ( $0.0001 - 0.001 \text{ mm}^3$ ), red ( $0.001 - 0.01 \text{ mm}^3$ ) and green (> $0.01 \text{ mm}^3$ ). The initial voids have grown after temperature cycling, even with a low amplitude temperature variation and a short cycling period. The volume of the arrow-pointed void has grown from the darker blue range into the red range, while several new voids have also appeared in the lighter blue range.
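The colour coding of Figure 2.7 is a simple logarithmic binning of void volume, sketched below together with the volume of a single voxel at the quoted 7.8 µm resolution for scale. Assigning each shared boundary value to the upper class is an assumption, since the quoted ranges touch at their edges.

```python
VOXEL_SIZE_MM = 7.8e-3  # 7.8 um voxel resolution quoted for the CT scanner


def voxel_volume_mm3(n_voxels, voxel_size_mm=VOXEL_SIZE_MM):
    """Volume represented by n cubic voxels."""
    return n_voxels * voxel_size_mm ** 3


def volume_class(volume_mm3):
    """Colour class of Figure 2.7; lower bounds are treated as inclusive (assumption)."""
    if volume_mm3 < 2e-6:
        return "below plotted range"
    if volume_mm3 < 1e-4:
        return "lighter blue"
    if volume_mm3 < 1e-3:
        return "darker blue"
    if volume_mm3 < 1e-2:
        return "red"
    return "green"


print(voxel_volume_mm3(1))                          # about 4.7e-7 mm^3
print([volume_class(v) for v in (5e-5, 5e-4, 5e-3, 2e-2)])
```

The smallest plotted class therefore starts at only about four voxels, which is consistent with treating even smaller objects as noise.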



Figure 2.6: CT images show voids in the solder layer (a) before and (b) after power cycling



Figure 2.7: Colour-coded void sizes (a) before and (b) after power cycling

There are several hundred identifiable voids distributed in the die-attach solder layer of each IGBT sample. The structure of each void can be inferred from its spherical area and volume acquired from the Avizo FEI software. If the voids are assumed to be perfect spheres, the radius of each void can be calculated from either the spherical area or the volume, and the two values should then be identical. As shown in Figure 2.8, the two radii are similar for small voids, suggesting that smaller voids are close to spherical, whereas larger voids are inferred to be ellipsoidal, with the radius calculated from the area being greater than that calculated from the volume. This is mostly because the larger voids reach the chip-solder boundary, as shown in Figure 2.9, given that the die-attach solder layer is only 120 µm thick.
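The equivalent-sphere radii follow directly from $A=4\pi r^2$ and $V=\tfrac{4}{3}\pi r^3$. By the isoperimetric inequality, a sphere has the smallest surface area for a given volume, so the area-based radius can only equal or exceed the volume-based one; their ratio is therefore a simple indicator of how far a void departs from a sphere. The flattened test shape below is hypothetical.

```python
import math


def radius_from_area(area):
    """Radius of the sphere with the same surface area: A = 4*pi*r^2."""
    return math.sqrt(area / (4.0 * math.pi))


def radius_from_volume(volume):
    """Radius of the sphere with the same volume: V = (4/3)*pi*r^3."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def shape_ratio(area, volume):
    """r_A / r_V: equals 1 for a sphere and exceeds 1 for any other shape."""
    return radius_from_area(area) / radius_from_volume(volume)


# A 20 um sphere versus a flattened void (hypothetical 60 x 60 x 10 um box-like shape).
r = 20.0
print(shape_ratio(4 * math.pi * r ** 2, 4 / 3 * math.pi * r ** 3))
print(shape_ratio(2 * (60 * 60 + 60 * 10 + 60 * 10), 60 * 60 * 10))
```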



Figure 2.8: Radii of voids in an IGBT die-attach solder layer calculated from area and volume




Figure 2.9: CT images of (a) small void and (b) large void

The radius calculated from the volume is used as the metric to analyse the voids in this chapter. As shown in Figure 2.10, the radii concentrate around 20 $\mu$m before power cycling, but this extends to around 30 $\mu$m afterwards. The growths of void volume and spherical area, expressed for the volume as $(V_{after}-V_{before})/V_{before}\times 100\%$, are shown in Figure 2.11. The negative growth rates in the measurements are attributed to scanning errors at limited resolution, which lead to inaccuracies in both the position and dimension measurements before and after power cycling. Almost every void has grown to some extent after 35,000 low amplitude power cycles. Table 2.1 shows that the smaller voids grow faster than the larger ones. From a thermomechanical point of view, a smaller void with a larger curvature induces a higher stress concentration in its surrounding area. However, as the void grows, its curvature decreases, attenuating the stresses. This may cause the growth rate of a void to decrease as its radius increases, which is investigated further by FEA simulation in Section 2.5.
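The growth metric and the size-grouped averaging behind Table 2.1 can be sketched as follows. The radius bin edges and void data are hypothetical; the bins actually used for Table 2.1 are not reproduced here.

```python
import numpy as np


def growth_percent(before, after):
    """(after - before) / before * 100 for each tracked void."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    return (after - before) / before * 100.0


def mean_growth_by_radius(radius_before, growth, bin_edges):
    """Average growth of voids whose initial radius falls in [edge_k, edge_k+1)."""
    radius_before = np.asarray(radius_before, dtype=float)
    growth = np.asarray(growth, dtype=float)
    idx = np.digitize(radius_before, bin_edges)
    return [float(growth[idx == k].mean()) if np.any(idx == k) else float("nan")
            for k in range(1, len(bin_edges))]


# Hypothetical voids: initial radii (um) and volumes before/after cycling (um^3).
radii = [12.0, 15.0, 25.0, 28.0]
v_before = [7000.0, 14000.0, 65000.0, 92000.0]
v_after = [14000.0, 21000.0, 78000.0, 101200.0]
g = growth_percent(v_before, v_after)
print(mean_growth_by_radius(radii, g, bin_edges=[10.0, 20.0, 30.0]))
```

Averaging within size groups smooths out the per-void scatter, including the negative values caused by scanning error.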



Figure 2.10: Distribution of void radii



Figure 2.11: Void growth after 35,000 low  $\Delta T_j$  cycles

| Average area<br>increase (%) | Average volume<br>increase (%) |
|------------------------------|--------------------------------|
| 243                          | 449.4                          |
| 73.9                         | 117.8                          |
| 38.5                         | 60.5                           |
| 28.1                         | 43.4                           |
| 8.1                          | 10.2                           |

Table 2.1: Average percentage growth of void area and volume

The analysis clearly shows that the void growth occurs under the low amplitude stress cycles, although this apparently does not affect the measurable junction-to-case thermal resistance, as indicated in Figure 2.5(b) if a crack has not been generated. It should be mentioned that the largest voids are likely to be adjacent to the silicon chip or the direct-bonded-copper (DBC) as the thickness of the solder layer is only 120  $\mu$ m. During the scanning process, noise reduction will cause the loss of some information near the boundary, leading to underestimating the size of the largest voids, as shown in the sectional scanning in Figure 2.9.

#### 2.3.2 Void growth and crack initialisation

The voids inside the die-attach solder layer gradually grow larger, as discussed above; one example is described here. To show the position of the void inside the solder and how its size changes, the 3D solder layer image is cut away vertically in the X and Y directions through the centre point of the void. The quarter-section views before and after power cycling are shown in Figure 2.12 (a) and (e), with the void boundary outlined in green and red, respectively. The sectional views in the three orthogonal coordinate planes are shown in Figure 2.12 (b)/(c)/(d) and Figure 2.12 (f)/(g)/(h), before and after power cycling, respectively. It can be seen that the internal void has indeed grown. Figure 2.13 shows the situation when the initial void is close to the chip-solder boundary. The green/red outlines again mark the chip-solder boundary to compare the size before and after power cycling. The power cycling caused the void to touch the chip, forming a crack on the solder boundary. The other panels show the change from different perspectives.


Figure 2.12: Development of an internal void, (a)/(b)/(c)/(d) before and (e)/(f)/(g)/(h) after power cycling




Figure 2.13: Development of a void close to chip-solder boundary, (a)/(b)/(c)/(d) before and (e)/(f)/(g)/(h) after power cycling

Next, a more critical case is considered, in which a void is already on the boundary and a crack is initially present. Such a small crack is shown in Figure 2.14. Comparing the green and red outlines, the crack develops towards the solder after power cycling, creating extra space between the chip and the solder layer. The distance between the green and red outlines is larger in the centre and decreases towards the edge, indicating that the crack develops from the centre towards the edge. This agrees with the analysis based on physics principles and will be used to extract a numerical model.



Figure 2.14: The development of a small crack on the chip-solder boundary, (a)/(b)/(c) before and (d)/(e)/(f) after power cycling

## 2.4 FEA MODEL

Finite element analysis (FEA) is used to study the thermomechanical behaviour involved in the fatigue process [83]. The model is validated and then used to investigate the stress around the solder defects. The material properties and dimensions are shown in Table 2.2. A cross-section of the test vehicle is represented by a 2D symmetrical model implemented in COMSOL Multiphysics, as shown in Figure 2.15.

| Parameters                                              | Chip    | Solder  | DBC Cu | DBC Al<sub>2</sub>O<sub>3</sub> | Baseplate |
|---------------------------------------------------------|---------|---------|--------|---------------------------------|-----------|
| Area (mm × mm)                                          | 6.8×7.2 | 6.8×7.2 | 13×13  | 14×14                           | 15×15     |
| Thickness (mm)                                          | 0.14    | 0.12    | 0.3    | 0.38                            | 3         |
| Coefficient of Thermal Expansion (10<sup>-6</sup>/K)    | 3       | 23      | 17     | 6.5                             | 17        |
| Young's Modulus (GPa)                                   | 162     | 40      | 110    | 400                             | 110       |
| Poisson's Ratio                                         | 0.28    | 0.4     | 0.35   | 0.22                            | 0.35      |
| Thermal Conductivity (W/(m·K))                          | 130     | 50      | 400    | 35                              | 400       |
| Specific Heat Capacity (J/(kg·K))                       | 700     | 150     | 385    | 730                             | 385       |

Table 2.2: Packaging material parameters for FEA modelling

This simplification has been verified to be more than 90% accurate compared with a 3D model for fatigue calculation [14, 19]. Because higher stresses are expected to concentrate around the edge of the solder layer at the solder-chip boundary, a finer mesh is adopted in this region (defined by points *A*, *B*, *C* and *D*) to evaluate the stress and strain with higher precision. The boundary conditions of the FEA model are set as follows: the whole chip is a heat source with power distributed homogeneously in the chip volume; the bottom of the DBC is fixed in normal displacement (roller fixed); a convective heat flux with a heat transfer coefficient of 5000 W/(m<sup>2</sup>·K) is set on the baseplate bottom to emulate the heat dissipation of forced water cooling, which gives a more realistic temperature distribution on the baseplate than a constant temperature condition; the left edge of the model is the symmetry axis; and the remaining open surfaces are free to move without constraints and thermally insulated, as shown in Figure 2.15.
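A quick sanity check on the layer data and the cooling boundary is a one-dimensional series-resistance estimate, sketched below with the values of Table 2.2. It ignores heat spreading, the second DBC copper layer and the baseplate solder (neither is listed separately in Table 2.2), and temperature dependence, so it is only an order-of-magnitude cross-check on the FEA, not a substitute for it.

```python
# Layer data from Table 2.2: thickness (mm), thermal conductivity (W/(m*K)), area (mm^2).
LAYERS = {
    "chip": (0.14, 130.0, 6.8 * 7.2),
    "solder": (0.12, 50.0, 6.8 * 7.2),
    "DBC Cu": (0.30, 400.0, 13.0 * 13.0),
    "DBC Al2O3": (0.38, 35.0, 14.0 * 14.0),
    "baseplate": (3.0, 400.0, 15.0 * 15.0),
}
H_CONV = 5000.0  # heat transfer coefficient on the baseplate bottom, W/(m^2*K)


def layer_resistance(thickness_mm, conductivity, area_mm2):
    """1D conduction resistance t / (k * A) in K/W."""
    return (thickness_mm * 1e-3) / (conductivity * area_mm2 * 1e-6)


def stack_resistances(layers=LAYERS, h_conv=H_CONV):
    """Per-layer conduction resistances plus the convective resistance 1 / (h * A)."""
    r = {name: layer_resistance(*props) for name, props in layers.items()}
    r["convection"] = 1.0 / (h_conv * layers["baseplate"][2] * 1e-6)
    return r


resistances = stack_resistances()
for name, value in resistances.items():
    print(f"{name:>10s}: {value:.4f} K/W")
print(f"{'total':>10s}: {sum(resistances.values()):.4f} K/W")
```

In this crude estimate the convective term on the small baseplate area dominates, while the solder layer is among the larger conduction terms, which is one reason its degradation is visible in the thermal resistance once a crack has formed.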

The solder layer is modelled using Anand's viscoplastic material model, which is widely used to evaluate stresses that depend on strain and temperature. It accounts for the physics of strain-rate sensitivity, strain hardening or softening, and crystalline texture and its evolution. The parameters of SAC305 solder in Anand's model are given in Table 2.3 [84], where $s_0$ is the initial deformation resistance, $Q/R$ the ratio of the activation energy to Boltzmann's constant, $A$ the pre-exponential factor, $\xi$ the stress multiplier, $m_0$ the strain rate sensitivity of stress, $\eta$ the strain rate sensitivity of the saturation value, $h_0$ the hardening/softening constant, $s$ the coefficient for the saturation value of deformation resistance and $a$ the strain rate sensitivity of the hardening/softening.



Figure 2.15: (a) 2D symmetrical model and the boundary conditions and (b) mesh around a void of 10  $\mu$ m radius and a crack of 2  $\mu$ m length

| Parameters  | Value               | Parameters                  | Value   |
|-------------|---------------------|-----------------------------|---------|
| $s_0$ (MPa) | 12.41               | $Q/R$ (K)                   | 9400    |
| $A$ (s$^{-1}$) | $4.1 \times 10^{6}$ | $\xi$                       | 1.5     |
| $m_0$       | 0.303               | $h_0$ (MPa)                 | 1378.95 |
| $s$ (MPa)   | 13.79               | $\eta$                      | 0.07    |
| $a$         | 1.3                 | -                           | -       |

Table 2.3: Parameters of Anand's model for SAC305 solder
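
To show how the constants in Table 2.3 interact, the sketch below evaluates the commonly used rate form of Anand's model for a uniaxial test at constant inelastic strain rate: flow rule $\dot\varepsilon_p = A e^{-Q/RT}[\sinh(\xi\sigma/s)]^{1/m_0}$, evolution $\dot s = h_0|1 - s/s^*|^{a}\,\mathrm{sign}(1 - s/s^*)\,\dot\varepsilon_p$ and saturation value $s^* = \hat{s}\,[(\dot\varepsilon_p/A)e^{Q/RT}]^{\eta}$. Here $s$ denotes the evolving deformation resistance and $\hat{s}$ is the saturation coefficient listed as $s$ in Table 2.3. The strain rate, temperatures and strain range are illustrative choices, the explicit Euler integration is a simplification, and an FEA solver's implementation will differ in detail.

```python
import math

# Anand parameters for SAC305 from Table 2.3 (units: MPa, K, 1/s)
S0 = 12.41          # initial deformation resistance s0 (MPa)
Q_OVER_R = 9400.0   # Q/R (K)
A = 4.1e6           # pre-exponential factor (1/s)
XI = 1.5            # stress multiplier
M = 0.303           # strain rate sensitivity of stress (m0)
H0 = 1378.95        # hardening/softening constant (MPa)
S_HAT = 13.79       # coefficient for the saturation value of s (MPa)
ETA = 0.07          # strain rate sensitivity of the saturation value
A_EXP = 1.3         # strain rate sensitivity of hardening/softening


def rate_term(strain_rate, temp_k):
    """Dimensionless (strain_rate / A) * exp(Q/RT)."""
    return strain_rate / A * math.exp(Q_OVER_R / temp_k)


def saturation_resistance(strain_rate, temp_k):
    """Saturation deformation resistance s* (MPa)."""
    return S_HAT * rate_term(strain_rate, temp_k) ** ETA


def flow_stress(s, strain_rate, temp_k):
    """Stress (MPa) from inverting the flow rule: sigma = (s/xi) * asinh(Z^m)."""
    return (s / XI) * math.asinh(rate_term(strain_rate, temp_k) ** M)


def constant_rate_stress(strain_rate, temp_k, total_strain, steps=20000):
    """Integrate ds/d(eps_p) with explicit Euler and return the final stress (MPa)."""
    s = S0
    s_star = saturation_resistance(strain_rate, temp_k)
    d_eps = total_strain / steps
    for _ in range(steps):
        b = 1.0 - s / s_star
        s += H0 * abs(b) ** A_EXP * math.copysign(1.0, b) * d_eps
    return flow_stress(s, strain_rate, temp_k)


T_MEAN = 68.0 + 273.15   # K, mean junction temperature used in this chapter
RATE = 1e-4              # 1/s, illustrative inelastic strain rate
print(f"initial stress   : {flow_stress(S0, RATE, T_MEAN):.2f} MPa")
print(f"after 5% strain  : {constant_rate_stress(RATE, T_MEAN, 0.05):.2f} MPa")
print(f"saturation stress: {flow_stress(saturation_resistance(RATE, T_MEAN), RATE, T_MEAN):.2f} MPa")
```

Raising the temperature in the example lowers both $s^*$ and the flow stress, which is the temperature softening the model is meant to capture.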

To evaluate the fatigue behaviour of the solder material and estimate its lifetime, a physics-based Coffin-Manson failure model [85-87] is combined with the FEA. The relationship between the fatigue inelastic strain per cycle ($\Delta \varepsilon_{in}$) and the number of cycles to failure (*N<sub>f</sub>*) is

$$\Delta \varepsilon_{in} N_f^{\alpha} = C \tag{2.1}$$

where $\alpha$ and *C* are material constants of the solder, taken as 0.6659 and 2.8530, respectively, in this study [88]. The growth of a void or crack is simulated by removing the FEA mesh elements at points where material failure has been predicted. The elements are dynamically ranked according to their accumulated lifetime consumption. The fatigue inelastic strain extracted from the FEA determines the number of cycles to failure through Equation (2.1), which in turn gives the fraction of lifetime consumed by the present simulated cycle.
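
Equation (2.1) can be inverted to $N_f = (C/\Delta\varepsilon_{in})^{1/\alpha}$, so each cycle consumes a lifetime fraction $1/N_f$ of an element. The sketch below illustrates the ranking-and-removal idea with linear (Miner-type) damage accumulation: it jumps forward to the next element failure and reports which element is removed. The strain values are hypothetical, and, unlike the FEA procedure, the strains are held fixed between removals instead of being recomputed after each mesh change.

```python
ALPHA = 0.6659  # Coffin-Manson exponent for the solder, Eq. (2.1)
C = 2.8530      # Coffin-Manson constant, Eq. (2.1)


def cycles_to_failure(delta_eps_in):
    """Invert Eq. (2.1): N_f = (C / delta_eps_in) ** (1 / alpha)."""
    return (C / delta_eps_in) ** (1.0 / ALPHA)


def advance_to_next_failure(damage, strains):
    """Advance all surviving elements to the next element failure.

    damage[i]  : lifetime fraction already consumed by element i (fails at 1.0)
    strains[i] : fatigue inelastic strain per cycle of element i
    Returns (index of the failed element, number of cycles elapsed).
    """
    alive = [i for i, d in enumerate(damage) if d < 1.0]
    per_cycle = {i: 1.0 / cycles_to_failure(strains[i]) for i in alive}
    remaining = {i: (1.0 - damage[i]) / per_cycle[i] for i in alive}
    failed = min(remaining, key=remaining.get)   # least remaining life fails first
    cycles = remaining[failed]
    for i in alive:
        damage[i] += per_cycle[i] * cycles
    damage[failed] = 1.0                          # element removed from the mesh
    return failed, cycles


# Hypothetical per-cycle strains of three elements near a void edge
damage = [0.0, 0.0, 0.0]
strains = [0.02, 0.01, 0.001]
first, n1 = advance_to_next_failure(damage, strains)
second, n2 = advance_to_next_failure(damage, strains)
print(first, round(n1), second, round(n2))
```

In the full procedure, the FEA would be rerun after each removal so that the strains around the enlarged defect are updated before the next ranking.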

The mean value of the temperature on the chip's top surface is taken as the junction temperature. The junction-to-case transient thermal impedance is then calculated. For an unaged module, the results agree with the IGBT transient thermal impedance extracted from the datasheet as shown in Figure 2.16.



Figure 2.16: Transient thermal impedance from simulation model and product datasheet
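
The transient thermal impedance follows from a power step as $Z_{th,j-c}(t) = (T_j(t) - T_c(t))/P$, and datasheets usually describe it with a Foster network, $Z_{th}(t) = \sum_i R_i (1 - e^{-t/\tau_i})$. The sketch below computes $Z_{th}$ from sampled step-response temperatures and the root-mean-square deviation from a Foster curve, one simple way to quantify agreement such as that in Figure 2.16. The Foster coefficients and the synthetic "simulated" response are hypothetical placeholders, not datasheet values.

```python
import math


def foster_zth(t, r, tau):
    """Foster-network impedance sum_i R_i * (1 - exp(-t / tau_i)) in K/W."""
    return sum(ri * (1.0 - math.exp(-t / ti)) for ri, ti in zip(r, tau))


def zth_from_step(tj, tc, power):
    """Zth(t) = (Tj(t) - Tc(t)) / P for a power step of height `power` (W)."""
    return [(a - b) / power for a, b in zip(tj, tc)]


def rms_deviation(z_model, z_ref):
    """Root-mean-square difference between two impedance curves (K/W)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_model, z_ref)) / len(z_ref))


# Hypothetical 4-term Foster coefficients (K/W, s)
R = [0.05, 0.15, 0.25, 0.10]
TAU = [0.001, 0.01, 0.05, 0.2]
times = [10 ** (k / 10) for k in range(-40, 11)]            # 1e-4 s to 10 s, log-spaced
P_STEP = 10.0                                                # W
tc = [25.0] * len(times)                                     # idealised constant case temperature
tj = [25.0 + P_STEP * foster_zth(t, R, TAU) for t in times]  # stand-in for FEA output
z_sim = zth_from_step(tj, tc, P_STEP)
z_ds = [foster_zth(t, R, TAU) for t in times]
print(f"steady-state Zth = {z_sim[-1]:.3f} K/W, "
      f"RMS deviation = {rms_deviation(z_sim, z_ds):.2e} K/W")
```

With real data, `tj` and `tc` would come from the simulated chip-top and case temperatures and `z_ds` from the datasheet Foster coefficients.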

## 2.5 VOID DEFECTS

The CT scanning results have shown that solder voids and cracks are the two types of initial defects, which can be differentiated by their shape, as mentioned in Section 2.1. However, a void may develop into a crack. This development is investigated here by simulating two scenarios: a void inside the solder layer growing towards the border with the chip, and the growth of a small void already located at the boundary.

#### 2.5.1 Internal void

According to CT scanning, the initial void radii range from 7.8 µm to over 60 µm and are mainly concentrated around 20 µm. To investigate the growth of a void under low-amplitude stress cycles, the stress distribution around voids with radii of 2, 5, 10, 20, 30, 40 and 50 µm is analysed in this section. A pulsating power with an amplitude of $1.4 \times 10^9$ W/m<sup>3</sup>, a frequency of 0.1 Hz and a mark-space ratio of 50% is applied to the 1200V/50A chip, corresponding to a power cycling condition with $\Delta T_j = 17.5$ °C and $T_{j,mean} = 68$ °C.
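
The load is a volumetric heat source switched on and off. The sketch below converts the quoted amplitude into chip power and evaluates the square wave. It assumes that the chip dimensions are those in the first column of Table 2.2 (6.8 mm × 7.2 mm × 0.14 mm), which is an inference, and that each 10 s period starts with the on-phase.

```python
Q_VOL = 1.4e9                              # W/m^3, power density amplitude
FREQ = 0.1                                 # Hz
DUTY = 0.5                                 # mark-space ratio of 50 %
CHIP_VOLUME = 6.8e-3 * 7.2e-3 * 0.14e-3    # m^3, assumed chip dimensions


def chip_power(t):
    """Square-wave chip power (W): on during the first DUTY fraction of each period."""
    phase = (t * FREQ) % 1.0
    return Q_VOL * CHIP_VOLUME if phase < DUTY else 0.0


print(f"peak power per chip: {Q_VOL * CHIP_VOLUME:.2f} W")
print([chip_power(t) for t in (1.0, 4.9, 5.1, 9.9, 11.0)])
```

Under these assumptions the peak chip power is about 9.6 W, i.e. about 4.8 W averaged over a period.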

The von Mises stress around a 20 µm radius void reaches its maximum at the highest temperature during power cycling, as shown in Figure 2.17 (a). The stress concentrates on the top and bottom edges of the void and is higher at the top edge. After about ten cycles, the high stress at the top edge accumulates enough inelastic strain to fail the local solder material, as shown in Figure 2.17 (b). The void is therefore likely to grow vertically and may eventually reach the chip-solder border.



Figure 2.17: (a) stress and (b) accumulated inelastic strain distribution around a 20  $\mu m$  radius void

The maximum stress and the fatigue inelastic strain accumulated in one cycle are shown in Figure 2.18 for different initial void sizes. A void with a 10 µm radius suffers the largest stress and accumulates the most inelastic strain. Although a smaller round void has a smaller radius of curvature, its smaller interior space constrains its deformation, leading to lower stress and strain accumulation. In larger voids the trade-off is reversed: there is more room to deform, but the larger radius of curvature decreases the stress concentration. With both effects considered, voids of 10 µm radius have the most critical combination of curvature and internal space and therefore the highest growth rate in power cycling. It must be pointed out that, as shown in Figure 2.17, the growth of an initial void may change its shape; as a result, the maximum local curvature decreases more slowly than an equivalent radius would imply.



Figure 2.18: Stress and fatigue inelastic strain on the top edge of voids

#### 2.5.2 Void at solder-to-chip boundary

An internal void can grow enough to reach the solder boundary with the chip. In addition, some initial voids may be close to or already on the boundary. This scenario, a void attached to the solder-chip boundary, is investigated in this section.

The position of an attached void is characterised by the angle between the tangent line of the void and the chip-solder borderline. A 10 µm radius void attached at 60° is shown as an example in Figure 2.19; as shown below, this angle gives the most critical fatigue condition with respect to crack formation. Although the bottom side has a larger area of stress concentration, the most stressed point is the corner, zoomed in in Figure 2.19 (b). The high stress causes inelastic strain to accumulate at the corner, concentrated within about 1 µm. This causes a crack to grow from here towards the centre of the solder layer, as shown in Figure 2.20.

The maximum stress and fatigue inelastic strain at the corner of voids with 10 µm and 30 µm radii attached at different angular positions are shown in Figure 2.21. The highest stress and fatigue strain occur at an angle of $60^{\circ}$. It should be mentioned that the 30 µm void always causes higher fatigue because it has more space to deform.



Figure 2.19: Stress distribution (a) overall and (b) at the corner when a void of 10µm radius touches the chip-solder boundary



Figure 2.20: Accumulated inelastic strain distribution (a) overall and (b) at the corner when a void of 10µm radius touches the solder-chip boundary



Figure 2.21: (a) maximum stress and (b) fatigue inelastic strain for voids with different attached angles

## 2.6 CRACK DEFECTS

#### 2.6.1 Crack initialisation

The crack initialisation is first analysed for a die-attach solder layer without any imperfection undergoing power cycling in the simulation. At the instant of maximum junction temperature, the stress distribution in the area of concern (defined in Figure 2.15) is shown in Figure 2.22. It can be seen that the stress concentrates at the corner and along the chip-solder interface, resulting in this area suffering from a higher level of fatigue inelastic strain. Subsequently, a crack could be initiated from the corner.



Figure 2.22: (a) stress distribution and (b) accumulated inelastic strain distribution in the solder layer without defects

Assuming that an initial crack is present on the solder-chip boundary, its further growth is examined here. The thermomechanical behaviour is first analysed for an initial crack at the edge of the solder-chip boundary. The width and length of this initial crack are assumed to be 10 µm and 20 µm, respectively, as shown in Figure 2.23 (a), which also shows the instantaneous stress distribution at the maximum junction temperature. The stress is concentrated at the crack tip and is 2.4 times higher than the stress at the corner of the solder edge without a crack. Figure 2.23 (b) shows the fatigue inelastic strain accumulated near the crack tip after 10 cycles, which will lead to further material failure. The crack will then grow along the boundary from the edge towards the centre of the solder layer.



Figure 2.23: (a) stress distribution and (b) accumulated inelastic strain distribution around a crack

An initial crack may also occur at other positions at the end of the solder layer. If its distance h from the solder-chip boundary is small enough, the solder material above the crack tip will follow the mechanical response of the silicon chip, as shown in Figure 2.24. This growth process has been reported in a previous study [89]. The stress concentration is again clearly identified at the tip of the initial crack.



Figure 2.24: (a) stress distribution and (b) accumulated inelastic strain around an initial crack 10 µm away from the boundary

#### 2.6.2 Crack growth

As the crack grows, the junction-to-case thermal resistance increases, resulting in higher fatigue accumulation around the crack tip under the same power loss profile. The relationship between crack length and thermal resistance is determined in the simulation model by adding cracks of different lengths along the solder-chip boundary and computing the resulting thermal resistance. As the FEA model is 2D symmetrical, the crack is simulated as growing from the left and right edges towards the centre of the solder layer. Here, the crack length ratio is defined as the ratio of the crack length on each side to half the length of the die-attach solder layer. The thermal resistance increases with the crack length, as shown in Figure 2.25. This also contributes to the progressively increasing stress concentration and inelastic strain accumulated at the crack tip.



Figure 2.25: The relationship between thermal resistance and crack length

Figure 2.26 shows how the maximum stress and fatigue inelastic strain at the crack tip change with crack length, and also marks the corresponding increase in thermal resistance. Without an initial solder layer defect, the fatigue stress is low. With an initial crack, the stress concentrates at the tip, and this concentration causes a fatigue inelastic strain that rises as the crack grows. Once the crack is long enough, the junction-to-case thermal resistance increases, leading to a higher junction temperature that further intensifies the stress concentration. The crack therefore tends to grow at a progressively increasing rate, which is also observed in the power cycling experiments. This clearly distinguishes a crack from an internal void in the solder layer, which has a negligible effect on the thermal resistance.

When the crack develops very close to the boundary, e.g. $h=1 \mu m$, the stress concentration and the fatigue inelastic strain are similar to those of a crack along the boundary. When the crack is further away from the boundary, e.g. h=10 or 50 µm, the stress still increases with the crack length but does not become critical for power module reliability. Cracks initiated at the corner and growing along the chip-solder boundary are therefore the most severe cases; the time taken for this process under in-service conditions is estimated later to evaluate the device lifetime.



Figure 2.26: Stress and fatigue inelastic strain on the tip of crack with different lengths

## 2.7 EXPERIMENTAL VALIDATION

For voids, the stress cannot be measured directly in experiments. Therefore, to validate the physics-of-failure model, four IGBTs are power cycled for 35,000 cycles under $\Delta T_j=17.5$ °C and $T_{j,mean}=68$ °C, the same conditions as in the previous experiment and simulation. The die-attach solder is scanned, and the void radii are measured before and after power cycling as described in Section 2.3. In both experiment and simulation, the radius of a void is defined as the distance from its geometric centre to its furthest edge. The void radius growth, i.e. the increase in void radius (in µm) over the 35,000 cycles, is presented in Figure 2.27 and shows good correspondence between measurement and simulation.



Figure 2.27: The growth rate of voids with different initial sizes
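
The radius definition above (distance from the geometric centre to the furthest edge) can be applied to a segmented CT slice as sketched below. The input format, a list of pixel indices belonging to one void, and the pixel size are hypothetical; the furthest pixel centre stands in for the furthest edge, so the result may underestimate the true radius by a fraction of a pixel.

```python
import math


def void_radius(pixels, pixel_size_um):
    """Distance (um) from the void centroid to its furthest pixel centre.

    pixels: iterable of (row, col) indices of pixels classified as void.
    """
    pts = list(pixels)
    cy = sum(p[0] for p in pts) / len(pts)   # geometric centre (row)
    cx = sum(p[1] for p in pts) / len(pts)   # geometric centre (column)
    far = max(math.hypot(r - cy, c - cx) for r, c in pts)
    return far * pixel_size_um


def radius_growth(before, after, pixel_size_um):
    """Increase in radius (um) of one void between two scans."""
    return void_radius(after, pixel_size_um) - void_radius(before, pixel_size_um)


# Hypothetical 3x3 void that grows into a 5x5 void (2 um pixels)
before = [(r, c) for r in range(3) for c in range(3)]
after = [(r, c) for r in range(-1, 4) for c in range(-1, 4)]
print(void_radius(before, 2.0), radius_growth(before, after, 2.0))
```

Matching each void between the two scans is assumed to have been done beforehand; the sketch only handles one matched pair.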

The physics-of-failure model used to estimate the crack growth rate, the core of which is the calculation of the inelastic strain, can then be validated by measuring the change of the junction-to-case thermal resistance. This change indicates the average crack growth in each stress cycle, a resolution that CT scanning cannot provide. Power cycling tests are conducted on two modules of the same type with different junction temperature profiles to cause different crack growth rates, as introduced in Section 2.2.2. Both modules are first subjected to unrealistically high stresses to produce measurable initial cracks, which cause a slight rise in thermal resistance. In the later stages, the devices are subjected to smaller temperature fluctuations similar to those in actual operation, so that the crack grows gradually and its growth can be monitored. The results extracted from these tests comprise seven stages in total. The monitored thermal resistance of the first module is shown in Figure 2.28. The crack growth rate is defined as the number of cycles needed to cause 1 µm of crack growth in the power cycling experiment, given in the "Number of cycles" column of Table 2.4. The fatigue inelastic strain at the crack tip for each stage is calculated in the simulation and given in the last column of Table 2.4.



Figure 2.28: Multi-stage power cycling test

The number of cycles necessary to cause 1  $\mu$ m crack growth in the power cycling test is plotted against the corresponding fatigue inelastic strain from FEA simulation in Figure 2.29 (marked as "O"). Figure 2.29 also shows the physics-of-failure modelling results for voids and cracks (marked as "\*" and " $\Delta$ " respectively). They are further compared with the results of accelerated lifetime tests of the same solder material [88, 90]. Although the overall stress condition in the analysis is relatively mild in terms of  $\Delta T_j=17.5$  °C and  $T_{j,mean}=68$  °C, stress concentration can raise the fatigue inelastic strain at local areas around voids and cracks to a similar level of fatigue as in the accelerated test of the material. This is the basis of model validation, and a good correlation is observed in Figure 2.29.

| Power module | Substage | ∆T<sub>j</sub> (°C) | T<sub>j,mean</sub> (°C) | Initial R<sub>th,j-c</sub> | Initial crack length (mm) | Number of cycles (per 1 µm) | ∆inelastic strain obtained from FEA model |
|--------------|----------|---------------------|-------------------------|----------------------------|---------------------------|-----------------------------|-------------------------------------------|
| I            | 2        | 40                  | 74                      | 1.06                       | 0.41                      | 1330                        | 1.82×10<sup>-2</sup>                      |
| I            | 3        | 28                  | 54                      | 1.2                        | 0.78                      | 3985                        | 1.00×10<sup>-2</sup>                      |
| I            | 4        | 34                  | 58                      | 1.24                       | 0.85                      | 1850                        | 1.59×10<sup>-2</sup>                      |
| II           | 2        | 40                  | 74                      | 1.17                       | 0.72                      | 1154                        | 2.00×10<sup>-2</sup>                      |
| II           | 3        | 32.8                | 56                      | 1.39                       | 1.08                      | 2110                        | 1.45×10<sup>-2</sup>                      |
| II           | 4        | 40.7                | 74                      | 1.41                       | 1.10                      | 502                         | 5.81×10<sup>-2</sup>                      |
| II           | 5        | 41                  | 60.5                    | 1.8                        | 1.50                      | 568                         | 3.01×10<sup>-2</sup>                      |

Table 2.4: Measured crack growth rate results and computed fatigue strain
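
The trend in Table 2.4 can be summarised by a least-squares power-law fit, $N_{\mu m} = K\,\Delta\varepsilon_{in}^{\,b}$, in log-log space. The sketch below is an illustrative post-processing of the tabulated values, not part of the original analysis. Note that the number of cycles per micrometre of crack growth is not the same quantity as $N_f$ in Equation (2.1), so the fitted exponent need not equal $-1/\alpha$.

```python
import math

# (delta inelastic strain from FEA, cycles per 1 um of crack growth), Table 2.4
DATA = [
    (1.82e-2, 1330), (1.00e-2, 3985), (1.59e-2, 1850),                  # module I
    (2.00e-2, 1154), (1.45e-2, 2110), (5.81e-2, 502), (3.01e-2, 568),   # module II
]


def power_law_fit(data):
    """Fit log10(N) = log10(K) + slope * log10(strain); return (K, slope, r^2)."""
    xs = [math.log10(e) for e, _ in data]
    ys = [math.log10(n) for _, n in data]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    k = 10 ** (my - slope * mx)
    return k, slope, sxy ** 2 / (sxx * syy)


k, slope, r2 = power_law_fit(DATA)
print(f"N_per_um ~ {k:.3g} * strain^({slope:.2f}),  r^2 = {r2:.2f}")
```

With the tabulated values, the fitted exponent comes out at roughly -1.2, with most of the variance explained by the fit.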



Figure 2.29: Solder material lifetime results from physics-of-failure modelling and experiments

## 2.8 LONG-TERM LIFETIME ESTIMATION

The lifetime of an IGBT module in service can now be evaluated based on the modelling of the solder layer with initial defects. Under the low thermomechanical stresses of this power cycling, the crack initiation process appears to take up the major part of the total lifetime, assuming that there is no initial crack and that all initial voids are within the solder layer. In accordance with production standards, power devices are often claimed to have an operating lifetime of more than two decades. Because this estimate is based on a specific power cycling condition, it should be re-evaluated for the practical application in which the device is used. A typical soft open point (SOP) converter designed for an 11kV grid is shown in Figure 2.30. It is installed in place of normally-open points in electrical power distribution networks to provide active power flow control, reactive power compensation, voltage regulation and fast fault isolation [91]. The operating condition of the power modules in this B2B-VSC (back-to-back voltage source converter) is determined by the load conditions on Feeders 1 and 2. When the power through the B2B-VSC fluctuates, the junction temperature of the power modules swings accordingly. The loading of the modelled Irish distribution network fluctuates 4 times an hour, as recorded in [92]. Depending on the cooling design, the junction temperature profile is assumed to remain $\Delta T_j=17.5$ °C and $T_{j,mean}=68$ °C. It is also assumed that the same modules as before are paralleled to give the required power rating. Given the fluctuation frequency, each ageing cycle then corresponds to 15 minutes of service.



Figure 2.30: Diagram of a typical 11kV grid with SOP
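
With four load fluctuations per hour, converting between ageing cycles and service time is simple arithmetic. The helper below assumes continuous year-round operation (8760 h per year) with the same temperature profile in every cycle, which is the simplification stated in the text.

```python
CYCLES_PER_HOUR = 4                           # load fluctuations per hour [92]
HOURS_PER_YEAR = 8760                         # assumes continuous operation
CYCLES_PER_YEAR = CYCLES_PER_HOUR * HOURS_PER_YEAR


def cycles_to_years(n_cycles):
    """Service years represented by n_cycles ageing cycles."""
    return n_cycles / CYCLES_PER_YEAR


def years_to_cycles(years):
    """Ageing cycles accumulated over the given number of service years."""
    return years * CYCLES_PER_YEAR


print(f"{CYCLES_PER_YEAR} cycles per year")
print(f"35,000 cycles ~ {cycles_to_years(35_000):.2f} years in service")
```

In terms of cycle count, the 35,000-cycle void test corresponds to almost exactly one year of service under this profile.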

Based on the analysis of the fatigue behaviour of a single void, the time for a void to grow from 2 µm to 60 µm radius in 1 µm steps is calculated with the FEA model, as shown in Figure 2.31. The fatigue inelastic strain is again evaluated for the junction temperature profile $T_{j,mean}$=68°C and $\Delta T_j$=17.5°C, occurring four times per hour. This is an optimistic estimate, as the void tends to develop vertically and so reaches the chip-solder boundary more quickly, forming a growing crack. Depending on its initial position, a void may become critical at a radius smaller than 60 µm. Nevertheless, Figure 2.31 provides a tool to estimate the effect of an initial void.



Figure 2.31: The growth of a single void
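
Growing a void in 1 µm steps amounts to summing the cycles needed for each step and converting the total to service time at four cycles per hour. In the sketch, `cycles_per_step` is a hypothetical callable standing in for the FEA result at each radius (for example, the Coffin-Manson life of the most strained element), and the constant value in the usage line is a placeholder only.

```python
from typing import Callable

CYCLES_PER_YEAR = 4 * 8760   # four load cycles per hour, continuous operation


def growth_time_years(r_start_um: int, r_end_um: int,
                      cycles_per_step: Callable[[int], float]) -> float:
    """Years needed to grow a void radius from r_start to r_end in 1 um steps."""
    total_cycles = sum(cycles_per_step(r) for r in range(r_start_um, r_end_um))
    return total_cycles / CYCLES_PER_YEAR


# Placeholder: 1000 cycles per micrometre at every radius (not a thesis result)
print(f"{growth_time_years(2, 60, lambda r: 1000.0):.2f} years")
```

Passing a radius-dependent callable lets the same helper reproduce a curve like Figure 2.31 once per-step cycle counts are available from the FEA.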

Once a crack has formed on the solder boundary, it grows at a progressively faster rate, because the fatigue inelastic strain increases with the crack length. Figure 2.32 shows the resulting increase of $R_{th,j-c}$. The time from the initial crack to device failure (normalised $R_{th,j-c}$ reaching 1.2) can be under 25 years. Three conditions for forming an initial crack are considered. Firstly, in a hypothetical solder layer without any defects, the crack is generated at the edge of the solder layer and then grows towards the centre; it takes about ninety years to initiate and become critical. Secondly, a single void located in the middle of the solder layer takes five decades to reach the solder-chip boundary and then forms a crack that grows towards the centre. Thirdly, a void already attached to the boundary can easily generate a crack, as the void corners are highly stressed. A solder layer with a larger attached void has a shorter lifetime because there is greater stress concentration and more deformation space for inelastic strain to accumulate around the tip of the initiated crack.



Figure 2.32: Power device lifetime estimation considering solder layer defects
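
The end-of-life criterion used here (normalised $R_{th,j-c}$ reaching 1.2) can be applied to any simulated or monitored resistance history, as sketched below with linear interpolation between samples. Normalising by the first sample is an assumption about how the initial value is defined, and the data in the usage line are made up.

```python
def time_to_threshold(times, rth, threshold=1.2):
    """First time at which rth / rth[0] reaches `threshold` (linear interpolation).

    Returns None if the threshold is never reached.
    """
    norm = [r / rth[0] for r in rth]
    for k in range(1, len(norm)):
        if norm[k] >= threshold:
            t0, t1 = times[k - 1], times[k]
            n0, n1 = norm[k - 1], norm[k]
            return t0 + (threshold - n0) * (t1 - t0) / (n1 - n0)
    return None


# Illustrative history: years vs. junction-to-case thermal resistance (K/W)
print(time_to_threshold([0, 10, 20], [0.05, 0.055, 0.065]))
```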

## 2.9 SUMMARY

This chapter analyses the growth of two kinds of solder defects in an IGBT power module, voids and cracks, and provides guidelines for estimating the module lifetime under practical operating conditions in which the stress cycles have relatively low amplitudes from the beginning. The defects are detected and measured by CT scanning before and after power cycling. An FEA model is established to evaluate the stress concentration and fatigue inelastic strain distribution in the solder layer, and with its aid the mechanisms and characteristics of defect growth are investigated. Experiments validate the key results, i.e. the fatigue inelastic strain calculation and the subsequent lifetime modelling. Although the device experiences low-amplitude temperature cycling in normal operation, solder material near voids and cracks still suffers from high fatigue inelastic strain due to stress concentration. Voids attached to the solder-chip boundary are found to have a critical effect on the reliability of the power module because they lead to crack formation. These findings may help to improve the packaging process in future manufacturing. The study shows that a complete lifetime evaluation model should include the initial distribution of the defects and the stress characteristics of the application. Although no single model fits all cases, this work aims to improve the understanding of packaging defect growth and to help engineers make informed estimates when designing systems with power modules.

# 3 LIFETIME AND UNEVEN DEGRADATION OF MULTI-CHIP POWER MODULES IN WIND TURBINES

## 3.1 INTRODUCTION

The previous chapter analyses the initial solder degradation of single-chip power modules under realistic stress conditions. This chapter then investigates the lifetime of high-power modules under the operating conditions of wind turbines and the uneven degradation of the multi-chip power module.

As more wind turbines are deployed offshore, where access for maintenance is more restricted, the demand for reliability is increasing. This in turn requires a more accurate understanding of the reliability characteristics of the power electronic converters in modern wind turbines. The annual average failure rate of a wind power converter, approximately 0.6/yr, is one of the highest among all components of a modern variable-speed wind turbine system [93]. This is partly because the power semiconductor devices can fail under persistent thermomechanical fatigue stresses when the turbine works in complex operational and environmental conditions [94]. The two main commercial wind turbine types, based on a doubly fed induction generator (DFIG) or a permanent magnet synchronous generator (PMSG), operate over a wide wind speed range of about 4 m/s to 20 m/s [95]. These conditions cause the machine-side power converter to operate over a wide frequency range. Under the temperature variations at the fundamental frequency of converter operation, the mismatch of coefficients of thermal expansion (CTE) between materials in the power module packaging causes thermomechanical fatigue stresses, leading to the most common device failure: solder layer degradation [2]. To develop maintenance strategies that reduce cost, it is necessary to determine the lifetime consumption of the power converter over the whole wind speed range.

The junction temperature of a power semiconductor device is the key indicator widely used for reliability evaluation and lifetime estimation [96]. An effective and practical way to extract the junction temperature is to solve the thermal network of the power module for the power losses it dissipates under given operating conditions. In this chapter, the electrical characteristics of a multi-chip power module are introduced, and the module is modelled by FEA. Then, two electrothermal models, for wind turbines with fully and partially rated converters, are established to investigate the junction temperature profiles of the power devices over the whole wind speed range. The lifetime consumption of the power devices is estimated using an analytical lifetime model. Moreover, the uneven degradation among paralleled chips in the high-power module is evaluated under long-term operating conditions.
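
A common way to solve such a thermal network is to represent $Z_{th}$ by a Foster network and update each RC element exactly over a time step with piecewise-constant power loss: $\theta_i^{k+1} = \theta_i^{k} e^{-\Delta t/\tau_i} + R_i P^k (1 - e^{-\Delta t/\tau_i})$, with $T_j = T_{ref} + \sum_i \theta_i$. The sketch below implements this update. The coefficients, the reference temperature and the loss profile are hypothetical, the reference temperature is held constant, and the temperature dependence of the losses (Equations (3.1)-(3.4) in Section 3.2.1) is not fed back.

```python
import math


def junction_temperature(power, dt, r, tau, t_ref):
    """Junction temperature trace from a Foster network (exact RC update per step).

    power : sequence of losses (W), each held constant over a step of length dt (s)
    r, tau: Foster resistances (K/W) and time constants (s)
    t_ref : reference (case or coolant) temperature in deg C, assumed constant
    """
    theta = [0.0] * len(r)                      # temperature rise of each RC element
    decay = [math.exp(-dt / ti) for ti in tau]
    tj = []
    for p in power:
        theta = [th * dcy + ri * p * (1.0 - dcy)
                 for th, dcy, ri in zip(theta, decay, r)]
        tj.append(t_ref + sum(theta))
    return tj


# Hypothetical 2-element network and a 1 kW loss step held for 100 s
R, TAU = [0.01, 0.02], [0.01, 0.1]
trace = junction_temperature([1000.0] * 10000, 0.01, R, TAU, t_ref=35.0)
print(f"Tj after 100 s: {trace[-1]:.2f} degC")
```

Because the update is exact for piecewise-constant power, the step size only limits how finely the loss profile is resolved, not the stability of the solution.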

## 3.2 MULTI-CHIP POWER MODULE

The wind turbine converters considered here are all two-level three-phase back-to-back (B2B) converters built from the Infineon half-bridge power module FF1000R17IE4 (1700V/1000A). The module has six chips in parallel for each IGBT switch and each freewheeling diode. Two diode and two IGBT dies are soldered with SAC305 onto a direct bonded copper (DBC, Cu/Al<sub>2</sub>O<sub>3</sub>/Cu) board, forming a half-bridge section. Six such sections are soldered onto a common Cu baseplate and connected in parallel to make the module, as shown in Figure 3.1.





Figure 3.1: Multi-chip power module (a) with packaging and (b) opened module

#### 3.2.1 Device electrical characteristics

Figure 3.2 and Figure 3.3 show the output and switching characteristics of the IGBT and the forward and reverse recovery characteristics of the diode under different junction temperatures. These are extracted from the product datasheets. As the wind speed influences the working condition of the B2B power converter, these electrical characteristics of the power devices are extracted for electro-thermal modelling of the wind turbines.



Figure 3.2: (a) Output and (b) switching characteristics of the IGBT at different junction temperatures



Figure 3.3: (a) Forward characteristic and (b) reverse recovery energy of the diode at different junction temperatures

The conduction and switching losses of IGBTs and diodes cause the junction temperature variations in power modules. Using the curves at different temperatures extracted from the output characteristic of the IGBT and the forward characteristic of the diode [97], the conduction losses ($P_{cond,S}$ and $P_{cond,D}$) in each switching period are

$$P_{cond,S} = \left(R_{on,S}i_c^2 + V_{ce}i_c\right)d\tag{3.1}$$

$$P_{cond,D} = \left(R_{on,D}i_{D}^{2} + V_{F}i_{D}\right)\left(1 - d\right)\tag{3.2}$$

where $R_{on,S}$ and $R_{on,D}$ are the on-state resistances of the IGBT and the diode, respectively, $V_{ce}$ is the collector-emitter threshold voltage of the IGBT, $V_F$ is the turn-on voltage of the diode, $i_c$ is the converter output phase current, $i_D$ is the diode current and $d$ is the switch duty cycle. The effect of the power factor on the calculated losses is included through the duty cycle, which represents how the current is shared between the IGBT and the diode. The on-state resistances depend on the junction temperature. The turn-on and turn-off losses of an IGBT ($P_{ton}$ and $P_{toff}$) can then be expressed, respectively, as

$$P_{ton} = \left(a_{on}i_{c}^{3} + b_{on}i_{c}^{2} + c_{on}i_{c}\right)\frac{V_{dc}}{V_{rated}}f_{sw}\rho_{ton}\tag{3.3}$$

$$P_{toff} = \left(a_{off}i_{c}^{3} + b_{off}i_{c}^{2} + c_{off}i_{c}\right)\frac{V_{dc}}{V_{rated}}f_{sw}\rho_{toff}\tag{3.4}$$

where $a_{on}$, $b_{on}$, $c_{on}$ and $a_{off}$, $b_{off}$, $c_{off}$ are fitting constants from the datasheet; $V_{dc}$ is the DC-link voltage and $V_{rated}$ is the rated voltage in the datasheet; $f_{sw}$ is the switching frequency; and $\rho_{ton}$, $\rho_{toff}$ are temperature dependency coefficients. The reverse recovery loss of a diode is calculated in the same way as for an IGBT, except that the bracketed polynomial is fitted to the reverse recovery energy given in the product datasheet.
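
Equations (3.1)-(3.4) translate directly into code. The sketch below evaluates them for one switching period; every numerical coefficient in the usage lines is a hypothetical placeholder rather than a value fitted to the FF1000R17IE4 datasheet, and temperature dependence enters only through the arguments passed in.

```python
from dataclasses import dataclass


@dataclass
class SwitchingFit:
    """Cubic fit of switching energy per event, E(i) = a*i^3 + b*i^2 + c*i (J)."""
    a: float
    b: float
    c: float

    def energy(self, i):
        return self.a * i**3 + self.b * i**2 + self.c * i


def igbt_conduction_loss(i_c, d, r_on, v_ce):
    """Eq. (3.1): (R_on,S * i_c^2 + V_ce * i_c) * d."""
    return (r_on * i_c**2 + v_ce * i_c) * d


def diode_conduction_loss(i_d, d, r_on, v_f):
    """Eq. (3.2): (R_on,D * i_D^2 + V_F * i_D) * (1 - d)."""
    return (r_on * i_d**2 + v_f * i_d) * (1.0 - d)


def switching_loss(fit, i_c, v_dc, v_rated, f_sw, rho):
    """Eqs. (3.3)/(3.4): E(i_c) * (V_dc / V_rated) * f_sw * rho."""
    return fit.energy(i_c) * (v_dc / v_rated) * f_sw * rho


# Hypothetical coefficients, for illustration only
turn_on = SwitchingFit(a=0.0, b=2.0e-7, c=1.0e-4)
turn_off = SwitchingFit(a=0.0, b=1.0e-7, c=1.5e-4)
i, d = 500.0, 0.6
p_total = (igbt_conduction_loss(i, d, r_on=1.2e-3, v_ce=0.9)
           + switching_loss(turn_on, i, 1000.0, 900.0, 2550.0, 1.0)
           + switching_loss(turn_off, i, 1000.0, 900.0, 2550.0, 1.0))
print(f"IGBT loss over this switching period: {p_total:.1f} W")
```

Keeping the turn-on and turn-off fits as separate objects mirrors the distinct $a_{on}$/$a_{off}$ coefficient sets in Equations (3.3) and (3.4); a diode reverse-recovery fit can reuse the same class.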

#### 3.2.2 Finite element analysis

In this section, the thermal behaviour of the multi-chip high-power module is investigated using FEA. The packaging material properties used for FEA modelling are listed in Table 3.1. As shown in Figure 3.4, the FEA model includes the epoxy resin case, filled with silicone gel for insulation, rigidity and prevention of moisture ingress. The effect of the encapsulation material on internal heat dissipation is considered in the simulation. The baseplate is mounted on a four-pipe water-cooled heatsink, Hi-Contact 416601, whose pipes are completely filled with water flowing at a constant rate, which realistically characterises the external thermal behaviour of the power module system.

| Parameter                                                    | Silicon dies<br>IGBT/Diode | Die-<br>attach<br>solder<br>(SAC305) | DBC<br>copper | DBC<br>ceramic<br>(Al <sub>2</sub> O <sub>3</sub> ) | Baseplate<br>solder<br>(SAC305) | Baseplate<br>copper |
|--------------------------------------------------------------|----------------------------|--------------------------------------|---------------|-----------------------------------------------------|---------------------------------|---------------------|
| Length (mm)<br>(x-direction)                                 | 13.8/9                     | 13.8/9                               | 38            | 39                                                  | 38                              | 246                 |
| Width (mm)<br>(y-direction)                                  | 13.8/9                     | 13.8/9                               | 52            | 53                                                  | 52                              | 86                  |
| Thickness<br>(mm)<br>(z-direction)                           | 0.25                       | 0.1                                  | 0.3           | 0.38                                                | 0.1                             | 3                   |
| Thermal<br>Conductivity<br>(W/(m × K))                       | 130                        | 50                                   | 400           | 35                                                  | 50                              | 400                 |
| Coefficient of<br>Thermal<br>Expansion (10 <sup>-6</sup> /K) | 3                          | 23                                   | 17            | 6.5                                                 | 23                              | 17                  |
| Thermal<br>Capacity<br>(J/(kg × K))                          | 700                        | 150                                  | 385           | 730                                                 | 150                             | 385                 |

Table 3.1: Packaging material properties for FEA modelling



Figure 3.4: FEA model of a multi-chip module, (a) opened module and (b) on the water-cooled heatsink

The FEA model uses the Heat Transfer in Solids and Laminar Flow interfaces in COMSOL to characterise the thermal behaviour. The external surfaces of the module in contact with the ambient are assigned a convective heat transfer coefficient of $200\,\mathrm{W/(m^2 \times K)}$ [98]. The heatsink pipe is filled with water flowing at 6300 ml/min. The ambient and inlet water temperatures are both set to 35°C. The chip junction temperature is defined as the mean temperature of the chip top surface. As the module is screw-mounted on the heatsink, thermal expansion of the materials is neglected to reduce computation time.

The half-bridge module is simulated as part of a three-phase inverter with a DC-link voltage of 1000 V, supplying an inductive load (50 Hz, 690 V, 600 A) at a PWM switching frequency of 2550 Hz. The load condition and power factor differ from those of the generator application, but this does not affect the thermal performance of the module, and this configuration is more convenient to test experimentally in the next chapter. The thermal behaviour of the module is calculated from the average power loss of each device. Figure 3.5 (a) shows the chip surface temperatures; slicing the model along the centreline of the diode in the third half-bridge section gives the cross-sectional temperature distribution inside the module shown in Figure 3.5 (b).
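The text does not state how the average device losses were obtained. A common way to estimate them for a sinusoidal-PWM inverter leg is the linearised textbook approximation sketched below: conduction losses from a threshold-voltage-plus-slope-resistance on-state model, and switching energies scaled linearly with current and DC-link voltage. It is an assumption that this matches the thesis's loss calculation; the function name and all device parameters are hypothetical, and only the 1000 V DC link, 600 A load (assumed RMS) and 2550 Hz switching frequency come from the text.

```python
import math


def avg_leg_losses(i_pk, m, cos_phi, f_sw, v_dc,
                   v_ce0, r_ce, v_f0, r_f,
                   e_ts, e_rec, v_ref, i_ref):
    """Average losses (W) of one IGBT and its anti-parallel diode in a
    sinusoidal-PWM inverter leg (common linearised approximation).

    i_pk: peak phase current (A); m: modulation index; cos_phi: power factor
    v_ce0, r_ce / v_f0, r_f: on-state threshold voltage (V) and slope resistance (ohm)
    e_ts, e_rec: IGBT turn-on + turn-off energy and diode recovery energy (J),
                 datasheet values at v_ref (V) and i_ref (A), scaled linearly.
    """
    k = m * cos_phi
    # Conduction losses averaged over one fundamental period
    p_cond_igbt = (v_ce0 * i_pk * (1 / (2 * math.pi) + k / 8)
                   + r_ce * i_pk ** 2 * (1 / 8 + k / (3 * math.pi)))
    p_cond_diode = (v_f0 * i_pk * (1 / (2 * math.pi) - k / 8)
                    + r_f * i_pk ** 2 * (1 / 8 - k / (3 * math.pi)))
    # Each device switches a half-sine current during half the period, so its
    # mean switched current over a full period is i_pk / pi
    scale = (i_pk / i_ref) * (v_dc / v_ref) / math.pi
    p_sw_igbt = f_sw * e_ts * scale
    p_sw_diode = f_sw * e_rec * scale
    return p_cond_igbt + p_sw_igbt, p_cond_diode + p_sw_diode


if __name__ == "__main__":
    # 600 A assumed RMS; every device parameter below is a hypothetical placeholder
    p_t, p_d = avg_leg_losses(i_pk=600 * math.sqrt(2), m=1.0, cos_phi=0.85,
                              f_sw=2550, v_dc=1000,
                              v_ce0=0.8, r_ce=1.0e-3, v_f0=0.9, r_f=0.8e-3,
                              e_ts=0.30, e_rec=0.10, v_ref=900, i_ref=1000)
    print(f"IGBT: {p_t:.0f} W, diode: {p_d:.0f} W")
```

With real datasheet values, the two returned losses would be the per-device heat sources applied to the chip volumes in the FEA model.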

The silicone gel surface and the module case above the chips reach up to 60°C. The simulation model is validated through the equivalent junction-to-case transient thermal impedances of the IGBT and diode, obtained as the step response of the average chip temperature; these agree well with the datasheet, as shown in Figure 3.6.



Figure 3.5: (a) Chip surface temperature of multi-chip FEA model and (b) sectional view (from left) along centreline of the hottest diode



Figure 3.6: Thermal impedance of IGBT and diode from FEA and datasheet

#### 3.2.3 Thermal network of power modules

The thermal network of the power module is shown in Figure 3.7. The Foster network of the half-bridge power module used in the wind turbine converter is extracted from the product datasheet; its parameters, together with those of the thermal interface material and heatsink, are listed in Table 3.2.

The junction-to-case thermal resistance of the diode is about twice that of the IGBT (0.048 K/W versus 0.024 K/W in Table 3.2). This is because the diode chip area is about half that of an IGBT chip, while the diode chip is thicker than the IGBT for the same voltage and current ratings. The ambient is treated as an infinite thermal capacitance that absorbs all dissipated heat, with its temperature set at 35 °C. The junction temperature derived from this network is then used to evaluate the lifetime of the power converter in the electrothermal modelling that follows.



Figure 3.7: Thermal network of the power module

| Thermal<br>resistance (K/W) | Value  | Thermal<br>capacitance (J/K) | Value  |
|-----------------------------|--------|------------------------------|--------|
| $R_{ijc1}$                  | 0.0008 | $C_{ijc1}$                   | 1      |
| $R_{ijc2}$                  | 0.0037 | $C_{ijc2}$                   | 3.5135 |
| $R_{ijc3}$                  | 0.017  | $C_{ijc3}$                   | 2.9412 |
| $R_{ijc4}$                  | 0.0025 | $C_{ijc4}$                   | 240    |
| $R_{djc1}$                  | 0.003  | $C_{djc1}$                   | 0.2667 |
| $R_{djc2}$                  | 0.0115 | $C_{djc2}$                   | 1.1304 |
| $R_{djc3}$                  | 0.03   | $C_{djc3}$                   | 1.6667 |
| $R_{djc4}$                  | 0.0035 | $C_{djc4}$                   | 171    |
| $R_{dch}$                   | 0.0119 | $C_{dch}$                    | 1.2764 |
| $R_{ha}$                    | 0.0017 | $C_{ha}$                     | -      |

Table 3.2: The parameters of the power module thermal network
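Each Foster RC pair in Table 3.2 contributes $R_i\,(1 - e^{-t/(R_i C_i)})$ to the transient thermal impedance, which is how a step response such as Figure 3.6 is formed. The sketch below evaluates $Z_{th}(t)$ for the four junction-to-case pairs of each device, assuming the case is held at a constant temperature; the function names are hypothetical.

```python
import math

# Junction-to-case Foster pairs (R in K/W, C in J/K) from Table 3.2
IGBT_JC = [(0.0008, 1.0), (0.0037, 3.5135), (0.017, 2.9412), (0.0025, 240.0)]
DIODE_JC = [(0.003, 0.2667), (0.0115, 1.1304), (0.03, 1.6667), (0.0035, 171.0)]


def foster_zth(pairs, t):
    """Transient thermal impedance (K/W) of a Foster network at time t (s)."""
    return sum(r * (1.0 - math.exp(-t / (r * c))) for r, c in pairs)


def temp_rise(pairs, p_loss, t):
    """Junction-to-case temperature rise (K) after a loss step of p_loss (W),
    assuming the case temperature stays constant."""
    return p_loss * foster_zth(pairs, t)


if __name__ == "__main__":
    for t in (0.001, 0.01, 0.1, 1.0, 10.0):
        print(f"t={t:>6} s  Zth,IGBT={foster_zth(IGBT_JC, t):.4f}  "
              f"Zth,diode={foster_zth(DIODE_JC, t):.4f} K/W")
```

The time constants $R_i C_i$ span roughly 0.8 ms to 0.6 s, so after a few seconds both curves settle at the sum of their resistances: 0.024 K/W for the IGBT and 0.048 K/W for the diode.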

## 3.3 ELECTROTHERMAL MODELLING

#### 3.3.1 Wind turbines

Fully and partially rated power converters are used in permanent magnet synchronous generator (PMSG) and doubly-fed induction generator (DFIG) wind turbines, respectively, as shown in Figure 3.8. In a fully rated converter system, the generator stator is connected to the grid through a back-to-back (B2B) converter. In a partially rated converter system, the DFIG stator is connected directly to the grid, while the rotor is connected to the power converter, which controls the frequency and speed of the generator. The grid-side converter operates at the frequency and voltage of the grid.

Turbine operation can usually be divided into a maximum power point tracking (MPPT) stage and a constant power stage with blade pitching. The cut-in wind speeds of the DFIG (partially rated) and PMSG (fully rated) turbines are 3 m/s and 2 m/s, and their cut-out wind speeds are 16 m/s and 18 m/s, respectively. Both turbines have a rated wind speed of 12 m/s, and the synchronous wind speed of the DFIG is 8.6 m/s.
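The two operating stages can be sketched as a simple power curve. This is an idealised, hypothetical model: it uses the cut-in, rated and cut-out speeds given above, assumes output power proportional to the cube of wind speed in the MPPT stage (as stated later in Section 3.4.2), and assumes a 2 MW rating for both turbines purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turbine:
    """Idealised operating-stage model of a wind turbine (hypothetical helper)."""
    v_cut_in: float   # m/s
    v_rated: float    # m/s
    v_cut_out: float  # m/s
    p_rated: float    # MW, hypothetical rating

    def stage(self, v):
        """Operating stage at wind speed v (m/s)."""
        if v < self.v_cut_in or v > self.v_cut_out:
            return "stopped"
        return "MPPT" if v < self.v_rated else "constant power"

    def power(self, v):
        """Output power (MW): cubic in v during MPPT, rated while pitching."""
        stage = self.stage(v)
        if stage == "stopped":
            return 0.0
        if stage == "MPPT":
            return self.p_rated * (v / self.v_rated) ** 3
        return self.p_rated


# Cut-in, rated and cut-out speeds from the text; the 2 MW rating is an assumption
PMSG = Turbine(v_cut_in=2.0, v_rated=12.0, v_cut_out=18.0, p_rated=2.0)
DFIG = Turbine(v_cut_in=3.0, v_rated=12.0, v_cut_out=16.0, p_rated=2.0)

if __name__ == "__main__":
    for v in range(0, 20, 2):
        print(v, PMSG.stage(v), round(PMSG.power(v), 3), DFIG.stage(v))
```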



Figure 3.8: (a) Fully rated power converter in a PMSG wind turbine and (b) partially rated power converter in a DFIG wind turbine

#### 3.3.2 Electro-thermal modelling

The wind turbine models are built in MATLAB/Simulink and PLECS to evaluate the reliability of the power converter from an electrothermal point of view. Following the methodology in Figure 3.9, the operating conditions extracted from each of the two wind turbine types are fed to the device behaviour model, which passes the power losses to the module's thermal network to estimate the junction temperature; the junction temperature is in turn fed back to the device model. Finally, the junction temperature and the operating parameters are used to estimate the lifetime of the power devices in the wind power converters.
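The loss/temperature feedback in Figure 3.9 amounts to finding a self-consistent operating point. A minimal steady-state sketch, assuming a hypothetical loss model that rises linearly with junction temperature and a single lumped thermal resistance, solves it by fixed-point iteration; the real model uses datasheet loss tables and the full thermal network.

```python
def electrothermal_steady_state(p0, alpha, t_ref, r_th, t_amb,
                                tol=1e-6, max_iter=200):
    """Fixed-point iteration between a device loss model and a thermal model.

    Hypothetical loss model: P(Tj) = p0 * (1 + alpha * (Tj - t_ref)).
    Steady-state thermal model: Tj = t_amb + r_th * P.
    Converges when r_th * p0 * alpha < 1.
    """
    tj = t_amb  # start from ambient temperature
    for _ in range(max_iter):
        p = p0 * (1.0 + alpha * (tj - t_ref))  # device model: losses at current Tj
        tj_new = t_amb + r_th * p              # thermal network: resulting Tj
        if abs(tj_new - tj) < tol:
            return tj_new, p
        tj = tj_new                            # feed Tj back to the device model
    raise RuntimeError("electrothermal loop did not converge")


if __name__ == "__main__":
    tj, p = electrothermal_steady_state(p0=500.0, alpha=0.003, t_ref=25.0,
                                        r_th=0.1, t_amb=35.0)
    print(f"Tj = {tj:.2f} degC at P = {p:.1f} W")
```

Because the loop gain $r_{th}\,p_0\,\alpha$ is well below 1 for realistic parameters, the iteration converges in a handful of steps; a gain above 1 would indicate thermal runaway.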

The lifetime model used for the power devices in the wind power converter is derived empirically from power cycling test results [99]:

$$N_{f} = k\,\Delta T_{j}^{\beta_{1}}\, e^{\frac{\beta_{2}}{T_{j,\min} + 273}}\, t_{on}^{\beta_{3}}\, I_{bw}^{\beta_{4}}\, V_{rated}^{\beta_{5}}\, D^{\beta_{6}} \tag{3.5}$$

where $N_f$ is the number of cycles to failure, $\Delta T_j$ is the junction temperature swing during a stress cycle, $T_{j,\min}$ is the minimum junction temperature (in °C), $t_{on}$ is the duration of the heating phase, $I_{bw}$ is the current per bond wire, $V_{rated}$ is the rated blocking voltage, $D$ is the bond wire diameter, and $k$ and $\beta_1$ to $\beta_6$ are empirically fitted constants. The model was proposed from power cycling experiments on Infineon power modules and accounts for whichever failure mechanisms occurred during those tests. Its parameters are listed in Table 3.3.



Figure 3.9: The electrical-thermal modelling methodology for wind turbines

| Parameter | Value    | Parameter   | Value  |
|-----------|----------|-------------|--------|
| $k$       | $9.3\times10^{14}$ | $\beta_1$ | -4.416 |
| $\beta_2$ | 1285     | $\beta_3$   | -0.463 |
| $\beta_4$ | -0.716   | $\beta_5$   | -0.761 |
| $\beta_6$ | -0.5     | -           | -      |

Table 3.3: The parameters of Bayerer lifetime model
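Equation (3.5) with the constants of Table 3.3 can be evaluated directly, as in the sketch below. The input units are an assumption here (they follow the way this model is commonly quoted: $V_{rated}$ in units of 100 V and $D$ in µm) and should be checked against [99] before use; the function name is hypothetical.

```python
import math

# Constants from Table 3.3
K = 9.3e14
B1, B2, B3, B4, B5, B6 = -4.416, 1285.0, -0.463, -0.716, -0.761, -0.5


def bayerer_nf(dtj, tj_min_c, t_on, i_bw, v_rated, d_wire):
    """Cycles to failure from Eq. (3.5).

    Assumed units: dtj in K, tj_min_c in degC, t_on in s, i_bw in A per
    bond wire, v_rated in units of 100 V, d_wire in um.
    """
    return (K * dtj ** B1 * math.exp(B2 / (tj_min_c + 273.0))
            * t_on ** B3 * i_bw ** B4 * v_rated ** B5 * d_wire ** B6)


if __name__ == "__main__":
    # Hypothetical operating point: 50 Hz half-period heating, 1200 V device
    for dtj in (10.0, 20.0, 40.0):
        print(dtj, f"{bayerer_nf(dtj, 40.0, 0.01, 10.0, 12.0, 400.0):.3e}")
```

Because $\beta_1 \approx -4.4$, doubling the junction temperature swing cuts $N_f$ by a factor of $2^{4.416} \approx 21$, which is why the deep low-frequency cycling on the machine side dominates the lifetime results below.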

## 3.4 LIFETIME ESTIMATION

#### 3.4.1 Thermal profile

To investigate the reliability of the power converters over the whole wind speed range, the electrical and thermal behaviour of each device in the converter systems is extracted from the electrothermal modelling results, with the wind turbines operating in steady state at each wind speed from cut-in to cut-out in steps of 1 m/s.

The machine-side converters operate at a much lower output frequency than the grid-side converters, which pushes the machine-side devices into deeper temperature cycling in both DFIG and PMSG wind turbines, as shown by the electrothermal modelling results in Figure 3.10. On the machine side, the diodes suffer much larger temperature swings, and hence greater thermal fatigue, than the IGBTs. In the PMSG system, the thermal stress on the devices grows as the wind speed increases and then stays at a high level in the constant power stage with blade pitching above the 12 m/s rated wind speed. In the DFIG, the thermomechanical stress on the machine-side devices is clearly highest near the synchronous speed, because the machine-side converter supplies more reactive power, leading to a larger current through the diodes; this also raises the temperature of the grid-side devices through thermal coupling on the heatsink.



Figure 3.10: Temperature profiles of power devices in PMSG and DFIG converters over the whole wind speed range

#### 3.4.2 Lifetime consumption

Based on the temperature profiles of the power devices, the converter lifetime can be estimated from the lifetime model in (3.5). The time to failure is obtained by dividing the number of cycles to failure by the temperature cycling frequency, which equals the output frequency. The output frequency is proportional to wind speed in the MPPT stage and constant in the constant power stage.
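The conversion from cycles to calendar time can be sketched as follows. The linear frequency law matches the statement above for a PMSG-type machine (it does not describe the slip-dependent rotor frequency of a DFIG), and the rated output frequency is a hypothetical input; the helper names are hypothetical too.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def output_frequency(v, v_rated, f_rated):
    """Output (and thermal cycling) frequency in Hz: proportional to wind
    speed below rated, constant above (PMSG-type assumption)."""
    return f_rated * min(v, v_rated) / v_rated


def lifetime_years(n_f, f_cycle):
    """Time to failure (years) if the device cycles continuously at f_cycle."""
    return n_f / (f_cycle * SECONDS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical: N_f = 1e9 cycles, rated output frequency 10 Hz at 12 m/s
    for v in (4.0, 8.0, 12.0, 16.0):
        f = output_frequency(v, 12.0, 10.0)
        print(v, f, round(lifetime_years(1e9, f), 2))
```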

The lifetimes of the devices in the two types of wind turbine, calculated by (3.5), are shown in Figure 3.11. The lifetime of the PMSG machine-side diode is much lower from 6 m/s upwards, because the machine-side diode conducts more current to provide reactive power for the generator, whereas at low wind speeds the grid-side IGBT is the least reliable device. The lifetimes of the devices in the PMSG system decline as the wind speed increases, while in the DFIG system the synchronous speed point is the most critical, leaving the machine-side diode with a lifetime of only a few years, as shown in Figure 3.11 (b).



Figure 3.11: Lifetime of devices in the power converter in (a) PMSG and (b) DFIG wind turbines

The estimated device lifetimes are converted to lifetime consumption per hour of operation as

$$LC_{hrs} = \frac{3600 \times f}{N_f} \tag{3.6}$$

where $LC_{hrs}$ is the lifetime consumption per hour at a given wind speed, $f$ is the frequency of the fundamental current cycle, and $N_f$ is the number of cycles to failure of the device at that wind speed, calculated from (3.5). The results are shown in Figure 3.12 and Figure 3.13. The hourly lifetime consumption of the PMSG converter is most critical in the high wind speed range, while the machine-side devices in the DFIG system age more quickly near the synchronous speed, owing to the larger diode current needed for machine-side reactive power. These trends agree with the preceding thermal stress analysis.
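Equation (3.6) and a simple way of aggregating it over a wind speed distribution can be sketched as below. The aggregation (hours at each wind speed multiplied by the hourly consumption) is an illustrative assumption, and the dictionary-based inputs and function names are hypothetical.

```python
def lc_per_hour(f, n_f):
    """Eq. (3.6): fraction of device life consumed per hour at one wind speed."""
    return 3600.0 * f / n_f


def annual_lc(hours_by_ws, f_by_ws, nf_by_ws):
    """Weight the hourly consumption by the hours spent at each wind speed.
    Inputs are dicts keyed by wind speed (m/s)."""
    return sum(h * lc_per_hour(f_by_ws[ws], nf_by_ws[ws])
               for ws, h in hours_by_ws.items())


if __name__ == "__main__":
    # Hypothetical values for two wind speed bins
    hours = {6: 1000.0, 12: 500.0}
    freq = {6: 5.0, 12: 10.0}
    n_f = {6: 1.8e9, 12: 2.0e8}
    print(annual_lc(hours, freq, n_f))
```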



Figure 3.12: Lifetime consumption per hour of the (a) grid-side and (b) machine-side devices in the PMSG system



Figure 3.13: Lifetime consumption per hour of the (a) grid-side and (b) machine-side devices in the DFIG system

Below the rated wind speed, the output power of the wind turbine is proportional to the cube of the wind speed. The lifetime consumption per MWh generated is estimated by treating the devices on each of the grid and machine sides as a whole converter, whose number of cycles to failure equals the lower of the IGBT and diode values:

$$LC_{MWhrs} = \frac{3600 \times f}{\min[N_{f,IGBT}, N_{f,diode}] \cdot P_{hrs}} \tag{3.7}$$

where $LC_{MWhrs}$ is the lifetime consumption per MWh, $N_{f,IGBT}$ and $N_{f,diode}$ are the numbers of cycles to failure of the IGBT and diode, respectively, in either the machine-side or grid-side converter, and $P_{hrs}$ is the power generated per hour in MW. The results are shown in Figure 3.14. Lifetime is consumed much faster on the machine side than on the grid side over the whole wind speed range, while the high wind speed range and the synchronous speed point remain the critical regions for the PMSG and DFIG systems, respectively.
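Equation (3.7) in code form, as a minimal sketch with a hypothetical function name; the minimum over the two devices implements the "weakest device sets the converter lifetime" rule stated above.

```python
def lc_per_mwh(f, n_f_igbt, n_f_diode, p_mw):
    """Eq. (3.7): converter life consumed per MWh generated.
    The device with fewer cycles to failure limits the converter."""
    if p_mw <= 0:
        raise ValueError("no power generated at this operating point")
    return 3600.0 * f / (min(n_f_igbt, n_f_diode) * p_mw)


if __name__ == "__main__":
    # Hypothetical point: 10 Hz cycling, diode weaker than IGBT, 2 MW output
    print(lc_per_mwh(10.0, n_f_igbt=1e9, n_f_diode=3.6e8, p_mw=2.0))
```

Dividing by the generated power normalises the wear rate, so operating points that stress the devices but produce little energy (such as the DFIG synchronous speed region) stand out.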



Figure 3.14: Lifetime of devices and lifetime consumption of power converter in PMSG and DFIG wind turbines

## 3.5 UNEVEN DEGRADATION

According to the above analysis, the machine-side diode has the shortest lifetime because of its low-frequency temperature cycling. This section evaluates the uneven degradation of the paralleled diodes in the fully rated direct-drive turbine under long-term operating conditions. The evaluation uses one year of wind speed statistics extracted from the SCADA system of an offshore wind farm, binned at 1 m/s, as shown in Figure 3.15. The wind speed mostly lies between 5 m/s and 10 m/s. As the SCADA sampling interval is 10 min, the converter is assumed to operate constantly at the recorded wind speed during each interval. The ageing damage is then calculated from the one-year wind speed statistics.



Figure 3.15: The wind speed distribution of an offshore wind farm for one year

Because the module has six diode chips in parallel, the computational cost is reduced by dividing the module symmetrically and analysing the three chips on one side. Following the same procedure as in Section 3.2.2, the thermal network of these three devices in the multi-chip power module is extracted from FEA simulation. In addition, the current profiles of the PMSG machine-side diode at each wind speed are extracted from the previous simulation model. These current profiles and the thermal network are then used to build an electrothermal model of the machine-side paralleled diodes in MATLAB/Simulink. The mean and amplitude of the junction temperature of each of the three paralleled diodes can thus be calculated at 11 operating points from 2 m/s to 12 m/s in 1 m/s steps. Operating points between 12 m/s and 18 m/s are considered identical to the rated point at 12 m/s. Using (3.5), the number of cycles to failure at each point can be calculated, and the total damage $D_a$ accumulated during one year can be derived as:

$$D_a = \sum_{ws=3}^{20} \frac{N_{ws}}{N_{f,ws}} \tag{3.8}$$

where $ws$ is the wind speed, $N_{ws}$ is the number of counts of each wind speed during one year, and $N_{f,ws}$ is the number of cycles to failure at each wind speed. When $D_a$ reaches 1, the device is considered to have failed. Following the failure standard, the thermal failure criterion of the power device is defined as the thermal resistance rising to 1.5 times its initial value. The increase of the normalised thermal resistance $\Delta R_{th,a}$ can therefore be estimated as

$$\Delta R_{th,a} = 0.5 \times D_a \tag{3.9}$$
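Equations (3.8) and (3.9) can be sketched together with the 1 m/s binning of 10-min SCADA records. One interpretation here is an assumption: for Miner's sum to be dimensionally consistent, $N_{ws}$ is taken as the number of thermal cycles at each wind speed, i.e. the number of 10-min records multiplied by 600 s and by the cycling frequency. The function names and callable inputs are hypothetical.

```python
import math
from collections import Counter

RECORD_S = 600  # SCADA sampling interval of 10 min, from the text


def bin_wind_speeds(speeds):
    """Count 10-min records per 1 m/s bin, with bins centred on integer speeds."""
    return Counter(math.floor(v + 0.5) for v in speeds)


def annual_damage(counts, f_of_ws, nf_of_ws):
    """Miner's sum of Eq. (3.8).

    Assumption: N_ws = records x 600 s x cycling frequency f_of_ws(ws) in Hz,
    and nf_of_ws(ws) returns the cycles to failure from Eq. (3.5).
    """
    d_a = 0.0
    for ws, n_records in counts.items():
        n_cycles = n_records * RECORD_S * f_of_ws(ws)
        d_a += n_cycles / nf_of_ws(ws)
    return d_a


def rth_increase(d_a):
    """Eq. (3.9): D_a = 1 corresponds to the 1.5 x R_th0 failure criterion."""
    return 0.5 * d_a
```

In practice `nf_of_ws` would wrap the per-wind-speed junction temperature mean and swing from the electrothermal model, and the wind speeds would come from the year of SCADA records.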

Based on the above calculation, the thermal resistance increment after each year of ageing is fed back iteratively into the thermal model, so that the uneven degradation of the paralleled devices can be evaluated over the full life cycle. First, assuming that all devices are initially healthy, the thermal resistance increase during ageing is shown in Figure 3.16. The lifetimes of all three devices exceed 25 years and differ by about 5 years. Diode 1, located in the middle of the module, has the shortest lifetime because thermal coupling raises its junction temperature.
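The year-by-year feedback can be sketched as a loop. The damage model inside the loop is a deliberate simplification and an assumption: the junction temperature swing is taken to scale with the normalised thermal resistance $r$, so by (3.5) the annual damage scales as $r^{-\beta_1} = r^{4.416}$, with the $T_{j,\min}$ term ignored; `d0_per_year` (the first-year damage of a healthy device) is a hypothetical input. The sketch shows the qualitative behaviour (ageing accelerates, and an initial $R_{th}$ increase shortens life), not the thesis's numbers.

```python
BETA1 = -4.416  # from Table 3.3


def years_to_failure(d0_per_year, r_init=1.0, r_fail=1.5, max_years=100):
    """Year-by-year ageing with R_th feedback.

    r is R_th normalised to the healthy initial value R_th0. Simplifying
    assumption: annual damage = d0_per_year * r ** (-BETA1).
    Returns (year of failure or None, history of r).
    """
    r = r_init
    history = [r]
    for year in range(1, max_years + 1):
        d_a = d0_per_year * r ** (-BETA1)  # damage accumulated this year
        r += 0.5 * d_a                      # Eq. (3.9), fed back for next year
        history.append(r)
        if r >= r_fail:
            return year, history
    return None, history


if __name__ == "__main__":
    for r0 in (1.0, 1.01, 1.05):
        year, _ = years_to_failure(0.02, r_init=r0)
        print(f"initial R_th = {r0:.2f} R_th0 -> failure in year {year}")
```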



Figure 3.16: The ageing development of paralleled devices under healthy conditions

In addition to differences in degradation development, manufacturing flaws or the packaging layout may cause an initial increase in the thermal resistance of one of the paralleled devices, which accelerates the long-term ageing of the initially degraded diode. For instance, if the thermal resistance of diode 1 is increased by 1% because of initial solder cracks, its lifetime is reduced to 24 years, while the lifetimes of the other paralleled diodes are not affected, as shown in Figure 3.17.



Figure 3.17: The uneven degradation development of paralleled diodes with  $1\% R_{th}$  increase of diode 1

However, if initial defects raise the thermal resistance of diode 1 by 5%, its lifetime is significantly shortened, as shown in Figure 3.18: the lifetime of diode 1 is reduced by 50% to about 15 years. The figure also shows that the ageing of diode 1 with initial defects progressively accelerates. This determines the warning time that the condition monitoring method must provide. For instance, if device replacement is triggered at a thermal resistance threshold of $1.2R_{th0}$, a warning given at $1.18R_{th0}$ leaves a window of only 3 to 6 months.



Figure 3.18: The uneven degradation development of paralleled diodes with 5%  $R_{th}$  increase of diode 1

## 3.6 SUMMARY

This chapter establishes electrothermal models to analyse the operating conditions of the power converters used in fully and partially rated wind turbines and evaluates their reliability over the whole wind speed range. The machine-side devices have a shorter lifetime than the grid-side devices over the whole wind speed range. The PMSG converters are more stressed above the rated wind speed and may have a lifetime of only about ten years in the simulated design, depending on the wind profile. However, DFIG machine-side converters may have an even shorter lifetime, consumed rapidly around the synchronous speed, because the large low-frequency reactive current drives the devices into deep temperature cycling. The electrothermal modelling and lifetime estimation are then used to investigate the uneven degradation of a multi-chip power module in long-term wind turbine operation. The results show that an asymmetrical packaging layout can lead to uneven degradation between paralleled devices even if they start in the same health state. Moreover, once initial defects increase the thermal resistance of the weak diode, its subsequent ageing is significantly accelerated.

## 4 CONDITION MONITORING OF MULTI-CHIP MODULES WITH UNEVEN DEGRADATION

## 4.1 INTRODUCTION

Because a single IGBT (or MOSFET) chip has a relatively low current rating, large power modules often contain parallel chips that are not identically packaged. As analysed in Chapter 3, initial defects in their die-attach solder layers gradually grow at different rates and develop into an uneven degradation pattern [67]. This makes it necessary to monitor the condition of individual solder layers later in life, particularly in high-power applications such as wind turbine converters and EV drives. Because parallel chips share the same terminals, early fault detection is difficult without internal measurements. Examination of failed power modules indicates that a fault in a single chip can damage the whole module, as it causes electromagnetic transients in the other chips, including their gating circuitry [100].

This chapter proposes a scheme for monitoring die-attach solder degradation in large power modules with parallel chips, or in converters with parallel modules. Neural networks (NNs) are used to represent the mapping from the electrical operating point to the temperature distribution on the module baseplate and heatsink surfaces, which depends on the health condition of the power module package. Instead of tracking the increase in thermal resistance of the solder layers, the proposed scheme focuses on whether this mapping is matched or mismatched. In this way, the patterns and severities of degradation can be differentiated in NN training. The mapping represented by the NN captures the effect of solder degradation on heat dissipation in all directions, even though measurements may be taken on only one side. The details of the NN method are explained in Section 4.4.

This chapter is arranged as follows. Section 4.2 characterises a widely used large power module with multiple chips in parallel, using the validated simulation model from Chapter 3. An equivalent experimental rig is presented in Section 4.3. Section 4.4 describes the condition monitoring method, presents the training strategy together with the NN structure, and discusses the justification for the scheme. Section 4.5 describes the test procedure and presents experimental results verifying the proposed condition monitoring method, followed by conclusions in Section 4.6.

## 4.2 CHARACTERIZATION OF UNEVEN DEGRADATION

The previous study assumed that all the power loss in the module is dissipated downwards to the heatsink. Recently, however, there have been reported cases in industry of the silicone gel on top of the chips liquefying, suggesting that a significant amount of heat is dissipated in this direction, at least after solder degradation. To obtain a feasible condition monitoring indicator, it is therefore necessary to analyse the electrothermal behaviour of the multi-chip power module under uneven degradation.

The thermal-behaviour-based monitoring method proposed in this chapter treats switching devices and diodes in the same way. Moreover, because of its smaller die area and greater die thickness, a diode usually has a higher thermal resistance than an IGBT of the same current rating [19]. As analysed in Chapter 3, diodes often fail first, particularly in rectifiers, but have received little research attention. This chapter therefore focuses on the paralleled diodes in the multi-chip power module.

#### 4.2.1 Electrical characterisation

The electrical characteristics of the paralleled diodes are analysed using the MATLAB/Simulink model set up in Chapter 3. Under the same inverter conditions, the temperature response of the six parallel diodes in the lower leg of the module is predicted. The electrothermal coupling model accounts for the temperature dependence of the device characteristics by feeding the junction temperature back to the electrical model. The temperature-dependent on-state and switching losses are obtained from the datasheet. The envelopes of the diode junction temperatures and the current sharing are shown in Figure 4.1, where the solder thermal resistance of the third diode is increased by 50%. Although the junction temperature of the aged diode is about 10°C higher, the current sharing is hardly affected.



Figure 4.1: Simulation results (a) junction temperature, and (b) current of the aged and parallel diodes

The electrical and thermal features of the parallel diodes are then investigated as degradation develops. Scanning of degraded modules has revealed that die-attach solder degradation takes the form of cracks initiated at the edge and growing towards the centre [5, 18]. The remaining solder area under the diode chip is therefore assumed to be circular, as shown in Figure 4.2. The degradation level (DL) is defined by the increase in thermal resistance $R_{th}$ normalised to its original value.



Figure 4.2: The ageing process of the die-attach solder layer

Figure 4.3 (a) shows the change in on-state voltage of the parallel diodes and the difference in current between the aged diode and a healthy diode at peak load current. The change in on-state characteristics is insignificant: only 4.5 mV (0.25%) and 4 A (out of about 130 A) at $1.5R_{th}$. However, the total power loss of the diodes increases by 13 W, or 2.2%, as shown in Figure 4.3 (b). This is mostly due to the temperature sensitivity of the switching losses and implies that solder degradation may be detected more sensitively by a thermal method; detecting the switching process directly can be informative but has its own challenges, particularly for modules with multiple chips in parallel [101]. The degradation impedes the heat flux below the aged diode, which reduces the temperature gradient in that direction. Consequently, the temperature difference between the case and the underside of the heatsink below the aged diode chip, $T_c - T_h$, decreases, as shown in Figure 4.3 (b), and this change is detectable by thermocouples.



Figure 4.3: (a) Electrical and (b) thermal features of parallel diodes under different DLs

The simulation results indicate that, depending on the operating point, the health condition of the power module affects the power loss distribution and the heat transfer through the module, while the electrical current sharing is only slightly affected. The mapping between the operating point and the external temperature distribution can therefore be used to indicate health condition. This study only considers the temperature distribution in the heatsink direction. Arguably, the increase in switching loss may cancel out the impeding effect of the increased thermal resistance. However, as long as multiple operating points are used, the condition monitoring should have no undetectable blind spots.

#### 4.2.2 Thermal characterisation

Returning to the FEA, a constant heat flux is calculated from the average power loss of the diodes in one leg of the half-bridge module; all other devices are assumed lossless. The heat flux on the baseplate and other surfaces of a healthy module is shown in Figure 4.4. The high heat flux density beneath the chips indicates that most of the power is dissipated towards the heatsink, but a considerable amount of heat still leaves through other parts, i.e. the plastic case and terminals. The proportion of energy flowing towards the heatsink has been used to estimate the thermal resistance [52].


Figure 4.4: Heat flux distribution on (a) external surfaces and (b) baseplate bottom surface of a healthy module

Again using the average losses of the six diodes, the simulated temperature distribution on the diode top surfaces in a healthy module is shown in Figure 4.5 (a). The centre temperatures are all above 120 °C. The high temperature is largely confined to the chip area, suggesting that lateral thermal coupling is weak. The temperature distribution along the diodes' centreline is extracted in Figure 4.5 (b). Diode 1, soldered on the left, has a less effective cooling path and shows a slightly higher temperature, which causes this device to degrade faster. The temperature distribution on the outer surface of the baseplate is shown in Figure 4.6, together with a 1-D plot along the same centreline. The position and thermal status of every diode can be clearly identified.



Figure 4.5: Temperature distribution on the top chip surfaces

The temperature distributions on the chip top surfaces for five degradation levels of diode 1 (on the left) are shown in Figure 4.7. As the die-attach of diode 1 degrades from the chip corners towards the centre, high temperature becomes increasingly concentrated around the chip corners and edges. Although the temperature of diode 1 rises to about 160 °C, the other five diodes remain at temperatures similar to those in the healthy state.




Figure 4.7: Temperature distribution on diode 1 top surfaces with different degradation levels (a)-(e) from 1.1  $R_{th0}$  to 1.5  $R_{th0}$ 

The temperature distributions along the centreline of diode 1, on its top surface and on the underside of the baseplate, are shown in Figure 4.8. With a 50% increase in $R_{th}$, the chip centre temperature rises by more than 20 °C, and the case temperature consequently rises by about 1 °C, which is detectable by a thermocouple. This means that the health state, in terms of both degradation level and location, can be classified from outside the power module. In the experiments of this study, solder layer degradation is emulated by attaching different numbers of thermal pad layers to the module baseplate, which causes similar temperature effects.

Figure 4.8: Temperature distribution on (a) top surface of diode 1 and (b) case surface beneath diode 1

## 4.3 EXPERIMENTAL TEST RIG

The method aims to identify the pattern and severity of degradation of individual solder layers. For a given health condition, the thermal model of the multi-device system is assumed to be approximately fixed and linear. In each degradation scenario, the division of heat flux between directions follows its own relationship. As shown above, when a solder layer degrades, the power loss estimated from the downward heat flux underestimates the total. Nevertheless, the heat flux pattern is expected to produce a temperature distribution that corresponds uniquely to the degradation condition at a given operating point. The effects of the increased power loss and increased thermal resistance could cancel each other out, leaving the external temperature distribution unaffected by the internal health condition; this problem can be avoided by including several operating points in NN training and application. As shown in Figure 4.9, the temperature distribution is measured across the heatsink at points aligned with the chip centres. The mapping between the temperature distribution and the operating point can indicate the occurrence of solder degradation. The proposed condition monitoring method uses only converter-level signals and external temperature measurements, and treats switching devices and diodes in the same way.



Figure 4.9: Instrumentation points in the power module system

The FEA simulation showed that there is little lateral heat transfer between chips above the baseplate. To conveniently measure the behaviour of individual parallel devices and set uneven degradation levels in experiments, an equivalent rig consisting of two paralleled modules is used to test the condition monitoring method. The equivalence of such a rig to a multi-chip system is discussed further in Section 5.2. A single-chip half-bridge power module (SKM50GB12T4, 1200 V/50 A) contains one IGBT and one diode in each leg, as shown in Figure 4.10 (a). Two such encapsulated power modules are mounted at the middle and rear positions of the same water-cooled heatsink (Hi-Contact 416601), as shown in Figure 4.10 (b); the heatsink evens out heat more strongly than the module baseplate does. The outlet water is cooled by a chiller (Lauda WK 4600) and fed back into the heatsink. The inlet water temperature and flow rate are set to 20°C and 6300 ml/min, respectively. The lower-leg diode or IGBT chips of the two modules are connected in parallel, and only their total current is measured, as in a multi-chip module. A DC supply is first used to test the diodes, whose junction-to-case thermal resistances are varied separately, as described below, to emulate solder layer degradation. The parallel modules are later configured as a half-bridge inverter to test the condition monitoring method in operation.



Figure 4.10: Multi-chip experimental rig, (a) power module and (b) heatsink

Figure 4.11 shows thermal pads appended to the baseplate to emulate solder layer degradation, without distinguishing between IGBT and diode. The thermal conductivity of the Bergquist GapPad 1500 is 1.5 W/(m·K). The two power modules can be padded differently to create different degradation patterns. The degradation levels (DLs) specified in this way are analogous to those defined in Figure 4.2. The five degradation levels adopted are listed in Table 4.1, with the measured thermal resistance of a single diode chip. In the initial condition, one pad layer is used to fill the micro gaps between the contact surfaces, and the modules are screw-mounted so that material thermal expansion has minimal effect.



Figure 4.11: Thermal pad attached on baseplate

The full test rig with thermocouple transducers is shown in Figure 4.12. The electrical operating point and temperatures are recorded every second by a cDAQ-9181/9213 and PICO TC-08 data acquisition system, which also implements the supervisory control and condition monitoring algorithms.

| Degradation<br>level | Number of<br>thermal<br>pads | Junction-to-case<br>thermal resistance<br>(K/W) | Normalized<br>thermal<br>resistance |
|----------------------|------------------------------|-------------------------------------------------|-------------------------------------|
| DL1                  | 1                            | 0.946                                           | 1 (initial state)                   |
| DL2                  | 2                            | 1.052                                           | 1.112                               |
| DL3                  | 3                            | 1.158                                           | 1.224                               |
| DL4                  | 4                            | 1.264                                           | 1.336                               |
| DL5                  | 5                            | 1.370                                           | 1.448                               |

Table 4.1: Diode thermal resistance at different degradation levels
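As a quick arithmetic check on Table 4.1, the sketch below recomputes the normalized column by dividing each value by the DL1 (initial) resistance, and lists the increment added by each extra pad layer. It uses only the tabulated numbers; the dictionary and function names are illustrative.

```python
# Junction-to-case thermal resistance of one diode chip per degradation level (K/W), from Table 4.1
RTH_JC = {"DL1": 0.946, "DL2": 1.052, "DL3": 1.158, "DL4": 1.264, "DL5": 1.370}

def normalized(rth: dict[str, float], ref: str = "DL1") -> dict[str, float]:
    """Divide every thermal resistance by the initial-state value."""
    base = rth[ref]
    return {dl: round(r / base, 3) for dl, r in rth.items()}

def increments(rth: dict[str, float]) -> list[float]:
    """Thermal resistance added by each extra pad layer (K/W)."""
    values = list(rth.values())
    return [round(b - a, 3) for a, b in zip(values, values[1:])]

print(normalized(RTH_JC))  # {'DL1': 1.0, 'DL2': 1.112, 'DL3': 1.224, 'DL4': 1.336, 'DL5': 1.448}
print(increments(RTH_JC))  # [0.106, 0.106, 0.106, 0.106]
```

Each pad adds about 0.106 K/W, i.e. roughly 11% of the initial junction-to-case resistance per level, which is the step size the classifier later has to resolve.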



Figure 4.12: Complete experimental rig

## 4.4 TWO-STAGE NEURAL NETWORK

This chapter proposes a two-stage NN method for the condition monitoring of multi-chip-in-parallel modules. The first stage consists of a set of sub-NNs, each representing the mapping between the electrical operating point and the external temperature distribution for one of a series of degradation conditions; this series can be used to extrapolate to conditions not included in it. The outputs of all the sub-NNs, which measure the match or mismatch of each mapping relationship, are presented to the second stage, which classifies the actual health condition according to its position relative to the sub-networks.

#### 4.4.1 First stage NNs

For a first-stage NN, the temperatures on the upper (module side) and lower (ambient air side) surfaces of the heatsink ($T_{c1}$, $T_{c2}$, $T_{c3}$, ... and $T_{h1}$, $T_{h2}$, $T_{h3}$, ...), the inlet and outlet water temperatures ($T_{in}$, $T_{out}$) and the ambient temperature at a distance ($T_{amb}$) are used as inputs, which are mapped to the corresponding electrical operating point; the operating point can be tagged arbitrarily. The operating point itself is irrelevant to the module health condition, although each mapping relationship is relevant to it. Therefore, in this study, the operating point is tagged by the total power loss of the healthy module at that point (the operational power loss, OPL), under fixed ambient and cooling conditions. The structure of each sub-network is shown in Figure 4.13.



Figure 4.13: Structure of a sub-NN in the first stage

Because of the system's thermal capacity, the output power loss is fed back as an additional input to the network through a time delay whose delay time and gain are trained together with the other NN parameters. The raw data are Z-score normalized, i.e. the mean of each channel is subtracted and the result is divided by the channel's standard deviation [102]. The number of hidden-layer neurons is set by trial and error to give acceptable prediction accuracy with the minimum number of hidden neurons; the initial number is the square root of the product of the numbers of inputs and outputs [103]. Each neuron uses a sigmoid activation function [104]. The mismatch minimized in training is the mean square error (MSE) between the estimated power loss (EPL) and the measured operational power loss (OPL).
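The data preparation and sub-NN evaluation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the delay is fixed at one sample (the thesis also trains the delay time), the output neuron is linear because Z-scored targets can be negative (the text states sigmoid activations), and the population standard deviation is used for normalization. All function names are hypothetical.

```python
import math
import statistics

def zscore(channel: list[float]) -> list[float]:
    """Subtract the channel mean and divide by its standard deviation."""
    mu = statistics.mean(channel)
    sigma = statistics.pstdev(channel)  # population std; the source does not say which variant
    return [(x - mu) / sigma for x in channel]

def initial_hidden_neurons(n_inputs: int, n_outputs: int) -> int:
    """Starting size for trial-and-error tuning: sqrt(n_inputs * n_outputs), rounded."""
    return max(1, round(math.sqrt(n_inputs * n_outputs)))

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sub_nn_step(temps, prev_epl, fb_gain, w_hidden, b_hidden, w_out, b_out):
    """One time step of a hypothetical first-stage sub-NN.

    temps    : normalized temperature inputs (T_c..., T_h..., T_in, T_out, T_amb)
    prev_epl : EPL from the previous step (one-sample delayed feedback)
    fb_gain  : trainable gain applied to the fed-back EPL
    Hidden neurons are sigmoid; the output neuron is linear (assumption).
    """
    inputs = list(temps) + [fb_gain * prev_epl]
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def mse(epl: list[float], opl: list[float]) -> float:
    """Training objective: mean square error between EPL and OPL."""
    return sum((e - o) ** 2 for e, o in zip(epl, opl)) / len(opl)

print(initial_hidden_neurons(10, 1))  # 3
```

For example, a hypothetical sub-NN with 10 inputs (temperatures plus the fed-back loss) and one output would start trial-and-error sizing at round(√10) = 3 hidden neurons.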

Each sub-NN is trained for one DL condition, representing a specific pattern and severity of degradation. For a given electrical operating point, the power losses inside the module create a temperature distribution. Because the operating point is tagged by the OPL, which is used together with the temperature distribution to train the sub-NN, the output of the matching sub-NN will be closest to the OPL. The other sub-NNs represent other degradation conditions and are therefore trained with different temperature distributions, even if the electrical operating point is the same. As a result, their outputs are likely to differ more from the OPL. Such a difference is present at all operating points, and it is how the proposed method differentiates the patterns and severities of degradation. Making the comparison at more operating points leads to a higher confidence level in the conclusion.

#### 4.4.2 Generalization and extrapolation

Training of each first-stage NN should cover the full operating-point range, and the NNs should collectively cover all possible degradation scenarios. In reality, a finite number of cases is adopted for training. Figure 4.14 shows how the operating point is varied to tune a sub-NN under a fixed health condition; the degradation level is then incremented to complete the training. With the data normalization described above, the symmetry of the module's chip layout can be exploited to reduce the number of scenarios required. Conditions outside the training operating-point range can be extrapolated by the existing sub-NNs. Therefore, despite the large number of combinations in multi-chip-in-parallel modules, applying the method in practical SCADA systems is possible.



Figure 4.14: Training of sub-neural networks with variable operating point

For the two-module experimental system, each module is assigned five degradation levels, as shown in Table 4.1. In total, 25 first-stage sub-NNs for estimating the power loss are established, each covering the whole operating-point range and named as shown in Table 4.2. For instance, DL1-5 (NN5) represents diode 1 at DL1 and diode 2 at DL5.

Table 4.2: Neural network definition matrix based on combinations of device degradation levels

| First-stage sub-NN name | Diode 1 DL 1 | Diode 1 DL 2 | Diode 1 DL 3 | Diode 1 DL 4 | Diode 1 DL 5 |
|-------------------------|--------------|--------------|--------------|--------------|--------------|
| Diode 2 DL 1            | NN1          | NN6          | NN11         | NN16         | NN21         |
| Diode 2 DL 2            | NN2          | NN7          | NN12         | NN17         | NN22         |
| Diode 2 DL 3            | NN3          | NN8          | NN13         | NN18         | NN23         |
| Diode 2 DL 4            | NN4          | NN9          | NN14         | NN19         | NN24         |
| Diode 2 DL 5            | NN5          | NN10         | NN15         | NN20         | NN25         |
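The numbering in Table 4.2 follows a simple rule: columns index the degradation level of diode 1 and rows that of diode 2, so the NN number is 5 × (DL of diode 1 - 1) + DL of diode 2. The two helpers below are hypothetical conveniences for converting between the two forms.

```python
def sub_nn_index(dl_diode1: int, dl_diode2: int, n_levels: int = 5) -> int:
    """Table 4.2 numbering: columns = diode 1 level, rows = diode 2 level."""
    if not (1 <= dl_diode1 <= n_levels and 1 <= dl_diode2 <= n_levels):
        raise ValueError("degradation level out of range")
    return n_levels * (dl_diode1 - 1) + dl_diode2

def sub_nn_levels(index: int, n_levels: int = 5) -> tuple[int, int]:
    """Inverse mapping: NN number -> (diode 1 level, diode 2 level)."""
    d1, d2 = divmod(index - 1, n_levels)
    return d1 + 1, d2 + 1

print(f"DL1-5 -> NN{sub_nn_index(1, 5)}")                       # DL1-5 -> NN5
print(f"NN12 -> DL{'-'.join(map(str, sub_nn_levels(12)))}")     # NN12 -> DL3-2
```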

Figure 4.15 illustrates interpolation between sub-NNs trained for DL a, DL b and DL c, which represent progressively higher levels of degradation. Each line represents the correlation between the temperature distribution ($T_c$-$T_h$ as a vector) and the estimated power loss (EPL) of a given NN. For an operating point, the outputs of all sub-NNs are compared with the operational power loss (OPL). Information on the module's degradation level is embedded in the fact that different sub-NNs associate the same power loss with different temperature distributions. Therefore, in practice, given the operating point (i.e. OPL) and the temperature distribution (i.e. $T_c$-$T_h$), the NN that gives the closest match between them should carry the heaviest weight in signifying the degradation. Provided the sub-NNs, each representing a level of degradation, sufficiently cover the targeted conditions, an actual degradation level that has not previously been learned by the NNs can also be recognised by interpolating from its neighbouring NNs.
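The closest-match idea of Figure 4.15 can be illustrated with a simple heuristic: compute each sub-NN's RMS mismatch between EPL and OPL over several operating points, pick the best one, and place the estimate between it and its better neighbour using inverse-error weights. This only illustrates the interpolation concept for one device with an ordered series of levels; the thesis delegates this decision to the second-stage NN, and the helper names and weighting rule are assumptions.

```python
import math

def rms_mismatch(epl: list[float], opl: list[float]) -> float:
    """RMS deviation between one sub-NN's EPL and the OPL over several operating points."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(epl, opl)) / len(opl))

def interpolated_level(epl_by_level: dict[int, list[float]], opl: list[float]) -> float:
    """Hypothetical level estimate: best-matching level, pulled towards its better neighbour."""
    errs = {lvl: rms_mismatch(e, opl) for lvl, e in epl_by_level.items()}
    best = min(errs, key=errs.get)
    if errs[best] == 0.0:
        return float(best)                      # exact match: no interpolation needed
    neighbours = [lvl for lvl in (best - 1, best + 1) if lvl in errs]
    if not neighbours:
        return float(best)
    nb = min(neighbours, key=errs.get)
    w_best, w_nb = 1.0 / errs[best], 1.0 / errs[nb]   # smaller mismatch -> heavier weight
    return (best * w_best + nb * w_nb) / (w_best + w_nb)

opl = [10.0, 40.0, 80.0]                  # hypothetical OPL at three operating points (W)
epl_by_level = {1: [9.0, 38.0, 76.0],     # hypothetical EPLs of three sub-NNs
                2: [11.0, 42.0, 84.0],
                3: [13.0, 48.0, 92.0]}
print(round(interpolated_level(epl_by_level, opl), 3))  # 1.5
```

Here levels 1 and 2 under- and over-estimate by the same amount, so the estimate falls midway between them.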



Figure 4.15: Illustration of the method by the deviation between EPL and OPL

#### 4.4.3 Second stage NN

Figure 4.16 shows the general process of the proposed condition monitoring method, which consists of two stages of neural networks. The first stage has a set of sub-NNs, e.g. the 25 networks 'NN1' to 'NN25' in Table 4.2. When diagnosing a health condition, all the NNs are activated to generate EPLs from the same input. When a temperature distribution under a specific DL is applied to a sub-NN trained for another DL, the EPL deviates from the OPL, and this deviation increases as the two DLs differ further. This ability to differentiate the power loss is the basis of health state recognition and is demonstrated experimentally in the next section.



Figure 4.16: Block diagram of the condition monitoring method.

The second-stage NN uses this information to identify the module's actual degradation condition, as expanded in Figure 4.17. The power-loss mismatches are compiled as degradation references 'Err 1' to 'Err 25' and used as inputs. The temperature distribution, which contains location information indicating where the uneven degradation happens, and the supply voltage and current ($V_d$, $I_d$), which indicate the actual operating point, are also used as inputs. These inputs are again standardized by Z-score normalization, as they are measurements in different physical domains. Each layer of neurons also applies a sigmoid function. The 25 degradation levels are expressed as a matrix output, as shown in Figure 4.17, and a Softmax layer is employed at the final stage to calculate the classification probabilities. Cross-entropy, a typical loss function for classification networks, is selected [22]. The entire condition monitoring method can thus efficiently identify a multi-chip module's health state in terms of degradation pattern and severity.
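The Softmax output and cross-entropy loss of the second stage can be written out explicitly. The sketch assumes that the 25 output classes follow the Table 4.2 order (class k corresponds to NN(k+1)) and shows one way to obtain per-diode probabilities by summing the joint probabilities, which parallels reading diode 1 and diode 2 from separate classification matrices. Both the ordering and the marginalization are assumptions, not details given in the text.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Class probabilities; the maximum is subtracted for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs: list[float], true_class: int) -> float:
    """Loss for one sample whose correct class index is true_class."""
    return -math.log(probs[true_class])

def to_dl_pair(class_index: int, n_levels: int = 5) -> tuple[int, int]:
    """0-based class index -> (diode 1 DL, diode 2 DL), assuming Table 4.2 ordering."""
    d1, d2 = divmod(class_index, n_levels)
    return d1 + 1, d2 + 1

def per_diode_marginals(probs: list[float], n_levels: int = 5):
    """Sum the joint probabilities over the other diode's level."""
    p1 = [0.0] * n_levels
    p2 = [0.0] * n_levels
    for k, p in enumerate(probs):
        d1, d2 = to_dl_pair(k, n_levels)
        p1[d1 - 1] += p
        p2[d2 - 1] += p
    return p1, p2

logits = [0.0] * 25
logits[12] = 4.0                  # hypothetical network output favouring class 12
probs = softmax(logits)
print(to_dl_pair(12))             # (3, 3)
```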



Figure 4.17: Second stage neural network structure for degradation classification

## 4.5 EXPERIMENTAL RESULTS

#### 4.5.1 Power loss estimation of first stage NNs

The lower-leg diodes of the single-chip modules are connected in parallel and tested with a DC supply. The experiment is repeated for the 25 combinations of degradation levels, DL1-1 to DL5-5. In each case, the total current is stepped from 0 A through 10 A, 20 A, ... to 80 A and then back from 80 A to 0 A. Once the multi-device system reaches an electrical-thermal steady state, the temperature distribution is recorded for 170 seconds. The measurement data with increasing current are used as the training dataset, and those with decreasing current are used for testing. For each DL condition defined in Table 4.2 and emulated as in Figure 4.11, there are 1,360 data sets over the whole current range for training or testing.
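The split between training (increasing current) and testing (decreasing current) data can be sketched as follows. With one sample per second and 170 s per step, the quoted 1,360 samples per ramp are reproduced if eight non-zero current steps (10 A to 80 A) are counted in each direction; this bookkeeping is an assumption made to match the stated total, and the function name is hypothetical.

```python
SAMPLES_PER_S = 1          # one temperature sample per second
RECORD_S = 170             # steady-state recording time per current step
STEPS_A = [10, 20, 30, 40, 50, 60, 70, 80]  # assumption: the 0 A step contributes no samples

def sweep_samples(steps_a):
    """Hypothetical (current_A, direction) labels for one up-then-down current sweep."""
    n = RECORD_S * SAMPLES_PER_S
    up = [(i, "up") for i in steps_a for _ in range(n)]
    down = [(i, "down") for i in reversed(steps_a) for _ in range(n)]
    return up + down

samples = sweep_samples(STEPS_A)
train = [s for s in samples if s[1] == "up"]    # increasing current -> training set
test = [s for s in samples if s[1] == "down"]   # decreasing current -> testing set
print(len(train), len(test))  # 1360 1360
```

Splitting by ramp direction, rather than shuffling all samples together, means the test data come from a separate pass through each current level.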

Training is performed on a PC with an i7-6820HQ processor and a GTX-1080 GPU. The training algorithm is Levenberg-Marquardt, with a $1\times10^{-3}$ learning rate and a maximum of 300 epochs with data shuffling. The neural networks for power loss estimation at all degradation combinations are trained over the whole current range, and the corresponding testing results are presented in this section.
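For reference, one Levenberg-Marquardt iteration solves $(J^\top J + \mu I)\,\delta = -J^\top r$, where $J$ is the Jacobian of the model outputs with respect to the parameters and $r$ is the residual vector, and it adapts the damping $\mu$ depending on whether the cost decreases. In many LM implementations the $1\times10^{-3}$ setting is the initial damping factor $\mu$ rather than a learning rate; reading the thesis setting this way is our assumption. To stay self-contained, the sketch fits a two-parameter first-order step response (a typical heatsink temperature rise) instead of a neural network, decreasing $\mu$ tenfold after an accepted step and increasing it tenfold (capped) after a rejected one.

```python
import math

def model(t: float, a: float, b: float) -> float:
    """First-order step response a * (1 - exp(-b * t)), e.g. a temperature rise in K."""
    return a * (1.0 - math.exp(-b * t))

def lm_fit(ts, ys, a, b, mu=1e-3, iters=200):
    """Levenberg-Marquardt fit of (a, b); the 2x2 normal equations are solved in closed form."""
    def cost(a_, b_):
        try:
            return sum((model(t, a_, b_) - y) ** 2 for t, y in zip(ts, ys))
        except OverflowError:          # a wild trial step; treat it as infinitely bad
            return math.inf

    c = cost(a, b)
    for _ in range(iters):
        ja = [1.0 - math.exp(-b * t) for t in ts]       # d(model)/da
        jb = [a * t * math.exp(-b * t) for t in ts]     # d(model)/db
        r = [model(t, a, b) - y for t, y in zip(ts, ys)]
        g_a = sum(x * e for x, e in zip(ja, r))         # J^T r
        g_b = sum(x * e for x, e in zip(jb, r))
        h_aa = sum(x * x for x in ja) + mu              # J^T J + mu I
        h_bb = sum(x * x for x in jb) + mu
        h_ab = sum(x * z for x, z in zip(ja, jb))
        det = h_aa * h_bb - h_ab * h_ab
        da = (-g_a * h_bb + g_b * h_ab) / det
        db = (g_a * h_ab - g_b * h_aa) / det
        c_new = cost(a + da, b + db)
        if c_new < c:            # accept: behave more like Gauss-Newton
            a, b, c = a + da, b + db, c_new
            mu *= 0.1
        else:                    # reject: behave more like gradient descent
            mu = min(mu * 10.0, 1e10)
    return a, b

ts = [float(t) for t in range(0, 201, 10)]
ys = [model(t, 30.0, 0.05) for t in ts]          # synthetic, noise-free data
a_fit, b_fit = lm_fit(ts, ys, a=20.0, b=0.03)
print(round(a_fit, 3), round(b_fit, 4))
```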

The results of sub-NNs corresponding to DL1-1 (NN1), DL1-3 (NN3) and DL1-5 (NN5) at 10A, 40A and 80A total current respectively, are shown in Figure 4.18. The trained first stage sub-neural networks can track the total power loss at different degradation levels and different operating points.



Figure 4.18: Power loss estimation results by sub-neural networks with different degradation levels and operating points

When the DL1-3 temperature-distribution dataset is applied to other sub-NNs that are not trained for this degradation condition, the power loss estimation error is much larger, as shown in Figure 4.19 for a total current of 40 A.



Figure 4.19: Power loss estimation from different sub-neural networks with degradation level DL1-3 and 40 A total current

An NN trained for a more severe level of degradation tends to over-estimate the power loss, because it assumes that only a reduced portion of the total power loss is dissipated downwards to the heatsink and therefore attributes the observed temperature distribution to a larger total loss. This is also shown by the histograms in Figure 4.20 of estimation errors from 5 NNs for 3 different DLs over the whole current range.
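The over-estimation mechanism can be illustrated with a deliberately crude lumped model: suppose the heatsink temperature rise equals the downward fraction of the total loss times a fixed thermal resistance. A sub-NN that has learned a more degraded condition effectively assumes a smaller downward fraction, so inverting the same measured rise yields a larger total loss. The model and all numbers are hypothetical and only illustrate the sign of the error.

```python
def inferred_total_loss(delta_t_k: float, r_th_k_per_w: float, downward_fraction: float) -> float:
    """Invert the toy model delta_T = downward_fraction * P_total * R_th for P_total."""
    return delta_t_k / (downward_fraction * r_th_k_per_w)

R_TH = 0.1          # K/W, hypothetical lumped heatsink path resistance
P_TRUE = 100.0      # W, hypothetical true total loss
F_TRUE = 0.9        # fraction actually flowing down to the heatsink (hypothetical)
F_ASSUMED = 0.8     # smaller fraction implicitly assumed by a more-degraded sub-NN (hypothetical)

delta_t = F_TRUE * P_TRUE * R_TH                        # "measured" heatsink temperature rise
p_est = inferred_total_loss(delta_t, R_TH, F_ASSUMED)
print(round(p_est, 1))  # 112.5
```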



Figure 4.20: Estimation error histograms for 5 sub-neural networks with 3 degradation levels: DL1-1, DL1-3, and DL1-5

#### 4.5.2 Degradation level monitoring by second stage NN

Once all of the 25 first stage sub-neural networks have been trained, the multi-device system's degradation levels can be classified by the second stage network, using the power loss estimation errors from the 25 sub-NNs.

The case with only one diode aged is demonstrated first. The raw measurement data under DL3-1 over the whole operating range are shown in Figure 4.21(a). The electrical measurements are subject to noise, and the heatsink temperatures also fluctuate due to the chiller cycle. The condition monitoring neural network tolerates these disturbances: only 6 out of 1,360 samples (over 1,400 s) misidentify the health condition, as shown in Figure 4.21(b).



Figure 4.21: (a) Measurements under DL 3-1 and (b) diode 1 degradation level monitoring results

The classification results of the second-stage NN are illustrated by the confusion matrix in Figure 4.22 (a), for the cases where diode 1 is degraded to five different levels while diode 2 remains healthy (DL1-1, DL2-1, ..., DL5-1). Each row of the 5×5 matrix represents the instances of a predicted degradation level, and each column represents the instances of an actual degradation level set in the experiment. Green elements are true predictions and red elements are false, and each element also shows the number of testing samples that fall into that case. In the bottom row, the percentage is the 'recall' for each DL, i.e. the number of samples correctly recognized by the NN divided by the total number of samples with that DL in the test data, which indicates to what extent each actual DL can be recognized correctly [105]. Similarly, in the right column, the 'precision' is the number of samples correctly recognized as a given DL divided by the total number of samples predicted as that DL, indicating to what extent each DL prediction can be trusted. The overall 'accuracy' for all five DLs is 99.6%. The corresponding results for the cases where only diode 2 has degraded also show high accuracy, as shown in Figure 4.22 (b).
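The recall, precision and accuracy definitions used with the confusion matrices can be made concrete as below, following the convention of Figure 4.22 (rows are predicted levels, columns are actual levels). The 3×3 matrix in the example contains made-up counts for illustration only, not results from the thesis.

```python
def confusion_metrics(cm):
    """cm[i][j] = number of samples predicted as level i whose actual level is j."""
    n = len(cm)
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # samples per actual level
    row_totals = [sum(row) for row in cm]                              # samples per predicted level
    recall = [cm[j][j] / col_totals[j] if col_totals[j] else 0.0 for j in range(n)]
    precision = [cm[i][i] / row_totals[i] if row_totals[i] else 0.0 for i in range(n)]
    accuracy = sum(cm[i][i] for i in range(n)) / sum(row_totals)
    return recall, precision, accuracy

cm = [[98, 1, 0],     # hypothetical counts; rows = predicted, columns = actual
      [2, 99, 0],
      [0, 0, 100]]
recall, precision, accuracy = confusion_metrics(cm)
print(recall, accuracy)  # [0.98, 0.99, 1.0] 0.99
```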

The classification results for cases where the solder layers of both parallel devices have degraded are shown in Figure 4.23. Twenty-five degradation levels (DL1-1, DL1-2, ..., DL5-5) are demonstrated by combining the health conditions of the two diodes. For each combined condition, the dataset includes measurements over the entire current range. The degradation level of this multi-device system is recognized by extracting the degradation condition of diode 1 from the first classification matrix and that of diode 2 from the second. For each degradation level, the success rate of pattern recognition is higher than 99%. Although interference and measurement errors cause some misclassifications, the accuracy is high enough to monitor uneven solder layer degradation in a multi-device system in practice.



Figure 4.22: Recognition results for single device aged conditions: (a) diode 1 degraded and (b) diode 2 degraded



Figure 4.23: Recognition results for conditions with both devices aged: (a) diode 1 classification and (b) diode 2 classification

#### 4.5.3 Untrained conditions

The above analysis considered cases where the patterns and severities of degradation were all included in the training dataset. The condition monitoring method can also diagnose untrained conditions within the trained range. To demonstrate this, the first-stage networks are trained for only 16 degradation cases, arbitrarily excluding the states in which either diode 1 or diode 2 is at DL3; that is, DL1-3, DL2-3, ..., DL5-3 and DL3-1, DL3-2, ..., DL3-5 are excluded from training. Consequently, the degradation recognition network contains only DL1, DL2, DL4 and DL5 as trained outputs in the confusion matrices of Figure 4.24(a) and (b) for diode 1 and diode 2, respectively. The testing datasets, which contain all cases including DL3 for either diode, are then applied to the two-stage NN. All degradation levels other than DL3 were recognized with a recall above 98%, as shown in the confusion matrices of Figure 4.24 (c) and (d).



Figure 4.24: Recognition of an intermediate degradation level: neural network training results for (a) diode 1 and (b) diode 2, and degradation recognition results for (c) diode 1 and (d) diode 2

Although DL3 cannot be explicitly classified in these outputs because it has not been defined as an output class, the transitional state can be inferred from how the network outputs are distributed over the other degradation levels. Once enough data have been collected, such a DL can be detected from the statistical distribution. Because degradation develops gradually, the outputs along the column of target degradation level 3 in Figure 4.24 (c) and (d) gather at DL2 and DL4, indicating that the input corresponds to a state between these two levels. This interpolation approach could extend the method to diagnose untrained conditions.
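One simple statistic for the transitional-state reasoning above is the mean predicted level of the samples whose true level is the untrained DL3, together with the fraction of them assigned to the two bracketing levels. The counts in the example are hypothetical, and this estimator is an illustrative assumption rather than the procedure used in the thesis.

```python
def mean_predicted_level(counts: dict[int, int]) -> float:
    """Average predicted level over samples whose true level is untrained."""
    total = sum(counts.values())
    return sum(level * n for level, n in counts.items()) / total

def bracketing_fraction(counts: dict[int, int], lower: int, upper: int) -> float:
    """Share of samples assigned to the two trained levels on either side of the gap."""
    return (counts.get(lower, 0) + counts.get(upper, 0)) / sum(counts.values())

counts_dl3 = {1: 0, 2: 150, 4: 120, 5: 2}   # hypothetical predictions for true-DL3 samples
print(round(mean_predicted_level(counts_dl3), 2),
      round(bracketing_fraction(counts_dl3, 2, 4), 3))  # 2.9 0.993
```

A mean between 2 and 4 with almost all samples on DL2 or DL4 is consistent with an intermediate level; a single batch of samples is not conclusive on its own.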

## 4.6 SUMMARY

The main objective of this study is to address the extra complexity and reduced effectiveness of condition monitoring caused by uneven solder degradation of multiple parallel chips packaged in the same power module. The proposed method is based on external electrical and temperature measurements. The NN training uses a supervised approach, which requires explicit degradation labels, and a temperature sensor is placed for each power semiconductor chip. Nevertheless, in-factory training is considered feasible for converters in the same fleet of target wind turbine systems, and such sensors can readily be designed into the power modules of most wind turbine converters. As the mapping between the electrical operating point and the temperature distribution deviates from that of a healthy module, the readings of temperature sensors placed at different positions can be used to recognize the pattern of degradation, and their rate of change can in theory provide information on severity.

This chapter proposes a potential method for detecting uneven solder layer degradation in a multi-chip-in-parallel system by heat-flux-based condition monitoring using a two-stage neural network. According to electro-thermal modelling, a degraded device tends to operate at a higher junction temperature than a normal device, which accelerates its ageing due to increased thermomechanical stress. In the two-stage neural network, the first-stage sub-NNs are trained to represent the mapping between the electrical operating point and the external temperature distribution for a series of degradation conditions; their output mismatches with respect to the training target are then extracted and applied to the second-stage network to derive the health condition of the multi-device system. The approach relies on the dependence of a module's external temperature distribution on its internal health condition, which is shown to be stronger than the change in device electrical characteristics (particularly steady-state characteristics) at the module terminals. The detection rate of this condition monitoring method was found experimentally to be higher than 98%, with a resolution of about 10% change in thermal resistance per step. It is hoped that the method can be developed for field deployment in large-scale power converters such as those used in offshore wind turbines.

# 5 ADVANCED MEASURES FOR CONDITION MONITORING OF MULTI-CHIP MODULES

## 5.1 INTRODUCTION

The condition monitoring method proposed in Chapter 4 is based on external electrical and temperature measurements. To improve the feasibility of implementing such a method in practical applications, some additional measures need to be considered. In Chapter 4, the NN training uses a supervised approach, which requires explicit degradation labels. This adds to the difficulty of implementation and network training, although in-factory training is considered possible for converters in the same fleet of target systems. Degradation labelling is emulated by attaching thermal pads to the module baseplate, and this chapter validates that this has an effect equivalent to real solder degradation. On the other hand, the one-layer network of Chapter 4 may not be adequate for the complex operating conditions in a real converter, as presented in Section 5.4. Such conditions, including more complex combinations of operating points and cooling conditions and the transient responses during rapid wind-speed variations, require different network structures to extract deeper information, and these are investigated in this chapter.

The study in Chapter 4 placed a temperature sensor for each power semiconductor chip. Such sensors can readily be designed into the power modules of most wind turbine converters and EV drives. As the mapping between the electrical operating point and the temperature distribution deviates from that of a healthy module, the readings of temperature sensors at different positions could be used to recognize the pattern of degradation, and their rate of change can in theory provide information on severity. However, in some high-power turbines and grid applications, such as MMC-HVDC, a power module may consist of tens of parallel chips. In such cases, to improve field-deployment feasibility, optical temperature distribution sensing using FBGs (fibre Bragg gratings) may be an appropriate choice, as 20 temperature sensors can be integrated in one FBG fibre. This chapter therefore deals with the extra complexity and requirements of field-deploying the proposed condition monitoring method in terms of data labelling, temperature sensing and network architecture.

## 5.2 DATA LABELLING FOR UNEVEN DEGRADATION LEVEL

Chapter 4 uses two single-chip power modules in parallel as DUTs to mimic the multi-chip power module introduced in Chapter 3, with the tests conducted under DC conditions. In this section, an inverter system is built with two single-chip modules in parallel. Together with degradation emulation by thermal pad attachment, the system serves as an uneven degradation level labelling platform that generates data for NN training for field deployment. The platform is first validated to show that it achieves the same electrical and thermal responses as the multi-chip power module.

#### 5.2.1 Data labelling platform

With two parallel-connected single-chip power modules whose AC terminals are connected to an inductive load, a half-bridge inverter system is established, as shown in Figure 5.1. These two modules are treated as a single multi-chip power module in further tests. A DC power supply provides a 200 V DC link, and the load current is measured by a current probe and controlled by a dSPACE system, as shown in Figure 5.1 (b). The operating conditions of this test rig are summarized in Table 5.1.




Figure 5.1: The half-bridge inverter test, (a) circuit diagram and (b) test rig

Table 5.1: Operating conditions

| Parameters          | Value      |
|---------------------|------------|
| DC-link voltage     | 200 V      |
| Output current      | Programmed |
| Switching frequency | 2550 Hz    |
| Output frequency    | 50 Hz      |

These two modules are bonded to the water-cooled heatsink used in Figure 4.10, and the initial coolant temperature is controlled by a chiller. The temperatures beneath the centre of each chip on the top and bottom surfaces of the water-cooled heatsink are measured by K-type thermocouples and acquired by an NI cRIO 9213 at a rate of 1 sample/s, as shown in Figure 5.2; this setup has more temperature sensing points on the heatsink than the setup in the previous chapter. At the same time, the electrical operating point is recorded by the dSPACE system at the same sampling rate.



Figure 5.2: Temperature measurement positions on heatsink

Solder degradation is emulated experimentally by attaching thermal pads to the baseplate surface to increase the thermal resistance beneath parts of Module 1. Five layers of high thermal conductivity (TC) thermal pad (4 W/(m·K)), with a maximum thickness of 2.5 mm, are placed on the bottom (outer) surface of the baseplate. The region of the high-TC pads directly beneath the lower-leg diode of Module 1 is cut away and filled with a variable number of layers of a low-TC (1.5 W/(m·K)) thermal pad of the same thickness. The thermal resistance associated with the lower-leg diode is consequently increased, as if its solder layer had aged into a degradation state controlled by the combination of high- and low-TC pad layers. The diode's equivalent thermal resistances for the healthy state (DL 0) and five degradation levels (DL 1 to DL 5) are extracted from FEA simulation and listed in Table 5.2. The thermal pad setup for the DL 5 case is shown in Figure 5.3.



Figure 5.3: Thermal pad setup for DL 5

Under DL 0, the voltage and current of the inductor and the current through Module 1 are shown in Figure 5.4, where the RMS current is controlled at 40A and the current sharing between the modules can be identified.

| Degradation<br>level | Number of<br>low TC pads | Junction to case<br>thermal<br>resistance (K/W) | Normalized<br>thermal<br>resistance |
|----------------------|--------------------------|-------------------------------------------------|-------------------------------------|
| DL 0                 | 0                        | 0.946                                           | 1 (initial state)                   |
| DL 1                 | 1                        | 1.052                                           | 1.112                               |
| DL 2                 | 2                        | 1.158                                           | 1.224                               |
| DL 3                 | 3                        | 1.264                                           | 1.336                               |
| DL 4                 | 4                        | 1.370                                           | 1.448                               |
| DL 5                 | 5                        | 1.476                                           | 1.560                               |

Table 5.2: Thermal resistance at different degradation levels



Figure 5.4: Output of the test rig under 200V/40A

Two healthy unencapsulated modules, painted black with PNM high-temperature paint as shown in Figure 5.5, are first used in this test rig to investigate the temperature distribution inside the multi-chip-in-parallel system. With a 40 A load current, the diodes are hotter than the IGBTs, and thermal coupling within each half-bridge section is observed with a FLIR A310 thermal camera, as shown in Figure 5.6. Because the extra thermal pads added to one module also increase the thermal resistance of the IGBTs in that module, the corresponding IGBT temperature is also higher than in the other module. In the following investigation, this increase in IGBT thermal resistance is not considered a fault. The focus of condition monitoring remains on the diodes, such as those in the module shown on the left of Figure 5.6. These diodes are each in parallel with the corresponding diodes in the module shown on the right. In the experiments, the different numbers of thermal pad layers appended to the left module define the DLs.



Figure 5.5: The layout of the uncased power module



Figure 5.6: Temperature distribution measured by thermal camera

#### 5.2.2 Electrical equivalence

This section evaluates the electrical equivalence between the proposed uneven degradation level labelling vehicle and the multi-chip power module. The diode forward characteristic and reverse recovery losses of the multi-chip module are shown in Figure 5.7. The zero temperature coefficient (ZTC) point is at a current of about 570 A. Since device ratings are usually selected with a safety margin of 1.5 to 2 times in practical design, this module would carry at most 600 A rms, which means the diode works mostly in the negative temperature coefficient (NTC) region. Assuming the module operates under a constant total current, a diode with a degraded solder layer becomes hotter and then carries a larger share of the current, because in the NTC region its forward voltage drop decreases as its temperature rises. Compared with the other, healthy diodes, the higher power loss and higher thermal resistance of this diode tend to increase its junction temperature further. Furthermore, according to the reverse recovery loss characteristic, the aged diode also has a higher switching loss than the others.

For the single-chip module in the vehicle, the diode has a ZTC point at about 58 A, close to its rated current, as shown in Figure 5.8 together with the forward characteristic and reverse recovery energy at 25°C and 125°C. This commercial Si PiN diode thus has a high ZTC point, meaning that the diodes mostly work in the NTC region as the test current rises towards the rated point. As in the multi-chip module, a solder-degraded device will have a higher junction temperature, so the diode conducts a larger current when working in the NTC region. Meanwhile, high current and high temperature extend the reverse recovery time and increase the aged diode's power losses, accelerating its degradation, which is the same as under the multi-chip operating conditions analysed above.
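The current-sharing argument can be checked with a linearized diode model $V_F = V_0(T) + r(T)\,I$, in which the knee voltage $V_0$ falls and the slope resistance $r$ rises with temperature. Below the ZTC current the knee-voltage effect dominates, so a hotter diode takes more current; above it the resistive effect dominates and the hotter diode takes less. All parameter values are hypothetical and are not taken from the SKM50GB12T4 datasheet or the multi-chip module.

```python
V0_25 = 0.90   # V, knee voltage at 25 degC (hypothetical)
K_V = -0.002   # V/K, knee-voltage temperature coefficient (hypothetical, negative)
R_25 = 0.012   # ohm, slope resistance at 25 degC (hypothetical)
K_R = 0.003    # 1/K, relative temperature coefficient of the slope resistance (hypothetical)

def diode_params(t_j: float) -> tuple[float, float]:
    """Knee voltage and slope resistance at junction temperature t_j (degC)."""
    return V0_25 + K_V * (t_j - 25.0), R_25 * (1.0 + K_R * (t_j - 25.0))

def ztc_current() -> float:
    """Current at which dV_F/dT = K_V + R_25 * K_R * I = 0."""
    return -K_V / (R_25 * K_R)

def diode1_current(i_total: float, t_j1: float, t_j2: float) -> float:
    """Current in diode 1 of a parallel pair, from V0_1 + r1*I1 = V0_2 + r2*(I - I1)."""
    v1, r1 = diode_params(t_j1)
    v2, r2 = diode_params(t_j2)
    return (v2 - v1 + r2 * i_total) / (r1 + r2)

print(round(ztc_current(), 1))                      # 55.6
print(round(diode1_current(40.0, 100.0, 80.0), 1))  # 20.9  (hotter diode takes more than half)
print(round(diode1_current(150.0, 100.0, 80.0), 1)) # 74.5  (hotter diode takes less than half)
```

In this toy model the crossover occurs when the per-diode current equals the ZTC current, which is why the operating region relative to the ZTC point matters for the equivalence argument.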



Figure 5.7: (a) Forward characteristic and (b) reverse recovery energy of the multi-chip power module



Figure 5.8: (a) The diode forward characteristic and (b) reverse recovery energy in different temperature

#### 5.2.3 Thermal equivalence

To validate the feasibility of using thermal pads to mimic solder degradation, an FEA model is established in this section to compare the electro-thermal behaviour of uneven degradation caused by real solder degradation with that caused by thermal pad attachment in the vehicle. The meshed FEA model of the single-chip module SKM50GB12T4 without encapsulation (Figure 5.5) is built in COMSOL Multiphysics, as shown in Figure 5.9, with the material parameters listed in Table 5.3 [84]. The chips are set as heat sources defined by a programmable power density. The baseplate's bottom surface is modelled as bonded to a water-cooled heatsink with an inlet temperature of 20°C to emulate the wind turbine cooling condition [106]; the other surfaces are defined as air-cooled.



Figure 5.9: The FEA model of the single-chip module

| Parameter                                             | Chip (Si) | Die attach (SAC305) | DBC (Cu) | DBC (Al<sub>2</sub>O<sub>3</sub>) | Baseplate solder | Baseplate (Cu) |
|-------------------------------------------------------|-----------|---------------------|----------|-----------------------------------|------------------|----------------|
| Length (mm), x-direction                              | 7.2/5.6   | 7.2/5.6             | 28.8     | 29.8                              | 28.8             | 91.5           |
| Width (mm), y-direction                               | 6.8/5.6   | 6.8/5.6             | 26       | 27                                | 26               | 31.5           |
| Thickness (mm), z-direction                           | 0.18      | 0.1                 | 0.3      | 0.38                              | 0.1              | 3              |
| Thermal conductivity (W/(m·K))                        | 130       | 50                  | 400      | 35                                | 50               | 400            |
| Coefficient of thermal expansion (10<sup>-6</sup>/K)  | 3         | 23                  | 17       | 6.5                               | 23               | 17             |
| Thermal capacity (J/(kg·K))                           | 700       | 150                 | 385      | 730                               | 150              | 385            |

Table 5.3: Packaging material parameters for FEA modelling

In the simulation model, the module's operating condition is the same as the experimental condition in Section 5.2.1. The module is assumed to operate in an inverter with an inductive load, with an output frequency of 50 Hz, a peak current of 40 A and a switching frequency of 2550 Hz. Based on the characteristics extracted from the datasheet, the average power losses of each IGBT and diode chip are 88 W and 56 W, respectively.
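As a rough illustration of how such average losses are typically obtained from datasheet characteristics, the sketch below uses a widely used analytical approximation for a sinusoidal-PWM inverter leg: on-state curves linearised as V = V0 + r·i, and datasheet switching energies scaled linearly with current and DC-link voltage. This is not necessarily the procedure used in this thesis; the modulation index m, the power factor, the DC-link voltage and all device parameters in the example call are hypothetical placeholders, not SKM50GB12T4 datasheet values.

```python
import math


def igbt_diode_losses(i_pk, v_dc, m, pf, f_sw,
                      v_ce0, r_ce, v_f0, r_f,
                      e_ts, e_rr, i_ref, v_ref):
    """Average losses (W) of one IGBT and its anti-parallel diode in an SPWM leg.

    On-state curves are linearised as V = V0 + r*i; e_ts (E_on + E_off) and e_rr
    are datasheet switching energies (J) measured at i_ref (A) and v_ref (V).
    """
    # Conduction losses averaged over one fundamental period
    p_cond_igbt = ((1 / (2 * math.pi) + m * pf / 8) * v_ce0 * i_pk
                   + (1 / 8 + m * pf / (3 * math.pi)) * r_ce * i_pk ** 2)
    p_cond_diode = ((1 / (2 * math.pi) - m * pf / 8) * v_f0 * i_pk
                    + (1 / 8 - m * pf / (3 * math.pi)) * r_f * i_pk ** 2)
    # Each device switches only in its conducting half-cycle; the mean switched
    # current over a full period is i_pk / pi, hence the 1/pi factor
    scale = (i_pk / math.pi) / i_ref * (v_dc / v_ref)
    p_sw_igbt = f_sw * e_ts * scale
    p_sw_diode = f_sw * e_rr * scale
    return p_cond_igbt + p_sw_igbt, p_cond_diode + p_sw_diode


# Hypothetical parameters for illustration only (not taken from a datasheet)
p_igbt, p_diode = igbt_diode_losses(i_pk=40.0, v_dc=600.0, m=0.8, pf=0.9, f_sw=2550.0,
                                    v_ce0=0.8, r_ce=0.02, v_f0=0.9, r_f=0.015,
                                    e_ts=6e-3, e_rr=2e-3, i_ref=50.0, v_ref=600.0)
```

With identical on-state models and a positive power factor, the IGBT carries the larger share of the conduction loss, which is why the diode term carries the minus sign.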

The heat dissipated through the power module's surfaces can be illustrated by the outflow heat flux on the top and bottom surfaces, as shown in Figure 5.10. With air convection on the open surfaces and the encapsulation removed in the FEA model, the heat flux on the top surface of the chips is about  $1.3 \times 10^3$  W/m<sup>2</sup>, compared with  $5.5 \times 10^3$  W/m<sup>2</sup> on the bottom surface of the baseplate. This implies that heat dissipation through the top surface cannot be ignored when calculating the thermal resistance, and that this proportion increases with solder degradation, which agrees with the results in Chapter 4.



Figure 5.10: Outflow heat flux on (a) top surface and (b) bottom surface of the power module

Compared with the baseplate solder, the die-attach solder layer is less reliable because it is much closer to the semiconductor chips and therefore experiences higher thermomechanical stresses. Still focusing on the diode, its ageing process is simulated as cracks developing inside the die-attach from the corners towards the centre, which increases the thermal resistance. The diode solder degradation level (DL) is defined as the ratio of the remaining area of the solder layer to its original area, as shown in Figure 5.11. The remaining solder is assumed to be circular, which is the shape usually observed in CT or SAM scans of degraded devices [5].
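A first-order feel for how the remaining solder area affects the die-attach conduction resistance can be obtained from the one-dimensional relation R = L/(λA). The minimal sketch below uses the die-attach values from Table 5.3 and assumes that the 5.6 mm × 5.6 mm dimensions refer to the diode. It ignores heat spreading and constriction in the adjacent layers, which the FEA model captures, so it is only an illustration. It also shows that a circular island that fits inside the square die can retain at most π/4 (about 78.5 %) of the original area.

```python
import math

# Diode die-attach (SAC305) values from Table 5.3; assumes the diode die is 5.6 mm x 5.6 mm
THICKNESS_M = 0.1e-3           # solder thickness L (m)
CONDUCTIVITY = 50.0            # thermal conductivity (W/(m K))
DIE_SIDE_M = 5.6e-3            # die edge length (m)
DIE_AREA_M2 = DIE_SIDE_M ** 2  # original solder area (m^2)


def die_attach_resistance(remaining_ratio):
    """1-D conduction resistance L / (k * A_remaining) in K/W (no heat spreading)."""
    if not 0.0 < remaining_ratio <= 1.0:
        raise ValueError("remaining_ratio must be in (0, 1]")
    return THICKNESS_M / (CONDUCTIVITY * remaining_ratio * DIE_AREA_M2)


def island_radius(remaining_ratio):
    """Radius (m) of a circular solder island holding the remaining area."""
    radius = math.sqrt(remaining_ratio * DIE_AREA_M2 / math.pi)
    if 2.0 * radius > DIE_SIDE_M:
        raise ValueError("a circle this large does not fit inside the square die")
    return radius


for ratio in (0.785, 0.6, 0.4):
    print(f"remaining {ratio:.3f}: R = {die_attach_resistance(ratio):.4f} K/W, "
          f"r = {island_radius(ratio) * 1e3:.2f} mm")
```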



Figure 5.11: The ageing process of die-attach in FEA model

The equivalence between the two methods, reduction of the solder pad area and insertion of thermal pads, can be established from the simulation results. The temperature distribution on the diode chip surface is simulated in the FEA model. As shown in Figure 5.12, the centre of the chip with thermal pads is slightly hotter, but the temperature at the edges is similar to that of the module with a reduced solder pad area. Since the semiconductor cells are evenly distributed over the chip, the average chip temperature is the relevant measure, and the diode chip's average temperature remains similar for both methods under different degradation levels, as shown in Figure 5.13.



Figure 5.12: The temperature distribution on diode surface with  $1.2R_{th}$ , (a) practical crack modelled and (b) with a thermal pad.



Figure 5.13: Average temperature on the chip surface under different DLs

The ratio of the heat dissipated through the top surface to the total power losses under different degradation levels is shown in Figure 5.14. The two degradation emulation methods achieve the same heat transfer characteristic. In addition, the heat transferred through the top surface increases as the downward thermal resistance increases.
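The trend in Figure 5.14 can be reasoned about with a simplified steady-state network in which the chip node is linked to ambient air through a top-path resistance R_top and to the coolant through a bottom-path resistance R_bottom. This lumped two-path model is an assumption for illustration, not the FEA model; apart from the 56 W diode loss quoted above, the resistance and temperature values are hypothetical. Solving the energy balance at the chip node shows that raising R_bottom (solder degradation) lifts the chip temperature and diverts a larger share of the loss through the top.

```python
def top_heat_fraction(p_loss, r_top, r_bottom, t_amb, t_coolant):
    """Share of p_loss (W) leaving through the top path of a two-path network.

    The chip node at T_j connects to ambient air via r_top (K/W) and to the
    coolant via r_bottom (K/W). Energy balance at the node:
        p_loss = (T_j - t_amb) / r_top + (T_j - t_coolant) / r_bottom
    """
    g_top, g_bottom = 1.0 / r_top, 1.0 / r_bottom
    t_j = (p_loss + t_amb * g_top + t_coolant * g_bottom) / (g_top + g_bottom)
    return (t_j - t_amb) * g_top / p_loss


# Hypothetical values: 56 W diode loss, weak air path on top, coolant path below
for r_bottom in (0.50, 0.60, 0.75, 1.00):
    share = top_heat_fraction(56.0, 20.0, r_bottom, t_amb=25.0, t_coolant=20.0)
    print(f"R_bottom = {r_bottom:.2f} K/W -> top share = {share:.3%}")
```

When ambient and coolant temperatures are equal, the expression reduces to the current-divider form R_bottom / (R_top + R_bottom).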



Figure 5.14: The ratio of the heat dissipating through top surface under different DLs

## 5.3 FBG FOR TEMPERATURE MEASUREMENT

As analysed above, online measurement of the power module's temperature distribution can be considered a key indicator for effective condition monitoring. However, conventional sensing techniques such as thermocouples face challenges due to the large device geometry, the need for multi-point measurement and the electromagnetic interference (EMI)-rich operating environment [107]. This section explores the potential of FBG array sensors to monitor the health state of the power module through the external temperature distribution on the baseplate, and proposes a novel measurement solution for the condition monitoring of multi-chip power modules.

### 5.3.1 FBG sensor integration and calibration

The proposed FBG thermal sensor is customised to the single-chip power module's dimensions and layout to measure the distributed temperature on the baseplate. To this end, an FBG array sensor with four FBG heads is imprinted in a single-mode optical fibre with a polyimide coating, which withstands high temperatures (up to  $\approx 300$  °C).

Figure 5.15 illustrates the FBG array sensor architecture; it consists of four 5 mm long FBG heads with an average bandwidth of  $\approx 0.3$  nm and a reflectivity of  $\approx 80$  %. The sensing heads are distributed along the fibre according to the positions of the semiconductor chips. The spacing is designed so that each sensing point (FBG head) in the array is located vertically beneath a target chip, as shown in Figure 5.15(b). The FBG head locations therefore mirror the individual chip positions and are named accordingly: FBG-IG1 (for IGBT1), FBG-D1 (for diode1), FBG-IG2 (for IGBT2) and FBG-D2 (for diode2).



Figure 5.15: (a) FBG thermal sensor design and (b) FBG array sensor instrumentation in the DUT

To eliminate mechanical excitation (strain) effects on the Bragg wavelength shift of the array FBGs, which could introduce errors into the thermal measurements owing to the inherent thermomechanical cross-sensitivity of FBGs, the fibre section containing the FBG heads is loosely packaged within a brass capillary. The remainder of the optical cable is Teflon-tubed for protection. The brass capillary is integrated into the module's copper baseplate, both to allow precise placement of the array sensor's sensing heads and to protect the sensor during installation. The chosen capillary has a 0.3 mm inner diameter and a 0.5 mm outer diameter. This provides a sufficient air gap to house the sensing fibre in its bore and a fine wall thickness of about 0.1 mm to enhance the sensor's thermal response [45].

The procedure of integrating the FBG array thermal sensor into the power module was conducted in the following steps:

First, a groove with a 0.6 × 0.6 mm cross-section was machined in the baseplate along its axial centreline, beneath the known locations of the module chips where thermal hotspots are expected; at one end of the groove, an exit curvature was made, as shown in Figure 5.16. This allows the packaging capillary to be embedded in the baseplate without interfering with the module mounting points and enables the sensing fibre to interface effectively with an external interrogator.

The packaging brass capillary was then cut to the length of the machined baseplate groove plus a few additional millimetres, to allow insertion of the sensing fibre and to join the brass and Teflon packaging into a single protective structure. Once prepared, the capillary was fitted into the groove using a thermally conductive adhesive (acrylic thermal adhesive).

Finally, the FBG sensing array was carefully inserted into the capillary in the desired position. Heat shrink tubing was used to bond the brass and Teflon packaging.



Figure 5.16: Grooved power module

The described process enables practical installation of a packaged FBG array sensor in a power module. It also keeps the array sensor accessible for in-situ replacement or recalibration, and allows the sensing points to be repositioned for greater accuracy or to cover additional locations of interest.

The FBG array thermal sensor was thermally calibrated in a thermal chamber to obtain the wavelength shift-temperature fit curve for each FBG head. The FBG fabrication, installation on the module and thermal calibration were performed by my colleagues Dr Anees Mohammed and Dr Sinisa Durovic from the University of Manchester. The FBG-instrumented module was placed inside the thermal chamber and exposed to static thermal excitation levels from 25 °C to 95 °C in 10 °C increments. The FBGs' wavelength shifts were recorded at each applied temperature level, with the thermal chamber readings as the temperature reference. Figure 5.17 shows the obtained calibration test data. The average temperature sensitivity of the FBG array sensor, calculated from the slopes of the linear fits to the calibration data, was  $\approx 10.9$ pm/°C.
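The per-head calibration amounts to a least-squares straight line between chamber temperature and Bragg wavelength, which is then inverted to read temperature during operation. The minimal sketch below shows this with NumPy on synthetic calibration points at the same 25 °C to 95 °C steps; the 1550 nm grating wavelength and the noise level are hypothetical, and only the ≈10.9 pm/°C sensitivity comes from the text.

```python
import numpy as np


def fit_fbg_calibration(temps_c, wavelengths_nm):
    """Least-squares line lambda = lambda0 + S*T for one FBG head.

    Returns (S in pm/°C, lambda0 in nm).
    """
    slope_nm_per_c, intercept_nm = np.polyfit(temps_c, wavelengths_nm, 1)
    return slope_nm_per_c * 1e3, intercept_nm


def wavelength_to_temperature(wavelength_nm, sensitivity_pm, intercept_nm):
    """Invert the calibration line to convert a Bragg wavelength reading to °C."""
    return (np.asarray(wavelength_nm) - intercept_nm) / (sensitivity_pm * 1e-3)


# Synthetic calibration points: chamber steps 25-95 °C in 10 °C increments,
# with a hypothetical 1550 nm grating and small interrogator noise
rng = np.random.default_rng(1)
temps = np.arange(25.0, 96.0, 10.0)
wavelengths = 1550.0 + 10.9e-3 * temps + rng.normal(0.0, 1e-3, temps.size)
sensitivity, lambda0 = fit_fbg_calibration(temps, wavelengths)
```

Fitting each head separately, rather than applying the array-average sensitivity, keeps head-to-head differences in grating response out of the temperature readings.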



Figure 5.17: FBG array thermal calibration

### 5.3.2 Temperature measurement results

This section reports the results of the experimental study undertaken to evaluate the proposed scheme for power module thermal condition monitoring. To examine the performance of the FBG-based thermal sensing system, the FBG-instrumented module is placed in the Module 1 position of the inverter test rig shown in Figure 1, while Module 2 remains a standard (uninstrumented) module. At the same time, a set of thermocouples (TCs) measures four points on the heatsink top surface, as shown in Figure 5.2.

The two types of temperature measurement are evaluated and compared under static and dynamic thermal states. For the static thermal conditions, tests at several constant load current levels were examined. Each static load current test starts from ambient temperature and lasts until the module reaches thermal equilibrium. The dynamic thermal condition tests are performed with a variable load current profile that replicates the load variability encountered by field inverters operating under a typical wind speed profile.

The results are presented and discussed below. First, the performance of the FBG thermal sensing system is explored and cross-correlated with conventional TC measurements under static thermal conditions. The FBG in-situ thermal measurements are then validated against the TC measurements using a thermal network model that allows their direct correlation. Finally, the response of the FBG sensing system under a typical wind speed load profile is presented.

The static tests were undertaken on the test rig at five load currents uniformly distributed across the nominal range: 10 A, 20 A, 30 A, 40 A and 50 A. For illustration, Figure 5.18 shows the temperature measurements obtained by the FBG array and TCs in the half-bridge inverter test rig under low and high load currents (10 A and 50 A).





Figure 5.18: FBG and TC temperature measurements under static conditions of load current (a) 10 A and (b) 50 A

The first 60 s of data in Figure 5.18 are measured with the test inverter de-energised and the heatsink temperature controlled at  $\approx$  18 °C by the water-cooled chiller. As the TCs are fitted to the heatsink, their readings are close to the coolant temperature of  $\approx$  18 °C. The FBGs embedded within the baseplate measure a temperature  $\approx$  0.6 °C higher than the coolant. This is because the FBGs are not in direct contact with the heatsink, and the test room ambient temperature of  $\approx$  25 °C creates a temperature gradient from the external ambient down to the cooling system.

The converter is energised at 60 s, and the thermal responses of both FBGs and TCs are then observed during the step transient and the steady-state stage. As expected, owing to their location and effective integration into the module structure, the FBGs record higher temperatures than the TCs. The temperature difference between FBGs and TCs increases with operating stress ( $\approx 1.5$ °C at 10 A and  $\approx 6$ °C at 50 A), which indicates that the FBGs capture more thermal information from the module than the conventional TCs attached to the heatsink. The differences between individual FBG readings in the steady-state and step transient conditions are caused by the heatsink temperature gradient along the water-cooling flow path. The general fluctuation in the thermal recordings is caused by the limited cooling capability and thermal inertia of the chiller system used. The inverter is stopped at 500 s, resulting in a relatively rapid temperature decrease whose dynamics are determined by the thermal inertia of the whole system, including the heatsink and the cooling chiller.

The TC measurements cannot be used directly to validate the FBG thermal measurements, owing to an inherent installation limitation: the TCs are fitted on the top of the heatsink rather than in the baseplate. This part therefore presents a validation method for the FBG thermal measurements based on a thermal network model of the power module and the TC temperature measurements.

Figure 5.19 shows the thermal network schematic of the power module, including the thermal pad and the water-cooled heatsink. The power loss is dissipated from the semiconductors (top of the module) to the heatsink and then to the ambient, forming a thermal path with the following elements:  $R_{th,j-c}$, the junction-to-case thermal resistance of the power module;  $R_{th,TIM}$, the thermal resistance of the thermal pad acting as the thermal interface material (TIM); and  $R_{th,hs}$, the thermal resistance of the heatsink. The FBG, TC and coolant temperature measurement points are also marked in the diagram. The thermal model shows that the thermal resistance of the thermal pad,  $R_{th,TIM}$, is the root cause of the temperature difference between the FBG and TC measurements. From the thermal model,  $R_{th,TIM}$  can therefore be derived from the power losses of the power devices and the temperature gradient between the FBG (baseplate temperature) and the TCs (heatsink top temperature):

$$R_{th,TIM} = \frac{T_{base} - T_{hs,top}}{P_{loss}}$$
(5.1)

The FBG measurements can then be validated by comparing the thermal resistance derived from the thermal model using (5.1) with its nominal value, deduced from the material properties and geometry of the thermal pad. Based on the material properties and geometry, the nominal thermal resistance  $R_{th,TIM}$  can be expressed as:

$$R_{th,TIM} = \frac{L}{A\lambda}$$
(5.2)

where  $L$  is the thickness of the thermal pad,  $A$  is its area and  $\lambda$  is the thermal conductivity of the TIM material. The parameters of the applied thermal pad are given in Table 5.4. Using (5.2) and Table 5.4, a nominal thermal resistance  $R_{th,TIM}$  of  $\approx$  0.1956 °C/W is obtained.



Figure 5.19: Test power module thermal network

Table 5.4: Thermal pad parameters

| Property | Value |
|---|---|
| Thickness, $L$ (mm) | 0.5 |
| Area, $A$ (mm<sup>2</sup>) | 34 × 94 |
| Thermal conductivity, $\lambda$ (W/(m·K)) | 4 |

To determine  $R_{th,TIM}$  analytically from the thermal model using (5.1), the power loss transferred as heat through the system must be calculated. Here, the power losses are calculated using the method in [108] with the test voltage and current measurements. The temperature-dependent on-state and switching losses are extracted from the datasheet and imported into an electrothermal model, with the coolant temperature as the ambient reference, to derive the total power losses of each semiconductor. The calculated power losses for the 10 A and 50 A steps are shown in Figure 5.20, with the same legend as Figure 5.18.  $R_{th,TIM}$  was obtained from the calculated power loss and the temperatures measured by the FBGs and the corresponding heatsink TCs during the steady-state period (350 s to 500 s in Figure 5.18) of the 10 A and 50 A tests. Figure 5.21 shows the determined  $R_{th,TIM}$  together with its nominal value calculated from (5.2) for all sensing locations. The analytically determined  $R_{th,TIM}$  agrees with its nominal value, thus validating the FBG array sensor temperature measurements.
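Evaluating (5.1) over the steady-state window can be sketched as below. This is a minimal illustration assuming uniformly sampled records and assuming that P_loss is the loss conducted through the pad beneath the sensing location; the argument names and the synthetic 20 W / 0.2 °C/W record are hypothetical, not measured data.

```python
import numpy as np


def r_th_tim(t_s, t_base_c, t_hs_top_c, p_loss_w, window=(350.0, 500.0)):
    """Eq. (5.1) over a steady-state window, in °C/W.

    Uses the ratio of window means, mean(T_base - T_hs,top) / mean(P_loss),
    which is less sensitive to sample noise than averaging point-wise ratios.
    """
    t_s = np.asarray(t_s, dtype=float)
    in_window = (t_s >= window[0]) & (t_s <= window[1])
    if not in_window.any():
        raise ValueError("no samples inside the steady-state window")
    t_base = np.asarray(t_base_c, dtype=float)[in_window]
    t_hs = np.asarray(t_hs_top_c, dtype=float)[in_window]
    p = np.asarray(p_loss_w, dtype=float)[in_window]
    return float((t_base - t_hs).mean() / p.mean())


# Synthetic 1 Hz record: hypothetical 20 W loss through a 0.2 °C/W pad from 60 s
t = np.arange(0.0, 600.0, 1.0)
p = np.where(t >= 60.0, 20.0, 0.0)
t_hs = np.full_like(t, 25.0)
t_base = t_hs + 0.2 * p
```

Restricting the average to the steady-state window avoids the step transient, during which the stored heat in the baseplate and heatsink makes (5.1) invalid.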



Figure 5.20: Power losses calculation under (a) 10A and (b) 50A total current step-change conditions



Figure 5.21: Validation of temperature measurement based on the thermal resistance of TIM *R<sub>th,TIM</sub>* under (a) 10A and (b) 50A total current conditions

This part examines the dynamic response of the FBG array thermal sensor under transient thermal conditions. A dynamic load profile representative of wind speed variation was applied to the test inverter by appropriately controlling the inverter load current. Figure 5.22 shows the thermal measurements recorded under this typical wind speed profile, which comprises nine distinct load changes over the examined 650 s period, as detailed in Table 5.5.

Table 5.5: Representative wind-profile load condition

| Time (s)  | Current (A) | Time (s)  | Current (A) |
|-----------|-------------|-----------|-------------|
| 0 - 50    | 0           | 340 - 400 | 39.58       |
| 50 - 170  | 40.15       | 400 - 460 | 25.21       |
| 170 - 200 | 26.54       | 460 - 550 | 32.23       |
| 200 - 250 | 0           | 550 - 580 | 18.32       |
| 250 - 340 | 32.15       | 580 - 650 | 0           |



Figure 5.22: Thermal measurement under dynamic condition

The FBG thermal measurements show a higher sensitivity to dynamic thermal variation than the TC measurements. The FBGs registered the instantaneous change between steps, whereas the TC measurements were smooth and less detailed; the FBGs also recorded larger thermal variations. For instance, the FBG sensors recorded a temperature rise of  $\approx 14$  °C in the 1<sup>st</sup> step (current change from 0 A to 40.15 A) and a temperature drop of  $\approx 9$  °C in the 3<sup>rd</sup> step change (current change from 26.54 A to 0 A), whereas the corresponding TC measurements show a  $\approx 6$  °C rise and a  $\approx 3$  °C drop. This is because the FBG array, which needs only one groove and one fibre terminal, can be readily installed in the module baseplate, whereas the TCs are attached to the heatsink; as indicated in Figure 5.19, this difference in installation position causes the difference in measured temperature. Figure 5.22 also demonstrates an important functional feature of FBGs, namely EMI immunity: while the TC recordings are noisy and contain considerable temperature spikes, the FBG measurements are clean and unaffected by EMI.
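To illustrate why a sensor with a longer thermal path records smaller and smoother excursions, the sketch below drives two first-order lags with the Table 5.5 current profile. Treating each sensor location as a first-order system whose steady-state rise is proportional to I², and the gains and time constants used, are simplifying assumptions for illustration only; they are not fitted to Figure 5.22.

```python
import numpy as np

# Table 5.5 load profile: (start s, end s, load current A)
PROFILE = [(0, 50, 0.0), (50, 170, 40.15), (170, 200, 26.54), (200, 250, 0.0),
           (250, 340, 32.15), (340, 400, 39.58), (400, 460, 25.21),
           (460, 550, 32.23), (550, 580, 18.32), (580, 650, 0.0)]


def current_at(t_s):
    """Load current (A) of the profile at time t_s; 0 A outside it."""
    for start, end, amps in PROFILE:
        if start <= t_s < end:
            return amps
    return 0.0


def first_order_lag(t_s, target, tau_s):
    """Forward-Euler solution of dy/dt = (target - y) / tau, starting from y = 0.

    Stable (and bounded by max(target)) for time steps no larger than tau_s.
    """
    y = np.zeros_like(target)
    for k in range(1, len(t_s)):
        dt = t_s[k] - t_s[k - 1]
        y[k] = y[k - 1] + dt / tau_s * (target[k - 1] - y[k - 1])
    return y


t = np.arange(0.0, 650.0, 1.0)
i_load = np.array([current_at(x) for x in t])
rise_ss = 0.01 * i_load ** 2                                 # hypothetical rise, K per A^2
near_sensor = first_order_lag(t, rise_ss, tau_s=20.0)        # hypothetical short path
far_sensor = first_order_lag(t, 0.5 * rise_ss, tau_s=60.0)   # hypothetical longer path
```

The longer time constant both smooths the step edges and prevents the far response from settling within the shorter steps, which reduces the recorded excursions in the same qualitative way as the TC traces.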

## 5.4 DEEP LEARNING FOR CONDITION MONITORING

In this section, the algorithm and NN structure of the condition monitoring method in Chapter 4 are improved to handle two field-deployment challenges: the choice of indicators to represent degradation, and generalization to complex operating conditions. An optimized algorithm is proposed to avoid the difficulty of power loss calculation, and various combinations of deep learning neural networks (DNNs) are evaluated in terms of training speed and accuracy of health state recognition under a variable load current profile.

### 5.4.1 Improved condition monitoring algorithm

The condition monitoring method proposed in Chapter 4 relies on the power loss of the multi-chip power module to identify changes in its health state. However, calculating this power loss can be difficult in practical wind turbines, because the multi-physics environment of the turbine nacelle is too complicated for the converter system's thermal network to be extracted. Thus, in this section, the indicator passed between the first- and second-stage NNs, i.e. the power loss in the original method, is replaced by a set of electrical parameter (EP) measurements consisting of the DC-link voltage  $V_{DC}$, load voltage  $V_L$, load current  $I_L$, power factor PF and switching frequency  $f_s$, which are more accessible in practice. These EPs are the inputs required for power loss calculation by the electrothermal modelling method, and are thus used to represent the power converter's electrothermal characteristics at different health states.

Similar to the method in Chapter 4, the improved condition monitoring method consists of two stages, as shown in Figure 5.23. The NN is designed to learn rapidly the intrinsic electrothermal behaviour of the power modules under very complex combinations of operating points and environmental conditions. Training on experimental data can be completed within a few hours, whereas establishing complete look-up tables covering the whole operating range by conventional electrothermal modelling could take months, and the accuracy of such a simulation-based solution is also a concern. Based on the test rig in Section 5.2.1, and treating the paralleled modules as one multi-chip module, an artificial neural network (ANN) is first established to evaluate the thermal behaviour of this multi-chip system and obtain the estimated electrical parameters (EEPs). Because the system's thermal network is fixed at each solder degradation level, the proportions of heat dissipated towards the top and bottom directions are also fixed in each scenario. The temperature measurements therefore reflect the heat flux through the heatsink, from which the electrical parameters of the multi-chip system can be derived. The measured electrical parameters (MEPs) are recorded, and their deviation from the EEPs indicates how far the module has departed from its initial healthy state.



Figure 5.23: Processing diagram of the condition monitoring method

### 5.4.2 NN structure

In the test rig described in Section 5.2.1, there are 16 temperature measurement points for the power modules, i.e. four on the heatsink top surface and four on the bottom surface for each module. In addition, the water temperatures at the inlet and outlet, the ambient temperature and the cold junction compensation (CJC) temperature of the thermocouple dock are logged in real time at a sampling rate of 1 sample per second. To diagnose the health state of the multi-chip converter system, the temperature information is fed into the pre-trained NN, whose output EEPs are compared with the MEPs. In total, six first-stage NNs are trained, corresponding to the degradation levels indicated in Table 5.2. The errors between these two kinds of electrical parameters are caused by solder degradation blocking heat-flux transport, and indicate how severe the degradation has become. These errors are fed into a classification network in the second stage, which matches the current state to the most similar of the trained reference degradation levels.

The basic neural network structure for the improved method consists of single-hidden-layer networks, as shown in Figure 5.24. The first-stage network estimates the converter's electrothermal characteristics, expressed as the EPs. The differences between the estimated and measured electrical parameters are used as the input of the second-stage pattern recognition network, which classifies the degradation level into one of the labels DL 0 to DL 5.
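To make the data flow of the two stages concrete, the sketch below replaces both networks with deliberately simple stand-ins: a linear least-squares map from temperatures to EPs fitted on healthy (DL 0) data for the first stage, and a nearest-centroid classifier on the relative EEP-MEP deviations for the second stage. The plant model, channel counts, degradation effect and noise levels are hypothetical synthetic assumptions; the sketch only illustrates the residual-based structure described above and is not the single-hidden-layer networks of Figure 5.24.

```python
import numpy as np

rng = np.random.default_rng(0)
N_TEMPS, N_EPS = 8, 3   # hypothetical: 8 temperature channels, 3 electrical parameters
DELTA = 0.05            # hypothetical fractional change in thermal gain per DL step
W = rng.uniform(0.5, 1.5, size=(N_TEMPS, N_EPS))  # hypothetical healthy EP -> temperature map


def simulate(dl, n):
    """Synthetic (temperatures, measured EPs) at degradation level dl.

    Hypothetical linear plant: degradation diverts heat away from the monitored
    heatsink points, so the same EPs produce lower readings.
    """
    eps = rng.uniform(1.0, 2.0, size=(n, N_EPS))
    temps = eps @ (W * (1.0 - DELTA * dl)).T + rng.normal(0.0, 0.002, size=(n, N_TEMPS))
    return temps, eps


# Stage 1 (linear stand-in for the first-stage NN): temperatures -> EEPs, fitted on DL 0 data
temps_h, eps_h = simulate(0, 500)
M, *_ = np.linalg.lstsq(temps_h, eps_h, rcond=None)


def deviation(temps, eps_measured):
    """Relative deviation of the estimated (EEP) from the measured (MEP) parameters."""
    return (temps @ M) / eps_measured - 1.0


# Stage 2 (nearest-centroid stand-in for the classification network)
LEVELS = np.arange(6)   # DL 0 .. DL 5
centroids = np.array([deviation(*simulate(dl, 200)).mean(axis=0) for dl in LEVELS])


def classify(temps, eps_measured):
    """Assign each sample to the reference level with the closest mean deviation."""
    d = deviation(temps, eps_measured)
    dist = np.linalg.norm(d[:, None, :] - centroids[None, :, :], axis=2)
    return LEVELS[dist.argmin(axis=1)]
```

In this toy plant the healthy-trained first stage underestimates the EPs of degraded data, so the deviations become increasingly negative with DL, which is the feature the second stage separates.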



Figure 5.24: The basic two-stage NN for the improved condition monitoring method

The performance of the basic NN can be improved by adding hidden layers to each stage network, forming deep learning architectures that can capture more complex combinations of operating points, cooling conditions and transient responses during rapid wind speed variations. The DNN structure for each stage is shown in Figure 5.25 [109].



Figure 5.25: The DNN structure in (a) first stage and (b) second stage

### 5.4.3 Condition monitoring results

For training purposes, each leg's electrical stress is derived from the electrical parameters under various operating conditions. The NNs learn the intrinsic electrothermal behaviour of the power modules from data measured at six consecutive steady-state steps with load RMS currents of 10 A, 20 A, ..., 60 A. The trained networks are then tested on data from a 15-min transient condition in which the load current changes once per minute. Because the thermal time constant of the power module is less than 1 s, the module itself can be considered to be in steady state within each step of this 15-min transient condition. The power loss distribution of each leg in the healthy condition (DL 0) is shown in Figure 5.26; the power loss of the lower-leg diode is comparatively higher owing to degradation, as analysed from the thermal camera imaging results in Figure 5.6.



Figure 5.26: Power loss distribution under (a) steady-state and (b) transient conditions

Module 1 is then emulated at the different degradation levels defined in Table 5.2. Ageing of the solder layer changes the thermal network of the power module, which causes the first-stage NN trained on the healthy state to underestimate the power losses when fed with the temperature distribution of a degraded condition. The estimated power losses and the power losses actually generated in the devices can be derived from the first-stage NN's estimated electrical parameters and from the measured electrical parameters, respectively. The second-stage NN is therefore employed to extract features of the degradation conditions from the deviations between the estimated and measured electrical parameters.

The combinations of basic NN and DNN in the first and second stages are established in MATLAB Deep Learning Toolbox. The operating conditions are the same as in Figure 5.26. The combined NNs are trained on a PC with a 4-core Intel Core i7-6820HQ CPU and 32 GB RAM, and the training speed and mean accuracy are shown in Table 5.6. The combination of two DNNs has the highest accuracy. As the confusion matrices in Figure 5.27 show, the basic NN achieves an accuracy of only 60.8% on the transient conditions, whereas the deep learning architecture raises the accuracy above 95%, showing its potential for field deployment.

| 1<sup>st</sup> stage | 2<sup>nd</sup> stage | Training time (s) | Epochs | Accuracy (%) |
|---|---|---|---|---|
| Basic | Basic | 53 | 52 | 60.8 |
| Basic | DNN | 1304 | 93 | 78.9 |
| DNN | Basic | 818 | 76 | 78.9 |
| DNN | DNN | 1735 | 113 | 98.8 |

Table 5.6: Training speed and accuracy of different combinations of NN architecture

Rows: tested output degradation level; columns: pre-set target degradation level.

(a) Basic NN, overall accuracy 60.8%

| Output / Target | DL 0 | DL 1 | DL 2 | DL 3 | DL 4 | DL 5 |
|---|---|---|---|---|---|---|
| DL 0 | 537 | 1055 | 1 | 0 | 0 | 0 |
| DL 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| DL 2 | 1 | 0 | 245 | 0 | 1 | 11 |
| DL 3 | 0 | 0 | 771 | 1062 | 10 | 0 |
| DL 4 | 0 | 0 | 11 | 28 | 727 | 0 |
| DL 5 | 0 | 0 | 12 | 0 | 455 | 1083 |

(b) Deep learning architecture, overall accuracy 98.8%

| Output / Target | DL 0 | DL 1 | DL 2 | DL 3 | DL 4 | DL 5 |
|---|---|---|---|---|---|---|
| DL 0 | 537 | 0 | 1 | 0 | 0 | 0 |
| DL 1 | 0 | 1055 | 0 | 0 | 0 | 0 |
| DL 2 | 1 | 0 | 1016 | 0 | 1 | 11 |
| DL 3 | 0 | 0 | 11 | 1062 | 10 | 0 |
| DL 4 | 0 | 0 | 12 | 28 | 1182 | 0 |
| DL 5 | 0 | 0 | 0 | 0 | 0 | 1083 |

Figure 5.27: Classification results of (a) basic NN and (b) deep learning architecture
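The overall accuracies quoted for Figure 5.27 can be checked directly from its classification counts (rows: tested output level; columns: pre-set target level), as read from the figure. The sketch below computes overall accuracy as the trace over the total and per-target recall column by column. It also makes visible that the basic NN classifies every DL 1 sample as DL 0, which leaves its DL 1 output row empty (undefined precision).

```python
import numpy as np

# Counts from Figure 5.27: rows = tested output DL 0..5, columns = pre-set target DL 0..5
BASIC = np.array([
    [537, 1055, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 245, 0, 1, 11],
    [0, 0, 771, 1062, 10, 0],
    [0, 0, 11, 28, 727, 0],
    [0, 0, 12, 0, 455, 1083],
])
DEEP = np.array([
    [537, 0, 1, 0, 0, 0],
    [0, 1055, 0, 0, 0, 0],
    [1, 0, 1016, 0, 1, 11],
    [0, 0, 11, 1062, 10, 0],
    [0, 0, 12, 28, 1182, 0],
    [0, 0, 0, 0, 0, 1083],
])


def overall_accuracy(cm):
    """Correctly classified samples (diagonal) over all samples."""
    return np.trace(cm) / cm.sum()


def recall_per_target(cm):
    """Share of each pre-set target level classified correctly (column-wise)."""
    return np.diag(cm) / cm.sum(axis=0)


for name, cm in (("basic NN", BASIC), ("deep architecture", DEEP)):
    print(name, f"{overall_accuracy(cm):.1%}", np.round(recall_per_target(cm), 3))
```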

## 5.5 SUMMARY

In this chapter, three optimizations, in the data labelling process, temperature sensing and network architecture, are presented to improve the feasibility of the condition monitoring method for field deployment. With the inverter test rig established in this thesis, an equivalent emulation of uneven degradation is used to generate labelled data for network training. The thermal and electrical behaviour of this data-labelling vehicle is validated to have the same characteristics as practical uneven solder degradation in a multi-chip converter system. The chapter then reports a study of fibre-optic FBG distributed thermal sensing for wind turbine power converter thermal monitoring. The sensor design, implementation and calibration principles are presented, and an extensive experimental study is undertaken to evaluate and validate the proposed sensing scheme in steady-state and transient conditions. The resulting thermal monitoring system has considerably lower wiring and installation complexity than thermocouples, and offers full EMI immunity and easy in-situ multi-point sensing with the sensor array. This chapter also presents and evaluates deep learning neural networks as effective tools for degradation level recognition in multi-chip converter systems. With the improved monitoring algorithm using electrical parameters as state indicators, the basic neural network is not accurate enough to identify the degradation levels under fast-changing load conditions, whereas the two-stage DNN achieves an overall accuracy of more than 95%. Based on these results, it is feasible to advance the condition monitoring of multi-chip converter systems towards field deployment in wind turbines.

# 6 CONDITION MONITORING OF WIND TURBINE CONVERTERS BASED ON SCADA DATA

#### 6.1 INTRODUCTION

Wind turbine (WT) manufacturers are adopting new power semiconductors and converter techniques, and the high maintenance cost demands field-deployable condition monitoring (CM) methods that cover the full life cycle. In addition to the slow ageing process in the wear-out phase of the bathtub curve, early faults in the "infant mortality" phase also cause a high failure rate due to component defects, immature design or improper assembly [29]. However, existing CM methods based on thermal networks or TSEPs (temperature-sensitive electrical parameters) [2] are mostly designed to detect the long-term gradual ageing process and rely on degradation mechanisms learned from extensive model-based testing and large numbers of samples. Multi-physics modelling is expensive and easily distorted [58], and hence difficult to apply online. To address these challenges, this chapter proposes a data-driven solution that detects the transition of WT converters from healthy to faulty operation, based on the limited SCADA data available in early-stage operation.

This chapter presents a framework for designing an online-deployable CM system for WT converters. An unsupervised learning DNN is proposed to distinguish healthy and faulty states without the need to label health states. Based on an analysis of the operating-data distribution, the cost function is customised to attenuate the effect of unbalanced data on model generalization. An online learning process updates the DNN with newly logged real-time data. The method is demonstrated on early-stage fault detection, and the same updating process allows it to be extended to long-term, full-life-cycle CM.

#### 6.2 SCADA DATA BASED CONDITION MONITORING

The proposed framework consists of a DNN modelling process and a condition monitoring process, as shown in Figure 6.1. Using the SCADA data from the healthy operating period, the modelling process trains a DNN to represent the healthy characteristics of the WT converter. In the CM process, each newly logged real-time data point is input into the trained model, whose prediction error indicates to what extent the WT converter has deviated from the healthy state. The individual steps are described below.



Figure 6.1: Condition monitoring framework

#### 6.2.1 Data pre-processing

The WT SCADA system in the case study contains dozens of physical channels of measured or calculated variables in the aerodynamic, electrical and thermal domains. For confidentiality reasons, the exact wind farm, wind turbine and power converter type cannot be provided, and the channels are not identified here. The raw SCADA data is logged as a time sequence at 10-min intervals. Channels that stored no data and logged only zero values are removed from the modelling process, leaving n channels. Observations during shutdown periods are also removed to focus on operating behaviour. The data of each channel is then standardised individually.
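The pre-processing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation (which used MATLAB): the log is assumed to be an array of shape (samples, channels), and shutdown periods are assumed, hypothetically, to be identifiable as records in which a chosen power channel `power_col` is at or below zero. Standardisation is taken to mean zero mean and unit variance per channel.

```python
import numpy as np

def preprocess_scada(raw, power_col):
    """Sketch of the Section 6.2.1 pre-processing.

    raw       : array (N, channels) of 10-min SCADA records
    power_col : index of a power channel; the rule 'power <= 0 means
                shutdown' is a hypothetical assumption for illustration
    """
    raw = np.asarray(raw, dtype=float)
    # 1) drop channels that stored no data (logged only zeros)
    keep = np.any(raw != 0.0, axis=0)
    if not keep[power_col]:
        raise ValueError("the chosen power channel contains only zeros")
    # position of the power channel after the column removal
    new_power_col = int(np.count_nonzero(keep[:power_col]))
    data = raw[:, keep]
    # 2) drop observations from shutdown periods (assumed: power <= 0)
    data = data[data[:, new_power_col] > 0.0]
    # 3) standardise each remaining channel individually
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant channels
    return (data - mean) / std, keep, mean, std
```

Returning `mean` and `std` allows newly logged data to be standardised with the statistics of the healthy training period rather than its own.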

#### 6.2.2 Network model

The network is expected to differentiate the faulty state of the converter from the healthy state through the variation of thermal and electrical measurements as the failure develops. The thermal response depends on the converter's electrical operating point and cooling condition, both of which can be influenced by a converter fault. A regression neural network is therefore proposed to model the thermal response, correlating the temperature measurements with the remaining SCADA channels.

In this chapter, two common network architectures, shown in Figure 6.2, are selected as benchmarks; the number below each layer indicates the number of neurons in that layer. The first is a fully connected neural network (FCNN) with three hidden layers to extract the complex thermal response. It uses the ReLU (rectified linear unit) activation function and the Adam optimizer [110]. The second is a long short-term memory (LSTM) network, which can remember and transfer information about the features of time sequences and therefore has the potential to capture both aerodynamic inertia and thermal capacitance effects. Based on the system-level thermal time constant, the input data is resampled with a 12-hour rolling window (72 SCADA samples at 10-min intervals) into 2-D sequences. A sequence-to-one LSTM is then built to predict the converter temperature from the information in the last LSTM unit.
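The 12-hour rolling window can be illustrated with a short NumPy sketch, assuming NumPy 1.20 or later for `sliding_window_view`. The function name `make_sequences` is hypothetical. Each 72 × n sequence is paired with the temperature at its last time step, giving a sequence-to-one target. The sketch assumes contiguous samples; because shutdown records are removed, real windows may span gaps, which is the discontinuity later blamed for the LSTM's weaker generalization.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WINDOW = 72  # 12 h / 10 min = 72 SCADA samples

def make_sequences(X, temp, window=WINDOW):
    """Turn (N, n) inputs into (N - window + 1, window, n) sequences.

    Sequence k covers samples k .. k + window - 1 and is paired with the
    temperature at its last sample (sequence-to-one target).
    """
    X = np.asarray(X, dtype=float)
    seqs = sliding_window_view(X, window, axis=0)  # (N-window+1, n, window), read-only view
    seqs = np.moveaxis(seqs, -1, 1)                 # (N-window+1, window, n)
    targets = np.asarray(temp, dtype=float)[window - 1:]
    return seqs, targets
```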



Figure 6.2: The network structure of (a) FCNN and (b) LSTM

#### 6.2.3 Fault detection

Once the converter develops a fault, the network no longer represents its thermal characteristics accurately, so the prediction accuracy carries signatures that can be used for fault detection. The accuracy A and the squared-error accuracy SA of a given data point are calculated as

$$A = (1 - (y - t)/t) \times 100\%, SA = (1 - (y - t)^2/t^2) \times 100\%$$
(6.1)

where y is the network prediction and t the actual observation. Because model training removes the dependency on the operating point, the distribution of A over a healthy period is statistically close to a normal distribution, as presented in Section 6.4. Provided the converter remains healthy, the A values of the training and testing datasets should follow the same distribution, which can be checked with a two-sample t-test [111]. Rejection of this test indicates that the pre-trained network can no longer accurately represent the converter state, i.e., the converter is now undergoing degradation.
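Equation (6.1) and the two-sample t-test can be sketched as below. This is an illustration only: it assumes `y` and `t` are temperatures in physical units (de-standardised and non-zero, since (6.1) divides by t), and it uses SciPy's `ttest_ind` with `equal_var=True`, i.e. the classic pooled-variance test; the thesis itself reports results from MATLAB.

```python
import numpy as np
from scipy import stats

def accuracy(y, t):
    """Eq. (6.1): per-point accuracy A and squared-error accuracy SA, in %."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    rel = (y - t) / t
    return (1.0 - rel) * 100.0, (1.0 - rel**2) * 100.0

def same_distribution(A_train, A_test, alpha=0.05):
    """Two-sample t-test (pooled variance) on the accuracy A.

    Returns (p_value, healthy); healthy is False when the null hypothesis
    of equal means is rejected at significance level alpha.
    """
    res = stats.ttest_ind(A_train, A_test, equal_var=True)
    return res.pvalue, bool(res.pvalue >= alpha)
```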

As power module degradation is a monotonically increasing process, the network prediction accuracy is expected to keep decreasing until the final breakdown. For fault detection sensitivity, the $2\sigma$ region of the distribution of *A* is selected as the threshold for early fault alarm. Meanwhile, because *SA* is based on the squared error, deviations in either direction reduce it, so it provides a unidirectional judgement; a seven-day moving average is applied to *SA* to detect a potential monotonic trend with a better signal-to-noise ratio. The smoothed *SA* (*SSA*) is finally used as the converter health indicator.
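A minimal sketch of the *SSA* indicator follows. It assumes uniformly spaced 10-min samples, so the seven-day window is 7 × 144 = 1008 samples; after shutdown records are removed this is only an approximation of a calendar week. The alarm threshold is passed in as a parameter, because the exact mapping from the $2\sigma$ region of *A* to a threshold on *SSA* is not spelled out here; any concrete value is an assumption.

```python
import numpy as np

SAMPLES_PER_DAY = 144          # 10-min logging
WINDOW = 7 * SAMPLES_PER_DAY   # seven-day moving average

def smoothed_sa(sa, window=WINDOW):
    """Trailing moving average of SA; NaN until a full window is available."""
    sa = np.asarray(sa, dtype=float)
    csum = np.cumsum(np.insert(sa, 0, 0.0))
    ssa = np.full(sa.shape, np.nan)
    ssa[window - 1:] = (csum[window:] - csum[:-window]) / window
    return ssa

def fault_alarm(ssa, threshold):
    """Flag samples whose smoothed accuracy has dropped below the threshold."""
    return np.asarray(ssa) < threshold
```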

#### 6.2.4 CM with online learning

To deploy the method for real-time monitoring, it is necessary to establish how much training data the network needs to represent the healthy WT converter adequately. The criterion is to keep collecting training data from WT commissioning until the prediction over a 30-day rolling window reaches the same accuracy level as that achieved in the model training process. The overall performance of the network prediction on a training or testing dataset can be estimated by the coefficient of determination, $R^2$, defined as

$$R^{2} = 1 - \frac{\sum_{i=1}^{size} \left( y_{i} - t_{i} \right)^{2}}{size \times Var(t)}$$
(6.2)

where Var(t) is the variance of the actual measurements and *size* is the number of points in the training or testing dataset. The generalization capability of the trained network is estimated by the consistency *C* as

$$C = \frac{R_{tst}^2}{R_{trn}^2} \times \left(1 - \frac{\left|\sigma_{err,tst} - \sigma_{err,trn}\right|}{\sigma_{err,trn}}\right)$$
(6.3)

where $R^{2}_{tst}$ (or $R^{2}_{trn}$) denotes the overall testing (or training) performance and $\sigma_{err,tst}$ (or $\sigma_{err,trn}$) is the standard deviation of the prediction error on the testing (or training) dataset. The indicator *C* can also help to select a suitable DNN model for different amounts of training data, as presented in Section 6.4.
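Equations (6.2) and (6.3) translate directly into NumPy, as in the sketch below. It uses the population variance (`np.var` with its default `ddof=0`) so that $size \times Var(t)$ equals the total sum of squares; the function names are illustrative.

```python
import numpy as np

def r_squared(y, t):
    """Eq. (6.2): R^2 = 1 - sum((y - t)^2) / (size * Var(t))."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return 1.0 - np.sum((y - t) ** 2) / (t.size * np.var(t))

def consistency(y_trn, t_trn, y_tst, t_tst):
    """Eq. (6.3): generalisation consistency C of a trained network."""
    r2_trn = r_squared(y_trn, t_trn)
    r2_tst = r_squared(y_tst, t_tst)
    s_trn = np.std(np.asarray(y_trn, float) - np.asarray(t_trn, float))
    s_tst = np.std(np.asarray(y_tst, float) - np.asarray(t_tst, float))
    return (r2_tst / r2_trn) * (1.0 - abs(s_tst - s_trn) / s_trn)
```

C equals 1 when the testing error has the same $R^2$ and spread as the training error, and it falls as either the fit degrades or the error spread changes.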

The online learning method is introduced to achieve real-time condition monitoring. First, the model is trained on an adequate amount of SCADA data and predicts the converter thermal performance over the following 30 days, which serve as testing data. Each time a new SCADA record is logged (every 10 min), the oldest data in the original testing dataset is moved to the training dataset, and the model is re-trained at the end of the 30-day testing period. The updated model then predicts the next 30-day rolling period, and the *SSA* indicates the converter health state in real time.
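The retraining schedule can be sketched as an expanding window: train on all data up to the start of a 30-day window, predict that window, then fold it into the training set and repeat. This is one reading of the procedure, under assumptions. `fit` and `predict` are user-supplied callables, and the least-squares pair at the end is a hypothetical stand-in for the FCNN-WMSE model, used only to keep the sketch runnable. The per-sample movement of data between sets is not simulated, since it only matters at each retraining point.

```python
import numpy as np

SAMPLES_PER_DAY = 144  # one SCADA record every 10 min

def online_windows(n_samples, initial_days=150, window_days=30):
    """Expanding-window schedule: train on [0, start), test on the next window."""
    start = initial_days * SAMPLES_PER_DAY
    step = window_days * SAMPLES_PER_DAY
    windows = []
    while start < n_samples:
        stop = min(start + step, n_samples)
        windows.append((slice(0, start), slice(start, stop)))
        start = stop  # tested data joins the training set at the next retrain
    return windows

def run_online_cm(X, t, fit, predict, **schedule):
    """Retrain before each 30-day window and predict that window.

    Predictions for the initial training period are left as NaN.
    """
    y = np.full(len(t), np.nan)
    for train, test in online_windows(len(t), **schedule):
        model = fit(X[train], t[train])
        y[test] = predict(model, X[test])
    return y

# Hypothetical stand-in for the FCNN: ordinary least squares with a bias term.
def fit_linear(X, t):
    A = np.c_[X, np.ones(len(X))]
    return np.linalg.lstsq(A, t, rcond=None)[0]

def predict_linear(w, X):
    return np.c_[X, np.ones(len(X))] @ w
```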

#### 6.3 COST FUNCTION DESIGN

Two widely used cost functions for regression are the mean square error (MSE) and the mean absolute error (MAE) between individual observations and predictions; both, however, underweight operating points represented by only a small amount of SCADA data. This chapter modifies these two cost functions by introducing a probability density weight (PDW) for each data point to mitigate the unbalanced data distribution. The PDW is calculated from the converter's healthy operation following the flowchart in Figure 6.3. The basic MSE and MAE serve as benchmarks for comparison.



Figure 6.3: The flowchart of the PDW calculation process

#### 6.3.1 Clustering of SCADA channels

It is first necessary to identify indicators that define the operating points and to derive the statistical distribution of the SCADA data. Hierarchical clustering analysis can reveal the relationships between all SCADA channels based on the pairwise distance:

$$dist(\mathbf{x}_{p},\mathbf{x}_{q}) = \left\|\mathbf{x}_{p} - \mathbf{x}_{q}\right\| = \sqrt{\left(\mathbf{x}_{p} - \mathbf{x}_{q}\right) \cdot \left(\mathbf{x}_{p} - \mathbf{x}_{q}\right)}, p, q \in (1,...,n)$$
(6.4)

where *dist* is the Euclidean distance between the *p*-th and *q*-th SCADA channels, and $\mathbf{x}_p$ and $\mathbf{x}_q$ are the standardized data of these two channels from healthy periods. The pairs of channels in closest proximity are linked as clusters. The distance $d(s,t)$ between two clusters *s* and *t* is calculated by the nearest neighbour (single linkage) method as

$$d(s,t) = \min\left(dist(\mathbf{x}_{si}, \mathbf{x}_{tj})\right), \ i \in (1, ..., n_s), \ j \in (1, ..., n_t)$$
(6.5)

where $n_s$ ($n_t$) is the number of channels in cluster *s* (*t*), and $\mathbf{x}_{si}$ ($\mathbf{x}_{tj}$) is the $i^{\text{th}}$ ($j^{\text{th}}$) channel in cluster *s* (*t*). The closest clusters are then grouped into larger clusters, and this agglomerative process continues until all channels are linked together in a hierarchical tree.
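The agglomerative procedure of (6.4) and (6.5) corresponds to single-linkage clustering with a Euclidean metric, which SciPy provides through `scipy.cluster.hierarchy.linkage`. The sketch below treats each channel as one point by transposing the (samples × channels) array; it is an illustration, not the author's MATLAB code. The resulting linkage matrix can be drawn with SciPy's `dendrogram` to obtain a plot of the kind in Figure 6.4.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def cluster_channels(X):
    """Single-linkage (nearest-neighbour) clustering of SCADA channels.

    X : standardised healthy-period data, shape (N samples, n channels).
    Each channel is one point in N-dimensional space, so the pairwise
    distance is Eq. (6.4) and the cluster distance follows Eq. (6.5).
    Returns the (n - 1) x 4 linkage matrix: the two merged cluster ids,
    their distance, and the size of the new cluster.
    """
    return linkage(np.asarray(X, dtype=float).T, method="single", metric="euclidean")
```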

The SCADA channels of a healthy WT are analysed with this clustering method, giving the dendrogram in Figure 6.4. Channels that lie closer to the temperature channels in the tree have a stronger influence on them. They are manually summarized into two clusters: the first covers cooling conditions (coolant temperature, flow rate and water pressure) and the second electrical parameters (converter current, voltage and power).



Figure 6.4: The hierarchical clustering dendrogram of SCADA channels

#### 6.3.2 Principal component analysis

The dimension (number of channels) of each of clusters 1 and 2 is reduced to one using principal component analysis (PCA) [112], so that each cluster's measurement information is summarized by a 1-D representation. PCA uses an orthogonal linear transformation to find the direction onto which the scalar projection of the time-sequence SCADA data **X** has the greatest variance. To find the first principal component (1<sup>st</sup> PC), the unit weight vector **W** has to satisfy the equation below.

$$\mathbf{W} = \underset{\|\mathbf{W}\|=1}{\arg\max} \left\{ \mathbf{W}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{W} \right\}$$
(6.6)

Then the 1<sup>st</sup> PC T can be calculated as

$$\mathbf{T} = \mathbf{X} \cdot \mathbf{W} \tag{6.7}$$

The ratio of the variance of the 1<sup>st</sup> PC to the total variance of the cluster's SCADA data indicates how much of the measurement information in each group the 1<sup>st</sup> PC represents. According to the results in Section 6.4.1, this ratio exceeds 99% for both clusters. The operating condition is therefore characterised by the two 1<sup>st</sup> PCs of the cooling and electrical condition clusters.
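Equations (6.6) and (6.7) can be solved with an eigendecomposition: the unit vector maximising $\mathbf{W}^{\mathrm{T}}\mathbf{X}^{\mathrm{T}}\mathbf{X}\mathbf{W}$ is the eigenvector of $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ with the largest eigenvalue. The NumPy sketch below also returns the explained-variance ratio discussed above. Note that the sign of **W** is arbitrary. Before applying the 0.9-CDF threshold, one might flip it so that the PC increases with, for example, power or coolant temperature; this convention is an assumption and is not taken from the thesis.

```python
import numpy as np

def first_pc(X):
    """First principal component of one cluster's channels.

    X : (N samples, k channels). Returns the scores T (Eq. 6.7), the unit
    weight vector W (Eq. 6.6) and the explained-variance ratio.
    """
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)                      # centre (already ~0 if standardised)
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)   # symmetric matrix, ascending eigenvalues
    W = eigvecs[:, -1]                             # maximises W^T X^T X W with ||W|| = 1
    T = Xc @ W
    ratio = eigvals[-1] / eigvals.sum()
    return T, W, ratio
```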

#### 6.3.3 Probability distribution

The cumulative distribution function (CDF) of each cluster's PC values is calculated from its statistical distribution histogram. The PC value at which the CDF reaches 0.9 is selected as the threshold of the dense-to-sparse boundary, to differentiate the uneven data distribution.

#### 6.3.4 2-D joint distribution

Combining the distributions of the two 1<sup>st</sup> PCs gives a 2-D joint distribution, which divides the data points into two operating regions: region 1, where both the power and temperature conditions are low, and region 2, covering all other conditions, according to

$$\begin{cases} \text{region 1:} & \left( p_1 < PC_1 \right) \cap \left( p_2 < PC_2 \right) \\ \text{region 2:} & \left( p_1 \geq PC_1 \right) \cup \left( p_2 \geq PC_2 \right) \end{cases}, \qquad F_{PC1}(PC_1) = F_{PC2}(PC_2) = 0.9$$
(6.8)

where $p_1$ and $p_2$ are the 1<sup>st</sup> PC values of a data point for the cooling condition and electrical parameters clusters, $F_{PC1}$ (or $F_{PC2}$) is the CDF of the cooling condition (or electrical parameters) cluster's PC, and $PC_1$ (or $PC_2$) is the PC value at which its CDF reaches 0.9.
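The threshold selection and the region split of (6.8) can be sketched with empirical quantiles. The assumption here is that "the value at which the CDF reaches 0.9" is taken as the empirical 0.9-quantile (NumPy's default linear interpolation), rather than being read off a histogram. `pc1` and `pc2` hold the per-point values $p_1$ and $p_2$.

```python
import numpy as np

def region_labels(pc1, pc2, q=0.9):
    """Eq. (6.8): label each operating point as region 1 (dense) or 2 (sparse).

    PC_1 and PC_2 are taken as the empirical q-quantiles of the two PCs,
    i.e. the values at which each empirical CDF reaches q.
    """
    pc1 = np.asarray(pc1, dtype=float)
    pc2 = np.asarray(pc2, dtype=float)
    th1, th2 = np.quantile(pc1, q), np.quantile(pc2, q)
    in_region1 = (pc1 < th1) & (pc2 < th2)   # both below their thresholds
    return np.where(in_region1, 1, 2), (th1, th2)
```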

#### 6.3.5 Probability density weights

The weight for the SCADA data points falling into region r (1 or 2) is then derived as  $w_r$ ,

$$w_r = \frac{\sum_{k=1}^{2} m_k}{m_r}$$
(6.9)

where $m_r$ is the number of training data points in region r. The SCADA dataset is weighted accordingly, and the basic MSE and MAE cost functions are then weighted by the PDW as
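Given the region labels, (6.9) reduces to a few lines of NumPy. The sketch below returns both the two region weights and the per-point weights $w_{r,ij}$ used in the weighted cost functions. It raises an error if a region is empty, a case the formula leaves undefined.

```python
import numpy as np

def pdw(region):
    """Eq. (6.9): w_r = (m_1 + m_2) / m_r, per region and per data point."""
    region = np.asarray(region)
    m = np.array([np.count_nonzero(region == r) for r in (1, 2)], dtype=float)
    if np.any(m == 0):
        raise ValueError("each region needs at least one training point")
    w = m.sum() / m            # w[0] = w_1, w[1] = w_2
    return w, w[region - 1]    # per-point weights
```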

$$WMSE = \frac{1}{uv} \sum_{i=1}^{u} \sum_{j=1}^{v} w_{r,ij} \times (y_{ij} - t_{ij})^2, WMAE = \frac{1}{uv} \sum_{i=1}^{u} \sum_{j=1}^{v} w_{r,ij} \times |y_{ij} - t_{ij}|$$
(6.10)

where u is the batch size and v the number of output channels. The training process thereby places more emphasis on the sparsely populated data points.
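For reference, (6.10) evaluated on a (u, v) batch in NumPy is shown below. This only computes the loss values. Training with it would require implementing the same expression as a custom loss in a deep learning framework, and that step is not shown. The weight array may have shape (u, v) or (u, 1), since it is broadcast against the errors.

```python
import numpy as np

def wmse(y, t, w):
    """Eq. (6.10): PDW-weighted mean square error over a (u, v) batch."""
    y, t, w = (np.asarray(a, dtype=float) for a in (y, t, w))
    return np.mean(w * (y - t) ** 2)   # mean over u*v entries = 1/(uv) * sum

def wmae(y, t, w):
    """Eq. (6.10): PDW-weighted mean absolute error over a (u, v) batch."""
    y, t, w = (np.asarray(a, dtype=float) for a in (y, t, w))
    return np.mean(w * np.abs(y - t))
```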

#### 6.4 CONDITION MONITORING RESULTS AND ANALYSIS

The SCADA data of five WTs in the same wind farm are collected for this study. Four turbines (No. 1 to No. 4) have operated normally since commissioning, while WT No. 5 reported a converter fault, which is treated as the detection target in this section.

#### 6.4.1 SCADA data pre-processing

For WT No. 1, the raw SCADA data of the first 150 days contain 22000 sampling points, of which only 16855 are valid and retained after pre-processing.

The two 1<sup>st</sup> PCs obtained from PCA explain 99.84% and 99.99% of the measurement information in their respective clusters. The histograms, CDFs and 0.9 thresholds of the 1<sup>st</sup> PCs are shown in Figure 6.5 (a) and (b), respectively. In the joint distribution shown in Figure 6.5 (c), most operating points fall into region 1, and the defined thresholds clearly separate the uneven distribution. Based on (6.8) and (6.9), the weights for regions 1 and 2 are 1.13 and 8.61, respectively.
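As a quick consistency check on (6.9), the reported weights imply the share of training points in each region, since $m_r/(m_1+m_2) = 1/w_r$:

$$\frac{m_1}{m_1+m_2} = \frac{1}{1.13} \approx 0.885, \qquad \frac{m_2}{m_1+m_2} = \frac{1}{8.61} \approx 0.116, \qquad 0.885 + 0.116 \approx 1.00$$

Roughly 88.5% of the training points therefore lie in region 1 and 11.6% in region 2. The small excess over 1 comes from rounding of the reported weights.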



Figure 6.5: The histogram and CDF of the 1<sup>st</sup> PC for (a) cluster 1 and (b) cluster 2, and (c) the joint probability distribution

#### 6.4.2 Online learning results

The adequacy of the training data is evaluated starting from 60 days of training data. The consistency *C* of the FCNN and LSTM with the four cost functions and different amounts of training data is calculated by (6.3) and plotted in Figure 6.6.

Without PDW, the FCNNs with MAE and MSE fail to generalize to new operating points, even when trained with 150 days of data. When testing starts from the $150^{\text{th}}$ day, the FCNNs with PDW perform better, with *C* above 0.9; WMSE constrains the minority data more strongly and is easier to solve than WMAE. The LSTM, however, cannot achieve such consistency when new operating points appear in later periods, because it cannot learn enough patterns from the discontinuous sequences caused by the data removal process. Based on these observations, this chapter selects the FCNN with WMSE and starts the CM process 150 days after the initial commissioning.



Figure 6.6: The consistency *C* of FCNN and LSTM with four cost functions on the different amounts of training data

One concern is that the operating condition may be influenced not only by the cooling and electrical conditions but also by other unseen factors, which could cause false-positive fault detections. Hence, from a data-driven perspective, this chapter uses DBSCAN [113] to cluster all logged data into classes representing different operating states. Over 300 days of operation, 21 classes are identified in total, and the label -1 marks data points that cannot be clustered into any class, as shown in Figure 6.7. The consistency of the prediction performance across classes can be examined in the bar chart, in which a dot and a bar denote the mean value and standard deviation of the prediction accuracy A of each class, respectively. Although new classes (7<sup>th</sup> to 21<sup>st</sup>) appear during the CM period starting from the 150<sup>th</sup> day, the error distributions remain consistent, which demonstrates the robustness of the method.
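The class-wise check can be sketched with a minimal NumPy implementation of DBSCAN followed by per-class statistics of *A*. This is an illustrative stand-in, not the implementation used in the thesis. The thesis does not state its DBSCAN parameters, so `eps` and `min_samples` are hypothetical inputs. The dense distance matrix needs O(n²) memory, so for 300 days of SCADA records a library implementation such as scikit-learn's `DBSCAN` would be more practical.

```python
import numpy as np

def dbscan(X, eps, min_samples):
    """Minimal DBSCAN; returns labels 0, 1, ... and -1 for noise.

    Uses a dense distance matrix (O(n^2) memory), so it is only meant
    for small illustrative datasets.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([nb.size >= min_samples for nb in neighbours])
    labels = np.full(n, -1)
    next_label = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # grow a new cluster from an unassigned core point
        labels[i] = next_label
        frontier = [i]
        while frontier:
            j = frontier.pop()
            if not is_core[j]:
                continue  # border points join the cluster but do not expand it
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = next_label
                    frontier.append(k)
        next_label += 1
    return labels

def class_error_stats(labels, A):
    """Mean and standard deviation of accuracy A within each class (cf. Figure 6.7)."""
    A = np.asarray(A, dtype=float)
    return {int(c): (A[labels == c].mean(), A[labels == c].std())
            for c in np.unique(labels)}
```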



Figure 6.7: The clustering results and corresponding error bar of each class

#### 6.4.3 Fault detection

With CM starting from the $150^{\text{th}}$ day, the prediction accuracy *A* of the healthy WT No. 1 follows the same normal distribution on the training and testing datasets. To validate statistically whether the prediction accuracy on the testing dataset has deviated from that on the training dataset, a two-sample *t*-test is performed with the null hypothesis that the accuracies of the two datasets come from normal distributions with equal means and equal but unknown variances. The *p* value is 0.4490, so the null hypothesis cannot be rejected at the 5% significance level. The model therefore retains a similar prediction accuracy on both the training and testing datasets, as shown in Figure 6.8 (a). The 7-day smoothed accuracy *SSA* stays above the predefined fault detection threshold during the entire CM period, as shown in Figure 6.8 (b). The other three healthy WTs show the same results. This indicates that the proposed method yields a very low false-positive rate.



Figure 6.8: (a) The prediction accuracy distribution and (b) condition monitoring results of healthy WT No.1

For WT No. 5, the t-test shows that the accuracy A on the training and testing data cannot be considered to follow the same distribution ($p \cong 0$ at the 5% significance level), which indicates that the converter is no longer in its original healthy state, as shown in Figure 6.9 (a). On day 195, a converter fault is reported and the WT is shut down, by which time the indicator SSA has already dropped below the threshold, as shown in Figure 6.9 (b). The proposed CM method can therefore produce a warning several days ahead of the converter fault, which could help to coordinate predictive maintenance.

The online learning method requires model training to be completed within 10 min. The CM algorithm of FCNN-WMSE for a single WT completes within two minutes on a workstation with an i7 CPU and a GTX1080 GPU in MATLAB 2020a. For large-scale wind farms, cloud and distributed computing can further accelerate training and fine-tuning during long-term CM.



Figure 6.9: (a) The prediction accuracy distribution and (b) condition monitoring results of No.5 WT with converter fault reported on day 195

#### 6.5 SUMMARY

This chapter proposes a data-driven condition monitoring method for WT converters using limited and unbalanced SCADA data. A DNN model is established to represent the characteristics of the healthy converter, and changes in the model's prediction accuracy can be correlated with abnormalities during converter operation that indicate a potentially active fault. The cost function is modified with a probability density weight of the data to improve model generalization given the limited and unbalanced data. Moreover, the online learning process makes the proposed CM method deployable in real time. Demonstrated on historical SCADA data from wind turbines, the method gives robust diagnosis results and can flag the converter fault a few days ahead of the actual failure.

## 7 CONCLUSIONS AND FUTURE WORK

#### 7.1 CONCLUSIONS

This thesis focuses on the reliability and condition monitoring of multi-chip power modules in wind turbines. It analyses the initial development of solder ageing and the subsequent uneven degradation of multi-chip power modules under realistic stress conditions, develops an online condition monitoring method to detect uneven degradation, and extends it towards a field-deployable method for commercial offshore wind turbines.

In Chapter 2, the growth of two types of solder defects in a power module, voids and cracks, is analysed under low-stress conditions. The morphological development of the defects is captured by CT scanning before and after power cycling tests. A 2-D symmetrical FEA model is established and validated against experimental results to investigate the thermomechanical stress in the solder layer. The simulation shows that solder material near an initial void or crack undergoes high inelastic fatigue strain due to stress concentration, despite the low amplitude of temperature cycling in normal operation. Voids attached to the solder-chip boundary have a critical effect on the power module's reliability, as they can form an initial crack that then grows at an accelerating rate, leading to catastrophic thermal breakdown. Combined with a physics-of-failure model, the dynamic of initial defects' distribution.

The evaluation of the uneven degradation of the multi-chip power module under long-term operation is presented in Chapter 3. That chapter establishes electrothermal models to analyse the temperature profiles of wind power converters in fully and partially rated wind turbines over the whole wind speed range, and combines them with a lifetime model to estimate the lifetime consumption of the power converters in terms of end-of-life, per hour of operation and per megawatt-hour of energy generated. The machine-side converters are found to be more vulnerable, with quicker lifetime consumption for the PMSG in the high-speed range and for the DFIG at synchronous speed. These models are further used to investigate the development of uneven degradation in a multi-chip power module under long-term service conditions. The results show that an asymmetrical packaging layout can drive paralleled devices into uneven degradation with lifetime differences of a few years, even when they have the same initial health state. More importantly, once defects initially increase the thermal resistance of the weak diode, the diode's further ageing is significantly accelerated.

Following the initial ageing and uneven degradation analysis, Chapter 4 proposes a condition monitoring scheme for detecting uneven solder layer degradation in a multi-chip-in-parallel system. The sensitivities of the multi-chip power module's electrothermal characteristics to uneven degradation are first analysed with a high-fidelity multi-physics simulation model, and the external heat flux is identified as a potential condition monitoring indicator. Based on this understanding, a two-stage neural network method is proposed to detect the uneven degradation levels: the first-stage NNs represent the multi-chip system's electrothermal characteristics, and the second stage is trained to recognise the degradation levels from the outputs of the first. This approach relies on the dependence of a module's external temperature distribution on its internal health condition. The detection rate of this condition monitoring method was found experimentally to be higher than 98%.

Chapter 5 then proposes three optimizations to improve the feasibility of the condition monitoring method for field deployment. The labelled data required for network training is generated on an inverter test rig capable of equivalently emulating uneven degradation, and the thermal and electrical behaviour of this data labelling vehicle is validated to have the same characteristics as practical uneven solder degradation in a multi-chip converter system. FBG distributed sensing is then employed to measure multi-point temperature in the multi-chip converter system; it integrates an array of multiple sensors on a single fibre and achieves high measuring precision with immunity to EMI. The resulting thermal monitoring system has considerably lower wiring and installation complexity than thermocouples. Using external electrical and thermal measurements as state indicators, the DNN structure also generalises to the complex, continuously varying operating conditions of practical wind turbines. The improved two-stage DNN achieves an overall accuracy of more than 95% under the dynamic thermal conditions encountered in field inverters operating under a typical wind speed profile.

The core concept of the proposed CM method, i.e. training a network to represent the healthy state and then distinguishing deviations caused by faulty conditions, is further extended to commercial offshore wind turbines. To this end, Chapter 6 proposes a data-driven condition monitoring method to detect early-stage faults of wind turbine converters using limited and unbalanced SCADA data. A DNN is designed with an unsupervised representation learning approach, and changes in the model's prediction accuracy can be correlated with abnormalities during converter operation. The cost function is modified with a probability density weight of the data to improve model generalization given the limited and unbalanced data. Moreover, an online learning process makes the CM method deployable for long-term real-time diagnosis. Demonstrated on historical SCADA data from wind turbines, the proposed CM method gives robust diagnosis results and can flag the converter fault a few days ahead of the actual failure.

#### 7.2 FUTURE WORK

The reliability analysis in this thesis focuses on solder-bonded IGBT power modules. The ageing mechanisms and failure modes may differ for emerging packaging and semiconductor technologies such as silver sintering, double-side cooling, SiC and GaN, which are attracting increasing research attention. The author has also explored the reliability of SiC modules in C4 and J6, although this work is not included in this thesis. The technical pipeline proposed in this thesis can be adapted to evaluate new technologies and provide feedback to optimize their designs.

The initial defects, voids and cracks, are randomly distributed in solder layers and may merge as they develop. This thesis simplifies the situation to a single void or crack and investigates the development of each individually. Moreover, owing to the limitations of the CT equipment, the void is assumed to be perfectly spherical, whereas higher-resolution CT might reveal more morphological detail. More exhaustive information on the solder defects, e.g. their distribution combinations and shapes, could then be extracted from CT scans and input into the FEA geometry.

In addition, a comprehensive high-fidelity model is desired to simulate the thermomechanical behaviour of the defect development process dynamically. To reduce computing time, the FEA model used here is a symmetrical 2-D model. A 3-D model could describe defect development in all directions more comprehensively, although the two have been validated to give the same accuracy in thermomechanical modelling. Apart from COMSOL, other modelling tools such as Abaqus are also suitable for the defect development problem, and advanced algorithms are needed to increase computing speed and accuracy.

From the perspective of lifetime estimation, it would be valuable to transfer the physics-of-failure lifetime model to an analytical model that uses junction temperature information, e.g. $T_{j,mean}$ and $\Delta T_j$, as indicators instead of fatigue strain. The physics-of-failure lifetime model is difficult to apply in real time because it relies on strain calculation from multi-physics modelling, which is time-consuming, whereas the electrothermal model is easier to solve than the FEA model. This thesis has established the link between the physics-of-failure model and the analytical model. It should be noted that this relationship is only applicable to a specific type of device. Although this thesis only considers one temperature condition, $\Delta T_j=17.5$ °C and $T_{j,mean}=68$ °C, a comprehensive solution could be developed through further work in collaboration with industrial end-users: first calculate the complete temperature profile that the device will undergo over the full operating range of practical applications, use rainflow counting to summarise the profiles into $\Delta T_{j}/T_{j,mean}$ bins, calculate the physics-of-failure lifetime for each bin by the proposed method, and finally establish a mapping between $T_j$ profiles and lifetime. This would allow the work in this thesis to achieve its full effect.
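The proposed profile-to-lifetime procedure can be sketched as follows. A $T_j$ profile is reduced to turning points, three-point rainflow counting in the style of ASTM E1049 yields ($\Delta T_j$, $T_{j,mean}$, count) cycles, and the cycles are binned. Two parts are assumptions, not taken from the thesis: Miner's linear damage rule for combining bins, and `cycles_to_failure`, a hypothetical user-supplied lifetime model that could be filled per bin from the physics-of-failure analysis. Cycles falling outside the bin edges are ignored by the histogram.

```python
import numpy as np

def turning_points(x):
    """Keep the first, last and local-extremum samples of a Tj profile."""
    x = [v for i, v in enumerate(x) if i == 0 or v != x[i - 1]]  # drop flat repeats
    if len(x) < 3:
        return x
    inner = [b for a, b, c in zip(x, x[1:], x[2:]) if (b - a) * (c - b) < 0]
    return [x[0]] + inner + [x[-1]]

def rainflow(x):
    """Three-point rainflow counting in the style of ASTM E1049.

    Returns a list of (delta_Tj, Tj_mean, count) with count 1.0 or 0.5.
    """
    stack, cycles = [], []
    for p in turning_points(list(x)):
        stack.append(p)
        while len(stack) >= 3:
            X = abs(stack[-1] - stack[-2])   # most recent range
            Y = abs(stack[-2] - stack[-3])   # range under test
            if X < Y:
                break
            if len(stack) == 3:              # Y contains the starting point
                cycles.append((Y, 0.5 * (stack[0] + stack[1]), 0.5))
                stack.pop(0)
            else:                            # Y is a closed full cycle
                cycles.append((Y, 0.5 * (stack[-3] + stack[-2]), 1.0))
                last = stack.pop()
                del stack[-2:]
                stack.append(last)
    # residue: every remaining range counts as a half cycle
    cycles += [(abs(b - a), 0.5 * (a + b), 0.5) for a, b in zip(stack, stack[1:])]
    return cycles

def miner_damage(cycles, dT_edges, Tm_edges, cycles_to_failure):
    """Bin cycles into (delta_Tj, Tj_mean) cells and sum n / N_f (Miner's rule)."""
    dT_edges = np.asarray(dT_edges, dtype=float)
    Tm_edges = np.asarray(Tm_edges, dtype=float)
    dT, Tm, n = (np.array(col, dtype=float) for col in zip(*cycles))
    counts, _, _ = np.histogram2d(dT, Tm, bins=[dT_edges, Tm_edges], weights=n)
    dT_mid = 0.5 * (dT_edges[:-1] + dT_edges[1:])
    Tm_mid = 0.5 * (Tm_edges[:-1] + Tm_edges[1:])
    # hypothetical lifetime model evaluated at each bin centre
    Nf = cycles_to_failure(dT_mid[:, None], Tm_mid[None, :])
    return counts, float(np.sum(counts / Nf))
```

Because the damage is accumulated per bin, the lifetime model only needs to be evaluated once per ($\Delta T_j$, $T_{j,mean}$) cell, which matches the proposal to compute one physics-of-failure lifetime per bin.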

The failure mode considered in this thesis is solder layer degradation, together with its external experimental emulation. Other failure modes, such as bond wire lift-off, should also be treated as condition monitoring targets. In addition, multi-chip modules could be aged to certain levels of various failure modes by accelerated lifetime tests, although new approaches are needed to induce uneven degradation at the same time. The CM method could then be tested on detecting multiple failure patterns, and advanced methods should be designed accordingly.

The operating range used to generate training data should fully cover the range under which the devices will practically work. This thesis uses step-change profiles to traverse from the minimal to the maximal currents. Although the training operating range is not an exact point-to-point mapping of the testing data, the NN has adequately learned the electrothermal behaviour of the target system. This is because the thermal time constant of power modules is short enough for them to reach a thermal steady state. The NN method therefore provides robust inference as long as the practical operating conditions lie within the training operating range, which has been validated in Section 6.4.2. However, continuously and rapidly changing patterns, although not observed in the tests and data of this thesis, may not be captured by this steady-state-based method. It would be valuable to investigate the effects of such extreme conditions on the NN's generalization. A sensitivity analysis could then be performed to select the most valuable operating points and patterns for generating training data before commissioning.

The network in Chapter 4 is proposed as a two-stage approach to ensure a certain degree of interpretability. The indicators linking the stages show how the network represents the physical system, i.e. captures its electrothermal behaviour. A deeper network with a different architecture could combine the two-stage NNs, but the interpretability of such an end-to-end black box is also essential in industrial applications. Another issue to consider in promoting data-driven solutions for industrial applications is the benchmarking database. The data commonly used in wind energy is SCADA data, which lacks ageing labels and may not fully cover the turbine operating range. Approaches such as generative adversarial networks, transfer learning and digital twins are desired to generate synthetic data that fill valuable gaps. An open-source database would also benefit academia.

### BIBLIOGRAPHY

- [1] H. M. Dai Sugimoto, Semiconductor Quality and Reliability Handbook, H. Miyamoto, ed., Third ed. Japan, 2018. [Online]. Available.
- [2] S. Yang, D. Xiang, A. Bryant, P. Mawby, L. Ran, and P. Tavner, "Condition monitoring for device reliability in power electronic converters: A review," *IEEE Transactions on Power Electronics*, vol. 25, no. 11, pp. 2734-2752, 2010.
- [3] U.-M. Choi, F. Blaabjerg, and K.-B. Lee, "Study and Handling Methods of Power IGBT Module Failures in Power Electronic Converter Systems," *IEEE Transactions on Power Electronics*, vol. 30, no. 5, pp. 2517-2533, 2015.
- [4] H. Oh, B. Han, P. McCluskey, C. Han, and B. D. Youn, "Physics-of-Failure, Condition Monitoring, and Prognostics of Insulated Gate Bipolar Transistor Modules: A Review," *IEEE Transactions on Power Electronics*, vol. 30, no. 5, pp. 2413-2426, 2015.
- [5] M. Ciappa, "Selected failure mechanisms of modern power modules," *Microelectronics Reliability*, vol. 42, no. 4–5, pp. 653-667, 2002.
- [6] C. Busca, "Modeling lifetime of high power IGBTs in wind power applications-An overview," in 2011 IEEE International Symposium on Industrial Electronics, 2011, pp. 1408-1413: IEEE.
- [7] Ø. B. Frank, "Power Cycle Testing of Press-Pack IGBT Chips," NTNU, 2014.
- [8] L. Tinschert, A. R. Årdal, T. Poller, M. Bohlländer, M. Hernes, and J. Lutz, "Possible failure modes in Press-Pack IGBTs," *Microelectronics Reliability*, vol. 55, no. 6, pp. 903-911, 2015.
- [9] F. Wakeman, D. Hemmings, W. Findlay, and G. Lockwood, "Pressure contact IGBT, testing for reliability," *PCIM, Nuremberg Germany*, 2000.
- [10] A. Hasmasan, C. Busca, R. Teodorescu, and L. Helle, "Modelling the clamping force distribution among chips in press-pack IGBTs using the finite element method," in 2012 3rd IEEE International Symposium on Power Electronics for Distributed Generation Systems (PEDG), 2012, pp. 788-793: IEEE.
- [11] A. Wintrich, U. Nicolai, T. Reimann, and W. Tursky, "Application manual power semiconductors," 2011: ISLE.

- [12] U. M. Choi, S. Joergensen, and F. Blaabjerg, "Advanced Accelerated Power Cycling Test for Reliability Investigation of Power Device Modules," *IEEE Transactions on Power Electronics*, vol. PP, no. 99, pp. 1-1, 2016.
- [13] D. Xiang, L. Ran, P. Tavner, S. Yang, A. Bryant, and P. Mawby, "Condition Monitoring Power Module Solder Fatigue Using Inverter Harmonic Identification," *IEEE Transactions on Power Electronics*, vol. 27, no. 1, pp. 235-247, 2012.
- [14] B. Ji, V. Pickert, W. Cao, and B. Zahawi, "In situ diagnostics and prognostics of wire bonding faults in IGBT modules for electric vehicle drives," *IEEE Transactions on Power Electronics*, vol. 28, no. 12, pp. 5568-5577, 2013.
- [15] S. Eicher, M. Rahimo, E. Tsyplakov, D. Schneider, A. Kopta, U. Schlapbach, and E. Carroll, "4.5kV press pack IGBT designed for ruggedness and reliability," in *Industry Applications Conference, 2004. 39th IAS Annual Meeting. Conference Record of the 2004 IEEE*, 2004, vol. 3, pp. 1534-1539 vol.3.
- [16] B. Backlund, R. Schnell, U. Schlapbach, R. Fischer, and E. Tsyplakov, "Applying IGBTs," *ABB Semiconductors*, 2007.
- [17] A. Oukaour, B. Tala-Ighil, B. Pouderoux, M. Tounsi, M. Bouarroudj-Berkani, S. Lefebvre, and B. Boudart, "Ageing defect detection on IGBT power modules by artificial training methods based on pattern recognition," *Microelectronics Reliability*, vol. 51, no. 2, pp. 386-391, 2011.
- [18] B. Hu, S. Konaklieva, N. Kourra, M. A. Williams, L. Ran, and W. Lai, "Long Term Reliability Evaluation of Power Modules with Low Amplitude Thermomechanical Stresses and Initial Defects," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, pp. 1-1, 2019.
- [19] B. Hu, J. O. Gonzalez, L. Ran, H. Ren, Z. Zeng, W. Lai, B. Gao, O. Alatise, H. Lu, and C. Bailey, "Failure and reliability analysis of a SiC power module based on stress comparison to a Si device," *IEEE Transactions on Device and Materials Reliability*, vol. 17, no. 4, pp. 727-737, 2017.
- [20] W. Lai, M. Chen, L. Ran, O. Alatise, S. Xu, and P. Mawby, "Low $\Delta T_j$ Stress Cycle Effect in IGBT Power Module Die-Attach Lifetime Modeling," *IEEE Transactions on Power Electronics*, vol. 31, no. 9, pp. 6575-6585, 2016.
- [21] J. Lutz, H. Schlangenotto, U. Scheuermann, and R. De Doncker, *Semiconductor Power Devices: Physics, Characteristics, Reliability*. Berlin, Germany: Springer, 2011.
- [22] T. A. Nguyen, S. Lefebvre, P.-Y. Joubert, D. Labrousse, and S. Bontemps, "Estimating current distributions in power semiconductor dies under aging conditions: Bond wire liftoff and aluminum reconstruction," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 5, no. 4, pp. 483-495, 2015.
- [23] J. Zhang, X. Du, Y. Wu, Q. Luo, P. Sun, and H.-M. Tai, "Thermal parameter monitoring of IGBT module using case temperature," *IEEE Transactions on Power Electronics*, vol. 34, no. 8, pp. 7942-7956, 2018.
- [24] K. Fischer, T. Stalin, H. Ramberg, T. Thiringer, J. Wenske, and R. Karlsson, "Investigation of converter failure in wind turbines," *Elforsk report*, vol. 12, no. 58, 2012.

- [25] B. Lu, Y. Li, X. Wu, and Z. Yang, "A review of recent advances in wind turbine condition monitoring and fault diagnosis," in 2009 IEEE power electronics and machines in wind applications, 2009, pp. 1-7: IEEE.
- [26] Y. Berthier, L. Vincent, and M. Godet, "Fretting fatigue and fretting wear," *Tribology international*, vol. 22, no. 4, pp. 235-242, 1989.
- [27] E. Deng, Z. Zhao, J. Li, and Y. Huang, "Clamping force distribution within press pack IGBTs," *Design, Simulation and Construction of Field Effect Transistors*, p. 73, 2018.
- [28] E. Deng, Z. Zhao, Z. Lin, R. Han, and Y. Huang, "Influence of temperature on the pressure distribution within press pack IGBTs," *IEEE Transactions on Power Electronics*, vol. 33, no. 7, pp. 6048-6059, 2017.
- [29] A. Hanif, Y. Yu, D. DeVoto, and F. Khan, "A Comprehensive Review Toward the State-of-the-Art in Failure and Lifetime Predictions of Power Electronic Devices," *IEEE Transactions on Power Electronics*, vol. 34, no. 5, pp. 4729-4746, 2019.
- [30] P. Ghimire, S. Bęczkowski, S. Munk-Nielsen, B. Rannestad, and P. B. Thøgersen, "A review on real time physical measurement techniques and their attempt to predict wear-out status of IGBT," in 2013 15th European Conference on Power Electronics and Applications (EPE), 2013, pp. 1-10: IEEE.
- [31] V. Smet, F. Forest, J.-J. Huselstein, A. Rashed, and F. Richardeau, "Evaluation of $V_{\rm ce}$ Monitoring as a Real-Time Method to Estimate Aging of Bond Wire-IGBT Modules Stressed by Power Cycling," *IEEE Transactions on Industrial Electronics*, vol. 60, no. 7, pp. 2760-2770, 2012.
- [32] A. Singh, A. Anurag, and S. Anand, "Evaluation of Vce at inflection point for monitoring bond wire degradation in discrete packaged IGBTs," *IEEE Transactions on Power Electronics*, vol. 32, no. 4, pp. 2481-2484, 2016.
- [33] N. Patil, J. Celaya, D. Das, K. Goebel, and M. Pecht, "Precursor parameter identification for insulated gate bipolar transistor (IGBT) prognostics," *IEEE Transactions on Reliability*, vol. 58, no. 2, pp. 271-276, 2009.
- [34] R. Mandeya, C. Chen, V. Pickert, R. Naayagi, and B. Ji, "Gate–emitter pre-threshold voltage as a health-sensitive parameter for IGBT chip failure monitoring in highvoltage multichip IGBT power modules," *IEEE Transactions on Power Electronics*, vol. 34, no. 9, pp. 9158-9169, 2018.
- [35] P. Sun, C. Gong, X. Du, Y. Peng, B. Wang, and L. Zhou, "Condition monitoring IGBT module bond wires fatigue using short-circuit current identification," *IEEE Transactions on Power Electronics*, vol. 32, no. 5, pp. 3777-3786, 2016.
- [36] K. Wang, L. Zhou, P. Sun, and X. Du, "Monitoring Bond Wires Defects of IGBT Module Using Module Transconductance," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, 2020.
- [37] N. Baker, M. Liserre, L. Dupont, and Y. Avenas, "Improved reliability of power modules: A review of online junction temperature measurement methods," *IEEE Industrial Electronics Magazine*, vol. 8, no. 3, pp. 17-27, 2014.

- [38] H. Luo, Y. Chen, P. Sun, W. Li, and X. He, "Junction temperature extraction approach with turn-off delay time for high-voltage high-power IGBT modules," *IEEE Transactions on Power Electronics*, vol. 31, no. 7, pp. 5122-5132, 2015.
- [39] H. Luo, W. Li, F. Iannuzzo, X. He, and F. Blaabjerg, "Enabling junction temperature estimation via collector-side thermo-sensitive electrical parameters through emitter stray inductance in high-power IGBT modules," *IEEE Transactions on Industrial Electronics*, vol. 65, no. 6, pp. 4724-4738, 2017.
- [40] Infineon Technologies AG, "Transient thermal measurements and thermal equivalent circuit models."
- [41] E. R. Motto, and J. F. Donlon, "IGBT module with user accessible on-chip current and temperature sensors," in 2012 Twenty-Seventh Annual IEEE Applied Power Electronics Conference and Exposition (APEC), 2012, pp. 176-181: IEEE.
- [42] E. Baygildina, L. Smirnova, K. Murashko, R. Juntunen, A. Mityakov, M. Kuisma, O. Pyrhönen, P. Peltoniemi, K. Hynynen, and V. Mityakov, "Application of a heat flux sensor in wind power electronics," *Energies*, vol. 9, no. 6, p. 456, 2016.
- [43] W. M. Rohsenow, J. P. Hartnett, and Y. I. Cho, *Handbook of heat transfer*. McGraw-Hill New York, 1998.
- [44] T. Hjort, and L. Glavind, "Optical sensor system and detecting method for an enclosed semiconductor device module," ed: Google Patents, 2011.
- [45] A. Mohammed, and S. Djurović, "FBG thermal sensing features for hot spot monitoring in random wound electric machine coils," *IEEE Sensors Journal*, vol. 17, no. 10, pp. 3058-3067, 2017.
- [46] A. Mohammed, and S. Djurović, "FBG array sensor use for distributed internal thermal monitoring in low voltage random wound coils," in 2017 6th Mediterranean Conference on Embedded Computing (MECO), 2017, pp. 1-4: IEEE.
- [47] A. Mohammed, J. I. Melecio, and S. Djurović, "Open-circuit fault detection in stranded PMSM windings using embedded FBG thermal sensors," *IEEE Sensors Journal*, vol. 19, no. 9, pp. 3358-3367, 2019.
- [48] M. A. Ismail, N. Tamchek, M. R. A. Hassan, K. D. Dambul, J. Selvaraj, N. A. Rahim, S. R. Sandoghchi, and F. R. M. Adikan, "A fiber Bragg grating—bimetal temperature sensor for solar panel inverters," *sensors*, vol. 11, no. 9, pp. 8665-8673, 2011.
- [49] J. P. Bazzo, T. Lukasievicz, M. Vogt, V. De Oliveira, H. J. Kalinowski, and J. C. C. Da Silva, "Monitoring the junction temperature of an IGBT through direct measurement using a fiber Bragg grating," in 21st International Conference on Optical Fiber Sensors, 2011, vol. 7753, p. 77538Q: International Society for Optics and Photonics.
- [50] E. Deng, Z. Zhao, P. Zhang, J. Li, and Y. Huang, "Study on the method to measure the junction-to-case thermal resistance of press-pack IGBTs," *IEEE Transactions on Power Electronics*, vol. 33, no. 5, pp. 4352-4361, 2017.
- [51] E. Deng, Z. Zhao, P. Zhang, X. Luo, J. Li, and Y. Huang, "Study on the method to measure thermal contact resistance within press pack IGBTs," *IEEE Transactions on Power Electronics*, vol. 34, no. 2, pp. 1509-1517, 2018.

- [52] D. Xiang, L. Ran, P. Tavner, A. Bryant, S. Yang, and P. Mawby, "Monitoring solder fatigue in a power module using case-above-ambient temperature rise," *IEEE Transactions on Industry Applications*, vol. 47, no. 6, pp. 2578-2591, 2011.
- [53] Z. Wang, B. Tian, W. Qiao, and L. Qu, "Real-Time Aging Monitoring for IGBT Modules Using Case Temperature," *IEEE Transactions on Industrial Electronics*, vol. 63, no. 2, pp. 1168-1178, 2016.
- [54] H. Soliman, H. Wang, B. Gadalla, and F. Blaabjerg, "Condition monitoring for DC-link capacitors based on artificial neural network algorithm," in 2015 IEEE 5th International Conference on Power Engineering, Energy and Electrical Drives (POWERENG), 2015, pp. 587-591: IEEE.
- [55] S. Khomfoi, and L. M. Tolbert, "Fault Diagnosis and Reconfiguration for Multilevel Inverter Drive Using AI-Based Techniques," *IEEE Transactions on Industrial Electronics*, vol. 54, no. 6, pp. 2954-2968, 2007.
- [56] S. Khomfoi, and L. M. Tolbert, "Fault Diagnostic System for a Multilevel Inverter Using a Neural Network," *IEEE Transactions on Power Electronics*, vol. 22, no. 3, pp. 1062-1069, 2007.
- [57] W. M. Lin, C. M. Hong, and C. H. Chen, "Neural-Network-Based MPPT Control of a Stand-Alone Hybrid Power Generation System," *IEEE Transactions on Power Electronics*, vol. 26, no. 12, pp. 3571-3581, 2011.
- [58] Y. Zhang, Z. Wang, H. Wang, and F. Blaabjerg, "Artificial Intelligence-Aided Thermal Model Considering Cross-Coupling Effects," *IEEE Transactions on Power Electronics*, pp. 1-1, 2020.
- [59] S. Zhou, L. Zhou, and P. Sun, "Monitoring potential defects in an IGBT module based on dynamic changes of the gate current," *IEEE Transactions on Power Electronics*, vol. 28, no. 3, pp. 1479-1487, 2012.
- [60] B. Hu, Z. Hu, L. Ran, C. Ng, C. Jia, P. Mckeever, P. Tavner, C. Zhang, H. Jiang, and P. Mawby, "Heat-Flux Based Condition Monitoring of Multi-chip Power Modules Using a Two-Stage Neural Network," *IEEE Transactions on Power Electronics*, pp. 1-1, 2020.
- [61] R. Liu, G. Meng, B. Yang, C. Sun, and X. Chen, "Dislocated Time Series Convolutional Neural Architecture: An Intelligent Fault Diagnosis Approach for Electric Machine," *IEEE Transactions on Industrial Informatics*, vol. 13, no. 3, pp. 1310-1320, 2017.
- [62] R. Xu, and D. Wunsch, "Survey of clustering algorithms," *IEEE Transactions on Neural Networks*, vol. 16, no. 3, pp. 645-678, 2005.
- [63] L. Wang, Z. Zhang, J. Xu, and R. Liu, "Wind Turbine Blade Breakage Monitoring With Deep Autoencoders," *IEEE Transactions on Smart Grid*, vol. 9, no. 4, pp. 2824-2833, 2018.
- [64] F. Li, Q. Li, J. Zhang, J. Kou, J. Ye, W. Song, and A. H. Mantooth, "Detection and Diagnosis of Data Integrity Attacks in Solar Farms Based on Multi-layer Long Short-Term Memory Network," *IEEE Transactions on Power Electronics*, pp. 1-1, 2020.

- [65] Y. Wan, and D. Shi, "Joint Exact Histogram Specification and Image Enhancement Through the Wavelet Transform," *IEEE Transactions on Image Processing*, vol. 16, no. 9, pp. 2245-2250, 2007.
- [66] H. Wang, M. Liserre, F. Blaabjerg, P. de Place Rimmen, J. B. Jacobsen, T. Kvisgaard, and J. Landkildehus, "Transitioning to physics-of-failure as a reliability driver in power electronics," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 2, no. 1, pp. 97-114, 2014.
- [67] M. Liserre, R. Cardenas, M. Molinas, and J. Rodriguez, "Overview of Multi-MW Wind Turbines and Wind Parks," *IEEE Transactions on Industrial Electronics*, vol. 58, no. 4, pp. 1081-1095, 2011.
- [68] W. Lai, M. Chen, L. Ran, S. Xu, N. Jiang, X. Wang, O. Alatise, and P. Mawby, "Experimental Investigation on the Effects of Narrow Junction Temperature Cycles on Die-Attach Solder Layer in an IGBT Module," *IEEE Transactions on Power Electronics*, vol. 32, no. 2, pp. 1431-1441, 2017.
- [69] T. Herrmann, M. Feller, J. Lutz, R. Bayerer, and T. Licht, "Power cycling induced failure mechanisms in solder layers," in 2007 European Conference on Power Electronics and Applications, 2007, pp. 1-7: IEEE.
- [70] Y. Chan, D. Xie, and J. Lai, "Characteristics of porosity in solder pastes during infrared reflow soldering," *Journal of materials science*, vol. 30, no. 21, pp. 5543-5550, 1995.
- [71] D. Katsis, and J. VanWyk, "A thermal, mechanical, and electrical study of voiding in the solder die-attach of power MOSFETs," *Components and Packaging Technologies, IEEE Transactions on*, vol. 29, no. 1, pp. 127-136, 2006.
- [72] D. C. Katsis, and J. D. van Wyk, "Void-induced thermal impedance in power semiconductor modules: Some transient temperature effects," *IEEE Transactions on Industry Applications*, vol. 39, no. 5, pp. 1239-1246, 2003.
- [73] W. O'Hara, and N.-C. Lee, "Voiding mechanism in BGA assembly," *International journal of microcircuits and electronic packaging*, vol. 19, pp. 190-198, 1996.
- [74] U.-M. Choi, F. Blaabjerg, and S. Jørgensen, "Study on Effect of Junction Temperature Swing Duration on Lifetime of Transfer Molded Power IGBT Modules," *IEEE Transactions on Power Electronics*, vol. 32, no. 8, pp. 6434-6443, 2017.
- [75] N. Kourra, J. M. Warnett, A. Attridge, A. Dahnel, H. Ascroft, S. Barnes, and M. A. Williams, "A metrological inspection method using micro-CT for the analysis of drilled holes in CFRP and titanium stacks," *The International Journal of Advanced Manufacturing Technology*, vol. 88, no. 5-8, pp. 1417-1427, 2017.
- [76] J. M. Warnett, V. Titarenko, E. Kiraci, A. Attridge, W. R. Lionheart, P. J. Withers, and M. A. Williams, "Towards in-process x-ray CT for dimensional metrology," *Measurement Science and Technology*, vol. 27, no. 3, p. 035401, 2016.
- [77] S. Morankar, M. Mandal, N. Kourra, M. A. Williams, R. Mitra, and P. Srirangam, "X-Ray Tomography Study on Porosity and Particle Size Distribution in In Situ Al-4.5Cu-5TiB$_2$ Semisolid Rolled Composites," *JOM*, vol. 71, no. 11, pp. 4050-4058, 2019.

- [78] L. R. GopiReddy, L. M. Tolbert, and B. Ozpineci, "Power Cycle Testing of Power Switches: A Literature Survey," *Power Electronics, IEEE Transactions on*, vol. 30, no. 5, pp. 2465-2473, 2015.
- [79] H. Huang, and P. A. Mawby, "A lifetime estimation technique for voltage source inverters," *IEEE Transactions on Power Electronics*, vol. 28, no. 8, pp. 4113-4119, 2013.
- [80] M. Bouarroudj, Z. Khatir, J.-P. Ousten, F. Badel, L. Dupont, and S. Lefebvre, "Degradation behavior of 600V–200A IGBT modules under power cycling and high temperature environment conditions," *Microelectronics Reliability*, vol. 47, no. 9, pp. 1719-1724, 2007.
- [81] N. Heuck, R. Bayerer, S. Krasel, F. Otto, R. Speckels, and K. Guth, "Lifetime analysis of power modules with new packaging technologies," in *Power Semiconductor Devices & IC's (ISPSD), 2015 IEEE 27th International Symposium on*, 2015, pp. 321-324: IEEE.
- [82] J. J. Lifton, A. A. Malcolm, J. W. McBride, and K. J. Cross, "The application of voxel size correction in X-ray computed tomography for dimensional metrology," in *Singapore international NDT conference & exhibition*, 2013, pp. 19-20.
- [83] K. B. Pedersen, and K. Pedersen, "Dynamic Modeling Method of Electro-Thermo-Mechanical Degradation in IGBT Modules," *IEEE Transactions on Power Electronics*, vol. 31, no. 2, pp. 975-986, 2016.
- [84] B. Gao, F. Yang, M. Chen, L. Ran, I. Ullah, S. Xu, and P. Mawby, "A Temperature Gradient-Based Potential Defects Identification Method for IGBT Module," *IEEE Transactions on Power Electronics*, vol. 32, no. 3, pp. 2227-2242, 2017.
- [85] V. Gektin, A. Bar-Cohen, and J. Ames, "Coffin-Manson fatigue model of underfilled flip-chips," *IEEE Transactions on Components, Packaging, and Manufacturing Technology: Part A*, vol. 20, no. 3, pp. 317-326, 1997.
- [86] W. Lee, L. Nguyen, and G. S. Selvaduray, "Solder joint fatigue models: review and applicability to chip scale packages," *Microelectronics reliability*, vol. 40, no. 2, pp. 231-244, 2000.
- [87] I. Shohji, H. Mori, and Y. Orii, "Solder joint reliability evaluation of chip scale package using a modified Coffin–Manson equation," *Microelectronics Reliability*, vol. 44, no. 2, pp. 269-274, 2004.
- [88] C. Andersson, Z. Lai, J. Liu, H. Jiang, and Y. Yu, "Comparison of isothermal mechanical fatigue properties of lead-free solder joints and bulk solders," *Materials Science and Engineering: A*, vol. 394, no. 1, pp. 20-27, 2005.
- [89] A. Morozumi, K. Yamada, T. Miyasaka, S. Sumi, and Y. Seki, "Reliability of power cycling for IGBT power semiconductor modules," *IEEE Transactions on Industry Applications*, vol. 39, no. 3, pp. 665-671, 2003.
- [90] A. Syed, "Accumulated creep strain and energy density based thermal fatigue life prediction models for SnAgCu solder joints," in *Electronic Components and Technology Conference, 2004. Proceedings. 54th*, 2004, vol. 1, pp. 737-746: IEEE.

- [91] W. Cao, J. Wu, N. Jenkins, C. Wang, and T. Green, "Benefits analysis of Soft Open Points for electrical distribution network operation," *Applied Energy*, vol. 165, pp. 36-47, 2016.
- [92] (2016). *Standard Load Profiles*. [Online]. Available: https://rmdservice.com/standard-load-profiles/
- [93] B. Hahn, M. Durstewitz, and K. Rohrig, "Reliability of wind turbines," *Wind energy*, pp. 329-332, 2007.
- [94] H. Wang, M. Liserre, and F. Blaabjerg, "Toward reliable power electronics: Challenges, design tools, and opportunities," *IEEE Industrial Electronics Magazine*, vol. 7, no. 2, pp. 17-26, 2013.
- [95] D. Xiang, L. Ran, P. J. Tavner, and S. Yang, "Control of a doubly fed induction generator in a wind turbine during grid fault ride-through," *IEEE Transactions on Energy Conversion*, vol. 21, no. 3, pp. 652-662, 2006.
- [96] J. Due, S. Munk-Nielsen, and R. Nielsen, "Lifetime investigation of high power IGBT modules," in *Power Electronics and Applications (EPE 2011), Proceedings of the 2011-14th European Conference on*, 2011, pp. 1-8: IEEE.
- [97] U. Drofenik, and J. W. Kolar, "A general scheme for calculating switching-and conduction-losses of power semiconductors in numerical circuit simulations of power electronic systems," in *Proceedings of the 2005 International Power Electronics Conference (IPEC'05), Niigata, Japan, April,* 2005, pp. 4-8: Citeseer.
- [98] A. S. Bahman, K. Ma, and F. Blaabjerg, "A Lumped Thermal Model Including Thermal Coupling and Thermal Boundary Conditions for High-Power IGBT Modules," *IEEE Transactions on Power Electronics*, vol. 33, no. 3, pp. 2518-2530, 2018.
- [99] R. Bayerer, T. Herrmann, T. Licht, J. Lutz, and M. Feller, "Model for power cycling lifetime of IGBT modules-various factors influencing lifetime," in *Integrated Power Systems (CIPS), 2008 5th International Conference on*, 2008, pp. 1-6: VDE.
- [100] E. Deng, J. Chen, Y. Zhao, Z. Zhao, and Y. Huang, "Power Cycling Capability of High Power IGBT Modules for Flexible HVDC System," in *PCIM Europe digital days 2020; International Exhibition and Conference for Power Electronics, Intelligent Motion, Renewable Energy and Energy Management*, 2020, pp. 1-8.
- [101] U.-M. Choi, F. Blaabjerg, S. Jørgensen, S. Munk-Nielsen, and B. Rannestad, "Reliability improvement of power converters by means of condition monitoring of IGBT modules," *IEEE Transactions on Power Electronics*, vol. 32, no. 10, pp. 7990-7997, 2017.
- [102] T. Dragičević, and M. Novak, "Weighting Factor Design in Model Predictive Control of Power Electronic Converters: An Artificial Neural Network Approach," *IEEE Transactions on Industrial Electronics*, vol. 66, no. 11, pp. 8870-8880, 2019.
- [103] S. Mohagheghi, R. G. Harley, T. G. Habetler, and D. Divan, "Condition Monitoring of Power Electronic Circuits Using Artificial Neural Networks," *IEEE Transactions on Power Electronics*, vol. 24, no. 10, pp. 2363-2367, 2009.
- [104] R. Razavi-Far, E. Hallaji, M. Farajzadeh-Zanjani, M. Saif, S. H. Kia, H. Henao, and G. Capolino, "Information Fusion and Semi-Supervised Deep Learning Scheme for Diagnosing Gear Faults in Induction Machine Systems," *IEEE Transactions on Industrial Electronics*, vol. 66, no. 8, pp. 6331-6342, 2019.

- [105] D. M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," 2011.
- [106] W. Tong, *Wind Power Generation and Wind Turbine Design*. WIT Press, 2010.
- [107] L. Ran, P. A. Mawby, P. McKeever, and S. Konaklieva, "Condition monitoring of power electronics for offshore wind," *Engineering & Technology Reference*, 2012.
- [108] K. Ma, M. Liserre, F. Blaabjerg, and T. Kerekes, "Thermal Loading and Lifetime Estimation for Power Device Considering Mission Profiles in Wind Power Converter," *IEEE Transactions on Power Electronics*, vol. 30, no. 2, pp. 590-602, 2015.
- [109] F. Jia, Y. Lei, J. Lin, X. Zhou, and N. Lu, "Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data," *Mechanical Systems and Signal Processing*, vol. 72, pp. 303-315, 2016.
- [110] I. Goodfellow, Y. Bengio, and A. Courville, *Deep Learning*. Cambridge, MA, USA: MIT Press, 2016.
- [111] J. K. Kruschke, "Bayesian estimation supersedes the t test," *Journal of Experimental Psychology: General*, vol. 142, no. 2, p. 573, 2013.
- [112] H. Abdi, and L. J. Williams, "Principal component analysis," *Wiley interdisciplinary reviews: computational statistics*, vol. 2, no. 4, pp. 433-459, 2010.
- [113] J. Shen, X. Hao, Z. Liang, Y. Liu, W. Wang, and L. Shao, "Real-Time Superpixel Segmentation by DBSCAN Clustering Algorithm," *IEEE Transactions on Image Processing*, vol. 25, no. 12, pp. 5933-5942, 2016.