Data-driven optimization of brittleness index for hydraulic fracturing

Evaluation of brittleness index (BI) is a fundamental principle of a hydraulic fracturing design. A wide variety of BI calculations often baffle field engineers. The traditional value comparison may also not make the best of BI. Moreover, it is often mixed up with the fracability in field applications, thus causing concerns. We, therefore, redefine fracability as the fracturing pressure under certain rock mechanical (mainly brittleness), geological and injecting conditions to clarify the confusion. Then, we propose a data-driven workflow to optimize BIs by con- trolling the geological and injecting conditions. The machine learning (ML) workflow is employed to predict the fracability (fracturing pressure) based on field measurement. Three representative ML algorithms are applied to average the prediction, aiming to restrict the interference of algorithm performances. The contribution of brittleness on pressure/fracability prediction by error analysis (rather than the traditional method of BI-value comparison) is proposed as the new criterion for optimization. Six classic BI correlations (mineral-, logging-and elastic-based) are evaluated, three of which are optimized for the derivation of a new BI using the backward elimination strategy. The stress ratio (ratio of minimum and maximum horizontal principal stress), representing the geological feature, is introduced into the derived calculation based on the independent variable analysis. The reliability of the new BI is verified by error analyses using data of eight fracturing stages from seven different wells. Approximately 40% – 50% of the errors are reduced based on the new BI. The differences among the performances of algorithms are also significantly restrained. The new brittleness index provides a more reliable option for evaluating the brittleness and fracability of the fracturing formation. The machine learning workflow also proposes a promising application scenario of the BI for hydraulic fracturing, which makes more efficient and broader usages of the BI compared with the traditional value comparison.


Introduction
Brittleness index (BI), an essential parameter to evaluate the mineral, mechanical and failure characteristics of formations, is commonly applied in mining (e.g. rock cutting) and petroleum engineering (e.g. drilling and fracturing). [1][2][3] For hydraulic fracturing, it is an essential criterion for the optimization of "sweet point" that is easy to crack and tends to create complex fracture networksnamely the evaluation of fracability (how easy the reservoir can be cracked). [4][5][6] Initiating from the mineral component analysis, the BI calculations for hydraulic fracturing are involving more and more rock mechanical and engineering features. 7,8 Dozens of BI calculations are proposed for various formations with unique lithological and elastic properties, which could be roughly classified into (i) the mineral-based brittleness index (MBI), 9,10 (ii) the logging-based brittleness index (LBI), 11 and (iii) the elastic-based brittleness index (EBI). 12,13 Current efforts evaluate the BI calculations mainly based on numerical simulation, 14 logging data, 15 and static tests. 16 Limited test at field-practical scales is presented, thus restricting the reliability and persuasion of these BI calculations. However, the successful applications of machine learning techniques in petroleum engineering have paved a potential data-driven approach for BI optimization at field-practical scales. 17,18 The primary application of BI is to evaluate the fracability for hydraulic fracturing. However, the BI is often mixed up with the fracability in many fracturing applications, which has caused concerns about misleading the brittleness and ductility recognization of the targeting formation. [19][20][21] Some researchers believe that the BI is more like a rock type indicator. The geological properties should also play an important role in the evaluation of fracability, for instance, the in-situ confining stress, pore pressure, natural fractures and faults. [22][23][24] To distinguish fracability from BI, we propose that fracability should be an even broader concept that involves the injection parameters (pump rate, fluid and proppant properties) as well.
Therefore, we redefine fracability as the fracturing pressure, a more comprehensive response of the cracked reservoir under certain geological, rock mechanical and hydraulic injecting conditions. Then, a machine learning (ML) workflow is proposed for optimizing BIs by predicting the fracability/pressure based on field measurements. The prediction errors are analyzed to evaluate the contribution of each BI via a control variate method (controlling the interferences of geological, hydraulic and ML algorithm conditions). A new BI is derived by integrating the geological feature based on the optimization, variable analysis (optimizing variables according to their significance to the prediction) and backward elimination strategy (starting with all candidates and eliminating one at a time until minimizing the error), which significantly improves the predictions of fracability/pressure compared with classic BIs. Our optimization work clarifies the confusion between BI and fracability, promotes the calculation of BI and provides a more reliable option of BI for the recognition of targeting formation and design of fracturing schedule. It also demonstrates a promising application scene of BI based on the data-driven approach, which takes better advantage of BI than the traditional value comparisons.

Methodology
The field measurements of hydraulic fracturing are collected and used as the Basic dataset for comparisons. By appending the brittleness index to the Basic dataset, the BI datasets are built for data processing. Three classic machine learning algorithms, Multiple Linear Regression (MLR), Support Vector Regression (SVR), and Artificial Neural Network (ANN) are used for data processing to mitigate the interference of algorithm performances on the predicted fracability. The mean absolute error (MAE) and root mean squared error (RMSE) are deployed for the prediction analyses and BI optimization. The backward elimination strategy and variable analysis are applied for the derivation of the new BI, as shown in Fig. 1.

Data preparation
We collect 76 stages of shale gas fracturing and logging measurements (11,860 groups of data at minute level) in 7 wells (denoted as W1 to W7) from the Sichuan Basin (Longmaxi shale gas formation), China, as shown in Table 1. Each group of data involves parameters of fluid types (hydrochloric acid -HCL, slickwater, and gel), pump rate, proppant concentration, proppant mesh, and fracturing pressure. The logging interpretations for calculating BIs contain mineral components (Clay, Quartz, Carbonate and total organic carbon -TOC), Young's modulus, Poisson ratio, density, porosity, wave velocities and compressional slowness (DTC).
The training and predicting stages are specially designed (Table 1).
There are 68 stages from wells W1~W4 used for algorithm training. The first four predicting stages (Stage 3, 5, 7 and 9) are selected successively from the same wells (W1~W4), aiming to control the uncertainties from geological heterogeneity in the shale formation. As a result, the optimization of BI based on the prediction is more reliable by minimizing the interference of geological uncertainties. Four more predicting stages are selected from the first and second halves of the fracturing stages in two new wells, as shown in Table 1. W5 and W3 are neighbour wells. W6 is a distant well far from all others. W7 drills through different subdivided formations from all other wells. We choose representative stages from the first and second halves of a fracturing operation to restrict the potential influence of human factors on fracturing pressures because operators usually learn experiences from precedents and perform better in subsequent fracturing stages. The categorical data encoding and feature scaling are carried out during the data pre-processing. There are three fluid types and one special fluid condition (the pump off, denoted as N/A) in our dataset, which are string variables and can not be used for algorithm training directly. Therefore, the fluid type is encoded by the OneHotEncoder and transformed into a combination of three binary digits (i.e. 0, 0 and 1) after deleting the dummy variable. 25 The fluid type used to be one column of strings is then transformed into three columns of binary code when they are fed into the algorithm. The proppant type is presented by its averaged diameter. All input variables are standardized by removing the mean and scaling to unit variance. The datasets, inputs and outputs are summarized in Table 2. The pump rate, fluid type, proppant concentration, averaged proppant diameter and BI are used as inputs for the algorithm training, in which the hydraulic parameters are set as control variables. The machine learning algorithm is trained by the inputs to predict fracturing pressures. The predictions are then compared with field records for error analyses. Both the Basic and BI datasets contain all stages in Table 1.

Brittleness index for optimization
Six representative BI calculations are selected from MBI, LBI and EBI categories for evaluations, as summarized in Table 3. BI 1 and BI 2 , widely used for shale gas fracturing, are usually obtained by laboratory tests and reflect the mechanical properties of the rock. BI 3 are empirical equations based on data from the shale gas wells in the Permian Basin (U.S.), which evaluates the BI based on the logging response of underground formations. The sum of the calculations is utilized for BI 3 . BI 4 is the energy dissipation per unit area during the process of new fracture creation. The fracture toughness (KIC) represents the resistance of rock to fracture propagation. BI 5 and BI 6 are deformations and extensions of BI 2 . The correlations in Table 3 assess the brittleness from different aspects (mineral components, rock mechanism, energy dissipation, logging interpretation, etc.) using various data sources (from both surface and subsurface). The calculation biases are inevitable because the brittleness is the synthetical result of multifactor. Therefore, the machine learning workflow is applied to evaluate the performances of BIs. Each BI is calculated by original data and fed into the algorithm as an input feature. Then, the prediction errors based on each correlation are analyzed and used as the criterion for BI optimization.

Machine learning algorithm and workflow
Three different machine learning algorithms are deployed to restrict the potential interferences induced by the algorithm performances. The MLR is used for both pressure prediction and variable analysis. The SVR, with a radial basis function (RBF) kernel, is capable of both linear and non-linear regression. 30,31 A three-layer ANN (input, hidden and output layers) is constructed under the Tensorflow frame with 20 neural units in the first and hidden layers referring to previous research. 32,33 The dropout regularization (the rate of 0.2) is set after the first and hidden layers to avoid overfitting. 34 The Adam routine is selected as the optimizer to compile the model, where a callback function is applied to return and automatically update the learning rate. 35 Other hyperparameters are optimized by the Grid search and K-fold cross-validation, 36,37 including the training epochs (30) and batch size (50).
The detailed data processing workflow, integrating BI calculations and machine learning algorithms, is illustrated in Fig. 2. The original data is pre-processed by the calculations in Table 3 for BIs. Considering the division design in Tables 1 and 2, the geological and hydraulic effects on fracturing pressure/fracability are restricted, thus boosting the Logging-based dominance of BI on predictions. Each BI-based workflow processes the original measurements respectively. The predicted pressures based on the Basic (donated as the Baseline prediction) and BI i datasets are compared with field records for error analyses and BI optimization. The remaining errors are improved by the derivation of a new BI based on the backward elimination strategy. The geological features are selected based on the variable analysis, and then introduced as the correction coefficient into the new BI calculation for promotions. The derivation details are presented in Appendix B. Finally, the workflow is executed again to verify the performance of the new BI.

Results
The prediction of pressure/fracability based on the Basic dataset is set as the reference. The error between the reference and field-recorded pressure is denoted as the baseline error, and is used to measure the promotion based on the BI dataset. The error improvement is applied as the criterion of BI optimization, and is quantified by the mean absolute error (MAE) and root mean squared error (RMSE). Based on the optimization, a new BI equation is derived via variable analysis and backward elimination strategy (the detailed process is presented in Appendix B), which is then verified based on field applications.

Predictions of fracturing pressure
Based on the data-processing workflow in Fig. 2, the predicted fracturing pressure/fracability produced by different algorithms are presented in Fig. 3 using Stage 5 from W2 as an example. The predictions based on BI 3 (the blue dashed curve) and only hydraulic parameters (the red dashed curve) are compared with field measurements (the solid black curve). Generally, predictions based on BI 3 fluctuate around the field curve and exhibit higher precision than baseline predictions. BI 3 introduces the logging data and porosity of the targeting formation, thus improving the interpretation of fracabiliby and reducing the error significantly.
The BI 3 -based workflow (used as an example) is deployed to process the data from all eight testing stages. The errors of these testing stages are presented in Appendix A. Generally, the errors increase apparently in the last four stages from W5~W7 compared with errors based on W1~W4. Stages 3, 5, 7 and 9 are from the same wells (W1~W4) as the training stages (Table 1), which may constrain the geological uncertainties because of the close location and result in better predictions. The testing errors are averaged and used to characterize the error improvement by introducing the BI 3 , as shown in Fig. 4. The averaged

Error analyses and BI optimization
The error analyses in Table A1 (Appendix A) and Fig. 4 are conducted for five more iterations based on BI i -based workflow (Fig. 2). The algorithms are trained by BI i datasets (i = 1, 2, 4, 5 and 6), respectively. The errors are analyzed and presented in Fig. 5, where the error produced by the Basic dataset is used as a reference. Among all the BIs, BI 6 yields the largest MAE and RMSE, followed by BI 4 . The performances of BI 2 and BI 3 are comparable, which reduces around 12%-16% of the errors. BI 1 performs slightly better in controlling errors than the performances of BI 2 and BI 3 . BI 5 produces the minimum errors compared with others, and diminishes the MAE and RMSE by 20% and 18%, respectively. Besides, the performance of the logging-based BI 3 is unexpected since the equations are fitted using data from a different basin.
The performances of ANN and SVR are better than that of the MLR algorithm for most of the predictions, as shown in Fig. 5. Therefore, the predictions based on ANN and SVR algorithms are used for the stage-bystage error analysis. Noteworthy, an appropriate BI may restrain the difference in the algorithm performances as the results based on BI 3 in Fig. 5 (noted by the vertical dashed line). The results based on the BI 5based workflow (with the minimum errors) are expanded as examples for the stage-by-stage analysis among different wells, as shown in Fig. 6. The MAEs of the first four stages are controlled under 10% because the algorithms are trained by data from the same wells (W1~W4). The errors of W5 mildly increase, which may be explained by the adjacent relationship between W5 and W3, sharing similar geological properties. The algorithms produce the largest errors for W6 and W7, wells from a far distance and drilled through a different subdivided formation, respectively. The error improvements based on classic BIs mainly occur in the first four wells. Therefore, a new BI is necessary to promote its interpretation capacity for broader deploying scenarios, and then improve the errors for W6 and W7. The BI 1 , BI 2 , BI 3 , and BI 5 perform better according to the error analyses (Fig. 5) and are optimized for the new BI derivation.

New BI derivation
The new BI is designed to involve as many characteristics as possible to deliver more features into the algorithm training, thus improving the prediction. The backward elimination strategy (based on SVR) and independent variable analysis (based on MLR) are employed to optimize the expression and new parameters. The derivation process is explained in Appendix B. The expression of the new BI is given by where BI in is the normalized BI i (i = 1, 3, and 5), in which the mineral component (BI 1n ) exerts the most significant contribution followed by BI 3n and BI 5n according to the results of the backward elimination (Appendix B, Table B1); Stress min and Stress max are minimum and maximum horizontal principal stress, and are optimized by variable analyses (Appendix B, Table B2).
The new calculation involves the mineral component, mechanical properties, logging interpretation and geological stresses, thus providing a more comprehensive evaluation method of brittleness. The application range of the new BI is constrained by the data source used for the derivation and verification. It is proposed based on field measurements from the shale gas wells in the Sichuan Basin, China. Broader assessments based on different basins are currently infeasible due to the data limitation. However, the BI 3 fitted by the field data from the U.S. exerts relatively high performance in processing the data from China (Fig. 5). We believe that the universality of the new BI (Eq. (1), involving BI 3 ) can also be extensive.

Performance of new BI
The performance of the new BI (Eq. (1)) is demonstrated by averaging the errors based on eight testing stages and three algorithms (MLR, SVR, and ANN), as shown in Fig. 7. The single BI 1~B I 6 decreases MAE and RMSE by around 20% compared with the baseline errors. However, the new BI cuts down 50% of MAE and 40% of RMSE, respectively. The new BI also restrains the differences in algorithm performances, demonstrating the promotion of predictions based on Eq. (1).
Significant improvements are observed in cases of W6 and W7 based on the new BI, as shown in Fig. 8. Those errors are rarely reduced by BI 5 as shown in Fig. 6. The new BI suppresses the MAE of the neighbour well (W6) under 10%, which will benefit the evaluation of fracability for the well-factory mode fracturing (several neighbour wells are drilled and fractured in one platform). The experience of previous fracturing operations can be extracted by the new BI for pressure prediction and schedule optimization for new neighbour wells. Meanwhile, approximately 55% of MAE and 52% of RMSE, on average, are diminished for the W7 case. The new BI provides better interpretations of the new subdivided formation and helps the machine learning algorithms to yield more accurate predictions. Therefore, the higher performance of the new BI is demonstrated according to the error comparisons with classic BIs.

Analysis of remaining errors
We use error reductions (relative to the baseline reference) as the criterion of BI optimization. Therefore, the final errors in Fig. 8 may remain high for the W7 case, especially for Stage 10. The precise prediction of fracturing pressure is beyond the scope of this work since the performance of the new BI is already demonstrated by the significant improvement in MAE and RMSE. Future studies focusing on the pressure/fracability prediction may concern the remaining errors in Fig. 8 and restrict the errors by introducing more dominating features or advanced machine learning algorithms. In addition, slight error increases are observed for W1. The hydraulic parameters may determine the prediction for W1, referring to the relative low MAE and RMSE yield by the Basic dataset where only pumping parameters are involved. The introduction of BI may provide limited contribution or even interference to the algorithm training and the predictions. Similar results are inspected in the case of W3, where low errors and limited promotion of the BIs are obtained.

Limitations and implications
The new BI is tested by eight fracturing stages from seven different wells, resulting in satisfactory performances. The limitation of this work may still exist in the complex expression of the new BI (3-BIs-combination that involves mineral, logging and elastic parameters) that increases the assignment of data collection and calculations. We, therefore, mention that a 2-BIs-combination (Eq. B6in Appendix B, Table B1) may be a convenient substitute for on-site manual computations. Noteworthy, the capacity of the 2-BIs-combination is likely to be restrained, especially when the missing features (compared with the 3-BIs-combination) are determinants of the brittleness.
The new BI (considering mineral, mechanical, logging and geological features) provides a more accurate and comprehensive recognization of formation, thus improving the "sweet point" optimization and stage partition. According to the derivation, the usage of the new BI is more complicated (based on a machine learning workflow) than the traditional value comparison. Fitting the correlation between the new BI and fracability (i.e. the maximum fracturing pressure) based on a mass of field cases may produce a similar value-based criterion as the traditional usage. However, we believe that it is more important to make better and broader usages of the brittleness index. Traditional value comparison of BI can only generate a qualitative adjustment about the fracturing difficulty. The usage of comparison results highly depends on the experiences of field engineers, which may induce uncertainty and bias. In contrast, the data-driven method in this paper provides a more powerful approach to producing informative numerical predictions (such as the fracturing pressure). It is possible to use the new BI as a control variable, and then regulate the injection parameters (pump rate, fluid viscosity, etc.) for pressure prediction, thus implementing precise optimization of the pre-fracturing design based on the data-driven workflow.

Conclusions
Six classical mineral-, logging-and elastic-based BI correlations are evaluated by processing field measurements of shale gas fracturing based on machine learning workflows. The fracability is redefined as the fracturing pressure dominated by rock mechanical (mainly brittleness), geological, and hydraulic conditions (pumping parameters), which clarifies the confusion between brittleness and fracability. The control variate method is applied to restrain the effects of geological and injecting aspects and enhance the dominance of brittleness on fracability. Classic BIs are evaluated by predicting the fracturing pressure based on three different algorithms, trying to mitigate the interferences of algorithm performances. The data-driven approach makes better use of the BI for informative numerical predictions compared with the conventional way of value comparison that only generates a qualitative evaluation. The major conclusions may be generalized as follows: (1) The elastic-based BI produces the minimum errors for pressure predictions, followed by the logging-based BI, which is unexpected since the logging-based BI is empirical equations based on data from a different basin. The mineral-based BI also exerts a significant contribution to predictions and is crucial for the new BI derivation according to the backward elimination analysis (Appendix B). Moreover, an appropriate BI may restrain the different performances of various machine learning algorithms. The new BI provides a more reliable option for recognizing the formation and may be essential for pre-fracturing design and prediction of operating pressure.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. close location and result in better predictions.

Appendix B. Derivation of the new BI
The BI 1 , BI 2 , BI 3 , and BI 5 produce smaller errors according to the error analyses (Fig. 5) and are optimized for the new BI derivation. The SVR is used for derivation because of its relatively high accuracy and computational efficiency, referring to Fig. 5. The new BI is designed to involve as many characteristics as possible to deliver more features into the algorithm training, thus improving the prediction. Therefore, the initial expression of new BI is proposed by averaging the normalized BIs referring to previous efforts 12,29 BI New− 1 = (BI 1n + BI 2 + BI 3n + BI 5n )/4 where BI in is the normalized BI i (i = 1, 3, and 5). BI 2 is an average of the normalized Young's modulus and Poisson ratio (Table 3), thus remaining the original value.
The backward elimination strategy (starting with all candidates and eliminating one at a time until minimizing the error) is deployed for the new BI derivation. Eq. (B1), containing all four selected BIs, is firstly applied for algorithm training, which is denoted as the All-in step. The prediction error in this step is used as a reference. Then, three out of the four BIs are freely assembled for four 3-BIs-combinations, in which the one with the minimum error is chosen for further derivation (Iteration 1 in Table B1). If the minimum errors are not larger than the All-in errors, the chosen 3-BIs-combination will be the new reference for the next round comparisonan elimination process. The elimination process repeats for smaller-scale combinations until the new error is larger than the reference (Iteration 2 in Table B1).
The averaged errors based on all eight testing stages are summarized in Table B1. All new BI combinations promote the model performance compared with the errors based on BI 5 . Eqs. (B2) and (B8), without BI 1 , produce the most prominent errors in each iteration, indicating the significance of mineral components for the interpretation of fracability. Eq. (B6) requires fewer parameters (2-BIs-combination) and produces relatively small errors, thus may be a convenient substitute for on-site estimations. Eq. (B3) results in the minimum errors and is selected for further derivation. According to the error improvements during Iteration 2 in Table B1, BI 5n contributes 7.69% of MAE reduction to Eq. (B3), from 0.130 to 0.120. BI 3n and BI 1n decreased 12.3% and 26.4% of MAE, respectively, based on the errors of Eqs. (B7) and (B8).
New parameters are tested and introduced into the expression to promote BI performance. A series of geological and engineering candidates are applied for variable analyses (Table B2). The parameters involved in Eq. (B3) and hydraulic parameters are set as the control variables. The well depth, pore pressure, stage length, number of clusters, and stress ratio (ratio of minimum and maximum horizontal principal stress) are appended to the control variables and fed into the algorithm independently and successively (Tests 1-5 and All-in test), as shown in Table B2. Correspondingly, a series of MLR workflow is constructed to process the training dataset. The new parameter is optimized by referring to the variable coefficient and P valuesthe significance of a variable to the pressure prediction. Since the feature scaling is performed, the larger coefficient (absolute value) indicates a more significant impact of the corresponding variable on the fracturing pressure.
The stress ratio, with the largest coefficient in Tests 1-5 and All-in test (Table B2), is chosen as the new parameter and used as the correction coefficient for the new equation. The expression of the new BI, therefore, is given by where Stress min and Stress max are minimum and maximum horizontal principal stress, respectively.

Table B1
Selection of the BI-combinations based on the backward elimination strategy.