Machine-learning-based pilot symbol assisted channel prediction

In this paper, machine learning (ML) algorithms are used for channel prediction in wireless communications. The performances of five ML algorithms are compared in terms of the prediction accuracy and the symbol error rate (SER) of different modulation schemes based on the prediction. The results show that, for channel prediction, the support vector machine (SVM) has the best performance in terms of accuracy and stability. For signal detection, SVM and linear regression (LR) have their own advantages in different ranges of the signal-to-noise ratio (SNR). At high constellation sizes, the ML methods give performances similar to the existing scheme. From the numerical examples, the SERs based on SVM and LR can both reach lower than 10⁻³ in binary phase shift keying and 16-ary quadrature amplitude modulation signalling, and can reach 1.13 × 10⁻² and 4.28 × 10⁻³ in 16-ary phase shift keying signalling respectively. In terms of prediction time, SVM is more efficient.


INTRODUCTION
error (MMSE). Generally speaking, the LS estimator estimates the wireless channel without using its statistical information, but the method is sensitive to interference [5]. The MMSE method makes use of the noise variance and the correlation of the channel, which increases the computational complexity, as calculating the sample correlation matrix is complicated [6].
On the other hand, in recent years, machine learning (ML) algorithms have been widely applied to various fields, such as natural language identification and image processing, because of their good performance and popularity [7]. There are two different types of ML: classification and regression [8]. Since the estimation of the wireless channel is essentially a regression problem, ML can be applied to channel estimation in wireless communications. By using ML algorithms, the channel gain can be calculated without the channel correlation matrix, which leads to higher efficiency. For example, in ref. [9], a low-complexity channel estimator based on the ML method and the MMSE structure was proposed. In the estimator, the model parameters were learned instead of fine-tuned for different channel models. An ML-based time-division duplex scheme was presented in ref. [10]. In this scheme, channel state information (CSI) was obtained based on the temporal channel correlation, and the estimation performance was optimised. In ref. [11], a real-valued sparse Bayesian learning approach was developed to estimate the downlink channel of a massive multiple-input multiple-output (MIMO) system. By converting the complex-valued channel recovery problem into a real one, the computational complexity was significantly reduced. Deep learning (DL) methods were used in ref. [12] to resolve the estimation of fast time-varying MIMO orthogonal frequency division multiplexing (OFDM) channels. The results showed that DL models outperformed traditional algorithms in both accuracy and robustness. In ref. [13], a cost-efficient convolutional neural network was used to classify the modulation of radio signals for various distortions and noise. Its accuracy can reach over 93% at a signal-to-noise ratio (SNR) of 20 dB. In ref. [14], a DL-based super-resolution network and an image restoration network were cascaded for channel estimation as ChannelNet, and in ref. [15], a residual learning based deep neural network called ReEsNet was proposed and compared with ChannelNet in ref. [14].
As the CSI between two relatively moving transceivers is correlated in time, an increasing amount of research has recently focused on wireless channel prediction. Generally, there are two main types of channel prediction: autoregressive model prediction and parametric model prediction [16]. For example, in ref. [17], a recurrent neural network (RNN) based real-time channel predictor was proposed. By using historical CSI for training, the network achieved CSI prediction. In ref. [18], long short-term memory and gated recurrent unit were used, and the prediction accuracy was further improved. Among the works on channel prediction, none of them has predicted the wireless channel using the noisy received signals. Moreover, as the latest advance in ML, DL has good accuracy due to long training and large-scale data [19]. Traditional ML algorithms are rarely used in recent works. In refs. [20] and [21], support vector machine (SVM) was used for channel estimation in massive MIMO systems with one-bit analogue-to-digital converters. An SVM-based channel estimation method and a two-stage signal detection method were proposed. However, traditional ML algorithms, which produce results without large-scale data or long training and do not need high-performance GPUs, CPUs, or SSDs, as in DL, are a good choice for real-time prediction [7,22]. Motivated by these observations, in this work, the feasibility of channel prediction using traditional ML algorithms will be explored. Five different traditional ML algorithms are used to build predictive models using the noisy received signals and historical CSI: 1) random forest (RF), 2) SVM, 3) linear regression (LR), 4) decision tree (DT) and 5) ensemble regression (ER). All the methods in this research are based on the Statistics and Machine Learning Toolbox of Matlab R2019b. The channel state information will be extrapolated from the predictive models. All the used algorithms are supervised regression algorithms.
Their performances are calculated and compared.
The rest of the paper is organised as follows. In Section 2, the system model will be described. In Section 3, the sample size, the training size, and different algorithms will be considered and selected for further use. In Section 4, signal detection will be simulated using the predicted channel based on the selected algorithms and parameters and the results will be discussed. In Section 5, the conclusions will be drawn.

SYSTEM MODEL
The data symbols are assumed to be transmitted and received in frames, each of which contains K symbols. Within the K symbols of a frame, the first one is a pilot symbol and the other K − 1 symbols are data symbols. All the symbols are from a signal set of M possible values. The value of the pilot symbol is known as b. The received signal from the wireless channel can be represented as

r(t) = c(t)s(t) + n(t),    (1)

where c(t) is the complex channel gain, s(t) is the transmitted signal, and n(t) is the additive white Gaussian noise (AWGN), which is a zero-mean Gaussian random process. The transmitted signal s(t) can be written as

s(t) = Σ_i b(i) p(t − iT),    (2)

where b(i) is the value of the ith transmitted symbol, T is the symbol duration, and p(t) is the shaping pulse with energy E_p. The complex channel gain c(t) is a complex Gaussian random process with variance 2σ_c², and it can be represented as

c(t) = [m_R(t) + c_R(t)] + j[m_I(t) + c_I(t)],    (3)

where m_R(t) and m_I(t) are the means of the real and imaginary parts, and c_R(t) and c_I(t) are zero-mean Gaussian random processes. If the channel is Rayleigh fading, one has

p_|c(t)|(x) = (x/σ_c²) exp(−x²/(2σ_c²)), x ⩾ 0,    (4)

where p_|c(t)|(x) is the probability density function (PDF) of the channel amplitude with parameter σ_c², E{|c(t)|²} = 2σ_c², and E{·} is the expectation. If the channel is Rician fading, one has

p_|c(t)|(x) = (x/σ_c²) exp(−(x² + A²)/(2σ_c²)) I_0(Ax/σ_c²), x ⩾ 0,    (5)

where A is the peak value of the line-of-sight amplitude, and I_0 is the modified Bessel function of the first kind with order zero. The autocorrelation function of c(t) can be represented as

R_c(τ) = 2σ_c² R̂_c(τ),    (6)

where R̂_c(τ) is the normalised autocorrelation function. In this research, the scattering in the fading channel is assumed to be isotropic. Thus, one has

R̂_c(τ) = J_0(2π f_D τ),    (7)

where J_0 is the zeroth-order Bessel function of the first kind and f_D is the maximum Doppler spread in the channel. For simplicity, the line-of-sight component of the fading process is assumed to be constant. Thus, m_R(t) can be represented as m_R, and m_I(t) as m_I. Then, the local mean power of the line-of-sight component in the Rician fading channel can be defined as [1]

Ω = m_R² + m_I²,    (8)

and the Rician K factor can be defined as [1]

R_K = Ω/(2σ_c²).    (9)

In this research, different values of R_K will be used to examine the performance of the predictor.
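To make the fading model concrete, a correlated Rayleigh process whose normalised autocorrelation approximates J_0(2π f_D τ) can be generated with a sum-of-sinusoids (Clarke) simulator. The following Python sketch is for illustration only (the paper's simulations use the Matlab Statistics and Machine Learning Toolbox); the path count, the seed, and the normalisation E{|c(t)|²} = 1 are assumptions:

```python
import numpy as np

def rayleigh_fading(n_samples, fd_T, n_paths=64, seed=0):
    """Correlated Rayleigh fading c(iT) via a sum-of-sinusoids (Clarke)
    model; with isotropic scattering the autocorrelation approximates
    J0(2*pi*fD*tau)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)                     # time in symbol periods T
    theta = rng.uniform(0, 2 * np.pi, n_paths)   # angles of arrival
    phi = rng.uniform(0, 2 * np.pi, n_paths)     # random initial phases
    # Each path sees Doppler fD*cos(theta); the sum is approximately
    # a zero-mean complex Gaussian process.
    args = 2 * np.pi * fd_T * np.cos(theta)[:, None] * t[None, :] + phi[:, None]
    c = np.exp(1j * args).sum(axis=0) / np.sqrt(n_paths)
    return c  # normalised so that E{|c(t)|^2} is approximately 1

c = rayleigh_fading(5000, fd_T=0.01)
print(np.mean(np.abs(c) ** 2))  # time-averaged fading power, near 1
```

Sampling this process at the pilot positions t = iKT gives the channel gains used as labels in the later sections.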
The received signal will be sampled with a duration T. Thus, the ith sampled symbol can be represented as

r_i = c(iT) b_i E_p + n_i,    (10)

where c(iT) is the complex Gaussian channel gain sample, b_i is the value of the ith symbol, and n_i is the noise sample whose mean is zero and whose variance is σ_n² = N_0 E_p. For simplicity, in the system, it is assumed that the 0th symbol of the transmitted signal is a pilot symbol [4], and the first to the (K − 1)th symbols are data symbols.
Thus, the effective SNR per bit can be represented as [3]

γ = σ_c² E{|b_i|²} E_p / N_0,    (11)

where σ_c² = (1/2) · 2σ_c² is the average fading power, and E{|b_i|²} is the mean signal energy.
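The sampling model above can be sketched as follows. The matched-filter form r_i = c(iT) b_i E_p + n_i with noise variance σ_n² = N_0 E_p, together with E_p = 1 and unit average fading power, are assumptions for illustration:

```python
import numpy as np

def received_symbols(c, b, snr_db, Ep=1.0, seed=1):
    """Sample r_i = c(iT) * b_i * Ep + n_i, where n_i is complex AWGN
    with zero mean and variance sigma_n^2 = N0 * Ep."""
    rng = np.random.default_rng(seed)
    # With unit average fading power and unit-energy symbols, SNR = Ep / N0.
    N0 = Ep / (10 ** (snr_db / 10))
    sigma_n = np.sqrt(N0 * Ep)
    n = sigma_n * (rng.standard_normal(len(b))
                   + 1j * rng.standard_normal(len(b))) / np.sqrt(2)
    return c * b * Ep + n

c = np.ones(4, dtype=complex)                  # toy flat channel
b = np.array([1, -1, 1, -1], dtype=complex)    # BPSK symbol values
r = received_symbols(c, b, snr_db=20)
print(np.round(np.abs(r - c * b), 3))          # small noise perturbations
```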
It is assumed that the previous S S pilot symbols situated in S S previous frames are used to predict the channel gain in the future to detect the data symbols. S S is the sample size. Also, S T is the number of frames of data used to train the predictive models, as the training size. The received pilot symbols are included in the training set X . The channel gains are included in the training set Y . The structure of training sets is shown as Figure 1.
For example, when S_S = 50, S_T = 10 and the frame size K = 5, there is a training set X of size 50 × 10 and a training set Y of size 1 × 10. For the prediction of the channel gain at the 61st pilot position, c(300T), the first line of the training set X contains the first, second, …, 50th received pilot symbols, i.e. r_0, r_5, …, r_245, and the first line of the training set Y is the channel gain at the 51st pilot position, i.e. c(250T). Similarly, each subsequent line is shifted forward by one frame. Each prediction is given by the 5 ML algorithms according to such training sets, covering the previous S_S + S_T pilot symbols and S_T complex channel gain values. Their processing is shown by the flow graph in Figure 2. The best choices of S_S and S_T will be tested in Section 3. The accuracy of the prediction will be examined in the pilot symbol assisted modulation (PSAM) signal detector and compared with the perfect channel knowledge case, which uses the true value of the channel gain. The detector can be represented as [3]

b̂_i = arg min_b |r_i − X_i b E_p|²,    (12)

where b̂_i is the data decision, the minimisation is over the M possible symbol values b, X_i is the prediction result in the ith frame, and the other symbols are defined as before.
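The sliding-window construction of the training sets in the example above (S_S = 50, S_T = 10, K = 5) can be sketched as follows. The orientation (one window of S_S pilots per row) and the helper names are illustrative choices, not the paper's Matlab implementation:

```python
import numpy as np

def build_training_sets(r_pilots, S_S, S_T, target):
    """Build the training inputs for predicting the channel at pilot
    index `target`: row j of X holds received pilots j..j+S_S-1 (shifted
    by one frame per row), the label for row j is the channel gain at
    pilot j+S_S, and x_query is the window used for the prediction itself."""
    first = target - S_S - S_T
    X = np.array([r_pilots[first + j : first + j + S_S] for j in range(S_T)])
    y_idx = np.array([first + j + S_S for j in range(S_T)])
    x_query = r_pilots[target - S_S : target]
    return X, y_idx, x_query

S_S, S_T, K = 50, 10, 5
r_pilots = np.arange(100)   # stand-in pilot stream; pilot i sits at time i*K*T
X, y_idx, x_query = build_training_sets(r_pilots, S_S, S_T, target=60)
print(X.shape, y_idx[0], y_idx[-1])
```

With K = 5, label index 50 corresponds to c(250T) and the query target 60 to c(300T), matching the worked example in the text.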

CHOICES OF KEY PARAMETERS FOR PREDICTION
In this section, the normalised root mean square error (NRMSE) will be used to represent the prediction accuracy as

NRMSE = sqrt( (1/n) Σ_{t=1}^{n} (ŷ_t − y_t)² ) / (y_max − y_min),    (13)

where ŷ_t is the predicted value, y_t is the actual value, y_max and y_min are the maximum and minimum actual values, and n is the number of predictions. Equation (13) is used to calculate the NRMSE of the real part and the imaginary part of the predicted channel gain separately. The final NRMSE is their average.
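A minimal implementation of the metric, assuming the RMSE is normalised by the range of the actual values and then averaged over the real and imaginary parts as described in the text:

```python
import numpy as np

def nrmse(y_pred, y_true):
    """RMSE normalised by the range of the actual values."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / (np.max(y_true) - np.min(y_true))

def complex_nrmse(c_pred, c_true):
    """Average of the NRMSEs of the real and imaginary parts."""
    return 0.5 * (nrmse(c_pred.real, c_true.real)
                  + nrmse(c_pred.imag, c_true.imag))

y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = y_true + 0.3          # constant error: RMSE ~0.3 over range 3
print(nrmse(y_pred, y_true))
```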

Sample size
Sample size is the number of pilot symbols before the ith pilot position that will be learned in one row of a data set. In this subsection, different values of the sample size will be tested from 25 to 250 with a step size of 25. The prediction will be averaged over 1000 runs. In Figure 3, the NRMSE of each algorithm is shown. All of them decrease with the sample size in general. Among them, for RF, the lowest error is recorded as 0.1053 when S_S = 250, while the highest error occurs as 0.1305 when S_S = 75. The mean value of the prediction NRMSE is 0.1174. This gives a mean prediction accuracy of 88.26%. Next is SVM, whose lowest error is recorded as 0.0378 when S_S = 250, and the highest error occurs as 0.1630 when S_S = 50. The mean value of the NRMSE is 0.0799, and the mean prediction accuracy is 92.01%. For LR, the lowest error is recorded as 0.0425 when S_S = 250, while the highest error occurs as 0.5118 when S_S = 100. The mean value of the prediction NRMSE is 0.1346, and the mean prediction accuracy is 86.54%. For DT, the lowest error is recorded as 0.1231 when S_S = 175, while the highest error occurs as 0.1534 when S_S = 25. The mean value of the prediction NRMSE is 0.1374, and the mean prediction accuracy is 86.26%. Finally, for ER, the lowest error is recorded as 0.1187 when S_S = 175, while the highest error occurs as 0.1440 when S_S = 50. The mean value of the prediction NRMSE is 0.1315, and the mean prediction accuracy is 86.85%.
These results are based on S_T = 100 and SNR = 20 dB. They show that the channel prediction error in general decreases as S_S increases, because a larger sample size provides more information on the fading process. In addition, when S_S is above a certain value, the prediction errors remain relatively stable, which means that data farther away from the desired time helps the prediction less. Similar tests have been done for different S_T and SNR. Generally, when S_S ⩾ 200, the NRMSE is relatively stable, especially for SVM and LR. Hence, considering the trade-off between complexity and accuracy, S_S = 200 is chosen as the sample size in later studies. The NRMSE of LR jumps at S_S = 100: further tests show that the LR algorithm is unstable when S_S = S_T, because the training set X is then square, which disturbs the fitting of the linear regression model.

Training size
In this subsection, different training sizes will be tried to examine the system performance. The training size represents how many rows of data are used in a data set and learned to make a prediction. The prediction will also be averaged over 1000 runs, for training sizes from 25 to 250 with a step size of 25. In Figure 4, the NRMSEs of SVM and LR show an upward trend as the training size increases. These results are based on SNR = 20 dB. Similar tests have also been done for different SNRs. Generally, when S_T ⩽ 150, the NRMSE is relatively stable with good accuracy, especially for SVM and LR. The results show that, in dynamic wireless channel conditions, the neighbouring batches of data points are more effective for real-time channel prediction. Hence, for a balance between complexity and accuracy, S_T = 100 is chosen as the training size in later studies.

Chunks
In this subsection, to compare the performances of the different algorithms, the test is done in several 'chunks' of data. The dataset is divided into chunks to examine the stability of these algorithms over different data intervals. In this research, the sample size is set to 200 and the training size to 100 to build the prediction model. In each chunk, 1000 data points are predicted based on the learning of the preceding 300 data points, and the next 1000 + 300 data points are used in the next chunk, and so on. For example, as shown in Figure 5, the first chunk has the first to 1300th data points and the second chunk has the 1301st to 2600th data points. The mean NRMSEs of 10 chunks will be calculated and compared in the following.
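The chunk layout described above can be sketched as a simple index computation; the half-open, 0-based intervals are an illustrative convention:

```python
def chunk_bounds(n_chunks, train_len=300, predict_len=1000):
    """Each chunk holds train_len + predict_len consecutive data points;
    chunk k covers the half-open interval [k*size, (k+1)*size)."""
    size = train_len + predict_len
    return [(k * size, (k + 1) * size) for k in range(n_chunks)]

bounds = chunk_bounds(10)
print(bounds[0], bounds[1])   # (0, 1300) (1300, 2600)
```

In 1-based terms this matches the text: the first chunk is points 1 to 1300 and the second chunk is points 1301 to 2600.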
In Figure 6, the average NRMSEs for RF, SVM, LR, DT, and ER are 0.1088, 0.0392, 0.0467, 0.1251, and 0.1194 respectively, and the mean value of these errors is 0.0878. Overall, the algorithm with the best performance is SVM, which gives a mean prediction accuracy of 96.08%. The lowest error of SVM is recorded in chunk 3 as 0.0389, while the highest error of SVM occurs in chunk 10 as 0.0396. The range of errors for SVM is 0.0007. Next, LR gives a mean prediction accuracy of 95.33%. The lowest error of LR is recorded in chunk 3 as 0.0434, while the highest error of LR occurs in chunk 1 as 0.0523. The range of errors for LR is 0.0089. Next is RF, which gives a mean prediction accuracy of 89.12%. The lowest error of RF is recorded in chunk 2 as 0.1077, while the highest error of RF occurs in chunk 4 as 0.1106. The range of errors for RF is 0.0029. Next, ER gives a mean prediction accuracy of 88.06%. The lowest error of ER is recorded in chunk 8 as 0.1138, while the highest error of ER occurs in chunk 2 as 0.1289. The range of errors for ER is 0.0151. The algorithm with the worst performance is DT, which gives a mean prediction accuracy of 87.49%.

NUMERICAL RESULTS AND DISCUSSION
In this section, the prediction will be examined in the PSAM detector to compare the signal detection accuracy. The symbol error rate (SER) is used to represent the detection accuracy as

SER = N_e / N,    (14)

where N_e is the number of erroneous symbol decisions and N is the total number of detected data symbols.
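A minimal sketch of the PSAM decision and the SER computation: each received symbol is mapped to the constellation point that minimises the Euclidean distance given the predicted channel gain. The vectorised form and E_p = 1 are assumptions for illustration:

```python
import numpy as np

def detect(r, c_hat, constellation, Ep=1.0):
    """Pick, per received sample, the constellation point b minimising
    |r - c_hat * b * Ep|^2 (minimum-distance PSAM decision)."""
    r = np.asarray(r)[:, None]
    c_hat = np.asarray(c_hat)[:, None]
    d = np.abs(r - c_hat * constellation[None, :] * Ep) ** 2
    return constellation[np.argmin(d, axis=1)]

def ser(b_hat, b_true):
    """Fraction of erroneous symbol decisions."""
    return np.mean(b_hat != b_true)

bpsk = np.array([1.0 + 0j, -1.0 + 0j])
c_hat = np.array([0.8 + 0.2j, 0.8 + 0.2j])   # predicted channel gains
b_true = np.array([1.0 + 0j, -1.0 + 0j])
r = c_hat * b_true                            # noiseless toy case
print(ser(detect(r, c_hat, bpsk), b_true))    # 0.0
```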

SNR
In this subsection, channel prediction and signal detection will be simulated separately. The performances of the prediction systems for different values of SNR are compared with that of the MMSE-based estimator. In this test, S_T = 100, R_K = 8, and the modulation type is BPSK. In Figure 7, when S_S = 200, the NRMSEs of the five algorithms all reduce with the increase of SNR. Among them, for RF, the lowest error is recorded as 0.1106 when SNR = 30 dB, while the highest error occurs as 0.1857 when SNR = −5 dB. The mean value of the prediction NRMSE is 0.1362, and thus the mean prediction accuracy is 86.38%. For SVM, the lowest error is recorded as 0.0342 when SNR = 30 dB, while the highest error occurs as 0.1719 when SNR = −5 dB.

Among the five algorithms, SVM gives the best performance in terms of NRMSE, and RF is the second best. In addition, when SNR < 10 dB, the performances of all the ML methods are better than MMSE, which means ML shows better adaptability to noise in the wireless channel. When SNR ⩾ 15 dB, MMSE gives a lower NRMSE than the ML methods, but the performances of LR and SVM are always near the MMSE. When SNR ⩾ 25 dB, the NRMSE of LR becomes the lowest of all the ML algorithms. Thus, for large SNRs, LR should be used, while for small SNRs, SVM should be used.
The prediction in Figure 7 is then used in the signal detector, and its SER is shown in Figure 8. In all the following figures, the line 'perfect detection' means the SER obtained when the detector uses the true value of the channel gain over the same S_S symbols. Similar to the NRMSE, in Figure 8, all the SERs decrease with increasing SNR.
For RF, the lowest SER is recorded as 1.78×10⁻² when SNR = 30 dB, while the highest SER occurs as 3.12×10⁻¹ when SNR = −5 dB. For SVM, the highest SER occurs as 3.09×10⁻¹ when SNR = −5 dB, and when SNR ⩾ 20 dB, the SER is ⩽ 10⁻³. For LR, the highest SER occurs as 3.59×10⁻¹ when SNR = −5 dB, and when SNR ⩾ 20 dB, the SER is ⩽ 10⁻³. For DT, the lowest SER is recorded as 2.38×10⁻² when SNR = 25 dB, while the highest SER occurs as 3.13×10⁻¹ when SNR = −5 dB. Finally, for ER, the lowest SER is recorded as 2.40×10⁻² when SNR = 25 dB, while the highest SER occurs as 3.12×10⁻¹ when SNR = −5 dB. The performance of SVM is very close to perfect detection. Moreover, the channel prediction in this research does not require knowledge of the channel covariance matrix, which reduces complexity.
In Figure 9, when S_S = 500, the NRMSEs of the five algorithms all decrease with the increase of the SNR. For RF, the lowest error is recorded as 0.1054 when SNR = 30 dB, while the highest error occurs as 0.1818 when SNR = −5 dB. The mean value of the prediction NRMSE is 0.1314, and the mean prediction accuracy is 86.86%. For SVM, the lowest error is recorded as 0.0328 when SNR = 30 dB, while the highest error occurs as 0.1442 when SNR = −5 dB. The mean value of the prediction NRMSE is 0.0664, and the mean prediction accuracy is 93.36%. For LR, the lowest error is recorded as 0.0116 when SNR = 30 dB, while the highest error occurs as 0.3884 when SNR = −5 dB. The mean value of the prediction NRMSE is 0.1190, and the mean prediction accuracy is 88.10%. For DT, the lowest error is recorded as 0.1226 when SNR = 30 dB, while the highest error occurs as 0.2483 when SNR = −5 dB. The mean value of the prediction NRMSE is 0.1654, and the mean prediction accuracy is 83.46%. Finally, for ER, the lowest error is recorded as 0.1147 when SNR = 30 dB, while the highest error occurs as 0.2535 when SNR = −5 dB. The mean value of the prediction NRMSE is 0.1631, and the mean prediction accuracy is 83.69%.
When S_S = 500, SVM also gives the best mean prediction accuracy, and LR is the second best. The ML methods still outperform MMSE in the lower SNR region due to their better robustness to noise. Additionally, the NRMSE of LR becomes the lowest of the ML algorithms and approaches MMSE when SNR ⩾ 20 dB.
The SER is shown in Figure 10, in which all the SERs also show a downward trend with increasing SNR.
For RF, the lowest SER is recorded as 4.75×10⁻³ when SNR = 20 dB, while the highest SER occurs as 3.12×10⁻¹ when SNR = −5 dB. For clarity, all the NRMSE and SER values are listed in Tables 1 and 2. Overall, the SER of all the methods decreases with the increase of the SNR, because less noise leads to higher accuracy. The SERs of detection based on the ML methods match their channel prediction performances. SVM and LR outperform the other three algorithms in both prediction NRMSE and detection SER.

Normalised Doppler shift
In this subsection, the performance of detection based on channel prediction for different values of the normalised Doppler shift in the fading channel will be compared. In the following, based on the previous results, only SVM and LR will be chosen for comparison with the perfect detection, and all the tests are done over 10,000 data points to calculate the SER. Figures 11 and 12 give the flow diagrams of SVM and LR.

Figure 13 and Table 3 show the SER for BPSK modulation in Rayleigh fading channels when f_D T = 0.01, 0.03 and 0.06. When f_D T = 0.01, the mean SERs of SVM and LR are 3.98×10⁻² and 5.79×10⁻². When f_D T = 0.03, the mean SER of SVM is 5.57×10⁻² and the mean SER of LR is 1.00×10⁻¹. When f_D T = 0.06, the mean SER of SVM is 9.85×10⁻², while the mean SER of LR is 2.00×10⁻¹. In addition, the performance of LR is better than that of SVM at a higher SNR, while SVM is better at a smaller SNR. The normalised Doppler shift relates to the relative speed between transmitter and receiver. Generally speaking, because a larger normalised Doppler shift causes faster variation of the wireless channel and signal, it gives a higher SER for signal detection.

Figure 14 gives the SER for BPSK signalling at different R_K values. In this subsection, SVM and LR are compared with the existing scheme, which is based on the MMSE estimator with the conventional PSAM (CPSAM) detector in refs. [3] and [4]. When R_K = 0 and SER = 10⁻¹, the performances of SVM and LR are about 4.7 and 4.1 dB worse than that of the existing scheme. When R_K = 4 and SER = 10⁻¹, the performance of LR is about 2.6 dB worse than the existing scheme and SVM. When R_K = 8 and SER = 10⁻¹, the performances of the existing scheme and LR are about 0.2 and 3.8 dB worse than that of SVM, respectively. Additionally, when R_K = 0, the mean SER of SVM from 0 to 30 dB is 1.18×10⁻¹ and the mean SER of LR from 0 to 30 dB is 1.12×10⁻¹. When R_K = 4, the mean SER of SVM from 0 to 30 dB is 4.82×10⁻², while that of LR is 6.79×10⁻². When R_K = 8, the mean SERs of SVM and LR from 0 to 30 dB are 4.02×10⁻² and 5.74×10⁻² respectively. When R_K = 8 and SNR ⩾ 20 dB, the SERs of both algorithms are ⩽ 10⁻³.

In Figure 15, the SER for 16-ary phase shift keying (16-PSK) signalling in different fading channel conditions is shown. When R_K = 0, the mean SER of SVM from 0 to 30 dB is 4.90×10⁻¹, and the mean SER of LR from 0 to 30 dB is 4.21×10⁻¹. Neither algorithm can achieve an SER lower than 10⁻¹ when SNR ⩽ 30 dB. When R_K = 4, the mean SERs of SVM and LR from 0 to 30 dB are 2.74×10⁻¹ and 2.99×10⁻¹. When the SER = 10⁻¹, the performance of SVM is about 2.7 dB worse than the CPSAM and the performance of LR is about 2.5 dB worse than the existing scheme. When R_K = 8, the mean SER of SVM from 0 to 30 dB is 2.45×10⁻¹, and that of LR is 2.79×10⁻¹. When the SER = 10⁻¹, the performances of SVM and LR are about 1.3 and 2.4 dB worse than the existing scheme.

Figure 16 shows the SER for 16-ary quadrature amplitude modulation (16-QAM) signalling in different fading channel conditions. When R_K = 0, the mean SER of SVM from 0 to 30 dB is 2.82×10⁻¹, and the mean SER of LR from 0 to 30 dB is 2.64×10⁻¹. When the SER = 10⁻¹, the performance of LR is about 2.7 dB worse than the existing scheme. However, SVM cannot achieve an SER lower than 10⁻¹ when SNR ⩽ 30 dB.
When R_K = 4 and the SER = 10⁻¹, the performance of SVM is about 5.1 dB better than the existing scheme and the performance of LR is about 0.2 dB better. When the SER = 10⁻², the performance of LR is about 3.5 dB worse than the existing scheme, while SVM cannot achieve an SER lower than 10⁻² when SNR ⩽ 30 dB. When R_K = 8, the mean SER of SVM from 0 to 30 dB is 4.38×10⁻², and that of LR is 1.16×10⁻¹. When the SER = 10⁻¹, the performances of SVM and LR are about 7.8 and 1.7 dB better than the existing scheme. When the SER = 10⁻², the performance of SVM is about 4.7 dB better than the existing scheme and that of LR is about 0.7 dB worse. When the SER = 10⁻³, the performances of SVM and LR are about 0.8 and 2.6 dB worse than the existing scheme. Table 4 shows the SERs of the different signalling schemes when R_K = 8.

Overall, larger values of R_K result in higher accuracy of the SVM and LR prediction because of the better channel conditions: a higher R_K corresponds to a stronger direct (line-of-sight) component between transmitter and receiver. Specifically, when SNR = 30 dB and R_K = 8, the SERs of SVM and LR in BPSK signalling can be lower than 10⁻³; in 16-PSK signalling, the SER of SVM can reach 1.13×10⁻² and the SER of LR can reach 4.28×10⁻³; in 16-QAM signalling, the SERs of SVM and LR can also be lower than 10⁻³. Generally, in the lower SNR region, the SVM predictor shows reliable SER performance. On the other hand, when the constellation size is large, the gaps between the SERs of the ML methods and those of the existing scheme are small. In 16-PSK and 16-QAM modulation, the mean SER of the ML methods reaches the same level as the existing scheme, and even outperforms it in some cases, which means that, compared to the existing scheme, the ML methods can learn channel characteristics from neighbouring data points and suppress noise disturbance.
In the higher SNR region, LR outperforms SVM. This is because SVM is based on the structural risk minimisation principle, which prevents overfitting but also limits how closely the model can fit the training data, while LR is not. In addition, tests over an even higher SNR range show that the minimum SER of LR is significantly lower than that of SVM, which means that the highest accuracy LR can achieve is higher than that of SVM. Compared to the existing scheme, LR and SVM do not need any channel model knowledge for estimation. In these figures, the curves of LR and SVM become flatter as SNR increases; a similar flattening also occurs in the curves of the existing scheme.

Prediction efficiency
In Table 5, the mean training and prediction time per data point for the different algorithms and different signalling schemes is compared. The training and prediction time is the time each algorithm takes to renew the model parameters and make a prediction at each data point. The times are recorded at SNR = 5 dB; similar tests for other SNRs have also been done, and the results are not affected by the SNR value. In BPSK, 16-PSK, and 16-QAM signalling, when using LR, the training and prediction time for each data point is 0.4150, 0.4171, and 0.4086 s respectively. When using SVM, the prediction of each point takes 0.1444, 0.1414, and 0.1410 s. From the table, the prediction time is independent of the modulation type. In the test, for DL methods, each model update needs tens or hundreds of epochs, and each epoch takes several seconds. Compared to DL, in the dynamic wireless channel environment, LR and SVM can update the model and make a real-time prediction at each data point in no more than 0.5 s. On the other hand, the training and prediction time shows that, although the prediction accuracy of SVM is slightly lower than that of LR, SVM takes only about 34% of the time of LR for each data point. Therefore, SVM is more efficient than LR.
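The per-point timing methodology can be sketched as below. Here `lr_fit_predict` is a hypothetical stand-in (ordinary least squares) for the toolbox LR model, so the absolute times will not match Table 5:

```python
import time
import numpy as np

def time_update_and_predict(fit_predict, X, y, x_query, repeats=5):
    """Time one model update plus one prediction at a data point, as a
    rough analogue of the per-point times reported in Table 5."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        fit_predict(X, y, x_query)
    return (time.perf_counter() - t0) / repeats

def lr_fit_predict(X, y, x_query):
    # Ordinary least squares in place of the toolbox LR model
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_query @ w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 200))   # S_T = 100 rows, S_S = 200 features
y = rng.standard_normal(100)
t = time_update_and_predict(lr_fit_predict, X, y, rng.standard_normal(200))
print(t)   # seconds per update-and-predict step on this machine
```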

CONCLUSION
This work has studied five ML algorithms (RF, LR, SVM, DT, and ER) for real-time channel prediction based on the received signals, which does not need any channel model knowledge. The results have shown that, in terms of the average prediction accuracy, the SER of detection, and the prediction efficiency, SVM gives the best performance among the five algorithms. When SNR = 30 dB and R_K = 8, for BPSK and 16-QAM modulation, the SERs of detection based on SVM and LR prediction have both reached lower than 10⁻³, and for 16-PSK modulation, the SERs of SVM and LR have reached 1.13×10⁻² and 4.28×10⁻³ respectively. Additionally, for higher constellation sizes, the ML methods have reached detection accuracy similar to the existing scheme and have even outperformed it in some cases, which shows the potential of ML algorithms in complex channel conditions. The main contributions of this work are as follows. First, to the best of our knowledge, this is the first time that classical ML algorithms have been used to predict the wireless channel from noisy received signals. Second, the detection accuracies of different ML predictors have been tested. Moreover, because of the efficiency of traditional ML, the proposed method is easier to apply to real-time prediction using the received signal. In the future, more research will be done to improve the performance of the predictors and to find a balance between the high efficiency of classical ML algorithms and the high accuracy of recent deep learning methods.

ACKNOWLEDGEMENT
This work is supported in part by the EC H2020 project DAWN4IoE (Data Aware Wireless Network for Internet-of-Everything) under Grant 778305.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
Research data are not shared.