- Data preprocessing
The data are unit operation records from the SCADA system of a wind farm in Northwest China, covering August 1 to October 2, 2013; the alarm record shows that the bearing was damaged. The records describe the operating state of the unit in detail, including time, wind speed, total power generation, ambient temperature, generator speed, and wind direction angle, among others. The sampling interval is 1 min. However, the original data contain invalid records, such as those logged during unit start-up and shut-down and null-value records. To obtain good prediction performance from the model, the original data must be preprocessed and the invalid data eliminated according to the following principles:
Points where the output power of the generator is less than zero; points where the wind speed is below the cut-in wind speed or above the cut-out wind speed; points where the generator speed is less than zero.
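The elimination rules above can be sketched as a simple record filter. This is a minimal illustration, not the authors' implementation: the field names (`power`, `wind_speed`, `generator_speed`) and the cut-in and cut-out thresholds of 3 m/s and 25 m/s are assumptions, since the source does not give the column names or the turbine's rated limits.

```python
import math

# Assumed thresholds; the source does not state the turbine's actual values.
CUT_IN_WIND_SPEED = 3.0    # m/s (hypothetical)
CUT_OUT_WIND_SPEED = 25.0  # m/s (hypothetical)

# Hypothetical field names for the three quantities the rules refer to.
REQUIRED_FIELDS = ("power", "wind_speed", "generator_speed")


def _is_null(value):
    """Treat None and NaN as null-value entries."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_valid_record(record):
    """Return True if one SCADA record passes every elimination rule."""
    # Null-value records are invalid.
    if any(_is_null(record.get(field)) for field in REQUIRED_FIELDS):
        return False
    # Rule 1: generator output power below zero.
    if record["power"] < 0:
        return False
    # Rule 2: wind speed outside the [cut-in, cut-out] operating range.
    if not CUT_IN_WIND_SPEED <= record["wind_speed"] <= CUT_OUT_WIND_SPEED:
        return False
    # Rule 3: generator speed below zero.
    if record["generator_speed"] < 0:
        return False
    return True


def remove_invalid_records(records):
    """Keep only the records that satisfy all rules, preserving order."""
    return [r for r in records if is_valid_record(r)]
```

Records at exactly the cut-in or cut-out wind speed are kept here, because the rules only exclude speeds strictly below or above those limits.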
- Feature selection
The original data contain 45 features, some of which are redundant or irrelevant. The data determine the upper limit of the model's prediction performance. When the input dimension of the model exceeds a certain limit, prediction performance degrades and the model runs less efficiently. Therefore, it is necessary to select features from the original data and extract the information that is useful for the research target. In this paper, the mutual information method is used for feature selection. The mutual information between two variables X and Y is defined as

$$I(X;Y)=\sum_{x\in X}\sum_{y\in Y}P(x,y)\log\frac{P(x,y)}{P(x)\,P(y)}$$

where P(x) and P(y) are the marginal probability distributions of the variables X and Y, and P(x, y) is their joint probability distribution.
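As a minimal sketch of how mutual information between a candidate feature and the target could be estimated from samples, the code below discretizes both variables into equal-width bins and uses the empirical bin frequencies as the marginal and joint probabilities. The binning scheme, the default of 10 bins, and the helper names are assumptions for illustration; the source does not state which estimator was used.

```python
import math
from collections import Counter


def discretize(values, n_bins=10):
    """Map continuous values to equal-width bin indices in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable falls entirely into one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value lands in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=10):
    """Histogram estimate of I(X;Y) in nats from paired samples x and y."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    xb, yb = discretize(x, n_bins), discretize(y, n_bins)
    n = len(xb)
    joint = Counter(zip(xb, yb))  # counts for P(x, y)
    count_x = Counter(xb)         # counts for P(x)
    count_y = Counter(yb)         # counts for P(y)
    mi = 0.0
    for (i, j), c in joint.items():
        p_xy = c / n
        # P(x, y) / (P(x) P(y)) written in terms of raw counts.
        mi += p_xy * math.log(c * n / (count_x[i] * count_y[j]))
    return mi


def rank_features(features, target, n_bins=10):
    """Sort feature names by estimated mutual information with the target."""
    scores = {name: mutual_information(col, target, n_bins)
              for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_features` with a dictionary that maps each feature name to its column of values, together with the target column, returns the features ordered from most to least informative, from which the top-scoring ones can be kept. The histogram estimate is sensitive to the number of bins, so the bin count would need tuning on real data.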
After feature selection, the features most strongly associated with the gearbox bearing temperature are: wind speed, generator output, 10 min average wind speed, 1 min average wind speed, 1 min average generator output, engine room (nacelle) temperature, ambient temperature, gear oil temperature, generator stator temperature, and wind direction angle. Not all of the selected features are directly related to the running state of the gearbox bearing; some affect the bearing temperature only indirectly. If these features were ignored, the model's predictions could be biased.