A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)
- Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data is multidimensional time series data, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables must be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. Time series data, in turn, is usually preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field: statistical methods and regression analysis were used in its early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when the data is non-homogeneous, and they do not detect local outliers well. Regression-based methods fit a regression equation based on parametric statistics and then detect anomalies by comparing predicted values with actual values. Their performance drops when the model is not robust or when the data contains noise or outliers, which imposes the restriction that training data free of noise and outliers be used. An autoencoder built from artificial neural networks is trained to produce output as similar as possible to its input. It has several advantages over existing probabilistic and linear models, cluster analysis, and supervised learning. It can be applied to data that does not satisfy distributional or linearity assumptions. 
In addition, it can be trained in an unsupervised manner, without labeled data. However, autoencoder-based anomaly detection has difficulty identifying local outliers in multidimensional data, and the dimension of the data grows greatly when the characteristics of time series data are handled by preprocessing. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by taking local outliers and time series characteristics into account. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the bottleneck of the autoencoder, through which the correlation between them is learned. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are generally categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders was checked for 41 variables in the proposed model and the comparison models. Reconstruction performance differs by variable: the loss is small for the Memory, Disk, and Network modalities in all three autoencoder models, indicating that reconstruction works well for them. The Process modality showed no significant difference among the three models, and the CPU modality was reconstructed best by CMAE. To evaluate anomaly detection performance, ROC curves were drawn for the proposed model and the comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, AE. 
In particular, recall was 0.9828 for CMAE, which indicates that it detects nearly all anomalies. Accuracy also improved, to 87.12%, and the F1-score was 0.8883, which we consider suitable for anomaly detection. From a practical standpoint, the proposed model has a further advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow down inference. The proposed model avoids these costs, which makes it easier to apply in practice in terms of inference speed and model management.
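The dimensionality argument above can be illustrated with a minimal sketch. It compares the input size produced by a sliding window with the input size produced by appending a time condition to a single observation. The 41 variables come from the abstract; the window width of 12, the hourly sampling, the sine/cosine encoding of time, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

N_VARIABLES = 41  # number of monitored variables reported in the study


def sliding_window(series: Sequence[Sequence[float]], width: int) -> List[List[float]]:
    """Flatten each run of `width` consecutive observations into one input vector."""
    return [
        [value for row in series[start:start + width] for value in row]
        for start in range(len(series) - width + 1)
    ]


def time_condition(hour: float, period: float = 24.0) -> List[float]:
    """Encode time of day as a point on the unit circle (an assumed encoding)."""
    angle = 2.0 * math.pi * (hour % period) / period
    return [math.sin(angle), math.cos(angle)]


def conditioned_input(row: Sequence[float], hour: float) -> List[float]:
    """Append the time condition to a single observation."""
    return list(row) + time_condition(hour)


# Hypothetical data: 48 hourly observations of 41 variables, all zeros.
series = [[0.0] * N_VARIABLES for _ in range(48)]
windowed = sliding_window(series, width=12)  # width 12 is an assumed value
conditioned = [conditioned_input(row, hour=i % 24) for i, row in enumerate(series)]

print(len(windowed[0]))     # 41 * 12 = 492 features per sample
print(len(conditioned[0]))  # 41 + 2 = 43 features per sample
```

Under these assumptions the windowed input is more than ten times wider than the conditioned one, which is the kind of overhead the conditional approach is meant to avoid.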
Recommender systems based on association rule mining contribute significantly to sellers' sales by reducing the time consumers spend searching for the products they want. Recommendations based on the frequency of transactions such as orders can effectively pick out, from among many products, those that are statistically marketable. A product with high sales potential, however, can be omitted from the recommendations if it records too few transactions at the beginning of its sale. Products missing from the associated recommendations may lose the chance of exposure to consumers, which leads to a decline in the number of transactions. In turn, diminished transactions may create a vicious circle of lost opportunities to be recommended. Thus, initial sales are likely to remain stagnant for a certain period of time. Products that are susceptible to fashion or seasonality, such as clothing, may be greatly affected. This study aimed to extend association rules so that products with a low initial transaction frequency but high sales potential are included in the list of recommendations. Specifically, it predicts the strength of the direct connection between two unconnected items from the properties of the paths located between them. An association between two items revealed in transactions can be interpreted as an interaction between them, which can be expressed as a link in a social network whose nodes are items. The first step calculates the centralities of the nodes in the middle of the paths that indirectly connect two nodes that have no direct connection. The next step identifies the number of such paths and the shortest among them. These extracted features are used as independent variables in a regression analysis to predict the future connection strength between the nodes. 
The strength of the connection between the two nodes of the model, which is defined by the number of nodes between the two nodes, is measured after a certain period of time. The regression analysis results confirm that the number of paths between the two products, the length of the shortest path, and the number of neighboring items connected to the products are significantly related to their potential connection strength. This study used actual order transaction data collected over three months, from February to April 2016, from an online commerce company. To limit the complexity of the analysis, which grows with the scale of the network, the analysis was performed only on miscellaneous goods. Two consecutively purchased items were chosen from each customer's transactions to obtain an antecedent-consequent pair, which provides a link for constructing the social network. The direction of the link was determined by the order in which the goods were purchased. The social network of associated items was built from the data excluding the last ten days of the collection period, and the independent variables were extracted from it. The model predicts the number of links to be connected in the next ten days from the explanatory variables. Of the 5,711 previously unconnected links, 611 were newly connected during the last ten days. In experiments, the proposed model predicted these new links well: of the 571 links that it predicted, 269 were confirmed to have been connected. This is 4.4 times the average of 61 that would be found without any prediction model. This study is expected to be useful for industries in which new products are launched quickly and have short life cycles, since their exposure time is critical. It could also be used to detect diseases that are rarely identified in the early stages of medical treatment because of their low incidence. 
Since the complexity of social network analysis is sensitive to the number of nodes and links that make up the network, this study was conducted on a single category, miscellaneous goods. Future research should consider that this restriction may limit the opportunity to detect unexpected associations between products belonging to different categories.
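The path-based explanatory variables described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the item names and orders are hypothetical, the cap of three links on path length is an assumed parameter, and the regression step is omitted.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]


def build_graph(orders: List[List[str]]) -> Graph:
    """Link each item to the item the same customer bought next (directed)."""
    graph: Graph = defaultdict(set)
    for sequence in orders:
        for antecedent, consequent in zip(sequence, sequence[1:]):
            if antecedent != consequent:
                graph[antecedent].add(consequent)
    return graph


def shortest_path_length(graph: Graph, source: str, target: str) -> Optional[int]:
    """Breadth-first search; returns the number of links, or None if unreachable."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


def count_simple_paths(graph: Graph, source: str, target: str, max_links: int = 3) -> int:
    """Count loop-free paths of at most `max_links` links by depth-first search."""
    def visit(node: str, visited: frozenset, remaining: int) -> int:
        if node == target:
            return 1
        if remaining == 0:
            return 0
        return sum(
            visit(neighbor, visited | {neighbor}, remaining - 1)
            for neighbor in graph.get(node, ())
            if neighbor not in visited
        )
    return visit(source, frozenset({source}), max_links)


# Hypothetical purchase sequences, one list per customer, in purchase order.
orders = [["bag", "wallet", "belt"], ["bag", "scarf", "belt"], ["wallet", "scarf"]]
graph = build_graph(orders)

# "bag" and "belt" have no direct link, so their path features become predictors.
features = {
    "n_paths": count_simple_paths(graph, "bag", "belt"),
    "shortest": shortest_path_length(graph, "bag", "belt"),
    "n_neighbors": len(graph.get("bag", ())),
}
print(features)
```

In a full pipeline, one such feature row would be computed for every unconnected pair and passed to a regression model; bounding the path length keeps the path count tractable as the network grows.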
As office automation (OA) advances, the use of visual (or video) display terminals (VDTs) is increasing rapidly. As office automation spreads, however, subjective symptoms associated with this work are also appearing (Kim, Jung Tae, Lee, Young Ook, 1990). Office equipment makes clerical work more convenient and allows workers to handle large volumes of business. At the same time, it increases workers' exposure to what are referred to as new types of occupational disease: VDT syndrome, techno-stress, computer terminal disease, repetitive strain injury (RSI), noise-induced hearing loss, and electromagnetic waves. Recent newspapers and literature report that VDT users are complaining of physical discomfort, and the time has come for the syndromes arising from VDT use, computer terminal disease, and similar conditions to be classified as occupational diseases (Lee, Kwang Young 1990, Lee, Kyoo Hak 1990, Lee, Won Ho 1991, Lee, Si Young 1991, Lee, Joon 1991, Choi, Young Tae 1991, Heo, Seung Ho 1989). In addition, scientific findings on the extent to which electromagnetic waves affect the human body have not yet been presented, and criticism continues regarding the safe exposure limits for electromagnetic waves emitted from VDTs and the associated health problems (IEEE Spectrum, 1990). Moreover, in the experience of occupational health nursing practice, users of office automation equipment who seek consultation for physical and mental discomfort account for as much as 10% of the patients visiting the clinic. 
Therefore, the subjective symptoms that clerical workers complain of need to be identified from multiple angles, together with a survey of how office automation equipment is actually used. This study was undertaken by an occupational health nurse to provide basic data for consultation and education aimed at maintaining and promoting workers' health. It identified the subjective symptoms associated with the use of office automation equipment in a financial institution, where such equipment is used most frequently, and examined whether the symptoms differ according to the characteristics of the subjects. Data were collected by questionnaire from 200 workers who consented to participate, selected from those who had used office automation equipment for more than 3 months and had not received hospital treatment for a disease related to clerical work in the previous 3 years. The data were collected from Oct. 9 to Oct. 12, 1991. To measure complaints of subjective symptoms associated with the use of office automation equipment, the study used the items on health complaints related to VDT work developed by Kim (1991), the CMI health problem items, and items on the degree of industrial fatigue, after a pretest with 25 persons. The collected data were analyzed with the SPSS/PC+ program using percentages, arithmetic means, Pearson correlation coefficients, chi-square tests, t-tests, and ANOVA, and the results are as follows: 1. 
The subjective symptoms that the clerical workers complained of most frequently were 'My eyes are tired' (99.4%), 'I feel fatigue and weariness' (99.4%), 'I feel that my head is heavy' (90.0%), 'My eyesight has declined' (88.8%), 'I have a stiff neck' (88.8%), 'I feel pain in the shoulder' (85.0%), 'I feel cold and painful in the eyes' (76.9%), 'My eyeballs feel dry' (76.2%), 'My nerves are on edge, and I am fretful' (75.6%), 'I feel pain in the waist' (73.2%), and 'I feel pain in the back' (72.8%). Subjects who use office automation equipment thus complained mainly of visual and musculoskeletal symptoms. 2. The general characteristics of the subjects were examined by sex, age, educational level, length of experience with office automation equipment, skill in using the equipment, hours of use per day, type of office automation work, rest time during use, satisfaction with office automation work, and work environment, with the following results. By sex, 58.8% of the subjects were men and 41.3% were women; the average age was 26.9. By educational level, 'high school graduation or below' was 58.8%, 'junior college or university graduation' was 35.0%, and 'graduate school or above' was 6.3%. Asked whether they had experience of health consultation in connection with office automation work, 90.1% responded that they had such experience and felt the need for it. Subjects who had not worn glasses or contact lenses before using OA equipment but wore them after using it accounted for 28.3% of the total. 
As for length of experience with OA equipment, 'under 3 years' was the most common response, at 52.7%. As for skill in using office automation equipment, most subjects were reasonably skilled: 'average' was 44.4%, 'skilled' was 42.5%, and 'unskilled' was 13.1%. As for average hours of use per day, 3-6 hours was 33.1%, 6-9 hours was 28.1%, under 3 hours was 30.6%, and over 9 hours was 8.1%. The main types of OA work, in order of daily hours of use, were information storage and retrieval (162 min), information transmission (79.3 min), document preparation (55.5 min), and duplication and printing (25.4 min). As for rest during the use of office automation equipment, 'I rest as the occasion demands' accounted for the largest share, while 'I rest after completing the work' was 33.8%. Although work becomes more convenient with office automation equipment, as many as 78.1% of respondents expressed dissatisfaction with their present OA work. The work environment of each office was good in light of the ANSI/HFS 100 standard: the office temperature was 21.8, the noise level averaged 42.7 dB, and the illumination averaged 364.4 lx. 3. Visual, musculoskeletal, skin, and other symptoms differed significantly according to skill in using office automation equipment. All symptoms except skin symptoms differed according to the hours of use of office automation equipment. All items except digestive symptoms differed significantly according to rest time during the use of office automation equipment. 
Satisfaction with present OA work showed a significant difference for all of the question items, which were classified into 6 groups. Age and educational level, however, showed no significant difference in complaints of any subjective symptom.
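As a minimal illustration of the chi-square test named among the analysis methods, the sketch below computes the test statistic for a 2x2 table of hours of use against presence of a symptom. The counts are hypothetical and are not taken from the study, the function name is an assumption, and no continuity correction is applied.

```python
from typing import List


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts (not from the study).
# Rows: under 6 hours of use per day, 6 hours or more.
# Columns: symptom reported, symptom not reported.
observed = [[30, 20], [45, 5]]

statistic = chi_square_statistic(observed)
CRITICAL_VALUE_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
significant = statistic > CRITICAL_VALUE_DF1

print(statistic, significant)
```

With these made-up counts the statistic exceeds the critical value, so the symptom rate would be judged to differ between the two usage groups at the 5% level.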