• Title/Summary/Keyword: Four-network model


Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
• Vol.26 No.4
    • /
    • pp.127-148
    • /
    • 2020
  • A data center is a physical facility that houses computer systems and related components, and it is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. With the growth of cloud computing in particular, proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and to prevent failures. A failure in one element of the facility can affect not only the equipment concerned but also other connected equipment, causing enormous damage. IT facilities in particular fail irregularly because of their interdependence, which makes root causes hard to identify. Previous studies on failure prediction in data centers treated each server as a single, isolated state rather than as part of an interconnected set of devices. This study therefore classifies data center failures into those occurring inside a server (Outage A) and those occurring outside the server (Outage B), and focuses on analyzing complex failures occurring within servers. Server-external failures involve power, cooling, user error, and the like; because such failures can be prevented in the early stages of data center construction, various solutions are already being developed. By contrast, the causes of failures occurring inside servers are hard to determine, and adequate prevention has not yet been achieved. The key reason is that server failures rarely occur in isolation: one server's failure can trigger failures in other servers, or be triggered by them. In other words, whereas existing studies analyzed failures under the assumption that servers do not affect one another, this study assumes that failures propagate between servers.
To define complex failure situations in the data center, failure history data for each piece of equipment in the data center was used. Four major failure types are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. Failures for each device were sorted in chronological order, and when a failure in one device was followed by a failure in another device within five minutes, the two were defined as occurring simultaneously. After constructing sequences of devices that failed together, the five devices that most frequently co-occurred within those sequences were selected, and their simultaneous failures were confirmed through visualization. Because the server resource information collected for failure analysis is a time series with temporal flow, we used Long Short-Term Memory (LSTM), a deep learning algorithm that can predict the next state from previous states. In addition, a Hierarchical Attention Network structure was adopted to reflect the fact that, unlike the single-server case, each server contributes differently to a multiple-server failure; this model improves prediction accuracy by assigning a larger weight to a server the greater its impact on the failure. The study began by defining the failure types and selecting the analysis targets. The first experiment modeled the same collected data both as a single-server state and as a multiple-server state and compared the results. The second experiment improved prediction accuracy in the multiple-server case by optimizing a threshold for each server.
In the first experiment, which modeled the data both as a single server and as multiple servers, the single-server model predicted no failure for three of the five servers even though failures actually occurred, whereas the multiple-server model correctly predicted failures on all five servers. This result supports the hypothesis that servers affect one another. The study confirmed that prediction performance is better under the multiple-server assumption than under the single-server assumption. In particular, applying the Hierarchical Attention Network, which assumes that each server's influence differs, improved the analysis, and applying a different threshold for each server further improved prediction accuracy. This study shows that failures whose causes are hard to determine can be predicted from historical data, and it presents a model for predicting server failures in data centers. The results are expected to help prevent failures in advance.
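The attention-weighting step described in this abstract can be sketched as follows. This is a minimal illustration of attention over per-server state vectors, not the authors' implementation: the toy states, the vector sizes, and the dot-product scoring against a context vector are all assumptions.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(server_states, context):
    """Score each server's summary vector against a context vector,
    then return the attention weights and the weighted combination.
    A server whose state aligns more with the context (here, one with
    a larger impact on the failure) receives a larger weight."""
    scores = [sum(h * c for h, c in zip(state, context))
              for state in server_states]
    weights = softmax(scores)
    fused = [sum(w * state[i] for w, state in zip(weights, server_states))
             for i in range(len(context))]
    return weights, fused

# Toy summary states for three servers (e.g., last LSTM hidden states).
states = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.05]]
context = [1.0, 1.0]
weights, fused = attend(states, context)
# The second server's state aligns most with the context,
# so it receives the largest weight.
```

In a full model, the per-server states would come from the LSTM encoders and the context vector would be learned; here they are fixed numbers purely for illustration.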

The Prediction of Purchase Amount of Customers Using Support Vector Regression with Separated Learning Method (Support Vector Regression에서 분리학습을 이용한 고객의 구매액 예측모형)

  • Hong, Tae-Ho;Kim, Eun-Mi
    • Journal of Intelligence and Information Systems
    • /
• Vol.16 No.4
    • /
    • pp.213-225
    • /
    • 2010
  • With the rapid growth of information technology, data mining has empowered managers in charge of marketing tasks to offer personalized, differentiated marketing programs to their customers. Because marketing managers are eager to identify who will respond to a promotion, most studies of customer response have focused on predicting whether a customer would respond or not. Many data mining studies have likewise addressed binary decision problems such as bankruptcy prediction, network intrusion detection, and credit card fraud detection, and customer response has been studied with similar methods because its prediction is also a dichotomous decision problem. A number of competitive data mining techniques, such as neural networks, SVM (support vector machine), decision trees, logit, and genetic algorithms, have been applied to predicting customer response to marketing promotions. Marketing managers have also classified customers using quantitative measures such as recency, frequency, and monetary value derived from their transaction databases: how recently a customer purchased, how often within a period, and how much per purchase. We propose an approach that further differentiates customers who fall within the same rating among these segments. Our approach employs support vector regression to forecast the purchase amount of customers within each customer rating. The study sample comprised 41,924 customers extracted from the DMEF04 Data Set who had purchased at least once in the preceding two years. Customers were classified into five ratings, from first to fifth, based on the purchase amount after a marketing promotion.
Customers in the first rating had large purchase amounts, while those in the fifth rating were non-respondents to the promotion. The proposed model forecasts the purchase amount of customers within the same rating, so marketing managers can design a differentiated, personalized marketing program for each customer even when customers belong to the same rating. We also propose a more efficient learning method that separates the learning samples, and we compared it with the general learning method for SVRs. LMW (Learning Method using Whole data for purchasing customers) is the general method for forecasting purchase amounts, while the proposed LMS (Learning Method using Separated data for classifying purchasing customers) builds four different SVR models, one per customer class. To evaluate the models, we calculated MAE (Mean Absolute Error) and MAPE (Mean Absolute Percent Error) for the predicted purchase amounts. With LMW, the overall performance was 0.670 MAPE and the best performance was 0.327 MAPE. The proposed LMS model generally outperformed LMW: its best performance was 0.275 MAPE, and LMS performed better than LMW within every customer class. The comparison shows that the proposed LMS method forecasts purchase amounts significantly better in each class. Our approach should therefore be useful to marketing managers when they need to target customers for a promotion: even among customers in the same class, they can offer a differentiated, personalized marketing promotion.
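The MAPE metric and the separated-learning idea (LMS versus LMW) can be illustrated with a toy sketch. Per-class means stand in for the SVR models here, and all the figures are invented; the point is only that fitting one model per customer class can track each segment more closely than one global model.

```python
def mape(actual, predicted):
    """Mean Absolute Percent Error, the metric used to score LMW vs. LMS."""
    return sum(abs(a - p) / abs(a)
               for a, p in zip(actual, predicted)) / len(actual)

# Toy purchase amounts, each tagged with a customer class (1 or 2 here).
data = [(1, 100.0), (1, 120.0), (2, 10.0), (2, 14.0)]

# LMW-style: one model fit on all customers (a global mean stands in).
global_mean = sum(y for _, y in data) / len(data)
lmw_pred = [global_mean] * len(data)

# LMS-style: a separate model per class (a per-class mean stands in).
by_class = {}
for c, y in data:
    by_class.setdefault(c, []).append(y)
class_mean = {c: sum(ys) / len(ys) for c, ys in by_class.items()}
lms_pred = [class_mean[c] for c, _ in data]

actual = [y for _, y in data]
err_lmw = mape(actual, lmw_pred)
err_lms = mape(actual, lms_pred)
# Separating the training data by class fits each segment more closely,
# so err_lms comes out lower than err_lmw on this toy data.
```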

A MVC Framework for Visualizing Text Data (텍스트 데이터 시각화를 위한 MVC 프레임워크)

  • Choi, Kwang Sun;Jeong, Kyo Sung;Kim, Soo Dong
    • Journal of Intelligence and Information Systems
    • /
• Vol.20 No.2
    • /
    • pp.39-58
    • /
    • 2014
  • As big data and related technologies grow in importance across industry, visualizing the results of big data processing and analysis has come to the fore. Visualization gives people an effective, clear understanding of analysis results, and it also serves as the GUI (Graphical User Interface) that mediates communication between people and analysis systems. To ease development and maintenance, these GUI parts should be loosely coupled from the parts that process and analyze data, and implementing a loosely coupled architecture calls for design patterns such as MVC (Model-View-Controller), which minimizes coupling between the UI and the data processing logic. Big data can be classified into structured and unstructured data, and structured data is considerably easier to visualize. Even so, as the use and analysis of unstructured data has spread, practitioners have typically built a visualization system per project to overcome the limitations of traditional visualization systems designed for structured data. Visualization is harder still for text data, which makes up a large share of unstructured data, because the technologies for analyzing text, such as linguistic analysis, text mining, and social network analysis, are complex and not standardized. This makes it difficult to reuse one project's visualization system in another. We attribute this to a lack of commonality in visualization system design, that is, designs that do not anticipate extension to other systems. In this research, we suggest a common information model for visualizing text data and propose a comprehensive, reusable framework, TexVizu, for visualizing text data. We first survey representative research in the text visualization area.
We then identify the common elements of text visualization and the common patterns across its various use cases, and review and analyze them from three viewpoints: structural, interactive, and semantic. On that basis we design an integrated model of text data that represents the elements to be visualized. The structural viewpoint identifies structural elements of text documents, such as title, author, and body. The interactive viewpoint identifies the types of relations and interactions between text documents, such as post, comment, and reply. The semantic viewpoint identifies semantic elements extracted through linguistic analysis of the text, represented as tags that classify entity types such as person, place or location, time, and event. We then extract and select common requirements for visualizing text data, categorized into four types: structure information, content information, relation information, and trend information. Each requirement type comprises the required visualization techniques, the data, and the goal (what to know). These are the key requirements for designing a framework in which the visualization system is loosely coupled from the data processing and analysis systems. Finally, we designed TexVizu, a common text visualization framework that is reusable and extensible across visualization projects: it collaborates with various Text Data Loaders and Analytical Text Data Visualizers through common interfaces such as ITextDataLoader and IATDProvider, and it comprises an Analytical Text Data Model, Analytical Text Data Storage, and an Analytical Text Data Controller. In this framework, the external components are specified by the interfaces required to collaborate with it.
As an experiment, we applied the framework to two text visualization systems: a social opinion mining system and an online news analysis system.
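The common interfaces named in the abstract, ITextDataLoader and IATDProvider, suggest a plug-in structure that can be sketched as follows. Only the interface names come from the abstract; the method names, signatures, and sample classes are assumptions made for illustration.

```python
from abc import ABC, abstractmethod

class ITextDataLoader(ABC):
    """Interface a text data source implements to feed the framework
    (the name is from the abstract; the method signature is assumed)."""
    @abstractmethod
    def load(self):
        """Return raw text documents."""

class IATDProvider(ABC):
    """Interface an Analytical Text Data producer implements
    (name from the abstract; signature assumed)."""
    @abstractmethod
    def provide(self, documents):
        """Return analyzed text data ready for a visualizer."""

class NewsLoader(ITextDataLoader):
    """A hypothetical loader plugged into the framework."""
    def load(self):
        return ["headline one", "headline two"]

class TokenProvider(IATDProvider):
    """A trivial analyzer: split each document into tokens."""
    def provide(self, documents):
        return [doc.split() for doc in documents]

# The framework would hold only the interfaces; concrete loaders and
# providers can be swapped per project without changing the framework.
loader, provider = NewsLoader(), TokenProvider()
analyzed = provider.provide(loader.load())
```

Because the framework depends only on the abstract interfaces, a new project supplies its own loader and provider classes, which is the loose coupling the abstract argues for.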

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems
    • /
• Vol.25 No.3
    • /
    • pp.239-251
    • /
    • 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have long been used for stock market prediction, but they have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely applied to stock market prediction. In particular, a case-based reasoning method known as k-nearest neighbor (k-NN) is widely used for stock price prediction. Case-based reasoning retrieves several similar past cases when a new problem arises and combines their class labels to classify the new problem. It has some drawbacks, however. First, case-based reasoning searches for a fixed number of neighbors in the observation space, always selecting the same number rather than the best similar neighbors for the target case; it may therefore take more cases into account even when fewer are actually applicable to the problem at hand. Second, it may select neighbors that lie far from the target case. Case-based reasoning thus does not guarantee an optimal pseudo-neighborhood for every target case, and predictability can degrade when the retrieved neighbors deviate from the truly similar ones. This paper examines how the size of the learning data affects stock price predictability with k-NN, and compares the predictability of k-NN with that of the random walk model across learning data sizes and numbers of neighbors. Samsung Electronics stock prices were predicted using two learning datasets. To predict the next day's closing price, we used four variables: the opening price, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning; in the second, data from January 1, 2015 to December 31, 2017. The test data for both experiments covered January 1, 2018 to August 31, 2018. We compared the performance of k-NN with the random walk model on the two learning datasets. In the first experiment, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. In the second experiment, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that predictive power depends on the size of the learning dataset, and that k-NN outperformed the random walk model for one learning dataset but not the other. Future studies should consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. To produce better results, it is also recommended that k-NN select its nearest neighbors with a second-step filtering method that considers fundamental economic variables as well as a sufficient amount of learning data.
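The k-NN scheme described here, predicting the next day's close from the k most similar (open, high, low, close) days, can be sketched as follows. The toy prices, the Euclidean distance, and the unweighted neighbor average are assumptions for illustration, not details from the paper.

```python
def knn_predict(history, query, k):
    """Predict the next close as the average next-day close of the k
    historical days whose (open, high, low, close) features lie
    nearest to today's features."""
    def dist(a, b):
        # Euclidean distance in the 4-dimensional feature space.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Each history entry: (features of day t, closing price of day t+1).
    ranked = sorted(history, key=lambda item: dist(item[0], query))
    nearest = ranked[:k]
    return sum(next_close for _, next_close in nearest) / k

history = [
    ((100, 105, 99, 104), 106),
    ((104, 108, 103, 107), 109),
    ((50, 52, 49, 51), 50),
]
# Today's (open, high, low, close); the two nearest days are the first
# two entries, so the prediction is their average next close, 107.5.
pred = knn_predict(history, (103, 107, 102, 106), k=2)
```

The fixed `k` in the sort-and-slice step is exactly the limitation the abstract criticizes: the same number of neighbors is taken whether or not that many genuinely similar days exist.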

Validation of Surface Reflectance Product of KOMPSAT-3A Image Data: Application of RadCalNet Baotou (BTCN) Data (다목적실용위성 3A 영상 자료의 지표 반사도 성과 검증: RadCalNet Baotou(BTCN) 자료 적용 사례)

  • Kim, Kwangseob;Lee, Kiwon
    • Korean Journal of Remote Sensing
    • /
• Vol.36 No.6_2
    • /
    • pp.1509-1521
    • /
    • 2020
  • Experiments to validate the surface reflectance produced from Korea Multi-Purpose Satellite (KOMPSAT-3A) imagery were conducted using data from the Chinese Baotou (BTCN) site, one of the four sites of the Radiometric Calibration Network (RadCalNet), a portal that provides spectral surface reflectance measurements. Top-of-atmosphere and surface reflectance products were generated with an extension program of the open-source Orfeo ToolBox (OTB), redesigned and implemented to extract those reflectance products in batches. Three image data sets from 2016, 2017, and 2018 were processed with each of two sensor model versions, ver. 1.4 released in 2017 and ver. 1.5 released in 2019, which supply the gain and offset applied in the absolute atmospheric correction. The reflectance products computed with ver. 1.4 matched the RadCalNet BTCN data more closely than those computed with ver. 1.5. In addition, reflectance products obtained from Landsat-8 imagery via the USGS LaSRC algorithm and from Sentinel-2B imagery via the SNAP Sen2Cor program were used to quantitatively verify the differences in the KOMPSAT-3A products. Relative to the RadCalNet BTCN data, the differences in KOMPSAT-3A surface reflectance were highly consistent: -0.031 to 0.034 in the B band, -0.001 to 0.055 in the G band, -0.072 to 0.037 in the R band, and -0.060 to 0.022 in the NIR band. The KOMPSAT-3A surface reflectance also reached an accuracy level suitable for further applications, compared with that of the Landsat-8 and Sentinel-2B imagery. These results confirm the applicability of Analysis Ready Data (ARD) for surface reflectance from high-resolution satellites.
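The gain/offset step of the absolute correction mentioned above typically follows the standard conversion from digital number to at-sensor radiance and then to top-of-atmosphere reflectance. The sketch below uses placeholder coefficients, not the published ver. 1.4 or ver. 1.5 values, and the ESUN and sun-elevation figures are likewise invented.

```python
import math

def toa_radiance(dn, gain, offset):
    """Convert a digital number to at-sensor radiance using the sensor
    model's gain and offset (placeholder values, not KOMPSAT-3A's)."""
    return gain * dn + offset

def toa_reflectance(radiance, esun, sun_elev_deg, d=1.0):
    """Standard radiance-to-reflectance conversion:
    rho = pi * L * d^2 / (ESUN * cos(solar zenith angle)),
    with d the Earth-Sun distance in astronomical units."""
    theta_s = math.radians(90.0 - sun_elev_deg)  # zenith from elevation
    return math.pi * radiance * d ** 2 / (esun * math.cos(theta_s))

L = toa_radiance(dn=500, gain=0.02, offset=0.0)
rho = toa_reflectance(L, esun=1850.0, sun_elev_deg=60.0)
```

Validation against RadCalNet then amounts to differencing such per-band reflectance values against the site's measured reflectance, which is how the per-band ranges quoted above arise.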

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education
    • /
• Vol.45 No.3
    • /
    • pp.292-303
    • /
    • 2021
  • This study identifies the terms frequently used together with 'energy' in online science news articles, along with the topics of those articles, to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. Using 'energy' as a search term, 2,171 online news articles in the science category, published by 11 major Korean newspaper companies over the year beginning March 1, 2018, were selected. Natural language processing yielded 51,224 sentences comprising 507,901 words for analysis. Term frequency analysis, semantic network analysis, and structural topic modeling were performed with the R program. The results show that the exceptionally high-frequency terms were 'technology,' 'research,' and 'development,' reflecting news articles' focus on reporting new findings. Terms used more than once per two articles were industry-related ('industry,' 'product,' 'system,' 'production,' 'market') and terms readily expected to relate to energy, such as 'electricity' and 'environment.' Meanwhile, 'sun,' 'heat,' 'temperature,' and 'power generation,' which appear frequently in energy-related science classes, also ranked among the highest-frequency terms. The network analysis revealed two clusters: one of terms related to industry and technology, and one of terms related to basic science and research. The analysis of terms paired directly with energy showed that usage-related compounds such as 'energy efficiency,' 'energy saving,' and 'energy consumption' were the most frequent. Of the 16 topics found, four contexts of energy emerged: 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that introducing the concept of energy degradation can be an effective starting point for energy classes.
They also show the need to bring high-tech industry and the context of environment and health into energy learning.
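The co-occurrence counting that underlies a semantic network analysis like this can be sketched as follows. The sentences are invented examples, and this simplified in-sentence count merely stands in for the R-based pipeline the study actually used.

```python
from collections import Counter

sentences = [
    "energy efficiency drives industry technology",
    "solar energy research improves efficiency",
    "energy saving reduces energy consumption",
]

# For each sentence containing 'energy', count the other terms that
# appear alongside it; these counts would become edge weights in a
# term co-occurrence network.
co_occurrence = Counter()
for sentence in sentences:
    tokens = set(sentence.split())
    if "energy" in tokens:
        for term in tokens - {"energy"}:
            co_occurrence[term] += 1

# 'efficiency' co-occurs with 'energy' in two sentences, so it tops
# the ranking on this toy corpus.
top_terms = co_occurrence.most_common(2)
```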

A Study on the Foundation of the Infrastructure for National Geospatial Information Distribution (국가 지리공간 정보 유통기반 구축에 관한 연구)

  • Choi, Jae-Hun;Chyung, Nan-Soo;Kim, Young-Sup
    • Journal of Korea Spatial Information System Society
    • /
• Vol.1 No.2
    • /
    • pp.63-80
    • /
    • 1999
  • This study presents an NGDM (National Geospatial Information Distribution Model) for the effective utilization and varied application of geospatial information, which is central to the spread of GIS. To establish the NGDM, the study derives guidelines for a Korean NGDM by analyzing the current state of geospatial information distribution at home and abroad, and it investigates the major factors forming the NGDM infrastructure in regulatory, technical, physical, and social respects. Based on these factors, the study presents a three-stage NGDM applicable in Korea. The NGDM consists of four components: the consumer, the supplier, the gateway to the clearinghouse, and the clearinghouse of geospatial information. According to how the geospatial information is managed, NGDM types are classified as concentrated, distributed, or compound. The study also explains the relationships among the NGDM's components and, after comparing the classified models and considering the development of regulation, protocols, communication networks, electronic commerce, and so on, proposes a three-stage NGDM spanning planting, growth, and maturity periods.


Change of Fractured Rock Permeability due to Thermo-Mechanical Loading of a Deep Geological Repository for Nuclear Waste - a Study on a Candidate Site in Forsmark, Sweden

  • Min, Ki-Bok;Stephansson, Ove
    • Proceedings of the Korean Radioactive Waste Society Conference
    • /
• 2009 Conference Abstract Collection of the Korean Radioactive Waste Society
    • /
    • pp.187-187
    • /
    • 2009
  • Opening of fractures induced by shear dilation or normal deformation can be a significant source of permeability change in fractured rock, which matters for the performance assessment of geological repositories for spent nuclear fuel. As the repository generates heat and later cools, the fluid-carrying capacity of the rock becomes a dynamic variable over the repository's lifespan. Heating expands the rock close to the repository while contracting rock close to the surface; during the cooling phase, the opposite takes place. Heating and cooling, together with the virgin stress, can induce shear dilation of fractures and deformation zones and change the flow field around the repository. The objectives of this work are to examine the contribution of thermal stress to fracture shear slip in the mid- and far-field around a KBS-3 type repository, and to investigate the effect of the evolving stress on rock mass permeability. In the first part of this study, zones of fracture shear slip were examined through a three-dimensional thermo-mechanical analysis of a spent fuel repository model measuring 2 km $\times$ 2 km $\times$ 800 m. The stress evolutions important for fracture shear slip are: (1) comparatively high horizontal compressive thermal stress at the repository level, (2) vertical tensile thermal stress generated directly above the repository, (3) horizontal tensile stress near the surface, which can induce tensile failure, and (4) shear stresses generated at the corners of the repository. In the second part of the study, fracture data from Forsmark, Sweden were used to establish discrete fracture network (DFN) models, and the stress paths obtained from the thermo-mechanical analysis were applied as boundary conditions in DFN-DEM (Discrete Element Method) analyses of six DFN models at the repository level.
Permeability increases of up to a factor of four were observed during the thermal loading history, and the shear dilation of fractures did not recover after the repository cooled. Understanding the stress path, the potential areas of slip-induced shear dilation, and the related permeability changes over the lifetime of a spent fuel repository is of utmost importance for analyzing long-term safety. The results of this study will help identify critical areas around a repository where fracture shear slip is likely to develop. The presentation also includes a brief introduction to the ongoing site investigations at two candidate sites for a geological repository in Sweden.
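The criterion typically used to flag fracture shear slip in analyses of this kind is Mohr-Coulomb: a fracture slips when the shear stress on it exceeds its cohesion plus friction times the normal stress. The sketch below uses assumed stress and strength values, not figures from this study.

```python
import math

def slips(sigma_n, tau, cohesion, friction_deg):
    """Mohr-Coulomb slip check on a fracture plane: slip occurs when
    the shear stress tau exceeds the shear strength
    c + sigma_n * tan(phi). Units are consistent (e.g., MPa)."""
    strength = cohesion + sigma_n * math.tan(math.radians(friction_deg))
    return tau > strength

# Assumed values (MPa, degrees): thermal loading raises the shear
# stress on the fracture while the normal stress stays the same.
before_heating = slips(sigma_n=10.0, tau=4.0, cohesion=0.5, friction_deg=30.0)
after_heating = slips(sigma_n=10.0, tau=7.0, cohesion=0.5, friction_deg=30.0)
# Only the heated state exceeds the strength envelope, illustrating how
# thermal stress evolution can push fractures into shear slip.
```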


OVERVIEW OF KSTAR INTEGRATED CONTROL SYSTEM

  • Park, Mi-Kyung;Kim, Kuk-Hee;Lee, Tae-Gu;Kim, Myung-Kyu;Hong, Jae-Sic;Baek, Sul-Hee;Lee, Sang-Il;Park, Jin-Seop;Chu, Yong;Kim, Young-Ok;Hahn, Sang-Hee;Oh, Yeong-Kook;Bak, Joo-Shik
    • Nuclear Engineering and Technology
    • /
• Vol.40 No.6
    • /
    • pp.451-458
    • /
    • 2008
  • After more than ten years of construction, KSTAR (Korea Superconducting Tokamak Advanced Research) finally completed its assembly in June 2007 and achieved first plasma in July 2008 after four months of commissioning. KSTAR was constructed with fully superconducting magnets of $Nb_3Sn$ and NbTi, whose operating temperatures are maintained below 4.5 K with the help of the helium refrigerator system. During first-plasma operation, plasmas with a maximum current of 133 kA and a maximum pulse width of 865 ms were obtained. The KSTAR Integrated Control System (KICS) successfully fulfilled its missions of surveillance, device operation, machine protection interlock, and data acquisition and management, all of which were KSTAR commissioning requirements. For reliable and safe operation of KSTAR, 17 local control systems were developed. These had to be integrated into a logically single control system operating regardless of platform and installation location. To meet these requirements, KICS was developed as a network-based distributed system on a newly adopted framework, EPICS (Experimental Physics and Industrial Control System). KICS also has distinctive features in KSTAR operation: it performs not only 24-hour continuous plant operation but also shot-based real-time feedback control, exchanging the initiative of operation between a central controller and a plasma control system according to the operation sequence. For plasma diagnosis and analysis, 11 types of diagnostic systems were implemented in KSTAR, and the acquired data were archived using MDSplus, which is widely used for data management in fusion control systems. This paper covers the design and implementation of the KSTAR integrated control system and its data management and visualization systems, and briefly introduces the commissioning results.

Assessment of Precipitation Characteristics and Synoptic Pattern Associated with Typhoon Affecting the South Korea (우리나라 내습태풍 유형에 따른 강우특성 및 종관기후학적 분석)

  • Kim, Tae-Jeong;Park, Kun-Chul;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association
    • /
• Vol.48 No.6
    • /
    • pp.463-477
    • /
    • 2015
  • Recent unusual climate and extreme weather events have frequently caused unexpected disasters and damage, complicating the management of water resources. In particular, climate change could intensify typhoons, a worst-case scenario. The primary objective of this study is to identify the patterns of typhoon-induced precipitation and the associated synoptic patterns. The study analyzes precipitation patterns over South Korea using the full historical record rather than a specified season or duration, and it further investigates the potential connection between heavy rainfall and synoptic patterns. We used the best track data provided by the Regional Specialized Meteorological Center of Japan for the 40 years from 1973 to 2012. Typhoon-induced precipitation was categorized into four groups according to the typhoon track information, and the associated synoptic climatology patterns were then investigated. The results demonstrate that typhoon-induced precipitation patterns can be grouped, and potentially simulated, according to the identified synoptic patterns. Our future work will focus on developing a short-term forecasting model of typhoon-induced precipitation that uses the identified climate patterns as inputs.