• Title/Summary/Keyword: Generation Model

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, and it has shown remarkable results in various fields such as classification, summarization, and generation. Among text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification (one label from two classes), multi-class classification (one label from several classes), and multi-label classification (multiple labels from several classes). Because each instance can carry multiple labels, multi-label classification requires a different training method from binary and multi-class classification. In addition, as the numbers of labels and classes grow, the number of labels to be predicted increases, which makes prediction harder and limits performance improvement. To overcome these limitations, research on label embedding is being actively conducted: (i) the initially given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) a model is trained to predict the compressed labels, and (iii) the predicted labels are restored to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress labels by random transformation, they have difficulty capturing non-linear relationships between labels and therefore cannot create a latent label space that sufficiently preserves the information in the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding; a representative example is label embedding with an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding suffers substantial information loss when it compresses a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing gradient problem that arises during backpropagation. Skip connections were devised to address this problem: by adding a layer's input to its output, they prevent gradients from vanishing during backpropagation and allow efficient training even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies applying them to autoencoders or to label embedding are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. We applied the proposed methodology to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment that predicts, from each paper's abstract, the compressed keyword vector in the latent label space and evaluates multi-label classification after restoring the predicted keyword vector to the original label space. As a result, multi-label classification based on the proposed methodology showed far superior accuracy, precision, recall, and F1 score compared with traditional multi-label classification methods.
This suggests that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was examined by comparing its performance across domain characteristics and across different numbers of latent label space dimensions.
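Two mechanisms this abstract relies on can be sketched in plain Python: a skip connection that adds a layer's input to its output, and step (iii), restoring predicted label scores to a binary label vector before scoring. This is a minimal illustration under stated assumptions, not the authors' architecture: the ReLU activation, the square weight matrix, the 0.5 restoration threshold, and the example-based (per-sample averaged) definitions of accuracy, precision, recall, and F1 are choices made here, and every function name is hypothetical.

```python
# Minimal sketch (not the paper's model): a dense layer with a skip connection,
# label restoration by thresholding, and example-based multi-label metrics.

def dense(x, weights, bias):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_layer(x, weights, bias):
    """Skip connection: output = x + relu(W x + b); W must be square."""
    branch = relu(dense(x, weights, bias))
    return [xi + bi for xi, bi in zip(x, branch)]


def finite_diff(f, s, eps=1e-6):
    """Central-difference derivative of a scalar function f at s."""
    return (f(s + eps) - f(s - eps)) / (2 * eps)


def restore_labels(scores, threshold=0.5):
    """Map decoder outputs back to a 0/1 label vector (assumed threshold)."""
    return [1 if s >= threshold else 0 for s in scores]


def multilabel_metrics(y_true, y_pred):
    """Example-based metrics, averaged over samples."""
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for t, p in zip(y_true, y_pred):
        tp = sum(a & b for a, b in zip(t, p))
        union = sum(a | b for a, b in zip(t, p))
        n_true, n_pred = sum(t), sum(p)
        # Conventions for empty cases: accuracy and F1 treat two all-zero
        # vectors as a perfect match; precision and recall fall back to 0.
        totals["accuracy"] += tp / union if union else 1.0
        totals["precision"] += tp / n_pred if n_pred else 0.0
        totals["recall"] += tp / n_true if n_true else 0.0
        totals["f1"] += 2 * tp / (n_true + n_pred) if (n_true + n_pred) else 1.0
    return {k: v / len(y_true) for k, v in totals.items()}


if __name__ == "__main__":
    # Where the ReLU branch is inactive, a plain layer passes back no gradient,
    # while the skip connection still passes back a gradient of about 1.
    plain = lambda s: relu(dense([s], [[-1.0]], [0.0]))[0]
    skip = lambda s: residual_layer([s], [[-1.0]], [0.0])[0]
    print(finite_diff(plain, 2.0), finite_diff(skip, 2.0))

    y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
    scores = [[0.9, 0.2, 0.4, 0.1], [0.3, 0.8, 0.6, 0.0]]
    y_pred = [restore_labels(s) for s in scores]
    print(multilabel_metrics(y_true, y_pred))
```

The identity path is what keeps the derivative near 1 when the ReLU branch is switched off, which is the intuition behind using skip connections against vanishing gradients. How the paper places such layers inside its encoder and decoder, including layers whose input and output sizes differ, is not reproduced here.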

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong; Kang, Nam-Hwa
    • Journal of Science Education / v.45 no.3 / pp.292-303 / 2021
  • This study identifies the terms frequently used together with energy in online science news articles, as well as the topics of those reports, to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. Using energy as a search term, we selected a total of 2,171 online news articles in the science category published by 11 major Korean newspaper companies over one year starting March 1, 2018. After natural language processing, a corpus of 51,224 sentences consisting of 507,901 words was compiled for analysis. Term frequency analysis, semantic network analysis, and structural topic modeling were performed in R. The results show that the terms with exceptionally high frequencies were technology, research, and development, reflecting the nature of news articles that report new findings. Terms used more than once per two articles included industry-related terms (industry, product, system, production, market) and terms readily expected to accompany energy, such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, also appeared among the highest-frequency terms. The network analysis revealed two clusters: terms related to industry and technology, and terms related to basic science and research. The analysis of terms paired with energy also showed that terms related to the use of energy, such as 'energy efficiency,' 'energy saving,' and 'energy consumption,' were the most frequent. From the 16 topics identified, four contexts of energy were drawn: 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that introducing the concept of energy degradation as a starting point for energy classes can be effective.
They also show the need to introduce high-tech industries or the context of environment and health into energy learning.
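The "terms paired with energy" analysis and the semantic network rest on co-occurrence counts. The study used R; the Python sketch below is only an illustration of the counting idea. It assumes naive whitespace tokenization on English toy sentences (Korean news text would need a morphological analyzer), counts co-occurrence at the sentence level, and the function names and sentences are hypothetical, not the study's data.

```python
from collections import Counter


def cooccurrence_with(sentences, target="energy"):
    """Count, for each term, how many sentences it shares with `target`."""
    counts = Counter()
    for sentence in sentences:
        terms = set(sentence.lower().split())  # naive whitespace tokenization
        if target in terms:
            counts.update(terms - {target})
    return counts


def pairs_with(sentences, target="energy"):
    """Count two-word terms that begin with `target`, e.g. 'energy efficiency'."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for first, second in zip(tokens, tokens[1:]):
            if first == target:
                counts[f"{first} {second}"] += 1
    return counts


if __name__ == "__main__":
    # Toy sentences for illustration only; not data from the study.
    sentences = [
        "Energy efficiency improves industry output",
        "New technology raises energy efficiency",
        "Solar power and heat research",
        "energy consumption in industry",
    ]
    print(cooccurrence_with(sentences).most_common(2))
    print(pairs_with(sentences).most_common(1))  # [('energy efficiency', 2)]
```

The sentence-level counts can serve as edge weights between "energy" and each term in a semantic network; which co-occurrence unit and weighting the study actually used is not stated in the abstract.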

Local, Jobless Person, Homo Economicus, Three Axis of Kwak Hashin's Works (로컬, 룸펜, 경제적 인간, 곽하신 소설의 세 좌표)

  • Kim, Yang-Sun
    • Journal of Popular Narrative / v.26 no.3 / pp.161-188 / 2020
  • This paper seeks to expand the scope of literary history by restoring and analyzing Kwak Hashin's works as a whole, which have so far received little study. To this end, I note the discontinuity in his works, which are broadly divided into those of the colonial period and those of the post-Korean War period. The characteristics of these works can be analyzed along three axes: the local (colonial period), the jobless person (post-war period), and Homo Economicus (some short stories and popular novels of the post-war period). In Chapter 2, 'Local: the World of Munjang', I argue that Kwak Hashin's novels, published in the journal Munjang in the late 1930s, embodied an anti-modern aesthetic consciousness, clearly revealing sorrow for disappearing things, a pre-modern sense of time, and a preference for the local. In Chapter 3, 'Jobless Person', and Chapter 4, 'The State of All People's Struggle against All People: The Appearance of Homo Economicus', I show that Korean society in the late 1950s, which joined the ranks of underdeveloped capitalist countries after the Korean War, can be characterized by two contrasting types of masculinity: the jobless, incompetent man on the one hand and the economic man on the other. In the late 1950s, lumpen (jobless person) novels exposed the problems of the Korean economy through incompetent male characters. Their intellectual men chose survival over morality or intimacy, projecting their own incompetence and anxiety onto women and wives. In the popular novels Women's Song and The Shadow of the Fig Tree, achievement-oriented male figures appeared who betrayed their colleagues and sexually exploited women through love relationships in order to rise to the top. They can be defined as Homo Economicus, embodying the state of universal struggle against all. Borrowing the form of the popular novel, these novels showed the formation of post-Korean War masculinity, which pursued the survival of the fittest.
As we have seen, Kwak Hashin needs to be re-evaluated as a writer who expanded modern literary history beyond the boundaries of literature proper. He belonged to the last generation of writers who wrote in Korean in the late colonial period, and he provided a model for postwar literature by borrowing the forms of journalism and the popular novel.

Implications of Shared Growth of Public Enterprises: Korea Hydro & Nuclear Power Case (공공기관의 동반성장 현황과 시사점: 한국수력원자력(주) 사례를 중심으로)

  • Jeon, Young-tae; Hwang, Seung-ho; Kim, Young-woo
    • Journal of Venture Innovation / v.4 no.2 / pp.57-75 / 2021
  • KHNP's shared growth activities are based on such public good. Reflecting its character as a comprehensive energy company, a high-tech plant company, and a leading company for shared growth, KHNP presents strategies that link performance indicators with its partners and implements various measures. Key tasks include maintaining the nuclear power plant ecosystem, improving management conditions for partner companies, strengthening the future capabilities of the nuclear power plant industry, and supporting a virtuous cycle of regional development. These tasks reflect the specific characteristics of nuclear power generation as much as possible and are designed to embody the spirit of shared growth through win-win cooperation in order to meet the challenges of the times. KHNP's shared growth activities can be described as the practice of the spirit of the times (Zeitgeist). The spirit of the times now given to us is that companies, as public institutions of society, should strive for sustainable growth. KHNP has been striving to establish a creative and leading shared growth ecosystem. In particular, considering the positions of its partners, it has pursued continuous system improvement to establish a fair trade culture and deregulation. In addition, it has continuously discovered and implemented new customized support projects that are effective for partner companies and local communities. To this end, it has pursued shared growth through organic collaboration with partners and stakeholders. As detailed tasks for maintaining the nuclear power plant ecosystem, it presents fostering new markets and new industries, maintaining supply chains, and emergency support for COVID-19. This reflects the social public good in the wake of the COVID-19 pandemic.
To improve the management conditions of partner companies, productivity improvement, human resources enhancement, and customized funding are being implemented as detailed tasks. This is a plan to practice win-win growth with partner companies, as emphasized by corporate social responsibility (CSR) and ISO 26000, while remaining faithful to the core business. Until now, ESG management has focused on the environmental field to cope with the catastrophe of climate change. Accordingly, KHNP is presenting a public enterprise-type model in the environmental field. To strengthen the future capabilities of the nuclear power plant industry as a state-of-the-art energy company, it has set tasks to attract investment from partner companies, pursue localization and R&D of new technologies, and commercialize innovative technologies. This is an effort to develop advanced nuclear power plant technology as a concrete, practical measure of eco-friendly development. Meanwhile, following the Green Taxonomy, a classification system for the environmental sector, the EU is preparing a social taxonomy focused on the social sector, the other important axis of ESG management. KHNP includes enhancing local vitality, increasing income for the underprivileged, and overcoming the COVID-19 crisis, which are representative social taxonomy fields, as part of its shared growth activities. The EU announced its draft social taxonomy in July, and the initiatives KHNP is promoting are consistent with it, putting KHNP at the forefront of practicing the social taxonomy.

Work & Life Balance and Conflict among Employees : Work-life Balance Effect that Reflects Work Characteristics (일·생활 균형과 구성원간 갈등관계 : 직장 내 업무 특성을 반영한 WLB 효과 중심으로)

  • Lee, Yang-pyo; Choi, Chang-bum
    • Journal of Venture Innovation / v.7 no.1 / pp.183-200 / 2024
  • Recently, with the MZ generation's entry into the workforce and the increasing social participation of women, conflicts are arising between workplace groups that value WLB and existing groups that emphasize collaboration, owing to differences in work orientation. Public institutions and companies that use work-life balance support systems show differences in job commitment depending on the nature of the work and how actively the support system is used. Accordingly, it is necessary to verify the effectiveness of the WLB support systems that companies actually operate and to present universally valid standards. The purpose of this study is, first, to verify the effectiveness of work-life balance support systems and to find a practical consensus amid changing policies and perceptions of the working environment. Second, to establish standards for WLB operation that reflect work characteristics, the study analyzed the mutual influence of work-life balance level and job commitment according to work characteristics. A 2×2 matrix model was used to analyze the impact of work-life balance and work characteristics on job commitment, and four hypotheses were established concerning the job commitment levels of members of, respectively, the conflict-type group, the leading group, the agreeable group, and the cooperative group. An online survey was conducted targeting employees of public institutions and large corporations. The survey ran for 9 days, from October 23 to 31, 2023; 163 people responded, and the analysis was based on a valid sample of 152 after excluding 11 responses that were insincere or abandoned midway.
As a result of hypothesis testing, first, the conflict-type group showed the lowest level of job commitment, at 1.43. Second, the leading (proactive) group showed the highest level, at 4.54. Third, the agreeable (conformity) group showed a slightly lower level, at 2.58. Fourth, the cooperative group showed a slightly higher level, at 3.80. The academic implication of the study is that it subdivides employees into types based on their level of work-life balance and the nature of their work. The practical implication is that it analyzes, by group, the effectiveness of the WLB support systems operated by public institutions and large corporations.

Numerical Study on Thermochemical Conversion of Non-Condensable Pyrolysis Gas of PP and PE Using 0D Reaction Model (0D 반응 모델을 활용한 PP와 PE의 비응축성 열분해 기체의 열화학적 전환에 대한 수치해석 연구)

  • Eunji Lee; Won Yang; Uendo Lee; Youngjae Lee
    • Clean Technology / v.30 no.1 / pp.37-46 / 2024
  • Environmental problems caused by plastic waste have been growing continuously around the world, and plastic waste has increased even faster since COVID-19. In particular, PP and PE account for more than half of all plastic production, and the amount of waste from these two materials has reached a serious level. Researchers are therefore searching for alternative methods of plastic recycling, and plastic pyrolysis is one such alternative. In this paper, a numerical study was conducted on the reaction behavior of the non-condensable fraction of pyrolysis gas in order to predict its chemical conversion. Based on gas product compositions estimated from previous literature, the behavior of the non-condensable gas was analyzed as a function of temperature and residence time. The numerical analysis showed that as temperature and residence time increased, the production of H2 and heavy hydrocarbons increased through the conversion of the non-condensable gas, while CH4 and C6H6 decreased as they were consumed in the reactions. In addition, a production-rate analysis showed that the decomposition of C2H4 was the dominant reaction for H2 generation, and more H2 was produced from PE, whose pyrolysis gas has a higher C2H4 content. As future work, experiments are needed to confirm how the conversion of plastics into H2 and carbon can be increased under the various operating conditions derived from this study's numerical results.
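A 0D (perfectly mixed, isothermal) reactor model reduces to integrating species rate equations over the residence time. The sketch below illustrates that idea with a single reaction, C2H4 -> C2H2 + H2, integrated with explicit Euler steps. It is a toy under loudly stated assumptions, not the authors' model: a study of this kind would normally use a detailed kinetic mechanism with many species (including CH4 and C6H6), the Arrhenius parameters A and Ea below are placeholder values chosen only so the temperature trend is visible, and all names are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def rate_constant(T, A=1.0e9, Ea=2.0e5):
    """Arrhenius k = A exp(-Ea / (R T)); A (1/s) and Ea (J/mol) are placeholders."""
    return A * math.exp(-Ea / (R * T))


def run_0d(T, residence_time, c2h4_0=1.0, dt=1e-4):
    """Integrate d[C2H4]/dt = -k [C2H4] for C2H4 -> C2H2 + H2 at constant T.

    Concentrations are in arbitrary units; explicit Euler with step dt (s).
    """
    k = rate_constant(T)
    c2h4, c2h2, h2 = c2h4_0, 0.0, 0.0
    for _ in range(int(round(residence_time / dt))):
        converted = k * c2h4 * dt  # amount of C2H4 decomposed in this step
        c2h4 -= converted
        c2h2 += converted
        h2 += converted
    return {"C2H4": c2h4, "C2H2": c2h2, "H2": h2}


if __name__ == "__main__":
    for T in (1000.0, 1200.0):
        for tau in (0.1, 1.0):
            out = run_0d(T, tau)
            print(f"T={T:.0f} K, tau={tau} s, H2={out['H2']:.3f}")
```

With these placeholder parameters, H2 rises with both temperature and residence time, the same qualitative trend the abstract reports for H2; the toy says nothing about the heavy hydrocarbons, CH4, or C6H6, which need the multi-reaction chemistry a real mechanism provides.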

The Study of Characteristics of Consumer Purchasing Private Brand Products at Large-Scale Mart (국내 대형마트의 유통업체 브랜드 상품 구매 소비자의 특성 분석에 관한 연구)

  • Hwang, Seong-Huyk; Lee, Jung-Hee; Roh, Eun-Jung
    • Journal of Distribution Research / v.15 no.4 / pp.1-19 / 2010
  • As they move toward developing private brand (PB) goods, large domestic retailers are facing new problems. Studies are therefore needed on PB products and on how consumers come to treat PB products as part of their consideration set. Such studies are also worthwhile because they provide important implications for marketing strategy, particularly by evaluating how customers who buy PB grocery goods differ in demographic characteristics and purchasing behavior. PB offers advantages to both customers and retailers. However, according to ACNielsen's report (2005), PB sales in Asian and emerging markets are about one-fifth of those in Western countries, which also suggests that emerging markets have the greatest potential for growth. Several other studies indicate that it is necessary not only to raise the share of PB products in sales temporarily, but also to analyze the characteristics of customers of large retailers and segment them into groups so that PB products become part of their consideration set. A variety of marketing activities is also needed. Studies of PB reveal a prejudice that cheap products are of low quality, yet customers who have actually used these products evaluate them neutrally, and one study indicates that building trust between retailers selling PB products and the consumers using them is the most important factor for accurate evaluation and purchase intention. In addition, prior analyses of the characteristics of PB buyers suggest that the higher the income and education level, the greater the preference for PB products. In particular, according to TNS's research, the primary targets of PB products are consumers in their 30s who seek value for money and have planned spending habits, and consumers in their 40s who have teenage children and are interested in encouraging themselves. This paper used a probit model to analyze the characteristics of consumers.
This model analyzes variables representing consumers' demographic characteristics (gender, age, educational level, occupation, income level, and residential area) and variables related to purchasing behavior (frequency of visits to large retailers, the average amount spent there, and whether they check which company made the goods). Data were collected through face-to-face interviews and an online survey, at rates of 89% and 11%, respectively, in Seoul and Gyeonggi Province over about one month from the beginning of February 2008. We had assumed that people who shop more often buy more PB products, but the result was not significant for the target groups we had identified as frequent visitors. Although we expected women to buy more PB products than men, gender had no significant effect. On the other hand, married people were inferred to buy more PB goods than single people. Occupation-related variables were also not significant: although housewives are more often exposed to supermarkets of all kinds than employed workers are, we found no relationship. Moreover, we could not show that younger consumers prefer large retailers more than older consumers in their 50s and 60s. Education level did not affect the purchase of PB products either. Regarding residential area, the result for living in Seoul versus elsewhere was statistically different from what we expected. There was no relationship between preference for particular retailers and PB purchase, which is consistent with TNS (2008), who found that customers tend to buy PB products impulsively regardless of the brand or the store, even when they shop at the large retailer they use most often. The variables that yielded meaningful results were income level and place of residence.
That is, customers with an average monthly income of 3,000,000~6,000,000 won are more willing to buy PB products than customers whose income is over 6,000,000 won, and residents outside Seoul prefer PB goods more than Seoul residents. Considering visit frequency alone, the more customers are exposed to PB products, the more often they purchase them. Consequently, an important insight is that large retailers need to find ways to make customers visit often in order to increase PB sales. In sum, the variables that effectively explain PB purchases are the demographic variables of marital status, income level, and residential area, together with the purchasing-habit variable of visit frequency to large retailers. Specifically, married customers rather than single ones, middle-income customers rather than high-income ones, and residents outside Seoul rather than Seoul residents are more likely to purchase PB goods. In addition, if a customer's visits increase by two, the PB purchase rate increases by more than 5.3%. Therefore, retailers would do well to make their stores fun and comfortable places to shop. Overcoming the perception that PB products are merely cheap, one-time purchases, retailers need to build loyalty to PB products comparable to that for national-brand (NB) products, make them part of consumers' consideration set, and generate sustainable sales. In particular, as this paper suggests, there is a strong need to identify the characteristics of customers who prefer PB, segment them, select the main target, and position PB products with well-planned marketing strategies.
Developing the field of PB research further by identifying differences in customers' lifestyles and shopping habits can thus provide meaningful implications for marketing strategy.
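The probit model used in this study links the probability of buying PB products to a linear index of consumer variables through the standard normal CDF, P(y = 1 | x) = Φ(x·β). The plain-Python sketch below shows that computation and the two usual ways of reading effects (a derivative for continuous variables, a discrete change for dummies). The variable set and coefficient values are hypothetical placeholders, not the paper's estimates, and a real analysis would estimate β by maximum likelihood rather than fix it by hand.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """phi(z), the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def linear_index(x, beta):
    return sum(xi * bi for xi, bi in zip(x, beta))


def probit_probability(x, beta):
    """P(y = 1 | x) = Phi(x . beta); x[0] is 1.0 for the intercept."""
    return std_normal_cdf(linear_index(x, beta))


def marginal_effect(x, beta, j):
    """dP/dx_j = phi(x . beta) * beta_j, for a continuous regressor j."""
    return std_normal_pdf(linear_index(x, beta)) * beta[j]


def dummy_effect(x, beta, j):
    """Discrete change in P when dummy regressor j switches from 0 to 1."""
    x0, x1 = list(x), list(x)
    x0[j], x1[j] = 0.0, 1.0
    return probit_probability(x1, beta) - probit_probability(x0, beta)


if __name__ == "__main__":
    # Hypothetical regressors: [intercept, married, middle_income, outside_seoul, visits_per_month]
    beta = [-1.0, 0.3, 0.25, 0.2, 0.05]  # placeholder coefficients, not estimates from the paper
    consumer = [1.0, 1.0, 1.0, 0.0, 4.0]
    print(probit_probability(consumer, beta))
    print(dummy_effect(consumer, beta, 3))     # change from living outside Seoul
    print(marginal_effect(consumer, beta, 4))  # change per additional monthly visit
```

Because Φ is non-linear, an effect expressed in percentage points (such as the paper's "more than 5.3%" for visit frequency) depends on where in the covariate space it is evaluated; the abstract does not say at which point its figure was computed.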

Critical Success Factor of Noble Payment System: Multiple Case Studies (새로운 결제서비스의 성공요인: 다중사례연구)

  • Park, Arum; Lee, Kyoung Jun
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.59-87 / 2014
  • In the MIS field, research on payment services has focused on the adoption factors of payment services, using behavioral theories such as TRA (Theory of Reasoned Action), TAM (Technology Acceptance Model), and TPB (Theory of Planned Behavior). Previous studies presented various adoption factors depending on the type of payment service, nation, culture, and so on, and different researchers reported different adoption factors even for the same payment service. The payment service industry has relatively strong path dependency on existing payment methods, so research results on the same payment service differ according to each nation's payment culture. This paper aims to propose, and demonstrate, a success factor for the adoption of novel payment services that holds regardless of a nation's culture and payment characteristics. In previous research, the common adoption factors of payment services are convenience, ease of use, security, speed, and so on. However, real cases show that the adoption factors presented in previous research are not always critical to successful market penetration. For example, PayByPhone, an NFC-based parking payment service, successfully penetrated the early market and grew. In contrast, Google Wallet failed to win user adoption despite offering an NFC-based payment method with convenience, security, and ease of use. As these cases show, an aspect remains unexplained. The present research question therefore emerged: "What more essential and fundamental factor should take precedence over factors such as convenience, security, and ease of use for successful market penetration?" With this question in mind, this paper analyzes four cases premised on the following hypothesis and demonstrates it.
"To successfully penetrate a market and grow sustainably, a new payment service should find the non-customers of existing payment services and provide a novel payment method that they can use." We give plausible explanations for the hypothesis using multiple case studies. Diners Club, Danal, PayPal, and Square were selected as typical, successful cases in each category of payment service. Because the discussion of the cases focuses primarily on analyzing the non-customers targeted by a novel payment service, in order to find the most crucial factor in the early market, we do not attempt to consider factors of business growth. We identified three tiers of non-customers of existing payment methods that new payment services target, and elaborated how each new payment service satisfies them. In the case of the credit card, the service targets first-tier non-customers: people who cannot pay because they temporarily have no cash but do have a regular income. The credit card thus gives them the opportunity to engage in economic activity by deferring the payment date. The case study of wireless phone payment shows that this service targets second-tier non-customers who cannot use online payment because they are concerned about security or must go through a complex process and learn how to use online payment methods. Wireless phone payment therefore provides a very convenient payment method; in particular, it enabled young people to make small payments without a credit card. The case study of PayPal, an online payment service, shows that it targets second-tier non-customers who refuse to use online payment services because of concerns about leaks of sensitive information such as passwords and credit card details. Accordingly, PayPal allows users to pay online without providing sensitive information. Finally, the case of Square, a mobile POS-based payment service, shows that it also targets second-tier non-customers who cannot conduct individual offline transactions because of a shortage of cash.
Hence, Square provides a dongle that functions as a POS terminal when plugged into the earphone jack. As a result, all four services turned non-customers into customers, which allowed them to penetrate the early market and expand their market share. Consequently, all cases supported the hypothesis, which is highly plausible according to the 'analytic generalization' that case study methodology proposes. To judge the quality of the research design, we apply the following tests. Construct validity, internal validity, external validity, and reliability are common to all social science methods and have been summarized in numerous textbooks (Yin, 2014). In case study methodology, they have also served as a framework for assessing a large group of case studies (Gibbert, Ruigrok & Wicki, 2008). Construct validity means identifying correct operational measures for the concepts being studied; to satisfy it, we used multiple sources of evidence, such as academic journals, magazines, and articles. Internal validity means establishing a causal relationship, whereby certain conditions are believed to lead to other conditions, as distinguished from spurious relationships; to satisfy it, we used explanation building across the four case analyses. External validity means defining the domain to which a study's findings can be generalized; to satisfy it, we used replication logic across the multiple case studies. Reliability means demonstrating that the operations of a study, such as the data collection procedures, can be repeated with the same results; to satisfy it, we used a case study protocol. In Korea, competition among stakeholders in the mobile payment industry is intensifying. Not only the three major telecom companies but also smartphone manufacturers and service providers such as KakaoTalk have announced that they will enter the mobile payment industry, which is becoming increasingly competitive.
However, the industry has not yet gained momentum despite optimistic expectations that it will grow very fast. Mobile payment services are based on various technologies, such as IC mobile cards, cloud-based application payment, NFC, sound waves, BLE (Bluetooth Low Energy), and biometric recognition. In particular, mobile payment services are discontinuous innovations: users must change their behavior and new infrastructure must be installed. Users must learn how to use them, and shopkeepers bear infrastructure installation costs. Additionally, the payment industry has strong path dependency. Despite these obstacles, mobile payment services, which as discontinuous innovations should provide dramatically improved value, are focusing on convenience, security, and the like. We suggest the following for mobile payment services to succeed. First, the non-customers of existing payment services need to be identified. Second, their needs should be understood. Then, the novel payment service should provide a payment method to non-customers who cannot pay with existing methods. In conclusion, mobile payment services can create a new market and expand the payment market.

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan; An, Sangjun; Kim, Mintae; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.127-148 / 2020
  • The data center is a physical environment facility for accommodating computer systems and related components, and is an essential foundation technology for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, the proportional expansion of the data center infrastructure is inevitable. Monitoring the health of these data center facilities is a way to maintain and manage the system and prevent failure. If a failure occurs in some elements of the facility, it may affect not only the relevant equipment but also other connected equipment, and may cause enormous damage. In particular, IT facilities are irregular due to interdependence and it is difficult to know the cause. In the previous study predicting failure in data center, failure was predicted by looking at a single server as a single state without assuming that the devices were mixed. Therefore, in this study, data center failures were classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), and focused on analyzing complex failures occurring within the server. Server external failures include power, cooling, user errors, etc. Since such failures can be prevented in the early stages of data center facility construction, various solutions are being developed. On the other hand, the cause of the failure occurring in the server is difficult to determine, and adequate prevention has not yet been achieved. In particular, this is the reason why server failures do not occur singularly, cause other server failures, or receive something that causes failures from other servers. In other words, while the existing studies assumed that it was a single server that did not affect the servers and analyzed the failure, in this study, the failure occurred on the assumption that it had an effect between servers. 
To define complex failure situations in the data center, the failure history data of each piece of equipment in the data center was used. Four major failure types are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures for each device are sorted chronologically, and when a failure occurs on one piece of equipment, any failure on another piece of equipment within 5 minutes of that occurrence is defined as simultaneous. After sequences of simultaneously failing devices were constructed, the 5 devices that most frequently failed together within those sequences were selected, and the cases in which the selected devices failed simultaneously were confirmed through visualization. Because the server resource information collected for failure analysis is time-series data, we used Long Short-Term Memory (LSTM), a deep learning algorithm that predicts the next state from previous states. In addition, because, unlike in the single-server case, each server contributes to a multiple failure to a different degree, a Hierarchical Attention Network structure was used. This algorithm increases prediction accuracy by assigning larger weights to servers with a greater impact on the failure. The study began by defining the failure types and selecting the analysis targets. In the first experiment, the same collected data was analyzed under a single-server assumption and under a multiple-server assumption, and the results were compared. The second experiment improved the prediction accuracy for the multiple-server case by optimizing the threshold of each server. 
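The 5-minute co-failure rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes failure events arrive as `(timestamp, device)` pairs, anchors each window at the earliest failure in a group, and counts how often each device appears in multi-device groups to pick the most frequent co-failing devices. The function names `group_simultaneous` and `most_frequent_devices` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed window length, taken from the 5-minute rule in the abstract.
WINDOW = timedelta(minutes=5)


def group_simultaneous(events, window=WINDOW):
    """Group (timestamp, device) failure events into co-failure sets.

    Events are sorted chronologically; each group is anchored at its
    earliest event and collects every later event that occurs within
    `window` of that anchor (hypothetical helper).
    """
    groups = []
    current, anchor = [], None
    for ts, device in sorted(events):
        if anchor is None or ts - anchor > window:
            if current:
                groups.append(current)
            current, anchor = [], ts  # start a new group at this failure
        current.append(device)
    if current:
        groups.append(current)
    # Deduplicate devices inside each group and sort for stable output.
    return [sorted(set(g)) for g in groups]


def most_frequent_devices(groups, k=5):
    """Return the k devices that most often fail together with others."""
    counts = Counter(d for g in groups if len(g) > 1 for d in g)
    return [d for d, _ in counts.most_common(k)]


if __name__ == "__main__":
    events = [
        (datetime(2020, 1, 1, 0, 0), "srv1"),
        (datetime(2020, 1, 1, 0, 3), "srv2"),
        (datetime(2020, 1, 1, 0, 4), "db1"),
        (datetime(2020, 1, 1, 1, 0), "srv1"),
        (datetime(2020, 1, 1, 1, 2), "srv2"),
        (datetime(2020, 1, 1, 2, 0), "net1"),
    ]
    groups = group_simultaneous(events)
    print(groups)
    print(most_frequent_devices(groups, k=2))
```

Anchoring the window at the first failure (rather than chaining each event to its predecessor) keeps a group from growing indefinitely during a long failure cascade; the paper does not say which variant it used.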
In the first experiment, under the single-server assumption, three of the five servers were predicted to have no failure even though a failure had actually occurred. Under the multiple-server assumption, however, all five servers were predicted to have failed. This result confirmed the hypothesis that servers affect one another, and prediction performance was superior when multiple servers were assumed rather than a single server. In particular, applying the Hierarchical Attention Network, under the assumption that each server's influence differs, helped improve the analysis. In addition, applying a different threshold to each server further improved prediction accuracy. This study showed that failures whose causes are difficult to determine can be predicted from historical data, and it presents a model that predicts failures occurring in data center servers. The results of this study are expected to help prevent failures in advance.
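The per-server threshold optimization mentioned above could look roughly like this sketch. It assumes each server already has predicted failure probabilities (for example, from the LSTM/attention model) and ground-truth labels, and it uses a simple grid search that maximizes accuracy per server; the actual search procedure and metric in the paper are not specified, so both are assumptions. `best_threshold` and `per_server_thresholds` are hypothetical names.

```python
def best_threshold(scores, labels, candidates=None):
    """Return the cutoff that maximizes accuracy for one server.

    Assumption: a fixed grid of cutoffs 0.01..0.99 and plain accuracy as
    the objective; ties go to the smallest cutoff on the grid.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]

    def accuracy(t):
        # A score at or above the cutoff is predicted as "failure".
        hits = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        return hits / len(labels)

    return max(candidates, key=accuracy)


def per_server_thresholds(scores_by_server, labels_by_server):
    """Tune an independent cutoff for every server."""
    return {
        srv: best_threshold(scores_by_server[srv], labels_by_server[srv])
        for srv in scores_by_server
    }


if __name__ == "__main__":
    scores = {"A": [0.1, 0.2, 0.35, 0.4], "B": [0.6, 0.7, 0.8, 0.9]}
    labels = {"A": [0, 0, 1, 1], "B": [0, 0, 0, 1]}
    print(per_server_thresholds(scores, labels))
```

The toy data shows why one global cutoff can be a poor fit: server A's failures score around 0.35 to 0.4, while server B only fails above 0.8, so each server ends up with a different threshold.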

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the integration of SNS into mobile devices has resulted in massive data generation, which greatly influences society. This phenomenon is unprecedented in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the conditions of volume (amount of data), velocity (data input and output speed), and variety (diversity of data types). Discovering issue trends in SNS Big Data can provide an important new source of value, because this information covers society as a whole. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) show the importance of a topic through a treemap based on a scoring system and frequency; and (4) visualize the daily time-series graph of a keyword retrieved by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, such as stop word removal and noun extraction, to process the many unrefined forms of unstructured data. Such analysis also requires recent big data technologies, such as the Hadoop distributed system or NoSQL databases as an alternative to relational databases, to rapidly process large volumes of real-time data. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from a single node to thousands of machines. 
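Function (4), a daily time series for a searched keyword, combined with the stop word removal step, can be sketched as below. This is a simplified, in-memory illustration rather than the TITS implementation (which runs on Hadoop and MongoDB): it assumes tweets arrive as `(date, text)` pairs, uses lowercase whitespace tokenization as a stand-in for Korean noun extraction, and uses a small hypothetical stop word list.

```python
import string
from collections import Counter, defaultdict
from datetime import date

# Hypothetical, illustrative stop word list; a real Korean pipeline would
# use a morphological analyzer and a curated list instead.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def daily_keyword_counts(tweets):
    """Map each day to a Counter of keyword frequencies."""
    by_day = defaultdict(Counter)
    for day, text in tweets:
        by_day[day].update(tokenize(text))
    return by_day


def keyword_series(by_day, keyword):
    """Daily frequency series for one keyword, ordered by date."""
    # Counter returns 0 for missing keys, so days without the keyword show 0.
    return [(day, by_day[day][keyword]) for day in sorted(by_day)]


if __name__ == "__main__":
    tweets = [
        (date(2013, 3, 1), "The election is near"),
        (date(2013, 3, 1), "Election results!"),
        (date(2013, 3, 2), "Snow in Seoul"),
    ]
    counts = daily_keyword_counts(tweets)
    print(keyword_series(counts, "election"))
```

The same per-day counters could also feed function (1), a daily keyword ranking, via `Counter.most_common`.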
Furthermore, we use MongoDB, a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike relational databases, MongoDB has no fixed schema or tables, and its primary goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a framework of preconfigured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed with these libraries and makes it possible to detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and on this basis confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The experiments used nearly 150 million tweets collected in Korea during March 2013.