Title/Summary/Keyword: Memory Analysis


Analysis of groundwater withdrawal impact in the middle mountainous area of Pyoseon Watershed in Jeju Island using LSTM (LSTM을 활용한 제주도 표선유역 중산간지역의 지하수 취수영향 분석)

  • Shin, Mun-Ju;Moon, Soo-Hyoung;Moon, Duk-Chul;Koh, Hyuk-Joon;Kang, Kyung Goo
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.267-267 / 2021
  • Owing to the geology of this volcanic island, precipitation on Jeju Island infiltrates the surface readily, so conditions for developing and using surface water are poor and most water demand is met by groundwater. The conservation and management of groundwater is therefore critical, and stable groundwater use in particular requires analyzing how groundwater withdrawal affects groundwater levels in the surrounding area. This study used the deep learning algorithm Long Short-Term Memory (LSTM) to analyze the impact of groundwater withdrawal on two groundwater-level observation wells located in the middle mountainous area of the Pyoseon watershed in southeastern Jeju Island. As input data, daily precipitation from two nearby rainfall stations, groundwater withdrawal from six nearby pumping wells, and groundwater levels at the study wells (February 11, 2001 to October 31, 2019) were used. The LSTM prediction horizon was set to one day to reflect the groundwater-level fluctuation characteristics as fully as possible. Calibration and validation periods were used to prevent overfitting of the parameters, and a test period was used to evaluate the predictive performance of the LSTM. The Nash-Sutcliffe Efficiency (NSE) and root mean square error (RMSE) were used as evaluation indices. To analyze the effect of groundwater withdrawal on surrounding groundwater levels, simulations were run with the withdrawal set to the maximum rate of 2,300 m3/day, two-thirds of the maximum (1,533 m3/day), and 0 m3/day. In the simulations, the NSE for the calibration, validation, and test periods of the two monitoring wells ranged from 0.976 to 0.999, and the RMSE ranged from 0.084 m to 0.494 m, indicating excellent predictive performance. This means the LSTM adequately learned the groundwater-level fluctuation characteristics, so the estimated parameters were used to simulate and analyze the withdrawal impact. As a result, the maximum groundwater-level drawdown was 0.38 m, meaning the withdrawal rates had almost no effect on groundwater-level decline at the target sites. In addition, the relationship between withdrawal rate and drawdown was linear for one observation well but nonlinear for the other. The LSTM algorithm can therefore be used to analyze groundwater-level fluctuation characteristics in the middle mountainous area of the Pyoseon watershed of Jeju Island.
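A minimal sketch of the kind of one-day-ahead LSTM the abstract describes, written in PyTorch. The window length, layer sizes, and feature layout (two rainfall series, six withdrawal series, and the observed level, giving 9 daily features) are illustrative assumptions rather than the paper's actual configuration; the NSE and RMSE helpers implement the two evaluation indices the abstract names.

```python
# Hedged sketch: one-day-ahead groundwater-level prediction with an LSTM.
# Column layout, window length, and layer sizes are assumptions.
import torch
import torch.nn as nn

class GWLevelLSTM(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)     # next-day groundwater level

    def forward(self, x):                    # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])      # predict from the last time step

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency, one of the paper's evaluation indices."""
    return 1 - ((obs - sim) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()

def rmse(obs, sim):
    return ((obs - sim) ** 2).mean().sqrt()

# 2 rainfall + 6 withdrawal + 1 observed-level feature per day (assumed).
model = GWLevelLSTM(n_features=9)
x = torch.randn(32, 30, 9)                   # 32 samples, 30-day window (assumed)
print(model(x).shape)                        # -> torch.Size([32, 1])
obs, sim = torch.randn(100), torch.randn(100)
print(nse(obs, sim).item(), rmse(obs, sim).item())
```

The withdrawal-impact scenarios (2,300, 1,533, and 0 m3/day) would then be simulated by re-running the trained model with the withdrawal features replaced by each fixed rate.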


Adverse Effects on EEGs and Bio-Signals Coupling on Improving Machine Learning-Based Classification Performances

  • SuJin Bak
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.133-153 / 2023
  • In this paper, we propose a novel approach to investigating brain-signal measurement technology using electroencephalography (EEG). Traditionally, researchers have combined EEG signals with bio-signals (BSs) to enhance the classification performance of emotional states. Our objective was to explore the synergistic effects of coupling EEG and BSs, and to determine whether the combination EEG+BS improves the classification accuracy of emotional states compared to using EEG alone or combining EEG with pseudo-random signals (PS) generated arbitrarily by random generators. Employing four feature extraction methods, we examined four combinations: EEG alone, EEG+BS, EEG+BS+PS, and EEG+PS, utilizing data from two widely used open datasets. Emotional states (task versus rest states) were classified using Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) classifiers. Our results revealed that when using the highest-accuracy SVM-FFT, the average error rates of EEG+BS were 4.7% and 6.5% higher than those of EEG+PS and EEG alone, respectively. We also conducted a thorough analysis of EEG+BS by combining numerous PSs. The error rate of EEG+BS+PS displayed a V-shaped curve, initially decreasing due to the deep double descent phenomenon, followed by an increase attributed to the curse of dimensionality. Consequently, our findings suggest that the combination of EEG+BS may not always yield promising classification performance.
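A hedged sketch of the SVM-FFT pipeline the abstract names: FFT band-power features fed to an SVM. The sampling rate, band edges, window size, and random stand-in data are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: FFT band-power features + SVM for task-vs-rest classification.
import numpy as np
from scipy.fft import rfft, rfftfreq
from sklearn.svm import SVC

FS = 128  # sampling rate in Hz (assumed)

def band_powers(epoch, fs=FS):
    """Mean spectral power in canonical EEG bands for one channel epoch."""
    spec = np.abs(rfft(epoch)) ** 2
    freqs = rfftfreq(len(epoch), 1 / fs)
    bands = [(4, 8), (8, 13), (13, 30), (30, 45)]  # theta, alpha, beta, gamma
    return [spec[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in bands]

# Toy data: 100 one-second epochs, 8 channels (stand-ins for real EEG/BS records).
rng = np.random.default_rng(0)
epochs = rng.standard_normal((100, 8, FS))
labels = rng.integers(0, 2, 100)               # task vs. rest

X = np.array([[p for ch in ep for p in band_powers(ch)] for ep in epochs])
clf = SVC(kernel="rbf").fit(X[:80], labels[:80])
print("toy accuracy:", clf.score(X[80:], labels[80:]))  # ~chance on random data
```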

The Case Analysis of Teacher's Questioning and Feedback through Verbal Interactions in the Classes of the Gifted in Science (과학영재 수업에서 언어적 상호작용을 통하여 본 교사의 발문과 피드백 사례분석)

  • Jung, Min-Soo;Chun, Mi-Ran;Chae, Hee-K.
    • Journal of The Korean Association For Science Education / v.27 no.9 / pp.881-892 / 2007
  • This study aims to classify teachers' questions and feedback, as well as students' responses, in terms of type and frequency, and to examine the distinctive features of the verbal interactions, including teachers' questions and feedback, performed actively in classes for the science-gifted. Twenty-four hours of classes for 8th-grade science-gifted students were observed and recorded. In addition, the conversations between the teachers and the students were transcribed and analyzed, and interviews with the teachers were also conducted. It was found that the teachers usually use question types of memory recollection, perception, and memorization, together with instant feedback, while the students prefer to respond with rather short answers. The teachers who lead active classes characteristically use open questions at the beginning, raise the level of their questioning, frequently ask 'why' and 'how' questions, and pose evaluative questions. Their feedback to the students indicates that they show an attitude of accepting students' replies, invite different responses from other students by reserving instant answers or judgments, and give the students confidence in solving the next problems by praising and encouraging them.

Prospero Homeobox 1 and Doublecortin Correlate with Neural Damage after Ischemic Stroke

  • Dong-Hun Lee;Eun Chae Lee;Sang-Won Park;Ji young Lee;Kee-Pyo Kim;Jae Sang Oh
    • Journal of Korean Neurosurgical Society / v.67 no.3 / pp.333-344 / 2024
  • Objective : Markers of neuroinflammation during ischemic stroke are well characterized, but additional markers of neural damage are lacking. This study identified associations of post-stroke behavioral disorders with histologic neural damage and molecular biological change. Methods : Eight-week-old, 25 g male mice of the C57BL/6J strain were subjected to middle cerebral artery occlusion (MCAO) to induce ischemic stroke. The control group was healthy wild type (WT), and the experimental groups were designated as low-severity MCAO1 and high-severity MCAO2 based on post-stroke neurological scoring. All groups underwent behavioral tests, real-time polymerase chain reaction, triphenyltetrazolium chloride (TTC) staining, and hematoxylin and eosin staining. One-way analysis of variance was used to assess statistical significance between groups. Results : In TTC staining, MCAO1 showed 29.02% and MCAO2 showed 38.94% infarct volume (p<0.0001). The pro-inflammatory cytokine interleukin (IL)-1β was most highly expressed in MCAO2 (WT 0.44 vs. MCAO1 2.69 vs. MCAO2 5.02, p<0.0001). For distance to target in the Barnes maze test, WT traveled 178 cm, MCAO1 276 cm, and MCAO2 1,051 cm (p=0.0015). The latency to target was 13.3 seconds for WT, 27.9 seconds for MCAO1, and 87.9 seconds for MCAO2 (p=0.0007). Prospero homeobox 1 (Prox1) was most highly expressed in MCAO2 (p=0.0004). Doublecortin (Dcx) was most highly expressed in MCAO2 (p<0.0001). Conclusion : The study demonstrated that histological damage to neural cells and changes in brain mRNA expression were associated with behavioral impairment after ischemic stroke. Prox1 and Dcx may be biomarkers of neural damage associated with long-term cognitive decline, and their increased expression at the mRNA level was consistent with neural damage and long-term cognitive dysfunction.
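A hedged sketch of the one-way ANOVA used here for the three-group comparisons (WT vs. MCAO1 vs. MCAO2). The numbers are placeholders loosely shaped like the Barnes maze distances, not the paper's measurements.

```python
# Hedged sketch: one-way ANOVA across the three groups, as in the abstract.
from scipy.stats import f_oneway

wt    = [170, 181, 178, 183]    # e.g., distance to target in cm (invented values)
mcao1 = [260, 291, 270, 283]
mcao2 = [990, 1100, 1050, 1064]

f_stat, p_value = f_oneway(wt, mcao1, mcao2)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p -> group means differ
```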

Timing Driven Analytic Placement for FPGAs (타이밍 구동 FPGA 분석적 배치)

  • Kim, Kyosun
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.7 / pp.21-28 / 2017
  • Practical models of FPGA architectures, which include performance- and/or density-enhancing components such as carry chains, wide function multiplexers, and memory/multiplier blocks, are being applied to academic FPGA placement tools that used to rely on simple imaginary models. Techniques such as pre-packing and multi-layer density analysis were previously proposed to remedy issues with such practical models, and they effectively minimize wire length during initial analytic placement. Since timing, rather than wire length, should be optimized, most previous work takes timing constraints into account; however, the timing-driven techniques are mostly applied not to the initial analytic placement but to subsequent steps such as placement legalization and iterative improvement. This paper incorporates timing-driven techniques into an existing analytic placer that implements pre-packing and multi-layer density analysis: the placement is checked against timing constraints given in the standard SDC format, and the detected violations are minimized. First, a static timing analyzer is used to check the timing of the wire-length-minimized placement results. To minimize the detected violations, a function that minimizes the largest arrival time at endpoints is added to the objective function of the analytic placer. Since each clock has a different period, this function is evaluated per clock and the results are added to the objective function. Because this function can unnecessarily tighten paths that have no violations, a new function that calculates and minimizes the largest negative slack at endpoints is also proposed and compared (see the sketch below). Since the existing, non-timing-driven legalization is used before the timing analysis, any improvement in timing is entirely due to the functions added to the objective function. Experiments on twelve industrial examples show that the minimum arrival time function improves the worst negative slack by 15% on average, whereas the minimum worst negative slack function improves the negative slacks by an additional 6% on average.
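A hedged sketch of the per-clock worst-negative-slack (WNS) term described above. The endpoint/timing data structures are invented for illustration; a real placer would take arrival and required times from its static timing analysis engine, and the penalty would be a differentiable surrogate inside the analytic objective.

```python
# Hedged sketch: per-clock worst-negative-slack penalty for the placer objective.
from collections import defaultdict

def wns_penalty(endpoints):
    """Sum, over clocks, of the worst negative slack at that clock's endpoints.

    endpoints: list of (clock_name, required_time, arrival_time) tuples.
    Returns a non-negative penalty to add to the placer's objective.
    """
    worst = defaultdict(float)                    # starts at 0.0 per clock
    for clock, required, arrival in endpoints:
        slack = required - arrival
        worst[clock] = min(worst[clock], slack)   # track the most negative slack
    # Only violations (negative slack) contribute; clean clocks add nothing,
    # so paths without violations are not unnecessarily tightened.
    return sum(-s for s in worst.values() if s < 0)

# Two clocks with different periods, matching the paper's per-clock evaluation.
eps = [("clk_a", 10.0, 11.5), ("clk_a", 10.0, 9.0), ("clk_b", 4.0, 3.8)]
print(wns_penalty(eps))   # -> 1.5: only clk_a's violating endpoint counts
```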

An Analysis of the Roles of Experience in Information System Continuance (정보시스템의 지속적 사용에서 경험의 역할에 대한 분석)

  • Lee, Woong-Kyu
    • Asia Pacific Journal of Information Systems / v.21 no.4 / pp.45-62 / 2011
  • The notion of information systems (IS) continuance has recently emerged as one of the most important research issues in the field of IS. A great deal of research has been conducted thus far on the basis of theories adapted from various disciplines, including consumer behavior and social psychology, in addition to theories of information technology (IT) acceptance. This previous body of knowledge provides a robust research framework that can already account for the determination of IS continuance; however, this research points to other, thus-far-unelucidated determinant factors, such as habit, that were not included in traditional IT acceptance frameworks, and also re-emphasizes the importance of emotion-related constructs such as satisfaction in addition to conscious intention with rational beliefs such as usefulness. Experience should also be considered one of the most important factors determining the characteristics of IS continuance, and one that distinguishes it from IS acceptance, because more experienced users have more opportunities for IS use and thus use it more frequently than less experienced or non-experienced users. Interestingly, experience has dual features that may influence IS use in contradictory ways. On one hand, attitudes based on direct experience have been shown to predict behavior better than attitudes formed from indirect experience or no experience; as more information becomes available, direct experience may render IS use a more salient behavior and make it more accessible via memory. Experience may therefore intensify the relationship between IS use and conscious intention with evaluations. On the other hand, experience may culminate in the formation of habits: greater experience implies more frequent performance of the behavior, which may lead to habit formation. Hence, with experience, users' activation of an IS may become more dependent on habit, that is, unconscious automatic use without deliberation, and less dependent on conscious intention. Furthermore, experience provides the basic information necessary for satisfaction with the use of a specific IS, thus spurring the formation of both conscious intentions and unconscious habits. Whereas IT adoption is a one-time decision, IS continuance is a series of users' decisions and evaluations based on satisfaction with IS use. Moreover, habits cannot be formed without satisfaction, even when a behavior is carried out repeatedly. Thus, experience also plays a critical role in satisfaction, as satisfaction is the consequence of direct experience of actual behaviors. In particular, emotional experiences such as enjoyment can become as influential on IS use as utilitarian experiences such as usefulness; this is especially true in light of the modern increase in membership-based hedonic systems, including online games, web-based social network services (SNS), blogs, and portals, all of which attempt to provide users with self-fulfilling value. Therefore, to understand more clearly the role of experience in IS continuance, analysis must be conducted under a research framework that includes intentions, habits, and satisfaction, as experience may not only have duration-based moderating effects on the relationships of both intention and habit with the activation of IS use, but may also have content-based positive effects on satisfaction.
This is consistent with the basic assumptions regarding the determining factors in IS continuance suggested by Ortiz de Guinea and Markus: consciousness, emotion, and habit. The principal objective of this study was to explore and assess the effects of experience in IS continuance, with special consideration given to conscious intentions and unconscious habits, as well as satisfaction. In service of this goal, along with a review of the relevant literature on the effects of experience and habit on continuous IS use, this study proposes a research model that represents the roles of experience: its moderating role in the relationships of IS continuance with both conscious intention and unconscious habit, and its antecedent role in the development of satisfaction. To validate this research model, Korean university student users of 'Cyworld', one of the most influential social network services in South Korea, were surveyed, and the data were analyzed via partial least squares (PLS) analysis. Most hypotheses in the research model were statistically supported, with the exception of one. Although one hypothesis was not supported, the study's findings provide some important implications. First, the role of experience in IS continuance differs from its role in IS acceptance. Second, the use of IS was explained by the dynamic balance between habit and intention. Third, the importance of satisfaction was confirmed from the perspective of IS continuance with experience.
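The paper tests these moderating effects with PLS. As a loose, assumption-laden illustration of what a duration-based moderating effect looks like statistically, the sketch below fits an ordinary regression with interaction terms in statsmodels; all variable names and data are invented, and this is not the study's PLS procedure.

```python
# Hedged illustration only: the study itself uses partial least squares (PLS).
# This sketch shows the same idea -- experience moderating the intention->use
# and habit->use paths -- as interaction terms in an OLS model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "intention":  rng.normal(size=n),
    "habit":      rng.normal(size=n),
    "experience": rng.normal(size=n),   # e.g., months of Cyworld use (assumed)
})
# Simulated outcome where experience weakens the intention path and
# strengthens the habit path, as the research model hypothesizes.
df["use"] = (0.5 * df.intention + 0.3 * df.habit
             - 0.2 * df.intention * df.experience
             + 0.2 * df.habit * df.experience
             + rng.normal(scale=0.5, size=n))

fit = smf.ols("use ~ intention * experience + habit * experience", df).fit()
print(fit.params.round(2))  # the interaction coefficients carry the moderation
```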

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports (다중 최소 임계치 기반 빈발 패턴 마이닝의 성능분석)

  • Ryang, Heungmo;Yun, Unil
    • Journal of Internet Computing and Services / v.14 no.6 / pp.1-8 / 2013
  • Data mining techniques are used to find important and meaningful information in huge databases, and pattern mining is one of the significant data mining techniques. Pattern mining is a method of discovering useful patterns in huge databases. Frequent pattern mining, one such method, extracts patterns whose frequencies exceed a minimum support threshold; these patterns are called frequent patterns. Traditional frequent pattern mining uses a single minimum support threshold for the whole database. This single-support model implicitly supposes that all of the items in the database have the same nature. In real-world applications, however, each item in a database can have its own characteristics, and thus an appropriate pattern mining technique that reflects those characteristics is required. In the frequent pattern mining framework, where the natures of items are not considered, the single minimum support threshold must be set very low to mine patterns containing rare items, which, however, yields too many patterns containing meaningless items. In contrast, no such pattern can be mined if too high a threshold is used. This dilemma is called the rare item problem. To solve this problem, initial research proposed approximate approaches that split data into several groups according to item frequencies or group related rare items. However, being based on approximate techniques, these methods cannot find all of the frequent patterns, including rare frequent patterns. Hence, a pattern mining model with multiple minimum supports was proposed to solve the rare item problem. In this model, each item has a corresponding minimum support threshold, called MIS (Minimum Item Support), calculated from item frequencies in the database. By applying the MIS, the multiple minimum supports model finds all of the rare frequent patterns without generating meaningless patterns or losing significant patterns. Meanwhile, candidate patterns are extracted during the mining of frequent patterns, and in the single minimum support model only the single minimum support is compared against the frequencies of the candidate patterns; the characteristics of the items constituting the candidate patterns are therefore not reflected, and the rare item problem arises. To address this issue in the multiple minimum supports model, the minimum MIS value among the items in a candidate pattern is used as that pattern's minimum support threshold, so that its characteristics are considered (see the sketch below). To efficiently mine frequent patterns, including rare frequent patterns, using this concept, tree-based algorithms of the multiple minimum supports model sort items in the tree in MIS-descending order, in contrast to those of the single minimum support model, where items are ordered in frequency-descending order. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and conduct a performance evaluation against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. Experimental results show that the multiple minimum supports based algorithm outperforms the single minimum support based one while demanding more memory for MIS information. Moreover, both compared algorithms show good scalability.
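A hedged sketch of the multiple-minimum-supports idea: each item gets its own MIS derived from its frequency, and a candidate pattern is checked against the minimum MIS among its items. The MIS formula here (a fraction of item support with a floor) is a common convention assumed for illustration, not taken from the paper, and the brute-force enumeration stands in for the tree-based algorithms the abstract describes.

```python
# Hedged sketch: frequent pattern mining with per-item Minimum Item Supports.
from itertools import combinations

transactions = [
    {"bread", "milk"}, {"bread", "milk", "caviar"},
    {"milk"}, {"bread", "milk"}, {"bread", "caviar"},
]
n = len(transactions)

def support(pattern):
    return sum(pattern <= t for t in transactions) / n

items = {i for t in transactions for i in t}
BETA, FLOOR = 0.6, 0.2                       # assumed MIS parameters
mis = {i: max(BETA * support({i}), FLOOR) for i in items}

frequent = []
for k in (1, 2, 3):
    for pattern in combinations(sorted(items), k):
        p = set(pattern)
        # The pattern's threshold is the minimum MIS among its items, so a
        # rare item like "caviar" lowers the bar instead of being lost.
        if support(p) >= min(mis[i] for i in p):
            frequent.append(pattern)
print(frequent)   # includes ("bread", "caviar") despite caviar's low frequency
```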

Brand Equity and Purchase Intention in Fashion Products: A Cross-Cultural Study in Asia and Europe (상표자산과 구매의도와의 관계에 관한 국제비교연구 - 아시아와 유럽의 의류시장을 중심으로 -)

  • Kim, Kyung-Hoon;Ko, Eun-Ju;Graham, Hooley;Lee, Nick;Lee, Dong-Hae;Jung, Hong-Seob;Jeon, Byung-Joo;Moon, Hak-Il
    • Journal of Global Scholars of Marketing Science / v.18 no.4 / pp.245-276 / 2008
  • Brand equity is one of the most important concepts in business practice as well as in academic research. Successful brands can allow marketers to gain competitive advantage (Lassar et al., 1995), including the opportunity for successful extensions, resilience against competitors' promotional pressures, and the ability to create barriers to competitive entry (Farquhar, 1989). Branding plays a special role in service firms because strong brands increase trust in intangible products (Berry, 2000), enabling customers to better visualize and understand them. They reduce customers' perceived monetary, social, and safety risks in buying services, which are obstacles to evaluating a service correctly before purchase. Also, a high level of brand equity increases consumer satisfaction, repurchase intent, and degree of loyalty. Brand equity can be considered a mixture that includes both financial assets and relationships. Indeed, brand equity can be viewed as the value added to the product (Keller, 1993), or the perceived value of the product in consumers' minds. Mahajan et al. (1990) claim that customer-based brand equity can be measured by the level of consumers' perceptions. Several researchers discuss brand equity along two dimensions: consumer perception and consumer behavior. Aaker (1991) suggests measuring brand equity through price premium, loyalty, perceived quality, and brand associations. Viewing brand equity as the consumer's behavior toward a brand, Keller (1993) proposes similar dimensions: brand awareness and brand knowledge. Thus, past studies tend to identify brand equity as a multidimensional construct consisting of brand loyalty, brand awareness, brand knowledge, customer satisfaction, perceived equity, brand associations, and other proprietary assets (Aaker, 1991, 1996; Blackston, 1995; Cobb-Walgren et al., 1995; Na, 1995). Other studies tend to regard brand equity and other brand assets, such as brand knowledge, brand awareness, brand image, brand loyalty, perceived quality, and so on, as independent but related constructs (Keller, 1993; Kirmani and Zeithaml, 1993). Walters (1978) defined information search as "a psychological or physical action a consumer takes in order to acquire information about a product or store." But each consumer has different methods of information search. There are two methods of information search: internal and external. Internal search is "search of information already saved in the memory of the individual consumer" (Engel and Blackwell, 1982), that is, "memory of a previous purchase experience or information from a previous search" (Beales, Mazis, Salop, and Staelin, 1981). External search is "a completely voluntary decision made in order to obtain new information" (Engel and Blackwell, 1982), that is, "actions of a consumer to acquire necessary information by such methods as intentionally exposing oneself to advertisements, talking to friends or family, or visiting a store" (Beales, Mazis, Salop, and Staelin, 1981). There are many sources for consumers' information search, including advertising sources such as the internet, radio, television, newspapers, and magazines; information supplied by businesses such as salespeople, packaging, and in-store information; personal sources such as family, friends, and colleagues; and mass media and institutional sources such as consumer protection agencies and government agencies.
Understanding consumers' purchasing behavior is key for a firm to attract and retain customers, improve its prospects for survival and growth, and enhance shareholder value. Therefore, marketers should understand consumers both as individuals and as market segments. One theory of consumer behavior supports the belief that individuals are rational: they think and move through stages when making a purchase decision. This rational-thinker view has led to the identification of a consumer buying decision process. This decision process, with its different levels of involvement and influencing factors, has been widely accepted and is fundamental to understanding purchase intention, which represents what consumers think they will buy. Brand equity is a very important asset for companies, often worth more than the product itself. This paper studies a brand equity model and influencing factors, including information processes such as information searching and information sources, in the fashion markets of Asia and Europe. Information searching and information sources influence brand knowledge, which in turn influences consumers' purchase decisions. Nine research hypotheses are drawn to test the relationships among the antecedents of brand equity and purchase intention, and among brand knowledge, brand value, brand attitude, and brand loyalty. H1. Information searching influences brand knowledge positively. H2. Information sources influence brand knowledge positively. H3. Brand knowledge influences brand attitude. H4. Brand knowledge influences brand value. H5. Brand attitude influences brand loyalty. H6. Brand attitude influences brand value. H7. Brand loyalty influences purchase intention. H8. Brand value influences purchase intention. H9. The research model will be the same in Asia and Europe. We performed structural equation model analysis to test the hypotheses suggested in this study. The model fit indices of the research model in Asia were $\chi^2$=195.19 (p=0.0), NFI=0.90, NNFI=0.87, CFI=0.90, GFI=0.90, RMR=0.083, AGFI=0.85, indicating an acceptable model fit. In Europe, they were $\chi^2$=133.25 (p=0.0), NFI=0.81, NNFI=0.85, CFI=0.89, GFI=0.90, RMR=0.073, AGFI=0.85, again indicating an acceptable fit. All of the hypotheses except one were supported: in Europe, information search was not an antecedent of brand knowledge. This suggests that sales of global fashion brands such as jeans are not expanding as rapidly in Europe as in Asian markets such as China, Japan, and South Korea, and that young consumers in European countries are no more brand and fashion conscious than their counterparts in Asia. The results carry theoretical and practical contributions. In the fashion jeans industry, relatively few studies have examined the viability of cross-national brand equity. This study provides insight into building global brand equity and suggests that information process elements such as information search and information sources work differently in Asia and Europe in the fashion jeans market.
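A hedged sketch of how a path model like H1-H8 can be specified and fit in Python, assuming the semopy package; the variable names, the synthetic data, and the path coefficients are all invented for illustration and are not the study's LISREL-style analysis or its data.

```python
# Hedged sketch: fitting the H1-H8 path structure with semopy (assumed package).
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(7)
n = 400
info_search  = rng.normal(size=n)
info_sources = rng.normal(size=n)
brand_knowledge = 0.4 * info_search + 0.3 * info_sources + rng.normal(scale=0.8, size=n)
brand_attitude  = 0.5 * brand_knowledge + rng.normal(scale=0.8, size=n)
brand_value     = 0.3 * brand_knowledge + 0.4 * brand_attitude + rng.normal(scale=0.8, size=n)
brand_loyalty   = 0.5 * brand_attitude + rng.normal(scale=0.8, size=n)
purchase_intention = 0.4 * brand_loyalty + 0.3 * brand_value + rng.normal(scale=0.8, size=n)
data = pd.DataFrame(locals() | {}, columns=[
    "info_search", "info_sources", "brand_knowledge", "brand_attitude",
    "brand_value", "brand_loyalty", "purchase_intention"])

desc = """
brand_knowledge ~ info_search + info_sources
brand_attitude ~ brand_knowledge
brand_value ~ brand_knowledge + brand_attitude
brand_loyalty ~ brand_attitude
purchase_intention ~ brand_loyalty + brand_value
"""
model = semopy.Model(desc)
model.fit(data)
print(semopy.calc_stats(model).T)  # chi-square, CFI, GFI, AGFI, RMSEA, ...
```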


Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.127-148 / 2020
  • The data center is a physical facility for accommodating computer systems and related components, and is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and prevent failure. If a failure occurs in some element of the facility, it may affect not only the relevant equipment but also other connected equipment and cause enormous damage. Failures in IT facilities in particular occur irregularly because of interdependence, and their causes are difficult to identify. Previous studies on predicting failure in data centers treated each server as a single, isolated state, without assuming that devices interact. In this study, therefore, data center failures were classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), with the focus on analyzing complex failures occurring within servers. Server-external failures include power, cooling, and user errors; since such failures can be prevented in the early stages of data center facility construction, various solutions have been developed. In contrast, the causes of failures occurring inside servers are difficult to determine, and adequate prevention has not yet been achieved, largely because server failures do not occur singly: one server's failure can cause failures on other servers, or be triggered by them. In other words, whereas existing studies analyzed failures under the assumption that servers do not affect one another, this study assumes that failures propagate between servers. To define the complex failure situation in the data center, failure history data for each piece of equipment in the data center were used. Four major failures are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures occurring on each device were sorted in chronological order, and when a failure occurred on one piece of equipment, any failure occurring on another piece of equipment within 5 minutes was defined as simultaneous. After constructing sequences of the devices that failed at the same time, the 5 devices that most frequently failed simultaneously within the constructed sequences were selected, and the cases in which the selected devices failed at the same time were confirmed through visualization. Since the server resource information collected for failure analysis is a time series with temporal flow, we used Long Short-Term Memory (LSTM), a deep learning algorithm that can predict the next state from previous states. In addition, unlike the single-server case, the Hierarchical Attention Network deep learning model structure was used, in consideration of the fact that each server contributes differently to a complex failure. This algorithm improves prediction accuracy by giving greater weight to servers with greater impact on the failure (a structural sketch follows below). The study began by defining the types of failure and selecting the analysis targets.
In the first experiment, the same collected data were treated once as a single-server state and once as a multiple-server state, and the two were compared. The second experiment improved the prediction accuracy for complex failures by optimizing a separate threshold for each server. In the first experiment, the single-server assumption predicted no failure for three of the five servers even though failures actually occurred, whereas under the multiple-server assumption all five servers were correctly predicted to have failed. This result supports the hypothesis that servers affect one another, and confirms that prediction performance is superior when multiple servers are assumed rather than a single server. In particular, applying the Hierarchical Attention Network algorithm, under the assumption that each server's effect differs, improved the analysis, and applying a different threshold for each server further improved the prediction accuracy. This study showed that failures whose causes are difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring on servers in data centers. The results are expected to help prevent failures in advance.
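A hedged sketch of the hierarchical idea described above, in PyTorch: an LSTM encodes each server's resource time series, then an attention layer weights the servers by their estimated contribution before a failure prediction is made. All sizes, names, and the shared-encoder choice are assumptions, not the paper's exact architecture.

```python
# Hedged sketch: per-server LSTM encodings pooled by attention over servers.
import torch
import torch.nn as nn

class ServerAttentionNet(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)     # scores each server's encoding
        self.head = nn.Linear(hidden, 1)     # failure-probability logit

    def forward(self, x):
        # x: (batch, n_servers, timesteps, n_features)
        b, s, t, f = x.shape
        enc, _ = self.encoder(x.view(b * s, t, f))
        server_vec = enc[:, -1, :].view(b, s, -1)              # one vector per server
        weights = torch.softmax(self.attn(server_vec), dim=1)  # server weights
        context = (weights * server_vec).sum(dim=1)            # weighted pooling
        return torch.sigmoid(self.head(context)).squeeze(-1)

model = ServerAttentionNet(n_features=4)
x = torch.randn(8, 5, 60, 4)   # 8 samples, 5 servers, 60 timesteps, 4 metrics
print(model(x).shape)          # -> torch.Size([8])
```

The attention weights are what give heavily implicated servers a larger say in the prediction; per-server decision thresholds, as in the second experiment, would then be tuned on top of the predicted probabilities.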

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data generated by banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, to gather, store, categorize, and analyze the log data generated while processing a client's business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to realize flexible storage expansion for massive amounts of unstructured log data and to execute the many functions needed to categorize and analyze them. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to process with the existing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions so that it continues to operate after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases make it hard to expand across nodes when the stored data must be distributed as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases offer, but it can easily expand through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is adopted because its flexible schema structure makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data is rapidly increasing, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in graphs according to the user's various analysis conditions, while the aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module (a usage sketch follows below). A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL, demonstrating the proposed system's superiority. Moreover, an optimal chunk size is identified through the log data insert performance evaluation of MongoDB for various chunk sizes.
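A hedged sketch of the MongoDB module's role: free-schema inserts of heterogeneous log documents plus a per-unit-time aggregation of the kind the log graph generator module would plot. The database, collection, and field names are invented for illustration, and a reachable MongoDB instance is assumed.

```python
# Hedged sketch: schema-free log inserts and hourly aggregation with pymongo.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
logs = client["bankdb"]["logs"]                    # invented names

# Unstructured logs: documents need not share a schema.
logs.insert_many([
    {"ts": datetime(2013, 11, 1, 9, 0, tzinfo=timezone.utc),
     "type": "transfer", "branch": "A01", "latency_ms": 120},
    {"ts": datetime(2013, 11, 1, 9, 1, tzinfo=timezone.utc),
     "type": "login", "branch": "A01", "client_os": "WinXP"},
])

# Aggregate log counts per hour -- the per-unit-time summary described above.
pipeline = [
    {"$group": {"_id": {"day": {"$dayOfYear": "$ts"}, "hour": {"$hour": "$ts"}},
                "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
for row in logs.aggregate(pipeline):
    print(row)
```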