Search | Korea Science

Anomaly Detection Technique of Log Data Using Hadoop Ecosystem (하둡 에코시스템을 활용한 로그 데이터의 이상 탐지 기법)

Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae
- KIISE Transactions on Computing Practices / v.23 no.2 / pp.128-133 / 2017
In recent years, the number of systems for analyzing large volumes of data has been increasing. Hadoop, a representative big data system, stores and processes large data in a distributed environment of multiple servers, where system-resource management is very important. The authors attempted to detect anomalies in rapidly changing log data collected from multiple servers using simple but efficient anomaly-detection techniques. Accordingly, an Apache Hive storage architecture was designed to store the log data collected from the multiple servers in the Hadoop ecosystem. In addition, three anomaly-detection techniques were designed based on the moving-average and 3-sigma concepts. It was confirmed that all three techniques detected the abnormal intervals correctly, and that the weighted anomaly-detection technique was more precise than the basic techniques. These results show that simple techniques can be an excellent approach to detecting log-data anomalies in the Hadoop ecosystem.
https://doi.org/10.5626/KTCP.2017.23.2.128
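
The abstract above names the moving-average and 3-sigma concepts without giving the exact formulation, so the following is only a minimal sketch of the basic idea. The function name `detect_anomalies`, the window size, and the use of the population standard deviation are assumptions made for illustration; the paper's weighted variant is not reproduced here.

```python
from statistics import mean, pstdev


def detect_anomalies(values, window=5, k=3.0):
    """Return indices whose value deviates from the moving average of the
    preceding `window` values by more than k standard deviations.

    Hypothetical helper: window and k are illustrative defaults, not
    parameters taken from the paper.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]   # the preceding `window` samples
        mu = mean(history)               # moving average
        sigma = pstdev(history)          # spread of the same window
        if abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    cpu_load = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
    print(detect_anomalies(cpu_load))
```

Note that a single spike inflates the standard deviation of the windows that follow it, which is one reason a weighted variant might behave better in practice.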

Selection of the economically optimal parameters in the EWMA control chart (지수가중이동평균관리도의 경제적 최적모수의 선정)

박창순;원태연
- The Korean Journal of Applied Statistics / v.9 no.1 / pp.91-109 / 1996
The exponentially weighted moving average (EWMA) control chart has recently been used widely for process monitoring and process adjustment, but there have not been many studies on the selection of its parameters. Design of the control chart can be classified into the statistical design and the economic design. The purpose of the economic design is to minimize the cost function in which all the possible costs occurring during the process are probability given the Type I error probability. In this paper the optimal parameters of the EWMA chart are selected for the economic design as well as for the statistical design. The optimal parameters for the economic design differ significantly from those of the statistical design; in particular, the weight is always larger than that used in the statistical design. In the economic design, we divide the model into the single assignable cause model and the multiple assignable causes model according to number of which is used as the average context of the multiple assignable causes, it shows that the selection of the parameters may be misleading when the multiple assignable causes exist in practice.
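
To make the two parameters discussed in the abstract concrete, here is a sketch of the standard textbook EWMA chart, in which the weight `lam` and the control-limit width `L` are the quantities a design would choose. This is not the paper's economic model; the function name `ewma_chart` and the default values are assumptions for illustration only.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """Compute the EWMA statistic and time-varying control limits.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    Limits: mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    Returns a list of (z_t, lower_limit, upper_limit, out_of_control).
    """
    z = mu0
    rows = []
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        rows.append((z, mu0 - half_width, mu0 + half_width,
                     abs(z - mu0) > half_width))
    return rows


if __name__ == "__main__":
    for row in ewma_chart([0.1, -0.2, 0.3, 2.5, 2.8], mu0=0.0, sigma=1.0):
        print(row)
```

A larger `lam` makes the chart react faster to large shifts but smooths less, which is the trade-off behind choosing the weight.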

Robust Stereo Matching under Radiometric Change based on Weighted Local Descriptor (광량 변화에 강건한 가중치 국부 기술자 기반의 스테레오 정합)

Koo, Jamin;Kim, Yong-Ho;Lee, Sangkeun
- Journal of the Institute of Electronics and Information Engineers / v.52 no.4 / pp.164-174 / 2015
In real scenarios, radiometric change frequently occurs during stereo image acquisition, whether with multiple cameras with geometric characteristics or with a single moving camera, because of different camera parameters and illumination changes. Conventional stereo matching algorithms have difficulty finding correct corresponding points because they assume that corresponding pixels have similar color values. In this paper, we present a new method based on a local descriptor that reflects intensity, gradient, and texture information. Furthermore, an entropy-based adaptive weight for the local descriptor is applied to estimate correct corresponding points under radiometric variation. The proposed method is tested on Middlebury datasets with radiometric changes and compared with state-of-the-art algorithms. Experimental results show that the proposed scheme outperforms the compared algorithms, with around 5% less matching error on average.
https://doi.org/10.5573/ieie.2015.52.4.164

Development of an Incident Detection Algorithm by Using Traffic Flow Pattern (이력패턴데이터를 이용한 돌발상황 감지알고리즘 개발)

Heo, Min-Guk;No, Chang-Gyun;Kim, Won-Gil;Son, Bong-Su
- Journal of Korean Society of Transportation / v.28 no.6 / pp.7-15 / 2010
This paper focuses on developing and demonstrating an algorithm that uses the difference between historical traffic pattern data and real-time traffic data to decide whether an incident has occurred. The aim of this study is to develop an incident detection algorithm that is easier to understand, modify, and operate. To establish the traffic pattern for this algorithm, the weighted moving average method was applied. The basis of this method was the traffic volume and speed of the same day and time at the same location, based on 30-second raw data. The model consists of a series of steps: screening of erroneous data, determination of the traffic condition, comparison with the pattern data, determination of incident circumstances, and a continuity test. A variety of parameter values were applied to select reasonable parameters. Applying the algorithm yielded an average detection rate of 94.7 percent, a misinformation rate of 0.8 percent, and an average detection time of 1.6 minutes. With these results, the detection rate proved superior to that of the existing model. Applying the concept of traffic patterns was useful for obtaining these results. This study is also significant in that it produced an algorithm that formalizes the decision process of actual operators.
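
The pattern-comparison step described above can be sketched as follows. The abstract does not give the weights, the thresholds, or the continuity rule, so the function names, the example weights, and the 30% tolerance below are all hypothetical placeholders rather than the authors' parameters.

```python
def weighted_moving_average(history, weights):
    """Weighted average of past observations for the same day, time, and
    location. `weights` is an assumed scheme (e.g. heavier on recent weeks)."""
    if len(history) != len(weights):
        raise ValueError("history and weights must have the same length")
    return sum(h * w for h, w in zip(history, weights)) / sum(weights)


def deviates_from_pattern(observed, pattern, tolerance=0.3):
    """True when the real-time value differs from the historical pattern by
    more than `tolerance` (a hypothetical fraction of the pattern value)."""
    return abs(observed - pattern) > tolerance * pattern


if __name__ == "__main__":
    # Speeds (km/h) at the same weekday and time slot over three past weeks,
    # oldest first, with more weight on the most recent week.
    pattern_speed = weighted_moving_average([60, 62, 64], [1, 2, 3])
    print(pattern_speed, deviates_from_pattern(30, pattern_speed))
```

In a full pipeline this check would sit between the data-screening step and the continuity test, so that a single noisy 30-second sample does not raise an alarm on its own.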

A Meta-analysis of Ambient Air Pollution in Relation to Daily Mortality in Seoul, 1991~1995 (메타분석 방법을 적용한 서울시 대기오염과 조기사망의 상관성 연구 (1991년~1995년))

Dockery, Douglas W.;Kim, Chun-Bae;Jee, Sun-Ha;Chung, Yong;Lee, Jong-Tae
- Journal of Preventive Medicine and Public Health / v.32 no.2 / pp.177-182 / 1999
Objectives: To reexamine the association between air pollution and daily mortality in Seoul, Korea, using a meta-analysis of data filed for 1991 through 1995. Methods: A separate Poisson regression analysis was conducted for each district within the metropolitan area of Seoul to regress daily death counts on the levels of each ambient air pollutant, namely total suspended particulates (TSP), sulfur dioxide ($SO_2$), and ozone ($O_3$), controlling for variability in weather conditions. We calculated a weighted mean as a meta-analysis summary of the estimates, along with its standard error. Results: The p value from each pollutant model testing the homogeneity assumption was small (p<0.01) because of the large disparity among district-specific estimates. Therefore, all results reported here were estimated from the random effect model. Using the weighted mean that we calculated, the mortality at a $100\,\mu g/m^3$ increment in a 3-day moving average of TSP levels was 1.034 (95% CI 1.009-1.059). The mortality was estimated to increase by 6% (95% CI 3-10%) and 3% (95% CI 0-6%) with each 50 ppb increase in the 9-day moving average of $SO_2$ and the 1-hr maximum $O_3$, respectively. Conclusions: Like most air pollution epidemiologic studies, this meta-analysis cannot avoid measurement misclassification, since no personal measurement was taken. However, we can expect measurement bias to be reduced in a district-specific estimate, since a monitoring station is more representative of the air quality of the matched district. Results similar to those of previous studies indicate the existence of health effects of air pollution at current levels in many industrialized countries, including Korea.
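
The weighted-mean summary and random-effects step mentioned in the abstract can be illustrated with a short sketch. The abstract does not say which random-effects estimator was used; the DerSimonian-Laird estimator below is an assumption, as are the function name `pooled_estimate` and the example numbers, which are not data from the study.

```python
import math


def pooled_estimate(betas, ses):
    """Pool district-specific estimates with inverse-variance weights.

    Uses the DerSimonian-Laird between-district variance (tau^2), which is an
    assumed choice. Returns (pooled estimate, its standard error, tau^2).
    """
    w = [1.0 / se ** 2 for se in ses]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    # Cochran's Q measures heterogeneity among the district estimates.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]   # random-effects weights
    sws = sum(w_star)
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sws
    return pooled, math.sqrt(1.0 / sws), tau2


if __name__ == "__main__":
    # Hypothetical log relative risks and standard errors for four districts.
    print(pooled_estimate([0.02, 0.05, 0.01, 0.04], [0.01, 0.02, 0.015, 0.01]))
```

When the estimates are homogeneous, `tau2` is zero and the result reduces to the ordinary fixed-effect weighted mean.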

Analysis on Characteristics of Variation in Flood Flow by Changing Order of Probability Weighted Moments (확률가중모멘트의 차수 변화에 따른 홍수량 변동 특성 분석)

Maeng, Seung-Jin;Hwang, Ju-Ha
- Journal of the Korea Academia-Industrial cooperation Society / v.10 no.5 / pp.1009-1019 / 2009
In this research, various characteristics of South Korea's design floods were examined by deriving appropriate design floods from observed flood data for selected main watersheds of the nation. Nineteen watersheds in Korea were selected. The characteristics of annual rainfall were analyzed using a moving average method. It was decided to perform the frequency analysis on the annual maximum flood of succeeding one year as a reference year. For the 19 watersheds, basic statistics were calculated and tests of independence, homogeneity, and outliers were carried out for each period of the annual maximum flood series. Based on the LH-moment ratio diagram and the Kolmogorov-Smirnov (K-S) test, among the applied Gumbel (GUM), Generalized Extreme Value (GEV), Generalized Logistic (GLO), and Generalized Pareto (GPA) distributions, the GEV distribution was found to be adequate compared with the other probability distributions. Parameters of the GEV distribution were estimated by the L, L1, L2, L3, and L4-moment methods, based on the change in the order of the probability weighted moments. Design floods for each watershed and each period of the annual maximum flood series were derived from the GEV distribution. From the analysis of the variation rate used in this research, it was concluded that the time for changing the design conditions, so as to ensure proper hydraulic structures that reflect the nation's recent climate changes brought about by global warming, should be around the year 2002.
https://doi.org/10.5762/KAIS.2009.10.5.1009
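
As background for the probability weighted moments mentioned above, the sketch below computes the standard unbiased sample PWMs and the first three ordinary L-moments from them. It covers only the ordinary L-moment case; the higher-order LH-moments (L1 to L4) used in the paper are not implemented, and the function names and sample values are illustrative assumptions.

```python
def sample_pwms(data, order=4):
    """Unbiased sample probability weighted moments b_0 .. b_{order-1}.

    b_r = (1/n) * sum_j [ (j-1)(j-2)...(j-r) / ((n-1)(n-2)...(n-r)) ] * x_(j),
    where x_(1) <= ... <= x_(n). Requires len(data) > order - 1.
    """
    x = sorted(data)
    n = len(x)
    pwms = []
    for r in range(order):
        total = 0.0
        for j, xj in enumerate(x, start=1):
            weight = 1.0
            for k in range(1, r + 1):
                weight *= (j - k) / (n - k)
            total += weight * xj
        pwms.append(total / n)
    return pwms


def l_moments(data):
    """First three L-moments as linear combinations of the sample PWMs."""
    b0, b1, b2, _ = sample_pwms(data, order=4)
    l1 = b0                      # location
    l2 = 2 * b1 - b0             # scale
    l3 = 6 * b2 - 6 * b1 + b0    # used for L-skewness (l3 / l2)
    return l1, l2, l3


if __name__ == "__main__":
    annual_max_flood = [820.0, 1130.0, 640.0, 1510.0, 980.0, 760.0, 2040.0]
    print(l_moments(annual_max_flood))
```

Ratios such as `l3 / l2` are what a moment ratio diagram plots when comparing candidate distributions.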

A Derivation of the Representative Unit Hydrograph from Multiperiod Complex Storm by Linear Programming (선형계획법(線型計劃法)에 의한 대표단위도(代表單位圖) 유도(誘導))

Kwon, Oh Hun;Ryu, Tae Sang;Yoo, Ju Hwan
- KSCE Journal of Civil and Environmental Engineering Research / v.13 no.2 / pp.173-182 / 1993
This paper presents an algorithm to derive the representative unit hydrograph for the real environment of a watershed. For a given watershed, the conventional methods give several different unit hydrographs depending on the storm event. In this study the LP model is modified from the previous study by Mays et al. as follows: the objective function is designed to minimize the sum of weighted residuals. An additional moving-average constraint, which was not active in Mays's paper, is added to prevent oscillation in the unit hydrograph. The configuration of the rainfall matrix was improved to reduce its dimension in accordance with Diskin's review point. In spite of the superiority of the LP approach in terms of representativeness, all the methods were very sensitive to the validity of the baseflow separation and rainfall-loss. Several methods of separating rainfall excesses and direct runoffs were applied, and no preferred method was identified. This is a matter of judgement considering catchment and rainfall characteristics. The algorithm was applied to a real watershed of the Wi stream in the Nak-dong river. Compared with the IHP results by conventional methods, the optimized representative unit hydrograph showed a relatively smaller peak discharge and a shorter basin lag, and the oscillation of its falling limb was successfully eliminated owing to the additional moving-average constraints.

Analysis of Spatial Changes in the Forest Landscape of the Upper Reaches of Guem River Dam Basin according to Land Cover Change (토지피복변화에 따른 금강 상류 댐 유역 산림 경관의 구조적 변화 분석)

Kyeong-Tae Kim;Hyun-Jung Lee;Whee-Moon Kim;Won-Kyong Song
- Korean Journal of Environment and Ecology / v.37 no.4 / pp.289-301 / 2023
Forests within watersheds are essential in maintaining ecosystems and are the central infrastructure for constructing an ecological network system. However, due to indiscriminate development projects carried out over past decades, forest fragmentation and land use changes have accelerated, and their original functions have been lost. Since a forest's structural pattern directly impacts ecological processes and functions in understanding forest ecosystems, identifying and analyzing change patterns is essential. Therefore, this study analyzed structural changes in the forest landscape according to the time-series land cover changes using the FRAGSTATS model for the dam watershed of the Geum River upstream. Land cover change detection for the dam watershed of the Geum River upstream showed an increase of 33.12 square kilometers (0.62%) in forests and 67.26 square kilometers (1.26%) in urbanized (built-up) areas and a decrease of 148.25 square kilometers (2.79%) in agricultural areas from the 1980s to the 2010s. The results of the no-sampling forest landscape analysis within the watershed indicated that the percentage of landscape (PLAND), area-weighted mean contiguity index (CONTIG_AM), mean core area (CORE_MN), and percentage of like adjacencies (PLADJ) increased, and that the number of patches (NP), landscape shape index (LSI), and patch cohesion index (COHESION) decreased. Identification of structural change patterns through a moving window analysis showed that the forest landscape in Sangju City in Gyeongsangbuk Province, Boeun County in Chungcheongbuk Province, and Jinan County in Jeollabuk Province was relatively well preserved, but fragmentation was ongoing at the border between Okcheon County in Chungcheongbuk Province, Yeongdong and Geumsan Counties in Chungcheongnam Province, and the forest landscape in areas adjacent to Muju and Jangsu Counties in Jeollabuk Province.
The results indicate that it is necessary to establish afforestation projects for fragmented areas when preparing a future regional forest management strategy. This study derived areas where fragmentation of forest landscapes is expected and the results may be used as basic data for assessing the health of watershed forests and establishing management plans.
https://doi.org/10.13047/KJEE.2023.37.4.289

Development of Bus Arrival Time Estimation Model by Unit of Route Group (노선그룹단위별 버스도착시간 추정모형 연구)

No, Chang-Gyun;Kim, Won-Gil;Son, Bong-Su
- Journal of Korean Society of Transportation / v.28 no.1 / pp.135-142 / 2010
Conventional techniques for predicting bus arrival times have used only the data obtained from buses belonging to the same company. Consequently, they have often failed to predict the bus arrival time at downstream bus stops because of the lack of data during congested periods. The primary objective of this study is to overcome this weakness. The estimation model was developed based on data obtained from the Bus Information System (BIS) and the Bus Management System (BMS). The proposed model predicts the bus arrival time at bus stops by using the data of all buses travelling on the same roadway section during the same time period. In the tests, the proposed model predicted bus arrival times at the bus stops with good accuracy in terms of statistical measures (e.g., root mean square error). Overall, the empirical results were very encouraging: the model keeps predicting during the morning and evening peak periods and delivers excellent results for the severely congested roadways that are of the most practical interest.

Congestion Degree Based Available Bandwidth Estimation Method for Enhancement of UDT Fairness (UDT 플로우 간 공평성 향상을 위한 혼잡도 기반의 가용대역폭 추정 기법)

Park, Jongseon;Jang, Hyunhee;Cho, Gihwan
- Journal of the Institute of Electronics and Information Engineers / v.52 no.7 / pp.63-73 / 2015
In end-to-end data transfer protocols, it is very important to estimate the available bandwidth correctly. In UDT (UDP-based Data Transfer), the receiver estimates the MTR (Maximum Transfer Rate) of the current link using packet pairs transmitted periodically from the sender, and the sender then makes the final decision on the MTR through the EWMA (Exponentially Weighted Moving Average) algorithm. The MTR has to be estimated exactly because the available bandwidth is calculated as the difference between the MTR and the current transfer rate. However, when the network is congested due to traffic load and competing flows coexist, this brings about a severe fairness problem. This paper proposes a congestion-degree-based MTR estimation algorithm. Here, the congestion degree is a relative index of the current congestion status on the bottleneck link, calculated from the arrival intervals of a packet pair. The algorithm classifies cases more finely according to the congestion degree in order to estimate the available bandwidth more realistically. Network simulation results showed that the proposed method significantly resolves the fairness problem among competing flows in comparison with UDT.
https://doi.org/10.5573/ieie.2015.52.7.063
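
The baseline calculation described in the abstract (EWMA smoothing of the MTR, then available bandwidth as MTR minus the current rate) can be sketched as below. The smoothing factor `alpha`, the function names, and the clamp at zero are assumptions for illustration; the abstract does not state UDT's actual constants, and the proposed congestion-degree classification is not reproduced.

```python
def update_mtr(prev_mtr, sample, alpha=0.125):
    """EWMA update of the maximum transfer rate from a new packet-pair sample.

    alpha is a hypothetical smoothing factor, not a value taken from UDT.
    """
    return (1 - alpha) * prev_mtr + alpha * sample


def available_bandwidth(mtr, current_rate):
    """Available bandwidth as the difference between the MTR and the current
    transfer rate. Clamping at zero is an added assumption."""
    return max(0.0, mtr - current_rate)


if __name__ == "__main__":
    mtr = 100.0  # Mbps, illustrative starting estimate
    for sample in [200.0, 180.0, 90.0]:
        mtr = update_mtr(mtr, sample)
        print(round(mtr, 3), round(available_bandwidth(mtr, 80.0), 3))
```

Because each new sample moves the estimate by only a fraction `alpha`, a burst of distorted packet-pair samples under congestion skews the MTR slowly but persistently, which is the kind of estimation error the abstract associates with the fairness problem.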
