• Title/Summary/Keyword: Data Clustering


Traffic Attributes Correlation Mechanism based on Self-Organizing Maps for Real-Time Intrusion Detection (실시간 침입탐지를 위한 자기 조직화 지도(SOM)기반 트래픽 속성 상관관계 메커니즘)

  • Hwang, Kyoung-Ae;Oh, Ha-Young;Lim, Ji-Young;Chae, Ki-Joon;Nah, Jung-Chan
    • The KIPS Transactions:PartC / v.12C no.5 s.101 / pp.649-658 / 2005
  • Because network-based attacks cause extensive damage, it is very important to detect intrusions quickly at an early stage. However, intrusion detection based on supervised learning requires either preprocessing of enormous amounts of data or analysis by an administrator. This creates two difficulties in detecting abnormal traffic: the administrator's analysis may be incorrect, and real-time detection may be missed. In this paper, we propose a traffic attribute correlation analysis mechanism based on self-organizing maps (SOM) for real-time intrusion detection. The proposed mechanism has three steps. First, unsupervised learning builds map clusters composed of similar traffic. Second, each map cluster is labeled to divide the map into normal and abnormal traffic; in this step, a rule is created through correlation analysis with the SOM. Finally, the mechanism performs real-time detection and is updated gradually. In extensive experiments, the proposed mechanism, which combines unsupervised and supervised learning, showed better real-time intrusion detection performance than supervised learning alone.
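
The three steps in the abstract above (build the map without labels, label the map units, classify new traffic) can be illustrated with a small sketch. This is not the authors' implementation: the one-dimensional map, the Gaussian neighborhood, the majority-vote labeling, and every name and parameter (`train_som`, `label_units`, `classify`, `n_units`, `lr0`, `radius0`) are assumptions made for illustration, and the correlation-based labeling rule from the paper is not reproduced.

```python
import math
import random

def bmu(weights, x):
    """Return the index of the best-matching unit (closest weight vector)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Step 1 (unsupervised): train a 1-D SOM on unlabeled feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly towards 0
        lr = lr0 * frac                      # learning rate for this epoch
        radius = max(radius0 * frac, 0.1)    # neighborhood width for this epoch
        for x in rng.sample(data, len(data)):
            win = bmu(weights, x)
            for i, w in enumerate(weights):
                # Units close to the winner on the map are pulled harder.
                h = math.exp(-((i - win) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

def label_units(weights, samples, labels):
    """Step 2: label each unit by majority vote of the labeled samples it wins."""
    votes = {}
    for x, y in zip(samples, labels):
        votes.setdefault(bmu(weights, x), []).append(y)
    return {unit: max(set(ys), key=ys.count) for unit, ys in votes.items()}

def classify(weights, unit_labels, x, default="unknown"):
    """Step 3: a new sample takes the label of its best-matching unit."""
    return unit_labels.get(bmu(weights, x), default)
```

In use, `train_som` would be fed normalized traffic attribute vectors, `label_units` a small labeled subset, and `classify` each incoming record. The gradual update mentioned in the abstract would correspond to continuing the weight update on new traffic, which this sketch leaves out.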

Estimation of Drought Rainfall by Regional Frequency Analysis Using L and LH-Moments (II) - On the method of LH-moments - (L 및 LH-모멘트법과 지역빈도분석에 의한 가뭄우량의 추정 (II)- LH-모멘트법을 중심으로 -)

  • Lee, Soon-Hyuk;Yoon, Seong-Soo;Maeng, Sung-Jin;Ryoo, Kyong-Sik;Joo, Ho-Kil;Park, Jin-Seon
    • Journal of The Korean Society of Agricultural Engineers / v.46 no.5 / pp.27-39 / 2004
  • In the first part of this study, Korea, excluding Jeju and Ulreung islands, was divided by the K-means clustering method into five regions that are homogeneous in topographical and geographical terms. A total of 57 rain gauges were used for the regional frequency analysis with minimum rainfall series for consecutive durations. The Generalized Extreme Value distribution was confirmed as the optimal one among the applied distributions. Drought rainfalls for the return periods were estimated by at-site and regional frequency analysis using the L-moments method. The design drought rainfalls estimated by the regional frequency analysis were confirmed to be more appropriate than those from the at-site frequency analysis. In the second part of this study, the LH-moment ratio diagram and the Kolmogorov-Smirnov test were applied to the Gumbel (GUM), Generalized Extreme Value (GEV), Generalized Logistic (GLO) and Generalized Pareto (GPA) distributions to select the optimal probability distribution. Design drought rainfalls were estimated by both at-site and regional frequency analysis using LH-moments and the GEV distribution, which was confirmed as the optimal one among the applied distributions. Design rainfalls were estimated by at-site and regional frequency analysis using LH-moments, with both the observed data and data simulated by Monte Carlo techniques. Design drought rainfalls derived by regional frequency analysis using the L1, L2, L3 and L4-moments (LH-moments) method showed higher reliability than those from at-site frequency analysis in terms of RRMSE (Relative Root-Mean-Square Error), RBIAS (Relative Bias) and RR (Relative Reduction). Relative efficiencies were calculated to judge the relative merits of the design drought rainfalls derived by regional frequency analysis using L-moments (the first report of this study) and L1, L2, L3 and L4-moments (the second report). Consequently, design drought rainfalls derived by regional frequency analysis using L-moments were shown to be more reliable than those using LH-moments. Finally, design drought rainfalls for the five classified homogeneous regions and the various consecutive durations were derived by regional frequency analysis using L-moments, which this study confirmed as the more reliable method. Maps of the design drought rainfalls for the five regions and the various consecutive durations were produced using inverse distance weighting and Arc-View, a GIS tool.
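
As a rough illustration of the quantities behind the L-moments method named above, the sketch below computes the first three sample L-moments from the unbiased probability-weighted moments b0, b1 and b2. It covers ordinary L-moments only; the LH-moments (L1 to L4) discussed in the abstract are a generalization that is not implemented here. The function name and the example values are hypothetical.

```python
def sample_l_moments(values):
    """Return the first three sample L-moments (l1, l2, l3).

    Uses the unbiased probability-weighted moments b0, b1, b2 of the
    ordered sample: l1 = b0, l2 = 2*b1 - b0, l3 = 6*b2 - 6*b1 + b0.
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0

# Hypothetical series, used only to show the call.
l1, l2, l3 = sample_l_moments([1, 2, 3, 4, 5])
l_cv = l2 / l1        # L-coefficient of variation
l_skew = l3 / l2      # L-skewness, 0 for this symmetric sample
```

Ratios such as `l_cv` and `l_skew` are the kind of quantity plotted on a moment ratio diagram when comparing candidate distributions.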

Design and Implementation of Unified Index for Moving Objects Databases (이동체 데이타베이스를 위한 통합 색인의 설계 및 구현)

  • Park Jae-Kwan;An Kyung-Hwan;Jung Ji-Won;Hong Bong-Hee
    • Journal of KIISE:Databases / v.33 no.3 / pp.271-281 / 2006
  • Recently, the need for Location-Based Services (LBS) has increased with the development and widespread use of mobile devices (e.g., PDAs, cellular phones, laptop computers, GPS, and RFID). The core technology of LBS is a moving-objects database that stores and manages the positions of moving objects. To search for information quickly, the database needs an index that supports both real-time position tracking and the management of large numbers of updates. The index therefore requires a structure that operates in main memory for real-time processing, together with a technique for migrating part of the index from main memory to disk storage (or from disk storage to main memory) to manage large volumes of data. To satisfy these requirements, this paper proposes a unified index scheme that unifies main memory and disk, along with migration policies for moving part of the index from memory to disk when memory space is limited. A migration policy determines a group of nodes, called the migration subtree, and migrates the group as a unit to reduce disk I/O. This method takes advantage of bulk operations and dynamic clustering. The unified index is created by applying various migration policies. This paper measures and compares the performance of the migration policies through experimental evaluation.

Characteristics of Occupational Skin Disease Reported by Surveillance System (감시체계를 통하여 보고된 직업성 피부질환의 특성에 관한 연구 - 사업장, 특수건강진단기관, 피부과의사의 보고사례를 중심으로 기술 -)

  • Kim, Hyoung-Ok;Lee, Jun-Young;Jung, Ho-Keun;Ahn, Yeon-Soon
    • Journal of Preventive Medicine and Public Health / v.32 no.2 / pp.130-140 / 1999
  • Objectives: This study was carried out to estimate the magnitude of occupation-related skin disease and to identify its characteristics. Methods: We collected and analyzed cases of occupational skin disease reported from May to November 1998 by a surveillance system composed of doctors and nurses in 150 enterprises with a dispensary or attached hospital, physicians in 92 specific health examination institutes, and 150 dermatologists. Results: Among the members of the surveillance system, 66 enterprises, 47 specific health examination institutes and 55 dermatologists reported 571 cases of occupational skin disease in 512 workers. Excluding the 81 cases reported by dermatologists, we analyzed the 490 cases reported by enterprises and specific health examination institutes. Among the 490 cases, contact dermatitis was the most common (368 cases, 75.1%), followed by hyper- or hypopigmentation (36 cases, 7.3%). Among workers with occupational contact dermatitis, 281 (79.2%) were male and 74 (20.8%) were female. 165 workers (64.5%) had chronic skin disease with repeated cure and relapse. 245 workers (72.5%) answered that their coworkers had similar skin disease. 27 workers (8.7%) had been absent from work because of occupation-related contact dermatitis. By type of industry, occupational contact dermatitis was most common in automobile and trailer manufacturing (105 cases, 29.6%), followed by the manufacturing of image, sound and communication equipment (55 cases, 15.5%). Organic solvents (183 cases, 46.7%) were the material most commonly handled by workers with contact dermatitis, followed by various kinds of chemicals (59 cases, 15.1%). Conclusions: This is the first study to use a nationwide surveillance system to collect data on occupational skin disease. We found that many workers had occupation-related skin disease and that occupational skin disease tended to be chronic and to occur in clusters. Countermeasures should therefore be established to manage occupational skin disease, and the surveillance system should be operated continuously to identify trends in occupational skin disease.


Spatio-temporal Analysis of Freeway Emissions for Establishing Public Health Policies Based on Transportation (교통기반 공공보건 정책 수립을 위한 고속도로 차량배출가스 시공간 패턴분석)

  • LEE, Seol Young;JOO, Shinhye;YOUN, Seok Min;OH, Cheol
    • Journal of Korean Society of Transportation / v.34 no.5 / pp.377-393 / 2016
  • Vehicle emissions are known to be a critical factor with a negative impact on public health. In particular, particulate matter (PM) and NOx are closely related to respiratory diseases such as asthma. This study analyzed spatio-temporal patterns of PM and NOx generated by urban freeway traffic. MOVES, a well-known emission analysis tool provided by the US Environmental Protection Agency (EPA), was applied to estimate PM and NOx based on traffic volume and speed data obtained from the Seoul Outer Ring Expressway from January to June 2012. K-means clustering analysis was used to categorize the Level of Vehicle Emissions (LOVE) to support more systematic identification of the significance of emissions. Spatio-temporal analyses of the estimated emissions were then conducted by LOVE. Finally, this study proposed a set of strategies to reduce both PM and NOx and enhance public health based on the analysis results.
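
To make the K-means step above concrete, the following sketch groups one-dimensional emission estimates into k levels. It is a minimal illustration, not the study's procedure: the deterministic initialization from evenly spaced sorted values, the function name `kmeans_1d`, and the sample numbers are all assumptions.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values into k groups with Lloyd's algorithm.

    Centroids start at evenly spaced positions in the sorted data (an
    assumption made here so that the result is deterministic).
    Returns (centroids, labels), with labels in the order of `values`.
    """
    pts = sorted(values)
    n = len(pts)
    centroids = [pts[i * (n - 1) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

# Hypothetical emission estimates per road segment (arbitrary units).
emissions = [1, 2, 3, 10, 11, 12, 20, 21, 22]
centroids, levels = kmeans_1d(emissions, k=3)
```

Because the centroids are initialized in ascending order, a higher label corresponds to a higher emission level in this example, which is convenient when the clusters are read as ordered levels.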

Phylogenetic Relationship of Ligularia Species Based on RAPD and ITS Sequences Analyses (RAPD 및 ITS 염기서열 분석을 이용한 곰취 속(Ligularia) 식물의 유연관계 분석)

  • Ahn, Soon-Young;Cho, Kwang-Soo;Yoo, Ki-Oug;Suh, Jong-Taek
    • Horticultural Science & Technology / v.28 no.4 / pp.638-647 / 2010
  • The genetic relationships among 5 species of $Ligularia$ were investigated using RAPD (Randomly Amplified Polymorphic DNA) and ITS (Internal Transcribed Spacer) sequence analyses. In the RAPD analysis, sixty-three of 196 arbitrary primers showed polymorphism. The amplified fragments ranged from 0.2 to 1.6 kb in size. The dendrogram was constructed with the UPGMA clustering algorithm based on the genetic similarity of RAPD markers. A total of 16 accessions were classified into 5 major groups, each corresponding to a species, at a similarity coefficient value of 0.77. In the ITS sequence analysis, the size of ITS 1 varied from 248 to 256 bp, while ITS 2 varied from 220 to 222 bp. The 5.8S coding region was 164 bp in length. Forty-nine sites (10.2%) of the 478 nucleotides were variable, and the G+C content of the ITS region ranged from 49.4 to 53.5%. In the ITS tree, the five species of $Ligularia$ were monophyletic, and $L.$ $taquetii$ was the first branching within the clade. $Ligularia$ $intermedia$ formed a clade with $L.$ $fischeri$ var. $spiciformis$ (BS=79), and $L.$ $stenocephala$ and $L.$ $fischeri$ also formed a clade. The two data sets were congruent, except for the position of $L.$ $fischeri$ var. $spiciformis$.
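
The UPGMA step mentioned above can be sketched as average-linkage clustering on a distance matrix. The code below is a generic illustration under stated assumptions: it takes distances (for a similarity coefficient s one common choice is 1 - s, which the abstract does not specify), and the names `upgma`, `names` and `dist` as well as the toy matrix are hypothetical.

```python
def upgma(names, dist):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.

    Returns the merges in order as (label_a, label_b, distance), where
    distance is the average pairwise distance between the merged groups.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (label, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged group to every other group is the
        # size-weighted average of the two old distances.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        merges.append((label_a, label_b, height))
        next_id += 1
    return merges

# Toy distances between three hypothetical accessions.
merges = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
```

Cutting the resulting tree at a chosen distance (or, equivalently, a similarity coefficient) yields groups in the way the abstract describes for the 0.77 level.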

Analysis method of patent document to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

  • Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
  • Recently, imitation and infringement of intellectual property rights have been recognized as impediments to a nation's industrial growth. To prevent the huge losses caused by these impediments, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, predicting patent registration is a very important part of protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. First, the proposed method builds a database from the word frequencies of rejected patent documents. Comparing this database with other patent documents then yields a similarity value between each patent document and the database. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we found that patents similar to rejected patents have a strong possibility of rejection. U.S. patent documents on Bluetooth technology, solar battery technology and display technology were used as experimental data.
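
The comparison between a word-frequency database and a new document, as described above, might look like the sketch below. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the tokenizer, the function names `word_freq` and `cosine`, and the sample texts. The k-means step that selects the rejection criterion is not shown.

```python
import math
import re
from collections import Counter

def word_freq(text):
    """Count lowercase alphabetic tokens in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-frequency tables (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical texts: the "database" pools word counts of rejected documents.
rejected_db = word_freq("wireless antenna signal") + word_freq("antenna signal module")
candidate = word_freq("A wireless module with an improved antenna")
similarity = cosine(rejected_db, candidate)   # value between 0 and 1
```

A higher `similarity` means the candidate shares more of its vocabulary with previously rejected documents, which is the signal the abstract relates to the likelihood of rejection.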

A Study on Efficient Access Point Installation Based on Fixed Radio Wave Radius for WSN Configuration at Subway Station (지하철 역사 내 WSN 환경구축을 위한 고정 전파범위 기반의 효율적인 AP설치에 관한 연구)

  • An, Taeki;Ahn, Chihyung;Lee, Youngseok;Nam, Myungwoo
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.7 / pp.740-748 / 2016
  • IT and communication technologies have contributed significantly to passenger convenience and to the financial management of stations through task automation in the urban railway system. This development is founded on the large amounts of data from various sensors installed in railways, trains, and stations. In particular, the sensor networks installed in stations and trains have played an important role in the railway information system. AP performance is affected by the number of access points (APs) installed in the station and their locations. When installing APs in stations, the radio wave intensity of an AP at its position is considered to determine the number and positions of the APs. This paper proposes a method to estimate the number of APs and their positions based on the structure of the station, and implements a simulator to evaluate the performance of the proposed method. The simulator was applied to the AP installation decision at Busan Seomyeon station to evaluate its performance.

A Study for Development of Ratio Scale Measuring Pain Using Korean Pain Terms (통증어휘를 이용한 통증비율척도의 개발연구)

  • 이은옥;윤순녕;송미순
    • Journal of Korean Academy of Nursing / v.14 no.2 / pp.93-111 / 1984
  • The main purpose of this study is to develop a ratio scale measuring the level of pain using Korean pain terms. The specific purposes are to identify the degree of pain of each pain term in each subclass; to classify each subclass in terms of dimensions of pain; and to analyze which factors of the Korean pain ratio scale cluster together. One hundred and fifty-eight pain terms, originally identified as representative terms and their synonyms, were used for data collection. Fifty-eight nursing professors and sixty-one medical doctors who had been in contact with patients in pain were asked to rate the weight of each pain term on a visual analogue scale. Subclasses in which the ranks of pain terms were the same as the findings of two previous studies were 1) thermal pain, 2) cavity pressure, 3) single stimulating pain, 4) radiation pain, and 5) chemical pain. Subclasses in which the ranks of pain terms were confused were 1) incisive pressure and 2) cold pain. Subclasses in which one new pain term was added were 1) inflammatory-repeated pain, 2) punctuate pressure, 3) constrictive pressure, 4) fatigue-related pressure, and 5) suffering-related pain. Subclasses in which two new pain terms were added were 1) traction pressure, 2) peripheral nerve pain, 3) dull pain, 4) pulsation-related pain, 5) digestion-related pain, 6) tract pain, and 7) punishment-related pain. The subclass in which 3 new pain terms were included was fear-related pain. Rating scores of 5 words in 4 subclasses were significantly different between the normal group and the extreme group of subjects in terms of subjective rating. Only one word among 6 words was newly added to the scale. Rating scores of 12 words in 9 subclasses were significantly different between the doctor group and the nursing professor group. Among these 12 words, only 3 were newly added to the scale. For these 12 words, the mean scores of the nursing professors were always 7 to 16 points higher than those of the medical doctors. In the analysis of the subjects' judgement of the dimensions of pain terms, the subclasses of dull pain, cavity pressure, tract pain and cold pain were suggested for inclusion in the miscellaneous dimension. A factor analysis of the ratings given to 96 pain words, using principal components analysis without iteration and with varimax rotation and limiting the number of factors to 4, extracted the factors of severe pain (factor I), mild-moderate pain (factor II), causative pain (factor III) and temperature-related pain (factor IV) with factor loadings above 0.388. When the pain words were rearranged on the basis of factor loadings above 0.368, the number of factors decreased to only the first two factors. The maximum score of a pain word in factor II was 46.17 and the minimum score in factor I was 45.36. Further studies are needed to establish the validity, reliability, sensitivity and practicability of this ratio scale using patients with pain from various sources.


Submarket Identification in Property Markets: Focusing on a Hedonic Price Model Improvement (부동산 하부시장 구획: 헤도닉 모형의 개선을 중심으로)

  • Lee, Chang Ro;Eum, Young Seob;Park, Key Ho
    • Journal of the Korean Geographical Society / v.49 no.3 / pp.405-422 / 2014
  • Two important issues in hedonic modeling are specifying an accurate model and delineating submarkets. While the former has seen much improvement over recent decades, the latter has received relatively little attention. However, the accuracy of estimates from a hedonic model is necessarily reduced when the analysis does not adequately address market segmentation, which can capture the spatial scale of the price formation process in real estate. With an emphasis on improving hedonic model performance, this paper segments the real estate markets in Gangnam-gu and Jungrang-gu, which are respectively the most heterogeneous and the most homogeneous of the 25 autonomous districts of Seoul. First, we calculated variable coefficients from a mixed geographically weighted regression model (mixed GWR model) as input for clustering, since a coefficient from a hedonic model can be interpreted as the shadow price of an attribute constituting real estate. Next, we developed a spatially constrained, data-driven methodology that preserves spatial contiguity by using the SKATER algorithm, which is based on a minimum spanning tree. Finally, the performance of this method was verified by applying a multi-level model. We concluded that no submarket exists in Jungrang-gu and that five submarkets centered on arterial roads would be reasonable in Gangnam-gu. Urban infrastructure such as arterial roads has not previously been considered an important factor in delineating submarkets, but it was found empirically to play a key role in market segmentation.
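
The idea of keeping regions spatially contiguous by cutting a minimum spanning tree can be sketched as follows. This is a deliberately simplified stand-in for SKATER, not the algorithm itself: SKATER chooses which tree edge to remove by how much the cut reduces within-region variation, whereas this sketch simply removes the heaviest edges. The edge weights are assumed to be dissimilarities between neighboring areas (for example, between their coefficient vectors), and all names and numbers are hypothetical.

```python
import heapq

def minimum_spanning_tree(n, edges):
    """Prim's algorithm on a connected contiguity graph.

    edges maps (u, v) pairs of neighboring areas to a dissimilarity weight.
    Returns the tree as a list of (weight, u, v) tuples.
    """
    adj = {i: [] for i in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    visited = {0}
    heap = list(adj[0])
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((w, u, v))
        for edge in adj[v]:
            heapq.heappush(heap, edge)
    return tree

def split_regions(n, tree, k):
    """Remove the k-1 heaviest tree edges and return a region id per area."""
    kept = sorted(tree)[: len(tree) - (k - 1)]
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Four hypothetical areas in a row; areas 1 and 2 differ the most.
edges = {(0, 1): 0.1, (1, 2): 0.9, (2, 3): 0.2}
regions = split_regions(4, minimum_spanning_tree(4, edges), k=2)
```

Because every region is a connected piece of the tree, and the tree only uses edges between neighboring areas, each resulting region is contiguous by construction.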
