• Title/Summary/Keyword: K-Means clustering algorithm

Search Result 547, Processing Time 0.027 seconds

Design of Fuzzy Prediction System based on Dual Tuning using Enhanced Genetic Algorithms (강화된 유전알고리즘을 이용한 이중 동조 기반 퍼지 예측시스템 설계 및 응용)

  • Bang, Young-Keun;Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.59 no.1
    • /
    • pp.184-191
    • /
    • 2010
  • Many researchers have been considering genetic algorithms to system optimization problems. Especially, real-coded genetic algorithms are very effective techniques because they are simpler in coding procedures than binary-coded genetic algorithms and can reduce extra works that increase the length of chromosome for wide search space. Thus, this paper presents a fuzzy system design technique to improve the performance of the fuzzy system. The proposed system consists of two procedures. The primary tuning procedure coarsely tunes fuzzy sets of the system using the k-means clustering algorithm of which the structure is very simple, and then the secondary tuning procedure finely tunes the fuzzy sets using enhanced real-coded genetic algorithms based on the primary procedure. In addition, this paper constructs multiple fuzzy systems using a data preprocessing procedure which is contrived for reflecting various characteristics of nonlinear data. Finally, the proposed fuzzy system is applied to the field of time series prediction and the effectiveness of the proposed techniques are verified by simulations of typical time series examples.

Design of Multiple Model Fuzzy Predictors using Data Preprocessing and its Application (데이터 전처리를 이용한 다중 모델 퍼지 예측기의 설계 및 응용)

  • Bang, Young-Keun;Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.1
    • /
    • pp.173-180
    • /
    • 2009
  • It is difficult to predict non-stationary or chaotic time series which includes the drift and/or the non-linearity as well as uncertainty. To solve it, we propose an effective prediction method which adopts data preprocessing and multiple model TS fuzzy predictors combined with model selection mechanism. In data preprocessing procedure, the candidates of the optimal difference interval are determined based on the correlation analysis, and corresponding difference data sets are generated in order to use them as predictor input instead of the original ones because the difference data can stabilize the statistical characteristics of those time series and better reveals their implicit properties. Then, TS fuzzy predictors are constructed for multiple model bank, where k-means clustering algorithm is used for fuzzy partition of input space, and the least squares method is applied to parameter identification of fuzzy rules. Among the predictors in the model bank, the one which best minimizes the performance index is selected, and it is used for prediction thereafter. Finally, the error compensation procedure based on correlation analysis is added to improve the prediction accuracy. Some computer simulations are performed to verify the effectiveness of the proposed method.

Prediction System Design based on An Interval Type-2 Fuzzy Logic System using HCBKA (HCBKA를 이용한 Interval Type-2 퍼지 논리시스템 기반 예측 시스템 설계)

  • Bang, Young-Keun;Lee, Chul-Heui
    • Journal of Industrial Technology
    • /
    • v.30 no.A
    • /
    • pp.111-117
    • /
    • 2010
  • To improve the performance of the prediction system, the system should reflect well the uncertainty of nonlinear data. Thus, this paper presents multiple prediction systems based on Type-2 fuzzy sets. To construct each prediction system, an Interval Type-2 TSK Fuzzy Logic System and difference data were used, because, in general, it has been known that the Type-2 Fuzzy Logic System can deal with the uncertainty of nonlinear data better than the Type-1 Fuzzy Logic System, and the difference data can provide more steady information than that of original data. Also, to improve each rule base of the fuzzy prediction systems, the HCBKA (Hierarchical Correlation Based K-means clustering Algorithm) was applied because it can consider correlationship and statistical characteristics between data at a time. Subsequently, to alleviate complexity of the proposed prediction system, a system selection method was used. Finally, this paper analyzed and compared the performances between the Type-1 prediction system and the Interval Type-2 prediction system using simulations of three typical time series examples.

  • PDF

EDGE: An Enticing Deceptive-content GEnerator as Defensive Deception

  • Li, Huanruo;Guo, Yunfei;Huo, Shumin;Ding, Yuehang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1891-1908
    • /
    • 2021
  • Cyber deception defense mitigates Advanced Persistent Threats (APTs) with deploying deceptive entities, such as the Honeyfile. The Honeyfile distracts attackers from valuable digital documents and attracts unauthorized access by deliberately exposing fake content. The effectiveness of distraction and trap lies in the enticement of fake content. However, existing studies on the Honeyfile focus less on this perspective. In this work, we seek to improve the enticement of fake text content through enhancing its readability, indistinguishability, and believability. Hence, an enticing deceptive-content generator, EDGE, is presented. The EDGE is constructed with three steps: extracting key concepts with a semantics-aware K-means clustering algorithm, searching for candidate deceptive concepts within the Word2Vec model, and generating deceptive text content under the Integrated Readability Index (IR). Furthermore, the readability and believability performance analyses are undertaken. The experimental results show that EDGE generates indistinguishable deceptive text content without decreasing readability. In all, EDGE proves effective to generate enticing deceptive text content as deception defense against APTs.

Method for Estimating Intramuscular Fat Percentage of Hanwoo(Korean Traditional Cattle) Using Convolutional Neural Networks in Ultrasound Images

  • Kim, Sang Hyun
    • International journal of advanced smart convergence
    • /
    • v.10 no.1
    • /
    • pp.105-116
    • /
    • 2021
  • In order to preserve the seeds of excellent Hanwoo(Korean traditional cattle) and secure quality competitiveness in the infinite competition with foreign imported beef, production of high-quality Hanwoo beef is absolutely necessary. %IMF (Intramuscular Fat Percentage) is one of the most important factors in evaluating the value of high-quality meat, although standards vary according to food culture and industrial conditions by country. Therefore, it is required to develop a %IMF estimation algorithm suitable for Hanwoo. In this study, we proposed a method of estimating %IMF of Hanwoo using CNN in ultrasound images. First, the proposed method classified the chemically measured %IMF into 10 classes using k-means clustering method to apply CNN. Next, ROI images were obtained at regular intervals from each ultrasound image and used for CNN training and estimation. The proposed CNN model is composed of three stages of convolution layer and fully connected layer. As a result of the experiment, it was confirmed that the %IMF of Hanwoo was estimated with an accuracy of 98.2%. The correlation coefficient between the estimated %IMF and the real %IMF by the proposed method is 0.97, which is about 10% better than the 0.88 of the previous method.

A Study on Heavy Rainfall Guidance Realized with the Aid of Neuro-Fuzzy and SVR Algorithm Using AWS Data (AWS자료 기반 SVR과 뉴로-퍼지 알고리즘 구현 호우주의보 가이던스 연구)

  • Kim, Hyun-Myung;Oh, Sung-Kwun;Kim, Yong-Hyuk;Lee, Yong-Hee
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.63 no.4
    • /
    • pp.526-533
    • /
    • 2014
  • In this study, we introduce design methodology to develop a guidance for issuing heavy rainfall warning by using both RBFNNs(Radial basis function neural networks) and SVR(Support vector regression) model, and then carry out the comparative studies between two pattern classifiers. Individual classifiers are designed as architecture realized with the aid of optimization and pre-processing algorithm. Because the predictive performance of the existing heavy rainfall forecast system is commonly affected from diverse processing techniques of meteorological data, under-sampling method as the pre-processing method of input data is used, and also data discretization and feature extraction method for SVR and FCM clustering and PSO method for RBFNNs are exploited respectively. The observed data, AWS(Automatic weather wtation), supplied from KMA(korea meteorological administration), is used for training and testing of the proposed classifiers. The proposed classifiers offer the related information to issue a heavy rain warning in advance before 1 to 3 hours by using the selected meteorological data and the cumulated precipitation amount accumulated for 1 to 12 hours from AWS data. For performance evaluation of each classifier, ETS(Equitable Threat Score) method is used as standard verification method for predictive ability. Through the comparative studies of two classifiers, neuro-fuzzy method is effectively used for improved performance and to show stable predictive result of guidance to issue heavy rainfall warning.

A Study on the Cerber-Type Ransomware Detection Model Using Opcode and API Frequency and Correlation Coefficient (Opcode와 API의 빈도수와 상관계수를 활용한 Cerber형 랜섬웨어 탐지모델에 관한 연구)

  • Lee, Gye-Hyeok;Hwang, Min-Chae;Hyun, Dong-Yeop;Ku, Young-In;Yoo, Dong-Young
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.10
    • /
    • pp.363-372
    • /
    • 2022
  • Since the recent COVID-19 Pandemic, the ransomware fandom has intensified along with the expansion of remote work. Currently, anti-virus vaccine companies are trying to respond to ransomware, but traditional file signature-based static analysis can be neutralized in the face of diversification, obfuscation, variants, or the emergence of new ransomware. Various studies are being conducted for such ransomware detection, and detection studies using signature-based static analysis and behavior-based dynamic analysis can be seen as the main research type at present. In this paper, the frequency of ".text Section" Opcode and the Native API used in practice was extracted, and the association between feature information selected using K-means Clustering algorithm, Cosine Similarity, and Pearson correlation coefficient was analyzed. In addition, Through experiments to classify and detect worms among other malware types and Cerber-type ransomware, it was verified that the selected feature information was specialized in detecting specific ransomware (Cerber). As a result of combining the finally selected feature information through the above verification and applying it to machine learning and performing hyper parameter optimization, the detection rate was up to 93.3%.

Scalable Cluster Overlay Source Routing Protocol (확장성을 갖는 클러스터 기반의 라우팅 프로토콜)

  • Jang, Kwang-Soo;Yang, Hyo-Sik
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.3
    • /
    • pp.83-89
    • /
    • 2010
  • Scalable routing is one of the key challenges in designing and operating large scale MANETs. Performance of routing protocols proposed so far is only guaranteed under various limitation, i.e., dependent of the number of nodes in the network or needs the location information of destination node. Due to the dependency to the number of nodes in the network, as the number of nodes increases the performance of previous routing protocols degrade dramatically. We propose Cluster Overlay Dynamic Source Routing (CODSR) protocol. We conduct performance analysis by means of computer simulation under various conditions - diameter scaling and density scaling. Developed algorithm outperforms the DSR algorithm, e.g., more than 90% improvement as for the normalized routing load. Operation of CODSR is very simple and we show that the message and time complexity of CODSR is independent of the number of nodes in the network which makes CODSR highly scalable.

Analysis on the Efficiency Change in Electric Vehicle Charging Stations Using Multi-Period Data Envelopment Analysis (다기간 자료포락분석을 이용한 전기차 충전소 효율성 변화 분석)

  • Son, Dong-Hoon;Gang, Yeong-Su;Kim, Hwa-Joong
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.44 no.2
    • /
    • pp.1-14
    • /
    • 2021
  • It is highly challenging to measure the efficiency of electric vehicle charging stations (EVCSs) because factors affecting operational characteristics of EVCSs are time-varying in practice. For the efficiency measurement, environmental factors around the EVCSs can be considered because such factors affect charging behaviors of electric vehicle drivers, resulting in variations of accessibility and attractiveness for the EVCSs. Considering dynamics of the factors, this paper examines the technical efficiency of 622 electric vehicle charging stations in Seoul using data envelopment analysis (DEA). The DEA is formulated as a multi-period output-oriented constant return to scale model. Five inputs including floating population, number of nearby EVCSs, average distance of nearby EVCSs, traffic volume and traffic congestion are considered and the charging frequency of EVCSs is used as the output. The result of efficiency measurement shows that not many EVCSs has most of charging demand at certain periods of time, while the others are facing with anemic charging demand. Tobit regression analyses show that the traffic congestion negatively affects the efficiency of EVCSs, while the traffic volume and the number of nearby EVCSs are positive factors improving the efficiency around EVCSs. We draw some notable characteristics of efficient EVCSs by comparing means of the inputs related to the groups classified by K-means clustering algorithm. This analysis presents that efficient EVCSs can be generally characterized with the high number of nearby EVCSs and low level of the traffic congestion.

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.85-107
    • /
    • 2019
  • Online consumers browse products belonging to a particular product line or brand for purchase, or simply leave a wide range of navigation without making purchase. The research on the behavior and purchase of online consumers has been steadily progressed, and related services and applications based on behavior data of consumers have been developed in practice. In recent years, customization strategies and recommendation systems of consumers have been utilized due to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even in such an attempt, it is very unlikely that online consumers will actually be able to visit the website and switch to the purchase stage. This is because online consumers do not just visit the website to purchase products but use and browse the websites differently according to their shopping motives and purposes. Therefore, it is important to analyze various types of visits as well as visits to purchase, which is important for understanding the behaviors of online consumers. In this study, we explored the clustering analysis of session based on click stream data of e-commerce company in order to explain diversity and complexity of search behavior of online consumers and typified search behavior. For the analysis, we converted data points of more than 8 million pages units into visit units' sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page view, duration, search diversity, and page type concentration were extracted for clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining the clustering performance similar to that of the clustering algorithm K-means. The most optimized number of clusters was derived from four, and the differences in session unit characteristics and purchasing rates were identified for each cluster. The online consumer visits the website several times and learns about the product and decides the purchase. In order to analyze the purchasing process over several visits of the online consumer, we constructed the visiting sequence data of the consumer based on the navigation patterns in the web site derived clustering analysis. The visit sequence data includes a series of visiting sequences until one purchase is made, and the items constituting one sequence become cluster labels derived from the foregoing. We have separately established a sequence data for consumers who have made purchases and data on visits for consumers who have only explored products without making purchases during the same period of time. And then sequential pattern mining was applied to extract frequent patterns from each sequence data. The minimum support is set to 10%, and frequent patterns consist of a sequence of cluster labels. While there are common derived patterns in both sequence data, there are also frequent patterns derived only from one side of sequence data. We found that the consumers who made purchases through the comparative analysis of the extracted frequent patterns showed the visiting pattern to decide to purchase the product repeatedly while searching for the specific product. The implication of this study is that we analyze the search type of online consumers by using large - scale click stream data and analyze the patterns of them to explain the behavior of purchasing process with data-driven point. Most studies that typology of online consumers have focused on the characteristics of the type and what factors are key in distinguishing that type. In this study, we carried out an analysis to type the behavior of online consumers, and further analyzed what order the types could be organized into one another and become a series of search patterns. In addition, online retailers will be able to try to improve their purchasing conversion through marketing strategies and recommendations for various types of visit and will be able to evaluate the effect of the strategy through changes in consumers' visit patterns.