• Title/Summary/Keyword: Clustering behavior

A Study on the Cerber-Type Ransomware Detection Model Using Opcode and API Frequency and Correlation Coefficient (Opcode와 API의 빈도수와 상관계수를 활용한 Cerber형 랜섬웨어 탐지모델에 관한 연구)

  • Lee, Gye-Hyeok;Hwang, Min-Chae;Hyun, Dong-Yeop;Ku, Young-In;Yoo, Dong-Young
    • KIPS Transactions on Computer and Communication Systems / v.11 no.10 / pp.363-372 / 2022
  • Since the COVID-19 pandemic, the ransomware threat has intensified along with the expansion of remote work. Antivirus vendors are working to respond, but traditional static analysis based on file signatures can be defeated by diversification, obfuscation, variants, and newly emerging ransomware. Current ransomware detection research falls mainly into two types: signature-based static analysis and behavior-based dynamic analysis. In this paper, the frequencies of opcodes in the ".text" section and of the Native APIs actually used were extracted, and the association between the selected feature information was analyzed using the K-means clustering algorithm, cosine similarity, and the Pearson correlation coefficient. In addition, experiments classifying and detecting worms, among other malware types, and Cerber-type ransomware verified that the selected features are specialized for detecting a specific ransomware family (Cerber). Combining the finally selected features, applying them to machine learning, and performing hyperparameter optimization yielded a detection rate of up to 93.3%.
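The two similarity measures named in the abstract above can be illustrated with a minimal sketch. The opcode names and counts below are hypothetical and are not taken from the paper; the functions only show how cosine similarity and the Pearson correlation coefficient compare two frequency vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical opcode counts in the order (mov, push, call, xor).
sample_a = [120, 45, 30, 5]
sample_b = [240, 90, 60, 10]   # same profile as sample_a, twice the size
sample_c = [10, 20, 80, 150]   # a very different profile

print(cosine_similarity(sample_a, sample_b), pearson(sample_a, sample_b))
print(cosine_similarity(sample_a, sample_c), pearson(sample_a, sample_c))
```

Both measures ignore the overall scale of the vectors, so two samples with the same opcode profile but different sizes score as highly similar.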

[Retracted]Analysis of Slope Safety by Tension Wire Data ([논문철회]지표변위계를 활용한 비탈면 안정성 예측)

  • Lee, Seokyoung;Jang, Seoyong;Kim, Taesoo;Han, Heuisoo
    • Journal of the Korean GEO-environmental Society / v.16 no.4 / pp.5-12 / 2015
  • Civil engineers have collected large amounts of slope monitoring data for engineering projects exposed to slide hazards. However, how to handle these data and extract useful information about slope behavior has not been widely discussed. Recently, several researchers have installed real-time monitoring systems to cope with slope failure, but they have focused mainly on hardware installation. This study therefore shows how the measured data can be grouped and related to one another. The basic idea of the analysis method comes from clustering, which is a part of data mining analysis. Based on the classification of time series data, the authors suggest three mathematical data analysis methods: the average index of different displacement ($AD_{i,j}$), the difference of average relative displacement ($\overline{RD}_{i,j}$), and the coordinate system of average and relative displacement ($\overline{RD}$, AD). These methods are based on statistical methods and on the failure mechanism of slopes. They revealed clustering relationships among similar parts of the slope that share the same sliding mechanism.

A study on the weight control behavior according to cluster types of the motivation to use social media among university students in the Jeonbuk area (전북지역 대학생의 소셜미디어 이용동기 유형에 따른 체중조절 행태 연구)

  • Jiyoon Lee;Sung Suk Chung;Jeong Ok Rho
    • Journal of Nutrition and Health / v.56 no.2 / pp.203-216 / 2023
  • Purpose: This study examines weight control behavior according to university students' motives for using social media. Methods: The participants were 447 university students in the Jeonbuk area. Collected data were analyzed using factor analysis, cluster analysis, analysis of variance, and χ² tests with SPSS v. 26.0. Considering the motives for using social media, we investigated the usage of social media, dietary behavior related to social media, and weight control behavior. Results: Using the K-clustering method, the motives for using social media were categorized into three clusters: cluster 1 was the interest-centered group, cluster 2 was the multipurpose information-seeking group, and cluster 3 was the relationship-centered group. Among the various social media sites, YouTube (86.8%), Instagram (76.1%), and Facebook (61.1%) were the most visited by the subjects. The dietary behavior related to social media was significantly higher in cluster 2 than in clusters 1 and 3 (p < 0.001). Clusters 1 and 2 showed significantly higher dissatisfaction with their weight (p < 0.05) and consequent interest in weight control than cluster 3 (p < 0.001). Cluster 2 used weight control-related information from social media significantly more than the other clusters (p < 0.05). Weight control experiences in clusters 1 and 2 were significantly higher than in cluster 3 (p < 0.001). Conclusion: Differences in dietary behavior related to social media and in weight control behavior were observed between cluster types of motivation to use social media. Based on the usage motives of university students and their behaviors, we propose that educational programs for weight control be conducted using social media.

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su;Hong, Seung Woo
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.111-126 / 2020
  • Most industries have recently become aware of the importance of customer lifetime value as they face a competitive environment. As a result, preventing customer churn is becoming a more important business issue than acquiring new customers. This is because retaining existing customers is far more economical than acquiring new ones; in fact, the acquisition cost of a new customer is known to be five to six times higher than the cost of retaining an existing one. Companies that effectively prevent churn and improve retention rates are also known to benefit not only from higher profitability but also from a better brand image through improved customer satisfaction. Customer churn prediction, which had been conducted as a sub-area of CRM research, has recently become more important as a big data-based performance marketing theme thanks to the development of business machine learning technology. Until now, research on customer churn prediction has been active in sectors such as mobile telecommunications, finance, distribution, and games, which are highly competitive and where churn management is urgent. These studies focused on improving the performance of the churn prediction model itself, for example by comparing the performance of various models, exploring features that are effective in forecasting churn, or developing new ensemble techniques. They were limited in practical use because most of them treated the entire customer base as a single group when developing a predictive model. In short, the main purpose of existing research was to improve the predictive model itself, and there has been relatively little research on improving the overall customer churn prediction process. 
In fact, customers of a business have different behavioral characteristics due to heterogeneous transaction patterns, and their churn rates differ accordingly, so it is unreasonable to treat all customers as a single group. It is therefore desirable to segment customers according to classification criteria such as loyalty and to operate an appropriate churn prediction model for each segment in order to predict churn effectively in heterogeneous industries. Some studies have indeed subdivided customers using clustering techniques and applied a churn prediction model to each customer group. Although this process can produce better predictions than a single prediction model for the entire customer population, there is still room for improvement, because clustering is a mechanical, exploratory grouping technique that calculates distances based on inputs and does not reflect an organization's strategic intent, such as loyalty. Assuming that successful churn management is better achieved by improving the overall process than by improving the performance of the model itself, this study proposes a segment-based customer churn prediction process based on two-dimensional customer loyalty (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation). CCP/2DL is a series of churn prediction steps that segments customers along two loyalty dimensions, quantitative and qualitative, performs a secondary grouping of the customer segments according to churn patterns, and then independently applies heterogeneous churn prediction models to each churn pattern group. To assess the relative performance of the proposed process, it was compared with the two most commonly applied processes, the General churn prediction process and the Clustering-based churn prediction process. 
The General churn prediction process used in this study is the most common churn prediction method: a single machine learning model is applied to the whole set of customers to be predicted, treated as one group. The Clustering-based churn prediction process first segments customers using clustering techniques and then builds a churn prediction model for each group. In an experiment conducted in cooperation with a global NGO, the proposed CCP/2DL outperformed the other churn prediction methodologies. The proposed process is not only effective in predicting churn but can also serve as a strategic basis for obtaining a variety of customer insights and carrying out other related performance marketing activities.
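The general idea of segmenting customers first and then fitting one predictor per segment can be sketched as follows. This is only an illustration under loud assumptions: the field names, the 0.5 cut-offs, and the per-segment "model" (a plain churn rate) are hypothetical stand-ins, and the sketch omits the secondary grouping by churn pattern that the abstract describes for CCP/2DL.

```python
from collections import defaultdict

def segment(customer, quant_cut=0.5, qual_cut=0.5):
    """Assign a customer to one of four segments from two loyalty scores.
    The scores and cut-offs are hypothetical, not taken from the paper."""
    quant = "high" if customer["quantitative"] >= quant_cut else "low"
    qual = "high" if customer["qualitative"] >= qual_cut else "low"
    return (quant, qual)

def fit_segment_rates(customers):
    """Fit a trivial per-segment predictor: the observed churn rate."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for c in customers:
        key = segment(c)
        totals[key] += 1
        churned[key] += c["churned"]
    return {key: churned[key] / totals[key] for key in totals}

def predict(customer, rates, default=0.5):
    """Predicted churn probability; unseen segments fall back to a default."""
    return rates.get(segment(customer), default)

# Made-up training data for illustration only.
customers = [
    {"quantitative": 0.9, "qualitative": 0.8, "churned": 0},
    {"quantitative": 0.8, "qualitative": 0.9, "churned": 0},
    {"quantitative": 0.2, "qualitative": 0.1, "churned": 1},
    {"quantitative": 0.3, "qualitative": 0.2, "churned": 1},
    {"quantitative": 0.7, "qualitative": 0.2, "churned": 1},
    {"quantitative": 0.9, "qualitative": 0.3, "churned": 0},
]
rates = fit_segment_rates(customers)
print(predict({"quantitative": 0.85, "qualitative": 0.1}, rates))
```

In practice each segment would get its own trained classifier; the churn-rate baseline here only keeps the sketch self-contained.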

Managerial Implication of Trails in the Taebaeksan National Park Derived from the Analysis of Visitors Behaviors Using Automatic Visitor Counter Data (탐방객 자동 계수기 데이터를 활용한 태백산국립공원 탐방로 탐방 행태 분석 및 관리 방안 제언)

  • Sung, Chan Yong;Cho, Woo;Kim, Jong-Sub
    • Korean Journal of Environment and Ecology / v.34 no.5 / pp.446-453 / 2020
  • This study built models to predict the daily number of visitors to 18 trails in the Taebaeksan National Park using automatic visitor counter data, analyzed the factors affecting the daily number of visitors to each trail, and classified the trails by visitor behavior. The multiple regression models for the 18 trails indicated that events, such as the National Foundation Day celebration or the Snow Festival, affected the number of visitors on all 18 trails and were the most critical factor determining the daily number of visitors to the park. Long holidays of three days or more and other national holidays also affected the daily number of visitors. Precipitation had a negative impact on the number of visitors on trails where most visitors came for sightseeing or camping rather than hiking, whereas it had no significant impact on trails where many visitors came to hike. This indicates that visitors who intended to hike went ahead even in poor weather. Temperature had a positive effect on the number of visitors who came to hike but a negative effect on the number of visitors to the trails near Danggol Plaza, where the Snow Festival is held each winter, suggesting that the Snow Festival is the decisive factor for trail management. K-means clustering showed that the 18 trails of the Taebaeksan National Park could be classified into three types: those affected by the Snow Festival (type 1), those that have sightseeing points and so were visited mostly by non-hikers (type 2), and those visited mostly by hikers (type 3). Since visitor behaviors and illegal actions differ according to the trail type, the results of this study can be used to prepare a trail management plan based on trail characteristics.
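The K-means step mentioned in the abstract above can be sketched with a small pure-Python version of Lloyd's algorithm. The two features per trail (share of visits during the festival period, sensitivity to rain) and all values are hypothetical; the abstract does not state which variables were clustered.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def kmeans(points, centroids, max_iter=50):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

# Hypothetical (festival_share, rain_sensitivity) values for six trails.
trails = [(0.9, 0.2), (0.8, 0.3), (0.1, 0.9), (0.2, 0.8), (0.1, 0.1), (0.2, 0.2)]
labels, centers = kmeans(trails, [trails[0], trails[2], trails[4]])
print(labels)
```

Supplying the initial centroids explicitly keeps the run deterministic; a real analysis would typically use several random restarts.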

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.227-240 / 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies, which in turn serve as a basis for further development. As the importance of patents increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former lacks the ability to analyze technology information in detail, while the latter is unable to identify the relationships between technologies. To overcome the limitations of both, this study blends the two methods and proposes a keyword network-based analysis methodology. We collected significant technology information from each patent related to Light Emitting Diodes (LED) through text mining, built a keyword network, and then carried out a community network analysis on the collected data. The results of the analysis are as follows. First, the patent keyword network showed very low density and an exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, density varies with the size of the network; increasing the size of a network generally decreases its density. The clustering coefficient is a network-level measure that captures the tendency of nodes to cluster in densely interconnected modules. This measure reveals the small-world property, in which a network can be highly clustered while still having a small average distance between nodes despite a large number of nodes. 
Therefore, the low density of the patent keyword network means that its nodes are connected sparsely, while the high clustering coefficient shows that nodes in the network are closely connected to one another. Second, the cumulative degree distribution of the patent keyword network, like that of other knowledge networks such as citation or collaboration networks, followed a clear power-law distribution. A well-known mechanism behind this pattern is preferential attachment, whereby a node with more links is more likely to attain new links as the network evolves. Unlike normal distributions, the power-law distribution does not have a representative scale. This means that one cannot pick a representative or average value, because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of a heavy-tailed, scale-free distribution is the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of a power-law distribution suggests that the preferential attachment mechanism explains the origin of heavy-tailed distributions in a wide range of growing patent keyword networks. Third, we found that, among keywords that flowed into a particular field, the vast majority of keywords with new links join existing keywords in the associated community to form the concept of a new patent. This finding held for both the short-term (4-year) and long-term (10-year) analyses. 
Furthermore, using the keyword combination information that was derived from the methodology suggested by our study enables one to forecast which concepts combine to form a new patent dimension and refer to those concepts when developing a new patent.
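The two network measures defined in the abstract above can be computed directly from an adjacency structure. The sketch below uses a tiny hypothetical keyword network (two triangles joined by one link), not the paper's LED patent data, and takes the average local clustering coefficient as one common network-level definition; the abstract does not say which variant the authors used.

```python
from itertools import combinations

def density(adj):
    """Number of ties divided by the number of all possible ties (undirected)."""
    n = len(adj)
    ties = sum(len(nbrs) for nbrs in adj.values()) / 2
    return ties / (n * (n - 1) / 2)

def average_clustering(adj):
    """Mean over nodes of the fraction of neighbor pairs that are linked.
    Nodes with fewer than two neighbors contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj)

# Hypothetical undirected keyword network: two triangles joined by one tie.
adj = {
    "led": {"chip", "phosphor"},
    "chip": {"led", "phosphor"},
    "phosphor": {"led", "chip", "lens"},
    "lens": {"phosphor", "housing", "heatsink"},
    "housing": {"lens", "heatsink"},
    "heatsink": {"lens", "housing"},
}
print(density(adj), average_clustering(adj))
```

Here 7 of the 15 possible ties exist, yet most nodes sit inside a fully connected triangle, which is the kind of contrast between density and clustering that the abstract reports at much larger scale.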

Cluster Cell Separation Algorithm for Automated Cell Tracking (자동 세포 추적을 위한 클러스터 세포 분리 알고리즘)

  • Cho, Mi Gyung;Shim, Jaesool
    • Transactions of the Korean Society of Mechanical Engineers B / v.37 no.3 / pp.259-266 / 2013
  • An automated cell tracking system automatically analyzes and tracks changes in cell behavior in time-lapse images of a cell culture acquired with a microscope. Clustering here refers to the partial overlapping of neighboring cells as the cells change. Separating clusters into individual cells is very important for cell tracking. In this study, we propose an algorithm that separates clusters using ellipse fitting based on a direct least squares method. We extracted the contours of clusters, divided them into line segments, and then fitted an ellipse to each line segment using the direct least squares method. All of the fitted ellipses could be used to separate their corresponding clusters. In experiments, our algorithm separated clusters with average precisions of 91% for two overlapping cells, 84% for three overlapping cells, and about 73% for four overlapping cells.

Discovery of Travel Patterns in Seoul Metropolitan Subway Using Big Data of Smart Card Transaction Systems (스마트카드 빅데이터를 이용한 서울시 지하철 이동패턴 분석)

  • Kim, Kwanho;Oh, Kyuhyup;Lee, Yeong Kyu;Jung, Jae-Yoon
    • The Journal of Society for e-Business Studies / v.18 no.3 / pp.211-222 / 2013
  • Discovering zones, which are sets of geographically adjacent regions, is essential for sophisticated urban development and for improving people's movement. Although some studies focus separately on movements between particular regions or on zone discovery, they are limited in their ability to explain people's movements from a wider viewpoint. In this research, we therefore propose a clustering-based analysis method that discovers movement patterns, consisting of zones and their relations, from big data of smart card transaction systems. Moreover, the effectiveness of the discovered movement patterns is quantitatively evaluated using the proposed metrics. Using a real-world dataset obtained from the Seoul metropolitan subway network, we investigate and visualize hidden movement patterns in Seoul.

Improvement of Component Design using Component Metrics (컴포넌트 메트릭스를 이용한 컴포넌트 설계 재정비)

  • 고병선;박재년
    • Journal of KIISE:Software and Applications / v.31 no.8 / pp.980-990 / 2004
  • The component-based development methodology aims at a high level of abstraction and at reusability with components larger than classes. Measuring components is indispensable for improving the quality of both the component-based system and the individual component, and the measurement results should be fed back into the development process. It is therefore necessary to study component metrics that can be applied in the component analysis and design stage. In this paper, we propose component cohesion, coupling, and independence metrics that reflect the information extracted during component analysis and design. The proposed metrics are based on similarity information about the behavior patterns of the operations that provide the component's service. We also propose a redesign process for improving component design. This process uses clustering techniques to make each component an independent functional unit with low complexity and easy maintenance. Finally, we show that the component design model can be improved by the component metrics and the component redesign process.

Analysis on Characteristics of Sediment Produce by Landslide in a Basin 2. Rainfall Event-based Analysis (유역 내에서의 산사태에 의한 토사발생특성 분석 2. 강우사상별 분석)

  • Yoo, Chul-Sang;Kim, Kee-Wook
    • Journal of the Korean Society of Hazard Mitigation / v.10 no.3 / pp.147-154 / 2010
  • This study analyzed the characteristics of sediment production by rainfall-triggered landslides. A one-dimensional unsaturated groundwater model and infinite slope stability analysis were used to estimate the behavior of soil moisture and slope stability under rainfall, respectively. The slope stability analysis considered soil depth and the characteristics of trees. The analysis of sediment production by rainfall event showed that sediment production by landslides was mainly attributable to rainfall intensity and its temporal clustering. The analysis of extreme events showed that the remaining rainfall amount of typhoon 'Rusa' was much larger than that of the other extreme events, and that this remaining rainfall contributed to sediment transportation. Additionally, only a small number of extreme events were found to cause most of the sediment production in the basin.
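The infinite slope stability analysis mentioned in the abstract above is commonly expressed as a factor of safety, the ratio of resisting to driving shear stress on a plane parallel to the slope. The sketch below uses the textbook form with pore water pressure; the parameter values are hypothetical, and the tree effects the study considered (for example root cohesion or surcharge) are left out.

```python
import math

def infinite_slope_fs(cohesion, unit_weight, depth, slope_deg,
                      friction_deg, pore_pressure=0.0):
    """Factor of safety of an infinite slope.

    cohesion      : effective cohesion c' (kPa)
    unit_weight   : soil unit weight (kN/m^3)
    depth         : vertical depth of the failure plane (m)
    slope_deg     : slope angle (degrees)
    friction_deg  : effective friction angle (degrees)
    pore_pressure : pore water pressure on the failure plane (kPa)
    """
    beta = math.radians(slope_deg)
    phi = math.radians(friction_deg)
    normal_stress = unit_weight * depth * math.cos(beta) ** 2
    shear_stress = unit_weight * depth * math.sin(beta) * math.cos(beta)
    resisting = cohesion + (normal_stress - pore_pressure) * math.tan(phi)
    return resisting / shear_stress

# Hypothetical soil: the same slope before and after rainfall raises pore pressure.
fs_dry = infinite_slope_fs(5.0, 18.0, 2.0, 30.0, 30.0)
fs_wet = infinite_slope_fs(5.0, 18.0, 2.0, 30.0, 30.0, pore_pressure=10.0)
print(fs_dry, fs_wet)
```

A value below 1 indicates that the driving stress exceeds the resistance, which is how rising soil moisture during a rainfall event can tip a previously stable slope into failure.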