• Title/Summary/Keyword: clustering method

Managing the Reverse Extrapolation Model of Radar Threats Based Upon an Incremental Machine Learning Technique (점진적 기계학습 기반의 레이더 위협체 역추정 모델 생성 및 갱신)

  • Kim, Chulpyo;Noh, Sanguk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.4 / pp.29-39 / 2017
  • Various electronic warfare situations drive the need for an integrated electronic warfare simulator that can model and simulate radar threats. In this paper, we analyze the components of a simulation system that reverse-models radar threats emitting electromagnetic signals, based on the parameters of the electronic information, and we propose a method to incrementally maintain the reverse extrapolation model of RF threats. In the experiments, we evaluate the effectiveness of the incremental model update and assess the method for integrating reverse extrapolation models. The individual models of RF threats are constructed using a decision tree, a naive Bayesian classifier, an artificial neural network, and clustering algorithms based on Euclidean distance and cosine similarity, respectively. Experimental results show that the accuracy of the reverse extrapolation models improves as the size of the threat sample increases. In addition, we use voting, weighted voting, and the Dempster-Shafer algorithm to integrate the results of the five different models of RF threats. The final reverse extrapolation decision obtained through the Dempster-Shafer algorithm shows the best accuracy.
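
The abstract names the Dempster-Shafer algorithm as the best way to fuse the individual models. As a minimal sketch of Dempster's rule of combination only (the threat labels and mass values below are hypothetical, and the paper's actual mass assignment is not given in the abstract), two classifier outputs could be combined like this:

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1].
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Hypothetical outputs of two threat models over threat types "A" and "B".
A, B = frozenset({"A"}), frozenset({"B"})
EITHER = A | B  # mass left on "A or B" expresses uncertainty
tree = {A: 0.6, B: 0.1, EITHER: 0.3}
bayes = {A: 0.5, B: 0.2, EITHER: 0.3}

fused = combine_dempster(tree, bayes)
best = max(fused, key=fused.get)
```

Unlike plain voting, the rule keeps an explicit share of belief on the undecided set, which shrinks as agreeing sources are combined.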

Artificial Neural Network with Firefly Algorithm-Based Collaborative Spectrum Sensing in Cognitive Radio Networks

  • Velmurugan., S;P. Ezhumalai;E.A. Mary Anita
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1951-1975 / 2023
  • Recent advances in Cognitive Radio Networks (CRN) have made them a critical instrument for overcoming spectrum limits and meeting stringent future wireless communication requirements. Because spectrum sensing is an essential part of CRNs, collaborative spectrum sensing is presented for efficient channel selection. This study presents a cooperative spectrum sensing (CSS) model built on the Firefly Algorithm (FA) and artificial neural networks (ANN). The system uses user grouping strategies to improve detection performance substantially while lowering collaboration costs. Cooperative sensing is applied only after cognitive radio users have been correctly identified using energy data samples and an ANN model. Cooperative sensing strategies produce a user base that is either secure, requires less effort, or is faultless. The purpose of the suggested method is to choose the best transmission channel. The suggested ANN-FA model uses clustering to reduce spectrum sensing inaccuracy. A weight is computed for each channel, and the transmission channel with the highest weight is chosen. The proposed ANN-FA model computes channel weight from three sets of input parameters: PU utilization, CR count, and channel capacity. Using an improved evolutionary algorithm, the key parameters of the ANN-FA scheme are optimized to boost the overall efficiency of the CRN channel selection technique. The proposed work focuses primarily on sensing the optimal secondary user channel and reducing the spectrum handoff delay in wireless networks. Several benchmark functions are used to analyze the efficacy of this strategy. According to experimental findings, the performance of ANN-FA is 22.72 percent more robust and effective than that of the other metaheuristic algorithm. The proposed ANN-FA model is simulated using the NS2 simulator. The results are evaluated in terms of average interference ratio, spectrum opportunity utilization, packet delivery ratio (PDR), end-to-end delay, and end-to-end average throughput for a variety of different CRs in the network.

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records of traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationships between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insights derived from such analyses can be used in accident prevention. Traffic accident data mining is the activity of finding useful knowledge about these relationships that is not well known and that may interest the user. Many studies on mining accident data have been reported over the past two decades. Most of them focus on predicting the risk of an accident from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analysis is necessary. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features. Rules are fairly easy for humans to understand, so they can help provide insight and comprehensible results. Association rule learning methods and subgroup discovery methods are representative rule-based learning methods for descriptive tasks. These two kinds of algorithms have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns from a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, which is an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts satisfying a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features. Under the supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort to tune its parameters.
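
To make the rule measures behind this comparison concrete, the sketch below scores one if-then rule on a toy dataset. Support, confidence, and lift are the usual association-rule measures; weighted relative accuracy (WRAcc) is one common subgroup-discovery score that trades generality against unusualness. The abstract does not say which quality functions the authors used, and the records and feature names here are invented.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift, wracc) for antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    base_rate = n_c / n
    lift = confidence / base_rate if base_rate else 0.0
    # WRAcc = coverage (generality) * gain over the base rate (unusualness).
    wracc = (n_a / n) * (confidence - base_rate)
    return support, confidence, lift, wracc


# Hypothetical accident records, one set of feature values per accident.
records = [
    {"rain", "night", "severe"},
    {"rain", "day", "minor"},
    {"rain", "night", "severe"},
    {"clear", "night", "minor"},
    {"clear", "day", "minor"},
]

support, confidence, lift, wracc = rule_metrics(
    records, {"rain", "night"}, {"severe"}
)
```

Here the rule "rain and night implies severe" covers two of five records and is always correct on them, so it is both reasonably general and unusual relative to the 40% base rate of severe accidents.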

The Habitat Classification of mammals in Korea based on the National Ecosystem Survey (전국자연환경조사를 활용한 포유류 서식지 유형의 분류)

  • Lee, Hwajin;Ha, Jeongwook;Cha, Jinyeol;Lee, Junghyo;Yoon, Heenam;Chung, Chulun;Oh, Hongshik;Bae, Soyeon
    • Journal of Environmental Impact Assessment / v.26 no.2 / pp.160-170 / 2017
  • The purpose of this study is to cluster habitat types and to identify the characteristics of the species in each habitat type using the mammal data (70,562 records) of the 3rd National Ecosystem Survey, conducted from 2006 to 2012. The 15 habitat types recorded in the field sheets of the 3rd National Ecosystem Survey were reclassified, and the mammal habitat types were then analyzed statistically. For the cluster analysis of habitat types, non-hierarchical cluster analysis (k-means cluster analysis), hierarchical cluster analysis, and non-metric multidimensional scaling were applied to the 14 habitat types recorded more than 30 times. A total of 7 orders, 16 families, and 39 species of mammals were identified in the nationwide data of the 3rd National Ecosystem Survey. When 11 clusters were classified by habitat types, the simple structure index was the highest (ssi = 0.07). According to the similarities and hierarchies between habitat types suggested by the hierarchical clustering analysis, residential areas were the most distinct habitat type for mammals, followed by a cluster combining rivers and coasts. The results of the non-metric multidimensional scaling analysis showed that both Mus musculus and Rattus norvegicus appeared only in residential areas, the most discriminating habitat type, and that Lutra lutra appeared only in coastal and river areas. In summary, according to our results, mammalian habitat can be divided into the following four types: (1) the forest type (using forest as the main habitat and migration route); (2) the river type (using water as the main habitat); (3) the residential type (living near residential areas); and (4) the lowland type (consuming grain or seeds as the main food resource).

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Because forecasting stock movements is important both academically and practically, stock price prediction has been studied actively. Stock price forecasting research is classified by whether it uses structured or unstructured data, and is further divided into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research on stock price prediction that incorporates big data is actively underway, and it relies mainly on machine learning techniques. Methods that incorporate media effects have recently attracted particular attention, and studies that analyze online news to forecast stock prices are becoming mainstream. Previous studies that predict stock prices from online news mostly perform sentiment analysis of the news, building a separate corpus for each company and a dictionary that predicts stock prices from recorded responses to past stock prices. Existing studies have therefore examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted using only online news about Samsung Electronics. In addition, methods that consider influences among highly relevant companies have been studied recently. For example, stock movements of Samsung Electronics are predicted using news about Samsung Electronics and a highly related company such as LG Electronics. These previous studies examine the effects of news about a homogeneous industrial sector on an individual company. In them, homogeneous industries are classified according to the Global Industrial Classification Standard; in other words, they assume that industries as divided by the Global Industrial Classification Standard are homogeneous. However, existing studies are limited in that they do not take into account influential companies with high relevance or reflect the heterogeneity that exists within the same Global Industrial Classification Standard sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome these limitations, our study proposes a methodology that reflects the heterogeneous effects of the industrial sector on stock prices by applying k-means clustering. Multiple Kernel Learning is mainly used to integrate data with various characteristics; it has several kernels, each of which receives and predicts from different data. To incorporate the effects of the target firm and its relevant firms simultaneously, we used Multiple Kernel Learning. Each kernel was assigned to predict stock prices from financial news variables of the target firm and of the industrial groups obtained by k-means cluster analysis. To show that the proposed methodology is appropriate, experiments were conducted on three years of online news and stock prices. The results are as follows. (1) We confirmed that information from industrial sectors related to the target company also carries meaningful information for predicting the target company's stock movements, and that the machine learning algorithm has better predictive power when the news of relevant companies and the target company's news are considered together. (2) It is important to predict stock movements with a number of clusters that varies according to the level of homogeneity in the industrial sector. In other words, when stock prices are homogeneous within an industrial sector, it is important to use the relational effect at the level of the industry group without clustering, or with a small number of clusters. When stock prices are heterogeneous within an industry group, it is important to cluster the firms into groups. This study contributes by showing that firms classified under the Global Industrial Classification Standard are heterogeneous and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by the Global Industrial Classification Standard. It also contributes by demonstrating the effectiveness of a prediction model that reflects heterogeneity.
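
The k-means step used to split a sector into more homogeneous groups can be illustrated with a minimal one-dimensional sketch of Lloyd's algorithm. The feature (an average return per firm), the values, and the starting centers are all made up for illustration; the abstract does not specify the features or the number of clusters the authors used.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain Lloyd's algorithm on scalar values with given starting centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group.
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, groups


# Hypothetical average returns (percent) for six firms in one sector.
returns = [0.1, 0.2, 0.15, 2.0, 2.1, 1.9]
centers, groups = kmeans_1d(returns, centers=[0.0, 1.0])
```

With these values the firms separate into a low-return group and a high-return group, which is the kind of within-sector heterogeneity the abstract argues a single sector label hides.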

A Study on the Impact Factors of Contents Diffusion in Youtube using Integrated Content Network Analysis (일반영향요인과 댓글기반 콘텐츠 네트워크 분석을 통합한 유튜브(Youtube)상의 콘텐츠 확산 영향요인 연구)

  • Park, Byung Eun;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.19-36 / 2015
  • Social media is an emerging issue in content services and in the current business environment. YouTube is the most representative social media service in the world. It differs from conventional content services in its open user participation and content creation methods. To promote content on YouTube, it is important to understand the diffusion of content and the structural characteristics of the network. Most previous studies analyzed the factors affecting content diffusion from the viewpoint of general behavioral factors, and some recent studies use network structure factors, but these two approaches have been used separately. This study analyzes the general impact factors on view count together with content-based network structures. In addition, when building the content-based network, this study forms the network structure by analyzing user comments on 22,370 YouTube contents rather than building an individual user-based network. We statistically reconfirmed the causal relations between view count and both general factors and network factors. Moreover, by analyzing this integrated research model, we found that the following factors affect the YouTube view count, in this order: Uploader Followers, Video Age, Betweenness Centrality, Comments, Closeness Centrality, Clustering Coefficient, and Rating. Degree Centrality and Eigenvector Centrality, however, affect the view count negatively. This research suggests the following strategic points for content diffusion. First, general factors such as the number of uploader followers or subscribers, the video age, the number of comments, and the average rating need to be managed. The average rating matters less than previously thought, whereas it is necessary to increase the number of uploader followers strategically and keep the content in the service as long as possible. Second, attention should be paid to the impact of betweenness centrality and closeness centrality among the network factors. Users seem to search for related subjects or similar content after watching a content item, so the distance to other popular content in the service should be shortened. That is, this study showed that decreasing the number of search attempts and increasing similarity with many other contents helps increase view counts, which is consistent with the result of the clustering coefficient impact analysis. Third, it is important to note the negative impact of degree centrality and eigenvector centrality on the view count. If the number of connections with other contents grows too large, there are many similar contents, which may eventually split the view counts among them. Moreover, a very high eigenvector centrality means that the content is connected to popular contents, and it may lose views to them; it would be better to avoid connections with overly popular contents. In this study we analyzed the phenomenon and verified the diffusion factors of YouTube contents using an integrated model consisting of general factors and network structure factors. In terms of social contribution, this study may provide useful information to the music or movie industry and to other content vendors for effective content services, and it provides basic schemes that can be applied strategically in online content marketing. One limitation is that this study formed a content-based network for the network structure analysis, which may be an indirect way of observing the content network structure; more varied methods could be used to establish a direct content network. Further research includes more detailed analyses by content type, domain, or characteristics of the contents or users.
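
Two of the network factors named in this abstract, degree centrality and the clustering coefficient, are simple enough to compute by hand. The sketch below uses their standard definitions on a tiny undirected graph; the video identifiers and edges are hypothetical and do not reproduce the comment-based network built in the study.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


def degree_centrality(adjacency, node):
    """Degree normalized by the maximum possible degree (n - 1)."""
    return len(adjacency[node]) / (len(adjacency) - 1)


# Hypothetical content network: an edge links two videos that share commenters.
graph = {
    "v1": {"v2", "v3", "v4"},
    "v2": {"v1", "v3"},
    "v3": {"v1", "v2"},
    "v4": {"v1"},
}

cc_v1 = local_clustering(graph, "v1")
dc_v1 = degree_centrality(graph, "v1")
```

In this toy graph v1 is connected to every other video (degree centrality 1.0), but only one of its three neighbor pairs is linked, so its clustering coefficient is 1/3.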

Fecal Microbiota Profiling of Holstein and Jersey, in South Korea : A Comparative Study (국내에서 사육되는 Holstein 젖소과 Jersey 젖소의 대변 미생물 분석 : 비교연구)

  • Gwangsu Ha;Ji-Won Seo;Hee Gun Yang;Se Won Park;Soo-Young Lee;Young Kyoung Park;RanHee Lee;Do-Youn Jeong;Hee-Jong Yang
    • Journal of Life Science / v.33 no.7 / pp.565-573 / 2023
  • In light of the complex interactions between the host animal and its resident gut microbiomes, studies of these microbial communities as a means to improve cattle production are important. This study was conducted to analyze the intestinal microorganisms of Holstein (HT) and Jersey (JS) cattle raised in Korea and to clarify the differences in microbial structure between the breeds through next-generation sequencing. The alpha-diversity analysis revealed that most species richness and diversity indices were significantly higher in JS than in HT, whereas the difference in phylogenetic diversity, which is the sum of taxonomic distances, was not significant. Microbial composition analysis showed that the intestinal microbial community structure of the two groups differed. In both groups, a significant correlation was observed among the distributions of several microbes at the family level. In particular, a highly significant correlation (p<0.0001) among a variety of microbial distributions was found in JS. Beta-diversity analysis was performed to statistically verify whether the intestinal microbial community structure of the two groups differs. Principal coordinate analysis and unweighted pair group method with arithmetic mean (UPGMA) clustering analysis showed separation between the HT and JS clusters. Meanwhile, permutational multivariate analysis of variance (PERMANOVA) revealed that their microbial structures are significantly different (p<0.0001). LEfSe biomarker analysis was performed to discover the microbial features that differ between the two groups. We found that several microbes, such as Firmicutes, Bacilli, Moraxellaceae, and Pseudomonadales, account for most of the difference in intestinal microbial community structure between the two groups.
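
UPGMA, the clustering method named in this abstract, repeatedly merges the two clusters with the smallest average pairwise distance. The sketch below is a deliberately simple (not efficient) version that recomputes the averages from the original sample distances; the sample names and beta-diversity distances are hypothetical.

```python
from itertools import combinations


def average_distance(c1, c2, dist):
    """Mean distance over all sample pairs drawn from two clusters."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma(labels, dist):
    """Return the merge history as (cluster, cluster, average distance)."""
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], dist),
        )
        merges.append((c1, c2, average_distance(c1, c2, dist)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical pairwise distances between four fecal samples.
samples = ["HT1", "HT2", "JS1", "JS2"]
dist = {
    frozenset(("HT1", "HT2")): 0.2,
    frozenset(("JS1", "JS2")): 0.3,
    frozenset(("HT1", "JS1")): 0.8,
    frozenset(("HT1", "JS2")): 0.9,
    frozenset(("HT2", "JS1")): 0.7,
    frozenset(("HT2", "JS2")): 0.8,
}

merges = upgma(samples, dist)
```

In this toy input the two HT samples merge first, then the two JS samples, and the breeds join only at the last step, which is the pattern described as "separation between the HT and JS clusters".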

Genetic Traceability of Black Pig Meats Using Microsatellite Markers

  • Oh, Jae-Don;Song, Ki-Duk;Seo, Joo-Hee;Kim, Duk-Kyung;Kim, Sung-Hoon;Seo, Kang-Seok;Lim, Hyun-Tae;Lee, Jae-Bong;Park, Hwa-Chun;Ryu, Youn-Chul;Kang, Min-Soo;Cho, Seoae;Kim, Eui-Soo;Choe, Ho-Sung;Kong, Hong-Sik;Lee, Hak-Kyo
    • Asian-Australasian Journal of Animal Sciences / v.27 no.7 / pp.926-931 / 2014
  • Pork from Jeju black pig (population J) and Berkshire (population B) has a unique market share in Korea because of its high meat quality. Because of the high demand for this pork, traceability of the pork to its origin is becoming an important consumer requirement. To examine the feasibility of such a system, we aim to provide basic genetic information on the two black pig populations and to assess the possibility of genetically distinguishing between the two breeds. Muscle samples were collected from slaughterhouses in Jeju Island and Namwon, Chonbuk province, Korea, for populations J and B, respectively. In total, 800 Jeju black pigs and 351 Berkshires were genotyped at thirteen microsatellite (MS) markers. Analyses of the genetic diversity of the two populations were carried out in the programs MS toolkit and FSTAT. The population structure of the two breeds was determined by a Bayesian clustering method implemented in STRUCTURE and by a phylogenetic analysis in PHYLIP. Population J exhibited a higher mean number of alleles, expected heterozygosity, observed heterozygosity, and polymorphism information content than population B. The $F_{IS}$ values of population J and population B were 0.03 and -0.005, respectively, indicating that little or no inbreeding has occurred. In addition, genetic structure analysis revealed the possibility of gene flow from population B to population J. The expected probability of identity of the 13 MS markers was $9.87{\times}10^{-14}$ in population J, $3.17{\times}10^{-9}$ in population B, and $1.03{\times}10^{-12}$ in the two populations combined. The results of this study are useful for distinguishing between the two black pig breeds and can be used as a foundation for further development of DNA markers.
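
The very small probability-of-identity values reported here come from multiplying per-locus values across markers. A commonly used per-locus form, assuming Hardy-Weinberg proportions and unrelated individuals, is the sum of squared expected genotype frequencies. The abstract does not state which estimator the authors applied, and the allele frequencies below are invented for illustration.

```python
from itertools import combinations
from math import prod


def locus_pi(freqs):
    """Chance that two random individuals share a genotype at one locus."""
    homozygotes = sum(p ** 4 for p in freqs)  # (p_i^2)^2
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def combined_pi(loci):
    """Multiply per-locus values, assuming the loci are independent."""
    return prod(locus_pi(freqs) for freqs in loci)


# Hypothetical allele frequencies at two microsatellite loci.
loci = [
    [0.5, 0.5],
    [0.5, 0.5],
]

single = locus_pi(loci[0])
both = combined_pi(loci)
```

Each additional informative marker multiplies the combined value by a number well below one, which is why thirteen markers can reach values on the order of $10^{-9}$ or smaller.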

Evaluation of Genetic Diversity among Persimmon Cultivars (Diospyros kaki Thunb.) Using Microsatellite Markers (초위성 마커를 이용한 감(Diospyros kaki Thunb.)의 유연관계 분석)

  • Hwang, Ji-Hyeon;Park, Yu-Ok;Kim, Sung-Churl;Lee, Yong-Jae;Kang, Jum-Soon;Choi, Young-Whan;Son, Beung-Gu;Park, Young-Hoon
    • Journal of Life Science / v.20 no.4 / pp.632-638 / 2010
  • The genetic diversity among 48 persimmon (Diospyros kaki Thunb.) accessions, indigenous to Korea or introduced from Japan and China, was evaluated using simple sequence repeat (SSR) markers. From 20 SSR primer sets, a total of 114 polymorphic markers were detected among 12 pollination-constant non-astringent (PCNA), 13 pollination-variant non-astringent (PVNA), 15 pollination-variant astringent (PVA), and 8 pollination-constant astringent (PCA) cultivars. Analysis of the pair-wise genetic similarity coefficient (Nei-Li) and unweighted pair-group method with arithmetic averaging (UPGMA) clustering revealed two main clusters and four subclusters within cluster I. The subclustering pattern was in accordance with the classification of persimmon cultivars based on the nature of astringency loss. Phenetic relationships among the subclusters showed a closer relatedness of the PCNA group with the PVNA group, and of the PVA group with the PCA group. The genetic similarity coefficient was 0.499 on average, and the highest similarity (0.954) was observed between 'Cheongdo-Bansi' and 'Haman-Bansi'. The similarity was lowest (0.192) between 'Damopan' and 'Atago'. Identification of each cultivar, with the exception of 'Cheongdo-Bansi' and 'Gyeongsan-Bansi', was possible based on the SSR fingerprints, suggesting that these SSR markers are a useful tool for protecting intellectual property on newly developed cultivars.
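
The Nei-Li similarity coefficient mentioned in this abstract is twice the number of shared marker bands divided by the total number of bands in the two samples. A minimal sketch, with hypothetical band labels rather than the study's SSR data:

```python
def nei_li(bands_a, bands_b):
    """Nei-Li (Dice) similarity between two sets of scored marker bands."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        raise ValueError("similarity is undefined when both samples have no bands")
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical presence lists of polymorphic SSR bands for two cultivars.
cultivar_1 = {"m1", "m2", "m3", "m4"}
cultivar_2 = {"m2", "m3", "m5"}

similarity = nei_li(cultivar_1, cultivar_2)
```

A matrix of `1 - similarity` values for every cultivar pair is the kind of distance input that UPGMA clustering then operates on.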

A Study on the Synecological Values of the Torreya nucifera Forest (Natural Monument No. 374) at Pyeongdae-ri in Jeju Island (천연기념물 제374호 제주 평대리 비자나무림의 식물생태학적 가치 제고)

  • Choi, Byoung-Ki;Lee, Chin-Bum
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.33 no.4 / pp.87-98 / 2015
  • The natural monument forest (No. 374) located at Pyeongdae-ri on Jeju Island is described and classified using phytosociological methods and numerical analysis. The purpose of this paper is to identify the ecological character of Torreya nucifera forests in natural and artificial habitats, as well as their spatial and phytogeographical distribution in Korea. The forests at Pyeongdae-ri and in other regions were compared using non-metric multidimensional scaling (NMDS) and hierarchical clustering. On the basis of 12 phytosociological relevés, the vegetation of the T. nucifera-dominant forest on Jeju Island was arranged in one syntaxon (the Alangium platanifolium-Torreya nucifera community, including typicum and one subcommunity) within Camellietea. The communities of T. nucifera-dominant forests were characterized floristically and ecologically. We discussed diagnostic species with references, and proposed a few important diagnostic species (Ilex crenata for. microphylla, Acer palmatum, Zingiber mioga, Mercurialis leiocarpa, Osmorhiza aristata, Mecodium wrightii, etc.) to explain the habitat conditions and synecological character. The communities were described with respect to their edaphic and syndynamic niche, and we discussed their overall distribution in Korea. In most cases these forests are widespread on the Korean peninsula, and their distribution is primarily determined by artificial plantation and periodic management. The forests consisting of T. nucifera have developed from both natural environmental elements and artificial management. As a result, they have very distinctive floristic, structural, and distributional characteristics. Furthermore, we found that they need appropriate management for sustainability.