• Title/Summary/Keyword: clustering algorithms

Search Result 606, Processing Time 0.025 seconds

Managing the Reverse Extrapolation Model of Radar Threats Based Upon an Incremental Machine Learning Technique (점진적 기계학습 기반의 레이더 위협체 역추정 모델 생성 및 갱신)

  • Kim, Chulpyo;Noh, Sanguk
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.4
    • /
    • pp.29-39
    • /
    • 2017
  • Various electronic warfare situations drive the need to develop an integrated electronic warfare simulator that can perform electronic warfare modeling and simulation on radar threats. In this paper, we analyze the components of a simulation system to reversely model the radar threats that emit electromagnetic signals based on the parameters of the electronic information, and propose a method to gradually maintain the reverse extrapolation model of RF threats. In the experiment, we will evaluate the effectiveness of the incremental model update and also assess the integration method of reverse extrapolation models. The individual model of RF threats are constructed by using decision tree, naive Bayesian classifier, artificial neural network, and clustering algorithms through Euclidean distance and cosine similarity measurement, respectively. Experimental results show that the accuracy of reverse extrapolation models improves, while the size of the threat sample increases. In addition, we use voting, weighted voting, and the Dempster-Shafer algorithm to integrate the results of the five different models of RF threats. As a result, the final decision of reverse extrapolation through the Dempster-Shafer algorithm shows the best performance in its accuracy.

Case Analysis of the Promotion Methodologies in the Smart Exhibition Environment (스마트 전시 환경에서 프로모션 적용 사례 및 분석)

  • Moon, Hyun Sil;Kim, Nam Hee;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.171-183
    • /
    • 2012
  • In the development of technologies, the exhibition industry has received much attention from governments and companies as an important way of marketing activities. Also, the exhibitors have considered the exhibition as new channels of marketing activities. However, the growing size of exhibitions for net square feet and the number of visitors naturally creates the competitive environment for them. Therefore, to make use of the effective marketing tools in these environments, they have planned and implemented many promotion technics. Especially, through smart environment which makes them provide real-time information for visitors, they can implement various kinds of promotion. However, promotions ignoring visitors' various needs and preferences can lose the original purposes and functions of them. That is, as indiscriminate promotions make visitors feel like spam, they can't achieve their purposes. Therefore, they need an approach using STP strategy which segments visitors through right evidences (Segmentation), selects the target visitors (Targeting), and give proper services to them (Positioning). For using STP Strategy in the smart exhibition environment, we consider these characteristics of it. First, an exhibition is defined as market events of a specific duration, which are held at intervals. According to this, exhibitors who plan some promotions should different events and promotions in each exhibition. Therefore, when they adopt traditional STP strategies, a system can provide services using insufficient information and of existing visitors, and should guarantee the performance of it. Second, to segment automatically, cluster analysis which is generally used as data mining technology can be adopted. In the smart exhibition environment, information of visitors can be acquired in real-time. At the same time, services using this information should be also provided in real-time. However, many clustering algorithms have scalability problem which they hardly work on a large database and require for domain knowledge to determine input parameters. Therefore, through selecting a suitable methodology and fitting, it should provide real-time services. Finally, it is needed to make use of data in the smart exhibition environment. As there are useful data such as booth visit records and participation records for events, the STP strategy for the smart exhibition is based on not only demographical segmentation but also behavioral segmentation. Therefore, in this study, we analyze a case of the promotion methodology which exhibitors can provide a differentiated service to segmented visitors in the smart exhibition environment. First, considering characteristics of the smart exhibition environment, we draw evidences of segmentation and fit the clustering methodology for providing real-time services. There are many studies for classify visitors, but we adopt a segmentation methodology based on visitors' behavioral traits. Through the direct observation, Veron and Levasseur classify visitors into four groups to liken visitors' traits to animals (Butterfly, fish, grasshopper, and ant). Especially, because variables of their classification like the number of visits and the average time of a visit can estimate in the smart exhibition environment, it can provide theoretical and practical background for our system. Next, we construct a pilot system which automatically selects suitable visitors along the objectives of promotions and instantly provide promotion messages to them. That is, based on the segmentation of our methodology, our system automatically selects suitable visitors along the characteristics of promotions. We adopt this system to real exhibition environment, and analyze data from results of adaptation. As a result, as we classify visitors into four types through their behavioral pattern in the exhibition, we provide some insights for researchers who build the smart exhibition environment and can gain promotion strategies fitting each cluster. First, visitors of ANT type show high response rate for promotion messages except experience promotion. So they are fascinated by actual profits in exhibition area, and dislike promotions requiring a long time. Contrastively, visitors of GRASSHOPPER type show high response rate only for experience promotion. Second, visitors of FISH type appear favors to coupon and contents promotions. That is, although they don't look in detail, they prefer to obtain further information such as brochure. Especially, exhibitors that want to give much information for limited time should give attention to visitors of this type. Consequently, these promotion strategies are expected to give exhibitors some insights when they plan and organize their activities, and grow the performance of them.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (

    ) were the topic modeling results for each research topic (
    ) was found to be derived from For example, as a result of topic modeling for papers related to 'vaccine', a new topic titled Topic 05 'neutralizing antibodies' was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic called Topic 05 'cytokine' was discovered. A cytokine storm is when the immune cells of our body do not defend against attacks, but attack normal cells. Hidden topics that could not be found for the entire thesis were classified according to keywords, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram method that predicts the similar words as the central word among the Word2vec models. The combination of the LDA model and the Word2vec model tried to show better performance by identifying the relationship between the document and the LDA subject and the relationship between the Word2vec document. In addition, as a clustering method through PCA dimension reduction, a method for intuitively classifying documents by using the t-SNE technique to classify documents with similar themes and forming groups into a structured organization of documents was presented. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers related to COVID-19, it will reduce the precious time and effort of healthcare professionals and policy makers, and rapidly gain new insights. We hope to help you get It is also expected to be used as basic data for researchers to explore new research directions.

  • A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

    • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
      • Asia pacific journal of information systems
      • /
      • v.21 no.1
      • /
      • pp.103-122
      • /
      • 2011
    • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.

    Benchmark Results of a Monte Carlo Treatment Planning system (몬데카를로 기반 치료계획시스템의 성능평가)

    • Cho, Byung-Chul
      • Progress in Medical Physics
      • /
      • v.13 no.3
      • /
      • pp.149-155
      • /
      • 2002
    • Recent advances in radiation transport algorithms, computer hardware performance, and parallel computing make the clinical use of Monte Carlo based dose calculations possible. To compare the speed and accuracies of dose calculations between different developed codes, a benchmark tests were proposed at the XIIth ICCR (International Conference on the use of Computers in Radiation Therapy, Heidelberg, Germany 2000). A Monte Carlo treatment planning comprised of 28 various Intel Pentium CPUs was implemented for routine clinical use. The purpose of this study was to evaluate the performance of our system using the above benchmark tests. The benchmark procedures are comprised of three parts. a) speed of photon beams dose calculation inside a given phantom of 30.5 cm$\times$39.5 cm $\times$ 30 cm deep and filled with 5 ㎣ voxels within 2% statistical uncertainty. b) speed of electron beams dose calculation inside the same phantom as that of the photon beams. c) accuracy of photon and electron beam calculation inside heterogeneous slab phantom compared with the reference results of EGS4/PRESTA calculation. As results of the speed benchmark tests, it took 5.5 minutes to achieve less than 2% statistical uncertainty for 18 MV photon beams. Though the net calculation for electron beams was an order of faster than the photon beam, the overall calculation time was similar to that of photon beam case due to the overhead time to maintain parallel processing. Since our Monte Carlo code is EGSnrc, which is an improved version of EGS4, the accuracy tests of our system showed, as expected, very good agreement with the reference data. In conclusion, our Monte Carlo treatment planning system shows clinically meaningful results. Though other more efficient codes are developed such like MCDOSE and VMC++, BEAMnrc based on EGSnrc code system may be used for routine clinical Monte Carlo treatment planning in conjunction with clustering technique.

    • PDF

    An Investigation on the Periodical Transition of News related to North Korea using Text Mining (텍스트마이닝을 활용한 북한 관련 뉴스의 기간별 변화과정 고찰)

    • Park, Chul-Soo
      • Journal of Intelligence and Information Systems
      • /
      • v.25 no.3
      • /
      • pp.63-88
      • /
      • 2019
    • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis over North Korea represented in South Korean mass media. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and method of analysis of the text mining techniques, confirmed by analysis of the data. In this study, R program was used to apply the text mining technique. R program is free software for statistical computing and graphics. Also, Text mining methods allow to highlight the most frequently used keywords in a paragraph of texts. One can create a word cloud, also referred as text cloud or tag cloud. This study proposes a procedure to find meaningful tendencies based on a combination of word cloud, and co-occurrence networks. This study aims to more objectively explore the images of North Korea represented in South Korean newspapers by quantitatively reviewing the patterns of language use related to North Korea from 2016. 11. 1 to 2019. 5. 23 newspaper big data. In this study, we divided into three periods considering recent inter - Korean relations. Before January 1, 2018, it was set as a Before Phase of Peace Building. From January 1, 2018 to February 24, 2019, we have set up a Peace Building Phase. The New Year's message of Kim Jong-un and the Olympics of Pyeong Chang formed an atmosphere of peace on the Korean peninsula. After the Hanoi Pease summit, the third period was the silence of the relationship between North Korea and the United States. Therefore, it was called Depression Phase of Peace Building. This study analyzes news articles related to North Korea of the Korea Press Foundation database(www.bigkinds.or.kr) through text mining, to investigate characteristics of the Kim Jong-un regime's South Korea policy and unification discourse. The main results of this study show that trends in the North Korean national policy agenda can be discovered based on clustering and visualization algorithms. In particular, it examines the changes in the international circumstances, domestic conflicts, the living conditions of North Korea, the South's Aid project for the North, the conflicts of the two Koreas, North Korean nuclear issue, and the North Korean refugee problem through the co-occurrence word analysis. It also offers an analysis of South Korean mentality toward North Korea in terms of the semantic prosody. In the Before Phase of Peace Building, the results of the analysis showed the order of 'Missiles', 'North Korea Nuclear', 'Diplomacy', 'Unification', and ' South-North Korean'. The results of Peace Building Phase are extracted the order of 'Panmunjom', 'Unification', 'North Korea Nuclear', 'Diplomacy', and 'Military'. The results of Depression Phase of Peace Building derived the order of 'North Korea Nuclear', 'North and South Korea', 'Missile', 'State Department', and 'International'. There are 16 words adopted in all three periods. The order is as follows: 'missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', 'North and South Korea', 'Military', 'Kaesong Industrial Complex', 'Defense', 'Sanctions', 'Denuclearization', 'Peace', 'Exchange and Cooperation', and 'South Korea'. We expect that the results of this study will contribute to analyze the trends of news content of North Korea associated with North Korea's provocations. And future research on North Korean trends will be conducted based on the results of this study. We will continue to study the model development for North Korea risk measurement that can anticipate and respond to North Korea's behavior in advance. We expect that the text mining analysis method and the scientific data analysis technique will be applied to North Korea and unification research field. Through these academic studies, I hope to see a lot of studies that make important contributions to the nation.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.