• Title/Summary/Keyword: 키워드 추출 방법

Search Result 355, Processing Time 0.026 seconds

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.119-138
    • /
    • 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed to SNS in large quantity. As a result, personal information is lost and even monetary damages occur more frequently. In this study, we propose a method to analyze which sentences and documents, which have been sent to the SNS, are related to financial fraud. First of all, as a conceptual framework, we developed a matrix of conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested emergency management process which consists of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Among those we focused on risk identification in this paper. The main process consists of data collection, preprocessing and analysis. First, we selected two words 'daechul(loan)' and 'sachae(private loan)' as seed words and collected data with this word from SNS such as twitter. The collected data are given to the two researchers to decide whether they are related to the cybercriminality, particularly financial fraud, or not. Then we selected some of them as keywords if the vocabularies are related to the nominals and symbols. With the selected keywords, we searched and collected data from web materials such as twitter, news, blog, and more than 820,000 articles collected. The collected articles were refined through preprocessing and made into learning data. The preprocessing process is divided into performing morphological analysis step, removing stop words step, and selecting valid part-of-speech step. In the morphological analysis step, a complex sentence is transformed into some morpheme units to enable mechanical analysis. In the removing stop words step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. In the step of selecting valid part-of-speech, only two kinds of nouns and symbols are considered. Since nouns could refer to things, the intent of message is expressed better than the other part-of-speech. Moreover, the more illegal the text is, the more frequently symbols are used. The selected data is given 'legal' or 'illegal'. To make the selected data as learning data through the preprocessing process, it is necessary to classify whether each data is legitimate or not. The processed data is then converted into Corpus type and Document-Term Matrix. Finally, the two types of 'legal' and 'illegal' files were mixed and randomly divided into learning data set and test data set. In this study, we set the learning data as 70% and the test data as 30%. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as the main parameters, we set gamma as 0.5 and cost as 10, based on the optimal value function. The cost is set higher than general cases. To show the feasibility of the idea proposed in this paper, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence method. Overall accuracy and was used as the metric. As a result, the overall accuracy of the proposed method was 92.41% of illegal loan advertisement and 77.75% of illegal visit sales, which is apparently superior to that of the Term Frequency, MLE, etc. Hence, the result suggests that the proposed method is valid and usable practically. In this paper, we propose a framework for crisis management caused by abnormalities of unstructured data sources such as SNS. We hope this study will contribute to the academia by identifying what to consider when applying the SVM-like discrimination algorithm to text analysis. Moreover, the study will also contribute to the practitioners in the field of brand management and opinion mining.

Automatic Recommendation of Nearby Tourist Attractions related to Events (이벤트와 관련된 주변 관광지 자동 추천 알고리즘 개발)

  • Ahn, Jinhyun;Im, Dong-Hyuk
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.3
    • /
    • pp.407-413
    • /
    • 2020
  • Participating in exhibitions is one of the major activities for tourists. When selecting their next travel destination after participating in an event, they use map services and social network services, such as blogs, to obtain information about tourist attractions. The map services are location-based recommendations, because they can easily retrieve information regarding nearby places. Blogs contain informative content about tourist attractions, thereby providing content-based recommendations. However, few services consider both location and content. In location-based recommendations, tourist attractions that are not related to the content of the event attended might be recommended. Content-based recommendation has a disadvantage in that events located at a distance might get recommended. We propose an algorithm that considers both location and content, based on information from the Korea Tourism Organization's Linked Open Data (LOD), Wikipedia, and a Korean dictionary. By extracting nouns from the description of a tourist attraction and then comparing them with nouns about other attractions, a content-based relationship is determined. The distance to the event is calculated based on the latitude and longitude of each tourist attraction. A weight selected by the user is used for linear combination with the content-based relationship to determine the preference order of the recommendations.

Analysis of Research Trends in Relation to the Yellow Sea using Text Mining (텍스트 마이닝을 활용한 황해 관련 연구동향 분석연구)

  • Kyu Won Hwang;Kim Jinkyung;Kang Seung-Koo;Kang Gil Mo
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.29 no.7
    • /
    • pp.724-739
    • /
    • 2023
  • Located in the sea area between South Korea, North Korea, and China, the Yellow Sea plays an important role from a geopolitical perspective, and recently, as the use of marine space in the Yellow Sea is expanding, its social and economic values have been increasing further. In addition, owing to rapid climate changes, the need for joint response and cooperation between Korea and China is increasing in various fields, including changes in the marine environment and marine ecosystem and generation and movement of air pollutants. Accordingly, in this study, core topics were derived from research papers with the Yellow Sea as a keyword, and research trends to date were explored through author network analysis. As a specific research method, research papers related to the Yellow Sea published between 1984 and 2021 were extracted from the Web of Science database and were classified into four periods to derive core topics using topic modeling, a type of text mining. Furthermore, the influences of major research communities, researchers, and research institutes in the appropriate fields were identified through analyzing the author network, and their implications were presented. The analysis results indicated that the core topics of research papers on the Yellow Sea had changed over time, and differences existed in the influence (centrality) of key researchers. Finally, based on the results of this study, this study aims to identify research trends related to the Yellow Sea, major researchers, and research institutes and contribute to research cooperation between Korea and China regarding the Yellow Sea in the future.

A Comparative Analysis of Research Trends in Korean Modern Medicine: Focusing on Two Journals of Medical School (근대의학 논문의 계량학적 방법을 통한 연구 경향 비교 분석 - 의학전문학교 학술지 2종을 중심으로 -)

  • Mijin Seo;Jisu Lee
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.34 no.4
    • /
    • pp.29-54
    • /
    • 2023
  • This study aimed to analyze the research trends of journal articles published by medical schools representing Korean modern. A total of 682 were selected from two journals published by Medical College in Keijo and Keijo Imperial University Medical Faculty. In results, the affiliations of authors who participated in Acta Medicinalia in Keijo included various schools and hospitals, and the authors' major was found to be similar in basic medicine and clinical medicine. In The Keijo Journal of Medicine, only school-affiliated authors participated, and 96.33% of the authors were majors in basic medicine. Co-occurrence network analysis was conducted on MeSH terms from the title of the article using MeSH on Demand, and the keyword that derived in both journals was 'erythrocytes', which analyzed the condition of red blood cells according to organs and diseases. In frequency analysis, a common area of research in both journals was the study focusing on blood and blood cells, and the study of anemia and tuberculosis, which were prevalent diseases at the time. As for comparing each journal, Acta Medicinalia in Keijo has focused on inflammatory diseases and clinical pathological studies in humans, and The Keijo Journal of Medicine has focused on anatomical studies on animals and pharmacological studies on medicines. Through this study, it was possible to identify the research topics and major keywords in two medical schools with different founding goals.

Study on the Policy of Supporting University Students in the Beauty Field through Social Big Data Analysis: Based on exploratory data analytics (소셜 빅 데이터 분석을 통한 미용분야 대학생 창업지원 정책에 관한 연구 -탐색적 데이터 분석법을 기반으로-)

  • Mi-Yun Yoon;Nam-hoon Park
    • Journal of the Korean Applied Science and Technology
    • /
    • v.39 no.6
    • /
    • pp.853-863
    • /
    • 2022
  • In order to revitalize start-ups in the beauty field, this study attempted to derive characteristic patterns of changes in demand and differences in emotions and meaning for 'beauty start-ups' by dividing the period by year from 2019 to 2021 based on exploratory data analysis (EDA). Most of the search terms related to the keyword "beauty start-up" showed more interest in institutions or certificates that can learn beauty skills than professional start-up education, which still does not recognize the importance of start-up education, and as an alternative, it is necessary to develop customized start-up education programs for each major. We establish hypotheses through exploratory data analysis and verify hypotheses by combining traditional corroborative data analysis (CDA). There has never been an exploratory data analysis method for beauty startups, and rather than mentioning the need for formal start-up education, analyzing changes in interest in beauty startups and the requirements of prospective start-ups with exploratory data will help develop customized start-up programs.

NFT(Non-Fungible Token) Patent Trend Analysis using Topic Modeling

  • Sin-Nyum Choi;Woong Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.12
    • /
    • pp.41-48
    • /
    • 2023
  • In this paper, we propose an analysis of recent trends in the NFT (Non-Fungible Token) industry using topic modeling techniques, focusing on their universal application across various industrial fields. For this study, patent data was utilized to understand industry trends. We collected data on 371 domestic and 454 international NFT-related patents registered in the patent information search service KIPRIS from 2017, when the first NFT standard was introduced, to October 2023. In the preprocessing stage, stopwords and lemmas were removed, and only noun words were extracted. For the analysis, the top 50 words by frequency were listed, and their corresponding TF-IDF values were examined to derive key keywords of the industry trends. Next, Using the LDA algorithm, we identified four major latent topics within the patent data, both domestically and internationally. We analyzed these topics and presented our findings on NFT industry trends, underpinned by real-world industry cases. While previous review presented trends from an academic perspective using paper data, this study is significant as it provides practical trend information based on data rooted in field practice. It is expected to be a useful reference for professionals in the NFT industry for understanding market conditions and generating new items.

A Study on Martial art for suggesting the role of Martial art Sports as a Leisure Activity (여가활동으로서의 무도스포츠 역할 재고를 위한 고찰)

  • Lim, Young-Sam
    • Journal of the Korean Applied Science and Technology
    • /
    • v.37 no.3
    • /
    • pp.564-570
    • /
    • 2020
  • The purpose of this study is to examine the preceding studies and suggest alternatives to establish the role and value of martial arts sports as a leisure activity. To achieve the purpose of this study, the keywords and themes of leisure-related journals were extracted and the articles and current status of martial arts-related journals in Korea were derived using SPSS descriptive statistics method. The subjects of the analysis were 'mudo' and 'leisure' of leisure-related journals between 2005 and 2017, and an interpretative textual analysis was conducted to analyze the contents of individual studies. According to the results of the survey on the actual condition of participation in the 2016 National Sports for all, Taekwondo was ranked 5th among the top 5 sports with 6.1% of the sports for all, and Taekwondo and Kendo were ranked 1st and 2nd respectively in the sports for all students and the clubs that they want to join in the future. Second, the study on martial arts in the journals related to leisure was found most in 2006 and 2010, but only one study was not conducted after 2014, which confirmed that the absolute number of studies was very insufficient. Third, the research themes of the journals related to leisure were serious leisure, female college students, physical self-concept, social development, leisure recreation class, job satisfaction, life satisfaction, training, leisure constraints, etc., and the study of martial arts related to leisure was found to require quantitative and qualitative multilateral approaches. Fourth, in the current status of dance related studies by year in domestic journals, two of 23 studies in 2007 were conducted on leisure topics, and the average number of studies related to domestic martial arts and leisure related papers was 5.65%, which is very low. In conclusion, as a result of analyzing the trend of research on leisure as a martial arts sport, it is necessary to suggest the direction of future research that can reconsider the role and value of martial arts sport as a leisure activity that can improve the quality of life and happiness in future society through quantitative and qualitative improvement.

Dynamic Virtual Ontology using Tags with Semantic Relationship on Social-web to Support Effective Search (효율적 자원 탐색을 위한 소셜 웹 태그들을 이용한 동적 가상 온톨로지 생성 연구)

  • Lee, Hyun Jung;Sohn, Mye
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.19-33
    • /
    • 2013
  • In this research, a proposed Dynamic Virtual Ontology using Tags (DyVOT) supports dynamic search of resources depending on user's requirements using tags from social web driven resources. It is general that the tags are defined by annotations of a series of described words by social users who usually tags social information resources such as web-page, images, u-tube, videos, etc. Therefore, tags are characterized and mirrored by information resources. Therefore, it is possible for tags as meta-data to match into some resources. Consequently, we can extract semantic relationships between tags owing to the dependency of relationships between tags as representatives of resources. However, to do this, there is limitation because there are allophonic synonym and homonym among tags that are usually marked by a series of words. Thus, research related to folksonomies using tags have been applied to classification of words by semantic-based allophonic synonym. In addition, some research are focusing on clustering and/or classification of resources by semantic-based relationships among tags. In spite of, there also is limitation of these research because these are focusing on semantic-based hyper/hypo relationships or clustering among tags without consideration of conceptual associative relationships between classified or clustered groups. It makes difficulty to effective searching resources depending on user requirements. In this research, the proposed DyVOT uses tags and constructs ontologyfor effective search. We assumed that tags are extracted from user requirements, which are used to construct multi sub-ontology as combinations of tags that are composed of a part of the tags or all. In addition, the proposed DyVOT constructs ontology which is based on hierarchical and associative relationships among tags for effective search of a solution. The ontology is composed of static- and dynamic-ontology. The static-ontology defines semantic-based hierarchical hyper/hypo relationships among tags as in (http://semanticcloud.sandra-siegel.de/) with a tree structure. From the static-ontology, the DyVOT extracts multi sub-ontology using multi sub-tag which are constructed by parts of tags. Finally, sub-ontology are constructed by hierarchy paths which contain the sub-tag. To create dynamic-ontology by the proposed DyVOT, it is necessary to define associative relationships among multi sub-ontology that are extracted from hierarchical relationships of static-ontology. The associative relationship is defined by shared resources between tags which are linked by multi sub-ontology. The association is measured by the degree of shared resources that are allocated into the tags of sub-ontology. If the value of association is larger than threshold value, then associative relationship among tags is newly created. The associative relationships are used to merge and construct new hierarchy the multi sub-ontology. To construct dynamic-ontology, it is essential to defined new class which is linked by two more sub-ontology, which is generated by merged tags which are highly associative by proving using shared resources. Thereby, the class is applied to generate new hierarchy with extracted multi sub-ontology to create a dynamic-ontology. The new class is settle down on the ontology. So, the newly created class needs to be belong to the dynamic-ontology. So, the class used to new hyper/hypo hierarchy relationship between the class and tags which are linked to multi sub-ontology. At last, DyVOT is developed by newly defined associative relationships which are extracted from hierarchical relationships among tags. Resources are matched into the DyVOT which narrows down search boundary and shrinks the search paths. Finally, we can create the DyVOT using the newly defined associative relationships. While static data catalog (Dean and Ghemawat, 2004; 2008) statically searches resources depending on user requirements, the proposed DyVOT dynamically searches resources using multi sub-ontology by parallel processing. In this light, the DyVOT supports improvement of correctness and agility of search and decreasing of search effort by reduction of search path.

Text Mining-Based Emerging Trend Analysis for e-Learning Contents Targeting for CEO (텍스트마이닝을 통한 최고경영자 대상 이러닝 콘텐츠 트렌드 분석)

  • Kyung-Hoon Kim;Myungsin Chae;Byungtae Lee
    • Information Systems Review
    • /
    • v.19 no.2
    • /
    • pp.1-19
    • /
    • 2017
  • Original scripts of e-learning lectures for the CEOs of corporation S were analyzed using topic analysis, which is a text mining method. Twenty-two topics were extracted based on the keywords chosen from five-year records that ranged from 2011 to 2015. Research analysis was then conducted on various issues. Promising topics were selected through evaluation and element analysis of the members of each topic. In management and economics, members demonstrated high satisfaction and interest toward topics in marketing strategy, human resource management, and communication. Philosophy, history of war, and history demonstrated high interest and satisfaction in the field of humanities, whereas mind health showed high interest and satisfaction in the field of in lifestyle. Studies were also conducted to identify topics on the proportion of content, but these studies failed to increase member satisfaction. In the field of IT, educational content responds sensitively to change of the times, but it may not increase the interest and satisfaction of members. The present study found that content production for CEOs should draw out deep implications for value innovation through technology application instead of simply ending the technical aspect of information delivery. Previous studies classified contents superficially based on the name of content program when analyzing the status of content operation. However, text mining can derive deep content and subject classification based on the contents of unstructured data script. This approach can examine current shortages and necessary fields if the service contents of the themes are displayed by year. This study was based on data obtained from influential e-learning companies in Korea. Obtaining practical results was difficult because data were not acquired from portal sites or social networking service. The content of e-learning trends of CEOs were analyzed. Data analysis was also conducted on the intellectual interests of CEOs in each field.