• Title/Summary/Keyword: TextMining

Analysis of User Complaints of the Air Force Weapon System Using Text Mining (텍스트마이닝을 활용한 항공무기체계의 사용자불만 분석)

  • Hyewon Hwang;Youngjin Kim;Jeonghwan Jeon
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.6 / pp.784-796 / 2024
  • User complaints arise when military supplies fail to meet user needs such as performance and ease of use. Over the past five years, an average of 1,115 user complaints have occurred, and the Defense Agency for Technology and Quality (DTaQ) handles the complaints collected from the requesting military. This user complaint information is accumulated as unstructured data in the Quality Information Service (IQIS) and Excel, making systematic analysis difficult. Therefore, this study aims to identify the status of user complaints related to air weapon systems using network analysis. This research is significant because it quantitatively analyzes unstructured user complaint data, and the results are expected to serve as reference material for future quality assurance activities and user complaint handling.

Text-mining to Explore ESG Disclosure in the Fashion Industry (텍스트 마이닝을 통한 패션 기업의 ESG 정보 유형화)

  • Min Jung Kim;Sojeong Kim;Yu-na Lee;Sojin Jung
    • Journal of the Korean Society of Clothing and Textiles / v.48 no.5 / pp.883-899 / 2024
  • The aim of this study was to investigate fashion firms' environmental, social, and governance (ESG) information disclosure. A total of 25 fashion firms (e.g., Adidas, Burberry Group, Nike, Ralph Lauren Corp.) were selected, including eight luxury brands and eight athleisure brands. Thus, three groups were formed for analysis: the entire group (N = 25), luxury brands (N = 8), and athleisure brands (N = 8). Based on the ESG information disclosed on the firms' official web pages, 1,128 valid words were extracted. The top keywords for each brand group were identified based on frequency and term frequency-inverse document frequency (TF-IDF), and semantic network analysis and convergence of iterated correlations (CONCOR) analysis were performed. The results revealed several keywords and clusters reflecting unique attributes of the fashion industry, as well as ESG clusters that differed by brand type. The findings have significant academic and managerial implications.
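
The TF-IDF ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common variant tf × log(N / df); the abstract does not state which weighting the authors used, and the toy documents and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting: (count / document length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy texts standing in for ESG disclosures of three groups.
docs = [
    "carbon emissions recycled materials carbon".split(),
    "governance board diversity carbon".split(),
    "community labor diversity supply".split(),
]
scores = tf_idf(docs)
top_term = max(scores[0], key=scores[0].get)
print(top_term, round(scores[0][top_term], 4))
```

In the first toy document, "carbon" is the most frequent word, but it also appears in the second document, so a term that occurs only once and only there ("emissions") ends up with the higher score. This is the reason for reporting TF-IDF next to raw frequency.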

A Study of Main Contents Extraction from Web News Pages based on XPath Analysis

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.20 no.7 / pp.1-7 / 2015
  • Although data on the internet can be used in various fields, for example as a data source for information retrieval (IR), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing the unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. In this paper, we solve the problem by implementing XTractor (XPath Extractor). Since XPath is used to navigate the attributes and elements of an XML document, the XPath analysis is carried out through XTractor. XTractor extracts the main text by HTML parsing, XPath grouping, and detecting the XPath that contains the main data. In experiments on a large amount of data, the recognition rate and precision were 97.9% and 93.9%, respectively, except for a few cases, confirming that the main text of news pages can be properly extracted.
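
The idea of grouping text by path and picking the group that holds the main content can be sketched as follows. This is not the XTractor implementation: it assumes simplified tag paths without positional indexes (a stand-in for full XPath expressions) and a "most characters wins" heuristic, and the names `PathTextCollector` and `main_text` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # never closed
SKIP_TAGS = {"script", "style"}                           # not visible text

class PathTextCollector(HTMLParser):
    """Group visible text fragments by their simplified tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []                        # currently open tags
        self.text_by_path = defaultdict(list)  # path -> text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not (SKIP_TAGS & set(self.stack)):
            self.text_by_path["/" + "/".join(self.stack)].append(text)

def main_text(html):
    """Return (path, text) for the path group with the most characters."""
    parser = PathTextCollector()
    parser.feed(html)
    parser.close()
    groups = parser.text_by_path
    if not groups:
        return None, ""
    best = max(groups, key=lambda p: sum(len(t) for t in groups[p]))
    return best, " ".join(groups[best])

# Hypothetical news page: menu, article body, footer.
page = """
<html><body>
  <div><ul><li>Home</li><li>Sports</li></ul></div>
  <div><article>
    <p>The council approved the new budget on Monday.</p>
    <p>Spending on public transport will rise next year.</p>
  </article></div>
  <div><span>Copyright notice</span></div>
</body></html>
"""
path, text = main_text(page)
print(path)
print(text)
```

On real pages the heuristic would need more care (for example, link density or per-site path statistics), which is where a dedicated extractor would go beyond this sketch.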

Design and Implementation of Web Crawler with Real-Time Keyword Extraction based on the RAKE Algorithm

  • Zhang, Fei;Jang, Sunggyun;Joe, Inwhee
    • Annual Conference of KIPS / 2017.11a / pp.395-398 / 2017
  • In this paper, we propose a web crawler system with a keyword extraction function. Existing text mining research on keyword extraction mostly relies on databases of documents or corpora that have already been crawled, whereas this paper aims to establish a real-time keyword extraction system that extracts the keywords of a text and stores them in the database while the web page is being crawled. We design and implement a crawler that incorporates the RAKE keyword extraction algorithm, so keywords are extracted from the content as the page is crawled. In addition, the performance of the RAKE algorithm is improved by increasing the weight of important features (such as nouns appearing in the title). The experimental results show that this method is superior to the existing method and extracts keywords satisfactorily.
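
For readers unfamiliar with RAKE, the following is a minimal sketch of the basic algorithm: candidate phrases are the word runs between stopwords and punctuation, each word is scored as degree / frequency, and a phrase score is the sum of its word scores. The tiny stopword list is an assumption for illustration, and the title-noun weighting described in the abstract is not implemented here.

```python
import re
from collections import defaultdict

# Deliberately tiny stopword list (an assumption; real lists are far larger).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
             "is", "are", "with", "we", "this", "that", "can", "it"}

def candidate_phrases(text):
    """Split text into runs of non-stopwords, broken at punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def rake(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, including itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

text = ("We design a web crawler with real-time keyword extraction. "
        "The crawler stores keywords in the database.")
for phrase, score in rake(text)[:3]:
    print(f"{score:4.1f}  {phrase}")
```

In a crawler, a function like `rake` would be called on each page's text right after download, so that the keywords are stored together with the page.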

A Study on Automatic Analysis System of National Defense Articles (국방 기사 자동 분석 시스템 구축 방안 연구)

  • Kim, Hyunjung;Kim, Wooju
    • Journal of the Korea Institute of Military Science and Technology / v.21 no.1 / pp.86-93 / 2018
  • Media articles have a great influence on public opinion and reach the public through various media, which makes them very difficult to analyze manually. Academia has discussed many methods for collecting, processing, and analyzing documents, but mostly in areas related to politics and stocks, and national defense articles remain poorly researched. In this study, we explain how to build an automatic analysis system for national defense articles that collects information on defense articles automatically and processes it quickly by using topic modeling with LDA, sentiment analysis, and extraction-based text summarization.
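
Of the three components named above, extraction-based summarization is the simplest to sketch. The example below assumes a plain word-frequency scoring rule (average frequency of a sentence's non-stopwords); the abstract does not say which extractive method the authors use, and the stopword list, sample text, and function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "to", "on", "of", "in", "and", "will", "is"}

def tokenize(sentence):
    """Lowercase words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)

article = ("The ministry announced a new radar program. "
           "The radar program will replace older radar units. "
           "Officials declined to comment on cost.")
print(summarize(article, n_sentences=1))
```

The summary consists only of sentences copied from the article, which is what distinguishes extractive from abstractive summarization.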

Towards cross-platform interoperability for machine-assisted text annotation

  • de Castilho, Richard Eckart;Ide, Nancy;Kim, Jin-Dong;Klie, Jan-Christoph;Suderman, Keith
    • Genomics & Informatics / v.17 no.2 / pp.19.1-19.10 / 2019
  • In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.

Text Based Explainable AI for Monitoring National Innovations (텍스트 기반 Explainable AI를 적용한 국가연구개발혁신 모니터링)

  • Jung Sun Lim;Seoung Hun Bae
    • Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.1-7 / 2022
  • Explainable AI (XAI) is an approach that leverages artificial intelligence to support human decision-making. Recently, the governments of several countries, including Korea, have been attempting objective, evidence-based analyses of R&D investments and their returns by analyzing quantitative data. Over the past decade, governments have invested in relevant research, allowing government officials to gain insights that help them evaluate past performance and discuss future policy directions. However, the text information accumulated in national databases has so far been used little relative to the volume that remains unused. The current study applies a text mining strategy to monitoring innovations, along with a case study of smart farms in the Honam region.

A Recognition Method for Korean Spatial Background in Historical Novels (한국어 역사 소설에서 공간적 배경 인식 기법)

  • Kim, Seo-Hee;Kim, Seung-Hoon
    • Journal of Information Technology Services / v.15 no.1 / pp.245-253 / 2016
  • Background is one of the most important elements of a novel, along with characters and events, and refers to the time, place, and situation in which characters appear. Within the background, the spatial background helps convey the topic of a novel, so it can help readers choose a novel they want to read. In this paper, we target Korean historical novels. In English text, spatial background is easy to recognize because English distinguishes upper and lower case and uses words that accompany spatial information, such as Bank, University, and City. In Korean text, however, spatial background is difficult to recognize because letter usage carries little such information. Previous studies used machine learning, or dictionaries and rules, to recognize spatial information in text such as news and text messages. In this paper, we build a nation dictionary with reference to sources such as 'Korean history' and 'Google maps.' In contrast to previous work, we also propose a method for recognizing spatial background based on postposition patterns in Korean sentences, identifying how postpositions are used with spatial background given the characteristics of Korean. To raise recognition accuracy, we further propose a method based on the results of morphological analysis and on frequency within the novel text. The recognized spatial background can help readers grasp the atmosphere of a novel and understand the events in the scenes where characters appear.

Analysis of Factors Affecting Surge in Container Shipping Rates in the Era of Covid19 Using Text Analysis (코로나19 판데믹 이후 컨테이너선 운임 상승 요인분석: 텍스트 분석을 중심으로)

  • Rha, Jin Sung
    • Journal of Korea Society of Industrial Information Systems / v.27 no.1 / pp.111-123 / 2022
  • In the Covid19 era, container shipping rates have been surging. Many studies have attempted to investigate the factors behind this surge, but little of the literature uses text mining techniques to analyze its underlying causes. This study aims to identify the factors behind the unprecedented surge in shipping rates using network text analysis and LDA topic modeling. For the analysis, we collected data and keywords from articles published in Lloyd's List over the past two years (2020-2021). The results of the text analysis showed that the current surge is mainly due to "US-China trade war", "rising blanking sailings", "port congestion", "container shortage", and "unexpected events such as the Suez canal blockage".
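
Network text analysis of this kind usually starts from a keyword co-occurrence network. The sketch below assumes that two keywords are linked whenever they appear in the same article and that importance is measured by weighted degree; the abstract does not specify the authors' exact construction, and the keyword sets and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs):
    """Count how many documents each unordered keyword pair shares."""
    edges = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum the edge weights attached to each keyword."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Hypothetical keyword sets, one per news article.
docs = [
    ["port congestion", "container shortage", "freight rate"],
    ["port congestion", "freight rate"],
    ["container shortage", "freight rate", "blank sailing"],
]
edges = cooccurrence_edges(docs)
degree = weighted_degree(edges)
for keyword, value in degree.most_common():
    print(value, keyword)
```

Keywords with a high weighted degree sit at the center of the network, which is one way such an analysis can surface candidate factors for closer reading.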

Analysis of Social Media Utilization based on Big Data-Focusing on the Chinese Government Weibo

  • Li, Xiang;Guo, Xiaoqin;Kim, Soo Kyun;Lee, Hyukku
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.8 / pp.2571-2586 / 2022
  • The rapid growth in popularity of government social media has generated huge amounts of text data, and the analysis of these data has gradually become a focus of digital government research. This study uses Python to analyze big data from the Weibo accounts of Chinese provincial governments. First, a web crawler is used to collect over 360,000 records from 31 provincial government microblogs in China, covering the period from January 2018 to April 2022, and the data are described statistically. Second, a word segmentation engine is constructed, and the text data are analyzed using word clouds, word frequencies, and semantic relationships. Finally, the sentiment of the text data is analyzed using natural language processing methods, and the text topics are studied using the LDA algorithm. The results show that, first, the number and scale of posts on Chinese government Weibo have grown rapidly. Second, government Weibo has certain social attributes, and epidemics, people's livelihood, and services have become its focus. Third, negative sentiment accounts for more than 30% of government Weibo content. The classified topics show that epidemics and epidemic prevention and control overshadow the other topics, which inhibits the diversification of government Weibo.