• Title/Summary/Keyword: Text data

Comparison of term weighting schemes for document classification (문서 분류를 위한 용어 가중치 기법 비교)

  • Jeong, Ho Young;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.265-276 / 2019
  • The document-term frequency matrix is a standard data structure in text mining. In this study, we introduce TF-IDF (term frequency-inverse document frequency), a traditional term weighting scheme that is applied to the document-term frequency matrix and used for text classification. We also introduce and compare the more recent TF-IDF-ICSDF and TF-IGM schemes. The study further provides a keyword extraction method that improves the quality of text classification. Using the extracted keywords, we applied a support vector machine to classify the texts. To compare the term weighting schemes, we used performance metrics such as precision, recall, and F1-score. The results show that the TF-IGM scheme yielded high performance metrics and was optimal for text classification.
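
The TF-IDF weighting described in this abstract can be sketched as follows. This is a minimal illustration, assuming raw counts for term frequency and `log(N / df)` for inverse document frequency, which is only one common variant; the paper's exact formula, the TF-IDF-ICSDF and TF-IGM schemes, and the SVM step are not reproduced. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return (vocabulary, weight matrix) for a list of tokenized documents.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)  # one row of the document-term frequency matrix
        matrix.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, matrix

# Hypothetical toy corpus, already tokenized.
docs = [
    ["text", "mining", "matrix"],
    ["text", "classification", "svm"],
    ["text", "mining", "svm"],
]
vocab, weights = tf_idf(docs)
for term, w in zip(vocab, weights[0]):
    print(f"{term}: {w:.3f}")
```

A term such as "text" that occurs in every document receives weight zero under this variant, which is why class-aware schemes like those compared in the paper can be attractive for classification.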

Development of Elementary School AI Education Contents using Entry Text Model Learning (엔트리 텍스트 모델 학습을 활용한 초등 인공지능 교육 내용 개발)

  • Kim, Byungjo;Kim, Hyenbae
    • Journal of The Korean Association of Information Education / v.26 no.1 / pp.65-73 / 2022
  • In this study, educational content for the artificial intelligence education of elementary school students was developed using Entry text model learning and applied to actual classes. Based on the elementary and secondary artificial intelligence content table, the achievement standards of practical software education and artificial intelligence education were reconstructed. Among text, images, and sounds, the data types available for machine learning, "production of emotion recognition programs using text model learning" was selected as the educational content because it is easy for elementary school students to understand and reduces data preparation time. Entry artificial intelligence was selected as the education platform, and content in which students create emotion recognition programs using text model learning was developed and applied to actual elementary school classes. As a result of the class application, students showed positive responses to and interest in the Entry AI class. Based on the contents of this study, it is suggested that quantitative research on the effectiveness of such classes for elementary school students is necessary as a follow-up study.

The Design and Implementation of VDL M2 Data Link Software (VDL M2 데이터 링크 소프트웨어 설계 및 구현)

  • Kim, Hyoun-Kyoung;Yang, Kwang-Jik;Kim, Tae-Sik;Bae, Joong-Won
    • Aerospace Engineering and Technology / v.7 no.2 / pp.11-20 / 2008
  • Current air-to-ground communication between aircraft pilots and ground controllers relies on voice communication and text-based data communication. The International Civil Aviation Organization (ICAO) has proposed digital data communication techniques to improve the accuracy and effectiveness of current air-to-ground communication. One of them, VDL M2, a VHF-band digital data communication link, is expected to replace voice communication and text-based ACARS data communication. This paper describes the software design and implementation of the VDL M2 system developed by the Korea Aerospace Research Institute.

A Research on Automatic Data Extract Method for Herbal Formula Combinations Using Herb and Dosage Terminology - Based on 『Euijongsonik』 - (본초 및 용량 용어를 이용한 방제구성 자동추출방법에 대한 연구 -『의종손익』을 중심으로-)

  • Keum, Yujeong;Lee, Byungwook;Eom, Dongmyung;Song, Jichung
    • Journal of Korean Medical classics / v.33 no.4 / pp.67-81 / 2020
  • Objectives: This research aims to propose an automatic data extraction method for herbal formula combinations from the texts of medical classics. Methods: This research was carried out using Microsoft Access (Office 365) on Microsoft Windows 10. The subject text for extraction was 『Euijongsonik』. Using data sets of herb and dosage terminology, herbal medicinals and their dosages were extracted. Afterwards, using the position values of the character strings, the formula combinations were automatically extracted. Results: The PC environment of this research was an Intel Core i7-1065G7 CPU at 1.30GHz with 8GB of RAM and a 64-bit Windows 10 operating system. Out of 6,115 verses, 19,277 herb-dosage combinations were extracted. Conclusions: This research demonstrated that, for classical texts that are available as data, knowledge on herbal medicine can be extracted without human or material resources. This suggests that classical text knowledge is applicable to clinical practice.
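
The position-based pairing described in the Methods can be sketched roughly as below. This is not the authors' implementation: the study used Microsoft Access, whereas this sketch uses Python, and the term lists, the romanized sample verse, and the function name `extract_pairs` are all hypothetical. The pairing rule (each dosage term attaches to the nearest preceding herb term) is also an assumption.

```python
import re

# Hypothetical terminology data sets (romanized placeholders, not from the paper).
HERBS = ["insam", "gamcho", "baekchul"]
DOSAGES = ["1don", "2don", "5pun"]

def extract_pairs(verse, herbs=HERBS, dosages=DOSAGES):
    """Pair each dosage term with the nearest preceding herb term by position."""
    hits = []
    for kind, terms in (("herb", herbs), ("dosage", dosages)):
        for term in terms:
            for match in re.finditer(re.escape(term), verse):
                # Record the character position of every terminology match.
                hits.append((match.start(), kind, term))
    hits.sort()  # order all matches by their position in the verse
    pairs = []
    current_herb = None
    for _, kind, term in hits:
        if kind == "herb":
            current_herb = term
        elif current_herb is not None:
            pairs.append((current_herb, term))
            current_herb = None  # a herb is paired with at most one dosage
    return pairs

verse = "insam 2don gamcho 5pun baekchul 1don"
pairs = extract_pairs(verse)
print(pairs)
```

A real text would need handling for overlapping terms and for herbs listed without a dosage, which this sketch leaves out.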

Data Mining and FNN-Driven Knowledge Acquisition and Inference Mechanism for Developing A Self-Evolving Expert Systems

  • Kim, Jin-Sung
    • Proceedings of the KAIS Fall Conference / 2003.11a / pp.99-104 / 2003
  • In this research, we propose a mechanism for developing self-evolving expert systems (SEES) based on data mining (DM), fuzzy neural networks (FNN), and a relational database (RDB)-driven forward/backward inference engine. Most earlier researchers tried to develop a text-oriented knowledge base (KB) and inference engine (IE). However, these approaches have limitations in 1) automatic rule extraction, 2) handling of ambiguity in knowledge, 3) expandability of the knowledge base, and 4) speed of inference. To overcome these limitations, many researchers have tried to develop automatic knowledge extraction and refining mechanisms. As a result, the adaptability of expert systems has improved. Nonetheless, they did not suggest a hybrid and generalized solution for developing self-evolving expert systems. To this end, we propose an automatic knowledge acquisition and composite inference mechanism based on DM, FNN, and RDB-driven inference. Empirically, the proposed mechanism has five advantages. First, it can extract and reduce specific domain knowledge from an incomplete database by using a data mining algorithm. Second, it can handle ambiguity in knowledge by using fuzzy membership functions. Third, it can construct a relational knowledge base and expand it without limit using an RDBMS (relational database management system). Fourth, the proposed hybrid data mining mechanism can reflect both association rule-based logical inference and complicated fuzzy logic. Fifth, RDB-driven forward and backward inference is faster than traditional text-oriented inference.

Research Trend Analysis for Sustainable QR code use - Focus on Big Data Analysis

  • Lee, Eunji;Jang, Jikyung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.9 / pp.3221-3242 / 2021
  • The purpose of this study is to examine current research trends on 'QR code' and to suggest directions for future research through big data analysis. (1) Background: research trends on 'QR code' and analysis of the texts by subject field and year. (2) Methodology: data scraping and collection, summarization in Excel, and preprocessing and big data analysis with R x64 4.0.2. (3) Findings: first, 'QR code' studies showed a continuous overall increase, and their findings were applied in various fields; second, the analysis of frequent keywords showed somewhat different results by subject field and year, but the overall results were similar; third, the visualization of the frequent keywords showed results similar to those of the frequent keyword analysis. (4) Conclusions: in general, 'QR code' is studied in various fields, and the trend is likely to continue increasing. The findings also reflect that 'QR code' is part of our social and cultural phenomena, so it needs to be regarded as a tool and an application of information. Expanding the scope of the analysis is expected to yield more meaningful indications of 'QR code' research trends and development potential.

Analysis of the supportive care needs of the parents of preterm children in South Korea using big data text-mining: Topic modeling

  • Park, Ji Hyeon;Lee, Hanna;Cho, Haeryun
    • Child Health Nursing Research / v.27 no.1 / pp.34-42 / 2021
  • Purpose: The purpose of this study was to identify the supportive care needs of parents of preterm children in South Korea using text data from a portal site. Methods: In total, 628 online newspaper articles and 1,966 social network service posts published between January 1 and December 31, 2019 were analyzed. The procedures in this study were conducted in the following order: keyword selection, data collection, morpheme analysis, keyword analysis, and topic modeling. Results: The term "yirundung-yi", which is a native Korean word referring to premature infants, was confirmed to be a useful term for parents. The following four topics were identified as the supportive care needs of parents of preterm children: 1) a vague fear of caring for a baby upon imminent neonatal intensive care unit discharge, 2) real-world difficulties encountered while caring for preterm children, 3) concerns about growth and development problems, and 4) anxiety about possible complications. Conclusion: Supportive care interventions for parents of preterm children should include general parenting methods for babies. A team composed of multidisciplinary experts must support the individual growth and development of preterm children and manage the complications of prematurity using highly accessible media.

A Study of Consumer Perception on Freediving Suits Utilizing Big Data Analysis (빅데이터 분석을 활용한 프리다이빙 슈트에 대한 소비자 인식 연구)

  • Ji-Eun Kim;Eunyoung Lee
    • Journal of the Korea Fashion and Costume Design Association / v.26 no.2 / pp.87-99 / 2024
  • Freediving, an underwater leisure sport that involves diving without the use of a breathing apparatus, has gained popularity among younger demographics through the viral spread of images and videos on social media platforms. This study employs prominent Big Data analysis techniques, including text mining, Latent Dirichlet Allocation (LDA) topic analysis, and opinion mining to explore the keywords associated with freediving suits over the past five years. The research aims to analyze the rapidly evolving market trends of freediving suits and the increasingly complex and diverse consumer perceptions to provide foundational data for activating the freediving suit market and developing strategies for sustained growth. The study identified the keyword 'size' related to freediving suits and conducted opinion mining on 'freediving suit sizes'. Although the results showed a higher positive than negative sentiment, negative keywords were also extracted, indicating the need to understand and mitigate the negative factors associated with 'size'. The findings offer vital guidelines for the advancement of the freediving suit market and enhancing consumer satisfaction. This study is important as it contributes foundational data for continuous growth strategies of the freediving suit market.

Improving on Matrix Factorization for Recommendation Systems by Using a Character-Level Convolutional Neural Network (문자 수준 컨볼루션 뉴럴 네트워크를 이용한 추천시스템에서의 행렬 분해법 개선)

  • Son, Donghee;Shim, Kyuseok
    • KIISE Transactions on Computing Practices / v.24 no.2 / pp.93-98 / 2018
  • Recommendation systems are used to provide items of interest to users in order to maximize a company's profit. Matrix factorization, which operates on an incomplete user-item rating matrix, is frequently used by recommendation systems. However, as the number of items and users increases, it becomes difficult to make accurate recommendations because of data sparsity. To overcome this drawback, the use of text data related to items was recently suggested for matrix factorization algorithms. Among these algorithms, a word-level convolutional neural network was shown to be effective at extracting word-level features from the text data. However, the word-level convolutional neural network involves a large number of parameters to learn. We therefore propose a matrix factorization algorithm that uses a character-level convolutional neural network to extract character-level features from the text data. We also conducted a performance study with real-life datasets to show the effectiveness of the proposed matrix factorization algorithm.
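
For orientation, the baseline idea of factorizing an incomplete user-item rating matrix can be sketched as below. This is a plain regularized matrix factorization trained by stochastic gradient descent on the observed ratings only; it does not include the character-level convolutional neural network that the paper proposes. The function names, hyperparameters, and toy ratings are all assumptions made for illustration.

```python
import math
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Learn user factors P and item factors Q from observed ratings only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items())
    return math.sqrt(total / len(ratings))

# Hypothetical sparse ratings: (user, item) -> rating; missing pairs are unobserved.
ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 1, (2, 2): 5}
P0, Q0 = factorize(ratings, 3, 3, epochs=0)  # untrained factors for comparison
P, Q = factorize(ratings, 3, 3)
print("RMSE before training:", rmse(ratings, P0, Q0))
print("RMSE after training:", rmse(ratings, P, Q))
print("Predicted rating for unobserved (0, 2):", predict(P, Q, 0, 2))
```

With so few observed ratings per user, predictions for unobserved pairs are poorly constrained, which is the sparsity problem that item text features are meant to ease.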

Analysis of the abstracts of research articles in food related to climate change using a text-mining algorithm (텍스트 마이닝 기법을 활용한 기후변화관련 식품분야 논문초록 분석)

  • Bae, Kyu Yong;Park, Ju-Hyun;Kim, Jeong Seon;Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1429-1437 / 2013
  • Research articles on food related to climate change were analyzed with a text-mining algorithm, one of the unstructured data analysis tools used in big data analysis, focusing on the frequencies of terms appearing in the abstracts. As a first step, a term-document matrix was constructed; a hierarchical clustering algorithm based on dissimilarities among the selected terms was then combined with expertise in the field to classify the documents under consideration into a few labeled groups. Through this research, we identified important topics in the field of food related to climate change and their trends over past years. The results are expected to support future research on systematic responses and adaptation to climate change.
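
The two steps named in this abstract, building a term-document matrix and clustering terms hierarchically by dissimilarity, can be sketched as follows. The abstract does not state the dissimilarity measure or linkage rule, so cosine dissimilarity and average linkage are assumptions here, as are the function names and the toy documents; the expert labeling step is not modeled.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are term counts."""
    vocab = sorted({t for d in docs for t in d})
    counts = [Counter(d) for d in docs]
    return vocab, [[c[t] for c in counts] for t in vocab]

def cosine_dissimilarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cluster_terms(vocab, rows, n_clusters):
    """Agglomerative clustering of terms with average linkage (assumed)."""
    clusters = [[i] for i in range(len(vocab))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise dissimilarity between the two clusters.
                d = sum(cosine_dissimilarity(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    return [sorted(vocab[i] for i in c) for c in clusters]

# Hypothetical tokenized abstracts.
docs = [
    ["crop", "yield", "drought"],
    ["crop", "yield"],
    ["fish", "ocean", "temperature"],
    ["fish", "ocean"],
]
vocab, rows = term_document_matrix(docs)
groups = cluster_terms(vocab, rows, n_clusters=2)
print(groups)
```

In practice the number of groups would be chosen by inspecting the dendrogram together with domain knowledge rather than fixed in advance.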