• Title/Summary/Keyword: 텍스트 연구 (text research)

A Study on the Effectiveness of Emotional Communication According to Types of Emoticon - Focusing on the Differences in Gender and Major of the Receiver - (이모티콘 유형에 따른 감정소통의 효과성 연구 - 수신자의 성별 및 전공계열별 차이를 중심으로 -)

  • Kang, Jung Ae;Kim, Hyun Ji;Lee, Sang Soo
    • Design Convergence Study / v.15 no.4 / pp.45-58 / 2016
  • This study investigates which emoticon type is most effective in online communication by analyzing how receivers decode (interpret, empathize with, and react to) emotional messages containing various emoticon types. Five message types were compared: text-only messages and messages containing a texticon, graphicon, anicon, or photocon, reflecting the successive stages in the evolution of emoticons. A survey questionnaire covering various emotional situations was developed and administered to undergraduate students to analyze differences by gender and major. The results are as follows. First, graphicon, anicon, and photocon messages were more effective than the others in pleasant situations, while text-only messages were the most effective in unpleasant situations. Second, female students rated graphicon, anicon, and photocon messages as more effective, while male students rated text-only messages as more effective. Third, Arts/Physical Education and Science/Engineering majors differed significantly for some message types; in particular, Science/Engineering majors showed higher averages than other majors for all emoticon types. These results can inform message design according to the sender's emotional situation and the receiver's gender and major.

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.212-215 / 2022
  • In today's era of digital transformation, the need to build and utilize big data has grown in many fields. A great deal of data is now produced and stored in forms suited to digital devices and media, but for most of the past, data was produced and stored mainly as printed books. Optical Character Recognition (OCR) technology is therefore needed to turn the vast body of printed books accumulated over that time into usable big data. This study proposes a system that digitizes the structure and content of document objects in scanned book images. The proposed system consists of three main steps: 1) recognizing the region of each document object (table, equation, picture, text body) in a scanned book image; 2) running OCR on each region with the text-body, table, and formula modules according to the recognized object type; and 3) gathering the processed document information and returning it in JSON format. The proposed model builds on an open-source project, with additional training and improvements. The resulting intelligent OCR system performed at the level of commercial OCR software on the four types of document objects (table, equation, image, text body).
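
The three steps in this abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' system: the `Region` record, the function names, the pre-annotated `page` dictionary, and the choice to leave pictures untranscribed are all hypothetical, and the detection and recognition steps are placeholders where a real system would call trained models.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Region:
    """Hypothetical record for one document object (the abstract gives no schema)."""
    kind: str          # "table", "equation", "picture", or "text"
    bbox: list         # [x0, y0, x1, y1] in page pixels
    content: str = ""


def detect_regions(page):
    # Step 1 (placeholder): a real system would run a layout-detection model.
    # Here the page dict already carries pre-annotated regions.
    return [Region(kind=kind, bbox=bbox) for kind, bbox in page["annotated_regions"]]


def recognize(region):
    # Step 2 (placeholder): route each region to a type-specific recognizer.
    # Assumption: pictures are not transcribed, only located.
    if region.kind == "picture":
        return ""
    return f"<{region.kind} OCR output for bbox {region.bbox}>"


def page_to_json(page):
    regions = detect_regions(page)
    for region in regions:
        region.content = recognize(region)
    # Step 3: gather the processed objects into a single JSON document.
    payload = {"page": page["number"], "objects": [asdict(r) for r in regions]}
    return json.dumps(payload, ensure_ascii=False, indent=2)


page = {
    "number": 1,
    "annotated_regions": [
        ("text", [40, 60, 560, 300]),
        ("table", [40, 320, 560, 520]),
        ("picture", [40, 540, 300, 760]),
    ],
}
result = page_to_json(page)
print(result)
```

Keeping region detection separate from recognition means each object type can get its own recognizer without changing the JSON assembly step.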

Customer Voices in Telehealth: Constructing Positioning Maps from App Reviews (고객 리뷰를 통한 모바일 앱 서비스 포지셔닝 분석: 비대면 진료 앱을 중심으로)

  • Minjae Kim;Hong Joo Lee
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.69-90 / 2023
  • The purpose of this study is to evaluate the service attributes of, and consumer reactions to, telemedicine apps in South Korea and to visualize how the apps differ by constructing positioning maps. We crawled 23,219 user reviews of six major Korean telemedicine apps from the Google Play store. Topics were derived with BERTopic modeling, and a sentiment score for each topic was calculated with KoBERT sentiment analysis. As a result, five service characteristics were derived in the application attribute category and three in the medical service category. Based on these, a two-dimensional positioning map was constructed through principal component analysis. The study's main implication is that it proposes an objective, text-mining-based method for evaluating services. In sum, it combines empirical statistical methods and text mining techniques applied to user reviews of telemedicine apps, and presents a system for service attribute elicitation, sentiment analysis, and product positioning. This can serve as an effective way to objectively diagnose the service quality of telemedicine applications and consumer responses to them.
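
The last step of this pipeline, projecting per-app sentiment scores onto two principal components, can be sketched in plain Python. This is an illustrative sketch only: the app names, attribute columns, and score values are invented toy data, and the power-iteration PCA is a stand-in chosen to avoid external libraries, not the procedure reported in the paper.

```python
def covariance(rows):
    """Return the mean-centered rows and their sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return centered, cov


def top_component(cov, steps=500):
    """Power iteration: approximate the leading eigenvector and eigenvalue."""
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # fixed start keeps the run deterministic
    for _ in range(steps):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, eigenvalue


def positioning_map(rows):
    """Project each row onto the first two principal components."""
    centered, cov = covariance(rows)
    d = len(cov)
    axes = []
    for _ in range(2):
        v, lam = top_component(cov)
        axes.append(v)
        # Deflation: remove the found component so the next pass finds the second one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[j] * axis[j] for j in range(d)) for axis in axes] for x in centered]


# Toy sentiment scores per app for three hypothetical attributes
# (columns: usability, waiting time, consultation quality). Values are made up.
scores = {
    "app_A": [0.8, 0.2, 0.5],
    "app_B": [0.6, 0.4, 0.1],
    "app_C": [0.1, 0.9, 0.3],
    "app_D": [0.3, 0.7, 0.8],
}
coords = dict(zip(scores, positioning_map(list(scores.values()))))
for app, (x, y) in coords.items():
    print(f"{app}: PC1={x:+.3f}, PC2={y:+.3f}")
```

Each app's two coordinates can then be plotted as a point; apps that land close together have similar sentiment profiles across the attributes.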

Privacy-Preserving Language Model Fine-Tuning Using Offsite Tuning (프라이버시 보호를 위한 오프사이트 튜닝 기반 언어모델 미세 조정 방법론)

  • Jinmyung Jeong;Namgyu Kim
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.165-184 / 2023
  • Recently, deep learning analysis of unstructured text data using language models, such as Google's BERT and OpenAI's GPT, has shown remarkable results in various applications. Most language models learn generalized linguistic information from pre-training data and then update their weights for downstream tasks through fine-tuning. However, concerns have been raised that privacy may be violated in this process: data privacy may be violated when the data owner provides large amounts of data to the model owner for fine-tuning. Conversely, when the model owner discloses the entire model to the data owner, the structure and weights of the model are exposed, which may violate the privacy of the model. The concept of offsite tuning was recently proposed to fine-tune language models while protecting privacy in such situations. However, that work does not provide a concrete way to apply the methodology to text classification models. In this study, we propose a concrete method for applying offsite tuning with an additional classifier, protecting the privacy of both the model and the data when fine-tuning for multi-class classification of Korean documents. To evaluate the proposed methodology, we conducted experiments on about 200,000 Korean documents from five major fields (ICT, electrical, electronic, mechanical, and medical) provided by AIHub, and found that the proposed plug-in model outperforms the zero-shot model and the offsite model in classification accuracy.

Domain-Specific Terminology Mapping Methodology Using Supervised Autoencoders (지도학습 오토인코더를 이용한 전문어의 범용어 공간 매핑 방법론)

  • Byung Ho Yoon;Junwoo Kim;Namgyu Kim
    • Information Systems Review / v.25 no.1 / pp.93-110 / 2023
  • Recently, attempts have been made to convert unstructured text into vectors and to analyze vast amounts of natural language for various purposes. In particular, the demand for analyzing texts in specialized domains is increasing rapidly, and studies are therefore being conducted to analyze specialized and general-purpose documents together. To analyze domain-specific terms alongside general terms, the embedding space of the specific terms must be aligned with the embedding space of the general terms. So far, this alignment has been attempted through a transformation matrix or mapping function. However, linear transformation based on a transformation matrix works well only in a local range. To overcome this limitation, various nonlinear vector alignment methods have recently been proposed. We propose a vector alignment model that maps the embedding space of specific terms to that of general terms through end-to-end learning that trains an autoencoder and a regression model simultaneously. In experiments with R&D documents in the "Healthcare" field, the proposed methodology achieved higher accuracy than the traditional model.
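
One way to read "trains an autoencoder and a regression model simultaneously" is as a single joint objective. The form below is an illustrative assumption, not the loss reported in the paper: $x_i$ is the embedding of a domain-specific term, $y_i$ is its target vector in the general-term space, $E$ and $D$ are the encoder and decoder, $R$ is the regression head, and $\lambda$ is a weighting hyperparameter. All of these symbols are hypothetical names introduced here.

```latex
\mathcal{L}(E, D, R) \;=\; \frac{1}{N} \sum_{i=1}^{N}
  \Big( \lVert x_i - D(E(x_i)) \rVert_2^2
  \;+\; \lambda \, \lVert y_i - R(E(x_i)) \rVert_2^2 \Big)
```

Under this reading, the first term keeps the latent code faithful to the original embedding, while the second pulls it toward the general-term space, so both parts are optimized in one pass rather than in separate stages.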

A Study on the Countermeasures of the Korean Pharmaceutical/Bio Industry to the EU Corporate Sustainability Due Diligence Directive, by using Text Mining (텍스트 마이닝을 활용한 국내 제약·바이오 업종의 EU 공급망 실사법 대응 방안 연구)

  • Sori Kim;Joonhak Ki
    • Information Systems Review / v.26 no.1 / pp.93-117 / 2024
  • In February 2022, the EU announced a draft of the EU Corporate Sustainability Due Diligence Directive, which requires due diligence and disclosure of information on environmental and human rights risks in corporate supply chains. This study evaluated the ability of 13 Korean pharmaceutical/bio companies to respond to the EU's supply chain due diligence requirements and compared them with 13 globally leading pharmaceutical/bio companies regarded as strong in environmental and human rights risk management. For the comparative analysis, text mining was performed using R: basic word frequencies and co-occurring words were analyzed, and topic modeling was performed with Latent Dirichlet Allocation. The analysis found that, compared with the leading companies, domestic pharmaceutical and bio companies lack systems for reporting and identifying negative issues and processes for implementing supply chain due diligence, and need more advanced data management for environmental and human rights disclosure. Accordingly, domestic pharmaceutical and bio companies need to prepare differentiated support measures that go beyond simple financial support to systematically identify and reduce risks among the small and medium-sized businesses in their supply chains. It is also desirable for the government to provide policy support by mandating Korea's own supply chain environmental and human rights due diligence system, along with support such as expert consulting and financial assistance to strengthen domestic companies' ability to respond to due diligence.

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.75-94 / 2021
  • The low birth rate and the shortened military service period are raising concerns about the selection of excellent military officers. The Republic of Korea became a low-birth-rate society in 1984 and an aged society in 2018, and is expected to become a super-aged society in 2025. In addition, the military is shifting from a troop-centered force to one centered on state-of-the-art weapons, and the service period was shortened in 2018 to ease the burden of military service on young people and let them enter society earlier. Some observe that the application rate for military officers is falling because of shrinking manpower resources and a preference for the shortened mandatory service over an officer career. This calls for further consideration of policies for securing excellent military officers. Most related studies have used social science methodologies, but this study applies text mining, which is suited to large-scale document analysis. It extracts discriminative words from the cover letters of Republic of Korea Air Force non-commissioned officer applicants and analyzes their pass/fail polarity, in three steps. First, applications are divided into general and technical fields, and the words in the cover letters are ranked by the difference in their frequency ratio between the fields. The greater the difference in a word's proportion between the fields, the more discriminative the word is defined to be. On this basis, we extract the top 50 discriminative words for the general field and the top 50 for the technical field. Second, the appropriate number of topics for the full set of cover letters is determined with LDA, using perplexity and coherence scores. With that number of topics, we use LDA to generate topics and probabilities and estimate which topic each discriminative word belongs to. Subsequently, the keyword indicators of the cover-letter questions are used to set candidate labels, and the most appropriate candidate, given the topic's word distribution, is set as the topic's label. Third, using L-LDA with the cover letters labeled as pass or fail, we generate topics and probabilities for the pass and fail labels in each field. We then extract only the discriminative words that belong to labeled topics from the topics and probabilities generated for the pass and fail labels. Next, for each such word, we compute the difference between its probability under the pass label and its probability under the fail label. A positive value indicates pass polarity, and a negative value indicates fail polarity. This study is the first to examine the characteristics of cover letters from Republic of Korea Air Force non-commissioned officer applicants rather than private-sector applicants. Moreover, because the methodology applies text mining to many documents rather than relying on survey or interview methods, it can reduce analysis time and increase reliability by covering the entire population. For this reason, the proposed methodology is also applicable to other kinds of document collections in the field of military personnel. This study shows that L-LDA is more suitable than LDA for extracting the discriminative characteristics of Republic of Korea Air Force non-commissioned officer cover letters, and it proposes a methodology that combines LDA and L-LDA. By analyzing the selection results for Republic of Korea Air Force non-commissioned officers, we aim to provide information useful for recruitment and promotional policies and to propose a methodology for research on military manpower acquisition.
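
Two of the quantities described in this abstract, the frequency-ratio difference between fields and the pass/fail probability difference, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the whitespace tokenizer, the toy cover-letter snippets, and the per-word probabilities are all hypothetical, and the probabilities stand in for values that L-LDA would produce in the study.

```python
from collections import Counter


def frequency_ratio(docs):
    """Share of all tokens in a field that each word accounts for."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def discriminative_words(docs_a, docs_b, top_k=50):
    """Rank words by how much larger their frequency ratio is in field A than in B."""
    ratio_a, ratio_b = frequency_ratio(docs_a), frequency_ratio(docs_b)
    vocabulary = set(ratio_a) | set(ratio_b)
    diff = {w: ratio_a.get(w, 0.0) - ratio_b.get(w, 0.0) for w in vocabulary}
    return sorted(diff.items(), key=lambda item: item[1], reverse=True)[:top_k]


def polarity(p_pass, p_fail):
    """Positive value: leans toward pass. Negative value: leans toward fail."""
    words = set(p_pass) | set(p_fail)
    return {w: p_pass.get(w, 0.0) - p_fail.get(w, 0.0) for w in words}


# Invented toy snippets standing in for cover letters in each field.
general = ["leadership service teamwork", "service discipline teamwork"]
technical = ["maintenance engine service", "engine radar maintenance"]

top_general = discriminative_words(general, technical, top_k=1)
top_technical = discriminative_words(technical, general, top_k=2)
print(top_general, top_technical)

# Invented per-word probabilities standing in for L-LDA output.
word_polarity = polarity({"teamwork": 0.04, "engine": 0.01},
                         {"teamwork": 0.01, "engine": 0.03})
print(word_polarity)
```

Using ratios rather than raw counts keeps the comparison fair when one field has far more cover letters than the other.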

A Study on the Documentation Related to Mugeuk-do: Focusing on Its Comparison and Historical Evidence (무극도 관련 문헌 연구 - 비교 및 고증을 중심으로 -)

  • Park Sang-kyu
    • Journal of the Daesoon Academy of Sciences / v.41 / pp.27-61 / 2022
  • Documentation related to Mugeuk-do (Limitless Dao) is rare compared with that of other Korean new religions, having been opened to the public and translated since the 1970s. Because of its rarity, the documentation has been used uncritically, without comparative study or historical verification. These documents undeniably contain distortions and fallacies, which has caused considerable problems in precisely understanding Mugeuk-do and Daesoon Jinrihoe (The Fellowship of Daesoon Truth), an order that has inherited the legacy of Mugeuk-do. This study therefore aims to critically define, through comparison, the characteristics and limitations of the major documents related to Mugeuk-do that were published by the colonial government in the 1920s-1930s and recorded by multiple orders in the 1970s-1980s. Doing so offers a solution to the problem of uncritical use of these materials. The documents produced by the colonial government that can serve as basic texts for studying Mugeuk-do are The General Conditions of the Religion Mugeuk-do (無極大道敎槪況) and Unofficial Religions of Korea (朝鮮の類似宗敎). These were identified through bibliographic, comparative, and historical research. Chapters 6, 7, and 8 of The General Conditions of the Religion Mugeuk-do are a possible source on the order that reflects the circumstances of Mugeuk-do until 1925. In Unofficial Religions of Korea, the articles about circumstances after 1925 are credible once the work's descriptive perspective on unofficial religions is set aside. Among the documents recorded by the orders, chapter 2 of 'Progress of the Order' in Daesoon Jinrihoe's The Canonical Scripture can also be used as a basic text. Its record precisely reflects the conditions of the era, in that it is the freest from distortions caused by changes in the belief system and is less biased toward particular sects or denominations. Furthermore, its articles were collected in the earliest period. Accordingly, Chapters 6, 7, and 8 of The General Conditions of the Religion Mugeuk-do, the post-1925 articles from Unofficial Religions of Korea, and chapter 2 of 'Progress of the Order' in The Canonical Scripture are appropriate basic texts for studying Mugeuk-do. In addition, Overview of Bocheonism, History of Jeungsan-gyo, and The True Scripture of the Great Ultimate can be used as references after distortions and fallacies are removed through comparative study. Going forward, the relevant documents should be used to establish comprehensive data on Mugeuk-do through comparative and historical research.

The Construction of a Domain-Specific Sentiment Dictionary Using Graph-based Semi-supervised Learning Method (그래프 기반 준지도 학습 방법을 이용한 특정분야 감성사전 구축)

  • Kim, Jung-Ho;Oh, Yean-Ju;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.18 no.1 / pp.103-110 / 2015
  • A sentiment lexicon is an essential element for expressing sentiment in a text or recognizing sentiment from a text. We propose a graph-based semi-supervised learning method for constructing a sentiment dictionary as a set of sentiment lexicons, focusing in particular on domain-specific sentiment dictionaries. The proposed method builds a graph from the lexicons and the proximity among them, and the sentiment values of lexicons whose sentiments are already known are propagated to all lexicons on the graph. There are two typical types of sentiment lexicon, sentiment words and sentiment phrases; we construct the dictionary by building a graph for each type and inferring the sentiment of every lexicon. To verify the proposed method, we constructed a sentiment dictionary specific to the movie domain and conducted sentiment classification experiments with it. The results show that classification performance with this dictionary is better than with a typical general-purpose sentiment dictionary.
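
The propagation idea in this abstract can be sketched with a generic label-propagation loop. This is a minimal sketch under stated assumptions, not the authors' algorithm: the weighted-average update with clamped seed words, the edge weights, and the example movie-review words are all hypothetical, and the abstract does not specify the actual propagation rule.

```python
def propagate(edges, seeds, iterations=50):
    """Spread seed sentiment scores over a weighted, undirected word graph.

    edges: {(word_u, word_v): proximity weight}
    seeds: {word: known sentiment score in [-1, 1]}
    """
    neighbors = {}
    for (u, v), weight in edges.items():
        neighbors.setdefault(u, {})[v] = weight
        neighbors.setdefault(v, {})[u] = weight

    scores = {word: seeds.get(word, 0.0) for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word, nbrs in neighbors.items():
            if word in seeds:
                # Seed words stay fixed at their known sentiment.
                updated[word] = seeds[word]
            else:
                # Unlabeled words take the weighted average of their neighbors.
                total = sum(nbrs.values())
                updated[word] = sum(w * scores[n] for n, w in nbrs.items()) / total
        scores = updated
    return scores


# Invented proximity weights between movie-review words.
edges = {
    ("great", "touching"): 0.9,
    ("touching", "moving"): 0.8,
    ("boring", "tedious"): 0.9,
    ("moving", "tedious"): 0.1,
}
seeds = {"great": 1.0, "boring": -1.0}
sentiment = propagate(edges, seeds)
for word, score in sorted(sentiment.items()):
    print(f"{word}: {score:+.2f}")
```

Words strongly connected to a positive seed end up with positive scores and words tied to a negative seed end up negative, which is how a small labeled set can yield a dictionary for a whole domain.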

Assessment Process Design for Python Programming Learning (파이선(Python) 학습을 위한 평가 프로세스 설계)

  • Ko, Eunji;Lee, Jeongmin
    • Journal of The Korean Association of Information Education / v.24 no.1 / pp.117-129 / 2020
  • The purpose of this paper is to explore ways to assess computational thinking from a formative perspective and to design a process for assessing programming learning in Python. To this end, the study explored the computational thinking domain and analyzed research on assessment design. It also identified the areas of Python programming that beginners learn and the areas of computational thinking ability that can be acquired through learning Python. On this basis, we designed an assessment method that provides feedback by analyzing the syntax that corresponds to each computational thinking ability. In addition, the design supports self-assessment through reflective thinking, using flowcharts and pseudocode to express ideas, and peer feedback through code sharing and communication in a community.
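
Analyzing syntax that corresponds to computational thinking ability can be sketched with Python's standard `ast` module. This is an illustrative sketch only: the mapping from syntax nodes to computational-thinking categories (`CT_MAP`), the category names, and the sample learner program are assumptions made here, not the rubric designed in the paper.

```python
import ast
from collections import Counter

# Hypothetical mapping from syntax constructs to computational-thinking areas.
CT_MAP = {
    ast.For: "iteration",
    ast.While: "iteration",
    ast.If: "conditional logic",
    ast.FunctionDef: "abstraction",
}


def ct_profile(source):
    """Count how often each mapped construct appears in a learner's program."""
    tree = ast.parse(source)
    profile = Counter()
    for node in ast.walk(tree):
        category = CT_MAP.get(type(node))
        if category is not None:
            profile[category] += 1
    return profile


# A sample learner submission (invented for illustration).
sample = '''
def total_even(nums):
    total = 0
    for n in nums:
        if n % 2 == 0:
            total += n
    return total
'''

profile = ct_profile(sample)
for category, count in sorted(profile.items()):
    print(f"{category}: {count}")
```

A profile like this could feed formative feedback, for example by pointing out that a submission uses no functions yet, although counts alone say nothing about whether a construct is used well.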