• Title/Summary/Keyword: Korean Natural Language Processing

Search Result 513, Processing Time 0.024 seconds

CRPN (Customer-oriented Risk Priority Number): RPN Evaluation Method Based on Customer Opinion through SNS Opinion Mining (CRPN(Customer-oriented Risk Priority Number): SNS 오피니언 마이닝을 활용한 고객 의견 기반의 RPN 평가 기법)

  • Yoo, In-Hyeok;Kang, Won-Kyung;Choi, Kyu-Nam;Park, Ji-Yun;Lee, Geon-Ju;Kang, Sung-Woo
    • Journal of Korean Society for Quality Management / v.47 no.1 / pp.97-108 / 2019
  • Purpose: This study proposes a new Risk Priority Number (RPN) evaluation method that analyzes the value of product functions by mining customer opinions on social network services (SNS). Methods: A traditional RPN is measured by three evaluation criteria (Severity, Occurrence, Detection) that are assessed by manufacturing engineers and researchers. In this research, by contrast, these criteria are assessed from the customers' viewpoint through SNS opinion mining. To extract customer feedback from textual data sets, the methodology employs natural language processing, collecting product-related data sets and analyzing the opinions automatically. The emotional polarity of an opinion indicates severity, the number of negative opinions indicates occurrence, and the total number of customer opinions indicates detection. Results: The CRPN evaluation confirms that features evaluated as risky are highly likely to be improved in the next product series. CRPN is therefore an effective risk assessment model that reflects customer feedback. Conclusion: Reflecting customer feedback is useful for product risk assessment as well as for developing new products and improving existing ones.
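
The mapping in the abstract above (polarity to severity, negative-opinion count to occurrence, total opinion count to detection) can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not say how raw values are scaled, so the linear 1-10 scaling, the direction of the detection score, and every name here (`FeatureOpinions`, `scale_1_to_10`, `crpn`) are assumptions. The only standard ingredient is that a traditional RPN is the product Severity x Occurrence x Detection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureOpinions:
    """Opinion-mining output for one product feature (hypothetical structure)."""
    name: str
    polarities: List[float]  # one score per opinion in [-1, 1]; negative = complaint


def scale_1_to_10(value: float, max_value: float) -> int:
    """Map a non-negative value onto a 1-10 scale (assumed linear scaling)."""
    if max_value <= 0:
        return 1
    return 1 + round(9 * min(value / max_value, 1.0))


def crpn(feature: FeatureOpinions, max_negative: int, max_total: int) -> int:
    """Combine the three customer-based criteria into one score (S x O x D)."""
    negatives = [p for p in feature.polarities if p < 0]
    # Severity: average strength of the negative opinions (assumption).
    mean_negative = -sum(negatives) / len(negatives) if negatives else 0.0
    severity = scale_1_to_10(mean_negative, 1.0)
    # Occurrence: number of negative opinions, relative to an assumed maximum.
    occurrence = scale_1_to_10(len(negatives), max_negative)
    # Detection: total number of opinions, relative to an assumed maximum.
    detection = scale_1_to_10(len(feature.polarities), max_total)
    return severity * occurrence * detection


battery = FeatureOpinions("battery", [-0.8, -0.6, 0.5, 0.9])
print(crpn(battery, max_negative=3, max_total=12))
```

Ranking features by this score would put those with many strongly negative opinions first, which is the kind of prioritization the abstract describes.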

Research Trends of Ergonomics in Occupational Safety and Health through MEDLINE Search: Focus on Abstract Word Modeling using Word Embedding (MEDLINE 검색을 통한 산업안전보건 분야에서의 인간공학 연구동향 : 워드임베딩을 활용한 초록 단어 모델링을 중심으로)

  • Kim, Jun Hee;Hwang, Ui Jae;Ahn, Sun Hee;Gwak, Gyeong Tae;Jung, Sung Hoon
    • Journal of the Korean Society of Safety / v.36 no.5 / pp.61-70 / 2021
  • This study aimed to analyze research trends in the abstracts of ergonomic studies registered in MEDLINE, a medical bibliographic database, using word embedding. Medical-related ergonomic studies mainly focus on work-related musculoskeletal disorders, and no studies have analyzed their words as data using natural language processing techniques such as word embedding. In this study, the abstracts of ergonomic studies were extracted with a Python program written with the Selenium and BeautifulSoup modules. The abstract data were embedded using the word2vec model, which vectorized the words found in the abstracts. The vectorized data were visualized in two dimensions using t-Distributed Stochastic Neighbor Embedding (t-SNE). The word "ergonomics" and ten of the most frequently used words in the abstracts were selected as keywords. The results revealed that the most frequently used words in the abstracts of ergonomics studies include "use", "work", and "task". In addition, the t-SNE technique revealed that words such as "workplace", "design", and "engineering" exhibited the highest relevance to ergonomics. The keywords observed in the abstracts using t-SNE were classified into four groups. Ergonomics studies registered with MEDLINE have investigated the risk factors associated with workers performing an operation or task using tools, and in this study, ergonomics studies were characterized by the relationships between keywords using word embedding. The results of this study will provide useful and diverse insights into future research directions in ergonomics.

Modified multi-sense skip-gram using weighted context and x-means (가중 문맥벡터와 X-means 방법을 이용한 변형 다의어스킵그램)

  • Jeong, Hyunwoo;Lee, Eun Ryung
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.389-399 / 2021
  • In recent years, word embedding has been a popular field of natural language processing research, and skip-gram has become one successful word embedding method. It assigns an embedding vector to each word using its contexts, which provides an effective way to analyze text data. However, due to the limitation of the vector space model, primary word embedding methods assume that every word has only a single meaning. Because multi-sense words, that is, words with more than one meaning, occur in reality, Neelakantan (2014) proposed a multi-sense skip-gram (MSSG) that finds an embedding vector for each sense of a multi-sense word using a clustering method. In this paper, we propose a modified version of the MSSG to improve statistical accuracy. Moreover, we propose a data-adaptive choice of the number of clusters, that is, the number of meanings of a multi-sense word. Numerical evidence is given by real data-based simulations.
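
The clustering idea behind multi-sense embeddings can be illustrated with a small sketch of the sense-assignment step: each occurrence of a word is represented by the (optionally weighted) average of its context word vectors and assigned to the nearest sense cluster center. This is an assumption-laden toy, not the paper's method: the `weights` argument is only a hypothetical hook for a weighted context, the number of centers is fixed here rather than chosen by x-means, and the two-dimensional embeddings are made up.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_vector(context_words: Sequence[str],
                   embeddings: Dict[str, Vector],
                   weights: Optional[Sequence[float]] = None) -> Vector:
    """Weighted average of the context word embeddings (uniform by default)."""
    if weights is None:
        weights = [1.0] * len(context_words)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, w in zip(context_words, weights):
        vec = embeddings.get(word)
        if vec is None:  # skip out-of-vocabulary context words
            continue
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    return [x / weight_sum for x in total] if weight_sum else total


def assign_sense(ctx: Vector, centers: Sequence[Vector]) -> int:
    """Index of the sense cluster whose center is most similar to the context."""
    return max(range(len(centers)), key=lambda k: cosine(ctx, centers[k]))


# Toy embeddings and two sense centers for the word "bank" (made-up values).
toy_embeddings = {"river": [1.0, 0.0], "water": [0.9, 0.1],
                  "money": [0.0, 1.0], "loan": [0.1, 0.9]}
bank_centers = [[1.0, 0.0], [0.0, 1.0]]
print(assign_sense(context_vector(["river", "water"], toy_embeddings), bank_centers))
```

In a full model, the chosen sense vector would then be updated by the skip-gram objective and the cluster center moved toward the new context vector.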

Automatic Classification of Academic Articles Using BERT Model Based on Deep Learning (딥러닝 기반의 BERT 모델을 활용한 학술 문헌 자동분류)

  • Kim, In hu;Kim, Seong hee
    • Journal of the Korean Society for Information Management / v.39 no.3 / pp.293-310 / 2022
  • In this study, we analyzed the performance of a BERT-based document classification model by automatically classifying documents in the field of library and information science using KoBERT. For this purpose, abstract data from 5,357 papers in 7 journals in the field were analyzed, and differences in automatic classification performance according to the size of the training data were evaluated. Precision, recall, and F-measure were used as performance metrics. Subject areas with large amounts of high-quality data showed a high level of performance, with an F-measure of 90% or more. On the other hand, when the data quality was low, the similarity with other subject areas was high, or there were few thematically distinctive features, a meaningfully high level of performance could not be achieved. This study is expected to serve as basic data suggesting the possibility of using a pre-trained model to automatically classify academic documents.
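
For reference, the evaluation metrics named in the abstract above can be computed per subject area as in the sketch below. It assumes that the F measure reported there is the usual F1 score, that is, the harmonic mean of precision and recall; the function name and the toy labels are illustrative only.

```python
from typing import Dict, Sequence, Tuple


def per_class_prf(y_true: Sequence[str],
                  y_pred: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Toy example with two subject areas, "A" and "B".
gold = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
for label, (p, r, f) in per_class_prf(gold, predicted).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting the scores per class, rather than as one average, makes it visible when small or overlapping subject areas perform much worse than large ones.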

Identification of User Preference Factor Using Review Information (리뷰 정보를 활용한 이용자의 선호요인 식별에 관한 연구)

  • Song, Sungjeon;Shim, Jiyoung
    • Journal of the Korean Society for Information Management / v.39 no.3 / pp.311-336 / 2022
  • This study analyzed review data from Goodreads, a social cataloging service used by book readers around the world, to identify the preference factors that affect users' book recommendations in the library information service environment. To understand user preferences in more detail, sub-datasets for each rating group, each book, and each user were constructed in the sample selection process. Stratified sampling was also performed based on the results of topic modeling of the review text so that various topics were included. As a result, a total of 90 preference factors belonging to 7 categories ('Content', 'Character', 'Writing', 'Reading', 'Author', 'Story', 'Form') were identified. The general preference factors associated with each rating, as well as the patterns of preference factors in books and users with clear likes and dislikes, were also identified. By identifying specific aspects of user preference factors, the results of this study are expected to contribute to more sophisticated recommendations in future recommendation systems.

Research on Overseas Trends and Emerging Topics in Field of Library and Information Science (문헌정보학분야 해외 연구 동향 및 유망 주제 분석 연구)

  • Bon Jin Koo;Durk Hyun Chang
    • Journal of the Korean Society for Library and Information Science / v.57 no.3 / pp.71-96 / 2023
  • This study aimed to investigate key research areas in the field of Library and Information Science (LIS) by analyzing trends and identifying emerging topics. For this purpose, 40,897 author keywords were collected from 11,252 papers published over the past 30 years (1993-2022) in five journals. Keyword analysis, Principal Component Analysis (PCA), and correlation analysis were then conducted, using variables such as the number of articles, the number of authors, the ratio of co-authored papers, and citation counts. The findings suggest that two topics are likely to develop as promising research areas in LIS: machine learning/algorithm and research impact. Furthermore, social media and big data, natural language processing, research trends, and research assessment are expected to emerge as prominent topics of future research.

Re-defining Named Entity Type for Personal Information De-identification and A Generation method of Training Data (개인정보 비식별화를 위한 개체명 유형 재정의와 학습데이터 생성 방법)

  • Choi, Jae-hoon;Cho, Sang-hyun;Kim, Min-ho;Kwon, Hyuk-chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.206-208 / 2022
  • As the big data industry has grown significantly in recent years, concern about privacy violations caused by personal information leakage has increased. There have been attempts to automate de-identification through named entity recognition in natural language processing. In this paper, named entity recognition data is constructed semi-automatically by identifying sentences in Korean Wikipedia that contain information subject to de-identification. Compared to using general named entity recognition data, this can reduce the cost of learning about information that is not subject to de-identification. It also has the advantage of minimizing the additional rule-based and statistical systems needed to classify de-identification information in the output. The named entity recognition data proposed in this paper is classified into twelve categories, which include de-identification information such as medical records and family relationships. In the experiment using the generated dataset, KoELECTRA showed a performance of 0.87796 and RoBERTa of 0.88.

Recent Research Trend Analysis for the Journal of Society of Korea Industrial and Systems Engineering Using Topic Modeling (토픽모델링을 활용한 한국산업경영시스템학회지의 최근 연구주제 분석)

  • Dong Joon Park;Pyung Hoi Koo;Hyung Sool Oh;Min Yoon
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.170-185 / 2023
  • The advent of big data has brought about the need for analytics. Natural language processing (NLP), a field of big data, has received a lot of attention. Among NLP techniques, topic modeling is widely applied to identify key topics in various academic journals. The Korean Society of Industrial and Systems Engineering (KSIE) has published academic journals since 1978. To enhance its status, it is imperative to recognize the diversity of its research domains. We previously identified eight major research topics in papers published by KSIE from 1978 to 1999. As a follow-up study, we aim to identify the major topics of research papers published in KSIE from 2000 to 2022. We performed topic modeling on 1,742 research papers from this period using LDA and BERTopic, which has recently attracted attention. BERTopic outperformed LDA by providing a set of coherent topic keywords that can effectively distinguish the 36 topics found in this study. In terms of visualization, pyLDAvis presented better two-dimensional scatter plots for the intertopic distance map than BERTopic. However, BERTopic provided much more diverse visualization methods for exploring the relevance of the 36 topics. BERTopic was also able to classify hot and cold topics by presenting 'topic over time' graphs that show topic trends over time.

Verification on stock return predictability of text in analyst reports (애널리스트 보고서 텍스트의 주가예측력에 대한 검증)

  • Young-Sun Lee;Akihiko Yamada;Cheol-Won Yang;Hohsuk Noh
    • The Korean Journal of Applied Statistics / v.36 no.5 / pp.489-499 / 2023
  • As analyst reports have become widely shared, they have become a useful tool for reducing differences in financial information among market participants. The quantitative information in analyst reports has been used in many ways to predict stock returns. However, there are relatively few domestic studies on the power of the text in analyst reports to predict stock returns. We test the stock return predictability of text in analyst reports by creating variables representing the TONE of the text. To overcome the limitation of approaches based on the linear model assumption, we use a random-forest-based F-test.

Analyzing employment trends in response to AI exposure: K-shaped labor polarization in Korea (인공지능 노출 정도에 따른 고용 추세 분석: K자형 고용 양극화)

  • Lee, Yeseul;Hwang, Hyeonjun
    • Informatization Policy / v.30 no.3 / pp.69-91 / 2023
  • The impact of technological advancements on employment is a matter of ongoing debate, and discussions of the effects of AI technology development on employment are particularly scarce. This study uses a natural language processing technique (SBERT) and patents to calculate an occupation-based AI exposure score and to analyze employment trends by group. It proposes a method for calculating the AI exposure score based on the similarity between Korean patent information and US job descriptions, linking SOC (U.S.) and KSCO (Korea). The analysis of domestic AI patent applications and regional employment data in the KOSIS Database since 2013 reveals a K-shaped polarization pattern in Korean employment trends between groups with above-average and below-average levels of AI exposure.
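
A similarity-based exposure score of the kind described above might be computed as in the sketch below. It is not the authors' procedure: the abstract does not state how similarities are aggregated, so averaging the cosine similarity over all patents is an assumption, the short vectors stand in for SBERT sentence embeddings, and all names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def exposure_scores(occupations: Dict[str, List[float]],
                    patents: Sequence[List[float]]) -> Dict[str, float]:
    """Occupation -> mean similarity to all AI patent embeddings (assumed aggregation)."""
    return {name: sum(cosine(vec, p) for p in patents) / len(patents)
            for name, vec in occupations.items()}


def split_by_average(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split occupations into above-average and at-or-below-average exposure groups."""
    average = sum(scores.values()) / len(scores)
    high = sorted(name for name, s in scores.items() if s > average)
    low = sorted(name for name, s in scores.items() if s <= average)
    return high, low


# Made-up 2-d vectors standing in for job-description and patent embeddings.
toy_occupations = {"data analyst": [1.0, 0.0], "florist": [0.0, 1.0]}
toy_patents = [[1.0, 0.0], [0.8, 0.6]]
toy_scores = exposure_scores(toy_occupations, toy_patents)
print(split_by_average(toy_scores))
```

Employment trends could then be compared between the two groups, which is the comparison the abstract reports as K-shaped.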