• Title/Summary/Keyword: 자연어 처리 연구 (natural language processing research)


Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu;Ko, Gun-Woo;Choi, Won-Jun;Jeong, Hee-Seok;Yoon, Hwa-Mook;Choi, Sung-Pil
    • Journal of the Korean Society for Information Management / v.35 no.4 / pp.141-164 / 2018
  • Recently, as the volume of academic literature has grown rapidly and complex research has been actively conducted, researchers have had difficulty analyzing trends in previous research. Solving this problem requires classifying information at the level of individual academic papers; however, no academic database in Korea provides such information. In this paper, we propose an automatic classification system that can assign domestic academic literature to multiple classes. To this end, we first collected Korean-language academic documents in the technical sciences and, using the K-Means clustering technique, mapped them to class 600 of the DDC to construct a training set that supports multi-class classification. Excluding records with missing metadata, the resulting training set contains 63,915 Korean documents in the technical sciences. Using this training set, we implemented and trained a deep-learning-based automatic classification engine for academic documents. On a manually constructed test set, the engine achieved 78.32% accuracy and an F1 score of 72.45% for multi-class classification.
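The abstract says K-Means clustering was used when mapping documents to DDC class 600, but it does not state the document features, the number of clusters, or how seeds were chosen. As a minimal sketch, assuming documents are already represented as numeric vectors (for example TF-IDF), plain Lloyd's K-Means looks like the following. The function names, the toy vectors, and the "first k points as seeds" initialization are illustrative assumptions, not the authors' setup.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: returns a cluster label per point and the final centroids.

    Assumption for determinism: the first k points seed the centroids
    (a real pipeline would use random or k-means++ seeding).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    return labels, centroids
```

Each resulting cluster would still have to be associated with a DDC 600 subclass; the abstract does not describe that step, so it is left out here.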

Art transaction using big data Artist analysis system implementation (미술품 거래 빅데이터를 이용한 작가 분석 시스템 구현)

  • SeungKyung Lee;JongTae Lim
    • Journal of Service Research and Studies / v.11 no.2 / pp.79-93 / 2021
  • As of 2018, the domestic art market had grown 21.9% over the previous five years to KRW 448.2 billion, and the number of transactions had risen 31.6% to 39,367 works, marking the fifth consecutive year of growth. Art distribution channels are diversifying from offline galleries and auctions to online auctions. The art market consists of three areas: the production (creation), distribution (trade), and consumption (purchase) of works. As awareness of both the artistic and the economic value of art spreads, interest in art as an investment vehicle is also growing. Consumers who buy works as investments have a growing need for objective information about those works, but because information in the art distribution sector is closed and unevenly disclosed, it is difficult to collect and analyze objective, reliable statistics. This paper identifies the objective state of art distribution through big data collection and the analysis of structured and unstructured data from the art distribution sector, with the goal of implementing a system that objectively analyzes the artists active in the current market. The study collected artist information from art distribution sites, gathered articles about each artist from the daily newspaper Maeil Business, and calculated the frequency of associated words for each artist. It aims to provide consumers with objective and reliable information.
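The abstract describes counting "associated words" per artist from newspaper articles but does not define the association window or the tokenizer. A minimal sketch, assuming that every other token in an article that mentions the artist counts as associated and that text is whitespace-tokenized (Korean text would normally go through a morphological analyzer first), could look like this. The function name and the sample articles are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def associated_word_counts(
    articles: Iterable[str],
    artists: List[str],
    stopwords: frozenset = frozenset(),
) -> Dict[str, Counter]:
    """Count words that co-occur with each artist name, article by article.

    Assumptions: whitespace tokenization, and an article-level window
    (any token in an article that mentions the artist is "associated").
    """
    counts: Dict[str, Counter] = {artist: Counter() for artist in artists}
    for text in articles:
        tokens = text.split()
        for artist in artists:
            if artist in tokens:
                # Every other non-stopword token in this article is counted once per occurrence.
                counts[artist].update(
                    t for t in tokens if t != artist and t not in stopwords
                )
    return counts
```

Sorting each artist's `Counter` with `most_common()` then gives the highest-frequency associated words for that artist.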

Development and Validation of the Letter-unit based Korean Sentimental Analysis Model Using Convolution Neural Network (회선 신경망을 활용한 자모 단위 한국형 감성 분석 모델 개발 및 검증)

  • Sung, Wonkyung;An, Jaeyoung;Lee, Choong C.
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.13-33 / 2020
  • This study proposes a Korean sentiment analysis algorithm that uses letter-unit (jamo) embeddings and convolutional neural networks. Sentiment analysis is a natural language processing technique for analyzing the subjective information expressed in text, such as a person's attitudes, opinions, and dispositions. Research on Korean sentiment analysis has grown steadily in recent years. However, instead of a general-purpose sentiment dictionary, each field has built and used its own. The problem with this practice is that it does not reflect the characteristics of the Korean language. In this study, we developed a sentiment analysis model that skips morphological analysis and instead builds syllable vectors from each syllable's onset, nucleus, and coda. This minimized both the word-learning problem and the out-of-vocabulary (unregistered word) problem, and the model achieved 88% accuracy. The model is less affected by the unstructured nature of the input data and can classify polarity according to the context of the text. We hope this model will make Korean sentiment analysis easier for non-experts.
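The model builds syllable vectors from each syllable's onset, nucleus, and coda. The embedding and CNN layers are not described in enough detail to reproduce, but the decomposition step itself follows the standard Unicode layout of precomposed Hangul syllables (19 onsets × 21 nuclei × 28 codas, starting at U+AC00). A minimal sketch of that step follows; the function names are hypothetical, and feeding the resulting jamo into an embedding layer is left out.

```python
from typing import List, Tuple

# Unicode Hangul syllable layout: index = (onset * 21 + nucleus) * 28 + coda.
ONSETS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
NUCLEI = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
CODAS = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # "" = no final consonant

SYLLABLE_BASE = 0xAC00          # '가'
SYLLABLE_COUNT = 19 * 21 * 28   # 11,172 precomposed syllables


def decompose(syllable: str) -> Tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (onset, nucleus, coda)."""
    index = ord(syllable) - SYLLABLE_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    onset, rest = divmod(index, 21 * 28)
    nucleus, coda = divmod(rest, 28)
    return ONSETS[onset], NUCLEI[nucleus], CODAS[coda]


def to_jamo_sequence(text: str) -> List[str]:
    """Flatten text into jamo units; non-Hangul characters pass through unchanged."""
    units: List[str] = []
    for ch in text:
        if 0 <= ord(ch) - SYLLABLE_BASE < SYLLABLE_COUNT:
            units.extend(j for j in decompose(ch) if j)  # skip the empty coda
        else:
            units.append(ch)
    return units
```

Working at this level means an unseen word still decomposes into a small, closed set of units, which is the property the abstract credits for reducing the out-of-vocabulary problem.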

A Method for Spelling Error Correction in Korean Using a Hangul Edit Distance Algorithm (한글 편집거리 알고리즘을 이용한 한국어 철자오류 교정방법)

  • Bak, Seung Hyeon;Lee, Eun Ji;Kim, Pan Koo
    • Smart Media Journal / v.6 no.1 / pp.16-21 / 2017
  • A long time has passed since computers, once mainly research tools, were commercialized and made available to the general public. Before then, people wrote with writing instruments; today, a growing number write on computers instead. Word processing lets people write faster and with less hand fatigue than writing instruments, which makes it better suited to long texts. However, word processors are also prone to spelling errors caused by user mistakes. Spelling errors that distort the shape of a word are easy for the writer to spot and correct directly, but errors that stem from the user's lack of knowledge, or that are hard to notice, can make it almost impossible to produce an error-free document. Spelling errors in important documents such as theses or business proposals can undermine their credibility. Consequently, research is needed on advanced spelling correction programs for the general public. This study designed a system that corrects sentence-level spelling errors to valid words using a Korean-alphabet (Hangul) similarity algorithm. Building on findings in the related literature that corrected words are highly similar in form to the misspelled words, spelling errors were extracted from a corpus. Using a spelling error detection algorithm, the misspelled words were then replaced with the extracted corrections.
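The abstract relies on a Hangul edit distance but does not give its cost scheme. A minimal sketch, assuming plain Levenshtein distance with unit costs computed over jamo sequences rather than whole syllables (the authors may weight operations differently), shows why jamo-level comparison suits Korean: a one-vowel slip inside a syllable costs 1 instead of replacing the whole syllable. The helper names and the candidate dictionary are hypothetical.

```python
from typing import List, Sequence

ONSETS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
NUCLEI = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
CODAS = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def to_jamo(text: str) -> List[str]:
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into jamo."""
    units: List[str] = []
    for ch in text:
        idx = ord(ch) - 0xAC00
        if 0 <= idx < 11172:
            onset, rest = divmod(idx, 21 * 28)
            nucleus, coda = divmod(rest, 28)
            units += [ONSETS[onset], NUCLEI[nucleus]] + ([CODAS[coda]] if coda else [])
        else:
            units.append(ch)
    return units


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance using a single rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def hangul_edit_distance(a: str, b: str) -> int:
    """Edit distance between two strings measured on their jamo sequences."""
    return levenshtein(to_jamo(a), to_jamo(b))


def correct(word: str, dictionary: List[str]) -> str:
    """Return the dictionary entry closest to `word` (first one wins on ties)."""
    return min(dictionary, key=lambda entry: hangul_edit_distance(word, entry))
```

For example, `correct("햑교", ["한국", "한글", "학교"])` picks "학교", since the two words differ only in one vowel jamo.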

Can Online Community Managers Enhance User Engagement?: Evidence from Anonymous Social Media Postings (온라인 커뮤니티 이용자 참여 증진을 위한 관리자의 운영 전략: 대학별 대나무숲 분석을 중심으로)

  • Kim, Hyejeong;Hwang, Seungyeup;Kwak, Youshin;Choi, Jeonghye
    • Knowledge Management Research / v.23 no.2 / pp.211-228 / 2022
  • As social media marketing becomes prevalent, it is important to understand the administrative role of managers in promoting user engagement. However, little is known about how community managers enhance user engagement on social media. In this research, we study how managers can boost online user participation, such as clicking likes and writing comments. Using a seemingly unrelated regression (SUR) model, we find that active participation by managers increases both passive (likes) and active (comments) user engagement. We also find that the number of emotional words in a post has a positive effect on passive engagement but a negative effect on active engagement. Lastly, congruency between posts and comments positively affects users' passive engagement. This study contributes to prior literature on online community management and text analysis. Furthermore, our findings offer managerial insights that practitioners and social media managers can use to further facilitate user engagement.

Analysis of Resident's Satisfaction and Its Determining Factors on Residential Environment: Using Zigbang's Apartment Review Bigdata and Deeplearning-based BERT Model (주거환경에 대한 거주민의 만족도와 영향요인 분석 - 직방 아파트 리뷰 빅데이터와 딥러닝 기반 BERT 모형을 활용하여 - )

  • Kweon, Junhyeon;Lee, Sugie
    • Journal of the Korean Regional Science Association / v.39 no.2 / pp.47-61 / 2023
  • Satisfaction with the residential environment is a major factor influencing residential choice and migration, and it is directly related to quality of life in the city. As online real estate services spread, people's evaluations of residential environments can be easily accessed, making it possible to analyze residents' satisfaction and its determining factors from those evaluations. This allows a larger volume of evaluations to be used more efficiently than with conventional methods such as surveys. This study analyzed residential environment reviews from about 30,000 apartment residents, collected from 'Zigbang', an online real estate service in Seoul. A Zigbang apartment review consists of a rating on a 5-point scale and a free-text evaluation written by the resident. First, this study labeled apartment reviews as positive or negative based on the scores of recommended reviews, which contain a comprehensive evaluation of the apartment. Next, to classify reviews automatically, we developed a model using Bidirectional Encoder Representations from Transformers (BERT), a deep-learning-based natural language processing model. We then used SHapley Additive exPlanations (SHAP) to extract the word tokens that play an important role in classifying reviews and thereby derive the determining factors in residents' evaluations of the residential environment. Furthermore, by analyzing related keywords with Word2Vec, we suggest priority considerations for improving satisfaction with the residential environment. The study is meaningful in that it proposes a model that uses apartment review big data, a form of qualitative evaluation by residents, together with deep learning to automatically classify residential satisfaction as positive or negative, and derives the determining factors of that satisfaction.
The results can serve as baseline data for improving residential satisfaction, for future evaluations of the residential environment around apartment complexes, and for the design and evaluation of new complexes and infrastructure.
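The related-keyword step ranks vocabulary words by how close their Word2Vec vectors are to a query word; libraries such as gensim expose this ranking through `KeyedVectors.most_similar`, which uses cosine similarity. A minimal library-free sketch of that ranking follows. It assumes the vectors come from an already-trained model; the two-dimensional toy vectors and keywords below are invented for illustration and are not the study's data.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_keywords(
    query: str, vectors: Dict[str, List[float]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank every other vocabulary word by cosine similarity to `query`."""
    q = vectors[query]
    scored = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy embeddings standing in for a trained Word2Vec model.
toy_vectors = {
    "noise": [1.0, 0.0],
    "soundproofing": [0.9, 0.1],
    "park": [0.0, 1.0],
    "floor": [0.7, 0.3],
}
```

With these toy vectors, `related_keywords("noise", toy_vectors, 2)` ranks "soundproofing" first and "floor" second, while "park" (orthogonal to "noise") scores 0.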

Understanding of Generative Artificial Intelligence Based on Textual Data and Discussion for Its Application in Science Education (텍스트 기반 생성형 인공지능의 이해와 과학교육에서의 활용에 대한 논의)

  • Hunkoog Jho
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.307-319 / 2023
  • This study explains the key concepts and principles of text-based generative artificial intelligence (AI), which has been receiving increasing interest and use, with a focus on its application in science education. It also highlights the potential and limitations of using generative AI in science education, offering insights for both implementation and research. Recent generative AI, predominantly based on transformer models consisting of encoders and decoders, has advanced remarkably through improved understanding of context and through optimization with reinforcement learning and reward models trained on human feedback. In particular, because it can understand a wide range of user questions and intents, it can perform functions such as writing, summarizing, keyword extraction, evaluation, and feedback. It also offers practical utility for diagnosing learners and for structuring educational content from examples provided by educators. However, concerns about the limitations of generative AI must be examined, including the potential to convey inaccurate facts or knowledge, bias resulting from overconfidence, and uncertainty about its impact on users' attitudes or emotions. Moreover, because generative AI produces responses probabilistically, based on response data from many individuals, it may limit insightful and innovative thinking that offers different perspectives or ideas. In light of these considerations, this study provides practical suggestions for the positive use of AI in science education.
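The transformer encoders and decoders mentioned above are built around scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$: each query position takes a weighted average of the value vectors, weighted by how well the query matches each key. A minimal pure-Python sketch of a single attention head follows. It deliberately omits the learned projection matrices, masking, and multiple heads that real models use, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = len(k[0])
    out: Matrix = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output column at a time.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out
```

When all keys are identical, every weight is equal and the output is simply the mean of the value vectors; distinct keys shift the weight toward the values whose keys best match the query.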

Establishment of Risk Database and Development of Risk Classification System for NATM Tunnel (NATM 터널 공정리스크 데이터베이스 구축 및 리스크 분류체계 개발)

  • Kim, Hyunbee;Karunarathne, Batagalle Vinuri;Kim, ByungSoo
    • Korean Journal of Construction Engineering and Management / v.25 no.1 / pp.32-41 / 2024
  • In the construction industry, not only safety accidents but also various complex risks, such as construction delays, cost increases, and environmental pollution, occur, and management technologies are needed to address them. Among these, process risk management, which directly affects the project, lacks related information relative to its importance. This study developed a process risk classification system for NATM tunnels to address the difficulty of retrieving risk information that arises because each project uses a different classification system. Risks were collected through a review of existing literature and experience mining techniques, and the database was built using natural language processing concepts. For the structure of the classification system, the existing WBS (work breakdown structure) was adopted for data compatibility, and an RBS (risk breakdown structure) linked to the WBS work types was established. As a result, we completed a risk classification system that makes it easy to identify risks by work type and intuitively shows the characteristics of each risk and its associated risk factors. A usability check found the classification system effective: users could easily identify the risks and risk factors for each work type by entering keywords. This study is expected to help prevent cost increases and schedule overruns by enabling risks to be identified by work type in advance, during the planning and design of NATM tunnels, and countermeasures suited to the relevant risk factors to be established.
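The classification system links an RBS to WBS work types and lets users find risks and risk factors by keyword. The abstract does not describe the database schema or the search logic, so the following is only a minimal sketch of one possible structure: risks grouped under work types, with a case-insensitive substring search over work types, risk names, and risk factors. The `Risk` class, `search_risks` function, and every entry in `RBS` are hypothetical placeholders, not the paper's data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Risk:
    name: str
    factors: List[str] = field(default_factory=list)


# Hypothetical RBS entries keyed by WBS work type (illustrative only).
RBS: Dict[str, List[Risk]] = {
    "excavation": [Risk("face collapse", ["weak ground", "groundwater inflow"])],
    "shotcrete": [Risk("rebound loss", ["nozzle distance", "mix design"])],
    "rock bolt": [Risk("insufficient anchorage", ["borehole cleaning", "grout curing"])],
}


def search_risks(keyword: str, rbs: Dict[str, List[Risk]] = RBS) -> Dict[str, List[Risk]]:
    """Return work types whose name, risks, or risk factors contain `keyword`."""
    kw = keyword.lower()
    hits: Dict[str, List[Risk]] = {}
    for work_type, risks in rbs.items():
        matched = [
            r for r in risks
            if kw in work_type.lower()
            or kw in r.name.lower()
            or any(kw in factor.lower() for factor in r.factors)
        ]
        if matched:
            hits[work_type] = matched
    return hits
```

Keying the risks by WBS work type is what keeps the lookup compatible with existing schedule data, which is the design motivation the abstract gives for adopting the WBS structure.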

Development and Efficacy Validation of an ICF-Based Chatbot System to Enhance Community Participation of Elderly Individuals with Mild Dementia in South Korea (우리나라 경도 치매 노인의 지역사회 참여 증진을 위한 ICF 기반 Decision Tree for Chatbot 시스템 개발과 효과성 검증)

  • Haewon Byeon
    • Journal of Advanced Technology Convergence / v.3 no.3 / pp.17-27 / 2024
  • This study focuses on the development and evaluation of a chatbot system based on the International Classification of Functioning, Disability and Health (ICF) framework to enhance community participation among elderly individuals with mild dementia in South Korea. The study involved 12 elderly participants who lived alone and had been diagnosed with mild dementia, along with 15 caregivers actively involved in their daily care. The development process comprised a comprehensive needs assessment, system design, content creation, natural language processing using a Transformer attention algorithm, and usability testing. The chatbot offers personalized activity recommendations, reminders, and information that support physical, social, and cognitive engagement. Usability testing revealed high user satisfaction and perceived usefulness, along with significant improvements in community activities and social interactions. Quantitative analysis showed a 92% increase in weekly community activities and an 84% increase in social interactions. Qualitative feedback highlighted the chatbot's user-friendly interface, the relevance of the suggested activities, and its role in reducing caregiver burden. The study demonstrates that an ICF-based chatbot system can effectively promote community participation and improve quality of life for elderly individuals with mild dementia. Future research should focus on refining the system and evaluating its long-term impact.

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has recently been attracting attention. The convolutional neural network (CNN) is the deep learning technique that was applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions and in AlphaGo. A CNN recognizes local features in small regions of the input image and combines them to recognize the image as a whole. Deep learning technologies are expected to bring many changes to our lives, but so far their applications have been largely limited to image recognition and natural language processing. The use of deep learning for business problems is still at an early stage of research. If its performance is proven, it could be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, and bankruptcy prediction. It is therefore a meaningful experiment to assess whether deep learning can solve business problems, using the case of online shopping companies, which hold big data, make customer behavior relatively easy to identify, and offer high practical value. In particular, the competitive environment for online shopping companies is changing rapidly and intensifying, so analyzing customer behavior to maximize profit is becoming increasingly important for them. In this study, we propose a 'CNN model of Heterogeneous Information Integration' as a way to improve the prediction of customer behavior in online shopping enterprises.
To arrive at the best-performing model, one that combines structured and unstructured information and learns with a convolutional neural network in a multi-layer perceptron structure, we evaluate the performance of three architectural components, 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', and select the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high-amount shopper, and high-discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC (voice of the customer) data from a specific domestic online shopping company. The data were extracted for 47,947 customers who registered at least one VOC in January 2011 (one month). We used the profiles of these customers, a total of 19 months of transaction data from September 2010 to March 2012, and the VOCs posted during that month. The experiment consists of two stages. In the first stage, we evaluate the three architectures that affect the performance of the proposed model and select the optimal parameters. In the second stage, we evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, outperforms NBC (naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). This indicates that unstructured information contributes to predicting customer behavior and that CNNs can be applied to business problems as well as to image recognition and natural language processing.
The experiments confirm that CNNs are more effective at understanding and interpreting the meaning of context in textual VOC data. The empirical study, based on actual data from an e-commerce company, is also significant in showing that very meaningful information for predicting customer behavior can be extracted from VOC text written directly by customers. Finally, the various experiments show that the proposed model provides useful information on parameter selection and performance for future research.
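The abstract does not give the filter sizes, embedding method, or layer layout of the proposed model. As a generic illustration only, a common way to turn text into CNN features is to slide filters over windows of word-embedding vectors, apply ReLU, and keep each filter's maximum (max-over-time pooling); one plausible reading of "heterogeneous information integration" is concatenating those text features with structured customer features before a multi-layer perceptron. That reading, the function names, and the fixed (rather than learned) filter weights below are all assumptions.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_max_pool(embeddings: Matrix, filters: List[Matrix], bias: float = 0.0) -> List[float]:
    """One feature per filter: convolve over word windows, ReLU, then max over positions.

    `embeddings` has one row per word; each filter has one row per word in its window.
    Filter weights are fixed here; a trained model would learn them.
    """
    features: List[float] = []
    for filt in filters:
        width = len(filt)
        activations = []
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            score = sum(
                w * x for f_row, e_row in zip(filt, window) for w, x in zip(f_row, e_row)
            ) + bias
            activations.append(max(0.0, score))  # ReLU
        # Max-over-time pooling; texts shorter than the filter yield 0.0.
        features.append(max(activations) if activations else 0.0)
    return features


def combine(structured: List[float], text_features: List[float]) -> List[float]:
    """Concatenate structured and pooled text features as input to a downstream MLP."""
    return structured + text_features
```

Max-over-time pooling makes the text feature vector a fixed length regardless of how long each VOC message is, which is what allows it to be joined with fixed-length structured features.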