• Title/Summary/Keyword: 자연어분석

Search Result 562, Processing Time 0.03 seconds

Multi-perspective User Preference Learning in a Chatting Domain (인터넷 채팅 도메인에서의 감성정보를 이용한 타관점 사용자 선호도 학습 방법)

  • Shin, Wook-Hyun;Jeong, Yoon-Jae;Myaeng, Sung-Hyon;Han, Kyoung-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.1
    • /
    • pp.1-8
    • /
    • 2009
  • Learning user's preference is a key issue in intelligent system such as personalized service. The study on user preference model has adapted simple user preference model, which determines a set of preferred keywords or topic, and weights to each target. In this paper, we recommend multi-perspective user preference model that factors sentiment information in the model. Based on the topicality and sentimental information processed using natural language processing techniques, it learns a user's preference. To handle timc-variant nature of user preference, user preference is calculated by session, short-term and long term. User evaluation is used to validate the effect of user preference teaming and it shows 86.52%, 86.28%, 87.22% of accuracy for topic interest, keyword interest, and keyword favorableness.

A Survey on the Latest Research Trends in Retrieval-Augmented Generation (검색 증강 생성(RAG) 기술의 최신 연구 동향에 대한 조사)

  • Eunbin Lee;Ho Bae
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.9
    • /
    • pp.429-436
    • /
    • 2024
  • As Large Language Models (LLMs) continue to advance, effectively harnessing their potential has become increasingly important. LLMs, trained on vast datasets, are capable of generating text across a wide range of topics, making them useful in applications such as content creation, machine translation, and chatbots. However, they often face challenges in generalization due to gaps in specific or specialized knowledge, and updating these models with the latest information post-training remains a significant hurdle. To address these issues, Retrieval-Augmented Generation (RAG) models have been introduced. These models enhance response generation by retrieving information from continuously updated external databases, thereby reducing the hallucination phenomenon often seen in LLMs while improving efficiency and accuracy. This paper presents the foundational architecture of RAG, reviews recent research trends aimed at enhancing the retrieval capabilities of LLMs through RAG, and discusses evaluation techniques. Additionally, it explores performance optimization and real-world applications of RAG in various industries. Through this analysis, the paper aims to propose future research directions for the continued development of RAG models.

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1085-1093
    • /
    • 2016
  • Community-based Question Answering system is a system which provides answers for each question from the documents uploaded on web communities. In order to enhance the capacity of question analysis, former methods have developed specific rules suitable for a target region or have applied machine learning to partial processes. However, these methods incur an excessive cost for expanding fields or lead to cases in which system is overfitted for a specific field. This paper proposes a multiple machine learning method which automates the overall process by adapting appropriate machine learning in each procedure for efficient processing of community-based Question Answering system. This system can be divided into question analysis part and answer selection part. The question analysis part consists of the question focus extractor, which analyzes the focused phrases in questions and uses conditional random fields, and the question type classifier, which classifies topics of questions and uses support vector machine. In the answer selection part, the we trains weights that are used by the similarity estimation models through an artificial neural network. Also these are a number of cases in which the results of morphological analysis are not reliable for the data uploaded on web communities. Therefore, we suggest a method that minimizes the impact of morphological analysis by using character features in the stage of question analysis. The proposed system outperforms the former system by showing a Mean Average Precision criteria of 0.765 and R-Precision criteria of 0.872.

Analysis of Seasonal Importance of Construction Hazards Using Text Mining (텍스트마이닝을 이용한 건설공사 위험요소의 계절별 중요도 분석)

  • Park, Kichang;Kim, Hyoungkwan
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.41 no.3
    • /
    • pp.305-316
    • /
    • 2021
  • Construction accidents occur due to a number of reasons-worker carelessness, non-adoption of safety equipment, and failure to comply with safety rules are some examples. Because much construction work is done outdoors, weather conditions can also be a factor in accidents. Past construction accident data are useful for accident prevention, but since construction accident data are often in a text format consisting of natural language, extracting construction hazards from construction accident data can take a lot of time and that entails extra cost. Therefore, in this study, we extracted construction hazards from 2,026 domestic construction accident reports using text mining and performed a seasonal analysis of construction hazards through frequency analysis and centrality analysis. Of the 254 construction hazards defined by Korea's Ministry of Land, Infrastructure, and Transport, we extracted 51 risk factors from the construction accident data. The results showed that a significant hazard was "Formwork" in spring and autumn, "Scaffold" in summer, and "Crane" in winter. The proposed method would enable construction safety managers to prepare better safety measures against outdoor construction accidents according to weather, season, and climate.

Development of big data based Skin Care Information System SCIS for skin condition diagnosis and management

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.3
    • /
    • pp.137-147
    • /
    • 2022
  • Diagnosis and management of skin condition is a very basic and important function in performing its role for workers in the beauty industry and cosmetics industry. For accurate skin condition diagnosis and management, it is necessary to understand the skin condition and needs of customers. In this paper, we developed SCIS, a big data-based skin care information system that supports skin condition diagnosis and management using social media big data for skin condition diagnosis and management. By using the developed system, it is possible to analyze and extract core information for skin condition diagnosis and management based on text information. The skin care information system SCIS developed in this paper consists of big data collection stage, text preprocessing stage, image preprocessing stage, and text word analysis stage. SCIS collected big data necessary for skin diagnosis and management, and extracted key words and topics from text information through simple frequency analysis, relative frequency analysis, co-occurrence analysis, and correlation analysis of key words. In addition, by analyzing the extracted key words and information and performing various visualization processes such as scatter plot, NetworkX, t-SNE, and clustering, it can be used efficiently in diagnosing and managing skin conditions.

A Study on Environmental research Trends by Information and Communications Technologies using Text-mining Technology (텍스트 마이닝 기법을 이용한 환경 분야의 ICT 활용 연구 동향 분석)

  • Park, Boyoung;Oh, Kwan-Young;Lee, Jung-Ho;Yoon, Jung-Ho;Lee, Seung Kuk;Lee, Moung-Jin
    • Korean Journal of Remote Sensing
    • /
    • v.33 no.2
    • /
    • pp.189-199
    • /
    • 2017
  • Thisstudy quantitatively analyzed the research trendsin the use ofICT ofthe environmental field using the text mining technique. To that end, the study collected 359 papers published in the past two decades(1996-2015)from the National Digital Science Library (NDSL) using 38 environment-related keywords and 16 ICT-related keywords. It processed the natural languages of the environment and ICT fields in the papers and reorganized the classification system into the unit of corpus. It conducted the text mining analysis techniques of frequency analysis, keyword analysis and the association rule analysis of keywords, based on the above-mentioned keywords of the classification system. As a result, the frequency of the keywords of 'general environment' and 'climate' accounted for 77 % of the total proportion and the keywords of 'public convergence service' and 'industrial convergence service' in the ICT field took up approximately 30 % of the total proportion. According to the time series analysis, the researches using ICT in the environmental field rapidly increased over the past 5 years (2011-2015) and the number of such researches more than doubled compared to the past (1996-2010). Based on the environmental field with generated association rules among the keywords, it was identified that the keyword 'general environment' was using 16 ICT-based technologies and 'climate' was using 14 ICT-based technologies.

Method for Spatial Sentiment Lexicon Construction using Korean Place Reviews (한국어 장소 리뷰를 이용한 공간 감성어 사전 구축 방법)

  • Lee, Young Min;Kwon, Pil;Yu, Ki Yun;Kim, Ji Young
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.25 no.2
    • /
    • pp.3-12
    • /
    • 2017
  • Leaving positive or negative comments of places where he or she visits on location-based services is being common in daily life. The sentiment analysis of place reviews written by actual visitors can provide valuable information to potential consumers, as well as business owners. To conduct sentiment analysis of a place, a spatial sentiment lexicon that can be used as a criterion is required; yet, lexicon of spatial sentiment words has not been constructed. Therefore, this study suggested a method to construct a spatial sentiment lexicon by analyzing the place review data written by Korean internet users. Among several location categories, theme parks were chosen for this study. For this purpose, natural language processing technique and statistical techniques are used. Spatial sentiment words included the lexicon have information about sentiment polarity and probability score. The spatial sentiment lexicon constructed in this study consists of 3 tables(SSLex_SS, SSLex_single, SSLex_combi) that include 219 spatial sentiment words. Throughout this study, the sentiment analysis has conducted based on the texts written about the theme parks created on Twitter. As the accuracy of the sentiment classification was calculated as 0.714, the validity of the lexicon was verified.

A Convergence Study of the Research Trends on Stress Urinary Incontinence using Word Embedding (워드임베딩을 활용한 복압성 요실금 관련 연구 동향에 관한 융합 연구)

  • Kim, Jun-Hee;Ahn, Sun-Hee;Gwak, Gyeong-Tae;Weon, Young-Soo;Yoo, Hwa-Ik
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.8
    • /
    • pp.1-11
    • /
    • 2021
  • The purpose of this study was to analyze the trends and characteristics of 'stress urinary incontinence' research through word frequency analysis, and their relationships were modeled using word embedding. Abstract data of 9,868 papers containing abstracts in PubMed's MEDLINE were extracted using a Python program. Then, through frequency analysis, 10 keywords were selected according to the high frequency. The similarity of words related to keywords was analyzed by Word2Vec machine learning algorithm. The locations and distances of words were visualized using the t-SNE technique, and the groups were classified and analyzed. The number of studies related to stress urinary incontinence has increased rapidly since the 1980s. The keywords used most frequently in the abstract of the paper were 'woman', 'urethra', and 'surgery'. Through Word2Vec modeling, words such as 'female', 'urge', and 'symptom' were among the words that showed the highest relevance to the keywords in the study on stress urinary incontinence. In addition, through the t-SNE technique, keywords and related words could be classified into three groups focusing on symptoms, anatomical characteristics, and surgical interventions of stress urinary incontinence. This study is the first to examine trends in stress urinary incontinence-related studies using the keyword frequency analysis and word embedding of the abstract. The results of this study can be used as a basis for future researchers to select the subject and direction of the research field related to stress urinary incontinence.

Deep Learning Description Language for Referring to Analysis Model Based on Trusted Deep Learning (신뢰성있는 딥러닝 기반 분석 모델을 참조하기 위한 딥러닝 기술 언어)

  • Mun, Jong Hyeok;Kim, Do Hyung;Choi, Jong Sun;Choi, Jae Young
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.4
    • /
    • pp.133-142
    • /
    • 2021
  • With the recent advancements of deep learning, companies such as smart home, healthcare, and intelligent transportation systems are utilizing its functionality to provide high-quality services for vehicle detection, emergency situation detection, and controlling energy consumption. To provide reliable services in such sensitive systems, deep learning models are required to have high accuracy. In order to develop a deep learning model for analyzing previously mentioned services, developers should utilize the state of the art deep learning models that have already been verified for higher accuracy. The developers can verify the accuracy of the referenced model by validating the model on the dataset. For this validation, the developer needs structural information to document and apply deep learning models, including metadata such as learning dataset, network architecture, and development environments. In this paper, we propose a description language that represents the network architecture of the deep learning model along with its metadata that are necessary to develop a deep learning model. Through the proposed description language, developers can easily verify the accuracy of the referenced deep learning model. Our experiments demonstrate the application scenario of a deep learning description document that focuses on the license plate recognition for the detection of illegally parked vehicles.

Comparing the 2015 with the 2022 Revised Primary Science Curriculum Based on Network Analysis (2015 및 2022 개정 초등학교 과학과 교육과정에 대한 비교 - 네트워크 분석을 중심으로 -)

  • Jho, Hunkoog
    • Journal of Korean Elementary Science Education
    • /
    • v.42 no.1
    • /
    • pp.178-193
    • /
    • 2023
  • The aim of this study was to investigate differences in the achievement standards from the 2015 to the 2022 revised national science curriculum and to present the implications for science teaching under the revised curriculum. Achievement standards relevant to primary science education were therefore extracted from the national curriculum documents; conceptual domains in the two curricula were analyzed for differences; various kinds of centrality were computed; and the Louvain algorithm was used to identify clusters. These methods revealed that, in the revised compared with the preceding curriculum, the total number of nodes and links had increased, while the number of achievement standards had decreased by 10 percent. In the revised curriculum, keywords relevant to procedural skills and behavior received more emphasis and were connected to collaborative learning and digital literacy. Observation, survey, and explanation remained important, but varied in application across the fields of science. Clustering revealed that the number of categories in each field of science remained mostly unchanged in the revised compared with the previous curriculum, but that each category highlighted different skills or behaviors. Based on those findings, some implications for science instruction in the classroom are discussed.