• Title/Summary/Keyword: 자연어 (natural language)

Search Results: 1,207

A Study on the Improvement Model of Document Retrieval Efficiency of Tax Judgment (조세심판 문서 검색 효율 향상 모델에 관한 연구)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Journal of the Korea Convergence Society / v.10 no.6 / pp.41-47 / 2019
  • When judging a court case, it is very important to search for and obtain examples of similar prior judgments. The existing judgment-document search relies on keywords entered by the user; because an accurate keyword must be entered, the necessary document cannot be found when the keyword is unknown, and the retrieved documents may have different contents. In this paper, we improve the effectiveness of a method that vectorizes documents into a three-dimensional space, calculates cosine similarity, and retrieves the closest documents, in order to find accurate precedents. After analyzing the similarity of the words used in the precedents, we provide a method that extracts the mode (the most frequent words) and inserts it into the body of the text, thereby improving the cosine similarity of the document to be retrieved. We hope the proposed model provides users searching for tax-related precedents with fast, accurate retrieval.
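The retrieval pipeline described above (vectorize documents, score by cosine similarity, return the closest) can be sketched as follows; the toy documents, plain term-frequency weighting, and query are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

def tf_vector(tokens, vocab):
    # Term-frequency vector of a tokenized document over a fixed vocabulary.
    return np.array([tokens.count(w) for w in vocab], dtype=float)

def cosine(a, b):
    # Cosine similarity; 0.0 when either vector is all zeros.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

# Toy corpus of tokenized "judgment" documents (illustrative only).
docs = [
    "tax judgment appeal income tax".split(),
    "property tax judgment precedent".split(),
    "criminal court sentence".split(),
]
vocab = sorted({w for d in docs for w in d})
doc_vecs = [tf_vector(d, vocab) for d in docs]

query = tf_vector("income tax judgment".split(), vocab)
# Rank documents by similarity to the query, most similar first.
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, doc_vecs[i]),
                 reverse=True)
```

Here the first document ranks highest because it shares the most query terms, while the unrelated criminal-law document scores zero.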

Effects of Interactivity and Usage Mode on User Experience in Chatbot Interface (챗봇 기반 인터페이스의 상호작용성과 사용 모드가 사용자 경험에 미치는 영향)

  • Baek, Hyunji;Kim, Sangyeon;Lee, Sangwon
    • Journal of the HCI Society of Korea / v.14 no.1 / pp.35-43 / 2019
  • This study examines how the interactivity and usage mode of a chatbot interface affect user experience. Chatbots have rapidly been commercialized in line with improvements in artificial intelligence and natural language processing. However, most research has focused on the technical side of improving chatbot performance, so the user experience of chatbot interfaces also needs study. In this article, we investigated how the 'interactivity' of an interface and the 'usage mode' reflecting the user's situation affect the satisfaction, flow, and perceived usefulness of a chatbot. As a result, first, the higher the level of interactivity, the better the user experience. Second, usage mode showed no main effect but did show an interaction effect with interactivity on flow: flow was highest under high interactivity, compared with the other conditions. Thus, better chatbot interfaces should be designed with a higher degree of interactivity, and with varied functions that let users achieve their goals easily.

Survey on Out-Of-Domain Detection for Dialog Systems (대화시스템 미지원 도메인 검출에 관한 조사)

  • Jeong, Young-Seob;Kim, Young-Min
    • Journal of Convergence for Information Technology / v.9 no.9 / pp.1-12 / 2019
  • A dialog system is becoming a new way of communication between human and computer. A dialog system takes human voice as input and gives a proper response in voice or performs an action. Although there are several well-known dialog system products (e.g., Amazon Echo, Naver Wave), they commonly suffer from out-of-domain utterances: if a system detects out-of-domain utterances poorly, user satisfaction is significantly harmed. There have been some studies aimed at solving this problem, but intensive study is still needed. In this paper, we give an overview of previous studies of out-of-domain detection from three points of view: dataset, feature, and method. As there have been relatively few studies on this topic due to the lack of datasets, we believe the most important next research step is to construct and share a large dataset for dialog systems, and then to try state-of-the-art techniques on it.
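One simple method family that such surveys cover is confidence thresholding: flag an utterance as out-of-domain when the intent classifier's top softmax probability is low. The logits, class count, and threshold below are illustrative assumptions, not taken from the survey:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

def detect_ood(logits, threshold=0.7):
    # Reject as out-of-domain if the best in-domain intent is not
    # confident enough; otherwise return the predicted intent index.
    p = softmax(np.asarray(logits, dtype=float))
    if p.max() < threshold:
        return "out-of-domain", None
    return "in-domain", int(p.argmax())
```

A peaked logit vector such as [5.0, 0.1, 0.2] is accepted as intent 0, while a near-uniform one such as [1.0, 1.1, 0.9] is rejected.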

Development of Supervised Machine Learning based Catalog Entry Classification and Recommendation System (지도학습 머신러닝 기반 카테고리 목록 분류 및 추천 시스템 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services / v.20 no.1 / pp.57-65 / 2019
  • Domeggook, a B2B online shopping mall, has a market share of over 70%, with more than 2 million members and 800,000 items sold per day. However, because identical or similar items are registered under different catalog entries, it is difficult for buyers to search for items, and problems also arise in managing a large B2B shopping mall. Therefore, in this study, we developed an automatic catalog entry classification and recommendation system using a semi-supervised machine learning method based on a large archive of previous shopping mall purchase information. Specifically, when a seller enters item registration information in natural language, KoNLPy morphological analysis is performed, and Naïve Bayes classification is applied to automatically recommend the most suitable catalog entry for the item. As a result, both the search speed and the total sales of the shopping mall could be improved by efficiently increasing the accuracy of catalog entries.
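The classification step can be sketched as a multinomial Naïve Bayes over tokenized item titles. For simplicity this sketch tokenizes on whitespace instead of running KoNLPy morphological analysis, and the catalog labels and training titles are invented:

```python
import math
from collections import Counter

class NaiveBayesCatalog:
    # Multinomial Naive Bayes with add-one (Laplace) smoothing.
    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.total = len(labels)
        self.counts = {c: Counter() for c in self.priors}
        self.vocab = set()
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        best, best_lp = None, float("-inf")
        v = len(self.vocab)
        for c, prior in self.priors.items():
            denom = sum(self.counts[c].values()) + v
            lp = math.log(prior / self.total)
            lp += sum(math.log((self.counts[c][w] + 1) / denom) for w in tokens)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

train = [("usb c charging cable", "electronics"),
         ("usb hub adapter", "electronics"),
         ("spiral notebook a5", "stationery"),
         ("gel pen black notebook set", "stationery")]
model = NaiveBayesCatalog().fit([t.split() for t, _ in train],
                                [c for _, c in train])
```

Given a new title such as "usb charging adapter", the model recommends the catalog entry whose word distribution best explains the tokens.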

Suggestions on how to convert official documents to Machine Readable (공문서의 기계가독형(Machine Readable) 전환 방법 제언)

  • Yim, Jin Hee
    • The Korean Journal of Archival Studies / no.67 / pp.99-138 / 2021
  • In the era of big data, analyzing not only structured data but also unstructured data is emerging as an important task. Official documents produced by government agencies, as large text-based unstructured data, are also subject to big data analysis. From the perspectives of internal work efficiency, knowledge management, and records management, it is necessary to analyze official documents as big data to derive useful implications. However, since many of the official documents currently held by public institutions are not in an open format, a pre-processing step of extracting text from a bitstream is required before big data analysis. In addition, since contextual metadata is not sufficiently stored in the document files, separate efforts to secure metadata are required for high-quality analysis. In conclusion, current official documents have a low level of machine readability, which makes big data analysis expensive.

Modified multi-sense skip-gram using weighted context and x-means (가중 문맥벡터와 X-means 방법을 이용한 변형 다의어스킵그램)

  • Jeong, Hyunwoo;Lee, Eun Ryung
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.389-399 / 2021
  • In recent years, word embedding has been a popular field of natural language processing research, and the skip-gram has become one successful word embedding method. It assigns an embedding vector to each word using its contexts, which provides an effective way to analyze text data. However, due to the limitations of the vector space model, early word embedding methods assume that every word has only a single meaning. Since multi-sense words, i.e., words with more than one meaning, occur in practice, Neelakantan (2014) proposed the multi-sense skip-gram (MSSG), which finds an embedding vector for each sense of a multi-sense word using a clustering method. In this paper, we propose a modification of MSSG to improve statistical accuracy. Moreover, we propose a data-adaptive choice of the number of clusters, i.e., the number of meanings of a multi-sense word. Numerical evidence is given through simulations based on real data.
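The clustering idea behind MSSG — grouping the context vectors of a word's occurrences so that each cluster corresponds to one sense — can be sketched with a plain k-means. Real MSSG learns sense vectors jointly during training, and the paper replaces a fixed k with a data-adaptive (x-means-style) choice, which this sketch does not implement; the toy context vectors are invented:

```python
import numpy as np

def kmeans(X, init_idx, iters=50):
    # Plain k-means: each centroid plays the role of one word sense.
    centers = X[init_idx].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

rng = np.random.default_rng(0)
# Toy context vectors for one ambiguous word: two well-separated senses.
sense_a = rng.normal(loc=0.0, scale=0.3, size=(10, 5))
sense_b = rng.normal(loc=5.0, scale=0.3, size=(10, 5))
X = np.vstack([sense_a, sense_b])
centers, assign = kmeans(X, init_idx=[0, 10])
```

With well-separated contexts, the two recovered clusters coincide with the two underlying senses.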

Recent Automatic Post Editing Research (최신 기계번역 사후 교정 연구)

  • Moon, Hyeonseok;Park, Chanjun;Eo, Sugyeong;Seo, Jaehyung;Lim, Heuiseok
    • Journal of Digital Convergence / v.19 no.7 / pp.199-208 / 2021
  • Automatic Post-Editing (APE) is the study of automatically correcting errors in machine-translated sentences. The goal of the APE task is to build error-correcting models that improve translation quality regardless of the translation system. To train these models, triplets of the source sentence, its machine translation, and a post-edit manually produced by a human translator are utilized. In recent APE research especially, multilingual pretrained language models are adopted before training on APE data. This study surveys the multilingual pretrained language models adopted in the latest APE research and the specific way each study applies them. Furthermore, based on the current research trend, we propose future research directions utilizing a translation model or the mBART model.
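APE work of this kind is commonly evaluated with TER, which builds on a word-level edit distance between the system output and the human post-edit. A minimal sketch of that distance follows; the source/MT/post-edit triplet is invented for illustration:

```python
def word_edit_distance(hyp, ref):
    # Word-level Levenshtein distance: the minimum number of insertions,
    # deletions, and substitutions needed to turn hyp into ref.
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

# An APE training example: source, machine translation, human post-edit.
src = "그는 어제 서울에 갔다"
mt = "he went to seoul tomorrow".split()
pe = "he went to seoul yesterday".split()
edits = word_edit_distance(mt, pe)  # one substitution separates mt and pe
```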

High-Speed Search for Pirated Content and Research on Heavy Uploader Profiling Analysis Technology (불법복제물 고속검색 및 Heavy Uploader 프로파일링 분석기술 연구)

  • Hwang, Chan-Woong;Kim, Jin-Gang;Lee, Yong-Soo;Kim, Hyeong-Rae;Lee, Tae-Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1067-1078 / 2020
  • With the development of internet technology, a lot of content is produced and demand for it is increasing. Accordingly, the amount of content in circulation is increasing, and the distribution of illegal copies that infringe copyright is increasing as well. The Korea Copyright Protection Agency operates an illegal-content blocking program based on substring matching, but accurate search is difficult because uploaders insert a large amount of noise to bypass it. Recently, natural language processing and AI deep learning technologies for removing the noise, along with various blockchain technologies for copyright protection, have been studied, but they have limitations. In this paper, noise is removed from data collected online, and illegal copies are searched by keyword. In addition, identical heavy uploaders are identified through profiling analysis. In the future, we expect copyright damage to be minimized if this illegal-copy search technology is combined with blocking and response technology based on the results of heavy uploader profiling.
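The noise-removal step can be sketched as simple normalization before keyword matching; the obfuscated titles and keyword below are invented, and real filtering would need far more robust rules:

```python
import re

def normalize(title):
    # Drop punctuation and symbol noise inserted to evade substring
    # matching, collapse whitespace, and lowercase.
    cleaned = re.sub(r"[^0-9A-Za-z가-힣\s]", "", title)
    return re.sub(r"\s+", " ", cleaned).strip().lower()

def find_pirated(titles, keyword):
    # Keyword search over normalized titles.
    return [t for t in titles if keyword in normalize(t)]

uploads = [
    "T.h.e M-o-v-i-e (2020) !!FULL!!",
    "Some unrelated lecture notes",
]
hits = find_pirated(uploads, "the movie 2020")
```

The obfuscated title normalizes to "the movie 2020 full", so the keyword match succeeds where raw substring matching would fail.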

Integrated Verbal and Nonverbal Sentiment Analysis System for Evaluating Reliability of Video Contents (영상 콘텐츠의 신뢰도 평가를 위한 언어와 비언어 통합 감성 분석 시스템)

  • Shin, Hee Won;Lee, So Jeong;Son, Gyu Jin;Kim, Hye Rin;Kim, Yoonhee
    • KIPS Transactions on Software and Data Engineering / v.10 no.4 / pp.153-160 / 2021
  • With the advent of the "age of video" due to the simplification of video content production and the convenience of operating broadcasting channels, review videos on various products are drawing attention. We propose RASIA, an integrated reliability analysis system based on verbal and nonverbal sentiment analysis of review videos. RASIA extracts and quantifies the emotional values obtained through language sentiment analysis and facial expression analysis of the reviewer in the video. Subsequently, it conducts an integrated reliability analysis of the standardized verbal and nonverbal sentiment values. RASIA provides a new objective indicator for evaluating the reliability of review videos.
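The integration step — standardize each modality's sentiment series, then combine them — can be sketched as below. The abstract does not give RASIA's actual formula, so this sketch uses the mean product of z-scores (Pearson correlation) as one plausible agreement-based reliability measure; the per-scene values are invented:

```python
import numpy as np

def zscore(x):
    # Standardize a sentiment series to zero mean and unit variance.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def reliability(verbal, nonverbal):
    # Agreement between standardized verbal and nonverbal sentiment
    # series; +1 means the two modalities move perfectly together.
    return float((zscore(verbal) * zscore(nonverbal)).mean())

# Per-scene sentiment values (invented): speech sentiment vs facial sentiment.
consistent = reliability([0.2, 0.8, 0.5, 0.9], [0.1, 0.7, 0.4, 0.8])
```

A reviewer whose words and facial expressions track each other scores near +1, while contradictory modalities push the score toward -1.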

Creating Songs Using Note Embedding and Bar Embedding and Quantitatively Evaluating Methods (음표 임베딩과 마디 임베딩을 이용한 곡의 생성 및 정량적 평가 방법)

  • Lee, Young-Bae;Jung, Sung Hoon
    • KIPS Transactions on Software and Data Engineering / v.10 no.11 / pp.483-490 / 2021
  • To learn an existing song and create a new one with an artificial neural network, the song must first be converted into numerical data the network can recognize, and until now one-hot encoding has been used for this preprocessing step. In this paper, we propose a note embedding method that uses the note as the basic unit and a bar embedding method that uses the bar as the basic unit, and compare their performance with the existing one-hot encoding. The comparison uses quantitative evaluation to determine which method produces songs more similar to those written by the composer, with quantitative evaluation methods from the field of natural language processing as the metrics. In the evaluation, songs created with bar embedding were the best, followed by note embedding. This is significant in that the proposed note embedding and bar embedding create songs more similar to the composer's than the existing one-hot encoding does.
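The representational difference at stake — sparse one-hot vectors versus dense learned embeddings for notes (or bars) — can be sketched as follows; the pitch vocabulary, embedding width, and random initial values are illustrative, since real embeddings are trained together with the network:

```python
import numpy as np

NOTES = ["C4", "D4", "E4", "F4", "G4", "A4", "B4"]

def one_hot(note):
    # Sparse representation: one dimension per vocabulary entry.
    v = np.zeros(len(NOTES))
    v[NOTES.index(note)] = 1.0
    return v

rng = np.random.default_rng(42)
# Dense representation: each note maps to a trainable 8-dim vector.
embedding_table = rng.normal(size=(len(NOTES), 8))

def embed(note):
    return embedding_table[NOTES.index(note)]

melody = ["C4", "E4", "G4", "E4"]
one_hot_seq = np.stack([one_hot(n) for n in melody])   # shape (4, 7)
embedded_seq = np.stack([embed(n) for n in melody])    # shape (4, 8)
```

Unlike one-hot vectors, which are all equidistant, trained embeddings can place musically related notes (or bars) near each other, which is the property the paper's comparison exploits.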