• Title/Summary/Keyword: Text Input Method


Embeded-type Search Function with Feedback for Smartphone Applications (스마트폰 애플리케이션을 위한 임베디드형 피드백 지원 검색체)

  • Kang, Moonjoong;Hwang, Mintae
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.5 / pp.974-983 / 2017
  • In this paper, we discuss a search function that can be embedded and used in Android-based applications. We used BM25 to suppress insignificant and overly frequent words such as postpositions, the Pivoted Length Normalization technique to resolve the search-priority problem related to each item's length, and Rocchio's method to pull items inferred to be related to the query closer to the query vector in the Vector Space Model, which supports an implicit feedback function. The index operation is divided into two methods: a simple index to support offline operation and a complex index for online operation. The implementation uses a query inference function that guesses the user's future input by collating the present input with the indexed data, which also lets it handle and correct user errors. The implementation could therefore be easily adopted in smartphone applications to improve their search functions.
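
As a rough illustration of how BM25 damps very frequent words and corrects for item length, the following minimal sketch scores tokenized items against a query. It uses the common textbook form of BM25 with assumed parameters `k1=1.5` and `b=0.75`; the function name, parameters, and sample items are hypothetical and are not taken from the paper's implementation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in docs against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF: small for terms that occur in most documents.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: b controls how much long documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical items; "the" appears everywhere and so contributes little.
docs = [
    "the cafe near the station".split(),
    "the station timetable".split(),
    "the park".split(),
]
scores = bm25_scores("the cafe".split(), docs)
```

Here the first item ranks highest because the rare term "cafe" carries most of the weight, while "the" adds only a small amount to every item.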

Fine-tuning BERT-based NLP Models for Sentiment Analysis of Korean Reviews: Optimizing the sequence length (BERT 기반 자연어처리 모델의 미세 조정을 통한 한국어 리뷰 감성 분석: 입력 시퀀스 길이 최적화)

  • Sunga Hwang;Seyeon Park;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.4 / pp.47-56 / 2024
  • This paper proposes a method for fine-tuning BERT-based natural language processing models to perform sentiment analysis on Korean review data. By varying the input sequence length during fine-tuning and comparing the results, we look for the input sequence length that gives the best performance. For this purpose, text review data were collected by web scraping from the clothing shopping platform M. During data preprocessing, positive and negative satisfaction scores were recalibrated to improve the accuracy of the analysis. Specifically, the GPT-4 API was used to reset the labels to reflect the actual sentiment of the review texts, and data imbalance was addressed by adjusting the data to a 6:4 ratio. The reviews on the clothing shopping platform averaged about 12 tokens in length. To find a model suited to such short texts, five BERT-based pre-trained models were used in the modeling stage and compared on performance and memory usage across input sequence lengths. The experimental results indicated that an input sequence length of 64 generally gave the best balance of performance and memory usage. In particular, the KcELECTRA model showed optimal performance and memory usage at an input sequence length of 64, achieving higher than 92% accuracy and reliability in sentiment analysis of Korean review data. Furthermore, by utilizing BERTopic, we provide a Korean review sentiment analysis process that classifies new incoming review data by category and extracts sentiment scores for each category using the final constructed model.
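
To make the role of the input sequence length concrete, here is a small sketch of fixing every review to the same length by truncating or padding, and of measuring how many positions end up as padding. The helper names, the `pad_id=0` value, and the sample token ids are assumptions for illustration; a real tokenizer would also add special tokens, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    kept = token_ids[:max_len]                    # drop tokens beyond max_len
    padding = [pad_id] * (max_len - len(kept))    # fill the remainder
    attention_mask = [1] * len(kept) + [0] * len(padding)
    return kept + padding, attention_mask

def padding_fraction(lengths, max_len):
    """Fraction of positions in a batch that hold padding rather than tokens."""
    used = sum(min(n, max_len) for n in lengths)
    return 1 - used / (max_len * len(lengths))

review = list(range(101, 113))       # 12 hypothetical token ids
ids, mask = pad_or_truncate(review, 64)
wasted = padding_fraction([12], 64)  # share of the 64 slots that are padding
```

For a 12-token review, most of a 64-slot input is padding, which is why longer input lengths cost memory without adding information for short texts.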

A Study on the Automated Design System for Gear (기어설계 자동화 시스템에 관한 연구)

  • Jo, Hae-Yong;Nam, Gi-Jeong;O, Byeong-Gi
    • Transactions of the Korean Society of Mechanical Engineers A / v.26 no.8 / pp.1506-1511 / 2002
  • In the present study, a computer-aided expert system for spur, helical, bevel and worm gears was newly developed using the AutoCAD system and its AutoLISP language. Two methods are available for a designer to draw a gear. The first method requires gear design parameters such as pressure, module, number of teeth, shaft angle, velocity, materials, etc. When these parameters are entered, a gear is drawn in AutoCAD, and the maximum allowable power and shaft diameter are also calculated. The second method calculates all dimensions and gear design parameters needed to draw a gear when information such as transmission, reduction ratio, nm, materials and pressure is entered. The system includes four programs. Each program is composed of a data input module, a database module, a strength calculation module, a drawing module, a text module and a drawing edit module. In conclusion, the CAD system would be widely used in companies to find geometric data and the manufacturing process.
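
The first design method, going from parameters such as module and number of teeth to drawable dimensions, can be sketched with the standard full-depth spur gear proportions (addendum equal to the module, dedendum 1.25 times the module). This is only an illustration in Python, not the paper's AutoLISP code; the function name, the 20-degree default pressure angle, and the sample values are assumptions.

```python
import math

def spur_gear_geometry(module_mm, teeth, pressure_angle_deg=20.0):
    """Basic diameters (mm) of a standard full-depth spur gear."""
    pitch_d = module_mm * teeth
    return {
        "pitch_diameter": pitch_d,
        # Addendum = 1 module on each side of the pitch circle.
        "outside_diameter": module_mm * (teeth + 2),
        # Dedendum = 1.25 modules on each side of the pitch circle.
        "root_diameter": module_mm * (teeth - 2.5),
        # Base circle from which the involute profile is generated.
        "base_diameter": pitch_d * math.cos(math.radians(pressure_angle_deg)),
    }

gear = spur_gear_geometry(module_mm=2.0, teeth=30)
```

A strength calculation such as the maximum allowable power would need material data and a rating method, which the abstract does not specify, so it is left out here.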

A Study of Disambiguation Method To Improve The Syntactic Analysis System (구문 분석의 결과로 나타나는 구조의 모호성을 해결하기 위한 방법 연구)

  • Park, Yong Uk
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.4 / pp.2764-2769 / 2015
  • In this paper, we present a Korean syntactic analysis system that can generate all possible syntactic trees for a given sentence. As a result, the number of syntactic trees produced by the system can grow exponentially. To solve this problem, we suggest a segmentation method and the notion of a maximum connected unit within a segmentation. A maximum connected unit is a combined unit that contains all morphemes in a segmentation. Depending on the input sentence, a segmentation may contain one or more maximum connected units. For the experiment, we randomly extracted 516 sentences from Korean middle school textbooks. With this method, the number of syntactic trees was reduced by about 28%.

Dialog-based multi-item recommendation using automatic evaluation

  • Euisok Chung;Hyun Woo Kim;Byunghyun Yoo;Ran Han;Jeongmin Yang;Hwa Jeon Song
    • ETRI Journal / v.46 no.2 / pp.277-289 / 2024
  • In this paper, we describe a neural network-based application that recommends multiple items using dialog context input and simultaneously outputs a response sentence. Further, we describe multi-item recommendation by specifying it as a set of clothing recommendations. For this, a multimodal fusion approach that can process both clothing-related text and images is required. We also examine how to meet the requirements of downstream models using a pretrained language model. Moreover, we propose a gate-based multimodal fusion and multiprompt learning based on a pretrained language model. Specifically, we propose an automatic evaluation technique to solve the one-to-many mapping problem of multi-item recommendations. A fashion-domain multimodal dataset based on Koreans is constructed and tested. Various experimental environment settings are verified using an automatic evaluation method. The results show that our proposed method can be used to obtain confidence scores for multi-item recommendation results, which is different from traditional accuracy evaluation.
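
The abstract does not give the form of the gate, so the following is only a sketch of one common gate-based fusion: a sigmoid gate computed from the concatenated text and image features decides, per dimension, how much of each modality to keep. The function names, the weight layout, and the toy vectors are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(text_vec, image_vec, weights, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    weights[i] is a row of length 2 * dim applied to the concatenated input.
    """
    concat = text_vec + image_vec
    fused = []
    for i, (t, v) in enumerate(zip(text_vec, image_vec)):
        gate = sigmoid(sum(w * x for w, x in zip(weights[i], concat)) + bias[i])
        # gate near 1 keeps the text feature, gate near 0 keeps the image feature.
        fused.append(gate * t + (1.0 - gate) * v)
    return fused

text_vec = [1.0, 0.0]
image_vec = [0.0, 1.0]
zero_w = [[0.0] * 4, [0.0] * 4]   # untrained gate: every gate value is 0.5
fused = gated_fusion(text_vec, image_vec, zero_w, [0.0, 0.0])
```

With zero weights and bias the gate is 0.5 everywhere, so the result is the plain average of the two modalities; training would move the gate toward whichever modality is more informative.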

Automatic Text Extraction from News Video using Morphology and Text Shape (형태학과 문자의 모양을 이용한 뉴스 비디오에서의 자동 문자 추출)

  • Jang, In-Young;Ko, Byoung-Chul;Kim, Kil-Cheon;Byun, Hye-Ran
    • Journal of KIISE:Computing Practices and Letters / v.8 no.4 / pp.479-488 / 2002
  • In recent years, the amount of digital video in use has risen dramatically with the increasing use of the Internet, and consequently an automated method is needed for indexing digital video databases. Textual information appearing in a digital video, both superimposed text and embedded scene text, can be a crucial clue for video indexing. In this paper, a new method is presented to extract both superimposed and embedded scene texts from a freeze-frame of news video. The algorithm consists of three steps. In the first step, a color image is converted into a gray-level image and contrast stretching is applied to enhance the contrast of the input image. Then, a modified local adaptive thresholding is applied to the contrast-stretched image. The second step is divided into three processes: eliminating text-like components by applying erosion, dilation, and (OpenClose+CloseOpen)/2 morphological operations; maintaining text components using the (OpenClose+CloseOpen)/2 operation with a new Geo-correction method; and subtracting the two result images to eliminate false-positive components further. In the third step, filtering, the characteristics of each component are used, such as the ratio of the number of pixels in each candidate component to the number of its boundary pixels and the ratio of the minor to the major axis of each bounding box. Acceptable results have been obtained using the proposed method on 300 news images, with a recognition rate of 93.6%. The proposed method also performs well on various kinds of images when the size of the structuring element is adjusted.
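
Two of the simpler pieces of this pipeline can be sketched directly: the contrast stretching of the first step and the minor-to-major axis ratio used in the filtering step. The sketch below assumes plain min-max stretching on a 2-D list of gray levels; the paper's modified thresholding and Geo-correction method are not reproduced, and the function names and sample values are hypothetical.

```python
def contrast_stretch(gray, out_min=0, out_max=255):
    """Linearly rescale a 2-D list of gray levels to span [out_min, out_max]."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in gray]

def axis_ratio(width, height):
    """Ratio of the minor to the major side of a bounding box (0 < r <= 1)."""
    return min(width, height) / max(width, height)

image = [[100, 110], [120, 150]]      # low-contrast 2x2 patch
stretched = contrast_stretch(image)   # darkest pixel -> 0, brightest -> 255
ratio = axis_ratio(40, 10)            # a wide, short candidate box
```

A filter built on `axis_ratio` would keep or reject a candidate component by comparing the ratio with a threshold; the abstract does not state the threshold, so none is assumed here.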

A Study on Development of Patent Information Retrieval Using Textmining (텍스트 마이닝을 이용한 특허정보검색 개발에 관한 연구)

  • Go, Gwang-Su;Jung, Won-Kyo;Shin, Young-Geun;Park, Sang-Sung;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.8 / pp.3677-3688 / 2011
  • A patent information retrieval system can serve a variety of purposes. In general, patent information is retrieved using a limited set of keywords, so identifying earlier technology and priority rights requires repeated effort. This study proposes a method of content-based retrieval using text mining. Using the proposed algorithm, each document is assigned a characteristic value. The characteristic values are used to compare the similarity between query documents and database documents. Text analysis is composed of three steps: stop-word removal, keyword analysis and weighted value calculation. In the tests, general retrieval and the proposed algorithm were compared using accuracy measurements. Because the study orders the result documents by similarity to the query documents, the searcher can work more efficiently by reviewing the most similar documents first. Also, because the full text of a patent document can be entered as input, users unfamiliar with searching can use it easily and quickly. Using content-based retrieval instead of keyword-based retrieval to extend the scope of the search can also reduce the amount of missed data.
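
The three text-analysis steps and the similarity ranking can be sketched as follows. The abstract does not say how the weighted values are computed, so this sketch assumes raw term frequency as the weight and cosine similarity for comparison; the stop-word list, function names, and sample texts are illustrative only.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "for", "and", "with"}  # illustrative list

def term_weights(text):
    """Lowercase, drop stop words, and return raw term-frequency weights."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return Counter(tokens)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query_text, documents):
    """Return document indices sorted from most to least similar to the query."""
    q = term_weights(query_text)
    sims = [cosine_similarity(q, term_weights(d)) for d in documents]
    return sorted(range(len(documents)), key=lambda i: sims[i], reverse=True)

patents = [
    "a battery charging circuit for electric vehicles",
    "method of brewing coffee with a paper filter",
]
order = rank("charging circuit for a vehicle battery", patents)
```

Because the whole query text is turned into a weight vector, a user can paste a full document as the query instead of choosing keywords by hand.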

Language-Independent Word Acquisition Method Using a State-Transition Model

  • Xu, Bin;Yamagishi, Naohide;Suzuki, Makoto;Goto, Masayuki
    • Industrial Engineering and Management Systems / v.15 no.3 / pp.224-230 / 2016
  • The use of new words, numerous spoken languages, and abbreviations on the Internet is extensive. As such, automatically acquiring words for the purpose of analyzing Internet content is very difficult. In a previous study, we proposed a method for Japanese word segmentation using character N-grams. The previously proposed method is based on a simple state-transition model that is established under the assumption that the input document is described based on four states (denoted as A, B, C, and D) specified beforehand: state A represents words (nouns, verbs, etc.); state B represents statement separators (punctuation marks, conjunctions, etc.); state C represents postpositions (namely, words that follow nouns); and state D represents prepositions (namely, words that precede nouns). According to this state-transition model, based on the states applied to each pseudo-word, we search the document from beginning to end for an accessible pattern. In other words, the process of this transition detects some words during the search. In the present paper, we perform experiments based on the proposed word acquisition algorithm using Japanese and Chinese newspaper articles. These articles were obtained from Japan's Kyoto University and the Chinese People's Daily. The proposed method does not depend on the language structure. If text documents are expressed in Unicode the proposed method can, using the same algorithm, obtain words in Japanese and Chinese, which do not contain spaces between words. Hence, we demonstrate that the proposed method is language independent.
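
The character N-gram step that the earlier segmentation method relies on is language independent for a simple reason: it only needs the text as a sequence of Unicode characters. The sketch below extracts and counts character N-grams; it does not reproduce the four-state transition model, and the function names and sample strings are hypothetical.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return every contiguous run of n Unicode characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_counts(documents, n):
    """Count character n-grams over a collection of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(char_ngrams(doc, n))
    return counts

# Python 3 strings index by code point, so no spaces or word lists are needed.
counts = ngram_counts(["東京都に住む", "東京に行く"], 2)
```

N-grams that recur across documents, such as "東京" in this toy input, are natural word candidates, and the same code runs unchanged on Chinese text.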

A BERGPT-chatbot for mitigating negative emotions

  • Song, Yun-Gyeong;Jung, Kyung-Min;Lee, Hyun
    • Journal of the Korea Society of Computer and Information / v.26 no.12 / pp.53-59 / 2021
  • In this paper, we propose BERGPT-chatbot, a domestic AI chatbot that, like 'Replika', can alleviate negative emotions based on text input. We built BERGPT-chatbot as a chatbot capable of mitigating negative emotions by pipelining two models, KR-BERT and KoGPT2-chatbot. We applied a creative method of assigning emotions to unrefined everyday datasets through KR-BERT, and of learning additional datasets through KoGPT2-chatbot. The development background of BERGPT-chatbot is as follows. Currently, the number of people with depression is increasing all over the world. This phenomenon is emerging as a more serious problem due to COVID-19, which leads people to spend long periods indoors or limit interpersonal relationships. Overseas, the use of artificial intelligence chatbots aimed at relieving negative emotions or supporting mental health has increased due to the pandemic. In Korea, psychological diagnosis chatbots similar to the overseas cases are in operation. However, because the domestic chatbots output button-based answers rather than text input-based answers, they remain at a low level of diagnosing human psychology compared to overseas chatbots. Therefore, we propose BERGPT-chatbot as a chatbot that helps mitigate negative emotions. Finally, we compared BERGPT-chatbot and KoGPT2-chatbot using 'Perplexity', an intrinsic evaluation metric for language models, and showed the superiority of BERGPT-chatbot.
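
For reference, perplexity is the exponential of the average negative log-probability a language model assigns to each token, so lower values mean the model finds the text less surprising. The sketch below computes it from a list of per-token probabilities; the function name and the sample probabilities are illustrative, and how the paper obtained its probabilities is not stated in the abstract.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that spreads probability evenly over 4 choices has perplexity 4.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
```

Comparing two chatbots by this metric only makes sense when both are evaluated on the same text with the same tokenization.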

The Development of Forest Fire Statistical Management System using Web GIS Technology

  • Jo, Myung-Hee;Kim, Joon-Bum;Kim, Hyun-Sik;Jo, Yun-Won
    • Proceedings of the KSRS Conference / 2002.10a / pp.183-190 / 2002
  • In this paper, a forest fire statistical information management system is constructed in a web environment using web-based GIS (Geographic Information System) technology. Through this system, general users with a web browser can easily access forest fire statistical information and obtain it in visual forms such as maps, graphs, and text. Moreover, officials responsible for forest fires can easily control and manage all domestic information through the input interface, retrieval interface, and output interface. To implement this system, Microsoft IIS 5.0 is used as the web server, and Oracle 8i and ASP (Active Server Pages) are used for database construction and dynamic web page operation, respectively. Also, ArcIMS of ESRI is used to serve map data, with Java and HTML as the system development languages. Through this system, general users can obtain all information related to forest fires visually in real time and become more aware of forest fire prevention. In addition, forest officials can manage domestic forest resources and control areas at risk of forest fire efficiently and scientifically by analyzing and retrieving large volumes of forest data. As a result, they can save the manpower, time and cost needed to collect and manage data.
