• Title/Summary/Keyword: Text Similarity


Spontaneous Speech Language Modeling using N-gram based Similarity (N-gram 기반의 유사도를 이용한 대화체 연속 음성 언어 모델링)

  • Park Young-Hee;Chung Minhwa
    • MALSORI / no.46 / pp.117-126 / 2003
  • This paper presents our language model adaptation for Korean spontaneous speech recognition. Compared with written text corpora, Korean spontaneous speech exhibits distinctive characteristics of content and style, such as filled pauses, word omission, and contraction. Our approach focuses on improving the estimation of domain-dependent n-gram models by relevance weighting out-of-domain text data, where style is represented by n-gram based tf*idf similarity. In addition to relevance weighting, we use disfluencies as predictors of the neighboring words. The best result is a 9.7% relative reduction in word error rate, which shows that n-gram based relevance weighting reflects style differences well and that disfluencies are also good predictors.

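
The n-gram based tf*idf similarity mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the smoothed idf, the cosine measure, the bigram order, and the function names below are all assumptions chosen for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the word n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance_weights(in_domain, out_of_domain, n=2):
    """Score each out-of-domain document against the pooled in-domain text.

    Assumption: the in-domain corpus is treated as one pseudo-document and
    idf is smoothed as log((1 + N) / (1 + df)) + 1.
    """
    in_counts = Counter(g for doc in in_domain for g in ngrams(doc, n))
    out_counts = [Counter(ngrams(doc, n)) for doc in out_of_domain]
    all_counts = [in_counts] + out_counts
    df = Counter(g for counts in all_counts for g in counts)
    num_docs = len(all_counts)
    idf = {g: math.log((1 + num_docs) / (1 + d)) + 1.0 for g, d in df.items()}
    vectors = [{g: tf * idf[g] for g, tf in counts.items()}
               for counts in all_counts]
    return [cosine(vectors[0], vec) for vec in vectors[1:]]
```

Under these assumptions, an out-of-domain sentence that shares conversational bigrams with the in-domain data receives a higher weight than one that shares none, which receives 0.0.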

College Admissions Consultation Chatbot based on Text Similarity (텍스트 유사도 기반의 대학 입시 상담 챗봇)

  • Lee, Se-Hoon;Cha, Hyun-Suk;Jeon, Chan-Ho;Baek, Yeong-Tae
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.441-442 / 2018
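  • In this paper, we developed a chatbot system for college admissions consultation based on text similarity. The system recognizes the user's text and provides an answer. For information that must be up to date, it processes crawled data before answering, and the user rates how useful the answer was so that suitable answers can be given. The user's text is recognized accurately using text similarity, and the user's questions and the answers are stored in a server DB so that, when a similar question is asked, an answer is provided using the stored answers and ratings.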

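
The retrieval step described in the abstract above (match a new question to stored questions by text similarity, then use stored ratings to pick an answer) might be sketched as follows. The abstract does not name the similarity measure or the storage schema, so the character-bigram Jaccard measure, the `StoredAnswer` record, the in-memory list standing in for the server DB, and the 0.5 threshold are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class StoredAnswer:
    question: str
    answer: str
    rating: float  # hypothetical user rating; higher is better


def char_bigrams(text):
    """Character bigrams of the lowercased text with whitespace removed."""
    compact = "".join(text.lower().split())
    return {compact[i:i + 2] for i in range(len(compact) - 1)}


def jaccard(a, b):
    """Jaccard similarity of the character-bigram sets of two strings."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def best_answer(query, store, threshold=0.5):
    """Return the best-rated answer among sufficiently similar questions.

    Returns None when no stored question reaches the threshold, in which
    case a real system would fall back to another answering strategy.
    """
    scored = [(jaccard(query, item.question), item) for item in store]
    similar = [(sim, item) for sim, item in scored if sim >= threshold]
    if not similar:
        return None
    # Prefer the highest rating; break ties by similarity.
    _, best = max(similar, key=lambda pair: (pair[1].rating, pair[0]))
    return best.answer
```

Character bigrams are used here only because they need no tokenizer, which is convenient for Korean text; any other surface similarity could be substituted.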

Keyword Reorganization Techniques for Improving the Identifiability of Topics (토픽 식별성 향상을 위한 키워드 재구성 기법)

  • Yun, Yeoil;Kim, Namgyu
    • Journal of Information Technology Services / v.18 no.4 / pp.135-149 / 2019
  • Recently, many studies have sought to extract meaningful information from large amounts of text data. Among the various applications for extracting information from text, topic modeling, which expresses latent topics as groups of keywords, is widely used. Topic modeling presents several topic keywords according to term/topic weight, and the quality of those keywords is usually evaluated through coherence, which reflects the similarity of those keywords. However, a topic quality evaluation method based only on keyword similarity has its limitations, because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose a method for reorganizing topic keywords to improve the identifiability of topics. To reorganize topic keywords, each document is first labeled with one representative topic, which can be extracted by traditional topic modeling. Classification rules for assigning each document to its label are then generated, and new topic keywords are extracted based on those rules. To evaluate the performance of our method, we performed an experiment on 1,000 news articles. The experiment confirmed that the keywords extracted by our proposed method have better identifiability than traditional topic keywords.

Text Verification Based on Sub-Image Matching (부분 영상 매칭에 기반한 텍스트 검증)

  • Son Hwa Jeong;Jeong Seon Hwa;Kim Soo Hyung
    • The KIPS Transactions:PartB / v.12B no.2 s.98 / pp.115-122 / 2005
  • The sub-image matching problem, in which one image contains some part of the other image, has mostly been investigated on natural images. In this paper, we propose two sub-image matching techniques, a mesh-based method and a correlation-based method, that can be used efficiently to match text images. The mesh-based method consists of two stages: box alignment, followed by similarity measurement using mesh features extracted from the two images. The correlation-based method determines the similarity from the correlation of the two images, computed with the FFT. We applied the two methods to text verification in a postal automation system and observed an accuracy of $92.7\%$ for the correlation-based method and $90.1\%$ for the mesh-based method.
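
The correlation-based idea in the abstract above can be illustrated with a small sketch that locates a sub-image by circular cross-correlation computed in the frequency domain. The abstract says only that the correlation is based on an FFT function, so everything else here is an assumption: the pure-Python radix-2 FFT, the square power-of-two image size, the zero padding, and the function names.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(grid, inverse=False):
    """2-D FFT: transform every row, then every column."""
    rows = [fft(row, inverse) for row in grid]
    cols = [fft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def pad(template, size):
    """Zero-pad a 2-D list to size x size, keeping it at the top-left."""
    height, width = len(template), len(template[0])
    return [[template[r][c] if r < height and c < width else 0
             for c in range(size)] for r in range(size)]


def locate(image, template):
    """Return (row, col, score) at the peak of the circular cross-correlation.

    Assumes a square image whose side is a power of two.
    """
    size = len(image)
    f_img = fft2(image)
    f_tpl = fft2(pad(template, size))
    # Correlation theorem: corr = IFFT(FFT(image) * conj(FFT(template))).
    product = [[a * b.conjugate() for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(f_img, f_tpl)]
    corr = fft2(product, inverse=True)
    scale = size * size  # the inverse transform above is unnormalized
    return max(((r, c, corr[r][c].real / scale)
                for r in range(size) for c in range(size)),
               key=lambda item: item[2])
```

The raw peak value depends on image brightness; a practical verifier would likely normalize the score (for example by the template energy) before comparing it with a threshold.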

Text Extraction Algorithm in Complex Images using Adaptive Edge detection (복잡한 영상에서 적응적 에지검출을 이용한 텍스트 추출 알고리즘 연구)

  • Shin, Seong;Kim, Sung-Dong;Baek, Young-Hyun;Moon, Sung-Ryong
    • Proceedings of the IEEK Conference / 2007.07a / pp.251-252 / 2007
  • This paper proposes a text extraction algorithm that uses the Coiflet wavelet, the YCbCr color model, and the closed-curve edge feature of an adaptive LoG operator to compensate for the weaknesses of existing methods, which perform poorly with complex backgrounds, varying illumination, disordered lines, and similar text and background colors. The algorithm is simulated on natural images that contain text regions, regardless of image size, resolution, or slant. Compared with an existing extraction algorithm on the same images, the proposed algorithm is confirmed to perform well.


Citation-based Article Summarization using a Combination of Lexical Text Similarities: Evaluation with Computational Linguistics Literature Summarization Datasets

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.24 no.7 / pp.31-37 / 2019
  • Citation-based article summarization creates a shortened text for an academic article that reflects the content of citing sentences, which contain others' thoughts about the target article. To address this problem, this study introduces an extractive summarization method based on a linear combination of several sentence salience scores, which represent the degrees to which a candidate sentence reflects the content of the author's abstract, the readers' citing text, and the target article itself. In the current study, salience scores are obtained by computing surface-level textual similarities. Experiments using CL-SciSumm datasets show that the proposed method matches or outperforms previous approaches in ROUGE evaluations against SciSumm-2017 human summaries and SciSumm-2016/2017 community summaries.
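
A linear combination of salience scores, as described in the abstract above, could look roughly like the sketch below. The abstract does not state which lexical similarities or weights are used, so the word-overlap (Jaccard) similarity, the weights `(0.4, 0.4, 0.2)`, and the function names are illustrative assumptions only.

```python
def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:()").lower() for w in text.split()} - {""}


def overlap(a, b):
    """Jaccard word overlap, used here as a surface-level similarity."""
    x, y = tokens(a), tokens(b)
    return len(x & y) / len(x | y) if x and y else 0.0


def summarize(sentences, abstract, citing_text, weights=(0.4, 0.4, 0.2), k=2):
    """Pick the k sentences with the highest combined salience score.

    Each score is a weighted sum of similarities to the abstract, to the
    citing text, and to the whole article (weights are hypothetical).
    """
    article = " ".join(sentences)
    w_abs, w_cit, w_art = weights
    scored = []
    for index, sentence in enumerate(sentences):
        score = (w_abs * overlap(sentence, abstract)
                 + w_cit * overlap(sentence, citing_text)
                 + w_art * overlap(sentence, article))
        scored.append((score, index))
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(i for _, i in top)]
```

In practice the weights would be tuned on held-out data rather than fixed by hand.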

Style-Specific Language Model Adaptation using TF*IDF Similarity for Korean Conversational Speech Recognition

  • Park, Young-Hee;Chung, Min-Hwa
    • The Journal of the Acoustical Society of Korea / v.23 no.2E / pp.51-55 / 2004
  • In this paper, we propose a style-specific language model adaptation scheme using n-gram based tf*idf similarity for Korean spontaneous speech recognition. Korean spontaneous speech shows distinctive style-specific characteristics such as filled pauses, word omission, and contraction, which are related to function words and depend on the preceding or following words. To reflect these style-specific characteristics and to overcome the lack of data for training the language model, we estimate an in-domain dependent n-gram model by relevance weighting out-of-domain text data according to their n-gram based tf*idf similarity, where the in-domain language model includes a disfluency model. Recognition results show that n-gram based tf*idf similarity weighting effectively reflects style differences.
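
To make the relevance-weighting step in the abstract above concrete, the sketch below accumulates fractional bigram counts, where each document contributes in proportion to a relevance weight. How the weights enter the actual model is not stated in the abstract; the weighted-count formulation, the unsmoothed maximum-likelihood estimate, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def weighted_bigram_counts(weighted_docs):
    """Accumulate bigram and history counts scaled by document weight.

    weighted_docs is an iterable of (tokens, weight) pairs; in-domain text
    would typically get weight 1.0 and out-of-domain text its similarity.
    """
    bigram = defaultdict(float)
    history = defaultdict(float)
    for doc_tokens, weight in weighted_docs:
        for prev, cur in zip(doc_tokens, doc_tokens[1:]):
            bigram[(prev, cur)] += weight
            history[prev] += weight
    return bigram, history


def bigram_prob(prev, cur, bigram, history):
    """Unsmoothed maximum-likelihood estimate of P(cur | prev)."""
    total = history.get(prev, 0.0)
    if total == 0.0:
        return 0.0
    return bigram.get((prev, cur), 0.0) / total
```

A real recognizer would add smoothing and back-off on top of these counts; the sketch omits them to keep the weighting itself visible.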

Similarity and Approximate Solutions of Laminar Film Condensation on a Flat Plate

  • Lee, Sung-Hong;Lee, Euk-Soo
    • Journal of Mechanical Science and Technology / v.15 no.9 / pp.1339-1345 / 2001
  • Laminar film condensation of a saturated pure vapor in forced flow over a flat plate is analyzed using boundary layer solutions. Similarity solutions for some real fluids are presented as a function of the modified Jakob number ($C_{pl}\,\Delta T/(\mathrm{Pr}\,h_{fg})$), with a property ratio (expression not given in this abstract; see full text) and $\mathrm{Pr}$ as parameters, and are compared with approximate solutions obtained from the energy and momentum equations without the convection and inertia terms in the liquid flow. The approximate solutions agree well with the similarity solutions when the modified Jakob number is less than 0.1 near 1 atmospheric pressure.


Objective Material analysis to the device with IoT Framework System

  • Lee, KyuTae;Ki, Jang Geun
    • International Journal of Advanced Culture Technology / v.8 no.2 / pp.289-296 / 2020
  • Copyrighted software is written as text documents and stored as files, so it is easily exposed to illegal copying. The IoT framework configuration and service environment are likewise evaluated in terms of software structure and are exposed to replication. Illegal copies can easily be created by intelligently modifying the program code in the framework system. This paper deals with similarity comparison to assess suspected illegal copying. In general, the original source code of both parties should be provided for similarity comparison. Recently, however, suspected developers have refused to provide their source code, so comparative evaluation is performed only with executable code. This study addresses how to analyze similarity using the executable code, the circuit configuration, and the interface state of the system, without the original source code. We propose a method for analyzing the object's data without source code and verify the similarity comparison results through evaluation examples.