• Title/Summary/Keyword: target text

Search Result 233, Processing Time 0.023 seconds

The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10b
    • /
    • pp.233-237
    • /
    • 2006
  • This study proposes an innovative measure for evaluating the performance of text clustering. In using K-means algorithm and Kohonen Networks for text clustering, the number clusters is fixed initially by configuring it as their parameter, while in using single pass algorithm for text clustering, the number of clusters is not predictable. Using labeled documents, the result of text clustering using K-means algorithm or Kohonen Network is able to be evaluated by setting the number of clusters as the number of the given target categories, mapping each cluster to a target category, and using the evaluation measures of text. But in using single pass algorithm, if the number of clusters is different from the number of target categories, such measures are useless for evaluating the result of text clustering. This study proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, what is called CI (Clustering Index) in this article.

  • PDF

A new approach technique on Speech-to-Speech Translation (신호의 복원된 위상 공간을 이용한 오디오 상황 인지)

  • Le, Thanh Hien;Lee, Sung-young;Lee, Young-Koo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.11a
    • /
    • pp.239-240
    • /
    • 2009
  • We live in a flat world in which globalization fosters communication, travel, and trade among more than 150 countries and thousands of languages. To surmount the barriers among these languages, translation is required; Speech-to-Speech translation will automate the process. Thanks to recent advances in Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), one can now utilize a system to translate a speech of source language to a speech of target language and vice versa in affordable manner. The three phase process establishes that the source speech be transcribed into a (set of) text of the source language (ASR) before the source text is translated into the target text (MT). Finally, the target speech is synthesized from the target text (TTS).

Perspective Coherence in Simultaneous Interpreting - with Reference to German-Korean Interpreting - (동시통역과 시각적 응집성 - 독한 통역을 중심으로 -)

  • Ahn In-Kyoung
    • Koreanishche Zeitschrift fur Deutsche Sprachwissenschaft
    • /
    • v.9
    • /
    • pp.169-193
    • /
    • 2004
  • In simultaneous interpreting, if the syntactic structure of the source language and the target language are very different, interpreters have to wait before being able to reformulate the source text segments into a meaningful utterance in target language. It is inevitable to adapt the target language structure to that of the source language so as not to unduly increase the memory load and to minimize the pause. While such adaptation enables simultaneous interpretating, it results in damaging the perspective coherence of the text. Discovering when such perspective coherence is impaired, and how the problem can be relieved, will enable interpreters to enhance their performance. This paper analyses the reasons for perspective coherence damage by looking at some examples of German-Korean simultaneous interpreting.

  • PDF

Text Watermarking Based on Syntactic Constituent Movement (구문요소의 전치에 기반한 문서 워터마킹)

  • Kim, Mi-Young
    • The KIPS Transactions:PartB
    • /
    • v.16B no.1
    • /
    • pp.79-84
    • /
    • 2009
  • This paper explores a method of text watermarking for agglutinative languages and develops a syntactic tree-based syntactic constituent movement scheme. Agglutinative languages provide a good ground for the syntactic tree-based natural language watermarking because syntactic constituent order is relatively free. Our proposed natural language watermarking method consists of seven procedures. First, we construct a syntactic dependency tree of unmarked text. Next, we perform clausal segmentation from the syntactic tree. Third, we choose target syntactic constituents, which will move within its clause. Fourth, we determine the movement direction of the target constituents. Then, we embed a watermark bit for each target constituent. Sixth, if the watermark bit does not coincide with the direction of the target constituent movement, we displace the target constituent in the syntactic tree. Finally, from the modified syntactic tree, we obtain a marked text. From the experimental results, we show that the coverage of our method is 91.53%, and the rate of unnatural sentences of marked text is 23.16%, which is better than that of previous systems. Experimental results also show that the marked text keeps the same style, and it has the same information without semantic distortion.

Font Change Blindness Triggered by the Text Difficulty in Moving Window Technique (움직이는 창 기법에서의 덩이글 난이도에 따른 글꼴 변화맹)

  • Seong-Jun Bak;Joo-Seok Hyun
    • Korean Journal of Cognitive Science
    • /
    • v.34 no.4
    • /
    • pp.259-275
    • /
    • 2023
  • The aim of this study was to investigate font change blindness based on text difficulty in the "Moving Window Task", as originally introduced by McConkie and Rayner(1975). During the reading process where the moving window was applied, different target words in terms of font style compared to the text were presented. As participants' gaze reached the position of the target word, the font of the target word was changed to match the text font. The font of the target word before the change was either sans-serif when the text font was serif, or serif when the text font was sans-serif. After completing the reading task, more than half of the participants(62.5%) reported not detecting the font change. Observation of eye movements at the target word positions revealed that when understanding the content within the text was difficult, there was an increase in the number of regressions, an extended gaze duration, and a reduction in saccade length. Specifically, the increase in the number of regressions was evident only when the text font was serif, in other words, when the font of the target word shifted from sans-serif to serif. These results suggest that sensory interference unrelated to content understanding is not easily detected during reading. However, the possibility of detection increases when comprehension of the content becomes challenging. Furthermore, this exceptional detection possibility implies that it may be higher when the text font is serif compared to when it is sans-serif.

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism

  • Myronenko, Serhii;Myronenko, Yelyzaveta
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.8
    • /
    • pp.242-248
    • /
    • 2022
  • The article presents the analysis of modern methods of automatic comparison of original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism - literal, when plagiarists directly make exact copying of the text without changing anything, and intelligent, using more sophisticated techniques, which are harder to detect due to the text manipulation, like words and signs replacement. Standard techniques related to extrinsic detection are string-based, vector space and semantic-based. The first, most common and most successful target models for detecting literal plagiarism - N-gram and Vector Space are analyzed, and their advantages and disadvantages are evaluated. The most effective target models that allow detecting intelligent plagiarism, particularly identifying paraphrases by measuring the semantic similarity of short components of the text, are investigated. Models using neural network architecture and based on natural language sentence matching approaches such as Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM) and Bidirectional Encoder Representations from Transformers (BERT) and its family of models are considered. The progress in improving plagiarism detection systems, techniques and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism - effective recognition of unoriginal ideas and qualitatively paraphrased text - are outlined.

Learning from the L2 Expository Text

  • Kim, Jung-Tae
    • English Language & Literature Teaching
    • /
    • v.10 no.3
    • /
    • pp.21-40
    • /
    • 2004
  • This study Questioned what happens in L2 reading comprehension of the expository text, as measured by recall and inference-making abilities, when a L2 reader was induced to develop a content schema about the topic of a target text, but the structure of that schema departs from the structure of the target text Seventy-four. Korean university students read either the same version text twice (consistent condition) or two different version texts (inconsistent condition) with a three-day interval between the two readings. The results of a verification test indicate that, for those subjects with higher L2 reading proficiency, the inconsistent condition was more beneficial than the consistent condition for the inference-making task. On the other hand, for lower-level L2 readers, the consistent condition was more favorable for the recall task. It was concluded that inducing a structurally inconsistent schema through an L2 pre-reading would be beneficial only when the reader's L2 linguistic ability is proficient enough to produce necessary propositions from the pre-reading.

  • PDF

Citation-based Article Summarization using a Combination of Lexical Text Similarities: Evaluation with Computational Linguistics Literature Summarization Datasets

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.7
    • /
    • pp.31-37
    • /
    • 2019
  • Citation-based article summarization is to create a shortened text for an academic article, reflecting the content of citing sentences which contain other's thoughts about the target article to be summarized. To deal with the problem, this study introduces an extractive summarization method based on calculating a linear combination of various sentence salience scores, which represent the degrees to which a candidate sentence reflects the content of author's abstract text, reader's citing text, and the target article to be summarized. In the current study, salience scores are obtained by computing surface-level textual similarities. Experiments using CL-SciSumm datasets show that the proposed method parallels or outperforms the previous approaches in ROUGE evaluations against SciSumm-2017 human summaries and SciSumm-2016/2017 community summaries.

Consolidation of Subtasks for Target Task in Pipelined NLP Model

  • Son, Jeong-Woo;Yoon, Heegeun;Park, Seong-Bae;Cho, Keeseong;Ryu, Won
    • ETRI Journal
    • /
    • v.36 no.5
    • /
    • pp.704-713
    • /
    • 2014
  • Most natural language processing tasks depend on the outputs of some other tasks. Thus, they involve other tasks as subtasks. The main problem of this type of pipelined model is that the optimality of the subtasks that are trained with their own data is not guaranteed in the final target task, since the subtasks are not optimized with respect to the target task. As a solution to this problem, this paper proposes a consolidation of subtasks for a target task ($CST^2$). In $CST^2$, all parameters of a target task and its subtasks are optimized to fulfill the objective of the target task. $CST^2$ finds such optimized parameters through a backpropagation algorithm. In experiments in which text chunking is a target task and part-of-speech tagging is its subtask, $CST^2$ outperforms a traditional pipelined text chunker. The experimental results prove the effectiveness of optimizing subtasks with respect to the target task.

BIOLOGY ORIENTED TARGET SPECIFIC LITERATURE MINING FOR GPCR PATHWAY EXTRACTION (GPCR 경로 추출을 위한 생물학 기반의 목적지향 텍스트 마이닝 시스템)

  • KIm, Eun-Ju;Jung, Seol-Kyoung;Yi, Eun-Ji;Lee, Gary-Geunbae;Park, Soo-Jun
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2003.10a
    • /
    • pp.86-94
    • /
    • 2003
  • Electronically available biological literature has been accumulated exponentially in the course of time. So, researches on automatically acquiring knowledge from these tremendous data by text mining technology become more and more prosperous. However, most of the previous researches are technology oriented and are not well focused in practical extraction target, hence result in low performance and inconvenience for the bio-researchers to actually use. In this paper, we propose a more biology oriented target domain specific text mining system, that is, POSTECH bio-text mining system (POSBIOTM), for signal transduction pathway extraction, especially for G protein-coupled receptor (GPCR) pathway. To reflect more domain knowledge, we specify the concrete target for pathway extraction and define the minimal pathway domain ontology. Under this conceptual model, POSBIOTM extracts interactions and entities of pathways from the full biological articles using a machine learning oriented extraction method and visualizes the pathways using JDesigner module provided in the system biology workbench (SBW) [14]

  • PDF