• Title/Summary/Keyword: text extraction

Search Result 454, Processing Time 0.027 seconds

A Study on the Text-Independent Speaker Recognition from the Vowel Extraction (모음 검출을 통한 텍스트 독립 화자인식에 관한 연구)

  • 김에녹;복혁규;김형래
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.10
    • /
    • pp.82-91
    • /
    • 1994
  • In this thesis, we perform the experiment of speaker recognition by identifying vowels in the pronounciation of each speaker. In detail, we extract the vowels from the pronounciation of each speaker first. From it, we check the frequency energgy of 29 channels. After changing these into fuzzy values, we employ the fuzzy inference to recognize the speaker by text-dependent and text-independent methods. For this experiment, an algorithm of extracting vowels is developed, and newly introduced parameter is the frequency energy of the 29 channels computed from the extracted vowels. It shows the features of each speakers better than existing parameters. The advanced point of this paramter is to use the reference pattern only without the help of any codebook. As a rewult, test-dependent method showed about 95.5% rate of recognition, and text-independent method showed about 94.2% rate of recognition.

  • PDF

Automatic Drawing Input by Segmentation of Text Region and Recognltion of Geometric Drawing Element (문자영역의 분리와 기하학적 도면요소의 인식에 의한 도면 자동입력)

  • 배창석;민병우
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.6
    • /
    • pp.91-103
    • /
    • 1994
  • As CAD systems are introduced in the filed of engineering design, the necessities for automatic drawing input are increased . In this paper, we propose a method for realizing automatic drawing input by separation of text regions and graphic regions, extraction of line vectors from graphic regions, and recognition of circular arcs and circles from line vectors. Sizes of isolated regions, on a drawing are used for separating text regions and graphic regions. Thinning and maximum allowable error method are used to extract line vectors. And geometric structures of line vectors are analyzed to recognize circular arcs and circles. By processing text regions and graphic regions separately, 30~40% of vector information can be reduced. Recognition of circular arcs and circles can increase the utilization of automatic drawing input function.

  • PDF

Keyword Automatic Extraction Scheme with Enhanced TextRank using Word Co-Occurrence in Korean Document (한글 문서의 단어 동시 출현 정보에 개선된 TextRank를 적용한 키워드 자동 추출 기법)

  • Song, KwangHo;Min, Ji-Hong;Kim, Yoo-Sung
    • Annual Conference on Human and Language Technology
    • /
    • 2016.10a
    • /
    • pp.62-66
    • /
    • 2016
  • 문서의 의미 기반 처리를 위해서 문서의 내용을 대표하는 키워드를 추출하는 것은 정확성과 효율성 측면에서 매우 중요한 과정이다. 그러나 단일문서로부터 키워드를 추출해 내는 기존의 연구들은 정확도가 낮거나 한정된 분야에 대해서만 검증을 수행하여 결과를 신뢰하기 어려운 문제가 있었다. 따라서 본 연구에서는 정확하면서도 다양한 분야의 텍스트에 적용 가능한 키워드 추출 방법을 제시하고자 단어의 동시출현정보와 그래프 모델을 바탕으로 TextRank 알고리즘을 변형한 새로운 형태의 알고리즘을 동시에 적용하는 키워드 추출 기법을 제안하였다. 제안한 기법을 활용하여 성능평가를 진행한 결과 기존의 연구들보다 향상된 정확도를 얻을 수 있음을 확인하였다.

  • PDF

Using Text Network Analysis for Analyzing Academic Papers in Nursing (간호학 학술논문의 주제 분석을 위한 텍스트네크워크분석방법 활용)

  • Park, Chan Sook
    • Perspectives in Nursing Science
    • /
    • v.16 no.1
    • /
    • pp.12-24
    • /
    • 2019
  • Purpose: This study examined the suitability of using text network analysis (TNA) methodology for topic analysis of academic papers related to nursing. Methods: TNA background theories, software programs, and research processes have been described in this paper. Additionally, the research methodology that applied TNA to the topic analysis of the academic nursing papers was analyzed. Results: As background theories for the study, we explained information theory, word co-occurrence analysis, graph theory, network theory, and social network analysis. The TNA procedure was described as follows: 1) collection of academic articles, 2) text extraction, 3) preprocessing, 4) generation of word co-occurrence matrices, 5) social network analysis, and 6) interpretation and discussion. Conclusion: TNA using author-keywords has several advantages. It can utilize recognized terms such as MeSH headings or terms chosen by professionals, and it saves time and effort. Additionally, the study emphasizes the necessity of developing a sophisticated research design that explores nursing research trends in a multidimensional method by applying TNA methodology.

Extending TextAE for annotation of non-contiguous entities

  • Lever, Jake;Altman, Russ;Kim, Jin-Dong
    • Genomics & Informatics
    • /
    • v.18 no.2
    • /
    • pp.15.1-15.6
    • /
    • 2020
  • Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will miss important information or identify spurious text that will frustrate users. Most tools do not capture non-contiguous entities which are separate spans of text that together refer to an entity, e.g., the entity "type 1 diabetes" in the phrase "type 1 and type 2 diabetes." This type is commonly found in biomedical texts, especially in lists, where multiple biomedical entities are named in shortened form to avoid repeating words. Most text annotation systems, that enable users to view and edit entity annotations, do not support non-contiguous entities. Therefore, experts cannot even visualize non-contiguous entities, let alone annotate them to build valuable datasets for machine learning methods. To combat this problem and as part of the BLAH6 hackathon, we extended the TextAE platform to allow visualization and annotation of non-contiguous entities. This enables users to add new subspans to existing entities by selecting additional text. We integrate this new functionality with TextAE's existing editing functionality to allow easy changes to entity annotation and editing of relation annotations involving non-contiguous entities, with importing and exporting to the PubAnnotation format. Finally, we roughly quantify the problem across the entire accessible biomedical literature to highlight that there are a substantial number of non-contiguous entities that appear in lists that would be missed by most text mining systems.

Pre-processing Algorithm for Detection of Slab Information on Steel Process using Robust Feature Points extraction (강건한 특징점 추출을 이용한 철강제품 정보 검출을 위한 전처리 알고리즘)

  • Choi, Jong-Hyun;Yun, Jong-Pil;Choi, Sung-Hoo;Koo, Keun-Hwi;Kim, Sang-Woo
    • Proceedings of the KIEE Conference
    • /
    • 2008.07a
    • /
    • pp.1819-1820
    • /
    • 2008
  • Steel slabs are marked with slab management numbers (SMNs). To increase efficiency, automated identification of SMNs from digital images is desirable. Automatic extraction of SMNs is a prerequisite for automatic character segmentation and recognition. The images include complex background, and the position of the text region of the slabs is variable. This paper describes an pre-processing algorithm for detection of slab information using robust feature points extraction. Using SIFT(Scale Invariant Feature Transform) algorithm, we can reduce the search region for extraction of SMNs from the slab image.

  • PDF

Size-Independent Caption Extraction for Korean Captions with Edge Connected Components

  • Jung, Je-Hee;Kim, Jaekwang;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.12 no.4
    • /
    • pp.308-318
    • /
    • 2012
  • Captions include information which relates to the images. In order to obtain the information in the captions, text extraction methods from images have been developed. However, most existing methods can be applied to captions with a fixed height or stroke width using fixed pixel-size or block-size operators which are derived from morphological supposition. We propose an edge connected components based method that can extract Korean captions that are composed of various sizes and fonts. We analyze the properties of edge connected components embedding captions and build a decision tree which discriminates edge connected components which include captions from ones which do not. The images for the experiment are collected from broadcast programs such as documentaries and news programs which include captions with various heights and fonts. We evaluate our proposed method by comparing the performance of the latent caption area extraction. The experiment shows that the proposed method can efficiently extract various sizes of Korean captions.

Latent Keyphrase Extraction Using Deep Belief Networks

  • Jo, Taemin;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.15 no.3
    • /
    • pp.153-158
    • /
    • 2015
  • Nowadays, automatic keyphrase extraction is considered to be an important task. Most of the previous studies focused only on selecting keyphrases within the body of input documents. These studies overlooked latent keyphrases that did not appear in documents. In addition, a small number of studies on latent keyphrase extraction methods had some structural limitations. Although latent keyphrases do not appear in documents, they can still undertake an important role in text mining because they link meaningful concepts or contents of documents and can be utilized in short articles such as social network service, which rarely have explicit keyphrases. In this paper, we propose a new approach that selects qualified latent keyphrases from input documents and overcomes some structural limitations by using deep belief networks in a supervised manner. The main idea of this approach is to capture the intrinsic representations of documents and extract eligible latent keyphrases by using them. Our experimental results showed that latent keyphrases were successfully extracted using our proposed method.

Text Region Verification in Natural Scene Images using Multi-resolution Wavelet Transform and Support Vector Machine (다해상도 웨이블릿 변환과 써포트 벡터 머신을 이용한 자연영상에서의 문자 영역 검증)

  • Bae Kyungsook;Choi Youngwoo
    • The KIPS Transactions:PartB
    • /
    • v.11B no.6
    • /
    • pp.667-674
    • /
    • 2004
  • Extraction of texts from images is a fundamental and important problem to understand the images. This paper suggests a text region verification method by statistical means of stroke features of the characters. The method extracts 36 dimensional features from $16\times16$sized text and non-text images using wavelet transform - these 36 dimensional features express stroke and direction of characters - and select 12 sub-features out of 36 dimensional features which yield adequate separation between classes. After selecting the features, SVM trains the selected features. For the verification of the text region, each $16\times16$image block is scanned and classified as text or non-text. Then, the text region is finally decided as text region or non-text region. The proposed method is able to verify text regions which can hardly be distin guished.

Analysis of User Requirements Prioritization Using Text Mining : Focused on Online Game (텍스트마이닝을 활용한 사용자 요구사항 우선순위 도출 방법론 : 온라인 게임을 중심으로)

  • Jeong, Mi Yeon;Heo, Sun-Woo;Baek, Dong Hyun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.43 no.3
    • /
    • pp.112-121
    • /
    • 2020
  • Recently, as the internet usage is increasing, accordingly generated text data is also increasing. Because this text data on the internet includes users' comments, the text data on the Internet can help you get users' opinion more efficiently and effectively. The topic of text mining has been actively studied recently, but it primarily focuses on either the content analysis or various improving techniques mostly for the performance of target mining algorithms. The objective of this study is to propose a novel method of analyzing the user's requirements by utilizing the text-mining technique. To complement the existing survey techniques, this study seeks to present priorities together with efficient extraction of customer requirements from the text data. This study seeks to identify users' requirements, derive the priorities of requirements, and identify the detailed causes of high-priority requirements. The implications of this study are as follows. First, this study tried to overcome the limitations of traditional investigations such as surveys and VOCs through text mining of online text data. Second, decision makers can derive users' requirements and prioritize without having to analyze numerous text data manually. Third, user priorities can be derived on a quantitative basis.