• Title/Summary/Keyword: Sentence Extraction

A Study of Fundamental Frequency for Focused Word Spotting in Spoken Korean (한국어 발화음성에서 중점단어 탐색을 위한 기본주파수에 대한 연구)

  • Kwon, Soon-Il;Park, Ji-Hyung;Park, Neung-Soo
    • The KIPS Transactions:PartB / v.15B no.6 / pp.595-602 / 2008
  • The focused word of each sentence helps in recognizing and understanding spoken Korean. To find a method for spotting focused words in speech signals, we analyzed the average and variance of the fundamental frequency (F0) and the average energy extracted from the focused word and from the other words in a sentence, using speech data from 100 spoken sentences. The results showed that focused words have either a higher relative average F0 or a higher relative variance of F0 than other words. Our findings contribute to characterizing the prosody of spoken Korean and to keyword extraction based on natural language processing.
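
The reported finding can be illustrated with a minimal Python sketch. The abstract does not define how "relative" values are computed, so normalizing each word's F0 mean and variance by the sentence-wide averages is an assumption here, as are the function names and the rule of picking the word with the largest relative value.

```python
from statistics import mean, pvariance


def relative_f0_stats(words):
    """Map (label, [F0 samples in Hz]) pairs to (label, rel_mean, rel_variance).

    Assumption: "relative" means divided by the average over all words
    in the sentence. The abstract does not give the exact normalization.
    """
    means = [mean(f0) for _, f0 in words]
    variances = [pvariance(f0) for _, f0 in words]
    sentence_mean = mean(means)
    sentence_var = mean(variances)
    return [
        (label, m / sentence_mean, v / sentence_var if sentence_var else 0.0)
        for (label, _), m, v in zip(words, means, variances)
    ]


def spot_focused_word(words):
    """Pick the word whose relative F0 mean or variance is largest (hypothetical rule)."""
    stats = relative_f0_stats(words)
    return max(stats, key=lambda s: max(s[1], s[2]))[0]


# Hypothetical three-word utterance with made-up F0 tracks.
utterance = [
    ("na", [120, 122, 121]),
    ("OJE", [180, 150, 210]),
    ("wasseo", [118, 119, 117]),
]
print(spot_focused_word(utterance))
```

In this made-up utterance the second word has both a higher mean and a much larger variance than its neighbors, so it is the one selected.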

Story Generation Method using User Information in Mobile Environment (모바일 환경에서 사용자 정보를 이용한 스토리 생성 방법)

  • Hong, Jeen-Pyo;Cha, Jeong-Won
    • Journal of Internet Computing and Services / v.14 no.3 / pp.81-90 / 2013
  • Mobile devices can collect useful user information because users always carry them. In this paper, we propose a method for automatic story generation and user topic extraction using user information in a mobile environment. The proposed method is as follows: (1) we collect user action information on the mobile device; (2) we extract topics from the collected information; (3) from the results of (2), we determine episodes for one day; and (4) we generate sentences using sentence templates and compose stories that are theme-based or time-based. Because the proposed method is simpler than previous methods, it can run entirely on the mobile device, which leaves no room for user information to leak. The proposed method is also more informative than previous methods because it provides sentence-based results. The extracted user topics, a result of our method, can be used to analyze user actions and user preferences.
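
Step (4), template-based sentence generation for a time-based story, can be sketched as follows. The episode fields, the template wording, and the names `TEMPLATES` and `generate_story` are all hypothetical; the abstract does not describe the actual templates or data format.

```python
from string import Template

# Hypothetical sentence templates keyed by episode type.
TEMPLATES = {
    "visit": Template("At $time, you visited $place."),
    "call": Template("At $time, you called $person."),
}


def generate_story(episodes):
    """Compose a time-based story: sort episodes by time, fill one template each."""
    ordered = sorted(episodes, key=lambda e: e["time"])
    return " ".join(TEMPLATES[e["type"]].substitute(e) for e in ordered)


# Made-up episodes for one day; "HH:MM" strings sort chronologically.
day = [
    {"type": "call", "time": "14:30", "person": "Mina"},
    {"type": "visit", "time": "09:00", "place": "the library"},
]
print(generate_story(day))
# At 09:00, you visited the library. At 14:30, you called Mina.
```

A theme-based story would group the episodes by topic before filling the templates instead of sorting by time.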

Automatic Extraction of Opinion Words from Korean Product Reviews Using the k-Structure (k-Structure를 이용한 한국어 상품평 단어 자동 추출 방법)

  • Kang, Han-Hoon;Yoo, Seong-Joon;Han, Dong-Il
    • Journal of KIISE:Software and Applications / v.37 no.6 / pp.470-479 / 2010
  • Most methods for extracting opinion words proposed in existing English-language studies may be difficult to apply directly to Korean. Additionally, the manual method suggested by studies in Korea is problematic because it takes a long time. English thesaurus-based extraction of Korean opinion words suffers from reduced precision because Korean and English words do not match one to one. Studies based on Korean phrase analyzers may fail because they select opinion words that occur with low frequency. Therefore, this study suggests the k-Structure (k=5 or 8) method, which may improve precision while complementing existing studies in Korea, for automatically extracting opinion words from a simple sentence in a given Korean product review. A simple sentence is defined as one composed of at least 3 words, i.e., a sentence including an opinion word within a distance of ±2 from the attribute name (e.g., the 'battery' of a camera) of an evaluated product (e.g., a 'camera'). In the performance experiment, opinion words for 8 previously given attribute names were automatically extracted with k-Structure from 1,868 product reviews collected from major domestic shopping malls, and their precision was estimated. The results showed that k=5 led to a recall of 79.0% and a precision of 87.0%, while k=8 led to a recall of 92.35% and a precision of 89.3%. A test was also conducted using PMI-IR (Pointwise Mutual Information - Information Retrieval), one of the methods suggested in English studies, which resulted in a recall of 55% and a precision of 57%.
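
Two ideas from this abstract can be sketched in Python: collecting candidate words within ±2 positions of an attribute name, and scoring an extraction by precision and recall. This is not the k-Structure method itself, whose details the abstract does not give; the function names and the whitespace tokenization are assumptions for illustration.

```python
def window_candidates(tokens, attribute, distance=2):
    """Return the tokens within `distance` positions of each occurrence of `attribute`."""
    found = []
    for i, tok in enumerate(tokens):
        if tok == attribute:
            lo = max(0, i - distance)
            hi = min(len(tokens), i + distance + 1)
            # Keep every token in the window except the attribute name itself.
            found.extend(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return found


def precision_recall(extracted, gold):
    """Precision = hits / extracted, recall = hits / gold, over distinct words."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Made-up English review standing in for a Korean one.
tokens = "this camera battery lasts long today".split()
print(window_candidates(tokens, "battery"))
# ['this', 'camera', 'lasts', 'long']
print(precision_recall(["long", "this"], ["long", "short"]))
# (0.5, 0.5)
```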

Eojeol-Block Bidirectional Algorithm for Automatic Word Spacing of Hangul Sentences (한글 문장의 자동 띄어쓰기를 위한 어절 블록 양방향 알고리즘)

  • Kang, Seung-Shik
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.441-447 / 2000
  • Automatic word spacing is needed to solve the automatic indexing problem for non-spaced documents and the space-insertion problem at line ends in character recognition systems. We propose a word spacing algorithm that automatically finds word spacing positions. It is based on recognizing Eojeol components using sentence partitioning and a bidirectional longest-match algorithm. The sentence partitioning extracts Eojeol-blocks where the Eojeol boundary is relatively clear, and a Korean morphological analyzer is applied bidirectionally to recognize Eojeol components. We tested the algorithm on two sentence groups of about 4,500 Eojeols. The space-level recall ratio was 97.3% and the Eojeol-level recall ratio was 93.2%.
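
The bidirectional longest-match idea can be sketched in Python as below. This is a simplified stand-in: a small hypothetical lexicon replaces the Korean morphological analyzer, the Eojeol-block partitioning step is omitted, and the tie-breaking rule (prefer the pass with fewer pieces) is an assumption rather than the paper's rule.

```python
def forward_longest_match(text, lexicon, max_len=6):
    """Scan left to right, taking the longest lexicon entry at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_longest_match(text, lexicon, max_len=6):
    """Scan right to left, taking the longest lexicon entry ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.insert(0, piece)
                j -= size
                break
    return words


def space_text(text, lexicon):
    """Run both passes and keep the segmentation with fewer pieces (assumed rule)."""
    fwd = forward_longest_match(text, lexicon)
    bwd = backward_longest_match(text, lexicon)
    return " ".join(fwd if len(fwd) <= len(bwd) else bwd)


# Toy English lexicon standing in for analyzer-recognized Eojeol components.
print(space_text("wewenttotown", {"we", "went", "to", "town"}))
# we went to town
```

Running both directions matters when the two passes disagree: a greedy forward scan can swallow the start of the next word, and the backward scan gives a second opinion anchored at the other end.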

Korean Summarization System using Automatic Paragraphing (단락 자동 구분을 이용한 문서 요약 시스템)

  • 김계성;이현주;이상조
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.681-686 / 2003
  • In this paper, we describe a system that extracts important sentences from Korean newspaper articles using automatic paragraphing. First, we detect repeated words between sentences. By observing the repeated words, the system computes the Closeness Degree between Sentences (CDS) from the degree of morphological agreement and the change of grammatical role. It then automatically divides a document into meaningful paragraphs, using the number of paragraphs specified by the user. Finally, it selects one representative sentence from each paragraph and generates a summary from the representative sentences. Although our system does not use features such as title, sentence position, or rhetorical structure, it is able to extract meaningful sentences to include in the summary.
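
The pipeline in this abstract (closeness between sentences, paragraph splitting, one representative sentence per paragraph) can be sketched roughly in Python. The real CDS uses morphological agreement and grammatical-role changes, which the abstract does not specify; the shared-word count below is only a stand-in, and the rule for picking the representative sentence is likewise an assumption.

```python
def closeness(a, b):
    """Stand-in for CDS: number of distinct words two sentences share."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def split_paragraphs(sentences, n_paragraphs):
    """Cut at the (n_paragraphs - 1) weakest links between adjacent sentences."""
    links = [closeness(sentences[i], sentences[i + 1])
             for i in range(len(sentences) - 1)]
    n_cuts = max(n_paragraphs - 1, 0)
    weakest = sorted(range(len(links)), key=lambda i: links[i])[:n_cuts]
    paragraphs, start = [], 0
    for cut in sorted(weakest):
        paragraphs.append(sentences[start:cut + 1])
        start = cut + 1
    paragraphs.append(sentences[start:])
    return paragraphs


def summarize(sentences, n_paragraphs):
    """Pick, per paragraph, the sentence sharing the most words with its neighbors."""
    if not sentences:
        return []
    summary = []
    for para in split_paragraphs(sentences, n_paragraphs):
        best = max(
            range(len(para)),
            key=lambda i: sum(closeness(para[i], para[j])
                              for j in range(len(para)) if j != i),
        )
        summary.append(para[best])
    return summary


# Made-up four-sentence document with two obvious topics.
doc = ["the cat sat", "the cat ran", "stocks fell today", "stocks fell again"]
print(summarize(doc, 2))
# ['the cat sat', 'stocks fell today']
```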

A Korean Sentence and Document Sentiment Classification System Using Sentiment Features (감정 자질을 이용한 한국어 문장 및 문서 감정 분류 시스템)

  • Hwang, Jaw-Won;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters / v.14 no.3 / pp.336-340 / 2008
  • Sentiment classification is a recent subdiscipline of text classification that is concerned not with topic but with opinion. In this paper, we present a Korean sentence and document classification system using effective sentiment features. Korean sentiment classification starts by constructing effective sentiment feature sets for the positive and negative classes. The synonym information of an English word thesaurus is used to extract effective sentiment features, and the extracted English sentiment features are then translated into Korean features using an English-Korean dictionary. A sentence or a document is represented using the extracted sentiment features and is classified and evaluated with an SVM (Support Vector Machine).

A Study on Extraction of Character String in Document Image Using Morphology (Morphology를 이용한 문서화상내의 문자열 추출에 관한 연구)

  • 장희돈;김동현;김석태;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.1 / pp.123-132 / 1993
  • This paper presents the segmentation of sentence areas and diagram areas in a document image. To extract the sentence area, we apply dilation, a basic morphological operation, to the document image and obtain a smeared document image. After the smeared document image is divided into blocks, we determine the writing direction from the vertical and horizontal characteristics of the document image and calculate the skew from it. We then correct the orientation of the document image and extract the character strings from the corrected document. Eleven document images from three classes were considered, and the character strings were extracted successfully from all 11 images.

Mining Parallel Text from the Web based on Sentence Alignment

  • Li, Bo;Liu, Juan;Zhu, Huili
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.285-292 / 2007
  • The parallel corpus is an important resource in data-driven natural language processing research, but only a few parallel corpora are publicly available, mostly because of the high labor cost of constructing this kind of resource. This paper proposes a novel strategy for automatically fetching parallel text from the web, which may help to address the lack of high-quality parallel corpora. The system we developed first downloads the web pages from certain hosts. Candidate parallel page pairs are then prepared from the page set based on the outer features of the web pages. In the last step, the candidate page pairs are evaluated: the sentences in each candidate pair are extracted and aligned, and the similarity of the two web pages is then evaluated based on the similarities of the aligned sentences. Experiments on a multilingual web site show satisfactory performance of the system.
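
The final evaluation step, scoring a candidate page pair from the similarities of its aligned sentences, can be sketched in Python. The abstract names neither the alignment method nor the sentence similarity measure, so this sketch assumes sentences are aligned by position and uses a length ratio as a placeholder similarity; both choices and all function names are hypothetical.

```python
def length_similarity(src, tgt):
    """Placeholder sentence similarity: ratio of shorter to longer length (0 to 1)."""
    a, b = len(src), len(tgt)
    return min(a, b) / max(a, b) if max(a, b) else 1.0


def page_similarity(src_sents, tgt_sents):
    """Average similarity of position-aligned sentences.

    Dividing by the longer page's sentence count means that sentences
    left without a partner lower the score.
    """
    pairs = list(zip(src_sents, tgt_sents))
    if not pairs:
        return 0.0
    total = sum(length_similarity(s, t) for s, t in pairs)
    return total / max(len(src_sents), len(tgt_sents))


# Made-up pages: the second sentence pair differs in length by a factor of two.
print(page_similarity(["abcd", "ab"], ["abcd", "abcd"]))
# 0.75
```

A page pair would be accepted as parallel when this score exceeds a threshold, which would need to be tuned on known parallel pages.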

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines a number of natural language processing methods, such as NER (named entity recognition), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source consists of texts obtained from the web using a domain-specific data extraction agent. A framework for extracting information from unstructured data was developed using these natural language processing methods. We simulated the performance of our work in extracting and analyzing texts to detect organizational structures. The simulation shows that our approach outperformed other NER classifiers such as MUC and CoNLL on information extraction.

Constructing Tagged Corpus and Cue Word Patterns for Detecting Korean Hedge Sentences (한국어 Hedge 문장 인식을 위한 태깅 말뭉치 및 단서어구 패턴 구축)

  • Jeong, Ju-Seok;Kim, Jun-Hyeouk;Kim, Hae-Il;Oh, Sung-Ho;Kang, Sin-Jae
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.761-766 / 2011
  • A hedge is a linguistic device for expressing uncertainty. Hedges are used in a sentence when the writer is uncertain or has doubts about the contents of the sentence. Because of this uncertainty, sentences with hedges are considered non-factual. Many applications need to determine whether a sentence is factual or not. Detecting hedges benefits information retrieval, information extraction, and QnA systems, which use non-hedge sentences as targets to get more accurate results. In this paper, we constructed a Korean hedge corpus, extracted generalized hedge cue-word patterns from the corpus, and then used them to detect hedges. In our experiments, we achieved an F1-measure of 78.6%.
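
Cue-pattern hedge detection and the F1-measure used to evaluate it can be sketched in Python. The paper's Korean cue-word patterns are not listed in the abstract, so the English cue phrases below are hypothetical stand-ins, as are the function names.

```python
import re


def compile_cues(cues):
    """Turn cue phrases into case-insensitive whole-word regular expressions."""
    return [re.compile(r"\b" + re.escape(c) + r"\b", re.IGNORECASE) for c in cues]


def is_hedge(sentence, patterns):
    """A sentence counts as a hedge if any cue pattern matches it."""
    return any(p.search(sentence) for p in patterns)


def f1_score(predicted, gold):
    """F1 = harmonic mean of precision and recall over boolean labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical English cue phrases.
patterns = compile_cues(["may", "seems to", "possibly"])
print(is_hedge("It may rain tomorrow.", patterns))   # True
print(is_hedge("It rained yesterday.", patterns))    # False
print(f1_score([True, True, False, False], [True, False, True, False]))  # 0.5
```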