• Title/Summary/Keyword: Corpus analysis


A Study on the Copyright for the Digital Data (디지털자료에 대한 저작권적 해석에 관한 연구 - 코퍼스를 중심으로 -)

  • 남영준
    • Journal of the Korean Society for information Management / v.14 no.1 / pp.161-181 / 1997
  • The purpose of this paper is to analyze the legality of restructuring and using library data as a corpus, that is, as data for linguistic analysis. In the course of the analysis, a definition of corpus is attempted and its possible application areas are described. The paper also shows how these applications relate to copyright. In addition, it analyzes problems in current interpretations of copyright for library materials from the standpoint of material read by machines rather than by people, especially when the data is stored in digital form.


Implementation of morphological analyzer and spelling corrector for character recognition post-processing (문자 인식 후처리를 위한 형태소 분석기와 문자 교정기의 구현)

  • 이영화;김규성;김영훈;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.5 / pp.82-92 / 1997
  • In this paper, we propose a post-processing method that corrects characters misrecognized by a character recognizer, using a morphological analyzer and a spelling corrector. The proposed post-processing consists of three phases. First, the input passes through a morphological analyzer that outputs only the information needed for spelling correction, does not analyze multi-phrase units, and detects the location of the misrecognized character. Second, candidate characters are generated and tagged using a character substitution table and a grapheme substitution/separation table, and the analysis is retried after the misrecognized character has been substituted. Finally, a correction is selected. To construct the tables, we investigated misrecognized characters in a corpus. The reliability analysis used the frequencies of about 100,000 words randomly selected from the corpus. The Korean character recognizer achieves a 93% correct recognition rate without post-processing; with post-processing, the overall recognition rate of our system exceeds 97%.
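The candidate-generation and retry phases described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `SUBSTITUTIONS` table, the `LEXICON` with its frequencies, and the use of a plain lexicon lookup in place of a morphological analyzer. Picking the most frequent valid candidate is also an assumed selection rule, not the paper's stated one, and Latin characters stand in for Korean graphemes.

```python
from itertools import product

# Hypothetical confusion table: recognized character -> plausible intended characters.
SUBSTITUTIONS = {"1": ["l", "i"], "0": ["o"], "5": ["s"]}

# Hypothetical lexicon with corpus frequencies; a stand-in for a morphological analyzer.
LEXICON = {"corpus": 120, "analysis": 95, "tool": 40, "toll": 30, "toil": 5}


def is_valid(word):
    """Stand-in for 'the morphological analyzer accepts this word'."""
    return word in LEXICON


def candidates(word):
    """Yield every variant of `word` obtained by applying the substitution table."""
    options = [[ch] + SUBSTITUTIONS.get(ch, []) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def correct(word):
    """Return the word unchanged if valid, else the most frequent valid candidate."""
    if is_valid(word):
        return word
    valid = [c for c in candidates(word) if is_valid(c)]
    if not valid:
        return word  # no correction found; leave the recognizer output as-is
    return max(valid, key=LEXICON.get)


print(correct("c0rpus"))
print(correct("ana1ys15"))
```

The number of candidates grows multiplicatively with each confusable character, so a real system would likely restrict substitution to the position flagged by the analyzer, as the abstract suggests.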


Building an Exceptional Pronunciation Dictionary For Korean Automatic Pronunciation Generator (한국어 자동 발음열 생성을 위한 예외발음사전 구축)

  • Kim, Sun-Hee
    • Speech Sciences / v.10 no.4 / pp.167-177 / 2003
  • This paper presents a method of building an exceptional pronunciation dictionary for a Korean automatic pronunciation generator. An automatic pronunciation generator is an essential element of a speech recognition system and a TTS (Text-To-Speech) system. It is composed of a set of regular rules and an exceptional pronunciation dictionary. The exceptional pronunciation dictionary is created by extracting words with exceptional pronunciations from a text corpus, based on the characteristics of such words identified through phonological research and text analysis. The method thus contributes to improving the performance of the Korean automatic pronunciation generator as well as of speech recognition and TTS systems.
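The two-part design described in this abstract (regular rules plus an exception dictionary) can be sketched as a lookup that falls back to rules. This is only an illustration under stated assumptions: input is romanized rather than Hangul, the single rule shown is a simplified nasalization rule (a `k` before `m` or `n` becomes `ng`), and the `EXCEPTIONS` entry is a placeholder rather than real lexical data from the paper.

```python
import re

# Toy regular rule on romanized input: k before m or n is nasalized to ng.
REGULAR_RULES = [(re.compile(r"k(?=[mn])"), "ng")]

# Placeholder exception entry (hypothetical; not taken from the paper's dictionary).
EXCEPTIONS = {"placeholderword": "placeholder-pronunciation"}


def pronounce(word):
    """Consult the exception dictionary first, then fall back to the regular rules."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for pattern, replacement in REGULAR_RULES:
        word = pattern.sub(replacement, word)
    return word


print(pronounce("gukmul"))
print(pronounce("placeholderword"))
```

Checking the dictionary before the rules keeps the rule set small, since irregular words never need to be encoded as special-case rules.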


An Establishment of Entrepreneurship Ontology through Analysis of Intellectual Structure in Entrepreneurship Research (창업학 지식구조 분석결과를 활용한 창업 온톨로지 구축)

  • Shimi, Jaehu;Choi, Myeonggil
    • Journal of Information Technology Applications and Management / v.20 no.2 / pp.161-176 / 2013
  • Entrepreneurship studies have sought to help entrepreneurs in the start-up stage, but their outcomes are not fully utilized to guide the activities of entrepreneurs in start-up businesses. To use the outcomes of entrepreneurship research to help entrepreneurs effectively, an entrepreneurship ontology, a systematized specification of the knowledge in entrepreneurship research, has to be established. Based on the entrepreneurship ontology, the knowledge of entrepreneurial processes can be illustrated, and a diagnosis and coaching system for entrepreneurs can be built effectively. To establish an entrepreneurship ontology, this study investigates the intellectual structure of entrepreneurship studies by analyzing the contents of top journals in the entrepreneurship field, and identifies the relationships among the key concepts through bibliometric analyses based on an entrepreneurship corpus. This study suggests a method of establishing an entrepreneurship ontology and of utilizing it. Through utilization of the entrepreneurship ontology, it is expected that entrepreneurial processes can be explained effectively and the rate of business success improved.

Developing a Sentiment Analysing and Tagging System (감성 분석 및 감성 정보 부착 시스템 구현)

  • Lee, Hyun Gyu;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.5 no.8 / pp.377-384 / 2016
  • Our goal is to build a system that collects tweets from Twitter, analyzes the sentiment of each tweet, and helps users build a sentiment-tagged corpus semi-automatically. After collecting tweets with the Twitter API, we analyze their sentiment with a sentiment dictionary. With the proposed system, users can verify the system's results and insert new sentiment words or dependency relations where sentiment information exists. Sentiment information is tagged in a JSON structure, which is convenient for building and accessing the corpus. On a test set, the system shows about 76% accuracy in classifying the sentiment of sentences as positive, neutral, or negative.
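The dictionary-based tagging step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' system: the `SENTIMENT_DICT` entries, the whitespace tokenizer, the summed-polarity decision rule, and the JSON field names (`text`, `sentiment`, `evidence`) are all assumptions, and tweet collection through the Twitter API is left out entirely.

```python
import json

# Hypothetical sentiment dictionary: word -> polarity (+1 positive, -1 negative).
SENTIMENT_DICT = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}


def tag_sentence(text):
    """Label a sentence and record which dictionary words supported the label."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    hits = [
        {"word": t, "polarity": SENTIMENT_DICT[t]}
        for t in tokens
        if t in SENTIMENT_DICT
    ]
    score = sum(h["polarity"] for h in hits)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"text": text, "sentiment": label, "evidence": hits}


# Serialize one tagged record as a JSON line, ready to append to a corpus file.
record = tag_sentence("I love this great phone!")
print(json.dumps(record, ensure_ascii=False))
```

Keeping the matched words in the record lets a human reviewer see why a label was assigned, which fits the semi-automatic verification workflow the abstract describes.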

A Method of Intonation Modeling for Corpus-Based Korean Speech Synthesizer (코퍼스 기반 한국어 합성기의 억양 구현 방안)

  • Kim, Jin-Young;Park, Sang-Eon;Eom, Ki-Wan;Choi, Seung-Ho
    • Speech Sciences / v.7 no.2 / pp.193-208 / 2000
  • This paper describes a multi-step method of intonation modeling for a corpus-based Korean speech synthesizer. We selected 1833 sentences covering various syntactic structures and built a corresponding speech corpus uttered by a female announcer. We detected the pitch using laryngograph signals, manually marked the prosodic boundaries on the recorded speech, and carried out part-of-speech tagging and syntactic analysis on the text. The detected pitch was separated into three frequency bands (low, mid, and high), which correspond to the baseline, the word tone, and the syllable tone. We predicted them using the CART method and the Viterbi search algorithm with a word-tone dictionary. Of the collected spoken sentences, 1500 were used for training and 333 for testing. In the word tone modeling layer, we compared two methods. One predicts the word tone corresponding to the mid-frequency components directly; the other predicts it by multiplying the ratio of the word tone to the baseline by the baseline. The former method resulted in a mean error of 12.37 Hz and the latter in 12.41 Hz, which are similar. In the syllable tone modeling layer, the mean error rate was less than 8.3% relative to the announcer's mean pitch of 193.56 Hz, so performance was relatively good.
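The figures in this abstract can be put on a common scale with a short calculation. The sketch below assumes that "mean error" means mean absolute error in Hz and that an error rate is the error divided by the announcer's mean pitch; neither definition is stated in the abstract, and the function names are illustrative only.

```python
MEAN_PITCH_HZ = 193.56  # announcer's mean pitch, as reported in the abstract


def word_tone_from_ratio(ratios, baselines):
    """Second method: recover the word tone as (word tone / baseline) * baseline."""
    return [r * b for r, b in zip(ratios, baselines)]


def mean_abs_error(predicted, target):
    """Mean absolute error in Hz (assumed definition of 'mean error')."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def relative_error(error_hz, mean_pitch_hz=MEAN_PITCH_HZ):
    """Express an error in Hz as a fraction of the mean pitch."""
    return error_hz / mean_pitch_hz


# Word tone errors reported for the two methods, relative to the mean pitch.
for error_hz in (12.37, 12.41):
    print(f"{error_hz} Hz is {relative_error(error_hz):.2%} of the mean pitch")

# The 8.3% syllable tone bound, converted back to Hz.
print(f"8.3% of the mean pitch is {0.083 * MEAN_PITCH_HZ:.2f} Hz")
```

Under these assumptions both word tone methods land at roughly 6.4% of the mean pitch, which is consistent with the abstract's remark that the two results are similar.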


Building an RST-tagged Corpus and its Classification Scheme for Korean News Texts (한국어 수사구조 분류체계 수립 및 주석 코퍼스 구축)

  • Noh, Eunchung;Lee, Yeonsoo;Kim, YeonWoo;Lee, Do-Gil
    • Annual Conference on Human and Language Technology / 2016.10a / pp.33-38 / 2016
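  • Rhetorical structure refers to the relations among the constituents of a text, and a writer's intent can be conveyed to readers more effectively through a logical structure. It is therefore necessary to organize paragraphs and sentence structure with rhetorical structure in mind so as to maximize the cognitive effect on readers. Nevertheless, there has so far been no attempt to create a Korean classification scheme based on rhetorical structure or to design an annotated corpus. In this study, building on existing rhetorical structure theory, we refined a classification scheme of 30 types suited to the format of Korean news reports and built a corpus tagged at the level of minimal discourse units. In addition, based on the constructed corpus, we examined the characteristics and distribution of sentence structures, including topic sentences, and the genre characteristics of newspaper articles, and thereby identified how cohesion is realized in text and what syntactic features it shows. This study is significant in that it designs a rhetorical structure classification scheme suited to Korean discourse and, for the first time, builds an annotated corpus using it.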


Cumulative Effects of Constituents from the Mushroom Calvatia nipponica on the Contractility of Penile Corpus Cavernosum Smooth Muscle

  • Lee, Seulah;Kim, Min-Ji;Lee, Bum Soo;Ryoo, Rhim;Kim, Hye Kyung;Kim, Ki Hyun
    • Mycobiology / v.48 no.2 / pp.153-156 / 2020
  • Calvatia nipponica, a puffball mushroom (Agaricaceae), is thought to be an aphrodisiac, as this mushroom is traditionally known to improve sexual function in males. As part of a systematic study to determine the bioactive secondary metabolites from C. nipponica responsible for aphrodisiac effects, chemical analysis of methanol (MeOH) extracts of the fruiting bodies of C. nipponica resulted in the isolation of two major compounds: N,N-dimethyl-anthranilic acid (1) and (7Z,10Z)-7,10-octadecadienoic acid methyl ester (2). Compounds 1 and 2 were evaluated for cumulative dose-dependent relaxation responses in precontracted penile corpus cavernosum smooth muscle (PCCSM). Results show that compounds 1 and 2 exhibited maximum relaxation effects of 20.33 ± 2.18% and 24.63 ± 3.60%, respectively. These findings indicate that compounds 1 and 2, major components of C. nipponica, could potentially be used to treat erectile dysfunction, functioning as natural aphrodisiacs.

This study revises Lee Hyo-seok's The Buckwheat Season, utilizing Novel Corpus, intermediate learners' level (소설텍스트의 난이도 조정 방안 연구 -이효석의 「메밀꽃 필 무렵」을 중심으로-)

  • Hwang, Hye ran
    • Journal of Korean language education / v.29 no.4 / pp.255-294 / 2018
  • The Buckwheat Season, regarded as the finest of Lee Hyo-seok's works, is one of the short stories that represent Korean literature. However, its vivid literary expressions, such as lyrical and beautiful descriptions, figurative language, and dialect, which convey a Korean sense of beauty, instead cause difficulty for learners and hinder reading comprehension. It is therefore necessary to revise the text and present it at the learners' language level. Methods of revising a literary text include revising linguistic elements, such as obscure vocabulary or sentence structure, and revising the composition of the text, e.g. presenting the characters or plot, or inserting illustrations. Methods of revising the language of the text can be divided into simplification and elaboration. However, in the process of revision, much depends on the adapter's subjective perception rather than on objective criteria. This paper revised the text, utilizing by the Academy of Korean Studies, , and the by the National Institute of Korean Language to secure objectivity in revising the text.