• Title/Summary/Keyword: Corpus Frequency

Construction of an Efficient Pre-analyzed Dictionary for Korean Morphological Analysis (한국어 형태소 분석을 위한 효율적 기분석 사전의 구성 방법)

  • Kwak, Sujeong;Kim, Bogyum;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.881-888 / 2013
  • A pre-analyzed dictionary is used to increase the speed and accuracy of morphological analyzers and to reduce over-generation. However, if the dictionary includes 'insufficiently analyzed word-phrases', entries that do not list all possible analyses of the word-phrase, analysis accuracy may decrease. In this paper, we use the Sejong corpus to measure how accuracy changes with word-phrase frequency and corpus size. The performance of the integrated system (SMA with the pre-analyzed dictionary) is highest when the sufficient-analysis rate of the dictionary exceeds 99.82%. This rate is reached when the dictionary is built from word-phrases with a frequency of more than 32 (64) in a corpus of 1,600,000 (6,300,000) word-phrases.
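
The frequency cutoff described in this abstract can be illustrated with a minimal sketch. It assumes the tagged corpus is available as (word-phrase, analysis) pairs; the function name `build_pre_analyzed_dictionary`, the input format, and the toy data are hypothetical, since the abstract does not describe the authors' actual construction procedure.

```python
from collections import Counter, defaultdict


def build_pre_analyzed_dictionary(tagged_corpus, min_freq):
    """Map each frequent word-phrase to every analysis observed for it.

    tagged_corpus: iterable of (word_phrase, analysis) pairs (assumed format).
    min_freq: keep only word-phrases seen more than this many times.
    """
    freq = Counter()
    analyses = defaultdict(set)
    for word_phrase, analysis in tagged_corpus:
        freq[word_phrase] += 1
        analyses[word_phrase].add(analysis)
    # Frequent word-phrases are more likely to have shown all their analyses.
    return {wp: sorted(analyses[wp]) for wp, n in freq.items() if n > min_freq}


# Toy data, invented for illustration only.
corpus = [
    ("나는", "나/NP+는/JX"),
    ("나는", "날/VV+는/ETM"),
    ("나는", "나/NP+는/JX"),
    ("학교에", "학교/NNG+에/JKB"),
]
dictionary = build_pre_analyzed_dictionary(corpus, min_freq=2)
print(dictionary)
```

With a real corpus, `min_freq` would be set to a cutoff such as the 32 or 64 reported above, depending on corpus size.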

Extracting Core Events Based on Timeline and Retweet Analysis in Twitter Corpus (트위터 문서에서 시간 및 리트윗 분석을 통한 핵심 사건 추출)

  • Tsolmon, Bayar;Lee, Kyung-Soon
    • KIPS Transactions on Software and Data Engineering / v.1 no.1 / pp.69-74 / 2012
  • Many internet users turn their attention to issues posted on social network services within a very short time. When a major social issue or event occurs, it affects the number of comments and retweets on Twitter that day. In this paper, we propose a method for extracting core events based on timeline analysis, sentiment features, and retweet information in Twitter data. To validate our method, we compared it with methods using only word frequency, word frequency with sentiment analysis, only the chi-square method, and sentiment analysis with the chi-square method. To evaluate the proposed approach, we measured the accuracy of correct answers in the top 10 results. The proposed method achieved 94.9% accuracy. The experimental results show that the proposed method is effective for extracting core events from a Twitter corpus.
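
One of the baselines named in this abstract, the chi-square method, can be sketched as follows. The sketch assumes a 2x2 contingency table that counts tweets by whether they contain a term and whether they were posted on the day of interest. The function name and the counts are hypothetical, and the abstract does not say how the authors build their table or combine the score with sentiment and retweet features.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: tweets on the target day that contain the term
    b: tweets on the target day that do not contain the term
    c: tweets on other days that contain the term
    d: tweets on other days that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (a * d - b * c) ** 2 / denominator


# Invented counts: the term is far more common on the target day.
bursty = chi_square(30, 70, 10, 890)
# Invented counts: the term is equally common on all days.
flat = chi_square(10, 90, 10, 90)
print(bursty, flat)
```

A term whose score is high on a given day is concentrated on that day, which is what makes it a candidate marker of an event.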

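A Study of Errors in Headwords of Chinese-Korean Dictionaries Using a Chinese Corpus and the Internet: Focusing on F2-1 (중국 코퍼스와 인터넷을 이용한 중한사전 표제어의 오류 연구 - F2-1을 중심으로)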

  • Baek, Jong-In
    • 중국학논총 / no.63 / pp.47-64 / 2019
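  • Chinese-Korean dictionaries currently circulating in Korea contain a large number of entries, yet on almost any page one can find baffling words, some of which do not appear even in the Hanyu Da Cidian. We found that these words are often given incorrect definitions, and this paper mainly examines such incorrectly defined words. To this end, we first selected from a Chinese-Korean dictionary the words that occur fewer than ten times in a modern Chinese corpus. We consider the words selected this way likely to be non-standard or rarely used today. To verify this assumption more precisely, the author also searched Baidu for usage examples, and found that these words do have various problems and that the incorrectly defined ones need correcting. This paper examines the appropriateness of the 1,500 entries in section F2-1. The study found 348 low-frequency entries in section F2-1, of which 45 have various problems. Notably, the dictionary's explanations of these low-frequency entries contain many errors, and many of the words are not suitable for inclusion in a dictionary at all. We divide the erroneous words into three groups for discussion: 1. incorrect definitions, 2. missing senses, 3. other errors. We will continue to study entries in the other sections, and hope that this research will help the editing of Chinese-Korean dictionaries.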

Korean prosodic properties between read and spontaneous speech (한국어 낭독과 자유 발화의 운율적 특성)

  • Yu, Seungmi;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.14 no.2 / pp.39-54 / 2022
  • This study aims to clarify the prosodic differences between speech types by examining Korean read speech and spontaneous speech in the Korean part of the L2 Korean Speech Corpus (a speech corpus for Korean as a foreign language). To this end, the articulation length, articulation speed, pause length and frequency, and the average fundamental frequency values of sentences were set as variables and analyzed with statistical methods (t-test, correlation analysis, and regression analysis). The results show that read speech and spontaneous speech differ structurally in the form of the prosodic phrases constituting each sentence and that the prosodic elements differentiating the two speech types are articulation length, pause length, and pause frequency. In read speech, the correlation between articulation speed and articulation length was the highest, indicating that the longer a given sentence is, the faster the speaker speaks. In spontaneous speech, however, the correlation between the articulation length and the pause frequency in a sentence was high. Overall, spontaneous speech produces more pauses because short intonation phrases are continuously built up to make a sentence, and as a result, the sentence is lengthened.

Relation of Ethanol and Calcium to Contractile and Electrical Activity of Cat Stomach (고양이 위(胃)의 수축 및 전기활동에 대한 에탄올과 칼슘의 관계)

  • Kim, Myung-Suk;Sim, Sang-Soo;Yoon, Shin-Hee;Han, Sang-Jun;Kim, Chung-Chin;Choi, Hyun
    • The Korean Journal of Physiology / v.21 no.2 / pp.259-272 / 1987
  • This study was carried out to investigate the effect of calcium on spontaneous contraction and electrical activity induced by ethanol in gastric smooth muscle. After peeling off the mucous membrane from the isolated whole stomach of 102 cats, two kinds of small muscle preparations $(2.0{\times}0.2\;cm)$, one longitudinal and the other circular, were excised from the fundus, the corpus and the antrum portion of each whole stomach specimen. The isometric contraction of the small muscle preparation was measured in a cylinder-shaped chamber filled with Krebs-Ringer-dextrose solution (pH 7.4, temperature $36{\pm}0.5^{\circ}C$) bubbled with 5% $CO_2$ in $O_2$. A large muscle preparation $(5.0{\times}1.2\;cm)$ was excised from the anterior wall of the corpus-antrum portion of the same specimen in 72 of the 102 cats. The gastric electrical activity (slow wave and spike potential) was recorded monopolarly by four capillary electrodes (Ag-AgCl), of which two were placed on the corpus and two on the antrum, in a muscle chamber filled with the same solution as described above. Changes in the amplitude of the contraction, the frequency of the gastric slow wave and the production of the spike potential were observed after adding ethanol and/or under treatment with verapamil, $CaCl_2$ and Ca-free Krebs-Ringer-dextrose solution. The results were as follows: 1) After adding ethanol, the spontaneous phasic contraction of the corpus was reduced dose-dependently (0.125-2.0%), and it was totally abolished by higher concentrations (2.0-8.0%) of ethanol. 2) The corporal phasic contraction was also completely abolished by verapamil $(3{\times}10^{-5}\;M)$ or Ca-free Krebs-Ringer-dextrose solution. The contraction was increased by $CaCl_2\;(1.8{\times}10^{-3}\;M)$, but the inhibitory effect of ethanol on the contraction persisted even under treatment with $CaCl_2$. 3) At higher concentrations, ethanol caused tonic contraction of both preparations from the fundus, the corpus and the antrum in a dose-dependent manner. The tonic contraction of the fundus produced by ethanol was not influenced by $CaCl_2$ or verapamil, whereas the tonic contraction was not produced by ethanol in the Ca-free solution. 4) The frequency of the gastric slow wave was decreased dose-dependently by the addition of ethanol (0.25-1.0%), and the slow wave was not produced at a higher concentration of ethanol (2.0%). 5) The frequency of the slow wave was significantly reduced by verapamil alone, and the inhibitory influence of ethanol on the slow wave frequency was reinforced by verapamil. 6) Treatment with $CaCl_2$ significantly increased the slow wave frequency and attenuated the inhibitory effect of ethanol on the frequency. It is therefore suggested that ethanol regulates the phasic contraction and the production of the slow wave by interfering with the transport of calcium in the stomach muscle of the cat.

Pharmacological evidences that vasoactive intestinal polypeptide is not involved in non-adrenergic non-cholinergic relaxation in rabbit corpus cavernosum

  • Park, Mi-Sun;Hong, Eun-Ju;Hong, Sung-Cheul
    • Proceedings of the Korean Society of Applied Pharmacology / 1996.04a / pp.217-217 / 1996
  • The putative role of vasoactive intestinal polypeptide (VIP) as a non-adrenergic non-cholinergic (NANC) neurotransmitter was studied in rabbit corpus cavernosum. In the presence of atropine and guanethidine, short and prolonged electrical field stimulation (EFS, 2~16 Hz) induced a frequency-dependent relaxation, which was abolished by tetrodotoxin (0.3 $\mu$M), a nerve conduction blocker. The neurogenic relaxant responses were not affected by the VIP-inactivating peptidase $\alpha$-chymotrypsin (2 units/mL), whereas VIP-induced relaxation was completely abolished. Inhibition of nitric oxide synthase by N$^G$-nitro-L-arginine (L-NNA, 10~100 $\mu$M) caused concentration-dependent inhibition of the neurogenic relaxant responses, and at 100 $\mu$M the relaxations were virtually abolished. In contrast, the relaxations induced by NO (3~30 $\mu$M) and VIP (0.001~1 $\mu$M) were unaffected. The inhibitory effect of L-NNA was reversed in the presence of L-arginine (5 mM), the precursor of NO biosynthesis. Hemoglobin (20~60 $\mu$M), which sequesters NO in the extracellular space, abolished the NO-evoked relaxation and also caused a concentration-dependent inhibition of the neurogenic relaxation. These observations indicate that NANC relaxation induced by prolonged EFS of rabbit corpus cavernosum, like that induced by short EFS, is mediated mainly by nitric oxide, and suggest that VIP is not involved in NANC relaxation of rabbit corpus cavernosum and that NO would not be produced by VIP in this tissue.

A Study on the Simple Algorithm for Discrimination of Voiced Sounds (유성음 구간 검출을 위한 간단한 알고리즘에 관한 연구)

  • 장규철;우수영;박용규;유창동
    • The Journal of the Acoustical Society of Korea / v.21 no.8 / pp.727-734 / 2002
  • This paper proposes a simple algorithm for discriminating voiced sounds in speech. In addition to low-frequency energy and zero-crossing rate (ZCR), both of which have been widely used for identifying voiced sounds, the proposed algorithm incorporates pitch variation to improve the discrimination rate. Evaluation on the TIMIT corpus shows a 13% improvement in the discrimination of voiced phonemes over the traditional algorithm that uses only energy and ZCR.
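
The two traditional features named in this abstract, ZCR and low-frequency energy, can be sketched as below. This is only an illustration of the baseline features, not the proposed algorithm: the pitch-variation step is omitted, the one-pole low-pass filter stands in for whatever filtering the authors use, and the function names, the `alpha` smoothing factor, and both thresholds are assumptions chosen for the synthetic test frames.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def low_frequency_energy(frame, alpha=0.1):
    """Mean squared value of the frame after a one-pole low-pass filter."""
    y = 0.0
    total = 0.0
    for x in frame:
        y += alpha * (x - y)  # smooth the signal, attenuating high frequencies
        total += y * y
    return total / len(frame)


def is_voiced(frame, zcr_max=0.1, energy_min=0.01):
    """Voiced frames tend to have few zero crossings and strong low-frequency energy."""
    return zero_crossing_rate(frame) < zcr_max and low_frequency_energy(frame) > energy_min


SAMPLE_RATE = 8000  # assumed sampling rate in Hz
N = 240             # 30 ms frame at 8 kHz

# Synthetic frames: a 100 Hz tone (voiced-like) and a weak 3 kHz tone (unvoiced-like).
voiced_like = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(N)]
unvoiced_like = [0.3 * math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE) for n in range(N)]

print(is_voiced(voiced_like), is_voiced(unvoiced_like))
```

On real speech the thresholds would need to be tuned on labeled data such as the corpus used in the paper.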

The Role of Distributional Cues in the Acquisition of Verb Argument Structures

  • Kim, Mee-Sook
    • Language and Information / v.7 no.1 / pp.87-99 / 2003
  • This paper investigates the role of input frequency in the acquisition of verb argument structures, based on distributional information from a corpus of utterances derived from the English CHILDES database (MacWhinney 1993). It has been widely accepted that children successfully learn verb argument structures by innate language mechanisms, such as linking rules which connect verb meanings and their syntactic structures. In contrast, an approach to language acquisition called “statistical language learning” has recently claimed that children could succeed in acquiring syntactic structures in the absence of innate language mechanisms, making use of distributional properties of the input. In this paper, I evaluate the feasibility of statistical learning in acquiring verb argument structures, based on distributional information about locative verbs in parental input. The naturalistic data allow us to investigate to what extent the statistical learning approach can and cannot help children succeed in learning the syntax of locative verbs. Based on the results of the English database analysis, I show that there is rich statistical information for learning the syntactic possibilities of locative verbs in parental input, despite some limitations of the statistical learning approach.

The Formant Frequency Differences of English Vowels as a Function of Stress and its Applications on Vowel Pronunciation Training (강세에 따른 영어 모음의 포먼트 변이와 모음 발음 교육에의 응용)

  • Kim, Ji-Eun;Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.5 no.2 / pp.53-58 / 2013
  • The purpose of this study is to compare the first two formants of stressed and unstressed English vowels produced by ten younger males (in their twenties and thirties) and ten older males (in their forties or fifties) from the Buckeye Corpus of Conversational Speech. The results indicate that the stressed and unstressed vowels, /i/ and /æ/ in particular, from the two groups differ in their formant frequencies. In addition, the vowel space of the unstressed vowels is somewhat smaller than that of the stressed vowels. Specifically, the range of the second formant of the unstressed vowels and that of the first formant of the unstressed front vowels were compressed. The findings from this study can be applied to pronunciation training in English vowels for Korean learners. We propose that teachers of English pay attention to the stress patterns of English vowels as well as their formant frequencies.

Implementation of morphological analyzer and spelling corrector for character recognition post-processing (문자 인식 후처리를 위한 형태소 분석기와 문자 교정기의 구현)

  • 이영화;김규성;김영훈;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.5 / pp.82-92 / 1997
  • In this paper, we propose a post-processing method that corrects characters misrecognized by a character recognizer, using a morphological analyzer and a spelling corrector. The proposed post-processing consists of three phases. First, the text passes through a morphological analyzer that outputs only the information needed for spelling correction, does not analyze bundles of phrases, and detects the location of the misrecognized character. Second, the generated candidate characters are tagged using the information in a character substitution table and a grapheme substitution/separating table. Then we retry the analysis after the misrecognized character has been substituted. Finally we select table, we investigate misrecognized characters in a corpus. The reliability analysis uses the frequencies of about 100,000 words randomly selected from the corpus. A Korean character recognizer shows a 93% correct recognition rate without post-processing. The overall recognition rate of our system with post-processing exceeds 97%.
