• Title/Summary/Keyword: Morpheme

Morpheme-based Korean broadcast news transcription (형태소 기반의 한국어 방송뉴스 인식)

  • Park Young-Hee;Ahn Dong-Hoon;Chung Minhwa
    • Proceedings of the KSPS conference / 2002.11a / pp.123-126 / 2002
  • In this paper, we describe our large-vocabulary continuous speech recognition (LVCSR) system for Korean broadcast news transcription. The main focus is finding the most suitable morpheme-based lexical model for Korean broadcast news recognition, one that copes with the inflectional flexibility of Korean. There are trade-offs between lexicon size and lexical coverage, and between the length of the lexical unit and word error rate (WER). We analyzed the training corpus to obtain a small 24k-morpheme lexicon with 98.8% coverage. We then optimized the lexicon by combining morphemes, using training-corpus statistics under either a monosyllable constraint or a maximum-length constraint. In experiments, our system reduced the share of monosyllable morphemes from 52% to 29% of the lexicon and obtained a WER of 13.24% for anchor speech and 24.97% for reporter speech.
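The trade-off between lexicon size and lexical coverage mentioned in this abstract can be illustrated with a minimal sketch. It assumes the corpus is already split into morphemes; the function names `top_k_lexicon` and `lexical_coverage` and the toy token list are hypothetical and are not taken from the paper.

```python
from collections import Counter


def top_k_lexicon(tokens, k):
    """Keep the k most frequent units of the corpus as the lexicon."""
    return {unit for unit, _ in Counter(tokens).most_common(k)}


def lexical_coverage(tokens, lexicon):
    """Fraction of running tokens that are found in the lexicon."""
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in lexicon)
    return covered / len(tokens)


# Toy, made-up morpheme sequence (placeholder strings, not data from the paper).
tokens = ["news", "ul", "report", "mnida", "news", "ul", "watch", "mnida", "today"]

# Growing k raises coverage at the cost of a larger lexicon.
for k in (1, 3, 6):
    lexicon = top_k_lexicon(tokens, k)
    print(k, round(lexical_coverage(tokens, lexicon), 3))
```

On a real corpus the same loop would show how quickly coverage saturates as the lexicon grows, which is the kind of curve a figure such as "24k morphemes, 98.8% coverage" summarizes.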

Relationship between Maternal Conversational Function and Question Type and Early Language Development (어머니가 사용한 담화기능 및 질문유형과 영아의 언어발달과의 관계)

  • Lee Kwee-Ock
    • The Korean Journal of Community Living Science / v.17 no.3 / pp.3-14 / 2006
  • The purpose of this study was to investigate the relationship between the conversational functions and question types in mothers' utterances and their infants' language development. The subjects were 20 infants aged 1;07 to 1;11 (years;months) in Yanji, China. Each child's spontaneous speech during interaction with his or her mother was videotaped for about 30 minutes. The utterances of the children and their mothers were transcribed and coded for the number of word types and tokens, grammatical morphemes, conversational functions, and question types in the mother's language input to her child. The results showed that questions were the conversational function mothers used most frequently with their infants. The number of questions in mothers' utterances correlated positively with the number of word types, morpheme types, and grammatical morphemes in infants' utterances. However, there was no correlation between mothers' language input and infants' early language development.

Word Spacing Consistency Check using Syllable and Morpheme Information (음절 및 형태소 정보를 이용한 띄어쓰기 일관성 검사)

  • Lee, Jae-Sung
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.10-19 / 2010
  • Korean word spacing rules include exceptional cases that permit both spacing and no spacing between words. These exceptions, however, do not make inconsistent spacing between words or word-phrases acceptable when proofreading a document. This paper proposes a word spacing consistency check method that uses syllable and morpheme information and evaluates it experimentally.
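A much simplified sketch of what a spacing consistency check looks for: the same sequence written both with and without a space in one document. This version compares surface strings only and does not use the syllable or morpheme information the paper relies on; the function name `spacing_variants` and the sample text are hypothetical.

```python
from collections import defaultdict


def spacing_variants(text):
    """Map each unspaced form to the spaced variants also found in the text."""
    words = text.split()
    single_words = set(words)
    variants = defaultdict(set)
    # An adjacent pair is suspicious when its concatenation also occurs
    # elsewhere in the document as one unspaced word.
    for left, right in zip(words, words[1:]):
        joined = left + right
        if joined in single_words:
            variants[joined].add(f"{left} {right}")
    return dict(variants)


# Made-up English stand-in for a document with inconsistent spacing.
sample = "data base design uses a database and a data base index"
print(spacing_variants(sample))
```

A practical checker would need morpheme boundaries to avoid flagging accidental concatenations, which is presumably where the morpheme information in the proposed method comes in.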

Development of A Plagiarism Detection System Using Web Search and Morpheme Analysis (인터넷 검색과 형태소분석을 이용한 표절검사시스템의 개발에 관한 연구)

  • Hwang, In-Soo
    • Journal of Information Technology Applications and Management / v.16 no.1 / pp.21-36 / 2009
  • As the World Wide Web (WWW) has become a major channel for information delivery, the data accumulated on the Internet is growing at an incredible speed, which drives advances in information search technologies. Search engines address the problem of information overload and help people identify relevant information. However, as search engines have become a powerful tool for finding information, opportunities for plagiarism in e-Learning have increased significantly. In this paper, we developed an online plagiarism detection system that incorporates search engine functions and works in exactly the same way a plagiarist does. The system uses morpheme analysis to improve performance and sentence-based comparison to check whether a document draws on multiple sources. Applying the system in e-Learning improved plagiarism detection performance.
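The sentence-based comparison against multiple sources can be sketched as follows. This is only an illustration under stated assumptions: it compares lowercase word sets with Jaccard similarity instead of morphemes, it takes the candidate sources as local strings rather than fetching them through a search engine, and the names `match_sources`, `jaccard`, and the 0.8 threshold are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    """Lowercased word set; a real system would use morphemes here."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_sources(document, sources, threshold=0.8):
    """For each sentence, report the best-matching source or None."""
    report = []
    for sent in split_sentences(document):
        best_name, best_sim = None, 0.0
        for name, text in sources.items():
            for src_sent in split_sentences(text):
                sim = jaccard(word_set(sent), word_set(src_sent))
                if sim > best_sim:
                    best_name, best_sim = name, sim
        matched = best_name if best_sim >= threshold else None
        report.append((sent, matched, best_sim))
    return report
```

Because each sentence is matched independently, a document stitched together from several sources shows up as different source names across its sentences.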

Comprehension and Production of Causative and Passive Sentences in Multicultural Family Children (다문화 가정 아동의 사동문, 피동문 이해와 사동 및 피동 접미사 표현 능력)

  • PARK, Eun-Jong;PARK, Chan-Hee;PARK, Hyun
    • Journal of Fisheries and Marine Sciences Education / v.28 no.5 / pp.1365-1377 / 2016
  • The purpose of this study is to compare the comprehension and production of causative and passive sentences and grammatical morphemes between multicultural family children and normal children. Fifteen multicultural family children and fifteen normal children aged 7-9 years participated in this study. The results are as follows. First, the multicultural family children showed significantly lower ability in the comprehension and production of causative and passive sentences than the normal children. Second, the multicultural family children differed from the normal children in the acquisition of causative and passive grammatical morphemes. Third, neither group showed statistically significant differences in causative and passive comprehension and production abilities with increasing age.

Efficient Language Model based on VCCV unit for Sentence Speech Recognition (문장음성인식을 위한 VCCV 기반의 효율적인 언어모델)

  • Park, Seon-Hui;No, Yong-Wan;Hong, Gwang-Seok
    • Proceedings of the KIEE Conference / 2003.11c / pp.836-839 / 2003
  • In this paper, we implement a bigram language model and evaluate which smoothing technique suits a low-perplexity unit. Word, morpheme, and clause units are widely used as the processing unit of a language model. We propose VCCV units, which have a smaller vocabulary than morpheme and clause units, and compare them with clause and morpheme units in terms of perplexity. The most common metric for evaluating a language model is the probability that the model assigns to test data, or the derived measure of perplexity. Smoothing is used to estimate probabilities when there are insufficient data to estimate them accurately. We constructed N-grams of the low-perplexity VCCV units and tested the language model with Katz, Witten-Bell, absolute discounting, modified Kneser-Ney, and other smoothing methods. In the experiments, modified Kneser-Ney smoothing proved to be the suitable smoothing technique for VCCV units.
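To make the perplexity comparison concrete, here is a minimal bigram model sketch. It uses add-one (Laplace) smoothing purely for brevity, not any of the Katz, Witten-Bell, or Kneser-Ney methods evaluated in the paper, and it assumes every test unit also occurs in the training data. The unit strings and function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        units = ["<s>"] + sent + ["</s>"]
        histories.update(units[:-1])          # every unit that has a successor
        bigrams.update(zip(units, units[1:]))
    vocab = {u for sent in sentences for u in sent} | {"</s>"}
    return histories, bigrams, vocab


def bigram_prob(prev, unit, model):
    """Add-one smoothed estimate of P(unit | prev)."""
    histories, bigrams, vocab = model
    return (bigrams[(prev, unit)] + 1) / (histories[prev] + len(vocab))


def perplexity(sentences, model):
    """2 ** (average negative log2 probability per predicted unit)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        units = ["<s>"] + sent + ["</s>"]
        for prev, unit in zip(units, units[1:]):
            log_sum += math.log2(bigram_prob(prev, unit, model))
            n += 1
    return 2 ** (-log_sum / n)


# Toy training data; the strings stand in for any unit (word, morpheme, VCCV).
model = train_bigram([["a", "b", "c"], ["a", "b", "d"]])
print(perplexity([["a", "b", "c"]], model))   # order seen in training
print(perplexity([["c", "b", "a"]], model))   # unseen order, higher perplexity
```

Swapping the unit inventory while keeping the evaluation fixed is the kind of comparison the abstract describes; only the smoothing method here is a simplification.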

A Focus-Based Approach to Scope Ambiguity in Japanese

  • Okabe, Ryoya
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.370-382 / 2002
  • This paper puts forward an analysis of scope interactions between Japanese adverbial quantifiers like mainichi 'everyday' and tokidoki 'sometimes' and a negative morpheme nai 'not' on the basis of f(ocus)-structures. In this analysis, three f-structures are assigned to a sentence with an adverbial quantifier and a negative morpheme. One of them represents a negation-wide reading, and the other two represent quantifier-wide readings. Some f-structures, however, are unacceptable due to semantic or pragmatic factors. Different scope behaviors of the two quantifiers mentioned above can then be ascribed to acceptability of f-structures.

Using Syntactic Unit of Morpheme for Reducing Morphological and Syntactic Ambiguity (형태소 및 구문 모호성 축소를 위한 구문단위 형태소의 이용)

  • Hwang, Yi-Gyu;Lee, Hyun-Young;Lee, Yong-Seok
    • Journal of KIISE:Software and Applications / v.27 no.7 / pp.784-793 / 2000
  • Conventional morphological analysis of Korean produces various morphological ambiguities because of the language's agglutinative nature. These ambiguities cause syntactic ambiguities and make it difficult to select the correct parse tree. The problem is mainly related to the auxiliary predicate or bound noun in Korean, which have a strong relationship with the surrounding morphemes, mostly functional morphemes that cannot stand alone. The combined morphemes play a syntactic or semantic role in the sentence. We extracted these morphemes from 0.2 million tagged words and classified them into three types. We call each such combination a syntactic morpheme and treat it as an input unit for syntactic analysis. This paper shows that the syntactic morpheme is an efficient means of addressing the following problems: 1) reduction of morphological ambiguities, 2) elimination of unnecessary partial parse trees during parsing, and 3) reduction of syntactic ambiguity. Finally, the experimental results show that the syntactic morpheme is an essential unit for reducing morphological and syntactic ambiguity.

Web Document Classification Based on Hangeul Morpheme and Keyword Analyses (한글 형태소 및 키워드 분석에 기반한 웹 문서 분류)

  • Park, Dan-Ho;Choi, Won-Sik;Kim, Hong-Jo;Lee, Seok-Lyong
    • The KIPS Transactions:PartD / v.19D no.4 / pp.263-270 / 2012
  • With the development of high-speed Internet and massive database technology, the number of web documents is increasing rapidly, and classifying those documents automatically is becoming important. In this study, we propose an effective method to extract document features based on Hangeul morpheme and keyword analyses, and to classify non-structured documents automatically by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, form the keyword set based on term frequency and subject-discriminating power, and score each keyword using the discriminating power. We then generate the classification model using commercial software that implements a decision tree, a neural network, and a support vector machine (SVM). Experimental results show that the proposed feature extraction method achieves considerable performance in classifying web documents by subject, i.e., an average precision of 0.90 and recall of 0.84 for the decision tree.
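The keyword selection and scoring step could look roughly like the sketch below. The abstract does not define "subject-discriminating power", so this sketch assumes it is the share of a term's occurrences that fall within one subject; that definition, the `min_tf` cutoff, and the summed-score `classify` stand-in (the paper trains a decision tree, neural network, and SVM instead) are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def keyword_scores(docs_by_subject, min_tf=2):
    """Score each sufficiently frequent term by its assumed discriminating power."""
    tf = defaultdict(Counter)                 # subject -> term frequencies
    for subject, docs in docs_by_subject.items():
        for terms in docs:
            tf[subject].update(terms)
    total = Counter()                         # term frequency over all subjects
    for counts in tf.values():
        total.update(counts)
    return {
        subject: {t: c / total[t] for t, c in counts.items() if total[t] >= min_tf}
        for subject, counts in tf.items()
    }


def classify(terms, scores):
    """Stand-in classifier: pick the subject with the largest summed score."""
    totals = {s: sum(kw.get(t, 0.0) for t in terms) for s, kw in scores.items()}
    return max(totals, key=totals.get)


# Made-up, already tokenized documents (terms as a morpheme analyzer might emit).
corpus = {
    "sports": [["game", "team", "score"], ["team", "win", "game"]],
    "finance": [["stock", "market", "score"], ["market", "bank", "stock"]],
}
scores = keyword_scores(corpus)
print(classify(["team", "score", "market", "game"], scores))
```

In the paper's pipeline the keyword scores would serve as features for the trained models rather than being summed directly.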

A Filtering Method of Malicious Comments Through Morpheme Analysis (형태소 분석을 통한 악성 댓글 필터링 방안)

  • Ha, Yeram;Cheon, Junseok;Wang, Inseo;Park, Minuk;Woo, Gyun
    • The Journal of the Korea Contents Association / v.21 no.9 / pp.750-761 / 2021
  • Although reply comments on Internet articles have positive effects on discussion and communication, malicious comments remain a source of problems, even driving people to death. Automatic detection of malicious comments is important in this respect. However, the current filtering method for malicious comments, based on forbidden words, is not very effective, especially for reply comments written in Korean. This paper proposes a new filtering approach based on morpheme analysis that identifies coarse and polite morphemes. Based on these two groups of morphemes, the soundness of comments can be calculated. Further, this paper proposes various impact measures for comments, based on the soundness. According to experiments on malicious comments, one of the impact measures is effective for detecting malicious comments. Compared with the clean-bot of a portal site, our method improves recall by 37.93 percentage points and F-measure by up to 47.66 points. Based on this result, the new filtering method based on morpheme analysis is expected to be a promising alternative to those based on forbidden words.
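One way to picture a soundness score built from coarse and polite morphemes is the sketch below. The abstract does not give the formula, so the normalized difference used here, the zero threshold, and the tiny placeholder morpheme lists are assumptions; input is taken to be the output of a morpheme analyzer.

```python
def soundness(morphemes, coarse, polite):
    """Assumed score in [-1, 1]: +1 only polite morphemes, -1 only coarse ones."""
    n_coarse = sum(1 for m in morphemes if m in coarse)
    n_polite = sum(1 for m in morphemes if m in polite)
    marked = n_coarse + n_polite
    if marked == 0:
        return 0.0                            # no evidence either way
    return (n_polite - n_coarse) / marked


def is_malicious(morphemes, coarse, polite, threshold=0.0):
    """Flag a comment whose soundness falls below the threshold."""
    return soundness(morphemes, coarse, polite) < threshold


# Hypothetical placeholder lists; real lists would hold Korean morphemes.
COARSE = {"idiot", "trash"}
POLITE = {"please", "thanks"}

print(is_malicious(["you", "idiot", "trash"], COARSE, POLITE))
print(is_malicious(["thanks", "for", "the", "article"], COARSE, POLITE))
```

Scoring morphemes rather than whole forbidden words is what lets such a filter catch inflected or partially disguised forms, which is the motivation the abstract gives for Korean comments.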