• Title/Summary/Keyword: Parts of Speech


A Segmentation Algorithm of the Connected Word Speech by Statistical Method (統計的인 方法에 依한 連結音의 音素分割 알고리듬)

  • Cho, Jeong-Ho;Hong, Jae-Keun;Kim, Soo-Joong
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.4 / pp.151-163 / 1989
  • A statistical approach to the segmentation of speech signals is described in this paper. The main idea of the algorithm is the use of three autoregressive (AR) models. Two fixed models are identified on the stationary parts of the signal before and after a spectral change, and a change is detected when the distance between these two models is large. A third model, located between the two fixed models, is used to estimate the time of the spectral change. The algorithm was tested on connected words and compared with classical methods. The results showed that it locates segment boundaries more accurately and reduces oversegmentation.

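The detection step in the abstract above (two AR models fitted on either side of a candidate boundary, with a change flagged when they are far apart) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it uses only the two fixed models and omits the third, sliding model; the log prediction-error ratio used as the distance, the model order, and all function names are assumptions made for this example.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] (no mean removal)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def fit_ar(x, order):
    """Fit x[t] ~ sum_k a[k] * x[t-1-k] with the Levinson-Durbin recursion."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)          # a[0] is unused; a[m] is the lag-m coefficient
    err = r[0]
    for m in range(1, order + 1):
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:]


def prediction_error(x, a):
    """Mean squared residual when predicting x with AR coefficients a."""
    p = len(a)
    res = [x[t] - sum(a[k] * x[t - 1 - k] for k in range(p))
           for t in range(p, len(x))]
    return sum(e * e for e in res) / len(res)


def model_distance(x_before, x_after, order=2):
    """Assumed distance: log ratio of the errors on x_after under each model."""
    a_before = fit_ar(x_before, order)
    a_after = fit_ar(x_after, order)
    return math.log(prediction_error(x_after, a_before)
                    / prediction_error(x_after, a_after))


def synth_ar1(coef, n, rng):
    """Synthetic AR(1) signal used only to exercise the sketch."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = coef * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    left = synth_ar1(0.9, 800, rng)     # spectrum concentrated at low frequencies
    right = synth_ar1(-0.9, 800, rng)   # spectrum concentrated at high frequencies
    print("same segment:   ", model_distance(left[:400], left[400:]))
    print("across boundary:", model_distance(left[400:], right[:400]))
```

On the synthetic signal, the distance stays near zero when both windows come from the same stationary stretch and grows when the windows straddle the spectral change, which is the behavior a threshold test would rely on.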

Syntactic Analysis of Korean Sentence for Machine Translation (한국어의 Machine translation을 위한 구문 구조 분석)

  • Lee, Ju-Geun;Han, Seong-Guk;Jeon, Byeong-Dae
    • Journal of the Korean Institute of Telematics and Electronics / v.18 no.5 / pp.15-21 / 1981
  • This paper deals with syntactic analysis algorithms for Korean sentences and a system for machine translation. The parts of speech and constituents are analyzed syntactically from a unified viewpoint, and an effective classification algorithm is then proposed. The constituents, to which an inverse movement transformation algorithm is applied, are processed using the concept of attributes. The syntactic analysis system is constructed to generate a parsing table, including the deep structure of the sentence, using a lexicon suited to the combinational properties of Korean and a breadth-first search method. The results obtained from the system program are shown as parsing tables of the source sentences.


Analysis on Vocabulary Used in School Newsletters of Korean elementary Schools: Focus on the areas of Busan, Ulsan and Gyeongnam (한국 초등학교 가정통신문의 어휘 특성 연구 -부산·울산·경남 지역을 중심으로-)

  • Kang, Hyunju
    • Journal of Korean language education / v.29 no.2 / pp.1-23 / 2018
  • This study analyzes words and phrases that are frequently used in newsletters from Korean elementary schools. To this end, high-frequency words from school newsletters were selected and classified into content and function words, and the domains of the words were identified. For this study, 1,000 school newsletters were collected in the Busan, Ulsan and Gyeongnam areas. In terms of parts of speech, nouns, especially common nouns, appeared most frequently in the school newsletters, followed by verbs and adjectives. This result suggests that, for immigrant women who have a basic knowledge of Korean, providing translations of these words would help them understand the message of school newsletters. Furthermore, school-related terms, such as those for facilities, regulations and school activities, and Chinese-based vocabulary are found in school newsletters. Among verbs, words expressing requests and suggestions are used the most. Adjectives related to positive values and evaluation, and those describing weather and seasons, are also used frequently.
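The counting procedure described above (select frequent words, group them by part of speech, and split them into content and function words) can be sketched as follows. The tag names, the choice of which tags count as content words, and the sample data are all assumptions made for illustration; none of them come from the study.

```python
from collections import Counter

# Assumed tag set: which part-of-speech tags are treated as content words.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def summarize(tagged_words):
    """Count parts of speech, word forms, and content vs. function words.

    tagged_words is a list of (word, pos) pairs that has already been
    produced by some tagger; tagging itself is outside this sketch.
    """
    pos_counts = Counter(pos for _, pos in tagged_words)
    word_counts = Counter(word for word, _ in tagged_words)
    class_counts = Counter(
        "content" if pos in CONTENT_POS else "function"
        for _, pos in tagged_words
    )
    return pos_counts, word_counts, class_counts


if __name__ == "__main__":
    # Invented sample, standing in for a tagged newsletter.
    sample = [("school", "NOUN"), ("visit", "VERB"), ("school", "NOUN"),
              ("the", "DET"), ("kind", "ADJ"), ("lunch", "NOUN")]
    pos_counts, word_counts, class_counts = summarize(sample)
    print(pos_counts.most_common())
    print(word_counts.most_common(3))
    print(dict(class_counts))
```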

A Comparative Study of Aphasics' Abilities in Reading and Writing Hangul and Hanja

  • Kim, Heui-Beom
    • Proceedings of the KSPS conference / 1996.10a / pp.289-293 / 1996
  • In Korean, as with Kana and Kanji in Japanese, two kinds of writing systems, Hangul (the Korean alphabet) and Hanja (Chinese characters; Kanji in Japanese), have been and still are being used. Hangul is phonetic while Hanja is ideographic. A phonetic alphabet represents the pronunciation of words, whereas in an ideographic system each character represents a concept. Aphasics suffer from language disorders following brain damage. The reading and writing of Hangul and Hanja by two Korean Broca's aphasics were analyzed with two goals. The first goal was to confirm the functional autonomy of reading and writing systems in the brain that has been argued for by other researchers. The second goal was to reveal what differences the subjects show in reading and writing Hangul and Hanja. As experimental materials, 50 monosyllabic words were chosen in Hangul and in Hanja. The 50 word pairs of Hangul and Hanja have the same meanings and are also the most familiar monosyllabic words for a group of normal adults in their fifties and sixties. The errors that the aphasic subjects made on the experimental materials are analyzed and discussed here. This analysis has confirmed that the reading and writing systems are located in different parts of the brain. Furthermore, it seems clear that the two writing systems of Hangul and Hanja each have their own processes.


Differential semantic processing in Korean and English Word Naming (모국어와 외국어 어휘 산출 시 의미정보처리 과정의 차이)

  • Her, Ju-Young;Koo, Min-Mo;Nam, Ki-Chun
    • Proceedings of the KSPS conference / 2007.05a / pp.180-182 / 2007
  • The present study investigated how two languages are represented and processed in late Korean-English bilinguals. To this end, we compared the naming times of Korean-English bilinguals on a series of picture-word interference tasks. The experiment was divided into four parts, each of which required participants to name pictures in Korean or in English while distractor words were visually presented in either Korean or English. The distractor words were semantically related or unrelated to the picture. The results showed that, in the different-language conditions (L1 naming with L2 distractor, L2 naming with L1 distractor), there was only a numerical difference between the semantically related and unrelated conditions. In the same-language conditions (L1 naming with L1 distractor, L2 naming with L2 distractor), however, a significant semantic interference effect occurred. The interference effect was also stronger in the L1 distractor condition than in the L2 distractor condition. These results suggest that the semantic processing of L1 and L2 in late bilinguals is independent of each other.


Terminology Tagging System using elements of Korean Encyclopedia (백과사전 기반 전문용어 태깅 시스템)

Improvement of a Korean Speller with Collocation of Parts of Speech (연어 정보를 이용한 한국어 철자 검사기의 기능 개선)

  • Sim, Chul-Min;Kim, Hyun-Jin;Kim, Young-Jin;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 1995.10a / pp.86-90 / 1995
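  • This paper presents an improved spelling checker whose scope is extended from a single eojeol (a spacing unit in Korean) to multiple eojeols. The improved checker consists of 1) a single-eojeol spelling check and correction module, 2) a collocation rule processing module, and 3) a punctuation rule processing module. The single-eojeol module performs the same function as existing spelling checkers. The collocation rule module uses collocational relations between morphemes to handle inter-eojeol errors, which are classified into seven types. The punctuation module checks for errors in the punctuation marks themselves and, with reference to the punctuation, for errors in the eojeols on either side. At present, 256 collocation rules and 51 punctuation rules have been built. The improved spelling checker presented in this paper is significant as a Korean Style Checker, and the collocation information on morphemes is expected to serve as an important resource for sentence analysis, such as parsing, and for semantic analysis in the future.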


Research on Chinese Microblog Sentiment Classification Based on TextCNN-BiLSTM Model

  • Haiqin Tang;Ruirui Zhang
    • Journal of Information Processing Systems / v.19 no.6 / pp.842-857 / 2023
  • Currently, most sentiment classification models for microblogging platforms analyze sentence parts of speech and emoticons without comprehending users' emotional inclinations or grasping moral nuances. This study proposes a hybrid sentiment analysis model. Given the distinct nature of microblog comments, the model employs a combined stop-word list and word2vec for word vectorization. To mitigate local information loss, a TextCNN model without pooling layers is employed for local feature extraction, while BiLSTM is used for contextual feature extraction. Microblog comment sentiments are then categorized by a classification layer. Given the binary classification task at the output layer and the numerous hidden layers within BiLSTM, the Tanh activation function is adopted in this model. Experimental findings show that the enhanced TextCNN-BiLSTM model attains a precision of 94.75%. This represents improvements of 1.21%, 1.25%, and 1.25% in precision, recall, and F1, respectively, over the standalone TextCNN model. It also outperforms BiLSTM by 0.78%, 0.9%, and 0.9% in precision, recall, and F1.
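The three metrics reported in the abstract above are the standard ones for binary classification. As a reminder of how they relate, here is a minimal sketch that computes them from true and predicted labels. The labels are invented for illustration and have nothing to do with the paper's data; the function name is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented labels: 1 = positive sentiment, 0 = negative sentiment.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 1, 1, 0, 1]
    print(precision_recall_f1(y_true, y_pred))
```

F1 is the harmonic mean of precision and recall, so it only moves as far as the weaker of the two allows.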

Token-Based Classification and Dataset Construction for Detecting Modified Profanity (변형된 비속어 탐지를 위한 토큰 기반의 분류 및 데이터셋)

  • Sungmin Ko;Youhyun Shin
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.181-188 / 2024
  • Traditional profanity detection methods have limitations in identifying intentionally altered profanities. This paper introduces a new method based on Named Entity Recognition, a subfield of Natural Language Processing. We developed a profanity detection technique using sequence labeling, for which we constructed a dataset by labeling some profanities in Korean malicious comments and conducted experiments. Additionally, to enhance the model's performance, we augmented the dataset by labeling parts of a Korean hate speech dataset with a large language model, ChatGPT, and trained on the result. During this process, we confirmed that human filtering alone of the dataset created by the large language model could improve performance. This suggests that human oversight is still necessary in the dataset augmentation process.
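Token-level sequence labeling of the kind described above is commonly expressed with BIO tags, as in named entity recognition. The sketch below shows how flagged tokens could be converted to tags and how tagged spans could be read back. The BIO scheme, the PROF label name, and the placeholder tokens are assumptions for illustration; the abstract does not state which tagging scheme the authors used.

```python
def to_bio(tokens, flagged, label="PROF"):
    """Turn a set of flagged token indices into BIO tags.

    Adjacent flagged tokens are treated as one span: the first gets a
    B- tag and the following ones get I- tags.
    """
    tags = []
    for i in range(len(tokens)):
        if i not in flagged:
            tags.append("O")
        elif i - 1 in flagged:
            tags.append(f"I-{label}")
        else:
            tags.append(f"B-{label}")
    return tags


def extract_spans(tokens, tags):
    """Collect the text of each tagged span from a BIO-tagged sequence."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    # Placeholder tokens standing in for altered spellings.
    tokens = ["that", "b@d", "w0rd", "again", "b4d"]
    tags = to_bio(tokens, flagged={1, 2, 4})
    print(tags)
    print(extract_spans(tokens, tags))
```

Labeling at the token level lets a model mark an altered spelling in place, instead of relying on exact matches against a fixed word list.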

Development of Neck-Type Electrolarynx Blueton and Acoustic Characteristic Analysis (경부형 전기인공후두 Blueton의 개발과 음향학적 성능 분석)

  • Choi, Seong-Hee;Park, Young-Jae;Park, Young-Kwan;Kim, Tae-Jung;Nam, Do-Hyun;Lim, Sung-Eun;Lee, Sung-Eun;Kim, Han-Soo;Choi, Hong-Shik;Kim, Kwang-Moon
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.1 / pp.37-42 / 2004
  • The electrolarynx (EL), a battery-operated vibrator that is held against the neck and operated with an on-off button, has been widely used as a means of verbal communication among post-laryngectomy patients. EL speech can be produced easily without any additional surgery or special training and can be used together with other methods. This institute developed a neck-type EL named "Blueton" in cooperation with the EL company Linkus; it consists of three parts: a vibrator part, a control part, and a battery part. In this study we evaluated the acoustic characteristics of voices produced with Blueton in comparison with Servox-inton using MDVP. Three EL users (two full-time users, one part-time user) participated. The results revealed that NHR was higher with Servox than with Blueton and that intensity was higher with Blueton than with Servox. The spectra of vowels produced by EL speakers are mixed signals combining the talkers' vocal output and electrolarynx noise. The spectral pattern was similar for the two ELs. High, SPI index and vowel spectra from MDVP demonstrated characteristics of both electrolarynxes related to the noise signal. This finding suggests that Blueton offers a useful rehabilitation option for post-laryngectomy patients.
