• Title/Summary/Keyword: Corpus-based Study

Search Results: 204

Prosodic features and discourse functions of discourse marker 'mak' ('막') ('막'의 운율적 특성과 담화적 기능)

  • Song, Inseong
    • Korean Linguistics / v.65 / pp.211-236 / 2014
  • The aim of this study is to investigate the categorical characteristics of 'mak' and its discourse functions by analyzing its prosodic features. Previous studies of 'mak' focused on its grammatical or semantic characteristics, whereas this study examines the prosodic features of 'mak' on the basis of speech data. The results show that the adverb 'mak' and the discourse marker 'mak' can be distinguished by prosodic boundary, duration, pause and the number of tonal pattern types. The discourse marker 'mak' has the following functions: maintenance of the utterance, drawing attention, delay, and expression of a negative attitude. Each of these functions has salient prosodic features associated with it. Consequently, prosodic features are important for analyzing the categorical characteristics of 'mak' and for establishing its functions.

A Study of the Realization of Speech Act and Teaching-learning Contents of Korean Speculative Expressions (한국어 추측 표현의 화행 실현 양상과 교수학습 내용 연구)

  • Jeong, Mi-Jin
    • Korean Linguistics / v.76 / pp.187-211 / 2017
  • The purpose of this study is to investigate how Korean speculative expressions realize speech acts and to present teaching-learning contents for them. Korean learners find it hard to use speculative expressions appropriately because there are many similar expressions whose meanings differ in subtle ways. This study describes the speech act realizations of '-는 것 같다', '-을까', '-나 보다' and '-을걸'. All of these forms express speculation, so they are mainly used to present uncertain information or the speaker's thoughts. However, each shows distinctive aspects. '-는 것 같다' is mainly used to present content that contradicts the interlocutor's opinion or that may irritate the interlocutor. Because it conveys uncertainty, it functions as a polite form, and in these contexts in particular it performs the speech act of refusal. '-을까' characteristically appears in compound forms such as '뭐랄까' and '뭐라고 할까', and it performs request speech acts more frequently than '-는 것 같다'. It is also used to express opinions contrary to the interlocutor's. '-나 보다' expresses the speaker's speculation based on the hearer's circumstances or utterances, so, unlike the other speculative expressions, it is used to respond actively to the hearer and to express interest. '-을걸' is not used to make requests or to express interest in the hearer; instead, it is mainly used when the speaker holds assumptions or expectations contrary to the hearer's. Based on this analysis, the study presents teaching-learning contents for speculative expressions and organizes them by level.

A Study on Environmental research Trends by Information and Communications Technologies using Text-mining Technology (텍스트 마이닝 기법을 이용한 환경 분야의 ICT 활용 연구 동향 분석)

  • Park, Boyoung;Oh, Kwan-Young;Lee, Jung-Ho;Yoon, Jung-Ho;Lee, Seung Kuk;Lee, Moung-Jin
    • Korean Journal of Remote Sensing / v.33 no.2 / pp.189-199 / 2017
  • This study quantitatively analyzed research trends in the use of ICT in the environmental field using text mining techniques. To that end, it collected 359 papers published over the past two decades (1996-2015) from the National Digital Science Library (NDSL) using 38 environment-related keywords and 16 ICT-related keywords. It applied natural language processing to the environment- and ICT-related text of the papers and reorganized the classification system at the corpus level. Based on the keywords of this classification system, it then applied three text mining techniques: frequency analysis, keyword analysis and association rule analysis of keywords. As a result, the keywords 'general environment' and 'climate' together accounted for 77% of the total keyword frequency, while 'public convergence service' and 'industrial convergence service' accounted for approximately 30% in the ICT field. Time series analysis showed that research using ICT in the environmental field increased rapidly over the most recent five years (2011-2015): the number of such studies in that period was more than double that of the preceding period (1996-2010). The association rules generated among the keywords showed that, in the environmental field, 'general environment' was associated with 16 ICT-based technologies and 'climate' with 14.
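The frequency and association rule analyses described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each paper has already been reduced to a set of classification-system keywords, and it computes only pairwise rules using the standard support and confidence measures. The keyword strings in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_frequencies(papers):
    """Document frequency: number of papers in which each keyword appears."""
    counts = Counter()
    for keywords in papers:
        counts.update(set(keywords))
    return counts


def association_rules(papers, min_support=0.2, min_confidence=0.5):
    """Pairwise rules lhs -> rhs as (lhs, rhs, support, confidence) tuples."""
    n = len(papers)
    item_counts = keyword_frequencies(papers)
    pair_counts = Counter()
    for keywords in papers:
        # count each unordered keyword pair once per paper
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of papers containing both keywords
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # strongest rules first; ties broken alphabetically
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

For example, with four toy papers tagged {climate, sensor network}, {climate, big data}, {general environment, sensor network} and {climate, sensor network, big data}, calling `association_rules(papers, min_support=0.5)` ranks "big data -> climate" first, with support 0.5 and confidence 1.0.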

Vicarious Radiometric Calibration of RapidEye Satellite Image Using CASI Hyperspectral Data (CASI 초분광 영상을 이용한 RapidEye 위성영상의 대리복사보정)

  • Chang, An Jin;Choi, Jae Wan;Song, Ah Ram;Kim, Ye Ji;Jung, Jin Ha
    • Journal of Korean Society for Geospatial Information Science / v.23 no.3 / pp.3-10 / 2015
  • Every kind of object on the ground has an inherent spectral reflectance curve, which can be used to classify ground objects and to detect targets. For accurate analysis, remotely sensed data must be converted to spectral reflectance. Available approaches include formulas provided by the data-supplying institution, mathematical model methods and ground-data-based methods. In this study, a RapidEye satellite image was converted to reflectance through vicarious radiometric calibration, using the spectral reflectance of a CASI hyperspectral image as the reference. The results were compared with those of other calibration methods and with ground data. The proposed method was closer to the ground data than the ATCOR and New Kurucz 2005 methods and comparable to the empirical line method (ELM).
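Vicarious calibration of this kind is often implemented as a per-band linear fit between image values and reference reflectance, which is also the core of the empirical line method. The sketch below is an illustration under stated assumptions, not the authors' exact procedure: it assumes the CASI reflectance has already been resampled to the RapidEye band passes and co-registered to matching pixels, and the function names and gain-offset model are hypothetical.

```python
import numpy as np


def fit_gain_offset(image_values, reference_reflectance):
    """Least-squares fit of reflectance = gain * value + offset for one band."""
    x = np.asarray(image_values, dtype=float)
    y = np.asarray(reference_reflectance, dtype=float)
    design = np.column_stack([x, np.ones_like(x)])
    (gain, offset), *_ = np.linalg.lstsq(design, y, rcond=None)
    return gain, offset


def calibrate_image(image, rows, cols, reference):
    """Convert every band of an image to reflectance with per-band linear fits.

    image:      array (bands, height, width) of RapidEye radiance or DN values
    rows, cols: pixel coordinates of the reference targets
    reference:  array (n_targets, bands) of CASI reflectance, assumed already
                resampled to the RapidEye bands and co-registered
    """
    image = np.asarray(image, dtype=float)
    reference = np.asarray(reference, dtype=float)
    calibrated = np.empty_like(image)
    for b in range(image.shape[0]):
        # fit on the reference pixels only, then apply to the whole band
        gain, offset = fit_gain_offset(image[b, rows, cols], reference[:, b])
        calibrated[b] = gain * image[b] + offset
    return calibrated
```

Fitting one gain and offset per band keeps the model simple; a real workflow would also need to screen reference pixels for homogeneity and check residuals before trusting the fit.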

A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis (한국어 음성합성기의 운율 예측을 위한 의사결정트리 모델에 관한 연구)

  • Kang, Sun-Mee;Kwon, Oh-Il
    • Speech Sciences / v.14 no.2 / pp.91-103 / 2007
  • The purpose of this paper is to develop a model that predicts prosody for Korean text-to-speech synthesis using the CART and SKES algorithms. CART tends to favor predictor variables that have many categories. Therefore, an F-test-based partitioning method was applied to CART, reducing the number of categories by grouping phonemes. In addition, the quality of the synthesized speech was evaluated after applying the SKES algorithm to a dataset of the same size. For the evaluation, MOS tests were conducted with 30 listeners, men and women in their twenties. The results showed that applying the SKES algorithm made the synthesized speech clearer and more natural.
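The abstract does not spell out the F-test partitioning step. One plausible reading, sketched below under that assumption, is to score candidate groupings of phonemes with a one-way ANOVA F statistic on a prosodic target (here, hypothetical segment durations) and keep the grouping that best separates the groups, so that CART sees fewer, more informative categories. The phoneme labels, durations and candidate partitions are illustrative only.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group vs within-group variance."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf")  # groups are perfectly separated
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def partition_f(durations_by_phoneme, partition):
    """Pool the durations of the phonemes in each block, then score the grouping."""
    groups = [[d for p in block for d in durations_by_phoneme[p]] for block in partition]
    return anova_f(groups)


def best_partition(durations_by_phoneme, candidates):
    """Pick the candidate grouping whose groups differ most (largest F)."""
    return max(candidates, key=lambda part: partition_f(durations_by_phoneme, part))
```

For instance, if phonemes "a" and "e" have short durations and "p" and "t" long ones, `best_partition` prefers [{"a", "e"}, {"p", "t"}] over a grouping that mixes short and long phonemes.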


First Report of Two Cephalobidae Species (Nematoda: Cephalobomorpha) in South Korea

  • Kim, Taeho;Kim, Jiyeon;Park, Joong-Ki
    • Animal Systematics, Evolution and Diversity / v.34 no.4 / pp.181-189 / 2018
  • Cephalobus aff. quinilineatus (Shavrov, 1968) Anderson and Hooper, 1970 and Eucephalobus hooperi Marinari-Palmisano, 1967 of the family Cephalobidae Filipjev, 1934 (Cephalobomorpha) are newly reported from South Korea. Cephalobus aff. quinilineatus is distinguished from other Cephalobus species by its high, rounded labial probolae and five lateral incisures, three of which extend to the tail terminus. Eucephalobus hooperi is distinguished from other Eucephalobus species by its three bifurcated labial probolae with pointed termini and by morphometric characters such as body length, tail length and the corpus:isthmus ratio. In this study, the morphological characters and morphometrics of the Korean populations of C. aff. quinilineatus and E. hooperi are described and illustrated based on light and/or scanning electron microscopy.

First report and morphological description of two Acrobeloides species (Nematoda: Rhabditida: Cephalobidae) in South Korea

  • Kim, Taeho;Lee, Yucheol;Park, Joong-Ki
    • Journal of Species Research / v.10 no.4 / pp.405-411 / 2021
  • Members of the genus Acrobeloides (Cobb, 1924) Thorne, 1937 are bacterial feeders and form one of the most abundant and widely distributed nematode groups in various terrestrial environments. Based on morphological and morphometric analyses, we found two Acrobeloides species reported from Korea for the first time: A. bodenheimeri (Steiner, 1936) Thorne, 1937 and A. tricornis (Thorne, 1925) Thorne, 1937. These species exhibit morphological characters concordant with typical features of the genus Acrobeloides, such as a fusiform pharyngeal corpus with a swollen metacorpus and lateral incisures extending to the tail terminus. However, A. bodenheimeri is distinguished from other acrobeloids by its low, rounded labial probolae, distinct post-uterine sac and five lateral incisures. Acrobeloides tricornis is distinguished from its congeners by its high labial probolae with acuate termini, inconspicuous post-uterine sac and five lateral incisures. The morphological characters, measurements and illustrations of A. bodenheimeri and A. tricornis are presented in this study.

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.139-151 / 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies the types of the extracted names. Entity name recognition is widely used in natural language processing applications such as information retrieval, machine translation and question answering systems. Various deep learning-based models have been proposed to improve entity name recognition performance, but few studies have compared and analyzed these models on Korean data. In this paper, we compare and analyze the performance of CRF, LSTM-CRF, BiLSTM-CRF and BERT, which are actively used for entity name recognition, on Korean data. We also evaluate whether the embedding models widely used in recent natural language processing tasks improve the performance of entity name recognition models. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF with FastText embeddings achieved the highest performance.
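The CRF layer shared by CRF, LSTM-CRF and BiLSTM-CRF taggers chooses the best label sequence with Viterbi decoding over per-token emission scores and tag-to-tag transition scores. The following is a minimal, framework-free sketch of that decoding step, not the models evaluated in the paper; the BIO tag names and scores are hypothetical, and start and end transitions are omitted for brevity.

```python
def viterbi_decode(emissions, transitions, tags):
    """Best-scoring tag sequence for a linear-chain CRF.

    emissions:   one {tag: score} dict per token (e.g. BiLSTM outputs)
    transitions: {(prev_tag, tag): score}; pairs not listed are disallowed
    tags:        list of all tag names
    """
    neg_inf = float("-inf")
    scores = {t: emissions[0][t] for t in tags}  # best path score ending in t
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = {}, {}
        for t in tags:
            # best previous tag for reaching t at this position
            prev, best = max(
                ((p, scores[p] + transitions.get((p, t), neg_inf)) for p in tags),
                key=lambda item: item[1],
            )
            new_scores[t] = best + emit[t]
            pointers[t] = prev
        scores = new_scores
        backpointers.append(pointers)
    # follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: scores[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With emissions that favor `O` for the first token and `I-ORG` for the second, leaving the `O -> I-ORG` transition out of the table forces the decoder to return `['B-ORG', 'I-ORG']`, which is the kind of label-consistency constraint a CRF layer adds on top of per-token scores.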

A Study on Automatic Data Tagging for Text-based Training Data Construction (텍스트 기반의 훈련 데이터 구축을 위한 자동 데이터 태깅 작업에 대한 연구)

  • Kim, NaYun;So, Hyeryung;Park, Joonho
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.1008-1009 / 2020
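  • Text-based training data must be tagged character by character after the data have been collected. A corpus is a collection of texts, used mainly in linguistics, in which part-of-speech information for each word is recorded in the form of tags. This study investigates tagging for Korean and explores automatic tagging methods for building corpora or separate training datasets from data collected by companies or research institutes, rather than from standard Korean corpora.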
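A very simple form of automatic tagging is dictionary lookup: split each eojeol (whitespace-delimited unit) into the longest entries found in a lexicon and emit them in a morpheme/TAG format. The sketch below assumes a small hand-built lexicon with Sejong-style tags (NP, JX, NNG, JKB); it only illustrates the tag format and is not the method studied in the paper. Real Korean tagging must also handle conjugation and ambiguity.

```python
def tag_eojeol(eojeol, lexicon, default_tag="UNK"):
    """Greedy longest-match split of one eojeol into lexicon morphemes."""
    morphemes, i = [], 0
    while i < len(eojeol):
        for j in range(len(eojeol), i, -1):  # try the longest piece first
            piece = eojeol[i:j]
            if piece in lexicon:
                morphemes.append(f"{piece}/{lexicon[piece]}")
                i = j
                break
        else:
            # nothing matched: tag the remainder as unknown for manual review
            morphemes.append(f"{eojeol[i:]}/{default_tag}")
            break
    return "+".join(morphemes)


def auto_tag(sentence, lexicon, default_tag="UNK"):
    """Tag every whitespace-delimited eojeol in a sentence."""
    return " ".join(tag_eojeol(w, lexicon, default_tag) for w in sentence.split())
```

With the hypothetical lexicon {"나": "NP", "는": "JX", "학교": "NNG", "에": "JKB"}, `auto_tag("나는 학교에", lexicon)` yields "나/NP+는/JX 학교/NNG+에/JKB". Tokens tagged UNK are natural candidates for human review before the data are used for training.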

A Study on the Construction of Korean Hate Speech Corpus: Based on the Attributes of Online Toxic Comments (한국어 혐오 표현 코퍼스 구축 방법론 연구: 온라인 악성 댓글에 나타나는 특성을 중심으로)

  • Cho, Won Ik;Moon, Jihyung
    • Annual Conference on Human and Language Technology / 2020.10a / pp.298-303 / 2020
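  • Online hate speech directed at specific individuals or groups not only causes psychological distress to those targeted but also causes indirect discomfort to those who see it. Although there is broad social consensus that this is a problem, much Korean research still concentrates on discussing hate speech itself, and this admittedly provides little information that is useful for automatically detecting and preventing the hate speech actually observed online. We therefore examined real online comments and developed guidelines for constructing a corpus to train models that detect hate, insults and social bias. Using these guidelines, which include concrete examples, together with crowdsourcing, we built a corpus of approximately 9,300 sentences. We present an overview of the data and an analysis of how our approach relates to existing discourse.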
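Crowdsourced corpora usually collect several labels per sentence and then aggregate them. The abstract does not state the aggregation rule, so the following is a hypothetical majority-vote sketch: a sentence receives a label only when more than half of its annotators agree, and the label names are placeholders for the hate, insult and social-bias categories described above.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.5):
    """Majority-vote crowd labels.

    annotations: {sentence_id: [label from each annotator]}
    Returns {sentence_id: label} only for sentences whose top label has a
    vote share strictly above min_agreement; the rest (e.g. ties) are left
    out, for example to be re-annotated.
    """
    gold = {}
    for sid, labels in annotations.items():
        if not labels:
            continue
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > min_agreement:
            gold[sid] = label
    return gold
```

Requiring a strict majority means a two-way split such as ["insult", "none"] is held back rather than resolved arbitrarily.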
