• Title/Summary/Keyword: Korean grammar

Vocabulary Recognition Retrieval Optimized System using MLHF Model (MLHF 모델을 적용한 어휘 인식 탐색 최적화 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.217-223 / 2009
  • Vocabulary recognition systems on mobile terminals use statistical methods, including statistical grammar recognition based on N-grams. As the vocabulary grows, the terminal's limited arithmetic processing capacity and memory become a bottleneck: the recognition algorithm becomes more complicated and requires a large search space and long processing time, to the point that processing becomes infeasible. This study proposes optimizing vocabulary recognition with an MLHF system. MLHF uses FLaVoR to separate the acoustic search from the lexical search. The acoustic search extracts feature vectors from the speech signal using HMMs, and the lexical search performs recognition using the Levenshtein distance algorithm. The system achieved a vocabulary-dependent recognition rate of 98.63%, a vocabulary-independent recognition rate of 97.91%, and a recognition speed of 1.61 seconds.
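
The lexical search step named in this abstract relies on Levenshtein distance. The following is a minimal sketch of that distance and of a nearest-entry lookup, not the authors' implementation: the function names `levenshtein` and `closest_entry`, the unit edit costs, and the sample lexicon are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_entry(query: str, lexicon: list[str]) -> str:
    """Return the lexicon entry with the smallest edit distance to the query."""
    return min(lexicon, key=lambda word: levenshtein(query, word))


if __name__ == "__main__":
    # Hypothetical lexicon; a real system would match recognized unit sequences.
    print(closest_entry("gramar", ["grammar", "grandma", "hammer"]))
```

A real recognizer would likely compare phoneme or syllable sequences rather than raw characters, but the dynamic program is the same.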

A Probing Task on Linguistic Properties of Korean Sentence Embedding (한국어 문장 임베딩의 언어적 속성 입증 평가)

  • Ahn, Aelim;Ko, Byeongil;Lee, Daniel;Han, Gyoungeun;Shin, Myeongcheol;Nam, Jeesun
    • Annual Conference on Human and Language Technology / 2021.10a / pp.161-166 / 2021
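  • This study introduces probing tasks for evaluating the linguistic properties captured in Korean sentence embeddings. A probing task is the problem of distinguishing the surface, syntactic, and semantic properties of a sentence from its embedding. We review probing tasks that have been applied to English, Polish, and Russian sentences and, building on them, design probing tasks for Korean sentence embeddings that reflect the properties of Korean sentences well. In addition to six probing tasks that apply across languages, we devise three tasks targeting key features of Korean sentences, namely subject omission (SubjOmission), negation (Negation), and honorifics (Honorifics), for a total of nine probing tasks. The dataset for each task was built automatically after converting the Sejong parsed corpus ('세종 구문분석 말뭉치') into a dependency grammar (Universal Dependency Grammar) structure. We used the probing tasks to compare the linguistic properties of embeddings obtained from four multilingual sentence encoders and four Korean sentence encoders released on HuggingFace; the multilingual sentence encoder mBART showed generally high performance across the nine probing tasks. We also found that Korean sentence embeddings capture deep semantic properties better than surface or syntactic properties.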

Operation and Satisfaction of Physical Computing Classes Using MODI (MODI를 활용한 피지컬 컴퓨팅 수업 운영 및 만족도)

  • Seo, Eunsil
    • Journal of Engineering Education Research / v.26 no.1 / pp.37-44 / 2023
  • Recently, the Internet of Things has been attracting attention as a key technology of the 4th Industrial Revolution, and SW education using physical computing has been suggested as a good way to address the problems beginners encounter in programming education. Among the many teaching tools that can be used for physical computing education, MODI is a modular maker tool that anyone can assemble easily, like Lego. MODI works with its own block-based code editor, MODI Studio, which lets learners build a foundation in programming in a relatively short time and check the results immediately and in person, and this can improve their achievement. In this paper, a physical computing education method using MODI was designed for basic programming courses for beginners and applied to after-school classes for middle school students. As a result, students' interest and satisfaction were found to be much higher in physical computing classes using MODI than in text-based programming classes. This suggests that physical computing education, which allows beginners to see and feel the results in person, is more effective than grammar-oriented text programming, and that it can help improve basic programming skills by increasing students' participation.

Psychological Distance between Students and Professors in Asynchronous Online Learning, and Its Relationship to Student Achievement & Preference for Online Courses

  • LEE, Jieun
    • Educational Technology International / v.11 no.2 / pp.123-148 / 2010
  • This study examined the relationships between students' perceived psychological distance from online professors, their academic learning achievement, and their intention to continue online learning. Two online courses were selected for the study: 1) 'English Grammar' and 2) 'TOEIC (Test of English for International Communication) Preparation', both offered by a campus-based, medium-sized university. The study employed a mixed-methods approach, conducting a survey as well as one-on-one interviews with students. Students who feel psychologically distant from their online professors show a significantly lower degree of perceived learning achievement and a stronger tendency not to take further online courses. All three scales measuring psychological distance from professors (mutual awareness, connectedness, and availability) turned out to be significantly related to students' perceived learning achievement. In the interview data, the student interviewees unanimously said that the university should limit the number of online courses a student can register for in a semester to one or two. Most students regard the low interactivity of online learning as an inevitable phenomenon. There is a statistically significant difference in perceived learning achievement between the online-preferred group and the offline-preferred group. There is also a significant difference between the two groups in connectedness and availability, but no significant difference in the degree of mutual awareness.

Digitization of Old Korean Texts with Obsolete Korean Characters and Suggestion for Improvement of Information Sharing (옛한글 문서의 전자문서화와 정보공유 방법 제안)

  • Kim, Ha Young;Yoo, Woo Sik
    • Journal of Conservation Science / v.37 no.3 / pp.255-269 / 2021
  • A vast amount of material written in old Korean, using old grammar and/or obsolete characters (prints, woodblock prints, manuscripts, old novels, and letters), is held in many institutions, including the Jangseogak at the Academy of Korean Studies. Digitization of these texts has required a prolonged manual input process. Individual researchers who majored in old Korean have read the characters and typed them into electronic documents. This process depends on individual skill, effort, and approach, and is particularly limiting because none of these can be significantly increased. To date, only a small proportion of the old Korean document collections currently kept in storage has been digitized and made available to the public. Even the electronic versions of the texts are difficult to display correctly because of the incompatibility between old Korean characters and the character sets on today's electronic devices. To improve the techniques and efficiency of digitizing old Korean texts, it is necessary to develop optical character recognition (OCR) that analyzes images of old Korean documents, as well as input, display, and storage methods.

Historic Status and Grammatical Characteristics of Korean language in the Early 20th Century (한국어사에서 20세기 초 한국어의 위상과 문법 특징)

  • Hong, Jongseon
    • Korean Linguistics / v.71 / pp.1-22 / 2016
  • The early 20th century was a period when Korea was confronted with surging waves of modernization and reacted internally in a variety of ways. The Korean language, not immune to the upheaval, also underwent new changes and gradually acquired the characteristics of today's Korean. Although scholars have not yet fully agreed on the periodization of Korean, the Gabo Reform (1896) is usually considered the beginning of modern Korean. Thus, the early 20th century was also the beginning of modern Korean. Phonological, lexical, and grammatical characteristics of present-day Korean began to appear during this period. Phonologically, the 10-vowel system was established, glottal and aspirated sounds increased, and vowel harmony declined. Phenomena such as vowel raising, front-vowelization, monophthongization, and the word-initial rule appeared. Meanwhile, mixed hangul-Chinese writing became common practice, hangul-only writing began to appear in narrative writing, and elements of spoken language began to be reflected in written language. All of these pointed to the unification of written and spoken language. Under the influence of modernization, a great number of new words appeared; in particular, Japanese and other foreign words flooded in. Grammatically, the three-way tense system of '-eos-(-엇-), -neun-(-는-), -ges-(-겟-)' was established, and the hearer-oriented honorific system formed a binary system of 'hasoseo(하소서), hasibsio(하십시오), hao(하오), hage(하게), haera(해라)' and 'hae (해), haeyo(해요)'. In word formation and sentence construction, '-gi(-기)' came to be used more frequently than '-eum(-음)', while the use of '~geot(~것)' also increased significantly. In negative, causative, and passive expressions, the long form, which has fewer restrictions than the short form, became more frequent. A tendency towards simplicity appeared. In the same vein, long and complex sentences with several clauses tended to be avoided, and short, simple sentences were favored instead. Korean linguistics scholars should pay closer attention to the modernization period, which includes the early 20th century. To fully understand today's Korean language, more thorough research on this immediately preceding period is necessary.

Study on the translation of the Dong-uibogam "東醫寶鑑" in Korean version with a different view. -Focused on Tang-aekpyeon(湯液篇) and Chobu(草部) in Dong-uibogam"東醫寶鑑"- ("동의보감(東醫寶鑑)" 번역서(飜譯書)에 대한 이견(異見) -탕액편(湯液篇)과 초부(草部)를 중심(中心)으로-)

  • Kim, Yong-Han;Kim, Young-Ho;Kim, Eun-Ha
    • Journal of Korean Medical classics / v.23 no.1 / pp.143-161 / 2010
  • The "Dong-uibogam(東醫寶鑑)" is a Korean medical book that represents Korean Oriental Medicine and was compiled by the royal physician Heo Jun. It was added to UNESCO's Memory of the World Programme in 2009. It has been translated and published in Korean 7 times so far, and most of these versions rely on free translation. This study examines the Korean versions from the viewpoint of the grammar of written Chinese, and the results can be summarized as follows: 1. The Korean versions translate individual morphemes insufficiently in sentences that contain prepositions with pronouns or conjunctions. 2. Most of the versions fail to translate the syntactic properties of the demonstrative pronouns '之' and '其'. 3. Some of the versions do not correctly identify the constituents of the sentence. 4. Many adverbial phrases, which serve as modifiers in the sentence, are left untranslated. 5. Some sentences are mistranslated at the paragraph level. 6. Some versions fail to grasp the meanings of the vocabulary.

Formation of A Phonetic-Value Look-up Table for Korean Voice Synthesis (한국어 음성 합성을 위한 음가 변환 테이블 생성)

  • Lee, Gye-Young;Yim, Jae-Geol
    • Journal of the Institute of Electronics Engineers of Korea CI / v.38 no.5 / pp.44-57 / 2001
  • To synthesize grammatically correct Korean speech, we have to refer to the 'Standard Pronunciation Rules (SPR)' stated in the 'Standard Grammar of Korean Language.' Therefore, the rules that a Korean voice synthesis system uses to find the Korean sounds corresponding to a given Korean sentence must completely reflect the SPR and must be sound. However, researchers in computer science have simply used the SPR without proving the completeness and soundness of their rules. In this paper, we construct a Petri net model for each rule of the SPR, integrate all the Petri net models into one large Petri net that completely represents the SPR, and analyze the Petri net to prove its consistency. Then, we convert the Petri net model into a look-up table for Korean speech. Using this table, we can avoid the drawbacks of existing approaches, such as going through several stages or repeatedly applying a conversion process.

A Morpheme-unit Korean Feature-Based Grammar (KFG) with the X-bar Theoretic Notion of Headedness (X-바 이론의 중심어 개념을 도입한 형태소 단위의 한국어 자질 기반 문법)

  • Park, So-Yeong;Hwang, Yeong-Suk;Im, Hae-Chang
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1247-1259 / 1999
  • In this paper, we propose a Korean feature-based grammar (KFG) that adopts the X-bar theoretic notion of headedness so that the principles of Korean sentence formation can be stated concisely. To explain clearly the grammatical phenomena of Korean that occur regardless of word (eojeol) boundaries, the grammar uses the morpheme rather than the word as its basic unit. It uses features to express the semantic information and the functional information of Korean syntactic categories independently, and analyzes sentences through feature operations based on the combination relationship between two categories. In addition, we restrict feature operations to CNF (Chomsky Normal Form) binary form, which allows robust analysis of properties of Korean such as partial free word order and frequent ellipsis. The resulting KFG describes rules intuitively and simply, and analyzes a wide variety of Korean sentences robustly. Experiments on 746 sentences extracted from SERI Test Suites 97 and newspaper articles show 94%~99% coverage.
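
Restricting a grammar to binary CNF rules, as this abstract describes, makes chart-style recognition straightforward. The sketch below is a generic CYK recognizer over a toy morpheme-level grammar, offered only as an illustration of that restriction. It is not the KFG itself and performs no feature operations; the category labels (`N`, `P`, `V`, `E`, `NP`, `VP`, `S`), the romanized morphemes, and the function name `cyk_recognize` are all assumptions.

```python
from itertools import product


def cyk_recognize(tokens, lexical, binary, start="S"):
    """Return True if the token sequence can be derived from `start` in a CNF grammar.

    lexical: dict mapping a token to the set of categories that can produce it
    binary:  dict mapping a (left, right) category pair to the set of parent categories
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every category that spans tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][i] = set(lexical.get(token, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: left = i..k, right = k+1..j
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((left, right), set())
    return start in table[0][n - 1]


# Hypothetical toy grammar over morphemes: noun + particle, verb stem + ending.
LEXICAL = {"mina": {"N"}, "ga": {"P"}, "ja": {"V"}, "nda": {"E"}}
BINARY = {("N", "P"): {"NP"}, ("V", "E"): {"VP"}, ("NP", "VP"): {"S"}}

if __name__ == "__main__":
    print(cyk_recognize(["mina", "ga", "ja", "nda"], LEXICAL, BINARY))
    print(cyk_recognize(["ga", "mina", "ja", "nda"], LEXICAL, BINARY))
```

Because every rule combines exactly two categories, the table can be filled by trying each split point of each span, which is the property the binary restriction provides.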

Design and Implementation of Web Crawler Wrappers to Collect User Reviews on Shopping Mall with Various Hierarchical Tree Structure (다양한 계층 트리 구조를 갖는 쇼핑몰 상에서의 상품평 수집을 위한 웹 크롤러 래퍼의 설계 및 구현)

  • Kang, Han-Hoon;Yoo, Seong-Joon;Han, Dong-Il
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.3 / pp.318-325 / 2010
  • This study proposes a wrapper database description language and model for collecting product reviews from Korean shopping malls that have multi-layer structures and are built with a variety of web languages. Above all, wrapper-based web crawlers hold the website structure information needed to retrieve exactly the desired data. Previously proposed wrapper-based web crawlers could collect HTML documents, and the hierarchical structure of the target documents was only 2-3 layers deep. However, the Korean shopping malls in this study consist not only of HTML documents but also of various web languages (JavaScript, Flash, and AJAX), and have a 5-layer hierarchical structure. A web crawler needs information about the review pages in order to visit them without visiting any non-review pages. The proposed wrapper contains the location information of review pages. We also propose a language grammar for describing this location information.