• Title/Abstract/Keywords: language model

Language Model based on VCCV and Test of Smoothing Techniques for Sentence Speech Recognition

  • 박선희;노용완;홍광석
    • The KIPS Transactions: Part B / Vol. 11B, No. 2 / pp. 241-246 / 2004
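  • This paper proposes the VCCV (vowel-consonant-consonant-vowel) unit as the processing unit of a language model and compares it with the conventional processing units, the eojeol (space-delimited word phrase) and the morpheme. Eojeols and morphemes have large vocabularies and high perplexity, whereas the VCCV unit has a small dictionary and a limited vocabulary, and therefore lower perplexity. Smoothing is indispensable in building a language model: smoothing techniques are used to obtain better probability estimates when the data are too sparse for the probabilities to be estimated accurately. We built language models for morpheme, eojeol, and VCCV units and computed their perplexity; the VCCV unit yielded lower perplexity than morphemes or eojeols. Using VCCV, which gave the lowest perplexity, we built N-gram models and evaluated language models using Katz, Witten-Bell, absolute discounting, and modified Kneser-Ney smoothing. Modified Kneser-Ney was found to be the most suitable smoothing technique for the VCCV-based language model.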
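
  The abstract does not give the paper's smoothing code, so the following is only a minimal sketch of the kind of model being compared. It assumes the text is already segmented into units (the toy tokens "a", "b", "c" are placeholders, not real VCCV units) and implements interpolated Kneser-Ney for bigrams with a single discount D = 0.75, an assumed common default. Modified Kneser-Ney, which the paper found best, instead uses three discounts D1, D2, D3+ selected by the bigram's count. Perplexity is computed as the exponential of the average negative log-probability per predicted token.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def train_kn_bigram(sentences, discount=0.75):
    """Interpolated Kneser-Ney bigram model with one absolute discount D.

    Modified Kneser-Ney would replace `discount` with D1, D2, D3+ chosen by
    the bigram's count; this sketch keeps a single D for brevity.
    """
    bigrams = Counter()
    for sent in sentences:
        tokens = [BOS] + list(sent) + [EOS]
        bigrams.update(zip(tokens, tokens[1:]))

    context_count = Counter()     # c(v): how often v occurs as a left context
    followers = defaultdict(set)  # distinct words seen after v
    preceders = defaultdict(set)  # distinct words seen before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    bigram_types = len(bigrams)

    def prob(w, v):
        """P_KN(w | v); assumes w was seen in training as a second token."""
        p_cont = len(preceders[w]) / bigram_types  # continuation probability
        c_v = context_count[v]
        if c_v == 0:
            return p_cont  # unseen context: use the lower-order model only
        discounted = max(bigrams[(v, w)] - discount, 0) / c_v
        backoff_weight = discount * len(followers[v]) / c_v
        return discounted + backoff_weight * p_cont

    return prob


def perplexity(prob, sentences):
    """exp of the average negative log-probability per predicted token."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = [BOS] + list(sent) + [EOS]
        for v, w in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(w, v))
            n += 1
    return math.exp(-log_sum / n)


train = [["a", "b"], ["a", "c"], ["b", "c"]]
lm = train_kn_bigram(train)
print(perplexity(lm, [["a", "c"]]))
```

  The discounted mass removed from each seen bigram is redistributed through the continuation probability, so for any seen context the probabilities over the training vocabulary sum to 1; comparing perplexities of such models across unit types (morpheme, eojeol, VCCV) is the kind of comparison the abstract describes.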

Donguibogam-Based Pattern Diagnosis Using Natural Language Processing and Machine Learning

  • 이승현;장동표;성강경
    • Journal of Korean Medicine / Vol. 41, No. 3 / pp. 1-8 / 2020
  • Objectives: This paper investigates Donguibogam-based pattern diagnosis using natural language processing and machine learning. Methods: A database was constructed by collecting symptoms and pattern diagnoses from Donguibogam. The symptom sentences were tokenized into nouns, verbs, and adjectives with a natural language processing tool. To feed the symptom sentences into machine learning, a Word2Vec model was built to convert words into numeric vectors. Using pairs of symptom vectors and pattern diagnoses, a pattern prediction model was trained with logistic regression. Results: The Word2Vec model performed best after optimizing its primary parameters: the number of iterations, the vector dimension, and the window size. The resulting pattern diagnosis regression model achieved 75% accuracy (chance level 16.7%) in predicting the Six-Qi pattern diagnosis. Conclusions: In this study, we developed a pattern diagnosis prediction model based on the symptoms and pattern diagnoses in Donguibogam. Prediction accuracy could be improved in the future by collecting more data from additional classics of oriental medicine.
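
  The pipeline above (word vectors, then logistic regression) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the 2-D `EMBEDDINGS` dictionary is a hypothetical stand-in for a trained Word2Vec model, the abstract does not say how word vectors were combined into one sentence vector (mean pooling is assumed here), and the two class labels are hypothetical. The reported chance level of 16.7% corresponds to 1/6 for six Six-Qi classes; the sketch uses two classes to stay small.

```python
import math

# Hypothetical 2-D embeddings standing in for a trained Word2Vec model.
EMBEDDINGS = {
    "fever": [1.0, 0.0], "thirst": [0.9, 0.1],
    "chills": [0.0, 1.0], "cold": [0.1, 0.9],
}


def sentence_vector(tokens, embeddings):
    """Mean of the vectors of in-vocabulary tokens (zero vector if none)."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def scores(W, b, x):
    """Linear class scores W x + b."""
    return [sum(wj * xj for wj, xj in zip(W[k], x)) + b[k] for k in range(len(W))]


def softmax(s):
    m = max(s)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in s]
    total = sum(exps)
    return [e / total for e in exps]


def train_logreg(X, y, n_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = softmax(scores(W, b, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)  # dLoss/dscore_k
                gb[k] += err
                for j in range(dim):
                    gW[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / len(X)
            for j in range(dim):
                W[k][j] -= lr * gW[k][j] / len(X)
    return W, b


def predict(W, b, x):
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


# Toy data: labels 0 and 1 are hypothetical pattern classes.
train_sents = [["fever", "thirst"], ["chills", "cold"], ["fever"], ["cold"]]
labels = [0, 1, 0, 1]
X = [sentence_vector(s, EMBEDDINGS) for s in train_sents]
W, b = train_logreg(X, labels, n_classes=2)
print(predict(W, b, sentence_vector(["thirst"], EMBEDDINGS)))
```

  In practice the embeddings would come from a Word2Vec implementation (whose iteration count, vector dimension, and window size are the parameters the abstract says were tuned) and the classifier from a library such as scikit-learn; plain Python is used here only to keep the sketch self-contained.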

Using Semantic Knowledge in the Uyghur-Chinese Person Name Transliteration

  • Murat, Alim;Osman, Turghun;Yang, Yating;Zhou, Xi;Wang, Lei;Li, Xiao
    • Journal of Information Processing Systems / Vol. 13, No. 4 / pp. 716-730 / 2017
  • In this paper, we propose a transliteration approach for rendering Uyghur person names in Chinese, based on semantic information (i.e., language origin and gender) learned automatically from the person name. The approach integrates semantic scores (i.e., the performance of language-origin and gender detection) with a general transliteration model, producing a semantic knowledge-based model that yields the best candidate transliterations. The experiments use datasets containing person names of different language origins: Uyghur and Chinese. The results show that the proposed semantic transliteration model substantially outperforms the general transliteration model, greatly improves mean reciprocal rank (MRR) on the two datasets, and aids in developing more efficient transliteration of named entities.
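
  The evaluation metric and the idea of adding semantic scores to a base transliteration model can be sketched as below. MRR is the standard definition (mean of 1/rank of the correct answer). The combination rule, a linear interpolation with a hypothetical `weight` parameter, is an assumption for illustration only; the abstract does not state the paper's formula, and the candidate names and scores are placeholders.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """Average of 1/rank of the correct answer; contributes 0 if it is absent."""
    total = 0.0
    for candidates, answer in zip(ranked_lists, gold):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(gold)


def rerank(candidates, weight=0.5):
    """Rerank (name, transliteration_score, semantic_score) triples.

    The linear interpolation below is an assumed combination rule, not the
    paper's formula; weight=0 reproduces the base model's ranking.
    """
    def combined(c):
        _, translit, semantic = c
        return (1 - weight) * translit + weight * semantic

    return [name for name, _, _ in sorted(candidates, key=combined, reverse=True)]


cands = [("cand_A", 0.6, 0.1), ("cand_B", 0.5, 0.9)]
print(rerank(cands, weight=0.0))  # ['cand_A', 'cand_B']: base model only
print(rerank(cands, weight=0.5))  # ['cand_B', 'cand_A']: semantic score flips it
```

  If `cand_B` were the correct transliteration, MRR would rise from 0.5 under the base ranking to 1.0 after reranking, which is the kind of MRR gain the abstract reports.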

Korean Students' Intentions to Use Mobile-Assisted Language Learning: Applying the Technology Acceptance Model

  • Kim, Gyoo-mi;Lee, Sang-jun
    • International Journal of Contents / Vol. 12, No. 3 / pp. 47-53 / 2016
  • This study examined how Korean students accept and use mobile-assisted language learning (MALL) and investigated factors that may affect MALL usage. The participants were 244 undergraduate students surveyed with a questionnaire. The research model, based on the technology acceptance model (TAM), included students' self-efficacy, content reliability, interactivity, perceived enjoyment, perceived usefulness, perceived ease of use, attitude, and behavioral intention to use MALL. Structural equation modeling (SEM) was used to analyze the modified TAM and the research model. The results indicated that TAM is a good theoretical tool for understanding students' acceptance of MALL. In addition, all constructs except self-efficacy and interactivity had significant effects on students' acceptance of MALL. Limitations and suggestions for further study are also presented.

Analysis of a crop growth model using Unified Modeling Language

  • Kim, Kwang Soo;Kim, Do-Gyeom;Kim, Sey Hyun;Hwang, Grim;Jeong, Haneul
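    • Proceedings of the Korean Society of Agricultural and Forest Meteorology Conference / 2011 Conference of the Korean Society of Agricultural and Forest Meteorology / pp. 12-14 / 2011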
  • Crop growth simulation models have been developed as research and management tools. When these models need to incorporate new knowledge on crop phenology and physiology, programming languages have been used to develop and document them. However, researchers may have limited programming skills, and software developers may find it challenging to improve crop models because documentation of the models is rarely available. The Unified Modeling Language (UML) can provide a simple approach to developing and documenting models. A template for implementing a model can be obtained with UML, which would facilitate code reuse and model improvement.

Korean Dependency Parsing Using Various Ensemble Models

  • 조경철;김주완;김균엽;박성진;강상우
    • KIISE Language Engineering SIG: Conference Proceedings (Hangul and Korean Information Processing) / 31st Conference on Hangul and Korean Information Processing (2019) / pp. 543-545 / 2019
  • This paper analyzes the performance of combining recent Korean dependency parsing models with various ensemble models. For word representations, we use pre-trained word embedding models, ELMo (Embeddings from Language Models), BERT (Bidirectional Encoder Representations from Transformers), and various additional features. The dependency parsing models used are the Stack Pointer Network, the Deep Biaffine Attention Parser, and the Left-to-Right Pointer Parser. Finally, we propose an optimal model by combining each model's parsing results with the ensemble methods Bagging and XGBoost (Extreme Gradient Boosting).
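
  The basic idea of combining several parsers' outputs can be illustrated with a much simpler technique than the Bagging and XGBoost ensembles the paper uses: per-token majority (hard) voting over predicted heads. This is a swapped-in illustration, not the paper's method, and the three parser outputs below are hypothetical. One caveat: per-token voting can produce a structure that is not a valid tree (for example, a cycle); a common remedy is to rerun a maximum-spanning-tree decoder over vote-weighted arcs, which this sketch omits.

```python
from collections import Counter


def vote_heads(predictions):
    """Per-token majority vote over the head indices predicted by several parsers.

    predictions[p][i] is parser p's head for token i+1 (tokens are numbered
    from 1; head 0 means ROOT). Ties go to the parser listed first.
    """
    n_tokens = len(predictions[0])
    merged = []
    for i in range(n_tokens):
        votes = Counter(pred[i] for pred in predictions)
        top = max(votes.values())
        # First parser (in list order) whose head has the maximum vote count.
        merged.append(next(pred[i] for pred in predictions if votes[pred[i]] == top))
    return merged


def unlabeled_attachment_score(pred, gold):
    """Fraction of tokens whose predicted head matches the gold head."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Hypothetical outputs of three parsers for a 3-token sentence.
parser_outputs = [[2, 0, 2], [2, 0, 1], [3, 0, 2]]
merged = vote_heads(parser_outputs)
print(merged)  # [2, 0, 2]
```

  Learned ensembles such as XGBoost go further by weighting each parser's vote with features, rather than counting votes equally as above.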

3D Model for Korean-Japanese Sign Language Image Communication

  • 신성효;김상운
    • The Institute of Electronics and Information Engineers (IEIE): Conference Proceedings / Proceedings of the 1998 IEIE Summer Conference / pp. 929-932 / 1998
  • In this paper, we propose a method of representing emotional expressions and lip shapes for sign language communication using a 3-dimensional model. First, we employ the action units (AUs) of the Facial Action Coding System (FACS) to display all shapes. Then we define 11 basic lip shapes and the sounding time of each component in a syllable in order to synthesize lip shapes for Korean characters more precisely. Experimental results show that the proposed method can be used efficiently for sign language image communication between different languages.

Tee Model or Pipe Model?

  • 한영희
    • KIISE Language Engineering SIG: Conference Proceedings (Hangul and Korean Information Processing) / 3rd Conference on Hangul and Korean Information Processing (1991) / pp. 189-195 / 1991
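  • The grammar architectures proposed so far in generative grammar have generally arranged three components in a T shape, and this architecture has mainly been regarded as a model of linguistic competence. This paper proposes a Pipe-type (or integrated) model as a more plausible new model that can also handle linguistic performance, particularly the speaker's feedback and the listener's perception, and that provides interfaces naturally linking the functions of the components.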

Project-based CALL Class: Linking the Theory and Practice

  • Yang, Eun-Mi
    • English Language & Literature Teaching / Vol. 10, No. 1 / pp. 53-76 / 2004
  • This paper introduces a class model based on Internet English, a course offered by the English department of a university. The course has the dual purpose of developing students' English skills and Internet skills at the same time. In support of using the Internet for language learning, the paper explores the advantages of project-based language learning and constructivist learning in relation to CALL. The course activities, which are project-based and grounded in a constructivist learning perspective, are explained in detail to show the relationship between second language learning theory and teaching practice. The paper also describes how the four language skills (speaking, listening, reading, and writing) are integrated in the class. Finally, the students' judgmental evaluation of the course is reported. The results show that a project-based CALL class could be a promising model for realizing integrative, constructivist, and authentic learning.

Utilizing Large Language Models for Non-trained Binary Sentiment Classification

  • 안형진;황태욱;정상근
    • KIISE Language Engineering SIG: Conference Proceedings (Hangul and Korean Information Processing) / 35th Conference on Hangul and Korean Information Processing (2023) / pp. 66-71 / 2023
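  • Since the emergence of ChatGPT, a variety of large language models (LLMs) have appeared, and these LLMs can now be fine-tuned for specific purposes. However, not only training an LLM from scratch but even simple fine-tuning requires more computing resources than ordinary users can afford. In this study, we used publicly available LLMs without any additional training and measured their performance on a binary classification task with zero-shot prompting. Without training or further tuning, we obtained binary classification performance comparable to existing pre-trained language models, and the better-performing LLMs showed low classification-failure rates and consistent performance, indicating considerable practical utility.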
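
  The evaluation loop described above (zero-shot prompt, parse the model's free-form reply, count failures) can be sketched as follows. The abstract gives neither the prompts nor the models, so the prompt wording, the label set, and the `llm` callable (any function mapping a prompt string to a reply string) are hypothetical. A reply that names neither label, or both, is counted as a classification failure, which is one plausible reading of the "failure rate" the abstract mentions. The substring-matching parser is deliberately simple and would, for example, misread "not positive".

```python
from typing import Callable, List, Optional, Tuple

LABELS = ("positive", "negative")


def build_prompt(text: str) -> str:
    """Zero-shot prompt: instructions plus the input, with no labeled examples."""
    return (
        "Classify the sentiment of the following review as positive or negative.\n"
        "Answer with exactly one word: positive or negative.\n\n"
        f"Review: {text}\nAnswer:"
    )


def parse_label(response: str) -> Optional[str]:
    """Map a free-form reply to a label; None counts as a classification failure."""
    reply = response.strip().lower()
    found = [label for label in LABELS if label in reply]
    return found[0] if len(found) == 1 else None


def evaluate(llm: Callable[[str], str], texts: List[str],
             gold: List[str]) -> Tuple[float, float]:
    """Return (accuracy, failure_rate) over the dataset."""
    correct = failures = 0
    for text, label in zip(texts, gold):
        pred = parse_label(llm(build_prompt(text)))
        if pred is None:
            failures += 1
        elif pred == label:
            correct += 1
    n = len(gold)
    return correct / n, failures / n
```

  For example, with a stub `llm` that answers "Positive" when the prompt contains "great" and "I cannot decide" otherwise, `evaluate(stub, ["great movie", "boring plot"], ["positive", "negative"])` yields an accuracy of 0.5 and a failure rate of 0.5.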