• Title/Abstract/Keywords: Statistical Language Model

107 search results (processing time: 0.026 seconds)

Improved Statistical Language Model for Context-sensitive Spelling Error Candidates

  • 이정훈;김민호;권혁철
    • 한국멀티미디어학회논문지 / Vol. 20, No. 2 / pp.371-381 / 2017
  • The performance of statistical context-sensitive spelling error correction depends on the quality and quantity of the data used to build the statistical language model. In general, the quality of a statistical language model tends to increase with the size of its data. However, as the amount of data increases, processing becomes slower and more storage space is required. We propose an improved statistical language model to address this problem, along with an effective method for generating spelling error candidates based on the new model. The proposed model and the correction method built on it improve both the performance of spelling error correction and the processing speed.

A Statistical Model for Choosing the Best Translation of Prepositions

  • 심광섭
    • 한국언어정보학회지:언어와정보 / Vol. 8, No. 1 / pp.101-116 / 2004
  • This paper proposes a statistical model for the translation of prepositions in English-Korean machine translation. In the proposed model, statistical information acquired from unlabeled Korean corpora is used to choose the best translation from several possible translations. Such information includes functional word-verb co-occurrence information, functional word-verb distance information, and noun-postposition co-occurrence information. The model was evaluated with 443 sentences, each of which has a prepositional phrase, and we attained 71.3% accuracy.

R: AN OVERVIEW AND SOME CURRENT DIRECTIONS

  • Tierney, Luke
    • Journal of the Korean Statistical Society / Vol. 36, No. 1 / pp.31-55 / 2007
  • R is an open source language for statistical computing and graphics based on the ACM software award-winning S language. R is widely used for data analysis and has become a major vehicle for making available new statistical methodology. This paper presents an overview of the design philosophy and the development model for R, reviews the basic capabilities of the system, and outlines some current projects that will influence future developments of R.

Language Modeling Approaches to Information Retrieval

  • Banerjee, Protima;Han, Hyo-Il
    • Journal of Computing Science and Engineering / Vol. 3, No. 3 / pp.143-164 / 2009
  • This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal is to model that process via a generative statistical model. In this article, we discuss current research in the application of language modeling to information retrieval, the role of semantics in the language modeling framework, cluster-based language models, use of language modeling for XML retrieval and future trends.
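The generative view described in this abstract is commonly illustrated with query-likelihood ranking: each document is treated as a unigram language model, and documents are ranked by the probability that their model generates the query. The sketch below is a minimal illustration under stated assumptions, not the method of the surveyed article. The function name `query_log_likelihood`, the toy documents, and the choice of Jelinek-Mercer smoothing with `lam=0.5` are all hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(query | doc) under a unigram model.

    Jelinek-Mercer smoothing (an assumption of this sketch) mixes the
    document model with a collection-wide model using weight `lam`.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    doc_len = len(doc)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy data, invented for illustration only.
docs = {
    "d1": "language models rank documents by query likelihood".split(),
    "d2": "speech recognition uses acoustic models".split(),
}
collection = [tok for tokens in docs.values() for tok in tokens]
query = "language models".split()

# Rank document ids from most to least likely to have generated the query.
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], collection),
    reverse=True,
)
print(ranking)
```

Smoothing with the collection model keeps a document that lacks one query term from receiving zero probability, which is why `d2` still gets a finite score here.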

Statistical Generation of Korean Chatting Sentences Using Multiple Feature Information

  • 김종환;장두성;김학수
    • 인지과학 / Vol. 20, No. 4 / pp.421-437 / 2009
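  • A chatting system is a program that simulates conversation between a human and a computer using human language. This paper proposes a statistical model that takes keywords and a speech act as input and generates natural chatting sentences. The proposed model first selects eojeols (word phrases) containing the keywords from a corpus, then generates candidate sentences using the occurrence information and syntactic information of the neighboring eojeols. It then selects one of the generated candidate sentences using a speech-act-based language model, co-occurrence information between eojeols, and the syntactic information of each eojeol. In experiments, the proposed model achieved an appropriate-sentence generation rate of 86.2%, better than a previous model based on a simple language model.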

Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the Effect of Interview Sentence in Language Modeling

  • 정의정;이영직
    • 한국음향학회:학술대회논문집 / Proceedings of the Acoustical Society of Korea 2002 Summer Conference, Vol. 21, No. 1 / pp.87-90 / 2002
  • This paper statistically analyzes the relationship between the size and balance of a text corpus by evaluating the effect of interview sentences on the language model of a Korean broadcast news transcription system. The ultimate purpose of the system is to recognize not interview speech but the anchor's and reporters' speech in broadcast news shows. However, interview sentences make up approximately 15% of the text corpus gathered for constructing the language model. Interview sentences differ in character from the anchor's and reporters' sentences, and therefore interfere with anchor- and reporter-oriented language modeling. We evaluate the effect of interview sentences on the language model and statistically analyze the relationship between corpus size and balance by repeating the same experimental procedure while varying the size of the corpus.

Automatic Generation of Multiple-Choice Questions Based on Statistical Language Model

  • 박영기
    • 정보교육학회논문지 / Vol. 20, No. 2 / pp.197-206 / 2016
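  • Fill-in-the-blank questions have been widely used to check whether students have properly understood what they learned. Many methods have been proposed for generating such questions automatically by computer algorithms, but most focused on which part of a sentence to blank out, so research on automatically generating appropriate answer choices has been lacking. This paper assumes that the blank is given and proposes an algorithm that automatically generates choices suited to it. Because the algorithm generates choices based on a statistical language model, it can provide choices that are less biased toward the question writer than human-generated ones. In addition, because difficulty can be adjusted automatically based on probability values, it offers considerable cost savings compared with writing questions by hand. Experiments on the TEPS grammar and vocabulary tests confirmed that the algorithm produces results similar to those of humans. We expect it to be highly useful in the field of smart education.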

The Statistical Relationship between Linguistic Items and Corpus Size

  • 양경숙;박병선
    • 한국언어정보학회지:언어와정보 / Vol. 7, No. 2 / pp.103-115 / 2003
  • In recent years, many organizations have been constructing their own large corpora to achieve corpus representativeness. However, there is no reliable guideline on how large a corpus should be, especially for Korean corpora. In this study, we have devised a new statistical model, ARIMA (Autoregressive Integrated Moving Average), for predicting the relationship between linguistic items (the number of types) and corpus size (the number of tokens), overcoming the major flaws of several previous studies on this issue. Finally, we show that the ARIMA model presented is valid, accurate and very reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, corpus representativeness and linguistic comprehensiveness.

Levels of Statistical Literacy Derived from Middle School Mathematics Textbook

  • 최선미;노지화
    • East Asian mathematical journal / Vol. 37, No. 4 / pp.481-497 / 2021
  • The importance of statistics in everyday life and the workplace has led to calls for increased attention to statistical literacy in the mathematics curriculum, both internationally and domestically. While professional organizations and researchers propose perspectives on and models of statistical literacy, conceptions and elements of statistical literacy vary. This study examines how mathematics textbook questions fulfill the requirements of statistical literacy by employing two models: Watson's model, which focuses on understanding of statistical language, and Curcio's model, which focuses on the data interpretation aspects of statistical literacy. For this purpose, a total of 872 problem questions presented in the statistics units of ten textbooks for middle school year 1 mathematics were analyzed.