• Title/Summary/Keyword: chi-square statistic (카이제곱 통계량)

A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation (단어 중의성 해소를 위한 지도학습 방법의 통계적 자질선정에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean BIBLIA Society for library and Information Science / v.22 no.2 / pp.5-25 / 2011
  • This study aims to identify the most effective statistical feature selection method and context window size for supervised word sense disambiguation. Features were selected by four methods: information gain, document frequency, chi-square, and relevancy. A comparison of the resulting feature weights showed that identifying the most appropriate features can improve word sense disambiguation performance, with information gain ranking highest. The SVM classifier was not affected by feature selection and performed better with a larger feature set and context size. The Naive Bayes classifier performed best at 10 percent of the feature set size, and the kNN classifier at under 10 percent. When feature selection methods are applied to word sense disambiguation, combining a small feature set with a larger context window, or a large feature set with a small context window, yields the greatest performance improvements.
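
The chi-square feature selection named in the abstract above can be illustrated with a minimal Python sketch. It assumes the common 2x2 formulation that scores a context term against one sense from co-occurrence counts. The function names, the count layout, and the toy data are hypothetical and are not taken from the paper.

```python
def chi_square_score(a, b, c, d):
    """Chi-square statistic for a 2x2 term/sense contingency table.

    a: contexts that contain the term and carry the target sense
    b: contexts that contain the term and carry another sense
    c: contexts without the term that carry the target sense
    d: contexts without the term that carry another sense
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal means the term or sense never varies; score it as uninformative.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_top_features(counts, fraction):
    """Keep the highest-scoring fraction of terms (at least one term).

    counts maps each term to its (a, b, c, d) tuple; this layout is an assumption.
    """
    ranked = sorted(counts, key=lambda t: chi_square_score(*counts[t]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy counts (hypothetical): "bank" separates the senses perfectly, "the" not at all.
toy_counts = {"bank": (10, 0, 0, 10), "the": (5, 5, 5, 5)}
selected = select_top_features(toy_counts, 0.5)
```

Varying `fraction` corresponds to the feature set sizes (for example 10 percent) compared in the abstract.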

Exploring factors in terms of school and social environment that affect high school student's affective attitude on mathematics according to the student's academic level, grade, gender, and school location (고등학생의 학업성취도, 학년, 성별, 학교 소재지에 따른 수학에 대한 정의적 태도에 영향 미치는 학교와 사회 환경적 측면의 요인 탐색)

  • Jung Hye-Yun
    • The Mathematical Education / v.62 no.1 / pp.151-173 / 2023
  • In this study, we explored factors that affect high school students' affective attitude toward mathematics with respect to school mathematics instruction, school mathematics assessment, mathematics textbooks, private mathematics education, college entrance and career, and social atmosphere. A total of 1,029 high school students, sampled with their grade, major, academic level, gender, and school location taken into account, participated in the survey. The survey results were analyzed with descriptive statistics, t-tests, ANOVA, and chi-square tests using SPSS ver 29.0. The results are as follows. First, in general, college entrance and career and school mathematics instruction affected students' affective attitude toward mathematics. Second, the factors affecting the affective attitude toward mathematics differed to a statistically significant degree by students' academic level and gender. Third, students' responses to the sub-categories of each factor differed to a statistically significant degree by students' background. We suggest that improving students' affective attitude toward mathematics requires more diverse school mathematics instruction, better mathematics textbooks, appropriate participation in private mathematics education, an improved perception among students of the future usefulness of mathematics and its importance in society, and parents' emotional support.

A Loyalty Score Model Development in Credit Card Business (고객 로열티 스코어 모델 개발)

  • Chun, Heui-Ju
    • The Korean Journal of Applied Statistics / v.21 no.2 / pp.211-219 / 2008
  • Customer loyalty is very important for a company to survive and remain profitable over the long term. CRM (Customer Relationship Management) is especially emphasized in credit card companies, which have to manage both card members and merchants. A loyalty score is more essential to credit card companies, which provide differentiated financial services to card members and merchants, than to other kinds of companies. In this paper, we discuss behavioral measures for defining customer loyalty and propose a method for building a loyalty score, using a credit card company as an example. The resulting loyalty score is considered easy to understand and simple to apply in the card industry. To develop the score, we first define loyal and non-loyal customers using measure variables that indicate loyalty. We then fit a separate logistic regression for each exploratory measure variable and obtain each variable's weight from the chi-square statistic used to assess model fit. The proposed loyalty score shows very stable results over time in terms of the PSI (Population Stability Index).
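
As a rough illustration of the stability check mentioned in the abstract above, here is a minimal Python sketch of the Population Stability Index. It assumes the commonly used definition, the sum over score bands of (actual share - expected share) * ln(actual share / expected share), and it assumes every band is non-empty in both periods. The function name and the example counts are hypothetical; the paper's own score bands are not given here.

```python
import math


def population_stability_index(expected_counts, actual_counts):
    """PSI between a baseline and a later distribution over the same score bands.

    Both arguments are lists of customer counts per band, in the same band order.
    Assumes every band has a positive count in both lists.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same score bands")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_share = expected / expected_total
        actual_share = actual / actual_total
        psi += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return psi


# Hypothetical counts per loyalty-score band at development time and one period later.
baseline = [50, 50]
later = [40, 60]
psi_value = population_stability_index(baseline, later)
```

A PSI of zero means the two distributions are identical, and larger values indicate a larger shift in the score distribution.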

Logistic Regression Accident Models by Location in the Case of Cheong-ju 4-Legged Signalized Intersections (사고위치별 로지스틱 회귀 교통사고 모형 - 청주시 4지 신호교차로를 중심으로 -)

  • Park, Byung-Ho;Yang, Jeong-Mo;Kim, Jun-Young
    • International Journal of Highway Engineering / v.11 no.2 / pp.17-25 / 2009
  • The goal of this study is to develop logistic regression models by accident location (entry section, exit section, inside the intersection, and pedestrian crossing section). Based on accident data from the Chungbuk Provincial Police Agency (2004-2005) and field survey data, the geometric elements, environmental factors, and other factors related to traffic accidents were analyzed. All of the developed models are statistically significant (chi-square p = 0.000, Nagelkerke $R^2$ = 0.363 to 0.819). The models show that the common accident factors are traffic volume (ADT), crossing distance, and exclusive left-turn lanes, and that the location-specific factors are minor-road traffic volume (inside-intersection model) and U-turns on the main road (pedestrian crossing model). The Hosmer-Lemeshow tests indicate acceptable fit ($p \geq 0.05$) for every model except the entry section model. The correct classification rates are also high (above 73.9% for all models).

The Sensitivity Analysis for Customer Feedback on Social Media (소셜 미디어 상 고객피드백을 위한 감성분석)

  • Song, Eun-Jee
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.4 / pp.780-786 / 2015
  • Social media such as social network services contain many spontaneous customer opinions, so companies now collect and analyze customer feedback with systems that process big data from social media in order to run their businesses efficiently. However, it is difficult to analyze data collected from online sites accurately with existing morpheme analyzers because the data contain spacing and spelling errors. In addition, many online sentences are short and do not contain enough candidate meanings to select from, so established meaning selection methods such as mutual information and the chi-square statistic cannot perform emotion classification. To solve these problems, this paper proposes a module that corrects meanings using initial consonants/vowels and a phase pattern dictionary, together with a meaning selection method that uses the priority of word classes within a sentence. Based on the word classes extracted by the morpheme analyzer, these mechanisms separate and analyze predicates and substantives, build a property database subordinate to each word class, and extract positive/negative emotions from the accumulated property database.

International Patent Classification Using Latent Semantic Indexing (잠재 의미 색인 기법을 이용한 국제 특허 분류)

  • Jin, Hoon-Tae
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.1294-1297 / 2013
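  • This paper studies a system that uses machine learning to classify patent documents automatically according to the International Patent Classification (IPC), and proposes a way to improve classification performance with latent semantic indexing. Earlier work on automatic IPC classification of patent documents relied on word-matching indexing. However, given the speed at which modern technical terms emerge and their diversity, we judged that analyzing the relatedness between patent documents through the concepts behind terms, rather than through word frequency itself, would be more effective, and therefore studied classification based on latent semantic indexing (LSI). In the experiments, we used SVM, kNN, and Naive Bayes classifiers to compare the performance obtained with information gain (IG) and the chi-square statistic (CHI), the representative feature selection methods for word-matching indexing, against the performance obtained with latent semantic indexing. Using SVM, which performed best among the classifiers, we then evaluated how much nouns contribute to building the conceptual semantic structure of terms in latent semantic indexing, and experimentally compared the ranges of singular values that give the best performance with LSI. The results show that LSI outperformed the word-matching methods (IG, CHI). The SVM and Naive Bayes classifiers performed at a similar level with word matching, but SVM was far superior with LSI. SVM improved by about 3% with LSI, whereas Naive Bayes dropped by 20%. Within LSI, restricting the latent semantic structure to nouns improved results by about 10% compared with treating all words as content words, and in the analysis by singular value range, the highest performance was obtained at the range ranked around the 30% level.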

A Domain Action Classification Model Using Conditional Random Fields (Conditional Random Fields를 이용한 영역 행위 분류 모델)

  • Kim, Hark-Soo
    • Korean Journal of Cognitive Science / v.18 no.1 / pp.1-14 / 2007
  • In a goal-oriented dialogue, speakers' intentions can be represented by domain actions, each consisting of a speech act paired with a concept sequence. To implement an intelligent dialogue system, it is therefore very important to infer the domain actions correctly from surface utterances. In this paper, we propose a statistical model that uses conditional random fields to determine speech acts and concept sequences at the same time. To avoid biased learning problems, the proposed model uses low-level linguistic features such as lexical items and parts of speech. It then filters out uninformative features using the chi-square statistic. In experiments in a schedule arrangement domain, the proposed system performed well, with a precision of 93.0% on speech act classification and 90.2% on concept sequence classification.

Korean Speech Act Tagging using Previous Sentence Features and Following Candidate Speech Acts (이전 문장 자질과 다음 발화의 후보 화행을 이용한 한국어 화행 분석)

  • Kim, Se-Jong;Lee, Yong-Hun;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.35 no.6 / pp.374-385 / 2008
  • Speech act tagging, which recognizes the speaker's intentions expressed in natural language utterances, is an important step in various dialogue applications. Previous approaches, such as rule-based and statistics-based methods, use the speech acts of previous utterances and the sentence features of the current utterance. This paper proposes a method that determines the speech act of the current utterance using the speech acts of the following utterances as well as those of the previous ones. Using the features of following utterances yields an accuracy of 95.27%, an improvement of 3.65% over previous methods. Moreover, sentence features of the previous utterances are employed to make maximal use of the information available to the current utterance. By applying the proper probability model for each speech act, a final accuracy of 97.97% is achieved.

Convergence of Relationship between Obesity and Periodontal Disease in Adults (성인의 비만과 치주질환과의 융합적 관계)

  • Lee, Yu-Hee;Choi, Jung-OK
    • Journal of the Korea Convergence Society / v.8 no.11 / pp.215-222 / 2017
  • The purpose of this study was to investigate the relationship between oral health behaviors and periodontal disease in obese adults. From the original data of the second phase of the 6th National Health and Nutrition Survey, a final sample of 4,381 adults was extracted. We used the SPSS statistical program to run frequency analysis, descriptive statistics, chi-square tests, and multiple logistic regression analysis to confirm the association between body mass index, toothbrushing frequency, drinking, smoking, and oral health status and behavior. As a result, the prevalence of periodontal disease decreased as toothbrushing frequency increased, and increased as body mass index increased. This study suggests that, because obesity is a global health issue, more attention should be paid to the oral care of obese adults and that oral health management programs should be developed.

Testing Independence in Contingency Tables with Clustered Data (집락자료의 분할표에서 독립성검정)

  • 정광모;이현영
    • The Korean Journal of Applied Statistics / v.17 no.2 / pp.337-346 / 2004
  • The Pearson chi-square goodness-of-fit test and the likelihood ratio test are usually used for testing independence in two-way contingency tables under random sampling. However, both tests may give misleading results for a contingency table with clustered observations. In this case we consider the generalized linear mixed model, which includes random effects of clustering in addition to the fixed effects of covariates. Both the heterogeneity between clusters and the dependency within a cluster can be explained via a generalized linear mixed model. In this paper we introduce several types of generalized linear mixed models for testing independence in contingency tables with clustered observations. We also discuss the fitting of these models using a real dataset.
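
For reference, the two classical statistics named in the abstract above can be computed as in the following minimal Python sketch. It covers only the standard random-sampling case, which is exactly the setting the abstract says breaks down for clustered observations, and it does not implement the generalized linear mixed models the paper introduces. The function name and the example table are hypothetical, and the sketch assumes no row or column of the table is entirely zero.

```python
import math


def independence_statistics(table):
    """Pearson X^2, likelihood ratio G^2, and degrees of freedom for a two-way table.

    table is a list of rows of observed counts. Expected counts under independence
    are row_total * column_total / n. Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:
                # Cells with zero observed count contribute nothing to G^2.
                g2 += 2.0 * observed * math.log(observed / expected)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return pearson, g2, dof


# Hypothetical 2x2 table of observed counts.
x2, g2, dof = independence_statistics([[10, 20], [20, 10]])
```

Both statistics are compared against a chi-square distribution with `dof` degrees of freedom, a reference distribution that relies on the observations being independent.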