• Title/Summary/Keyword: categorical (범주형)


A Customer Classifier for EC Mall (전자상거래에 적용 가능한 고객분류기)

  • 김선철;이준욱;이용준;류근호
    • Proceedings of the Korean Information Science Society Conference / 1999.10a / pp.138-140 / 1999
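  • Classification techniques analyze historical data to make predictions about new data, and decision tree algorithms are widely used for this purpose. In e-commerce, such techniques can therefore support database marketing by analyzing the customer data stored in a database to discover implicit rules of customer behavior and to predict that behavior. Existing classification algorithms, however, have many problems handling the continuous customer data that is common in e-commerce. To address this, we implemented an algorithm that converts continuous data into categorical data. This paper presents a customer classifier for e-commerce that combines the ID3 algorithm with a one-dimensional clustering algorithm.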

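The abstract above describes converting continuous attributes into categories with one-dimensional clustering before applying ID3. A minimal sketch of that idea follows. It assumes a plain 1-D k-means as the clustering step and uses information gain as the ID3 split criterion; the paper's actual clustering algorithm is not specified in the abstract, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values into k groups and return the sorted cluster centers."""
    ordered = sorted(values)
    # Assumption: spread the initial centers evenly over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a cluster receives no values.
        new_centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

def discretize(value, centers):
    """Map a continuous value to the index of its nearest cluster center."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(categories, labels):
    """Information gain of splitting `labels` by a categorical attribute (ID3 criterion)."""
    total = len(labels)
    remainder = 0.0
    for cat in set(categories):
        subset = [lab for c, lab in zip(categories, labels) if c == cat]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: monthly spend per customer and a customer class label.
spend = [12, 15, 14, 80, 85, 90, 300, 310, 295]
segment = ["low", "low", "low", "mid", "mid", "mid", "high", "high", "high"]

centers = kmeans_1d(spend, k=3)
spend_category = [discretize(v, centers) for v in spend]
gain = information_gain(spend_category, segment)
```

Once every continuous attribute has been replaced by its cluster index, ID3 can treat it like any other categorical attribute and pick the one with the largest gain at each node.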

Models for Polytomous Data with a Nested Structure (지분구조의 다가자료에 관한 모형)

  • 최재성
    • Communications for Statistical Applications and Methods / v.4 no.2 / pp.377-384 / 1997
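  • For categorical data with a nested structure whose responses are nominal polytomous, this paper presents data-analysis models that take into account the types of nested variables definable at each level of the nested structure and the variables that affect the probabilities of interest for those nested variables.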


Empirical Bayesian Misclassification Analysis on Categorical Data (범주형 자료에서 경험적 베이지안 오분류 분석)

  • 임한승;홍종선;서문섭
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.39-57 / 2001
  • Categorical data sometimes contain misclassification errors. If such data are analyzed, the estimated cell probabilities may be biased and the standard Pearson $X^2$ tests may have inflated true type I error rates. On the other hand, if we treat well-classified data as misclassified, we may spend considerable cost and time adjusting for misclassification. Asking whether categorical data are misclassified is therefore a necessary and important step before analysis. In this paper, we consider the case where the data are misclassified in one of the two variables of a two-dimensional contingency table and the marginal sums of the well-classified variable are fixed. We explore partitioning the marginal sums into the cells using the Bound and Collapse concepts of Sebastiani and Ramoni (1997). The double sampling scheme (Tenenbein 1970) is used to obtain information about misclassification. We propose test statistics for the misclassification problem and examine their behavior through simulation studies.

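For readers unfamiliar with the Pearson $X^2$ statistic mentioned in the abstract above, the following sketch computes it for a two-way contingency table. It shows only the standard statistic, not the misclassification tests the paper proposes; the function name and the example counts are hypothetical, and all row and column totals are assumed to be positive.

```python
def pearson_chi_square(table):
    """Pearson X^2 statistic for a two-way table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 table of observed counts.
x2 = pearson_chi_square([[10, 20], [20, 10]])
```

If some observations are recorded in the wrong cell, the observed counts move between cells and the statistic changes, which is why misclassification can distort the test.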

Measurement of Association of Categorical Data Using The Overlapped Mosaic Plot : Dynamic Graphics Approach for $2{\times}2$ Contingency Table ($2{\times}2$ 분할표에서 동적 그래픽스로 구현된 겹쳐진 모자익 그림을 이용한 범주형 자료의 연관성 측정)

  • Yoon, Yeo-Chang;Oh, Min-Gweon
    • Journal of the Korean Data and Information Science Society / v.10 no.2 / pp.457-464 / 1999
  • In this paper, we propose an overlapped mosaic plot. The mosaic plot, proposed by Hartigan and Kleiner (1981), represents the counts in a $2{\times}2$ contingency table directly by tiles whose areas are proportional to the cell frequencies. The overlapped mosaic plot provides measures of association and includes dynamic graphics for mosaic plots. Dynamic graphics for mosaic plots give useful information when one computes measures of association and selects a model, and current statistical software does not provide this feature. The overlapped mosaic plot shows the deviations between the observed counts and their estimates under independence. These dynamic graphics give useful information about how far the data are from independence.

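The tile geometry behind a mosaic plot can be sketched as follows. The code splits the unit square by the row proportions and then splits each strip by the conditional column proportions, so that each tile's area equals its cell's share of the total. Overlaying the tiles of the observed table on those of the independence fit is an assumption about how an "overlapped" plot might be drawn, not the authors' implementation; function names and counts are hypothetical, and row totals are assumed to be positive.

```python
def mosaic_tiles(table):
    """Return (x, y, width, height) for each cell of a two-way table, row by row."""
    n = sum(sum(row) for row in table)
    tiles = []
    x = 0.0
    for row in table:
        row_total = sum(row)
        width = row_total / n           # strip width = row proportion
        y = 0.0
        for count in row:
            height = count / row_total  # tile height = conditional proportion
            tiles.append((x, y, width, height))
            y += height
        x += width
    return tiles

def independence_table(table):
    """Expected counts when the row and column variables are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table of observed counts.
observed = [[30, 10], [20, 40]]
observed_tiles = mosaic_tiles(observed)
expected_tiles = mosaic_tiles(independence_table(observed))
```

Under independence every strip is split at the same heights, so the gap between an observed tile boundary and the corresponding expected boundary is a visual measure of departure from independence.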

Assessment of predictability of categorical probabilistic long-term forecasts and its quantification for efficient water resources management (효율적인 수자원관리를 위한 범주형 확률장기예보의 예측력 평가 및 정량화)

  • Son, Chanyoung;Jeong, Yerim;Han, Soohee;Cho, Younghyun
    • Journal of Korea Water Resources Association / v.50 no.8 / pp.563-577 / 2017
  • As the uncertainty of precipitation increases due to climate change, seasonal forecasting and the use of weather forecasts become essential for efficient water resources management. In this study, the categorical probabilistic long-term forecasts issued by the KMA (Korea Meteorological Administration) since June 2014 were evaluated using the Hit Rate, the Reliability Diagram, and the Relative Operating Characteristic (ROC) curve, and a technique for obtaining quantitative precipitation estimates from probabilistic forecasts was proposed. The probabilistic long-term forecasts showed a maximum predictability of 48%, and the quantified precipitation estimates closely matched actual observations; the maximum correlation coefficients (R) in the predictability evaluation were 0.98 for 100% accurate forecasts and 0.71 for actual forecasts. The precipitation quantification approach proposed in this study is expected to enable water management that accounts for the uncertainty of precipitation. The method is also expected to be a useful tool for supporting decision-making in long-term planning for water resources management and reservoir operations.
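As an illustration of one of the indicators named above, the sketch below computes a hit rate for categorical probabilistic forecasts. It assumes that a forecast counts as a hit when the category with the highest forecast probability matches the observed category; the paper's exact definition is not given in the abstract, and the category names and values are hypothetical.

```python
def hit_rate(forecasts, observations):
    """Fraction of forecasts whose most probable category matches the observation."""
    hits = 0
    for probabilities, observed in zip(forecasts, observations):
        # Category with the largest forecast probability.
        predicted = max(probabilities, key=probabilities.get)
        if predicted == observed:
            hits += 1
    return hits / len(observations)

# Hypothetical three-category forecasts (probabilities sum to 1) and observations.
forecasts = [
    {"below": 0.2, "normal": 0.3, "above": 0.5},
    {"below": 0.5, "normal": 0.3, "above": 0.2},
    {"below": 0.3, "normal": 0.4, "above": 0.3},
]
observations = ["above", "normal", "normal"]
rate = hit_rate(forecasts, observations)
```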

The Nature of Variables Represented in the Titles of 7th Graders' Inquiry Report (중학교 1학년 학생들의 자유 탐구보고서에 나타난 변인의 유형)

  • Kim, Jae-Woo;Oh, Won-Kun;Pak, Sung-Jae
    • Journal of The Korean Association For Science Education / v.18 no.3 / pp.297-301 / 1998
  • To investigate 7th graders' ideas on inquiry, the researchers analysed the titles of inquiry reports submitted as summer vacation homework. The subjects were 141 thirteen-year-old boys and girls in four classes at a school in Seoul. After analysing the titles of the students' reports, the researchers classified them into 9 types according to the clarity and the nature of the variables in the titles. It was found that few students stated the variables in the report title and that most of the variables used were categorical.


Steal Success Model for 2007 Korean Professional Baseball Games (2007년 한국프로야구에서 도루성공모형)

  • Hong, Chong-Sun;Choi, Jeong-Min
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.455-468 / 2008
  • Extensive baseball game records show that the steal plays an important role in the outcome of games. To study the success or failure of the steal, logistic regression models are developed based on the 2007 Korean professional baseball games. The results of the logistic regression models are compared with those of discriminant models, and logistic regression analysis is found to be more efficient than discriminant analysis. We also consider an alternative logistic regression model based on categorical data transformed from continuous data that are difficult to obtain.

TeGCN:Transformer-embedded Graph Neural Network for Thin-filer default prediction (TeGCN:씬파일러 신용평가를 위한 트랜스포머 임베딩 기반 그래프 신경망 구조 개발)

  • Seongsu Kim;Junho Bae;Juhyeon Lee;Heejoo Jung;Hee-Woong Kim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.419-437 / 2023
  • As the number of thin filers in Korea surpasses 12 million, there is growing interest in assessing their credit default risk more accurately in order to generate additional revenue. In particular, researchers are actively developing default prediction models based on machine learning and deep learning algorithms, in contrast to traditional statistical default prediction methods, which struggle to capture nonlinearity. Among these efforts, the Graph Neural Network (GNN) architecture is noteworthy for predicting default when data on thin filers are limited, because it can incorporate network information between borrowers alongside conventional credit-related data. However, prior research employing graph neural networks has been limited in its ability to handle the diverse categorical variables present in credit information. In this study, we introduce the Transformer-embedded Graph Convolutional Network (TeGCN), which aims to address these limitations and enable effective default prediction for thin filers. TeGCN combines the TabTransformer, which can extract contextual information from categorical variables, with the Graph Convolutional Network, which captures network information between borrowers. Our TeGCN model outperforms the baseline model on both the general borrower dataset and the thin filer dataset, and it achieves particularly strong results in thin filer default prediction. This study achieves high default prediction accuracy through a model structure tailored to the characteristics of credit information, which contains numerous categorical variables, especially in the context of thin filers with limited data. Our study can help resolve the financial exclusion faced by thin filers and generate additional revenue within the financial industry.
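To make the graph-convolution component concrete, here is a minimal sketch of a single Graph Convolutional Network layer using the widely used propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It is not the TeGCN architecture: in the paper the node features would come from TabTransformer embeddings of categorical variables, whereas here `features` is a hypothetical placeholder matrix, and the borrower graph and weights are toy values.

```python
import math

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency matrix with self-loops added."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in with_loops]
    return [[with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, features, weights):
    """One GCN layer: aggregate neighbor features, apply weights, then ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, value) for value in row] for row in matmul(propagated, weights)]

# Hypothetical graph of two connected borrowers with 2-dimensional node features.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for learned embeddings
weights = [[1.0, -1.0], [0.0, 0.0]]   # stand-in for trained layer weights
hidden = gcn_layer(adjacency, features, weights)
```

Each borrower's output row mixes its own features with those of its neighbors, which is how network information between borrowers enters the prediction.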

A Comparative Study on the Environmental Impacts by Concrete Strength Using End-point LCA methodology (피해산정형 전과정평가 기법을 적용한 콘크리트 압축강도별 환경영향 비교 분석 연구)

  • Kim, Sung-Hee;Tae, Sung-Ho;Chae, Chang-U
    • Journal of the Korea Concrete Institute / v.26 no.4 / pp.465-474 / 2014
  • This comparative study shows the overall environmental impacts of concrete structures when concretes of different compressive strengths are applied to structural systems that have the same reference flow but different durability. Three cases, 24 MPa, 40 MPa and 60 MPa, are analyzed and characterized using the end-point LCA methodology, covering the stages of production, construction, maintenance and disposal. The results indicate that global warming, non-renewable energy and respiratory inorganics are the major issues in assessing the environmental impacts of concrete products.

Scorecards Using Splines (스플라인을 이용한 스코어 카드)

  • Choe, Min-Seong;Gu, Ja-Yong;Choe, Dae-U
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.285-288 / 2003
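  • The scorecard is an essential methodology in credit risk management, and logistic regression is one of the most widely used methods for building one. This paper introduces a spline methodology based on logistic regression. Because the final scorecard converts continuous variables into categorical ones, piecewise linear splines were adopted. The performance of the proposed method was examined through simulation.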

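A piecewise linear spline inside a logistic regression can be sketched with hinge basis functions of the form max(0, x - knot). The example below only evaluates such a model for given coefficients; it does not fit the model or reproduce the paper's scorecard construction, and the knots, coefficients, and variable are hypothetical.

```python
import math

def linear_spline_basis(x, knots):
    """Piecewise linear basis: x itself plus one hinge term max(0, x - knot) per knot."""
    return [x] + [max(0.0, x - knot) for knot in knots]

def predict_probability(x, intercept, coefficients, knots):
    """Logistic regression prediction with a piecewise linear spline in x."""
    basis = linear_spline_basis(x, knots)
    logit = intercept + sum(c * b for c, b in zip(coefficients, basis))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical example: applicant age with knots at 30 and 50.
knots = [30.0, 50.0]
coefficients = [-0.05, 0.03, 0.04]  # slope changes at each knot (made-up values)
probability = predict_probability(35.0, 1.0, coefficients, knots)
```

Between two adjacent knots the logit is a straight line in x, so the knots play the role of the category boundaries that a scorecard would otherwise impose by binning the variable.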