• Title/Summary/Keyword: Type I Error (제1종 오류)

Search results: 47

Statistical methods for testing tumor heterogeneity (종양 이질성을 검정을 위한 통계적 방법론 연구)

  • Lee, Dong Neuck; Lim, Changwon
    • The Korean Journal of Applied Statistics, v.32 no.3, pp.331-348, 2019
  • Understanding tumor heterogeneity arising from differences in the growth patterns and rates of change of metastatic tumors is important for understanding the sensitivity of tumor cells to drugs and for finding appropriate therapies. When samples fall into known groups, differences in population means can be tested with the t-test or ANOVA. However, these methods cannot be used when, as with the data considered in this paper, no group labels are available. Statistical methods have been studied for testing heterogeneity between such samples; the minimum combination t-test is one of them. In this paper, we propose a maximum combination t-test that considers the combinations bisecting the data at various ratios. We also propose a method based on the idea that testing the heterogeneity of a sample is equivalent to testing, via the gap statistic in a cluster analysis, whether the optimal number of clusters is one. A simulation study verifies that the proposed methods, the maximum combination t-test and the gap statistic, have better type I error and power than the previously proposed method, and we illustrate them through a real data analysis.
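
The abstract does not spell out the procedure, so the following is only a minimal sketch of the maximum combination t-test idea, assuming it means taking the largest two-sample t-statistic over all ways of bisecting an unlabeled sample and calibrating that maximum against resampled homogeneous data (all function names are illustrative, not from the paper):

```python
import itertools
import numpy as np
from scipy import stats

def max_combination_t(x, min_size=2):
    """Largest |t| over all two-group bisections of an unlabeled sample.

    Brute force over subsets, so only feasible for small n."""
    x = np.asarray(x, dtype=float)
    n, best = len(x), 0.0
    for k in range(min_size, n - min_size + 1):
        for subset in itertools.combinations(range(n), k):
            mask = np.zeros(n, dtype=bool)
            mask[list(subset)] = True
            t, _ = stats.ttest_ind(x[mask], x[~mask])
            best = max(best, abs(t))
    return best

def heterogeneity_pvalue(x, n_sim=200, seed=0):
    """Calibrate the maximum statistic against homogeneous (one-cluster) data.

    The split is chosen to maximize t, so the plain t reference
    distribution would be far too liberal; resampling under the
    homogeneous null restores a usable type I error rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    obs = max_combination_t(x)
    null = [max_combination_t(rng.normal(x.mean(), x.std(ddof=1), len(x)))
            for _ in range(n_sim)]
    return float(np.mean([v >= obs for v in null]))
```

Controlling the type I error of such a maximized statistic is exactly the property the paper evaluates by simulation.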

Improved Preservation Methods for Big and Old Trees in South Korea (우리 나라의 노거수자원(老巨樹資源) 보호관리실태(保護管理室態) 및 개선방안(改善方案))

  • Park, Chong-Min; Seo, Byun-Soo; Lee, Cheong-Taek
    • Journal of Korean Society of Forest Science, v.89 no.3, pp.440-451, 2000
  • This study was conducted to provide essential data and management proposals for conserving and maintaining big and old trees in a rational way. For the field survey, 77 big and old trees under legal protection in Chollabuk-do, Korea were investigated. The results are summarized as follows: 1. To conserve and manage big and old trees, valuable trees have been designated as natural monument trees or protection-needed trees; 141 individuals of 37 species are designated as natural monuments and 10,049 individuals of 102 species as protection-needed trees. 2. About 70% of the management budget for natural monument trees came from the national budget, whereas about 98% of the budget for protection-needed trees came from local budgets. 3. Standardized sign boards and sign stones for natural monument trees were well placed, and other protective facilities such as fences, branch supports and branch holdings were established; by contrast, management of protection-needed trees was deficient overall. 4. Problems in the designation and management of protection-needed trees included insufficient management budgets, various development activities, land ownership issues, misjudgment of tree age and species identification, unsatisfactory sign board placement, insufficient surgery for damaged trees, pavement over tree root systems, and environmental pollution around the trees. 5. To improve the existing management of big and old trees, the following schemes were suggested: development of practical designation criteria for natural monument and protection-needed trees, nationwide surveys of big and old tree resources, securing a national budget, securing sufficient space for tree growth, specialization of management systems, extended tree-form management practices, establishment of permanent standard signs, and consideration of the opinions of village residents.


Prolyl Endopeptidase-inhibiting Isoflavonoids from Puerariae Flos and Some Revision of their ¹³C-NMR Assignment (갈화의 Prolyl Endopeptidase 저해 활성 Isoflavonoid 및 이들의 ¹³C-NMR Assignment)

  • Kim, Kyung-Bum; Kim, Sang-In; Kim, Jong-Sik; Song, Kyung-Sik
    • Applied Biological Chemistry, v.42 no.4, pp.351-355, 1999
  • In order to find anti-dementia drugs from natural products, prolyl endopeptidase (PEP) inhibitors were purified from Puerariae Flos by consecutive solvent partitioning followed by silica gel, Sephadex LH-20, and HPLC chromatography. Four isoflavonoid inhibitors were isolated and identified as tectorigenin, genistein, 5,7-dihydroxy-4',6-dimethoxyisoflavone, and 5-hydroxy-6,7,4'-trimethoxyisoflavone by instrumental analyses including ¹H-, ¹³C-, and 2D-NMR and MS. Their IC₅₀ values against PEP were 5.30 ppm (17.7 µM), 10.39 ppm (38.5 µM), 13.92 ppm (44.3 µM), and 20.61 ppm (62.8 µM), respectively. Some previous mistakes in the ¹³C-NMR assignments were corrected through careful investigation of HMBC and HMQC data.
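
As a side note, the paired ppm and micromolar IC₅₀ values above imply each compound's molar mass (assuming ppm here means mg/L, so M = 1000 × (mg/L) / (µmol/L)); a quick check reproduces the expected isoflavonoid masses:

```python
# Back-calculate molar mass (g/mol) from each reported IC50 pair:
# M = 1000 * concentration(mg/L) / concentration(umol/L)
ic50_pairs = {                       # values as reported in the abstract
    "tectorigenin": (5.30, 17.7),
    "genistein": (10.39, 38.5),
    "5,7-dihydroxy-4',6-dimethoxyisoflavone": (13.92, 44.3),
    "5-hydroxy-6,7,4'-trimethoxyisoflavone": (20.61, 62.8),
}
for name, (mg_per_l, umol_per_l) in ic50_pairs.items():
    print(f"{name}: ~{1000 * mg_per_l / umol_per_l:.0f} g/mol")
# tectorigenin comes out near 300 g/mol and genistein near 270 g/mol,
# consistent with the identified structures.
```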


Analysis of Association between Mood of Music and Folksonomy Tag (음악의 분위기와 폭소노미 태그의 관계 분석)

  • Moon, Chang Bae; Kim, HyunSoo; Jang, Young-Wan; Kim, Byeong Man
    • Science of Emotion and Sensibility, v.16 no.1, pp.53-64, 2013
  • Folksonomies suffer from problems caused by synonyms, tagging level, neologisms, and so forth when music is retrieved by tags. These problems can be tackled by introducing the mood intensity of music, its Arousal (A) and Valence (V) values, as an internal tag. That is, if the moods of music pieces and their mood tags are both represented internally by numeric AV values, and retrieval is performed on these values, then music pieces whose mood is similar to the mood tag of a query can be retrieved by the similarity of their AV values even when their tags do not exactly match the query. As a prerequisite study, this paper proposes a mapping table defining the relation between AV values and folksonomy tags. To analyze the association between AV values and tags, ANOVA tests are performed on test data collected from the well-known music retrieval site last.fm. The results show that the p-values for both A and V values are 0.0, meaning the null hypotheses can be rejected and the alternative hypotheses adopted. Consequently, the distribution of AV values is verified to depend on folksonomy tags.
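
A minimal sketch of the test described above, assuming tag-labeled tracks with numeric arousal/valence scores in a CSV file (the file and column names are hypothetical, not from the paper):

```python
import pandas as pd
from scipy import stats

# One row per tagged track: columns "tag", "arousal", "valence" (hypothetical schema).
df = pd.read_csv("lastfm_av_tags.csv")

for dim in ("arousal", "valence"):
    groups = [g[dim].to_numpy() for _, g in df.groupby("tag")]
    f_stat, p_value = stats.f_oneway(*groups)
    print(f"{dim}: F = {f_stat:.2f}, p = {p_value:.4g}")
# A p-value near zero rejects the null that all tag groups share the same
# mean, i.e. the AV distribution depends on the folksonomy tag.
```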


Confidence Bounds following Adaptive Group Sequential Tests with Repeated Measures in Clinical Trials (반복측정자료를 가지는 적응적 집단축차검정에서의 신뢰구간 추정)

  • Joa, Sook Jung; Lee, Jae Won
    • The Korean Journal of Applied Statistics, v.26 no.4, pp.581-594, 2013
  • A group sequential design can end a clinical trial early if confirmed efficacy or futility of the study medication is found during the trial. Adaptation can adjust the design of the trial based on accumulated data. The key to this methodology is to control the overall type I error rate while maintaining the integrity of the trial. Estimation becomes more complex, and sample size calculation more difficult, when the trial collects repeated measurement data. Lee et al. (2002) handled the repeated observation case by using the independent increments property of the interim test statistics and investigated the properties of the proposed confidence interval based on the stage-wise ordering. This study extends Lee et al. (2002) to the adaptive group sequential design. We suggest test statistics for the adaptation, redesigning the second stage of the trial, and derive stage-wise confidence intervals for the parameters of interest. A simulation study confirms the suggested method.
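
For intuition about the error-rate control at stake, here is a minimal simulation, not from the paper, of a classical two-stage group sequential test with a Pocock-style common critical value (c ≈ 2.178 for two looks at an overall two-sided α of 0.05); the independent-increments structure used by Lee et al. (2002) is what makes the cumulative z-statistic below valid:

```python
import numpy as np

C = 2.178                      # Pocock critical value: 2 looks, two-sided alpha = 0.05
n_stage, n_sim = 50, 100_000   # observations per stage, simulated trials
rng = np.random.default_rng(1)

rejections = 0
for _ in range(n_sim):
    x1 = rng.normal(0.0, 1.0, n_stage)            # stage 1 data under H0
    z1 = x1.mean() * np.sqrt(n_stage)
    if abs(z1) >= C:                              # early stop for efficacy
        rejections += 1
        continue
    x2 = rng.normal(0.0, 1.0, n_stage)            # stage 2 data
    z2 = np.concatenate([x1, x2]).mean() * np.sqrt(2 * n_stage)
    if abs(z2) >= C:
        rejections += 1

print(f"empirical overall type I error: {rejections / n_sim:.4f}")  # close to 0.05
```

An adaptive redesign of the second stage must preserve exactly this overall rate, which is why the paper's adjusted test statistics and stage-wise confidence intervals are needed.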

Problem Structuring in IT Policy: Boundary Analysis of IT Policy Problems (경계분석을 통한 정책문제 정의에 관한 연구 - 언론보도에 나타난 IT 정책문제 탐색을 중심으로 -)

  • Park, Chisung; Nam, Ki Bum
    • Korean Policy Studies Review (한국정책학회보), v.21 no.4, pp.199-228, 2012
  • Policy problems are complex due to the diverse participants and their relations in the policy process. Defining the right problem in the first place is important, because a Type III error is likely to occur if rival hypotheses are not removed while defining the problem. This study applies the boundary analysis suggested by Dunn to structure IT policy problems in Korea. The time frame of the study covers the five years of the Lee administration, and data are collected from four newspapers. Using content analysis, the study first identifies a total of 2,614 policy problems from 1,908 stakeholders. After removing duplicate problems, 369 problems from 323 stakeholders are identified as the boundary of the IT policy problem. Among others, failures in government policies are weighted as the most serious problems in the IT policy field. However, many significant problems raised by stakeholders date back more than a decade, and these are intrinsic problems initially caused by market distortions in the IT industry. Therefore, when interpreting the results of problem structuring, we should be cautious not to overemphasize the most conspicuous problem as the only problem in the policy field.
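
Dunn's boundary analysis rests on a saturation argument: as stakeholders are added, the cumulative count of distinct problems should flatten, and the boundary of the problem is reached when new stakeholders stop contributing new problems. A minimal sketch with invented toy data:

```python
def boundary_curve(problems_by_stakeholder):
    """Cumulative count of distinct problems as stakeholders are added."""
    seen, curve = set(), []
    for problems in problems_by_stakeholder.values():
        seen.update(problems)
        curve.append(len(seen))
    return curve

# Toy data: the curve flattens once later stakeholders only repeat
# problems that earlier ones already raised.
sample = {
    "stakeholder_A": ["budget shortfall", "over-regulation"],
    "stakeholder_B": ["over-regulation", "market monopoly"],
    "stakeholder_C": ["budget shortfall", "market monopoly"],
}
print(boundary_curve(sample))   # [2, 3, 3] -> boundary reached at 3 problems
```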

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung; Joo, Jihwan; Han, Ingoo
    • Journal of Intelligence and Information Systems, v.27 no.1, pp.83-102, 2021
  • The government recently announced various policies for developing the big-data and artificial intelligence fields, providing a great opportunity for the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea and is strongly committed to backing export companies with various systems. Nevertheless, there are still few realized business models based on big-data analyses. In this situation, this paper aims to develop a new business model for the ex-ante prediction of the likelihood of credit guarantee insurance accidents. We utilize internal data from KSURE, which supports export companies in Korea, and apply machine learning models, comparing the performance of logistic regression, Random Forest, XGBoost, LightGBM, and DNN (deep neural network). For decades, researchers have tried to find better models for predicting bankruptcy, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The prediction of financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), based on multiple discriminant analysis and still widely used in both research and practice; it uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduced a logit model to complement some limitations of previous models, and Elmer and Borowski (1988) developed and examined a rule-based, automated system for the financial analysis of savings and loans. Since the 1980s, researchers in Korea have also examined the prediction of financial distress or bankruptcy: Kim (1987) analyzed financial ratios and developed a prediction model; Han et al. (1995, 1996, 1997, 2003, 2005, 2006) constructed prediction models using various techniques including artificial neural networks; Yang (1996) introduced multiple discriminant analysis and a logit model; and Kim and Kim (2001) utilized artificial neural network techniques for the ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely with models such as Random Forest or SVM. One major distinction of our research from previous work is that we examine the predicted probability of default for each sample case, not only the classification accuracy of each model over the entire sample. Most predictive models in this paper reach a classification accuracy of about 70% on the entire sample: LightGBM shows the highest accuracy at 71.1% and the logit model the lowest at 69%. However, these results are open to multiple interpretations. In the business context, more emphasis must be placed on minimizing type II error, which causes the more harmful operating losses for the guaranty company. Thus, we also compare classification accuracy after splitting the predicted probability of default into ten equal intervals.
When we examine the classification accuracy for each interval, the logit model has the highest accuracy, 100%, for the 0~10% band of predicted default probability, but a relatively low accuracy of 61.5% for the 90~100% band. In contrast, Random Forest, XGBoost, LightGBM, and DNN show more desirable results: they achieve high accuracy for both the 0~10% and 90~100% bands, with lower accuracy only around the middle of the probability range. Regarding the distribution of samples across the predicted probability bands, both LightGBM and XGBoost place relatively large numbers of samples in the 0~10% and 90~100% bands. Although Random Forest has an advantage in classification accuracy for its small number of extreme cases, LightGBM or XGBoost could be the more desirable models, since they classify a large number of cases into the two extreme bands of predicted default probability, even though their accuracy there is somewhat lower. Considering the importance of type II error together with total prediction accuracy, XGBoost and DNN show superior performance, followed by Random Forest and LightGBM, while logistic regression performs worst. Nonetheless, each predictive model has a comparative advantage under different evaluation standards; for instance, Random Forest shows almost 100% accuracy for samples expected to have a high probability of default. Collectively, one can construct a more comprehensive ensemble that contains multiple machine learning classifiers and conducts majority voting to maximize overall performance.
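
A minimal sketch of the interval-wise evaluation described above, assuming only arrays of true default labels and a fitted model's predicted default probabilities (all names here are illustrative, not from the paper):

```python
import numpy as np

def accuracy_by_band(y_true, p_default, threshold=0.5):
    """Accuracy and sample count within each 10%-wide band of predicted probability."""
    y_true = np.asarray(y_true)
    p_default = np.asarray(p_default)
    y_pred = (p_default >= threshold).astype(int)
    band = np.clip((p_default * 10).astype(int), 0, 9)   # 0: 0~10%, ..., 9: 90~100%
    for b in range(10):
        mask = band == b
        n = int(mask.sum())
        acc = float((y_pred[mask] == y_true[mask]).mean()) if n else float("nan")
        print(f"{10*b:3d}~{10*(b+1):3d}%: n = {n:5d}, accuracy = {acc:.3f}")

# With any scikit-learn-style classifier:
# accuracy_by_band(y_test, model.predict_proba(X_test)[:, 1])
```

Comparing these per-band tables across models is what separates the tree ensembles, which push many cases into the confident extreme bands, from logistic regression in the analysis above.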