• Title/Summary/Keyword: Validation Set

Development and Validation of an Analytical Method for the Insecticide Sulfoxaflor in Agricultural Commodities using HPLC-UVD (HPLC-UVD를 이용한 농산물 중 살충제 sulfoxaflor의 시험법 개발 및 검증)

  • Do, Jung-Ah;Lee, Mi-Young;Park, Hyejin;Kwon, Ji-Eun;Jang, Hyojin;Cho, Yoon-Jae;Kang, Il-Hyun;Lee, Sang-Mok;Chang, Moon-Ik;Oh, Jae-Ho;Hwang, In-Gyun
    • Korean Journal of Food Science and Technology / v.45 no.2 / pp.148-155 / 2013
  • Sulfoxaflor is a new active ingredient in the sulfoximine insecticide class that acts via a unique interaction with the nicotinic receptor. The MRLs (maximum residue limits) of sulfoxaflor are set at 0.4 mg/kg in apple and pear and at 0.5 mg/kg in pepper. The purpose of this study was to develop an analytical method for the determination of sulfoxaflor residues in agricultural commodities using HPLC-UVD and LC-MS. The analysis of sulfoxaflor was performed by reverse-phase HPLC with a UV detector. Acetone and methanol were used for extraction, and an aminopropyl (NH2) cartridge was used for sample clean-up. Recovery experiments were conducted on 7 representative agricultural products to validate the analytical method. Recoveries of the proposed method ranged from 82.8% to 108.2%, and relative standard deviations were less than 10%. Finally, LC-MS with selected ion monitoring was also applied to confirm suspected residues of sulfoxaflor in agricultural commodities.
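The recovery and RSD figures reported in method-validation studies like this one come from straightforward arithmetic on replicate fortified samples. A minimal sketch (the replicate values and the 0.4 mg/kg fortification level are hypothetical, not taken from the paper):

```python
import statistics

def recovery_stats(measured, spiked_level):
    """Mean recovery (%) and relative standard deviation (%)
    for a set of fortified-sample results."""
    recoveries = [100.0 * m / spiked_level for m in measured]
    mean_rec = statistics.mean(recoveries)
    rsd = 100.0 * statistics.stdev(recoveries) / mean_rec
    return mean_rec, rsd

# Hypothetical replicate results (mg/kg) for a 0.4 mg/kg fortification
mean_rec, rsd = recovery_stats([0.38, 0.41, 0.39, 0.40, 0.37], 0.4)
```

A method would typically be judged acceptable when the mean recovery falls in a 70-120% window and the RSD stays below 10%, as it does here.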

Evaluation of the Measurement Uncertainty from the Standard Operating Procedures(SOP) of the National Environmental Specimen Bank (국가환경시료은행 생태계 대표시료의 채취 및 분석 표준운영절차에 대한 단계별 측정불확도 평가 연구)

  • Lee, Jongchun;Lee, Jangho;Park, Jong-Hyouk;Lee, Eugene;Shim, Kyuyoung;Kim, Taekyu;Han, Areum;Kim, Myungjin
    • Journal of Environmental Impact Assessment / v.24 no.6 / pp.607-618 / 2015
  • Five years have passed since the first set of environmental samples was taken in 2011 to represent various ecosystems, so that future generations can trace back to the past environment. Those samples have been preserved cryogenically in the National Environmental Specimen Bank (NESB) at the National Institute of Environmental Research. Even though a strict regulation (SOP, standard operating procedure) governs the whole sampling procedure to ensure that each sample represents its sampling area, the procedure itself had never been tested for validation. This question needs to be answered to clear any doubts about the representativeness and quality of the samples. In order to address it and verify the sampling practice set out in the SOP, the steps leading to a measurement, from sampling in the field to chemical analysis in the lab, were broken down to evaluate the uncertainty at each level. Of the 8 species currently taken for cryogenic preservation in the NESB, pine tree samples from two different sites were selected for this study. Duplicate samples were taken from each site according to the sampling protocol, followed by duplicate analyses of each discrete sample. The uncertainties were evaluated by robust ANOVA; two levels of uncertainty, one from the sampling practice and the other from the analytical process, were then combined to give the measurement uncertainty of a measured concentration of the measurand. As a result, it was confirmed that the sampling practice, not the analytical process, accounts for most of the measurement uncertainty. Based on this top-down approach to measurement uncertainty, the efficient way to ensure the representativeness of the sample was to increase the quantity of each discrete sample making up a composite sample, rather than to increase the number of discrete samples across the site.
Furthermore, a cost-effective gain in confidence in the measurement can be expected from efforts to lower the sampling uncertainty, not the analytical uncertainty. To test the representativeness of a composite sample of a sampling area, the variance across the site should be less than the variance from duplicate sampling. For that, the criterion s²_geochem (across-site variance) < s²_samp (variance at the sampling location) was proposed. In light of this criterion, the two representative samples for the two study areas passed the requirement. In contrast, whenever the variance among the sampling locations (i.e., across the site) is larger than the sampling variance, more sampling increments need to be added within the sampling area until the requirement for representativeness is achieved.
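The duplicate-sample/duplicate-analysis design described above lets the measurement variance be split into a sampling part and an analytical part. The sketch below uses a classical (not robust) ANOVA on paired differences, so it only illustrates the principle; the study itself used robust ANOVA, and the function and the concentration data are hypothetical:

```python
import statistics

def duplicate_design_uncertainty(targets):
    """Split measurement variance into sampling and analytical parts
    from a balanced duplicate design: at each sampling target, two
    field samples, each analysed twice. For duplicate pairs the
    variance estimate is sum(d^2) / (2 n)."""
    d2_anal, d2_samp = [], []
    for sample1, sample2 in targets:
        d2_anal.append((sample1[0] - sample1[1]) ** 2)
        d2_anal.append((sample2[0] - sample2[1]) ** 2)
        m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
        d2_samp.append((m1 - m2) ** 2)
    var_anal = sum(d2_anal) / (2 * len(d2_anal))
    var_between = sum(d2_samp) / (2 * len(d2_samp))
    # a sample mean of two analyses carries var_anal / 2 of analytical noise
    var_samp = max(var_between - var_anal / 2, 0.0)
    return var_samp ** 0.5, var_anal ** 0.5

# Hypothetical concentrations: sampling dominates the uncertainty
s_samp, s_anal = duplicate_design_uncertainty(
    [[[10.0, 10.2], [12.1, 11.9]], [[9.0, 9.1], [11.0, 10.8]]])
```

With numbers like these, s_samp far exceeds s_anal, mirroring the paper's finding that sampling, not analysis, drives the combined uncertainty.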

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. One major drawback, however, is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically, yet leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound on the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only the representative samples near the boundary, called support vectors. A number of experimental studies have shown that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary classification. Second, approximation algorithms (e.g., decomposition methods, the sequential minimal optimization algorithm) can be used to reduce multi-class computation time, but they may deteriorate classification performance. Third, a difficulty in multi-class prediction is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers that in another. Such data sets often produce a default classifier with a skewed boundary and thus reduced classification accuracy. SVM ensemble learning is one machine learning approach to coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques: it constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations incorrectly predicted by previous classifiers are chosen more often than those correctly predicted, so boosting produces new classifiers that better predict the examples on which the current ensemble performs poorly. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multi-class prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process takes account of geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation was performed three times with different random seeds to ensure that the comparison among the three classifiers does not arise by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine. That is, cross-validated folds were tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test was used to examine whether the performance of each classifier over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost differs significantly from the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
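The geometric mean-based accuracy that distinguishes MGM-Boost from plain AdaBoost rewards balanced performance across classes: near-zero recall on any one class collapses the whole score. A minimal sketch of such a metric (the function name and example labels are illustrative, not from the paper):

```python
import math
from collections import defaultdict

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls: one poorly predicted
    (minority) class drags the whole score toward zero."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))

# Perfect on class 'A' but only half right on class 'B'
score = geometric_mean_accuracy(['A', 'A', 'B', 'B'], ['A', 'A', 'B', 'A'])
```

This asymmetry explains why the geometric-mean accuracies reported above (15-28%) sit well below the arithmetic ones (49-53%): rarely predicted rating classes pull the geometric mean down.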

A New Exploratory Research on Franchisor's Provision of Exclusive Territories (가맹본부의 배타적 영업지역보호에 대한 탐색적 연구)

  • Lim, Young-Kyun;Lee, Su-Dong;Kim, Ju-Young
    • Journal of Distribution Research / v.17 no.1 / pp.37-63 / 2012
  • In franchise business, exclusive sales territory (sometimes EST in the tables) protection is a very important issue from an economic, social, and political point of view. It affects the growth and survival of both franchisor and franchisee and often raises social and political conflicts. When the franchisee is not familiar with the related laws and regulations, the franchisor has a good chance of exploiting this. Exclusive sales territory protection by the manufacturer and distributors (wholesalers or retailers) means a sales-area restriction by which only certain distributors have the right to sell products or services. A distributor who has been granted an exclusive sales territory can protect its own territory but may be prohibited from entering other regions. Even though exclusive sales territory is quite a critical problem in franchise business, there is little rigorous research about its reasons, results, evaluation, and future direction based on empirical data. This paper tries to address the problem not only in terms of logical and nomological validity but also through empirical validation. In pursuing an empirical analysis, we take into account the difficulties of real data collection and of statistical analysis techniques. We use a set of disclosure document data collected by the Korea Fair Trade Commission instead of the conventional survey method, which is usually criticized for its measurement error. Existing theories about exclusive sales territory can be summarized into two groups, as shown in the table below. The first concerns the effectiveness of exclusive sales territory from both the franchisor's and the franchisee's point of view. In fact, the outcome of exclusive sales territory can be positive for franchisors but negative for franchisees; it can also be positive in terms of sales but negative in terms of profit. Therefore, variables and viewpoints should be set properly. The second group concerns the motive, or reason, why exclusive sales territory is protected.
The reasons can be classified into four groups: industry characteristics, franchise system characteristics, capability to maintain an exclusive sales territory, and strategic decision. Within these four groups there are more specific variables and theories, as below. Based on these theories, we develop nine hypotheses, which are briefly shown with their results in the last table below. To validate the hypotheses, data were collected from the government (FTC) homepage, which is an open source. The sample consists of 1,896 franchisors and contains about three years of operating data, from 2006 to 2008. Within the sample, 627 franchisors have an exclusive sales territory protection policy, and those with such a policy are not evenly distributed over the 19 representative industries. Additional data were also collected from another government agency homepage, Statistics Korea, and we combined data from various secondary sources to create meaningful variables, as shown in the table below. All variables are dichotomized by mean or median split if they are not inherently dichotomous by definition, since each hypothesis is composed of multiple variables and there is no solid statistical technique that incorporates all these conditions to test the hypotheses. This paper uses a simple chi-square test because the hypotheses and theories are built upon quite specific conditions such as industry type, economic condition, company history, and various strategic purposes. It is almost impossible to find samples satisfying all of them, and they cannot be manipulated in experimental settings. More advanced statistical techniques work well on clean data without exogenous variables, but not on real, complex data. The chi-square test is applied by grouping the sample into four cells along two criteria: whether the franchisor uses exclusive sales territory protection or not, and whether it satisfies the conditions of each hypothesis.
The test then asks whether the proportion of sample franchisors that satisfy the conditions and protect the exclusive sales territory significantly exceeds the proportion that satisfy the conditions and do not protect it. In fact, the chi-square test is equivalent to Poisson regression, which allows more flexible application. As a result, only three hypotheses are accepted. When the attitude toward risk is high, so that the royalty fee is determined by sales performance, EST protection produces poor results, as expected. When the franchisor protects the EST in order to recruit franchisees easily, EST protection produces better results. Also, when EST protection is meant to improve the efficiency of the franchise system as a whole, it shows better performance: high efficiency is achieved as the EST prohibits free riding by franchisees who exploit others' marketing efforts, encourages proper investments, and distributes franchisees evenly across regions. The other hypotheses are not supported by the significance tests. Exclusive sales territory should be protected from proper motives and administered for mutual benefit. Legal restrictions driven by a government agency like the FTC could be misused and cause misunderstandings, so more careful monitoring of real practices and more rigorous studies by both academics and practitioners are needed.
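The 2×2 chi-square test described above has a simple closed form. A minimal sketch with hypothetical cell counts (the study's per-hypothesis counts are not given in the abstract); with 1 degree of freedom, the 5% critical value is 3.84:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]:
    rows = protects EST / does not, columns = meets condition / does not."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical cell counts summing to the paper's 1,896 franchisors
stat = chi_square_2x2(300, 327, 500, 769)
significant = stat > 3.84  # 5% critical value at 1 degree of freedom
```

A counting test like this needs no distributional assumptions about the underlying variables, which is why the paper prefers it over regression on messy disclosure data.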

An Operations Study on a Home Health Nursing Demonstration Program for the Patients Discharged with Chronic Residual Health Care Problems (추후관리가 필요한 만성질환 퇴원환자 가정간호 시범사업 운영 연구)

  • 홍여신;이은옥;이소우;김매자;홍경자;서문자;이영자;박정호;송미순
    • Journal of Korean Academy of Nursing / v.20 no.2 / pp.227-248 / 1990
  • The study was conceived in relation to a concern over the growing gap between the needs of chronic patients and the availability of care from the current health care system in Korea. Patients with agonizing chronic pain, discomfort, despair, and disability are left with helplessly unprepared families, with little help from the acute-care-oriented health care system after discharge from hospital. There is a great need for the development of an alternative means of quality care that is economically feasible and culturally adaptable to our society. Thus, the study was designed to demonstrate the effectiveness of home health care as an alternative to bridge the existing gap between patients' needs and the current practice of health care. The study specifically purports to test the effects of home care on health expenditure, readmission, job retention, compliance with the health care regimen, general condition, complications, and self-care knowledge and practices. The study was guided by the operations research method advocated by the Primary Health Care Operations Research Institute (PRICOR), which comprises 3 stages of research: problem analysis, solution development, and solution validation. The first step in the operations research was field preparation to develop the necessary consensus and cooperation. This was done through the formation of a consulting body at the hospital and a steering committee among the researchers. For the problem analysis stage, the Annual Report of Seoul National University Hospital and the patient records for the last 5 years were reviewed, and selective patient interviews were conducted to find out the magnitude of chronic health problems and areas of unmet health care needs, and finally to decide on the kinds of health problems to study. On the basis of the problem analysis, the solution development stage was devoted to home care program development as a solution alternative.
Assessment tools, teaching guidelines, and care protocols were developed and tested for validity. The final stage was experimentation and evaluation. Patients with liver diseases, hemiplegic conditions, and diabetic conditions were selected as study samples. Discharge evaluation, follow-up home care, measurement, and evaluation were carried out according to the protocols of care and the measurement plan for each patient for a period of 6 months after discharge. The study was carried out from Jan. 1987 to Dec. 1989. The following are the results, presented according to the hypotheses set forth for the study: 1. Total expenditures for the period of study were not reduced for the experimental group; however, since the cost per hospital visit is about 4 times the cost per home visit, the cost-saving effect of home care will become a reality as home care replaces part of the hospital visits. 2. The effect on the rate of readmission and on job retention was statistically nonsignificant, though the number of readmissions was smaller in the experimental group receiving home care. 3. The effect on compliance with the health care regimen was statistically significant at the 5% level for hepatopathic and diabetic patients. 4. Education on diet, rest, exercise, and medication through home care improved liver function test scores, prevention of complications, and self-care knowledge in hepatopathic patients at a statistically significant level. 5. In hemiplegic patients, home care increased grasping power at a significant level; however, there was no significant difference between the experimental and control groups in the level of compliance, prevention of complications, or self-care practices.
6. In diabetic patients, there was no difference between the experimental and control groups in laboratory test scores, appearance of complications, or self-care knowledge or practices. The above findings indicate that a home care program instituted for as short a term as 6 months could not fully demonstrate its effectiveness at a statistically significant level by quantitative analysis. However, what was shown in part in this analysis, and in the continuous consultation sought by those who had been in the experimental group, is that home health care has great potential in retarding or preventing pathological progress, facilitating a rehabilitative and productive life, and improving quality of life by adding comfort, confidence, and strength to patients and their families. For further studies of this kind with chronic patients, it is recommended that a sample of newly diagnosed patients be followed up for a longer period of time with more frequent observations to demonstrate a clearer-cut picture of the effectiveness of home care.

Application of The Semi-Distributed Hydrological Model(TOPMODEL) for Prediction of Discharge at the Deciduous and Coniferous Forest Catchments in Gwangneung, Gyeonggi-do, Republic of Korea (경기도(京畿道) 광릉(光陵)의 활엽수림(闊葉樹林)과 침엽수림(針葉樹林) 유역(流域)의 유출량(流出量) 산정(算定)을 위한 준분포형(準分布型) 수문모형(水文模型)(TOPMODEL)의 적용(適用))

  • Kim, Kyongha;Jeong, Yongho;Park, Jaehyeon
    • Journal of Korean Society of Forest Science / v.90 no.2 / pp.197-209 / 2001
  • TOPMODEL, a semi-distributed hydrological model, is frequently applied to predict the amount of discharge, the main flow pathways, and water quality in a forested catchment, especially in a spatial dimension. TOPMODEL is a conceptual model, not a physical one. Its main concept rests on the topographic index and soil transmissivity, two components that can be used to predict the surface and subsurface contributing areas. This study was conducted to validate the applicability of TOPMODEL to small forested catchments in Korea. The experimental area is located in the Gwangneung forest operated by the Korea Forest Research Institute, Gyeonggi-do, near the Seoul metropolitan area. The two study catchments in this area have been in operation since 1979; one is a natural mature deciduous forest (22.0 ha) about 80 years old and the other is a planted young coniferous forest (13.6 ha) about 22 years old. The data collected during two events in July 1995 and June 2000 at the mature deciduous forest and three events in July 1995, July 1999, and August 2000 at the young coniferous forest were used as the observed data sets, respectively. The topographic index was calculated using a 10 m × 10 m resolution raster digital elevation map (DEM). The distribution of the topographic index ranged from 2.6 to 11.1 at the deciduous and from 2.7 to 16.0 at the coniferous catchment. The optimization, using forecasting efficiency as the objective function, showed that the model parameter m and the catchment mean of the surface saturated transmissivity, ln T0, had high sensitivity. The optimized values of m and ln T0 were 0.034 and 0.038; 8.672 and 9.475 at the deciduous catchment, and 0.031, 0.032, and 0.033; 5.969, 7.129, and 7.575 at the coniferous catchment, respectively.
The forecasting efficiencies resulting from simulation with the optimized parameters were comparatively high: 0.958 and 0.909 at the deciduous and 0.825, 0.922, and 0.961 at the coniferous catchment. The observed and simulated hyeto-hydrographs showed that the lag times to peak coincided well. Though the total runoff and peak flow of some events showed a discrepancy between observed and simulated output, TOPMODEL could overall predict the hydrologic output with an estimation error of less than 10%. Therefore, TOPMODEL is a useful tool for predicting runoff in ungauged forested catchments in Korea.
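The topographic index at the heart of TOPMODEL is ln(a / tan β), where a is the upslope area draining through a cell per unit contour length and β is the local slope. A minimal per-cell sketch (the example cell is hypothetical; a full application would derive upslope areas from the DEM's flow routing):

```python
import math

def topographic_index(upslope_area_m2, slope_deg, cell_width_m=10.0):
    """TOPMODEL topographic index ln(a / tan(beta)), where a is the
    upslope contributing area per unit contour length (m)."""
    a = upslope_area_m2 / cell_width_m  # specific catchment area
    return math.log(a / math.tan(math.radians(slope_deg)))

# A 10 m DEM cell draining 40 upstream cells on a 6-degree slope
ti = topographic_index(40 * 10.0 * 10.0, 6.0)
```

Flat cells with large contributing areas (valley bottoms) get high index values and saturate first, which is how the model identifies the saturated contributing area; the value here falls inside the 2.6-16.0 range the study reports.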

Development and validation of an analytical method for fungicide fenpyrazamine determination in agricultural products by HPLC-UVD (HPLC-UVD를 이용한 살균제 fenpyrazamine의 시험법 개발 및 검증)

  • Park, Hyejin;Do, Jung-Ah;Kwon, Ji-Eun;Lee, Ji-Young;Cho, Yoon-Jae;Kim, Heejung;Oh, Jae-Ho;Rhee, Kyu-Sik;Lee, Sang-Jae;Chang, Moon-Ik
    • Analytical Science and Technology / v.27 no.3 / pp.172-180 / 2014
  • Fenpyrazamine, a pyrazole-class fungicide for controlling gray mold, sclerotinia rot, and Monilinia in grapevines, stone fruit trees, and vegetables, was registered in the Republic of Korea in 2013, and its maximum residue limits are set at 5.0, 2.0, and 2.0 mg/kg for grape, peach, and mandarin, respectively. A reliable and sensitive analytical method for the determination of fenpyrazamine residues is required to ensure the safety of agricultural products. Fenpyrazamine residues in samples were extracted with acetonitrile, partitioned with dichloromethane, purified on a silica SPE cartridge, and eluted with a hexane-acetone mixture. The purified samples were determined by HPLC-UVD, confirmed with LC-MS, and quantified using the external standard method. The linear range of fenpyrazamine was 0.1~5.0 μg/mL, with a correlation coefficient (r) of 0.999. The average recovery ranged from 71.8 to 102.7% at spiked levels of 0.05, 0.5, and 5.0 mg/kg, while the relative standard deviation was between 0.1 and 7.3%. In addition, the limits of detection and quantitation were 0.01 and 0.05 mg/L, respectively. The results show that the developed and validated analytical method is suitable for fenpyrazamine determination in agricultural product samples and can be used as an official analytical method.

EEPERF(Experiential Education PERFormance): An Instrument for Measuring Service Quality in Experiential Education (체험형 교육 서비스 품질 측정 항목에 관한 연구: 창의적 체험활동을 중심으로)

  • Park, Ky-Yoon;Kim, Hyun-Sik
    • Journal of Distribution Science / v.10 no.2 / pp.43-52 / 2012
  • As experiential education services grow, the need for proper management is increasing. Considering that adequate measures are an essential factor for success in managing anything, it is important for managers to use a proper system of metrics to measure the performance of experiential education services. In spite of this need, however, little research has been done to develop a valid and reliable set of metrics for assessing the quality of experiential education services. The current study aims to develop a multi-item instrument for assessing the service quality of experiential education. The specific procedure is as follows. First, we generated a pool of possible metrics based on the diverse literature on service quality. We elicited possible metric items not only from general service quality metrics such as SERVQUAL and SERVPERF but also from educational service quality metrics such as HEdPERF and PESPERF. Second, specialist teachers in the experiential education area screened the initial metrics to boost face validity. Third, we proceeded with multiple rounds of empirical validation of those metrics. Based on this process, we refined the metrics and determined the final set to be used. Fourth, we examined predictive validity by checking the well-established positive relationship between each dimension of the metrics and customer satisfaction. In sum, starting with the initial pool of scale items elicited from the previous literature and purifying them empirically through surveys, we developed a four-dimensional systematized scale to measure the superiority of experiential education and named it "Experiential Education PERFormance" (EEPERF). Our findings indicate that students (consumers) perceive the superiority of the experiential education (EE) service in the following four dimensions: EE-empathy, EE-reliability, EE-outcome, and EE-landscape.
EE-empathy is a judgment in response to the question, "How empathetically does the experiential educational service provider interact with me?" Principal measures are "How well does the service provider understand my needs?" and "How well does the service provider listen to my voice?" Next, EE-reliability is a judgment in response to the question, "How reliably does the experiential educational service provider interact with me?" Major measures are "How reliable is the schedule here?" and "How credible is the service provider?" EE-outcome is a judgment in response to the question, "What results could I get from this experiential educational service encounter?" Representative measures are "How good is the information that I will acquire from this service encounter?" and "How useful is this service encounter in helping me develop creativity?" Finally, EE-landscape is a judgment about the physical environment. Essential measures are "How convenient is the access to the service encounter?" and "How well managed are the facilities?" We showed the reliability and validity of this system of metrics. All four dimensions influence customer satisfaction significantly. Practitioners may use the results in planning experiential educational service programs and evaluating each service encounter. The current study is expected to act as a stepping-stone for future scale improvement, in which case researchers may use the experience quality paradigm that has recently arisen.

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is creating in a wide variety of fields such as medical care, manufacturing, logistics, sales site, SNS, and the dataset characteristics are also diverse. In order to secure the competitiveness of companies, it is necessary to improve decision-making capacity using a classification algorithm. However, most of them do not have sufficient knowledge on what kind of classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate depending on the characteristics of the dataset was has been a task that required expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features reflecting the characteristics of multi-class. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were identified into two factors, (the data structure and the data complexity,) and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration measurement index, in the meta-features to replace IR(Imbalanced Ratio). Also, we developed a new index called Reverse ReLU Silhouette Score into the meta-feature set. Among the UCI Machine Learning Repository data, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality(red), Contraceptive Method Choice) were selected. The class of each dataset was classified by using the classification algorithms (KNN, Logistic Regression, Nave Bayes, Random Forest, and SVM) selected in the study. For each dataset, we applied 10-fold cross validation method. 
10% to 100% oversampling method is applied for each fold and meta-features of the dataset is measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, Hub Score. F1-score was selected as the dependent variable. As a result, the results of this study showed that the six meta-features including Reverse ReLU Silhouette Score and HHI proposed in this study have a significant effect on the classification performance. (1) The meta-features HHI proposed in this study was significant in the classification performance. (2) The number of variables has a significant effect on the classification performance, unlike the number of classes, but it has a positive effect. (3) The number of classes has a negative effect on the performance of classification. (4) Entropy has a significant effect on the performance of classification. (5) The Reverse ReLU Silhouette Score also significantly affects the classification performance at a significant level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by the classification algorithms were also consistent. In the regression analysis by classification algorithm, Naïve Bayes algorithm does not have a significant effect on the number of variables unlike other classification algorithms. This study has two theoretical contributions: (1) two new meta-features (HHI, Reverse ReLU Silhouette score) was proved to be significant. (2) The effects of data characteristics on the performance of classification were investigated using meta-features. The practical contribution points (1) can be utilized in the development of classification algorithm recommendation system according to the characteristics of datasets. 
(2) Because data characteristics differ, many data scientists search for the optimal algorithm for a situation by repeatedly testing and tuning algorithm parameters, a process that incurs excessive waste of hardware, cost, time, and manpower; the results here can reduce that waste. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The study consists of an introduction, related research, the research model, experiments, and a conclusion and discussion.
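Of the meta-features above, the HHI has a standard definition (the sum of squared shares, here class proportions instead of market shares), while the Reverse ReLU Silhouette Score is the authors' own construction and is not reproduced here. A minimal sketch of HHI as a class-imbalance meta-feature, with illustrative data not taken from the paper, might look like:

```python
from collections import Counter

def hhi(labels):
    """Herfindahl-Hirschman Index of a label sequence: the sum of
    squared class proportions. Ranges from 1/k for k perfectly
    balanced classes up to 1.0 when a single class dominates."""
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())

# Illustrative label sets (not from the paper's datasets):
balanced = ["a", "b", "c"] * 10        # 3 equal classes
skewed = ["a"] * 28 + ["b", "c"]       # one dominant class

print(round(hhi(balanced), 4))  # 0.3333
print(round(hhi(skewed), 4))    # 0.8733
```

A higher HHI thus signals stronger class imbalance, which is why it can stand in for an imbalance ratio while still accounting for all classes at once rather than only the largest and smallest.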

Establishment of Reference Range of Proinsulin (Proinsulin 참고치 설정에 관한 연구)

  • Nam, Yee Moon;Shin, Yong Hwan;Kim, Ji Young;Seok, Jae Dong
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.17 no.1
    • /
    • pp.76-79
    • /
    • 2013
  • Purpose: Establishing an appropriate reference range in the laboratory is very important for preventing mistakes such as false positives or false negatives, because the laboratory reference range is the standard against which patient test results are interpreted. Proinsulin is the precursor hormone of insulin, and its importance in diagnosing diabetes and insulinoma is increasing. The proinsulin reagent used in our laboratory is produced in the USA, and the reference range provided by the manufacturer was adopted after a validation test. However, it is generally recommended that every laboratory establish its own reference range, so we decided to re-evaluate the reference range using our patients' test results. Materials and Methods: Among 737 patients who visited the health promotion center of our hospital between December 8 and December 21, 2011, 563 patients were chosen after excluding diabetic patients and patients with abnormal results in fasting glucose, HbA1c, insulin, or C-peptide. The 563 results (275 males and 288 females) were divided into three groups (entire, male, female), and tests for normality were performed with SPSS (version 19.0). Because none of the groups showed a normal distribution, the reference range was set from the 2.5th to the 97.5th percentile using the non-parametric percentile method. Results: Sorted in ascending order, the observed values range from 4.5 to 52.0 pM overall, 5.3 to 51.9 pM for males, and 4.5 to 52.0 pM for females. The reference range calculated with the percentile method is 6.7-26.5 pM for the entire group, 6.8-26.5 pM for males, and 6.7-26.5 pM for females. Conclusion: The reference range provided by the reagent manufacturer is 6.4-9.4 pM, whereas the one established in this study is 6.7-26.5 pM; this difference may be caused by racial differences between Western and Korean populations. 
Accordingly, an ideal reference range can be obtained from the normal population visiting each hospital. Our hospital has been using the newly re-established reference range, under consultation with the department of endocrinology, since August 1, 2012.
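The non-parametric percentile method described above (central 95% of a non-normally distributed sample) is straightforward to compute. A minimal sketch, using hypothetical right-skewed data in place of the actual 563 patient values, might look like:

```python
import numpy as np

# Hypothetical proinsulin values (pM) for a screened reference group.
# A log-normal distribution stands in for the real, non-normal data;
# the parameters here are illustrative, not fitted to the study.
rng = np.random.default_rng(0)
values = rng.lognormal(mean=2.5, sigma=0.35, size=563)

# Non-parametric reference range: the 2.5th to 97.5th percentiles,
# i.e. the central 95% of the observed sample.
low, high = np.percentile(values, [2.5, 97.5])
print(f"reference range: {low:.1f}-{high:.1f} pM")
```

On real data, the same two-line percentile computation would be repeated separately for the entire, male, and female groups, as in the study.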
