• Title/Summary/Keyword: Model Validation


DISEASE DIAGNOSED AND DESCRIBED BY NIRS

  • Tsenkova, Roumiana N.
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference
    • /
    • 2001.06a
    • /
    • pp.1031-1031
    • /
    • 2001
  • The mammary gland is made up of remarkably sensitive tissue, which is capable of producing a large volume of secretion, milk, under normal or healthy conditions. When bacteria enter the gland and establish an infection (mastitis), inflammation is initiated, accompanied by an influx of white cells from the blood stream, altered secretory function, and changes in the volume and composition of the secretion. Cell numbers in milk are closely associated with inflammation and udder health. These somatic cell counts (SCC) are accepted as the international standard measurement of milk quality in dairy and for mastitis diagnosis. NIR spectra of unhomogenized composite milk samples from 14 cows (healthy and mastitic) were measured 7 days after parturition and during the next 30 days of lactation. Different multivariate analysis techniques were used to diagnose the disease at a very early stage and to determine how the spectral properties of milk vary with its composition and animal health. A PLS model for prediction of SCC based on NIR milk spectra was built. The best accuracy of determination for the 1100-2500 nm range was found using smoothed absorbance data and 10 PLS factors. For the independent validation set of samples, the standard error of prediction was 0.382, the correlation coefficient 0.854, and the coefficient of variation 7.63%. SCC determination by NIR milk spectra was found to be indirect and based on the related changes in milk composition. From the spectral changes, we learned that when mastitis occurred, the most significant factors that simultaneously influenced milk spectra were alteration of milk proteins and changes in the ionic concentration of milk. This was consistent with the results we later obtained by applying two-dimensional correlation spectroscopy (2DCOS). Two-dimensional correlation analysis of NIR milk spectra was done to assess the changes in milk composition that occur when SCC levels vary. 
The synchronous correlation map revealed that when SCC increases, protein levels increase while water and lactose levels decrease. Results from the analysis of the asynchronous plot indicated that changes in water and fat absorptions occur before those of other milk components. In addition, the technique was used to assess the changes in milk during a period when SCC levels do not vary appreciably. Results indicated that milk components are in equilibrium and no appreciable change in a given component was seen with respect to another. This was found in both healthy and mastitic animals. However, milk components were found to vary with SCC content regardless of the range considered. This important finding demonstrates that 2-D correlation analysis may be used to track even subtle changes in milk composition in individual cows. To find the right SCC threshold for mastitis diagnosis at cow level, milk samples were classified using soft independent modeling of class analogy (SIMCA) and different spectral data pretreatments. Two SCC levels, 200 000 cells/$m\ell$ and 300 000 cells/$m\ell$, were set up and compared as thresholds to discriminate between healthy and mastitic cows. The best detection accuracy was found with 200 000 cells/$m\ell$ as the threshold for mastitis and smoothed absorbance data: 98% of the milk samples in the calibration set and 87% of the samples in the independent test set were correctly classified. When the spectral information was studied, it was found that successful mastitis diagnosis was based on revealing the spectral changes related to the corresponding changes in milk composition. NIRS combined with different ways of spectral data mining can provide a faster, nondestructive alternative to current methods for mastitis diagnosis and new insight into understanding the disease at the molecular level.
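
The validation statistics quoted in this abstract (standard error of prediction and coefficient of variation) can be illustrated with a minimal Python sketch. The abstract does not define either statistic, so the definitions below are assumptions: SEP is taken as the bias-corrected standard deviation of the prediction errors, and the coefficient of variation as SEP divided by the mean reference value. The function name and the sample values are hypothetical and are not data from the study.

```python
import math


def sep_and_cv(reference, predicted):
    """Return (bias, SEP, CV%) for an independent validation set.

    Assumed definitions (not stated in the abstract):
      bias = mean(predicted - reference)
      SEP  = sqrt(sum((error - bias)^2) / (n - 1))
      CV   = 100 * SEP / mean(reference)
    """
    n = len(reference)
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = sum(errors) / n
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    cv = 100.0 * sep / (sum(reference) / n)
    return bias, sep, cv


# Hypothetical log-scale SCC values, for illustration only.
reference = [4.6, 5.1, 5.4, 4.9, 5.8]
predicted = [4.8, 5.0, 5.1, 5.2, 5.6]
bias, sep, cv = sep_and_cv(reference, predicted)
print(f"bias={bias:.3f}  SEP={sep:.3f}  CV={cv:.2f}%")
```

Under these assumed definitions, a SEP of 0.382 together with a coefficient of variation of 7.63% would correspond to a mean reference value of about 5.0, which suggests (but does not confirm) that SCC was modeled on a logarithmic scale.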


The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.23-45
    • /
    • 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and dataset characteristics are equally diverse. To secure their competitiveness, companies need to improve decision-making capacity using classification algorithms. However, most of them lack sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features reflecting the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, to replace the Imbalance Ratio (IR). We also developed a new index, the Reverse ReLU Silhouette Score, and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality(red), Contraceptive Method Choice) were selected. Each dataset was classified using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross validation. 
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the resulting dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were also consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike the other classification algorithms. This study has two theoretical contributions: (1) two new meta-features (HHI, Reverse ReLU Silhouette Score) were shown to be significant; (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows. (1) The results can be utilized in the development of a classification algorithm recommendation system according to the characteristics of datasets. 
(2) Because data characteristics differ, many data scientists test repeatedly, adjusting algorithm parameters to find the optimal algorithm for the situation. This process wastes hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
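
Two of the meta-features named in this abstract, HHI and Entropy, can be computed directly from class labels. The sketch below assumes the textbook definitions (HHI as the sum of squared class shares expressed as fractions, and Shannon entropy in bits); the abstract does not give the exact formulas or scaling used in the study, and the function names and labels are hypothetical. The Reverse ReLU Silhouette Score is not defined in the abstract and is therefore not sketched.

```python
import math
from collections import Counter


def class_shares(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def hhi(labels):
    """Herfindahl-Hirschman Index of the class distribution.

    Equals 1/k for k perfectly balanced classes and 1.0 when every
    sample belongs to a single class (maximum concentration).
    """
    return sum(share * share for share in class_shares(labels))


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return -sum(share * math.log2(share) for share in class_shares(labels))


# Hypothetical three-class label vector with a 50/30/20 split.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
print(f"HHI={hhi(labels):.3f}  entropy={entropy(labels):.3f} bits")
```

Unlike an imbalance ratio built from only the largest and smallest classes, a concentration index of this kind uses every class share, which may be one reason it suits multi-class data.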

Validation of Food Intake Frequency from Food Frequency Questionnaire for Use as a Covariate in a Model to Estimate Usual Food Intake (식품의 일상섭취량 추정을 위한 식품섭취빈도의 활용가능성 및 타당도 연구)

  • Lee, Ja Yoon;Kim, Dong Woo
    • Culinary science and hospitality research
    • /
    • v.23 no.2
    • /
    • pp.64-73
    • /
    • 2017
  • Although 24-hour recalls (24HR) capture detailed information on a person's food intake, this method has difficulty adequately measuring the usual intake of foods that most people do not consume daily. Therefore, the purpose of this study is to investigate whether the frequency reported on a Food Frequency Questionnaire (FFQ) can be used as a covariate when calculating the usual intake of episodically consumed foods and their distributions. The data used in this study were from the Korea National Health and Nutrition Examination Survey (KNHANES) 2012~2014 (3 years), in which 10,945 subjects completed both the 24HR and the FFQ. For the analysis, the intake amount of each food reported in the 24HR was recalculated according to the 112 items in the FFQ. We first assessed the relationship between FFQ frequency and the amount reported on the 24HR. Second, we assessed the relationship between the usual portion size on the FFQ and the amount reported on the 24HR. Our hypothesis was that people who reported a high frequency or usual portion size for a food on the FFQ would consume larger amounts of that food on the 24HR than those who reported a lower frequency or portion size. For 59 of 112 individual foods (52.2%), there were statistically significant increasing relationships between FFQ frequency and consumption-day intake. Also, for 102 of 112 individual foods (90.3%), there were statistically significant increasing relationships between FFQ usual portion size and consumption-day intake. For 10 of 13 food groups (grains, fruits, eggs, pulses, root and tuber crops, milk products, meat, beverage, alcoholic drink, vegetable, seaweeds and others), there were statistically significant increasing relationships between FFQ frequency and consumption-day intake. And there were statistically significant increasing relationships between FFQ usual portion size and consumption-day intake for all food groups. 
This study confirmed a consistent correlation between reported FFQ frequency or usual portion size of food (group) consumption and consumption-day intake on the 24HR. Therefore, the frequency data may be used as an important covariate when estimating the usual intake of foods or food groups.

Quantification of Protein and Amylose Contents by Near Infrared Reflectance Spectroscopy in Aroma Rice (근적외선 분광분석법을 이용한 향미벼의 아밀로스 및 단백질 정량분석)

  • Kim, Jeong-Soon;Song, Mi-Hee;Choi, Jae-Eul;Lee, Hee-Bong;Ahn, Sang-Nag
    • Korean Journal of Food Science and Technology
    • /
    • v.40 no.6
    • /
    • pp.603-610
    • /
    • 2008
  • The principal objective of the current study was to evaluate the potential of near infrared reflectance spectroscopy (NIRS) as a non-destructive method for predicting the amylose and protein contents of un-hulled and brown rice in broad-based calibration models. The average amylose and protein contents of 75 rice accessions were 20.3% and 7.1%, respectively. Additionally, the ranges of amylose and protein content were 16.6-24.5% and 3.8-9.3%, respectively. In total, 79 rice germplasms representing a wide range of chemical characteristics, variable physical properties, and origins were scanned via NIRS to develop calibration and validation equations. The un-hulled and brown rice samples showed distinctly different patterns in the wavelength range from 1,440 nm to 2,400 nm in the original NIR spectra. The best-performing calibration model was obtained by MPLS (modified partial least squares) using the first derivative method (1:4:4:1) for un-hulled rice and the second derivative method (2:4:4:1) for brown rice. The correlation coefficients $(r^2)$ and standard error of calibration (SEC) of protein and amylose contents for the un-hulled rice were 0.86, 2.48, and 0.84, 1.13, respectively. The $r^2$ and SEC of protein and amylose content for brown rice were 0.95, 1.09 and 0.94, 0.42, respectively. The results of this study suggest that the NIRS technique could be used as a routine procedure for quantifying protein and amylose contents in large collections of un-hulled rice germplasm.

Development of NQ-A, Nutrition Quotient for Korean Adolescents, to assess dietary quality and food behavior (청소년을 위한 영양지수 개발과 타당도 검증)

  • Kim, Hye-Young;Lee, Jung-Sug;Hwang, Ji-Yun;Kwon, Sehyug;Chung, Hae Rang;Kwak, Tong-Kyung;Kang, Myung-Hee;Choi, Young-Sun
    • Journal of Nutrition and Health
    • /
    • v.50 no.2
    • /
    • pp.142-157
    • /
    • 2017
  • Purpose: The purpose of this study was to develop a nutrition quotient for adolescents (NQ-A) to assess overall dietary quality and food behavior of Korean adolescents. Methods: Development of the NQ-A was undertaken in three steps: item generation, item reduction, and validation. Candidate items of the NQ-A checklist were selected based on literature reviews, results of the fifth Korea National Health and Nutrition Examination Survey data, dietary guidelines for Korean adolescents, expert in-depth interviews, and national nutrition policies and recommendations. A total of 213 middle and high school students participated in a one-day dietary record survey and responded to 41 items in the food behavior checklist. Pearson's correlation coefficients between the responses to the checklist items and the nutritional status of the adolescents were calculated. Item reduction was performed, and 24 items were selected for the nation-wide survey. A total of 1,547 adolescents from 17 cities completed the checklist questionnaire. Exploratory and confirmatory factor analyses were performed to develop a final NQ-A model. Results: Nineteen items were finalized as the checklist items for the NQ-A. Checklist items were composed of five factors (balance, diversity, moderation, environment, and practice). The five-factor structure accounted for 47.2% of the total variance. Standardized path coefficients were used as weights of the items. The NQ-A and five-factor scores were calculated based on the obtained weights of the questionnaire items. Conclusion: The Nutrition Quotient for adolescents (NQ-A) would be a useful instrument for evaluating dietary quality and food behavior of Korean adolescents. Further research on the NQ-A is needed to reflect changes in adolescents' food behavior and environment.

Simulation and model validation of Biomass Fast Pyrolysis in a fluidized bed reactor using CFD (전산유체역학(CFD)을 이용한 유동층반응기 내부의 목질계 바이오매스 급속 열분해 모델 비교 및 검증)

  • Ju, Young Min;Euh, Seung Hee;Oh, Kwang cheol;Lee, Kang Yol;Lee, Beom Goo;Kim, Dae Hyun
    • Journal of Energy Engineering
    • /
    • v.24 no.4
    • /
    • pp.200-210
    • /
    • 2015
  • Modeling of the fast pyrolysis of biomass in a fluidized bed reactor has been developed for accurate prediction of bio-oil and gas products and for yield improvement. The purpose of this study is to analyze the CFD (Computational Fluid Dynamics) simulation results and compare them with the experimental data from the reference (Mellin et al., 2014) for gas products generated during fast pyrolysis of biomass in a fluidized bed reactor. CFD (ANSYS FLUENT v.15.0) was used for the simulation. A complex pyrolysis reaction scheme of biomass subcomponents was applied for the simulation of the pyrolysis reaction. This reaction scheme included the reactions of cellulose, hemicellulose, and lignin in detail, and the gas products obtained from pyrolysis were mainly $CO_2$, CO, $CH_4$, $H_2$, and $C_2H_4$. The deviation between the simulation results of this study and the experimental data from the reference was about 3.7%p, 4.6%p, and 3.9%p for $CH_4$, $H_2$, and $C_2H_4$, respectively, whereas it was relatively high for $CO_2$ and CO, at 9.6%p and 6.7%p. Through this study, it is possible to predict gas products accurately by using the CFD simulation approach. Moreover, this modeling approach should be developed further to predict fluidized bed reactor performance and other gas product yields.
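
The deviations in this abstract are given in percentage points (%p), not as relative errors. A minimal Python sketch of the distinction follows, assuming a percentage-point deviation is the absolute difference between two shares that are both expressed in percent. The function names and the gas shares are hypothetical and are not values from the study or from Mellin et al.

```python
def deviation_pp(simulated_pct, reference_pct):
    """Absolute deviation in percentage points between two percent shares."""
    return abs(simulated_pct - reference_pct)


def relative_error_pct(simulated_pct, reference_pct):
    """Relative error in percent of the reference value, for contrast."""
    return 100.0 * abs(simulated_pct - reference_pct) / reference_pct


# Hypothetical gas shares in percent, for illustration only.
simulated = {"CO2": 30.0, "CO": 45.0}
reference = {"CO2": 25.0, "CO": 50.0}

for species in reference:
    pp = deviation_pp(simulated[species], reference[species])
    rel = relative_error_pct(simulated[species], reference[species])
    print(f"{species}: {pp:.1f} %p deviation, {rel:.1f}% relative error")
```

The same percentage-point gap is a larger relative error for a minor species than for a major one, so the two measures should not be read interchangeably.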

An Experimental and Numerical Study on the Survivability of a Long Pipe-Type Buoy Structure in Waves (긴 파이프로 이뤄진 세장형 부이 구조물의 파랑 중 생존성에 관한 모형시험 및 수치해석 연구)

  • Kwon, Yong-Ju;Nam, Bo-Woo;Kim, Nam-Woo;Park, In-Bo;Kim, Sea-Moon
    • Journal of Navigation and Port Research
    • /
    • v.42 no.6
    • /
    • pp.427-436
    • /
    • 2018
  • In this study, experimental and numerical analyses were performed on the survivability of a long pipe-type buoy structure in waves. The buoy structure is an articulated tower consisting of an upper structure, buoyancy module, and gravity anchor, with long pipes forming the base frame. A series of experiments was performed in the ocean engineering basin of KRISO with a 1/22 scale model to evaluate the survivability of the buoy structure in the West Sea of South Korea. The survival condition was taken as the wave with a 50-year return period. Additional experiments were performed to investigate the effects of current and wave period. The factors considered for the evaluation of the buoy's survival were the pitch angle of the structure, the anchor reaction force, and the number of submergences of the upper structure. Numerical simulations were carried out with OrcaFlex, a commercial program for mooring analysis, with the aim of performing mutual validation with the experimental results. Based on the evaluation, the behavior characteristics of the buoy structure were first examined according to the tidal conditions. The changes in pitch angle and anchor reaction force were investigated at HAT and LAT conditions, and the results were directly compared with those obtained from numerical simulation. Second, the response characteristics of the buoy structure were studied depending on the wave period and the presence of current. Third, the number of submergences of the upper structure obtained through video analysis was compared with the simulation results. Finally, the simulation results for structural responses that were not directly measured in the experiment were presented, and the structural safety in survival waves was discussed. Through a series of survivability evaluation studies, the behavior characteristics of the buoy structure were examined in survival waves. 
The vulnerability and utility of the buoy structure were investigated through the sensitivity studies of waves, current, and tides.

Predicting the Pre-Harvest Sprouting Rate in Rice Using Machine Learning (기계학습을 이용한 벼 수발아율 예측)

  • Ban, Ho-Young;Jeong, Jae-Hyeok;Hwang, Woon-Ha;Lee, Hyeon-Seok;Yang, Seo-Yeong;Choi, Myong-Goo;Lee, Chung-Keun;Lee, Ji-U;Lee, Chae Young;Yun, Yeo-Tae;Han, Chae Min;Shin, Seo Ho;Lee, Seong-Tae
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.22 no.4
    • /
    • pp.239-249
    • /
    • 2020
  • Rice flour varieties have been developed to replace wheat, and consumption of rice flour has been encouraged. However, damage related to pre-harvest sprouting has occurred due to weather disasters during the ripening period. Thus, it is necessary to develop a pre-harvest sprouting rate prediction system to minimize this damage. Rice cultivation experiments were conducted from 2017 to 2019 with three rice flour varieties at six regions in Gangwon-do, Chungcheongbuk-do, and Gyeongsangbuk-do. The survey components were the heading date and pre-harvest sprouting at the harvest date. The weather data collected were daily mean temperature, relative humidity, and rainfall from the Automated Synoptic Observing System (ASOS) station with the same region name. A Gradient Boosting Machine (GBM), a machine learning model, was used to predict the pre-harvest sprouting rate, and the training input variables were mean temperature, relative humidity, and total rainfall. Also, an experiment over the period from days after the heading date (DAH) to the subsequent period (DA2H) was conducted to establish the period related to pre-harvest sprouting. The data were divided into a training set and a validation set for calibration of the period related to pre-harvest sprouting, and a test set for validation. The results for the training and validation sets showed the highest score for a period of 22 DAH and 24 DA2H. The result for the test set tended to overpredict the pre-harvest sprouting rate in the range below 3.0%. However, the result showed high prediction performance ($R^2$=0.76). Therefore, it is expected that the pre-harvest sprouting rate can be easily predicted from weather components for a specific period using machine learning.
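
The three input variables named in this abstract (mean temperature, relative humidity, total rainfall) have to be aggregated over some window after heading before they can be fed to a model. The Python sketch below shows one way to do that. It rests on an assumed reading of the abstract: the window starts a given number of days after heading (DAH) and covers a given number of subsequent days (DA2H). Relative humidity is assumed to be averaged. The function name, the data layout, and the weather values are hypothetical; the GBM itself is not sketched.

```python
def window_features(daily, start, length):
    """Aggregate daily weather over a window after the heading date.

    daily  : list of (mean_temp_c, rel_humidity_pct, rainfall_mm) tuples,
             where index 0 is the heading date (assumed layout)
    start  : first day of the window, in days after heading
    length : number of days in the window (assumed reading of DA2H)

    Returns (mean temperature, mean relative humidity, total rainfall).
    """
    window = daily[start:start + length]
    if length <= 0 or len(window) < length:
        raise ValueError("window does not fit inside the daily record")
    mean_temp = sum(day[0] for day in window) / length
    mean_rh = sum(day[1] for day in window) / length
    total_rain = sum(day[2] for day in window)
    return mean_temp, mean_rh, total_rain


# Hypothetical 60-day record: slowly warming, rain every fifth day.
daily = [(24.0 + 0.1 * d, 75.0, 2.0 if d % 5 == 0 else 0.0) for d in range(60)]
print(window_features(daily, start=22, length=24))
```

In a window search, candidate (start, length) pairs would each produce one feature row per field trial, and the pair with the best validation-set score would be kept before a final check on the test set.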

A Study on the Influence of Workers' Aspiration for Academic Needs on Participation in University Education (근로자의 학업욕구 열망이 대학교육 참여에 미치는 영향에 관한 연구)

  • Lee, Ji-Hun;Mun, Bok-Hyun
    • Journal of Korea Entertainment Industry Association
    • /
    • v.15 no.3
    • /
    • pp.231-241
    • /
    • 2021
  • This study examines how workers' academic aspirations affect their participation in university education, in order to offer university officials strategies and implications for attracting new students and providing customized education. Variables were derived from an analysis of prior research, and the causal relationships between variables and the questionnaire were developed accordingly. For the survey, responses were collected through face-to-face interviews from 331 workers interested in participating in university education. The collected data were coded, and reliability and validity testing and frequency analysis were conducted. Finally, we validated the fit of the structural equation model and the causal relationship for each concept. The results of the validation suggest the following implications. First, university officials should motivate workers through a mentor-mentee system with experienced people who have moved to a suitable occupation through university education. It will also be necessary to develop and disseminate programs so that workers can continue to develop themselves for the future. To this end, it will be necessary to help them understand their aptitudes and strengths through consultation with experts. Second, university officials should strengthen public relations so that prospective students can learn, through recommendations, about the cases and information on job changes of admitted workers. It will also be necessary to develop university education programs that support self-development, accept various ideas through "public contests", and provide workers with accurate information about university education through re-processing. Third, university officials should provide workers with a program that lets them achieve two goals at once: job change and self-improvement through university education. 
In other words, it is necessary to stimulate workers' motivation by providing various information and opportunities, such as visits to advanced overseas companies, acquisition of various certificates, movement between blue-collar and white-collar departments, and transfers. Fourth, university officials should actively promote related university education programs, showing that participation in university education offers systematic education that follows changes in the social environment. Finally, university officials will need to counsel and encourage workers so that they can develop themselves when they participate in college education, and will have to identify what workers need for self-development through demand surveys and analysis.

Evaluation of Sensitivity and Retrieval Possibility of Land Surface Temperature in the Mid-infrared Wavelength through Radiative Transfer Simulation (복사전달모의를 통한 중적외 파장역의 민감도 분석 및 지표면온도 산출 가능성 평가)

  • Choi, Youn-Young;Suh, Myoung-Seok;Cha, DongHwan;Seo, DooChun
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1423-1444
    • /
    • 2022
  • In this study, the sensitivity of the mid-infrared radiance to atmospheric and surface factors was analyzed using simulation data from the radiative transfer model MODerate resolution atmospheric TRANsmission (MODTRAN6). The possibility of retrieving the land surface temperature (LST) using only the mid-infrared bands at night was evaluated. Based on the sensitivity results, a nighttime LST retrieval algorithm that reflects various factors was developed, and the accuracy of the algorithm was evaluated using reference LST and observed LST. Sensitivity experiments were conducted on the atmospheric profiles, carbon dioxide, ozone, diurnal variation of LST, land surface emissivity (LSE), and satellite viewing zenith angle (VZA), which mainly affect satellite remote sensing. To evaluate the possibility of using the split-window method, the mid-infrared wavelength range was divided into two bands based on the transmissivity. Regardless of the band, the top of atmosphere (TOA) temperature is most affected by the atmospheric profile, followed in order by LSE, diurnal variation of LST, and satellite VZA. In all experiments, band 1, which corresponds to the atmospheric window, has lower sensitivity, whereas band 2, which includes ozone and water vapor absorption, has higher sensitivity. The evaluation of the LST retrieval algorithm using prescribed LST showed that the correlation coefficient (CC), bias, and root mean squared error (RMSE) were 0.999, 0.023 K, and 0.437 K, respectively. Also, the validation with 26 in-situ observations in 2021 showed that the CC, bias, and RMSE were 0.993, 1.875 K, and 2.079 K, respectively. The results of this study suggest that the LST can be retrieved at night using the different responses of the two mid-infrared bands to atmospheric and surface conditions. Therefore, it is necessary to retrieve the LST using data from satellites equipped with sensors in the mid-infrared bands.
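
The three validation statistics reported in this abstract (CC, bias, RMSE) can be sketched in Python as follows. The sign convention for bias (retrieved minus reference) and the use of the Pearson correlation coefficient are assumptions, since the abstract does not state them. The function name and the temperature values are hypothetical.

```python
import math


def validation_stats(reference, retrieved):
    """Return (CC, bias, RMSE) for paired reference and retrieved values.

    CC   : Pearson correlation coefficient (assumed)
    bias : mean(retrieved - reference) (assumed sign convention)
    RMSE : sqrt(mean((retrieved - reference)^2))
    Inputs must not be constant, otherwise CC is undefined.
    """
    n = len(reference)
    diffs = [x - r for r, x in zip(reference, retrieved)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_ref = sum(reference) / n
    mean_ret = sum(retrieved) / n
    cov = sum((r - mean_ref) * (x - mean_ret) for r, x in zip(reference, retrieved))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_ret = sum((x - mean_ret) ** 2 for x in retrieved)
    cc = cov / math.sqrt(var_ref * var_ret)
    return cc, bias, rmse


# Hypothetical nighttime LST values in kelvin, for illustration only.
reference = [278.2, 281.5, 284.0, 287.3, 290.1]
retrieved = [279.9, 283.6, 285.7, 289.4, 291.8]
cc, bias, rmse = validation_stats(reference, retrieved)
print(f"CC={cc:.3f}  bias={bias:.3f} K  RMSE={rmse:.3f} K")
```

When the bias is close to the RMSE, as in the in-situ validation figures quoted here, most of the error is a systematic offset rather than scatter, which is also why the correlation can stay high.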