• Title/Summary/Keyword: Accuracy of Prediction


Estimation of Ground-level PM10 and PM2.5 Concentrations Using Boosting-based Machine Learning from Satellite and Numerical Weather Prediction Data (부스팅 기반 기계학습기법을 이용한 지상 미세먼지 농도 산출)

  • Park, Seohui;Kim, Miae;Im, Jungho
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.2
    • /
    • pp.321-335
    • /
    • 2021
  • Particulate matter (PM10 and PM2.5, with diameters less than 10 and 2.5 ㎛, respectively) can be absorbed by the human body and adversely affect human health. Although most PM monitoring is based on ground-based observations, these are limited to point-based measurement sites, which leads to uncertainty in PM estimation for regions without observation sites. This spatial limitation can be overcome by using satellite data. In this study, we developed a machine learning-based retrieval algorithm for ground-level PM10 and PM2.5 concentrations using aerosol parameters from the Geostationary Ocean Color Imager (GOCI) satellite and various meteorological parameters from a numerical weather prediction model from January to December 2019. Gradient Boosted Regression Trees (GBRT) and Light Gradient Boosting Machine (LightGBM) were used to estimate PM concentrations. Model performance was examined for two feature sets: all input parameters (Feature set 1) and a subset of input parameters without meteorological and land-cover parameters (Feature set 2). Both models showed higher accuracy (about 10 % higher in R2) with Feature set 1 than with Feature set 2. The GBRT model using Feature set 1 was chosen as the final model for further analysis (PM10: R2 = 0.82, nRMSE = 34.9 %; PM2.5: R2 = 0.75, nRMSE = 35.6 %). The spatial distribution of the seasonal and annual-averaged PM concentrations was similar to in-situ observations, except for the northeastern part of China with bright surface reflectance. Their spatial distribution and seasonal changes matched in-situ measurements well.
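
The two scores quoted in this abstract, R2 and nRMSE, can be illustrated with a minimal sketch. The abstract does not say how nRMSE is normalized or whether R2 is the coefficient of determination or a squared correlation; the code below assumes the coefficient of determination and normalization by the mean observation. The function names and the sample values are hypothetical, not taken from the paper.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(observed, predicted):
    """RMSE divided by the mean observation, in percent (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Made-up PM10 concentrations, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
print(round(r_squared(obs, pred), 3), round(nrmse_percent(obs, pred), 1))
```

If the paper normalizes by the observed range instead of the mean, only the denominator in `nrmse_percent` would change.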

Prediction of Correct Answer Rate and Identification of Significant Factors for CSAT English Test Based on Data Mining Techniques (데이터마이닝 기법을 활용한 대학수학능력시험 영어영역 정답률 예측 및 주요 요인 분석)

  • Park, Hee Jin;Jang, Kyoung Ye;Lee, Youn Ho;Kim, Woo Je;Kang, Pil Sung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.11
    • /
    • pp.509-520
    • /
    • 2015
  • The College Scholastic Ability Test (CSAT) is the primary test for evaluating the academic achievement of high-school students and is used by most universities for admission decisions in South Korea. Because its level of difficulty is a significant issue for both students and universities, the government makes a huge effort to keep the difficulty level consistent every year. However, the actual level of difficulty has fluctuated significantly, which causes many problems with university admission. In this paper, we build two types of data-driven prediction models to predict the correct answer rate and to identify significant factors for the CSAT English test from accumulated CSAT test data, unlike traditional methods that depend on experts' judgments. First, we derive candidate question-specific factors that can influence the correct answer rate, such as position, EBS-relation, and readability, from the annual CSAT practice tests and the CSAT over 10 years. In addition, we derive context-specific factors by employing topic modeling, which identifies the underlying topics in the text. Then, the correct answer rate is predicted by multiple linear regression and the level of difficulty is predicted by a classification tree. The experimental results show that the level-of-difficulty (difficult/easy) classification model achieves 90% accuracy, whereas the error rate for the correct answer rate is below 16%. Points and problem category are found to be critical for predicting the correct answer rate. In addition, the correct answer rate is also influenced by some of the topics discovered by topic modeling. Based on our study, it will be possible to predict the range of the expected correct answer rate at both the question level and the entire-test level, which will help CSAT examiners control the level of difficulty.

Estimation of Chlorophyll-a Concentrations in the Nakdong River Using High-Resolution Satellite Image (고해상도 위성영상을 이용한 낙동강 유역의 클로로필-a 농도 추정)

  • Choe, Eun-Young;Lee, Jae-Woon;Lee, Jae-Kwan
    • Korean Journal of Remote Sensing
    • /
    • v.27 no.5
    • /
    • pp.613-623
    • /
    • 2011
  • This study assessed the feasibility of applying Two-band and Three-band reflectance models for chlorophyll-a estimation in turbid productive waters, whose scale is smaller and narrower than the ocean, using a high spatial resolution image. Such band ratio models have been successfully applied to analyzing chlorophyll-a concentrations of ocean or coastal water using the Moderate Resolution Imaging Spectroradiometer (MODIS), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), Medium Resolution Imaging Spectrometer (MERIS), etc. Two-band and Three-band models based on band ratios such as the Red and NIR bands have generally been used for Chl-a in turbid waters. Two-band models using the Red and NIR bands of the RapidEye image showed no significant results, with $R^2$ of 0.38. To enhance the band ratio between the absorption and reflection peaks, we used the red-edge band (710 nm) of the RapidEye image for the Two-band and Three-band models. The Red-RE Two-band and Red-RE-NIR Three-band reflectance models (with cubic equation) for the RapidEye image provided significant performance, with $R^2$ of 0.66 and 0.73, respectively. Their performance corresponded to 'Approximate Prediction', with RPD of 1.39 and 1.29 and RMSE of 24.8 and 22.4, respectively. Another three-band model with a quadratic equation showed performance similar to the Red-RE two-band model. The findings of this study demonstrate that Two-band and Three-band reflectance models using a red-edge band can approximately estimate chlorophyll-a concentrations in turbid river water using a high-resolution satellite image. In the distribution map of estimated Chl-a concentrations, the three-band model with cubic equation showed lower values than the two-band model. In future work, quantification and correction of spectral interferences caused by suspended sediments and colored dissolved organic matter will improve the accuracy of chlorophyll-a estimation in turbid waters.

Prediction of Maximal Oxygen Uptake Ages 18~34 Years (18~34 남성의 최대산소 섭취량 추정)

  • Jeon, Yoo-Joung;Im, Jae-Hyeng;Lee, Byung-Kun;Kim, Chang-Hwan;Kim, Byeong-Wan
    • 한국체육학회지인문사회과학편
    • /
    • v.51 no.3
    • /
    • pp.373-382
    • /
    • 2012
  • The purpose of this study is to predict VO2max from body indices and submaximal metabolic responses. The subjects were 250 males aged 18 to 34, randomly separated into two groups: 179 for the sample and 71 for the cross-validation group. They underwent maximal exercise testing with the Bruce protocol, and we measured the metabolic responses at the end of the first (3 minute) and second (6 minute) stages. To predict VO2max, we applied multiple regression analysis with the stepwise method to the sample. Model 1's variables are weight, 6 minute HR and 6 minute VO2 (R=0.64, SEE=4.74, CV=11.7%, p<.01), and the equation is VO2max(ml/kg/min) = 72.256 - 0.340(Weight) - 0.220(6minHR) + 0.013(6minVO2). Model 2's variables are weight, 6 minute HR, 6 minute VO2, and 6 minute VCO2 (R=0.67, SEE=4.59, CV=11.3%, p<.01), and the equation is VO2max(ml/kg/min) = 68.699 - 0.277(Weight) - 0.206(6minHR) + 0.020(6minVO2) - 0.009(6minVCO2). Neither model showed multicollinearity. Model 2 showed a higher correlation than Model 1. When we cross-validated the models with the 71 men, measured and estimated VO2max were significantly correlated (R=0.53 and 0.56, p<.01). Although both models are valid and functional considering their simplicity and utility, Model 2 is more accurate.
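
The two regression equations in this abstract can be written out directly, as in the sketch below. The coefficients are copied from the abstract; the function and parameter names are hypothetical, and the input units (weight in kg, heart rate in beats/min, VO2 and VCO2 in ml/min) are assumptions, since the abstract does not state them. The example inputs are made up.

```python
def vo2max_model1(weight, hr_6min, vo2_6min):
    """Model 1 from the abstract: weight, 6-minute HR, 6-minute VO2."""
    return 72.256 - 0.340 * weight - 0.220 * hr_6min + 0.013 * vo2_6min


def vo2max_model2(weight, hr_6min, vo2_6min, vco2_6min):
    """Model 2 from the abstract: adds 6-minute VCO2 as a fourth predictor."""
    return (68.699 - 0.277 * weight - 0.206 * hr_6min
            + 0.020 * vo2_6min - 0.009 * vco2_6min)


# Hypothetical subject: 70 kg, HR 150 beats/min, VO2 2000 ml/min, VCO2 1900 ml/min.
estimate_1 = vo2max_model1(70.0, 150.0, 2000.0)
estimate_2 = vo2max_model2(70.0, 150.0, 2000.0, 1900.0)
print(round(estimate_1, 3), round(estimate_2, 3))
```

Both functions return an estimate in ml/kg/min, the unit given in the abstract; the reported standard errors (SEE 4.74 and 4.59) indicate how far an individual estimate may be from a measured value.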

Comparison of Size Criteria in Mediastinal Lymph Node Involvement of Adenocarcinoma of Lungs (폐 선암의 종격동 림프절 전이에 있어서 림프절 크기 기준의 비교)

  • Gu, Ki-Seon;Kuk, Hiang;Koh, Hyeck-Jae;Yang, Sei-Hun;Jeong, Eun-Taik
    • Tuberculosis and Respiratory Diseases
    • /
    • v.46 no.4
    • /
    • pp.542-547
    • /
    • 1999
  • Background: Determining mediastinal lymph node involvement of lung cancer by CT scan is very important and valuable for treatment planning and prognosis prediction. In general, a long diameter of a mediastinal lymph node of more than 15 mm is used as the criterion for lung cancer involvement. Adenocarcinoma has a tendency toward early distant metastasis and micrometastasis, so adenocarcinoma may involve lymph nodes earlier, and the involvement cannot be detected before the lymph nodes are sufficiently enlarged. The authors tried to determine the difference between two size criteria (15 mm, 10 mm) in adenocarcinoma for the detection of cancer involvement. Methods: The sample consisted of 60 cases (male 46, female 14, median age: 61.5 years). According to pathology, squamous cancer 41, large cell cancer 2, adenocarcinoma 17. According to TNM stage, I 23, III 24, IIIA 13. Results: The mean long diameter of involved lymph nodes was 16.0 ($\pm8.0$) mm in the non-adenocarcinoma group, and that of the adenocarcinoma group was 12.0 ($\pm3.2$) mm (p<0.05). If a long diameter of the lymph node larger than 15 mm is applied as the involvement criterion, the sensitivity, specificity, positive predictive index, negative predictive index, and accuracy of the non-adenocarcinoma group are 54%, 100%, 100%, 83%, 86%, and those of the adenocarcinoma group are 43%, 90%, 75%, 69%, 71%. If a long diameter of the lymph node larger than 10 mm is applied as the involvement criterion, the sensitivity, specificity, positive predictive index, negative predictive index, and accuracy of the non-adenocarcinoma group are 65%, 77%, 61%, 92%, 79%, and those of the adenocarcinoma group are 100%, 80%, 78%, 100%, 88%. Conclusion: A long diameter of the lymph node larger than 10 mm is a more valuable criterion for lymph node involvement in adenocarcinoma of the lungs.
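
The five indices reported in this abstract all derive from a 2x2 table of CT result against pathology, as the sketch below shows. The abstract does not report the underlying counts; the counts used in the example are hypothetical values chosen only because they reproduce the rounded percentages given for the 17 adenocarcinoma cases under the 10 mm criterion. The function name is also an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic indices from true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_index": tp / (tp + fp),
        "negative_predictive_index": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts (not from the paper): 7 true positives, 2 false positives,
# 0 false negatives, 8 true negatives, 17 cases in total.
metrics = diagnostic_metrics(tp=7, fp=2, fn=0, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Lowering the size threshold moves cases from the negative to the positive column, which is why sensitivity rises while specificity falls between the 15 mm and 10 mm criteria.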


Studies on the Changes of Sex Hormone Concentrations in Milk during the Reproductive Stages of Dairy Cows (유우의 번식과정에 따른 유즙중의 성호르몬 수준 변화에 관한 연구)

  • 김상근;이재근
    • Korean Journal of Animal Reproduction
    • /
    • v.9 no.1
    • /
    • pp.9-30
    • /
    • 1985
  • The study was carried out to determine the changes in sex hormone levels in the milk of Holstein cows during reproductive stages such as the estrous cycle, pregnancy and the periparturient period. FSH, LH, estradiol-17$\beta$ and progesterone in the milk samples were assayed by radioimmunoassay. The results of this study are summarized as follows: 1. The levels of progesterone and estradiol-17$\beta$ were similar among quarters, and they were higher after milking than before milking, but without statistical significance. 2. The milk progesterone levels during the estrous cycle reached a peak mean level of 3.55$\pm$0.26ng/$m\ell$ at 15 days after estrus and did not differ with the length of the estrous cycle. The estradiol-17$\beta$ levels during the estrous cycle showed a peak level of 36.40$\pm$2.38pg/$m\ell$ at estrus, and decreased (17.20$\pm$0.46pg/$m\ell$ to 18.65$\pm$1.26pg/$m\ell$) in the luteal phase. 3. The FSH levels during the estrous cycle ranged from 2.25$\pm$0.23mIU/$m\ell$ to 4.35$\pm$0.24mIU/$m\ell$, showing significant changes. The LH levels during the estrous cycle gradually increased and remained at a peak level of 10.90$\pm$0.36mIU/$m\ell$ from 20 to 25 days after estrus. 4. The progesterone levels during pregnancy decreased from 30 to 60 days after artificial insemination, and thereafter continuously increased until 240 days. The estradiol-17$\beta$ levels during pregnancy were 24.56$\pm$1.19pg/$m\ell$ at day 30 after artificial insemination, and increased rapidly until 180 days. The levels decreased again to 26.17$\pm$3.03pg/$m\ell$ by 210 days and markedly increased to 68.00$\pm$8.70pg/$m\ell$ by 240 days. 5. The prolactin levels during pregnancy were 31.27$\pm$2.31ng/$m\ell$ and 42.60$\pm$2.37ng/$m\ell$ at days 150 and 240 after artificial insemination, respectively.
The LH levels during pregnancy reached a peak of 27.47$\pm$7.90mIU/$m\ell$ at day 30 after artificial insemination, and thereafter gradually decreased. 6. The progesterone levels during the periparturient period reached a peak of 4.61$\pm$0.34ng/$m\ell$ at day 3 prepartum, thereafter gradually decreased, and were 2.05$\pm$0.60ng/$m\ell$ at day 7 postpartum. The estradiol-17$\beta$ levels during the periparturient period were high, from 207.23$\pm$6.04pg/$m\ell$ at day 1 prepartum to 239.90$\pm$13.90pg/$m\ell$ at day 2 prepartum, and thereafter began to decline and reached 51.87$\pm$1.72pg/$m\ell$ at day 7 postpartum. 7. The prolactin levels during the periparturient period were relatively higher at the time of parturition. The LH levels during the periparturient period ranged from 6.32$\pm$0.32mIU/$m\ell$ to 13.90$\pm$1.37mIU/$m\ell$, showing significant changes. 8. The progesterone levels (4.6$\pm$0.8ng/$m\ell$) of the pregnant cows were significantly higher than those (1.84$\pm$1.4ng/$m\ell$) of nonpregnant cows. Cows inseminated from 61 to 90 days after parturition showed higher progesterone levels. 9. During 20 to 25 days after artificial insemination, the accuracy of pregnancy diagnosis from milk progesterone levels was 94.4% for nonpregnant cows (<2.3ng/$m\ell$), and 75.0% for pregnant cows ( 3.2ng/$m\ell$). The average overall accuracy of pregnancy prediction for nonpregnant and pregnant cows was 83.3%. 10. The results obtained in this study suggest that understanding the endocrinological mechanisms by means of milk hormone analysis during the estrous cycle, pregnancy and parturition would give the basic information needed for increasing the efficiency of reproduction. This study would not only provide an accurate method of early pregnancy diagnosis by milk progesterone levels but also contribute to research on a method of detecting FSH levels in milk, which was difficult in blood serum.


Evaluation of the quality of Italian Ryegrass Silages by Near Infrared Spectroscopy (근적외선 분광법을 이용한 이탈리안 라이그라스 사일리지의 품질 평가)

  • Park, Hyung-Soo;Lee, Sang-Hoon;Choi, Ki-Choon;Lim, Young-Chul;Kim, Jong-Gun;Jo, Kyu-Chea;Choi, Gi-Jun
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.32 no.3
    • /
    • pp.301-308
    • /
    • 2012
  • Near infrared reflectance spectroscopy (NIRS) has become increasingly used as a rapid and accurate method of evaluating some chemical compositions in forages. This study was carried out to explore the accuracy of NIRS for the prediction of chemical parameters of Italian ryegrass silages. A population of 267 Italian ryegrass silages representing a wide range in chemical parameters and fermentative characteristics was used in this investigation. Samples of silage were scanned in intact fresh condition at 2 nm intervals over the wavelength range 680~2,500 nm, and the optical data were recorded as log 1/Reflectance (log 1/R). The spectral data were regressed against a range of chemical parameters using partial least squares (PLS) multivariate analysis in conjunction with spectral math treatments to reduce the effect of extraneous noise. The optimum calibrations were selected on the basis of the highest coefficients of determination in cross validation ($R^2$) and the lowest standard error of cross validation (SECV). The results of this study showed that NIRS predicted the chemical parameters with a very high degree of accuracy. The $R^2$ and SECV were 0.98 (SECV 1.27%) for moisture, 0.88 (SECV 1.26%) for ADF, 0.84 (SECV 2.0%), 0.93 (SECV 0.96%) for CP and 0.78 (SECV 0.56), 0.81 (SECV 0.31%), 0.88 (SECV 1.26%) and 0.82 (SECV 4.46) for pH, lactic acid, TDN and RFV on a dry matter (%), respectively. The results of this experiment showed the possibility of using the NIRS method to predict the chemical composition and fermentation quality of Italian ryegrass silages as a routine analysis method in feeding value evaluation and for farmer advice.

Safety and Efficacy of Ultrasound-Guided Percutaneous Core Needle Biopsy of Pancreatic and Peripancreatic Lesions Adjacent to Critical Vessels (주요 혈관 근처의 췌장 또는 췌장 주위 병변에 대한 초음파 유도하 경피적 중심 바늘 생검의 안전성과 효율성)

  • Sun Hwa Chung;Hyun Ji Kang;Hyo Jeong Lee;Jin Sil Kim;Jeong Kyong Lee
    • Journal of the Korean Society of Radiology
    • /
    • v.82 no.5
    • /
    • pp.1207-1217
    • /
    • 2021
  • Purpose To evaluate the safety and efficacy of ultrasound-guided percutaneous core needle biopsy (USPCB) of pancreatic and peripancreatic lesions adjacent to critical vessels. Materials and Methods Data were collected retrospectively from 162 patients who underwent USPCB of the pancreas (n = 98), the peripancreatic area adjacent to the portal vein, the paraaortic area adjacent to pancreatic uncinate (n = 34), and lesions on the third duodenal portion (n = 30) during a 10-year period. An automated biopsy gun with an 18-gauge needle was used for biopsies under US guidance. The USPCB results were compared with those of the final follow-up imaging performed postoperatively. The diagnostic accuracy and major complication rate of the USPCB were calculated. Multiple factors were evaluated for the prediction of successful biopsies using univariate and multivariate analyses. Results The histopathologic diagnosis from USPCB was correct in 149 (92%) patients. The major complication rate was 3%. Four cases of mesenteric hematomas and one intramural hematoma of the duodenum occurred during the study period. The following factors were significantly associated with successful biopsies: a transmesenteric biopsy route rather than a transgastric or transenteric route; good visualization of targets; and evaluation of the entire US pathway. In addition, the number of biopsies required was less when the biopsy was successful. Conclusion USPCB demonstrated high diagnostic accuracy and a low complication rate for the histopathologic diagnosis of pancreatic and peripancreatic lesions adjacent to critical vessels.

Evaluation of Moisture and Feed Values for Winter Annual Forage Crops Using Near Infrared Reflectance Spectroscopy (근적외선분광법을 이용한 동계사료작물 풀 사료의 수분함량 및 사료가치 평가)

  • Kim, Ji Hea;Lee, Ki Won;Oh, Mirae;Choi, Ki Choon;Yang, Seung Hak;Kim, Won Ho;Park, Hyung Soo
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.39 no.2
    • /
    • pp.114-120
    • /
    • 2019
  • This study was carried out to explore the accuracy of near infrared spectroscopy (NIRS) for the prediction of moisture content and chemical parameters of winter annual forage crops. A population of 2454 winter annual forages representing a wide range in chemical parameters was used in this study. Samples of forage were scanned in intact fresh condition at 1 nm intervals over the wavelength range 680-2500 nm, and the optical data were recorded as log 1/Reflectance (log 1/R). The spectral data were regressed against a range of chemical parameters using partial least squares (PLS) multivariate analysis in conjunction with spectral math treatments to reduce the effect of extraneous noise. The optimum calibrations were selected based on the highest coefficients of determination in cross validation ($R^2$) and the lowest standard error of cross-validation (SECV). The results of this study showed that the NIRS calibration model to predict the moisture contents and chemical parameters had a very high degree of accuracy except for barley. The $R^2$ and SECV for the integrated winter annual forages calibration were 0.99 (SECV 1.59%) for moisture, 0.89 (SECV 1.15%) for acid detergent fiber, 0.86 (SECV 1.43%) for neutral detergent fiber, 0.93 (SECV 0.61%) for crude protein, 0.90 (SECV 0.45%) for crude ash, and 0.82 (SECV 3.76%) for relative feed value on a dry matter (%), respectively. The results of this experiment showed the possibility of using the NIRS method to predict the moisture and chemical composition of winter annual forage as a routine analysis method to evaluate the feed value.

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis is being actively conducted, and it is showing remarkable results in various fields such as classification, summarization, and generation. Among various text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary class classification with one label among two classes, multi-class classification with one label among several classes, and multi-label classification with multiple labels among several classes. In particular, multi-label classification requires a different training method from binary class classification and multi-class classification because of the characteristic of having multiple labels. In addition, since the number of labels to be predicted increases as the number of labels and classes increases, performance improvement is difficult because prediction becomes harder. To overcome these limitations, research on label embedding is being actively conducted; label embedding (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed label, and (iii) restores the predicted label to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only the linear relationship between labels or compress the labels by random transformation, they have difficulty capturing the non-linear relationship between labels, and so cannot create a latent label space that sufficiently contains the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding. Label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding has a limitation in that a large amount of information is lost when compressing a high-dimensional label space having a myriad of classes into a low-dimensional latent label space. This can be attributed to the gradient loss problem that occurs in the backpropagation process of learning. To solve this problem, the skip connection was devised: by adding the input of a layer to its output to prevent gradient loss during backpropagation, efficient learning is possible even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. In addition, the proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space. Using this, we conducted an experiment to predict the compressed keyword vector in the latent label space from the paper abstract and to evaluate the multi-label classification by restoring the predicted keyword vector back to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators were far superior for multi-label classification based on the proposed methodology compared to traditional multi-label classification methods.
This shows that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was identified by comparing its performance according to the domain characteristics and the number of dimensions of the latent label space.
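
The precision, recall, and F1 score named in this abstract can be computed for multi-label predictions as in the minimal sketch below. The abstract does not state which averaging scheme the authors used; the code assumes micro-averaging over all (sample, label) pairs, and the function name and the toy label matrices are hypothetical.

```python
def micro_precision_recall_f1(y_true, y_pred):
    """Micro-averaged precision, recall and F1 over binary label matrices."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: 2 documents, 3 possible keyword labels each (made-up data).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(micro_precision_recall_f1(y_true, y_pred))
```

In a label embedding pipeline, `y_pred` would be the label vectors restored from the latent space to the original label space and then thresholded to 0 or 1; that restoration step is outside this sketch.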