• Title/Summary/Keyword: Accuracy of Prediction


Development of a Retrieval Algorithm for Adjustment of Satellite-viewed Cloudiness (위성관측운량 보정을 위한 알고리즘의 개발)

  • Son, Jiyoung;Lee, Yoon-Kyoung;Choi, Yong-Sang;Ok, Jung;Kim, Hye-Sil
    • Korean Journal of Remote Sensing
    • /
    • v.35 no.3
    • /
    • pp.415-431
    • /
    • 2019
  • The satellite-viewed cloudiness, a ratio of cloudy pixels to total pixels ($C_{sat,\;prev}$), inevitably differs from the "ground-viewed" cloudiness ($C_{grd}$) because of the different viewpoints. Here we develop an algorithm to retrieve satellite-viewed cloudiness that is adjusted toward $C_{grd}$ ($C_{sat,\;adj}$). The key process of the algorithm is to convert the cloudiness projected on the plane surface into the cloudiness on the celestial hemisphere as seen by the observer. For this conversion, supplementary satellite retrievals such as cloud detection and cloud top pressure are used, as they provide the locations of cloudy pixels and cloud base height information, respectively. The algorithm is tested on Himawari-8 level 1B data. $C_{sat,\;adj}$ and $C_{sat,\;prev}$ are retrieved and validated against $C_{grd}$ from SYNOP stations over Korea (22 stations) and China (724 stations), during daytime only, for the first seven days of every month from July 2016 to June 2017. As a result, the mean error of $C_{sat,\;adj}$ (0.61) is less than that of $C_{sat,\;prev}$ (1.01). The percent of detection for the 'Cloudy' scenario of $C_{sat,\;adj}$ (73%) is higher than that of $C_{sat,\;prev}$ (60%). The percent correct (the accuracy) of $C_{sat,\;adj}$ is 61%, while that of $C_{sat,\;prev}$ is 55% for all seasons. For the December-January-February period, when cloudy pixels are readily overestimated, the percent correct of $C_{sat,\;adj}$ is 60%, while that of $C_{sat,\;prev}$ is 56%. Therefore, we conclude that the present algorithm can effectively bring the satellite cloudiness closer to the ground-viewed cloudiness.

Prediction on the Quality of Total Mixed Ration for Dairy Cows by Near Infrared Reflectance Spectroscopy (근적외선 분광법에 의한 국내 축우용 TMR의 성분추정)

  • Ki, Kwang-Seok;Kim, Sang-Bum;Lee, Hyun-June;Yang, Seung-Hak;Lee, Jae-Sik;Jin, Ze-Lin;Kim, Hyeon-Shup;Jeo, Joon-Mo;Koo, Jae-Yeon;Cho, Jong-Ku
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.29 no.3
    • /
    • pp.253-262
    • /
    • 2009
  • The present study was conducted to develop a rapid and accurate method of evaluating the chemical composition of total mixed ration (TMR) for dairy cows using near infrared reflectance spectroscopy (NIRS). A total of 253 TMR samples were collected from TMR manufacturers and dairy farms in Korea. Prior to NIR analysis, TMR samples were dried at $65^{\circ}C$ for 48 hours and then ground to 2 mm size. The samples were scanned at 2 nm intervals over the wavelength range of 400-2500 nm on a FOSS-NIR Systems Model 6500. The values obtained by NIR analysis and conventional chemical methods were compared. Generally, the relationship between chemical analysis and NIR analysis was linear: $R^2$ and standard error of calibration (SEC) were 0.701 (SEC 0.407), 0.965 (SEC 0.315), 0.796 (SEC 0.406), 0.889 (SEC 0.987), 0.894 (SEC 0.311), 0.933 (SEC 0.885) and 0.889 (SEC 1.490) for moisture, crude protein, ether extract, crude fiber, crude ash, acid detergent fiber (ADF) and neutral detergent fiber (NDF), respectively. In addition, the standard error of prediction (SEP) values were 0.371, 0.290, 0.321, 0.380, 0.960, 0.859 and 1.446 for moisture, crude protein, ether extract, crude fiber, crude ash, ADF and NDF, respectively. The results of the present study showed that NIR analysis of unknown TMR samples would be relatively accurate. The developed NIR calibration curve can provide fast and reliable data on the chemical composition of TMR. Collection and analysis of more TMR samples will increase the accuracy and precision of NIR analysis of TMR.

Application of deep learning method for decision making support of dam release operation (댐 방류 의사결정지원을 위한 딥러닝 기법의 적용성 평가)

  • Jung, Sungho;Le, Xuan Hien;Kim, Yeonsu;Choi, Hyungu;Lee, Giha
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.spc1
    • /
    • pp.1095-1105
    • /
    • 2021
  • The advancement of dam operation is increasingly required because of upcoming rainy seasons, typhoons, and torrential rains. In addition, physical models based on specific rules may have limitations in controlling the release discharge of a dam owing to inherent uncertainty and complex factors. This study aims to forecast, multiple time steps ahead, the water level at the station nearest to the dam, and to evaluate the applicability of an LSTM (Long Short-Term Memory) deep learning model to decisions on dam release discharge. The LSTM model was trained and tested on eight data sets with a 1-hour temporal resolution, including primary data used in the dam operation and downstream water level station data covering about 13 years (2009~2021). The trained model forecasted the water level time series for six lead times (1, 3, 6, 9, 12, and 18 hours), and the forecasts were compared with the observed data. As a result, the 1-hour-ahead predictions exhibited the best performance in all cases, with an average MAE of 0.01 m, RMSE of 0.015 m, and NSE of 0.99. In addition, as the lead time increases, the predictive performance of the model tends to decrease slightly. The model closely estimates and reliably predicts the temporal pattern of the observed water level. Thus, it is judged that the LSTM model can produce predictive data by extracting the characteristics of complex, non-linear hydrological data and can be used to determine the amount of release discharge from the dam when simulating dam operation.
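The three scores quoted in the abstract above (MAE, RMSE, NSE) have standard definitions, sketched below in plain Python. The function names and the sample water levels are illustrative assumptions; they are not taken from the paper, which does not publish its evaluation code here.

```python
import math


def mae(obs, sim):
    """Mean absolute error between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the squared error of the model
    relative to the squared deviation of the observations from their mean.
    A value of 1 is a perfect fit; 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    squared_deviation = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / squared_deviation


# Made-up water levels in metres, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.1, 1.9, 3.2, 3.8]
print(mae(observed, forecast), rmse(observed, forecast), nse(observed, forecast))
```

NSE is undefined when all observations are equal (the denominator is zero), so a constant series would need separate handling.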

The Study on Gyeokguk and Sangshin (격국과 상신에 대한 소고)

  • Hwangbo, Kwan
    • Industry Promotion Research
    • /
    • v.7 no.3
    • /
    • pp.115-124
    • /
    • 2022
  • The greatest difficulty in studying the future-telling science of human destiny arises when an individual's fate, as shown by Saju-Palza(四柱八字), is bad. In that case, we face the question of how to live: whether to follow our fate or to deny it, in the belief that we can improve our lives through hard effort regardless of a bad Saju-Palza(四柱八字). However, a clear answer to this question is hard to find. 『Liao Fan 4 lessons(了凡四訓)』 shows that one's destiny can be improved by accumulating good deeds despite a bad Saju-Palza(四柱八字). Some say that the future can be created, not foreseen. Likewise, Dr. Steven Coby says that the surest way to forecast the future is to create it. In any case, the strong desire and curiosity to know one's own future has persisted from Genesis until now, and we suppose that this desire may be one of our basic instincts. If so, the function and role of the future-telling science is to increase the accuracy of future prediction, whether our fate is fixed or changeable. Therefore, this study summarizes the definitions of confusing terms, focusing on Gyeokguk(格局) and Sangshin(相神), the core of Myeongrihak(命理學), which is considered one of the most popular future-telling sciences. Concerning Gyeok(格), this paper mainly considers Nae-Gyeok(內格); Oi-Gyeok(外格) and Special-Gyeok(別格) are not addressed. Specifically, it summarizes the views of the classical Myeongri(命理) books and modern scholars on Gyeokguk(格局) and Yongshin(用神). In particular, it also summarizes a comparison of the various concepts of Gyeokguk(格局), the advantages and disadvantages of each Nae-Gyeok(內格)'s characteristics, the order in which Nae-Gyeok(內格) is determined, and the good and bad cases of its Gyeok(格). In addition, it summarizes the concept of Sangshin(相神) discussed in 『Japyeongjinjeon』 and briefly summarizes Heeshin(喜神), a broader concept than Sangshin(相神). The different usages of Sangshin(相神) are also analyzed, contrasting the interpretation that gives priority to Cheongan(天干) in the Day-Column(日柱) with the interpretation based on Jijee(地支) in the Month-Column(月柱). Finally, the confusion that arises from scholars' acceptance of widely differing meanings for the same term is left as a task for future research.

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.125-141
    • /
    • 2012
  • Malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, it is becoming more important to handle these attacks appropriately, and there is considerable interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems for detecting, identifying and responding appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. However, although they perform very well under normal situations, they cannot handle new or unknown patterns of network attacks. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can proactively respond to unknown threats. For a long time, researchers have adopted and tested various kinds of artificial intelligence techniques such as artificial neural networks, decision trees, and support vector machines to detect intrusions on the network. However, most of them have applied these techniques singly, even though combining the techniques may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model is designed to combine the prediction results of four different binary classification models, namely logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM), which may be complementary to each other. Genetic algorithms (GA) are used as a tool for finding the optimal combining weights. Our proposed model is built in two steps. In the first step, the optimal integration model, whose prediction error (i.e. erroneous classification rate) is the least, is generated. In the second step, the model explores the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. To calculate the total misclassification cost of an intrusion detection system, we need to understand its asymmetric error cost scheme. Generally, there are two common forms of errors in intrusion detection. The first error type is the False-Positive Error (FPE), in which normal activity is wrongly judged to be an intrusion; this may result in unnecessary fixes. The second error type is the False-Negative Error (FNE), which mainly misjudges a malicious program as normal. Compared to FPE, FNE is more fatal. Thus, the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results from our model with the results from single techniques to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell R4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed model based on GA outperformed all the other comparative models in detecting network intrusions from the accuracy perspective. They also showed that the proposed model outperformed all the other comparative models from the total misclassification cost perspective. Consequently, it is expected that our study may contribute to building cost-effective intelligent intrusion detection systems.
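The second step described above, choosing a classification threshold that minimizes an asymmetric misclassification cost, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the threshold is searched, so a plain grid search is used here, and the function names, the cost values, and the sample scores are all hypothetical. The combining weights are passed in as given; the genetic algorithm that finds them is not reproduced.

```python
def combine(probabilities, weights):
    """Weighted average of the per-model intrusion probabilities for one sample.
    `probabilities` and `weights` are parallel sequences (one entry per model)."""
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total misclassification cost when a sample is flagged as an intrusion
    whenever its combined score is at or above the threshold.
    Labels are 1 for intrusion and 0 for normal traffic."""
    cost = 0.0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 0:
            cost += cost_fp  # false positive: normal traffic flagged
        elif not flagged and label == 1:
            cost += cost_fn  # false negative: intrusion missed
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn, steps=100):
    """Grid search over thresholds 0, 1/steps, ..., 1; returns the first
    threshold with the lowest total cost, together with that cost."""
    best_t, best_c = 0.0, total_cost(scores, labels, 0.0, cost_fp, cost_fn)
    for i in range(1, steps + 1):
        t = i / steps
        c = total_cost(scores, labels, t, cost_fp, cost_fn)
        if c < best_c:
            best_t, best_c = t, c
    return best_t, best_c


# Hypothetical combined scores and labels; a missed intrusion is assumed
# to cost five times as much as a false alarm.
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
threshold, cost = best_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0)
print(threshold, cost)
```

Because false negatives are costlier in this example, the search settles on a low threshold that accepts one false alarm rather than missing an intrusion.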

Estimation of Ground-level PM10 and PM2.5 Concentrations Using Boosting-based Machine Learning from Satellite and Numerical Weather Prediction Data (부스팅 기반 기계학습기법을 이용한 지상 미세먼지 농도 산출)

  • Park, Seohui;Kim, Miae;Im, Jungho
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.2
    • /
    • pp.321-335
    • /
    • 2021
  • Particulate matter (PM10 and PM2.5, with diameters less than 10 and 2.5 μm, respectively) can be absorbed by the human body and adversely affect human health. Although most PM monitoring is based on ground-based observations, these are limited to point-based measurement sites, which leads to uncertainty in PM estimation for regions without observation sites. This spatial limitation can be overcome by using satellite data. In this study, we developed a machine learning-based retrieval algorithm for ground-level PM10 and PM2.5 concentrations using aerosol parameters from the Geostationary Ocean Color Imager (GOCI) satellite and various meteorological parameters from a numerical weather prediction model for January to December 2019. Gradient Boosted Regression Trees (GBRT) and Light Gradient Boosting Machine (LightGBM) were used to estimate PM concentrations. The model performances were examined for two types of feature sets: all input parameters (Feature set 1) and a subset of input parameters without meteorological and land-cover parameters (Feature set 2). Both models showed higher accuracy (about 10 % higher in $R^2$) with Feature set 1 than with Feature set 2. The GBRT model using Feature set 1 was chosen as the final model for further analysis (PM10: $R^2$ = 0.82, nRMSE = 34.9 %; PM2.5: $R^2$ = 0.75, nRMSE = 35.6 %). The spatial distribution of the seasonal and annual-averaged PM concentrations was similar to in-situ observations, except for the northeastern part of China, where surface reflectance is bright. Their spatial distribution and seasonal changes matched in-situ measurements well.

Prediction of Correct Answer Rate and Identification of Significant Factors for CSAT English Test Based on Data Mining Techniques (데이터마이닝 기법을 활용한 대학수학능력시험 영어영역 정답률 예측 및 주요 요인 분석)

  • Park, Hee Jin;Jang, Kyoung Ye;Lee, Youn Ho;Kim, Woo Je;Kang, Pil Sung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.11
    • /
    • pp.509-520
    • /
    • 2015
  • The College Scholastic Ability Test (CSAT) is the primary test for evaluating the academic achievement of high-school students and is used by most universities for admission decisions in South Korea. Because its level of difficulty is a significant issue for both students and universities, the government makes a huge effort to keep the difficulty level consistent every year. However, the actual levels of difficulty have fluctuated significantly, which causes many problems with university admission. In this paper, unlike traditional methods that depend on experts' judgments, we build two types of data-driven prediction models from accumulated CSAT test data to predict the correct answer rate and to identify significant factors for the CSAT English test. Initially, we derive candidate question-specific factors that can influence the correct answer rate, such as position, EBS-relation, and readability, from the annual CSAT practice tests and the CSAT over 10 years. In addition, we derive context-specific factors by employing topic modeling, which identifies the underlying topics in the text. Then, the correct answer rate is predicted by multiple linear regression, and the level of difficulty is predicted by a classification tree. The experimental results show that 90% accuracy can be achieved by the level of difficulty (difficult/easy) classification model, whereas the error rate for the correct answer rate is below 16%. Points and problem category are found to be critical for predicting the correct answer rate. In addition, the correct answer rate is also influenced by some of the topics discovered by topic modeling. Based on our study, it will be possible to predict the range of the expected correct answer rate at both the question level and the entire-test level, which will help CSAT examiners control the level of difficulty.

Estimation of Chlorophyll-a Concentrations in the Nakdong River Using High-Resolution Satellite Image (고해상도 위성영상을 이용한 낙동강 유역의 클로로필-a 농도 추정)

  • Choe, Eun-Young;Lee, Jae-Woon;Lee, Jae-Kwan
    • Korean Journal of Remote Sensing
    • /
    • v.27 no.5
    • /
    • pp.613-623
    • /
    • 2011
  • This study assessed the feasibility of applying Two-band and Three-band reflectance models to chlorophyll-a estimation in turbid productive waters that are smaller and narrower than the ocean, using a high spatial resolution image. Those band ratio models have been successfully applied to analyzing chlorophyll-a concentrations of ocean or coastal water using the Moderate Resolution Imaging Spectroradiometer (MODIS), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), Medium Resolution Imaging Spectrometer (MERIS), etc. Two-band and Three-band models based on band ratios such as Red and NIR bands have generally been used for Chl-a in turbid waters. Two-band models using the Red and NIR bands of the RapidEye image showed no significant results, with $R^2$ of 0.38. To enhance the band ratio between the absorption and reflection peaks, we used the red-edge band (710 nm) of the RapidEye image for the Two-band and Three-band models. The Red-RE Two-band and Red-RE-NIR Three-band reflectance models (with cubic equation) for the RapidEye image provided significant performance, with $R^2$ of 0.66 and 0.73, respectively. Their performance corresponded to 'Approximate Prediction', with RPD of 1.39 and 1.29 and RMSE of 24.8 and 22.4, respectively. Another Three-band model with a quadratic equation showed performance similar to that of the Red-RE Two-band model. The findings of this study demonstrate that Two-band and Three-band reflectance models using a red-edge band can approximately estimate chlorophyll-a concentrations in turbid river water using a high-resolution satellite image. In the distribution map of estimated Chl-a concentrations, the Three-band model with cubic equation showed lower values than the Two-band model. In further work, quantification and correction of spectral interferences caused by suspended sediments and colored dissolved organic matter will improve the accuracy of chlorophyll-a estimation in turbid waters.

Prediction of Maximal Oxygen Uptake Ages 18~34 Years (18~34 남성의 최대산소 섭취량 추정)

  • Jeon, Yoo-Joung;Im, Jae-Hyeng;Lee, Byung-Kun;Kim, Chang-Hwan;Kim, Byeong-Wan
    • 한국체육학회지인문사회과학편
    • /
    • v.51 no.3
    • /
    • pp.373-382
    • /
    • 2012
  • The purpose of this study is to predict VO2max from body indices and submaximal metabolic responses. The subjects were 250 males aged 18 to 34, whom we randomly separated into two groups: 179 for the sample and 71 for a cross-validation group. They underwent maximal exercise testing with the Bruce protocol, and we measured the metabolic responses at the end of the first (3-minute) and second (6-minute) stages. To predict VO2max, we applied multiple regression analysis with the stepwise method to the sample. Model 1's variables are weight, 6-minute HR and 6-minute VO2 (R=0.64, SEE=4.74, CV=11.7%, p<.01), and the equation is VO2max(ml/kg/min) = 72.256 - 0.340(Weight) - 0.220(6minHR) + 0.013(6minVO2). Model 2's variables are weight, 6-minute HR, 6-minute VO2, and 6-minute VCO2 (R=0.67, SEE=4.59, CV=11.3%, p<.01), and the equation is VO2max(ml/kg/min) = 68.699 - 0.277(Weight) - 0.206(6minHR) + 0.020(6minVO2) - 0.009(6minVCO2). Neither model showed multicollinearity. Model 2 showed a higher correlation than Model 1. However, when we cross-validated the models with the 71 men, measured and estimated VO2max were significantly correlated (R=0.53 and 0.56, p<.01). Although both models are valid and usable considering their simplicity and utility, Model 2 is more accurate.
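The two regression equations quoted above can be evaluated directly. The sketch below transcribes the coefficients from the abstract; the function and parameter names are illustrative, and the input units (weight in kg, heart rate in beats/min, VO2 and VCO2 in ml/min) are assumptions, since the abstract does not state them. The example inputs are made up.

```python
def vo2max_model1(weight, hr_6min, vo2_6min):
    """Model 1 from the abstract: weight, 6-minute HR and 6-minute VO2.
    Returns predicted VO2max in ml/kg/min. Input units are assumed."""
    return 72.256 - 0.340 * weight - 0.220 * hr_6min + 0.013 * vo2_6min


def vo2max_model2(weight, hr_6min, vo2_6min, vco2_6min):
    """Model 2 from the abstract: adds 6-minute VCO2 as a fourth predictor.
    Returns predicted VO2max in ml/kg/min. Input units are assumed."""
    return (68.699 - 0.277 * weight - 0.206 * hr_6min
            + 0.020 * vo2_6min - 0.009 * vco2_6min)


# Hypothetical subject, for illustration only.
print(vo2max_model1(weight=70, hr_6min=150, vo2_6min=2000))
print(vo2max_model2(weight=70, hr_6min=150, vo2_6min=2000, vco2_6min=2200))
```

Both equations were fitted on males aged 18 to 34 tested with the Bruce protocol, so predictions outside that population would be extrapolation.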

Comparison of Size Criteria in Mediastinal Lymph Node Involvement of Adenocarcinoma of Lungs (폐 선암의 종격동 림프절 전이에 있어서 림프절 크기 기준의 비교)

  • Gu, Ki-Seon;Kuk, Hiang;Koh, Hyeck-Jae;Yang, Sei-Hun;Jeong, Eun-Taik
    • Tuberculosis and Respiratory Diseases
    • /
    • v.46 no.4
    • /
    • pp.542-547
    • /
    • 1999
  • Background: Determining mediastinal lymph node involvement in lung cancer by CT scan is very important and valuable for treatment planning and prognosis prediction. In general, a long diameter of the mediastinal lymph node of more than 15 mm is used as the criterion for lung cancer involvement. Adenocarcinoma has a tendency toward early distant metastasis and micrometastasis, so adenocarcinoma may involve lymph nodes earlier, and involvement cannot be detected before the lymph nodes are sufficiently enlarged. The authors tried to determine the difference between two size criteria (15 mm, 10 mm) in adenocarcinoma for the detection of cancer involvement. Methods: The sample comprised 60 cases (male 46, female 14, median age: 61.5 years). According to pathology: squamous cancer 41, large cell cancer 2, adenocarcinoma 17. According to TNM stage: I 23, III 24, IIIA 13. Results: The mean long diameter of involved lymph nodes is 16.0($\pm8.0$) mm in the non-adenocarcinoma group and 12.0($\pm3.2$) mm in the adenocarcinoma group (p<0.05). If a long diameter of the lymph node larger than 15 mm is applied as the involvement criterion, the sensitivity, specificity, positive predictive index, negative predictive index, and accuracy of the non-adenocarcinoma group are 54%, 100%, 100%, 83%, 86%, and those of the adenocarcinoma group are 43%, 90%, 75%, 69%, 71%. If a long diameter of the lymph node larger than 10 mm is applied as the involvement criterion, the sensitivity, specificity, positive predictive index, negative predictive index, and accuracy of the non-adenocarcinoma group are 65%, 77%, 61%, 92%, 79%, and those of the adenocarcinoma group are 100%, 80%, 78%, 100%, 88%. Conclusion: A long diameter of the lymph node larger than 10 mm is a more valuable criterion for lymph node involvement in adenocarcinoma of the lungs.
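The five diagnostic measures reported above all follow from a 2x2 table of test result against pathology. The sketch below shows the standard definitions. The abstract does not give the underlying counts, so the counts used in the example are hypothetical: they were chosen only because, for 17 cases, they reproduce the percentages reported for the adenocarcinoma group, and they should not be read as the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts.
    tp/fn: involved nodes called positive/negative by the size criterion;
    tn/fp: uninvolved nodes called negative/positive."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def as_percent(metrics):
    """Round each measure to a whole percentage, as in the abstract."""
    return {name: round(value * 100) for name, value in metrics.items()}


# Hypothetical counts for 17 cases (7 involved, 10 not involved).
criterion_15mm = as_percent(diagnostic_metrics(tp=3, fp=1, tn=9, fn=4))
criterion_10mm = as_percent(diagnostic_metrics(tp=7, fp=2, tn=8, fn=0))
print(criterion_15mm)
print(criterion_10mm)
```

Lowering the size criterion from 15 mm to 10 mm trades a small loss in specificity for a large gain in sensitivity, which is the pattern the abstract reports for adenocarcinoma.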
