• Title/Summary/Keyword: data value prediction

Statistical Analysis of Protein Content in Wheat Germplasm Based on Near-infrared Reflectance Spectroscopy (밀 유전자원의 근적외선분광분석 예측모델에 의한 단백질 함량 변이분석)

  • Oh, Sejong;Choi, Yu Mi;Yoon, Hyemyeong;Lee, Sukyeung;Yoo, Eunae;Hyun, Do Yoon;Shin, Myoung-Jae;Lee, Myung Chul;Chae, Byungsoo
    • KOREAN JOURNAL OF CROP SCIENCE / v.64 no.4 / pp.353-365 / 2019
  • A near-infrared reflectance spectroscopy (NIRS) prediction model was developed to establish a rapid analysis system for wheat germplasm and to provide statistical information on the characteristics of protein content. The variability index value (VIV) of the calibration resources was 0.80, the average protein content was 13.2%, and the content range was from 7.0% to 13.2%. After the near-infrared spectra of the calibration resources were measured, the NIRS prediction model was developed through regression analysis between protein content and spectral data, and then optimized by excluding outliers. The standard error of calibration, R², and slope of the optimized model were 0.132, 0.997, and 1.000, respectively; the R², standard error, and slope of the external validation were 0.994, 0.191, and 1.013, respectively. Based on these results, the developed NIRS model can be applied to the rapid analysis of protein in wheat. The distribution of NIRS-predicted protein content in 6,794 resources was analyzed using normal distribution analysis. The VIV was 0.79, the average protein content was 12.1%, and 42.1% and 68% of all accessions fell within protein ranges of 10-13% and 9.5-14.6%, respectively. The total resources comprised breeding lines (3,128), landraces (2,705), and varieties (961). For breeding lines, the VIV was 0.80, the average protein content was 11.8%, and 68% of the accessions ranged from 9.2% to 14.5%. For landraces, the VIV was 0.76, the average protein content was 12.1%, and 68% of the accessions ranged from 9.8% to 14.4%. For varieties, the VIV was 0.80, the average protein content was 12.8%, and 68% of the accessions ranged from 10.2% to 15.4%. These results should be helpful to wheat breeding experts.
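The calibration statistics (slope, R², standard error) and the 68% ranges in this abstract can be illustrated with a short Python sketch. It assumes one common set of definitions: slope and R² from a least-squares regression of NIRS-predicted values on reference values, and a bias-corrected standard error of the prediction differences. The abstract does not state which definitions the authors used, and the function names are hypothetical. Under a normal distribution, mean ± 1 SD covers about 68% of values, which is consistent with the reported ranges being centred on the reported means (e.g., 9.5-14.6% around 12.1%).

```python
import math

def calibration_stats(reference, predicted):
    """Slope, R^2, bias and bias-corrected standard error of predicted vs. reference values."""
    n = len(reference)
    if n < 3 or n != len(predicted):
        raise ValueError("need at least 3 paired values of equal length")
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    sxx = sum((x - mean_ref) ** 2 for x in reference)
    syy = sum((y - mean_pred) ** 2 for y in predicted)
    sxy = sum((x - mean_ref) * (y - mean_pred) for x, y in zip(reference, predicted))
    slope = sxy / sxx                  # slope of predicted regressed on reference
    r2 = sxy ** 2 / (sxx * syy)        # squared Pearson correlation
    diffs = [y - x for x, y in zip(reference, predicted)]
    bias = sum(diffs) / n              # mean prediction offset
    se = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return {"slope": slope, "r2": r2, "bias": bias, "se": se}

def one_sigma_range(values):
    """Interval mean +/- 1 sample SD; covers about 68% of a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean - sd, mean + sd
```

For example, predictions that are uniformly 0.5 percentage points too high give a slope of 1, R² of 1, a bias of 0.5, and a bias-corrected standard error of 0; the constant offset shows up only in the bias term.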

Distribution and Potential Suitable Habitats of an Endemic Plant, Sophora koreensis in Korea (MaxEnt 분석을 통한 한반도 특산식물 개느삼 서식 가능지역 분석)

  • An, Jong-Bin;Sung, Chan Yong;Moon, Ae-Ra;Kim, Sodam;Jung, Ji-Young;Son, Sungwon;Shin, Hyun-Tak;Park, Wan-Geun
    • Korean Journal of Environment and Ecology / v.35 no.2 / pp.154-163 / 2021
  • This study was carried out to present the current habitat distribution and a prediction of the potential habitat distribution of Sophora koreensis, a Korean endemic plant listed as Endangered (EN) on the IUCN Red List. The habitat distribution survey confirmed 19 habitats of Sophora koreensis in Gangwon Province: 13 in Yanggu-gun, 3 in Inje-gun, 2 in Chuncheon-si, and 1 in Hongcheon-gun. The northernmost habitat in Korea was in Imdang-ri, Yanggu-gun; the easternmost in Hangye-ri, Inje-gun; the westernmost in Jinae-ri, Chuncheon-si; and the southernmost in Sungdong-ri, Hongcheon-gun. The altitude of the habitats ranged from 169 to 711 m, with an average of 375 m. Habitat area ranged from 8,000 to 734,000 m², with an average of 202,789 m². Most habitats were managed forests, such as thinned and pruned stands. The MaxEnt analysis of the potential habitat of Sophora koreensis yielded an AUC of 0.9762. The predicted potential habitats were located in Yanggu-gun, Inje-gun, Hwacheon-gun, and Chuncheon-si in Gangwon Province. The variables influencing the predicted habitat distribution were annual precipitation, soil carbon content, and maximum monthly temperature. This study confirmed that Sophora koreensis habitats were mostly found on ridges with high light intensity. These results can be used as basic data for designating protected areas for Sophora koreensis habitats.
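The AUC reported for the MaxEnt model measures how well predicted suitability separates presence records from background points. A minimal Python sketch of the rank-based (Mann-Whitney) AUC follows, assuming suitability scores are already available for presence and background locations; the function name and inputs are illustrative and are not MaxEnt's own interface.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence site scores higher than a random
    background site; ties count as half a win."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))
```

An AUC of 0.5 means the model ranks presence and background sites no better than chance, while values near 1 (such as 0.9762) mean presence sites almost always receive higher suitability. The pairwise loop is O(n·m), which is fine for a sketch; a sorting-based version scales better.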

Prediction of Life Expectancy for Terminally Ill Cancer Patients Based on Clinical Parameters (말기 암 환자에서 임상변수를 이용한 생존 기간 예측)

  • Yeom, Chang-Hwan;Choi, Youn-Seon;Hong, Young-Seon;Park, Yong-Gyu;Lee, Hye-Ree
    • Journal of Hospice and Palliative Care / v.5 no.2 / pp.111-124 / 2002
  • Purpose: Although average life expectancy has increased with advances in medicine, cancer mortality is increasing, and the number of terminally ill cancer patients is rising accordingly. Predicting the survival period is an important issue in the care of terminally ill cancer patients, because the treatment choices made by patients, their families, and physicians vary considerably with the expected survival. We therefore investigated prognostic factors for increased mortality risk in terminally ill cancer patients, in order to support their care by predicting the survival period. Methods: We investigated 31 clinical parameters in 157 terminally ill cancer patients admitted to the Department of Family Medicine, National Health Insurance Corporation Ilsan Hospital, between July 1, 2000 and August 31, 2001. Patients' survival as of October 31, 2001 was confirmed from medical records and personal data. Survival rates and median survival times were estimated by the Kaplan-Meier method, and the log-rank test was used to compare survival rates according to each clinical parameter. A Cox proportional hazards model was used to identify, among the clinical parameters, the most predictive subset of prognostic factors affecting the risk of death. The mean, median, first quartile, and third quartile of the expected survival times were predicted with a Weibull proportional hazards regression model. Results: Of the 157 patients, 79 (50.3%) were male. The mean age was 65.1 ± 13.0 years in males and 64.3 ± 13.7 years in females. The most prevalent cancer was gastric cancer (36 patients, 22.9%), followed by lung cancer (27, 17.2%) and cervical cancer (20, 12.7%).
Survival time was shorter in the presence of the following factors: mental change, anorexia, hypotension, poor performance status, leukocytosis, neutrophilia, elevated serum creatinine, hypoalbuminemia, hyperbilirubinemia, elevated SGPT, prolonged prothrombin time (PT), prolonged activated partial thromboplastin time (aPTT), hyponatremia, and hyperkalemia. Among these, poor performance status, neutrophilia, prolonged PT, and prolonged aPTT were significant prognostic factors for the risk of death according to the Cox proportional hazards model. The predicted median life expectancy was 3.0 days when all 4 of these factors were present, 5.7-8.2 days when 3 were present, 11.4-20.0 days when 2 were present, 27.9-40.0 days when 1 was present, and 77 days when none was present. Conclusions: In terminally ill cancer patients, the prognostic factors related to reduced survival time were poor performance status, neutrophilia, prolonged PT, and prolonged aPTT. These four prognostic factors enabled prediction of life expectancy in terminally ill cancer patients.
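Two of the methods named in this abstract can be sketched compactly in Python: the Kaplan-Meier product-limit estimator of the survival curve, and the closed-form median of a Weibull proportional hazards model. The Weibull parameterization S(t) = exp(-(t/scale)^shape · exp(lp)), where lp is the linear predictor from the covariates, is an assumption made for illustration; the abstract does not give the authors' parameterization, and the function names are hypothetical. The Cox model and log-rank test are not shown.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate. events[i] is 1 for death, 0 for censoring.
    Returns [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects (deaths and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def weibull_ph_median(scale, shape, linear_predictor):
    """Median survival for S(t) = exp(-(t/scale)**shape * exp(lp)):
    solving S(t) = 0.5 gives t = scale * (ln 2 * exp(-lp)) ** (1/shape)."""
    return scale * (math.log(2) * math.exp(-linear_predictor)) ** (1.0 / shape)
```

In a proportional hazards model each adverse prognostic factor adds a positive term to the linear predictor, which shortens the predicted median. This matches the pattern reported above, where the median falls as more of the four factors are present.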

Research about feature selection that use heuristic function (휴리스틱 함수를 이용한 feature selection에 관한 연구)

  • Hong, Seok-Mi;Jung, Kyung-Sook;Chung, Tae-Choong
    • The KIPS Transactions:PartB / v.10B no.3 / pp.281-286 / 2003
  • In real-world problem solving, a large number of features are collected, but using all of them is difficult, and collecting correct data for every feature is not easy. Learning from all the collected data produces a complicated model and poor performance. In addition, features have interrelationships or hierarchical relations, so the number of features can be reduced by analyzing these relations with heuristic knowledge or statistical methods. A heuristic technique refers to learning through repeated trial and error and experience; drawing on experience, experts can approach a problem domain through a process of collecting opinions. These properties can be used to reduce the number of features used in learning: experts generate new, highly abstract features from the raw data. This paper describes a machine learning model that reduces the number of features used in learning by means of a heuristic function and uses the abstracted features as the input values of a neural network. We applied this model to win/loss prediction in professional baseball games. The results show that the model combining the two techniques not only reduces the complexity of the neural network model but also significantly improves classification accuracy compared with using the neural network or the heuristic model separately.
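The idea of feeding heuristic, expert-defined abstract features into a neural network can be sketched in Python as follows. Everything specific here is hypothetical: the abstract does not describe the actual heuristic functions or network, so `offense_score`, `pitching_score`, the raw statistic names, and the single sigmoid neuron (standing in for the paper's neural network) are illustrative assumptions. The point is only the structure: several raw statistics are condensed into fewer abstract features before learning.

```python
import math

# Hypothetical heuristic functions: each condenses several raw statistics
# into one abstract feature, as an expert rule of thumb might.
def offense_score(raw):
    return (raw["hits"] + 2 * raw["home_runs"]) / max(raw["at_bats"], 1)

def pitching_score(raw):
    return 1.0 / (1.0 + raw["era"])

HEURISTICS = [offense_score, pitching_score]

def abstract_features(raw):
    """Map a dict of raw statistics to a short list of abstract features."""
    return [h(raw) for h in HEURISTICS]

def predict(w, b, x):
    """Output of a single sigmoid neuron: estimated probability of a win."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Train the neuron by batch gradient descent on the cross-entropy loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

Here four raw statistics per team become two inputs, so the network has fewer weights to learn; a real application would train on many labelled games rather than a toy set.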

A study on the rock mass classification in boreholes for a tunnel design using machine learning algorithms (머신러닝 기법을 활용한 터널 설계 시 시추공 내 암반분류에 관한 연구)

  • Lee, Je-Kyum;Choi, Won-Hyuk;Kim, Yangkyun;Lee, Sean Seungwon
    • Journal of Korean Tunnelling and Underground Space Association / v.23 no.6 / pp.469-484 / 2021
  • Rock mass classification results strongly influence tunnel stability as well as the construction schedule and budget in tunnel design. A total of 3,526 tunnels have been constructed in Korea, and tunnel design and construction techniques have been continuously developed; however, few studies have examined how to assess rock mass quality and grade more accurately. As a result, classification results often differ considerably depending on the inspector's experience and judgement. Hence, this study aims to suggest a more reliable rock mass rating (RMR) model using machine learning algorithms, which are rapidly becoming available, based on analyses of various rock and rock mass information collected from boring investigations. For this purpose, 11 learning parameters (depth, rock type, RQD, electrical resistivity, UCS, Vp, Vs, Young's modulus, unit weight, Poisson's ratio, and RMR) from 13 domestic tunnel cases were selected, 337 training data sets and 60 test data sets were prepared, and 6 machine learning algorithms (DT, SVM, ANN, PCA & ANN, RF, and XGBoost) were tested with various hyperparameters for each algorithm. The results show that the mean absolute errors in RMR of the five algorithms other than the decision tree were less than 8, and that the support vector machine model was the best. The applicability of the model established in this study was confirmed, and this prediction model can be applied for more reliable rock mass classification as additional, more varied data are accumulated.
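The comparison above ranks algorithms by mean absolute error (MAE) in predicted RMR on the test set. A minimal Python sketch of that selection step follows; the model names and RMR values in the usage note are hypothetical placeholders, not the study's data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted RMR."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def best_model(actual_rmr, predictions_by_model):
    """Return (model name, MAE) for the model with the lowest test-set MAE."""
    scores = {name: mean_absolute_error(actual_rmr, preds)
              for name, preds in predictions_by_model.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]
```

For instance, with observed RMR values of 50, 60, and 70, a model predicting 52, 58, and 71 has an MAE of 5/3 and would be preferred over one predicting 40, 70, and 60 (MAE 10). Because MAE is expressed in RMR points, an MAE below 8 can be read directly against the width of the RMR rock classes.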

Spatial Distribution Patterns and Prediction of Hotspot Area for Endangered Herpetofauna Species in Korea (국내 멸종위기양서·파충류의 공간적 분포형태와 주요 분포지역 예측에 대한 연구)

  • Do, Min Seock;Lee, Jin-Won;Jang, Hoan-Jin;Kim, Dae-In;Park, Jinwoo;Yoo, Jeong-Chil
    • Korean Journal of Environment and Ecology / v.31 no.4 / pp.381-396 / 2017
  • Understanding species distributions plays an important role in conservation as well as in evolutionary biology. In this study, we applied a species distribution model to predict hotspot areas and habitat characteristics for endangered herpetofauna species in South Korea: the Korean Crevice Salamander (Karsenia koreana), Suweon-tree frog (Hyla suweonensis), Gold-spotted pond frog (Pelophylax chosenicus), Narrow-mouthed toad (Kaloula borealis), Korean ratsnake (Elaphe schrenckii), Mongolian racerunner (Eremias argus), Reeves' turtle (Mauremys reevesii), and Soft-shelled turtle (Pelodiscus sinensis). The Kori salamander (Hynobius yangi) and Black-headed snake (Sibynophis chinensis) were excluded from the analysis because of insufficient sample size. The results showed that altitude was the most important environmental variable for their distribution, and that the altitude at which these species were distributed was correlated with the regional climate. The predicted distribution areas derived from the species distribution modelling adequately reflected the observation sites used in this study as well as those reported in previous studies. The average AUC of the eight species was relatively high (0.845 ± 0.08), while the average omission rate was relatively low (0.087 ± 0.01). Therefore, the overlaid species distribution model created for the endangered species is considered successful. When the distribution models were merged, five species were found to share habitats in the coastal areas of Gyeonggi-do and Chungcheongnam-do, in the western part of the Korean Peninsula. We therefore suggest that protection of these areas should be a high priority, and our overall results may serve as essential baseline data for the conservation of endangered amphibians and reptiles in Korea.
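The omission rate reported alongside AUC is the share of held-out presence records that the model classifies as unsuitable at a chosen threshold, and the "mean ± SD" figures summarize that metric across species. A minimal Python sketch follows, assuming continuous suitability scores and a user-supplied threshold; the abstract does not say which threshold rule was applied, and the function names are illustrative.

```python
import statistics

def omission_rate(test_presence_scores, threshold):
    """Fraction of held-out presence records whose predicted suitability
    falls below the threshold (i.e., known sites the model misses)."""
    if not test_presence_scores:
        raise ValueError("need at least one test presence record")
    missed = sum(1 for score in test_presence_scores if score < threshold)
    return missed / len(test_presence_scores)

def summarize(per_species_values):
    """Mean and sample standard deviation across species, as in 'mean ± SD'."""
    return statistics.mean(per_species_values), statistics.stdev(per_species_values)
```

A low omission rate together with a high AUC indicates that the model both ranks sites well and rarely excludes places where the species is actually known to occur.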

A study on the use of a Business Intelligence system : the role of explanations (비즈니스 인텔리전스 시스템의 활용 방안에 관한 연구: 설명 기능을 중심으로)

  • Kwon, YoungOk
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.155-169 / 2014
  • With rapid advances in technology, organizations increasingly depend on information systems in their decision-making processes. Business Intelligence (BI) systems, in particular, have become a mainstay for dealing with complex problems in organizations, partly because a variety of advanced computational methods from statistics, machine learning, and artificial intelligence can be applied to business problems such as demand forecasting. Beyond analyzing past and present trends, these predictive analytics capabilities add great value to an organization's ability to respond to changes in markets, business risks, and customer trends. While the performance effects of BI system use in organizational settings have been studied, the use of predictive analytics technologies embedded in BI systems for forecasting tasks has received little attention. This study therefore aims to identify important factors that can help organizations take advantage of the advanced technologies of a BI system. More generally, a BI system can be viewed as an advisor, defined as one who formulates judgments or recommends alternatives and communicates them to the person in the role of the judge, and the information generated by the BI system can be viewed as advice that a decision maker (the judge) can follow. We therefore draw on findings from the advice-giving and advice-taking literature, focusing on the role of system explanations in users' advice taking. It has been shown that advice discounting can occur when the advisor's reasoning or the evidence justifying the advisor's decision is not available. However, most current BI systems merely provide a number, which may affect whether decision makers accept the advice and how they infer its quality.
In this study, we explore the following key factors that can influence users' advice taking in the setting of a BI system: explanations of how box-office grosses are predicted; the type of advisor, i.e., the system (a data mining technique) or human-based business advice mechanisms such as prediction markets (aggregated human advice) and human advisors (individual expert advice); users' evaluations of the provided advice; and individual differences among decision makers. Each subject performs the following four tasks by going through a series of display screens on a computer. First, given information about a movie, such as its director and genre, subjects are asked to predict the movie's opening-weekend box office. Second, in light of the information generated by an advisor, subjects are asked to adjust their original predictions if they wish to. Third, they are asked to evaluate the value of the given information (e.g., perceived usefulness, trust, and satisfaction). Lastly, a short survey is conducted to identify individual differences that may affect advice taking. The experimental results show that subjects are more likely to follow system-generated advice than human advice when the advice is provided with an explanation. When subjects, as system users, find the information provided by the system useful, they are also more likely to take the advice. In addition, individual differences affect advice taking. Subjects with more expertise regarding advisors, or who tend to agree with others, adjust their predictions to follow the advice. On the other hand, subjects with more knowledge of movies are less affected by the advice, and their final decisions stay close to their original predictions. Advances in the predictive analytics of BI systems show great potential to support increasingly complex business decisions.
This study shows how the design of a BI system can influence users' acceptance of system-generated advice, and the findings provide valuable insights into how to leverage the advanced predictive analytics of BI systems in an organization's forecasting practices.
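The experiment asks subjects for an initial box-office prediction, shows them advice, and records the adjusted prediction. A common way to quantify advice taking in the advice-taking literature is the weight of advice (WOA). The abstract does not state that this study used WOA, so the Python sketch below is offered only as an illustration of how such adjustments can be measured; the function name is illustrative.

```python
def weight_of_advice(initial, advice, final):
    """WOA = (final - initial) / (advice - initial).
    0 means the advice was ignored; 1 means it was fully adopted."""
    if advice == initial:
        raise ValueError("WOA is undefined when the advice equals the initial estimate")
    return (final - initial) / (advice - initial)
```

For example, a subject who first predicts 100 (in any box-office unit), receives advice of 200, and settles on 150 has a WOA of 0.5, meaning the subject moved halfway toward the advice. Values are sometimes truncated to [0, 1] in analyses, which this sketch does not do.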

Analysis of urine β2-microglobulin in pediatric renal disease (소아 신장질환에서 요 β2-microglobulin검사의 분석)

  • Kim, Dong Woon;Lim, In Seok
    • Clinical and Experimental Pediatrics / v.50 no.4 / pp.369-375 / 2007
  • Purpose: There have been numerous studies of urine β2-microglobulin (β2-M) in primary nephrotic syndrome and other glomerular diseases, but few in pediatric age groups. We therefore studied the relationship between renal function and the results of the urine β2-M tests conducted on pediatric patients at our hospital. Methods: A retrospective analysis was performed on 102 patients aged 0 to 4 with renal diseases presenting with symptoms such as hematuria, edema, and proteinuria, who were admitted to Chung-Ang Yongsan Hospital and underwent 24-hour urine collection and urine β2-M excretion testing between January 2003 and January 2006. Disease type was treated as the independent variable and the test results as the dependent variable, and differences in urine β2-M excretion among the renal disease groups were analyzed with Student's t-test. Results: Urine β2-M excretion levels of the 102 patients were as follows: 52 had primary nephrotic syndrome [MCNS (n=45, 72 ± 45 μg/g creatinine, μg/g-Cr), MPGN (n=3, 154 ± 415 μg/g-Cr), FSGS (n=4, 188 ± 46 μg/g-Cr)], 6 had APSGN (93 ± 404 μg/g-Cr), 7 had IgA nephropathy (3,414 ± 106 μg/g-Cr), 9 had APN (742 ± 160 μg/g-Cr), 16 had cystitis (179 ± 168 μg/g-Cr), and 12 had HSP nephritis (109 ± 898 μg/g-Cr). Levels in IgA nephropathy (P<0.05) and APN (P<0.05) were significantly higher than in the other renal diseases. Among the primary nephrotic syndromes, FSGS, which had higher β2-M test results, required a longer treatment period (P<0.01) than the groups with lower results, but no significant differences in Ccr, BUN, or Cr were observed. Conclusion: The IgA nephropathy and APN groups showed significantly higher β2-M excretion than the other groups.
Although the β2-M value is not appropriate as an indicator of general renal function and pathology, it appears to be useful for the differential diagnosis of UTI and for predicting the treatment period of patients with nephrotic syndrome.
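The group comparisons in this abstract use Student's t-test. A minimal Python sketch of the two-sample statistic with pooled variance follows; it computes only the t statistic and its degrees of freedom, and the function name is illustrative. If SciPy is available, `scipy.stats.ttest_ind(a, b)` (with its default `equal_var=True`) returns the same statistic together with a p-value.

```python
import math
import statistics

def students_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.
    Returns (t, degrees of freedom)."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 observations")
    va = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    vb = statistics.variance(group_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, na + nb - 2
```

The pooled-variance form assumes the groups share a common variance; with small groups whose reported SDs differ widely (such as those above), Welch's unequal-variance test (`equal_var=False` in SciPy) is often the safer choice.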

Observation of Ice Gradient in Cheonji, Baekdu Mountain Using Modified U-Net from Landsat -5/-7/-8 Images (Landsat 위성 영상으로부터 Modified U-Net을 이용한 백두산 천지 얼음변화도 관측)

  • Lee, Eu-Ru;Lee, Ha-Seong;Park, Sun-Cheon;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing / v.38 no.6_2 / pp.1691-1707 / 2022
  • Cheonji Lake, the caldera lake of Baekdu Mountain on the border between the Korean Peninsula and China, alternates seasonally between melting and freezing. There is a magma chamber beneath Cheonji, and variations in the magma chamber cause volcanic precursors such as changes in the temperature and water pressure of hot spring water. Consequently, Cheonji has an anomalous region where ice melts faster than elsewhere, freezes late even during the freezing period, and the water surface is warm. This anomalous area is a discharge region for hot spring water, and its ice gradient may be used to monitor volcanic activity. However, because of geographical, political, and spatial constraints, periodic observation of the anomalous regions of Cheonji is limited. In this study, the degree of ice change in the anomalous region was quantified using Landsat-5/-7/-8 optical satellite images and a Modified U-Net regression model. The visible and near-infrared (VNIR) bands of 83 Landsat images acquired between January 22, 1985 and December 8, 2020 that include the anomalous region were used. Using the relative spectral reflectance of water and ice in the VNIR bands, dedicated data were generated for quantitative monitoring of ice variability. To preserve as much information as possible from the visible and near-infrared bands, the ice gradient was estimated with a U-Net having two encoders, which achieved good prediction accuracy, with a root mean square error (RMSE) of 140 and a correlation coefficient of 0.9968. Because ice change can be observed with high precision from Landsat images using the Modified U-Net, this approach may in the future serve as one method of monitoring the volcanic activity of Baekdu Mountain, enabling a more detailed volcano monitoring system.
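The accuracy figures above (RMSE and correlation) compare predicted and reference ice-gradient values. A minimal Python sketch of both metrics follows, assuming the predicted and reference images have been flattened into equal-length lists of pixel values; the function names are illustrative, and the RMSE is expressed in whatever units the ice-gradient values use, which the abstract does not state.

```python
import math

def rmse(observed, predicted):
    """Root mean square error over paired (e.g., flattened pixel) values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between paired values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)
```

The two metrics are complementary: a correlation near 1 shows that the spatial pattern of ice change is reproduced, while RMSE shows how far individual predicted values are from the reference in absolute terms.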

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

  • Kim, Younghoon;Choi, HeungSik;Kim, SunWoong
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.135-149 / 2020
  • Artificial intelligence is changing the world, and the financial market is no exception. Robo-advisors are being actively developed to make up for the weaknesses of traditional asset allocation methods and to take over tasks that are difficult for those methods. A robo-advisor makes automated investment decisions with artificial intelligence algorithms and is used with various asset allocation models, such as the mean-variance model, the Black-Litterman model, and the risk parity model. The risk parity model is a typical risk-based asset allocation model that focuses on the volatility of assets. Because it structurally avoids investment risk, it offers stability in managing large funds and has been widely used in finance. XGBoost is a parallel tree-boosting method: an optimized gradient boosting model designed to be highly efficient and flexible. It not only handles billions of examples in memory-limited environments but also trains much faster than traditional boosting methods. It is frequently used in many fields of data analysis and has many advantages. In this study, we therefore propose a new asset allocation model that combines the risk parity model with the XGBoost machine learning model. The model uses XGBoost to predict the risk of assets and applies the predicted risk to covariance estimation. Because an optimized asset allocation model determines investment proportions from historical data, estimation errors arise between the estimation period and the actual investment period, and these estimation errors adversely affect the performance of the optimized portfolio. This study aims to improve the stability and portfolio performance of the model by predicting the volatility of the next investment period and thereby reducing the estimation errors of the optimized asset allocation model. In doing so, it narrows the gap between theory and practice and proposes a more advanced asset allocation model.
For the empirical test of the proposed model, we used Korean stock market price data for a total of 17 years, from 2003 to 2019. Specifically, the data sets comprise the energy, finance, IT, industrial, material, telecommunication, utility, consumer, health care, and staple sectors. Predictions were accumulated using a moving window of 1,000 in-sample and 20 out-of-sample observations, producing a total of 154 rebalancing back-test results. Portfolio performance was analyzed in terms of cumulative return, and the long test period yielded a large amount of sample data. Compared with the traditional risk parity model, the proposed model improved cumulative return and reduced estimation errors. The total cumulative return was 45.748%, about 5% higher than that of the risk parity model, and estimation errors were reduced in 9 of the 10 industry sectors. Reducing estimation errors increases the stability of the model and makes it easier to apply in practical investment. The experimental results thus show that portfolio performance improved as the estimation errors of the optimized asset allocation model were reduced. Many financial and asset allocation models are of limited use in practical investment because of a fundamental question: whether the past characteristics of assets will persist in a changing financial market. This study not only takes advantage of traditional asset allocation models but also addresses their limitations and increases stability by predicting asset risk with a recent algorithm. Various studies have examined parametric estimation methods for reducing estimation errors in portfolio optimization; we propose a new method that reduces estimation errors in an optimized asset allocation model using machine learning.
This study is therefore meaningful in that it proposes an advanced artificial intelligence asset allocation model for fast-developing financial markets.
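The abstract states that XGBoost-predicted asset risk is fed into covariance estimation before the risk parity allocation. The Python sketch below makes a hypothetical choice for that step: predicted volatilities are combined with a historical correlation matrix (Σ_ij = σ_i σ_j ρ_ij). The XGBoost forecasting step itself is not shown; the predicted volatilities are simply passed in. Equal-risk-contribution weights are found by cyclical coordinate descent on the convex objective ½ yᵀΣy − (1/n) Σ ln y_i, whose minimizer satisfies y_i (Σy)_i = 1/n for every asset. This is a standard risk parity formulation that may differ from the authors' solver, and all function names are illustrative.

```python
import math

def covariance_from_predicted_vols(pred_vols, corr):
    """Sigma_ij = sigma_i * sigma_j * rho_ij, combining (hypothetically)
    model-predicted volatilities with a historical correlation matrix."""
    n = len(pred_vols)
    return [[pred_vols[i] * pred_vols[j] * corr[i][j] for j in range(n)]
            for i in range(n)]

def risk_contributions(weights, cov):
    """Share of total portfolio variance from each asset: w_i (Sigma w)_i / (w' Sigma w)."""
    n = len(weights)
    sigma_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    total = sum(w * s for w, s in zip(weights, sigma_w))
    return [w * s / total for w, s in zip(weights, sigma_w)]

def equal_risk_contribution(cov, sweeps=200):
    """Cyclical coordinate descent on 0.5*y'Sigma y - sum(log y_i)/n,
    then normalization so the weights sum to 1."""
    n = len(cov)
    b = 1.0 / n                                          # equal risk budget per asset
    y = [1.0 / math.sqrt(cov[i][i]) for i in range(n)]   # start from inverse volatility
    for _ in range(sweeps):
        for i in range(n):
            c = sum(cov[i][j] * y[j] for j in range(n) if j != i)
            # Positive root of Sigma_ii*y_i^2 + c*y_i - b = 0 (first-order condition in y_i).
            y[i] = (-c + math.sqrt(c * c + 4.0 * cov[i][i] * b)) / (2.0 * cov[i][i])
    total = sum(y)
    return [v / total for v in y]
```

With uncorrelated assets the solution reduces to inverse-volatility weights (for example, volatilities of 0.1 and 0.2 give weights of 2/3 and 1/3). With correlated assets, the weights are adjusted so that each asset contributes the same share of portfolio variance, which can be checked with `risk_contributions`.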