• Title/Summary/Keyword: accuracy and categorical statistics (정확도 및 범주별 통계)

Search Result 7, Processing Time 0.025 seconds

Automatic Classification by Land Use Category of National Level LULUCF Sector using Deep Learning Model (딥러닝모델을 이용한 국가수준 LULUCF 분야 토지이용 범주별 자동화 분류)

  • Park, Jeong Mook;Sim, Woo Dam;Lee, Jung Soo
    • Korean Journal of Remote Sensing / v.35 no.6_2 / pp.1053-1065 / 2019
  • Land use statistics are important activity data for accurately calculating carbon absorption and emissions in the post-2020 period. To interpret imagery efficiently by land use category, this study applies forest aerial photography (FAP) to a deep learning model to classify images automatically by land use category, and calculates national-level statistics. The dataset (DS) used for deep learning is divided into a training dataset (training DS) and a test dataset (test DS) by extracting FAP images at the locations of the permanent sample plots of the national forest resource inventory. In the training DS, images are labeled according to the definitions of the land use categories and used to train and verify the deep learning model. During verification, the training accuracy of the model was highest at epoch 1,500, at about 89%. When the trained model was applied to the test DS, the classification accuracy of the image labels was about 90%. When the area of each category was estimated using a sampling method and compared with national statistics, the consistency was also very high, so the approach is judged to be adequate for producing activity data for the LULUCF sector of the national GHG (Greenhouse Gas) inventory report in the future.
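
The abstract above does not specify its sampling estimator. A minimal sketch, assuming the simplest case in which the area of a category is the share of sample plots labeled with that category times the total area, could look like this. The function name, the category labels, and the total area are hypothetical and are not taken from the paper.

```python
def estimate_area_by_category(labels, total_area):
    """Estimate the area of each category as (share of sample plots) x total area.

    Assumption: every sample plot represents an equal share of the total area.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("at least one sample plot is required")
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return {category: total_area * count / n for category, count in counts.items()}


# Hypothetical labels predicted for 10 sample plots; not data from the paper.
sample_labels = ["forest"] * 6 + ["cropland"] * 3 + ["settlement"]
areas = estimate_area_by_category(sample_labels, total_area=1000.0)
```

The estimated areas can then be compared category by category with published national statistics.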

Predicate-based Question Analysis for Korean Question-Answering System (질의응답 시스템을 위한 술어정보 기반 질의분석)

  • Kim, Won-Nam;Shin, Seung-Eun;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology / 2004.10d / pp.296-300 / 2004
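  • For a question-answering system to present an accurate answer, it needs to determine the type of answer the user is asking for. Common approaches to question analysis are based on interrogative-word information, rules, and statistical information. This paper proposes question analysis that uses predicate information. First, the upper-level answer type is determined from the interrogative-word information, and then the focus word is extracted using the predicate information and syntactic structure information of the question. A focus word is a word that serves as a clue for determining the answer type, and the extracted focus word determines one of 75 lower-level answer types. Prior to the experiment, 6 upper-level categories and 75 lower-level categories were defined by answer type, and the experiment used part of the training data and test data collected from the general Web. In the experiment, accuracy was 97.6% for the upper-level categories, 77.8% for the lower-level categories, and 92.5% for focus words.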


A Statistical Model for Decisions on Arrest Warrants (구속영장발부 여부에 관한 통계모형)

  • Kim, Jung-Hun;Lee, Na-Rae;Lee, Gye-Min
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1225-1234 / 2010
  • When most examining judges deny a request for an arrest warrant, they cite as the reason that there is no concern about flight or the destruction of evidence. Consequently, it has not been known which characteristics of a crime mainly affect the decision on an arrest warrant, and there has been significant dispute about the exact decision criteria. This paper classifies data on requests for arrest warrants for crimes committed in the jurisdiction of the Jinju Public Prosecutors' Office in 2006, 2007 and 2008 into 7 categories according to the characteristics of the crimes. For each category, we construct a statistical model of the decision on arrest warrants by applying a crosstabulation analysis in order to identify the characteristics of a crime that affect the decision.
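
A crosstabulation analysis is commonly paired with Pearson's chi-square test of independence, although the abstract does not say which test the authors used. The sketch below computes that statistic for a contingency table. The counts are invented for illustration and the function name is hypothetical.

```python
def chi_square_statistic(table):
    """Return Pearson's chi-square statistic and degrees of freedom for a table.

    Assumption: every row and column total is positive, so every expected
    count is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = crime characteristic present / absent,
# columns = warrant issued / denied. Not data from the paper.
table = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(table)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom suggests that the characteristic and the warrant decision are not independent.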

Vulnerability Assessment for Fine Particulate Matter (PM2.5) in the Schools of the Seoul Metropolitan Area, Korea: Part II - Vulnerability Assessment for PM2.5 in the Schools (인공지능을 이용한 수도권 학교 미세먼지 취약성 평가: Part II - 학교 미세먼지 범주화)

  • Son, Sanghun;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.37 no.6_2 / pp.1891-1900 / 2021
  • Fine particulate matter (FPM; diameter ≤ 2.5 ㎛) is frequently found in metropolitan areas due to activities associated with rapid urbanization and population growth. Many adolescents spend a substantial amount of time at school where, for various reasons, FPM generated outdoors may flow into indoor areas. The aims of this study were to estimate FPM concentrations and categorize types of FPM in schools. Meteorological and chemical variables as well as satellite-based aerosol optical depth were analyzed as input data in a random forest model, which applied 10-fold cross validation and a grid-search method, to estimate school FPM concentrations, with four statistical indicators used to evaluate accuracy. Loose and strict standards were established to categorize types of FPM in schools. Under the former classification scheme, FPM in most schools was classified as type 2 or 3, whereas under strict standards, school FPM was mostly classified as type 3 or 4.

Improvement and Validation of Convective Rainfall Rate Retrieved from Visible and Infrared Image Bands of the COMS Satellite (COMS 위성의 가시 및 적외 영상 채널로부터 복원된 대류운의 강우강도 향상과 검증)

  • Moon, Yun Seob;Lee, Kangyeol
    • Journal of the Korean earth science society / v.37 no.7 / pp.420-433 / 2016
  • The purpose of this study is to improve the calibration matrices of 2-D and 3-D convective rainfall rates (CRR) using the brightness temperature of the infrared 10.8 μm channel (IR), the difference in brightness temperature between the infrared 10.8 μm and water vapor 6.7 μm channels (IR-WV), and the normalized reflectance of the visible channel (VIS) from the COMS satellite, together with rainfall rates from weather radar for 75 rainy days from April 22, 2011 to October 22, 2011 in Korea. In particular, weather radar rainfall rate data for 24 rainy days in 2011 are used to validate the new 2-D and 3-D CRR calibration matrices suited to the Korean peninsula. The 2-D and 3-D calibration matrices provide the basic and maximum CRR values (mm/h) by multiplying the rain probability matrix, which is calculated from the number of rainy and non-rainy pixels in the associated 2-D (IR, IR-WV) and 3-D (IR, IR-WV, VIS) matrices, by the mean and maximum rainfall rate matrices, respectively, which are calculated by dividing the accumulated rainfall rate by the number of rainy pixels and by the product of the maximum rain rate for the calibration period by the number of rain occurrences. Finally, new 2-D and 3-D CRR calibration matrices are obtained experimentally from a regression analysis of both the basic and maximum rainfall rate matrices. As a result, compared with the existing matrices, the area with rainfall rates above 10 mm/h is enlarged in the new ones, and CRR appears in lower class ranges in the matrices of IR brightness temperature versus IR-WV brightness temperature difference. Accuracy and categorical statistics are computed for the CRR events that occurred during the given period. The mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) of the new 2-D and 3-D CRR calibrations were smaller than those of the existing ones, while the false alarm ratio decreased, the probability of detection increased slightly, and the critical success index improved. To take into account the strong rainfall rates in weather events such as thunderstorms and typhoons, a moisture correction factor is applied. This factor is defined as the product of the total precipitable water and the relative humidity (PW RH), a mean value between the surface and the 500 hPa level, obtained from a numerical model or the COMS retrieval data. In this study, when the IR cloud top brightness temperature is lower than 210 K and the relative humidity is greater than 40%, the moisture correction factor is empirically scaled from 1.0 to 2.0 based on PW RH values. Consequently, when this factor is applied to the new 2-D and 3-D CRR calibrations, the ME, MAE, and RMSE are smaller than those of the new calibrations without it.
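
The accuracy and categorical statistics named in the abstract above have standard definitions in forecast verification. The sketch below computes them for paired satellite-estimated and radar-observed rainfall rates. The rain/no-rain threshold and the sample values are assumptions for illustration and do not come from the paper.

```python
import math


def continuous_errors(estimated, observed):
    """Return mean error (ME), mean absolute error (MAE), and RMSE."""
    diffs = [e - o for e, o in zip(estimated, observed)]
    n = len(diffs)
    me = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return me, mae, rmse


def categorical_scores(estimated, observed, threshold):
    """Return POD, FAR, and CSI from a rain / no-rain contingency table.

    Assumption: at least one hit occurs, so no denominator is zero.
    """
    hits = misses = false_alarms = 0
    for e, o in zip(estimated, observed):
        est_rain = e >= threshold
        obs_rain = o >= threshold
        if est_rain and obs_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
    pod = hits / (hits + misses)                  # probability of detection
    far = false_alarms / (hits + false_alarms)    # false alarm ratio
    csi = hits / (hits + misses + false_alarms)   # critical success index
    return pod, far, csi


# Hypothetical rainfall rates in mm/h; not data from the paper.
crr = [0.0, 2.0, 5.0, 0.0, 1.0]
radar = [0.0, 1.0, 3.0, 2.0, 0.0]
me, mae, rmse = continuous_errors(crr, radar)
pod, far, csi = categorical_scores(crr, radar, threshold=0.5)
```

A better calibration lowers the magnitude of ME, MAE, and RMSE and the FAR, and raises the POD and CSI, which is the pattern the abstract reports for the new matrices.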

Drinking Pattern and Nonfatal Injuries of Adults in Korea (성인에서 AUDIT와 손상의 연관성)

  • Yoo, In-Sook;Choi, Eun-Mi;Kwon, Ho-Jang;Lee, Sang-Gyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.4 / pp.1690-1698 / 2012
  • As alcohol use is one of the most important risk factors for injuries, this study was intended to clarify and evaluate the relationship between drinking patterns and the incidence rates and specific characteristics of injuries in adult populations, using a widely accepted tool, the Alcohol Use Disorders Identification Test (a measure of chronic alcohol drinking behaviors, hereinafter the AUDIT), developed by the World Health Organization to help assess these behaviors more accurately and reliably. This study used data collected from the 2009 Korea National Health and Nutrition Examination Survey (KNHANES), in which 7,511 of 7,893 adult participants aged ≥19 years answered the questions about injuries; excluding 104 non-respondents, 6,258 participants in the questionnaire survey of drinking patterns were finally analyzed. The incidence rates and specific characteristics of injuries (i.e., body regions, types and mechanisms), as classified by the AUDIT categories, were assessed, and their relative risks were estimated using t-tests, ANOVA, and logistic regression. The SPSS 19.0 statistical package was used for the statistical analyses. The analyses indicate that the incidence rate of overall injuries was significantly higher in male respondents than in female respondents. The risk of alcohol use related injuries was 8.3 times higher in male respondents than in female respondents. Regarding educational background, high school graduates showed the highest rates in the AUDIT, with a significant difference from the other groups. The married group and the group of respondents with an estimated monthly income of KRW 2.01 to 3 million also showed the highest rates in the AUDIT compared to the other groups, with statistically significant differences. The incidence rate of injuries by body region, which increased significantly in problematic drinkers and those with alcohol dependence, was 0.0371 in the head/neck, and with respect to the AUDIT and the mechanisms of external causes of injuries, transport accidents ranked first, followed by slippage, others, crash and fall. Regarding the classified types of injuries, the result was statistically significant for others (e.g., laceration, contusion, addiction, or penetrating wound). In conclusion, the mechanisms of external causes of injuries as well as injuries attributed to alcohol use are very important, and a strategy is required to reduce such injuries by decreasing the frequency of drinking through motivation by professional counsellors.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rely on strict assumptions. These include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. Such strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising method for classification and regression analysis. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Moreover, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance. 
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class classification problems as SVM does on binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built because of a skewed boundary, which reduces the classification accuracy of the classifier. SVM ensemble learning is one machine learning method for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Boosting thus attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. 
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can consider the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is used in turn as the test set while the classifier trains on the other nine sets. That is, the cross-validated folds are tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of the classifiers over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
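
The abstract does not define its two accuracy measures precisely. A minimal sketch, assuming both are means taken over per-class accuracies (the share of each true class that is predicted correctly), shows why the geometric mean is more sensitive to a poorly predicted minority class. The function names and rating labels are hypothetical, and this is not the MGM-Boost algorithm itself.

```python
import math


def per_class_accuracy(y_true, y_pred):
    """Return, for each true class, the share of its instances predicted correctly."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            correct[t] = correct.get(t, 0) + 1
    return {c: correct.get(c, 0) / totals[c] for c in totals}


def arithmetic_mean_accuracy(y_true, y_pred):
    """Arithmetic mean of the per-class accuracies (assumed definition)."""
    values = list(per_class_accuracy(y_true, y_pred).values())
    return sum(values) / len(values)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of the per-class accuracies (assumed definition)."""
    values = list(per_class_accuracy(y_true, y_pred).values())
    return math.prod(values) ** (1 / len(values))


# Hypothetical imbalanced labels: 8 instances of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["B", "A"]   # one minority-class instance is misclassified
am = arithmetic_mean_accuracy(y_true, y_pred)
gm = geometric_mean_accuracy(y_true, y_pred)
```

Under this definition the geometric mean falls to zero whenever any single class is never predicted correctly, which may explain why the geometric mean-based figures in the abstract are much lower than the arithmetic ones.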