• Title/Summary/Keyword: Accuracy Statistics

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.139-156 / 2021
  • The main advantage of using a deep learning model for image classification is that it can consider the relationships between regions by extracting each region's features from the overall image. However, a CNN may be unsuitable for emotional image data that lacks distinct regional features. To address the difficulty of classifying emotion images, researchers propose new CNN-based architectures for emotion images every year. Studies on the relationship between color and human emotion have also been conducted, showing that different colors induce different emotions. Among deep learning studies, some have applied color information to image sentiment classification: using the image's color information in addition to the image itself improves the accuracy of classifying image emotions over training the model on the image alone. This study proposes two ways to increase accuracy by adjusting the result value after the model classifies an image's emotion; both modify the result value based on statistics over the picture's colors. Before training, the most widely distributed two-color combinations were found across all training data; at test time, the most widely distributed two-color combination was found for each test image, and the result values were corrected according to the color-combination distribution. The correction weights the model's output using expressions based on the log function and the exponential function. Emotion6, labeled with six emotions, and ArtPhoto, labeled with eight categories, were used as image data. DenseNet169, MnasNet, ResNet101, ResNet152, and VGG19 were used as CNN architectures, and performance was compared before and after applying the two-stage learning. Inspired by color psychology, which deals with the relationship between colors and emotions, we studied how to improve accuracy by modifying the result values based on color when building a model that classifies an image's sentiment. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. Using scikit-learn's clustering, the seven colors chiefly distributed in an image are found, and each extracted color's RGB coordinates are compared with those of the 16 reference colors; that is, each is converted to the closest reference color (a sketch of this step follows the abstract). If combinations of three or more colors were used, too many combinations would occur, scattering the distribution so that each combination has little influence on the result value; to avoid this, two-color combinations were used and weighted into the model. Before training, the most distributed color combinations were found for all training images, and the distribution of color combinations per class was stored as a Python dictionary for use during testing. During the test, the two-color combination most distributed in each test image is found; we then check how that combination was distributed in the training data and correct the result accordingly. We devised several equations to weight the model's result value based on the colors extracted as described above.
The data set was randomly split 80:20, and the model was verified using 20% of the data as a test set. The remaining 80% was split into five folds for 5-fold cross-validation, so the model was trained five times with different validation sets, and performance was finally checked on the previously held-out test set. Adam was used as the optimizer, with a learning rate of 0.01. Training ran for up to 20 epochs and was stopped if the validation loss did not decrease for five consecutive epochs; early stopping was set to restore the model with the best validation loss. Classification accuracy was better when the information extracted from color properties was used together with the CNN than when the CNN architecture was used alone.
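For readers who want to follow the color step concretely, here is a minimal Python sketch of the pipeline the abstract describes: cluster an image's pixels into seven colors with scikit-learn, snap each cluster center to the nearest of the 16 reference colors, and weight the classifier's output by the training-set distribution of the dominant two-color combination. The RGB values, function names, and the log-based weighting form are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

# 16 reference colors (RGB); the coordinate values are illustrative.
REFERENCE_COLORS = {
    "red": (255, 0, 0), "orange": (255, 165, 0), "yellow": (255, 255, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "indigo": (75, 0, 130),
    "purple": (128, 0, 128), "turquoise": (64, 224, 208), "pink": (255, 192, 203),
    "magenta": (255, 0, 255), "brown": (165, 42, 42), "gray": (128, 128, 128),
    "silver": (192, 192, 192), "gold": (255, 215, 0), "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def dominant_color_pair(image_rgb, n_clusters=7):
    """Cluster pixels into 7 colors, snap each center to the nearest
    reference color, and return the two most frequent reference colors."""
    pixels = image_rgb.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(pixels)
    names = list(REFERENCE_COLORS)
    palette = np.array([REFERENCE_COLORS[n] for n in names], dtype=float)
    counts = {}
    for center, size in zip(km.cluster_centers_, np.bincount(km.labels_)):
        nearest = names[np.argmin(np.linalg.norm(palette - center, axis=1))]
        counts[nearest] = counts.get(nearest, 0) + int(size)
    top_two = sorted(counts, key=counts.get, reverse=True)[:2]
    return tuple(sorted(top_two))  # canonical order, e.g. ("blue", "green")

def reweight(logits, pair, pair_class_freq):
    """Weight the model's class scores by how often this color pair occurred
    per class in training; the log1p form is one plausible choice, not the
    paper's exact equation."""
    freq = np.array([pair_class_freq.get((pair, c), 1) for c in range(len(logits))])
    return logits + np.log1p(freq / freq.sum())
```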

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with self-report questionnaires and wearable sensors, though very accurate, are cost-intensive and obtrusive when the wellbeing support system must run in real time. Recently, inferring the state of subjective wellbeing from unstructured data with conventional sentiment analysis has been proposed as an alternative that resolves the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which lowers measurement accuracy, and there is no sentiment word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from unstructured text on online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentiment words is prepared. SentiWordNet, the most widely used dictionary, was adopted; it contains about 100,000 nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text, and a survey was conducted to prepare a learning dataset that includes individuals' opinions and self-reported wellness levels such as stress and depression; the participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and structural characteristics of the text (e.g., the number of words used, simple statistics on the special characters used). These were used to adjust the level of a specific SWB factor. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate an individual's SWB from the text the individual wrote. The experimental results suggest that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improves estimation accuracy compared with conventional sentiment analysis methods incorporating SentiWordNet. Although literature on Korean sentiment analysis is available, such studies used only a limited set of sentiment words; because of the small number of words, many sentences are overlooked when estimating the level of sentiment. The proposed method, in contrast, can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a senti-word dictionary containing contextual polarity needs to be constructed alongside a common-sense dictionary such as SenticNet; these efforts will enrich and enlarge the application area of sentic computing. The study helps practitioners and managers of wellness services in that several characteristics of unstructured text have been identified for improving SWB measurement. Consistent with the literature, the results showed that gender and age affect the SWB state when individuals are exposed to an identical cue in online text.
In addition, the length of the textual response and usage pattern of special characters were found to indicate the individual's SWB. These imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
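As a concrete illustration of the SentiWordNet baseline this abstract builds on, the following minimal sketch scores a token list with NLTK's SentiWordNet interface; it shows only the generic polarity lookup, not the authors' SWB-specific contextual rules.

```python
# Requires: nltk.download("wordnet"); nltk.download("sentiwordnet")
from nltk.corpus import sentiwordnet as swn

def polarity(word, pos=None):
    """Average (positive - negative) score over a word's senti-synsets,
    ranging from -1.0 (extremely negative) to 1.0 (extremely positive)."""
    synsets = list(swn.senti_synsets(word, pos))
    if not synsets:
        return 0.0  # word not in the dictionary: treated as neutral
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

def text_polarity(tokens):
    """Mean polarity over the tokens that SentiWordNet actually covers."""
    scored = [s for s in (polarity(t) for t in tokens) if s != 0.0]
    return sum(scored) / len(scored) if scored else 0.0

print(text_polarity(["happy", "news", "terrible", "commute"]))
```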

Analysis of Land Cover Classification and Pattern Using Remote Sensing and Spatial Statistical Method - Focusing on the DMZ Region in Gangwon-Do - (원격탐사와 공간통계 기법을 이용한 토지피복 분류 및 패턴 분석 - 강원도 DMZ일원을 대상으로 -)

  • NA, Hyun-Sup;PARK, Jeong-Mook;LEE, Jung-Soo
    • Journal of the Korean Association of Geographic Information Studies / v.18 no.4 / pp.100-118 / 2015
  • This study established an object-based land-cover classification method using satellite images and identified the distributional patterns of land cover by category through spatial statistics. Object-based classification generated land cover classification maps from spectral information, texture information, and the combination of the two, and the optimal map was selected through accuracy assessment. To identify the spatial distribution pattern of each category, we analyzed and quantified hot spots. The optimal weights for object-based classification were scale 52, shape 0.4, color 0.6, compactness 0.5, and smoothness 0.5. The map combining spectral and texture information showed the best overall classification accuracy; in particular, for dry fields, protected cultivation, and bare land, accuracy increased by about 12 percent over using spectral information alone. By area ratio within the DMZ region, the categories ranked in the order forest, paddy fields, transportation facilities, grasslands, dry fields, bare land, buildings, water, and protected cultivation. In Yanggu, dry fields and transportation facilities occurred mainly north of the civilian control line, while dry fields in Cheorwon and forest and transportation facilities in Inje were concentrated south of the civilian control line. As for distributional patterns by category, the hot spots of the agriculture-related categories (paddy fields, dry fields, and protected cultivation) were concentrated in the plains of Yanggu and the basin areas of Cheorwon. Hot spot areas of bare land, water, buildings, and roads showed distribution patterns similar to the agriculture-related hot spots but different from those of forest and grasslands.
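The hot spot analysis described here is typically implemented with the Getis-Ord Gi* statistic; the sketch below shows one way to compute it in Python with PySAL's esda package on a toy grid. The library choice and grid setup are assumptions, since the abstract does not name its software.

```python
import numpy as np
from libpysal.weights import lat2W
from esda.getisord import G_Local

# Toy 20x20 grid of per-cell area ratios for one land-cover class.
rng = np.random.default_rng(0)
values = rng.random(400)

w = lat2W(20, 20)                   # rook-contiguity weights on the grid
gi = G_Local(values, w, star=True)  # Gi* includes the focal cell itself

hot = gi.Zs > 1.96                  # z-scores above ~1.96: 5%-level hot spots
print(f"{hot.sum()} hot-spot cells out of {len(values)}")
```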

Evaluation of the accuracy of two different surgical guides in dental implantology: stereolithography fabricated vs. positioning device fabricated surgical guides (제작방법에 따른 임플란트 수술 가이드의 정확성비교: stereolithography와 positioning device로 제작한 수술 가이드)

  • Kwon, Chang-Ryeol;Choi, Byung-Ho;Jeong, Seung-Mi;Joo, Sang-Dong
    • The Journal of Korean Academy of Prosthodontics / v.50 no.4 / pp.271-278 / 2012
  • Purpose: Implant surgical guides have recently been used for accurate and atraumatic operations. In this study, the accuracy of two types of surgical guides, fabricated with a positioning device and with stereolithography, was evaluated on four types of tooth-loss models. Materials and methods: Surgical guides were fabricated with stereolithography and with a positioning device, respectively. Implants were placed in 40 models using the two types of surgical guides. The fit of each surgical guide was evaluated by measuring the gap between the guide and the model, and accuracy was evaluated on fused pre- and post-surgical CT images. Results: The gap between the surgical guide and the model was 1.4 ± 0.3 mm for the stereolithography guide and 0.4 ± 0.3 mm for the positioning-device guide. The stereolithography guide showed a mesiodistal angular deviation of 3.9 ± 1.6°, a buccolingual angular deviation of 2.7 ± 1.5°, and a vertical deviation of 1.9 ± 0.9 mm, whereas the positioning device showed a mesiodistal angular deviation of 0.7 ± 0.3°, a buccolingual angular deviation of 0.3 ± 0.2°, and a vertical deviation of 0.4 ± 0.2 mm. The differences between the two groups were statistically significant (P<.05). Conclusion: Laboratory-fabricated surgical guides made with a positioning device allow more accurate implant placement in the dental clinic than stereolithography surgical guides.
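A minimal sketch of the kind of two-group comparison behind the reported "P<.05", using SciPy's Welch t-test on synthetic deviations drawn to match the reported means and standard deviations; the group sizes and the exact test used in the paper are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
stereolithography = rng.normal(1.9, 0.9, size=20)   # vertical deviation, mm
positioning_device = rng.normal(0.4, 0.2, size=20)  # vertical deviation, mm

t, p = stats.ttest_ind(stereolithography, positioning_device, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < .05 indicates a significant difference
```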

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.139-153 / 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Through successive economic crises, bankruptcies have increased and bankruptcy prediction models have become more and more important; corporate bankruptcy has therefore been regarded as one of the major topics of research in business management, and many studies are also in progress in industry. Previous studies attempted various methodologies to improve bankruptcy prediction accuracy and to resolve the overfitting problem, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), which are based on statistics. More recently, researchers have used machine learning methodologies such as the Support Vector Machine (SVM) and Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. With this shift, many bankruptcy models have been developed and performance has improved. In general, a company's financial and accounting information changes over time, as does the market situation, so predicting bankruptcy from information at a single point in time is difficult. Ignoring this time effect biases the results, yet dynamic models have not been studied much, and a static model may therefore be unsuitable for predicting bankruptcy; a dynamic model offers the possibility of improving bankruptcy prediction. In this paper, we propose the Recurrent Neural Network (RNN), a deep learning methodology that learns time-series data and is known to perform well. For the estimation of the bankruptcy prediction model and the comparison of forecasting performance, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016. To avoid the mistake of predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings, confirmed through KIND, a corporate stock information website. Variables were selected from previous papers: the first set consists of Z-score variables, which have become traditional in bankruptcy prediction, and the second is a dynamic variable set. We selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second. We created a model that reflects dynamic changes in time-series financial data, and by comparing the suggested model with existing bankruptcy prediction models, we found that it can help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected MDA, GLM (logistic regression), SVM, and ANN models as benchmarks. The experiment showed that the RNN's performance was better than the comparative models.
The accuracy of the RNN was high on both variable sets, and its Area Under the Curve (AUC) value was also high. In the hit-ratio table, the RNN's ratio of correctly predicting a distressed company as bankrupt was higher than that of the other comparative models. A limitation of this paper is that an overfitting problem occurs during RNN training, which we expect can be solved by selecting more training data and appropriate variables. From these results, we expect this research to contribute to the development of bankruptcy prediction by proposing a new dynamic model.
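As a concrete illustration, here is a minimal Keras sketch of an LSTM classifier over a short window of yearly financial ratios; the layer sizes, window length, and feature count are assumptions, since the abstract does not specify the architecture.

```python
import numpy as np
from tensorflow import keras

timesteps, n_features = 3, 5   # e.g., 3 years of 5 Z-score-style ratios

model = keras.Sequential([
    keras.layers.Input(shape=(timesteps, n_features)),
    keras.layers.LSTM(32),                        # recurrent layer over the years
    keras.layers.Dense(1, activation="sigmoid"),  # P(bankrupt)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])

# Toy data standing in for the two-year-lagged KIS Value series.
X = np.random.rand(466, timesteps, n_features)    # 240 normal + 226 bankrupt
y = np.concatenate([np.zeros(240), np.ones(226)])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```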

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.43-62 / 2019
  • At one time, the anomaly detection field relied on determining whether an abnormality existed based on statistics derived from the data. This was feasible because data used to be low-dimensional, so classical statistical methods worked effectively. However, as data characteristics have grown complex in the era of big data, it has become difficult to accurately analyze and predict data generated throughout industry in the conventional way. Supervised learning algorithms such as the SVM and Decision Tree were therefore adopted. However, supervised models predict test data accurately only when the class distribution is balanced, and most data generated in industry has imbalanced classes, so their predictions are not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by the class distribution, such as the autoencoder or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced by Schlegl et al. (2017), is a model that performs anomaly detection on medical images and is composed of convolutional neural networks. By contrast, research on anomaly detection for sequence data with generative adversarial networks is scarce compared with image data. Li et al. (2018) proposed a model using the LSTM, a type of recurrent neural network, to classify anomalies in numerical sequence data, but it was not applied to categorical sequence data, nor did it use the feature matching method of Salimans et al. (2016). This suggests that many studies remain to be tried on anomaly classification of sequence data with generative adversarial networks. To learn sequence data, the generative adversarial network is built from LSTMs: the generator is a 2-stacked LSTM with 32-dimensional and 64-dimensional hidden unit layers, and the discriminator uses a 64-dimensional hidden unit layer. In prior work on anomaly detection for sequence data, anomaly scores were derived from the entropy of the probabilities of the actual data; in this paper, as mentioned above, anomaly scores are derived using feature matching (sketched after this abstract). In addition, the latent-variable optimization process was designed with an LSTM to improve model performance. The modified generative adversarial model was more accurate than the autoencoder in all experiments in terms of precision, and approximately 7% higher in accuracy. In terms of robustness, generative adversarial networks also performed better than the autoencoder: because they learn the data distribution from real categorical sequence data, they are not swayed by a single normal data pattern, whereas the autoencoder is. In the robustness test, the accuracy of the autoencoder was 92% versus 96% for the generative adversarial network, and in terms of sensitivity, the autoencoder reached 40% versus 51% for the generative adversarial network.
Experiments were also conducted to show how much performance changes with differences in the latent-variable optimization structure; sensitivity improved by about 1%. These results offer a new perspective on optimizing latent variables, which had previously received relatively little attention.
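A minimal PyTorch sketch of the feature-matching anomaly score on an LSTM discriminator, in the spirit of Salimans et al. (2016): the anomaly score is the distance between the discriminator's intermediate features of a real sequence and of its generator reconstruction. The embedding size, hidden sizes, and how the reconstruction is obtained are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, n_tokens, emb=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(n_tokens, emb)   # categorical tokens -> vectors
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def features(self, x):
        """Intermediate features f(x): last hidden state of the LSTM."""
        _, (h, _) = self.lstm(self.emb(x))
        return h[-1]

    def forward(self, x):
        return self.head(self.features(x))

def anomaly_score(disc, real_seq, generated_seq):
    """Feature-matching score ||f(x) - f(G(z*))||; the reconstruction
    generated_seq is assumed to come from optimizing the latent z.
    A large distance suggests the sequence is anomalous."""
    with torch.no_grad():
        return torch.norm(disc.features(real_seq) - disc.features(generated_seq), dim=-1)
```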

Real-Time Video Quality Assessment of Video Communication Systems (비디오 통신 시스템의 실시간 비디오 품질 측정 방법)

  • Kim, Byoung-Yong;Lee, Seon-Oh;Jung, Kwang-Su;Sim, Dong-Gyu;Lee, Soo-Youn
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.3 / pp.75-88 / 2009
  • This paper presents a video quality assessment method based on the quality degradation factors of real-time multimedia streaming services, where degradation is caused by video source compression and by network conditions. To measure degradation due to compression, we propose a blocky metric in the image domain: the proposed boundary strength index is defined as the ratio of the variation between the two pixel values adjacent to an 8×8 block boundary to the average variation over several pixels adjacent to those two boundary pixels. Network performance deterioration, such as jitter and delay, can also produce unnatural image movement; to capture it, a temporal-jerkiness measure is computed from statistics of the luminance differences between consecutive frames and the play-time intervals between frames. The final Perceptual Video Quality Metric (PVQM) consolidates the blocking strength and the temporal jerkiness. To evaluate the proposed algorithm, its accuracy is compared with the Difference of Mean Opinion Score (DMOS) based on the human visual system.
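A minimal NumPy sketch of the boundary-strength idea: compare pixel differences across 8×8 block boundaries with the average differences elsewhere. The normalization and aggregation here are assumptions; the paper's index uses specifically the pixels adjacent to each boundary pair rather than all non-boundary pixels.

```python
import numpy as np

def blockiness(luma, block=8):
    """Mean ratio of across-boundary to off-boundary pixel differences along
    vertical 8x8 block boundaries; larger values indicate stronger blocking."""
    diffs = np.abs(np.diff(luma.astype(float), axis=1))    # horizontal gradients
    cols = np.arange(block - 1, luma.shape[1] - 1, block)  # boundary columns
    boundary = diffs[:, cols].mean()                       # across boundaries
    neighbor = np.delete(diffs, cols, axis=1).mean() + 1e-8
    return boundary / neighbor

frame = np.random.randint(0, 256, size=(64, 64))
print(blockiness(frame))  # ~1.0 for unblocked noise, larger for blocky frames
```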

A study on the classification systems of domestic security fields (국내 보안 분야의 분류 체계에 관한 연구)

  • Jeon, Jeong-Hoon
    • Journal of the Korea Society of Computer and Information / v.20 no.3 / pp.81-88 / 2015
  • Recently, the security field has emerged as an important issue worldwide as a variety of technologies such as cloud computing and the Internet of Things have appeared. In these circumstances, the domestic security field is divided into information security, physical security, and convergence security, and among these, convergence security attracts much attention from various industries. The classification system for the new convergence security field has become a very important criterion for calculating statistics, analyzing the status of industry sectors, and drawing road maps. However, in Korea, the related institutions each classify convergence security differently, so a systematic classification of the domestic security field is urgently needed owing to problems such as poor reliability of accuracy and poor compatibility of data. Therefore, this paper analyzes the characteristics of domestic security classification systems through case studies and proposes a newly improved classification system that allows classification entries to be added or deleted and is easily extended according to new technology trends. The proposed classification system is expected to serve as a basis for constructing a domestic security classification system in the future.

Analysis of Vitamin E in Agricultural Processed Foods in Korea (국내 농산가공식품의 비타민 E 함량 분석)

  • Park, Yeaji;Sung, Jeehye;Choi, Youngmin;Kim, Youngwha;Kim, Myunghee;Jeong, Heon Sang;Lee, Junsoo
    • Journal of the Korean Society of Food Science and Nutrition / v.45 no.5 / pp.771-777 / 2016
  • Accurate food composition data are essential for calculating the nutrient intake of a population based on its consumption statistics, but the Korean food composition database lacks reliable analytical data for tocopherols and tocotrienols. Therefore, this study was conducted to provide information on the vitamin E contents of agricultural processed foods in Korea. Tocopherols and tocotrienols were determined by the saponification extraction method followed by high-performance liquid chromatography, and analytical method validation parameters were calculated to ensure the method's validity. Samples were obtained in 2013 and 2014 from the Rural Development Administration and included 34 grains and grain products, 14 snacks, 25 fruits, 5 oils, and 11 sauces and spices. All vitamin E isomers were quantitated, and the results were expressed as α-tocopherol equivalents (α-TE). The α-TE values of grains and grain products, snacks, fruits, oils, and sauces and spices ranged from 0.03 to 17.53, 1.01 to 12.84, 0.01 to 1.52, 1.09 to 8.15, and 0.01 to 27.53 α-TE/100 g, respectively. Accuracy was close to 100% (n=3), and repeatability and reproducibility were 2.04% and 4.69%, respectively. Our study provides reliable data on the tocopherol and tocotrienol contents of agricultural processed foods in Korea.
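For reference, α-TE is a weighted sum of the measured isomer contents. The sketch below uses the commonly cited conversion factors from the food-science literature; whether this study applied exactly these factors is an assumption.

```python
# Commonly cited alpha-TE conversion factors (T = tocopherol, T3 = tocotrienol);
# assumed here, not stated in the abstract.
FACTORS = {"alpha_T": 1.0, "beta_T": 0.5, "gamma_T": 0.1, "delta_T": 0.01,
           "alpha_T3": 0.3, "beta_T3": 0.05}

def alpha_te(mg_per_100g):
    """alpha-tocopherol equivalents (mg alpha-TE/100 g) from isomer contents."""
    return sum(FACTORS.get(k, 0.0) * v for k, v in mg_per_100g.items())

print(alpha_te({"alpha_T": 1.2, "gamma_T": 3.0, "alpha_T3": 0.5}))  # -> 1.65
```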

Groundwater-use Estimation Method Based on Field Monitoring Data in South Korea (실측 자료에 기반한 우리나라 지하수의 용도별 이용량 추정 방법)

  • Kim, Ji-Wook;Jun, Hyung-Pil;Lee, Chan-Jin;Kim, Nam-Ju;Kim, Gyoo-Bum
    • The Journal of Engineering Geology / v.23 no.4 / pp.467-476 / 2013
  • With increasing interest in environmental issues and the quality of surface water becoming inadequate for water supply, the Korean government has launched a groundwater development policy to satisfy the demand for clean water. To drive this policy effectively, accurate estimates of sustainable groundwater yield and groundwater use are essential. In this study, groundwater use was monitored over several years at various locations in Korea (32 cities/counties in 5 provinces) to obtain accurate groundwater use data, and statistical analysis of the results was performed as a method for rationally estimating groundwater use. For groundwater used for living purposes, we classified the cities/counties into three regional types (urban, rural, and urban-rural complex) and divided the groundwater facilities into five types by use (domestic use, apartment housing, small-scale water supply, schools, and businesses). For agricultural use, we defined three regional types based on rainfall (average, below-average, and above-average rainfall) and six facility types (rice farming, dry-field farming, floriculture, livestock-cows, livestock-pigs, and livestock-chickens). Finally, we developed groundwater-use estimation equations for each region and use type, using cluster analysis and regression analysis of the monitoring data. The results will enhance the reliability of national groundwater statistics.
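A minimal sketch of the final step: fitting one regression equation per region and use type that estimates groundwater use from monitored covariates. The covariates and all numbers below are illustrative assumptions, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy monitoring records: [irrigated area (ha), annual rainfall (mm)] -> use (m^3/yr)
X = np.array([[1.2, 1100], [0.8, 1300], [2.5, 900], [1.9, 1000], [0.5, 1400]])
y = np.array([5200, 3100, 11800, 8600, 1900])

model = LinearRegression().fit(X, y)  # one equation per region/use-type cluster
print(model.predict([[1.5, 1200]]))   # estimated use for an unmonitored facility
```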