• Title/Summary/Keyword: Classification accuracy

A study on Digital Agriculture Data Curation Service Plan for Digital Agriculture

  • Lee, Hyunjo;Cho, Han-Jin;Chae, Cheol-Joo
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.171-177 / 2022
  • In this paper, we propose a service method that provides insight into multi-source agricultural data, clusters environmental factors to support time-series data analysis, and curates crop environmental factors. The proposed curation service consists of four steps: collection, preprocessing, storage, and analysis. First, in the collection step, the service system collects and organizes multi-source agricultural data using an OpenAPI-based web crawler. Second, in the preprocessing step, the system performs data smoothing to reduce measurement errors. Here, we adopt a smoothing method for each type of facility, taking into account the error rate associated with facility characteristics such as greenhouses and open fields. Third, in the storage step, an agricultural data integration schema and a Hadoop HDFS-based storage structure are proposed for large-scale agricultural data. Finally, in the analysis step, the service system performs DTW-based time series classification in consideration of the characteristics of agricultural digital data. DTW-based classification improves the accuracy of prediction results by reflecting the characteristics of time series data without any loss. As future work, we plan to implement the proposed service method and apply it to a smart farm greenhouse for testing and verification.
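
The analysis step above relies on DTW (dynamic time warping) for time series classification. The abstract does not say which classifier is built on top of the DTW distance, so the following is only a minimal sketch that assumes a 1-nearest-neighbor rule. The function names, labels, and temperature values are hypothetical and chosen for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_1nn(series, labeled_series):
    """Return the label of the reference series closest to `series` under DTW.

    Assumption: a 1-nearest-neighbor rule; the paper's classifier is not specified.
    """
    best_label, _ = min(
        ((label, dtw_distance(series, ref)) for ref, label in labeled_series),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical greenhouse temperature profiles (values invented for illustration).
training = [
    ([18, 19, 22, 26, 27, 24, 20], "warm_day"),
    ([12, 12, 13, 15, 15, 14, 12], "cool_day"),
]
# Same shape as the warm profile, but one sample longer and shifted in time.
query = [18, 18, 19, 22, 26, 27, 24, 20]
predicted = classify_1nn(query, training)
```

Because DTW allows one sample to align with several samples of the other series, the two sequences do not need equal length, which is why no resampling (and hence no loss of samples) is needed before comparison.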

Cyber attack group classification based on MITRE ATT&CK model (MITRE ATT&CK 모델을 이용한 사이버 공격 그룹 분류)

  • Choi, Chang-hee;Shin, Chan-ho;Shin, Sung-uk
    • Journal of Internet Computing and Services / v.23 no.6 / pp.1-13 / 2022
  • As the information and communication environment develops, the environment of military facilities is also developing remarkably. In proportion to this, cyber threats are also increasing; in particular, APT attacks, which are difficult to prevent with existing signature-based cyber defense systems, frequently target military and national infrastructure. Identifying the attack group is important for an appropriate response, but it is very difficult because cyber attacks are conducted in secret using methods such as anti-forensics. In the past, after an attack was detected, a security expert had to perform lengthy, high-level analysis of the large amount of collected evidence to get a clue about the attack group. To solve this problem, we propose an automation technique that can classify an attack group within a short time after detection. Compared to general cyber attacks, APT attacks are few in number, have little known data, and are designed to bypass signature-based cyber defense techniques. As an attack model, we use MITRE ATT&CK®, which models many parts of cyber attacks. We design an impact score that considers the versatility of attack techniques and propose a group similarity score based on it. Experimental results show that the proposed method classified attack groups with a Top-5 accuracy of 72.62%.
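
The abstract does not give the formulas for the impact score or the group similarity score, so the sketch below is not the authors' method. It assumes one plausible form: a technique used by many groups (a versatile technique) gets a lower weight, and similarity is the weighted share of observed techniques that appear in a group's profile. The group names and profiles are hypothetical. Top-k accuracy itself is standard: a case counts as a hit when the true group is among the k highest-ranked candidates.

```python
from collections import Counter


def impact_scores(group_techniques):
    """Assumed form: the more groups use a technique, the lower its score."""
    n_groups = len(group_techniques)
    usage = Counter(t for techs in group_techniques.values() for t in set(techs))
    return {t: 1.0 - (count - 1) / n_groups for t, count in usage.items()}


def similarity(observed, group_profile, scores):
    """Weighted share of observed techniques that appear in the group's profile."""
    observed = set(observed)
    total = sum(scores.get(t, 0.0) for t in observed)
    if total == 0:
        return 0.0
    shared = sum(scores[t] for t in observed & set(group_profile))
    return shared / total


def rank_groups(observed, group_techniques, scores):
    """Groups ordered from most to least similar to the observed techniques."""
    return sorted(
        group_techniques,
        key=lambda g: similarity(observed, group_techniques[g], scores),
        reverse=True,
    )


def top_k_accuracy(cases, group_techniques, scores, k=5):
    """Fraction of cases whose true group is among the k best-ranked groups."""
    hits = sum(
        1
        for observed, true_group in cases
        if true_group in rank_groups(observed, group_techniques, scores)[:k]
    )
    return hits / len(cases)


# Hypothetical group profiles keyed by ATT&CK technique IDs.
groups = {
    "GroupA": ["T1566", "T1059", "T1003"],
    "GroupB": ["T1566", "T1059", "T1105"],
    "GroupC": ["T1566", "T1027", "T1071"],
}
scores = impact_scores(groups)
cases = [
    (["T1566", "T1003"], "GroupA"),
    (["T1059", "T1105"], "GroupB"),
    (["T1566"], "GroupC"),  # only a technique shared by every group is observed
]
top1 = top_k_accuracy(cases, groups, scores, k=1)
top3 = top_k_accuracy(cases, groups, scores, k=3)
```

The third case shows why a Top-k metric suits this task: when only widely shared techniques are observed, several groups tie, and a short candidate list is more useful to an analyst than a single guess.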

A study on the selection of evapotranspiration observatory representative location in Chuncheon Dam basin (증발산량 관측 대표위치 선정에 관한 연구: 춘천댐 유역을 중심으로)

  • Park, Jaegon;Kim, Kiyoung;Lee, Yongjun;Hwag-Bo, Jong Gu
    • Journal of Korea Water Resources Association / v.55 no.11 / pp.979-989 / 2022
  • In hydrological surveys, observation at representative locations is essential due to temporal and spatial limitations and constraints. Considering the use and accuracy of hydrological data, there are still too few observatories available in any specific watershed. In addition, since there is virtually no standard for where evapotranspiration should be observed, this study proposes a method for determining the evapotranspiration observation location. To determine the location, grids are defined in consideration of the operating range of a flux tower using the eddy covariance measurement method, which is mainly used to measure evapotranspiration. The representative-location grids were calculated using factors affecting evapotranspiration and satellite evapotranspiration data, and were classified as good, fair, or poor. As a result, 54 grids were rated good. The grid classification is judged to reflect topography and land use. In particular, elevation and urban area showed large deviations, and the good grids were judged to form a group lying between the two distributions.

A Study on Deep Learning Model for Discrimination of Illegal Financial Advertisements on the Internet

  • Kil-Sang Yoo;Jin-Hee Jang;Seong-Ju Kim;Kwang-Yong Gim
    • Journal of the Korea Society of Computer and Information / v.28 no.8 / pp.21-30 / 2023
  • This study proposes a model that uses Python-based deep learning text classification techniques to determine whether financial advertising posts on the internet are illegal. Such posts promote unlawful financial activities, including the trading of bank accounts, credit card fraud, cashing out through mobile payments, and the sale of personal credit information. Despite the efforts of financial regulatory authorities, illegal financial activities remain prevalent. The proposed model is intended to help identify and detect illicit content in internet-based financial advertising, thus contributing to ongoing efforts to combat such activities. The study uses convolutional neural networks (CNN) and recurrent neural networks (RNN, LSTM, GRU), which are commonly used text classification techniques. The raw data for the model are based on manually confirmed regulatory judgments. By tuning the hyperparameters of the Korean natural language processing and deep learning models, the study obtained an optimized model with the best performance. This research is significant in that it presents a deep learning model for discerning illegal financial advertising on the internet, which has not been previously explored. In addition, with accuracy ranging from 91.3% to 93.4%, the model is expected to be practical for detecting illicit financial advertisements, ultimately contributing to their eradication.

3D Film Image Inspection Based on the Width of Optimized Height of Histogram (히스토그램의 최적 높이의 폭에 기반한 3차원 필름 영상 검사)

  • Jae-Eun Lee;Jong-Nam Kim
    • Journal of the Institute of Convergence Signal Processing / v.23 no.2 / pp.107-114 / 2022
  • In order to classify 3D film images as right or wrong, it is necessary to detect the pattern in a 3D film image. However, if the contrast of the pixels in the image is low, it is not easy to classify right and wrong 3D film images because the pattern might not be clear. In this paper, we propose a method of classifying 3D film images as right or wrong by obtaining the histogram of each image and comparing its width at a specific frequency. Since classification uses only the width of the histogram, the analysis process is not complicated. In the experiment, the histograms of right and wrong 3D film images were distinctly different; the proposed algorithm reflects these features and accurately classified all 3D film images at a specific frequency of the histogram. A comparison with other methods such as image subtraction, Otsu thresholding, Canny edge detection, morphological geodesic active contour, and support vector machines verified that the proposed algorithm performed best, and showed that excellent classification accuracy could be obtained without detecting the patterns in 3D film images.
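
As a rough illustration of measuring histogram width at a specific frequency, the sketch below builds an intensity histogram and measures the span of bins whose count reaches a chosen height. The abstract does not define the width precisely, nor which class has the wider histogram, so the span definition, the decision direction (wider means "right"), and the thresholds are all assumptions, and the pixel data are invented.

```python
def histogram(pixels, levels=256):
    """Count how many pixels fall on each integer intensity level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def width_at_height(counts, height):
    """Span in bins from the first to the last bin whose count reaches `height`.

    Assumption: empty bins between two peaks are included in the span.
    """
    bins = [i for i, c in enumerate(counts) if c >= height]
    return bins[-1] - bins[0] + 1 if bins else 0


def is_right_image(pixels, height, min_width):
    """Assumed decision rule: a histogram at least `min_width` bins wide is 'right'."""
    return width_at_height(histogram(pixels), height) >= min_width


# Invented grayscale data: one image with two well-separated intensity groups,
# one low-contrast image whose intensities sit close together.
high_contrast = [40] * 50 + [200] * 50
low_contrast = [120] * 50 + [125] * 50

wide = width_at_height(histogram(high_contrast), height=10)
narrow = width_at_height(histogram(low_contrast), height=10)
```

The appeal of such a rule is that it needs a single pass over the pixels and one comparison, with no pattern detection step.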

Abbreviation Disambiguation using Topic Modeling (토픽모델링을 이용한 약어 중의성 해소)

  • Woon-Kyo Lee;Ja-Hee Kim;Junki Yang
    • Journal of the Korea Society for Simulation / v.32 no.1 / pp.35-44 / 2023
  • Recently, many studies have analyzed trends or research trends using text analysis. When collecting documents for data analysis by searching with abbreviated keywords, it is necessary to disambiguate the abbreviations. In many studies, documents are classified manually, by reading them one by one to find the data needed for the study. Most studies on abbreviation disambiguation clarify the meaning of words and use supervised learning. These previous methods are not well suited to classifying documents retrieved by an abbreviation search to find research data, and related studies are also insufficient. This paper proposes a method of semi-automatically classifying documents collected by abbreviation by performing topic modeling with Non-Negative Matrix Factorization, an unsupervised learning method, in the data pre-processing step. To verify the proposed method, papers were collected from an academic DB with the abbreviation 'MSA'. The proposed method found 316 papers related to Micro Services Architecture among 1,401 papers. The document classification accuracy of the proposed method was measured at 92.36%. The proposed method is expected to reduce the time and cost researchers spend on manual work.
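
To make the topic-modeling step concrete, here is a small pure-Python sketch of Non-Negative Matrix Factorization using the standard Lee-Seung multiplicative updates for squared error. It factors a document-term matrix V into W (documents by topics) and H (topics by terms) and assigns each document to its strongest topic. The paper's actual tooling, vocabulary, and number of topics are not stated in the abstract; the tiny matrix, the term list, and the choice of two topics are hypothetical.

```python
import random


def matmul(a, b):
    """Matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (docs x terms) into w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


# Hypothetical term counts for four papers retrieved with the query 'MSA'.
# Columns: microservice, container, api, sequence, alignment, protein
v = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
]
w, h = nmf(v, k=2)
# Assign each document to the topic with the largest weight in its row of W.
doc_topics = [max(range(2), key=lambda a: row[a]) for row in w]
```

In a semi-automatic workflow like the one described, a researcher would inspect the top terms of each row of H to decide which topic corresponds to the intended meaning of the abbreviation, then keep the documents assigned to that topic.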

Prediction of Safety Grade of Bridges Using the Classification Models of Decision Tree and Random Forest (의사결정나무 및 랜덤포레스트 분류 모델을 이용한 교량 안전등급 예측)

  • Hong, Jisu;Jeon, Se-Jin
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.397-411 / 2023
  • The number of deteriorated bridges with a service period of more than 30 years has been rapidly increasing in Korea. Accordingly, advanced maintenance technologies that predict the age-induced deterioration degree, condition, and performance of bridges are receiving more and more attention. This study proposes a method for predicting the safety grade of bridges using Decision Tree and Random Forest classification models based on machine learning. These models were analyzed for 8,850 bridges located on national roads using various evaluation indexes such as the confusion matrix, balanced accuracy, recall, ROC curve, and AUC; the Random Forest largely showed better predictive performance than the Decision Tree. In particular, for the C and D grade bridges, which need more maintenance attention because of their significant deterioration, random under-sampling in the Random Forest showed higher predictive performance than other sampling techniques, with a recall of 83.4%. The proposed model can be usefully applied to rapidly identify the safety grade and to establish an efficient and economical maintenance plan for bridges that have not recently been inspected.
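
Two ideas in this abstract are easy to show in code: random under-sampling of the majority class before training, and the recall and balanced accuracy metrics used to judge performance on the rare C and D grades. The sketch below is a generic illustration using only the standard library; the grade labels and counts are invented, and it does not reproduce the paper's models or data.

```python
import random
from collections import defaultdict


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, n_min)]
    rng.shuffle(balanced)
    xs, ys = zip(*balanced)
    return list(xs), list(ys)


def recall_per_class(y_true, y_pred):
    """For each class, the fraction of its true members that were predicted correctly."""
    total, correct = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so rare classes weigh as much as common ones."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Invented example: 8 bridges in good condition ("AB") and 2 deteriorated ("CD").
y_true = ["AB"] * 8 + ["CD"] * 2
y_pred = ["AB"] * 8 + ["AB", "CD"]  # one deteriorated bridge is missed

recalls = recall_per_class(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)

# Balance a training set of 10 hypothetical bridge IDs before fitting a model.
train_ids, train_labels = random_under_sample(list(range(10)), y_true)
```

In this invented example plain accuracy would be 90% even though half of the deteriorated bridges are missed, which is why recall on the minority class and balanced accuracy are the more informative measures for imbalanced safety grades.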

A Deep Learning-based Depression Trend Analysis of Korean on Social Media (딥러닝 기반 소셜미디어 한글 텍스트 우울 경향 분석)

  • Park, Seojeong;Lee, Soobin;Kim, Woo Jung;Song, Min
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.91-117 / 2022
  • The number of patients with depression in Korea and around the world is rapidly increasing every year. However, most patients with mental illness are not aware that they are suffering from the disease, so they do not receive adequate treatment. If depressive symptoms are neglected, they can lead to suicide, anxiety, and other psychological problems. Therefore, early detection and treatment of depression are very important for improving mental health. To address this problem, this study presents a deep learning-based depression tendency model using Korean social media text. After collecting data from Naver Knowledge iN, Naver Blog, Hidoc, and Twitter, the DSM-5 diagnostic criteria for major depressive disorder were used to classify and annotate classes according to the number of depressive symptoms. Afterwards, TF-IDF analysis and co-occurrence word analysis were performed to examine the characteristics of each class of the constructed corpus. In addition, word embedding, dictionary-based sentiment analysis, and LDA topic modeling were performed to generate a depression tendency classification model using various text features. Through this, the embedded text, sentiment score, and topic number for each document were calculated and used as text features. As a result, the highest accuracy of 83.28% was achieved when depression tendency was classified with the KorBERT algorithm by combining both the sentiment score and the topic of the document with the embedded text. This study establishes a classification model for Korean depression tendency with improved performance using various text features and detects potentially depressed individuals early among Korean online community users, enabling rapid treatment and prevention. It is significant in that it can help promote the mental health of Korean society.
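
The TF-IDF analysis mentioned above can be sketched in a few lines. TF-IDF has several common variants; this sketch assumes relative term frequency multiplied by log(N / df), and it assumes the documents are already tokenized (Korean morphological analysis is out of scope here). The English tokens are invented stand-ins for the real corpus.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Invented, pre-tokenized posts standing in for the annotated corpus.
docs = [
    ["tired", "sad", "sleep"],
    ["sad", "alone", "sad"],
    ["happy", "sleep", "walk"],
]
weights = tf_idf(docs)
```

Terms that occur in every document receive a weight of zero under this variant, so the terms that remain highly weighted are the ones that distinguish one class of posts from another.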

A Study on the Measurement of Respiratory Rate Using Image Alignment and Statistical Pattern Classification (영상 정합 및 통계학적 패턴 분류를 이용한 호흡률 측정에 관한 연구)

  • Moon, Sujin;Lee, Eui Chul
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.10 / pp.63-70 / 2018
  • Image-based biomedical signal measurement technology has been developed, and research on technology for measuring respiration signals, which are vital to sustaining life, has been carried out continuously. Existing technology measures respiratory signals with a thermal imaging camera that measures heat emitted from a person's body. In addition, research has been conducted on measuring respiration rate by analyzing human chest movement in real time. However, with image processing based on infrared thermal images, the respiratory organ may be difficult to detect due to external environmental factors (temperature change, noise, etc.), and thus the accuracy of respiration rate measurement is low. In this study, images were acquired using visible light and infrared thermal cameras to enhance the area of the respiratory tract. Then, based on the two images, features of the respiratory tract region are extracted through processes such as face recognition and image matching. The pattern of the respiratory signal is classified with the k-nearest neighbor classifier, one of the statistical classification methods. The respiration rate was calculated according to the characteristics of the classified patterns, and the feasibility of respiration rate measurement was verified by comparing the measured respiration rate with the actual respiration rate.
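
The k-nearest neighbor step can be sketched as follows. The features extracted from the respiratory tract region are not listed in the abstract, so the two-dimensional feature vectors, the "inhale" and "exhale" labels, and the rule that counts one breath per inhale-to-exhale transition are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(query, training, k=3):
    """Majority label among the k training points nearest to `query` (Euclidean)."""
    neighbors = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def breaths_per_minute(frame_labels, seconds):
    """Assumed rule: each inhale-to-exhale transition counts as one breath."""
    cycles = sum(
        1
        for prev, cur in zip(frame_labels, frame_labels[1:])
        if prev == "inhale" and cur == "exhale"
    )
    return cycles * 60 / seconds


# Hypothetical feature vectors from the nostril region of aligned image pairs.
training = [
    ((0.9, 0.8), "exhale"),
    ((1.0, 0.7), "exhale"),
    ((0.8, 0.9), "exhale"),
    ((0.1, 0.2), "inhale"),
    ((0.2, 0.1), "inhale"),
    ((0.0, 0.3), "inhale"),
]
label_warm = knn_predict((0.85, 0.75), training)
label_cool = knn_predict((0.15, 0.20), training)

# Hypothetical per-frame classifications covering 12 seconds of video.
frames = ["inhale", "inhale", "exhale", "exhale", "inhale",
          "exhale", "exhale", "inhale", "inhale", "exhale"]
rate = breaths_per_minute(frames, seconds=12)
```

An odd k avoids ties in a two-class vote, which is one reason k = 3 is used as the default in this sketch.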

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.241-265 / 2023
  • A corporate insolvency prediction model serves as a vital tool for objectively monitoring the financial condition of companies. It enables timely warnings, facilitates responsive actions, and supports the formulation of effective management strategies to mitigate bankruptcy risks and enhance performance. Investors and financial institutions utilize default prediction models to minimize financial losses. As the interest in utilizing artificial intelligence (AI) technology for corporate insolvency prediction grows, extensive research has been conducted in this domain. However, there is an increasing demand for explainable AI models in corporate insolvency prediction, emphasizing interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained significant popularity and has demonstrated strong performance in various applications. Nonetheless, it has limitations such as computational cost, processing time, and scalability concerns that grow with the number of variables. This study introduces a novel approach to variable selection that reduces the number of variables by averaging SHAP values from bootstrapped data subsets instead of using the entire dataset. This technique aims to improve computational efficiency while maintaining excellent predictive performance. To obtain classification results, we aim to train random forest, XGBoost, and C5.0 models using carefully selected variables with high interpretability. The classification accuracy of the ensemble model, built through soft voting with the goal of high-performance model design, is compared with that of the individual models. The study leverages data from 1,698 Korean light industrial companies and employs bootstrapping to create distinct data groups. Logistic Regression is employed to calculate SHAP values for each data group, and their averages are computed to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
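
The variable-selection idea (compute SHAP values on bootstrapped groups with logistic regression, average them, keep the highest-ranked variables) can be sketched without external libraries. This is not the authors' implementation. It assumes the closed-form SHAP value of a linear model with independent features, phi_ij = w_j * (x_ij - mean_j) in log-odds units, and it uses a plain gradient-descent logistic regression. The synthetic data, function names, and hyperparameters are hypothetical.

```python
import math
import random


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def mean_abs_linear_shap(xs, w):
    """Mean |SHAP| per feature, assuming phi_ij = w_j * (x_ij - mean_j)."""
    n, d = len(xs), len(w)
    means = [sum(x[j] for x in xs) / n for j in range(d)]
    return [sum(abs(w[j] * (x[j] - means[j])) for x in xs) / n for j in range(d)]


def bootstrap_shap_importance(xs, ys, n_groups=5, seed=0):
    """Average the per-feature mean |SHAP| over several bootstrap resamples."""
    rng = random.Random(seed)
    totals = [0.0] * len(xs[0])
    for _ in range(n_groups):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample rows with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        w, _ = fit_logistic(bx, by)
        for j, value in enumerate(mean_abs_linear_shap(bx, w)):
            totals[j] += value
    return [t / n_groups for t in totals]


def select_top(importance, k):
    """Indices of the k features with the largest averaged importance."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]


# Synthetic data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(1)
xs = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(60)]
ys = [1 if x[0] > 0 else 0 for x in xs]

importance = bootstrap_shap_importance(xs, ys)
selected = select_top(importance, k=1)
```

The selected variables would then feed the downstream classifiers; in a soft-voting ensemble, each model's predicted class probabilities are averaged and the class with the highest mean probability is chosen.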