• Title/Summary/Keyword: supervised learning

A Method for Region-Specific Anomaly Detection on Patch-wise Segmented PA Chest Radiograph (PA 흉부 X-선 영상 패치 분할에 의한 지역 특수성 이상 탐지 방법)

  • Hyun-bin Kim;Jun-Chul Chun
    • Journal of Internet Computing and Services / v.24 no.1 / pp.49-59 / 2023
  • The COVID-19 pandemic has drawn attention to problems caused by unexpected shortages of medical personnel. In this paper, we present a method for detecting the presence or absence of lesion signs on PA chest X-ray images, as a computer vision solution to support diagnosis. Visual anomaly detection based on feature modeling can also be applied to X-ray images. By extracting feature vectors from PA chest X-ray images and dividing them into patch units, region-specific abnormalities can be detected. As a preliminary experiment, we created a simulation data set containing multiple objects and report the results of comparative experiments. We also present a way to improve both the efficiency and the performance of the process by hard masking of patch features on aligned images. By aggregating region-specific and global anomaly detection results, the method improves AUROC by 0.069 compared to our previous study.
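
The aggregation and the AUROC comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say how patch scores are reduced or combined, so the `aggregate` rule (maximum patch score plus the global score) is an assumption, and all scores and labels are made-up toy values, not results from the paper.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random anomalous sample scores
    higher than a random normal one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aggregate(patch_scores, global_score):
    """Hypothetical aggregation rule: highest patch score plus global score."""
    return max(patch_scores) + global_score


# Toy data (assumed): 1 = anomalous image, 0 = normal image.
labels = [0, 0, 1, 1]
global_scores = [0.2, 0.6, 0.5, 0.9]
patch_scores = [[0.1, 0.2], [0.1, 0.1], [0.1, 0.8], [0.3, 0.4]]

combined = [aggregate(p, g) for p, g in zip(patch_scores, global_scores)]
auroc_global = auroc(global_scores, labels)
auroc_combined = auroc(combined, labels)
```

In this toy case the second normal image has a high global score but no suspicious patch, so adding the region-specific term separates the two classes better than the global score alone.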

A Study on Classification Models for Predicting Bankruptcy Based on XAI (XAI 기반 기업부도예측 분류모델 연구)

  • Jihong Kim;Nammee Moon
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.333-340 / 2023
  • Efficient prediction of corporate bankruptcy is important for financial institutions to make appropriate lending decisions and reduce loan default rates. Many studies have used classification models based on artificial intelligence. In the financial industry, however, even a predictive model with excellent performance must be accompanied by an intuitive explanation of the basis on which its result was determined. Recently, the US, the EU, and South Korea have all put forward a right to request explanations of algorithms, so transparency in the use of AI in the financial sector must be secured. In this paper, we propose an interpretable AI-based classification model using publicly available corporate bankruptcy data. First, data preprocessing and 5-fold cross-validation were performed, and classification performance was compared after optimizing 10 supervised learning classification models, including logistic regression, SVM, XGBoost, and LightGBM. LightGBM was confirmed as the best-performing model, and SHAP, an explainable AI technique, was applied to provide a post-hoc explanation of the bankruptcy prediction process.
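
The model comparison under 5-fold cross-validation can be sketched in plain Python as follows. This is only an illustration of the evaluation loop: the two stand-in models (`fit_majority`, `fit_threshold`), the single feature, and the data are hypothetical and are not the classifiers or the bankruptcy data used in the paper.

```python
from statistics import mean


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy of the model built by `fit` over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)


def fit_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    majority = max(set(y), key=y.count)
    return lambda x: majority


def fit_threshold(X, y):
    """Toy model: threshold halfway between the two class means."""
    m0 = mean(x for x, label in zip(X, y) if label == 0)
    m1 = mean(x for x, label in zip(X, y) if label == 1)
    cut = (m0 + m1) / 2
    return lambda x: int(x > cut) if m1 > m0 else int(x < cut)


# Assumed toy data: one feature per company, 1 = bankrupt, 0 = healthy.
X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

results = {
    "majority": cross_val_accuracy(fit_majority, X, y),
    "threshold": cross_val_accuracy(fit_threshold, X, y),
}
best = max(results, key=results.get)
```

Each candidate is scored only on folds it was not trained on, and the model with the highest mean score is kept, which mirrors the selection step the abstract describes.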

Extracting mood metadata through sound effects of video (영상의 효과음을 통한 분위기 메타데이터 추출)

  • You, Yeon-Hwi;Park, Hyo-Gyeong;Yong, Sung-Jung;Lee, Seo-Young;Moon, Il-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.453-455 / 2022
  • Metadata is structured data that describes the attributes and features of other data. Video metadata, in particular, refers to data extracted from the information that makes up a video to enable accurate content-based search. As the number of video content users grows, the number of OTT providers is also increasing, and metadata is becoming more important for OTT providers that need to recommend large amounts of video content to individual users or support appropriate search. This paper studies a method for automatically extracting metadata for mood attributes from the sound effects of videos. To classify the sound effects of a video and generate metadata about mood attributes, we propose building a terminology dictionary for mood and extracting information through supervised learning.

A Research on the Audio Utilization Method for Generating Movie Genre Metadata (영화 장르 메타데이터 생성을 위한 오디오 활용 방법에 대한 연구)

  • Yong, Sung-Jung;Park, Hyo-Gyeong;You, Yeon-Hwi;Moon, Il-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.284-286 / 2021
  • With the continuous development of the Internet and digital technology, platforms are emerging that store large amounts of media data and provide customized services to individuals online. Companies that provide these services recommend movies suited to users' personal tastes to promote media consumption, and each company is researching various algorithms for recommending the media that users prefer. Movies are divided into genres such as action, melodrama, horror, and drama, and a film's audio (music, sound effects, voice) is an important production element. In this research, we extract audio from movie trailers for each genre, examine what the audio within each genre has in common, distinguish movie genres through supervised learning, and propose a way to use the results for generating metadata in the future.

A Research on Image Metadata Extraction through YCrCb Color Model Analysis for Media Hyper-personalization Recommendation (미디어 초개인화 추천을 위한 YCrCb 컬러 모델 분석을 통한 영상의 메타데이터 추출에 대한 연구)

  • Park, Hyo-Gyeong;Yong, Sung-Jung;You, Yeon-Hwi;Moon, Il-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.277-280 / 2021
  • Recently, as varied content is mass-produced and easily accessible, the media content market has become more active. Users want to find content that suits their taste, and platforms are competing to offer personalized content recommendations. An efficient recommendation system requires high-quality metadata. Existing platforms rely on users directly entering the metadata of a video, which wastes time and money when processing large amounts of data. In this paper, for media hyper-personalization recommendation, keyframes are extracted from movie trailers based on the YCrCb color model, and movie genres are distinguished through supervised learning. We also propose a plan for using the results to generate metadata in the future.

Predicting Crime Risky Area Using Machine Learning (머신러닝기반 범죄발생 위험지역 예측)

  • HEO, Sun-Young;KIM, Ju-Young;MOON, Tae-Heon
    • Journal of the Korean Association of Geographic Information Studies / v.21 no.4 / pp.64-80 / 2018
  • In Korea, citizens have access only to general information about crime, so it is difficult for them to know how much they are exposed to it. If the police could predict crime risk areas, they could cope with crime efficiently despite insufficient police and enforcement resources. However, Korea has no such prediction system, and related research is scarce. Against this background, the final goal of this study is to develop an automated crime prediction system. As a first step, we built a big data set consisting of real local crime information and urban physical and non-physical data. We then developed a crime prediction model using machine learning. Finally, we assumed several possible scenarios, calculated the probability of crime, and visualized the results on a map to aid public understanding. Among the factors affecting crime occurrence identified in previous research and case studies, the following data were processed into a big data set for machine learning: real crime information, weather information (temperature, rainfall, wind speed, humidity, sunshine, insolation, snowfall, cloud cover), and local information (average building coverage, average floor area ratio, average building height, number of buildings, average appraised land value, average area of residential buildings, average number of ground floors). Among supervised machine learning algorithms, the decision tree, random forest, and SVM models, which are known to be powerful and accurate in various fields, were used to construct the crime prevention model. The decision tree model, which had the lowest RMSE, was selected as the optimal prediction model. Based on this model, several scenarios were set for theft and violence, the most frequent crimes in case city J, and the probability of crime was estimated on a $250 \times 250$ m grid. We found that high crime risk areas occur in three patterns in case city J. The probability of crime was divided into three classes and visualized on a map using the same $250 \times 250$ m grid. In this way, we developed a crime prediction model using a machine learning algorithm and visualized crime risk areas on a map that can recalculate the model and update the visualization as time and urban conditions change.
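
Selecting the model with the lowest RMSE, as this abstract describes, amounts to the comparison sketched below. The observed values and the per-model predictions are invented for illustration and do not reproduce the paper's data or results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Assumed toy values: observed crime counts for four grid cells.
observed = [3, 1, 4, 2]

# Hypothetical predictions from three already-trained models.
predictions = {
    "decision_tree": [3, 1, 3, 2],
    "random_forest": [2, 2, 4, 1],
    "svm": [1, 1, 4, 4],
}

scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)
```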

A System for Automatic Classification of Traditional Culture Texts (전통문화 콘텐츠 표준체계를 활용한 자동 텍스트 분류 시스템)

  • Hur, YunA;Lee, DongYub;Kim, Kuekyeng;Yu, Wonhee;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.8 no.12 / pp.39-47 / 2017
  • The Internet has increased the number of digital web documents related to the history and traditions of Korean culture. However, users who search for creators or materials related to traditional culture often cannot find the information they want, and the results are insufficient. Document classification is required to access this information effectively. In the past, documents were classified manually, which is difficult and costs a great deal of time and money. Therefore, this paper develops an automatic text classification model for traditional cultural content, based on data from the Korean information culture field organized according to a systematic classification of traditional cultural content. The study applied a TF-IDF model, a Bag-of-Words model, and a combined TF-IDF/Bag-of-Words model to extract word frequencies from the 'Korea Traditional Culture' data, and developed the automatic text classification model using the Support Vector Machine classification algorithm.
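
The two feature representations named in this abstract can be sketched as follows. This is a minimal illustration assuming whitespace tokenization and the plain `tf * log(N / df)` weighting; the paper's tokenizer, weighting variant, and data are not given in the abstract, and the example documents are invented. The resulting vectors would then be passed to a classifier such as an SVM, which is not shown here.

```python
import math
from collections import Counter


def bag_of_words(doc):
    """Raw term counts for one document (whitespace tokenization assumed)."""
    return Counter(doc.lower().split())


def tf_idf(docs):
    """One {term: weight} dict per document, with weight = tf * log(N / df)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


# Hypothetical mini-corpus standing in for traditional culture texts.
docs = ["mask dance mask", "folk dance song", "folk painting"]
vectors = tf_idf(docs)
```

Here `mask` appears twice in the first document and in no other, so it receives a higher weight there than `dance`, which is shared with the second document.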

Artificial Intelligence Algorithms, Model-Based Social Data Collection and Content Exploration (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk;Leem, Choon Seong
    • The Journal of Bigdata / v.4 no.2 / pp.23-34 / 2019
  • Crimes that exploit digital platforms are continuously increasing: about 140,000 cases occurred in 2015 and about 150,000 in 2016. Traditional investigation techniques are therefore considered limited in handling such online crimes. The manual online searches and cognitive investigation methods that investigators widely use today are not enough to cope proactively with rapidly changing civil crimes. In addition, the nature of content posted to unspecified social media users makes investigations more difficult. Considering the characteristics of the online media where infringement crimes occur, this study suggests site-based collection and Open API collection among web content collection methods. Because illegal content is published and deleted quickly, and new words and variations appear rapidly and in many forms, it is difficult to recognize them quickly with morphological analysis based on manually registered dictionaries. To solve this problem, we propose adding tokenization with WPM (Word Piece Model) to the existing dictionary-based morphological analysis, as a data preprocessing method for quickly recognizing and responding to illegal content posted in online infringement crimes. In the data analysis, the optimal precision is verified with a vote-based ensemble method that uses supervised classification models for investigating illegal content. This study applies a classification algorithm model to illegal multilevel business cases in order to proactively recognize crimes that harm the public economy, and presents an empirical study on dealing effectively with social data collection and content investigation.
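
A vote-based ensemble of the kind mentioned in this abstract can be sketched with hard (majority) voting. The abstract does not specify the voting scheme or the base classifiers, so the three keyword rules below are hypothetical stand-ins for trained supervised models, and the example posts are invented.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def vote_ensemble(classifiers, sample):
    """Hard voting: each classifier casts one vote for the sample's label."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical stand-ins for trained classifiers (simple keyword rules).
def has_profit_claim(text):
    return "illegal" if "guaranteed profit" in text else "legal"


def has_recruiting(text):
    return "illegal" if "recruit" in text else "legal"


def has_joining_fee(text):
    return "illegal" if "joining fee" in text else "legal"


classifiers = [has_profit_claim, has_recruiting, has_joining_fee]

suspicious = vote_ensemble(classifiers, "guaranteed profit if you recruit two friends")
harmless = vote_ensemble(classifiers, "we recruit volunteers for the library")
```

With an odd number of binary classifiers there are no ties, and a single rule that fires by mistake (as `has_recruiting` does on the second post) is outvoted by the other two.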

A Detection Model using Labeling based on Inference and Unsupervised Learning Method (추론 및 비교사학습 기법 기반 레이블링을 적용한 탐지 모델)

  • Hong, Sung-Sam;Kim, Dong-Wook;Kim, Byungik;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.18 no.1 / pp.65-75 / 2017
  • A detection model is a model that finds results for a specific purpose using artificial intelligence, data mining, and intelligent algorithms. In cyber security, it is usually used to detect intrusions, malware, cyber incidents, and attacks. Large amounts of unlabeled data, such as security data, are collected in real environments. Because most of the data have no defined class labels, it is difficult to know the type of each record. A label determination process is therefore required for accurate detection and analysis. In this paper, we propose KDFL (K-means and D-S Fusion based Labeling), a method that uses D-S inference and the k-means (unsupervised) algorithm to decide the labels of data records by fusion, along with a detection model architecture that uses the proposed labeling method. The proposed method showed better detection rate, accuracy, and F1-measure than other methods. It also showed an improved error rate, which verifies the good performance of the proposed method.
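
The D-S (Dempster-Shafer) fusion step can be illustrated with Dempster's rule of combination for two evidence sources. This is a generic sketch of the rule, not the KDFL procedure itself: the abstract does not say how mass values are derived from k-means, so the two mass functions below are assumed values chosen for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions whose keys are frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
UNKNOWN = ATTACK | NORMAL  # the whole frame: no commitment either way

# Assumed evidence, e.g. one source from cluster membership, one from a rule.
source_1 = {ATTACK: 0.6, UNKNOWN: 0.4}
source_2 = {ATTACK: 0.5, NORMAL: 0.3, UNKNOWN: 0.2}

fused = dempster_combine(source_1, source_2)
label = max(fused, key=fused.get)
```

For these inputs the conflicting mass is 0.18, and after renormalization the `attack` hypothesis holds about 0.756 of the belief, so the record would be labeled as an attack.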

Abbreviation Disambiguation using Topic Modeling (토픽모델링을 이용한 약어 중의성 해소)

  • Woon-Kyo Lee;Ja-Hee Kim;Junki Yang
    • Journal of the Korea Society for Simulation / v.32 no.1 / pp.35-44 / 2023
  • Recently, many studies have used text analysis to analyze trends, including research trends. When documents for data analysis are collected by searching for abbreviated keywords, the abbreviations must be disambiguated. In many studies, documents are classified by hand, reading the data one by one to find what the study needs. Most abbreviation disambiguation studies aim to clarify word meanings and use supervised learning. Such methods are not suitable for classifying documents to find research data among abbreviation search results, and related studies are scarce. This paper proposes a method for semi-automatically classifying documents collected by abbreviation, by performing topic modeling with Non-Negative Matrix Factorization, an unsupervised learning method, in the data pre-processing step. To verify the proposed method, papers were collected from an academic database using the abbreviation 'MSA'. The proposed method found 316 papers related to Micro Services Architecture among 1,401 papers, with a measured document classification accuracy of 92.36%. The proposed method is expected to reduce the time and cost that researchers spend on manual work.