• Title/Summary/Keyword: Machine classification


Research on the Production of Risk Maps on Cut Slope Using Weather Information and Adaboost Model (기상정보와 Adaboost 모델을 이용한 깎기비탈면 위험도 지도 개발 연구)

  • Woo, Yonghoon;Kim, Seung-Hyun;Kim, Jin uk;Park, GwangHae
    • The Journal of Engineering Geology / v.30 no.4 / pp.663-671 / 2020
  • Recently, natural disasters have occurred frequently in Korea, not only in forest areas but also in urban areas, and national demand for countermeasures is increasing. In particular, there is no pre-disaster information system that can systematically manage cut-slope collapses along national highways. In this study, big data analysis was conducted on the factors causing slope collapse, based on the detailed investigation reports of slope collapses on national roads in the Gangwon-do and Gyeongsang-do areas managed by the Cut Slope Management System (CSMS) and on the basic survey of slope failures. Based on the analysis results, a slope collapse risk prediction model was built with AdaBoost, a classification-based machine learning model, reflecting collapsed slope locations and weather information. A visualization program that maps slope collapse risk was also developed, demonstrating that the risk of a slope under changing weather conditions can be identified and used for preemptive disaster prevention measures.
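
A minimal sketch of the kind of classifier the abstract describes, not the authors' pipeline: an AdaBoost model trained on synthetic stand-ins for slope and weather features (the feature names, thresholds, and data are placeholders).

```python
# Illustrative only: AdaBoost collapse/no-collapse classifier on synthetic data.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.random((500, 3))   # columns stand in for rainfall, slope angle, slope height
y = (0.6 * X[:, 0] + 0.4 * X[:, 1] + 0.1 * rng.random(500) > 0.55).astype(int)  # 1 = collapse

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))

# Per-slope probabilities (model.predict_proba) could then be joined with slope
# coordinates to render a collapse-risk map, as the study's visualization does.
```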

Data Analysis of Dropouts of University Students Using Topic Modeling (토픽모델링을 활용한 대학생의 중도탈락 데이터 분석)

  • Jeong, Do-Heon;Park, Ju-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.88-95 / 2021
  • This study aims to provide implications for establishing student support policies by empirically analyzing data on university student dropouts. To this end, data on students enrolled at D University after 2017 were sampled and collected. The collected data were analyzed with topic modeling (LDA: Latent Dirichlet Allocation), a probabilistic model based on text mining. The analysis identified topics characteristic of dropout students, and classification performance between groups based on those topics was also excellent. Based on these results, a specific educational support system was proposed to prevent university student dropout. This study is meaningful in that it demonstrates the use of text mining techniques in the education field and suggests an education policy grounded in data analysis.
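
A hedged sketch of LDA topic modeling as described above, using a few placeholder documents rather than the actual student records: fit topics, then inspect the top words per topic.

```python
# Illustrative LDA sketch only; documents and topic count are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "part time job tuition burden absence",
    "transfer to another major career change",
    "health issue leave of absence",
]

vec = CountVectorizer(max_features=5000)
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(X)          # per-document topic distributions

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```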

GAN System Using Noise for Image Generation (이미지 생성을 위해 노이즈를 이용한 GAN 시스템)

  • Bae, Sangjung;Kim, Mingyu;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.6 / pp.700-705 / 2020
  • Generative adversarial networks are methods of generating images by opposing two neural networks. When generating the image, randomly generated noise is rearranged to generate the image. The image generated by this method is not generated well depending on the noise, and it is difficult to generate a proper image when the number of pixels of the image is small In addition, the speed and size of data accumulation in data classification increases, and there are many difficulties in labeling them. In this paper, to solve this problem, we propose a technique to generate noise based on random noise using real data. Since the proposed system generates an image based on the existing image, it is confirmed that it is possible to generate a more natural image, and if it is used for learning, it shows a higher hit rate than the existing method using the hostile neural network respectively.
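
A conceptual sketch of the idea described above, not the authors' code: derive the generator input from a real image plus random noise instead of feeding pure random noise. Layer sizes and shapes are arbitrary assumptions.

```python
# Hedged sketch: data-conditioned noise for a GAN generator input.
import torch
import torch.nn as nn

class NoiseFromReal(nn.Module):
    """Encode a real image to a latent code and perturb it with Gaussian noise."""
    def __init__(self, latent_dim=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, latent_dim), nn.Tanh()
        )

    def forward(self, real_img, noise_scale=0.1):
        z = self.encoder(real_img)
        return z + noise_scale * torch.randn_like(z)   # noise anchored on real data

z = NoiseFromReal()(torch.randn(8, 1, 28, 28))   # stand-in for a batch of real images
print(z.shape)
# z would replace torch.randn(batch, latent_dim) as the generator input in an
# otherwise standard GAN training loop.
```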

A Study on Search Query Topics and Types using Topic Modeling and Principal Components Analysis (토픽모델링 및 주성분 분석 기반 검색 질의 유형 분류 연구)

  • Kang, Hyun-Ah;Lim, Heui-Seok
    • KIPS Transactions on Software and Data Engineering / v.10 no.6 / pp.223-234 / 2021
  • Recent advances of the 4th Industrial Revolution have accelerated the shift in shopping behavior from offline to online. Search queries express customers' information needs most directly in online shopping. However, there is little research on search queries, and most prior work has covered limited topics and relied on researchers' qualitative judgment rather than data. To address this, this study defines search query types with a data-driven, quantitative methodology: machine learning is applied to search queries and clicked-document information, and topic modeling is used to define 15 query topics. Furthermore, we present a new classification scheme of search query types that represents search behavior characteristics, derived by extracting and analyzing key variables through principal component analysis. The results of this study are expected to contribute to the establishment of effective search services and the development of search systems.
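
A minimal sketch of the two-stage idea, assuming placeholder query strings and arbitrary topic/component counts: LDA topic distributions for search queries, followed by PCA over those distributions to extract a small set of query-type axes.

```python
# Illustrative only: topic modeling + PCA on toy search queries.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA

queries = [
    "running shoes men", "trail running shoes", "iphone 13 case",
    "summer dress sale", "wireless earbuds discount",
]

X = CountVectorizer().fit_transform(queries)
topics = LatentDirichletAllocation(n_components=15, random_state=0).fit_transform(X)

pca = PCA(n_components=2)
scores = pca.fit_transform(topics)          # low-dimensional "query type" scores
print(pca.explained_variance_ratio_)
```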

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose to predict natural gas (NG) leakage levels through feature selection based on factor analysis (FA) of Korean Meteorological Agency data integrated with natural gas leakage data, so that complex factors are taken into account. The work is divided into three modules. First, missing values in the integrated data set are filled using linear interpolation, and essential features are selected by FA with OrdinalEncoder (OE)-based normalization. The dataset is then labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean squared error (MSE). The test results indicate that the OrdinalEncoder-factor analysis (OE-F)-based classification improves performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
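
A rough, hedged sketch of the described pipeline on synthetic stand-in data (not the actual KMA/gas dataset): linear interpolation of missing values, ordinal encoding, factor analysis, K-means labelling, then a KNN classifier. Column names and sizes are assumptions.

```python
# Illustrative pipeline sketch only; all data below is synthetic.
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(15, 8, 300),
    "humidity": rng.uniform(20, 90, 300),
    "gas_ppm": rng.gamma(2.0, 1.5, 300),
    "district": rng.choice(["A", "B", "C"], 300),    # categorical placeholder
})
df["district"] = OrdinalEncoder().fit_transform(df[["district"]]).ravel()
df.loc[rng.choice(300, 30, replace=False), "humidity"] = np.nan
df = df.interpolate(method="linear")                 # fill gaps linearly

X = FactorAnalysis(n_components=2, random_state=0).fit_transform(df)
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)   # leakage levels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, knn.predict(X_te)))
```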

Stiffness Enhancement of Piecewise Integrated Composite Beam using 3D Training Data Set (3차원 학습 데이터를 이용한 PIC 보의 강성 향상에 대한 연구)

  • Ji, Seungmin;Ham, Seok Woo;Choi, Jin Kyung;Cheon, Seong S.
    • Composites Research / v.34 no.6 / pp.394-399 / 2021
  • Piecewise Integrated Composite (PIC) is a new concept for designing composite structures in which stacking angles vary both in the in-plane direction and through the thickness in order to improve stiffness and strength. In the present study, a PIC beam was designed from 3D training data rather than 2D data, which captures only a limited range of beam behavior, and showed enhanced stiffness with reduced tip deformation. Training data were collected from designated reference finite elements, and a preliminary FE analysis was conducted over regularly distributed reference elements. Triaxiality values for each element were also obtained in order to categorize its loading state, i.e., tensile, compressive, or shear. The main FE analysis was then conducted to predict the mechanical characteristics of the PIC beam.
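
Purely for illustration of the triaxiality-based categorization mentioned above; the cut-off values below are assumptions, not the authors' thresholds.

```python
# Hypothetical thresholds for labelling an element's loading state.
def loading_state(triaxiality: float) -> str:
    """Label an element as tension-, compression-, or shear-dominated."""
    if triaxiality > 1.0 / 3.0:      # assumed cut-off
        return "tension"
    if triaxiality < -1.0 / 3.0:     # assumed cut-off
        return "compression"
    return "shear"

print(loading_state(0.5), loading_state(-0.6), loading_state(0.05))
```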

An Artificial Neural Network-Based Drug Proarrhythmia Assessment Using Electrophysiological Characteristics of Cardiomyocytes (심근 세포의 전기생리학적 특징을 이용한 인공 신경망 기반 약물의 심장독성 평가)

  • Yoo, Yedam;Jeong, Da Un;Marcellinus, Aroli;Lim, Ki Moo
    • Journal of Biomedical Engineering Research / v.42 no.6 / pp.287-294 / 2021
  • Cardiotoxicity assessment of all drugs has been performed according to the ICH guidelines since 2005. The non-clinical evaluation S7B has focused on the hERG assay, which suffers from low specificity. The comprehensive in vitro proarrhythmia assay (CiPA) project was initiated to correct this problem, presenting a model that classifies the Torsade de pointes (TdP)-induced risk of drugs using biomarkers calculated with an in silico ventricular model. In this study, we propose an artificial neural network (ANN)-based TdP-induced risk group classifier. The model was trained with 12 drugs and tested with 16 drugs. ANN models were built with nine, seven, and five features as inputs; the best-performing model was selected and compared with the classification performance of a logistic regression model using qNet as input. With the five-feature model, the results were an AUC of 0.93 for the high-risk group, 0.73 for the intermediate-risk group, and 0.92 for the low-risk group. The performance of the qNet-based model was lower than that of the ANN model by 17.6% for the high-risk group and by 29.5% for the low-risk group. This study demonstrated performance across the three risk groups and presents a model that addresses the low specificity of the hERG assay.
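
A small stand-in, not the authors' architecture: a fully connected classifier assigning drugs to low/intermediate/high TdP risk from five placeholder in-silico biomarkers, with data shapes mirroring the 12-train / 16-test split mentioned above.

```python
# Illustrative MLP sketch; features and labels are random placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.random((12, 5))        # 12 training drugs x 5 biomarker features
y_train = rng.integers(0, 3, 12)     # 0 = low, 1 = intermediate, 2 = high risk
X_test = rng.random((16, 5))         # 16 test drugs

scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

print(clf.predict_proba(scaler.transform(X_test)))   # per-drug risk-group probabilities
```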

Stock News Dataset Quality Assessment by Evaluating the Data Distribution and the Sentiment Prediction

  • Alasmari, Eman;Hamdy, Mohamed;Alyoubi, Khaled H.;Alotaibi, Fahd Saleh
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.1-8 / 2022
  • This work provides a reliable, labeled stock dataset merged with Saudi stock news. The dataset allows researchers to analyze and better understand the realities, impacts, and relationships between stock news and stock fluctuations. The data were collected from the Saudi stock market via the Corporate News (CN) and Historical Data Stocks (HDS) datasets. As their names suggest, CN contains news, and HDS provides information about how stock values change over time. Both datasets cover the period from 2011 to 2019, have 30,098 rows, and have 16 variables, four of which they share and 12 of which differ. The combined dataset therefore includes 30,098 published news pieces together with information about stock fluctuations across nine years. Stock news polarity has been interpreted in various ways by native Arabic speakers familiar with the stock domain, so the polarity was categorized manually based on Arabic semantics. As the Saudi stock market contributes substantially to the international economy, this dataset is valuable for stock investors and analysts. It was prepared for educational and scientific purposes, motivated by the scarcity of data describing the impact of Saudi stock news on stock activities, and will be useful across many sectors, including stock market analytics, data mining, statistics, machine learning, and deep learning. The data are evaluated by testing the distribution of the categories and by sentiment prediction accuracy. The results show that the distribution of polarity over sectors is balanced. A Naive Bayes (NB) model was developed to evaluate data quality through sentiment classification, supporting the data's reliability with 68% accuracy. The evaluation thus indicates that the dataset is reliable, ready for use, and of high quality.
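
A hedged sketch of the Naive Bayes quality check described above, using TF-IDF features; the example headlines and polarity labels are placeholders, not rows from the dataset.

```python
# Illustrative NB sentiment classifier on placeholder headlines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

news = [
    "company reports record quarterly profit",
    "regulator fines listed firm over disclosure",
    "board approves dividend increase",
    "shares fall after weak earnings guidance",
]
polarity = ["positive", "negative", "positive", "negative"]   # placeholder labels

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(news, polarity)
print(clf.predict(["profit rises on strong demand"]))
```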

Pedestrian and Vehicle Distance Estimation Based on Hard Parameter Sharing (하드 파라미터 쉐어링 기반의 보행자 및 운송 수단 거리 추정)

  • Seo, Ji-Won;Cha, Eui-Young
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.3 / pp.389-395 / 2022
  • Thanks to improvements in deep learning techniques, computer vision tasks such as classification, detection, and segmentation are now widely used in many fields. Autonomous driving in particular is one of the major fields applying computer vision systems, and there is much work on combining multiple tasks in a single network. In this study, we propose a network that predicts the individual depth of pedestrians and vehicles. The proposed model is built on YOLOv3 for object detection and Monodepth for depth estimation, and it performs object detection and depth estimation together using an encoder and decoders based on hard parameter sharing. An attention module is also used to improve the accuracy of both object detection and depth estimation. Depth is predicted from a monocular image and trained with a self-supervised training method.
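
A conceptual hard-parameter-sharing sketch, not the YOLOv3/Monodepth model itself: one shared encoder feeds two task-specific heads, so detection and depth estimation reuse the same backbone parameters. Layer sizes and channel counts are arbitrary assumptions.

```python
# Illustrative multi-task network with a shared (hard-shared) encoder.
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(              # shared backbone
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(32, 5, 1)        # e.g. objectness + box terms
        self.depth_head = nn.Conv2d(32, 1, 1)      # per-pixel depth

    def forward(self, x):
        feats = self.encoder(x)
        return self.det_head(feats), self.depth_head(feats)

det, depth = SharedEncoderMultiTask()(torch.randn(1, 3, 64, 64))
print(det.shape, depth.shape)
```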

Thread Block Scheduling for GPGPU based on Fine-Grained Resource Utilization (상세 자원 이용률에 기반한 병렬 가속기용 스레드 블록 스케줄링)

  • Bahn, Hyokyung;Cho, Kyungwoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.49-54 / 2022
  • With the recent widespread adoption of general-purpose GPUs (GPGPUs) in cloud systems, maximizing resource utilization through multitasking on GPGPUs has become an important issue. In this article, we show that resource allocation based on classifying workloads as compute-bound or memory-bound is not sufficient with respect to resource utilization, and present a new thread block scheduling policy for GPGPUs that makes use of the fine-grained resource utilization of each workload. Unlike previous approaches, the proposed policy reduces scheduling overhead by separating profiling from scheduling, and maximizes resource utilization by co-locating workloads with different bottleneck resources. Through simulations under various virtual machine scenarios, we show that the proposed policy improves GPGPU throughput by 130.6% on average and up to 161.4%.
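
A toy illustration of the co-location idea, not the paper's simulator or actual policy: pair workloads whose fine-grained utilization vectors are complementary, so one kernel's idle resource is another's bottleneck. The kernels, resource categories, and numbers are made up.

```python
# Hypothetical per-kernel utilization profiles (fractions of peak).
workloads = {
    "kernelA": {"compute": 0.9, "mem_bw": 0.2},
    "kernelB": {"compute": 0.3, "mem_bw": 0.8},
    "kernelC": {"compute": 0.7, "mem_bw": 0.6},
    "kernelD": {"compute": 0.2, "mem_bw": 0.9},
}

def combined_pressure(a, b):
    """Max over resources of the summed utilization; lower means less contention."""
    return max(a["compute"] + b["compute"], a["mem_bw"] + b["mem_bw"])

names = list(workloads)
pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
best = min(pairs, key=lambda p: combined_pressure(workloads[p[0]], workloads[p[1]]))
print("co-locate:", best)
```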