• Title/Summary/Keyword: Semi-Supervised learning


Ethereum Phishing Scam Detection based on Graph Embedding and Semi-Supervised Learning (그래프 임베딩 및 준지도 기반의 이더리움 피싱 스캠 탐지)

  • Yoo-Young Cheong;Gyoung-Tae Kim;Dong-Hyuk Im
    • KIPS Transactions on Computer and Communication Systems / v.12 no.5 / pp.165-170 / 2023
  • With the recent rise of blockchain technology, cryptocurrency platforms built on it are increasing, and currency transactions are being actively conducted. However, crimes that abuse the characteristics of cryptocurrency are also increasing. In particular, phishing scams account for more than half of Ethereum cybercrime and are considered a major security threat, so effective phishing scam detection methods are urgently needed. It is difficult, however, to obtain sufficient data for supervised learning because of data imbalance: few of the account addresses participating in Ethereum are labeled as phishing addresses. To address this, this paper proposes a phishing scam detection method that combines Trans2vec, an effective graph embedding technique that takes the Ethereum transaction network into account, with the semi-supervised learning model Tri-training, so as to make the most of both labeled and unlabeled data.
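
The abstract names Tri-training but does not describe it. The general idea is that three learners are bootstrapped from the labeled set, and each one is enlarged with unlabeled points on which the other two agree. A minimal sketch of that idea follows. It assumes the Trans2vec embeddings are already computed (toy 2-D vectors stand in for them) and uses a nearest-centroid base learner. It also omits the error-rate conditions of the full Tri-training algorithm. All function names and data are hypothetical.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Nearest-centroid base learner: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_centroid(model, x):
    # Class whose centroid is nearest in squared Euclidean distance.
    return min(model, key=lambda c: sum((m - v) ** 2 for m, v in zip(model[c], x)))

def tri_train(X_lab, y_lab, X_unlab, rounds=5, seed=0):
    """Simplified Tri-training: fixed number of rounds, no error-rate checks."""
    rng = random.Random(seed)
    n, classes = len(X_lab), set(y_lab)
    base = []
    for _ in range(3):
        # Bootstrap sample of the labeled data; resample until every class appears.
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            if {y_lab[i] for i in idx} == classes:
                break
        base.append(([X_lab[i] for i in idx], [y_lab[i] for i in idx]))
    models = [fit_centroids(X, y) for X, y in base]
    for _ in range(rounds):
        updated = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            X_i, y_i = list(base[i][0]), list(base[i][1])
            for x in X_unlab:
                # Pseudo-label x for learner i when the other two learners agree on it.
                label = predict_centroid(models[j], x)
                if label == predict_centroid(models[k], x):
                    X_i.append(x)
                    y_i.append(label)
            updated.append(fit_centroids(X_i, y_i))
        models = updated
    return models

def tri_predict(models, x):
    # Final prediction is a majority vote of the three learners.
    return Counter(predict_centroid(m, x) for m in models).most_common(1)[0][0]

# Toy embeddings: 1 = phishing address, 0 = normal address (hypothetical labels).
X_lab = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.5, 0.5], [4.5, 5.5], [1.0, 0.0], [6.0, 5.0]]
models = tri_train(X_lab, y_lab, X_unlab)
print(tri_predict(models, [5.2, 5.1]))  # 1
```

In the original algorithm, each learner's pseudo-labeled set is accepted only when the estimated error of the other two learners shrinks, and training stops when no learner changes. Both checks are left out here to keep the agreement mechanism visible.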

A Study on Identification of Track Irregularity of High Speed Railway Track Using an SVM (SVM을 이용한 고속철도 궤도틀림 식별에 관한 연구)

  • Kim, Ki-Dong;Hwang, Soon-Hyun
    • Journal of Industrial Technology / v.33 no.A / pp.31-39 / 2013
  • There are two methods for identifying deterioration of high-speed railway track. In the first, an administrator checks each attribute value of the track induction data, represented as a graph, and determines whether maintenance is needed. In the second, an administrator checks the monthly trend of the attribute values for the corresponding section and determines whether maintenance is needed. Both methods share a weakness: decisions take longer as the amount of track induction data grows. Within artificial intelligence, having a computer identify track deterioration automatically is a machine learning problem. Machine learning algorithms are classified into four types: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. This research uses supervised learning, which infers a separating function from training data. The proposed method uses an SVM classifier, a representative supervised learning method that is highly efficient for binary classification problems. It captures the difference between the two groups of data to identify deterioration of high-speed railway track.
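
An SVM separates the two groups with the hyperplane that maximizes the margin between them. The sketch below trains a linear soft-margin SVM with Pegasos-style stochastic subgradient descent on the hinge loss. The two-feature toy data are assumptions and not the paper's setup: they stand for hypothetical track measurements, labeled -1 for normal sections and +1 for deteriorated ones. The function names are also illustrative.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.
    Labels must be -1 or +1. A constant 1.0 feature is appended so the last
    weight acts as a (regularized) bias term."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink toward zero (gradient of the L2 regularizer)...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ...and step along the hinge-loss subgradient if the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical per-section measurements: -1 = normal, +1 = deteriorated.
X = [[0.5, 0.3], [0.8, 0.6], [0.4, 0.9], [3.5, 2.8], [4.0, 3.6], [3.2, 4.1]]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print(predict(w, [0.6, 0.5]), predict(w, [3.8, 3.9]))  # -1 1
```

Folding the bias into the weight vector keeps the update rule short, at the cost of regularizing the bias too. In practice, a library SVM solver, possibly with a kernel, would normally be used instead.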


Issues and Empirical Results for Improving Text Classification

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Computing Science and Engineering / v.5 no.2 / pp.150-160 / 2011
  • Automatic text classification has a long history, and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Even though much technical progress has been made, there is still room for improvement in text classification. This paper discusses the remaining issues in improving text classification. Three improvement issues are presented: automatic training data generation, noisy data treatment, and term weighting and indexing. Four studies addressing these issues, together with their empirical results, are introduced. First, a semi-supervised learning technique is applied to text classification to create training data efficiently. Second, for effective noisy data treatment, a noisy data reduction method and a text classifier that is robust to noisy data are developed. Finally, the term weighting and indexing technique is revised so that the importance of sentences, estimated with summarization techniques, is reflected in the term weight calculation.
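
The abstract does not give the exact weighting formula. One hypothetical way to reflect sentence importance in term weights is to boost each term's frequency by the importance of the sentence it appears in. Here, importance is scored by overlap with the title, a common summarization heuristic. The sketch below assumes that scheme and plain whitespace tokenization. Both function names are illustrative.

```python
from collections import Counter

def sentence_importance(sentence_terms, title_terms):
    """Hypothetical importance: share of the sentence's distinct terms that also occur in the title."""
    distinct = set(sentence_terms)
    if not distinct:
        return 0.0
    return len(distinct & set(title_terms)) / len(distinct)

def sentence_weighted_tf(title, sentences):
    """Term weight = sum over sentences of tf(term, sentence) * (1 + importance(sentence))."""
    title_terms = title.lower().split()
    weights = Counter()
    for sentence in sentences:
        terms = sentence.lower().split()
        boost = 1.0 + sentence_importance(terms, title_terms)
        for term, tf in Counter(terms).items():
            weights[term] += tf * boost
    return weights

weights = sentence_weighted_tf("stock market crash",
                               ["the stock market fell", "the weather was sunny"])
print(weights["stock"], weights["weather"])  # 1.5 1.0
```

Terms in sentences that resemble the title gain weight relative to terms in off-topic sentences. An IDF factor could still be multiplied in afterward, as in ordinary TF-IDF.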

A Branch-and-Bound Algorithm for Finding an Optimal Solution of Transductive Support Vector Machines (Transductive SVM을 위한 분지-한계 알고리즘)

  • Park Chan-Kyoo
    • Journal of the Korean Operations Research and Management Science Society / v.31 no.2 / pp.69-85 / 2006
  • Transductive Support Vector Machine (TSVM) is one of the semi-supervised learning algorithms that exploit the domain structure of the whole data set by considering labeled and unlabeled data together. Although it was proposed several years ago, no efficient algorithm has been able to handle problems with more than a few hundred training examples. In this paper, we propose an efficient branch-and-bound algorithm that can solve large-scale TSVM problems with thousands of training examples. The proposed algorithm uses two bounding techniques: the min-cut bound and the reduced SVM bound. The min-cut bound is derived from a capacitated graph whose cuts represent a lower bound on the optimal objective function value of the dual problem. The reduced SVM bound is obtained by constructing the SVM problem with only the labeled data. Experimental results show that the accuracy rate of TSVM can be significantly improved by learning from the optimal solution of TSVM rather than from an approximate solution.
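
For reference, the standard TSVM primal problem treats the labels $y^*_j$ of the $u$ unlabeled examples as integer variables. This is what makes the problem combinatorial and a natural fit for branch-and-bound. The notation below is ours, not the paper's: $\ell$ labeled examples and penalty parameters $C$ and $C^*$. Some formulations also add a class-balance constraint, which is omitted here.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^*,\, y^*_1,\dots,y^*_u}\quad
  & \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{\ell} \xi_i + C^* \sum_{j=1}^{u} \xi^*_j \\
\text{s.t.}\quad
  & y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,\ell \\
  & y^*_j\,(\mathbf{w}^\top \mathbf{x}^*_j + b) \ge 1 - \xi^*_j, \quad \xi^*_j \ge 0, \quad y^*_j \in \{-1, +1\}, \quad j = 1,\dots,u
\end{aligned}
```

Every term $C^*\xi^*_j$ is nonnegative. Dropping the unlabeled terms and constraints therefore leaves the ordinary SVM on the labeled data, whose optimal value can never exceed the TSVM optimum. This is why an SVM built from labeled data alone yields a valid lower bound.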

Stock Trading Model using Portfolio Optimization and Forecasting Stock Price Movement (포트폴리오 최적화와 주가예측을 이용한 투자 모형)

  • Park, Kanghee;Shin, Hyunjung
    • Journal of Korean Institute of Industrial Engineers / v.39 no.6 / pp.535-545 / 2013
  • The goal of stock investment is to earn a high rate of return with stability. Accomplishing this goal requires two things: a portfolio that allocates investment to stocks with high rates of return and low variability, and a highly accurate stock price prediction model. In this paper, three methods are suggested to meet these requirements. First, for portfolio rebalancing, the Max-Return and Min-Risk (MRMR) model is suggested to earn the largest rate of return with stability. Second, the Entering/Leaving Rule (E/L) is suggested to update the portfolio when a particular stock's rate of return is low. Finally, for accurate stock price prediction, a model based on Semi-Supervised Learning (SSL) suggested in previous research was applied. The suggested methods were validated on stocks listed in the KOSPI200 from January 2007 to August 2008.
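
The abstract does not define the E/L rule precisely. The sketch below shows one hypothetical reading. A held stock whose rate of return falls below a threshold leaves. The best-performing stocks not yet held then enter in its place, provided they clear the same threshold. The function name, threshold, tickers, and returns are all made up for illustration.

```python
def enter_leave(portfolio, returns, threshold):
    """Hypothetical E/L rule.
    portfolio: list of held tickers; returns: dict ticker -> recent rate of return
    (covering both held and candidate stocks)."""
    # Leaving: drop held stocks whose rate of return is below the threshold.
    kept = [s for s in portfolio if returns[s] >= threshold]
    n_left = len(portfolio) - len(kept)
    # Entering: best-returning candidates not already held that clear the threshold.
    candidates = sorted(
        (s for s in returns if s not in portfolio and returns[s] >= threshold),
        key=lambda s: returns[s],
        reverse=True,
    )
    return kept + candidates[:n_left]

returns = {"A": 0.05, "B": -0.02, "C": 0.01, "D": 0.04, "E": 0.03}
print(enter_leave(["A", "B", "C"], returns, threshold=0.0))  # ['A', 'C', 'D']
```

In the paper's setting, the weights of the updated portfolio would then be re-optimized (by MRMR). The sketch only decides which stocks are held.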

Deep learning-based post-disaster building inspection with channel-wise attention and semi-supervised learning

  • Wen Tang;Tarutal Ghosh Mondal;Rih-Teng Wu;Abhishek Subedi;Mohammad R. Jahanshahi
    • Smart Structures and Systems / v.31 no.4 / pp.365-381 / 2023
  • Existing vision-based techniques for inspection and condition assessment of civil infrastructure are mostly manual and consequently time-consuming, expensive, subjective, and risky. As a viable alternative, researchers have turned to deep learning-based autonomous damage detection algorithms for expedited post-disaster reconnaissance of structures. Although a number of automatic damage detection algorithms have been proposed, the scarcity of labeled training data remains a major concern. To address this issue, this study proposes a semi-supervised learning (SSL) framework based on consistency regularization and cross-supervision. Image data from post-earthquake reconnaissance containing cracks, spalling, and exposed rebars are used to evaluate the proposed solution. Experiments are carried out under different data partition protocols. They show that the proposed SSL method can use unlabeled images to enhance segmentation performance when only a limited amount of ground truth labels is provided. This study also proposes DeepLab-AASPP and modified versions of U-Net++ based on a channel-wise attention mechanism, to better segment components and damage areas in images of reinforced concrete buildings. The channel-wise attention mechanism can effectively improve network performance by dynamically scaling the feature maps, so that the network can focus on the more informative feature maps in the concatenation layer. The proposed DeepLab-AASPP achieves the best performance on the component segmentation and damage state segmentation tasks, with mIoU scores of 0.9850 and 0.7032, respectively. For the crack, spalling, and rebar segmentation tasks, the modified U-Net++ obtains the best performance, with IoU scores (excluding background pixels) of 0.5449, 0.9375, and 0.5018, respectively. The proposed architectures won second place in the IC-SHM2021 competition in all five tasks of Project 2.
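
The description of channel-wise attention, which rescales feature maps so the network emphasizes informative ones, matches a squeeze-and-excitation style block. The sketch below assumes that form. Each channel is reduced by global average pooling, passed through a small two-layer bottleneck ending in a sigmoid, and rescaled by the resulting weight. The weights here are fixed by hand, whereas a real network learns them. The paper's exact module may also differ, and the function name is illustrative.

```python
import math

def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed form).
    feature_maps: C channels, each an H x W list of lists.
    w1: r x C bottleneck weights; w2: C x r expansion weights."""
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC + ReLU, then FC + sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value of channel c by its weight scales[c].
    scaled = [[[v * s for v in r] for r in ch] for ch, s in zip(feature_maps, scales)]
    return scaled, scales

maps = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
scaled, scales = channel_attention(maps, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
print([round(s, 3) for s in scales])  # [0.881, 0.119]
```

Channels with an attention weight near 1 pass through almost unchanged, while channels with a weight near 0 are suppressed before they reach the concatenation layer.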

Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

  • William Xiu Shun Wong;Donghoon Lee;Namgyu Kim
    • Asia pacific journal of information systems / v.29 no.4 / pp.789-816 / 2019
  • Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new algorithm, we attempt to modify how the data are used. Classifier performance is usually affected by the quality of the training data, because the classifier is built from these data. We assume that data from different domains might have different noise characteristics, which can be exploited when learning the classifier. Therefore, we attempt to enhance the robustness of the classifier, and thereby improve classification accuracy, by artificially injecting heterogeneous data into the learning process. A semi-supervised approach was applied to utilize the heterogeneous data when learning the document classifier. However, the unlabeled data might degrade the performance of the document classifier. Therefore, we further propose an algorithm that extracts only the documents that contribute to improving the classifier's accuracy.
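
Keeping only the unlabeled documents that actually help can be sketched as a self-training loop with an acceptance test on a held-out validation set. This is a hypothetical reading of the abstract. It assumes documents are already vectorized (toy 2-D vectors here) and uses a nearest-centroid classifier. A pseudo-labeled document is accepted only if validation accuracy strictly improves. All names are illustrative.

```python
from collections import Counter

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_centroid(model, x):
    return min(model, key=lambda c: sum((m - v) ** 2 for m, v in zip(model[c], x)))

def accuracy(model, X, y):
    return sum(predict_centroid(model, x) == t for x, t in zip(X, y)) / len(y)

def selective_self_training(X_lab, y_lab, X_unlab, X_val, y_val):
    X_train, y_train = list(X_lab), list(y_lab)
    model = fit_centroids(X_train, y_train)
    best = accuracy(model, X_val, y_val)
    accepted = []
    for x in X_unlab:  # e.g. documents from another (heterogeneous) domain
        pseudo = predict_centroid(model, x)  # pseudo-label from the current model
        candidate = fit_centroids(X_train + [x], y_train + [pseudo])
        score = accuracy(candidate, X_val, y_val)
        if score > best:  # keep x only if it improves validation accuracy
            X_train.append(x)
            y_train.append(pseudo)
            model, best = candidate, score
            accepted.append(x)
    return model, accepted

model, accepted = selective_self_training(
    X_lab=[[0.0, 0.0], [4.0, 4.0]], y_lab=[0, 1],
    X_unlab=[[1.9, 1.9], [3.9, 3.9]],
    X_val=[[0.5, 0.5], [2.2, 2.2], [3.5, 3.5]], y_val=[0, 0, 1],
)
print(accepted)  # [[1.9, 1.9]]
```

Each candidate document costs one retraining and one validation pass, so a practical version would likely evaluate documents in batches.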

Oil Price Forecasting Based on Machine Learning Techniques (기계학습기법에 기반한 국제 유가 예측 모델)

  • Park, Kang-Hee;Hou, Tianya;Shin, Hyun-Jung
    • Journal of Korean Institute of Industrial Engineers / v.37 no.1 / pp.64-73 / 2011
  • Oil price prediction is an important issue for government regulators and related industries. Prediction with time series techniques, however, is difficult and challenging. The behavior of the oil price series is dominated by irregular external factors that cannot be explained quantitatively, e.g., supply- or demand-side shocks, political conflicts specific to events in the Middle East, and direct or indirect influences from other global economic indices. Identifying and quantifying the relationships between the oil price and those external factors may provide more relevant predictions than attempting to uncover the underlying structure of the series itself. Technically, this implies that the prediction should be based on vector data describing the degrees of these relationships, rather than on the series data. This paper proposes a novel time series prediction method using Semi-Supervised Learning, which was originally designed only for vector data. First, several time series of oil prices and other economic indices are transformed into multidimensional vectors using various types of technical indicators and diverse combinations of indicator-specific hyper-parameters. Then, to avoid the curse of dimensionality and redundancy among the dimensions, the well-known feature extraction techniques PCA and NLPCA are employed. With the extracted features, a timepoint-specific similarity matrix of oil prices and other economic indices is built. Finally, Semi-Supervised Learning generates a one-timepoint-ahead prediction. The crude oil price series of West Texas Intermediate (WTI) was used to verify the proposed method, and the experiments showed promising results, with an average AUC of 0.86.
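
The last step, prediction from a similarity matrix, can be illustrated with a common graph-based SSL formulation that solves $(I + \mu L)f = y$. Here $L = D - W$ is the graph Laplacian of the similarity matrix $W$. Past timepoints carry labels $+1$ (price up) or $-1$ (price down), and the timepoint to predict carries 0. This is a sketch under assumptions: the Gaussian similarity, $\mu$, and the up/down encoding are illustrative choices. The PCA/NLPCA feature extraction is assumed to have been done already.

```python
import math

def gaussian_similarity(X, sigma=1.0):
    """Dense similarity matrix W with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def propagate(W, y, mu=1.0, iters=500):
    """Solve (I + mu * L) f = y with L = D - W by Jacobi iteration.
    Row i reads f_i = (y_i + mu * sum_j w_ij f_j) / (1 + mu * d_i)."""
    n = len(y)
    deg = [sum(row) for row in W]
    f = list(y)
    for _ in range(iters):
        f = [(y[i] + mu * sum(W[i][j] * f[j] for j in range(n))) / (1 + mu * deg[i])
             for i in range(n)]
    return f

# Toy 1-D feature vectors per timepoint; the last timepoint is unlabeled.
X = [[0.0], [0.1], [1.0], [1.1], [0.05]]
y = [1.0, 1.0, -1.0, -1.0, 0.0]
f = propagate(gaussian_similarity(X, sigma=0.3), y)
print("up" if f[-1] > 0 else "down")  # up
```

Jacobi iteration is used only to keep the sketch dependency-free. $I + \mu L$ is strictly diagonally dominant, so the iteration converges, but a direct linear solver would normally be used instead.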

Open set Object Detection combining Multi-branch Tree and ASSL (다중 분기 트리와 ASSL을 결합한 오픈 셋 물체 검출)

  • Shin, Dong-Kyun;Ahmed, Minhaz Uddin;Kim, JinWoo;Rhee, Phill-Kyu
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.5 / pp.171-177 / 2018
  • Recently, many image datasets have been built with a wide variety of data classes and points so that general features can be extracted. Because of this variety, however, deep learning models trained on such datasets do not perform well in local areas of heterogeneous data features. In this paper, we propose a structure named multi-branch tree using ASSL, which uses sub-categories and open-set object detection methods to train a more robust model. With this structure, we obtain an object detection deep learning model that is more robust in environments with heterogeneous data features.

Learning Context Awareness Model based on User Feedback for Smart Home Service

  • Kwon, Seongcheol;Kim, Seyoung;Ryu, Kwang Ryel
    • Journal of the Korea Society of Computer and Information / v.22 no.7 / pp.17-29 / 2017
  • Recently, research has been under way on recognizing indoor user situations through various sensors in smart home environments. Because the indoor situation greatly affects how home appliances are operated, this paper presents a case study in which the operation of a robot vacuum cleaner is determined by inferring the user's indoor situation from the operation of home appliances. To collect training data for the indoor situation awareness model, we received feedback from the user whenever the system misjudged the cleaning situation. In this paper, we propose a semi-supervised learning method that uses this user feedback data. When user feedback is received, a genetic algorithm searches for the labels of the unlabeled data that best fit the collected feedback, and these data are used to train the model. To verify the performance of the proposed algorithm, we performed comparison experiments with other learning algorithms in the same environment and confirmed that the proposed algorithm outperforms them.
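
The genetic search over labels can be sketched as follows. The fitness function is an assumption, not taken from the paper. A candidate labeling of the unlabeled data is scored by training a nearest-centroid classifier on the labeled plus candidate-labeled data, then measuring its accuracy on the user-feedback samples. Tournament selection, one-point crossover, bit-flip mutation, and elitism are standard GA choices made here for illustration. The labels 0/1 are hypothetical (for example, 1 = start cleaning), as are all names and data.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_centroid(model, x):
    return min(model, key=lambda c: sum((m - v) ** 2 for m, v in zip(model[c], x)))

def ga_label_search(X_lab, y_lab, X_unlab, X_fb, y_fb,
                    pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X_unlab)

    def score(labels):
        # Assumed fitness: accuracy on the feedback samples of a model
        # trained with the candidate labels for the unlabeled data.
        model = fit_centroids(X_lab + X_unlab, y_lab + list(labels))
        return sum(predict_centroid(model, x) == t for x, t in zip(X_fb, y_fb)) / len(y_fb)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=score, reverse=True)[:2]  # elitism: keep the two best
        while len(nxt) < pop_size:
            # Tournament selection of two parents (best of three random individuals).
            a, b = (max(rng.sample(pop, 3), key=score) for _ in range(2))
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

best = ga_label_search(X_lab=[[0.0, 0.0], [10.0, 10.0]], y_lab=[0, 1],
                       X_unlab=[[4.0, 4.0], [6.0, 6.0]],
                       X_fb=[[4.5, 4.5], [1.0, 1.0]], y_fb=[1, 0])
print(best)
```

With only a few unlabeled samples, exhaustive search would suffice. A GA becomes worthwhile when the number of possible labelings grows too large to enumerate.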