• Title/Summary/Keyword: machine learning

A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo; Kim, Hark-Soo
    • Journal of KIISE: Software and Applications / v.37 no.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally perform poorly in the customer review domain because online customer reviews contain many spacing and spelling errors. The poor performance of these underlying analyzers degrades the sentiment categorization systems built on them. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) patterns and phoneme patterns. Both kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In experiments with Korean customer reviews, the system using the proposed method outperformed one using a morphological analyzer as the feature extractor.
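The phoneme patterns described in this abstract can be illustrated with a minimal sketch. It assumes that a "leading consonant and vowel pair" is obtained by the standard Unicode arithmetic for precomposed Hangul syllables; the function name `leading_pairs` is hypothetical, and the paper's actual pattern construction and longest matching are not reproduced here.

```python
# Sketch only: derive (leading consonant, vowel) pairs from Hangul syllables.
# Assumption: the standard Unicode layout of precomposed Hangul syllables,
# code point = 0xAC00 + (onset * 21 + vowel) * 28 + coda.

HANGUL_BASE = 0xAC00    # code point of the first precomposed syllable
NUM_SYLLABLES = 11172   # 19 onsets * 21 vowels * 28 codas (including "none")

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 leading consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels


def leading_pairs(word):
    """Return one onset+vowel string per Hangul syllable, dropping the coda.

    Characters that are not precomposed Hangul syllables are kept unchanged.
    """
    pairs = []
    for ch in word:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < NUM_SYLLABLES:
            onset, rest = divmod(offset, 21 * 28)
            vowel = rest // 28
            pairs.append(CHOSEONG[onset] + JUNGSEONG[vowel])
        else:
            pairs.append(ch)
    return pairs


# A word and a misspelling that drops a final consonant share one pattern.
print(leading_pairs("좋아요"))
print(leading_pairs("조아요"))
```

Because the final consonant is ignored, the correctly spelled "좋아요" and the informal spelling "조아요" map to the same sequence, which is the kind of robustness to spelling errors the abstract describes.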

Traffic Anomaly Identification Using Multi-Class Support Vector Machine (다중 클래스 SVM을 이용한 트래픽의 이상패턴 검출)

  • Park, Young-Jae; Kim, Gye-Young; Jang, Seok-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.4 / pp.1942-1950 / 2013
  • This paper proposes a new method of detecting attacks in network traffic by visualizing the original traffic data and applying a multi-class SVM (support vector machine). The proposed method first generates 2D images from the IP addresses and ports of senders and receivers, and extracts from the images the linear patterns and high intensity values that represent traffic attacks. It then computes the variance of the sender and receiver ports and extracts the number of clusters and entropy features using the ISODATA algorithm. Finally, it uses the multi-class SVM to determine whether the traffic data contain DDoS, DoS, Internet worm, or port scan attacks. Experimental results show that the proposed multi-class SVM-based algorithm can detect network traffic attacks more effectively.
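As a rough illustration of the variance and entropy features mentioned in this abstract, the sketch below computes both over a list of destination ports. The abstract does not say exactly which quantity the entropy is taken over, so applying Shannon entropy to port numbers is an assumption, the sample port lists are made up, and the image generation, ISODATA clustering, and SVM steps are not shown.

```python
# Sketch only: variance and Shannon entropy of observed port numbers.
# Assumption: entropy is computed over the empirical distribution of ports.
import math
from collections import Counter
from statistics import pvariance


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    if not values:
        raise ValueError("values must not be empty")
    total = len(values)
    counts = Counter(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def port_features(ports):
    """Return (population variance, entropy) for a list of port numbers."""
    return pvariance(ports), shannon_entropy(ports)


# Hypothetical traffic: ordinary web traffic versus a scan over many ports.
normal_ports = [80] * 8
scan_ports = list(range(1, 9))
print(port_features(normal_ports))
print(port_features(scan_ports))
```

Traffic concentrated on one port yields zero variance and zero entropy, while a scan that touches many distinct ports yields high values for both, which is why such features can help separate attack classes.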

Fault Diagnosis Method for Automatic Machine Using Artificial Neural Network Based on DWT Power Spectral Density (인공신경망을 이용한 DWT 전력스펙트럼 밀도 기반 자동화 기계 고장 진단 기법)

  • Kang, Kyung-Won
    • Journal of the Institute of Convergence Signal Processing / v.20 no.2 / pp.78-83 / 2019
  • Sound-based machine fault diagnosis covers all studies that aim to automatically detect abnormal sounds in machines using the acoustic emissions of those machines. Conventional methods that use mathematical models have been found inaccurate because of the complexity of industrial machinery and the presence of nonlinear factors such as noise. Fault diagnosis can therefore be treated as a pattern recognition problem. We propose an automatic fault diagnosis method for hand drills using the discrete wavelet transform (DWT) and pattern recognition techniques such as artificial neural networks (ANN). We first conduct a filtering analysis based on the DWT. The power spectral density (PSD) is then computed on the wavelet subbands, except for the highest and lowest frequency subbands. The PSDs of the wavelet coefficients are extracted as features for the ANN-based classifier in the pattern recognition stage. The results show that the proposed method can be used effectively not only to detect defects but also in various sound-based automatic diagnosis systems.

Development of an Image Tagging System Based on Crowdsourcing (크라우드소싱 기반 이미지 태깅 시스템 구축 연구)

  • Lee, Hyeyoung; Chang, Yunkeum
    • Journal of the Korean BIBLIA Society for library and Information Science / v.29 no.3 / pp.297-320 / 2018
  • This study aims to improve access to and retrieval of images and to find a way to effectively generate tags as a tool for describing images. To this end, the study investigated, compared, and analyzed the features of human tagging and machine tagging. Machine tags consisted mostly of general attributes, with some specific attributes and visual elements and few abstract attributes. Human tags also consisted mostly of general attributes, but specific attributes were frequent for objects and scenes whose names the human tagger could recognize. In addition, sentiments and emotions, as well as abstract concepts, events, places, times, and relationships, were represented by a variety of tags. The tag set generated in this study can serve as basic data for constructing training data sets to improve machine learning algorithms.

Neural Machine translation specialized for Coronavirus Disease-19(COVID-19) (Coronavirus Disease-19(COVID-19)에 특화된 인공신경망 기계번역기)

  • Park, Chan-Jun; Kim, Kyeong-Hee; Park, Ki-Nam; Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.11 no.9 / pp.7-13 / 2020
  • With the World Health Organization (WHO) having recently declared Coronavirus Disease-19 (COVID-19) a pandemic, COVID-19 is a global concern and deaths continue to mount. To overcome this, there is a growing need for countries to share information and countermeasures related to COVID-19. However, linguistic barriers have prevented the smooth exchange and sharing of information. In this paper, we propose a Neural Machine Translation (NMT) model specialized for the COVID-19 domain. Centering on English, we built Transformer-based bidirectional models for French, Spanish, German, Italian, Russian, and Chinese. In terms of BLEU score, the experimental results showed significantly higher performance than a commercial system in all language pairs.

Research on Recent Quality Estimation (최신 기계번역 품질 예측 연구)

  • Eo, Sugyeong; Park, Chanjun; Moon, Hyeonseok; Seo, Jaehyung; Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.37-44 / 2021
  • Quality estimation (QE) can evaluate the quality of machine translation output even for users who do not know the target language, and this broad usefulness highlights the need for QE. A QE shared task is held every year at the Conference on Machine Translation (WMT), and recent research mainly applies Pretrained Language Models (PLMs). In this paper, we survey the QE task and its research trends, and we summarize the features of PLMs. In addition, we apply a multilingual BART model, which has not yet been used for QE, and perform a comparative analysis against existing approaches such as XLM, multilingual BERT, and XLM-RoBERTa. The experiments confirmed which PLM was most effective when applied to QE and showed that the multilingual BART model can be applied to the QE task.

Implementation of a Classification System for Dog Behaviors using YOLO-based Object Detection and a Node.js Server (YOLO 기반 개체 검출과 Node.js 서버를 이용한 반려견 행동 분류 시스템 구현)

  • Jo, Yong-Hwa; Lee, Hyuek-Jae; Kim, Young-Hun
    • Journal of the Institute of Convergence Signal Processing / v.21 no.1 / pp.29-37 / 2020
  • This paper implements a method of detecting dogs through real-time image analysis and classifying dog behaviors from the extracted images. Darknet YOLO was used to detect dog objects, and Teachable Machine, provided by Google, was used to classify behavior patterns from the extracted images. The trained Teachable Machine model is saved in Google Drive and can be used by ml5.js running on a Node.js server. By implementing an interactive web server with the socket.io module on the Node.js server, the classified results are transmitted to the user's smartphone or PC in real time so that they can be checked anytime, anywhere.

Research on Subword Tokenization of Korean Neural Machine Translation and Proposal for Tokenization Method to Separate Jongsung from Syllables (한국어 인공신경망 기계번역의 서브 워드 분절 연구 및 음절 기반 종성 분리 토큰화 제안)

  • Eo, Sugyeong; Park, Chanjun; Moon, Hyeonseok; Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.3 / pp.1-7 / 2021
  • Since Neural Machine Translation (NMT) uses only a limited vocabulary, words that are not registered in the dictionary may appear in the input. A method proposed to alleviate this Out of Vocabulary (OOV) problem is subword tokenization, which constructs words by dividing sentences into subword units smaller than words. In this paper, we review general subword tokenization algorithms. Furthermore, in order to create a vocabulary that can handle the infinite conjugation of Korean adjectives and verbs, we propose a new methodology for subword tokenization training that separates the Jongsung (coda) from Korean syllables (which consist of Chosung (onset), Jungsung (nucleus), and Jongsung (coda)). In the experiments, the proposed methodology outperformed the existing subword tokenization methodology.
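The idea of separating the Jongsung (coda) from each syllable can be sketched as a preprocessing step. This is an illustration under assumptions, not the authors' implementation: the function name `split_coda` is hypothetical, the separated coda is written as a compatibility jamo character (the abstract does not specify the representation), and the subsequent subword tokenizer training is not shown.

```python
# Sketch only: split each Hangul syllable into (syllable without coda) + coda.
# Assumption: the standard Unicode layout of precomposed Hangul syllables,
# where the coda index is (code point - 0xAC00) % 28 and 0 means "no coda".

HANGUL_BASE = 0xAC00
NUM_SYLLABLES = 11172

JONGSEONG = [
    "", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
    "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ",
]


def split_coda(text):
    """Rewrite text so that every final consonant stands as its own character."""
    out = []
    for ch in text:
        offset = ord(ch) - HANGUL_BASE
        coda = offset % 28 if 0 <= offset < NUM_SYLLABLES else 0
        if coda:
            out.append(chr(ord(ch) - coda))  # same onset and vowel, no coda
            out.append(JONGSEONG[coda])      # the separated final consonant
        else:
            out.append(ch)                   # no coda, or not a Hangul syllable
    return "".join(out)


print(split_coda("먹었다"))
```

After this step, conjugated forms that share a stem, such as the past-tense marker attached as a coda, expose the shared pieces to the subword tokenizer as separate symbols.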

A technique for predicting the cutting points of fish for the target weight using AI machine vision

  • Jang, Yong-hun; Lee, Myung-sub
    • Journal of the Korea Society of Computer and Information / v.27 no.4 / pp.27-36 / 2022
  • In this paper, to improve conditions at fish processing sites, we propose a method to predict the cutting point of a fish for a given target weight using AI machine vision. The proposed method first photographs the top and front views of the input fish and performs image-based preprocessing. Then, RANSAC (RANdom SAmple Consensus) is used to extract the fish contour line, and 3D external information of the fish is obtained through 3D modeling. Next, machine learning is performed on the extracted three-dimensional feature information and the measured weight information to generate a neural network model. The fish is then cut at the cutting point predicted by the proposed technique, and the weight of the cut piece is measured. We compared the measured weight with the target weight and evaluated the performance using metrics such as MAE (Mean Absolute Error) and MRE (Mean Relative Error). The results indicate an average error rate of less than 3% relative to the target weight. The proposed technique is expected to contribute greatly to the future development of the fishery industry when linked to automation systems.
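The two evaluation metrics named in this abstract can be written out as a short sketch. It assumes the common definitions, with the relative error taken with respect to the target weight; the abstract does not state the exact formulas, and the sample weights below are invented for illustration only.

```python
# Sketch only: MAE and MRE between target weights and measured weights.
# Assumption: relative error is |measured - target| / target.


def mean_absolute_error(targets, measured):
    """Average of |measured - target| over all cut pieces."""
    if not targets or len(targets) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - t) for t, m in zip(targets, measured)) / len(targets)


def mean_relative_error(targets, measured):
    """Average of |measured - target| / target over all cut pieces."""
    if not targets or len(targets) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - t) / t for t, m in zip(targets, measured)) / len(targets)


# Hypothetical example: target weights and weights measured after cutting (g).
targets = [100.0, 200.0]
measured = [102.0, 196.0]
print(mean_absolute_error(targets, measured))  # grams
print(mean_relative_error(targets, measured))  # fraction of the target weight
```

Multiplying the MRE by 100 gives a percentage, which is the form in which an "average error rate of less than 3%" would be reported.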

Performance of Support Vector Machine for Classifying Land Cover in Optical Satellite Images: A Case Study in Delaware River Port Area

  • Ramayanti, Suci; Kim, Bong Chan; Park, Sungjae; Lee, Chang-Wook
    • Korean Journal of Remote Sensing / v.38 no.6_4 / pp.1911-1923 / 2022
  • The availability of high-resolution satellite images provides precise information without direct observation of the research target. The Korea Multi-Purpose Satellite (KOMPSAT), also known as the Arirang satellite, has been developed and used for earth observation. Machine learning models have repeatedly proven to be good classifiers of remotely sensed images. This study aimed to compare the performance of the support vector machine (SVM) model in classifying the land cover of the Delaware River port area on high- and medium-resolution images. Three optical images, from KOMPSAT-2, KOMPSAT-3A, and Sentinel-2B, were classified into six land cover classes: water, road, vegetation, building, vacant, and shadow. The KOMPSAT images were provided by the Korea Aerospace Research Institute (KARI), and the Sentinel-2B image was provided by the European Space Agency (ESA). The training samples were manually digitized for each land cover class and used as the reference image. The predicted images were compared with the actual data to assess accuracy using confusion matrix analysis. In addition, the time consumed by training and classification was recorded to evaluate model performance. The results showed that the KOMPSAT-3A image had the highest overall accuracy, followed by the KOMPSAT-2 and Sentinel-2B images. However, the model took longer to classify the higher-resolution images than the lower-resolution one. We therefore conclude that the SVM model performed better on the higher-resolution images, at the cost of longer training and classification times. This finding may help researchers select satellite imagery for effective and accurate image classification.
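The accuracy assessment described in this abstract rests on a confusion matrix and the overall accuracy derived from it. The sketch below shows that calculation on a handful of made-up labels; the class names come from the abstract, but the sample data and function names are hypothetical, and the SVM classification itself is not shown.

```python
# Sketch only: confusion matrix and overall accuracy for land cover labels.
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def overall_accuracy(matrix):
    """Correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical reference labels and model predictions for four pixels.
classes = ["water", "road", "vegetation"]
actual = ["water", "water", "road", "vegetation"]
predicted = ["water", "road", "road", "vegetation"]

matrix = confusion_matrix(actual, predicted, classes)
print(matrix)
print(overall_accuracy(matrix))
```

Off-diagonal entries show which classes are confused with each other, which is useful when comparing images of different resolutions beyond a single accuracy figure.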