• Title/Summary/Keyword: Classification accuracy

An Approach for Determining Propensities of Blog Networks (블로그 연결망의 성향 판정 방안)

  • Yoon, Seok-Ho;Park, Sun-Ju;Kim, Sang-Wook
    • Journal of KIISE:Computing Practices and Letters / v.15 no.3 / pp.178-188 / 2009
  • A blog is a personal website where the owner publishes articles for others to read. A blog can have relationships with other blogs. In this paper, we define a blog network as a network composed of blogs connected by such relationships. Blog networks can have two different propensities, characterized by the articles published in the blogs: an information-valued propensity and a friendship-valued propensity. The degree of each propensity of a blog network plays an important role in deciding business policies for blog networks. In this paper, we address the problem of determining the degrees of the two propensities of a given blog network. First, we determine the degree of the propensity of every relationship, the basic unit of a blog network, by using classification, one of the data mining functionalities. Then, using this result, we compute the degrees of the two propensities of the whole blog network. We also propose a method to solve the problem that the degree of the propensities depends on the size of the blog network. To verify the superiority of the proposed approach, we perform extensive experiments using a large volume of real-world blog data. The results show that our approach achieves high accuracy of around 93% in determining the degrees of both propensities of relationships between any two blogs. We also verify the applicability of the proposed approach by showing that it correctly determines the degrees of the information-valued and friendship-valued propensities in real-world blog networks.

Component Analysis for Constructing an Emotion Ontology (감정 온톨로지의 구축을 위한 구성요소 분석)

  • Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science / v.21 no.1 / pp.157-175 / 2010
  • In human communication, understanding a dialogue participant's emotion is as important as decoding the explicit message. It is well known that non-verbal elements are more suitable for conveying a speaker's emotions than verbal elements. Written texts, however, contain a variety of linguistic units that express emotions. This study aims at analyzing the components needed to construct an emotion ontology, which has numerous applications in Human Language Technology. A majority of the previous work in text-based emotion processing focused on the classification of emotions, the construction of a dictionary describing emotions, and the retrieval of those lexica in texts through keyword spotting and/or syntactic parsing techniques. The emotions retrieved or computed by that process did not show good results in terms of accuracy. Thus, a more sophisticated component analysis is proposed and linguistic factors are introduced in this study. (1) Five linguistic types of emotion expressions are differentiated in terms of the target (verbal/non-verbal) and the method (expressive/descriptive/iconic). The correlations among them, as well as their correlation with the non-verbal expressive type, are also determined. This characteristic is expected to make our ontology more adaptable to multi-modal environments. (2) As emotion-related components, this study proposes 24 emotion types, a 5-scale intensity (-2~+2), and a 3-scale polarity (positive/negative/neutral), which can describe a variety of emotions in more detail and in a standardized way. (3) We introduce components related to verbal expression, such as 'experiencer', 'description target', 'description method' and 'linguistic features', which can appropriately classify and tag verbal expressions of emotions. (4) By adopting the linguistic tag sets proposed by ISO and TEI and providing a mapping table between our classification of emotions and Plutchik's, our ontology can be easily employed for multilingual processing.

Classification of Sedimentary Facies Using IKONOS Image in Hwangdo Tidal Flat, Cheonsu Bay (IKONOS 영상을 이용한 천수만 황도 갯벌 표층 퇴적상 분류)

  • Ryu, Joo-Hyung;Woo, Han Jun;Park, Chan-Hong;Yoo, Hong-Rhyong
    • Journal of Wetlands Research / v.7 no.2 / pp.121-132 / 2005
  • To classify the surface sedimentary facies using an IKONOS image collected over the Hwangdo tidal flat in Cheonsu Bay, the optical reflectance was compared against various characteristics of the sedimentary environment, such as grain size, tidal channel pattern and the area ratio of surface remnant water. An intertidal DEM (Digital Elevation Model) was generated by echo-sounder to analyze the relationship between the IKONOS image and the sedimentary environments, including topography. The boundary of the optical reflectance between the mud-mixed facies and the sand facies was distinct, and discrimination of the associated sandbar feature was also possible. The mud-mixed facies, coupled with intricate tidal channels, is confined to the relatively high topography of the Hwangdo tidal flat. The boundary between the mud flat and the mixed flat was indistinct in the IKONOS optical reflectance, but the two differ in the area ratio of surface remnant water. The dark area in the image represented the well-developed sand facies, which holds a lot of surface remnant water because of its relatively low surface topography. The overall accuracy of characterizing the surface sedimentary facies by the maximum likelihood classification method was 86.2%. These results demonstrate that high spatial resolution satellite imagery such as IKONOS, coupled with knowledge of grain size, surface remnant water and the tidal channel network, can be effectively used to characterize the surface sedimentary facies (mud, mixed and sand) of tidal flat environments.

Visualization of Malwares for Classification Through Deep Learning (딥러닝 기술을 활용한 멀웨어 분류를 위한 이미지화 기법)

  • Kim, Hyeonggyeom;Han, Seokmin;Lee, Suchul;Lee, Jun-Rak
    • Journal of Internet Computing and Services / v.19 no.5 / pp.67-75 / 2018
  • According to Symantec's Internet Security Threat Report (2018), Internet security threats such as cryptojacking, ransomware, and mobile malware are rapidly increasing and diversifying. This means that malware detection requires not only accuracy but also versatility. In the past, malware detection technology focused on qualitative performance because of problems such as encryption and obfuscation. Nowadays, however, given the diversity of malware, versatility is required to detect its many variants. In addition, malware detection needs to be optimized in terms of computing power. In this paper, we present Stream Order (SO)-CNN and Incremental Coordinate (IC)-CNN, malware detection schemes based on CNNs (Convolutional Neural Networks) that effectively detect intelligent and diversified malware. The proposed methods visualize each malware binary file as a fixed-size image. The visualized malware binaries are learned through GoogLeNet to form a deep learning model. Our model detects and classifies malware. The proposed method shows better performance than the conventional method.
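The idea of turning a binary file into a fixed-size image can be illustrated with a minimal sketch. This is not the SO-CNN or IC-CNN mapping from the paper, whose details the abstract does not give; it is a naive row-by-row layout that truncates or zero-pads the byte stream, and the function name `bytes_to_image` and the grid size are assumptions made for illustration.

```python
def bytes_to_image(data, width=16, height=16):
    """Lay out raw bytes row by row on a fixed-size grayscale grid (values 0-255)."""
    size = width * height
    # Assumption: long inputs are truncated and short ones zero-padded,
    # so that every file yields a grid of the same shape.
    padded = data[:size].ljust(size, b"\x00")
    return [list(padded[row * width:(row + 1) * width]) for row in range(height)]


# Hypothetical 40-byte input; a real malware sample would be read from disk.
image = bytes_to_image(b"MZ\x90\x00" * 10, width=8, height=8)
print(image[0])  # first row of pixel intensities
```

A grid like this could then be fed to any image classifier; truncation discards information in large files, which is one reason more careful mappings are worth studying.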

The attacker group feature extraction framework : Authorship Clustering based on Genetic Algorithm for Malware Authorship Group Identification (공격자 그룹 특징 추출 프레임워크 : 악성코드 저자 그룹 식별을 위한 유전 알고리즘 기반 저자 클러스터링)

  • Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.21 no.2 / pp.1-8 / 2020
  • Recently, the number of APT (Advanced Persistent Threat) attacks using malware has been increasing, and research is underway to prevent and detect them. While it is important to detect and block attacks before they occur, it is also important to respond effectively through an accurate analysis of the attack case and attack type; such a response can be determined by analyzing the group behind the attacks. Therefore, this paper proposes a framework based on a genetic algorithm for analyzing malware and understanding the features of attacker groups. The framework uses a decompiler and a disassembler to extract related code from collected malware, and analyzes author-related information through code analysis. Malware has unique characteristics that can identify its author or attacker group. We therefore select the features specific to an attack group from among the various features extracted from binary and source code through the authorship clustering method, and apply a genetic algorithm to cluster accurately and infer those specific features. We also find features that express each group of malware authors, based on the characteristics of that group, and create profiles to verify that the groups of authors are correctly clustered. In this paper, we conduct experiments on author classification using the genetic algorithm and on finding specific features that express author characteristics. In the experimental results, we obtained an author classification accuracy of 86% and selected the features to be used for authorship analysis from among the information extracted through the genetic algorithm.

Classification of Parent Company's Downward Business Clients Using Random Forest: Focused on Value Chain at the Industry of Automobile Parts (랜덤포레스트를 이용한 모기업의 하향 거래처 기업의 분류: 자동차 부품산업의 가치사슬을 중심으로)

  • Kim, Teajin;Hong, Jeongshik;Jeon, Yunsu;Park, Jongryul;An, Teayuk
    • The Journal of Society for e-Business Studies / v.23 no.1 / pp.1-22 / 2018
  • The value chain has been utilized as a strategic tool to improve competitive advantage, mainly at the enterprise level and at the industrial level. However, in order to conduct value chain analysis at the enterprise level, the client companies of the parent company should be classified according to whether they belong to its value chain. Establishing a value chain for a single company can be done smoothly by experts, but building one that consists of multiple companies takes a lot of cost and time. Thus, this study proposes a model that automatically classifies the companies that form a value chain based on actual transaction data. A total of 19 transaction attribute variables were extracted from the transaction data and processed into input data for a machine learning method. The proposed model was constructed using the Random Forest algorithm. The experiment was conducted on an automobile parts company. The experimental results demonstrate that the proposed model can classify the client companies of the parent company automatically with 92% accuracy, a 76% F1-score and a 94% AUC. The empirical study also confirms that a few transaction attributes, such as transaction concentration, transaction amount and total sales per customer, are the main characteristics representing the companies that form a value chain.
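Accuracy and F1-score can diverge, as in the figures reported above, when one class is much rarer than the other. The sketch below computes both from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and `classification_metrics` is an assumed helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 25 value-chain clients among 200 companies.
metrics = classification_metrics(tp=20, fp=5, fn=5, tn=170)
print(metrics)
```

Because the many true negatives count toward accuracy but not toward F1, accuracy on such data is typically the higher of the two.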

Learning-based Detection of License Plate using SIFT and Neural Network (SIFT와 신경망을 이용한 학습 기반 차량 번호판 검출)

  • Hong, Won Ju;Kim, Min Woo;Oh, Il-Seok
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.8 / pp.187-195 / 2013
  • Most previous studies on car license plate detection restrict the image acquisition environment. The aim of this research is to reduce those restrictions by proposing a new method that uses SIFT and a neural network. SIFT can be used in diverse situations with fewer restrictions because it provides size and rotation invariance and large discriminating power. SIFT keypoints extracted from a license plate image are divided into internal (inside class) and external (outside class) ones, and the classifier is trained on them. In the proposed method, simply by supplying the various types of license plates, the trained neural network classifier can handle all of the types. Although the classification performance is not high, the inside class appears densely over the plate region and sparsely over the non-plate regions. These characteristics create a local feature map, from which we can identify the location with the global maximum value as a candidate license plate region. We collected an image database with far fewer restrictions than in conventional research. The experiment and evaluation were done using this database. In terms of the classification accuracy of SIFT keypoints, the correct recognition rate was 97.1%. The precision was 62.0% and the recall was 50.2%. In terms of license plate detection, the correct recognition rate was 98.6%.
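The step of locating the densest cluster of inside-class keypoints can be sketched as a simple grid count. The abstract does not say how the local feature map is built, so the fixed cell size, the grid dimensions and the function name `densest_cell` are all assumptions, and the coordinates are made up.

```python
def densest_cell(points, cell_size, grid_w, grid_h):
    """Count points per grid cell and return (col, row, count) for the fullest cell."""
    counts = [[0] * grid_w for _ in range(grid_h)]
    for x, y in points:
        # Assumes non-negative coordinates; clamp to the last cell at the border.
        col = min(int(x // cell_size), grid_w - 1)
        row = min(int(y // cell_size), grid_h - 1)
        counts[row][col] += 1
    cells = [(row, col) for row in range(grid_h) for col in range(grid_w)]
    # max() returns the first maximal cell in row-major order on ties.
    row, col = max(cells, key=lambda rc: counts[rc[0]][rc[1]])
    return col, row, counts[row][col]


# Hypothetical keypoints classified as "inside": four cluster together, two are stray.
inside_points = [(12, 14), (15, 11), (18, 17), (13, 19), (45, 3), (2, 38)]
candidate = densest_cell(inside_points, cell_size=10, grid_w=5, grid_h=4)
print(candidate)
```

This also shows why modest per-keypoint precision can still give good detection: scattered false positives rarely pile up in one cell, while true plate keypoints do.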

Study on evaluating the significance of 3D nuclear texture features for diagnosis of cervical cancer (자궁경부암 진단을 위한 3차원 세포핵 질감 특성값 유의성 평가에 관한 연구)

  • Choi, Hyun-Ju;Kim, Tae-Yun;Malm, Patrik;Bengtsson, Ewert;Choi, Heung-Kook
    • Journal of the Korea Society of Computer and Information / v.16 no.10 / pp.83-92 / 2011
  • The aim of this study is to evaluate whether 3D nuclear chromatin texture features are significant in recognizing the progression of cervical cancer. In particular, we assessed whether our method could detect subtle differences in the chromatin pattern of seemingly normal cells on specimens with malignancy. We extracted nuclear texture features based on the 3D GLCM (Gray Level Co-occurrence Matrix) and the 3D wavelet transform from 100 cell volume data sets for each group (Normal, LSIL and HSIL). To evaluate the feasibility of 3D chromatin texture analysis, we compared the correct classification rates of the classifiers that used these features. In addition, we compared the correct classification rates of classifiers using the proposed 3D nuclear texture features with those using 2D nuclear texture features extracted in the same way. The results showed that the classifier using the 3D nuclear texture features performed better. This means our method could improve the accuracy and reproducibility of the quantification of cervical cells.
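A gray level co-occurrence matrix counts how often a voxel with gray level i has a neighbor with gray level j at a fixed offset. The sketch below builds one for a 3D volume stored as nested lists; the offset convention `(dz, dy, dx)`, the function name `glcm_3d` and the tiny test volume are assumptions for illustration, not the paper's configuration.

```python
def glcm_3d(volume, offset, levels):
    """Count gray-level pairs (i, j) between each voxel and its neighbor at `offset`."""
    dz, dy, dx = offset
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    matrix = [[0] * levels for _ in range(levels)]
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                z2, y2, x2 = z + dz, y + dy, x + dx
                # Skip pairs whose neighbor falls outside the volume.
                if 0 <= z2 < depth and 0 <= y2 < height and 0 <= x2 < width:
                    matrix[volume[z][y][x]][volume[z2][y2][x2]] += 1
    return matrix


# Hypothetical 2x2x2 volume with two gray levels, neighbor one step along x.
volume = [[[0, 0], [1, 1]],
          [[0, 1], [1, 1]]]
glcm = glcm_3d(volume, offset=(0, 0, 1), levels=2)
print(glcm)
```

Texture features such as contrast or energy are then computed from the normalized matrix, usually averaged over several offsets.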

Design of discriminant function for thick and thin coating from the white coating (백태 중 후태 및 박태 분류 판별함수 설계)

  • Choi, Eun-Ji;Kim, Keun-Ho;Ryu, Hyun-Hee;Lee, Hae-Jung;Kim, Jong-Yeol
    • Korean Journal of Oriental Medicine / v.13 no.3 / pp.119-124 / 2007
  • Introduction: In Oriental medicine, the status of the tongue is an important indicator for diagnosing one's health, because it represents physiological and clinicopathological changes in the inner parts of the body. Tongue diagnosis is not only convenient but also non-invasive, so it is very widely used in Oriental medicine. However, since tongue diagnosis is strongly affected by the examination circumstances, its performance depends on the light source, the viewing angle, the medical doctor's condition and so on. Therefore, it is not easy to make an objective and standardized tongue diagnosis. To solve this problem, in this study we tried to design a discriminant function for thick and thin coating using color vectors of the preprocessed image. Method: 52 subjects who were diagnosed as having a white-coated tongue were involved. Among them, 45 subjects were diagnosed as thin coating and 7 subjects as thick coating by Oriental medical doctors, and their tongue images were then obtained from a digital tongue diagnosis system. Using those tongue images, we implemented two steps: preprocessing and image analysis. The preprocessing step includes histogram equalization and histogram stretching of each color component, in particular intensity and saturation. This makes the difference between the tongue substance and the tongue coating more visible, so that the tongue coating can be separated easily. Next, we analyzed the characteristics of the color values and found the threshold that separates the coating area from the tongue area. Then, from the tongue coating image, it is possible to extract the variables that are important for classifying thick and thin coating. Result: By statistical analysis, two significant vectors associated with G were found, which described the difference between thick and thin coating very well. Using these two variables, we designed the discriminant function for coating classification and examined its performance. As a result, the overall accuracy of thick and thin coating classification was 92.3%. Discussion: From the result, we can expect that the discriminant function is applicable to other coatings in a similar way. It can also be used to make an objective and standardized diagnosis.
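Histogram stretching, one of the preprocessing steps named above, linearly rescales a channel so that its darkest value maps to the bottom of the output range and its brightest to the top. The following is a minimal sketch using integer arithmetic on a flat list of pixel values; the function name `stretch_channel` and the sample values are hypothetical, and the paper's actual preprocessing parameters are not given in the abstract.

```python
def stretch_channel(values, out_min=0, out_max=255):
    """Linearly rescale integer pixel values to span [out_min, out_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat channel has no contrast to stretch.
        return [out_min for _ in values]
    span_out = out_max - out_min
    # Integer floor division keeps results exact and within the output range.
    return [out_min + (v - lo) * span_out // (hi - lo) for v in values]


# Hypothetical low-contrast saturation values from a tongue image.
stretched = stretch_channel([100, 120, 140, 200])
print(stretched)
```

After stretching, a fixed threshold separates regions more reliably because the channel uses the full intensity range regardless of lighting.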

Improved Focused Sampling for Class Imbalance Problem (클래스 불균형 문제를 해결하기 위한 개선된 집중 샘플링)

  • Kim, Man-Sun;Yang, Hyung-Jeong;Kim, Soo-Hyung;Cheah, Wooi Ping
    • The KIPS Transactions:PartB / v.14B no.4 / pp.287-294 / 2007
  • Many classification algorithms for real-world data suffer from a class imbalance problem. To solve this problem, various methods have been proposed, such as altering the training balance and designing better sampling strategies. The previous methods do not satisfactorily account for the distribution of the input data and the constraint. In this paper, we propose a focused sampling method that is superior to previous methods. To solve the problem, we must select a useful subset of data from the whole training set. To obtain this subset, the proposed method divides the region according to scores that are computed based on the distribution of a SOM over the input data. The scores are sorted in ascending order. They represent the distribution of the input data, which may in turn represent the characteristics of the whole data. A new training dataset is obtained by eliminating the unhelpful data located in the region between an upper bound and a lower bound. The proposed method gives better, or at least similar, classification accuracy compared to previous approaches. In addition, it offers several benefits: a reduced class imbalance ratio, smaller training sets, and prevention of over-fitting. The proposed method has been tested with a kNN classifier. An experimental result on the ecoli data set shows that this method achieves precision up to 2.27 times that of the other methods.
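The elimination step described above can be sketched as a filter that drops samples whose score lies between the two bounds. How the scores are derived from the SOM, how the bounds are chosen, and whether the bounds are inclusive are not stated in the abstract, so the scores, bounds and the name `drop_mid_scores` below are all hypothetical.

```python
from collections import Counter


def drop_mid_scores(rows, lower, upper):
    """Keep (label, score) rows whose score lies outside the closed interval [lower, upper]."""
    return [row for row in rows if not (lower <= row[1] <= upper)]


# Hypothetical scored training set: five majority ("neg") and two minority ("pos") samples.
rows = [("neg", 0.10), ("neg", 0.45), ("neg", 0.50), ("neg", 0.55),
        ("neg", 0.90), ("pos", 0.05), ("pos", 0.95)]
kept = drop_mid_scores(rows, lower=0.4, upper=0.6)

before = Counter(label for label, _ in rows)
after = Counter(label for label, _ in kept)
print(before, after)
```

In this toy data the filter shrinks the training set and evens out the class ratio at the same time, which mirrors two of the benefits the abstract lists.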