• Title/Summary/Keyword: Classifying characteristics

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.193-215 / 2016
  • In recent years, the proliferation of diverse social networking services has led users to use many media simultaneously, depending on their individual purposes and tastes. When collecting information about a particular theme, they usually draw on several media, such as social networking services, Internet news, and blogs. However, each document circulated through these media is placed in a category according to its source's own policy and standards, which hinders any attempt to study a specific category across different kinds of sources. For example, documents about an "application for foreign travel" can be classified into "Information Technology," "Travel," or "Life and Culture," depending on the standard of each source. Likewise, because sources differ in how they define categories and how finely they specify them, similar categories can be named and structured differently from source to source. To overcome these limitations, this study proposes a method for mapping categories between different sources and media while keeping each medium's existing category system intact. Specifically, by re-classifying individual documents from the viewpoint of each source and storing the results as extra attributes, this study proposes a logical layer through which users can search documents from multiple heterogeneous sources with different category names as if they belonged to the same source. In addition, experiments on 6,000 news articles collected from two Internet news portals compared accuracy across sources, between supervised and semi-supervised learning, and between homogeneous and heterogeneous learning data.
It is particularly interesting that, in some categories, the classification accuracy of semi-supervised learning using heterogeneous learning data proved to be higher than that of supervised learning and semi-supervised learning using homogeneous learning data. This study is significant in the following respects. First, it proposes a logical plan for building a system that integrates and manages heterogeneous media with different classification systems while keeping the existing physical classification systems as they are. In particular, its results show that classification accuracy varies widely with the heterogeneity of the learning data; this is expected to spur further studies that improve the proposed methodology by analyzing the characteristics of each category. In addition, with growing demand for searching, collecting, and analyzing documents from diverse media, Internet search is no longer restricted to a single medium. However, since each medium has a different category structure and naming, it is very difficult to search for a specific category across heterogeneous media. The proposed methodology is also significant in that, when users select a desired site, it lets them query all documents according to that site's category standards, while keeping each existing site's characteristics and structure as they are. The proposed methodology needs to be further complemented in the following respects. First, because its performance was compared and evaluated only indirectly, future studies need to test its accuracy more directly. That is, after documents from the target source are re-classified under the category system of the existing source, the accuracy of that classification needs to be verified through evaluation by actual users.
In addition, classification accuracy needs to be increased by refining the methodology. Furthermore, understanding the characteristics of the categories in which heterogeneous semi-supervised learning achieved higher classification accuracy than supervised learning may help in obtaining heterogeneous documents from diverse media and in finding ways to use them to improve document classification accuracy.
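
The logical layer described in this abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, so the tiny Naive Bayes model below is an assumption, as are the function names, the `mapped_categories` attribute, and the toy training texts. The point is only that a document keeps its native category while its category under another source's system is stored as an extra attribute.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Train a tiny multinomial Naive Bayes model from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> token counts
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in labeled_docs:
        tokens = text.lower().split()
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the most probable category, using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for token in text.lower().split():
            if token in vocab:  # tokens unseen in training are ignored
                count = word_counts[category][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical labeled documents from source B (toy data, not from the paper).
source_b_training = [
    ("flight hotel booking trip abroad", "Travel"),
    ("beach resort trip itinerary", "Travel"),
    ("smartphone app software update", "Information Technology"),
    ("software developer app release", "Information Technology"),
]
model_b = train_nb(source_b_training)

# A document from source A keeps its native category; the category it would
# receive under source B's system is stored as an extra attribute.
doc = {"text": "new app for booking a trip abroad",
       "source": "A",
       "category": "Life and Culture"}
doc["mapped_categories"] = {"B": classify(model_b, doc["text"])}
```

A search restricted to source B's "Travel" category could then match this document through `mapped_categories` without changing how source A files it.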

Numerical Study on Flow Characteristics and Classification Performance of Circulating Air Classifier (수치해석을 이용한 순환형공기분급기 유동특성 및 분급성능 연구)

  • Yoon, Jong-Hwan;Cheong, Jun-Gyo
    • Transactions of the Korean Society of Mechanical Engineers B / v.41 no.3 / pp.211-219 / 2017
  • In this study, we performed numerical simulations of a circulating air classifier using a commercial computational fluid dynamics program. The variations in the grade efficiency, the cut-size, and the cut-sharpness were calculated and discussed. By controlling the rotating speed of the main fan, the cut-size could be rapidly increased. However, the linearity of the cut-size variation with respect to the main fan speed was not sufficient for application to contaminated soil classification processes. On the other hand, by varying the rotating speed of the classifying fan, the cut-size gradually decreased and could be precisely adjusted. Using both the main fan and the classifying fan, we could achieve larger cut-sharpness values and better classifying performance.
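
As a rough illustration of the quantities named in this abstract, the sketch below reads a cut-size and a sharpness index off a tabulated grade efficiency curve. It assumes the common convention that the cut-size is the particle size at 50% grade efficiency (x50) and that sharpness is the ratio x25/x75, which approaches 1 for an ideal classifier; the abstract does not state which definitions the authors used, and the curve values are made up.

```python
def size_at_efficiency(sizes, efficiencies, target):
    """Linearly interpolate the particle size at which grade efficiency equals target.

    Assumes the tabulated efficiencies increase with particle size.
    """
    points = list(zip(sizes, efficiencies))
    for (x0, t0), (x1, t1) in zip(points, points[1:]):
        if t0 <= target <= t1 and t1 > t0:
            return x0 + (target - t0) * (x1 - x0) / (t1 - t0)
    raise ValueError("target efficiency is outside the tabulated curve")

# Hypothetical grade efficiency curve: fraction of each size sent to the coarse stream.
sizes_um = [10, 20, 30, 40, 50]
grade_eff = [0.0, 0.25, 0.5, 0.75, 1.0]

cut_size = size_at_efficiency(sizes_um, grade_eff, 0.5)   # x50
x25 = size_at_efficiency(sizes_um, grade_eff, 0.25)
x75 = size_at_efficiency(sizes_um, grade_eff, 0.75)
sharpness = x25 / x75  # closer to 1 means a sharper cut (assumed definition)
```

Under this convention a larger sharpness value corresponds to a steeper grade efficiency curve, which is consistent with the abstract linking larger cut-sharpness to better classifying performance.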

Job Classifying method based on Data Traits for Increased Efficiency of Computational Resources in Distributed Environment (분산 환경에서 계산 자원의 효율 증대를 위한 데이터 특성 기반의 작업 분류방법)

  • Moon, Sung-Hwan;Kim, Jae-Kwon;Kim, Tae-Young;Choi, Jeong-Seok;Cho, Kyu-Cheol;Lee, Jong-Sik
    • Journal of the Korea Society for Simulation / v.23 no.4 / pp.219-228 / 2014
  • Distributed environments combine various computational resources into high-performance computing environments through virtualization technology. Recently, improvements in user-level applications have created a growing need for complicated processing, which has led to demand for high-performance computing. A job requested by a user is composed of data, and because each piece of data has its own characteristics, a classifier can take the features of the data into account. In this paper, we propose a Job Classifying method based on Data Traits for Increased Efficiency of Computational Resources in Distributed Environment (JCDT). JCDT classifies jobs by the data traits of users' requests and is expected to improve job processing time and increase the processing speed of the computational resources.

Classifying Digital Game Genres (게임 장르의 유형화)

  • Lee, Sul-Hi;Kwon, Min-Seok
    • Journal of Korea Game Society / v.8 no.3 / pp.3-14 / 2008
  • Classifying the genres of digital games is an ambiguous task, because digital games have borrowed genre classifications from other media, such as film. This paper therefore proposes a new way of classifying digital game genres. There are two kinds of approaches to digital game genres: one is based on gamers' activities and the other on the game text. The former turned out to be limited, because the meanings of gamers' activities sometimes overlap, and this overlap makes it hard to decide objectively which activity belongs to which genre. The latter is comparatively clear and accurate, as it is based on what a game text provides. As a result, a new classification based on the game text yields a total of 7 digital game types: Physical obstacles, Mainly physical & partially intellectual obstacles, Intellectual obstacles, Mainly intellectual & partially physical obstacles, Self-supporting games, Confronting games, and Ranking games. Each type has its own conventions and characteristics.

POTENTIAL OF HYPERSPECTRAL DATA FOR THE CLASSIFICATION OF VITD SOIL CLASSES

  • Kim Sun-Hwa;Ma Jung-Rim;Lee Kyu-Sung;Eo Yang-Dam;Lee Yong-Woong
    • Proceedings of the KSRS Conference / 2005.10a / pp.221-224 / 2005
  • Hyperspectral image data have great potential to depict more detailed information on the biophysical characteristics of surface materials, which is not usually available from multispectral data. This study aims to test the potential of hyperspectral data for classifying five soil classes defined by the vector product interim terrain data (VITD). We classify surface materials of bare soil over a study area in Korea using both hyperspectral and multispectral image data. Training and test samples for classification are selected using the VITD vector map. The spectral angle mapper (SAM) method is applied to EO-1 Hyperion data and Landsat ETM+ data that have been radiometrically corrected and geo-rectified. Higher classification accuracy is obtained with the hyperspectral data for classifying five soil classes of gravel, evaporites, inorganic silt and sand.
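
The spectral angle mapper mentioned in this abstract treats each spectrum as a vector and assigns a pixel to the reference class whose spectrum makes the smallest angle with it. The sketch below shows that rule in plain Python; the four-band reflectance values and the function names are hypothetical and are not taken from the paper.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cosine)

def classify_pixel(pixel, references):
    """Return the class whose reference spectrum has the smallest angle to the pixel."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))

# Hypothetical reference spectra (toy reflectance values in four bands).
references = {
    "sand": [0.30, 0.40, 0.50, 0.55],
    "gravel": [0.20, 0.22, 0.25, 0.26],
}

# A darker pixel with the same spectral shape as the sand reference.
pixel = [0.15, 0.20, 0.25, 0.275]
label = classify_pixel(pixel, references)
```

Because only the angle is compared, a pixel that is a uniformly dimmer copy of a reference spectrum is still matched to that class, which is why the method is relatively insensitive to illumination differences.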

Relationship between Classification of Sa-Sang Constitutional Medicine and Chemical Composition of Samgye-Tang Ingredients and Other Food (삼계탕 재료 및 각종 식품의 사상의학적 분류와 화학조성과의 상관관계)

  • 유익종;전기홍;박우문;조혜연;최성유
    • Food Science of Animal Resources / v.21 no.2 / pp.97-102 / 2001
  • This study classified Samgye-tang ingredients and other foods by their Ohn-Yeoul-Ryang-Han characteristics, and assessed both how well the foods fit each Sa-sang constitution and the relationship between Han-Yeoul characteristics and chemical composition. When the foods were classified into Han, Ryang, Pyound, Ohn and Yeoul and the constitution suited to each characteristic was investigated, the share of cases suited to Soeumin was 44∼63%, but the share suited to Soyangin and Taeyangin was only 0∼18%. When the relationship between the Ohn-Yeoul-Ryang-Han classification of Samgye-tang ingredients and other foods and their chemical composition (fatty acids, amino acids, vitamins and minerals) was investigated, the correlation coefficients were extremely low. No relationship was found between chemical composition and the Han-Yeoul classification. Therefore, the relationship between these characteristics and chemical composition should be investigated further using additional analytical indices.

Pattern Classification for Biomedical Signal using BP Algorithm and SVM (BP알고리즘과 SVM을 이용한 심전도 신호의 패턴 분류)

  • Kim, Man-Sun;Lee, Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.1 / pp.82-87 / 2004
  • An ECG consists of various waveforms of the heart's electrical signals. Data mining can be used to analyze and classify these waveforms. Conventional studies on electrocardiogram classification suffer from problems such as the extraction of distorted characteristics and overfitting. This study classifies electrocardiograms using the BP algorithm and an SVM to address these problems. The results show that the SVM effectively prevents the overfitting seen in neural networks and guarantees a unique global solution, showing excellent generalization performance.

Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • Journal of Intelligence and Information Systems / v.6 no.1 / pp.73-82 / 2000
  • This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management, using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression with that of two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5), since logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. The comparison was performed using a test set of 4,588 beneficiaries and the training set of 13,689 beneficiaries that was used to develop the models. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, but C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules also provided segment characteristics for the risk factors, which may be used in developing hypertension management programs. This showed that the data mining approach can be a useful analytic tool for predicting and classifying health outcome data.
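
CHAID, as its name indicates, relies on chi-squared tests of association between a candidate predictor and the outcome when choosing splits. The sketch below computes only the Pearson chi-squared statistic for one contingency table; it is not the full CHAID procedure, and the counts and the risk-factor framing are hypothetical rather than taken from the study.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows = risk factor present / absent,
# columns = hypertension / no hypertension.
table = [
    [30, 10],
    [20, 40],
]
stat = chi_squared(table)  # larger values indicate a stronger association
```

A tree builder in this style would compute such a statistic for each candidate predictor and prefer the one most strongly associated with the outcome.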

Empirical Validation of Customer Characteristics on Internet Shopping Mall Usage (사용자 특성이 인터넷 쇼핑몰 이용에 미치는 영향에 관한 실증적 연구)

  • 김정욱;주형진
    • Journal of the Korean Operations Research and Management Science Society / v.27 no.4 / pp.149-165 / 2002
  • This study establishes key factors of ISM (Internet Shopping Mall) performance in Korea. Four factors are derived from the relevant literature: ISM characteristics, customer characteristics, ISM evaluation level, and perceived risks, whose concepts are clarified by distinguishing between their components and determinants. ISM performance indicators were derived from previous studies and classified into ISM attitude and ISM usage. We examine the impact of ISM characteristics and customer characteristics on the ISM evaluation level, and then the impact of that level and of perceived risks on ISM performance. Hypotheses on the four factors of ISM performance were tested with 172 respondents. The results indicate that the four factors may partially serve as key predictors of ISM performance. ISM characteristics and customer characteristics were found to positively influence the ISM evaluation level, and that level in turn positively affected ISM performance, while perceived risks negatively affected ISM performance.

A Study on Physicochemical and Sensory Characteristics of Ssanghwa Tea

  • KIM, Oe-Sun;KIM, Jung-Yun;JO, Eun-Mi;RHA, Young-Ah
    • The Korean Journal of Food & Health Convergence / v.6 no.5 / pp.11-17 / 2020
  • This study analyzed sensory properties by classifying the hot-water extracts of the main ingredients used in Ssanghwa tea. Through this study, we aimed to develop a popular Ssanghwa tea and to carry out basic research for developing various menus that use it. The ingredients for the Ssanghwa tea were washed under running water, dehydrated, and put in a pot with 2 L of purified water. The Ssanghwa tea was heated at 100℃ for 10 minutes, after which the temperature was lowered to 75℃ and the tea was boiled down to 200 ml over 110 minutes. The sensory characteristics of four commercial products and five manufactured samples were evaluated by descriptive analysis. Quantitative analysis of the commercial Ssanghwa tea showed significant differences between samples in seven of the 13 sensory characteristics, the exceptions being OG (Smell of grass), OC (Oriental medicine smell), TG (Umami), RT (Thick), RC (Rough) and RS (Tub-Tub) (p<0.05). In particular, differences between samples were evident in CT (Transmittance), CB (Brownness), TW (Sweet taste) and TB (Bitter) (p<0.001), which appeared to be the main differentiating features of appearance, aroma and taste for commercial Ssanghwa tea.