• Title/Summary/Keyword: Data Classification Systems


Construction of Hierarchical Classification of User Tags using WordNet-based Formal Concept Analysis (WordNet기반의 형식개념분석기법을 이용한 사용자태그 분류체계의 구축)

  • Hwang, Suk-Hyung
    • Journal of the Korea Society of Computer and Information / v.18 no.10 / pp.149-161 / 2013
  • In this paper, we propose a novel approach to constructing classification hierarchies for user tags in folksonomies, using a WordNet-based Formal Concept Analysis tool called TagLighter, which was developed in this research. To demonstrate the usefulness of this approach in practice, we describe experiments on user tag data from the Bibsonomy.org site. The classification hierarchies of user tags constructed by our approach give a deeper understanding of, and insight into, tagged data during information retrieval and data analysis on folksonomy-based systems. We expect that the proposed approach can be used in web data mining for folksonomy-based web services, social networking systems, and semantic web applications.
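The abstract does not describe how TagLighter works internally, so the following is only a minimal sketch of plain Formal Concept Analysis on a toy context. The tags, the broader terms attached to them (of the kind one might look up in WordNet), and all function names are hypothetical, and the subset enumeration is a naive approach suitable only for very small inputs.

```python
from itertools import combinations

def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)

def common_objects(attrs, context):
    """Objects that carry every attribute in `attrs`."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    concepts = set()
    objs = list(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts

# Hypothetical toy context: user tags mapped to broader terms.
context = {
    "python": {"language", "programming_language"},
    "java": {"language", "programming_language"},
    "english": {"language", "natural_language"},
    "korean": {"language", "natural_language"},
}

concepts = formal_concepts(context)
# Listing concepts from the largest extent to the smallest reads as a hierarchy.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each concept pairs a set of tags with the terms they all share, so the concept containing every tag sits at the top of the hierarchy and more specific groupings sit below it.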

Strategies for Activating BIM-data Sharing in Construction - Based on cases of defining practical data and a survey of practitioners - (건설분야 BIM 데이터 공유 활성화 전략 - 건설 실무분야의 데이터 연계방법과 실무자 설문을 기반으로-)

  • Kim, Do-Young;Lee, Sung-Woo;Nam, Ju-Hyun;Kim, Bum-Soo;Kim, Sung-Jin
    • Journal of KIBIM / v.12 no.1 / pp.72-80 / 2022
  • Designing with BIM has become mandatory in construction. There is an urgent need to make accurate decisions by linking the complex and varied types of data in projects. In particular, a blockchain-based data sharing process (using BIM files and general construction submission files) is essential to support reliable decision making amid a flood of complex data. In this paper, as a step prior to developing a blockchain-based data sharing system, a data linkage method is proposed so that practitioners can use existing construction information and BIM data simultaneously. Examples are shown based on the construction classification system and file expression, and incentive strategies are explored through a survey so that heterogeneous information can be used together across entire projects.

Text Classification Using Parallel Word-level and Character-level Embeddings in Convolutional Neural Networks

  • Geonu Kim;Jungyeon Jang;Juwon Lee;Kitae Kim;Woonyoung Yeo;Jong Woo Kim
    • Asia pacific journal of information systems / v.29 no.4 / pp.771-788 / 2019
  • Deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) outperform traditional approaches such as Support Vector Machines (SVMs) and Naïve Bayesian approaches in text classification. When using CNNs for text classification tasks, word embedding or character embedding is a step that transforms words or characters into fixed-size vectors before feeding them into convolutional layers. In this paper, we propose a parallel word-level and character-level embedding approach in CNNs for text classification. The proposed approach can capture word-level and character-level patterns concurrently in CNNs. To show the usefulness of the proposed approach, we perform experiments with two English and three Korean text datasets. The experimental results show that character-level embedding works better in Korean and word-level embedding performs well in English. The results also reveal that the proposed approach performs better than traditional CNNs with only word-level or only character-level embedding on both Korean and English documents. On more detailed investigation, we find that the proposed approach tends to perform better than the traditional embedding approaches when the amount of data is relatively small.
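The abstract gives no architectural details, so the sketch below is one plausible reading of "parallel" embeddings rather than the authors' network: a word-level branch and a character-level branch each apply a 1-D convolution with max-over-time pooling, and the two pooled feature vectors are concatenated. The weights are random and untrained, and every name, vocabulary, and dimension here is an assumption chosen for illustration.

```python
import random

def make_embedding(vocab, dim, rng):
    """Randomly initialised lookup table: token -> vector of length dim."""
    return {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in vocab}

def make_filters(count, width, dim, rng):
    """Each filter is a flat list of width * dim weights."""
    return [[rng.uniform(-1, 1) for _ in range(width * dim)] for _ in range(count)]

def conv_max_pool(vectors, filters, width):
    """1-D convolution over a vector sequence, ReLU, then max-over-time pooling."""
    features = []
    for filt in filters:
        best = 0.0  # ReLU floor; also the result when the sequence is too short
        for start in range(len(vectors) - width + 1):
            window = [x for vec in vectors[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        features.append(best)
    return features

def parallel_features(text, word_emb, char_emb, word_filters, char_filters, width=2):
    """Run both branches on the same text and concatenate their features."""
    lowered = text.lower()
    word_vecs = [word_emb[w] for w in lowered.split() if w in word_emb]
    char_vecs = [char_emb[c] for c in lowered if c in char_emb]
    word_feat = conv_max_pool(word_vecs, word_filters, width)
    char_feat = conv_max_pool(char_vecs, char_filters, width)
    return word_feat + char_feat  # would be fed to a classifier layer

rng = random.Random(0)
word_emb = make_embedding(["good", "bad", "movie", "plot"], 4, rng)  # toy vocabulary
char_emb = make_embedding("abcdefghijklmnopqrstuvwxyz ", 3, rng)
word_filters = make_filters(3, 2, 4, rng)
char_filters = make_filters(3, 2, 3, rng)

features = parallel_features("Good movie bad plot", word_emb, char_emb,
                             word_filters, char_filters)
print(len(features))
```

In a real model the embeddings and filters would be trained jointly with the classifier; the point here is only that both branches see the same input and contribute features side by side.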

Classification of Ovarian Cancer Microarray Data based on Intelligent Systems with Marker gene (선별 시스템 기반 표지 유전자를 포함한 난소암 마이크로어레이 데이터 분류)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.3 / pp.747-752 / 2011
  • Microarray classification typically has two striking attributes: (1) classifier design and error estimation are based on remarkably small samples, and (2) cross-validation error estimation is employed in the majority of papers. Microarray data for ovarian cancer consist of the expression levels of tens of thousands of genes, and there is no systematic procedure for analyzing this information all at once. In this paper, marker genes are selected by ranking genes according to statistics, and popular classification rules (linear discriminant analysis, k-nearest-neighbor, and decision trees) are applied to compare classification accuracy with and without marker gene selection. Applying linear classification analysis to the microarray data set with marker genes selected by the ANOVA method gave the highest classification accuracy, 97.78%, and the lowest prediction error estimate.
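Ranking genes by a statistic and keeping the top-scoring ones as markers can be sketched as follows, using the one-way ANOVA F statistic that the abstract mentions. The expression values, gene names, and function names are invented for illustration and are not taken from the paper.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_genes(expression, labels, top_n):
    """Return the top_n gene names ordered by decreasing F statistic.

    expression: {gene name: [one value per sample]}
    labels:     [class label per sample], aligned with the value lists
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        groups = [[v for v, lab in zip(values, labels) if lab == c] for c in classes]
        scores[gene] = f_statistic(groups)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data: three genes measured on six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "geneA": [5.1, 4.9, 5.0, 1.0, 1.2, 0.9],  # separates the classes strongly
    "geneB": [2.0, 3.0, 2.5, 2.4, 2.1, 2.9],  # barely differs between classes
    "geneC": [3.0, 3.5, 3.2, 2.0, 2.4, 2.1],  # separates the classes moderately
}

markers = rank_genes(expression, labels, top_n=2)
print(markers)
```

The selected markers would then be the only features passed to a classifier such as k-nearest-neighbor, which is how the comparison between selecting and not selecting marker genes can be set up.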

E2GSM: Energy Effective Gear-Shifting Mechanism in Cloud Storage System

  • You, Xindong;Han, GuangJie;Zhu, Chuan;Dong, Chi;Shen, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.10 / pp.4681-4702 / 2016
  • Recently, massive energy consumption in cloud storage systems has attracted great attention in both industry and the research community. However, most solutions use a single method that reduces energy consumption in only one aspect. This paper proposes an energy effective gear-shifting mechanism (E2GSM) for cloud storage systems that saves energy in multiple aspects. E2GSM is built on a data classification mechanism and a data replication management strategy. The data classification mechanism classifies data according to its properties and places it into the corresponding zones. The data replication management strategy determines the minimum number of replicas through a mathematical model and decides on replica placement. Based on these two components, E2GSM can automatically shift gears among the nodes. A mathematical analytical model confirms that the proposed E2GSM is energy effective, and simulation experiments based on Gridsim show that it is cost effective. Compared with other energy-saving mechanisms, E2GSM saves substantial energy at a slight cost in performance while still meeting the user's QoS.
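The abstract does not state the mathematical model used to determine the minimum replica number. As a rough illustration of the kind of calculation involved, the sketch below assumes a common textbook model that is not taken from the paper: nodes fail independently, each node is available with the same probability, and data is available if at least one replica is reachable. The function name and parameters are hypothetical.

```python
def min_replicas(node_availability, target_availability):
    """Smallest n with 1 - (1 - p)**n >= target, assuming independent nodes.

    node_availability:   probability p that a single node is reachable
    target_availability: required probability that at least one replica is reachable
    """
    if not 0 < node_availability < 1:
        raise ValueError("node_availability must be strictly between 0 and 1")
    if not 0 < target_availability < 1:
        raise ValueError("target_availability must be strictly between 0 and 1")
    n = 1
    # Add replicas until the chance that all of them are down is small enough.
    while 1 - (1 - node_availability) ** n < target_availability:
        n += 1
    return n

print(min_replicas(0.8, 0.99))    # 1 - 0.2**3 = 0.992 is the first value >= 0.99
print(min_replicas(0.9, 0.9995))  # 1 - 0.1**4 = 0.9999 is the first value >= 0.9995
```

Keeping the replica count at this minimum matters for energy because every additional replica keeps another node busy that could otherwise be shifted to a lower-power state.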

Application of genetic algorithms to cluster analysis

  • Tagami, Takanori;Miyamoto, Sadaaki;Mogami, Yoshio
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1993.10b / pp.64-69 / 1993
  • The aim of the present paper is to show the effectiveness of genetic algorithms for data classification problems in which the classification criterion is not the Euclidean distance. In particular, to improve the search performance of the genetic algorithm, we introduce the concept of the degree of population diversity, and propose a construction of genetic operators and a method of calculating the fitness of an individual that use this degree. We then investigate their performance through numerical simulations.
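The abstract does not define its degree of population diversity, so the sketch below uses one simple stand-in: the mean pairwise Hamming distance between chromosomes, normalised to the range 0 to 1, with a mutation rate that rises as diversity falls. The chromosome encoding (one cluster label per data point), the rate bounds, and the function names are all assumptions for illustration.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def population_diversity(population):
    """Mean pairwise Hamming distance, normalised by chromosome length."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    length = len(population[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)

def adaptive_mutation_rate(diversity, low=0.01, high=0.2):
    """Mutate more aggressively when the population has converged."""
    return high - (high - low) * diversity

# Hypothetical population: each chromosome assigns four points to cluster 0 or 1.
population = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]

diversity = population_diversity(population)
print(diversity, adaptive_mutation_rate(diversity))
```

A measure like this is independent of the clustering criterion itself, which fits a setting where the criterion is not the Euclidean distance.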


Family System Model and Adolescent Adjustment - The Olson Circumplex and Beavers Systems Models - (가족체계모델과 청소년의 적응)

  • 전귀연
    • Korean Journal of Human Ecology / v.2 no.1 / pp.38-51 / 1999
  • The purpose of this study was to test the validity of the Olson Circumplex Model and the Beavers Systems Model in relation to adolescent adjustment. The 830 subjects were selected from the second grade of middle and high schools and adolescents of Juvenile Judge in the city of Taegu. The survey instruments were FACESIII, SFIII, State-Trait Anxiety Inventory, Depression Scale, and Delinquency Scale. Factor analysis, Cronbach's $\alpha$, MANOVA, and the Scheffé test were used for the data analysis. The major findings of this study were as follows: 1) The family system classification method of the Olson Circumplex Model was partially useful in evaluating anxiety, depression, and delinquency of adolescents. 2) The family system classification method of the Beavers Systems Model was partially useful in evaluating anxiety and depression of adolescents. (Korean J Human Ecology 2(1) : 38~51, 1999)


A Study on the Service Integration of Traditional Chatbot and ChatGPT (전통적인 챗봇과 ChatGPT 연계 서비스 방안 연구)

  • Cheonsu Jeong
    • Journal of Information Technology Applications and Management / v.30 no.4 / pp.11-28 / 2023
  • This paper proposes a method of integrating ChatGPT with traditional chatbot systems to enhance conversational artificial intelligence (AI) and create more efficient conversational systems. Traditional chatbot systems are primarily based on classification models and are limited to intent classification and simple response generation. In contrast, ChatGPT is a state-of-the-art AI technology for natural language generation, which can generate more natural and fluent conversations. In this paper, we analyze the business service areas in which ChatGPT and traditional chatbots can be integrated, and present methods for conducting conversational scenarios through case studies of service types. Additionally, we suggest ways to integrate ChatGPT with traditional chatbot systems for intent recognition, conversation flow control, and response generation. We provide a practical implementation example of integrating ChatGPT with traditional chatbots, making it easier to understand and build such integrations and to actively use ChatGPT with existing chatbots.

Different approaches towards fuzzy database systems: A Survey

  • Rundensteiner, Elke A.;Hawkes, Lois Wright
    • Journal of the Korean Institute of Intelligent Systems / v.3 no.1 / pp.65-75 / 1993
  • Fuzzy data is a phenomenon often occurring in real life. There is the inherent vagueness of classification terms referring to a continuous scale, the uncertainty of linguistic terms such as "I almost agree" or the vagueness of terms and concepts due to the statistical variability in communication [20] and many more. Previously, such fuzzy data was approximated by non-fuzzy (crisp) data, which obviously did not lead to a correct and precise representation of the real world. Fuzzy set theory has been developed to represent and manipulate fuzzy data [18]. Explicitly managing the degree of fuzziness in databases allows the system to distinguish between what is known, what is not known and what is partially known. Systems in the literature whose specific objective is to handle imprecision in databases present various approaches. This paper is concerned with the different ways uncertainty and imprecision are handled in database design. It outlines the major areas of fuzzification in (relational) database systems.
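As a small illustration of representing a vague classification term over a continuous scale, the sketch below models the term "young" with a trapezoidal membership function and runs a fuzzy selection over a few records. It is not drawn from any system covered by the survey; the breakpoints, the records, and the function names are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set with a <= b <= c <= d.

    The degree is 1 on [b, c], 0 outside (a, d), and linear in between.
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def fuzzy_select(rows, key, membership, threshold):
    """Return (row, degree) pairs whose degree meets the threshold, best first."""
    graded = [(row, membership(row[key])) for row in rows]
    kept = [(row, deg) for row, deg in graded if deg >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

def young(age):
    # Hypothetical breakpoints: fully "young" up to 25, not at all from 40 on.
    return trapezoid(age, 0, 0, 25, 40)

people = [
    {"name": "Ann", "age": 22},
    {"name": "Bo", "age": 30},
    {"name": "Cy", "age": 45},
]

result = fuzzy_select(people, "age", young, threshold=0.5)
for row, degree in result:
    print(row["name"], round(degree, 2))
```

A crisp query would have to draw a hard line at some age, whereas the graded result keeps the partially matching record together with its degree of membership.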


The AS4059 Hydraulic System Cleanliness Classification System: Replacement of NAS1638

  • Day, Mik;Hong, Jeong-Hee
    • Journal of Drive and Control / v.9 no.2 / pp.39-45 / 2012
  • The NAS 1638 cleanliness classification system was originally developed in 1966 by the US Aircraft Industries of America to both simplify reporting of particle count data and to control the introduction of dirt during the assembly of aircraft fluid systems. The numbers of particles at stated sizes are represented by broad bands where the interval was generally a doubling of contamination. A number of systems have been introduced since this to suit differing requirements. NAS 1638 and AS4059 are used in other industrial sectors such as the Off-shore & Sub-Sea and the Primary Metal Industries. The changes to ISO contamination measurement standards controlled by ISO/TC131/SC6 in 1999 meant that a revision of most of these classification systems was necessary. The body responsible for NAS 1638 decided to withdraw it for new installations and replace it with an update of an existing standard, SAE AS 4059. This paper details the philosophy behind the contamination coding systems, the reasons for the changes to the ISO contamination standards and explains the workings of AS 4059, the replacement for NAS 1638. It goes on to detail the latest changes to this standard.