• Title/Abstract/Keywords: Domain classification


Multi-channel Long Short-Term Memory with Domain Knowledge for Context Awareness and User Intention

  • Cho, Dan-Bi;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Information Processing Systems
    • /
    • Vol. 17, No. 5
    • /
    • pp.867-878
    • /
    • 2021
  • In context awareness and user intention tasks, dataset construction is expensive because domain-specific data are required. Although pretraining with a large corpus can effectively resolve the lack of data, it ignores domain knowledge. Herein, we concentrate on data domain knowledge while addressing data scarcity and accordingly propose a multi-channel long short-term memory (LSTM). Because multi-channel LSTM integrates pretrained vectors such as task and general knowledge, it effectively prevents catastrophic forgetting between vectors of task and general knowledge to represent the context as a set of features. To evaluate the proposed model against the baseline model, which is a single-channel LSTM, we performed two tasks: voice phishing with context awareness and movie review sentiment classification. The results verified that multi-channel LSTM outperforms single-channel LSTM in both tasks. We further experimented on different multi-channel LSTMs depending on the domain and data size of general knowledge in the model and confirmed the effect of multi-channel LSTM in integrating the two types of knowledge, from downstream task data and raw data, to overcome the lack of data.

Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 20, No. 4
    • /
    • pp.271-281
    • /
    • 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches for classification of variable stars have been studied. Several features that explain the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) were usually used to classify variable stars. Excluding other factors but focusing only on the curve shapes of variable stars, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in accommodating some features of the light curve data, which are unequally spaced in the phase domain and have some functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis, and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to the data sets of variable stars from the project STellar Astrophysics and Research on Exoplanets (STARE).

Learning Deep Representation by Increasing ConvNets Depth for Few Shot Learning

  • Fabian, H.S. Tan;Kang, Dae-Ki
    • International journal of advanced smart convergence
    • /
    • Vol. 8, No. 4
    • /
    • pp.75-81
    • /
    • 2019
  • Though recent advances in deep learning methods have provided satisfactory results in large-data domains, they yield poor performance on few-shot classification tasks. Training a model with strong performance, i.e., a deep convolutional neural network, depends heavily on a huge dataset, and the number of labeled classes in the dataset can be extremely large. The cost of human annotation and the scarcity of data among the classes have drastically limited the capability of current image classification models. In contrast, humans are excellent at learning or recognizing new, unseen classes from merely a small set of labeled examples. Few-shot learning aims to train a classification model with limited labeled samples to recognize new classes that were never seen during the training process. In this paper, we increase the backbone depth of the embedding network in order to learn the intra-class variation. By increasing the network depth of the embedding module, we are able to achieve competitive performance due to the minimized intra-class variation.

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 2
    • /
    • pp.418-434
    • /
    • 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers when used for text categorization. However, kNN suffers from limitations such as high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods to improve kNN through data reduction. However, previous works have evaluated those approaches only on structured datasets. In addition, their performance has not been examined in the text categorization domain, where the dimensionality and size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of various aspects such as classification accuracy, classification efficiency, and data reduction.

Terrain Classification Using Three-Dimensional Co-occurrence Features

  • 진문광;우동민;이규원
    • 대한전기학회논문지:시스템및제어부문D
    • /
    • Vol. 52, No. 1
    • /
    • pp.45-50
    • /
    • 2003
  • Texture analysis has been efficiently utilized in the area of terrain classification. In this application, features have been obtained in the 2D image domain. This paper suggests 3D co-occurrence texture features by extending the concept of co-occurrence to the 3D world. The suggested 3D features are described using a co-occurrence histogram of digital elevations at two contiguous positions as the co-occurrence matrix. The practical construction of the co-occurrence matrix limits the number of levels of digital elevation. If the digital elevation is quantized into that number of levels over the whole DEM (Digital Elevation Map), the distinctive features cannot be obtained. To resolve the quantization problem, we employ a local quantization technique which preserves the variation of elevations. Experiments have been carried out to verify the proposed 3D co-occurrence features, and the addition of the suggested features significantly improves the classification accuracy.
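
The idea of counting co-occurring elevation levels after a local quantization can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the function names `local_quantize` and `cooccurrence`, the choice of min/max scaling within a patch, the number of levels, and the horizontal neighbor offset are all assumptions made for illustration.

```python
import numpy as np

def local_quantize(patch, levels=8):
    """Quantize elevations using the patch's own min/max (assumed local scheme)."""
    lo, hi = patch.min(), patch.max()
    if hi == lo:
        # A flat patch carries no variation; map everything to level 0.
        return np.zeros(patch.shape, dtype=int)
    scaled = (patch - lo) / (hi - lo)  # values in [0, 1]
    return np.minimum((scaled * levels).astype(int), levels - 1)

def cooccurrence(q, levels=8, offset=(0, 1)):
    """Count pairs of quantized levels at positions separated by `offset`.

    `offset` is (row step, column step); both are assumed non-negative.
    """
    dr, dc = offset
    rows, cols = q.shape
    first = q[:rows - dr, :cols - dc]
    second = q[dr:, dc:]
    matrix = np.zeros((levels, levels), dtype=int)
    # np.add.at accumulates correctly when the same index pair repeats.
    np.add.at(matrix, (first.ravel(), second.ravel()), 1)
    return matrix

# A small synthetic elevation patch: a uniform slope rising to the right.
dem = np.array([[100.0, 101.0, 102.0, 103.0],
                [100.0, 101.0, 102.0, 103.0],
                [100.0, 101.0, 102.0, 103.0]])
q = local_quantize(dem, levels=4)
m = cooccurrence(q, levels=4)
```

Because the patch is rescaled by its own range, a gentle slope at high altitude and the same slope at low altitude would give the same matrix, which is one plausible reading of "preserves the variation of elevations".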

Classification of Fingerprints using Fast Fourier Transform

  • 이정문;박신재;권용호
    • 산업기술연구
    • /
    • Vol. 18
    • /
    • pp.295-302
    • /
    • 1998
  • Classification of fingerprints is one of the major subjects that many researchers have studied for efficient identification. However, fingerprints must be preprocessed in various ways before being classified. Factors such as accuracy and processing time should be considered in the classification of fingerprints. In this paper, we propose a method for classifying fingerprints into several frequent patterns. This method consists of two stages. A fingerprint image is first converted to a skeleton form to find its center. Then it is identified as a member of one of the preclassified patterns by its frequency-domain feature. Experiments show that the proposed method is quite useful in classifying fingerprints into typical patterns.


Improving the Subject Independent Classification of Implicit Intention By Generating Additional Training Data with PCA and ICA

  • Oh, Sang-Hoon
    • International Journal of Contents
    • /
    • Vol. 14, No. 4
    • /
    • pp.24-29
    • /
    • 2018
  • EEG-based brain-computer interfaces have focused on explicitly expressed intentions to assist physically impaired patients. For EEG-based brain-computer interfaces to function effectively, they should be able to understand users' implicit information. Since it is hard to gather EEG signals of human brains, we do not have enough training data, which are essential for proper classification performance of implicit intention. In this paper, we improve the subject-independent classification of implicit intention through the generation of additional training data. In the first stage, we perform PCA (principal component analysis) of the training data to remove redundant components within the input data. After the dimension reduction by PCA, we train an ICA (independent component analysis) network whose outputs are statistically independent. We can get additional training data by adding Gaussian noise to the ICA outputs and projecting them to the input data domain. Through simulations with EEG data provided by CNSL, KAIST, we improve the classification performance from 65.05% to 66.69% with Gamma components. The proposed sample generation method can be applied to any machine learning problem with few samples.

Search of hemodialysis nursing behaviors and Estimation of hemodialysis nursing costs at a tertiary hospital

  • 심원희;박정호
    • 간호행정학회지
    • /
    • Vol. 5, No. 2
    • /
    • pp.297-316
    • /
    • 1999
  • The purpose of this study is to identify and analyze the hemodialysis nursing behaviors performed by hemodialysis room nurses, then to estimate hemodialysis nursing costs and obtain basic data for the development of proper nursing costs. First, hemodialysis nursing behaviors at the hemodialysis room of a tertiary hospital in Seoul were identified and classified. After the content validity was verified by 6 experts, a tool of hemodialysis nursing behaviors was developed. Patients who received hemodialysis were classified with a dialysis patient classification tool. The researcher observed the hemodialysis nursing behaviors applied to the classified patients every 5 minutes. Then the hemodialysis nursing hours spent on each class of patients were calculated. The direct and indirect expenditures were estimated. Ultimately, hemodialysis nursing costs were estimated. The results of the study were as follows: 1. Hemodialysis nursing behaviors were grouped by the same knowledge and skills; then their content validity was verified by expert groups using an evaluation tool of nursing intervention classification. They consisted of 9 hemodialysis activity domains and 71 hemodialysis nursing behaviors. The predialysis activity domain included 15 nursing behaviors, the start-dialysis activity domain 12, the during-dialysis activity domain 9, the finish-dialysis activity domain 5, the after-dialysis activity domain 5, the nursing documentation & undertaking and transferring domain 5, the supply, drug, equipment & environment management activity domain 7, the patient emotional support & education activity domain 4, and the emergency activity domain 9. 2. The acute hemodialysis nursing hours were 106.42 minutes per dialysis and the chronic hemodialysis nursing hours were 72.23 minutes per dialysis. 3. The direct expenditure was 11,971 won per hour and the indirect expenditure was 288 won. 4. Finally, the cost of acute hemodialysis was 21,745 won and that of chronic hemodialysis was 14,759 won. The identified hemodialysis nursing behaviors can be used as a hemodialysis nursing care standard and can contribute to higher-quality care. The estimated hemodialysis nursing costs can be used as fundamental data for the development of proper nursing costs.
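
The reported figures can be cross-checked with a short calculation. The sketch below rests on assumptions the abstract does not state: that the direct expenditure reads as 11,971 won per hour, that the 288 won indirect expenditure is also an hourly rate, and that cost equals the combined hourly rate multiplied by the nursing time. The function name `nursing_cost` is hypothetical.

```python
# Hourly rates in won. Reading the direct rate as 11,971 and treating the
# indirect figure as hourly are both assumptions, not statements from the abstract.
DIRECT_PER_HOUR = 11_971
INDIRECT_PER_HOUR = 288

def nursing_cost(minutes):
    """Assumed model: (direct + indirect hourly rate) x nursing time in hours."""
    return (DIRECT_PER_HOUR + INDIRECT_PER_HOUR) * minutes / 60

acute = nursing_cost(106.42)    # acute hemodialysis, minutes per dialysis
chronic = nursing_cost(72.23)   # chronic hemodialysis, minutes per dialysis

print(f"acute:   {acute:,.0f} won (reported 21,745)")
print(f"chronic: {chronic:,.0f} won (reported 14,759)")
```

Under these assumptions both values land within about 2 won of the reported costs, which suggests the small gap is rounding in the published hourly rates or times.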


A Computational Approach for the Classification of Protein Tyrosine Kinases

  • Park, Hyun-Chul;Eo, Hae-Seok;Kim, Won
    • Molecules and Cells
    • /
    • Vol. 28, No. 3
    • /
    • pp.195-200
    • /
    • 2009
  • Protein tyrosine kinases (PTKs) play a central role in the modulation of a wide variety of cellular events such as differentiation, proliferation and metabolism, and their unregulated activation can lead to various diseases including cancer and diabetes. PTKs represent a diverse family of proteins including both receptor tyrosine kinases (RTKs) and non-receptor tyrosine kinases (NRTKs). Due to the diversity and important cellular roles of PTKs, accurate classification methods are required to better understand and differentiate different PTKs. In addition, PTKs have become important targets for drugs, providing a further need to develop novel methods to accurately classify this set of important biological molecules. Here, we introduce a novel statistical model for the classification of PTKs that is based on their structural features. The approach allows for both the recognition of PTKs and the classification of RTKs into their subfamilies. This novel approach had an overall accuracy of 98.5% for the identification of PTKs, and 99.3% for the classification of RTKs.

Classification of Korean Traditional Musical Instruments Using Feature Functions and k-nearest Neighbor Algorithm

  • 김석호;곽경섭;김재천
    • 한국멀티미디어학회논문지
    • /
    • Vol. 9, No. 3
    • /
    • pp.279-286
    • /
    • 2006
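  • A classification method using frequency distribution vectors was applied to the classification and recognition of Korean traditional musical instruments, and an average peak value, which quantifies the rhythm component, was proposed as one of the frequency distribution vectors used for classification. Most frequency processing functions are based on the mean and statistical characteristics of frequency values; for the automatic classification of Korean traditional instruments, experiments were conducted using the signal mean, variance, zero-crossing rate, balance frequency, and average peak value. As a preliminary study for genre classification of Korean traditional music, music signals were processed with these functions and classified with the k-nearest neighbor algorithm. The method achieved a success rate of 94.44%, an improvement over the 87% classification success rate previously reported for Western music using frequency distribution vectors.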
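
As a rough illustration of feature-based k-nearest neighbor classification of signals, the sketch below computes three of the features the abstract names (signal mean, variance, and zero-crossing rate) and classifies by majority vote among the nearest training examples. It is not the authors' method: the balance frequency and average peak value are omitted, the training data are synthetic sine waves rather than instrument recordings, and the names `features`, `knn_predict`, and `tone` and the labels "low" and "high" are hypothetical.

```python
import numpy as np
from collections import Counter

def features(signal):
    """Return [mean, variance, zero-crossing rate] for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    signs = np.sign(signal)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(signs[1:] != signs[:-1])
    return np.array([signal.mean(), signal.var(), zcr])

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    distances = np.linalg.norm(train_x - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def tone(freq_hz, n=1000):
    """Synthetic stand-in for a recording: one second of a pure sine wave."""
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    return np.sin(2 * np.pi * freq_hz * t)

# Two made-up classes that differ mainly in zero-crossing rate.
train_x = np.array([features(tone(f)) for f in (5, 6, 7, 50, 55, 60)])
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_x, train_y, features(tone(8))))
print(knn_predict(train_x, train_y, features(tone(52))))
```

With real recordings the features would normally be scaled to comparable ranges before computing distances, since variance and zero-crossing rate can differ by orders of magnitude.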
