• Title/Summary/Keyword: Classification of Difficulty


Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in fields such as classification, summarization, and generation. Among text analysis tasks, text classification is the most widely used in both academia and industry. Text classification includes binary classification, which assigns one label from two classes; multi-class classification, which assigns one label from several classes; and multi-label classification, which assigns multiple labels from several classes. Multi-label classification in particular requires a different training approach from binary and multi-class classification because each instance carries multiple labels. In addition, as the number of labels and classes grows, the number of labels to be predicted increases, so prediction becomes harder and performance improvement more difficult. To overcome these limitations, label embedding has been actively studied: (i) the initially given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) a model is trained to predict the compressed labels, and (iii) the predicted labels are restored to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels, or compress labels by random transformation, they cannot capture non-linear relationships between labels, and thus cannot create a latent label space that sufficiently preserves the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. A representative approach uses an autoencoder, a deep learning model effective for data compression and restoration. However, traditional autoencoder-based label embedding suffers heavy information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This stems from the vanishing gradient problem that occurs during backpropagation. The skip connection was devised to solve this problem: by adding a layer's input to its output, it prevents gradients from vanishing during backpropagation, enabling efficient learning even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies applying them to autoencoders or the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder, forming a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from a paper abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. As a result, multi-label classification based on the proposed methodology far outperformed traditional multi-label classification methods on accuracy, precision, recall, and F1 score.
This indicates that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was confirmed by comparing its performance across domain characteristics and numbers of latent-label-space dimensions.
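As a rough illustration of the skip-connection idea described above, the following numpy sketch adds a projected copy of each layer's input to its output in both the encoder and the decoder. The dimensions and the randomly initialized weights are illustrative stand-ins, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's): a 1000-dim keyword label space
# compressed to a 64-dim latent label space through a 256-unit hidden layer.
n_labels, n_hidden, n_latent = 1000, 256, 64

def relu(x):
    return np.maximum(x, 0.0)

# Randomly initialized weights stand in for trained parameters in this sketch.
W1 = rng.normal(scale=0.05, size=(n_hidden, n_labels))
W2 = rng.normal(scale=0.05, size=(n_latent, n_hidden))
P_enc = rng.normal(scale=0.05, size=(n_latent, n_labels))   # encoder skip projection
W3 = rng.normal(scale=0.05, size=(n_hidden, n_latent))
W4 = rng.normal(scale=0.05, size=(n_labels, n_hidden))
P_dec = rng.normal(scale=0.05, size=(n_labels, n_latent))   # decoder skip projection

def encode(y):
    # The projected input is added to the layer output, so label information
    # (and gradients) can bypass the hidden layer during backpropagation.
    return W2 @ relu(W1 @ y) + P_enc @ y

def decode(z):
    return W4 @ relu(W3 @ z) + P_dec @ z

y = (rng.random(n_labels) < 0.01).astype(float)   # sparse multi-label vector
z = encode(y)
y_hat = decode(z)
print(z.shape, y_hat.shape)                        # (64,) (1000,)
```

In a real pipeline, the classifier would be trained to predict `z` from the abstract text, and `decode` would map that prediction back to the keyword label space.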

An EEG-fNIRS Hybridization Technique in the Multi-class Classification of Alzheimer's Disease Facilitated by Machine Learning (기계학습 기반 알츠하이머성 치매의 다중 분류에서 EEG-fNIRS 혼성화 기법)

  • Ho, Thi Kieu Khanh;Kim, Inki;Jeon, Younghoon;Song, Jong-In;Gwak, Jeonghwan
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.305-307
    • /
    • 2021
  • Alzheimer's Disease (AD) is a cognitive disorder characterized by memory impairment that can be assessed at early stages with clinical tests. However, the AD pathophysiological mechanism is still poorly understood because distinguishing different levels of AD severity is difficult, even with a variety of brain modalities. Therefore, in this study, we present a hybrid of EEG and fNIRS modalities that compensate for each other's weaknesses, aided by Machine Learning (ML) techniques, to classify four subject groups: healthy controls (HC) and three distinguishable levels of AD severity. A concurrent EEG-fNIRS setup was used to record data from 41 subjects during Oddball and 1-back tasks. We employed a traditional neural network (NN) for fNIRS and a CNN-LSTM hybrid model for EEG, and obtained the final prediction by majority voting over those models. Classification results indicated that the hybrid EEG-fNIRS feature set, combining the modalities' complementary properties, achieved higher accuracy (71.4%) than EEG (67.9%) or fNIRS alone (68.9%). These findings demonstrate the potential of an EEG-fNIRS hybridization technique coupled with ML-based approaches for further AD studies.
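The majority-voting step can be sketched as follows. The three prediction arrays are hypothetical stand-ins for the study's component models (e.g., per-task fNIRS and EEG models), not its actual outputs.

```python
import numpy as np
from collections import Counter

# Hypothetical class predictions (0=HC, 1-3=AD severity) for five subjects
# from three component models; the real study's models and counts differ.
preds = np.array([
    [0, 1, 2, 3, 1],   # model A
    [0, 1, 2, 2, 1],   # model B
    [0, 3, 2, 3, 1],   # model C
])

def majority_vote(preds):
    """Final label per subject = most frequent label across the models."""
    return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])

final = majority_vote(preds)
print(final.tolist())   # [0, 1, 2, 3, 1]
```

With an even number of voters, a real pipeline would need a tie-breaking rule, e.g., weighting votes by each model's predicted probability.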

Support Vector Machines-based classification of video file fragments (서포트 벡터 머신 기반 비디오 조각파일 분류)

  • Kang, Hyun-Suk;Lee, Young-Seok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.1
    • /
    • pp.652-657
    • /
    • 2015
  • BitTorrent is an innovative file-sharing and file-transfer protocol that allows users to receive pieces of files from multiple sharers on the Internet and assemble them into complete files. In reality, however, the free distribution of illegal or copyrighted video data constitutes a crime. Regulating copyright on BitTorrent is difficult because data is transferred as file fragments rather than complete files. Therefore, classifying the file formats of digital content must come first in order to restore digital content from the file pieces received over BitTorrent and to check for copyright violations. This study suggests an SVM classifier for digital file classification that uses the histogram differential of file fragments as its feature vector. The suggested classifier was evaluated by applying it to three different video file formats while varying the division factor.
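A minimal sketch of an SVM over a histogram-differential feature, using synthetic byte distributions in place of real video fragments (the paper's exact descriptor, formats, and data are not reproduced here):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

def histogram_diff_feature(fragment, bins=64):
    """Stand-in descriptor: first difference of a fragment's byte-value histogram."""
    hist, _ = np.histogram(np.frombuffer(fragment, dtype=np.uint8),
                           bins=bins, range=(0, 256), density=True)
    return np.diff(hist)

def make_fragment(fmt, size=4096):
    """Synthetic 'fragments': three pseudo-formats with distinct byte distributions."""
    if fmt == 0:    # uniform bytes (e.g., well-compressed payload)
        data = rng.integers(0, 256, size)
    elif fmt == 1:  # bytes skewed toward the low half
        data = rng.integers(0, 128, size)
    else:           # bytes skewed toward the high half
        data = rng.integers(128, 256, size)
    return data.astype(np.uint8).tobytes()

X = np.array([histogram_diff_feature(make_fragment(f))
              for f in range(3) for _ in range(60)])
y = np.repeat([0, 1, 2], 60)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25,
                                      random_state=0, stratify=y)

clf = SVC(kernel="rbf").fit(Xtr, ytr)
print(round(clf.score(Xte, yte), 2))
```

Because the synthetic distributions are clearly separable, the sketch classifies almost perfectly; real video formats overlap far more, which is why descriptor design matters.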

Implementation on the Uroflowmetry System and Usefulness Estimation of the Uroflow Parameters (요류검사 시스템의 구현과 요류파라미터의 유용성 평가)

  • Han, B.H.;Jeong, D.U.;Kim, U.Y.;Bae, J.W.;Shon, J.M.;Kim, J.H.;Park, J.M.;Chung, M.K.;Jeon, G.R.
    • Proceedings of the IEEK Conference
    • /
    • 2002.06e
    • /
    • pp.293-296
    • /
    • 2002
  • The objective of this study is to develop a uroflowmetry system that can conveniently detect voiding symptoms at home or in the hospital. The hardware was composed of a mechanism part and a system circuit part, and the software was divided into firmware and a PC program. The following experiments were performed to evaluate classification ability and fitness. First, the parameters MFR, AFR, VOL, VT, FT, and TMF were calculated for each flow-curve pattern, and a statistical analysis examined significant differences in the extracted parameters between the normal and abnormal groups. Next, an experiment was performed to classify voiding symptoms. Congregate-rate analysis was used to examine the possibility of classifying the symptoms of BPH, voiding difficulty, detrusor failure, hyperreflexia, and unstable bladder. Uroflow data with these symptoms were divided into normal and abnormal groups using a fuzzy classifier, and the procedure was repeated with additional groups. The fuzzy classification result using MFR and AFR, at 89.6%, was superior to the grouping evaluation that included VOL.

Method for Inferring Format Information of Data Field from CAN Trace (CAN 트레이스 분석을 통한 데이터 필드 형식 추론 방법 연구)

  • Ji, Cheongmin;Kim, Jimin;Hong, Manpyo
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.1
    • /
    • pp.167-177
    • /
    • 2018
  • As attacks on vehicles have increased, studies on CAN-based security technologies are being actively carried out. However, since the upper-layer protocol of CAN differs by vehicle manufacturer and model, research such as developing anomaly detection for CAN or finding ECU vulnerabilities is very difficult. In this paper, we propose a method to infer the detailed structure of the data field of a CAN frame by analyzing CAN traces, mitigating this problem. In the Internet environment, much research on reverse-engineering proprietary protocols has already been carried out; however, the CAN bus has a structure to which existing protocol reverse-engineering techniques are difficult to apply directly. We therefore propose new field classification methods with low computational cost, based on the characteristics of data in CAN frames and an existing field classification method. The proposed methods are verified through an implementation that analyzes CAN traces generated by simulated CAN communication and by actual vehicles. They show higher field classification accuracy at lower computational cost than the existing method.
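One common low-cost heuristic for this kind of field inference, sketched here on an invented payload layout (a counter byte, a 16-bit sensor value, constant padding) rather than the paper's exact method, is to measure per-bit flip rates across consecutive frames of one CAN ID: boundaries between high-flip and constant bit regions suggest field boundaries.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical 8-byte CAN data-field payloads for a single CAN ID.
payloads = np.zeros((n, 8), dtype=np.uint8)
payloads[:, 0] = np.arange(n) % 256            # byte 0: message counter
sensor = rng.integers(0, 65536, n)             # bytes 1-2: little-endian sensor value
payloads[:, 1] = sensor & 0xFF
payloads[:, 2] = sensor >> 8
payloads[:, 3:] = 0x7F                         # bytes 3-7: constant padding

def bit_flip_rate(payloads):
    """Per-bit toggle frequency across consecutive frames."""
    bits = np.unpackbits(payloads, axis=1)     # shape (n, 64), MSB first per byte
    return (bits[1:] != bits[:-1]).mean(axis=0)

rates = bit_flip_rate(payloads)
constant_bits = rates == 0
# Bits 24-63 (padding) never flip; bits 0-23 (counter + sensor) all flip.
print(constant_bits[24:].all(), constant_bits[:24].any())   # True False
```

The counter byte additionally shows a characteristic gradient (its low-order bit flips on every frame, higher-order bits progressively less often), which distinguishes counters from random-looking sensor fields.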

An Analysis of Content Validity of Behavioral Domain of Descriptive Tests and Factors that Affect Content Validity: Focus on the Fifth and Sixth Grade Science (초등학교 과학과 5, 6학년 서술형 평가문항의 행동영역 내용타당도 및 이에 영향을 미치는 요인 분석)

  • Choi, Jung-In;Paik, Seoung-Hye
    • Journal of The Korean Association For Science Education
    • /
    • v.36 no.1
    • /
    • pp.87-101
    • /
    • 2016
  • This study analyzes the content validity of descriptive tests developed for elementary schools in order to acquire basic data for improving them. Various descriptive tests were collected and tested for differences in proportions between the two-dimensional classification of educational objectives and the level of behavioral objectives. The results show that the descriptive tests developed by elementary school teachers focused mainly on "knowledge" and "understanding," and that content validity for behavioral levels was low. Nine elementary school teachers were interviewed to understand this result. From the interviews, we found both internal and external factors that cause low content validity. The main internal factors were teachers' ability to construct a two-dimensional classification of educational objectives, teachers' consideration of students' level, item difficulty, ease of scoring, and path dependence. The main external factors were the curriculum, parents, and administration. Based on these results, we suggest factors related to elementary school teachers' PCK of descriptive tests.

An Experimental Study on the Development of a Book Recommendation System Using Automatic Classification, Based on the Personality Type (자동분류기반 성격 유형별 도서추천시스템 개발을 위한 실험적 연구)

  • Cho, Hyun-Yang
    • Journal of Korean Library and Information Science Society
    • /
    • v.48 no.2
    • /
    • pp.215-236
    • /
    • 2017
  • The purpose of this study is to develop an automatic classification system for recommending books appropriate to the nine enneagram personality types, using book information data reviewed by librarians. The data are book reviews of 501 titles recommended for children and young adults by the National Library for Children and Young Adults. The study is premised on the assumption that most people prefer different types of books depending on their preference or personality type. Performance tests were run on two types of machine learning models, with nonlinear and linear kernels, across 360 model configurations combining six types of index-term weighting and feature selection and ten feature-selection critical-mass values. The results show that LIBLINEAR performs better than LibSVM (RBF kernel). Although the performance of the developed system is below expectations, considering the high difficulty of personality-type-based classification it is meaningful as an early-stage experimental result.
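The LIBLINEAR-versus-LibSVM comparison can be sketched with scikit-learn's wrappers (LinearSVC is backed by LIBLINEAR, SVC by LibSVM). The toy review snippets and type labels below are invented for illustration; the study's 501-title corpus and its weighting schemes are not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC, SVC

# Invented review snippets with hypothetical enneagram-type labels (2, 5, 8).
docs = [
    "a gentle story about helping friends with kindness",
    "caring friendship and helping others in a warm tale",
    "an adventurous quest full of danger and daring heroes",
    "brave heroes on a daring and dangerous adventure",
    "quiet reflection and solitary observation of nature",
    "a solitary thinker observing the quiet natural world",
]
labels = [2, 2, 8, 8, 5, 5]

# Index-term weighting: plain tf-idf here; the study compared six schemes.
X = TfidfVectorizer().fit_transform(docs)

scores = {}
for clf in (LinearSVC(), SVC(kernel="rbf")):   # LIBLINEAR vs LibSVM backends
    scores[type(clf).__name__] = clf.fit(X, labels).score(X, labels)
print(scores)
```

On a corpus this tiny both models fit the training data easily; the linear-versus-RBF gap the study reports only becomes meaningful on realistic, high-dimensional sparse text features.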

Assessing Techniques for Advancing Land Cover Classification Accuracy through CNN and Transformer Model Integration (CNN 모델과 Transformer 조합을 통한 토지피복 분류 정확도 개선방안 검토)

  • Woo-Dam SIM;Jung-Soo LEE
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.27 no.1
    • /
    • pp.115-127
    • /
    • 2024
  • This research aimed to construct models with various structures based on the Transformer module and to perform land cover classification, thereby examining the applicability of the Transformer module. For land cover classification, the Unet model, which has a CNN structure, was selected as the base model, and a total of four deep learning models were constructed by combining the encoder and decoder parts with the Transformer module. During training, each model was trained 10 times under the same conditions to evaluate generalization performance. The evaluation of classification accuracy showed that Model D, which used the Transformer module in both the encoder and decoder, achieved the highest overall accuracy, averaging approximately 89.4% with an average Kappa coefficient of about 73.2%. In terms of training time, CNN-based models were the most efficient; however, Transformer-based models improved classification accuracy by an average of 0.5% in terms of the Kappa coefficient. Refining the model by considering variables such as hyperparameters and image patch sizes during integration with CNN models is considered necessary. A common issue identified in all models during land cover classification was difficulty in detecting small-scale objects. To improve this misclassification, exploring high-resolution input data and integrating multidimensional data that includes terrain and texture information is deemed necessary.
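The two reported metrics, overall accuracy and the Kappa coefficient, can be computed as follows. The per-pixel labels are hypothetical, chosen only to make the arithmetic transparent.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical per-pixel land-cover labels (0=forest, 1=urban, 2=water)
# for a flattened 100-pixel patch.
y_true = np.array([0] * 70 + [1] * 20 + [2] * 10)
y_pred = y_true.copy()
y_pred[:8] = 1   # misclassify 8 forest pixels as urban

oa = accuracy_score(y_true, y_pred)        # observed agreement p_o
kappa = cohen_kappa_score(y_true, y_pred)  # (p_o - p_e) / (1 - p_e)
print(round(oa, 3), round(kappa, 3))       # 0.92 0.84
```

Kappa discounts the agreement expected by chance (here p_e = 0.5), which is why it sits below the raw 92% accuracy and why the study tracks it alongside overall accuracy for imbalanced land-cover classes.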

Traffic Safety Countermeasures According to the Accident Area Patterns and Impact Factor Analysis of the Large-scale Traffic Accident Locations (대형 교통사고 발생지점 유형화와 영향요인 분석에 따른 교통안전대책 방안에 관한 연구)

  • Kim, Bong-Gi;Jeong, Heon-Yeong;Go, Sang-Seon
    • Journal of Korean Society of Transportation
    • /
    • v.24 no.1 s.87
    • /
    • pp.39-52
    • /
    • 2006
  • This study grouped large-scale traffic accident locations by their characteristics using cluster analysis. Quantification II and Classification and Regression Tree (CART) methods were then used to evaluate the influence of each crash type. After these analyses, we tested the fitness of the results and suggested a simplification of the quantification index. In the discriminant and classification analysis of the four groups, obvious differences were observed among groups according to crash-type characteristics. Thus, traffic accident countermeasures and supplementary measures could be suggested systematically by group. However, many missing values in the variables caused a large loss of data and made more detailed analysis difficult. Given this difficulty, mandatory recording of log files in a standardized format is recommended to prevent this problem in advance.
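The two-step procedure, clustering locations and then fitting a classification tree to recover interpretable rules for the groups, can be sketched as follows. The three features are synthetic stand-ins for the study's accident variables, not its real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)

# Synthetic accident-location features (e.g., curvature, speed, traffic volume)
# drawn around three well-separated centers.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 3))
               for c in (0.0, 2.0, 4.0)])

# Step 1: cluster locations into groups with similar characteristics.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: a shallow classification tree (analogous to the study's CART step)
# yields explicit threshold rules that describe each group.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, groups)
print(round(tree.score(X, groups), 2))
```

On real accident data, the tree's split thresholds (rather than its accuracy) are the useful output: they name the variable ranges that characterize each accident-location group.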

A Promotion Plan through Measuring the Utilization of Information Classification Systems in the Construction Industry (건설정보 분류체계 활용도 측정을 통한 분류체계 활성화 방안)

  • Park, Hwan-Pyo;Lee, Jae-Seob
    • Korean Journal of Construction Engineering and Management
    • /
    • v.5 no.6 s.22
    • /
    • pp.90-100
    • /
    • 2004
  • The importance of information management has been emphasized in the Korean construction industry, and both the public and private sectors have invested in establishing and operating construction CALS, construction management (CM), computer-integrated construction (CIC), and earned value management systems (EVMS). A standard construction information classification system is essential to operate these systems, so the Korean government released the Integrated Construction Information Classification System (ICICS) in 2001. However, the ICICS is not widely used in construction due to: 1) the difficulty of changing existing systems, 2) insufficient publicity of the ICICS, and 3) the absence of legal binding force. In particular, construction participants do not recognize the applicability of the ICICS. This research surveyed the degree of recognition and utilization of the ICICS, covering both the customized classification systems used by individual companies and the ICICS itself, and investigated the degree of utilization and its drawbacks. The results show that some construction companies use their own classification systems while others use the government's ICICS, and that the degree of recognition is insufficient. Use in design management, specifications, and cost and schedule management is very limited, so publicity and education are critical to induce utilization of the ICICS. The necessity of revision was recommended based on a pilot project study performed to measure the degree of application of the ICICS on real projects. This research therefore proposes a measurement model for information application and analyzes the degree of utilization of the ICICS in different phases of construction.