• Title/Summary/Keyword: Input output analysis


Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.95-108
    • /
    • 2017
  • Recently, AlphaGo, Google DeepMind's Baduk (Go) artificial intelligence program, won a landmark victory over Lee Sedol. Many people believed a machine could never beat a human at Go because, unlike chess, the number of possible move sequences exceeds the number of atoms in the universe, but the result was the opposite of what most predicted. After the match, artificial intelligence came into focus as a core technology of the fourth industrial revolution and attracted attention from various application domains. In particular, deep learning, the core artificial intelligence technique behind the AlphaGo algorithm, has drawn wide interest. Deep learning is already being applied to many problems and performs especially well in image recognition. It also performs well on high-dimensional data such as voice, images, and natural language, where it was difficult to get good performance with existing machine learning techniques. In contrast, however, it is difficult to find deep learning research on traditional business data and structured data analysis. In this study, we investigated whether the deep learning techniques studied so far can be used not only for recognizing high-dimensional data but also for the binary classification problems of traditional business data analysis, such as customer churn analysis, marketing response prediction, and default prediction, and we compared the performance of deep learning techniques with that of traditional artificial neural network models. The experimental data are the telemarketing response data of a bank in Portugal. They include input variables such as age, occupation, loan status, and the number of previous telemarketing contacts, and a binary target variable recording whether the customer intends to open an account.
In this study, to evaluate the applicability of deep learning algorithms and techniques to binary classification problems, we compared the performance of various models using the CNN and LSTM algorithms and dropout, which are widely used deep learning techniques, with that of MLP models, a traditional artificial neural network. However, since not all network design alternatives can be tested, given the nature of artificial neural networks, the experiment was conducted under restricted settings for the number of hidden layers, the number of neurons per hidden layer, the number of output channels (filters), and the application of dropout. The F1 score was used to evaluate model performance, showing how well the models classify the class of interest rather than overall accuracy. The detailed method for applying each deep learning technique in the experiment is as follows. The CNN algorithm recognizes features by reading values adjacent to a specific value, but the distance between business data fields rarely matters because each field is usually independent. In this experiment, we therefore set the CNN filter size to the number of fields so that the network learns the characteristics of the whole record at once, and added a hidden layer to make decisions based on the extracted features. For the model with two LSTM layers, the input direction of the second layer was reversed relative to the first in order to reduce the influence of each field's position. For the dropout technique, neurons were dropped with a probability of 0.5 in each hidden layer. The experimental results show that the model with the highest F1 score was the CNN model using dropout, followed by the MLP model with two hidden layers using dropout.
In this study, we obtained several findings as the experiments proceeded. First, models using dropout make slightly more conservative predictions than those without it and generally classify better. Second, CNN models show better classification performance than MLP models. This is interesting because the CNN performed well on a binary classification problem to which it has rarely been applied, as well as in the fields where its effectiveness has already been proven. Third, the LSTM algorithm seems unsuitable for binary classification problems because its training time is too long relative to the performance improvement. From these results, we can confirm that some deep learning algorithms can be applied to solve business binary classification problems.
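Since the class of interest (customers who open an account) is rare in telemarketing data like this, the abstract's choice of the F1 score over overall accuracy matters. A minimal sketch of that metric, using hypothetical labels:

```python
import numpy as np

def f1_score_binary(y_true, y_pred):
    """F1 score for the positive (interesting) class:
    harmonic mean of precision and recall."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced labels standing in for telemarketing responses.
y_true = [1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
print(round(f1_score_binary(y_true, y_pred), 3))  # → 0.667
```

Note that a model predicting all zeros would score 62.5% accuracy on these labels yet F1 = 0, which is exactly why the study reports F1 rather than accuracy.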

Simultaneous Optimization of a KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.139-157
    • /
    • 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for the base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of the base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-sample t-test of each financial ratio as an input variable against bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using logistic regression with backward feature selection. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other to avoid overfitting. The prediction accuracy on this second portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models, and the Q-statistic values and average classification accuracies of the base classifiers were investigated.
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
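The random subspace KNN ensemble the abstract describes can be sketched as follows. This is a minimal illustration on synthetic data, with hypothetical function names; the paper additionally optimizes the k parameters and feature subsets with a genetic algorithm, which this sketch omits.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Plain k-nearest-neighbors: Euclidean distance, majority vote."""
    preds = []
    for x in X_test:
        idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
        preds.append(np.bincount(y_train[idx]).argmax())
    return np.array(preds)

def random_subspace_knn(X_train, y_train, X_test, n_members=9,
                        subspace_size=3, k=3, seed=0):
    """Random subspace method: each base KNN sees a randomly chosen
    feature subset; member predictions are combined by majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_members):
        feats = rng.choice(X_train.shape[1], size=subspace_size, replace=False)
        votes.append(knn_predict(X_train[:, feats], y_train,
                                 X_test[:, feats], k))
    votes = np.stack(votes)  # shape: (n_members, n_test)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Synthetic stand-in for financial-ratio data: "bankrupt" firms (class 1)
# are shifted upward in every feature.
data_rng = np.random.default_rng(1)
X_train = np.vstack([data_rng.normal(0.0, 1.0, (40, 6)),
                     data_rng.normal(3.0, 1.0, (40, 6))])
y_train = np.array([0] * 40 + [1] * 40)
X_test = np.vstack([data_rng.normal(0.0, 1.0, (5, 6)),
                    data_rng.normal(3.0, 1.0, (5, 6))])
print(random_subspace_knn(X_train, y_train, X_test))
```

KNN suits this scheme precisely because, as the abstract notes, it is robust to dataset perturbations but sensitive to the feature space, so random feature subsets yield diverse yet individually accurate members.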

A Comparative Study on the Aesthetic Aspect of Design Preferred Between Countries Centering Around the Analysis on the Aesthetic Aspect of Mobile Phone Preferred by Korean and Chinese Consumers - (국가 간 선호 디자인의 심미성요소 비교연구 - 한.중 소비자 선호휴대폰의 심미성요소 분석을 중심으로 -)

  • Jeong Su-Kyoung;Hong Jung-Pyo
    • Science of Emotion and Sensibility
    • /
    • v.9 no.1
    • /
    • pp.49-61
    • /
    • 2006
  • The mobile phone industry has a significant effect on the domestic economy and has taken root as a core item expected to lead the Korean economy for a considerable period. As the mobile phone market has grown, mobile phones are being used by people in a broader age bracket, and the functions and designs preferred by users of various ages are becoming more diverse. As the mobile phone gains greater effect on and meaning in our daily lives, consumers' expectations of it grow. The core function of voice communication is no longer a great concern to consumers; instead, they demand more convenient and friendly information input and output, processing and storage, and designs that are more sophisticated and optimized for the user environment. As modern design grows closer to the objects of traditional high art consumed every day, the aesthetic aspect of design can play an important role as a differentiating factor, creating new value that forms the spiritual and emotional value of human beings and improves the quality of living; moreover, consumers' willingness to buy is determined by the design they prefer most. Accordingly, a new mobile phone design preferred by consumers urgently needs to be developed by shedding light on the factors behind consumer preference, on the basis of an analysis of the aesthetic aspect, arguably the most critical factor in the design process.
Therefore, this study aims to identify the common preferences and differing factors of aesthetic aspects by analyzing the aesthetic aspects of mobile phones preferred by users in each country, and to determine the formative artistic factors of the aesthetic aspects considered important, in order to propose a guideline on the aesthetic aspect of mobile phones that can be applied practically to mobile phone design.

  • PDF

Analysis of Industrial Linkage Effects for Farm Land Base Development Project -With respect to the Hwangrak Benefited Area with Reservoir - (농업생산기반 정비사업의 산업연관효과분석 -황락 저수지지구를 중심으로-)

  • Lim, Jae Hwan;Han, Seok Ho
    • Korean Journal of Agricultural Science
    • /
    • v.26 no.2
    • /
    • pp.77-93
    • /
    • 1999
  • This study aims at identifying the forward and backward linkage effects of the farmland base development project. Since 1962, the starting year of the five-year economic development plans, the Korean Government has continuously carried out farmland base development projects, including integrated agricultural development projects, large- and medium-scale irrigation projects, and the comprehensive development of the four big river basins, including tidal land reclamation and estuary dam construction for all-weather farming. Consequently, the irrigation rate of paddy fields in Korea reached 75% in 1998, and to raise it further the Government procured heavy investment funds from the IBRD, IMF, and OECF, among others. To cope with agricultural problems such as trade liberalization under WTO policy, the Government has tried to address issues such as new farmland base development policy, preservation of farmland, and expansion of farmland to meet future food self-sufficiency. Farmland base development projects have in particular been challenged on environmental and ecological grounds in the evaluation of economic benefits and costs, because the value of non-market goods has not been included. To date, evaluation of the projects' benefits and costs has been confined to the direct incremental value of farm products and the related costs, so the projects' efficiency as a decision-making criterion has shown low levels of economic efficiency. No Korean study could be found at present that estimates economic efficiency using Leontief's input-output analysis of such projects. Accordingly, this study aims at achieving the following objectives: (1) To identify the problems related to the Government's financial support in implementing the proposed projects.
(2) To estimate the backward and forward linkage effects of the proposed project from the viewpoint of the national economy as a whole. To achieve these objectives, the Hwangrak benefited area with reservoir, located in the Seosan-Haemi district of Chungnam Province, was selected as a case study. The main results of the study are summarized as follows: a. The present value of investment and O&M costs amounted to 3,510 million won, and the present value of the value added in related industries was estimated at 5,913 million won over the 70-year economic life of the project. b. The total discounted value of farm products in the concerned industries derived from the project was estimated at 10,495 million won, and the forward and backward linkage effects of the project amounted to 6,760 and 5,126 million won respectively. c. The total number of employment opportunities derived from the related industries over the project life was 3,136 man-years. d. For farmland base development projects, the backward linkage effects estimated by the index of the power of dispersion were larger than the forward linkage effects estimated by the index of the sensitivity of dispersion. On the other hand, the forward linkage effect of the rice production value during the project life was larger than the backward linkage effect. e. The rate of new job creation from the civil engineering works was higher in that field itself than in any other, and the production linkage effects of the project investment were mainly derived from the metal and non-metal sectors. f. According to the industrial linkage effect analysis, farmland base development projects were identified as economically feasible from the viewpoint of the national economy as a whole, even though the economic efficiency of the project decreased markedly owing to the delayed construction period and increased project costs.
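The linkage effects above rest on Leontief input-output analysis. Their mechanics can be sketched with a hypothetical 3-sector technical coefficient matrix (the values are illustrative, not the study's data):

```python
import numpy as np

# Hypothetical technical coefficient matrix A, where a_ij is the input
# from sector i required per unit of output of sector j.
A = np.array([[0.2, 0.3, 0.1],
              [0.1, 0.1, 0.3],
              [0.2, 0.2, 0.2]])
n = A.shape[0]

# Leontief inverse L = (I - A)^-1: total (direct + indirect) requirements
# of output per unit of final demand.
L = np.linalg.inv(np.eye(n) - A)

col_sums = L.sum(axis=0)  # backward linkages: inputs a sector pulls in
row_sums = L.sum(axis=1)  # forward linkages: outputs a sector pushes out

# Normalized dispersion indices: values above 1 mark sectors with
# above-average linkage strength.
power = n * col_sums / L.sum()        # index of the power of dispersion
sensitivity = n * row_sums / L.sum()  # index of the sensitivity of dispersion
print(np.round(power, 3), np.round(sensitivity, 3))
```

By construction both indices average to 1 across sectors, so comparing a sector's index against 1 shows whether its backward or forward pull on the rest of the economy is above average, which is how the study ranks the project's linkage effects.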

  • PDF

An Empirical Comparative Study of the Seaport Clustering Measurement Using Bootstrapped DEA and Game Cross-efficiency Models (부트스트랩 DEA모형과 게임교차효율성모형을 이용한 항만클러스터링 측정에 대한 실증적 비교연구)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association
    • /
    • v.32 no.1
    • /
    • pp.29-58
    • /
    • 2016
  • The purpose of this paper is to show the clustering trend, compare empirical results, and choose clustering ports for three Korean ports (Busan, Incheon, and Gwangyang) by using bootstrapped DEA (Data Envelopment Analysis) and game cross-efficiency models for 38 Asian ports over the period 2003-2013, with four input variables (berth length, depth, total area, and number of cranes) and one output variable (container TEU). The main empirical results are as follows. First, the bootstrapped DEA efficiency of SW and LT is 0.7660 and 0.7341 respectively. Clustering results of the bootstrapped DEA analysis show that the three Korean ports [Busan (6.46%), Incheon (3.92%), and Gwangyang (2.78%)] can increase efficiency in the SW model, but the LT model shows clustering values of -1.86%, -0.124%, and 2.11% for Busan, Gwangyang, and Incheon respectively. Second, the game cross-efficiency model suggests that the Korean ports should be clustered with the ports of Hong Kong, Shanghai, Guangzhou, Ningbo, Port Klang, Singapore, Kaohsiung, Keelung, and Bangkok. This clustering enhances the efficiency of Gwangyang by 0.131% and decreases that of Busan by -1.08% and that of Incheon by -0.009%. Third, the efficiency ranking comparison between the two models using the Wilcoxon signed-rank test matched at the average level of SW (72.83%) and LT (68.91%). The policy implication of this paper is that Korean port policy planners should adopt bootstrapped DEA and game cross-efficiency models when clustering among Asian ports is needed to enhance the efficiency of inputs and outputs. The results of a SWOT (Strength, Weakness, Opportunity, Threat) analysis among the clustered ports should also be considered.
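The bootstrapped DEA and game cross-efficiency models used in the paper are beyond a short sketch, but the envelopment linear program underlying all DEA variants can be illustrated with a plain input-oriented CCR model on hypothetical port data (one input standing in for berth length, one output for TEU):

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU `o`:
    minimize theta subject to X@lam <= theta * x_o, Y@lam >= y_o, lam >= 0.
    X has shape (m inputs, n DMUs); Y has shape (s outputs, n DMUs)."""
    m, n = X.shape
    s = Y.shape[0]
    # Decision variables: [theta, lam_1, ..., lam_n]
    c = np.zeros(n + 1)
    c[0] = 1.0
    # Input rows:  X@lam - theta * x_o <= 0
    A_in = np.hstack([-X[:, [o]], X])
    # Output rows: -Y@lam <= -y_o
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun

# Three hypothetical ports: input (e.g. berth length) and output (TEU).
X = np.array([[1.0, 2.0, 2.0]])
Y = np.array([[1.0, 1.0, 2.0]])
print([round(ccr_efficiency(X, Y, o), 3) for o in range(3)])  # → [1.0, 0.5, 1.0]
```

The second port uses twice the input of the first for the same output, so its efficiency is 0.5; the bootstrapped variant in the paper resamples such scores to correct their bias, and the game cross-efficiency variant has each port evaluated under the other ports' weights.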

Performance Optimization of Numerical Ocean Modeling on Cloud Systems (클라우드 시스템에서 해양수치모델 성능 최적화)

  • JUNG, KWANGWOOG;CHO, YANG-KI;TAK, YONG-JIN
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.27 no.3
    • /
    • pp.127-143
    • /
    • 2022
  • Recently, many attempts have been made to run numerical ocean models in cloud computing environments. A cloud computing environment can be an effective means of implementing numerical ocean models that require large-scale resources, or of quickly preparing modeling environments for global or large-scale grids. Many commercial and private cloud computing systems provide technologies such as virtualization, high-performance CPUs and instances, Ethernet-based high-performance networking, and remote direct memory access for High Performance Computing (HPC). These features facilitate ocean modeling experimentation on commercial cloud computing systems, and many scientists and engineers expect cloud computing to become mainstream in the near future. Analyzing the performance and features of commercial cloud services for numerical modeling is essential for selecting appropriate systems, as this can help minimize execution time and the amount of resources used. The effect of cache memory is large in the processing structure of an ocean numerical model, which handles input/output of data in multidimensional array structures, and network speed is important because of the communication patterns that move large amounts of data. In this study, the performance of the Regional Ocean Modeling System (ROMS), the High Performance Linpack (HPL) benchmarking software package, and the STREAM memory benchmark were evaluated and compared on commercial cloud systems to provide information for the transition of other ocean models into cloud computing. Through analysis of actual performance data and configuration settings obtained from virtualization-based commercial clouds, we evaluated the efficiency of computing resources for various model grid sizes in virtualization-based cloud systems. We found that cache hierarchy and capacity are crucial to the performance of ROMS, which uses huge amounts of memory.
Memory latency is also important to performance. Increasing the number of cores to reduce the running time of numerical modeling is more effective with large grid sizes than with small ones. Our analysis results will serve as a reference for constructing the best cloud computing system to minimize the time and cost of numerical ocean modeling.
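The memory-bandwidth behavior the study measures with STREAM can be illustrated with a minimal triad kernel. This is a hedged Python/NumPy sketch (real STREAM is a compiled C benchmark, so the absolute numbers here are only indicative); the bandwidth typically drops once the arrays spill out of the cache hierarchy, which is the effect the abstract highlights for ROMS.

```python
import time
import numpy as np

def triad_bandwidth(n, repeats=10):
    """STREAM-style triad a = b + scalar * c on n-element float64 arrays.
    Returns an approximate bandwidth in GB/s (best of `repeats` runs;
    temporaries and allocation overhead are ignored)."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    scalar = 3.0
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a = b + scalar * c  # reads b and c, writes a
        best = min(best, time.perf_counter() - t0)
    bytes_moved = 3 * 8 * n  # read b, read c, write a (8 bytes each)
    return bytes_moved / best / 1e9

# Small arrays tend to fit in cache; large ones stream from main memory.
for n in (10_000, 10_000_000):
    print(f"n={n:>10}: {triad_bandwidth(n):.1f} GB/s (illustrative)")
```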

Development of the Information Delivery System for the Home Nursing Service (가정간호사업 운용을 위한 정보전달체계 개발 I (가정간호 데이터베이스 구축과 뇌졸중 환자의 가정간호 전산개발))

  • Park, J.H;Kim, M.J;Hong, K.J;Han, K.J;Park, S.A;Yung, S.N;Lee, I.S;Joh, H.;Bang, K.S
    • Journal of Home Health Care Nursing
    • /
    • v.4
    • /
    • pp.5-22
    • /
    • 1997
  • The purpose of the study was to develop an information delivery system for home nursing service and to demonstrate and evaluate its efficiency. The research was conducted from September 1996 to August 31, 1997. At the first stage, an assessment tool was developed through literature review for patients with cerebrovascular disease, who have the first priority for home nursing service among patients with various health problems at home. Second, after the home care nurse identified patient nursing problems with the assessment tool, the patient classification system developed by Park (1988), comprising 128 nursing activities under 6 categories, was used to identify the home care nurse's activities for patients with CVA at home. The research team held several workshops with 5 clinical nurse experts to refine it, ultimately deriving 110 nursing activities under 11 categories for patients with CVA. At the second stage, algorithms were developed to connect the 110 nursing activities with the patient nursing problems identified by the assessment tool. The computerization of the algorithms proceeded as follows. The algorithms were realized as a computer program using software engineering techniques. Development followed the prototyping method, based on requirement analysis of the software specifications. The basic features of usability, compatibility, adaptability, and maintainability were taken into consideration. Particular emphasis was given to efficient construction of the database. To enhance database efficiency and establish structural cohesion, each data field was categorized with a weight of relevance to the particular disease. This approach permits easy adaptation when numerous diseases are added in the future. In parallel, expandability and maintainability were stressed throughout program development, leading to a modular design.
However, since the number of diseases to be covered increases as the project progresses, and since they are interrelated and coupled with each other, expandability and maintainability should be given high priority. Furthermore, since the system is to be integrated with other medical systems in the future, these properties are very important. The prototype developed in this project is to be evaluated through system testing. There are various evaluation metrics, such as cohesion, coupling, and adaptability, but direct measurement of these metrics is very difficult, so analytical and quantitative evaluation is almost impossible. Therefore, experimental evaluation through test runs by various users will be applied instead. This system testing will provide analysis from the user's viewpoint, and the detailed and additional requirement specifications arising from users' real situations will be fed back into the system model. The degrees of freedom of the input and output will also be improved, and hardware limitations will be investigated. After refinement, the prototype will be used as a design template to develop a more extensive system. In detail, relevant modules will be developed for the various diseases and integrated through a macroscopic design process focusing on inter-modularity, generality of the database, and compatibility with other systems. The Home Care Evaluation System comprises three main modules: (1) general information on a patient, (2) general health status of a patient, and (3) the cerebrovascular disease patient. The general health status module has five sub-modules: physical measurement, vitality, nursing, pharmaceutical description, and emotional/cognitive ability.
The CVA patient module is divided into ten sub-modules, such as subjective sense, consciousness, memory, and language pattern. The typical sub-modules are described in Appendix 3.
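The modular layout described above might be represented as follows. This is purely an illustrative sketch with hypothetical class and attribute names, not the system's actual schema; only the sub-modules the abstract names are listed.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """Hypothetical stand-in for one module of the Home Care
    Evaluation System, holding its named sub-modules."""
    name: str
    sub_modules: list = field(default_factory=list)

home_care_evaluation_system = [
    Module("General information on a patient"),
    Module("General health status", [
        "physical measurement", "vitality", "nursing",
        "pharmaceutical description", "emotional/cognitive ability",
    ]),
    # Four of the ten CVA sub-modules are named in the abstract.
    Module("Cerebrovascular disease (CVA) patient", [
        "subjective sense", "consciousness", "memory", "language pattern",
    ]),
]

for m in home_care_evaluation_system:
    print(f"{m.name}: {len(m.sub_modules)} sub-modules listed")
```

Keeping each disease's fields in its own module, as the abstract describes, is what lets new diseases be added without reworking the shared general-information and health-status modules.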

  • PDF

Evaluation on the Technique Efficiency of Annual Chestnut Production in South Korea (임업생산비통계를 이용한 연도별 밤 생산량의 기술효율성 평가)

  • Won, Hyun-Kyu;Jeon, Ju-Hyeon;Kim, Chul-Woo;Jeon, Hyun-Sun;Son, Yeung-Mo;Lee, Uk
    • Journal of Korean Society of Forest Science
    • /
    • v.105 no.2
    • /
    • pp.247-252
    • /
    • 2016
  • This study was conducted to evaluate the technical efficiency of annual chestnut production in South Korea. Here, technical efficiency is the maximum possible production for a given amount of input costs. For the analysis of technical efficiency we used the output-oriented BCC model, and then analyzed correlations among input costs, production, gross income, net income, and market price per unit in order to determine the cause of variation in technical efficiency. As study materials, we used the forestry production cost statistics for the 7 years from 2008 to 2014. The results showed that the maximum possible production and actual production in 2008, 2009, and 2010 were identical, at 1,568 kg, 1,745 kg, and 1,534 kg per hectare respectively, so the technical efficiency in those years was evaluated as 1.00. In contrast, actual production from 2011 to 2014 was 1,270 kg, 1,047 kg, 1,258 kg, and 1,488 kg per hectare respectively, while the maximum possible production was 1,524 kg, 1,467 kg, 1,635 kg, and 1,637 kg per hectare, giving technical efficiencies of 0.83, 0.71, 0.75, and 0.91 in that order. The lowest technical efficiency was 0.71 in 2012, and the values have increased gradually since 2013. The variation in technical efficiency was related to the relationship between production and market price, which showed a negative correlation of r = -0.821 (p < 0.05). The maximum available production per unit area ranged between a lower limit of 1,488 kg and an upper limit of 1,745 kg, with an average of 1,548 kg.
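An output-oriented efficiency score is, in essence, the ratio of actual to maximum possible production. The sketch below approximately reproduces the reported efficiencies from the abstract's figures (the full BCC model can differ slightly from a simple ratio), and illustrates the Pearson correlation; the market prices are hypothetical stand-ins, since the abstract reports only the correlation coefficient.

```python
import numpy as np

# Actual and DEA maximum possible production (kg/ha), 2011-2014,
# taken from the abstract.
actual  = np.array([1270.0, 1047.0, 1258.0, 1488.0])
maximum = np.array([1524.0, 1467.0, 1635.0, 1637.0])

# Output-oriented technical efficiency: how close actual output comes
# to the frontier output achievable at the same input cost.
te = actual / maximum
print(np.round(te, 2))

# The abstract attributes efficiency swings to the production-price
# relationship; these prices are illustrative, chosen to fall as
# production rises, mimicking the reported negative correlation.
production = np.array([1568, 1745, 1534, 1270, 1047, 1258, 1488])
price = np.array([3.1, 2.6, 3.3, 3.9, 4.6, 4.0, 3.5])  # hypothetical
r = np.corrcoef(production, price)[0, 1]
print(round(r, 3))
```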

Early Identification of Gifted Young Children and Dynamic assessment (유아 영재의 판별과 역동적 평가)

  • 장영숙
    • Journal of Gifted/Talented Education
    • /
    • v.11 no.3
    • /
    • pp.131-153
    • /
    • 2001
  • The importance of identifying gifted children during early childhood is becoming recognized. Nonetheless, most researchers have preferred to study the primary and secondary levels, where children already demonstrate their talents more clearly and where more reliable predictions of giftedness can be made; comparatively little work has been done on early childhood. When we identify giftedness during early childhood, we have to consider young children's potential rather than their actual achievement. Giftedness during early childhood is still developing and is less stable than that of older children, which prevents us from making firm and accurate predictions based on actual achievement. Dynamic assessment, based on Vygotsky's concept of the zone of proximal development (ZPD), suggests a new approach to identifying gifted young children. In light of dynamic assessment, identifying the potential giftedness of young children requires measuring both unassisted and assisted performance. Dynamic assessment usually follows a test-intervene-retest format that focuses on the improvement in a child's performance when an adult provides mediated assistance in mastering the testing task. The advantages of dynamic assessment are as follows. First, it provides a useful means of assessing young gifted children who have not demonstrated high ability under traditional identification methods. Second, it can assess young children's learning process. Third, it can lead to individualized education through the early identification of young gifted children. Fourth, by linking diagnosis and instruction, it can be a more accurate predictor of potential. Thus, it enables us to provide educational treatment effectively for young gifted children.

  • PDF

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in various fields such as classification, summarization, and generation. Among text analysis tasks, text classification is the most widely used in academia and industry. Text classification includes binary classification with one label from two classes, multi-class classification with one label from several classes, and multi-label classification with multiple labels from several classes. Multi-label classification in particular requires a different training method from binary and multi-class classification because of its multiple labels. In addition, since the number of labels to be predicted increases with the number of labels and classes, performance improvement is difficult due to increased prediction difficulty. To overcome these limitations, research on label embedding is being actively conducted, which (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed label, and (iii) restores the predicted label to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only linear relationships between labels or compress labels by random transformation, they cannot capture non-linear relationships between labels, and thus cannot create a latent label space that sufficiently preserves the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. Label embedding using an autoencoder, a deep learning model effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding suffers large information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing gradient problem that occurs during backpropagation. To solve this problem, skip connections were devised: by adding a layer's input to its output, gradient loss during backpropagation is prevented, enabling efficient learning even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using them in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment to predict the compressed keyword vector in the latent label space from the paper abstract and to evaluate multi-label classification by restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators showed far superior performance for multi-label classification based on the proposed methodology compared with traditional multi-label classification methods.
This indicates that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of multi-label classification itself. In addition, the utility of the proposed methodology was confirmed by comparing its performance across domain characteristics and numbers of dimensions of the latent label space.
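The paper's skip-connection autoencoder is beyond a short sketch, but the linear baseline it improves on, PLST-style label-space compression, can be illustrated with a truncated SVD round trip on hypothetical label data. Everything below (label dimensions, the latent size k, the threshold) is an illustrative assumption, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical multi-label matrix: 100 instances x 30 labels that vary
# along only a few latent "topics", so it compresses well linearly.
topics = rng.random((100, 4))
loadings = rng.random((4, 30))
Y = (topics @ loadings > 1.0).astype(float)

# PLST-style linear label embedding: project the centered label matrix
# onto its top-k principal directions, i.e. step (i) of label embedding;
# a real system would train a predictor in this latent space (step ii).
k = 8
Yc = Y - Y.mean(axis=0)
U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
V = Vt[:k].T              # 30 x k projection matrix
Z = Yc @ V                # low-dimensional latent labels (100 x k)

# Step (iii): decode back to the original label space with the
# transposed projection and threshold to recover binary labels.
Y_hat = (Z @ V.T + Y.mean(axis=0) > 0.5).astype(float)
print((Y_hat == Y).mean())  # fraction of label entries recovered
```

Because the projection is purely linear, any non-linear label dependencies are lost in the round trip; the paper's contribution is replacing this linear map with a skip-connection autoencoder so the latent space can preserve such relationships.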