• Title/Summary/Keyword: Random vector


White striping degree assessment using computer vision system and consumer acceptance test

  • Kato, Talita;Mastelini, Saulo Martiello;Campos, Gabriel Fillipe Centini;Barbon, Ana Paula Ayub da Costa;Prudencio, Sandra Helena;Shimokomaki, Massami;Soares, Adriana Lourenco;Barbon, Sylvio Jr.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.32 no.7
    • /
    • pp.1015-1026
    • /
    • 2019
  • Objective: The objective of this study was to evaluate three degrees of white striping (WS), addressing both their automatic assessment and consumer acceptance. WS classification was performed with a computer vision system (CVS), exploring different machine learning (ML) algorithms and the most important image features; the results were then verified against consumer acceptance and purchase intent. Methods: The samples for image analysis were classified by trained specialists into severity degrees based on visual and firmness aspects. Images were obtained with a digital camera, and 25 features were extracted from them. ML algorithms were applied to induce a model capable of classifying the samples into three severity degrees. In addition, two sensory analyses were performed: 75 properly grilled samples were used for the first sensory test, and 9 photos for the second. All tests used a 10-cm hybrid hedonic scale (acceptance test) and a 5-point scale (purchase intention). Results: The information gain metric ranked 13 attributes; however, no single type of image feature was enough to describe the phenomenon. The classification models support vector machine, fuzzy-W, and random forest showed the best results, with similar overall accuracy (86.4%). The worst performance was obtained by the multilayer perceptron (70.9%), with a high error rate on normal (NORM) sample predictions. The sensory analysis of acceptance verified that WS myopathy negatively affects the texture of broiler breast fillets when grilled and the appearance of the raw samples, which influenced the purchase intention scores of raw samples. Conclusion: The proposed system proved to be adequate (fast and accurate) for the classification of WS samples. The sensory analysis of acceptance showed that WS myopathy negatively affects the tenderness of broiler breast fillets when grilled, while the appearance of the raw samples influenced purchase intentions.
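The pipeline this abstract describes (information-gain ranking over 25 image features, then SVM and random-forest classification into three severity degrees) can be sketched compactly. The Python snippet below is a minimal illustration under stated assumptions, not the authors' code; the synthetic feature matrix stands in for the paper's extracted image features.

```python
# Minimal sketch of the abstract's CVS pipeline: rank image features by
# information gain, then classify white-striping severity with two of the
# models the paper reports (SVM, random forest). Data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 25))          # 25 image features per sample (placeholder)
y = rng.integers(0, 3, size=300)        # severity: 0=NORM, 1=moderate, 2=severe

# Information gain (mutual information) ranking, as in the paper.
gain = mutual_info_classif(X, y, random_state=0)
top13 = np.argsort(gain)[::-1][:13]     # the paper keeps 13 ranked attributes

for name, model in {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}.items():
    acc = cross_val_score(model, X[:, top13], y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```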

Prediction of the direction of stock prices by machine learning techniques (기계학습을 활용한 주식 가격의 이동 방향 예측)

  • Kim, Yonghwan;Song, Seongjoo
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.5
    • /
    • pp.745-760
    • /
    • 2021
  • Prediction of stock prices has long been a subject of interest in financial markets, and many studies have been conducted in various directions. As the efficient market hypothesis introduced in the 1970s gained support, the majority opinion came to be that stock prices could not be predicted. However, recent advances in predictive models have led to new attempts to predict future prices. Here, we summarize past studies on price prediction by their evaluation measures, and predict the direction of the stock prices of Samsung Electronics, LG Chem, and NAVER by applying various machine learning models. In addition to widely used technical indicator variables, accounting indicators such as the Price-Earnings Ratio and the Price-Book-value Ratio, as well as outputs of a hidden Markov model, are used as predictors. From the results of our analysis, we conclude that no model shows significantly better accuracy and that it is not possible to predict the direction of stock prices with the models used. Considering that the models with extra predictors show relatively high test accuracy, a meaningful improvement in prediction accuracy may be possible if proper variables reflecting the opinions and sentiments of investors were utilized.
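As a rough illustration of the setup this abstract describes (direction labels derived from returns, technical and accounting predictors, a chronological train/test split), a minimal sketch follows. All column names and data are hypothetical stand-ins, not the paper's dataset.

```python
# Hedged sketch: predict next-day direction (up/down) from technical and
# accounting indicators plus a hidden-Markov-model regime feature.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "momentum_10d": rng.normal(size=n),
    "rsi_14d": rng.uniform(0, 100, size=n),
    "per": rng.normal(15, 5, size=n),        # Price-Earnings Ratio (illustrative)
    "pbr": rng.normal(1.5, 0.5, size=n),     # Price-Book-value Ratio (illustrative)
    "hmm_state": rng.integers(0, 3, size=n), # hidden Markov model regime
})
ret = rng.normal(size=n)
df["direction"] = (ret > 0).astype(int)      # 1 = price went up

# Chronological split: never shuffle time series into the test set.
cut = int(n * 0.8)
X_tr, X_te = df.iloc[:cut, :-1], df.iloc[cut:, :-1]
y_tr, y_te = df.iloc[:cut, -1], df.iloc[cut:, -1]

clf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```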

Status of Groundwater Potential Mapping Research Using GIS and Machine Learning (GIS와 기계학습을 이용한 지하수 가능성도 작성 연구 현황)

  • Lee, Saro;Rezaie, Fatemeh
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.6_1
    • /
    • pp.1277-1290
    • /
    • 2020
  • Water resources, comprising surface water and groundwater, are among the most pivotal natural resources worldwide. Since the last century, rapid population growth together with accelerated industrialization and explosive urbanization has boosted demand for groundwater for domestic, industrial, and agricultural use. Better management of groundwater can play a crucial role in sustainable development; therefore, accurately locating groundwater through groundwater potential mapping is indispensable. In recent years, the integration of machine learning techniques, Geographical Information Systems (GIS), and Remote Sensing (RS) has become a popular and effective approach to groundwater potential mapping. To determine the status of this integrated approach, a systematic review of 94 directly relevant papers was carried out covering the previous six years (2015-2020). According to the literature review, the number of studies published annually increased rapidly over time. The study areas spanned 15 countries, and 85.1% of the studies focused on Iran, India, China, South Korea, and Iraq. Twenty variables were found to be frequently involved in groundwater potential investigations, of which nine factors are almost always present: slope, lithology (geology), land use/land cover (LU/LC), drainage/river density, altitude (elevation), topographic wetness index (TWI), distance from river, rainfall, and aspect. Among machine learning techniques, data integration was most often carried out with random forest, support vector machine, and boosted regression tree. Our study shows that, for optimal results, groundwater potential mapping must be used as a tool to complement field work rather than as a low-cost substitute. Consequently, more studies should be conducted to enhance the generalization and precision of groundwater potential maps.
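The reviewed workflow, stacking per-pixel GIS factor layers, training on known well locations, and mapping the predicted probability, can be outlined as follows. This is an illustrative sketch with synthetic rasters, not one of the surveyed implementations; random forest is used because the review names it among the most common choices.

```python
# Illustrative sketch of groundwater potential mapping: stack the nine
# near-universal factor layers, train on labeled cells, predict a 0..1 map.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
h, w = 100, 100
factors = {  # the nine factors the review found almost always present
    name: rng.normal(size=(h, w))
    for name in ["slope", "lithology", "lulc", "drainage_density", "elevation",
                 "twi", "dist_to_river", "rainfall", "aspect"]
}
stack = np.dstack(list(factors.values())).reshape(-1, len(factors))

# Training labels: 1 at known productive wells, 0 at sampled non-well cells.
idx = rng.choice(h * w, size=400, replace=False)
labels = rng.integers(0, 2, size=400)

rf = RandomForestClassifier(n_estimators=300, random_state=2)
rf.fit(stack[idx], labels)

potential_map = rf.predict_proba(stack)[:, 1].reshape(h, w)  # groundwater potential
```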

EEG Feature Engineering for Machine Learning-Based CPAP Titration Optimization in Obstructive Sleep Apnea

  • Juhyeong Kang;Yeojin Kim;Jiseon Yang;Seungwon Chung;Sungeun Hwang;Uran Oh;Hyang Woon Lee
    • International journal of advanced smart convergence
    • /
    • v.12 no.3
    • /
    • pp.89-103
    • /
    • 2023
  • Obstructive sleep apnea (OSA) is one of the most prevalent sleep disorders and can lead to serious consequences, including hypertension and/or cardiovascular diseases, if not treated promptly. Continuous positive airway pressure (CPAP) is widely recognized as the most effective treatment for OSA, but it requires proper titration of airway pressure to achieve the best treatment results. The CPAP titration process can be time-consuming and cumbersome, so predicting personalized CPAP pressure before treatment is increasingly important. The primary objective of this study was to optimize the CPAP titration process for OSA patients through EEG feature engineering with machine learning techniques. We aimed to identify and utilize the most critical EEG features to forecast key OSA predictive indicators, ultimately facilitating more precise and personalized CPAP treatment strategies. We analyzed the PSG datasets of 126 OSA patients before and after CPAP treatment. We extracted 29 EEG features and, applying the Shapley Additive exPlanations (SHAP) method, identified those with high importance for the OSA predictive indices, the apnea-hypopnea index (AHI) and SpO2. Using XGBoost, support vector machine regression, and random forest regression, we confirmed six EEG features with high importance in predicting AHI and SpO2. By utilizing the predictive capabilities of EEG-derived features for AHI and SpO2, we can better understand and evaluate the condition of patients undergoing CPAP treatment. The ability to predict these key indicators accurately provides more immediate insight into the patient's sleep quality and potential disturbances. This not only improves the efficiency of the diagnostic process but also enables a more tailored and effective treatment approach. Consequently, integrating EEG analysis into the sleep study protocol has the potential to revolutionize sleep diagnostics, offering a time-saving and ultimately more effective evaluation for patients with sleep-related disorders.
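A minimal sketch of the SHAP-based ranking step described above follows, assuming the 29 EEG features are already extracted into a matrix; the feature names, model hyperparameters, and data are placeholders rather than the study's actual pipeline.

```python
# Hedged sketch: fit a regressor on EEG features and rank them by mean
# absolute SHAP value, as the abstract describes for the AHI target.
import numpy as np
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(3)
feature_names = [f"eeg_feat_{i}" for i in range(29)]   # hypothetical names
X = rng.normal(size=(126, 29))                         # 126 patients, 29 features
ahi = rng.uniform(5, 60, size=126)                     # apnea-hypopnea index target

model = XGBRegressor(n_estimators=200, max_depth=3).fit(X, ahi)

# Mean |SHAP value| per feature gives a global importance ranking.
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)
ranking = np.argsort(np.abs(sv).mean(axis=0))[::-1]
for i in ranking[:6]:                                  # the paper keeps six features
    print(feature_names[i])
```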

Study on Predicting the Designation of Administrative Issues in the KOSDAQ Market Based on Machine Learning: Focusing on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

  • Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.17 no.1
    • /
    • pp.229-249
    • /
    • 2022
  • This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market. When a company in the Korean stock market is designated as an administrative issue, the market takes the event itself as negative information, causing losses to the company and its investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that detects the possibility of administrative-issue designation early from companies' financial ratios and helps investors manage portfolio risk. Twenty-one financial ratios representing profitability, stability, activity, and growth were used as independent variables. Financial data of administrative-issue and non-administrative-issue companies were sampled from 2011 to 2020, the period in which K-IFRS was applied. Logistic regression analysis, decision tree, support vector machine, random forest, and LightGBM were used to predict the designation of administrative issues. According to the results, LightGBM, with 82.73% classification accuracy, is the best prediction model, while the model with the lowest classification accuracy is the decision tree, at 71.94%. Checking the top three variables by importance in the tree-based models shows that the financial variables common to each model are ROE (net profit) and the capital stock turnover ratio, which are relatively important variables in administrative-issue designation. In general, the ensemble models showed higher predictive performance than the single models.
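A compact sketch of the comparison the paper runs (21 financial ratios as predictors, LightGBM against a single decision tree) might look like the following; the data and hyperparameters are illustrative assumptions, not the study's.

```python
# Minimal sketch: compare an ensemble model (LightGBM) against a single
# decision tree on synthetic financial-ratio data.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 21))      # profitability, stability, activity, growth ratios
y = rng.integers(0, 2, size=2000)    # 1 = designated as an administrative issue

for name, model in {
    "lightgbm": LGBMClassifier(n_estimators=300),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
}.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```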

The Analysis on the Relationship between Firms' Exposures to SNS and Stock Prices in Korea (기업의 SNS 노출과 주식 수익률간의 관계 분석)

  • Kim, Taehwan;Jung, Woo-Jin;Lee, Sang-Yong Tom
    • Asia pacific journal of information systems
    • /
    • v.24 no.2
    • /
    • pp.233-253
    • /
    • 2014
  • Can the stock market really be predicted? Stock market prediction has attracted much attention from many fields, including business, economics, statistics, and mathematics. Early research on stock market prediction was based on random walk theory (RWT) and the efficient market hypothesis (EMH). According to the EMH, stock markets are largely driven by new information rather than present and past prices; since new information is unpredictable, stock markets will follow a random walk. Despite these theories, Schumaker [2010] asserted that people keep trying to predict the stock market using artificial intelligence, statistical estimates, and mathematical models. Mathematical approaches include percolation methods, log-periodic oscillations, and wavelet transforms to model future prices. Examples of artificial intelligence approaches that deal with optimization and machine learning are genetic algorithms, support vector machines (SVM), and neural networks. Statistical approaches typically predict the future using past stock market data. Recently, financial engineers have started to predict stock price movement patterns using SNS data. SNS is a place where people's opinions and ideas flow freely and affect others' beliefs. Through word-of-mouth in SNS, people share product usage experiences, subjective feelings, and the accompanying sentiment or mood with others. An increasing number of empirical analyses of sentiment and mood are based on textual collections of public user-generated data on the web. Opinion mining is a domain of data mining that extracts public opinions expressed in SNS. There have been many studies on opinion mining from web sources such as product reviews, forum posts, and blogs. In relation to this literature, we try to understand the effects of firms' SNS exposure on stock prices in Korea. Similarly to Bollen et al. [2011], we empirically analyze the impact of SNS exposure on stock return rates. We use Social Metrics by Daum Soft, an SNS big data analysis company in Korea. Social Metrics provides trends and public opinions on Twitter and blogs using natural language processing and analysis tools. It collects sentences circulating on Twitter in real time, breaks them down into word units, and extracts keywords. In this study, we classify firms' SNS exposure into two groups: positive and negative. To test the correlation and causation between SNS exposure and stock price returns, we first collected the stock prices of 252 firms and the KRX100 index on the Korea Exchange (KRX) from May 25, 2012 to September 1, 2012. We also gathered the public attitudes (positive, negative) toward these firms from Social Metrics over the same period. We conducted regression analysis between stock prices and the number of SNS exposures and, having checked the correlation between the two variables, performed a Granger causality test to determine the direction of causation (see the sketch below). The result is that the number of total SNS exposures is positively related to stock market returns. The number of positive mentions also has a positive relationship with stock market returns. Conversely, the number of negative mentions has a negative relationship with stock market returns, but this relationship is not statistically significant; the impact of positive mentions is thus statistically bigger than that of negative mentions.
We also investigate whether the impacts are moderated by industry type and firm size. We find that the impact of SNS exposure is bigger for IT firms than for non-IT firms, and bigger for small firms than for large firms. The Granger causality test shows that changes in stock returns are caused by SNS exposure, while causation in the other direction is not significant; the relationship between SNS exposure and stock prices therefore has unidirectional causality. The more a firm is exposed in SNS, the more its stock price is likely to increase, while stock price changes may not cause more SNS mentions.
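The Granger-causality step described above can be illustrated with statsmodels; the snippet below uses synthetic series in place of the Social Metrics mention counts and KRX returns, so it is a sketch of the test procedure, not a reproduction of the paper's analysis.

```python
# Hedged sketch: test whether daily SNS mention counts Granger-cause stock
# returns, and the reverse. Series are synthetic stand-ins.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(5)
n = 120
mentions = rng.poisson(50, size=n).astype(float)   # daily SNS exposure counts
returns = 0.01 * rng.normal(size=n)                # daily stock returns
returns[1:] += 0.0005 * (mentions[:-1] - 50)       # inject a lagged effect

df = pd.DataFrame({"returns": returns, "mentions": mentions})

# Convention: tests whether the SECOND column Granger-causes the first.
res_fwd = grangercausalitytests(df[["returns", "mentions"]], maxlag=3)  # SNS -> returns
res_rev = grangercausalitytests(df[["mentions", "returns"]], maxlag=3)  # returns -> SNS

# Each result maps lag -> test statistics; inspect the ssr F-test p-value at lag 1.
print(res_fwd[1][0]["ssr_ftest"])
```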

Predicting Crime Risky Area Using Machine Learning (머신러닝기반 범죄발생 위험지역 예측)

  • HEO, Sun-Young;KIM, Ju-Young;MOON, Tae-Heon
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.21 no.4
    • /
    • pp.64-80
    • /
    • 2018
  • In Korea, citizens can only access general information about crime, so it is difficult for them to know how much they are exposed to it. If the police could predict crime-risky areas, it would be possible to cope with crime efficiently even with insufficient police and enforcement resources. However, there is no such prediction system in Korea, and related research is scarce. Against this background, the final goal of this study is to develop an automated crime prediction system. As a first step, we built a big data set consisting of local real crime records and urban physical and non-physical data, then developed a crime prediction model through machine learning methods. Finally, we assumed several possible scenarios, calculated the probability of crime, and visualized the results on a map to aid public understanding. Among the factors affecting crime occurrence identified in previous and case studies, the following data were processed into big data form for machine learning: real crime records, weather information (temperature, rainfall, wind speed, humidity, sunshine, insolation, snowfall, cloud cover), and local information (average building coverage, average floor area ratio, average building height, number of buildings, average appraised land value, average area of residential buildings, average number of ground floors). Among supervised machine learning algorithms, the decision tree, random forest, and SVM models, which are known to be powerful and accurate in various fields, were used to construct the crime prediction model, as sketched below. The decision tree model with the lowest RMSE was selected as the optimal prediction model. Based on this model, several scenarios were set for theft and violence cases, which are the most frequent in case city J, and the probability of crime was estimated on a 250 × 250 m grid. We found that high-crime-risk areas occur in three patterns in city J. The probability of crime was divided into three classes and visualized on the map by 250 × 250 m grid. In conclusion, we developed a crime prediction model using machine learning algorithms and visualized the crime-risky areas on a map that can recalculate the model and re-visualize the result as time and urban conditions change.
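A minimal sketch of the model-selection step (comparing decision tree, random forest, and SVM by RMSE on gridded data) follows; the predictor variables and targets are synthetic placeholders for the paper's weather and urban data.

```python
# Illustrative sketch: regress a crime-risk score per grid cell on weather
# and urban-form variables, then pick the model with the lowest RMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(5000, 15))        # weather + urban variables per grid cell
y = np.maximum(0, X[:, 0] * 2 + rng.normal(size=5000))  # synthetic risk score

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)
for name, model in {
    "decision_tree": DecisionTreeRegressor(max_depth=8),
    "random_forest": RandomForestRegressor(n_estimators=200),
    "svm": SVR(),
}.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(name, round(rmse, 3))
```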

Control of Trophoblast Gene Expression and Cell Differentiation

  • Cheon, Jong-Yun
    • Proceedings of the Korean Society for Reproductive Medicine Conference
    • /
    • 2001.03a
    • /
    • pp.195-205
    • /
    • 2001
  • The placental trophoblast is the first cell type to differentiate during mammalian development and is the central cell type forming the placenta, which is essential for the implantation, development, and differentiation of the embryo in the uterine environment. Defects during trophoblast differentiation lead to fatal outcomes such as embryonic death or pregnancy disorders. However, the molecular mechanisms regulating trophoblast differentiation have not yet been elucidated. A prerequisite for defining the pathways that regulate trophoblast differentiation is the identification of the many genes expressed only in differentiated trophoblast cells. Our group recently identified two novel genes expressed only in differentiated trophoblast cells: one is Psx, a regulatory gene containing a homeobox, and the other is PLP-Cβ, a gene for a placental prolactin-like protein, a pregnancy hormone. The goal of this project is to elucidate the regulatory pathway controlling trophoblast differentiation by defining the functions and regulatory mechanisms of these genes. To this end, the following series of studies will be performed: 1) to define the mechanism that restricts Psx expression to differentiated trophoblast cells, the cis-acting elements and trans-acting factors of the Psx gene will be identified and analyzed through functional assays, in vitro footprinting, gel mobility shift assays, transgenic mice, UV crosslinking, and Southwestern blotting; 2) to define the pathway regulating trophoblast differentiation, downstream genes regulated by Psx will be identified by random oligonucleotide library screening, DD-PCR, and subtractive screening; 3) the Psx gene will be knocked out to reveal its role in trophoblast development and differentiation; 4) the receptor for the placental prolactin gene will be identified by yeast two-hybrid screening to define its signal transduction mechanism. As first-year results, genomic DNA of the Psx gene was cloned from mouse and rat, and comparison of the gene structures showed that mouse Psx (mPsx2) consists of four exons whereas rat Psx (rPsx3) consists of three exons; that is, rPsx3 lacks exon 1 of mPsx2. Northern blot and in situ hybridization analyses showed that the Psx gene is differentially regulated in mouse and rat. Indeed, the 5'-flanking regions of mPsx2 and rPsx3 were cloned, and sequence analysis found no homology between them. Assays of each promoter's activity with a luciferase reporter also revealed different activities in Rcho-1 trophoblast cells. The transcription start sites of the Psx genes were determined by primer extension. In addition, a targeting vector for knocking out the Psx2 gene was constructed in Osdupdel. At the start of this project, a new prolactin gene was cloned and named PLP-I; this gene was subsequently renamed PLP-Cβ. The rat counterpart of the mouse PLP-Cβ gene was identified, and sequence comparison showed about 79% homology (at the amino acid level) between mouse and rat. In the course of this work, another new PLP-C subfamily member was cloned from mouse and named PLP-Cγ. Northern blot and in situ hybridization analyses showed that PLP-Cβ and PLP-Cγ are expressed only in restricted spongiotrophoblast and trophoblast giant cells of the placenta. Surprisingly, both of these new genes have two isoforms generated by alternative splicing; PLP-Cβ and PLP-Cγ are the first PLP family members shown to have splicing-derived isoforms. The expression patterns of these isoform mRNAs were determined by RT-PCR. Another novel finding was that PLP-Cβ and PLP-Cγ have unique gene structures: PLP-Cβ produces two isoforms with five or six exons by alternative splicing of exon 3, whereas PLP-Cγ produces isoforms with seven or six exons by alternative splicing of exon 2. Using the trophoblast cell line Rcho-1, the 1.5 kb 5'-flanking region of PLP-Cγ was shown to have trophoblast-specific promoter activity. The transcription start site of the PLP-Cγ gene was determined by primer extension. Building on the first-year results, the second year will pursue: 1) comparative analysis of the mPsx2 and rPsx3 promoters to define the mechanism by which Psx is differentially regulated in mouse and rat; 2) identification of cis-acting elements in the Psx and PLP-C promoters; 3) identification of the target sequences bound by the Psx2 and Psx3 proteins; 4) knockout of the Psx2 gene in ES cells using the constructed targeting vector; 5) generation of Psx-overexpressing cell lines and screening for genes regulated by Psx; 6) examination of the regulatory mechanisms of the newly identified PLP-C member genes in the Rcho-1 cell line in response to various growth factors and other hormones; and 7) chromosomal mapping of the Psx and PLP-Cγ genes.


Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data requires more computation, which can eventually cause high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve model performance. Diverse methods have been proposed, from merely lessening the noise of data, such as misspellings or informal text, to including semantic and syntactic information. On top of that, the expression and selection of text features affect the performance of classifiers for sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods that modify the word dictionary according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm marks certain words as unimportant, we assume that words similar to them also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification: conducting selective word elimination under specific regulations, and constructing word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and form word embeddings. Second, we additionally select words that are similar to the low-information-gain words and build word embeddings from the remainder (see the sketch below). Finally, the filtered text and word embeddings are fed to deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets, classifying each with the deep learning models. Reviews with more than five helpful votes and a helpful-vote ratio over 70% were classified as helpful reviews. Yelp only shows the number of helpful votes, so we extracted 100,000 reviews with more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared them against Word2Vec and GloVe embeddings that used all the words. One of the proposed methods outperformed the embeddings with all the words: by removing unimportant words, we obtained better performance. However, removing too many words lowered performance.
For future research, diverse preprocessing methods and in-depth analysis of word co-occurrence should be considered when measuring similarity among words. Also, we applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed elimination methods, and the possible combinations between embedding and elimination methods remain to be explored.
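The two proposed filters, dropping low-information-gain words and then also dropping words whose Word2Vec vectors are close to them, can be sketched as follows. The toy corpus, thresholds (median gain, 0.3 cosine similarity), and labels are assumptions for illustration only.

```python
# Hedged sketch: information-gain word elimination expanded by Word2Vec
# cosine similarity, as the abstract proposes.
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

docs = ["great fast shipping", "terrible broken item", "great item works",
        "broken on arrival terrible"]
labels = [1, 0, 1, 0]                      # 1 = helpful review (toy labels)

# Information gain of each word with respect to the class label.
vec = CountVectorizer()
X = vec.fit_transform(docs)
gain = mutual_info_classif(X, labels, discrete_features=True)
words = np.array(vec.get_feature_names_out())
low_ig = set(words[gain < np.median(gain)])          # candidates to remove

# Expand the removal set with words whose embeddings are close to the candidates.
w2v = Word2Vec([d.split() for d in docs], vector_size=50, min_count=1, seed=7)
expanded = set(low_ig)
for w in low_ig:
    expanded.update(s for s, sim in w2v.wv.most_similar(w, topn=3) if sim > 0.3)

# Filtered corpus, ready for re-embedding and the downstream classifier.
filtered = [" ".join(t for t in d.split() if t not in expanded) for d in docs]
print(filtered)
```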

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in various fields such as classification, summarization, and generation. Among text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification with one label among two classes, multi-class classification with one label among several classes, and multi-label classification with multiple labels among several classes. Multi-label classification, in particular, requires a different training method because of its multiple labels, and since the number of labels to be predicted grows as the number of labels and classes increases, performance improvement becomes difficult due to the increased prediction difficulty. To overcome these limitations, research on label embedding is being actively conducted: (i) compress the initially given high-dimensional label space into a low-dimensional latent label space, (ii) train a model to predict the compressed label, and (iii) restore the predicted label to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only linear relationships between labels or compress the labels by random transformation, they have difficulty capturing non-linear relationships between labels and thus cannot create a latent label space that sufficiently contains the information of the original labels. Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. Label embedding using an autoencoder, a deep learning model effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding suffers a large amount of information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing gradient problem that occurs during backpropagation. To solve this problem, skip connections were devised: by adding a layer's input to its output, gradient loss during backpropagation is prevented, and efficient learning is possible even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder, forming a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space.
Using this, we conducted an experiment to predict the compressed keyword vector in the latent label space from the paper abstract and to evaluate multi-label classification by restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators showed far superior performance for multi-label classification based on the proposed methodology compared to traditional multi-label classification methods. This indicates that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of multi-label classification itself. In addition, the utility of the proposed methodology was assessed by comparing its performance across domain characteristics and latent-label-space dimensionalities.
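A minimal PyTorch sketch of the proposed idea, an autoencoder over the label vector with a skip connection in both encoder and decoder, is shown below; layer sizes, depth, and the loss are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch: skip-connection autoencoder for label embedding.
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """Linear layer whose input is added back to its output (skip connection)."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.fc(x)) + x

class LabelAutoencoder(nn.Module):
    def __init__(self, n_labels=1000, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(SkipBlock(n_labels), nn.Linear(n_labels, latent))
        self.decoder = nn.Sequential(SkipBlock(latent), nn.Linear(latent, n_labels))

    def forward(self, y):
        z = self.encoder(y)                  # compress the label space
        return self.decoder(z), z            # reconstruction logits + latent labels

model = LabelAutoencoder()
y = (torch.rand(64, 1000) < 0.01).float()    # sparse multi-label targets (synthetic)
recon, z = model(y)
loss = nn.BCEWithLogitsLoss()(recon, y)      # train to reconstruct the labels
loss.backward()
```

At prediction time, a separate text model would be trained to predict `z` from the abstract, and the decoder would restore the predicted latent vector to the full keyword label space, matching steps (i)-(iii) described above.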