• Title/Summary/Keyword: Bayesian model


Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • With recent advances in deep learning, research on unstructured data analysis has been actively conducted and has shown remarkable results in areas such as classification, summarization, and generation. Among the fields of text analysis, text classification is the most widely used in academia and industry. Text classification includes binary classification, which assigns one label from two classes; multi-class classification, which assigns one label from several classes; and multi-label classification, which assigns multiple labels from several classes. Multi-label classification in particular requires a different training method from binary and multi-class classification because each instance carries multiple labels. In addition, because the number of labels to be predicted grows with the number of labels and classes, prediction becomes harder and performance is difficult to improve. To overcome these limitations, research on label embedding is being actively conducted. Label embedding (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed labels, and (iii) restores the predicted labels to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress the labels by random transformation, they cannot capture non-linear relationships between labels, and so they cannot create a latent label space that sufficiently retains the information in the original labels.
Recently, attempts to improve performance by applying deep learning to label embedding have increased. A representative approach is label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding loses a large amount of information when compressing a high-dimensional label space with a very large number of classes into a low-dimensional latent label space. This loss can be traced to the vanishing gradient problem that occurs during backpropagation. Skip connections were devised to solve this problem: by adding a layer's input to its output, they keep gradients from vanishing during backpropagation, so learning remains efficient even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies that use them in autoencoders or in label embedding are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from a paper abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. In terms of accuracy, precision, recall, and F1 score, multi-label classification based on the proposed methodology performed far better than traditional multi-label classification methods.
This indicates that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of multi-label classification itself. In addition, the utility of the proposed methodology was assessed by comparing its performance across domain characteristics and across numbers of dimensions of the latent label space.
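
The compress, predict, and restore steps (i) to (iii) described in this abstract can be illustrated with a minimal sketch of a linear label embedding in the style of PLST, one of the baselines the abstract names. This is not the paper's skip-connection autoencoder. The function names (`fit_linear_label_embedding`, `compress`, `restore`), the toy label matrix, and the 0.5 rounding threshold are assumptions made for illustration, and step (ii) is skipped: the compressed labels are passed straight to the restore step instead of being predicted from text.

```python
import numpy as np

def fit_linear_label_embedding(Y, k):
    """Learn a k-dimensional linear label embedding (PLST-style sketch).

    Y is a binary label matrix of shape (n_samples, n_labels).
    Returns the per-label mean and a projection matrix of shape (n_labels, k).
    """
    mean = Y.mean(axis=0)
    # Rows of Vt are the principal directions of the centered label matrix.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, Vt[:k].T

def compress(Y, mean, V):
    """Step (i): map labels into the low-dimensional latent label space."""
    return (Y - mean) @ V

def restore(Z, mean, V):
    """Step (iii): map latent vectors back and round to binary labels."""
    return ((Z @ V.T + mean) >= 0.5).astype(int)

# Toy label matrix (hypothetical): labels 0/1 always co-occur, as do labels 2/3.
Y = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
])

mean, V = fit_linear_label_embedding(Y, k=2)
Z = compress(Y, mean, V)      # in a full pipeline, a model would predict Z from text (step ii)
Y_hat = restore(Z, mean, V)   # recovered labels in the original label space
```

Because the toy labels are perfectly linearly correlated, two latent dimensions are enough to recover all four labels. Labels with non-linear dependencies would not compress this cleanly, which is the limitation of linear methods that the abstract points to.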

Origin and Source Appointment of Sedimentary Organic Matter in Marine Fish Cage Farms Using Carbon and Nitrogen Stable Isotopes (탄소 및 질소 안정동위원소를 활용한 어류 가두리 양식장 내 퇴적 유기물의 기원 및 기여도 평가)

  • Young-Shin Go;Dae-In Lee;Chung Sook Kim;Bo-Ram Sim;Hyung Chul Kim;Won-Chan Lee;Dong-Hun Lee
    • Korean Journal of Ecology and Environment / v.55 no.2 / pp.99-110 / 2022
  • We investigated the physicochemical properties and isotopic compositions of organic matter (δ¹³C_TOC and δ¹⁵N_TN) at an old fish farming (OFF) site after the cessation of aquaculture. Our objective was to determine the origin of the organic matter preserved in fish-farm sediments and the relative contributions of its sources. The temporal and spatial distributions of particulate and sinking organic matter (OFF sites: 2.0 to 3.3 mg L⁻¹ for particulate matter concentration, 18.8 to 246.6 g m⁻² day⁻¹ for sinking organic matter rate; control sites: 2.0 to 3.5 mg L⁻¹ for particulate matter concentration, 25.5 to 129.4 g m⁻² day⁻¹ for sinking organic matter rate) differed significantly between the two sites with seasonal precipitation. In contrast to the variations of δ¹³C_TOC and δ¹⁵N_TN values in the water column, the isotopic compositions measured in the sediments (OFF sites: -21.5‰ to -20.4‰ for δ¹³C_TOC, 6.0‰ to 7.6‰ for δ¹⁵N_TN; control sites: -21.6‰ to -21.0‰ for δ¹³C_TOC, 6.6‰ to 8.0‰ for δ¹⁵N_TN) showed distinctive isotopic patterns (p<0.05) for seawater-derived nitrogen sources, indicating an increased input of aquaculture-derived sources (e.g., fish feces). With respect to past fish farming activities, the representative sources (e.g., fish feces and algae) differed significantly between the two sites (p<0.05), confirming the predominant contribution (55.9±4.6%) of fish feces at the OFF sites. Thus, by estimating the distinct contributions of organic matter at the two sites, our results may help identify specific controlling factors for the sustainable use of fish farming sites.
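
The abstract does not state how the source contributions were estimated. As an illustration only, assuming just two sources (fish feces and algae) and a single isotope ratio, a linear two-end-member mixing model would express the sediment value as a weighted average of the two source values and solve for the fecal fraction. The symbols below are assumptions and are not taken from the paper.

```latex
\delta_{\mathrm{sediment}} = f_{\mathrm{feces}}\,\delta_{\mathrm{feces}} + (1 - f_{\mathrm{feces}})\,\delta_{\mathrm{algae}}
\quad\Longrightarrow\quad
f_{\mathrm{feces}} = \frac{\delta_{\mathrm{sediment}} - \delta_{\mathrm{algae}}}{\delta_{\mathrm{feces}} - \delta_{\mathrm{algae}}},
\qquad f_{\mathrm{algae}} = 1 - f_{\mathrm{feces}}
```

A reported uncertainty such as ±4.6% usually comes from a model that combines both isotopes and propagates source variability, for example a Bayesian mixing model. The single-isotope form above shows only the underlying mass balance.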