• Title/Summary/Keyword: deep similarity


Method of Extracting the Topic Sentence Considering Sentence Importance based on ELMo Embedding (ELMo 임베딩 기반 문장 중요도를 고려한 중심 문장 추출 방법)

  • Kim, Eun Hee; Lim, Myung Jin; Shin, Ju Hyun
    • Smart Media Journal, v.10 no.1, pp.39-46, 2021
  • This study concerns a method of extracting a summary from a news article by considering the importance of each sentence in the article. We propose a method of calculating sentence importance by extracting, as features that affect importance, the probability of being a topic sentence, the similarity with the article title and with the other sentences, and the sentence position. We hypothesize that topic sentences have characteristics distinct from general sentences, and train a deep learning-based classification model to obtain a topic-sentence probability for each input sentence. In addition, using a pre-trained ELMo language model, the similarity between sentences is calculated from sentence vectors that reflect context information and extracted as a sentence feature. The LSTM- and BERT-based topic sentence classifiers achieved 93% accuracy, 96.22% recall, and 89.5% precision, a strong result. Calculating the importance of each sentence by combining the extracted features improved topic sentence extraction performance by about 10% over the existing TextRank algorithm.
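
As an illustration of the scoring the abstract describes, the following minimal Python sketch combines a topic-sentence probability, similarity to the title, mean similarity to the other sentences, and sentence position into one importance score. The weights and the linear combination are assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two sentence vectors (e.g., ELMo embeddings)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentence_importance(topic_prob, sent_vec, title_vec, other_vecs,
                        position, n_sents, weights=(0.4, 0.3, 0.2, 0.1)):
    """Combine the four cues from the abstract into a single score."""
    title_sim = cosine(sent_vec, title_vec)
    mean_sim = np.mean([cosine(sent_vec, v) for v in other_vecs])
    pos_score = 1.0 - position / max(n_sents - 1, 1)  # earlier sentences rank higher
    w1, w2, w3, w4 = weights
    return w1 * topic_prob + w2 * title_sim + w3 * mean_sim + w4 * pos_score
```

The sentence with the highest score would then be extracted as the topic sentence.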

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok; Yang, Seok Woo; Lee, Hong Joo
    • Journal of Intelligence and Information Systems, v.25 no.4, pp.105-122, 2019
  • Dimensionality reduction is one way to handle big data in text mining. When reducing dimensionality, we must consider the density of the data, which strongly influences sentence classification performance: higher-dimensional data demands more computation and can lead to high computational cost and overfitting, so a dimension reduction step is necessary to improve model performance. Diverse methods have been proposed, from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. Moreover, how text features are represented and selected affects classifier performance in sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that represents the raw data from the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition, word embeddings, which learn low-dimensional vector representations of words that capture semantic and syntactic information, are also used. To improve performance, recent studies have suggested modifying the word dictionary according to positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm marks certain words as unimportant, we assume that words similar to them also have little impact on sentence classification. This study proposes two ways to achieve more accurate classification, performing selective word elimination under specific rules and constructing word embeddings based on Word2Vec. To select unimportant words from the text, we use information gain to measure importance and cosine similarity to find similar words. First, we eliminate words with comparatively low information gain from the raw text and build word embeddings. Second, we additionally remove words similar to those low-information-gain words and build word embeddings. Finally, the filtered text and word embeddings are fed into deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each with the deep learning models. Reviews that received more than five helpful votes, with a helpful-vote ratio over 70%, were classified as helpful reviews; since Yelp shows only the number of helpful votes, we randomly sampled 100,000 reviews with more than five helpful votes from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared them against Word2Vec and GloVe embeddings that used all the words, and showed that one of the proposed methods performs better: removing unimportant words improves performance, but removing too many words lowers it.
For future research, diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring similarity between words should be considered. We applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed elimination methods, making it possible to explore the combinations of embedding and elimination methods.
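
The second proposed variant, eliminating low-information-gain words together with their nearest Word2Vec neighbours, might look like the following sketch; the threshold, the neighbour count, and the precomputed `ig_scores` dictionary are illustrative assumptions.

```python
from gensim.models import Word2Vec

def words_to_remove(ig_scores, w2v: Word2Vec, ig_threshold=1e-3, topn=5):
    """ig_scores: dict mapping word -> information gain on the class label."""
    low_ig = {w for w, ig in ig_scores.items() if ig < ig_threshold}
    removed = set(low_ig)
    for w in low_ig:
        if w in w2v.wv:
            # also drop words whose embeddings are close by cosine similarity
            removed.update(nb for nb, _ in w2v.wv.most_similar(w, topn=topn))
    return removed

def filter_corpus(tokenized_docs, removed):
    # rebuild the corpus without the eliminated words before re-training embeddings
    return [[w for w in doc if w not in removed] for doc in tokenized_docs]
```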

Evaluation of Criteria for Mapping Characters Using an Automated Hangul Font Generation System based on Deep Learning (딥러닝 학습을 이용한 한글 글꼴 자동 제작 시스템에서 글자 쌍의 매핑 기준 평가)

  • Jeon, Ja-Yeon; Ji, Young-Seo; Park, Dong-Yeon; Lim, Soon-Bum
    • Journal of Korea Multimedia Society, v.23 no.7, pp.850-861, 2020
  • Hangul characters are composed of initial, medial, and final components (jamo), which yields 11,172 possible characters. For this reason, the current practice of designing all characters by hand is very expensive and time-consuming. To address this problem, this paper proposes an automatic Hangul font generation system and evaluates criteria for mapping Hangul characters in pairs so that the system generates fonts effectively. The system was implemented with a character generation engine based on the CycleGAN deep learning model. To evaluate the pairing criteria, each criterion was designed based on either Hangul structure or character shape, and the quality of the generated characters was assessed. The evaluation showed that criteria designed around Hangul structure did not affect the quality of the generated fonts, whereas criteria based on character shape produced better-quality characters when the paired characters were similar in shape than when they were less similar. Consequently, automated Hangul fonts are best generated by designing the learning method around pairs of characters with similar shapes.
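
For reference, structure-based pairing criteria rest on the fact that every precomposed Hangul syllable decomposes arithmetically into initial, medial, and final jamo. The sketch below follows the standard Unicode syllable layout, not the paper's own code.

```python
# Unicode Hangul syllables: 19 initials x 21 medials x 28 finals from U+AC00
JUNG, JONG = 21, 28
BASE = 0xAC00  # code point of '가'

def decompose(syllable: str):
    code = ord(syllable) - BASE
    assert 0 <= code < 19 * JUNG * JONG, "not a precomposed Hangul syllable"
    initial, rest = divmod(code, JUNG * JONG)
    medial, final = divmod(rest, JONG)
    return initial, medial, final

print(decompose("한"))  # (18, 0, 4): initial ㅎ, medial ㅏ, final ㄴ
```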

Real-time Human Pose Estimation using RGB-D images and Deep Learning

  • Rim, Beanbonyka; Sung, Nak-Jun; Ma, Jun; Choi, Yoo-Joo; Hong, Min
    • Journal of Internet Computing and Services, v.21 no.3, pp.113-121, 2020
  • Human Pose Estimation (HPE), which localizes the human body joints, has high potential for high-level applications in computer vision. The main challenges of real-time HPE are occlusion, illumination change, and diversity of pose appearance. A single RGB image can be fed into an HPE framework to reduce computation cost, since it requires only a depth-independent device such as a common camera, webcam, or phone camera. However, HPE based on a single RGB image cannot overcome these challenges because of the inherent limitations of color and texture. Depth information fed into an HPE framework, which detects body parts in 3D coordinates, can help solve them, but depth-based HPE requires a depth-dependent device that imposes space constraints and is costly. In particular, the results of depth-based HPE are less reliable because pose initialization is required and frame tracking is unstable. This paper therefore proposes a new HPE method that is robust to self-occlusion. Many body parts can be occluded by other body parts; this paper focuses only on head self-occlusion. The new method combines an RGB image-based HPE framework with a depth information-based HPE framework. We evaluated the proposed method using the COCO Object Keypoint Similarity (OKS) metric. By taking advantage of both the RGB-based and depth-based methods, our RGB-D HPE method achieved an mAP of 0.903 and an mAR of 0.938, showing that it outperforms both RGB-based and depth-based HPE.
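
The evaluation metric used here, COCO Object Keypoint Similarity, can be sketched as below; it follows the standard COCO definition, with the per-keypoint constants `k` left as placeholders rather than the official COCO values.

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """pred, gt: (N, 2) keypoint coordinates; visible: (N,) 0/1 flags;
    area: object scale (segment area); k: (N,) per-keypoint constants."""
    d2 = np.sum((pred - gt) ** 2, axis=1)           # squared pixel distances
    e = d2 / (2.0 * area * k ** 2 + np.spacing(1))  # normalised error per keypoint
    return float(np.sum(np.exp(-e) * visible) / max(np.sum(visible), 1))
```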

Effect of Combisteamer Oven Cooking Condition on Quality Characteristics of Pork Cutlets (콤비스티머 오븐조리조건이 돈가스 품질 특성에 미치는 영향)

  • Kim, In-Chul
    • Journal of the Korea Academia-Industrial cooperation Society, v.12 no.7, pp.3123-3129, 2011
  • Deep-fried pork cutlets are high in fat and calories and can contribute to obesity, even though they are strongly preferred by young consumers in Korea. In this study, we investigated an oven cooking method and the resulting quality characteristics of pork cutlets, with the aim of contributing to public health. To replace deep-frying, pork cutlets were prepared with brown crumbs to which canola oil was added, and the oven cooking time, temperature, relative humidity, and fan speed were optimized. Compared with deep-frying, the fat content and calories of the oven-cooked pork cutlet were reduced by 55.4% and 28.6%, respectively (P<0.05). The oven-cooked cutlet showed no differences from the deep-fried cutlet in color, texture characteristics, or batter separation ratio (P>0.05), and no difference in overall taste in sensory evaluation (P>0.05). Therefore, pork cutlets cooked in an oven under optimized conditions can retain consumer preference because their taste is similar to that of deep-fried cutlets, and these results may be helpful for people who need dietary treatment.

Structure, Method, and Improved Performance Evaluation Function of SRCNN and VDSR (SRCNN과 VDSR의 구조와 방법 및 개선된 성능평가 함수)

  • Lee, Kwang-Chan; Wang, Guangxing; Shin, Seong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.4, pp.543-548, 2021
  • The higher the resolution of an image, the greater viewer satisfaction, and super-resolution imaging has rapidly growing research value in computer vision and image processing. In this study, the main features of a low-resolution (LR) image are extracted using deep learning super-resolution models, which learn from and reconstruct those features; we focus on reconstruction-based algorithms that generate a high-resolution (HR) image. In this paper, we investigate SRCNN and VDSR, two reconstruction-based super-resolution models. The structure and algorithmic process of each model are briefly introduced, the multi-channel and special forms of the improved performance evaluation function are examined, and the performance of each algorithm is assessed through experiments. In the experiments, the outputs of the SRCNN and VDSR models are compared using peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), so that the results can be easily judged.
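
The two measures used in the comparison, PSNR and SSIM, can be computed with scikit-image as in this minimal sketch; 8-bit RGB images are assumed.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sr(hr: np.ndarray, sr: np.ndarray):
    """Compare a super-resolved image `sr` against the ground truth `hr`."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255, channel_axis=-1)
    return psnr, ssim
```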

The Evaluation of Denoising PET Image Using Self Supervised Noise2Void Learning Training: A Phantom Study (자기 지도 학습훈련 기반의 Noise2Void 네트워크를 이용한 PET 영상의 잡음 제거 평가: 팬텀 실험)

  • Yoon, Seokhwan; Park, Chanrok
    • Journal of radiological science and technology, v.44 no.6, pp.655-661, 2021
  • Positron emission tomography (PET) images are affected by acquisition time: short acquisition times result in low gamma counts, and statistical noise degrades image quality. Noise2Void (N2V) is a self-supervised denoising model based on a convolutional neural network (CNN). The purpose of this study is to evaluate the denoising performance of N2V on PET images with a short acquisition time. A phantom was scanned in list mode for 10 min on a Biograph mCT40 PET/CT scanner (Siemens Healthcare, Erlangen, Germany). Using a NEMA image-quality phantom, we compared PET images for the standard acquisition time (10 min), a short acquisition time (2 min), and a simulated PET image (S2 min). To evaluate the performance of N2V, the peak signal-to-noise ratio (PSNR), normalized root mean square error (NRMSE), structural similarity index (SSIM), and radioactivity recovery coefficient (RC) were used. Relative to the 10 min image, the 2 min and S2 min images showed a PSNR of 30.983 and 33.936, an NRMSE of 9.954 and 7.609, and an SSIM of 0.916 and 0.934, respectively. The RC for the spheres in the S2 min image also met the European Association of Nuclear Medicine Research Ltd. (EARL) FDG PET accreditation program. We confirmed that the S2 min image generated by N2V deep learning improved on the 2 min image, and on visual analysis the 10 min and S2 min images were comparable. In conclusion, the quality of noisy PET images acquired with short acquisition times can be improved by the N2V denoising network without underestimating radioactivity.
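
At the core of N2V is blind-spot masking: randomly chosen pixels are replaced by the value of a random neighbour, and the loss is evaluated only at those positions so the network cannot learn the identity mapping. A minimal sketch of the masking step, with the masking rate and neighbourhood radius as assumed values:

```python
import numpy as np

def n2v_mask(patch, rate=0.01, radius=2, rng=np.random.default_rng()):
    """Return (network input, target, mask) for one 2-D training patch."""
    h, w = patch.shape
    n = max(1, int(rate * h * w))
    ys = rng.integers(0, h, n)
    xs = rng.integers(0, w, n)
    inp = patch.copy()
    mask = np.zeros_like(patch, dtype=bool)
    for y, x in zip(ys, xs):
        # replace the pixel with a random neighbour inside the patch
        dy, dx = rng.integers(-radius, radius + 1, 2)
        inp[y, x] = patch[np.clip(y + dy, 0, h - 1), np.clip(x + dx, 0, w - 1)]
        mask[y, x] = True
    return inp, patch, mask  # the loss is computed only where mask is True
```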

A study on the Generation Method of Aircraft Wing Flexure Data Using Generative Adversarial Networks (생성적 적대 신경망을 이용한 항공기 날개 플렉셔 데이터 생성 방안에 관한 연구)

  • Ryu, Kyung-Don
    • Journal of Advanced Navigation Technology, v.26 no.3, pp.179-184, 2022
  • An accurate wing flexure model is required to improve the transfer alignment performance of guided weapon systems mounted on the wing of a fighter aircraft or armed helicopter. Mechanical and stochastic modeling methods have been studied to solve this problem, but their accuracy is too low for application to weapon systems. Deep learning techniques, which have been studied intensively in recent years, are well suited to nonlinear modeling; however, operating a fighter aircraft just to secure the large amount of data deep learning requires is practically difficult. In this paper, a generative adversarial network was used to generate flexure data samples similar to the actual flexure data, and the generated data were confirmed to resemble the actual data using measures of similarity, which quantify how alike two data objects are.
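
A minimal GAN sketch for this setting, assuming one flexure sample is a fixed-length vector; layer sizes, the latent dimension, and learning rates are illustrative, not the paper's architecture (PyTorch):

```python
import torch
import torch.nn as nn

LATENT, SAMPLE = 16, 64  # assumed latent and flexure-sample dimensions

G = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, SAMPLE))
D = nn.Sequential(nn.Linear(SAMPLE, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real: torch.Tensor):
    """real: (batch, SAMPLE) tensor of measured flexure vectors."""
    b = real.size(0)
    fake = G(torch.randn(b, LATENT))
    # discriminator: push real toward 1, generated toward 0
    loss_d = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator: try to fool the discriminator
    loss_g = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

After training, samples drawn from `G` would be compared against measured flexure data with the similarity measures mentioned above.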

Learning Source Code Context with Feature-Wise Linear Modulation to Support Online Judge System (온라인 저지 시스템 지원을 위한 Feature-Wise Linear Modulation 기반 소스코드 문맥 학습 모델 설계)

  • Hyun, Kyeong-Seok; Choi, Woosung; Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering, v.11 no.11, pp.473-478, 2022
  • Evaluation based on code testing is becoming a popular solution in programming education through online judge (OJ) systems. Many recent papers have addressed plagiarism detection through source code similarity analysis to support OJ, but deep learning research to support automated tutoring remains insufficient. In this paper, we propose Input-side and Output-side FiLM models that predict whether submitted code will pass or fail. By applying Feature-wise Linear Modulation (FiLM) to a GRU, our model can learn combined information from Java bytecode and the problem it attempts to solve. In the experimental design, a balanced sampling technique was applied to distribute the data evenly, since the data collected from the OJ were asymmetric. Among the proposed models, the Input-side FiLM model showed the highest performance, 73.63%. This result shows that students can check whether their code will pass or fail before receiving the OJ evaluation, which provides basic feedback for improvement.
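
FiLM conditioning applied to a GRU, as the paper describes, can be sketched as follows; all dimensions, the last-hidden-state pooling, and the two-way pass/fail head are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class FiLMGRU(nn.Module):
    def __init__(self, vocab, emb=128, hidden=256, cond_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.film = nn.Linear(cond_dim, 2 * hidden)  # produces gamma and beta
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, tokens, cond):
        """tokens: (B, T) bytecode token ids; cond: (B, cond_dim) problem features."""
        h, _ = self.gru(self.embed(tokens))             # (B, T, hidden)
        gamma, beta = self.film(cond).chunk(2, -1)      # (B, hidden) each
        h = gamma.unsqueeze(1) * h + beta.unsqueeze(1)  # feature-wise modulation
        return self.out(h[:, -1])                       # pass/fail logits
```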

Automatic Classification of Academic Articles Using BERT Model Based on Deep Learning (딥러닝 기반의 BERT 모델을 활용한 학술 문헌 자동분류)

  • Kim, In hu; Kim, Seong hee
    • Journal of the Korean Society for information Management, v.39 no.3, pp.293-310, 2022
  • In this study, we analyzed the performance of a BERT-based document classification model by automatically classifying documents in the field of library and information science using KoBERT. For this purpose, the abstracts of 5,357 papers from 7 journals in library and information science were analyzed, and differences in automatic classification performance according to the size of the training data were evaluated. Precision, recall, and the F measure were used as evaluation metrics. Subject areas with large amounts of high-quality data achieved high performance, with an F measure of 90% or more. On the other hand, where data quality was low, similarity with other subject areas was high, and there were few clearly distinguishing thematic features, meaningfully high performance could not be obtained. This study is expected to serve as basic data suggesting that a pre-trained language model can be used to automatically classify academic documents.
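
A classification setup of this kind might be assembled with Hugging Face transformers as below; the checkpoint name `monologg/kobert`, the `trust_remote_code` flag for its custom tokenizer, and the 7-label head are assumptions for illustration, not the paper's configuration.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "monologg/kobert"  # assumed KoBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=7)

# tokenize one Korean abstract and score it against the subject classes
batch = tokenizer(["문헌정보학 분야 학술 논문 초록 예시입니다."],
                  truncation=True, padding=True, return_tensors="pt")
logits = model(**batch).logits  # one score per class; fine-tuning still required
```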