Search | Korea Science

RoutingConvNet: A Light-weight Speech Emotion Recognition Model Based on Bidirectional MFCC (RoutingConvNet: 양방향 MFCC 기반 경량 음성감정인식 모델)

Hyun Taek Lim; Soo Hyung Kim; Guee Sang Lee; Hyung Jeong Yang
- Smart Media Journal / v.12 no.5 / pp.28-35 / 2023
In this study, we propose RoutingConvNet, a new light-weight model with fewer parameters, to improve the applicability and practicality of speech emotion recognition. To reduce the number of learnable parameters, the proposed model combines bidirectional MFCCs channel by channel to learn long-term emotional dependencies and extract contextual features. A light-weight deep CNN is constructed for low-level feature extraction, and self-attention is used to capture channel and spatial information in the speech signal. In addition, we apply dynamic routing to improve accuracy and make the model robust to feature variations. Across experiments on three speech emotion datasets (EMO-DB, RAVDESS, and IEMOCAP), the proposed model reduces the parameter count while improving accuracy, achieving 87.86%, 83.44%, and 66.06% accuracy, respectively, with about 156,000 parameters. We also propose a metric that quantifies the trade-off between the number of parameters and accuracy for evaluating light-weight models.
https://doi.org/10.30693/SMJ.2023.12.5.28
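The abstract says RoutingConvNet applies dynamic routing but gives no details of the variant. The following is a minimal sketch, assuming the routing-by-agreement procedure from capsule networks (Sabour et al., 2017) with three iterations; the prediction vectors and their shapes are hypothetical, and this is not the authors' implementation.

```python
import math

def squash(v):
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0 for _ in v]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement (assumed variant).

    u_hat[i][j] is the prediction vector from input capsule i for output
    capsule j; all vectors share one dimension. Returns output capsules v[j].
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for it in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            # weighted sum of the predictions sent to output capsule j
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:
            for i in range(n_in):
                for j in range(n_out):
                    # agreement: dot product of prediction and current output
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v

# Hypothetical example: both inputs agree on output 0 and cancel on output 1.
u_hat = [
    [[2.0, 0.0], [1.0, 0.0]],   # input capsule 0 -> predictions for outputs 0, 1
    [[2.0, 0.0], [-1.0, 0.0]],  # input capsule 1 -> predictions for outputs 0, 1
]
v = dynamic_routing(u_hat)
```

Because the two input capsules agree on output 0, each iteration shifts their coupling toward it and its vector length approaches 1, while the conflicting predictions for output 1 cancel to a zero vector.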

Accuracy Analysis of Close-Range Digital Photogrammetry for Measuring Displacement about Loading to Structure (하중에 따른 구조물 변위계측을 위한 근접수치사진측량의 정확도 분석)

Choi, Hyun; Ahn, Chang Hwan
- KSCE Journal of Civil and Environmental Engineering Research / v.29 no.4D / pp.545-553 / 2009
This paper reports a study on measuring structural displacement with a non-contact method: close-range digital photogrammetry using a digital camera. To apply close-range digital photogrammetry to structural displacement measurement, lens distortion, which interferes with geometric analysis, was first corrected, and displacement was then measured on a rahmen (rigid frame) under controlled loading. To assess the applicability of the displacement measurements, the structure was modeled in MIDAS, a structural analysis program, and the results were compared. The study shows that close-range digital photogrammetry can compensate for several weaknesses of LVDTs and cable displacement meters and, in particular, can reduce measurement time. Close-range digital photogrammetry using a digital camera can be applied not only to structural displacement measurement but also to tasks that require visual analysis, such as 3D modeling of structures and profile replication of structures.
https://doi.org/10.12652/Ksce.2009.29.4D.545
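The abstract says lens distortion was corrected before displacement measurement but does not name the distortion model. A minimal sketch, assuming the common radial polynomial (Brown-Conrady) model with two coefficients k1 and k2 on normalized image coordinates and no tangential terms; the coefficient values below are hypothetical.

```python
def distort(x, y, k1, k2):
    """Apply radial distortion to normalized, undistorted image coordinates."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides it by the
    distortion factor evaluated at the current undistorted estimate.
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y

# Hypothetical calibration coefficients for illustration only.
K1, K2 = -0.1, 0.01
xd, yd = distort(0.3, -0.2, K1, K2)
xr, yr = undistort(xd, yd, K1, K2)
```

Fixed-point inversion converges quickly when distortion is mild, as here; strongly distorted wide-angle lenses may need a Newton step or more iterations.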

A Self-Guided Approach to Enhance Korean Text Generation in Writing Assistants (A Self-Guided Approach을 활용한 한국어 텍스트 생성 쓰기 보조 기법의 향상 방법)

Donghyeon Jang; Jinsu Kim; Minho Lee
- Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.541-544 / 2023
Fine-tuning a small language model (SLM) on the outputs of very large models such as ChatGPT and GPT-4 has attracted attention as a cost-effective way to improve the performance of large-scale language models (LLMs). However, this approach is mainly used to train general-purpose instruction-following models, leaving room for further improvement in restricted, specific domains. This study proposes the Self-Guided Approach, a new method for improving performance in a specific domain (Writing Assistant). The Self-Guided Approach (1) uses an LLM to score seed data on domain-specific metrics (helpfulness, relevance, accuracy, and level of detail), and (2) fine-tunes the SLM in a supervised manner using both the scored and the unscored data. The SLM trained with the Self-Guided Approach was evaluated with the GPT-4-based automatic evaluation framework proposed by Vicuna. The results show that the Self-Guided Approach outperforms existing training methods that tune on generated instruction data, such as Self-Instruct and Alpaca. Performance improvements with the Self-Guided Approach were confirmed for Korean open-source LLMs of various scales (Polyglot 1.3B, Polyglot 3.8B, and Polyglot 5.8B). In the GPT-4-based automatic evaluation, the test-set score rose from 4.547 to 6.286 in the Korean Novel Generation domain and from 4.038 to 5.795 in the Korean Scenario Generation domain, with similar gains in other related domains. These results confirm that the Self-Guided Approach can improve SLM performance in a specific domain (Writing Assistant). It is expected to greatly reduce the cost burden of LLMs while maintaining performance in restricted domains, providing practical help for LLM-based application services.

Empirical Study for Automatic Evaluation of Abstractive Summarization by Error-Types (오류 유형에 따른 생성요약 모델의 본문-요약문 간 요약 성능평가 비교)

Seungsoo Lee; Sangwoo Kang
- Korean Journal of Cognitive Science / v.34 no.3 / pp.197-226 / 2023
Generative (abstractive) text summarization is a natural language processing task that produces a short summary while preserving the content of a long text. ROUGE, a lexical-overlap metric, is widely used to evaluate summarization models on generative summarization benchmarks. Although models achieve high ROUGE scores, studies report that 30% of generated summaries are still inconsistent with the source text. This paper proposes a methodology for evaluating summarization models without using reference (gold) summaries. AggreFACT is a human-annotated dataset that classifies the types of errors made by neural text summarization models. Among all the test candidates, two cases, generated summaries and errors occurring throughout the summary, showed the highest correlation. We observed that the proposed evaluation score correlated highly for models fine-tuned from BART and PEGASUS, which are pretrained with large-scale Transformer architectures.
https://doi.org/10.19066/cogsci.2023.34.3.003
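To make "lexical-overlap based" concrete, here is a minimal ROUGE-N sketch. It assumes simple lowercase whitespace tokenization without stemming, so scores will differ slightly from the official ROUGE toolkit.

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall, and F1 from clipped n-gram overlap."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.lower().split())
    ref = ngrams(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter "&" keeps the minimum count
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = rouge_n("the cat sat on the mat", "the cat is on the mat")
```

Here ROUGE-1 F1 is 5/6 even though the candidate asserts something the reference does not ("sat" versus "is"), which illustrates why high lexical overlap does not guarantee factual consistency.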

Assessment of Apartment Building Construction Workers' Noise Exposure (아파트 건설노동자 소음 노출평가)

Taesun Kang
- Journal of Korean Society of Occupational and Environmental Hygiene / v.33 no.3 / pp.308-316 / 2023
Objectives: The aim of this study is to measure and assess occupational noise exposure levels among construction workers at apartment building construction sites in South Korea. Methods: Noise exposure was assessed for 139 construction workers in 10 trades at 53 apartment building construction sites in northern Gyeonggi-do. Measurements were taken over more than 6 hours using noise dosimeters set to a 90 dB criterion level, an 80 dB threshold, and a 5 dB exchange rate (L_MOEL). Results: The mean L_MOEL (equivalent continuous noise level over 8 hours) for the 139 dosimeter samples was 87.8 ± 4.3 dBA. The mean noise exposure level for each construction trade, referred to as the trade mean, was also calculated. Noise exposure levels differed significantly between construction trades (ANOVA, p < 0.001). The highest L_MOEL values were recorded for concrete chippers (93.2 ± 2.6 dBA), followed by ironworkers (88.4 ± 0.7 dBA), concrete finishers (88.3 ± 2.7 dBA), masonry workers (87.7 ± 1.9 dBA), pile driver operators (85.6 ± 1.7 dBA), concrete carpenters (84.9 ± 2.4 dBA), interior carpenters (83.5 ± 2.1 dBA), and other groups (81.4 ± 2.2 dBA). Conclusions: The findings suggest that nearly all construction workers in this study are at risk of noise-induced hearing loss (NIHL). Moreover, the study establishes that construction trade can serve as a useful metric for assessing noise exposure levels at apartment construction sites.
https://doi.org/10.15269/JKSOEH.2023.33.3.308
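The dosimeter settings above (90 dB criterion, 80 dB threshold, 5 dB exchange rate) are the parameters of the dose and time-weighted-average formulas used by OSHA. A minimal sketch, assuming L_MOEL converts dose to an 8-hour level in the same way; the exposure segments in the example are hypothetical.

```python
import math

CRITERION_DB = 90.0     # level giving 100% dose over the criterion duration
THRESHOLD_DB = 80.0     # levels below this are not integrated into the dose
EXCHANGE_DB = 5.0       # +5 dB halves the allowed exposure time
CRITERION_HOURS = 8.0

def allowed_hours(level_db):
    """Reference duration T for a level under the 90 dB / 5 dB rule."""
    return CRITERION_HOURS / 2 ** ((level_db - CRITERION_DB) / EXCHANGE_DB)

def noise_dose(segments):
    """Percent dose from (level_dB, hours) segments, ignoring sub-threshold levels."""
    return 100.0 * sum(
        hours / allowed_hours(level)
        for level, hours in segments
        if level >= THRESHOLD_DB
    )

def eight_hour_level(dose_percent):
    """Convert a percent dose to an equivalent 8-hour level (OSHA-style TWA)."""
    q = EXCHANGE_DB / math.log10(2)  # about 16.61 for a 5 dB exchange rate
    return q * math.log10(dose_percent / 100.0) + CRITERION_DB

# Hypothetical shift: 4 h at 95 dB, 4 h at 85 dB, 1 h at 75 dB (below threshold).
dose = noise_dose([(95.0, 4.0), (85.0, 4.0), (75.0, 1.0)])
level = eight_hour_level(dose)
```

In this example the dose is 100 × (4/4 + 4/16) = 125%, which corresponds to an 8-hour level of about 91.6 dB; a 100% dose maps back to exactly the 90 dB criterion.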

Jaccard Index Reflecting Time-Context for User-based Collaborative Filtering

Soojung Lee
- Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.163-170 / 2023
User-based collaborative filtering, one of the methods for implementing recommender systems, recommends items preferred by neighboring users, that is, users with similar rating histories. However, it fundamentally suffers from data sparsity: recommendation quality drops significantly when users share few co-rated items. To address this problem, many existing studies have proposed combining the Jaccard index with a similarity measure. In this study, we introduce a time-aware concept into the Jaccard index and propose weighting co-rated items differently depending on when they were rated. Experiments with various performance metrics and time intervals confirm that the proposed method outperforms the original Jaccard index on most metrics, and that the optimal time interval depends on the type of performance metric.
https://doi.org/10.9708/jksci.2023.28.10.163
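The classic Jaccard index divides the number of co-rated items by the number of items either user rated. The abstract does not give the paper's time weighting, so the variant below is hypothetical: a co-rated item counts fully when both ratings fall within a time interval of each other and counts a reduced weight otherwise. Both `interval_days` and `far_weight` are illustrative assumptions.

```python
def jaccard(items_u, items_v):
    """Classic Jaccard index over the sets of items each user rated."""
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def time_aware_jaccard(ratings_u, ratings_v, interval_days=30.0, far_weight=0.5):
    """Hypothetical time-aware Jaccard variant (not the paper's exact formula).

    ratings_* map item -> rating timestamp in days. A co-rated item contributes
    1.0 if the two ratings are within `interval_days` of each other and
    `far_weight` otherwise; the denominator is the plain union size.
    """
    common = ratings_u.keys() & ratings_v.keys()
    union = ratings_u.keys() | ratings_v.keys()
    if not union:
        return 0.0
    weighted = sum(
        1.0 if abs(ratings_u[i] - ratings_v[i]) <= interval_days else far_weight
        for i in common
    )
    return weighted / len(union)

u = {"a": 0.0, "b": 10.0, "c": 100.0}
v = {"a": 5.0, "b": 200.0, "d": 50.0}
```

Users `u` and `v` share items a and b out of four distinct items, so the plain Jaccard index is 0.5. Item b was rated 190 days apart, so the hypothetical time-aware score drops to 1.5 / 4 = 0.375.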

Geometric and Semantic Improvement for Unbiased Scene Graph Generation

Ruhui Zhang; Pengcheng Xu; Kang Kang; You Yang
- KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2643-2657 / 2023
Scene graphs are structured representations that clearly convey objects and the relationships between them, but they are often heavily biased because the relationship labels in the datasets are highly skewed and long-tailed. Indeed, the visual world itself and its descriptions are biased. Therefore, Unbiased Scene Graph Generation (USGG) aims to train models that eliminate long-tail effects as much as possible rather than altering the dataset directly. To this end, we propose Geometric and Semantic Improvement (GSI) for USGG. First, to fully exploit the feature information in the images, geometric and semantic enhancement modules are designed. The geometric module is based on the observation that the positions of neighboring object pairs influence each other, and it improves the recall of relationships across the dataset. The semantic module further processes the embedded word vectors to better capture semantic information. Then, to improve recall on tail data, the Class Balanced Seesaw Loss (CBSLoss) is designed. It improves prediction recall by penalizing body or tail relations that are misclassified. The experimental findings demonstrate that GSI outperforms mainstream models on the mean Recall@K (mR@K) metric in three tasks, and that it addresses the long-tailed imbalance in the Visual Genome 150 (VG150) dataset better than most existing methods.
https://doi.org/10.3837/tiis.2023.10.003
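Mean Recall@K computes recall separately for each predicate class and then averages over classes, so rare tail predicates weigh as much as frequent head predicates. A minimal sketch, assuming triplets are matched by exact labels; real scene graph evaluation also requires bounding-box overlap, which is omitted here. The example triplets are hypothetical.

```python
from collections import defaultdict

def mean_recall_at_k(samples, k):
    """Mean Recall@K over predicate classes.

    samples: list of (ground_truth, predictions) per image. ground_truth is a
    list of (subject, predicate, object) triplets; predictions is a list of
    triplets sorted by descending confidence. Recall is computed per predicate
    per image, averaged over the images containing that predicate, and then
    averaged over predicates.
    """
    per_predicate = defaultdict(list)
    for ground_truth, predictions in samples:
        top_k = set(predictions[:k])
        hits = defaultdict(int)
        totals = defaultdict(int)
        for triplet in ground_truth:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
        for predicate, total in totals.items():
            per_predicate[predicate].append(hits[predicate] / total)
    if not per_predicate:
        return 0.0
    class_recalls = [sum(r) / len(r) for r in per_predicate.values()]
    return sum(class_recalls) / len(class_recalls)

samples = [
    ([("man", "on", "horse"), ("man", "wearing", "hat")],
     [("man", "on", "horse"), ("man", "near", "hat"), ("man", "wearing", "hat")]),
    ([("dog", "on", "sofa")],
     [("dog", "near", "sofa"), ("dog", "on", "sofa")]),
]
```

With K = 2, the frequent predicate "on" is always recalled but the rarer "wearing" is missed, so mR@2 is 0.5 even though two of the three ground-truth triplets were found.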

Smart Healthcare Access Management System using Iris Recognition (홍채인식을 이용한 스마트 헬스케어 출입관리 시스템)

Kwan-Hee Lee; Ji-In Kim; Goo-Rak Kwon
- The Journal of the Korea Institute of Electronic Communication Sciences / v.18 no.5 / pp.971-980 / 2023
Safety accidents and industrial accidents occur constantly at industrial sites, and the likelihood of accidents caused by workers' physical and mental fatigue is increasing. Systematic management and various systems for worker safety therefore need to be introduced. In this paper, we develop an access control system that uses biometric information at industrial sites, providing efficient health management and access control for workers. Workers are identified by face recognition for access control, and their health status is assessed through iris recognition. The system aims to improve accuracy and enable more efficient management by detecting signs of health abnormalities from congestion of the workers' irises and eyes. The developed system consists of an on-site access control system, an access control program for administrators, and a main server system that diagnoses signs of abnormal health in users.
https://doi.org/10.13067/JKIECS.2023.18.5.971

A Study on the Impact of Speech Data Quality on Speech Recognition Models

Yeong-Jin Kim; Hyun-Jong Cha; Ah Reum Kang
- Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.41-49 / 2024
Speech recognition technology continues to advance and is widely used in various fields. In this study, we investigated the impact of speech data quality on speech recognition models by comparing the entire dataset with the top 70% of the data ranked by signal-to-noise ratio (SNR). Using Seamless M4T and Google Cloud Speech-to-Text, we examined each model's transcription results and evaluated them with the Levenshtein distance, where a lower distance means fewer errors. Experimental results show that Seamless M4T scored 13.6 on the high-SNR data, lower than its score of 16.6 on the entire dataset. In contrast, Google Cloud Speech-to-Text scored 8.3 on the entire dataset, indicating lower performance than on the high-SNR data. This suggests that using high-SNR data when training a new speech recognition model can affect its performance, and that the Levenshtein distance can serve as a metric for evaluating speech recognition models.
https://doi.org/10.9708/jksci.2024.29.01.041
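The two ingredients of this study, SNR-based data selection and Levenshtein scoring, can be sketched as follows. This assumes SNR is computed from signal and noise power in decibels and that clips already carry an SNR value; how the authors estimated SNR, and whether they compared characters or words, is not stated, so `levenshtein` accepts any sequence (a string or a list of words).

```python
import math

def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # substitution, free when equal
            ))
        prev = curr
    return prev[-1]

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from average powers."""
    return 10.0 * math.log10(signal_power / noise_power)

def top_fraction_by_snr(clips, fraction=0.7):
    """Keep the highest-SNR fraction of (clip_id, snr_db) pairs."""
    ranked = sorted(clips, key=lambda c: c[1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical clips with SNR values 0..9 dB.
clips = [(f"clip{i}", float(i)) for i in range(10)]
kept = top_fraction_by_snr(clips)
```

Each transcript would then be scored with `levenshtein(reference, hypothesis)` on both the full set and the kept subset, and the average distances compared.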

Assessment of compressive strength of high-performance concrete using soft computing approaches

Chukwuemeka Daniel; Jitendra Khatti; Kamaldeep Singh Grover
- Computers and Concrete / v.33 no.1 / pp.55-75 / 2024
The present study introduces an optimum-performance soft computing model for predicting the compressive strength of high-performance concrete (HPC), comparing, for the first time on a common database, models based on conventional machine learning (kernel-based, covariance function-based, and tree-based), advanced machine learning (least-squares support vector machine, LSSVM, and minimax probability machine regressor, MPMR), and deep learning (artificial neural network, ANN) approaches. A compressive strength database with results for 1030 concrete samples was compiled from the literature and preprocessed. For training, testing, and validation of the soft computing models, 803, 101, and 101 data points, respectively, were selected arbitrarily from the 1005 preprocessed data points. Thirteen performance metrics, including three new ones (a20-index, index of agreement, and index of scatter), were computed for each model. The comparison shows that the SVM (kernel-based), ET (tree-based), MPMR (advanced), and ANN (deep) models achieved higher performance in predicting the compressive strength of HPC. From the overall analysis of performance, accuracy, Taylor plot, accuracy metric, regression error characteristic curve, Anderson-Darling and Wilcoxon tests, uncertainty, and reliability, model CS4, based on the ensemble tree, was identified as the optimum-performance model, with a correlation coefficient of 0.9352, a root mean square error of 5.76 MPa, and a mean absolute error of 4.1069 MPa. The study also reveals that multicollinearity affects the prediction accuracy of the Gaussian process regression, decision tree, multilinear regression, and adaptive boosting regressor models, a novel finding in compressive strength prediction of HPC. The cosine sensitivity analysis reveals that the predicted compressive strength of HPC is highly affected by cement content, fine aggregate, coarse aggregate, and water content.
https://doi.org/10.12989/cac.2024.33.1.055
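The abstract lists the a20-index, index of agreement, and index of scatter alongside RMSE and MAE, and mentions a cosine sensitivity analysis. A minimal sketch using definitions common in the literature, which are assumed rather than confirmed to match the paper: the a20-index as the share of predictions within ±20% of the observed value, Willmott's index of agreement, the index of scatter as RMSE divided by the mean observation, and the cosine amplitude method for sensitivity. The sample values are made up.

```python
import math

def rmse(obs, pred):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def a20_index(obs, pred):
    """Share of samples whose predicted/observed ratio lies in [0.8, 1.2]."""
    return sum(0.8 <= p / o <= 1.2 for o, p in zip(obs, pred)) / len(obs)

def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (1.0 means perfect agreement)."""
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

def index_of_scatter(obs, pred):
    """RMSE normalized by the mean observed value."""
    return rmse(obs, pred) / (sum(obs) / len(obs))

def cosine_amplitude(x, y):
    """Cosine amplitude strength of relation between an input column and the output."""
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y)
    )

# Hypothetical observed and predicted compressive strengths (MPa).
obs = [40.0, 50.0, 60.0, 70.0]
pred = [42.0, 48.0, 75.0, 69.0]
```

In this toy set, three of four predictions fall within ±20% (a20-index 0.75); the 75 MPa prediction for a 60 MPa sample dominates the squared errors and pulls the index of agreement down to about 0.90.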
