• Title/Summary/Keyword: deep network

Search Result 2,982, Processing Time 0.032 seconds

Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network (CNN 기반 스펙트로그램을 이용한 자유발화 음성감정인식)

  • Guiyoung Son;Soonil Kwon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.6
    • /
    • pp.284-290
    • /
    • 2024
  • Speech emotion recognition (SER) is a technique that is used to analyze the speaker's voice patterns, including vibration, intensity, and tone, to determine their emotional state. There has been an increase in interest in artificial intelligence (AI) techniques, which are now widely used in medicine, education, industry, and the military. Nevertheless, existing researchers have attained impressive results by utilizing acted-out speech from skilled actors in a controlled environment for various scenarios. In particular, there is a mismatch between acted and spontaneous speech since acted speech includes more explicit emotional expressions than spontaneous speech. For this reason, spontaneous speech-emotion recognition remains a challenging task. This paper aims to conduct emotion recognition and improve performance using spontaneous speech data. To this end, we implement deep learning-based speech emotion recognition using the VGG (Visual Geometry Group) after converting 1-dimensional audio signals into a 2-dimensional spectrogram image. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, consisting of 7 emotions, i.e., joy, love, anger, fear, sadness, surprise, and neutral. As a result, we achieved an average accuracy of 83.5% and 73.0% for adults and young people using a time-frequency 2-dimension spectrogram, respectively. In conclusion, our findings demonstrated that the suggested framework outperformed current state-of-the-art techniques for spontaneous speech and showed a promising performance despite the difficulty in quantifying spontaneous speech emotional expression.

Automatic Detection and Classification of Rib Fractures on Thoracic CT Using Convolutional Neural Network: Accuracy and Feasibility

  • Qing-Qing Zhou;Jiashuo Wang;Wen Tang;Zhang-Chun Hu;Zi-Yi Xia;Xue-Song Li;Rongguo Zhang;Xindao Yin;Bing Zhang;Hong Zhang
    • Korean Journal of Radiology
    • /
    • v.21 no.7
    • /
    • pp.869-879
    • /
    • 2020
  • Objective: To evaluate the performance of a convolutional neural network (CNN) model that can automatically detect and classify rib fractures, and output structured reports from computed tomography (CT) images. Materials and Methods: This study included 1079 patients (median age, 55 years; men, 718) from three hospitals, between January 2011 and January 2019, who were divided into a monocentric training set (n = 876; median age, 55 years; men, 582), five multicenter/multiparameter validation sets (n = 173; median age, 59 years; men, 118) with different slice thicknesses and image pixels, and a normal control set (n = 30; median age, 53 years; men, 18). Three classifications (fresh, healing, and old fracture) combined with fracture location (corresponding CT layers) were detected automatically and delivered in a structured report. Precision, recall, and F1-score were selected as metrics to measure the optimum CNN model. Detection/diagnosis time, precision, and sensitivity were employed to compare the diagnostic efficiency of the structured report and that of experienced radiologists. Results: A total of 25054 annotations (fresh fracture, 10089; healing fracture, 10922; old fracture, 4043) were labelled for training (18584) and validation (6470). The detection efficiency was higher for fresh fractures and healing fractures than for old fractures (F1-scores, 0.849, 0.856, 0.770, respectively, p = 0.023 for each), and the robustness of the model was good in the five multicenter/multiparameter validation sets (all mean F1-scores > 0.8 except validation set 5 [512 x 512 pixels; F1-score = 0.757]). The precision of the five radiologists improved from 80.3% to 91.1%, and the sensitivity increased from 62.4% to 86.3% with artificial intelligence-assisted diagnosis. On average, the diagnosis time of the radiologists was reduced by 73.9 seconds. Conclusion: Our CNN model for automatic rib fracture detection could assist radiologists in improving diagnostic efficiency, reducing diagnosis time and radiologists' workload.

A Performance Comparison of Super Resolution Model with Different Activation Functions (활성함수 변화에 따른 초해상화 모델 성능 비교)

  • Yoo, Youngjun;Kim, Daehee;Lee, Jaekoo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.10
    • /
    • pp.303-308
    • /
    • 2020
  • The ReLU(Rectified Linear Unit) function has been dominantly used as a standard activation function in most deep artificial neural network models since it was proposed. Later, Leaky ReLU, Swish, and Mish activation functions were presented to replace ReLU, which showed improved performance over existing ReLU function in image classification task. Therefore, we recognized the need to experiment with whether performance improvements could be achieved by replacing the RELU with other activation functions in the super resolution task. In this paper, the performance was compared by changing the activation functions in EDSR model, which showed stable performance in the super resolution task. As a result, in experiments conducted with changing the activation function of EDSR, when the resolution was converted to double, the existing activation function, ReLU, showed similar or higher performance than the other activation functions used in the experiment. When the resolution was converted to four times, Leaky ReLU and Swish function showed slightly improved performance over ReLU. PSNR and SSIM, which can quantitatively evaluate the quality of images, were able to identify average performance improvements of 0.06%, 0.05% when using Leaky ReLU, and average performance improvements of 0.06% and 0.03% when using Swish. When the resolution is converted to eight times, the Mish function shows a slight average performance improvement over the ReLU. Using Mish, PSNR and SSIM were able to identify an average of 0.06% and 0.02% performance improvement over the RELU. In conclusion, Leaky ReLU and Swish showed improved performance compared to ReLU for super resolution that converts resolution four times and Mish showed improved performance compared to ReLU for super resolution that converts resolution eight times. In future study, we should conduct comparative experiments to replace activation functions with Leaky ReLU, Swish and Mish to improve performance in other super resolution models.

Requirement Analysis for Agricultural Meteorology Information Service Systems based on the Fourth Industrial Revolution Technologies (4차 산업혁명 기술에 기반한 농업 기상 정보 시스템의 요구도 분석)

  • Kim, Kwang Soo;Yoo, Byoung Hyun;Hyun, Shinwoo;Kang, DaeGyoon
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.21 no.3
    • /
    • pp.175-186
    • /
    • 2019
  • Efforts have been made to introduce the climate smart agriculture (CSA) for adaptation to future climate conditions, which would require collection and management of site specific meteorological data. The objectives of this study were to identify requirements for construction of agricultural meteorology information service system (AMISS) using technologies that lead to the fourth industrial revolution, e.g., internet of things (IoT), artificial intelligence, and cloud computing. The IoT sensors that require low cost and low operating current would be useful to organize wireless sensor network (WSN) for collection and analysis of weather measurement data, which would help assessment of productivity for an agricultural ecosystem. It would be recommended to extend the spatial extent of the WSN to a rural community, which would benefit a greater number of farms. It is preferred to create the big data for agricultural meteorology in order to produce and evaluate the site specific data in rural areas. The digital climate map can be improved using artificial intelligence such as deep neural networks. Furthermore, cloud computing and fog computing would help reduce costs and enhance the user experience of the AMISS. In addition, it would be advantageous to combine environmental data and farm management data, e.g., price data for the produce of interest. It would also be needed to develop a mobile application whose user interface could meet the needs of stakeholders. These fourth industrial revolution technologies would facilitate the development of the AMISS and wide application of the CSA.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technologies that are essential to the business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation value of the user item matrix may be distorted by the data depending on the popularity of the product, or there may be new users who have not yet evaluated the value. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased to high scores, resulting in severe imbalances. One cause of this imbalance distribution is the purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive. Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, pre-processing methods for most existing data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class problems, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert and apply multi-class problems to binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. Conditional vector y identify distributions for minority classes and generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity. SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.

Improvement of Mid-Wave Infrared Image Visibility Using Edge Information of KOMPSAT-3A Panchromatic Image (KOMPSAT-3A 전정색 영상의 윤곽 정보를 이용한 중적외선 영상 시인성 개선)

  • Jinmin Lee;Taeheon Kim;Hanul Kim;Hongtak Lee;Youkyung Han
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1283-1297
    • /
    • 2023
  • Mid-wave infrared (MWIR) imagery, due to its ability to capture the temperature of land cover and objects, serves as a crucial data source in various fields including environmental monitoring and defense. The KOMPSAT-3A satellite acquires MWIR imagery with high spatial resolution compared to other satellites. However, the limited spatial resolution of MWIR imagery, in comparison to electro-optical (EO) imagery, constrains the optimal utilization of the KOMPSAT-3A data. This study aims to create a highly visible MWIR fusion image by leveraging the edge information from the KOMPSAT-3A panchromatic (PAN) image. Preprocessing is implemented to mitigate the relative geometric errors between the PAN and MWIR images. Subsequently, we employ a pre-trained pixel difference network (PiDiNet), a deep learning-based edge information extraction technique, to extract the boundaries of objects from the preprocessed PAN images. The MWIR fusion imagery is then generated by emphasizing the brightness value corresponding to the edge information of the PAN image. To evaluate the proposed method, the MWIR fusion images were generated in three different sites. As a result, the boundaries of terrain and objects in the MWIR fusion images were emphasized to provide detailed thermal information of the interest area. Especially, the MWIR fusion image provided the thermal information of objects such as airplanes and ships which are hard to detect in the original MWIR images. This study demonstrated that the proposed method could generate a single image that combines visible details from an EO image and thermal information from an MWIR image, which contributes to increasing the usage of MWIR imagery.

Radiation Dose Reduction in Digital Mammography by Deep-Learning Algorithm Image Reconstruction: A Preliminary Study (딥러닝 알고리즘을 이용한 저선량 디지털 유방 촬영 영상의 복원: 예비 연구)

  • Su Min Ha;Hak Hee Kim;Eunhee Kang;Bo Kyoung Seo;Nami Choi;Tae Hee Kim;You Jin Ku;Jong Chul Ye
    • Journal of the Korean Society of Radiology
    • /
    • v.83 no.2
    • /
    • pp.344-359
    • /
    • 2022
  • Purpose To develop a denoising convolutional neural network-based image processing technique and investigate its efficacy in diagnosing breast cancer using low-dose mammography imaging. Materials and Methods A total of 6 breast radiologists were included in this prospective study. All radiologists independently evaluated low-dose images for lesion detection and rated them for diagnostic quality using a qualitative scale. After application of the denoising network, the same radiologists evaluated lesion detectability and image quality. For clinical application, a consensus on lesion type and localization on preoperative mammographic examinations of breast cancer patients was reached after discussion. Thereafter, coded low-dose, reconstructed full-dose, and full-dose images were presented and assessed in a random order. Results Lesions on 40% reconstructed full-dose images were better perceived when compared with low-dose images of mastectomy specimens as a reference. In clinical application, as compared to 40% reconstructed images, higher values were given on full-dose images for resolution (p < 0.001); diagnostic quality for calcifications (p < 0.001); and for masses, asymmetry, or architectural distortion (p = 0.037). The 40% reconstructed images showed comparable values to 100% full-dose images for overall quality (p = 0.547), lesion visibility (p = 0.120), and contrast (p = 0.083), without significant differences. Conclusion Effective denoising and image reconstruction processing techniques can enable breast cancer diagnosis with substantial radiation dose reduction.

True Orthoimage Generation from LiDAR Intensity Using Deep Learning (딥러닝에 의한 라이다 반사강도로부터 엄밀정사영상 생성)

  • Shin, Young Ha;Hyung, Sung Woong;Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.38 no.4
    • /
    • pp.363-373
    • /
    • 2020
  • During last decades numerous studies generating orthoimage have been carried out. Traditional methods require exterior orientation parameters of aerial images and precise 3D object modeling data and DTM (Digital Terrain Model) to detect and recover occlusion areas. Furthermore, it is challenging task to automate the complicated process. In this paper, we proposed a new concept of true orthoimage generation using DL (Deep Learning). DL is rapidly used in wide range of fields. In particular, GAN (Generative Adversarial Network) is one of the DL models for various tasks in imaging processing and computer vision. The generator tries to produce results similar to the real images, while discriminator judges fake and real images until the results are satisfied. Such mutually adversarial mechanism improves quality of the results. Experiments were performed using GAN-based Pix2Pix model by utilizing IR (Infrared) orthoimages, intensity from LiDAR data provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) through the ISPRS (International Society for Photogrammetry and Remote Sensing). Two approaches were implemented: (1) One-step training with intensity data and high resolution orthoimages, (2) Recursive training with intensity data and color-coded low resolution intensity images for progressive enhancement of the results. Two methods provided similar quality based on FID (Fréchet Inception Distance) measures. However, if quality of the input data is close to the target image, better results could be obtained by increasing epoch. This paper is an early experimental study for feasibility of DL-based true orthoimage generation and further improvement would be necessary.

Comparison of Models for Stock Price Prediction Based on Keyword Search Volume According to the Social Acceptance of Artificial Intelligence (인공지능의 사회적 수용도에 따른 키워드 검색량 기반 주가예측모형 비교연구)

  • Cho, Yujung;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.103-128
    • /
    • 2021
  • Recently, investors' interest and the influence of stock-related information dissemination are being considered as significant factors that explain stock returns and volume. Besides, companies that develop, distribute, or utilize innovative new technologies such as artificial intelligence have a problem that it is difficult to accurately predict a company's future stock returns and volatility due to macro-environment and market uncertainty. Market uncertainty is recognized as an obstacle to the activation and spread of artificial intelligence technology, so research is needed to mitigate this. Hence, the purpose of this study is to propose a machine learning model that predicts the volatility of a company's stock price by using the internet search volume of artificial intelligence-related technology keywords as a measure of the interest of investors. To this end, for predicting the stock market, we using the VAR(Vector Auto Regression) and deep neural network LSTM (Long Short-Term Memory). And the stock price prediction performance using keyword search volume is compared according to the technology's social acceptance stage. In addition, we also conduct the analysis of sub-technology of artificial intelligence technology to examine the change in the search volume of detailed technology keywords according to the technology acceptance stage and the effect of interest in specific technology on the stock market forecast. To this end, in this study, the words artificial intelligence, deep learning, machine learning were selected as keywords. Next, we investigated how many keywords each week appeared in online documents for five years from January 1, 2015, to December 31, 2019. The stock price and transaction volume data of KOSDAQ listed companies were also collected and used for analysis. As a result, we found that the keyword search volume for artificial intelligence technology increased as the social acceptance of artificial intelligence technology increased. In particular, starting from AlphaGo Shock, the keyword search volume for artificial intelligence itself and detailed technologies such as machine learning and deep learning appeared to increase. Also, the keyword search volume for artificial intelligence technology increases as the social acceptance stage progresses. It showed high accuracy, and it was confirmed that the acceptance stages showing the best prediction performance were different for each keyword. As a result of stock price prediction based on keyword search volume for each social acceptance stage of artificial intelligence technologies classified in this study, the awareness stage's prediction accuracy was found to be the highest. The prediction accuracy was different according to the keywords used in the stock price prediction model for each social acceptance stage. Therefore, when constructing a stock price prediction model using technology keywords, it is necessary to consider social acceptance of the technology and sub-technology classification. The results of this study provide the following implications. First, to predict the return on investment for companies based on innovative technology, it is most important to capture the recognition stage in which public interest rapidly increases in social acceptance of the technology. Second, the change in keyword search volume and the accuracy of the prediction model varies according to the social acceptance of technology should be considered in developing a Decision Support System for investment such as the big data-based Robo-advisor recently introduced by the financial sector.

A Combat Effectiveness Evaluation Algorithm Considering Technical and Human Factors in C4I System (NCW 환경에서 C4I 체계 전투력 상승효과 평가 알고리즘 : 기술 및 인적 요소 고려)

  • Jung, Whan-Sik;Park, Gun-Woo;Lee, Jae-Yeong;Lee, Sang-Hoon
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.2
    • /
    • pp.55-72
    • /
    • 2010
  • Recently, the battlefield environment has changed from platform-centric warfare(PCW) which focuses on maneuvering forces into network-centric warfare(NCW) which is based on the connectivity of each asset through the warfare information system as information technology increases. In particular, C4I(Command, Control, Communication, Computer and Intelligence) system can be an important factor in achieving NCW. It is generally used to provide direction across distributed forces and status feedback from thoseforces. It can provide the important information, more quickly and in the correct format to the friendly units. And it can achieve the information superiority through SA(Situational Awareness). Most of the advanced countries have been developed and already applied these systems in military operations. Therefore, ROK forces also have been developing C4I systems such as KJCCS(Korea Joint Command Control System). And, ours are increasing the budgets in the establishment of warfare information systems. However, it is difficult to evaluate the C4I effectiveness properly by deficiency of methods. We need to develop a new combat effectiveness evaluation method that is suitable for NCW. Existing evaluation methods lay disproportionate emphasis on technical factors with leaving something to be desired in human factors. Therefore, it is necessary to consider technical and human factors to evaluate combat effectiveness. In this study, we proposed a new Combat Effectiveness evaluation algorithm called E-TechMan(A Combat Effectiveness Evaluation Algorithm Considering Technical and Human Factors in C4I System). This algorithm uses the rule of Newton's second law($F=(m{\Delta}{\upsilon})/{\Delta}t{\Rightarrow}\frac{V{\upsilon}I}{T}{\times}C$). Five factors considered in combat effectiveness evaluation are network power(M), movement velocity(v), information accuracy(I), command and control time(T) and collaboration level(C). Previous researches did not consider the value of the node and arc in evaluating the network power after the C4I system has been established. In addition, collaboration level which could be a major factor in combat effectiveness was not considered. E-TechMan algorithm is applied to JFOS-K(Joint Fire Operating System-Korea) system that can connect KJCCS of Korea armed forces with JADOCS(Joint Automated Deep Operations Coordination System) of U.S. armed forces and achieve sensor to shooter system in real time in JCS(Joint Chiefs of Staff) level. We compared the result of evaluation of Combat Effectiveness by E-TechMan with those by other algorithms(e.g., C2 Theory, Newton's second Law). We can evaluate combat effectiveness more effectively and substantially by E-TechMan algorithm. This study is meaningful because we improved the description level of reality in calculation of combat effectiveness in C4I system. Part 2 will describe the changes of war paradigm and the previous combat effectiveness evaluation methods such as C2 theory while Part 3 will explain E-TechMan algorithm specifically. Part 4 will present the application to JFOS-K and analyze the result with other algorithms. Part 5 is the conclusions provided in the final part.