• Title/Summary/Keyword: Speech feature


Deep Learning based Raw Audio Signal Bandwidth Extension System (딥러닝 기반 음향 신호 대역 확장 시스템)

  • Kim, Yun-Su;Seok, Jong-Won
    • Journal of IKEEE / v.24 no.4 / pp.1122-1128 / 2020
  • Bandwidth extension refers to restoring a narrowband signal (NB) that has been degraded during encoding and decoding, due to limited channel capacity or the characteristics of the codec installed in a mobile communication device, by converting it into a wideband signal (WB). Bandwidth extension research has mainly focused on speech signals and on frequency-domain techniques such as SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling), which restore missing or damaged high bands through complex feature extraction processes. In this paper, we propose a model that outputs a bandwidth-extended signal based on an autoencoder, one of the deep learning models. Using the residual connections of a one-dimensional convolutional neural network (CNN), the bandwidth is extended by feeding in a fixed-length time-domain signal without complicated pre-processing. In addition, we confirmed that the damaged high band can be restored even when training on a dataset containing various types of sound sources, including music, and not limited to speech.
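The abstract above describes a 1-D convolutional autoencoder with residual connections operating on fixed-length time-domain frames. A minimal sketch of that core building block, in plain Python with an illustrative kernel and frame length (not the authors' parameters), might look like:

```python
def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output length equals the input length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]

def residual_block(x, kernel):
    """Convolve, then add the input back (skip connection), as in a residual CNN layer."""
    y = conv1d_same(x, kernel)
    return [a + b for a, b in zip(x, y)]

# Illustrative fixed-length time-domain frame and a smoothing kernel (assumed values).
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
out = residual_block(frame, [0.25, 0.5, 0.25])
```

The output keeps the input's length, the property the model needs so that the narrowband input and the bandwidth-extended output line up sample-by-sample.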

A Hypothesis Study on the Physiological, Psychotic, and Psychological Factors of Vincent van Gogh's Yellow Expression (빈센트 반 고흐의 노란색 표현에 대한 생리적, 정신증적, 심리적 요인에 대한 가설 연구)

  • Oh, Seoung Jin;Ryu, Jung Mi
    • Journal of Naturopathy / v.11 no.2 / pp.123-135 / 2022
  • Background: This study examines what color expression means to artists by investigating various hypotheses about van Gogh's use of yellow and verifying the reasons for his preference for it. Purpose: The purpose of this study is to investigate whether Vincent van Gogh's yellow expression was a physiological consequence of alcoholism, an expressive feature of mental disorder, or a matter of psychological motivation. Methods: To address the research question, we reviewed literature from various fields, such as psychology and psychiatry, that analyzed the characteristics of van Gogh's works, his symptoms, and his technique. Results: The findings suggest that van Gogh's preference for yellow is related to psychological factors such as inner motivation, rather than to xanthopsia brought about by alcoholism or mental disorder. Conclusions: Van Gogh's yellow expression was dominantly influenced by psychological factors. Thus, psychological factors appear to have a great influence on an artist's characteristic color expression.

A Review on Advanced Methodologies to Identify the Breast Cancer Classification using the Deep Learning Techniques

  • Bandaru, Satish Babu;Babu, G. Rama Mohan
    • International Journal of Computer Science & Network Security / v.22 no.4 / pp.420-426 / 2022
  • Breast cancer is among the cancers that may be cured if the disease is diagnosed early, before it has spread throughout the body. Automatic Analysis of Diagnostic Tests (AAT) is automated assistance for physicians that can deliver reliable findings for analyzing critical diseases. Deep learning, a family of machine learning methods, has grown at an astonishing pace in recent years and is used to search for and render diagnoses in fields ranging from banking to medicine. We attempt to create a deep learning algorithm that can reliably diagnose breast cancer in mammograms. We want the algorithm to identify whether an image shows cancer or not, allowing use of a full training dataset with either strong clinical annotations or only the cancer status, in which a few images of cancer or non-cancer are annotated. Even with this technique, the photographs are annotated with the condition, and an optional portion of the annotated image then acts as the mark. The final stage of the suggested system does not need any such labels to be accessible during model training. Furthermore, the results of the review process suggest that deep learning approaches have surpassed the state of the art in tumor identification, feature extraction, and classification. The paper explains three ways in which learning algorithms were applied: training the network from scratch, transplanting certain deep learning concepts and constraints into a network, and reducing the number of parameters in the trained networks, which helps expand the scope of the networks. Researchers in economically developing countries have applied deep learning imaging devices to cancer detection; meanwhile, cancer rates have risen sharply in Africa.
A Convolutional Neural Network (CNN) is a kind of deep learning that can aid with a variety of other activities, such as speech recognition, image recognition, and classification. To accomplish this goal, in this article we use a CNN to categorize and identify breast cancer photographs from databases available from the US Centers for Disease Control and Prevention.
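The review names three ways learning algorithms were applied, the first being training a network from scratch. As an illustration only (a logistic regression trained on synthetic toy "images", not the authors' CNN or any CDC data), the from-scratch training loop can be sketched as:

```python
import math

def train_from_scratch(images, labels, lr=0.5, epochs=200):
    """Logistic regression on flattened pixels: a minimal stand-in for
    'training a network from scratch' on cancer / non-cancer labels."""
    n = len(images[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(images, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of the positive class
            g = p - y                        # gradient of the cross-entropy loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Binary decision from the learned linear score."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

# Toy 2x2 'images': bright patches labeled 1, dark patches labeled 0 (synthetic data).
imgs = [[0.9, 0.8, 0.9, 0.7], [0.1, 0.0, 0.2, 0.1],
        [0.8, 0.9, 0.7, 0.9], [0.0, 0.1, 0.1, 0.2]]
labs = [1, 0, 1, 0]
w, b = train_from_scratch(imgs, labs)
```

The other two strategies (transferring pretrained components, and pruning parameters) modify this same loop rather than replace it.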

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.10 / pp.935-943 / 2017
  • CNN (Convolutional Neural Network), which is used for image classification and speech recognition among neural networks trained on labeled data, has been continuously developed into high-performance structures. However, it remains difficult to use in an embedded system with limited resources. We therefore use GPGPU (General-Purpose computing on Graphics Processing Units) to address this problem, but even with pre-learned weights there are still limitations. Since CNN performs simple, repetitive operations, the computation speed varies greatly depending on how threads are allocated and utilized on a Single Instruction Multiple Thread (SIMT) based GPGPU. In particular, some threads become idle when performing convolution and pooling operations. The proposed method increases the operation speed by assigning these remaining threads to the computations of subsequent feature maps and kernels.
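One way to picture the thread-allocation idea, idle convolution threads being reassigned to the computations of subsequent feature maps, is a strided work mapping. The sketch below is an illustrative Python model of that mapping, not the authors' exact GPGPU scheme:

```python
def assign_thread(tid, num_threads, maps, height, width):
    """Map a SIMT thread id to output elements (map, row, col) of a
    convolution layer. Work items beyond one feature map spill into the
    next map, so threads that would otherwise idle keep computing."""
    total = maps * height * width
    work = []
    # Each thread strides through the flattened work list, so all threads
    # stay busy even when one feature map has fewer elements than threads.
    for item in range(tid, total, num_threads):
        m, rest = divmod(item, height * width)
        r, c = divmod(rest, width)
        work.append((m, r, c))
    return work

# 256 threads, 3 feature maps of 10x10 = 300 output elements:
# a naive one-map-at-a-time allocation would idle 156 threads per map,
# while the strided allocation gives every thread at least one element.
w0 = assign_thread(0, 256, 3, 10, 10)
```

Thread 0 here computes element 0 of map 0 and element 56 of map 2, instead of waiting for a new kernel launch per map.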

A Unit Selection Methods using Flexible Break in a Japanese TTS (일본어 합성기에서 유동 Break를 이용한 합성단위 선택 방법)

  • Song, Young-Hwan;Na, Deok-Su;Kim, Jong-Kuk;Bae, Myung-Jin;Lee, Jong-Seok
    • The Journal of the Acoustical Society of Korea / v.26 no.8 / pp.403-408 / 2007
  • In a large corpus-based speech synthesizer, a break, a parameter influencing naturalness and intelligibility, is used as an important feature during the unit selection process. Japanese is a language whose intonation is indicated by relative differences in pitch height; APs (Accentual Phrases) are placed according to changes of accent, and a break occurs on the boundary between APs. Although a break can be predicted using J-ToBI (Japanese Tones and Break Indices) with a rule-based or statistical approach, it is very difficult to predict a break exactly because of this flexibility. Therefore, in this paper, we conduct the unit search by dividing breaks into two types, fixed breaks and flexible breaks, in order to exploit the advantages of a large-scale corpus that includes various types of prosody. Experimental results show that the proposed unit selection method enhances the naturalness of the synthesized speech.
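The fixed/flexible break distinction during unit search can be pictured as a cost comparison: a fixed break must be matched exactly, while a flexible break accepts either value at a small penalty. The greedy search and all cost values below are illustrative assumptions, since the paper's actual cost function and search are not given here:

```python
def select_units(target_breaks, candidates):
    """Pick one candidate unit per target position. 'fixed' positions must
    match the candidate's break exactly; 'flexible' positions accept either
    break value with a light mismatch penalty (illustrative costs)."""
    chosen = []
    for (kind, value), cands in zip(target_breaks, candidates):
        best, best_cost = None, float("inf")
        for unit_name, unit_break, join_cost in cands:
            if kind == "fixed":
                mismatch = 0.0 if unit_break == value else float("inf")
            else:  # flexible: both break values are usable
                mismatch = 0.0 if unit_break == value else 0.5
            cost = join_cost + mismatch
            if cost < best_cost:
                best, best_cost = unit_name, cost
        chosen.append(best)
    return chosen

targets = [("fixed", True), ("flexible", True)]
cands = [
    [("u1", True, 1.0), ("u2", False, 0.2)],  # fixed: the break must be taken
    [("u3", True, 1.0), ("u4", False, 0.2)],  # flexible: a cheap non-break unit may win
]
```

At the flexible position the cheaper non-break unit wins despite the mismatch penalty, which is exactly the extra freedom the flexible break buys from a large corpus.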

Survey Study about Sasangin's Characteristics of Face, Voice, Skin and Pulse Diagnosis (사상인(四象人)의 안면, 음성, 피부 및 맥진 특성에 관한 설문조사 연구)

  • Lee, Jun-Hee;Kim, Yun-Hee;Hwang, Min-Woo;Kim, Jong-Yeol;Lee, Eui-Ju;Song, Il-Byung;Koh, Byung-Hee
    • Journal of Sasang Constitutional Medicine / v.19 no.3 / pp.126-143 / 2007
  • 1. Objectives: The purpose of this study is to find out the degree of practical use, the important elements, and the significant characteristics of Sasangin's face, voice, skin, and pulse diagnostic impressions in Sasang constitutional clinical diagnosis. 2. Methods: We analysed survey data about Sasangin's face, voice, skin, and pulse diagnostic impressions, drawn up by specialists in Sasang constitutional medicine. 3. Results and Conclusions: (1) Regarding the degree to which facial features were applied, 16 respondents (43.2%) each reported 20-40% and 40-60%. For voice, 19 respondents (51.4%) reported 0-20%; for skin, 14 respondents (37.8%) each reported 0-20% and 20-40%; and for pulse diagnosis, 25 respondents (73.0%) reported 0-20%. (2) In constitutional diagnosis, the important elements of the face were 'frontal whole shape', 'whole impression', and 'size and shape of eye, ear, nose and mouth'; of the voice, 'speed of speech', 'purity and impurity', and 'pitch'; of the skin, 'thickness', 'feel of touch', and 'size of skin pores'; and of pulse diagnosis, 'speed of pulse', 'sinking and floating', and 'weakness and firmness'. (3) The important face characteristics of Taeyangin were 'bright eyes', 'broad forehead', and 'strong impression'; of Soyangin, 'protruding forehead', 'thin and small lips', and 'narrow and sharp chin'; of Taeumin, 'thick lips', 'flat face', and 'large eyes, nose, ears and mouth'; and of Soeumin, 'long and slender face', 'downward-slanting eyes', and 'small eyes, nose, ears and mouth'. The important voice characteristics of Taeyangin were 'loud' and 'clear'; of Soyangin, 'rapid' and 'high-pitched tone'; of Taeumin, 'thick', 'slow', and 'low-pitched tone'; and of Soeumin, 'small and feeble' and 'slow'. The important skin characteristics of Taeyangin were 'thin' and 'white'; of Soyangin, 'thin', 'smooth', and 'elastic'; of Taeumin, 'thick', 'large skin pores', and 'coarse'; and of Soeumin, 'soft', 'thin', and 'subtle skin pores'.
The important pulse characteristics of Taeyangin were 'rapid' and 'large'; of Soyangin, 'rapid' and 'floating'; of Taeumin, 'tense', 'long', and 'solid'; and of Soeumin, 'fine', 'weak', and 'slow'.


Place Assimilation in OT

  • Lee, Sechang
    • Proceedings of the KSPS conference / 1996.10a / pp.109-116 / 1996
  • In this paper, I explore the possibility that the nature of place assimilation can be captured in terms of the OCP within Optimality Theory (McCarthy & Prince 1993, 1995; Prince & Smolensky 1993). In derivational models, each assimilatory process would be expressed through a different autosegmental rule. What any such model misses, however, is a clear generalization: all of those processes have the effect of avoiding a configuration in which two consonantal place nodes are adjacent across a syllable boundary, as illustrated in (1): (equation omitted) In a derivational model, it is a coincidence that across languages there are changes that modify a structure of the form (1a) into another structure without adjacent consonantal place nodes (1b). OT allows us to express this effect through a constraint, given in (2), that forbids adjacent place nodes: (2) OCP(PL): Adjacent place nodes are prohibited. At this point, a question arises as to how consonantal and vocalic place nodes are formally distinguished in the output for the purpose of applying the OCP(PL). Moreover, the OCP(PL) would equally affect complex onsets and codas, as well as coda-onset clusters, in languages such as English that have them. To remedy this problem, following McCarthy (1994), I assume that the canonical markedness constraint is a prohibition defined over no more than two segments, $\alpha$ and $\beta$: that is, $^{*}\{\alpha,\;\beta\}$ with appropriate conditions imposed on $\alpha$ and $\beta$. I propose the OCP(PL) again in the following format: (3) OCP(PL) (table omitted), where $\alpha$ and $\beta$ are the target and the trigger of place assimilation, respectively. The '*' is a reminder that, in this format, constraints specify negative targets, or prohibited configurations. Any structure matching the specifications is in violation of this constraint.
Now, in correspondence terms, the meaning of the OCP(PL) is this: the constraint is violated if a consonantal place $\alpha$ is immediately followed by a consonantal place $\beta$ on the surface. One advantage of this format is that the OCP(PL) would also be invoked in dealing with place assimilation within a complex coda (e.g., sink [siŋk]): we can make the constraint scan the consonantal clusters only, excluding any intervening vowels. Finally, onset clusters typically do not undergo place assimilation. I propose that onsets be protected by a constraint which ensures that the coda, not the onset, loses the place feature.
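The violation logic of OCP(PL), one violation per pair of adjacent consonantal place nodes, can be sketched as a small evaluator. The segment encoding below (vowels carry place `None`) is a simplification assumed for illustration, standing in for the formal consonantal/vocalic place distinction:

```python
def ocp_pl_violations(segments):
    """Count OCP(PL) violations: each pair of adjacent segments that both
    bear a consonantal place node violates the constraint once.
    Segments are (symbol, place) pairs; vowels carry place None here."""
    return sum(
        1
        for (_, p1), (_, p2) in zip(segments, segments[1:])
        if p1 is not None and p2 is not None
    )

# /anpa/: the coda-onset cluster n.p has two adjacent place nodes -> 1 violation.
anpa = [("a", None), ("n", "coronal"), ("p", "labial"), ("a", None)]
# Assimilated /ampa/ can be modeled with one place node shared across the
# cluster, so the adjacency that OCP(PL) penalizes disappears.
ampa = [("a", None), ("mp", "labial"), ("a", None)]
```

Note how the intervening-vowel clause falls out for free: a vowel's `None` place breaks the adjacency, so the evaluator effectively scans consonantal clusters only.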


Object Tracking Method using Deep Learning and Kalman Filter (딥 러닝 및 칼만 필터를 이용한 객체 추적 방법)

  • Kim, Gicheol;Son, Sohee;Kim, Minseop;Jeon, Jinwoo;Lee, Injae;Cha, Jihun;Choi, Haechul
    • Journal of Broadcast Engineering / v.24 no.3 / pp.495-505 / 2019
  • Typical deep learning algorithms include CNNs (Convolutional Neural Networks), which are mainly used for image recognition, and RNNs (Recurrent Neural Networks), which are mainly used for speech recognition and natural language processing. Among them, CNN learns filters that generate feature maps through algorithms that automatically learn features from data, making it mainstream with excellent performance in image recognition. Since then, various algorithms such as R-CNN have appeared to improve the object-detection performance of CNN, and algorithms such as YOLO (You Only Look Once) and SSD (Single Shot Multi-box Detector) have been proposed recently. However, since these deep learning-based algorithms perform detection on still images, stable object tracking and detection in video requires a separate tracking capability. Therefore, this paper proposes a method of combining a Kalman filter with a deep learning-based detection network for improved object tracking and detection performance in video. The detection network uses YOLO v2, which is capable of real-time processing, and the proposed method achieves a 7.7% IoU improvement over the baseline YOLO v2 network and a processing speed of 20 fps on FHD images.
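The combination described, a detector producing noisy per-frame positions that a Kalman filter smooths and predicts through, can be sketched for a single coordinate. The constant-velocity model and the noise values `q` and `r` below are illustrative assumptions, not the paper's configuration:

```python
def kalman_1d(detections, q=1e-3, r=0.5):
    """Constant-velocity Kalman filter over noisy per-frame detections of one
    coordinate (a simplified stand-in for filtering detector box centers)."""
    x, v = detections[0], 0.0          # state: position and velocity
    p = [[1.0, 0.0], [0.0, 1.0]]       # state covariance
    out = []
    for z in detections:
        # Predict: position advances by velocity; covariance grows by q.
        x, v = x + v, v
        p = [[p[0][0] + p[0][1] + p[1][0] + p[1][1] + q, p[0][1] + p[1][1]],
             [p[1][0] + p[1][1], p[1][1] + q]]
        # Update with the measurement z of the position only.
        s = p[0][0] + r
        k0, k1 = p[0][0] / s, p[1][0] / s
        y = z - x
        x, v = x + k0 * y, v + k1 * y
        p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
             [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
        out.append(x)
    return out

# Noisy detections of a target moving roughly one unit per frame.
track = kalman_1d([0.1, 0.9, 2.2, 2.8, 4.1, 5.0])
```

In a full tracker the same predict/update cycle runs on both box-center coordinates, and the predicted state bridges frames where the detector misses the object.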

A STUDY ON COMORBID DISORDERS AND ASSOCIATED SYMPTOMS OF PERVASIVE DEVELOPMENTAL DISORDER CHILDREN (전반적 발달장애 아동들의 공존질환 및 동반증상에 대한 연구)

  • Kwak, Young-Sook;Kang, Kyung-Mee;Cho, Seong-Jin
    • Journal of the Korean Academy of Child and Adolescent Psychiatry / v.10 no.1 / pp.64-75 / 1999
  • Objective: The purpose of this study was to investigate the prevalence and characteristics of comorbid disorders and associated symptoms in pervasive developmental disorder (PDD), and to examine the correlation between associated symptoms and developmental characteristics in PDD children. Method: The sample consisted of 209 cases of PDD and 143 cases of developmental language disorder (DLD, control group) treated at the Seoul National Mental Hospital from Jan. 1996 to Mar. 1999. Diagnosis based on DSM-IV criteria was performed by one or two child psychiatrists, while clinical features were evaluated from doctors' notes, occupational/speech therapy reports, and the results of the Social Maturity Scale (SMS), Childhood Autism Rating Scale (CARS), and Psycho-Educational Profile (PEP). The two groups were compared on a wide range of measures, including comorbid disorders, associated symptoms, treatment drugs, and PEP. The relation between associated symptoms and PEP was investigated in total (106 cases) and in each diagnostic group. Sixty-four cases of PDD were divided into three groups by CARS and then compared on associated symptoms. Results: The prevalence of comorbid disorder was 19.6% in PDD and 41.2% in DLD. The rate of manifestation of 13 associated symptoms was on average 31.47% in PDD and 22.13% in DLD. Associated symptoms significantly more frequent in PDD were preoccupation, obsession, self-mutilation, stereotypy, sleep problems, and odd response. In the total patient group, the associated symptoms that significantly influenced PEP were preoccupation, self-stimulation, stereotypy, inappropriate affect, sleep problems, and odd response; in each diagnostic group, however, no associated symptom influenced PEP. Associated symptoms significantly different among the three CARS groups were stereotypy, anxiety, and sleep problems.
Conclusion: These preliminary results suggest that developmental characteristics may influence associated symptoms in PDD children, and that a realistic approach considering careful diagnosis of associated symptoms and comorbid disorders is required.


Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.127-142 / 2016
  • A deep learning model is a kind of neural network that allows multiple hidden layers. There are various deep learning architectures, such as convolutional neural networks, deep belief networks, and recurrent neural networks. These have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. Among those architectures, convolutional neural networks and recurrent neural networks are classified as supervised learning models. In recent years, these supervised models have gained more popularity than unsupervised models such as deep belief networks, because they have shown successful applications in the fields mentioned above. Deep learning models can be trained with the backpropagation algorithm. Backpropagation is an abbreviation for "backward propagation of errors" and is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network; the gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the error function. Convolutional neural networks use a special architecture which is particularly well adapted to classifying images. Using this architecture makes convolutional networks fast to train, which in turn helps us train deep, multi-layer networks that are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks rest on three basic ideas: local receptive fields, shared weights, and pooling.
By local receptive fields, we mean that each neuron in the first (or any) hidden layer is connected to a small region of the input (or previous layer's) neurons. Shared weights mean that we use the same weights and bias for each of the local receptive fields, so all the neurons in a hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers, usually placed immediately after convolutional layers. What the pooling layers do is simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep networks took weeks several years ago, but thanks to progress in GPUs and algorithmic enhancements, training time has been reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks, or RNNs. A recurrent neural network is a class of artificial neural network in which connections between units form a directed cycle. This creates an internal state that allows the network to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem, i.e., vanishing and exploding gradients. The gradient can get smaller and smaller as it is propagated back through layers, which makes learning in early layers extremely slow. The problem gets worse in RNNs, since gradients are propagated backward not only through layers but also through time: if the network runs for a long time, the gradient can become extremely unstable and hard to learn from.
It has become possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.
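The three CNN ideas named in this abstract (local receptive fields, shared weights, pooling) can be sketched in a few lines of plain Python; the edge-detecting kernel and the image below are illustrative assumptions:

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image: every output neuron uses the
    same weights (shared weights), and each looks only at a small input
    patch (its local receptive field)."""
    kh, kw = len(kernel), len(kernel[0])
    h = len(image) - kh + 1
    w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w)] for r in range(h)]

def max_pool(fmap, size=2):
    """2x2 max pooling: keep only the strongest response in each region,
    simplifying the convolutional layer's output."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

# A vertical-edge kernel (assumed weights) applied to a 5x5 image whose
# left half is bright and right half is dark.
img = [[1, 1, 1, 0, 0]] * 5
edge = [[1, -1], [1, -1]]
fmap = conv2d_valid(img, edge)   # responds only where brightness drops
pooled = max_pool(fmap)
```

Because the kernel is shared, the feature map lights up wherever the edge occurs, and pooling then reports only that the feature was found in each region, not its exact position.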