• Title/Summary/Keyword: Multi-task Architecture

Automatic Recognition of Pitch Accents Using Time-Delay Recurrent Neural Network (시간지연 회귀 신경회로망을 이용한 피치 악센트 인식)

  • Kim, Sung-Suk;Kim, Chul;Lee, Wan-Joo
    • The Journal of the Acoustical Society of Korea / v.23 no.4E / pp.112-119 / 2004
  • This paper presents a method for the automatic recognition of pitch accents with no prior knowledge of the phonetic content of the signal (no knowledge of word or phoneme boundaries or of phoneme labels). The recognition algorithm is a time-delay recurrent neural network (TDRNN). A TDRNN is a neural network classifier with two different representations of dynamic context: delayed input nodes allow the representation of an explicit trajectory F0(t), while recurrent nodes provide long-term context information that can be used to normalize the input F0 trajectory. The performance of the TDRNN is compared with that of an MLP (multi-layer perceptron) and an HMM (hidden Markov model) on the same task. The TDRNN correctly recognizes 91.9% of pitch events and 91.0% of pitch non-events, for an average accuracy of 91.5% over both pitch events and non-events. The MLP with contextual input exhibits 85.8%, 85.5%, and 85.6% recognition accuracy, respectively, while the HMM correctly recognizes 36.8% of pitch events and 87.3% of pitch non-events, for an average accuracy of 62.2% over both pitch events and non-events. These results suggest that the TDRNN architecture is useful for the automatic recognition of pitch accents.
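
The two kinds of dynamic context described in this abstract can be illustrated with a minimal sketch. It is not the authors' network: the single tanh hidden layer, the sigmoid output, the padding rule, and all names (`tdrnn_forward`, `delays`, `w_in`, `w_rec`, `w_out`) are assumptions made for illustration, and the weights are untrained random values.

```python
import math
import random


def tdrnn_forward(f0, delays, w_in, w_rec, w_out):
    """Run a toy time-delay recurrent network over a normalized F0 track.

    f0     : list of F0 values, one per frame (assumed already scaled)
    delays : number of delayed input taps in addition to the current frame
    w_in   : hidden x (delays + 1) input weights
    w_rec  : hidden x hidden recurrent weights
    w_out  : output weights, one per hidden unit
    Returns one score in (0, 1) per frame.
    """
    hidden = [0.0] * len(w_rec)
    scores = []
    for t in range(len(f0)):
        # Delayed input nodes: F0(t), F0(t-1), ..., F0(t-delays).
        # Frames before the start are padded with the first value (assumption).
        window = [f0[max(t - d, 0)] for d in range(delays + 1)]
        new_hidden = []
        for j in range(len(hidden)):
            act = sum(w * x for w, x in zip(w_in[j], window))
            # Recurrent nodes: the previous hidden state carries longer context.
            act += sum(w * h for w, h in zip(w_rec[j], hidden))
            new_hidden.append(math.tanh(act))
        hidden = new_hidden
        logit = sum(w * h for w, h in zip(w_out, hidden))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def random_matrix(rows, cols, rng):
    """Small random weights, only so the sketch has something to run with."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DELAYS, HIDDEN = 3, 4
example_scores = tdrnn_forward(
    f0=[0.1, 0.2, 0.4, 0.7, 0.5, 0.3],
    delays=DELAYS,
    w_in=random_matrix(HIDDEN, DELAYS + 1, rng),
    w_rec=random_matrix(HIDDEN, HIDDEN, rng),
    w_out=[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)],
)
```

Each frame receives one score; in a trained network a threshold on that score would separate pitch events from non-events.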

Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech (음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성)

  • Kim, Seung-Won;Zheng, Yu;Lee, Gary-Geunbae;Kim, Byeong-Chang
    • MALSORI / no.53 / pp.75-92 / 2005
  • Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on the Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge for transcribing events in an utterance. TTS systems that adopt ToBI as an intermediate representation are known to exhibit higher flexibility, modularity, and domain/task portability than TTS systems with direct prosody generation. However, corpus preparation is very expensive at practical performance levels, because a ToBI-labeled corpus has to be constructed manually by many prosody experts and a large amount of data is normally required for accurate statistical prosody modeling. This paper proposes a new method that automatically transcribes C-ToBI labels in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to it. We empirically verify the usefulness of various natural language and phonology features in order to build well-integrated features for the ME framework.
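
As a reminder of what conditional Maximum Entropy classification computes, the following minimal sketch evaluates p(y | x) as a normalized exponential of summed feature weights. The feature names, the label set, and the weight values are hypothetical and are not taken from the paper; training the weights is not shown.

```python
import math


def maxent_probs(active_features, labels, weights):
    """Conditional maximum entropy model with binary features.

    p(y | x) = exp(sum_i w[(f_i, y)]) / Z(x), where f_i are the features
    active for input x and Z(x) normalizes over all labels.
    """
    scores = {
        y: sum(weights.get((f, y), 0.0) for f in active_features)
        for y in labels
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


# Hypothetical labels and features, for illustration only.
LABELS = ["break", "no_break"]
WEIGHTS = {
    ("followed_by_punctuation", "break"): 2.0,
    ("inside_word", "no_break"): 1.5,
}
example_probs = maxent_probs(["followed_by_punctuation"], LABELS, WEIGHTS)
```

The predicted label is the one with the highest probability; features with no stored weight contribute zero to every label.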

Hot Spot Detection of Thermal Infrared Image of Photovoltaic Power Station Based on Multi-Task Fusion

  • Xu Han;Xianhao Wang;Chong Chen;Gong Li;Changhao Piao
    • Journal of Information Processing Systems / v.19 no.6 / pp.791-802 / 2023
  • Manual inspection of photovoltaic (PV) panels cannot easily meet the inspection requirements of large-scale PV power plants. We present a hot spot detection and positioning method that detects hot spots in batches and locates their latitudes and longitudes. First, a network based on the YOLOv3 architecture is used to identify hot spots. The innovation is to modify the RU_1 unit in the YOLOv3 model for hot spot detection in the far field of view and to add a neural network residual unit for fusion. In addition, because of the misidentification problem in infrared images of solar PV panels, the DeepLab v3+ model is adopted to segment the PV panels and filter out misidentifications caused by bright spots on the ground. Finally, the latitude and longitude of each hot spot are calculated by a geometric positioning method that uses known information such as the drone's yaw angle, shooting height, and lens field of view. The experimental results indicate that the hot spot recognition accuracy is above 98%. With the drone kept 25 m above the ground, the hot spot positioning error is at the decimeter level.
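
The abstract does not spell out the geometric positioning method, so the following is only a sketch of one common way to do it, under loudly stated assumptions: the camera points straight down, the ground is flat, the top of the image faces the drone's heading, yaw is measured clockwise from north, and offsets are small enough for a local flat-earth approximation. The function name and all parameters are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def locate_hot_spot(drone_lat, drone_lon, height_m, yaw_deg,
                    hfov_deg, vfov_deg, px, py, img_w, img_h):
    """Estimate the latitude/longitude of an image pixel (px, py).

    Assumes a nadir-pointing camera over flat ground, with the image top
    aligned to the drone heading and yaw measured clockwise from north.
    """
    # Size of the ground footprint covered by the image.
    ground_w = 2.0 * height_m * math.tan(math.radians(hfov_deg) / 2.0)
    ground_h = 2.0 * height_m * math.tan(math.radians(vfov_deg) / 2.0)

    # Offset of the pixel from the image center, in meters in the drone frame.
    right = (px / img_w - 0.5) * ground_w
    forward = (0.5 - py / img_h) * ground_h

    # Rotate the drone-frame offset into north/east components.
    yaw = math.radians(yaw_deg)
    north = forward * math.cos(yaw) - right * math.sin(yaw)
    east = forward * math.sin(yaw) + right * math.cos(yaw)

    # Convert meters to degrees with a local flat-earth approximation.
    lat = drone_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = drone_lon + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(drone_lat))))
    return lat, lon
```

A pixel at the image center maps back to the drone's own position; a real system would also have to account for gimbal pitch, lens distortion, and GPS error, none of which this sketch models.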

Design of Sensor Middleware Architecture on Multi Level Spatial DBMS with Snapshot (스냅샷을 가지는 다중 레벨 공간 DBMS를 기반으로 하는 센서 미들웨어 구조 설계)

  • Oh, Eun-Seog;Kim, Ho-Seok;Kim, Jae-Hong;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.8 no.1 s.16 / pp.1-16 / 2006
  • Recently, human-based computing environments, which let users concentrate on their own tasks without having to notice other changes, have been actively researched and developed. In such environments, however, the middleware deletes stream data once it has been processed, in order to reduce the load of handling the massive volume of information from RFID sensors. This kind of middleware therefore has problems when a user requests probabilities or statistics needed for data warehousing or data mining, and when a user repeatedly requests important stream data that the middleware has already discarded. In this paper, we design a sensor middleware architecture on a multi-level spatial DBMS with snapshots that manages repeatedly requested stream data, in order to solve the problem of reusing historical stream data in current middleware. The system uses a disk database to manage the historical stream data filtered by the middleware, for services that require historical stream information such as data mining or data warehousing, and uses a memory database to manage highly reusable data as a snapshot when stream data stored in the disk database is reused frequently by users. Furthermore, the system applies its memory database management policy periodically to maintain a high reuse rate and rapid service for users. The proposed system addresses the problems that current middleware has with repeated requests for stream data and with policy decision services that use historical stream data. It also offers varied and rapid data services while maintaining a high reuse rate for the snapshot data in main memory.
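
The two-level idea in this abstract (historical data on disk, frequently reused data promoted to a memory snapshot, and a policy applied in cycles) can be sketched as follows. This is a toy model, not the paper's design: plain dictionaries stand in for both databases, and the class name, the promotion threshold, and the capacity rule are assumptions.

```python
class SnapshotStore:
    """Toy two-level store: a 'disk' dict plus an in-memory snapshot."""

    def __init__(self, disk, promote_after=3, capacity=2):
        self.disk = disk              # stand-in for the disk database
        self.snapshot = {}            # stand-in for the memory database
        self.hits = {}                # reuse counter per key
        self.promote_after = promote_after
        self.capacity = capacity

    def query(self, key):
        """Return (value, source), promoting frequently reused keys."""
        self.hits[key] = self.hits.get(key, 0) + 1
        if key in self.snapshot:
            return self.snapshot[key], "memory"
        value = self.disk[key]
        if self.hits[key] >= self.promote_after:
            # High reuse frequency: keep a snapshot copy in memory.
            self.snapshot[key] = value
        return value, "disk"

    def run_policy(self):
        """Periodic policy: keep only the most reused snapshot entries."""
        keep = sorted(self.snapshot,
                      key=lambda k: self.hits.get(k, 0),
                      reverse=True)[:self.capacity]
        self.snapshot = {k: self.snapshot[k] for k in keep}
        self.hits = {}                # start a new counting cycle


store = SnapshotStore({"tag-1": "reading A", "tag-2": "reading B"})
sources = [store.query("tag-1")[1] for _ in range(4)]
```

Here the first three queries for the same key are served from the disk stand-in and the fourth from the snapshot, since promotion happens on the third request.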

Design and Implementation of Path Computation Element Protocol (PCEP) - FSM and Interfaces (Path Computation Element 프로토콜 (PCEP)의 설계 및 구현 - FSM과 인터페이스)

  • Lee, Wonhyuk;Kang, Seungae;Kim, Hyuncheol
    • Convergence Security Journal / v.13 no.4 / pp.19-25 / 2013
  • The increasing demand for fast, flexible, and guaranteed Quality of Service (QoS) in core networks has led to the deployment of MultiProtocol Label Switching (MPLS) and Generalized MPLS (GMPLS) control planes. In a GMPLS control plane, path computation and cooperation processes are among the crucial elements for maintaining an acceptable level of service. The Internet Engineering Task Force (IETF) has proposed the Path Computation Element (PCE) architecture. The PCE is a dedicated network element devoted to the path computation process, and communication between Path Computation Clients (PCCs) and PCEs is realized through the PCE Protocol (PCEP). This paper examines the PCE-based path computation architecture, including the design and implementation of PCEP. The functional modules, including the Finite State Machine (FSM), and the key design issues related to each state are presented. In particular, we also discuss the internal and external protocol interfaces that efficiently control the communication channels.
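
The abstract does not list the FSM states, so the sketch below borrows the session state names from the PCEP specification (RFC 5440) as an assumption and reduces the machine to its happy path. The event names are hypothetical, and the timers and the LocalOK/RemoteOK flags of the real protocol are deliberately left out.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    TCP_PENDING = "TCPPending"
    OPEN_WAIT = "OpenWait"
    KEEP_WAIT = "KeepWait"
    UP = "UP"


# Simplified happy-path transitions; event names are illustrative only.
TRANSITIONS = {
    (State.IDLE, "tcp_connect"): State.TCP_PENDING,
    (State.TCP_PENDING, "tcp_established"): State.OPEN_WAIT,
    (State.OPEN_WAIT, "open_received"): State.KEEP_WAIT,
    (State.KEEP_WAIT, "keepalive_received"): State.UP,
}


def step(state, event):
    """Advance the session state for one event.

    Errors and closes drop the session back to Idle from any state;
    events that do not apply to the current state are ignored.
    """
    if event in ("error", "close"):
        return State.IDLE
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state


final_state = run(["tcp_connect", "tcp_established",
                   "open_received", "keepalive_received"])
```

Keeping the transitions in a table rather than in nested conditionals makes it easy to review each state in isolation, which matches the per-state design discussion the abstract mentions.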

An Implementation of Network Intrusion Detection Engines on Network Processors (네트워크 프로세서 기반 고성능 네트워크 침입 탐지 엔진에 관한 연구)

  • Cho, Hye-Young;Kim, Dae-Young
    • Journal of KIISE:Information Networking / v.33 no.2 / pp.113-130 / 2006
  • Recently, with the explosive growth of Internet applications, attacks by hackers on networks have been increasing rapidly and becoming more serious. Information security is therefore emerging as a critical factor in designing a network system, and much attention is being paid to the Network Intrusion Detection System (NIDS), which detects hackers' attacks on a network and handles them properly. However, the performance of current intrusion detection systems cannot keep up with the increase in Internet speed, because most NIDSs are implemented in software. In this paper, we propose a new high-performance network intrusion detection system using a network processor, Intel's IXP1200, to achieve fast packet processing and dynamic adaptation to intrusion patterns that are continuously added. Unlike traditional intrusion detection engines, which so far have been implemented in either software or hardware, we design an optimized architecture and algorithms that exploit the features of the network processor. In addition, for more efficient scheduling of the detection engine, we propose task allocation methods for multi-processing processors. Through implementation and performance evaluation, we show the validity of the proposed approach.

Modeling and Simulation of Platform Specific Model in MPSoC Environment (MPSoC용 임베디드 소프트웨어의 PSM 모델링 및 시뮬레이션)

  • Song, In-Gwon;Oh, Gi-Young;Hong, Jang-Eui;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications / v.34 no.8 / pp.697-707 / 2007
  • Since embedded software is highly dependent on the target hardware architecture, the characteristics of the platform must be considered when designing the software. Furthermore, MPSoCs consist of heterogeneous hardware components that are specified at the micro level, so the mapping of embedded software onto MPSoCs should take these characteristics into account. In this paper, we provide an approach for automatically mapping the PIM (Platform Independent Model) of embedded software to a PSM (Platform Specific Model) for an MPSoC (Multi-Processor System on Chip) and verify its effectiveness by simulation. In the proposed approach, tasks are derived from an object-oriented model based on the UML (Unified Modeling Language), and the types of the derived tasks are then identified. Using the identified types and the interrelationships between tasks, the tasks are assigned to appropriate heterogeneous hardware components. We expect the approach to improve the accuracy of the assignment and the concurrency of the deployed software.

Realization of the Pulse Doppler Radar Signal Processor with an Expandable Feature using the Multi-DSP Based Morocco-2 Board (다중 DSP 구조의 Morocco-2 보드를 이용한 확장성을 갖는 펄스 도플러 레이다 신호처리기 구현)

  • 조명제;임중수
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.12 no.7 / pp.1147-1156 / 2001
  • In this paper, a new design architecture for a real-time radar signal processor is proposed. It has been designed and implemented to minimize inter-processor communication overhead and to maintain coherence in the Doppler pulse domain and in the range domain. Its structure can easily be reconfigured and reprogrammed when a function algorithm is added or the operational scenario is modified. Because the task configuration for parallel processing was designed from measurements of the computation time of each function algorithm and the transmission time of the signal processing results, data exchange between processors during the execution of the function algorithms could be eliminated entirely. A Morocco-2 board equipped with ADSP-21060 processors from Analog Devices Inc. and APEX-3.2, developed for SHARC DSPs, were used to construct the radar signal processor.

Real-Time Kernel for Linux based on ARM Processor, RTiKA (Real-Time Implant Kernel For ARMLinux) (ARM 프로세서 기반의 리눅스를 위한 실시간 확장 커널 (RTiKA, Real-Time implant Kernel for ARMLinux))

  • Lee, Seung-Yul;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.17 no.10 / pp.587-597 / 2017
  • Recently, the demand for real-time performance in mobile environments has been increasing as hardware performance improves; however, GPOSs (General-Purpose Operating Systems) such as Android and Linux do not provide real-time performance. We previously developed RTiK (Real-Time implant Kernel) to address this problem, but it has the disadvantage of supporting only the x86 architecture. In this paper, we design and implement RTiKA (Real-Time implanted Kernel for ARM) to support real-time processing in ARM Linux. We use the MCT (Multi-Core Timer), which replaces the Local APIC Timer, for real-time support, and we measure the period of the generated real-time tasks for performance verification and evaluation. As a result, RTiKA can guarantee the operation of several real-time tasks with a period of 1 ms.

Lightweight Attention-Guided Network with Frequency Domain Reconstruction for High Dynamic Range Image Fusion

  • Park, Jae Hyun;Lee, Keuntek;Cho, Nam Ik
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.205-208 / 2022
  • Multi-exposure high dynamic range (HDR) image reconstruction, the task of reconstructing an HDR image from multiple low dynamic range (LDR) images of a dynamic scene, often produces ghosting artifacts caused by camera motion and moving objects, and it also cannot deal with regions washed out by over- or under-exposure. While there have been many deep-learning-based methods that use motion estimation to alleviate these problems, they still have limitations for scenes with severe motion. They also require large parameter counts, especially in the case of state-of-the-art methods that employ attention modules. To address these issues, we propose a frequency domain approach based on the idea that transform domain coefficients inherently involve global information from all image pixels, which helps to cope with large motions. Specifically, we adopt Residual Fast Fourier Transform (RFFT) blocks, which allow for global interactions between pixels. Moreover, we employ Depthwise Overparametrized convolution (DO-conv) blocks, a convolution in which each input channel is convolved with its own 2D kernel, for faster convergence and performance gains. We call this network LFFNet (Lightweight Frequency Fusion Network), and experiments on the benchmarks show reduced ghosting artifacts and an improvement of up to 0.6 dB in tonemapped PSNR compared with recent state-of-the-art methods. Our architecture also requires fewer parameters and converges faster in training.
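
The phrase "each input channel is convolved with its own 2D kernel" describes a depthwise convolution, which can be sketched in plain Python as below. This shows only that depthwise step, not the full DO-conv composition or LFFNet itself; the function name, the list-of-lists image layout, and the "valid" border handling are assumptions for illustration.

```python
def depthwise_conv2d(image, kernels):
    """Depthwise 2D convolution (implemented as cross-correlation).

    image   : list of C channels, each an H x W list of lists
    kernels : list of C square k x k kernels, one per channel
    Returns C output channels of size (H - k + 1) x (W - k + 1).
    """
    output = []
    for channel, kernel in zip(image, kernels):
        k = len(kernel)
        height, width = len(channel), len(channel[0])
        rows = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                # Each channel only ever sees its own kernel.
                row.append(sum(channel[i + a][j + b] * kernel[a][b]
                               for a in range(k) for b in range(k)))
            rows.append(row)
        output.append(rows)
    return output


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
pick_center = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
sum_all = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
example_out = depthwise_conv2d([grid, grid], [pick_center, sum_all])
```

Because no kernel mixes information across channels, the parameter count grows with the number of channels rather than with its square, which is one reason depthwise layers suit lightweight networks.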
