• Title/Summary/Keyword: Embedded Hardware

Design and Implementation of a Hardware Accelerator for Marine Object Detection based on a Binary Segmentation Algorithm for Ship Safety Navigation (선박안전 운항을 위한 이진 분할 알고리즘 기반 해상 객체 검출 하드웨어 가속기 설계 및 구현)

  • Lee, Hyo-Chan;Song, Hyun-hak;Lee, Sung-ju;Jeon, Ho-seok;Kim, Hyo-Sung;Im, Tae-ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.10 / pp.1331-1340 / 2020
  • Maritime object detection means that the captain uses a computer to detect floating objects that pose a collision risk to the ship, automatically and as accurately as the human eye. Conventional ships determine the presence and distance of objects through radar, but radar cannot identify their shape and type. In contrast, with the development of AI, cameras can accurately identify obstacles on the sea route, with excellent performance in detecting and recognizing objects. Analyzing digital images requires the computer to process a large number of pixels. However, because the CPU is specialized for sequential processing, processing is very slow, and smooth service support or security is not guaranteed. Accordingly, this study developed maritime object detection software and implemented it on an FPGA to accelerate the large-scale computations. Additionally, the system implementation was improved through the interface between the embedded board and the FPGA, achieving performance 30 times faster than the existing algorithm and making the entire system three times faster.

A study on the design of an efficient hardware and software mixed-mode image processing system for detecting patient movement (환자움직임 감지를 위한 효율적인 하드웨어 및 소프트웨어 혼성 모드 영상처리시스템설계에 관한 연구)

  • Seungmin Jung;Euisung Jung;Myeonghwan Kim
    • Journal of Internet Computing and Services / v.25 no.1 / pp.29-37 / 2024
  • In this paper, we propose an efficient image processing system to detect and track the movement of specific objects such as patients. The proposed system extracts the outline area of an object from a binarized difference image by applying a thinning algorithm that enables more precise detection than previous algorithms and is advantageous for mixed-mode design. The binarization and thinning steps, which require a lot of computation, are designed at RTL (Register Transfer Level) and replaced with optimized hardware blocks through logic circuit synthesis. The designed binarization and thinning block was synthesized into a logic circuit using a standard 180 nm CMOS library, and its operation was verified through simulation. For comparison with software-based performance, the binarization and thinning operations were also analyzed on sample images of 640 × 360 resolution in a 32-bit FPGA embedded system environment. Verification confirmed that the mixed-mode design can improve processing speed by 93.8% in the binarization and thinning stages compared to software-only processing. The proposed mixed-mode system for object recognition is expected to monitor patient movements efficiently even in an edge computing environment where artificial intelligence networks are not applied.
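As a rough illustration of the "binarized difference image" step described in this abstract, the sketch below differences two grayscale frames and thresholds the result. The function names, the threshold of 30, and the bounding-box helper are assumptions made for illustration and are not taken from the paper; the thinning step is not shown, since the abstract does not say which thinning algorithm is used.

```python
def binarized_difference(prev_frame, curr_frame, threshold=30):
    """Return a 0/1 mask: 1 where the grayscale change exceeds threshold.

    The threshold value is an assumption chosen for this example.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the set pixels, or None if none are set."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Two tiny 3x4 grayscale frames; only the middle of the second row changes.
prev_frame = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
curr_frame = [[10, 10, 10, 10], [10, 200, 180, 10], [10, 10, 12, 10]]

mask = binarized_difference(prev_frame, curr_frame)
print(mask)                # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(bounding_box(mask))  # (1, 1, 1, 2)
```

Each output pixel depends only on the two input pixels at the same position, which is one reason this kind of step maps naturally onto a dedicated hardware block.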

Serialized Multitasking Code Generation from Dataflow Specification (데이타 플로우 명세로부터 직렬화된 멀티태스킹 코드 생성)

  • Kwon, Seong-Nam;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory / v.35 no.9_10 / pp.429-440 / 2008
  • As embedded systems become more complex, software development becomes more important in the entire design process. Most embedded applications consist of multiple tasks that are executed in parallel, so a dataflow model, which expresses concurrency naturally, is preferred over a sequential programming language for developing multitask software. To execute multitasking code, an operating system (OS) is essential for scheduling the tasks and handling the communication between them. However, multitasking code must run without an OS when the target hardware platform cannot run one, or when the target platforms are candidates in design space exploration (DSE), because porting an OS to every candidate platform is very costly. For this reason, we propose a technique for generating serialized multitasking code from a dataflow specification. In the proposed technique, a task is specified with a dataflow model and generated as C code. Code generation consists of two steps: first, each block in a task is generated as a separate function; second, the generated functions are scheduled by a multitasking scheduler that is also generated automatically. To make it easy to write a customized scheduler manually, the data structure and information of each task are defined. A preliminary experiment with a DivX player confirms that the code generated by the proposed framework executes efficiently and correctly on the target system.
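The two-step idea in this abstract (each block becomes a function, and a generated scheduler calls those functions in a single thread without an OS) can be sketched as follows. This is only an illustration in Python rather than generated C; the task layout, the round-robin policy, and all names are assumptions, since the abstract does not describe the generated scheduler in detail.

```python
from collections import deque


def block(label):
    """Build a block function; here a block just records that it ran."""
    def run(trace):
        trace.append(label)
    return run


def make_task(name, blocks):
    """A task is an ordered list of block functions plus a cursor to the next one."""
    return {"name": name, "blocks": blocks, "next": 0}


def run_serialized(tasks):
    """Run every task once in a single thread, interleaving them block by block.

    Round-robin is an assumed policy chosen for this sketch.
    """
    trace = []
    for task in tasks:
        task["next"] = 0
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        task["blocks"][task["next"]](trace)   # run exactly one block
        task["next"] += 1
        if task["next"] < len(task["blocks"]):
            ready.append(task)                # task still has blocks left
    return trace


video = make_task("video", [block("v.read"), block("v.decode"), block("v.show")])
audio = make_task("audio", [block("a.read"), block("a.play")])

print(run_serialized([video, audio]))
# ['v.read', 'a.read', 'v.decode', 'a.play', 'v.show']
```

Because each block is an ordinary function call, switching between tasks needs no context switch or OS service, which is the property the abstract relies on for platforms that cannot run an OS.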

Direction-Embedded Branch Prediction based on the Analysis of Neural Network (신경망의 분석을 통한 방향 정보를 내포하는 분기 예측 기법)

  • Kwak Jong Wook;Kim Ju-Hwan;Jhon Chu Shik
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.1 / pp.9-26 / 2005
  • In the pursuit of ever higher levels of performance, recent computer systems have made use of deep pipelines, dynamic scheduling, and multi-issue superscalar processor technologies. In this situation, branch prediction schemes are an essential part of modern microarchitectures, because the penalty for a branch misprediction increases as pipelines deepen and the number of instructions issued per cycle increases. In this paper, we propose a novel branch prediction scheme, direction-gshare (d-gshare), to improve prediction accuracy. First, we model a neural network with the components that may affect branch prediction accuracy and analyze the variation of their weights based on the neural network information. Then, we add the component that has a high weight value to the original gshare scheme. We simulate our branch prediction scheme using SimpleScalar, a powerful event-driven simulator, and analyze the simulation results. Our results show that, compared to the bimodal, two-level adaptive, and gshare predictors, the direction-gshare predictor (d-gshare.3) performs better without additional hardware cost, by up to 4.1% and by 1.5% on average for the default amount of embedded direction, and by up to 11.8% and by 3.7% on average for the optimal one.
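For readers unfamiliar with the baseline that d-gshare extends, the following is a minimal sketch of a plain gshare predictor: the branch address is XORed with a global history register to index a table of 2-bit saturating counters. It does not implement the paper's direction component, and the class name, table size, and initial counter value are assumptions made for this example.

```python
class GsharePredictor:
    """Baseline gshare sketch: (pc XOR global history) indexes 2-bit counters."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global branch history register
        self.counters = [1] * (1 << index_bits)   # 1 = weakly not-taken (assumed start)

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        """Predict taken when the counter is in one of its two upper states."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the indexed counter, then shift the outcome into the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = GsharePredictor(index_bits=4)
for _ in range(8):                 # train on a branch that is always taken
    predictor.update(0x40, True)
print(predictor.predict(0x40))     # True once the history register has filled with 1s
```

Real predictors usually drop the low alignment bits of the branch address before indexing; that detail is omitted here to keep the sketch short.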

Design of Reconfigurable Coprocessor for Multimedia Mobile Terminal (멀티미디어 무선 단말기를 위한 재구성 가능한 코프로세서의 설계)

  • Kim, Nam-Sub;Lee, Sang-Hun;Kum, Min-Ha;Kim, Jin-Sang;Cho, Won-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.4 / pp.63-72 / 2007
  • In this paper, we propose a novel reconfigurable coprocessor for multimedia mobile terminals. Because most multimedia operations require fast processing of large amounts of data at a limited clock frequency, it is necessary to enhance the performance of the embedded processors widely used in current multimedia mobile terminals. Therefore, we propose and design a coprocessor capable of fast operations on multimedia data. The proposed coprocessor is not only reconfigurable but also flexible and expandable. It was designed in VHDL and compared with previous reconfigurable coprocessors and a commercial embedded processor in terms of architecture and speed. The architectural comparison shows that the proposed coprocessor has a better structure in terms of hardware size and flexibility. In addition, simulation results for a DCT application show that the proposed coprocessor is 26 times faster than a commercial ARM processor and 11 times faster than the ARM processor with a fast DCT core.

Scenario-Based Implementation Synthesis for Real-Time Object-Oriented Models (실시간 객체 지향 모델을 위한 시나리오 기반 구현 합성)

  • Kim, Sae-Hwa;Park, Ji-Yong;Hong, Seong-Soo
    • The KIPS Transactions:PartD / v.12D no.7 s.103 / pp.1049-1064 / 2005
  • The demands of increasingly complicated software have led to the proliferation of object-oriented design methodologies in embedded systems. To execute a system designed with objects on target hardware, a task set should be derived from the objects, representing how many tasks reside in the system and which task processes which event arriving at an object. The derived task set greatly influences the responsiveness of the system. Nevertheless, it is very difficult to derive an optimal task set due to the discrepancy between objects and tasks. Therefore, the common method currently used by developers is to repeatedly try various task sets. This paper proposes the Scenario-based Implementation Synthesis Architecture (SISA) to solve this problem. SISA encompasses a method for deriving a task set from a system designed with objects, as well as its supporting development tools and run-time system architecture. A system designed with SISA not only consists of the smallest possible number of tasks, but also guarantees that the response time for each event in the system is minimized. We have fully implemented SISA by extending the ResoRT development tool and applied it to an existing industrial PBX system. The experimental results show that maximum response times were reduced by 30.3% on average compared to when the task set was derived by the best known existing methods.

Implementation of a TCP/IP Offload Engine Using High Performance Lightweight TCP/IP (고성능 경량 TCP/IP를 이용한 소프트웨어 기반 TCP/IP 오프로드 엔진 구현)

  • Jun, Yong-Tae;Chung, Sang-Hwa;Yoon, In-Su
    • Journal of KIISE:Computing Practices and Letters / v.14 no.4 / pp.369-377 / 2008
  • Today, Ethernet technology is rapidly developing, moving beyond 1 Gbps to a bandwidth of 10 Gbps. In such high-speed networks, the existing method, in which the host CPU processes TCP/IP in the operating system, causes considerable overhead. As a result, user applications cannot get enough computing power from the host CPU. To solve this problem, TCP/IP Offload Engine (TOE) technology emerged. A TOE is a specialized NIC that processes TCP/IP instead of the host CPU. In this paper, we implemented a high-performance, lightweight TCP/IP (HL-TCP) for the TOE and applied it to an embedded system. The HL-TCP supports the fundamental TCP/IP functions: flow control, congestion control, retransmission, delayed ACK, and processing of out-of-order packets. It was implemented to utilize the Ethernet MAC's hardware features, such as TCP segmentation offload (TSO), checksum offload (CSO), and interrupt coalescing. We also eliminated the copy overhead from host memory to NIC memory when sending data, and we implemented an efficient DMA mechanism for TCP retransmission. The TOE using the HL-TCP has a CPU utilization of less than 6% and a bandwidth of 453 Mbps.
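To make the offloaded work concrete, the sketch below shows in software the two per-packet operations that the hardware features named in this abstract take over: splitting a send buffer into segments (the job of TSO) and computing the 16-bit one's-complement Internet checksum (the job of CSO). It is an illustration only and not the HL-TCP code; the function names and the example segment size are assumptions.

```python
def segment(payload, mss):
    """Split a send buffer into chunks of at most mss bytes, as TSO does in hardware."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def internet_checksum(data):
    """16-bit one's-complement checksum used by IP, TCP, and UDP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


print(segment(b"abcdefg", 3))                # [b'abc', b'def', b'g']

sample = bytes.fromhex("0001f203f4f5f6f7")   # example bytes from RFC 1071
print(hex(internet_checksum(sample)))        # 0x220d
```

A receiver can verify a packet by running the same sum over the data with the checksum included; a result of zero means no error was detected. Doing both operations on the NIC removes a per-byte loop from the host CPU.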

Instruction-corruption-less Binary Modification Mechanism for Static Stack Protections (이진 조작을 통한 정적 스택 보호 시 발생하는 명령어 밀림현상 방지 기법)

  • Lee, Young-Rim;Kim, Young-Pil;Yoo, Hyuck
    • Journal of KIISE:Computing Practices and Letters / v.14 no.1 / pp.71-75 / 2008
  • Many sensor operating systems have limited memory; therefore, the stack memory areas of all threads reside in a single memory space. Because most target platforms do not have a hardware MMU (Memory Management Unit), it is difficult to protect each stack area. One way to solve this problem is to replace the original stack-handling instructions in the binary code with calls to wrapper routines that protect the stack area. During this replacement, an instruction corruption problem occurs because stack-handling instructions and branch instructions differ in length. In this paper, we propose an algorithm that calls a target routine without causing instruction corruption. The algorithm reaches the target routine by chaining short-range branch instructions. Our solution makes it easy to apply security patches and to maintain and upgrade the software of sensor nodes.

Effective SoC Architecture of a VDP for full HD TVs (Full HD TV를 위한 효율적인 VDP SoC 구조)

  • Kim, Ji-Hoon;Kim, Young-Chul
    • Smart Media Journal / v.1 no.1 / pp.1-9 / 2012
  • This paper proposes an effective SoC hardware architecture implementing a VDP for Full HD TVs. The proposed architecture makes real-time video processing possible by supporting an efficient bus architecture and a flexible interface. The video IP cores in the VDP are designed to provide high-quality image enhancement. The Avalon interface is adopted to guarantee real-time capability for the IPs as well as SoC integration. This reduces design time and makes the designer's work more convenient, because IPs can easily be added, deleted, and revised for IP verification and SoC integration. The embedded software makes it possible to implement a flexible real-time system by controlling parameter settings and data transmission schemes in real time. The proposed VDP SoC design is implemented on a Cyclone III SoPC platform. The experimental results show that the proposed VDP SoC architecture successfully provides the required video image quality by converting SD-level input to a Full HD-level image.

Code Visualization Approach for Low level Power Improvement via Identifying Performance Dissipation (성능 저하 식별을 통한 저전력 개선용 코드 가시화 방법)

  • An, Hyun Sik;Park, Bokyung;Kim, R.Young Chul;Kim, Ki Du
    • KIPS Transactions on Computer and Communication Systems / v.9 no.10 / pp.213-220 / 2020
  • Power consumption and performance are important issues for hardware-based mobile and IoT embedded systems that require high specifications. Excessive power consumption is a particular problem because it increases heat generation and shortens the life of the device. In addition, in the same environment, software must operate stably with limited power and memory, which further increases the power consumption of the device. To solve these issues, we propose a low-level power improvement method based on identifying performance dissipation. The proposed method identifies complex modules (especially in terms of cyclomatic complexity, coupling, and cohesion) through code visualization, and helps to simplify low power code patterning and performance code. Through this method, it is therefore possible to optimize code quality by reducing power consumption and improving performance.