• Title/Summary/Keyword: Parallel Simulator


Comparison of Newton's and Euler's Algorithm in a Compound Pendulum (복합진자 모형의 뉴튼.오일러 알고리즘 비교)

  • Hah, Chong-Ku
    • Korean Journal of Applied Biomechanics
    • /
    • v.16 no.3
    • /
    • pp.1-7
    • /
    • 2006
  • The primary type of swinging motion in human movement is that of a pendulum. Pendulums are of two kinds: simple and compound. A simple pendulum consists of a small body suspended by a relatively long cord; its total mass is contained within the bob, and the cord is considered massless. A compound pendulum, on the other hand, is any pendulum such as a human body swinging by the hands from a horizontal bar. A compound pendulum therefore exhibits important motions that are harmonic, periodic, and oscillatory. This paper discusses and compares two algorithms for the compound pendulum: Newton's method (F = ma) and Euler's method (M = Iα). Using an exercise model of a human body with mass m = 50 kg, body length L = 1.5 m, and center of gravity at L_c = 0.4119L from the proximal end, swinging by the hands from a horizontal bar, the kinematic variables (angular displacement, velocity, and acceleration) are found and then simulated for varying body lengths and masses. Body segment parameters (BSP) from Clauser et al. (1969) and Chandler et al. (1975) are used to find the moment of inertia of the compound pendulum. The radius of gyration about the center of gravity (CoG) is k_c = K_c × L (where k is the radius of gyration and K is the radius of gyration per segment length), so the moment of inertia about the CoG is I_c = m k_c². Finally, the moment of inertia about the Z-axis through the pivot, by the parallel-axis theorem, is I_o = I_c + m L_c². The second-order ordinary differential equations of the models are solved numerically with the ND function in Mathematica 5.1. The results are as follows. First, Newton's method is much more complex than Euler's method. Second, kinematic variables can be found for changing body lengths (L = 1.3 / 1.7 m), and the period increases with body length (L = 1.3 / 1.5 / 1.7 m). Third, the period does not change with changing mass (m = 50 / 55 / 60 kg). In conclusion, this study considers the possibility of applying the compound pendulum to sports that require swinging motions (bowling, golf, gymnastics, and so on). A further improvement would be to apply Euler's method to real motions and so develop a simulator.
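
The abstract's second and third findings follow directly from the small-angle period formula T = 2π√(I_o/(m g L_c)). A minimal Python sketch (not the paper's Mathematica code; the radius-of-gyration ratio K_rog = 0.5 is an illustrative stand-in for the BSP value, while K_cog = 0.4119 is from the abstract):

```python
import math

def pendulum_period(m, L, K_rog=0.5, K_cog=0.4119, g=9.81):
    """Small-angle period of a compound pendulum pivoted at the proximal end.

    K_rog (radius of gyration / segment length) is an illustrative value,
    not the BSP figure used in the paper; K_cog = 0.4119 is from the paper.
    """
    L_c = K_cog * L            # pivot-to-CoG distance
    k_c = K_rog * L            # radius of gyration about the CoG
    I_c = m * k_c**2           # moment of inertia about the CoG
    I_o = I_c + m * L_c**2     # parallel-axis theorem: inertia about the pivot
    return 2.0 * math.pi * math.sqrt(I_o / (m * g * L_c))
```

Because I_o is proportional to m, the mass cancels in the square root: the period grows with body length but is unchanged by added mass, matching the paper's findings.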

Optimized Hardware Design of Deblocking Filter for H.264/AVC (H.264/AVC를 위한 디블록킹 필터의 최적화된 하드웨어 설계)

  • Jung, Youn-Jin;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.1
    • /
    • pp.20-27
    • /
    • 2010
  • This paper describes the design of a 5-stage pipelined de-blocking filter with a power-reduction scheme and proposes an efficient memory architecture and filtering order for a high-performance H.264/AVC decoder. The de-blocking filter removes block-boundary artifacts and enhances image quality. Nevertheless, it has the disadvantage of requiring many memory accesses and repeated operations, because each edge must be filtered four times. This paper therefore proposes an optimized filtering order and an efficient hardware architecture that reduce memory accesses and total filter cycles. In the proposed filter, parallel processing is possible thanks to a 5-stage pipeline consisting of memory read, threshold decision, pre-calculation, filter operation, and write-back. Power consumption is also reduced by a clock-gating scheme that disables unnecessary clock switching. In addition, the total number of filtering cycles is decreased by the new filtering order. The proposed filter is designed in Verilog HDL and functionally verified within the whole H.264/AVC decoder using the ModelSim 6.2g simulator. The input vectors are QCIF images generated by the JM 9.4 reference encoder software. Experimental results show that the filter reduces total filter cycles by about 20% and requires only a small transposition buffer.
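
The cycle saving from pipelining edge filtering can be illustrated with a toy count in Python (a sketch assuming one cycle per stage; this is not the paper's RTL):

```python
def filter_cycles(num_edges, stages=5):
    """Cycle counts for pushing `num_edges` edges through the 5 stages
    (memory read, threshold decision, pre-calculation, filter, write-back),
    assuming each stage takes one cycle."""
    sequential = num_edges * stages        # no overlap: each edge occupies all stages in turn
    pipelined = num_edges + (stages - 1)   # fill the pipe once, then one edge completes per cycle
    return sequential, pipelined
```

For 100 edges this gives 500 cycles sequentially versus 104 pipelined, which is the kind of throughput gain a 5-stage pipeline buys once it is full.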

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.180-192
    • /
    • 2002
  • We investigate the effect of data skew in the join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods for the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically in round-robin fashion; the second allocates buckets dynamically via a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results of a join, before the whole join is completed, are sent on to the next join without being staged on disk. The shared-nothing multiprocessor architecture is known to be more scalable for supporting very large databases; however, it is very sensitive to data skew. Unless the pipelined execution of multiple hash joins includes a dynamic load-balancing mechanism, the skew effect can severely deteriorate system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulations over a wide range of parameters, join selectivities, and relation sizes, system performance deteriorates as the degree of data skew grows. However, the proposed method, using a large number of buckets and a tuning technique, offers substantial robustness over a wide range of skew conditions.
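
The two bucket-allocation policies can be sketched in a few lines of Python (function names are hypothetical, and the paper's actual partition-tuning algorithm is more involved):

```python
import heapq

def round_robin(bucket_sizes, num_nodes):
    """Static allocation: bucket i goes to node i mod num_nodes."""
    loads = [0] * num_nodes
    for i, size in enumerate(bucket_sizes):
        loads[i % num_nodes] += size
    return loads

def frequency_tuned(bucket_sizes, num_nodes):
    """Dynamic allocation guided by the frequency distribution:
    place the largest buckets first, each on the least-loaded node."""
    heap = [(0, n) for n in range(num_nodes)]   # (current load, node id)
    heapq.heapify(heap)
    loads = [0] * num_nodes
    for size in sorted(bucket_sizes, reverse=True):
        load, n = heapq.heappop(heap)
        loads[n] = load + size
        heapq.heappush(heap, (loads[n], n))
    return loads
```

Under skew (one hot bucket dominating), the frequency-aware policy keeps the maximum node load, and hence the pipeline's slowest stage, lower than blind round-robin.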

Optical Design of an Integrated Two-Channel Optical Transmitter for an HDMI interface (광 HDMI 인터페이스용 2채널 광송신기 광학 설계)

  • Yoon, Hyun-Jae;Kang, Hyun-Seo
    • Korean Journal of Optics and Photonics
    • /
    • v.26 no.5
    • /
    • pp.269-274
    • /
    • 2015
  • In this paper we design the optical system of an integrated two-channel TO-type optical transmitter for the HDMI interface, using the Code V simulator. The proposed integrated two-channel optical transmitter has two VCSELs attached in parallel on an 8-pin TO-CAN package, on top of which sits a lens filter block (1 mm × 2 mm × 4 mm) composed of hemispherical lenses and WDM filters. Considering two-channel transmitters manufactured with wavelength combinations of 1060 nm/1270 nm and 1330 nm/1550 nm, we obtain an optimum hemispherical-lens diameter of 0.6 mm for both combinations, and distances L between the lens filter block and the ball lens of 1.7 mm and 2.0 mm for the 1060 nm/1270 nm and 1330 nm/1550 nm combinations, respectively. The focal lengths f0 of the lens filter block at wavelengths of 1060, 1270, 1330, and 1550 nm are then 0.351, 0.354, 0.355, and 0.359 mm, respectively, and the focal lengths F of light passing through the lens filter block and ball lens are 0.62 mm for the 1060 nm/1270 nm combination and 0.60-0.66 mm for the 1330 nm/1550 nm combination.
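
As a rough sanity check on the quoted focal lengths, one can treat the hemispherical lens as a plano-convex thin lens (f ≈ R/(n−1), with R = 0.3 mm from the 0.6 mm optimum diameter) and back out the implied refractive index. This is a back-of-the-envelope sketch under a thin-lens assumption, not the Code V model:

```python
def implied_index(radius_mm, focal_mm):
    """Thin-lens, plano-convex approximation: f = R/(n-1)  =>  n = 1 + R/f."""
    return 1.0 + radius_mm / focal_mm

R = 0.3                             # lens radius: half the 0.6 mm optimum diameter
n_1060 = implied_index(R, 0.351)    # f0 quoted for 1060 nm
n_1550 = implied_index(R, 0.359)    # f0 quoted for 1550 nm
```

The implied index (about 1.85) corresponds to a high-index lens material, and it decreases toward longer wavelengths, as normal dispersion requires.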

Analysis of Distributed Computational Loads in Large-scale AC/DC Power System using Real-Time EMT Simulation (대규모 AC/DC 전력 시스템 실시간 EMT 시뮬레이션의 부하 분산 연구)

  • In Kwon, Park;Yi, Zhong Hu;Yi, Zhang;Hyun Keun, Ku;Yong Han, Kwon
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.8 no.2
    • /
    • pp.159-179
    • /
    • 2022
  • As a network becomes complex, multiple entities often share responsibility for managing parts of it. A utility grid is an example: while the entire grid is the responsibility of a single utility company, the network is often split into multiple subsections, each assigned to a corresponding sub-organization within the company. The question of how to form subsystems of adequate size with a minimum number of interconnections becomes especially critical in real-time simulation. Any single computation unit, whether a high-speed conventional CPU core or an FPGA computational engine, has a hard limit on how much computation it can complete within a given execution time. The issue is worse in real-time simulation, where the computation must stay precisely synchronized with the real-world clock. When the subject of the computation allows a longer execution time, i.e., a larger time-step size, a larger portion of the network can be placed on one computation unit. This translates into a larger margin between the worst and the best case: even if the worst (largest) computational burden is orders of magnitude larger than the best (smallest), all the necessary computation can still be completed in the given amount of time. The real-time requirement, however, makes this margin much smaller; the difference between the worst and best cases must be kept as small as possible to ensure an even distribution of the computational load. In addition, the data exchange essential to parallel computation affects overall performance: communication takes time, so it must be considered together with the distribution of computational load among the calculation units. A satisfactory distribution raises the likelihood of completing the necessary computation within the given time, which may come down to the order of microseconds. This paper presents an effective way to split a given electrical network, according to multiple criteria, so as to distribute the entire computational load into a set of even (or close to even) computational loads. With the proposed splitting method, the heavy computational burden of a large-scale electrical network can be distributed over multiple calculation units, such as an RTDS real-time simulator, achieving more efficient use of the calculation units, a reduction of the required simulation time step, or both.
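
The trade-off described above, between even compute loads and the cost of interconnections, can be captured in a toy step-time model (a sketch with assumed numbers, not the paper's splitting method):

```python
def step_time_us(unit_loads_us, cut_links, link_cost_us=1.0):
    """Cost of one real-time simulation step: the slowest unit's compute
    time plus a fixed communication cost per interconnection crossing
    the split (all figures in microseconds; values are illustrative)."""
    return max(unit_loads_us) + link_cost_us * cut_links

# A perfectly even split that cuts many links...
even_many_cuts = step_time_us([50.0, 50.0], cut_links=10)
# ...versus a slightly uneven split that cuts far fewer links.
uneven_few_cuts = step_time_us([55.0, 45.0], cut_links=2)
```

Minimizing load imbalance alone can lose to a split that also minimizes interconnections, which is why the paper partitions the network against multiple criteria at once.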

Acceleration of computation speed for elastic wave simulation using a Graphic Processing Unit (그래픽 프로세서를 이용한 탄성파 수치모사의 계산속도 향상)

  • Nakata, Norimitsu;Tsuji, Takeshi;Matsuoka, Toshifumi
    • Geophysics and Geophysical Exploration
    • /
    • v.14 no.1
    • /
    • pp.98-104
    • /
    • 2011
  • Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields, including shear components, and is therefore suitable for exploring the responses of elastic bodies. To overcome the long duration of the calculations, we use a Graphic Processing Unit (GPU) to accelerate the elastic wave simulation. Because a GPU has many processors and a wide memory bandwidth, we can use it in a parallelised computing architecture. The GPU board used in this study is an NVIDIA Tesla C1060, which has 240 processors and a 102 GB/s memory bandwidth. Despite the availability of a parallel computing architecture (CUDA) developed by NVIDIA, we must optimise the usage of the different types of memory on the GPU device, and the sequence of calculations, to obtain a significant speedup. In this study, we simulate two-dimensional (2D) and three-dimensional (3D) elastic wave propagation using the Finite-Difference Time-Domain (FDTD) method on GPUs. In the wave propagation simulation, we adopt the staggered-grid method, one of the conventional FD schemes, since it achieves sufficient accuracy for numerical modelling in geophysics. Our simulator optimises memory usage on the GPU device to reduce data access times and uses the faster memory types as much as possible; this is a key factor in GPU computing. By using one GPU device and optimising its memory usage, we improved the computation time by more than 14 times in the 2D simulation, and over six times in the 3D simulation, compared with one CPU. Furthermore, by using three GPUs, we succeeded in accelerating the 3D simulation 10 times.
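
A minimal 2D velocity-stress staggered-grid FDTD update, of the kind the paper accelerates on GPUs, can be sketched in NumPy. Grid sizes, material constants, the simple source, and the simplified index offsets are illustrative assumptions, not the authors' code:

```python
import numpy as np

def elastic_fdtd_2d(nx=60, nz=60, nt=80, dx=5.0, dt=5e-4,
                    vp=2000.0, vs=1200.0, rho=2200.0):
    """2nd-order velocity-stress FDTD on a staggered grid (interior points
    only, zero boundaries; staggering offsets simplified for brevity)."""
    lam = rho * (vp**2 - 2.0 * vs**2)   # Lame parameters from vp, vs, rho
    mu = rho * vs**2
    vx = np.zeros((nz, nx)); vz = np.zeros((nz, nx))
    txx = np.zeros((nz, nx)); tzz = np.zeros((nz, nx)); txz = np.zeros((nz, nx))
    for it in range(nt):
        if it < 5:                       # short explosive source at the grid centre
            txx[nz // 2, nx // 2] += 1.0
            tzz[nz // 2, nx // 2] += 1.0
        # update particle velocities from stress gradients
        vx[1:-1, 1:-1] += (dt / rho) * (
            (txx[1:-1, 2:] - txx[1:-1, 1:-1]) +
            (txz[1:-1, 1:-1] - txz[:-2, 1:-1])) / dx
        vz[1:-1, 1:-1] += (dt / rho) * (
            (txz[1:-1, 1:-1] - txz[1:-1, :-2]) +
            (tzz[2:, 1:-1] - tzz[1:-1, 1:-1])) / dx
        # update stresses from velocity gradients (Hooke's law)
        dvx_dx = (vx[1:-1, 1:-1] - vx[1:-1, :-2]) / dx
        dvz_dz = (vz[1:-1, 1:-1] - vz[:-2, 1:-1]) / dx
        txx[1:-1, 1:-1] += dt * ((lam + 2 * mu) * dvx_dx + lam * dvz_dz)
        tzz[1:-1, 1:-1] += dt * (lam * dvx_dx + (lam + 2 * mu) * dvz_dz)
        txz[1:-1, 1:-1] += dt * mu * (
            (vx[2:, 1:-1] - vx[1:-1, 1:-1]) +
            (vz[1:-1, 2:] - vz[1:-1, 1:-1])) / dx
    return vx
```

On a GPU each of these array-slice updates maps onto one thread per grid point, and keeping the five field arrays in the fastest available memory is precisely the optimisation the paper emphasises.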

Evaluation of a Water-based Bolus Device for Radiotherapy to the Extremities in Kaposi's Sarcoma Patients (사지에 발병한 카포시육종의 방사선치료를 위한 물볼루스 기구의 유용성 고찰)

  • Ahn, Seung-Kwon;Kim, Yong-Bae;Lee, Ik-Jae;Song, Tae-Soo;Son, Dong-Min;Jang, Yung-Jae;Cho, Jung-Hee;Kim, Joo-Ho;Kim, Dong-Wook;Cho, Jae-Ho;Suh, Chang-Ok
    • Radiation Oncology Journal
    • /
    • v.26 no.3
    • /
    • pp.189-194
    • /
    • 2008
  • Purpose: We designed a water-based bolus device for radiation therapy of Kaposi's sarcoma. This study evaluated the usefulness of the new device and compared it with the currently used rice-based bolus. Materials and Methods: We fashioned a polystyrene box and cut a hole so that the patient's extremity could be inserted while the patient lay supine. We used vacuum-sealed polymer vinyl to reduce water leakage, then eliminated air with a vacuum pump and vacuum valve to reduce the air gap between the water and the extremity in the vacuum-vinyl box. We performed CT scans to evaluate the density differences among the fabricated water-based bolus device, a rice-based bolus with the rice placed in directly, and a rice-based bolus with polymer-vinyl-packed rice. We analyzed the density change with the air-gap volume using a planning system. In addition, we measured the homogeneity and dose in a lower-extremity phantom, with six attached TLDs and wrapped film, exposed in parallel-opposed fields with the LINAC under the same set-up conditions as the CT simulator. Results: The density value of the rice-based bolus with the rice placed in directly was 14% lower than that of the water-based bolus, and that of the rice-based bolus with polymer-vinyl-packed rice was 18% lower. Analysis of the EDR2 film revealed that the water-based bolus gives a more homogeneous dose distribution, superior by 4~4.4% to the rice-based bolus. The mean TLD readings of the rice-based bolus with the rice placed directly into the polystyrene box were 3.4% higher, and those of the rice-based bolus with polymer-vinyl-packed rice 4.3% higher, than those of the water-based bolus. Conclusion: Our custom-made water-based bolus device increases set-up accuracy by making the treatment field easy to confirm. It also improves the accuracy of the therapy by reducing the air gap with the vacuum pump and vacuum valve. This device is a promising alternative for delivering a homogeneous dose to the target volume.

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.975-976
    • /
    • 1993
  • This talk presents an overview of the author's research and development activities on fuzzy inference hardware, which involve two distinct approaches. The first approach uses application-specific integrated circuit (ASIC) technology: the fuzzy inference method is implemented directly in silicon. The second approach, which is in its preliminary stage, uses a more conventional microprocessor architecture. Here, we apply a quantitative technique used by designers of reduced instruction set computers (RISC) to modify the architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly in silicon. The mechanism is based on the max-min compositional rule of inference and Mamdani's method of fuzzy implication. Two VLSI fuzzy inference chips were designed, fabricated, and fully tested, both in full-custom CMOS technology. The second and more elaborate chip was designed at the University of North Carolina (UNC) in cooperation with MCNC. Both VLSI chips had multiple datapaths for rule evaluation and executed multiple fuzzy if-then rules in parallel. The AT&T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock and achieved approximately 80,000 Fuzzy Logical Inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof-of-concept prototype, it had a minimal amount of peripheral logic for system integration. The UNC/MCNC chip consists of 688,131 transistors, of which 476,160 are used for RAM. It ran with a 10 MHz clock, has a 3-stage pipeline, and initiates the computation of a new inference every 64 cycles, achieving approximately 160,000 FLIPS. The new architecture has the following important improvements over the AT&T chip: programmable rule-set memory (RAM); on-chip fuzzification by table lookup; on-chip defuzzification by the centroid method; a reconfigurable architecture for processing two rule formats; and RAM/datapath redundancy for higher yield. It can store and execute 51 if-then rules of the format: IF A and B and C and D THEN Do E and Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the simpler format IF A and B THEN Do E, using the same datapath; with this format the chip takes two inputs and produces one output. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL, where researchers use it for experiments in autonomous robot navigation. The Fuzzy Logic system board places the fuzzy chip into a VMEbus environment. High-level C language functions hide the operational details of the board from the application; the programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast but limited in generality: many aspects of the design are limited or fixed. We have therefore proposed designing a fuzzy information processor as an application-specific processor using a quantitative approach developed by RISC designers. In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As a first step, we measured the possible speed-up of a rule-based fuzzy inference program from the introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union. We performed measurements using a MIPS R3000 as the base microprocessor. The initial result is encouraging: we can achieve as much as a 2.5-fold increase in inference speed if the R3000 had min and max instructions. They are also useful for speeding up other fuzzy operations such as bounded product and bounded sum. An embedded processor's main task is to control some device or process, and it usually runs a single dedicated program, so such a specialized processor is effective for embedded fuzzy control. Table I shows the measured inference speed of a MIPS R3000 microprocessor, a fictitious MIPS R3000 with min and max instructions, and the UNC/MCNC ASIC fuzzy inference chip; the software used on the microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds for 6000 inferences using 51 rules, where each fuzzy set is represented by an array of 64 elements. The second row is the time required for a single inference, and the last row is the fuzzy logical inferences per second (FLIPS) measured for each device. There is a large gap in run time between the ASIC and software approaches, even with a specialized fuzzy microprocessor. As for design time and cost, the two approaches represent two extremes, and an ASIC approach is extremely expensive. It is therefore an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes in both run time and design time/cost.

    TABLE I. INFERENCE TIME WITH 51 RULES

                        MIPS R3000 (regular)   MIPS R3000 (with min/max)   ASIC
    6000 inferences     125 s                  49 s                        0.0038 s
    1 inference         20.8 ms                8.2 ms                      6.4 μs
    FLIPS               48                     122                        156,250
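
The max-min compositional inference that both chips implement can be sketched in Python: min for the fuzzy AND and the implication clip, max to aggregate rules, and a centroid for defuzzification. The triangular membership helper and the example shapes are illustrative assumptions:

```python
def tri(a, b, c):
    """Triangular membership function peaking at b (assumes a < b < c)."""
    return lambda x: max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

def mamdani_infer(rules, inputs, universe):
    """Max-min compositional rule of inference with centroid defuzzification.

    rules: list of (antecedent_mfs, consequent_mf), where antecedent_mfs
    holds one membership function per input (combined with min, the fuzzy AND).
    """
    agg = [0.0] * len(universe)
    for antecedent_mfs, consequent_mf in rules:
        w = min(mf(x) for mf, x in zip(antecedent_mfs, inputs))  # firing strength
        for i, y in enumerate(universe):
            agg[i] = max(agg[i], min(w, consequent_mf(y)))       # clip, then fuzzy OR
    num = sum(y * m for y, m in zip(universe, agg))
    den = sum(agg)
    return num / den if den else 0.0
```

The inner min/max loops are exactly the operations the proposed R3000 min/max instructions would accelerate in software, and what the ASIC datapaths evaluate in parallel.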
