• Title/Summary/Keyword: FPGA

Performance of Energy Efficient Optical Ethernet Systems with a Dynamic Lane Control Scheme (동적 레인 제어방식을 적용한 에너지 절감형 광 이더넷 시스템의 성능분석)

  • Seo, Insoo;Yang, Choong-Reol;Yoon, Chongho
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.11
    • /
    • pp.24-35
    • /
    • 2012
  • In this paper, we propose a dynamic lane control scheme with a traffic predictor module and a rate controller that works with commercial optical PHY modules in energy-efficient optical Ethernet systems. Commercial high-speed optical Ethernet systems capable of 40/100 Gbps employ 4 or 10 optical transceivers over WDM or multiple optical links. Each transceiver is always turned on, even when the link is idle. To save energy, the dynamic lane control scheme allows several links to be turned off entirely under low traffic load, with frames handled on the remaining active links. To preserve the byte order even when the number of active links changes, we propose a rate controller placed on the reconciliation sublayer. The main role of the controller is to insert null byte streams into the xGMII of inactive lanes. In the PHY module, the null input streams corresponding to inactive lanes are disregarded on the inactive PMDs. The rate controller module can be implemented conveniently together with the MAC in an FPGA, without any modification of commercial PHYs. Because it is crucial to determine the number of active links from the fluctuating traffic load, we provide a simple traffic predictor based on both the current transmission buffer size and the past one, with different weighting factors, to adapt to traffic load fluctuation. Using the OMNeT++ simulation framework, we provide several performance results in terms of energy consumption.
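
The predictor and rate controller described in this abstract can be sketched roughly as follows. This is only an illustration under assumptions that are not in the abstract: the weight `w_current`, the linear mapping from buffer occupancy to lane count, the round-robin byte striping, and the null byte value `0x00` are all hypothetical choices.

```python
import math
from typing import List


def predict_load(current_buf: int, past_buf: int, w_current: float = 0.7) -> float:
    """Weighted blend of the current and previous transmit-buffer sizes (bytes).

    The weight 0.7 is an assumed value; the abstract only says the two
    samples get different weighting factors.
    """
    return w_current * current_buf + (1.0 - w_current) * past_buf


def active_lanes(predicted: float, buf_capacity: int, total_lanes: int = 4) -> int:
    """Map predicted occupancy to a lane count (assumed linear rule).

    At least one lane always stays on so that frames can still be sent.
    """
    fraction = min(max(predicted / buf_capacity, 0.0), 1.0)
    return max(1, math.ceil(fraction * total_lanes))


def stripe(frame: bytes, n_active: int, total_lanes: int = 4, null: int = 0x00) -> List[List[int]]:
    """Spread frame bytes round-robin over the active lanes.

    Inactive lanes (and the tail of a short final group) are filled with
    null bytes, mimicking null streams inserted for inactive lanes.
    """
    lanes: List[List[int]] = [[] for _ in range(total_lanes)]
    for i in range(0, len(frame), n_active):
        chunk = frame[i:i + n_active]
        for lane in range(total_lanes):
            lanes[lane].append(chunk[lane] if lane < len(chunk) else null)
    return lanes
```

For instance, `stripe(b"ABCDEF", n_active=2)` places the payload bytes on lanes 0 and 1 in their original order and only null bytes on lanes 2 and 3, so the receiver can recover the byte order by reading the active lanes column by column.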

Hybrid SEM System (하이브리드 SEM 시스템)

  • Kim, Yong-Ju
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2014.02a
    • /
    • pp.109-110
    • /
    • 2014
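  • The scanning electron microscope (SEM) is the most widely used analytical instrument for observing microstructure and morphology in the solid state, and recent commercially available high-resolution SEMs offer a resolution of a few nanometers. Because the SEM has a large depth of focus, three-dimensional imaging is easy, so curved or rough surfaces appear much as they would to the naked eye. Its applications are very diverse and span all industrial sectors: metal fracture surfaces, minerals and fossils, quality inspection of semiconductor devices and circuits, polymers and organic materials, biological specimens, and dairy products (Fig. 1). When the incident electron beam collides elastically and inelastically with the atoms of the sample, backscattered electrons, X-rays, cathodoluminescence, and other signals are generated in addition to secondary electrons, and these provide information on topography (the surface shape of the sample), morphology (the shape of the sample's constituent particles), composition (the constituent elements of the sample), and crystallography (the atomic arrangement of the sample). The SEM itself uses secondary electrons to measure the surface shape of the sample; beyond that, with the SEM as a platform, many analytical instruments such as EDS (Energy Dispersive X-ray Spectroscopy), WDS (Wavelength Dispersive X-ray Spectroscopy), EPMA (Electron Probe X-ray Micro Analyzer), FIB (Focused Ion Beam), EBIC (Electron Beam Induced Current), EBSD (Electron Backscatter Diffraction), and PBMS (Particle Beam Mass Spectrometer) are attached to the SEM to measure a variety of samples. Here we introduce two systems: an instrument that integrates into the SEM the EDS, an X-ray analyzer that makes crystal structure and composition analysis easy and effective, and the PCDS (Particle Characteristic Diagnosis System), which mounts EDS and PBMS on the SEM to measure the shape, composition, and size distribution of nanoparticles generated during semiconductor processing. - SEM system integrated with EDS: SEM and EDS are used in close combination because their functions are complementary, but they are operated in entirely different ways because of differences in manufacturers and underlying technology. In general, SEM and EDS are separate systems with individually implemented scan circuits and image processing circuits, yet the EDS system has to operate in conjunction with the SEM system in order to correct the electron beam distortion caused by the Lorentz force. As a result, there are electronic circuits that are needed in each system but are redundant from the viewpoint of the whole system, and differences between the sample images seen by the SEM and the EDS cause measurement errors (Fig. 2). In the SEM system integrated with EDS, the redundant functions, namely the scanning generation circuit responsible for scanning, the FPGA circuit responsible for image processing, and the application program, are provided by the SEM's own circuits and program. The sample images seen by the SEM and the EDS therefore coincide exactly, so no image calibration is needed and EDS measurement without this measurement error becomes possible. - PCDS: Particles generated during processing are regarded as the factor with the greatest impact on semiconductor production yield, and about 70% of the causes of yield loss are known to be related to them. At present, particles generated during semiconductor processes or inside semiconductor process equipment are not controlled, and because most semiconductor processes take place in a low-pressure environment, controlling these particles requires a measurement system that can operate in a low-pressure environment. Recently in Korea, the deposition of contaminant particles on the inner walls of pipes in CVD (Chemical Vapor Deposition) systems has been recognized as a serious problem (Fig. 3). The PCDS integrates an SEM, which measures the shape of contaminant particles, an EDS, which measures their composition, and a PBMS, which focuses the particles contained in a gas in a low-pressure environment into a beam, accelerates them, and charges them to saturation in order to measure their size distribution. This makes it possible to respond to and act on nanoparticles generated during semiconductor processing in real time.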

A Model-based Methodology for Application Specific Energy Efficient Data path Design Using FPGAs (FPGA에서 에너지 효율이 높은 데이터 경로 구성을 위한 계층적 설계 방법)

  • Jang Ju-Wook;Lee Mi-Sook;Mohanty Sumit;Choi Seonil;Prasanna Viktor K.
    • The KIPS Transactions:PartA
    • /
    • v.12A no.5 s.95
    • /
    • pp.451-460
    • /
    • 2005
  • We present a methodology to design energy-efficient data paths using FPGAs. Our methodology integrates domain-specific modeling, coarse-grained performance evaluation, design space exploration, and low-level simulation to understand the tradeoffs between energy, latency, and area. The domain-specific modeling technique defines a high-level model by identifying various components and parameters specific to a domain that affect the system-wide energy dissipation. A domain is a family of architectures and corresponding algorithms for a given application kernel. The high-level model also consists of functions for estimating energy, latency, and area that facilitate tradeoff analysis. Design space exploration (DSE) analyzes the design space defined by the domain and selects a set of designs. Low-level simulations are used for accurate performance estimation of the designs selected by the DSE and also for final design selection. We illustrate our methodology using a family of architectures and algorithms for matrix multiplication. The designs identified by our methodology demonstrate tradeoffs among energy, latency, and area. We compare our designs with a vendor-specified matrix multiplication kernel to demonstrate the effectiveness of our methodology, using average power density (E/AT), i.e., energy/(area x latency), as the metric for comparison. For various problem sizes, designs obtained using our methodology are on average 25% superior with respect to the E/AT performance metric, compared with the state-of-the-art designs by Xilinx. We also discuss the implementation of our methodology using the MILAN framework.
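
As a rough illustration of the E/AT metric and of the design space exploration step, the sketch below computes energy/(area x latency) for candidate designs and keeps only those that are not dominated in all three dimensions. The `Design` record, its units, and the Pareto filter are assumptions made for this example; the abstract does not say how the DSE actually selects designs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Design:
    """Hypothetical candidate design with high-level estimates."""
    name: str
    energy: float   # estimated energy, e.g. in nJ
    area: float     # estimated area, e.g. in slices
    latency: float  # estimated latency, e.g. in microseconds


def e_over_at(d: Design) -> float:
    """Average power density as defined in the abstract: energy / (area x latency)."""
    return d.energy / (d.area * d.latency)


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is no worse than `b` in every dimension and better in at least one."""
    no_worse = a.energy <= b.energy and a.area <= b.area and a.latency <= b.latency
    better = a.energy < b.energy or a.area < b.area or a.latency < b.latency
    return no_worse and better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep the designs that no other design dominates (assumed selection rule)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]
```

A low-level simulation would then be run only on the designs returned by `pareto_front`, which mirrors the coarse-to-fine flow the abstract describes.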

An Improvement of Implementation Method for Multi-Layer AHB BusMatrix (ML-AHB 버스 매트릭스 구현 방법의 개선)

  • Hwang Soo-Yun;Jhang Kyoung-Sun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.629-638
    • /
    • 2005
  • In System on a Chip design, the on-chip bus is one of the critical factors that decide the overall system performance. In particular, when reusing IPs that require higher bandwidth, such as processors, DSPs, and multimedia IPs, the bandwidth problems of the on-chip bus become more serious. ARM recently proposed the Multi-Layer AHB BusMatrix, a highly efficient on-chip bus, to solve these bandwidth problems. The Multi-Layer AHB BusMatrix allows parallel access paths between multiple masters and slaves in a system. This is achieved by using a more complex interconnection matrix, and it gives the benefit of increased overall bus bandwidth and a more flexible system architecture. However, in the existing Multi-Layer AHB BusMatrix there is a one-clock-cycle delay for each master whenever the master starts a new transaction or changes slave layers, because the Input Stage and the arbitration logic are realized as Moore-type machines. In this paper, we improve the existing Multi-Layer AHB BusMatrix architecture to solve the one-clock-cycle delay problem and to reduce the area overhead of the Input Stage. By eliminating the Input Stage and placing some restrictions on the arbitration scheme, we can remove the one-clock-cycle delay and reduce the area overhead. Experimental results show that, when a number of transactions are executed with the 4-beat incrementing burst type, the end time of the total bus transaction and the average latency of the improved Multi-Layer AHB BusMatrix are improved by 20% and 24%, respectively. In addition, the total area and the clock period are reduced by 22% and 29%, respectively, compared with the existing Multi-Layer AHB BusMatrix.
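
To see why a one-cycle start-up delay matters for short bursts, a back-of-the-envelope cycle count can help. The model below is a deliberately idealized assumption (one cycle per beat, no wait states, no address/data pipelining, back-to-back bursts) and is not the paper's experimental setup.

```python
def total_cycles(n_bursts: int, beats_per_burst: int = 4, start_penalty: int = 1) -> int:
    """Cycles for back-to-back bursts when each burst pays `start_penalty` extra cycles."""
    return n_bursts * (beats_per_burst + start_penalty)


def relative_saving(n_bursts: int, beats_per_burst: int = 4) -> float:
    """Fraction of cycles saved when the one-cycle start penalty is removed."""
    with_delay = total_cycles(n_bursts, beats_per_burst, start_penalty=1)
    without_delay = total_cycles(n_bursts, beats_per_burst, start_penalty=0)
    return (with_delay - without_delay) / with_delay
```

Under these assumptions the saving depends only on the burst length: the shorter the burst, the larger the share of time spent on the start-up cycle, which is why short incrementing bursts are a natural test case for this kind of improvement.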

Randomness based Static Wear-Leveling for Enhancing Reliability in Large-scale Flash-based Storage (대용량 플래시 저장장치에서 신뢰성 향상을 위한 무작위 기반 정적 마모 평준화 기법)

  • Choi, Kilmo;Kim, Sewoog;Choi, Jongmoo
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.2
    • /
    • pp.126-131
    • /
    • 2015
  • As flash-based storage systems have been actively employed in large-scale servers and data centers, reliability has become an indispensable element. One promising technique for enhancing reliability is static wear-leveling, which distributes erase operations evenly among blocks so that the lifespan of storage systems can be prolonged. However, increasing the capacity makes the processing overhead of this technique non-trivial, mainly due to searching for blocks whose erase count would be minimum (or maximum) among all blocks. To reduce this overhead, we introduce a new randomized block selection method in static wear-leveling. Specifically, without exhaustive search, it chooses n blocks randomly and selects the maximal/minimal erased blocks among the chosen set. Our experimental results revealed that, when n is 2, the wear-leveling effects can be obtained, while for n beyond 4, the effect is close to that obtained from traditional static wear-leveling. For quantitative evaluation of the processing overhead, the scheme was actually implemented on an FPGA board, and overhead reduction of more than 3 times was observed. This implies that the proposed scheme performs as effectively as the traditional static wear-leveling while reducing overhead.
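
The randomized selection step can be sketched as below: rather than scanning every block, draw n candidates at random and take the least and most erased among them. The function name, the list-of-counters representation, and the use of `random.Random` are assumptions for illustration; the FPGA implementation mentioned in the abstract is not described in enough detail to reproduce here.

```python
import random
from typing import List, Tuple


def pick_blocks(erase_counts: List[int], n: int, rng: random.Random) -> Tuple[int, int]:
    """Return (coldest, hottest) block indices among n randomly sampled blocks.

    Only n erase counters are inspected, instead of all of them as in an
    exhaustive search for the global minimum and maximum.
    """
    candidates = rng.sample(range(len(erase_counts)), n)
    coldest = min(candidates, key=lambda b: erase_counts[b])
    hottest = max(candidates, key=lambda b: erase_counts[b])
    return coldest, hottest
```

With n equal to the number of blocks this degenerates to the exhaustive search; small values such as 2 or 4 trade some selection quality for a much cheaper lookup, which is the trade-off the abstract reports on.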

A STUDY ON THE RE-QUANTIZATION METHOD FOR PREVENTING DISTORTION OF CORRELATION RESULT (상관결과의 왜곡 방지를 위한 재양자화 방법에 관한 연구)

  • Yeom, Jae-Hwan;Oh, Se-Jin;Roh, Duk-Gyoo;Oh, Chung-Sik;Jung, Jin-Seung;Chung, Dong-Kyu;Oyama, Tomoaki;Kawaguchi, Noriyuki;Kobayashi, Hideyuki;Kawakami, Kazuyuki;Onuki, Hirofumi;Ozeki, Kensuke
    • Publications of The Korean Astronomical Society
    • /
    • v.27 no.5
    • /
    • pp.419-429
    • /
    • 2012
  • In this paper, we propose a new re-quantization method applied after FFT processing to prevent distortion of the correlation result of the VCS (VLBI Correlation Subsystem). Re-quantization rearranges the data bits so as to reduce the data rate of the 16-bit FFT result of the VCS. With the re-quantization method introduced in the initial design of the VCS, we found a distorted spectrum in the correlation result during delay tracking experiments. To solve this, two kinds of re-quantization methods are proposed: a comparison type and a selection type. The first re-quantizes the FFT result to valid bits by comparing it with the input data after determining an adequate threshold. The second manually selects the valid bits of the FFT result after finding the valid field of the data according to the bit distribution of the input data. The experimental results confirmed that the second is more effective than the first, and that it can be implemented without much modification of the applied method given the limited resources of the FPGA. However, when the proposed second method re-quantizes the FFT result to 4 bits, distortion of the correlation result still appears. To fix this problem, the re-quantization width is extended to 8 bits. Simulation and correlation experiments with the VCS verified that the proposed 8-bit selection type removes the distortion of the correlation result.
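
A selection-type re-quantizer can be pictured as choosing a window of bits out of each 16-bit FFT output. The sketch below is a generic illustration, not the VCS logic: the `msb` parameter (the highest bit kept), the arithmetic right shift, and the saturation of out-of-range values are all assumptions.

```python
def requantize(value: int, msb: int, out_bits: int = 8, in_bits: int = 16) -> int:
    """Keep `out_bits` bits of a signed `in_bits`-bit sample, with bit `msb` as the top kept bit.

    Bits below the window are dropped by an arithmetic right shift; values
    that do not fit in the window are saturated (an assumed policy).
    """
    if not (out_bits - 1 <= msb <= in_bits - 1):
        raise ValueError("bit window does not fit inside the input word")
    shift = msb - (out_bits - 1)
    shifted = value >> shift  # Python's >> is arithmetic for negative ints
    lo = -(1 << (out_bits - 1))
    hi = (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, shifted))
```

Comparing `requantize(100, msb=7, out_bits=8)` with `requantize(100, msb=7, out_bits=4)` shows the effect of the narrower word: the 8-bit window keeps the value exactly, while the 4-bit window keeps only its top four bits, which hints at why a 4-bit output might distort a spectrum that an 8-bit output preserves.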

Analysis of Distributed Computational Loads in Large-scale AC/DC Power System using Real-Time EMT Simulation (대규모 AC/DC 전력 시스템 실시간 EMP 시뮬레이션의 부하 분산 연구)

  • In Kwon, Park;Yi, Zhong Hu;Yi, Zhang;Hyun Keun, Ku;Yong Han, Kwon
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.8 no.2
    • /
    • pp.159-179
    • /
    • 2022
  • As a network becomes complex, multiple entities often take charge of managing parts of the whole network. An example is a utility grid. While the entire grid is the responsibility of a single utility company, the network is often split into multiple subsections, and each subsection is assigned as a responsibility area to the corresponding sub-organization within the company. The issue of how to form subsystems of adequate size with a minimum number of interconnections between them becomes more critical in real-time simulations. A single computation unit, whether it is a high-speed conventional CPU core or an FPGA computational engine, has a limited computation capability, so there is a maximum amount of computation that it can complete within a given execution time. The issue worsens in real-time simulation, in which the computation must stay in precise synchronization with the real-world clock. When the subject of the computation allows a longer execution time, i.e., a larger time step size, a larger portion of the network can be placed on one computation unit. This translates into a larger margin for the difference between the worst and the best computational burdens. In other words, even if the worst (largest) computational burden is orders of magnitude larger than the best (smallest), all the necessary computation can still be completed within the given amount of time. The real-time requirement, however, makes this margin much smaller: the difference between the worst and the best should be as small as possible to ensure an even distribution of the computational load. Moreover, data exchange between units is essential in parallel computation and affects overall performance, and the exchange of data takes time. It therefore needs to be considered together with the distribution of the computational load among multiple calculation units. A satisfactory distribution raises the possibility of completing the necessary computation within the given amount of time, which may be on the order of microseconds. This paper presents an effective way to split a given electrical network, according to multiple criteria, so as to distribute the entire computational load into a set of even (or close to even) sized computational loads. Based on the proposed system splitting method, the heavy computational burden of large-scale electrical networks can be distributed to multiple calculation units, such as an RTDS real-time simulator, achieving more efficient usage of the calculation units, a reduction of the necessary simulation time step size, or both.
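
The load-balancing part of this problem can be illustrated with a standard greedy heuristic (longest processing time first), which is swapped in here purely as an example and is not the splitting method of the paper. The subsystem names, the per-subsystem costs in microseconds, and the single `comm_overhead_us` term are hypothetical, and the sketch ignores the paper's other criterion of minimizing interconnections.

```python
import heapq
from typing import Dict, List, Tuple


def distribute(loads_us: Dict[str, float], n_units: int) -> Tuple[Dict[int, List[str]], Dict[int, float]]:
    """Assign subsystems to units, always giving the next-heaviest one to the least-loaded unit."""
    heap = [(0.0, u) for u in range(n_units)]
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {u: [] for u in range(n_units)}
    for name, cost in sorted(loads_us.items(), key=lambda kv: -kv[1]):
        total, unit = heapq.heappop(heap)
        assignment[unit].append(name)
        heapq.heappush(heap, (total + cost, unit))
    totals = {u: sum(loads_us[n] for n in names) for u, names in assignment.items()}
    return assignment, totals


def fits_time_step(totals: Dict[int, float], time_step_us: float, comm_overhead_us: float = 0.0) -> bool:
    """Real-time check: the most loaded unit plus data-exchange time must fit in one time step."""
    return max(totals.values()) + comm_overhead_us <= time_step_us
```

The real-time constraint is set by the most loaded unit, so narrowing the gap between the largest and smallest per-unit loads is what allows either a smaller time step or fewer units.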

Optimization of Approximate Modular Multiplier for R-LWE Cryptosystem (R-LWE 암호화를 위한 근사 모듈식 다항식 곱셈기 최적화)

  • Jae-Woo, Lee;Youngmin, Kim
    • Journal of IKEEE
    • /
    • v.26 no.4
    • /
    • pp.736-741
    • /
    • 2022
  • Lattice-based cryptography is the most practical post-quantum cryptography because it enjoys strong worst-case security, relatively efficient implementation, and simplicity. Ring learning with errors (R-LWE) is a public key encryption (PKE) method of lattice-based cryptography (LBC), and the most important operation of R-LWE is modular polynomial multiplication over rings. This paper proposes a method for optimizing modular multipliers based on approximate computing (AC) technology, targeting the medium-security parameter set of the R-LWE cryptosystem. First, as a simple way to implement complex logic, a LUT is used to omit some of the approximate multiplication operations, and the 2's complement method is used to calculate the number of bits whose value is 1 when the input data is converted to binary. In total, we propose two methods that minimize the number of required adders. The proposed LUT-based modular multiplier reduced both speed and area by 9% compared to the existing R-LWE modular multiplier, and the modular multiplier using the 2's complement method reduced the area by 40% and improved the speed by 2%. Finally, the area of the optimized modular multiplier with both of these methods applied was reduced by up to 43% compared to the previous one, and the speed was reduced by up to 10%.
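
For reference, the operation being optimized can be written as an exact schoolbook multiplication. The sketch assumes the ring is Z_q[x]/(x^n + 1), the choice commonly used for R-LWE; the abstract does not state the ring or the parameter values, and this exact version contains none of the approximate-computing shortcuts the paper proposes.

```python
from typing import List


def poly_mul_mod(a: List[int], b: List[int], q: int) -> List[int]:
    """Multiply two degree-(n-1) polynomials in Z_q[x]/(x^n + 1).

    Coefficients are listed from the constant term upward. Because
    x^n = -1 in this ring, products that overflow past x^(n-1) wrap
    around with a sign flip.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("both polynomials must have n coefficients")
    result = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                result[k] = (result[k] + ai * bj) % q
            else:
                result[k - n] = (result[k - n] - ai * bj) % q
    return result
```

Every inner-loop step is a coefficient multiplication followed by a modular addition or subtraction, which is why the number of multipliers and adders dominates the hardware cost of this operation.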

Family Structure and Succession of the Late Chosun Seen through Male Adoption (양자제도를 통해 본 조선후기 가족구조와 가계계승: 의성김씨 호구단자 분석을 중심으로)

  • Park, Soo-Mi
    • Korea journal of population studies
    • /
    • v.30 no.2
    • /
    • pp.71-95
    • /
    • 2007
  • This paper attempts to identify the principle of family succession and the family patterns of yangban in the late Chosun period through an analysis of male adoption cases found in family registration records. The primary source of analysis is the family registration documents of Uiseong Kim's from the late 17th century to the early 20th century. As a result, it is found that there is a substantial change in family patterns from the early and mid Chosun period to the late Chosun period. The change is the strengthening of the principle of patriarchal succession through male adoption. Looking at the data as a whole, the average number of household members increased and the membership of kinship also expanded. In contrast to the family patterns of the early Chosun period, the patterns of Uiseong Kim's family are predominantly immediate family or collateral family, and the majority are extended families in the 18th and 19th centuries. The male adoption cases recorded in Uiseong Kim's family registration documents make up 33.8% of the male adoption cases in the entire family registration documents. This shows that the strengthening of the principle of primogeniture succession, at a time when the child mortality rate was very high, resulted in an increase in male adoption. In conclusion, late Chosun society was a society in which the seat of primogeniture was much more important than immediate hereditary members in family succession.

A Hardware Implementation of the Underlying Field Arithmetic Processor based on Optimized Unit Operation Components for Elliptic Curve Cryptosystems (타원곡선을 암호시스템에 사용되는 최적단위 연산항을 기반으로 한 기저체 연산기의 하드웨어 구현)

  • Jo, Seong-Je;Kwon, Yong-Jin
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.1
    • /
    • pp.88-95
    • /
    • 2002
  • In recent years, the security of hardware and software systems has become one of the most essential factors for a safe network community. Because Elliptic Curve Cryptosystems, proposed independently by N. Koblitz and V. Miller in 1985, require fewer bits for the same security as existing cryptosystems such as RSA, there is a net reduction in cost, size, and time. In this thesis, we propose an efficient hardware architecture for the underlying field arithmetic processor of Elliptic Curve Cryptosystems, and a very useful method for implementing the architecture, especially the multiplicative inverse operator over $GF(2^m)$, on FPGA and furthermore VLSI, where the method is based on optimized unit operation components. We optimize the arithmetic processor for speed so that it has a reasonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2^m}$. According to the simulation result, though the number of gates is increased by a factor of 8.8, the multiplication speed and inversion speed are improved 150 times and 480 times, respectively, compared with the thesis presented by Sarwono Sutikno et al. [7]. The designed underlying arithmetic processor can also be applied to implementing other crypto-processors and various finite field applications.
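
The two field operations named in this abstract, multiplication and multiplicative inversion over $GF(2^m)$, can be sketched in software as follows. The example uses m = 8 with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) purely as a small, well-known test field, and it computes the inverse by exponentiation (a^(2^m - 2)); both choices are assumptions for illustration and say nothing about the field size or inversion algorithm used in the paper's hardware.

```python
def gf_mul(a: int, b: int, m: int = 8, poly: int = 0x11B) -> int:
    """Multiply two elements of GF(2^m) given as integers below 2^m.

    Shift-and-add multiplication: addition is XOR, and the running
    multiplicand is reduced by `poly` whenever it reaches degree m.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly
    return result


def gf_inv(a: int, m: int = 8, poly: int = 0x11B) -> int:
    """Multiplicative inverse in GF(2^m), computed as a^(2^m - 2) by square-and-multiply."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result, base, exp = 1, a, (1 << m) - 2
    while exp:
        if exp & 1:
            result = gf_mul(result, base, m, poly)
        base = gf_mul(base, base, m, poly)
        exp >>= 1
    return result
```

Inversion built this way costs many multiplications, which is one reason a dedicated inverse operator is an attractive target for hardware optimization.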