• Title/Summary/Keyword: cost matrix

CUDA-based Parallel Bi-Conjugate Gradient Matrix Solver for BioFET Simulation (BioFET 시뮬레이션을 위한 CUDA 기반 병렬 Bi-CG 행렬 해법)

  • Park, Tae-Jung;Woo, Jun-Myung;Kim, Chang-Hun
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.1 / pp.90-100 / 2011
  • We present a parallel bi-conjugate gradient (Bi-CG) matrix solver for large-scale BioFET simulations based on recent graphics processing units (GPUs), which enable large-scale parallel processing at very low cost. The proposed method focuses on solving the Poisson equation in parallel, a task that demands massive computational resources not only in semiconductor simulation but also in various other fields, including computational fluid dynamics and heat transfer simulation. As a result, our solver is around 30 times faster than traditional methods running on single-core CPU systems when solving the Poisson equation in a 3D FDM (Finite Difference Method) scheme. The proposed method is implemented and tested in NVIDIA's CUDA (Compute Unified Device Architecture) environment, which enables general-purpose parallel processing on GPUs. Unlike other similar GPU-based approaches, which usually use 32-bit single-precision floating-point arithmetic, we use 64-bit double-precision operations for better convergence. Applications on the CUDA platform are relatively easy to implement but very hard to optimize for performance. In this regard, we also discuss the optimization strategy of the proposed method.
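
For readers unfamiliar with Bi-CG, the following is a serial, dense-matrix sketch of the textbook unpreconditioned Bi-CG iteration, applied to a small 1D finite-difference Poisson system. It is not the authors' CUDA implementation; the helper names (`bicg`, `poisson_1d`, `matvec_t`) are hypothetical. Python floats are IEEE 754 double precision, matching the 64-bit arithmetic the abstract argues for. On a GPU, the matrix-vector products and dot products are the parts that would be parallelized.

```python
"""Serial sketch of the unpreconditioned Bi-CG iteration (not the authors' CUDA code)."""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Compute A @ x for a dense row-major matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matvec_t(A: Matrix, x: Vector) -> Vector:
    """Compute A^T @ x without forming the transpose explicitly."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bicg(A: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b with bi-conjugate gradients, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    rt = r[:]            # shadow residual, initialised to r
    p, pt = r[:], rt[:]  # search directions for A and A^T
    rho = dot(rt, r)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        q = matvec(A, p)          # A p
        qt = matvec_t(A, pt)      # A^T p~
        alpha = rho / dot(pt, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x


def poisson_1d(n: int) -> Matrix:
    """Tridiagonal (-1, 2, -1) matrix of the 1D second-order FDM Poisson stencil (h^2 scaled out)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = poisson_1d(5)
    b = [0.0, 0.0, 0.0, 0.0, 6.0]  # chosen so that the exact solution is [1, 2, 3, 4, 5]
    print([round(v, 8) for v in bicg(A, b)])
```

Bi-CG only needs products with A and A^T, so the same loop also handles nonsymmetric systems, which is why it is a common choice when the discretized operator is not symmetric.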

Automation of Bio-Industrial Process Via Tele-Task Command(I) -identification and 3D coordinate extraction of object- (원격작업 지시를 이용한 생물산업공정의 생력화 (I) -대상체 인식 및 3차원 좌표 추출-)

  • Kim, S. C.;Choi, D. Y.;Hwang, H.
    • Journal of Biosystems Engineering / v.26 no.1 / pp.21-28 / 2001
  • Major deficiencies of current automation schemes, including the various robots used for bioproduction, include a lack of task adaptability and real-time processing, low job performance across diverse tasks, a lack of robustness in task results, high system cost, and a lack of trust from the operator. This paper proposes a scheme that addresses the limited task abilities of conventional computer-controlled automatic systems. The proposed scheme is man-machine hybrid automation via tele-operation, which can handle various bioproduction processes. It was divided into two parts: efficient task sharing between the operator and the CCM (computer-controlled machine), and an efficient interface between the operator and the CCM. To realize the proposed concept, the task of identifying an object and extracting its 3D coordinates was selected. 3D coordinate information was obtained through camera calibration, using the camera as a measurement device. Two stereo images were acquired by moving a camera a certain distance horizontally, normal to the focal axis, and capturing an image at each location. The transformation matrix for camera calibration was obtained via a least-squares-error approach using six specified pairs of known data points in the 2D image and 3D world space. The 3D world coordinates were then computed from the two sets of image pixel coordinates, one from each camera image, using the calibrated transformation matrix. As the interface between the operator and the CCM, a touch pad screen mounted on the monitor and a remote image-capturing system were used. The operator indicated an object by touching the captured image on the touch pad screen with a finger. After the touch, a local image-processing area of a certain size was specified, and image processing was performed within that local area to extract the desired features of the object. MS Windows-based interface software was developed using Visual C++ 6.0.
The software consists of four modules: a remote image acquisition module, a task command module, a local image processing module, and a 3D coordinate extraction module. Through selected sample tasks, the proposed scheme showed the feasibility of real-time processing, robust and precise object identification, and adaptability to various jobs and environments.
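
The paper recovers 3D coordinates through a calibrated transformation matrix fitted to six known point pairs. As a simplified stand-in (a different, idealized technique, not the authors' method), the sketch below triangulates a point for an ideal pinhole camera translated purely along its x-axis, which is the geometry the horizontal camera shift approximates. The function name `triangulate_parallel` and the parameters `focal_px`, `baseline`, `cx`, `cy` are hypothetical; lens distortion and any rotation between the two shots are assumed to be zero.

```python
"""Ideal parallel-stereo triangulation (simplified stand-in for the calibrated-matrix method)."""
from typing import Tuple


def triangulate_parallel(xl: float, yl: float, xr: float,
                         focal_px: float, baseline: float,
                         cx: float = 0.0, cy: float = 0.0) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) in the first (left) camera frame from a matched pixel pair.

    Assumes an ideal pinhole camera moved `baseline` units to the right along its
    x-axis, with no rotation or lens distortion, so matched points share a row.
    """
    disparity = xl - xr  # the principal-point offset cx cancels in this difference
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    Z = focal_px * baseline / disparity  # depth from similar triangles
    X = (xl - cx) * Z / focal_px
    Y = (yl - cy) * Z / focal_px
    return X, Y, Z


if __name__ == "__main__":
    # A point at (0.5, 0.2, 2.0) m viewed with f = 800 px and a 0.1 m camera shift
    # projects to x_l = 200, y_l = 80, x_r = 160 (principal point at the origin).
    print(triangulate_parallel(200.0, 80.0, 160.0, focal_px=800.0, baseline=0.1))
```

A full calibration, as in the paper, would replace the single focal length and principal point with a fitted projection matrix per view, which also absorbs small rotations and offsets.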

Clinical Laboratory Aspect of Carbapenem-Resistant Enterobacteriaceae (카바페넴내성장내세균속균종의 임상검사 측면)

  • Park, Chang-Eun
    • Korean Journal of Clinical Laboratory Science / v.52 no.1 / pp.18-27 / 2020
  • Correctly distinguishing carbapenem-resistant Enterobacteriaceae (CRE) from carbapenemase-producing Enterobacteriaceae (CPE), and detecting CPE rapidly, are important for instituting the correct treatment and management of clinical infections. Screening protocols are mainly based on culturing rectal swab specimens on selective media, followed by phenotypic tests to confirm carbapenem-hydrolyzing activity, the rapid carbapenem inactivation method, lateral flow immunoassay, the matrix-assisted laser desorption ionization time-of-flight test, and molecular methods. Accurate detection of CPE is essential for the clinical treatment and prevention of infections. A variety of phenotypic and gene-based methods are available for the rapid detection of carbapenemases, and these are expected to be used routinely in clinical microbiology laboratories. Therefore, to control the spread of carbapenemases, many laboratories around the world will need reliable, fast, highly efficient, simple, and low-cost methods. Optimal benefit to patients would require rapid CRE testing that provides reproducible support for antimicrobial management interventions or treatment by various types of clinicians. For optimal testing, complementary methods must be combined to discriminate among resistant bacterial species and to uncover the genetic diversity of carbapenemase types, in order to arrive at the best infection control strategy.

The Intraday Lead-Lag Relationships between the Stock Index and the Stock Index Futures Market in Korea and China (한국과 중국의 현물시장과 주가지수선물시장간의 선-후행관계에 관한 연구)

  • Seo, Sang-Gu
    • Management & Information Systems Review / v.32 no.4 / pp.189-207 / 2013
  • Using two years of high-frequency data, this study investigates the intraday lead-lag relationship between the stock index and stock index futures markets in Korea and China. We found some differences between Korea and China in price discovery and volatility transmission after the stock index futures markets were introduced. Following Stoll-Whaley (1990) and Chan (1992), a multiple regression is estimated to examine the lead-lag patterns between the two markets, using the Newey-West (1987) heteroskedasticity and autocorrelation consistent covariance matrix (HAC matrix). Empirical results for the KOSPI 200 show that the futures market leads the cash market, with only weak evidence that the cash market leads the futures market. New market information disseminates in the futures market before the stock market, with index arbitrageurs then stepping in quickly to bring the cost-of-carry relation back into alignment. Regression tests on the conditional volatility estimated with an EGARCH model do not show a clear pattern of the futures market leading the stock market in terms of volatility, even after controlling for nonsynchronous trading effects. This implies that information in price innovations originating in the futures market is transmitted to the volatility of the cash market. Empirical results for the CSI 300 show that the cash market plays a more dominant role in the price discovery process, after the Chinese index began a sharp decline immediately following the introduction of stock index futures. The new stock index futures market does not perform price discovery well in its infancy, apparently due to high barriers to entry into this emerging futures market. Based on the EGARCH model, the results uncover strong bi-directional dependence in the intraday volatility of both markets.
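
To illustrate what the Newey-West HAC matrix adds to an ordinary regression, here is a pure-Python sketch for a single regressor plus intercept (the study's regressions include several leads and lags, which this does not reproduce). It uses the Bartlett weights w_l = 1 - l/(L+1) and the sandwich form (X'X)^-1 S (X'X)^-1, with no small-sample degrees-of-freedom correction. The function name `ols_newey_west` and the user-chosen `lags` parameter are assumptions for illustration.

```python
"""Newey-West (Bartlett-kernel) HAC standard errors for y = a + b*x, pure Python."""
import math
from typing import List, Sequence, Tuple

Mat2 = List[List[float]]


def _matmul2(A: Mat2, B: Mat2) -> Mat2:
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def ols_newey_west(y: Sequence[float], x: Sequence[float],
                   lags: int) -> Tuple[List[float], List[float]]:
    """Return OLS coefficients [a, b] and their Newey-West standard errors."""
    rows = [(1.0, float(xi)) for xi in x]  # regressors: intercept and x
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    beta = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]
    u = [yi - (beta[0] + beta[1] * r[1]) for yi, r in zip(y, rows)]  # residuals

    # "Meat" S: lag-0 (White) term plus Bartlett-weighted autocovariance terms.
    S = [[sum(u[t] ** 2 * rows[t][i] * rows[t][j] for t in range(len(u)))
          for j in range(2)] for i in range(2)]
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)  # Bartlett weight keeps S positive semidefinite
        for t in range(lag, len(u)):
            g = u[t] * u[t - lag]
            for i in range(2):
                for j in range(2):
                    S[i][j] += w * g * (rows[t][i] * rows[t - lag][j]
                                        + rows[t - lag][i] * rows[t][j])

    V = _matmul2(_matmul2(inv, S), inv)  # sandwich (X'X)^-1 S (X'X)^-1
    se = [math.sqrt(max(V[k][k], 0.0)) for k in range(2)]  # clamp tiny negative rounding
    return beta, se
```

With `lags=0` this reduces to White's HC0 heteroskedasticity-robust errors; increasing `lags` additionally accounts for serial correlation in the residuals, which is typical of high-frequency returns.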

Evaluation of the Canine Stifle Joint after Transection of the Cranial Cruciate Ligament and Medial Collateral Ligament, and Medial Meniscectomy without Postoperative Exercise (앞십자인대 및 내측 곁인대 절제와 내측 반월판 절제술을 한 뒤 수술후 운동을 실시하지 않은 개의 무릎 관절의 평가)

  • Lee, Hae-Beom;Jeong, Chang-Woo;Kim, Nam-Soo
    • Journal of Veterinary Clinics / v.24 no.3 / pp.325-330 / 2007
  • This study aimed to determine whether a canine model that produces acute, permanent joint instability within a short period, without postoperative exercise, develops degenerative changes, and also evaluated its suitability as an animal osteoarthritis (OA) model. Ten skeletally mature beagle dogs underwent unilateral surgical transection of the cranial cruciate ligament and the medial collateral ligament, as well as a medial meniscectomy. The contralateral joint was used as a control. After 12 weeks, the amount of joint damage, inflammation, and biochemical change in the synovial fluid (SF) was evaluated. Histological analysis showed chondrocyte clone formation, hypertrophy of the cartilage, and moderate loss of proteoglycans in the experimental joints compared to the control joints. Synovial inflammation was also observed in the experimental joints. Biochemical analysis of SF showed significantly increased MMP (matrix metalloproteinase)-2 and -9 in the experimental joints compared to the control joints. This canine OA model shows the characteristics of degenerative joint disease and may have the advantage of reducing time and cost, because postoperative exercise is not needed.

A Digital Phase-locked Loop design based on Minimum Variance Finite Impulse Response Filter with Optimal Horizon Size (최적의 측정값 구간의 길이를 갖는 최소 공분산 유한 임펄스 응답 필터 기반 디지털 위상 고정 루프 설계)

  • You, Sung-Hyun;Pae, Dong-Sung;Choi, Hyun-Duck
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.4 / pp.591-598 / 2021
  • A digital phase-locked loop (DPLL) is a circuit used for phase synchronization and has been widely used in fields such as communications and circuit design. State estimators are used to design DPLLs, and infinite impulse response (IIR) state estimators such as the well-known Kalman filter have been used. In general, DPLLs based on IIR state estimators perform well, but sudden performance degradation may occur in unexpected situations such as inaccurate initial values, model errors, and disturbances. In this paper, we propose a minimum variance finite impulse response (FIR) filter with an optimal horizon for designing a new DPLL. A numerical method is introduced to obtain the horizon size (the length of the measurement interval), which is an important parameter of the proposed FIR filter. To obtain the filter gain, the error covariance matrix is set as a cost function and minimized using a linear matrix inequality. To verify the superiority and robustness of the proposed DPLL, simulations were performed comparing it with the existing method in a situation where the noise information was inaccurate.
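
The key property of an FIR estimator is that each estimate is a fixed linear combination of only the last N measurements, so it does not depend on an initial state estimate. The sketch below uses a simpler, different technique from the paper's minimum variance FIR filter: an unbiased least-squares line fit over the horizon for a constant-frequency phase model, followed by a brute-force search that picks the horizon with the lowest mean squared error on simulated data. The LMI-based gain design is not reproduced, and `fir_phase_estimate`, `select_horizon`, and the candidate horizons are hypothetical.

```python
"""Finite-horizon least-squares (unbiased FIR) phase estimator with a horizon search."""
import random
from typing import Sequence, Tuple


def fir_phase_estimate(window: Sequence[float]) -> Tuple[float, float]:
    """Fit phase = theta0 + omega * t to the last N samples; return (phase_now, omega).

    The result is a fixed linear combination of the N measurements (an FIR filter),
    so no initial state estimate is involved.
    """
    n = len(window)
    if n < 2:
        raise ValueError("horizon must contain at least two samples")
    t_mean = (n - 1) / 2.0
    z_mean = sum(window) / n
    num = sum((t - t_mean) * (z - z_mean) for t, z in enumerate(window))
    den = sum((t - t_mean) ** 2 for t in range(n))
    omega = num / den
    return z_mean + omega * (n - 1 - t_mean), omega  # fitted line evaluated at the newest sample


def select_horizon(measured: Sequence[float], truth: Sequence[float],
                   candidates: Sequence[int]) -> int:
    """Pick the horizon with the smallest mean squared phase error on simulated data."""
    start = max(candidates) - 1  # evaluate every candidate on the same samples
    best_n, best_mse = candidates[0], float("inf")
    for n in candidates:
        errs = [(fir_phase_estimate(measured[k - n + 1:k + 1])[0] - truth[k]) ** 2
                for k in range(start, len(measured))]
        mse = sum(errs) / len(errs)
        if mse < best_mse:
            best_n, best_mse = n, mse
    return best_n


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [0.05 * k for k in range(300)]               # constant-frequency phase ramp
    measured = [p + rng.gauss(0.0, 0.1) for p in truth]  # noisy phase detector output
    print(select_horizon(measured, truth, candidates=[4, 8, 16, 32]))
```

With a perfectly constant frequency, longer horizons tend to win because they average more noise; the trade-off that makes an optimal horizon meaningful appears once the simulated phase includes frequency changes or other model errors.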

Further Improvement of Direct Solution-based FETI Algorithm (직접해법 기반의 FETI 알고리즘의 개선)

  • Kang, Seung-Hoon;Gong, DuHyun;Shin, SangJoon
    • Journal of the Computational Structural Engineering Institute of Korea / v.35 no.5 / pp.249-257 / 2022
  • This paper presents an improved computational framework for the direct-solution-based finite element tearing and interconnecting (FETI) algorithm. The FETI-local algorithm is further improved herein, with localized Lagrange multipliers used to define the interface between its subdomains. Selective inverse entry computation, which exploits a property of the Boolean matrix, is employed to compute the subdomain interface stiffness and load, whereas the original FETI-local algorithm requires a full matrix inverse at high computational cost. In the global interface computation step, the original serial computation is replaced by a parallel multi-frontal method. The performance of the improved FETI-local algorithm was evaluated using a numerical example with 64 million degrees of freedom (DOFs). The computational time was reduced by up to 97.8% compared to that of the original algorithm. In addition, more stable and improved scalability was obtained in terms of a speed-up indicator. Furthermore, the proposed algorithm was compared with the commercial software ANSYS in a large-scale computation with 432 million DOFs. Although ANSYS is superior in terms of computational time, the proposed algorithm has an advantage in terms of the speed-up gained per additional processor.
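
Why a Boolean matrix allows selective inversion can be seen directly: each row of a Boolean selection matrix B contains a single 1, so B K^-1 B^T is simply the interface-by-interface submatrix of K^-1. The sketch below (dense, pure Python, with hypothetical names `lu_factor`, `lu_solve`, `interface_flexibility`) factors K once and performs one solve per interface DOF instead of forming the full inverse. It illustrates the idea only; the paper's sparse selective-inverse procedure is not reproduced.

```python
"""Interface block B K^-1 B^T for a Boolean B, without forming the full K^-1."""
from typing import List, Sequence

Matrix = List[List[float]]


def lu_factor(K: Matrix) -> Matrix:
    """Doolittle LU without pivoting (adequate for SPD stiffness matrices), on a copy."""
    A = [row[:] for row in K]
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]  # multiplier stored in the strictly lower (L) part
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A


def lu_solve(LU: Matrix, b: Sequence[float]) -> List[float]:
    n = len(LU)
    y = list(b)
    for i in range(n):  # forward substitution with unit lower-triangular L
        for k in range(i):
            y[i] -= LU[i][k] * y[k]
    for i in reversed(range(n)):  # back substitution with upper-triangular U
        for k in range(i + 1, n):
            y[i] -= LU[i][k] * y[k]
        y[i] /= LU[i][i]
    return y


def interface_flexibility(K: Matrix, interface: Sequence[int]) -> Matrix:
    """Return F[r][c] = (K^-1)[interface[r]][interface[c]], i.e. (B K^-1 B^T)[r][c]."""
    LU = lu_factor(K)  # factor once, reuse for every solve
    n = len(K)
    cols = []
    for j in interface:
        e = [0.0] * n
        e[j] = 1.0
        col = lu_solve(LU, e)                     # column j of K^-1
        cols.append([col[i] for i in interface])  # keep only interface rows
    return [list(row) for row in zip(*cols)]      # column list -> row-major matrix


if __name__ == "__main__":
    K = [[2.0, -1.0, 0.0, 0.0], [-1.0, 2.0, -1.0, 0.0],
         [0.0, -1.0, 2.0, -1.0], [0.0, 0.0, -1.0, 2.0]]
    print(interface_flexibility(K, [0, 3]))  # 2x2 block of K^-1 at DOFs 0 and 3
```

For m interface DOFs out of n total, this costs one factorization plus m triangular solve pairs, rather than n solves for the full inverse.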

Study of Improved CNN Algorithm for Object Classification Machine Learning of Simple High Resolution Image (고해상도 단순 이미지의 객체 분류 학습모델 구현을 위한 개선된 CNN 알고리즘 연구)

  • Hyeopgeon Lee;Young-Woon Kim
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.1 / pp.41-49 / 2023
  • A convolutional neural network (CNN) is a representative algorithm for implementing artificial neural networks. CNNs address the rapid growth in computation and the low object classification rates associated with a conventional multi-layer fully connected neural network (FNN). However, with the rapid development of IT devices, the maximum resolution of images captured by current smartphone and tablet cameras has reached 108 megapixels (MP). As a result, a traditional CNN algorithm requires significant cost and time to learn and process simple, high-resolution images. Therefore, this study proposes an improved CNN algorithm for implementing an object classification learning model for simple, high-resolution images. The proposed method alters the adjacency matrix value of the max pooling operation in the CNN's pooling layer to reduce the creation time of the high-resolution image learning model. This study implemented learning models capable of processing 4, 8, and 12 MP high-resolution images for each altered matrix value. The performance evaluation showed that the creation time of the learning model implemented with the proposed algorithm decreased by 36.26% for 12 MP images. Compared to the conventional model, the differences in the proposed learning model's object recognition accuracy and loss rate were less than 1%, which is within the acceptable error range. Future studies should verify the approach in practice by implementing a learning model with more varied image types and a larger amount of image data than those used in this study.
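
Assuming that the "adjacency matrix value" refers to the size of the max pooling window (an interpretation, not confirmed by the abstract), the sketch below shows how that value controls the size of the pooled feature map. With stride equal to the window size, a k x k window shrinks each spatial dimension by a factor of k, which reduces the work done by every later layer. `max_pool2d` is a hypothetical pure-Python helper, not a framework API.

```python
"""2D max pooling with a configurable window, showing how the window size shrinks the output."""
from typing import List, Optional

Image = List[List[float]]


def max_pool2d(img: Image, size: int, stride: Optional[int] = None) -> Image:
    """Slide a size x size window (default stride = size) and keep each window's maximum."""
    stride = stride or size
    h, w = len(img), len(img[0])
    return [[max(img[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, w - size + 1, stride)]
            for r in range(0, h - size + 1, stride)]


if __name__ == "__main__":
    img = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 ramp 0..15
    print(max_pool2d(img, 2))  # [[5, 7], [13, 15]]
    print(max_pool2d(img, 4))  # [[15]]
```

Windows that do not fit completely at the right or bottom edge are dropped here, which corresponds to "valid" pooling without padding.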

A Study on Virtual Environment Platform for Autonomous Tower Crane (타워크레인 자율화를 위한 가상환경 플랫폼 개발에 관한 연구)

  • Kim, Myeongjun;Yoon, Inseok;Kim, Namkyoun;Park, Moonseo;Ahn, Changbum;Jung, Minhyuk
    • Korean Journal of Construction Engineering and Management / v.23 no.4 / pp.3-14 / 2022
  • Autonomous equipment requires a large amount of data from various environments. However, experiments at real construction sites take a lot of time and cost, which makes data collection and processing difficult. Therefore, this study aims to develop a virtual environment for developing and validating autonomous tower crane technology. The authors defined the automation functions and operating conditions of tower cranes using three performance criteria: operational design domain, object and event detection and response, and minimum functional conditions. This study then used the Unity game engine to develop a virtual environment for learning and validating autonomous functions such as recognition, decision making, and control. Validation was conducted by construction industry experts using fidelity, the representative metric for virtual environment assessment. The virtual environment platform developed in this study should make it possible to reduce the cost and time of data collection and technology development. It is also expected to contribute to autonomous operation not only of tower cranes but also of other construction equipment.

Comparison of Parallel Preconditioners for Solving Large Sparse Linear Systems on a Massively Parallel Machine (대형이산 행렬 시스템의 초대형병렬컴퓨터에서의 해법을 위한 병렬준비 행렬의 비교)

  • Ma, Sang-Baek
    • The Transactions of the Korea Information Processing Society / v.2 no.4 / pp.535-542 / 1995
  • In this paper, we present two preconditioners for solving large sparse linear systems arising from elliptic partial differential equations on massively parallel machines such as the CM-5. Most massively parallel machines rely heavily on message passing for interprocessor communication, but under current manufacturing standards the cost of communication is very high compared to that of floating-point arithmetic. We therefore need an algorithm that minimizes the amount of interprocessor communication on massively parallel machines. By conducting experiments on the CM-5, we show that the Block SOR (Successive Over-Relaxation) method coupled with the multi-coloring technique is one such preconditioner for massively parallel machines. We also implemented on the CM-5 the ADI (Alternating Direction Implicit) method, which has conventionally been one of the most powerful parallel preconditioners. Our experiments show that the Block SOR method coupled with the multi-coloring technique can yield a speedup with 50% efficiency over a range of 16 to 512 processors for a matrix of dimension 512x512. The ADI method, on the other hand, performs very poorly.
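
Multi-coloring is what makes SOR parallel-friendly. With a red-black (two-colour) ordering of the 5-point stencil, every point of one colour depends only on neighbours of the other colour. All red points can therefore be updated simultaneously, followed by all black points, which requires only one exchange of neighbouring values per colour. The serial sketch below uses point red-black SOR as a simpler stand-in for the paper's Block SOR (no blocks, no message passing). It solves -lap(u) = f on the unit square with zero Dirichlet boundaries, and the name `red_black_sor` and the default `omega` are hypothetical.

```python
"""Red-black (two-colour) SOR for the 5-point 2D Poisson problem -lap(u) = f, zero Dirichlet BCs."""
from typing import List, Tuple

Grid = List[List[float]]


def red_black_sor(f: Grid, h: float, omega: float = 1.5,
                  tol: float = 1e-10, max_iter: int = 10_000) -> Tuple[Grid, int]:
    """f is the n x n interior right-hand side; returns the (n+2) x (n+2) grid and sweep count."""
    n = len(f)
    u = [[0.0] * (n + 2) for _ in range(n + 2)]  # boundary rows/columns stay zero
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for colour in (0, 1):  # "red" points (i + j even) first, then "black"
            # Points of one colour only read neighbours of the other colour,
            # so the updates inside this double loop are order-independent.
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 != colour:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                           + u[i][j + 1] + h * h * f[i - 1][j - 1])
                    new = u[i][j] + omega * (gauss_seidel - u[i][j])  # over-relaxation
                    max_change = max(max_change, abs(new - u[i][j]))
                    u[i][j] = new
        if max_change < tol:
            return u, sweep
    return u, max_iter
```

On a distributed machine, each processor would own a subgrid and exchange only its boundary values of the opposite colour before each half-sweep. This is the communication-minimizing property the abstract relies on.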
