• Title/Summary/Keyword: Vectorization

Comparative Analysis of Vectorization Techniques in Electronic Medical Records Classification (의무 기록 문서 분류를 위한 자연어 처리에서 최적의 벡터화 방법에 대한 비교 분석)

  • Yoo, Sung Lim
    • Journal of Biomedical Engineering Research / v.43 no.2 / pp.109-115 / 2022
  • Purpose: Medical records classification using vectorization techniques plays an important role in natural language processing. The purpose of this study was to investigate the most appropriate vectorization technique for electronic medical records classification. Materials and methods: 403 electronic medical documents were extracted retrospectively and classified using the cosine similarity calculated by Scikit-learn (a Python module for machine learning) in Jupyter Notebook. Vectors for the medical documents were produced by three different vectorization techniques (TF-IDF, latent semantic analysis and Word2Vec), and the classification precision of each technique was evaluated. The Kruskal-Wallis test was used to determine whether there was a significant difference among the three vectorization techniques. Results: The 403 medical documents covered 41 different diseases, and the average number of documents per diagnosis was 9.83 (standard deviation = 3.46). The classification precisions of the three vectorization techniques were 0.78 (TF-IDF), 0.87 (LSA) and 0.79 (Word2Vec), and the difference among them was statistically significant. Conclusions: The results suggest that removing irrelevant information (LSA) is a more efficient vectorization technique than modifying the weights of vectorization models (TF-IDF, Word2Vec) for medical document classification.
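
A minimal sketch of the LSA-style pipeline the abstract describes (TF-IDF, truncated SVD, cosine-similarity classification) using Scikit-learn, as the paper does. The documents, labels, and the number of LSA components below are placeholders, not the study's 403 records or its actual settings.

```python
# Hedged sketch: TF-IDF -> truncated SVD (LSA) -> cosine-similarity classification.
# The documents and labels are placeholders, not the study's medical records.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

train_docs = ["chest pain and dyspnea", "fracture of left femur", "type 2 diabetes follow-up"]
train_labels = ["cardiology", "orthopedics", "endocrinology"]
query_doc = ["shortness of breath with chest discomfort"]

tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train_docs)
X_query = tfidf.transform(query_doc)

# LSA: project the TF-IDF vectors onto a low-rank space, discarding variation
# that is irrelevant to the classification task.
lsa = TruncatedSVD(n_components=2, random_state=0)
Z_train = lsa.fit_transform(X_train)
Z_query = lsa.transform(X_query)

# Assign the label of the most similar training document by cosine similarity.
sims = cosine_similarity(Z_query, Z_train)
print(train_labels[sims.argmax()])
```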

Performance Improvement of Cumulus Parameterization Code by Unicon Optimization Scheme (Unicon Optimization 기법을 이용한 적운모수화 코드 성능 향상)

  • Lee, Chang-Hyun;Kim, Min-gyu;Shin, Dae-Yeong;Cho, Ye-Rin;Yeom, Gi-Hun;Chung, Sung-Wook
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.2 / pp.124-133 / 2022
  • With the development of hardware technology and the advancement of numerical modeling methods, more precise weather forecasts can be carried out. In this paper, we propose a Unicon Optimization scheme combining Loop Vectorization, Dependency Vectorization, and Code Modernization to optimize, and improve the maintainability of, the Unicon source contained in SCAM, a simplified version of CESM, and we present the overall SCAM structure. We tested the Unicon Optimization scheme within the SCAM structure: compared to the existing source code, Loop Vectorization alone yielded a performance improvement of 3.086% and Dependency Vectorization alone 0.4572%, while Unicon Optimization, which applies all of the techniques, improved performance by 3.457% over the existing source code. This shows that the Unicon Optimization scheme proposed in this paper provides excellent performance.
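
A toy illustration of the loop-vectorization idea: rewriting an element-wise loop as whole-array operations. This uses NumPy rather than the Fortran SCAM/Unicon code the paper optimizes, and the update rule is invented purely for illustration.

```python
# Toy illustration of loop vectorization with NumPy; the update rule is a
# placeholder, not the actual cumulus-parameterization computation in Unicon.
import numpy as np

n = 100_000
temp = np.random.rand(n)
moisture = np.random.rand(n)

# Scalar-style loop: one grid column per iteration.
tend_loop = np.empty(n)
for i in range(n):
    tend_loop[i] = 0.5 * temp[i] + 0.1 * moisture[i]

# Vectorized form: the same arithmetic expressed on whole arrays, letting the
# runtime (or, in compiled languages, the compiler's SIMD units) process many
# elements per instruction.
tend_vec = 0.5 * temp + 0.1 * moisture

assert np.allclose(tend_loop, tend_vec)
```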

Analysis of Three Dimensional Positioning Accuracy of Vectorization Using UAV-Photogrammetry (무인항공사진측량을 이용한 벡터화의 3차원 위치정확도 분석)

  • Lee, Jae One;Kim, Doo Pyo
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.525-533 / 2019
  • There are two feature collection methods in digital mapping using UAV (Unmanned Aerial Vehicle) photogrammetry: vectorization and stereo plotting. In vectorization, planar information is extracted from orthomosaics and elevation values are obtained from a DSM (Digital Surface Model) or a DEM (Digital Elevation Model). However, the exact positional accuracy of 3D features such as ground facilities and buildings is very ambiguous, because the accuracy of vectorization results has mainly been analyzed using only check points placed on the ground. This study therefore reviews the feasibility of 3D spatial information acquisition and digital map production by vectorization, analyzing the corner point coordinates of different layers as well as check points. To this end, images were taken by a Phantom 4 (DJI) with a GSD (Ground Sample Distance) of 3.6 cm at an altitude of 90 m. The results indicate that the horizontal RMSE (Root Mean Square Error) of the vectorization method, calculated from residuals at the check points against the field survey results, is 0.045 cm; it is therefore possible to produce a 1:1,000-scale digital topographic (plane) map from ortho images. On the other hand, the three-dimensional accuracy of vectorization was 0.068~0.162 m in horizontal and 0.090~1.840 m in vertical RMSE. It is thus difficult to obtain 3D spatial information and produce a 1:1,000 digital map by vectorization, owing to the large errors in elevation.
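
For reference, the RMSE figures quoted above are computed from residuals between vectorized coordinates and field-surveyed check points. A small sketch with invented residuals; combining the per-axis values as the root of summed squares is one common convention, not necessarily the paper's.

```python
# Hedged sketch: RMSE of planimetric residuals at check points; the residual
# values are invented, not the study's measurements.
import numpy as np

dx = np.array([0.03, -0.05, 0.02, -0.04])  # easting residuals (m)
dy = np.array([-0.02, 0.04, -0.03, 0.05])  # northing residuals (m)

rmse_x = np.sqrt(np.mean(dx**2))
rmse_y = np.sqrt(np.mean(dy**2))
rmse_horizontal = np.sqrt(rmse_x**2 + rmse_y**2)  # one common combined measure
print(round(rmse_x, 3), round(rmse_y, 3), round(rmse_horizontal, 3))
```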

Vectorization of an Explicit Finite Element Method on Memory-to-Memory Type Vector Computer (Memory-to-Memory방식 벡터컴퓨터에서의 외연적 유한요소법의 벡터화)

  • 이지호;이재석
    • Computational Structural Engineering / v.4 no.1 / pp.95-108 / 1991
  • An explicit finite element method can be executed more rapidly and effectively on a vector computer than on a scalar computer because its structure is well suited to vector processing. In this paper, an efficient method for vectorizing an explicit finite element program on a memory-to-memory type vector computer is proposed. First, general vectorization methods that can be applied regardless of the vector architecture are investigated; then a method suited specifically to the memory-to-memory type vector computer is proposed. To illustrate the usefulness of the proposed vectorization method, DYNA3D, an existing explicit finite element program, was migrated to the HDS AS/XL V50, a memory-to-memory type vector computer. Performance results from actual tests show a vector/scalar speedup above 2.4.
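
A compact illustration of one issue such a vectorization must handle: assembling element contributions into nodal arrays without a serial data dependency. The sketch uses NumPy scatter-add as a stand-in for vector hardware; the 1D two-node connectivity and element forces are placeholders, not taken from DYNA3D.

```python
# Hedged sketch: element-loop vs. vectorized assembly of nodal forces for
# 1D two-node elements; connectivity and forces are placeholders.
import numpy as np

n_nodes, n_elem = 6, 5
conn = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])  # element -> node ids
f_elem = np.random.rand(n_elem, 2)   # per-element nodal force contributions

# Loop version: scatter element contributions node by node.
f_loop = np.zeros(n_nodes)
for e in range(n_elem):
    f_loop[conn[e, 0]] += f_elem[e, 0]
    f_loop[conn[e, 1]] += f_elem[e, 1]

# Vectorized scatter-add: np.add.at accumulates repeated node indices correctly,
# which is the data-dependency problem element-loop vectorization must resolve.
f_vec = np.zeros(n_nodes)
np.add.at(f_vec, conn.ravel(), f_elem.ravel())

assert np.allclose(f_loop, f_vec)
```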

The vectorization and recognition of circuit symbols for electronic circuit drawing management (전자회로 도면관리를 위한 벡터화와 회로 기호의 인식)

  • 백영묵;석종원;진성일;황찬식
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.3 / pp.176-185 / 1996
  • Transforming huge drawings into a format suitable for CAD systems and recognizing the contents of the drawings are the major concerns in the automated analysis of engineering drawings. This paper proposes methods for text/graphics separation, symbol extraction, vectorization, and symbol recognition, with the aim of applying them to electronic circuit drawings. We use the MBR (minimum bounding rectangle) and the size of isolated regions in the drawings to separate text and graphic regions. Characteristic parameters such as the number of pixels, the perimeter length, and the degree of roundness are used for extracting loop symbols, and geometric structures are used for non-loop symbols. To recognize symbols, nearest-neighbor matching between the FDs (Fourier descriptors) of extracted symbols and those of the classification reference symbols is used. Experimental results show that the proposed method generates a compact vector representation of the extracted symbols and can scale and rotate an extracted symbol using its vectorized form. It also enables efficient searching of circuit drawings.
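
A rough sketch of the recognition step described above: Fourier descriptors of a closed contour, normalized for scale invariance, matched to reference symbols by nearest neighbor. Contour extraction from drawings is omitted, and the contours and labels below are placeholders, not symbols from circuit drawings.

```python
# Hedged sketch: Fourier-descriptor matching by nearest neighbour; the contours
# are toy shapes, not symbols extracted from circuit drawings.
import numpy as np

def fourier_descriptor(contour_xy, n_coeff=8):
    """Magnitudes of low-order Fourier coefficients of a closed contour,
    normalized by the first harmonic for scale invariance."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex boundary signal
    coeffs = np.fft.fft(z)
    mags = np.abs(coeffs[1:n_coeff + 1])
    return mags / (mags[0] + 1e-12)

def classify(symbol_contour, reference_contours, labels):
    fd = fourier_descriptor(symbol_contour)
    dists = [np.linalg.norm(fd - fourier_descriptor(r)) for r in reference_contours]
    return labels[int(np.argmin(dists))]

# Toy usage: a scaled circle still matches the circle reference (scale invariance).
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.c_[np.cos(t), np.sin(t)]
square = np.c_[np.clip(2 * np.cos(t), -1, 1), np.clip(2 * np.sin(t), -1, 1)]
print(classify(3.0 * circle, [circle, square], ["loop", "box"]))
```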

The Vectorization of EOG for Man-Machine Interfacing (Man-Machine Interfacing을 위한 EOG의 벡터화)

  • Park, Jong-Hwan;Cheon, Woo-Young;Park, Hyung-Jun
    • Proceedings of the KIEE Conference / 1998.07b / pp.604-606 / 1998
  • As a basic study for Man-Machine interfacing technics, this paper purposed the vertorization of EOG(electrooculogram) that is generated by eye movement. EOG is electric potential difference between the positive potential of cornea and the negative potential of retina. The magnitude and the polarity are depend on the direction of eye movement and degree of gaze angle. In order to vectorize EOG, EOG signal is measured about vertical and horizontal movement of eyes. This vectorization of EOG is expected to help Man-Machine Interfacing technics and development of other useful equipment.
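
A small sketch of what vectorizing the two EOG channels might look like: combining calibrated horizontal and vertical potentials into a gaze direction and magnitude. The calibration gains and sample values are invented, not from the paper.

```python
# Hedged sketch: combine horizontal/vertical EOG samples into gaze vectors;
# the gains (degrees per microvolt) are invented calibration constants.
import numpy as np

gain_h, gain_v = 0.05, 0.04            # deg/uV, hypothetical calibration
eog_h = np.array([120.0, -80.0, 0.0])  # horizontal channel samples (uV)
eog_v = np.array([0.0, 60.0, -90.0])   # vertical channel samples (uV)

gaze = np.column_stack([eog_h * gain_h, eog_v * gain_v])   # (x, y) in degrees
angle = np.degrees(np.arctan2(gaze[:, 1], gaze[:, 0]))     # direction of movement
magnitude = np.hypot(gaze[:, 0], gaze[:, 1])               # gaze displacement
print(np.round(np.column_stack([angle, magnitude]), 2))
```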

A Simple Toeplitz Channel Matrix Decomposition with Vectorization Technique for Large scaled MIMO System (벡터화 기술을 이용한 대규모 MIMO 시스템의 간단한 Toeplitz 채널 행렬 분해)

  • Park, Ju Yong;Hanif, Mohammad Abu;Kim, Jeong Su;Song, Sang Seob;Lee, Moon Ho
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.9 / pp.21-29 / 2014
  • Due to the enormous number of users and limited memory space, memory saving has become an important issue for big data services. In a large-scale multiple-input multiple-output (MIMO) system, the Toeplitz channel can play a significant role in improving performance as well as power efficiency. In this paper, we propose a Toeplitz channel decomposition based on matrix vectorization. Here we model the channel of a large-scale MIMO system with a Toeplitz matrix, and we show that Toeplitz Jacket matrices can be decomposed into Cooley-Tukey sparse matrices, as in the fast Fourier transform (FFT).
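
A brief sketch of the two building blocks named in the abstract: constructing a Toeplitz channel matrix and vectorizing a matrix with the vec operator (column stacking). The channel taps are placeholders, and the paper's Jacket/Cooley-Tukey decomposition is not reproduced here.

```python
# Hedged sketch: Toeplitz channel matrix plus the vec() operator; the channel
# taps are placeholders, and the Jacket-matrix decomposition is not shown.
import numpy as np
from scipy.linalg import toeplitz

h = np.array([1.0, 0.5, 0.25, 0.0])          # hypothetical channel taps
H = toeplitz(h, np.r_[h[0], np.zeros(3)])    # Toeplitz channel matrix (lower triangular)

def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return A.reshape(-1, order="F")

X = np.arange(6.0).reshape(2, 3)
print(H)
print(vec(X))   # columns stacked: [0., 3., 1., 4., 2., 5.]
```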

Separation of Character Strings and High Quality Vectorization for Korean Cadastral Map (한국 지적도에서의 문자분리 및 고품질 벡터화)

  • Bang, Keuk-Joon;Hong, Dae-Sik
    • Journal of the Korean Institute of Telematics and Electronics S / v.36S no.2 / pp.63-68 / 1999
  • We propose a new method that simultaneously addresses the difficulty of separating character strings from interconnected lines and the distortions of vectorization at crossing points and junction points in digitized maps. After the image is thinned, the crossing points and junction points are detected together with their neighborhoods, which we call the uncertain areas. The broken lines are then reconnected and, at the same time, the character strings are separated. The proposed method is applied to Korean cadastral maps, which usually consist of straight lines and character strings. The experimental results show that the method is effective in separating the character strings and obtaining a high-quality vectorization of Korean cadastral maps.
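
A minimal sketch of the thinning step that precedes junction detection, using scikit-image, with junction pixels flagged as skeleton pixels having three or more neighbors. The tiny binary image is a toy placeholder for a scanned cadastral map, and this is only an analogue of the paper's method, not a reproduction of it.

```python
# Hedged sketch: thin a binary line drawing and flag junction pixels (3+ skeleton
# neighbours); the image is a toy placeholder, not a scanned cadastral map.
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

img = np.zeros((30, 30), dtype=bool)
img[14:17, :] = True          # horizontal stroke
img[:, 14:17] = True          # vertical stroke crossing it

skel = skeletonize(img)

# Count 8-connected neighbours of each skeleton pixel; >= 3 marks a crossing
# or junction point (an "uncertain area" in the paper's terminology).
neighbours = convolve(skel.astype(int), np.ones((3, 3), dtype=int), mode="constant") - skel
junctions = skel & (neighbours >= 3)
print(np.argwhere(junctions))
```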

Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport

  • Romano, Paul K.;Siegel, Andrew R.
    • Nuclear Engineering and Technology / v.49 no.6 / pp.1165-1171 / 2017
  • The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC were then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size; we observed that the bank size generally needs to be at least 20 times greater than the vector width to achieve a vector efficiency greater than 90%. When the execution times for events are allowed to vary, the vector speedup is also limited by differences in the execution times of the events being carried out in a single event-iteration.
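
An illustrative toy calculation, not the paper's model, of why the bank needs to be much larger than the vector width: as particles terminate, the last chunk in each event-iteration is only partially filled, so vector-lane utilization drops for small banks. The fixed per-iteration survival fraction below is an invented assumption.

```python
# Illustrative toy model (not the paper's): average vector-lane utilization when
# a shrinking particle bank is processed in chunks of the vector width.
import math

def lane_utilization(bank_size, vector_width, survival=0.7, iterations=30):
    """Fraction of vector lanes doing useful work, averaged over event-iterations,
    assuming a fixed per-iteration survival fraction (an invented assumption)."""
    n, used, lanes = bank_size, 0, 0
    for _ in range(iterations):
        if n == 0:
            break
        chunks = math.ceil(n / vector_width)
        used += n                          # lanes holding live particles
        lanes += chunks * vector_width     # lanes issued, including idle ones
        n = int(n * survival)              # particles surviving to the next event
    return used / lanes

for bank in (4, 8, 20, 100):   # bank size in multiples of the vector width
    print(bank, round(lane_utilization(bank * 16, 16), 3))
```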

Instability of the Betti Sequence for Persistent Homology and a Stabilized Version of the Betti Sequence

  • Johnson, Megan;Jung, Jae-Hun
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.25 no.4 / pp.296-311 / 2021
  • Topological Data Analysis (TDA), a relatively new field of data analysis, has proved very useful in a variety of applications. The main persistence tool from TDA is persistent homology, in which the structure of data is examined at many scales. Representations of persistent homology include persistence barcodes and persistence diagrams, both of which are not straightforward to reconcile with traditional machine learning algorithms, as they are sets of intervals or multisets. The problem of faithfully representing barcodes and persistence diagrams has been pursued along two main avenues: kernel methods and vectorizations. One vectorization is the Betti sequence, or Betti curve, derived from the persistence barcode. While the Betti sequence has been used in classification problems in various applications, to our knowledge the stability of the sequence has never before been discussed. In this paper we show that the Betti sequence is unstable under the 1-Wasserstein metric with regard to small perturbations in the barcode from which it is calculated. In addition, we propose a novel stabilized version of the Betti sequence based on the Gaussian smoothing seen in the Stable Persistence Bag of Words for persistent homology. We then introduce the normalized cumulative Betti sequence and provide numerical examples that support the main statement of the paper.
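
A short sketch of the Betti-sequence (Betti-curve) vectorization itself: at each filtration value on a grid, count how many barcode intervals are alive. A Gaussian-weighted variant is included only as a rough analogue of the stabilization idea, not the paper's exact construction; the barcode intervals are placeholders.

```python
# Hedged sketch: Betti curve from a persistence barcode, plus a Gaussian-weighted
# variant as a rough analogue of a stabilized sequence (not the paper's exact
# construction). The barcode intervals are placeholders.
import numpy as np

barcode = [(0.1, 0.9), (0.2, 0.5), (0.4, 1.2)]      # (birth, death) pairs
grid = np.linspace(0.0, 1.5, 16)                     # filtration values

# Betti curve: number of intervals alive at each grid point.
betti = np.array([sum(b <= t < d for b, d in barcode) for t in grid])

# Smoothed variant: each interval contributes a Gaussian bump centred at its
# midpoint and weighted by its persistence, damping the jumps that make the
# plain Betti curve unstable.
sigma = 0.1
smooth = np.zeros_like(grid)
for b, d in barcode:
    mid = 0.5 * (b + d)
    smooth += (d - b) * np.exp(-((grid - mid) ** 2) / (2 * sigma ** 2))

print(betti)
print(np.round(smooth, 3))
```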