• Title/Summary/Keyword: vectorization

INSTABILITY OF THE BETTI SEQUENCE FOR PERSISTENT HOMOLOGY AND A STABILIZED VERSION OF THE BETTI SEQUENCE

  • JOHNSON, MEGAN;JUNG, JAE-HUN
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.25 no.4 / pp.296-311 / 2021
  • Topological Data Analysis (TDA), a relatively new field of data analysis, has proved very useful in a variety of applications. The main tool from TDA is persistent homology, in which data structure is examined at many scales. Representations of persistent homology include persistence barcodes and persistence diagrams, neither of which is straightforward to reconcile with traditional machine learning algorithms, as they are sets of intervals or multisets. The problem of faithfully representing barcodes and persistence diagrams has been pursued along two main avenues: kernel methods and vectorizations. One vectorization is the Betti sequence, or Betti curve, derived from the persistence barcode. While the Betti sequence has been used in classification problems in various applications, to our knowledge, the stability of the sequence has never before been discussed. In this paper we show that the Betti sequence is unstable under the 1-Wasserstein metric with regard to small perturbations in the barcode from which it is calculated. In addition, we propose a novel stabilized version of the Betti sequence based on the Gaussian smoothing seen in the Stable Persistence Bag of Words for persistent homology. We then introduce the normalized cumulative Betti sequence and provide numerical examples that support the main statement of the paper.
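
As a rough illustration of the idea in this abstract, the sketch below computes a Betti curve from a barcode and a Gaussian-smoothed variant. It is a minimal sketch, not the authors' stabilized Betti sequence: the function names, the half-open interval convention, the grid, and the bandwidth `sigma` are all assumptions chosen for the example.

```python
import math

def betti_curve(barcode, grid):
    """Count the bars [birth, death) that contain each grid point t."""
    return [sum(1 for b, d in barcode if b <= t < d) for t in grid]

def smoothed_betti_curve(barcode, grid, sigma=0.1):
    """Illustrative smoothing (an assumption, not the paper's definition):
    each bar contributes the mass that a Gaussian centred at t, with
    standard deviation sigma, places on the interval [birth, death)."""
    def phi(x):
        # Cumulative distribution function of N(0, sigma^2).
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    return [sum(phi(d - t) - phi(b - t) for b, d in barcode) for t in grid]

# Hypothetical barcodes: the second adds one very short bar to the first.
grid = [i / 100 for i in range(101)]
base = [(0.0, 1.0)]
perturbed = [(0.0, 1.0), (0.5, 0.51)]

raw_gap = max(abs(x - y) for x, y in
              zip(betti_curve(base, grid), betti_curve(perturbed, grid)))
smooth_gap = max(abs(x - y) for x, y in
                 zip(smoothed_betti_curve(base, grid),
                     smoothed_betti_curve(perturbed, grid)))
print(raw_gap, smooth_gap)
```

With these toy inputs, the short extra bar changes the raw curve by a full unit at one grid point, while the smoothed curve changes by a much smaller amount that shrinks with the length of the bar.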

Direct Position Determination of Coherently Distributed Sources based on Compressed Sensing with a Moving Nested Array

  • Yankui, Zhang;Haiyun, Xu;Bin, Ba;Rong, Zong;Daming, Wang;Xiangzhi, Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.5 / pp.2454-2468 / 2019
  • Existing direct position determination (DPD) methods for coherently distributed (CD) sources are mostly designed for a uniform linear array (ULA), which yields a low degree of freedom (DOF) and makes effective positioning difficult in underdetermined conditions. In this paper, a novel DPD algorithm for coherently distributed sources based on compressed sensing with a moving nested array is presented. In this algorithm, the nested array is first introduced to DPD, and a positioning model for a single moving station based on the nested array is constructed. Exploiting the features of coherently distributed sources, the compressed sensing cost function is established based on vectorization. For convenience, the cost function is then converted into an unconstrained form and made convex. Finally, the position coordinates of the distributed sources are obtained using optimization theory. The complexity is also analyzed, and the simulation results show that, in comparison with two-step positioning algorithms and subspace-based algorithms, the proposed algorithm effectively solves the positioning problem in underdetermined conditions with the same number of physical elements.

Technology Development for Improving Animation Performance Based on Train Route Patterns (열차 경로 패턴기반 애니메이션 성능 개선 기술 개발)

  • Lee, Duk-Hee;Yang, Won-Mo;Kim, Yong-Il;Yang, Yun-Hee;Shin, Yong-Tae
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.5 / pp.136-146 / 2012
  • As the information technology used for simulation and virtual reality has developed, interest has grown in animation technologies that effectively deliver simulation results to users. Various efforts have been made to improve animation performance, such as playback quality and speed, input-output speed, and storage space reduction. However, earlier studies generally focused on frame-by-frame image compression. To significantly improve storage space and playback speed, animation data should be vectorized, and spatial and temporal duplication has to be removed. In this study, the animation data structure was fundamentally improved by establishing a hierarchy and applying vectorization. Spatial and temporal duplication of animation data was also removed through vectorization based on train routes. As a result, storage space was reduced, and input-output speed and playback speed were considerably improved. In testing, the additional patternization that followed vectorization reduced storage space by over 80% and quadrupled input-output speed. Patternization technology can serve as a suitable storage method for animation data and can provide user-specific animation with little data transmission.

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis (텍스트 분류 기반 기계학습의 정신과 진단 예측 적용)

  • Pak, Doohyun;Hwang, Mingyu;Lee, Minji;Woo, Sung-Il;Hahn, Sang-Woo;Lee, Yeon Jung;Hwang, Jaeuk
    • Korean Journal of Biological Psychiatry / v.27 no.1 / pp.18-26 / 2020
  • Objectives: The aim was to find effective vectorization and classification models to predict a psychiatric diagnosis from text-based medical records. Methods: Electronic medical records (n = 494) of present illness were collected retrospectively from inpatient admission notes with three diagnoses: major depressive disorder, type 1 bipolar disorder, and schizophrenia. The data were split into 400 training records and 94 independent validation records. The data were vectorized with two models, term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning models for classification, including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL), were applied to predict the three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Accuracy, precision, recall, and F1-score were measured to compare the models. Results: Five-fold cross-validation on the training data showed that the DL model with Doc2vec was the most effective at predicting the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics dropped on the independent test set with the final DL models (accuracy = 0.79, F1-score = 0.79), while logistic regression and the support vector machine with Doc2vec performed slightly better (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and the other models with TF-IDF. Conclusions: The current results suggest that the vectorization may have more impact on classification performance than the machine learning model. However, the data set had a number of limitations, including small sample size, class imbalance, and limited generalizability. In this regard, multi-site research with larger samples is needed to improve the machine learning models.
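
To make the TF-IDF step named in this abstract concrete, here is a minimal sketch in plain Python. It is not the study's pipeline: the toy notes are invented, tokenization is whitespace splitting, and the unsmoothed `log(N / df)` weighting is one common variant chosen as an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Term frequency is the count divided by the document length; inverse
    document frequency is log(N / df) without smoothing (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors

# Hypothetical toy notes, not taken from the study's records.
notes = [
    "low mood insomnia",
    "elevated mood insomnia",
    "auditory hallucinations",
]
vocab, vectors = tfidf_vectors(notes)
```

Terms that occur in fewer documents receive larger weights, so in the first toy note "low" outweighs "mood". The resulting vectors could then be passed to any of the classifiers the abstract lists.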

A Main Wall Recognition of Architectural Drawings using Dimension Extension Line (치수보조선을 이용한 도면의 주벽인식)

  • Kwon, Young-Bin
    • The KIPS Transactions:PartB / v.10B no.7 / pp.837-846 / 2003
  • This paper deals with floor plans in architectural drawings of apartments. Such drawings consist of main walls represented by two parallel bold lines, symbols (doors, windows, ...), dimension lines, extension lines, and dimensions given as numerical values and characters. This paper proposes a method for recognizing the main walls, which form the backbone of an apartment in an architectural drawing. The following modules are implemented: efficient image binarization, removal of thin lines, vectorization of detected lines, region bounding for main walls, calculation of extension lines, detection of main walls based on extension lines, and field expansion by searching for other main walls linked to the detected ones. Although the windows between main walls are not drawn as main walls, a window detection module is applied during recognition, so the windows are found as part of a main wall. Experiments on 9 different architectural drawings show 96.5% recognition of main walls and windows, about 5.8% higher than the method of Karl Tombre.

Research on the Table Vectorization in the Document Image (문서 영상 내의 테이블 벡터화 연구)

  • Kim, U-Seong;Sim, Jin-Bo;Park, Yong-Beom;Mun, Gyeong-Ae;Ji, Su-Yeong
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1147-1159 / 1996
  • In this paper, we develop an efficient algorithm that vectorizes table input for a mixed document recognition system. To recognize the characters in a table, characters and lines must be separated. Recognizing a table requires recognizing characters that are occluded by table lines and an efficient vectorization method for the table lines. We develop several methods for vectorizing tables. The first extracts the table line regions using 8-direction chain codes. The second extracts horizontal and vertical lines using a histogram of lines. The third extracts the diagonal lines of a table using the cross points of horizontal and vertical lines. Finally, we develop a table vectorization method that finds the regularity of the horizontal and vertical lines composing a table, and we suggest this regularity method for efficient table vectorization.
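
The histogram-based line extraction and the cross points mentioned in this abstract can be sketched as follows. This is one plausible reading, not the paper's algorithm: it assumes the "histogram of lines" is a row and column projection histogram of a binary image, and the function name and the `min_fraction` threshold are hypothetical.

```python
def find_table_lines(image, min_fraction=0.8):
    """Locate candidate table lines in a binary image (list of 0/1 rows).

    A row (column) is treated as a horizontal (vertical) line when its
    projection histogram covers at least min_fraction of the width (height).
    """
    height = len(image)
    width = len(image[0])
    # Projection histograms: number of foreground pixels per row and column.
    row_hist = [sum(row) for row in image]
    col_hist = [sum(image[y][x] for y in range(height)) for x in range(width)]
    rows = [y for y, count in enumerate(row_hist) if count >= min_fraction * width]
    cols = [x for x, count in enumerate(col_hist) if count >= min_fraction * height]
    # Cross points are foreground pixels where the detected lines intersect.
    crossings = [(y, x) for y in rows for x in cols if image[y][x]]
    return rows, cols, crossings

# Hypothetical 5x5 image with one horizontal line (row 2) and one vertical line (column 1).
image = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
rows, cols, crossings = find_table_lines(image)
```

A real document image would need binarization, tolerance for skew and line thickness, and separation of characters from lines, none of which this sketch attempts.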

Sentiment Analysis and Star Rating Prediction Based on Big Data Analysis of Online Reviews of Foreign Tourists Visiting Korea (방한 관광객의 온라인 리뷰에 대한 빅데이터 분석 기반의 감성분석 및 평점 예측모형)

  • Hong, Taeho
    • Knowledge Management Research / v.23 no.1 / pp.187-201 / 2022
  • Online reviews written by tourists provide important information for the management and operation of the tourism industry. The star rating of an online review is a simple quantitative evaluation of a product or service, but it does not readily reflect tourists' true attitudes. Star ratings and review content also do not always match. In this study, a star rating prediction model based on online review content was proposed to solve this discrepancy problem. We compared differences in star ratings and sentiment by continent through sentiment analysis of reviews of tourist attractions and hotels written by foreign tourists who visited Korea. Variables were selected through TF-IDF vectorization and the sentiment analysis results. Logit, artificial neural network, and SVM (Support Vector Machine) models were used for classification, and artificial neural network and SVR (Support Vector Regression) models were applied for rating prediction. The online review rating prediction model proposed in this study could resolve such inconsistencies and could also be applied when there is no star rating.

Clustering Meta Information of K-Pop Girl Groups Using Term Frequency-inverse Document Frequency Vectorization (단어-역문서 빈도 벡터화를 통한 한국 걸그룹의 음반 메타 정보 군집화)

  • JoonSeo Hyeon;JaeHyuk Cho
    • Journal of Platform Technology / v.11 no.3 / pp.12-23 / 2023
  • In the 2020s, the K-Pop market has been dominated by girl groups over boy groups and by the fourth generation over the third generation. This paper presents methods and results for lyric clustering to investigate whether the generation of girl groups has started to change. We collected meta-information for 1,469 songs by 47 groups released from 2013 to 2022, classified it into lyric information and non-lyric meta-information, and quantified each. The lyric information was preprocessed by applying term frequency-inverse document frequency (TF-IDF) vectorization based on previous studies and then selecting only the top vector values. The non-lyric meta-information was preprocessed with One-Hot Encoding to reduce the bias of using only lyric information and to improve the clustering results. On the preprocessed data, Spherical K-Means scored 129% higher on the Silhouette Score and 45% higher on the Calinski-Harabasz Score than Hierarchical Clustering. This paper is expected to contribute to the study of Korean popular song development and to girl group lyric analysis and clustering.
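
The two preprocessing steps described in this abstract, keeping only the top TF-IDF values and one-hot encoding the non-lyric metadata, might look roughly like this. It is a minimal sketch under assumptions: the helper names, the cutoff `k`, and the example metadata field are hypothetical and not taken from the paper.

```python
def keep_top_k(vector, k):
    """Zero every entry except the k largest (ties go to the lowest index)."""
    order = sorted(range(len(vector)), key=lambda i: -vector[i])
    keep = set(order[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(vector)]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the given categories."""
    return [1.0 if value == category else 0.0 for category in categories]

def song_features(lyric_tfidf, label, labels, k=2):
    """Concatenate the pruned lyric vector with the one-hot metadata vector."""
    return keep_top_k(lyric_tfidf, k) + one_hot(label, labels)

# Hypothetical song: three TF-IDF weights plus a release-period category.
features = song_features([0.1, 0.5, 0.3], "2020s", ["2010s", "2020s"], k=1)
```

Feature vectors built this way could then be unit-normalized and clustered, for example with a cosine-based method such as spherical k-means.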

Recognition of dimension lines based on extraction of the object in mechanical drawings (기계 도면에서 객체의 분리 추출에 기반한 치수선의 인식)

  • 정영수;박길흠
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.10 / pp.120-131 / 1997
  • This paper presents a new method that automatically recognizes dimension lines (consisting of shape lines, tail lines and extension lines) in mechanical drawings. In the proposed method, the object and closed-loop symbols are separated from the character-free drawings. The object lines and interpretation lines are then vectorized using several techniques such as thinning, line vectorization, and vector clustering. Finally, after recognizing arrowheads by pattern matching, we recognize dimension lines among the interpretation lines using each arrowhead's directional vector and centroid. By using geometric modeling and mathematical operations, the proposed method readily recognizes dimension lines in complex drawings. Experimental results are presented, obtained by applying the proposed method to drawings drawn in compliance with the KS drafting standard.

Automatic Geographical Entity Recognition and Modeling for Land Registered Map (지적도를 위한 자동지형객체 인식 및 모델링)

  • 유희종;정창성
    • Spatial Information Research / v.2 no.2 / pp.197-205 / 1994
  • In this paper, we present a vectorization algorithm for finding a vector image from a raster image of the land registered map which is used as the base map for various applications, and an automatic region creation algorithm for generating every region automatically from the vector image. We describe an ARM (automatic geographical entity recognition and modeling software) which carries out the recognition and processing of geographical entities automatically using those algorithms.