• Title/Summary/Keyword: Vector data model


The Use of MSVM and HMM for Sentence Alignment

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems / v.8 no.2 / pp.301-314 / 2012
  • In this paper, two new approaches to aligning English-Arabic sentences in bilingual parallel corpora, based on the Multi-Class Support Vector Machine (MSVM) and the Hidden Markov Model (HMM) classifiers, are presented. A feature vector is extracted from the text pair under consideration; this vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the MSVM and HMM, and another set was used for testing. The results of the MSVM and HMM outperform those of the length-based approach. Moreover, these new approaches are valid for any language pair and are quite flexible, since the feature vector may contain fewer, more, or different features than the ones used in the current research, such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts.
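
A minimal sketch (not the authors' implementation) of the kind of feature vector the abstract describes: a length ratio, a punctuation score, and a cognate score computed for one candidate sentence pair. The scoring functions are simplified stand-ins, and the classifier step is only indicated in a comment.

    import string

    def punctuation_score(src: str, tgt: str) -> float:
        """Ratio of punctuation counts between the two sentences (crude stand-in)."""
        src_p = sum(ch in string.punctuation for ch in src)
        tgt_p = sum(ch in string.punctuation for ch in tgt)
        return min(src_p, tgt_p) / max(src_p, tgt_p, 1)

    def cognate_score(src: str, tgt: str) -> float:
        """Crude stand-in: fraction of tokens shared verbatim (numbers, names, etc.)."""
        src_tokens, tgt_tokens = set(src.split()), set(tgt.split())
        return len(src_tokens & tgt_tokens) / max(len(src_tokens | tgt_tokens), 1)

    def feature_vector(src: str, tgt: str) -> list:
        """Length ratio, punctuation score, and cognate score for one candidate pair."""
        length_ratio = len(src) / max(len(tgt), 1)
        return [length_ratio, punctuation_score(src, tgt), cognate_score(src, tgt)]

    # Each candidate pair is mapped to such a vector and passed to a multi-class
    # classifier (e.g., sklearn.svm.SVC) trained on manually aligned pairs.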

Short-Term Wind Speed Forecast Based on Least Squares Support Vector Machine

  • Wang, Yanling;Zhou, Xing;Liang, Likai;Zhang, Mingjun;Zhang, Qiang;Niu, Zhiqiang
    • Journal of Information Processing Systems / v.14 no.6 / pp.1385-1397 / 2018
  • Many factors affect the wind speed, and its randomness leads to low prediction accuracy. To address this, this paper constructs a short-term forecasting model based on the least squares support vector machine (LSSVM) to forecast the wind speed. The basis of the model is support vector regression (SVR), which is used to capture the regression relationship between the historical and forecast wind speed data. To improve the forecast precision, the historical data are clustered so that records whose trend is similar to that of the forecast period can be filtered out. The filtered historical data are used as training samples for the SVR, and its parameters are optimized by particle swarm optimization (PSO). The forecasting model is tested on actual data, and its forecast precision exceeds industry standards, which proves the feasibility and reliability of the model.
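
A hedged sketch of the pipeline the abstract outlines: historical wind-speed windows are clustered, the cluster matching the current trend is kept, and a support vector regressor is fitted on it. Standard epsilon-SVR stands in for the least squares variant, grid search stands in for PSO, and the data and window length are hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(0)
    speeds = rng.gamma(2.0, 3.0, size=500)        # hypothetical hourly wind speeds
    window = 6

    # Build (window of past values -> next value) training pairs.
    X = np.array([speeds[i:i + window] for i in range(len(speeds) - window)])
    y = speeds[window:]

    # Cluster the historical windows and keep the cluster the latest window falls into.
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    current = speeds[-window:].reshape(1, -1)
    mask = km.labels_ == km.predict(current)[0]

    # Tune the regressor on the filtered samples (grid search used here instead of PSO).
    grid = GridSearchCV(SVR(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": ["scale", 0.1]}, cv=3)
    grid.fit(X[mask], y[mask])
    print("next-step forecast:", grid.predict(current)[0])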

Analysis of Multivariate Financial Time Series Using Cointegration : Case Study

  • Choi, M.S.;Park, J.A.;Hwang, S.Y.
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.73-80 / 2007
  • Cointegration (together with VARMA, vector ARMA) has proven useful for analyzing multivariate non-stationary data in the field of financial time series. It provides a linear combination of the non-stationary component series that turns out to be stationary; this linear combination is referred to as the long-term equilibrium between the component series. We consider two sets of Korean bivariate financial time series and illustrate cointegration analysis. Specifically, estimated VAR (vector AR) and VECM (vector error correction model) models are obtained, and the CV (cointegrating vector) is found for each data set.
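
A hedged sketch of the cointegration workflow described above, using statsmodels on simulated bivariate data rather than the Korean financial series. The fitted VECM exposes the cointegrating vector through its beta attribute.

    import numpy as np
    from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank

    rng = np.random.default_rng(1)
    n = 300
    common = np.cumsum(rng.normal(size=n))             # shared stochastic trend
    y1 = common + rng.normal(scale=0.5, size=n)        # two non-stationary series
    y2 = 0.8 * common + rng.normal(scale=0.5, size=n)
    data = np.column_stack([y1, y2])

    # Johansen-type test for the cointegration rank (expected to be 1 here).
    rank_test = select_coint_rank(data, det_order=0, k_ar_diff=1)
    print("estimated cointegration rank:", rank_test.rank)

    # Fit the vector error correction model and read off the cointegrating vector.
    res = VECM(data, k_ar_diff=1, coint_rank=1, deterministic="ci").fit()
    print("cointegrating vector (CV):", res.beta.ravel())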

How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods / v.29 no.1 / pp.41-51 / 2022
  • We forecast the US oil consumption level by taking advantage of Google Trends, the search volumes of specific terms that people search for on Google. We focus on whether a proper selection of Google Trends terms leads to an improvement in forecast performance for oil consumption. As forecast models, we consider the least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for the large vector autoregressive (VAR-L) model of Nicholson et al. (2017), which automatically select the Google Trends terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high-dimensional Google Trends data set to a low-dimensional one with the LASSO and VAR-L models produces better forecast performance for oil consumption than frequently used forecast models such as the autoregressive model, the autoregressive distributed lag model, and the vector error correction model.
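
A minimal sketch, on synthetic stand-ins, of the LASSO step the abstract describes: lagged search-volume predictors form a high-dimensional design matrix and LassoCV keeps only the useful term-lag columns. The VAR-L estimator of Nicholson et al. (2017) is not reproduced here, and the target series is simulated.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(2)
    n_obs, n_terms, n_lags = 120, 30, 3
    trends = rng.normal(size=(n_obs, n_terms))     # hypothetical search-volume series

    # Stack lags 1..n_lags of every trend term into one design matrix.
    X = np.hstack([trends[n_lags - k: n_obs - k] for k in range(1, n_lags + 1)])
    # Hypothetical consumption target driven by a few of those predictors.
    y = 0.6 * X[:, 0] - 0.4 * X[:, 5] + rng.normal(scale=0.2, size=X.shape[0])

    lasso = LassoCV(cv=5).fit(X, y)
    print("selected term-lag columns:", np.flatnonzero(lasso.coef_))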

Development of a Standard Vector Data Model for Interoperability of River-Geospatial Information (하천공간정보의 상호운용성을 위한 표준벡터데이터 모델 개발)

  • Shin, Hyung-Jin;Chae, Hyo-Sok;Lee, Eul-Rae
    • Journal of the Korean Association of Geographic Information Studies / v.17 no.2 / pp.44-58 / 2014
  • In this study, a standard vector data model was developed for the interoperability of river geospatial information, and its applicability was verified by applying the model to RIMGIS vector data for the Changnyeong-Hapcheon and Gangjung-Goryeong irrigation watersheds. Relevant ISO and OGC standards were analyzed and applied to establish the river geospatial data model standard, and the ERD was designed based on an analysis of data characteristics and relationships. Verification was carried out by converting RIMGIS point, line, and polygon data into the developed geospatial data model (GDM) and comparing the data layer by layer, checking the basic spatial data and attribute data for each record and each spatial vertex. The error in the conversion process was 0%, indicating no problem with the model. The geospatial data model presented in this study provides a new and consistent format for the storage and retrieval of river geospatial data from a connected database, and it is designed to facilitate integrated analysis of large data sets collected by multiple institutes.
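
A toy illustration, with hypothetical record structure and field names, of the layer-level verification the abstract mentions: every converted record's attributes and geometry vertices are compared against the source layer, and a conversion error rate is reported.

    def conversion_error_rate(source_layer, converted_layer):
        """Percentage of records whose attributes or vertex lists differ after conversion."""
        mismatches = 0
        for src, dst in zip(source_layer, converted_layer):
            same_attrs = src["attributes"] == dst["attributes"]
            same_geometry = src["vertices"] == dst["vertices"]
            if not (same_attrs and same_geometry):
                mismatches += 1
        return 100.0 * mismatches / max(len(source_layer), 1)

    # Hypothetical point-layer records before and after conversion to the standard model.
    rimgis_points = [{"attributes": {"id": 1, "name": "gauge"}, "vertices": [(127.29, 35.55)]}]
    gdm_points    = [{"attributes": {"id": 1, "name": "gauge"}, "vertices": [(127.29, 35.55)]}]
    print(conversion_error_rate(rimgis_points, gdm_points), "% conversion error")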

The Study On the Effectiveness of Information Retrieval in the Vector Space Model and the Neural Network Inductive Learning Model

  • Kim, Seong-Hee
    • The Journal of Information Technology and Database / v.3 no.2 / pp.75-96 / 1996
  • This study compares the effectiveness of a neural network inductive learning model with that of a vector space model in information retrieval. As a result, searches responding to incomplete queries in the neural network inductive learning model produced higher precision and recall than searches responding to complete queries in the vector space model. The results show that a hybrid methodology integrating an inductive learning technique with the neural network model can help solve information retrieval problems caused by inconsistent indexing and incomplete queries, problems that have long plagued retrieval effectiveness.
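
A minimal sketch of the vector space model side of the comparison: TF-IDF document vectors ranked by cosine similarity against a short, possibly incomplete query. The neural network inductive learning model from the study is not shown; the documents and query are made up.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["inductive learning for information retrieval",
            "the vector space model ranks documents by cosine similarity",
            "neural networks can cope with inconsistent indexing"]

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)

    query_vector = vectorizer.transform(["vector space retrieval"])   # incomplete query
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    for i in scores.argsort()[::-1]:
        print(round(float(scores[i]), 3), docs[i])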

Fuzzy Semiparametric Support Vector Regression for Seasonal Time Series Analysis

  • Shim, Joo-Yong;Hwang, Chang-Ha;Hong, Dug-Hun
    • Communications for Statistical Applications and Methods / v.16 no.2 / pp.335-348 / 2009
  • Fuzzy regression is used as a complement or an alternative for representing the relation between variables among forecasting models, especially when the data are insufficient to evaluate the relation. This often occurs with seasonal time series data, which require a large amount of data to describe the underlying pattern. A semiparametric model is a useful tool when domain knowledge exists about the function to be estimated or when emphasis is placed on the interpretability of the model. In this paper we propose fuzzy semiparametric support vector regression, which provides good forecasting performance on seasonal time series by incorporating into fuzzy support vector regression basis functions that capture the seasonal variation of the series. To demonstrate the performance of this method, we present two examples of predicting seasonal time series. Experimental results show that the proposed method is very attractive for seasonal time series in fuzzy environments.
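
A hedged sketch of the core idea only: seasonal basis functions (sine and cosine terms for an assumed 12-period cycle) are added to the inputs of a support vector regressor. The fuzzy and semiparametric machinery of the paper is not reproduced, and the series is simulated.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(3)
    t = np.arange(120)
    y = 10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=t.size)

    # Trend input plus seasonal basis functions as extra columns.
    X = np.column_stack([t,
                         np.sin(2 * np.pi * t / 12),
                         np.cos(2 * np.pi * t / 12)])

    model = SVR(kernel="rbf", C=10.0).fit(X[:100], y[:100])
    holdout_error = np.mean(np.abs(model.predict(X[100:]) - y[100:]))
    print("mean absolute error on the last 20 points:", float(holdout_error))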

Issues Related to the Use of Time Series in Model Building and Analysis: Review Article

  • Wei, William W.S.
    • Communications for Statistical Applications and Methods / v.22 no.3 / pp.209-222 / 2015
  • Time series are used in many studies for model building and analysis, and we must be careful to understand the kind of time series data used in the analysis. In this review article, we begin with some issues related to the use of aggregate and systematically sampled time series. Since several time series are often used to study the relationships among variables, we also consider vector time series modeling and analysis. Although the basic model-building procedures for univariate and vector time series are the same, some important phenomena are unique to vector time series, so we also discuss issues specific to vector time series models. Understanding these issues is important whenever time series data are used in modeling and analysis, regardless of whether the series is univariate or multivariate.
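
A small illustration of the distinction the review raises between aggregate and systematically sampled series: a monthly flow variable is summed to quarters, whereas a monthly stock variable is sampled by taking every third observation. The series here is synthetic.

    import numpy as np
    import pandas as pd

    idx = pd.date_range("2020-01-31", periods=24, freq="M")
    monthly = pd.Series(np.random.default_rng(4).normal(100, 5, size=24), index=idx)

    aggregated = monthly.resample("Q").sum()   # temporal aggregation (flow variable)
    systematic = monthly.iloc[2::3]            # systematic sampling (stock variable)
    print(aggregated.head(), systematic.head(), sep="\n")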

Estimating Software Development Cost using Support Vector Regression (Support Vector Regression을 이용한 소프트웨어 개발비 예측)

  • Park, Chan-Kyoo
    • Korean Management Science Review / v.23 no.2 / pp.75-91 / 2006
  • The purpose of this paper is to propose a new software development cost estimation method using SVR (Support Vector Regression). SVR, one of the machine learning techniques, has been attracting much attention for its theoretical clarity and good performance compared with other machine learning techniques. This paper may be the first study in which SVR is applied to the field of software cost estimation. To derive the new method, we analyze historical cost data, including both well-known overseas and domestic software projects, and define the cost drivers affecting software cost. Then, the SVR model is trained on the historical data and its estimation accuracy is compared with that of a linear regression model. Experimental results show that the SVR model produces more accurate predictions than the linear regression model.
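
A minimal sketch, on synthetic data, of the comparison the abstract reports: an SVR model against ordinary linear regression on project cost drivers. The drivers, their number, and the error measure are hypothetical stand-ins for the historical project data used in the paper.

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(5)
    drivers = rng.normal(size=(200, 6))        # e.g., size, complexity, team factors
    cost = 50 + drivers @ np.array([8, 5, 3, 2, 1, 0.5]) + rng.normal(scale=4, size=200)

    X_tr, X_te, y_tr, y_te = train_test_split(drivers, cost, random_state=0)
    svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0)).fit(X_tr, y_tr)
    ols = LinearRegression().fit(X_tr, y_tr)

    print("SVR MAE:", mean_absolute_error(y_te, svr.predict(X_te)))
    print("OLS MAE:", mean_absolute_error(y_te, ols.predict(X_te)))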

The Compression of Normal Vectors to Prevent Visual Distortion in Shading 3D Mesh Models (3D 메쉬 모델의 쉐이딩 시 시각적 왜곡을 방지하는 법선 벡터 압축에 관한 연구)

  • Mun, Hyun-Sik;Jeong, Chae-Bong;Kim, Jay-Jung
    • Korean Journal of Computational Design and Engineering / v.13 no.1 / pp.1-7 / 2008
  • Data compression is an increasingly important issue for reducing data storage space as well as transmission time in network environments. In 3D geometric models, the normal vectors of faces or meshes take up a major portion of the data, so compressing these vectors, which involves a trade-off between image distortion and compression ratio, plays a key role in reducing the size of the models. It is therefore important both to raise the compression ratio of the normal vectors and to minimize the visual distortion of the shaded shape model after compression. Recent papers show that normal vector compression is useful for increasing the compression ratio and improving memory efficiency, but studies of the shading distortion caused by compressing normal vectors are relatively rare. In this paper, a new normal vector compression method is proposed that clusters the normal vectors, assigns a Representative Normal Vector (RNV) to each cluster, and uses the angular deviation from the actual normal vectors. Using this method, a Visually Undistinguishable Lossy Compression (VULC) algorithm has been developed, in which the distortion of the shaded shape model caused by the angular deviation of the normal vectors cannot be identified visually. Applied to complicated shape models, the algorithm proved effective.
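
A hedged sketch of the clustering step the abstract describes: unit face normals are grouped, each cluster is replaced by a representative normal vector (RNV), and the angular deviation between every original normal and its RNV is checked against a visual-distortion bound. The normals, cluster count, and threshold are hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(6)
    normals = rng.normal(size=(1000, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)   # unit face normals

    km = KMeans(n_clusters=64, n_init=10, random_state=0).fit(normals)
    rnv = km.cluster_centers_ / np.linalg.norm(km.cluster_centers_, axis=1, keepdims=True)

    # Angular deviation (degrees) between each normal and its cluster's RNV.
    cosines = np.clip(np.sum(normals * rnv[km.labels_], axis=1), -1.0, 1.0)
    deviation = np.degrees(np.arccos(cosines))

    threshold_deg = 5.0                                          # hypothetical visual bound
    print("max deviation:", float(deviation.max()),
          "within bound:", bool((deviation <= threshold_deg).all()))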