• Title/Abstract/Keywords: Vector data model


The Use of MSVM and HMM for Sentence Alignment

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems, Vol. 8, No. 2, pp. 301-314, 2012
  • In this paper, two new approaches to aligning English-Arabic sentences in bilingual parallel corpora, based on Multi-Class Support Vector Machine (MSVM) and Hidden Markov Model (HMM) classifiers, are presented. A feature vector is extracted from each candidate text pair; it contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the MSVM and the HMM, and a separate data set was used for testing. The MSVM and HMM outperform the length-based approach. Moreover, these new approaches are applicable to any language pair and are quite flexible, since the feature vector may contain fewer, more, or different features than those used in the current research, such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts.
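
The abstract names three features (length, punctuation score, cognate score) without defining them. The sketch below assumes simple, hypothetical definitions: a character-length ratio, the overlap of punctuation marks, and the share of tokens that appear verbatim in both texts (numbers, Latin-script names). These are illustrative stand-ins, not the paper's formulas.

```python
import string
from collections import Counter

# Assumption: only ASCII punctuation is counted; Arabic marks such as the
# Arabic comma would have to be mapped to ASCII equivalents beforehand.
PUNCT = set(string.punctuation)


def length_ratio(src: str, tgt: str) -> float:
    """Shorter length divided by longer length, in characters (1.0 = equal)."""
    a, b = len(src), len(tgt)
    return 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)


def punctuation_score(src: str, tgt: str) -> float:
    """Punctuation marks shared by both texts (multiset overlap) over the larger count."""
    ps = Counter(c for c in src if c in PUNCT)
    pt = Counter(c for c in tgt if c in PUNCT)
    total = max(sum(ps.values()), sum(pt.values()))
    return 1.0 if total == 0 else sum((ps & pt).values()) / total


def _tokens(text: str) -> set:
    """Whitespace tokens with surrounding punctuation stripped."""
    return {w.strip(string.punctuation) for w in text.split()} - {""}


def cognate_score(src: str, tgt: str) -> float:
    """Share of tokens that appear verbatim in both texts (hypothetical cognate proxy)."""
    ts, tt = _tokens(src), _tokens(tgt)
    if not ts or not tt:
        return 0.0
    return len(ts & tt) / max(len(ts), len(tt))


def feature_vector(src: str, tgt: str) -> list:
    """Hypothetical three-feature vector for one candidate sentence pair."""
    return [length_ratio(src, tgt), punctuation_score(src, tgt), cognate_score(src, tgt)]
```

Each pair's vector would then be labeled with an alignment type (for example 1-1 or 2-1) and passed to a classifier such as an MSVM; the classifier itself is not shown.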

Short-Term Wind Speed Forecast Based on Least Squares Support Vector Machine

  • Wang, Yanling;Zhou, Xing;Liang, Likai;Zhang, Mingjun;Zhang, Qiang;Niu, Zhiqiang
    • Journal of Information Processing Systems, Vol. 14, No. 6, pp. 1385-1397, 2018
  • Many factors affect wind speed, and its randomness leads to low prediction accuracy. To address this, this paper constructs a short-term forecasting model based on least squares support vector machines (LSSVM) to forecast wind speed. The model is built on support vector regression (SVR), which is used to compute the regression relationship between historical wind speed data and the data to be forecast. To improve forecast precision, the historical data is grouped by cluster analysis so that historical data whose trend is similar to that of the forecasting data can be selected. The selected historical data is used as the training samples for SVR, and the parameters are optimized by particle swarm optimization (PSO). The forecasting model was tested on actual data, and its forecast precision exceeds the industry standards. The results demonstrate the feasibility and reliability of the model.
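
The abstract does not spell out the LSSVM formulation. The sketch below uses the standard LS-SVM regression system of Suykens et al., in which training reduces to one linear system instead of a quadratic program, with an RBF kernel. The fixed hyperparameters `gamma` and `sigma` stand in for the paper's PSO tuning, and the cluster-based selection of training samples is omitted; `make_windows` is a hypothetical helper that turns a series into lagged input/target pairs.

```python
import math


def rbf(x, z, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))


def solve(A, rhs):
    """Solve A v = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]; return a predictor."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the least-squares loss
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]
    return lambda x: sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X)) + b


def make_windows(series, lags):
    """Turn a series into (previous `lags` values, next value) training pairs."""
    X = [series[i:i + lags] for i in range(len(series) - lags)]
    y = [series[i + lags] for i in range(len(series) - lags)]
    return X, y
```

For a wind speed series, `X, y = make_windows(speeds, 3)` followed by `lssvm_fit(X, y)` gives a one-step-ahead predictor from the last three observations.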

Analysis of Multivariate Financial Time Series Using Cointegration: Case Study

  • Choi, M.S.;Park, J.A.;Hwang, S.Y.
    • Journal of the Korean Data and Information Science Society, Vol. 18, No. 1, pp. 73-80, 2007
  • Cointegration (together with VARMA, vector ARMA) has proven useful for analyzing multivariate non-stationary data in financial time series. It provides a linear combination of non-stationary component series that turns out to be stationary. This linear combination is interpreted as the long-term equilibrium between the component series. We consider two sets of Korean bivariate financial time series and illustrate cointegration analysis on them. Specifically, a VAR (vector AR) model and a VECM (vector error correction model) are estimated, and a CV (cointegrating vector) is found for each data set.
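
To make the idea of a stationary linear combination concrete, the sketch below uses the two-step Engle-Granger procedure, a simpler technique than the VAR/VECM estimation the abstract describes: regress one series on the other to estimate the cointegrating vector (1, -b), then check whether the residual (the equilibrium error) mean-reverts via a Dickey-Fuller-style regression. The simulated data are hypothetical, and a real test would compare the statistic against Engle-Granger critical values rather than the informal `rho` shown here.

```python
import random


def ols(x, y):
    """Intercept and slope of the least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def engle_granger(x, y):
    """Step 1: cointegrating regression; step 2: DF-style slope on the residuals."""
    a, b = ols(x, y)
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Regress delta e_t on e_{t-1} without intercept; a strongly negative rho
    # suggests the equilibrium error is mean-reverting (stationary).
    lag = e[:-1]
    diff = [e[t] - e[t - 1] for t in range(1, len(e))]
    rho = sum(l * d for l, d in zip(lag, diff)) / sum(l * l for l in lag)
    return a, b, rho


# Hypothetical data: x is a random walk, y = 1 + 2x + stationary noise,
# so (1, -2) is the true cointegrating vector.
rng = random.Random(0)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]
a, b, rho = engle_granger(x, y)
```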


How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods, Vol. 29, No. 1, pp. 41-51, 2022
  • We forecast the US oil consumption level by taking advantage of Google Trends, the search volumes of specific search terms on Google. We focus on whether proper selection of Google Trends terms improves forecast performance for oil consumption. As forecast models, we consider least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for large vector autoregressive (VAR-L) models of Nicholson et al. (2017), which automatically select the Google Trends terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high-dimensional Google Trends data set to a low-dimensional one with the LASSO and VAR-L models produces better forecast performance for oil consumption than frequently used forecast models such as the autoregressive model, the autoregressive distributed lag model, and the vector error correction model.
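
The automatic term selection described above comes from the L1 penalty setting some coefficients exactly to zero. The sketch below shows plain LASSO fitted by cyclic coordinate descent with soft-thresholding, minimizing (1/2n)||y - X beta||^2 + lam * ||beta||_1; it is not the structured VAR-L penalty of Nicholson et al. It assumes centered columns and no intercept. In the forecasting setting, columns would be lagged Google Trends terms and lagged oil consumption, and a zero coefficient means that term or lag is dropped.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO coefficients by cyclic coordinate descent (centered data, no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue
            # Correlation of column j with the partial residual that excludes it.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta
```

With orthogonal columns, the update reduces to soft-thresholding each column's least-squares coefficient, which is why irrelevant predictors end up at exactly zero rather than merely small.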

Development of a Standard Vector Data Model for Interoperability of River-Geospatial Information

  • 신형진;채효석;이을래
    • Journal of the Korean Association of Geographic Information Studies (한국지리정보학회지), Vol. 17, No. 2, pp. 44-58, 2014
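  • In this study, a standard vector data model was developed for the interoperability of river geospatial information, and its applicability was evaluated by applying it to the RIMGIS vector data of the Gangjeong-Goryeong Weir and Changnyeong-Hapcheon Weir basins. The standards of the International Organization for Standardization (ISO) and the Open Geospatial Consortium (OGC) were surveyed and analyzed, and the specification of the river geospatial data model was established in conformance with them. An entity-relationship diagram (ERD) was designed based on an analysis of data attributes and relationships. For validation, the RIMGIS point, line, and polygon vector data loaded into the developed GDM were compared layer by layer, and the basic spatial information and attribute information of every record were compared exhaustively. The error rate during conversion was 0%, so the model was judged to have no problems. The standard data model for river geospatial information is intended to provide a consistent format for storing river geospatial data in a relational database, designed to facilitate integrated analysis of large data sets collected by many researchers and institutions, and for retrieving that data.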
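
The validation step above (an exhaustive, layer-by-layer comparison of geometry and attributes, reported as a conversion error rate) can be sketched as follows. The data layout is hypothetical: each layer maps a feature id to a dict with `geometry` and `attributes`, and geometries are compared exactly, whereas real coordinate data may need a tolerance.

```python
def layer_error_rate(source, converted):
    """Percentage of features whose geometry or attributes differ after conversion.

    Both arguments map a feature id to {"geometry": ..., "attributes": {...}}.
    A feature missing from `converted` counts as an error.
    """
    if not source:
        return 0.0
    errors = 0
    for fid, feat in source.items():
        out = converted.get(fid)
        if (out is None
                or out["geometry"] != feat["geometry"]
                or out["attributes"] != feat["attributes"]):
            errors += 1
    return 100.0 * errors / len(source)


def validate(source_layers, converted_layers):
    """Per-layer error rates, e.g. for point, line, and polygon layers."""
    return {name: layer_error_rate(feats, converted_layers.get(name, {}))
            for name, feats in source_layers.items()}
```

A result of 0.0 for every layer corresponds to the 0% conversion error reported in the abstract.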

The Study On the Effectiveness of Information Retrieval in the Vector Space Model and the Neural Network Inductive Learning Model

  • Kim, Seong-Hee
    • Information Technology and Database Journal (정보기술과데이타베이스저널), Vol. 3, No. 2, pp. 75-96, 1996
  • This study compares the effectiveness of the neural network inductive learning model with that of the vector space model in information retrieval. Searches with incomplete queries in the neural network inductive learning model achieved higher precision and recall than searches with complete queries in the vector space model. The results show that the hybrid methodology of integrating an inductive learning technique with the neural network model can help solve information retrieval problems caused by inconsistent indexing and incomplete queries, problems that have plagued information retrieval effectiveness.
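
For reference, the vector space model that serves as the baseline ranks documents by the cosine similarity between term-weight vectors of the query and each document. The sketch below assumes TF-IDF weighting and whitespace tokenization, which are common choices but are not stated in the abstract.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for each document, plus the IDF table."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def rank(query, docs):
    """Document indices sorted by decreasing cosine similarity to the query."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms absent from the collection get weight 0 and cannot match.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [cosine(q, d) for d in vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Because matching depends only on shared terms, a query that omits key terms (an incomplete query) or documents indexed with different vocabulary score zero. This is the weakness the abstract attributes to inconsistent indexing and incomplete queries.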


Fuzzy Semiparametric Support Vector Regression for Seasonal Time Series Analysis

  • Shim, Joo-Yong;Hwang, Chang-Ha;Hong, Dug-Hun
    • Communications for Statistical Applications and Methods, Vol. 16, No. 2, pp. 335-348, 2009
  • Among forecasting models, fuzzy regression is used as a complement or alternative for representing the relationship between variables, especially when the data are insufficient to evaluate that relationship. This situation often arises with seasonal time series data, which require a large amount of data to describe the underlying pattern. A semiparametric model is a useful tool when domain knowledge exists about the function to be estimated or when emphasis is placed on the interpretability of the model. In this paper we propose fuzzy semiparametric support vector regression, which incorporates into fuzzy support vector regression basis functions that capture the seasonal variation of the time series, so that it performs well in forecasting seasonal time series. To demonstrate the performance of this method, we present two examples of predicting seasonal time series. Experimental results show that the proposed method is very attractive for seasonal time series in fuzzy environments.
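
The abstract says seasonal variation enters the model through basis functions but does not say which ones. A common choice, assumed here, is a set of Fourier (cosine/sine) terms with period `period`. The sketch builds those basis values and, for a series spanning whole periods, estimates their coefficients by orthogonal projection. This covers only the parametric seasonal part; the fuzzy support vector regression component of the proposed method is not reproduced.

```python
import math


def seasonal_basis(t, period, n_harmonics):
    """Basis values [cos(2*pi*k*t/s), sin(2*pi*k*t/s)] for k = 1..K at time t."""
    row = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k * t / period
        row += [math.cos(w), math.sin(w)]
    return row


def fit_seasonal(y, period, n_harmonics):
    """Least-squares coefficients of the seasonal basis.

    Valid when len(y) is a whole number of periods and n_harmonics < period / 2:
    the basis columns are then mutually orthogonal (and orthogonal to a constant),
    each with squared norm n/2, so each coefficient is a simple projection.
    """
    n = len(y)
    if n % period or 2 * n_harmonics >= period:
        raise ValueError("need whole periods and n_harmonics < period / 2")
    coefs = [0.0] * (2 * n_harmonics)
    for t, yt in enumerate(y):
        for j, phi in enumerate(seasonal_basis(t, period, n_harmonics)):
            coefs[j] += 2.0 / n * yt * phi
    return coefs
```

For monthly data, `period=12`; the fitted seasonal component can then be subtracted before, or modeled alongside, a nonparametric regression of the remainder.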

Issues Related to the Use of Time Series in Model Building and Analysis: Review Article

  • Wei, William W.S.
    • Communications for Statistical Applications and Methods, Vol. 22, No. 3, pp. 209-222, 2015
  • Time series are used in many studies for model building and analysis, and we must be careful to understand the kind of time series data used. In this review article, we begin with some issues related to the use of aggregate and systematically sampled time series. Since several time series are often used to study the relationships among variables, we also consider vector time series modeling and analysis. Although the basic model-building procedures are the same for univariate and vector time series, some important phenomena are unique to vector time series, so we also discuss issues related to vector time series models. Understanding these issues is important whenever time series data are used in modeling and analysis, whether univariate or multivariate.
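
The two kinds of derived series mentioned above arise from different operations on a basic series. Temporal aggregation sums non-overlapping blocks of m values (typical for flow variables such as monthly sales turned into quarterly sales). Systematic sampling keeps every m-th value (typical for stock variables such as an end-of-quarter price). A minimal sketch, assuming an incomplete final block is simply dropped:

```python
def aggregate(x, m):
    """Temporal aggregation: sums of non-overlapping blocks of m consecutive values."""
    return [sum(x[i:i + m]) for i in range(0, len(x) - m + 1, m)]


def systematic_sample(x, m):
    """Systematic sampling: the last value of each block of m (every m-th value)."""
    return x[m - 1::m]
```

With m = 3, twelve monthly values become four quarterly values either way, but the resulting series generally have different autocorrelation structures, which is why the kind of series used in an analysis matters.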

Estimating Software Development Cost using Support Vector Regression

  • 박찬규
    • Korean Management Science Review (경영과학), Vol. 23, No. 2, pp. 75-91, 2006
  • The purpose of this paper is to propose a new software development cost estimation method using SVR (Support Vector Regression). SVR, a machine learning technique, has attracted much attention for its theoretical clarity and good performance compared with other machine learning techniques. This may be the first study to apply SVR to software cost estimation. To derive the new method, we analyze historical cost data from well-known overseas and domestic software projects and define the cost drivers that affect software cost. The SVR model is then trained on the historical data, and its estimation accuracy is compared with that of a linear regression model. Experimental results show that the SVR model produces more accurate predictions than the linear regression model.
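
One concrete difference between the two models compared above is the loss each one minimizes. Linear regression minimizes squared error, while SVR uses the epsilon-insensitive loss, which ignores errors inside a tube of width eps and penalizes larger errors only linearly. The sketch compares the two on hypothetical actual/predicted effort values; the `eps` value is an assumption, not the paper's setting.

```python
def squared_loss(residual: float) -> float:
    """Loss minimized by ordinary least-squares linear regression."""
    return residual ** 2


def eps_insensitive_loss(residual: float, eps: float) -> float:
    """SVR loss: errors inside the +/- eps tube cost nothing, larger ones grow linearly."""
    return max(0.0, abs(residual) - eps)


def mean_loss(actual, predicted, loss):
    """Average loss over paired actual/predicted values (e.g. effort per project)."""
    return sum(loss(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For example, a project whose prediction is off by 10 contributes 100 to the squared loss but only 9 to the epsilon-insensitive loss with `eps = 1`, so a few outlying projects pull the SVR fit less strongly.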

The Compression of Normal Vectors to Prevent Visual Distortion in Shading 3D Mesh Models

  • 문현식;정채봉;김재정
    • Korean Journal of Computational Design and Engineering (한국CDE학회논문집), Vol. 13, No. 1, pp. 1-7, 2008
  • Data compression is increasingly important for reducing both storage space and transmission time in network environments. In 3D geometric models, the normal vectors of faces or meshes account for a major portion of the data, so compressing them, which involves a trade-off between image distortion and compression ratio, plays a key role in reducing model size. It is therefore important both to raise the compression ratio of normal vectors and to minimize the visual distortion in the shading of the shape model after compression. Recent papers report that normal vector compression is useful for increasing the compression ratio and improving memory efficiency, but relatively few studies address the shading distortion caused by normal vector compression. In this paper, a new normal vector compression method is proposed that clusters normal vectors, assigns a Representative Normal Vector (RNV) to each cluster, and uses the angular deviation from the actual normal vector. Using this method, a Visually Undistinguishable Lossy Compression (VULC) algorithm was developed, in which shading distortion caused by the angular deviation of the normal vectors cannot be identified visually. Applied to complicated shape models, the algorithm proved effective.
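
The core of the scheme described above is replacing each normal with a short index to its Representative Normal Vector and measuring the angular deviation this introduces. The sketch skips the clustering step and assumes the RNVs are already given; the six axis-aligned vectors used in the usage example are a hypothetical codebook. Each normal maps to the RNV with the smallest angle, and the worst-case deviation is reported for comparison against a visibility threshold, whose value the abstract does not give.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angle_deg(u, v):
    """Angle between two unit vectors in degrees (dot product clamped to [-1, 1])."""
    d = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(d))


def compress_normals(normals, representatives):
    """Replace each normal with the index of its closest RNV.

    Returns the list of indices and the worst-case angular deviation in degrees.
    """
    reps = [normalize(r) for r in representatives]
    indices, worst = [], 0.0
    for n in normals:
        n = normalize(n)
        # Largest dot product between unit vectors means smallest angle.
        best = max(range(len(reps)), key=lambda i: sum(a * b for a, b in zip(n, reps[i])))
        indices.append(best)
        worst = max(worst, angle_deg(n, reps[best]))
    return indices, worst


def decompress(indices, representatives):
    """Reconstruct (approximate) unit normals from their RNV indices."""
    return [normalize(representatives[i]) for i in indices]
```

With six representatives, each normal is stored as a 3-bit index instead of three floating-point components; a finer codebook lowers the worst-case deviation at the cost of longer indices, which is the ratio-versus-distortion trade-off the abstract describes.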