• Title/Summary/Keyword: dimension reduction method

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one way to handle large-scale data in text mining. It must take the density of the data into account, because density strongly influences sentence classification performance. Higher-dimensional data require more computation, which raises computational cost and can cause the model to overfit, so a dimension reduction step is needed to improve model performance. Proposed methods range from simply reducing noise in the data, such as misspellings and informal text, to incorporating semantic and syntactic information. In addition, how text features are represented and selected affects the performance of a classifier in sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods rely on algorithms such as feature extraction and feature selection. Word embeddings, which learn low-dimensional vector representations of words that capture semantic and syntactic information, are also used. To improve performance, recent studies have proposed modifying the word dictionary according to the positive and negative scores of predefined words. The basic idea of this study is that similar words have similar vector representations. Once a feature selection algorithm identifies words as unimportant, we assume that words similar to them also have no impact on sentence classification. This study proposes two methods for more accurate classification, both of which eliminate words selectively under specific rules and then build word embeddings with Word2Vec.
To select words of low importance from the text, we use information gain to measure importance and cosine similarity to find similar words. In the first method, we eliminate words with comparatively low information gain from the raw text and build the word embedding. In the second, we additionally select words that are similar to the words with low information gain, eliminate them as well, and build the word embedding. Finally, the filtered text and word embeddings are fed to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each dataset with the deep learning models. Reviews that received more than five helpful votes, with a helpful-vote ratio over 70%, were classified as helpful reviews. Yelp shows only the number of helpful votes. We extracted 100,000 reviews with more than five helpful votes by random sampling from 750,000 reviews. Each dataset received minimal preprocessing, such as removing numbers and special characters from the text. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. One of the proposed methods outperformed the embeddings built on all the words. Removing unimportant words improves performance, but removing too many words lowers it. Future research should consider more diverse preprocessing and an in-depth analysis of word co-occurrence for measuring similarity between words. We applied the proposed methods only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be combined with them, making it possible to identify effective combinations of embedding and elimination methods.
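
The two elimination steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the thresholds, the toy reviews, and the two-dimensional toy vectors are all assumptions made for the example, and real word vectors would come from a trained embedding such as Word2Vec.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(word, docs, labels):
    """Information gain of the binary feature 'word occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def words_to_remove(docs, labels, vectors, ig_threshold, sim_threshold):
    """Return (low-IG words, remaining words similar to some low-IG word)."""
    vocab = set().union(*docs)
    low_ig = {w for w in vocab if information_gain(w, docs, labels) < ig_threshold}
    similar = {
        w for w in vocab - low_ig
        if w in vectors and any(
            s in vectors and cosine(vectors[w], vectors[s]) >= sim_threshold
            for s in low_ig
        )
    }
    return low_ig, similar

# Toy data (hypothetical): each document is a set of tokens, labels are 1/0.
docs = [
    {"great", "the", "story", "quite"},
    {"awful", "the", "story", "quite"},
    {"great", "a", "plot", "quite"},
    {"awful", "a", "plot"},
]
labels = [1, 0, 1, 0]
# Toy 2-D vectors standing in for learned word embeddings.
vectors = {"the": [1.0, 0.0], "quite": [0.95, 0.1], "great": [0.0, 1.0], "awful": [0.1, -1.0]}

low_ig, similar = words_to_remove(docs, labels, vectors, ig_threshold=0.1, sim_threshold=0.9)
```

In this toy setup the first step flags words such as "the" that carry no class information, and the second step additionally flags "quite" because its vector lies close to that of "the". Setting either threshold too aggressively would start removing informative words, which matches the abstract's observation that removing too many words lowers performance.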

A study on reduction of sensibility dimension for selection of wallpaper (벽지 선택을 위한 감성 차원 축소에 관한 연구)

  • Chun Young-Min;Kim Soon-Young;Kim Sung-Hwan;Chung Sung-Suk
    • Science of Emotion and Sensibility
    • /
    • v.8 no.4
    • /
    • pp.333-344
    • /
    • 2005
  • We collected sensibility adjectives describing wallpaper in order to develop a model that can recommend wallpaper to customers. A large number of adjectives describing affective responses were gathered from diverse sources, including questionnaire surveys, field surveys, and internet surveys. To find representative adjectives among those collected, we used several statistical analysis methods. We attempted to name the dimension axes through MDS (multidimensional scaling) analysis of the similarity matrix, and to find three or four reduced factors through factor analysis with varimax rotation. The results showed that the reduced factors could account for about $82\%$ with three factors (popular, elegance, and passable) and about $93\%$ with four factors (elegance, passable, beautiful, and affectionate). On the basis of this result, we expect the factors to be useful in developing a wallpaper recommendation model.

An Improved RSR Method to Obtain the Sparse Projection Matrix (희소 투영행렬 획득을 위한 RSR 개선 방법론)

  • Ahn, Jung-Ho
    • Journal of Digital Contents Society
    • /
    • v.16 no.4
    • /
    • pp.605-613
    • /
    • 2015
  • This paper addresses the problem of making the projection matrix sparse in pattern recognition methods. In embedded systems, program size is often restricted, and programs frequently include constant data. For example, many pattern recognition programs use a projection matrix for dimension reduction. To improve recognition performance, very high-dimensional feature vectors are often extracted, in which case the projection matrix can be very large. Recently, the RSR (rotated sparse regression) method [1] was proposed; it has proved to be one of the best algorithms for obtaining a sparse matrix. We propose three methods to improve RSR: outlier removal, sampling, and elastic net RSR (E-RSR), in which the penalty term in the RSR optimization function is replaced by that of elastic net regression. Experimental results show that the proposed methods are very effective and dramatically improve the sparsity rate without sacrificing the recognition rate compared to the original RSR method.
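
For reference, the penalty of standard elastic net regression has the form below. The symbols $\beta$ (coefficient vector) and $\lambda_1, \lambda_2$ (penalty weights) are notation assumed here, not taken from the abstract, and the abstract does not give the exact RSR objective into which this penalty is substituted.

$$
P_{\mathrm{EN}}(\beta) = \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2, \qquad \lambda_1, \lambda_2 \ge 0
$$

With $\lambda_2 = 0$ this reduces to the lasso ($L_1$) penalty, and with $\lambda_1 = 0$ to the ridge ($L_2$) penalty. The $L_1$ term is the part that drives coefficients exactly to zero and so produces sparsity.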

Aerodynamic acoustics of automotive weather strip protuberance (풍절소음 저감을 위한 웨더스트립 돌출부 형상연구)

  • Kim, Tae-Hoo;Lee, Gye-Ho;Jeon, Seung-Gyeong;Hwang, Jung-Ho;Kim, Joon-Hyung
    • Proceedings of the KSME Conference
    • /
    • 2007.05b
    • /
    • pp.2546-2551
    • /
    • 2007
  • A weather strip (W/S) is a rubber part that seals opening and closing devices, including vehicle doors, against water, sound, and dust. It requires high dimensional precision and durability to block water, noise, vibration, and so on. Ironically, the weather strip itself generates some wind noise because of its protuberance at the glass. Air flow analysis of the vehicle door area makes it possible to calculate and identify the cause of wind noise. In the previous analysis, we focused on numerical air flow analysis of the side of the automobile. We perform 2D CFD first and 3D CFD second. Through simulation, we calculate the sound pressure level at the glass run and determine how the glass run contributes to wind noise. Finally, we improve the shape of the glass run to reduce wind noise, although the reduction in sound pressure is small compared with the total vehicle noise level.

Graphical regression and model assessment in logistic model (로지스틱모형에서 그래픽을 이용한 회귀와 모형평가)

  • Kahng, Myung-Wook;Kim, Bu-Yong;Hong, Ju-Hee
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.1
    • /
    • pp.21-32
    • /
    • 2010
  • Graphical regression is a paradigm for obtaining regression information from plots without model assumptions. The general goal of this approach is to find low-dimensional sufficient summary plots without loss of important information. Model assessments using residual plots are less likely to succeed in models that are not linear. As an alternative, marginal model plots provide a general graphical method for assessing the model. We apply graphical regression and model assessment using marginal model plots to the logistic regression model.

Optimization of Geometric Dimension & Tolerance Parameters of Front Suspension System for Vehicle Pulls Improvement (차량 쏠림 개선을 위한 전륜 현가시스템의 기하공차 최적화)

  • Kim, Yong-Suk;Jang, Dong-Young
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.33 no.9
    • /
    • pp.903-912
    • /
    • 2009
  • This study focuses on a simulation-based dimensional tolerance optimization process (DTOP) that minimizes vehicle pull by reducing dimensional variation in the front suspension system. Among the various factors behind vehicle pull, such as vehicle design parameters, vehicle weight balance, tires, and environmental factors, previous studies have mainly investigated the effects of tires and wheel alignment sensitivity in order to eliminate vehicle pull under nominal design conditions, without allocating optimal tolerance levels to selected components. However, real vehicles vary widely, and this variation affects actual vehicle pull; in particular, the wheel alignment effects of suspension geometry variation have not been considered in previous studies. In suspension tolerance design, uncertain tolerance variables such as part dimensional variation, the assembly process, datum position and direction, and assembly tool tolerance strongly influence the variation in the suspension's dimensional performance. This study introduces a total vehicle pull prediction model that considers the key factors in vehicle pull sensitivity. A Monte Carlo-based tolerance analysis model using the Taguchi robust method is developed to optimize dimensional tolerance parameters so that they satisfy the target variation level.
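
The general idea of Monte Carlo tolerance analysis can be sketched as below. This is only a generic illustration under stated assumptions: the three-part `stack_gap` response, the nominal values, the tolerances, and the convention that a tolerance equals three standard deviations are all hypothetical, and the sketch does not reproduce the paper's vehicle pull model or its Taguchi-based optimization.

```python
import random
import statistics

def simulate_variation(nominals, tolerances, response, n_samples=10000, seed=0):
    """Sample each dimension from a normal distribution and return the
    mean and standard deviation of the response over all samples.

    Assumption: each tolerance is treated as +/- 3 standard deviations.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outputs = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, tol / 3.0) for mu, tol in zip(nominals, tolerances)]
        outputs.append(response(sample))
    return statistics.mean(outputs), statistics.stdev(outputs)

def stack_gap(dims):
    """Hypothetical response: gap left in a housing after inserting two parts."""
    housing, part_a, part_b = dims
    return housing - part_a - part_b

nominals = [100.0, 40.0, 59.0]  # hypothetical nominal dimensions (mm)

# Same housing tolerance, loose versus tightened part tolerances.
loose_mean, loose_std = simulate_variation(nominals, [0.3, 0.3, 0.3], stack_gap)
tight_mean, tight_std = simulate_variation(nominals, [0.3, 0.1, 0.1], stack_gap)
```

Comparing `loose_std` with `tight_std` shows how tightening individual tolerances narrows the spread of the assembled response. A tolerance optimization would search over such allocations until the spread meets a target variation level.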

Nonlinear dynamic analysis of laterally loaded pile

  • Mehndiratta, S.;Sawant, V.A.;Samadhiya, N.K.
    • Structural Engineering and Mechanics
    • /
    • v.49 no.4
    • /
    • pp.479-489
    • /
    • 2014
  • In the present study, a parametric analysis is conducted to study the effect of pile dimensions and soil properties on the nonlinear dynamic response of a pile subjected to a lateral sinusoidal load at the pile head. The study is conducted on soil-pile models with different pile diameters, pile lengths, and soil moduli, and the results are compared to determine the effect. The soil-pile system is modelled using the finite element method, programmed in MATLAB. Time history analysis of the model is carried out for varying non-dimensional load frequency, and the results are compared to find the non-dimensional frequency at which pile head displacement is maximum in each case. The maximum bending moment and soil-pile interaction forces under dynamic excitation of the pile are also compared. Compared with the linear response, the non-dimensional frequency is reduced in the nonlinear response because yielding reduces the soil stiffness. The nonlinear response curve shows a higher amplitude than the linear response curve.

An Analysis of 3-D Object Characteristics Using Locally Linear Embedding (시점별 형상의 지역적 선형 사상을 통한 3차원 물체의 특성 분석)

  • Lee, Soo-Chahn;Yun, Il-Dong
    • Journal of Broadcast Engineering
    • /
    • v.14 no.1
    • /
    • pp.81-84
    • /
    • 2009
  • This paper explores the possibility of describing objects by how their shape changes with viewpoint. Specifically, we sample the shapes of a 3-D model from various viewpoints and apply dimension reduction by locally linear embedding. A low-dimensional distribution of points is constructed, and characteristics of the object are described from this distribution. We also propose two 3-D retrieval methods, one applying the iterative closest point algorithm and the other applying the Fourier transform and measuring similarity by the modified Hausdorff distance, and present experimental results. The proposed method shows that the change of shape with viewpoint can describe the characteristics of an object.
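
As an illustration of the similarity measure named in this abstract, the sketch below computes a modified Hausdorff distance between two point sets. It assumes the commonly used Dubuisson-Jain form (the larger of the two directed mean nearest-neighbour distances); the abstract does not say which variant the paper uses, and the function names and sample points are hypothetical.

```python
import math

def directed_mean_distance(a_points, b_points):
    """Mean, over points in A, of the distance to the nearest point in B."""
    return sum(min(math.dist(a, b) for b in b_points) for a in a_points) / len(a_points)

def modified_hausdorff(a_points, b_points):
    """Modified Hausdorff distance: the larger of the two directed mean distances."""
    return max(
        directed_mean_distance(a_points, b_points),
        directed_mean_distance(b_points, a_points),
    )

# Hypothetical low-dimensional point sets, e.g. embedded shape samples.
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]

distance = modified_hausdorff(set_a, set_b)
```

Averaging the nearest-neighbour distances, instead of taking their maximum as the classical Hausdorff distance does, makes the measure less sensitive to a single outlying point such as `(1.0, 2.0)` above.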

The Impact of Avoidable Mortality on Life Expectancy at Birth in Korea, 1990-2009 (우리나라 피할 수 있는 사망의 기대수명에 미치는 영향)

  • Kim, Young-Bae
    • The Korean Journal of Health Service Management
    • /
    • v.5 no.3
    • /
    • pp.123-132
    • /
    • 2011
  • This study evaluates the impact of avoidable mortality on changes in life expectancy at birth in Korea. Standard life table techniques and the Arriaga method were used to calculate and decompose life expectancy changes by age, effects, and groups of causes of avoidable mortality over two periods (1990-2000 and 2000-2009). A list of causes of avoidable mortality reached by consensus and previously published in Spain was used. Mortality in young adults reduced life expectancy at birth during 1990-2000, but there was an important increase in life expectancy at birth during 2000-2009; in both cases, this was the result of factors amenable to health policy interventions. The largest improvement in life expectancy at birth was due to non-avoidable causes, but avoidable mortality through health service interventions showed improvements in life expectancy at birth in those older than 1 year and in those younger. Distinguishing between several groups of causes of avoidable mortality and decomposing by causes, ages, and effects allowed us to better explain the impact of avoidable mortality on the life expectancy at birth of the whole population, and gave this indicator a new dimension that could be very useful in public health.

An Investigation on the Effect of Stabilization Methods for Rice Paddies contaminated by Heavy Metal considering Characteristics of submerged Paddy (담수답의 특성을 고려한 중금속 오염 농경지의 토양개량공법 효과 검토)

  • Yu, Chan;Yun, Sung-Wook;Lee, Jung-Hoon;Choi, Seung-Jin;Lee, Seong-Min
    • Proceedings of the Korean Geotechnical Society Conference
    • /
    • 2009.09a
    • /
    • pp.1455-1471
    • /
    • 2009
  • To investigate the effect of stabilization methods on rice paddies contaminated by heavy metals, a series of lab-scale model tests was carried out that reflect the characteristics of submerged paddy soil. For the tests, acrylic columns (diameter 10 cm, thickness 0.5 cm) were filled with contaminated soil mixed with stabilization agents (limestone 5% and steel refining slag 5%, respectively). To establish reducing conditions, the soil in the columns was submerged in distilled water. Soil water and subsurface water in each column were then sampled at regular intervals and analysed for various physical and chemical properties.
