• Title/Abstract/Keywords: Symbolic data analysis

Search results: 178 items (processing time: 0.024 s)

A Divisive Clustering for Mixed Feature-Type Symbolic Data

  • 김재직
    • 응용통계연구, Vol. 28, No. 6, pp. 1147-1161, 2015
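  • Today, data are no longer confined to the traditional form of points in p-dimensional space; diverse forms such as signals, functions, images, and shapes are also considered and analyzed as data. Symbolic data are one such new type of data. Symbolic data can take various forms, such as intervals, histograms, lists, statistical tables, distributions, or models. Whereas previous studies have mainly considered each form of symbolic data separately, this study extends them by introducing a hierarchical divisive clustering method for data made up of a mixture of histogram-valued and multimodal-valued variables, and applies it to the analysis of industrial accident data by industry.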

Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke;Minami, Hiroyuki;Misuta, Masahiro
    • Communications for Statistical Applications and Methods, Vol. 21, No. 3, pp. 225-234, 2014
  • We propose a novel hierarchical clustering method for distribution-valued dissimilarities. Analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA), proposed by Diday in the 1980s, provides a new framework for statistical analysis. In SDA, we analyze an object with internal variation, such as an interval, a histogram, or a distribution, called a symbolic object. In this study, we focus on cluster analysis for distribution-valued dissimilarities, one kind of symbolic object. Hierarchical clustering generally has two steps: a find-out step and an update step. In the find-out step, we find the nearest pair of clusters; we extend this step to distribution-valued dissimilarities by introducing a measure on their order relations. In the update step, dissimilarities between clusters are redefined as a mixture of distributions with a mixing ratio. We present an actual example of the proposed method and a simulation study.
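
The update step described in this abstract can be illustrated with a minimal sketch. It assumes each distribution-valued dissimilarity is stored as a list of probabilities over a shared set of bins, and that the mixing ratio is proportional to cluster size. Both choices, and the names `mix` and `update_dissimilarity`, are assumptions made for illustration; the abstract does not specify them.

```python
def mix(p, q, ratio):
    """Return the mixture ratio*p + (1 - ratio)*q of two discrete
    distributions defined on the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same bins")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [ratio * a + (1.0 - ratio) * b for a, b in zip(p, q)]


def update_dissimilarity(d_ki, d_kj, n_i, n_j):
    """Dissimilarity between cluster k and the merged cluster (i, j).

    Assumption: the mixing ratio is the share of cluster i in the
    merged cluster; other ratios are equally possible.
    """
    ratio = n_i / (n_i + n_j)
    return mix(d_ki, d_kj, ratio)


# Hypothetical example: cluster i has 1 member, cluster j has 3.
merged = update_dissimilarity([1.0, 0.0], [0.0, 1.0], n_i=1, n_j=3)
print(merged)
```

Because the result is again a probability vector on the same bins, the find-out step could be applied to it without changing representation.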

The Symbolic Consumption in Clothing and Related Factors

  • 이옥희;홍병숙
    • 한국의류학회지, Vol. 22, No. 6, pp. 781-792, 1998
  • The purpose of this study was to investigate the factors related to the propensity for symbolic consumption and the effects of materialism, reference group, and social stratification on symbolic consumption in clothing. Data were collected from 957 adolescents (middle school, high school, and college students) living in Seoul, Chonju, Sunchon, Yousu, and Kwangyang from May to June 1997. For analysis of the data, frequencies, percentages, means, standard deviations, factor analysis, t-test, one-way ANOVA, Duncan's multiple range test, and multiple regression analysis were employed. The results of this study can be summarized as follows. 1) Symbolic consumption, materialism, and reference group influence differed significantly across social stratification groups classified by the objective method. The higher the social stratum, the higher the symbolic consumption, materialism, and reference group influence. 2) Symbolic consumption differed significantly according to materialism and reference group influence. The higher the materialism and the influence of the reference group, the higher the symbolic consumption. 3) According to the regression analysis examining the relative influence of variables affecting symbolic consumption in clothing, the variables ranked in the following order of importance: influence of the reference group, materialism, social stratification, and status inconsistency type (occupation-income). Their explanatory power totalled 40.0%.

Symbolic Regression Based on Parallel Genetic Programming

  • 김찬수;한근희
    • 디지털융복합연구, Vol. 18, No. 12, pp. 481-488, 2020
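  • Symbolic regression is a regression method that directly generates a function explaining the relationship between the dependent variable and the independent variables for given data. Genetic Programming is the leading approach applied in this field, and, compared with other regression algorithms that optimize the parameters of a fixed model, it has the advantage of directly deriving an interpretable model. This study presents a symbolic regression algorithm using Parallel Genetic Programming based on a coarse-grained parallel model and applies it to PMLB data to analyze its effectiveness.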

The Symbolic Consumption of Adolescent Clothing

  • 이옥희
    • 대한가정학회지, Vol. 36, No. 10, pp. 131-144, 1998
  • The purpose of this study was to examine differences in the symbolic consumption of adolescents and the effects of demographic factors on symbolic consumption in clothing. Data were collected from 957 adolescents (middle school, high school, and college students) living in Seoul, Chonju, Sunchon, Yousu, and Kwangyang from May to June 1997. For analysis of the data, factor analysis, t-test, one-way ANOVA, Duncan's multiple range test, and multiple regression analysis were employed. The results of this study are summarized as follows. 1) Symbolic consumption in clothing differed significantly according to age, gender, level of urbanization, parents' education, father's occupation, and social stratification group. Symbolic consumption in clothing was higher for older adolescents, for females, and for those with a higher level of urbanization, parents' education, father's occupation, and social stratum. 2) According to the regression analysis examining the relative influence of variables affecting symbolic consumption in clothing, the variables ranked in the following order of importance: income, gender, age, mother's education, and residence. Their explanatory power totalled 11.5%.

Forecasting Symbolic Candle Chart-Valued Time Series

  • Park, Heewon;Sakaori, Fumitake
    • Communications for Statistical Applications and Methods, Vol. 21, No. 6, pp. 471-486, 2014
  • This study introduces a new type of symbolic data, a candle chart-valued time series. We aggregate four stock indices (i.e., open, close, highest, and lowest) into one data point to summarize a huge amount of data. In other words, we consider a candle chart, which is constructed from the open, close, highest, and lowest stock indices, as a type of symbolic data over a long period. The proposed candle chart-valued time series effectively summarizes and visualizes a huge data set of stock indices, making changes in the indices easy to understand. We also propose novel approaches to modeling candle chart-valued time series based on a combination of two midpoints and two half ranges: between the highest and the lowest indices, and between the open and the close indices. Furthermore, we propose three types of sums of squares for estimating the candle chart-valued time series model. The proposed methods take into account information not only from ordinary data but also from the interval of each object, and thus can perform effectively in time series modeling (e.g., forecasting future stock indices). To evaluate the proposed methods, we present a real data analysis of the stock market indices of five major Asian countries. The results show that the proposed approaches outperform classical data analysis in forecasting future stock indices.
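
The midpoint and half-range representation mentioned in this abstract can be sketched as follows. This is only an illustration: the `Candle` class and the function name are hypothetical, and taking the open-close half range as an absolute value (which discards whether the period closed up or down) is an assumption, not something the abstract states.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    close: float
    high: float
    low: float


def to_midpoints_and_half_ranges(c: Candle):
    """Return (range midpoint, range half range, body midpoint, body half range)."""
    range_mid = (c.high + c.low) / 2
    range_half = (c.high - c.low) / 2
    body_mid = (c.open + c.close) / 2
    # Assumption: unsigned half range between open and close.
    body_half = abs(c.close - c.open) / 2
    return range_mid, range_half, body_mid, body_half


# Hypothetical candle: opens at 100, closes at 104, trades between 98 and 106.
print(to_midpoints_and_half_ranges(Candle(open=100.0, close=104.0, high=106.0, low=98.0)))
```

Each of the four resulting series is an ordinary real-valued time series, so it could be modeled with standard tools before mapping forecasts back to a candle.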

Hierarchical Clustering of Symbolic Objects Based on Asymmetric Proximity

  • 오승준;박찬웅
    • 한국지능시스템학회논문지, Vol. 22, No. 6, pp. 729-734, 2012
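  • Cluster analysis is used in numerous fields such as pattern recognition, data analysis, intrusion detection, image processing, and bioinformatics. Many existing studies are based only on numerical data. However, symbolic data analysis, which deals with variables whose values are intervals, histograms, or even functions, is emerging. This paper proposes an asymmetric proximity measure for clustering such symbolic data. It also develops a clustering method based on the average similarity value (ASV). The results of the proposed clustering differ from those of existing methods and are very encouraging.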

Kinematic Design Sensitivity Analysis of Suspension System Using a Symbolic Computation Method

  • 송성재;탁태오
    • 한국자동차공학회논문집, Vol. 4, No. 6, pp. 247-259, 1996
  • Kinematic design sensitivity analysis for vehicle suspension system design is performed. Suspension systems are modeled using composite joints to reduce the number of constraint equations. This allows a semi-analytical approach that performs computerized symbolic manipulation before numerical computation and that may compensate for their drawbacks. All the constraint equations, including the design variables, are derived as symbolic equations for sensitivity analysis. Sensitivity equations are obtained by directly differentiating these equations with respect to the design variables. Since the proposed method requires only the hard point data, sensitivity analysis is possible at the suspension design stage.

The Relations between Brand Attachment and Brand Loyalty with regard to Symbolic Consumption Propensity toward Fashion Goods

  • 김정란;유태순
    • 한국의류산업학회지, Vol. 10, No. 4, pp. 499-505, 2008
  • The purpose of this study is to examine the relations between brand attachment and brand loyalty depending on symbolic consumption propensity toward fashion goods. Subjects were 391 women in their twenties to fifties who live in Gyungsang Province and have purchased fashion goods. Frequency analysis, reliability analysis, factor analysis, multiple regression analysis, and one-way analysis of variance were conducted using SPSS 13.0. The findings are as follows: among the dimensions of symbolic consumption propensity toward fashion goods, uniqueness and materialism had positive effects on the elements of brand attachment such as love, care, and knowledge. Brand loyalty was positively influenced by social face sensitivity and materialism.

Cluster Analysis for Seoul Apartment Price Using Symbolic Data

  • 김재직
    • Journal of the Korean Data and Information Science Society, Vol. 26, No. 6, pp. 1239-1247, 2015
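  • In this paper, cluster analysis was performed on 64 administrative dongs (neighborhoods) in Seoul where apartment sales are active, based on actual transaction prices by apartment exclusive-use area. To make maximal use of the information on each dong's transaction prices, the traditional-form data were converted into histogram-valued data so that not only the mean but also the distribution of transaction prices could be considered. Histogram data are one type of symbolic data, and symbolic data include all forms of data with internal variation, such as intervals, lists, histograms, distributions, and models. The cluster analysis, which accounts for the internal price variation within each dong, found that Gangnam-gu, Seocho-gu, Songpa-gu, and the adjacent dongs had higher sale prices than other areas and a much wider distribution of transaction prices. Overall, areas with good access to the city center and a strong educational environment, as well as the new-town areas in Gangbuk (north of the Han River), showed higher and wider sale-price distributions than their surrounding areas.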
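
The conversion from traditional data to histogram-valued data described in this abstract can be sketched roughly as below. The bin edges, the district labels, and the price values are all made up for illustration, and `to_histogram` is a hypothetical helper rather than the paper's procedure.

```python
from bisect import bisect_right


def to_histogram(values, edges):
    """Relative frequencies of `values` over the bins [edges[i], edges[i+1]).

    Values outside [edges[0], edges[-1]) are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Hypothetical transaction prices (arbitrary units) for two districts.
prices = {"A": [3, 4, 6, 12], "B": [11, 12, 13, 14]}
edges = [0, 5, 10, 15]  # shared bins so the histograms are comparable

histograms = {name: to_histogram(vals, edges) for name, vals in prices.items()}
print(histograms)
```

Using the same bin edges for every district keeps the resulting histograms directly comparable, which a distance-based clustering step would typically need.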