• Title/Summary/Keyword: Categorical time series


An Analysis of Categorical Time Series Driven by Clipping GARCH Processes (연속형-GARCH 시계열의 범주형화(Clipping)를 통한 분석)

  • Choi, M.S.;Baek, J.S.;Hwan, S.Y.
    • The Korean Journal of Applied Statistics / v.23 no.4 / pp.683-692 / 2010
  • This short article is concerned with a categorical time series obtained by clipping a heteroscedastic GARCH process. Estimation methods are discussed for the model parameters appearing both in the original process and in the binary time series that results from clipping (cf. Zhen and Basawa, 2009). Assuming an AR-GARCH model for the heteroscedastic time series, three data sets from the Korean stock market are analyzed, with applications to calculating certain probabilities associated with the AR-GARCH process.
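To make the clipping idea concrete, the following is a minimal sketch, not the authors' procedure. It assumes an AR(1)-GARCH(1,1) specification with Gaussian innovations, a clipping threshold of zero, and illustrative parameter values (`phi`, `omega`, `alpha`, `beta` are all hypothetical); the abstract does not state the model orders or thresholds used in the paper.

```python
import random


def simulate_ar_garch(n, phi=0.1, omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """Simulate y_t = phi*y_{t-1} + e_t with e_t = sqrt(h_t)*z_t,
    h_t = omega + alpha*e_{t-1}^2 + beta*h_{t-1}, z_t ~ N(0, 1).
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)  # initialise the recursion at the unconditional variance
    y_prev, e_prev = 0.0, 0.0
    series = []
    for _ in range(n):
        h = omega + alpha * e_prev ** 2 + beta * h  # conditional variance update
        e = (h ** 0.5) * rng.gauss(0.0, 1.0)        # heteroscedastic innovation
        y = phi * y_prev + e                        # AR(1) mean equation
        series.append(y)
        y_prev, e_prev = y, e
    return series


def clip(series, threshold=0.0):
    """Clip a continuous series into a binary one: 1 if above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in series]


y = simulate_ar_garch(1000)
b = clip(y)
p_hat = sum(b) / len(b)  # empirical estimate of P(y_t > threshold)
```

The binary series `b` is the kind of categorical time series the abstract refers to, and `p_hat` is one simple example of a probability associated with the underlying process.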

Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.621-627 / 2016
  • A professional baseball team can be persistently weak against one particular opponent. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique employed is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so it is recommended to draw conclusions by considering the results of all three methods together.
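As a rough illustration of the distance-based method (i), the sketch below compares win/loss sequences with a normalized Hamming distance. The choice of Hamming distance is an assumption, since the abstract does not name the distance used, and the team labels and records are made-up placeholders, not data from the paper.

```python
def hamming(a, b):
    """Share of positions at which two equal-length categorical sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical win (W) / loss (L) records of three teams against one opponent.
records = {
    "A": "WWLWLLWW",
    "B": "WWLWLWWW",
    "C": "LLWLWWLL",
}

names = sorted(records)
# Pairwise distance table that a clustering routine could take as input.
dist = {(p, q): hamming(records[p], records[q]) for p in names for q in names}
```

Here teams A and B differ in one game out of eight, while C differs from A in every game, so any reasonable clustering of `dist` would separate C from the other two.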

An Analysis of Panel Count Data from Multiple random processes

  • Park, You-Sung;Kim, Hee-Young
    • Proceedings of the Korean Statistical Society Conference / 2002.11a / pp.265-272 / 2002
  • An integer-valued autoregressive integrated (INARI) model is introduced to eliminate stochastic trend and seasonality from time series of count data. The INARI model extends the previous integer-valued ARMA model. We show that it is stationary and ergodic in order to establish asymptotic normality of the conditional least squares estimator. Optimal estimating equations are used to incorporate into the estimation the categorical and serial correlations arising from panel count data, as well as the variations arising from the three random processes through which the observations are obtained. Under regularity conditions for martingale sequences, we show asymptotic normality of the estimators obtained from the estimating equations. Using cancer mortality data provided by the U.S. National Center for Health Statistics (NCHS), we apply our results to estimate the probabilities of cells classified by 4 causes of death and 6 age groups and to forecast the death count of each cell. We also investigate the impact of the three random processes on estimation.
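The abstract does not give the INARI specification, so the sketch below only illustrates the standard building block such models rest on: an INAR(1) process with binomial thinning and Poisson innovations, fitted by conditional least squares. The parameter values and function names are illustrative assumptions, and the differencing step that an integrated model would add is not shown.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def thin(rng, count, alpha):
    """Binomial thinning: each of `count` units survives with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    x = poisson(rng, lam / (1.0 - alpha))  # start from the stationary marginal
    out = []
    for _ in range(n):
        x = thin(rng, x, alpha) + poisson(rng, lam)
        out.append(x)
    return out


def cls_estimate(x):
    """Conditional least squares: regress X_t on X_{t-1}; slope estimates alpha,
    intercept estimates lam."""
    prev, curr = x[:-1], x[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    alpha_hat = sxy / sxx
    lam_hat = mean_curr - alpha_hat * mean_prev
    return alpha_hat, lam_hat


counts = simulate_inar1(5000, alpha=0.5, lam=2.0)
alpha_hat, lam_hat = cls_estimate(counts)
```

With a few thousand observations the estimates should land close to the simulated values, which is the practical content of the asymptotic normality result for the conditional least squares estimator.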


Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.45-69 / 2016
  • Sentiment analysis using open Internet data is now actively performed for various purposes. As online communication channels become popular, companies try to capture public sentiment toward them from open online information sources. This research analyzes public sentiment toward the Korean top-10 companies using a multi-categorical sentiment lexicon. Whereas existing big-data research on public sentiment measurement classifies sentiment into dimensions, this research classifies public sentiment into multiple categories. The dimensional sentiment structure has been commonly applied in sentiment analysis for various applications because it is academically established and has the clear advantage of capturing the degree of sentiment and the interrelation of the dimensions. However, the dimensional structure is not effective for measuring public sentiment, because human sentiment is too complex to be divided into a few dimensions. In addition, ordinary people need special training to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training in order to feel. They do not express their feelings along dimensions such as positive/negative or active/passive; rather, they express them as categorical sentiments such as sadness, rage and happiness. That is, the categorical approach to sentiment analysis is more natural than the dimensional approach. Accordingly, this research proposes a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. The multi-categorical sentiment structure classifies sentiments the way ordinary people do, although it may contain some subjectivity.
In this research, nine categories ('Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom' and 'Pain') are used as the multi-categorical sentiment structure. To capture public sentiment toward the Korean top-10 companies, Internet news data on the companies were collected over the past 25 months from a representative Korean portal site. Based on sentiment words extracted from previous research, we created a sentiment lexicon and analyzed the frequency of those words in the news data. The frequency of each sentiment category was calculated as a ratio of the total number of sentiment words in order to rank the distributions. Sentiment comparisons among the top-4 companies ('Samsung', 'Hyundai', 'SK' and 'LG') were visualized separately. Next, the research tested hypotheses to demonstrate the usefulness of the multi-categorical sentiment lexicon, namely how effectively categorical sentiment can serve as a relative comparison index in cross-sectional and time series analysis. To test the effectiveness of the sentiment lexicon as a cross-sectional comparison index, pair-wise t-tests and the Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin' and 'SK' and 'Hanjin', were chosen to test whether each categorical sentiment differs significantly in the pair-wise t-test. Since the category 'Sadness' has the largest vocabulary, it was chosen to determine whether the subgroups of companies differ significantly in the Duncan test. Five sentiment categories differed significantly between Samsung and Hanjin, and four between SK and Hanjin. In the category 'Sadness', six significantly different subgroups were found. To test the effectiveness of the sentiment lexicon as a time series comparison index, the 'nut rage' incident of Hanjin was selected as an example case.
The term frequency of sentiment words in the month when the incident happened was compared with the term frequency in the month before. The sentiment categories were regrouped into positive and negative sentiment to determine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the variation in the word list of the sentiment 'Rage' was shown for concreteness. As a result, there was a large before-and-after difference in the sentiment that ordinary people felt toward the company. Both hypotheses turned out to be statistically significant, and therefore sentiment analysis in the business area using multi-categorical sentiment lexicons is persuasive. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when assessing public sentiment in a business environment.
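The category-frequency ratio described in this abstract can be sketched as follows. This is only an illustration: the tiny English lexicon and the two token lists are hypothetical placeholders, not the paper's Korean lexicon or news data, and the function name `category_ratios` is an assumption.

```python
from collections import Counter

# Hypothetical miniature lexicon mapping sentiment words to categories.
lexicon = {
    "sad": "Sadness",
    "grief": "Sadness",
    "angry": "Anger",
    "furious": "Anger",
    "glad": "Happiness",
}


def category_ratios(tokens, lexicon):
    """Share of each sentiment category among all lexicon words found in tokens."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical news tokens for the month before and the month of an incident.
before = "the firm was glad and glad but one angry note".split()
after = "angry furious angry public sad news furious".split()

ratios_before = category_ratios(before, lexicon)
ratios_after = category_ratios(after, lexicon)
```

Comparing `ratios_before` with `ratios_after` category by category mirrors the before-and-after comparison the abstract describes for the time series hypothesis.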

A Study on the Mapping and Characteristics of Distributions in Cultural-Historic Sites of Yanbian Area using Google Earth (구글어스를 이용한 연변지역의 문화.역사유적 지도화와 분포의 특징에 관한 연구)

  • Jin, Shizhu;Kim, Nam-Sin
    • Journal of the Korean association of regional geographers / v.17 no.1 / pp.122-139 / 2011
  • The Yanbian area is a region of great cultural and historical interest to Korea as well as China. There is much cultural-historical research on Yanbian, but few maps of its sites. This study aimed to map the cultural-historic sites of Yanbian and analyze the characteristics of their distribution using Google Earth. We made a distribution map covering the Stone Age through the Qing Dynasty. For the map symbology, we used color symbols by time series and categorical symbols. As a research finding, sites of the Balhae and Yuo-Geum ages account for a large share compared with other ages in Yanbian. In particular, sites of the Goguryeo, Balhae and Yuo-Geum ages showed a spatio-temporal structure of accumulated layers. Sites were located in basin and stream areas in the early ages and later, in the historical period, moved to hillsides and mountainous areas. The results of this research are expected to offer information for follow-up studies of cultural-historic sites.


A Study on the Estimation of Limits to Life Expectancy (한국인 기대여명의 한계추정에 관한 연구)

  • 천성수;김정근
    • Korea journal of population studies / v.16 no.2 / pp.65-83 / 1993
  • The purpose of this study is to estimate the limits of Korean life expectancy at birth using the 'Gompertz growth curve model', the 'Cause-Elimination Model' and 'Multidimensional models of senescence and mortality'. Data used for the Gompertz curve were obtained from all life tables published from 1905 to 1990 in Korea, and life expectancies at birth of eighteen groups were selected at five-year intervals in consideration of time-series changes. Data used in the Cause-Elimination Model are the 'Cause of Death statistics in 1991', published in 1992 by the National Bureau of Statistics of Korea, and the 'life table of 1989', published in 1990 by the National Bureau of Statistics, Economic Planning Board of Korea. The materials are all classifiable death data from the 1991 Causes of Death statistics: 119,253 male cases and 82,420 female cases. The deaths analyzed belong to one of 8 categories: Infectious and Parasitic Diseases (001-139; denoted Infectious Diseases), Malignant Neoplasms (140-208), Hypertensive Diseases (401-405), Ischemic Heart Diseases and Diseases of Pulmonary Circulation and Other Forms of Heart Disease (410-429; denoted Heart Disease), Cerebrovascular Diseases (430-438), Chronic Liver Diseases and Cirrhosis (571; denoted Liver Diseases), Injury and Poisoning (800-999) and all other diseases. Data used in the 'Multidimensional models of senescence and mortality' were the life table of 1989 published by the National Bureau of Statistics, Economic Planning Board of Korea, and the life tables of 1970, 1978-79, 1983, 1985 and 1987. The major findings may be summarised as follows: 1. The estimated Gompertz growth curve equations for life expectancy at birth during the 1905-1990 period are as follows. Male: $y = 88.047697 \times 0.199690^{0.903381^{x}}$; Female: $y = 95.632828 \times 0.199690^{0.903381^{x}}$. The limits of life expectancy at birth estimated by the Gompertz growth curve are 88.05 for males and 95.63 for females. 2. The effect on life expectancy at birth of eliminating all causes of death is 14.04 years (male) and 10.86 years (female). Astonishingly, eliminating malignant neoplasms increases life expectancy at birth by 2.85 years for males and 2.03 years for females in 1991. Table 8 shows the effect on life expectancy at birth of separately eliminating each of the 8 categorical causes of death. The theoretical limit to life expectancy by the Cause-Elimination Model is 80.96 for males and 85.82 for females. 3. If the same rate of delay [0.376 year (male), 0.435 year (female) per calendar year] continued, life expectancy at birth would reach 74.82 years (male) and 84.10 years (female) in 2010. Adding the 14.04-year (male) and 10.86-year (female) cause-elimination effects, life expectancy in 2010 would be 88.86 years (male) and 94.96 years (female). 4. 'Multidimensional models of senescence and death' permit calculation of the value of the attribution coefficient (B), the percent loss of physiologic function per year. The results for Ro and B during the 1970-1989 period are listed in table 9. The estimated limit to Korean life expectancy at birth by 'Multidimensional models of senescence and death' is 99.47 years for males and 104.74 years for females in 1989.
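The Gompertz limits and the 2010 projection arithmetic reported above can be checked with a short sketch. It assumes the standard Gompertz growth form y = k · a^(b^x), whose limit for large x is k when 0 < b < 1; that reading is consistent with the stated limits of 88.05 and 95.63. The abstract does not define the time index x, so the value used below is purely illustrative.

```python
def gompertz(x, k, a, b):
    """Standard Gompertz growth curve y = k * a ** (b ** x).
    For 0 < b < 1 the curve approaches the upper limit k as x grows."""
    return k * a ** (b ** x)


# Coefficients as reported in the abstract (male and female curves).
male = dict(k=88.047697, a=0.199690, b=0.903381)
female = dict(k=95.632828, a=0.199690, b=0.903381)

# A large, illustrative time index: the curve has effectively reached its limit.
male_limit = gompertz(1000, **male)
female_limit = gompertz(1000, **female)

# Finding 3: projected 2010 life expectancy plus the cause-elimination effect.
male_2010 = round(74.82 + 14.04, 2)
female_2010 = round(84.10 + 10.86, 2)
```

Rounded to two decimals, `male_limit` and `female_limit` reproduce the reported Gompertz limits, and the two sums reproduce the 88.86 and 94.96 years stated in finding 3.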


Surgery for symptomatic hepatic hemangioma: Resection vs. enucleation, an experience over two decades

  • Nalini Kanta Ghosh;Rahul R;Ashish Singh;Somanath Malage;Supriya Sharma;Ashok Kumar;Rajneesh Kumar Singh;Anu Behari;Ashok Kumar;Rajan Saxena
    • Annals of Hepato-Biliary-Pancreatic Surgery / v.27 no.3 / pp.258-263 / 2023
  • Backgrounds/Aims: Hemangiomas are the most common benign liver lesions; however, they are usually asymptomatic and seldom require surgery. Enucleation and resection are the most commonly performed surgical procedures for symptomatic lesions. This study aims to compare the outcomes of these two surgical techniques. Methods: A retrospective analysis was performed of symptomatic hepatic hemangiomas (HH) operated upon between 2000 and 2021. Patients were categorized into enucleation and resection groups. Demographic profile, intraoperative bleeding, and morbidity (Clavien-Dindo grade) were compared. The independent t-test and the chi-square test were used for continuous and categorical variables, respectively. A p-value of < 0.05 was considered significant. Results: Sixteen symptomatic HH patients aged 30 to 66 years underwent surgery (enucleation = 8, resection = 8), and the majority were female (n = 10 [62.5%]). Fifteen patients presented with abdominal pain, and one patient had an interval increase in the size of the lesion from 9 to 12 cm. The size of the hemangiomas varied from 6 to 23 cm. The median blood loss (enucleation: 350 vs. resection: 600 mL), operative time (enucleation: 5.8 vs. resection: 7.5 hours), and postoperative hospital stay (enucleation: 6.5 vs. resection: 11 days) were greater in the resection group, although the differences were not statistically significant. In the resection group, morbidity was significantly higher (62.5% vs. 12.5%, p = 0.05), including one mortality. All patients remained asymptomatic during the follow-up. Conclusions: Enucleation was simpler, with less morbidity, than resection in our series. However, considering the small number of patients, further studies with comparable groups are needed to confirm the superiority of enucleation over resection.