• Title/Summary/Keyword: Skewed Data


Linear regression under log-concave and Gaussian scale mixture errors: comparative study

  • Kim, Sunyul; Seo, Byungtae
    • Communications for Statistical Applications and Methods / v.25 no.6 / pp.633-645 / 2018
  • Gaussian error distributions are a common choice in traditional regression models fitted by the maximum likelihood (ML) method. However, this distributional assumption is often questionable, especially when the error distribution is skewed or heavy-tailed. In both cases, the ML method under normality can break down or lose efficiency. In this paper, we consider log-concave and Gaussian scale mixture distributions for the errors. For log-concave errors, we propose using a smoothed maximum likelihood estimator for stable and faster computation. Based on this, we perform comparative simulation studies of the coefficient estimates under normal, Gaussian scale mixture, and log-concave errors. We also analyze real data: the stack loss plant data and the Korean Labor and Income Panel data.

Efficient Continuous Skyline Query Processing Scheme over Large Dynamic Data Sets

  • Li, He; Yoo, Jaesoo
    • ETRI Journal / v.38 no.6 / pp.1197-1206 / 2016
  • Performing continuous skyline queries over dynamic data sets is becoming more challenging as data sets grow and become more volatile due to more frequent dynamic updates. Although previous work proposed support for such queries, its efficiency was restricted to small or uniformly distributed data sets. In a production database with many concurrent queries, executing continuous skyline queries degrades query performance because updates must acquire exclusive locks, possibly blocking other query threads. Thus, the computational costs increase. To minimize computational requirements, we propose a method based on a multi-layer grid structure. First, the relational data objects, which are the elements of an initial data set, are processed to obtain the corresponding multi-layer grid structure and the skyline influence regions over the data. Then, dynamic data are processed only when they fall within the skyline influence regions. Therefore, a large amount of computation can be pruned by adopting the proposed multi-layer grid structure. A performance evaluation on a variety of data sets confirms the efficiency of the proposed method.
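
For readers unfamiliar with skyline queries, the following is a minimal sketch of the underlying dominance test and a naive skyline computation. It does not implement the multi-layer grid or the skyline influence regions from the abstract; the function names and the "smaller is better in every dimension" convention are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    result: List[Point] = []
    for p in points:
        # Skip p if a point already in the skyline dominates it.
        if any(dominates(q, p) for q in result):
            continue
        # Otherwise p enters the skyline and evicts the points it dominates.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def insert_point(current: List[Point], p: Point) -> List[Point]:
    """Update an existing skyline when a single new point arrives."""
    return skyline(list(current) + [p])


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(skyline(data))
```

A new point can only change the result if it is not dominated by the current skyline. Pruning schemes such as the one described above aim to detect the opposite case cheaply, so that most updates need no recomputation.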

Archaeomagnetic Secular Variation of the Neolithic Age in Korea: Focusing on the Mid-Western Region Sites (한반도 신석기시대의 고고지자기 변동: 중서부지역 유적을 중심으로)

  • Sung, Hyong Mi
    • Journal of Conservation Science / v.29 no.3 / pp.223-229 / 2013
  • The archaeomagnetic dating method has not yet been fully established in Korea, and details for the A.D. period are still not fully known, but a revised standard curve tracing the geomagnetic field variation has been prepared, and surveys of B.C.-period remains have increased in order to determine the detailed archaeomagnetic field variation from the Bronze Age to the Early Iron Age. Furthermore, survey cases on Neolithic remains have gradually begun to emerge, mainly in the mid-western region, and this study examines the archaeomagnetic field variation of the Neolithic Age using 34 archaeomagnetic measurements. The data are still insufficient to determine the detailed trend of variation, but its approximate form can be estimated. As with the Bronze Age, the archaeomagnetic field of the Neolithic Age varied within the range of variation seen in the A.D. period. Compared with the Bronze Age, the inclination shifted within a range showing almost no difference, but the declination is skewed toward the east overall. In addition, a comparison with data from the Jomon period in Japan shows that the Korean archaeomagnetic data are slightly deeper, while the declination is skewed toward the east by 10 degrees or more relative to the Japanese data. However, where the data are most densely concentrated, the data for the two countries overlap substantially, so the archaeomagnetic field variation of the Neolithic Age in Korea is broadly similar to that of Japan, with some partial differences.

Projected Circular and l-Axial Skew-Normal Distributions

  • Seo, Han-Son; Shin, Jong-Kyun; Kim, Hyoung-Moon
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.879-891 / 2009
  • We developed the projected l-axial skew-normal (LASN) family of distributions for l-axial data. The LASN family contains the semicircular skew-normal (SCSN) and the circular skew-normal (CSN) families of distributions as special cases. The LASN densities are similar to the wrapped skew-normal densities for small values of the scale parameter. However, the CSN densities have heavier tails than the wrapped skew-normal densities on the circle. Furthermore, the CSN densities have two modes as the scale parameter increases. The LASN distribution has very convenient mathematical features. We extend the LASN family of distributions to a bivariate case.

Sedimentary Petrology and Depositional Environments of the Sindong Group in the Euiseong Subbasin (의성소익지(義城小益地) 신동층군(新洞層群)의 퇴적암석학(堆積岩石學) 및 퇴적환경(堆積環境))

  • Lee, Kwang-Choon
    • Economic and Environmental Geology / v.18 no.3 / pp.289-299 / 1985
  • The sedimentary petrology and depositional environments of the Sindong Group in the Euiseong Subbasin, which consists, in ascending order, of the Nagdong, Hasandong and Jinju Formations, are studied. For this purpose, the Sindong sequence, over 1,000 m thick, is measured at a scale of 1:200, and 36 thin sections of sandstones of the Hasandong Formation are studied under the polarizing microscope. In addition, published paleontologic data are incorporated into the sedimentologic interpretation. Most of the sandstones are classified as arkose. They are moderately sorted, near-symmetrical to fine-skewed, and mesokurtic. The relationship among the textural parameters suggests a fluviatile environment for the Hasandong Formation. The Sindong fauna and flora also indicate non-marine depositional environments. Sedimentologic data from the measured sections show that the Sindong Group is made up, from the bottom, of alluvial fan deposits (lower part of the Nagdong Formation), fluvial plain deposits (upper part of the Nagdong Formation and the Hasandong Formation), and fluvial/lacustrine deposits (the Jinju Formation).


An Improved Composite Estimator for Cut-off Sampling

  • Hwang, Hee-Jin; Shin, Key-Il
    • Communications for Statistical Applications and Methods / v.20 no.5 / pp.367-376 / 2013
  • Cut-off sampling, which discards a part of the population (the take-nothing stratum), is widely used for highly skewed populations such as those in business surveys. In this paper, we suggest a new composite estimator of the take-nothing stratum total, obtained by using the survey results of the take-nothing stratum and a take-some sub-stratum (a part of the take-some stratum), for a more accurate estimate of the population total. Small simulation studies are conducted to compare the performance of known estimators and the new composite estimator. In addition, we use briquette consumption survey data for real data analysis.
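
To make the cut-off setting concrete, here is a minimal sketch of a textbook ratio adjustment for the take-nothing stratum. It is not the composite estimator proposed in the paper, whose form the abstract does not give. The function name, the auxiliary size variable `x` (assumed known for every unit), and the assumption that every unit at or above the cut-off is surveyed are all hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

# Each unit is (x, y): x is an auxiliary size measure known for all units,
# y is the surveyed value, or None when the unit was not surveyed.
Unit = Tuple[float, Optional[float]]


def cutoff_ratio_total(units: List[Unit], cutoff: float) -> float:
    """Estimate the population total of y under cut-off sampling.

    Assumption: units with x >= cutoff are all surveyed; units below the
    cut-off (the take-nothing stratum) are estimated with the ratio y/x
    observed among the surveyed units.
    """
    surveyed = [(x, y) for x, y in units if x >= cutoff]
    if not surveyed:
        raise ValueError("no units at or above the cut-off")
    if any(y is None for _, y in surveyed):
        raise ValueError("a unit above the cut-off has no surveyed value")

    x_surveyed = sum(x for x, _ in surveyed)
    y_surveyed = sum(y for _, y in surveyed)
    ratio = y_surveyed / x_surveyed  # observed y per unit of x

    # Apply the same ratio to the auxiliary total of the discarded units.
    x_take_nothing = sum(x for x, _ in units if x < cutoff)
    return y_surveyed + ratio * x_take_nothing


if __name__ == "__main__":
    population = [(100, 1000.0), (50, 500.0), (20, 200.0),
                  (5, None), (3, None), (2, None)]
    print(cutoff_ratio_total(population, cutoff=20))
```

The sketch assumes that small units share the same y-to-x ratio as large ones. That assumption is hard to check from the surveyed units alone, which is the kind of weakness that estimators using information from near or below the cut-off try to address.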

Location-based Clustering for Skewed-topology Wireless Sensor Networks (편향된 토플로지를 가진 무선센서네트워크를 위한 위치기반 클러스터링)

  • Choi, Hae-Won; Ryu, Myung-Chun; Kim, Sang-Jin
    • Journal of Digital Convergence / v.14 no.1 / pp.171-179 / 2016
  • The energy consumption problem in wireless sensor networks is investigated. The problem is to expend as little energy as possible in receiving and transmitting data, because battery capacity is constrained. In this paper, in order to extend the lifetime of the network, we propose a location-based clustering algorithm for wireless sensor networks with skewed topology. The proposed algorithm deploys multiple child nodes at the sink to avoid a bottleneck near the sink and to save energy. The proposed algorithm can reduce control traffic overhead by creating a dynamic cluster. We have evaluated the performance of our clustering algorithm through an analysis and a simulation. We compare our algorithm's performance with that of the best known centralized algorithm and demonstrate that it achieves good performance in terms of network lifetime.

Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

  • Oh, Hee-Seok; Jang, Dong-Ik; Oh, Seung-Yoon; Kim, Hee-Bal
    • Interdisciplinary Bio Central / v.2 no.2 / pp.4.1-4.6 / 2010
  • The most common type of microarray experiment has a simple design using microarray data obtained from two different groups or conditions. A typical method to identify differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test. The t-test is based on a simple estimate of the population variance for a gene, namely the sample variance of its expression levels. Although the empirical Bayes approach improves on the t-statistic by not giving a high rank to genes only because they have a small sample variance, its basic assumption is the same as that of the ordinary t-test: the equality of variances across experimental groups. The t-test and the empirical Bayes approach suffer from low statistical power because they assume normal and unimodal distributions for the microarray data. We propose a method to address these problems that is robust to outliers and skewed data, while maintaining the advantages of the classical t-test and modified t-statistics. The resulting data transformation to fit the normality assumption increases the statistical power for identifying DEGs using these statistics.

Use of beta-P distribution for modeling hydrologic events

  • Murshed, Md. Sharwar; Seo, Yun Am; Park, Jeong-Soo; Lee, Youngsaeng
    • Communications for Statistical Applications and Methods / v.25 no.1 / pp.15-27 / 2018
  • The parametric method of flood frequency analysis involves fitting a probability distribution to observed flood data. When the record length at a given site is relatively short and the asymptotic theory is hard to apply, an alternative to the generalized extreme value (GEV) distribution is often used. In this study, we consider the beta-P distribution (BPD) as an alternative to the GEV and other well-known distributions for modeling extreme events in small or moderate samples as well as highly skewed or heavy-tailed data. The L-moments ratio diagram shows that special cases of the BPD include the generalized logistic, three-parameter log-normal, and GEV distributions. To estimate the parameters of the distribution, the method of moments, L-moments, and maximum likelihood estimation are considered. A Monte Carlo study is then conducted to compare these three estimation methods. Our results suggest that the L-moments estimator works better than the other estimators for this model in small or moderate samples. Two applications, to the annual maximum stream flow of Colorado and to rainfall data from cloud seeding experiments in Southern Florida, are reported to show the usefulness of the BPD for modeling hydrologic events. In these examples, the BPD turns out to work better than the $\beta$-$\kappa$, Gumbel, and GEV distributions.
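
As background for the L-moments estimator mentioned above, the sketch below computes the first sample L-moments and the L-moment ratios from the standard unbiased probability-weighted moments. It shows only how the sample quantities are obtained. It does not fit the beta-P distribution, and the function name is an assumption made for illustration.

```python
from typing import Sequence, Tuple


def sample_l_moments(data: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis.

    Uses the unbiased probability-weighted moments b0..b3 of the ordered
    sample; requires at least four observations.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")

    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x):  # i is the 0-based rank of v
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n

    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-scale is zero; all observations are equal")
    return l1, l2, l3 / l2, l4 / l2


if __name__ == "__main__":
    # Hypothetical right-skewed sample, not data from the paper.
    print(sample_l_moments([1.0, 1.0, 1.0, 1.0, 10.0]))
```

Because L-moments are linear combinations of order statistics, they are generally less sensitive to extreme observations than conventional moments, which is the usual argument for using them with short, skewed hydrologic records.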

Bit-Vector-Based Space Partitioning Indexing Scheme for Improving Node Utilization and Information Retrieval (노드 이용률과 검색 속도 개선을 위한 비트 벡터 기반 공간 분할 색인 기법)

  • Yeo, Myung-Ho; Seong, Dong-Ook; Yoo, Jae-Soo
    • Journal of KIISE:Computing Practices and Letters / v.16 no.7 / pp.799-803 / 2010
  • The KDB-tree is a traditional indexing scheme for retrieving multidimensional data. Research on the KDB-tree family frequently identifies low storage utilization and insufficient retrieval performance as its two bottlenecks. These bottlenecks arise from a number of unnecessary splits caused by data insertion order and data skewness. In this paper, we propose a novel index structure, called the $KDB_{CS}^+$-tree, to process skewed data efficiently and improve retrieval performance. The $KDB_{CS}^+$-tree increases the fan-out by exploiting bit-vectors to represent splitting information and by eliminating pointers. It also improves storage utilization by representing entries as a hierarchical structure in each internal node.