• Title/Summary/Keyword: Rare Data

A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data (네트워크 트래픽 데이터의 희소 클래스 분류 문제 해결을 위한 전처리 연구)

  • Ryu, Kyung Joon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog
    • KIPS Transactions on Software and Data Engineering / v.9 no.12 / pp.411-418 / 2020
  • In the field of information security, an IDS (Intrusion Detection System) is normally classified into one of two categories: signature-based IDS and anomaly-based IDS. Many studies of anomaly-based IDS have used machine learning algorithms to analyze network traffic data generated in cyberspace. In this paper, we studied pre-processing methods to overcome the performance degradation caused by rare classes. We evaluated the classification performance of a machine learning algorithm by reconstructing the data set based on rare classes and semi-rare classes. After the data were reconstructed into three different sets, wrapper and filter feature selection methods were applied in succession. Each data set was scaled with a quantile scaler. A deep neural network model was used for learning and validation. The evaluation results were compared in terms of true positives and false negatives. We obtained improved classification performance on all three data sets.
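The quantile scaling and the true-positive / false-negative comparison mentioned in this abstract can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names, the mid-rank handling of ties, and the class labels (`normal`, `dos`, `u2r`) are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def quantile_scale(values):
    """Map each value to its empirical quantile in [0, 1].

    Tied values share the average of their ranks (an assumption of this sketch).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 1:
        return [0.5]
    scaled = []
    for v in values:
        lo = bisect_left(ordered, v)        # first rank holding v
        hi = bisect_right(ordered, v) - 1   # last rank holding v
        scaled.append(((lo + hi) / 2) / (n - 1))
    return scaled


def tp_fn_per_class(y_true, y_pred):
    """Return {class: (true positives, false negatives)} for every true class."""
    tp, fn = Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1
    return {c: (tp[c], fn[c]) for c in sorted(set(y_true))}


if __name__ == "__main__":
    # A heavy-tailed feature: the outlier no longer dominates after scaling.
    print(quantile_scale([10, 1000, 20, 30]))
    # Hypothetical labels; "u2r" stands in for a rare class.
    truth = ["normal", "normal", "u2r", "u2r", "dos"]
    preds = ["normal", "normal", "normal", "u2r", "dos"]
    print(tp_fn_per_class(truth, preds))
```

Reporting false negatives per class, rather than overall accuracy, keeps a missed rare class visible even when the majority class is classified almost perfectly.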

A Study on the Adjustment of Posterior Probability for Oversampling when the Target is Rare (목표 범주가 희귀한 자료의 과대표본추출에 대한 연구)

  • Kim, U.N.;Lee, S.K.;Choi, J.H.
    • The Korean Journal of Applied Statistics / v.24 no.3 / pp.477-484 / 2011
  • When an event of the target variable is rare, a widespread strategy is to build a model on a sample that disproportionately over-represents the events, that is, an over-sampled one. Predicted values from a model built on data over-sampled from the original data set are biased; however, they can easily be corrected to represent the population. In this study, we investigate the relationship between the proportion of rare events in a data mart and model performance using real-world data from a Korean credit card company. We also apply two methods for adjusting the posterior probability for over-sampled data: the offset method and the weighted method. Finally, we compare the performance of the methods using real data sets.
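The correction described in this abstract can be sketched with the standard prior-correction formulas for a model trained on an over-sampled data set. This is a sketch under assumptions, not the paper's code: `pi1` (event rate in the population) and `rho1` (event rate in the over-sampled training data) are names chosen here, and the example rates are hypothetical.

```python
import math


def adjust_posterior(p_sample, pi1, rho1):
    """Rescale a probability predicted by a model fit on over-sampled data.

    pi1  : event rate in the population (assumed known)
    rho1 : event rate in the over-sampled training sample
    """
    w1 = pi1 / rho1                # weight for events
    w0 = (1 - pi1) / (1 - rho1)    # weight for non-events
    return p_sample * w1 / (p_sample * w1 + (1 - p_sample) * w0)


def intercept_offset(pi1, rho1):
    """Log-odds shift that over-sampling adds to a logistic model's intercept."""
    return math.log((rho1 * (1 - pi1)) / ((1 - rho1) * pi1))


def adjust_via_offset(p_sample, pi1, rho1):
    """Same correction, applied on the log-odds scale."""
    logit = math.log(p_sample / (1 - p_sample)) - intercept_offset(pi1, rho1)
    return 1 / (1 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical rates: 2% events in the population, 50% after over-sampling.
    print(adjust_posterior(0.5, pi1=0.02, rho1=0.5))
    print(adjust_via_offset(0.5, pi1=0.02, rho1=0.5))
```

For a logistic model the two functions give the same corrected probability. The ratios `w1` and `w0` are also the values commonly used as observation weights when the correction is applied during fitting instead of afterwards.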

Comparison of Bias Correction Methods for the Rare Event Logistic Regression (희귀 사건 로지스틱 회귀분석을 위한 편의 수정 방법 비교 연구)

  • Kim, Hyungwoo;Ko, Taeseok;Park, No-Wook;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.27 no.2 / pp.277-290 / 2014
  • We analyzed binary landslide data from the Boeun area with logistic regression. Since landslides occur in only 9 out of 5000 observations, the data can be regarded as rare event data. The main issue in logistic regression with rare event data is serious bias in the regression coefficient estimates. Two bias correction methods have been proposed previously, and we compared them quantitatively via simulation. Firth's (1993) approach performed better and provided the most stable results for analyzing rare-event binary data.
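To give a feel for what Firth's correction does, the sketch below treats only the simplest special case, an intercept-only logistic model, where the penalized-likelihood estimate has a closed form: half an event and half a non-event are added to the counts. This is an illustration using the 9-in-5000 figure from the abstract, not a reproduction of the paper's covariate model or simulation; the function names are assumptions.

```python
import math


def mle_rate(events, n):
    """Ordinary maximum-likelihood estimate of the event probability."""
    return events / n


def firth_rate_intercept_only(events, n):
    """Maximiser of Firth's penalised likelihood for an intercept-only model."""
    return (events + 0.5) / (n + 1)


def penalised_loglik(p, events, n):
    """Binomial log-likelihood plus the Jeffreys-prior penalty 0.5*log(n*p*(1-p))."""
    loglik = events * math.log(p) + (n - events) * math.log(1 - p)
    return loglik + 0.5 * math.log(n * p * (1 - p))


if __name__ == "__main__":
    events, n = 9, 5000
    print("MLE rate:  ", mle_rate(events, n))
    print("Firth rate:", firth_rate_intercept_only(events, n))
```

With covariates there is no closed form, and the penalized likelihood is maximized iteratively; the intercept-only case only shows the direction of the adjustment.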

Mining Association Rules on Significant Rare Data using Relative Support (상대 지지도를 이용한 의미 있는 희소 항목에 대한 연관 규칙 탐사 기법)

  • Ha, Dan-Shim;Hwang, Bu-Hyun
    • Journal of KIISE:Databases / v.28 no.4 / pp.577-586 / 2001
  • Data mining, which analyzes stored data to discover potential knowledge and information in large databases, has recently become a key topic in database research. In this paper, we study methods of discovering association rules, one of the data mining techniques. We propose a technique for discovering association rules that uses relative support to take into account significant rare data, that is, data with high relative support among some data. We then compare and evaluate existing methods and the proposed method with respect to discovering significant rare data.
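The abstract does not define relative support, so the sketch below assumes one plausible formulation: the support of an itemset divided by the support of each of its items, taking the largest ratio. The transactions and item names are hypothetical and serve only to show how a rare but tightly associated pair can score highly.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def relative_support(itemset, transactions):
    """Assumed definition: max over items of sup(itemset) / sup({item})."""
    joint = support(itemset, transactions)
    if joint == 0:
        return 0.0
    return max(joint / support([item], transactions) for item in itemset)


if __name__ == "__main__":
    # Hypothetical market-basket data with one rare pair.
    baskets = [
        {"bread", "milk"}, {"bread", "milk"}, {"bread"}, {"milk"},
        {"bread", "milk"}, {"bread"}, {"bread", "milk"}, {"bread"},
        {"caviar", "vodka"}, {"bread", "milk"},
    ]
    for pair in (["bread", "milk"], ["caviar", "vodka"]):
        print(pair, support(pair, baskets), relative_support(pair, baskets))
```

Under a plain minimum-support threshold the rare pair would be pruned, while a threshold on relative support retains it because its items almost always occur together.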

Rare Disaster Events, Growth Volatility, and Financial Liberalization: International Evidence

  • Bongseok Choi
    • Journal of Korea Trade / v.27 no.2 / pp.96-114 / 2023
  • Purpose - This paper elucidates a nexus between the occurrence of rare disaster events and the volatility of economic growth by distinguishing the likelihood of rare events from stochastic volatility. We provide new empirical facts based on a quarterly time series. In particular, we focus on the role of financial liberalization in spreading the economic crisis in developing countries. Design/methodology - We use quarterly data on consumption expenditure (real per capita consumption) from 44 countries, including advanced and developing countries, ending in the fourth quarter of 2020. We estimate the likelihood of rare event occurrences and stochastic volatility for countries using the Bayesian Markov chain Monte Carlo (MCMC) method developed by Barro and Jin (2021). We present our estimation results for the relationship between rare disaster events, stochastic volatility, and growth volatility. Findings - We find the global common disaster event, the COVID-19 pandemic, and thirteen country-specific disaster events. Consumption falls by about 7% on average in the first quarter of a disaster and by 4% in the long run. The occurrence of rare disaster events and the volatility of gross domestic product (GDP) growth are positively correlated (4.8%), whereas the rare events and GDP growth rate are negatively correlated (-12.1%). In particular, financial liberalization has played an important role in exacerbating the adverse impact of both rare disasters and financial market instability on growth volatility. Several case studies, including the case of South Korea, provide insights into the cause of major financial crises in small open developing countries, including the Asian currency crisis of 1998. Originality/value - This paper presents new empirical facts on the relationship between the occurrence of rare disaster events (or stochastic volatility) and growth volatility. Increasing data frequency allows for greater accuracy in assessing a country's specific risk. Our findings suggest that financial market and institutional stability can be vital for buffering against rare disaster shocks. It is necessary to preemptively strengthen the foundation for financial stability in developing countries and increase the quality of the information provided to markets.

Exome and genome sequencing for diagnosing patients with suspected rare genetic disease

  • Go Hun Seo;Hane Lee
    • Journal of Genetic Medicine / v.20 no.2 / pp.31-38 / 2023
  • Although a rare disease is defined in South Korea as one affecting fewer than 20,000 people, over 8,000 rare Mendelian disorders have been identified, and collectively they affect 6-8% of the global population. Many rare diseases pose significant challenges to patients, their families, and the healthcare system. The diagnostic journey for rare disease patients is often lengthy and arduous, hampered by the genetic diversity and phenotypic complexity of these conditions. With the advent of next-generation sequencing technology and the clinical implementation of exome sequencing (ES) and genome sequencing (GS), the diagnostic rate for rare diseases is 25-50%, depending on the disease category. This technology is also allowing more rapid discovery of new gene-disease associations and equipping us to practice precision medicine by offering tailored medical management plans, early intervention, and family planning options. However, a substantial number of patients remain undiagnosed, which could be due to several factors. Some may not have genetic disorders. Some may have disease-causing variants that are not detectable or interpretable by ES and GS. It is also possible that some patients have a disease-causing variant in a gene that has not yet been linked to a disease. For patients who remain undiagnosed, reanalysis of existing data has shown promise in providing new molecular diagnoses through new gene-disease associations, new variant discovery, and variant reclassification, leading to a 5-10% increase in the diagnostic rate. More advanced approaches such as long-read sequencing, transcriptome sequencing, and integration of multi-omics data may prove valuable in uncovering elusive genetic causes.

Quantitative Reliability Assessment for Safety Critical System Software

  • Chung, Dae-Won
    • Journal of Electrical Engineering and Technology / v.2 no.3 / pp.386-390 / 2007
  • Quantitative software reliability assessment has recently become an essential issue in replacing old analogue I&C with computer-based digital systems in nuclear power plants. Software reliability models have been successfully applied to many industrial applications, but have the unfortunate drawback of requiring data from which one can formulate a model. Software that is developed for safety critical applications is frequently unable to produce such data for at least two reasons. First, the software is frequently one-of-a-kind, and second, it rarely fails. Safety critical software is normally expected to pass every unit test, producing precious little failure data. The basic premise of the rare events approach is that well-tested software does not fail under normal routines and input signals, which means that failures must be triggered by unusual input data and computer states. The failure data found under reasonable test cases and testing time for these conditions should be considered in the quantitative reliability assessment. In this paper, we present a quantitative reliability assessment methodology for safety critical software for rare failure cases.

Algorithm mining Association Rules by considering Weight Support (중요지지도를 고려한 연관규칙 탐사 알고리즘)

  • Kim, Keun-Hyung;Whang, Byung-Woong;Kim, Min-Chul
    • The KIPS Transactions:PartD / v.11D no.3 / pp.545-552 / 2004
  • Association rule mining, one of the data mining technologies, searches a database for data that are frequent and related to each other. However, even data that are rare rather than frequent in the database have sufficient value as business information if they are important and strongly related to each other. In this paper, we propose an algorithm for discovering association rules that consist of data that are rare but important and strongly related to each other in a database. The proposed algorithm was evaluated through simulation. We found that it efficiently discovered association rules among data that are not frequent but important.
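The abstract does not say how importance enters the support measure. As a rough illustration only, the sketch below assumes a common weighted-support formulation in which an itemset's ordinary support is multiplied by the mean of user-assigned item weights; the weights, items, and transactions are all hypothetical.

```python
def weighted_support(itemset, transactions, weights):
    """Assumed definition: mean item weight times ordinary support."""
    items = set(itemset)
    frequency = sum(items <= t for t in transactions) / len(transactions)
    mean_weight = sum(weights[i] for i in items) / len(items)
    return mean_weight * frequency


if __name__ == "__main__":
    # Hypothetical importance weights in [0, 1], e.g. reflecting profit margin.
    weights = {"bread": 0.1, "milk": 0.1, "caviar": 0.9, "vodka": 0.9}
    baskets = [{"bread", "milk"}] * 5 + [{"bread"}] * 4 + [{"caviar", "vodka"}]
    for pair in (["bread", "milk"], ["caviar", "vodka"]):
        print(pair, weighted_support(pair, baskets, weights))
```

In this toy data the rare but heavily weighted pair scores higher than the frequent low-weight pair, which is the kind of itemset the abstract describes as not frequent but important.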

Parenting Stress and Guilty Feeling for Mothers Having Children with Rare Genetic Metabolic Diseases (희귀유전대사질환 아동 어머니의 양육 스트레스와 죄책감)

  • Kwon, Eun Kyung;Choi, Mi Hye;Kim, Su Kang
    • Journal of Korean Clinical Nursing Research / v.14 no.3 / pp.153-163 / 2008
  • Purpose: The purpose of this descriptive correlational study was to identify the extent to which mothers of children with rare genetic metabolic diseases (MPS, PWS) experience parenting stress and feelings of guilt. Method: This study used the PSI/SF (Abidin, 1995) and a Guilt Index devised for this study. Data were collected from 156 mothers from February to July 2006 using self-administered questionnaires. This study received approval from the IRB at S Hospital (IRB File No: 2006-02-014). Data were analyzed with descriptive statistics, t-tests, ANOVA, and correlation. Results: Mothers felt very high levels of parenting stress and guilt. Parenting stress was positively related to feelings of guilt. Conclusion: These findings could help in understanding the families of children with rare genetic metabolic diseases and provide basic information for developing effective counseling and education programs to relieve parenting stress and feelings of guilt. This study is significant in that it is the first research in Korea targeting the families of children with rare genetic metabolic diseases.

Collaboration through the Asia Pacific MPS Network (APMN), Asia Pacific MPS Registry (APMR), and Association for Research of MPS & Rare Diseases (ARMRD)

  • Cho, Sung Yoon
    • Journal of mucopolysaccharidosis and rare diseases / v.1 no.1 / pp.2-4 / 2015
  • Though the incidence of each rare disease, including mucopolysaccharidosis (MPS), is low, this is not the case if rare diseases are taken as a whole. Rare diseases often have genetic causes and vary in type. Their signs and symptoms also vary greatly by disease, making it difficult to make accurate diagnoses and conduct necessary research, which is why we believe it is a field that deserves more attention and research. It is important to establish an infrastructure of experts in each country and promote cooperation within the Asia-Pacific region in order to improve specialist training and communication. Given the need for a system of cooperation, the Asia Pacific MPS Network (APMN) was established by several MPS experts in South Korea, Japan, and Taiwan in January 2013. Thereafter, the Asia Pacific MPS Registry (APMR), an electronic remote data system, was established by the APMN. Then, the Association for Research of MPS & Rare Diseases (ARMRD), an academic society that supports research on MPS and other rare diseases, was established by President Dong-Kyu Jin in April 2015. The main task of the ARMRD is to support APMN-related work. The ARMRD published a uniform guideline that reflects the characteristics and circumstances of local patients through the Korean MPS Expert Council. The APMN, the APMR, and the annual Korean MPS Symposium are now supported by the ARMRD. Organizations like the APMN and APMR are necessary because international cooperation and collaboration are needed to conduct clinical trials on these diseases. ARMRD members hope to encourage interest among experts and researchers in MPS & rare diseases, as well as active participation in the research and treatment of patients suffering from rare diseases, including MPS, to ultimately improve the quality of life of the patients and their families.