• Title/Summary/Keyword: datasets


Assessment of Needs and Accessibility Towards Health Insurance Claims Data (연구를 위한 건강보험 청구자료 요구 및 이용 요인분석)

  • Lee, Jung-A;Oh, Ju-Hwan;Moon, Sang-Jun;Lim, Jun-Tae;Lee, Jin-Seok;Lee, Jin-Yong;Kim, Yoon
    • Health Policy and Management / v.21 no.1 / pp.77-92 / 2011
  • Objectives : This study examined health policy researchers' needs for, and their access to, health insurance claim datasets according to their academic capacity. Methods : An online questionnaire capturing relevant proxy variables for academic needs, accessibility, and research capacity was constructed based on previous studies. The survey was delivered to active health policy researchers through three major scholarly associations in South Korea. Seven hundred and one scholars responded while the survey was open for 12 days (starting on December 20th, 2010). Descriptive statistics and logistic regression analysis were carried out. Results : Regardless of the operational definition of need, the prevalent needs of survey respondents were not met by the current provision of claim data. Greater research capacity was correlated with increased demand for claim data, and there was also a positive correlation between research capacity and attempts to obtain claim datasets. Greater research capacity, however, was not necessarily correlated with better access to the claim data. Conclusions : The substantial unmet need for claim data among the health policy research community calls for establishing proactive institutions that systematically prepare and release public datasets and provide call-in services to facilitate proper handling of the data.
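
The analysis above comes down to descriptive statistics plus a logistic regression of data needs and access on research capacity. A minimal sketch of that kind of model is shown below, assuming a pandas DataFrame `survey` with hypothetical columns (`needs_claim_data` as a 0/1 outcome, `research_capacity` and two covariates as predictors); the variable names are illustrative and not taken from the original questionnaire.

```python
# Hedged sketch of a survey-style logistic regression; column names are
# hypothetical stand-ins for the paper's proxy variables.
import statsmodels.formula.api as smf

model = smf.logit(
    "needs_claim_data ~ research_capacity + years_experience + institution_type",
    data=survey,
).fit()
print(model.summary())   # exponentiate model.params to read the effects as odds ratios
```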

Fuzzy Cluster Analysis of Gene Expression Profiles Using Evolutionary Computation and Adaptive α-cut based Evaluation (진화연산과 적응적 α-cut 기반 평가를 이용한 유전자 발현 데이타의 퍼지 클러스터 분석)

  • Park Han-Saem;Cho Sung-Bae
    • Journal of KIISE:Software and Applications / v.33 no.8 / pp.681-691 / 2006
  • Clustering is one of the most widely used methods for grouping thousands of genes by the similarity of their expression levels, and it thereby helps to analyze gene expression profiles. It has been used to identify the functions of genes. Fuzzy clustering, one category of clustering, assigns a sample to multiple groups according to its degrees of membership. This approach is more appropriate for analyzing gene expression profiles because a single gene may be involved in multiple genetic functions. Clustering methods, however, are sensitive to initialization and can be trapped in local optima. To solve these problems, this paper proposes an evolutionary fuzzy clustering method in which an adaptive α-cut based evaluation is used as the fitness function, so that different criteria can be applied according to the characteristics of each dataset; this overcomes the limitation of the Bayesian validation method, which applies the same criterion to all datasets. We have conducted experiments with the SRBCT and yeast cell-cycle datasets and analyzed the results to confirm the usefulness of the proposed method.
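
For readers unfamiliar with the α-cut idea, the sketch below runs plain fuzzy c-means and then applies a fixed α-cut so that a gene may belong to several clusters at once. It assumes a NumPy array `X` of expression profiles (genes × conditions); the evolutionary search and the adaptive choice of α described in the paper are not reproduced here.

```python
# Minimal fuzzy c-means plus a fixed alpha-cut; a simplified illustration,
# not the paper's evolutionary algorithm.
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=X.shape[0])       # membership matrix (genes x clusters)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (dist ** (2.0 / (m - 1.0)))            # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

def alpha_cut_clusters(U, alpha=0.3):
    # Keep memberships above alpha; one gene may appear in several clusters.
    return [np.where(U[:, k] >= alpha)[0] for k in range(U.shape[1])]
```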

SINGLE PANORAMA DEPTH ESTIMATION USING DOMAIN ADAPTATION (도메인 적응을 이용한 단일 파노라마 깊이 추정)

  • Lee, Jonghyeop;Son, Hyeongseok;Lee, Junyong;Yoon, Haeun;Cho, Sunghyun;Lee, Seungyong
    • Journal of the Korea Computer Graphics Society / v.26 no.3 / pp.61-68 / 2020
  • In this paper, we propose a deep learning framework for predicting the depth map of a 360° panorama image. Previous works train networks on synthetic 360° panorama datasets due to the lack of real-world datasets. However, the synthetic nature of the data causes the features extracted by the networks to differ from those of real 360° panorama images, which inevitably leads previous methods to fail at depth prediction for real 360° panorama images. To address this gap, we use domain adaptation to learn features shared by real and synthetic panorama images. Experimental results show that our approach greatly improves the accuracy of depth estimation on real panorama images while achieving state-of-the-art performance on synthetic images.
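
One common way to realize the "shared features" idea described above is adversarial feature alignment with a gradient reversal layer (DANN-style). The sketch below assumes PyTorch modules `encoder`, `depth_head`, and a small discriminator `disc` whose input size matches the pooled feature dimension; the paper's actual architecture and losses are not specified here, so treat this as one plausible recipe rather than the authors' implementation.

```python
# One training step of adversarial domain adaptation for depth estimation.
# `encoder`, `depth_head`, and `disc` are assumed nn.Modules; feature maps are
# assumed to be 4-D (N, C, H, W). CPU tensors and a single optimizer for brevity.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) the gradient in backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def train_step(encoder, depth_head, disc, opt, syn_img, syn_depth, real_img, lam=0.1):
    f_syn, f_real = encoder(syn_img), encoder(real_img)   # shared feature extractor
    depth_loss = F.l1_loss(depth_head(f_syn), syn_depth)  # supervision only on synthetic data
    # Pool features and ask the discriminator to tell synthetic from real; the
    # reversed gradient pushes the encoder toward domain-invariant features.
    feats = torch.cat([f_syn, f_real]).mean(dim=(2, 3))
    logits = disc(GradReverse.apply(feats, lam))
    labels = torch.cat([torch.ones(f_syn.size(0), 1), torch.zeros(f_real.size(0), 1)])
    loss = depth_loss + F.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return depth_loss.item(), loss.item()
```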

A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

  • Nishanth, Kancherla Jonah;Ravi, Vadlamani
    • Journal of Information Processing Systems / v.9 no.4 / pp.633-650 / 2013
  • All the imputation techniques proposed so far in the literature are offline techniques, as they require a number of iterations to learn the characteristics of the data during training and they also consume a lot of computational time. Hence, these techniques are not suitable for applications that require imputation to be performed on demand and in near real time. This paper proposes a computational intelligence based architecture for online data imputation, as well as extended versions of an existing offline data imputation method. The proposed online imputation technique has two stages. In stage 1, the Evolving Clustering Method (ECM) is used to replace the missing values with cluster centers, as part of the local learning strategy. Stage 2 refines the resulting approximate values using a General Regression Neural Network (GRNN) as part of the global approximation strategy. The offline imputation techniques employ K-Means or K-Medoids in Stage 1 and a Multi-Layer Perceptron (MLP) or GRNN in Stage 2. Several experiments were conducted on 8 benchmark datasets and 4 bank-related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the results indicate that the difference between the best proposed offline imputation method, viz. K-Medoids+GRNN, and the proposed online imputation method, viz. ECM+GRNN, is statistically insignificant at a 1% level of significance. Consequently, the proposed online technique, being less expensive and faster, can be employed for imputation instead of the existing and proposed offline imputation techniques. This is the significant outcome of the study. Furthermore, GRNN in Stage 2 uniformly reduced MAPE values in both offline and online imputation methods on all datasets.
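
The two-stage idea (a local, cluster-based fill followed by a global, regression-based refinement) can be sketched as follows. This is a rough stand-in rather than the authors' code: K-Means replaces ECM, which has no standard library implementation, and the GRNN is written as a plain Nadaraya-Watson kernel regressor; `X` is assumed to be a NumPy matrix with NaNs marking missing entries.

```python
# Two-stage imputation sketch: cluster-center fill, then kernel-regression refinement.
import numpy as np
from sklearn.cluster import KMeans

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    # Nadaraya-Watson kernel regression, the core of a GRNN.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w @ y_train) / np.clip(w.sum(axis=1), 1e-12, None)

def impute(X, n_clusters=5, sigma=0.5):
    X = X.copy()
    miss = np.isnan(X)
    X_filled = np.where(miss, np.nanmean(X, axis=0), X)          # crude start for clustering
    # Stage 1 (local): replace missing entries with their cluster-center values.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_filled)
    X_stage1 = np.where(miss, km.cluster_centers_[km.labels_], X)
    # Stage 2 (global): refine each imputed column with a kernel regressor
    # trained on fully observed rows, using the remaining columns as inputs.
    complete = ~miss.any(axis=1)
    for j in range(X.shape[1]):
        rows = miss[:, j]
        if rows.any() and complete.any():
            other = np.delete(np.arange(X.shape[1]), j)
            X_stage1[rows, j] = grnn_predict(
                X_stage1[complete][:, other], X[complete, j],
                X_stage1[rows][:, other], sigma)
    return X_stage1
```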

A Constrained Optimum Match-filtering Method for Cross-equalization of Time-lapse Seismic Datasets (시간경과 탄성파 자료의 교차균등화를 위한 제약적 최적 맞춤필터링 방법)

  • Choi, Yun-Gyeong;Ji, Jun
    • Geophysics and Geophysical Exploration / v.15 no.1 / pp.23-32 / 2012
  • The comparison of time-lapse seismic datasets is the most popular method in reservoir monitoring, and extracting only the changes caused by changes in the reservoir is the essential step in this comparison. In this paper, conventional cross-equalization approaches and an enhanced optimized approach are tested and compared with each other. As conventional approaches, bandwidth equalization and phase rotation were tested in the frequency, time, and mixed domains, and their results were compared. To overcome the limitation of the conventional approaches, which lose high-frequency components, a new constrained optimum filtering method is proposed and tested. The new constrained filtering method broadens the bandwidth of the reservoir-change components by obtaining an optimized match filter.
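
At the heart of cross-equalization is a least-squares match filter that shapes the monitor survey toward the baseline. The sketch below shows the unconstrained, damped least-squares baseline for two 1-D traces `base` and `monitor` of equal length; the constrained formulation proposed in the paper is not reproduced here.

```python
# Damped least-squares match filter: find taps f so that (monitor * f) ~ base.
import numpy as np

def match_filter(monitor, base, n_taps=21, eps=1e-3):
    n = len(monitor)
    half = n_taps // 2
    # Build the convolution (design) matrix A so that A @ f is the filtered monitor trace.
    A = np.zeros((n, n_taps))
    for k in range(n_taps):
        shift = k - half
        if shift >= 0:
            A[shift:, k] = monitor[:n - shift]
        else:
            A[:n + shift, k] = monitor[-shift:]
    # Solve (A^T A + eps I) f = A^T base; eps stabilizes the inversion.
    f = np.linalg.solve(A.T @ A + eps * np.eye(n_taps), A.T @ base)
    return f, A @ f          # filter taps and the matched (cross-equalized) trace
```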

Enhancement of Tongue Segmentation by Using Data Augmentation (데이터 증강을 이용한 혀 영역 분할 성능 개선)

  • Chen, Hong;Jung, Sung-Tae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.5 / pp.313-322 / 2020
  • A large volume of data improves the robustness of deep learning models and helps avoid overfitting. In automatic tongue segmentation, the availability of annotated tongue images is often limited because collecting and labeling tongue image datasets is difficult in practice. Data augmentation can expand the training dataset and increase the diversity of the training data by using label-preserving transformations, without collecting new data. In this paper, augmented tongue image datasets were developed using seven augmentation techniques, such as image cropping, rotation, flipping, and color transformations. The performance of the data augmentation techniques was studied using state-of-the-art transfer learning models such as InceptionV3, EfficientNet, ResNet, and DenseNet. Our results show that geometric transformations lead to larger performance gains than color transformations, and that segmentation accuracy can be increased by 5% to 20% compared with no augmentation. Furthermore, a dataset augmented with a random linear combination of geometric and color transformations gives better segmentation performance than all other datasets, resulting in an accuracy of 94.98% with the InceptionV3 model.
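
For illustration, a minimal label-preserving augmentation routine for segmentation might look like the sketch below, assuming `image` (H×W×3, uint8) and `mask` (H×W) NumPy arrays. Geometric transforms are applied to both the image and the mask, while color jitter touches the image only; the seven techniques used in the paper may differ in detail.

```python
# Label-preserving augmentation sketch for (image, mask) pairs.
import numpy as np

def augment(image, mask, rng=None):
    rng = rng or np.random.default_rng()
    # Geometric: random horizontal/vertical flips and a random 90-degree rotation,
    # applied identically to image and mask.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]
    k = int(rng.integers(0, 4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Color: random brightness/contrast jitter on the image only.
    alpha = rng.uniform(0.8, 1.2)            # contrast factor
    beta = rng.uniform(-20, 20)              # brightness shift
    image = np.clip(alpha * image.astype(np.float32) + beta, 0, 255).astype(np.uint8)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```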

Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

  • Malhotra, Ruchika;Sharma, Anjali
    • Journal of Information Processing Systems / v.14 no.3 / pp.751-770 / 2018
  • Web applications are indispensable in the software industry and continuously evolve, either to meet new criteria or to include new functionality. However, despite quality assurance via testing, the presence of defects hinders straightforward development. Several factors contribute to defects, and minimizing them is often expensive in terms of man-hours. Thus, detecting fault proneness in the early phases of software development is important. Therefore, a fault prediction model for identifying fault-prone classes in a web application is highly desired. In this work, we compare 14 machine learning techniques to analyse the relationship between object-oriented metrics and fault prediction in web applications. The study is carried out using various releases of the Apache Click and Apache Rave datasets. En route to the predictive analysis, the input basis set for each release is first optimized using the filter-based correlation feature selection (CFS) method. It is found that the LCOM3, WMC, NPM, and DAM metrics are the most significant predictors. The statistical analysis of these metrics also conforms well with the CFS evaluation and affirms the role of these metrics in the defect prediction of web applications. The overall predictive ability of the different fault prediction models is first ranked using the Friedman technique and then statistically compared using Nemenyi post-hoc analysis. The results not only uphold the predictive capability of machine learning models for faulty classes in web applications, but also show that ensemble algorithms are the most appropriate for defect prediction in the Apache datasets. Further, we also derive a consensus between the metrics selected by the CFS technique and the statistical analysis of the datasets.
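
A simplified stand-in for the CFS-plus-model-comparison pipeline is sketched below, assuming a pandas DataFrame `X` of object-oriented metrics and a binary label vector `y` (faulty / not faulty). scikit-learn ships no CFS, so the filter here simply keeps features that correlate with the label and drops features highly correlated with an already-kept one; the Friedman/Nemenyi ranking step is omitted.

```python
# Correlation-based feature filter (a rough CFS stand-in) plus a small model comparison.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def simple_cfs(X, y, redundancy_cut=0.9):
    relevance = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1])).sort_values(ascending=False)
    kept = []
    for feat in relevance.index:
        if all(abs(X[feat].corr(X[k])) < redundancy_cut for k in kept):
            kept.append(feat)                 # relevant and not redundant with kept features
    return kept

selected = simple_cfs(X, y)
for name, model in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                    ("RandomForest", RandomForestClassifier(n_estimators=200))]:
    auc = cross_val_score(model, X[selected], y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```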

The Analysis of Flood in an Ungauged Watershed using Remotely Sensed and Geospatial Datasets (I) - Focus on Estimation of Flood Discharge - (원격탐사와 공간정보를 활용한 미계측 유역 홍수범람 해석에 관한 연구(I) - 홍수량 산정을 중심으로 -)

  • Son, Ahlong;Kim, Jongpil
    • Korean Journal of Remote Sensing / v.35 no.5_2 / pp.781-796 / 2019
  • This study attempted to simulate the flood discharge in the Duman River basin, which contains Hoeryong City and Musan County of North Korea and was damaged by Typhoon Lionrock in August 2016. For hydrological modelling, remotely sensed datasets were used to estimate watershed properties and hydrologic factors because the basin is ungauged, with hydrological observations absent or sparse. For validation, we applied our methodology and datasets to the Soyanggang Dam basin, which not only has a shape factor and compactness ratio similar to those of the target basin but also offers accurate, adequate, and abundant measurements. The results showed that the flood discharge from Typhoon Lionrock corresponded to three- to five-year design floods in the Duman River basin, which indicates that the basin faces a high risk of flooding in the near future. Finally, this study demonstrated that remotely sensed data and geographic information can be utilized to simulate flood discharge in an ungauged watershed.

The Changing Epidemiology of Gastroesophageal Reflux Disease: Are Patients Getting Younger?

  • Yamasaki, Takahisa;Hemond, Colin;Eisa, Mohamed;Ganocy, Stephen;Fass, Ronnie
    • Journal of Neurogastroenterology and Motility / v.24 no.4 / pp.559-569 / 2018
  • Background/Aims : Gastroesophageal reflux disease (GERD) is a common disease globally with increasing prevalence and, consequently, a greater burden on the healthcare system. Traditionally, GERD has been considered a disease of middle-aged and older people. Since risk factors for GERD affect a growing number of the adult population, concerns have been raised that increasingly younger people may develop GERD. We aim to determine whether the proportion of younger patients has increased among the GERD population. Methods : The incidence of GERD as well as several variables were evaluated during an 11-year period. Explorys was used to evaluate datasets at a universal level and within a healthcare system in northern Ohio to determine whether trends at the local level reflected those at the universal level. GERD patients were classified into 7 age groups (15-19, 20-29, 30-39, 40-49, 50-59, 60-69, and ≥70 years). Results : The proportion of patients with GERD increased in all age groups, except for those aged ≥70 years in the universal dataset (P < 0.001) and those aged ≥60 years in the healthcare system (P < 0.001). The greatest rise was seen in the 30-39 years group in both datasets (P < 0.001). Similarly, the proportion of GERD patients using proton pump inhibitors increased in all age groups except for those aged ≥70 years in both datasets (P < 0.001), with the greatest increase in the 30-39 years group (P < 0.001). Conclusion : Over the last decade, there has been a significant increase in the proportion of younger patients with GERD, especially those within the age range of 30-39 years.

DRAZ: SPARQL Query Engine for heterogeneous metadata sources (DRAZ : 이기종 메타 데이터 소스를 위한 SPARQL 쿼리 엔진)

  • Qudus, UMAIR;Hossain, Md Ibrahim;Lee, ChangJu;Khan, Kifayat Ullah;Won, Heesun;Lee, Young-Koo
    • Database Research / v.34 no.3 / pp.69-85 / 2018
  • Many studies have proposed federated query engines that query several homogeneous or heterogeneous datasets simultaneously, which significantly improves the quality of query results. Existing techniques allow querying over only a few heterogeneous datasets, relying on static binding and non-standard queries. However, we observe that a system that integrates heterogeneous metadata standards offers a better opportunity to generalize queries over any homogeneous or heterogeneous dataset. In this paper, we propose a transparent federated engine (DRAZ) to query multiple data sources using SPARQL. In our system, we first develop an ontology for a non-RDF metadata standard based on the metadata kernel dictionary elements, which are standardized by the metadata provider. For a given SPARQL query, we translate each triple pattern into an API call to access the dataset of the corresponding non-RDF metadata standard. We convert the results of every API call to N-Triples and summarize the final results considering all triple patterns. We evaluated DRAZ using modified FedBench benchmark queries over heterogeneous metadata standards such as DCAT and DOI. We observed that DRAZ achieves 70 to 100 percent correctness of the results despite the unavailability of JOIN operations.
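
The core idea, mapping records from a non-RDF metadata API into triples and answering a single SPARQL query over the combined graph, can be sketched with rdflib as below. The DCAT catalog file name, the record fields, and the example DOI are hypothetical placeholders; DRAZ's actual API translation layer is more involved than this.

```python
# Sketch: merge native RDF metadata with triples generated from a non-RDF
# source, then run one SPARQL query over the combined graph (rdflib).
from rdflib import Graph, Literal, Namespace, URIRef

DCT = Namespace("http://purl.org/dc/terms/")

def records_to_graph(records):
    """Convert JSON-like metadata records (e.g., from a DOI API) into triples."""
    g = Graph()
    for rec in records:
        subj = URIRef(rec["id"])
        g.add((subj, DCT.title, Literal(rec["title"])))
        g.add((subj, DCT.issued, Literal(rec["year"])))
    return g

combined = Graph()
combined.parse("dcat_catalog.ttl", format="turtle")     # assumed local DCAT dump
combined += records_to_graph([{"id": "https://doi.org/10.1000/example",
                               "title": "Example dataset", "year": "2018"}])

query = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?s ?title WHERE { ?s dct:title ?title . }
"""
for row in combined.query(query):
    print(row.s, row.title)
```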