• Title/Abstract/Keywords: Classification Database


Movie Popularity Classification Based on Support Vector Machine Combined with Social Network Analysis

  • Dorjmaa, Tserendulam;Shin, Taeksoo
    • 한국IT서비스학회지 / Vol. 16, No. 3 / pp. 167-183 / 2017
  • The rapid growth of information technology and mobile service platforms (e.g., the internet, Google, and Facebook) has led to an abundance of data. As a result, the world now faces a revolution in how data are searched, collected, stored, and shared. This abundance of data creates many opportunities for knowledge discovery and data mining. In recent years, data mining methods for discovering and extracting knowledge from databases have become increasingly popular in e-commerce services, particularly movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actors, director, rating score, language, and country. In this study, we propose a support vector machine (SVM) classification model that predicts movie popularity from movies' genre data and social network data. Social network analysis (SNA) is used to improve classification accuracy. We build a movie network (a one-mode network) from the initial data, which form a two-mode user-to-movie network. In this movie network, we compute four centrality measures: degree, betweenness, closeness, and eigenvector centrality. These four centrality values, together with the movies' genre data, are used to classify movie popularity. Logistic regression, a neural network, a naïve Bayes classifier, and a decision tree serve as benchmark models for comparison with the proposed model. To assess classification accuracy, this study uses the open MovieLens dataset. Our empirical results indicate that the proposed model using genre and centrality data achieves approximately 0% higher accuracy than other classification models that use only genre data.
The results imply that the proposed model can improve the accuracy of movie popularity classification.
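The movie network above is a one-mode projection of a two-mode user-to-movie network. The sketch below is minimal and rests on two assumptions. First, ratings arrive as (user, movie) pairs. Second, two movies are linked when at least one user rated both, with the edge weighted by the number of shared users; the abstract does not state the exact linking rule. The sketch builds the projection and computes normalized degree centrality, the first of the four measures. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_to_movies(ratings):
    """Project a two-mode (user, movie) network onto movies.

    Assumed rule: two movies are linked if at least one user rated both;
    the edge weight counts the shared users.
    """
    movies_by_user = defaultdict(set)
    for user, movie in ratings:
        movies_by_user[user].add(movie)

    movies = set()
    weights = defaultdict(int)
    for rated in movies_by_user.values():
        movies |= rated
        for a, b in combinations(sorted(rated), 2):
            weights[(a, b)] += 1
    return movies, dict(weights)


def degree_centrality(movies, weights):
    """Normalized degree: number of neighbouring movies divided by (n - 1)."""
    neighbours = {m: set() for m in movies}
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(movies)
    return {m: (len(nb) / (n - 1) if n > 1 else 0.0) for m, nb in neighbours.items()}


# Toy ratings (made up): movies A and B share two users, B and C share one.
ratings = [("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C"), ("u3", "A"), ("u3", "B")]
movies, weights = project_to_movies(ratings)
print(weights)  # {('A', 'B'): 2, ('B', 'C'): 1}
print(degree_centrality(movies, weights))
```

For the remaining measures, a graph library such as NetworkX provides `betweenness_centrality`, `closeness_centrality`, and `eigenvector_centrality`, so hand-rolling them is rarely necessary.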

A Study on Planning & Implementation of the Meta Database System for Ocean Electronic Resources

  • 한종엽
    • 한국도서관정보학회지 / Vol. 33, No. 2 / pp. 109-137 / 2002
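  • To design and implement a meta database system for marine electronic information resources, related domestic and international studies were surveyed and analyzed. The study covered marine web resources up to and including ocean survey data. The purpose of this study is to provide an efficient information retrieval service for marine electronic information resources based on Dublin Core, which is well suited to describing network resources. This paper surveys marine electronic information sources, analyzes metadata description elements, and proposes a metadata classification scheme, a system configuration, and an approach to implementing search.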

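Dublin Core, the basis of the proposed retrieval service, defines 15 core elements. The sketch below is minimal and assumes a flat in-memory record store. It shows how a resource description limited to those elements might be validated and then searched by keyword. The sample records and the `make_record` and `search` helpers are hypothetical, not taken from the paper.

```python
# The 15 elements of the Dublin Core Metadata Element Set (version 1.1).
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_record(**fields):
    """Build a metadata record, rejecting names outside Dublin Core."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return fields


def search(records, keyword, elements=("title", "subject", "description")):
    """Return records whose chosen elements contain the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [r for r in records
            if any(kw in str(r.get(e, "")).lower() for e in elements)]


# Made-up records for illustration.
records = [
    make_record(title="Coastal current survey data", subject="oceanography",
                type="Dataset", coverage="East Sea"),
    make_record(title="Port statistics", subject="shipping", type="Text"),
]
print([r["title"] for r in search(records, "ocean")])  # ['Coastal current survey data']
```

Validating element names at record-creation time keeps every description within the shared vocabulary. That shared vocabulary is what makes Dublin Core suitable for describing heterogeneous network resources.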

Data-processing pipeline and database design for integrated analysis of mycoviruses

  • Je, Mikyung;Son, Hyeon Seok;Kim, Hayeon
    • International Journal of Advanced Smart Convergence / Vol. 8, No. 3 / pp. 115-122 / 2019
  • Recent and ongoing discoveries of mycoviruses with new properties demand an appropriate research infrastructure for analyzing their evolution and classification. In particular, the discovery of negative-sense single-stranded RNA mycoviruses is noteworthy, because double-stranded RNA and positive-sense single-stranded RNA viruses had predominated among known mycovirus genome types. In addition, some genomic properties of mycoviruses are of particular interest because they have been reported to resemble those of pathogenic virus families that infect humans and animals. Genetic information on mycoviruses continues to accumulate in public repositories. However, these databases have difficulty reflecting the latest taxonomic information and providing specialized data for mycoviruses. Therefore, in this study, we developed a bioinformatics-based pipeline to use this genetic information efficiently. We also designed a schema for data processing and database construction, and an algorithm that keeps the taxonomic information of mycoviruses up to date. The pipeline and database (termed 'mycoVDB') presented in this study are expected to serve as useful foundations for improving the accuracy and efficiency of future research on mycoviruses.
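The abstract mentions an algorithm that keeps mycovirus taxonomy current but does not describe it. The sketch below is an assumption, not the authors' method. It shows one common approach: resolve each stored taxon ID through a table of merged (retired → replacement) IDs, in the spirit of the `merged.dmp` file distributed with the NCBI Taxonomy dump. The IDs, accessions, and helper names are made up.

```python
def resolve_taxon(tax_id, merged):
    """Follow merged-ID links until reaching an ID that is not retired.

    `merged` maps a retired taxon ID to the ID that replaced it. A visited
    set guards against accidental cycles in the mapping.
    """
    seen = set()
    while tax_id in merged:
        if tax_id in seen:
            raise ValueError(f"cycle in merged IDs at {tax_id}")
        seen.add(tax_id)
        tax_id = merged[tax_id]
    return tax_id


def update_records(records, merged):
    """Return copies of the records with every taxon ID brought up to date."""
    return [{**rec, "tax_id": resolve_taxon(rec["tax_id"], merged)} for rec in records]


# Hypothetical IDs: 100 was merged into 200, which was later merged into 300.
merged = {100: 200, 200: 300}
records = [{"accession": "SEQ1", "tax_id": 100}, {"accession": "SEQ2", "tax_id": 400}]
print(update_records(records, merged))
# [{'accession': 'SEQ1', 'tax_id': 300}, {'accession': 'SEQ2', 'tax_id': 400}]
```

Following chains matters because a taxon can be merged more than once between database refreshes.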

On Feasibility of Ambulatory KDRGs for the Classification of Health Insurance Claims

  • 박하영;박기동;신영수
    • 보건행정학회지 / Vol. 13, No. 1 / pp. 98-115 / 2003
  • Concerns about growing health insurance expenditures became a national issue in 2001, when the National Health Insurance went into deficit. Increased spending on ambulatory care accounted for the largest portion of the problem. Methods and systems to control this spending need to be developed, and a system for measuring providers' case mix is one of the core components of such a control system. The objectives of this article are to examine the feasibility of applying Korean Diagnosis Related Groups (KDRGs) to classify health insurance claims for ambulatory care and to identify problem areas in the classification. A database of 11,586,270 ambulatory care claims from January 2002 was obtained. After KDRG numbers were assigned and records with an error KDRG were excluded, 8,319,494 claims were analyzed. The unit of analysis was a claim. Resource use was measured as the sum of charges incurred during a month at a hospital department or at a clinic. Within-group variance was assessed with the coefficient of variation (CV), and classification accuracy was evaluated by the variance reduction achieved by the KDRG classification. The analyses were performed on all data and on non-outlier data, and were repeated on a subset of the database to examine the validity of the results. The data were assigned to 787 of the 1,244 KDRGs defined in the classification system. For non-outlier data, 77.4% of KDRGs had a CV of charges below 100% for tertiary care hospitals, compared with 95.43% for clinics. For non-outlier claims, the variance reduction achieved by the KDRG classification was 40.80% for tertiary care hospitals, 51.98% for general hospitals, 40.89% for hospitals, and 54.99% for clinics. The analyses of the subset yielded similar results.
The results indicate that KDRGs developed for classifying inpatient care could be used for ambulatory care, although some areas of the classification need refinement. Their power to predict resource utilization suggests that they could be applied to measure providers' case mix for monitoring and managing the delivery of ambulatory care. The quality of diagnostic information in insurance claims still needs to be improved, and future studies of other classification systems based on visits or episodes are warranted.
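The two statistics used to evaluate KDRGs can be stated compactly:

- **Within-group coefficient of variation:** CV = s / mean.
- **Variance reduction:** 1 − SS_within / SS_total, the share of the total variance in charges explained by the grouping.

The sketch below is minimal and uses made-up charges. It assumes the sample standard deviation, because the abstract does not say which definition the authors used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def variance_reduction(groups):
    """1 - SS_within / SS_total for a dict mapping group -> list of charges."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_within = 0.0
    for vals in groups.values():
        m = mean(vals)  # deviations are measured from each group's own mean
        ss_within += sum((v - m) ** 2 for v in vals)
    return 1 - ss_within / ss_total


# Made-up monthly charges for claims in two hypothetical groups.
groups = {"G1": [10, 12, 14], "G2": [30, 34, 38]}
for g, vals in groups.items():
    print(g, round(coefficient_of_variation(vals), 3))  # G1 0.167, then G2 0.118
print(round(variance_reduction(groups), 3))             # 0.948
```

A grouping that separates low-charge from high-charge claims well leaves little variance inside each group. Its variance reduction therefore approaches 1, which is why the paper reads higher values as better classification accuracy.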

An Integrated Ontological Approach to Effective Information Management in Science and Technology

  • 정영미;김명옥;이재윤;한승희;유재복
    • 정보관리학회지 / Vol. 19, No. 1 / pp. 135-161 / 2002
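  • We designed a model of a multifunctional, multilingual integrated concept system for science and technology. The model lets major indexing and retrieval tools, such as science and technology classification schemes, thesauri, and terminology dictionaries, be built and used in an integrated way in three languages: Korean, English, and Japanese. This study developed a thesaurus model that uses the concept as its basic unit. It also specified the mandatory elements of the terminology dictionary records linked to the thesaurus, based on the ISO 12620 standard. In addition, a standard classification scheme for science and technology was prepared. Mapping tables to existing general classification schemes were created so that concepts can also be accessed through those other schemes. Using the integrated concept system developed in this study, a prototype system for the nuclear energy field was built, and actual retrieval examples were presented.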
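In a concept-based thesaurus, the unit of description is a language-independent concept to which terms in each language attach. Mapping tables then link a concept's class in one scheme to classes in another. The sketch below is minimal. The concept IDs, class codes (`NE-01`, `ALT-042`), and field names are hypothetical, and ISO 12620 data categories are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept with one preferred term per language."""
    concept_id: str
    terms: dict                                   # language code -> term
    broader: list = field(default_factory=list)   # IDs of broader concepts
    class_code: str = ""                          # code in the standard scheme


# Hypothetical mapping table: standard-scheme code -> code in another scheme.
CLASS_MAP = {"NE": "ALT-040", "NE-01": "ALT-042"}

concepts = {
    "C0": Concept("C0", {"ko": "원자력공학", "en": "nuclear engineering", "ja": "原子力工学"},
                  class_code="NE"),
    "C1": Concept("C1", {"ko": "원자로", "en": "nuclear reactor", "ja": "原子炉"},
                  broader=["C0"], class_code="NE-01"),
}


def find_by_term(term):
    """Concepts whose term in any language matches exactly."""
    return [c for c in concepts.values() if term in c.terms.values()]


def translate(term, target_lang):
    """Terms for the same concept(s) in another language."""
    return [c.terms.get(target_lang) for c in find_by_term(term)]


def mapped_code(concept_id):
    """Reach the concept through the other scheme via the mapping table."""
    return CLASS_MAP.get(concepts[concept_id].class_code)


print(translate("원자로", "en"))  # ['nuclear reactor']
print(mapped_code("C1"))          # ALT-042
```

Because all terms hang off the concept, a query entered in any of the three languages resolves to the same node. From there, broader concepts and other schemes' codes are one lookup away.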

Multiple octave-band based genre classification algorithm for music recommendation

  • 임신철;장세진;이석필;김무영
    • 한국정보통신학회논문지 / Vol. 15, No. 7 / pp. 1487-1494 / 2011
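  • This paper proposes a new genre classification algorithm for music recommendation. Octave-based spectral contrast (OSC) is one of the feature vectors used in genre classification algorithms. To improve its performance, a new band-pass filter was designed based on a psychoacoustic model and the octave ranges used by individual instruments. In 10-fold cross-validation experiments on the GTZAN database, which contains music from 10 genres, the multiple octave-band OSC achieved a recognition rate 2.26% higher than the conventional OSC. Experiments with a combined feature vector that adds the conventional mel-frequency cepstral coefficients (MFCC) also yielded an improved recognition rate.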
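Octave-based spectral contrast, the feature the paper refines, summarizes each sub-band of a magnitude spectrum by the difference between two values:

- a "peak": the log mean of the strongest bins;
- a "valley": the log mean of the weakest bins.

The sketch below is a minimal version of the conventional per-frame computation. The band edges, the neighbourhood fraction `alpha = 0.2`, and the toy spectrum are assumptions. The paper's psychoacoustic multi-octave filter bank is not reproduced.

```python
import math


def spectral_contrast(magnitudes, band_edges, alpha=0.2, eps=1e-10):
    """Per-band (peak, valley, contrast) for one frame.

    magnitudes: non-negative spectrum values for one frame.
    band_edges: bin indices [b0, b1, ..., bK]; band k covers bins b_k .. b_{k+1}-1.
    alpha: fraction of a band's bins averaged for the peak and for the valley.
    """
    features = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        band = sorted(magnitudes[lo:hi])
        k = max(1, round(alpha * len(band)))  # at least one bin per estimate
        valley = math.log(sum(band[:k]) / k + eps)
        peak = math.log(sum(band[-k:]) / k + eps)
        features.append((peak, valley, peak - valley))
    return features


# Toy spectrum split into octave-style bands that double in width: 1, 2, 4, 8 bins.
spectrum = [1.0,                                      # band 0
            4.0, 2.0,                                 # band 1
            8.0, 1.0, 1.0, 16.0,                      # band 2
            2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 32.0]  # band 3
edges = [0, 1, 3, 7, 15]
print([round(c, 3) for _, _, c in spectral_contrast(spectrum, edges)])
# [0.0, 0.693, 2.773, 2.833]
```

A band with one dominant partial over a quiet background (bands 2 and 3 here) yields a large contrast, while a flat band yields zero. This is the harmonic-versus-noise distinction that OSC is meant to capture.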

A Conceptual Model for Automated Cost Estimating Using Work Information Classification System of Apartment House

  • Lee, Yang Kyu;Park, Hong Tae
    • 한국재난정보학회 논문집 / Vol. 10, No. 1 / pp. 15-24 / 2014
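  • This study presents a work information classification system for apartment houses. The system can organize every construction management task involved in planning and controlling a project, including decomposition in the design process, assembly in the construction process, and cost estimating. The study also presents a method for building this classification system as a relational database organized by work sequence. Based on the resulting database, it develops a conceptual model of an automated cost estimating system. The model addresses the inadequacy that was a fundamental problem of existing cost estimating systems. It is expected to serve as a scientific cost estimating system that can be applied effectively at apartment construction sites.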
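Storing a work information classification as a relational database and deriving cost estimates from it can be illustrated with Python's built-in `sqlite3` module. Everything in the sketch below is hypothetical: the table names, columns, classification codes, quantities, and unit prices. The abstract does not give the paper's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work_item (
    code      TEXT PRIMARY KEY,   -- hypothetical classification code
    name      TEXT NOT NULL,
    seq       INTEGER NOT NULL,   -- position in the work sequence
    unit      TEXT NOT NULL,
    unit_cost REAL NOT NULL       -- cost per unit
);
CREATE TABLE quantity (
    code     TEXT REFERENCES work_item(code),
    building TEXT NOT NULL,
    qty      REAL NOT NULL
);
""")
conn.executemany("INSERT INTO work_item VALUES (?, ?, ?, ?, ?)", [
    ("A01", "Formwork", 1, "m2", 25.0),
    ("A02", "Rebar", 2, "t", 900.0),
    ("A03", "Concrete", 3, "m3", 80.0),
])
conn.executemany("INSERT INTO quantity VALUES (?, ?, ?)", [
    ("A01", "Bldg-101", 1200.0),
    ("A02", "Bldg-101", 40.0),
    ("A03", "Bldg-101", 350.0),
])

# Estimate = quantity x unit cost per work item, listed in work-sequence order.
rows = conn.execute("""
    SELECT w.seq, w.name, q.qty * w.unit_cost AS cost
    FROM quantity q JOIN work_item w ON q.code = w.code
    WHERE q.building = ?
    ORDER BY w.seq
""", ("Bldg-101",)).fetchall()
total = sum(cost for _, _, cost in rows)
print(rows)   # [(1, 'Formwork', 30000.0), (2, 'Rebar', 36000.0), (3, 'Concrete', 28000.0)]
print(total)  # 94000.0
```

Keeping unit costs on the classification table and quantities in a separate table means an estimate is just a join and a sum. When prices change, every estimate is updated without re-entering quantities.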

An Adjustment for a Regional Incongruity in Global Land Cover Map: case of Korea

  • Park Youn-Young;Han Kyung-Soo;Yeom Jong-Min;Suh Yong-Cheol
    • 대한원격탐사학회지 / Vol. 22, No. 3 / pp. 199-209 / 2006
  • The Global Land Cover 2000 (GLC 2000) project, one of the most recent efforts of its kind, aims to provide a harmonized land cover database for the year 2000 over the whole globe. The classifications were performed at continental or regional scales by the corresponding organizations, using data from the VEGETATION sensor onboard the SPOT4 satellite. The global land cover classification for Asia provided by Chiba University showed good accuracy across Asia as a whole, but some problems were detected in the Korean region. A new land cover database for Korea built from more recent data is therefore strongly needed. This study develops a new, upgraded land cover map of Korea at 1 km resolution. It combines the principal components transformation with K-means clustering, a widely used unsupervised classification technique that classifies land surface patterns with a distance function. The study is based on data sets from the Earth observing system SPOT4/VEGETATION. To assess how well the classification performed, the newly classified land cover was compared with GLC 2000 for the Korean peninsula using a confusion matrix.
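Comparing a new classification with a reference map such as GLC 2000 through a confusion matrix amounts to cross-tabulating the two labels pixel by pixel. The sketch below is minimal and uses made-up class labels and pixel values. It computes the matrix and the overall agreement; the abstract does not say which summary statistics the authors reported, and the K-means step itself is not shown.

```python
from collections import Counter


def confusion_matrix(reference, predicted, classes):
    """Rows = reference class, columns = predicted class."""
    counts = Counter(zip(reference, predicted))
    return [[counts[(r, p)] for p in classes] for r in classes]


def overall_accuracy(matrix):
    """Share of pixels on the diagonal, where the two maps agree."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree / total


# Made-up labels for six pixels.
classes = ["forest", "cropland", "urban"]
reference = ["forest", "forest", "cropland", "urban", "cropland", "forest"]
predicted = ["forest", "cropland", "cropland", "urban", "cropland", "forest"]
m = confusion_matrix(reference, predicted, classes)
print(m)                    # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
print(overall_accuracy(m))  # 0.8333333333333334
```

Off-diagonal cells show which classes are confused. In this toy example, one forest pixel is labelled cropland. That per-class view is what exposes regional problems like the ones reported for Korea.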

R Wave Detection and Advanced Arrhythmia Classification Method through QRS Pattern Considering Complexity in Smart Healthcare Environments

  • 조익성
    • 디지털산업정보학회논문지 / Vol. 17, No. 1 / pp. 7-14 / 2021
  • With increasing attention to healthcare and the management of heart disease, smart healthcare services and related devices have recently been under active development. The R wave is the largest and most representative component of the ECG signal. R wave detection is very important because it underpins QRS pattern detection and arrhythmia classification. Several R wave detection algorithms with different features have been proposed. The remaining problem is implementing them on low-cost portable platforms for real-time applications. In this paper, we propose R wave detection based on an optimal threshold, and arrhythmia classification through the QRS pattern, taking complexity into account in smart healthcare environments. To this end, we removed noise from the ECG signal through preprocessing and then detected R waves. We also classify premature ventricular contraction (PVC) arrhythmia in real time through the QRS pattern. The performance of R wave detection and PVC classification is evaluated using 9 records from the MIT-BIH arrhythmia database that contain more than 30 premature ventricular contractions. The method achieved an average of 98.72% in R wave detection and 94.28% in PVC classification.
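In its simplest form, threshold-based R wave detection marks a sample as an R peak when three conditions hold:

1. it exceeds an amplitude threshold;
2. it is a local maximum;
3. it lies outside a refractory period after the previous detection.

The sketch below shows only this generic idea on a synthetic, already noise-free signal. The threshold fraction (0.6 of the maximum) and the 200 ms refractory period are assumptions. The 360 Hz sampling rate matches the MIT-BIH records. The paper's optimal-threshold rule is not described in the abstract and is not reproduced here.

```python
def detect_r_peaks(signal, fs, threshold_ratio=0.6, refractory_s=0.2):
    """Return sample indices of R peaks in a preprocessed ECG signal.

    A sample counts as a peak if it is at least threshold_ratio * max(signal),
    is a local maximum, and falls at least refractory_s seconds after the
    previous detected peak.
    """
    threshold = threshold_ratio * max(signal)
    refractory = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        x = signal[i]
        if x < threshold or x < signal[i - 1] or x < signal[i + 1]:
            continue
        if peaks and i - peaks[-1] < refractory:
            continue
        peaks.append(i)
    return peaks


# Synthetic signal: an R spike every 288 samples (0.8 s at 360 Hz) plus a
# smaller T wave 100 samples later that stays below the threshold.
fs = 360
signal = [0.0] * 1000
for r in (100, 388, 676, 964):
    signal[r - 1], signal[r], signal[r + 1] = 0.4, 1.0, 0.4
    if r + 100 < len(signal):
        signal[r + 100] = 0.3
peaks = detect_r_peaks(signal, fs)
print(peaks)  # [100, 388, 676, 964]
```

The refractory period stops a single broad QRS complex from being counted twice. The amplitude threshold rejects the lower P and T waves. Together they keep the per-sample cost to a few comparisons, in line with the paper's concern for low-cost, real-time platforms.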

Validation of Administrative Big Database for Colorectal Cancer Searched by International Classification of Disease 10th Codes in Korean: A Retrospective Big-cohort Study

  • Hwang, Young-Jae;Kim, Nayoung;Yun, Chang Yong;Yoon, Hyuk;Shin, Cheol Min;Park, Young Soo;Son, Il Tae;Oh, Heung-Kwon;Kim, Duck-Woo;Kang, Sung-Bum;Lee, Hye Seung;Park, Seon Mee;Lee, Dong Ho
    • Journal of Cancer Prevention / Vol. 23, No. 4 / pp. 183-190 / 2018
  • Background: As the number of big-cohort studies increases, validation becomes increasingly important. We aimed to validate an administrative database in which colorectal cancer (CRC) is identified by International Classification of Disease 10th revision (ICD-10) codes. Methods: A big cohort was collected from the Clinical Data Warehouse of Seoul National University Bundang Hospital, using ICD-10 codes recorded from May 1, 2003 to November 30, 2016. Patients in the study group had been diagnosed with cancer and were recorded with the ICD-10 code for CRC by the National Health Insurance Service. Subjects with codes for inflammatory bowel disease or tuberculous colitis were selected for the control group. To verify the accuracy of the registered CRC codes (C18-21), two reviewers examined the charts, imaging results, and pathologic findings. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for CRC were calculated. Results: A total of 6,780 subjects with CRC and 1,899 control subjects were enrolled. Of these, 22 subjects had no evidence of CRC on colonoscopy, computed tomography, magnetic resonance imaging, or positron emission tomography. The sensitivity and specificity of the hospitalization data for identifying CRC were 100.00% and 98.86%, respectively. PPV and NPV were 99.68% and 100.00%, respectively. Conclusions: The big-cohort database using ICD-10 codes for CRC appears to be accurate.
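The four validity measures follow directly from a 2×2 table of code status against chart review. The worked illustration below rests on one reading of the abstract's counts, which is assumed here rather than stated by the authors:

- the 22 CRC-coded subjects without evidence of CRC are false positives;
- the 1,899 controls are true negatives;
- there are no false negatives.

This reading gives a PPV of about 99.68% and a specificity of about 98.85%, close to the reported 98.86%. The exact counts used in the paper may differ.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed reading of the abstract: 6,780 coded cases, 22 without CRC evidence,
# 1,899 controls, and no missed cancers.
tp, fp = 6780 - 22, 22
tn, fn = 1899, 0
for name, value in diagnostic_metrics(tp, fp, tn, fn).items():
    print(f"{name}: {value:.2%}")
```

PPV and specificity draw on different margins of the table. PPV depends on the 6,780 coded cases, while specificity depends on the 1,922 code-negative plus false-positive subjects. The same 22 misclassified records therefore shift specificity more than PPV.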