• Title/Summary/Keyword: Classification Database

Movie Popularity Classification Based on Support Vector Machine Combined with Social Network Analysis

  • Dorjmaa, Tserendulam;Shin, Taeksoo
    • Journal of Information Technology Services
    • /
    • v.16 no.3
    • /
    • pp.167-183
    • /
    • 2017
  • The rapid growth of information technology and mobile service platforms (e.g., the internet, Google, and Facebook) has led to an abundance of data. In this environment, the world is facing a revolution in the way data is searched, collected, stored, and shared. The abundance of data creates opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting knowledge from databases have become more popular in e-commerce service fields, in particular movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actor, director, rating score, language, and country. In this study, we propose a classification-based support vector machine (SVM) model for predicting movie popularity based on movies' genre data and social network data. Social network analysis (SNA) is used to improve classification accuracy. This study builds a movie network (a one-mode network) from initial data in the form of a two-mode user-to-movie network. For the proposed method, we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movie network. These four centrality values and the movies' genre data were used to classify movie popularity. Logistic regression, a neural network, a naïve Bayes classifier, and a decision tree were also used as benchmark models for comparison with the performance of our proposed model. To assess classification accuracy, this study used MovieLens data as an open database. Our empirical results indicate that our proposed model, which uses movies' genre and centrality data, has approximately 0% higher accuracy than other classification models that use only movies' genre data.
These results imply that our proposed model can be used to improve movie popularity classification accuracy.
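
The projection from a two-mode user-to-movie network to a one-mode movie network, and one of the four centrality measures named in the abstract, can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the linking rule (two movies are connected when at least one user rated both), the function names, and the toy `ratings` data are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def project_to_movie_network(user_movies):
    """Project a two-mode user-to-movie network onto a one-mode movie network.

    Assumed rule: two movies are linked when at least one user rated both.
    Returns a dict mapping each movie to the set of its neighbouring movies.
    """
    neighbors = defaultdict(set)
    for movies in user_movies.values():
        unique = sorted(set(movies))
        for movie in unique:
            neighbors[movie]  # touch the key so isolated movies are kept
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def degree_centrality(neighbors):
    """Normalized degree centrality: degree divided by (n - 1)."""
    n = len(neighbors)
    if n <= 1:
        return {movie: 0.0 for movie in neighbors}
    return {movie: len(adj) / (n - 1) for movie, adj in neighbors.items()}


# Hypothetical toy data: user id -> movies that user rated.
ratings = {
    "u1": ["A", "B"],
    "u2": ["B", "C"],
    "u3": ["D"],
}

movie_net = project_to_movie_network(ratings)
centrality = degree_centrality(movie_net)
for movie in sorted(centrality):
    print(movie, round(centrality[movie], 3))
```

In a setup like the one the abstract describes, values such as these would be appended to each movie's genre features before training a classifier; betweenness, closeness, and eigenvector centrality would need additional code or a graph library.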

A Study on Planning & Implementation of the Meta Database System for Ocean Electronic Resources (해양 전자정보자원 메타 데이터베이스 시스템 설계 및 구현방안에 관한 연구)

  • 한종엽
    • Journal of Korean Library and Information Science Society
    • /
    • v.33 no.2
    • /
    • pp.109-137
    • /
    • 2002
  • A literature analysis was carried out to plan and implement a meta database system for ocean electronic resources, the first of its kind in Korea. The study covered resources ranging from web resources to oceanographic survey data. The analysis focuses on providing a practical information retrieval service for ocean electronic resources based on an effective Dublin Core metadata framework for describing network resources. The analyses covered ocean electronic resources, metadata descriptive elements, metadata classification, system organization, and retrieval for planning and implementing the meta database system.

Data-processing pipeline and database design for integrated analysis of mycoviruses

  • Je, Mikyung;Son, Hyeon Seok;Kim, Hayeon
    • International journal of advanced smart convergence
    • /
    • v.8 no.3
    • /
    • pp.115-122
    • /
    • 2019
  • Recent and ongoing discoveries of mycoviruses with new properties demand the development of an appropriate research infrastructure to analyze their evolution and classification. In particular, the discovery of negative-sense single-stranded mycoviruses is worth noting in genome types in which double-stranded RNA virus and positive-sense single-stranded RNA virus were predominant. In addition, some genomic properties of mycoviruses are more interesting because they have been reported to have similarities with the pathogenic virus family that infects humans and animals. Genetic information on mycoviruses continues to accumulate in public repositories; however, these databases have some difficulty reflecting the latest taxonomic information and obtaining specialized data for mycoviruses. Therefore, in this study, we developed a bioinformatics-based pipeline to efficiently utilize this genetic information. We also designed a schema for data processing and database construction and an algorithm to keep taxonomic information of mycoviruses up to date. The pipeline and database (termed 'mycoVDB') presented in this study are expected to serve as useful foundations for improving the accuracy and efficiency of future research on mycoviruses.

On Feasibility of Ambulatory KDRGs for the Classification of Health Insurance Claims (KDRG를 이용한 건강보험 외래 진료비 분류 타당성)

  • 박하영;박기동;신영수
    • Health Policy and Management
    • /
    • v.13 no.1
    • /
    • pp.98-115
    • /
    • 2003
  • Concerns about growing health insurance expenditures became a national issue in 2001 when the National Health Insurance went into deficit. Increases in spending for ambulatory care accounted for the largest portion of the problem. Methods and systems to control the spending should be developed, and a system to measure the case mix of providers is one of the core components of such a control system. The objective of this article is to examine the feasibility of applying Korean Diagnosis Related Groups (KDRGs) to classify health insurance claims for ambulatory care and to identify problem areas of the classification. A database of 11,586,270 claims for ambulatory care delivered during January 2002 was obtained for the study, and the final number of claims analyzed was 8,319,494 after KDRG numbers were assigned to the data and records with an error KDRG were excluded. The unit of analysis was a claim, and resource use was measured by the sum of charges incurred during a month at a department of a hospital or at a clinic. Within-group variance was assessed by the coefficient of variation (CV), and the classification accuracy was evaluated by the variance reduction achieved by the KDRG classification. The analyses were performed on both all data and non-outlier data, and on a subset of the database to examine the validity of the study results. Data were assigned to 787 of the 1,244 KDRGs defined in the classification system. For non-outlier data, 77.4% of KDRGs had a CV of charges below 100% for tertiary care hospitals, and 95.43% did for clinics. The variance reduction achieved by the KDRG classification was 40.80% for non-outlier claims from tertiary care hospitals, 51.98% for general hospitals, 40.89% for hospitals, and 54.99% for clinics. Similar results were obtained from the analyses performed on a subset of the study database.
The study results indicated that KDRGs developed for the classification of inpatient care could be used for ambulatory care, although there were areas where the classification should be refined. Its power to predict resource utilization showed potential for its application to measuring the case mix of providers for monitoring and managing the delivery of ambulatory care. The quality of diagnostic information contained in insurance claims remains to be improved, and future studies of other classification systems based on visits or episodes are warranted.
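
The two evaluation measures named in this abstract, the coefficient of variation within a group and the variance reduction achieved by a grouping, can be illustrated with a short sketch. The formulas below are common textbook definitions and are assumed here rather than taken from the paper: the CV uses the population standard deviation, and variance reduction is computed as one minus the ratio of within-group to total sum of squares. The `charges` data are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV in percent: population standard deviation / mean * 100 (assumed form)."""
    return pstdev(values) / mean(values) * 100


def variance_reduction(groups):
    """Percent of total variation removed by grouping.

    Assumed definition: (1 - within-group SS / total SS) * 100.
    `groups` maps a group label to the list of charges in that group.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    within_ss = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )
    return (1 - within_ss / total_ss) * 100


# Hypothetical monthly charges per claim, keyed by an illustrative group label.
charges = {
    "G1": [10.0, 12.0, 14.0],
    "G2": [30.0, 32.0, 34.0],
}

for label, values in charges.items():
    print(label, "CV (%):", round(coefficient_of_variation(values), 2))
print("Variance reduction (%):", round(variance_reduction(charges), 2))
```

A low CV within each group together with a high variance reduction suggests that the grouping separates claims with similar resource use, which is the sense in which the abstract evaluates the KDRG classification.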

An Integrated Ontological Approach to Effective Information Management in Science and Technology (과학기술 분야 통합 개념체계의 구축 방안 연구)

  • 정영미;김명옥;이재윤;한승희;유재복
    • Journal of the Korean Society for Information Management
    • /
    • v.19 no.1
    • /
    • pp.135-161
    • /
    • 2002
  • This study presents a multilingual integrated ontological approach that links classification systems, thesauri, and terminology databases in science and technology for more effective online indexing and information retrieval. In this integrated system, we designed a thesaurus model with the concept as its unit and designated essential data elements for a terminology database on the basis of the ISO 12620 standard. The classification system for science and technology adopted in this study provides subject access channels from other existing classification systems through its mapping table. A prototype system was implemented with the field of nuclear energy as the application area.

Multiple octave-band based genre classification algorithm for music recommendation (음악추천을 위한 다중 옥타브 밴드 기반 장르 분류기)

  • Lim, Shin-Cheol;Jang, Sei-Jin;Lee, Seok-Pil;Kim, Moo-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.7
    • /
    • pp.1487-1494
    • /
    • 2011
  • In this paper, a novel genre classification algorithm is proposed for music recommendation systems. In particular, to improve the classification accuracy, the band-pass filter for the octave-based spectral contrast (OSC) feature is designed considering the psycho-acoustic model and the actual frequency range of musical instruments. The GTZAN database, which includes 10 genres, was used for 10-fold cross-validation experiments. The proposed multiple-octave based OSC improves accuracy by 2.26% compared with the conventional OSC. The combined feature vector based on the proposed OSC and mel-frequency cepstral coefficients (MFCC) gives even better accuracy.

A Conceptual Model for Automated Cost Estimating Using Work Information Classification System of Apartment House (공동주택의 공사정보분류체계를 활용한 적산 자동화 개념 모형 개발)

  • Lee, Yang Kyu;Park, Hong Tae
    • Journal of the Society of Disaster Information
    • /
    • v.10 no.1
    • /
    • pp.15-24
    • /
    • 2014
  • The study presents a work information classification system for apartment houses that can organize all construction management services throughout the planning and management of construction, such as decomposition of the design process, assembly of the construction process, and cost estimating. In addition, the study suggests a way to connect the work information classification system, based on a relational database, in working order, and builds a conceptual model for automated cost estimating using the established database. The conceptual model is expected to resolve the fundamental problems of the existing cost estimating system and to enable scientific cost estimating at apartment house construction sites.

An Adjustment for a Regional Incongruity in Global land Cover Map: case of Korea

  • Park Youn-Young;Han Kyung-Soo;Yeom Jong-Min;Suh Yong-Cheol
    • Korean Journal of Remote Sensing
    • /
    • v.22 no.3
    • /
    • pp.199-209
    • /
    • 2006
  • The Global Land Cover 2000 (GLC 2000) project, a recent initiative, aims to provide a harmonized land cover database of the whole globe for the year 2000. The classifications were performed at continental or regional scales by the corresponding organizations using data from the VEGETATION sensor onboard the SPOT4 satellite. Although the global land cover classification for Asia provided by Chiba University showed good accuracy over Asia as a whole, some problems were detected in the Korean region. Therefore, a new land cover database for Korea built from a more recent data set is strongly required. The present study focuses on developing a new, upgraded land cover map of Korea at 1 km resolution using the widely used K-means clustering, an unsupervised classification technique that uses a distance function for land surface pattern classification, together with the principal components transformation. It is based on data sets from the Earth observing system SPOT4/VEGETATION. The newly classified land cover was compared with GLC 2000 for the Korean peninsula using a confusion matrix to assess how well the classification performed.
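
A comparison of two land cover maps through a confusion matrix can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure used in the paper: each map is represented as a flat list of per-pixel class labels, the class names and sample values are hypothetical, and overall accuracy is used as the single summary figure.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Cross-tabulate two label sequences of equal length.

    Rows follow the reference labels, columns the predicted labels,
    both in sorted label order.
    """
    if len(reference) != len(predicted):
        raise ValueError("both maps must cover the same pixels")
    labels = sorted(set(reference) | set(predicted))
    counts = Counter(zip(reference, predicted))
    matrix = [[counts[(r, p)] for p in labels] for r in labels]
    return labels, matrix


def overall_accuracy(matrix):
    """Share of pixels on the diagonal, i.e. where both maps agree."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree / total if total else 0.0


# Hypothetical per-pixel labels from a reference map and a new map.
reference_map = ["forest", "forest", "crop", "urban", "crop", "forest"]
new_map = ["forest", "crop", "crop", "urban", "crop", "forest"]

labels, cm = confusion_matrix(reference_map, new_map)
print(labels)
for row in cm:
    print(row)
print("overall accuracy:", round(overall_accuracy(cm), 3))
```

Off-diagonal cells show which classes the two maps disagree on, which is usually more informative than the overall accuracy alone.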

R Wave Detection and Advanced Arrhythmia Classification Method through QRS Pattern Considering Complexity in Smart Healthcare Environments (스마트 헬스케어 환경에서 복잡도를 고려한 R파 검출 및 QRS 패턴을 통한 향상된 부정맥 분류 방법)

  • Cho, Iksung
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.17 no.1
    • /
    • pp.7-14
    • /
    • 2021
  • With increased attention to healthcare and the management of heart diseases, smart healthcare services and related devices have been actively developed in recent years. The R wave is the largest representative signal among ECG signals. R wave detection is very important because it underlies QRS pattern detection and arrhythmia classification. Several R wave detection algorithms with different features have been proposed, but the remaining problem is their implementation on low-cost portable platforms for real-time applications. In this paper, we propose R wave detection based on an optimal threshold and arrhythmia classification through the QRS pattern, considering complexity in smart healthcare environments. For this purpose, we detected the R wave from a noise-free ECG signal obtained through preprocessing. We also classify premature ventricular contraction (PVC) arrhythmia in real time through the QRS pattern. The performance of R wave detection and PVC classification is evaluated using 9 records of the MIT-BIH arrhythmia database that included over 30 premature ventricular contractions. The results show an average rate of 98.72% for R wave detection and 94.28% for PVC classification.

Novel Database Classification and Life Estimation Model for Accurate Database Asset Valuation

  • Youn-Soo Park;Ho-Hyun Park;Dong-Woon Jeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.7
    • /
    • pp.131-143
    • /
    • 2023
  • In the future knowledge society, the importance of business data is expected to increase, and data is recognized as a raw material for companies to manufacture products or develop services. As the importance of data increases, methods to calculate the economic value of database assets are being studied. There are many studies on evaluating the value of database assets, but the characteristics of database assets are not fully reflected in them. In this study, we classified database assets into revenue-type, non-revenue-type, and public-type database assets by considering their characteristics. In addition, focusing on the fact that revenue-type database assets can be valued similarly to existing technology valuation, we developed a method for calculating the life of database assets that incorporates a risk-adjusted discount rate.