• Title/Summary/Keyword: Record Selection


Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.40 no.1
    • /
    • pp.8-17
    • /
    • 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. The historical data on players and teams was obtained from the official materials provided by the KBO website. From the collected raw data, we additionally prepared two more datasets, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the records of the corresponding home team, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was obtained from the random forest technique on the binary dataset, whose prediction accuracy was 84.14%. We also observed that the ratio and binary datasets yielded better prediction models than the raw data. Using the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and walks (four balls) are important winning factors in a game. This research is distinct from existing studies in that we used three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
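The best-performing scenario in the abstract, a random forest on binary home/away comparisons, can be sketched as follows. The feature names, synthetic data, and train/test split are illustrative placeholders, not the paper's KBO dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_games = 500
# Binary features: 1 if the away team's record exceeds the home team's.
features = ["salary", "earned_run", "strikeout", "pitcher_win_pct", "walks"]
X = rng.integers(0, 2, size=(n_games, len(features)))
# Synthetic label loosely tied to the features so the model has signal.
y = (X.sum(axis=1) + rng.integers(0, 2, size=n_games) >= 3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
print("importances:", dict(zip(features, model.feature_importances_.round(3))))
```

The `feature_importances_` attribute plays the same role as the variable selection the abstract mentions for identifying influential winning factors.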

Smarter Classification for Imbalanced Data Set and Its Application to Patent Evaluation (불균형 데이터 집합에 대한 스마트 분류방법과 특허 평가에의 응용)

  • Kwon, Ohbyung;Lee, Jonathan Sangyun
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.15-34
    • /
    • 2014
  • Overall accuracy as a performance measure does not fully capture modular accuracy: the accuracy of classifying 1 (or true) as 1 is not the same as the accuracy of classifying 0 (or false) as 0. A smarter classification algorithm would optimize the classification rules to match the modular accuracy goals according to the nature of the problem. Correspondingly, smarter algorithms must be more generalized with respect to the nature of problems, and free from discretization, which may distort the real performance. Hence, in this paper, we propose a novel vertical boosting algorithm that improves modular accuracies. Rather than discretizing items, we use simple classifiers, such as a regression model, that accept continuous data types. To improve generalization, and to select a classification model well suited to the nature of the problem domain, we developed a smart model selection algorithm. To show the soundness of the proposed method, we performed an experiment with a real-world application: predicting the intellectual properties of e-transaction technology, using a data set of more than 47,000 records.
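The "modular accuracy" notion the abstract contrasts with overall accuracy can be shown with a small sketch; the 90:10 imbalanced data is a toy example, and this illustrates only the measure the paper optimizes, not its vertical boosting algorithm:

```python
def modular_accuracies(y_true, y_pred):
    """Return (overall accuracy, accuracy on class 1, accuracy on class 0)."""
    pairs = list(zip(y_true, y_pred))
    overall = sum(t == p for t, p in pairs) / len(pairs)
    acc_1 = sum(p == 1 for t, p in pairs if t == 1) / sum(t == 1 for t, _ in pairs)
    acc_0 = sum(p == 0 for t, p in pairs if t == 0) / sum(t == 0 for t, _ in pairs)
    return overall, acc_1, acc_0

# 90:10 imbalance: always predicting 0 scores 90% overall but 0% on class 1.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
print(modular_accuracies(y_true, y_pred))  # (0.9, 0.0, 1.0)
```

This is the failure mode the paper targets: a degenerate classifier looks strong on overall accuracy while one modular accuracy collapses to zero.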

The Observed Change in Interannual Variations of January Minimum Temperature between 1951-1980 and 1971-2000 in South Korea (지난 반세기 동안 남한에서 관측된 1월 최저기온의 연차변이)

  • Jung J. E.;Chung U.;Yun J. I.;Choi D. K.
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.6 no.4
    • /
    • pp.235-241
    • /
    • 2004
  • There is growing concern about a possible increase in the inter-annual variation of minimum temperature during the winter season in Korea. This view is strengthened by frequently reported freezing injury to dormant fruit trees, even while warmer winters have prevailed recently. The January minimum temperature record at fourteen weather stations was analyzed for 1951-2000. The results showed no evidence of increasing standard deviation at 3 locations between 1951-1980 and 1971-2000, while the remaining 11 stations showed a trend of decreasing standard deviation between the two periods. An empirical model explaining the spatial variation of the standard deviation was derived by regression analysis of 56 stations' data for 1971-2000. Daily minimum temperature and site elevation accounted for 68% of the observed variation. We applied this model to restore the average standard deviation of the January minimum temperature for 1971-2000, and the result was used to produce gridded minimum temperature data for recurrence intervals of 10 and 30 years at 250 m resolution. A digital plant hardiness zone map may be developed from this product for the site-specific selection of adapted plant species.
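The empirical model the abstract describes, regressing the inter-annual standard deviation on minimum temperature and site elevation, can be sketched as an ordinary least squares fit. The station values below are synthetic placeholders, not the 56 stations' data, and the coefficients are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stations = 56
tmin = rng.uniform(-12.0, 2.0, n_stations)   # mean January minimum, deg C
elev = rng.uniform(0.0, 800.0, n_stations)   # station elevation, m
# Synthetic "observed" standard deviations with invented coefficients + noise.
sd = 2.5 - 0.08 * tmin + 0.001 * elev + rng.normal(0, 0.2, n_stations)

# Ordinary least squares via numpy's least-squares solver.
X = np.column_stack([np.ones(n_stations), tmin, elev])
beta, *_ = np.linalg.lstsq(X, sd, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((sd - pred) ** 2) / np.sum((sd - sd.mean()) ** 2)
print("coefficients:", beta.round(4), "R^2:", round(r2, 3))
```

Applying the fitted `beta` to a 250 m grid of temperature and elevation values would yield the gridded standard-deviation surface the paper uses.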

The Development of a Computer-Assisted HACCP Program for the Microbiological Quality Assurance in Hospital Foodservice Operations (병원급식의 미생물적 품질보증을 위한 HACCP 전산프로그램의 개발)

  • Kwak, Tong-Kyung;Ryu, Kyung;Choi, Seong-Kyung
    • Journal of the Korean Society of Food Culture
    • /
    • v.11 no.1
    • /
    • pp.107-121
    • /
    • 1996
  • This study was carried out to develop a computer-assisted Hazard Analysis and Critical Control Point (HACCP) program, a systematic approach to the identification, assessment, and control of hazards, to help foodservice managers assure the microbiological quality of food in hospital foodservice operations. Sanitation practices were surveyed and analyzed in the dietetic departments of 4 hospitals. Among them, one 762-bed general hospital was selected as the standard model for developing the computer-assisted HACCP program. All database files and processing programs were created using the FoxPro package for easy access to the HACCP concept. The HACCP program was developed based on the methods suggested by NACMCF, IAMFES, and Bryan. This program consists of two parts: the pre-stage for the HACCP study and the implementation stage of the HACCP system. 1. The pre-stage for the HACCP study includes the selection of a menu item, the development of the HACCP recipe, the construction of a product flow diagram, and the printing of the HACCP recipe and product flow diagram. A menu item for HACCP study can be selected from menu item lists classified by cooking method. The HACCP recipe includes ingredients, their amounts, and the cooking procedure. A flow diagram is constructed based on the HACCP recipe. The HACCP recipe and product flow diagram are then printed out. 2. The implementation stage of the HACCP study includes the identification of microbiological hazards, the determination of critical control points, the establishment of control methods for each hazard, and the completion of the database file. Potentially hazardous ingredients are determined and microbiological hazards are identified in each phase of the product flow. Critical control points (CCPs) are identified by applying CCP decision trees to ingredients and each process stage. After hazards and CCPs are identified, criteria, a monitoring system, a corrective action plan, a record-keeping system, and verification methods are established. When the HACCP study is completed, HACCP study result forms are printed out, and records in the HACCP database file can be added, corrected, or deleted.
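The CCP decision-tree step the abstract describes can be illustrated with a heavily simplified sketch; the questions and their ordering are abbreviated placeholders loosely patterned on NACMCF-style trees, not the program's actual logic:

```python
def is_ccp(hazard_present, control_at_this_step, later_step_controls):
    """Return True if this process step is a critical control point.

    Simplified three-question placeholder: a step is a CCP only when a
    hazard exists, this step can control it, and no later step will.
    """
    if not hazard_present:
        return False
    if control_at_this_step and not later_step_controls:
        return True   # the hazard must be controlled at this step
    return False      # no control measure here, or it is handled later

print(is_ccp(True, True, False))   # e.g., a cooking step -> True
print(is_ccp(True, False, True))   # hazard handled at a later step -> False
```

The full NACMCF tree asks more questions (e.g., whether contamination could exceed acceptable levels), so this is only the shape of the logic, not a substitute for it.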


A Record and Conservation of Cultural Heritages through Web Ecomuseum : the Case of Mountain Mudeung (웹 생태박물관을 활용한 문화유산의 기록과 보존 : 무등산을 중심으로)

  • Noh, Shi-Hun
    • The Korean Journal of Archival Studies
    • /
    • no.27
    • /
    • pp.209-238
    • /
    • 2011
  • The ecomuseum, which appeared in France in 1968 and has diffused widely over the world, is a new type of museum. Its purpose is not simply to possess and exhibit existing relics, but to discover the locational sense of a territory by conserving and interpreting its entire natural and cultural heritage in situ, and to promote the participation of its population and the development of its local community. The significance of this museum lies in the recovery of the disappearing collective memories of a territory, the restoration of the cultural identity of its population, and the revitalization of an underdeveloped area. As the majority of these museums are fragmented or open-air museums, a 'web ecomuseum', which makes possible the remote offering of information about the whole of the dispersed heritage and its holistic interpretation by digitalizing, recording, conserving, interpreting, and utilizing the related heritage, is necessary. This paper considers the possibility of a web ecomuseum and the contents and methods of its constitution through the case of the Mountain Mudeung area. In particular, in relation to the latter, this paper suggests a plan consisting of the selection of local themes, the construction of digital archives, the design of web expositions, and the production of electronic cultural maps.

A Study on the Job Analysis of Job Competency Assessor (직무능력평가사의 직무분석에 관한 연구)

  • Lee, Jin Gu;Jung, Il-chan;Kim, Jiyoung
    • Journal of Practical Engineering Education
    • /
    • v.14 no.2
    • /
    • pp.413-423
    • /
    • 2022
  • The purpose of this study is to analyze the role of the job competency assessor, who assesses the achievement of job performance ability based on the NCS (educational training, qualifications, field experience, etc.) through competency assessment. For this purpose, a job analysis including the development and verification of a job model and the selection of core tasks was conducted. As a result, the main duties of the job competency assessor are to understand the NCS-based assessment principles, establish an assessment plan, design and develop assessment tools, assess competence, provide feedback and re-assessment, record and manage assessment results, verify internal assessment results, establish the RPL (recognition of prior learning) plan, implement the RPL, and verify the RPL assessment results; 48 tasks were derived from these duties. In addition, a total of 21 core tasks were selected by applying a threshold to the product of each task's importance and difficulty. Based on this, implications for the job analysis of the job competency assessor are presented.
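The core-task selection rule, a threshold on importance multiplied by difficulty, can be sketched as follows; the task names, scores, and threshold value are illustrative, not the study's data:

```python
def select_core_tasks(tasks, threshold):
    """tasks: list of (name, importance, difficulty); keep score >= threshold."""
    return [name for name, imp, diff in tasks if imp * diff >= threshold]

# Hypothetical tasks rated on 5-point importance and difficulty scales.
tasks = [
    ("establish assessment plan", 4.5, 3.8),
    ("design assessment tools", 4.2, 4.1),
    ("record assessment results", 3.1, 2.0),
]
print(select_core_tasks(tasks, threshold=15.0))
```

With these invented scores, the first two tasks clear the threshold (17.1 and 17.22) while the third (6.2) does not.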

A Study on Management of Records of Art Archives (미술 아카이브의 미술기록관리 방안 연구)

  • Jeong, Hye-Rin;Kim, Ik-Han
    • The Korean Journal of Archival Studies
    • /
    • no.20
    • /
    • pp.151-212
    • /
    • 2009
  • Museums are producing new value and being redefined as places that reproduce context, as the process of globalization is reflected in museum activities. The functions and roles newly added to the traditional mission of museums allow artworks to reveal the potential functions of the art archive and the museum. Until now, the public has faced the originality and aura of an artwork by viewing the physical object. However, with the appearance of the new digital object, the initiative of viewing has moved from the artwork to the hands of the public. Now, the public does not go to the museum to see an artwork, but has adopted the opposite paradigm of bringing the artwork forward to the screen. The public is therefore no longer satisfied with just seeing an artwork, but demands more information about artworks and reproduces it as knowledge. Accordingly, this study aimed to identify the types and characteristics of art archives through their definition and the selection of their range, at a point where the value of art archives is rising and systematic management is required, and to present record management methods according to the structure and core functions of the art archive. It especially stressed that the overall definition of the art archive rests on an 'access' paradigm rather than a 'preservation' paradigm, and embodied various application methods for digitalized art records. The digital object of an artwork is recognized as the first materialization of the actual artwork, and the digital original of an artwork is presented as the core record. Art archives managed under physical and intellectual control were organically restructured around digital originals of artworks, the core records in a digital technology environment, and could be provided to users in the various service forms that meet their demands. The beginning of systematic management of such art records will be a first step toward enhancing historical value, establishing art-cultural identity, and truly possessing art culture.

Variation of Hospital Costs and Product Heterogeneity

  • Shin, Young-Soo
    • Journal of Preventive Medicine and Public Health
    • /
    • v.11 no.1
    • /
    • pp.123-127
    • /
    • 1978
  • The major objective of this research is to identify those hospital characteristics that best explain cost variation among hospitals and to formulate linear models that can predict hospital costs. Specific emphasis is placed on hospital output, that is, the identification of diagnosis related patient groups (DRGs) which are medically meaningful and demonstrate similar patterns of hospital resource consumption. A casemix index is developed based on the DRGs identified. Considering the common problems encountered in previous hospital cost research, the following study requirements were established for fulfilling the objectives of this research: 1. Selection of hospitals that exercise similar medical and fiscal practices. 2. Identification of an appropriate data collection mechanism from which demographic and medical characteristics of individual patients as well as accurate and comparable cost information can be derived. 3. Development of a patient classification system in which all the patients treated in hospitals can be split into mutually exclusive categories with consistent and stable patterns of resource consumption. 4. Development of a cost finding mechanism through which patient groups' costs can be made comparable across hospitals. A data set of Medicare patients prepared by the Social Security Administration was selected for the study analysis. The data set contained 27,229 record abstracts of Medicare patients discharged from all but one short-term general hospital in Connecticut during the period from January 1, 1971, to December 31, 1972. Each record abstract contained demographic and diagnostic information, as well as charges for specific medical services received. The 'AUTOGRP System' was used to generate 198 DRGs in which the entire range of Medicare patients was split into mutually exclusive categories, each of which shows a consistent and stable pattern of resource consumption. 
The 'Departmental Method' was used to generate cost information for the groups of Medicare patients that would be comparable across hospitals. To fulfill the study objectives, an extensive analysis was conducted in the following areas: 1. Analysis of DRGs, in which the level of resource use of each DRG was determined, the length of stay or death rate of each DRG in relation to resource use was characterized, and underlying patterns of the relationships among DRG costs were explained. 2. Exploration of the resource use profiles of hospitals, in which the magnitude of differences in the resource use or death rates incurred in the treatment of Medicare patients among the study hospitals was explored. 3. Casemix analysis, in which four types of casemix-related indices were generated, and the significance of these indices in the explanation of hospital costs was examined. 4. Formulation of linear models to predict hospital costs of Medicare patients, in which nine independent variables (i.e., casemix index, hospital size, complexity of service, teaching activity, location, casemix-adjusted death rate index, occupancy rate, and casemix-adjusted length of stay index) were used for determining factors in hospital costs. Results from the study analysis indicated that: 1. The system of 198 DRGs for Medicare patient classification was demonstrated not only as a strong tool for determining the pattern of hospital resource utilization of Medicare patients, but also for categorizing patients by their severity of illness. 2. The weighted mean total case cost (TOTC) of the study hospitals for Medicare patients during the study years was $1127.02 with a standard deviation of $117.20. The hospital with the highest average TOTC ($1538.15) was 2.08 times more expensive than the hospital with the lowest average TOTC ($743.45). The weighted mean per diem total cost (DTOC) of the study hospitals for Medicare patients during the study years was $107.98 with a standard deviation of $15.18. 
The hospital with the highest average DTOC ($147.23) was 1.87 times more expensive than the hospital with the lowest average DTOC ($78.49). 3. The linear models for each of the six types of hospital costs were formulated using the casemix index and the eight other hospital variables as determinants. These models explained variance to the extent of 68.7 percent of total case cost (TOTC), 63.5 percent of room and board cost (RMC), 66.2 percent of total ancillary service cost (TANC), 66.3 percent of per diem total cost (DTOC), 56.9 percent of per diem room and board cost (DRMC), and 65.5 percent of per diem ancillary service cost (DTANC). The casemix index alone explained approximately one half of interhospital cost variation: 59.1 percent for TOTC and 44.3 percent for DTOC. These results demonstrate that the casemix index is the most important determinant of interhospital cost variation. Future research and policy implications arising from the results of this study are envisioned in the following three areas: 1. Utilization of casemix-related indices in the Medicare data systems. 2. Refinement of data for hospital cost evaluation. 3. Development of a system for reimbursement and cost control in hospitals.
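The casemix index the study builds from DRGs can be illustrated with a minimal sketch: a hospital's expected cost per case given its DRG mix, relative to cross-hospital cost weights. The DRG labels, weights, and case counts below are invented placeholders, not the study's 198-DRG values:

```python
def casemix_index(case_counts, drg_weights):
    """case_counts: {drg: number of cases}; drg_weights: {drg: relative cost weight}.

    Returns the case-weighted mean cost weight, i.e., how costly the
    hospital's patient mix is relative to an average case (weight 1.0).
    """
    total = sum(case_counts.values())
    return sum(n * drg_weights[d] for d, n in case_counts.items()) / total

drg_weights = {"DRG_1": 0.8, "DRG_2": 1.0, "DRG_3": 1.9}   # relative costliness
hospital_a = {"DRG_1": 50, "DRG_2": 30, "DRG_3": 20}       # lighter casemix
hospital_b = {"DRG_1": 20, "DRG_2": 30, "DRG_3": 50}       # heavier casemix
print(round(casemix_index(hospital_a, drg_weights), 3))
print(round(casemix_index(hospital_b, drg_weights), 3))
```

Hospital B's higher index reflects its concentration of costly DRG_3 cases, which is why such an index alone can explain much of the interhospital cost variation the study reports.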


Impact of Personal Health Information Security Awareness on Convenience (개인의료정보보안인식이 편의성에 미치는 영향)

  • Park, Jung-Hong
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.6
    • /
    • pp.600-612
    • /
    • 2017
  • The purpose of this research is to examine how awareness of the importance of personal medical data, of the laws regarding personal medical data, and of medical data systems affects the convenience of hospital use, comparing regular patients who have experienced hospital services with medical professionals. An analysis of preceding research was conducted before establishing the research model; 150 questionnaires were given to regular patients and 150 to medical professionals, for a total of 300 collected for the survey. First, the research found that there are perception differences between regular patients and medical professionals, as well as across gender, age, and area of residence. Furthermore, medical professionals tend to consider that the convenience of hospital use will increase if users strengthen their awareness of personal medical data security. The hypothesis results show that higher awareness of the exposure of personal medical data and of the medical information system affects decision-making convenience in hospital use. On the other hand, awareness of the laws related to personal medical information security does not affect the decision-making convenience of hospital use and transactions. The analysis provides evidence that strengthening awareness of personal medical data security positively increases the convenience of decision-making and transactions in selecting medical services.

A Study on Application of Predictive Coding Tool for Enterprise E-Discovery (기업의 전자증거개시 대응을 위한 예측 부호화(Predictive Coding) 도구 적용 방안)

  • Yu, Jun Sang;Yim, Jin Hee
    • Journal of the Korean Society for information Management
    • /
    • v.33 no.4
    • /
    • pp.125-157
    • /
    • 2016
  • As domestic companies that have made inroads into foreign markets face more lawsuits, their demand for responding to E-Discovery is also increasing. E-Discovery, derived from Anglo-American law, is the system of finding electronic evidence related to a lawsuit among scattered electronic data within a limited time, reviewing it as evidence, and submitting it. It is no easy task to find, select, review, and submit evidence within a limited time, given that domestic companies do not manage their records even though large volumes of electronic records are produced every day. Reducing the number of items to be reviewed and proceeding efficiently is one of the most important tasks in winning a lawsuit. Predictive Coding is a computer-assisted review instrument used in the review process of E-Discovery, which helps companies review their own electronic data using machine learning. Predictive Coding is more efficient than previous computer-assisted review tools and has the merit of selecting the electronic data related to a lawsuit. Through a company's selection of an efficient computer-assisted review instrument and continuous records management, it is expected that review time and cost will be saved. Therefore, for companies to respond to E-Discovery, it is necessary to seek the most effective method through the introduction of a professional Predictive Coding solution and business records management, with consideration of time and cost.
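A Predictive Coding workflow of the kind the abstract describes, learning from a small human-reviewed seed set and ranking unreviewed documents by predicted relevance, can be sketched as follows. The documents, labels, and model choice (TF-IDF plus logistic regression) are illustrative assumptions, not a real e-discovery corpus or any particular vendor's solution:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed set: documents a human reviewer has already coded.
reviewed_docs = [
    "contract breach damages settlement",
    "invoice payment dispute penalty clause",
    "weekly cafeteria menu announcement",
    "holiday party schedule reminder",
]
labels = [1, 1, 0, 0]  # 1 = responsive to the lawsuit, 0 = not responsive

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(reviewed_docs), labels)

# Rank the unreviewed collection so reviewers see likely-responsive docs first.
unreviewed = ["settlement draft for the breach claim", "menu update for friday"]
scores = clf.predict_proba(vec.transform(unreviewed))[:, 1]
for doc, s in sorted(zip(unreviewed, scores), key=lambda t: -t[1]):
    print(f"{s:.2f}  {doc}")
```

In practice this loop is iterated: reviewers code the top-ranked documents, the model is refit, and the ranking improves, which is how the review volume gets reduced.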