• Title/Summary/Keyword: theory-based data science

Search Result 119, Processing Time 0.025 seconds

A High Order Product Approximation Method based on the Minimization of Upper Bound of a Bayes Error Rate and Its Application to the Combination of Numeral Recognizers (베이스 에러율의 상위 경계 최소화에 기반한 고차 곱 근사 방법과 숫자 인식기 결합에의 적용)

  • Kang, Hee-Joong
    • Journal of KIISE:Software and Applications / v.28 no.9 / pp.681-687 / 2001
  • To raise class discrimination power by combining multiple classifiers under Bayesian decision theory, the upper bound of the Bayes error rate, which is bounded by the conditional entropy of the class variable given the decision variables obtained from training samples, should be minimized. Wang and Wong proposed a first-order tree-dependence approximation of a high-order probability distribution composed of the class and multiple feature pattern variables in order to minimize this upper bound. This paper presents an extended high-order product approximation scheme that handles dependencies of higher order than first-order tree dependence, also based on minimizing the upper bound of the Bayes error rate. Multiple recognizers for unconstrained handwritten numerals from CENPARMI were combined by the proposed approximation scheme using the Bayesian formalism, and high recognition rates were obtained.

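The relation between conditional entropy and the Bayes error rate mentioned in the abstract above can be illustrated with a small sketch. It assumes the commonly cited Hellman-Raviv form of the bound, P_e <= H(C | X) / 2 with entropy in bits; whether the paper uses exactly this form is an assumption, and the function names and toy samples are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Estimate H(C | X) in bits from (class, decision) samples."""
    joint = Counter(pairs)                      # counts of (c, x)
    marginal = Counter(x for _, x in pairs)     # counts of x
    n = len(pairs)
    h = 0.0
    for (_, x), n_cx in joint.items():
        # p(c, x) * log2 p(c | x), summed over observed combinations
        h -= (n_cx / n) * math.log2(n_cx / marginal[x])
    return h


def bayes_error_upper_bound(pairs):
    """Assumed Hellman-Raviv bound: P_e <= H(C | X) / 2."""
    return 0.5 * conditional_entropy(pairs)


def empirical_bayes_error(pairs):
    """Error of the rule that picks the most frequent class for each x."""
    joint = Counter(pairs)
    best = {}
    for (_, x), n_cx in joint.items():
        best[x] = max(best.get(x, 0), n_cx)
    return 1.0 - sum(best.values()) / len(pairs)


# Toy samples: decision "a" is ambiguous, decision "b" is not.
samples = [(0, "a")] * 3 + [(1, "a")] + [(1, "b")] * 4
print(bayes_error_upper_bound(samples), empirical_bayes_error(samples))
```

On these toy samples the empirical error (0.125) stays below the entropy-based bound (about 0.203), which is why lowering the conditional entropy is a reasonable proxy for lowering the error rate.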

A One-Pass Aggregation Algorithm using the Disjoint-Inclusive Partition Multidimensional Files in Multidimensional OLAP (다차원 온라인 분석처리에서 분리-포함 분할 다차원 파일 구조를 사용한 원-패스 집계 알고리즘)

  • Lee, Yeong-Gu;Mun, Yang-Se;Hwang, Gyu-Yeong
    • Journal of KIISE:Databases / v.28 no.2 / pp.153-167 / 2001
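  • Aggregation is an important primitive operation in Multidimensional On-Line Analytical Processing (MOLAP). Existing MOLAP aggregation methods have been studied for file structures based on multidimensional arrays. These file structures have the drawback that they do not work well on data with a skewed distribution. This paper proposes an aggregation algorithm that uses a multidimensional file structure that works well even for skewed distributions. First, we propose an aggregation processing model that uses a new concept called the disjoint-inclusive partition. Using the disjoint-inclusive partition in aggregation processing makes it possible to determine the access order of pages in advance. Then, based on the proposed model, we propose a one-pass aggregation algorithm that processes aggregations using the one-pass buffer size. The one-pass buffer size is the minimum buffer size required to guarantee one disk access per page. We also prove that, under the proposed aggregation processing model, the proposed algorithm has the minimum one-pass buffer size. Finally, extensive experiments confirmed that the theoretically derived one-pass buffer size works correctly in a real environment. The algorithm achieves the optimal one-pass buffer size by using a buffer replacement policy that exploits the page access order known in advance. The proposed algorithm is particularly useful in a multi-user environment where many aggregation queries are requested concurrently, because it uses only as much buffer as is strictly needed to keep the normalized number of disk accesses at 1.0.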

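The idea of a one-pass buffer size under a page access order known in advance can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes the access order is a plain list of page identifiers, treats a page as "live" from its first to its last access, and uses a farthest-next-use eviction rule as a stand-in for the replacement policy. All function names are hypothetical.

```python
def one_pass_buffer_size(access_order):
    """Smallest buffer that lets every page be read from disk only once.

    A page must stay resident from its first access to its last access,
    so the answer is the peak number of simultaneously live pages.
    """
    first, last = {}, {}
    for t, page in enumerate(access_order):
        first.setdefault(page, t)
        last[page] = t
    resident = peak = 0
    for t, page in enumerate(access_order):
        if first[page] == t:
            resident += 1          # page enters the buffer
        peak = max(peak, resident)
        if last[page] == t:
            resident -= 1          # page is never needed again
    return peak


def normalized_disk_accesses(access_order, buffer_size):
    """Disk reads divided by distinct pages, evicting the farthest next use."""
    buffer, reads = set(), 0
    for t, page in enumerate(access_order):
        if page in buffer:
            continue
        reads += 1
        if len(buffer) >= buffer_size:
            def next_use(p):
                try:
                    return access_order.index(p, t + 1)
                except ValueError:
                    return float("inf")   # never used again
            buffer.remove(max(buffer, key=next_use))
        buffer.add(page)
    return reads / len(set(access_order))


order = ["A", "B", "A", "C", "B", "D", "C"]
size = one_pass_buffer_size(order)
print(size, normalized_disk_accesses(order, size))
```

For this toy order the one-pass buffer size is 2, and with that buffer the normalized number of disk accesses is 1.0; shrinking the buffer to 1 raises it to 1.75.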

Searching for Optimal Ensemble of Feature-classifier Pairs in Gene Expression Profile using Genetic Algorithm (유전알고리즘을 이용한 유전자발현 데이타상의 특징-분류기쌍 최적 앙상블 탐색)

  • 박찬호;조성배
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.525-536 / 2004
  • A gene expression profile is numerical data on the gene expression levels of an organism, measured on a microarray. In general, each specific tissue shows different expression levels in related genes, so diseases can be classified from gene expression profiles. Because not all genes are related to a disease, the related genes must be selected, which is called feature selection, and the selected genes must then be classified properly. This paper proposes a GA-based method for searching for the optimal ensemble of feature-classifier pairs, which are formed from seven feature selection methods based on correlation, similarity, and information theory, and six representative classifiers. In experiments with leave-one-out cross-validation on two cancer-related gene expression profiles, the Lymphoma and Colon datasets, we found ensembles that are much superior to all individual feature-classifier pairs.
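
A genetic algorithm over ensembles of feature-classifier pairs could look roughly like the sketch below. Only the counts (seven feature selection methods, six classifiers, hence 42 pairs) come from the abstract. The bit-string encoding, majority-vote fitness, tournament selection, one-point crossover, elitism, and the synthetic predictions are all assumptions made for illustration; the paper's actual operators and fitness function are not stated here.

```python
import random

N_FEATURE_METHODS, N_CLASSIFIERS = 7, 6          # counts from the abstract
N_PAIRS = N_FEATURE_METHODS * N_CLASSIFIERS      # one bit per pair


def make_toy_predictions(rng, n_samples=30, accuracy=0.7):
    """Hypothetical stand-in data: labels plus noisy predictions per pair."""
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    preds = [[y if rng.random() < accuracy else 1 - y for y in labels]
             for _ in range(N_PAIRS)]
    return labels, preds


def fitness(chromosome, labels, preds):
    """Majority-vote accuracy of the selected pairs (0.0 if none selected)."""
    chosen = [p for bit, p in zip(chromosome, preds) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        votes_for_one = sum(p[i] for p in chosen)
        vote = 1 if 2 * votes_for_one > len(chosen) else 0
        correct += vote == y
    return correct / len(labels)


def run_ga(labels, preds, rng, pop_size=20, generations=30, p_mut=0.02):
    """Return the best chromosome and the best score seen per generation."""
    def score(c):
        return fitness(c, labels, preds)

    pop = [[rng.randint(0, 1) for _ in range(N_PAIRS)] for _ in range(pop_size)]
    best = max(pop, key=score)
    history = [score(best)]
    for _ in range(generations):
        next_pop = [best[:]]                      # elitism keeps the best so far
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 2), key=score)    # tournament selection
            b = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, N_PAIRS)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=score)
        history.append(score(best))
    return best, history


rng = random.Random(0)
labels, preds = make_toy_predictions(rng)
best, history = run_ga(labels, preds, rng)
print(sum(best), "pairs selected; best accuracy", history[-1])
```

In a real setting the fitness would come from leave-one-out cross-validation of each candidate ensemble rather than from synthetic predictions.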

Design and Implementation of Service based Virtual Screening System in Grids (그리드에서 서비스 기반 가상 탐색 시스템 설계 및 구현)

  • Lee, Hwa-Min;Chin, Sung-Ho;Lee, Jong-Hyuk;Lee, Dae-Won;Park, Seong-Bin;Yu, Heon-Chang
    • Journal of KIISE:Computer Systems and Theory / v.35 no.6 / pp.237-247 / 2008
  • Virtual screening is the process of reducing an unmanageable number of compounds to a limited number of compounds for the target of interest by means of computational techniques such as molecular docking. It is a large-scale scientific application that requires large computing power and data storage capability. Previous software for molecular docking, such as AutoDock, FlexX, Glide, DOCK, LigandFit, and ViSION, was developed to run on a supercomputer, a workstation, or a cluster computer. However, a supercomputer is very expensive, and virtual screening on a workstation or a cluster computer requires a long execution time. We therefore propose a service-based virtual screening system using Grid computing technology, which supports large data-intensive operations. We constructed a 3-dimensional chemical molecular database for virtual screening, designed a resource broker and a data broker to support an efficient molecular docking service, and proposed various services for virtual screening. We implemented the service-based virtual screening system with DOCK 5.0 and the Globus 3.2 toolkit. Our system can reduce the time and cost of designing drugs or new materials.

Trends in Research Design and Methods: Research on Elementary and Secondary Mathematics Curriculum (연구 설계 및 연구 방법의 최근 동향: 초.중등 수학과 교육과정에 관한 연구를 중심으로)

  • Kim, Rae-Young;Kim, Goo-Yeon;Kwon, Na-Young
    • School Mathematics / v.14 no.3 / pp.395-408 / 2012
  • This study examines trends in the research designs and methods used in research on K-12 mathematics curriculum. From an analysis of 124 peer-reviewed research articles published between 2000 and 2010, we conclude that more rigorous and varied research designs and methods are needed to improve educational research on curriculum. Although increasing scholarly attention has recently been given to systematic empirical studies on this topic, a large proportion of the studies examined appeared to lack either a coherent conceptual framework or a systematic analytic tool or method. More effort is needed to improve the rigor of research in terms of research design and methods.


Towards an Understanding of User Satisfaction and Continuance Intention in Human-Mediated Services: An Investigation of Academic Libraries (인적서비스 이용자 만족도 및 지속의도의 이해: 대학도서관의 연구)

  • Lee, Bo-Ram;Park, Ji-Hong
    • Journal of Information Management / v.42 no.3 / pp.187-210 / 2011
  • This study examines how the service quality of academic library staff affects user satisfaction and continuance intention, and seeks practical solutions for improving satisfaction and continuance intention in academic libraries. Despite the value and importance of human-mediated library services, which make various library services more valuable, relatively few prior studies focus on this topic. This study develops a conceptual framework based on the concepts of service quality, satisfaction, and continuance intention. This framework provides a useful guideline for data collection and data analysis. The contributions of this study include suggesting strategies that may increase users' positive attitude toward human-mediated services in academic libraries and thereby support continuance intention, and, methodologically, using both quantitative and qualitative methods.

Getting Closer to Consumer Performance Experience: Research on Performance Experience Components through Online Post Analysis (소비자의 공연 경험에 다가가기 - 온라인 게시글 분석을 통한 공연 경험의 구성요소 탐구 -)

  • Ko, Yena;Lee, Joongseek;Kim, Eun-mee;Lee, Soomin
    • Korean Association of Arts Management / no.52 / pp.75-105 / 2019
  • In studying culture consumption today, it is essential to understand and analyze actual visitors' experiences in detail. This is closely tied to the fact that subjective experience records that were previously inaccessible can now be used as data, since many people record their performance experiences in media spaces such as social media. This study examines what elements make up people's performance experience, based on actual expressions of that experience found online. For this, we collected two types of data. First, we collected posts requesting performance recommendations on online platforms such as Jisik-In and Cafes to see how people describe what they want, and analyzed the data with a focus on modifiers. The results show that people mainly use modifiers that reflect the individual's specific situation, such as companion or age. Second, we analyzed how the experience was described after the show, using review posts on a ticket booking site. The results show that expressions center on companions, revisit intentions, and viewing experiences, in addition to elements such as story and music, which previous studies have identified as the main satisfaction elements of performance experience. Finally, we discuss the theoretical and practical implications and the limitations of the study.

A Web-based Sensor Network Query and Data Management (웹 기반의 센서네트워크 질의 및 데이타 관리)

  • Hwang, Kwang-Il;Eom, Doo-Seop
    • Journal of KIISE:Computer Systems and Theory / v.33 no.11 / pp.820-829 / 2006
  • Wireless sensor networks consisting of hundreds to thousands of nodes are expected to be increasingly deployed in the coming years, as they enable reliable monitoring and analysis of the physical world. These networks have unique features that differ greatly from traditional networks, e.g., a very large number of nodes and limited power, processing, and memory. Because of these features, sensor data management, including querying, becomes a challenging problem. Furthermore, given the wide popularization of the Internet and its ease of use, it is generally accepted that an unattended network can be efficiently managed and monitored over the Internet. To query and manage data in a sensor network more efficiently, this paper presents the architecture of a sensor gateway that includes a web-based query server and illustrates its implementation details. The presented web-based gateway is divided into two main parts: an Internet part and a sensor network part. The sensor network part handles a variety of sensor networks, including flat and hierarchical network architectures, by using an internally layered architecture for efficiently querying and managing data in a sensor network. In addition, the Internet part provides a modular gateway function for exchange between the sensor network and the Internet.

Analyzing the Impact of Species on Urban Development Using Meta Population Model (메타개체군 이론을 활용한 도시개발에 따른 생물 종 영향 평가 활용 가능성 분석)

  • Eun Sub Kim;Young Won Mo;Tae Yoon Park;Yoonho Jeon;Jiyoung Choi;Dong Kun Lee
    • Journal of Environmental Impact Assessment / v.32 no.2 / pp.61-71 / 2023
  • Because the impact on each species differs with spatial scale, analysis at the landscape scale is necessary to evaluate the impact of a development project. In previous studies, the Incidence Function Model (IFM), based on metapopulation theory, was used to analyze the impact on species of environmental change caused by urban development. However, because the model requires at least 10 occupied areas, it is difficult to use for species that are difficult to monitor, such as endangered species. We therefore propose an IFM that uses a species distribution model to fill in the species data. In addition, we reviewed whether the developed model can be used in environmental impact assessment. The analysis showed that, under urban development, the minimum occupancy of Prionailurus bengalensis decreased to 56.5% and its probability of survival to 28.7%. Analysis of the metapopulation capacity as the number of habitats decreases confirmed that it dropped rapidly from the reference points of 230 and 70 habitats. These results can be used to assess the impact of habitat loss on each species and can support decision-making on the minimum number and area of habitats for species protection. This study is expected to serve as basic data for environmental impact assessment before and after development projects and for mitigation planning, thereby increasing the effectiveness of mitigation plans.
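
For readers unfamiliar with the Incidence Function Model, the sketch below computes patch incidences in the textbook form usually attributed to Hanski (1994): connectivity S_i = sum over occupied j != i of exp(-alpha * d_ij) * A_j, colonization C_i = S_i^2 / (S_i^2 + y^2), extinction E_i = min(1, e / A_i^x), and incidence J_i = C_i / (C_i + E_i - C_i * E_i). That the paper uses this exact parameterization is an assumption, and the patch areas, coordinates, and parameter values are hypothetical.

```python
import math


def incidence(areas, coords, occupied, alpha=1.0, y=1.0, e=0.5, x=1.0):
    """Long-term occupancy probability J_i of each habitat patch."""
    result = []
    for i, area_i in enumerate(areas):
        # Connectivity: area-weighted, distance-discounted occupied neighbours.
        s_i = sum(
            math.exp(-alpha * math.dist(coords[i], coords[j])) * areas[j]
            for j in range(len(areas))
            if j != i and occupied[j]
        )
        c_i = s_i ** 2 / (s_i ** 2 + y ** 2)        # colonization probability
        e_i = min(1.0, e / area_i ** x)             # extinction probability
        denom = c_i + e_i - c_i * e_i               # includes rescue effect
        result.append(c_i / denom if denom > 0 else 0.0)
    return result


# Hypothetical landscape: three equal patches in a row.
areas = [2.0, 2.0, 2.0]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(incidence(areas, coords, [True, True, True]))
print(incidence(areas, coords, [True, False, True]))  # middle patch lost
```

Marking the middle patch as unoccupied lowers the incidence of its neighbours, which is the kind of habitat-loss effect the abstract describes at the landscape scale.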

Computation of Maintainability Index Using SysML-Based M&S Technique for Improved Weapon Systems Development (SysML 기반 모델링 및 시뮬레이션 기법을 활용한 무기체계 정비도 지수 산출)

  • Yoo, Yeon-Yong;Lee, Jae-Chon
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.88-95 / 2018
  • Maintainability indicates how easily a system can be restored to its normal state when a system failure occurs. Systems developed to have high maintainability can be competitive because of reduced maintenance time, workforce, and resources. Maintainability can be quantified in many ways, but only after prototype production or with historical data. Graph theory and 3D model data have therefore been used, but these approaches are limited in management efficiency and early use. To solve this problem, we studied the maintainability index of weapon systems using a SysML-based modeling and simulation technique. A SysML structure diagram was generated to model the system design and the maintainability of system components simultaneously, reflecting the maintainability attributes acquired from the system engineering tool. Then, a SysML parametric diagram was created to quantify maintainability through simulation linked with MATLAB. As a result, an integrated model that accounts for system design and maintainability simultaneously has been presented. The model can be used from the early design stages to identify components with a low maintainability index. The design of such components can be changed to improve maintainability and thus reduce the risks of cost overruns and time delays caused by belated design changes.