• Title/Abstract/Keywords: knowledge discovery in database

69 results (processing time: 0.026 s)

Knowledge Discovery Process In Internet For Effective Knowledge Creation: Application To Stock Market

  • 김경재;홍태호;한인구
    • Korea Database Society: Conference Proceedings / Korea Database Society 1999 Spring Joint Conference: Knowledge Management and Knowledge Engineering / pp. 105-113 / 1999
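  • With the recent explosive growth of data and databases, interest is rising in the knowledge discovery process, which seeks information and knowledge within vast amounts of data. In particular, information sources are diversifying and becoming more advanced: they include not only internal and external corporate databases but also data in OLAP environments built on data warehouses and information on the web accessed through the Internet. Knowledge discovery processes suited to these diverse environments are therefore required. This study proposes a knowledge discovery process for effectively mining knowledge on the Internet. To reflect tacit knowledge as well as explicit knowledge, the proposed process uses a prior knowledge base and a prior knowledge management system. The prior knowledge management system builds the prior knowledge base using a fuzzy cognitive map, uses it to define the useful information to be sought on the web, and converts the extracted information, through a knowledge transformation system, into a form that can be used in an integrated inference process. To verify the usefulness of the proposed research model, case-based reasoning experiments were run on financial data with and without prior knowledge, and the results indicate that the proposed knowledge discovery process is useful.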


Computing Post-translation Modification using FTMS

  • Shen, Wei;Sung, Wing-Kin;SZE, Siu Kwan
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005 / pp. 331-336 / 2005
  • Discovering post-translational modifications (PTMs) is an important problem in proteomics. In the past, PTMs were discovered by tandem mass spectrometry using a ‘bottom-up’ strategy. However, this strategy fails to discover all PTMs. Recently, thanks to improvements in proteomic technology, Taylor et al. proposed database software that discovers PTMs with a ‘top-down’ strategy using FTMS, which avoids the disadvantages of the ‘bottom-up’ approach. However, their algorithm runs in exponential time, requires a protein database, and needs prior knowledge about PTM sites. This paper proposes a new algorithm that works without a protein database and identifies modifications in polynomial time. In addition, it needs no prior knowledge about PTM sites.


Knowledge Discovery in Nursing Minimum Data Set Using Data Mining

  • Park Myong-Hwa;Park Jeong-Sook;Kim Chong-Nam;Park Kyung-Min;Kwon Young-Sook
    • Journal of Korean Academy of Nursing / Vol. 36, No. 4 / pp. 652-661 / 2006
  • Purpose. The purposes of this study were to apply a data mining tool to the nursing-specific knowledge discovery process and to identify how data mining can be used for clinical decision making. Methods. Data mining based on the rough set model was conducted on a large clinical data set containing NMDS elements. Records of 1,000 randomly selected patients who had at least one of the five most frequently used nursing diagnoses were drawn from the 1998 database. Patient characteristics and care service characteristics, including nursing diagnoses, interventions and outcomes, were analyzed to derive meaningful decision rules. Results. The number of comorbidities, marital status, the nursing diagnosis related to risk for infection, the nursing intervention related to infection protection, and discharge status were the predictors that could determine the length of stay. Four variables (age, impaired skin integrity, pain, and discharge status) were identified as valuable predictors of the nursing outcome relieved pain. Five variables (age, pain, potential for infection, marital status, and primary disease) were identified as important predictors of mortality. Conclusions. This study demonstrated the use of a data mining method on a large data set in a standardized language format to identify the contribution of nursing care to patients' health.

Integrated Method for Knowledge Discovery in Databases

  • Hong Chung;Park, Kyoung-Oak;Chung, Hwan-Mook
    • Korean Institute of Intelligent Systems: Conference Proceedings / Korea Fuzzy Logic and Intelligent Systems Society 1998, The Third Asian Fuzzy Systems Symposium / pp. 122-127 / 1998
  • This paper suggests an integrated method for discovering knowledge from a large database. Our approach applies an attribute-oriented concept hierarchy ascension technique to extract generalized data from the actual data in databases, induction of decision trees to measure the value of information, and knowledge reduction from rough set theory to remove dispensable attributes and attribute values. The integrated algorithm first reduces the size of the database through concept generalization, then reduces the number of attributes by eliminating condition attributes that have little influence on the decision attribute, and finally induces simplified decision rules by removing dispensable attribute values through analysis of the dependency relationships among the attributes.

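The rough-set reduction step named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's integrated algorithm: the toy table, the attribute names, and the function names `dependency_degree` and `dispensable_attributes` are all hypothetical. The sketch computes how strongly a decision attribute depends on a set of condition attributes, and treats a condition attribute as dispensable when dropping it leaves that dependency unchanged.

```python
from collections import defaultdict

def dependency_degree(rows, condition_attrs, decision_attr):
    """Fraction of rows whose condition values determine the decision uniquely."""
    decisions = defaultdict(set)   # equivalence class -> decision values seen
    sizes = defaultdict(int)       # equivalence class -> number of rows
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        decisions[key].add(row[decision_attr])
        sizes[key] += 1
    # Positive region: classes that map to exactly one decision value.
    positive = sum(sizes[k] for k, d in decisions.items() if len(d) == 1)
    return positive / len(rows)

def dispensable_attributes(rows, condition_attrs, decision_attr):
    """Condition attributes whose removal does not lower the dependency degree."""
    full = dependency_degree(rows, condition_attrs, decision_attr)
    result = []
    for attr in condition_attrs:
        rest = [a for a in condition_attrs if a != attr]
        if dependency_degree(rows, rest, decision_attr) == full:
            result.append(attr)
    return result

# Hypothetical toy table: "size" alone decides "buy"; "color" adds nothing.
rows = [
    {"size": "big", "color": "red", "buy": "yes"},
    {"size": "big", "color": "blue", "buy": "yes"},
    {"size": "small", "color": "red", "buy": "no"},
    {"size": "small", "color": "blue", "buy": "no"},
]
print(dispensable_attributes(rows, ["size", "color"], "buy"))
```

Each attribute is tested individually here; a full reduct search would also need to check combinations of attributes.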

Applying Decision Tree Algorithms for Analyzing HS-VOSTS Questionnaire Results

  • Kang, Dae-Ki
    • Journal of Engineering Education Research / Vol. 15, No. 4 / pp. 41-47 / 2012
  • Data mining and knowledge discovery techniques have been shown to be effective in automatically finding hidden underlying rules in large databases. Meanwhile, analyzing, assessing, and applying students' survey data is very important in science and engineering education for reasons such as quality improvement, the engineering design process, and innovative education. Among such surveys, analyzing students' views on science-technology-society can be helpful to engineering education: although most research on the philosophy of science has shown that science is one of the most difficult concepts to define precisely, it is still important to keep an eye on science, pseudo-science, and scientific misconduct. In this paper, we report the experimental results of applying decision tree induction algorithms to analyze the questionnaire results of high school students' views on science-technology-society (HS-VOSTS). Empirical results from various settings of decision tree induction on HS-VOSTS results from students at one South Korean university indicate that decision tree induction algorithms can be successfully and effectively applied to automated knowledge discovery from students' survey data.
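
For illustration, one common splitting criterion in decision tree induction is information gain, as used in ID3-style algorithms. The abstract does not say which criterion or algorithm the paper used, so the sketch below is an assumption, and the questionnaire items (`q1`, `q2`), the labels, and the data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in entropy of `target` after splitting rows on `attr`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical survey answers: q1 separates the two views, q2 does not.
rows = [
    {"q1": "A", "q2": "X", "view": "realist"},
    {"q1": "A", "q2": "Y", "view": "realist"},
    {"q1": "B", "q2": "X", "view": "naive"},
    {"q1": "B", "q2": "Y", "view": "naive"},
]
best = max(["q1", "q2"], key=lambda a: information_gain(rows, a, "view"))
print(best)
```

A tree learner would split on the highest-gain attribute and recurse on each resulting group.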

An Efficient Algorithm for Mining Frequent Sequences In Spatiotemporal Data

  • ;지정희;류근호
    • Korea Spatial Information System Society: Conference Proceedings / Korea Spatial Information System Society 2005 Fall Conference / pp. 61-66 / 2005
  • Spatiotemporal data mining represents the confluence of several fields, including spatiotemporal databases, machine learning, statistics, geographic visualization, and information theory. Spatial data mining and temporal data mining have each received much attention independently in the knowledge discovery in databases and data mining research community. In this paper, we introduce an algorithm, Max_MOP, for discovering moving sequences in a mobile environment. Max_MOP mines only maximal frequent moving patterns. We exploit a characteristic of the problem domain, the spatiotemporal proximity between activities, to partition the spatiotemporal space. Finding moving sequences requires considering all temporally ordered combinations of associations, which is computationally intensive. However, exploiting the spatiotemporal proximity characteristic makes this task more computationally feasible. Our proposed technique is applicable to location-based services such as traffic services, tourist services, and location-aware advertising services.

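To make the notion of a maximal frequent pattern concrete, the sketch below filters an already mined set of frequent sequences down to those that are not subsequences of any other frequent sequence. It is not Max_MOP, which the abstract describes as mining maximal patterns directly. The location labels, the function names, and the choice of order-preserving, non-contiguous subsequence matching are assumptions for illustration.

```python
def is_subsequence(short, long):
    """True if `short` appears in `long` in order (gaps allowed)."""
    remaining = iter(long)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in short)

def maximal_patterns(frequent):
    """Keep only the patterns that no other frequent pattern contains."""
    unique = {tuple(p) for p in frequent}
    return sorted(
        p for p in unique
        if not any(p != q and is_subsequence(p, q) for q in unique)
    )

# Hypothetical frequent moving sequences over locations A to D.
frequent = [("A",), ("A", "B"), ("A", "B", "C"), ("B", "C"), ("D",)]
print(maximal_patterns(frequent))
```

Here `("A", "B", "C")` absorbs its three shorter subsequences, while `("D",)` stays because no longer pattern contains it.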

ICAIM;An Improved CAIM Algorithm for Knowledge Discovery

  • Yaowapanee, Piriya;Pinngern, Ouen
    • Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems, ICCAS 2004 / pp. 2029-2032 / 2004
  • The quantity of data has increased rapidly in recent years, causing data overload and making it difficult to find the required data. A method for eliminating redundant data is therefore needed, and one efficient approach is Knowledge Discovery in Databases (KDD). Data can generally be divided into two types, continuous and discrete. This paper describes an algorithm that transforms continuous attributes into discrete ones. We present Improved Class-Attribute Interdependence Maximization (ICAIM), which is designed to work with supervised data, for the discretization process. The algorithm does not require the user to predefine the number of intervals. ICAIM improves on CAIM by using a significance test to determine which intervals should be merged into one. Our goal is to generate a minimal number of discrete intervals and to improve classification accuracy. We used the iris plant dataset (IRIS) to test this algorithm and compare it with the CAIM algorithm.

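As background for the abstract above, the sketch below evaluates the CAIM criterion for a candidate set of cut points. The formula is the one usually given for the original CAIM algorithm (the mean, over intervals, of the squared majority-class count divided by the interval size); it is not taken from this abstract, and the significance test that ICAIM adds is not shown. The data and the function name `caim` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

def caim(values, labels, cut_points):
    """CAIM score of a discretization; higher means intervals are purer.

    cut_points must be sorted interior boundaries; a value equal to a
    boundary falls into the interval on its left.
    """
    counts = [Counter() for _ in range(len(cut_points) + 1)]
    for value, label in zip(values, labels):
        counts[bisect_left(cut_points, value)][label] += 1
    total = 0.0
    for interval in counts:
        size = sum(interval.values())
        if size:  # skip empty intervals to avoid division by zero
            total += max(interval.values()) ** 2 / size
    return total / len(counts)

# Hypothetical one-attribute data set with two classes.
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "b"]
print(caim(values, labels, [3.5]), caim(values, labels, [2.5]))
```

A greedy discretizer would try candidate boundaries and keep the one that raises this score most, which is why the clean split at 3.5 scores higher than the split at 2.5.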

Applied Computational Tools for Crop Genome Research

  • Love Christopher G;Batley Jacqueline;Edwards David
    • Journal of Plant Biotechnology / Vol. 5, No. 4 / pp. 193-195 / 2003
  • A major goal of agricultural biotechnology is the discovery of genes or genetic loci associated with characteristics beneficial to crop production. This knowledge of genetic loci may then be applied to improve crop breeding. Agriculturally important genes may also benefit crop production through transgenic technologies. Recent years have seen high-throughput technologies applied to agricultural biotechnology, leading to the production of large amounts of genomic data. The challenge today is to structure this data effectively so that researchers can search, filter and, importantly, make robust associations within a wide variety of datasets. At the Plant Biotechnology Centre, Primary Industries Research Victoria in Melbourne, Australia, we have developed a series of tools and computational pipelines to assist in processing and structuring genomic data to aid its application to agricultural biotechnology research. These tools include a sequence database, ASTRA, for the processing and annotation of expressed sequence tag data. Tools have also been developed for the discovery of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) molecular markers from large sequence datasets. Application of these tools to Brassica research has assisted in the production of genetic and comparative physical maps as well as candidate gene discovery for a range of agronomically important traits.
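
As a rough illustration of what SSR discovery involves, the sketch below scans a DNA string for short motifs repeated in tandem. It is not the authors' tool; the function name `find_ssrs`, the motif length range, the minimum repeat count, and the example sequence are all assumptions.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, motif, repeat_count) for tandem repeats in `seq`."""
    # The lazy quantifier prefers the shortest motif; the backreference
    # then requires at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq.upper())
    ]

# Hypothetical sequence containing the dinucleotide repeat (AC)4.
print(find_ssrs("TTGACACACACAGGT"))
```

A production tool would also handle mononucleotide runs, ambiguous bases, and imperfect repeats, none of which this sketch attempts.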

Data-Mining in Business Performance Database Using Explanation-Based Genetic Algorithms

  • 조성훈;정민용
    • Korean Management Science Review / Vol. 18, No. 1 / pp. 135-145 / 2001
  • In today's dynamic management environment, there is growing recognition that information and knowledge management systems are essential for efficient and effective decision making by CEOs. To cope with this situation, we suggest a data-mining scheme as a key component of an integrated information and knowledge management system. The proposed system measures business performance by considering both VA (Value-Added), which represents the stakeholder’s point of view, and EVA (Economic Value-Added), which represents the shareholder’s point of view. To discover new information and knowledge, we applied improved genetic algorithms (GAs) that consider predictability, understandability (lucidity), and reasonability simultaneously. We use a linear combination model as the GA learning structure. Although this model’s predictability is lower than that of a non-linear model, it increases the knowledge’s understandability, that is, the meaning of the induced values. Moreover, we introduce a random variable scheme based on the normal distribution for the initial chromosomes in the GAs, which we expect to increase the knowledge’s reasonability, that is, the degree to which experts accept it. The random variable scheme uses statistical correlation and determination coefficients calculated from the training data. To demonstrate the performance of the system, we conducted a case study using financial data of the Korean automobile industry over the 16 years from 1981 to 1996, taken from the KISFAS (Korea Investors Services Financial Analysis System) database.


Discovery of promising business items by technology-industry concordance and keyword co-occurrence analysis of US patents

  • 고병열;노현숙
    • Journal of Korea Technology Innovation Society / Vol. 8, No. 2 / pp. 860-885 / 2005
  • This study develops a quantitative method for discovering and selecting promising technology-based business items. We used patent trend analysis, technology-industry concordance analysis, and keyword co-occurrence analysis of US patents. By analyzing patent trends and technology-industry concordance, we identified the emerging industry trends: the prevalence of the bio industry, the service industry, and B2C business. From direct and co-occurrence analysis of patent keywords newly discovered in the year 2000, 28 promising business item candidates were extracted. Finally, the candidates were prioritized using four business attractiveness determinants: market size, product life cycle, degree of technological innovation, and coincidence with the industry trends. This result implies that promising technology-based business items can be discovered and selected reliably through a quantitative, objective and low-cost process that applies knowledge discovery methods to a patent database instead of peer review.

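The keyword co-occurrence analysis mentioned in the abstract above can be illustrated with a simple pair count: for each patent, every unordered pair of distinct keywords is counted once. The abstract does not describe the exact procedure the study used, so this is only a sketch, and the keyword lists and the function name `cooccurrence_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count how many documents contain each unordered keyword pair."""
    pairs = Counter()
    for keywords in documents:
        # Deduplicate and sort so each pair has one canonical form.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical keyword lists, one per patent.
patents = [
    ["gene", "chip", "sensor"],
    ["gene", "chip"],
    ["sensor", "network"],
]
counts = cooccurrence_counts(patents)
print(counts.most_common(1))
```

Pairs with high counts point to keywords that tend to appear together, which is the kind of signal a candidate-extraction step could build on.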