• Title/Summary/Keyword: Bayesian Method

Semantic Topic Selection Method of Document for Classification (문서분류를 위한 의미적 주제선정방법)

  • Ko, kwang-Sup;Kim, Pan-Koo;Lee, Chang-Hoon;Hwang, Myung-Gwon
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.1 / pp.163-172 / 2007
  • The web, as a global network, contains text documents, video, sound, and other media, and connects distributed information through links. As the web has developed it has accumulated abundant information, most of it in text-based documents. Most users use the web to retrieve the information they want, so numerous studies have addressed text document retrieval using methods such as probability, statistics, vector similarity, and Bayesian approaches. These studies, however, could not consider both the subject and the semantics of documents, so users have to search again by hand. Korean documents are especially hard to find because research on Korean document classification is insufficient. To overcome these problems, we propose a Korean document classification method for semantic retrieval. The method first extracts the TF and RV values of the concepts contained in a document, then maps them onto U-WIN, a Korean vocabulary dictionary, to select the topic of the document. The method can classify documents semantically, and experiments showed its efficiency.

Enhancement of Buckling Characteristics for Composite Square Tube by Load Type Analysis (하중유형 분석을 통한 좌굴에 강한 복합재료 사각관 설계에 관한 연구)

  • Seokwoo Ham;Seungmin Ji;Seong S. Cheon
    • Composites Research / v.36 no.1 / pp.53-58 / 2023
  • The PIC design method assigns a different stacking sequence to each shell element based on a preliminary FE analysis. In a previous study, machine learning was applied to the PIC design method to assign regions efficiently, with the training data labeled by dividing each region into tension, compression, and shear according to the preliminary FE analysis results. However, because buckling was not considered, regions cannot be assigned an appropriate loading type when buckling occurs. The present study proposes PIC-NTL (PIC design using a novel technique for analyzing load type), which adds to the conventional PIC design a novel technique for analyzing load type that accounts for buckling. The stress triaxiality of each ply was analyzed for the buckling analysis, and a representative loading type was designated from the loading types determined within each decision area, where each element is divided into two decision areas of equal size in the thickness direction. The training inputs were the element coordinates, and the labels were the representative loading type of each decision area. Machine learning models were trained on these data, and the hyperparameters that affect model performance were tuned to optimal values with a Bayesian algorithm. Among the tuned models, the SVM model showed the highest performance. The most effective stacking sequences were mapped onto the PIC tube based on the trained SVM model. FE analysis results show that the proposed design method has superior external loading resistance and energy absorption compared with the previous study.

Construction of Multiple Classifier Systems based on a Classifiers Pool (인식기 풀 기반의 다수 인식기 시스템 구축방법)

  • Kang, Hee-Joong
    • Journal of KIISE:Software and Applications / v.29 no.8 / pp.595-603 / 2002
  • Only a few studies have examined how to select multiple classifiers from a pool of available classifiers to achieve good classification performance. Thus, the classifier selection problem, namely which classifiers to select and how many, remains an important research issue. In this paper, given that the number of selected classifiers is fixed in advance, a variety of selection criteria are proposed and applied to the construction of multiple classifier systems, and the criteria are then evaluated by the performance of the constructed systems. All possible sets of classifiers are examined by the selection criteria, and some of these sets are selected as candidate multiple classifier systems. The candidates were evaluated in experiments on recognizing unconstrained handwritten numerals obtained from Concordia University and the UCI Machine Learning Repository. Among the selection criteria, the candidates chosen by the information-theoretic criteria based on conditional entropy showed more promising results than those chosen by the other criteria.
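
The abstract does not spell out its selection criteria, so the following is only a minimal sketch of one way a conditional-entropy criterion could work: for a fixed subset size, every subset of the pool is scored by the estimated conditional entropy of the true label given the subset's joint decisions, and the subset with the lowest value is kept. The function names, the data layout, and the choice of a plain count-based estimate are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def conditional_entropy(labels, decisions):
    """Estimate H(label | decisions) in bits from co-occurrence counts."""
    n = len(labels)
    joint = Counter(zip(decisions, labels))   # counts of (decision, label) pairs
    marginal = Counter(decisions)             # counts of each joint decision
    return -sum(
        count / n * math.log2(count / marginal[decision])
        for (decision, _), count in joint.items()
    )


def select_classifiers(labels, outputs, k):
    """Exhaustively score every k-subset of the pool (hypothetical helper).

    outputs maps a classifier name to its list of predicted labels.
    Returns (entropy, subset) for the subset with the lowest entropy.
    """
    best = None
    for subset in combinations(sorted(outputs), k):
        # Joint decision of the subset on each sample, as a tuple.
        decisions = list(zip(*(outputs[name] for name in subset)))
        h = conditional_entropy(labels, decisions)
        if best is None or h < best[0]:
            best = (h, subset)
    return best


# Toy pool: "a" is always right, "b" and "c" carry no information.
labels = [0, 0, 1, 1]
pool = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "c": [0, 0, 0, 0]}
entropy, chosen = select_classifiers(labels, pool, k=1)
print(chosen, entropy)
```

Exhaustive search matches the abstract's statement that all possible sets are examined, but its cost grows combinatorially with the pool size.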

Modeling the Trend of Apartment Market Price in Seoul (서울시 아파트 가격 추세의 모형화)

  • Hwang, Eun-Yeon;Kwon, Yong-Chan;Jang, Dong-Ik;Lee, Jae-Yong;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods / v.15 no.2 / pp.173-191 / 2008
  • The goal of this paper is to analyze and model the trend of apartment market prices in Seoul using the dynamic linear model (DLM). We use the market price per pyeong of 30-pyeong apartments provided by the "KB apartment market price database" of Kookmin Bank. The data were collected from June 24, 2003 to August 28, 2006. Inspection of the data reveals that the trend of apartment market prices in Seoul can be divided into two groups, and we assume that the price is expressed by the common trend of the divided groups. We estimate the apartment price with the DLM using the Bayesian method.
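
The abstract does not give the form of the DLM, so the sketch below uses the simplest case, a local-level model with known variances, only to illustrate how Bayesian updating proceeds in a DLM. It is not the authors' model (which involves a common trend across groups), and the variance values and the diffuse prior are assumptions.

```python
def local_level_filter(ys, obs_var, state_var, m0=0.0, c0=1e6):
    """Sequential Bayesian updating for a local-level DLM.

    Model (assumed): y_t = mu_t + v_t, mu_t = mu_{t-1} + w_t,
    with v_t ~ N(0, obs_var) and w_t ~ N(0, state_var).
    m0 and c0 are the prior mean and variance of the level.
    Returns the filtered means and the final posterior variance.
    """
    m, c = m0, c0
    means = []
    for y in ys:
        r = c + state_var      # prior variance of the level at time t
        q = r + obs_var        # one-step forecast variance
        gain = r / q           # weight given to the new observation
        m = m + gain * (y - m) # posterior mean of the level
        c = r * (1.0 - gain)   # posterior variance of the level
        means.append(m)
    return means, c


# Hypothetical series: a constant price level of 5.0 observed 50 times.
filtered, final_var = local_level_filter([5.0] * 50, obs_var=1.0, state_var=0.1)
print(filtered[-1], final_var)
```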

Design of a User Location Prediction Algorithm Using the Cache Scheme (캐시 기법을 이용한 위치 예측 알고리즘 설계)

  • Son, Byoung-Hee;Kim, Sang-Hee;Nahm, Eui-Seok;Kim, Hag-Bae
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.6B / pp.375-381 / 2007
  • This paper focuses on prediction algorithms among context-awareness technologies. With Bayesian networks, a representative algorithm, it is difficult both to realize context awareness and to reduce processing time in a real-time environment. Moreover, it is hard to be sure of the accuracy and reliability of the prediction. One of the simplest algorithms is the sequential matching algorithm, which we use together with the proposed cache scheme. In this paper, the combination is suited to a context-aware service that adapts to the user's habits, and it reduces processing time by 48.7% on average. Thus, we propose a design method for a user location prediction algorithm that uses sequential matching with the cache scheme, taking the user's habits or behavior into consideration. The novel approach differs from conventional prediction algorithms.

Rule Generation and Approximate Inference Algorithms for Efficient Information Retrieval within a Fuzzy Knowledge Base (퍼지지식베이스에서의 효율적인 정보검색을 위한 규칙생성 및 근사추론 알고리듬 설계)

  • Kim Hyung-Soo
    • Journal of Digital Contents Society / v.2 no.2 / pp.103-115 / 2001
  • This paper proposes two algorithms, one that generates minimal decision rules and one that performs approximate inference, by applying rough set theory and factor space theory to a fuzzy knowledge base. The minimal decision rules are generated by a data classification technique and by reducts that apply correlation analysis and Bayes' theorem to the related attribute factors. To retrieve a specific object, the paper proposes an approximate inference method that defines the membership function and the t-norm combination operation over the minimal knowledge base composed of decision rules. We compare the suggested algorithms with other retrieval approaches, such as possibility theory, factor space theory, and the Max-Min, Max-product, and Max-average composition operations, through simulations that randomly generate the number of objects and the attribute values as the memory size grows. The comparison shows that the suggested algorithms retrieve objects with shorter access time than the previous ones.

Automated K-Means Clustering and R Implementation (자동화 K-평균 군집방법 및 R 구현)

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.723-733 / 2009
  • The crucial problems in K-means clustering are deciding the number of clusters and the initial cluster centroids. Hence, K-means clustering generally consists of a two-stage procedure. The first stage runs hierarchical clustering to obtain the number of clusters and the cluster centroids, and the second stage runs nonhierarchical K-means clustering using the results of the first stage. Here we provide an automated K-means clustering procedure that is useful for obtaining the initial cluster centroids and can also be applied to large data sets, and we provide a software program implemented in R.
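
The two-stage procedure described in this abstract can be sketched as follows. This is an illustration in Python, not the paper's R program: the number of clusters k is supplied by hand because the abstract does not state the rule that automates it, and centroid linkage is an assumed choice for the hierarchical stage.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def hierarchical_centroids(points, k):
    """Stage 1: merge the two closest clusters (centroid linkage) until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans(points, centroids, max_iter=100):
    """Stage 2: Lloyd's K-means started from the stage-1 centroids."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [centroid(g) if g else centroids[i]
                   for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


# Hypothetical data: two well-separated pairs of points.
data = [(0, 0), (0, 1), (10, 10), (10, 11)]
initial = hierarchical_centroids(data, k=2)
final_centroids, final_groups = kmeans(data, initial)
print(final_centroids)
```

The naive pairwise search in stage 1 is cubic in the number of points, so for large data sets it would normally be run on a sample rather than on the full data.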

Comparison of the fit of automatic milking system and test-day records with the use of lactation curves

  • Sitkowska, B.;Kolenda, M.;Piwczynski, D.
    • Asian-Australasian Journal of Animal Sciences / v.33 no.3 / pp.408-415 / 2020
  • Objective: The aim of the paper was to compare the fit of data derived from daily automatic milking systems (AMS) and monthly test-day records with the use of lactation curves; data were analysed separately for primiparas and multiparas. Methods: The study was carried out on three Polish Holstein-Friesian (PHF) dairy herds. The farms were equipped with an automatic milking system which provided information on milking performance throughout lactation. Once a month cows were also subjected to test-day milkings (method A4). Most studies described in the literature are based on test-day data; therefore, we aimed to compare models based on both test-day and AMS data to determine which mathematical model (Wood or Wilmink) would be the better fit. Results: Results show that lactation curves constructed from data derived from the AMS were better adjusted to the actual milk yield (MY) data regardless of the lactation number and model. Also, we found that the Wilmink model may be a better fit for modelling the lactation curve of PHF cows milked by an AMS as it had the lowest values of Akaike information criterion, Bayesian information criterion, and mean square error, the highest coefficient of determination values, and was more accurate in estimating MY than the Wood model. Although both models underestimated peak MY, mean MY, and total MY, the Wilmink model was closer to the real values. Conclusion: Models of lactation curves may have an economic impact and may be helpful in terms of herd management and decision-making as they assist in forecasting MY at any moment of lactation. Also, data obtained from modelling can help with monitoring milk performance of each cow, diet planning, as well as monitoring the health of the cow.
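
The abstract does not state the model equations, so the sketch below uses the commonly cited forms of the Wood and Wilmink curves and the least-squares versions of AIC and BIC under Gaussian errors. The parametrization, the fixed Wilmink decay rate of 0.05, and the helper names are assumptions, and no fitting routine is shown.

```python
import math


def wood(t, a, b, c):
    """Wood curve (common form): y(t) = a * t^b * exp(-c * t)."""
    return a * t**b * math.exp(-c * t)


def wilmink(t, a, b, c, k=0.05):
    """Wilmink curve (common form): y(t) = a + b * exp(-k * t) + c * t."""
    return a + b * math.exp(-k * t) + c * t


def fit_criteria(observed, predicted, n_params):
    """MSE, AIC and BIC from residuals, assuming Gaussian errors.

    AIC = n * ln(RSS / n) + 2 * p
    BIC = n * ln(RSS / n) + p * ln(n)
    Lower values indicate a better trade-off between fit and complexity.
    """
    n = len(observed)
    rss = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    mse = rss / n
    return {
        "mse": mse,
        "aic": n * math.log(mse) + 2 * n_params,
        "bic": n * math.log(mse) + n_params * math.log(n),
    }


# Hypothetical yields and predictions, used only to exercise the formulas.
criteria = fit_criteria([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], n_params=3)
print(criteria)
```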

Intelligent Diagnosing Method Based on the Conditional Probability for the Pancreatic Cancer Early Detection (췌장암 조기진단을 위한 조건부 확률 기반 지능형 진단 방식)

  • JANG, IK GYU;JUNG, JOONHO;KO, JAE HO;MOON, HYUN SEOK;JO, YUNG HO
    • Journal of Biomedical Engineering Research / v.38 no.5 / pp.227-231 / 2017
  • Early diagnosis of pancreatic cancer has been considered one of the important barriers to successful therapy, since the five-year survival rate after treatment of pancreatic cancer is critically low. Nonetheless, patients often miss the golden time for treatment because they rarely visit the hospital until their symptoms are severe. To overcome these problems, extensive information about the patient's symptoms should be used as biomarkers for early diagnosis. For this reason, a biomarker for early detection of pancreatic cancer (CA19-9) has been developed as a diagnostic kit. However, since the diagnosis is not accurate enough, pancreatic symptoms (abdominal pain, jaundice, anorexia, diabetes, etc.) and the biomarker (CA19-9) should be considered together. We develop an intelligent diagnostic system that considers CA19-9 together with the incidence of pancreatic cancer given pancreatic symptoms, determined by studying information from a large number of patients. It shows higher accuracy than a system using CA19-9 alone. Because it can diagnose pancreatic cancer early, it may increase the survival rate of pancreatic cancer.
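
The abstract does not describe how the system combines the biomarker with symptom information, so the following is a minimal sketch of one conditional-probability approach: Bayes' theorem applied to several findings that are assumed to be conditionally independent given disease status. Every probability below is a made-up placeholder, not a clinical value or a result from the paper.

```python
def posterior_probability(prior, evidence):
    """Posterior P(cancer | findings) via Bayes' theorem.

    evidence is a list of (P(finding | cancer), P(finding | no cancer)) pairs.
    Findings are assumed conditionally independent given disease status.
    """
    with_disease = prior
    without_disease = 1.0 - prior
    for p_given_cancer, p_given_healthy in evidence:
        with_disease *= p_given_cancer
        without_disease *= p_given_healthy
    return with_disease / (with_disease + without_disease)


# Hypothetical placeholder probabilities, for illustration only.
prior = 0.01                      # assumed prior probability of cancer
ca19_9_positive = (0.80, 0.10)    # assumed likelihoods of a positive CA19-9
jaundice = (0.50, 0.02)           # assumed likelihoods of jaundice

marker_only = posterior_probability(prior, [ca19_9_positive])
marker_and_symptom = posterior_probability(prior, [ca19_9_positive, jaundice])
print(marker_only, marker_and_symptom)
```

With these placeholder values the posterior rises when the symptom is added to the biomarker, which is the kind of gain the abstract attributes to considering both together.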

The Weighted Polya Posterior Confidence Interval For the Difference Between Two Independent Proportions (독립표본에서 두 모비율의 차이에 대한 가중 POLYA 사후분포 신뢰구간)

  • Lee Seung-Chun
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.171-181 / 2006
  • The Wald confidence interval has been regarded as the standard method for the difference of proportions. However, the erratic behavior of its coverage probability is well documented in the literature, and various alternatives have been proposed. Among them, the Agresti-Caffo confidence interval has gained a good reputation because of its simplicity and fairly good coverage probability. It is known, however, that the Agresti-Caffo confidence interval is conservative. In this note, a confidence interval is developed using the weighted Polya posterior, which was employed to obtain a confidence interval for the binomial proportion in Lee (2005). The resulting confidence interval is simple and effective in several respects, such as the closeness of the average coverage probability to the nominal confidence level, the average expected length, and the mean absolute error of the coverage probability. In practice, it can be used for interval estimation of the difference of proportions for any sample sizes and parameter values.
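
For context, the two baseline intervals named in this abstract can be sketched as below. This shows only the Wald and Agresti-Caffo intervals in their standard textbook forms; the weighted Polya posterior interval is not reproduced because the abstract does not define it. The function names and the example counts are assumptions.

```python
import math


def wald_ci(x1, n1, x2, n2, z=1.959964):
    """Wald interval for p1 - p2 (z defaults to the 95% normal quantile)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def agresti_caffo_ci(x1, n1, x2, n2, z=1.959964):
    """Agresti-Caffo interval: add one success and one failure to each
    sample, then apply the Wald formula to the adjusted counts."""
    return wald_ci(x1 + 1, n1 + 2, x2 + 1, n2 + 2, z)


# Hypothetical counts: no successes in either sample of size 10.
wald = wald_ci(0, 10, 0, 10)
adjusted = agresti_caffo_ci(0, 10, 0, 10)
print(wald, adjusted)
```

The example counts show the erratic behavior mentioned above: with no successes in either sample the Wald interval collapses to a single point, while the adjusted interval keeps a positive width.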