• Title/Summary/Keyword: bayesian classification


Bayesian analysis of finite mixture model with cluster-specific random effects (군집 특정 변량효과를 포함한 유한 혼합 모형의 베이지안 분석)

  • Lee, Hyejin;Kyung, Minjung
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.1
    • /
    • pp.57-68
    • /
    • 2017
  • Clustering algorithms attempt to partition a finite set of objects into a potentially predetermined number of nonempty subsets. Gibbs sampling of a normal mixture of linear mixed regressions with a Dirichlet prior distribution calculates posterior probabilities when the number of clusters is known. Our approach provides simultaneous partitioning and parameter estimation together with the computation of classification probabilities. A Monte Carlo study of curve estimation showed that the model is useful for function estimation. Examples are given to show how these models perform on real data.
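
The classification probabilities mentioned in this abstract can be illustrated with a much simpler model than the one the authors use. The sketch below assumes a univariate normal mixture with known weights, means, and standard deviations (no regressions and no random effects); the function names and the example numbers are hypothetical and are not taken from the paper. In a Gibbs sampler, probabilities of this form are typically what cluster allocations are drawn from at each iteration.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def classification_probabilities(y, weights, means, sds):
    """Posterior probability that y belongs to each mixture component.

    Applies Bayes' rule: weight_k * density_k(y), normalised over all k.
    """
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture, equal weights, means at -1 and +1.
probs_mid = classification_probabilities(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
probs_right = classification_probabilities(3.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

An observation at the midpoint of two equally weighted, equally spread components is assigned to each with equal probability, while an observation far to the right is assigned almost entirely to the right-hand component.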

Differences of Cold-heat Patterns between Healthy and Disease Group (건강군과 질환군의 한열지표 차이에 관한 고찰)

  • Kim Ji-Eun;Lee Seung-Gi;Ryu Hwa-Seung;Park Kyung-Mo
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.20 no.1
    • /
    • pp.224-228
    • /
    • 2006
  • Pattern identification of the exterior-interior syndrome and the cold-heat syndrome is one of the most frequently used diagnostic methods in Oriental medicine. There have been no systematic studies analyzing the characteristics of the 'exterior-interior and cold-heat' patterns in healthy versus disease groups. In this study, cold-heat pattern, blood pressure, pulse rate, height and weight were recorded from 100 healthy subjects and 196 disease subjects aged 30 to 59 years. To analyze the differences between the healthy and disease groups, we used descriptive statistics. A linear regression function, a linear support vector machine and a Bayesian classifier were used to distinguish the healthy group from the disease group. The scores of both exterior-heat and interior-cold were higher in the healthy group than in the disease group. This means that in a member of the disease group the exterior tends to be cold and the interior hot. These results were not related to age. However, the attempt to separate the healthy group from the disease group using the exterior-interior and cold-heat scores together with other vital signs did not perform well. This means that, although the two groups show different trends, this information alone could not classify subjects as healthy or diseased.

Mitochondrial Genome of Spirometra theileri Compared with Other Spirometra Species

  • Ndosi, Barakaeli Abdieli;Park, Hansol;Lee, Dongmin;Choe, Seongjun;Kang, Yeseul;Nath, Tilak Chandra;Bia, Mohammed Mebarek;Eamudomkarn, Chatanun;Jeon, Hyeong-Kyu;Eom, Keeseon S.
    • Parasites, Hosts and Diseases
    • /
    • v.59 no.2
    • /
    • pp.139-148
    • /
    • 2021
  • This study was carried out to provide information on the taxonomic classification and analysis of mitochondrial genomes of Spirometra theileri. One strobila of S. theileri was collected from the intestine of an African leopard (Panthera pardus) in the Maswa Game Reserve, Tanzania. The complete mtDNA sequence of S. theileri was 13,685 bp, encoding 36 genes: 12 protein-coding genes, 22 tRNAs and 2 rRNAs, with atp8 absent. Divergences of the 12 protein-coding genes were as follows: 14.9% between S. theileri and S. erinaceieuropaei, 14.7% between S. theileri and S. decipiens, and 14.5% between S. theileri and S. ranarum. Divergences of the 12 proteins between S. theileri and S. erinaceieuropaei ranged from 2.3% in cox1 to 15.7% in nad5, while S. theileri differed from S. decipiens and S. ranarum by 1.3% in cox1 to 15.7% in nad3. Phylogenetic relationships of S. theileri with eucestodes inferred using maximum likelihood and Bayesian inference exhibited identical tree topologies. A clade composed of S. decipiens and S. ranarum formed a sister group to S. erinaceieuropaei, and S. theileri formed a sister species to all species in this clade. Within the diphyllobothridean clade, Dibothriocephalus, Diphyllobothrium and Spirometra formed a monophyletic group, and sister genera were well supported.

Analysis of Molecular Variance and Population Structure of Sesame (Sesamum indicum L.) Genotypes Using Simple Sequence Repeat Markers

  • Asekova, Sovetgul;Kulkarni, Krishnanand P.;Oh, Ki Won;Lee, Myung-Hee;Oh, Eunyoung;Kim, Jung-In;Yeo, Un-Sang;Pae, Suk-Bok;Ha, Tae Joung;Kim, Sung Up
    • Plant Breeding and Biotechnology
    • /
    • v.6 no.4
    • /
    • pp.321-336
    • /
    • 2018
  • Sesame (Sesamum indicum L.) is an important oilseed crop grown in tropical and subtropical areas. The objective of this study was to investigate the genetic relationships among 129 sesame landraces and cultivars using simple sequence repeat (SSR) markers. Out of 70 SSRs, 23 were found to be informative and produced 157 alleles. The number of alleles per locus ranged from 3 to 14, whereas polymorphic information content ranged from 0.33 to 0.86. A distance-based phylogenetic analysis revealed two major and six minor clusters. The population structure analysis using a Bayesian model-based program in STRUCTURE 2.3.4 divided the 129 sesame accessions into three major populations (K = 3). Based on pairwise comparison estimates, Pop1 was observed to be genetically close to Pop2 with an $F_{ST}$ value of 0.15, while Pop2 and Pop3 were genetically closest with an $F_{ST}$ value of 0.08. Analysis of molecular variance revealed a higher percentage of variability among individuals within populations (85.84%) than among the populations (14.16%). Similarly, a higher variance was observed among individuals within countries of origin (90.45%) than between countries of origin. The grouping of genotypes in clusters was not related to their geographic origin, indicating considerable gene flow among sesame genotypes across the selected geographic regions. The SSR markers used in the present study were able to distinguish closely linked sesame genotypes, thereby showing their usefulness in assessing potentially important sources of genetic variation. These markers can be used for future sesame varietal classification, conservation, and other breeding purposes.

A Study on Detection of Small Size Malicious Code using Data Mining Method (데이터 마이닝 기법을 이용한 소규모 악성코드 탐지에 관한 연구)

  • Lee, Taek-Hyun;Kook, Kwang-Ho
    • Convergence Security Journal
    • /
    • v.19 no.1
    • /
    • pp.11-17
    • /
    • 2019
  • Recently, the abuse of Internet technology has caused economic and mental harm to society as a whole. In particular, newly created or modified malicious code is used as a basic means of various application hacking and cyber security threats, because it bypasses existing information protection systems. However, research on small-capacity executable files, which account for a large portion of actual malicious code, is rather limited. In this paper, we propose a model that analyzes the characteristics of known small-capacity executable files using data mining techniques and uses them to detect unknown malicious code. Several data mining techniques were applied, including Naive Bayesian, SVM, decision tree, random forest, and artificial neural network, and their accuracy was compared according to the detection level of VirusTotal. As a result, a classification accuracy of more than 80% was verified on 34,646 analysis files.

An Effective Feature Generation Method for Distributed Denial of Service Attack Detection using Entropy (엔트로피를 이용한 분산 서비스 거부 공격 탐지에 효과적인 특징 생성 방법 연구)

  • Kim, Tae-Hun;Seo, Ki-Taek;Lee, Young-Hoon;Lim, Jong-In;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.20 no.4
    • /
    • pp.63-73
    • /
    • 2010
  • Malicious bot programs, the source of distributed denial of service attacks, are widespread, and the number of PCs infected by them is increasing geometrically these days. Distributed denial of service attacks are launched continually through these bot PCs, and some financial incidents have been reported lately. Research on responding to distributed denial of service attacks is therefore necessary, so we propose an effective feature generation method for distributed denial of service attack detection using entropy. In this paper, we apply our method to both the DARPA 2000 datasets and distributed denial of service attack datasets that we composed and generated ourselves at a general university. We then evaluate how useful the proposed method is through classification using a Bayesian network classifier.
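
The abstract does not say which quantities the entropy is computed over, so the following is only a minimal sketch of one common way to build entropy features for traffic analysis: the Shannon entropy of selected packet header fields over fixed-size, non-overlapping windows. The field names (`src_ip`, `dst_port`), the window size, and both function names are assumptions made for illustration, not details from the paper.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def entropy_features(packets, fields, window):
    """One feature vector per non-overlapping window of packets.

    Each vector holds the entropy of every requested header field
    within that window. A trailing partial window is dropped.
    """
    features = []
    for start in range(0, len(packets) - window + 1, window):
        chunk = packets[start:start + window]
        features.append([shannon_entropy([p[f] for p in chunk]) for f in fields])
    return features


# Hypothetical window: four distinct sources all hitting the same port.
packets = [
    {"src_ip": "10.0.0.1", "dst_port": 80},
    {"src_ip": "10.0.0.2", "dst_port": 80},
    {"src_ip": "10.0.0.3", "dst_port": 80},
    {"src_ip": "10.0.0.4", "dst_port": 80},
]
vectors = entropy_features(packets, ["src_ip", "dst_port"], window=4)
```

In this toy window the source-address entropy is at its maximum for four packets while the destination-port entropy is zero. Vectors like these could then be passed to any classifier, including a Bayesian network classifier as in the study.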

A Phylogenetic Analysis of Otters (Lutra lutra) Inhabiting in the Gyeongnam Area Using D-Loop Sequence of mtDNA and Microsatellite Markers (경남지역 수달(Lutra lutra)의 mitochondrial DNA D-loop지역과 microsatellite marker를 이용한 계통유전학적 유연관계 분석)

  • Park, Moon-Sung;Lim, Hyun-Tae;Oh, Ki-Cheol;Moon, Young-Rok;Kim, Jong-Gap;Jeon, Jin-Tae
    • Journal of Life Science
    • /
    • v.21 no.3
    • /
    • pp.385-392
    • /
    • 2011
  • The otter (Lutra lutra) in Korea is classified as a first grade endangered species and is managed under state control. We performed a phylogenetic analysis of the otters that inhabit the Changnyeong, Jinju, and Geoje areas in Gyeongsangnamdo, Korea using mtDNA and microsatellite (MS) markers. In the analysis using the 676-bp D-loop sequence of mtDNA, six haplotypes were estimated from five single nucleotide polymorphisms. The genetic distance between the Jinju and Geoje areas was greater than distances within the areas, and the distance between Jinju and Geoje was especially clear. In the phylogenetic tree estimated by Bayesian Markov chain Monte Carlo analysis with the MrBayes program, two subgroups were clearly identified, one containing samples from Jinju and the other containing samples from the Changnyeong and Geoje areas. A parsimonious median-joining network analysis also showed two clear subgroups, supporting the result of the phylogenetic analysis. On the other hand, the consensus tree estimated from the genetic distances derived from the genotypes of 13 MS markers showed two clear subgroups, one containing samples from the Jinju, Geoje and Changnyeong areas and the other containing samples from only the Jinju area. The samples were not classified identically into the subgroups defined by the mtDNA and MS markers. It could be inferred that the two marker systems classified the samples differently because of their different characteristics: the mtDNA detects maternal lineage, whereas the MS markers estimate autosomal genetic distances. Nonetheless, the results from the two marker systems showed that there has been a progressive genetic fixation according to the habitats of the otters. Further analyses are needed using not only newly developed MS markers with more analytical power but also the whole mtDNA. Expanding the phylogenetic analysis to otter samples collected from the major habitats in Korea should help to maintain and preserve them scientifically and efficiently.

Change Detection of land-surface Environment in Gongju Areas Using Spatial Relationships between Land-surface Change and Geo-spatial Information (지표변화와 지리공간정보의 연관성 분석을 통한 공주지역 지표환경 변화 분석)

  • Jang Dong-Ho
    • Journal of the Korean Geographical Society
    • /
    • v.40 no.3 s.108
    • /
    • pp.296-309
    • /
    • 2005
  • In this study, we investigated future land-surface change and its relationships with geo-spatial information, using a Bayesian prediction model based on a likelihood ratio function, to analyze land-surface change in the Gongju area. We classified the land-surface satellite images and then extracted the changed areas by post-classification comparison. Land-surface information related to the land-surface change was constructed in a GIS environment, and the land-surface change prediction map was produced using the likelihood ratio function. The results show that the thematic maps that clearly influence land-surface change in rural or urban areas are elevation, water system, population density, roads, population movement, the number of establishments, land price, etc. The thematic maps that clearly influence land-surface change in forest areas are elevation, slope, population density, population movement, land price, etc. According to the land-surface change analysis, the old and new downtown centers are expanding near the Gum-river, and the downtown area will spread around the local roads and interchange areas in the urban area. For agricultural areas, small tributaries of the Gum-river and areas along local roads that connect to adjacent areas showed a high probability of change. Most of the forest areas are located in the southeast, and this result suggests why the large chestnut-tree cultivation complex is located in these areas and why the potential for forest damage is very high. Validation with a prediction rate curve showed that, within the top $10\%$ of predicted probability of land-surface change, the prediction capability was $80\%$ for urban areas, $55\%$ for agricultural areas, and $40\%$ for forest areas. This integration model is unsatisfactory for predicting the forest area in the study area, so future work should apply new thematic maps or prediction models. In conclusion, we expect that this approach can become one of the most essential methods for land-surface change studies in a few years.

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.2
    • /
    • pp.1-15
    • /
    • 2021
  • When a decision is difficult, we ask friends or people around us for advice. When we decide to buy products online, we read anonymous reviews before buying. With the advent of the data-driven era, the development of IT is generating large amounts of data from sources ranging from individuals to objects. Companies and individuals have accumulated, processed, and analyzed so much data that they can now use it directly to make and carry out decisions that used to depend on experts. Nowadays, recommender systems play a vital role in determining users' purchasing preferences, and web services (Facebook, Amazon, Netflix, YouTube) use them to induce clicks. For example, YouTube's recommender system, which is used by 1 billion people worldwide every month, takes into account videos that users marked with a "like" and videos they watched. Recommender system research is closely linked to practical business, so many researchers are interested in building better solutions. Recommender systems generate recommendations from information obtained from their users, because they require information on the items a user is likely to prefer. Through recommender systems, we have begun to trust patterns and rules derived from data rather than empirical intuition. The growth in the volume of data has driven machine learning to develop into deep learning. However, such recommender systems are not a complete solution. To operate, a recommender system needs data that are not sparse and are sufficient in amount. It also requires detailed information about the individual. Recommender systems work correctly when these conditions hold. When the interaction log is insufficient, recommendation becomes a complex problem for both consumers and sellers, because sellers need to make recommendations at a personal level and consumers need to receive appropriate recommendations based on reliable data. In this paper, to improve the accuracy of "appropriate recommendation" to consumers, we propose a recommender system combined with context-based deep learning. This research combines user-based data to create a hybrid recommender system. The hybrid approach developed is not a collaborative type of recommender system, but a collaborative extension that integrates user data with deep learning. Customer review data were used for the data set. Consumers buy products in online shopping malls and then rate and review them. These ratings and reviews come from buyers who have already purchased the product, giving other users confidence before they purchase it. However, recommender systems mainly use scores or ratings rather than review text to suggest items purchased by many users. In fact, consumer reviews contain product opinions and user sentiment that can be used in the evaluation. By incorporating these parts into the study, this paper aims to improve the recommendation system. This study presents an algorithm for use when individuals have difficulty selecting an item. Consumer reviews and record patterns make it possible to rely on recommendations appropriately. The algorithm implements a recommendation system through collaborative filtering. This study's predictive accuracy is measured by Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Netflix uses its recommender system strategically through competitions that reduce RMSE every year, making good use of predictive accuracy. Research on hybrid recommender systems that combine NLP approaches, deep learning, and other techniques for personalized recommendation has been increasing. Among NLP studies, sentiment analysis began to take shape in the mid-2000s as user review data increased.
Sentiment analysis is a text classification task based on machine learning. Machine learning-based sentiment analysis has the disadvantage that it struggles to capture the information expressed in a review, because it is difficult for it to take the characteristics of the text into account. In this study, we propose a deep learning recommender system that uses BERT's sentiment analysis to minimize these disadvantages of machine learning. The comparison models were recommender systems based on Naive-CF (collaborative filtering), SVD (singular value decomposition)-CF, MF (matrix factorization)-CF, BPR-MF (Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM, and GRU (gated recurrent units). In the experiments, the recommender system based on BERT performed best.
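
For readers unfamiliar with the two accuracy measures named in this abstract, here is a minimal sketch of RMSE and MAE using their standard definitions. The rating values in the example are hypothetical and are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical true ratings and model predictions on a 1-5 scale.
actual_ratings = [3.0, 4.0]
predicted_ratings = [4.0, 2.0]
example_rmse = rmse(actual_ratings, predicted_ratings)  # sqrt((1 + 4) / 2)
example_mae = mae(actual_ratings, predicted_ratings)    # (1 + 2) / 2
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does, which is why the two are often reported together.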

CLINICAL STUDY OF POSITRON EMISSION TOMOGRAPHY WITH $[^{18}F]$-FLUORODEOXYGLUCOSE IN MAXILLOFACIAL TUMOR DIAGNOSIS (구강 악안면 영역의 암종 진단에 있어서 $[^{18}F]$-Fluorodeoxyglucose를 이용한 양전자방출 단층촬영의 임상적 연구)

  • Kim, Jae-Hwan;Kim, Kyung-Wook;Kim, Yong-Kack
    • Journal of the Korean Association of Oral and Maxillofacial Surgeons
    • /
    • v.26 no.5
    • /
    • pp.462-469
    • /
    • 2000
  • Positron emission tomography (PET) is a new diagnostic method that creates functional images of the distribution of positron-emitting radionuclides which, when administered intravenously, make possible anatomical and functional analysis through quantification of biochemical and physiological processes. After genetic and biochemical changes in the initial stage, a malignant tumor undergoes functional changes before it undergoes anatomical changes. Functional analysis with PET can therefore achieve early diagnosis of malignant tumors, replacing traditional anatomical analysis such as computed tomography (CT) and magnetic resonance imaging (MRI). Similarly, PET can identify malignant tumor without confusing it with scar and fibrosis at follow-up. At the Korea Cancer Center Hospital (KCCH), from October 1997 to September 1999, a clinical study was performed on 79 cases that underwent 89 PET evaluations with [18F]-Fluorodeoxyglucose for diagnosis of oral and maxillofacial tumors, and the data were analysed with a Bayesian $2{\times}2$ classification table. The results were as follows. Evaluation for initial diagnosis with FDG-PET (P < 0.005): 1. Agreement rate or accuracy rate is 88.9%. 2. Sensitivity is 95.2%, and specificity 66.7%. 3. Positive predictive rate is 90.9%, and negative predictive rate 80.0%. 4. In consideration of tumor stage, diagnostic rate in less than stage II was 90% and in greater than stage III 100%. 5. In consideration of tumor size, diagnostic rate in less than T2 was 92.3% and in greater than T3 100%. Evaluation for follow-up after primary treatment with FDG-PET (P < 0.001): 1. Agreement rate or accuracy rate is 85.4%. 2. Sensitivity is 87.5%, and specificity 82.4%. 3. Positive predictive rate is 87.5%, and negative predictive rate 82.4%. 4. Of 24 recurred cases, 6 had distant metastasis, and 5 of them were diagnosed with FDG-PET, giving a diagnostic rate of 83.3%. From the above results, positron emission tomography with [18F]-Fluorodeoxyglucose appears to be more sensitive and accurate for detecting the presence of oral and maxillofacial tumors, and has various clinical applications such as early diagnosis of tumors at initial and follow-up examination and detection of distant metastasis.
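
The rates reported from the $2{\times}2$ classification table follow from four cell counts. The sketch below uses the standard definitions; the function name is hypothetical, and the example counts are an assumption chosen only because they reproduce the rounded initial-diagnosis percentages above. The abstract does not give the underlying table, so these counts should not be read as the study's data.

```python
def diagnostic_rates(tp, fn, fp, tn):
    """Standard rates from a 2x2 table of test result versus true status.

    tp: diseased and test positive    fn: diseased and test negative
    fp: healthy and test positive     tn: healthy and test negative
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive": tp / (tp + fp),
        "negative_predictive": tn / (tn + fn),
    }


# Hypothetical counts (not from the paper) consistent with the reported rates.
rates = diagnostic_rates(tp=20, fn=1, fp=2, tn=4)
percentages = {name: round(100 * value, 1) for name, value in rates.items()}
```

Sensitivity and specificity condition on the true disease status, whereas the predictive rates condition on the test result, so the predictive rates also depend on how common the disease is in the sample.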
