• Title/Summary/Keyword: Bayesian model


Text Categorization Using TextRank Algorithm (TextRank 알고리즘을 이용한 문서 범주화)

  • Bae, Won-Sik;Cha, Jeong-Won
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.1
    • /
    • pp.110-114
    • /
    • 2010
  • We describe a new method for text categorization using the TextRank algorithm. Text categorization is the problem of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm: if each word is treated as a vertex and the co-occurrence of two adjacent words as an edge, a document yields a graph. We then find important words in this graph using TextRank and construct features as pairs consisting of an important word and a word adjacent to it. We use four classifiers: SVM, a naïve Bayesian classifier, a maximum entropy model, and a k-NN classifier, on the non-cross-posted version of the 20 Newsgroups data set. Performance improved across all classifiers, suggesting the potential of the TextRank algorithm for text categorization.
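The graph construction and ranking step described above can be sketched in a few lines of pure Python. This is only a hedged illustration of the TextRank idea (adjacent words as edges, PageRank-style scores), not the paper's implementation; the tokenization and parameter values are assumptions.

```python
# Sketch of TextRank word ranking: vertices are words, edges are adjacent
# co-occurrences, and a PageRank-style power iteration scores the vertices.

def textrank_words(tokens, damping=0.85, iters=50):
    # Build an undirected co-occurrence graph from adjacent word pairs.
    neighbors = {}
    for a, b in zip(tokens, tokens[1:]):
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Power iteration of the PageRank-style score.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        new = {}
        for w in neighbors:
            s = sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            new[w] = (1 - damping) + damping * s
        score = new
    # Words sorted from most to least important.
    return sorted(score, key=score.get, reverse=True)

tokens = "graph based ranking ranks words in a word graph".split()
ranked = textrank_words(tokens)
```

The paper's feature step would then pair each top-ranked word with its adjacent words before feeding the classifiers.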

Classifying Indian Medicinal Leaf Species Using LCFN-BRNN Model

  • Kiruba, Raji I;Thyagharajan, K.K;Vignesh, T;Kalaiarasi, G
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.10
    • /
    • pp.3708-3728
    • /
    • 2021
  • Indian herbal plants are used in agriculture and in the food, cosmetics, and pharmaceutical industries. Laboratory-based tests are routinely used to identify and classify similar herb species by analyzing their internal cell structures. In this paper, we apply computer vision techniques to the same task. The original leaf image was preprocessed with the Chan-Vese active contour segmentation algorithm to remove the background, setting the contraction bias (v) to -1 and the smoothing factor (µ) to 0.5, and initializing the contour close to the image boundary. The segmented grayscale image was then fed to a leaky capacitance fired neuron model (LCFN), which differentiates between similar herbs by combining different groups of pixels in the leaf image. The LCFN's decay constants (f, g) and threshold (h) were empirically set to 0.7, 0.6, and 18, respectively, to generate the 1D feature vector. The LCFN time sequence identified the internal leaf structure at different iterations. Our proposed framework was tested on newly collected natural images of herbal species, including images that vary geometrically in size, orientation, and position. The 1D sequence and shape features of aloe, betel, Indian borage, bittergourd, grape, insulin herb, guava, mango, nilavembu, nithiyakalyani, sweet basil, and pomegranate were fed into the 5-fold Bayesian regularization neural network (BRNN), K-nearest neighbors (KNN), support vector machine (SVM), and ensemble classifiers, achieving a highest classification accuracy of 91.19%.

Model selection via Bayesian information criterion for divide-and-conquer penalized quantile regression (베이즈 정보 기준을 활용한 분할-정복 벌점화 분위수 회귀)

  • Kang, Jongkyeong;Han, Seokwon;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.2
    • /
    • pp.217-227
    • /
    • 2022
  • Quantile regression is widely used in many fields because it provides an efficient tool for examining complex information latent in variables. However, modern large-scale, high-dimensional data make it very difficult to estimate quantile regression models due to limitations in computation time and storage space. Divide-and-conquer is a technique that divides the entire data set into several sub-datasets that are easy to compute and then reconstructs the estimate for the entire data set using only the summary statistics from each sub-dataset. In this paper, we study a variable selection method based on the Bayesian information criterion (BIC), applying the divide-and-conquer technique to penalized quantile regression. When the number of sub-datasets is properly chosen, the proposed method is efficient in terms of computation speed and yields variable selection results consistent with those of classical quantile regression estimates computed on the entire data. These advantages are confirmed through simulations and a real data analysis.
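The divide-and-conquer idea can be sketched on the simplest possible case, a single quantile (location model) rather than the full penalized quantile regression of the paper: estimate on each block, combine by averaging, and compare candidates with a BIC-style score. The `qbic` form below is an assumed illustration, not the paper's criterion.

```python
import math

def check_loss(u, tau):
    # Quantile (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile(xs, tau):
    # Empirical tau-quantile; minimizes the check loss for a location model.
    s = sorted(xs)
    return s[max(0, min(len(s) - 1, int(tau * len(s))))]

def dc_quantile(xs, tau, n_blocks):
    # Divide-and-conquer: estimate per block, then combine by a simple average.
    size = len(xs) // n_blocks
    blocks = [xs[i * size:(i + 1) * size] for i in range(n_blocks)]
    return sum(sample_quantile(b, tau) for b in blocks) / n_blocks

def qbic(xs, fit, tau, df):
    # BIC-style criterion for comparing candidate fits (assumed form).
    n = len(xs)
    avg_loss = sum(check_loss(x - fit, tau) for x in xs) / n
    return n * math.log(avg_loss) + df * math.log(n)

data = list(range(100))
est = dc_quantile(data, 0.5, n_blocks=4)  # averages the four block medians
```

In the paper, each block contributes summary statistics of a penalized quantile regression fit instead of a raw quantile, but the split-estimate-combine pattern is the same.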

An Analysis on the Decoupling between Energy Consumption and Economic Growth in South Korea (한국의 에너지 소비와 경제성장의 탈동조화에 대한 분석)

  • Hyun-Soo Kang
    • Asia-Pacific Journal of Business
    • /
    • v.14 no.4
    • /
    • pp.305-318
    • /
    • 2023
  • Purpose - This study analyzed the decoupling between energy consumption and economic growth in Korea from 1990 to 2021. Its main purpose is to suggest policy implications for achieving a low-carbon society and the decoupling that Korea must pursue in the face of the climate change crisis. Design/methodology/approach - The study investigated the relationship between energy consumption and economic growth by energy source and sector using the energy-EKC (EEKC) hypothesis, which adds energy consumption to the traditional Environmental Kuznets Curve (EKC), together with an impulse response function (IRF) model based on Bayesian vector auto-regression (BVAR). Findings - Over the analysis period, decoupling of energy consumption from economic growth in Korea is confirmed from 1996 onward. However, the decoupling tendency differed across energy sources and sectors. The IRF results using energy consumption by source showed that shocks to GDP and renewable energy consumption increased consumption of bio and waste energy but decreased consumption of other sources, while shocks to trade dependence increased consumption of petroleum products. Research implications or Originality - The results indicate that efficient allocation across existing energy sources is required through expanded development of alternative as well as renewable energy. Additionally, to increase the effectiveness of existing energy policies for achieving carbon neutrality, more detailed strategies by energy source and sector are needed.

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining (베이지안 확률 및 폐쇄 순차패턴 마이닝 방식을 이용한 설명가능한 로그 이상탐지 시스템)

  • Yun, Jiyoung;Shin, Gun-Yoon;Kim, Dong-Wook;Kim, Sang-Soo;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.22 no.2
    • /
    • pp.77-87
    • /
    • 2021
  • With the development of the Internet and personal computers, increasingly varied and complex attacks have emerged. As attacks become more complex, signature-based detection becomes difficult, which has led to research on behavior-based log anomaly detection. Recent work uses deep learning to learn log sequences and shows good performance, but it provides no explanation for its predictions. This lack of explanation makes it difficult to detect contamination of the data or vulnerabilities in the model itself, so users lose trust in the model. To address this problem, this work proposes an explainable log anomaly detection system. Log parsing is performed first, and sequential rules are then extracted via Bayesian posterior probability, yielding a rule set of the form "if condition, then result, with posterior probability". If a sample matches the rule set it is classified as normal; otherwise it is flagged as an anomaly. We use HDFS datasets for the experiment, achieving an F1 score of 92.7% on the test dataset.
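The rule-extraction idea above can be sketched with empirical posterior probabilities: from (condition, result) pairs observed in parsed log sequences, estimate P(result | condition) and keep high-probability rules. This is a hedged minimal sketch; the event names, threshold, and matching logic are assumptions, not the paper's system.

```python
from collections import Counter

def extract_rules(pairs, min_prob=0.9):
    # Estimate P(result | condition) from observed (condition, result) pairs
    # and keep rules whose posterior probability clears the threshold.
    cond_counts = Counter(c for c, _ in pairs)
    pair_counts = Counter(pairs)
    rules = {}
    for (c, r), n in pair_counts.items():
        prob = n / cond_counts[c]  # empirical posterior P(result | condition)
        if prob >= min_prob:
            rules[c] = (r, prob)
    return rules

def is_anomaly(rules, cond, result):
    # A sample matching no rule (or contradicting one) is flagged as an anomaly.
    rule = rules.get(cond)
    return rule is None or rule[0] != result

# Illustrative parsed log events: "open" is almost always followed by "read".
logs = [("open", "read")] * 9 + [("open", "error")]
rules = extract_rules(logs, min_prob=0.8)
```

Because each rule carries its condition, result, and posterior probability, a flagged sample can be explained by pointing at the rule it violated.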

Evaluation of flood frequency analysis technique using measured actual discharge data (실측유량 자료를 활용한 홍수량 빈도해석 기법 평가)

  • Kim, Tae-Jeong;Kim, Jang-Gyeong;Song, Jae-Hyun;Kim, Jin-Guk;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association
    • /
    • v.55 no.5
    • /
    • pp.333-343
    • /
    • 2022
  • For water resource management, the design flood is calculated using flood frequency analysis techniques and rainfall-runoff models. Design flood frequency analysis calculates the probabilistic design flood by directly analyzing actual discharge data and is theoretically regarded as the most accurate method. However, frequency analysis of measured discharge has been limited by data availability in existing flood analyses. In this study, design flood frequency analysis was performed using measured flow data reliably obtained through the stage-discharge rating curve. The distribution parameters were estimated by Bayesian inference, and the uncertainty of the flood discharge for each return period was quantified. The resulting design flood was confirmed to be close to that calculated by a rainfall-runoff model driven by long-term rainfall data. We conclude that long-term measured flow data obtained through hydrological surveys enable hydrological analysis from various perspectives.

A Study on Stock Assessment of Japanese Flying Squid (Todarodes pacificus) in Korea·China·Japan Waters (한·중·일 해역의 살오징어(Todarodes pacificus) 자원평가 연구)

  • Sungsu Lim;Do-Hoon Kim;Jae-Beum Hong
    • Environmental and Resource Economics Review
    • /
    • v.31 no.4
    • /
    • pp.451-480
    • /
    • 2022
  • The Japanese flying squid (Todarodes pacificus) is a commercially important species in South Korea and the most popular species among consumers. However, commercial catches of Japanese flying squid have been declining since 2000. In this study, we conducted a stock assessment to identify the stock status. This study differed from previous studies in two respects: a greater amount of available fishing effort data was used, and data from China, Japan, and Korea were included. A CMSY (catch-maximum sustainable yield) model with a Bayesian state-space implementation of the Schaefer model (BSS) was used to estimate MSY, biomass, and exploitation, and the stock was evaluated in two groups, 'Korea' and 'Korea·China·Japan'. In all cases, Japanese flying squid biomass showed a general decreasing trend, and the biomass estimated for 2020 was lower than the biomass needed to achieve the maximum sustainable yield. To manage Japanese flying squid effectively, it is necessary to strengthen the resource management strategies of individual countries and prepare a cooperative plan among the countries.
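The Schaefer surplus-production dynamics that the CMSY/BSS approach is built on can be sketched directly: biomass grows logistically and is reduced by catch, and MSY follows in closed form. The parameter values below are purely illustrative, not estimates from the paper.

```python
# Schaefer surplus-production model underlying the CMSY/BSS approach.

def schaefer_step(b, r, k, catch):
    # B_{t+1} = B_t + r * B_t * (1 - B_t / k) - C_t
    return b + r * b * (1 - b / k) - catch

def msy(r, k):
    # Maximum sustainable yield of the Schaefer model: MSY = r * k / 4,
    # taken at B_MSY = k / 2.
    return r * k / 4

r, k = 0.6, 1000.0       # illustrative intrinsic growth rate and carrying capacity
b = k / 2                # biomass at B_MSY
sustainable = msy(r, k)
b_next = schaefer_step(b, r, k, sustainable)  # catching exactly MSY holds B at B_MSY
```

The Bayesian state-space layer of BSS places priors on r and k and treats observed catch and abundance indices as noisy observations of these dynamics; the abstract's finding that 2020 biomass is below B_MSY means b fell below k / 2 in this parameterization.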

Internal Property and Stochastic Deterioration Modeling of Total Pavement Condition Index for Transportation Asset Management (도로자산관리를 위한 포장종합평가지수의 속성과 변화과정의 모델링)

  • HAN, Daeseok;DO, Myungsik;KIM, Booil
    • International Journal of Highway Engineering
    • /
    • v.19 no.5
    • /
    • pp.1-11
    • /
    • 2017
  • PURPOSES : This study aims to develop a stochastic pavement deterioration forecasting model using the National Highway Pavement Condition Index (NHPCI) to support infrastructure asset management. Using this model, the deterioration process was estimated in terms of life expectancy, changes in deterioration speed, and reliability. METHODS : Eight years of Long-Term Pavement Performance (LTPP) data fused with traffic loads (Equivalent Single Axle Loads; ESAL) and structural capacity (Structural Number of Pavement; SNP) were used for the deterioration modeling. As an ideal stochastic model for asset management, a Bayesian Markov multi-state exponential hazard model was introduced. RESULTS : The interval of the NHPCI was empirically distributed from 8 to 2, and estimation functions linking the individual condition indices (crack, rutting, and IRI) to the NHPCI were suggested. The derived deterioration curve shows that the life expectancy at the preventive maintenance level was 8.34 years, while the general life expectancy was 12.77 years, lying in the interval of 11.10-15.58 years at a 95.5% reliability level. CONCLUSIONS : This study contributes a simple way to develop a pavement deterioration model using a total condition index that reflects road-user satisfaction. The definition of the level-of-service system and the corresponding life expectancies are useful for building long-term maintenance plans, especially in Life Cycle Cost Analysis (LCCA) work.
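The core of a Markov exponential hazard model like the one named above can be sketched simply: each condition state i has a hazard rate θ_i, the expected sojourn time in state i is 1/θ_i, and life expectancy to a target state is the sum of expected sojourn times. The hazard rates below are illustrative placeholders, not the paper's estimates.

```python
import math

def survival_prob(theta, z):
    # P(pavement is still in the state after elapsed time z)
    # under an exponential hazard with rate theta.
    return math.exp(-theta * z)

def life_expectancy(thetas):
    # Expected years to pass through all states = sum of mean sojourn times 1/theta_i.
    return sum(1.0 / th for th in thetas)

thetas = [0.5, 0.4, 0.25]                 # illustrative hazard rates per condition state
expected_years = life_expectancy(thetas)  # 2.0 + 2.5 + 4.0 = 8.5 years
```

The Bayesian layer of the paper's model places posteriors on the θ_i given inspection data, which is what yields the reliability interval (e.g. 11.10-15.58 years) around the life-expectancy estimate.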

Model selection algorithm in Gaussian process regression for computer experiments

  • Lee, Youngsaeng;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.4
    • /
    • pp.383-396
    • /
    • 2017
  • Our approach assumes that computer responses are a realization of a Gaussian process superimposed on a regression model, called a Gaussian process regression model (GPRM). Selecting a subset of variables, or building a good reduced model, is an important step in classical regression for identifying variables influential on the response and for further analysis such as prediction or classification. One reason to select variables for prediction is to prevent over-fitting or under-fitting the data. The same reasoning applies to GPRMs; however, only a few studies have addressed variable selection in GPRMs. In this paper, we propose a new algorithm for building a good prediction model among candidate GPRMs. It is a follow-up to the algorithm incorporating the Welch method suggested by previous researchers. The proposed algorithm selects the non-zero regression coefficients (β's) using forward and backward methods along with a Lasso-guided approach; during this process, the covariance parameters (θ's), pre-selected by the Welch algorithm, are held fixed. We illustrate the superiority of the proposed models over the Welch method and non-selection models using four test functions and one real data example. Future extensions are also discussed.
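The greedy forward step described above can be sketched generically: repeatedly add the candidate coefficient that most improves a model-quality criterion, stopping when no addition helps. This is a hedged stand-in, with the GPRM fitting abstracted into a caller-supplied `score` function (lower is better); it is not the authors' algorithm.

```python
# Greedy forward selection over candidate regression coefficients.
# `score(subset)` is any model-quality criterion (lower is better), e.g. a
# cross-validated prediction error of the GPRM fitted with that subset of
# beta's while the theta's stay fixed.

def forward_select(candidates, score):
    selected, best = [], score([])
    improved = True
    while improved:
        improved = False
        for c in candidates:
            if c in selected:
                continue
            s = score(selected + [c])
            if s < best:
                best, pick, improved = s, c, True
        if improved:
            selected.append(pick)  # add the single best-improving candidate
    return selected

# Toy criterion: distance from a "true" active set {a, c}.
chosen = forward_select(["a", "b", "c"],
                        lambda sel: len(set(sel) ^ {"a", "c"}))
```

The backward step is symmetric (repeatedly drop the coefficient whose removal most improves the score), and the Lasso-guided variant would restrict `candidates` to the Lasso path's active set.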

Model selection for unstable AR process via the adaptive LASSO (비정상 자기회귀모형에서의 벌점화 추정 기법에 대한 연구)

  • Na, Okyoung
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.6
    • /
    • pp.909-922
    • /
    • 2019
  • In this paper, we study the adaptive least absolute shrinkage and selection operator (LASSO) for unstable autoregressive (AR) models. To identify the existence of a unit root, we apply the adaptive LASSO to the augmented Dickey-Fuller regression model rather than to the original AR model. We illustrate our method with simulations and a real data analysis. Simulation results show that the adaptive LASSO obtained by minimizing the Bayesian information criterion selects both the order of the autoregressive model and the degree of differencing with high accuracy.
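The augmented Dickey-Fuller (ADF) regression the abstract refers to regresses the differenced series on the lagged level and lagged differences, Δy_t = ρ·y_{t-1} + Σ φ_j·Δy_{t-j} + ε_t; a zero ρ indicates a unit root. The sketch below only builds that regression design; the adaptive-LASSO fit itself (which would shrink ρ and the φ_j) is omitted, and the layout is an assumption, not the paper's code.

```python
# Build the ADF regression design: response dy_t, regressors
# (y_{t-1}, dy_{t-1}, ..., dy_{t-p}). The adaptive LASSO is then applied
# to this design; a zero coefficient on y_{t-1} signals a unit root.

def adf_design(y, p):
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # first differences
    rows, resp = [], []
    for t in range(p, len(dy)):
        rows.append([y[t]] + [dy[t - j] for j in range(1, p + 1)])
        resp.append(dy[t])
    return rows, resp

# Random-walk-like toy series: constant differences, so every row repeats.
rows, resp = adf_design([1, 2, 3, 4, 5, 6], p=1)
```

With this design in hand, the degree of differencing is read off from whether the LASSO zeroes out the coefficient on the lagged level, and the AR order from which lagged-difference coefficients survive.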