• Title/Summary/Keyword: Binary independence model (이진독립모형)

Application of the 2-Poisson Model to Full-Text Information Retrieval System (2-포아송 모형의 전문검색시스템 응용에 관한 연구)

  • 문성빈
    • Journal of the Korean Society for Information Management / v.16 no.3 / pp.49-63 / 1999
  • This study investigates whether query terms follow the 2-Poisson distribution in documents represented by abstract and title or by full text. Retrieval experiments with two probabilistic models, the binary independence model and the 2-Poisson independence model, were conducted to see whether the 2-Poisson distribution of query terms affects retrieval effectiveness, particularly in full-text information retrieval systems.

Semiparametric Approach to Logistic Model with Random Intercept (준모수적 방법을 이용한 랜덤 절편 로지스틱 모형 분석)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1121-1131 / 2015
  • Logistic models with a random intercept are useful for analyzing longitudinal binary data. Traditionally, the random intercept is assumed to follow a parametric distribution (such as the normal distribution) and to be independent of the covariates. These assumptions are strong and restrictive when applied to real data. Recently, Garcia and Ma (2015) derived semiparametric efficient estimators for the logistic model with a random intercept that do not require these assumptions. Their estimator is consistent even when no parametric form is assumed for the random intercept, and the method is computationally simple. In this paper, we apply this method to toenail infection data and compare the semiparametric estimator with the maximum likelihood estimator, the penalized quasi-likelihood estimator, and the hierarchical generalized linear estimator.

Enhancing performance of full-text retrieval systems using relevance feedback (적합성피이드백을 이용한 전문검색시스템의 검색효율성 증진을 위한 연구)

  • 문성빈
    • Journal of the Korean Society for Information Management / v.10 no.2 / pp.43-67 / 1993
  • The primary purpose of the study is to improve the low precision often found in full-text retrieval systems. In order to enhance the low precision of full-text retrieval while retaining its high recall, relevance feedback mechanisms based on probabilistic retrieval models (binary independence and two-Poisson independence models) were employed. This paper investigates the effect of relevance feedback on the performance of full-text retrieval systems.
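
The abstract does not say how its relevance feedback weights were computed. As an illustration only, the sketch below uses the textbook Robertson/Sparck Jones relevance weight that is commonly paired with the binary independence model. The function names `rsj_weight` and `bim_score`, the 0.5 smoothing constant, and the example counts are assumptions for this sketch and are not taken from the paper.

```python
import math


def rsj_weight(N, n, R, r):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing (assumed form).

    N: documents in the collection
    n: documents containing the term
    R: documents judged relevant so far (the feedback set)
    r: judged-relevant documents containing the term
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_rest = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_rest)


def bim_score(doc_terms, query_terms, weights):
    """Binary independence score: sum the weights of query terms present in the document."""
    return sum(weights[t] for t in query_terms if t in doc_terms)


# Hypothetical counts: one term concentrated in the relevant set, one spread across the collection.
weights = {
    "retrieval": rsj_weight(N=100, n=10, R=5, r=4),
    "system": rsj_weight(N=100, n=50, R=5, r=1),
}
score = bim_score({"retrieval", "precision"}, ["retrieval", "system"], weights)
```

With no relevance judgments (R = 0, r = 0) the weight reduces to an inverse-document-frequency-like value, and each round of feedback updates R and r to re-weight the query terms.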

Introduction to the Indian Buffet Process: Theory and Applications (인도부페 프로세스의 소개: 이론과 응용)

  • Lee, Youngseon;Lee, Kyoungjae;Lee, Kwangmin;Lee, Jaeyong;Seo, Jinwook
    • The Korean Journal of Applied Statistics / v.28 no.2 / pp.251-267 / 2015
  • The Indian Buffet Process is a stochastic process on equivalence classes of binary matrices with a finite number of rows and an infinite number of columns. It can be used as the prior distribution on the binary matrix in an infinite feature model. We describe the derivation of the Indian Buffet Process from a finite feature model and briefly explain the relation between the Indian Buffet Process and the beta process. Using a Gaussian linear model, we describe three algorithms (Gibbs sampling, stick-breaking, and a variational method) and apply them to finding features in image data. We also illustrate the use of the Indian Buffet Process in various types of analysis, such as dyadic data analysis, network data analysis, and independent component analysis.

Development of Soil Erosion Analysis Systems Based on Cloud and HyGIS (클라우드 및 HyGIS기반 토양유실분석 시스템 개발)

  • Kim, Joo-Hun;Kim, Kyung-Tak;Lee, Jin-Won
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.4 / pp.63-76 / 2011
  • This study develops a model for analyzing soil loss as part of estimating disaster impact in advance. The soil loss analysis system is developed as an Internet-based system that introduces cloud computing, and also as a standalone type linked with HyGIS. The system lets users draw a soil loss distribution chart without a software license and without preparing basic data such as a DEM, soil map, and land cover map. It also lets users draw the chart while applying various factors, such as direct rain factors. The Soil Loss Analysis Model tools linked with HyGIS are developed as an add-on to GMMap2009 in GEOMania and can also draw the Soil Loss Hazard Map suggested by the OECD. Both models proved convenient for analyzing soil loss. They can be improved further through research on analyzing sediment at a watershed outlet and on calculating the R value using data from many rain stations.

A Study on Time Series Analysis of Membrane Fouling by using Genetic Algorithm in the Field Plant (유전자알고리즘을 이용한 막오염 시계열 예측 연구)

  • Lee, Jin Sook;Kim, Jun Hyun;Jun, Yong Seong;Kwak, Young Ju;Lee, Jin Hyo
    • Journal of Korean Society of Environmental Engineers / v.38 no.8 / pp.444-451 / 2016
  • Most past research on membrane fouling models is based on theoretical equations from lab-scale experiments. Such studies are hardly applicable to full-scale sites, where processes such as filtration, backwash, and drain run in sequence. This study was conducted on a submerged membrane system that operates in an automatic sequence and treats wastewater from the G-water purification plant in Incheon. TMP was designated as the fouling indicator under constant-flux conditions. With total inflow volume and SS concentration as the independent variables representing the major operating parameters, time-series analysis and prediction of TMP were conducted, and the similarity between simulated and measured values was assessed. The final prediction model obtained with a genetic algorithm was found to be fully applicable, because the simulated values reproduced both the pulse-shaped periodicity and the increasing trend over time. In two validation runs, the correlation coefficients between simulated and measured data were $r^2=0.721$ and $r^2=0.928$, respectively. Although this study was limited to summer-season data, a larger amount of data would improve the reliability of the prediction model. If a simulator for short-range forecasting can be developed and applied, the TMP prediction technique will be a great help to energy-efficient operation.

Revenue Change by Peak Hour Fare Imposition for Senior Free Ride : Using Seoul Metropolitan Subway Smart Card Data (노인무임승차 첨두시 요금부과에 따른 수입금 변화 : 수도권 스마트카드자료를 이용하여)

  • Seongil Shin;Jinhak Lee;Hasik Lee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.22 no.2 / pp.1-14 / 2023
  • This study derives quantitative data on how much the fiscal deficit of subway operating agencies could be reduced by charging fares for elderly free rides on metropolitan subways during peak periods. Smart card data record every trip made by elderly passengers, but not the fares. A methodology is therefore required for estimating the fares of elderly passengers and distributing them to the subway operating agencies as income. This study builds a simultaneous dynamic traffic allocation model that reflects the assumption that elderly passengers select the minimum-time route based on their departure time. The travel routes of the elderly are estimated, the distance-proportional fare charged to them is calculated from those routes, and the fare is distributed according to the connected railway revenue allocation principle of the metropolitan subway operating agencies. A case study of 2019 and 2020, before and after COVID-19, indicates that Seoul Metro's annual free-ride loss of 360 billion won could be reduced by 6~8% by charging at the morning peak (07:00-08:59), and by 13~16% by charging at both the morning and afternoon (18:00-19:59) peaks.

A Study on the Antecedent Factors of Performance and Sustainability of Social Enterprises (사회적기업의 성과와 지속가능성의 성공요인에 관한 연구)

  • Lee, Jin-Min;Lee, Sang-Shik
    • Journal of Korea Society of Industrial Information Systems / v.22 no.2 / pp.123-142 / 2017
  • After the enactment of support for social enterprises in 2007, the cultivation of social enterprises has been promoted in earnest. This study examines the factors that affect the sustainability of social enterprises, which is the current policy issue regarding them. For this purpose, the study developed a research model with the antecedent factors of social enterprises (strategy, managerial capability, business environment, and social entrepreneurship) as independent variables, performance as the mediating variable, and sustainability as the dependent variable. Using this model, the study established hypotheses on performance and on the antecedent factors of the sustainability of social enterprises. According to the hypothesis testing results, economic performance showed a partial mediating effect on the impact of strategy, managerial capability, business environment, and social entrepreneurship on sustainability. Social performance partially mediated the impact of strategy, managerial capability, and social entrepreneurship on sustainability. Social performance did not, however, show a mediating effect on the impact of business environment on sustainability.

A Study on Forecasting Accuracy Improvement of Case Based Reasoning Approach Using Fuzzy Relation (퍼지 관계를 활용한 사례기반추론 예측 정확성 향상에 관한 연구)

  • Lee, In-Ho;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.67-84 / 2010
  • In business, forecasting is the work of estimating what is expected to happen in the future in order to make managerial decisions and plans. Accurate forecasting is therefore very important for major managerial decisions and is the basis for business strategy. However, it is very difficult to obtain unbiased and consistent estimates because of the uncertainty and complexity of the future business environment. That is why a scientific forecasting model should be used to support business decision making, and why effort should go into minimizing the model's forecasting error, the difference between the observation and the estimate. Minimizing that error is not an easy task. Case-based reasoning is a problem-solving method that uses similar past cases to solve the current problem. To build a successful case-based reasoning model, it is very important to retrieve not only the most similar case but also the most relevant one. Measuring the similarity between cases is therefore a key factor, and it is harder to measure distances when the cases contain symbolic data. The purpose of this study is to improve the forecasting accuracy of the case-based reasoning approach using fuzzy relation and composition. Two methods are adopted to measure the similarity between cases containing symbolic data: one derives the similarity matrix from binary logic (a judgment of whether two symbolic values are the same), and the other derives it from fuzzy relation and composition. The study proceeds in the following order: data gathering and preprocessing, model building and analysis, validation analysis, and conclusion. First, in data gathering and preprocessing, we collect a data set with a categorical dependent variable. The data set is cross-sectional, and its independent variables include several qualitative variables expressed as symbolic data. The research data consist of financial ratios and the corresponding bond ratings of Korean companies. The ratings used in this study cover all bonds rated by one of the bond rating agencies in Korea. The total sample includes 1,816 companies whose commercial papers were rated in the period 1997~2000. Credit grades are defined as outputs and classified into 5 rating categories (A1, A2, A3, B, C) according to credit level. Second, in model building and analysis, we derive the similarity matrix using binary logic and fuzzy composition to measure the similarity between cases containing symbolic data. The types of fuzzy composition used are max-min, max-product, and max-average. The analysis is then carried out by the case-based reasoning approach with the derived similarity matrix. Third, in validation analysis, we verify the validity of the model through the McNemar test based on hit ratio. Finally, we draw conclusions from the study. The similarity measure based on fuzzy relation and composition shows good forecasting performance compared with the measure based on binary logic. However, the differences in forecasting performance among the types of fuzzy composition are not statistically significant. The contribution of this study is as follows. We propose a methodology in which fuzzy relation and fuzzy composition are applied to measuring the similarity between two symbolic data, which is the most important factor in building a case-based reasoning model.
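
The three composition types named in the abstract have standard definitions, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `compose`, the nested-list representation, and the example membership values are assumptions, and the abstract does not say how the paper builds its fuzzy relations from the bond-rating data.

```python
def compose(r, s, mode="max-min"):
    """Compose fuzzy relations r (n x m) and s (m x p).

    Both relations are nested lists of membership degrees in [0, 1].
    Entry (i, k) of the result is the maximum over j of combine(r[i][j], s[j][k]),
    where combine is min, product, or arithmetic mean depending on the mode.
    """
    combine = {
        "max-min": min,
        "max-product": lambda a, b: a * b,
        "max-average": lambda a, b: (a + b) / 2,
    }[mode]
    columns = list(zip(*s))  # columns of s, so each can be paired with a row of r
    return [
        [max(combine(a, b) for a, b in zip(row, col)) for col in columns]
        for row in r
    ]


# Hypothetical membership values chosen only to show the three modes side by side.
r = [[0.25, 0.75], [1.0, 0.5]]
s = [[0.5, 0.25], [0.25, 1.0]]
max_min = compose(r, s, "max-min")
max_product = compose(r, s, "max-product")
max_average = compose(r, s, "max-average")
```

All three modes take the maximum over the shared index and differ only in how a pair of memberships is combined, which is why they can be swapped through a single `mode` argument.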

Estimating design floods for ungauged basins in the Geum-river basin through regional flood frequency analysis using L-moments method (L-모멘트법을 이용한 지역홍수빈도분석을 통한 금강유역 미계측 유역의 설계홍수량 산정)

  • Lee, Jin-Young;Park, Dong-Hyeok;Shin, Ji-Yae;Kim, Tae-Woong
    • Journal of Korea Water Resources Association / v.49 no.8 / pp.645-656 / 2016
  • The study performed a regional flood frequency analysis and proposed a regression equation to estimate design floods corresponding to return periods for ungauged basins in Geum-river basin. Five preliminary tests were employed to investigate hydrological independence and homogeneity of streamflow data, i.e. the lag-one autocorrelation test, time homogeneity test, Grubbs-Beck outlier test, discordancy measure test ($D_i$), and regional homogeneity measure (H). The test results showed that streamflow data were time-independent, discordant and homogeneous within the basin. Using five probability distributions (generalized extreme value (GEV), three-parameter log-normal (LN-III), Pearson type 3 (P-III), generalized logistic (GLO), generalized Pareto (GPA)), comparative regional flood frequency analyses were carried out for the region. Based on the L-moment ratio diagram, average weighted distance (AWD) and goodness-of-fit statistics ($Z^{DIST}$), the GLO distribution was selected as the best fit model for Geum-river basin. Using the GLO, a regression equation was developed for estimating regional design floods, and validated by comparing the estimated and observed streamflows at the Ganggyeong station.