• Title/Summary/Keyword: nearest-neighbor analysis

Analysis of Temporal and Spatial Distribution of Traffic Accidents in Jinju (진주시 교통사고의 시계열적 공간분포특성 분석)

  • Sung, Byeong Jun;Bae, Gyu Han;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.23 no.2 / pp.3-9 / 2015
  • Changes in urban land use generate traffic volume, which is closely related to traffic accidents. An analysis of the causes of traffic accidents is therefore essential to establishing measures to reduce them. This study used the nearest neighbor index to analyze the clustering of traffic accidents in commercial and residential zones, based on five years of traffic accident data (2009-2013) for Jinju-si, a small-to-medium-sized local city. The results are as follows. The frequency of traffic accidents was highest in spring and lowest in winter. Traffic accidents were more strongly clustered at nighttime than in the daytime. In terms of clustering by land use, seasonal changes were not significant in commercial areas, while clustering density tended to be significantly lower in winter in residential areas. By accident type, side right-angle collisions between cars were the most frequent and were widespread in both commercial and residential areas. These results provide important information for identifying the occurrence pattern of traffic accidents within the urban spatial structure, and they are expected to be useful in establishing measures to reduce traffic accidents.
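The abstract does not give the formula behind its nearest neighbor index. A minimal sketch, assuming the standard Clark-Evans ratio with no edge correction, is shown below; the function name `nearest_neighbor_index` and the sample coordinates are illustrative and not taken from the paper. Values below 1 suggest clustering, values near 1 a random pattern, and values above 1 dispersion.

```python
import math

def nearest_neighbor_index(points, area):
    """Clark-Evans ratio (assumed form): observed mean nearest-neighbor
    distance divided by the mean expected under complete spatial randomness.
    No edge correction is applied."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    # Observed: mean distance from each point to its closest other point.
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected under randomness: 1 / (2 * sqrt(point density)).
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical accident locations in a 10 x 10 study area.
clustered = [(0, 0), (0, 1), (1, 0), (1, 1)]
dispersed = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]
nni_clustered = nearest_neighbor_index(clustered, area=100.0)
nni_dispersed = nearest_neighbor_index(dispersed, area=100.0)
```

Comparing the index across subsets of the data (for example day versus night, or by season) is one way to express statements such as "clustering was stronger at nighttime" numerically.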

Analysis of Reading Domain of Men and Women Elderly Using Book Lending Data (도서 대출데이터를 활용한 남녀 노령자의 독서 주제 분석)

  • Cho, Jane
    • Journal of Korean Library and Information Science Society / v.50 no.1 / pp.23-41 / 2019
  • This study identifies the subject domains of books read by elderly men and women through PFNET analysis of library big data, and examines how they differ from those of adults aged 30-40. Co-occurrence matrices of book loans were extracted from the popular book lists in library big data for four groups: elderly men, elderly women, adult men, and adult women. PFNET analysis was performed on these matrices. Pearson correlation analysis based on the triangle betweenness centrality calculated for the loaned books was then performed to understand the correlation among the four clusters created by the PNNC algorithm. The results show that the elderly, regardless of gender, concentrate their reading on modern Korean novels; elderly men in particular show an extreme concentration on long modern Korean novel series. In the correlation analysis, elderly men showed a weak negative correlation with adult men (r = -0.222), and their correlations with all the other groups were also negative, indicating that the borrowing tendency of elderly men runs opposite to that of the other groups.

A Study on Research Trends of Library Science and Information Science Through Analyzing Subject Headings of Doctoral Dissertations Recently Published in the U.S. (학위논문 분석을 통한 미국 도서관학 및 정보과학 최근 연구 동향에 관한 연구)

  • Kim, Hyunjung
    • Journal of the Korean Society for Information Management / v.35 no.3 / pp.11-39 / 2018
  • The study examines the research trends of doctoral dissertations in Library Science and Information Science published in the U.S. over the last 5 years. Data collected from PQDT Global include 1,016 doctoral dissertations containing "Library Science" or "Information Science" as subject headings, and keywords extracted from those dissertations were used for a network analysis, which helps identify the intellectual structure of the dissertations. The analysis of 103 subject heading keywords yielded various centrality measures, including triangle betweenness centrality and nearest neighbor centrality, as well as 26 clusters of associated subject headings. The most frequently studied subjects are computer-related, education-related, and communication-related. A cluster with information science as its most central subject contains most of the computer-related keywords, while a cluster with library science as its most central subject contains many of the education-related keywords. Other related subjects include various user groups for user studies, and subjects related to information systems such as management, economics, geography, and biomedical engineering.

Exploratory Research on Automating the Analysis of Scientific Argumentation Using Machine Learning (머신 러닝을 활용한 과학 논변 구성 요소 코딩 자동화 가능성 탐색 연구)

  • Lee, Gyeong-Geon;Ha, Heesoo;Hong, Hun-Gi;Kim, Heui-Baik
    • Journal of The Korean Association For Science Education / v.38 no.2 / pp.219-234 / 2018
  • In this study, we explored the possibility of automating the analysis of elements of scientific argument in the context of a Korean classroom. To gather training data, we collected 990 sentences from science education journals that illustrate the results of coding elements of argumentation according to Toulmin's argumentation structure framework. We extracted 483 sentences as a test data set from transcripts of students' discourse in scientific argumentation activities. The words and morphemes of each argument were analyzed using the Python 'KoNLPy' package and the 'Kkma' module for Korean natural language processing. After constructing the 'argument-morpheme:class' matrix for the 1,473 sentences, five machine learning techniques were applied to generate predictive models relating each sentence to the element of argument to which it corresponded. The accuracy of the predictive models was investigated by comparing their output with the results of pre-coding by researchers and confirming the degree of agreement. The predictive model generated by the k-nearest neighbor algorithm (KNN) demonstrated the highest degree of agreement [54.04% ($\kappa = 0.22$)] when machine learning was performed on the morphemes of each sentence. The KNN model exhibited higher agreement [55.07% ($\kappa = 0.24$)] when the coding results of the previous sentence were added to the prediction process. This result indicates the importance of considering the context of discourse by reflecting the codes of previous sentences in the analysis. The results are significant in that they show the possibility of automating the analysis of students' argumentation activities in Korean by applying machine learning.
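The agreement figures above pair a raw percentage with a kappa value. As a minimal sketch of how the two relate, assuming the statistic is Cohen's kappa (the abstract does not name the variant), the function below corrects observed agreement for the agreement expected by chance. The label sequences are made up for illustration and are not data from the study.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two label sequences of equal length:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(coder_a)
    if n == 0 or n != len(coder_b):
        raise ValueError("sequences must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same label
    # if each labeled independently with their own label frequencies.
    chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - chance) / (1.0 - chance)

# Hypothetical codes: researcher pre-coding versus model predictions.
researcher = ["claim", "claim", "data", "data"]
model = ["claim", "data", "data", "data"]
kappa = cohen_kappa(researcher, model)
```

In this toy case the raw agreement is 75% but kappa is lower, which is the same pattern as the reported 54.04% agreement with a kappa of 0.22.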

An Analysis of Policy Effects of Export Infrastructure Strengthening Program on Export of Food Distribution Companies (수출인프라강화사업이 식품유통기업 수출에 미치는 정책효과 분석)

  • Huang, Seong-Hyuk;Ji, Seong-Tae
    • Journal of Distribution Science / v.16 no.1 / pp.87-99 / 2018
  • Purpose - The Export Infrastructure Strengthening Program (EISP) is a project to expand exports of agri-food products by providing customized export information to food distribution companies and supporting overseas information activities. A total of 39.6 billion won had been provided by 2016. The purpose of this study is to analyze whether the EISP is effective in expanding exports of agri-food products. Research design, data, and methodology - A simple average difference between the export performance of policy beneficiaries and non-beneficiaries can be biased if the export capacity or inherent characteristics of the enterprises are not taken into consideration. To address this bias, the propensity score matching (PSM) method was employed in this study. PSM converts the characteristics of an export company into an index through logit analysis, reducing the matching to one dimension to improve the accuracy of the performance measurement. Results - A balancing test was conducted to determine how closely the characteristics of the policy beneficiary group and the matched non-beneficiary group corresponded to each other. The test could not reject the null hypothesis that there was no difference between the two groups, so after matching the two groups were similar and the explanatory variables were well controlled. Using nearest neighbor matching on propensity scores estimated through logit analysis, we estimated the average treatment effect on the treated (ATT). Participation in the EISP increased the exports of food companies by $5.88 million. In addition, the number of export contracts increased by 11.77, the number of exporting countries by 7.52, the number of export items by 47.51, and the number of buyers' consultations by 3.50. Overseas marketing expenses increased by 35.92 million won.
All export performance measures except the number of export contracts were statistically significant. Conclusions - As the EISP has a positive effect on the expansion of agri-food exports, efforts should be made to identify the limitations or problems of the policy in the future and to make a greater contribution to the increase of exports.
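The matching step described in the results can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes the propensity scores have already been estimated (the logit step is omitted), it matches 1:1 with replacement and without a caliper, and the function name `att_nearest_neighbor` and the numbers are hypothetical.

```python
def att_nearest_neighbor(treated, controls):
    """Estimate the average treatment effect on the treated (ATT).

    treated, controls: lists of (propensity_score, outcome) pairs.
    Each treated unit is matched to the control unit with the closest
    propensity score (with replacement); the ATT is the mean difference
    between each treated outcome and its matched control outcome.
    """
    if not treated or not controls:
        raise ValueError("both groups must be non-empty")
    differences = []
    for score, outcome in treated:
        _, matched_outcome = min(controls, key=lambda unit: abs(unit[0] - score))
        differences.append(outcome - matched_outcome)
    return sum(differences) / len(differences)

# Hypothetical firms: (propensity score, exports in million dollars).
beneficiaries = [(0.8, 10.0), (0.6, 7.0)]
non_beneficiaries = [(0.75, 6.0), (0.55, 5.0), (0.2, 1.0)]
att = att_nearest_neighbor(beneficiaries, non_beneficiaries)
```

Because matching happens on a single score rather than on every firm characteristic, the control with score 0.2 is never used here, which is the sense in which PSM reduces the matching to one dimension.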

Statistical Analysis of Projection-Based Face Recognition Algorithms (투사에 기초한 얼굴 인식 알고리즘들의 통계적 분석)

  • 문현준;백순화;전병민
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.5A / pp.717-725 / 2000
  • Within the last several years, a large number of algorithms have been developed for face recognition. The majority of these have been view- and projection-based algorithms. Our definition of projection is not restricted to projecting the image onto an orthogonal basis; the definition is expansive and includes a general class of linear transformations of the image pixel values. The class includes correlation, principal component analysis, clustering, gray scale projection, and matching pursuit filters. In this paper, we perform a detailed analysis of this class of algorithms by evaluating them on the FERET database of facial images. In our experiments, a projection-based algorithm consists of three steps. The first step is done off-line and determines the new basis for the images. The basis is either set by the algorithm designer or learned from a training set. The last two steps are on-line and perform the recognition. The second step projects an image onto the new basis, and the third step recognizes a face with a nearest neighbor classifier. The classification is performed in the projection space. Most evaluation methods report algorithm performance on a single gallery, which does not fully capture algorithm performance. In our study, we construct a set of independent galleries. This allows us to see how individual algorithm performance varies over different galleries. In addition, we report on the relative performance of the algorithms over the different galleries.
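The two on-line steps described above (project, then classify with a nearest neighbor rule in the projection space) can be sketched in a few lines. This is a toy illustration under stated assumptions: the basis is given rather than learned, images are short lists of numbers, Euclidean distance is used, and the names `project` and `recognize` are hypothetical.

```python
import math

def project(image, basis):
    """Linear projection: one coefficient per basis vector (dot product)."""
    return [sum(b * x for b, x in zip(vector, image)) for vector in basis]

def recognize(probe, gallery, basis):
    """Return the gallery label whose projected image lies closest to the
    projected probe. gallery maps label -> image vector."""
    projected_probe = project(probe, basis)
    return min(
        gallery,
        key=lambda label: math.dist(projected_probe, project(gallery[label], basis)),
    )

# A basis that keeps only the first two of four pixel values.
basis = [[1, 0, 0, 0], [0, 1, 0, 0]]
gallery = {"a": [1, 0, 9, 9], "b": [0, 1, 0, 0]}
probe = [0.9, 0.1, 0, 0]
label = recognize(probe, gallery, basis)
```

In the original pixel space the probe is closer to "b", but after projection it is closer to "a", which shows why it matters that classification is performed in the projection space.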

Comparison of Inflammatory Markers Changes in Patients Who Used Postoperative Prophylactic Antibiotics within 24 Hours after Spine Surgery and 5 Days after Spine Surgery

  • Youn, Gun;Choi, Man Kyu;Kim, Sung Bum
    • Journal of Korean Neurosurgical Society / v.65 no.6 / pp.834-840 / 2022
  • Objective : C-reactive protein (CRP) level, erythrocyte sedimentation rate (ESR), and white blood cell (WBC) count are inflammatory markers used to evaluate postoperative infections. Although these markers are non-specific, understanding their normal kinetics after surgery may be helpful in the early detection of postoperative infections. To complement the recent trend of reducing the duration of antibiotic use, this retrospective study investigated the inflammatory markers of patients who had received antibiotics within 24 hours after surgery according to the Health Insurance Review & Assessment Service guidelines and compared them with those of patients who had received antibiotics for 5 days, which was proven to be non-infectious. Methods : We enrolled 74 patients, divided into two groups. Patients underwent posterior lumbar interbody fusion (PLIF) at a single institution between 2019 and 2020. Group A included 37 patients who received antibiotics within 24 hours after the PLIF procedure, and group B comprised 37 patients who had used antibiotics for 5 days. A 1:1 nearest-neighbor propensity-matched analysis was used. The clinical variables included age, sex, medical history, body mass index, estimated blood loss, and operation time. Laboratory data included CRP, ESR, and WBC, which were measured preoperatively and on postoperative days (POD) 1, 3, 5, and 7. Results : CRP dynamics tended to decrease after peaking on POD 3, with a similar trend in both groups. The average CRP level in group B was slightly higher than that in group A; however, the difference was not statistically significant. Multiple linear regression analysis revealed operation time, number of fused levels, and estimated blood loss as significant predictors of a greater CRP peak value (r²=0.473, p<0.001) in patients. No trend (a tendency to decrease from the peak value) could be determined for ESR and WBC count on POD 7.
Conclusion : Although slight differences were observed in numerical values and kinetics, sequential changes in inflammatory markers according to the duration of antibiotic administration showed similar patterns. Knowledge of CRP kinetics allows the assessment of the degree of difference between the clinical and expected values.

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1800 externally non-audited firms, of which 900 had filed for bankruptcy and 900 had not. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test with each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected by a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy on the latter portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models in terms of classification accuracy. The Q-statistic values and average classification accuracies of base classifiers were also investigated.
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
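The baseline that this abstract builds on, a KNN random subspace ensemble, can be sketched as below. This is a minimal illustration under stated assumptions, not the proposed method: k and the subset size are fixed and the feature subsets are drawn at random, whereas the paper optimizes both per base classifier with a genetic algorithm (not shown). All names and data are hypothetical.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, features, k):
    """Majority vote among the k training rows nearest to the query,
    measuring Euclidean distance on the given feature indices only.
    train: list of (feature_vector, label)."""
    def distance(row):
        vector = row[0]
        return math.dist([vector[f] for f in features], [query[f] for f in features])
    nearest = sorted(train, key=distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def random_subspace_knn(train, query, n_members=5, subset_size=2, k=3, seed=0):
    """Each ensemble member sees a random feature subset; member
    predictions are combined by majority vote."""
    rng = random.Random(seed)
    n_features = len(train[0][0])
    votes = []
    for _ in range(n_members):
        features = rng.sample(range(n_features), subset_size)
        votes.append(knn_predict(train, query, features, k))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical firms with three financial ratios; label 1 = bankrupt.
train = [
    ([0, 0, 0], 0), ([1, 0, 1], 0), ([0, 1, 1], 0),
    ([10, 10, 10], 1), ([9, 10, 9], 1), ([10, 9, 9], 1),
]
prediction_high = random_subspace_knn(train, [9, 9, 9])
prediction_low = random_subspace_knn(train, [1, 1, 0])
```

A genetic algorithm in the spirit of the abstract would treat each member's k and feature subset as part of a chromosome and use accuracy on held-out training data as the fitness value.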

kNN Query Processing Algorithm based on the Encrypted Index for Hiding Data Access Patterns (데이터 접근 패턴 은닉을 지원하는 암호화 인덱스 기반 kNN 질의처리 알고리즘)

  • Kim, Hyeong-Il;Kim, Hyeong-Jin;Shin, Youngsung;Chang, Jae-woo
    • Journal of KIISE / v.43 no.12 / pp.1437-1457 / 2016
  • In outsourced databases, the cloud provides an authorized user with querying services on the outsourced database. However, sensitive data, such as financial or medical records, should be encrypted before being outsourced to the cloud. Meanwhile, the k-Nearest Neighbor (kNN) query is a typical query type that is widely used in many fields, and the result of a kNN query is closely related to the interests and preferences of the user. Therefore, secure kNN query processing algorithms that preserve both data privacy and query privacy have been proposed. However, existing algorithms either suffer from high computation cost or leak data access patterns because retrieved index nodes and query results are disclosed. To solve these problems, in this paper we propose a new kNN query processing algorithm on the encrypted database. Our algorithm preserves both data privacy and query privacy. It also hides data access patterns while supporting efficient query processing. To achieve this, we devise an encrypted index search scheme that can perform data filtering without revealing data access patterns. Through performance analysis, we verify that our proposed algorithm outperforms the existing algorithms in terms of query processing time.

Simulation Study on E-commerce Recommender System by Use of LSI Method (LSI 기법을 이용한 전자상거래 추천자 시스템의 시뮬레이션 분석)

  • Kwon, Chi-Myung
    • Journal of the Korea Society for Simulation / v.15 no.3 / pp.23-30 / 2006
  • A recommender system for an e-commerce site receives information from customers about which products they are interested in, and recommends products that are likely to fit their needs. In this paper, we investigate several methods of analyzing large-scale product purchase data for the purpose of producing useful recommendations to customers. We apply the traditional data mining techniques of cluster analysis and collaborative filtering (CF), as well as CF with the product dimensionality reduced by latent semantic indexing (LSI). If the reduced product dimensionality obtained from LSI shows a latent trend in customers' purchases similar to that in the original customer-product purchase data, we expect that the lower computational effort needed to find the nearest neighbors of a target customer may improve the efficiency of recommendation. In simulation experiments on synthetic customer-product purchase data, the CF-based method with reduced product dimensionality performs better than the traditional CF methods with respect to recall, precision, and the F1 measure. In general, recommendation quality increases as the size of the neighborhood increases. However, our simulation results show that, after a certain point, the improvement diminishes. We also find that, as the number of recommended products increases, precision becomes worse, while the gain in recall is relatively small after a certain point. We consider that this information may be useful in applying recommender systems.
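The evaluation measures named above can be made concrete with a short sketch. It assumes the usual set-based definitions of precision, recall, and F1 for a list of recommended products; the function name and the product identifiers are illustrative only.

```python
def precision_recall_f1(recommended, purchased):
    """Set-based precision, recall, and F1 for one customer.
    recommended: products the system suggested.
    purchased: products the customer actually bought."""
    recommended, purchased = set(recommended), set(purchased)
    hits = len(recommended & purchased)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical top-4 recommendation for a customer who bought three items.
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
```

Lengthening the recommendation list adds items to the precision denominator whether or not they are hits, which is consistent with the reported drop in precision as more products are recommended.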
