• Title/Abstract/Keyword: nearest-neighbor analysis

254 search results

경항통 설문지를 이용한 한의학적 진단 및 분류체계에 관한 연구 (Research on Oriental Medicine Diagnosis and Classification System by Using Neck Pain Questionnaire)

  • 송인;이건목;홍권의
    • Journal of Acupuncture Research, Vol. 28, No. 3, pp. 85-100, 2011
  • Objectives : The purpose of this study is to support the preparation of oriental medicine clinical guidelines by drawing up standards for the oriental medicine pattern identification and diagnostic classification of neck pain. Methods : Using an oriental medicine diagnosis questionnaire, statistical analysis was performed on Gyeonghangtong(頸項痛), Nakchim(落枕), Sagyeong(斜頸), and Hanggang(項强), the categories into which experts classified neck pain patients by the Delphi method. The results were classified using linear discriminant analysis (LDA), diagonal linear discriminant analysis (DLDA), diagonal quadratic discriminant analysis (DQDA), K-nearest neighbor classification (KNN), classification and regression trees (CART), and support vector machines (SVM). Results : The results are summarized as follows. 1. LDA showed a hit rate of 84.47% against the original diagnosis. 2. A high hit rate was obtained when the test was run with three categories: a combined Gyeonghangtong and Hanggang category, the Sagyeong category, and the Nakchim category. 3. DLDA showed a hit rate of 58.25% and DQDA an accuracy of 57.28% against the original diagnosis. 4. KNN showed a hit rate of 69.90% against the original diagnosis. 5. CART showed a hit rate of 69.60% against the original diagnosis; when the test used only the eight significant questions selected by analysis of variance, the hit rate was 70.87%. 6. SVM showed a hit rate of 80.58% against the original diagnosis. Conclusions : Statistical analysis of neck pain using the oriental medicine diagnosis questionnaire generally produced significant results.
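The classifier comparison described in this abstract can be reproduced in outline with off-the-shelf tools. Below is a minimal sketch using scikit-learn, with synthetic stand-in data in place of the questionnaire responses; the class count, feature count, and 5-fold cross-validation are illustrative assumptions, not the study's protocol.

```python
# Sketch: comparing several classifiers by cross-validated hit rate (accuracy),
# analogous to the LDA / KNN / CART / SVM comparison described above.
# The data are synthetic stand-ins, not the neck-pain questionnaire.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

models = {
    "LDA":  LinearDiscriminantAnalysis(),
    "KNN":  KNeighborsClassifier(n_neighbors=5),
    "CART": DecisionTreeClassifier(random_state=0),
    "SVM":  SVC(kernel="rbf", C=1.0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:5s} hit rate: {scores.mean():.2%}")
```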

철원지역 두루미 취식지의 핵심지역 설정을 위한 MCP, 커널밀도측정법(KDE)과 국지근린지점외곽연결(LoCoH) 분석 (MCP, Kernel Density Estimation and LoCoH Analysis for the Core Area Zoning of the Red-crowned Crane's Feeding Habitat in Cheorwon, Korea)

  • 유승화;이기섭;박종화
    • 한국환경생태학회지, Vol. 27, No. 1, pp. 11-21, 2013
  • This study delineated the area of use and the core habitat of the Red-crowned Crane (Grus japonensis) within its distribution by applying three home-range analysis techniques, the minimum convex polygon (MCP), kernel density estimation (KDE), and the local convex hull (LoCoH) method, and examined the differences among the methods and their implications. The crane distribution data came from a survey of the Cheorwon area conducted on 17 February 2012. The habitat range estimated by MCP was 140 km². In the KDE analysis, density contours were generated with the bandwidth h set to 1,000 m, CVh, and LSCVh; the resulting core areas (kernel 50% or higher) were 33.3 km² (KDE, h = 1,000 m), 25.7 km² (KDE, CVh), and 19.7 km² (KDE, LSCVh). As the bandwidth decreased from the default value (1,000 m) to CVh (554.6 m) to LSCVh (329.9 m), the number of core patches increased, their total area decreased, and their shapes became more complex. CVh was judged to be the most suitable bandwidth for delineating the cranes' core areas with KDE. In the LoCoH analysis, both the habitat range and the core area (the region above the 50% isopleth) increased with k and gradually merged into larger core areas. The most suitable value for deriving the core area was k = 24, which gave a core area of 18.2 km² for the whole population, or 16.5% of the total habitat area. In the end, the LoCoH analysis yielded two large core habitats, fewer than the core areas produced by KDE. In domestic studies, including published papers and presentations, KDE has mostly been run with default settings and the bandwidth has rarely been considered, so explicitly reporting the bandwidth parameter is required.
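For readers unfamiliar with the home-range measures compared in this abstract, the sketch below computes an MCP area and a fixed-bandwidth KDE "core area" for a synthetic point set; the coordinates, bandwidth factor, and grid are stand-in assumptions, and the CVh/LSCVh bandwidth selectors and LoCoH are not implemented here.

```python
# Sketch: MCP area and a kernel-density core area for point locations,
# loosely analogous to the MCP / KDE home-range analysis above.
# Coordinates are random stand-ins (in metres), not the crane survey data.
import numpy as np
from scipy.spatial import ConvexHull
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
pts = rng.normal(loc=(0.0, 0.0), scale=800.0, size=(300, 2))   # locations in metres

# Minimum convex polygon (MCP): the convex hull of all point locations.
hull = ConvexHull(pts)
print(f"MCP area: {hull.volume / 1e6:.2f} km^2")   # in 2-D, .volume is the area

# Fixed-bandwidth kernel density estimate; the smoothing factor plays the role
# of the bandwidth h above: smaller values give more, smaller core patches.
kde = gaussian_kde(pts.T, bw_method=0.3)
gx, gy = np.mgrid[-3000:3000:150j, -3000:3000:150j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

# 50% isopleth: the smallest set of grid cells holding half the estimated density.
cell_area = (gx[1, 0] - gx[0, 0]) * (gy[0, 1] - gy[0, 0])
d = np.sort(density.ravel())[::-1]
cum = np.cumsum(d) / d.sum()
level = d[np.searchsorted(cum, 0.5)]
core_area = (density >= level).sum() * cell_area
print(f"50% core area: {core_area / 1e6:.2f} km^2")
```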

생존분석에서의 기계학습 (Machine learning in survival analysis)

  • 백재욱
    • 산업진흥연구, Vol. 7, No. 1, pp. 1-8, 2022
  • This paper reviews machine learning methods that can be applied to survival data containing censored observations. Exploratory data analysis first revealed the distribution of each feature, the relationships among features, and their relative importance. Next, treating the relationship between the covariates and the outcome variable (death or not) as a classification problem, machine learning methods such as logistic regression and k-nearest neighbors were applied; although the dataset was small, random forest outperformed logistic regression, as is typical of machine learning results. However, methods reputed to perform well in recent years, such as artificial neural networks and gradient boosting, did not perform markedly better, apparently because the data were not big data. Finally, conventional survival analysis methods such as the Kaplan-Meier estimator and Cox's proportional hazards model were applied to identify which covariates decisively affect the outcome (t_i, δ_i), and random forest, as a machine learning method, was also applied to the censored survival data to evaluate its performance.
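As a rough companion to the workflow in this abstract, the sketch below estimates a Kaplan-Meier survival curve by hand for synthetic right-censored data and scores a random forest on the event indicator; the data-generating choices and the 5-fold evaluation are illustrative assumptions, not the paper's.

```python
# Sketch: a hand-rolled Kaplan-Meier estimator for right-censored data plus a
# random forest on the event indicator, mirroring the two-pronged approach above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 5))                         # covariates
true_time = rng.exponential(scale=np.exp(X[:, 0]))  # latent event times
censor = rng.exponential(scale=2.0, size=n)         # censoring times
time = np.minimum(true_time, censor)                # observed t_i
event = (true_time <= censor).astype(int)           # delta_i (1 = event observed)

def kaplan_meier(time, event):
    """Return distinct event times and the Kaplan-Meier estimate S(t)."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    uniq = np.unique(t[d == 1])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(t >= u)                 # subjects still under observation
        deaths = np.sum((t == u) & (d == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)

times, surv = kaplan_meier(time, event)
print("S(t) at the first few event times:", np.round(surv[:5], 3))

# Treating the event indicator as a classification target, as in the text:
rf = RandomForestClassifier(n_estimators=200, random_state=0)
print("random forest accuracy:", cross_val_score(rf, X, event, cv=5).mean())
```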

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security, Vol. 24, No. 2, pp. 79-88, 2024
  • The use of social media has become part of our daily activities. Social web channels provide content generation facilities to their users, who can share their views, opinions, and experiences on certain topics. Researchers are using social media content in various research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting the reviews, opinions, and sentiments of people. Sentiment analysis is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area, as it aims to determine whether the content writer is in favor of, against, or neutral towards the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification towards public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic properties stance classification. This study explores different feature sets, such as lexical, sentiment-specific, and dialog-based features, extracted from the standard datasets in this area. Supervised learning approaches, including the generative Naïve Bayes algorithm and discriminative machine learning algorithms such as Support Vector Machine, Decision Tree, and k-Nearest Neighbor, have been applied, followed by ensemble-based algorithms such as Random Forest and AdaBoost. The empirical results have been evaluated using the standard performance measures of accuracy, precision, recall, and F-measure.
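A minimal sketch of the kind of pipeline this abstract describes: a lexical TF-IDF feature set fed to the generative, discriminative, and ensemble classifiers named above, scored with accuracy and macro-averaged precision, recall, and F1. The toy three-class corpus is a stand-in for the standard stance datasets, and the sentiment-specific and dialog-based feature sets are omitted.

```python
# Sketch: lexical (TF-IDF) features with the classifiers listed above,
# evaluated by cross-validated accuracy / precision / recall / F1.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate

texts = ["totally support this policy", "this proposal is a disaster",
         "not sure what to think yet", "strongly in favor of the change",
         "absolutely against the new rules", "no opinion on the matter"] * 10
labels = ["favor", "against", "neutral", "favor", "against", "neutral"] * 10

classifiers = {
    "NaiveBayes":   MultinomialNB(),
    "SVM":          LinearSVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "kNN":          KNeighborsClassifier(n_neighbors=3),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost":     AdaBoostClassifier(random_state=0),
}

for name, clf in classifiers.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_validate(pipe, texts, labels, cv=5,
                            scoring=["accuracy", "precision_macro",
                                     "recall_macro", "f1_macro"])
    print(f"{name:12s} accuracy={scores['test_accuracy'].mean():.3f} "
          f"f1_macro={scores['test_f1_macro'].mean():.3f}")
```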

지적 구조의 규명을 위한 네트워크 형성 방식에 관한 연구 (A Study on the Network Generation Methods for Examining the Intellectual Structure of Knowledge Domains)

  • 이재윤
    • 한국문헌정보학회지, Vol. 40, No. 2, pp. 333-355, 2006
  • This study examines, with examples, the characteristics of various network generation methods for visually representing bibliometric data in intellectual structure analysis. Of the four methods considered (threshold cutting, the nearest-neighbor graph, the minimum spanning tree, and the pathfinder network), the pathfinder network algorithm, which represents both the overall structure and the detailed structure well, has recently been applied most actively. Although the nearest-neighbor graph has not yet been applied to bibliometric analysis, it was found to offer several advantages for revealing intellectual structure, such as a simple algorithm and clustering capability. Unlike multidimensional scaling or cluster analysis, network-based visualization produced intellectual structures that differed substantially depending on how the input data were preprocessed. Appropriate use of the network generation methods reviewed here is expected to stimulate domestic research on intellectual structure.
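Two of the network generation methods discussed in this abstract, the nearest-neighbor graph and the minimum spanning tree, are easy to sketch from a similarity matrix; the matrix below holds random stand-in values rather than real co-citation or co-word counts, and the pathfinder network algorithm is not implemented here.

```python
# Sketch: a nearest-neighbour graph and a minimum spanning tree built from a
# small symmetric similarity matrix (random stand-in for bibliometric data).
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n = 8
sim = rng.random((n, n))
sim = (sim + sim.T) / 2          # symmetric similarity matrix
np.fill_diagonal(sim, 0.0)

# Nearest-neighbour graph: connect each node to its single most similar node.
nn_graph = nx.Graph()
for i in range(n):
    j = int(np.argmax(sim[i]))
    nn_graph.add_edge(i, j, weight=sim[i, j])

# Minimum spanning tree on distances (1 - similarity), i.e. a maximum
# similarity spanning tree of the original matrix.
dist_graph = nx.Graph()
for i in range(n):
    for j in range(i + 1, n):
        dist_graph.add_edge(i, j, weight=1.0 - sim[i, j])
mst = nx.minimum_spanning_tree(dist_graph)

print("NN-graph edges:", sorted(nn_graph.edges()))
print("MST edges:     ", sorted(mst.edges()))
```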

과학기술분야 국제협력 증진을 위한 아시아 국가 간 공동연구 현황 분석 (A Study on Research Collaboration Among Asian Countries in Science and Technology)

  • 김원진;정영미
    • 정보관리학회지, Vol. 27, No. 3, pp. 103-123, 2010
  • International collaboration in science and technology is essential for securing national competitiveness. To overcome its limited human and material resources in science and technology, Korea has been internationalizing its research and has recently shown high growth in research collaboration with Asian countries. Using network analysis, this study empirically examined the state of joint research among the Asian countries whose collaboration with Korea has increased sharply, in terms of both co-authored paper counts and subject categories. In the network of co-authored papers among Asian countries over the last five years, Northeast Asian countries such as Japan, China, and Korea were located at the center of the network, and joint research among them was active. An analysis of the subject categories of joint research by region showed that collaboration was concentrated in basic science in Northeast Asia and in medicine in South, Southeast, and Southwest Asia.
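A minimal illustration of the kind of country-level co-authorship network analysis used in this study, with invented placeholder counts rather than the study's co-authored paper data; weighted degree is used here as a simple stand-in for a centrality measure.

```python
# Sketch: a country-level co-authorship network with weighted degree as a
# crude centrality measure. Edge weights are placeholders, not real counts.
import networkx as nx

copubs = {("Korea", "Japan"): 120, ("Korea", "China"): 95,
          ("Japan", "China"): 150, ("Korea", "India"): 30,
          ("China", "Singapore"): 25, ("Japan", "Thailand"): 12}

G = nx.Graph()
for (a, b), n_papers in copubs.items():
    G.add_edge(a, b, weight=n_papers)

# Weighted degree = total co-authored papers per country.
strength = dict(G.degree(weight="weight"))
for country, s in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(f"{country:10s} {s}")
```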

자동 선구조 추출 알고리즘을 이용한 경북 의성지역의 선구조 분석 (Lineament analysis in the euiseong area using automatic lineament extraction algorithm)

  • 김상완
    • 자원환경지질, Vol. 32, No. 1, pp. 19-31, 1999
  • In this study, we have estimated lineaments in the Euiseong area, Kyungbuk Province, from Landsat TM imagery by applying the algorithm developed by Kim and Won et al., which can effectively reduce the look-direction bias associated with the Sun's azimuth angle. Fractures over the study area were also mapped in the field at 57 selected sites to compare them with the results from the satellite image. The trends of the lineaments estimated from the Landsat TM images are characterized as $N50^{\circ}$~$70^{\circ}$W, NS~$N10^{\circ}$W, and $N10^{\circ}$~$60^{\circ}$E. The spatial distribution of lineaments was also studied using a circular grid, and the results show that the area can be divided into two domains: domain A, in which the NS~$N20^{\circ}$E direction is dominant, and domain B, in which the west-northwest direction is prominent. The trends of the lineaments can also be classified into seven groups. Among them, only the C, D, and G trends are found to be dominant based upon Donnelly's nearest neighbor analysis and correlations of lineament densities. In the color composite image produced by overlaying the lineament density maps of the C, D, and G trends, the G trend is developed over the whole study area, while the eastern part of the area is dominated by the D trend; the C trend develops extensively over the whole area except the southeastern part. The orientations of fractures measured at 35 field sites show major trends of NS~$N30^{\circ}$E, $N50^{\circ}$~$80^{\circ}$W, and $N80^{\circ}$E~EW, which agree relatively well with the lineaments estimated from the satellite image. The rose diagram analysis of the field data shows that WNW-ESE trending discontinuities are developed over the whole area, while discontinuities of NS~$N20^{\circ}$E are developed only in the eastern part, which also coincides with the results from the satellite image. The combined results of the lineaments from the satellite image and the fracture orientations measured at 22 field sites, including 18 minor faults in the Sindong Group, imply that the WNW-ESE trend is so prominent that the Gumchun and Gaum faults possibly extend into the lower Sindong Group in the study area.
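The nearest-neighbor statistic invoked above can be illustrated with a basic Clark-Evans style index for a 2-D point pattern; the sketch below omits Donnelly's edge correction, and the points, study-area size, and units are arbitrary stand-ins rather than lineament data.

```python
# Sketch: a basic nearest-neighbour index for a 2-D point pattern, the kind of
# statistic underlying the Donnelly nearest-neighbour analysis mentioned above
# (without Donnelly's edge correction).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
area = 100.0 * 100.0                       # study-area size (arbitrary units)
pts = rng.uniform(0, 100, size=(200, 2))   # point locations

tree = cKDTree(pts)
# k=2 because the closest point to each point is itself (distance 0).
dists, _ = tree.query(pts, k=2)
mean_observed = dists[:, 1].mean()

density = len(pts) / area
mean_expected = 0.5 / np.sqrt(density)     # expectation under complete randomness

R = mean_observed / mean_expected
print(f"nearest-neighbour index R = {R:.2f} (R<1 clustered, R~1 random, R>1 dispersed)")
```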


Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 13, No. 12, pp. 6145-6158, 2019
  • It is a challenge for the current security industry to respond to the large number of malicious codes distributed indiscriminately, as well as to intelligent APT attacks. As a result, studies using machine learning algorithms are being conducted for proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In the malicious code analysis domain, the k-NN algorithm also makes it easy to classify malicious codes based on previously analyzed ones; for example, malware families can be classified, and variants analyzed, through similarity analysis against existing malicious codes. However, the main disadvantage of the k-NN algorithm is that the search time increases as the training data grow. We propose a fast k-NN algorithm that mitigates this computation-speed problem while retaining the value of the k-NN approach. In the test environment, the proposed algorithm required on average only 19.71 similarity comparisons per query against 6.25 million malicious codes. Considering the way the algorithm works, the fast k-NN algorithm can also be used to search any data that can be vectorized, not only malware and SSDEEP digests. In the future, if the k-NN approach is needed and central nodes can be effectively selected for clustering large amounts of data in various environments, it should be possible to design sophisticated machine learning based systems.
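A centroid-pruned search in the spirit of the fast k-NN idea summarized above (the paper's exact construction may differ): a query is compared against a small set of cluster centers first, and exact k-NN runs only inside the closest cluster. The vector dimensions, cluster count, and data are stand-in assumptions.

```python
# Sketch: approximate k-NN via cluster pruning. One pass over the cluster
# centroids, then an exact search inside the closest cluster only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 32))          # stand-in feature vectors

n_clusters = 50
km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(X)

# One exact nearest-neighbour index per cluster.
members = {c: np.where(km.labels_ == c)[0] for c in range(n_clusters)}
indexes = {c: NearestNeighbors().fit(X[idx]) for c, idx in members.items()}

def fast_knn(query, k=5):
    """Compare to ~50 centroids, then run exact k-NN in the chosen cluster."""
    c = int(km.predict(query.reshape(1, -1))[0])
    dist, local = indexes[c].kneighbors(query.reshape(1, -1), n_neighbors=k)
    return members[c][local[0]], dist[0]

idx, dist = fast_knn(rng.normal(size=32))
print(idx, np.round(dist, 3))
```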

경관의 지수화 및 시각화 기법을 활용한 대전광역시 녹지비오톱 파편화 분석 (Fragmentation Analysis of Daejeon City's Green Biotope Using Landscape Index and Visualization Method)

  • 김진효;나정화;이순주;권오성;조현주;이은재
    • 한국환경복원기술학회지, Vol. 19, No. 3, pp. 29-44, 2016
  • The purpose of this study is to quantitatively and visually analyze the degree of green biotope fragmentation caused by road construction and other development work, using the FRAGSTATS and GUIDOS tools. In addition, by linking the results with endangered species surveys, we mapped a 'Biotope Fragmentation Map' of Daejeon city. The findings are summarized as follows. First, in the FRAGSTATS results, the landscape indices number of patches (NP), mean patch size (MPS), total edge length (TE), mean nearest-neighbor distance (MNN), and landscape shape index (LSI) showed meaningful changes due to fragmentation. Second, the GUIDOS analysis showed that the middle core, small core, bridge, branch, edge, islet, and perforation classes increased in area percentage, whereas the large core did not. Lastly, the analysis of the Biotope Fragmentation Map revealed eighteen sites where the size of a large core changed and forty-one sites designated as special protection areas. Overlaying the two results, four sites showed both a change in core size and designation as a special protection area; for example, five endangered species appeared at site No. 4 of the Biotope Fragmentation Map. These findings are expected to serve as basic data for preventing green biotope fragmentation at the planning stage of various development works.
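Three of the landscape indices named above (NP, MPS, TE) can be approximated on a raster with scipy; the sketch below uses a random binary grid and a simple 4-neighbor edge count as rough stand-ins, so the numbers are not FRAGSTATS-exact.

```python
# Sketch: number of patches (NP), mean patch size (MPS) and total edge (TE)
# computed from a toy binary raster, as a rough stand-in for a FRAGSTATS run.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
green = (rng.random((100, 100)) > 0.6).astype(int)   # 1 = green-biotope cell
cell = 10.0                                          # cell edge length in metres

labels, n_patches = ndimage.label(green)             # NP (4-connected patches)
sizes = ndimage.sum(green, labels, index=np.arange(1, n_patches + 1))
mps = sizes.mean() * cell ** 2                       # MPS in m^2

# TE: green / non-green cell faces, counted once from the green side
# (the raster border counts as non-green here).
padded = np.pad(green, 1, constant_values=0)
faces = sum(np.sum((padded == 1) & (np.roll(padded, s, axis=(0, 1)) == 0))
            for s in [(1, 0), (-1, 0), (0, 1), (0, -1)])
te = faces * cell                                    # total edge length in metres

print(f"NP = {n_patches}, MPS = {mps:.0f} m^2, TE = {te:.0f} m")
```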

The Kernel Trick for Content-Based Media Retrieval in Online Social Networks

  • Cha, Guang-Ho
    • Journal of Information Processing Systems, Vol. 17, No. 5, pp. 1020-1033, 2021
  • Nowadays, online and mobile social network services (SNS) are very popular and widely used in our society and daily lives to instantly share, disseminate, and search for information. In particular, SNS such as YouTube, Flickr, Facebook, and Amazon allow users to upload billions of images or videos and also provide a wealth of multimedia information to users. Information retrieval in multimedia-rich SNS is a very useful but challenging task. Content-based media retrieval (CBMR) is the process of obtaining the relevant image or video objects for a given query from a collection of information sources. However, CBMR suffers from the curse of dimensionality due to the inherently high-dimensional features of media data. This paper investigates the effectiveness of the kernel trick in CBMR, specifically kernel principal component analysis (KPCA) for dimensionality reduction. KPCA is a nonlinear extension of linear principal component analysis (LPCA) that discovers nonlinear embeddings using the kernel trick. The fundamental idea of KPCA is to map the input data into a high-dimensional feature space through a nonlinear kernel function and then compute the principal components in that mapped space. This paper investigates the potential of KPCA in CBMR for feature extraction and dimensionality reduction. Using the Gaussian kernel in our experiments, we compute the principal components of an image dataset in the transformed space and then use them as new feature dimensions for the image dataset. Moreover, KPCA can be applied to many other domains beyond CBMR in which LPCA has been used to extract features and where a nonlinear extension would be effective. Our results from extensive experiments demonstrate that the potential of KPCA is very encouraging compared with LPCA in CBMR.
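A minimal sketch of using kernel PCA with a Gaussian (RBF) kernel in place of linear PCA for feature reduction before nearest-neighbor retrieval, as discussed above; the feature matrix, gamma, and component count are arbitrary stand-in choices, not the paper's settings.

```python
# Sketch: kernel PCA (RBF kernel) vs. linear PCA for dimensionality reduction,
# with the reduced vectors used as a plain nearest-neighbour retrieval index.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 256))      # stand-in image feature vectors

lpca = PCA(n_components=32).fit(features)
kpca = KernelPCA(n_components=32, kernel="rbf", gamma=1.0 / 256).fit(features)

X_lin = lpca.transform(features)             # linear PCA features
X_ker = kpca.transform(features)             # kernel PCA features

# Retrieve the 10 nearest neighbours of the first item in each reduced space.
knn_ker = NearestNeighbors(n_neighbors=10).fit(X_ker)
knn_lin = NearestNeighbors(n_neighbors=10).fit(X_lin)
_, idx_ker = knn_ker.kneighbors(X_ker[:1])
_, idx_lin = knn_lin.kneighbors(X_lin[:1])

print("KPCA neighbours:", idx_ker[0])
print("neighbour overlap between KPCA and LPCA spaces:",
      len(set(idx_ker[0]) & set(idx_lin[0])))
```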