• Title/Summary/Keyword: nearest-neighbor analysis

Machine Learning Algorithms for Predicting Anxiety and Depression (불안과 우울 예측을 위한 기계학습 알고리즘)

  • Kang, Yun-Jeong;Lee, Min-Hye;Park, Hyuk-Gyu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.207-209
    • /
    • 2022
  • In an IoT environment, life pattern data can be collected by recognizing human physical activity from smart devices. The model proposed in this paper consists of a prediction stage and a recommendation stage. The prediction stage predicts the scale of anxiety and depression by applying logistic regression and the k-nearest neighbor algorithm to a dataset collected from life pattern data. In the recommendation stage, if symptoms of anxiety and depression are classified, the principal component analysis algorithm is applied to recommend food and light exercise that can improve them. The proposed anxiety/depression prediction and food/exercise recommendations are expected to have a ripple effect on improving individuals' quality of life.

Utilizing Machine Learning Algorithms for Recruitment Predictions of IT Graduates in the Saudi Labor Market

  • Munirah Alghamlas;Reham Alabduljabbar
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.3
    • /
    • pp.113-124
    • /
    • 2024
  • One of the goals of the Saudi Arabia 2030 vision is to ensure full employment of its citizens. Recruitment of graduates depends on the quality of skills that they may have gained during their study. Hence, the quality of education and ensuring that graduates have sufficient knowledge about the in-demand skills of the market are necessary. However, IT graduates are usually not aware of whether they are suitable for recruitment or not. This study builds a prediction model that can be deployed on the web, where users can input variables to generate predictions. Furthermore, it provides data-driven recommendations of the in-demand skills in the Saudi IT labor market to overcome the unemployment problem. Data were collected from two online job portals: LinkedIn and Bayt.com. Three machine learning algorithms, namely, Support Vector Machine, k-Nearest Neighbor, and Naïve Bayes were used to build the model. Furthermore, descriptive and data analysis methods were employed herein to evaluate the existing gap. Results showed that there existed a gap between labor market employers' expectations of Saudi workers and the skills that the workers were equipped with from their educational institutions. Planned collaboration between industry and education providers is required to narrow down this gap.

Research on Oriental Medicine Diagnosis and Classification System by Using Neck Pain Questionnaire (경항통 설문지를 이용한 한의학적 진단 및 분류체계에 관한 연구)

  • Song, In;Lee, Geon-Mok;Hong, Kwon-Eui
    • Journal of Acupuncture Research
    • /
    • v.28 no.3
    • /
    • pp.85-100
    • /
    • 2011
  • Objectives: The purpose of this study is to support the preparation of oriental medicine clinical guidelines by drawing up standards for oriental medicine demonstration and diagnostic classification of neck pain. Methods: Experts' opinions on neck pain patients, gathered by the Delphi method and classified as Gyeonghangtong (頸項痛), Nakchim (落枕), Sagyeong (斜頸), or Hanggang (項强), were analyzed statistically using an oriental medicine diagnosis questionnaire. The results were classified using linear discriminant analysis (LDA), diagonal linear discriminant analysis (DLDA), diagonal quadratic discriminant analysis (DQDA), k-nearest neighbor classification (KNN), classification and regression trees (CART), and support vector machines (SVM). Results: The results are summarized as follows. 1. LDA had a hit rate of 84.47% in comparison with the original diagnosis. 2. A high hit rate was obtained when the test used three categories: a combined Gyeonghangtong and Hanggang category, a Sagyeong category, and a Nakchim category. 3. DLDA had a hit rate of 58.25% in comparison with the original diagnosis, and DQDA had an accuracy of 57.28%. 4. KNN had a hit rate of 69.90% in comparison with the original diagnosis. 5. CART had a hit rate of 69.60% in comparison with the original diagnosis. The hit rate was 70.87% when the test was performed on 8 significant questions selected by analysis of variance. 6. SVM had a hit rate of 80.58% in comparison with the original diagnosis. Conclusions: Statistical analysis using the oriental medicine diagnosis questionnaire on neck pain generally produced significant results.

MCP, Kernel Density Estimation and LoCoH Analysis for the Core Area Zoning of the Red-crowned Crane's Feeding Habitat in Cheorwon, Korea (철원지역 두루미 취식지의 핵심지역 설정을 위한 MCP, 커널밀도측정법(KDE)과 국지근린지점외곽연결(LoCoH) 분석)

  • Yoo, Seung-Hwa;Lee, Ki-Sup;Park, Chong-Hwa
    • Korean Journal of Environment and Ecology
    • /
    • v.27 no.1
    • /
    • pp.11-21
    • /
    • 2013
  • We sought to identify the core feeding sites of the Red-crowned Crane (Grus japonensis) in Cheorwon, Korea, using three analysis techniques: MCP (minimum convex polygon), KDE (kernel density estimation), and LoCoH (local nearest-neighbor convex hull). We also discuss the differences among the results of these methods and their meaning. The utilization distribution data were taken from a distribution map of the Red-crowned Crane in Cheorwon, Korea, for 17 February 2012. The extent of the distribution area was $140km^2$ by MCP analysis. The extents of the core feeding area were $33.3km^2$ ($KDE_{1000m}$), $25.7km^2$ ($KDE_{CVh}$), and $19.7km^2$ ($KDE_{LSCVh}$) for bandwidth values of 1000 m, CVh, and LSCVh, respectively. The extent, number, and shape complexity of the core areas decreased, and the size of each core area decreased, as the bandwidth became smaller (default: 1000 m, CVh: 554.6 m, LSCVh: 329.9 m). We suggest the CVh value as a proper KDE bandwidth for core area zoning of the Red-crowned Crane. In the LoCoH analysis, as the k value increased, the extents of the distribution range and core area increased and the core areas merged into larger ones. The proper value for selecting the core area of the Red-crowned Crane's distribution was k=24, and the extent of the core area was $18.2km^2$, 16.5% of the total distribution area. Finally, from the LoCoH analysis we selected two core areas, fewer than the number selected by the KDE analysis. Most articles and presentations in Korea that use KDE analysis have not reported the exact bandwidth value. Studies using KDE therefore need to state the exact bandwidth value used.
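
The effect of bandwidth on a kernel density surface can be illustrated with a minimal sketch. It assumes a fixed-bandwidth 2D Gaussian kernel, which the abstract does not specify; the function name `gaussian_kde_2d` and the sighting coordinates are hypothetical and are not the study's data.

```python
import math

def gaussian_kde_2d(points, x, y, h):
    """Fixed-bandwidth 2D Gaussian kernel density estimate at (x, y).

    points: list of (x, y) observation coordinates
    h: bandwidth, in the same units as the coordinates
    """
    n = len(points)
    total = 0.0
    for px, py in points:
        # Squared distance to the observation, scaled by the bandwidth.
        u2 = ((x - px) ** 2 + (y - py) ** 2) / (h * h)
        # Standard bivariate normal kernel.
        total += math.exp(-0.5 * u2) / (2.0 * math.pi)
    return total / (n * h * h)

# Hypothetical sighting coordinates in metres (illustrative only).
sightings = [(0.0, 0.0), (200.0, 100.0), (250.0, 150.0), (3000.0, 3000.0)]

# Evaluate the density at one location for the three bandwidths named above.
for h in (1000.0, 554.6, 329.9):
    print(h, gaussian_kde_2d(sightings, 225.0, 125.0, h))
```

A core area would then be the region enclosed by a chosen density contour; a smaller bandwidth concentrates the density around individual observations, so the contour depends strongly on the value of `h`.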

Machine learning in survival analysis (생존분석에서의 기계학습)

  • Baik, Jaiwook
    • Industry Promotion Research
    • /
    • v.7 no.1
    • /
    • pp.1-8
    • /
    • 2022
  • We investigated various machine learning methods that can be applied to censored data. Exploratory data analysis reveals the distribution of each feature and the relationships among features. Next, a classification problem was set up in which the dependent variable is death_event and the remaining features are independent variables. After applying various machine learning methods to the data, we found that, as in many other reports from the artificial intelligence field, random forest performs better than logistic regression. However, artificial neural networks and gradient boosting, which have performed well recently, did not perform as expected because of the lack of data. Finally, the Kaplan-Meier estimator and the Cox proportional hazards model were employed to explore the relationship of the dependent variable (ti, δi) with the independent variables. Random forest, a method used in machine learning, was also applied to survival analysis with censored data.
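
As an illustration of how censored observations (ti, δi) enter a survival estimate, here is a minimal sketch of the Kaplan-Meier product-limit estimator. The function name `kaplan_meier` and the sample data are hypothetical and are not taken from the paper.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  observed times t_i
    events: indicators delta_i (1 = event observed, 0 = censored)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= removed
    return curve

# Hypothetical data: the subject at t=2 is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later event times.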

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.79-88
    • /
    • 2024
  • The use of social media has become part of our daily activities. Social web channels let their users generate content and share their views, opinions, and experiences on particular topics. Researchers use social media content in various research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting people's reviews, opinions, and sentiments. Sentiment analysis is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area, as it aims to determine whether the content writer is in favor of, against, or neutral towards the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification in public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic properties stance classification. This study explores different feature sets (lexical, sentiment-specific, and dialog-based) extracted from the standard datasets in the relevant area. Supervised learning approaches were applied, namely generative algorithms such as Naïve Bayes and discriminative machine learning algorithms such as Support Vector Machine, Decision Tree, and k-Nearest Neighbor, followed by ensemble-based algorithms such as Random Forest and AdaBoost. The empirical results were evaluated using the standard performance measures of Accuracy, Precision, Recall, and F-measure.

A Study on the Network Generation Methods for Examining the Intellectual Structure of Knowledge Domains (지적 구조의 규명을 위한 네트워크 형성 방식에 관한 연구)

  • Lee Jae-Yun
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.40 no.2
    • /
    • pp.333-355
    • /
    • 2006
  • Network generation methods for visualizing bibliometric data to examine the intellectual structure of knowledge domains are investigated in some detail. Among the four methods investigated in this study, the pathfinder network algorithm is the most effective in representing local details as well as the global intellectual structure. The nearest neighbor graph, although never used in bibliometric analysis, also has advantages such as its simplicity and clustering ability. The effect of the input data preparation process on the resulting intellectual structures is examined, and it is concluded that, unlike an MDS map with clusters, the network structure can change significantly with differences in the data matrix preparation process. The network generation methods investigated in this paper could be alternatives to conventional multivariate analysis methods and could facilitate research on the intellectual structure of knowledge domains.

A Study on Research Collaboration Among Asian Countries in Science and Technology (과학기술분야 국제협력 증진을 위한 아시아 국가 간 공동연구 현황 분석)

  • Kim, Won-Jin;Chung, Young-Mee
    • Journal of the Korean Society for Information Management
    • /
    • v.27 no.3
    • /
    • pp.103-123
    • /
    • 2010
  • Recently, the research community in Korea has shown rapid growth in collaboration with Asian countries. In this study, we analyzed research collaboration among Asian countries using network analysis of co-authored papers as well as subject categories. The network of co-authored papers among Asian countries over the 5-year period since 2005 revealed that Japan, China, and Korea were positioned at the center of the network and were highly productive in collaborative research. In the analysis of the subject categories of co-authored papers in four different Asian regions with 2009 data, physics and material science were found to be the most productive subject fields in collaborative research in Northeast Asia. On the other hand, medical science was the most collaborative subject field in the remaining Asian regions.

Lineament analysis in the Euiseong area using automatic lineament extraction algorithm (자동 선구조 추출 알고리즘을 이용한 경북 의성지역의 선구조 분석)

  • 김상완
    • Economic and Environmental Geology
    • /
    • v.32 no.1
    • /
    • pp.19-31
    • /
    • 1999
  • In this study, we estimated lineaments in the Euiseong area, Kyungbuk Province, from Landsat TM imagery by applying the algorithm developed by Kim and Won et al., which can effectively reduce the look direction bias associated with the Sun's azimuth angle. Fractures over the study area were also mapped in the field at 57 selected sites to compare them with the results from the satellite image. The trends of lineaments estimated from the Landsat TM images are characterized as $N50^{\circ}$~$70^{\circ}$W, NS~$N10^{\circ}$W, and $N10^{\circ}$~$60^{\circ}$E trends. The spatial distribution of lineaments is also studied using a circular grid, and the results show that the area can be divided into two domains: domain A, in which the NS~$N20^{\circ}$E direction is dominant, and domain B, in which the west-north-west direction is prominent. The trends of lineaments can also be classified into seven groups. Among them, only the C, D, and G trends are found to be dominant based on Donnelly's nearest neighbor analysis and correlations of lineament densities. In the color composite image produced by overlaying the lineament density maps of the C-, D-, and G-trends, the G-trend is shown to be developed over the whole study area, while the eastern part of the area is dominated by the D-trend. The C-trend develops extensively over the whole area except the southeastern part. The orientation of fractures measured at 35 points in the field shows major trends of NS~$N30^{\circ}$E, $N50^{\circ}$~$80^{\circ}$W, and $N80^{\circ}$E~EW, which agree relatively well with the lineaments estimated from the satellite image. The rose diagram analysis of the field data shows that WNW-ESE trending discontinuities are developed over the whole area, while discontinuities of NS~$N20^{\circ}$E are developed only in the eastern part, which also coincides with the result from the satellite image. The combined results of lineaments from the satellite image and fracture orientations from field data at 22 points, including 18 minor faults in the Sindong Group, imply that the WNW-ESE trend is so prominent that the Gumchun and Gaum faults possibly extend up to the lower Sindong Group in the study area.
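
The idea behind nearest neighbor analysis of a point pattern can be sketched as follows. This is a minimal sketch of the uncorrected Clark-Evans ratio, not the Donnelly edge-corrected statistic the abstract refers to, and not the authors' procedure; the function name `clark_evans_ratio` and the sample points are hypothetical.

```python
import math

def clark_evans_ratio(points, area):
    """Uncorrected Clark-Evans nearest neighbor ratio R.

    R is the mean observed nearest neighbor distance divided by the
    distance expected under complete spatial randomness at the same density.
    """
    n = len(points)
    nn_sum = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its nearest other point.
        nn_sum += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = nn_sum / n
    expected = 0.5 / math.sqrt(n / area)  # 1 / (2 * sqrt(density))
    return observed / expected

# Hypothetical pattern: four points on the corners of a unit square,
# inside a study region of area 4.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(clark_evans_ratio(corners, area=4.0))
```

Values of R below 1 indicate clustering, values near 1 a random pattern, and values above 1 a dispersed pattern. Donnelly's version adjusts the expected distance and its standard error for the boundary of the study region.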

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.12
    • /
    • pp.6145-6158
    • /
    • 2019
  • Responding to the large number of malicious codes distributed indiscriminately, as well as to intelligent APT attacks, is a challenge for the current security industry. As a result, studies using machine learning algorithms are being conducted for proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In addition, in the malicious code analysis domain, the k-NN algorithm makes it easy to classify malicious codes based on previously analyzed malicious codes. For example, it is possible to classify malicious code families or analyze malicious code variants through similarity analysis with existing malicious codes. However, the main disadvantage of the k-NN algorithm is that the search time increases as the learning data increases. We propose a fast k-NN algorithm that improves the computation speed while retaining the value of the k-NN algorithm. In the test environment, the algorithm required an average of only 19.71 similarity comparisons for 6.25 million malicious codes. Considering the way the algorithm works, the fast k-NN algorithm can also be used to search any data that can be vectorized, as well as malware and SSDEEP. In the future, if the k-NN approach is needed and the central node can be selected effectively for clustering large amounts of data in various environments, it is expected that a sophisticated machine learning based system can be designed.
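
For context, the baseline that a fast k-NN method tries to speed up can be sketched as a brute-force search. This is not the paper's fast k-NN algorithm; the function name `knn_classify`, the Euclidean distance, the feature vectors, and the family labels are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Brute-force k-NN: label a query by majority vote of its k nearest samples.

    train: list of (feature_vector, label) pairs
    query: feature vector of the same length
    """
    # Every query is compared with every training sample, so the cost of
    # one search grows linearly with the size of the training set.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical vectorized samples with made-up family labels.
train = [
    ((0.0, 0.0), "family_a"),
    ((0.0, 1.0), "family_a"),
    ((5.0, 5.0), "family_b"),
    ((5.0, 6.0), "family_b"),
    ((6.0, 5.0), "family_b"),
]
print(knn_classify(train, (0.2, 0.1), k=3))
```

Because each lookup scans the whole training set, search time rises with the amount of learning data, which is the disadvantage the abstract describes.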