• Title/Summary/Keyword: machine learning

Exploiting Friend's Username to De-anonymize Users across Heterogeneous Social Networking Sites (이종 소셜 네트워크 상에서 친구계정의 이름을 이용한 사용자 식별 기법)

  • Kim, Dongkyu;Park, Seog
    • Journal of KIISE / v.41 no.12 / pp.1110-1116 / 2014
  • Social networking sites (SNSs) such as Twitter, LinkedIn, and Tumblr have come to the forefront as their user numbers grow. Although users provide their information to SNSs voluntarily, privacy leakage resulting from SNS use is becoming a problem, owing to advances in large-scale data processing techniques and rising awareness of privacy. To address this problem, studies on protecting privacy on SNSs based on graph and machine learning techniques have been conducted. However, cases of privacy leakage arising from the advent of new SNSs are still being uncovered. In this paper, we propose a technique that enables a user to detect privacy leakage beforehand in cases where a service provider or third-party application developer maliciously threatens the SNS user's privacy.

A Proactive Inference Method of Suspicious Domains (선제 대응을 위한 의심 도메인 추론 방안)

  • Kang, Byeongho;YANG, JISU;So, Jaehyun;Kim, Czang Yeob
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.2 / pp.405-413 / 2016
  • In this paper, we propose a proactive inference method for finding suspicious domains. Our method detects potentially malicious domains from seed domain information extracted from TLD zone files and WHOIS information. The inference process follows three steps: searching for candidate domains, machine learning, and generating a suspicious domain pool. In the first step, we search the TLD zone files and build a candidate domain set that has the same name server information as the seed domain. The next step clusters the candidate domains by the similarity of their WHOIS information. The final step finds the seed domain's cluster and designates that cluster as the suspicious domain set. In experiments, we used the .COM and .NET TLD zone files and tested 10 seed domains selected by our analysts. The experimental results show that our proposed method finds 55 suspicious domains, of which 52 are true positives. The F1 score is 0.91 and the precision is 0.95. We hope our proposal will contribute to further research on proactive malicious domain blacklisting.
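The three inference steps can be illustrated with a minimal sketch. Everything below is assumed for illustration and is not the authors' implementation: the records are made up, the WHOIS similarity is a simple fraction of matching fields, the clustering is single-link with a fixed threshold, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: domain -> (name server, WHOIS fields). All values are made up.
RECORDS = {
    "seed.example": ("ns1.bad.example", {"registrar": "R1", "email": "a@x.example", "country": "KR"}),
    "alpha.example": ("ns1.bad.example", {"registrar": "R1", "email": "a@x.example", "country": "KR"}),
    "beta.example": ("ns1.bad.example", {"registrar": "R1", "email": "a@x.example", "country": "US"}),
    "gamma.example": ("ns1.bad.example", {"registrar": "R9", "email": "z@y.example", "country": "DE"}),
    "delta.example": ("ns2.ok.example", {"registrar": "R1", "email": "a@x.example", "country": "KR"}),
}


def candidates(seed, records):
    """Step 1: domains that share the seed's name server (seed included)."""
    seed_ns = records[seed][0]
    return [d for d, (ns, _) in records.items() if ns == seed_ns]


def whois_similarity(a, b):
    """Fraction of WHOIS fields with identical values (an assumed measure)."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def cluster(domains, records, threshold):
    """Step 2: single-link clustering, merging pairs at or above the threshold."""
    parent = {d: d for d in domains}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for i, a in enumerate(domains):
        for b in domains[i + 1:]:
            if whois_similarity(records[a][1], records[b][1]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for d in domains:
        groups[find(d)].add(d)
    return list(groups.values())


def suspicious_pool(seed, records, threshold=0.6):
    """Step 3: the cluster containing the seed, minus the seed itself."""
    for group in cluster(candidates(seed, records), records, threshold):
        if seed in group:
            return group - {seed}
    return set()
```

With the made-up records, `suspicious_pool("seed.example", RECORDS)` keeps the two domains whose WHOIS fields mostly match the seed and drops the one that only shares the name server.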

Maximum Entropy-based Emotion Recognition Model using Individual Average Difference (개인별 평균차를 이용한 최대 엔트로피 기반 감성 인식 모델)

  • Park, So-Young;Kim, Dong-Keun;Whang, Min-Cheol
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.7 / pp.1557-1564 / 2010
  • In this paper, we propose a maximum entropy-based emotion recognition model that uses the individual average difference of emotional signals, because emotional signal patterns differ from one individual to another. To recognize a user's emotion accurately, the proposed model uses the difference between the average of the input emotional signals and the average of each emotional state's signals (such as positive and negative emotional signals), rather than only the given input signal. So that the emotion recognition model can be constructed easily without expert knowledge of emotion recognition, it uses a maximum entropy model, one of the best-performing and best-known machine learning techniques. Because it is difficult to obtain enough training data based on the numerical values of emotional signals, the proposed model substitutes one of two simple symbols, + (positive number) or - (negative number), for every average difference value, and calculates the average of the emotional signals per second rather than over the total emotion response time (10 seconds).
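The symbol substitution described above can be sketched as a small feature-extraction step. This is an illustration under stated assumptions, not the authors' code: the function names and the feature string format are hypothetical, a zero difference is arbitrarily mapped to "+", and the maximum entropy classifier that would consume the features is omitted.

```python
from statistics import mean


def per_second_means(samples_by_second):
    """Average the raw signal samples within each one-second window."""
    return [mean(samples) for samples in samples_by_second]


def sign_features(input_means, state_means):
    """Replace each average difference with '+' or '-'.

    input_means: per-second averages of the input signal.
    state_means: this individual's average signal for each emotional state.
    A difference of exactly zero is mapped to '+' (an assumption).
    """
    features = []
    for second, value in enumerate(input_means):
        for state in sorted(state_means):
            symbol = "+" if value - state_means[state] >= 0 else "-"
            features.append(f"t{second}:{state}:{symbol}")
    return features
```

Because every numeric difference collapses to one of two symbols, the feature space stays small, which is the stated motivation when training data are scarce.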

A New Self-Organizing Map based on Kernel Concepts (자가 조직화 지도의 커널 공간 해석에 관한 연구)

  • Cheong Sung-Moon;Kim Ki-Bom;Hong Soon-Jwa
    • The KIPS Transactions:PartB / v.13B no.4 s.107 / pp.439-448 / 2006
  • Previous recognition/clustering algorithms such as the Kohonen SOM (Self-Organizing Map), MLP (Multi-Layer Perceptron), and SVM (Support Vector Machine) may not adapt to unexpected input patterns, and their recognition rates depend heavily on the complexity of their training patterns. These weaknesses can be mitigated by lowering the complexity of the original problem without losing its original characteristics. Among the many ways to lower the complexity of a problem, we chose kernel concepts. In this paper, using kernel concepts, the original data are mapped to a hyper-dimensional space of near-infinite dimension. The data transferred into this space are therefore distributed more sparsely than in the original space, which is intended to raise the recognition rate. The recognition rate is estimated with a new similarity-probing and learning method proposed in this paper. Using the CEDAR DB, which contains cursive handwritten digits 0 to 9, we compare the recognition/clustering performance of the proposed kSOM with that of the previous SOM.
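One common way to work in a kernel-induced space without computing the mapping explicitly is the kernel trick: the squared distance between the images of x and w equals k(x, x) - 2k(x, w) + k(w, w). The sketch below uses that identity to pick a best-matching unit. It is a generic illustration, not the paper's kSOM or its similarity-probing method; the RBF kernel, the `gamma` value, and the function names are assumptions.

```python
import math


def rbf(x, w, gamma=0.5):
    """Gaussian (RBF) kernel, whose implicit feature space is infinite-dimensional."""
    squared = sum((a - b) ** 2 for a, b in zip(x, w))
    return math.exp(-gamma * squared)


def feature_space_sq_dist(x, w, kernel=rbf):
    """Squared distance between the kernel-space images of x and w."""
    return kernel(x, x) - 2.0 * kernel(x, w) + kernel(w, w)


def best_matching_unit(x, weights, kernel=rbf):
    """Index of the map unit whose weight vector is closest to x in kernel space."""
    return min(
        range(len(weights)),
        key=lambda i: feature_space_sq_dist(x, weights[i], kernel),
    )
```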

Fire Accident Analysis of Hazardous Materials Using Data Analytics (Data Analytics를 활용한 위험물 화재사고 분석)

  • Shin, Eun-Ji;Koh, Moon-Soo;Shin, Dongil
    • Journal of the Korean Institute of Gas / v.24 no.5 / pp.47-55 / 2020
  • Hazardous materials accidents are not limited to leakage of the material: if the early response is inappropriate, they can lead to a fire or an explosion, which increases the scale of the damage. However, even as the 4th industrial revolution and the rise of the big data era are being discussed, systematic analysis of hazardous materials accidents based on new techniques has not been attempted, and only simple statistics are being collected. In this study, we use machine learning to perform a systematic analysis of the fire accident data for the past 11 years (2008 to 2018) accumulated by the National Fire Service. The analysis results are visualized and presented through text mining analysis, and the possibility of developing a damage-scale prediction model is explored by applying regression analysis to the main factors present in the hazardous materials fire accident data.

Groundwater Level Trend Analysis for Long-term Prediction Based on Gaussian Process Regression (가우시안 프로세스 회귀분석을 이용한 지하수위 추세분석 및 장기예측 연구)

  • Kim, Hyo Geon;Park, Eungyu;Jeong, Jina;Han, Weon Shik;Kim, Kue-Young
    • Journal of Soil and Groundwater Environment / v.21 no.4 / pp.30-41 / 2016
  • The amount of groundwater-related data from various domestic sources has increased drastically since 2000. To justify continued and expanded data acquisition and to derive valuable implications from the data, the continued use of sophisticated, state-of-the-art statistical tools in analysis and prediction is an important issue. In the present study, we employed Gaussian Process Regression (GPR), a well-established machine learning technique, for trend analysis of long-term groundwater level change. The major benefit of the GPR model is that it provides not only future predictions but also the associated uncertainty. Long-term predictions of groundwater level at stations of the National Groundwater Monitoring Network located within the Han River Basin are presented as example prediction cases based on the GPR model. In addition, several types of groundwater change pattern (i.e., increasing, decreasing, and no trend) were delineated on the basis of the statistics acquired from the GPR analyses. The study found that the majority of the monitoring stations show a decreasing trend, while a small portion show an increasing trend or no trend. To further analyze the causes of the trends, the corresponding precipitation data were jointly analyzed by the same method (i.e., GPR). Based on these analyses, the major cause of the decreasing groundwater level trend is attributed to a reduction in the precipitation rate, whereas a few of the stations show only a weak relationship between the pattern of groundwater level change and precipitation.
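The point that GPR returns an uncertainty alongside each prediction can be shown with a minimal one-dimensional sketch. It assumes a zero-mean prior, an RBF kernel, and fixed hyperparameters (`length`, `variance`, `noise`), none of which come from the study; real groundwater series would at least be centered first and have their hyperparameters fitted.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_predict(xs, ys, x_new, noise=1e-6):
    """Posterior mean and variance at x_new for a zero-mean GP."""
    K = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)       # (K + noise*I)^-1 y
    v = solve(K, k_star)       # (K + noise*I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)
```

Near an observed point the variance is close to the noise level, and far from all observations it returns to the prior variance, which is why long-range forecasts come with wide uncertainty bands.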

Structural failure classification for reinforced concrete buildings using trained neural network based multi-objective genetic algorithm

  • Chatterjee, Sankhadeep;Sarkar, Sarbartha;Hore, Sirshendu;Dey, Nilanjan;Ashour, Amira S.;Shi, Fuqian;Le, Dac-Nhuong
    • Structural Engineering and Mechanics / v.63 no.4 / pp.429-438 / 2017
  • Structural design plays an essential role in determining the failure possibility of a Reinforced Concrete (RC) structure. Recent research has predicted the structural failure of RC structures with the assistance of machine learning techniques. Previously, an Artificial Neural Network (ANN) trained with the support of Particle Swarm Optimization (PSO) was used to classify RC structures with reasonable accuracy. However, given the sensitivity involved in predicting structural failure, more accurate machine learning models are still lacking. Since the efficiency of multi-objective optimization over single-objective optimization techniques is well established, the motivation of the current work is to employ a Multi-objective Genetic Algorithm (MOGA) to train the Neural Network (NN) based model. In the present work, the NN has been trained with MOGA to minimize the Root Mean Squared Error (RMSE) and Maximum Error (ME) while optimizing the weight vector of the NN. The model has been tested on a dataset consisting of 150 RC buildings. The proposed NN-MOGA based model has been compared with Multi-layer perceptron-feed-forward network (MLP-FFN) and NN-PSO based models in terms of several performance metrics. Experimental results suggest that NN-MOGA outperforms the other existing well-known classifiers by a reasonable margin. The proposed NN-MOGA achieved an accuracy of 93.33% and an F-measure of 94.44%, higher than the other classifiers in the present study.
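The two training objectives named above, and the Pareto comparison that any multi-objective search relies on, can be sketched as follows. This is a generic illustration and not the paper's NN-MOGA: the genetic operators and the neural network are omitted, and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: the first objective to minimize."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def max_error(y_true, y_pred):
    """Largest absolute error: the second objective to minimize."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(objectives):
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]
```

Each candidate weight vector would be scored as the pair `(rmse, max_error)`. A candidate with a low average error but one very large miss can then coexist on the front with a candidate that is slightly worse on average but has no large miss.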

A Study on the Idol Survivability Prediction Using Machine Learning Techniques : Focused on the Industrial Competitiveness (머신러닝 기법을 활용한 아이돌 생존 가능성 예측 연구 : 산업 경쟁력 증진을 중심으로)

  • Kim, Seul-ah;Ahn, Ju Hyuk;Cui, Fuquan
    • The Journal of the Korea Contents Association / v.20 no.5 / pp.291-302 / 2020
  • The Korean popular music industry, which is led by "idol groups", has built a fandom all over the world. Idol groups have therefore become not only artists but also some of the most influential people in the Korean economy. A global idol group with a strong fandom can earn more than a trillion dollars by attracting its global fans' interest in Korea. In other words, it is very important to bring an idol group to success. This study tries to predict whether idols survive at a certain point after their debut using ANN, Decision Tree, and Random Forest models. We set these points at three years and eight years after debut, because they are the break-even year and the year after the average contract renewal. In addition, this study explains which features matter most to survival, using feature importance and logistic regression. In conclusion, features such as the number of idol competitors, the number of debut members, and the number of genres are significant. These results shed light on the efficient management of K-Pop idols to improve industrial competitiveness.

k-NN Query Optimization Scheme Based on Machine Learning Using a DNN Model (DNN 모델을 이용한 기계 학습 기반 k-최근접 질의 처리 최적화 기법)

  • We, Ji-Won;Choi, Do-Jin;Lee, Hyeon-Byeong;Lim, Jong-Tae;Lim, Hun-Jin;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.20 no.10 / pp.715-725 / 2020
  • In this paper, we propose an optimization scheme for the k-Nearest Neighbor (k-NN) query, which finds the k objects closest to the query among high-dimensional feature vectors. The k-NN query is converted into a range query over a range that is likely to contain k objects, and is processed in that form. The proposed scheme uses a DNN model to derive an optimal range that reduces processing cost and accelerates search. The overall system consists of online and offline modules. In the online module, a query is processed when it is issued by a client. In the offline module, an optimal range is derived for the query using the DNN model and is delivered to the online module. Various performance evaluations show that the proposed scheme outperforms existing schemes.
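The conversion of a k-NN query into a range query can be sketched as below. The starting `radius` stands in for the range that the paper derives with a DNN; here it is just a caller-supplied number. The linear scan, the doubling fallback when the range holds fewer than k objects, and the function names are all assumptions made for illustration.

```python
import math


def range_query(points, query, radius):
    """All points within `radius` of the query (linear scan stands in for an index)."""
    return [p for p in points if math.dist(p, query) <= radius]


def knn_via_range(points, query, k, radius, growth=2.0):
    """Answer a k-NN query with range queries, widening the range if needed.

    `radius` plays the role of the predicted range; a tight but sufficient
    value means a single range query with few candidates to sort.
    """
    if k > len(points) or radius <= 0 or growth <= 1:
        raise ValueError("need k <= len(points), radius > 0, growth > 1")
    while True:
        hits = range_query(points, query, radius)
        if len(hits) >= k:
            return sorted(hits, key=lambda p: math.dist(p, query))[:k]
        radius *= growth  # predicted range was too small; retry with a wider one
```

A range that is too small costs extra rounds, and one that is too large returns many candidates to sort, which is why an accurate range estimate reduces the processing cost.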

Bug Report Quality Prediction for Enhancing Performance of Information Retrieval-based Bug Localization (정보검색기반 결함위치식별 기술의 성능 향상을 위한 버그리포트 품질 예측)

  • Kim, Misoo;Ahn, June;Lee, Eunseok
    • Journal of KIISE / v.44 no.8 / pp.832-841 / 2017
  • Bug reports are essential documents for developers to localize and fix bugs. These reports contain information about software bugs or failures that occur during the software operation and maintenance phase. Information Retrieval-based Bug Localization (IR-BL) techniques have been proposed to reduce the time and cost it takes developers to resolve bug reports. However, if a low-quality bug report is submitted, the performance of such techniques can degrade significantly. To address this problem, we propose a quality prediction method that selects low-quality bug reports. This process defines a Quality property of a Bug report as a Query (Q4BaQ) and predicts the quality of bug reports using machine learning. We evaluated the proposed method on 3 open source projects. The experimental results show that the proposed method achieved an average F-measure of 87.31% and outperformed previous prediction techniques by up to 6.62% in F-measure. Finally, a combination of the proposed method and a traditional automatic query reformulation method improved the MRR and MAP by 0.9% and 1.3%, respectively.
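For reference, the two ranking metrics reported above, MRR and MAP, can be computed as in the following sketch. It uses the standard definitions; the file names in the usage note and the function names are made up, and nothing here reproduces the paper's data.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_reciprocal_rank(queries):
    """MRR over (ranked list, relevant set) pairs, one pair per bug report."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_average_precision(queries):
    """MAP over (ranked list, relevant set) pairs, one pair per bug report."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)
```

Each query pairs the files ranked for one bug report with the set of files actually fixed, for example `(["a.java", "b.java", "c.java"], {"b.java"})`.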