• Title/Summary/Keyword: Sparse learning

Privacy Protection Model for Location-Based Services

  • Ni, Lihao;Liu, Yanshen;Liu, Yi
    • Journal of Information Processing Systems / v.16 no.1 / pp.96-112 / 2020
  • Preventing the disclosure of sensitive information in location-based services (LBSs) by means of k-nearest neighbor queries, location dummies, or interfering data is a new research topic. Although previous studies reduced security threats, they are ineffective when users are sparse or when K-successive privacy is required, and their additional calculations degrade the performance of LBS application systems. This paper therefore proposes a model based on geohash encoding (in place of raw latitude and longitude), a memcached server cluster, encryption and decryption, and authentication. Simulation results based on PHP and MySQL show that the model offers approximately 10× speedup over the conventional approach. The model solves two problems: sensitive information in the LBS application is not disclosed, and the relationship between an individual and a track is not leaked.
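
The geohash encoding mentioned in this abstract can be illustrated with a minimal sketch of the standard public geohash algorithm. It is not the authors' implementation, and the function name `geohash_encode` and its `precision` parameter are assumptions chosen for illustration. The sketch repeatedly halves the longitude and latitude ranges, interleaves the resulting bits, and maps each group of five bits to a base-32 character.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a geohash string (illustrative helper)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits accumulated for the current character
    nbits = 0       # how many bits are in `bits`
    use_lon = True  # geohash interleaves bits, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            nbits = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, precision=8))
```

A shorter geohash is always a prefix of a longer one for the same point, so truncating the string coarsens the location to a larger cell. A cell string of this kind could plausibly serve as a cache key, but the abstract does not say how the model uses it.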

An Efficient Learning Method for Large Bayesian Networks using Clustering (클러스터링을 이용한 효율적인 대규모 베이지안 망 학습 방법)

  • Jung Sungwon;Lee Kwang H.;Lee Doheon
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.700-702 / 2005
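  • This paper proposes a clustering-based method for learning large Bayesian networks quickly. The proposed method uses clustering to restrict the region of directed acyclic graphs (DAGs) searched during Bayesian structure learning. Whereas the number of candidate DAGs considered by existing Bayesian structure learning methods is bounded by the total number of nodes, in the proposed method it is bounded by a predetermined maximum cluster size. Experimental results show that the proposed method achieves accuracy comparable to that of the Sparse Candidate (SC) method, which has been used for learning large Bayesian networks, even though it considers far fewer candidate DAGs.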

Inverted Index based Modified Version of KNN for Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
  • This research proposes a new strategy for text categorization in which documents are encoded into string vectors and KNN is modified to operate on string vectors. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the supervised learning algorithm so that it is adaptable to string vectors for text categorization.
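
As a rough illustration of KNN over string vectors, the sketch below represents each document as a short list of words and classifies a query by majority vote among its most similar training documents. The similarity measure (`overlap_similarity`, the fraction of shared words) is a hypothetical stand-in chosen for simplicity. It is not the inverted-index-based measure of the paper, which the abstract does not specify.

```python
from collections import Counter


def overlap_similarity(a, b):
    """Hypothetical similarity: shared distinct words over the larger vector size."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / max(len(set_a), len(set_b))


def knn_classify(query, training, k=3):
    """Return the majority label among the k training vectors most similar to query.

    `training` is a list of (string_vector, label) pairs.
    """
    ranked = sorted(
        training,
        key=lambda item: overlap_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        (["stock", "market", "bank"], "finance"),
        (["bank", "loan", "rate"], "finance"),
        (["goal", "match", "team"], "sports"),
        (["team", "coach", "league"], "sports"),
    ]
    print(knn_classify(["bank", "rate", "market"], training, k=3))
```

Because the documents stay as words rather than numeric features, no high-dimensional, mostly-zero vector is ever built, which is the motivation the abstract gives.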

POI Recommendation Method Based on Multi-Source Information Fusion Using Deep Learning in Location-Based Social Networks

  • Sun, Liqiang
    • Journal of Information Processing Systems / v.17 no.2 / pp.352-368 / 2021
  • Sign-in points of interest (POIs) are extremely sparse in location-based social networks, which hinders recommendation systems from capturing users' deep-level preferences. To solve this problem, we propose a content-aware POI recommendation algorithm based on a convolutional neural network. First, we use convolutional neural networks to process comment text and model the latent factors of POIs and users. Subsequently, the objective function is constructed by fusing users' geographical information and the obtained emotional category information. In addition, the objective function comprises matrix decomposition and maximisation of the probability objective function. Finally, we solve the objective function efficiently. The prediction rate and F1 value on the Instagram-NewYork dataset are 78.32% and 76.37%, respectively, and those on the Instagram-Chicago dataset are 85.16% and 83.29%, respectively. Comparative experiments show that the proposed method achieves a higher precision rate than several other recent recommendation methods.

Multiple Fusion-based Deep Cross-domain Recommendation (다중 융합 기반 심층 교차 도메인 추천)

  • Hong, Minsung;Lee, WonJin
    • Journal of Korea Multimedia Society / v.25 no.6 / pp.819-832 / 2022
  • Cross-domain recommender systems transfer knowledge across different domains to improve recommendation performance in a target domain that has a relatively sparse model. However, they suffer from "negative transfer," in which transferred knowledge acts as noise. This paper proposes a novel Multiple Fusion-based Deep Cross-Domain Recommendation model named MFDCR. We exploit Doc2Vec, one of the well-known word embedding techniques, to fuse data user-wise and transfer knowledge across multiple domains, which alleviates the "negative transfer" problem. Additionally, we introduce a simple multi-layer perceptron to learn user-item interactions and predict the probability that a user prefers an item. Extensive experiments with three domain datasets from Amazon, one of the best-known services, demonstrate that MFDCR outperforms recent single-domain and cross-domain recommendation algorithms. Furthermore, experimental results show that MFDCR can address the problem of "negative transfer" and improve recommendation performance for multiple domains simultaneously. In addition, we show that our approach extends efficiently to more domains.

Practical method to improve usage efficiency of bike-sharing systems

  • Lee, Chun-Hee;Lee, Jeong-Woo;Jung, YungJoon
    • ETRI Journal / v.44 no.2 / pp.244-259 / 2022
  • Bicycle- or bike-sharing systems (BSSs) have received increasing attention as a secondary transportation mode because of advantages such as accessibility, prevention of air pollution, and health promotion. However, because bike demand in BSSs is biased, the bike rebalancing problem must be solved. Various methods have been proposed to solve this problem; however, they are difficult to apply to small cities because bike demand is sparse and there are many practical issues to solve. Thus, we propose a demand prediction model using multiple classifiers, time grouping, categorization, weather analysis, and station correlation information. In addition, we analyze real-world relocation data from relocation managers and propose a relocation algorithm based on the analytical results to solve the bike rebalancing problem. The proposed system is compared experimentally with the results obtained by the real relocation managers.

Rain Detection via Deep Convolutional Neural Networks (심층 컨볼루셔널 신경망 기반의 빗줄기 검출 기법)

  • Son, Chang-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.8 / pp.81-88 / 2017
  • This paper proposes a method of detecting rain regions from a single image. More specifically, it presents a way of training a deep convolutional neural network in a supervised manner on collected rain and non-rain patches. It is also shown that the proposed rain detection method based on a deep convolutional neural network can provide better performance than the conventional rain detection method based on dictionary learning. Moreover, it is confirmed that applying the proposed rain detection to rain removal can lead to some improvement in detail representation in the low-frequency regions of the rain-removed images. Additionally, this paper introduces a rain transfer method that inserts rain patterns into original images, thereby producing rain effects in the resulting images. The proposed rain transfer method could be used to augment rain patterns when constructing a rain database.

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become popular because it is easy to implement. However, the performance of the Q-learning algorithm relies heavily on the correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly robust weighted least-squares estimating methods for Q-learning in high-dimensional settings, where treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be correctly estimated as long as at least one of the outcome model and the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real data example.

Malicious Traffic Classification Using Mitre ATT&CK and Machine Learning Based on UNSW-NB15 Dataset (마이터 어택과 머신러닝을 이용한 UNSW-NB15 데이터셋 기반 유해 트래픽 분류)

  • Yoon, Dong Hyun;Koo, Ja Hwan;Won, Dong Ho
    • KIPS Transactions on Software and Data Engineering / v.12 no.2 / pp.99-110 / 2023
  • This study proposed a classification of malicious network traffic using a cyber threat framework (Mitre ATT&CK) and machine learning to solve the real-time traffic detection problems faced by current security monitoring systems. We mapped a network traffic dataset called UNSW-NB15 to the Mitre ATT&CK framework to transform the labels and generated the final dataset through rare class processing. After training several boosting-based ensemble models on the final dataset, we demonstrated how these ensemble models classify network traffic using various performance metrics. Based on the F1 score, we showed that XGBoost with no rare class processing is the best in the multi-class traffic environment. We recognized that machine learning ensemble models built through Mitre ATT&CK label conversion and oversampling differ from existing studies, but have limitations due to (1) the inability to match perfectly when converting between existing datasets and Mitre ATT&CK labels and (2) the presence of excessively sparse classes. Nevertheless, CatBoost with B-SMOTE achieved a classification accuracy of 0.9526 and is therefore expected to be able to automatically detect normal/abnormal network traffic.

Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction

  • Bari, A.T.M. Golam;Reaz, Mst. Rokeya;Choi, Ho-Jin;Jeong, Byeong-Soo
    • Interdisciplinary Bio Central / v.4 no.4 / pp.14.1-14.6 / 2012
  • Splice site prediction in a DNA sequence is a basic search problem for finding exon/intron and intron/exon boundaries. Removing the introns and then joining the exons together forms the mRNA sequence, which is the input of the translation process. This is a necessary step in the central dogma of molecular biology. The main task of splice site prediction is first to find the exact GT-ended and AG-ended sequences, and then to distinguish the true GT-ended and AG-ended sequences from the false ones among those candidates. In this paper, we survey research on splice site prediction based on the support vector machine (SVM). These works differ mainly in their nucleotide encoding technique and SVM kernel selection. Some methods encode the DNA sequence in a sparse way, whereas others encode it in a probabilistic manner. The encoded sequences serve as the input of the SVM, whose task is to classify them using its learned model. The classification accuracy depends largely on the proper selection of a kernel for sequence data and of the kernel parameters. We examine each encoding technique and group the techniques according to their similarity. Then we discuss kernels and their parameter selection. Our survey provides a basic understanding of encoding approaches and proper SVM kernel selection for splice site prediction.
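
One common reading of "sparse" nucleotide encoding is a one-hot scheme in which each base becomes a four-element vector containing a single 1. The sketch below assumes that reading. The abstract does not define the scheme, and the function name `one_hot_encode`, the A/C/G/T ordering, and the all-zero treatment of unknown bases are illustrative assumptions rather than details from the survey.

```python
# Assumed ordering of bases; any fixed ordering would work equally well.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(sequence: str) -> list:
    """Flatten a DNA string into a 4*len(sequence) vector of 0s and 1s.

    Unknown symbols (for example "N") are mapped to all zeros, which is an
    assumption made here for illustration.
    """
    vector = []
    for base in sequence.upper():
        vector.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return vector


if __name__ == "__main__":
    print(one_hot_encode("GT"))
```

A vector of this form could be passed directly to an SVM as a fixed-length feature vector, provided every candidate sequence window has the same length.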