• Title/Summary/Keyword: search algorithm

Search Results: 3,903

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • The k-nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. Locality Sensitive Hashing (LSH), which is efficient for high-dimensional and sparse data, is therefore increasingly combined with MapReduce, a (key, value)-based distributed framework. Following this two-stage strategy, we use LSH to divide users into small subsets and then compute the similarity between pairs within each subset by brute force on MapReduce. The candidate-group generation stage is critical, since brute-force calculation is performed in the following step; however, existing methods do not prevent excessively large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
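For concreteness, a minimal single-machine sketch of the two-stage strategy is shown below, assuming random-hyperplane LSH: points are hashed into candidate groups, and brute-force k-NN runs inside each group. The paper's contribution, regrouping oversized candidate groups, and the MapReduce distribution itself are not reproduced; all parameters are illustrative.

```python
# Two-stage approximate k-NN graph: LSH bucketing, then in-bucket brute force.
import numpy as np
from collections import defaultdict

def lsh_buckets(X, n_planes=8, seed=0):
    """Hash rows of X into buckets by their sign pattern against random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_planes))
    codes = (X @ planes > 0).astype(int)
    buckets = defaultdict(list)
    for i, code in enumerate(codes):
        buckets[tuple(code)].append(i)
    return buckets

def knn_graph(X, k=5):
    graph = {}
    for members in lsh_buckets(X).values():
        sub = X[members]
        # Brute-force pairwise distances within the candidate group only.
        d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
        for row, i in enumerate(members):
            order = np.argsort(d[row])[:min(k, len(members) - 1)]
            graph[i] = [members[j] for j in order]
    return graph

X = np.random.default_rng(1).normal(size=(1000, 32))
graph = knn_graph(X, k=5)
```

A bucket that grows too large degrades back toward $O(n^2)$ behavior, which is exactly the failure mode the paper's regrouping step targets.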

Design and Implementation of Autonomic De-fragmentation for File System Aging (파일 시스템 노화를 해소하기 위한 자동적인 단편화 해결 시스템의 설계와 구현)

  • Lee, Jun-Seok;Park, Hyun-Chan;Yoo, Chuck
    • The KIPS Transactions:PartA / v.16A no.2 / pp.101-112 / 2009
  • Existing defragmentation techniques for file systems, such as disk defragmentation programs, require intensive disk operations over certain periods at specific times. In this paper, to solve this problem, we design and implement an automatic and continuous defragmentation system that distributes the disk operations over time. We propose the Automatic Layout Scoring (ALS) mechanism for measuring the degree of fragmentation and a Lazy Copy mechanism that copies fragmented data at idle time to scatter the disk operations. The system finds fragmented files with the ALS mechanism and then searches for empty space for each such file. After lazily copying a found file to empty space, taking care that the file is not lost, the algorithm resolves the fragmentation by updating the file's i-node. We implemented these algorithms in Linux and evaluated them on small, fragmented files. The system outperforms the Linux EXT2 file system by 2.4%~10.4% in the layout-scoring evaluation, and for various file sizes its write performance is better than EXT2 by 1%~8.5% and its read performance by 1.2%~7.5%. The system thus resolves fragmentation automatically, without disturbing I/O tasks or requiring manual management.
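As a hypothetical illustration of what a layout score in the spirit of ALS might compute, the sketch below scores a file by the fraction of its logical block transitions that are physically contiguous on disk; the paper's actual metric may differ.

```python
# Hypothetical layout score: 1.0 means fully contiguous, 0.0 fully scattered.
def layout_score(blocks):
    """blocks: physical block numbers of a file, in logical order."""
    if len(blocks) < 2:
        return 1.0  # a one-block file cannot be fragmented
    contiguous = sum(1 for a, b in zip(blocks, blocks[1:]) if b == a + 1)
    return contiguous / (len(blocks) - 1)

print(layout_score([10, 11, 12, 40, 41]))  # 0.75: one break in contiguity
```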

Development and Validation of the GPU-based 3D Dynamic Analysis Code for Simulating Rock Fracturing Subjected to Impact Loading (충격 하중 시 암석의 파괴거동해석을 위한 GPGPU 기반 3차원 동적해석기법의 개발과 검증 연구)

  • Min, Gyeong-Jo;Fukuda, Daisuke;Oh, Se-Wook;Cho, Sang-Ho
    • Explosives and Blasting / v.39 no.2 / pp.1-14 / 2021
  • Recently, with the development of high-performance processing devices such as GPGPUs, three-dimensional dynamic analysis techniques that can replace expensive rock-material impact tests have been actively developed in the defense and aerospace fields. Experimentally observing or measuring the fracture processes in rocks subjected to high impact loads, such as blasting and earth penetration by small-diameter missiles, is difficult due to the inhomogeneity and opacity of rock materials. In this study, a three-dimensional dynamic fracture process analysis technique (3D-DFPA) was developed to simulate the fracture behavior of rock under impact. To improve computation speed, an algorithm supporting GPGPU operation was developed for the explicit analysis and the contact-element search. To verify the proposed technique, dynamic fracture toughness tests on Straight Notched Disk Bending (SNDB) limestone samples were simulated, and the propagation, reflection, and transmission of stress waves at the rock-impact bar interfaces and the fracture process of the rock samples were compared with observations. The dynamic load tests on the SNDB samples used a pulse-shape-controlled Split Hopkinson Pressure Bar (PS-SHPB) that can control the waveform of the incident stress wave; the resulting stress states and fracture processes of the rock models were analyzed against the experimental results.
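The explicit analysis the abstract refers to boils down to a per-node update that parallelizes naturally on a GPGPU, since each node can be advanced independently once forces are assembled. The sketch below shows that update in schematic NumPy form; the paper's actual element formulation, damping, and contact-force terms are not reproduced.

```python
# Schematic explicit time integration: one step of the per-node update that
# GPGPU dynamic codes execute in parallel (one thread per node).
import numpy as np

def explicit_step(u, v, f_ext, f_int, mass, dt):
    """Advance nodal displacements u and velocities v by one time step dt."""
    a = (f_ext - f_int) / mass  # lumped-mass acceleration, node by node
    v_new = v + dt * a
    u_new = u + dt * v_new
    return u_new, v_new

n = 4  # toy node count; real models have millions of nodes
u, v = np.zeros((n, 3)), np.zeros((n, 3))
u, v = explicit_step(u, v, f_ext=np.ones((n, 3)), f_int=np.zeros((n, 3)),
                     mass=np.ones((n, 1)), dt=1e-6)
```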

Sleep Deprivation Attack Detection Based on Clustering in Wireless Sensor Network (무선 센서 네트워크에서 클러스터링 기반 Sleep Deprivation Attack 탐지 모델)

  • Kim, Suk-young;Moon, Jong-sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.1 / pp.83-97 / 2021
  • Wireless sensors that make up a wireless sensor network generally have extremely limited power and resources, so each sensor enters a sleep state at certain intervals to conserve power. A sleep deprivation attack is a deadly attack that drains power by preventing wireless sensors from entering the sleep state, and no clear countermeasure exists. In this paper, a sleep deprivation attack detection model based on a clustering-based binary search tree structure is proposed. The model exploits a feature that distinguishes attack sensor nodes from normal sensor nodes; candidate features were evaluated with Long Short-Term Memory, Decision Tree, Support Vector Machine, and K-Nearest Neighbor models. The selected feature is used in the proposed algorithm to calculate attack-detection values, and the threshold for judging those values is learned by applying an SVM. In experiments, the proposed model showed a detection rate of 94% when 35% of all sensor nodes were attack nodes, with an improvement of up to 26% in power retention.
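The final SVM step reduces, for a single feature, to learning a scalar threshold. A minimal sketch with scikit-learn follows; the feature values are synthetic, and the paper's clustering-based binary search tree is not reproduced.

```python
# Learn the attack/normal threshold on one node feature with a linear SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.2, scale=0.05, size=(100, 1))  # e.g., awake-time ratio
attack = rng.normal(loc=0.6, scale=0.05, size=(35, 1))   # attack nodes stay awake more
X = np.vstack([normal, attack])
y = np.array([0] * 100 + [1] * 35)

clf = SVC(kernel="linear").fit(X, y)
# For a 1-D linear SVM the decision boundary reduces to a single threshold.
threshold = -clf.intercept_[0] / clf.coef_[0, 0]
print(f"flag node as attacker if feature > {threshold:.3f}")
```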

Analysis of the Precedence of Stock Price Variables Using Cultural Content Big Data (문화콘텐츠 빅데이터를 이용한 주가 변수 선행성 분석)

  • Ryu, Jae Pil;Lee, Ji Young;Jeong, Jeong Young
    • The Journal of the Korea Contents Association / v.22 no.4 / pp.222-230 / 2022
  • Recently, Korea's cultural content industry has been developing, and behind its growing worldwide recognition lies the real-time sharing of content by global network users, enabled by advances in science and technology. YouTube in particular propagates content quickly and powerfully, since anyone, not just a limited set of users, can become a potential video provider. With more than 80% of mobile phone users in Korea using YouTube, YouTube data reflects the psychological factors of its users: information such as a channel's video views, likes, and comments measures interest in that channel's personalities. This parallels the finding that the frequency of keyword searches on portal sites is closely related, economically and psychologically, to the stock market. Therefore, in this study, YouTube information for a representative entertainment company is collected with a crawling algorithm and analyzed for causal relationships with major stock-price-related variables. This study is meaningful in that it combines the cultural content, IT, and financial fields, in keeping with the Fourth Industrial Revolution.
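The abstract does not name its causality test, but precedence (lead-lag) analysis of this kind is commonly done with a Granger-causality test; the sketch below, on synthetic series, shows that form of the analysis with statsmodels.

```python
# Does a YouTube metric help predict a stock-price series? (Granger causality)
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
views = rng.normal(size=200).cumsum()              # daily video-view index
price = np.roll(views, 3) + rng.normal(size=200)   # price loosely lags views

# Column order matters: the test asks whether lags of the 2nd column
# improve prediction of the 1st column.
data = np.column_stack([price, views])
results = grangercausalitytests(data, maxlag=5)
```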

Lesion Detection in Chest X-ray Images based on Coreset of Patch Features (패치 특징 코어세트 기반의 흉부 X-Ray 영상에서의 병변 유무 감지)

  • Kim, Hyun-bin;Chun, Jun-Chul
    • Journal of Internet Computing and Services / v.23 no.3 / pp.35-45 / 2022
  • Even in recent years, treatment of emergency patients is often delayed by the shortage of medical resources in marginalized areas. Research on automating the analysis of medical data, to address inaccessible medical services and the shortage of medical personnel, is ongoing. Computer-vision-based automation of medical inspection requires substantial cost for collecting and labeling training data, a problem that stands out when classifying rare lesions or pathological features and pathogeneses that are difficult to define visually. Anomaly detection is attracting attention as a method that can significantly reduce data collection costs by adopting an unsupervised learning strategy. In this paper, building on existing anomaly detection techniques, we propose a method for detecting abnormal chest X-ray images as follows: (1) normalize the brightness range of medical images resampled to an optimal resolution; (2) select a subset of highly representative feature vectors from the set of patch features extracted at an intermediate level from lesion-free images; (3) score each test patch by its difference from the nearest of the selected lesion-free feature vectors, found with a nearest-neighbor search algorithm. The proposed system performs anomaly classification and localization simultaneously for each image. We measure the anomaly detection performance of the proposed system on chest X-ray images in PA projection under detailed conditions and demonstrate its effect for medical images with a classification AUROC of 0.705 on a random subset of the PadChest dataset. The proposed system can be used to improve the clinical diagnosis workflow of medical institutions and can effectively support early diagnosis in medically underserved areas.
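Steps (2) and (3) condense to a coreset selection over normal patch features followed by a nearest-neighbor distance score. A sketch under those assumptions, with synthetic stand-ins for the intermediate-level features, follows.

```python
# Coreset-based anomaly scoring: k-center greedy selection, then 1-NN distance.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def greedy_coreset(feats, m):
    """Pick m features, each farthest from those already chosen (k-center greedy)."""
    chosen = [0]
    d = np.linalg.norm(feats - feats[0], axis=1)
    for _ in range(m - 1):
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(feats - feats[nxt], axis=1))
    return feats[chosen]

rng = np.random.default_rng(0)
normal_patches = rng.normal(size=(5000, 128))  # patch features from lesion-free images
coreset = greedy_coreset(normal_patches, m=500)

nn = NearestNeighbors(n_neighbors=1).fit(coreset)
test_patches = rng.normal(size=(196, 128))     # patches of one test image
dist, _ = nn.kneighbors(test_patches)
image_score = dist.max()     # image-level anomaly score (classification)
patch_scores = dist.ravel()  # per-patch scores support localization
```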

Scheduling of Parallel Offset Printing Process for Packaging Printing (패키징 인쇄를 위한 병렬 오프셋 인쇄 공정의 스케줄링)

  • Jaekyeong, Moon;Hyunchul, Tae
    • KOREAN JOURNAL OF PACKAGING SCIENCE & TECHNOLOGY / v.28 no.3 / pp.183-192 / 2022
  • With the growth of the packaging industry, demand for packaging printing comes in various forms: customers' orders are diversifying and quality standards are rising. Offset printing is mainly used in packaging printing because it suits large print runs, but its productivity decreases when printing varied orders, since changing colors on each printing unit takes time. Therefore, scheduling that minimizes the color-replacement time and shortens the overall makespan is required. With the existing manual method, based on workers' experience or intuition, scheduling results vary from worker to worker, and this uncertainty increases production costs. In this study, we propose an automated scheduling method for the parallel offset printing process in packaging printing. We decompose the original problem into order assignment and sequencing on the one hand and ink arrangement for printing on the other, modeling them as a vehicle routing problem and an assignment problem, respectively. Mixed-integer programming models the problem mathematically but requires excessive computation time as the problem grows, so a guided local search algorithm is used to solve it. Through experiments on actual data, we review the method's applicability and its role in the field.
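The order-sequencing part maps to a VRP in which presses play the role of vehicles and arc costs stand in for color-replacement times. As one possible realization, the sketch below uses Google OR-Tools with guided local search; the abstract names the problem classes and the metaheuristic, not this particular library, and the data are toy values.

```python
# Sequence 4 orders on 2 presses, minimizing total color-change time (VRP + GLS).
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

change_time = [  # change_time[i][j]: setup time when order j follows order i
    [0, 4, 7, 3],
    [4, 0, 5, 6],
    [7, 5, 0, 2],
    [3, 6, 2, 0],
]
num_presses, depot = 2, 0

manager = pywrapcp.RoutingIndexManager(len(change_time), num_presses, depot)
routing = pywrapcp.RoutingModel(manager)

def cost(i, j):
    return change_time[manager.IndexToNode(i)][manager.IndexToNode(j)]

routing.SetArcCostEvaluatorOfAllVehicles(routing.RegisterTransitCallback(cost))

params = pywrapcp.DefaultRoutingSearchParameters()
params.local_search_metaheuristic = (
    routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH)
params.time_limit.FromSeconds(5)

solution = routing.SolveWithParameters(params)
```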

Water leakage accident analysis of water supply networks using big data analysis technique (R기반 빅데이터 분석기법을 활용한 상수도시스템 누수사고 분석)

  • Hong, Sung-Jin;Yoo, Do-Guen
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1261-1270 / 2022
  • The purpose of this study is to collect and analyze water-leak-related information that is otherwise hard to access, using the news search results that anyone can reach. We apply a web crawling technique to extract big news data on water leakage accidents in water supply systems and present a procedural algorithm for retrieving accurate leak-accident news. In addition, a data analysis technique suited to leak-accident information was developed so that details such as the date and time of occurrence, cause, location, damaged facilities, and damage effects can be extracted. The primary goal of the big-data-based leak analysis proposed here is to extract meaningful values through comparison with existing waterworks statistics. The proposed method can also be used to respond effectively to consumers or to assess the service level of water supply networks. In other words, the analysis results suggest the need to give the public more information about such accidents, and they can be used to prepare a dissemination-and-response system that reacts quickly when an accident occurs.
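The study implements its crawler in R; purely for illustration, a rough Python equivalent of the keyword-search step is sketched below. The endpoint, query parameters, and CSS selector are hypothetical placeholders, not the portal actually used.

```python
# Hypothetical keyword-search crawl for leak-accident headlines.
import requests
from bs4 import BeautifulSoup

def crawl_leak_news(keyword="상수도 누수", page=1):
    url = "https://news.example.com/search"  # placeholder endpoint
    resp = requests.get(url, params={"q": keyword, "page": page}, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    # The selector is site-specific; adjust it to the target portal's markup.
    return [a.get_text(strip=True) for a in soup.select("a.news-title")]
```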

A Study on Innovation Plan of Archives' Recording Service using Social Media: Focused on Gyeongnam Archives and Seoul Metropolitan Archives (소셜미디어를 이용한 기록관리기관의 기록서비스 혁신 방안 연구: 경남기록원과 서울기록원을 중심으로)

  • Kim, Ye-ji;Kim, Ik-han
    • Journal of Korean Society of Archives and Records Management / v.22 no.2 / pp.1-25 / 2022
  • Today, most archives provide recording services through social media, but their effectiveness is very low. This study analyzes the causes of the insufficient social media recording services of the Gyeongnam Archives and the Seoul Metropolitan Archives, both permanent records management institutions and local government archives, and designs ways to create synergy through mutual growth with classical recording services. Through literature research, the characteristics and mechanisms of each social medium were identified, and the institutions' social media operations and internal documents were reviewed to identify common problems. An in-depth analysis was conducted by interviewing the person in charge of recording services at each institution. In addition, a plan applicable to archives was derived by reviewing the social media operations of related domestic institutions and overseas archives. Based on this, a new recording-service process was established, strategic operation plans for each social medium were proposed, and a plan for mutual growth with the existing recording services was designed.

NFT(Non-Fungible Token) Patent Trend Analysis using Topic Modeling

  • Sin-Nyum Choi;Woong Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.41-48 / 2023
  • In this paper, we analyze recent trends in the NFT (Non-Fungible Token) industry, which is finding universal application across industrial fields, using topic modeling techniques. Patent data were used to capture industry trends: we collected 371 domestic and 454 international NFT-related patents registered in the patent information search service KIPRIS from 2017, when the first NFT standard was introduced, through October 2023. In preprocessing, stopwords were removed, words were lemmatized, and only nouns were retained. For the analysis, the top 50 words by frequency were listed and their TF-IDF values examined to derive the key keywords of industry trends. Next, using the LDA algorithm, we identified four major latent topics in the domestic and international patent data, analyzed them, and present our findings on NFT industry trends, underpinned by real-world industry cases. While previous reviews presented trends from an academic perspective using paper data, this study is significant in providing practical trend information based on data rooted in field practice. It should be a useful reference for NFT industry professionals in understanding market conditions and generating new items.
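A compact sketch of the described pipeline follows, assuming scikit-learn (the study does not name its toolkit): TF-IDF ranks the keywords, and LDA with four components mirrors the four latent topics found in the patent data. The documents are placeholder patent-style abstracts.

```python
# TF-IDF keyword ranking plus 4-topic LDA over noun-only patent text.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # placeholder noun-only abstracts
    "nft token ownership verification method",
    "digital artwork nft minting platform system",
    "blockchain game item nft trading server",
    "nft metadata storage distributed ledger apparatus",
]

tfidf_matrix = TfidfVectorizer().fit_transform(docs)  # TF-IDF values for keyword ranking

counts = CountVectorizer()
X = counts.fit_transform(docs)                        # LDA works on raw term counts
lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for t, comp in enumerate(lda.components_):
    top = comp.argsort()[-3:][::-1]
    print(f"topic {t}:", [terms[i] for i in top])
```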