• Title/Summary/Keyword: Large-Size Data Processing

Search results: 246 (processing time: 0.022 seconds)

Quantitative Analysis of Spatial Resolution for the Influence of the Focus Size and Digital Image Post-Processing on the Computed Radiography (CR(Computed Radiography)에서 초점 크기와 디지털영상후처리에 따른 공간분해능의 정량적 분석)

  • Seoung, Youl-Hun
    • Journal of Digital Convergence, v.12 no.11, pp.407-414, 2014
  • The aim of the present study was to carry out a quantitative analysis of spatial resolution under the influence of focus size and digital image post-processing in Computed Radiography (CR). The modulation transfer function (MTF), measured with an edge method, was used to evaluate spatial resolution. Two X-ray tube focus sizes were used: a small focus (0.6 mm) and a large focus (1.2 mm). We evaluated the spatial frequencies at 50% and 10% MTF under edge enhancement and contrast enhancement applied with multi-scale image contrast amplification (MUSICA) as digital image post-processing. As a result, edge enhancement yielded significantly higher spatial resolution at 50% MTF than contrast enhancement for both focus sizes. The spatial resolution of images acquired with the large focus was also improved by digital image post-processing. In conclusion, these results should serve as basic data for obtaining high-resolution clinical images, such as skeletal and chest images, on CR systems.
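
As an illustration of the edge-based MTF measurement described above, the following Python sketch differentiates a sampled edge spread function (ESF) into a line spread function (LSF), takes the Fourier magnitude as the MTF, and reads off the spatial frequencies at which the MTF drops to 50% and 10%. This is a minimal sketch, not the authors' code: the function names, the windowing step, and the synthetic edge are illustrative assumptions.

```python
import numpy as np

def mtf_from_edge(esf, pixel_pitch_mm):
    """Estimate the MTF from a 1-D edge spread function (ESF)."""
    lsf = np.gradient(esf)                    # differentiate ESF -> LSF
    lsf = lsf * np.hanning(lsf.size)          # window to suppress noisy tails
    mtf = np.abs(np.fft.rfft(lsf))
    mtf /= mtf[0]                             # normalize to 1 at zero frequency
    freqs = np.fft.rfftfreq(lsf.size, d=pixel_pitch_mm)   # cycles/mm
    return freqs, mtf

def freq_at(freqs, mtf, level):
    """First spatial frequency where the MTF falls to `level` (e.g. 0.5, 0.1)."""
    i = np.where(mtf <= level)[0][0]
    f0, f1, m0, m1 = freqs[i - 1], freqs[i], mtf[i - 1], mtf[i]
    return f0 + (level - m0) * (f1 - f0) / (m1 - m0)      # linear interpolation

# Synthetic blurred edge sampled at a 0.1 mm pixel pitch.
pitch = 0.1
x = (np.arange(512) - 256) * pitch
esf = 1.0 / (1.0 + np.exp(-x / 0.3))
freqs, mtf = mtf_from_edge(esf, pixel_pitch_mm=pitch)
print("MTF50 ~ %.2f cycles/mm" % freq_at(freqs, mtf, 0.5))
print("MTF10 ~ %.2f cycles/mm" % freq_at(freqs, mtf, 0.1))
```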

Cross-Validation Probabilistic Neural Network Based Face Identification

  • Lotfi, Abdelhadi;Benyettou, Abdelkader
    • Journal of Information Processing Systems, v.14 no.5, pp.1075-1086, 2018
  • In this paper, a cross-validation algorithm for training probabilistic neural networks (PNNs) is presented for application to automatic face identification. Standard PNNs perform well for small and medium-sized databases, but they suffer from serious problems when used with large databases like those encountered in biometric applications. To address this issue, we propose a new training algorithm for PNNs that reduces the hidden layer's size while avoiding over-fitting. The proposed training algorithm generates networks with a smaller hidden layer containing only representative examples from the training data set. Moreover, adding new classes or samples after training does not require retraining, which is one of the main characteristics of this solution. Results presented in this work show a great improvement in both the processing speed and the generalization of the proposed classifier, mainly due to the significant reduction in the size of the hidden layer.
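
The idea of a hidden layer that stores only representative training examples can be sketched as follows. This is not the authors' cross-validation algorithm: as a stand-in for their selection criterion, the sketch uses a simple incremental rule (store an example only when the current network misclassifies it), which likewise shrinks the pattern layer and lets new samples be added without retraining. All names and the `sigma` value are assumptions.

```python
import numpy as np

def pnn_predict(centers, labels, x, sigma=0.5):
    """Classify x with a Gaussian pattern layer over the stored centers."""
    if not centers:
        return None
    d2 = np.sum((np.asarray(centers) - x) ** 2, axis=1)
    act = np.exp(-d2 / (2.0 * sigma ** 2))             # pattern-layer activations
    lab = np.asarray(labels)
    classes = np.unique(lab)
    scores = [act[lab == c].mean() for c in classes]   # summation layer
    return classes[int(np.argmax(scores))]

def train_condensed_pnn(X, y, sigma=0.5):
    """Store an example only if the current network misclassifies it."""
    centers, labels = [], []
    for xi, yi in zip(X, y):
        if pnn_predict(centers, labels, xi, sigma) != yi:
            centers.append(xi)
            labels.append(yi)
    return centers, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y = [0] * 200 + [1] * 200
centers, labels = train_condensed_pnn(X, y)
print(f"hidden layer reduced from {len(X)} to {len(centers)} neurons")
```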

The File Splitting Distribution Scheme Using the P2P Networks with The Mesh topology (그물망 위상의 P2P 네트워크를 활용한 파일 분리 분산 방안)

  • Lee Myoung-Hoon;Park Jung-Su;Kim Jin-Hong;Jo In-June
    • Journal of the Korea Institute of Information and Communication Engineering, v.9 no.8, pp.1669-1675, 2005
  • Recently, small wireless terminals have had difficulty processing large files, as the trend is toward smaller terminals and larger files. Moreover, web servers and file servers suffer from overload because large numbers of files are concentrated on them. In addition, processing data at the granularity of whole, independent files creates a security vulnerability. To resolve these problems, this paper proposes a new file splitting and distribution scheme using a P2P network with a mesh topology. The proposed scheme distributes the blocks of a file across the peers of the P2P network. As a result, small wireless terminals can process large files, the overload on web and file servers is relieved because files are decentralized, and the security vulnerability of data processing is mitigated because processing is distributed across peers at the block level.
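
A minimal sketch of the block-level splitting and distribution the abstract describes, assuming fixed-size blocks and simple round-robin placement across peers (the paper's placement over a mesh P2P topology may differ); the function and peer names are illustrative.

```python
import hashlib

def split_file(data: bytes, block_size: int):
    """Split a byte stream into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def distribute(blocks, peers):
    """Assign each block to a peer (round-robin placement for this sketch)."""
    placement = {}
    for idx, block in enumerate(blocks):
        placement.setdefault(peers[idx % len(peers)], []).append((idx, block))
    return placement

def reassemble(placement):
    """Collect the blocks back from all peers and restore the original order."""
    indexed = [pair for blocks in placement.values() for pair in blocks]
    return b"".join(block for _, block in sorted(indexed))

data = b"example payload " * 100
blocks = split_file(data, block_size=64)
placement = distribute(blocks, peers=["peerA", "peerB", "peerC"])
assert hashlib.sha256(reassemble(placement)).digest() == hashlib.sha256(data).digest()
print(f"{len(blocks)} blocks spread over {len(placement)} peers; integrity OK")
```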

An Iterative Algorithm for the Bottom Up Computation of the Data Cube using MapReduce (맵리듀스를 이용한 데이터 큐브의 상향식 계산을 위한 반복적 알고리즘)

  • Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture, v.9 no.4, pp.455-464, 2012
  • Due to the recent data explosion, methods that can meet the requirements of large-scale data analysis have been actively studied. This paper proposes the MRIterativeBUC algorithm, which enables efficient computation of large data cubes by distributed parallel processing on the MapReduce framework. MRIterativeBUC performs the bottom-up computation (BUC) method iteratively on MapReduce and overcomes the storage-size and processing-capacity limitations of large data cube computation. It adopts the iceberg-cube idea of computing only the portions of interest to analysts, and it parallelizes the cube computation through partitioning and sorting. Thus, it reduces the data emitted between stages, which lowers network overhead, the processing load on each node, and ultimately the overall cube computation cost. The bottom-up, iterative cube computation on MapReduce proposed in this paper can be extended in various ways and applied to many applications.
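
For intuition, the following single-process Python sketch performs the iceberg-style aggregation that one BUC round computes: it groups rows by every combination of dimension values and keeps only cells whose support meets a threshold. The real MRIterativeBUC partitions this work across MapReduce mappers and reducers and iterates bottom-up over the cube lattice; the dataset, dimension names, and threshold here are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def cube_round(rows, dims, measure, min_support=2):
    """Group rows by every subset of `dims`; keep only iceberg cells whose
    support reaches `min_support` (reduces the data emitted downstream)."""
    cells = defaultdict(lambda: [0, 0])           # cell key -> [count, sum]
    for row in rows:                              # "map": emit (cell key, measure)
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple((d, row[d]) for d in subset)
                cells[key][0] += 1
                cells[key][1] += row[measure]
    # "reduce" with iceberg pruning: drop cells below the support threshold
    return {k: v for k, v in cells.items() if v[0] >= min_support}

rows = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 7},
    {"region": "US", "product": "A", "sales": 5},
    {"region": "EU", "product": "A", "sales": 3},
]
for cell, (count, total) in sorted(cube_round(rows, ("region", "product"), "sales").items()):
    print(cell or "(all)", "count:", count, "sum:", total)
```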

Combining Local and Global Features to Reduce 2-Hop Label Size of Directed Acyclic Graphs

  • Ahn, Jinhyun;Im, Dong-Hyuk
    • Journal of Information Processing Systems, v.16 no.1, pp.201-209, 2020
  • The graph data structure is popular because it can intuitively represent real-world knowledge. Graph databases have attracted attention in academia and industry because they can be used to maintain graph data and allow users to mine knowledge. Mining reachability relationships between two nodes in a graph, termed reachability query processing, is an important functionality of graph databases. Online traversals, such as breadth-first and depth-first search, are inefficient for processing reachability queries over large-scale graphs. Labeling schemes have been proposed to overcome this disadvantage. The state of the art is the 2-hop labeling scheme: each node has in and out labels containing reachable node IDs as integers. Unfortunately, existing 2-hop labeling schemes generate huge 2-hop labels because they only consider local features, such as degrees. In this paper, we propose a more efficient 2-hop label size reduction approach. We consider the topological sort index, which is a global feature. A linear combination is suggested for utilizing both local and global features. We conduct experiments over real-world and synthetic directed acyclic graph datasets and show that the proposed approach generates smaller labels than existing approaches.
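
The following sketch shows pruned 2-hop labeling over a small DAG in which hop nodes are ordered by a linear combination of a local feature (degree) and a global feature (topological sort index), in the spirit of the approach above. The mixing weight `alpha`, the scoring function, and the toy graph are assumptions, not the paper's exact formulation.

```python
from collections import defaultdict, deque

def reachable(u, v, L_out, L_in):
    """2-hop query: u reaches v iff their labels share a hop node."""
    return bool(L_out[u] & L_in[v])

def build_labels(nodes, succ, pred, score):
    """Pruned 2-hop labeling; hop nodes with higher `score` are processed first."""
    L_out = {v: {v} for v in nodes}          # every node trivially reaches itself
    L_in = {v: {v} for v in nodes}
    for h in sorted(nodes, key=score, reverse=True):
        for adj, label, fwd in ((succ, L_in, True), (pred, L_out, False)):
            q, seen = deque([h]), {h}
            while q:                          # BFS from the hop node h
                u = q.popleft()
                for w in adj[u]:
                    if w in seen:
                        continue
                    seen.add(w)
                    # prune if existing labels already answer this reachability
                    covered = reachable(h, w, L_out, L_in) if fwd else reachable(w, h, L_out, L_in)
                    if covered:
                        continue
                    label[w].add(h)
                    q.append(w)
    return L_out, L_in

# Toy DAG: 0 -> 1 -> 3 -> 4, 0 -> 2 -> 3
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
nodes = list(range(5))
succ, pred = defaultdict(set), defaultdict(set)
for a, b in edges:
    succ[a].add(b)
    pred[b].add(a)
degree = {v: len(succ[v]) + len(pred[v]) for v in nodes}    # local feature
topo_rank = {v: i for i, v in enumerate(nodes)}             # global feature (already topo-sorted)
alpha = 0.5                                                 # illustrative mixing weight
score = lambda v: alpha * degree[v] + (1 - alpha) * (len(nodes) - topo_rank[v])
L_out, L_in = build_labels(nodes, succ, pred, score)
print(reachable(0, 4, L_out, L_in), reachable(4, 0, L_out, L_in))  # True False
print("total label size:", sum(len(s) for s in list(L_out.values()) + list(L_in.values())))
```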

A Study on Selecting Bitmap Join Index to Speed up Complex Queries in Relational Data Warehouses (관계형 데이터 웨어하우스의 복잡한 질의의 처리 효율 향상을 위한 비트맵 조인 인덱스 선택에 관한 연구)

  • An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions: Part D, v.19D no.1, pp.1-14, 2012
  • Because data warehouses are large, the selection of indices strongly affects the efficiency of query processing in the warehouse. Indices lower query-processing cost, but they occupy large storage areas and incur maintenance costs whenever the database is updated. Bitmap join indices are well suited to optimizing star-join queries, which join a fact table with many dimension tables and apply selections on the dimension tables. Although the binary representation of bitmap join indices keeps storage cost low, selecting the indexing attributes from the huge set of generated candidates is difficult. Index selection proceeds in two steps: first reduce the number of candidate attributes to be indexed, then select the indexing attributes. In this paper, we address the bitmap join index selection problem by reducing the number of candidate attributes with data-mining techniques. Whereas existing techniques prune candidates by attribute frequency alone, we consider the frequencies of attributes together with the sizes of the dimension tables, the sizes of their tuples, and the disk page size. We use frequent-itemset mining to prune the large number of candidate attributes. We then construct the bitmap join indices with the lowest cost and the smallest storage area that fit the storage constraints, using cost functions applied to the candidate attributes' bitmap join indices. Finally, we compare our technique with existing techniques and analyze the results to evaluate its efficiency.
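
A rough sketch of the two-step selection process: prune candidate attributes by frequency over a query workload, then greedily pick indices under a storage budget using a cost estimate. The workload, the catalog statistics, and both the storage and benefit formulas below are hypothetical stand-ins for the paper's cost functions.

```python
# Hypothetical workload: each query is the set of dimension attributes it filters on.
workload = [
    {"customer.city", "time.month"},
    {"customer.city"},
    {"product.category", "time.month"},
    {"customer.city", "time.month"},
]
# Hypothetical catalog statistics for each candidate attribute.
stats = {
    "customer.city":    {"dim_rows": 100_000, "fact_rows": 10_000_000, "distinct": 500},
    "time.month":       {"dim_rows": 3_650,   "fact_rows": 10_000_000, "distinct": 120},
    "product.category": {"dim_rows": 50_000,  "fact_rows": 10_000_000, "distinct": 40},
}
PAGE_SIZE = 8192   # disk page size in bytes

def frequent_attributes(workload, min_support):
    """Frequency mining over the workload: keep attributes queried often enough."""
    counts = {}
    for query in workload:
        for attr in query:
            counts[attr] = counts.get(attr, 0) + 1
    return {a: c for a, c in counts.items() if c >= min_support}

def storage_bytes(attr):
    """Index size estimate: one bitmap of fact_rows bits per distinct value."""
    return stats[attr]["distinct"] * (stats[attr]["fact_rows"] // 8)

def benefit(attr, freq):
    """Crude benefit estimate: query frequency scaled by dimension pages touched."""
    return freq[attr] * stats[attr]["dim_rows"] / PAGE_SIZE

def select_indices(workload, storage_budget, min_support=2):
    freq = frequent_attributes(workload, min_support)      # step 1: prune candidates
    chosen, used = [], 0
    for attr in sorted(freq, key=lambda a: benefit(a, freq), reverse=True):
        if used + storage_bytes(attr) <= storage_budget:   # step 2: greedy fill
            chosen.append(attr)
            used += storage_bytes(attr)
    return chosen, used

chosen, used = select_indices(workload, storage_budget=300 * 2 ** 20)
print(chosen, f"~{used / 2 ** 20:.0f} MiB used")
```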

External Merge Sorting in Tajo with Variable Server Configuration (매개변수 환경설정에 따른 타조의 외부합병정렬 성능 연구)

  • Lee, Jongbaeg;Kang, Woon-hak;Lee, Sang-won
    • Journal of KIISE, v.43 no.7, pp.820-826, 2016
  • There is a growing requirement for big data processing that extracts valuable information from large amounts of data. The Hadoop system employs the MapReduce framework to process big data. However, MapReduce has limitations such as inflexibility and slow data processing. To overcome these drawbacks, SQL query processing techniques known as SQL-on-Hadoop were developed. Apache Tajo, one of the SQL-on-Hadoop systems, was developed by a Korean development group. External merge sort is one of the most heavily used algorithms in Tajo for query processing. The performance of external merge sort in Tajo is influenced by two parameters: sort buffer size and fanout. In this paper, we analyze the performance of external merge sort in Tajo with various sort buffer sizes and fanouts. In addition, we identify two major causes of the performance differences: CPU cache misses, which increase as the sort buffer size grows, and the number of merge passes, which is determined by the fanout.
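
For reference, here is a compact external merge sort showing how the two parameters interact: `buffer_size` bounds the size of each initial run (counted in items here, rather than bytes as in Tajo), and `fanout` bounds how many runs each merge step consumes, which determines the number of merge passes.

```python
import heapq
import os
import random
import tempfile

def external_merge_sort(values, buffer_size, fanout):
    """External merge sort: bounded in-memory runs, then fanout-way merge passes."""
    tmp = tempfile.mkdtemp()
    runs, chunk = [], []

    def flush(chunk):
        path = os.path.join(tmp, f"run{len(runs)}.txt")
        with open(path, "w") as f:
            f.writelines(f"{v}\n" for v in sorted(chunk))   # sort one run in memory
        runs.append(path)

    for v in values:                    # run generation bounded by the sort buffer
        chunk.append(v)
        if len(chunk) == buffer_size:
            flush(chunk)
            chunk = []
    if chunk:
        flush(chunk)

    def merge(paths, out_path):         # k-way merge of at most `fanout` runs
        files = [open(p) for p in paths]
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*files, key=int))
        for f in files:
            f.close()

    passes = 0
    while len(runs) > 1:                # repeat until one sorted run remains
        passes += 1
        next_runs = []
        for i in range(0, len(runs), fanout):
            out_path = os.path.join(tmp, f"pass{passes}_{i}.txt")
            merge(runs[i:i + fanout], out_path)
            next_runs.append(out_path)
        runs = next_runs
    print("merge passes:", passes)
    return runs[0]

data = [random.randrange(10_000) for _ in range(1_000)]
with open(external_merge_sort(data, buffer_size=100, fanout=4)) as f:
    assert [int(line) for line in f] == sorted(data)
```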

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems, v.20 no.2, pp.215-225, 2024
  • With the rapid development of Internet of Things (IoT) and big data technology, a large amount of data is generated during the operation of related industries. How to classify the generated data accurately has become the core question for research on data mining and processing in the IoT industry chain. This study constructs a classification model of the IoT industry chain based on an improved random forest algorithm (IRFA) and text analysis, aiming to classify IoT industry chain big data efficiently and accurately. The accuracy, precision, recall, and AUC of the traditional random forest algorithm and the proposed algorithm are compared on different datasets. The experimental results show that the proposed model performs better across datasets: its accuracy and recall exceed the traditional algorithm on four datasets, its accuracy on two datasets, P-I Diabetes and Loan Default, is better than the random forest model, and its final classification results are better. With this model, the massive data generated in the IoT industry chain can be classified accurately, providing more research value for data mining and processing technology in the IoT industry chain.
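
The paper's improved algorithm is not reproduced here; the following sketch only mirrors the evaluation protocol (accuracy, precision, recall, AUC on held-out data), with a more heavily tuned random forest standing in for the improved variant and synthetic data standing in for the paper's datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    # Baseline random forest with default-ish settings.
    "baseline RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for an "improved" variant: more trees, regularized leaves.
    "tuned RF": RandomForestClassifier(n_estimators=400, max_features="sqrt",
                                       min_samples_leaf=2, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"prec={precision_score(y_te, pred):.3f} "
          f"rec={recall_score(y_te, pred):.3f} "
          f"auc={roc_auc_score(y_te, proba):.3f}")
```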

A Study of File Replacement Policy in Data Grid Environments (데이터 그리드 환경에서 파일 교체 정책 연구)

  • Park, Hong-Jin
    • The KIPS Transactions: Part A, v.13A no.6 s.103, pp.511-516, 2006
  • Data grid computing provides geographically distributed storage resources for solving computational problems with large-scale data. Unlike cache replacement in virtual memory or web caching, finding an optimal file replacement policy for data grids is an important problem in its own right because the files are very large. Traditional file replacement policies such as LRU (Least Recently Used), LCB-K (Least Cost Beneficial based on K), EBR (Economic-based cache replacement), and LVCT (Least Value-based on Caching Time) have the drawback that they must predict future requests or need additional resources to perform replacement. To solve these problems, this paper proposes SBR-k (Size-based replacement-k), which replaces files based on file size. The simulation results show that the proposed policy performs better than the traditional policies.
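
A sketch of the size-based core of such a policy: evict the largest resident files until the incoming file fits. The abstract does not spell out the role of the parameter k in SBR-k, so this sketch deliberately omits it; the class and method names are illustrative.

```python
class SizeBasedCache:
    """Sketch of size-based file replacement: evict the largest resident files
    until the incoming file fits. (The role of k in SBR-k is not detailed in
    the abstract, so only the size-based core is shown here.)"""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}                       # file name -> size
        self.used = 0

    def admit(self, name, size):
        if size > self.capacity:
            return False                      # the file can never fit
        while self.used + size > self.capacity:
            victim = max(self.files, key=self.files.get)   # largest file first
            self.used -= self.files.pop(victim)
        self.files[name] = size
        self.used += size
        return True

cache = SizeBasedCache(capacity=100)
for name, size in [("a", 40), ("b", 50), ("c", 30), ("d", 20)]:
    cache.admit(name, size)
print(cache.files)    # the largest file ("b") was evicted to make room
```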

Efficient Data Scheduling considering number of Spatial query of Client in Wireless Broadcast Environments (무선방송환경에서 클라이언트의 공간질의 수를 고려한 효율적인 데이터 스케줄링)

  • Song, Doohee;Park, Kwangjin
    • Journal of Internet Computing and Services, v.15 no.2, pp.33-39, 2014
  • Spatial data transfer from server to client in a wireless broadcast environment works as follows: the server arranges the data that clients want and broadcasts it as a one-dimensional array over each broadcast cycle. Clients listen to the data transferred by the server and return only the resulting values to the server. Recently, the number of users of location-based services has been increasing along with the number of objects, and data volumes have grown large. A large volume of data in a wireless broadcast environment may increase clients' query time. Therefore, we propose Client-based Data Scheduling (CDS) for efficient data scheduling in wireless broadcast environments. CDS divides the map and then calculates the total size of the objects in each grid cell, considering the number of objects and the data size within the divided cells. It then carries out data scheduling by applying a hot-cold method that considers the total object data size of each cell and the number of clients. Experiments show that CDS reduces the average query processing time for clients compared to the existing method.
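
A toy sketch of grid-based hot-cold scheduling in the spirit of CDS: compute the total object data size per grid cell, weight it by client interest, and repeat "hot" cells more often within a broadcast cycle. The demand formula, the mean-based hot/cold split, and the repetition factor are all assumptions.

```python
from collections import Counter

# Hypothetical objects as (cell_id, data_size); the map is already divided into cells.
objects = [(0, 10), (0, 30), (1, 5), (2, 50), (2, 20), (3, 5)]
clients_per_cell = {0: 8, 1: 1, 2: 5, 3: 1}       # assumed client interest per cell

# Total object data size per grid cell, as CDS computes after dividing the map.
size_per_cell = Counter()
for cell, size in objects:
    size_per_cell[cell] += size

# Hot-cold split: cells whose demand (data size x clients) is above the mean are hot.
demand = {c: size_per_cell[c] * clients_per_cell.get(c, 0) for c in size_per_cell}
mean_demand = sum(demand.values()) / len(demand)
hot = [c for c in demand if demand[c] >= mean_demand]
cold = [c for c in demand if demand[c] < mean_demand]

# One broadcast cycle: hot cells are broadcast twice, cold cells once.
cycle = hot + cold + hot
print("hot:", hot, "cold:", cold)
print("broadcast cycle:", cycle)
```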