• Title/Summary/Keyword: File Cluster

114 search results

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD / v.9D no.5 / pp.779-788 / 2002
  • There are many technically difficult issues in searching multimedia data such as images, video, and audio, because such data are massive and more complex than simple text-based data. As a method of searching multimedia data, similarity retrieval has been studied: basic features of the multimedia data are extracted automatically and the search is carried out over those features, since exact matching is not suitable for multimedia feature vectors. In this paper, data clustering and its indexing are proposed as a fast similarity-retrieval method for multimedia data. The approach clusters similar images on adjacent disk cylinders and then builds indexes to access the clusters. To minimize the search cost, hashing is adopted to index the clusters. In addition, to reduce I/O time, the proposed search takes just one I/O to look up the location of the cluster containing similar objects and one sequential file I/O to read in that cluster. The proposed scheme addresses the multi-dimensionality problem by combining clustering with indexing, and it achieves higher search efficiency than content-based image retrieval that uses only a clustering or an indexing structure.
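The abstract's central claim is that a query costs one index lookup plus one sequential read of the matching cluster. The sketch below illustrates that access pattern only; the cluster centroids, the hash-indexed cluster table, and the on-disk layout are hypothetical stand-ins, since the paper's actual feature extraction and distance metric are not given in the abstract.

```python
import pickle


def nearest_cluster(query, centroids):
    """Pick the cluster whose centroid is closest to the query feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cid: dist(query, centroids[cid]))


def search_similar(query, centroids, cluster_index, cluster_file):
    """One index lookup to locate the cluster, one sequential read to fetch it."""
    cid = nearest_cluster(query, centroids)
    offset, length = cluster_index[cid]      # hash-table lookup: cluster id -> disk location
    with open(cluster_file, "rb") as f:      # single sequential I/O over the clustered region
        f.seek(offset)
        members = pickle.loads(f.read(length))
    # rank the cluster's images by distance to the query (the paper's metric is unspecified)
    return sorted(members, key=lambda m: sum((x - y) ** 2
                                             for x, y in zip(m["features"], query)))
```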

Implementation and Performance Analysis of Single I/O Space Service for Cluster Computers (클러스터 컴퓨터를 위한 단일 I/O 공간 서비스의 구현 및 성능분석)

  • Kim, Tae-Kyu;Kim, Bang-Hyun;Kim, Jong-Hyun
    • The KIPS Transactions:PartA / v.13A no.6 s.103 / pp.517-524 / 2006
  • In cluster computers, it is essential to implement a single I/O space (SIOS) that supports an integrated I/O substructure in order to process I/O-intensive applications efficiently. The SIOS service provides a global I/O address space through which peripherals and hard disks in local or remote nodes can be accessed directly from any node in the cluster. In this thesis, we propose a method of implementing SIOS in Linux clusters using only freeware. The method is implemented at the device driver level, using the Enhanced Network Block Device (ENBD), and at the file system level, using S/W RAID and NFS. Its major strengths are ease of implementation and almost no cost, since only freeware is used. In addition, because the freeware components are open source, the method can be applied to other platforms with only slight modification. Moreover, experiments show that I/O throughput is up to 5.5 times higher for write operations and approximately 2.3 times higher for read operations than that of the CDD method, which uses a device driver developed at the kernel level.
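To make the "global I/O address space" idea concrete, here is a minimal sketch of translating a cluster-wide block number into a node-local disk address. The extent table, node names, and device paths are hypothetical; the paper builds this mapping out of ENBD, S/W RAID, and NFS rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    node: str      # cluster node that physically owns the disk
    device: str    # local block device on that node
    start: int     # first global block covered by this extent
    blocks: int    # number of blocks in the extent


# Hypothetical layout: the global I/O space is the concatenation of node-local disks.
GLOBAL_SPACE = [
    Extent("node0", "/dev/sdb", start=0,         blocks=1_000_000),
    Extent("node1", "/dev/sdb", start=1_000_000, blocks=1_000_000),
    Extent("node2", "/dev/sdb", start=2_000_000, blocks=1_000_000),
]


def resolve(global_block):
    """Translate a global block number into (node, device, local block)."""
    for ext in GLOBAL_SPACE:
        if ext.start <= global_block < ext.start + ext.blocks:
            return ext.node, ext.device, global_block - ext.start
    raise ValueError(f"block {global_block} is outside the single I/O space")


print(resolve(1_500_000))   # -> ('node1', '/dev/sdb', 500000)
```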

Comparison of Directory Structures for SAN Based Very Large File Systems (SAN 환경 대용량 파일 시스템을 위한 디렉토리 구조 비교)

  • 김신우;이용규
    • The Journal of Society for e-Business Studies / v.9 no.1 / pp.83-104 / 2004
  • Recently, information systems that require the storage and retrieval of huge amounts of data have come into wide use. Accordingly, research efforts have been made to develop Linux cluster file systems for the SAN environment, in which clients themselves can manage metadata and access data directly. A semi-flat directory structure based on extendible hashing has also been proposed to support fast retrieval of files [1]. In this research, we have designed and implemented the semi-flat extendible hash directory under Linux. In order to evaluate the practicality of the directory, we have also implemented a B+-tree based directory and compared their performance. According to the performance comparison, the extendible hash directory performs better for insert, delete, and search operations, whereas the B+-tree directory is better at sorting files.
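For readers unfamiliar with extendible hashing, the following is a toy in-memory sketch of the directory mechanics (bucket split, directory doubling); it is not the paper's on-disk semi-flat structure, and the bucket size and hash function are arbitrary choices for illustration.

```python
class ExtendibleHashDirectory:
    """Toy extendible hash directory: a full bucket splits, and the directory
    doubles only when the full bucket is already at the global depth."""

    BUCKET_SIZE = 4

    def __init__(self):
        self.global_depth = 1
        self.table = [{"depth": 1, "entries": {}}, {"depth": 1, "entries": {}}]

    def _slot(self, name):
        return hash(name) & ((1 << self.global_depth) - 1)

    def lookup(self, name):
        return self.table[self._slot(name)]["entries"].get(name)

    def insert(self, name, inode):
        bucket = self.table[self._slot(name)]
        bucket["entries"][name] = inode
        while len(bucket["entries"]) > self.BUCKET_SIZE:
            if bucket["depth"] == self.global_depth:
                self.table = self.table + self.table      # double the directory
                self.global_depth += 1
            bit = 1 << bucket["depth"]                    # distinguishing hash bit
            bucket["depth"] += 1
            sibling = {"depth": bucket["depth"], "entries": {}}
            # move entries whose extra hash bit is set into the sibling bucket
            for k in list(bucket["entries"]):
                if hash(k) & bit:
                    sibling["entries"][k] = bucket["entries"].pop(k)
            # repoint the directory slots that should now refer to the sibling
            for i, b in enumerate(self.table):
                if b is bucket and i & bit:
                    self.table[i] = sibling
            bucket = self.table[self._slot(name)]


d = ExtendibleHashDirectory()
for i in range(20):
    d.insert(f"file{i}.dat", inode=1000 + i)
print(d.lookup("file7.dat"), d.global_depth)
```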


Species composition of the catches collected by trammel net in the coastal waters off Ulleungdo of Korea (울릉도해역에서 삼중자망에 의한 어획물의 종조성)

  • CHUNG, Sangdeok;CHA, Hyung Kee;LEE, Jae Bong;LEE, Hae Won;YANG, Jae Hyeong
    • Journal of the Korean Society of Fisheries and Ocean Technology / v.51 no.4 / pp.567-575 / 2015
  • Species composition in the coastal waters off Ulleungdo, Korea, was examined based on catches collected bimonthly by trammel net in 2013. A total of 711 individuals weighing 181.9 kg were caught, comprising 52 species in 4 classes, 15 orders, and 27 families, including 44 Pisces, 4 Gastropoda, 3 Cephalopoda, and 1 Echinodermata. The dominant species by biomass were filefish (Thamnaconus modestus), Atka mackerel (Pleurogrammus azonus), and greenling (Hexagrammos otakii). Data were summarized using hierarchical cluster analysis (HCA) and detrended correspondence analysis (DCA) to examine the similarity in species composition among months, and the community structure off Ulleungdo was divided into two groups. Community structures in February, April, and December, with low temperature and well-mixed surface water, were distinguished from those in June, August, and October, with high temperature and strong stratification, which could be attributed to temporal changes in the dominant species. Atka mackerel and spear squid were mainly caught in February and April and disappeared in June, August, and October, while a filefish outburst appeared in October. Because the waters off Ulleungdo have been under low human pressure, they could serve as a good case study for elucidating the effects of climate change on community structure and the ecosystem of the East Sea. Continuous surveys and further studies are required to demonstrate the migration routes and distribution of the dominant species and long-term changes in community structure in the waters off Ulleungdo.
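As a rough illustration of the hierarchical cluster analysis step, the sketch below groups months by catch composition. The abundance matrix is invented, and the Bray-Curtis distance and average linkage are assumptions; the abstract does not state which dissimilarity measure or linkage method the authors used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

months = ["Feb", "Apr", "Jun", "Aug", "Oct", "Dec"]
# rows = months, columns = species abundances (made-up numbers for illustration)
abundance = np.array([
    [40, 55,  2, 1],    # Feb: Atka mackerel / spear squid dominate
    [35, 48,  3, 2],    # Apr
    [ 2,  1, 30, 8],    # Jun
    [ 1,  0, 35, 9],    # Aug
    [ 0,  0, 80, 5],    # Oct: filefish outburst
    [30, 20,  5, 3],    # Dec
])

dist = pdist(abundance, metric="braycurtis")     # dissimilarity between months
tree = linkage(dist, method="average")           # hierarchical clustering (UPGMA)
groups = fcluster(tree, t=2, criterion="maxclust")
for m, g in zip(months, groups):
    print(m, "-> group", g)
```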

Implementation of Data processing of the High Availability for Software Architecture of the Cloud Computing (클라우드 서비스를 위한 고가용성 대용량 데이터 처리 아키텍쳐)

  • Lee, Byoung-Yup;Park, Junho;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.13 no.2 / pp.32-43 / 2013
  • These days, more and more IT research institutions foresee cloud services as the predominant IT service in the near future, and cloud services are in fact already provided by some leading IT vendors. Regardless of the physical location of the service and the environment of the system, a cloud service can provide users with storage, data, and software. On the other hand, cloud services have challenges as well: although their strength is that IT resources can be used freely regardless of hardware constraints, availability remains a problem to be solved. Hence, this paper addresses the aforementioned issues: the prerequisites of cloud computing for distributed file systems, the open-source Hadoop distributed file system, in-memory database technology, and high-availability database systems. The paper also outlines a high-availability, massive distributed data management architecture from the perspective of cloud services, using the distributed file systems currently employed in the cloud computing market.
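The availability argument rests on replicating data across nodes, as HDFS does. The toy sketch below shows the general idea of writing a block to several nodes and reading from any surviving replica; it uses no real Hadoop API, and the node names, replication factor, and placement rule are invented for illustration.

```python
import hashlib

NODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]   # hypothetical data nodes
REPLICAS = 3                                   # HDFS-style replication factor
storage = {n: {} for n in NODES}               # stand-in for each node's local disk


def place_block(block_id, data):
    """Write one block to several nodes so a single node failure loses nothing."""
    start = int(hashlib.sha1(block_id.encode()).hexdigest(), 16) % len(NODES)
    targets = [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]
    for node in targets:
        storage[node][block_id] = data
    return targets


def read_block(block_id, failed=frozenset()):
    """Serve the block from any surviving replica."""
    for node in NODES:
        if node not in failed and block_id in storage[node]:
            return storage[node][block_id]
    raise IOError(f"all replicas of {block_id} are unavailable")


print(place_block("blk_0001", b"payload"))
print(read_block("blk_0001", failed={"dn1", "dn2"}))
```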

Application of Group Master Cache for the Integrated Environment of SAN and NAS (Group Master Cache를 활용한 SAN과 NAS의 통합 방안)

  • Lee, Won-Bok;Park, Jin-Won
    • Journal of the Korea Society for Simulation / v.16 no.2 / pp.9-15 / 2007
  • As the Internet grows and mass multimedia data become common, storage systems are migrating from DAS, where the storage and the server are directly connected, to SAN and NAS. SAN connects storage devices through a separate network, while NAS provides only file services and connects storage through an IP network. However, SAN and NAS cannot fulfill the needs of companies when used separately, and thus need to be integrated. In this research, we propose an efficient data sharing method that employs the concept of a Group Master Cache (GMC) for the integrated environment of SAN and NAS. GMC is based on MCI (Metadata server and Cluster system Integration) but tries to solve MCI's high expansion cost problem. We introduce the basic concept of GMC and compare its performance with that of MCI using computer simulation.
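The abstract does not describe GMC's mechanics, so the following is only a guess at the general shape of a per-group metadata cache: the group's master answers location lookups locally and consults the central metadata server only on a miss. All names and the mapping function are hypothetical.

```python
class GroupMasterCache:
    """Toy per-group metadata cache in front of a central metadata server."""

    def __init__(self, metadata_server):
        self.metadata_server = metadata_server   # callable: path -> storage location
        self.cache = {}

    def locate(self, path):
        if path in self.cache:                    # answered by the group master
            return self.cache[path]
        location = self.metadata_server(path)     # miss: ask the central server
        self.cache[path] = location
        return location


# Hypothetical central metadata server mapping file paths to storage locations.
def central_lookup(path):
    return {"lun": hash(path) % 8, "offset": 0}


gmc = GroupMasterCache(central_lookup)
print(gmc.locate("/shared/video/a.mp4"))   # miss -> central server
print(gmc.locate("/shared/video/a.mp4"))   # hit  -> served by the group master
```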


Efficient Content-based Load Distribution for Web Server Clusters (웹 서버 클러스터를 위한 효율적인 내용 기반의 부하 분배)

  • Chung Ji Yung;Kim Sungsoo
    • Journal of KIISE:Information Networking / v.32 no.1 / pp.60-67 / 2005
  • A cluster consists of a collection of interconnected stand-alone computers working together and provides a high-availability solution in application areas such as web services and information systems. Content-based load distribution for web server clusters uses the detailed data found in the application layer to intelligently route user requests among web servers. In this paper, we propose a content-based load distribution algorithm that considers cache hits and the load information of the web servers in a web server cluster. In addition, we extend this algorithm to handle user requests for dynamic files. Notably, our algorithm neither keeps track of access frequency information nor tries to model the contents of the web servers' caches.
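One plausible reading of "considers cache hits and load without modeling cache contents" is to route by content hash, so repeated requests for the same URL reach the same server's warm cache, and to divert only when that server is overloaded. The sketch below shows that dispatcher shape; the server list, load figures, and overload threshold are invented, not the paper's algorithm.

```python
import hashlib

servers = {                      # hypothetical back-end web servers and their loads
    "web1": {"load": 0.35},
    "web2": {"load": 0.80},
    "web3": {"load": 0.20},
}
OVERLOAD = 0.75                  # assumed threshold; the paper does not give one


def pick_server(url):
    """Route by content: the same URL normally goes to the same server (so its
    cache stays warm), but an overloaded server is skipped in favour of the
    least-loaded one."""
    ranked = sorted(servers)     # deterministic ordering of server names
    idx = int(hashlib.sha1(url.encode()).hexdigest(), 16) % len(ranked)
    preferred = ranked[idx]
    if servers[preferred]["load"] < OVERLOAD:
        return preferred                                      # likely cache hit
    return min(servers, key=lambda s: servers[s]["load"])     # shed load instead


print(pick_server("/images/logo.png"))
print(pick_server("/cgi-bin/report?y=2005"))
```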

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data appropriately to the local hard disks of the PCs, in such a way that the disk I/O and the subsequent computation are spread as evenly as possible over all the PCs. If the terms in the inverted index file can be classified into closely related clusters, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One of the goals of this research is the development of methods for automatically clustering the terms based on the likelihood of their co-occurrence in the same query. We also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus revealed the efficiency and effectiveness of our method.
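The pipeline described above (count co-occurrence in queries, cluster the terms, then spread each cluster's terms across PCs with duplicate copies) can be sketched as follows. The greedy merging rule, cluster-size cap, number of PCs, and replica count are all assumptions made for illustration, not the paper's actual algorithm.

```python
from collections import Counter
from itertools import combinations

# Query log: each query is the set of terms it contains (toy data).
queries = [
    {"cluster", "index"}, {"cluster", "file"}, {"index", "hash"},
    {"file", "system"}, {"cluster", "index", "file"}, {"hash", "index"},
]

# 1) Count how often pairs of terms appear in the same query.
cooc = Counter()
for q in queries:
    for a, b in combinations(sorted(q), 2):
        cooc[(a, b)] += 1

# 2) Greedy clustering: merge the most frequently co-occurring pairs first.
cluster_of = {}
for (a, b), _ in cooc.most_common():
    ca, cb = cluster_of.setdefault(a, {a}), cluster_of.setdefault(b, {b})
    if ca is not cb and len(ca | cb) <= 3:          # cap on cluster size (assumed)
        merged = ca | cb
        for t in merged:
            cluster_of[t] = merged
clusters = {frozenset(c) for c in cluster_of.values()}

# 3) Interleave each cluster's terms over the PCs, keeping two copies of every
#    posting list for fault tolerance and load balancing.
PCS, COPIES = 4, 2
assignment = {}
for cluster in clusters:
    for i, term in enumerate(sorted(cluster)):
        assignment[term] = [(i + j) % PCS for j in range(COPIES)]

print(assignment)
```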

Deployment and Performance Analysis of Data Transfer Node Cluster for HPC Environment (HPC 환경을 위한 데이터 전송 노드 클러스터 구축 및 성능분석)

  • Hong, Wontaek;An, Dosik;Lee, Jaekook;Moon, Jeonghoon;Seok, Woojin
    • KIPS Transactions on Computer and Communication Systems / v.9 no.9 / pp.197-206 / 2020
  • Collaborative research in science applications based on HPC services needs rapid transfers of massive data between research colleagues over wide area networks. With regard to this requirement, research on enhancing data transfer performance between major superfacilities in the U.S. has been conducted recently. In this paper, we deploy multiple data transfer nodes (DTNs) over high-speed science networks in order to rapidly move large amounts of data in the parallel file system of KISTI's Nurion supercomputer, and we perform transfer experiments between endpoints with an approximately 130 ms round-trip time. We present and compare the transfer throughput for file sets of different sizes. In addition, we confirm that a DTN cluster with three nodes can provide about 1.8 and 2.7 times higher transfer throughput than a single node under two different concurrency and parallelism settings.
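The gain from a DTN cluster comes from fanning a file set out over several nodes, each running multiple transfers at once. The sketch below only mimics that scheduling; the `transfer` function is a placeholder (real deployments would invoke a transfer tool such as Globus/GridFTP), and the node names and concurrency level are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

DTNS = ["dtn1", "dtn2", "dtn3"]       # hypothetical data transfer nodes
CONCURRENCY = 4                        # simultaneous transfers per node (assumed)


def transfer(dtn, path):
    """Placeholder for a real transfer-tool invocation on the given DTN."""
    print(f"{dtn}: sending {path}")
    return path


def run(file_set):
    # Round-robin the file set over the DTN cluster, then let each node run
    # several transfers concurrently.
    per_node = {d: file_set[i::len(DTNS)] for i, d in enumerate(DTNS)}
    with ThreadPoolExecutor(max_workers=len(DTNS) * CONCURRENCY) as pool:
        futures = [pool.submit(transfer, d, f)
                   for d, files in per_node.items() for f in files]
        return [f.result() for f in futures]


run([f"/scratch/dataset/part{i:03d}.dat" for i in range(12)])
```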

Efficient Load Balancing Scheme using Resource Information in Web Server System (웹 서버 시스템에서의 자원 정보를 이용한 효율적인 부하분산 기법)

  • Chang Tae-Mu;Myung Won-Shig;Han Jun-Tak
    • The KIPS Transactions:PartA / v.12A no.2 s.92 / pp.151-160 / 2005
  • The exponential growth of Web users requires web servers with high expandability and reliability, and it leads to excessive transmission traffic and system overload problems. To solve these problems, cluster systems have been widely studied. In conventional cluster systems, when requests are large, as with multimedia and CGI content, the load and response time of a particular server tend to increase even if the overall load is distributed evenly. In this paper, a cluster system is proposed in which each web server holds different contents and loads are distributed efficiently using web server resource information such as CPU, memory, and disk utilization. The web servers holding different contents are mutually connected and managed with a network file system to maintain the information consistency required to support resource information updates, deletions, and additions. Load imbalance among content groups caused by the distribution of contents can be alleviated by reassigning web servers. Using simulation, we show that our method improves average throughput and processing time by up to 50% compared with systems using the LC and RR methods.
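A minimal sketch of load distribution driven by resource information: among the servers in the content group that can serve a request, pick the one with the lowest combined CPU/memory/disk utilization. The weights, thresholds, and example figures are assumptions; the abstract does not publish the exact scoring formula.

```python
servers = {   # hypothetical resource reports gathered from each web server
    "web1": {"cpu": 0.62, "mem": 0.40, "disk": 0.30},
    "web2": {"cpu": 0.25, "mem": 0.35, "disk": 0.20},
    "web3": {"cpu": 0.80, "mem": 0.70, "disk": 0.55},
}

# Assumed weights for combining the three utilization figures.
WEIGHTS = {"cpu": 0.5, "mem": 0.3, "disk": 0.2}


def load_score(report):
    return sum(WEIGHTS[k] * report[k] for k in WEIGHTS)


def pick_server(candidates):
    """Among the servers holding the requested content, choose the one whose
    combined CPU/memory/disk utilization is lowest."""
    return min(candidates, key=lambda s: load_score(servers[s]))


# Example: the requested file belongs to the content group held by web1 and web3.
print(pick_server(["web1", "web3"]))
```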