• Title/Summary/Keyword: Supercomputers


Next-generation Sequencing for Environmental Biology - Full-fledged Environmental Genomics around the Corner (차세대 유전체 기술과 환경생물학 - 환경유전체학 시대를 맞이하여)

  • Song, Ju Yeon;Kim, Byung Kwon;Kwon, Soon-Kyeong;Kwak, Min-Jung;Kim, Jihyun F.
    • Korean Journal of Environmental Biology / v.30 no.2 / pp.77-89 / 2012
  • With the advent of the genomics era powered by DNA sequencing technologies, life science is being transformed significantly and biological research and development have been accelerated. Environmental biology concerns the relationships among living organisms and their natural environment, which constitute the global biogeochemical cycle. As sustainability of the ecosystems depends on biodiversity, examining the structure and dynamics of the biotic constituents and fully grasping their genetic and metabolic capabilities are pivotal. The high-speed high-throughput next-generation sequencing can be applied to barcoding organisms either thriving or endangered and to decoding the whole genome information. Furthermore, diversity and the full gene complement of a microbial community can be elucidated and monitored through metagenomic approaches. With regard to human welfare, microbiomes of various human habitats such as gut, skin, mouth, stomach, and vagina, have been and are being scrutinized. To keep pace with the rapid increase of the sequencing capacity, various bioinformatic algorithms and software tools that even utilize supercomputers and cloud computing are being developed for processing and storage of massive data sets. Environmental genomics will be the major force in understanding the structure and function of ecosystems in nature as well as preserving, remediating, and bioprospecting them.

An Economic Analysis on the Operation Effect of Public Supercomputer (공용 슈퍼컴퓨터 운영효과에 대한 경제성 분석)

  • Lee, Hyung Jin;Choi, Youn Keun;Park, Jinsoo
    • Journal of Korea Society of Industrial Information Systems / v.23 no.4 / pp.69-79 / 2018
  • We perform a cost-benefit analysis, an economic evaluation technique, to measure the effect of a shared public supercomputer. The costs of two alternatives, sharing a public supercomputer at a national center versus each organization procuring its own supercomputer as needed, are estimated and compared to support decision making. In the sharing case, the cost can be predicted straightforwardly from the operating results of the previous public supercomputer. The cost of individual procurement, however, is difficult to predict because it varies considerably with the required system performance, location, human factors, and so on. Accordingly, this research proposes an objective and valid method for estimating the cost of the individual cases. Finally, we analyze the economic effect of operating a public supercomputer by comparing the sharing cost with that of individual procurement. The analysis confirms that sharing a public supercomputer reduces operational costs by about 10.3 billion won annually compared with individual procurement, so a shared public supercomputer is expected to bring a considerable economic benefit.
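A minimal sketch of the kind of cost comparison the paper describes, using illustrative placeholder figures rather than the study's actual cost data:

```python
# Hypothetical cost-benefit comparison of a shared public supercomputer
# versus individual procurement. All figures below are placeholders for
# illustration, not the paper's data.

def annual_cost_shared(capex, lifetime_years, annual_opex):
    """Annualized cost of one shared national-center system."""
    return capex / lifetime_years + annual_opex

def annual_cost_individual(n_orgs, capex_per_org, lifetime_years, opex_per_org):
    """Total annualized cost if each organization buys its own system."""
    return n_orgs * (capex_per_org / lifetime_years + opex_per_org)

shared = annual_cost_shared(capex=60e9, lifetime_years=5, annual_opex=8e9)
individual = annual_cost_individual(n_orgs=20, capex_per_org=4e9,
                                    lifetime_years=5, opex_per_org=1.2e9)
print(f"annual saving from sharing: {(individual - shared) / 1e9:.1f} billion won")
```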

A Job Allocation Manager for Dynamic Remote Execution of Distributed Jobs in P2P Network (분산처리 작업의 동적 원격실행을 위한 P2P 기반 작업 할당 관리자)

  • Lee, Seung-Ha;Kim, Yang-Woo
    • Journal of Internet Computing and Services / v.7 no.6 / pp.87-103 / 2006
  • Advances in computer and network technology provide computing environments that were previously possible only with supercomputers. Such environments require a distributed runtime system, but most conventional distributed runtime systems cannot reconfigure themselves dynamically and flexibly as the workload varies, because of their static architecture of a fixed master node and slave worker nodes. This paper proposes and implements a new model for distributed job allocation and management: a distributed runtime system in a P2P environment that supports flexible and dynamic system reconfiguration. The implemented system enables job program transfer and management as well as remote compilation and execution among cooperating developers, based on the Jxta platform, a standard P2P protocol. Because it makes dynamic and flexible reconfiguration possible, the proposed method can collect and utilize idle computing resources on demand for distributed job processing. Moreover, the system's effectiveness and performance gains are demonstrated by distributing crawler jobs that collect the large amounts of data needed for Internet search.
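The abstract does not detail the allocation protocol, so the following is only a minimal stand-in for dynamic job allocation to worker peers; the Jxta transport is omitted and all names are illustrative:

```python
# A minimal sketch of dynamic job allocation: worker peers pull jobs from
# a shared queue, so peers can join at any time and idle resources can be
# drafted on demand. This stands in for the paper's Jxta-based P2P system.
import queue
import threading

class JobAllocationManager:
    def __init__(self, job_list):
        self.jobs = queue.Queue()
        for job in job_list:
            self.jobs.put(job)

    def worker(self, peer_id):
        # Each peer pulls jobs until the queue drains; late-joining peers
        # simply start pulling, which is what enables reconfiguration.
        while True:
            try:
                job = self.jobs.get_nowait()
            except queue.Empty:
                return
            print(f"peer {peer_id} executes {job}")
            self.jobs.task_done()

manager = JobAllocationManager([f"crawl-url-batch-{i}" for i in range(8)])
peers = [threading.Thread(target=manager.worker, args=(p,)) for p in range(3)]
for t in peers:
    t.start()
for t in peers:
    t.join()
```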


Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data appropriately to the local hard disks of the PCs in such a way that the disk I/O and the subsequent computation are spread as evenly as possible across all the PCs. If the terms in the inverted index file can be classified into closely related clusters, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One goal of this research is the development of methods for automatically clustering the terms based on the likelihood of their co-occurrence in the same query. We also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus demonstrate the efficiency and effectiveness of our method.
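A rough Python sketch of the two ideas, under simplifying assumptions: a trivial co-occurrence threshold stands in for the paper's clustering method, and placement is round-robin interleaving with one replica per term:

```python
# (1) Cluster index terms by co-occurrence in past queries, then
# (2) assign each cluster's terms to PCs interleaved, with a duplicate on
# a neighboring PC for fault tolerance. Queries and threshold are toy data.
from collections import defaultdict
from itertools import combinations

queries = [["parallel", "retrieval"], ["index", "cluster"],
           ["parallel", "index"], ["cluster", "retrieval"]]

cooc = defaultdict(int)  # co-occurrence counts of term pairs in a query
for q in queries:
    for a, b in combinations(sorted(set(q)), 2):
        cooc[(a, b)] += 1

parent = {}  # union-find to merge strongly co-occurring terms
def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for (a, b), n in cooc.items():
    if n >= 1:  # illustrative threshold, not the paper's criterion
        union(a, b)

clusters = defaultdict(list)
for term in {t for q in queries for t in q}:
    clusters[find(term)].append(term)

# Interleaved duplicate distribution: term i of a cluster goes to PC
# i mod P, with a replica on PC (i + 1) mod P.
P = 3
placement = {}
for terms in clusters.values():
    for i, term in enumerate(sorted(terms)):
        placement[term] = {i % P, (i + 1) % P}
print(placement)
```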

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.3 / pp.56-63 / 2015
  • Several fields of science demand large-scale workflow support requiring thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule, provided they do not delay large jobs. Since backfilling needs a runtime estimate, most parallel systems rely on the user's estimated runtime, which turns out to be highly inaccurate because users overestimate their jobs' runtimes. Therefore, in this paper we propose a novel runtime prediction system based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for runtime prediction of parallel applications consists of three main phases. First, feature selection based on factor analysis identifies important input features. Then, a clustering analysis of historical data is performed with a self-organizing map, followed by hierarchical clustering of the weight vectors to find the cluster boundaries. Finally, prediction models are constructed using support vector regression on the clustered workload data. Multiple prediction models, one per clustered data pattern, reduce the error rate compared with a single model over the whole data. In the experiments, we use workload logs from parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, experimental results show that the proposed method improves accuracy by up to 69.08%.
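The three-phase pipeline can be sketched roughly as below; KMeans stands in for the paper's SOM-plus-hierarchical-clustering step, and the synthetic features are placeholders rather than the paper's workload fields:

```python
# Per-cluster runtime prediction on synthetic data: cluster the history,
# then fit one regressor per cluster and route new jobs to their cluster's
# model. Only the pipeline shape follows the paper; data are made up.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic job features: [requested_cores, requested_runtime, queue_id]
X = rng.random((300, 3))
y = 100 * X[:, 1] + 20 * X[:, 0] + rng.normal(0, 5, 300)  # "true" runtime

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
models = {}
for c in range(3):
    mask = km.labels_ == c
    models[c] = SVR().fit(X[mask], y[mask])  # one model per workload cluster

job = rng.random((1, 3))          # an incoming job's features
cluster = km.predict(job)[0]      # route to the matching cluster's model
print("predicted runtime:", models[cluster].predict(job)[0])
```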

Improving the Job Success Rate through Analysis of User Logs in HPC (HPC 환경에서 사용자 로그 분석을 통한 작업 성공률 개선)

  • Yoon, JunWeon;Hong, TaeYoung;Kong, Ki-Sik;Park, ChanYeol
    • Journal of Digital Contents Society / v.16 no.5 / pp.691-697 / 2015
  • Supercomputers are used in many areas with large computational needs, from industrial product design to state-of-the-art science and technology. Tachyon, the fourth supercomputer built at KISTI, is a high-performance parallel computing system with 3,200 computing nodes and supporting infrastructure. The system currently serves about 10,000 users in over 170 organizations, who run their jobs in batch form through a scheduler. The system also logs job scripts, execution environments, libraries, and job status from submission to completion. In this paper, we analyzed batch job information from Sun Grid Engine, the scheduler used in the Tachyon system, together with the job execution records. In particular, we separated failed jobs from all the jobs users ran and analyzed the causes of failure. Among the jobs logged as failures, we could identify some that, with improvements to how they are run, can be regarded as normal jobs.
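A hedged sketch of tallying failure causes from an SGE-style accounting log; the file path and the field positions for the failed and exit-status columns are assumptions that should be checked against accounting(5) for the installed SGE version:

```python
# Count failed jobs and their (failed, exit_status) signatures from a
# colon-delimited SGE accounting log. Field indices below are assumed,
# not guaranteed; verify against your SGE version's accounting(5).
from collections import Counter

ACCOUNTING = "accounting"  # typically under $SGE_ROOT/<cell>/common/
FAILED_FIELD, EXIT_STATUS_FIELD = 11, 12  # assumed 0-based positions

causes = Counter()
total = 0
with open(ACCOUNTING) as f:
    for line in f:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split(":")
        total += 1
        failed, exit_status = fields[FAILED_FIELD], fields[EXIT_STATUS_FIELD]
        if failed != "0" or exit_status != "0":
            causes[(failed, exit_status)] += 1

print(f"{sum(causes.values())}/{total} jobs flagged as failed")
for (failed, exit_status), n in causes.most_common(5):
    print(f"failed={failed} exit_status={exit_status}: {n} jobs")
```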

An Efficient Implementation of Mobile Raspberry Pi Hadoop Clusters for Robust and Augmented Computing Performance

  • Srinivasan, Kathiravan;Chang, Chuan-Yu;Huang, Chao-Hsi;Chang, Min-Hao;Sharma, Anant;Ankur, Avinash
    • Journal of Information Processing Systems / v.14 no.4 / pp.989-1009 / 2018
  • Rapid advances in science and technology, with exponential development of smart mobile devices, workstations, supercomputers, smart gadgets, and network servers, have been witnessed over the past few years. The sudden increase in the Internet population and the manifold growth in Internet speeds have occasioned the generation of an enormous amount of data, now termed 'big data'. Given this scenario, storing data on local servers or a personal computer is an issue that can be resolved by utilizing cloud computing, and several cloud computing service providers are now available to address big data needs. This paper establishes a framework that builds Hadoop clusters on the new single-board computer (SBC) Mobile Raspberry Pi. These clusters offer facilities for storage as well as computing. Regular data centers require large amounts of energy for operation, need cooling equipment, and occupy prime real estate. These energy and physical space constraints can be addressed by employing Mobile Raspberry Pi Hadoop clusters, which provide a cost-effective, low-power, high-speed solution along with micro-data-center support for big data. Hadoop provides the required modules for the distributed processing of big data by deploying map-reduce programming approaches. In this work, the performance of SBC clusters was compared with that of a single computer. The experimental data show that the SBC clusters outperform a single computer by around 20%, and the cluster processing speed for large volumes of data can be further enhanced by increasing the number of SBC nodes. Data storage is accomplished using the Hadoop Distributed File System (HDFS), which offers more flexibility and greater scalability than a single computer system.
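For illustration, the map-reduce model that Hadoop applies across cluster nodes can be sketched in a few lines of plain Python; a real Hadoop job would run these phases distributed over HDFS blocks rather than in one process:

```python
# A minimal word-count MapReduce: map emits (word, 1) pairs, shuffle
# groups them by key, reduce sums each group. This is the programming
# model only, not Hadoop's distributed execution.
from collections import defaultdict

def map_phase(text):
    for word in text.split():
        yield word.lower(), 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data on small boards", "small boards big wins"]
pairs = [kv for d in docs for kv in map_phase(d)]
print(reduce_phase(shuffle(pairs)))
```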

Container-based Cluster Management System for User-driven Distributed Computing (사용자 맞춤형 분산 컴퓨팅을 위한 컨테이너 기반 클러스터 관리 시스템)

  • Park, Ju-Won;Hahm, Jaegyoon
    • KIISE Transactions on Computing Practices / v.21 no.9 / pp.587-595 / 2015
  • Several fields of science have traditionally demanded large-scale workflow support, which requires thousands of central processing unit (CPU) cores. To support such large-scale scientific workflows, large-capacity cluster systems such as supercomputers are widely used. However, because users require a diversity of software packages and configurations, system administrators have trouble preparing service environments in real time. In this paper, we present a container-based cluster management platform and introduce an implementation case that minimizes performance loss while dynamically providing the distributed computing environments users want. This paper offers the following contributions. First, container-based virtualization is integrated with a resource and job management system, extending its applicability to large-scale scientific workflows. Second, an implementation case in which Docker and HTCondor are integrated is introduced. Lastly, Docker-versus-native performance comparisons using two widely known benchmark tools and a Monte Carlo simulation implemented in various programming languages are presented.
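A minimal sketch of driving HTCondor's Docker universe, in the spirit of the Docker-HTCondor integration described; the container image, file names, and executable path are placeholders, and an HTCondor pool with Docker-enabled execute nodes is assumed:

```python
# Compose an HTCondor submit description for a containerized job and hand
# it to condor_submit. The submit keys are standard HTCondor Docker-
# universe syntax; the image and paths below are illustrative only.
import subprocess

submit_description = """\
universe     = docker
docker_image = python:3.11-slim
executable   = /usr/local/bin/python3
arguments    = --version
output       = job.out
error        = job.err
log          = job.log
request_cpus = 1
queue
"""

with open("docker_job.sub", "w") as f:
    f.write(submit_description)

# Requires a running HTCondor scheduler on this host.
subprocess.run(["condor_submit", "docker_job.sub"], check=True)
```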

State of Information Technology and Its Application in Agricultural Meteorology (농업기상활용 정보기술 현황)

  • Byong-Lyol Lee;Dong-Il Lee
    • Korean Journal of Agricultural and Forest Meteorology / v.6 no.2 / pp.118-126 / 2004
  • Grid is a new Information Technology (IT) concept of a "super Internet" for high-performance computing: worldwide collections of high-end resources such as supercomputers, storage, advanced instruments, and immersive environments. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, real-time data sources and instruments, and human collaborators. The term "the Grid" was coined in the mid-1990s to denote a proposed distributed computing infrastructure for advanced science and engineering. The term "computational Grids" refers to infrastructures aimed at allowing users to access and/or aggregate potentially large numbers of powerful and sophisticated resources. More formally, Grids are defined as infrastructure allowing flexible, secure, and coordinated resource sharing among dynamic collections of individuals, institutions, and resources, referred to as virtual organizations. The Grid is an emerging IT, a kind of next-generation Internet technology, that will fit very well with agrometeorological services in the future. I believe it will contribute to resource sharing in agrometeorology by providing supercomputing power, virtual storage, and efficient data exchange, especially for developing countries that suffer from a lack of resources for their agrometeorological services at the national level. Thus, the establishment of CAgM-GRID, based on the existing RADMINSII, is proposed as a part of the FWIS of WMO.

A Technique for Provisioning Virtual Clusters in Real-time and Improving I/O Performance on Computational-Science Simulation Environments (계산과학 시뮬레이션을 위한 실시간 가상 클러스터 생성 및 I/O 성능 향상 기법)

  • Choi, Chanho;Lee, Jongsuk Ruth;Kim, Hangi;Jin, DuSeok;Yu, Jung-lok
    • KIISE Transactions on Computing Practices / v.21 no.1 / pp.13-18 / 2015
  • Computational science simulations have been used to enable discovery in a broad spectrum of application areas, and they place irregular, time-varying demands on computing resources. The adoption of virtualized high-performance clouds, rather than CPU-centric computing platforms such as supercomputers, is gaining interest mainly due to their ease of use, multi-tenancy, and flexibility. Provisioning a virtual cluster, which consists of many virtual machines, in real time has a critical impact on the successful deployment of virtualized HPC clouds for computational science simulations. However, the cost of concurrently creating many virtual machines when constructing a virtual cluster can be as much as two orders of magnitude worse than expected, and one of the main factors in this bottleneck is the time spent creating the virtual disk images for the virtual machines. In this paper, we propose a novel technique to minimize the creation time of virtual machine images and improve the I/O performance of the provisioned virtual clusters. We also confirm through various sets of experiments that our proposed technique outperforms conventional ones.
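The abstract does not spell out the technique, but a common way to cut image-creation time is thin copy-on-write overlays on a shared base image; the sketch below follows that assumption, and the paths and qcow2 format choice are placeholders:

```python
# Provision per-node VM disks as copy-on-write overlays of one read-only
# base image, so each node's disk is created near-instantly instead of
# copying gigabytes. A generic approach, not necessarily the paper's.
import subprocess

BASE_IMAGE = "/images/base-scientific.qcow2"  # placeholder path

def create_overlay(node_name):
    overlay = f"/images/{node_name}.qcow2"
    # qemu-img creates an empty overlay backed by the base image;
    # -F states the backing file's format (required by recent qemu-img).
    subprocess.run(
        ["qemu-img", "create", "-f", "qcow2",
         "-b", BASE_IMAGE, "-F", "qcow2", overlay],
        check=True,
    )
    return overlay

# Create the disks for a 4-node virtual cluster.
disks = [create_overlay(f"vnode-{i}") for i in range(4)]
print(disks)
```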