• Title/Abstract/Keywords: Target Collection

PDFindexer: Distributed PDF Indexing system using MapReduce

  • Murtazaev, JAziz;Kihm, Jang-Su;Oh, Sangyoon
    • International Journal of Internet, Broadcasting and Communication / v.4 no.1 / pp.13-17 / 2012
  • Indexing converts a raw document collection into an easily searchable representation. Web search engines such as Google and Yahoo achieve subsecond response times because they efficiently index web pages across the entire Web. Indexing becomes more challenging as the scale grows, and parallel techniques such as the MapReduce framework can support efficient large-scale indexing. In this paper we propose PDFindexer, a system for indexing scientific papers in PDF using the MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which have a predefined structure consisting of a title, abstract, sections, and references. The proposed system parses scientific papers in PDF, recreates their structure, and performs efficient distributed indexing with the MapReduce framework on a cluster of nodes. We provide an overview of the system, its components, and the interactions among them. We also discuss several issues related to the design of the system and to the use of MapReduce for parsing and indexing large document collections.
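The abstract does not give PDFindexer's actual job layout. As a hedged illustration of how MapReduce is commonly used to build an inverted index, the sketch below simulates the map, shuffle, and reduce phases in a single Python process. The record shape `(doc_id, section, text)` and the function names `map_phase`, `shuffle`, `reduce_phase`, and `build_index` are hypothetical; keying postings by section is an assumption meant to mirror the structured fields (title, abstract, sections) the abstract mentions.

```python
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical record shape: (doc_id, section, text), e.g. one parsed
# title, abstract, or body section of a scientific paper.
Record = tuple[str, str, str]


def map_phase(record: Record):
    """Map: emit (term, (doc_id, section)) for every token in the record."""
    doc_id, section, text = record
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, (doc_id, section)


def shuffle(pairs):
    """Shuffle/sort: group mapper output by term, as the framework would."""
    for term, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield term, [posting for _, posting in group]


def reduce_phase(term, postings):
    """Reduce: deduplicate and sort the posting list for one term."""
    return term, sorted(set(postings))


def build_index(records):
    """Run map, shuffle, and reduce over all records in one process."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(term, postings) for term, postings in shuffle(mapped))


records = [
    ("p1", "title", "Distributed PDF indexing"),
    ("p1", "abstract", "MapReduce indexing of papers"),
    ("p2", "title", "Indexing with MapReduce"),
]
index = build_index(records)
print(index["indexing"])   # [('p1', 'abstract'), ('p1', 'title'), ('p2', 'title')]
print(index["mapreduce"])  # [('p1', 'abstract'), ('p2', 'title')]
```

On a real cluster the framework, not `shuffle`, performs the grouping, and mappers and reducers run on different nodes; only the per-record and per-key logic would carry over.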

Universal-, Genus-specific, Species-specific Probes and Primers Design for Microbial Identification

  • Park, Jun-Hyung;Park, Hee-Kyung;Song, Eunsil;Jang, Hyun-Jung;Kang, Byeong-Chul;Lee, Seung-Won;Kim, Hyun-Jin;Kim, Cheol-Min
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.399-401 / 2005
  • MIPROBE is a web-based tool for designing universal, genus-specific, and species-specific primers and probes. Its main functions are collecting target gene sequences, constructing consensus sequences, collecting candidate primers and probes, and evaluating the candidates with BLAST. Biologists with little computer experience can easily use MIPROBE to design universal, genus-specific, and species-specific primers and probes on a large scale. The software is available at http://www.miprobe.com, where detailed instructions for using the program can also be found.
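The abstract does not say how MIPROBE builds its consensus sequences. One common, simple approach is a column-wise majority rule over pre-aligned sequences, sketched below in Python under stated assumptions: the input is already aligned, the 0.5 threshold and the use of `N` for ambiguous columns are illustrative choices, and `consensus` is a hypothetical helper, not part of MIPROBE.

```python
from collections import Counter


def consensus(aligned: list[str], threshold: float = 0.5, gap: str = "-") -> str:
    """Column-wise majority consensus of equal-length aligned sequences.

    A column takes its most frequent symbol if that symbol's share exceeds
    `threshold`; otherwise it becomes 'N' (ambiguous). Columns whose clear
    majority is a gap are dropped. Threshold and 'N' are assumptions.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        symbol, count = Counter(c.upper() for c in column).most_common(1)[0]
        if count / len(column) <= threshold:
            out.append("N")          # no clear majority at this position
        elif symbol != gap:
            out.append(symbol)       # clear majority base
    return "".join(out)


print(consensus(["ACGT-A", "ACGTTA", "ACCT-A"]))  # ACGTA
print(consensus(["AC", "GC"]))                    # NC
```

Candidate primers and probes would then be drawn from conserved stretches of such a consensus; real tools often use IUPAC ambiguity codes instead of a single `N`.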

Image Collection Planning Algorithm for Single Pass Stereo Imaging

  • Kang, Chi-Ho;Ahn, Sang-Il;Cheon, Yee-Jin
    • Proceedings of the KSRS Conference / 2008.10a / pp.255-258 / 2008
  • A DEM (Digital Elevation Model) can be obtained from a stereo image pair acquired by a LEO satellite. A stereo set consists of at least two images of the imaging target taken from different viewing angles, acquired in either one pass or multiple passes. With multiple passes, each image is generally acquired on a separate pass in the cross-track direction; within a single pass, a stereo pair can be acquired in the along-track direction if the satellite can control its attitude about the pitch axis. Single-pass stereo imaging acquires stereo pairs more efficiently because it consumes fewer orbit resources and less imaging time. This paper presents the results of a feasibility study on a stereo-pair image collection planning algorithm for single-pass imaging.

Feasibility study through simulation of LSM propulsion system for the catenary-current collection run tester (전차선로-집전계 주행시험기의 LSM 추진장치의 타당성 검토 시뮬레이션)

  • Kwon, Sam-Young;Lee, Hyeung-Woo;Park, Hyun-June;Lee, Ju
    • Proceedings of the KIEE Conference / 2006.07b / pp.1101-1102 / 2006
  • As part of the conceptual design of the catenary-current collection run tester that KRRI plans to construct, this paper describes a feasibility study. Various simulations of the LSM propulsion system were conducted to determine the rating of the propulsion linear motor from the target distance-speed curve. The simulation results and the desirable linear motor specifications are also discussed.

Large eddy simulation of flow over a wooded building complex

  • Rehm, R.G.;McGrattan, K.B.;Baum, H.R.
    • Wind and Structures / v.5 no.2_3_4 / pp.291-300 / 2002
  • An efficient large eddy simulation algorithm is used to compute surface pressure distributions on an eleven-story (target) building on the NIST campus. Local meteorology, neighboring buildings, topography, and large vegetation (trees) all play an important part in determining the flows and therefore the pressures experienced by the target. The wind profile imposed at the upstream surface of the computational domain follows a power law with an exponent representing suburban terrain. This profile accounts for the flow retardation caused by friction at the Earth's surface, but it does not include the fluctuations that would naturally occur in this flow. The effect of neighboring buildings on the time-dependent surface pressures experienced by the target is examined. Comparing the pressure fluctuations on the target building alone with those on the target building in situ shows that, owing to vortices shed by the upstream buildings, the fluctuations are larger when such buildings are present. Even when buildings are lateral to or behind the target, the pressure disturbances generate significantly different flows around the target. A simple grid-free mathematical model of a tree is presented in which the trunk and the branches are each represented by a collection of spherical particles strung together like beads on a string. The drag from the tree, determined as the sum of the drags of the component particles, produces an oscillatory, spreading wake of slower fluid, suggesting that the behavior of trees as windbreaks can be modeled usefully.
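The abstract states only that the inflow profile follows a power law with a suburban-terrain exponent. The standard power-law wind profile has the form below, where $U(z)$ is the mean wind speed at height $z$, $U_{\mathrm{ref}}$ is the speed at a reference height $z_{\mathrm{ref}}$, and $\alpha$ is the terrain-dependent exponent, which increases with surface roughness. The reference height and the value of $\alpha$ used in the study are not given in the abstract.

```latex
U(z) = U_{\mathrm{ref}} \left( \frac{z}{z_{\mathrm{ref}}} \right)^{\alpha}
```

Because the profile is a mean-velocity law, it captures the retardation near the ground but carries no turbulent fluctuations, which matches the limitation the abstract notes.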

A Study on the Verification of Traffic Flow and Traffic Accident Cognitive Function for Road Traffic Situation Cognitive System

  • Am-suk, Oh
    • Journal of Information and Communication Convergence Engineering / v.20 no.4 / pp.273-279 / 2022
  • Owing to the need to establish a cooperative intelligent transport system (C-ITS) environment in the transportation sector at home and abroad, various research and development efforts, such as high-tech road infrastructure, connection technology between road components, and traffic information systems, are currently underway. However, the current information collection and provision structure, which is centered on a central control center, and the insufficient road infrastructure limit the realization of C-ITS, which requires diverse traffic information, real-time data, advanced traffic safety management, and transportation convenience services. In this study, an existing network construction method based on the received signal strength indicator (RSSI) was selected as the comparison target and was compared and analyzed against the proposed intelligent edge network. The analysis showed that the intelligent edge network achieved a data transmission rate of 97.48%, a data transmission time of 215 ms, and a network failure recovery time of 49,983 ms.

A Sparse Target Matrix Generation Based Unsupervised Feature Learning Algorithm for Image Classification

  • Zhao, Dan;Guo, Baolong;Yan, Yunyi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2806-2825 / 2018
  • Unsupervised learning has shown good performance on image, video, and audio classification tasks, and much progress has been made so far. It studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. Many promising deep learning systems are trained in a greedy, layer-wise unsupervised manner. The performance of these deep learning architectures benefits from the ability of unsupervised learning to disentangle abstractions and pick out useful features. However, existing unsupervised learning algorithms are often difficult to train, partly because they require many hyperparameters. Tuning these hyperparameters is a laborious task that requires expert knowledge, rules of thumb, or extensive search. In this paper, we propose a simple and effective unsupervised feature learning algorithm for image classification that explicitly optimizes for population and lifetime sparsity. First, a sparse target matrix is built using competitive rules. Then, the sparse features are optimized by minimizing the Euclidean norm ($L_2$) error between the sparse target and the competitive layer outputs. Finally, a classifier is trained on the obtained sparse features. Experimental results show that the proposed method achieves good performance for image classification and provides discriminative features that generalize well.
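The abstract does not spell out the competitive rules used to build the sparse target matrix. The Python sketch below shows one hypothetical rule that yields both kinds of sparsity it mentions: every sample gets exactly one active unit (population sparsity), and each unit may be active for at most `cap` samples (lifetime sparsity). It then measures the squared $L_2$ error that training would minimize. The functions `sparse_target` and `l2_error`, the greedy assignment, and the default `cap` are assumptions, not the authors' method.

```python
import math
from typing import Optional

Matrix = list[list[float]]


def sparse_target(h: Matrix, cap: Optional[int] = None) -> Matrix:
    """Build a 0/1 sparse target from competitive-layer outputs h.

    h has one row per sample and one column per unit. Hypothetical rule:
    the strongest activations win first; each sample gets exactly one
    winning unit (population sparsity) and each unit may win at most `cap`
    samples (lifetime sparsity). By default `cap` spreads samples evenly.
    """
    n_samples, n_units = len(h), len(h[0])
    if cap is None:
        cap = math.ceil(n_samples / n_units)
    if cap * n_units < n_samples:
        raise ValueError("cap too small to give every sample a winner")
    candidates = sorted(
        ((h[i][j], i, j) for i in range(n_samples) for j in range(n_units)),
        reverse=True,
    )
    target = [[0.0] * n_units for _ in range(n_samples)]
    assigned: set[int] = set()
    wins = [0] * n_units
    for _, i, j in candidates:
        if i not in assigned and wins[j] < cap:
            target[i][j] = 1.0
            assigned.add(i)
            wins[j] += 1
    return target


def l2_error(h: Matrix, target: Matrix) -> float:
    """Squared Euclidean (L2) distance between outputs and target."""
    return sum(
        (a - b) ** 2
        for row_h, row_t in zip(h, target)
        for a, b in zip(row_h, row_t)
    )


h = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
t = sparse_target(h)
print(t)  # [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(round(l2_error(h, t), 6))  # 1.0
```

In the example, sample 3 prefers unit 0, but unit 0 has already won its quota of two samples, so the target assigns it to unit 1 instead. In a training loop the target would be rebuilt from the current outputs and the layer weights updated to reduce `l2_error`.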

Strategy Considerations in Genome Cohort Construction in Korea (한국 유전체 코호트 구축의 전략적 고려사항)

  • Sung, Joo-Hon;Cho, Sung-Il
    • Journal of Preventive Medicine and Public Health / v.40 no.2 / pp.95-101 / 2007
  • Focusing on complex diseases of public health significance, we reviewed strategic issues regarding the ongoing Korean Genome Cohort: target size and diseases, measurements, study design issues, and the follow-up strategy of the cohort. Considering the epidemiologic characteristics of the Korean population as well as the strengths and drawbacks of the current research environment, we tried to tailor the experience of other existing cohorts into proposals for this Korean study. Currently, 100,000 individuals are participating in the new Genome Cohort in Korea. The recommended target size for de novo collection is between 300,000 and 500,000. This target size would provide acceptable power to detect genetic and environmental factors of moderate effect size and possible interactions between them. Family units and/or special subgroups are recommended to run in parallel with the main body of adult individuals to increase the overall efficiency of the study. Given that the response rate to the conventional re-contact method may not be satisfactory, successful follow-up is the key to the success of the Korean Genome Cohort. Access to central databases such as National Health Insurance data offers enormous potential for near-complete case detection. Building consensus among scientists from broad fields and among stakeholders is crucial to unlocking the centralized database and to strengthening the commitment to this national project.

WebSHArk 1.0: A Benchmark Collection for Malicious Web Shell Detection

  • Kim, Jinsuk;Yoo, Dong-Hoon;Jang, Heejin;Jeong, Kimoon
    • Journal of Information Processing Systems / v.11 no.2 / pp.229-238 / 2015
  • Web shells are programs written for a specific purpose in Web scripting languages such as PHP, ASP, ASP.NET, JSP, and PERL-CGI. Web shells provide a means to communicate with the server's operating system via the interpreter of the web scripting language, so they can execute OS-specific commands over HTTP. Web attacks by malicious users are usually made by uploading one of these web shells to compromise the target web server. Although several approaches have been proposed to detect such malicious web shells, no standard dataset has been built to compare the various web shell detection techniques. In this paper, we present a collection of web shell files, WebSHArk 1.0, as a standard dataset for current and future studies in malicious web shell detection. To provide baseline results for future studies and for the improvement of current tools, we also present benchmark results obtained by scanning the WebSHArk dataset directory with three publicly available web shell scanning tools. Due to security and legal issues, the WebSHArk 1.0 dataset is available only upon request via email to one of the authors.

Partial Garbage Collection Technique for Improving Write Performance of Log-Structured File Systems (부분 가비지 컬렉션을 이용한 로그 구조 파일시스템의 쓰기 성능 개선)

  • Gwak, Hyunho;Shin, Dongkun
    • Journal of KIISE / v.41 no.12 / pp.1026-1034 / 2014
  • Flash storage devices have recently become popular. Log-structured file systems (LFS) are suitable for flash storage because they provide high write performance by generating only sequential writes to the flash device. However, an LFS must perform garbage collection (GC) to reclaim obsolete space. A slack space recycling (SSR) technique was recently proposed to reduce the GC overhead. However, because SSR generates random writes, write performance can suffer if the random write performance of the target device is significantly lower than its sequential write performance. This paper proposes a partial garbage collection technique that copies only some of the valid blocks in a victim segment in order to increase the size of the contiguous invalid space available to SSR. Experiments show that the partial GC technique significantly improves write performance on an SD card.
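The abstract states the goal of partial GC (copy only part of a victim segment's valid blocks to enlarge a contiguous invalid region for SSR) but not how those blocks are chosen. The Python sketch below illustrates one hypothetical selection policy: slide a window of the required width over the segment's validity bitmap and migrate only the valid blocks in the window that contains the fewest of them. The function name `pick_partial_gc_window` and the parameter `need` are illustrative assumptions.

```python
def pick_partial_gc_window(valid: list[bool], need: int) -> tuple[int, list[int]]:
    """Choose which valid blocks to migrate so the victim segment gains a
    contiguous invalid run of `need` blocks for SSR to reuse.

    Hypothetical policy: slide a window of width `need` over the validity
    bitmap and pick the window holding the fewest valid blocks. Only those
    blocks are copied, instead of every valid block as in a full GC.
    Returns (window_start, indices_of_valid_blocks_to_copy).
    """
    if not 0 < need <= len(valid):
        raise ValueError("need must be between 1 and the segment size")
    count = sum(valid[:need])  # valid blocks in the first window
    best_start, best_count = 0, count
    for start in range(1, len(valid) - need + 1):
        # Slide right by one: add the block entering, drop the block leaving.
        count += valid[start + need - 1] - valid[start - 1]
        if count < best_count:
            best_start, best_count = start, count
    to_copy = [i for i in range(best_start, best_start + need) if valid[i]]
    return best_start, to_copy


# V = valid block, I = invalid (obsolete) block.
seg = [c == "V" for c in "VVIVIIVV"]
print(pick_partial_gc_window(seg, 4))  # (2, [3])
```

Here a full GC of this segment would copy all five valid blocks, while the sketch copies only block 3 to free the four-block run starting at index 2.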