• Title/Summary/Keyword: Original Benchmark


OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre; Do, Huy; Wang, Yue
    • Genomics & Informatics, v.17 no.2, pp.17.1-17.3, 2019
  • Text mining has become an important research method in biology, its original purpose being to extract biological entities such as genes, proteins, and phenotypic traits in order to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available for named-entity recognition tasks in this species. Because benchmarks for rice are scarce, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extracting information about gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of titles and abstracts extracted from scientific papers focusing on the rice species, downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task on rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using this dataset, to facilitate open comparison and evaluation of different approaches to the task.
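
    As a rough illustration of how such a corpus is typically consumed, the sketch below reads one document in the standard PubAnnotation JSON layout ("text" plus "denotations" with character spans and an "obj" label) and yields the annotated entity mentions. The file name is hypothetical; the field names follow the public PubAnnotation format, not anything specific to OryzaGP.

    ```python
    import json

    def load_pubannotation(path):
        """Yield (surface form, label) pairs from one PubAnnotation JSON document.

        Assumes the standard PubAnnotation layout: a "text" field plus a
        "denotations" list whose items carry a character "span" and an "obj"
        label. The file name used below is hypothetical.
        """
        with open(path, encoding="utf-8") as f:
            doc = json.load(f)
        text = doc["text"]
        for d in doc.get("denotations", []):
            begin, end = d["span"]["begin"], d["span"]["end"]
            yield text[begin:end], d["obj"]

    # Hypothetical usage on one annotated abstract:
    # for surface, label in load_pubannotation("oryzagp_12345.json"):
    #     print(label, surface)
    ```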

Optimal SVM learning method based on adaptive sparse sampling and granularity shift factor

  • Wen, Hui; Jia, Dongshun; Liu, Zhiqiang; Xu, Hang; Hao, Guangtao
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.4, pp.1110-1127, 2022
  • To improve the training efficiency and generalization performance of a support vector machine (SVM) on large-scale sets, an optimal SVM learning method based on adaptive sparse sampling and a granularity shift factor is presented. The proposed method combines sampling optimization with learner optimization. First, an adaptive sparse sampling method based on potential-function density clustering is designed to adaptively obtain sparse samples, which reduces the training set while effectively approximating the spatial structure of the original sample distribution. A granularity shift factor method is then constructed to optimize the SVM decision hyperplane, taking full account of the neighborhood information of each granularity region in the sparse sample set. Experiments on an artificial dataset and three benchmark datasets show that the proposed method achieves higher training efficiency while ensuring good generalization performance, verifying its effectiveness.
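
    The abstract does not give the potential-function clustering or the granularity shift factor in detail, so the following is only a sketch of the general sample-then-train idea, with per-class k-means centroids standing in for the paper's density-based sparse samples. All parameter choices are assumptions.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def sparse_sample_then_svm(X, y, n_samples=200, C=1.0):
        """Illustrative stand-in for sampling-then-SVM training.

        The paper's potential-function density clustering and granularity
        shift factor are not reproduced here; k-means centroids per class
        approximate "sparse samples that preserve spatial structure".
        """
        classes = np.unique(y)
        Xs, ys = [], []
        for cls in classes:
            Xc = X[y == cls]
            k = min(n_samples // len(classes), len(Xc))
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Xc)
            Xs.append(km.cluster_centers_)      # sparse representatives
            ys.append(np.full(k, cls))
        # Train the SVM on the reduced, structure-preserving sample set.
        return SVC(kernel="rbf", C=C).fit(np.vstack(Xs), np.concatenate(ys))
    ```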

A Remote Sensing Scene Classification Model Based on EfficientNetV2L Deep Neural Networks

  • Aljabri, Atif A.; Alshanqiti, Abdullah; Alkhodre, Ahmad B.; Alzahem, Ayyub; Hagag, Ahmed
    • International Journal of Computer Science & Network Security, v.22 no.10, pp.406-412, 2022
  • Scene classification of very high-resolution (VHR) imagery can attribute semantics to land cover in a variety of domains. Conventional techniques for remote sensing image classification have not addressed real-world application requirements. Recent research has demonstrated that deep convolutional neural networks (CNNs) are effective at extracting features due to their strong feature extraction capabilities. To improve classification performance, these approaches rely primarily on semantic information; because abstract, global semantic information makes it difficult for a network to correctly classify scene images with similar structures and high interclass similarity, they achieve low classification accuracy. We propose a VHR remote sensing image classification model that extracts global features from the original VHR image using an EfficientNetV2L CNN pre-trained to discriminate similar classes. The image is then classified using a multilayer perceptron (MLP). The method was evaluated on two benchmark remote sensing datasets: the 21-class UC Merced and the 38-class PatternNet. Compared with other state-of-the-art models, the proposed model significantly improves performance.
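
    A minimal sketch of the backbone-plus-MLP arrangement the abstract describes, using the stock Keras EfficientNetV2L with ImageNet weights as a frozen feature extractor. The head sizes, dropout rate, and frozen backbone are assumptions; the abstract only states that global features feed an MLP.

    ```python
    import tensorflow as tf

    def build_scene_classifier(num_classes, input_shape=(224, 224, 3)):
        """EfficientNetV2L backbone (frozen, ImageNet weights) + MLP head."""
        backbone = tf.keras.applications.EfficientNetV2L(
            include_top=False, weights="imagenet",
            input_shape=input_shape, pooling="avg")
        backbone.trainable = False  # use the pre-trained CNN as a global feature extractor
        model = tf.keras.Sequential([
            backbone,
            tf.keras.layers.Dense(512, activation="relu"),   # MLP classifier head
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # e.g. 21 classes for UC Merced, 38 for PatternNet:
    # model = build_scene_classifier(num_classes=21)
    ```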

Optimum design of retaining structures under seismic loading using adaptive sperm swarm optimization

  • Khajehzadeh, Mohammad; Kalhor, Amir; Tehrani, Mehran Soltani; Jebeli, Mohammadreza
    • Structural Engineering and Mechanics, v.81 no.1, pp.93-102, 2022
  • The optimum design of reinforced concrete cantilever retaining walls subjected to seismic loads is an extremely important challenge in structural and geotechnical engineering, especially in seismic zones. This study proposes an adaptive sperm swarm optimization (ASSO) algorithm for the economic design of retaining structures under static and seismic loading. The proposed ASSO algorithm uses a time-varying velocity damping factor to strike a fine balance between the explorative and exploitative behavior of the original method, and imposes a reasonable velocity limit to avoid divergence of the sperm movement. The proposed algorithm is benchmarked on a set of test functions and the results are compared with standard sperm swarm optimization (SSO) and other robust metaheuristics from the literature. For seismic optimization of retaining structures, the Mononobe-Okabe method is employed for the dynamic loading conditions, and the total construction cost of the structure is taken as the single objective function. The optimization constraints include both geotechnical and structural restrictions, and the design variables are the geometrical dimensions of the wall and the amount of steel reinforcement. Finally, optimization of two benchmark retaining structures under static and seismic loads using the ASSO algorithm is presented. According to the numerical results, ASSO provides better optimal solutions, and the designs it obtains cost up to 20% less than those of other methods from the literature.
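
    The two modifications the abstract names, a damping factor that decays over iterations and a cap on velocity, are generic swarm-optimizer ingredients, so they can be illustrated with a plain swarm loop. This is not the authors' exact ASSO update rule (the abstract does not give it); coefficients and the damping schedule are assumptions.

    ```python
    import numpy as np

    def swarm_minimize(f, bounds, n_agents=30, iters=200, seed=0):
        """Generic swarm minimizer with time-varying velocity damping and a velocity cap."""
        rng = np.random.default_rng(seed)
        lo, hi = np.asarray(bounds, dtype=float).T
        x = rng.uniform(lo, hi, (n_agents, len(lo)))
        v = np.zeros_like(x)
        pbest, pval = x.copy(), np.apply_along_axis(f, 1, x)
        g = pbest[pval.argmin()].copy()
        vmax = 0.2 * (hi - lo)                      # velocity limit: avoids divergence
        for t in range(iters):
            damp = 0.9 - 0.5 * t / iters            # time-varying damping: explore -> exploit
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = damp * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
            v = np.clip(v, -vmax, vmax)
            x = np.clip(x + v, lo, hi)
            fx = np.apply_along_axis(f, 1, x)
            better = fx < pval
            pbest[better], pval[better] = x[better], fx[better]
            g = pbest[pval.argmin()].copy()
        return g, pval.min()

    # e.g. on the sphere test function in 10 dimensions:
    # swarm_minimize(lambda z: (z ** 2).sum(), [(-5, 5)] * 10)
    ```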

The Improvement of Point Cloud Data Processing Program For Efficient Earthwork BIM Design (토공 BIM 설계 효율화를 위한 포인트 클라우드 데이터 처리 프로그램 개선에 관한 연구)

  • Kim, Heeyeon; Kim, Jeonghwan; Seo, Jongwon; Shim, Ho
    • Korean Journal of Construction Engineering and Management, v.21 no.5, pp.55-63, 2020
  • Earthwork automation has emerged as a promising technology in the construction industry, and the application of earthwork automation technology starts from the acquisition and processing of point cloud data of the site. Point cloud data comprise millions of points owing to the vast extent of a construction site, so the processing time of the original point cloud data is critical: generating a Digital Terrain Model (DTM) can take tens or hundreds of hours, and improving the processing time can greatly increase the efficiency of the modeling. Currently, a benchmark program (BP) is actively used in Korea as an integrated program for both point cloud data processing and BIM design, but it has several aspects to be modified and refined. This study modified the BP and developed an updated program by adopting a compile-based development environment, a newly designed UI/UX, and OpenGL, while maintaining the existing PCD processing functions and expanding compatibility with more PCD file formats. We conducted a comparative test of loading speed with different numbers of point cloud data, and the results showed a 92 to 99% performance increase in the developed program. This program can serve as a foundation for developing a program that narrows the gap between design and construction by integrating PCD and earthwork BIM functions in the future.
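
    As a small illustration of why point cloud loading speed varies so widely, the sketch below contrasts parsing an ASCII x/y/z file with reading the same points as raw binary; the file layouts are assumptions and unrelated to the paper's actual implementation.

    ```python
    import numpy as np

    def load_xyz_text(path):
        """Parse an ASCII 'x y z' file; simple but slow for millions of points."""
        return np.loadtxt(path, dtype=np.float32).reshape(-1, 3)

    def load_xyz_binary(path):
        """Read the same points stored as raw float32 triples; typically far
        faster, which is the kind of gain a loading-speed comparison measures.
        The binary layout assumed here is hypothetical."""
        return np.fromfile(path, dtype=np.float32).reshape(-1, 3)
    ```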

Proposal of Minimum Spanning Tree Algorithm using 2-Edges Connected Graph (2-간선 연결 그래프를 사용한 최소신장트리 알고리즘 제안)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.14 no.4, pp.233-241, 2014
  • This paper suggests a fast minimum spanning tree algorithm that simplifies the original graph to a 2-edge-connected graph and then uses the cycle property. The Borůvka algorithm first obtains a partial spanning tree by the cycle property on a 1-edge-connected graph, selecting the single minimum-weight edge (e) per vertex (v), and then selects the minimum-weight edge between partial spanning trees using the cut property. The Kruskal algorithm applies the cut property to all edges in ascending order of weight; the reverse-delete algorithm applies the cycle property to all edges in descending order. The Borůvka and Kruskal algorithms always perform |e| steps over all edges. The proposed algorithm first obtains a 2-edge-connected graph by selecting the 2 minimum-weight edges for each vertex; it then applies the cycle property to this graph, stopping when |e| = |v|-1. For 10 actual benchmark datasets, the proposed algorithm obtains the minimum spanning trees, and it reduces the number of trials by 60% compared with the Borůvka, Kruskal, and reverse-delete algorithms.
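
    A sketch of the reduction step the abstract describes: keep only each vertex's two lightest incident edges, then build the spanning tree on the reduced graph. For brevity the second step here uses a union-find pass (cut property) rather than the paper's heaviest-cycle-edge deletion; that the reduced graph still contains the MST is the paper's claim for its benchmarks, not something this sketch guarantees in general.

    ```python
    def mst_via_reduction(n, edges):
        """Spanning tree on a 2-min-edges-per-vertex reduction.

        edges: list of (weight, u, v) tuples over vertices 0..n-1.
        """
        # Step 1: per-vertex selection of the two minimum-weight incident edges.
        incident = {v: [] for v in range(n)}
        for e in edges:
            _, u, v = e
            incident[u].append(e)
            incident[v].append(e)
        reduced = set()
        for v in range(n):
            reduced.update(sorted(incident[v])[:2])
        # Step 2: spanning tree on the reduced edge set via union-find.
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]    # path halving
                a = parent[a]
            return a
        tree = []
        for w, u, v in sorted(reduced):
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v, w))
            if len(tree) == n - 1:               # stop at |e| = |v| - 1
                break
        return tree
    ```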

Efficient Process Checkpointing through Fine-Grained COW Management in New Memory based Systems (뉴메모리 기반 시스템에서 세밀한 COW 관리 기법을 통한 효율적 프로세스 체크포인팅 기법)

  • Park, Jay H.; Moon, Young Je; Noh, Sam H.
    • Journal of KIISE, v.44 no.2, pp.132-138, 2017
  • We design and implement a process-based fault recovery system to increase the reliability of new-memory-based computer systems. A rollback point is made at every context switch, to which a process can roll back upon a fault. In this study, a clone of the original process, which we refer to as a P-process (Persistent-process), is created as the rollback point. This design minimizes losses when a fault does occur. Specifically, first, execution loss is minimized because rollback points are created at every context switch, which bounds the lost execution. Second, by making use of the COW (Copy-On-Write) mechanism, only those parts of the process memory state that are modified (in page units) are copied, decreasing the overhead of creating the P-process. Our experimental results show that the overhead is approximately 5% in 8 out of 11 PARSEC benchmark workloads when the P-process is created at every context switch. Even for workloads with considerable overhead, we show that the overhead can be reduced by increasing the P-process generation interval.
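
    The cheapness of a COW clone is the same property fork() exploits on POSIX systems, so a user-level analog of the P-process idea can be sketched with fork: the child shares the parent's pages copy-on-write and serves as a dormant rollback point. The paper does this inside the kernel on persistent memory at every context switch; this sketch only mirrors the COW mechanism, and it ignores the startup race between the parent's signal and the child's handler installation.

    ```python
    import os
    import signal

    def run_with_cow_checkpoint(work):
        """Fork-based, user-level analog of a P-process rollback point (POSIX only)."""
        child = os.fork()
        if child == 0:                        # snapshot process (rollback point)
            signal.signal(signal.SIGUSR1, lambda *_: None)
            signal.pause()                    # sleep until the parent reports a fault
            work()                            # re-run from the snapshot's memory state
            os._exit(0)
        try:
            work()                            # parent runs ahead of the snapshot
            os.kill(child, signal.SIGKILL)    # success: discard the rollback point
        except Exception:
            os.kill(child, signal.SIGUSR1)    # fault: wake the snapshot to retry
        os.waitpid(child, 0)                  # reap the child either way
    ```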

An Incremental Multi Partition Averaging Algorithm Based on Memory Based Reasoning (메모리 기반 추론 기법에 기반한 점진적 다분할평균 알고리즘)

  • Yih, Hyeong-Il
    • Journal of IKEEE, v.12 no.1, pp.65-74, 2008
  • One popular method for pattern classification is the MBR (Memory-Based Reasoning) algorithm. Since it simply computes distances between a test pattern and the training patterns or hyperplanes stored in memory, and then assigns the class of the nearest training pattern, it is notorious for its memory usage and cannot learn additional information from new data. To overcome these problems, we propose an incremental learning algorithm (iMPA). iMPA divides the entire pattern space into a fixed number of partitions and generates representatives from each partition; it can also incorporate additional information from new data incrementally, without requiring access to the original training data. On benchmark datasets from the UCI Machine Learning Repository, the proposed method exhibits performance comparable to k-NN while storing far fewer patterns, and better results than the EACH system, which implements the NGE theory.
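
    A minimal sketch of the partition-averaging idea, assuming features scaled to [0, 1] and a fixed grid as the partitioning scheme (the paper's exact scheme may differ). Running means make the representatives updatable from new data without keeping the original patterns, which is the incremental property the abstract emphasizes.

    ```python
    import numpy as np

    class MultiPartitionAverager:
        """One mean "representative" per (partition, class); classify by the
        nearest representative. A hypothetical stand-in for iMPA."""

        def __init__(self, n_parts=4):
            self.n_parts = n_parts
            self.reps = {}   # (partition cell, class) -> (running mean, count)

        def _cell(self, x):
            # Map a pattern to a grid cell (assumes features scaled to [0, 1]).
            return tuple(np.minimum((x * self.n_parts).astype(int),
                                    self.n_parts - 1))

        def partial_fit(self, X, y):
            for x, cls in zip(X, y):
                key = (self._cell(x), cls)
                mean, n = self.reps.get(key, (np.zeros_like(x, dtype=float), 0))
                # Incremental (running) mean: no access to old patterns needed.
                self.reps[key] = ((mean * n + x) / (n + 1), n + 1)
            return self

        def predict(self, X):
            keys = list(self.reps)
            means = np.array([self.reps[k][0] for k in keys])
            return np.array([keys[int(np.linalg.norm(means - x, axis=1).argmin())][1]
                             for x in X])
    ```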


LDBAS: Location-aware Data Block Allocation Strategy for HDFS-based Applications in the Cloud

  • Xu, Hua; Liu, Weiqing; Shu, Guansheng; Li, Jing
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.1, pp.204-226, 2018
  • Big data processing applications have gradually been migrated into the cloud, owing to the advantages of cloud computing. The Hadoop Distributed File System (HDFS) is one of the fundamental support systems for big data processing on MapReduce-like frameworks such as Hadoop and Spark. Since HDFS is not aware of the co-location of virtual machines in the cloud, its default block allocation scheme does not fit cloud environments well, in two respects: data reliability loss and performance degradation. In this paper, we present a novel location-aware data block allocation strategy (LDBAS). LDBAS jointly optimizes data reliability and performance for upper-layer applications by allocating data blocks according to the locations and differing processing capacities of the virtual nodes in the cloud. We apply LDBAS to two stages of data allocation in HDFS in the cloud (initial data allocation and data recovery) and design the corresponding algorithms. Finally, we implement LDBAS in an actual Hadoop cluster and evaluate the performance with the benchmark suite BigDataBench. The experimental results show that LDBAS can guarantee the designed data reliability while reducing the job execution time of I/O-intensive applications in Hadoop by 8.9% on average, and by up to 11.2%, compared with the original Hadoop in the cloud.
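
    The paper's actual allocation and recovery algorithms are more involved, but the two constraints the abstract names can be sketched directly: spread replicas across distinct physical hosts, and prefer higher-capacity virtual nodes. The node-record fields below are hypothetical.

    ```python
    def place_replicas(nodes, n_replicas=3):
        """Illustrative location-aware replica placement in the spirit of LDBAS.

        nodes: list of dicts like {"name": ..., "host": physical-host id,
        "capacity": relative processing capacity}.
        """
        chosen, used_hosts = [], set()
        for node in sorted(nodes, key=lambda n: -n["capacity"]):
            if node["host"] not in used_hosts:   # avoid co-located virtual machines
                chosen.append(node["name"])
                used_hosts.add(node["host"])
            if len(chosen) == n_replicas:
                break
        return chosen

    # Hypothetical usage:
    # vms = [{"name": "vm1", "host": "h1", "capacity": 4},
    #        {"name": "vm2", "host": "h1", "capacity": 2},
    #        {"name": "vm3", "host": "h2", "capacity": 3},
    #        {"name": "vm4", "host": "h3", "capacity": 1}]
    # place_replicas(vms)   # -> ["vm1", "vm3", "vm4"]
    ```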

COARSE MESH FINITE DIFFERENCE ACCELERATION OF DISCRETE ORDINATE NEUTRON TRANSPORT CALCULATION EMPLOYING DISCONTINUOUS FINITE ELEMENT METHOD

  • Lee, Dong Wook; Joo, Han Gyu
    • Nuclear Engineering and Technology, v.46 no.6, pp.783-796, 2014
  • The coarse mesh finite difference (CMFD) method is applied to discontinuous finite element method (DFEM) based discrete ordinate calculations for source convergence acceleration. The three-dimensional (3-D) DFEM-Sn code FEDONA is developed for general geometry applications as a framework for the CMFD implementation. Detailed methods for applying the CMFD acceleration are established, such as the method to obtain the coarse-mesh flux and current by combining unstructured tetrahedral elements into rectangular coarse-mesh geometry, and the alternating calculation method to exchange the updated flux information between CMFD and DFEM-Sn. The partial-current-based CMFD (p-CMFD) is also implemented for comparison of acceleration performance, and a modified p-CMFD method is proposed to correct a weakness of the original p-CMFD formulation. The performance of CMFD acceleration is examined first on simple two-dimensional multigroup problems to investigate the effect of the problem and coarse-mesh sizes. It is shown that smaller coarse meshes are more effective in the CMFD acceleration, and that the modified p-CMFD is about as effective as the standard CMFD. The effectiveness of CMFD acceleration is then assessed on three-dimensional benchmark problems such as the IAEA (International Atomic Energy Agency) and C5G7MOX problems. A sufficiently converged solution is obtained within 7 outer iterations for the IAEA problem, which would require 175 iterations with normal DFEM-Sn calculations. It is concluded that the CMFD-accelerated DFEM-Sn method can be used effectively in practical eigenvalue calculations involving general geometries.
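
    For reference, the standard one-group CMFD closure that this kind of acceleration rests on is written below in the common textbook notation (not copied from the paper): the coarse-mesh surface current gets a finite-difference term plus a correction factor chosen so the diffusion-like coarse problem reproduces the transport (here DFEM-Sn) current.

    ```latex
    % CMFD surface-current relation between coarse cells i and i+1,
    % with \hat{D} fixed by matching the transport (Sn) solution.
    \[
    J_{i+1/2} = -\tilde{D}_{i+1/2}\,(\phi_{i+1} - \phi_i)
                - \hat{D}_{i+1/2}\,(\phi_{i+1} + \phi_i),
    \qquad
    \hat{D}_{i+1/2} =
      \frac{-J^{\mathrm{Sn}}_{i+1/2}
            - \tilde{D}_{i+1/2}\,(\phi^{\mathrm{Sn}}_{i+1} - \phi^{\mathrm{Sn}}_i)}
           {\phi^{\mathrm{Sn}}_{i+1} + \phi^{\mathrm{Sn}}_i}
    \]
    ```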