• Title/Summary/Keyword: datasets


A Novel Image Captioning based Risk Assessment Model

  • Jeon, Min Seong;Ko, Jae Pil;Cheoi, Kyung Joo
    • The Journal of Information Systems / v.32 no.4 / pp.119-136 / 2023
  • Purpose: We introduce a surveillance system designed to overcome the limitations of conventional surveillance systems, which typically focus on object-centric behavior analysis. Design/methodology/approach: The study introduces an approach to risk assessment in surveillance that uses image captioning to generate descriptive captions capturing the interactions among objects, actions, and spatial elements in observed scenes. To support this methodology, we built a dataset of [image, caption, danger score] triplets for training. We fine-tuned the BLIP-2 model on this dataset and used BERT to interpret the semantic content of the generated captions and assess risk levels. Findings: Experiments on our self-constructed datasets show that they provide rich information for risk assessment and that the model performs strongly on this task. Compared with models pre-trained on established datasets, our generated captions more thoroughly capture the object attributes, behaviors, and spatial context that a surveillance system needs. They also adapt to novel sentence structures, which makes them usable across a range of contexts.
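The second stage of the pipeline maps a generated caption to a danger score. The abstract uses BERT for this; the sketch below deliberately swaps in a much simpler bag-of-words linear regressor trained by least-mean-squares, only to illustrate the shape of the [image, caption, danger score] training records and the caption-to-score mapping. The `Sample` record, the captions, the 0..1 score scale, and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    image_path: str  # hypothetical path; the image would feed the captioner, not this scorer
    caption: str     # caption produced by the fine-tuned captioning model
    danger: float    # annotated danger score (a 0..1 scale is assumed here)


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train_scorer(samples, epochs=2000, lr=0.02):
    """Least-mean-squares fit of score = bias + sum of per-token weights.

    A deliberately simple stand-in for a BERT-based scorer, not the paper's model.
    """
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for s in samples:
            tokens = tokenize(s.caption)
            err = bias + sum(weights.get(t, 0.0) for t in tokens) - s.danger
            bias -= lr * err
            for t in tokens:  # a repeated token gets one update per occurrence
                weights[t] = weights.get(t, 0.0) - lr * err
    return weights, bias


def score(caption: str, weights: dict[str, float], bias: float) -> float:
    return bias + sum(weights.get(t, 0.0) for t in tokenize(caption))


samples = [
    Sample("img1.jpg", "a person holding a knife near a child", 0.9),
    Sample("img2.jpg", "a person walking a dog in a park", 0.1),
    Sample("img3.jpg", "a man climbing over a fence at night", 0.7),
    Sample("img4.jpg", "a child reading a book in a park", 0.0),
]
weights, bias = train_scorer(samples)
print(round(score("a person holding a knife", weights, bias), 2))
print(round(score("a child reading a book", weights, bias), 2))
```

A contextual encoder such as BERT is what lets the real system distinguish captions that share words but differ in meaning; a bag-of-words model cannot, which is why it serves here only as a placeholder.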

Handwritten Indic Digit Recognition using Deep Hybrid Capsule Network

  • Mohammad Reduanul Haque;Rubaiya Hafiz;Mohammad Zahidul Islam;Mohammad Shorif Uddin
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.89-94 / 2024
  • The Indian subcontinent is home to a multilingual population, and documents such as job application forms, passports, and number plates are often composed of text written in different languages and scripts. A single document page may therefore contain several different Indic numeral systems. For this reason, a generic recognizer capable of recognizing handwritten Indic digits written by diverse writers is needed. Moreover, while much work has been done on non-Indic numerals, particularly Roman numerals, research on Indic digits is limited. In addition, most research focuses only on the MNIST dataset or on a single dataset, either because of time constraints or because the model is tailored to a specific task. In this work, a hybrid model is proposed to recognize handwritten digit images from all available Indic benchmark datasets. The proposed method combines features learned automatically by a Capsule Network with a hand-crafted Bag of Features (BoF) extraction method. Along the way, we (1) analyze the successes of the method and (2) explore whether it performs well under more difficult conditions, i.e., noise, color, affine transformations, intra-class variation, and natural scenes. Experimental results show that the hybrid method achieves better accuracy than the Capsule Network alone.
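The two feature sources being bridged can be sketched in a few lines. Below, `squash` is the standard capsule-network nonlinearity, and `bof_histogram` is a basic hard-assignment Bag of Features encoding. The abstract does not say how the two are fused, so plain concatenation in `hybrid_feature` is an assumption, as are the toy codebook and descriptors (in practice a codebook is usually learned, for example by k-means over training descriptors).

```python
import math


def squash(v):
    """Capsule squash: keeps the direction of v, maps its length into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0 for _ in v]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def bof_histogram(descriptors, codebook):
    """Hand-crafted Bag of Features: assign each local descriptor to its nearest
    codeword and return the L1-normalized histogram of assignments."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def hybrid_feature(capsule_output, descriptors, codebook):
    """Hypothetical fusion: concatenate squashed capsule output with the BoF histogram."""
    return squash(capsule_output) + bof_histogram(descriptors, codebook)


codebook = [[0.0, 0.0], [1.0, 1.0]]                  # toy 2-word codebook
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8]]   # toy local descriptors
print(hybrid_feature([3.0, 4.0], descriptors, codebook))
```

The fused vector would then feed a classifier over the digit classes; that stage is omitted here.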

Slime mold and four other nature-inspired optimization algorithms in analyzing the concrete compressive strength

  • Yinghao Zhao;Hossein Moayedi;Loke Kok Foong;Quynh T. Thi
    • Smart Structures and Systems / v.33 no.1 / pp.65-91 / 2024
  • This work examines the use of five optimization techniques to find the best-fit model for predicting the strength of a concrete mixture: the Slime Mold Algorithm (SMA), Black Hole Algorithm (BHA), Multi-Verse Optimizer (MVO), Vortex Search (VS), and Whale Optimization Algorithm (WOA). In MATLAB, a hybrid learning strategy that combines least-squares estimation with backpropagation is used to train an artificial neural network. Of the 103 samples, 72 are used for training and 31 for testing. All data are analyzed with a multilayer perceptron (MLP), and the results are verified by comparison. For the best-fit models of SMA-MLP, BHA-MLP, MVO-MLP, VS-MLP, and WOA-MLP, the coefficient of determination (R²) is 0.9603, 0.9679, 0.9827, 0.9841, and 0.9770 in the training phase and 0.9567, 0.9552, 0.9594, 0.9888, and 0.9695 in the testing phase, respectively. In addition, the best-fit structures for SMA, BHA, MVO, VS, and WOA (each combined with the MLP) are obtained with population sizes of 450, 500, 250, 150, and 500, respectively. Among all the options considered, VS could offer the strongest prediction network for training the MLP.
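The evaluation above rests on a 72/31 train/test split of 103 samples and the coefficient of determination, R² = 1 − SS_res / SS_tot. A minimal sketch of both follows; the seeded random shuffle is an assumption, since the abstract does not say how the samples were assigned to each set.

```python
import random


def train_test_split(samples, n_train=72, seed=0):
    """Shuffle with a fixed seed (assumed procedure) and cut into train/test parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    return [samples[i] for i in idx[:n_train]], [samples[i] for i in idx[n_train:]]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


train, test = train_test_split(list(range(103)))
print(len(train), len(test))
print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Predicting the mean of the targets gives R² = 0, and a perfect fit gives R² = 1, which is the scale on which the training and testing values above are reported.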

A Study on Recent Trends in Building Linked Data for Overseas Libraries: Focusing on Published Datasets, Reused Vocabulary, and Interlinked External Datasets

  • Sung-Sook Lee
    • Journal of the Korean Society for Library and Information Science / v.56 no.4 / pp.5-28 / 2022
  • In this study, Linked Data (LD) construction cases of overseas libraries were analyzed with a focus on published datasets, reused vocabularies, and interlinked external datasets, and the results were used to derive basic data for domestic libraries' LD construction plans. The analysis of 21 library cases showed that overseas libraries have built substantial authority LD and launched new services using their published LD. To this end, overseas libraries, under the leadership of the library, collaborated with other libraries and cultural institutions at the regional and national levels, and published specialized datasets based on this cooperation. Overseas libraries used Schema.org to increase the visibility of their published LD, and used BIBFRAME to subdivide descriptions, define various entities, and build LD based on those entities. They used the defined entities to link related information, display results, and support browsing and bulk download. Overseas libraries were also concerned with keeping interlinked external datasets continuously up to date, and used external data directly to enrich catalog information. Based on the derived implications, this study proposes points that domestic libraries should consider when publishing LD. The results can serve as basic data when domestic libraries plan LD services or upgrade existing ones.
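As a rough illustration of publishing a catalog record with Schema.org terms and interlinking it with external datasets, the sketch below emits a JSON-LD description using the real Schema.org types and properties `Book`, `Person`, `name`, `author`, and `sameAs`. All URIs are hypothetical placeholders on `example.org`, and the BIBFRAME side of the practice described above is not shown.

```python
import json


def book_to_jsonld(local_id, title, author_name, author_uri, same_as):
    """Describe one catalog record with Schema.org terms; sameAs links it to
    equivalent resources in external datasets (placeholder URIs here)."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": f"https://library.example.org/resource/{local_id}",
        "name": title,
        "author": {"@type": "Person", "@id": author_uri, "name": author_name},
        "sameAs": list(same_as),
    }


record = book_to_jsonld(
    "b000123",
    "An Example Title",
    "Example Author",
    "https://library.example.org/authority/p000456",  # hypothetical authority URI
    [
        "https://external-a.example.org/work/1",  # hypothetical external dataset link
        "https://external-b.example.org/entity/2",
    ],
)
print(json.dumps(record, indent=2))
```

Pointing `author` at an authority URI rather than a bare string is what lets the authority LD mentioned above be reused across records.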

Optimization of Uneven Margin SVM to Solve Class Imbalance in Bankruptcy Prediction

  • Sung Yim Jo;Myoung Jong Kim
    • Information Systems Review / v.24 no.4 / pp.23-40 / 2022
  • Although the Support Vector Machine (SVM) has been used in various fields such as bankruptcy prediction, in class-imbalanced problems the hyperplane learned by an SVM can be severely skewed toward the minority class. This degrades performance because the majority-class region expands into the minority-class region. This study proposes an optimized uneven margin SVM (OPT-UMSVM), which combines a threshold-moving or post-scaling method with UMSVM, to overcome the limitations of the traditional even margin SVM (EMSVM) on class-imbalanced problems. OPT-UMSVM readjusts the skewed hyperplane toward the majority class and generalizes better than EMSVM, improving minority-class sensitivity and optimizing overall performance. To validate OPT-UMSVM, 10-fold cross-validation was performed on five sub-datasets with different imbalance ratios. The empirical results yielded two main findings. First, UMSVM did little to improve on EMSVM for balanced datasets but greatly outperformed it for severely imbalanced datasets. Second, compared with EMSVM and conventional UMSVM, OPT-UMSVM performed better on both balanced and imbalanced datasets, with a significant performance difference especially on severely imbalanced datasets.
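Threshold moving leaves the trained model untouched and only shifts the decision cut-off applied to its scores. The sketch below assumes SVM decision values are already available and picks the cut-off that maximizes the geometric mean of sensitivity and specificity; the G-mean criterion, the candidate set, and the toy scores are assumptions, and training the uneven-margin SVM itself is not shown.

```python
import math


def gmean(scores, labels, threshold):
    """Geometric mean of sensitivity (minority label 1) and specificity (majority label 0)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    return math.sqrt((tp / pos) * (tn / neg))


def move_threshold(scores, labels):
    """Try the default cut-off 0.0, every observed score, and one value below all
    scores (predict everything positive); return the cut-off with the best G-mean."""
    candidates = sorted(set(scores) | {0.0, min(scores) - 1.0})
    return max(candidates, key=lambda t: gmean(scores, labels, t))


# Toy decision values whose minority-class scores are skewed negative.
scores = [-2.0, -1.5, -1.2, -0.9, -0.3, -0.6, -0.2, 0.4]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
best = move_threshold(scores, labels)
print(best, gmean(scores, labels, 0.0), gmean(scores, labels, best))
```

In practice the cut-off should be chosen on validation folds rather than on the data used to report performance.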

Integration and Reanalysis of Four RNA-Seq Datasets Including BALF, Nasopharyngeal Swabs, Lung Biopsy, and Mouse Models Reveals Common Immune Features of COVID-19

  • Rudi Alberts;Sze Chun Chan;Qian-Fang Meng;Shan He;Lang Rao;Xindong Liu;Yongliang Zhang
    • IMMUNE NETWORK / v.22 no.3 / pp.22.1-22.25 / 2022
  • Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread worldwide, causing a pandemic that has been ongoing since its emergence in late 2019. Considerable effort has been devoted to understanding the pathogenesis of COVID-19 in the hope of developing better therapeutic strategies. Transcriptome analysis using technologies such as RNA sequencing has become a common approach for studying host immune responses to SARS-CoV-2. Although a substantial amount of information can be gathered from transcriptome analysis, the different analysis tools used across studies may lead to conclusions that differ dramatically from one another. Here, we re-analyzed four RNA-sequencing datasets of COVID-19 samples, including human bronchoalveolar lavage fluid, nasopharyngeal swabs, lung biopsy, and hACE2 transgenic mice, using the same standardized method. The results showed that common features of COVID-19 include upregulation of chemokines (including CCL2, CXCL1, and CXCL10), the inflammatory cytokine IL-1β, and the alarmins S100A8/S100A9, which are associated with dysregulated innate immunity marked by abundant neutrophil and mast cell accumulation. Another common feature observed was downregulation of chemokine receptor genes associated with impaired adaptive immunity, such as lymphopenia. In addition, a few interferon-stimulated genes, but no type I IFN genes, were enriched in COVID-19 samples compared with their respective controls in these datasets. These features are consistent with results from single-cell RNA sequencing studies in the field. Thus, our re-analysis of the RNA-seq datasets revealed common features of dysregulated immune responses to SARS-CoV-2 and shed light on the pathogenesis of COVID-19.
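One reason a single standardized method matters is that even basic steps such as library-size normalization change the reported up- and downregulation. The sketch below shows counts-per-million (CPM) normalization and a pseudocount log2 fold change for one case and one control sample. It is a generic illustration, not the authors' pipeline; dedicated tools such as DESeq2 or edgeR add replicate-aware statistics, and the counts below are made up.

```python
import math


def cpm(counts):
    """Counts per million: scale one sample's gene counts by its library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(case_counts, control_counts, pseudocount=1.0):
    """log2((CPM_case + pseudocount) / (CPM_control + pseudocount)) for shared genes."""
    case, ctrl = cpm(case_counts), cpm(control_counts)
    return {gene: math.log2((case[gene] + pseudocount) / (ctrl[gene] + pseudocount))
            for gene in case.keys() & ctrl.keys()}


case = {"CXCL10": 300, "IL1B": 200, "GAPDH": 500}     # hypothetical counts
control = {"CXCL10": 50, "IL1B": 50, "GAPDH": 900}    # hypothetical counts
for gene, lfc in sorted(log2_fold_change(case, control).items()):
    print(gene, round(lfc, 2))
```

Positive values indicate higher expression in the case sample after normalization, negative values lower.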

Nudging of Vertical Profiles of Meteorological Parameters in One-Dimensional Atmospheric Model: A Step Towards Improvements in Numerical Simulations

  • Subrahamanyam, D. Bala;Rani, S. Indira;Ramachandran, Radhika;Kunhikrishnan, P. K.
    • Ocean Science Journal / v.43 no.4 / pp.165-173 / 2008
  • In this article, we describe a simple yet effective method for inserting observational datasets into a mesoscale atmospheric model run in a one-dimensional configuration through nudging. To demonstrate the effectiveness of this technique, vertical profiles of meteorological parameters obtained from GLASS Sonde launches on the small island of Kaashidhoo in the Republic of Maldives are injected into a mesoscale atmospheric model, the Advanced Regional Prediction System (ARPS), and the model-simulated parameters are compared with the available observational datasets. Analysis of one-time nudging in the model simulations over Kaashidhoo shows that this technique reasonably improves the simulations within a time window of +6 to +12 h, while its impact on simulations at +18 h and beyond becomes negligible.
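Nudging is commonly formulated as Newtonian relaxation, dX/dt = F(X) + G (X_obs − X), where F is the model's own tendency and G sets how strongly each level is pulled toward the observed profile. The abstract does not give the exact formulation used in ARPS, so the sketch below assumes this standard form with forward-Euler steps over a nudging window; the relaxation coefficient, time step, and temperature values are hypothetical.

```python
def nudge_profile(model, obs, tendency, g=1.0 / 3600.0, dt=60.0, steps=60):
    """Forward-Euler integration of dX/dt = F(X) + G * (X_obs - X) at every level.

    model, obs: values per vertical level; tendency: stand-in for the model physics F.
    g is the relaxation coefficient in 1/s (here: e-folding time of about one hour).
    """
    x = list(model)
    for _ in range(steps):
        x = [xi + dt * (tendency(xi) + g * (oi - xi)) for xi, oi in zip(x, obs)]
    return x


model_t = [300.0, 295.0, 290.0]    # hypothetical model temperatures (K) at three levels
sonde_t = [301.0, 296.5, 289.0]    # hypothetical sonde observations at the same levels
print(nudge_profile(model_t, sonde_t, tendency=lambda x: 0.0))
```

With no model tendency, each level closes the gap to the observation by a factor of (1 − G·dt) per step; once nudging stops, the model physics alone drive the profile again, which is consistent with the effect fading at longer lead times.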

Finding Top-k Answers in Node Proximity Search Using Distribution State Transition Graph

  • Park, Jaehui;Lee, Sang-Goo
    • ETRI Journal / v.38 no.4 / pp.714-723 / 2016
  • Considerable attention has been given to processing graph data in recent years. Efficiently computing node proximity is one of the most challenging problems for many applications, such as recommendation systems and social networks. For large-scale, mutable datasets and user queries, top-k query processing has gained significant interest. This paper presents a novel method to find top-k answers in a node proximity search based on the well-known Personalized PageRank (PPR) measure. First, we introduce a distribution state transition graph (DSTG) to depict the iterative steps for solving the PPR equation. Second, we propose a weight distribution model of a DSTG to capture the states of intermediate PPR scores and their distribution. Using a DSTG, we can selectively follow and compare multiple random paths of different lengths to find the most promising nodes. Moreover, we prove that the results of our method are equivalent to the PPR results. Comparative performance studies on two real datasets show that our method is practical and accurate.
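For reference, the PPR scores that the DSTG method is stated to reproduce can be computed by plain power iteration on p = α·e_source + (1 − α)·pP, where P spreads a node's mass uniformly over its out-edges; the top-k answers are then the k highest-scoring nodes. This is the baseline computation, not the paper's DSTG algorithm. The restart probability α = 0.15, the fixed iteration count, and sending dangling-node mass back to the source are assumptions.

```python
import heapq


def personalized_pagerank(graph, source, alpha=0.15, iters=100):
    """Power iteration for PPR with restart probability alpha to `source`.

    graph: dict mapping every node to a list of its out-neighbors.
    """
    p = {v: 0.0 for v in graph}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[source] += alpha
        for u, out in graph.items():
            if out:
                share = (1 - alpha) * p[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                nxt[source] += (1 - alpha) * p[u]  # dangling node: return mass to source
        p = nxt
    return p


def top_k(scores, k, exclude=()):
    """The k nodes with the highest proximity scores, optionally skipping some nodes."""
    return heapq.nlargest(k, (v for v in scores if v not in exclude), key=scores.get)


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
ppr = personalized_pagerank(graph, "a")
print(top_k(ppr, 2, exclude={"a"}))
```

Power iteration touches every edge on every pass, which is the cost that selectively following promising paths, as in the DSTG approach, aims to reduce for top-k queries.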

A new clustering algorithm based on the connected region generation

  • Feng, Liuwei;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2619-2643 / 2018
  • In this paper, a new clustering algorithm based on connected region generation (CRG-clustering) is proposed. It is an effective and robust clustering approach based on the connectivity of points and their neighbors. In the new algorithm, a connected region generating (CRG) algorithm is developed to obtain the connected regions and a set of isolated points. Each connected region corresponds to a homogeneous cluster, which theoretically ensures the separability of an arbitrary data set. Then, a region expansion strategy and a consensus criterion are used to handle the points in the isolated point set. Experimental results on synthetic and real-world datasets show that the proposed algorithm performs well and is insensitive to noise.
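The core idea of grouping points by the connectivity of their neighborhoods can be illustrated with a generic stand-in: link every pair of points within distance `eps`, take connected components by breadth-first search, and set aside components that are too small as isolated points. This is not the CRG algorithm itself; its neighbor definition, region expansion strategy, and consensus criterion are not reproduced, and `eps` and `min_size` are hypothetical parameters.

```python
import math
from collections import deque


def connected_regions(points, eps, min_size=2):
    """Connected components of the eps-neighborhood graph.

    Returns (regions, isolated): regions are sorted index lists with at least
    min_size points; isolated is a flat list of the remaining point indices.
    """
    n = len(points)
    neighbors = [[j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    seen = [False] * n
    regions, isolated = [], []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component, queue = [], deque([start])
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in neighbors[i]:
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
        if len(component) >= min_size:
            regions.append(sorted(component))
        else:
            isolated.extend(component)
    return regions, isolated


pts = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.4, 10), (50, 50)]
print(connected_regions(pts, eps=0.6))
```

Chaining is what lets a region follow an arbitrarily shaped cluster: the first and third points are farther apart than `eps` but end up in the same region through the middle point.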