
Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on Social Network Services (SNS). In particular, the spread of SNS to mobile devices has produced massive volumes of data that strongly influence society. This phenomenon is unprecedented, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies all three defining conditions: the amount of data (volume), the speed of data input and output (velocity), and the variety of data types (variety). Because SNS Big Data covers society as a whole, the trends of issues discovered in it can serve as an important new source of value. In this study, we design and build a Twitter Issue Tracking System (TITS) to meet the needs of SNS Big Data analysis. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) it provides the set of topic keywords corresponding to each day's ranking; (2) it visualizes the daily time series of a topic over one month; (3) it shows the importance of each topic in a treemap based on a scoring system and frequency; and (4) it visualizes the daily time series of a keyword entered as a search query. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process many unrefined forms of unstructured data. Such analysis also requires the latest big data technologies, such as the Hadoop distributed system or NoSQL databases (an alternative to relational databases), to process large amounts of real-time data rapidly. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from a single node to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is especially attractive to the Big Data community because it helps analysts examine data easily and clearly. TITS therefore uses the d3.js library as its visualization tool. This library is designed to create Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); this binding makes interaction with the data easy and is useful for presenting real-time data streams with smooth animation. TITS also uses Bootstrap, a framework of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is built with these libraries and lets users detect issues on Twitter easily and intuitively. The proposed work demonstrates the superiority of our issue detection technique by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS), and on this basis confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The experiments were conducted on nearly 150 million tweets posted in Korea during March 2013.
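Function (4), the daily keyword time series, can be illustrated with a small sketch. This is not the TITS implementation: it assumes tweets arrive as (date, text) pairs, uses a hypothetical English stop-word list, and substitutes a simple whitespace tokenizer for the Korean noun extraction the abstract mentions. The Hadoop and MongoDB layers are omitted.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical stop-word list; the abstract does not give the one TITS uses.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "on"}


def tokenize(text):
    """Lowercase whitespace tokenizer, standing in for Korean noun extraction (assumption)."""
    return [token.strip(".,!?#@").lower() for token in text.split()]


def daily_keyword_counts(tweets, keyword):
    """Return [(day, count)] for `keyword`, sorted by day, after stop-word removal.

    `tweets` is an iterable of (datetime.date, str) pairs. Only days that
    contain at least one tweet appear in the result.
    """
    per_day = defaultdict(Counter)
    for day, text in tweets:
        tokens = [t for t in tokenize(text) if t and t not in STOP_WORDS]
        per_day[day].update(tokens)
    target = keyword.lower()
    # Counter returns 0 for missing keys, so days without the keyword get 0.
    return [(day, per_day[day][target]) for day in sorted(per_day)]


# Hypothetical sample data for illustration only.
sample_tweets = [
    (date(2013, 3, 1), "The election debate is on"),
    (date(2013, 3, 1), "Election results soon!"),
    (date(2013, 3, 2), "Snow in Seoul"),
]
print(daily_keyword_counts(sample_tweets, "election"))
```

The resulting (day, count) list is the kind of series a d3.js line chart could plot; in a real deployment the per-day counting would run as a distributed job rather than in a single process.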

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports (다중 최소 임계치 기반 빈발 패턴 마이닝의 성능분석)

  • Ryang, Heungmo;Yun, Unil
    • Journal of Internet Computing and Services / v.14 no.6 / pp.1-8 / 2013
  • Data mining techniques are used to find important and meaningful information in huge databases, and pattern mining is one of the most significant of them. Pattern mining discovers useful patterns in such databases. Frequent pattern mining, one kind of pattern mining, extracts from a database the patterns whose frequencies are not lower than a minimum support threshold; these are called frequent patterns. Traditional frequent pattern mining applies a single minimum support threshold to the whole database. This single-support model implicitly assumes that all items in the database have the same nature. In real-world applications, however, items can differ in their characteristics, so a pattern mining technique that reflects those characteristics is required. In a framework that ignores the nature of items, the single threshold must be set very low to mine patterns that contain rare items, which yields too many patterns, including meaningless ones. Conversely, if the threshold is set too high, patterns containing rare items cannot be mined at all. This dilemma is called the rare item problem. To solve it, early studies proposed approximate approaches that split the data into several groups according to item frequencies or that group related rare items together. Because these methods are approximate, however, they cannot find all frequent patterns, including rare frequent patterns. The pattern mining model with multiple minimum supports was therefore proposed to solve the rare item problem. In this model, each item has its own minimum support threshold, called the MIS (Minimum Item Support), which is calculated from the item's frequency in the database.
By applying the MIS, the multiple minimum supports model finds all rare frequent patterns without generating meaningless patterns or losing significant ones. Candidate patterns are generated during the mining process. In the single minimum support model, the frequency of every candidate is compared with the same single threshold, so the characteristics of the items that make up a candidate are not reflected, and the rare item problem arises. The multiple minimum supports model addresses this by using, as the threshold for a candidate pattern, the smallest MIS value among the items it contains, so that the pattern's characteristics are taken into account. To mine frequent patterns, including rare ones, efficiently under this rule, tree-based algorithms for the multiple minimum supports model sort the items in a tree in descending MIS order, whereas tree-based algorithms for the single minimum support model sort them in descending frequency order. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and evaluate its performance against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. The experimental results show that the multiple minimum supports algorithm outperforms the single minimum support algorithm but requires more memory to store MIS information. Both compared algorithms also scale well.
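The thresholding rule can be made concrete with a brute-force sketch. The MIS formula used here, MIS(i) = max(β · sup(i), LS), is an assumption borrowed from the MSapriori formulation; the abstract says only that MIS is computed from item frequencies. The miner enumerates every itemset rather than building a tree, so it illustrates the rule, not the efficient algorithms the paper evaluates. The item names, β, and LS values are hypothetical.

```python
from itertools import combinations


def item_supports(transactions):
    """Relative support (fraction of transactions containing the item) of every item."""
    n = len(transactions)
    counts = {}
    for transaction in transactions:
        for item in set(transaction):
            counts[item] = counts.get(item, 0) + 1
    return {item: count / n for item, count in counts.items()}


def compute_mis(supports, beta, least_support):
    """Assumed MSapriori-style rule: MIS(i) = max(beta * sup(i), LS)."""
    return {item: max(beta * sup, least_support) for item, sup in supports.items()}


def pattern_support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    items = set(pattern)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def mine_multiple_min_supports(transactions, mis):
    """Brute-force miner: a pattern is frequent if its support >= the min MIS of its items.

    Items are enumerated in MIS-descending order (ties broken by name), mirroring
    the ordering described for tree-based MIS algorithms. Every combination is
    checked because Apriori's downward-closure pruning does not hold unchanged
    when thresholds differ per item.
    """
    ordered = sorted(mis, key=lambda item: (-mis[item], item))
    frequent = {}
    for k in range(1, len(ordered) + 1):
        for pattern in combinations(ordered, k):
            threshold = min(mis[item] for item in pattern)
            sup = pattern_support(pattern, transactions)
            if sup >= threshold:
                frequent[pattern] = sup
    return frequent


# Hypothetical transactions: "caviar" is rare, "eggs" is moderately frequent.
transactions = [
    ["bread", "milk"], ["bread", "milk"], ["bread", "milk"], ["bread", "milk"],
    ["bread"], ["bread"], ["milk", "eggs"], ["eggs"], ["eggs"], ["bread", "caviar"],
]
mis = compute_mis(item_supports(transactions), beta=0.5, least_support=0.1)
multi = mine_multiple_min_supports(transactions, mis)
# A single low threshold for every item, for comparison.
single_low = mine_multiple_min_supports(transactions, {item: 0.1 for item in mis})
print(sorted(multi))
print(sorted(single_low))
```

With these inputs both runs keep the rare pattern {bread, caviar}, but only the single low threshold also accepts {eggs, milk}, whose support (0.1) falls below the smaller MIS of its items (0.15). This is the trade-off the abstract calls the rare item problem.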

A Comparative Study on the Effective Deep Learning for Fingerprint Recognition with Scar and Wrinkle (상처와 주름이 있는 지문 판별에 효율적인 심층 학습 비교연구)

  • Kim, JunSeob;Rim, BeanBonyka;Sung, Nak-Jun;Hong, Min
    • Journal of Internet Computing and Services / v.21 no.4 / pp.17-23 / 2020
  • Biometric information, that is, measurements of human characteristics, has attracted great attention as a highly reliable security technology because it cannot be stolen or lost. Among the types of biometric information, fingerprints are mainly used in fields such as identity verification and identification. If a fingerprint image contains a problem that hinders authentication, such as a wound, wrinkle, or moisture, a fingerprint expert can identify the problem directly in a preprocessing step and solve it by applying an image processing algorithm suited to that problem. Artificial intelligence software that distinguishes fingerprint images with cuts and wrinkles makes it easy to check whether cuts or wrinkles are present, so that the image can then be improved by selecting an appropriate algorithm. In this study, we built a database of 17,080 fingerprint images by acquiring all fingerprints of 1,010 students from the Royal University of Cambodia, 600 subjects from the Sokoto open dataset, and 98 Korean students. To determine whether the images in the database contain injuries or wrinkles, we established labeling criteria, and experts validated the data. The training and test datasets consisted of the Cambodian and Sokoto data in an 8:2 ratio, and the data of the 98 Korean students were used as the validation dataset. Using the constructed dataset, we implemented five CNN-based architectures (Classic CNN, AlexNet, VGG-16, ResNet50, and YOLO v3) and compared them to find the model that classified the images best. Among the five architectures, ResNet50 showed the best performance, at 81.51%.
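The dataset sizes and the split protocol can be checked with a short sketch. It assumes that "all fingerprints" means ten images per person, which reproduces the reported total (10 × (1,010 + 600 + 98) = 17,080). It also assumes an image-level random shuffle; the abstract does not say whether the split was made per image or per subject. The source labels, record layout, and seed are hypothetical.

```python
import random

FINGERS_PER_PERSON = 10  # assumption: "all fingerprints" means ten images per person


def build_records(source, num_subjects):
    """One (source, subject_id, finger_index) record per fingerprint image."""
    return [(source, subject, finger)
            for subject in range(num_subjects)
            for finger in range(FINGERS_PER_PERSON)]


def split_dataset(seed=0, train_parts=8, test_parts=2):
    """8:2 train/test split of Cambodian + Sokoto images; Korean images held out for validation."""
    pool = build_records("cambodia", 1010) + build_records("sokoto", 600)
    random.Random(seed).shuffle(pool)  # image-level shuffle (assumption)
    cut = len(pool) * train_parts // (train_parts + test_parts)
    validation = build_records("korea", 98)
    return pool[:cut], pool[cut:], validation


train, test, validation = split_dataset()
print(len(train), len(test), len(validation), len(train) + len(test) + len(validation))
```

Under these assumptions the split yields 12,880 training, 3,220 test, and 980 validation images. Splitting by subject instead would keep all ten images of a person on the same side of the split.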

A Study of Guide System for Cerebrovascular Intervention (뇌혈관 중재시술 지원 가이드 시스템에 관한 연구)

  • Lee, Sung-Gwon;Jeong, Chang-Won;Yoon, Kwon-Ha;Joo, Su-Chong
    • Journal of Internet Computing and Services / v.17 no.1 / pp.101-107 / 2016
  • Owing to recent advances in digital imaging technology, interventional equipment has become widespread. Image-guided interventional procedures insert a tiny catheter and guide wire into the body, so high-quality X-ray images are needed to improve the effectiveness and safety of treatment. However, the resulting increase in radiation exposure has become a problem, and studies to improve the performance of X-ray detectors are being actively pursued. In addition, these interventions rely on angiographic images and 3D medical image processing as references. In this paper, we propose a guidance system to support such interventions. The system addresses the limitations of existing 2D medical images of vessels affected by cerebrovascular disease, and it provides real-time tracking of the interventional catheter and guide wire together with the optimal route to the target lesion. The system consists of a medical image acquisition unit, an image processing unit, and a display device. In the experimental environment, the guidance services of the proposed system were tested using X-ray images of a brain phantom (a complete intracranial model with aneurysms, ref H+N-S-A-010). For image processing, a reference image was generated from the cerebral blood vessel model by applying a Laplacian-based algorithm and rendering the DICOM data with volume ray casting. The $A^*$ algorithm was used to compute the tracking path for the catheter and guide wire. The results show the locations of the catheter and guide wire provided by the proposed system, which is expected to serve as a useful guide for future intervention services.
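The path-finding step can be illustrated with a textbook $A^*$ search. This sketch rests on assumptions the abstract does not state: the segmented vessel model is reduced to a 2D binary grid (1 = vessel lumen), moves are 4-connected with unit cost, and the Manhattan distance serves as the heuristic. The system described above derives its vessel model from DICOM data; how it discretizes that model for $A^*$ is not stated, and the grid below is hypothetical.

```python
import heapq


def a_star(grid, start, goal):
    """Shortest 4-connected path through cells equal to 1 (vessel lumen).

    Returns the list of (row, col) cells from start to goal, or None if the
    goal is unreachable. The start cell is assumed to lie inside a vessel.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:  # walk parent links back to the start
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry superseded by a cheaper one
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# Hypothetical vessel mask: 1 = lumen, 0 = tissue.
vessel = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
print(a_star(vessel, (0, 0), (3, 0)))
```

In this grid the only route from the top-left entry to the bottom-left target winds through the central branch, and a target cell outside the lumen returns None.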