• Title/Summary/Keyword: Big Data Computing


A Study on the Agenda Rank-Order Correlation between Twitter and Portal News about Sewol Ferry Catastrophe (세월호 참사에 대한 트위터와 포털뉴스의 의제 순위 상관관계 연구)

  • Kim, Shin-Ku;Choi, Eun-Kyoung
    • Journal of Internet Computing and Services / v.16 no.3 / pp.105-116 / 2015
  • The Sewol ferry catastrophe of April 16, 2014 was unprecedented in its sociopolitical implications, which reverberated throughout Korean society. Mindful of these distinct characteristics, this study examines the salience of the agendas portrayed in Twitter and Portal News coverage of the disaster, and the correlation between the attribute-specific agendas of the two media, using the agenda rank-order correlation method. Extraction and analysis of big data revealed, first, that although the hypothesis of little difference in salience among the main agendas of Twitter and Portal News was rejected, the rank-order correlation between the main agendas of the two media proved high, indicating that Twitter agendas exert influence over those of Portal News. Second, for the five main agendas on the incident, the attribute-specific agendas of the two media differed in salience, with low corresponding rank-order correlations. These results indicate that, at the level of attribute-specific agendas, Twitter and Portal News have little influence over each other.
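
The rank-order correlation this kind of study relies on is typically Spearman's rho. Below is a minimal sketch, not the authors' code; the agenda names and rankings are invented purely for illustration:

```python
# A minimal sketch of agenda rank-order correlation (Spearman's rho).
# The agendas and their ranks are hypothetical, not data from the paper.
from scipy.stats import spearmanr

agendas = ["rescue", "government response", "cause", "victims", "media coverage"]
twitter_rank = [1, 2, 3, 4, 5]      # rank of each agenda on Twitter
portal_rank  = [2, 1, 3, 5, 4]      # rank of the same agenda in portal news

rho, p_value = spearmanr(twitter_rank, portal_rank)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")
# A rho near +1 means the two media rank the agendas similarly.
```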

A step-by-step service encryption model based on routing pattern in case of IP spoofing attacks on clustering environment (클러스터링 환경에 대한 IP 스푸핑 공격 발생시 라우팅 패턴에 기반한 단계별 서비스 암호화 모델)

  • Baek, Yong-Jin;Jeong, Won-Chang;Hong, Suk-Won;Park, Jae-Hung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.6 / pp.580-586 / 2017
  • Establishing a big data service environment requires both cloud-based network technology and clustering technology to improve the efficiency of information access. Such cloud-based networks and clustering environments can provide a variety of valuable information in real time, which makes them an attractive target for attackers attempting illegal access. In particular, an attacker attempting IP spoofing can analyze information about the mutually trusting hosts that constitute a cluster and then attack a system in the cluster directly. It is therefore necessary to detect and respond to illegal attacks quickly, and a security policy stronger than that of the security systems built and operated on conventional single systems is required. In this paper, we monitor routing pattern changes and use them as detection information, enabling an active response to illegal attacks and efficient information service in this network environment. In addition, step-by-step encryption based on the routing information generated during detection makes it possible to maintain stable information service without the frequent disconnections otherwise needed to reset the service.
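
The paper's detection signal is a change in routing pattern; one common concrete proxy for a path change is the hop count implied by a packet's TTL, since a spoofed source usually arrives over a different route. The sketch below illustrates that general idea with assumed names and thresholds, and is not the authors' model:

```python
# Hypothetical sketch: flag IP-spoofing candidates when the hop count
# observed for a source address deviates from its learned routing pattern.
INITIAL_TTLS = (32, 64, 128, 255)   # common OS default TTLs

def hop_count(observed_ttl: int) -> int:
    # Infer hops from the smallest plausible initial TTL.
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

learned = {}  # source IP -> expected hop count (the "routing pattern")

def check_packet(src_ip: str, observed_ttl: int, tolerance: int = 2) -> bool:
    """Return True if the packet matches the learned routing pattern."""
    hops = hop_count(observed_ttl)
    if src_ip not in learned:
        learned[src_ip] = hops          # learning phase
        return True
    return abs(learned[src_ip] - hops) <= tolerance

# A packet claiming a trusted source but arriving over a different path
# fails the check and could trigger the stronger encryption step.
learned["10.0.0.5"] = 3
print(check_packet("10.0.0.5", observed_ttl=49))  # 64-49 = 15 hops -> False
```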

Designing a Platform Model for Building MyData Ecosystem (마이데이터 생태계 구축을 위한 플랫폼 모델 설계)

  • Kang, Nam-Gyu;Choi, Hee-Seok;Lee, Hye-Jin;Han, Sang-Jun;Lee, Seok-Hyoung
    • Journal of Internet Computing and Services / v.22 no.2 / pp.123-131 / 2021
  • The Fourth Industrial Revolution was triggered by data-driven digital technologies such as AI and big data, and there is a rapid movement to extend data utilization into the privacy domain, which was previously treated only as a protected area. Through the revision of Korea's three data privacy acts (the "Data 3 Act"), laws and systems were established that allow personal information to be transferred and utilized freely with the data subject's consent. However, a platform is needed that supports the entire process from collecting personal information to managing and utilizing it. In this paper, we propose a platform model that can be applied to building a MyData ecosystem based on personal information. We describe six essential functional requirements for building MyData platforms, along with procedures and methods for implementing them: consent, sharing/downloading/receipt of data, data collection and utilization, user authentication, API gateway, and platform services. We also illustrate the application of the MyData platform model to a real-world mobility support service for the underprivileged.
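
As a rough illustration of the consent requirement at the heart of such a platform, a gateway might validate every data request against a stored consent record. The field names and scopes below are invented for the example, not taken from the paper:

```python
# Hypothetical sketch: an API-gateway-style consent check for a MyData
# platform. The record structure and scope strings are illustrative only.
from datetime import datetime, timezone

consents = {
    # (subject_id, scope) -> expiry of the granted consent
    ("user-123", "mobility:trip-history"): datetime(2026, 1, 1, tzinfo=timezone.utc),
}

def authorize(subject_id: str, scope: str) -> bool:
    """Allow a data transfer only if an unexpired consent exists."""
    expiry = consents.get((subject_id, scope))
    return expiry is not None and datetime.now(timezone.utc) < expiry

print(authorize("user-123", "mobility:trip-history"))  # True until expiry
print(authorize("user-123", "health:records"))         # False: no consent
```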

Keyword Network Visualization for Text Summarization and Comparative Analysis (문서 요약 및 비교분석을 위한 주제어 네트워크 가시화)

  • Kim, Kyeong-rim;Lee, Da-yeong;Cho, Hwan-Gue
    • Journal of KIISE / v.44 no.2 / pp.139-147 / 2017
  • Most of the information on the Internet is textual, so one of the main topics in the large-scale document analysis required in the "big data" era is the development of automated understanding systems for textual data; accordingly, automating keyword extraction for text summarization and abstraction is a typical research problem. However, simply listing a few keywords is insufficient to reveal the complex semantic structure of general texts. In this paper, a text-visualization method is developed that constructs a graph by computing degrees of relatedness among the keywords selected from the target text. Two construction models are proposed for computing the relation degree among keywords and providing the edge relation: an influence-interval model and a word-distance model. The graph visualized from the keyword-derived edge relation is more flexible and useful for displaying the meaning structure of the target text, and this abstract graph enables fast and easy understanding of it. The authors' experiments showed that the proposed abstract-graph model is superior to a keyword list for attaining a semantic and comparative understanding of text.
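
The word-distance model weighs keyword pairs by how close their occurrences are in the text. A plausible simplification is sketched below; the 1/distance scoring is an assumed stand-in, not the paper's exact relation-degree formula:

```python
# Sketch of a word-distance keyword graph: edge weight grows when two
# keywords co-occur close together. The 1/distance score is an assumption.
from itertools import combinations
from collections import defaultdict

def keyword_graph(tokens, keywords, window=10):
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in keywords:
            positions[tok].append(i)

    edges = defaultdict(float)
    for a, b in combinations(sorted(positions), 2):
        for i in positions[a]:
            for j in positions[b]:
                d = abs(i - j)
                if 0 < d <= window:
                    edges[(a, b)] += 1.0 / d   # closer pairs weigh more
    return dict(edges)

text = "big data needs graph models because graph views summarize big text".split()
print(keyword_graph(text, {"big", "graph", "text"}))
```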

Relevance Feedback using Region-of-interest in Retrieval of Satellite Images (위성영상 검색에서 사용자 관심영역을 이용한 적합성 피드백)

  • Kim, Sung-Jin;Chung, Chin-Wan;Lee, Seok-Lyong;Kim, Deok-Hwan
    • Journal of KIISE:Databases / v.36 no.6 / pp.434-445 / 2009
  • Content-based image retrieval (CBIR) is a retrieval technique that uses the contents of images. However, in contrast to text data, multimedia data are ambiguous, and there is a big gap between the system's low-level representation and a human's high-level concept: points that are near each other in the feature vector space are not always similar to the user. This is called the semantic-gap problem, and it degrades the performance of image retrieval. To address it, relevance feedback (RF), which uses the user's feedback information, is employed. However, existing RF does not consider the user's region of interest (ROI), so irrelevant regions are used in computing new query points; because the system does not know the user's ROI, RF proceeds at the image level. In this paper, we propose a new ROI-based RF method that guides the user to select an ROI in relevant images for the retrieval of complex satellite images, improving retrieval accuracy by computing more accurate query points. We also propose a pruning technique that further improves retrieval accuracy by using the images not selected by the user. Experiments show the effectiveness of the proposed ROI RF and the pruning technique.
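
Relevance feedback commonly moves the query point toward the features of relevant examples. The sketch below uses a Rocchio-style update restricted to ROI feature vectors as an assumed simplification; the paper's exact query-point computation may differ:

```python
# Sketch: Rocchio-style relevance feedback where only features extracted
# from user-selected ROIs contribute to the new query point. The update
# weights (alpha, beta, gamma) are conventional values, not the paper's.
import numpy as np

def update_query(query, roi_feats, nonselected_feats,
                 alpha=1.0, beta=0.75, gamma=0.15):
    q = alpha * query
    if len(roi_feats):
        q += beta * np.mean(roi_feats, axis=0)           # pull toward ROIs
    if len(nonselected_feats):
        q -= gamma * np.mean(nonselected_feats, axis=0)  # push away (pruning idea)
    return q

query = np.array([0.2, 0.5, 0.1])
roi = np.array([[0.3, 0.6, 0.1], [0.4, 0.55, 0.05]])
neg = np.array([[0.9, 0.1, 0.8]])
print(update_query(query, roi, neg))
```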

Violation Detection of Application Network QoS using Ontology in SDN Environment (SDN 환경에서 온톨로지를 활용한 애플리케이션 네트워크의 품질 위반상황 식별 방법)

  • Hwang, Jeseung;Kim, Ungsoo;Park, Joonseok;Yeom, Keunhyuk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.6 / pp.7-20 / 2017
  • The advancement of cloud computing and big data and the considerable growth of traffic have increased the complexity of existing networks and exposed the inefficiency of their management. The software-defined networking (SDN) environment was developed to solve this problem: SDN enables network equipment to be controlled through programming by separating its forwarding and control functions. Accordingly, several studies have sought to improve SDN controller performance, including methods for connecting legacy equipment with SDN, packet management methods for efficient data communication, and methods for distributing controller load in a centralized architecture. However, there is insufficient research on controlling SDN with respect to the quality experienced by the applications using the network. To support establishing and changing routing paths that meet the required network service quality, a mechanism is needed that identifies network requirements based on a contract for application network service quality, collects information about the current network status, and identifies violations of network service quality. This study proposes a method of identifying quality violations on network paths through an ontology, so as to ensure the network service quality of applications and provide efficient services in an SDN environment.
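
Stripped of the ontology machinery, the core check is a comparison of measured path metrics against a contract. The sketch below shows that comparison with invented metric names and thresholds; the paper encodes the contract and measurements as ontology concepts instead:

```python
# Simplified sketch of QoS-violation detection: compare measured path
# metrics against a per-application contract. Metric names and limits
# are hypothetical; the paper models these as ontology individuals.
contract = {"latency_ms": ("max", 50.0), "bandwidth_mbps": ("min", 100.0)}

def violations(measured: dict) -> list:
    found = []
    for metric, (kind, limit) in contract.items():
        value = measured.get(metric)
        if value is None:
            continue
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            found.append((metric, value, limit))
    return found

print(violations({"latency_ms": 72.3, "bandwidth_mbps": 140.0}))
# -> [('latency_ms', 72.3, 50.0)]  # would trigger rerouting by the controller
```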

An Efficient Dual Queue Strategy for Improving Storage System Response Times (저장시스템의 응답 시간 개선을 위한 효율적인 이중 큐 전략)

  • Hyun-Seob Lee
    • Journal of Internet of Things and Convergence / v.10 no.3 / pp.19-24 / 2024
  • Recent advances in large-scale data processing technologies such as big data, cloud computing, and artificial intelligence have increased the demand for high-performance storage devices in data centers and enterprise environments. In particular, the data response speed of storage devices is a key factor that determines overall system performance. Solid-state drives (SSDs) based on the Non-Volatile Memory Express (NVMe) interface are gaining traction, but new bottlenecks are emerging when large data I/O requests from multiple hosts are handled simultaneously. SSDs typically process host requests by stacking them sequentially in an internal queue; when requests with long transfer lengths are processed first, shorter requests wait longer, increasing the average response time. Data-transfer timeouts and data partitioning have been proposed to mitigate this problem, but they do not provide a fundamental solution. In this paper, we propose a dual-queue-based scheduling scheme (DQBS) that manages the data transfer order with two queues, one ordered by request arrival and the other by transfer length. The request arrival time and the transfer length are then considered together to determine an efficient data transmission order. This enables balanced processing of long and short requests, reducing the overall average response time. Simulation results show that the proposed method outperforms the existing sequential processing method. This study presents a scheduling technique that maximizes data transfer efficiency in a high-performance SSD environment, which is expected to contribute to the development of next-generation high-performance storage systems.
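
A minimal sketch of the dual-queue idea follows: one FIFO queue preserves arrival order, a priority queue orders by transfer length, and a combined score picks the next request. The scoring rule and weights here are assumptions for illustration; the paper defines its own selection criterion:

```python
# Hypothetical sketch of a dual-queue scheduler (DQBS-like): requests sit
# in an arrival-order queue and a length-ordered queue; the next request
# is chosen by weighing waiting time against transfer length.
import heapq
from collections import deque

class DualQueueScheduler:
    def __init__(self, wait_weight=1.0, length_weight=0.5):
        self.fifo = deque()        # (arrival_time, req_id, length)
        self.by_length = []        # heap keyed by transfer length
        self.w, self.l = wait_weight, length_weight

    def submit(self, now, req_id, length):
        self.fifo.append((now, req_id, length))
        heapq.heappush(self.by_length, (length, req_id))

    def next_request(self, now):
        if not self.fifo:
            return None
        oldest = self.fifo[0]
        shortest = self.by_length[0]
        # Prefer the shortest request unless the oldest has waited too long.
        if self.w * (now - oldest[0]) > self.l * shortest[0]:
            choice = oldest[1]
        else:
            choice = shortest[1]
        self._remove(choice)
        return choice

    def _remove(self, req_id):
        self.fifo = deque(r for r in self.fifo if r[1] != req_id)
        self.by_length = [r for r in self.by_length if r[1] != req_id]
        heapq.heapify(self.by_length)

s = DualQueueScheduler()
s.submit(0, "A", length=1024)   # long request arrives first
s.submit(1, "B", length=8)      # short request arrives later
print(s.next_request(now=2))    # -> "B": the short request is served first
```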

Recommendation of Best Empirical Route Based on Classification of Large Trajectory Data (대용량 경로데이터 분류에 기반한 경험적 최선 경로 추천)

  • Lee, Kye Hyung;Jo, Yung Hoon;Lee, Tea Ho;Park, Heemin
    • KIISE Transactions on Computing Practices / v.21 no.2 / pp.101-108 / 2015
  • This paper presents the implementation of a system that recommends empirical best routes based on the classification of large trajectory data. As location-based services become widely used, location and trajectory data are expected to grow to big-data scale, from which the best empirical routes can be extracted. Large trajectory data are clustered into groups of similar routes using the Hadoop MapReduce framework; the clustered route groups are stored and managed by a DBMS, supporting rapid responses to end-user requests. The goal is to find the best routes based on collected real data, not the ideal shortest path on a map. We implemented 1) an Android application that collects trajectories from users, 2) an Apache Hadoop MapReduce program that clusters large trajectory data, and 3) a service application that queries a start and destination from a web server and displays the recommended routes on mobile phones. We validated the approach using real data collected over five days and compared the results with commercial navigation systems. Experimental results show that the empirical best route is better than the routes recommended by commercial navigation systems.
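
The clustering step maps each trajectory to a route-group key and reduces each group to a representative route. The sketch below emulates that MapReduce flow in plain Python; the grid-based key is an assumed stand-in for the paper's similarity measure:

```python
# Sketch of MapReduce-style trajectory grouping in plain Python.
# Trajectories sharing a (rounded) start/end cell map to one route group;
# the grid key is an assumed stand-in for the paper's similarity measure.
from collections import defaultdict

def mapper(trajectory):
    # trajectory: list of (lat, lon) points; key on coarse start/end cells
    (slat, slon), (elat, elon) = trajectory[0], trajectory[-1]
    key = (round(slat, 2), round(slon, 2), round(elat, 2), round(elon, 2))
    return key, trajectory

def reducer(group):
    # Representative route: the shortest trajectory here, standing in
    # for a proper medoid/centroid computation over the group.
    return min(group, key=len)

groups = defaultdict(list)
for traj in [
    [(35.2311, 129.0841), (35.2330, 129.0900), (35.2405, 129.1002)],
    [(35.2309, 129.0843), (35.2401, 129.1004)],
]:
    key, value = mapper(traj)
    groups[key].append(value)

best = {k: reducer(v) for k, v in groups.items()}
print({k: len(v) for k, v in best.items()})  # one representative per group
```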

Recent Technique Analysis, Infant Commodity Pattern Analysis Scenario and Performance Analysis of Incremental Weighted Maximal Representative Pattern Mining (점진적 가중화 맥시멀 대표 패턴 마이닝의 최신 기법 분석, 유아들의 물품 패턴 분석 시나리오 및 성능 분석)

  • Yun, Unil;Yun, Eunmi
    • Journal of Internet Computing and Services / v.21 no.2 / pp.39-48 / 2020
  • Data mining techniques have been proposed to find meaningful and useful information efficiently. In big data environments in particular, as data accumulates across applications, related pattern mining methods have been developed. Recently, instead of analyzing only static data already stored in files or databases, mining dynamic data generated incrementally in real time has become a more interesting research area, because such dynamic data can be read only once; accordingly, how to mine these dynamic data efficiently has been studied. Moreover, approaches for mining representative patterns, such as maximal pattern mining, have been proposed because mining otherwise generates a huge number of result patterns. As another issue, item weights are used in weighted pattern mining to discover more meaningful patterns in the real world; in practice, item profits, costs, and so on can serve as weights. In this paper, we analyze weighted maximal pattern mining approaches for incrementally generated data, maximal representative pattern mining techniques, and incremental pattern mining methods. We then present application scenarios for analyzing the commodity patterns required for infants by applying weighted representative pattern mining. Furthermore, we evaluate the performance of state-of-the-art algorithms. The results show that the incremental weighted maximal pattern mining technique outperforms both incremental weighted pattern mining and weighted maximal pattern mining.
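
To make the terms concrete: a pattern's weighted support scales its frequency by the average weight of its items, and a maximal pattern is a frequent pattern with no frequent proper superset. The sketch below illustrates both notions with invented transactions and weights; it is not one of the surveyed algorithms:

```python
# Illustrative sketch of weighted maximal patterns over toy data; the
# transactions, item weights, and threshold are invented for the example.
from itertools import combinations

transactions = [{"diaper", "wipes", "formula"},
                {"diaper", "wipes"},
                {"diaper", "formula"}]
weights = {"diaper": 0.9, "wipes": 0.6, "formula": 0.8}
MIN_WSUP = 1.0

def weighted_support(pattern):
    count = sum(1 for t in transactions if pattern <= t)
    avg_w = sum(weights[i] for i in pattern) / len(pattern)
    return count * avg_w

items = set(weights)
frequent = [frozenset(p) for n in range(1, len(items) + 1)
            for p in combinations(items, n)
            if weighted_support(frozenset(p)) >= MIN_WSUP]

# Maximal: keep only frequent patterns with no frequent proper superset.
maximal = [p for p in frequent if not any(p < q for q in frequent)]
print([set(p) for p in maximal])
# -> [{'diaper', 'wipes'}, {'diaper', 'formula'}] (order may vary)
```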

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values (신뢰값 기반 대용량 트리플 처리를 위한 스파크 환경에서의 RDFS 온톨로지 추론)

  • Park, Hyun-Kyu;Lee, Wan-Gon;Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE / v.43 no.1 / pp.87-95 / 2016
  • Recently, owing to the development of the Internet and electronic devices, the amount of available knowledge and information has increased enormously, and studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or a knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. The collected ontology data thus contains inherent uncertainty, and reasoning over such data can make the reasoning results vague. To address this uncertainty issue, we propose an RDFS reasoning approach that utilizes confidence values indicating degrees of uncertainty in the collected data. Unlike conventional reasoning approaches, which do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark to compute confidence values for the data inferred through RDFS-based reasoning, applying uncertainty-estimation methods; the computed confidence values represent the uncertainty in the inferred data. To evaluate the approach, ontology reasoning was carried out over the LUBM standard benchmark data set with arbitrary confidence values added to the ontology triples. Experimental results indicate that the proposed system can process the largest data set, LUBM3000, in 1,179 seconds, inferring 350K triples.
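
A core RDFS rule, rdfs9 (if C rdfs:subClassOf D and x rdf:type C, then x rdf:type D), can be expressed in Spark as an RDD join. The sketch below propagates confidence with min(), which is one plausible combination function and an assumption here; the paper applies its own uncertainty-estimation method:

```python
# Sketch of confidence-aware RDFS reasoning (rule rdfs9) as a Spark RDD
# join. Combining confidences with min() is an assumed choice, not
# necessarily the estimator used in the paper.
from pyspark import SparkContext

sc = SparkContext("local", "rdfs9-confidence")

# (subject, predicate, object, confidence)
triples = sc.parallelize([
    ("Student",  "rdfs:subClassOf", "Person",  0.9),
    ("alice",    "rdf:type",        "Student", 0.8),
])

subclass = (triples.filter(lambda t: t[1] == "rdfs:subClassOf")
                   .map(lambda t: (t[0], (t[2], t[3]))))   # C -> (D, c1)
types = (triples.filter(lambda t: t[1] == "rdf:type")
                .map(lambda t: (t[2], (t[0], t[3]))))      # C -> (x, c2)

# rdfs9: x rdf:type C, C rdfs:subClassOf D  =>  x rdf:type D
inferred = (types.join(subclass)
                 .map(lambda kv: (kv[1][0][0], "rdf:type",
                                  kv[1][1][0],
                                  min(kv[1][0][1], kv[1][1][1]))))

print(inferred.collect())   # [('alice', 'rdf:type', 'Person', 0.8)]
sc.stop()
```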