• Title/Summary/Keyword: Big data Problem

Neighbor Cooperation Based In-Network Caching for Content-Centric Networking

  • Luo, Xi;An, Ying
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.5 / pp.2398-2415 / 2017
  • Content-Centric Networking (CCN) is a new Internet architecture in which routing and caching are centered on content. Through its receiver-driven, connectionless communication model, CCN natively supports seamless node mobility and scalable content acquisition. In-network caching is one of the core technologies in CCN, and research on efficient caching schemes is attracting growing interest. To address the unbalanced cache load distribution of some existing caching strategies, this paper presents a neighbor-cooperation-based in-network caching scheme. In this scheme, the node with the highest betweenness centrality on the content delivery path is selected as the central caching node, and its ego network is selected as the caching area. When the caching node has insufficient resources, part of its cached content is selected and transferred to an appropriate neighbor, chosen by jointly considering factors such as available node cache, cache replacement rate, and link stability between nodes. Simulation results show that the scheme effectively enhances the utilization of cache resources and improves the cache hit rate and average access cost.
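
The node-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topology, the node names, and the `betweenness` helper (Brandes' algorithm for unweighted, undirected graphs) are assumptions made for illustration, and the cache-transfer step is omitted because the abstract does not specify how the factors are weighted.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph
    given as {node: set_of_neighbors}."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                        # nodes in non-decreasing distance from s
        preds = {v: [] for v in graph}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):         # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical topology: a delivery path with two extra neighbors at r2.
graph = {
    "consumer": {"r1"},
    "r1": {"consumer", "r2"},
    "r2": {"r1", "r3", "x", "y"},
    "r3": {"r2", "producer"},
    "producer": {"r3"},
    "x": {"r2"},
    "y": {"r2"},
}
delivery_path = ["consumer", "r1", "r2", "r3", "producer"]

scores = betweenness(graph)
# Central caching node: highest betweenness among nodes on the delivery path.
central = max(delivery_path, key=scores.get)
# Caching area: the ego network of the central node (itself plus its neighbors).
caching_area = {central} | graph[central]
print(central, sorted(caching_area))
```

In this toy topology the branching router `r2` lies on the most shortest paths, so it becomes the central caching node and its direct neighbors form the caching area.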

Order selection based on scaled lift (척도화 향상도에 근거한 처방 선택)

  • Park, Cheol-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.2 / pp.227-234 / 2011
  • In this study, we propose order selection methods based on scaled lift. They are intended to overcome the problem that the lift used by Park and Kim (2010) is unbounded, which makes it hard to judge how large (or small) a given lift value really is. The first scaled lift simply scales lift so that it takes values between 0 and 1; the second scales lift-1 so that it takes values between -1 and 1. In other words, the first method only scales lift, whereas the second method centers and scales it. We apply order selection methods based on scaled lift to acute appendicitis patients in an emergency room and compare the results with those based on lift.

Design of Real-Time Vehicle Information Management Platform Using an IoT-based Gateway (IoT기반 게이트웨이를 활용한 실시간 차량 정보 관리 플랫폼 설계)

  • Chang, Moon-Soo;Lee, Jeong-Il
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.548-551 / 2018
  • For most vehicles, maintenance is arranged by the user only after a problem occurs. Users cannot operate the vehicle while it is being serviced, and if it is a revenue-generating vehicle they must bear the resulting economic loss. Collecting vehicle information in real time, identifying problems that could arise from the collected big data, and providing service in advance rather than after the fact can help keep vehicles in operation and reduce economic loss. In this paper, we therefore design a platform that uses an IoT-based gateway to collect vehicle information in real time and organize it as big data, so that vehicle information can be provided in real time.

Urban Growth Analysis Through Satellite Image and Zonal Data (도시성장분석상 위상영상자료와 구역자료의 통합이용에 관한 연구)

  • Kim, Jae-Ik;Hwang, Kook-Woong;Chung, Hyun-Wook;Yeo, Chang-Hwan
    • Journal of the Korean Association of Geographic Information Studies / v.7 no.3 / pp.1-12 / 2004
  • Satellite images are now widely used to identify and predict urban spatial growth. They provide essential information on the horizontal expansion of urbanized areas, but their usefulness is very limited for analyzing the density of urban development. In contrast, zonal data, typically census data, provide various kinds of density information within a given zone, such as population, number of houses, and floor information. The problem with zonal data in analyzing urban growth is that the zones are too big: the minimum administrative unit, the Dong, is too large to match the satellite images. This study tries to derive synergy effects by combining the merits of the two information sources, image data and zonal data. For this purpose, the basic statistical unit (census block size) is used as the zonal unit. By comparing image and zonal data for the Daegu metropolitan area in 1985 and 2000, this study concludes that urban growth patterns are better explained when the two types of data are properly used together.

Hybrid Simulated Annealing for Data Clustering (데이터 클러스터링을 위한 혼합 시뮬레이티드 어닐링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Beom-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.2 / pp.92-98 / 2017
  • Data clustering determines groups of patterns in a dataset using a similarity measure and is one of the most important and difficult techniques in data mining. Clustering can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm, though popular and efficient, is sensitive to initialization and can become stuck in a local optimum because it is a hill-climbing method. It is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. Therefore, a robust and efficient clustering algorithm is needed to find the global optimum rather than a local one, especially now that large amounts of data are collected from many IoT (Internet of Things) devices. This paper proposes a new Hybrid Simulated Annealing (HSA) method that combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. The proposed method balances intensification and diversification to find the global optimal solution in big data clustering. The performance of HSA is validated by experiment and analysis on the Iris, Wine, Glass, and Vowel datasets from the UCI machine learning repository, in comparison with previous studies. The proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) perform better than KSA (K-means+SA), SA, and K-means in our simulations. Our method significantly improves the accuracy and efficiency of finding the global optimal clustering solution for complex, real-time, and costly data mining processes.
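
The K-means+SA+K-means ordering named in this abstract can be sketched roughly as follows. This is not the paper's HSA: the Gaussian centroid perturbation, the geometric cooling schedule, every parameter value, and the toy points are assumptions chosen only to show how a diversifying SA phase can sit between two K-means refinement phases.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(math.dist(p, centroids[nearest(p, centroids)]) ** 2 for p in points)

def kmeans(points, centroids, iters=20):
    """Plain Lloyd iterations (intensification); empty clusters keep their centroid."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    return centroids

def anneal(points, centroids, rng, temp=1.0, cooling=0.95, steps=200, step=0.5):
    """Simulated annealing over centroid positions (diversification).
    Returns the best centroid set seen, never worse than the input."""
    current, cur_cost = list(centroids), sse(points, centroids)
    best, best_cost = current, cur_cost
    for _ in range(steps):
        k = rng.randrange(len(current))
        cand = list(current)
        cand[k] = tuple(x + rng.gauss(0, step) for x in current[k])
        cost = sse(points, cand)
        # Accept improvements always, worse moves with Metropolis probability.
        if cost < cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
        temp *= cooling
    return best

def ksak(points, k, seed=0):
    """K-means, then SA, then K-means again."""
    rng = random.Random(seed)
    start = rng.sample(points, k)
    refined = kmeans(points, start)
    explored = anneal(points, refined, rng)
    return kmeans(points, explored)

# Toy data: two well-separated groups (hypothetical).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = ksak(points, k=2)
print(centroids, sse(points, centroids))
```

Because `anneal` returns the best solution it has seen and Lloyd iterations never increase the squared error, the final result of this sketch is never worse than K-means alone from the same starting centroids.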

A Heuristic for Drone-Utilized Blood Inventory and Delivery Planning (드론 활용 혈액 재고/배송계획 휴리스틱)

  • Jang, Jin-Myeong;Kim, Hwa-Joong;Son, Dong-Hoon
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.3 / pp.106-116 / 2021
  • This paper considers a joint problem of blood inventory planning at hospitals and blood delivery planning from blood centers to hospitals, in order to alleviate the imbalance in blood service between big and small hospitals that occurs in practice. The joint problem is to determine delivery timing, delivery quantity, delivery means (such as medical drones and legacy blood vehicles), and inventory levels so as to minimize inventory and delivery costs while satisfying hospitals' blood demand over a planning horizon. The problem is formulated as a mixed integer programming model that considers practical constraints such as blood lifespan and drone specifications. To solve it, this paper employs a Lagrangian relaxation technique and suggests a time-efficient Lagrangian heuristic algorithm. The performance of the suggested heuristic is evaluated through computational experiments on randomly generated problem instances that mimic real data from the Korean Red Cross in Seoul and other reliable sources. The results show that the suggested heuristic obtains near-optimal solutions in a shorter amount of time. In addition, we discuss how changes in blood lifespan, the number of planning periods, the number of hospitals, and drone specifications affect the performance of the suggested Lagrangian heuristic.

Exploring the Job Competencies of Data Scientists Using Online Job Posting (온라인 채용정보를 이용한 데이터 과학자 요구 역량 탐색)

  • Jin, Xiangdan;Baek, Seung Ik
    • The Journal of Society for e-Business Studies / v.27 no.2 / pp.1-20 / 2022
  • As the global business environment changes rapidly with the 4th industrial revolution, jobs that did not exist before are emerging. Among them, the job that companies are most interested in is the data scientist. As information and communication technologies occupy more of our lives, data on both online and offline activities are stored in computers every hour, generating big data. Companies put a lot of effort into discovering new opportunities in such big data, and the data scientist is the new job that has emerged from these efforts. Demand for data scientists, a promising job that leads the big data era, is constantly increasing, but supply is still insufficient. Although data analysis technologies and tools that anyone can easily use have been introduced, companies still have great difficulty finding suitable experts. One of the main reasons the shortage of data scientists is so serious is a lack of understanding of the data scientist's job. Therefore, in this study, we explore the job competencies of a data scientist by qualitatively analyzing actual job postings from companies. The study finds that data scientists need not only the technical and system skills previously required of software engineers and system analysts, but also the business-related and interpersonal skills required of business consultants and project managers. The results are expected to provide basic guidelines for people who are interested in the data scientist profession and for companies that want to hire data scientists.

Effective Utilization of Data based on Analysis of Spatial Data Mining (공간 데이터마이닝 분석을 통한 데이터의 효과적인 활용)

  • Kim, Kibum;An, Beongku
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.3 / pp.157-163 / 2013
  • Data mining is a useful technology that supports new discoveries based on pattern analysis and a variety of linkages between data, and it is currently used in various fields such as finance, marketing, and medicine. In this paper, we propose a method for the effective utilization of data based on spatial data mining analysis. We make use of basic data on foreigners living in Seoul. However, unlike data in other areas, these data are classified as sensitive information and raise legal problems such as personal information protection, so we use basic statistical data that do not contain personal information. The main features and contributions of the proposed method are as follows. First, we can use Big Data as information in a variety of ways and can classify and cluster Big Data through refinement. Second, we can use this kind of information for future decision-making and for finding new patterns. In the performance evaluation, we use a visual approach based on thematic graphs. The results show that analysis using data mining technology can support new discoveries of patterns and results.

Survey on Out-Of-Domain Detection for Dialog Systems (대화시스템 미지원 도메인 검출에 관한 조사)

  • Jeong, Young-Seob;Kim, Young-Min
    • Journal of Convergence for Information Technology / v.9 no.9 / pp.1-12 / 2019
  • Dialog systems are becoming a new way of communication between humans and computers. A dialog system takes human voice as input and gives a proper response in voice or performs an action. Although there are several well-known dialog system products (e.g., Amazon Echo, Naver Wave), they commonly suffer from the problem of out-of-domain utterances. If a system detects out-of-domain utterances poorly, user satisfaction is significantly harmed. There have been some studies aimed at solving this problem, but it still needs intensive study. In this paper, we give an overview of previous studies of out-of-domain detection from three points of view: dataset, feature, and method. As there have been relatively few studies of this topic due to the lack of datasets, we believe that the most important next research step is to construct and share a large dataset for dialog systems, and then to try state-of-the-art techniques on that dataset.

A Technology Analysis Model using Dynamic Time Warping

  • Choi, JunHyeog;Jun, SungHae
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.113-120 / 2015
  • Technology analysis examines technological data, such as patents and papers, for a given technology field. From the results of technology analysis, we can obtain novel knowledge for R&D planning and management. Diverse statistical methods can be used for technology analysis. Time series analysis is one of the efficient approaches, because most technologies are researched and developed over time, so much technological data takes the form of time series. In this paper, we propose a methodology for technology forecasting using the dynamic time warping (DTW) technique of time series analysis. To illustrate how to apply our methodology to a real problem, we perform a case study of patent documents in a target technology field. This research will contribute to R&D planning and technology management.
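
As background for the DTW technique named in this abstract, here is a minimal sketch of the classic dynamic-programming DTW distance. It is the textbook formulation, not the authors' forecasting methodology; the absolute-difference local cost and the yearly count series are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost (an assumed choice)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b element
                cost[i][j - 1],      # b[j-1] also matched to an earlier a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]

# Hypothetical yearly patent counts for two technology fields.
field_a = [1, 2, 4, 8, 9]
field_b = [1, 1, 2, 4, 8, 9]   # same growth shape, delayed by one year
field_c = [9, 8, 4, 2, 1]      # opposite trend

print(dtw_distance(field_a, field_b))
print(dtw_distance(field_a, field_c))
```

Because warping lets one element align with several, the delayed series `field_b` matches `field_a` at zero cost even though the two have different lengths, while the declining series `field_c` stays far away.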