• Title/Summary/Keyword: Combination among the big data

The Mediating Effect and Moderating Effect of Pseudonymized Information Combination in the Relationship Between Regulation Factors of Personal Information and Big Data Utilization (개인정보 규제요인과 빅데이터 활용간의 관계에서 가명정보 결합의 매개효과 및 조절효과)

  • Kim, Sang-Gwang
    • Informatization Policy / v.27 no.3 / pp.82-111 / 2020
  • The recent increase in big data use has caused personal information regulation factors and the combination of pseudonymized information to emerge as key policy measures. This study therefore empirically analyzed the mediating and moderating effects of pseudonymized information combination as a third variable in the relationship between personal information regulation factors and big data utilization. The analysis showed the following results. First, among personal information regulation factors, definition regulation, consent regulation, supervisory authority regulation, and punishment intensity regulation showed a positive (+) relationship with big data utilization; among pseudonymized information combination factors, non-identification of combination, standardization of combined pseudonymized information, and responsibility of combination were also positively related to big data utilization. Second, among the pseudonymized information combination factors, non-identification of combination, standardization of combined pseudonymized information, and responsibility of combination showed a positive (+) mediating effect in the relationship between personal information regulation factors and big data utilization. Third, the moderating effect hypothesis, that each type of pseudonymized information combination institution (free-type, intermediary-type, and designated-type) would play a different moderating role in the relationship between personal information regulation factors and big data utilization, was rejected. Based on these empirical results, 'Good Regulation' policy alternatives were proposed that would maintain a balance between the protection of personal information and big data utilization.

An Exploration on Personal Information Regulation Factors and Data Combination Factors Affecting Big Data Utilization (빅데이터 활용에 영향을 미치는 개인정보 규제요인과 데이터 결합요인의 탐색)

  • Kim, Sang-Gwang;Kim, Sun-Kyung
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.2 / pp.287-304 / 2020
  • There have been a number of legal and policy studies on the factors affecting big data utilization, but empirical research on the component factors of personal information regulation and data combination, which act as constraints, has hardly been done because of the lack of relevant statistics. This study therefore empirically explores, through Delphi analysis, the priority of the personal information regulation factors and data combination factors that influence big data utilization. According to the Delphi analysis, the personal information regulation factors rank in the following order: introduction of pseudonymous information, evidence clarity of personal information de-identification, clarity of data combination regulation, clarity of personal information definition, ease of personal information consent, integration of personal information supervisory authorities, consistency among personal information protection acts, adequacy of punishment intensity in case of violation of law, and appropriateness of the penalty level compared with the EU GDPR. The data combination factors rank in the following order: de-identification of data combination, standardization of combined data, responsibility of data combination, type of data combination institute, data combination experience, and technical value of data combination. These findings have implications for which policy tasks should be prioritized when designing personal information regulations and data combination policies to utilize big data.

The Current Situation of the Big Data Utilization in the Agricultural Food Area and its Future Direction

  • Chung, Daniel Byungho;Cho, Jongpyo;Moon, Junghoon
    • Agribusiness and Information Management / v.5 no.2 / pp.17-26 / 2013
  • The purpose of this study is to show that new value for the agricultural food area can be created by combining the various big data collected in that area and analyzing them with appropriate methods. To this end, commonly used analysis techniques were reviewed, and the use of big data in various areas of society was explored through practical application cases. In addition, the current status and analysis cases of big data use in the agricultural food area were examined to verify how the newly found value is being used.

Automatic Switching of Clustering Methods based on Fuzzy Inference in Bibliographic Big Data Retrieval System

  • Zolkepli, Maslina;Dong, Fangyan;Hirota, Kaoru
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.4 / pp.256-267 / 2014
  • An automatic switch among ensembles of clustering algorithms is proposed as part of a bibliographic big data retrieval system. It uses a fuzzy inference engine as a decision support tool to select the fastest-performing clustering algorithm among fuzzy C-means (FCM) clustering, Newman-Girvan clustering, and the combination of both. It aims to achieve the best clustering performance while reducing computational complexity from $O(n^3)$ to $O(n)$. The automatic switch is implemented as a fuzzy logic controller written in Java and accepts three inputs from each clustering result: the number of clusters, the number of vertices, and the time taken to complete the clustering process. Experimental results on a PC (Intel Core i5-3210M at 2.50 GHz) demonstrate that the combination of both clustering algorithms is selected as the best-performing algorithm in 20 out of 27 cases, with the highest percentage of 83.99%, completed in 161 seconds. The self-adapted FCM is selected as the best-performing algorithm in 4 cases and Newman-Girvan in 3 cases. The automatic switch is to be incorporated into the bibliographic big data retrieval system, which focuses on visualizing fuzzy relationships using a hybrid approach combining the FCM and Newman-Girvan algorithms, and is planned for public release through the Internet.
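
The selection step described in this abstract can be illustrated with a very small fuzzy rule. The sketch below is not the authors' Java controller: the membership functions, the thresholds, the single rule, and the names `ClusteringResult`, `falling`, `suitability`, and `select_algorithm` are all assumptions made for illustration. It only shows how the three stated inputs (number of clusters, number of vertices, time taken) could feed a fuzzy score that picks one algorithm.

```python
from dataclasses import dataclass


@dataclass
class ClusteringResult:
    name: str        # label of the algorithm that produced this result
    n_clusters: int  # number of clusters found
    n_vertices: int  # number of vertices clustered (assumed > 0)
    seconds: float   # time taken to complete the clustering


def falling(x: float, lo: float, hi: float) -> float:
    """Membership that is 1 at or below lo, 0 at or above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def suitability(r: ClusteringResult) -> float:
    # Hypothetical fuzzy sets; the thresholds below are illustrative only.
    fast = falling(r.seconds, 10.0, 300.0)  # "time is short"
    compact = falling(r.n_clusters / r.n_vertices, 0.05, 0.5)  # "few clusters per vertex"
    # Single rule: IF time is short AND clustering is compact THEN suitable.
    # The fuzzy AND is taken as the minimum of the two memberships.
    return min(fast, compact)


def select_algorithm(results):
    """Return the name of the result with the highest fuzzy suitability."""
    return max(results, key=suitability).name


# Made-up measurements, not taken from the paper.
example = [
    ClusteringResult("FCM", 40, 200, 250.0),
    ClusteringResult("Newman-Girvan", 12, 200, 400.0),
    ClusteringResult("FCM+Newman-Girvan", 15, 200, 120.0),
]
print(select_algorithm(example))
```

A full fuzzy controller would normally use several rules and a defuzzification step; this sketch collapses that to one rule so the scoring logic stays visible.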

A Study on the Role and Security Enhancement of the Expert Data Processing Agency: Focusing on a Comparison of Data Brokers in Vermont (데이터처리전문기관의 역할 및 보안 강화방안 연구: 버몬트주 데이터브로커 비교를 중심으로)

  • Soo Han Kim;Hun Yeong Kwon
    • Journal of Information Technology Services / v.22 no.3 / pp.29-47 / 2023
  • With the recent advancement of information and communication technologies such as artificial intelligence, big data, cloud computing, and 5G, data is being produced and digitized in unprecedented amounts. As a result, data has emerged as a critical resource for the future economy, and other countries have been revising their laws on data protection and utilization. In Korea, the 'Data 3 Act' was revised in 2020 to introduce institutional measures that classify personal information, pseudonymized information, and anonymous information for research, statistics, and preservation of public records. Among these measures, combining pseudonymized personal information is expected to increase the added value of data, and to this end "the Expert Data Combination Agency" and "the Expert Data Agency" (hereinafter referred to as the Expert Data Processing Agency) systems were introduced. To compare these domestic systems with similar overseas systems, we note that the Vermont government in the United States recently enacted the first "Data Broker Act" in the United States as a measure to protect personal information held by data brokers. In this study, we compare and analyze the roles and functions of the "Expert Data Processing Agency" and the "Data Broker" and identify differences in designation standards, security measures, and related matters, in order to present ways to help activate the data economy and enhance information protection.

Transfer Learning-Based Feature Fusion Model for Classification of Maneuver Weapon Systems

  • Jinyong Hwang;You-Rak Choi;Tae-Jin Park;Ji-Hoon Bae
    • Journal of Information Processing Systems / v.19 no.5 / pp.673-687 / 2023
  • Convolutional neural network-based deep learning technology is the most commonly used in image identification, but it requires large-scale data for training. Therefore, application in specific fields in which data acquisition is limited, such as in the military, may be challenging. In particular, the identification of ground weapon systems is a very important mission, and high identification accuracy is required. Accordingly, various studies have been conducted to achieve high performance using small-scale data. Among them, the ensemble method, which achieves excellent performance through the prediction average of the pre-trained models, is the most representative method; however, it requires considerable time and effort to find the optimal combination of ensemble models. In addition, there is a performance limitation in the prediction results obtained by using an ensemble method. Furthermore, it is difficult to obtain the ensemble effect using models with imbalanced classification accuracies. In this paper, we propose a transfer learning-based feature fusion technique for heterogeneous models that extracts and fuses features of pre-trained heterogeneous models and finally, fine-tunes hyperparameters of the fully connected layer to improve the classification accuracy. The experimental results of this study indicate that it is possible to overcome the limitations of the existing ensemble methods by improving the classification accuracy through feature fusion between heterogeneous models based on transfer learning.
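
As a rough illustration of feature-level fusion between heterogeneous models, the sketch below concatenates the outputs of two frozen feature extractors and trains only a small classification head on the fused vector. Everything here is an assumption for illustration: `extractor_a` and `extractor_b` are toy stand-ins for pre-trained CNN backbones, the head is a plain logistic regression rather than the fully connected layer tuned in the paper, and the data are made up.

```python
import math


def extractor_a(x):
    """Toy stand-in for one frozen pre-trained model (2-d feature vector)."""
    return [x[0] + x[1], x[0] * x[1]]


def extractor_b(x):
    """Toy stand-in for a second, heterogeneous frozen model (1-d feature)."""
    return [abs(x[0] - x[1])]


def fuse(x):
    # Feature-level fusion: concatenate the outputs of both extractors.
    return extractor_a(x) + extractor_b(x)


def train_head(samples, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression head on fused features; extractors stay frozen."""
    feats = [fuse(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss with respect to z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(x, w, b):
    """Classify x as 1 or 0 from the sign of the head's output."""
    z = sum(wi * fi for wi, fi in zip(w, fuse(x))) + b
    return 1 if z > 0 else 0


# Made-up two-class data: class 1 has larger coordinates than class 0.
samples = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.0, 0.4),
           (0.9, 0.8), (0.7, 0.9), (0.8, 0.6), (1.0, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_head(samples, labels)
print(predict((0.1, 0.1), w, b), predict((0.9, 0.9), w, b))
```

Unlike a prediction-averaging ensemble, the fused head sees the features of both models at once, so it can weight each model's features rather than each model's final vote.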

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.93-110 / 2015
  • A recommender system recommends products to customers who are likely to be interested in them. Various recommender systems have been developed based on automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in a number of domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors have liked most. Thus, CF works properly only when there is a sufficient number of ratings on common products from customers. When customer ratings are scarce, CF forms an inaccurate neighborhood, resulting in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies have come to collect more data and to use a wide variety of large-scale information. Many companies therefore recognize the importance of utilizing big data, because it helps them improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is an emerging issue, because personal big data facilitate more accurate identification of users' preferences and behaviors. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data in order to grasp the preferences and behavior of users from various viewpoints. We derive five user profiles based on personal information: rating, site preference, demographic, Internet usage, and topic in text. Next, the similarity between users is calculated based on the profiles, and the neighbors of each user are found from the results. One of three ensemble approaches is applied to calculate the similarity: the similarity of the combined profile, the average similarity of each profile, or the weighted average similarity of each profile. Finally, the products that the neighbors prefer most are recommended to the target users. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing Web site rankings. R and SAS E-miner were used to implement the proposed recommender system and to conduct the topic analysis using keyword search, respectively. To evaluate recommendation performance, we used 60% of the data for training and 40% for testing. 5-fold cross-validation was also conducted to enhance the reliability of our experiments. The F1 metric, a widely used combined metric that gives equal weight to recall and precision, was employed for evaluation. The proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity showed the highest performance: the rate of improvement in F1 is 16.9 percent for the ensemble approach using weighted average similarity and 8.1 percent for the ensemble approach using the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kind of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively. However, our methodology needs further study before real-world application. We need to compare the differences in recommendation accuracy when the proposed method is applied to different recommendation algorithms and then identify which combination shows the best performance.
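
The three ensemble similarity approaches named in this abstract (combined profile, average of per-profile similarities, weighted average of per-profile similarities) can be sketched as follows. This is a minimal illustration, not the authors' R implementation: the abstract does not say which similarity measure is used, so cosine similarity is an assumption, and the function names, profile vectors, and weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_profile_similarity(p, q):
    """Approach 1: concatenate all profiles into one vector, then compare once."""
    keys = sorted(p)
    u = [x for k in keys for x in p[k]]
    v = [x for k in keys for x in q[k]]
    return cosine(u, v)


def average_similarity(p, q):
    """Approach 2: compare profile by profile and take the plain mean."""
    return sum(cosine(p[k], q[k]) for k in p) / len(p)


def weighted_average_similarity(p, q, weights):
    """Approach 3: compare profile by profile and take a weighted mean."""
    total = sum(weights[k] for k in p)
    return sum(weights[k] * cosine(p[k], q[k]) for k in p) / total


# Two hypothetical users, each with two of the five profile types.
alice = {"rating": [1, 0], "site": [1, 1]}
bob = {"rating": [1, 0], "site": [1, 0]}
print(combined_profile_similarity(alice, bob))
print(average_similarity(alice, bob))
print(weighted_average_similarity(alice, bob, {"rating": 3, "site": 1}))
```

The weighted variant lets a more informative profile count for more, which is one plausible reading of why it performed best in the reported experiments; how the weights were actually chosen is not stated in the abstract.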

An Empirical Study of Implementation and Application of Mold Life Cycle Management Information System In the Cloud Computing Environment (클라우드 컴퓨팅 환경에서 금형 수명주기관리 정보시스템 구축 및 적용의 실증적 연구)

  • Koh, Joon-Cheol;Nam, Seung-Done;Kim, Kyung-Sik
    • Journal of the Korea Safety Management & Science / v.16 no.4 / pp.331-341 / 2014
  • The Internet of Things (IoT), much discussed recently with the development of information and communication technology, connects all things with all people through an integrated global network and links all aspects of economic and social life through sensors and software, thereby providing big data to all nodes such as companies, homes, and means of transportation. In the cloud computing environment, the IoT acts as a connector that provides the information companies need for decision-making: it collects and stores raw data from the production site through a sensor network, accumulates that data into big data, and extracts insight from it. In addition, the mold industry, which has the strongest linkage with other root industries, is a core technology for controlling the quality and performance of final products and for commercializing new industries such as new growth engine industries. Recently, perceptions of the mold industry have been shifting from a labor-intensive structure that relies on the experience of production workers and repeats modifications without regard to cost toward a technology-intensive, digitized, and highly intelligent structure driven by technology combination through IT convergence. This study therefore builds an SME (small and medium-sized enterprise)-type mold life cycle management information system in the cloud computing environment and applies it to the production site, implementing and managing the entire mold life cycle, from mold planning to mass production and preservation, as an information system, in order to offer an opportunity to increase the direct and indirect expected effects on the weak management activities of small businesses.

Epidemiology of PAH in Korea: An Analysis of the National Health Insurance Data, 2002-2018

  • Albert Youngwoo Jang;Hyeok-Hee Lee;Hokyou Lee;Hyeon Chang Kim;Wook-Jin Chung
    • Korean Circulation Journal / v.53 no.5 / pp.313-327 / 2023
  • Background and Objectives: Pulmonary arterial hypertension (PAH) is a rare but fatal disease. Recent advances in PAH-specific drugs have improved its outcomes, although the healthcare burden of novel therapeutics may lead to a discrepancy in outcomes between developing and developed countries. We analyzed how the epidemiology and clinical features of PAH have changed through the rapidly advancing healthcare infrastructure in South Korea. Methods: PAH was defined according to a newly devised 3-component algorithm. Using a nationwide health insurance claims database, we delineated annual trends in the prevalence, incidence, medication prescription pattern, and 5-year survival of PAH in Korea. Cumulative survival and potential predictors of mortality were also assessed among 2,151 incident PAH cases. Results: Between 2002 or 2004 and 2018, the prevalence and incidence of PAH increased 75-fold (0.4 to 29.9 per million people) and 12-fold (0.5 to 6.3 per million person-years), respectively. The proportion of patients on combination PAH-specific drug therapy has also steadily increased, reaching 29.0% in 2018. Among 2,151 incident PAH cases (median [interquartile range] age, 50 [37-62] years; 67.2% female), the 5-year survival rate and median survival duration were 71.8% and 13.1 years, respectively. Independent predictors of mortality were age, sex, etiology of PAH, diabetes, dyslipidemia, and chronic kidney disease. Conclusions: This nationwide study showed that the prevalence and incidence of PAH have grown rapidly in Korea since the early 2000s. The use of combination therapy has also increased, and the 5-year survival rate of PAH in Korea was similar to those in Western countries.

Implementation of R-language-based REST API and Solution for Security Issues (R 언어 기반의 REST API 구현 및 보안문제의 해결 방안)

  • Kang, DongHoon;Oh, Sejong
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.9 no.1 / pp.387-394 / 2019
  • Recently, the importance of big data has increased, and demand for big data analysis has increased with it. The R language was developed for data analysis, and users analyze data with algorithms from various statistics, machine learning, and data mining packages in R. However, it is difficult to develop an application using R. Earlier studies proposed calling R scripts from another language such as PHP or Java. However, this approach is cumbersome because it requires writing code in another language in addition to R. In this study, we introduce how to write an API using only the R language, without another language, by using the Plumber package. We also propose a solution for security issues related to R APIs. Using the proposed technology to develop web applications, we can expect high productivity, ease of use, and ease of maintenance.