• Title/Summary/Keyword: Big Data Processing


Application Development for Text Mining: KoALA (텍스트 마이닝 통합 애플리케이션 개발: KoALA)

  • Byeong-Jin Jeon;Yoon-Jin Choi;Hee-Woong Kim
    • Information Systems Review / v.21 no.2 / pp.117-137 / 2019
  • In the Big Data era, data science has gained popularity as vast amounts of data are produced in various domains, and the ability to exploit data has become a competitive advantage. Interest is growing in unstructured data, which accounts for more than 80% of the world's data. With the everyday use of social media, most unstructured data takes the form of text and plays an important role in areas such as marketing, finance, and distribution. However, text mining of social media is harder to access and use than data mining of numerical data. This study therefore develops the Korean Natural Language Application (KoALA), an integrated application for easy social media text mining that does not depend on programming languages or on high-end hardware or solutions. KoALA is specialized for social media text mining and can analyze both Korean and English. It handles the entire process from data collection through preprocessing and analysis to visualization. This paper describes the design, implementation, and application of KoALA using the design science methodology. Finally, we discuss the practical use of KoALA through a blockchain business case. Through this paper, we hope to popularize social media text mining and to see it used for practical and academic purposes in various domains.

A Study of the Definition and Components of Data Literacy for K-12 AI Education (초·중등 AI 교육을 위한 데이터 리터러시 정의 및 구성 요소 연구)

  • Kim, Seulki;Kim, Taeyoung
    • Journal of The Korean Association of Information Education / v.25 no.5 / pp.691-704 / 2021
  • The development of AI technology has brought major changes to our lives. As AI's influence on daily life, society, and the economy grows, so does the importance of AI and data education. In response, the OECD Education Research Report and various domestic information and curriculum studies address data literacy and present it as an essential competency. However, the definition of data literacy and the content and scope of its components vary among researchers. To present an objective and comprehensive definition and set of components, we analyze the definitions in key data literacy studies and the frequency of words used in their components, and we analyze the semantic similarity of words using Word2Vec, a deep learning natural language processing method. The result was revised and supplemented through expert review, and we define data literacy as the 'basic ability of knowledge construction and communication to collect, analyze, and use data and process it as information for problem solving'. Furthermore, we propose components for each category of knowledge, skills, values, and attitudes. We hope that the definition and components of data literacy derived from this study will serve as a good foundation for the systematization of, and educational research on, AI education related to students' future competencies.
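
The two measurements named in this abstract, word frequency and semantic similarity between word vectors, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the definitions and the three-dimensional vectors are invented for the example, and no Word2Vec model is trained here. Cosine similarity is shown because it is the measure commonly used to compare word embeddings.

```python
import math
from collections import Counter

# Hypothetical definitions of data literacy, invented for illustration only.
definitions = [
    "the ability to collect analyze and use data",
    "the ability to read analyze and communicate with data",
]


def word_frequency(texts):
    """Count how often each word appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


freq = word_frequency(definitions)

# Toy vectors standing in for trained embeddings (assumed values).
toy_vectors = {
    "collect": [0.9, 0.1, 0.0],
    "gather": [0.8, 0.2, 0.1],
    "communicate": [0.0, 0.2, 0.9],
}
sim_close = cosine_similarity(toy_vectors["collect"], toy_vectors["gather"])
sim_far = cosine_similarity(toy_vectors["collect"], toy_vectors["communicate"])
```

With real embeddings, words that appear in similar contexts would be expected to score closer to 1, which is what allows differently worded definitions to be compared.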

Efficient Privacy-Preserving Duplicate Elimination in Edge Computing Environment Based on Trusted Execution Environment (신뢰실행환경기반 엣지컴퓨팅 환경에서의 암호문에 대한 효율적 프라이버시 보존 데이터 중복제거)

  • Koo, Dongyoung
    • KIPS Transactions on Computer and Communication Systems / v.11 no.9 / pp.305-316 / 2022
  • With the flood of digital data owing to the Internet of Things and big data, cloud service providers that process and store vast amounts of data from multiple users can apply duplicate data elimination techniques for efficient data management. Edge computing, introduced as an extension of cloud computing, can improve the user experience by mitigating problems such as network congestion toward a central cloud server and reduced computational efficiency. However, adding a new edge device that is not entirely trustworthy may increase the computational complexity, because additional cryptographic operations are needed to preserve data privacy during duplicate identification and elimination. In this paper, we propose a more efficient duplicate data elimination protocol that preserves data privacy, using an optimized user-edge-cloud communication framework built on a trusted execution environment. Direct sharing of secret information between the user and the central cloud server minimizes the computational complexity at edge devices and enables the use of efficient encryption algorithms on the cloud service provider side. Users also benefit from an improved experience by offloading data to edge devices, which enables duplicate elimination and independent activity. Experiments show the efficiency of the proposed scheme, including up to a 78x improvement in computation during data outsourcing compared with the previous study, which does not exploit a trusted execution environment in the edge computing architecture.
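
The basic idea of duplicate elimination, storing identical content only once, can be sketched as below. This is a plain content-hash illustration under simplifying assumptions: it involves no encryption, no edge device, and no trusted execution environment, so it is not the protocol proposed in the paper. The class name `DedupStore` and its methods are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical payloads are kept once."""

    def __init__(self):
        self._blobs = {}   # digest -> payload bytes
        self._owners = {}  # digest -> set of user ids that uploaded it

    def upload(self, user_id, payload):
        """Store payload; return (digest, True if it was a duplicate)."""
        digest = hashlib.sha256(payload).hexdigest()
        duplicate = digest in self._blobs
        if not duplicate:
            self._blobs[digest] = payload
        # Record ownership even for duplicates so each user can retrieve it.
        self._owners.setdefault(digest, set()).add(user_id)
        return digest, duplicate

    def stored_bytes(self):
        """Total bytes physically stored after deduplication."""
        return sum(len(p) for p in self._blobs.values())


store = DedupStore()
digest_a, dup_a = store.upload("alice", b"report")
digest_b, dup_b = store.upload("bob", b"report")
```

Comparing digests of plaintext reveals which users hold the same data, which is the kind of privacy leak that privacy-preserving schemes such as the one in this paper aim to avoid.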

FDANT-PCSV: Fast Detection of Abnormal Network Traffic Using Parallel Coordinates and Sankey Visualization (FDANT-PCSV: Parallel Coordinates 및 Sankey 시각화를 이용한 신속한 이상 트래픽 탐지)

  • Han, Ki hun;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.693-704 / 2020
  • As a company's network grows and the number of security systems increases, it becomes difficult to quickly detect abnormal traffic among the huge volume of security system events. In this paper, we propose a traffic visualization analysis system (FDANT-PCSV) that can detect and analyze security events from information security systems such as firewalls in real time. FDANT-PCSV consists of a Parallel Coordinates visualization using five factors (source IP, destination IP, destination port, packet length, processing status) and a Sankey visualization using four factors (source IP, destination IP, number of events, data size) drawn from security events. In addition, the use of a big data-based SIEM enables real-time detection of network attacks and network failure traffic from the internet and intranet. FDANT-PCSV enables cyber security officers and network administrators to detect abnormal network traffic quickly and easily and to respond quickly to network threats.
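
The four Sankey factors listed in this abstract (source IP, destination IP, number of events, data size) suggest a simple aggregation step before drawing. The sketch below is an assumption about how such input could be prepared, not the FDANT-PCSV implementation; the event tuples and the `sankey_links` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical firewall events: (source IP, destination IP, packet length in bytes).
events = [
    ("10.0.0.1", "192.168.1.10", 500),
    ("10.0.0.1", "192.168.1.10", 700),
    ("10.0.0.2", "192.168.1.10", 60),
]


def sankey_links(events):
    """Aggregate events into one link per (source, destination) pair."""
    links = defaultdict(lambda: {"events": 0, "bytes": 0})
    for src, dst, length in events:
        link = links[(src, dst)]
        link["events"] += 1      # number of events on this flow
        link["bytes"] += length  # total data size on this flow
    return dict(links)


links = sankey_links(events)
```

Each resulting link could then be drawn with a width proportional to its event count or byte total, so that unusually heavy flows stand out.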

Access Control Method for Software on Virtual OS Using the Open Authentication Protocol (개방형 인증 프로토콜을 이용한 가상 운영체제에 설치된 SW 접근통제 방안)

  • Kim, Sun-Joo;Jo, In-June
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.568-574 / 2013
  • In recent years, IT companies have offered various cloud services using hardware-based or software-based technologies. Users can access these cloud services without constraints of location or device. The underlying technologies are virtualization, provisioning, and big data processing. However, security incidents continue to occur even with these techniques. Thus, many companies build and operate private cloud services to prevent leaks of critical data. If the virtual environment differs according to user permission, many systems are needed, and a user must log in to several virtual systems to execute a program. In this paper, I propose an access control method for application software on a virtual operating system using the Open Authentication protocol in a cloud system.

Exploration of emerging technologies based on patent analysis in complex product systems for catch-up: the case of gas turbine (복합제품시스템 추격을 위한 특허 기반 부상기술 탐색: 가스터빈 사례를 중심으로)

  • Kwak, Kiho;Park, Joohyoung
    • Knowledge Management Research / v.17 no.2 / pp.27-50 / 2016
  • The Korean manufacturing industry has recently faced catch-up by China in mass commodity products such as automobiles, displays, and smartphones, in terms of both market and technology. Accordingly, discussion of the importance of achieving catch-up in complex product systems (CoPS) as a new innovation engine for the industry has been increasing. To support successful catch-up in CoPS, we explored emerging technologies of CoPS, which are characterized by radical novelty, relatively fast growth, and self-sustainability, through a study of emerging technologies in gas turbines for power generation. We found that the emerging technologies of the gas turbine are: combustion nozzle and electrical machine composition technologies for increasing power efficiency; washing technology for particulate matter; casting and material processing technology for enhancing durability against fatigue; cooling technologies for extremely high temperatures; interconnected operation technology between renewable energy and the gas turbine for flexibility in power generation; and big data technology for remote monitoring and diagnosis of the gas turbine. We also found that these emerging technologies led to technological progress in the gas turbine by converging with conventional technologies in the gas turbine. This indicates that emerging technologies in CoPS can appear in various fields of technological knowledge and have a complementary relationship with conventional technologies in the technological progress of CoPS. It also implies that, for successful catch-up in CoPS, latecomers need to pursue integrated learning that covers both emerging and conventional technologies rather than independent learning limited to emerging technologies. Our findings provide an important initial theoretical ground for investigating emerging technologies and their characteristics in CoPS and for identifying knowledge management strategies for successful catch-up by latecomers. Our findings also contribute to policy development for CoPS from the perspective of innovation strategy and knowledge management.

Establishing a Sustainable Future Smart Education System (지속가능한 미래형 스마트교육 시스템 구축 방안)

  • Park, Ji-Hyeon;Choi, Jae-Myeong;Park, Byoung-Lyoul;Kang, Heau-Jo
    • Journal of Advanced Navigation Technology / v.16 no.3 / pp.495-503 / 2012
  • As modern society changes rapidly, the field of education has also developed quickly. Since the Edunet system was developed in 1996, many different systems have been developed continuously, such as the Center for Teaching and Learning, cyber home learning systems, diagnosis prescribing systems, video systems, teaching and counseling, and study management systems. However, these systems have not received a strong response from educational consumers because of a lack of interconnection. There are several reasons for this. One is that program administrators did not carefully consider continuity across programs but established a brand-new system whenever one was needed, rather than predicting or considering future needs. A system suitable for smart education should be one large integrated system based on the analysis and processing of many different kinds of data. The system should also supply educational consumers with varied and useful information by adopting the idea of big data, rather than being a single sign-on system that connects independent systems. The cloud computing system should be established as a system that manages not simply compiled files and application programs but diverse content and data.

Analysis of relationship between frequency of crime occurrence and frequency of web search (범죄 발생 빈도수와 웹 검색 빈도수의 관계 분석 연구)

  • Park, Jung-Min;Park, Koo-Rack;Chung, Young-Suk
    • Journal of the Korea Convergence Society / v.9 no.5 / pp.15-20 / 2018
  • In modern society, crime is one of the major social problems. Crime has a great impact not only on victims but also on those around them. It is important to predict crimes before they occur and to prevent them. Various studies have been conducted to predict crime. One of the most important factors in predicting crime is the frequency of crime occurrence, which is widely used as basic data for crime prediction. However, the frequency of crime occurrence is announced about two years after the statistical processing period. In this paper, we propose a frequency analysis of crime-related keywords searched on the web as a way to indirectly grasp the frequency of crime occurrence. The relationship between the web search frequency of crime-related keywords and the frequency of actual crime occurrence was analyzed using the correlation coefficient.
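
The correlation analysis mentioned in this abstract can be illustrated with the Pearson correlation coefficient. The abstract does not say which coefficient was used, so Pearson is an assumption here, and the yearly figures below are invented for the example rather than taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly counts, for illustration only.
search_counts = [1200, 1350, 1500, 1420, 1610]
crime_counts = [310, 335, 372, 360, 401]
r = pearson(search_counts, crime_counts)
```

A value of `r` near 1 would suggest that search frequency tracks crime frequency closely enough to serve as an early indirect indicator, while a value near 0 would not support that use.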

Software Equation Based on Function Points (기능점수 기반 소프트웨어 공식)

  • Lee, Sang-Un
    • The KIPS Transactions:PartD / v.17D no.5 / pp.327-336 / 2010
  • This paper proposes a software equation that relates effort and duration to software size measured in function points (FP). The existing software equation is based on lines of code (LOC). LOC varies greatly with the development language, which causes many difficulties in software size estimation. First, we considered a method that converts LOC to FP. However, the conversion ratio between LOC and FP for each development language is not definitively established. Moreover, deriving the software equation through the conversion ratio failed because the development language of the projects was not specified. Therefore, we derived the software equation directly from large-scale project data in which size was measured in FP. First, we selected, from the development projects, the data for which a reasonable development period was set. Second, through regression analysis of this data, we derived the relation between FP and effort and the relation between FP and duration. Finally, the software equation was derived from these relations. The proposed model solves the application problems of the LOC-based model and has the advantage of being easy to apply in practice.
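
The regression step described in this abstract can be sketched as a least-squares fit. The abstract does not state the functional form, so the power law `effort = a * FP**b` used below is an assumption chosen because it is a common shape for size-effort models; the function name and data are hypothetical.

```python
import math


def fit_power_law(sizes, values):
    """Least-squares fit of values = a * sizes**b on log-log axes."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log space = log(a)
    return a, b


# Synthetic data generated from a known curve, for illustration only.
fp_sizes = [100, 200, 400, 800]
efforts = [2.0 * fp ** 1.1 for fp in fp_sizes]
a, b = fit_power_law(fp_sizes, efforts)
```

The same function could be applied to duration data to obtain a second relation, and the two fitted curves could then be combined into a single equation linking size, effort, and duration.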

A 500MSamples/s 6-Bit CMOS Folding and Interpolating AD Converter (500MSamples/s 6-비트 CMOS 폴딩-인터폴레이팅 아날로그-디지털 변환기)

  • Lee Don-Suep;Kwack Kae-Dal
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.7 / pp.1442-1447 / 2004
  • In this paper, a 6-bit CMOS folding and interpolating AD converter is presented. The converter is expected to be useful as an integrated part of a VLSI circuit that handles both analog and digital signals, as in HDD or LAN applications. An analog circuit built into a VLSI chip for high-speed data communication requires a small chip area, low power consumption, and fast data processing. The proposed folding and interpolating AD converter uses a very small number of comparators and interpolation resistors, which is achieved by cascading a pair of folders that work on different principles. This reduced number of parts is a significant advantage for a built-in AD converter design. The design is based on a 0.25 μm double-poly 2-metal n-well CMOS process. In simulation, with 2.5 V applied and a sampling frequency of 500 MHz, the results are as follows: power consumption of 27 mW, INL and DNL of $\pm$0.1 LSB and $\pm$0.15 LSB, respectively, and SNDR of 42 dB with an input signal of 10 MHz.