• Title/Summary/Keyword: Big data processing


Detecting Common Weakness Enumeration(CWE) Based on the Transfer Learning of CodeBERT Model (CodeBERT 모델의 전이 학습 기반 코드 공통 취약점 탐색)

  • Chansol Park;So Young Moon;R. Young Chul Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.10
    • /
    • pp.431-436
    • /
    • 2023
  • Recently, the incorporation of artificial intelligence approaches into software engineering has become a major topic. Worldwide, research is active in two directions: 1) software engineering for artificial intelligence and 2) artificial intelligence for software engineering. We attempt to apply artificial intelligence to software engineering to identify and refactor bad code modules. To learn the patterns of bad code elements well, this task requires large datasets in which bad code elements are labeled correctly. At present, datasets for learning are insufficient, and the accuracy of the datasets we collected cannot be guaranteed. To solve this problem, when collecting code data, we collect bad code only from code module areas with high complexity rather than from the entire code base. We propose a method for detecting common weakness enumerations by training on the collected dataset through transfer learning of the CodeBERT model, which thereby learns the common weakness patterns in code. With this approach, we expect to identify common weakness patterns more accurately than traditional software engineering methods.
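The dataset-filtering step described above, collecting bad code only from high-complexity module areas, can be sketched with a rough heuristic. The branch-counting complexity estimate and the threshold of 10 below are illustrative assumptions, not the authors' actual tooling:

```python
import re

# Branch-introducing keywords used as a crude cyclomatic-complexity proxy.
BRANCH_KEYWORDS = re.compile(r"\b(if|elif|else|for|while|case|catch|except|and|or)\b")

def estimated_complexity(source: str) -> int:
    """Rough cyclomatic-complexity estimate: 1 + number of branch points."""
    return 1 + len(BRANCH_KEYWORDS.findall(source))

def select_for_labeling(modules: dict[str, str], threshold: int = 10) -> list[str]:
    """Keep only modules whose estimated complexity exceeds the threshold,
    mirroring the idea of collecting bad-code data only from
    high-complexity module areas rather than the entire code base."""
    return [name for name, src in modules.items()
            if estimated_complexity(src) > threshold]
```

Modules passing this filter would then be labeled and fed to CodeBERT fine-tuning; the heuristic itself is only a stand-in for a real complexity analyzer.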

An Analysis Study of Deliberation Results on Applications to Change the Present Condition around Gyeonggi-do Designated Cultural Properties - Focusing on Agenda Items Deliberated Three or More Times by the Cultural Properties Committee - (경기도지정문화재 주변 현상변경허가 신청안 심의결과에 관한 분석 연구 - 문화재위원회심의 3회 이상 상정안을 중심으로 -)

  • Lim, Jin-Kang;Kim, Dong-Chan
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.29 no.3
    • /
    • pp.85-96
    • /
    • 2011
  • The purpose of this study is to analyze the processing results of applications to change the present condition around Gyeonggi-do designated cultural properties, to identify trends, issues, and characteristics in the deliberations, and to provide basic data for future decisions on such applications. Of the 248 agenda items deliberated by the Gyeonggi-do Cultural Assets Committee in 2009, the 15 items submitted for deliberation three or more times were selected for this study. The deliberation results were classified into permission, withheld permission, and reconsideration, and the processing results and supplementary conditions were analyzed and compared with the established standards for changing the present condition of cultural properties. The findings are as follows. First, applications that were ultimately permitted typically involved a variety of facilities and low building heights; permission was often granted after the applicant supplemented the building exterior to harmonize with the cultural property, and the existence of current buildings near the application site was the main reason for decisions to permit. Second, withheld decisions were most common for applications involving neighborhood living facilities and large-scale construction; results changed little from the first hearing to the final decision, and obstruction of the cultural property's surroundings by the proposed building was the most frequent reason for rejection. Third, in reconsideration decisions, the city's large-scale development projects and the floor height of buildings did not significantly affect the outcome; the presence or absence of prior decisions and the need for on-site investigation were the most frequent reasons for reconsideration. Fourth, the processing standards were applied strictly across both sections of the regulated areas, regardless of whether applications were passed or rejected. As the distance between the cultural property and the application site increased, rejection and reconsideration decisions became less frequent, but distance alone did not determine the decision to permit.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, such abundance makes it difficult for users to easily find the information they seek, and users want a system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology that is essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on the interests and preferences of similar users. However, limitations exist. Sparsity, which occurs when user-item preference information is insufficient, is the main limitation of collaborative filtering: rating values in the user-item matrix may be distorted depending on the popularity of a product, or new users may not yet have rated any items. This lack of historical data for identifying consumer preferences is referred to as data sparsity, and various methods have been studied to address it. However, most attempts to solve the sparsity problem are not generally applicable because they require additional data such as users' personal information, social networks, or item characteristics. Another problem is that real-world rating data are mostly biased toward high scores, resulting in severe imbalance. One cause of this imbalance is purchasing bias: users who rate a product highly are those who chose to purchase it, while users inclined to rate it low are less likely to purchase it and thus leave no negative reviews. Because of this, reviews by purchasing users are more likely to be positive than the actual preferences of most users would suggest.
Therefore, biased rating data cause over-learning of the high-incidence classes and distort the model. Applying collaborative filtering to such imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques for this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed for binary classes. Binary-class imbalance techniques are difficult to apply to multi-class problems because they cannot model phenomena such as objects at cross-class boundaries or objects overlapping multiple classes. To cope with this, research has been conducted on converting multi-class problems into binary-class problems. However, such simplification can cause classification errors when the results of classifiers learned on the sub-problems are combined, losing important information about relationships beyond the selected items. Therefore, more effective methods are needed for multi-class imbalance problems. We propose a collaborative filtering model that uses a CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies the distributions of minority classes so that generated data reflect their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process improves the accuracy of the model by addressing the sparsity problem of collaborative filtering while mitigating the data imbalance found in real data. Our model showed superior recommendation performance over existing oversampling techniques on sparse real-world data. SMOTE, Borderline-SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models, and our model achieved the best prediction accuracy on the RMSE and MAE evaluation metrics. This study suggests that deep-learning-based oversampling can further improve the performance of recommendation systems on real data and can be used to build business recommendation systems.
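For context, the core idea of the SMOTE family used as baselines above can be sketched in a few lines of NumPy. This is a minimal nearest-neighbor interpolation, not the paper's CGAN model, and the parameter choices are illustrative:

```python
import numpy as np

def smote_like_oversample(X: np.ndarray, n_new: int, k: int = 3, rng=None) -> np.ndarray:
    """Generate n_new synthetic minority-class samples by interpolating each
    chosen sample toward one of its k nearest neighbors (SMOTE's core idea)."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    # Pairwise distances, used to find each sample's k nearest neighbors.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    new_rows = []
    for _ in range(n_new):
        i = rng.integers(n)               # pick a minority sample
        j = neighbors[i, rng.integers(k)]  # pick one of its k neighbors
        lam = rng.random()                # interpolation factor in [0, 1)
        new_rows.append(X[i] + lam * (X[j] - X[i]))
    return np.vstack(new_rows)
```

Because synthetic rows are convex combinations of real rows, they stay inside the minority class's local neighborhood, which is exactly the repetition-free variety the CGAN approach aims to improve upon.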

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.89-105
    • /
    • 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications has provided very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content expressing their opinions and interests on social media such as blogs, forums, chat rooms, and discussion boards, and that content is released on the Internet in real time. For that reason, many researchers and marketers regard social media content as a source of information for business analytics, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, techniques to extract, classify, understand, and assess the opinions implicit in text, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by researchers. However, we found weaknesses in these methods, which are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining using social media content, from the initial data-gathering stage to the final presentation session. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media; each target medium requires a different means of access, such as an open API, search tools, a DB2DB interface, or purchased content. The second phase is pre-processing to generate useful material for meaningful analysis.
If garbage data are not removed, social media analysis will not provide meaningful and useful business insights, so natural language processing techniques should be applied to clean the data. The next step is the opinion mining phase, where the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply status, favorites, and so on. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool: topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is utilized for reputation analysis. There are also various other applications, such as stock prediction, product recommendation, and sales forecasting. The last phase is visualization and presentation of the analysis results. The major purpose of this phase is to explain the results and help users comprehend their meaning; therefore, to the extent possible, deliverables from this phase should be simple, clear, and easy to understand rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company, NS Food, which holds a 66.5% market share and has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blogs, forum posts, and news articles. After collecting the social media content data, we generated instant-noodle-business-specific language resources for data manipulation and analysis using natural language processing. In addition, we classified the content into more detailed categories such as marketing features, environment, and reputation.
In these phases, we used free software: the TM, KoNLP, ggplot2, and plyr packages of the R project. As a result, we present several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other visualized images, providing vivid, full-color examples built with the open-library software packages of the R project. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. A heat map can show the movement of sentiment or volume in a category-by-time matrix, where color density varies across time periods. A valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers seeking to quickly understand the big-picture business situation, since its hierarchical structure can present buzz volume and sentiment for a given period in a single visualized result. This case study offers real-world business insights from market sensing and demonstrates to practically minded business users how they can use these types of results for timely decision making in response to ongoing changes in the market. We believe our approach provides a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
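The sentiment-analysis step of the four-phase pipeline can be illustrated with a minimal lexicon-based polarity scorer. The original study used R packages such as KoNLP and TM on Korean text; the tiny English lexicon below is purely a stand-in:

```python
# Minimal lexicon-based sentiment scoring, analogous to the
# domain-specific lexicon step described above. The word lists are toy examples.
POSITIVE = {"delicious", "tasty", "love", "best"}
NEGATIVE = {"bland", "salty", "worst", "disappointed"}

def polarity(text: str) -> int:
    """Return +1 / -1 / 0 for net-positive, net-negative, or neutral text."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def sentiment_volume(posts: list[str]) -> dict[str, int]:
    """Aggregate polarity counts: the raw numbers behind a volume/sentiment graph."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for p in posts:
        counts[{1: "positive", -1: "negative", 0: "neutral"}[polarity(p)]] += 1
    return counts
```

The resulting counts per category and time period are what a heat map or valence tree map would then visualize.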

Verification of Multi-point Displacement Response Measurement Algorithm Using Image Processing Technique (영상처리기법을 이용한 다중 변위응답 측정 알고리즘의 검증)

  • Kim, Sung-Wan;Kim, Nam-Sik
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.30 no.3A
    • /
    • pp.297-307
    • /
    • 2010
  • Recently, maintenance engineering and technology for civil and building structures have begun to draw considerable attention, and the number of structures whose structural safety must be evaluated due to deterioration and performance degradation is rapidly increasing. When stiffness decreases because of deterioration or member cracks, the dynamic characteristics of a structure change, so it is important to correctly evaluate the damaged areas and the extent of damage by analyzing the dynamic characteristics of the structure's actual behavior. In general, the typical measurement instruments used for structure monitoring are dynamic instruments. Existing dynamic instruments make it difficult to obtain reliable data when the cable connecting the sensors to the device is long, and the one-to-one connection between each sensor and the instrument is uneconomical. Therefore, a method that can measure vibration at long range without attached sensors is required. Representative non-contact methods for measuring the vibration of structures are the laser Doppler method, GPS-based methods, and image processing techniques. The laser Doppler method shows relatively high accuracy but is uneconomical, while GPS-based methods require expensive equipment, carry their own signal errors, and have a limited sampling rate. In contrast, methods using image signals are simple and economical, and are well suited to obtaining the vibration and dynamic characteristics of inaccessible structures. Camera image signals, rather than sensors, have recently been used by many researchers. However, the existing approach, which records a single target point attached to a structure and then measures its vibration using image processing, limits the range of measurable objects. Therefore, this study conducted a shaking table test and a field load test to verify the validity of a method that can measure multi-point displacement responses of structures using an image processing technique.
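A minimal sketch of the vision-based displacement idea is to track the intensity-weighted centroid of a bright target between frames. This is an illustrative simplification, not the paper's algorithm, and in practice a pixel-to-length scale factor would be calibrated from the target geometry:

```python
import numpy as np

def target_centroid(frame: np.ndarray) -> tuple[float, float]:
    """Intensity-weighted centroid (row, col) of a bright target in a grayscale frame."""
    total = frame.sum()
    rows, cols = np.indices(frame.shape)
    return (float((rows * frame).sum() / total),
            float((cols * frame).sum() / total))

def displacement(ref: np.ndarray, cur: np.ndarray) -> tuple[float, float]:
    """Pixel displacement of the target between a reference and a current frame.
    Multiplying by a calibrated pixel-to-length factor yields physical displacement."""
    r0, c0 = target_centroid(ref)
    r1, c1 = target_centroid(cur)
    return (r1 - r0, c1 - c0)
```

Running this per target region over a video sequence gives the multi-point displacement time histories that the shaking table and field tests validate.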

A Study on the Seepage Behavior of Embankment with Weak Zone using Numerical Analysis and Model Test (취약대를 가진 모형제방의 침투거동에 관한 연구)

  • Park, Mincheol;Im, Eunsang;Lee, Seokyoung;Han, Heuisoo
    • Journal of the Korean GEO-environmental Society
    • /
    • v.17 no.7
    • /
    • pp.5-13
    • /
    • 2016
  • This research focuses on the seepage behavior of an embankment containing a weak zone of high permeability. Distributed TDR (Time Domain Reflectometer) sensors and point sensors such as settlement gauges, pore water pressure meters, vertical total stress meters, and FDR (Frequency Domain Reflectometer) sensors were used to measure the seepage characteristics and embankment behavior, and the measured data were compared with 2-D and 3-D numerical analyses. The model embankment was 7 m long, 5 m wide, and 1.5 m high, composed of fine-grained sands, with a water level of 1.3 m. The measured seepage behavior and the numerical results were very similar, which means that a proper sensing system can monitor the real-time safety of an embankment. The 2-D and 3-D numerical analyses showed similar saturation processes; however, in the weak zone, the phreatic line in the 2-D analysis moved faster than in the 3-D analysis before the two finally converged.
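Steady-state seepage like that modeled above reduces, in the simplest case, to solving Laplace's equation for hydraulic head. The finite-difference relaxation below is a bare-bones 2-D sketch with made-up boundary conditions (the 1.3 m upstream level mirrors the test), not the paper's model geometry or transient analysis:

```python
import numpy as np

def solve_head(nx=20, ny=10, h_upstream=1.3, h_downstream=0.0, iters=2000):
    """Jacobi relaxation of Laplace's equation for hydraulic head h:
    each interior node converges to the average of its four neighbors."""
    h = np.zeros((ny, nx))
    h[:, 0] = h_upstream       # upstream water level (1.3 m, as in the model test)
    h[:, -1] = h_downstream    # downstream boundary
    for _ in range(iters):
        h[1:-1, 1:-1] = 0.25 * (h[1:-1, :-2] + h[1:-1, 2:]
                                + h[:-2, 1:-1] + h[2:, 1:-1])
        h[0, 1:-1] = h[1, 1:-1]    # no-flow (impervious) top boundary
        h[-1, 1:-1] = h[-2, 1:-1]  # no-flow bottom boundary
    return h
```

A weak zone would enter a real model as a spatially varying permeability field; here the head simply relaxes to a smooth gradient from the upstream to the downstream face.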

Design and Implementation of an In-Memory File System Cache with Selective Compression (대용량 파일시스템을 위한 선택적 압축을 지원하는 인-메모리 캐시의 설계와 구현)

  • Choe, Hyeongwon;Seo, Euiseong
    • Journal of KIISE
    • /
    • v.44 no.7
    • /
    • pp.658-667
    • /
    • 2017
  • The demand for large-scale storage systems has continued to grow due to the emergence of multimedia, social-network, and big-data services. In order to improve the response time and reduce the load of such large-scale storage systems, DRAM-based in-memory cache systems are becoming popular. However, the high cost of DRAM severely restricts their capacity. While compressing cache entries has been proposed to deal with this capacity limitation, compression and decompression, which are technically difficult to parallelize, induce significant processing overhead and in turn retard the response time. This paper proposes a selective compression scheme for in-memory file system caches that rapidly estimates the compression ratio of incoming cache entries from their Shannon entropies and compresses only those entries with a low expected compression ratio. In addition, the design and implementation of an in-kernel in-memory file system cache with the proposed selective compression scheme is described. The evaluation showed that the proposed scheme reduced the execution time of benchmarks by approximately 18% compared to the conventional non-compressing in-memory cache scheme. It also provided a cache hit ratio similar to that of the all-compressing counterpart while reducing execution time by 7.5% thanks to the lower compression overhead. Moreover, the selective compression scheme reduced the CPU time used for compression by 28% compared to the all-compressing scheme.
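The selective-compression decision described above can be sketched with standard-library tools: estimate the Shannon entropy of an incoming entry and compress only when the entropy suggests a good ratio. The 6.0 bits/byte threshold below is an illustrative assumption, not the paper's calibrated value, and zlib stands in for whatever compressor the kernel cache would use:

```python
import math
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data,
    up to 8.0 for uniformly random bytes."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def cache_entry(data: bytes, entropy_threshold: float = 6.0) -> tuple[bool, bytes]:
    """Selective compression: compress only entries whose low entropy predicts
    a good compression ratio; store near-incompressible data as-is.
    Returns (compressed?, stored_bytes)."""
    if shannon_entropy(data) < entropy_threshold:
        return True, zlib.compress(data)
    return False, data
```

Skipping high-entropy entries is what saves the CPU time that all-compressing caches spend on data that barely shrinks.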

Design and Implementation of Luo-kuan Recognition Application (낙관 인식을 위한 애플리케이션의 설계 및 구현)

  • Kim, Han-Syel;Seo, Kwi-Bin;Kang, Mingoo;Ryu, Gee Soo;Hong, Min
    • Journal of Internet Computing and Services
    • /
    • v.19 no.1
    • /
    • pp.97-103
    • /
    • 2018
  • In oriental paintings, a Luo-kuan compresses the artist's information into a single mark on the picture, conveying details such as the title of the work and the name of the artist. Information about the Luo-kuan is therefore considered important to those who collect or enjoy oriental paintings. However, most of the characters in a Luo-kuan are difficult Chinese characters or stylized seal forms, making them hard for ordinary people to interpret. In this paper, we developed a Luo-kuan search application to easily check the information in a Luo-kuan. The application uses a search algorithm that analyzes a captured Luo-kuan image, sends it to a server, and outputs information about the candidate Luo-kuan in the server's database that are most similar to the captured image. We also analyzed the accuracy of the algorithm on 170 Luo-kuan images by examining the rank of the correct Luo-kuan among the returned candidates. The experimental results confirm that the accuracy of the application's search algorithm is about 90%, and we anticipate that, supplemented with optimization and multi-threading, it can develop into a platform that automatically analyzes and searches images in a big data environment.
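The candidate-ranking idea, comparing a captured image against database images and returning the most similar first, can be sketched with a simple histogram signature. This is an illustrative stand-in for the application's actual search algorithm, whose features are not specified in the abstract:

```python
import numpy as np

def gray_histogram(img: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalized grayscale histogram used as a simple image signature."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def rank_candidates(query: np.ndarray, database: dict[str, np.ndarray]) -> list[str]:
    """Rank database images by histogram intersection with the query,
    mimicking the 'most similar candidates first' output described above."""
    q = gray_histogram(query)
    scores = {name: float(np.minimum(q, gray_histogram(img)).sum())
              for name, img in database.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A real system would use shape or deep features rather than raw histograms, but the server-side flow (signature, compare, rank) is the same.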

Research on the Variation of Deposition & Accumulation on the Shorelines using Ortho Aerial Photos (수치항공사진을 이용한 해안선 침퇴적변화에 관한 연구)

  • Choi, Chul-Uong;Lee, Chang-Hun;Oh, Che-Young;Son, Jung-Woo
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.17 no.3
    • /
    • pp.23-31
    • /
    • 2009
  • The border of a nation's shorelines is an important factor in determining the border of its national territory, but Korea's shorelines are changing rapidly due to the recent rise in sea level from global warming and the growth-centered economic policy of the past decades. This research focused on two neighboring beaches with mutually homogeneous ocean conditions: areas where the shorelines remain well preserved in their natural state, and nearby areas where the shorelines have been altered by artificial structures. First, we derived the shorelines from aerial photographs taken from 1947 to 2007, and corrected the tidal levels of sounding data obtained from a hydrographic survey automation system consisting of an echo sounder (Echotrac 3100) and a Differential Global Positioning System (Beacon), using land topographical data and ship positions obtained with the post-processing kinematic GPS measuring method. In addition, we evaluated the changes and dimensional variations over the last 60 years by dividing the determined shorelines into five sections. As a result, Haewundae Beach showed a 29% decrease in area as of 2007 compared with 1947, due to a rapid decline centered on its western areas, while the area of Gwanganri Beach increased by a total of 69% because artificial structures built on both ends of the beach reduced the flow velocity and formed accumulation; thus, a large difference in deposition and accumulation tendency was found depending on the neighboring environment, in spite of the homogeneous ocean conditions.


Policy and Strategy for Intelligence Information Education and Technology (지능정보 교육과 기술 지원 정책 및 전략)

  • Lee, Tae-Gyu;Jung, Dae-Chul;Kim, Yong-Kab
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.8
    • /
    • pp.359-368
    • /
    • 2017
  • What is the "intelligence information society," a term that has been discussed continuously in recent years? It refers to a future in which automation beyond the limits of human ability, based on intelligent information technology, is universal throughout society. In particular, it is a concept that minimizes human intervention and continuously pursues evolution toward data-based (or big-data-based) automation; for example, autonomous driving constantly aims at unmanned vehicles with artificial intelligence as the key element. Until now, however, intelligent information research has focused on intelligence itself, making efforts to improve intelligence logic and to replace the human brain and intelligence. In parallel, in order to replace the human labor force, efforts have continued to replace workers with robots by analyzing the working principles of workers and developing optimized simple logic. This study proposes important strategies and directions for implementing intelligent information education policy and an intelligent information technology research strategy, by suggesting an access strategy, education methods, and a detailed policy road map for intelligent information technology research and educational services. In particular, we propose a phased approach to intelligent information education: basic intelligence education, intelligent content education, and intelligent application education. In addition, we propose an education policy plan for improving intelligent information technology, intelligent education content, and intelligent education systems, which we regard as important factors in the success or failure of the 4th industrial revolution, centered as it is on intelligence and automation.