• Title/Summary/Keyword: Generate Data

Event Log Analysis Framework Based on the ATT&CK Matrix in Cloud Environments (클라우드 환경에서의 ATT&CK 매트릭스 기반 이벤트 로그 분석 프레임워크)

  • Yeeun Kim;Junga Kim;Siyun Chae;Jiwon Hong;Seongmin Kim
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.2 / pp.263-279 / 2024
  • With the increasing trend of Cloud migration, security threats in the Cloud computing environment have also experienced a significant increase. Consequently, the importance of efficient incident investigation through log data analysis is being emphasized. In Cloud environments, the diversity of services and ease of resource creation generate a large volume of log data. Difficulties remain in determining which events to investigate when an incident occurs, and examining all the extensive log data requires considerable time and effort. Therefore, a systematic approach for efficient data investigation is necessary. CloudTrail, the Amazon Web Services(AWS) logging service, collects logs of all API call events occurring in an account. However, CloudTrail lacks insights into which logs to analyze in the event of an incident. This paper proposes an automated analysis framework that integrates Cloud Matrix and event information for efficient incident investigation. The framework enables simultaneous examination of user behavior log events, event frequency, and attack information. We believe the proposed framework contributes to Cloud incident investigations by efficiently identifying critical events based on the ATT&CK Framework.
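The event-frequency part of such a framework can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes CloudTrail records have already been parsed into dictionaries carrying the standard `eventName` field, and the `TACTIC_BY_EVENT` table is a hypothetical, hand-written mapping used only for demonstration (the abstract does not give the actual Cloud Matrix mapping).

```python
from collections import Counter

# Hypothetical, illustrative mapping from CloudTrail event names to ATT&CK tactics.
# The paper's actual mapping is not given in the abstract.
TACTIC_BY_EVENT = {
    "CreateUser": "Persistence",
    "CreateAccessKey": "Persistence",
    "StopLogging": "Defense Evasion",
    "ListBuckets": "Discovery",
}


def summarize_events(records):
    """Count events by name and attach a tactic label where the mapping has one."""
    counts = Counter(r["eventName"] for r in records if "eventName" in r)
    return [
        (name, n, TACTIC_BY_EVENT.get(name, "Unmapped"))
        for name, n in counts.most_common()
    ]


# Toy records standing in for parsed CloudTrail log entries.
sample_records = [
    {"eventName": "ListBuckets", "eventSource": "s3.amazonaws.com"},
    {"eventName": "CreateUser", "eventSource": "iam.amazonaws.com"},
    {"eventName": "ListBuckets", "eventSource": "s3.amazonaws.com"},
    {"eventName": "DescribeInstances", "eventSource": "ec2.amazonaws.com"},
]
summary = summarize_events(sample_records)
```

Each tuple in `summary` pairs an event name with its frequency and a tactic label, so an investigator could sort or filter on either column.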

Mapping Mammalian Species Richness Using a Machine Learning Algorithm (머신러닝 알고리즘을 이용한 포유류 종 풍부도 매핑 구축 연구)

  • Zhiying Jin;Dongkun Lee;Eunsub Kim;Jiyoung Choi;Yoonho Jeon
    • Journal of Environmental Impact Assessment / v.33 no.2 / pp.53-63 / 2024
  • Biodiversity holds significant importance within the framework of environmental impact assessment, being utilized in site selection for development, understanding the surrounding environment, and assessing the impact on species due to disturbances. The field of environmental impact assessment has seen substantial research exploring new technologies and models to evaluate and predict biodiversity more accurately. While current assessments rely on data from fieldwork and literature surveys to gauge species richness indices, limitations in spatial and temporal coverage underscore the need for high-resolution biodiversity assessments through species richness mapping. In this study, leveraging data from the 4th National Ecosystem Survey and environmental variables, we developed a species distribution model using Random Forest. This model yielded mapping results of 24 mammalian species' distribution, utilizing the species richness index to generate a 100-meter resolution map of species richness. The research findings exhibited a notably high predictive accuracy, with the species distribution model demonstrating an average AUC value of 0.82. In addition, the comparison with National Ecosystem Survey data reveals that the species richness distribution in the high-resolution species richness mapping results conforms to a normal distribution. Hence, it stands as highly reliable foundational data for environmental impact assessment. Such research and analytical outcomes could serve as pivotal new reference materials for future urban development projects, offering insights for biodiversity assessment and habitat preservation endeavors.
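One common way to turn per-species distribution maps into a richness map is to count, for each grid cell, how many species are predicted present. The sketch below assumes that approach, with a hypothetical presence threshold of 0.5 on model suitability scores; the abstract does not state the exact richness index or threshold used.

```python
def species_richness(suitability_maps, threshold=0.5):
    """Count, per grid cell, how many species have suitability >= threshold.

    suitability_maps: list of 2D grids (lists of rows), one per species,
    all with the same shape. The threshold is an assumed cut-off.
    """
    rows = len(suitability_maps[0])
    cols = len(suitability_maps[0][0])
    richness = [[0] * cols for _ in range(rows)]
    for grid in suitability_maps:
        for i in range(rows):
            for j in range(cols):
                if grid[i][j] >= threshold:
                    richness[i][j] += 1
    return richness


# Toy 2x2 suitability grids for three made-up species.
toy_maps = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.7, 0.8], [0.4, 0.1]],
    [[0.55, 0.6], [0.5, 0.3]],
]
richness_map = species_richness(toy_maps)
```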

Development of a Detection Model for the Companies Designated as Administrative Issue in KOSDAQ Market (KOSDAQ 시장의 관리종목 지정 탐지 모형 개발)

  • Shin, Dong-In;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.157-176 / 2018
  • The purpose of this research is to develop a detection model for companies designated as administrative issues in the KOSDAQ market using financial data. Administrative issue designation flags companies with a high potential for delisting and gives them time to resolve the grounds for delisting under certain restrictions of the Korean stock market. It acts as an alarm that informs investors and market participants of which companies are likely to be delisted and warns them to invest safely. Despite this importance, there are relatively few studies on administrative issue prediction models compared with the many studies on bankruptcy prediction models. Therefore, this study develops and verifies a detection model for companies designated as administrative issues using financial data of KOSDAQ companies. In this study, logistic regression and decision trees are proposed as the data mining models for detecting administrative issues. According to the results of the analysis, the logistic regression model predicted the companies designated as administrative issues using three variables, ROE (Earnings before tax), Cash flows/Shareholder's equity, and Asset turnover ratio, and its overall accuracy was 86% for the validation dataset. The decision tree (Classification and Regression Trees, CART) model applied classification rules using Cash flows/Total assets and ROA (Net income), and its overall accuracy reached 87%. The implications of the financial indicators selected in our logistic regression and decision tree models are as follows. First, ROE (Earnings before tax) in the logistic detection model shows the profit and loss of the continuing business segments, excluding the revenue and expenses of discontinued businesses. Therefore, a weakening of this variable means that the competitiveness of the core business has weakened. 
If a large part of the profits is generated from one-off gains, the deterioration of business management is very likely to intensify further. As the ROE of a KOSDAQ company decreases significantly, the company becomes highly likely to be delisted. Second, cash flows to shareholder's equity represents the firm's ability to generate cash flow with the financial condition of its subsidiaries excluded. In other words, a weakening of the management capacity of the parent company, excluding the subsidiaries' competence, can be a main reason for an increased possibility of administrative issue designation. Third, a low asset turnover ratio means that current and non-current assets are used ineffectively by the corporation, or that its asset investment is excessive. If the asset turnover ratio of a KOSDAQ-listed company decreases, it is necessary to examine corporate activities in detail from various perspectives, such as weakening sales or changes in the company's inventories. Cash flow/total assets, a variable selected by the decision tree detection model, is a key indicator of the company's cash condition and its ability to generate cash from operating activities. Cash flow indicates whether a firm can perform its main activities (maintaining its operating ability, repaying debts, paying dividends and making new investments) without relying on external financial resources. Therefore, if the value of this variable is negative (-), it indicates the possibility that the company has serious problems in its business activities. If the cash flow from operating activities of a specific company is smaller than its net profit, the net profit has not been converted into cash, indicating a serious problem in managing the company's trade receivables and inventory assets. 
Therefore, as cash flows/total assets decrease, the probability of administrative issue designation and the probability of delisting increase. In summary, the logistic regression-based detection model in this study was found to be driven by the company's financial activities, including ROE (Earnings before tax), whereas the decision tree-based detection model predicts the designation based on the company's cash flows.

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results based on a variety of information, such as text, images, and audio. With the rapid development of AI technology, there is a growing number of multi-modal based systems that synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. While the existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. Therefore, it is possible to create more accurate and effective Korean montage images based on multi-modal voice and text specific to Korean. Since the developed montage generation app can be utilized as a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we utilized persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform aimed at providing a one-stop service by building artificial intelligence learning data necessary for the development of AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and the KoDALLE model, a Korean-based image generation model. It can be confirmed that the learned AI model creates a montage image of a face that is very similar to what was described using voice and text. To verify the practicality of the developed montage generation app, 10 testers used it and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal detection, to describe and image facial features.

A study on the derivation and evaluation of flow duration curve (FDC) using deep learning with a long short-term memory (LSTM) networks and soil water assessment tool (SWAT) (LSTM Networks 딥러닝 기법과 SWAT을 이용한 유량지속곡선 도출 및 평가)

  • Choi, Jung-Ryel;An, Sung-Wook;Choi, Jin-Young;Kim, Byung-Sik
    • Journal of Korea Water Resources Association / v.54 no.spc1 / pp.1107-1118 / 2021
  • Climate change brought on by global warming has increased the frequency of floods and droughts on the Korean Peninsula, along with the resulting casualties and physical damage. Preparing for and responding to these water disasters requires national-level planning for water resource management. In addition, watershed-level management of water resources requires flow duration curves (FDC) derived from continuous data based on long-term observations. Traditionally, in water resource studies, physical rainfall-runoff models have been widely used to generate duration curves. However, a number of recent studies have explored the use of data-based deep learning techniques for runoff prediction. Physical models produce hydraulically and hydrologically reliable results. However, these models require a high level of understanding and may also take longer to operate. On the other hand, data-based deep learning techniques offer the benefits of lower input data requirements and shorter operation time. However, the relationship between input and output data is processed in a black box, making it impossible to consider hydraulic and hydrological characteristics. This study chose one model from each category. For the physical model, this study calculated long-term data without missing values using parameter calibration of the Soil Water Assessment Tool (SWAT), a physical model tested for its applicability in Korea and other countries. The data was used as training data for the Long Short-Term Memory (LSTM) data-based deep learning technique. An analysis of the time-series data found that, during the calibration period (2017-18), the Nash-Sutcliffe Efficiency (NSE) and the determination coefficient for fit comparison were higher for the SWAT by 0.04 and 0.03, respectively, indicating that the SWAT results are superior to the LSTM results. 
In addition, the annual time-series data from the models were sorted in descending order, and the resulting flow duration curves were compared with the duration curves based on the observed flow. The NSE values for the SWAT and LSTM models were 0.95 and 0.91, respectively, and the determination coefficients were 0.96 and 0.92, respectively. The findings indicate that both models perform well. Although the LSTM requires improved simulation accuracy in the low-flow sections, it appears to be widely applicable to calculating flow duration curves for large basins that require longer time for model development and operation due to vast data input, and for ungauged basins with insufficient input data.
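The two computations described in this abstract, sorting flows in descending order to form a duration curve and scoring a simulation with the NSE, can be sketched as below. The Weibull plotting position `rank / (n + 1)` for exceedance probability is an assumption, since the abstract does not say which plotting position was used, and the flow values are made up.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability in %, flow) pairs, flows sorted descending."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    # Assumed Weibull plotting position: rank / (n + 1).
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ordered, start=1)]


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


# Made-up daily flows, for illustration only.
observed_flows = [3.0, 10.0, 1.0, 6.0]
simulated_flows = [2.5, 9.0, 1.5, 6.5]

fdc_observed = flow_duration_curve(observed_flows)
fdc_simulated = flow_duration_curve(simulated_flows)

# Compare the two duration curves by scoring the sorted flows against each other.
nse_on_fdc = nash_sutcliffe(
    [q for _, q in fdc_observed], [q for _, q in fdc_simulated]
)
```

An NSE of 1 means a perfect match, and an NSE of 0 means the simulation is no better than always predicting the observed mean.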

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.1-15 / 2021
  • When it is difficult for us to make decisions, we ask for advice from friends or people around us. When we decide to buy products online, we read anonymous reviews before buying. With the advent of the data-driven era, the development of IT technology is producing vast amounts of data, from individuals to objects. Companies and individuals have accumulated, processed, and analyzed such large amounts of data that they can now make or execute decisions directly from data, where they used to depend on experts. Nowadays, recommender systems play a vital role in determining users' purchase preferences, and web services (Facebook, Amazon, Netflix, YouTube) use them to induce clicks. For example, YouTube's recommender system, which is used by 1 billion people worldwide every month, draws on videos that users have liked and videos they have watched. Recommender system research is deeply linked to practical business. Therefore, many researchers are interested in building better solutions. Recommender systems use information obtained from their users to generate recommendations, because developing them requires information on items that the user is likely to prefer. Through recommender systems, we have begun to trust patterns and rules derived from data rather than empirical intuition. The growth in data volume has led machine learning to develop into deep learning. However, such recommender systems do not solve everything. For a recommender system to work, the data must not be sparse and must be sufficient in quantity. It also requires detailed information about the individual. Recommender systems work correctly only when these conditions are met. They become a complex problem for both consumers and sellers when the interaction log is insufficient. 
This is because, from the seller's perspective, recommendations must be made to the consumer at a personal level, and, from the consumer's perspective, appropriate recommendations must be received based on reliable data. In this paper, to improve the accuracy of "appropriate recommendation" to consumers, a recommender system is proposed in combination with context-based deep learning. This research combines user-based data to create a hybrid recommender system. The hybrid approach developed is not a purely collaborative type of recommender system, but a collaborative extension that integrates user data with deep learning. Customer review data were used for the data set. Consumers buy products in online shopping malls and then write product reviews. Ratings and reviews come from buyers who have already purchased, giving users confidence before purchasing the product. However, recommendation systems mainly use scores or ratings rather than reviews to suggest items purchased by many users. In fact, consumer reviews contain product opinions and user sentiment that can be used for evaluation. By incorporating these into the study, this paper aims to improve the recommendation system. This study presents an algorithm for use when individuals have difficulty selecting an item. Consumer reviews and record patterns make it possible to rely on recommendations appropriately. The algorithm implements a recommendation system through collaborative filtering. This study's predictive accuracy is measured by Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Netflix is strategically using recommender systems in its programs through competitions that reduce RMSE every year, making good use of predictive accuracy. Research on hybrid recommender systems that combine NLP approaches, deep learning, and other techniques for personalized recommendation has been increasing. Among NLP studies, sentiment analysis began to take shape in the mid-2000s as user review data increased. 
Sentiment analysis is a text classification task based on machine learning. Machine learning-based sentiment analysis has the disadvantage that it is difficult to identify how information is expressed in a review, because it is challenging to consider the text's characteristics. In this study, we propose a deep learning recommender system that utilizes BERT's sentiment analysis to minimize these disadvantages of machine learning. The comparison models were recommender systems based on Naive-CF (collaborative filtering), SVD (singular value decomposition)-CF, MF (matrix factorization)-CF, BPR-MF (Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM, and GRU (Gated Recurrent Units). In the experiment, the recommender system based on BERT performed best.
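The two accuracy measures named in this abstract, RMSE and MAE, are standard and can be computed as in the short sketch below. The rating values are invented for illustration and are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: sqrt of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Invented ratings on a 1-5 scale, for illustration only.
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 3.0]

rmse_value = rmse(actual_ratings, predicted_ratings)
mae_value = mae(actual_ratings, predicted_ratings)
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, which is why the two are often reported together.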

Contactless Data Society and Reterritorialization of the Archive (비접촉 데이터 사회와 아카이브 재영토화)

  • Jo, Min-ji
    • The Korean Journal of Archival Studies / no.79 / pp.5-32 / 2024
  • The Korean government ranked 3rd among 193 UN member countries in the UN's 2022 e-Government Development Index. Korea, which has consistently been evaluated as a top country, can clearly be said to be a world leader in e-government. The lubricant of e-government is data. Data itself is neither information nor a record, but it is a source of information and records and a resource of knowledge. Since administrative actions through electronic systems have become widespread, the production and technology of data-based records have naturally expanded and evolved. Technology may seem value-neutral, but in fact, technology itself reflects a specific worldview. The digital order of new technologies, armed with hyper-connectivity and super-intelligence, not only has a profound influence on traditional power structures, but also has a similar influence on existing information and knowledge transmission media. Moreover, new technologies and media, including data-based generative artificial intelligence, are by far the hottest topic. The all-round growth and spread of digital technology has led to the augmentation of human capabilities and the outsourcing of thinking. This also involves a variety of problems, ranging from deepfakes and other fake images and auto-profiling to AI hallucination, which presents fabrications as if they were real, and copyright infringement involving machine learning data. Moreover, radical connectivity capabilities enable the instantaneous sharing of vast amounts of data and rely on the technological unconscious to generate actions without awareness. Another irony of the digital world and online network, which is based on immaterial distribution and logical existence, is that access and contact can only be made through physical tools. Digital information is a logical object, but digital resources cannot be read or utilized without some type of device to relay them. 
In that respect, machines in today's technological society have gone beyond the level of simple assistance, and there are points at which it is difficult to say that the entry of machines into human society is a natural pattern of change due to advanced technological development. This is because perspectives on machines will change over time. What matters is the social and cultural implications of changes in the way records are produced as a result of communication and actions through machines. In the archive field as well, it is time to research what problems a data-based archive society will face due to technological changes toward a hyper-intelligent and hyper-connected society, who will prove the continuous activity of records and data, and what will be the main drivers of media change. This study began with the need to recognize archives not only as records that are the result of actions, but also as data that are strategic assets. On this basis, the author considered how to expand traditional boundaries and achieve reterritorialization in a data-driven society.

Major Class Recommendation System based on Deep learning using Network Analysis (네트워크 분석을 활용한 딥러닝 기반 전공과목 추천 시스템)

  • Lee, Jae Kyu;Park, Heesung;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.95-112 / 2021
  • In university education, the choice of major classes plays an important role in students' careers. However, in line with changes in industry, the fields of major subjects by department are diversifying and increasing in number. As a result, students have difficulty choosing and taking classes that suit their career paths. In general, students choose classes based on experience, such as the choices of peers or advice from seniors. This has the advantage of taking the general situation into account, but it does not reflect individual tendencies or consideration of existing courses, and it leads to information inequality because the information is shared only among specific students. In addition, as non-face-to-face classes have recently been conducted and exchanges between students have decreased, even experience-based decisions have become harder to make. Therefore, this study proposes a recommendation system model that can recommend college major classes suited to individual characteristics based on data rather than experience. A recommendation system recommends information and content (music, movies, books, images, etc.) that a specific user may be interested in. It is already widely used in services where it is important to consider individual tendencies, such as YouTube and Facebook, and it is a familiar part of personalized content services such as over-the-top media services (OTT). Taking classes is also a kind of content consumption, in that classes suitable for the individual are selected from a set content list. However, unlike other content consumption, the selection has far-reaching consequences. For example, music and movies are usually consumed once, and the time required to consume them is short. Therefore, the importance of each item is relatively low, and selection requires little deliberation. 
Major classes usually have a long consumption time because they have to be taken for one semester, and each item has a high importance and requires greater caution in choice because the composition of the selected classes affects many things, such as career and graduation requirements. Given these unique characteristics of major classes, a recommendation system in the education field meaningfully supports decision-making that reflects individual characteristics, which experience-based decision-making cannot, even though the range of items is relatively small. This study aims to realize personalized education and enhance students' educational satisfaction by presenting a recommendation model for university major classes. In the model study, class history data of undergraduate students at a university from 2015 to 2017 were used, and students and their major names were used as metadata. The class history data is implicit feedback data that only indicates whether content is consumed, not reflecting preferences for classes. Therefore, when we derive embedding vectors that characterize students and classes, their expressive power is low. With these issues in mind, this study proposes a Net-NeuMF model that generates vectors of students and classes through network analysis and utilizes them as input values of the model. The model was based on the structure of NeuMF using one-hot vectors, a representative model using data with implicit feedback. The input vectors of the model are generated to represent the characteristics of students and classes through network analysis. To generate a vector representing a student, each student is set as a node, and an edge with a weight connects two students if they take the same class. Similarly, to generate a vector representing a class, each class is set as a node, and an edge connects two classes if any students have taken both. 
We then utilize Node2Vec, a representation learning methodology that quantifies the characteristics of each node. For the evaluation of the model, we used four indicators that are mainly utilized by recommendation systems, and experiments were conducted with three different embedding dimensions to analyze the impact of the embedding dimension on the model. The results show better performance on the evaluation metrics, regardless of dimension, than when using one-hot vectors in the existing NeuMF structure. Thus, this work contributes by constructing a network of students (users) and classes (items) to increase expressiveness over existing one-hot embeddings, by matching the characteristics of each structure that constitutes the model, and by showing better performance on various kinds of evaluation metrics compared to existing methodologies.
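The student network described in this abstract can be sketched as a weighted edge list. The sketch assumes the edge weight is the number of classes two students share; the abstract only says the edge carries "a weight", so that choice is an assumption, and the Node2Vec embedding step that would follow is not shown. Student and class identifiers are made up.

```python
from collections import defaultdict
from itertools import combinations


def student_graph(enrollments):
    """Build weighted student-student edges from class enrollments.

    enrollments: dict mapping a student id to the set of class ids taken.
    Returns a dict mapping (student_a, student_b), with a < b, to the number
    of shared classes (the assumed edge weight).
    """
    students_by_class = defaultdict(set)
    for student, classes in enrollments.items():
        for class_id in classes:
            students_by_class[class_id].add(student)

    weights = defaultdict(int)
    for students in students_by_class.values():
        # Every pair of students in the same class gains one unit of weight.
        for a, b in combinations(sorted(students), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up enrollment data, for illustration only.
toy_enrollments = {
    "s1": {"A", "B"},
    "s2": {"A", "B", "C"},
    "s3": {"C"},
}
edges = student_graph(toy_enrollments)
```

The class network can be built the same way by swapping the roles of students and classes, after which a node-embedding method such as Node2Vec would be run on each graph.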

A Study on the Key Factors Affecting Big Data Use Intention of Agriculture Ventures in Terms of Technology, Organization and Environment: Focusing on Moderating Effect of Technical Field (농업벤처기업의 빅데이터 활용의도에 영향을 미치는 기술·조직·환경 관점의 핵심요인 연구: 기술분야의 조절효과를 중심으로)

  • Ahn, Mun Hyoung
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.16 no.6 / pp.249-267 / 2021
  • The use of big data accumulated along with the progress of digitalization is bringing disruptive innovation to the global agricultural industry. Recently, the government has been establishing an agricultural big data platform and a support organization. However, in the domestic agricultural industry, the use of big data is insufficient except for some companies in the field of cultivation and growth. In this context, this study identifies factors affecting the intention to use big data in terms of technology, organization and environment, and also confirms the moderating effect of technical field, focusing on agricultural ventures, which should be the main entities in creating innovation by using big data. Research data were obtained from 309 agricultural ventures supported by the A+ Center of FACT (Foundation of AgTech Commercialization and Transfer) and were analyzed using IBM SPSS 22.0. As a result, among technical factors, relative advantage and compatibility were found to have a significant positive (+) effect. Among organizational factors, management support was found to have a positive (+) effect and cost a negative (-) effect. Among environmental factors, policy support was found to have a positive (+) effect. Verification of the moderating effect of technical field showed that firms in fields other than cultivation had a moderating effect that weakened the relationship between the intention to use big data and all variables other than relative advantage, compatibility, and competitor pressure. These results suggest the following implications. First, it is necessary to select a core business that will give agricultural ventures opportunities to generate new profits and improve operational efficiency through the use of big data, and to increase collaboration opportunities through policy. 
Second, it is necessary to provide a big data analysis solution that can overcome the difficulties of analysis due to the characteristics of the agricultural industry. Third, in small organizations such as agricultural ventures, the will of the top management to reorganize the organizational culture should be preceded by a high level of understanding on the use of big data. Fourth, it is important to discover and promote successful cases that can be benchmarked at the level of SMEs and venture companies. Fifth, it will be more effective to divide the priorities of core business and support business by agricultural venture technology sector. Finally, the limitations of this study and follow-up research tasks are presented.

A 2D / 3D Map Modeling of Indoor Environment (실내환경에서의 2 차원/ 3 차원 Map Modeling 제작기법)

  • Jo, Sang-Woo;Park, Jin-Woo;Kwon, Yong-Moo;Ahn, Sang-Chul
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2006.02a / pp.355-361 / 2006
  • In large-scale environments such as airports, museums, large warehouses and department stores, autonomous mobile robots will play an important role in security and surveillance tasks. Robotic security guards will report surveyed information about large-scale environments and communicate it to a human operator, for example whether an object is present or a window is open. Both for visualization of information and as a human-machine interface for remote control, a 3D model can give much more useful information than the typical 2D maps used in many robotic applications today. It is easier to understand and makes the user feel present at the robot's location, so that the user can interact with the robot more naturally in a remote setting and see structures such as windows and doors that cannot be seen in a 2D model. In this paper we present our simple and easy-to-use method to obtain a 3D textured model. To express reality, we need to integrate the 3D models and real scenes. Most other 3D modeling methods use two data acquisition devices, one for getting a 3D model and another for obtaining realistic textures. In this case, the former device is a 2D laser range-finder and the latter is a common camera. Our algorithm consists of building a measurement-based 2D metric map acquired by the laser range-finder, texture acquisition/stitching, and texture-mapping to the corresponding 3D model. The algorithm is implemented with a laser sensor for obtaining the 2D/3D metric map and two cameras for gathering texture. Our geometric 3D model consists of planes that model the floor and walls. The geometry of the planes is extracted from the 2D metric map data. Textures for the floor and walls are generated from the images captured by two 1394 cameras that have a wide field-of-view angle. Image stitching and image cutting processes are used to generate textured images corresponding to the 3D model. 
The algorithm is applied to two cases: a corridor and a room-like space enclosed by four walls. The generated 3D map model of the indoor environment is shown in VRML format and can be viewed in a web browser with a VRML plug-in. The proposed algorithm can be applied to a 3D model-based remote surveillance system through the WWW.
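The step of extracting wall planes from a 2D metric map can be illustrated by extruding each 2D wall segment into a vertical rectangle. This is only a sketch under stated assumptions: walls are given as straight 2D segments, the floor lies at z = 0, and all walls share one constant height. The abstract does not describe the actual plane-extraction procedure, and the function name is hypothetical.

```python
def extrude_walls(segments, height):
    """Turn 2D wall segments into vertical 3D quads.

    segments: list of ((x1, y1), (x2, y2)) wall segments from a 2D map.
    height: assumed constant wall height, with the floor at z = 0.
    Returns one quad per segment as four (x, y, z) corners, listed
    bottom-start, bottom-end, top-end, top-start.
    """
    quads = []
    for (x1, y1), (x2, y2) in segments:
        quads.append(
            [
                (x1, y1, 0.0),
                (x2, y2, 0.0),
                (x2, y2, height),
                (x1, y1, height),
            ]
        )
    return quads


# A made-up L-shaped pair of walls, in metres.
toy_segments = [((0.0, 0.0), (4.0, 0.0)), ((4.0, 0.0), (4.0, 3.0))]
wall_quads = extrude_walls(toy_segments, height=2.5)
```

Each quad could then receive a stitched camera image as its texture, which is the role the texture-mapping step plays in the described pipeline.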
