• Title/Summary/Keyword: Representative node

Search results: 129

Incremental Generation of A Decision Tree Using Global Discretization For Large Data (대용량 데이터를 위한 전역적 범주화를 이용한 결정 트리의 순차적 생성)

  • Han, Kyong-Sik;Lee, Soo-Won
    • The KIPS Transactions:PartB
    • /
    • v.12B no.4 s.100
    • /
    • pp.487-498
    • /
    • 2005
  • Recently, much attention has been focused on decision tree algorithms that can handle large datasets. However, because most of these algorithms process data in batch mode, they must rebuild the tree from scratch whenever new data is added. A more efficient way to reduce this rebuilding cost is to construct the tree incrementally. Representative incremental tree construction algorithms are BOAT and ITI, and most such algorithms use a local discretization method to handle numeric attributes. However, because discretization requires sorted numeric data, when processing large datasets a global discretization method that sorts all data only once is more suitable than a local discretization method that sorts at every node. This paper proposes an incremental tree construction method that efficiently rebuilds the tree using global discretization to handle numeric attributes. When new data is added, the categories affected by that data must be recreated, and the tree structure must then be changed in accordance with the category changes. The proposed method extracts sample points and performs discretization on these sample points to recreate categories efficiently, and it uses confidence intervals and a tree restructuring method to adjust the tree structure to the category changes. In this study, an experiment using the People database was conducted to compare the proposed method with an existing method that uses local discretization.
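
The sample-based global discretization idea above can be illustrated with a short sketch. The following Python fragment is a minimal illustration, not the paper's implementation: split points for a numeric attribute are estimated once from a random sample, so only the sample is ever sorted, and every value can then be mapped to a global category. The bin count, sample size, and toy data are assumptions made for illustration.

```python
# Minimal sketch of sample-based global discretization (illustrative only):
# split points are estimated from a random sample, so the full dataset never
# has to be re-sorted when new records arrive.
import random
import bisect

def global_split_points(values, n_bins=4, sample_size=1000, seed=0):
    """Estimate global split points from a sample instead of sorting all data."""
    random.seed(seed)
    sample = random.sample(values, min(sample_size, len(values)))
    sample.sort()                                   # only the sample is sorted
    step = len(sample) / n_bins
    return [sample[int(i * step)] for i in range(1, n_bins)]

def discretize(value, split_points):
    """Map a numeric value to its global category (bin index)."""
    return bisect.bisect_right(split_points, value)

if __name__ == "__main__":
    ages = [random.gauss(40, 12) for _ in range(100_000)]   # toy numeric attribute
    splits = global_split_points(ages, n_bins=4)
    print("split points:", [round(s, 1) for s in splits])
    print("category of age 33:", discretize(33, splits))
```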

Research on Longitudinal Slope Estimation Using Digital Elevation Model (수치표고모델 정보를 활용한 도로 종단경사 산출 연구)

  • Han, Yohee;Jung, Yeonghun;Chun, Uibum;Kim, Youngchan;Park, Shin Hyoung
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.20 no.6
    • /
    • pp.84-99
    • /
    • 2021
  • As the micro-mobility market grows, demand is increasing for route guidance that also includes uphill information. Since the climbing angle a vehicle can handle depends on the electric motor used, it is necessary to build an uphill road DB according to a threshold standard. Although road alignment is a very important element of basic road information, the road digital map currently contains no information on longitudinal slope. The High Definition (HD) map being built in preparation for the era of autonomous vehicles contains altitude values, unlike the existing standard node-link system; however, the HD map is still insufficient because it covers only some sections of the road network. This paper therefore proposes a method to generate road longitudinal slopes using currently available data. We developed a method of computing the longitudinal slope by combining the digital elevation model with the standard node-link system. After assigning altitudes to road link points spaced at 4 m intervals on the Seoul road network, we calculated the individual slope per unit distance of the road. After designating a representative slope for each road link, we extracted very steep roads that cannot be climbed with personal mobility devices and slippery roads that cannot be used during heavy snowfall. We additionally describe errors in the altitude values caused by the surrounding terrain and issues related to the slope calculation method. We expect the road longitudinal slope information to serve as basic data for various convergence analyses in the future.
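
The slope computation described above can be sketched in a few lines. The Python fragment below assumes altitudes have already been extracted from the DEM at 4 m intervals along one road link; the use of the mean absolute segment slope as the representative value and the steepness threshold are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch: per-segment slopes from 4 m altitude samples and a
# representative slope per road link (assumed aggregation rule and threshold).
INTERVAL_M = 4.0

def segment_slopes(altitudes_m):
    """Percent slope of each 4 m segment along the link."""
    return [
        (altitudes_m[i + 1] - altitudes_m[i]) / INTERVAL_M * 100.0
        for i in range(len(altitudes_m) - 1)
    ]

def representative_slope(altitudes_m):
    """One slope value per link (here: mean of absolute segment slopes)."""
    slopes = segment_slopes(altitudes_m)
    return sum(abs(s) for s in slopes) / len(slopes)

if __name__ == "__main__":
    link_altitudes = [32.0, 32.4, 33.1, 34.0, 35.2, 36.1]   # metres, every 4 m
    rep = representative_slope(link_altitudes)
    print(f"representative slope: {rep:.1f} %")
    print("too steep for personal mobility:", rep > 18.0)    # assumed threshold
```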

A Study on the Authenticity Verification of UxNB Assisting Terrestrial Base Stations

  • Kim, Keewon;Park, Kyungmin;Kim, Jonghyun;Park, Tae-Keun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.12
    • /
    • pp.131-139
    • /
    • 2022
  • In this paper, to verify the authenticity of a UxNB that assists terrestrial base stations, the solutions for SI (System Information) security presented in 3GPP TR 33.809 are analyzed from the perspective of the UxNB. According to the 3GPP (Third Generation Partnership Project) definition, a UxNB is a base station mounted on a UAV (Unmanned Aerial Vehicle); it is carried in the air by the UAV and is a radio access node that provides a connection to the UE (User Equipment). The solutions for SI security can be classified as hash based, MAC (Message Authentication Code) based, and digital signature based, and a representative solution for each category is introduced. From the perspective of verifying the authenticity of the UxNB, we compare and analyze each solution in terms of information provisioning and updates, leakage of the UxNB's security information, and the additionally required amounts of computation and transmission. The analysis shows that a solution for verifying the authenticity of a UxNB should minimize the secret information stored in the UxNB, store that information in a secure place, and apply encryption when it is updated over the air. In addition, because of the UxNB's low computing power and limited power supply, the amounts of computation and transmission need to be minimized.
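
As a concrete illustration of the MAC-based category of SI protection mentioned above, the sketch below shows a generic shared-key MAC check in Python. The key handling and SI payload are simplified assumptions, and the message layout is not the one defined in 3GPP TR 33.809.

```python
# Minimal sketch of MAC-based SI protection (illustrative, not the 3GPP format):
# the UxNB appends a MAC to broadcast system information and the receiver
# recomputes it before trusting the SI.
import hmac
import hashlib

def protect_si(si_payload: bytes, shared_key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to the broadcast system information."""
    tag = hmac.new(shared_key, si_payload, hashlib.sha256).digest()
    return si_payload + tag

def verify_si(message: bytes, shared_key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    si_payload, tag = message[:-32], message[-32:]
    expected = hmac.new(shared_key, si_payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

if __name__ == "__main__":
    key = b"provisioned-secret"          # in practice, provisioned securely
    broadcast = protect_si(b"MIB/SIB1 contents", key)
    tampered = bytes([broadcast[0] ^ 0x01]) + broadcast[1:]  # flip one payload bit
    print("authentic SI:", verify_si(broadcast, key))
    print("tampered SI:", verify_si(tampered, key))
```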

Automated Finite Element Analyses for Structural Integrated Systems (통합 구조 시스템의 유한요소해석 자동화)

  • Chongyul Yoon
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.37 no.1
    • /
    • pp.49-56
    • /
    • 2024
  • An automated dynamic structural analysis module stands as a crucial element within a structural integrated mitigation system. This module must deliver prompt real-time responses to enable timely actions, such as evacuation or warnings, in response to the severity of the threat posed to the structural system. The finite element method, a widely adopted approximate structural analysis approach worldwide, owes its popularity in part to its user-friendly nature. However, computational efficiency and the accuracy of the results depend on the user-provided finite element mesh, with the number of elements and their quality playing pivotal roles. This paper introduces a computationally efficient adaptive mesh generation scheme that optimally combines the r-method of node relocation and the h-method of element subdivision for mesh refinement. Adaptive mesh generation schemes create finite element meshes automatically, and in this case representative strain values for a given mesh are employed for the error estimates. When applied to dynamic problems analyzed in the time domain, the mesh needs to be modified at each time step, and a few hundred or thousand steps must be considered. The algorithm's specifics are demonstrated through a standard cantilever beam example subjected to a concentrated load at the free end. Additionally, a portal frame example showcases the generation of various robust meshes. These examples illustrate the adaptive algorithm's capability to produce robust meshes that ensure reasonable accuracy and efficient computing time. Moreover, the study highlights the potential for the scheme's effective application to complex structural dynamics problems, such as those subjected to seismic or erratic wind loads, and emphasizes its suitability for general nonlinear analysis problems, establishing the versatility and reliability of the proposed adaptive mesh generation scheme.
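
The strain-driven refinement loop described above can be sketched in one dimension. The following Python fragment is a minimal illustration under assumed tolerances and displacement values: element strains are compared against a representative (mesh-average) strain, and elements whose deviation exceeds a tolerance are subdivided (h-refinement); the r-type node relocation used in the paper is omitted for brevity.

```python
# Minimal 1-D sketch of strain-driven h-refinement (illustrative assumptions:
# error indicator, tolerance, and displacement field are invented).
def element_strains(nodes, disp):
    """Constant strain of each 2-node element: du/dx."""
    return [
        (disp[i + 1] - disp[i]) / (nodes[i + 1] - nodes[i])
        for i in range(len(nodes) - 1)
    ]

def refine_by_strain_error(nodes, disp, tol=0.2):
    """Split every element whose strain deviates from the mesh-average strain."""
    strains = element_strains(nodes, disp)
    rep = sum(strains) / len(strains)                 # representative strain
    new_nodes = [nodes[0]]
    for i, eps in enumerate(strains):
        if abs(eps - rep) > tol * abs(rep):           # simple error estimate
            new_nodes.append(0.5 * (nodes[i] + nodes[i + 1]))  # h-refine: add midpoint
        new_nodes.append(nodes[i + 1])
    return new_nodes

if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]          # cantilever axis, fixed at x = 0
    u = [0.0, 0.01, 0.05, 0.20, 0.55]      # assumed tip-loaded displacements
    print("refined mesh:", refine_by_strain_error(x, u))
```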

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.227-240
    • /
    • 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies that in turn serve as a basis for further development. As the importance of patents increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former lacks the ability to analyze information technology in detail, while the latter is unable to identify the relationships between such technologies. To overcome the limitations of network-based and keyword-based analyses, this study blends the two methods and proposes a keyword network-based analysis methodology. In this study, we collected significant technology information related to Light Emitting Diodes (LED) from each patent through text mining, built a keyword network, and then executed a community network analysis on the collected data. The results of the analysis are as follows. First, the patent keyword network showed very low density and an exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, the density varies with the size of the network; increasing the size of a network generally leads to a decrease in density. The clustering coefficient is a network-level measure that illustrates the tendency of nodes to cluster in densely interconnected modules. This measure captures the small-world property, in which a network can be highly clustered while still having a small average distance between nodes despite the large number of nodes. Therefore, the low density of the patent keyword network means that its nodes are connected only sparsely, while the high clustering coefficient shows that the nodes are closely connected to one another. Second, the cumulative degree distribution of the patent keyword network, like other knowledge networks such as citation or collaboration networks, followed a clear power-law distribution. A well-known mechanism behind this pattern is preferential attachment, whereby a node with more links is likely to attract further new links as the network evolves. Unlike general normal distributions, the power-law distribution does not have a representative scale. This means that one cannot pick a representative or an average value, because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of a heavy-tailed, scale-free distribution represents the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of a power-law distribution implies that preferential attachment explains the origin of the heavy-tailed distribution in the growing patent keyword network.
Third, we found that among keywords that flowed into a particular field, the vast majority of keywords with new links join existing keywords in the associated community in forming the concept of a new patent. This finding held for both the short-term (4-year) and long-term (10-year) analyses. Furthermore, the keyword combination information derived from the proposed methodology enables one to forecast which concepts are likely to combine to form a new patent and to refer to those concepts when developing new patents.
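
The network-level measures discussed above (density, clustering coefficient, and the cumulative degree distribution) can be reproduced on a toy keyword network with a few lines of Python using the networkx library. The edge list below is invented for illustration; the paper's network is built from LED patent keywords extracted by text mining.

```python
# Minimal sketch of the network measures on a toy keyword co-occurrence graph.
import networkx as nx
from collections import Counter

# toy keyword co-occurrence edges (keyword pairs appearing in the same patent)
edges = [
    ("LED", "phosphor"), ("LED", "substrate"), ("LED", "driver"),
    ("phosphor", "substrate"), ("driver", "PWM"), ("LED", "PWM"),
    ("substrate", "sapphire"),
]
G = nx.Graph(edges)

print("density:", round(nx.density(G), 3))                       # ties / possible ties
print("clustering coefficient:", round(nx.average_clustering(G), 3))

# cumulative degree distribution (the paper reports a power-law shape)
degree_counts = Counter(dict(G.degree()).values())
for k in sorted(degree_counts):
    frac = sum(c for d, c in degree_counts.items() if d >= k) / G.number_of_nodes()
    print(f"P(degree >= {k}) = {frac:.2f}")
```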

Product Community Analysis Using Opinion Mining and Network Analysis: Movie Performance Prediction Case (오피니언 마이닝과 네트워크 분석을 활용한 상품 커뮤니티 분석: 영화 흥행성과 예측 사례)

  • Jin, Yu;Kim, Jungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.49-65
    • /
    • 2014
  • Word of Mouth (WOM) is a behavior used by consumers to transfer or communicate their product or service experience to other consumers. Due to the popularity of social media such as Facebook, Twitter, blogs, and online communities, electronic WOM (e-WOM) has become important to the success of products or services. As a result, most enterprises pay close attention to e-WOM for their products or services. This is especially important for movies, as these are experiential products. This paper aims to identify the network factors of an online movie community that impact box office revenue using social network analysis. In addition to traditional WOM factors (volume and valence of WOM), network centrality measures of the online community are included as influential factors in box office revenue. Based on previous research results, we develop five hypotheses on the relationships between the potential influential factors (WOM volume, WOM valence, degree centrality, betweenness centrality, closeness centrality) and box office revenue. The first hypothesis is that the accumulated volume of WOM in online product communities is positively related to the total revenue of movies. The second hypothesis is that the accumulated valence of WOM in online product communities is positively related to the total revenue of movies. The third hypothesis is that the average degree centrality of reviewers in online product communities is positively related to the total revenue of movies. The fourth hypothesis is that the average betweenness centrality of reviewers in online product communities is positively related to the total revenue of movies. The fifth hypothesis is that the average closeness centrality of reviewers in online product communities is positively related to the total revenue of movies. To verify our research model, we collect movie review data from the Internet Movie Database (IMDb), a representative online movie community, and movie revenue data from the Box-Office-Mojo website. The movies in this analysis are the weekly top-10 movies from September 1, 2012, to September 1, 2013. We collect movie metadata such as screening periods and user ratings, as well as community data from IMDb, including reviewer identification, review content, review times, responder identification, reply content, reply times, and reply relationships. For the same period, the revenue data from Box-Office-Mojo are collected on a weekly basis. Movie community networks are constructed based on the reply relationships between reviewers. Using a social network analysis tool, NodeXL, we calculate the averages of three centralities, namely degree, betweenness, and closeness centrality, for each movie. Correlation analysis of the focal variables and the dependent variable (final revenue) shows that the three centrality measures are highly correlated, prompting us to perform multiple regressions separately with each centrality measure. Consistent with previous research results, our regression analysis shows that the volume and valence of WOM are positively related to the final box office revenue of movies. Moreover, the average betweenness centralities of the initial community networks impact the final movie revenues, whereas the average degree centralities and closeness centralities do not influence final movie performance. Based on the regression results, hypotheses 1, 2, and 4 are accepted, and hypotheses 3 and 5 are rejected.
This study links the network structure of e-WOM in online product communities to product performance. Based on the analysis of a real online movie community, the results show that online community network structures can work as a predictor of movie performance. In particular, the betweenness centralities of the reviewer community are critical for the prediction of movie performance, whereas degree centralities and closeness centralities do not influence it. As future research, similar analyses of other product categories, such as electronic goods and online content, are required to generalize the study results.
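
The centrality-averaging step described above can be sketched as follows. The reply edges in this Python fragment are invented; in the study, one such network is built per movie from IMDb reply relationships, and the resulting averages (together with WOM volume and valence) are used as regression inputs.

```python
# Minimal sketch: average centralities of a toy reviewer reply network.
import networkx as nx

# toy reply relationships: (responder, reviewer)
replies = [("u2", "u1"), ("u3", "u1"), ("u3", "u2"), ("u4", "u3"), ("u5", "u3")]
G = nx.Graph(replies)

avg = lambda d: sum(d.values()) / len(d)
features = {
    "avg_degree_centrality": avg(nx.degree_centrality(G)),
    "avg_betweenness_centrality": avg(nx.betweenness_centrality(G)),
    "avg_closeness_centrality": avg(nx.closeness_centrality(G)),
}
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```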

An Analysis into the Characteristics of the High-pass Transportation Data and Information Processing Measures on Urban Roads (도시부도로에서의 하이패스 교통자료 특성분석 및 정보가공방안)

  • Jung, Min-Chul;Kim, Young-Chan;Kim, Dong-Hyo
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.10 no.6
    • /
    • pp.74-83
    • /
    • 2011
  • The high-pass transportation information system directly collects section information using probe cars and can therefore offer more reliable information to drivers. However, the operating conditions and characteristics of the probe cars and the statistical processing methods affect the reliability of the information; in particular, because section travel time on urban roads is greatly influenced by whether vehicles are delayed at signals, there can be large deviations among the individual probe data collected. Accordingly, research from multiple directions is necessary to enhance the reliability of the section information. However, previous studies on high-pass information provision have been conducted on highway sections, which have continuous-flow characteristics, so their results are of limited applicability to urban roads, which have interrupted-flow characteristics. Therefore, this research aims to analyze the characteristics of high-pass transportation data on urban roads and to find a proper processing method. When the characteristics of the high-pass data collected from RSEs on urban roads were analyzed using a time-space diagram, the collected data showed a distinct pattern, caused by arriving vehicles waiting at signals, with a period equal to the signal cycle of the downstream (finish) node. Moreover, the number of signal stops and the waiting time caused deviation in the collected data, and these were larger under congestion; the analysis showed that the increased number of signal stops under congestion partially offset the deviation. The analysis results indicate that it is appropriate to use the mean of the high-pass data collected on urban roads as the representative value, since it reflects the transportation characteristics caused by waiting at signals, and that the criteria for judging delay and congestion need to be adjusted according to the signal and road characteristics. The results of this research are expected to serve as a foundation for improving the reliability of high-pass information on urban roads.
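
The aggregation rule suggested above, taking the mean of the individual probe travel times in each interval as the representative section value, can be sketched briefly. The 5-minute aggregation window and the sample records in the Python fragment below are illustrative assumptions.

```python
# Minimal sketch: representative section travel time as the mean of probe
# records per aggregation interval (assumed window and toy data).
from collections import defaultdict

# (collection time in seconds from midnight, section travel time in seconds)
probe_records = [
    (28_805, 95), (28_890, 162), (28_950, 101), (29_140, 158), (29_320, 99),
]
AGG_INTERVAL_S = 300  # 5-minute aggregation window

by_interval = defaultdict(list)
for t, travel_time in probe_records:
    by_interval[t // AGG_INTERVAL_S].append(travel_time)

for interval, times in sorted(by_interval.items()):
    mean_tt = sum(times) / len(times)          # representative value = mean
    print(f"interval {interval}: representative travel time {mean_tt:.0f} s "
          f"from {len(times)} probes")
```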

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, realizing flexible storage expansion for a massive amount of unstructured log data, together with the considerable number of functions needed to categorize and analyze the stored unstructured log data, is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to handle with the existing computing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources and can flexibly expand resources such as storage space and memory when storage must be extended or log data increase rapidly. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions so that the system can continue to operate after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, databases with strict schemas, like relational databases, cannot easily expand across nodes when the stored data must be distributed to various nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. The data models of NoSQL databases are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a flexible (schema-free) structure, is used in the proposed system. MongoDB is adopted because it makes it easy to process unstructured log data through its flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, covering log data insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.
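
The MongoDB side of the pipeline described above can be sketched with a short pymongo example. The connection URI, database and collection names, and log document fields below are illustrative assumptions, not the actual configuration of the proposed system.

```python
# Minimal sketch: free-schema log documents inserted into MongoDB and grouped
# per event type for a time window (assumed URI, names, and fields).
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local instance
logs = client["bank_logs"]["client_business"]

logs.insert_many([
    {"ts": datetime(2013, 6, 1, 9, 0, tzinfo=timezone.utc),
     "branch": "A01", "event": "transfer", "detail": {"amount": 150000}},
    {"ts": datetime(2013, 6, 1, 9, 1, tzinfo=timezone.utc),
     "branch": "A02", "event": "login_fail", "detail": {"retries": 3}},
])

# aggregate per event type for a time window (input to the graph generator)
window = {"ts": {"$gte": datetime(2013, 6, 1, 9, 0, tzinfo=timezone.utc)}}
pipeline = [{"$match": window}, {"$group": {"_id": "$event", "count": {"$sum": 1}}}]
for row in logs.aggregate(pipeline):
    print(row["_id"], row["count"])
```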

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services
    • /
    • v.16 no.1
    • /
    • pp.67-74
    • /
    • 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as one of the important issues in the data mining field. According to the strategy used to utilize item importance, itemset mining approaches that discover itemsets based on item importance are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These mining algorithms compute transactional weights by utilizing the weight of each item in large databases, and they discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, the importance of a certain transaction can be seen through the database analysis, because a transaction's weight is higher if it contains many items with high weights. We not only analyze the advantages and disadvantages but also compare the performance of the best-known algorithms in the field of frequent itemset mining based on transactional weights. As a representative of frequent itemset mining using transactional weights, WIS introduces the concept and strategies of transactional weights. In addition, there are various other state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with the weight information. To efficiently mine weighted frequent itemsets, these three algorithms use a special lattice-like data structure called the WIT-tree. The algorithms do not need an additional database scan after the WIT-tree has been constructed, since each node of the WIT-tree holds item information such as the item and its transaction IDs. In particular, traditional algorithms perform a number of database scans to mine weighted itemsets, whereas the WIT-tree-based algorithms avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 on the basis of two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs the itemset combination process using the information of the transactions that contain both itemsets. WIT-FWIs-MODIFY has a unique feature that reduces the operations needed to calculate the frequency of a new itemset. WIT-FWIs-DIFF utilizes a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the database size changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, while on the sparse dataset WIT-FWIs-DIFF mines more efficiently than the other algorithms. Compared to the WIT-tree-based algorithms, WIS, which is based on the Apriori technique, has the worst efficiency because it requires many more computations than the others on average.
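
The transaction-weight idea described above can be sketched directly. In the Python fragment below, a transaction's weight is the mean of its item weights and an itemset's weighted support is accumulated over the transactions containing it; the item weights, toy database, and threshold are illustrative assumptions, and the WIT-tree structure itself is not shown.

```python
# Minimal sketch of transaction-weighted frequent itemset mining
# (illustrative weights, database, and threshold; no WIT-tree).
from itertools import combinations

item_weight = {"a": 0.9, "b": 0.6, "c": 0.4, "d": 0.8}
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c", "d"}, {"a", "d"}]

def transaction_weight(t):
    """Weight of a transaction: mean of its item weights."""
    return sum(item_weight[i] for i in t) / len(t)

def weighted_support(itemset):
    """Sum of the weights of all transactions containing the itemset."""
    return sum(transaction_weight(t) for t in transactions if itemset <= t)

min_wsup = 1.2   # assumed weighted-support threshold
frequent = {}
for size in (1, 2):
    for combo in combinations(sorted(item_weight), size):
        ws = weighted_support(set(combo))
        if ws >= min_wsup:
            frequent[combo] = round(ws, 2)
print(frequent)
```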