• Title/Summary/Keyword: unstructured data (비정형데이터)


Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

  • Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
    • Journal of KIISE:Databases / v.31 no.6 / pp.596-608 / 2004
  • Much work has been done to reduce disk access time in I/O-intensive systems, which store and handle massive amounts of data, by distributing data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell to a disk number, on the assumption that the data space is partitioned into a regular grid. Although grid-like partitioning achieves good performance for low-dimensional data, its performance degrades as the dimension of the data grows, even with a good disk allocation scheme. This is because such schemes partition the entire data space equally, regardless of the distribution of data objects. Most of the data in a high-dimensional space lie near the surface of the space. For that reason, we propose a new declustering algorithm based on a partitioning scheme that partitions the data space from the surface. Several experimental results show that, with this unbalanced partitioning scheme, we can remarkably reduce the number of data blocks touched by a query as the dimension of the data and the query size grow. In this paper, we propose disk allocation schemes based on the layout of the data blocks that result from partitioning. To show the performance of the proposed algorithm, we performed several experiments with data of different dimensions and for a wide range of numbers of disks. Our proposed disk allocation method performs within 10 additive disk accesses of a strictly optimal allocation scheme. We compared our algorithm with the Kronecker sequence based declustering algorithm, which is reported to be the best among the declustering algorithms based on grid partitioning and mapping functions. Declustering performance improves by up to 14 times as the dimension of the data grows.
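The claim that most data in a high-dimensional space lie near the surface can be illustrated with a short calculation. The sketch below is not from the paper: it assumes points uniformly distributed in a unit hypercube, and the function name `shell_fraction` and the margin `eps` are hypothetical. Under that assumption, the fraction of volume within `eps` of the boundary is 1 - (1 - 2*eps)^d.

```python
def shell_fraction(dim: int, eps: float) -> float:
    """Fraction of the unit hypercube's volume within eps of its boundary.

    Assumption (not from the abstract): data are uniformly distributed,
    so volume fraction equals the expected fraction of data points.
    """
    if dim < 1:
        raise ValueError("dim must be at least 1")
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 0.5]")
    # The interior cube has side (1 - 2*eps); everything else is the shell.
    return 1.0 - (1.0 - 2.0 * eps) ** dim


if __name__ == "__main__":
    for d in (2, 10, 50):
        print(d, shell_fraction(d, 0.05))
```

With a margin of 0.05 the shell holds under a fifth of the volume in 2 dimensions but nearly all of it in 50, which suggests why partitioning from the surface may suit high-dimensional data better than an equal-width grid.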

Using Text-mining Method to Identify Research Trends of Freshwater Exotic Species in Korea (텍스트마이닝 (text-mining) 기법을 이용한 국내 담수외래종 연구동향 파악)

  • Do, Yuno;Ko, Eui-Jeong;Kim, Young-Min;Kim, Hyo-Gyeom;Joo, Gea-Jae;Kim, Ji Yoon;Kim, Hyun-Woo
    • Korean Journal of Ecology and Environment / v.48 no.3 / pp.195-202 / 2015
  • We identified research trends for freshwater exotic species in South Korea using text mining methods in conjunction with bibliometric analysis. We used the scientific and common names of freshwater exotic species as search keywords, covering 1 mammal species, 3 amphibian-reptile species, 11 fish species, and 2 aquatic plant species. A total of 245 articles, including research articles and abstracts of conference proceedings published by 56 academic societies and institutes, were collected from scientific article databases. The search keywords used were the common names for the exotic species. The number of articles increased during the 20th century (1900s); however, during the early 21st century (2000s) the number of published articles decreased slowly. The number of articles focusing on physiological and embryological research was significantly greater than the number of taxonomic and ecological studies. Rainbow trout and Nile tilapia were the main research topics, specifically in physiological and embryological research associated with the aquaculture of these species. Ecological studies were conducted only on the distribution and effect of largemouth bass and nutria. The ecological risk associated with freshwater exotic species has been raised, yet the scientific information may be insufficient to resolve the doubts about ecological issues expressed by interested individuals and policy makers, because of the bias in research topics on freshwater exotic species. Research topics on freshwater exotic species need to diversify if these species are to be managed effectively.

Effect of Forest Fire on the Microbial Community Activity of Forest Soil according to the Difference between Geology and Soil Depth (산불이 지질과 토심의 차이에 따른 산림토양 미생물 군집 활성도에 미치는 영향에 대한 연구)

  • Ji Seul Kim;Jun Ho Kim;Hyeong Chul Jeong;Eun Young Lee
    • The Journal of Engineering Geology / v.33 no.1 / pp.15-25 / 2023
  • The effects of forest fires on the activity of microbial communities in topsoil and subsoil were investigated. Samples were collected from Korean forest soils comprising mainly igneous and sedimentary rocks. Analysis of beta-glucosidase found higher microbial activity in sedimentary rocks than in igneous rocks. Enzyme activity was not observed immediately after fire, but was restored over time. The enzyme activity of subsoil was inhibited by 33~46% compared with that in the topsoil, regardless of soil damage. The effect of fire on the availability of microbial substrate was investigated using EcoPlate. The percentages of average well color development values of damaged and normal topsoil were 52.7~56.8% and 62.3~83.6%, respectively. Forest fires appear to affect the diversity and substrate availability of the subsoil microbial community by accelerating the decomposition of soil organic matter. The Shannon index, representing microbial biodiversity, was high in the topsoil of all samples; it was higher for soil microorganisms in sedimentary rocks than in igneous rocks, and higher in topsoil than in subsoil.
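For readers unfamiliar with the Shannon index mentioned above, here is a minimal sketch of the standard formula H = -Σ p_i ln p_i. The abstract does not give the raw data or say how the proportions were derived, so the input values below are hypothetical placeholders rather than measurements from the study.

```python
import math


def shannon_index(abundances):
    """Shannon diversity index H = -sum(p_i * ln p_i) over positive entries."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for value in abundances:
        if value > 0:  # zero entries contribute nothing (p * ln p -> 0)
            p = value / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Hypothetical substrate-use values, not data from the paper.
    even_use = [5.0, 5.0, 5.0, 5.0]
    skewed_use = [17.0, 1.0, 1.0, 1.0]
    print(shannon_index(even_use), shannon_index(skewed_use))
```

An evenly used set of substrates gives the maximum value ln(n), while a community dominated by one substrate scores lower, which is the sense in which a higher index indicates greater diversity.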

A Study on Robust Optimal Sensor Placement for Real-time Monitoring of Containment Buildings in Nuclear Power Plants (원전 격납 건물의 실시간 모니터링을 위한 강건한 최적 센서배치 연구)

  • Chanwoo Lee;Youjin Kim;Hyung-jo Jung
    • Journal of the Computational Structural Engineering Institute of Korea / v.36 no.3 / pp.155-163 / 2023
  • Real-time monitoring technology is critical for ensuring the safety and reliability of nuclear power plant structures. However, the current seismic monitoring system has limited system identification capabilities such as modal parameter estimation. To obtain global behavior data and dynamic characteristics, multiple sensors must be optimally placed. Although several studies on optimal sensor placement have been conducted, they have primarily focused on civil and mechanical structures. Nuclear power plant structures require robust signals, even at low signal-to-noise ratios, and the robustness of each mode must be assessed separately. This is because the mode contributions of nuclear power plant containment buildings are concentrated in low-order modes. Therefore, this study proposes an optimal sensor placement methodology that can evaluate robustness against noise and the effects of each mode. Indicators, such as auto modal assurance criterion (MAC), cross MAC, and mode shape distribution by node were analyzed, and the suitability of the methodology was verified through numerical analysis.
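The modal assurance criterion named above has a standard definition, MAC(φ_i, φ_j) = (φ_i·φ_j)² / ((φ_i·φ_i)(φ_j·φ_j)) for real-valued mode shapes. The sketch below only illustrates that formula and an auto-MAC matrix. It is not the paper's placement methodology, and the mode-shape vectors are made-up values sampled at hypothetical sensor locations.

```python
def dot(u, v):
    """Inner product of two equal-length real vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def mac(phi_i, phi_j):
    """Modal assurance criterion for real-valued mode shapes (0 to 1)."""
    denom = dot(phi_i, phi_i) * dot(phi_j, phi_j)
    if denom == 0:
        raise ValueError("mode shapes must be non-zero")
    return dot(phi_i, phi_j) ** 2 / denom


def auto_mac(modes):
    """Auto-MAC matrix: each mode compared with every mode in the same set."""
    return [[mac(a, b) for b in modes] for a in modes]


if __name__ == "__main__":
    # Hypothetical mode shapes at three candidate sensor locations.
    modes = [[0.2, 0.6, 1.0], [0.9, 0.4, -0.8]]
    for row in auto_mac(modes):
        print(row)
```

Small off-diagonal auto-MAC values indicate that the chosen sensor locations let the modes be told apart, which is one common reason the indicator is used when comparing candidate layouts.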

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence over customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more remarkable than that of any other type of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms express their opinions as text, images, and so on. Raw data sets from these reviews are unstructured. Moreover, these data sets are too big for humans to extract new information and hidden knowledge from them unaided. To use them for business intelligence and analytics applications, proper big data techniques such as Natural Language Processing and data mining are needed. This study suggests an analytical approach that directly yields insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining to extract the topics contained in the reviews and decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to a method for finding, from a collection of documents, a group of words that represents a document. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal algorithm. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique. Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. This study therefore presents an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we can find the strengths and weaknesses of the services of each hotel and suggest improvements to aid customer satisfaction. From positive reviews in particular, we find what these hotels should maintain in their service quality. For example, positive reviews of one hotel show that, compared with the other hotels, it has a good location and good room condition. In contrast, from negative reviews we find what the hotels should change in their services. For example, one hotel should improve the soundproofing of its rooms. These results mean that our approach is useful in finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach obtains honest feedback directly from customers' online reviews and draws insights through a type of big data analysis, so it will be a more useful tool for overcoming the limitations of surveys or interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will improve if other structured and unstructured data sources are added.
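To make the link between topics and ratings concrete, the following is a minimal sketch of the Gini-impurity split that CART-style classification trees use at each node. It is not the authors' pipeline: the per-review topic weights (as an LDA model might output) and the rating labels are hypothetical, and only a single split is searched rather than a full tree.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = positive, 0 = negative)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (weighted_gini, topic_index, threshold) of the best single split.

    rows: one list of topic weights per review (hypothetical LDA output).
    The rule is: topic weight <= threshold goes left, otherwise right.
    Ties keep the first candidate found.
    """
    n = len(rows)
    best = None
    for topic in range(len(rows[0])):
        for threshold in sorted({row[topic] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[topic] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[topic] > threshold]
            if not left or not right:
                continue  # a split must send reviews to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, topic, threshold)
    return best


if __name__ == "__main__":
    # Hypothetical topics: index 0 = "location", index 1 = "noise".
    rows = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.9]]
    labels = [1, 1, 0, 0]
    print(best_split(rows, labels))
```

A real analysis would typically fit LDA and a full decision tree with an established library; the point here is only that each tree node picks the topic and threshold that best separate positive from negative reviews.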

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, the data related to product information are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data for the extracted products are summed to estimate the market size of each product group. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to KSIC indexes were extracted based on cosine similarity. The market size of the extracted products, taken as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be adjusted easily and efficiently according to the purpose of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by giving a proper order to the preprocessed dataset or by combining another measure such as Jaccard similarity with Word2Vec. Also, the product group clustering method could be changed to another type of unsupervised machine learning algorithm. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
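The last two steps of the process described above (group product names by cosine similarity to an index word, then sum their sales) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two-dimensional vectors stand in for trained Word2Vec embeddings, and the product names, sales figures, threshold, and the function `estimate_market_size` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def estimate_market_size(index_vector, products, threshold):
    """Group products similar to an index word and sum their sales.

    products: list of (name, embedding, sales) tuples.
    Returns (names in the group, total sales of the group).
    """
    group = [
        (name, sales)
        for name, vector, sales in products
        if cosine(index_vector, vector) >= threshold
    ]
    return [name for name, _ in group], sum(sales for _, sales in group)


if __name__ == "__main__":
    # Toy embeddings and sales; real vectors would come from Word2Vec training.
    index_vector = [1.0, 0.0]
    products = [
        ("laptop", [0.9, 0.1], 100),
        ("notebook pc", [0.8, 0.3], 50),
        ("rice", [0.0, 1.0], 70),
    ]
    print(estimate_market_size(index_vector, products, threshold=0.9))
```

Raising or lowering `threshold` narrows or widens the product group, which mirrors the abstract's point that the market category level can be adjusted through the cosine similarity threshold.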

Improving University Homepage FAQ Using Semantic Network Analysis (의미 연결망 분석을 활용한 대학 홈페이지 FAQ 개선방안)

  • Ahn, Su-Hyun;Lee, Sang-Jun
    • Journal of Digital Convergence / v.16 no.9 / pp.11-20 / 2018
  • The Q&A board is widely used as a means of communicating service enquiries, and the need for efficient management of the enquiry system has risen because certain questions are registered repeatedly and frequently. This study aims to construct a student-centered FAQ based on the unstructured data posted on the university homepage's Q&A board. We extracted major keywords from 690 postings registered over the most recent 3 years, then conducted semantic network analysis to find the relationships between the keywords, and centrality analysis with network visualization. The most central keywords found through the analysis, in order of centrality, were application, curriculum, credit point, completion, graduation, approval, period, major, portal, and department. The major keywords were also classified into 8 groups: course, register, student life, scholarship, library, dormitory, IT, and commute. If the most frequent questions are organized into these areas to form the FAQ, based on the results above, it is expected to contribute to user convenience and administrative efficiency by simplifying the service enquiry process for repeated questions, as well as enabling smooth two-way communication among the members of the university.
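A keyword network and a centrality ranking of the kind described above can be sketched in a few lines. The abstract does not say which centrality measure was used, so degree centrality is an assumption here, and the postings are invented examples rather than the study's data.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(postings):
    """Link every pair of keywords that appear in the same posting.

    A keyword that only ever appears alone gets no node in this sketch.
    """
    neighbors = defaultdict(set)
    for keywords in postings:
        for a, b in combinations(sorted(set(keywords)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def degree_centrality(neighbors):
    """Share of the other nodes each keyword is directly linked to."""
    n = len(neighbors)
    if n < 2:
        return {keyword: 0.0 for keyword in neighbors}
    return {keyword: len(linked) / (n - 1) for keyword, linked in neighbors.items()}


if __name__ == "__main__":
    # Hypothetical keyword lists extracted from three Q&A postings.
    postings = [
        ["application", "credit"],
        ["application", "graduation"],
        ["application", "credit", "period"],
    ]
    scores = degree_centrality(cooccurrence_network(postings))
    for keyword in sorted(scores, key=scores.get, reverse=True):
        print(keyword, scores[keyword])
```

Keywords with the highest scores co-occur with the widest range of other keywords, which makes them natural candidates for FAQ headings.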

A Study on the e-Document Development of Parcel Service for Reliable Delivery (택배 물류 안전 배송을 위한 전자문서 개발 연구)

  • Ahn, Kyeong Rim;Park, Chan Kwon
    • The Journal of Society for e-Business Studies / v.21 no.2 / pp.47-59 / 2016
  • Parcel service delivers goods from one place to a designated destination at the user's request. Parcel operations such as sorting and distributing, and the information managed, differ from company to company. Additionally, interoperability between companies cannot be supported with the unformatted data of manual processing. Most parcel boxes are delivered to the consignee with a paper waybill attached. As a result, security problems such as leaks of personal information occur, and extra processing time and logistics costs are incurred because of wrong or damaged information. The business environment of parcel service is changing rapidly with the introduction of unmanned delivery and advanced technologies such as the Internet of Things. Users want to know the accurate status of each step from parcel service request to delivery. To meet these requirements, unified and integrated waybill information for reliable parcel transportation is needed. This information will be provided to the pickup or delivery carrier, the warehouse or terminal, and the parcel service user at each of the pickup, transport, and delivery stages of the parcel delivery service. Therefore, this paper defines a simplified and unified information model for the parcel service waybill by analyzing the information systems used for the logistics unit processes that occur in parcel service, together with the manual work processes, and by developing the relevant information for the work flows that occur between business processes or transactions, using the information collected or processed at each stage of the parcel service. This standard model can be shared between business entities, and replacing the paper waybill will improve public safety by preventing the security threats that paper waybills pose. As a result, it will promote the public interest from the stakeholders' perspective.

A Study on Questionnaire Improvement using Text Mining (텍스트 마이닝 기법을 활용한 설문 문항 개선에 관한 연구)

  • Paek, Yun-Ji;Jung, Chang-Hyun
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.121-128 / 2020
  • The Marine Safety Culture Index (MSCI) was developed in 2018 to objectively assess the public's safety culture level and to serve as data for spreading knowledge of marine safety culture. The method for calculating the safety culture index should cover issues that may affect the safety culture and should consist of appropriate attributes for estimating its current status. In addition, continuous verification and supplementation are required to address social and economic changes. In this study, to determine whether the questionnaire designed by marine experts reflects the people's interests and needs, we analyzed 915 marine safety proposals. Text mining was employed to analyze the unstructured data of the marine safety proposals, and network analysis and topic modeling were subsequently performed. Analysis of the marine safety proposals centered on attributes such as education, public relations, safety rules, awareness, skilled workers, and systems. Eighteen questions were modified and supplemented to reflect the marine safety proposals, and the reliability of the revised questions was analyzed. Compared to the previous year, the questionnaire's internal consistency improved and reached a high value of 0.895. The improved questionnaire, which reflects the requirements of both marine experts and the people, and the marine safety culture index derived from it are expected to contribute to the establishment of policies for spreading knowledge of marine safety culture.
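The abstract reports an internal consistency of 0.895 without naming the statistic. Cronbach's alpha is a common choice for questionnaires, so the sketch below assumes it; that assumption, the function name, and the response scores are not from the paper. The formula is α = k/(k-1) · (1 - Σ item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a questionnaire.

    items: one list of scores per question, each with one score per
    respondent, in the same respondent order. Uses sample variances.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two questions")
    # Total score of each respondent across all questions.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical 5-point responses: 3 questions, 5 respondents.
    items = [
        [4, 5, 3, 2, 4],
        [4, 4, 3, 2, 5],
        [5, 5, 2, 3, 4],
    ]
    print(cronbach_alpha(items))
```

Values closer to 1 mean the questions move together across respondents, which is the usual reading of a high internal consistency figure.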

Preliminary Scheduling Based on Historical and Experience Data for Airport Project (초기 기획단계의 실적 및 경험자료 기반 공항사업 기준공기 산정체계)

  • Kang, Seunghee;Jung, Youngsoo;Kim, Sungrae;Lee, Ikhaeng;Lee, Changweon;Jeong, Jinhak
    • Korean Journal of Construction Engineering and Management / v.18 no.6 / pp.26-37 / 2017
  • Preliminary scheduling at the initial stage of the planning phase is usually performed with limited information and detail. Therefore, the reliability and accuracy of preliminary scheduling depend on the personal experience and skills of the schedule planners, and the task requires enormous managerial effort (or workload). Reusing historical data from similar projects is important for efficient preliminary scheduling. However, understanding the structure of historical data and applying them to a new project requires a great deal of experience and knowledge. In this context, this paper proposes a framework and methodology for automated preliminary schedule generation based on a historical database. The proposed methodology and framework make it possible to automatically generate CPM schedules for airport projects in the early planning stage, enhancing reliability and reducing workload by using structured knowledge and experience.
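As background on the CPM schedules mentioned above, here is a minimal sketch of the critical path method's forward pass: each activity's earliest finish is its duration plus the latest earliest finish among its predecessors. The activity names and durations are hypothetical, and the sketch does not represent the paper's framework for deriving them from historical data.

```python
def cpm(tasks):
    """Return (project_duration, critical_path) for an activity network.

    tasks: {name: (duration, [predecessor names])}. Must be acyclic.
    """
    if not tasks:
        return 0, []
    earliest_finish = {}

    def finish(name, visiting=()):
        if name in earliest_finish:
            return earliest_finish[name]
        if name in visiting:
            raise ValueError("cycle detected at " + name)
        duration, predecessors = tasks[name]
        # An activity can start once all of its predecessors have finished.
        start = max((finish(p, visiting + (name,)) for p in predecessors), default=0)
        earliest_finish[name] = start + duration
        return earliest_finish[name]

    for name in tasks:
        finish(name)

    # Walk back from the last-finishing activity along the latest predecessors.
    current = max(tasks, key=lambda n: earliest_finish[n])
    path = [current]
    while tasks[current][1]:
        current = max(tasks[current][1], key=lambda n: earliest_finish[n])
        path.append(current)
    path.reverse()
    return earliest_finish[path[-1]], path


if __name__ == "__main__":
    # Hypothetical airport activities with durations in months.
    tasks = {
        "site_prep": (3, []),
        "runway": (10, ["site_prep"]),
        "terminal": (8, ["site_prep"]),
        "commissioning": (2, ["runway", "terminal"]),
    }
    print(cpm(tasks))
```

In this toy network the runway branch determines the project duration, so shortening the terminal work alone would not bring the finish date forward.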