Search | Korea Science

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

Lee, Jong Hwa; Lee, Hyun-Kyu
- Journal of Korea Society of Industrial Information Systems, v.20 no.2, pp.113-124, 2015
Unlike structured data, which are gathered and stored in a predefined structure, unstructured text data are mostly written in natural language and have found wider application with the emergence of Web 2.0. Text mining, which extracts meaningful information from text, has become one of the most important big data analysis techniques, not only because the amount of text data has grown but also because text expresses human emotion directly. In this study, we used R, an open-source software environment for statistical analysis, and studied the implementation of algorithms for analyses such as frequency analysis, cluster analysis, word clouds, and social network analysis. In particular, to keep the analysis within our research scope, we used a keyword extraction method based on a data dictionary. Applying the approach to real cases, we found that R is very useful as statistical analysis software that runs on a variety of operating systems and interfaces with other languages.
https://doi.org/10.9723/jksiis.2015.20.2.113
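
The dictionary-based keyword extraction and frequency analysis described in this abstract can be illustrated with a minimal sketch. The paper itself works in R; Python is used here only for illustration, and the dictionary contents, the sample documents, and the function name `keyword_frequencies` are all hypothetical rather than taken from the paper.

```python
import re
from collections import Counter

# Hypothetical data dictionary: only these terms are counted.
DATA_DICTIONARY = {"mining", "text", "cluster", "network"}

def keyword_frequencies(documents, dictionary):
    """Count how often each dictionary term occurs across all documents."""
    counts = Counter()
    for doc in documents:
        # Lowercase and keep runs of letters; a simple stand-in for real tokenization.
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t in dictionary)
    return counts

# Made-up sample documents for illustration only.
docs = [
    "Text mining extracts information from text.",
    "Cluster analysis and network analysis are common mining tasks.",
]
freq = keyword_frequencies(docs, DATA_DICTIONARY)
print(freq.most_common())
```

Restricting the count to dictionary terms keeps out-of-scope words such as "analysis" out of the result, which is the role the abstract assigns to the data dictionary.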

Constructing Gene Regulatory Networks using Frequent Gene Expression Pattern and Chain Rules (빈발 유전자 발현 패턴과 연쇄 규칙을 이용한 유전자 조절 네트워크 구축)

Lee, Heon-Gyu; Ryu, Keun-Ho; Joung, Doo-Young
- The KIPS Transactions:PartD, v.14D no.1 s.111, pp.9-20, 2007
Groups of genes control the functioning of a cell through complex interactions. Such interactions among gene groups are called gene regulatory networks (GRNs). Two earlier data mining approaches, clustering and classification, have been used to analyze gene expression data. Although these tools are useful for determining the membership of genes by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand how genes relate to and regulate one another. To detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. We propose a method for transforming gene expression data into a form suitable for frequent pattern mining, and detect gene expression patterns by applying the FP-growth algorithm. Next, we construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validate the proposed method through experimental results, which are consistent with published results.
https://doi.org/10.3745/KIPSTD.2007.14-D.1.009

Adaptive Data Mining Model using Fuzzy Performance Measures (퍼지 성능 측정자를 이용한 적응 데이터 마이닝 모델)

Rhee, Hyun-Sook
- The KIPS Transactions:PartB, v.13B no.5 s.108, pp.541-546, 2006
Data mining is the process of finding hidden patterns in a large data set. Cluster analysis is a popular data mining technique. It is a fundamental process of data analysis and has played an important role in solving many problems in pattern recognition and image processing. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental decision on the number of clusters in the data. This is related to the cluster validity problem, that is, how well the clustering has identified the structure present in the data. In this paper, we design an adaptive data mining model using fuzzy performance measures. It discovers clusters through an unsupervised neural network model based on a fuzzy objective function and evaluates the clustering results with a fuzzy performance measure. We also present experimental results on newsgroup data, which show that the proposed model can be used as a document classifier.
https://doi.org/10.3745/KIPSTB.2006.13B.5.541

Evaluation of Water Quality Prediction Models at Intake Station by Data Mining Techniques (데이터마이닝 기법을 적용한 취수원 수질예측모형 평가)

Kim, Ju-Hwan; Chae, Soo-Kwon; Kim, Byung-Sik
- Journal of Environmental Impact Assessment, v.20 no.5, pp.705-716, 2011
For the efficient discovery of knowledge and information from observed systems, data mining techniques can be a useful tool for predicting water quality at river intake stations. Water quality at an intake station can deteriorate in the dry season because of insufficient flow. This calls for additional outflow from the dam, since some of the deterioration can be attenuated by operating the dam reservoir to control outflow in light of the predicted water quality. A seasonal occurrence of high ammonia nitrogen ($NH_3$-N) concentrations has hampered the chemical treatment processes of a water plant on the Geum River. Monthly flow allocation from the upstream dam is important for downstream $NH_3$-N control. In this study, water quality prediction models based on multiple regression (MR), artificial neural networks, and data mining methods were developed to understand water quality variation and to support dam operations by providing predicted $NH_3$-N concentrations at the intake station. The models were calibrated with eight years of monthly data and verified with another two years of independent data. In these models, the $NH_3$-N concentration at the next time step depends on dam outflow and on river water quality variables such as alkalinity, temperature, and $NH_3$-N at the previous time step. Model performance is compared and evaluated through error analysis and statistical measures such as the correlation and determination coefficients between observed and predicted water quality. These data mining techniques are expected to provide more efficient data-driven tools at the modelling stage, and the models are found to predict water quality in river systems well.
https://doi.org/10.14249/eia.2011.20.5.705

A Medium Access Control Mechanism for Distributed In-band Full-Duplex Wireless Networks

Zuo, Haiwei; Sun, Yanjing; Li, Song; Ni, Qiang; Wang, Xiaolin; Zhang, Xiaoguang
- KSII Transactions on Internet and Information Systems (TIIS), v.11 no.11, pp.5338-5359, 2017
In-band full-duplex (IBFD) wireless communication supports symmetric dual transmission between two nodes and asymmetric dual transmission among three nodes, which allows improved throughput in distributed IBFD wireless networks. However, inter-node interference (INI) can affect reception of the desired packet on the downlink of a three-node topology. The current half-duplex (HD) medium access control (MAC) mechanism, RTS/CTS, cannot establish an asymmetric dual link and consequently cannot suppress INI. In this paper, we propose a medium access control mechanism for distributed IBFD wireless networks, FD-DMAC (Full-Duplex Distributed MAC). In this approach, communication nodes need only a single channel access to establish a symmetric or asymmetric dual link, and both transmission modes of the asymmetric dual link are fully considered. Through FD-DMAC medium access, the neighbors of communicating nodes can clearly learn the network transmission status, which provides further opportunities for asymmetric IBFD dual communication and solves the hidden node problem. Additionally, we use FD-DMAC to transmit received power information, which helps communication nodes adjust their transmit powers and suppress INI. Finally, we give a theoretical analysis of network performance using a discrete-time Markov model. The numerical results show that FD-DMAC achieves a significant improvement over RTS/CTS in terms of throughput and delay.
https://doi.org/10.3837/tiis.2017.11.009

Analysis of the mechanical properties and failure modes of rock masses with nonpersistent joint networks

Wu, Yongning; Zhao, Yang; Tang, Peng; Wang, Wenhai; Jiang, Lishuai
- Geomechanics and Engineering, v.30 no.3, pp.281-291, 2022
Complex rock masses include various joint planes, bedding planes and other weak structural planes. These structural planes affect the mechanical properties, deformation rules and failure modes of jointed rock masses. To study the influence of the parameters of a nonpersistent joint network on the mechanical properties and failure modes of jointed rock masses, synthetic rock mass (SRM) technology based on discrete elements is introduced. The results show that as the size of the joints in the rock mass increases, the compressive strength and the discreteness of the rock mass first increase and then decrease. Among them, joints characterized as "small but many" and "large and clustered" have the most significant impact on the strength of the rock mass. As joint density increases, the compressive strength of the rock mass decreases monotonically, but the rate of decrease gradually slows. As the joint dip angle increases, the strength of the rock mass first decreases and then increases, forming a U-shaped trend. In the analysis of the failure mode and deformation of a jointed rock mass, the type of plastic zone formed after failure is closely related to the macroscopic displacement of the rock mass and to the joint parameters, which shows that the location and density of the joints greatly affect the failure mode and degree of displacement of the jointed rock mass. The instability mechanism of jointed surrounding rock is revealed.
https://doi.org/10.12989/gae.2022.30.3.281

Applying a Novel Neuroscience Mining (NSM) Method to fNIRS Dataset for Predicting the Business Problem Solving Creativity: Emphasis on Combining CNN, BiLSTM, and Attention Network

Kim, Kyu Sung; Kim, Min Gyeong; Lee, Kun Chang
- Journal of the Korea Society of Computer and Information, v.27 no.8, pp.1-7, 2022
With the development of artificial intelligence, efforts to incorporate neuroscience mining with AI have increased. Neuroscience mining (NSM) expands on this concept by combining computational neuroscience and business analytics. Using an experimental dataset based on fNIRS (functional near-infrared spectroscopy), we investigated the potential of NSM for predicting BPSC (business problem-solving creativity). Although BPSC is regarded as an essential business differentiator and a cognitive resource that is difficult to imitate, measuring it is a challenging task. In the context of NSM, appropriate methods for assessing and predicting BPSC are still in their infancy. We therefore propose a novel NSM method that systematically combines a CNN, a BiLSTM, and an attention network to significantly improve BPSC prediction performance. We used a dataset containing over 150 thousand fNIRS-measured data points to evaluate the validity of the proposed NSM method. Empirical evidence shows that the proposed NSM method delivers the most robust performance among the benchmark methods compared.
https://doi.org/10.9708/jksci.2022.27.08.001

A Trend Analysis and Policy proposal for the Work Permit System through Text Mining: Focusing on Text Mining and Social Network analysis (텍스트마이닝을 통한 고용허가제 트렌드 분석과 정책 제안 : 텍스트마이닝과 소셜네트워크 분석을 중심으로)

Ha, Jae-Been; Lee, Do-Eun
- Journal of Convergence for Information Technology, v.11 no.9, pp.17-27, 2021
The aim of this research was to identify issues with the work permit system and public awareness of it, and to suggest ideas for government policy. To this end, the research applied text mining to social data. Using Textom, it collected 1,453,272 texts from 6,217 online documents containing 'work permit system' from January to December 2020, and performed text mining and social network analysis. From top keyword frequency analysis and degree centrality analysis, it extracted the 100 most frequently mentioned keywords and identified job problems, the importance of the policy process, industrial competitiveness, and improvement of foreign workers' living conditions as the major keywords. In addition, through semantic network analysis, it identified a central awareness of 'employment policy' and various surrounding kinds of awareness such as 'international cooperation', 'workers' human rights', 'law', 'recruitment of foreigners', 'corporate competitiveness', 'immigrant culture' and 'foreign workforce management'. Finally, it suggested ideas worth considering when establishing government policy on the work permit system and conducting related research.
https://doi.org/10.22156/CS4SMB.2021.11.09.017
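
Degree centrality analysis of a keyword network, as mentioned in this abstract, can be sketched roughly as follows. This is an illustrative assumption and not the Textom procedure the authors used: keywords that appear in the same document are linked, and each keyword's centrality is its number of distinct neighbors divided by (n - 1), the usual normalization. The sample keyword sets and the function name `degree_centrality` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def degree_centrality(keyword_sets):
    """Normalized degree centrality on a keyword co-occurrence graph."""
    neighbors = defaultdict(set)
    for keywords in keyword_sets:
        unique = sorted(set(keywords))
        for k in unique:
            neighbors[k]  # register the node even if it has no co-occurring keyword
        # Link every pair of keywords that share a document.
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}

# Made-up keyword sets, one per document, for illustration only.
doc_keywords = [
    {"policy", "employment", "worker"},
    {"policy", "law"},
    {"worker", "rights"},
]
centrality = degree_centrality(doc_keywords)
for word, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, round(score, 2))
```

Keywords that co-occur with many different terms, here "policy" and "worker", rank highest, which is the sense in which such an analysis surfaces major keywords.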

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

Eo, Kyun Sun; Lee, Kun Chang
- Journal of Digital Convergence, v.17 no.2, pp.163-170, 2019
Over the past decade, the development of the Web has explosively increased the amount of data. Feature selection is an important step in extracting valuable information from a large amount of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word2vec (word embedding to vector) and BOW (bag-of-words) features. The FS methods adopted are CFS (correlation-based FS) and IG (information gain). To select an optimal FS method, a number of classifiers were used: LR (logistic regression), NN (neural network), NBN (naive Bayesian network), RF (random forest), RS (random subspace), and ST (stacking). Empirical results on the electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best opinion mining performance. Results on the laptop and restaurant datasets showed that the RF classifier using IG applied to Word2vec features performs best.
https://doi.org/10.14400/JDC.2019.17.2.163
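
As a rough illustration of the information gain (IG) criterion named in this abstract, the sketch below scores binary bag-of-words features against sentiment labels using the textbook definition IG = H(labels) - H(labels | feature). The toy reviews, the feature names, and the helper functions are assumptions made for illustration; the paper's actual tooling and datasets are not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy data: four reviews, with 1 meaning the word occurs in the review.
labels = ["pos", "pos", "neg", "neg"]
has_great = [1, 1, 0, 0]  # perfectly separates the two classes
has_the = [1, 0, 1, 0]    # carries no information about the class

ig_great = information_gain(has_great, labels)
ig_the = information_gain(has_the, labels)
print(ig_great, ig_the)
```

Ranking features by this score and keeping the top ones is one common way to apply IG before training a classifier.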

Reorganizing Social Issues from R&D Perspective Using Social Network Analysis

Shun Wong, William Xiu; Kim, Namgyu
- Journal of Information Technology Applications and Management, v.22 no.3, pp.83-103, 2015
The rapid development of internet technologies and social media over the last few years has generated a huge amount of unstructured text data, which contains a great deal of valuable information and issues. As a result, text mining, the extraction of meaningful information from unstructured text data, has gained attention from researchers in various fields. Topic analysis is a text mining application used to determine the main issues in a large volume of text documents. However, it is difficult to identify related issues or meaningful insights because the number of issues derived through topic analysis is too large. Furthermore, traditional issue-clustering methods can only be performed based on the co-occurrence frequency of issue keywords across many documents. An association between issues with a low co-occurrence frequency therefore cannot be recognized by traditional issue-clustering methods, even if those issues are strongly related from other perspectives. In this research, we propose a methodology for reorganizing social issues from a research and development (R&D) perspective using social network analysis. Using an R&D perspective lexicon, issues that consistently share the same R&D keywords can be further identified through social network analysis. In this study, the R&D keywords associated with a particular issue imply the key technology elements needed to solve that issue. Issue clustering can then be performed based on the analysis results. Furthermore, the relationships between issues that share the same R&D keywords can be reorganized more systematically by grouping them into clusters according to the R&D perspective lexicon. We expect that our methodology will contribute to establishing efficient R&D investment policies at the national level by enhancing the reusability of R&D knowledge, based on issue clustering using the R&D perspective lexicon. In addition, companies could use the results to align their R&D with their business strategy plans, helping them develop innovative products and new technologies that sustain innovative business models.
https://doi.org/10.21219/jitam.2015.22.3.083
