• Title/Summary/Keyword: Document quality prediction model

Search results: 6

Text-Confidence Feature Based Quality Evaluation Model for Knowledge Q&A Documents (텍스트 신뢰도 자질 기반 지식 질의응답 문서 품질 평가 모델)

  • Lee, Jung-Tae; Song, Young-In; Park, So-Young; Rim, Hae-Chang
    • Journal of KIISE: Software and Applications / v.35 no.10 / pp.608-615 / 2008
  • In Knowledge Q&A services, where information is created by unspecified users, document quality is an important factor in user satisfaction with search results. Previous work on quality prediction of Knowledge Q&A documents evaluates document quality using non-textual information, such as click counts and recommendation counts, and focuses on enhancing retrieval performance by incorporating the quality measure into the retrieval model. Although the non-textual information used in previous work was proven useful by experiments, a data sparseness problem may occur when predicting the quality of newly created documents with such information. To solve the data sparseness problem of non-textual features, this paper proposes new features for document quality prediction, namely text-confidence features, which indicate how trustworthy the content of a document is. The proposed features, extracted directly from the document content, are robust against the data sparseness problem, compared to non-textual features, which can only be collected indirectly through the participation of service users. Experiments conducted on real-world Knowledge Q&A documents suggest that text-confidence features perform comparably to non-textual features. We believe the proposed features can serve as effective features for document quality prediction and improve the performance of Knowledge Q&A services in the future.
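
The abstract does not list the paper's actual feature set, but an extractor of content-based confidence features could be sketched roughly as follows; every feature below is a hypothetical illustration, not the authors' list:

```python
import re

def text_confidence_features(text: str) -> dict:
    """Extract simple content-based confidence features from a document.

    All features here are invented examples of what "text-confidence"
    signals might look like; they are not the paper's feature set.
    """
    tokens = text.split()
    n_tokens = max(len(tokens), 1)
    n_chars = max(len(text), 1)
    return {
        "length": len(tokens),                                    # longer answers may be more informative
        "avg_token_len": sum(len(t) for t in tokens) / n_tokens,  # very short tokens may signal noise
        "punct_ratio": len(re.findall(r"[!?]", text)) / n_chars,  # heavy exclamation may signal low quality
        "url_count": len(re.findall(r"https?://\S+", text)),      # references can indicate trustworthiness
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,  # concrete numbers vs. vague prose
    }
```

Such a feature vector can be fed to any standard classifier, which is what makes these features immune to the cold-start sparseness of click or recommendation counts.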

Quality Analysis of the Request for Proposals of Public Information Systems Project : System Operational Concept (공공정보화사업 제안요청서 품질분석 : 시스템 운영 개념을 중심으로)

  • Park, Sanghwi; Kim, Byungcho
    • Journal of Information Technology Services / v.18 no.2 / pp.37-54 / 2019
  • The purpose of this study is to present an evaluation model that measures the clarification level of stakeholder requirements in public-sector software projects in the Republic of Korea. We tried to assess the quality of requests for proposal (RFPs) through this evaluation model, and we also examined the impact of the level of stakeholder requirements on the level of system requirements. To do this, we analyzed existing research models and standards related to business requirements and stakeholder requirements, and constructed an evaluation model for the system operational concept document defined in ISO/IEC/IEEE 29148. The system operational concept document organizes the requirements of stakeholders in the organization and shares the organization's intent. The evaluation model proposed in this study focuses on whether the contents related to the system operational concept are faithfully written in the request for proposal. The evaluation items consist of three categories: 'organization status', 'desired changes', and 'operational constraints'. The sample comprises 217 RFPs extracted from the national procurement system. The analysis showed that the evaluation model is valid and internally consistent. The level of the system operational concept was very low, and it was also found to affect the quality of system requirements. Clearly writing stakeholders' requirements is more important than writing the functional requirements.
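
As a rough sketch, the three evaluation items could be aggregated into a single clarity level like this; the equal weighting and the 1-5 rating scale are assumptions for illustration, not the paper's scoring scheme:

```python
def rfp_clarity_level(scores: dict) -> float:
    """Aggregate the three evaluation items into one clarity level.

    `scores` maps each item name to a rating on an assumed 1-5 scale;
    equal weights are an illustrative simplification.
    """
    items = ("organization status", "desired changes", "operational constraints")
    missing = [item for item in items if item not in scores]
    if missing:
        raise ValueError(f"missing evaluation items: {missing}")
    return sum(scores[item] for item in items) / len(items)
```

Applied over a corpus of RFPs, the distribution of this aggregate would expose the low operational-concept levels the study reports.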

Using Ontologies for Semantic Text Mining (시맨틱 텍스트 마이닝을 위한 온톨로지 활용 방안)

  • Yu, Eun-Ji; Kim, Jung-Chul; Lee, Choon-Youl; Kim, Nam-Gyu
    • The Journal of Information Systems / v.21 no.3 / pp.137-161 / 2012
  • The increasing interest in big data analysis using various data mining techniques indicates that many commercial data mining tools now need to be equipped with fundamental text analysis modules. The most essential prerequisite for accurate analysis of text documents is an understanding of the exact semantics of each term in a document. The main difficulties in understanding the exact semantics of terms are attributable to homonym and synonym problems, which are traditional problems in the natural language processing field. Some major text mining tools provide a thesaurus to address these problems, but a thesaurus cannot resolve complex synonym problems; furthermore, a thesaurus is irrelevant to homonym problems and hence cannot solve them. In this paper, we propose a semantic text mining methodology that uses ontologies to improve the quality of text mining results by resolving the semantic ambiguity caused by homonym and synonym problems. We evaluate the practical applicability of the proposed methodology by performing a classification analysis to predict customer churn using real transactional data and Q&A articles from the "S" online shopping mall in Korea. The experiments revealed that the prediction model produced by our proposed semantic text mining method outperformed the model produced by traditional text mining in terms of prediction accuracy measures such as response, captured response, and lift.
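
The abstract does not show the disambiguation mechanism itself; a minimal context-overlap sketch over a toy ontology (all terms, concepts, and cue words below are invented for illustration) might look like:

```python
# Toy ontology: surface term -> candidate concepts -> context cue words.
# A real ontology would carry typed relations, not just cue-word sets.
ONTOLOGY = {
    "bank": {
        "FinancialInstitution": {"loan", "deposit", "account", "interest"},
        "RiverBank": {"river", "water", "shore", "fishing"},
    },
}

def disambiguate(term: str, context: list) -> str:
    """Resolve a homonym by picking the concept whose cue words
    overlap the surrounding context the most."""
    senses = ONTOLOGY.get(term)
    if not senses:
        return term  # unknown term: leave the surface form as-is
    ctx = set(context)
    return max(senses, key=lambda concept: len(senses[concept] & ctx))
```

Mapping every occurrence of a term to a concept in this way is what lets downstream mining count "bank (FinancialInstitution)" and "bank (RiverBank)" separately, which a flat thesaurus cannot do.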

A Study on the Development of GIS based Integrated Information System for Water Quality Management of Yeongsan River Estuary (영산강 하구역 수질환경 관리를 위한 GIS기반 통합정보시스템 개발에 관한 연구)

  • Lee, Sung Joo; Kim, Kye Hyun; Park, Young Gil; Lee, Geon Hwi; Yoo, Jea Hyun
    • Journal of Wetlands Research / v.16 no.1 / pp.73-83 / 2014
  • The government has recently carried out monitoring to better understand the current situation and to model future events pertaining to water quality in the estuarine area of the Yeongsan River. However, many users find the results difficult to understand and utilize because most monitoring and model data consist of figures and text. The aim of this study is to develop a GIS-based integrated information system that supports understanding of the current situation and prediction of future events concerning water quality in the estuarine area of the Yeongsan River. To achieve this, a monitoring DB was assembled, a model linkage was defined, a GUI was composed, and the system development environment and system composition were defined. The monitoring data consist of observations from 2010 ~ 2012 in the estuarine area of the Yeongsan River. The models used in the study are HSPF (Hydrological Simulation Program-Fortran) for simulation of the basin and EFDC (Environmental Fluid Dynamics Code) for simulation of the estuary and river. Ultimately, a GIS-based system was presented for utilizing and displaying monitoring and model data. The system supports quantitative prediction of the estuarine ecological environment and displays document-type model simulation results in a map-based environment to enhance the user's spatial understanding. In a future study, the system will be extended with a decision-making support system capable of handling estuary environment issues and supporting environmental assessment and the development of related policies.

Link Error Analysis and Modeling for Video Streaming Cross-Layer Design in Mobile Communication Networks

  • Karner, Wolfgang; Nemethova, Olivia; Svoboda, Philipp; Rupp, Markus
    • ETRI Journal / v.29 no.5 / pp.569-595 / 2007
  • Particularly in wireless communications, link errors severely affect the quality of the services due to the high error probability and the specific error characteristics (burst errors) in the radio access part of the network. In this work, we show that thorough analysis and appropriate modeling of radio-link error behavior are essential to evaluate and optimize higher layer protocols and services. They are also the basis for finding network-aware cross-layer processing algorithms which are capable of exploiting the specific properties of the link error statistics, such as predictability. This document presents the analysis of the radio link errors based on measurements in live Universal Mobile Telecommunication System (UMTS) radio access networks as well as new link error models originating from that analysis. It is shown that the knowledge of the specific link error characteristics leads to significant improvements in the quality of streamed video by applying the proposed novel network- and content-aware cross-layer scheduling algorithms. Although based on live UMTS network experience, many of the conclusions in this work are of general validity and are not limited to UMTS only.
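
The paper derives its own error models from live UMTS measurements; as a baseline illustration of the burst-error behavior it analyzes, the classic two-state Gilbert-Elliott model can be simulated (all parameter values below are arbitrary, not the paper's fitted values):

```python
import random

def gilbert_elliott(n_bits: int, p_g2b: float, p_b2g: float,
                    e_good: float, e_bad: float, seed: int = 0) -> list:
    """Simulate bit errors with a two-state Gilbert-Elliott chain.

    The channel alternates between a 'good' state with a low error
    probability and a 'bad' state with a high one; dwelling in the bad
    state produces the characteristic error bursts of radio links.
    """
    rng = random.Random(seed)
    bad = False
    errors = []
    for _ in range(n_bits):
        errors.append(rng.random() < (e_bad if bad else e_good))
        # State transition after each bit.
        if bad:
            bad = not (rng.random() < p_b2g)
        else:
            bad = rng.random() < p_g2b
    return errors
```

Because errors cluster while the chain sits in the bad state, a scheduler that can estimate the current state can defer loss-sensitive video frames, which is the kind of predictability a cross-layer design exploits.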

Sentiment Analysis of Movie Reviews Using an Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon; Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology for distinguishing poor from high-quality content through the text data of products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Real online reviews are openly available and directly affect a business: in marketing, real-world information from customers is gathered on websites rather than through surveys, and whether a website's posts are positive or negative is reflected in customer response and, ultimately, in sales. However, many reviews on a website are of uneven quality and difficult to classify. Earlier studies in this research area used review data from the Amazon.com shopping mall, but recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, etc. However, a lack of accuracy is recognized because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity of sentiment analysis into positive and negative categories and to increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. 
First, for the text classification algorithms related to sentiment analysis, popular machine learning algorithms such as NB (Naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and gradient boosting are adopted as comparative models. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential nature of the data. RNN handles ordered data well because it takes the temporal information of the data into account, but it suffers from the long-term dependency problem. To solve the problem of long-term dependence, LSTM is used. For comparison, CNN and LSTM were chosen as simple deep learning models. In addition to the classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination, and we tried to figure out how well the models work for sentiment analysis and how they work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features in text analysis. The reasons for combining these two algorithms are as follows. CNN can extract classification features automatically through its convolution layers and supports massively parallel processing; LSTM is not capable of highly parallel processing. Like faucets, the LSTM's input, output, and forget gates can be opened and closed at desired times. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the long-term dependency problem that plain RNNs suffer from. 
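
The gate arithmetic described above can be sketched for a single LSTM cell step; the scalar state and toy weight dictionary below are simplifications for illustration, not the paper's network:

```python
import math

def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x: float, h: float, c: float, w: dict) -> tuple:
    """One LSTM cell step with scalar input `x`, hidden state `h`,
    and memory cell `c`; `w` holds toy scalar weights and biases."""
    i = _sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])   # input gate: admit new info
    f = _sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])   # forget gate: keep old memory
    o = _sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])   # output gate: expose memory
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate memory content
    c_new = f * c + i * g            # memory block: gated mix of old and new
    h_new = o * math.tanh(c_new)     # hidden output seen by the next layer
    return h_new, c_new
```

The faucet analogy maps directly onto `i`, `f`, and `o`: each sigmoid opens or closes a path into, around, or out of the memory cell at each time step.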
Furthermore, when the LSTM is fed from the CNN's pooling layer, the model has an end-to-end structure in which spatial and temporal features can be learned simultaneously. The combined CNN-LSTM achieved 90.33% accuracy; it is slower than CNN alone but faster than LSTM alone, and the presented model was more accurate than the other models. In addition, the word embedding layer can be improved as the kernels are trained step by step. CNN-LSTM can compensate for the weaknesses of each individual model, and its end-to-end structure offers the advantage of improving learning layer by layer. For these reasons, this study seeks to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
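
Under assumed hyperparameters (none of the values below come from the paper), the shape flow of such an end-to-end CNN-LSTM classifier can be traced without any deep-learning library:

```python
def cnn_lstm_shapes(seq_len: int, emb_dim: int, n_filters: int,
                    kernel: int, pool: int, hidden: int) -> dict:
    """Trace tensor shapes through a CNN-LSTM sentiment classifier.

    All hyperparameters are illustrative assumptions, not the paper's.
    """
    conv_len = seq_len - kernel + 1   # 'valid' 1-D convolution shortens the sequence
    pooled_len = conv_len // pool     # non-overlapping max pooling downsamples it
    return {
        "embedding": (seq_len, emb_dim),   # one vector per token
        "conv": (conv_len, n_filters),     # local n-gram features
        "pool": (pooled_len, n_filters),   # shorter sequence fed to the LSTM
        "lstm": (hidden,),                 # last hidden state summarizes order
        "output": (1,),                    # sigmoid positive/negative score
    }
```

Tracing the shapes makes the division of labor concrete: convolution and pooling extract and compress spatial (n-gram) features, and the LSTM then models the temporal order of the compressed sequence.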