Search | Korea Science

Increasing Accuracy of Classifying Useful Reviews by Removing Neutral Terms (중립도 기반 선택적 단어 제거를 통한 유용 리뷰 분류 정확도 향상 방안)

Lee, Minsik;Lee, Hong Joo
- Journal of Intelligence and Information Systems / v.22 no.3 / pp.129-142 / 2016
Customer product reviews have become one of the important factors in purchase decisions. Customers believe that reviews written by others who have already used a product offer more reliable information than that provided by sellers. However, because there are so many products and reviews, the advantages of e-commerce can be offset by rising search costs. Reading all of the reviews to find the pros and cons of a product can be exhausting. To help potential customers find the most useful product information without much difficulty, e-commerce companies and online stores have devised various ways for customers to write, rate, and find useful product reviews. Different methods have been developed to classify and recommend useful reviews, primarily using feedback that customers provide about the helpfulness of reviews. Most shopping websites provide customer reviews along with the following information: the average preference for a product, the number of customers who participated in preference voting, and the preference distribution. Most information on the helpfulness of product reviews is collected through a voting system. Amazon.com asks customers whether a review of a product is helpful, and it places the most helpful favorable review and the most helpful critical review at the top of the list of product reviews. Some companies also predict the usefulness of a review from attributes such as its length, its author(s), and the words used, and publish only reviews that are likely to be useful. Text mining approaches have been used to classify useful reviews in advance. To apply a text mining approach to all reviews of a product, we need to build a term-document matrix: we extract all words from the reviews and build a matrix containing the number of occurrences of each term in each review.
Because there are many reviews, the term-document matrix becomes very large, which makes it difficult to apply text mining algorithms. Researchers therefore delete some terms on the basis of sparsity, since sparse words have little effect on classification or prediction. The purpose of this study is to suggest a better way of building the term-document matrix by deleting terms that are useless for review classification. We propose a neutrality index for selecting words to be deleted. Many words still appear in both classes (useful and not useful), and these words have little or even negative effect on classification performance. We define such words as neutral terms and delete the terms that appear similarly in both classes: after deleting sparse words, we select further words to delete on the basis of neutrality. We tested our approach on Amazon.com review data from five product categories: Cellphones & Accessories; Movies & TV; Automotive; CDs & Vinyl; and Clothing, Shoes & Jewelry. We used reviews that received more than four votes, with a 60% ratio of helpful votes to total votes as the threshold separating useful from not-useful reviews. We randomly selected 1,500 useful and 1,500 not-useful reviews for each product category. We then applied Information Gain and Support Vector Machine (SVM) algorithms to classify the reviews and compared their classification performance in terms of precision, recall, and F-measure. Although performance varied by product category and data set, deleting terms by both sparsity and neutrality gave the best F-measure for both classification algorithms. However, deleting terms by sparsity alone gave the best recall for Information Gain, and using all terms gave the best precision for SVM.
Thus, term-deletion methods and classification algorithms should be selected carefully according to the data set.
https://doi.org/10.13088/jiis.2016.22.3.129
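
The two-step filtering described above (first drop sparse terms, then drop neutral terms) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' method. The abstract does not define the neutrality index, so the formula `1 - |p_u - p_n| / max(p_u, p_n)` is a hypothetical stand-in. In it, `p_u` and `p_n` are the document frequencies of a term in useful and not-useful reviews. The names `label_review`, `doc_freq`, `select_terms`, `min_df`, and `max_neutrality` are all illustrative. The code also assumes that a helpful-vote ratio exactly at 60% counts as useful.

```python
from collections import Counter


def label_review(helpful_votes, total_votes, min_votes=5, threshold=0.6):
    """Return True (useful), False (not useful) or None (too few votes to label).

    Assumption: "more than four votes" means total_votes >= 5, and a ratio
    exactly at the 60% threshold counts as useful.
    """
    if total_votes < min_votes:
        return None
    return helpful_votes / total_votes >= threshold


def doc_freq(rows, term):
    """Fraction of rows (reviews) in which `term` occurs at least once."""
    return sum(1 for row in rows if term in row) / len(rows)


def select_terms(docs, labels, min_df=0.01, max_neutrality=0.8):
    """Keep terms that are neither sparse nor neutral.

    docs: list of token lists; labels: list of bools (True = useful).
    The neutrality formula is a hypothetical stand-in, not the paper's index.
    """
    tdm = [Counter(tokens) for tokens in docs]  # term-document matrix, one Counter per review
    useful = [row for row, y in zip(tdm, labels) if y]
    not_useful = [row for row, y in zip(tdm, labels) if not y]
    if not useful or not not_useful:
        raise ValueError("need reviews from both classes")
    kept = set()
    for term in set().union(*tdm):
        # Step 1: drop sparse terms that occur in too few reviews overall.
        if doc_freq(tdm, term) < min_df:
            continue
        # Step 2: drop neutral terms whose document frequency is similar in both classes.
        p_u, p_n = doc_freq(useful, term), doc_freq(not_useful, term)
        neutrality = 1 - abs(p_u - p_n) / max(p_u, p_n)  # 1.0 = identical, 0.0 = one class only
        if neutrality <= max_neutrality:
            kept.add(term)
    return kept


if __name__ == "__main__":
    docs = [["battery", "lasts", "long", "the"], ["battery", "charger", "the"],
            ["great", "the"], ["bad", "the"]]
    labels = [True, True, False, False]
    print(select_terms(docs, labels, min_df=0.5))  # {'battery'}
```

In the usage example, "the" occurs in every review of both classes, so it is dropped as neutral. Words that occur in only one review fall below `min_df` and are dropped as sparse.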

Dynamic Traffic Assignment Using Genetic Algorithm (유전자 알고리즘을 이용한 동적통행배정에 관한 연구)

Park, Kyung-Chul;Park, Chang-Ho;Chon, Kyung-Soo;Rhee, Sung-Mo
- Journal of Korean Society for Geospatial Information Science / v.8 no.1 s.15 / pp.51-63 / 2000
Dynamic traffic assignment (DTA) has been a topic of substantial research during the past decade. While DTA is gradually maturing, many aspects still need improvement, especially its formulation and solution algorithms. Recently, with its promise for ITS (Intelligent Transportation Systems) and GIS (Geographic Information Systems) applications, DTA has received increasing attention. This potential also implies higher requirements for DTA modeling, especially solution efficiency for real-time implementation. However, DTA poses many mathematical difficulties in the search process because of the complexity of its spatial and temporal variables. Although many solution algorithms have been studied, conventional methods cannot find the solution when the objective function or constraints are not convex. In this paper, a genetic algorithm is applied to solve DTA, and the Merchant-Nemhauser model is used as the DTA model because it has a nonconvex constraint set. To handle the nonconvex constraint set, this study uses GENOCOP III, a genetic algorithm system. Results for a sample network are compared with those of a conventional method.
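
The key idea above is that a genetic algorithm can search a nonconvex feasible set where conventional methods fail. A minimal real-coded GA illustrates this. The sketch is not GENOCOP III and not the Merchant-Nemhauser DTA model. It handles constraints with a simple penalty term, whereas GENOCOP III uses its own mechanism. The toy problem (`toy_objective`, `ring_penalty`) is hypothetical: it minimizes `x^2 + y^2` over the region outside a circle of radius 2, which is nonconvex.

```python
import random


def genetic_minimize(objective, penalty, bounds, pop_size=40, generations=200,
                     mutation_rate=0.2, seed=0):
    """Real-coded GA minimizing objective(x) + penalty(x) within box bounds."""
    rng = random.Random(seed)

    def fitness(x):
        return objective(x) + penalty(x)  # lower is better

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # truncation selection: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]  # arithmetic crossover
            if rng.random() < mutation_rate:
                i = rng.randrange(len(child))
                lo, hi = bounds[i]
                child[i] += rng.gauss(0, 0.1 * (hi - lo))  # Gaussian mutation
            children.append(clip(child))
        population = elite + children
    return min(population, key=fitness)


# Hypothetical toy problem: minimize x^2 + y^2 subject to x^2 + y^2 >= 4,
# whose feasible set (outside a circle) is nonconvex.
def toy_objective(x):
    return x[0] ** 2 + x[1] ** 2


def ring_penalty(x):
    return 1000 * max(0.0, 4 - (x[0] ** 2 + x[1] ** 2))


if __name__ == "__main__":
    best = genetic_minimize(toy_objective, ring_penalty, [(-5, 5), (-5, 5)])
    print(best, toy_objective(best))  # a point near the circle of radius 2
```

Keeping the elite half unchanged means the best solution found so far is never lost between generations.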

A study on improving the accuracy of machine learning models through the use of non-financial information in predicting the Closure of operator using electronic payment service (전자결제서비스 이용 사업자 폐업 예측에서 비재무정보 활용을 통한 머신러닝 모델의 정확도 향상에 관한 연구)

Hyunjeong Gong;Eugene Hwang;Sunghyuk Park
- Journal of Intelligence and Information Systems / v.29 no.3 / pp.361-381 / 2023
Research on corporate bankruptcy prediction has focused on financial information. Because a company's financial information is updated only quarterly, it lacks the timeliness needed to predict the possibility of business closure in real time. Evaluators seeking to overcome this need a way to judge the soundness of a target company using information other than financial information. As information technology has made it easier to collect non-financial information about companies, research has applied additional variables and various methodologies beyond financial information to predict corporate bankruptcy. Determining whether these variables actually have an effect has become an important research task. In this study, we examined the impact of electronic payment-related information, a form of non-financial information, on predicting the closure of business operators that use an electronic payment service. We also examined how closure prediction accuracy differs by combination of financial and non-financial information. Specifically, we designed three research models (a financial information model, a non-financial information model, and a combined model) and evaluated closure prediction accuracy with six algorithms, including the Multi-Layer Perceptron (MLP). The combined financial and non-financial model showed the highest prediction accuracy, followed by the non-financial information model and then the financial information model. Among the six algorithms, XGBoost showed the highest prediction accuracy.
An examination of the relative importance of all 87 variables used to predict business closure showed that more than 70% of the top 20 variables with a significant impact on the prediction were non-financial information. This confirms that electronic payment-related non-financial information is an important variable in predicting business closure. The study also examined the possibility of using non-financial information as an alternative to financial information. Based on these findings, the study recognizes the importance of collecting and utilizing non-financial information for predicting business closure and proposes a plan for using it in corporate decision-making.
https://doi.org/10.13088/jiis.2023.29.3.361

Visualization Technique of Spatial Statistical Data and System Implementation (공간 통계 데이터의 시각화 기술 및 시스템 개발)

Baek, Ryong;Hong, Gwang-Soo;Yang, Seung-Hoon;Kim, Byung-Gyu
- KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.849-854 / 2013
In this paper, we propose algorithms and a visualization system for displaying spatial data. The proposed system also provides analysis functions combined with a conventional map, together with an automatic document generation function, to supply useful information for making important decisions based on spatially distributed data. In the proposed method, we employ heat map analysis to present a suitable color distribution over two-dimensional map data. Buffer analysis is also used to define spatial data access. With the proposed system, spatial information with various distributions can be identified easily. In addition, the automatic document generation function is expected to save considerable time and cost when producing electronic documents that require representations of spatial information.
https://doi.org/10.3745/KTSDE.2013.2.12.849
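
Two of the operations above can be sketched briefly: a heat map that turns point data into a colored 2-D grid, and a buffer analysis that selects points within a distance of a location. This is a minimal sketch under assumptions, not the paper's system. The Gaussian kernel density, the blue-to-red color ramp, and the names `heat_grid`, `to_color`, `buffer_query`, `cell`, and `bandwidth` are illustrative choices.

```python
import math


def heat_grid(points, width, height, cell, bandwidth):
    """Gaussian kernel density evaluated at the center of each grid cell."""
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                grid[r][c] += math.exp(-d2 / (2 * bandwidth ** 2))
    return grid


def to_color(value, vmax):
    """Map a density to an RGB triple on a blue (low) to red (high) ramp."""
    t = 0.0 if vmax == 0 else min(value / vmax, 1.0)
    return (round(255 * t), 0, round(255 * (1 - t)))


def buffer_query(points, center, radius):
    """Buffer analysis: return the points within `radius` of `center`."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]


if __name__ == "__main__":
    pts = [(1, 1), (1.2, 1.1), (8, 8)]
    grid = heat_grid(pts, 10, 10, 5, 1.0)
    vmax = max(max(row) for row in grid)
    print([[to_color(v, vmax) for v in row] for row in grid])
    print(buffer_query(pts, (1, 1), 0.5))  # [(1, 1), (1.2, 1.1)]
```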

A probabilistic knowledge model for analyzing heart rate variability (심박수변이도 분석을 위한 확률적 지식기반 모형)

Son, Chang-Sik;Kang, Won-Seok;Choi, Rock-Hyun;Park, Hyoung-Seob;Han, Seongwook;Kim, Yoon-Nyun
- Journal of Korea Society of Industrial Information Systems / v.20 no.3 / pp.61-69 / 2015
This study presents a probabilistic knowledge discovery method for interpreting heart rate variability (HRV) based on time- and frequency-domain indexes extracted using the discrete wavelet transform. The knowledge induction algorithm consists of two phases: rule generation and rule estimation. First, rule generation converts numerical attributes to intervals using ROC curve analysis and constructs a reduced rule set by comparing the consistency degree between attribute-value pairs with different decision values. Then, three measures (rule support, confidence, and coverage) are estimated to give a probabilistic interpretation of each rule. To show the effectiveness of the proposed model, we evaluated the statistical discriminant power of five rules: 3 for atrial fibrillation, 1 for normal sinus rhythm, and 1 for both atrial fibrillation and normal sinus rhythm. The rules were generated from data (n=58) collected with a 1-channel wireless Holter electrocardiogram (ECG), HeartCall® (U-Heart Inc.). The experimental results showed a performance of approximately 0.93 (93%) for each of accuracy, sensitivity, specificity, and AUC.
https://doi.org/10.9723/jksiis.2015.20.3.061
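
The two phases above can be illustrated with a small sketch. The first function picks an ROC-based cut point for discretizing one attribute. The second computes support, confidence, and coverage for a rule "condition → decision". This is a sketch under assumptions, not the authors' algorithm. The abstract does not say which ROC criterion is used, so Youden's J (sensitivity + specificity - 1) is assumed here. The rule is assumed to have the form "value >= t". The measure definitions follow those commonly used in rough-set rule induction: support = |R∩D|/|U|, confidence = |R∩D|/|R|, coverage = |R∩D|/|D|. The names `youden_cutpoint` and `rule_measures` are illustrative.

```python
def youden_cutpoint(values, labels):
    """Threshold t maximizing Youden's J for the rule `value >= t` (assumed criterion)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y)
        tn = sum(1 for v, y in zip(values, labels) if v < t and not y)
        j = tp / pos + tn / neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def rule_measures(condition, decision):
    """condition, decision: per-record booleans for the rule body R and the decision class D."""
    n = len(condition)
    both = sum(1 for r, d in zip(condition, decision) if r and d)
    n_r, n_d = sum(condition), sum(decision)
    return {
        "support": both / n,                          # |R and D| / |U|
        "confidence": both / n_r if n_r else 0.0,     # |R and D| / |R|
        "coverage": both / n_d if n_d else 0.0,       # |R and D| / |D|
    }
```

For example, with hypothetical attribute values `[0.1, 0.2, 0.3, 0.8, 0.9, 0.7]` and labels that are positive only for the last three, the cut point is 0.7 with J = 1.0.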

Development of Healthcare Data Quality Control Algorithm Using Interactive Decision Tree: Focusing on Hypertension in Diabetes Mellitus Patients (대화식 의사결정나무를 이용한 보건의료 데이터 질 관리 알고리즘 개발: 당뇨환자의 고혈압 동반을 중심으로)

Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Seong-Ok;Park, Jung-Sun;Kwak, Mi-Sook;Lee, Ye-Jin;Lim, Chae-Hyeok;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
- The Korean Journal of Health Service Management / v.10 no.3 / pp.63-74 / 2016
Objectives: There is a need to develop data quality management algorithms that improve the quality of healthcare data through a data quality management system. In this study, we developed a data quality control algorithm for diseases related to hypertension in patients with diabetes mellitus. Methods: To build the algorithm, we extracted data on diabetes mellitus patients from the 2011 and 2012 hospital discharge injury surveys. Derived variables were created from the primary diagnosis, diagnostic unit, primary surgery and treatment, and minor surgery and treatment items. Results: Significant factors for hypertension in diabetes mellitus patients were sex, age, ischemic heart disease, and diagnostic ultrasound of the heart. Based on the decision tree results, we found four groups with extreme values among diabetes patients with accompanying hypertension. Conclusions: The actual data contained in the outlier (extreme-value) groups need to be checked to improve data quality.
https://doi.org/10.12811/kshsm.2016.10.3.063

A Study on Forecasting Risk of Gas Accident using Weather Data (기상 데이터를 활용한 가스사고위험 예보에 관한 연구)

Oh, Jeong Seok
- Journal of the Korean Institute of Gas / v.22 no.5 / pp.107-113 / 2018
While accident data are used to raise awareness of accidents or to review similar cases, analysis of the nature of accident data and its association with the surrounding environment remains very insufficient. It is therefore necessary to develop analysis techniques that use related accident data to demonstrate the likelihood of an accident in a particular region. The purpose of this study is to develop an analysis model and implement a system that produces regional accident probabilities based on historical weather data and accident and report data. Specifically, the system builds models with k-NN and decision tree algorithms, using optional user-selected environment variables. The models are based on the probabilistic relationship between weather and accidents in many regions of Korea. In the future, the models developed in this study are intended to be used to analyze and calculate risk for narrower areas.
https://doi.org/10.7842/kigas.2018.22.5.107
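
The k-NN part of the approach above can be sketched as follows. The estimated accident probability for a region and day is taken as the fraction of the k most similar past days (by weather) on which an accident occurred. This is a minimal sketch under assumptions, not the paper's system. The feature layout (temperature in °C, humidity in %), the z-score scaling, the default `k=3`, and the names `fit_scaler`, `scale`, and `knn_accident_probability` are all hypothetical. Scaling is included so that features with larger numeric ranges do not dominate the Euclidean distance.

```python
import math


def fit_scaler(rows):
    """Per-feature (mean, std) so features with different units are comparable."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return stats


def scale(row, stats):
    return [(v - m) / s for v, (m, s) in zip(row, stats)]


def knn_accident_probability(history, query, k=3):
    """history: list of (weather_features, had_accident) pairs for one region.
    Returns the fraction of the k nearest past days that had an accident."""
    stats = fit_scaler([features for features, _ in history])
    q = scale(query, stats)
    ranked = sorted(history, key=lambda item: math.dist(scale(item[0], stats), q))
    neighbors = ranked[:k]
    return sum(1 for _, accident in neighbors if accident) / len(neighbors)


if __name__ == "__main__":
    # Hypothetical history: ([temperature_C, humidity_pct], accident_occurred)
    history = [([-10, 30], True), ([-8, 35], True), ([-12, 25], False),
               ([25, 70], False), ([28, 80], False), ([22, 60], False)]
    print(knn_accident_probability(history, [-9, 32]))  # 2 of the 3 cold days had accidents
```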

A Hybrid System of Wavelet Transformations and Neural Networks Using Genetic Algorithms: Applying to Chaotic Financial Markets (유전자알고리즘을 이용한 웨이블릿분석 및 인공신경망기법의 통합모형구축)

Shin, Taeksoo;Han, Ingoo
- Proceedings of the Korea Database Society Conference / 1999.06a / pp.271-280 / 1999
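Among the issues to consider when applying artificial neural networks to time-series forecasting, generating input variables suited to the model is especially important. A variety of approaches have been proposed for this as preprocessing techniques for input variables during neural network model construction. The signal-processing techniques proposed most recently for input preprocessing fall broadly into two groups. The first is the Fourier transform, the traditional method of decomposing a series by cycle. The second is the wavelet transform, which extends that concept. These techniques start from the assumption that a time series is a collection of distinct series composed of multiple cycles. Traditionally, such methods have been applied in electrical and electronic engineering for frequency-domain decomposition, that is, for separating high and low frequencies. Recently, however, this research has been actively applied to various fields, a representative example being the analysis of financial time series in business. In a financial time series, market participants with long- and short-term decision horizons trade differently, and their behavior is reflected differently in prices. Because of the distinctive trading patterns of these different groups, the stock market, for example, is sometimes viewed as having a fractal structure. Financial time series can thus be seen as an aggregate of diverse social phenomena, which makes building forecasting models correspondingly difficult. This study uses wavelet analysis as a new filtering technique. Wavelet analysis is a signal-processing method based on the cyclical properties of time series, and it reduces noise in the original series while transforming it into more meaningful information. By combining it with artificial neural networks, an active area of current research, the study proposes a new integrated forecasting methodology that differs from previous work. The proposed methodology builds the forecasting model in two main stages. In the first stage, wavelet analysis filters noise from the raw financial time series. At the same time, it decomposes the series into series with multi-scale cyclical components that better reflect the fractal structure, that is, the nonlinear movement, of past financial time series. In the second stage, these series, split into long- and short-term components by cycle, are used as new input variables to build the final artificial neural network model.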
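
The first stage above (decomposing a series into long- and short-cycle components that can serve as neural network inputs) can be sketched with a multi-level Haar wavelet transform. This is a minimal sketch under assumptions. The abstract does not name the wavelet family, so the simple averaging Haar wavelet is used here. The neural network and genetic algorithm stages are not shown. The names `haar_step`, `haar_decompose`, and `reconstruct` are illustrative.

```python
def haar_step(series):
    """One level of the averaging Haar transform: pairwise means and half-differences."""
    if len(series) % 2:
        raise ValueError("series length must be even at every level")
    approx = [(series[i] + series[i + 1]) / 2 for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / 2 for i in range(0, len(series), 2)]
    return approx, detail


def haar_decompose(series, levels):
    """Return [detail_1 (shortest cycles), ..., detail_L, approx_L (long-term trend)]."""
    components = []
    approx = list(series)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        components.append(detail)
    components.append(approx)
    return components


def reconstruct(components):
    """Invert haar_decompose exactly: each (a, d) pair expands back to (a + d, a - d)."""
    approx = components[-1]
    for detail in reversed(components[:-1]):
        approx = [v for a, d in zip(approx, detail) for v in (a + d, a - d)]
    return approx


if __name__ == "__main__":
    parts = haar_decompose([4, 2, 6, 8], levels=2)
    print(parts)               # [[1.0, -1.0], [-2.0], [5.0]]
    print(reconstruct(parts))  # [4.0, 2.0, 6.0, 8.0]
```

Noise filtering is commonly done by shrinking or zeroing the finest detail coefficients before reconstruction. The remaining components, one per time scale, are what the second stage would feed to the network.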

MRI Quantification Analysis on Fall in Sick Times of the Cerebral Infarction Patients Using Object-Centered Hierarchical Planning (객체 중심 계층적 계획을 이용한 뇌경색 환자의 시기별 MRI 정량적 분석에 관한 연구)

Ha, Kwang;Jeon, Gye-Rok;Kim, Gil-Joong
- Journal of Biomedical Engineering Research / v.24 no.2 / pp.61-68 / 2003
This paper presents a method for quantitatively analyzing the time since onset in cerebral infarction patients using three types of magnetic resonance images, which play an important role in deciding on medical treatment. To this end, the image characteristics obtained by three MRI acquisition methods and their relationships were analyzed using an object-centered hierarchical planning method. This method provides a knowledge-based approach to image interpretation and analysis. To compare the three types of MRI, a multiple warping algorithm and an affine transform were used for image matching. The time-since-onset level of each cerebral infarction was then quantified. Pseudo-color mapping was performed by comparing gray-level values against previously obtained hand-made data. The results of this study were compared with physicians' decisions.
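
The affine part of the image matching above can be sketched by estimating a 2-D affine transform from three corresponding control points and then mapping pixel coordinates from one MRI type onto another. This is a minimal sketch under assumptions. The paper's multiple warping algorithm is not reproduced. The control-point pairs are hypothetical landmarks, and the names `estimate_affine` and `apply_affine` are illustrative. Each coordinate is modeled as `x' = a*x + b*y + c` and `y' = d*x + e*y + f`, and the two resulting 3x3 linear systems are solved with Cramer's rule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m @ v = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    result = []
    for col in range(3):
        # Replace column `col` with the right-hand side.
        mc = [row[:col] + [rhs[i]] + row[col + 1:] for i, row in enumerate(m)]
        result.append(det3(mc) / d)
    return result


def estimate_affine(src, dst):
    """Affine parameters (a, b, c, d, e, f) mapping three src points onto dst points."""
    m = [[float(x), float(y), 1.0] for x, y in src]
    a, b, c = solve3(m, [x for x, _ in dst])
    d, e, f = solve3(m, [y for _, y in dst])
    return (a, b, c, d, e, f)


def apply_affine(params, point):
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


if __name__ == "__main__":
    # Hypothetical landmarks: scale by 2, then shift by (3, 4).
    params = estimate_affine([(0, 0), (1, 0), (0, 1)], [(3, 4), (5, 4), (3, 6)])
    print(apply_affine(params, (2, 3)))  # (7.0, 10.0)
```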

Balanced Clustering based on Mobile Agents for the Ubiquitous Healthcare Systems (유비쿼터스 헬스케어 시스템에서 이동에이전트 기반 균형화 클러스터링)

Mateo, Romeo Mark A.;Lee, Jae-Wan;Lee, Mal-Rey
- Journal of Internet Computing and Services / v.11 no.3 / pp.65-74 / 2010
In ubiquitous healthcare, automated diagnosis is commonly achieved by an agent system that provides intelligent decision support and fast diagnosis results. Our ubiquitous healthcare system design uses mobile agent technology, which achieves efficient load distribution by migrating processes to less-loaded nodes. This paper presents a framework for ubiquitous healthcare technologies that focuses on mobile agents serving the on-demand processes of an automated diagnosis support system. For efficient utilization of resources, a balanced clustering method for distributing the process load among nodes is proposed. The proposed algorithm selects overloaded nodes and migrates their processes to nearby nodes until the load variance of the system is minimized. Because migration is performed only to nearby nodes, the proposed balanced clustering distributes processes efficiently across all nodes while taking message overhead into account.
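
The migration rule above can be sketched as a greedy loop over overloaded nodes. Each overloaded node, from most to least loaded, tries to move one process to its least-loaded nearby node. The move is kept only if it lowers the system's load variance, and the loop stops when no such move exists. This is a sketch under assumptions, not the authors' algorithm. "Nearby" is modeled by a hypothetical `neighbors` adjacency map, which stands in for limiting message overhead. Moving the smallest process first is an illustrative choice. The loop stops at a local minimum of the variance, which may not be the global minimum.

```python
def variance(totals):
    """Population variance of node loads (totals: node -> load)."""
    mean = sum(totals.values()) / len(totals)
    return sum((v - mean) ** 2 for v in totals.values()) / len(totals)


def balance(loads, neighbors, max_moves=1000):
    """loads: node -> list of process costs; neighbors: node -> list of nearby nodes.
    Returns (new_loads, moves), where moves lists (src, dst, process_cost)."""
    loads = {n: list(p) for n, p in loads.items()}
    moves = []
    for _ in range(max_moves):
        totals = {n: sum(p) for n, p in loads.items()}
        mean = sum(totals.values()) / len(totals)
        current = variance(totals)
        overloaded = sorted((n for n in totals if totals[n] > mean),
                            key=totals.get, reverse=True)
        moved = False
        for src in overloaded:
            if not loads[src] or not neighbors.get(src):
                continue
            dst = min(neighbors[src], key=totals.get)  # least-loaded nearby node
            proc = min(loads[src])  # smallest process limits overshoot
            trial = dict(totals)
            trial[src] -= proc
            trial[dst] += proc
            if variance(trial) < current:
                loads[src].remove(proc)
                loads[dst].append(proc)
                moves.append((src, dst, proc))
                moved = True
                break
        if not moved:
            break
    return loads, moves


if __name__ == "__main__":
    loads = {"A": [3, 3, 3, 3], "B": [], "C": [3]}
    neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    print(balance(loads, neighbors))  # A and B end at 6, C stays at 3
```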
