• Title/Summary/Keyword: 데이터 논문 (data papers)

Search Results: 41,151

State of Health and State of Charge Estimation of Li-ion Battery for Construction Equipment based on Dual Extended Kalman Filter (이중확장칼만필터(DEKF)를 기반한 건설장비용 리튬이온전지의 State of Charge(SOC) 및 State of Health(SOH) 추정)

  • Hong-Ryun Jung;Jun Ho Kim;Seung Woo Kim;Jong Hoon Kim;Eun Jin Kang;Jeong Woo Yun
    • Journal of the Microelectronics and Packaging Society
    • /
    • v.31 no.1
    • /
    • pp.16-22
    • /
    • 2024
  • Along with the high interest in electric vehicles and renewable energy, there is a growing demand to apply lithium-ion batteries in the construction equipment industry. The battery capacity of heavy construction equipment, which performs various tasks at construction sites, decreases rapidly with use. It is therefore essential to accurately predict battery states such as the SOC (State of Charge) and SOH (State of Health). In this paper, the errors between actual electrochemical measurement data and estimated data were compared using the Dual Extended Kalman Filter (DEKF) algorithm, which can estimate SOC and SOH simultaneously. The state of charge was predicted by measuring OCV at 5% SOC intervals under 0.2C-rate conditions after the cell was fully charged, and the degradation state of the battery was predicted after 50 cycles of aging tests under various C-rate conditions (0.2, 0.3, 0.5, 1.0, and 1.5C). The SOC and SOH estimation errors using DEKF tended to increase as the C-rate increased. The SOC estimation error with DEKF remained below 6% at 0.2, 0.5, and 1C-rate. The SOH estimation results showed good performance, with maximum errors of 1.0% and 1.3% at 0.2 and 0.3C-rate, respectively, and the estimation error increased from 1.5% to 2% as the C-rate rose from 0.5 to 1.5C. Overall, all SOH estimation errors using DEKF remained within about 2%.
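
The abstract reports only the experimental comparison, but the DEKF structure it relies on — an EKF tracking SOC run in parallel with a slower EKF tracking capacity, whose ratio to nominal capacity gives SOH — can be sketched as below. This is a minimal Python illustration under assumed values (a toy linear OCV curve, an assumed ohmic resistance, and assumed noise covariances), not the authors' implementation.

```python
import numpy as np

DT, R0 = 1.0, 0.05                       # sample period [s] and ohmic resistance [ohm] (assumed)
OCV = lambda soc: 3.2 + 0.9 * soc        # toy linear OCV(SOC) curve (assumed)
D_OCV = 0.9                              # dOCV/dSOC of the toy curve

def dekf(current, voltage, soc0=1.0, cap0=2.0):
    """Jointly estimate SOC and capacity [Ah]; SOH = capacity / nominal capacity."""
    soc, P = soc0, 1e-3                  # state estimate and covariance
    cap, Pw = cap0, 1e-2                 # parameter estimate and covariance
    Q, Qw, R = 1e-7, 1e-8, 1e-3          # process / parameter / measurement noise (assumed)
    trace = []
    for amp, volt in zip(current, voltage):
        # Time update: capacity follows a random walk, SOC follows coulomb counting.
        cap_p, Pw_p = cap, Pw + Qw
        soc_p, P_p = soc - amp * DT / (3600.0 * cap_p), P + Q
        v_pred = OCV(soc_p) - R0 * amp
        # Measurement update of the state (SOC) filter.
        H = D_OCV
        K = P_p * H / (H * P_p * H + R)
        soc, P = soc_p + K * (volt - v_pred), (1 - K * H) * P_p
        # Measurement update of the parameter (capacity) filter; dV/dC via chain rule.
        Hw = D_OCV * amp * DT / (3600.0 * cap_p ** 2)
        Kw = Pw_p * Hw / (Hw * Pw_p * Hw + R)
        cap, Pw = cap_p + Kw * (volt - v_pred), (1 - Kw * Hw) * Pw_p
        trace.append((soc, cap))
    return np.array(trace)

# Example: a constant 0.5C discharge of a 2 Ah cell with synthetic voltage readings.
t = np.arange(3600)
i_true = np.full(t.size, 1.0)                        # 1 A = 0.5C for a 2 Ah cell
soc_true = 1.0 - np.cumsum(i_true) * DT / (3600 * 2.0)
v_meas = OCV(soc_true) - R0 * i_true + np.random.default_rng(0).normal(0, 0.005, t.size)
print(dekf(i_true, v_meas)[-1])                      # final [SOC, capacity] estimate
```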

Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site (사용자 리뷰의 평가기준 별 이슈 식별 방법론: 호텔 리뷰 사이트를 중심으로)

  • Byun, Sungho;Lee, Donghoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.23-43
    • /
    • 2016
  • As a result of the growth of Internet data and the rapid development of Internet technology, "big data" analysis has gained prominence as a major approach for evaluating and mining enormous data for various purposes. In recent years especially, people tend to share their experiences related to leisure activities while also reviewing others' inputs concerning their activities. By referring to others' leisure-activity experiences, they can gather information that might guarantee them better leisure activities in the future. This phenomenon appears throughout many aspects of leisure activities such as movies, traveling, accommodation, and dining. Apart from blogs and social networking sites, many other websites provide a wealth of information related to leisure activities. Most of these websites present information on each product in various formats depending on different purposes and perspectives. Generally, they provide the average ratings and detailed reviews of users who actually used the products or services, and these ratings and reviews can support the decisions of potential customers purchasing the same products or services. However, existing websites offering information on leisure activities provide ratings and reviews based on only a single level of evaluation criteria. Therefore, to identify the main issue for each evaluation criterion, as well as the characteristics of the specific elements comprising each criterion, users have to read a large number of reviews. In particular, as most users search for the characteristics of detailed elements for one or more specific evaluation criteria according to their priorities, they must spend a great deal of time and effort reading many reviews and understanding their contents to obtain the desired information. Although some websites break down the evaluation criteria and direct users to input their reviews according to different levels of criteria, the excessive number of input sections makes the whole process inconvenient, and problems may arise if a user does not follow the instructions or fills in the wrong input sections. Finally, treating such a breakdown of the evaluation criteria as a realistic alternative is difficult, because identifying all the detailed criteria for each evaluation criterion is a challenging task. For example, when a review about a certain hotel is written, people tend to write only single-level reviews for components such as accessibility, rooms, service, or food. These might answer the most frequently asked questions, such as the distance to the nearest subway station or the condition of the bathroom, but they still lack detailed information. In addition, if a breakdown of the evaluation criteria were provided along with various input sections, a user might fill in only the accessibility criterion, or enter the wrong information, such as information about rooms, in the accessibility section; the reliability of the segmented review would then be greatly reduced. In this study, we propose an approach to overcome the limitations of existing leisure-activity information websites, namely, (1) the low reliability of reviews for each evaluation criterion and (2) the difficulty of identifying the detailed contents that make up each evaluation criterion.
In our proposed methodology, we first identify the review content and construct a lexicon for each evaluation criterion using the terms frequently used for that criterion. Next, the sentences in the review documents containing terms from the constructed lexicon are decomposed into review units, which are then reorganized by evaluation criterion. Finally, the issues of the review units for each evaluation criterion are derived and summary results are provided, together with the review units themselves. This approach aims to save users time and effort, because they read only the information relevant to each evaluation criterion rather than the entire review text. Our methodology is based on topic modeling, which is actively used in text analysis. Each review is decomposed into sentence units rather than treated as a single document; after decomposition, the review units are reorganized according to each evaluation criterion and used in the subsequent analysis. This differs substantially from existing topic modeling-based studies. In this paper, we collected 423 reviews from hotel information websites and decomposed them into 4,860 review units, which we reorganized according to six evaluation criteria. By applying these review units in our methodology, we present the analysis results and demonstrate the utility of the proposed methodology.
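
As a concrete illustration of the decomposition step, the sketch below splits reviews into sentence-level review units and assigns each unit to every evaluation criterion whose lexicon terms it contains. The criterion names and lexicon entries are hypothetical placeholders, not the dictionaries the authors built.

```python
import re
from collections import defaultdict

# Hypothetical per-criterion lexicons; the paper builds these from terms
# that occur frequently for each evaluation criterion.
LEXICON = {
    "accessibility": {"subway", "station", "distance", "airport", "bus"},
    "rooms":         {"room", "bed", "bathroom", "clean", "view"},
    "service":       {"staff", "friendly", "check-in", "service"},
}

def to_review_units(review):
    """Decompose a review into sentence-level review units, then assign each
    unit to every evaluation criterion whose lexicon terms it contains."""
    units = defaultdict(list)
    for sent in re.split(r"(?<=[.!?])\s+", review.strip()):
        words = set(re.findall(r"[a-z\-]+", sent.lower()))
        for criterion, terms in LEXICON.items():
            if words & terms:
                units[criterion].append(sent)
    return dict(units)

print(to_review_units("The subway station is 5 minutes away. The room was clean "
                      "but small. Staff were very friendly."))
```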

Comparative evaluation of the methods of producing planar image results by using Q-Metrix method of SPECT/CT in Lung Perfusion Scan (Lung Perfusion scan에서 SPECT-CT의 Q-Metrix방법과 평면영상 결과 산출방법에 대한 비교평가)

  • Ha, Tae Hwan;Lim, Jung Jin;Do, Yong Ho;Cho, Sung Wook;Noh, Gyeong Woon
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.22 no.1
    • /
    • pp.90-97
    • /
    • 2018
  • Purpose: The lung segment ratio, obtained through quantitative analysis of lung perfusion scan images, is calculated to evaluate lung function before and after surgery. In this study, planar-image result production methods are comparatively evaluated against the Q-Metrix (GE Healthcare, USA) program, which performs quantitative analysis and computes the segment ratio after SPECT/CT. Materials and Methods: Lung perfusion scans and SPECT/CT were performed with Discovery 670 (GE Healthcare, USA) equipment on 50 lung cancer patients who visited our hospital prior to surgery between May 1, 2015 and September 13, 2016. The AP (Anterior Posterior) method divided the anterior and posterior planar images into three rectangular portions using the ROI tool, while the PO (Posterior Oblique) method computed the segment ratio by dividing the right lobe into three parts and the left lobe into two parts on the oblique image. For the Q-Metrix method, the segment ratio was computed by setting the ROI and VOI in the CT image, and statistical analysis was performed with SPSS Ver. 23. Results: The correlation concordance rates of the AP method with Q-Metrix were 0.224 for the RUL (right upper lobe), 0.035 for the RML (right middle lobe), and 0.447 for the RLL (right lower lobe); the LUL (left upper lobe) and LLL (left lower lobe) were 0.643 and 0.456, respectively. For the PO method, the right lobes were 0.663, 0.623, and 0.702, and the left lobes were 0.754 and 0.823. In the paired sample t-test, the right lobes with the AP method were 11.6±4.5, 26.9±6.2, and 17.8±4.2, and the left lobes were 28.4±4.8 and 15.4±5.6; the right lobes with the PO method were 17.4±5.0, 10.5±3.6, and 27.3±6.0, and the left lobes were 21.6±4.8 and 23.1±6.6. Each lobe differed significantly from the Q-Metrix method (P<0.05), except the right middle lobe of the PO method, which showed no statistically significant difference (P>0.05). Conclusion: The AP method showed a low concordance rate in correlation with the Q-Metrix method, whereas the PO method displayed a high concordance rate overall. Although the AP method differed significantly in all lobes, the PO method showed no significant difference in the right middle lobe. Therefore, when producing lung perfusion scan results, the Q-Metrix method of SPECT/CT is useful for computing accurate result values, and more practical segmental results can be expected by using the PO method at the time of planar image acquisition.
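
The statistical comparison reported above (done in SPSS Ver. 23) is a per-lobe paired-sample t-test plus a correlation-based concordance check. A minimal Python equivalent is sketched below; the arrays are placeholder values, not the study's patient data.

```python
import numpy as np
from scipy import stats

# Placeholder per-patient RUL segment ratios for two methods (not the study's data).
qmetrix = np.array([18.1, 11.2, 26.5, 21.9, 22.3])
po      = np.array([17.4, 10.5, 27.3, 21.6, 23.1])

t, p = stats.ttest_rel(qmetrix, po)                   # paired-sample t-test per lobe
print(f"paired t = {t:.3f}, p = {p:.3f}  ->  significant at 0.05: {p < 0.05}")
r, _ = stats.pearsonr(qmetrix, po)                    # concordance reported as correlation
print(f"correlation = {r:.3f}")
```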

Strategy for Store Management Using SOM Based on RFM (RFM 기반 SOM을 이용한 매장관리 전략 도출)

  • Jeong, Yoon Jeong;Choi, Il Young;Kim, Jae Kyeong;Choi, Ju Choel
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.93-112
    • /
    • 2015
  • With changes in consumers' consumption patterns, existing retail shops have evolved into hypermarkets or convenience stores offering mostly grocery and daily products. It is therefore important to maintain proper inventory levels and product configuration to utilize the limited space in a retail store effectively and increase sales. Accordingly, this study proposes a product configuration and inventory level strategy based on the RFM (Recency, Frequency, Monetary) model and SOM (self-organizing map) for managing a retail shop effectively. The RFM model is an analytical model for studying customer behavior based on past buying activity, and it can distinguish important customers in large data sets using three variables. R represents recency, referring to the last purchase of commodities: the customer who purchased most recently has the largest R. F represents frequency, the number of transactions in a particular period, and M represents monetary value, the amount of money spent in a particular period. The RFM method is thus known to be a very effective model for customer segmentation. In this study, SOM cluster analysis was performed using normalized values of the RFM variables. SOM is regarded as one of the most distinguished artificial neural network models among unsupervised learning tools. It is a popular tool for clustering and visualizing high-dimensional data in such a way that similar items are grouped spatially close to one another, and it has been successfully applied in various technical fields for finding patterns. In our research, the procedure tries to find sales patterns by analyzing product sales records with recency, frequency, and monetary values, and, to suggest a business strategy, a decision tree is built on the SOM results. To validate the proposed procedure, we adopted M-mart data collected between 2014.01.01 and 2014.12.31. Each product receives R, F, and M values, and the products are clustered into nine clusters using SOM. We also performed three tests using the weekday data, the weekend data, and the whole data set in order to analyze changes in sales patterns. To propose a strategy for each cluster, we examine the criteria of the product clustering; the clusters obtained through SOM can be explained by the characteristics revealed by the decision trees. As a result, we can suggest an inventory management strategy for each of the nine clusters through the suggested procedure. Products in the cluster with the highest R, F, and M values need a high inventory level and should be placed where they can draw customer traffic. In contrast, products in the cluster with the lowest R, F, and M values need a low inventory level and can be placed where visibility is low. Products in the cluster with the highest R values are usually new releases and should be placed at the front of the store. For products in the cluster with the highest F values, purchased heavily in the past, the manager should decrease inventory levels gradually, because that cluster has lower R and M values than average, which suggests the products have sold poorly in recent days and total sales will fall below what the frequency implies. The procedure presented in this study is expected to contribute to raising the profitability of retail stores. The paper is organized as follows. The second chapter briefly reviews the literature related to this study.
The third chapter presents the proposed procedure, the fourth chapter applies it to actual product sales data, and the fifth chapter concludes and discusses further research.
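
A compact sketch of the pipeline described above — computing R, F, and M per product from a transaction log, min-max normalizing, and clustering into nine groups with a 3×3 SOM — is shown below. It uses the third-party minisom package; the column names and toy transactions are assumptions, not the M-mart data.

```python
import numpy as np
import pandas as pd
from minisom import MiniSom

tx = pd.DataFrame({                     # toy transaction log (assumed schema)
    "product": ["A", "A", "B", "C", "C", "C"],
    "date": pd.to_datetime(["2014-12-30", "2014-11-02", "2014-03-15",
                            "2014-12-20", "2014-12-21", "2014-10-05"]),
    "amount": [10.0, 12.0, 3.0, 7.0, 8.0, 6.0],
})
ref = pd.Timestamp("2014-12-31")
rfm = tx.groupby("product").agg(
    # R is negated days since last sale, so a more recent sale gets a larger R,
    # matching the paper's convention.
    R=("date", lambda d: -(ref - d.max()).days),
    F=("date", "count"),                          # number of transactions
    M=("amount", "sum"),                          # total sales amount
)
norm = (rfm - rfm.min()) / (rfm.max() - rfm.min() + 1e-9)   # min-max normalization

som = MiniSom(3, 3, 3, sigma=1.0, learning_rate=0.5, random_seed=42)
som.train_random(norm.values, 500)
rfm["cluster"] = [som.winner(v) for v in norm.values]       # one of 9 grid cells
print(rfm)
```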

A Study on Industries' Leading at the Stock Market in Korea - Gradual Diffusion of Information and Cross-Asset Return Predictability- (산업의 주식시장 선행성에 관한 실증분석 - 자산간 수익률 예측 가능성 -)

  • Kim Jong-Kwon
    • Proceedings of the Safety Management and Science Conference
    • /
    • 2004.11a
    • /
    • pp.355-380
    • /
    • 2004
  • I test the hypothesis that the gradual diffusion of information across asset markets leads to cross-asset return predictability in Korea. Using thirty-six industry portfolios and the broad market index as test assets, I establish several key results. First, a number of industries such as semiconductors, electronics, metals, and petroleum lead the stock market by up to one month. In contrast, the market, which is widely followed, leads only a few industries. Importantly, an industry's ability to lead the market is correlated with its propensity to forecast various indicators of economic activity such as industrial production growth. Consistent with the hypothesis, these findings indicate that the market reacts with a delay to information in industry returns about its fundamentals, because information diffuses only gradually across asset markets. Traditional theories of asset pricing assume that investors have unlimited information-processing capacity. However, this assumption does not hold for many traders, even the most sophisticated ones. Many economists recognize that investors are better characterized as only boundedly rational (see Shiller(2000), Sims(2001)). Even from casual observation, few traders can pay attention to all sources of information, much less understand their impact on the prices of the assets they trade. Indeed, a large literature in psychology documents the extent to which attention is a precious cognitive resource (see, e.g., Kahneman(1973), Nisbett and Ross(1980), Fiske and Taylor(1991)). A number of papers have explored the implications of limited information-processing capacity for asset prices; I review this literature in Section II. For instance, Merton(1987) develops a static model of multiple stocks in which investors have information about only a limited number of stocks and trade only those. Related models of limited market participation include Brennan(1975) and Allen and Gale(1994). As a result, stocks that are less recognized by investors have a smaller investor base ("neglected" stocks) and trade at a greater discount because of limited risk sharing. More recently, Hong and Stein(1999) develop a dynamic model of a single asset in which information diffuses gradually across the investment public and investors are unable to perform the rational-expectations trick of extracting information from prices. My hypothesis is that the gradual diffusion of information across asset markets leads to cross-asset return predictability. It relies on two key assumptions. The first is that valuable information originating in one asset reaches investors in other markets only with a lag, i.e., news travels slowly across markets. The second is that, because of limited information-processing capacity, many (though not necessarily all) investors may not pay attention to, or be able to extract, the information in the asset prices of markets they do not participate in. Taken together, these two assumptions lead to cross-asset return predictability. The hypothesis appears plausible for a few reasons. To begin with, as pointed out by Merton(1987) and the subsequent literature on segmented markets and limited market participation, few investors trade all assets; put another way, limited participation is a pervasive feature of financial markets. Indeed, even among equity money managers there is specialization along industries, such as sector or market-timing funds.
Some reasons for this limited market participation include tax, regulatory, or liquidity constraints. More plausibly, investors have to specialize because they have their hands full trying to understand the markets they do participate in.
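
The lead-lag evidence described above boils down to a predictive regression of this month's market return on last month's industry return. A minimal sketch with statsmodels on synthetic monthly returns (not the Korean industry portfolios):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
industry = rng.normal(0, 0.05, 120)                   # synthetic monthly industry returns
# Synthetic market that reacts to industry news with a one-month lag.
market = 0.3 * np.roll(industry, 1) + rng.normal(0, 0.05, 120)
market[0] = 0.0

X = sm.add_constant(industry[:-1])                    # industry return at month t
res = sm.OLS(market[1:], X).fit()                     # market return at month t+1
print(res.summary().tables[1])                        # a significant slope = industry leads the market
```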

Impact of Shortly Acquired IPO Firms on ICT Industry Concentration (ICT 산업분야 신생기업의 IPO 이후 인수합병과 산업 집중도에 관한 연구)

  • Chang, YoungBong;Kwon, YoungOk
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.51-69
    • /
    • 2020
  • It is now a stylized fact that a small number of technology firms such as Apple, Alphabet, Microsoft, Amazon, Facebook and a few others have become larger and dominant players in their industries. Coupled with the rise of these leading firms, we have also observed that a large number of young firms have become acquisition targets in their early post-IPO stages. This has resulted in a sharp decline in the number of new entries on public exchanges, although a series of policy reforms have been promulgated to foster competition through an increase in new entries. Given the observed industry trend in recent decades, a number of studies have reported increased concentration in most developed countries. However, it is less understood what caused the increase in industry concentration. In this paper, we uncover the mechanisms by which industries have become concentrated over the last decades by tracing changes in industry concentration associated with firms' status changes in their early IPO stages. To this end, we emphasize the case in which firms are acquired shortly after they go public. In particular, with the transition to digital-based economies, it is imperative for incumbent firms to adapt to and keep pace with new ICT and related intelligent systems. For instance, after acquiring a young firm equipped with AI-based solutions, an incumbent firm may respond better to changes in customer taste and preference by integrating the acquired AI solutions and analytics skills into multiple business processes. Accordingly, it is not unusual for young ICT firms to become attractive acquisition targets. To examine the role of M&As involving young firms in reshaping the level of industry concentration, we identify a firm's status in its early post-IPO stages over sample periods spanning 1990 to 2016 as follows: i) delisted, ii) standalone, or iii) acquired. According to our analysis, firms that have conducted an IPO since the 2000s have been acquired by incumbent firms more quickly than those that did an IPO in previous generations. We also show a greater acquisition rate for IPO firms in the ICT sector than for their counterparts in other sectors. Our results based on multinomial logit models suggest that a large number of IPO firms have been acquired in their early post-IPO lives despite their financial soundness. Specifically, we show that IPO firms are more likely to be acquired than to be delisted due to financial distress in their early IPO stages when they are more profitable, more mature, or less leveraged. IPO firms with venture capital backing have also become acquisition targets more frequently. As a larger number of firms are acquired shortly after their IPO, our results show increased concentration. While providing limited evidence on the impact of large incumbent firms in explaining the change in industry concentration, our results show that the effect of large firms on industry concentration is pronounced in the ICT sector. This result likely captures the current trend in which a few tech giants such as Alphabet, Apple, and Facebook continue to increase their market share. In addition, compared with acquisitions of non-ICT firms, the concentration impact of early-stage IPO firms becomes larger when ICT firms are acquired. Our study makes new contributions: to the best of our knowledge, this is one of only a few studies that link a firm's post-IPO status to associated changes in industry concentration.
Although some studies have addressed concentration issues, their primary focus was on market power or proprietary software. In contrast to earlier studies, we are able to uncover the mechanism by which industries have become concentrated by placing emphasis on M&As involving young IPO firms. Interestingly, the concentration impact of IPO firm acquisitions is magnified when a large incumbent firm is involved as the acquirer. This leads us to infer the underlying reasons why industries have become more concentrated in favor of large firms in recent decades. Overall, our study sheds new light on the literature by providing a plausible explanation for why industries have become concentrated.
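
The multinomial logit analysis mentioned above can be sketched as follows; the covariates mirror the paper's profitability/maturity/leverage/VC-backing discussion, but the data are synthetic and the specification is an assumption, not the authors' exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "profitability": rng.normal(0.05, 0.1, n),     # synthetic firm-level covariates
    "firm_age":      rng.integers(1, 30, n),
    "leverage":      rng.uniform(0, 1, n),
    "vc_backed":     rng.integers(0, 2, n),
})
# Post-IPO status: 0 = delisted, 1 = standalone, 2 = acquired (synthetic labels).
df["status"] = rng.integers(0, 3, n)

X = sm.add_constant(df[["profitability", "firm_age", "leverage", "vc_backed"]])
res = sm.MNLogit(df["status"], X).fit(disp=0)
print(res.summary())    # coefficients are log-odds relative to the base category
```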

A Study of the Reactive Movement Synchronization for Analysis of Group Flow (그룹 몰입도 판단을 위한 움직임 동기화 연구)

  • Ryu, Joon Mo;Park, Seung-Bo;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.79-94
    • /
    • 2013
  • Recently, high-value-added business has been growing steadily in the culture and art area. To generate high value from a performance, audience satisfaction is necessary; flow is a critical factor for satisfaction, so it should be induced in the audience and measured. To evaluate the interest and emotion of the audience toward the contents, producers or investors need an index for measuring flow. But it is neither easy to define flow quantitatively nor to collect the audience's reactions immediately. Previous studies evaluated group flow as the sum of the average value of each person's reactions. The flow, or "good feeling", of each audience member was extracted from the face, especially changes of expression, and from body movement. But it was not easy to handle the large amount of real-time data from each sensor signal, and setting up the experimental devices was difficult in terms of economic and environmental problems, because every participant needed a personal sensor to check their physiological signals and a camera located in front of their head to capture their looks. A simpler system is therefore needed to analyze group flow. This study provides a method for measuring audience flow through group synchronization at the same time and place. To measure the synchronization, we built a real-time processing system using differential images and the Group Emotion Analysis (GEA) system. A differential image was obtained by subtracting the previous camera frame from the present frame, yielding the movement variation of the audience's reaction. We then developed a program, GEA (Group Emotion Analysis), for the flow judgment model. After measuring the audience's reaction, the synchronization is divided into Dynamic State Synchronization and Static State Synchronization. Dynamic State Synchronization accompanies the audience's active reaction, while Static State Synchronization corresponds to the audience remaining nearly still. Dynamic State Synchronization can be caused by the audience's surprised reactions to scary, creepy, or reversal scenes, while Static State Synchronization is triggered by impressive or sad scenes. We therefore showed the audience several short movies containing the various scenes mentioned above, which made them sad, made them clap, gave them the creeps, and so on. To check the movement of the audience, we defined the critical points α and β: Dynamic State Synchronization was meaningful when the movement value was over the critical point β, while Static State Synchronization was effective under the critical point α. β was derived from the clapping movements of 10 teams instead of using the average amount of movement. After checking the reactive movement of the audience, the percentage (%) ratio was calculated by dividing the number of "people having a reaction" by the "total people". In total, 37 teams were formed in "2012 Seoul DMC Culture Open" and took part in the experiments. First, they were induced to clap by the staff; second, a basic scene was shown to neutralize the audience's emotion; third, a flow scene was displayed; fourth, a reversal scene was introduced. Then 24 of the teams were shown amusing and creepy scenes, and the other 10 teams were shown a sad scene. The audience clapped and laughed at the amusing scene while shaking their heads or hiding behind closed eyes, and the sad or touching scene made them silent.
If the results were over about 80%, the group could be judged as synchronized and flow was achieved. As a result, the audience showed similar reactions to similar stimulation at the same time and place. With additional normalization and experiments, we can find the flow factor through synchronization in a much bigger group, and this should be useful for planning contents.
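
A minimal sketch of the differential-image measure described above: each frame is subtracted from its predecessor, and the fraction of changed pixels is compared against the critical points α and β. The video file name and all threshold values are illustrative assumptions.

```python
import cv2
import numpy as np

ALPHA, BETA = 0.02, 0.20        # assumed critical points for static / dynamic sync

def movement_ratio(prev_gray, curr_gray, pixel_thresh=25):
    """Fraction of pixels whose intensity changed between two frames."""
    diff = cv2.absdiff(curr_gray, prev_gray)
    moving = diff > pixel_thresh
    return float(np.count_nonzero(moving)) / moving.size

cap = cv2.VideoCapture("audience.mp4")   # hypothetical recording of the audience
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read video")
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    m = movement_ratio(prev, gray)
    if m > BETA:
        print("dynamic-state synchronization candidate:", m)
    elif m < ALPHA:
        print("static-state synchronization candidate:", m)
    prev = gray
cap.release()
```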

Dynamic Virtual Ontology using Tags with Semantic Relationship on Social-web to Support Effective Search (효율적 자원 탐색을 위한 소셜 웹 태그들을 이용한 동적 가상 온톨로지 생성 연구)

  • Lee, Hyun Jung;Sohn, Mye
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.19-33
    • /
    • 2013
  • In this research, the proposed Dynamic Virtual Ontology using Tags (DyVOT) supports dynamic search of resources depending on the user's requirements, using tags from social-web resources. In general, tags are annotations, a series of descriptive words, given by social users who tag information resources such as web pages, images, YouTube videos, and so on. Tags therefore characterize and mirror information resources, and as metadata they can be matched to resources. Consequently, we can extract semantic relationships between tags owing to the dependency relationships between tags as representatives of resources. However, there is a limitation, because there are allophonic synonyms and homonyms among tags, which are usually expressed as a series of words. Thus, research on folksonomies using tags has been applied to the classification of words by semantic-based allophonic synonymy. In addition, some research focuses on clustering and/or classification of resources by semantic-based relationships among tags. Nevertheless, such research is also limited, because it focuses on semantic-based hyper/hypo relationships or clustering among tags without considering the conceptual associative relationships between the classified or clustered groups, which makes it difficult to search resources effectively according to user requirements. In this research, the proposed DyVOT uses tags and constructs an ontology for effective search. We assume that tags are extracted from user requirements and are used to construct multiple sub-ontologies as combinations of some or all of the tags. In addition, DyVOT constructs an ontology based on hierarchical and associative relationships among tags for effective search of a solution. The ontology is composed of a static ontology and a dynamic ontology. The static ontology defines semantic-based hierarchical hyper/hypo relationships among tags, as in (http://semanticcloud.sandra-siegel.de/), with a tree structure. From the static ontology, DyVOT extracts multiple sub-ontologies using multiple sub-tags constructed from parts of the tags; the sub-ontologies are then built from the hierarchy paths that contain the sub-tags. To create the dynamic ontology, DyVOT must define associative relationships among the multiple sub-ontologies extracted from the hierarchical relationships of the static ontology. An associative relationship is defined by the resources shared between tags that are linked by the sub-ontologies, and the association is measured by the degree of shared resources allocated to the tags of the sub-ontologies. If the association value is larger than a threshold, an associative relationship between the tags is newly created. These associative relationships are used to merge the multiple sub-ontologies and construct a new hierarchy. To construct the dynamic ontology, it is essential to define a new class linking two or more sub-ontologies, generated by merging tags that prove highly associative through their shared resources. The class is then used to generate a new hierarchy with the extracted sub-ontologies: the newly created class belongs to the dynamic ontology and establishes new hyper/hypo hierarchical relationships between the class and the tags linked to the sub-ontologies.
Finally, DyVOT is developed from the newly defined associative relationships extracted from the hierarchical relationships among tags. Resources are matched into DyVOT, which narrows the search boundary and shrinks the search paths. While a static data catalog (Dean and Ghemawat, 2004; 2008) searches resources statically according to user requirements, the proposed DyVOT searches resources dynamically using multiple sub-ontologies with parallel processing. In this light, DyVOT improves the correctness and agility of search and decreases search effort by reducing the search paths.
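
The association step can be illustrated with a small sketch: two tags from different sub-ontologies are linked when their shared-resource overlap exceeds a threshold. The paper specifies only a "degree of shared resources", so the Jaccard-style measure, the tag-to-resource data, and the threshold below are our assumptions.

```python
from itertools import combinations

tag_resources = {                       # hypothetical tag -> resource mapping
    "hotel":   {"r1", "r2", "r3", "r5"},
    "resort":  {"r2", "r3", "r5"},
    "flight":  {"r4"},
}
THRESHOLD = 0.5                         # assumed association cutoff

def association(a, b):
    """Share of resources allocated to both tags, relative to all their resources."""
    shared = tag_resources[a] & tag_resources[b]
    return len(shared) / len(tag_resources[a] | tag_resources[b])

# Associative relationships used to merge sub-ontologies into the dynamic ontology.
edges = [(a, b, round(association(a, b), 2))
         for a, b in combinations(tag_resources, 2)
         if association(a, b) >= THRESHOLD]
print(edges)
```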

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels like social media and SNS create enormous amounts of data. Among all kinds of data, the portion of unstructured data represented as text has increased geometrically. It is difficult to check all of this text data, so it is important to access the data rapidly and grasp the key points of a text. Due to this need for efficient understanding, many studies on text summarization for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms have been proposed lately to generate summaries objectively and effectively, so-called "automatic summarization". However, almost all text summarization methods proposed to date construct summaries focused on the frequency of contents in the original documents. Such summaries are limited in covering small-weight subjects that are mentioned less often in the original text. If a summary includes contents of only the major subjects, bias occurs and information is lost, so that it is hard to ascertain every subject the documents contain. To avoid this bias, it is possible to summarize with a balance between the topics a document has, so that all subjects in the document can be ascertained, but an unbalanced distribution between those subjects still remains. To retain the balance of subjects in a summary, it is necessary to consider the proportion of every subject the documents originally have and to allocate the portions of subjects equally, so that even sentences of minor subjects can be sufficiently included in the summary. In this study, we propose a "subject-balanced" text summarization method that secures balance between all subjects and minimizes the omission of low-frequency subjects. For a subject-balanced summary, we use two summary evaluation metrics, "completeness" and "succinctness". Completeness is the property that a summary should fully include the contents of the original documents, and succinctness means the summary has minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, it is possible to identify highly related terms for every topic, and the subjects of documents can be found from the various topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method, they are called "seed terms". However, these terms are too few to explain each subject sufficiently, so enough terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for word expansion to find terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from those vectors the similarity between all terms can be derived using cosine similarity: the higher the cosine similarity between two terms, the stronger the relationship between them. Terms with high similarity values to the seed terms of each subject are selected, and after filtering these expanded terms, the subject dictionary is finally constructed. The next phase allocates subjects to every sentence of the original documents. To grasp the contents of all sentences, frequency analysis is first conducted with the specific terms that compose the subject dictionaries.
The TF-IDF weight of each subject is calculated after the frequency analysis, which shows how much each sentence explains each subject. However, TF-IDF weights can grow without bound, so the TF-IDF weights of every subject in a sentence are normalized to values between 0 and 1. Each sentence is then allocated to the subject with the maximum TF-IDF weight, so that sentence groups are finally constructed for each subject. The last phase is summary generation. Sen2Vec is used to measure the similarity between subject sentences, forming a similarity matrix. By repeatedly selecting sentences, it is possible to generate a summary that fully covers the contents of the original documents while minimizing duplication within the summary itself. For the evaluation of the proposed method, 50,000 TripAdvisor reviews were used for constructing the subject dictionaries and 23,087 reviews for generating summaries. A comparison between the proposed method's summary and a frequency-based summary was also performed; as a result, it is verified that the summary from the proposed method better retains the balance of all the subjects the documents originally have.
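
The seed-term expansion in the first phase can be sketched with gensim's Word2Vec: neighbors of each seed term are ranked by cosine similarity and kept above a cutoff. The toy corpus, seed terms, and threshold are assumptions; the paper uses TripAdvisor reviews and topic-modeling-derived seeds.

```python
from gensim.models import Word2Vec

corpus = [["room", "bed", "clean", "spacious"],        # toy tokenized review sentences
          ["staff", "friendly", "helpful", "kind"],
          ["breakfast", "buffet", "coffee", "tasty"],
          ["room", "view", "bathroom", "clean"]]
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=7)

seeds = {"rooms": ["room"], "service": ["staff"], "food": ["breakfast"]}
SIM_THRESHOLD = 0.0     # assumed cutoff; a real corpus would support a higher one

# Expand each subject's seed terms with their nearest Word2Vec neighbors.
dictionary = {
    subject: set(seed_terms) |
             {w for s in seed_terms
                for w, sim in model.wv.most_similar(s, topn=3) if sim > SIM_THRESHOLD}
    for subject, seed_terms in seeds.items()
}
print(dictionary)
```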

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.219-239
    • /
    • 2019
  • As it becomes important to provide customized services to individuals, research on personalized recommendation systems is constantly being carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, it has the limitation that recommendations are mostly based on quantitative information such as users' ratings, which lowers accuracy. To solve this problem, many studies have attempted to improve the performance of recommendation systems by using information beyond the quantitative information; good examples are applications of sentiment analysis to customer review text. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments shown in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and reflects it directly in the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. In this study, a sentiment score was calculated through the sentiment analysis technique of text mining, with movie reviews as the data. Based on the data, a domain-specific sentiment dictionary was constructed for the movie reviews using regression analysis: positive/negative dictionaries were constructed with the Lasso regression, Ridge regression, and ElasticNet methods. The accuracy of each constructed sentiment dictionary was verified through a confusion matrix: 70% for the Lasso-based dictionary, 79% for the Ridge-based dictionary, and 83% for ElasticNet (α=0.3). Therefore, in this study, the sentiment score of a review is calculated based on the ElasticNet dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing ratings. The proposed algorithm is evaluated on memory-based user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and the model-based matrix factorization methods SVD and SVD++. The mean absolute error (MAE) and root mean square error (RMSE) are calculated to compare the recommendation system using the combined scores with the system that considers only ratings. For MAE, the results improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++; for RMSE, by 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. As a result, the prediction performance of ratings reflecting the sentiment score proposed in this paper is superior to that of the conventional evaluation method. In other words, collaborative filtering that reflects the sentiment scores of user reviews shows superior accuracy compared with conventional collaborative filtering that considers only the quantitative scores.
We then performed a paired t-test to validate that the proposed model is a better approach and concluded that it is. In this study, to overcome the limitation of previous research that judges a user's sentiment only by the quantitative rating score, the review was quantified and the user's opinion was considered in a more refined way in the recommendation system to improve accuracy. The findings have managerial implications for recommendation system developers, who need to consider both quantitative and qualitative information; the way the combined system is constructed in this paper might be used directly by developers.
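
A minimal sketch of the combination step: a review's sentiment score, computed from a dictionary of term weights (as the ElasticNet regression would produce), is rescaled to the rating range and blended with the star rating before being fed into collaborative filtering. The dictionary weights, clipping range, and blend weight below are assumptions, not the paper's fitted values.

```python
import numpy as np

# Toy sentiment dictionary; a real one would hold ElasticNet-fitted term weights.
sentiment_dict = {"great": 1.2, "clean": 0.8, "dirty": -1.0, "boring": -0.9}

def sentiment_score(review):
    """Sum of dictionary weights over the tokens of the review."""
    return sum(sentiment_dict.get(tok, 0.0) for tok in review.lower().split())

def combined_rating(rating, review, w=0.5, lo=1, hi=5):
    """Blend the star rating with a sentiment score rescaled into [lo, hi]."""
    s = np.clip(sentiment_score(review), -2, 2)       # squash raw dictionary sums
    s_rescaled = lo + (s + 2) / 4 * (hi - lo)         # map [-2, 2] -> [lo, hi]
    return w * rating + (1 - w) * s_rescaled          # new rating fed to the recommender

print(combined_rating(4.0, "great movie with a clean ending"))
```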