• Title/Summary/Keyword: SNS Use


Context Sharing Framework Based on Time Dependent Metadata for Social News Service (소셜 뉴스를 위한 시간 종속적인 메타데이터 기반의 컨텍스트 공유 프레임워크)

  • Ga, Myung-Hyun;Oh, Kyeong-Jin;Hong, Myung-Duk;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.39-53 / 2013
  • The emergence of internet technology and SNS has increased the flow of information and changed the way people communicate, from one-way to two-way communication. Users not only consume information; they can also create it and share it with their friends across social network services. This has also made social media, including Social TV, one of the most important communication tools. Social TV is a form of viewing in which people watch a TV program and at the same time share information about it or its content with friends through social media. Social News, also known as participatory social media, is gaining popularity. It influences user interest in social issues through the Internet and builds news credibility based on users' reputation. However, conventional news service platforms focus only on news recommendation. Recent developments in SNS have changed this landscape by allowing users to share and disseminate news, yet conventional platforms provide no dedicated way to share it. Currently, Social News Services only allow users to access a news item in its entirety; users cannot access just the part of the content that relates to their interests. For example, when a user is interested in only part of a news item and wants to share that part, doing so is still difficult. In the worst case, recipients might understand the news in a different context. To solve this, a Social News Service must provide a way to supply additional context information. For example, Yovisto, an academic video search service, provides time dependent metadata for its videos. Users can search for and watch part of a video according to the time dependent metadata, and they can also share that content with friends on social media. Yovisto segments and synchronizes a video at each point where the presentation slides change to another page. 
However, this method cannot be applied to news video, because news video is not accompanied by presentation slides. A segmentation method is therefore required to divide the news video and create time dependent metadata. In this paper, a time dependent metadata-based framework is proposed that segments news content and provides time dependent metadata so that users can use context information when communicating with their friends. The transcript of the news is divided using the proposed story segmentation method. We provide a tag to represent the entire content of the news and sub tags to indicate each news segment, including its starting time. The time dependent metadata helps users track the news information and allows them to leave a comment on each segment of the news. Users may also share the news, based on the time metadata, either as a segment or as a whole, which helps recipients understand the shared news. To demonstrate performance, we evaluate the accuracy of both story segmentation and tag generation. We measured the accuracy of story segmentation through semantic similarity and compared it with the benchmark algorithm. Experimental results show that the proposed method outperforms benchmark algorithms in story segmentation accuracy. Sub tag accuracy is the most important part of the proposed framework, because sub tags are what allow a specific news context to be shared with others. To extract more accurate sub tags, we created a stop word list of terms unrelated to the content of the news, such as the names of the anchor or reporter, and applied it to the framework. We analyzed the accuracy of the tags and sub tags that represent the context of the news. The analysis suggests that the proposed framework helps users share their opinions with context information in social media and Social News.
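
The tag and sub tag structure described in this abstract can be pictured as a small data model. The following is only a minimal sketch under stated assumptions: the class names, field names, and the share-link format are hypothetical illustrations and are not taken from the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    start_sec: float  # starting time of the segmented story within the video
    sub_tag: str      # sub tag describing this segment (hypothetical field name)


@dataclass
class NewsItem:
    tag: str                                      # tag representing the entire news item
    segments: list = field(default_factory=list)  # assumed sorted by start_sec

    def segment_at(self, t_sec: float) -> Segment:
        """Return the segment that is playing at time t_sec."""
        starts = [s.start_sec for s in self.segments]
        idx = bisect_right(starts, t_sec) - 1
        return self.segments[max(idx, 0)]

    def share_link(self, base_url: str, t_sec: float) -> str:
        """Build a hypothetical link that opens the video at the segment start."""
        seg = self.segment_at(t_sec)
        return f"{base_url}?t={int(seg.start_sec)}#{seg.sub_tag}"


news = NewsItem("election", [Segment(0, "intro"), Segment(45, "debate"), Segment(120, "polls")])
```

With this layout, a comment or share made at any playback time can be attached to the sub tag of the enclosing segment rather than to the whole news item.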

Importance-Performance Analysis of Operation of Specialized Complexes for Horticultural Production (원예전문생산단지 운영에 대한 중요도-만족도 분석)

  • Hong, Na-Kyoung;Rhee, Zae-Woong;Kim, Tae-Kyun
    • Current Research on Agriculture and Life Sciences / v.33 no.1 / pp.25-31 / 2015
  • This study investigated the operation criteria of specialized complexes for horticultural production reflecting the farmers' preferences. First, the analysis of the communal activity included six factors: the group purchase of consumables for common activity, group purchase of the greenhouse apparatus, cooperative seed raising, use of a common air conditioning and heating system, cooperative shipping, and soil examination and certification system. The results of the Importance-Performance analysis can be summarized as follows. The factors requiring good management included the group purchase of consumables for common activity, group purchase of the greenhouse apparatus, and cooperative shipping. The factors with a lower priority included cooperative seed raising and the use of a common air conditioning and heating system. While the importance of the soil examination and certification system was low, the satisfaction was high, so this factor needs to be managed to avoid overkill. Second, the analysis of information exchange and education included six factors: production technique information, greenhouse facility management information, distribution-related information, production technique education, greenhouse facility management education, and distribution-related education. The results of the Importance-Performance analysis can be summarized as follows. The factor of production technique education was the most important determinant, plus the factors requiring good management included production technique information, greenhouse facility management information, and distribution-related information. The factors with a lower priority included greenhouse facility management education and distribution-related education. Therefore, to enhance productivity through facility modernization, the scaling up and creation of more specialized horticulture complexes are recommended as policy measures to gain export competitiveness. 
As the Korean government is expected to expand the scale of specialized horticulture complexes, the results of this paper can be widely utilized.
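
Importance-Performance Analysis is commonly read as a four-quadrant grid split at the mean importance and mean performance (satisfaction) scores. The sketch below illustrates that conventional reading only; the quadrant labels follow common IPA usage, and the scores in the example are made up for illustration and are not the study's data.

```python
from statistics import mean


def ipa_quadrants(scores):
    """Map {factor: (importance, performance)} to a conventional IPA quadrant label."""
    imp_cut = mean(i for i, _ in scores.values())   # grand mean of importance
    perf_cut = mean(p for _, p in scores.values())  # grand mean of performance
    labels = {}
    for name, (imp, perf) in scores.items():
        if imp >= imp_cut and perf >= perf_cut:
            labels[name] = "keep up the good work"
        elif imp >= imp_cut:
            labels[name] = "concentrate here"
        elif perf >= perf_cut:
            labels[name] = "possible overkill"
        else:
            labels[name] = "low priority"
    return labels


# Hypothetical scores on a 5-point scale, for illustration only.
example = {
    "cooperative shipping": (4.5, 4.2),
    "cooperative seed raising": (3.0, 3.0),
    "soil examination and certification": (3.2, 4.4),
    "hypothetical factor": (4.4, 3.1),
}
result = ipa_quadrants(example)
```

A factor with low importance but high satisfaction lands in the "possible overkill" quadrant, which is the situation the abstract describes for the soil examination and certification system.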

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news and SNS (Social Network Services) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking. 
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.
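
The idea of linking issues across two consecutive periods by similarity can be sketched as follows. This is an assumption-laden illustration: the abstract does not name the similarity measure or any cutoff, so cosine similarity over keyword weights and the threshold of 0.3 are stand-ins chosen here, and the function names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def issue_flow(prev_issues, next_issues, threshold=0.3):
    """Return (previous issue, next issue, similarity) edges at or above the threshold."""
    edges = []
    for prev_name, prev_vec in prev_issues.items():
        for next_name, next_vec in next_issues.items():
            sim = cosine(prev_vec, next_vec)
            if sim >= threshold:  # threshold is an assumed cutoff, not from the paper
                edges.append((prev_name, next_name, round(sim, 3)))
    return edges


period_1 = {"A": {"nuclear": 1.0, "test": 1.0}}
period_2 = {"B": {"nuclear": 1.0, "sanctions": 1.0}, "C": {"families": 1.0}}
edges = issue_flow(period_1, period_2)
```

An issue in the earlier period with no outgoing edge corresponds to one that disappears in the next period, and an issue with several outgoing edges corresponds to one that divides.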

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.143-159 / 2015
  • Predicting IT trends has long been an important subject in information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Towards the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the coming year, and these predictions affect the basic assumptions of IT and industry leaders and organizations about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblog service; through its 'tweets', users report their current thoughts and actions, comment on news, and engage in discussions. For our analysis of IT trends, we chose tweet data because Twitter not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leadership on technology. Previous studies found that tweet data provide useful information and detect societal trends effectively; these studies also found that Twitter can track an issue faster than other media such as newspapers. Therefore, this study investigates how frequently the IT trends predicted for the following year by public organizations are mentioned on social network services such as Twitter. 
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study analyzes Twitter data generated in Seoul, Korea, and compares it with the predictions of the two organizations to analyze the differences. Twitter data analysis requires various natural language processing techniques, including stop word removal and noun extraction, to process unrefined forms of unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends while processing large streaming Twitter datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. As a result, we crawled the entire Twittersphere in the Seoul area and obtained 21,589 tweets from 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; because these topics are non-generalized compound words, they may be mentioned on Twitter using other words. To determine whether the IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-procurement system, which handles the entire procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies for the IT trending topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in project announcements on Nara Market in 2012 and 2013. 
The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year, and ii) researchers can use Twitter to obtain useful ideas for detecting and predicting dynamic trends in technological and social issues.
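
The correlation analysis between topic frequencies on Twitter and in procurement announcements can be illustrated with a plain Pearson coefficient. This is a sketch only: the abstract does not state which correlation statistic was used, so Pearson's r is an assumption, and the frequency counts in the example are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-topic mention counts, for illustration only.
tweet_counts = [120, 45, 300, 10]
announcement_counts = [14, 6, 31, 2]
r = pearson(tweet_counts, announcement_counts)
```

Each position in the two lists stands for the same IT topic, so a value of r close to 1 would mean that topics mentioned often on Twitter also appear often in project announcements.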

The Effects of Game User's Social Capital and Information Privacy Concern on SNG Reuse Intention and Recommendation Intention Through Flow (게임 이용자의 사회자본과 개인정보제공에 대한 우려가 플로우를 통해 SNG 재이용의도와 추천의도에 미치는 영향)

  • Lee, Ji-Hyeon;Kim, Han-Ku
    • Management & Information Systems Review / v.37 no.4 / pp.21-39 / 2018
  • Today, Mobile Instant Message (MIM) has become a commonly used means of communication as smartphone technology has advanced. Among such services, KakaoGame continuously generates large profits by using its flagship Kakao platform. However, even though the number of KakaoGame users is increasing and their characteristics are becoming more diverse, there has been little research on the relationship between the characteristics of SNG users and continued use of the game. Because the social capital that SNG users form with their acquaintances creates a sense of belonging, its role is being emphasized in the social network environment. In addition, game users' concerns about information privacy may decrease trust in a game app and may also pose a threat to the game system. Therefore, this study was designed to examine the structural relationships among SNG users' social capital, information privacy concerns, flow, SNG reuse intention, and recommendation intention. The results of this study are as follows. First, the participants' bridging social capital had a positive effect on the flow of an SNG, but bonding social capital had a negative effect on it. In addition, awareness of information privacy concern had a negative effect on the flow of an SNG, but control of information privacy concern had a positive effect on it. Lastly, the flow of an SNG had a positive effect on both the reuse intention and the recommendation intention for the SNG, and reuse intention had a positive effect on recommendation intention. Based on these results, academic and practical implications can be drawn. 
First, this study focused on KakaoTalk, which has both the closed and open characteristics of an SNS, and found that SNG users' social capital may be a factor that influences each user's behavior through the user's flow experiences in the SNG. Second, this study extends the scope of prior research by empirically analyzing the relationship between SNG users' information privacy concerns and the flow of an SNG. Finally, the results of this research can provide SNG companies with practical guidelines for developing effective marketing strategies that take these factors into account.

A Study on the Residential Environment Preference and Needs of the Multi-academic Young Single Family Based on Life Style (라이프스타일 기반 다학제적 청년층 1인 가구의 주거 환경 선호 및 요구 분석)

  • Lim, Jun Hyung;Choi, In Young;Park, Hey Kyung
    • Korea Science and Art Forum / v.37 no.1 / pp.249-260 / 2019
  • Recently, the proportion of single-person households has been increasing in Korea and is expected to reach 34.6% in 2035. Among single-person households, young single family households face greater difficulties because of high housing prices in Korea. The government is expanding its support for young single family households through various policies such as public rental housing, private rental housing for youth, and youth dormitories. The purpose of this study is to understand the exact housing requirements of young single family households, whose lifestyles differ from those of other age groups, and to provide baseline data for future youth housing planning. The study methods are as follows. First, this research examined the status and characteristics of young single family households through a literature review. Second, by reviewing previous studies on the lifestyle and housing preferences of youth, a tool was composed for investigating young single family households' preferences and needs regarding the housing environment. Third, a survey was conducted on the characteristics of space usage, preferences and needs regarding floor plan composition, and interior design preferences, based on the lifestyle of young single family households. The survey was conducted online via SNS with 150 young single family households aged 20 to 39, including students and office workers, from December 2018 to January 2019. The results are as follows. (1) Regarding space usage characteristics, various activities beyond the basic functions take place in the bedroom and living room of small young single family households, and this needs to be taken into account when planning housing. 
(2) Regarding preferences and needs for floor plan composition, the subjects want a separate bedroom, an expanded living room where various activities take place, and additional storage such as a dressing room in the bedroom. (3) Regarding interior design, the subjects prefer a modern style and achromatic colors as the representative color. They also prefer floor finishing materials commonly used in living spaces and soft, indirect lighting that uses the wall. In addition, interior design preferences differ between students (in their 20s) and office workers (in their 30s) because of their different lifestyles. Further research is needed to propose practical residential environment requirements and plans through case studies of actual public rental housing and a wider range of users.

A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi;Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.149-169 / 2018
  • Korea is currently accumulating a large amount of data in public institutions under the public data open policy and "Government 3.0". A great deal of data is accumulated in the tourism field in particular. However, academic discussion that makes use of tourism data is still limited. Moreover, the data of restaurants, hotels, and online tourism information are not yet widely open, and the use of SNS big data in tourism remains limited. As a result, the utilization of tourism big data analysis is still low. In this paper, we analyze the factors influencing foreign tourists' satisfaction in Korea from numerical data, using data mining techniques and R programming. We sought ways to revitalize the tourism industry by analyzing about 36,000 records from the "Survey on the actual situation of foreign tourists from 2013 to 2015" conducted by the Korea Culture & Tourism Research Institute. To do this, we analyzed the factors that strongly influence the 'Satisfaction', 'Revisit intention', and 'Recommendation' variables of foreign tourists. Furthermore, we analyzed the practical influence of these variables. As the procedure of this study, we first integrated the survey data on foreign tourists collected by the Korea Culture & Tourism Research Institute and stored in the tourist information system from 2013 to 2015, and eliminated variables that were unnecessary or inconsistent with the research purpose. Some variables were modified to improve the accuracy of the analysis. We then analyzed the factors affecting the dependent variables using data mining methods in IBM SPSS Modeler 16.0: decision trees (C5.0, CART, CHAID, QUEST), an artificial neural network, and logistic regression. The seven variables with the greatest effect on each dependent variable were derived. 
As a result of the data analysis, the seven major variables influencing 'overall satisfaction' were found to be sightseeing spot attraction, food satisfaction, accommodation satisfaction, traffic satisfaction, guide service satisfaction, number of visiting places, and country. The variables with the greatest influence were food satisfaction and sightseeing spot attraction. The seven variables with the greatest influence on 'revisit intention' were country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction, and sightseeing spot attraction. The most influential were food satisfaction and travel motivation for Korean style. Lastly, the seven variables with the greatest influence on 'recommendation intention' were country, sightseeing spot attraction, number of visiting places, food satisfaction, activity, tour guide service satisfaction, and cost. Of these, the most influential were country, sightseeing spot attraction, and food satisfaction. In addition, to understand the influence of each independent variable in more depth, we used R programming to quantify it. As a result, food satisfaction and sightseeing spot attraction had a greater effect on overall satisfaction than the other influential variables. For revisit intention, travel motivation with the Korean Wave as its purpose had a higher $\beta$ value than the other variables. A policy will be needed that leads to substantial revisits by enhancing tourist attractions related to the Korean Wave. Lastly, recommendation showed the same result as satisfaction: sightseeing spot attraction and food satisfaction had higher $\beta$ values than the other variables. 
From this analysis, we found that 'food satisfaction' and 'sightseeing spot attraction' were common factors influencing all three dependent variables ('Overall satisfaction', 'Revisit intention', and 'Recommendation'), and that these factors significantly affected satisfaction with travel in Korea. The purpose of this study is to examine how to revitalize foreign tourism in Korea through big data analysis. The results are expected to serve as basic data for analyzing tourism data and establishing effective tourism policy, and as material for a revitalization plan that can contribute to tourism development in Korea.

Structural Properties of Social Network and Diffusion of Product WOM: A Sociocultural Approach (사회적 네트워크 구조특성과 제품구전의 확산: 사회문화적 접근)

  • Yoon, Sung-Joon;Han, Hee-Eun
    • Journal of Distribution Research / v.16 no.1 / pp.141-177 / 2011
  • I. Research Objectives: Most previous studies on diffusion have concentrated on the efficacy of WOM communication using individual-level variables (Iacobucci 1996; Midgley et al. 1992). However, few studies have investigated a network's structural properties as antecedents of WOM from the perspective of consumers' sociocultural propensities. Against this research background, this study attempted to link network structural properties and consumers' WOM behavior on a cross-national basis. The major research objective was to examine the relationship between network properties and WOM by comparing Korean and Chinese consumers. The specific objectives are threefold. First, the study sought to examine whether network properties (i.e., tie strength, centrality, range) affect WOM (WOM intention and quality of WOM). Second, it aimed to explore the moderating effects of cultural orientation (uncertainty avoidance and individualism) on the relationship between network properties and WOM. Third, it sought to substantiate the role of innovativeness as an antecedent of both network properties and WOM. II. Research Hypotheses: Based on the above research objectives, the study put forth the following research hypotheses. 
· H 1-1: The strength of the tie between two counterparts within a network will positively influence WOM effectiveness.
· H 1-2: Network centrality will positively influence WOM effectiveness.
· H 1-3: Network range will positively influence WOM effectiveness.
· H 2-1: The consumer's uncertainty avoidance tendency will moderate the relationship between network properties and WOM effectiveness.
· H 2-2: The consumer's individualism tendency will moderate the relationship between network properties and WOM effectiveness.
· H 3-1: The consumer's innovativeness will positively influence the social network properties.
· H 3-2: The consumer's innovativeness will positively influence WOM effectiveness.
III. Methodology: Through a pilot study and back-translation, two versions of the questionnaire were prepared, one in Korean and the other in Chinese. The Chinese data were collected from Chinese students enrolled in language schools in Suwon, Korea, while the Korean data were collected from students taking classes at a major university in Seoul. A total of 277 questionnaires were used for the analysis of the Korean data and 212 for the Chinese data. Chinese students living in Korea, rather than in China, were selected for two reasons: first, to neutralize differences (i.e., retail channel availability) that may arise from living in separate countries, and second, to minimize differences in communication venues such as internet accessibility and cell phone usability. SPSS 12.0 and AMOS 7.0 were used for the analysis. IV. Results: Prior to hypothesis verification, mean differences between the two countries on the major constructs were tested, with the following results. For network properties (tie strength, centrality, and range), Koreans showed higher scores on all three constructs. 
For cultural orientation traits, Koreans scored higher than Chinese only on uncertainty avoidance. Regarding the first research objective, the relationship between network properties and WOM effectiveness, on the Korean side tie strength (Beta=.116; t=1.785) and centrality (Beta=.499; t=6.776) significantly influenced WOM intention, and a similar finding was obtained on the Chinese side, with tie strength (Beta=.246; t=3.544) and centrality (Beta=.247; t=3.538) being significant. However, with regard to WOM argument quality, the Korean data showed only centrality (Beta=.82; t=7.600) having a significant impact on WOM, whereas the Chinese data showed both tie strength (Beta=.142; t=2.052) and centrality (Beta=.348; t=5.031) being influential. To address the second research objective, the moderating role of cultural orientation, a moderated regression analysis was performed. The results showed that uncertainty avoidance moderated the relationship between network range and WOM intention for both Korea and China. For Korea, however, uncertainty avoidance moderated the relationship between tie strength and WOM quality, while for China it moderated the relationship between network range and WOM intention. Innovativeness moderated the relationship between tie strength and WOM intention for Korea, but between network range and WOM intention for China. Regarding the third research objective, we found that for Korea innovativeness positively influenced centrality only (Beta=.546; t=10.808), while for China it influenced both tie strength (Beta=.203; t=2.998) and centrality (Beta=.518; t=8.782). For both countries alike, innovativeness positively influenced WOM (WOM intention and WOM quality). V. Implications: The study yields two practical implications. 
First, the results suggest that companies targeting multinational customers need to identify segments that are susceptible to positive WOM and WOM information based on individual traits such as uncertainty avoidance and individualism and, on that basis, develop a marketing communication strategy. Second, companies need to divide the market according to Rogers' five innovation stages and, based on this information, implement a marketing strategy that uses social networking tools such as public media and WOM. For instance, innovators and early adopters, if provided with new product information, will be able to capitalize on network advantages and thus add informational value to network operations using SNS or corporate blogs.
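
The moderated regression analysis mentioned in the results is conventionally specified with an interaction term. As a sketch in generic notation (the symbols are assumptions made here, not taken from the paper), let $X$ be a network property, $M$ a cultural orientation trait, and $Y$ a WOM outcome:

```latex
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M) + \varepsilon
```

Under this specification, a moderating effect is indicated when the interaction coefficient $\beta_3$ differs significantly from zero.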

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS have been creating enormous amounts of data. Among all kinds of data, the portion of unstructured data represented as text has increased geometrically. Because it is difficult to check all of this text, it is important to access the data rapidly and grasp the key points of the text. Owing to this need for efficient understanding, many studies on text summarization for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms, called "automatic summarization", have been proposed lately to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct summaries focused on the frequency of contents in the original documents. Such summaries are limited in their ability to contain low-weight subjects that are mentioned less often in the original text. If a summary includes only the contents of the major subjects, bias occurs and information is lost, making it hard to ascertain every subject the documents contain. To avoid this bias, one can summarize with a balance between the topics a document has so that all of its subjects can be ascertained, but an imbalance in the distribution of those subjects still remains. To retain a balance of subjects in the summary, it is necessary to consider the proportion of every subject the documents originally have and also to allocate the portion of subjects equally, so that even sentences on minor subjects are sufficiently included in the summary. In this study, we propose a "subject-balanced" text summarization method that secures a balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summarization, we use two summary evaluation metrics, "completeness" and "succinctness". 
Completeness means that a summary should fully include the contents of the original documents, and succinctness means that a summary has minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, the terms highly related to each topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method they are called "seed terms". However, these terms are too few to explain each subject adequately, so enough terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion: it finds terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and the similarity between all terms is derived from those vectors using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them is taken to be. Terms with high similarity to the seed terms of each subject are therefore selected, and after filtering these expanded terms, the subject dictionary is finally constructed. The next phase allocates subjects to every sentence in the original documents. To grasp the contents of all sentences, frequency analysis is first conducted on the terms that make up the subject dictionaries. The TF-IDF weight of each subject is calculated after the frequency analysis, which shows how much each sentence says about each subject. However, TF-IDF weights are unbounded above, so the weights of every subject for each sentence are normalized to values between 0 and 1.
Each sentence is then allocated to the subject with the maximum TF-IDF weight among all subjects, and a sentence group is finally constructed for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences, and a similarity matrix is formed. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents and minimizes duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews are used to construct the subject dictionaries and 23,087 reviews are used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries is also performed, and the results verify that summaries from the proposed method better retain the balance of all subjects that the documents originally have.
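
The first two phases described in this abstract (expanding seed terms by cosine similarity, then normalizing per-subject weights and allocating each sentence to its highest-weighted subject) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.8 threshold, the toy two-dimensional vectors, and the choice of min-max scaling for the normalization step are all assumptions made for illustration, and the hand-written vectors stand in for real Word2Vec output.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seed_terms(seeds, vectors, threshold=0.8):
    """Return the seeds plus every term whose similarity to some seed >= threshold.

    `vectors` maps term -> word vector; the threshold value is an assumption.
    """
    known_seeds = [s for s in seeds if s in vectors]
    expanded = set(seeds)
    for term, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            expanded.add(term)
    return expanded


def min_max_normalize(scores):
    """Scale scores into [0, 1]; min-max scaling is an assumed normalization."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def allocate_subjects(weights):
    """Assign each sentence to the subject with its highest normalized weight.

    `weights` maps subject -> list of raw weights, one entry per sentence.
    """
    if not weights:
        return []
    normalized = {subj: min_max_normalize(w) for subj, w in weights.items()}
    n_sentences = len(next(iter(normalized.values())))
    return [
        max(normalized, key=lambda subj: normalized[subj][i])
        for i in range(n_sentences)
    ]


# Toy data: hypothetical 2-d "word vectors" and raw per-sentence subject weights.
vectors = {
    "room": [1.0, 0.0],
    "bed": [0.9, 0.1],
    "food": [0.0, 1.0],
    "breakfast": [0.1, 0.9],
}
room_terms = expand_seed_terms(["room"], vectors)
allocation = allocate_subjects({"room": [3.0, 0.0, 1.0], "food": [0.0, 5.0, 4.0]})
print(sorted(room_terms))  # ['bed', 'room']
print(allocation)          # ['room', 'food', 'food']
```

In the toy run, "bed" joins the "room" dictionary because its vector is nearly parallel to the seed's, and the third sentence goes to "food" because its normalized weight there (0.8) exceeds its normalized "room" weight (about 0.33).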

Clustering Method based on Genre Interest for Cold-Start Problem in Movie Recommendation (영화 추천 시스템의 초기 사용자 문제를 위한 장르 선호 기반의 클러스터링 기법)

  • You, Tithrottanak;Rosli, Ahmad Nurzid;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.57-77
    • /
    • 2013
  • Social media has become one of the most popular media in web and mobile applications. In 2011, social networks and blogs were still the top destination of online users, according to a study from the Nielsen Company. In that study, nearly 4 in 5 active users visit social networks and blogs. Social network and blog sites dominate Americans' Internet time, accounting for 23 percent of time spent online. Facebook is the main social network, on which U.S. internet users spend more time than on other services such as Yahoo, Google, AOL Media Network, Twitter, LinkedIn and so on. Recently, most companies promote their products on Facebook by creating a "Facebook Page" that refers to a specific product. The "Like" option allows users to subscribe to the page and receive updates on their interests from it. Film makers, who produce many films around the world, also market and promote their films by exploiting the advantages of the "Facebook Page". In addition, a great number of streaming service providers allow users to subscribe to their services to watch and enjoy movies and TV programs. Users can instantly watch movies and TV programs over the internet on PCs, Macs and TVs. Netflix alone, as the world's leading subscription service, has more than 30 million streaming members in the United States, Latin America, the United Kingdom and the Nordics. As a matter of fact, a million movies and TV programs of different genres are offered to subscribers. On the other hand, users need to spend a lot of time finding the right movies related to the genres they are interested in. In recent years, many researchers have proposed methods to improve the prediction of ratings or preferences, so as to recommend the most related items such as books, music or movies to the target user or to a group of users who share the same interest in particular items.
One of the most popular methods for building a recommendation system is traditional Collaborative Filtering (CF). The method computes the similarity between the target user and other users, who are then clustered by shared interest according to the items they have rated. The method then predicts other items from the same group of users to recommend to them. There are many kinds of items that could be studied for recommendation to users, such as books, music, movies, news, videos and so on; however, in this paper we focus only on movies as the items to recommend. CF also faces many challenges. The first is the "sparsity problem", which occurs when there is not enough information about user preferences; recommendation accuracy is then lower than for neighbors with a large number of ratings. The second is the "cold-start problem", which occurs whenever new users or items are added to the system, each with no ratings or only a few ratings. For instance, no personalized predictions can be made for a new user without any ratings on record. In this research we propose a clustering method based on the users' genre interest extracted from a social network service (SNS) and the users' movie rating information to solve the "cold-start problem." Our proposed method clusters the target user together with other users by combining the users' genre interest and the rating information. A huge amount of interesting and useful user information is available from the Facebook Graph, from which we can extract information about the "Facebook Pages" that users "Like". Moreover, we use the Internet Movie Database (IMDb) as the main dataset. IMDb is an online database that contains a large amount of information related to movies, TV programs and actors.
This dataset is used not only to provide movie information in our Movie Rating System, but also as a resource for the movie genre information extracted from the "Facebook Page". First, users must log in to the Movie Rating System with their Facebook account; at the same time, our system collects their genre interest from the "Facebook Page". We conduct many experiments to see how our method performs and compare it with other methods. First, we compare our proposed method in the case of normal recommendation to see how our system improves the recommendation result. Then we test the method in the case of the cold-start problem. Our experiments show that our method outperforms the other methods: in both cases, our proposed method produces better results.
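
The idea of using genre interest to handle a new user with no ratings can be illustrated with a small sketch. It is a simplification, not the paper's clustering method: instead of forming clusters, it picks the k existing users whose genre-interest vectors are most similar to the new user's and averages their ratings, weighted by similarity. The function name `predict_rating`, the data layout, the value of `k`, and the toy genre vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_rating(target_genres, users, movie, k=2):
    """Predict a rating for a user who has genre interests but no ratings.

    `users` maps name -> {"genres": [...], "ratings": {movie: rating}}.
    Returns the similarity-weighted mean rating of the k most similar users
    who rated `movie`, or None if no similar user has rated it.
    """
    scored = []
    for info in users.values():
        if movie in info["ratings"]:
            sim = cosine(target_genres, info["genres"])
            if sim > 0:
                scored.append((sim, info["ratings"][movie]))
    # Keep the k neighbors with the highest genre similarity.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    if not top:
        return None
    total_sim = sum(sim for sim, _ in top)
    return sum(sim * rating for sim, rating in top) / total_sim


# Toy data: genre vectors are [action, romance] interest counts (hypothetical).
users = {
    "alice": {"genres": [1.0, 0.0], "ratings": {"Movie A": 5.0}},
    "bob": {"genres": [0.0, 1.0], "ratings": {"Movie A": 1.0}},
}
action_fan = predict_rating([1.0, 0.0], users, "Movie A")
mixed_fan = predict_rating([1.0, 1.0], users, "Movie A")
unrated = predict_rating([1.0, 0.0], users, "Movie B")
```

Here a new user interested only in action inherits alice's rating, a user with equal interest in both genres gets the midpoint of the two ratings, and a movie nobody has rated yields `None`. In the setting the abstract describes, the genre vectors would come from liked "Facebook Pages" mapped to IMDb genres.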