• Title/Summary/Keyword: social filtering

Search Result 156, Processing Time 0.026 seconds

Clustering Method based on Genre Interest for Cold-Start Problem in Movie Recommendation (영화 추천 시스템의 초기 사용자 문제를 위한 장르 선호 기반의 클러스터링 기법)

  • You, Tithrottanak;Rosli, Ahmad Nurzid;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.57-77
    • /
    • 2013
  • Social media has become one of the most popular media in web and mobile application. In 2011, social networks and blogs are still the top destination of online users, according to a study from Nielsen Company. In their studies, nearly 4 in 5active users visit social network and blog. Social Networks and Blogs sites rule Americans' Internet time, accounting to 23 percent of time spent online. Facebook is the main social network that the U.S internet users spend time more than the other social network services such as Yahoo, Google, AOL Media Network, Twitter, Linked In and so on. In recent trend, most of the companies promote their products in the Facebook by creating the "Facebook Page" that refers to specific product. The "Like" option allows user to subscribed and received updates their interested on from the page. The film makers which produce a lot of films around the world also take part to market and promote their films by exploiting the advantages of using the "Facebook Page". In addition, a great number of streaming service providers allows users to subscribe their service to watch and enjoy movies and TV program. They can instantly watch movies and TV program over the internet to PCs, Macs and TVs. Netflix alone as the world's leading subscription service have more than 30 million streaming members in the United States, Latin America, the United Kingdom and the Nordics. As the matter of facts, a million of movies and TV program with different of genres are offered to the subscriber. In contrast, users need spend a lot time to find the right movies which are related to their interest genre. Recent years there are many researchers who have been propose a method to improve prediction the rating or preference that would give the most related items such as books, music or movies to the garget user or the group of users that have the same interest in the particular items. One of the most popular methods to build recommendation system is traditional Collaborative Filtering (CF). The method compute the similarity of the target user and other users, which then are cluster in the same interest on items according which items that users have been rated. The method then predicts other items from the same group of users to recommend to a group of users. Moreover, There are many items that need to study for suggesting to users such as books, music, movies, news, videos and so on. However, in this paper we only focus on movie as item to recommend to users. In addition, there are many challenges for CF task. Firstly, the "sparsity problem"; it occurs when user information preference is not enough. The recommendation accuracies result is lower compared to the neighbor who composed with a large amount of ratings. The second problem is "cold-start problem"; it occurs whenever new users or items are added into the system, which each has norating or a few rating. For instance, no personalized predictions can be made for a new user without any ratings on the record. In this research we propose a clustering method according to the users' genre interest extracted from social network service (SNS) and user's movies rating information system to solve the "cold-start problem." Our proposed method will clusters the target user together with the other users by combining the user genre interest and the rating information. It is important to realize a huge amount of interesting and useful user's information from Facebook Graph, we can extract information from the "Facebook Page" which "Like" by them. Moreover, we use the Internet Movie Database(IMDb) as the main dataset. The IMDbis online databases that consist of a large amount of information related to movies, TV programs and including actors. This dataset not only used to provide movie information in our Movie Rating Systems, but also as resources to provide movie genre information which extracted from the "Facebook Page". Formerly, the user must login with their Facebook account to login to the Movie Rating System, at the same time our system will collect the genre interest from the "Facebook Page". We conduct many experiments with other methods to see how our method performs and we also compare to the other methods. First, we compared our proposed method in the case of the normal recommendation to see how our system improves the recommendation result. Then we experiment method in case of cold-start problem. Our experiment show that our method is outperform than the other methods. In these two cases of our experimentation, we see that our proposed method produces better result in case both cases.

Multi-day Trip Planning System with Collaborative Recommendation (협업적 추천 기반의 여행 계획 시스템)

  • Aprilia, Priska;Oh, Kyeong-Jin;Hong, Myung-Duk;Ga, Myeong-Hyeon;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.159-185
    • /
    • 2016
  • Planning a multi-day trip is a complex, yet time-consuming task. It usually starts with selecting a list of points of interest (POIs) worth visiting and then arranging them into an itinerary, taking into consideration various constraints and preferences. When choosing POIs to visit, one might ask friends to suggest them, search for information on the Web, or seek advice from travel agents; however, those options have their limitations. First, the knowledge of friends is limited to the places they have visited. Second, the tourism information on the internet may be vast, but at the same time, might cause one to invest a lot of time reading and filtering the information. Lastly, travel agents might be biased towards providers of certain travel products when suggesting itineraries. In recent years, many researchers have tried to deal with the huge amount of tourism information available on the internet. They explored the wisdom of the crowd through overwhelming images shared by people on social media sites. Furthermore, trip planning problems are usually formulated as 'Tourist Trip Design Problems', and are solved using various search algorithms with heuristics. Various recommendation systems with various techniques have been set up to cope with the overwhelming tourism information available on the internet. Prediction models of recommendation systems are typically built using a large dataset. However, sometimes such a dataset is not always available. For other models, especially those that require input from people, human computation has emerged as a powerful and inexpensive approach. This study proposes CYTRIP (Crowdsource Your TRIP), a multi-day trip itinerary planning system that draws on the collective intelligence of contributors in recommending POIs. In order to enable the crowd to collaboratively recommend POIs to users, CYTRIP provides a shared workspace. In the shared workspace, the crowd can recommend as many POIs to as many requesters as they can, and they can also vote on the POIs recommended by other people when they find them interesting. In CYTRIP, anyone can make a contribution by recommending POIs to requesters based on requesters' specified preferences. CYTRIP takes input on the recommended POIs to build a multi-day trip itinerary taking into account the user's preferences, the various time constraints, and the locations. The input then becomes a multi-day trip planning problem that is formulated in Planning Domain Definition Language 3 (PDDL3). A sequence of actions formulated in a domain file is used to achieve the goals in the planning problem, which are the recommended POIs to be visited. The multi-day trip planning problem is a highly constrained problem. Sometimes, it is not feasible to visit all the recommended POIs with the limited resources available, such as the time the user can spend. In order to cope with an unachievable goal that can result in no solution for the other goals, CYTRIP selects a set of feasible POIs prior to the planning process. The planning problem is created for the selected POIs and fed into the planner. The solution returned by the planner is then parsed into a multi-day trip itinerary and displayed to the user on a map. The proposed system is implemented as a web-based application built using PHP on a CodeIgniter Web Framework. In order to evaluate the proposed system, an online experiment was conducted. From the online experiment, results show that with the help of the contributors, CYTRIP can plan and generate a multi-day trip itinerary that is tailored to the users' preferences and bound by their constraints, such as location or time constraints. The contributors also find that CYTRIP is a useful tool for collecting POIs from the crowd and planning a multi-day trip.

A Study on Efficiency of Water Purification of Korean Village Bangjuk[dike] as a Means of Ecological Watershed Management (생태적 유역관리 도구로써 마을방죽의 수질정화 효율성 고찰)

  • An, Byung-Chul
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.30 no.2
    • /
    • pp.90-100
    • /
    • 2012
  • This study centering on 10 village - Bangjuks analyzed multifunctionality value of village Bangjuks which have been main water treatment system in Korean traditional villages. On the basis of understanding the structure and character of components such as the well, common spring, village waterway and others which making water-flow and consisting of aquatic system in Korean traditional village Bangjuk, the conclusion as the instrumental device of social and ecological role and ecological watershed management, securing the ecosystem soundness of the damaged or deteriated aquatic ecosystem due to the industrialization and urbanization is as below; 1. The traditional village Bangjuk was environmentally friendly hydraulic system which gathers waterways of village into a point including sewage water, retains and flows out to village through agricultural waterway. Through this Bangjuk, they have managed sewage and rainfall runoff flowed out village efficiently. It is not only a detention system of water but a kind of eco-friendly system that flow out water into the rivers after reusing and filtering it. 2. Around five traditional villages and five villages after modernization, this study classified the types of village Bangjuk as three types considering geographic location, size, etc; marsh type of low swamp, high water -low rice field type of natural flow stucture, low water - high rice field type requiring artificial irrigation facility. All the five traditional villages were turned out to be marsh type of low swamp. Geoji, Sanjeri, Ma-am, Yangchon of the agricultural villages were high water-low rice filed type, and Sangchoenri village was classified low water-high rice field type. 3. This study checked up the function of water purification of village Bangjuk. In Wonteo and Geji villages affected by discharge of village sewer and domestic sewage, the efficiency of ammonia nitrogen($NH_3-N$) and total phosphorus(T-P) was 56~95%, which was high. In Sangcheonri and Sanjeri villages strongly affected by stall and farmland, the efficiency of suspended solids(SS) was 70~85%, and that of total nitrogen(T-N) and total phosphorus(T-P) was 5.3~65%. 4. A water purification system can be found out in the system of village Bangjuk that filter out village sewage and rainfall runoff flowed through the settle and filter of pollution source and denitrification of plants. Through this system of village Bangjuk, it must be used as the basic facilities for the ecological watershed management. The sewage management system of village Bangjuk as a eco-filter must be used and studied as an eco-friendly facility for the ecological watershed management around the subwatershed and catchment.

What are the Characteristics and Future Directions of Domestic Angel Investment Research? (국내 엔젤투자 연구의 특징과 향후 방향은 무엇인가?)

  • Min Kim;Byung Chul Choi;Woo Jin Lee
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.18 no.6
    • /
    • pp.57-70
    • /
    • 2023
  • The investigation delved into 457 pieces of scholarly work, encompassing articles, published theses, and dissertations from the National Research Foundation of Korea, spanning the period of the 1997 IMF financial crisis up to 2022. The materials were sourced using terms such as 'angel investment', 'angel investor', and 'angel investment attraction'. The initial phase involved filtering out redundant entries from the preliminary collection of 267 works, leaving aside pieces that didn't pertain directly to angel investment as indicated in their abstracts. The next stage of the analysis involved a more rigorous selection process. Out of 43 papers earmarked in the preceding cut, only 32 were chosen. The criteria for this focused on the exclusion of conference presentations, articles that were either not submitted or inconclusive, and those that duplicated content under different titles. The final selection of 32 papers underwent a thorough systematic literature review. These documents, all pertinent to angel investment in South Korea, were scrutinized under five distinct categories: 1) publication year, 2) themes of research, 3) strategies employed in the studies, 4) participants involved in the research, and 5) methods of research utilized. This meticulous process illuminated the existing landscape of angel investment studies within Korea. Moreover, this study pinpointed gaps in the current body of research, offering guidance on future scholarly directions and proposing social scientific theories to further enrich the field of angel investment studies and analysis also seeks to pinpoint which areas require additional exploration to energize the field of angel investment moving forward. Through a comprehensive review of literature, this research intends to validate the establishment of future research trajectories and pinpoint areas that are currently and relatively underexplored in Korea's angel investment research stream. This study revealed that current research on domestic angel investment is concentrated on several areas: 1) the traits of angel investors, 2) the motivations behind angel investing, 3) startup ventures, 4) relevant institutions and policies, and 5) the various forms of angel investments. It was determined that there is a need to broaden the scope of research to aid in enhancing and stimulating the scale of domestic angel investing. This includes research into performance analysis of angel investments and detailed case studies in the field. Furthermore, the study emphasizes the importance of diversifying research efforts. Instead of solely focusing on specific factors like investment types, startups, accelerators, venture capital, and regulatory frameworks, there is a call for research that explores a variety of associated variables. These include aspects related to crowdfunding and return on investment in the context of angel investing, ensuring a more holistic approach to research in this domain. Specifically, there's a clear need for more detailed studies focusing on the relationships with variables that serve as dependent variables influencing the outcomes of angel investments. Moreover, it's essential to invigorate both qualitative and quantitative research that delves into the theoretical framework from multiple perspectives. This involves analyzing the structure of variables that have an impact on angel investments and the decisions surrounding these investments, thereby enriching the theoretical foundation of this field. Finally, we presented the direction of development for future research by confirming that the effect on the completeness of the business plan is high or low depending on the satisfaction of the entrepreneurs in addition to the components.

  • PDF

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels like social media and SNS create enormous amount of data. In all kinds of data, portions of unstructured data which represented as text data has increased geometrically. But there are some difficulties to check all text data, so it is important to access those data rapidly and grasp key points of text. Due to needs of efficient understanding, many studies about text summarization for handling and using tremendous amounts of text data have been proposed. Especially, a lot of summarization methods using machine learning and artificial intelligence algorithms have been proposed lately to generate summary objectively and effectively which called "automatic summarization". However almost text summarization methods proposed up to date construct summary focused on frequency of contents in original documents. Those summaries have a limitation for contain small-weight subjects that mentioned less in original text. If summaries include contents with only major subject, bias occurs and it causes loss of information so that it is hard to ascertain every subject documents have. To avoid those bias, it is possible to summarize in point of balance between topics document have so all subject in document can be ascertained, but still unbalance of distribution between those subjects remains. To retain balance of subjects in summary, it is necessary to consider proportion of every subject documents originally have and also allocate the portion of subjects equally so that even sentences of minor subjects can be included in summary sufficiently. In this study, we propose "subject-balanced" text summarization method that procure balance between all subjects and minimize omission of low-frequency subjects. For subject-balanced summary, we use two concept of summary evaluation metrics "completeness" and "succinctness". Completeness is the feature that summary should include contents of original documents fully and succinctness means summary has minimum duplication with contents in itself. Proposed method has 3-phases for summarization. First phase is constructing subject term dictionaries. Topic modeling is used for calculating topic-term weight which indicates degrees that each terms are related to each topic. From derived weight, it is possible to figure out highly related terms for every topic and subjects of documents can be found from various topic composed similar meaning terms. And then, few terms are selected which represent subject well. In this method, it is called "seed terms". However, those terms are too small to explain each subject enough, so sufficient similar terms with seed terms are needed for well-constructed subject dictionary. Word2Vec is used for word expansion, finds similar terms with seed terms. Word vectors are created after Word2Vec modeling, and from those vectors, similarity between all terms can be derived by using cosine-similarity. Higher cosine similarity between two terms calculated, higher relationship between two terms defined. So terms that have high similarity values with seed terms for each subjects are selected and filtering those expanded terms subject dictionary is finally constructed. Next phase is allocating subjects to every sentences which original documents have. To grasp contents of all sentences first, frequency analysis is conducted with specific terms that subject dictionaries compose. TF-IDF weight of each subjects are calculated after frequency analysis, and it is possible to figure out how much sentences are explaining about each subjects. However, TF-IDF weight has limitation that the weight can be increased infinitely, so by normalizing TF-IDF weights for every subject sentences have, all values are changed to 0 to 1 values. Then allocating subject for every sentences with maximum TF-IDF weight between all subjects, sentence group are constructed for each subjects finally. Last phase is summary generation parts. Sen2Vec is used to figure out similarity between subject-sentences, and similarity matrix can be formed. By repetitive sentences selecting, it is possible to generate summary that include contents of original documents fully and minimize duplication in summary itself. For evaluation of proposed method, 50,000 reviews of TripAdvisor are used for constructing subject dictionaries and 23,087 reviews are used for generating summary. Also comparison between proposed method summary and frequency-based summary is performed and as a result, it is verified that summary from proposed method can retain balance of all subject more which documents originally have.