• Title/Summary/Keyword: Semantic Web Data

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • The growing demand for big data analysis has recently been driving the vigorous development of related technologies and tools. In addition, advances in IT and the rising penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insight through data analysis are steadily increasing. This suggests that big data analysis will become more important across industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request it. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually falling and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and much of the attention is focused on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been used in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique because it reflects the semantic elements of the documents. 
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire set must be analyzed at once to identify the topic of each document. This requirement makes the analysis slow when topic modeling is applied to a large number of documents. It also causes a scalability problem: processing time increases exponentially with the number of documents analyzed. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method allows topic modeling on a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed at each location without being combined. Despite these advantages, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such an approach needs to be established. That is, taking the global topics as the ideal answer, the deviation of the local topics from the global topics needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other work on topic modeling. In this paper, we propose a topic modeling approach that solves these two problems. 
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) that consists of representative documents extracted from each local set. We address the first problem by mapping between RGS topics and local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. An additional experiment confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also proposed a reasonable method for comparing the results of the two approaches.
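
The mapping and accuracy check described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state how RGS topics and local topics are matched, so cosine similarity over topic-term weights is an assumption here, and all topic names, weights, and document assignments are hypothetical toy values rather than data from the paper.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def map_local_to_rgs(local_topics, rgs_topics):
    """Map each local topic to its most similar RGS topic (assumed criterion)."""
    return {
        name: max(rgs_topics, key=lambda g: cosine(dist, rgs_topics[g]))
        for name, dist in local_topics.items()
    }

def agreement(doc_local_topic, doc_global_topic, mapping):
    """Share of documents whose mapped local topic equals their global topic."""
    same = sum(
        1 for doc, local in doc_local_topic.items()
        if mapping[local] == doc_global_topic[doc]
    )
    return same / len(doc_local_topic)

# Hypothetical topic-term weights for the reduced global set (RGS).
rgs_topics = {
    "G1": {"election": 0.5, "vote": 0.3, "party": 0.2},
    "G2": {"stock": 0.5, "market": 0.3, "price": 0.2},
}
# Hypothetical topics derived separately from two local sets.
local_topics = {
    "set1/T1": {"vote": 0.6, "party": 0.3, "poll": 0.1},
    "set1/T2": {"market": 0.5, "price": 0.4, "bank": 0.1},
    "set2/T1": {"stock": 0.7, "market": 0.2, "fund": 0.1},
}
mapping = map_local_to_rgs(local_topics, rgs_topics)

# Hypothetical per-document topic assignments from the local and global runs.
doc_local_topic = {"d1": "set1/T1", "d2": "set1/T2", "d3": "set2/T1", "d4": "set2/T1"}
doc_global_topic = {"d1": "G1", "d2": "G2", "d3": "G2", "d4": "G1"}
score = agreement(doc_local_topic, doc_global_topic, mapping)
print(mapping, score)
```

In this sketch the agreement score plays the role of the accuracy measure: it treats the global result as the reference answer and counts how many documents land in the same topic after mapping.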

Analysis of media trends related to spent nuclear fuel treatment technology using text mining techniques (텍스트마이닝 기법을 활용한 사용후핵연료 건식처리기술 관련 언론 동향 분석)

  • Jeong, Ji-Song;Kim, Ho-Dong
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.33-54 / 2021
  • With the fourth industrial revolution and the arrival of the "New Normal" era brought on by COVID-19, non-contact technologies such as artificial intelligence and big data research have become increasingly important. Convergence research is being conducted in earnest to keep up with these trends, but few studies in the nuclear field have used artificial intelligence and big data technologies such as natural language processing and text mining. This study was conducted to confirm the applicability of data science analysis techniques to nuclear research. Furthermore, identifying trends in the perception of spent nuclear fuel is critical, because it helps set the direction of nuclear industry policy and respond in advance to changes in industrial policy. For these reasons, this study conducted a media trend analysis of pyroprocessing, a spent nuclear fuel treatment technology. We objectively analyze changes in media perception of spent nuclear fuel dry treatment technology by applying text mining techniques. Text data from Naver web news articles containing the keywords "Pyroprocessing" and "Sodium Cooled Reactor" were collected with Python code to identify changes in perception over time. The analysis period was set from 2007, when the first article was published, to 2020, and a detailed, multi-layered analysis of the text data was carried out using methods such as frequency-based word clouds, TF-IDF, and degree centrality calculation. Keyword frequency analysis showed that media perception of spent nuclear fuel dry treatment technology changed in the mid-2010s, influenced by the Gyeongju earthquake in 2016 and the new government's energy transition policy in 2017. 
Trend analysis was therefore conducted for the corresponding time periods, deriving word frequencies, TF-IDF values, degree centrality values, and semantic network graphs. The results show that before the 2010s, media perception of spent nuclear fuel dry treatment technology was diplomatic and positive. However, over time, the frequency of keywords such as "safety", "reexamination", "disposal", and "disassembly" increased, indicating that the sustainability of spent nuclear fuel dry treatment technology is being seriously examined. It was confirmed that social awareness also changed as the standing of spent nuclear fuel dry treatment technology, which had been recognized as a political and diplomatic technology, became ambiguous due to changes in domestic policy. This means that domestic policy changes, such as nuclear power policy, have a greater impact on media perception than issues concerning spent nuclear fuel processing technology itself. This seems to be because nuclear policy is more widely discussed and more familiar to the public than spent nuclear fuel. Therefore, to improve social awareness of spent nuclear fuel processing technology, it would be necessary to provide sufficient information about it, and linking it to nuclear policy issues would also help. In addition, the study highlighted the importance of social science research on nuclear power. Social science needs to be applied broadly in nuclear engineering, and we could confirm that the nuclear industry can be sustainable if national policy changes are taken into account. However, this study is limited in that it applied big data analysis methods only to a narrow research area, pyroprocessing, a spent nuclear fuel dry processing technology. Furthermore, there was no clear basis for the cause of the change in social perception, and only news articles were analyzed to determine social perception. 
Taking these points into account, a future media trend analysis of nuclear power is expected to produce more reliable results that can be used efficiently in nuclear policy research. Recently, the development of non-contact technologies such as artificial intelligence and big data research has been accelerating with the arrival of the New Normal era caused by COVID-19. Convergence research is being conducted in earnest in various fields to follow these trends, but few studies in the nuclear field have used artificial intelligence and big data technologies such as natural language processing and text mining. The academic significance of this study is that it confirmed the applicability of data science analysis technology to nuclear research. Furthermore, as current government energy policies such as nuclear power plant reductions prompt a re-evaluation of spent fuel treatment technology research, keyword analysis in this field can help orient future research. It is important to consider outside views, not just the safety technology and engineering integrity of nuclear power, and to reconsider whether it is appropriate to discuss nuclear engineering technology only internally. In addition, multidisciplinary research on nuclear power can help prepare reasonable alternatives for maintaining the nuclear industry.
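
The TF-IDF and degree centrality measures named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not specify the TF-IDF variant or how the keyword network was built, so length-normalized term frequency with a natural-log inverse document frequency and edges between keywords that co-occur in the same article are both assumed, and the three tokenized "articles" are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical tokenized articles (not data from the study).
docs = [
    ["pyroprocessing", "research", "cooperation"],
    ["pyroprocessing", "safety", "reexamination"],
    ["safety", "disposal", "policy"],
]

def tf_idf(docs):
    """Per-document TF-IDF: (count / doc length) * ln(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores

def degree_centrality(docs):
    """Degree centrality in a keyword co-occurrence network.

    Two keywords are linked if they appear in the same article (an assumed
    construction); centrality is degree divided by (number of nodes - 1).
    """
    neighbors = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n <= 1:
        return {t: 0.0 for t in neighbors}
    return {t: len(nb) / (n - 1) for t, nb in neighbors.items()}

scores = tf_idf(docs)
centrality = degree_centrality(docs)
for term, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {value:.2f}")
```

Terms that appear in fewer articles receive higher TF-IDF weights, while keywords that co-occur with many different keywords receive higher degree centrality, which is why the two measures can highlight different words in the same corpus.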

A User Profile-based Filtering Method for Information Search in Smart TV Environment (스마트 TV 환경에서 정보 검색을 위한 사용자 프로파일 기반 필터링 방법)

  • Sean, Visal;Oh, Kyeong-Jin;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.97-117 / 2012
  • Nowadays, Internet users tend to do several things at the same time, such as web browsing, social networking, and multimedia consumption. When a user watching a video becomes interested in a product, the user has to search for information to learn more about it. With the conventional approach, the user has to search separately using a search engine such as Bing or Google, which can be inconvenient and time-consuming. For this reason, a video annotation platform has been developed to give users more convenient and more interactive ways to engage with video content. In a future smart TV environment, users can follow annotated information, for example a link to a vendor selling the product of interest. It is even better to let users search for information by discussing it directly with friends. Users can get useful and relevant information about the product from friends who share their interests or have experience with it, which is more reliable than search engine results. Social networking services provide an appropriate environment for people to share products, show new things to their friends, and share personal experiences with a specific product. They can also obtain the most relevant information about a product they are interested in through comments or discussion among friends. However, within a very large graph of friends, the conventional approach is limited in determining the most appropriate people to ask about a specific product. When users want to share or discuss a product, they simply share it with all friends as a news feed post. A newly posted article is thus spread blindly to all friends without considering their interests or knowledge. As a result, the number of responses can be huge. 
Users cannot easily pick out the relevant and useful responses, since their friends come from various fields of interest and knowledge. To overcome this limitation, we propose a method for filtering a user's friends for information search that leverages semantic video annotation and social networking services. Our method filters the friends and identifies those who can give the user useful information about a specific product. By examining existing Facebook information about users and their social graph, we construct a user profile of product interest. With the user's permission and authentication, the user's activities are enriched with domain-specific ontologies and data sources such as GoodRelations and BestBuy. We also assume that the object in the video is already annotated using Linked Data, so the detailed information on the product the user wants to ask about is retrieved via the product URI. Our system calculates the similarities between these profiles and the product information to identify the most suitable friends to ask about the product. The system ranks a user's friends by score, which indicates who is most likely to give the user useful information about a specific product of interest. We conducted an experiment with a group of respondents to verify and evaluate our system. First, a user profile accuracy evaluation shows how well the constructed user profile of product interest represents the user's actual interests. Then, the filtering method is evaluated by comparing the ranked results with human judgment. The results show that our method filters effectively and efficiently. Our system meets user needs by helping the user select appropriate friends to ask for useful information about a product the user is curious about. As a result, it helps influence and support the user's purchase decisions.
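
The score-based filtering described in this abstract might look roughly like the following Python sketch. The abstract does not name the similarity measure or the profile representation, so cosine similarity over weighted interest categories, the `min_score` threshold, and all names and weights below are assumptions made purely for illustration.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse category-weight dicts."""
    dot = sum(w * q.get(k, 0.0) for k, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def rank_friends(product, profiles, min_score=0.1):
    """Score each friend's profile against the product, drop weak matches,
    and return (friend, score) pairs ordered from best to worst."""
    scored = [(name, cosine(profile, product)) for name, profile in profiles.items()]
    kept = [(name, score) for name, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical product description, e.g. categories retrieved via a product URI.
product = {"camera": 1.0, "electronics": 0.5}

# Hypothetical product-interest profiles built from each friend's activity.
profiles = {
    "alice": {"camera": 3.0, "lens": 1.0},
    "bob": {"cooking": 2.0, "travel": 1.0},
    "carol": {"electronics": 2.0, "camera": 1.0, "phone": 1.0},
}

ranked = rank_friends(product, profiles)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Friends with no overlapping interests score zero and are filtered out, so the user sees only a short, ordered list of people worth asking.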

An Empirical Study on Motivation Factors and Reward Structure for User's Creative Contents Generation: Focusing on the Mediating Effect of Commitment (창의적인 UCC 제작에 영향을 미치는 동기 및 보상 체계에 대한 연구: 몰입에 매개 효과를 중심으로)

  • Kim, Jin-Woo;Yang, Seung-Hwa;Lim, Seong-Taek;Lee, In-Seong
    • Asia pacific journal of information systems / v.20 no.1 / pp.141-170 / 2010
  • User created content (UCC) is created and shared online by ordinary users. From the user's perspective, the growth of UCC has expanded alternative means of communication, while from the business perspective UCC has formed an environment in which abundant new content can be produced. Despite this outward quantitative growth, however, many aspects of UCC do not meet the expectations of general users in terms of quality, as can be seen in pirated and user-copied content. The purpose of this research is to investigate effective methods for fostering the production of creative user-generated content. This study proposes two core elements, reward and motivation, which are believed to enhance content creativity, as well as a mediating factor, users' commitment, which bridges increased motivation and content creativity. Based on this perspective, this research takes an in-depth look at issues related to constructing the dimensions of reward and motivation in UCC services for creative content production, which are identified in three phases. First, three dimensions of rewards are proposed: the task dimension, the social dimension, and the organizational dimension. Task dimension rewards are related to the inherent characteristics of a task such as writing blog articles and posting photos. Four concrete ways of providing task-related rewards in UCC environments are suggested in this study: skill variety, task significance, task identity, and autonomy. Social dimension rewards are related to the relationships among users. The organizational dimension consists of monetary payoff and recognition from others. Second, two types of motivation are suggested to be affected by the diverse reward schemes: intrinsic motivation and extrinsic motivation. 
Intrinsic motivation occurs when people create new UCC for its own sake, whereas extrinsic motivation occurs when people create new content for other purposes such as fame and money. Third, commitment is suggested to work as an important mediating variable between motivation and content creativity. We believe commitment is especially important in online environments because it has been found to exert a stronger impact on Internet users than other relevant factors. Two types of commitment are suggested in this study: emotional commitment and continuity commitment. Finally, content creativity is proposed as the final dependent variable. We provide a systematic method for measuring the creativity of UCC based on prior studies of creativity measurement. The method includes expert evaluation of blog pages posted by Internet users. To test our theoretical model, 133 active blog users were recruited to participate in a group discussion as well as a survey. They were asked to fill out a questionnaire on their commitment, motivation, and rewards in creating UCC. At the same time, their creativity was measured by independent experts using the Torrance Tests of Creative Thinking. Finally, two independent users visited the participants' blog pages and evaluated their content creativity using the Creative Products Semantic Scale. All the data were compiled and analyzed using structural equation modeling. We first conducted a confirmatory factor analysis to validate the measurement model. The measures used in our study satisfied the requirements of reliability, convergent validity, and discriminant validity. Given that the measurement model was valid and reliable, we proceeded to a structural model analysis. The results indicated that all the variables in our model had more than sufficient explanatory power in terms of R-square values. 
The study results identified several important reward schemes. First, skill variety, task significance, task identity, and autonomy were all found to have a significant influence on the intrinsic motivation to create UCC. The relationship with other users was found to have a strong influence on both intrinsic and extrinsic motivation. Finally, the opportunity to gain recognition for their UCC work was found to have a significant impact on the extrinsic motivation of UCC users. However, contrary to our expectation, monetary compensation was found not to have a significant impact on extrinsic motivation. Commitment was also found to be an important mediating factor between motivation and content creativity in the UCC environment. A more fully mediated model was found to have the highest explanatory power compared with the no-mediation and partially mediated models. This paper ends with implications of the study results. First, from the theoretical perspective, this study proposes and empirically validates commitment as an important mediating factor between motivation and content creativity. This result reflects the characteristics of the online environment, in which UCC creation occurs voluntarily. Second, from the practical perspective, this study proposes several concrete reward factors that are germane to the UCC environment and estimates their effect on content creativity. In addition to the quantitative results on the relative importance of the reward factors, this study also proposes concrete ways to provide the rewards in the UCC environment based on the focus group interview (FGI) data collected after our participants finished answering the survey questions. Finally, from the methodological perspective, this study suggests and implements a way to measure UCC content creativity independently of the content generators' creativity, which can be used in future research on UCC creativity. 
In sum, this study proposes and validates important reward features and their relations to motivation, commitment, and content creativity in the UCC environment, where content creativity is believed to be one of the most important factors for the success of UCC and Web 2.0. As such, this study can provide significant theoretical as well as practical bases for fostering creativity in UCC.