• Title/Summary/Keyword: Semantic analysis

Search Result 1,355, Processing Time 0.025 seconds

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo;Yoon, Byungho;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.127-146
    • /
    • 2022
  • Recently, as word embedding has shown excellent performance in various tasks of deep learning-based natural language processing, researches on the advancement and application of word, sentence, and document embedding are being actively conducted. Among them, cross-language transfer, which enables semantic exchange between different languages, is growing simultaneously with the development of embedding models. Academia's interests in vector alignment are growing with the expectation that it can be applied to various embedding-based analysis. In particular, vector alignment is expected to be applied to mapping between specialized domains and generalized domains. In other words, it is expected that it will be possible to map the vocabulary of specialized fields such as R&D, medicine, and law into the space of the pre-trained language model learned with huge volume of general-purpose documents, or provide a clue for mapping vocabulary between mutually different specialized fields. However, since linear-based vector alignment which has been mainly studied in academia basically assumes statistical linearity, it tends to simplify the vector space. This essentially assumes that different types of vector spaces are geometrically similar, which yields a limitation that it causes inevitable distortion in the alignment process. To overcome this limitation, we propose a deep learning-based vector alignment methodology that effectively learns the nonlinearity of data. The proposed methodology consists of sequential learning of a skip-connected autoencoder and a regression model to align the specialized word embedding expressed in each space to the general embedding space. Finally, through the inference of the two trained models, the specialized vocabulary can be aligned in the general space. To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the field of 'health care' among national R&D tasks performed from 2011 to 2020. As a result, it was confirmed that the proposed methodology showed superior performance in terms of cosine similarity compared to the existing linear vector alignment.

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.63-77
    • /
    • 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources as personal home pasges, online digital libraries and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte.syze precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not nessarily appear at the top of the query output order. Also, current search tools can not retrieve the documents related with retrieved document from gigantic amount of documents. The most important problem for lots of current searching systems is to increase the quality of search. It means to provide related documents or decrease the number of unrelated documents as low as possible in the results of search. For this problem, CiteSeer proposed the ACI (Autonomous Citation Indexing) of the articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. For details of this work, references contained in academic articles are used to give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the articleswith the cited works. Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways. Papers can be located independent of language, and words in thetitle, keywords or document. A citation index allows navigation backward in time (the list of cited articles) and forwardin time (which subsequent articles cite the current article?) But CiteSeer can not indexes the links between articles that researchers doesn't make. Because it indexes the links between articles that only researchers make when they cite other articles. Also, CiteSeer is not easy to scalability. Because CiteSeer can not indexes the links between articles that researchers doesn't make. All these problems make us orient for designing more effective search system. This paper shows a method that extracts subject and predicate per each sentence in documents. A document will be changed into the tabular form that extracted predicate checked value of possible subject and object. We make a hierarchical graph of a document using the table and then integrate graphs of documents. The graph of entire documents calculates the area of document as compared with integrated documents. We mark relation among the documents as compared with the area of documents. Also it proposes a method for structural integration of documents that retrieves documents from the graph. It makes that the user can find information easier. We compared the performance of the proposed approaches with lucene search engine using the formulas for ranking. As a result, the F.measure is about 60% and it is better as about 15%.

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.29-44
    • /
    • 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as the product reviews accumulate, it takes a lot of time and effort for consumers to individually check the massive number of product reviews. Moreover, product reviews that are written carelessly actually inconvenience consumers. Thus many online vendors provide mechanisms to identify reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review, and use this feedback information to rank and re-order them. However, many reviews have only a few feedbacks or no feedback at all, thus making it hard to identify their helpfulness. Also, it takes time to accumulate feedbacks, thus the newly authored reviews do not have enough ones. For example, only 20% of the reviews in Amazon Review Dataset (Mcauley and Leskovec, 2013) have more than 5 reviews (Yan et al, 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. In order to do this, we extracted the various linguistic, psychological, and perceptual elements included in product reviews by using text-mining techniques and identifying the determinants among these elements that affect the usability of product reviews. In particular, considering that the characteristics of the product reviews and determinants of usability for apparel products (which are experiential products) and electronic products (which are search goods) can differ, the characteristics of the product reviews were compared within each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. In order to understand a review text, we first extract linguistic and psychological characteristics from review texts such as a word count, the level of emotional tone and analytical thinking embedded in review text using widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). After then, we explore the descriptive statistics of review text for each category and statistically compare their differences using t-test. Lastly, we regression analysis using the data mining software RapidMiner to find out determinant factors. As a result of comparing and analyzing product review characteristics of electronic products and apparel products, it was found that reviewers used more words as well as longer sentences when writing product reviews for electronic products. As for the content characteristics of the product reviews, it was found that these reviews included many analytic words, carried more clout, and related to the cognitive processes (CogProc) more so than the apparel product reviews, in addition to including many words expressing negative emotions (NegEmo). On the other hand, the apparel product reviews included more personal, authentic, positive emotions (PosEmo) and perceptual processes (Percept) compared to the electronic product reviews. Next, we analyzed the determinants toward the usefulness of the product reviews between the two product groups. As a result, it was found that product reviews with high product ratings from reviewers in both product groups that were perceived as being useful contained a larger number of total words, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words in each sentence were perceived to be useful. In the case of electronic product reviews, those that were analytical with a high expertise index, along with containing many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.

A Study on the Consideration of the Locations of Gyeongju Oksan Gugok and Landscape Interpretation - Focusing on the Arbor of Lee, Jung-Eom's "Oksan Gugok" - (경주 옥산구곡(玉山九曲)의 위치비정과 경관해석 연구 - 이정엄의 「옥산구곡가」를 중심으로 -)

  • Peng, Hong-Xu;Kang, Tai-Ho
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.36 no.3
    • /
    • pp.26-36
    • /
    • 2018
  • This study aims to examine the characteristics of landscape through the analysis of location and the landscape of Gugok while also conducting the empirical study through the literature review, field study, and digital analysis of the Okgung Gugok. Oksan Gugok is a set of songs set in Ogsan Creek(玉山川)or Jagyese Creek(紫溪川, 紫玉山), which flows in front of the Oksan Memorial Hall(李彦迪), which is dedicated to the Lee Eong-jeok (李彦迪). We first ascertained the location and configuration of Oksan Gogok. Second, we confirmed the accurate location of Oksan Gogok by utilizing the digital topographic map of Oksan Gogok which was submitted by Google Earth Pro and Geographic Information Center as well as the length of the longitude of the gravel measured by the Trimble Juno SB GPS. Through the study of the literature and the field investigation, The results of the study are as follows. First, Yi Eonjeok was not a direct composer of Oksan Gugok, nor did he produce "Oksan Gugokha(Music)". Lee Ia-sung(李野淳), the ninth Youngest Son of Tweo-Kye, Hwang Lee, visited the "Oksan Gugokha" in the spring of 1823(Sunjo 23), which was the 270th years after the reign of Yi Eonjeok. At this time, receiving the proposal of Ian Sung, Lee Jung-eom(李鼎儼), Lee Jung-gi(李鼎基), and Lee Jung-byeong(李鼎秉), the descendants of Ian Sung set up a song and created Oksan Gugok Music. And the Essay of Oksan Travel Companions writted by Lee Jung-gi turns out being a crucial data to describe the situation when setting up the Ok-San Gugok. Second, In the majority of cases, Gogok Forest is a forest managed by a Confucian Scholar, not run by ordinary people. The creation of "Oksan Bugok Music" can be regarded as an expression of pride that the descendants of Yi Eonjeok and Lee Hwang, and next generation of several Confucian scholars had inherited traditional Neo-Confucian. Third, Lee Jung-eom's "Oksan Donghaengki" contains a detailed description of the "Oksan Gugokha" process and the process of creating a song. Fourth, We examined the location of one to nine Oksan songs again. In particular, eight songs and nine songs were located at irregular intervals, and eight songs were identified as $36^{\circ}01^{\prime}08.60^{{\prime}{\prime}}N$, $129^{\circ}09^{\prime}31.20^{{\prime}{\prime}}E$. Referring to the ancient kingdom of Taojam, the nine-stringed Sainam was unbiased as a lower rock where the two valleys of the East West congregate. The location was estimated at $36^{\circ}01^{\prime}19.79^{{\prime}{\prime}}N$, $129^{\circ}09^{\prime}30.26^{{\prime}{\prime}}E$. Fifth, The landscape elements and landscapes presented in Lee Jung-eom's "Oksan Gugokha" were divided into form, semantic and climatic elements. As a result, Lee Jung-eom's Cho Young-gwan was able to see the ideal of mountain water and the feeling of being idle in nature as well as the sense of freedom. Sixth, After examining the appearance of the elements and the frequency of the appearance of the landscape, 'water' and 'mountain' were the absolute factors that emphasized the original curved environment at the mouth of Lee Jung-eom. Therefore, there was gugokga can gauge the fresh ideas(神仙思想)and retreat ever(隱居思想). This inherent harmony between the landscape as well as through the mulah any ideas that one with nature and meditation, Confucian tube.

An Empirical Study on Motivation Factors and Reward Structure for User's Createve Contents Generation: Focusing on the Mediating Effect of Commitment (창의적인 UCC 제작에 영향을 미치는 동기 및 보상 체계에 대한 연구: 몰입에 매개 효과를 중심으로)

  • Kim, Jin-Woo;Yang, Seung-Hwa;Lim, Seong-Taek;Lee, In-Seong
    • Asia pacific journal of information systems
    • /
    • v.20 no.1
    • /
    • pp.141-170
    • /
    • 2010
  • User created content (UCC) is created and shared by common users on line. From the user's perspective, the increase of UCCs has led to an expansion of alternative means of communications, while from the business perspective UCCs have formed an environment in which an abundant amount of new contents can be produced. Despite outward quantitative growth, however, many aspects of UCCs do not meet the expectations of general users in terms of quality, and this can be observed through pirated contents and user-copied contents. The purpose of this research is to investigate effective methods for fostering production of creative user-generated content. This study proposes two core elements, namely, reward and motivation, which are believed to enhance content creativity as well as the mediating factor and users' committement, which will be effective for bridging the increasing motivation and content creativity. Based on this perspective, this research takes an in-depth look at issues related to constructing the dimensions of reward and motivation in UCC services for creative content product, which are identified in three phases. First, three dimensions of rewards have been proposed: task dimension, social dimension, and organizational dimention. The task dimension rewards are related to the inherent characteristics of a task such as writing blog articles and pasting photos. Four concrete ways of providing task-related rewards in UCC environments are suggested in this study, which include skill variety, task significance, task identity, and autonomy. The social dimensioni rewards are related to the connected relationships among users. The organizational dimension consists of monetary payoff and recognition from others. Second, the two types of motivations are suggested to be affected by the diverse rewards schemes: intrinsic motivation and extrinsic motivation. Intrinsic motivation occurs when people create new UCC contents for its' own sake, whereas extrinsic motivation occurs when people create new contents for other purposes such as fame and money. Third, commitments are suggested to work as important mediating variables between motivation and content creativity. We believe commitments are especially important in online environments because they have been found to exert stronger impacts on the Internet users than other relevant factors do. Two types of commitments are suggested in this study: emotional commitment and continuity commitment. Finally, content creativity is proposed as the final dependent variable in this study. We provide a systematic method to measure the creativity of UCC content based on the prior studies in creativity measurement. The method includes expert evaluation of blog pages posted by the Internet users. In order to test the theoretical model of our study, 133 active blog users were recruited to participate in a group discussion as well as a survey. They were asked to fill out a questionnaire on their commitment, motivation and rewards of creating UCC contents. At the same time, their creativity was measured by independent experts using Torrance Tests of Creative Thinking. Finally, two independent users visited the study participants' blog pages and evaluated their content creativity using the Creative Products Semantic Scale. All the data were compiled and analyzed through structural equation modeling. We first conducted a confirmatory factor analysis to validate the measurement model of our research. It was found that measures used in our study satisfied the requirement of reliability, convergent validity as well as discriminant validity. Given the fact that our measurement model is valid and reliable, we proceeded to conduct a structural model analysis. The results indicated that all the variables in our model had higher than necessary explanatory powers in terms of R-square values. The study results identified several important reward shemes. First of all, skill variety, task importance, task identity, and automony were all found to have significant influences on the intrinsic motivation of creating UCC contents. Also, the relationship with other users was found to have strong influences upon both intrinsic and extrinsic motivation. Finally, the opportunity to get recognition for their UCC work was found to have a significant impact on the extrinsic motivation of UCC users. However, different from our expectation, monetary compensation was found not to have a significant impact on the extrinsic motivation. It was also found that commitment was an important mediating factor in UCC environment between motivation and content creativity. A more fully mediating model was found to have the highest explanation power compared to no-mediation or partially mediated models. This paper ends with implications of the study results. First, from the theoretical perspective this study proposes and empirically validates the commitment as an important mediating factor between motivation and content creativity. This result reflects the characteristics of online environment in which the UCC creation activities occur voluntarily. Second, from the practical perspective this study proposes several concrete reward factors that are germane to the UCC environment, and their effectiveness to the content creativity is estimated. In addition to the quantitive results of relative importance of the reward factrs, this study also proposes concrete ways to provide the rewards in the UCC environment based on the FGI data that are collected after our participants finish asnwering survey questions. Finally, from the methodological perspective, this study suggests and implements a way to measure the UCC content creativity independently from the content generators' creativity, which can be used later by future research on UCC creativity. In sum, this study proposes and validates important reward features and their relations to the motivation, commitment, and the content creativity in UCC environment, which is believed to be one of the most important factors for the success of UCC and Web 2.0. As such, this study can provide significant theoretical as well as practical bases for fostering creativity in UCC contents.