• Title/Summary/Keyword: web based system


Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Growing demand for big data analysis has recently driven vigorous development of related technologies and tools. At the same time, advances in IT and the rising penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis keep increasing. Big data analysis is therefore expected to become more important across industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request it. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading, so analysis is increasingly expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and text data in particular is attracting much attention. The emergence of new web-based platforms and techniques has brought about mass production of text data and active attempts to analyze it, and the results of text analysis have been used in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire set must be analyzed at once to identify the topic of each document. This requirement makes the analysis slow when topic modeling is applied to a large number of documents. It also creates a scalability problem: processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by running topic modeling on each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and it can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because documents can be analyzed at each location without first combining them. Despite these advantages, however, the method has two major problems. First, the relationship between local topics derived from each unit and global topics derived from the entire document set is unclear. In other words, local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established. That is, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other studies on topic modeling. In this paper, we propose a topic modeling approach to solve the above two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. An additional experiment confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also propose a reasonable method for comparing the results of the two approaches.
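
The mapping between local topics and RGS topics described above could be sketched roughly as follows. The abstract does not say how the mapping is computed, so cosine similarity over topic-term weights is an assumption made here for illustration, and the names `cosine`, `map_local_to_global`, and all topic labels and weights are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_local_to_global(local_topics, global_topics):
    """Assign each local topic to its most similar global (RGS) topic.

    Assumption: similarity is cosine over term weights; the paper may use
    a different mapping rule.
    """
    return {
        local_id: max(global_topics,
                      key=lambda gid: cosine(local_vec, global_topics[gid]))
        for local_id, local_vec in local_topics.items()
    }


# Hypothetical topic-term weights from the RGS and from one local set.
rgs_topics = {
    "G_economy": {"stock": 0.5, "market": 0.3, "bank": 0.2},
    "G_sports": {"game": 0.6, "team": 0.3, "score": 0.1},
}
local_topics = {
    "L1_a": {"market": 0.6, "stock": 0.4},
    "L1_b": {"team": 0.5, "game": 0.4, "coach": 0.1},
}

mapping = map_local_to_global(local_topics, rgs_topics)
print(mapping)
```

Once every local topic has a global label, a document's local assignment can be compared with its assignment under full topic modeling, which is the kind of agreement check the abstract describes.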

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.163-176 / 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained in popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users interact with each other while watching television or movies, or visiting a new place. When measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings. Modeling the time-related characteristics of features should therefore be key when measuring TV ratings through microblogs. We show that capturing the time-dependency of features is vital for improving the accuracy of TV ratings measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis.
The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect satisfaction with or response to TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that the correlation of features differs between the periods before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectation for the program or disappointment over not being able to watch it. The highly correlated features before the broadcast are different from the features after the broadcast. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite carrying a negative meaning. Understanding the time-dependency of features can help improve the accuracy of TV ratings measurement. This research provides a basis for estimating the response to or satisfaction with broadcast programs using the time dependency of words in Twitter chatter.
More research is needed to refine the methodology for predicting or measuring TV ratings.
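
The before/after comparison described above can be illustrated with a small sketch. It assumes Pearson correlation as the measure (the abstract only says "correlation"), and the per-episode word counts and ratings are invented for illustration; `pearson` and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


# Hypothetical data: one rating per episode, and how often a single word
# appeared in tweets posted before vs. after each broadcast.
ratings = [10.0, 12.0, 9.0, 15.0]
word_counts = {
    "before": [5, 7, 4, 10],
    "after": [8, 3, 9, 4],
}

# Correlate the word's frequency with ratings separately per time window.
corr = {window: pearson(counts, ratings)
        for window, counts in word_counts.items()}

# The window in which this word is most strongly related to ratings.
best_window = max(corr, key=lambda w: abs(corr[w]))
print(corr, best_window)
```

Repeating this per candidate word and counting how many words peak in each window would give tallies of the kind reported above (145 before, 68 after).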

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.95-110 / 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in an increasing need to collect, store, search, analyze, and visualize this data. This kind of data cannot be handled appropriately by the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature on unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not otherwise have been solved by traditional approaches. One of the most representative attempts using opinion mining may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published in various media is a traditional example of unstructured text data. Every day, a large volume of new content is created, digitized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information.
In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to determine the sentiment polarity of words, of sentences in a document, and of the whole document. However, most traditional approaches share a limitation in that they do not consider the flexibility of sentiment polarity: the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we argue that sentiment polarity should be assigned not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we present an intelligent investment decision-support model based on opinion mining that scrapes and parses massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we apply a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model.
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
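
The pipeline described above (tag sentiment words, classify each article, predict the index direction) could be sketched in miniature as follows. This is only an illustration under stated assumptions: the lexicon entries, the sum-of-polarities scoring, and the majority-vote decision rule are all hypothetical and are not taken from the paper, and `score_article` and `predict_direction` are invented names.

```python
def score_article(tokens, lexicon):
    """Sum the polarity values of the tokens found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokens)


def predict_direction(articles, lexicon):
    """Predict 'up' if positive articles outnumber negative ones, else 'down'.

    Assumption: a simple majority vote over article polarities; ties and
    all-neutral days fall through to 'down' in this sketch.
    """
    scores = [score_article(tokens, lexicon) for tokens in articles]
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    return "up" if positive > negative else "down"


# Hypothetical domain-specific lexicon: word -> polarity in a stock-market
# context (+1 positive, -1 negative).
domain_lexicon = {"surge": 1, "rally": 1, "plunge": -1, "default": -1}

# Hypothetical tokenized economic news for one trading day.
todays_news = [
    ["stocks", "rally", "surge"],
    ["bank", "default"],
    ["exports", "surge"],
]

prediction = predict_direction(todays_news, domain_lexicon)
print(prediction)
```

Swapping `domain_lexicon` for a general-purpose dictionary while keeping the rest unchanged is one way to compare the two kinds of dictionary on the same articles.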

The 1998, 1999 Patterns of Care Study for Breast Irradiation After Breast-Conserving Surgery in Korea (1998, 1999년도 우리나라에서 시행된 유방보존수술 후 방사선치료 현황 조사)

  • Suh Chang-Ok;Shin Hyun Soo;Cho Jae Ho;Park Won;Ahn Seung Do;Shin Kyung Hwan;Chung Eun Ji;Keum Ki Chang;Ha Sung Whan;Ahn Sung Ja;Kim Woo Cheol;Lee Myung Za;Ahn Ki Jung
    • Radiation Oncology Journal / v.22 no.3 / pp.192-199 / 2004
  • Purpose: A nationwide survey was performed to determine the patterns of evaluation and treatment in patients with early breast cancer treated with conservative surgery and radiotherapy, and to improve radiotherapy techniques. Materials and Methods: A web-based database system for the Korean Patterns of Care Study (PCS) for 6 common cancers was developed. Two hundred sixty-one randomly selected records of eligible patients treated in 1998-1999 at 15 hospitals were reviewed. Results: The patients' ages ranged from 24 to 85 years (median 45 years). Infiltrating ductal carcinoma was the most common histologic type (88.9%), followed by medullary carcinoma (4.2%) and infiltrating lobular carcinoma (1.5%). Pathologic T stage by AJCC was T1 in 59.7% of the cases, T2 in 29.5% of the cases, and Tis in 8.8% of the cases. Axillary lymph node dissection was performed in 91.2% of the cases, and 69.7% were node negative. AJCC stage was 0 in 8.8% of the cases, stage I in 44.9% of the cases, stage IIa in 33.3% of the cases, and stage IIb in 8.4% of the cases. Estrogen and progesterone receptors were evaluated in 71.6% and 70.9% of the patients, respectively. The surgical method of breast-conserving surgery was excision/lumpectomy in 37.2%, wide excision in 11.5%, quadrantectomy in 23%, and partial mastectomy in 27.5% of the cases. A pathologically confirmed negative margin was obtained in 90.8% of the cases. The pathological margin was involved with tumor in 10 patients, and the margin was close (less than 2 mm) in 10 patients. All the patients except one received more than 90% of the planned radiotherapy dose. Radiotherapy volume was breast only in 88% of the cases, breast + supraclavicular fossa (SCL) in 5% of the cases, and breast + SCL + posterior axillary boost in 4.2% of the cases. Only one patient received isolated internal mammary lymph node irradiation.
The radiation beam used was Co-60 in 8 cases, 4 MV X-ray in 115 cases, 6 MV X-ray in 125 cases, and 10 MV X-ray in 11 cases. The radiation dose to the whole breast was 45-59.4 Gy (median 50.4 Gy) and the boost dose was 8-20 Gy (median 10 Gy). The total radiation dose delivered was 50.4-70.4 Gy (median 60.4 Gy). Conclusion: There was no major deviation from the current standard in the patterns of evaluation and treatment for patients with early breast cancer treated with breast conservation. Some variation was identified in the boost irradiation dose. A separate analysis of the details of radiotherapy planning will follow, and the outcome of treatment is needed to evaluate the process.

Moderating Effect of Lifestyle on Consumer Behavior of Loungewear with Korean Traditional Fashion Design Elements (소비자대함유한국전통시상설계원소적편복적소비행위지우생활방식적조절작용(消费者对含有韩国传统时尚设计元素的便服的消费行为之于生活方式的调节作用))

  • Ko, Eun-Ju;Lee, Jee-Hyun;Kim, Angella Ji-Young;Burns, Leslie Davis
    • Journal of Global Scholars of Marketing Science / v.20 no.1 / pp.15-26 / 2010
  • With globalization across industries and growing cultural trade among countries, oriental concepts have been attracting the world's attention. In the fashion industry, a designer's own traditional culture is often developed into a fashion theme and becomes a strong strategy for standing out among competitors. Because of the increasing preference for oriental images, opportunities abound to introduce traditional fashion goods and expand culture-based business into global fashion markets. However, global fashion brands that incorporate Korean traditional culture are yet to be developed. To develop a global fashion brand with a Korean flavor, it is very important that native citizens accept their own culture in the domestic apparel market before expansion into foreign markets. Loungewear is considered appropriate for adopting Korean traditional details into clothing, since this wardrobe category serves various purposes, which should lead to natural adoption and widespread use. This market is also seeing increased demand for multipurpose wardrobes and fashionable underwear (Park et al. 2009). Despite rapid growth in the loungewear market, specific studies of loungewear are rare, and research on developing modernized traditional clothing, fashion items, and brands does not always include the loungewear category. Therefore, this study investigated the Korean loungewear market and studied consumer evaluation of loungewear with Korean traditional fashion design elements. Relationships among antecedents of purchase intention for Korean traditional fashion design elements were analyzed and compared between lifestyle groups for consumer targeting purposes.
Product quality, retail service quality, perceived value, and preference for loungewear with Korean traditional design elements were chosen as antecedents of purchase intention, and a structural equation model was designed to examine their relationships as well as their influence on purchase intention. Among the marketing mix elements, product quality and retail service quality were employed as factors affecting preference for and perceived value of loungewear with Korean traditional fashion design elements. The effects of preference and perceived value on purchase intention were examined through the same model. A total of 357 self-administered questionnaires were completed by female consumers via a web survey system. The questionnaire was developed to measure respondents' lifestyle, product and retail service quality as purchasing criteria, perceived value, preference, and purchase intention for loungewear with Korean traditional fashion design elements. Loungewear purchasing and usage behavior were also asked about in order to examine the status of the Korean loungewear market. Data were analyzed through descriptive analysis, factor analysis, cluster analysis, and ANOVA, and the structural equation model was tested with AMOS 7.0. Regarding the status of the Korean loungewear market, loungewear was purchased by most of the consumers in our sample. Loungewear is currently recognized as clothing worn at home, and consumers show comparatively low involvement with it. Most of the consumers in this study purchase loungewear only two to three times a year and spend less than US$10. A total of 12 items and four factors of loungewear consumer lifestyle were found: traditional value-oriented lifestyle, brand-affected lifestyle, pursuit-of-leisure lifestyle, and health-oriented lifestyle. Drawing on these lifestyle factors, loungewear consumers were classified into two groups: Well-being and Conservative.
Relationships among constructs of purchasing behavior related to loungewear with Korean traditional fashion design elements were estimated. Preference for and perceived value of loungewear were affected by both product quality and retail service quality. This study showed that high product and retail service quality develop positive preference toward loungewear. Perceived value of and preference for loungewear positively influenced purchase intention. The results indicated that high preference for and perceived value of loungewear with Korean traditional fashion design elements strengthen purchase intention, and demonstrated the importance of developing preference and elevating perceived value in order to make sales. In a model comparison between the two lifestyle groups, Well-being and Conservative, results showed that product quality and retail service quality had positive influences on both preference and perceived value in the Well-being group. For the Conservative group, however, only retail service quality had a positive effect on preference and, through it, on purchase intention. Since the Well-being group showed more significant influences on purchase intention, loungewear brands with Korean traditional fashion design elements may want to focus on the characteristics of the Well-being group. However, the relationship between preference and purchase intention was stronger in the Conservative group, so these brands should focus on creating positive preference toward loungewear among conservative consumers. The results offer information on Korean loungewear consumers' lifestyles and provide useful information for fashion brands that are planning to enter the Korean loungewear market, particularly those targeting female consumers similar to the sample of the present study.
This study offers strategic and marketing insight for loungewear brands and for fashion brands that are planning to create high value-added fashion brands with Korean traditional fashion design elements. Considering the different types of lifestyle groups associated with loungewear or traditional fashion goods, brand managers and marketers can use the results of this paper as a reference for building positioning, targeting, and marketing strategies.

Analysis of media trends related to spent nuclear fuel treatment technology using text mining techniques (텍스트마이닝 기법을 활용한 사용후핵연료 건식처리기술 관련 언론 동향 분석)

  • Jeong, Ji-Song;Kim, Ho-Dong
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.33-54 / 2021
  • With the fourth industrial revolution and the arrival of the New Normal era brought on by COVID-19, the importance of non-contact technologies such as artificial intelligence and big data research has been increasing. Convergent research is being conducted in earnest to keep up with these trends, but few studies in the area of nuclear research have used artificial intelligence and big data-related technologies such as natural language processing and text mining analysis. This study was conducted to confirm the applicability of data science analysis techniques to the field of nuclear research. Furthermore, identifying trends in the perception of spent nuclear fuel is critical for determining the direction of nuclear industry policies and responding in advance to changes in industrial policy. For those reasons, this study conducted a media trend analysis of pyroprocessing, a spent nuclear fuel treatment technology. We objectively analyze changes in media perception of spent nuclear fuel dry treatment techniques by applying text mining analysis techniques. Text data from Naver web news articles containing the keywords "Pyroprocessing" and "Sodium Cooled Reactor" were collected with Python code to identify changes in perception over time. The analysis period was set from 2007, when the first article was published, to 2020, and detailed, multi-layered analysis of the text data was carried out using methods such as word clouds based on frequency analysis, TF-IDF, and degree centrality calculation. Analysis of keyword frequency showed that media perception of spent nuclear fuel dry treatment technology changed in the mid-2010s, influenced by the Gyeongju earthquake in 2016 and the implementation of the new government's energy conversion policy in 2017.
Therefore, trend analysis was conducted based on the corresponding time period, and word frequency analysis, TF-IDF, degree centrality values, and semantic network graphs were derived. The analysis shows that before the 2010s, media perception of spent nuclear fuel dry treatment technology was diplomatic and positive. Over time, however, the frequency of keywords such as "safety", "reexamination", "disposal", and "disassembly" has increased, indicating that the sustainability of spent nuclear fuel dry treatment technology is being seriously considered. It was confirmed that social awareness also changed as spent nuclear fuel dry treatment technology, which had been recognized as a political and diplomatic technology, became ambiguous due to changes in domestic policy. This means that domestic policy changes such as nuclear power policy have a greater impact on media perceptions than issues of "spent nuclear fuel processing technology" itself. This seems to be because nuclear policy is a more widely discussed and publicly accessible topic than spent nuclear fuel. Therefore, to improve social awareness of spent nuclear fuel processing technology, it would be necessary to provide sufficient information about it, and linking it to nuclear policy issues would also be a good idea. In addition, the study highlighted the importance of social science research on nuclear power. It is necessary to apply the social sciences widely to the nuclear engineering sector, and, considering national policy changes, we could confirm that the nuclear industry would be sustainable. However, this study is limited in that it applied big data analysis methods only to a narrow research area, namely "Pyroprocessing," a spent nuclear fuel dry processing technology. Furthermore, there was no clear basis for the cause of the change in social perception, and only news articles were analyzed to determine social perception.
With comments taken into consideration in the future, it is expected that more reliable results will be produced and efficiently used in the field of nuclear policy research if a media trend analysis study on nuclear power is conducted. Recently, the development of non-contact technologies such as artificial intelligence and big data research has been accelerating in the wake of the New Normal era brought on by COVID-19. Convergence research is being conducted in earnest in various fields to follow these trends, but few studies in the nuclear field have used artificial intelligence and big data-related technologies such as natural language processing and text mining analysis. The academic significance of this study is that it confirmed the applicability of data science analysis technology in the field of nuclear research. Furthermore, because current government energy policies such as nuclear power plant reductions have prompted a re-evaluation of spent fuel treatment technology research, keyword analysis in the field can contribute to the direction of future research. It is important to consider the views of those outside the field, not just the safety technology and engineering integrity of nuclear power, and to reconsider whether it is appropriate to discuss nuclear engineering technology only internally. In addition, if multidisciplinary research on nuclear power is carried out, reasonable alternatives can be prepared to maintain the nuclear industry.
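
As a rough illustration of two of the measures named above, TF-IDF and degree centrality, the following sketch computes both on a tiny tokenized corpus. The exact variants used in the study are not stated, so the formulas here are assumptions: TF-IDF as relative term frequency times log(N/df), and degree centrality as degree divided by (n - 1) on a keyword co-occurrence graph. The documents and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Per-document TF-IDF weights: (count / doc length) * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def degree_centrality(docs):
    """Degree centrality on a graph linking terms that share a document."""
    neighbors = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n_nodes = len(neighbors)
    if n_nodes < 2:
        return {}
    # Normalize by the maximum possible degree, n - 1.
    return {term: len(adj) / (n_nodes - 1) for term, adj in neighbors.items()}


# Hypothetical tokenized news articles.
articles = [
    ["pyroprocessing", "safety", "policy"],
    ["pyroprocessing", "disposal"],
    ["policy", "safety"],
]

weights = tf_idf(articles)
centrality = degree_centrality(articles)
print(weights[1])
print(centrality)
```

In this toy corpus "disposal" gets a higher TF-IDF weight than "pyroprocessing" in the second article because it appears in fewer articles, while "pyroprocessing" has the highest degree centrality because it co-occurs with every other keyword.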