• Title/Summary/Keyword: 대량활용 (mass utilization)


Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin; Kwon, Do Young; Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications has provided a very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content expressing their opinions and interests in social media such as blogs, forums, chat rooms, and discussion boards, and this content is released on the Internet in real time. For that reason, many researchers and marketers regard social media content as a source of information for business analytics, and many studies have reported results on mining business intelligence from it. In particular, opinion mining and sentiment analysis, techniques for extracting, classifying, understanding, and assessing the opinions implicit in text, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we found weaknesses in their methods, which are often technically complicated and not sufficiently user-friendly to support business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining on social media content, from the initial data-gathering stage to the final presentation. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media; each target medium requires a different means of access, such as open APIs, search tools, DB-to-DB interfaces, or purchased content. The second phase is pre-processing, which generates useful material for meaningful analysis: if garbage data is not removed, the results of social media analysis will not provide meaningful and useful business insights, so natural language processing techniques should be applied to clean the data. The next step is the opinion mining phase, where the cleansed social media content is analyzed. The qualified data set includes not only user-generated content but also identifying information such as creation date, author name, user ID, content ID, hit counts, review or reply status, favorites, and so on. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool: topic extraction and buzz analysis are usually related to market-trend analysis, while sentiment analysis is used for reputation analysis; there are also applications such as stock prediction, product recommendation, and sales forecasting. The last phase is visualization and presentation of the analysis results. Its major purpose is to explain the results and help users comprehend their meaning, so deliverables from this phase should, to the extent possible, be simple, clear, and easy to understand rather than complex and flashy. To illustrate our approach, we conducted a case study of the leading Korean instant noodle company, NS Food, which holds 66.5% of the market and has kept the No. 1 position in the Korean "ramen" business for several decades. We collected a total of 11,869 pieces of content, including blog posts, forum posts, and news articles. After collecting the social media content, we generated instant-noodle-specific language resources for data manipulation and analysis using natural language processing, and classified the content into more detailed categories such as marketing features, environment, and reputation. In these phases, we used free software such as the tm, KoNLP, ggplot2, and plyr packages from the R project. As a result, we present several useful visualization outputs, including domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, and valence tree maps, as vivid, full-color examples produced with open-source R packages. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. The heat map conveys the movement of sentiment or volume over a category-by-time matrix, with color density indicating intensity in each period. The valence tree map, one of the most comprehensive and holistic visualization models, is especially helpful for analysts and decision makers who need to grasp the "big picture" quickly, since a treemap can present buzz volume and sentiment hierarchically for a given period. This case study offers real-world business insights from market sensing and demonstrates to practically minded business users how such results can support timely decision making in response to ongoing market changes. We believe our approach provides a practical and reliable guide to opinion mining with immediately useful visualized results, not just in the food industry but in other industries as well.
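
As a rough illustration of the four-phase cycle described in this abstract (collecting, qualifying, analyzing, visualizing), here is a minimal sketch in Python. The case study itself used R packages (tm, KoNLP, ggplot2, plyr); the lexicon, posts, and period labels below are hypothetical stand-ins, not data or code from the NS Food study.

```python
# Minimal sketch of the collect -> qualify -> analyze -> visualize cycle.
# The lexicon and posts are hypothetical illustrations.
import re
from collections import defaultdict

# Phase 2 (qualifying): strip noise so garbage data does not pollute results.
def qualify(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^\w\s]", " ", text)       # drop punctuation
    return text.lower()

# Phase 3 (analyzing): score polarity against a small domain lexicon,
# standing in for the domain-specific lexicons built in the case study.
LEXICON = {"delicious": +1, "tasty": +1, "love": +1,
           "bland": -1, "salty": -1, "expensive": -1}

def sentiment(text: str) -> int:
    return sum(LEXICON.get(token, 0) for token in text.split())

# Phase 1 (collecting) is assumed already done: (period, raw text) pairs.
posts = [
    ("2014-01", "This ramen is delicious, love the broth!"),
    ("2014-01", "Too salty and a bit expensive..."),
    ("2014-02", "Tasty, but the noodles were bland."),
]

# Aggregate buzz volume and net sentiment per period -- the numbers that
# a heat map or valence tree map would then color in phase 4.
volume, polarity = defaultdict(int), defaultdict(int)
for period, raw in posts:
    cleaned = qualify(raw)
    volume[period] += 1
    polarity[period] += sentiment(cleaned)

for period in sorted(volume):
    print(f"{period}: volume={volume[period]}, net sentiment={polarity[period]:+d}")
```

Running this prints per-period buzz volume and net sentiment, the same aggregates that the paper's heat maps and valence tree maps make visible at a glance.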

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, increasing demand for big data analysis has driven vigorous development of related technologies and tools, while advances in IT and the growing penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis continue to increase, which means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each party that requests it. However, growing interest in big data analysis has stimulated programming education and the development of many analysis programs, so the entry barriers to big data analysis are gradually falling and the technology is spreading; big data analysis is therefore increasingly expected to be performed by those who need the results themselves. Along with this, interest in various kinds of unstructured data, especially text, is continually increasing. The emergence of new web platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis are utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis; among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling extracts the major issues from a large set of documents, identifies the documents corresponding to each issue, and provides the identified documents as a cluster; it is regarded as very useful in that it reflects the semantic elements of documents. Traditional topic modeling is based on the distribution of key terms across the entire document set, so the whole corpus must be analyzed at once to identify the topic of each document. This makes the analysis slow when topic modeling is applied to many documents, and it creates a scalability problem: processing time increases exponentially with the number of objects analyzed. The problem is particularly noticeable when documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large document set is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method allows topic modeling over a large number of documents with limited system resources, improves processing speed, and can significantly reduce analysis time and cost because documents can be analyzed in each location or place without first being combined. Despite these advantages, however, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire corpus is unclear: local topics can be identified within each unit, but global topics cannot. Second, a method for measuring the accuracy of such an approach must be established; that is, taking the global topics as the ideal answer, the deviation of the local topics from the global topics needs to be measured. Owing to these difficulties, this approach has been studied far less than other lines of topic modeling research. In this paper, we propose a topic modeling approach that solves both problems. First, we divide the entire document cluster (the global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. An additional experiment confirmed that the proposed methodology provides results similar to topic modeling over the entire set, and we also propose a reasonable method for comparing the results of the two approaches.
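
To make the divide-and-conquer idea above concrete, here is a minimal sketch: LDA is run on each local set and on a reduced global set (RGS) of delegate documents, and each local topic is then mapped to its most similar RGS topic by cosine similarity of the topic-word distributions. The toy corpus, the delegation rule, and the choice of gensim are illustrative assumptions; the paper does not prescribe a particular library, and this is not the authors' exact mapping procedure.

```python
# Sketch: local topic modeling on sub-clusters, then mapping local topics
# onto topics of a reduced global set (RGS). Toy data, illustrative only.
import numpy as np
from gensim import corpora, models

def lda_topics(texts, dictionary, k):
    """Fit LDA and return the topic-word matrix, shape (k, vocab_size)."""
    corpus = [dictionary.doc2bow(tokens) for tokens in texts]
    lda = models.LdaModel(corpus, num_topics=k, id2word=dictionary,
                          random_state=0)
    return lda.get_topics()

# Hypothetical local sets; in the paper these are sub-clusters of the
# 24,000 news articles (the global set).
local_sets = [
    [["stock", "market", "trade"], ["market", "price", "stock"]],
    [["game", "team", "score"], ["team", "player", "game"]],
]

# RGS: delegate documents extracted from each local set (here, simply the
# first document of each; the paper's delegation rule may differ).
rgs = [local_set[0] for local_set in local_sets]

# One shared vocabulary so topic-word vectors are comparable across models.
dictionary = corpora.Dictionary(doc for local_set in local_sets
                                for doc in local_set)

global_topics = lda_topics(rgs, dictionary, k=2)
for i, local_set in enumerate(local_sets):
    for j, local_topic in enumerate(lda_topics(local_set, dictionary, k=1)):
        # Cosine similarity of this local topic against every RGS topic.
        sims = (global_topics @ local_topic) / (
            np.linalg.norm(global_topics, axis=1) * np.linalg.norm(local_topic))
        print(f"local set {i}, topic {j} -> RGS topic {int(sims.argmax())}")
```

The shared dictionary is the key design choice here: it keeps every model's topic-word vector in the same vocabulary space, so the cosine comparison between local and RGS topics is meaningful.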

Study of East Asia Climate Change for the Last Glacial Maximum Using Numerical Model (수치모델을 이용한 Last Glacial Maximum의 동아시아 기후변화 연구)

  • Kim, Seong-Joong; Park, Yoo-Min; Lee, Bang-Yong; Choi, Tae-Jin; Yoon, Young-Jun; Suk, Bong-Chool
    • The Korean Journal of Quaternary Research / v.20 no.1 s.26 / pp.51-66 / 2006
  • The climate of the Last Glacial Maximum (LGM) in northeast Asia is simulated with an atmospheric general circulation model, NCAR CCM3, at spectral truncation T170, corresponding to a grid cell size of roughly 75 km. The modern climate is simulated with prescribed sea surface temperatures and sea ice provided by NCAR, together with contemporary atmospheric CO2, topography, and orbital parameters, while the LGM simulation is forced with reconstructed CLIMAP sea surface temperatures, sea-ice distribution, ice-sheet topography, reduced CO2, and LGM orbital parameters. Under LGM conditions, winter surface temperature is markedly reduced, by more than 18°C, over the Korean west sea and the continental margin of the Korean east sea, where ocean was exposed as land in the LGM, whereas in summer these areas are warmer than present by up to 2°C; this contrast is due to the difference in heat capacity between ocean and land. Overall, the LGM surface is cooled by 4~6°C over northeast Asian land and by 7.1°C over the entire area. An analysis of surface heat fluxes shows that the surface cooling is due to the increase in outgoing longwave radiation associated with the reduced CO2 concentration. The reduction in surface temperature leads to a weakening of the hydrological cycle. In winter, precipitation decreases most in southeastern Asia, by about 1~4 mm/day, while in summer a larger reduction is found over China. Overall, annual-mean precipitation decreases by about 50% in the LGM. Evaporation over northeast Asia is also reduced in the LGM, but the reduction in precipitation is larger, ultimately leading to a drier climate. The drier LGM climate simulated in this study is consistent with proxy evidence compiled in other areas. Overall, the high-resolution model captures the regional climate features reasonably well over the global domain.


Clinical Characteristics of Pulmonary Aspergilloma (폐국균종의 임상적 고찰)

  • Kang, Tae-Kyung; Kim, Chang-Ho; Park, Jae-Yong; Jung, Tae-Hoon; Sohn, Jeong-Ho; Lee, Jun-Ho; Han, Seong-Beom; Jeon, Young-Jun; Kim, Ki-Beom; Chung, Jin-Hong; Lee, Kwan-Ho; Lee, Hyun-Woo; Shin, Hyeon-Soo; Lee, Sang-Chae; Kweon, Sam
    • Tuberculosis and Respiratory Diseases / v.44 no.6 / pp.1308-1317 / 1997
  • Background: Pulmonary aspergillomas usually arise from colonization and proliferation of Aspergillus in preexisting cavitary lung disease of any cause. About 15% of patients with tuberculous pulmonary cavities have been found to have aspergilloma. We analyzed the clinical features and course of 91 patients with pulmonary aspergilloma. Method: During the ten-year period from June 1986 to May 1996, 91 patients diagnosed with pulmonary aspergilloma at four university hospitals in Taegu were reviewed. All patients fulfilled one of the following criteria: 1) histologic evidence of aspergilloma within an abnormal air space in tissue sections, or 2) a positive Aspergillus serum precipitin test with the radiologic finding of a fungus ball. The diagnosis was histological in 81 patients (89.0%) and clinical in 10 patients (11.0%). Results: 1) The age range was 22 to 65 years, with an average of 45 years; the male-to-female ratio was 1.7:1 (57 men and 34 women). 2) Hemoptysis was by far the most frequent symptom (89%), followed by cough, dyspnea, weakness, weight loss, fever, and chest pain. 3) In all but 14 cases (15.4%) there were associated conditions; pulmonary tuberculosis was by far the most frequent underlying condition (74.7%), followed by bronchiectasis (6.6%), cavitary neoplasm (2.2%), and pulmonary sequestration (1.1%). 4) The involved area was usually an upper lobe: the right upper lobe in 39 patients (42.9%), the left upper lobe in 31 (34.1%), the left lower lobe in 13 (14.3%), the right lower lobe in 7 (7.7%), and the right middle lobe in 1 (1.1%). 5) On standard chest roentgenograms, the classic "bell-like" image of a fungus ball was found in 62.6% of the subjects; on CT, it was found in 88.1% of the subjects in whom scans were performed. 6) Surgical therapy was undertaken in 76 patients and medical therapy in 15, including 4 patients treated with intracavitary instillation of amphotericin B. 7) The surgical modality was lobectomy in 55 patients (72.4%), segmentectomy in 16 (21.1%), pneumonectomy in 4 (5.3%), and wedge resection in 1 (1.3%). The mortality rate was 3.9% (3 patients): 2 patients died of sepsis and 1 of hemoptysis. Postoperative complications occurred in 6 patients (7.9%), including one patient each with respiratory failure, bleeding, bronchopleural fistula, empyema, and vocal cord paralysis. 8) Among the follow-up cases, two each of the 71 surgically treated patients and the 10 medically treated patients had recurrent hemoptysis. Conclusion: During follow-up of chronic pulmonary disease with an abnormal air space, if standard chest roentgenograms are insufficient to detect a fungus ball, computed tomography and the serum precipitin test are likely to aid the diagnosis of suspected pulmonary aspergilloma. A reasonable recommendation for management is to reserve surgical resection for patients who have had severe, recurrent hemoptysis. A well-controlled cooperative study of medical treatments such as intracavitary antifungal therapy is also needed.
