• Title/Summary/Keyword: Open Data (오픈 데이터)


Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications has provided a user-friendly means for consumers and companies to communicate with each other. Users routinely publish content expressing their opinions and interests in social media such as blogs, forums, chat rooms, and discussion boards, and that content is released in real time on the Internet. For that reason, many researchers and marketers regard social media content as a source of information for business analytics, and many studies have reported results on mining business intelligence from it. In particular, opinion mining and sentiment analysis, techniques for extracting, classifying, understanding, and assessing the opinions implicit in text, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we found weaknesses in their methods, which are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conducting opinion mining with visual deliverables. First, we described the entire cycle of practical opinion mining on social media content, from the initial data-gathering stage to the final presentation session. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media; each target medium requires a different means of access, such as open APIs, search tools, DB-to-DB interfaces, or purchased content. The second phase is pre-processing to generate useful material for meaningful analysis: if garbage data is not removed, the results of social media analysis will not provide meaningful and useful business insights, so natural language processing techniques should be applied to clean the data. The next step is the opinion mining phase, where the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user ID, content ID, hit counts, review or reply status, favorites, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool: topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is used for reputation analysis. There are also various applications, such as stock prediction, product recommendation, and sales forecasting. The last phase is visualization and presentation of the analysis results. The major purpose of this phase is to explain the results and help users comprehend their meaning; therefore, to the extent possible, deliverables from this phase should be simple, clear, and easy to understand rather than complex and flashy. To illustrate our approach, we conducted a case study on NS Food, the leading Korean instant noodle company with a 66.5% market share; the firm has held the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blog posts, forum posts, and news articles. After collecting the social media content data, we generated instant-noodle-business-specific language resources for data manipulation and analysis using natural language processing. In addition, we classified the content into more detailed categories such as marketing features, environment, and reputation. In this phase, we used free software such as the tm, KoNLP, ggplot2, and plyr packages from the R project. As a result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other visualized images, providing vivid, full-color examples built with open-library software packages of the R project. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. The heat map explains the movement of sentiment or volume over a category-by-time matrix, with color density showing intensity across time periods. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation through its hierarchical structure, since a tree map can present buzz volume and sentiment in a single visualized result for a given period. This case study offers real-world business insights from market sensing and demonstrates to practical-minded business users how they can use these kinds of results for timely decision making in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
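
The four-phase pipeline described above can be sketched compactly. Below is a minimal, illustrative Python sketch (not the authors' R-based implementation, which used tm and KoNLP) of the qualifying and analyzing phases: it cleans raw posts, scores sentiment against a tiny domain lexicon, and aggregates buzz volume and net sentiment by month, the same quantities the paper later visualizes. The lexicon entries and sample posts are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical domain lexicon; a real study would build one per domain.
LEXICON = {"delicious": 1, "tasty": 1, "love": 1, "bland": -1, "salty": -1, "awful": -2}

def clean(text):
    """Qualifying phase: strip URLs and non-letter characters, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text.lower())

def score(text):
    """Analyzing phase: lexicon-based sentiment polarity of one post."""
    return sum(LEXICON.get(tok, 0) for tok in clean(text).split())

posts = [  # (creation date, user-generated content) -- hypothetical samples
    ("2014-01-12", "I love this ramen, so tasty!"),
    ("2014-01-30", "Honestly a bit bland and salty."),
    ("2014-02-03", "Still delicious after all these years."),
]

volume, sentiment = defaultdict(int), defaultdict(int)
for date, text in posts:
    month = datetime.strptime(date, "%Y-%m-%d").strftime("%Y-%m")
    volume[month] += 1               # buzz volume per period
    sentiment[month] += score(text)  # net sentiment per period

for month in sorted(volume):  # these series feed the visualizing phase
    print(month, "volume:", volume[month], "sentiment:", sentiment[month])
```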

Validation of Surface Reflectance Product of KOMPSAT-3A Image Data: Application of RadCalNet Baotou (BTCN) Data (다목적실용위성 3A 영상 자료의 지표 반사도 성과 검증: RadCalNet Baotou(BTCN) 자료 적용 사례)

  • Kim, Kwangseob;Lee, Kiwon
    • Korean Journal of Remote Sensing / v.36 no.6_2 / pp.1509-1521 / 2020
  • Experiments to validate the surface reflectance produced from Korea Multi-Purpose Satellite (KOMPSAT-3A) imagery were conducted using Chinese Baotou (BTCN) data, one of the four sites of the Radiometric Calibration Network (RadCalNet), a portal that provides spectrophotometric reflectance measurements. The atmospheric reflectance and surface reflectance products were generated using an extension program of the open-source Orfeo ToolBox (OTB), which was redesigned and implemented to extract those reflectance products in batches. Three image data sets from 2016, 2017, and 2018 were processed with two versions of the sensor model, ver. 1.4 released in 2017 and ver. 1.5 released in 2019, which differ in variables such as the gain and offset applied to the absolute atmospheric correction. Applying these sensor model variables showed that the reflectance products from ver. 1.4 matched the RadCalNet BTCN data relatively well compared to those from ver. 1.5. In addition, surface reflectance products obtained from Landsat-8 imagery with the USGS LaSRC algorithm and from Sentinel-2B imagery with the SNAP Sen2Cor program were used to quantitatively verify the differences from those of KOMPSAT-3A. Relative to the RadCalNet BTCN data, the differences in KOMPSAT-3A surface reflectance were highly consistent: -0.031 to 0.034 in the B band, -0.001 to 0.055 in the G band, -0.072 to 0.037 in the R band, and -0.060 to 0.022 in the NIR band. The surface reflectance of KOMPSAT-3A thus reached an accuracy level adequate for further applications, compared to that of the Landsat-8 and Sentinel-2B images. These results are meaningful in confirming the applicability of surface reflectance from high-resolution satellites as Analysis Ready Data (ARD).
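
As a hedged illustration of the role the sensor-model gain and offset play in absolute calibration, the sketch below converts digital numbers to top-of-atmosphere (TOA) reflectance using the standard radiance and reflectance formulas. The coefficient values are hypothetical placeholders, not the published KOMPSAT-3A ver. 1.4 or ver. 1.5 calibration values, and the subsequent atmospheric correction to surface reflectance (the step validated against RadCalNet) is not shown.

```python
import math

def toa_reflectance(dn, gain, offset, esun, d_au, sun_elev_deg):
    """Standard two-step absolute calibration:
    1) DN -> at-sensor spectral radiance via the sensor model (gain/offset);
    2) radiance -> top-of-atmosphere reflectance."""
    radiance = gain * dn + offset  # W/(m^2 sr um)
    sun_zenith = math.radians(90.0 - sun_elev_deg)
    return (math.pi * radiance * d_au ** 2) / (esun * math.cos(sun_zenith))

# Hypothetical coefficients for a single band; real values come from the
# KOMPSAT-3A sensor model (ver. 1.4 and ver. 1.5 differ exactly here).
rho = toa_reflectance(dn=812, gain=0.02, offset=-1.5,
                      esun=1850.0, d_au=0.9983, sun_elev_deg=42.7)
print(f"TOA reflectance: {rho:.4f}")
# Ground measurements such as RadCalNet BTCN then allow validating the
# surface reflectance obtained after atmospheric correction of rho.
```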

GIS-based Market Analysis and Sales Management System : The Case of a Telecommunication Company (시장분석 및 영업관리 역량 강화를 위한 통신사의 GIS 적용 사례)

  • Chang, Nam-Sik
    • Journal of Intelligence and Information Systems / v.17 no.2 / pp.61-75 / 2011
  • A Geographic Information System (GIS) is a system that captures, stores, analyzes, manages, and presents data with reference to geographic location. In the late 1990s and early 2000s its use was largely limited to government sectors such as public utility management, urban planning, landscape architecture, and environmental contamination control. However, a growing number of open-source packages running on a range of operating systems has enabled many private enterprises to explore the concept of viewing GIS-based sales and customer data on their own computer monitors. K telecommunication company has dominated the Korean telecommunication market by providing diverse services such as high-speed internet, PSTN (Public Switched Telephone Network), VoIP (Voice over Internet Protocol), and IPTV (Internet Protocol Television). Even though the telecommunication market in Korea is huge, competition between the major service providers is fiercer than ever before. Service providers struggle to acquire as many new customers as possible, attempt to cross-sell more products to their regular customers, and make greater efforts to retain their best customers by offering unprecedented benefits. Most service providers, including K telecommunication company, have tried to adopt the concept of customer relationship management (CRM) and to analyze customers' demographic and transactional data statistically in order to understand customer behavior. However, customer information management remained at a basic level, and the quality and quantity of customer data were insufficient both for understanding customers and for designing marketing and sales strategies. For example, the 3,074 legal regional divisions then in use, originally defined by the government, were too broad to calculate sub-regional service subscription and cancellation ratios. Additional external data, such as house size, house price, and household demographics, were also needed to measure sales potential. Furthermore, producing tables and reports was time-consuming, and they were insufficient for making a clear judgment about the market situation. In 2009 the company needed a dramatic shift in the way it conducted marketing and sales activities, and it finally developed a dedicated GIS-based market analysis and sales management system. This system brought a huge improvement in the efficiency with which the company could manage and organize all customer- and sales-related information and access that information easily and visually. After the GIS information system was developed and applied to marketing and sales activities at the corporate level, the company reportedly increased its sales and market share substantially. This was because, by analyzing past market and sales initiatives, estimating sales potential, and targeting key markets, the system could make suggestions and enable the company to focus its resources on the demographics most likely to respond to a promotion. This paper reviews the subjective and unclear marketing and sales activities that K telecommunication company previously operated and introduces the whole process of developing the GIS information system. The process consists of the following five modules: (1) customer profile cleansing and standardization, (2) internal/external DB enrichment, (3) segmentation of the 3,074 legal regions into 46,590 sub-regions called blocks, (4) GIS data mart design, and (5) GIS system construction. The objective of this case study is to emphasize the need for GIS systems in private enterprises, and to show how they work, by reviewing the development process of K company's market analysis and sales management system. We hope this paper offers a valuable guideline to companies considering introducing or constructing a GIS information system.
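
The segmentation module, splitting coarse legal regions into finer blocks so that subscription and cancellation ratios become meaningful, can be illustrated with a minimal sketch. The grid-based blocking below is an assumption made for illustration; the paper does not specify how K company's 46,590 blocks were actually delineated.

```python
from collections import defaultdict

BLOCK_SIZE = 0.01  # degrees; hypothetical block granularity

def block_id(lat, lon):
    """Assign a coordinate to a grid block (a stand-in for the paper's
    sub-region 'blocks' carved out of the 3,074 legal regions)."""
    return (round(lat // BLOCK_SIZE), round(lon // BLOCK_SIZE))

# Hypothetical customer records: (lat, lon, status)
customers = [
    (37.5665, 126.9780, "subscribed"),
    (37.5667, 126.9782, "cancelled"),
    (37.5801, 126.9900, "subscribed"),
]

stats = defaultdict(lambda: {"subscribed": 0, "cancelled": 0})
for lat, lon, status in customers:
    stats[block_id(lat, lon)][status] += 1

# Block-level churn ratio: the kind of sub-regional figure that the
# 3,074 legal divisions were too coarse to provide.
for block, s in stats.items():
    total = s["subscribed"] + s["cancelled"]
    print(block, "cancellation ratio:", s["cancelled"] / total)
```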

Evaluation of Setup Uncertainty on the CTV Dose and Setup Margin Using Monte Carlo Simulation (몬테칼로 전산모사를 이용한 셋업오차가 임상표적체적에 전달되는 선량과 셋업마진에 대하여 미치는 영향 평가)

  • Cho, Il-Sung;Kwark, Jung-Won;Cho, Byung-Chul;Kim, Jong-Hoon;Ahn, Seung-Do;Park, Sung-Ho
    • Progress in Medical Physics / v.23 no.2 / pp.81-90 / 2012
  • The effect of setup uncertainties on the CTV dose, and the correlation between setup uncertainties and setup margin, were evaluated by Monte Carlo-based numerical simulation. Patient-specific information from an IMRT treatment plan for rectal cancer designed on the VARIAN Eclipse planning system, including the planned dose distribution and the tumor volume information of a rectal cancer patient, was used as input to the Monte Carlo simulation program. The simulation program was developed for this study on a Linux environment using open-source packages, GNU C++ and the ROOT data analysis framework. All patient setup misalignments were assumed, following the central limit theorem, to be normally distributed; systematic and random errors were therefore generated according to Gaussian statistics with a given standard deviation as the simulation input parameter. After the setup error simulations, the change of dose in the CTV volume was analyzed from the simulation results. To verify the conventional margin recipe, the correlation between setup error and setup margin was compared with the margin formula developed for three-dimensional conformal radiation therapy. The simulation was performed a total of 2,000 times for each simulation input of systematic and random errors independently, with the standard deviation used to generate patient setup errors varied from 1 mm to 10 mm in 1 mm steps. For systematic errors, the minimum dose to the CTV, $D_{min}^{syst}$, decreased from 100.4% to 72.50%, and the mean dose $\bar{D}_{syst}$ decreased from 100.45% to 97.88%, while the standard deviation of the dose distribution in the CTV volume increased from 0.02% to 3.33%. Random errors likewise reduced the mean and minimum dose to the CTV volume: the minimum dose $D_{min}^{rand}$ fell from 100.45% to 94.80%, and the mean dose $\bar{D}_{rand}$ decreased from 100.46% to 97.87%. As with systematic errors, the standard deviation of the CTV dose, ${\Delta}D_{rand}$, increased from 0.01% to 0.63%. After calculating the margin size for each systematic and random error, a "population ratio" was introduced and applied to verify the margin recipe. The conventional margin formula was found to satisfy the margin objective for IMRT treatment of rectal cancer. The developed Monte Carlo-based simulation program should be useful for studying patient setup error and dose coverage of the CTV volume under variations of margin size and setup error.
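
A minimal sketch of this kind of simulation follows: Gaussian systematic and random shifts are applied to a one-dimensional dose profile, the minimum and mean CTV dose are accumulated over trials, and the margin is computed with the widely cited 3D-CRT recipe $M = 2.5\Sigma + 0.7\sigma$ (van Herk's formula, assumed here to be the "conventional margin formula" the abstract references). The dose profile, CTV geometry, and fraction count are hypothetical, not the patient plan from the paper.

```python
import random
import statistics

def dose_at(x):
    """Hypothetical 1-D planned dose profile: 100% inside a flat region
    with a 5 mm dosimetric penumbra on each side (stand-in for the plan)."""
    ctv_half, penumbra = 40.0, 5.0  # mm
    edge = abs(x) - ctv_half
    if edge <= 0:
        return 100.0
    return max(0.0, 100.0 * (1.0 - edge / penumbra))

def simulate(sigma_syst, sigma_rand, n_trials=2000, n_fractions=25):
    """Sample Gaussian setup errors; record CTV min/mean dose per trial."""
    ctv_points = list(range(-40, 41, 2))  # mm sample points inside the CTV
    min_doses, mean_doses = [], []
    for _ in range(n_trials):
        syst = random.gauss(0.0, sigma_syst)  # one systematic shift per trial
        total = [0.0] * len(ctv_points)
        for _ in range(n_fractions):
            shift = syst + random.gauss(0.0, sigma_rand)  # per-fraction error
            for i, x in enumerate(ctv_points):
                total[i] += dose_at(x + shift)
        per_point = [t / n_fractions for t in total]
        min_doses.append(min(per_point))
        mean_doses.append(statistics.mean(per_point))
    return statistics.mean(min_doses), statistics.mean(mean_doses)

def van_herk_margin(big_sigma, small_sigma):
    """Conventional 3D-CRT margin recipe M = 2.5*Sigma + 0.7*sigma (mm)."""
    return 2.5 * big_sigma + 0.7 * small_sigma

d_min, d_mean = simulate(sigma_syst=3.0, sigma_rand=3.0)
print(f"CTV min dose {d_min:.1f}%, mean dose {d_mean:.1f}%,"
      f" margin {van_herk_margin(3.0, 3.0):.1f} mm")
```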

The Intelligent Determination Model of Audience Emotion for Implementing Personalized Exhibition (개인화 전시 서비스 구현을 위한 지능형 관객 감정 판단 모형)

  • Jung, Min-Kyu;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.39-57 / 2012
  • Recently, with the introduction of high-tech equipment into exhibitions, much attention has been focused on interactive exhibits, which can multiply an exhibition's effect through interaction with the audience. An interactive exhibition also makes it possible to measure a variety of audience reactions. Among these reactions, this research uses changes in facial features that can be collected in an interactive exhibition space. We develop an artificial neural network-based prediction model that predicts the audience's response by measuring the change in facial features when the audience is stimulated from a non-excited state. To represent the audience's emotional state, the research uses a valence-arousal model. The overall framework is composed of the following six steps. The first step collects data for modeling; the data were gathered from people who participated in the 2012 Seoul DMC Culture Open and were used for the experiments. The second step extracts 64 facial features from the collected data and compensates the facial feature values. The third step generates the independent and dependent variables of an artificial neural network model. The fourth step selects the independent variables that affect the dependent variable using statistical techniques. The fifth step builds an artificial neural network model and performs the learning process using training and test sets. The sixth and final step validates the prediction performance of the artificial neural network model using a validation data set. The proposed model was compared with a statistical predictive model to see whether it performed better. Although the data set in this experiment contained much noise, the proposed model showed better results than a multiple regression analysis model. If this prediction model of audience reaction were used in a real exhibition, it could provide countermeasures and services appropriate to the audience's reaction while viewing the exhibits. Specifically, if the audience's arousal toward an exhibit is low, action can be taken to increase it, for instance, recommending other preferred content to the audience or using light or sound to draw attention to the exhibit. In other words, when planning future exhibitions, it would be possible to design them to satisfy various audience preferences and to foster a personalized environment that helps visitors concentrate on the exhibits. However, the proposed model still shows low prediction accuracy, for the following reasons. First, the data cover diverse visitors to real exhibitions, so it was difficult to maintain an optimized experimental environment; the collected data therefore contain much noise, which lowers accuracy. In further research, data collection will be conducted in a more controlled experimental environment, and work to increase the model's prediction accuracy will continue. Second, using changes of facial expression alone is probably not enough to extract audience emotions; if facial expression were combined with other responses, such as sound or audience behavior, better results could be expected.
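
The modeling steps can be sketched as follows: a small feed-forward network maps 64 facial-feature deltas to valence and arousal scores. This is an illustrative reconstruction on synthetic data, using scikit-learn rather than whatever toolkit the authors employed; the feature generation, network size, and targets are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for step 2: 64 facial-feature changes per sample.
X = rng.normal(size=(300, 64))
# Synthetic valence/arousal targets (step 3); a real study would use
# annotated reactions collected from the exhibition audience.
true_w = rng.normal(size=(64, 2))
y = X @ true_w + rng.normal(scale=0.5, size=(300, 2))

# Steps 5-6: train on one split, validate on the held-out split.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)
print("validation R^2:", model.score(X_val, y_val))

# Predicted (valence, arousal) for a new visitor's facial-feature change;
# low predicted arousal would trigger an attention-raising intervention.
valence, arousal = model.predict(rng.normal(size=(1, 64)))[0]
print(f"valence={valence:.2f}, arousal={arousal:.2f}")
```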

An Importance and Satisfaction Analysis for Improvement Efficiency Use of Waterfront - A Focus on the Waterfront Analysis for Domestic and Foreign Dragon Boat Festival - (친수공간 이용효율성 개선을 위한 중요도·만족도 분석 - 국내·외 드래곤 보트 페스티벌을 위한 친수공간 사례로 -)

  • An, Byung-chul
    • Journal of the Korean Institute of Landscape Architecture / v.44 no.4 / pp.86-99 / 2016
  • This study analyzed the external environment and internal spatial structure of waterfronts, and ways to improve their use efficiency, through the lens of the dragon boat festival, with the goal of utilizing waterfronts more actively. Across four target areas, Hong Kong, Busan, Incheon, and Daejeon, it conducted an importance and satisfaction analysis among users of the elements affecting waterfront use efficiency, providing basic data on their contribution to revitalizing waterfront cultural content. The results are as follows. First, in the importance analysis of the 12 items, modern cultural infrastructure around the waterfront ranked highest at 8.26, followed in descending order by waterfront landscape, squares and open spaces, convenience facilities, transport, green area, quality of viewing space, historic resources, pedestrian access, suitability of width, wave/depth/water quality, and berth and mooring. Second, waterfront landscape was interpreted as an external environmental effect of city size rather than a matter of the target area's spatial structure, and was judged an important factor in waterfront site selection. In the waterfront landscape analysis, the high satisfaction with the domestic target areas was attributed to riverside parks recently built with waterfront activities in mind. Viewing space is major infrastructure where people can experience a pleasant waterfront and watch dynamic water leisure sports such as dragon boat racing in three dimensions, and it was judged to need improvement for better use efficiency. Third, tourism resources are a very important element affecting waterfront use efficiency, and waterfront users react more sensitively to modern tourism resources than to historic resources. This means that tourism infrastructure for shopping and leisure among the young affects waterfront use efficiency, so Hong Kong and Busan, whose waterfronts are near such infrastructure, were in a better position. Fourth, in the analysis of traffic accessibility, both Hong Kong and Busan were evaluated highly for excellent subway accessibility, while Daejeon was rated low in use-efficiency satisfaction because of its relatively low place awareness compared with its transportation infrastructure. In Hong Kong the waterfront is connected with downtown, and in Busan housing complexes and shopping centers are within easy walking distance for users, so satisfaction was high. Finally, in the importance and satisfaction analysis of water surface width, all target areas except Incheon exceeded 200 m, which suggests that a surface width above a certain level interrupts concentration on water leisure sports. In the survey-based analysis of surface conditions such as water quality, water depth, and waves, festival participants identified a water quality problem in Busan and a problem with optimal water depth at Gapcheon in Daejeon.
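
Importance and satisfaction scores of this kind are commonly read through an importance-performance analysis (IPA) grid. The sketch below is a hedged illustration with hypothetical scores, not the study's actual data: it classifies items into the four standard IPA quadrants by comparing each item's importance and satisfaction to the respective means.

```python
from statistics import mean

# Hypothetical (importance, satisfaction) scores on a 10-point scale;
# the study's 12 items would be classified the same way.
items = {
    "modern cultural infra": (8.26, 7.1),
    "waterfront landscape": (8.0, 7.8),
    "transport": (7.2, 5.9),
    "berth & mooring": (5.8, 6.3),
}

imp_mean = mean(i for i, _ in items.values())
sat_mean = mean(s for _, s in items.values())

def quadrant(importance, satisfaction):
    """Standard IPA reading: high importance + low satisfaction is the
    'concentrate here' quadrant that most needs improvement."""
    if importance >= imp_mean:
        return "keep up the good work" if satisfaction >= sat_mean else "concentrate here"
    return "possible overkill" if satisfaction >= sat_mean else "low priority"

for name, (i, s) in items.items():
    print(f"{name}: {quadrant(i, s)}")
```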

Change Acceptable In-Depth Searching in LOD Cloud for Efficient Knowledge Expansion (효과적인 지식확장을 위한 LOD 클라우드에서의 변화수용적 심층검색)

  • Kim, Kwangmin;Sohn, Yonglak
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.171-193 / 2018
  • The LOD (Linked Open Data) cloud is a practical implementation of the semantic web. We suggest a new method that provides identity links conveniently in the LOD cloud and allows changes in an LOD to be reflected in search results without omissions. An LOD publishes detailed descriptions of entities in RDF triple form; an RDF triple is composed of a subject, a predicate, and an object and presents a detailed description of an entity. Links in the LOD cloud, called identity links, are realized by asserting that entities in different RDF triples are identical. Currently, an identity link is provided by explicitly creating a link triple whose subject and object are associated with the source and target entities, and link triples are appended to the LOD. Through identity links, knowledge obtained from one LOD can be expanded with different knowledge from different LODs; providing this opportunity for knowledge expansion to users is the goal of the LOD cloud. Appending link triples to an LOD, however, requires discovering identity links between entities one by one, a serious difficulty given the enormous scale of the LOD cloud, and newly added entities cannot be reflected in search results until identity links to them have been serialized and published. Instead of creating enormous numbers of identity links, we propose that each LOD prepare its own link policy. A link policy specifies a set of target LODs to link to and the constraints necessary to discover identity links to entities in those target LODs. At search time, by referencing the link policies, it becomes possible to access newly added entities and reflect them in search results without omissions. A link policy specifies a set of predicate pairs used to discover identity between associated entities in the source and target LODs; for the link policy specification, we suggest a set of vocabularies that conform to RDFS and OWL. Identity between entities is evaluated according to the similarity of the objects that the source and target entities associate with the predicate pair in the link policy. We implemented a system called the Change Acceptable In-Depth Searching System (CAIDS). With CAIDS, a user's search request starts from the depth_0 LOD, i.e., surface searching. Referencing the link policies of the LODs, CAIDS proceeds to in-depth searching over the LODs at subsequent depths, using explicit link triples as well to supplement the identity links derived from the link policies. Following the identity links, CAIDS's in-depth searching progresses: the content of an entity obtained from the depth_0 LOD expands with the contents of entities in other LODs that have been discovered to be identical to the depth_0 entity. Expanding the content of a depth_0 entity without the user being aware of those other LODs implements knowledge expansion, the goal of the LOD cloud; the more identity links in the LOD cloud, the wider the content expansion. We have thus suggested a new way to create identity links abundantly and supply them to the LOD cloud. Experiments with CAIDS were performed against the DBpedia LODs of Korea, France, Italy, Spain, and Portugal. They show that CAIDS provides appropriate expansion and inclusion ratios as long as the degree of similarity between source and target objects is 0.8 to 0.9. The expansion ratio, for each depth, is the ratio of entities discovered at that depth to the entities of the depth_0 LOD; the inclusion ratio, for each depth, is the ratio of entities discovered only with explicit links to entities discovered only with link policies. With similarity degrees under 0.8, expansion becomes excessive and content becomes distorted; a similarity degree of 0.8 to 0.9 also yields an appropriate number of RDF triples. The experiments further evaluated the confidence degree of content expanded through in-depth searching. The confidence degree of content is directly coupled with an entity's identity ratio, meaning its degree of identity to the entity of the depth_0 LOD. An entity's identity ratio is obtained by multiplying the source LOD's confidence by the source entity's identity ratio. By tracing identity links in advance, an LOD's confidence is evaluated according to the number of identity links incoming to its entities. In evaluating the identity ratio, we considered the concept of identity agreement, in which multiple identity links head to a common entity. With this concept, the experimental results show that the identity ratio decreases as depth deepens but rebounds as the depth deepens further; for each entity, as the number of identity links increases, the identity ratio rebounds earlier and finally reaches 1. We found that more than eight identity links per entity would lead users to trust the expanded content. The link-policy-based in-depth searching method we propose is expected to contribute abundant identity links to the LOD cloud.
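
A minimal sketch of the link-policy idea follows: two toy LODs are represented as in-memory triple sets, a link policy names the predicate pair to compare, and identity links are discovered whenever object similarity reaches the threshold (set to 0.8, the lower end of the effective range reported above). The triples, predicate names, and similarity function (Python's difflib ratio) are illustrative assumptions, not CAIDS's actual vocabulary or matcher.

```python
from difflib import SequenceMatcher

# Toy source and target LODs: entity -> {predicate: object}.
source_lod = {
    "src:Seoul": {"ex:name": "Seoul", "ex:country": "South Korea"},
    "src:Busan": {"ex:name": "Busan", "ex:country": "South Korea"},
}
target_lod = {
    "tgt:City_Seoul": {"foaf:name": "Seoul"},
    "tgt:City_Paris": {"foaf:name": "Paris"},
}

# A link policy: which predicate pairs to compare, and the similarity
# threshold entities must reach to be asserted identical.
link_policy = {"predicate_pairs": [("ex:name", "foaf:name")], "threshold": 0.8}

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def discover_identity_links(src, tgt, policy):
    """Yield identity links (src_entity, tgt_entity) implied by the policy,
    instead of storing one explicit link triple per pair in advance."""
    for s_ent, s_props in src.items():
        for t_ent, t_props in tgt.items():
            for s_pred, t_pred in policy["predicate_pairs"]:
                if s_pred in s_props and t_pred in t_props:
                    if similarity(s_props[s_pred], t_props[t_pred]) >= policy["threshold"]:
                        yield (s_ent, t_ent)

for link in discover_identity_links(source_lod, target_lod, link_policy):
    print("identity link:", link)  # e.g. ('src:Seoul', 'tgt:City_Seoul')
```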