• Title/Summary/Keyword: blog types

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze big social media data in a more rigorous manner. Although social media text analysis algorithms have improved, previous approaches still have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to the feature sets used for training a classification model. The other approach applies semantic analysis methods to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, which is an extension of neural network algorithms, to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between the two approaches. The results show that the related words extracted by the Word2Vec algorithm that represent some emotion about the keyword are three times more numerous than those extracted by co-occurrence analysis. The difference between the two results comes from Word2Vec's vectorization of semantic features. Therefore, it is possible to say that the Word2Vec algorithm is able to catch hidden related words that have not been found in traditional analysis. In addition, Part-Of-Speech (POS) tagging for Korean is used to detect adjectives as "emotional words" in Korean. The emotion words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. 
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords, professor, prosecutor, and doctor, because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25720, Doctor: 35110, Prosecutor: 43225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text analysis. Gephi (http://gephi.github.io) was used for visualization, and every program used in text processing and analysis was written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that the words extracted by Emotion Trigger processing have a significant causal relationship with the emotional word in a sentence. A future study will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are from Twitter, so the data have a number of distinct features that we did not deal with in this study. These features will be considered in further study.
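The two techniques compared in this abstract can be illustrated with a minimal sketch. The vectors and POS tags below are hand-made toy values standing in for a trained Word2Vec model and a Korean POS tagger, and the function names are hypothetical; the original programs were written in Java, so this is not the authors' code.

```python
import math
from collections import Counter

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings (assumption).
VECTORS = {
    "angry": [0.9, 0.1, 0.0],
    "sad": [0.7, 0.0, 0.6],
    "bribe": [0.8, 0.2, 0.1],
    "lecture": [0.1, 0.9, 0.2],
}
# Toy part-of-speech tags standing in for a real POS tagger's output (assumption).
POS = {"angry": "ADJ", "sad": "ADJ", "bribe": "NOUN", "lecture": "NOUN"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def emotion_triggers(emotion_word, top_n=1):
    """Rank nouns by vector similarity to an adjective (the "emotional word")."""
    target = VECTORS[emotion_word]
    scored = [
        (cosine(target, vec), word)
        for word, vec in VECTORS.items()
        if POS[word] == "NOUN"
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


def cooccurrence(sentences, keyword):
    """Baseline: count words that share a sentence with the keyword."""
    counts = Counter()
    for tokens in sentences:
        if keyword in tokens:
            counts.update(t for t in tokens if t != keyword)
    return counts
```

Co-occurrence only sees words that literally share a sentence with the keyword, while the vector-based ranking can surface nouns that never co-occur with it, which is the kind of difference the abstract reports.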

Term Mapping Methodology between Everyday Words and Legal Terms for Law Information Search System (법령정보 검색을 위한 생활용어와 법률용어 간의 대응관계 탐색 방법론)

  • Kim, Ji Hyun;Lee, Jong-Seo;Lee, Myungjin;Kim, Wooju;Hong, June Seok
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.137-152 / 2012
  • In the era of Web 2.0, as many users produce large amounts of web content by themselves, called user created content, the World Wide Web is overflowing with information. Finding meaningful information among so many resources has therefore become the key problem. Information retrieval is now important across fields, and several types of search services have been developed and are widely used to retrieve the information that users really want. In particular, legal information search is an indispensable service because it lets people find the laws relevant to their present situation and serves as a channel for learning about them. The Office of Legislation in Korea has provided the Korean Law Information portal service since 2009 to search law information such as legislation, administrative rules, and judicial precedents, so people can conveniently find information related to the law. However, this service has a limitation because current search engine technology basically returns documents depending on whether the query is included in them or not. Despite these efforts by the Office of Legislation in Korea, it is therefore really difficult for general users who are not familiar with legal terms to retrieve law-related information from a search engine that uses simple keyword matching, because there is a huge divergence between everyday words and legal terms, many of which derive from Chinese words. People generally try to access law information using everyday words, so they have difficulty getting the results they want. In this paper, we propose a term mapping methodology between everyday words and legal terms for general users who do not have sufficient background in legal terms, and we develop a search service that can provide search results of law information from everyday words. 
This makes it possible to search law information accurately without knowledge of legal terminology. In other words, our research goal is to build a law information search system with which general users are able to retrieve law information using everyday words. First, this paper takes advantage of the tags of internet blogs, using the concept of collective intelligence, to find the term mapping relationship between everyday words and legal terms. To achieve this goal, we collect tags related to an everyday word from web blog posts. Generally, when people write a post on an internet blog, they add non-hierarchical keywords or terms, such as synonyms, called tags, in order to describe, classify, and manage their posts. Second, the collected tags are clustered using the K-means cluster analysis method. Then, we find a mapping relationship between an everyday word and a legal term, using our estimation measure to select the legal term that best matches the everyday word. The selected legal terms are given a definite relationship, and the relations between everyday words and legal terms are described using SKOS, an ontology for describing knowledge related to thesauri, classification schemes, taxonomies, and subject headings. Thus, based on the proposed mapping and searching methodologies, when users try to retrieve law information using an everyday word, our legal information search system finds the legal term mapped to the user query and retrieves law information using the matched legal term. Users can therefore get exact results even if they do not have knowledge of legal terms. As a result of our research, we expect that general users who do not have a professional legal background can conveniently and efficiently retrieve legal information using everyday words.
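The clustering step named in this abstract can be sketched as a plain K-means loop. This is only an illustration under assumptions: tags are assumed to be already encoded as numeric vectors, the initial centroids are simply the first k points so the result is deterministic, and the authors' estimation measure for picking the best legal term is not described in the abstract and is not reproduced here.

```python
def kmeans(points, k, iterations=10):
    """Minimal K-means: returns (cluster index per point, centroids).

    Initial centroids are the first k points (a simplification for
    reproducibility; real implementations usually randomize this).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # by squared Euclidean distance.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids
```

For example, `kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], k=2)` groups the first and third points together and the second and fourth together.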

Identifying Landscape Perceptions of Visitors' to the Taean Coast National Park Using Social Media Data - Focused on Kkotji Beach, Sinduri Coastal Sand Dune, and Manlipo Beach - (소셜미디어 데이터를 활용한 태안해안국립공원 방문객의 경관인식 파악 - 꽃지해수욕장·신두리해안사구·만리포해수욕장을 대상으로 -)

  • Lee, Sung-Hee;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.46 no.5 / pp.10-21 / 2018
  • This study used text mining methodology to focus on the perceptions of the landscape embedded in text that users spontaneously uploaded to "Taean Travel" blog posts. The study area is the Taean Coast National Park. Most of the places returned by searching blogs for 'Taean Travel' were located in the Taean Coast National Park. We conducted a network analysis on the top three places and extracted keywords related to the landscape. Finally, using centrality and cohesion analysis, we derived landscape perceptions and the major characteristics of those landscapes. As a result, it was possible to identify the main tourist places in Taean, the individual landscape experience, and the landscape perception in specific places. There were three different types of landscape characteristics: atmosphere-related keywords appeared for Kkotji Beach, symbolic image-related keywords appeared for Sinduri Coastal Sand Dune, and landscape object-related keywords appeared for Manlipo Beach. It can be inferred that the characteristics of these three places are perceived differently. Kkotji Beach is recognized as a place to appreciate a view of the sunset and as a base for the Taean Coast National Park's trekking course. Sinduri Coastal Sand Dune is recognized as a place with unusual scenery and as an ecologically valuable space. Finally, Manlipo Beach is adjacent to the Chunlipo Arboretum, which is often visited by tourists, and the beach itself is recognized as a place with an impressive appearance. Social media data is very useful because it enables analysis of various types of content that do not come from an expert's point of view. In this study, we used social media data to analyze various aspects of how people perceive and enjoy landscapes by integrating various content, such as landscape objects, images, and activities. 
However, because social media data may be amplified or distorted by users' memories and perceptions, field surveys are needed to verify the results of this study.
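As a rough illustration of the network analysis mentioned in this abstract, the sketch below links keywords that appear in the same blog post and computes degree centrality. The abstract does not say which centrality measure was used, so degree centrality is an assumption, and the function names and sample keywords are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(posts):
    """Link every pair of distinct keywords that appear in the same post."""
    neighbors = defaultdict(set)
    for keywords in posts:
        for a, b in combinations(sorted(set(keywords)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degree_centrality(neighbors):
    """Share of the other nodes that each keyword is directly linked to."""
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}
```

A keyword such as 'sunset' that co-occurs with every other keyword in the sample gets a centrality of 1.0, which marks it as a candidate for the dominant perception of a place.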

Time Series Analysis of Park Use Behavior Utilizing Big Data - Targeting Olympic Park - (빅데이터를 활용한 공원 이용행태의 시계열분석 - 올림픽공원을 대상으로 -)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture / v.46 no.2 / pp.27-36 / 2018
  • This study argues for the necessity of behavior analysis, since changes to a park environment that reflect user desires can be implemented only by grasping the needs of park users. Online data (blogs) were defined as the basic data of the study. After collecting data in 5-year units, data mining was used to derive the characteristics of time series behavior, while the significance of the online data was verified through social network analysis. The results of the text mining analysis are as follows. First, the primary behaviors included 'walking', 'photography', 'riding bicycles' (inline, kickboard, etc.), and 'eating'. Second, in the early days of the collected data, active physical activity such as exercise was the main factor, but recently passive behaviors such as eating, using a mobile phone, games, food, and drinking coffee have also appeared as new behavior characteristics in parks. Third, the factors affecting the behavior of park users are changes in various social conditions, such as internet development and a culture of expressing unique personalities and styles. Fourth, the special behaviors appearing at Olympic Park were derived from educational activities such as cultural activities, including watching performances and history lessons. In conclusion, it has been shown that changes in people's lifestyles and behavior in a park are influenced by the changing times rather than by the original purpose that was intended during park planning and design. Therefore, it is necessary to create an environment tailored to users by considering the main behaviors and influencing factors of Olympic Park. Text mining as an analytical method has the merit that past data can be collected. It therefore makes it possible to analyze behavior from a long-term viewpoint as well as to measure new behaviors and values with derived keywords. 
In addition, the validity of online data was verified through social network analysis to increase the legitimacy of research results. Research on more comprehensive behavior analysis should be carried out by diversifying the types of data collected later, and various methods for verifying the accuracy and reliability of large-volume data will be needed.
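The 5-year grouping described in this abstract can be sketched as a simple bucketing of keyword counts. The `start_year` anchor, the whitespace tokenization, and the sample posts are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def keyword_counts_by_period(posts, keywords, period_years=5, start_year=2000):
    """Count keyword occurrences per period.

    posts: iterable of (year, text) pairs.
    Returns {period_start_year: Counter({keyword: count})}.
    """
    buckets = defaultdict(Counter)
    for year, text in posts:
        # Snap the year down to the start of its period.
        offset = (year - start_year) // period_years
        period_start = start_year + offset * period_years
        tokens = text.lower().split()
        for kw in keywords:
            buckets[period_start][kw] += tokens.count(kw)
    return buckets
```

Comparing the counters of consecutive periods shows which behaviors grow or fade over time.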

Analysis of Knowledge Community for Knowledge Creation and Use (지식 생성 및 활용을 위한 지식 커뮤니티 효과 분석)

  • Huh, Jun-Hyuk;Lee, Jung-Seung
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.85-97 / 2010
  • Internet communities are a typical space for knowledge creation and use on the Internet, as people discuss their common interests within them. When we define 'Knowledge Communities' as internet communities that are related to knowledge creation and use, they can be categorized into 4 types: 'Search Engine,' 'Open Communities,' 'Specialty Communities,' and 'Activity Communities.' A knowledge community of a given type does not remain the same. Rather, it changes with time and is also affected by the external business environment. Therefore, it is critical to develop processes for the practical use of such changeable knowledge communities. Yet there is little research on a strategic framework for knowledge communities as a source of knowledge creation and use. The purposes of this study are (1) to find factors that can affect knowledge creation and use for each type of knowledge community and (2) to develop a strategic framework for practical use of the knowledge communities. Based on previous research, we found 7 factors that have considerable impact on knowledge creation and use: 'Fitness,' 'Reliability,' 'Systemicity,' 'Richness,' 'Similarity,' 'Feedback,' and 'Understanding.' We created 30 different questions from each type of knowledge community. The questions covered common sense, IT, business, and hobbies, and were uniformly selected from various knowledge communities. Instead of using a survey, we posed these questions to users of 4 representative web sites: Google for Search Engine, NAVER Knowledge iN for Open Communities, SLRClub for Specialty Communities, and Wikipedia for Activity Communities. These 4 representative web sites were selected based on popularity (i.e., the 4 most popular sites in Korea). They were also among the 4 most frequently mentioned sites in previous research. 
The answers to the 30 knowledge questions were collected and evaluated by 11 IT experts who had been working for IT companies for more than 3 years. When evaluating, the 11 experts used the above 7 knowledge factors as criteria. Using a stepwise linear regression on the evaluations of the 7 knowledge factors, we found that each factor affects knowledge creation and use differently for each type of knowledge community. The results of the stepwise linear regression analysis showed the relationship between 'Understanding' and the other knowledge factors, and this relationship differed by type of knowledge community. The results indicated that 'Understanding' was significantly related to 'Reliability' for the 'Search Engine type', to 'Fitness' for the 'Open Community type', to 'Reliability' and 'Similarity' for the 'Specialty Community type', and to 'Richness' and 'Similarity' for the 'Activity Community type'. A strategic framework was created from the results of this study, and such a framework can be useful for knowledge communities that are not stable over time. For the success of a knowledge community, the results of this study suggest that it is essential to ensure that the factors that can influence knowledge communities are present. It is also vital to reinforce each factor that has a unique influence on the related knowledge community. Thus, these changeable knowledge communities should be transformed into an adequate type with proper business strategies and objectives. They should also progress into a type that covers various types of knowledge communities. For example, DCInside started as a small specialty community focusing on digital camera hardware and camerawork and was then transformed into an open community focusing on social issues through its well-known photo galleries. NAVER started as a typical search engine and now covers an open community and a special community through additional web services such as NAVER Knowledge iN, NAVER Cafe, and NAVER Blog. 
NAVER is currently competing with activity communities such as Wikipedia through the NAVER encyclopedia, which provides its users with services similar to those of Wikipedia. Finally, the results of this study provide meaningful practical guidance for practitioners on which type of knowledge community is most appropriate to a fluctuating business environment, as the knowledge community itself evolves with time.

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more frequently. In this study, we propose a method to analyze which sentences and documents sent to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process that consists of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Among these, we focused on risk identification in this paper. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul (loan)' and 'sachae (private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether or not they were related to cybercriminality, particularly financial fraud. Then we selected some of the vocabulary as keywords if it was related to nominals and symbols. With the selected keywords, we searched web materials such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. The preprocessing is divided into a morphological analysis step, a stop word removal step, and a valid part-of-speech selection step. In the morphological analysis step, a complex sentence is transformed into morpheme units to enable mechanical analysis. In the stop word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. 
In the valid part-of-speech selection step, only two kinds of tokens, nouns and symbols, are considered. Since nouns can refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal the text is, the more frequently symbols are used. To turn the selected data into learning data, each item must be classified as legitimate or not, so the selected data are labeled 'legal' or 'illegal'. The processed data are then converted into a Corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set. In this study, we set the learning data to 70% and the test data to 30%. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the idea proposed in this paper, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. As a result, the overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal visit sales, which is apparently superior to that of Term Frequency, MLE, etc. Hence, the result suggests that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying an SVM-like discrimination algorithm to text analysis. Moreover, the study will also contribute to practitioners in the field of brand management and opinion mining.
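Two of the data preparation steps in this abstract, building a Document-Term Matrix and splitting labeled data 70/30, can be sketched as follows. The documents are assumed to be already tokenized into nouns and symbols; the function names, the fixed seed, and the sample tokens are hypothetical, and the SVM training itself (gamma 0.5, cost 10) is left to a library and not shown.

```python
import random


def document_term_matrix(docs):
    """Build a document-term count matrix from tokenized documents.

    Returns (vocabulary, matrix), where matrix[i][j] is how often
    vocabulary[j] occurs in docs[i].
    """
    vocab = sorted({token for doc in docs for token in doc})
    index = {token: i for i, token in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for token in doc:
            row[index[token]] += 1
        matrix.append(row)
    return vocab, matrix


def split_train_test(rows, labels, train_ratio=0.7, seed=0):
    """Shuffle labeled rows and split them into learning and test sets."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # fixed seed for repeatability
    cut = round(len(order) * train_ratio)
    train = [(rows[i], labels[i]) for i in order[:cut]]
    test = [(rows[i], labels[i]) for i in order[cut:]]
    return train, test
```

Each row of the matrix becomes one feature vector for the classifier, and the 'legal' or 'illegal' label travels with it through the split.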

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.143-159 / 2015
  • Predicting IT trends has long been an important subject in information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Towards the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the next year, and these predictions affect the basic assumptions of IT and industry leaders and organizations about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained in popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblog service; its central function, the 'tweet', lets users report their current thoughts and actions, comment on news, and engage in discussions. For an analysis of IT trends, we chose tweet data because Twitter not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading on technology. Previous studies found that tweet data provide useful information and detect social trends effectively; these studies also found that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the predicted IT trends for the following year announced by public organizations are mentioned on social network services like Twitter. 
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study compares the Twitter data generated from Seoul (Korea) with the predictions of the two organizations to analyze the differences. Twitter data analysis requires various natural language processing techniques, including the removal of stop words and noun extraction, to process unrefined forms of unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends while processing large streaming Twitter datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. As a result, we crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets in 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; these topics are non-generalized compound words, so they may be mentioned on Twitter in other words. To answer whether the IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-Procurement system, which handles the whole procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies on the IT trending topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in project announcements on Nara Market in 2012 and 2013. 
The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline to IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year, and ii) researchers can use Twitter to get useful ideas for detecting and predicting dynamic trends in technological and social issues.
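The correlation analysis between tweet frequencies and procurement announcement frequencies could look like the sketch below. The abstract does not name the correlation coefficient, so Pearson's r is an assumption here, and the function name and any input values are purely illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

Here `x` would hold the per-topic tweet counts and `y` the per-topic counts in project announcements, listed in the same topic order; a value near 1 means topics that are tweeted about more also appear more often in announcements.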