• Title/Summary/Keyword: Topic Feature

Analyzing the Effect of Characteristics of Dictionary on the Accuracy of Document Classifiers (용어 사전의 특성이 문서 분류 정확도에 미치는 영향 연구)

  • Jung, Haegang;Kim, Namgyu
    • Management & Information Systems Review
    • /
    • v.37 no.4
    • /
    • pp.41-62
    • /
    • 2018
  • As the volume of unstructured data from social media, Internet news articles, and blogs grows, text analysis and research on it are becoming more important. Because text analysis is mostly performed on a specific domain or topic, constructing and applying a domain-specific dictionary has also become more important. The quality of the dictionary directly affects the results of unstructured data analysis, and it matters all the more because the dictionary sets the perspective of the analysis. Most studies on text analysis have emphasized the importance of dictionaries for obtaining clean, high-quality results. However, the effect of dictionaries has not been rigorously verified, even though the dictionary is regarded as the most essential factor in text analysis. In this paper, we generate three dictionaries in different ways from 39,800 news articles, define the concept of Intrinsic Rate, and analyze the effect of each dictionary on the accuracy of document classification. The three construction methods are: 1) a batch method that builds a dictionary from term frequencies over the entire document set; 2) a method that extracts terms by category and integrates them; and 3) a method that extracts features for each category and integrates them. To evaluate the quality of the dictionaries, we compared the accuracy of three artificial neural network-based document classifiers. In the experiment, accuracy tended to increase when the Intrinsic Rate was high, which suggests that the accuracy of document classification can be improved by increasing the Intrinsic Rate of the dictionary.
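
The first two construction methods named in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the whitespace tokenizer, the function names `batch_dictionary` and `per_category_dictionary`, the size cutoffs, and the toy documents are hypothetical, and the abstract does not define how the Intrinsic Rate is computed, so it is not modeled here.

```python
from collections import Counter

def batch_dictionary(docs, size):
    """Method 1 (sketch): the `size` most frequent terms over all documents."""
    counts = Counter(term for _, text in docs for term in text.lower().split())
    return {term for term, _ in counts.most_common(size)}

def per_category_dictionary(docs, size_per_category):
    """Method 2 (sketch): most frequent terms within each category, then the union."""
    by_category = {}
    for category, text in docs:
        by_category.setdefault(category, Counter()).update(text.lower().split())
    merged = set()
    for counts in by_category.values():
        merged.update(term for term, _ in counts.most_common(size_per_category))
    return merged

# Hypothetical toy corpus of (category, text) pairs.
docs = [
    ("sports", "goal goal goal match team"),
    ("sports", "team match win"),
    ("economy", "market market market rate bank"),
    ("economy", "bank rate news"),
]

batch_terms = batch_dictionary(docs, 2)
category_terms = per_category_dictionary(docs, 1)
```

On a corpus where one category dominates, the batch method may fill the dictionary with that category's terms, while the per-category method reserves room for each category's own vocabulary.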

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of Web 2.0 that shapes users' information behavior by allowing them to produce their own content without expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to share their opinions and thoughts with both the public and their acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore sought to discover meaningful information in the vast amounts of data buried in social media. Social media has recently become a main focus in the fields of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing large streaming datasets from Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to track changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets that mention the candidates' names and the election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects societal trends effectively. The system also retrieves the list of terms that co-occur with given query terms. 
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate: 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park'; 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon'; and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows how the topics change over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users can form relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks, a type of network unique to Twitter, could reveal different aspects of user behavior.
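
Term co-occurrence retrieval, as described in this abstract, can be sketched as counting which terms appear in the same tweet as a query term. This is only an illustration under stated assumptions: the function name `cooccurring_terms`, the whitespace tokenizer, and the toy tweets are hypothetical, and the abstract does not say how the actual system tokenizes or ranks terms.

```python
from collections import Counter

def cooccurring_terms(tweets, query, top_n=5):
    """Count the terms that share a tweet with `query` (case-insensitive)."""
    query = query.lower()
    counts = Counter()
    for tweet in tweets:
        tokens = set(tweet.lower().split())  # each term counts once per tweet
        if query in tokens:
            counts.update(tokens - {query})
    return counts.most_common(top_n)

# Hypothetical toy tweets; real input would be the collected tweet stream.
tweets = ["moon poll debate", "moon poll", "park poll", "moon rally"]
related = cooccurring_terms(tweets, "moon")
```

Terms that rank high for every query (here, a word like "poll") correspond to the general election terms the abstract mentions, while terms that rank high for only one query are the ones that differentiate a candidate.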

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.73-92
    • /
    • 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique is proposed to solve the fundamental problems of collaborative filtering, such as the cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques make recommendations based on the predicted preference of a user for a particular item, using a similar-item subset and a similar-user subset built from users' preferences for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating the similar-item and similar-user subsets becomes more difficult. In addition, as the scale of the service increases, the time needed to create these subsets increases geometrically, which in turn increases the response time of the recommendation system. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts the model to conditions and adopts concepts from context-based filtering. The technique consists of four major methodologies. First, the items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. With this method, the run time for creating a similar-item or similar-user subset can be reduced, the reliability of the recommendation system can be made higher than when only user preference information is used to create these subsets, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster. 
In this phase, a list of items is made for a user by examining the item clusters in order of the magnitude of the inter-cluster preference of the user cluster to which the user belongs, and then selecting and ranking the items according to the predicted or recorded user preference information. With this method, the model creation phase bears the highest load of the recommendation system, which minimizes the load at run time. Therefore, the scalability problem can be addressed, and a large-scale recommendation system can be operated with highly reliable collaborative filtering. Third, missing user preference information is predicted using the item and user clusters. This mitigates the problem caused by the low density of the user preference matrix. Existing studies used either item-based prediction or user-based prediction. In this paper, Hao Ji's idea, which uses both item-based and user-based prediction, is improved. The reliability of the recommendation service can be improved by combining the predicted values of both techniques according to the condition of the recommendation model. By predicting user preference based on the item or user clusters, the time required for prediction can be reduced, and missing user preferences can be predicted at run time. Fourth, the item and user feature vectors are updated by learning from subsequent user feedback. This phase applies normalized user feedback to the item and user feature vectors. This mitigates the problems caused by adopting concepts of context-based filtering, such as item and user feature vectors based on the user profile and item properties. These problems stem from the limitation of quantifying the qualitative features of items and users. 
Therefore, the elements of the user and item feature vectors are made to match one to one, and when user feedback on a particular item is obtained, it is applied to each feature vector using the opposite one. The method was verified by comparing its performance with existing hybrid filtering techniques. Two measures were used: MAE (Mean Absolute Error) and response time. The MAE results confirmed that this technique improves the reliability of the recommendation system. The response time results showed that it is suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it has some limitations. Because the technique focused on reducing time complexity, an improvement in reliability was not expected. Future work will improve this technique using rule-based filtering.
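
The ideas of an inter-cluster preference, cluster-based prediction of missing preferences, and MAE evaluation can be sketched in a much simplified form. The sketch assumes that clusters are already given and that the inter-cluster preference is the plain mean of observed ratings between a user cluster and an item cluster; the abstract does not state the paper's actual formulas, and all names and data below are hypothetical.

```python
from collections import defaultdict

def inter_cluster_preference(ratings, user_cluster, item_cluster):
    """Mean observed rating for each (user cluster, item cluster) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (user, item), rating in ratings.items():
        key = (user_cluster[user], item_cluster[item])
        sums[key] += rating
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def predict(user, item, ratings, prefs, user_cluster, item_cluster):
    """Recorded rating if present; otherwise the inter-cluster mean (assumption)."""
    if (user, item) in ratings:
        return ratings[(user, item)]
    return prefs.get((user_cluster[user], item_cluster[item]))

def mean_absolute_error(predicted, actual):
    """MAE: the average of |predicted - actual| over paired values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical toy data: ratings keyed by (user, item), plus cluster labels.
ratings = {("u1", "i1"): 5.0, ("u2", "i1"): 3.0, ("u1", "i2"): 4.0, ("u3", "i3"): 1.0}
user_cluster = {"u1": "A", "u2": "A", "u3": "B"}
item_cluster = {"i1": "X", "i2": "X", "i3": "Y"}

prefs = inter_cluster_preference(ratings, user_cluster, item_cluster)
missing = predict("u2", "i2", ratings, prefs, user_cluster, item_cluster)
error = mean_absolute_error([4.0, 2.0], [5.0, 2.0])
```

Because the cluster-level means are computed once when the model is built, a run-time prediction reduces to two dictionary lookups, which is in the spirit of the load shifting the abstract describes.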

Study of Fashion Design Applying the Formative Beauty of Architectural Works by Antoni Gaudi (패턴 절개를 응용한 의상의 조형적 형태미의 표현 연구 - 안토니오 가우디 건축 작품 형태를 중심으로 -)

  • Shin, Hyo-Jung;Lee, Young-Min
    • The Research Journal of the Costume Culture
    • /
    • v.17 no.5
    • /
    • pp.849-865
    • /
    • 2009
  • This research is a study of fashion design that applies the formative features and formal beauty of architecture to clothing design; we focused on Gaudi's architectural style and on the Art Nouveau style that was popular from the end of the 19th century to the beginning of the 20th century. In general, the simple and flat nature of cloth limits the expression of formal features in clothing design, but a distinctive diversity of designs that evokes a sense of freshness can be achieved through a suitable combination of flat patterns and draping. The aim of this research is to show that the sphere of design expression can be extended by creating three-dimensional clothes with pattern-cutting skills, applying both the three-dimensional and the flat patterns found in Gaudi's works of architecture, which are distinguished by their curvaceousness and formal beauty. As for the research method, we reviewed previous studies through a close review of books, papers, pictures, and web sites related to this topic, and we made our clothes on the basis of this theoretical consideration. We found the following points. First, by presenting fashion works inspired by architectural designs, we found that formal beauty in architecture can broadly serve as a motif for clothing design through attention to the formal images, decorative details, and formative features of architectural works. Second, the characteristic lines of Gaudi's architecture are well suited to expressing the detailed decorative lines of clothes. Third, formative beauty can be expressed in clothes by highlighting variations of shapes and lines through various changes to the background pattern, although the choice of cloth is limited because the fabric must have the right texture and thickness to be cut and sewn appropriately. 
Fourth, we confirmed that it was possible to create unique formative designs by a creative application of both flat and three-dimensional cutting.

Calculation of a Threshold for Decision of Similar Features in Different Spatial Data Sets (이종의 공간 데이터 셋에서 매칭 객체 판별을 위한 임계값 산출)

  • Kim, Jiyoung;Huh, Yong;Yu, Kiyun;Kim, Jung Ok
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.31 no.1
    • /
    • pp.23-28
    • /
    • 2013
  • The process of feature matching between two different spatial data sets is similar to binary classification into matching and non-matching classes. In this paper, we calculated a threshold by applying to spatial data sets the equal error rate (EER), which is widely used in biometrics, a field where classification is a main topic. In discriminating matches from non-matches, precision and recall change and a trade-off appears between these two indexes, because the number of matching pairs changes as the threshold is changed progressively. The point of this trade-off is the EER, which gives the threshold. Applying this method to training data, the threshold was estimated at a shape similarity value of 0.802. Applying the estimated threshold to test data, the F-measure, an evaluation index of the matching method, was high at 0.940. We therefore confirmed that an accurate threshold can be calculated by EER without human intervention and that it is appropriate for matching different spatial data sets.
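
The threshold selection described in this abstract can be sketched as a sweep over candidate thresholds that keeps the one where precision and recall are closest. This is a simplified reading of the abstract, not the authors' procedure: the function names and toy similarity scores are hypothetical. In biometrics the EER is usually defined as the point where the false acceptance rate equals the false rejection rate; the sketch follows the abstract's precision/recall framing instead. When at least one true match is found, precision equals recall exactly when the number of false positives equals the number of false negatives.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when pairs scoring >= threshold are called matches."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def balanced_threshold(scores, labels):
    """Observed score that minimizes |precision - recall| when used as threshold."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        gap = abs(p - r)
        if best is None or gap < best[0]:
            best = (gap, t, p, r)
    return best[1], best[2], best[3]

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Hypothetical shape-similarity scores with ground-truth match labels.
scores = [0.95, 0.9, 0.85, 0.7, 0.6, 0.4, 0.3]
labels = [True, True, False, True, False, False, False]

threshold, precision, recall = balanced_threshold(scores, labels)
f_score = f_measure(precision, recall)
```

In the workflow the abstract describes, the threshold would be fitted on training pairs in this way and then applied unchanged to test pairs, where the F-measure is reported.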

The differences in character design in China, Japan, and Korea : A case study of comic "The Monkey King" (만화<손오공>에 나타난 한·중·일 캐릭터디자인 특징)

  • Kim, Kang;Oh, Chigyu
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2009.05a
    • /
    • pp.235-238
    • /
    • 2009
  • Character design is part of a culture. That is, a character is not just an object; it reflects the cultural tendency and context in which it is visualized. Its potential as a cultural medium of communication has become pervasive in the field. One good source of characters is the classics, which represent a nation's culture and history. This presentation shows how a character from a Buddhist story, the Monkey King, has been designed differently in different contexts. For this purpose, animations of the Monkey King in China, Japan, and Korea are reviewed and analyzed. The results show that the same animation character has been designed in different ways in different contexts and that it reflects the cultural tendency of each country. For example, Koreans tend to emphasize the global features of the character. Chinese character designers, however, emphasize traditional values. Finally, designers in Japan try to put their cultural elements into the detailed parts of the character, which makes it appeal to the public.

A Novel Two-Level Pitch Detection Approach for Speaker Tracking in Robot Control

  • Hejazi, Mahmoud R.;Oh, Han;Kim, Hong-Kook;Ho, Yo-Sung
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • 2005.06a
    • /
    • pp.89-92
    • /
    • 2005
  • Using natural speech commands for controlling a human-robot is an interesting topic in the field of robotics. In this paper, our main focus is on verifying the speaker who gives a command, in order to decide whether he or she is authorized to issue commands. Among the possible dynamic features of natural speech, the pitch period is one of the most important for characterizing speech signals, and it usually differs from person to person. However, current pitch detection techniques have not yet reached the desired level of accuracy and robustness. When the signal is noisy or contains multiple pitch streams, the performance of most techniques degrades. In this paper, we propose a two-level approach to pitch detection which, compared with standard pitch detection algorithms, not only increases accuracy but also makes performance more robust to noise. In the first level, we discriminate voiced from unvoiced signals using a neural classifier that takes cepstrum sequences of speech as its input feature set. In the second level, voiced signals are further processed using a modified version of the standard AMDF-based pitch detection algorithm to determine their pitch periods precisely. The experimental results show that the accuracy of the proposed system is better than that of conventional pitch detection algorithms for speech signals in both clean and noisy environments.
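
For readers unfamiliar with AMDF (average magnitude difference function), the sketch below shows the standard, unmodified form of the pitch period estimate on a single voiced frame. It is not the modified algorithm of the paper, whose details the abstract does not give, and it omits the neural voiced/unvoiced classifier of the first level. The function names, sample rate, lag range, and synthetic test tone are all assumptions made for illustration.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def estimate_pitch_period(frame, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the smallest AMDF value, in samples."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))

# Synthetic "voiced" frame: a 200 Hz sine sampled at 8 kHz (period of 40 samples).
sample_rate = 8000
f0 = 200.0
frame = [math.sin(2 * math.pi * f0 * n / sample_rate) for n in range(400)]

period = estimate_pitch_period(frame, min_lag=20, max_lag=60)
pitch_hz = sample_rate / period
```

The AMDF dips toward zero when the lag equals the pitch period or one of its multiples, so the search range is kept narrower than two periods here to avoid picking a multiple; on noisy speech the dips are shallower, which is one reason a plain minimum search can fail.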

Analysis of Status and Features for Lecture-type Programs of the Public Libraries in Seoul (서울시 공공도서관 강좌교육 운영 현황 및 특징에 관한 분석)

  • Lee, Jeong-Mee
    • Journal of the Korean BIBLIA Society for Library and Information Science
    • /
    • v.25 no.3
    • /
    • pp.119-137
    • /
    • 2014
  • This study evaluates the status and features of lecture-type programs run by public libraries in Seoul. It examines the state and features of these programs over the most recent six years, as well as the trends in their topics and target users and how these changed. The results showed that the number of lecture-type programs decreased and that each program was operated for a specific target user group. The lecture-type programs were analyzed by subject category, and their characteristics were presented by subject. The study concludes that lecture-type programs require careful planning and operation, since they give the library a good opportunity to reach its potential users.

Design and Implementation of Motion-based Interaction in AR Game (증강현실 게임에서의 동작 기반 상호작용 설계 및 구현)

  • Park, Jong-Seung;Jeon, Young-Jun
    • Journal of Korea Game Society
    • /
    • v.9 no.5
    • /
    • pp.105-115
    • /
    • 2009
  • This article proposes a design and implementation methodology for a gesture-based interface for augmented reality games. Gesture-based augmented reality games are a promising area of future immersive games that use human body motions. However, because current motion recognition technologies are unstable, most previous development processes have introduced many ad hoc methods to handle their shortcomings, and game architectures have consequently become highly irregular and inefficient. This article proposes an efficient development methodology for gesture-based augmented reality games by prototyping a table tennis game with a gesture interface. We also verify the applicability of the prototyping mechanism by implementing and demonstrating the augmented reality table tennis game. In the experiments, the implemented prototype stably tracked real rackets, allowing fast movements and interactions without delay.

An Analysis of Related Movie Information Using The Co-Word Method (동시출현단어분석을 이용한 연관영화정보 분석 연구)

  • Choi, Sanghee
    • Journal of the Korean Society for Information Management
    • /
    • v.31 no.4
    • /
    • pp.161-178
    • /
    • 2014
  • Recently, many information services have allowed users to collaborate in producing and using information. Sharing information is also important for users who have similar tastes or interests. As various channels have become available for users to share their experiences and knowledge, user data have accumulated within these information services. This study collected movie lists made by users of the IMDB service. Co-word analysis and ego-centered network analysis were adopted to discover relevant information for users who chose a specific movie. Three attributes of movies, namely title, director, and genre, were used to present related movie information. Movie title is an effective feature for presenting related movies from various aspects such as theme or characters, and the popularity of directors affects the identification of related directors. Genre is not useful for finding related movies because of the complexity of a movie's topic.