• Title/Summary/Keyword: Contents-based retrieval

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.141-156 / 2013
  • Social media is a representative form of Web 2.0 that changes users' information behavior by allowing them to produce their own content without expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to share their opinions and thoughts with the public and with acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore sought to discover meaningful information in the vast amounts of data buried in social media. Social media has recently become a main focus of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing large streams of Twitter data in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to track changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets that mention the candidates' names and the election from Twitter in Korea (http://www.twitter.com/) over one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects social trends effectively. The system also retrieves the list of terms that co-occur with given query terms. 
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that distinguish each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows how the topics change over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users build relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks, a structure unique to Twitter, could reveal different aspects of user behavior.
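
The term co-occurrence retrieval described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the whitespace tokenizer, the function name `cooccurring_terms`, and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=5):
    """Count terms that appear in the same tweet as `query`."""
    counts = Counter()
    for tweet in tweets:
        # Naive whitespace tokenizer (assumption); real tweets, especially
        # Korean ones, would need a proper morphological analyzer.
        tokens = set(tweet.lower().split())
        if query in tokens:
            counts.update(tokens - {query})
    return counts.most_common(top_n)


# Toy tweets, invented for illustration only
tweets = [
    "moon election poll",
    "moon election debate",
    "park election poll",
    "ahn debate",
]
result = cooccurring_terms(tweets, "moon")
print(result)
```

Counting each term once per tweet (via `set`) keeps a single repetitive tweet from dominating the ranking, which is one reasonable design choice among several.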

A Study of Intelligent Recommendation System based on Naive Bayes Text Classification and Collaborative Filtering (나이브베이즈 분류모델과 협업필터링 기반 지능형 학술논문 추천시스템 연구)

  • Lee, Sang-Gi;Lee, Byeong-Seop;Bak, Byeong-Yong;Hwang, Hye-Kyong
    • Journal of Information Management / v.41 no.4 / pp.227-249 / 2010
  • The volume of scholarly information has increased tremendously with the development of IT, especially the Internet. At the same time, however, people have to spend more time and effort because of information overload. There have been many research efforts in expert systems, data mining, and information retrieval on systems that infer and recommend the information items a user is expected to want. Recently, hybrid systems have been developed that combine a content-based recommendation system with collaborative filtering, or that combine recommendation systems from other domains. In this paper, we address the problems of current recommendation systems and propose a new system that combines collaborative filtering and Naive Bayes classification. We address the over-specialization problem through collaborative filtering, and the lack of assessment information and the recommendation of new content through Naive Bayes classification. For verification, we applied the new model to the NDSL paper service of KISTI, specifically to papers from journals on Sitology and Electronics, and observed high satisfaction among 4 experimental participants.
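
One way to picture the combination described above is the sketch below: collaborative filtering scores papers that other users have already rated, and a Naive Bayes text classifier scores new papers that nobody has rated yet. The abstract does not say how the two parts are actually combined, so the fallback rule, all function names, and the toy data are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns word counts, label counts, vocabulary."""
    word_counts, label_counts = defaultdict(Counter), Counter()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        label_counts[label] += 1
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab


def nb_prob(model, tokens, label):
    """Posterior P(label | tokens) using Laplace (add-one) smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for lab in label_counts:
        n_words = sum(word_counts[lab].values())
        score = math.log(label_counts[lab] / total_docs)
        for t in tokens:
            score += math.log((word_counts[lab][t] + 1) / (n_words + len(vocab)))
        log_scores[lab] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return math.exp(log_scores[label] - top) / norm


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def cf_predict(ratings, user, item):
    """Similarity-weighted mean of other users' ratings; None if no usable rating."""
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            sim = cosine(ratings[user], r)
            num += sim * r[item]
            den += sim
    return num / den if den > 0 else None


def hybrid_score(ratings, model, user, item, tokens):
    """Assumed rule: use CF when possible, otherwise fall back to Naive Bayes."""
    cf = cf_predict(ratings, user, item)
    return cf if cf is not None else nb_prob(model, tokens, "like")


# Toy data, invented for illustration; ratings are on a 0..1 scale
ratings = {
    "u1": {"p1": 1.0, "p2": 0.2},
    "u2": {"p1": 0.9, "p2": 0.1, "p3": 0.8},
    "u3": {"p1": 0.1, "p2": 1.0, "p3": 0.2},
}
model = train_nb([
    (["retrieval", "ranking"], "like"),
    (["query", "retrieval"], "like"),
    (["circuit", "voltage"], "dislike"),
])
rated = hybrid_score(ratings, model, "u1", "p3", ["ranking"])            # CF path
unrated = hybrid_score(ratings, model, "u1", "p4", ["retrieval", "query"])  # NB path
```

Here `p3` is scored from the ratings of similar users, while the unrated paper `p4` is scored purely from its text, which is the cold-start case the abstract attributes to Naive Bayes classification.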

A Study on the Current Status of National Library of Korea Subject Headings List through Utilization Analysis of Subject Headings (주제명 활용 분석을 통한 국립중앙도서관 주제명표목표의 현황 연구)

  • HyeKyung Lee;Yong-Gu Lee
    • Journal of the Korean Society for Information Management / v.40 no.2 / pp.157-182 / 2023
  • This study analyzed the structure and utilization of subject headings in the National Library of Korea Subject Headings List (NLSH) based on an analysis of subject headings assigned to 1,218,867 national bibliographies from 2003 to 2022. The findings of the study are as follows. First, among all terms in the NLSH, there were 257,103 preferred terms, accounting for 50.2% of the total. Foreign language terms constituted 33% (169,466), while non-preferred terms comprised 12% (61,442). Among the preferred terms, 57,312 subject headings (22.3%) were actually used. However, 54.7% (31,351) of these subject headings were assigned fewer than 5 times, indicating that only a small number of subject headings were frequently utilized. Second, relationship indicators appeared in the order of frequency RT, BT, and NT. The NLSH consisted of 12,602 top-level subject headings and 143,704 lowest-level subject headings, with a maximum depth of 17 levels. Third, on average, 1.72 subject headings were assigned per bibliographic record. The number of subject headings assigned and the depth of the hierarchy increased for materials with more specific contents. Recent bibliographic records have been assigned more subject headings, drawn from deeper levels of the NLSH hierarchy. It was also found that the number of subject headings assigned per bibliography varied depending on the main class of KDC. Based on the findings, it is recommended to evaluate the coverage of terms in the NLSH, reorganize the hierarchical relationships and depth of subject headings, and enhance the development of subdivisions within the NLSH.
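
The percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The counts come from the abstract; the total number of terms is not stated, so the sketch infers it from the 50.2% figure, which is an assumption and only approximate.

```python
# Counts taken from the abstract
preferred = 257_103      # preferred terms
foreign = 169_466        # foreign language terms
non_preferred = 61_442   # non-preferred terms
used = 57_312            # preferred terms actually assigned
rare = 31_351            # used headings assigned fewer than 5 times

# Total is not given in the abstract; inferred from "50.2% of the total" (assumption)
total = preferred / 0.502

used_share = used / preferred       # share of preferred terms that were used
rare_share = rare / used            # share of used headings that are rarely assigned
foreign_share = foreign / total
non_preferred_share = non_preferred / total

print(f"used: {used_share:.1%}, rare: {rare_share:.1%}")
print(f"foreign: {foreign_share:.0%}, non-preferred: {non_preferred_share:.0%}")
```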

A Case Study on Implementation of the Shipping Market Information Service System (해운시황정보서비스시스템 구현 사례연구)

  • Lee, Seokyong;Jeong, Myounghwan
    • Journal of Korea Port Economic Association / v.29 no.3 / pp.73-94 / 2013
  • The need for shipping market information services has been rising, which underscores the relevance of transaction and market information to parties both inside and outside the shipping industry. However, previous research has been limited to exploring the offerings of existing shipping market information providers. Users today require effective information; an efficient content management system; interfaces that support the information provider; graphs and spreadsheets for presenting analyzed information in diverse formats; and reliable web and mobile services that deliver information effectively with limited human resources. As a first step, the service information has to be defined so that it takes into account user utility, information retrieval, and data development. Second, the information and services of leading shipbrokers and research institutes must be benchmarked. Third, a review of the latest technical trends is required to identify the most suitable technologies for delivering shipping market information. Finally, analysis is required of the implementation of a system with the selected technologies, as well as the development of channels for posting information that users have analyzed. Such a process would enable continual redefinition of the shipping market information that users actively need. The implementation applies an X-Internet based WCMS with a single-window dashboard that provides user-customized information and is used to obtain and manage processes, add spreadsheets that sustain calculations using the latest information, graph results, and input additional information following predefined rules. Access to the data and use of the system would require agreement that the system will incorporate user data and user-analyzed information into the market report, web portal, and hybrid app, so as to provide current shipping market information appropriately and accurately to service users.

Video Scene Detection using Shot Clustering based on Visual Features (시각적 특징을 기반한 샷 클러스터링을 통한 비디오 씬 탐지 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.47-60 / 2012
  • Video data is unstructured and has a complex structure. As efficient management and retrieval of video data become more important, video parsing based on the visual features contained in video content has been studied as a way to reconstruct video data into a meaningful structure. Early studies on video parsing focused on splitting video data into shots, but detecting shot boundaries, which are defined as physical boundaries, does not consider the semantic association of video data. Recently, studies have actively pursued clustering methods that organize semantically associated video shots into video scenes, which are defined by semantic boundaries. Previous studies on video scene detection apply clustering algorithms based on a similarity measure between video shots that depends mainly on color features. However, correctly identifying a video shot or scene and detecting gradual transitions such as dissolves, fades, and wipes is difficult, because the color features of video data contain noise and change abruptly when an unexpected object intervenes. In this paper, to solve these problems, we propose the Scene Detector using Color histogram, corner Edge and Object color histogram (SDCEO), which detects video scenes by clustering similar shots that make up the same event based on visual features including the color histogram, the corner edge, and the object color histogram. The SDCEO is notable in that it uses the edge feature together with the color feature and, as a result, effectively detects gradual transitions as well as abrupt transitions. The SDCEO consists of the Shot Bound Identifier and the Video Scene Detector. The Shot Bound Identifier comprises the Color Histogram Analysis step and the Corner Edge Analysis step. 
In the Color Histogram Analysis step, SDCEO uses the color histogram feature to organize shot boundaries. The color histogram, which records the percentage of each quantized color among all pixels in a frame, is chosen for its good performance, as also reported in other work on content-based image and video analysis. To organize shot boundaries, SDCEO joins associated sequential frames into shot boundaries by measuring the similarity of the color histograms between frames. In the Corner Edge Analysis step, SDCEO identifies the final shot boundaries by using the corner edge feature. SDCEO detects associated shot boundaries by comparing the corner edge feature between the last frame of the previous shot boundary and the first frame of the next shot boundary. In the Key-frame Extraction step, SDCEO compares each frame with all other frames in the same shot boundary, measures their similarity using the Euclidean distance between histograms, and then selects as the key-frame the frame most similar to all frames in that shot boundary. The Video Scene Detector clusters associated shots that make up the same event by applying the hierarchical agglomerative clustering method based on visual features including the color histogram and the object color histogram. After detecting video scenes, SDCEO organizes the final video scenes by repeating the clustering until the similarity distance between shot boundaries is less than the threshold h. In this paper, we build a prototype of SDCEO and carry out experiments against manually constructed baseline data. The experimental results are satisfactory: the precision of shot boundary detection is 93.3% and the precision of video scene detection is 83.3%.
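
The Color Histogram Analysis and Key-frame Extraction steps described above can be sketched as follows. This is a simplified illustration, not the SDCEO prototype: frames are modeled as flat lists of already-quantized color indices, the threshold value is arbitrary, and the Corner Edge Analysis and scene clustering steps are omitted.

```python
import math
from collections import Counter


def color_histogram(frame, bins):
    """Fraction of pixels that fall in each quantized color bin."""
    counts = Counter(frame)
    return [counts[b] / len(frame) for b in range(bins)]


def euclidean(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def split_into_shots(frames, bins, threshold):
    """Start a new shot when consecutive histograms differ by more than threshold."""
    hists = [color_histogram(f, bins) for f in frames]
    shots, current = [], [0]
    for i in range(1, len(frames)):
        if euclidean(hists[i - 1], hists[i]) > threshold:
            shots.append(current)
            current = []
        current.append(i)
    shots.append(current)
    return shots, hists


def key_frame(shot, hists):
    """Frame with the smallest total histogram distance to all frames in the shot."""
    return min(shot, key=lambda i: sum(euclidean(hists[i], hists[j]) for j in shot))


# Toy frames, invented for illustration: 4 pixels each, color bins 0..2
frames = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 2, 2],  # abrupt color change: a new shot starts here
    [2, 2, 2, 1],
]
shots, hists = split_into_shots(frames, bins=3, threshold=0.5)
print(shots)
```

A purely color-based split like this is exactly what the abstract says struggles with gradual transitions, which is why the authors add the corner edge feature on top of it.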

Term Mapping Methodology between Everyday Words and Legal Terms for Law Information Search System (법령정보 검색을 위한 생활용어와 법률용어 간의 대응관계 탐색 방법론)

  • Kim, Ji Hyun;Lee, Jong-Seo;Lee, Myungjin;Kim, Wooju;Hong, June Seok
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.137-152 / 2012
  • In the Web 2.0 era, as many users create large amounts of web content themselves, known as user created contents, the World Wide Web is overflowing with information. Finding meaningful information among so many resources has therefore become key. Information retrieval is now essential in every field, and several types of search services have been developed and are widely used to retrieve the information that users really want. In particular, legal information search is an indispensable service that lets people conveniently find the laws relevant to their situation and serves as a channel for learning about them. The Office of Legislation in Korea has provided the Korean Law Information portal service since 2009 for searching law information such as legislation, administrative rules, and judicial precedents, so that people can conveniently find law-related information. However, this service is limited because current search engine technology basically returns documents according to whether or not they contain the query. Consequently, despite the efforts of the Office of Legislation in Korea, it is very difficult for general users who are unfamiliar with legal terms to retrieve law-related information from a search engine that relies on simple keyword matching, because there is a wide gap between everyday words and legal terms, many of which derive from Chinese words. People generally try to access law information using everyday words, so they have difficulty getting exactly the results they want. In this paper, we propose a term mapping methodology between everyday words and legal terms for general users who lack sufficient background in legal terminology, and we develop a search service that returns law information search results for everyday words. 
This makes it possible to search law information accurately without knowledge of legal terminology. In other words, our research goal is to build a law information search system with which general users can retrieve law information using everyday words. First, this paper uses the tags of internet blogs, drawing on the concept of collective intelligence, to find the term mapping relationship between everyday words and legal terms. To achieve this, we collect tags related to an everyday word from web blog posts. When people write a post on an internet blog, they generally add non-hierarchical keywords or terms, similar to synonyms and called tags, to describe, classify, and manage the post. Second, the collected tags are clustered using the K-means cluster analysis method. Then, we find the mapping relationship between an everyday word and a legal term by using our estimation measure to select the legal term that best matches the everyday word. The selected legal terms are given a definite relationship, and the relations between everyday words and legal terms are described using SKOS, an ontology for describing knowledge related to thesauri, classification schemes, taxonomies, and subject headings. Thus, based on the proposed mapping and searching methodologies, when a user tries to retrieve law information with an everyday word, our legal information search system finds the legal term mapped to the user query and retrieves law information using the matched legal term. As a result, users can get accurate results even if they have no knowledge of legal terms. We expect that general users without a professional legal background will be able to retrieve legal information conveniently and efficiently using everyday words.
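
The overall flow described above (collect blog tags for an everyday word, pick the best-matching legal term, then search with that term) can be sketched as below. The abstract does not define the authors' estimation measure, so this sketch substitutes a plain frequency count and skips the K-means clustering step; the function names, the English toy vocabulary, and the sample documents are all hypothetical.

```python
from collections import Counter


def map_to_legal_term(tag_lists, legal_vocab):
    """Pick the legal term occurring most often among tags of posts about a word.

    Stand-in for the paper's estimation measure (assumption): raw tag frequency.
    """
    counts = Counter(t for tags in tag_lists for t in tags if t in legal_vocab)
    return counts.most_common(1)[0][0] if counts else None


def search(query, mapping, documents):
    """Replace an everyday word with its mapped legal term, then keyword-match."""
    term = mapping.get(query, query)
    return [d for d in documents if term in d]


# Toy data, invented for illustration
legal_vocab = {"dismissal", "severance pay"}
tags_for_firing = [
    ["dismissal", "job", "boss"],
    ["dismissal", "severance pay"],
    ["labor", "job"],
]
mapping = {"firing": map_to_legal_term(tags_for_firing, legal_vocab)}

documents = [
    "Article on dismissal without cause",
    "Article on lease contracts",
]
hits = search("firing", mapping, documents)
print(mapping, hits)
```

A plain keyword search for "firing" would return nothing from these documents, while the mapped query finds the dismissal article, which is the gap between everyday words and legal terms that the abstract sets out to close.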

A Study of Sound Expression in Webtoon (웹툰의 사운드 표현에 관한 연구)

  • Mok, Hae Jung
    • Cartoon and Animation Studies / s.36 / pp.469-491 / 2014
  • Webtoons have developed methods that make it possible to express sound visually. With the development of web technology, we can also hear sound in webtoons. It is natural to analyze the sound that we can hear, but we can also analyze the sound that we cannot hear. This study is based on the 'dual code' concept in cognitive psychology. Cartoonists can create visual expressions on the basis of auditory impression and memory, and readers can recall the sound through the process of memory and memory retrieval. This study analyzes both audible and inaudible sound. The analysis borrows its method from film sound theory. The three main factors, volume, pitch, and tone, are identified by frequency in acoustics. In comics, on the other hand, they are expressed by the thickness and position of lines and by the image of the sound source. The visual expression of on-screen and off-screen sound is related to the frame of comics. Generally, the outside of the frame signifies off-screen sound, but some off-screen sound appears inside the frame. In addition, horror comics, like horror films, use a great deal of sound for genre effect. When comics sound is analyzed with this method of film sound analysis, we find that webtoons have developed creative methods of expression compared with the simple ones of early comics. In particular, arranging frames and expressing sound so that it follows vertical movement are new to webtoons. The types and arrangement of frames have also diversified. BGM was the first use of audible sound, and recently BGM mixed with sound effects has come into use. In addition, there are programs that let readers hear sound in step with scrolling. The horror genre in particular heightens its genre effects with this technology. Various methods of visualizing sound are being created, and this change shows that webtoons could be a model of convergence in content.