• 제목/요약/키워드: Text Retrieval

Search Result 342, Processing Time 0.024 seconds

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of the Web 2.0 that shapes the change of a user's information behavior by allowing users to produce their own contents without any expert skills. In particular, as a new communication medium, it has a profound impact on the social change by enabling users to communicate with the masses and acquaintances their opinions and thoughts. Social media data plays a significant role in an emerging Big Data arena. A variety of research areas such as social network analysis, opinion mining, and so on, therefore, have paid attention to discover meaningful information from vast amounts of data buried in social media. Social media has recently become main foci to the field of Information Retrieval and Text Mining because not only it produces massive unstructured textual data in real-time but also it serves as an influential channel for opinion leading. But most of the previous studies have adopted broad-brush and limited approaches. These approaches have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system to capture the trend in real-time processing big stream datasets of Twitter. The system offers the functions of term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes of topical trend, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets which contain candidates' name and election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects the trend of society effectively. The system also retrieves the list of terms co-occurred by given query terms. We compare the results of term co-occurrence retrieval by giving influential candidates' name, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. General terms which are related to presidential election such as 'Presidential Election', 'Proclamation in Support', Public opinion poll' appear frequently. Also the results show specific terms that differentiate each candidate's feature such as 'Park Jung Hee' and 'Yuk Young Su' from the query 'Guen Hae Park', 'a single candidacy agreement' and 'Time of voting extension' from the query 'Jae In Moon' and 'a single candidacy agreement' and 'down contract' from the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic can show one of two types of patterns-Rising tendency and Falling tendencydepending on the change of the probability distribution. To determine the relationship between topic trends in Twitter and social issues in the real world, we compare topic trends with related news articles. We are able to identify that Twitter can track the issue faster than the other media, newspapers. The user network in Twitter is different from those of other social media because of distinctive characteristics of making relationships in Twitter. Twitter users can make their relationships by exchanging mentions. We visualize and analyze mention based networks of 136,754 users. We put three candidates' name as query terms-Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn'. The results show that Twitter users mention all candidates' name regardless of their political tendencies. This case study discloses that Twitter could be an effective tool to detect and predict dynamic changes of social issues, and mention-based user networks could show different aspects of user behavior as a unique network that is uniquely found in Twitter.

Effective Picture Search in Lifelog Management Systems using Bluetooth Devices (라이프로그 관리 시스템에서 블루투스 장치를 이용한 효과적인 사진 검색 방법)

  • Chung, Eun-Ho;Lee, Ki-Yong;Kim, Myoung-Ho
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.4
    • /
    • pp.383-391
    • /
    • 2010
  • A Lifelog management system provides users with services to store, manage, and search their life logs. This paper proposes a fully-automatic collecting method of real world social contacts and lifelog search engine using collected social contact information as keyword. Wireless short-distance network devices in mobile phones are used to detect social contacts of their users. Human-Bluetooth relationship matrix is built based on the frequency of a human-being and a Bluetooth device being observed at the same time. Results show that with 20% of social contact information out of full social contact information of the observation times used for calculation, 90% of human-Bluetooth relationship can be correctly acquired. A lifelog search-engine that takes human names as keyword is suggested which compares two vectors, a row of Human-Bluetooth matrix and a vector of Bluetooth list scanned while a lifelog was created, using vector information retrieval model. This search engine returns more lifelog than existing text-matching search engine and ranks the result unlike existing search-engine.

Analysis of Waterpark Status and Recognition Using Big Data Analysis (빅데이터 분석을 활용한 워터파크 현황 및 인식 분석)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • Journal of Digital Convergence
    • /
    • v.15 no.10
    • /
    • pp.525-535
    • /
    • 2017
  • The purpose of this study aims to examine consumer perception and current status of water park. The Naver and Daum were used for data collection channels and the keyword 'water park' was used for data retrieval. The data analysis period was limited to the study period from January 1, 2015 to December 31, 2016 for a total of two years. First, as a result of the frequency analysis, hidden cameras, Lotte water park, arrests, suspects, gimhae were in top 5 in 2015, Lotte water park, swimming, summer, opening, admission ticket were in top 5 in 2016. Second, as a result of the connection degree central analysis, hidden camera, arrest, suspect, female, shower room were in top 5 in 2015, swimming, Lotte water park, summer and One Mount, admission ticket were in top 5 in 2016. Third, as a result of the N-GRAM network graph, the water park/hidden camera, the hidden camera/hidden camera, the suspect/arrest, the Gimhae/Lotte water park, water park/suspect were in top 5 in 2015, and One Mount/water park, Gimhae/Lotte water park, water park/admission ticket, water park/water park, water park/opening were in top 5 in 2016. Fourth, as a result of the CONCOR analysis, three groups in 2015 and two groups in 2016 were formed.

A Study on the Development of Electronic Resource Management System in a University Library (대학도서관 전자자원관리시스템(ERMS) 구축에 관한 연구)

  • Kim, Yong;Cho, Su-Kyeong
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.44 no.4
    • /
    • pp.249-276
    • /
    • 2010
  • With the rapid growth and development of information technology and the Internet, the amount of information published in electronic formats such as video, audio, digitalized text, etc. and the number of users accessing information online to satisfy their information needs are growing at a tremendous rate. This study analyzes standardized components to construct ERMS and proposes a model of ERMS based on the result of the analysis. The main functions of ERMS in university libraries are: 1) ERMS can manage and control access information to various electronic resources, metadata, holdings, user resources. Also, ERMS can be compatible with an existing library system such as IR(Information Retrieval) system, linking system, or proxy system. 2) ERMS should completely be compatible with acquisition and cataloging systems for effective management and control of integrated information organization and library budget. 3) ERMS should systematically and effectively manage license information on electronic resources. 4) ERMS should provide ideal and effective environment for use and access control of electronic resources in a library and integrated tool to manage and control all of electronic resources. Additionally, this study points out the need to organize committee groups to establish standardized rules and collaborative management of electronic resources among university libraries like DLF ERMI and redesign organizations in a library and a librarian's job description.

Implementation and Performance Analysis of the Group Communication Using CORBA-ORB, JAVA-RMI and Socket (CORBA-ORB, JAVA-RMI, 소켓을 이용한 그룹 통신의 구현 및 성능 분석)

  • 한윤기;구용완
    • Journal of Internet Computing and Services
    • /
    • v.3 no.1
    • /
    • pp.81-90
    • /
    • 2002
  • Large-scale distributed applications based on Internet and client/server applications have to deal with series of problems. Load balancing, unpredictable communication delays, and networking failures can be the example of the series of problems. Therefore. sophisticated applications such as teleconferencing, video-on-demand, and concurrent software engineering require an abstracted group communication, CORBA does not address these paradigms adequately. It mainly deals with point-to-point communication and does not support the development of reliable applications that include predictable behavior in distributed systems. In this paper, we present our design, implementation and performance analysis of the group communication using the CORBA-ORB. JAVA-RML and Socket based on distributed computing Performance analysis will be estimated latency-lime according to object increment, in case of group communication using ORB of CORBA the average is 14.5172msec, in case of group communication using RMI of Java the average is 21.4085msec, in case of group communication using socket the average is becoming 18.0714msec. Each group communication using multicast and UDP can be estimated 0.2735msec and 0.2157msec. The performance of the CORBA-ORB group communication is increased because of the increased object by the result of this research. This study can be applied to the fault-tolerant client/server system, group-ware. text retrieval system, and financial information systems.

  • PDF

A Study of Intelligent Recommendation System based on Naive Bayes Text Classification and Collaborative Filtering (나이브베이즈 분류모델과 협업필터링 기반 지능형 학술논문 추천시스템 연구)

  • Lee, Sang-Gi;Lee, Byeong-Seop;Bak, Byeong-Yong;Hwang, Hye-Kyong
    • Journal of Information Management
    • /
    • v.41 no.4
    • /
    • pp.227-249
    • /
    • 2010
  • Scholarly information has increased tremendously according to the development of IT, especially the Internet. However, simultaneously, people have to spend more time and exert more effort because of information overload. There have been many research efforts in the field of expert systems, data mining, and information retrieval, concerning a system that recommends user-expected information items through presumption. Recently, the hybrid system combining a content-based recommendation system and collaborative filtering or combining recommendation systems in other domains has been developed. In this paper we resolved the problem of the current recommendation system and suggested a new system combining collaborative filtering and Naive Bayes Classification. In this way, we resolved the over-specialization problem through collaborative filtering and lack of assessment information or recommendation of new contents through Naive Bayes Classification. For verification, we applied the new model in NDSL's paper service of KISTI, especially papers from journals about Sitology and Electronics, and witnessed high satisfaction from 4 experimental participants.

Health Consciousness and Health Information Orientation on Health Information Searching Behaviors of Middle-Aged Adults (중년층의 건강관심도와 건강정보추구도가 인터넷 건강정보 검색행동에 미치는 영향)

  • Lee, Hawyoung;Oh, Sanghee
    • Journal of the Korean Society for information Management
    • /
    • v.38 no.3
    • /
    • pp.73-99
    • /
    • 2021
  • The purpose of this study is to analyze the health information use experience of middle-aged people in their 40s and 50s and to observe and analyze their health information search behaviors according to health consciousness and health information orientation. This study uses Information Foraging Theory with the concept of information scents which leads users to detect and collect cues in information searching. Types and contents of information cues that middle-aged people use when searching for health information were investigated. Also, how their health consciousness and health information orientation affected using information cues were analyzed. Three methods of research were used; (1) pre-interviews, (2) search experiments, and (3) post-interviews. Thirty-two middle-aged people participated in the study. Their performance on health information searching was recorded and referred to in the post-interviews using a think-aloud protocol. Findings presented that middle-aged people's health consciousness and health information orientation affected the perception of information scents in health information search; those with high health consciousness and health information orientation consider the text made by the government office the most critical information cues. We believe findings from this study could be used for public libraries or non-profit institutions to understand middle-aged people's health information behaviors to design education programs for information retrieval considering users' health consciousness and health information orientation. Findings could also contribute to Internet portal site or health-related web site designers developing strategies for middle-aged users to access health information effectively.

Prediction of Music Generation on Time Series Using Bi-LSTM Model (Bi-LSTM 모델을 이용한 음악 생성 시계열 예측)

  • Kwangjin, Kim;Chilwoo, Lee
    • Smart Media Journal
    • /
    • v.11 no.10
    • /
    • pp.65-75
    • /
    • 2022
  • Deep learning is used as a creative tool that could overcome the limitations of existing analysis models and generate various types of results such as text, image, and music. In this paper, we propose a method necessary to preprocess audio data using the Niko's MIDI Pack sound source file as a data set and to generate music using Bi-LSTM. Based on the generated root note, the hidden layers are composed of multi-layers to create a new note suitable for the musical composition, and an attention mechanism is applied to the output gate of the decoder to apply the weight of the factors that affect the data input from the encoder. Setting variables such as loss function and optimization method are applied as parameters for improving the LSTM model. The proposed model is a multi-channel Bi-LSTM with attention that applies notes pitch generated from separating treble clef and bass clef, length of notes, rests, length of rests, and chords to improve the efficiency and prediction of MIDI deep learning process. The results of the learning generate a sound that matches the development of music scale distinct from noise, and we are aiming to contribute to generating a harmonistic stable music.

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • Development of technologies in artificial intelligence has been rapidly increasing with the Fourth Industrial Revolution, and researches related to AI have been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. These researches have been focused on solving cognitive problems such as learning and problem solving related to human intelligence from the 1950s. The field of artificial intelligence has achieved more technological advance than ever, due to recent interest in technology and research on various algorithms. The knowledge-based system is a sub-domain of artificial intelligence, and it aims to enable artificial intelligence agents to make decisions by using machine-readable and processible knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it is used with statistical artificial intelligence such as machine learning. Recently, the purpose of the knowledge base is to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases are used for intelligent processing in various fields of artificial intelligence such as question answering system of the smart speaker. However, building a useful knowledge base is a time-consuming task and still requires a lot of effort of the experts. In recent years, many kinds of research and technologies of knowledge based artificial intelligence use DBpedia that is one of the biggest knowledge base aiming to extract structured content from the various information of Wikipedia. DBpedia contains various information extracted from Wikipedia such as a title, categories, and links, but the most useful knowledge is from infobox of Wikipedia that presents a summary of some unifying aspect created by users. These knowledge are created by the mapping rule between infobox structures and DBpedia ontology schema defined in DBpedia Extraction Framework. In this way, DBpedia can expect high reliability in terms of accuracy of knowledge by using the method of generating knowledge from semi-structured infobox data created by users. However, since only about 50% of all wiki pages contain infobox in Korean Wikipedia, DBpedia has limitations in term of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. In order to demonstrate the appropriateness of this method, we explain a knowledge extraction model according to the DBpedia ontology schema by learning Wikipedia infoboxes. Our knowledge extraction model consists of three steps, document classification as ontology classes, proper sentence classification to extract triples, and value selection and transformation into RDF triple structure. The structure of Wikipedia infobox are defined as infobox templates that provide standardized information across related articles, and DBpedia ontology schema can be mapped these infobox templates. Based on these mapping relations, we classify the input document according to infobox categories which means ontology classes. After determining the classification of the input document, we classify the appropriate sentence according to attributes belonging to the classification. Finally, we extract knowledge from sentences that are classified as appropriate, and we convert knowledge into a form of triples. In order to train models, we generated training data set from Wikipedia dump using a method to add BIO tags to sentences, so we trained about 200 classes and about 2,500 relations for extracting knowledge. Furthermore, we evaluated comparative experiments of CRF and Bi-LSTM-CRF for the knowledge extraction process. Through this proposed process, it is possible to utilize structured knowledge by extracting knowledge according to the ontology schema from text documents. In addition, this methodology can significantly reduce the effort of the experts to construct instances according to the ontology schema.

A Study on the Research Trends in Library & Information Science in Korea using Topic Modeling (토픽모델링을 활용한 국내 문헌정보학 연구동향 분석)

  • Park, Ja-Hyun;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.30 no.1
    • /
    • pp.7-32
    • /
    • 2013
  • The goal of the present study is to identify the topic trend in the field of library and information science in Korea. To this end, we collected titles and s of the papers published in four major journals such as Journal of the Korean Society for information Management, Journal of the Korean Society for Library and Information Science, Journal of Korean Library and Information Science Society, and Journal of the Korean BIBLIA Society for library and Information Science during 1970 and 2012. After that, we applied the well-received topic modeling technique, Latent Dirichlet Allocation(LDA), to the collected data sets. The research findings of the study are as follows: 1) Comparison of the extracted topics by LDA with the subject headings of library and information science shows that there are several distinct sub-research domains strongly tied with the field. Those include library and society in the domain of "introduction to library and information science," professionalism, library and information policy in the domain of "library system," library evaluation in the domain of "library management," collection development and management, information service in the domain of "library service," services by library type, user training/information literacy, service evaluation, classification/cataloging/meta-data in the domain of "document organization," bibliometrics/digital libraries/user study/internet/expert system/information retrieval/information system in the domain of "information science," antique documents in the domain of "bibliography," books/publications in the domain of "publication," and archival study. The results indicate that among these sub-domains, information science and library services are two most focused domains. Second, we observe that there is the growing trend in the research topics such as service and evaluation by library type, internet, and meta-data, but the research topics such as book, classification, and cataloging reveal the declining trend. Third, analysis by journal show that in Journal of the Korean Society for information Management, information science related topics appear more frequently than library science related topics whereas library science related topics are more popular in the other three journals studied in this paper.