• Title/Summary/Keyword: Amount of relevant information

Search Result 140

Shannon's Information Theory and Document Indexing (Shannon의 정보이론과 문헌정보)

  • Chung Young Mee
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.6
    • /
    • pp.87-103
    • /
    • 1979
  • Information storage and retrieval is part of the general communication process. In Shannon's information theory, the information contained in a message is a measure of uncertainty about the information source, and the amount of information is measured by entropy. Indexing is a process of reducing the entropy of the information source, since the document collection is divided into many smaller groups according to the subjects the documents deal with. Significant concepts contained in every document are mapped into the set of all sets of index terms; thus the index itself is formed by paired sets of index terms and documents. Without indexing, the entropy of a document collection consisting of $N$ documents is $\log_2 N$, whereas the average entropy of the smaller groups $(W_1, W_2, \ldots, W_m)$ is as small as $(\sum_{i=1}^{m} H(W_i))/m$. Retrieval efficiency is a measure of an information system's performance, which is largely affected by the goodness of the index. If all and only the documents evaluated as relevant to the user's query can be retrieved, the information system is said to be 100% efficient. A document file $W$ may be potentially classified into two sets of documents, relevant and non-relevant to a specific query. After retrieval, the document file $W'$ is reclassified into four sets: relevant-retrieved, relevant-not retrieved, non-relevant-retrieved, and non-relevant-not retrieved. It is shown in the paper that the difference between the two entropies of document file $W$ and document file $W'$ is a proper measure of retrieval efficiency.

  • PDF
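The entropy reduction that the abstract above attributes to indexing can be checked numerically. This is an illustrative sketch, not code from the paper; the collection size and group sizes are invented, and groups are treated as sets of equiprobable documents so that $H(W_i) = \log_2 |W_i|$.

```python
import math

def collection_entropy(n_docs):
    # Entropy of an unindexed collection of N equiprobable documents: log2 N
    return math.log2(n_docs)

def average_group_entropy(group_sizes):
    # Average entropy of m subject groups W_1..W_m: (sum of H(W_i)) / m
    entropies = [math.log2(n) for n in group_sizes]
    return sum(entropies) / len(entropies)

# A hypothetical 64-document collection split into 4 subject groups of 16:
before = collection_entropy(64)            # log2 64 = 6.0 bits
after = average_group_entropy([16] * 4)    # log2 16 = 4.0 bits
```

Any partition into smaller groups gives `after < before`, which is the sense in which indexing reduces the entropy of the information source.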

An Analysis on the Factors Affecting Online Search Effect (온라인 정보탐색의 효과변인 분석)

  • Kim Sun-Ho
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.22
    • /
    • pp.361-396
    • /
    • 1992
  • The purpose of this study is to verify the correlations between the amount of online searchers' search experience and their search effect. To achieve this purpose, 28 online searchers working at the chosen libraries and information centers participated in the study as subjects. The subjects were classified into two types of cognitive style by the Group Embedded Figure Test (GEFT). As a result of the GEFT, two groups were identified: 15 Field Independence (FI) searchers and 13 Field Dependence (FD) searchers. A subject's search experience consists of 3 elements: disciplinary, training, and working experience. To obtain data on these empirical elements, a questionnaire was sent to the 28 subjects. An online searching request form prepared by a practical user was sent to all subjects, who conducted searches of overseas databases through Dialog to retrieve what was requested. The resultant outcomes were collected and sent back to the user, who evaluated the relevance and pertinence of the search effect. In this study, the search effect has been divided into relevance and pertinence. Relevance was subdivided into 3 elements: the number of relevant documents, recall ratio, and the cost per relevant document. Pertinence was subdivided into 3 elements: the number of pertinent documents, utility ratio, and the cost per pertinent document. The correlations between the 3 elements of the subjects' experience and the 6 elements of the search effect were analysed for the FI and FD searchers separately. At the 0.01 significance level, the findings and conclusions of the study are summarised as follows: 1. There are strong correlations between the amount of training and the recall ratio, the number of pertinent documents, and the utility ratio on the part of FI searchers. 2. There are strong correlations between the amount of working experience and the number of relevant documents and the recall ratio on the part of FD searchers. However, there is also a significant inverse correlation between the amount of working experience and the search cost per pertinent document on the part of FD searchers. 3. The amount of working experience has stronger correlations with the number of pertinent documents and the utility ratio on the part of FD searchers than the amount of training. 4. There is a strong correlation between the amount of training and pertinence on the part of both FI and FD searchers.

  • PDF
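The correlations reported in the abstract above are Pearson product-moment correlations between experience measures and search-effect measures. A minimal pure-Python sketch of that computation, with entirely hypothetical data (the paper's raw data are not given here):

```python
def pearson_r(xs, ys):
    # Pearson product-moment correlation between two equal-length samples
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: months of training vs. recall ratio for five searchers
training = [2, 5, 8, 12, 15]
recall = [0.35, 0.48, 0.55, 0.70, 0.81]
r = pearson_r(training, recall)  # near +1 for a strong positive correlation
```

A study like this one would compute such an `r` for each experience/effect pair and test it against the 0.01 significance level.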

An Analysis of the Effect of an Ontology-Based Information Searching Model as a Supplementary Learning Tool (학습 보조 도구로서 온톨로지 검색 모델의 효과 분석)

  • Choi, Sook-Young
    • The Journal of Korean Association of Computer Education
    • /
    • v.14 no.1
    • /
    • pp.159-168
    • /
    • 2011
  • This study analyzed whether an ontology-based information-searching model affected the ability of students to effectively search for meaningful information to carry out their projects. The experimental results illustrated that the amount of relevant information found by the ontology-based information retrieval (OIR) method was significantly greater than that of the existing information retrieval (EIR) method. In addition, the relevance rate of the bookmarked documents found by the OIR method was significantly greater than that of the EIR method. Interviews showed that the OIR model helped students find information effectively and thus complete their projects more easily. Furthermore, the OIR model helped them understand the subordinate concepts of an important learning concept and the relationships among those concepts. The results of this study indicate that the OIR model could be used as a supplementary learning tool for project-based learning.

  • PDF

Merchandise Searching Interface using Color Information (색상정보를 이용한 상품검색 인터페이스)

  • Yoo, Eun-Kyung;Kang, Ki-Hyun;Yun, Yong-In;Choi, Jong-Soo
    • Proceedings of the HCI Society of Korea Conference
    • /
    • 2008.02a
    • /
    • pp.722-727
    • /
    • 2008
  • As computer technology and the Internet industry grow, we can buy all kinds of merchandise very easily. However, finding the relevant merchandise we want is a difficult process given the many Internet shopping malls and the great amount of merchandise. Searching for merchandise with keywords produces a long list within a limited category, but we cannot be sure whether the search results are relevant or not. For this reason, we propose an interface that uses the color information of goods to search for merchandise effectively.

  • PDF
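The abstract above does not specify how color information is matched; a common baseline for such interfaces is comparing coarse color histograms of product images. The sketch below assumes that approach (it is not the paper's method): images are reduced to quantized RGB histograms and compared by histogram intersection.

```python
def color_histogram(pixels, bins_per_channel=4):
    # Quantize RGB pixels (0-255 per channel) into a coarse joint histogram
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    # Similarity in [0, 1]; 1.0 means identical color distributions
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy "images": a mostly-red product and a mostly-blue product
red_item = color_histogram([(250, 10, 10)] * 50)
blue_item = color_histogram([(10, 10, 250)] * 50)
sim = histogram_intersection(red_item, blue_item)  # 0.0: disjoint colors
```

Ranking candidate products by `histogram_intersection` against a user-selected reference color is one simple way such a color-based search interface could work.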

Factors Influencing the Knowledge Adoption of Mobile Game Developers in Online Communities: Focusing on the HSM and Data Quality Framework

  • Jong-Won Park;Changsok Yoo;Sung-Byung Yang
    • Asia pacific journal of information systems
    • /
    • v.30 no.2
    • /
    • pp.420-438
    • /
    • 2020
  • Recently, with the advance of wireless Internet access via mobile devices, a myriad of game development companies have forayed into the mobile game market, leading to intense competition. To survive this fierce competition, mobile game developers often try to grasp the rapidly changing needs of their customers by operating their own official communities, where game users freely leave requests, suggestions, and ideas relevant to the focal games. Based on the heuristic-systematic model (HSM) and the data quality (DQ) framework, this study derives key content, non-content, and hybrid cues that can be utilized when game developers accept suggested postings in these online communities. The results of hierarchical multiple regression analysis show that relevancy, timeliness, amount of writing, and the number of comments are positively associated with mobile game developers' knowledge adoption. In addition, title attractiveness mitigates the relationships between the amount of writing and the number of comments, on the one hand, and knowledge adoption, on the other.
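Hierarchical multiple regression, as used in the study above, enters predictors in blocks and examines the incremental variance explained. A minimal NumPy sketch with simulated data (the variable names and effect sizes are invented for illustration; this is not the paper's analysis):

```python
import numpy as np

def r_squared(X, y):
    # R^2 of an ordinary least-squares fit with an intercept column prepended
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
n = 200
relevancy = rng.normal(size=n)   # hypothetical content cue
amount = rng.normal(size=n)      # hypothetical non-content cue: amount of writing
adoption = 0.6 * relevancy + 0.4 * amount + rng.normal(scale=0.5, size=n)

# Step 1: content cue only; Step 2: add the non-content cue as a second block
r2_step1 = r_squared(np.column_stack([relevancy]), adoption)
r2_step2 = r_squared(np.column_stack([relevancy, amount]), adoption)
delta_r2 = r2_step2 - r2_step1  # incremental variance explained by the new block
```

The significance of `delta_r2` (via an F-test, not shown) is what a hierarchical regression reports for each added block of cues.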

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.1-17
    • /
    • 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research can be classified into studies using structured data and studies using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually took technical-analysis and fundamental-analysis approaches. In the big data era, the amount of information has rapidly increased, and artificial intelligence methodologies that can extract meaning by quantifying textual information, an unstructured data type that accounts for a large share of that information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining to stock price forecasting. The methodology adopted in many papers is to forecast a target company's stock price with news about that company. However, according to previous research, not only does news about a target company affect its stock price, but news about related companies can also affect it. Finding highly relevant companies is not easy, however, because of market-wide effects and random signals. Thus, existing studies have found highly relevant companies primarily through pre-determined international industry classification standards. Yet, according to recent research, the Global Industry Classification Standard has varying homogeneity within sectors, so forecasting stock prices by taking all same-sector firms together, rather than only truly relevant companies, can adversely affect predictive performance. To overcome this limitation, we are the first to combine random matrix theory with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable, because statistical efficiency is reduced. Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering, we use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel is assigned to predict stock prices with features from the financial news of the target firm or one of its relevant firms. The results of this paper are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) When looking for relevant companies, looking in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory performs better than previous studies when cluster analysis is based on the true correlation, with market-wide effects and random signals removed. The contributions of this study are as follows. First, this study shows that random matrix theory, used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt physics theory. It extends existing research that integrated artificial intelligence with complex-system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important not only to study artificial intelligence algorithms but also to theoretically adjust the input values. Third, we confirmed that firms classified together under the Global Industry Classification Standard (GICS) may have low relevance, and suggested that it is necessary to define relevance theoretically rather than simply taking it from the GICS.
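A standard form of the random-matrix filtering described above compares the eigenvalues of the empirical correlation matrix to the Marchenko-Pastur upper edge $\lambda_+ = (1 + \sqrt{N/T})^2$; eigenmodes below that edge are treated as random noise. The sketch below assumes that textbook formulation with simulated returns and one planted common factor; it is not the paper's code.

```python
import numpy as np

def rmt_filter(returns):
    # returns: T x N matrix of asset returns (rows are time, columns assets)
    T, N = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    # Marchenko-Pastur upper edge for a purely random correlation matrix
    q = N / T
    lambda_max = (1 + np.sqrt(q)) ** 2
    vals, vecs = np.linalg.eigh(corr)
    # Keep only eigenmodes above the random band; they carry true structure
    keep = vals > lambda_max
    filtered = (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T
    np.fill_diagonal(filtered, 1.0)
    return filtered, int(keep.sum())

rng = np.random.default_rng(1)
T, N = 500, 50
noise = rng.normal(size=(T, N))
market = rng.normal(size=(T, 1))         # one common "market-wide" factor
returns = noise + 0.5 * market           # every asset loads on the factor
filtered, n_modes = rmt_filter(returns)  # the market mode survives the filter
```

In the study's setup, the largest surviving eigenmode (the market) would also be removed before clustering, so that the remaining modes reflect group structure among related companies.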

A Neuro-Fuzzy Inference System for Sensor Failure Detection Using Wavelet Denoising, PCA and SPRT

  • Na, Man-Gyun
    • Nuclear Engineering and Technology
    • /
    • v.33 no.5
    • /
    • pp.483-497
    • /
    • 2001
  • In this work, a neuro-fuzzy inference system combined with wavelet denoising, PCA (principal component analysis), and SPRT (sequential probability ratio test) methods is developed to detect a relevant sensor failure using other sensor signals. The wavelet denoising technique is applied to remove noise components from the input signals to the neuro-fuzzy system. The PCA is used to reduce the dimension of the input space without losing a significant amount of information, which simplifies the selection of input signals to the neuro-fuzzy system. A lower-dimensional input space also usually reduces the time necessary to train a neuro-fuzzy system. The parameters of the neuro-fuzzy inference system that estimates the relevant sensor signal are optimized by a genetic algorithm and a least-squares algorithm. The residuals between the estimated and measured signals are used to detect whether the sensors have failed; the SPRT is used in this failure detection algorithm. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level and hot-leg flowrate sensors in pressurized water reactors.

  • PDF
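The SPRT on residuals described above accumulates a log-likelihood ratio and compares it to Wald's thresholds. A minimal sketch under assumed Gaussian hypotheses (healthy mean 0, failed mean 1, unit variance) and invented residuals; the paper's actual parameters are not given here.

```python
import math

def sprt_step(stat, residual, mu0=0.0, mu1=1.0, sigma=1.0):
    # One SPRT update: log-likelihood ratio of "failed" (mean mu1)
    # vs. "healthy" (mean mu0) for a Gaussian residual
    stat += ((residual - mu0) ** 2 - (residual - mu1) ** 2) / (2 * sigma ** 2)
    return stat

def sprt_decide(stat, alpha=0.05, beta=0.05):
    # Wald's thresholds from the target false-alarm / missed-alarm rates
    upper = math.log((1 - beta) / alpha)   # cross upward -> declare failed
    lower = math.log(beta / (1 - alpha))   # cross downward -> declare healthy
    if stat >= upper:
        return "failed"
    if stat <= lower:
        return "healthy"
    return "continue"

# Residuals (estimate minus measurement) drifting around 1.0: a failed sensor
stat = 0.0
for r in [0.9, 1.1, 1.0, 0.8, 1.2, 1.05, 0.95, 1.1]:
    stat = sprt_step(stat, r)
decision = sprt_decide(stat)  # "failed"
```

Between the two thresholds the test simply keeps sampling, which is what makes the SPRT sequential.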

A Knowledge-Based Intelligent Information Agent for Animal Domain (동물 영역 지식 기반의 지능형 정보 에이전트)

  • 이용현;오정욱;변영태
    • Korean Journal of Cognitive Science
    • /
    • v.10 no.1
    • /
    • pp.67-78
    • /
    • 1999
  • Information providers on the WWW have been increasing rapidly, and they provide a vast amount of information in various fields. For this reason, it has become hard for users to get the information they want. Although there are several search engines that help users with keyword-matching methods, it is not easy to find suitable keywords. To solve these problems within a specific domain, we propose an intelligent information agent (HHA-la: HongIk Information Agent) that converts users' queries into forms including related domain words, so as to represent the user's intention as much as possible, and provides the necessary domain information to users. HHA-la has an ontological knowledge base of the animal domain, supplies the necessary information in response to queries from users and other agents, and provides relevant web page information. One of the system's components is a WebDB, which indexes web pages relevant to the animal domain. The system also supplies new operators by which users can represent their intent more clearly, and it has a learning mechanism that uses accumulated results and user feedback to behave more intelligently. We implement the system and show the effectiveness of the information agent by presenting experimental results in this paper.

  • PDF
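The query conversion described in the abstract above amounts to ontology-based query expansion: each query term is enriched with related domain words before matching. A toy sketch with an invented animal-domain ontology (the agent's real knowledge base and operators are not shown in the abstract):

```python
# A toy animal-domain ontology: each concept maps to its related terms
ontology = {
    "feline": ["cat", "lion", "tiger"],
    "cat": ["feline", "kitten", "pet"],
    "canine": ["dog", "wolf", "fox"],
}

def expand_query(terms, ontology):
    # Expand each query term with its ontology neighbors, preserving order
    # and skipping duplicates, so the expanded query better reflects intent
    expanded = []
    for term in terms:
        if term not in expanded:
            expanded.append(term)
        for related in ontology.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded

query = expand_query(["cat"], ontology)
# ["cat", "feline", "kitten", "pet"]
```

An agent can then match the expanded term list against its indexed web pages (the WebDB in this system) instead of relying on the user's original keyword alone.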

A Differential Data Replicator in Distributed Environments

  • Lee, Wookey;Park, Jooseok;Sukho Kang
    • The Journal of Information Technology and Database
    • /
    • v.3 no.2
    • /
    • pp.3-24
    • /
    • 1996
  • In this paper, a data replicator scheme with a distributed join architecture is suggested, together with its cost functions and performance results. The contribution of this scheme is not only minimizing the number of base-relation locks on distributed database tables but also remarkably reducing the amount of remote transmission, which can make the distributed database system practical. The differential files derived from the active log of the DBMS are what mainly enable the scheme to reduce the number of base-relation locks. The amount of data transported between the relevant sites can be curtailed by the tuple-reduction procedures. We then present an algorithm for the data replicator with its cost function and show performance results compared with the semi-join scheme in distributed environments.

  • PDF
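The core idea above, shipping a differential file of changes instead of the whole base relation, can be sketched as replaying a keyed operation log against a replica. This is an illustrative model only; the paper's actual log format, locking, and tuple-reduction procedures are not reproduced here.

```python
def apply_differential(replica, log):
    # Apply a differential file (an ordered log of insert/update/delete
    # operations keyed by primary key) to a replica table
    for op, key, value in log:
        if op == "delete":
            replica.pop(key, None)
        else:  # "insert" and "update" both overwrite the keyed tuple
            replica[key] = value
    return replica

# Only the changed tuples cross the network, not the whole base relation
replica = {1: "alpha", 2: "beta", 3: "gamma"}
log = [("update", 2, "BETA"), ("delete", 3, None), ("insert", 4, "delta")]
replica = apply_differential(replica, log)
# {1: "alpha", 2: "BETA", 4: "delta"}
```

Because the log is derived from the DBMS's active log, the base relation need not be locked and re-scanned for each refresh, which is the source of the scheme's savings.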

A Context-Awareness Modeling User Profile Construction Method for Personalized Information Retrieval System

  • Kim, Jee Hyun;Gao, Qian;Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.2
    • /
    • pp.122-129
    • /
    • 2014
  • Effective gathering and retrieval of the web documents most relevant to a topic of interest is difficult due to the large amount of information that exists in various formats. Current information gathering and retrieval techniques are unable to exploit the semantic knowledge within documents in the "big data" environment; therefore, they cannot provide precise answers to specific questions. Existing commercial big data analytic platforms are restricted to a single data type; moreover, different big data analytic platforms are effective at processing different data types. Therefore, the development of a common big data platform suitable for efficiently processing various data types is needed. Furthermore, users often possess more than one intelligent device, so it is important to find an efficient preference-profile construction approach that records the user's context for personalized applications. In this way, results can be tailored to the user's dynamic interests by tracking all devices the user owns.