• Title/Summary/Keyword: Document searching

170 search results (processing time: 0.025 seconds)

The Personal Information Management Practices of the Graduates of the Department of Information Studies at Kuwait University

  • AlRukaibani, Bashaer;Chaudhry, Abdus Sattar
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.9 no.1
    • /
    • pp.19-42
    • /
    • 2019
  • This study examined activities involving the finding, organizing, managing, and re-finding of information by knowledge workers in Kuwait. This research also reviewed the tools needed for effective personal information management (PIM) and assessed perceptions about improving PIM through Internet use. Data were collected through semi-structured interviews conducted among 26 graduates of the Department of Information Studies at Kuwait University. These participants are currently employed in different sectors and engaged in a variety of information-related activities. The study's findings indicated that participants gathered different types of information from a variety of sources. This information was stored using several devices and services, including desktop computers, shared drives, cloud storage, bookmarked websites, e-mail correspondence, and favorites lists. Participants organized information in personal folders according to categories such as subject/topic, time, project, document type, and geographical region. Preferred methods for re-finding information included searching by keyword and browsing through folders. Interviewees reported problems of information overload, fragmentation, and anxiety. Most were active in social media via mobile devices, while some used Siri or Ask Google to retrieve information. Tools used for PIM included calendars, task lists, schedules, e-mail management tools, cloud services, and social networking tools. Participants reported that the Internet helped with personal information management practices, but that some privacy issues arose in this context.

Design and Implementation of Web Crawler utilizing Unstructured data

  • Tanvir, Ahmed Md.;Chung, Mokdong
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.3
    • /
    • pp.374-385
    • /
    • 2019
  • A web crawler is a program commonly used by search engines to discover new content on the Internet, and crawlers have made the web easier for users to navigate. In this paper, we collect data from web pages by giving structure to unstructured data. Our system can select words near a given keyword across multiple documents in an unstructured setting; neighboring words of the keyword were collected through word2vec. The system filters data at the acquisition level to support a large taxonomy. The main problem in text taxonomy is how to improve classification accuracy. To improve accuracy, we propose a new TF-IDF weighting method, modifying the TF algorithm to better handle unstructured data. Finally, we propose an efficient web-page crawling algorithm, derived from TF-IDF and the RL web search algorithm, to enhance the retrieval of relevant information. This paper examines the nature and workings of crawlers and crawling algorithms in search engines for efficient information retrieval.
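The abstract does not give the modified TF-IDF formula, but the standard TF-IDF weighting it builds on can be sketched in a few lines. A minimal illustration with made-up tokenized documents, not the authors' modified algorithm:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Standard TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of docs containing t
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                # document frequency per term
    weights = []
    for doc in docs:
        counts, total = Counter(doc), len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in counts.items()})
    return weights

docs = [["web", "crawler", "search"],
        ["web", "page", "index"],
        ["crawler", "index", "search"]]
w = tf_idf(docs)
```

A rarer term such as "page" receives a higher weight than one shared by most documents; the paper's modification targets the TF component, which corresponds to the `c / total` factor here.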

A Literature Review of Aromatherapy Used in Stress Relief (스트레스 완화 목적의 아로마 요법에 관한 문헌고찰)

  • Kim, Hyeon-Jin;Jeong, Soo-Hyun;Jeong, Hye-In;Kim, Kyeong Han
    • Journal of Society of Preventive Korean Medicine
    • /
    • v.25 no.2
    • /
    • pp.45-60
    • /
    • 2021
  • Objective : This study aimed to review randomized controlled trials (RCTs) on whether aromatherapy relieves stress. Method : We searched for documents using keywords such as 'Aroma', 'Oil', and 'Stress'. The study included 24 RCTs selected from a total of 167 studies retrieved from Korean journals through OASIS, ScienceON, KISS, and RISS. Studies in which aromatherapy was not performed as a standalone intervention were excluded. Results : We obtained 24 domestic studies meeting the criteria. Of the 24 studies, 14 targeted students and 6 targeted patients receiving hospital treatment. Among the 7 delivery methods, dry inhalation was used 13 times and necklace inhalation 9 times. Of the 24 studies, lavender oil was used 19 times and sweet orange 4 times. Among the 28 types of measuring instruments used, 10 related to the autonomic nervous system, and the STAI and VAS were each used 8 times. Conclusion : Aromatherapy appears to be effective in relieving stress. Further research is needed on effective oil-mixing methods, methods for measuring subjective stress, multimodal interventions, and effective intervention periods.

Performance Improvement on Similar Texts Searching System for Massive Document Repository (대용량 문서 집합에서 유사문서 탐색 시스템의 성능 개선)

  • Park, Sun-Young;Cho, Hwan-Gue
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.04a
    • /
    • pp.413-416
    • /
    • 2010
  • Due to numerous recent plagiarism controversies, many similar-document detection systems have been developed and put to use. Among them, DeVAC, a content-based similar-document detection system, performs 1:1 comparisons of large documents quickly, but does not perform adequately on collections of thousands to tens of thousands of documents. To address this, a preprocessing method using a global dictionary was devised and applied. This preprocessing was shown to reduce the number of document pairs to compare and to improve overall system performance; however, the additional cost incurred by the preprocessing had not been measured, and the experiments measuring the reduction in document pairs were mostly conducted on experimental linguistic data (corpora), so performance on real data could not be accurately predicted. In this paper, we measure all additional costs the preprocessing imposes on the whole system, and estimate its reliability on real data by measuring the performance improvement on an actual document collection of 1.5 GB comprising 6,263 documents. Experimental results show that, after preprocessing, the number of similar document pairs found was maintained at 80-89.3% of that without preprocessing, while scan time was drastically reduced to 10.8-15.4% of the original.
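The global-dictionary preprocessing described above reduces the number of document pairs that need a full 1:1 comparison. A minimal sketch of that idea, assuming (as an illustration only, not the DeVAC implementation) that a pair is kept exactly when the two documents share a minimum number of dictionary terms:

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(docs, min_shared=2):
    """Keep only document pairs sharing at least `min_shared` terms,
    pruning the O(n^2) space of full 1:1 comparisons."""
    index = defaultdict(set)               # term -> ids of docs containing it
    for i, terms in enumerate(docs):
        for t in set(terms):
            index[t].add(i)
    shared = defaultdict(int)              # (i, j) -> number of shared terms
    for ids in index.values():
        for i, j in combinations(sorted(ids), 2):
            shared[i, j] += 1
    return {pair for pair, k in shared.items() if k >= min_shared}

docs = [["plagiarism", "detect", "text"],
        ["plagiarism", "detect", "image"],
        ["image", "render"]]
pairs = candidate_pairs(docs)  # only the pair (0, 1) survives the pruning
```

Only surviving pairs would then go through the expensive full comparison, which is where the reported 10.8-15.4% scan-time figure comes from.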

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.111-125
    • /
    • 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. By this process, clustering facilitates fast and correct search for relevant documents by narrowing the range of searching to the collection of documents belonging to related clusters. For effective clustering, techniques are required for identifying similar documents and grouping them into a cluster, and for discovering the concept most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. In order to solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional agglomerative hierarchical clustering algorithm to allow overlapping clusters at the same level of the concept hierarchy. The HOC algorithm represents the clustering result not by a tree but by a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm to carry out complex concept detection. This system operates in three phases: 1) preprocessing of documents, 2) clustering using the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents each document as x-y coordinate values in a 2-dimensional space by considering the weights of the terms appearing in the document.
First, a refinement process applies stopword removal and stemming to extract index terms. Each index term is then assigned a TF-IDF weight, and the x-y coordinate value for each document is determined by combining the TF-IDF values of its terms. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated by the Euclidean distance. Initially, a cluster is generated for each document by grouping the documents closest to it. Then the distance between any two clusters is measured, and the closest clusters are merged into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, feature selection is applied to check whether the cluster concepts built by the HOC algorithm have meaningful hierarchical relationships. Feature selection extracts key features from a document by identifying and weighting its important and representative terms. To select key features correctly, a method is needed to determine how much each term contributes to the class of a document. Among several methods achieving this goal, this paper adopted the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm greatly contributes to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.
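The chi-square statistic used in the validation phase measures the dependency between a term t and a class c from a 2x2 contingency table of document counts; a standard formulation (the counts below are illustrative, not from the paper's data):

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square dependency of term t on class c from 2x2 counts:
    n11: docs in c with t,    n10: docs not in c with t,
    n01: docs in c without t, n00: docs not in c without t."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

independent = chi_square(5, 5, 5, 5)   # t says nothing about c -> 0.0
dependent = chi_square(10, 0, 0, 10)   # t occurs exactly in class c
```

A term independent of the class scores 0; the more the presence of t predicts membership in c, the larger the value, which is what makes it usable as a feature-selection weight.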

Representation of ambiguous word in Latent Semantic Analysis (LSA모형에서 다의어 의미의 표상)

  • 이태헌;김청택
    • Korean Journal of Cognitive Science
    • /
    • v.15 no.2
    • /
    • pp.23-31
    • /
    • 2004
  • Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) is a technique for representing the meanings of words using co-occurrence information of words appearing in the same context, which is usually a sentence or a document. In LSA, a word is represented as a point in a multidimensional space where each axis represents a context, and a word's meaning is determined by its frequency in each context. The space is reduced by singular value decomposition (SVD). The present study elaborates upon LSA for the representation of ambiguous words. The proposed LSA applies rotation of axes in the document space, which makes it possible to interpret the meaning of un. A simulation study was conducted to illustrate the performance of LSA in representing ambiguous words. In the simulation, texts containing an ambiguous word were first extracted and LSA with rotation was performed. By comparing loading matrices, we categorized the texts according to meanings. The first meaning of an ambiguous word was represented by LSA with the matrix excluding the vectors for the other meanings; the other meanings were represented in the same way. The simulation showed that this way of representing an ambiguous word can identify the meanings of the word. This result suggests that LSA with axis rotation can be applied to the representation of ambiguous words. We discuss how the use of rotation makes it possible to represent the multiple meanings of ambiguous words, and how this technique can be applied in the area of web searching.
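Before the SVD step, LSA starts from a raw word-by-context frequency matrix, and word similarity can be read off as the cosine between row vectors. A minimal sketch of that starting representation (the SVD reduction and the axis rotation proposed in this paper are omitted; the contexts are invented):

```python
import math
from collections import Counter

def word_vectors(contexts):
    """Each word becomes a vector of its frequencies per context --
    the raw word-by-context matrix that LSA reduces with SVD."""
    vocab = sorted({w for c in contexts for w in c})
    return {w: [Counter(c)[w] for c in contexts] for w in vocab}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

contexts = [["bank", "money", "loan"],
            ["bank", "river", "water"],
            ["money", "loan", "rate"]]
vecs = word_vectors(contexts)
# "money" and "loan" share contexts; "river" and "rate" share none
```

Note that "bank" occurs in both a financial and a river context here; separating those two senses is exactly the kind of ambiguity the paper's axis rotation is meant to expose.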


Keyword-based networked knowledge map expressing content relevance between knowledge (지식 간 내용적 연관성을 표현하는 키워드 기반 네트워크형 지식지도 개발)

  • Yoo, Keedong
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.119-134
    • /
    • 2018
  • A knowledge map, as the taxonomy used in a knowledge repository, should be structured to support and supplement the knowledge activities of users who sequentially inquire about and select knowledge for problem solving. The conventional knowledge map with a hierarchical structure has the advantage of systematically sorting out the types and status of the knowledge to be managed; however, it is not only irrelevant to the knowledge user's process of cognition and utilization, but also incapable of supporting the user's activity of querying and extracting knowledge. This study suggests a methodology for constructing a networked knowledge map that can support and reinforce referential navigation between knowledge items, that is, searching for and selecting knowledge that is related and chained in terms of content. Regarding a keyword as the semantic information between knowledge items, the proposed networked knowledge map can be constructed by aggregating sets of knowledge links in an automated manner. Since a keyword represents the contents of a document, documents with common keywords are similar in content, and the keyword-based document network therefore plays the role of a map expressing interactions between related knowledge. To examine the feasibility of the proposed methodology, 50 research papers were randomly selected, and an exemplary networked knowledge map expressing content relevance between them was implemented using common keywords.
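The keyword-based construction can be sketched directly: documents become nodes, and an edge links any two documents whose keyword sets overlap. A minimal illustration with hypothetical document identifiers and keyword sets (the paper's link-aggregation details are not given in the abstract):

```python
from itertools import combinations

def knowledge_map(papers, min_common=1):
    """Networked knowledge map: an edge links two documents that share
    at least `min_common` keywords; the edge carries those keywords."""
    edges = {}
    for (a, ka), (b, kb) in combinations(sorted(papers.items()), 2):
        common = ka & kb
        if len(common) >= min_common:
            edges[a, b] = common
    return edges

papers = {"doc1": {"ontology", "search"},
          "doc2": {"search", "ranking"},
          "doc3": {"privacy"}}
edges = knowledge_map(papers)  # doc1 -- doc2 via the shared keyword "search"
```

The resulting edge set is the networked map: following edges from a selected document is the referential navigation the abstract describes.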

Analysis on Research Trends and Proposal for Standardization of Construction & Architectural Terms in Korea (국내 건설·건축용어 연구의 동향 분석 및 표준화 제안)

  • Park, Eunha;Jeon, Jinwoo
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.5
    • /
    • pp.620-629
    • /
    • 2015
  • As the construction industry becomes bigger and more complicated, standardization of terms should be established between the academic and industrial fields in order to accumulate and share information and technology. The aim of this study is to investigate and analyze the research trends and actual usage of construction and architectural terms in Korea. For this purpose, we examined research related to construction and architectural terms by searching RISS up to August 2014, and analyzed document types and research contents by year. As a result, 130 studies related to construction and architectural terms were found. Among document types, glossaries rank the highest, followed by academic journal papers, master's theses, and research reports. Research related to construction and architectural terms began in 1939 and was most actively pursued between the mid-1980s and the mid-1990s. Within this research, lists of, and opinions on, construction and architectural terms appear most frequently, followed by standardization, analysis, alteration, dictionaries and wordbooks, and term search systems. Despite these efforts, standardization of terms has not yet been consolidated between the academic and industrial fields. Therefore, we make six proposals for standardizing the terms. This study attempts to capture the trends and conditions of construction and architectural terms and to provide baseline data and insights for future research.

Model Design and Applicability Analysis of Interactive Electronic Technical Manual for Planning Stage of Construction Projects (건설공사 기획단계 전자매뉴얼의 적용 모형 구성 및 효과 분석)

  • Kwak, Joong-Min;Kang, Leen-Seok
    • Land and Housing Review
    • /
    • v.12 no.2
    • /
    • pp.121-139
    • /
    • 2021
  • Technical documents in the construction field are changing from paper to electronic documents. As a result, the industry shows a trend of using portable electronic devices to search for or retrieve necessary information such as relevant regulations. Despite improved accessibility to general technical documents, a limitation remains in accessing electronic documents on regulations, which is a barrier for field engineers seeking to enhance their technical knowledge. One major barrier is that videos, animations, and virtual reality information that could enhance the visual understanding of technical content related to regulations are not linked. The interactive electronic technical manual (IETM) can address such issues. The IETM is an electronic document system that enables real-time information acquisition, operating in the form of conversations with users by linking multimedia functions to document types such as specifications and guidelines. This study establishes a model of an IETM that can be operated in the planning stage of a construction project and verifies its usability with a hypothetical case study. The study aims to improve the usability of the IETM in construction projects by analyzing its application effects using the AHP technique.

The Effectiveness of Hierarchic Clustering on Query Results in OPAC (OPAC에서 탐색결과의 클러스터링에 관한 연구)

  • Ro, Jung-Soon
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.38 no.1
    • /
    • pp.35-50
    • /
    • 2004
  • This study evaluated the applicability of the static hierarchic clustering model to clustering query results in OPAC. Two clustering methods (between-groups average linkage (BAL) and complete linkage (CL)) and two similarity coefficients (Dice and Jaccard) were tested on the query results retrieved from 16 title-based keyword searches. The precision of optimal clusters was improved by more than 100% compared with title-word searching. Optimal cluster effectiveness differed between the clustering methods but not between the similarity coefficients. The CL method is better in precision, while BAL is better in recall, at the optimal top-level and bottom-level clusters; however, the differences are not significant except for the higher recall of BAL at the top-level cluster. The small number of clusters and the long hierarchy chain for the optimal cluster produced by BAL may be neither desirable nor efficient.
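The two similarity coefficients compared here, Dice and Jaccard, are simple set-overlap measures over, for example, title keywords; a minimal sketch (the keyword sets are invented):

```python
def dice(a, b):
    """Dice coefficient: 2|A n B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def jaccard(a, b):
    """Jaccard coefficient: |A n B| / |A u B|."""
    return len(a & b) / len(a | b) if a or b else 0.0

a = {"opac", "cluster", "query"}
b = {"cluster", "query", "rank", "sort"}
d, j = dice(a, b), jaccard(a, b)  # 4/7 and 2/5 for these sets
```

The two coefficients rank pairs almost identically (Dice is a monotone function of Jaccard), which is consistent with the finding above that the choice of coefficient made no significant difference.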