• Title/Summary/Keyword: Text Search


Methodology Using Text Analysis for Packaging R&D Information Services on Pending National Issues (텍스트 분석을 활용한 국가 현안 대응 R&D 정보 패키징 방법론)

  • Hyun, Yoonjin;Han, Heejun;Choi, Heeseok;Park, Junhyung;Lee, Kyuha;Kwahk, Kee-Young;Kim, Namgyu
    • Journal of Information Technology Applications and Management / v.20 no.3_spc / pp.231-257 / 2013
  • The recent rise in unstructured data generated by social media has created an increasing need to collect, store, search, analyze, and visualize such data. These data cannot be managed effectively with traditional data analysis methodologies because of their vast volume and unstructured nature. Therefore, many attempts are being made to analyze unstructured data, such as text files and log files, using commercial and noncommercial analytical tools. In particular, attempts to discover meaningful knowledge through text mining are being made in business and in other areas such as politics, economics, and cultural studies. For instance, several studies have examined pending national issues by analyzing large volumes of texts on various social issues. However, it is difficult to create satisfactory information services that can identify R&D documents on specific national issues from among the various R&D resources. In other words, even when users specify words related to pending national issues as search keywords, they usually fail to retrieve the R&D information they are looking for. This is usually because of the discrepancy between the terms defining pending national issues and the corresponding terms used in R&D documents. A mediating logic is needed to overcome this discrepancy so that appropriate R&D information on specific pending national issues can be identified and packaged. In this paper, we use association analysis and social network analysis to devise a mediator that bridges the gap between the keywords defining pending national issues and those used in R&D documents. Further, we propose a methodology for packaging R&D information services for pending national issues by using the devised mediator. Finally, in order to evaluate the practical applicability of the proposed methodology, we apply it to the NTIS (National Science & Technology Information Service) system and summarize the results in the case study section.
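
The abstract does not spell out how the association-based mediator is computed; as a rough, hypothetical sketch of the kind of association analysis it describes, the following Python snippet scores candidate R&D keywords against an issue keyword by support, confidence, and lift over per-document keyword sets. All data, names, and thresholds are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical keyword sets extracted from R&D documents (one set per document).
documents = [
    {"fine dust", "air quality", "aerosol sensor"},
    {"fine dust", "respiratory disease", "air quality"},
    {"aerosol sensor", "IoT monitoring"},
    {"fine dust", "aerosol sensor", "air quality"},
]

def association_scores(issue_keyword, documents, min_support=0.25):
    """Score candidate R&D keywords by how strongly they co-occur with an issue keyword."""
    n_docs = len(documents)
    issue_docs = [d for d in documents if issue_keyword in d]
    if not issue_docs:
        return {}
    candidate_counts = Counter(kw for d in documents for kw in d if kw != issue_keyword)
    scores = {}
    for kw, count in candidate_counts.items():
        co_count = sum(1 for d in issue_docs if kw in d)
        support = co_count / n_docs
        if support < min_support:
            continue
        confidence = co_count / len(issue_docs)   # P(kw | issue keyword)
        lift = confidence / (count / n_docs)      # confidence relative to P(kw)
        scores[kw] = {"support": support, "confidence": confidence, "lift": lift}
    return scores

# Keywords with high confidence and lift could serve as mediating search terms.
print(association_scores("fine dust", documents))
```

In the paper, terms selected in this spirit are further combined with social network analysis over keyword relationships; this sketch covers only the association-scoring step.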

Narrative Inquiry on Student Teacher Searching for Identity as a Teacher (교사로서의 정체성을 형성해가는 교육실습생에 대한 내러티브 탐구)

  • Jin, Hyung Ran;Yoo, Tae Myung
    • Journal of Korean Home Economics Education Association / v.26 no.1 / pp.81-99 / 2014
  • Student teaching is equivalent to an egg just before hatching. There is a growing voice that the teaching profession is not necessarily required as the years go by. I explored the process by which 55 student teachers searched for their identity as teachers during a four-week student teaching program, following Clandinin and Connelly's (2000) narrative inquiry. The procedure consisted of three stages: access to the field, field text writing, and research text writing. The student teachers wrote weekly journals to search for their identity as teachers, focusing on what they observed in the field and how they were motivated by teachers and students. A total of 220 free and candid stories posted in a student-teaching online cafe were collected as field texts. The research text was then relived and retold through poetic writing on each week's themes of exploration, growth, reflection, and pledge to complete the narrative inquiry. An absolute majority of the student teachers, including the home economics student teachers, aimed for the teaching profession and waited for their hatching.


Beach-Lifeguard Considerations for Individuals with Disabilities: A Literature Review (장애인을 위한 해양 라이프가드 고려사항: 문헌연구)

  • Kim, Jaehwa;Kim, Hyemin
    • Journal of the Korea Convergence Society / v.10 no.8 / pp.245-253 / 2019
  • Beach lifeguards in Korea are unprepared to perform rescue and safety management for individuals with disabilities, and no lifeguard training offers information regarding the rescue of individuals with disabilities. The purpose of this study was to conduct a literature review, determine significant issues related to beach lifeguarding, and provide suggestions for lifeguard training programs and water safety for individuals with disabilities. Databases (i.e., CINAHL Plus with Full Text, ERIC, MEDLINE, SPORTDiscus with Full Text) were used to search research articles and organizational documents. To find relevant documents, search terms such as water safety, lifeguard, and drowning prevention were used. Data were content-analyzed to identify key issues. Based on the literature review, five critical issues regarding the rescue of individuals with disabilities, drowning prevention, and water safety were identified and discussed in the article.

A Study on Search Query Topics and Types using Topic Modeling and Principal Components Analysis (토픽모델링 및 주성분 분석 기반 검색 질의 유형 분류 연구)

  • Kang, Hyun-Ah;Lim, Heui-Seok
    • KIPS Transactions on Software and Data Engineering / v.10 no.6 / pp.223-234 / 2021
  • Recent advances of the 4th Industrial Revolution have accelerated the shift in shopping behavior from offline to online. Search queries express customers' information needs most directly in online shopping. However, there has been little research on search queries, and most prior studies have covered limited topics and data, relying on researchers' qualitative judgment. To this end, this study defines search query types through a data-driven quantitative methodology that applies machine learning to the search query field: we derive 15 search query topics by conducting topic modeling on search queries and clicked-document information. Furthermore, we present a new classification scheme of search query types that represents search behavior characteristics, obtained by extracting key variables through principal component analysis. The results of this study are expected to contribute to the establishment of effective search services and the development of search systems.
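
As a minimal, hypothetical illustration of the pipeline the abstract describes (topic modeling followed by principal component analysis), the sketch below runs scikit-learn's LDA with 15 topics over toy query text and then applies PCA to the resulting topic distributions; the toy data, feature choices, and parameters are assumptions, not the authors' setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA

# Hypothetical search queries, possibly concatenated with clicked-document titles.
queries = [
    "wireless earbuds noise cancelling",
    "running shoes men waterproof",
    "wireless charger fast charging",
    "women running shoes lightweight",
]

# Step 1: topic modeling over query text (the paper derives 15 query topics).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(queries)
lda = LatentDirichletAllocation(n_components=15, random_state=0)
topic_dist = lda.fit_transform(X)      # query-by-topic probability matrix

# Step 2: PCA over the topic distributions to extract key variables that
# summarize searching-behavior characteristics for query-type classification.
pca = PCA(n_components=3)
components = pca.fit_transform(topic_dist)
print(pca.explained_variance_ratio_)
```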

Identifying Similar Overseas Patent Using Word2Vec-Based Semantic Text Analytics (Word2Vec 학습을 통한 의미 기반 해외 유사 특허 검색 방안)

  • Paek, Minji;Kim, Namgyu
    • Journal of Information Technology Services / v.17 no.2 / pp.129-142 / 2018
  • Recently, the number of patent applications has been increasing rapidly every year as protecting intellectual property rights becomes more important. Patents must be inventive and have novelty. In particular, novelty implies that the corresponding invention is not the same as any previous invention. To confirm novelty, a prior art search must be conducted before and after the application. The target of a prior art search should include not only Korean patents but also foreign patents, so searching foreign patents should be supported by multilingual search techniques. However, a dictionary-based naive approach shows a limitation because some technical concepts are represented by different terms in each nation. For example, a Korean term and a Japanese term may not be synonyms even though they represent the same technical concept. In this paper, we propose a new method to map semantic similarity between technical terms in Korean patents and Japanese patents. To investigate the different representations used in each nation for the same technical concept, we identified and analyzed pairs of patents that are mutually connected by a priority claim relationship. By performing an experiment with real-world data, we showed that our approach can successfully reveal semantically similar technical terms in another language.
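
A minimal sketch of the Word2Vec-based idea, assuming, purely as an illustration, that Korean and Japanese patent texts linked by priority claims are merged so that terms for the same concept share contexts. The corpus and hyperparameters below are hypothetical, and the gensim library is this sketch's choice rather than anything stated in the abstract.

```python
from gensim.models import Word2Vec

# Hypothetical tokenized corpus: Korean and Japanese patent texts linked by
# priority claims are concatenated so that cross-lingual terms share contexts.
sentences = [
    ["반도체", "웨이퍼", "식각", "エッチング", "ウェーハ"],
    ["이차전지", "전해질", "電解質", "二次電池"],
    ["반도체", "노광", "露光", "ウェーハ"],
]

# Train a small skip-gram model (hyperparameters are illustrative only).
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

# Terms embedded near a Korean query term become candidate Japanese equivalents
# for cross-lingual prior art search.
print(model.wv.most_similar("반도체", topn=5))
```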

An Efficient Keyword Search Method on RDF Data (RDF 데이타에 대한 효율적인 검색 기법)

  • Kim, Jin-Ha;Song, In-Chul;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.35 no.6 / pp.495-504 / 2008
  • Recently, there has been much work on supporting keyword search not only for text documents, but also for structured data such as relational data, XML data, and RDF data. In this paper, we propose an efficient keyword search method for RDF data. The proposed method first groups related nodes and edges in RDF data graphs to reduce the data size for efficient keyword search and to allow relevant information to be returned together in the query answers. The proposed method also utilizes the semantics in RDF data to measure the relevance of nodes and edges with respect to keywords for search-result ranking. The experimental results based on real RDF data show that the proposed method reduces the RDF data to about half its size and is up to 5 times faster than previous methods.
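
The abstract summarizes the method only at a high level; the following simplified, hypothetical sketch (using rdflib, which the paper does not mention) groups triples by subject and ranks each group by keyword matches in its literals, as a crude stand-in for the paper's node/edge grouping and semantics-based ranking.

```python
from collections import defaultdict
from rdflib import Graph, Literal

def keyword_search(ttl_path, keywords):
    """Very simplified keyword search over RDF: group triples by subject and
    rank each group by how many query keywords its literal values contain."""
    g = Graph()
    g.parse(ttl_path, format="turtle")   # hypothetical local RDF file

    groups = defaultdict(list)           # crude stand-in for the paper's grouping step
    for s, p, o in g:
        groups[s].append((p, o))

    keywords = [k.lower() for k in keywords]
    results = []
    for subject, edges in groups.items():
        text = " ".join(str(o).lower() for _, o in edges if isinstance(o, Literal))
        score = sum(1 for k in keywords if k in text)
        if score > 0:
            results.append((score, subject, edges))
    return sorted(results, key=lambda r: r[0], reverse=True)

# Example call: keyword_search("dataset.ttl", ["keyword", "search"])
```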

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search reviews from a search engine such as Google or from social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing customers' opinions. Sentiment analysis helps with understanding what customers really want, instantly, through automated text mining techniques: it applies text mining to text on the Web to extract subjective information and to determine the attitude or position of the person who wrote the article and expressed an opinion about a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and the self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, the decision tree, and SVM. We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for each document by classifying its terms according to a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment: users post numerous texts about stock movements by analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories to utilize sentiment analysis, and 144 topics were selected among these 21 categories. The posts were crawled to build a positive and negative text database, and we ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM presented equivalent results on these data. We developed decision tree models to detect hot topics with three algorithms, CHAID, CART, and C4.5; the results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data; the SVM models were trained with the radial basis function (RBF) kernel, tuned by grid search, to detect the hot topics.
The detection of hot topics via sentiment analysis provides investors with the latest trends and hot topics in the stock forum so that they no longer need to search the vast amount of information on the Web. Our proposed model is also helpful for rapidly determining customers' signals or attitudes toward government policy and firms' products and services.
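
As a small, hypothetical illustration of the final detection step described above (an RBF-kernel SVM tuned by grid search), the sketch below uses scikit-learn with randomly generated stand-in features; the actual feature set, labels, and parameter grid are not given in the abstract.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Hypothetical feature matrix: one row per topic (e.g., post volume, average
# sentiment value, share of negative posts); y marks whether it became a hot topic.
rng = np.random.default_rng(0)
X = rng.normal(size=(88, 3))          # 88 topics, as in the paper's final data set
y = np.tile([0, 1], 44)               # placeholder hot-topic labels

# RBF-kernel SVM tuned by grid search, mirroring the abstract's description.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```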

A Case Study on the Construction of Cyber Textbook Museum Database (사이버교과서박물관 데이터베이스 구축에 관한 사례 연구)

  • Kim, Eun-Ju;Lee, Myeong-Hee
    • Journal of the Korean BIBLIA Society for library and Information Science / v.20 no.4 / pp.67-84 / 2009
  • The Cyber Textbook Museum was created by the Korean Educational Development Institute as part of a project to manage Korea's knowledge and information and to promote understanding of Korean education and its history. The original full texts of textbooks dating from the 1890s to the present have been digitized and arranged for easy access over the Internet. A dedicated portal site for Korean textbooks and curriculum materials was built to provide not only a directory service of textbooks and curricula by data classification, school level, year/period, and subject, but also keyword search through a search engine. Users can search the necessary materials easily and systematically on screen and use all functions except save, capture, and print. A management system for textbook images (DjVu format), a search system, and a DRM (Digital Rights Management) system were developed. Finally, four suggestions concerning policy, technical, and systemic aspects are proposed to promote active use of the site.

Construction of Full-Text Database and Implementation of Service Environment for Electronic Theses and Dissertations (학위논문 전문데이터베이스 구축 및 서비스환경 구현)

  • Lee, Kyi-Ho;Kim, Jin-Suk;Yoon, Wha-Muk
    • The Transactions of the Korea Information Processing Society / v.7 no.1 / pp.41-49 / 2000
  • From the middle of the 1990s, most universities in Korea have required their students to submit not only printed theses but also Electronic Theses and Dissertations (ETDs) for master's and doctoral degrees. The ETDs submitted by students are usually produced with various word processors such as MS Word, LaTeX, and HWP. Since there is not yet a standard ETD format that merges these different formats, it is difficult to construct an integrated database that provides full-text service. In this paper, we transform three different ETD formats into a unified one, construct a full-text database, and implement a full-text retrieval system for effective search in the Internet environment.
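
As a rough sketch of the retrieval side of such a system, assuming the different ETD formats have already been converted to plain text, the snippet below builds a minimal inverted index and answers conjunctive keyword queries; it is an illustration, not the paper's implementation.

```python
import re
from collections import defaultdict

class FullTextIndex:
    """Minimal inverted index: maps each term to the set of ETD ids containing it."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        for term in re.findall(r"\w+", text.lower()):
            self.index[term].add(doc_id)

    def search(self, query):
        term_sets = [self.index.get(t, set()) for t in re.findall(r"\w+", query.lower())]
        return set.intersection(*term_sets) if term_sets else set()

# Assume each thesis has already been converted from MS Word / LaTeX / HWP to plain text.
index = FullTextIndex()
index.add("thesis-001", "full text retrieval of electronic theses and dissertations")
index.add("thesis-002", "a study on database construction for dissertations")
print(index.search("full text dissertations"))   # -> {'thesis-001'}
```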


Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.131-150 / 2015
  • Omission of noun phrases for obligatory cases is a common phenomenon in Korean and Japanese sentences that is not observed in English. When an argument of a predicate can be filled with a noun phrase co-referential with the title, the argument is even more easily omitted in encyclopedia texts. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are a major source for information extraction by intelligent application systems such as information retrieval and question answering systems, but the omission of noun phrases degrades the quality of information extraction. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem our system addresses is closely related to zero anaphora resolution, one of the important problems in natural language processing. A noun phrase existing in the text that can be used for restoration is called an antecedent; an antecedent must be co-referential with the zero anaphor. While the candidates for the antecedent are only noun phrases in the same text in the case of zero anaphora resolution, the title is also a candidate in our problem. In our system, the first stage is in charge of detecting the zero anaphor. In the second stage, antecedent search is carried out over the candidates. If antecedent search fails, an attempt is made, in the third stage, to use the title as the antecedent. The main characteristic of our system is the use of a structural SVM for finding the antecedent. The noun phrases in the text that appear before the position of the zero anaphor comprise the search space. The main technique used in previous research works is to perform binary classification for all the noun phrases in the search space; the noun phrase classified as an antecedent with the highest confidence is selected. In this paper, however, we propose viewing antecedent search as the problem of assigning antecedent-indicator labels to a sequence of noun phrases. In other words, sequence labeling is employed for antecedent search in the text; we are the first to suggest this idea. To perform sequence labeling, we use a structural SVM which receives a sequence of noun phrases as input and returns a sequence of labels as output. An output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent and the other indicating that it is not. The structural SVM we used is based on a modified Pegasos algorithm, which exploits a subgradient descent methodology for optimization problems. To train and test our system, we selected a set of Wikipedia texts and constructed an annotated corpus in which gold-standard answers, such as zero anaphors and their possible antecedents, are provided. Training examples were prepared using the annotated corpus and used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer and omitted subject or object cases are identified; thus the performance of our system depends on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor, based on binary classification using a regular SVM. The experiment showed that our system's performance is F1 = 68.58%, which means that a state-of-the-art system can be developed with our technique.
It is expected that future work that enables the system to utilize semantic information can lead to a significant performance improvement.
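
The abstract references a structural SVM trained with a modified Pegasos algorithm; as a much-simplified, hypothetical illustration of the Pegasos-style subgradient optimization (binary rather than structural, with made-up candidate features), consider the following sketch.

```python
import numpy as np

def pegasos_train(X, y, lam=0.1, epochs=20, seed=0):
    """Minimal Pegasos-style binary SVM: stochastic subgradient descent on the
    hinge loss with L2 regularization (labels y in {-1, +1})."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos learning-rate schedule
            if y[i] * (w @ X[i]) < 1:             # margin violated: hinge subgradient step
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

# Hypothetical candidate features (e.g., distance to the zero anaphor, case match,
# head-noun similarity); the candidate with the highest score w @ x would be
# proposed as the antecedent.
X = np.array([[0.2, 1.0, 0.1], [0.9, 0.0, 0.8], [0.4, 1.0, 0.7]])
y = np.array([-1, -1, 1])
w = pegasos_train(X, y)
print(X @ w)
```

In the actual system, the learner scores whole label sequences over the candidate noun phrases rather than individual candidates, which this binary sketch does not capture.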