• Title/Summary/Keyword: information collection and extraction

Extraction of Relationships between Scientific Terms based on Composite Kernels (혼합 커널을 활용한 과학기술분야 용어간 관계 추출)

  • Choi, Sung-Pil; Choi, Yun-Soo; Jeong, Chang-Hoo; Myaeng, Sung-Hyon
    • Journal of KIISE: Computing Practices and Letters, v.15 no.12, pp.988-992, 2009
  • In this paper, we extract binary relations between scientific terms using composite kernels that combine convolution parse tree kernels with WordNet verb synset vector kernels, which capture the semantic relationship between two entities in a sentence. To evaluate the performance of our system, we used three domain-specific test collections. The experimental results demonstrate the superiority of our system on all the targeted collections. In particular, the 8% improvement in F1 on KREC 2008 shows that the core contexts around the entities play an important role in boosting the overall performance of relation extraction.
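As a rough illustration of the kernel combination the abstract describes, the sketch below composes two precomputed Gram matrices by a weighted sum; the weighting scheme, the matrix inputs, and the alpha parameter are assumptions for illustration, not the paper's exact formulation.

```python
# A minimal sketch of a composite kernel over precomputed component kernels.
# The paper's components (a convolution parse tree kernel and a WordNet verb
# synset vector kernel) are omitted; only the combination step is shown.
import numpy as np

def composite_kernel(K_tree: np.ndarray, K_synset: np.ndarray,
                     alpha: float = 0.5) -> np.ndarray:
    """Weighted sum of two Gram matrices; still a valid kernel, since
    positive semi-definite kernels are closed under addition and
    positive scaling."""
    return alpha * K_tree + (1.0 - alpha) * K_synset

# Usage with an SVM that accepts a precomputed Gram matrix:
# from sklearn.svm import SVC
# clf = SVC(kernel="precomputed").fit(composite_kernel(K_tree, K_syn), y)
```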

HTML Text Extraction Using Frequency Analysis (빈도 분석을 이용한 HTML 텍스트 추출)

  • Kim, Jin-Hwan; Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.9, pp.1135-1143, 2021
  • Recently, text collection using web crawlers for big data analysis has become common. However, to collect only the necessary text from a web page that is a complex composition of numerous tags and texts, the crawler must be told which HTML tags and style attributes contain the required text, which is cumbersome. In this paper, we propose a method for extracting text based on the frequency with which texts appear across web pages, without specifying HTML tags or style attributes. The proposed method extracts the text nodes from the DOM trees of all collected web pages, analyzes how frequently each text appears, and extracts the main text by excluding high-frequency texts, which typically correspond to boilerplate repeated across pages. The experiments in this study verified the superiority of the proposed method.
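A minimal sketch of the described pipeline, assuming the pages are already fetched: text nodes are pulled from each DOM tree, their cross-page frequency is counted, and high-frequency (boilerplate) texts are excluded. The 50% threshold and helper names are illustrative assumptions, not the paper's tuned values.

```python
# Frequency-based main-text extraction: text that recurs across many pages
# (menus, headers, footers) is treated as boilerplate and dropped.
from collections import Counter
from bs4 import BeautifulSoup

def texts_of(html: str) -> list[str]:
    """Collect non-empty text nodes from the DOM tree."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return [t for t in soup.stripped_strings]

def extract_main_text(pages: list[str], max_ratio: float = 0.5) -> list[list[str]]:
    all_texts = [texts_of(p) for p in pages]
    # Count each distinct text once per page it appears on.
    freq = Counter(t for texts in all_texts for t in set(texts))
    cutoff = max_ratio * len(pages)  # texts on more than half the pages are boilerplate
    return [[t for t in texts if freq[t] <= cutoff] for texts in all_texts]
```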

Automation of Sampling for Public Survey Performance Assessment (공공측량 성과심사 표본추출 자동화 가능성 분석)

  • Choi, Hyun; Jin, Cheol; Lee, Jung Il; Kim, Gi Hong
    • KSCE Journal of Civil and Environmental Engineering Research, v.44 no.1, pp.95-100, 2024
  • The public survey performance review conducted by the Spatial Information Quality Management Institute is carried out at the review rate prescribed by regulation, and the examiner judges the overall quality of the submitted results from the extracted sample. The regulations of the Ministry of Land, Infrastructure and Transport, the evaluation trustee, specify that samples be selected by random extraction. In this study, we analyzed the details of actual sites by securing real performance review data, examined the considerations arising from various field conditions, and studied how a sampling algorithm for the public survey performance review could be applied. Detailed analysis of the sampling criteria used by individual performance reviewers is therefore necessary. A relative comparison was made feasible by comparing data from actual performance evaluations with the output of a Python automation program. This automation program is expected to serve as a foundation for the automated application of public survey performance evaluation sampling in the future.
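A minimal sketch of the random sample extraction the regulations call for, written in Python since the abstract mentions a Python automation program; the review rate, item names, and the seeding-for-audit choice are assumptions, not the program's actual parameters.

```python
# Regulation-style random sample extraction from a list of deliverables.
import math
import random

def draw_sample(items: list[str], review_rate: float = 0.1,
                seed: int | None = None) -> list[str]:
    """Randomly extract ceil(rate * n) items without replacement."""
    rng = random.Random(seed)  # fixing the seed makes the draw reproducible for audit
    k = max(1, math.ceil(review_rate * len(items)))
    return rng.sample(items, k)

# Example: a 10% sample of 42 map sheets, reproducible for later review.
sample = draw_sample([f"sheet-{i:03d}" for i in range(42)], 0.1, seed=2024)
```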

The Competency in Disaster Nursing of Korean Nurses: Scoping Review (국내 간호사의 재난간호 역량: 주제범위 문헌고찰)

  • Lee, Eunja; Yang, Jungeun
    • Journal of East-West Nursing Research, v.27 no.2, pp.153-165, 2021
  • Purpose: The aim of this study was to identify the range of Korean nurses' competency in disaster nursing. Methods: A scoping review was conducted using the Joanna Briggs Institute methodology. The review used information from four databases: RISS, ScienceON, EBSCO Discovery Service, and CINAHL. The key words were 'disaster', 'nurs*', 'competenc*', 'ability', and 'preparedness'. Inclusion and exclusion criteria were identified as review strategies. The inclusion criteria focused on the following: Korean nurses; articles related to disaster nursing competency; and peer-reviewed articles published in full text in Korean or English. Review articles were excluded. Results: Nineteen studies were eligible for data extraction. A total of 10 categories of disaster nursing competency were identified: knowledge of disaster nursing, crisis management, disaster preparation, information collection and sharing, nursing record and document management, communication, disaster planning, nursing activities in disaster response, infection management, and chemical, biological, radiological, nuclear, and explosive management. Conclusion: It is necessary to distinguish between Korean nurses' common disaster nursing competency, professional disaster nursing competency, and the disaster nursing competency required in nursing practice. Future research is therefore needed to explore and describe disaster nursing competency.

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun; Lee, Myungjin
    • Journal of Intelligence and Information Systems, v.25 no.1, pp.43-61, 2019
  • Development of artificial intelligence technologies has accelerated with the Fourth Industrial Revolution, and AI research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems such as learning and problem solving related to human intelligence. Thanks to recent interest in the technology and research on various algorithms, the field of artificial intelligence has achieved greater technological advances than ever. The knowledge-based system is a sub-domain of artificial intelligence; it aims to enable AI agents to make decisions using machine-readable and processable knowledge constructed from the complex and informal human knowledge and rules of various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. These days, the purpose of a knowledge base is to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. Such knowledge bases support intelligent processing in various AI applications, such as the question-answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires a great deal of expert effort. In recent years, much knowledge-based AI research and technology has used DBpedia, one of the biggest knowledge bases, which aims to extract structured content from the varied information in Wikipedia. DBpedia contains information extracted from Wikipedia such as titles, categories, and links, but its most useful knowledge comes from Wikipedia infoboxes, user-created summaries of some unifying aspect of an article. This knowledge is created by mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. By generating knowledge from semi-structured, user-created infobox data, DBpedia can expect high reliability in terms of knowledge accuracy. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia has limitations in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema, trained on Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classification of documents into ontology classes, classification of the appropriate sentences from which to extract triples, and value selection and transformation into the RDF triple structure. Wikipedia infobox structures are defined as infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these templates. Based on these mapping relations, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the classification of the input document, we classify the appropriate sentences according to the attributes belonging to that classification. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples.
To train the models, we generated a training data set from a Wikipedia dump using a method that adds BIO tags to sentences, and we trained about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process. Through this proposed process, structured knowledge can be utilized by extracting it from text documents according to the ontology schema. In addition, this methodology can significantly reduce the expert effort needed to construct instances according to the ontology schema.
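A minimal sketch of the BIO-tagging step used to generate training data, assuming an infobox value is located by exact token match inside a sentence; the tag names and the whitespace tokenizer are illustrative assumptions.

```python
# Mark the token span matching a known infobox value with B-/I- tags,
# leaving all other tokens tagged O, as in standard BIO sequence labeling.
def bio_tag(tokens: list[str], value_tokens: list[str]) -> list[str]:
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == value_tokens:
            tags[i] = "B-VAL"
            tags[i + 1:i + n] = ["I-VAL"] * (n - 1)
            break
    return tags

sentence = "Barack Obama was born in Honolulu , Hawaii .".split()
print(list(zip(sentence, bio_tag(sentence, ["Honolulu", ",", "Hawaii"]))))
# -> ('Honolulu', 'B-VAL'), (',', 'I-VAL'), ('Hawaii', 'I-VAL'), rest 'O'
```

Sequences tagged this way can then be fed to a CRF or Bi-LSTM-CRF tagger, the two models the paper compares.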

Extraction of Watershed Information using GIS and Diurnal Flow Change in the Rapids and Pool by the Nature-Friendly River Work (GIS를 이용한 유역정보 추출 및 여울과 소의 치수적 복원을 위한 일중 수치해석)

  • Kang, Sang-Hyeok
    • Journal of Korean Society of Environmental Engineers, v.32 no.5, pp.517-522, 2010
  • Riffles and pools play an important role in composing the river front, but very little information has been used for river restoration that also considers flood control. In this paper, an extensive field investigation was carried out to estimate the hydraulic processes in the pool. Furthermore, a diurnal stratification model was developed and applied to assess flow changes in the pool. The physical mechanism of water flow, including diurnal processes, was well simulated; the results show that the diurnal variation of water flow in a pool of about 2 m depth is governed by the level of mixing due to density flow. These efforts will be useful for guiding field data collection and for understanding primary production.

B-Corr Model for Bot Group Activity Detection Based on Network Flows Traffic Analysis

  • Hostiadi, Dandy Pramana; Wibisono, Waskitho; Ahmad, Tohari
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.10, pp.4176-4197, 2020
  • A botnet is a type of dangerous malware. A botnet attack in which a collection of bots attacks a similar target with a similar activity pattern is called a bot group activity. Existing intrusion detection models can detect single-bot activities but cannot detect the behavioral relations between bots in a group attack. Detecting bot group activities could help network administrators isolate the activities or accesses of a bot group attack and determine the relations between bots by measuring their correlation. This paper proposes a new model, called the B-Corr model, that measures the similarity between bot activities using an intersection-probability concept to define bot group activities. The B-Corr model consists of several stages: feature extraction from bot activity flows, measurement of intersections between bots, and production of similarity values. The B-Corr model categorizes similar bots with a similar target to identify bot group activities. To provide a more comprehensive view, it visualizes the similarity values between bots as a similar-bot graph. Extensive experiments on real botnet datasets show high detection accuracy in various scenarios.
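A minimal sketch of the bot-to-bot similarity idea, assuming each bot's flows are summarized by its set of contacted targets; plain Jaccard overlap is used here as a stand-in for the paper's intersection-probability formulation.

```python
# Intersection-based similarity between two bots' activity flows: bots that
# contact largely the same targets score near 1 and become candidates for
# membership in the same bot group.
def target_similarity(targets_a: set[str], targets_b: set[str]) -> float:
    """Overlap of contacted targets, in [0, 1]."""
    if not targets_a or not targets_b:
        return 0.0
    return len(targets_a & targets_b) / len(targets_a | targets_b)

bot1 = {"10.0.0.5:80", "10.0.0.9:443", "10.0.0.7:53"}
bot2 = {"10.0.0.5:80", "10.0.0.9:443", "10.0.0.2:25"}
print(target_similarity(bot1, bot2))  # 0.5 -> possibly the same bot group
```

Pairwise scores like these can be stored as weighted edges, which is essentially the similar-bot graph the abstract describes.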

Efficient Collecting Scheme the Crack Data via Vector based Data Augmentation and Style Transfer with Artificial Neural Networks (벡터 기반 데이터 증강과 인공신경망 기반 특징 전달을 이용한 효율적인 균열 데이터 수집 기법)

  • Yun, Ju-Young; Kim, Donghui; Kim, Jong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference, 2021.07a, pp.667-669, 2021
  • In this paper, we propose a vector-based data augmentation technique for building training data, together with a framework that uses convolutional neural networks (CNNs) to express patterns close to real cracks. Cracks in buildings are a cause of major accidents, including building collapses and falling accidents that result in casualties. Solving this problem with artificial intelligence requires a large amount of data, but real crack images not only have complex patterns but are also difficult to collect in volume because capturing them means exposure to dangerous situations. This database construction problem can be addressed with elastic distortion, which increases the amount of data by artificially deforming specific regions; in this paper, however, we use a CNN to produce improved crack patterns. Compared with elastic distortion, the CNN yielded results more similar to real crack patterns, and by designing the augmentation on a vector basis rather than the commonly used pixel basis, the method showed superiority in terms of crack variability. Even with only a small number of crack data as input, the proposed method generated diverse crack directions and patterns, making it easy to build a crack database. In the long term, this is expected to contribute to structural stability assessment and to help create safer, more pleasant residential environments free from anxiety about safety accidents.
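A minimal sketch of what vector-based augmentation can look like, assuming a crack is stored as a polyline of control points that are rotated and jittered before rasterization; this is an illustrative stand-in, not the paper's CNN-based framework.

```python
# Vector-based crack augmentation: perturb the control points of a crack
# polyline to generate new crack shapes before rendering them to images.
import numpy as np

def augment_polyline(points: np.ndarray, jitter: float = 0.02,
                     max_angle: float = 0.3,
                     rng: np.random.Generator | None = None) -> np.ndarray:
    """Rotate the crack polyline about its centroid and jitter its points."""
    rng = rng or np.random.default_rng()
    theta = rng.uniform(-max_angle, max_angle)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    center = points.mean(axis=0)
    rotated = (points - center) @ rot.T + center
    return rotated + rng.normal(0.0, jitter, size=points.shape)

crack = np.array([[0.1, 0.5], [0.3, 0.48], [0.55, 0.6], [0.9, 0.55]])
variants = [augment_polyline(crack) for _ in range(10)]  # 10 new crack shapes
```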

Analysis System for SNS Issues per Country based on Topic Model (토픽 모델 기반의 국가 별 SNS 관심 이슈 분석 시스템)

  • Kim, Seong Hoon; Yoon, Ji Won
    • Journal of KIISE, v.43 no.11, pp.1201-1209, 2016
  • As the use of SNS continues to increase, various related studies have been conducted. Given the proven effectiveness of topic models for theme extraction, a large number of topic-model-based analysis studies have been introduced. In this research, we propose an automated system that analyzes the topics of each country and their distribution on Twitter by combining world-map visualization with an issue-matching method. The core of the system is the following three modules: 1) collection of tweets and classification by nation, 2) extraction of topics and their distribution by country based on a topic model algorithm, and 3) visualization of topics and distribution based on Google geochart. In experiments with the USA and the UK, we could identify the issues of the two nations and how they changed. Based on these results, we could analyze the differences in each nation's position on the ISIS problem.
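A minimal sketch of the second module (topic extraction), assuming tweets are already collected and grouped by country; scikit-learn's LDA and the toy corpus are stand-ins for the paper's topic model and data.

```python
# Extract per-country topics from a bag-of-words matrix of tweets with LDA,
# then report the top words of each topic as that country's issues.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets_us = ["election debate tonight", "new policy on trade",
             "trade talks continue", "debate over new policy"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets_us)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")  # top-3 words per topic for this country
```

Repeating this per country yields the topic distributions that the third module would render on the world map.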

Dynamic ontology construction algorithm from Wikipedia and its application toward real-time nation image analysis (국가이미지 분석을 위한 위키피디아 실시간 동적 온톨로지 구축 알고리즘 및 적용)

  • Lee, Youngwhan
    • Journal of the Korean Data and Information Science Society, v.27 no.4, pp.979-991, 2016
  • Measuring nation images was a challenging task when offline surveys were the only option: they were prohibitively expensive, too time-consuming, and therefore unfit for this rapidly changing world. Although demand for monitoring real-time nation images is ever-increasing, an affordable and reliable solution for measuring them has not been available to date. In this study, the researcher developed a semi-automatic ontology construction algorithm, named "double-crossing double keyword collection" (DCDKC), to measure nation images from Wikipedia in real time. The resulting ontology, WikiOnto, can be used to reflect dynamic image changes. An instance of WikiOnto was constructed by applying the algorithm to the big-three exporting countries in East Asia: Korea, Japan, and China. The numbers of page views for the words in the WikiOnto instance were then counted, and the counts for the three countries were compared to assess their usefulness for tracking dynamic nation images. In conclusion, the results show how the images of the three countries changed over the study period, confirming that DCDKC can serve as the basis for a real-time nation-image monitoring system.
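A minimal sketch of the page-view comparison step, assuming per-keyword view counts have already been fetched (for example, from the Wikimedia pageviews API); the keywords and counts below are invented for illustration and are not the study's data.

```python
# Compare countries' image profiles by normalizing page-view counts of the
# ontology keywords into shares of attention.
views = {
    "Korea": {"K-pop": 9400, "Samsung": 7100, "Kimchi": 2300},
    "Japan": {"Anime": 8800, "Toyota": 6400, "Sushi": 3100},
    "China": {"Huawei": 7700, "Great Wall": 5200, "Dim sum": 1900},
}

def image_profile(counts: dict[str, int]) -> dict[str, float]:
    """Normalize keyword views into a share-of-attention profile."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

for country, counts in views.items():
    profile = image_profile(counts)
    top = max(profile, key=profile.get)
    print(country, f"top image keyword: {top} ({profile[top]:.0%})")
```

Recomputing these profiles at intervals is one way to surface the dynamic image changes the abstract describes.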