• Title/Summary/Keyword: information collection and extraction


Frequently Occurred Information Extraction from a Collection of Labeled Trees (라벨 트리 데이터의 빈번하게 발생하는 정보 추출)

  • Paik, Ju-Ryon;Nam, Jung-Hyun;Ahn, Sung-Joon;Kim, Ung-Mo
    • Journal of Internet Computing and Services
    • /
    • v.10 no.5
    • /
    • pp.65-78
    • /
    • 2009
  • The most commonly adopted approach to finding valuable information in tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications, such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms have recently been proposed to find such patterns. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent tree patterns in massive tree datasets. Some of the major problems are due to (1) modeling data as a hierarchical tree structure, (2) the computationally high cost of candidate maintenance, (3) repetitious input dataset scans, and (4) high memory dependency. These problems stem from the fact that most of these algorithms are based on the well-known apriori algorithm and use the anti-monotone property for candidate generation and frequency counting. To solve these problems, we base our method on a pattern-growth approach rather than the apriori approach, and choose to extract maximal frequent subtree patterns instead of all frequent subtree patterns. The proposed method not only removes the process of pruning infrequent subtrees, but also entirely eliminates the problem of generating candidate subtrees. Hence, it significantly improves the whole mining process.

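The support counting that both apriori-style and pattern-growth miners rely on can be illustrated with a minimal sketch. This is not the authors' algorithm; the nested-tuple tree representation and the restriction to single labeled edges (the smallest subtree patterns) are simplifying assumptions for illustration.

```python
from collections import Counter

def edges(tree):
    """Yield (parent_label, child_label) pairs from a nested-tuple tree.
    A tree is (label, [child_tree, ...])."""
    label, children = tree
    for child in children:
        yield (label, child[0])
        yield from edges(child)

def frequent_edges(forest, min_support):
    """Count each labeled edge at most once per tree and keep those whose
    support (number of trees containing the edge) meets min_support."""
    support = Counter()
    for tree in forest:
        for e in set(edges(tree)):
            support[e] += 1
    return {e: s for e, s in support.items() if s >= min_support}

forest = [
    ("a", [("b", []), ("c", [("d", [])])]),
    ("a", [("c", [("d", [])])]),
    ("a", [("b", [])]),
]
print(frequent_edges(forest, min_support=2))
```

A full pattern-growth miner would extend each frequent pattern in place (growing it by one node at a time within the projected dataset) instead of generating and testing candidate sets, which is the step the abstract says is eliminated.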

Automatic Text Summarization based on Selective Copy Mechanism for Addressing OOV (미등록 어휘에 대한 선택적 복사를 적용한 문서 자동요약)

  • Lee, Tae-Seok;Seon, Choong-Nyoung;Jung, Youngim;Kang, Seung-Shik
    • Smart Media Journal
    • /
    • v.8 no.2
    • /
    • pp.58-65
    • /
    • 2019
  • Automatic text summarization is the process of shortening a text document by either extraction or abstraction. The abstraction approach, inspired by deep learning methods that scale to large document collections, has been applied in recent work. Abstractive text summarization utilizes pre-generated word embedding information. Low-frequency but salient words, such as terminology, are seldom included in vocabularies; these are the so-called out-of-vocabulary (OOV) words. OOV words deteriorate the performance of encoder-decoder neural network models. In order to address OOV words in abstractive text summarization, we propose a copy mechanism that facilitates copying new words into the generated summary sentences. Unlike previous studies, the proposed approach combines accurate pointing information with a selective copy mechanism based on a bidirectional RNN and bidirectional LSTM. In addition, a neural network gate model to estimate the generation probability and a loss function to optimize the entire abstraction model have been applied. The dataset was constructed from a collection of abstracts and titles of journal articles. Experimental results demonstrate that both ROUGE-1 (based on word recall) and ROUGE-L (based on longest common subsequence) of the proposed encoder-decoder model improved to 47.01 and 29.55, respectively.
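The generation-probability gate can be sketched as the standard pointer-generator mixing step: the final word distribution is a gate-weighted blend of the decoder's vocabulary distribution and the attention distribution over source tokens. This is a minimal illustration of that idea, not the authors' exact model; the function name, toy probabilities, and extended-vocabulary ids are hypothetical.

```python
def final_distribution(p_gen, vocab_dist, attention, source_ids, extended_size):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on the
    source positions where w occurs). OOV words get ids >= len(vocab_dist)
    in an extended vocabulary, so they become generatable by copying."""
    dist = [p_gen * p for p in vocab_dist]
    dist += [0.0] * (extended_size - len(vocab_dist))   # slots for OOV words
    for a, i in zip(attention, source_ids):
        dist[i] += (1.0 - p_gen) * a                    # copied probability mass
    return dist

# Toy numbers: 3 in-vocabulary words; a 2-token source document whose
# second token (extended id 3) is out-of-vocabulary.
dist = final_distribution(0.8, [0.5, 0.3, 0.2], [0.6, 0.4], [1, 3], 4)
print(dist)
```

The key property is that the OOV token at extended id 3 receives nonzero probability purely from the copy term, which is how such models emit words absent from the training vocabulary.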

RPC Correction of KOMPSAT-3A Satellite Image through Automatic Matching Point Extraction Using Unmanned Aerial Vehicle Imagery (무인항공기 영상 활용 자동 정합점 추출을 통한 KOMPSAT-3A 위성영상의 RPC 보정)

  • Park, Jueon;Kim, Taeheon;Lee, Changhui;Han, Youkyung
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.5_1
    • /
    • pp.1135-1147
    • /
    • 2021
  • In order to geometrically correct high-resolution satellite imagery, a sensor modeling process that restores the geometric relationship between the satellite sensor and the ground surface at the image acquisition time is required. In general, high-resolution satellites provide RPC (Rational Polynomial Coefficient) information, but the vendor-provided RPC includes geometric distortion caused by the position and orientation of the satellite sensor. GCPs (Ground Control Points) are generally used to correct the RPC errors. The representative method of acquiring GCPs is a field survey to obtain accurate ground coordinates. However, it is difficult to find GCPs in a satellite image due to image quality, land cover change, relief displacement, etc. By using image maps acquired from various sensors as reference data, it is possible to automate the collection of GCPs through an image matching algorithm. In this study, the RPC of a KOMPSAT-3A satellite image was corrected using matching points extracted from UAV (Unmanned Aerial Vehicle) imagery. We propose a pre-processing method for the extraction of matching points between the UAV imagery and the KOMPSAT-3A satellite image. To this end, the characteristics of matching points extracted by independently applying SURF (Speeded-Up Robust Features) and phase correlation, which are representative feature-based and area-based matching methods, respectively, were compared. The RPC adjustment parameters were calculated using the matching points extracted by each algorithm. In order to verify the performance and usability of the proposed method, it was compared with the GCP-based RPC correction result. The GCP-based method improved the correction accuracy by 2.14 pixels in the sample direction and 5.43 pixels in the line direction compared to the vendor-provided RPC. The proposed method using SURF and phase correlation improved the sample accuracy by 0.83 and 1.49 pixels, and the line accuracy by 4.81 and 5.19 pixels, respectively, compared to the vendor-provided RPC. The experimental results show that the proposed method using UAV imagery is a viable alternative to the GCP-based method for RPC correction.
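The area-based half of the matching step, phase correlation, recovers a translation from the peak of the normalized cross-power spectrum of two image patches. A minimal sketch under simplifying assumptions (integer, circular shift; grayscale arrays; not the authors' implementation):

```python
import numpy as np

def phase_correlation(ref, tgt):
    """Estimate the integer translation of `tgt` relative to `ref`
    from the peak of the normalized cross-power spectrum."""
    F1, F2 = np.fft.fft2(ref), np.fft.fft2(tgt)
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + 1e-12          # keep only phase information
    corr = np.fft.ifft2(cross).real         # impulse at the shift offset
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if dy > h // 2: dy -= h                 # wrap large shifts to negative
    if dx > w // 2: dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(0)
ref = rng.random((32, 32))
tgt = np.roll(ref, (3, 5), axis=(0, 1))     # synthetic known shift
print(phase_correlation(ref, tgt))          # (3, 5)
```

In practice a window function and sub-pixel peak interpolation are added, and SURF handles the cases where the two images differ by more than a translation.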

Design of Compound Knowledge Repository for Recommendation System (추천시스템을 위한 복합지식저장소 설계)

  • Han, Jung-Soo;Kim, Gui-Jung
    • Journal of Digital Convergence
    • /
    • v.10 no.11
    • /
    • pp.427-432
    • /
    • 2012
  • This article suggests a compound knowledge repository and a descriptive method for developing a compound knowledge process. The data targets stored in the proposed compound knowledge repository include all compound knowledge metadata and digital resources, which can be divided into three factors according to purpose: user roles, functional elements, and service ranges. These three factors are the basic components for describing abstract models of the repository. In this article, the metadata of compound knowledge are classified into two factors: a component, which represents the properties of a main agent, activity unit, or resource that uses and creates knowledge, and a context, which presents the setting in which knowledge objects are included. The agent of the compound knowledge process performs classification, registration, and pattern information management of compound knowledge, and handles data flow and processing between the compound knowledge repository and the user. The agent consists of the following functions: notification of data search and extraction, data collection and output for data exchange in a distributed environment, storage and registration of data, and request and transmission to retrieve the physical material located by a metadata search. The compound knowledge repository constructed for the recommendation system can enhance learning productivity through real-time visualization of timely knowledge, presenting well-organized and varied content to users in industrial settings where work and learning occur simultaneously.
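The two-factor metadata classification described above (component vs. context) can be sketched as simple record types. This is only an illustrative data-model sketch; the class and field names are hypothetical, not the article's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Component:
    """Property of a main agent, activity unit, or resource
    that uses or creates knowledge."""
    kind: str   # assumed values: "agent", "activity", "resource"
    name: str

@dataclass
class Context:
    """Setting in which the knowledge object is included."""
    user_role: str
    service_range: str

@dataclass
class CompoundKnowledge:
    """One repository entry combining both metadata factors."""
    title: str
    components: List[Component] = field(default_factory=list)
    context: Optional[Context] = None

record = CompoundKnowledge(
    "assembly-line procedure note",
    [Component("agent", "line engineer")],
    Context("learner", "plant A"),
)
print(record.context.user_role)  # learner
```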

Energy Minimization Model for Pattern Classification of the Movement Tracks (행동궤적의 패턴 분류를 위한 에너지 최소화 모델)

  • Kang, Jin-Sook;Kim, Jin-Sook;Cha, Eul-Young
• The KIPS Transactions: Part B
    • /
    • v.11B no.3
    • /
    • pp.281-288
    • /
    • 2004
  • In order to extract and analyze complex features of the behavior of animals in response to external stimuli such as toxic chemicals, we implemented an adaptive computational method to characterize changes in the behavior of chironomids in response to treatment with the insecticide diazinon. In this paper, we propose an energy minimization model to extract the features of the response behavior of chironomids under toxic treatment, which is applied to the image of velocity vectors. It is based on the improved active contour model and the variations of the energy functional produced by the evolving active contour. The movement tracks of individual chironomid larvae were continuously measured at 0.25-second intervals during the survey period of 4 days before and after the treatment. Velocity on each sample track at 0.25-second intervals was collected in 15-20 minute periods and was subsequently checked to effectively reveal the behavioral states of the specimens tested. An active contour was formed around each collection of velocities and gradually evolved to find the optimal boundaries of the velocity collections through processes of energy minimization. The improved active contour model of T. Chan and L. Vese is used in this paper. The energy minimization model effectively revealed characteristic patterns of behavior for treatment versus no treatment, and identified changes in behavioral states as time progressed.
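The velocity samples that the contour is evolved over come from positions recorded at 0.25-second intervals; the conversion from track to speeds can be sketched as follows (an illustrative helper, not the authors' code; the function name and toy track are hypothetical):

```python
import math

def speeds(track, dt=0.25):
    """Turn a track of (x, y) positions sampled every `dt` seconds into
    a sequence of speeds, one per consecutive pair of positions."""
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(track, track[1:])]

# One unit of distance covered in 0.25 s, then no movement.
track = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
print(speeds(track))  # [4.0, 0.0]
```

The Chan-Vese step itself then fits a closed contour around the resulting cloud of velocity values by minimizing a region-based energy functional.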

Industrial Technology Leak Detection System on the Dark Web (다크웹 환경에서 산업기술 유출 탐지 시스템)

  • Kong, Young Jae;Chang, Hang Bae
    • Smart Media Journal
    • /
    • v.11 no.10
    • /
    • pp.46-53
    • /
    • 2022
  • Today, due to the 4th industrial revolution and extensive R&D funding, domestic companies have begun to possess world-class industrial technologies, which have grown into important assets. The national government has designated such critical industrial technologies as "national core technologies" in order to protect them. In particular, technology leaks in the shipbuilding, display, and semiconductor industries can result in a significant loss of competitiveness not only at the company level but also at the national level. Every year, there are more insider leaks, ransomware attacks, and attempts to steal industrial technology through industrial spies. The stolen industrial technology is then traded covertly on the dark web. In this paper, we propose a system for detecting industrial technology leaks in the dark web environment. The proposed model first builds a database through dark web crawling, using information collected from the OSINT environment. Afterwards, keywords for industrial technology leakage are extracted using the KeyBERT model, and signs of industrial technology leakage in the dark web environment are expressed as quantitative figures. Finally, based on the identified industrial technology leakage sites in the dark web environment, the possibility of secondary leakage is detected through the PageRank algorithm. The proposed method resulted in the collection of 27,317 unique dark web domains and the extraction of 15,028 nuclear-energy-related keywords from 100 nuclear power patents. Twelve dark web sites were identified as a result of detecting secondary leaks, starting from the dark web sites with the highest nuclear leakage scores.
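The final ranking step uses PageRank over the link graph of dark web sites; the core computation can be sketched with plain power iteration (a generic textbook implementation, not the paper's system; the site names are hypothetical):

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over a {node: [outlinks]} adjacency dict."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if not outs:                      # dangling node: spread evenly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in outs:
                    new[u] += damping * rank[v] / len(outs)
        rank = new
    return rank

# Hypothetical leak-site graph: sites A and B both link to C.
sites = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(sites)
print(max(ranks, key=ranks.get))  # C
```

A site that many identified leak sites link to accumulates rank, which is the intuition behind using PageRank to flag likely secondary-leakage sites.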

Lived Experience of Patients with Terminal Cancer: Parse's Human Becoming Methodology (말기 암환자의 체험에 관한 현상학적 연구)

  • 이옥자
    • Journal of Korean Academy of Nursing
    • /
    • v.25 no.3
    • /
    • pp.510-537
    • /
    • 1995
  • Human health is an integral part of experience in the process of human becoming. Through continual interaction with the environment, human beings freely choose experiences and develop as responsible beings. The process of the health experience of a patient with terminal cancer is unique. The objective of this study is to understand the lived experience of patients with terminal cancer in order to provide basic information for nursing care in the clinical setting and to develop a theoretical background for clinical practice. This study describes and defines the lived experience of patients with terminal cancer in order to provide a foundation for nursing research and education. Data collection was done between December 1993 and November 1994. The subjects included five persons, four females and one male: one in her sixties, one in his fifties, two in their forties, and one in her thirties. The researcher met with these patients 35 times, but on eight occasions the patient was in a stuporous condition and unable to participate, so these meetings were not included in the data analysis. Parse's "Human Becoming Methodology", an existential phenomenological research methodology, was used for this study. Data were collected using the dialogical engagement process of "I and You" between the participant researcher and the participant subject. Dialogical engagement was discontinued when the data were theoretically saturated. Data were analyzed using extraction-synthesis and heuristic interpretation. The criteria of Guba and Lincoln (1985) and Sandelowski (1986), credibility, auditability, fitness, and objectivity, were used to test the validity and reliability of the data. The following is a description of the structure of the lived experience of patients with terminal cancer as defined by this study: 1. 
Structure: 1) suffering through the reminiscence of past experience; 2) the appearance of complex emotions related to life and connectedness; 3) the increasing importance of significant people and of the Absolute Being; 4) the increasing realization of the importance of health and belief; 5) desire for a return to health and a peaceful life, or for acceptance of dying and a comfortable death. In summary, the structure of the lived experience of these patients can be said to be: suffering comes through reminiscence of past experience, and there are complex emotions related to life and connectedness. Significant people and the Absolute Being become increasingly important, along with a realization of the importance of health and faith. Finally, there is a desire either for a return to health and a peaceful life or for the acceptance of dying and a comfortable death. 2. Heuristic interpretation: using Parse's Human Becoming Methodology, the structure of the lived experience of patients with terminal cancer identified in this research is interpreted as follows. The lived experience of patients with terminal cancer involves the solving of past conflicts, and the experience of healing and of valuing sorrow and pain. Through the relation of life and health, and the complex emotions that arise, the lived experience of revealing-concealing is one of paradoxical emotions. The increasing importance of significant others and of the Absolute Being shows connecting and separating, an ongoing process of nearness and farness. Revision of thoughts about health and faith is interpreted as transforming, and the desire for restoration to health and a peaceful life, or acceptance of dying and a comfortable death, as powering. In summary, it is possible to see, in the lived experience of patients with terminal cancer, the relationship of the five concepts of Parse's theory: valuing, revealing-concealing, connecting-separating, transforming, and powering. 
From Parse's theory, the results of this study show that meaning is related to valuing, rhythmicity to revealing-concealing and connecting-separating, and cotranscendence to transforming and powering.


A Study on UX-centered Smart Office Phone Design Development Process Using Service Design Process (서비스디자인 프로세스를 활용한 UX중심 오피스 전화기 디자인개발 프로세스 연구)

  • Seo, Hong-Seok
    • Science of Emotion and Sensibility
    • /
    • v.25 no.1
    • /
    • pp.41-54
    • /
    • 2022
  • The purpose of this study was to propose a "user experience (UX)-centered product development process" so that the product design development process using the service design process can be systematized and used in practice. Given that usability research on office phones is lacking compared to that on general home phones, this study expands from simple product development to a product-based service design point of view, aiming to research ways to provide user experience value through office phone design in the smart office. This study focused on extracting UX-centered user needs using the service design process and on developing a product design that realizes user experience value. In particular, the service design process was applied to systematically extract user needs and experience value elements in the product development process and to discover ideas converged with product-based services. For this purpose, the "Double Diamond Design Process Model," which is widely used in the service design field, was adopted. In addition, a product design development process was established so that usability improvement plans, user experience value elements, and product-service connected ideas could be extracted through a workflow in which real users and people from various fields participate. Based on the double diamond design process, in the "Discover" information collection stage, design trends were identified, mainly in the office phone market. In the "Define" analysis and extraction stage, user needs were analyzed through user observation, interviews, and a usability survey, and design requirements and user experience issues were extracted. Personas were set through user type analysis, and user scenarios were presented. 
In the "Develop" stage, ideation workshops and concept renderings were conducted to embody the design, and people from various fields within the company participated to set the design direction, reflecting design preferences and usability improvement plans. In the "Deliver" improvement/prototype development/evaluation stage, a working mock-up of a design prototype was produced, and design and usability evaluations were conducted in consultation with external design experts. This study is meaningful in that it established a "UX-centered product development process" model that converges the existing product design development process with the service design process. Ultimately, a service design-based product design development process was presented so that I Corp.'s products could realize user experience value through service convergence.

Extension Method of Association Rules Using Social Network Analysis (사회연결망 분석을 활용한 연관규칙 확장기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.111-126
    • /
    • 2017
  • Recommender systems based on association rule mining contribute significantly to sellers' sales by reducing the time consumers spend searching for products they want. Recommendations based on the frequency of transactions, such as orders, can effectively screen out the products that are statistically marketable among multiple products. A product with a high possibility of sales, however, can be omitted from the recommendations if it records an insufficient number of transactions at the beginning of the sale. Products missing from the associated recommendations may lose the chance of exposure to consumers, which leads to a decline in the number of transactions. In turn, diminished transactions may create a vicious circle of lost opportunities to be recommended. Thus, initial sales are likely to remain stagnant for a certain period of time. Products that are susceptible to fashion or seasonality, such as clothing, may be greatly affected. This study aimed at expanding association rules to include in the list of recommendations those products whose initial frequency of transactions is low despite their high sales potential. The particular purpose is to predict the strength of the direct connection between two unconnected items through the properties of the paths located between them. An association between two items revealed in transactions can be interpreted as an interaction between them, which can be expressed as a link in a social network whose nodes are items. The first step calculates the centralities of the nodes in the middle of the paths that indirectly connect the two nodes without a direct connection. The next step identifies the number of such paths and the shortest among them. These extracted features are used as independent variables in a regression analysis to predict the future connection strength between the nodes. 
The strength of the connection between the two nodes of the model, defined by the number of nodes between them, is measured after a certain period of time. The regression analysis results confirm that the number of paths between the two products, the length of the shortest path, and the number of neighboring items connected to the products are significantly related to their potential connection strength. This study used actual order transaction data collected over three months, from February to April 2016, from an online commerce company. To reduce the complexity of the analysis as the scale of the network grows, the analysis was performed only on miscellaneous goods. Two consecutively purchased items were chosen from each customer's transactions to obtain an antecedent-consequent pair, which secures a link needed for constituting the social network. The direction of each link was determined by the order in which the goods were purchased. Except for the last ten days of the data collection period, the social network of associated items was built for the extraction of the independent variables. The model predicts, from the explanatory variables, the number of links to be connected in the next ten days. Of the 5,711 previously unconnected links, 611 were newly connected during the last ten days. Through experiments, the proposed model demonstrated excellent predictions: of the 571 links that the proposed model predicted, 269 were confirmed to have been connected. This is 4.4 times the average of 61 that would be expected without any prediction model. This study is expected to be useful in industries whose new products launch quickly with short life cycles, since their exposure time is critical. It can also be used to detect diseases that are rarely found in the early stages of medical treatment because of the low incidence of outbreaks. 
Since the complexity of social network analysis is sensitive to the number of nodes and links that make up the network, this study was conducted on a particular category of miscellaneous goods. Future research should consider that this condition may limit the opportunity to detect unexpected associations between products belonging to different categories.
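The path-based predictors described above (number of indirect paths between two unconnected items, and the shortest among them) can be sketched with plain graph traversal. This is an illustrative feature extractor under simplifying assumptions (a small directed adjacency dict, simple paths bounded by a maximum length), not the study's implementation; all names are hypothetical.

```python
from collections import deque

def path_features(adj, a, b, max_len=4):
    """Return (number of simple paths from a to b up to max_len edges,
    shortest-path length from a to b), for use as regression inputs."""
    # Shortest-path length by breadth-first search.
    dist = {a: 0}
    q = deque([a])
    while q:
        v = q.popleft()
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    # Count simple paths up to max_len edges by depth-first search.
    def count(v, seen, depth):
        if v == b:
            return 1
        if depth == max_len:
            return 0
        return sum(count(u, seen | {u}, depth + 1)
                   for u in adj.get(v, ()) if u not in seen)
    return count(a, {a}, 0), dist.get(b)

# x and y are not directly linked, but three simple paths connect them.
adj = {"x": ["m", "n"], "m": ["y"], "n": ["m", "y"]}
print(path_features(adj, "x", "y"))  # (3, 2)
```

In the study these features, together with neighbor counts, feed a regression model that predicts whether the missing x→y link will appear within the next period.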