• Title/Summary/Keyword: text extraction


Web Document Transcoding Technique for Small Display Devices (소형 화면 단말기를 위한 웹 문서 변환 기법)

  • Shin, Hee-Sook; Mah, Pyeong-Soo; Cho, Soo-Sun; Lee, Dong-Woo
    • The KIPS Transactions: Part D / v.9D no.6 / pp.1145-1156 / 2002
  • We propose a web document transcoding technique that converts existing web pages designed for desktop computers into a form suitable for hand-held devices connected to the wireless internet. By defining a content block based on visual separation and using it as the minimum unit of analysis and conversion, web pages can be converted more accurately. We also reallocate content blocks and generate a new index in order to provide a convenient interface without left-right scrolling on small-screen devices. Compared with existing approaches such as text-level summarization or partial extraction, these methods provide efficient navigation and full recognition of web documents. To obtain these transcoding benefits, we propose the Layout-Forming Tag Analysis Algorithm, which analyzes the structural tags that produce visual separation, and the Component Grouping Algorithm, which extracts content blocks. We also classify and rearrange the content blocks and generate the new index to produce web pages appropriate for small display devices. We designed and implemented our transcoding system on a proxy server and evaluated the methods and algorithms through an analysis of transcoded results. Our system showed good results on most popular web pages with complicated structures.
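
A minimal sketch of the content-block idea described above, not the authors' Layout-Forming Tag Analysis or Component Grouping algorithms: it groups a desktop page's top-level layout elements into blocks, linearizes them, and generates a simple index so the page can be read without horizontal scrolling. The tag set and the block-labeling rule are illustrative assumptions.

```python
# Illustrative content-block extraction (assumed tag set and labeling rule).
from bs4 import BeautifulSoup

LAYOUT_TAGS = ["table", "div", "ul", "form"]    # tags treated as visual separators

def extract_content_blocks(html: str):
    """Split a page into content blocks and build a small navigation index."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for element in soup.body.find_all(LAYOUT_TAGS, recursive=False):
        text = element.get_text(" ", strip=True)
        if text:                                # skip purely decorative blocks
            label = text.split(".")[0][:40]     # crude block label
            blocks.append((label, text))
    index = [f"{i}. {label}" for i, (label, _) in enumerate(blocks, 1)]
    return index, blocks

if __name__ == "__main__":
    page = """<html><body>
      <div>Navigation. Home | News | Contact</div>
      <table><tr><td>Main article. Lorem ipsum dolor sit amet.</td></tr></table>
      <div>Footer. Copyright 2002.</div>
    </body></html>"""
    index, blocks = extract_content_blocks(page)
    print("\n".join(index))                     # new index for the small screen
    for i, (_, text) in enumerate(blocks, 1):
        print(f"[{i}] {text}")
```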

Detection of Gene Interactions based on Syntactic Relations (구문관계에 기반한 유전자 상호작용 인식)

  • Kim, Mi-Young
    • The KIPS Transactions: Part B / v.14B no.5 / pp.383-390 / 2007
  • Interactions between proteins and genes are often considered essential in the description of biomolecular phenomena, and networks of interactions are considered an entry point for a Systems Biology approach. Recently, many studies have tried to extract such information by analyzing biomolecular text with natural language processing technology. Previous research suggests that linguistic information is useful for improving performance in detecting gene interactions; however, previous systems do not show reasonable performance because of low recall. To improve recall without sacrificing precision, this paper proposes a new method for detecting gene interactions based on syntactic relations. Without biomolecular knowledge, our method achieves reasonable performance using only a small amount of training data. Using the format of the LLL05 (ICML05 Workshop on Learning Language in Logic) data, we detect the agent gene and the target gene that interact with each other. In the first phase, we detect encapsulation types for each agent and target candidate. In the second phase, we construct verb lists that indicate interaction between two genes. In the last phase, to decide which of the two genes is the agent and which is the target, we learn direction information. In experiments on the LLL05 data, the proposed method achieved an F-measure of 88% on the training data and 70.4% on the test data, significantly outperforming previous methods. We also describe the contribution of each phase to the performance, showing that the first phase improves recall while the second and last phases improve precision.
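
A minimal sketch of the direction step under stated assumptions: given dependency triples produced by a parser, the subject gene of an interaction verb is labeled the agent and the object gene the target. The verb lexicon, relation labels, and example triples below are illustrative, not the lists learned in the paper.

```python
# Toy verb lexicon and relation labels; the paper learns these from LLL05 data.
INTERACTION_VERBS = {"activates", "inhibits", "regulates", "represses"}

def detect_interaction(dep_triples, genes):
    """dep_triples: (head, relation, dependent) tuples; genes: known gene names."""
    agent, target = None, None
    for head, rel, dep in dep_triples:
        if head in INTERACTION_VERBS and dep in genes:
            if rel == "nsubj":                    # syntactic subject -> agent
                agent = dep
            elif rel in ("obj", "dobj"):          # syntactic object -> target
                target = dep
    return (agent, target) if agent and target else None

if __name__ == "__main__":
    triples = [("activates", "nsubj", "GerE"), ("activates", "obj", "cotD")]
    print(detect_interaction(triples, {"GerE", "cotD"}))   # ('GerE', 'cotD')
```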

Operation Technique of Spatial Data Change Recognition Data per File (파일 단위 공간데이터 변경 인식 데이터 운영 기법)

  • Lee, Bong-Jun
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.4 / pp.184-193 / 2021
  • A system that manages spatial data updates its stored information by extracting, from each newly acquired spatial data file, only the objects that differ from the existing information. To extract only the changed objects, every object in the newly acquired file must be compared against the existing information. This study was conducted to improve this full-inspection approach, given that frequently updated spatial information continues to grow in volume and data updates are required at the national level. Before inspecting individual objects in a newly acquired spatial data file, we examined a method for determining whether spatial objects have changed using only the information contained in the file itself. Because spatial data files have structured characteristics different from general image or text document files, it is possible to determine whether a file has changed in a simpler way than the existing approach of creating and managing file hashes. By reducing the number of files that require full inspection, the method is expected to reduce overall data quality inspection time and data extraction time, improving the use of system resources.
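
A minimal sketch of the file-level idea, assuming ESRI shapefiles (the paper addresses spatial data files more generally): read a few structured header fields as a lightweight signature and trigger full object-by-object inspection only when the signature differs from the stored one. Field offsets follow the published shapefile specification.

```python
# Lightweight per-file signature from the fixed 100-byte .shp header,
# used instead of hashing the whole file.
import struct

def shp_signature(path: str):
    """Return (file_length_in_16bit_words, bounding_box) from the .shp header."""
    with open(path, "rb") as f:
        header = f.read(100)
    file_length = struct.unpack(">i", header[24:28])[0]           # big-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])  # little-endian doubles
    return file_length, (xmin, ymin, xmax, ymax)

def needs_full_inspection(path: str, stored_signature) -> bool:
    """True only if the new file's signature differs from the stored one."""
    return shp_signature(path) != stored_signature
```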

Counseling Outcomes Research Trend Analysis Using Topic Modeling - Focus on 「Korean Journal of Counseling」 (토픽 모델링을 활용한 상담 성과 연구동향 분석 - 「상담학연구」 학술지를 중심으로)

  • Park, Kwi Hwa; Lee, Eun Young; Yune, So Jung
    • Journal of Digital Convergence / v.19 no.11 / pp.517-523 / 2021
  • Counseling outcomes are important to both counselors and researchers. Analyzing the trends of research on counseling outcomes conducted so far helps to structure those outcomes comprehensively. The purpose of this study is to analyze research trends in Korea, focusing on research related to counseling outcomes published from 2011 to 2021 in the 「Korean Journal of Counseling」, one of the well-known academic journals in the field of counseling in Korea, and thereby to explore directions for future research by mapping the knowledge structure of the field. A total of 197 studies were used for analysis, and a final set of 339 keywords was extracted during the node extraction process. Extracting latent topics with the LDA algorithm yielded "measurement and evaluation of counseling outcomes", "emotions and mediating factors affecting interpersonal relationships", and "career stress and coping strategies" as the main topics. Identifying major topics through this trend analysis contributes to structuring counseling outcomes, and in-depth research on these topics should continue.
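
A minimal sketch of the LDA step with scikit-learn on toy English abstracts; the study worked with Korean keywords extracted from 197 papers, which is not reproduced here, and the number of topics and documents below are illustrative.

```python
# Toy LDA topic extraction with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "counseling outcome measurement and evaluation of client change",
    "emotion regulation and interpersonal relationship in counseling",
    "career stress coping strategies and counseling outcome",
]
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(abstracts)          # document-term matrix

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(doc_term)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")               # top words per latent topic
```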

Water leakage accident analysis of water supply networks using big data analysis technique (R기반 빅데이터 분석기법을 활용한 상수도시스템 누수사고 분석)

  • Hong, Sung-Jin; Yoo, Do-Guen
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1261-1270 / 2022
  • The purpose of this study is to collect and analyze information related to water leakage accidents, which is otherwise hard to access, by using news search results that the public can easily access. We applied a web crawling technique to extract large volumes of news on leakage accidents in water supply systems and presented a step-by-step procedure for obtaining accurate leak accident news. In addition, a data analysis technique suited to leakage accident information was developed so that additional details such as the date and time of occurrence, cause, location, damaged facilities, and damage effects can be extracted. The primary goal of the big-data-based leak analysis proposed in this study is to extract meaningful values through comparison with existing waterworks statistics. The proposed method can also be used to respond effectively to consumers and to assess the service level of water supply networks. The results suggest the need to inform the public more fully about such accidents, and can be used to prepare a dissemination and response system that enables a quick response when an accident occurs.
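
A minimal sketch of the crawl-and-extract idea; the study used an R-based workflow and a specific news search service, so the fetching code, date pattern, and cause keywords below are illustrative assumptions only.

```python
# Illustrative article fetching and accident-field extraction.
import re
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    """Download one article and strip it to plain text."""
    html = requests.get(url, timeout=10).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

DATE_PATTERN = re.compile(r"\d{4}[.-]\d{1,2}[.-]\d{1,2}")              # e.g. 2022-03-14
CAUSE_KEYWORDS = ["pipe burst", "aging pipe", "construction damage"]    # assumed causes

def extract_accident_fields(text: str) -> dict:
    """Pull the occurrence date and a coarse cause label from article text."""
    date = DATE_PATTERN.search(text)
    cause = next((kw for kw in CAUSE_KEYWORDS if kw in text), None)
    return {"date": date.group() if date else None, "cause": cause}

if __name__ == "__main__":
    sample = "A water main burst on 2022-03-14 near the station due to an aging pipe."
    print(extract_accident_fields(sample))   # {'date': '2022-03-14', 'cause': 'aging pipe'}
```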

Understanding of Generative Artificial Intelligence Based on Textual Data and Discussion for Its Application in Science Education (텍스트 기반 생성형 인공지능의 이해와 과학교육에서의 활용에 대한 논의)

  • Hunkoog Jho
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.307-319 / 2023
  • This study aims to explain the key concepts and principles of text-based generative artificial intelligence (AI), which is receiving increasing interest and use, focusing on its application in science education. It also highlights the potential and limitations of generative AI in science education, providing insights for its implementation and for related research. Recent generative AI, predominantly based on transformer models consisting of encoders and decoders, has made remarkable progress through reinforcement learning with reward models optimized using human feedback, as well as improved understanding of context. In particular, it can perform functions such as writing, summarizing, keyword extraction, evaluation, and feedback based on its ability to understand diverse user questions and intents. It also offers practical utility for diagnosing learners and structuring educational content from examples provided by educators. However, its limitations must be examined, including the potential for conveying inaccurate facts or knowledge, bias resulting from overconfidence, and uncertainty about its impact on user attitudes or emotions. Moreover, because generative AI produces probabilistic responses based on data from many individuals, there is concern that it may limit insightful and innovative thinking that offers different perspectives or ideas. In light of these considerations, this study provides practical suggestions for the positive utilization of AI in science education.
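
A minimal numeric illustration of the scaled dot-product attention at the core of the transformer encoder-decoder models mentioned above; the dimensions and random values are toy assumptions, not any particular production model.

```python
# Scaled dot-product attention for a single head, in plain NumPy.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))   # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))   # 4 tokens, d_k = 8
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8)
```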

Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo; Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.179-200 / 2019
  • Word-of-mouth (WoM) is defined as consumer activity that shares information concerning consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. Recently, with the development of the Internet, the ways in which people exchange information in online news and online communities have expanded, and WoM has diversified in terms of word of mouth, scores, ratings, and likes. Social media gives online users easy access to information, and online WoM is considered a key source of information. Although various studies on WoM have been conducted, there is no meta-analysis that comprehensively covers them. This study proposes a method to extract major research by applying text mining techniques and to identify the main issues of that research, in order to find the trend of WoM research using scholarly big data. To this end, a total of 4,389 documents published from 1941 to 2018 were collected with the keyword 'word-of-mouth' from the citation database Scopus (www.scopus.com), and the data were refined through preprocessing such as English morphological analysis, stopword removal, and noun extraction. We adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key research, is used to track the development trajectory of an academic field, and presents the research trend from a macro perspective. For this, we constructed a citation network from the collected data, in which a node is a document and a link is a citation relation. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, a main path composed of 30 documents was extracted from the citation network. The main path confirmed changes in the academic area that developed over time, reflecting industrial changes across various industry groups. The results of MPA revealed that WoM research can be divided into five periods: (1) establishment of the aspects and critical elements of WoM, (2) analysis of relationships between WoM variables, (3) the beginning of research on online WoM, (4) analysis of the relationship between WoM and purchase, and (5) broadening of topics. Changes within industry, such as online development and social media, were reflected in the results. Very recent studies showed that the topics and approaches related to WoM are diversifying in response to circumstantial changes. However, even though WoM is used in diverse fields, the main stream of WoM research, from start to end, has been related to marketing and to identifying the influential factors that proliferate WoM. Word co-occurrence network analysis presents the research trend from a microscopic point of view. A word co-occurrence network was constructed to analyze the relationships between keywords, and social network analysis (SNA) was utilized. We divided the data into three periods to investigate periodic changes and trends in the discussion of WoM. SNA showed that Period 1 (1941-2008) consisted of clusters regarding relationships, sources, and consumers; Period 2 (2009-2013) contained clusters of satisfaction, community, social networks, reviews, and the internet; and clusters in Period 3 (2014-2018) involved satisfaction, media, reviews, and interviews. The periodic changes of clusters showed a transition from offline to online WoM. Media have become an important factor in spreading the word. This study conducted a quantitative meta-analysis of WoM based on scholarly big data. Its main contribution is that it provides a micro perspective on the research trend of WoM as well as a macro perspective. Its limitation is that the citation network constructed for MPA is based only on the direct citation relations of the collected documents.
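
A minimal sketch of SPC edge weighting and a greedy (local) main path on a toy citation DAG using networkx; the study extracted a key-route main path of 30 documents from 4,389 Scopus records, which this illustration does not reproduce.

```python
# SPC weights and a greedy main path over a toy citation DAG.
import networkx as nx

def spc_weights(g: nx.DiGraph) -> dict:
    """Search Path Count: (#paths from any source into u) * (#paths from v to any sink)."""
    order = list(nx.topological_sort(g))
    paths_in = {n: 1 if g.in_degree(n) == 0 else 0 for n in g}
    for n in order:
        for _, v in g.out_edges(n):
            paths_in[v] += paths_in[n]
    paths_out = {n: 1 if g.out_degree(n) == 0 else 0 for n in g}
    for n in reversed(order):
        for _, v in g.out_edges(n):
            paths_out[n] += paths_out[v]
    return {(u, v): paths_in[u] * paths_out[v] for u, v in g.edges}

def local_main_path(g: nx.DiGraph, spc: dict) -> list:
    """Greedy traversal that always follows the highest-SPC outgoing edge."""
    node = max((n for n in g if g.in_degree(n) == 0),
               key=lambda n: max((spc[e] for e in g.out_edges(n)), default=0))
    path = [node]
    while g.out_degree(node) > 0:
        node = max(g.out_edges(node), key=lambda e: spc[e])[1]
        path.append(node)
    return path

if __name__ == "__main__":
    g = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("E", "F")])
    spc = spc_weights(g)
    print("main path:", local_main_path(g, spc))
```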

A Study of the Cultural Legislation of Historic Properties during the Japanese Colonial Period - Related to the Establishment and Implementation of the Chosun Treasure Historic Natural Monument Preservation Decree (1933) - (일제강점기 문화재 법제 연구 - 「조선보물고적명승천연기념물보존령(1933년)」 제정·시행 관련 -)

  • Kim, Jongsoo
    • Korean Journal of Heritage: History & Science / v.53 no.2 / pp.156-179 / 2020
  • The Preservation Decree (1933) was the basic law for the conservation of cultural property in colonial Chosun. It invoked clauses from the Old History Preservation Act (1897), the Historic Scenic Sites Natural Monument Preservation Act (1919), and the National Treasure Preservation Act (1929), all forms of Japanese modern cultural heritage law, and in fact reused the corresponding legal text of those laws. Thus, it can be shown to some extent that the Preservation Decree transplanted or imitated Japanese modern cultural heritage law in the composition of its provisions. The main features and characteristics of the Preservation Decree are summarized below. First, in terms of preservation, the Preservation Decree strengthened and expanded preservation beyond the existing conservation rules. Under the conservation rules, the categories of cultural properties were limited to historic sites and relics, while the Preservation Decree classified cultural properties into four categories: treasures, historic sites, scenic spots, and natural monuments. In addition, the Preservation Decree advanced cultural property preservation law by establishing standards for conserving cultural property, expanding the scope of cultural property, introducing explicit provisions on the restriction of ownership and a designation system for cultural property, and defining the basis for national treasury support. Second, the Preservation Decree had clear limitations as a colonial cultural property law. Article 1 set the standard of "historical evidence or exemplary art" as the criterion for designating treasures. From the perspective of Japanese imperialism, this acted as a criterion for selecting cultural assets in line with the Governor-General's assimilation policy, revealing its limitations as a standard for preserving cultural assets. In addition, although the Japanese authorities asserted that the cultural property law reduced cultural property robbery, the robbery and export of cultural assets by means such as grave robbery, trafficking, and exportation to Japan did not cease even after the Preservation Decree came into effect. This is because the governors and officials who were supposed to obey and enforce the law themselves became parties to the looting and extraction of property, or the plunder and removal of cultural property by the Japanese continued with their acknowledgement. This indicates that the cultural property legislation of the time did not function properly, as the Governor-General allowed or condoned such export and plunder. In this way, the cultural property laws of the Japanese colonial period constituted discriminatory colonial legislation, selected and applied from the perspective of the Japanese Government-General in the designation and preservation of cultural property, and the cultural property policy focused on the use of cultural assets as a means of realizing the assimilation policy. This suggests that cultural property legislation during the Japanese colonial period was used as a mechanism to solidify colonial rule over Chosun and to realize the assimilation policy of the Japanese Government-General.

Export Control System based on Case Based Reasoning: Design and Evaluation (사례 기반 지능형 수출통제 시스템 : 설계와 평가)

  • Hong, Woneui; Kim, Uihyun; Cho, Sinhee; Kim, Sansung; Yi, Mun Yong; Shin, Donghoon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.109-131 / 2014
  • As worldwide demand for nuclear power plant equipment grows, the importance of handling nuclear strategic materials is also increasing. While the number of cases submitted for the export of nuclear-power commodities and technology is increasing dramatically, preadjudication (prescreening) of strategic materials has so far been done by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, and it takes a long time to train one. Because human experts must manually evaluate all the documents submitted for export permission, the current practice of nuclear material export control is neither time-efficient nor cost-effective. To alleviate the problem of relying solely on costly human experts, our research proposes a new system designed to help field experts make their decisions more effectively and efficiently. The proposed system is built upon case-based reasoning, which in essence extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for a case-based reasoning system, designs such a system for the control of nuclear material exports, and evaluates the performance of alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). Keyword extraction is an essential component of the case-based reasoning system, as it is used to extract the key features of cases. The fully automatic method used TF-IDF, a widely used de facto standard for representative keyword extraction in text mining: TF (term frequency) is based on the frequency of a term within a document, showing how important the term is to that document, while IDF (inverse document frequency) is based on the infrequency of the term across the document set, showing how uniquely the term represents the document. The results show that the semi-automatic approach, based on collaboration between machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity, along with a new framework for document analysis. The proposed similarity measure considers both document-to-document similarity (α) and document-to-nuclear-system similarity (β) to derive a final score (γ) for deciding whether the presented case concerns strategic material. The final score (γ) represents the similarity between past cases and the new case; it is induced not only by conventional TF-IDF but also by a nuclear system similarity score that takes the context of the nuclear system domain into account. Finally, the system retrieves the top-3 documents in the case base considered most similar to the new case and provides them together with a degree of credibility. With this final score and the credibility score, it becomes easier for a user to see which documents in the case base are worth looking up, so that the user can make a proper decision at relatively low cost. The system was evaluated by developing a prototype and testing it with field data, and the system workflows and outcomes were verified by field experts. This research is expected to contribute to the growth of the knowledge service industry by proposing a system that can effectively reduce the burden of relying on costly human experts for the export control of nuclear materials, and that serves as a meaningful example of a knowledge service application.
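
A minimal sketch of the retrieval step under stated assumptions: TF-IDF cosine similarity stands in for the document-to-document score (α), a placeholder lexicon-overlap score stands in for the document-to-nuclear-system score (β), and γ is a simple weighted combination used to rank the top-3 most similar past cases. The weighting scheme and the β function below are assumptions, not the deployed system's formulas.

```python
# Illustrative case ranking with a combined similarity score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

case_base = [
    "zirconium alloy cladding tubes for reactor fuel assemblies",
    "heat exchanger components for secondary cooling loop",
    "general purpose industrial valves for water treatment",
]
nuclear_system_terms = {"reactor", "fuel", "cladding", "coolant"}   # assumed lexicon

def beta_score(text: str) -> float:
    """Fraction of nuclear-system terms present in the document (placeholder)."""
    tokens = set(text.lower().split())
    return len(tokens & nuclear_system_terms) / len(nuclear_system_terms)

def rank_cases(new_case: str, weight: float = 0.7, top_k: int = 3):
    vec = TfidfVectorizer()
    matrix = vec.fit_transform(case_base + [new_case])
    new_vec = matrix[len(case_base)]
    alpha = cosine_similarity(new_vec, matrix[:len(case_base)])[0]   # doc-to-doc
    gamma = [weight * a + (1 - weight) * beta_score(doc)             # combined score
             for a, doc in zip(alpha, case_base)]
    ranked = sorted(range(len(case_base)), key=lambda i: gamma[i], reverse=True)
    return [(case_base[i], round(gamma[i], 3)) for i in ranked[:top_k]]

if __name__ == "__main__":
    print(rank_cases("cladding tubes of zirconium alloy for reactor fuel"))
```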

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo; Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has been attracting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the Convolutional Neural Network (CNN). A CNN is characterized by dividing the input image into small sections, recognizing partial features, and combining them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but so far their applications have been limited largely to image recognition and natural language processing; the use of deep learning for business problems is still at an early research stage. If their performance is proven, they can be applied to traditional business problems such as marketing response prediction, fraud detection, and bankruptcy prediction. It is therefore a meaningful experiment to diagnose the possibility of solving business problems with deep learning, based on the case of online shopping companies, which have big data, can identify customer behavior relatively easily, and offer high utilization value. In online shopping companies in particular, the competitive environment is changing rapidly and becoming more intense, so analysis of customer behavior for maximizing profit is increasingly important. In this study, we propose a 'CNN model of heterogeneous information integration' as a way to improve the prediction of customer behavior in online shopping enterprises. The model learns from both structured and unstructured information through a convolutional neural network combined with a multi-layer perceptron structure; it comprises 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', and we evaluate the performance of each architectural choice and confirm the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churner, frequent shopper, frequent refund shopper, high-amount shopper, and high-discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC (voice of customer) data from a specific online shopping company in Korea. Data extraction criteria were defined for 47,947 customers who registered at least one VOC in January 2011 (one month); their customer profiles, 19 months of transaction data from September 2010 to March 2012, and the VOCs posted during that month were used. The experiment is divided into two stages. In the first stage, we evaluate three architectural components that affect the performance of the proposed model and select optimal parameters; we then evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). It is therefore significant that the use of unstructured information contributes to predicting customer behavior, and that CNNs can be applied to business problems as well as to image recognition and natural language processing. The experiments confirm that the CNN is effective in understanding and interpreting the meaning of context in textual VOC data, and the empirical study based on actual e-commerce data shows that very meaningful information can be extracted from VOC text written directly by customers when predicting customer behavior. Finally, through various experiments, the proposed model provides useful information for future research related to parameter selection and performance.
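
A minimal sketch of a two-branch binary classifier combining tokenized VOC text (a small Conv1D branch) with structured customer features, written in Keras; the vocabulary size, sequence length, layer sizes, and feature count are illustrative assumptions rather than the architecture tuned in the study.

```python
# Illustrative heterogeneous (text + structured) CNN binary classifier.
import numpy as np
from tensorflow.keras import layers, Model

VOCAB_SIZE, SEQ_LEN, N_STRUCTURED = 5000, 100, 12

text_in = layers.Input(shape=(SEQ_LEN,), name="voc_text")           # token ids
x = layers.Embedding(VOCAB_SIZE, 64)(text_in)
x = layers.Conv1D(filters=32, kernel_size=3, activation="relu")(x)  # local n-gram features
x = layers.GlobalMaxPooling1D()(x)

struct_in = layers.Input(shape=(N_STRUCTURED,), name="structured")  # e.g. purchase/refund counts
merged = layers.concatenate([x, struct_in])                         # heterogeneous integration
merged = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid", name="repurchase")(merged)

model = Model(inputs=[text_in, struct_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy forward pass with random data to show the expected input/output shapes.
text_batch = np.random.randint(0, VOCAB_SIZE, size=(8, SEQ_LEN))
struct_batch = np.random.rand(8, N_STRUCTURED)
print(model.predict([text_batch, struct_batch]).shape)   # (8, 1)
```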