• Title/Summary/Keyword: Keywords Similarity


Semi-automatic Data Fusion Method for Spatial Datasets (공간 정보를 가지는 데이터셋의 준자동 융합 기법)

  • Yoon, Jong-chan;Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.26 no.4 / pp.1-13 / 2021
  • With the development of big data technologies, it has become possible to process vast amounts of data that could not be processed before. Accordingly, establishing an automated data selection and fusion process has become a necessity, not an option, for realizing big data-based services. In this paper, we propose an automation technique that creates meaningful new information by fusing datasets containing spatial information. First, the given datasets are embedded using the Node2Vec model and the keywords of each dataset. Then, the semantic similarity between every pair of datasets is obtained by calculating the cosine similarity of their embedding vectors. Next, a person intervenes to select candidate datasets that have one or more spatial identifiers from among the dataset pairs with relatively high similarity, and the selected pairs are fused and visualized. Through this semi-automatic data fusion process, we show that significant fused information that cannot be obtained from a single dataset can be generated.
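As a rough illustration of the similarity step in this abstract, the sketch below ranks dataset pairs by the cosine similarity of their embedding vectors. The dataset names and the three-dimensional vectors are invented for illustration; in the paper the vectors come from a Node2Vec model, which is not reproduced here.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pairs(embeddings):
    """Return (name_a, name_b, similarity) for every pair, most similar first."""
    pairs = [
        (a, b, cosine_similarity(embeddings[a], embeddings[b]))
        for a, b in combinations(sorted(embeddings), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical embeddings (assumed values, not taken from the paper).
embeddings = {
    "bus_stops": [0.9, 0.1, 0.2],
    "subway_stations": [0.8, 0.2, 0.3],
    "air_quality": [0.1, 0.9, 0.4],
}
ranked = rank_pairs(embeddings)
```

The top of `ranked` would then be the shortlist handed to a person, who picks the pairs that share a spatial identifier before fusing them.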

Advanced CBS (Cost Breakdown Structure) Code Search Technology Applying NLP (Natural Language Processing) of Artificial Intelligence (인공지능 자연어 처리 기법을 이용한 개선된 내역코드 탐색방법)

  • Kim, HanDo;Nam, JeongYong
    • KSCE Journal of Civil and Environmental Engineering Research / v.44 no.5 / pp.719-731 / 2024
  • For efficient construction management, linking BIM with schedule and cost is essential, but the application of 5D BIM is limited by the difficulty of breaking a project down into thousands of WBS and CBS items. To solve this problem, a standardized WBS-CBS set is configured in advance; when a new construction project arises, each CBS in the BOQ is automatically linked to a WBS once the most similar text is found among the standard CBS (Public Procurement Service standard construction codes) of the already linked set. Artificial intelligence natural language processing techniques were used to compare the text similarity of CBS more efficiently. First, we created a civil term dictionary (CTD) that organizes the words used in civil projects and assigns them numerical values, tokenized the text of every CBS into words defined in the dictionary, converted the tokens into TF-IDF vectors, and judged similarity by cosine similarity. Additionally, the search success rate increased to nearly 70% by considering the hierarchical structure of the CBS and changing keywords. The threshold value for judging similarity was 0.62 (1: perfect match, 0: no match).
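The matching pipeline described in this abstract (dictionary-based tokenization, TF-IDF vectors, cosine similarity, a 0.62 threshold) might be sketched as follows. Only the threshold comes from the abstract; the term dictionary, the standard CBS texts, the smoothed IDF formula, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

THRESHOLD = 0.62  # similarity threshold reported in the abstract


def tokenize(text, dictionary):
    """Keep only lower-cased words that appear in the term dictionary."""
    return [w for w in text.lower().split() if w in dictionary]


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF; this particular variant is an assumption.
            t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(query, standard_items, dictionary):
    """Return (best standard item or None, its score) for a query text."""
    token_lists = [tokenize(t, dictionary) for t in standard_items + [query]]
    vectors = tfidf_vectors(token_lists)
    scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= THRESHOLD:
        return standard_items[best], scores[best]
    return None, scores[best]


# Hypothetical dictionary and standard CBS texts.
CIVIL_TERMS = {"concrete", "pouring", "rebar", "placing", "earthwork", "excavation"}
STANDARD_CBS = ["Concrete pouring", "Rebar placing", "Earthwork excavation"]
match, score = best_match("concrete pouring work", STANDARD_CBS, CIVIL_TERMS)
```

Words outside the dictionary (here "work") are dropped before vectorization, so a query that differs only in such words still matches its standard item.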

A System for Measuring the Similarity and Redundancy of R&D Project (R&D 과제의 유사도 및 중복도 측정 시스템에 관한 연구)

  • Choi, Kook-Hyun;Kang, Yong-Suk;Kim, Jong-Hee;Shin, Yong-Tae;Kim, Jong-Bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.329-331 / 2014
  • Analyzing the similarities and redundancies among R&D projects is important for the efficient investment of government budgets. When government R&D projects are planned, the redundancy of research tasks is examined by institutions specializing in research management, relevant offices and departments, and the government in order to prevent redundant funding. However, existing similarity analyses depend on methods in which new task proposals and existing R&D project proposals are compared and looked up based on keywords. As a result, similarity cannot be measured accurately when the task name is partially modified or technical substitutions are made. This study uses patent information as the characteristics by which R&D project documents can be identified. The patent data are based on materials officially published by the government's R&D patent trend survey project (http://ipas.rndip.re.kr). The study proposes a method by which patent information can be used to analyze the similarity and redundancy among R&D projects when new projects are entered. For this purpose, a similarity measurement model based on set theory and probability theory is presented. The model is implemented in an actual system that identifies redundant documents and calculates and displays their similarity.


Analysis of ICT Education Trends using Keyword Occurrence Frequency Analysis and CONCOR Technique (키워드 출현 빈도 분석과 CONCOR 기법을 이용한 ICT 교육 동향 분석)

  • Youngseok Lee
    • Journal of Industrial Convergence / v.21 no.1 / pp.187-192 / 2023
  • In this study, trends in ICT education were investigated by analyzing the occurrence frequency of keywords related to machine learning and by using the convergence of iterated correlations (CONCOR) technique. A total of 304 papers published in registered sites from 2018 to the present were retrieved from Google Scholar using "ICT education" as the keyword, and 60 papers pertaining to ICT education were selected through a systematic literature review. Keywords were then extracted from the title and summary of each paper. For the word frequency and indicator data, 49 keywords with high occurrence frequency were extracted by analyzing frequency, using the term frequency-inverse document frequency technique from natural language processing, together with co-occurrence frequency. The degree of relationship was verified by analyzing the connection structure and the degree centrality between words, and clusters of similar words were derived via CONCOR analysis. First, "education," "research," "result," "utilization," and "analysis" were identified as the main keywords. Second, an N-gram network graph with "education" as the keyword showed that "curriculum" and "utilization" had the highest correlation level. Third, a cluster analysis with "education" as the keyword formed five groups: "curriculum," "programming," "student," "improvement," and "information." These results indicate that the practical research needed for ICT education can be conducted by analyzing and identifying ICT education trends.
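For readers unfamiliar with CONCOR, the following minimal sketch shows one split of the procedure: Pearson correlations between keyword profiles are recomputed on the correlation matrix itself until the entries approach +1 or -1, and the keywords are then divided by sign. The keyword-by-document counts are invented, the keyword labels are borrowed from the abstract only as names, and the study's actual tooling is not stated, so this is an assumption-laden illustration rather than the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlation_matrix(rows):
    """Correlate every row profile with every other row profile."""
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]


def concor_split(labels, profiles, max_iter=50, tol=1e-6):
    """Split labels into two blocks by iterating correlations of correlations."""
    matrix = correlation_matrix(profiles)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in matrix for v in row):
            break
        matrix = correlation_matrix(matrix)
    # Items positively correlated with the first item form one block.
    first = [lab for lab, v in zip(labels, matrix[0]) if v > 0]
    second = [lab for lab, v in zip(labels, matrix[0]) if v <= 0]
    return first, second


# Hypothetical keyword-by-document occurrence counts.
labels = ["curriculum", "programming", "student", "improvement"]
profiles = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]
blocks = concor_split(labels, profiles)
```

Applying the same split recursively to each block would give a finer partition.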

Analysis of Reference Inquiries in the Field of Social Science in the Collaborative Reference Service Using the Co-Word Technique

  • Cho, Jane
    • Journal of the Korean Society for Library and Information Science / v.49 no.1 / pp.129-148 / 2015
  • This study identified the nature of the inquiry domain by analyzing requests submitted to a collaborative reference service in the social science field using the co-word technique, and mapped its intellectual structure. First, 748 uncontrolled keywords were extracted from the social science reference inquiries. Second, similarity indices between the words were calculated on the basis of co-occurrence frequency, and both clustering and MDS mapping were performed. Third, to identify how inquiries differed by period, the period was divided into two parts and a comparative analysis was performed. As a result, five clusters were formed, of which "Korea Education" was by far the largest at 40.3%. The period-based analysis showed many questions about "Education" in the first half, and many inquiries focused on "welfare and business information" in the second half.
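A similarity index based on co-occurrence frequency can be computed in several ways, and the abstract does not say which one was used. The sketch below assumes the cosine (Salton) index, the co-occurrence count divided by the square root of the product of the two keywords' occurrence counts, and applies it to invented inquiries.

```python
import math
from collections import Counter
from itertools import combinations


def coword_similarity(keyword_lists):
    """Map each co-occurring keyword pair to its cosine (Salton) index."""
    occurrences = Counter()
    cooccurrences = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # count each keyword once per inquiry
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return {
        pair: count / math.sqrt(occurrences[pair[0]] * occurrences[pair[1]])
        for pair, count in cooccurrences.items()
    }


# Hypothetical keywords attached to four reference inquiries.
inquiries = [
    ["education", "curriculum"],
    ["education", "welfare"],
    ["education", "curriculum", "statistics"],
    ["welfare", "statistics"],
]
similarity = coword_similarity(inquiries)
```

The resulting pairwise indices are the kind of input that clustering and MDS mapping would then consume.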

A Study on Method for Extracting Emotion from Painting Based on Color (색상 기반 회화 감성 추출 방법에 관한 연구)

  • Shim, Hyounoh;Park, Seongju;Yoon, Kyunghyun
    • Journal of Korea Multimedia Society / v.19 no.4 / pp.717-724 / 2016
  • Paintings can evoke emotions in viewers. In this paper, we propose a method for extracting emotion from a painting by using the colors that compose it. To do so, we generate a color spectrum from the input painting and compare it against color combinations to find the most similar combination. The color combinations are mapped to emotional keywords, so the keyword of the best-matching combination is extracted as the emotion evoked by the painting. We also vary the algorithm used to match color spectra with color combinations, and extract and compare the results obtained with each algorithm.
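One simple way to picture the matching step is shown below. It is only a sketch under stated assumptions: the color combinations, their emotion keywords, and the distance measure (mean Euclidean RGB distance to the nearest combination color) are all invented here, since the abstract says the authors compared several matching algorithms without naming them.

```python
import math

# Hypothetical color combinations (RGB) and their emotion keywords.
COMBINATIONS = {
    "calm": [(70, 130, 180), (176, 224, 230), (245, 245, 245)],
    "passionate": [(200, 30, 30), (255, 140, 0), (60, 0, 0)],
}


def spectrum_distance(spectrum, combination):
    """Mean distance from each spectrum color to its nearest combination color."""
    return sum(
        min(math.dist(color, ref) for ref in combination) for color in spectrum
    ) / len(spectrum)


def extract_emotion(spectrum, combinations=COMBINATIONS):
    """Return the keyword of the combination closest to the color spectrum."""
    return min(
        combinations,
        key=lambda keyword: spectrum_distance(spectrum, combinations[keyword]),
    )


# Hypothetical dominant colors extracted from a painting.
warm_painting = [(210, 40, 35), (250, 150, 20)]
emotion = extract_emotion(warm_painting)
```

Swapping `spectrum_distance` for another measure is the kind of variation the abstract describes when it compares matching algorithms.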

Engineering Information Search based on Ontology Mapping (온톨로지 매핑 기반 엔지니어링 정보 검색)

  • Jung Min;Suh Hyo-Won
    • Proceedings of the Korean Society of Precision Engineering Conference / 2006.05a / pp.617-618 / 2006
  • Participants in a collaborative environment want to retrieve exactly the documents they intend to find. A general search system retrieves only documents that contain the given keywords. To find different word expressions that have the same meaning, we perform mapping before searching. Our mapping logic consists of three steps. First, character matching maps two terminologies that have identical character strings. Second, definition comparison compares the definitions of two terminologies. Third, similarity checking pairs terminologies that were not mapped by the two prior steps. In this paper, we propose an Engineering Information Search System based on ontology mapping.


A Case Based e-Mail Response System for Customer Support

  • Yoon, Young-Suk;Lee, Jae-Kwang;Han, Chang-Hee
    • Journal of Intelligence and Information Systems / v.9 no.2 / pp.121-133 / 2003
  • Due to the rapid growth of the Internet, traditional means of communicating with customers in a customer support environment, such as telephone calls, are being replaced mainly by e-mail in Web-based customer support systems. Although Web-based support is efficient and promises potential benefits for firms, including reduced transaction costs, reduced time, and high-quality support, it is difficult to respond appropriately to the many types of inbound customer e-mail. As many types of e-mail are received, considerable attention is being paid to methods for managing and responding to e-mails more efficiently. This research proposes an intelligent system for managing customers' inbound e-mails in organizations, which applies case-based reasoning to respond to them more effectively. In this approach, a case is represented as a frame-type data structure consisting of an inbound e-mail, its keywords, and its reply e-mail. In the retrieval procedure, a keyword and affinity set is developed to index a case, and the case is then represented as a case vector. The cosine value is calculated to measure the similarity between a new inbound e-mail and the cases in the case base. In the adaptation procedure, we provide several strategies to adapt and modify the retrieved case. The strategies guide the composition of an outbound e-mail using product databases, customer support databases, and so on. Additionally, a Web-based system architecture is proposed to implement our methodology. The proposed methodology and system will be helpful for developing more efficient Web-based customer support.


Enhancing Classification Performance of Temporal Keyword Data by Using Moving Average-based Dynamic Time Warping Method (이동 평균 기반 동적 시간 와핑 기법을 이용한 시계열 키워드 데이터의 분류 성능 개선 방안)

  • Jeong, Do-Heon
    • Journal of the Korean Society for Information Management / v.36 no.4 / pp.83-105 / 2019
  • This study aims to suggest an effective method for automatically classifying keywords with similar patterns by calculating the pattern similarity of temporal data. For this purpose, large-scale news data on the Web were collected and time series data composed of 120 time segments were built. To make a training data set for the performance test of the proposed model, 440 representative keywords were manually classified into 8 types of trend. This study introduces the Dynamic Time Warping (DTW) method, which has been commonly used in the field of time series analytics, and proposes an application model, MA-DTW, based on the Moving Average (MA) method, which explains the tendency of a trend curve well. In automatic classification with a k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW achieved maximum micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved the best micro-averaged F1 score of 74.3%. In all of the comprehensive experiments, the suggested model outperformed ED and DTW.
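The abstract does not spell out how the moving average and DTW are combined. One plausible reading, sketched below as an assumption, is to smooth both series with a moving average and then compute the standard DTW distance between the smoothed series, using that distance inside a kNN classifier. The window size, the training series, and the trend labels are invented.

```python
from collections import Counter


def moving_average(series, window):
    """Simple moving average over a sliding window."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def ma_dtw(a, b, window=3):
    """DTW distance between moving-average-smoothed series (assumed reading)."""
    return dtw_distance(moving_average(a, window), moving_average(b, window))


def knn_classify(query, training, k=1, window=3):
    """Majority label among the k training series nearest to the query."""
    nearest = sorted(training, key=lambda item: ma_dtw(query, item[0], window))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical keyword frequency series with manually assigned trend labels.
training = [
    ([1, 2, 3, 4, 5, 6], "rising"),
    ([6, 5, 4, 3, 2, 1], "falling"),
]
predicted = knn_classify([1, 1, 2, 3, 5, 6, 7], training)
```

Because DTW tolerates series of different lengths, the query here has seven points while the training series have six.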

Rank-Size Distribution with Web Document Frequency of City Name: Case Study with U.S. Incorporated Places of 100,000 or More Population (인터넷 문서빈도를 통해 본 도시순위규모에 관한 연구 -미국 10만 이상의 인구를 갖는 도시들을 사례로-)

  • Hong, Il-Young
    • Journal of the Korean Association of Regional Geographers / v.13 no.3 / pp.290-300 / 2007
  • In this study, the web document frequency of city place names is analyzed and used as the dataset for rank-size analysis. The search keywords are compared in terms of their spatial meaning, and corpora from different domains are applied. The acquired search results are then used for further analysis. First, rank-size analysis is applied to compare the results for population and document frequency. Second, correlation analysis reveals significant changes when the spatial criteria for the search keywords are increased. Among the corpora, COM, NET, and ORG show the higher coefficient values. Lastly, cluster analysis is applied to classify the cities by their similarities and differences. These analyses play a significant role in representing the rank-size distribution of city names as reflected in web documents in the information society.
