• Title/Summary/Keyword: word-net


Taxonomy Induction from Wikidata using Directed Acyclic Graph's Centrality (방향 비순환 그래프의 중심성을 이용한 위키데이터 기반 분류체계 구축)

  • Cheon, Hee-Seon;Kim, Hyun-Ho;Kang, Inho
    • Annual Conference on Human and Language Technology / 2021.10a / pp.582-587 / 2021
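  • We propose a method for building the taxonomy that is essential for creating an integrated Korean knowledge base. We extract class candidates from Wikidata, construct a directed acyclic graph (DAG) from their hypernym-hyponym relations, and refine it using information such as local reaching centrality, producing a taxonomy with 246 classes and 314 hypernym-hyponym relations. Compared with the taxonomies of existing linked open data resources such as WordNet and DBpedia, it has a deeper hierarchy, and its non-tree structure allows a class to have multiple parent classes. In addition, classes can be assigned automatically, based on Wikidata properties, to any instance that has Wikidata information; in experiments with this approach, it achieved 99.83% class-assignment coverage and 99.81% class-prediction accuracy.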
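The local reaching centrality mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an unweighted graph whose edges point from a class to its subclasses; the toy hierarchy and the function name are hypothetical and are not taken from the paper. For an unweighted directed graph, the measure is the fraction of the other nodes that can be reached from a given node.

```python
from collections import deque

def local_reaching_centrality(graph, node):
    """Fraction of the other nodes reachable from `node` along directed edges."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # Exclude the start node itself from both the count and the total.
    return (len(seen) - 1) / (len(graph) - 1)

# Hypothetical toy hierarchy (assumption): every node appears as a key,
# and edges point from a class to its subclasses.
toy_dag = {
    "entity": ["organism", "place"],
    "organism": ["animal", "person"],
    "animal": ["dog"],
    "person": [],
    "place": [],
    "dog": [],
}

scores = {name: local_reaching_centrality(toy_dag, name) for name in toy_dag}
```

In this sketch, broad classes near the root score close to 1 and leaf classes score 0, which is the kind of signal that could help separate general categories from specific ones during refinement.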


Synonyms/Antonyms-Based Data Augmentation For Training TOEIC Problems Solving Model (토익 문제 풀이 모델 학습을 위한 유의어/반의어 기반 데이터 증강 기법)

  • Jeongwoo Lee;Aiyanyo Imatitikua Danielle;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.333-335 / 2023
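  • Much recent research addresses understanding text and inferring answers, with machine reading comprehension as a representative example. Although various datasets for machine reading comprehension have been released, almost no officially released dataset exists for TOEIC, which has long been widely used to assess people's English proficiency, and little research targets it. This study therefore proposes a data augmentation technique for improving the performance of machine reading comprehension models when data are scarce, as they are now. The proposed method uses WordNet to augment data, based on synonyms and antonyms, in a very simple yet efficient way so that the result resembles real TOEIC questions; experiments confirmed that the method is meaningful. Through this study, we aim to alleviate the shortage of TOEIC data and to enable strong, human-level performance.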
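The abstract above does not spell out how items are generated, so the following is only one possible sketch of synonym/antonym-based augmentation. It assumes a fill-in-the-blank format in which each synonym of the original answer becomes the correct choice of a new item and the antonyms become distractors. `TOY_LEXICON` is a hypothetical stand-in for WordNet lookups, and `augment_item` is an illustrative name.

```python
# Hypothetical toy lexicon standing in for WordNet lookups (assumption).
TOY_LEXICON = {
    "increase": {"synonyms": ["rise", "grow"], "antonyms": ["decrease"]},
    "accept": {"synonyms": ["approve"], "antonyms": ["reject", "decline"]},
}

def augment_item(sentence, answer, lexicon):
    """Build new fill-in-the-blank items from one sentence.

    Each synonym of `answer` becomes the correct choice of a new item,
    and the antonyms of `answer` serve as distractors.
    """
    words = sentence.split()
    entry = lexicon.get(answer)
    if entry is None or answer not in words:
        return []
    blanked = " ".join("_____" if word == answer else word for word in words)
    return [
        {"question": blanked, "answer": synonym, "distractors": list(entry["antonyms"])}
        for synonym in entry["synonyms"]
    ]

items = augment_item("Sales will increase next quarter", "increase", TOY_LEXICON)
```

A real pipeline would also need to handle inflection and word sense, which this sketch ignores.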


KorLexClas 1.5: A Lexical Semantic Network for Korean Numeral Classifiers (한국어 수분류사 어휘의미망 KorLexClas 1.5)

  • Hwang, Soon-Hee;Kwon, Hyuk-Chul;Yoon, Ae-Sun
    • Journal of KIISE:Software and Applications / v.37 no.1 / pp.60-73 / 2010
  • This paper describes KorLexClas 1.5, which provides a very large list of Korean numeral classifiers together with the co-occurring noun categories that select each numeral classifier. Unlike the KorLex components for other parts of speech, whose structure depends largely on their reference model (Princeton WordNet), KorLexClas 1.0 and its extended version 1.5 adopt a direct building method. This method demands considerable time and expert knowledge to establish the hierarchies of numeral classifiers and the relationships between lexical items. For efficient construction and for the reliability of KorLexClas 1.5, we use the following processes: (1) using various language resources and cross-checking them to select classifier candidates; (2) extending the list of numeral classifiers with shallow parsing techniques; (3) setting up the hierarchies of the numeral classifiers based on previous linguistic studies; and (4) determining the LUB (Least Upper Bound) of the numeral classifiers in KorLexNoun 1.5. The last process makes the open list of co-occurring nouns for KorLexClas 1.5 extensible. KorLexClas 1.5 is expected to be used in a variety of NLP applications, including MT.

A Study of the Automatic Extraction of Hypernyms and Hyponyms from the Corpus (코퍼스를 이용한 상하위어 추출 연구)

  • Pang, Chan-Seong;Lee, Hae-Yun
    • Korean Journal of Cognitive Science / v.19 no.2 / pp.143-161 / 2008
  • The goal of this paper is to extract the hyponymy relation between words in the corpus. Adopting the basic algorithm of Hearst (1992), I propose a method of pattern-based extraction of semantic relations from the corpus. To this end, I set up a list of hypernym-hyponym pairs from the Sejong Electronic Dictionary. This list is supplemented with the superordinate-subordinate terms of CoreNet. Then, I extracted all the sentences from the corpus that include hypernym-hyponym pairs from the list. From these extracted sentences, I collected all the sentences that contain meaningful constructions that occur systematically in the corpus. As a result, we obtained 21 generalized patterns. Using a Perl program, we collected the sentences matching each of the 21 patterns. 57% of these sentences turned out to contain a hyponymy relation. The method proposed in this paper is simpler and more advanced than that of Cederberg and Widdows (2003), in that using a word net or an electronic dictionary is generally considered to be efficient for information retrieval. The patterns extracted by this method are helpful when we look for appropriate documents during information retrieval, and they can be used to expand concept networks such as ontologies or thesauruses. However, the word order of Korean is relatively free, and it is difficult to capture the various expressions of a fixed pattern. In the future, we should investigate more semantic relations than hyponymy, so that we can extract various patterns from the corpus.
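As an illustration of the pattern-based extraction described above, here is a minimal sketch of one classic English pattern from Hearst (1992), "X such as A, B and C". It is not one of the paper's 21 Korean patterns; the regular expression and the function name are assumptions made for this example.

```python
import re

# One Hearst-style pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
SUCH_AS = re.compile(r"(\w+),? such as ([\w ,]+)")

# Separators between listed hyponyms: commas, "and", "or" (with optional Oxford comma).
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")

def extract_pairs(sentence):
    """Return (hypernym, hyponym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in SEPARATOR.split(match.group(2)):
            hyponym = hyponym.strip()
            if hyponym:
                pairs.append((hypernym, hyponym))
    return pairs

pairs = extract_pairs("She plays instruments such as violin, cello and flute.")
```

The sketch keeps only the single word before "such as" as the hypernym and treats everything up to the next punctuation mark as the hyponym list, so it over-matches on longer sentences. That limitation is consistent with the abstract's observation that only part of the pattern-matched sentences actually express hyponymy.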


A study on trends and predictions through analysis of linkage analysis based on big data between autonomous driving and spatial information (자율주행과 공간정보의 빅데이터 기반 연계성 분석을 통한 동향 및 예측에 관한 연구)

  • Cho, Kuk;Lee, Jong-Min;Kim, Jong Seo;Min, Guy Sik
    • Journal of Cadastre & Land InformatiX / v.50 no.2 / pp.101-115 / 2020
  • In this paper, a big data analysis method was used to identify global trends in autonomous driving and to derive ways to activate spatial information services. News articles and patent documents were used together as the data in order to analyze trends in spatial information. Big data were compiled from major news on autonomous driving, and keywords were extracted using LDA (Latent Dirichlet Allocation), a topic model. In addition, analysis of the connectivity with spatial information, global technology trend analysis, and trend analysis and prediction in the spatial information field were conducted using WordNet applied to keywords from patent information. This paper proposes a big data analysis method for predicting trends and future developments through analysis of the connection between the autonomous driving field and spatial information. As future global trends of spatial information in autonomous driving, platform alliances, business partnerships, mergers and acquisitions, joint venture establishment, standardization, and technology development were derived through big data analysis.

Social Network Analysis on Research Keywords of Child-Occupation Studies (아동의 작업 연구주제어의 사회연결망 분석)

  • Ha, Seong-Kyu;Park, Kang-Hyun
    • Therapeutic Science for Rehabilitation / v.12 no.4 / pp.39-51 / 2023
  • Objective : This study seeks to unveil the intellectual framework of research surrounding children's occupations by utilizing social network analysis of keywords from studies focused on childhood. Methods : From August 2003 to August 2023, we analyzed 3,364 keywords extracted from 270 research articles in the Korean Citation Index with the keyword "Child and Occupation" using the NetMiner program. Results : Research on children's work has increased quantitatively over the past decade. Keywords exhibiting a high degree of centrality in the realm of child occupation research included Task (0.055), Group therapy (0.040), Working memory (0.037), Intervention (0.033), Performance (0.030), Language (0.026), Ability (0.026), Skill (0.024), and Program (0.023). Notably, the weighted terms in the Word Network included Evaluation-Tool (30), School-Student (15), and Activity-Participation (15). The primary keywords from each topic in topic modeling were Activity (0.295), Disability (0.604), Education (0.356), Skill (0.478), School (0.317), Function (0.462), Disorder (0.324), Language (0.310), Comprehension (0.412), and Training (0.511). Conclusion : This study describes the trends in the domestic field of pediatric occupational research. These efforts provided valuable insights into pediatric occupational therapy in South Korea.
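The degree centrality scores reported in the abstract above come from NetMiner. As a rough illustration of the measure itself, the sketch below builds a keyword co-occurrence network and computes normalized degree centrality (number of distinct neighbors divided by the number of other nodes). The keyword lists are hypothetical, and the normalization is an assumption that may differ from the one NetMiner applies.

```python
from collections import defaultdict
from itertools import combinations

def degree_centrality(keyword_lists):
    """Normalized degree centrality in a keyword co-occurrence network.

    Two keywords are linked when they appear in the same article.
    """
    neighbors = defaultdict(set)
    for keywords in keyword_lists:
        for first, second in combinations(sorted(set(keywords)), 2):
            neighbors[first].add(second)
            neighbors[second].add(first)
    node_count = len(neighbors)
    return {
        keyword: len(linked) / (node_count - 1)
        for keyword, linked in neighbors.items()
    }

# Hypothetical keyword lists, one per article (assumption).
articles = [
    ["task", "intervention"],
    ["task", "skill"],
    ["task", "language", "skill"],
]

centrality = degree_centrality(articles)
```

Keywords that co-occur with many different terms, such as "task" in this toy data, receive the highest scores.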

A Categorization Scheme of Tag-based Folksonomy Images for Efficient Image Retrieval (효과적인 이미지 검색을 위한 태그 기반의 폭소노미 이미지 카테고리화 기법)

  • Ha, Eunji;Kim, Yongsung;Hwang, Eenjun
    • KIISE Transactions on Computing Practices / v.22 no.6 / pp.290-295 / 2016
  • Recently, folksonomy-based image-sharing sites, where users cooperatively create and use tags to annotate images, have been gaining popularity. Typically, these sites retrieve images for a user request using simple text-based matching and display the retrieved images as a photo stream. However, these tags are personal and subjective, and the images are not categorized, which results in poor retrieval accuracy and low user satisfaction. In this paper, we propose a categorization scheme for folksonomy images that can improve retrieval accuracy in tag-based image retrieval systems. In this scheme, images are classified by semantic similarity using the text and image information generated on the folksonomy. To evaluate the performance of the proposed scheme, we collect folksonomy images and categorize them using text features and image features. We then compare its retrieval accuracy with that of existing systems.

A Study of Consumer Perception on Fashion Show Using Big Data Analysis (빅데이터를 활용한 패션쇼에 대한 소비자 인식 연구)

  • Kim, Da Jeong;Lee, Seunghee
    • Journal of Fashion Business / v.23 no.3 / pp.85-100 / 2019
  • This study examines changes in consumer perceptions of fashion shows, which are critical elements in the apparel industry and a means to represent a brand's image and originality. For this purpose, big data analysis as used in clothing marketing, text mining, and semantic network analysis techniques were applied. This study aims to verify the effectiveness and significance of fashion shows in an effort to give directions for their future utilization. The study was conducted in two major stages. First, data were collected with the keyword "fashion shows" across websites including Naver and Daum between 2015 and 2018. The data collection period was divided into first-half and second-half periods. Next, Textom 3.0 was used for data refinement, text mining, and word clouds. Ucinet 6.0 and NetDraw were used for semantic network analysis, degree centrality, CONCOR analysis, and visualization. The level of interest in "models" was found to be the highest among the perception factors related to fashion shows in both periods. In the first-half period, consumer interest focused on detailed visual stimulants such as models and clothing, while in the second-half period, perceptions changed as the value of designers and brands was increasingly recognized over time. The findings of this study can be used as a tool to evaluate fashion shows, apparel industry sectors, and marketing methods. They can also serve as a theoretical framework for big data analysis and as a basis for strategies and research in industrial development.

Analysis on Domestic Franchise Food Tech Interest by using Big Data

  • Hyun Seok Kim;Yang-Ja Bae;Munyeong Yun;Gi-Hwan Ryu
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.179-184 / 2024
  • Franchising is now a red ocean in the food industry, and franchises need other ways to make their products appealing, such as the emerging field of food tech. Franchises are working on R&D to help franchisees with operations. In this paper, we analyze franchise interest in food tech to help establish the need for development for franchisees who need a helping hand, not from people but from technology. Using Textom, a big data analysis tool, "franchise" and "food tech" were selected as keywords, search frequency information from Naver and Daum was collected for one year, from 1 January 2023 to 31 December 2023, and data preprocessing was conducted on that basis. For the suitability of the study and more accurate data, data unrelated to "food tech" were removed through a refining process, and similar keywords were grouped into a single keyword for analysis. The word refining process yielded a total of 10,049 words, from which the top 50 keywords with the highest relevance and search frequency were selected for this study. These top 50 keywords were subjected to TF-IDF analysis, visualization using the Ucinet6 and NetDraw programs, network analysis between keywords, and cluster analysis through CONCOR analysis. The big data analysis showed that franchises are indeed interested in food tech. The keywords "technology", "franchise", and "robots" attracted much interest, and the keyword "R&D" showed that franchises are keen on developing food tech to gain competitiveness in the franchise industry.
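The TF-IDF analysis mentioned in the abstract above was run in Textom, whose exact weighting is not stated. The sketch below shows one common textbook variant (term frequency as a share of the document length, inverse document frequency as log(N / df)) on hypothetical token lists; the formula choice and the data are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document using tf * log(N / df)."""
    document_count = len(documents)
    document_frequency = Counter()
    for tokens in documents:
        document_frequency.update(set(tokens))
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens))
            * math.log(document_count / document_frequency[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical tokenized documents (assumption).
documents = [
    ["franchise", "robot", "robot"],
    ["franchise", "technology"],
]

weights = tf_idf(documents)
```

A term that appears in every document, such as "franchise" here, gets a weight of zero under this variant, while terms concentrated in a few documents are weighted up.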

Detecting Weak Signals for Carbon Neutrality Technology using Text Mining of Web News (탄소중립 기술의 미래신호 탐색연구: 국내 뉴스 기사 텍스트데이터를 중심으로)

  • Jisong Jeong;Seungkook Roh
    • Journal of Industrial Convergence / v.21 no.5 / pp.1-13 / 2023
  • Carbon neutrality is the concept of reducing greenhouse gases emitted by human activities and bringing actual emissions to zero by removing the remaining gases. It is also called "Net-Zero" and "carbon zero". Korea has declared a "2050 Carbon Neutrality policy" to cope with the climate change crisis, and various carbon reduction legislative processes are underway. Since carbon neutrality requires changes in industrial technology, it is important to prepare a system for carbon zero. This paper aims to understand the status and trends of global carbon neutrality technology. To that end, the ROK web platform "www.naver.com" was selected as the data collection scope, and Korean online articles related to carbon neutrality were collected. Carbon neutrality technology trends were analyzed using future signal methodology and the Word2Vec algorithm, a neural network deep learning technology. The results indicate that technology advancement is required in the steel and petrochemical sectors, which are high-carbon-emitting industries. Investment feasibility and technology advancement in the electric vehicle sector were on the rise. Government support for carbon neutrality and the creation of global technology infrastructure appear to be needed. In addition, cultivating human resources is urgent, and the analysis confirmed the need to prepare support policies for carbon neutrality.