• Title/Summary/Keyword: vocabulary database

Search Result 53, Processing Time 0.024 seconds

O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information

  • Soares, Felipe;Tateisi, Yuka;Takatsuki, Terue;Yamaguchi, Atsuko
    • Genomics & Informatics
    • /
    • Vol. 19, No. 3
    • /
    • pp.26.1-26.3
    • /
    • 2021
  • Previous approaches to creating a controlled vocabulary for Japanese have relied on existing bilingual dictionaries and transformation rules to enable such mappings. However, given the new terms that may have been introduced by coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. In this work, we propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19-related publications. To this end, we combined manual curation of several bilingual dictionaries with a computational approach based on machine translation of sentences containing such terms and ranking of the possible translations of individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach achieved an average accuracy of 63.33% for all terms and 84.51% for drugs and chemicals.
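
The ranking step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes pointwise mutual information (PMI) computed from sentence-level co-occurrence in machine-translated sentence pairs; the function name `rank_translations` and the input layout are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def rank_translations(pairs, term):
    """Rank target-language tokens by PMI with a source-language term.

    pairs: list of (source_tokens, target_tokens) sentence pairs
           (assumed here to come from machine translation).
    Returns a list of (token, pmi) sorted from highest to lowest PMI.
    """
    n = len(pairs)
    term_count = 0            # sentences whose source side contains the term
    target_counts = Counter() # sentences containing each target token
    joint_counts = Counter()  # sentences containing both term and target token

    for src, tgt in pairs:
        tgt_set = set(tgt)
        target_counts.update(tgt_set)
        if term in src:
            term_count += 1
            joint_counts.update(tgt_set)

    if term_count == 0:
        return []

    p_term = term_count / n
    scores = {}
    for tok, joint in joint_counts.items():
        p_joint = joint / n
        p_tok = target_counts[tok] / n
        scores[tok] = math.log2(p_joint / (p_term * p_tok))

    # Sort by descending PMI; break ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch, a candidate that appears almost only in sentences containing the term scores highest, while tokens that are frequent everywhere are pushed down.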

신사복 재킷디자인의 감성 및 형상 데이터베이스를 이용한 제품검색 시스템 개발에 관한 연구 (The Development of a System for Product Search Using a Sensibility and Configuration Database on Designing Men's Jackets)

  • 박윤아
    • 대한가정학회지
    • /
    • Vol. 44, No. 4
    • /
    • pp.133-144
    • /
    • 2006
  • The contemporary period is called "the age of sensibility", in which each individual consumer seeks products of her or his own. Businesses need design development that emphasizes customer sensitivity, and at the same time consumers must understand their own sensitivity to acquire information on designs that suit them. This research established a sensitivity and configuration database for men's jacket design by applying the sensitivity engineering approach to clothing design information. The user interface was created on the Internet. Sixty-seven sensitivity vocabulary terms appropriate for assessing men's jacket design were selected, and the different designs were classified into six items and 24 categories. Thirty men's jackets with different designs were produced for sensory testing, and the results were analyzed in accordance with general linear I statistics. A sensitivity database was established for each category. MySQL, PHP, JavaScript, and HTML were used for the configuration database work. The configuration of items and categories, with the most appropriate sensitivity database information assigned to the selected sensitivity vocabulary, was programmed for display on the computer screen. The program runs on the sensitivity vocabulary that a customer chooses for each factor, and the category and product configuration of the men's jacket most suitable for the search is displayed through the user interface.

News Article Identification Methods in Natural Language Processing on Artificial Intelligence & Bigdata

  • Kang, Jangmook;Lee, Sangwon
    • International Journal of Advanced Culture Technology
    • /
    • Vol. 9, No. 3
    • /
    • pp.345-351
    • /
    • 2021
  • This study aims to determine how to identify misleading news articles based on natural language processing on Artificial Intelligence & Bigdata. A system and method for discriminating misleading news with natural language processing are presented according to an embodiment of this study. The natural language processing-based misleading news identification system monitors Internet news articles, collects misleading news articles, extracts misleading vocabulary from the titles of the collected articles, and stores it in the misleading vocabulary database. Because only relatively short news titles are morphologically analyzed, the system and method take little time to reach a judgment, and the misleading vocabulary database helps identify misleading articles that attract readers with exaggerated or suggestive phrases. To this end, we propose news article identification methods based on natural language processing on Artificial Intelligence & Bigdata.

온라인 열람목록의 주제탐색 강화를 위한 실험적 연구 (An experiment to enhance subject access in korean online public access catalog)

  • 장혜란;홍지윤
    • 한국도서관정보학회지
    • /
    • Vol. 25
    • /
    • pp.83-107
    • /
    • 1996
  • The purpose of this study is to experiment with online public access catalog enhancements that improve subject access capability. Three catalog databases were implemented, enhanced respectively with title keywords, controlled vocabulary, and table of contents words combined with controlled vocabulary. Eighteen searchers each performed 2 subject searches against the 3 catalog databases, and the transaction logs were analyzed. The results can be summarized as follows: the controlled vocabulary catalog database achieved an average recall ratio of 41.8%; adding table of contents words to the controlled vocabulary is an effective technique that raises the recall ratio up to 55% without decreasing precision; and the database enhanced with title keywords showed an average recall ratio of 31.7%. Of the three kinds of catalog databases, only the catalog with contents words produced 2 unique relevant documents. The results indicate that both user training and system development are required for better search performance in online public access catalogs.
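
For readers unfamiliar with the two measures reported in this abstract, the following is a minimal sketch of how recall and precision are conventionally computed for a single search. The function name and the document identifiers in the usage note are illustrative assumptions, not data from the study.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: identifiers of the documents the catalog returned.
    relevant:  identifiers of the documents judged relevant to the topic.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```

For example, a search that returns four documents, two of which are among five relevant ones, has a recall of 0.4 and a precision of 0.5.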

VCOR를 이용한 효율적인 어휘 최적화 관리 (Efficient Vocabulary Optimization Management using VCOR)

  • 오상엽
    • 한국멀티미디어학회논문지
    • /
    • Vol. 13, No. 10
    • /
    • pp.1436-1443
    • /
    • 2010
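  • Vocabulary recognition systems have the drawback that unseen triphones exist, that is, triphones for which the processed vocabulary does not appear, and because no confidence distribution is available for them, normalization cannot be performed. To address this, we propose the VCOR system, which optimizes the vocabulary management used in the out-of-vocabulary rejection algorithm and supports data search at the phoneme level. To provide vocabulary information efficiently, VCOR also uses an extended facet classification to give users word-level information and offers improved tracking and management of vocabulary, which improves recognition accuracy. Applying the proposed system yielded a vocabulary-dependent recognition rate of 97.56% and a vocabulary-independent recognition rate of 96.23%.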

어휘 인식 시스템의 인식률 향상을 위한 어휘 유사율 처리 지원 (Vocabulary Likelihood rate Process support for Recognition rate Improvement of Vocabulary Recognition System)

  • 김규호;오상엽
    • 디지털융복합연구
    • /
    • Vol. 10, No. 11
    • /
    • pp.359-363
    • /
    • 2012
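  • Because vocabulary recognition models extract features from inaccurate vocabulary, a word may be recognized as a similar word rather than the actual word, or may not be recognized at all. To address this, we modeled and implemented a system that supports efficient configuration formation, and we applied the facet method to database search in order to process configuration formation information efficiently and to optimize the management of vocabulary likelihood rates. Applying the proposed system yielded a vocabulary-dependent recognition rate of 95.31% and a vocabulary-independent recognition rate of 97.38%.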

Performance of Vocabulary-Independent Speech Recognizers with Speaker Adaptation

  • Kwon, Oh Wook;Un, Chong Kwan;Kim, Hoi Rin
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 16, No. 1E
    • /
    • pp.57-63
    • /
    • 1997
  • In this paper, we investigated the performance of a vocabulary-independent speech recognizer with speaker adaptation. The vocabulary-independent speech recognizer does not require task-oriented speech databases to estimate HMM parameters, but adapts the parameters recursively by using input speech and recognition results. The recognizer has the advantage that it reduces the effort of recording speech databases and can be easily adapted to a new task and a new speaker with a different recognition vocabulary without losing recognition accuracy. Experimental results showed that the vocabulary-independent speech recognizer with supervised offline speaker adaptation reduced recognition errors by 40% when 80 words from the same vocabulary as the test data were used as adaptation data. The recognizer with unsupervised online speaker adaptation reduced recognition errors by about 43%. This performance is comparable to that of a speaker-independent speech recognizer trained on a task-oriented speech database.

상태 공유와 결정트리 방법을 이용한 효율적인 문맥 종속 프로세스 모델링 (Efficient context dependent process modeling using state tying and decision tree-based method)

  • 안찬식;오상엽
    • 한국멀티미디어학회논문지
    • /
    • Vol. 13, No. 3
    • /
    • pp.369-377
    • /
    • 2010
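  • In vocabulary recognition systems that use HMMs (Hidden Markov Models), models that did not appear during training lower the recognition rate, and whenever the target vocabulary is changed or extended, the models must be regenerated by collecting a database and repeating the training process, which costs additional time and money. This paper proposes an efficient context-dependent process modeling method that uses a decision tree method and a model sharing method. The proposed method applies model sharing to the generated models to reduce model regeneration and provides robust and accurate context-dependent acoustic modeling. It also reduces the number of models and supplies context-dependent similar-phoneme models for models that did not appear during training, which resolves the unseen-model problem and secures trainability. In vocabulary-dependent and vocabulary-independent recognition experiments on six speech databases, the proposed method achieved 98.01% in the vocabulary-dependent experiment and 97.38% in the vocabulary-independent experiment.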

「여(女)」 관련 어휘의 사용실태 - 国研「ことばに関する新聞記事見出しデータベース」를 분석대상으로 (The study analyzed a diachronic distribution, social meanings and social evaluations of ONNA : 'Headline Database of Newspaper Articles' by KOKKEN were used as research data.)

  • 오미선
    • 비교문화연구
    • /
    • Vol. 29
    • /
    • pp.341-366
    • /
    • 2012
  • 'Headline Database of Newspaper Articles' is a database that contains about 141,500 newspaper articles from 1949 to March 2009, collected by KOKKEN from two perspectives, 'language' and 'language life'. In this database, 3312 newspaper articles (about 2.34%) included the word ONNA. The number of newspaper articles related to ONNA started to increase in 1975 but decreased afterwards. It increased rapidly in 1980 and stayed at that level, then started to decrease rapidly in 1990 and stayed low. It increased rapidly again in 2004 and 2007. The main causes of the rapid increases were the commercial message of instant noodles "I am the one who is making. I am the one who is eating." in 1975, newspaper articles related to "Starting of full-scale studies on female language" in 1980, comments of "active women" and "men's crime" related to a murder case of an elementary school student in Sasebo City and mixed attendance books in 2004, and a comment of "Women are machines which give birth to babies" in 2007. Those six causes of rapid increase suggest that the previously fixed perception of gender, such as 'Men need to work outside and women need to do housework and take care of children', was changing and becoming a stereotype of virtual reality rather than reality. The vocabulary related to ONNA appeared 3411 times among the 3312 newspaper articles that included ONNA. Typical forms of the vocabulary related to ONNA were and . They appeared 2390 times and accounted for 70% of the whole data (3411 occurrences). The form ONNANOKO appeared 113 times, a high rate among the vocabulary related to ONNA. ONNANOKO(113) and other words such as SHOJO(115), JOJI(28), YOJO(9) (152 in total) implied that appearances of young women in newspaper articles were increasing.
Also, the vocabulary related to 'female language', such as ONNAKOTOBA(28) and ONNANOKOTOBA(10), and to a woman's heart, such as ONNAGOKORO(35) and ONNANOKIMOCHI(34), appeared frequently. The vocabulary related to JOSEI was divided into <JOSEI**> and <**JOSEI>. <JOSEI**> forms were mainly related to occupations. <**JOSEI> forms were mainly used to express women by regional groups such as or combined with modifiers to express women such as . Among the forms with modifiers, WAKAIJOSEI appeared 35 times and showed the highest frequency. It carried negative evaluations in many cases. The vocabulary related to JOSI appeared in the form <JOSI**> and was mainly associated with 'a girl's school' and 'a female student'.

광학 문자 인식을 통한 단어 정리 방법 (Vocabulary Generation Method by Optical Character Recognition)

  • 김남규;김동언;김성우;권순각
    • 한국멀티미디어학회논문지
    • /
    • Vol. 18, No. 8
    • /
    • pp.943-949
    • /
    • 2015
  • Readers usually spend a lot of time looking up the meanings of unknown words in a dictionary, on the Internet, or in smartphone applications. In this paper, we propose a method to reduce this drawback. The proposed method builds a vocabulary list by recognizing a word or group of words captured by a smartphone camera. Its features include organizing and editing the captured words, searching the dictionary data using the bisection method, listening to pronunciations through a speech synthesizer, and building and editing the vocabulary stored in a database. A smartphone application for organizing English words was implemented. The proposed method significantly reduces the time needed to organize unknown English words and increases English learning efficiency.
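
The bisection search over dictionary data mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the dictionary is held as two parallel lists, with lowercase headwords sorted alphabetically, and the function name `lookup` and the sample entries are hypothetical rather than taken from the paper.

```python
from bisect import bisect_left


def lookup(headwords, meanings, word):
    """Find the meaning of `word` by bisection (binary search).

    headwords: lowercase headwords sorted in ascending order (assumption).
    meanings:  meanings[i] is the entry for headwords[i].
    Returns the meaning, or None if the word is not in the dictionary.
    """
    key = word.lower()
    i = bisect_left(headwords, key)  # leftmost position where key could go
    if i < len(headwords) and headwords[i] == key:
        return meanings[i]
    return None
```

Because each step halves the remaining range, a lookup over n sorted headwords needs about log2(n) comparisons, which suits dictionary data stored on a phone.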