• Title/Summary/Keyword: vocabulary database

O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information

  • Soares, Felipe;Tateisi, Yuka;Takatsuki, Terue;Yamaguchi, Atsuko
    • Genomics & Informatics / v.19 no.3 / pp.26.1-26.3 / 2021
  • Previous approaches to creating a controlled vocabulary for Japanese have relied on existing bilingual dictionaries and transformation rules to allow such mappings. However, given the new terms that may have been introduced due to coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. In this work, we propose creating a bilingual English-Japanese controlled vocabulary based on MeSH terms assigned to COVID-19 related publications. To this end, we combined manual curation of several bilingual dictionaries with a computational approach based on machine translation of sentences containing such terms and ranking of the possible translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach presented an average accuracy of 63.33% for all terms and 84.51% for drugs and chemicals.
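
The ranking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pointwise mutual information (PMI) as the mutual-information score, assumes the machine-translated sentences are already tokenized, and uses a hypothetical function name (`rank_by_pmi`) and invented toy sentence pairs.

```python
import math
from collections import Counter

def rank_by_pmi(pairs, term):
    """Rank target-side tokens by PMI with `term`.

    pairs: list of (source_sentence, target_tokens) tuples, where
    target_tokens is the tokenized machine translation (assumed given).
    """
    n = len(pairs)
    term_count = 0
    token_count = Counter()  # sentences containing each target token
    joint_count = Counter()  # sentences containing both term and token
    for source, target_tokens in pairs:
        has_term = term in source.lower()  # naive substring match
        term_count += has_term
        for tok in set(target_tokens):
            token_count[tok] += 1
            if has_term:
                joint_count[tok] += 1
    p_term = term_count / n
    scores = {}
    for tok, joint in joint_count.items():
        p_joint = joint / n
        p_tok = token_count[tok] / n
        scores[tok] = math.log2(p_joint / (p_term * p_tok))
    # Highest PMI first; ties broken by token for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data: English sentences with tokenized translations.
pairs = [
    ("fever is common", ["発熱", "は", "一般的"]),
    ("fever persists", ["発熱", "は", "続く"]),
    ("cough is common", ["咳", "は", "一般的"]),
    ("cough persists", ["咳", "は", "続く"]),
]
ranked = rank_by_pmi(pairs, "fever")
```

In this toy data the token that occurs only alongside "fever" ranks first, while tokens that appear in every sentence score zero, which is the behavior that makes a mutual-information score useful for picking a term's translation out of a full sentence.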

The Development of a System for Product Search Using a Sensibility and Configuration Database on Designing Men's Jackets (신사복 재킷디자인의 감성 및 형상 데이터베이스를 이용한 제품검색 시스템 개발에 관한 연구)

  • Park, Yun-A
    • Journal of the Korean Home Economics Association / v.44 no.4 s.218 / pp.133-144 / 2006
  • The contemporary period is called "the age of sensibility," in which each individual consumer seeks products of her or his own. Businesses need design development that emphasizes customer sensitivity, and at the same time consumers must understand their own sensitivity to acquire information on designs that suit them. This research established a sensitivity and configuration database for men's jacket design by applying the sensitivity engineering approach to clothing design information. The user interface was created on the Internet. Sixty-seven sensitivity vocabulary terms appropriate for assessing men's jacket design were selected, and the different designs were classified into six items and 24 categories. Thirty men's jackets with different designs were produced for sensory testing, and the results were analyzed in accordance with general linear I statistics. A sensitivity database was established for each category. MySQL, PHP, JavaScript, and HTML were used for the configuration database work. The configuration of items/categories, with the most appropriate sensitivity database information assigned to the selected sensitivity vocabulary, was programmed for display on the computer screen. When the program runs, the customer selects sensitivity vocabulary for each factor, and the category and product configuration of the men's jacket best matching the search are displayed through the user interface.

News Article Identification Methods in Natural Language Processing on Artificial Intelligence & Bigdata

  • Kang, Jangmook;Lee, Sangwon
    • International Journal of Advanced Culture Technology / v.9 no.3 / pp.345-351 / 2021
  • This study aims to determine how to identify misleading news articles using natural language processing in the context of Artificial Intelligence & Bigdata. A misleading news identification system and method based on natural language processing are presented according to an embodiment of this study. The system monitors Internet news articles, collects misleading news articles, extracts misleading vocabulary from the titles of the collected articles, and stores it in the misleading vocabulary database. Because only relatively short news titles are morphologically analyzed, the system and methods do not take much time to reach a judgment, and the use of a misleading vocabulary database is effective in identifying misleading articles that attract readers with exaggerated or suggestive phrases. To this end, we propose news article identification methods in natural language processing on Artificial Intelligence & Bigdata.

An experiment to enhance subject access in Korean online public access catalog (온라인 열람목록의 주제탐색 강화를 위한 실험적 연구)

  • 장혜란;홍지윤
    • Journal of Korean Library and Information Science Society / v.25 / pp.83-107 / 1996
  • The purpose of this study is to experiment with online public access catalog enhancements to improve subject access capability. Three catalog databases were implemented, enhanced respectively with title keywords, controlled vocabulary, and table of contents words combined with controlled vocabulary. Eighteen searchers performed 2 subject searches against the 3 catalog databases, and the transaction logs were analyzed. The results can be summarized as follows: the controlled vocabulary catalog database achieved an average recall ratio of 41.8%; adding table of contents words to the controlled vocabulary is an effective technique, increasing the recall ratio up to 55% without decreasing precision; and the database enhanced with title keywords showed an average recall ratio of 31.7%. Of the three kinds of catalog databases, only the catalog with contents words produced 2 unique relevant documents. The results indicate that both user training and system development are required for better search performance in online public access catalogs.
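
For readers unfamiliar with the recall and precision ratios reported in this abstract, the following minimal sketch shows the standard definitions. The function name and the document identifiers are hypothetical and are not taken from the study.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical search: 4 relevant records exist, 3 were retrieved,
# and 2 of the retrieved records are relevant.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = ["d1", "d2", "d5"]
recall, precision = recall_precision(retrieved, relevant)
```

Under these definitions, adding more access points (such as table of contents words) can raise recall by retrieving more of the relevant records, and precision is unaffected as long as the extra retrieved records are relevant in the same proportion.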

Efficient Vocabulary Optimization Management using VCOR (VCOR를 이용한 효율적인 어휘 최적화 관리)

  • Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.10 / pp.1436-1443 / 2010
  • Vocabulary recognition systems have the drawback that, when processing vocabulary with unseen triphones, the confidence measure cannot be normalized and its distribution cannot be obtained. To address this problem, this paper proposes the VCOR (Version Control for Out-of Rejection) system, which applies vocabulary management optimization and phone data search support to an out-of-vocabulary rejection algorithm. The VCOR system provides users with vocabulary information efficiently by using an extended facet classification, improving the vocabulary measure management function and the accuracy of vocabulary recognition. In performance tests, the proposed system showed a vocabulary-dependent recognition rate of 97.56% and a vocabulary-independent recognition rate of 96.23%.

Vocabulary Likelihood rate Process support for Recognition rate Improvement of Vocabulary Recognition System (어휘 인식 시스템의 인식률 향상을 위한 어휘 유사율 처리 지원)

  • Kim, Kyuho;Oh, Sang Yeob
    • Journal of Digital Convergence / v.10 no.11 / pp.359-363 / 2012
  • In vocabulary recognition models, the system has problems in that vocabulary is not recognized or similar vocabulary is recognized instead, because the system extracts vocabulary features from inaccurate vocabulary. To solve these problems, this paper proposes the modeling and implementation of an efficient configuration thread support system, which processes configuration thread information and applies the facet method to database retrieval to optimize the vocabulary likelihood rate. The proposed system showed a vocabulary-dependent recognition rate of 95.31% and a vocabulary-independent recognition rate of 97.38%.

Performance of Vocabulary-Independent Speech Recognizers with Speaker Adaptation

  • Kwon, Oh Wook;Un, Chong Kwan;Kim, Hoi Rin
    • The Journal of the Acoustical Society of Korea / v.16 no.1E / pp.57-63 / 1997
  • In this paper, we investigated the performance of a vocabulary-independent speech recognizer with speaker adaptation. The vocabulary-independent speech recognizer does not require task-oriented speech databases to estimate HMM parameters, but adapts the parameters recursively by using input speech and recognition results. The recognizer has the advantage that it reduces the effort of recording speech databases and can be easily adapted to a new task and a new speaker with a different recognition vocabulary without losing recognition accuracy. Experimental results showed that the vocabulary-independent speech recognizer with supervised offline speaker adaptation reduced recognition errors by 40% when 80 words from the same vocabulary as the test data were used as adaptation data. The recognizer with unsupervised online speaker adaptation reduced recognition errors by about 43%. This performance is comparable to that of a speaker-independent speech recognizer trained on a task-oriented speech database.

Efficient context dependent process modeling using state tying and decision tree-based method (상태 공유와 결정트리 방법을 이용한 효율적인 문맥 종속 프로세스 모델링)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.3 / pp.369-377 / 2010
  • In vocabulary recognition systems based on HMMs (Hidden Markov Models), models unseen during training lead to a low recognition rate. When the recognition vocabulary is modified or extended, the database must be collected again and the models rebuilt and retrained, which incurs additional cost and time. This study proposes an efficient context-dependent process modeling method using decision tree-based state tying. The proposed method reduces model rebuilding and offers robust and accurate context-dependent acoustic modeling. It also reduces the number of models and handles models unseen during training by using a likely context-dependent phoneme model in their place. In performance tests, the system showed a vocabulary-dependent recognition rate of 98.01% and a vocabulary-independent recognition rate of 97.38%.

The study analyzed a diachronic distribution, social meanings and social evaluations of ONNA : 'Headline Database of Newspaper Articles' by KOKKEN were used as research data. (「여(女)」 관련 어휘의 사용실태 - 国研「ことばに関する新聞記事見出しデータベース」를 분석대상으로)

  • Oh, Mi sun
    • Cross-Cultural Studies / v.29 / pp.341-366 / 2012
  • 'Headline Database of Newspaper Articles' is a database containing about 141,500 newspaper articles from 1949 to March 2009, collected by KOKKEN from two perspectives: 'language' and 'language life'. Of these, 3312 newspaper articles (about 2.34%) included the word ONNA. The number of newspaper articles related to ONNA started to increase in 1975 but decreased afterwards. It increased rapidly in 1980 and remained at that level. However, it started to decrease rapidly in 1990 and remained at the lower level. It increased rapidly again in 2004 and 2007. The main causes of the rapid increases were the commercial message for instant noodles "I am the one who is making. I am the one who is eating." in 1975, newspaper articles related to "Starting of full-scale studies on female language" in 1980, comments on "active women" and "men's crime" related to a murder case of an elementary school student in Sasebo City and mixed attendance books in 2004, and the comment "Women are machines which give birth to babies" in 2007. Those six causes of rapid increase suggested that the perception of gender, such as 'Men need to work outside and women need to do housework and take care of children', which had been fixed until then, was changing and becoming a stereotype of virtual reality rather than reality. The vocabulary related to ONNA appeared 3411 times among the 3312 newspaper articles which included ONNA. Typical forms of the vocabulary related to ONNA were and . They appeared 2390 times and accounted for 70% of the whole data (3411 times). The form ONNANOKO appeared 113 times, a high rate among the vocabulary related to ONNA. ONNANOKO(113) and other words such as SHOJO(115), JOJI(28), YOJO(9) (152 in total) implied that the appearance of young women in newspaper articles was increasing. Also, the vocabulary related to 'female language', such as ONNAKOTOBA(28) and ONNANOKOTOBA(10), and to a woman's heart, such as ONNAGOKORO(35) and ONNANOKIMOCHI(34), appeared frequently. The vocabulary related to JOSEI was divided into <$JOSEI^{**}$> and <$^{**}JOSEI$>. <$JOSEI^{**}$> was mainly related to occupations. <$^{**}JOSEI$> was mainly used to express women by regional groups such as or combined with modifiers to express women such as . Among the forms with modifiers, WAKAIJOSEI appeared 35 times and showed the highest frequency. It carried negative evaluations in many cases. The vocabulary related to JOSI appeared in the form of <$JOSI^{**}$> and was mainly associated with 'a girl's school' and 'a female student'.

Vocabulary Generation Method by Optical Character Recognition (광학 문자 인식을 통한 단어 정리 방법)

  • Kim, Nam-Gyu;Kim, Dong-Eon;Kim, Seong-Woo;Kwon, Soon-Kak
    • Journal of Korea Multimedia Society / v.18 no.8 / pp.943-949 / 2015
  • A reader usually spends a lot of time looking up the meanings of unknown words in a dictionary, on the internet, or in smart applications. In this paper, we propose a method to compensate for this drawback. The proposed method builds a vocabulary by recognizing a word or group of words captured by a smart phone camera. Its features include organizing and editing the words captured by the smart phone, searching the dictionary data using the bisection method, listening to pronunciation through a speech synthesizer, and building and editing the vocabulary stored in a database. A smart phone application for organizing English words was developed. The proposed method significantly reduces the time needed to organize unknown English words and increases English learning efficiency.
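
The dictionary search by bisection mentioned in this abstract can be sketched as follows. This is an illustrative assumption about how such a lookup might work, not the paper's code: it assumes the headwords are kept in a sorted list, and the function name and the sample entries are hypothetical.

```python
from bisect import bisect_left

def lookup(headwords, meanings, word):
    """Find `word` in a sorted headword list by bisection (binary search).

    headwords: sorted list of dictionary headwords (assumed lowercase)
    meanings:  list of meanings, aligned index-by-index with headwords
    Returns the meaning, or None when the word is not in the dictionary.
    """
    i = bisect_left(headwords, word)
    if i < len(headwords) and headwords[i] == word:
        return meanings[i]
    return None

# Hypothetical dictionary data; a recognized word would come from OCR.
headwords = ["apple", "book", "camera", "word"]
meanings = [
    "a round fruit",
    "a set of printed pages",
    "a device for taking photographs",
    "a unit of language",
]
found = lookup(headwords, meanings, "camera")
missing = lookup(headwords, meanings, "zebra")
```

Bisection needs only about log2(n) comparisons per lookup, so even a dictionary with hundreds of thousands of headwords can be searched in around twenty steps, which suits a smart phone application.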