• Title/Summary/Keyword: Word alignment

Domain-Specific Terminology Mapping Methodology Using Supervised Autoencoders (지도학습 오토인코더를 이용한 전문어의 범용어 공간 매핑 방법론)

  • Byung Ho Yoon;Junwoo Kim;Namgyu Kim
    • Information Systems Review, v.25 no.1, pp.93-110, 2023
  • Recently, attempts have been made to convert unstructured text into vectors and to analyze vast amounts of natural language for various purposes. In particular, the demand for analyzing texts in specialized domains is rapidly increasing, and studies are therefore being conducted to analyze specialized and general-purpose documents simultaneously. To analyze domain-specific terms together with general terms, the embedding space of the specific terms must be aligned with the embedding space of the general terms. So far, attempts have been made to align the embeddings of specific terms with the embedding space of general terms through a transformation matrix or mapping function. However, a linear transformation based on a transformation matrix is limited in that it works well only within a local range. To overcome this limitation, various nonlinear vector alignment methods have recently been proposed. We propose a vector alignment model that matches the embedding space of specific terms to the embedding space of general terms through end-to-end learning that trains an autoencoder and a regression model simultaneously. In experiments with R&D documents in the "Healthcare" field, we confirmed that the proposed methodology achieved higher accuracy than the traditional model.
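
The abstract above does not give the training objective, so the following is only a sketch of how a jointly trained autoencoder and regression model is commonly written. All symbols are assumptions introduced here: x_i is the embedding of a domain-specific term, y_i its embedding in the general-term space, f the encoder, g the decoder, h the regression head, and λ a hypothetical weight balancing the two terms.

```latex
\mathcal{L}(f, g, h)
  = \frac{1}{N} \sum_{i=1}^{N}
    \Big(
      \underbrace{\lVert x_i - g(f(x_i)) \rVert_2^2}_{\text{reconstruction (autoencoder)}}
      + \lambda \,
      \underbrace{\lVert y_i - h(f(x_i)) \rVert_2^2}_{\text{alignment (regression)}}
    \Big)
```

Minimizing both terms at once is what makes the learning end-to-end: the encoder is pushed to keep the information needed to reconstruct x_i while also producing a code from which y_i can be predicted.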

Enhancing Performance of Bilingual Lexicon Extraction through Refinement of Pivot-Context Vectors (중간언어 문맥벡터의 정제를 통한 이중언어 사전 구축의 성능개선)

  • Kwon, Hong-Seok;Seo, Hyung-Won;Kim, Jae-Hoon
    • Journal of KIISE:Software and Applications, v.41 no.7, pp.492-500, 2014
  • This paper improves automatic bilingual lexicon extraction by refining pivot-context vectors within the standard pivot-based approach, which is a very effective method for low-resource language pairs. We improve performance step by step through two refinements of the pivot-context vectors: the first filters out unhelpful elements of the vectors and revises their values using bidirectional translation probabilities estimated by Anymalign, and the second removes non-noun elements from the original vectors. Experiments were conducted in both directions on two language pairs, Korean-Spanish and Korean-French. The results show that, for high-frequency words, our method achieves at least 48.5% at the top 1 and up to 88.5% at the top 20, and for low-frequency words, at least 43.3% at the top 1 and up to 48.9% at the top 20.
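
To make the pivot-based idea concrete, here is a minimal sketch of the ranking step only. It assumes that context vectors over pivot-language words have already been built for the source word and for each target candidate. The function names, the toy vectors, and the use of cosine similarity are illustrative assumptions, not the authors' implementation, and the refinement steps described in the abstract are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_translations(source_vec, target_vecs, k):
    """Rank target words by similarity of their pivot-context vectors."""
    ranked = sorted(
        target_vecs,
        key=lambda word: cosine(source_vec, target_vecs[word]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: keys are pivot-language (English) context words.
korean_jip = {"house": 0.9, "home": 0.6}
spanish_candidates = {
    "casa": {"house": 0.8, "home": 0.5},
    "hogar": {"home": 0.9, "house": 0.2},
    "perro": {"dog": 0.9},
}

best_two = top_k_translations(korean_jip, spanish_candidates, 2)
```

Top-1 and top-20 figures such as those reported above correspond to checking whether a correct translation appears among the first 1 or 20 candidates returned by a ranking of this kind.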

Implementation of the Automatic Segmentation and Labeling System (자동 음성분할 및 레이블링 시스템의 구현)

  • Sung, Jong-Mo;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea, v.16 no.5, pp.50-59, 1997
  • In this paper, we implement an automatic speech segmentation and labeling system that marks phone boundaries automatically for constructing a Korean speech database. We specify and implement the system based on conventional speech segmentation and labeling techniques, and also develop a graphical user interface (GUI) in the Hangul Motif™ environment so that users can examine the automatically aligned boundaries and refine them easily. The system is applied to speech sampled at 16 kHz, and the labeling unit set consists of 46 phoneme-like units (PLUs) and silence. The system accepts both phonetic and orthographic transcriptions as input of linguistic information. Hidden Markov models (HMMs) are employed for pattern matching. Each phoneme model is trained on a manually segmented database of 445 phonetically balanced words (PBW). To evaluate the performance of the system, we test it on another database consisting of sentence-type speech. In our experiment, 74.7% of phoneme boundaries are within 20 ms of the true boundary and 92.8% are within 40 ms.
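
The tolerance-based figures quoted above can be computed as in the following sketch. It assumes predicted and reference boundaries are already paired one-to-one and expressed in milliseconds, and that "within" is inclusive. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def boundary_accuracy(predicted_ms, reference_ms, tolerance_ms):
    """Fraction of predicted boundaries within tolerance of the reference."""
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        abs(pred - ref) <= tolerance_ms
        for pred, ref in zip(predicted_ms, reference_ms)
    )
    return hits / len(reference_ms)


# Hypothetical boundaries (ms); absolute errors are 10, 5, 35 and 0 ms.
predicted = [100, 250, 410, 600]
reference = [110, 245, 445, 600]

within_20 = boundary_accuracy(predicted, reference, 20)
within_40 = boundary_accuracy(predicted, reference, 40)
```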

The Parallel Corpus Approach to Building the Syntactic Tree Transfer Set in the English-to-Vietnamese Machine Translation

  • Dien Dinh;Ngan Thuy;Quang Xuan;Nam Chi
    • Proceedings of the IEEK Conference, summer, pp.382-386, 2004
  • Recently, following the machine learning trend, most machine translation systems around the world use two syntactic tree sets, one for each of the two languages, to learn syntactic tree transfer rules. However, for the English-Vietnamese language pair this approach is impossible, because so far there has been no Vietnamese syntactic tree set that corresponds to the English one. Building a very large corresponding Vietnamese syntactic tree set (thousands of trees) requires a great deal of work and the involvement of specialists in linguistics. To take advantage of our available English-Vietnamese Corpus (EVC), which is tagged with word alignments, we choose the SITG (Stochastic Inversion Transduction Grammar) model to construct English-Vietnamese syntactic tree sets automatically. This model is used to parse the two languages at the same time and then carry out the syntactic tree transfer. The resulting English-Vietnamese bilingual syntactic tree set is the basic training data for machine learning models that automatically transfer English syntactic trees to Vietnamese ones. We tested the syntactic analysis on over 10,000 of the 500,000 sentences in our English-Vietnamese bilingual corpus, and the first stage produced encouraging results (about 80% analyzed) [5]. We have used the TBL (Transformation-Based Learning) algorithm to carry out automatic transformations from English syntactic trees to Vietnamese ones based on that parallel syntactic tree transfer set [6].

A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments (제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue;Chung, Woo-Keun
    • Journal of KIISE:Software and Applications, v.37 no.10, pp.788-801, 2010
  • Mobile devices are advancing remarkably, so research on mobile input devices is becoming increasingly important. There are many input devices, such as keypads, QWERTY keypads, touch input, and speech recognizers, but they are not as convenient as typical keyboard-based desktop input devices, so input strings usually contain many typing errors. These input errors do not hinder communication between people, but they are a critical problem when searching databases such as dictionaries and address books, because correct results cannot be obtained. In particular, because each Hangeul character is a combination of consonants and vowels, Hangeul has more than 10,000 distinct characters, and errors are more frequent than in English. The suffix tree is generally the most widely used data structure for handling errors in queries, but it is not sufficient for the variety of errors that occur. In this paper, we propose a fast approximate Korean word searching system that tolerates a variety of typing errors. The system includes several algorithms for applying general approximate string searching to Hangeul. We also present a profanity filter built on the proposed system, which filters more than 90% of coined profanities.
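
As an illustration of why phoneme-level matching suits Hangeul, the sketch below decomposes precomposed syllables into their consonant and vowel letters (jamo) using the standard Unicode arithmetic for the U+AC00 to U+D7A3 block, then applies plain Levenshtein distance to the jamo sequences. This is a generic technique chosen for illustration; the function names are hypothetical and the paper's own algorithms are not reproduced here.

```python
# Jamo tables in Unicode order for precomposed Hangul syllables.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

SYLLABLE_BASE = 0xAC00
SYLLABLE_LAST = 0xD7A3


def to_jamo(text):
    """Expand each precomposed Hangul syllable into its jamo letters."""
    out = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_BASE <= code <= SYLLABLE_LAST:
            index = code - SYLLABLE_BASE
            lead, rest = divmod(index, len(VOWELS) * len(TAILS))
            vowel, tail = divmod(rest, len(TAILS))
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def jamo_distance(a, b):
    """Edit distance measured on jamo rather than whole syllables."""
    return edit_distance(to_jamo(a), to_jamo(b))
```

With this decomposition, a single mistyped vowel costs one edit out of the six jamo in a two-syllable word instead of one out of two syllables, which gives approximate matching a finer-grained signal for ranking candidates.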

A Case Study of Hyundai Motors: Live Brilliant Campaign for Modern Premium Brand

  • Choi, Myounghwa;Lee, Yoonseo;Koo, Kay Ryung;Lee, Janghyuk
    • Asia Marketing Journal, v.16 no.4, pp.75-87, 2015
  • As more companies become interested in global markets, it has become crucial for firms to create globalized brands whose positioning, advertising strategy, personality, looks, and feel are consistent across nations. The purpose of this study is to investigate the global branding strategy of the Hyundai Motor Company (hereafter HMC) in order to show how the company processes its branding strategy. HMC, one of the leading global companies in the automobile industry, set up its brand identity as "Modern premium", in alignment with its new slogan "New Thinking New Possibilities", in 2011. The aim of the "Modern premium" concept was to provide consumers with new experiences and values beyond their expectations. HMC wanted its consumers to think of its cars not only as a medium of transportation but also as a life space, where they can share experiences alongside HMC. In an effort to conduct consumer research in 5 different nations, HMC selected "brilliant" as a key communication concept. The word "brilliant" expresses the functional, experiential, and emotional dimensions of HMC. HMC furthermore chose "live brilliant" as a key campaign message in order to reinforce its communication concept. After this decision, the "live brilliant" campaign was exhibited through major broadcast channels around the world. The campaign was the company's first worldwide brand campaign, where a single message was applied to all major markets, with the goal of building up a consistent image as a global brand. This global branding strategy is worth examining due to its significant contribution to growth generation in the global market. Overall, the "live brilliant" global brand campaign not only improved HMC's reputation image-wise, with the "Modern Premium" conceptualization of the brand as "simple", "creative" and "caring", but also improved consumers' familiarity with, preference for, and intention to purchase HMC.
In fact, the "live brilliant" campaign was a successful campaign that increased HMC's brand value. Notably, HMC's brand value increased continuously and reached 9 billion US dollars in 2013, placing it 43rd in the Global Brand Rankings according to the brand consulting group Interbrand. Its brand value largely surpassed that of Nissan (65th) and Chevrolet (89th) in 2013. While it is true that the global branding strategy of HMC involved higher risks, it was highly successful according to cross-nation consumer research. Therefore, this paper concludes that the global branding strategy of HMC made a positive impact on its performance. We further suggest that HMC combine its successful marketing with social media such as Facebook, Twitter, and Instagram and embrace digital media by extending its brand communication horizon to the mobile internet.

Evaluation on the Implementation of Girl Friendly Science Activity (여학생 친화적 과학활동 프로그램의 운영 평가)

  • Jhun, Young-Seok;Shin, Young-Joon
    • Journal of The Korean Association For Science Education, v.24 no.3, pp.442-458, 2004
  • This study was conducted to develop a plan for a large-scale implementation of the Girl Friendly Science Program based on an analysis and investigation of its current pilot implementation. The Girl Friendly Science Program materials, which were first developed in 1999 with support from the Ministry of Gender Equality, consist of 1) five theme-based units that specifically target individual students' unique abilities, aptitudes, and career choices, and 2) differentiated learning materials for 7th through 10th grade female students. All the materials are available at the homepage (http://tes.or.kr/gfsp.cgi) of 'Teachers for Exciting Science' (an organization of science teachers in the Seoul area). Since the materials are well organized by topic and grade level and presented in both Korean word-processor document and HTML formats, anyone can easily access the materials for their own instructional use. Ever since its launch, the number of visitors to the homepage has been constantly increasing. The evaluation of the current pilot implementation of the materials targeting individual students' ability and aptitude showed that it scored high in terms of its alignment with the original purpose, content, level, and effectiveness for classroom implementation. However, its evaluation scores were low in terms of the convenience for teachers in guiding use of the materials, and in terms of its organization and operation. The results also showed a significant change in students' perception of science, and positive student experiences of science through various interdisciplinary activities. On the other hand, the evaluation of students' experiences with the materials showed that students' assessment of an activity depended largely on whether their experience succeeded or failed. Overall, students' evaluation scores were low for simple activities such as cutting or pasting paper.
According to students' achievement test results, the difference between pre-test and post-test scores was statistically significant in the Affective Domain (p<0.05), but not in the Inquiry Domain. Based on teachers' observations, numerous schools that have run this program reported that students' abilities to cooperate, discuss, observe, and reason with evidence improved. In order to implement this program on a larger scale, it is critical to have the strong support of teachers and to encourage them to change their teaching strategies by building a community of teachers and developing ongoing teacher professional development programs. Finally, there remains a strong need to develop more programs, to actively discover and train more domestic women scientists and engineers, and to collaborate with them to develop more educational materials for girls of all ages.