Search | Korea Science

Parallel Corpus Filtering and Korean-Optimized Subword Tokenization for Machine Translation (병렬 코퍼스 필터링과 한국어에 최적화된 서브 워드 분절 기법을 이용한 기계번역)

Park, Chanjun; Kim, Gyeongmin; Lim, Heuiseok
- Annual Conference on Human and Language Technology / 2019.10a / pp.221-224 / 2019
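With the advent of deep-learning-based Neural Machine Translation (NMT), machine translation now performs well enough to clearly surpass earlier rule-based and statistical approaches. This paper holds that, while the translation model matters, what matters most is building high-quality training data and preprocessing it, and it therefore reports a range of related experiments. When building the training data for a neural machine translation system, that is, the parallel corpus, securing high-quality data is paramount. Obtaining such data is difficult, however, because of copyright issues, the difficulty of building parallel corpora, and noise. To build high-quality training data, this paper proposes a parallel corpus filtering technique. Unlike cleaning, parallel corpus filtering deletes a source-target pair together whenever the pair is judged unsuitable as training data. In addition, the authors regard subword tokenization as the most important stage in machine translation, and through various experiments the paper identifies the subword tokenization method that performs best for Korean-English machine translation. Experiments on an open Korean-English parallel corpus showed that the model trained on filtered data achieved a higher BLEU score, and that the best performance was obtained by first segmenting text into morphemes, as proposed in the paper, and then applying subword tokenization with a SentencePiece model using the Unigram algorithm.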
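The abstract describes filtering as deleting whole source-target pairs rather than editing them, but does not list the filtering criteria. The sketch below illustrates the pair-level idea under assumed heuristics (empty side, untranslated copy, length cap, length-ratio bound, exact duplicates); the function name `filter_parallel_corpus` and the thresholds `max_tokens` and `max_ratio` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def filter_parallel_corpus(
    pairs: Iterable[Pair],
    max_tokens: int = 100,   # assumed length cap, not from the paper
    max_ratio: float = 3.0,  # assumed source/target length-ratio bound
) -> List[Pair]:
    """Drop whole (source, target) pairs that fail simple, hypothetical checks.

    Unlike cleaning, which edits text in place, filtering removes the
    source and the target sentence together.
    """
    kept: List[Pair] = []
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # target is an untranslated copy of the source
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) > max_tokens:
            continue  # overly long sentence
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths too mismatched to be a translation pair
        if (src, tgt) in seen:
            continue  # exact duplicate pair
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


corpus = [
    ("나는 학생이다.", "I am a student."),
    ("나는 학생이다.", "I am a student."),  # duplicate
    ("안녕", ""),                            # empty target
    ("같은 문장", "같은 문장"),               # copied, not translated
    ("짧다", "This target sentence is far too long for a one-word source."),
]
print(filter_parallel_corpus(corpus))  # [('나는 학생이다.', 'I am a student.')]
```

Whitespace splitting counts Korean eojeol and English words as tokens, which is only a rough proxy; a real pipeline would more likely compare lengths after morphological analysis or subword tokenization.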

Multiple Description Coding of H.264/AVC Motion Vector under Data Partitioning Structure and Decoding Using Multiple Description Matching (데이터 분할구조에서의 H.264/AVC 움직임 벡터의 다중표현 부호화와 다중표현 정합을 이용한 복호화)

Yang, Jung-Youp;Jeon, Byeung-Woo
- Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.6 / pp.100-110 / 2007
When compressed video data is transmitted over error-prone networks such as wireless channels, data is likely to be lost, and the quality of the reconstructed picture degrades severely. This is especially so when important information such as motion vectors or macroblock modes is lost. The H.264/AVC standard includes data partitioning (DP) as an error-resilience technique that protects important information by labeling data according to its relative importance. However, DP requires a network that supports different levels of reliability for transmitted data. In general, the benefit of unequal error protection (UEP) is obtained by transmitting the packets that carry important information multiple times. In this paper, we propose a multiple description coding (MDC) technique based on data partitioning. The proposed method encodes H.264/AVC motion vectors into multiple descriptions using MDC and transmits each description as an independent packet. Even if some packets are lost, the proposed scheme can decode the compressed bitstream using motion vectors estimated from the correctly received packets, which improves error concealment and minimizes the effect of channel errors. In the decoding process, the proposed multiple description matching further increases the accuracy of the estimated lost motion vectors and the quality of the reconstructed video.
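To make the multiple-description idea concrete, the sketch below splits a motion-vector field into two descriptions in a checkerboard pattern, so that each lost vector still has received neighbours in the other description. The checkerboard split and the component-wise median estimator are assumptions chosen for illustration; they are not the paper's encoding scheme or its multiple description matching, and the names `split_descriptions` and `reconstruct` are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

Pos = Tuple[int, int]   # (row, column) of a macroblock
MV = Tuple[int, int]    # (dx, dy) motion vector
Desc = Dict[Pos, MV]    # one description: position -> motion vector


def split_descriptions(field: List[List[MV]]) -> Tuple[Desc, Desc]:
    """Split a motion-vector field into two checkerboard descriptions."""
    d0: Desc = {}
    d1: Desc = {}
    for r, row in enumerate(field):
        for c, mv in enumerate(row):
            (d0 if (r + c) % 2 == 0 else d1)[(r, c)] = mv
    return d0, d1


def reconstruct(rows: int, cols: int, received: Desc) -> List[List[MV]]:
    """Rebuild the field from whichever descriptions arrived.

    A missing vector is estimated as the component-wise median of its
    received 4-neighbours (a hypothetical estimator for illustration).
    """
    out: List[List[MV]] = []
    for r in range(rows):
        row: List[MV] = []
        for c in range(cols):
            if (r, c) in received:
                row.append(received[(r, c)])
                continue
            neighbours = [
                received[p]
                for p in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if p in received
            ]
            if neighbours:
                dx = int(median(v[0] for v in neighbours))
                dy = int(median(v[1] for v in neighbours))
                row.append((dx, dy))
            else:
                row.append((0, 0))  # no neighbours received: assume zero motion
        out.append(row)
    return out


field = [[(1, 0), (1, 0)], [(1, 0), (2, 0)]]
d0, d1 = split_descriptions(field)
print(reconstruct(2, 2, d0))  # description d1 lost; its vectors are estimated
```

Because every position in one description is surrounded by positions from the other, losing either packet still leaves each missing vector with neighbours to estimate from, which is the property the abstract relies on when it decodes from partial packets.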

Digitally Divided by Choice and the Diffusion of ICTs (정보통신기술의 확산과 선택에 의한 정보격차)

Jung, Byung-Kul
- Journal of Science and Technology Studies / v.6 no.2 s.12 / pp.103-129 / 2006
Although the digital divide receives less attention than before, it is still cited as one of the main obstacles to realizing an information society. From the outset, two conflicting explanations, optimism and pessimism, have both focused on the objective basis of the digital divide. Both, however, have neglected the fact that access can vary not only with objective conditions but also with subjective conditions, such as an individual's perceived need and willingness. The resulting choices can be a crucial factor in determining whether or not people gain access. In the USA, over 50% of potential users decided of their own will not to access the Internet. In a South Korean Internet user survey, "do not feel the necessity" ranked first among the reasons for not using the Internet. The share of respondents who chose "no time to use" tended to decrease but remained substantial. The relative importance of non-access by individual choice has increased. Whether non-users of ICTs by choice fall inside or outside the scope of the digital divide, this trend clearly shows the need to pay attention to them on the way to an information society.

Status of the Information and Telecommunications Industry by Company Size, 1998 ('98년 기업규모별 정보통신산업 현황)

Korean Association of Information & Telecommunication
- Information Society (정보화사회) / s.131 / pp.62-65 / 1999
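In the future global economy, information and knowledge, with high-value-added industries at the center, are expected to become key factors in determining competitive advantage. Following the conclusion of the WTO telecommunications negotiations in 1997, the liberalization and opening of the information and telecommunications industry is expected to spread worldwide and make competition fiercer every day. This article therefore analyzes the current state of the domestic information and telecommunications industry, in order to support concentrated investment in and promotion of the industry so that it can secure international competitiveness.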

Quality Evaluation of Web DBs in Medicine: PubMed and Embase (의학 분야 Web DB의 품질평가 : PubMed와 Embase를 대상으로)

김상준
- Proceedings of the Korean Society for Library and Information Science Conference (한국문헌정보학회 학술발표논집) / 2004.04a / pp.33-59 / 2004
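With recent advances in the Internet and information and communication technology, information retrieval systems, database construction, and information services have grown rapidly. Since the arrival of the Internet, database searching has become part of everyday life, not only for libraries and information centers but also for the general public. It is therefore increasingly important to select, from a growing number of DBs, those that suit one's search purpose or needs and to use them efficiently. Doing so presupposes a sound evaluation of information retrieval systems and DBs. (omitted)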

Analysis of Threat Information Priorities for Effective Security Monitoring & Control (효과적인 보안관제를 위한 위협정보 우선순위 도출)

Kang, DaYeon
- Journal of Korea Society of Industrial Information Systems / v.26 no.5 / pp.69-77 / 2021
This study aims to identify and prioritize security threat information for an organization, because protecting IT systems against threats plays an important role in safeguarding a corporation's intangible assets. Security monitoring systems identify and respond to threats by analyzing, in real time, the events and logs generated by security protection programs. For the security monitoring task, priorities are derived by dividing threat information into reputation information and analysis information. Reputation information consists of hashes, URLs, IPs, and domains, while analysis information consists of e-mail, CMD-Line, CVE, and attack-trend information. The results show that reputation information has relatively high priority, a finding that is useful for improving the accuracy of, and responsiveness to, threat information.
https://doi.org/10.9723/jksiis.2021.26.5.006

Automatic Text Categorization using the Importance of Sentences (문장 중요도를 이용한 자동 문서 범주화)

Ko, Young-Joong;Park, Jin-Woo;Seo, Jung-Yun
- Journal of KIISE: Software and Applications / v.29 no.6 / pp.417-424 / 2002
Automatic text categorization is the problem of assigning predefined categories to free-text documents. To classify text documents, good features must be extracted from them. In previous research, a text document is commonly represented by the frequency of each feature. However, sentences in a document differ in importance, and this affects the importance of the features they contain. In this paper, we measure the importance of each sentence in a document using text summarization techniques and represent the document by features weighted according to the importance of the sentences they occur in. To verify the new method, we constructed a Korean newsgroup dataset and evaluated our method on it. The new method gave a significant improvement over a baseline system on our data.
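The idea of weighting features by sentence importance can be sketched as follows. The abstract says sentence importance comes from text summarization techniques but does not specify which; here importance is assumed to be the fraction of title words a sentence contains (a common summarization cue), and each term occurrence counts `1 + importance` instead of 1. Both choices, and the function names, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def sentence_importance(sentence: List[str], title: List[str]) -> float:
    """Hypothetical importance score: fraction of distinct title words in the sentence."""
    if not title:
        return 0.0
    return len(set(sentence) & set(title)) / len(set(title))


def weighted_features(sentences: List[List[str]], title: List[str]) -> Dict[str, float]:
    """Term weights where each occurrence counts 1 + importance of its sentence.

    With all importances at zero this reduces to plain term frequency,
    the representation used in earlier work.
    """
    weights: Dict[str, float] = defaultdict(float)
    for sentence in sentences:
        boost = 1.0 + sentence_importance(sentence, title)
        for term in sentence:
            weights[term] += boost
    return dict(weights)


title = ["stock", "market"]
doc = [["stock", "market", "fell"], ["weather", "was", "sunny"]]
print(weighted_features(doc, title))
```

Terms in the first sentence, which repeats the title, receive double weight, while terms in the unrelated second sentence keep their raw frequency; the resulting vector can then be fed to any standard classifier.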

A Study on Selection Attributes and Information Sources of Optical Shop (안경원 선택속성과 정보원천에 관한 연구)

Cha, Jung-Won
- Journal of Korean Ophthalmic Optics Society / v.21 no.3 / pp.173-179 / 2016
Purpose: This study aims to assist in the management of optical shops by ranking the importance of optical shop selection attributes, which reflect how consumers choose a shop, and the importance of optical shop information sources, which reflect the channels through which consumers come to select one. Methods: Customer surveys were conducted from March 10 to March 31, 2015, targeting customers who had visited an optical shop in Seoul and the northern Gyeonggi-do region. Data were analyzed with descriptive statistics using the SPSS v.10.0 statistical package. Results: The five highest-ranking optical shop selection attributes are "friendliness and politeness of staff", "cleanliness of the optical shop", "quick resolution of customer complaints by staff", "eye examination and glasses dispensing skill of staff", and "handling of customer complaints and claims". The five lowest-ranking attributes are "provision of free gifts", "scale or size of the optical shop", "opening and closing hours", "convenient parking facilities", and "pleasant appearance of staff". The two highest-ranking information sources are "previous experience using the shop" and "recommendation by a relative, friend, or family member". The two lowest-ranking information sources are "advertisement" and "exterior appearance of the optical shop". Conclusions: What matters most in managing an optical shop is internal capability, such as the skill of the ophthalmic optician, interaction with customers, and customers' previous experience, rather than external factors such as advertisement, exterior appearance, and free gifts.
https://doi.org/10.14479/jkoos.2016.21.3.173

Adjusting Weights of Single-word and Multi-word Terms for Keyphrase Extraction from Article Text

Kang, In-Su
- Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.47-54 / 2021
Given a document, keyphrase extraction is the task of automatically extracting words or phrases that represent the topical content of the document. In unsupervised keyphrase extraction approaches, candidate words or phrases are first extracted from the input document, scores are then calculated for the candidates, and the final keyphrases are selected based on these scores. Regarding how candidate scores are computed in unsupervised keyphrase extraction, this study proposes a method of adjusting candidate scores according to the candidate type: word-type or phrase-type. To this end, the type-token ratios of word-type and phrase-type candidates, as well as the information content of high-frequency word-type and phrase-type candidates, are collected from the input document and used to adjust the candidate scores. In experiments on four keyphrase extraction evaluation datasets built from full-text English articles, the proposed method outperformed a baseline method and the comparison methods on three of the four datasets.
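A minimal sketch of type-dependent score adjustment, assuming that candidate occurrences have already been extracted and given base scores by some unsupervised method. The type-token ratio (distinct candidates divided by total occurrences) is computed separately for word-type and phrase-type candidates, and each score is scaled by the ratio of its type. The multiplicative rule and the function names are hypothetical; the paper also uses the information content of high-frequency candidates, which is omitted here.

```python
from typing import Dict, List


def is_phrase(candidate: str) -> bool:
    """A candidate is phrase-type if it has more than one word."""
    return len(candidate.split()) > 1


def type_token_ratio(occurrences: List[str]) -> float:
    """Distinct candidates divided by total candidate occurrences."""
    return len(set(occurrences)) / len(occurrences) if occurrences else 0.0


def adjust_scores(scores: Dict[str, float], occurrences: List[str]) -> Dict[str, float]:
    """Scale each candidate's score by the type-token ratio of its type.

    `occurrences` lists every candidate occurrence in the document,
    repeats included. Multiplying by the ratio is an assumed rule for
    illustration only.
    """
    words = [c for c in occurrences if not is_phrase(c)]
    phrases = [c for c in occurrences if is_phrase(c)]
    ratio = {False: type_token_ratio(words), True: type_token_ratio(phrases)}
    return {c: s * ratio[is_phrase(c)] for c, s in scores.items()}


occurrences = ["model", "model", "model", "data", "neural network", "machine translation"]
scores = {"model": 4.0, "data": 1.0, "neural network": 2.0, "machine translation": 2.0}
print(adjust_scores(scores, occurrences))
```

In this toy document the single words repeat heavily (ratio 0.5) while every phrase is distinct (ratio 1.0), so word-type scores are damped relative to phrase-type scores; whether damping or boosting is appropriate depends on the dataset, which is why the adjustment is driven by statistics from the input document.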
https://doi.org/10.9708/jksci.2021.26.08.047

The Analysis of the Role of Information for Production Control System (생산통제시스템을 위한 정보의 역할 분석)

Kim, Hyun-Soo;Choi, Jin-Yeong
- Journal of Korean Society of Industrial and Systems Engineering / v.20 no.44 / pp.273-286 / 1997
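…and must be able to support the production environment so that optimal, rational management is possible. To this end, the importance of information and communication technology across the entire production system is emphasized more than anything else, and this technology is taking on an important role. This study examines the information technology used in production control systems, which play a pivotal role in manufacturing environments that reflect these recent conditions. By analyzing the information used in existing production control methods (due-date assignment, production input control, and priority determination), it investigates which information from which parts of the production environment can play an important role in a production control system, and it presents the information that must be considered in each part of the production system.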
