• Title/Summary/Keyword: Text similarity


A Study on Smallpox and Measles by BYUN Gwangwon - Based on the Organization of the Yosandangsinjipuibangkeumnangjibo and the Bojeoksinbang - (변광원(卞光源)의 두진(痘疹)과 마진(麻疹)에 대한 연구 - 『요산당신집의방금낭지보(樂山堂新集醫方錦囊至寶)』와 『보적신방(保赤新方)』의 편제를 중심으로 -)

  • SONG, Jichung
    • Journal of Korean Medical classics
    • /
    • v.35 no.3
    • /
    • pp.59-69
    • /
    • 2022
  • Objectives: The existence of specialized medical texts on a disease reflects its prevalence at the time. Smallpox and measles were major pediatric diseases. Previous studies have examined the outbreak of measles in late Joseon, the relationships among various specialized texts, and how records of the two diseases in the general medical literature changed chronologically. However, the two diseases as recorded in different texts written by the same author have not been studied before. Methods: The organization of the smallpox and measles parts of the Yosandangsinjipuibangkeumnangjibo and the Bojeoksinbang was examined, followed by a comparative analysis. Results: While the two texts are very similar in their general content on smallpox and measles, they differ in the way they were written. In the Yosandangsinjipuibangkeumnangjibo the author lists the literature he referenced, while in the Bojeoksinbang he does not. Compared to the Yosandangsinjipuibangkeumnangjibo, the Bojeoksinbang has detailed titles for the contents in both the introduction and the detailed parts, contains material that cannot be found in the Yosandangsinjipuibangkeumnangjibo, and offers more pattern differentiation. Conclusions: The Yosandangsinjipuibangkeumnangjibo, published in May 1806, is a general medical text in which the part on pediatrics occupies the first two of its 12 volumes, indicating the author's emphasis on pediatric disease. The Bojeoksinbang, published in December 1806, discusses in-depth theories on smallpox and measles among all pediatric diseases, offering a glimpse of pediatrics as a specialized field in the late Joseon period.

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

  • Kim, Kwang Hyeon;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.3
    • /
    • pp.469-475
    • /
    • 2022
  • Training the model of a deep learning-based single-speaker TTS system requires a speech DB of tens of hours and a long training time. This is inefficient in terms of time and cost when training multi-speaker or personalized TTS models. The voice cloning method uses a speaker encoder model to build the TTS model of a new speaker. Through the trained speaker encoder model, a speaker embedding vector representing the timbre of the new speaker is created from a small amount of the new speaker's speech data that was not used for training. In this paper, we propose a multi-speaker TTS system to which voice cloning is applied. The proposed TTS system consists of a speaker encoder, a synthesizer and a vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field. The timbre of the new speaker is expressed by adding the d-vector derived from the trained speaker encoder as an input to the synthesizer. The experimental results of the MOS and timbre similarity listening tests show that the proposed TTS system performs very well.
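The role of the d-vector described above can be illustrated with a minimal sketch. It assumes the d-vector is the normalized average of frame-level speaker-encoder outputs and that "adding it as an input" means concatenating it to every synthesizer encoder step; the abstract confirms neither detail, and all function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def d_vector(frame_embeddings):
    """Average L2-normalized frame embeddings into one utterance-level vector.

    Assumption: frame_embeddings stands in for the output of a trained
    speaker encoder, which is not modeled here.
    """
    normalized = [l2_normalize(frame) for frame in frame_embeddings]
    dim = len(normalized[0])
    mean = [sum(frame[i] for frame in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def condition_on_speaker(encoder_outputs, speaker_vector):
    """Append the same speaker vector to every encoder time step (assumed scheme)."""
    return [list(step) + list(speaker_vector) for step in encoder_outputs]


def cosine_similarity(a, b):
    """Cosine similarity, usable as a rough timbre-similarity score."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))
```

Two utterances from the same speaker would be expected to give d-vectors with a cosine similarity close to 1, which is the usual reason such vectors work as a speaker identity signal.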

A Design and Implementation of a Content_Based Image Retrieval System using Color Space and Keywords (칼라공간과 키워드를 이용한 내용기반 화상검색 시스템 설계 및 구현)

  • Kim, Cheol-Ueon;Choi, Ki-Ho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.6
    • /
    • pp.1418-1432
    • /
    • 1997
  • Most general content-based image retrieval techniques use color and texture as retrieval indices. Among color techniques, those based on color histograms and color pairs suffer from a lack of spatial information and text. This paper describes the design and implementation of a content-based image retrieval system using color space and keywords. The preprocessor for image retrieval uses the existing HSI (Hue, Saturation, Intensity) coordinate system and splits each image into a chromatic region and an achromatic region. The image size must be normalized to 200*N or N*200, and true colors must be converted into 256 colors. Two color histograms, one for the background and one for the object, are used to decide on color selection in the color space. Spatial information is obtained using maximum entropy discretization. The class, color, shape, location and size of an image can be chosen by keyword. An input color is limited to 15 keywords for the chromatic and achromatic colors of the Korea Industrial Standards. The image retrieval method uses similarity as the key retrieval property. The weight values of color space $\alpha$ (%) and keyword $\beta$ (%) can be chosen by the user when entering the query words, and adjusted according to the properties of the image contents. In the test, retrieval results using extracted features such as color space and keyword for the query image were lower than those using weight values. With weight values, the averages of the measured parameters are approximately Precision (0.858), Recall (0.936), RT (1), and MT (0). These results show a higher retrieval effectiveness than content-based image retrieval using color space or keywords.
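The user-chosen weighting of color space and keywords can be sketched as follows. The abstract does not say how the two similarities are computed, so histogram intersection and keyword overlap are swapped in as stand-ins; the function names are hypothetical, and only the percentage weights and the precision/recall measures come from the text.

```python
def histogram_intersection(hist_a, hist_b):
    """Similarity in [0, 1] between two normalized color histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def keyword_overlap(query_keywords, image_keywords):
    """Fraction of query keywords that also describe the image (assumed measure)."""
    query = set(query_keywords)
    if not query:
        return 0.0
    return len(query & set(image_keywords)) / len(query)


def combined_score(color_sim, keyword_sim, alpha_pct, beta_pct):
    """Weighted mean of the two similarities using percentage weights."""
    total = alpha_pct + beta_pct
    if total == 0:
        return 0.0
    return (alpha_pct * color_sim + beta_pct * keyword_sim) / total


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant items."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

Raising the color weight relative to the keyword weight makes the ranking follow color content more closely, which matches the idea of tuning the weights to the properties of the image contents.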


SEARCH FOR EXOPLANETS AROUND NORTHERN CIRCUMPOLAR STARS III. LONG-PERIOD RADIAL VELOCITY VARIATIONS IN HD 18438 AND HD 158996

  • Bang, Tae-Yang;Lee, Byeong-Cheol;Jeong, Gwang-Hui;Han, Inwoo;Park, Myeong-Gu
    • Journal of The Korean Astronomical Society
    • /
    • v.51 no.1
    • /
    • pp.17-25
    • /
    • 2018
  • Detecting exoplanets around giant stars sheds light on the later-stage evolution of planetary systems. We observed the M giant HD 18438 and the K giant HD 158996 as part of a Search for Exoplanets around Northern circumpolar Stars (SENS) and obtained 38 and 24 spectra, respectively, from 2010 to 2017 using the high-resolution Bohyunsan Observatory Echelle Spectrograph (BOES) at the 1.8 m telescope of Bohyunsan Optical Astronomy Observatory in Korea. We obtained precise radial velocity (RV) measurements from the spectra and found long-period RV variations with periods of 719.0 days for HD 18438 and 820.2 days for HD 158996. We checked the chromospheric activity using the Ca II H and H$\alpha$ lines, HIPPARCOS photometry and line bisectors to identify the origin of the observed RV variations. In the case of HD 18438, we conclude that the observed RV variations with a period of 719.0 days are likely to be caused by pulsations, because the periods of the HIPPARCOS photometric and H$\alpha$ EW variations for HD 18438 are similar to that of the RV variations in the Lomb-Scargle periodogram, and there are no correlations between bisectors and RV measurements. In the case of HD 158996, on the other hand, we did not find any similarity in the respective periodograms nor any correlation between RV variations and line bisector variations. In addition, the probability that the real rotational period can be as long as the RV period for HD 158996 is only about 4.3%. Thus we conclude that the observed RV variations with a period of 820.2 days in HD 158996 are caused by a planetary companion, which has a minimum mass of 14.0 $M_{Jup}$, a semi-major axis of 2.1 AU, and an eccentricity of 0.13, assuming a stellar mass of $1.8 M_{\odot}$. HD 158996 is so far one of the brightest and largest stars to harbor an exoplanet candidate.
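The quoted semi-major axis for HD 158996 can be cross-checked against the period and assumed stellar mass with Kepler's third law. The sketch below is only a consistency check, not the authors' orbital fit; it neglects the companion's mass relative to the star, and the function name is hypothetical.

```python
def semi_major_axis_au(period_days, stellar_mass_msun):
    """Kepler's third law in solar units: a^3 = M * P^2 (a in AU, P in years).

    Assumption: the companion's mass is negligible compared with the star's.
    """
    period_years = period_days / 365.25
    return (stellar_mass_msun * period_years ** 2) ** (1.0 / 3.0)


# Values taken from the abstract: P = 820.2 days, M = 1.8 solar masses.
a_hd158996 = semi_major_axis_au(820.2, 1.8)
print(f"semi-major axis ~ {a_hd158996:.2f} AU")
```

The result comes out close to the 2.1 AU stated in the abstract, so the three quoted quantities are mutually consistent.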

Hermaphrodite Good and Evil in Goya's Los Caprichos (고야의 "카프리초스(Los Caprichos)"에 표현된 자웅동체적 선과 악)

  • Kim, Jung Hee
    • The Journal of Art Theory & Practice
    • /
    • no.13
    • /
    • pp.97-132
    • /
    • 2012
  • In 1799 Francisco de Goya published Los Caprichos, a series of 80 aquatint etchings. On 6 February he advertised it on the front page of the Diario de Madrid. The long advertisement, which began with "a collection of prints of capricious subjects, invented and etched by Don Francisco Goya", set out the purpose, themes and methods of this collection of prints. According to this advertisement Goya "has chosen as subjects for his work, from the multitude of follies and mistakes common in every civil society and from the vulgar prejudices and lies authorized by custom, ignorance or self-interest, those that he has thought most fit to provide material for ridicules, and at the same time to exercise the artist's imagination." The text emphasized that the 'author' of the series did not want to criticise any individual or to be a copyist. From his fantasy Goya invented many creatures, such as anthropomorphic, humanized animals. With Los Caprichos he stood on the threshold of Romanticism. The early researchers of Los Caprichos classified its author as an enlightened intellectual. The similarity of the themes of the series to the subjects of the Enlightenment, his several enlightened 'friends', and the wish to avoid the prevalent mystification of his life supported this theory. But this trend has been revised since the 1980s. This made it possible to study Goya's works from a new perspective and to see that Goya did not criticise Spanish society and his contemporaries. Rather, he showed its reality and parodied it through creatures that are mixtures of the reality he observed and the visions he invented. Characters and scenes in Goya's prints are ambiguous and equivocal. They hold, at the same time, values that the dualistic metaphysics of Europe defines as opposites, such as good and evil. Goya himself also appears in various guises in this series.
This ambiguity, or "polyphony", as Janis Tomlinson defined it, is a symptom of the decay of the belief in the Enlightenment that spread in Europe as a result of the storming of the Bastille and the French Revolution. Goya's self-portrait in pl. 43 of this series, "El sueño de la razon produce monstruos", shows his own complex psychology as well as that of his contemporaries. As the remaining etchings after this print show, witchcraft and monsters reside in a world in which the reason of the Enlightenment, and God's rule weakened by that reason, have lost their authority. In this thesis I examine and analyse how Goya represented in Los Caprichos the nature of man and his society as a complex being in which 'antagonistic' value pairs such as good and evil cannot be divided but are united.


Development of Identity-Provider Discovery System leveraging Geolocation Information (위치정보 기반 식별정보제공자 탐색시스템의 개발)

  • Jo, Jinyong;Jang, Heejin;Kong, JongUk
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.9
    • /
    • pp.1777-1787
    • /
    • 2017
  • Federated authentication (FA) is a multi-domain authentication and authorization infrastructure that enables users to access nationwide R&D resources with their home-organizational accounts. To log in, an FA-enabled user selects his/her home organization from an identity-provider (IdP) discovery service and is then redirected to it. The discovery service allows a user to search for his/her home organization among all FA-enabled organizations. As the federation grows, users have more trouble finding their home organization. Therefore, a discovery service has to provide an intuitive way to make a fast IdP selection. In this paper, we propose a discovery system that leverages geographical information. The proposed system calculates the geographical proximity and text similarity between a user and organizations, which determine the order in which organizations are shown by the system. We also introduce a server redundancy method and a status monitoring method for non-stop service provision and improved federation management. Finally, we deployed the proposed system in a real service environment and verified its feasibility.
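Ordering organizations by geographical proximity and text similarity can be sketched roughly as below. The abstract does not give the actual formulas, so the haversine distance, the `difflib` string ratio, the linear blend, and the parameters `geo_weight` and `scale_km` are all assumptions made for illustration, as are the organization names in the usage lines.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rank_idps(user_lat, user_lon, query, idps, geo_weight=0.5, scale_km=100.0):
    """Return organization names ordered by a blend of proximity and name similarity.

    idps is a list of (name, latitude, longitude) tuples. The blend is a
    hypothetical scoring rule, not the one used in the paper.
    """
    scored = []
    for name, lat, lon in idps:
        distance = haversine_km(user_lat, user_lon, lat, lon)
        proximity = 1.0 / (1.0 + distance / scale_km)  # 1.0 when co-located
        text_sim = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        score = geo_weight * proximity + (1.0 - geo_weight) * text_sim
        scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical organizations with approximate coordinates.
example_idps = [
    ("Daejeon University", 36.35, 127.38),
    ("Seoul University", 37.57, 126.98),
    ("Busan University", 35.18, 129.08),
]
print(rank_idps(36.35, 127.38, "seoul", example_idps))
```

With an empty query the order depends on distance alone, and as the user types, the text term increasingly dominates; that is one plausible way to get the fast, intuitive selection the abstract aims for.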

TF-IDF Based Association Rule Analysis System for Medical Data (의료 정보 추출을 위한 TF-IDF 기반의 연관규칙 분석 시스템)

  • Park, Hosik;Lee, Minsu;Hwang, Sungjin;Oh, Sangyoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.3
    • /
    • pp.145-154
    • /
    • 2016
  • With the recent interest in u-Health and the development of IT, the need to utilize medical information data has increased. Among previous studies that apply various data mining algorithms to medical information data, some use association rule analysis. These studies aim to discover associations between symptoms and specified diseases; however, in most cases they do not consider infrequent terms that can be important information for diagnosing a disease. In this paper, we propose a new association rule mining system that takes the importance of each term into account using TF-IDF weights, so that infrequent but important items are considered. In addition, the proposed system can predict candidate diagnoses from medical text records using term similarity analysis based on a medical ontology.
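Why TF-IDF helps surface infrequent but important terms can be seen in a small sketch. It uses the textbook definition (relative term frequency times the log of inverse document frequency), which may differ from the exact weighting in the paper, and the symptom records in the usage lines are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for each tokenized document.

    documents is a list of token lists. Returns one {term: weight} dict per
    document, using tf = count / document length and idf = ln(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical symptom records, already tokenized.
records = [
    ["fever", "cough"],
    ["fever", "rash"],
    ["fever", "cough", "photophobia"],
]
for record_weights in tf_idf(records):
    print(record_weights)
```

A term that appears in every record, such as "fever" here, gets a weight of zero, while a term seen in only one record gets the highest weight, so a mining step that filters on these weights would not discard rare items just for being rare.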

Germination of Buried Seeds in Secondary Forest of Basal Zone - Coniferous and Broadleaved Forest of Low Slope, Yesan-gun, Korea - (저지대 이차림지역의 매토종자 발아특성 -예산군의 침엽수림과 활엽수림-)

  • Kang, Hee-Kyoung;Park, Jun-Young;Ahn, Sang-Kyo;Cho, Yong-Hyeon;Park, Bong-Ju;Kim, Won-Tae;Shin, Kyung-Jun;Eo, Yang-Joon;Song, Hong-Seon
    • Korean Journal of Environment and Ecology
    • /
    • v.28 no.6
    • /
    • pp.705-714
    • /
    • 2014
  • This study investigated the aerial-part plants and buried-seed plants of a coniferous forest and a broadleaved forest in Yesan-gun in order to provide basic data on potential natural vegetation change in secondary forest. The plants that germinated from buried seeds comprised 29 taxa in the coniferous forest (28 species and 1 variety in 27 genera and 20 families) and 36 taxa in the broadleaved forest (34 species and 2 varieties in 32 genera and 18 families). Compositae was the family with the most buried-seed plants, and the species with the highest plot frequency of emergence was Cyperus amuricus in the coniferous forest and Crepidiastrum sonchifolium in the broadleaved forest. The soil depth at which the most plant taxa appeared was 0-10 cm in the coniferous forest and 0-5 cm in the broadleaved forest, and the soil depth at which the most individuals appeared was 0-2 cm in both forests. The number of individuals germinating from buried seeds decreased with soil depth. Crepidiastrum sonchifolium was the plant with the largest number of individuals germinating from buried seeds. The similarity index between the aerial-part plants and the buried-seed plants was low, at 0.22, and that between the coniferous forest and the broadleaved forest was 0.40.

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.221-241
    • /
    • 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. Classifying information is also becoming more important for the efficient management of exponentially growing digital information. In this study, we aimed to automatically classify and provide tailored information that can help companies make decisions on technology commercialization. We therefore propose a method of classifying information based on the Korean Standard Industrial Classification (KSIC), which indicates the business characteristics of enterprises. The classification of information or documents has largely been based on machine learning, but there is not enough training data categorized on the basis of KSIC. This study therefore applied a method of calculating the similarity between documents. Specifically, we propose a method and a model that present the most appropriate KSIC code by collecting the explanatory texts of each KSIC code and calculating their similarity to the document to be classified using the vector space model. IPC data were collected and classified by KSIC, and the methodology was then verified by comparing the results with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. In the verification, the highest agreement was obtained when the LT method, a kind of TF-IDF calculation formula, was applied. In this case, the first-ranked KSIC code matched in 53% of cases, and the cumulative match rate up to the fifth rank was 76%. This confirms that technology, industry, and market information that SMEs need can be classified by KSIC more quantitatively and objectively. In addition, the methods and results of this study can serve as basic data to support the qualitative judgment of experts in creating a linkage table between heterogeneous classification systems.
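The vector space approach and the rank-based match rates can be sketched as follows. The "LT" weighting is not defined in the abstract, so plain term counts are used here instead; the class codes, descriptions, and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * vec_b.get(term, 0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_codes(document_tokens, code_descriptions):
    """Order class codes by similarity of their description to the document.

    Assumption: raw term counts stand in for the paper's TF-IDF variant.
    """
    doc_vec = Counter(document_tokens)
    scores = {
        code: cosine(doc_vec, Counter(tokens))
        for code, tokens in code_descriptions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def top_k_match_rate(rankings, gold_codes, k):
    """Share of documents whose reference code appears within the top k ranks."""
    hits = sum(1 for ranked, gold in zip(rankings, gold_codes) if gold in ranked[:k])
    return hits / len(gold_codes)


# Hypothetical class descriptions, already tokenized.
descriptions = {
    "A": ["steel", "metal", "casting"],
    "B": ["software", "development", "service"],
    "C": ["metal", "software"],
}
print(rank_codes(["metal", "casting", "machine"], descriptions))
```

Calling `top_k_match_rate` with k=1 and k=5 over a set of documents corresponds to the first-rank and cumulative fifth-rank agreement figures reported in the abstract.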

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of Web 2.0 that shapes changes in users' information behavior by allowing users to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of the fields of Information Retrieval and Text Mining, not only because it produces massive unstructured textual data in real time but also because it serves as an influential channel for opinion leading. But most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets of Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets containing candidates' names and the election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects societal trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two types of pattern, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are made on Twitter: users form relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool to detect and predict dynamic changes in social issues, and that mention-based user networks, a kind of network found only on Twitter, could show different aspects of user behavior.
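Term co-occurrence retrieval for a query term can be sketched in a few lines. This is a simplified illustration that counts co-occurrence once per tweet over pre-tokenized text; the real system's tokenization, stream handling, and ranking are not described in the abstract, and the function name and sample tweets are hypothetical.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=5):
    """Return the terms that most often appear in the same tweet as the query.

    tweets is a list of token lists. Each term is counted at most once per
    tweet so that repeated words in one tweet do not inflate the count.
    """
    counts = Counter()
    for tokens in tweets:
        unique_terms = set(tokens)
        if query in unique_terms:
            counts.update(unique_terms - {query})
    return counts.most_common(top_n)


# Hypothetical tokenized tweets.
sample_tweets = [
    ["moon", "election", "poll"],
    ["moon", "election"],
    ["park", "election"],
]
print(cooccurring_terms(sample_tweets, "moon"))
```

Running the same function once per candidate name and comparing the resulting lists mirrors the comparison of general versus candidate-specific terms described above.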