The Effect of Domain Specificity on the Performance of Domain-Specific Pre-Trained Language Models (도메인 특수성이 도메인 특화 사전학습 언어모델의 성능에 미치는 영향)

  • Han, Minah;Kim, Younha;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.251-273 / 2022
  • Research on applying deep learning to text analysis has continued steadily. In particular, many studies use pre-trained language models, trained on large datasets, to capture word meaning and to perform tasks such as summarization and sentiment classification. However, existing pre-trained language models are limited in that they do not understand specific domains well. Research has therefore shifted in recent years toward building language models specialized for a particular domain. Domain-specific pre-trained language models capture the knowledge of a particular domain better and improve performance on various tasks in that field. However, domain-specific further pre-training is costly because a corpus of the target domain must be acquired. Furthermore, many studies have reported that the performance gain from further pre-training is insignificant in some domains. It is therefore difficult to decide whether to develop a domain-specific pre-trained language model when it is unclear whether performance will improve substantially. In this paper, we present a way to estimate the performance improvement expected from further pre-training in a domain before actually performing it. Specifically, we selected three domains and measured the increase in classification accuracy obtained through further pre-training in each. We also developed new indicators that estimate the specificity of a domain from the normalized frequency of the keywords used in that domain. Finally, we performed classification with a pre-trained language model and with domain-specific pre-trained language models for the three domains. The results confirmed that the higher the domain specificity index, the greater the performance improvement from further pre-training.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important source for target marketing and personalized advertising on digital marketing channels such as email, mobile, and social media. However, collecting these demographics has become increasingly difficult because users' activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive and slow, and the responses are likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, what keywords they used to find a site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics, including search keywords, frequency and intensity by time, day, and month, the variety of websites visited, and text information from the web pages visited. The demographic attributes predicted also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, have been used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model.
The objective of this study is to choose, from the results of previous research, the clickstream attributes most likely to be correlated with the demographics, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on the results of previous research, 64 clickstream attributes are used to predict these attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include the 64 clickstream attributes and the 5 demographic attributes. The second step performs dimension reduction on the clickstream variables to address the curse of dimensionality and the overfitting problem, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable, using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographic attributes and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to enhance the reliability of the experiments. The experimental results show that a specific data mining method is well suited to each demographic variable. For example, age prediction performs best with decision-tree-based dimension reduction and a neural network, whereas gender and marital status are predicted most accurately by SVM without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can thereby be applied to digital marketing.
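
The model-selection step described above (evaluate alternative models with 5-fold cross-validation, then keep the most accurate one) can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's procedure: the study used IBM SPSS Modeler with SVM, neural network, and logistic regression models, whereas this sketch uses two simple stand-in classifiers (a majority-class baseline and a nearest-centroid classifier) and invented toy data so that it runs with the Python standard library alone. All function names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds of near-equal size (assumes n >= k)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_fit(X, y):
    """Stand-in model 1: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def centroid_fit(X, y):
    """Stand-in model 2: predict the label whose mean feature vector is closest."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(row):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )

    return predict


def cv_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(model(X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / k


def select_best(models, X, y, k=5):
    """Return the name of the model with the highest cross-validated accuracy, plus all scores."""
    scores = {name: cv_accuracy(fit, X, y, k) for name, fit in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy profiles: two invented clickstream features per user, one binary label.
    X = [[5.0, 1.0], [1.0, 5.0]] * 5
    y = ["A", "B"] * 5
    best, scores = select_best({"majority": majority_fit, "centroid": centroid_fit}, X, y)
    print(best, scores)
```

In the study's setting, this selection would be repeated once per demographic attribute (gender, age, marital status, residence, job), which is how a different method can end up being chosen for each attribute.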

The Characteristics and Significance of 'Wanpan Changgeuk' Written by Heogyu (허규 연출 '완판 창극'의 특징과 의의)

  • Kim, Kee-hyung
    • (The) Research of the performance art and culture / no.20 / pp.5-30 / 2010
  • Diverse and serious attempts have been made to establish the identity of Changgeuk, but it remains an art in progress whose independent dramaturgy has not yet taken form. In this situation, the identity of Changgeuk needs to be explored on the basis of individual, specific works that have been performed under the name of Changgeuk. In the 1980s and 1990s, Heo Gyu was active as a director responsible for staging Changgeuk. He dramatized Siljeon Pansori (the Pansori pieces whose texts have been lost) as well as the five surviving Pansori songs, and he presented a number of newly created Changgeuk works on stage. In particular, completing the dramatization of the five surviving Pansori songs under the name of 'Wanpan Changgeuk' (a full-version performance without omissions), by staging "Heungbojeon" in 1982 and "Jeokbyeokga" in 1985, is one of his major achievements. The purpose of this research is to identify Heo's direction in formulating Changgeuk and to consider the characteristics and significance of the 'Wanpan Changgeuk' written by Heo. Heo was a theater practitioner interested in the self-directed formation of national culture and the creative transmission of Korean traditional performance. He tried to establish Changgeuk as a representative performance form of Korea. In the process, he pointed out three problems: (1) the interpretation of works, (2) the creativity of actors, and (3) the structure of theaters for Changgeuk. He indicated that the other challenges were to use the stage and stage devices, to overcome sentimentalism, to stylize acting, to improve quality, to control the speed and length of songs, to choose suitable musical accompaniment, and to create new repertoires. Changgeuk is classified into three groups by origin: (1) dramatizations of the five surviving songs, (2) dramatizations of the seven lost songs, and (3) newly created works. Heo's work includes all three types.
Among these works, the dramatizations of the five surviving songs are of great importance. Heo hoped that Changgeuk would become the most representative art form of Korea through performances of 'Wanpan Changgeuk', which compiled the heritage of Korea's outstanding artistic achievements. The characteristics of 'Wanpan Changgeuk' can be summarized in four points: (1) a directing attitude that emphasizes tradition, (2) active acceptance of elements of traditional performance, (3) valuing dignity and ethics, and (4) emphasis on humor and active use of secondary characters. Heo's 'Wanpan Changgeuk' shows the peak of the artistic level that Changgeuk can reach. He wanted to make Changgeuk a representative Korean performing art by compiling the Pansori heritage and incorporating Korean traditional performance. Heo continued his efforts to present the authenticity of Pansori and to dramatize each work from beginning to end without omission. This is shown clearly by the fact that a 'Wanpan Changgeuk' performance takes 4 to 5 hours. Heo's achievement in 'Wanpan Changgeuk' appears to have significantly influenced Changgeuk since then. Heo's 'Wanpan Changgeuk' is the matrix of the 'Wanpan JangMak Changgeuk' attempted in the 1990s. In particular, the two are consistent in their intent to synthesize texts and to show all the virtues of Pansori. However, compared with 'Wanpan Changgeuk', the 'Wanpan JangMak Changgeuk' of the 1990s aimed for a large stage, elaborate devices and costumes, and varied content. Recently, producers have tried to make Changgeuk interesting rather than impressive. They usually keep performances within 2 hours and prefer orchestral music to Changgeuk's own distinctive sound. From this point of view, it seems that Heo's ideal in 'Wanpan Changgeuk' has become one of the targets to be overcome these days.