Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)
- Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
Dimensionality reduction is one of the methods for handling big data in text mining. When reducing dimensionality, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation, which can eventually lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, ranging from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier in sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words and can capture semantic and syntactic information from data, are also used. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm has identified words that are not important, we assumed that words similar to those words also have no impact on sentence classification. This study proposes two methods for achieving more accurate classification, both of which eliminate words selectively under specific rules and then construct word embeddings based on Word2Vec.
To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. In the first method, we eliminate words with comparatively low information gain values from the raw text and then build the word embedding. In the second method, we additionally select words that are similar to the words with low information gain values, eliminate them as well, and then build the word embedding. Finally, the filtered text and word embeddings are fed to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle on Amazon.com, IMDB, and Yelp as datasets, and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Yelp shows only the number of helpful votes; for it, we extracted by random sampling 100,000 reviews with more than five helpful votes from among 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. One of the proposed methods performed better than the embeddings built from all the words. Removing unimportant words improved performance; however, removing too many words lowered it. Future research should consider diverse preprocessing approaches and an in-depth analysis of word co-occurrence for measuring similarity among words. Also, we applied the proposed methods only with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed methods, which would make it possible to identify effective combinations of word embedding methods and elimination methods.
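The two elimination steps can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the two-dimensional word vectors (standing in for Word2Vec vectors), the thresholds, and all function names are assumptions chosen for illustration, and information gain is computed here on word presence or absence against a binary label.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels):
    """Information gain of each word's presence/absence w.r.t. the label."""
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    doc_sets = [set(d) for d in docs]
    gains = {}
    for word in set().union(*doc_sets):
        present = Counter(y for s, y in zip(doc_sets, labels) if word in s)
        absent = Counter(y for s, y in zip(doc_sets, labels) if word not in s)
        n_p, n_a = sum(present.values()), sum(absent.values())
        conditional = (n_p / n) * entropy(list(present.values())) \
            + (n_a / n) * entropy(list(absent.values()))
        gains[word] = base - conditional
    return gains


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def words_to_remove(gains, vectors, ig_threshold, sim_threshold=None):
    """Method 1: low-IG words. Method 2: also words similar to them."""
    low = {w for w, g in gains.items() if g < ig_threshold}
    if sim_threshold is None:
        return low
    similar = {
        w for w in vectors
        if w not in low and any(
            cosine(vectors[w], vectors[l]) >= sim_threshold
            for l in low if l in vectors
        )
    }
    return low | similar


def filter_docs(docs, removed):
    """Drop the removed words from every tokenized document."""
    return [[w for w in d if w not in removed] for d in docs]


# Hypothetical toy data: tokenized reviews with binary labels.
docs = [["the", "book", "was", "great"], ["the", "plot", "was", "awful"],
        ["a", "great", "read"], ["an", "awful", "story"]]
labels = [1, 0, 1, 0]
# Hypothetical 2-d vectors standing in for trained Word2Vec vectors.
vectors = {"the": [1.0, 0.0], "was": [0.9, 0.1], "a": [0.95, 0.05],
           "great": [0.0, 1.0], "awful": [0.1, -1.0]}

gains = information_gain(docs, labels)
removed_1 = words_to_remove(gains, vectors, ig_threshold=0.1)
removed_2 = words_to_remove(gains, vectors, ig_threshold=0.1, sim_threshold=0.99)
filtered = filter_docs(docs, removed_2)
```

In this sketch the filtered documents would then be used to train the embedding and the classifier; how aggressively to set the two thresholds is exactly the trade-off the abstract reports, since removing too many words lowered performance.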
The relics of the first phase of Southeast Asian civilizations are found together with relics from India, China, and even from as far west as Persia and Rome. These relics are historic marks of ancient interactions among the continents, mainly through maritime trade. The traces of Indic culture, which appear in the historic age, are represented in textual records and arts and have been regarded as the essence of India itself. The ancient Hindu arts found in various locations of Southeast Asia were thought to have been transplanted directly from India. However, the Gupta Hindu art of India neither formed the mainstream of Gupta art nor played an influential role in the adjacent areas. Indian culture was transmitted to Southeast Asia intermittently rather than consistently. If we thoroughly compare the early Hindu art of India with that of Southeast Asia, we find that the latter was influenced by the former but still retained Southeast Asian originality. The earliest Southeast Asian Hindu art is discovered mostly in continental Southeast Asia because the earliest networks between India and Southeast Asia were built there. The images of Hindu gods produced before the 7th century include Shiva, Vishnu, Harihara, Skanda (the son of Shiva), and Ganesha (the god of wealth). The earliest example of Vishnu was sculpted in the Kushan style. After that, most of the sculptures came to have robust figures and graceful proportions. There are a small number of images of Ganesha and Skanda. These images strictly follow the iconography of Indian sculpture. This shows that Southeast Asians selectively chose their own gods from the Hindu pantheon and devoted their faith to them. Their basic iconography faithfully followed the Indian model, but they tried to transform parts of the images within Southeast Asian contexts.
However, it is very difficult to understand how the Hindu faith and its contents developed in ancient Southeast Asia, because very few undamaged Hindu temples are left in the region. It is also difficult to be sure that the Hindu religion of India, which was based on complex rituals and the caste system, was transplanted to Southeast Asia, because the region had no comparably strong basis of social structure and religion. "Indianization" is an organized expansion of Indian culture based on a sense of belonging to an Indian context. It can be defined through the transmission and progress of the Hindu or Buddhist religions, the Purana legends, and the influx and development of various epic expressions. Such conditions are represented through the Sanskrit language and the arts. Creating an image of a god as an object of devotion is itself an element of Indian culture. However, if we look into the details of iconography, style, and religious culture, these can be understood as a "selective reception of foreign religious culture." Around the 7th century, Southeast Asia did not yet have a social structure sophisticated enough to sustain Indian culture. Whether this phenomenon was "Indianization" or an "influx of elements of Indian culture," it was closely related to the matter of "localization." The regional character of each part of Southeast Asia becomes partially visible after the 8th century. However, it is not clear whether this culture settled in each region as its dominant culture. The localization of Indian culture in Southeast Asia, which acted as a network connecting ports or cities, was part of the localization of Indian culture across the pan-Southeast Asian region and of the process of building a basis for the identity of each Southeast Asian region.
This study examined the quantitative trend of domestic studies on invention gifted education, sought to identify the underlying meanings and connections among their research topics, and aimed to provide basic data for exploring future development plans. To this end, 97 domestic academic papers were finally selected by searching for "Invention Gifted Education" in the Research Information Sharing Service (RISS), and descriptive statistical analysis was conducted with SPSS on publication year, author composition, researchers' affiliation and region, and publishing journal. The number of academic papers on invention gifted education, which had been rising since 2007, peaked at the time of the 3rd comprehensive plan for gifted education and has since declined. The descriptive analysis of author characteristics showed that half of the papers were co-authored, followed by single-author papers. The single-author papers came from universities, research institutes, elementary schools, and middle schools; many of the co-authored papers were collaborations between young researchers and professional researchers, and only one was a collaboration among young researchers. The regions and journals in which papers on invention gifted education were published were concentrated in a few regions and journals, and the variation was very large. In the language network analysis of the papers' keywords, creativity and programs were the meaningful keywords that appeared most often, and the keyword pair with the highest co-occurrence was invention gifted-creativity. The keywords with the highest connection (degree) centrality were creativity, problem-solving, development, and company; they served as links to other research topics and could be extended to a variety of topics.
For mediation (betweenness) centrality, creativity, programs, and effects showed high values, indicating that they are important keywords that mediate between other keywords. These results suggest that national policy measures need to be prepared for the development of invention gifted education, and that an invention ecosystem and culture should be created that enhances teachers' expertise while increasing social responsibility for gifted education.
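The keyword network analysis described above can be sketched roughly as follows. This is only an illustration under assumptions: the keyword lists are invented toy data, not the study's 97 papers, and only co-occurrence counts and degree centrality (our reading of the abstract's "connection" centrality) are computed. Betweenness centrality needs shortest-path counting and is left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair appears in the same paper."""
    pairs = Counter()
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs):
    """Share of the other keywords that each keyword is directly linked to."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}


# Hypothetical author keywords for four papers (illustrative only).
papers = [
    ["invention gifted", "creativity", "program"],
    ["invention gifted", "creativity"],
    ["creativity", "problem-solving"],
    ["program", "effect"],
]

pairs = cooccurrence(papers)
top_pair, top_count = pairs.most_common(1)[0]
centrality = degree_centrality(pairs)
```

On real data, a keyword with high degree centrality is linked to many other keywords, which matches the abstract's description of keywords that can branch out to many research topics.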
In 1986, Korea established legal systems to support small and medium-sized start-ups, which have become main pillars of national development. These legal systems have stimulated the start-up ecosystem, with more than 1 million new start-up companies founded every year during the past 30 years. To analyze the trend of Korea's start-up ecosystem, we collected 1.18 million news articles published from 1991 to 2020. We then extracted the news articles that contain the keywords "start-up", "venture", and "start-up". We applied network analysis and topic modeling to the collected news articles. Our analysis can help clarify the direction of government policy as reflected in the history of start-up support policy. Specifically, it identifies the dynamic characteristics of government policy as influenced by external environmental factors (e.g., society, economy, and culture). The results suggest that the start-up ecosystem in Korea has changed and developed mainly through government policies on corporate governance, industrial development planning, deregulation, and economic prosperity plans. Our keyword frequency analysis contributes to understanding the entrepreneurial productivity that arises from activities among the networked components of industrial ecosystems. Our analyses and results offer practitioners and researchers practical and academic implications that can help establish dedicated support policies by forecasting the economic environment surrounding start-ups. Korean entrepreneurial productivity has been empowered by the growing number of large companies in the mobile phone industry. The spectrum of large companies incorporates content start-ups, platform providers, online shopping malls, and youth-oriented start-ups.
In addition, economic situational factors, which are related to the global expansion of the mobile industry and to government efforts to foster start-ups, contribute to the growth of Korean entrepreneurial productivity. Our research also has methodological implications. We apply natural language processing to 30 years of media articles, which enables a more rigorous analysis than existing studies that only observe changes in government and policy in a qualitative manner.
The wall shear stress in the vicinity of end-to-end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer (FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flow with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor in ANFH formation and graft failure. The present study suggests a correlation between regions of low wall shear stress and the development of anastomotic neointimal fibrous hyperplasia (ANFH) in end-to-end anastomoses.
Air pressure decay (APD) rate and ultrafiltration rate (UFR) tests were performed on new and saline-rinsed dialyzers as well as on those reused in patients several times. C-DAK 4000 (Cordis Dow) and CF IS-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas the saline-rinsed and reused dialyzers showed a considerable amount of decay. C-DAK dialyzers had a larger APD (11.70