• Title/Summary/Keyword: Cluster Complex

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively pursued not only in academia but also across various industries. Text mining is generally conducted in two steps. In the first step, the text of each collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are performed according to the purpose of the analysis. Until recently, text mining studies have focused on applications in the second step, such as document classification, clustering, and topic modeling. However, with the finding that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve the quality of analysis results by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly subjected to a variety of operations and traditional analysis techniques, unstructured text must first undergo a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while preserving algebraic properties, so as to structure text data, is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents from various perspectives. In particular, as the demand for document embedding has increased rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods, represented by doc2Vec, generate a vector for each document using all of the words included in the document. 
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single corresponding vector, which makes it difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since keyword extraction is not the core subject of the proposed method, we describe the process of applying it to documents whose keywords are predefined in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. All text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to overcome the limitation that traditional document embedding is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for each document. Next, clustering is conducted on each document's set of keyword vectors to identify the multiple subjects included in the document. Finally, multiple vectors are generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects within each vector. 
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
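The five-step pipeline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not say which clustering algorithm step (4) uses, so the sketch assumes single-link grouping of keywords whose cosine similarity reaches a hypothetical threshold, and it assumes each subject vector in step (5) is the mean of its cluster's keyword vectors. The toy three-dimensional vectors in `TOY_WORD_VECTORS` stand in for the output of steps (1) and (2) and are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy word vectors standing in for a trained word-embedding model
# (steps 1-2). Values are invented for illustration only.
TOY_WORD_VECTORS: Dict[str, Vector] = {
    "embedding": [0.9, 0.1, 0.0],
    "word2vec": [0.85, 0.15, 0.05],
    "clustering": [0.1, 0.9, 0.0],
    "k-means": [0.05, 0.95, 0.1],
    "stock": [0.0, 0.1, 0.95],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_keywords(vectors: Dict[str, Vector], threshold: float) -> List[List[str]]:
    """Step 4 (assumed variant): single-link clustering. Keywords linked by a
    cosine similarity >= threshold end up in the same cluster (union-find)."""
    parent = {w: w for w in vectors}

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    words = list(vectors)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def multi_vector_embedding(
    keywords: Sequence[str], word_vectors: Dict[str, Vector], threshold: float = 0.8
) -> List[Tuple[List[str], Vector]]:
    """Return one (keyword cluster, subject vector) pair per detected subject."""
    # Step 3: keep only the vectors of the document's keywords (skip unknown words).
    kv = {k: word_vectors[k] for k in keywords if k in word_vectors}
    # Step 4: group keyword vectors into subjects.
    clusters = cluster_keywords(kv, threshold)
    # Step 5: one vector per subject, here the mean of its keyword vectors.
    result = []
    for cluster in clusters:
        members = [kv[w] for w in cluster]
        mean = [sum(col) / len(members) for col in zip(*members)]
        result.append((sorted(cluster), mean))
    return result


if __name__ == "__main__":
    doc_keywords = ["embedding", "word2vec", "clustering", "k-means", "unseen-term"]
    for words, vec in multi_vector_embedding(doc_keywords, TOY_WORD_VECTORS):
        print(words, [round(x, 3) for x in vec])
```

With these toy vectors the document's keywords collapse into two subject vectors, one for the embedding terms and one for the clustering terms, and "unseen-term" is dropped because it has no vector. A single doc2Vec-style vector would instead blend both subjects into one point.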

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research can be classified into studies using structured data and those using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually applied technical analysis and fundamental analysis. In the big data era, the amount of information has increased rapidly, and artificial intelligence methods that extract meaning by quantifying text, an unstructured form of data that accounts for a large share of this information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers forecasts a company's stock price using news about that target company. However, previous research shows that a company's stock price can be affected not only by news about the company itself but also by news about related companies. Finding highly relevant companies is not easy because of market-wide effects and random signals. Thus, existing studies have identified highly relevant companies based primarily on predetermined international industry classification standards. However, recent research shows that homogeneity within the sectors of the Global Industry Classification Standard varies, so forecasting stock prices by pooling all firms in a sector, rather than only the relevant ones, can harm predictive performance. To overcome this limitation, we first applied random matrix theory together with text mining to stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced. 
Therefore, a simple correlation analysis of the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. Using the true correlation, we perform cluster analysis to find relevant companies. Based on the cluster analysis, we then used a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices using features from the financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Consistent with existing research, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory outperforms previous studies when cluster analysis is based on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce effective methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt theories from physics. This extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important not only to study artificial intelligence algorithms but also to study how to theoretically adjust the input values. 
Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) may have low relevance to one another, and we suggested that relevance needs to be defined theoretically rather than simply taken from the GICS.
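The random-matrix filtering step can be illustrated with the approach commonly used in econophysics: eigenvalues of the empirical correlation matrix that fall below the Marchenko-Pastur upper edge λ+ = (1 + √(N/T))² are treated as noise, and the largest eigenvalue is treated as the market-wide mode. The abstract does not give the exact filtering rule the authors used, so this is a sketch under that assumption; `rmt_filtered_correlation` is a hypothetical helper name, and the synthetic returns are made up. The filtered matrix is what a subsequent cluster analysis would take as input.

```python
import numpy as np


def rmt_filtered_correlation(returns: np.ndarray) -> np.ndarray:
    """Filter a correlation matrix with random matrix theory.

    returns: array of shape (T, N), T observations of N stocks' returns.
    Keeps only eigenmodes above the Marchenko-Pastur upper edge and drops the
    largest one (assumed to be the market-wide mode).
    """
    t, n = returns.shape
    # Standardize each stock's returns so that z.T @ z / t is the correlation matrix.
    z = (returns - returns.mean(axis=0)) / returns.std(axis=0)
    corr = z.T @ z / t

    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)

    # Upper edge of the Marchenko-Pastur spectrum for a purely random matrix.
    lambda_max = (1.0 + np.sqrt(n / t)) ** 2

    keep = eigvals > lambda_max  # modes carrying genuine correlation
    keep[-1] = False             # remove the market-wide (largest) mode

    # Rebuild the correlation from the retained modes: sum of lambda_k * v_k v_k^T.
    v = eigvecs[:, keep]
    return (v * eigvals[keep]) @ v.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 1000
    market = rng.standard_normal(T)
    sector_a, sector_b = rng.standard_normal(T), rng.standard_normal(T)
    # Hypothetical data: 10 stocks, 5 driven by sector_a and 5 by sector_b,
    # all sharing a common market factor.
    returns = np.column_stack([
        0.5 * market + 0.6 * (sector_a if i < 5 else sector_b) + 0.5 * rng.standard_normal(T)
        for i in range(10)
    ])
    filtered = rmt_filtered_correlation(returns)
    # Within-sector entries come out positive, cross-sector entries negative.
    print(np.round(filtered[0], 2))
```

In the raw correlation matrix every pair of stocks looks related because of the shared market factor; after the market mode and the noise band are removed, only the sector structure remains, which is the kind of "true correlation" the abstract clusters on.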

An Installation and Model Assessment of the UM, U.K. Earth System Model, in a Linux Cluster (U.K. 지구시스템모델 UM의 리눅스 클러스터 설치와 성능 평가)

  • Daeok Youn;Hyunggyu Song;Sungsu Park
    • Journal of the Korean earth science society / v.43 no.6 / pp.691-711 / 2022
  • A state-of-the-art Earth system model, serving as a virtual Earth, is required for studies of current and future climate change and climate crises. This complex numerical model can account for almost all human activities and natural phenomena affecting Earth's atmosphere. The Unified Model (UM) from the United Kingdom Meteorological Office (UK Met Office) is among the best Earth system models as a scientific tool for studying the atmosphere. However, owing to the expensive numerical integration cost and the substantial output size required to run the UM, individual research groups have had to rely solely on supercomputers. Limited computing resources, especially computing environments that are blocked from external network connections, reduce the efficiency and effectiveness of conducting research with the model and of improving its component codes. Therefore, this study presents detailed guidance for installing a new version of the UM on high-performance parallel computers (Linux clusters) owned by individual researchers, which should help researchers work with the UM more easily. The numerical integration performance of the UM on Linux clusters was also evaluated for two model resolutions, namely N96L85 (1.875° × 1.25° with 85 vertical levels up to 85 km) and N48L70 (3.75° × 2.5° with 70 vertical levels up to 80 km). The one-month integration times using 256 cores for the AMIP and CMIP simulations at N96L85 resolution were 169 and 205 min, respectively. The one-month integration time for an N48L70 AMIP run using 252 cores was 33 min. Simulated 2-m surface temperature and precipitation intensity were compared with ERA5 reanalysis data. The spatial distributions of the simulated results were qualitatively similar to those of ERA5, despite quantitative differences caused by the different resolutions and atmosphere-ocean coupling. 
In conclusion, this study has confirmed that UM can be successfully installed and used in high-performance Linux clusters.
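The reported one-month wall-clock times can be converted into throughput figures that are common in climate modelling, such as simulated years per wall-clock day (SYPD) and core-hours per simulated year. The sketch below assumes that cost scales linearly from one month to twelve, ignoring I/O, restarts, and queue time; the derived numbers are illustrative arithmetic on the figures above, not results reported by the study.

```python
MINUTES_PER_DAY = 24 * 60

# One-month integration times (minutes) and core counts reported in the abstract.
RUNS = {
    "N96L85 AMIP": (169, 256),
    "N96L85 CMIP": (205, 256),
    "N48L70 AMIP": (33, 252),
}


def simulated_years_per_day(minutes_per_month: float) -> float:
    """Simulated years per wall-clock day, assuming 12 equally costly months."""
    return MINUTES_PER_DAY / (12 * minutes_per_month)


def core_hours_per_year(minutes_per_month: float, cores: int) -> float:
    """Core-hours consumed per simulated year under the same linear assumption."""
    return cores * 12 * minutes_per_month / 60


if __name__ == "__main__":
    for name, (minutes, cores) in RUNS.items():
        print(f"{name}: {simulated_years_per_day(minutes):.2f} SYPD, "
              f"{core_hours_per_year(minutes, cores):,.0f} core-hours/year")
```

Under this assumption the N96L85 AMIP run delivers about 0.71 simulated years per day (roughly 8,650 core-hours per simulated year), while the coarser N48L70 AMIP run delivers about 3.6.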

The Distribution Structure of the Internet Movie and Spatial Clustering of the Internet Movie Industry (인터넷 영화의 유통구조와 인터넷 영화산업의 공간적 집적화)

  • Lee, Hee-Yeon;Lee, Nan-Kyung
    • Journal of the Economic Geographical Society of Korea / v.8 no.1 / pp.107-130 / 2005
  • The purpose of this study was to examine the spatial distribution and locational characteristics of the Internet movie industry, to identify the value chains of the Internet movie industry and the distribution structure of Internet movies, and to analyze the vertical and horizontal linkages of Internet movie firms and their spatial clustering. Recently, the Internet movie industry has developed rapidly owing to advances in movie content technologies, broadband Internet, the wide expansion of high-speed communication networks, and increasing demand for movie content. It was found that 74% of the Internet movie industry was concentrated in Seoul. In particular, the industry was highly agglomerated in several dongs of Gangnam-gu, such as Yeoksam, Nonhyeon, Daechi, and Samseong. Proximity to firms in the same or similar businesses was the primary locational factor influencing the Internet movie industry, followed by convenience of transportation, the reputation of the place, and proximity to technical support firms. The Internet movie industry had a value chain composed of 'contents suppliers → contents distributors → service providers'. However, there was also a complex network of VOD copyright owners, VOD syndicators, and service providers in each category of the value chain. This research clearly revealed that a localized cluster has formed among movie content providers, technical support firms, client firms, and cooperating or affiliated firms related to the Internet movie industry. Additionally, a very close network has been established within the cluster, leading to market expansion, cost reduction, the sharing of tacit knowledge, and synergy effects.

A Freeze-fracture Study on the Odontoblast of Dental Pulp in the Rat Incisor (흰쥐 절치치수의 Odontoblast에 관한 Freeze-Fracture 연구)

  • Kim, Myung-Kook
    • Applied Microscopy / v.16 no.2 / pp.1-13 / 1986
  • The purpose of this study was to investigate the morphology and intercellular junctions of odontoblasts in the dental pulp of the rat incisor by means of freeze-fracture electron microscopy. Twenty male Sprague-Dawley rats weighing 150-200 g were used. After being anesthetized by an intraperitoneal injection of sodium pentobarbital (60 mg/ml, 0.5 ml per kg of body weight), the animals were perfused through the ascending aorta for one hour with 2.5% glutaraldehyde-2% paraformaldehyde fixative in 0.1 M cacodylate buffer, pH 7.2. The incisors were carefully extracted from the jaws and demineralized by suspending them in 0.1 M EDTA in 3% glutaraldehyde (pH 7.2) for two weeks. After demineralization, specimens were obtained from the incisors, which were divided into five equal parts. For freeze-fracture replication, the demineralized tissues were infiltrated for several hours with 10% and 25% glycerol in 0.1 M cacodylate buffer as a cryoprotectant, then frozen in liquid Freon 22 and stored in liquid nitrogen. Fracturing and replication were done in a Balzers BAF 400D high-vacuum freeze-fracture apparatus at −120 °C under a routine vacuum of 5 × 10⁻⁷ Torr. The tissue was immediately replicated with platinum unidirectionally at a 45° angle and reinforced with carbon at a 90° angle, either unidirectionally or using a rotary stage. The replication process was monitored with a quartz-crystal device. The replicas were immersed in 100% methanol overnight. The tissue was then digested from the replicas with Clorox (laundry bleach), placed in 5% EDTA, and washed repeatedly with distilled water. The replicas were picked up on 75-mesh grids coated with 0.3% Formvar and examined in a JEOL 100B electron microscope. The results were as follows: 1. In both thin sections and freeze-fracture replicas, three types of intercellular junctions were recognizable in the plasma membrane of odontoblasts: gap junctions, tight junctions, and desmosome-like junctions. 2. 
The nuclear pores were evenly distributed over the nuclear envelope. The pore complex formed a ring about 70 nm in diameter. 3. Gap junctions were found between odontoblasts as well as between odontoblasts and neighbouring pulp cells (fibroblasts, subodontoblastic cell processes, and nerve-like fibres). The gap junctions observed in the odontoblasts were round, ellipsoid, or pear-shaped and 600 nm in diameter. 4. Numerous round and ellipsoid gap junctions were frequently seen on the plasma membranes of the cell body and apical part of the odontoblasts. On the P face, the junctions appeared as clusters of closely packed particles about 9 nm in diameter, and on the E face, they appeared as shallow grooves.

Enhancing Regional Innovation System Potential: The Dimension of Firm Practices (지역혁신체제 잠재성 향상의 조건: 기업의 혁신활동을 중심으로)

  • Jong Ho Lee
    • Journal of the Economic Geographical Society of Korea / v.6 no.1 / pp.61-77 / 2003
  • Firms are central economic agents that play an important role in systems of innovation, as they take responsibility for generating and diffusing knowledge in both organizational and societal contexts. They must be considered learning organizations that interact with other firms and institutions sharing their environment. The systems of innovation literature emphasizes institutional conditions that influence innovation at sectoral, regional, or national levels. Meanwhile, it tends to ignore the complex dimensions of firm practices related to learning and innovation activities. In this context, this paper examines what firms do to sustain innovation and how they learn to innovate. This is critical not only for understanding the innovativeness of individual firms, which depends on interactions with environments inside and outside the organizational boundary, but also for evaluating regional innovation system potential. In short, it is important to recognize that firms attempt to take advantage of knowledge distributed within and across firm boundaries without being confined to particular regional innovation systems. I argue that the more the firms in a cluster attempt not only to combine localized and external sources of knowledge but also to become learning organizations, the greater the regional innovation system potential can become.

Ultrastructure of Brachial Ganglion in Korean Octopus, Octopus minor (한국산 낙지 (Octopus minor) 상완신경절의 미세구조)

  • Chang, Nam-Sub
    • Applied Microscopy / v.30 no.3 / pp.265-272 / 2000
  • In this study, the brachial ganglion of Octopus minor was investigated with light and electron microscopy, and the following results were obtained. The brachial ganglia of the octopus, round in shape, are located under each of the suckers, and their sizes are proportional to those of the suckers. A round brachial ganglion consists of a cortex and a medulla. Nerve cells are clustered in the cortex, while neuropils occupy the medulla. Three kinds of nerve cells (large, middle, and small neurons) are found in the cluster of nerve cells. The small neuron is a round cell about 0.9 µm in diameter, while the middle and large neurons are an elliptical cell of 1.6 × 1.3 µm and an ovoid cell 2.8 µm in diameter, respectively. All of these cells appear light owing to their low electron density, and their organelles are poorly developed. The middle neurons were also observed to be surrounded by moderately electron-dense, pyramidal neuroglial cells about 0.6 × 0.4 µm in size. In the neuropils of the medulla, dendrites and axons of various sizes form a complex net. They contain four kinds of chemical synaptic vesicles: electron-dense synaptic vesicles 100 nm in diameter, moderately electron-dense synaptic vesicles 90 nm in diameter, electron-dense cored synaptic vesicles 90 nm in diameter, and electron-lucent synaptic vesicles 50 nm in diameter.

Immunomodulatory effect of bee pollen extract in macrophage cells (꿀벌 꽃가루 열수 추출물의 큰포식세포 면역활성 효과)

  • Kim, Yi-Eun;Cho, Eun-Ji;Byun, Eui-Hong
    • Korean Journal of Food Science and Technology / v.50 no.4 / pp.437-443 / 2018
  • Activation of macrophages plays an important role in the host immune system. In this study, we investigated the functional roles and related signaling mechanisms of hot-water extracts of bee pollen (BPW) in RAW 264.7 macrophages. Because BPW did not exert cytotoxicity in macrophage cells at concentrations ranging from 62.5 to 250 µg/mL, 250 µg/mL was used as the maximum dose of BPW in all subsequent experiments. BPW increased inducible nitric oxide synthase-mediated nitric oxide production in a concentration-dependent manner. Additionally, BPW was found to induce macrophage activation by augmenting the expression of cell surface molecules (cluster of differentiation, CD80/86, and major histocompatibility complex, MHC class I/II) and the production of pro-inflammatory cytokines (tumor necrosis factor-α, interleukin-6, and IL-1β) through the mitogen-activated protein kinase and nuclear factor-κB signaling pathways in RAW 264.7 macrophages. Taken together, our results indicate that BPW could potentially be used as an immunomodulatory agent.

A Study on Interdisciplinary Structure of Big Data Research with Journal-Level Bibliographic-Coupling Analysis (학술지 단위 서지결합분석을 통한 빅데이터 연구분야의 학제적 구조에 관한 연구)

  • Lee, Boram;Chung, EunKyung
    • Journal of the Korean Society for information Management / v.33 no.3 / pp.133-154 / 2016
  • An interdisciplinary approach has been recognized as one of the key strategies for addressing diverse and complex research problems in modern science. The purpose of this study is to investigate the interdisciplinary characteristics and structure of the field of big data. Among the 1,083 journals related to the field of big data, 420 journals (38.8%) were assigned multiple Subject Categories (SCs) in the Web of Science, and 239 journals (22.1%) were assigned SCs from different fields. These results show that the field of big data is interdisciplinary in character. In addition, bibliographic coupling network analysis of the top 56 journals identified 10 clusters in the network. Among the 10 clusters, 7 were from the computer science field, focusing on technical aspects such as storing, processing, and analyzing data. The cluster analysis also identified research that analyzes and utilizes big data in various fields, such as science and technology, engineering, communication, law, geography, and bio-engineering. Finally, measurements of three types of centrality (betweenness centrality, nearest centrality, and triangle betweenness centrality) showed that computer science journals have a strong influence on, and strong subject relationships with, other fields in the network.
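Journal-level bibliographic coupling links two journals by the references their articles cite in common. The abstract does not state how coupling strength was weighted or normalized, so the sketch below assumes one simple variant: each shared reference counts min(citations from journal A, citations from journal B), computed with `collections.Counter` intersection. The journal names and reference identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Tuple


def coupling_strength(refs_a: Counter, refs_b: Counter) -> int:
    """Shared references, each weighted by the smaller of the two citation counts."""
    # Counter & Counter keeps keys present in both, with min(count_a, count_b).
    return sum((refs_a & refs_b).values())


def coupling_network(
    journal_refs: Dict[str, Counter], min_strength: int = 1
) -> Dict[Tuple[str, str], int]:
    """Undirected weighted edges between journals whose coupling >= min_strength."""
    edges = {}
    for a, b in combinations(sorted(journal_refs), 2):
        strength = coupling_strength(journal_refs[a], journal_refs[b])
        if strength >= min_strength:
            edges[(a, b)] = strength
    return edges


# Hypothetical example: how often each journal's articles cite references r1..r4.
EXAMPLE = {
    "J_CS1": Counter({"r1": 3, "r2": 1}),
    "J_CS2": Counter({"r1": 1, "r2": 2, "r3": 1}),
    "J_LAW": Counter({"r4": 2}),
}

if __name__ == "__main__":
    print(coupling_network(EXAMPLE))  # {('J_CS1', 'J_CS2'): 2}
```

The resulting weighted edge list is the kind of input a clustering or centrality analysis would consume; a normalized measure (for example, cosine similarity of reference vectors) is a common alternative when journals differ greatly in how many references they cite.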

Gene Expression Patterns Associated with Peroxisome Proliferator-activated Receptor (PPAR) Signaling in the Longissimus dorsi of Hanwoo (Korean Cattle)

  • Lim, Dajeong;Chai, Han-Ha;Lee, Seung-Hwan;Cho, Yong-Min;Choi, Jung-Woo;Kim, Nam-Kuk
    • Asian-Australasian Journal of Animal Sciences / v.28 no.8 / pp.1075-1083 / 2015
  • Adipose tissue deposited within muscle fibers, known as intramuscular fat (IMF or marbling), is a major determinant of meat quality and thereby affects its economic value. The biological mechanisms that determine IMF content are therefore of interest. In this study, 48 genes involved in the bovine peroxisome proliferator-activated receptor signaling pathway, which is involved in lipid metabolism, were investigated to identify candidate genes associated with IMF in the longissimus dorsi of Hanwoo (Korean cattle). Ten genes, retinoid X receptor alpha, peroxisome proliferator-activated receptor gamma (PPARG), phospholipid transfer protein, stearoyl-CoA desaturase, nuclear receptor subfamily 1 group H member 3, fatty acid binding protein 3 (FABP3), carnitine palmitoyltransferase II, acyl-Coenzyme A dehydrogenase long chain (ACADL), acyl-Coenzyme A oxidase 2 branched chain, and fatty acid binding protein 4, showed significant effects with regard to IMF and were differentially expressed between the low- and high-marbled groups (p<0.05). Analysis of the gene co-expression network based on Pearson's correlation coefficients identified 10 up-regulated genes in the high-marbled group that formed a major cluster. Among these genes, the PPARG-FABP4 gene pair exhibited the strongest correlation in the network. Glycerol kinase was found to play a role in mediating activation of the differentially expressed genes. We categorized the 10 significantly differentially expressed genes into the corresponding downstream pathways and investigated the direct interactive relationships among these genes. We suggest that fatty acid oxidation is the major downstream pathway affecting IMF content. The PPARG/RXRA complex triggers activation of target genes involved in fatty acid oxidation resulting in increased triglyceride formation by ATP production. 
Our findings highlight candidate genes associated with the IMF content of the loin muscle of Korean cattle and provide insight into the biological mechanisms that determine adipose deposition within muscle.
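A co-expression network of the kind described above connects two genes when the Pearson correlation of their expression profiles across animals is strong. The abstract does not give the threshold used in the study, so the 0.8 cutoff below is an assumption, and the expression values and GENE_A/GENE_B/GENE_C labels are placeholders rather than data from Hanwoo samples.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(
    expression: Dict[str, List[float]], threshold: float
) -> Dict[Tuple[str, str], float]:
    """Keep gene pairs whose |r| meets the (assumed) threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Placeholder expression levels for four animals (not real data).
EXPRESSION = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 6.0, 8.2],
    "GENE_C": [4.0, 1.0, 3.0, 2.0],
}

if __name__ == "__main__":
    edges = coexpression_edges(EXPRESSION, threshold=0.8)
    # The pair with the largest |r| plays the role of the "strongest correlation".
    strongest = max(edges, key=lambda pair: abs(edges[pair]))
    print(edges, strongest)
```

Only GENE_A and GENE_B pass the cutoff here (r is close to 1), while GENE_C correlates weakly and negatively with both; in a real analysis, densely connected groups of such edges would form the clusters the study describes.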