• Title/Summary/Keyword: Frequency Matrix


A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah; Song, Min
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.53-77 / 2012
  • This study analyses the differences in content and tone of argument among three major Korean newspapers: the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that Korean newspapers deliver their own tone of argument explicitly when they cover sensitive issues and topics. This can be problematic if readers consume the news without being aware of a paper's tone of argument, because content and tone can easily influence readers. A tool that informs readers of a newspaper's tone of argument is therefore desirable. This study presents the results of clustering and classification techniques applied as part of a text mining analysis. We focus on six main newspaper sections, namely Culture, Politics, International, Editorial-opinion, Eco-business, and National issues, and attempt to identify differences and similarities among the newspapers. The basic unit of the text mining analysis is a paragraph of a news article. The study uses a keyword-network analysis tool and visualizes relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean Integrated News Database System, which archives articles from the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo and is open to the public. About 3,030 articles from 2008 to 2012 were used. The International, National issues, and Politics sections were collected with specific queries: 'Nuclear weapon of North Korea' for International, '4-major-river' for National issues, and 'Tonghap-Jinbo Dang' for Politics. All articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected.
All of the collected data were segmented into paragraphs. Stop-words were removed using the Lucene Korean Module. Keyword co-occurrence counts were calculated from the paired co-occurrence list of keywords in each paragraph, and a co-occurrence matrix was built from the list. Once the co-occurrence matrix was built, the cosine coefficient matrix was used as input for a Pathfinder Network (PFNet). To compare the three newspapers and identify the significant keywords in each, we analyzed the 10 highest-frequency keywords and the keyword networks of the 20 highest-frequency keywords, closely examining their relationships in a detailed network map. NodeXL was used to visualize the PFNet. After drawing the networks, we compared the results with the classification results. Classification was performed first to identify how each newspaper's tone of argument differs from the others. To analyze tones of argument, all paragraphs were divided into two classes, positive tone and negative tone. A supervised learning technique, the Naïve Bayes classifier provided in the MALLET package, was used to classify all the paragraphs in the collected articles. After classification, precision, recall, and F-value were used to evaluate the results. Based on the results of this study, three sections, Culture, Eco-business, and Politics, showed differences in content and tone of argument among the three newspapers. In addition, for the National issues section, the tones of argument on the 4-major-rivers project differed from each other. The three newspapers appear to have their own specific tones of argument in those sections, and the keyword networks took different shapes for the same period and the same section.
This means that the keywords that appear frequently in the articles differ, and that their content is composed of different keywords. The positive-negative classification demonstrated the possibility of distinguishing newspapers' tones of argument from one another. These results indicate that the approach in this study is promising as a new tool for identifying the different tones of argument of newspapers.
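
The core of the keyword-network step above is a paragraph-level co-occurrence count turned into a cosine coefficient matrix. A minimal sketch in Python (the paragraphs and keywords below are invented for illustration; the actual study extracts Korean keywords with the Lucene Korean Module and feeds the cosine matrix to PFNet/NodeXL):

```python
from itertools import combinations
from collections import Counter
import math

# Toy paragraphs already reduced to keyword lists (stop-words removed).
paragraphs = [
    ["nuclear", "north_korea", "sanction"],
    ["nuclear", "sanction", "talks"],
    ["river", "project", "budget"],
]

cooc = Counter()   # pairwise co-occurrence counts within a paragraph
freq = Counter()   # paragraph frequency of each keyword
for para in paragraphs:
    terms = sorted(set(para))
    freq.update(terms)
    for a, b in combinations(terms, 2):
        cooc[(a, b)] += 1

def cosine(a, b):
    # Cosine coefficient: cooc(a, b) / sqrt(freq(a) * freq(b))
    c = cooc.get((min(a, b), max(a, b)), 0)
    return c / math.sqrt(freq[a] * freq[b])

print(cosine("nuclear", "sanction"))  # 2 / sqrt(2 * 2) = 1.0
```

Each cosine coefficient then serves as an edge weight in the keyword network handed to the Pathfinder pruning.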

A Study on Recent Research Trend in Management of Technology Using Keywords Network Analysis (키워드 네트워크 분석을 통해 살펴본 기술경영의 최근 연구동향)

  • Kho, Jaechang; Cho, Kuentae; Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.101-123 / 2013
  • Recently, due to advances in science and information technology, the socio-economic business landscape is shifting from an industrial economy to a knowledge economy. Companies need to create new value through continuous innovation, the development of core competencies and technologies, and technological convergence. Identifying major trends in technology research and making interdisciplinary, knowledge-based predictions of integrated and promising technologies are therefore required for firms to gain and sustain competitive advantage and future growth engines. The aim of this paper is to understand recent research trends in management of technology (MOT) and to foresee promising technologies with deep knowledge of both technology and business. The study also intends to provide a clear way to find new technical value for constant innovation and to capture core technologies and technology convergence. Bibliometrics is a metrical analysis for understanding the characteristics of a body of literature. Traditional bibliometrics is limited in its ability to reveal the relationship between trends in technology management and the technologies themselves, since it focuses on quantitative indices such as citation frequency. To overcome this limitation, network-focused bibliometrics, which mainly uses co-citation and co-word analysis, has been used instead. In this study, a keyword network analysis, a form of social network analysis, is performed to analyze recent research trends in MOT. For the analysis, we collected keywords from research papers published in international journals related to MOT between 2002 and 2011, constructed a keyword network, and then conducted the keyword network analysis. Over the past 40 years, studies of social networks have attempted to understand social interactions through the network structure represented by connection patterns.
In other words, social network analysis has been used to explain the structures and behaviors of various social formations such as teams, organizations, and industries. In general, social network analysis uses data in matrix form. In our context, the matrix depicts the relations between rows (papers) and columns (keywords), where the relations are binary: each cell is 1 if the paper includes the keyword and 0 otherwise. Even though there are no direct relations between published papers, relations can be derived from the paper-keyword matrix; for example, a keyword network can be configured by connecting the papers that share one or more keywords. After constructing the keyword network, we analyzed keyword frequency, the structural characteristics of the network, preferential attachment and the growth of new keywords, components, and centrality. The results of this study are as follows. First, a paper has 4.574 keywords on average. Over the past 10 years, 90% of keywords were used three or fewer times, and about 75% appeared only once. Second, the keyword network in MOT is a small-world and scale-free network in which a small number of keywords tend to monopolize connections. Third, the gap between the rich (nodes with more edges) and the poor (nodes with fewer edges) in the network grows over time. Fourth, most newly entering keywords become poor nodes within about 2~3 years. Finally, the keywords with high degree, betweenness, and closeness centrality are "Innovation," "R&D," "Patent," "Forecast," "Technology transfer," "Technology," and "SME".
We hope that these results will help MOT researchers identify major trends in technology research, use them as reference information when seeking consilience with other fields of study, and select new research topics.
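
The paper-keyword matrix construction described above can be sketched as follows: keywords that co-appear in a paper become linked nodes, and degree centrality falls out of the adjacency lists (the papers and keywords here are illustrative, not the study's data):

```python
from itertools import combinations
from collections import defaultdict

# Binary paper-keyword incidence: a paper maps to the set of keywords it lists.
papers = {
    "p1": {"Innovation", "Patent", "R&D"},
    "p2": {"Innovation", "Forecast"},
    "p3": {"Patent", "Technology transfer"},
}

# Keywords sharing a paper become linked nodes in the keyword network.
adj = defaultdict(set)
for kws in papers.values():
    for a, b in combinations(sorted(kws), 2):
        adj[a].add(b)
        adj[b].add(a)

degree = {k: len(nbrs) for k, nbrs in adj.items()}
print(degree["Innovation"])  # linked to Patent, R&D, and Forecast -> 3
```

The same adjacency structure is the starting point for the component, preferential-attachment, and centrality analyses the abstract reports.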

A digital Audio Watermarking Algorithm using 2D Barcode (2차원 바코드를 이용한 오디오 워터마킹 알고리즘)

  • Bae, Kyoung-Yul
    • Journal of Intelligence and Information Systems / v.17 no.2 / pp.97-107 / 2011
  • Nowadays there are many copyright infringement issues in the Internet world, because digital content on the network can be copied and delivered easily, and the copied version has the same quality as the original. Copyright owners and content providers therefore want a powerful solution to protect their content. A popular solution was DRM (digital rights management), which is based on encryption technology and rights control. However, DRM-free services were launched after Steve Jobs, CEO of Apple, proposed a new music service paradigm without DRM, and DRM has disappeared from the online music market. Even though online music services decided not to deploy DRM, copyright owners and content providers are still searching for a way to protect their content. One technology that can replace DRM is digital audio watermarking, which can embed copyright information into the music itself. In this paper, the author proposes a new audio watermarking algorithm with two key features. First, the watermark information is generated as a two-dimensional barcode carrying an error-correction code, so the information can recover itself as long as the errors fall within the error tolerance. Second, the algorithm uses the spreading (chip) sequences of CDMA (code division multiple access). Together, these make the algorithm robust to several malicious attacks. Among the many 2D barcodes, the QR code, a matrix barcode, can express information more freely than the other matrix barcodes. A QR code has square finder patterns at three of its corners, which indicate the boundary of the symbol. These features make the QR code suitable for expressing the watermark information: because it is a two-dimensional, nonlinear matrix code, it can be modulated into a spread spectrum and used in the watermarking algorithm.
The proposed algorithm assigns a different spread-spectrum sequence to each individual user. When the assigned code sequences are orthogonal, the watermark information of an individual user can be identified in an audio content. The algorithm uses the Walsh code as the orthogonal code. The watermark information is rearranged from the 2D barcode into a 1D sequence and modulated by the Walsh code, and the modulated watermark is embedded into the DCT (discrete cosine transform) domain of the original audio content. For the performance evaluation, three audio samples were used: "Amazing Grace", "Oh! Carol", and "Take Me Home, Country Roads". The attacks for the robustness test were MP3 compression, an echo attack, and a subwoofer boost. The MP3 compression was performed with Cool Edit Pro 2.0 at CBR (constant bit rate) 128 kbps, 44,100 Hz, stereo. The echo attack added an echo with initial volume 70%, decay 75%, and delay 100 msec. The subwoofer boost attack modified the low-frequency part of the Fourier coefficients. The test results showed that the proposed algorithm is robust to these attacks. Under the MP3 attack the strength of the watermark information was not affected, and the watermark could be detected in all of the sample audios. Under the subwoofer boost attack the watermark was detected at a strength of 0.3, and under the echo attack the watermark could be identified at strengths greater than or equal to 0.5.
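
The Walsh-code spreading step can be sketched as follows. This is a minimal illustration of the principle, not the paper's implementation: the watermark bits stand in for the flattened QR code, and the embedding into the audio's DCT coefficients is omitted:

```python
def walsh(n):
    # Sylvester construction of an n x n Hadamard (Walsh) matrix, n a power of 2.
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H = walsh(4)                    # rows are mutually orthogonal user codes
bits = [1, 0, 1, 1]             # watermark bits (2D barcode flattened to 1D)
code = H[1]                     # the code assigned to one user

# Spreading: map each bit to +/-1 and multiply it by the user's code chips.
chips = [(1 if b else -1) * c for b in bits for c in code]

# Despreading: correlate each 4-chip segment with the same user code;
# orthogonality means other users' codes would correlate to zero here.
recovered = [1 if sum(chips[4 * i + k] * code[k] for k in range(4)) > 0 else 0
             for i in range(len(bits))]
print(recovered)  # [1, 0, 1, 1]
```

Because the Walsh rows are orthogonal, correlating with a different user's row yields zero, which is what lets the scheme separate per-user watermarks in the same content.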

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil; Ko, Eunjung; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels like social media and SNS have created enormous amounts of data, and the portion represented as unstructured text has grown geometrically. Because it is difficult to examine all of this text, it is important to access the data rapidly and grasp the key points. To support such efficient understanding, many text summarization studies for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms, so-called "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct summaries based on the frequency of content in the original documents. Such summaries struggle to retain low-weight subjects that are mentioned less often in the original text. If a summary includes only the content of the major subjects, bias occurs and information is lost, making it hard to ascertain every subject the documents cover. To avoid this bias, one can summarize with a balance between the topics in the documents so that every subject can be identified, but an unbalanced distribution across subjects may still remain. To retain the balance of subjects in the summary, it is necessary to consider the proportion of every subject in the original documents and to allocate the subjects' portions equally, so that even sentences on minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance across all subjects and minimizes the omission of low-frequency subjects. For the subject-balanced summary, we use two summary evaluation metrics, "completeness" and "succinctness".
Completeness means the summary should fully include the content of the original documents, and succinctness means the summary contains minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term is related to each topic. From the derived weights, the terms highly related to each topic can be identified, and the documents' subjects can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; we call them "seed terms". These terms alone are too few to describe each subject, so sufficiently similar terms must be added to build a well-constructed subject dictionary. Word2Vec is used for this word expansion: after training, the cosine similarity between word vectors measures the relationship between terms, and the higher the cosine similarity between two terms, the stronger their relationship. Terms with high similarity to each subject's seed terms are selected, and after filtering these expanded terms the subject dictionary is constructed. The next phase allocates a subject to every sentence in the original documents. To grasp the content of each sentence, a frequency analysis is first conducted over the terms in the subject dictionaries. The TF-IDF weight of each subject in each sentence is then calculated, which shows how much the sentence discusses each subject. However, TF-IDF weights can grow without bound, so the weights of every subject in every sentence are normalized to values between 0 and 1.
Each sentence is then allocated to the subject with its maximum TF-IDF weight, forming a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between the subject-grouped sentences, forming a similarity matrix, and sentences are selected repeatedly to generate a summary that fully covers the content of the original documents while minimizing duplication within itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between the proposed method's summary and a frequency-based summary verified that the proposed method's summary better retains the balance of all the subjects the documents originally have.
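
The normalization-and-allocation phase can be sketched as follows. The normalization rule here (dividing by each subject's maximum weight across sentences) is one plausible reading of the paper's 0-to-1 scaling, and the sentences and weights are invented:

```python
# Raw TF-IDF weight of each subject in each sentence (illustrative values).
tfidf = {
    "sent1": {"food": 3.2, "service": 0.4},
    "sent2": {"food": 0.1, "service": 2.5},
    "sent3": {"food": 1.0, "service": 1.0},
}
subjects = ["food", "service"]

# Normalize each subject's weights to [0, 1] across all sentences.
max_w = {s: max(w[s] for w in tfidf.values()) for s in subjects}
norm = {sent: {s: w[s] / max_w[s] for s in subjects} for sent, w in tfidf.items()}

# Allocate each sentence to its maximum-weight subject.
groups = {}
for sent, w in norm.items():
    groups.setdefault(max(w, key=w.get), []).append(sent)
print(groups)  # {'food': ['sent1'], 'service': ['sent2', 'sent3']}
```

Note how the normalization changes the allocation of sent3: its raw weights are tied, but after scaling it leans toward the subject where a weight of 1.0 is relatively stronger.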

Study of Rat Mammary Epithelial Stem Cells In Vivo and In Vitro (생체 및 시험관에서 유선 상피 모세포의 분리와 동정)

  • Nam Deuk Kim; Kee-Joo Paik
    • Journal of the Korean Society of Food Science and Nutrition / v.24 no.3 / pp.470-486 / 1995
  • Mammary epithelial cells contain a subpopulation of cells with a large proliferative potential; these cells are responsible for maintaining glandular cellularity and are the progenitor cells of mammary cancer. These clonogens give rise to multicellular clonal alveolar or ductal units (AU or DU) upon transplantation and hormonal stimulation. To isolate putative mammary clonogens, enzymatically monodispersed rat mammary epithelial cells from organoid cultures and from intact glands were sorted by flow cytometry according to their affinity for FITC-labeled peanut lectin (PNA) and PE-labeled anti-Thy-1.1 antibody (Thy-1.1) into four subpopulations: cells negative for both PNA and Thy-1.1 (B-), PNA+ cells, Thy-1.1+ cells, and cells positive for both reagents (B+). The in vivo transplantation assays indicate that the clonogenic fractions of PNA+ cells from outgrowths of organoids grown in primary culture for three days in complete hormone medium (CHM) are significantly higher than those of cells from other subpopulations derived from cultures or from intact glands. The extracellular matrix (ECM) is a complex of several proteins that regulates cell function, playing a role in cell growth, differentiation, and tissue-specific gene expression. It can act as a positive as well as a negative regulator of cellular differentiation, depending on the cell type and the genes studied. Regulation by the ECM is closely interrelated with the action of other regulators of cellular function, such as growth factors and hormones. Matrigel supports the growth and development of several different multicellular colonies from mammary organoids and from monodispersed epithelial cells in culture. Several types of colonies are observed, including stellate colonies, duct-like structures, two- and three-dimensional web structures, squamous organoids, and lobulo-duct colonies. Organoids have the greatest proliferative potential and formation of multicellular structures.
Phase-contrast micrographs demonstrate extensive intracellular lipid accumulation within the web structures and some of the duct-like colonies. At the immunocytochemical and electron-micrograph level, casein proteins are predominantly localized near the apical surface of the cells or in the lumen of duct-like or lobulo-duct colonies. Squamous colonies are composed of several layers of squamous epithelium surrounding keratin pearls, as is typical of squamous metaplasia (SM). All-trans retinoic acid (RA) inhibits the growth of SM, and the frequency of lobulo-ductal colony formation increased with the RA concentration under these culture conditions. The current study models could provide powerful tools not only for understanding the growth and differentiation of epithelial cells, but also for the isolation and characterization of mammary clonogenic stem cells.


Popularization of Marathon through Social Network Big Data Analysis : Focusing on JTBC Marathon (소셜 네트워크 빅데이터 분석을 통한 마라톤 대중화 : JTBC 마라톤대회를 중심으로)

  • Lee, Ji-Su; Kim, Chi-Young
    • Journal of Korea Entertainment Industry Association / v.14 no.3 / pp.27-40 / 2020
  • The marathon has long been established as a representative lifestyle sport for all ages. With the recent society-wide spread of the work-life balance trend, the marathon, with its relatively low barrier to entry, is gaining popularity among young people in their 20s and 30s. By analyzing the issues and related words of a marathon event, we examine through keywords the sportainment elements that make the event popular among young people, and suggest a development plan for a differentiated event. To analyze keywords and related words, blogs, cafes, and news provided by Naver and Daum were selected as analysis channels, and 'JTBC Marathon' and 'Culture' were chosen as the search terms. The data collection period was limited to three months, from August 13, 2019 to November 13, 2019, beginning when applications for participation in the 2019 JTBC Marathon opened. For data collection and analysis, frequency and matrix data were extracted with the social matrix program Textom. In addition, the strength of the relationships between words was quantified by analyzing the connection structure and degree centrality. Although the marathon is an individual sport, young people share the common denominator of "running" and form a new cultural group called the "running crew" with other young people. Through this, it was found that a marathon competition culture has formed as a festival venue where people train together and participate together, escaping the image of the marathon as a lonely run and a fight with oneself.

Level Set based Topological Shape Optimization of Phononic Crystals (음향결정 구조의 레벨셋 기반 위상 및 형상 최적설계)

  • Kim, Min-Geun; Hashimoto, Hiroshi; Abe, Kazuhisa; Cho, Seonho
    • Journal of the Computational Structural Engineering Institute of Korea / v.25 no.6 / pp.549-558 / 2012
  • A topology optimization method for phononic crystals is developed for the design of sound barriers, using the level set approach. Given a frequency and a wave incident on the phononic crystal, an optimal shape of the periodic inclusions is found by minimizing the norm of the transmittance. In a sound field containing scattering bodies, an acoustic wave can be refracted at the obstacle boundaries, which makes it possible to control acoustic performance by taking the shape of the inclusions as the design variables. In this research, we consider a layered structure composed of inclusions arranged periodically in the horizontal direction, while finitely many inclusions are distributed in the vertical direction. Due to the periodicity of the inclusions, a unit cell can be considered for analyzing the wave propagation, with proper boundary conditions imposed on the left and right edges of the unit cell using the Bloch theorem. The boundary conditions for the lower and upper boundaries of the unit cell are described by impedance matrices, which represent the transmission of waves between the layered structure and the semi-infinite external media. A level set method is employed to describe the topology and the shape of the inclusions: the initial domain is kept fixed, and its boundary is represented by an implicit moving boundary embedded in the level set function, which makes complicated topological shape changes easy to handle. Several numerical examples demonstrate the applicability of the proposed method.
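
The appeal of the level set description, that topological changes need no explicit boundary bookkeeping, can be seen in a tiny sketch (illustrative only; the actual method evolves the level set function using shape sensitivities of the transmittance):

```python
import math

# The inclusion boundary is the zero contour of phi: phi < 0 inside, > 0 outside.
def circle(cx, cy, r):
    return lambda x, y: math.hypot(x - cx, y - cy) - r

phi1 = circle(0.0, 0.0, 1.0)
phi2 = circle(1.5, 0.0, 1.0)

# The union of two inclusions is the pointwise minimum; overlapping zero
# contours merge automatically, with no explicit boundary bookkeeping.
phi = lambda x, y: min(phi1(x, y), phi2(x, y))

print(phi(0.75, 0.0) < 0)  # the midpoint lies inside the merged inclusion: True
```

On a fixed grid, evolving the values of phi moves, merges, or splits inclusions without ever re-meshing their boundaries, which is exactly why the level set representation suits topology optimization.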

A Frequency Domain DV-to-MPEG-2 Transcoding (DV에서 MPEG-2로의 주파수 영역 변환 부호화)

  • Kim, Do-Nyeon; Yun, Beom-Sik; Choe, Yun-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.2 / pp.138-148 / 2001
  • The Digital Video (DV) coding standard for digital video cassette recorders is based mainly on the DCT and variable-length coding. DV has low hardware complexity but a high compressed bit rate of about 26 Mb/s. Thus, it is practical to encode video with the low-complexity DV coding at the studio and then transcode the compressed video into MPEG-2 for a video-on-demand system. Because both coding methods use the DCT, transcoding in the DCT domain can reduce computational complexity by eliminating duplicated procedures. In transcoding DV into MPEG-2 intra coding, matrix multiplication of the transformed data is used for the 4:1:1-to-4:2:2 chroma format conversion and for the conversion from 2-4-8 to 8-8 DCT mode, which also enables parallel processing. The sub-block variance needed for MPEG-2 rate control is computed entirely in the DCT domain. These techniques are verified through experiments. For transcoding into MPEG-2 inter coding, motion is estimated hierarchically using DCT coefficients: first, the motion of a macroblock (MB) is estimated with only the 4 DC values of its 4 sub-blocks; motion is then refined on a 16-point MB obtained by the IDCT of the 2×2 low frequencies in each sub-block; and the estimation finishes at sub-pixel accuracy in the fifth step. Motion estimation with an overlapped search range shows better PSNR performance than motion estimation without overlapping.
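
Computing the sub-block variance for rate control entirely in the DCT domain rests on Parseval's relation for the orthonormal DCT: the energy of the AC coefficients equals N times the spatial variance. A small 1-D sketch (the sample values are arbitrary):

```python
import math

N = 8

def dct(x):
    # Orthonormal 1-D DCT-II of an N-point signal.
    X = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        X.append(s * (math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)))
    return X

x = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of pixel samples
X = dct(x)

# Parseval: AC energy / N equals the spatial variance, so MPEG-2 rate control
# can measure block activity directly from coefficients, without an inverse DCT.
mean = sum(x) / N
var_spatial = sum((v - mean) ** 2 for v in x) / N
var_dct = sum(c * c for c in X[1:]) / N
```

The same identity holds for the 2-D DCT of an 8×8 block, which is why a DV-to-MPEG-2 transcoder can skip the pixel-domain round trip for this step.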


Development and Performance Compensation of the Extremely Stable Transceiver System for High Resolution Wideband Active Phased Array Synthetic Aperture Radar (고해상도 능동 위상 배열 영상 레이더를 위한 고안정 송수신 시스템 개발 및 성능 보정 연구)

  • Sung, Jin-Bong; Kim, Se-Young; Lee, Jong-Hwan; Jeon, Byeong-Tae
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.21 no.6 / pp.573-582 / 2010
  • In this paper, an X-band transceiver for high-resolution wideband SAR systems is designed and fabricated, and an error compensation algorithm is presented as a technique for enhancing performance. The transceiver for the SAR system is composed of a transmitter, a receiver, a switch matrix, and a frequency generator. The receiver has a two-channel mono-pulse structure for ground moving target indication. The transceiver can provide the deramping signal for the high-resolution mode and select the receive bandwidth according to the operation mode. The transceiver achieved over 300 MHz of bandwidth in X-band and 13.3 dBm of output power, which is appropriate to drive the T/R module. The receiver gain and noise figure were 39 dB and 3.96 dB, respectively. The receive dynamic range was 30 dB, and the amplitude and phase imbalances of the I/Q channels were ±0.38 dB and ±3.47 degrees, respectively. The transceiver met the required electrical performance in the individual tests. The pulse error terms affecting SAR performance were analyzed, and the range IRF was enhanced by applying the compensation technique.

A Study of Secondary Mathematics Materials at a Gifted Education Center in Science Attached to a University Using Network Text Analysis (네트워크 텍스트 분석을 활용한 대학부설 과학영재교육원의 중등수학 강의교재 분석)

  • Kim, Sungyeun; Lee, Seonyoung; Shin, Jongho; Choi, Won
    • Communications of Mathematical Education / v.29 no.3 / pp.465-489 / 2015
  • The purpose of this study is to suggest implications for the development and revision of future teaching materials for mathematically gifted students through a network text analysis of secondary mathematics materials. The subjects of the analysis were the learning goals of 110 teaching materials used at a gifted education center in science attached to a university from 2002 to 2014. Key words were selected by analyzing the frequency of the texts that appeared in the learning goals. A co-occurrence matrix of the key words was established, and basic network statistics, centrality, centralization, components, and k-cores were derived. The KrKwic, KrTitle, and NetMiner 4.0 programs were used for the analysis. The results of this study were as follows. First, the whole network from 2002 to 2014 pivoted on core hubs including 'diversity', 'understanding', 'concept', 'method', 'application', 'connection', 'problem solving', 'basic', 'real life', and 'thinking ability'. In addition, the centralization analysis showed that knowledge aspects were well reflected in the teaching materials. Second, a network text analysis was conducted for each of the three periods of the Master Plan for the promotion of gifted education. In each period a network was built around 'understanding', and there were strong ties among 'question', 'answer', and 'problem solving' regardless of the period. On the other hand, the centrality analysis showed that 'communication', 'discovery', and 'proof' appeared only in the first, second, and third Master Plan periods, respectively. These results suggest that affective aspects and activities demanding high-level cognitive processes should accompany the materials, and that mannerism and ahistoricism in the learning goals should be avoided when developing and revising teaching materials.
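
The k-core analysis mentioned above can be sketched as follows: a k-core is the maximal subnetwork in which every keyword retains at least k links, found by repeatedly stripping low-degree nodes (the toy network here is illustrative, not the study's data):

```python
# k-core: the maximal subnetwork where every node keeps at least k neighbours.
def k_core(adj, k):
    adj = {n: set(nbrs) for n, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for n in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            for m in adj[n]:
                if m in adj:
                    adj[m].discard(n)  # detach n from surviving neighbours
            del adj[n]
            changed = True
    return set(adj)

# Toy keyword network: a triangle ('understanding'-'concept'-'application')
# with 'diversity' attached as a pendant node.
g = {
    "understanding": {"concept", "application"},
    "concept": {"understanding", "application"},
    "application": {"understanding", "concept", "diversity"},
    "diversity": {"application"},
}
print(sorted(k_core(g, 2)))  # ['application', 'concept', 'understanding']
```

Stripping the pendant node leaves the triangle as the 2-core; in the study's network, the surviving core identifies the most tightly interconnected learning-goal keywords.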