• Title/Summary/Keyword: big data mining


Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (텍스트마이닝을 활용한 공개데이터 기반 기업 및 산업 토픽추이분석 모델 제안)

  • Park, Sunyoung;Lee, Gene Moo;Kim, You-Eil;Seo, Jinny
    • Journal of Technology Innovation / v.26 no.4 / pp.199-232 / 2018
  • There is a growing need to understand the business management environment at the industry and firm levels through big data analysis. Research using corporate disclosure information, which comprehensively covers a company's business performance and future plans, is attracting attention. However, because such disclosure data are unstructured, there has been limited research on developing applicable analytical models that leverage them. This study proposes a text-mining-based analytical model for industry- and firm-level analyses using publicly available company disclosure data. Specifically, we apply the LDA topic model and the word2vec word embedding model to U.S. SEC data from publicly listed firms and analyze the trends of business topics at the industry and firm levels. Using LDA topic modeling on SEC EDGAR 10-K documents, we identify management topics across all industries. To compare the topic-trend patterns of different industries, the software and hardware industries are compared over the most recent 20 years. In addition, changes in management topics at the firm level are observed by comparing two companies in the software industry. The changes in topic trends provide a lens for identifying declining and growing management topics at the industry and firm levels. By applying the word2vec word embedding model to the 10-K documents of software-industry firms and then reducing dimensionality with principal component analysis, we map companies and products (or services), identifying those with similar management topics and how they changed over the decades. By suggesting a methodology for developing analysis models based on public management data at the industry and firm levels, this study may contribute a practical methodological foundation for identifying changes in management topics.
However, further research is required to provide a microscopic analytical model of the relationship between technology management strategy and management performance across various patterns of management topics, such as frequent changes in management topics or their momentum. More studies are also needed to develop a competitive-context analysis model based on the product (service) portfolios of firms.
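The step of reading growing and declining topics off topic trends can be illustrated with a minimal sketch. It assumes each 10-K document already has a topic distribution (for example, the output of a fitted LDA model); the function names, the toy data, and the use of an ordinary least-squares slope to label a topic as growing or declining are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict


def yearly_topic_shares(docs):
    """Average per-document topic distributions within each year.

    docs: iterable of (year, [p_topic0, p_topic1, ...]) pairs, where each list
    is one document's topic distribution (e.g., from a fitted LDA model).
    Returns {year: [mean share of topic 0, mean share of topic 1, ...]}.
    """
    sums = {}
    counts = defaultdict(int)
    for year, dist in docs:
        if year not in sums:
            sums[year] = [0.0] * len(dist)
        for k, p in enumerate(dist):
            sums[year][k] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


def trend_slopes(shares):
    """Ordinary least-squares slope of each topic's yearly share."""
    years = sorted(shares)
    n = len(years)
    mean_x = sum(years) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    slopes = []
    for k in range(len(shares[years[0]])):
        ys = [shares[y][k] for y in years]
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (yv - mean_y) for x, yv in zip(years, ys))
        slopes.append(sxy / sxx)
    return slopes


def label_trends(slopes, tol=1e-9):
    """Label each topic 'growing', 'declining', or 'flat' by slope sign."""
    return ["growing" if s > tol else "declining" if s < -tol else "flat"
            for s in slopes]


# Hypothetical toy data: two topics over three filing years.
docs = [
    (2016, [0.8, 0.2]), (2016, [0.6, 0.4]),
    (2017, [0.5, 0.5]),
    (2018, [0.3, 0.7]), (2018, [0.1, 0.9]),
]
shares = yearly_topic_shares(docs)
print(label_trends(trend_slopes(shares)))  # → ['declining', 'growing']
```

A slope over yearly averages is only one way to summarize a trend; with 20 years of filings, a rolling window or a significance test on the slope may be more robust.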

A Study on Industry-specific Sustainability Strategy: Analyzing ESG Reports and News Articles (산업별 지속가능경영 전략 고찰: ESG 보고서와 뉴스 기사를 중심으로)

  • WonHee Kim;YoungOk Kwon
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.287-316 / 2023
  • As the global energy crisis and the COVID-19 pandemic have emerged as social issues, there is a growing demand for companies to move away from profit-centric business models and embrace sustainable management that balances environmental, social, and governance (ESG) factors. Companies' ESG activities vary across industries, and industry-specific weights are applied in ESG evaluations. Therefore, it is important to develop strategic management approaches that reflect the characteristics of each industry and the importance of each ESG factor. Additionally, with the strengthened focus on ESG disclosures, specific guidelines are needed to identify and report on the sustainable management activities of domestic companies. Analyzing ESG reports and news articles by industry can help identify the strategic characteristics of corporate sustainability in specific industries. However, because each company has its own strategies and report structure, it is difficult to grasp detailed trends or action items. In this study, we analyzed ESG reports (2019-2021) and news articles (2019-2022) of six companies in the 'Finance,' 'Manufacturing,' and 'IT' sectors to examine the sustainability strategies of leading domestic ESG companies. Text mining techniques such as keyword frequency analysis and topic modeling were applied to identify industry-specific and ESG-element-specific management strategies and issues. The analysis revealed that in the 'Finance' sector, customer-centric management strategies and efforts to promote an inclusive culture within and outside the company were prominent. Strategies addressing climate change, such as carbon neutrality and expanding green finance, were also emphasized. In the 'Manufacturing' sector, the focus was on creating sustainable communities through occupational health and safety, sustainable supply chain management, low-carbon technology development, and eco-friendly investments to achieve carbon neutrality. 
In the 'IT' sector, there was a tendency to focus on technological innovation and digital responsibility to enhance social value through technology. Furthermore, the key issues identified for each ESG element were as follows. Under the 'Environmental' element, issues such as greenhouse gas and carbon emission management, industry-specific eco-friendly activities, and green partnerships were identified. Under the 'Social' element, key issues included social contribution activities through stakeholder engagement, supporting the growth and coexistence of members and partner companies, and enhancing customer value through stable service provision. Under the 'Governance' element, key issues included strengthening board independence through the appointment of outside directors, risk management and communication for sustainable growth, and establishing transparent governance structures. Exploring the relationship between ESG disclosures in reports and ESG issues in news articles revealed that the sustainability strategies disclosed in reports were aligned with the ESG-related issues covered in news articles. However, companies tended to strengthen preventive and corrective ESG activities after negative media coverage that could harm their corporate image. Additionally, environmental issues were mentioned more frequently in news articles than in ESG reports, while in the reports, environment-related keywords were emphasized in the 'Finance' sector. Thus, ESG reports and news articles shared some similarities in content because they draw on shared information sources. However, media coverage influenced the emphasis placed on specific sustainability strategies, and the extent to which environmental issues were mentioned varied across document types. Based on our study, the following contributions were derived. 
From a practical perspective, companies need to consider their own characteristics and establish sustainability strategies that align with their capabilities and situations. From an academic perspective, unlike previous studies on ESG strategies, we present a more finely subdivided methodology by analyzing companies with their industry-specific characteristics in mind.
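The keyword frequency analysis, and the comparison of how often environmental terms appear in reports versus news, can be sketched as below. This is a minimal illustration under stated assumptions: the `(sector, source)` grouping, the environmental keyword set, and the toy texts are hypothetical, and whitespace tokenization stands in for the Korean morphological analysis a real pipeline would need.

```python
from collections import Counter


def keyword_frequencies(docs, stopwords=frozenset()):
    """Count token frequencies per group.

    docs: iterable of (group, text) pairs; a group can be any hashable key,
    e.g. a hypothetical (sector, source) tuple. Tokenization is naive
    lowercase whitespace splitting, used here only for illustration.
    """
    freqs = {}
    for group, text in docs:
        tokens = [t for t in text.lower().split() if t not in stopwords]
        freqs.setdefault(group, Counter()).update(tokens)
    return freqs


def keyword_share(counter, keywords):
    """Fraction of all tokens in `counter` that belong to `keywords`."""
    total = sum(counter.values())
    hits = sum(counter[k] for k in keywords)  # missing keys count as 0
    return hits / total if total else 0.0


# Hypothetical toy corpus and keyword set.
docs = [
    (("Finance", "report"), "carbon neutrality green finance customer"),
    (("Finance", "news"), "carbon emission carbon neutrality"),
]
env_keywords = {"carbon", "emission", "neutrality", "green"}

freqs = keyword_frequencies(docs)
for group, counter in freqs.items():
    print(group, counter.most_common(2), keyword_share(counter, env_keywords))
```

Comparing shares rather than raw counts keeps the report and news groups comparable even when their total lengths differ.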

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of a wellbeing support system, which is a type of medical IT service. However, although self-report questionnaires and wearable sensors are very accurate, they are cost-intensive and obtrusive when the wellbeing support system must run in real time. Recently, inferring the state of subjective wellbeing from unstructured data with conventional sentiment analysis has been proposed as an alternative that resolves the drawbacks of self-report questionnaires and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentiment word net or ontology for the subjective wellbeing domain. Hence, this paper proposes a method to extract keywords that represent the subjective wellbeing state, together with their contextual polarity, from unstructured text on online websites in order to improve the inference accuracy of sentiment analysis. The proposed method is as follows. First, a set of general sentiment words is proposed. SentiWordNet was adopted; it is the most widely used dictionary and contains about 100,000 words, such as nouns, verbs, adjectives, and adverbs, with polarities ranging from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes each individual's opinions and self-reported wellness levels, such as stress and depression. The participants were asked to describe their feelings about online news on two topics. Next, three types of information were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text and simple statistics on the special characters used). 
These were used to adjust the level of a specific SWB factor. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate an individual's SWB from the text he or she wrote. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved estimation accuracy compared with conventional sentiment analysis methods that incorporate SentiWordNet. Although literature on Korean sentiment analysis is available, such studies used only a limited set of sentiment words. Because of this small vocabulary, many sentences are overlooked when estimating the level of sentiment. In contrast, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a dedicated sentiment-word dictionary containing contextual polarity needs to be constructed alongside common-sense-based dictionaries such as SenticNet. These efforts will enrich and broaden the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that several characteristics of unstructured text relevant to improving SWB have been identified. Consistent with the literature, the results showed that gender and age affect the SWB state when individuals are exposed to an identical cue in online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting information on textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automating the identification of contextual polarity in order to enlarge the vocabulary cost-effectively.
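The idea of contextual polarity, where a word that is neutral in a general lexicon takes on polarity for a particular SWB factor, can be sketched as a two-level lexicon lookup. Everything below is an assumption for illustration: the tiny lexicons and their scores are hypothetical (not SentiWordNet values), scoring is a plain mean over matched words, and the structural features are simple counts of the kind the abstract mentions, not the study's reasoning rules.

```python
import re

# Hypothetical general lexicon with SentiWordNet-style scores in [-1.0, 1.0].
GENERAL_LEXICON = {"happy": 0.8, "bad": -0.6, "deadline": 0.0, "alone": 0.0}

# Hypothetical factor-specific overrides: words neutral in general use that
# carry polarity in the context of one SWB factor.
CONTEXT_LEXICON = {
    "stress": {"deadline": -0.7},
    "depression": {"alone": -0.8},
}


def contextual_polarity(text, factor):
    """Mean polarity of matched words, preferring factor-specific scores."""
    overrides = CONTEXT_LEXICON.get(factor, {})
    scores = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in overrides:
            scores.append(overrides[word])
        elif word in GENERAL_LEXICON:
            scores.append(GENERAL_LEXICON[word])
    return sum(scores) / len(scores) if scores else 0.0


def structural_features(text):
    """Word count and special-character count of a response."""
    words = text.split()
    specials = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    return {"num_words": len(words), "num_special_chars": specials}


response = "Another deadline tomorrow!!"
print(contextual_polarity(response, "stress"))      # factor-aware score
print(contextual_polarity(response, "depression"))  # falls back to general lexicon
print(structural_features(response))
```

The same sentence scores as negative for the stress factor but neutral otherwise, which is the gap in conventional lexicon-based scoring that the abstract describes.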

Identifying Landscape Perceptions of Visitors' to the Taean Coast National Park Using Social Media Data - Focused on Kkotji Beach, Sinduri Coastal Sand Dune, and Manlipo Beach - (소셜미디어 데이터를 활용한 태안해안국립공원 방문객의 경관인식 파악 - 꽃지해수욕장·신두리해안사구·만리포해수욕장을 대상으로 -)

  • Lee, Sung-Hee;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.46 no.5 / pp.10-21 / 2018
  • This study used text mining to examine the perceptions of landscape embedded in text that users spontaneously uploaded in blog posts about "Taean Travel." The study area is the Taean Coast National Park; most of the places found by searching 'Taean Travel' on the blog were located in the park. We conducted a network analysis on the top three places and extracted keywords related to the landscape. Finally, using centrality and cohesion analyses, we derived landscape perceptions and the major characteristics of those landscapes. As a result, it was possible to identify the main tourist places in Taean, individual landscape experiences, and landscape perceptions at specific places. Three different types of landscape characteristics emerged: atmosphere-related keywords at Kkotji Beach, symbolic-image-related keywords at Sinduri Coastal Sand Dune, and landscape-object-related keywords at Manlipo Beach. This suggests that the three places are perceived differently. Kkotji Beach is recognized as a place to view the sunset and as a base for the Taean Coast National Park's trekking course. Sinduri Coastal Sand Dune is recognized as a place with unusual scenery and as an ecologically valuable space. Finally, Manlipo Beach is adjacent to the Chunlipo Arboretum, which tourists often visit, and the beach itself is recognized as a place with an impressive appearance. Social media data are very useful because they enable analysis of various types of content that do not come from an expert's point of view. In this study, we used social media data to analyze various aspects of how people perceive and enjoy landscapes by integrating diverse content, such as landscape objects, images, and activities. 
However, because social media data may be amplified or distorted by users' memories and perceptions, field surveys are needed to verify the results of this study.
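The network and centrality steps can be sketched with a keyword co-occurrence graph, where two keywords are linked if they appear in the same post. This is a minimal sketch under assumptions: the keyword set and posts are hypothetical, co-occurrence is counted per post, and only normalized degree centrality (degree divided by n - 1, the same normalization NetworkX's `degree_centrality` uses) is shown; the study's cohesion analysis is not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(posts, keywords):
    """Undirected keyword co-occurrence graph.

    posts: list of token lists. Two keywords are linked if they appear in the
    same post; the edge weight counts how many posts they share.
    """
    edges = defaultdict(int)
    for tokens in posts:
        present = sorted(set(tokens) & set(keywords))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree_centrality(edges, nodes):
    """Normalized degree centrality: neighbor count / (n - 1), ignoring weights."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical keywords and tokenized blog posts.
keywords = {"sunset", "beach", "trekking", "dune"}
posts = [
    ["sunset", "beach", "view"],
    ["beach", "trekking"],
    ["dune", "beach"],
]
edges = cooccurrence_network(posts, keywords)
print(degree_centrality(edges, keywords))
```

In this toy graph "beach" links to every other keyword, so it has the maximum centrality of 1.0, which is how a hub keyword for a place would surface.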

A Study on the Landscape Cognition of Wind Power Plant in Social Media (소셜미디어에 나타난 풍력발전시설의 경관 인식 연구)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture / v.50 no.5 / pp.69-79 / 2022
  • This study aims to assess current perceptions of the landscape of wind power facilities, which, as renewable energy sources, also offer sightseeing, tourism, and other opportunities. To this end, social media data on the landscapes of wind power facilities experienced by visitors in different regions were analyzed. The results showed that the common landscape characteristics of wind power facilities depend on the scale of the facilities, the distance between the facilities and the overlook points, the visual openness of the facilities from the overlook points, and the terrain where the facilities are located. In addition, preference for wind power facilities is higher in places where the shape of the facilities and the surrounding landscape can be clearly seen; flat ground or the sea is considered to offer better landscapes. Negative keywords about the landscape appear on Gade Mountain in Taibai, Meifeng Mountain in Taibai, Taiqi Mountain, and Gyeongju Wind Power Generation Facilities on Gyeongshang Road in Gangwon. These negative keywords occur when wind power facilities are viewed at close range: because of the high viewing angle, viewers see the size of the facility and the ridge simultaneously and can feel overwhelmed and psychologically pressured. In contrast, positive landscape adjectives are associated with wind power facilities on flat ground or at sea. Visitors feel that the visual volume of the landscape is fully secured on flat ground or at sea, and that the facility becomes a symbolic element that can represent the site. This study analyzes landscape awareness based on the opinions of visitors who have experienced wind power facilities. However, because wind power facilities are built in different areas, their landscape characteristics differ, and there are many variables, such as viewpoints and observers; the results are therefore difficult to generalize and have limitations. 
In recent years, landscape damage caused by the construction of wind power facilities has become a contentious issue, and domestic methods for evaluating the landscape of wind power facilities are inadequate. Therefore, when evaluating the landscape of wind power facilities, the scale of the facilities, the inherent natural characteristics of the area where they are installed, and the distance between the facilities and the overlook points are important elements to consider. In addition, because wind power facilities are located in natural environments that need to be protected, their landscape should be studied together with the surrounding environment.

Occurrence and Chemical Composition of Ti-bearing Minerals from Drilling Core (No.04-1) at Gubong Au-Ag Deposit Area, Republic of Korea (구봉 금-은 광상일대 시추코아(04-1)에서 산출되는 함 티타늄 광물들의 산상과 화학조성)

  • Bong Chul Yoo
    • Korean Journal of Mineralogy and Petrology / v.36 no.3 / pp.185-197 / 2023
  • The Gubong Au-Ag deposit consists of eight lens-shaped quartz veins that fill fractures along fault zones within Precambrian metasedimentary rock. It has been one of the largest deposits in Korea and is geologically a mix of orogenic-type and intrusion-related types. The Korea Mining Promotion Corporation intersected a quartz vein (referred to as the No. 6 vein) with a width of 0.9 m and a grade of 27.9 g/t Au at a depth of -728 ML by drilling (No. 90-12) in the southern part of the deposit. To further investigate the potential redevelopment of the No. 6 vein, another hole (No. 04-1) was drilled in 2004. In 2004, samples (wallrock, altered wallrock, and quartz vein) were collected from the No. 04-1 drilling core to study the occurrence and chemical composition of Ti-bearing minerals (ilmenite, rutile). In the mineralized zone at a depth of -275 ML, rutile occurs in the wallrock alteration zone together with K-feldspar, biotite, quartz, calcite, chlorite, and pyrite. In the ore vein (No. 6 vein) at a depth of -779 ML, ilmenite and rutile occur in the wallrock alteration zone and quartz vein together with white mica, chlorite, apatite, zircon, quartz, calcite, pyrrhotite, and pyrite. Based on the mineral assemblage, rutile was formed by hydrothermal alteration (chloritization) of Ti-rich biotite in the wallrock. Ilmenite has maximum contents of 0.09 wt.% HfO2, 0.39 wt.% V2O3, and 0.54 wt.% BaO. Comparing the chemical composition of rutile at depths of -275 ML and -779 ML, rutile at -779 ML has higher WO3, FeO, and BaO contents than rutile at -275 ML. 
The substitutions in rutile at depths of -275 ML and -779 ML are as follows. For rutile at -275 ML: Ba2+ + Al3+ + Hf4+ + (Nb5+, Ta5+) ↔ 3Ti4+ + Fe2+; 2V4+ + (W5+, Ta5+, Nb5+) ↔ 2Ti4+ + Al3+ + (Fe2+, Ba2+); and Al3+ + V4+ + (Nb5+, Ta5+) ↔ 2Ti4+ + 2Fe2+. For rutile at -779 ML: 2(Fe2+, Ba2+) + Al3+ + (W5+, Nb5+, Ta5+) ↔ 2Ti4+ + (V4+, Hf4+); and Fe2+ + Al3+ + Hf4+ + (W5+, Nb5+, Ta5+) ↔ 2Ti4+ + V4+ + Ba2+. Based on these data and the chemical composition of rutile from orogenic-type deposits, rutile from the Gubong deposit formed in a more oxidizing environment than rutile from orogenic-type deposits (Unsan deposit, Kori Kollo deposit, Big Bell deposit, Meguma gold-bearing quartz vein).
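Each substitution scheme above should be charge-balanced, which can be checked mechanically. In this sketch the cation charges are taken exactly as written in the abstract (including W5+), and a bracketed group such as (Nb5+, Ta5+) is treated as alternative cations for a single position, so every member of a group must share one charge. The check covers charge only; it does not test whether the number of cation sites is conserved. The data layout and names are illustrative.

```python
# Cation charges as written in the substitution schemes.
CHARGE = {"Ti": 4, "Fe": 2, "Ba": 2, "Al": 3, "Hf": 4, "V": 4,
          "Nb": 5, "Ta": 5, "W": 5}


def side_charge(terms):
    """Total positive charge of one side of a substitution.

    terms: list of (count, [ions]) pairs; a multi-ion list is a bracketed
    group of alternatives, so all of its ions must carry the same charge.
    """
    total = 0
    for count, ions in terms:
        charges = {CHARGE[ion] for ion in ions}
        if len(charges) != 1:
            raise ValueError(f"mixed charges in group {ions}")
        total += count * charges.pop()
    return total


# (left side, right side) for each scheme, in the order listed above.
SCHEMES = {
    "-275 ML (1)": ([(1, ["Ba"]), (1, ["Al"]), (1, ["Hf"]), (1, ["Nb", "Ta"])],
                    [(3, ["Ti"]), (1, ["Fe"])]),
    "-275 ML (2)": ([(2, ["V"]), (1, ["W", "Ta", "Nb"])],
                    [(2, ["Ti"]), (1, ["Al"]), (1, ["Fe", "Ba"])]),
    "-275 ML (3)": ([(1, ["Al"]), (1, ["V"]), (1, ["Nb", "Ta"])],
                    [(2, ["Ti"]), (2, ["Fe"])]),
    "-779 ML (1)": ([(2, ["Fe", "Ba"]), (1, ["Al"]), (1, ["W", "Nb", "Ta"])],
                    [(2, ["Ti"]), (1, ["V", "Hf"])]),
    "-779 ML (2)": ([(1, ["Fe"]), (1, ["Al"]), (1, ["Hf"]), (1, ["W", "Nb", "Ta"])],
                    [(2, ["Ti"]), (1, ["V"]), (1, ["Ba"])]),
}

for name, (left, right) in SCHEMES.items():
    lc, rc = side_charge(left), side_charge(right)
    print(f"{name}: {lc}+ vs {rc}+ -> {'balanced' if lc == rc else 'UNBALANCED'}")
```

All five schemes balance (14, 13, 12, 12, and 14 positive charges per side, respectively).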

A Method for Evaluating News Value based on Supply and Demand of Information Using Text Analysis (텍스트 분석을 활용한 정보의 수요 공급 기반 뉴스 가치 평가 방안)

  • Lee, Donghoon;Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.45-67 / 2016
  • Given the recent development of smart devices, users are producing, sharing, and acquiring a variety of information via the Internet and social network services (SNSs). Because users tend to use multiple media simultaneously according to their goals and preferences, domestic SNS users use around 2.09 media concurrently on average. Since the information provided by such media is usually represented as text, recent studies have been actively applying textual analysis to understand users more deeply. Earlier studies using textual analysis focused on analyzing a document's contents without substantive consideration of the diverse characteristics of the source medium. However, current studies argue that analytical and interpretive approaches should be applied differently according to the characteristics of a document's source. Documents can be classified into the following types: informative documents for delivering information, expressive documents for expressing emotions and aesthetics, operational documents for inducing the recipient's behavior, and audiovisual media documents for supplementing the above three functions through images and music. Further, documents can be classified according to their contents, which comprise facts, concepts, procedures, principles, rules, stories, opinions, and descriptions. Documents have unique characteristics according to the source media through which they are distributed. For newspapers, only highly trained people tend to write articles for public dissemination. In contrast, on SNSs, various types of users can freely write any message, and such messages are distributed in unpredictable ways. Moreover, each newspaper article exists independently and tends to have no relation to other articles. However, messages (original tweets) on Twitter, for example, are closely interlinked and are regularly duplicated and repeated through replies and retweets. 
There have been many studies focusing on the different characteristics between newspapers and SNSs. However, it is difficult to find a study that focuses on the difference between the two media from the perspective of supply and demand. We can regard the articles of newspapers as a kind of information supply, whereas messages on various SNSs represent a demand for information. By investigating traditional newspapers and SNSs from the perspective of supply and demand of information, we can explore and explain the information dilemma more clearly. For example, there may be superfluous issues that are heavily reported in newspaper articles despite the fact that users seldom have much interest in these issues. Such overproduced information is not only a waste of media resources but also makes it difficult to find valuable, in-demand information. Further, some issues that are covered by only a few newspapers may be of high interest to SNS users. To alleviate the deleterious effects of information asymmetries, it is necessary to analyze the supply and demand of each information source and, accordingly, provide information flexibly. Such an approach would allow the value of information to be explored and approximated on the basis of the supply-demand balance. Conceptually, this is very similar to the price of goods or services being determined by the supply-demand relationship. Adopting this concept, media companies could focus on the production of highly in-demand issues that are in short supply. In this study, we selected Internet news sites and Twitter as representative media for investigating information supply and demand, respectively. We present the notion of News Value Index (NVI), which evaluates the value of news information in terms of the magnitude of Twitter messages associated with it. In addition, we visualize the change of information value over time using the NVI. We conducted an analysis using 387,014 news articles and 31,674,795 Twitter messages. 
The analysis revealed interesting patterns: most issues show a lower NVI than the average across all issues, whereas a few issues show a consistently higher NVI than the average.
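The abstract does not give the NVI formula, so the following is only a hypothetical supply-demand ratio in its spirit: each issue's tweets per news article (demand per unit of supply), divided by the corpus-wide tweets-per-article ratio. A value above 1.0 would then mark an issue whose Twitter attention outstrips its news coverage relative to the average. The function name, the normalization, and the toy counts are all assumptions; computing it per time window would give the over-time view the abstract mentions.

```python
def news_value_index(articles, tweets):
    """Hypothetical NVI: per-issue tweets-per-article ratio divided by the
    corpus-wide tweets-per-article ratio for the same time window.

    articles, tweets: dicts mapping issue -> count (supply and demand).
    Issues with no articles are skipped to avoid division by zero.
    """
    overall = sum(tweets.values()) / sum(articles.values())
    return {
        issue: (tweets.get(issue, 0) / n_articles) / overall
        for issue, n_articles in articles.items()
        if n_articles > 0
    }


# Hypothetical counts for three issues in one time window.
articles = {"A": 100, "B": 10, "C": 90}
tweets = {"A": 1000, "B": 1000, "C": 0}
print(news_value_index(articles, tweets))
```

Here issue "B" is covered by few articles but discussed heavily on Twitter, the kind of under-supplied, in-demand issue the abstract suggests media companies could focus on.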

New demand forecast for vocational high school graduates in regional strategic industries: Focusing on comparison between Daejeon and Jeonnam (지역전략산업에 따른 특성화고 졸업자 신규수요 예측: 대전과 전남 지역 비교를 중심으로)

  • Kim, Jin-Mo;Choi, Su-Jung;Jeon, Yeong-Uk;Oh, Jin-Ju;Ryu, Ji-Eun;Kim, Seon-Geun
    • Journal of vocational education research / v.36 no.1 / pp.47-75 / 2017
  • The purpose of this study was to provide basic data for policymaking on secondary vocational education in each region and on the transformation of vocational high schools. To achieve this, the regional strategic industries of Daejeon and Jeonnam were selected, and new demand for vocational high school graduates was forecast for each industry and occupation. The results are as follows. First, location quotient analysis and regional shift-share analysis revealed that Daejeon and Jeonnam have different strategic industries. Unlike Jeonnam, Daejeon strategically develops 'manufacturing of food, beverages and tobacco', 'manufacturing of timber and paper, printing and copying', 'public service and administration of national defense and social security', and 'manufacturing of electrical devices, electronics and precision devices'. Jeonnam has specialized industries distinct from Daejeon's, namely 'manufacturing of machinery, transportation equipment, etc.', 'manufacturing of non-metallic minerals and metal products', 'electricity, gas, steam and water supply', 'manufacturing of coal and chemical products, and petroleum refining', 'mining', and 'agriculture, forestry and fishery'. Second, new demand for vocational high school graduates by occupation and industry showed regional differences between Daejeon and Jeonnam. According to the forecast, much of Daejeon's workforce demand will be in manufacturing industries, whereas Jeonnam's will be concentrated in service industries. The analysis by occupation also differed: Daejeon showed high demand for professionals and related workers, while Jeonnam showed high demand for new office and service workers. Third, new workforce demand by occupation in the regional strategic industries accounts for a large part of overall new workforce demand in both Daejeon and Jeonnam. 
Fourth, analyzing new demand for vocational high school graduates in Daejeon and Jeonnam in terms of industry location quotient and change effect showed high demand in industries with positive total change effects. In terms of location quotient, Daejeon and Jeonnam showed different results.
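The two regional-analysis tools named above can be illustrated with their common textbook forms. The location quotient compares an industry's share of regional employment with its share of national employment (LQ > 1 suggests regional specialization). The shift-share function uses the classic three-way split into national share, industry mix, and regional shift. Whether the study used exactly these variants, and how its "total change effect" maps onto them, is an assumption; the employment figures are hypothetical.

```python
def location_quotient(region_ind, region_total, nation_ind, nation_total):
    """LQ = (industry share of regional employment) / (industry share nationally)."""
    return (region_ind / region_total) / (nation_ind / nation_total)


def shift_share(r0, r1, i0, i1, n0, n1):
    """Classic three-component shift-share decomposition.

    r0, r1: regional employment in the industry (start, end year)
    i0, i1: national employment in the same industry
    n0, n1: total national employment
    Returns (national_share, industry_mix, regional_shift); the three
    components sum to the actual regional change r1 - r0.
    """
    g_nation = (n1 - n0) / n0
    g_industry = (i1 - i0) / i0
    g_region = (r1 - r0) / r0
    national_share = r0 * g_nation
    industry_mix = r0 * (g_industry - g_nation)
    regional_shift = r0 * (g_region - g_industry)
    return national_share, industry_mix, regional_shift


# Hypothetical employment figures.
print(location_quotient(200, 1000, 1000, 10000))  # industry is twice as concentrated regionally
ns, im, rs = shift_share(r0=100, r1=130, i0=1000, i1=1200, n0=10000, n1=11000)
print(ns, im, rs, ns + im + rs)
```

Because the components sum to the actual change, a positive industry-mix plus regional-shift total indicates an industry growing faster in the region than national growth alone would predict.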

The Effects of pH Change in Extraction Solution on the Heavy Metals Extraction from Soil and Controversial Points for Partial Extraction in Korean Standard Method (용출액의 pH 변화가 토양내 중금속 용출에 미치는 영향과 그에 따른 국내 토양 오염 공정시험방법의 문제점)

  • 오창환;유연희;이평구;이영엽
    • Economic and Environmental Geology / v.36 no.3 / pp.159-170 / 2003
  • Heavy metals were extracted from Chonju stream sediment, roadside soils and sediments along the Honam Expressway, and soils and tailings from a mining area using three different methods: partial extraction in the Standard Method, a partial extraction method that maintains the extraction solution at 0.1 N, and the Sequential Extraction Method. In samples with buffer capacity against acid, the pH 1 (0.1 N HCl) of the extraction solution cannot be maintained under partial extraction in the Standard Method, and the pH of the extraction solution increases up to 8.0. The averages and ranges of HPE (heavy metals extracted using partial extraction in the Standard Method)/HPEM (heavy metals extracted using the partial extraction method that maintains 0.1 N) values are 0.479 and 0.145~0.929 for Cd, 0.534 and 0.078~0.928 for Zn, 0.432 and 0.041~0.992 for Mn, 0.359 and 0.011~0.874 for Cu, 0.150 and 0.018~0.530 for Cr, 0.219 and 0.003~0.853 for Pb, and 0.088 and 1.73×10⁻⁵~0.303 for Fe. These data indicate that the difference between HPE and HPEM decreases in the order Fe, Cr, Pb, Cu, Mn, Cd, and Zn. The amounts of heavy metals extracted decrease in the following order: Sum III (sum of fractions I, II, and III in sequential extraction) > HPEM > Sum II (sum of fractions I and II) > HPE for Zn, Cd, and Mn, and Sum III > HPEM > HPE for Cr and Fe. In the case of Cr, Sum II is lower than HPEM and higher than HPE. For Cu, the amount extracted decreases in the order Sum IV > HPEM > Sum III > HPE. The HPE/HPEM value decreases as the amount of HCl used to maintain 0.1 N in the extraction solution increases. For samples with high buffer capacity, HPE/HPEM values for all elements are lower than 0.2. On the other hand, for samples with low buffer capacity, HPE/HPEM values are over 0.2, and many samples have values higher than 0.6 for Zn, Cd, Mn, and Cu because of the small difference between Sum II and Sum III and their relatively higher mobility. 
However, for Fe and Cr, HPE/HPEM values are below 0.2 even for samples with low buffer capacity because of their low mobility and the large difference between Sum II and Sum III. This study indicates that the partial extraction method in the Korean Standard Method for soil is not suitable for assessing soil contamination in areas where the buffer capacity of soil can be decreased or lost through long-term exposure to environmental damage such as acid rain.
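The ordering of elements by HPE-HPEM discrepancy follows directly from the mean ratios reported above: the lower the mean HPE/HPEM, the larger the gap between the two methods. The sketch below reproduces that ordering from the reported means and adds a small helper for summarizing per-sample ratios; the helper's sample pairs are hypothetical, not study data.

```python
from statistics import mean

# Mean HPE/HPEM ratios as reported in the abstract.
MEAN_RATIO = {"Cd": 0.479, "Zn": 0.534, "Mn": 0.432, "Cu": 0.359,
              "Cr": 0.150, "Pb": 0.219, "Fe": 0.088}


def rank_by_discrepancy(mean_ratio):
    """Elements ordered from largest to smallest HPE-HPEM gap (lowest ratio first)."""
    return sorted(mean_ratio, key=mean_ratio.get)


def summarize_ratios(pairs):
    """Mean, minimum, and maximum of HPE/HPEM over (hpe, hpem) sample pairs."""
    ratios = [hpe / hpem for hpe, hpem in pairs]
    return mean(ratios), min(ratios), max(ratios)


print(rank_by_discrepancy(MEAN_RATIO))
print(summarize_ratios([(1.0, 2.0), (3.0, 4.0)]))  # hypothetical sample pairs
```

The resulting order, Fe, Cr, Pb, Cu, Mn, Cd, Zn, matches the sequence stated in the abstract.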