• Title/Summary/Keyword: mining

Search Result 6,706, Processing Time 0.029 seconds

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels like social media and SNS create enormous amount of data. In all kinds of data, portions of unstructured data which represented as text data has increased geometrically. But there are some difficulties to check all text data, so it is important to access those data rapidly and grasp key points of text. Due to needs of efficient understanding, many studies about text summarization for handling and using tremendous amounts of text data have been proposed. Especially, a lot of summarization methods using machine learning and artificial intelligence algorithms have been proposed lately to generate summary objectively and effectively which called "automatic summarization". However almost text summarization methods proposed up to date construct summary focused on frequency of contents in original documents. Those summaries have a limitation for contain small-weight subjects that mentioned less in original text. If summaries include contents with only major subject, bias occurs and it causes loss of information so that it is hard to ascertain every subject documents have. To avoid those bias, it is possible to summarize in point of balance between topics document have so all subject in document can be ascertained, but still unbalance of distribution between those subjects remains. To retain balance of subjects in summary, it is necessary to consider proportion of every subject documents originally have and also allocate the portion of subjects equally so that even sentences of minor subjects can be included in summary sufficiently. In this study, we propose "subject-balanced" text summarization method that procure balance between all subjects and minimize omission of low-frequency subjects. For subject-balanced summary, we use two concept of summary evaluation metrics "completeness" and "succinctness". Completeness is the feature that summary should include contents of original documents fully and succinctness means summary has minimum duplication with contents in itself. Proposed method has 3-phases for summarization. First phase is constructing subject term dictionaries. Topic modeling is used for calculating topic-term weight which indicates degrees that each terms are related to each topic. From derived weight, it is possible to figure out highly related terms for every topic and subjects of documents can be found from various topic composed similar meaning terms. And then, few terms are selected which represent subject well. In this method, it is called "seed terms". However, those terms are too small to explain each subject enough, so sufficient similar terms with seed terms are needed for well-constructed subject dictionary. Word2Vec is used for word expansion, finds similar terms with seed terms. Word vectors are created after Word2Vec modeling, and from those vectors, similarity between all terms can be derived by using cosine-similarity. Higher cosine similarity between two terms calculated, higher relationship between two terms defined. So terms that have high similarity values with seed terms for each subjects are selected and filtering those expanded terms subject dictionary is finally constructed. Next phase is allocating subjects to every sentences which original documents have. To grasp contents of all sentences first, frequency analysis is conducted with specific terms that subject dictionaries compose. TF-IDF weight of each subjects are calculated after frequency analysis, and it is possible to figure out how much sentences are explaining about each subjects. However, TF-IDF weight has limitation that the weight can be increased infinitely, so by normalizing TF-IDF weights for every subject sentences have, all values are changed to 0 to 1 values. Then allocating subject for every sentences with maximum TF-IDF weight between all subjects, sentence group are constructed for each subjects finally. Last phase is summary generation parts. Sen2Vec is used to figure out similarity between subject-sentences, and similarity matrix can be formed. By repetitive sentences selecting, it is possible to generate summary that include contents of original documents fully and minimize duplication in summary itself. For evaluation of proposed method, 50,000 reviews of TripAdvisor are used for constructing subject dictionaries and 23,087 reviews are used for generating summary. Also comparison between proposed method summary and frequency-based summary is performed and as a result, it is verified that summary from proposed method can retain balance of all subject more which documents originally have.

Principles of Space Resources Exploitation under International Law (국제법상 우주자원개발원칙)

  • Kim, Han-Teak
    • The Korean Journal of Air & Space Law and Policy
    • /
    • v.33 no.2
    • /
    • pp.35-59
    • /
    • 2018
  • Professor Bin Cheng said that outer space was res extra commercium, while the moon and the other celestial bodies were res nullius before the 1967 Outer Space Treaty(OST). However, Article 2 of the OST made the moon and other celestial bodies have the legal status as res extra commmercium, not appropriated by any country or private enterprises or individual person, but the resources there can be freely available, as those on the high seas. The non-appropriation principle was introduced to corpus juris spatialis internationalis. Whether or not the non-appropriation principle is binding for the non-parties of the OST, many scholars see this principle as an international customary law, even developing into jus cogens. Article 11(2) of the Moon Agreement(MA) reconfirms the nonappropriation principle of Article 2 of the OST, but it has much less effect than the OST because the MA binds only the 18 parties involved. The MA applies only to the moon and celestial bodies other than the Earth in the Solar System, the OST's application scope extends to the Galaxy because the OST has no such substantive enactment. As referred to in the 2015 CSLCA of USA or Luxembourg's Law of Space Resources, allowing individuals and enterprises run by other countries to commercially explore and utilize the space resources, the question may arise whether this violates the non-appropriation principle under Article 2 of the OST and Article 11 of the MA. In the case of the CSLCA, the law explicitly specifies that sovereignty, possessory rights, and judiciary rights to a specific celestial body cannot be claimed, let alone ownership. This author believes that this law respects the legal status of outer space and the celestial bodies as res extra commmercium. As long as any countries or private enterprises or individuals respect the non-appropriation principle of outer space and the celestial bodies, they could use, exploit it. Another question might be raised in the difference between res extra commercium on the high seas and res extra commercium in outer space and the celestial bodies. Collecting resources on the high seas and exploiting space resources should be interpreted differently. On the high seas, resources can be collected without any obstacles like fishing, whereas, in the case of the deep sea-bed area, the Common Heritage of Mankind principles under the UNCLOS should be operated by the International Seabed Authority as an international regime. The nature or form of the sea resources found on the high seas are thus different from that of space resources, which are fixed on the moon and the celestial bodies without water. Thus, if individuals or private enterprises collect these resources from outer space and the celestial bodies, they might secure a certain section and continue collecting or mining works without any limitation. If an American enterprise receives an approval from the U.S. government, secures the best location and collects resources on the moon, can other countries' enterprises access to this area? How large the exploiting place can be allotted on the moon? How long should such a exploiting activity be lasted? Under the current international space law, these matters might be handled according to the principle of "first come, first served." As a consequence, the international community should provide a guideline or a proposal for the settlement of any foreseeable disputes during the space activity to solve plausible space legal questions in the near future.

Occurrence and Chemical Composition of Ti-bearing Minerals from Drilling Core (No.04-1) at Gubong Au-Ag Deposit Area, Republic of Korea (구봉 금-은 광상일대 시추코아(04-1)에서 산출되는 함 티타늄 광물들의 산상과 화학조성)

  • Bong Chul Yoo
    • Korean Journal of Mineralogy and Petrology
    • /
    • v.36 no.3
    • /
    • pp.185-197
    • /
    • 2023
  • The Gubong Au-Ag deposit consists of eight lens-shaped quartz veins. These veins have filled fractures along fault zones within Precambrian metasedimentary rock. This has been one of the largest deposits in Korea, and is geologically a mix of orogenic-type and intrusion-related types. Korea Mining Promotion Corporation drilled into a quartz vein (referred to as the No. 6 vein) with a width of 0.9 m and a grade of 27.9 g/t Au at a depth of -728 ML by drilling (No. 90-12) in the southern site of the deposit, To further investigate the potential redevelopment of the No. 6 vein, another drilling (No. 04-1) was carried out in 2004. In 2004, samples (wallrock, wallrock alteration and quartz vein) were collected from the No. 04-1 drilling core site to study the occurrence and chemical composition of Ti-bearing minerals (ilmenite, rutile). Rutile from mineralized zone at a depth of -275 ML occur minerals including K-feldspar, biotite, quartz, calcite, chlorite, pyrite in wallrock alteration zone. Ilmenite and rutile from ore vein (No. 6 vein) at a depth of -779 ML occur minerals including white mica, chlorite, apatite, zircon, quartz, calcite, pyrrhotite, pyrite in wallrock alteration zone and quartz vein. Based on mineral assemblage, rutile was formed by hydrothermal alteration (chloritization) of Ti-rich biotite in the wallrock. Chemical composition of ilmenite has maximum values of 0.09 wt.% (HfO2), 0.39 wt.% (V2O3) and 0.54 wt.% (BaO). Comparing the chemical composition of rutile at a depth -275 ML and -779 ML, Rutile at a depth of -779 ML is higher contents (WO3, FeO and BaO) than rutile at a depth of -275 ML. The substitutions of rutile at a depth of -275 ML and -779 ML are as followed : rutile at a depth of -275 ML Ba2+ + Al3+ + Hf4+ + (Nb5+, Ta5+) ↔ 3Ti4+ + Fe2+, 2V4+ + (W5+, Ta5+, Nb5+) ↔ 2Ti4+ + Al3+ + (Fe2+, Ba2+), Al3+ + V4++ (Nb5+, Ta5+) ↔ 2Ti4+ + 2Fe2+, rutile at a depth of -779 ML 2 (Fe2+, Ba2+) + Al3+ + (W5+, Nb5+, Ta5+) ↔ 2Ti4+ + (V4+, Hf4+), Fe2+ + Al3+ + Hf 4+ + (W5+, Nb5+, Ta5+) ↔ 2Ti4+ + V4+ + Ba2+, respectively. Based on these data and chemical composition of rutiles from orogenic-type deposits, rutiles from Gubong deposit was formed in a relatively oxidizing environment than the rutile from orogenictype deposits (Unsan deposit, Kori Kollo deposit, Big Bell deposit, Meguma gold-bearing quartz vein).

A Study on Intelligent Value Chain Network System based on Firms' Information (기업정보 기반 지능형 밸류체인 네트워크 시스템에 관한 연구)

  • Sung, Tae-Eung;Kim, Kang-Hoe;Moon, Young-Su;Lee, Ho-Shin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.67-88
    • /
    • 2018
  • Until recently, as we recognize the significance of sustainable growth and competitiveness of small-and-medium sized enterprises (SMEs), governmental support for tangible resources such as R&D, manpower, funds, etc. has been mainly provided. However, it is also true that the inefficiency of support systems such as underestimated or redundant support has been raised because there exist conflicting policies in terms of appropriateness, effectiveness and efficiency of business support. From the perspective of the government or a company, we believe that due to limited resources of SMEs technology development and capacity enhancement through collaboration with external sources is the basis for creating competitive advantage for companies, and also emphasize value creation activities for it. This is why value chain network analysis is necessary in order to analyze inter-company deal relationships from a series of value chains and visualize results through establishing knowledge ecosystems at the corporate level. There exist Technology Opportunity Discovery (TOD) system that provides information on relevant products or technology status of companies with patents through retrievals over patent, product, or company name, CRETOP and KISLINE which both allow to view company (financial) information and credit information, but there exists no online system that provides a list of similar (competitive) companies based on the analysis of value chain network or information on potential clients or demanders that can have business deals in future. Therefore, we focus on the "Value Chain Network System (VCNS)", a support partner for planning the corporate business strategy developed and managed by KISTI, and investigate the types of embedded network-based analysis modules, databases (D/Bs) to support them, and how to utilize the system efficiently. Further we explore the function of network visualization in intelligent value chain analysis system which becomes the core information to understand industrial structure ystem and to develop a company's new product development. In order for a company to have the competitive superiority over other companies, it is necessary to identify who are the competitors with patents or products currently being produced, and searching for similar companies or competitors by each type of industry is the key to securing competitiveness in the commercialization of the target company. In addition, transaction information, which becomes business activity between companies, plays an important role in providing information regarding potential customers when both parties enter similar fields together. Identifying a competitor at the enterprise or industry level by using a network map based on such inter-company sales information can be implemented as a core module of value chain analysis. The Value Chain Network System (VCNS) combines the concepts of value chain and industrial structure analysis with corporate information simply collected to date, so that it can grasp not only the market competition situation of individual companies but also the value chain relationship of a specific industry. Especially, it can be useful as an information analysis tool at the corporate level such as identification of industry structure, identification of competitor trends, analysis of competitors, locating suppliers (sellers) and demanders (buyers), industry trends by item, finding promising items, finding new entrants, finding core companies and items by value chain, and recognizing the patents with corresponding companies, etc. In addition, based on the objectivity and reliability of the analysis results from transaction deals information and financial data, it is expected that value chain network system will be utilized for various purposes such as information support for business evaluation, R&D decision support and mid-term or short-term demand forecasting, in particular to more than 15,000 member companies in Korea, employees in R&D service sectors government-funded research institutes and public organizations. In order to strengthen business competitiveness of companies, technology, patent and market information have been provided so far mainly by government agencies and private research-and-development service companies. This service has been presented in frames of patent analysis (mainly for rating, quantitative analysis) or market analysis (for market prediction and demand forecasting based on market reports). However, there was a limitation to solving the lack of information, which is one of the difficulties that firms in Korea often face in the stage of commercialization. In particular, it is much more difficult to obtain information about competitors and potential candidates. In this study, the real-time value chain analysis and visualization service module based on the proposed network map and the data in hands is compared with the expected market share, estimated sales volume, contact information (which implies potential suppliers for raw material / parts, and potential demanders for complete products / modules). In future research, we intend to carry out the in-depth research for further investigating the indices of competitive factors through participation of research subjects and newly developing competitive indices for competitors or substitute items, and to additively promoting with data mining techniques and algorithms for improving the performance of VCNS.

SKU recommender system for retail stores that carry identical brands using collaborative filtering and hybrid filtering (협업 필터링 및 하이브리드 필터링을 이용한 동종 브랜드 판매 매장간(間) 취급 SKU 추천 시스템)

  • Joe, Denis Yongmin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.77-110
    • /
    • 2017
  • Recently, the diversification and individualization of consumption patterns through the web and mobile devices based on the Internet have been rapid. As this happens, the efficient operation of the offline store, which is a traditional distribution channel, has become more important. In order to raise both the sales and profits of stores, stores need to supply and sell the most attractive products to consumers in a timely manner. However, there is a lack of research on which SKUs, out of many products, can increase sales probability and reduce inventory costs. In particular, if a company sells products through multiple in-store stores across multiple locations, it would be helpful to increase sales and profitability of stores if SKUs appealing to customers are recommended. In this study, the recommender system (recommender system such as collaborative filtering and hybrid filtering), which has been used for personalization recommendation, is suggested by SKU recommendation method of a store unit of a distribution company that handles a homogeneous brand through a plurality of sales stores by country and region. We calculated the similarity of each store by using the purchase data of each store's handling items, filtering the collaboration according to the sales history of each store by each SKU, and finally recommending the individual SKU to the store. In addition, the store is classified into four clusters through PCA (Principal Component Analysis) and cluster analysis (Clustering) using the store profile data. The recommendation system is implemented by the hybrid filtering method that applies the collaborative filtering in each cluster and measured the performance of both methods based on actual sales data. Most of the existing recommendation systems have been studied by recommending items such as movies and music to the users. In practice, industrial applications have also become popular. In the meantime, there has been little research on recommending SKUs for each store by applying these recommendation systems, which have been mainly dealt with in the field of personalization services, to the store units of distributors handling similar brands. If the recommendation method of the existing recommendation methodology was 'the individual field', this study expanded the scope of the store beyond the individual domain through a plurality of sales stores by country and region and dealt with the store unit of the distribution company handling the same brand SKU while suggesting a recommendation method. In addition, if the existing recommendation system is limited to online, it is recommended to apply the data mining technique to develop an algorithm suitable for expanding to the store area rather than expanding the utilization range offline and analyzing based on the existing individual. The significance of the results of this study is that the personalization recommendation algorithm is applied to a plurality of sales outlets handling the same brand. A meaningful result is derived and a concrete methodology that can be constructed and used as a system for actual companies is proposed. It is also meaningful that this is the first attempt to expand the research area of the academic field related to the existing recommendation system, which was focused on the personalization domain, to a sales store of a company handling the same brand. From 05 to 03 in 2014, the number of stores' sales volume of the top 100 SKUs are limited to 52 SKUs by collaborative filtering and the hybrid filtering method SKU recommended. We compared the performance of the two recommendation methods by totaling the sales results. The reason for comparing the two recommendation methods is that the recommendation method of this study is defined as the reference model in which offline collaborative filtering is applied to demonstrate higher performance than the existing recommendation method. The results of this model are compared with the Hybrid filtering method, which is a model that reflects the characteristics of the offline store view. The proposed method showed a higher performance than the existing recommendation method. The proposed method was proved by using actual sales data of large Korean apparel companies. In this study, we propose a method to extend the recommendation system of the individual level to the group level and to efficiently approach it. In addition to the theoretical framework, which is of great value.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.1-23
    • /
    • 2013
  • To discover significant social issues such as unemployment, economy crisis, social welfare etc. that are urgent issues to be solved in a modern society, in the existing approach, researchers usually collect opinions from professional experts and scholars through either online or offline surveys. However, such a method does not seem to be effective from time to time. As usual, due to the problem of expense, a large number of survey replies are seldom gathered. In some cases, it is also hard to find out professional persons dealing with specific social issues. Thus, the sample set is often small and may have some bias. Furthermore, regarding a social issue, several experts may make totally different conclusions because each expert has his subjective point of view and different background. In this case, it is considerably hard to figure out what current social issues are and which social issues are really important. To surmount the shortcomings of the current approach, in this paper, we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA shows a set of topic clusters, and then each topic cluster is labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and then a human annotator labels "Unemployment Problem" on Topic1. In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, taking a look at only social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop the matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document to paragraphs. In the meantime, using LDA, we can extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to a topic, indicating that the paragraph best matches the topic. Finally, each topic has several best matched paragraphs. Furthermore, assuming there are a topic (e.g., Unemployment Problem) and the best matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information of the social keyword such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. Through this prototype system, we have detected various social issues appearing in our society and also showed effectiveness of our proposed methods according to our experimental results. Note that you can also use our proof-of-concept system in http://dslab.snu.ac.kr/demo.html.