• Title/Summary/Keyword: matrix comparison

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.25 no.2, pp.141-166, 2019
  • Recently, channels such as social media and SNS have been generating enormous amounts of data. Within this data, the share of unstructured data represented as text has grown geometrically. Because it is difficult to read all of this text, it is important to access it rapidly and grasp its key points. To meet this need for efficient understanding, many studies on text summarization for handling and using huge amounts of text data have been proposed. In particular, many summarization methods that use machine learning and artificial intelligence algorithms, known as "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct summaries that focus on the frequency of content in the original documents. Such summaries are limited in their ability to cover low-weight subjects that are mentioned less often in the original text. If a summary includes only the content of the major subjects, bias occurs and information is lost, making it hard to identify every subject the documents contain. To avoid this bias, one can summarize with a view to balance among the topics in a document so that all of its subjects can be identified, but an imbalance in the distribution of those subjects still remains. To retain the balance of subjects in a summary, it is necessary to consider the proportion of every subject in the original documents and also to allocate the portions of the subjects equally, so that even sentences on minor subjects are sufficiently included in the summary. In this study, we propose a "subject-balanced" text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summarization, we use two summary evaluation metrics, "completeness" and "succinctness".
Completeness means that a summary should fully include the contents of the original documents, and succinctness means that a summary has minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, the terms highly related to each topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. Then a few terms that represent each subject well are selected; in this method they are called "seed terms". However, the seed terms are too few to explain each subject adequately, so a sufficient number of terms similar to the seed terms is needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion, finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from those vectors the similarity between all terms can be derived using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them is taken to be. Terms that have high similarity to the seed terms of each subject are therefore selected, and after filtering those expanded terms, the subject dictionary is finally constructed. The next phase allocates subjects to every sentence in the original documents. To grasp the contents of all sentences, frequency analysis is first conducted on the terms that make up the subject dictionaries. The TF-IDF weight of each subject is calculated after frequency analysis, which shows how much each sentence says about each subject. However, TF-IDF weights have no upper bound, so the TF-IDF weights of every subject in each sentence are normalized to values between 0 and 1.
Each sentence is then allocated to the subject with the maximum TF-IDF weight among all subjects, which finally yields a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences, from which a similarity matrix is formed. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents while minimizing duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries was also performed, and the results verified that summaries from the proposed method better retain the balance of all subjects originally present in the documents.
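
The three phases described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation; the abstract does not give that level of detail. The function names, the similarity threshold of 0.8, the use of min-max scaling for the 0-to-1 normalization, and the greedy "coverage minus redundancy" score used for sentence selection are all assumptions made for illustration, and the toy vectors and weights stand in for real Word2Vec, TF-IDF, and Sen2Vec outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_seed_terms(seeds, vectors, threshold=0.8):
    """Phase 1 (sketch): add every term whose cosine similarity to any
    seed term reaches the threshold. The threshold is an assumed value."""
    expanded = set(seeds)
    for term, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors):
            expanded.add(term)
    return expanded

def min_max(values):
    """Scale values to [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def allocate_subjects(weights):
    """Phase 2 (sketch): weights maps subject -> raw weight per sentence.
    Each sentence gets the subject with the largest normalized weight
    (ties go to the subject listed first)."""
    normed = {s: min_max(w) for s, w in weights.items()}
    n = len(next(iter(normed.values())))
    return [max(normed, key=lambda s: normed[s][i]) for i in range(n)]

def select_sentences(sim, k):
    """Phase 3 (sketch): greedy selection over a similarity matrix.
    Coverage (mean similarity to all sentences) stands in for completeness;
    the penalty (max similarity to already chosen sentences) for succinctness."""
    n = len(sim)
    coverage = [sum(row) / n for row in sim]
    chosen = []
    while len(chosen) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in chosen:
                continue
            redundancy = max((sim[i][j] for j in chosen), default=0.0)
            score = coverage[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

# Toy data (hypothetical): 2-d "word vectors", raw subject weights, similarities.
vectors = {"room": (1.0, 0.0), "bed": (0.9, 0.1),
           "staff": (0.0, 1.0), "service": (0.1, 0.9)}
weights = {"room": [3.0, 0.0, 1.0], "service": [0.0, 5.0, 4.0]}
sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]

print(sorted(expand_seed_terms({"room"}, vectors)))  # ['bed', 'room']
print(allocate_subjects(weights))                    # ['room', 'service', 'service']
print(select_sentences(sim, 2))                      # [1, 2]
```

In the toy run, sentence 0 is skipped in favor of sentence 2 even though it has higher coverage, because it nearly duplicates the already chosen sentence 1. That is the intended effect of penalizing duplication; the actual selection rule in the paper may differ.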

The Significance of Plasma Urokinase-type Plasminogen Activator and Type 1 Plasminogen Activator Inhibitor in Lung Cancer (폐암에서 혈장 Urokinase-Type Plasminogen Activator 및 Type 1 Plasminogen Activator Inhibitor의 의의)

  • Park, Kwang-Joo;Kim, Hyung-Jung;Ahn, Chul-Min;Lee, Doo-Yun;Chang, Joon;Kim, Sung-Kyu;Lee, Won-Young
    • Tuberculosis and Respiratory Diseases, v.44 no.3, pp.516-524, 1997
  • Background: Cancer invasion and metastasis require the dissolution of the extracellular matrix, in which several proteolytic enzymes are involved. One of these enzymes is the urokinase-type plasminogen activator (u-PA); plasminogen activator inhibitors (PAI-1, PAI-2) may also play a role in cancer invasion and metastasis by protecting the cancer itself from proteolysis by u-PA. It has been reported that the levels of u-PA and plasminogen activator inhibitors in various cancer tissues are significantly higher than those in normal tissues and correlate significantly with tumor size and lymph node involvement. Here, we measured the concentrations of plasma u-PA and PAI-1 antigens in patients with lung cancer and compared them across histologic types and staging parameters. Methods: We measured the concentrations of plasma u-PA and PAI-1 antigens using a commercial ELISA kit in 37 lung cancer patients, 21 benign lung disease patients, and 24 age-matched healthy controls, and compared them across histologic types and staging parameters in the lung cancer patients. Results: The concentration of u-PA was $1.0 \pm 0.3$ ng/mL in controls, $1.0 \pm 0.3$ ng/mL in benign lung disease patients, and $0.9 \pm 0.3$ ng/mL in lung cancer patients. The concentration of PAI-1 was $14.2 \pm 6.7$ ng/mL in controls, $14.9 \pm 6.3$ ng/mL in benign lung disease patients, and $22.1 \pm 9.8$ ng/mL in lung cancer patients. The concentration of PAI-1 in lung cancer patients was higher than that in benign lung disease patients and controls. The concentration of u-PA was $0.7 \pm 0.4$ ng/mL in squamous cell carcinoma, $0.8 \pm 0.3$ ng/mL in adenocarcinoma, 0.9 ng/mL in large cell carcinoma, and $1.1 \pm 0.7$ ng/mL in small cell carcinoma. The concentration of PAI-1 was $22.3 \pm 7.2$ ng/mL in squamous cell carcinoma, $22.6 \pm 9.9$ ng/mL in adenocarcinoma, 42 ng/mL in large cell carcinoma, and $16.0 \pm 14.2$ ng/mL in small cell carcinoma.
The concentration of u-PA was 0.74 ng/mL in stage I, $1.2 \pm 0.6$ ng/mL in stage II, $0.7 \pm 0.4$ ng/mL in stage IIIA, $0.7 \pm 0.4$ ng/mL in stage IIIB, and $0.7 \pm 0.3$ ng/mL in stage IV. The concentration of PAI-1 was 21.8 ng/mL in stage I, $22.7 \pm 8.7$ ng/mL in stage II, $18.4 \pm 4.9$ ng/mL in stage IIIA, $25.3 \pm 9.0$ ng/mL in stage IIIB, and $21.5 \pm 10.8$ ng/mL in stage IV. When T stage was divided into T1-3 and T4, the concentration of u-PA was $0.8 \pm 0.4$ ng/mL in T1-3 and $0.7 \pm 0.4$ ng/mL in T4, and the concentration of PAI-1 was $17.9 \pm 5.6$ ng/mL in T1-3 and $26.1 \pm 9.1$ ng/mL in T4. The concentration of PAI-1 in T4 was significantly higher than that in T1-3. The concentration of u-PA was $0.8 \pm 0.4$ ng/mL in M0 and $0.7 \pm 0.3$ ng/mL in M1, and the concentration of PAI-1 was $23.6 \pm 8.3$ ng/mL in M0 and $21.5 \pm 10.8$ ng/mL in M1. Conclusions: Plasma levels of PAI-1 were higher in lung cancer than in benign lung disease and controls, and plasma levels of PAI-1 were significantly higher in T4 than in T1-3. These findings suggest that PAI-1 is involved in the local invasion of lung cancer, but this should be confirmed by comparison with pathological staging and tissue levels in lung cancer.
