Discriminative Feature Selection for Uncertain Graph Classification
Xiangnan Kong, University of Illinois at Chicago
Joint work with Philip S. Yu (Univ. of Illinois at Chicago), Xue Wang & Ann B. Ragin (Northwestern Univ.)
Brain: A Complex Machine
How does it work? What happens when something goes wrong? Alzheimer's disease, ADHD.
Neuroimaging
fMRI: a video of brain activity. But can you tell which brain is normal?
Brain as a Network
From brain activities to functional connections.
Brain Region Connectivity
Healthy brain: "Good Family".
Brain Region Connectivity
ADHD brain: a connectivity problem.
Brain Activities as an Uncertain Graph
We are not sure exactly what the network looks like, so each edge is labeled with the probability that the connection exists in practice.
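Uncertain Graph
Figure: an uncertain graph on vertices A, B and C with edge probabilities 0.4, 0.9 and 0.7, and its possible worlds (certain graphs), each labeled with its probability; assuming independent edges, the world containing all three edges has probability 0.4 × 0.9 × 0.7 = 0.252.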
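The possible-world view can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes edges exist independently (under that assumption the world probability 0.252 equals 0.4 × 0.9 × 0.7), and the function name `possible_worlds` and the edge labels `A-B`, `B-C`, `A-C` are hypothetical, since the slide does not say which vertex pair carries which probability.

```python
from itertools import product


def possible_worlds(edge_probs: dict[str, float]) -> list[tuple[frozenset[str], float]]:
    """Enumerate all 2^|E| possible worlds of an uncertain graph.

    Assumes edges exist independently, so a world's probability is the product
    of p(e) over kept edges and 1 - p(e) over dropped edges.
    """
    edges = list(edge_probs)
    worlds = []
    # Each mask says, edge by edge, whether that edge is present in the world.
    for mask in product((True, False), repeat=len(edges)):
        prob = 1.0
        for e, present in zip(edges, mask):
            prob *= edge_probs[e] if present else 1.0 - edge_probs[e]
        kept = frozenset(e for e, present in zip(edges, mask) if present)
        worlds.append((kept, prob))
    return worlds


# Hypothetical edge assignment; only the three probabilities come from the slide.
worlds = possible_worlds({"A-B": 0.4, "B-C": 0.9, "A-C": 0.7})
for kept, prob in sorted(worlds, key=lambda w: -w[1]):
    print(sorted(kept), round(prob, 3))
```

Running it prints all eight worlds; their probabilities sum to 1, and the world that keeps all three edges has probability 0.252. Enumeration is exponential in the number of edges, so it only illustrates the semantics and is not a way to process brain graphs with hundreds of edges.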
Uncertain Graph Classification Problem
Figure: uncertain graphs with edge probabilities (e.g., 0.4, 0.7, 0.9) and class labels (+/−, e.g., ADHD or not) are mapped onto discriminative subgraph features, turning each graph into a labeled feature vector x.
How to Tell if a Subgraph Is Discriminative
Figure: four labeled uncertain graphs G1 to G4 over vertices A, B and C, with edge probabilities of 0.1, 0.8 or 0.9, and three candidate subgraph features: g1 is frequent in the uncertain graphs, g2 is discriminative in certain graphs (when edge probabilities are ignored), and g3 is discriminative in the uncertain graphs.
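A minimal sketch of the quantity the figure turns on, the probability $\Pr[g \subseteq G]$ that a subgraph feature occurs in an uncertain graph. It assumes, hypothetically, that node labels are unique (as when every node is a distinct brain region) and that edges are independent; then g occurs in a possible world exactly when all of its edges are present, so the probability is a product of edge probabilities. General subgraph isomorphism with repeated labels is not covered. `containment_probability`, `e` and the example graph are illustrative only.

```python
from math import prod


def e(u: str, v: str) -> frozenset[str]:
    """Hypothetical helper: an undirected edge between node labels u and v."""
    return frozenset((u, v))


def containment_probability(subgraph: set[frozenset[str]],
                            graph: dict[frozenset[str], float]) -> float:
    """Pr[g ⊆ G] for an uncertain graph G, assuming unique node labels.

    g occurs in a possible world exactly when every edge of g is present;
    with independent edges this is a product of edge probabilities.
    Edges that do not appear in G have probability 0.
    """
    return prod(graph.get(edge, 0.0) for edge in subgraph)


# Hypothetical uncertain graph (not one of G1 to G4 on the slide).
G = {e("A", "B"): 0.9, e("B", "C"): 0.8, e("A", "C"): 0.1}
print(containment_probability({e("A", "B"), e("B", "C")}, G))
print(containment_probability({e("A", "B"), e("A", "C")}, G))
```

Here the subgraph {A-B, B-C} occurs with probability 0.9 × 0.8 = 0.72, while {A-B, A-C} occurs with probability 0.9 × 0.1 = 0.09; a certain-graph view that keeps every listed edge would count both as present.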
Discriminative Scores of a Subgraph: Certain Graphs
In certain graphs, a utility score of a subgraph feature (confidence, frequency ratio, G-test score, HSIC) computed over the labeled graphs is a single, certain value.
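Discriminative Scores of a Subgraph: Uncertain Graphs
In uncertain graphs, whether each graph contains the subgraph is itself uncertain, so the utility score of the subgraph is not a single value but a probability distribution over score values.
Figure: labeled uncertain graphs (edge probabilities such as 0.4, 0.7, 0.9) and a bar chart of the probability of each utility score value.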
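Once the distributions of $n_g^+$ and $n_g^-$ (the numbers of positive and negative graphs that contain g) are known, the score distribution follows by summing over pairs of counts. A minimal sketch, assuming the two counts are independent because they come from disjoint sets of graphs, and assuming confidence is defined as 0 when g occurs in no graph; `score_distribution` is a hypothetical name.

```python
from collections import defaultdict
from typing import Callable

# A score function f(n_g^+, n_g^-, n^+, n^-).
Score = Callable[[int, int, int, int], float]


def score_distribution(pmf_pos: list[float], pmf_neg: list[float],
                       n_pos: int, n_neg: int, f: Score) -> dict[float, float]:
    """Return {score value: probability} for F = f(n_g^+, n_g^-, n^+, n^-).

    pmf_pos[i] = Pr[n_g^+ = i] and pmf_neg[j] = Pr[n_g^- = j]. Assumes the two
    counts are independent because they come from disjoint sets of graphs.
    """
    dist = defaultdict(float)
    for i, p_i in enumerate(pmf_pos):
        for j, p_j in enumerate(pmf_neg):
            # Count pairs that give the same score pool their probability mass.
            dist[f(i, j, n_pos, n_neg)] += p_i * p_j
    return dict(dist)


def confidence(ng_pos: int, ng_neg: int, n_pos: int, n_neg: int) -> float:
    # Assumption: score 0 when g occurs in no graph (the formula is 0/0 there).
    total = ng_pos + ng_neg
    return ng_pos / total if total else 0.0


# Hypothetical: one positive graph contains g with prob. 0.8, one negative with 0.1.
dist = score_distribution([0.2, 0.8], [0.9, 0.1], 1, 1, confidence)
print(dist)
```

In this example confidence is 1.0 with probability 0.72, 0.5 with probability 0.08 and 0.0 with probability 0.2. Float keys suffice for confidence; for log-based scores, rounding the keys (or using exact arithmetic) may be needed so that equal scores are not split across near-equal floats.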
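How to get the distribution? It depends on the utility function: confidence, frequency ratio, G-test score or HSIC.
Generalized utility function for certain graphs: each score is a function $f(n_g^+, n_g^-, n^+, n^-)$, where $n^+$ and $n^-$ are the numbers of positive and negative graphs, and $n_g^+$ and $n_g^-$ are the numbers of positive and negative graphs that contain subgraph g.
Table 2: Summary of Discrimination Score Functions.
Confidence: $f = \frac{n_g^+}{n_g^+ + n_g^-}$
Frequency ratio: $f = \log \frac{n_g^+\, n^-}{n_g^-\, n^+}$
G-test: $f = 2 n_g^+ \ln \frac{n_g^+\, n^-}{n_g^-\, n^+} + 2 (n^+ - n_g^+) \ln \frac{(n^+ - n_g^+)\, n^-}{n^+ (n^- - n_g^-)}$
HSIC (linear): $f = \frac{(n_g^+ n^- - n_g^- n^+)^2}{(n^+ + n^- - 1)^2 (n^+ + n^-)^2}$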
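The four rows of Table 2 translate directly into code. A minimal sketch: the smoothing constant `EPS` is a hypothetical addition (the table does not say how zero counts are handled), as is returning 0 for confidence when g occurs in no graph.

```python
import math

# Hypothetical smoothing constant: keeps log() finite when a count is zero.
EPS = 1e-9


def confidence(ng_pos: int, ng_neg: int, n_pos: int, n_neg: int) -> float:
    """n_g^+ / (n_g^+ + n_g^-); assumed to be 0 when g occurs in no graph."""
    total = ng_pos + ng_neg
    return ng_pos / total if total else 0.0


def frequency_ratio(ng_pos: int, ng_neg: int, n_pos: int, n_neg: int) -> float:
    """log(n_g^+ n^- / (n_g^- n^+)): positive when g is relatively more frequent in D+."""
    return math.log((ng_pos * n_neg + EPS) / (ng_neg * n_pos + EPS))


def g_test(ng_pos: int, ng_neg: int, n_pos: int, n_neg: int) -> float:
    """G-test row: compares occurrence and non-occurrence of g in D+ against D-."""
    miss_pos = n_pos - ng_pos  # positive graphs that do not contain g
    occur = 2 * ng_pos * math.log((ng_pos * n_neg + EPS) / (ng_neg * n_pos + EPS))
    absent = 2 * miss_pos * math.log(
        (miss_pos * n_neg + EPS) / (n_pos * (n_neg - ng_neg) + EPS))
    return occur + absent


def hsic_linear(ng_pos: int, ng_neg: int, n_pos: int, n_neg: int) -> float:
    """(n_g^+ n^- - n_g^- n^+)^2 / ((n - 1)^2 n^2) with n = n^+ + n^-."""
    n = n_pos + n_neg
    return (ng_pos * n_neg - ng_neg * n_pos) ** 2 / ((n - 1) ** 2 * n ** 2)


print(confidence(3, 1, 4, 4), frequency_ratio(3, 1, 4, 4),
      g_test(3, 1, 4, 4), hsic_linear(3, 1, 4, 4))
```

For instance, with $n^+ = n^- = 4$ and g in 3 positive and 1 negative graph, `g_test(3, 1, 4, 4)` is about $4 \ln 3 \approx 4.39$ and `frequency_ratio(3, 1, 4, 4)` is about $\ln 3$. The G-test row equals $2n^+$ times the Kullback-Leibler divergence between the occurrence rates $n_g^+/n^+$ and $n_g^-/n^-$, so it is never negative.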
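Dynamic Programming
Let $\mathcal{D}^+(k)$ denote the first k positive graphs, $G_k$ the k-th of them, and $\Pr[g \subseteq G_k]$ the probability that subgraph g is contained in $G_k$. The distribution of $n_g^+$ is built up one graph at a time (and likewise for $n_g^-$ over the negative graphs):
$$\Pr[n_g^+ = i, \mathcal{D}^+(k)] = \begin{cases} (1 - \Pr[g \subseteq G_k])\,\Pr[n_g^+ = i, \mathcal{D}^+(k-1)] + \Pr[g \subseteq G_k]\,\Pr[n_g^+ = i-1, \mathcal{D}^+(k-1)] & \text{if } 0 \le i \le k,\ k \ge 1 \\ 1 & \text{if } i = k = 0 \\ 0 & \text{if } i > k \text{ or } i < 0 \end{cases}$$
Figure: DP table of $\Pr[n_g^+ = i, \mathcal{D}^+(k)]$ for $0 \le i \le n^+$ and $0 \le k \le n^+$; the final column gives $\Pr[n_g^+ = 0, \mathcal{D}^+], \ldots, \Pr[n_g^+ = n^+, \mathcal{D}^+]$.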
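The recursion processes one graph at a time and only needs the previous column of the table. A minimal sketch, assuming containment events in different graphs are independent and that `contain_probs[k]` holds $\Pr[g \subseteq G_{k+1}]$ (a 0-based list); the function name is hypothetical. The resulting distribution is a Poisson binomial distribution.

```python
def support_distribution(contain_probs: list[float]) -> list[float]:
    """Return [Pr[n_g = 0], ..., Pr[n_g = K]] for K graphs.

    Follows Pr[i, D(k)] = (1 - p_k) Pr[i, D(k-1)] + p_k Pr[i-1, D(k-1)],
    assuming containment in different graphs is independent.
    """
    dist = [1.0]  # base case: with k = 0 graphs, Pr[n_g = 0] = 1
    for p in contain_probs:
        nxt = [0.0] * (len(dist) + 1)  # new entry for i = k starts at 0
        for i, prev in enumerate(dist):
            nxt[i] += (1.0 - p) * prev  # g not in G_k: count stays i
            nxt[i + 1] += p * prev      # g in G_k: count grows to i + 1
        dist = nxt
    return dist


print(support_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
```

Each graph adds one entry, so K graphs cost $O(K^2)$ time and $O(K)$ memory; run it once over the positive graphs for $n_g^+$ and once over the negative graphs for $n_g^-$.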
How to Measure?
Figure: score distributions of subgraph A and subgraph B (x-axis: frequency ratio; y-axis: probability), each marked with its mean, median, mode and phi-probability.
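Subgraph Statistical Measures
Mean: $\mathrm{Exp}(F(g,\mathcal{D})) = \sum_{s \in S} s \cdot \Pr[F(g,\mathcal{D}) = s]$
Mode: $\mathrm{Mode}(F(g,\mathcal{D})) = \arg\max_{s \in S} \Pr[F(g,\mathcal{D}) = s]$
Median: $\mathrm{Median}(F(g,\mathcal{D})) = \max\{ s \in S : \sum_{s' \in S,\, s' \ge s} \Pr[F(g,\mathcal{D}) = s'] \ge \tfrac{1}{2} \}$
Phi-probability: $\phi\text{-pr}(F(g,\mathcal{D})) = \sum_{s \in S,\, s \ge \phi} \Pr[F(g,\mathcal{D}) = s]$
where S is the set of values the score F(g, D) can take. More details in the paper.
Figure: pattern search tree (0-edge, 1-edge and 2-edge patterns).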
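The four measures reduce a score distribution, given as a map from score value s to $\Pr[F(g,\mathcal{D}) = s]$, to one number per subgraph. A minimal sketch: the function names and the rounding tolerance `tol` are hypothetical, the median is taken as the largest s with $\Pr[F \ge s] \ge 1/2$, and ties in the mode are broken by dictionary order.

```python
def mean(dist: dict[float, float]) -> float:
    """Exp(F): sum of s * Pr[F = s]."""
    return sum(s * p for s, p in dist.items())


def mode(dist: dict[float, float]) -> float:
    """Mode(F): the score value with the highest probability (first one on ties)."""
    return max(dist, key=dist.get)


def median(dist: dict[float, float], tol: float = 1e-12) -> float:
    """Median(F): the largest s with Pr[F >= s] >= 1/2."""
    tail = 0.0
    for s in sorted(dist, reverse=True):
        tail += dist[s]  # tail now equals Pr[F >= s]
        if tail >= 0.5 - tol:  # tol absorbs floating-point rounding
            return s
    raise ValueError("probabilities sum to less than 1/2")


def phi_probability(dist: dict[float, float], phi: float) -> float:
    """phi-pr(F): Pr[F >= phi], the mass at or above the threshold phi."""
    return sum(p for s, p in dist.items() if s >= phi)


dist = {0.0: 0.2, 0.5: 0.08, 1.0: 0.72}
print(mean(dist), mode(dist), median(dist), phi_probability(dist, 0.5))
```

For this distribution the mean is 0.76, the mode and the median are both 1.0, and `phi_probability(dist, 0.5)` is 0.8. Because the four measures summarize the same distribution differently, they can rank two subgraphs differently.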
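Data Sets
Graphs: brain images (fMRI). Class label: brain disease (e.g., Alzheimer's disease, ADHD).
Table 3: Summary of experimental datasets.
Dataset   |D|   |D+|   |D−|   |V|   avg. |E|   avg. edge prob.
ADHD      200   100    100    116   484.7      0.55
ADNI       36    18     18     90   2019.8     0.59
HIV        50    25     25     90   480.48     0.88
|D|: number of graphs; |D+| and |D−|: numbers of positive and negative graphs; |V|: number of nodes (brain regions) per graph; avg. |E|: average number of edges; avg. edge prob.: average edge probability.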
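Compared Methods
Certain graph methods; frequent subgraphs in uncertain graphs.
Utility functions: confidence, frequency ratio, HSIC, G-test score.
Statistical measures: mean, mode, median, phi-probability.
Each uncertain graph method pairs one statistical measure with one utility function (e.g., ϕpr-HSIC).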
Uncertain Graph Helps
Error rate of each method for t = 100 to 500 (lower is better). The number in parentheses is the method's rank among all 22 methods, and * marks the best result in each column. Rows with a statistical-measure prefix (Exp-, Med-, Mod-, ϕpr-) are uncertain graph methods; rows without a prefix are certain graph methods.
Method      t=100       t=200       t=300       t=400       t=500
Exp-HSIC    0.400 (9)   0.367 (8)   0.367 (10)  0.317 (4)   0.333 (9)
Med-HSIC    0.433 (14)  0.350 (5)   0.333 (6)   0.350 (8)   0.317 (7)
Mod-HSIC    0.367 (6)   0.333 (3)   0.300 (1)*  0.317 (4)   0.300 (2)
ϕpr-HSIC    0.283 (1)*  0.283 (1)*  0.333 (6)   0.333 (7)   0.300 (2)
HSIC        0.450 (16)  0.467 (19)  0.467 (17)  0.500 (18)  0.500 (18)
Exp-Ratio   0.433 (14)  0.383 (10)  0.317 (4)   0.300 (2)   0.300 (2)
Med-Ratio   0.450 (16)  0.417 (15)  0.450 (16)  0.383 (11)  0.383 (11)
Mod-Ratio   0.317 (3)   0.350 (5)   0.433 (15)  0.417 (13)  0.467 (15)
ϕpr-Ratio   0.400 (9)   0.317 (2)   0.300 (1)*  0.300 (2)   0.267 (1)*
Ratio       0.500 (19)  0.483 (20)  0.533 (22)  0.567 (22)  0.533 (20)
Exp-Gtest   0.300 (2)   0.367 (8)   0.317 (4)   0.350 (8)   0.383 (11)
Med-Gtest   0.517 (21)  0.450 (18)  0.400 (11)  0.500 (18)  0.483 (17)
Mod-Gtest   0.517 (21)  0.550 (22)  0.500 (21)  0.500 (18)  0.517 (19)
ϕpr-Gtest   0.450 (16)  0.417 (15)  0.417 (13)  0.383 (11)  0.300 (2)
Gtest       0.500 (19)  0.500 (21)  0.467 (17)  0.433 (14)  0.550 (21)
Exp-Conf    0.367 (7)   0.333 (3)   0.300 (1)*  0.283 (1)*  0.300 (2)
Med-Conf    0.333 (4)   0.350 (5)   0.350 (8)   0.350 (8)   0.317 (7)
Mod-Conf    0.417 (12)  0.383 (10)  0.350 (8)   0.317 (4)   0.333 (9)
ϕpr-Conf    0.400 (9)   0.417 (15)  0.467 (17)  0.467 (16)  0.433 (13)
Conf        0.400 (9)   0.400 (13)  0.417 (13)  0.450 (15)  0.467 (15)
Exp-Freq    0.383 (8)   0.383 (10)  0.400 (11)  0.467 (16)  0.433 (13)
Freq        0.350 (5)   0.400 (13)  0.483 (20)  0.550 (21)  0.550 (21)
Discriminative Function Helps
Same error-rate table as in "Uncertain Graph Helps", annotated to contrast discriminative subgraph features with frequent subgraph features.
Statistical Measure Helps
Same error-rate table as in "Uncertain Graph Helps", compared across the four statistical measures: mean (Exp-), median (Med-), mode (Mod-) and phi-probability (ϕpr-).
Summary
Mining discriminative subgraph features for uncertain graph classification:
model brains as uncertain graphs;
mine discriminative subgraph features using different statistical measures.
Q&A