Discriminative Feature Selection for Uncertain Graph Classification

Xiangnan Kong, University of Illinois at Chicago
Joint work with Philip S. Yu (Univ. Illinois at Chicago) and Xue Wang & Ann B. Ragin (Northwestern Univ.)

Brain: A Complex Machine
How does it work? When something is wrong... Alzheimer's Disease, ADHD

Neuroimaging
fMRI: a video of brain activities. But can you tell which brain is normal?

Brain as a Network Brain Activities Functional Connections

Brain Region Connectivity healthy brain Good Family

Brain Region Connectivity ADHD brain Connectivity problem

Brain Activities as an Uncertain Graph
We are not sure exactly what the network looks like. Each edge carries the probability that the connection exists in practice.

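Uncertain Graph
[Figure: an uncertain graph over nodes A, B, C with edge probabilities 0.4, 0.9 and 0.7, and its possible worlds, shown with probabilities 0.02, 0.01, 0.04, 0.02, 0.38, 0.11, 0.03 and 0.252.]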
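The possible-worlds view can be sketched in a few lines of Python. This is only an illustration: it assumes edges exist independently of each other, and the assignment of the three probabilities to specific edges is hypothetical, since the slide text does not say which edge carries which value.

```python
from itertools import product

# Hypothetical edge assignment: the slide lists the probabilities 0.4, 0.9, 0.7
# but not which edge carries which value.
edge_probs = {("A", "B"): 0.4, ("A", "C"): 0.9, ("B", "C"): 0.7}


def possible_worlds(edge_probs):
    """Yield (present_edges, probability) for every possible world.

    Assumes edges are independent, so a world's probability is the product
    of p for each present edge and (1 - p) for each absent edge.
    """
    edges = list(edge_probs)
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        for edge, present in zip(edges, mask):
            p = edge_probs[edge]
            prob *= p if present else 1.0 - p
        present_edges = frozenset(e for e, m in zip(edges, mask) if m)
        yield present_edges, prob


worlds = dict(possible_worlds(edge_probs))
```

With three uncertain edges there are 2^3 = 8 possible worlds whose probabilities sum to 1. The world containing all three edges has probability 0.4 × 0.9 × 0.7 = 0.252, which matches the value shown on the slide.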


Uncertain Graph Classification Problem
Uncertain graphs (edge probabilities; class label +/-, e.g. ADHD?) -> discriminative subgraph features -> feature vectors (x, label)

How to tell if a subgraph is discriminative?
[Figure: four labeled uncertain graphs G1 to G4 over nodes A, B, C with edge probabilities 0.1, 0.8 and 0.9, and three subgraph features g1, g2, g3. Annotations: frequent in uncertain graphs; discriminative in certain graphs; discriminative in uncertain graphs.]

Discriminative Scores of a Subgraph: Certain Graphs
Utility scores: G-test score, frequency ratio, confidence, HSIC.
On labeled certain graphs, the utility score of a subgraph feature is a certain (single) value.

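Discriminative Scores of a Subgraph: Uncertain Graphs
On labeled uncertain graphs, the utility score of a subgraph is a distribution: each possible score value occurs with some probability.
[Figure: labeled uncertain graphs with edge probabilities (0.4, 0.7, 0.9, 0.4, 0.5, 0.1, 0.2, 0.5) and the resulting score distribution (values shown: 0.01, 0.31, 0.23, 0.05, 0.01, 0.13, 0.31).]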

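How to get the distribution?
It depends on the utility function: confidence, frequency ratio, G-test score, HSIC.
Generalized utility function for certain graphs: $f(n_+^g, n_-^g, n_+, n_-)$, where $n_+^g$ ($n_-^g$) is the number of positive (negative) graphs containing subgraph $g$ and $n_+$ ($n_-$) is the number of positive (negative) graphs.

Table 2: Summary of Discrimination Score Functions.
Name: $f(n_+^g, n_-^g, n_+, n_-)$
confidence: $\dfrac{n_+^g}{n_+^g + n_-^g}$
frequency ratio: $\left| \log \dfrac{n_+^g \cdot n_-}{n_-^g \cdot n_+} \right|$
G-test: $2 n_+^g \cdot \ln \dfrac{n_+^g \cdot n_-}{n_-^g \cdot n_+} + 2 (n_+ - n_+^g) \cdot \ln \dfrac{n_- \cdot (n_+ - n_+^g)}{n_+ \cdot (n_- - n_-^g)}$
HSIC (linear): $\dfrac{(n_+^g n_- - n_-^g n_+)^2}{(n_+ + n_- - 1)^2 (n_+ + n_-)^2}$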
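The four score functions of Table 2 can be written down directly. The following is a minimal sketch, not the authors' implementation: the function names are made up for illustration, and the small `eps` smoothing inside the logarithms and the 0/0 convention for confidence are assumptions added here to avoid undefined values when a count is zero.

```python
import math


def _log_ratio(num, den, eps=1e-9):
    # eps smoothing is an assumption (not from the slides) to avoid log(0)
    return math.log((num + eps) / (den + eps))


def confidence(ng_pos, ng_neg, n_pos, n_neg):
    """n_+^g / (n_+^g + n_-^g); returns 0.0 when g occurs nowhere (assumed)."""
    total = ng_pos + ng_neg
    return ng_pos / total if total else 0.0


def frequency_ratio(ng_pos, ng_neg, n_pos, n_neg):
    """|log((n_+^g * n_-) / (n_-^g * n_+))|."""
    return abs(_log_ratio(ng_pos * n_neg, ng_neg * n_pos))


def g_test(ng_pos, ng_neg, n_pos, n_neg):
    """Two-term G-test score from Table 2."""
    term_present = 2 * ng_pos * _log_ratio(ng_pos * n_neg, ng_neg * n_pos)
    term_absent = 2 * (n_pos - ng_pos) * _log_ratio(
        n_neg * (n_pos - ng_pos), n_pos * (n_neg - ng_neg)
    )
    return term_present + term_absent


def hsic_linear(ng_pos, ng_neg, n_pos, n_neg):
    """(n_+^g n_- - n_-^g n_+)^2 / ((n - 1)^2 n^2) with n = n_+ + n_-."""
    n = n_pos + n_neg
    return (ng_pos * n_neg - ng_neg * n_pos) ** 2 / ((n - 1) ** 2 * n**2)
```

For example, a subgraph contained in 3 of 4 positive graphs and 1 of 4 negative graphs has confidence 3 / (3 + 1) = 0.75, while a subgraph that is equally frequent in both classes gets a frequency ratio and a G-test score of 0.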

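Dynamic Programming
[Figure: DP table indexed by i (0 to n_+) and k (0 to n_+), with entries Pr[n_+^g = i, D_+(k)]; the final entries give Pr[n_+^g = 0, D_+], Pr[n_+^g = 1, D_+], ..., Pr[n_+^g = n_+, D_+].]

$$
\Pr[i, D(k)] =
\begin{cases}
(1 - \Pr[g \subseteq G_k]) \cdot \Pr[i, D(k-1)] + \Pr[g \subseteq G_k] \cdot \Pr[i-1, D(k-1)] & \text{if } 0 \le i \le k,\ k \ge 1 \\
1 & \text{if } i = k = 0 \\
0 & \text{if } i > k \text{ or } i < 0
\end{cases}
$$

Here $\Pr[i, D(k)]$ is the probability that exactly $i$ of the first $k$ graphs contain $g$.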
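The recurrence can be sketched as follows. The sketch assumes that the containment probabilities Pr[g ⊆ G_k] are already available as a list (computing them is a separate problem) and that the graphs are independent of each other; the function name `count_distribution` is hypothetical.

```python
def count_distribution(containment_probs):
    """Return dist where dist[i] = Pr[exactly i graphs contain g].

    containment_probs[k-1] is assumed to hold Pr[g is contained in G_k].
    """
    dist = [1.0]  # base case: Pr[0, D(0)] = 1
    for p in containment_probs:
        new = [0.0] * (len(dist) + 1)
        for i, q in enumerate(dist):
            new[i] += (1.0 - p) * q  # graph k does not contain g: count stays i
            new[i + 1] += p * q      # graph k contains g: count becomes i + 1
        dist = new
    return dist


# Hypothetical containment probabilities for two graphs
example = count_distribution([0.5, 0.5])
```

Each graph adds one column to the table, so the full distribution over n graphs costs O(n^2) time instead of enumerating all 2^n combinations. Running the same routine on the positive and the negative graphs gives the distributions of n_+^g and n_-^g, from which the distribution of any score in Table 2 follows.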

How to Measure?
[Figure: probability distributions of the frequency ratio score for Subgraph A and Subgraph B, each marked with its mean, median, mode and ϕ-probability.]

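Subgraph Statistical Measures
Mean: $\mathrm{Exp}(F(g, D)) = \sum_{s=-\infty}^{+\infty} s \cdot \Pr[F(g, D) = s]$
Median: $\mathrm{Median}(F(g, D)) = \arg\max_{S} \left\{ \sum_{s=-\infty}^{S} \Pr[F(g, D) = s] \le \tfrac{1}{2} \right\}$
Mode: $\mathrm{Mode}(F(g, D)) = \arg\max_{S} \Pr[F(g, D) = S]$
ϕ-probability: $\phi\text{-pr}(F(g, D)) = \sum_{s=\phi}^{+\infty} \Pr[F(g, D) = s]$
More details in the paper.
[Figure: pattern search tree with levels of 0-edge, 1-edge and 2-edge patterns.]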
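Given a discrete score distribution, the four measures are short to compute. The sketch below assumes the distribution is a dictionary mapping score values to probabilities; the function names are hypothetical. For the median it restricts S to values in the support and falls back to the smallest value when no S satisfies the condition, which is an interpretation added here rather than something stated on the slide.

```python
def score_mean(dist):
    """Sum over s of s * Pr[F = s]."""
    return sum(s * p for s, p in dist.items())


def score_median(dist, eps=1e-12):
    """Largest support value S with Pr[F <= S] <= 1/2.

    Falls back to the smallest support value if none qualifies (assumption).
    eps guards the comparison against floating-point rounding.
    """
    best = min(dist)
    cdf = 0.0
    for s in sorted(dist):
        cdf += dist[s]
        if cdf <= 0.5 + eps:
            best = s
        else:
            break
    return best


def score_mode(dist):
    """Score value with the highest probability."""
    return max(dist, key=dist.get)


def phi_probability(dist, phi):
    """Pr[F >= phi]."""
    return sum(p for s, p in dist.items() if s >= phi)


# Hypothetical score distribution for one subgraph
dist = {0.0: 0.2, 0.5: 0.3, 1.0: 0.5}
```

On this hypothetical distribution the mean is 0.65, the median under the definition above is 0.5, the mode is 1.0, and the ϕ-probability for ϕ = 0.5 is 0.8. The four measures can therefore rank the same subgraph quite differently.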

Data Sets
Graphs: brain images (fMRI). Class label: brain diseases (e.g. Alzheimer's Disease, ADHD).

Table 3: Summary of experimental datasets.
Dataset  |D|  |D+|  |D-|  |V|  avg. |E|  avg. edge prob
ADHD     200  100   100   116  484.7     0.55
ADNI     36   18    18    90   2019.8    0.59
HIV      50   25    25    90   480.48    0.88

Compared Methods
Baselines: certain graph methods; frequent subgraphs in uncertain graphs.
Utility functions: Confidence, Frequency Ratio, HSIC, G-test Score.
Statistical measures: Mean, Mode, Median, ϕ-probability.

Uncertain Graph Helps
Error rate, with the rank among all methods in parentheses; * marks the best result in each column. Certain graph methods: HSIC, Ratio, Gtest, Conf, Freq. All other rows are uncertain graph methods.

Methods     t = 100      t = 200      t = 300      t = 400      t = 500
Exp-HSIC    0.400 (9)    0.367 (8)    0.367 (10)   0.317 (4)    0.333 (9)
Med-HSIC    0.433 (14)   0.350 (5)    0.333 (6)    0.350 (8)    0.317 (7)
Mod-HSIC    0.367 (6)    0.333 (3)    0.300 (1)*   0.317 (4)    0.300 (2)
ϕpr-HSIC    0.283 (1)*   0.283 (1)*   0.333 (6)    0.333 (7)    0.300 (2)
HSIC        0.450 (16)   0.467 (19)   0.467 (17)   0.500 (18)   0.500 (18)
Exp-Ratio   0.433 (14)   0.383 (10)   0.317 (4)    0.300 (2)    0.300 (2)
Med-Ratio   0.450 (16)   0.417 (15)   0.450 (16)   0.383 (11)   0.383 (11)
Mod-Ratio   0.317 (3)    0.350 (5)    0.433 (15)   0.417 (13)   0.467 (15)
ϕpr-Ratio   0.400 (9)    0.317 (2)    0.300 (1)*   0.300 (2)    0.267 (1)*
Ratio       0.500 (19)   0.483 (20)   0.533 (22)   0.567 (22)   0.533 (20)
Exp-Gtest   0.300 (2)    0.367 (8)    0.317 (4)    0.350 (8)    0.383 (11)
Med-Gtest   0.517 (21)   0.450 (18)   0.400 (11)   0.500 (18)   0.483 (17)
Mod-Gtest   0.517 (21)   0.550 (22)   0.500 (21)   0.500 (18)   0.517 (19)
ϕpr-Gtest   0.450 (16)   0.417 (15)   0.417 (13)   0.383 (11)   0.300 (2)
Gtest       0.500 (19)   0.500 (21)   0.467 (17)   0.433 (14)   0.550 (21)
Exp-Conf    0.367 (7)    0.333 (3)    0.300 (1)*   0.283 (1)*   0.300 (2)
Med-Conf    0.333 (4)    0.350 (5)    0.350 (8)    0.350 (8)    0.317 (7)
Mod-Conf    0.417 (12)   0.383 (10)   0.350 (8)    0.317 (4)    0.333 (9)
ϕpr-Conf    0.400 (9)    0.417 (15)   0.467 (17)   0.467 (16)   0.433 (13)
Conf        0.400 (9)    0.400 (13)   0.417 (13)   0.450 (15)   0.467 (15)
Exp-Freq    0.383 (8)    0.383 (10)   0.400 (11)   0.467 (16)   0.433 (13)
Freq        0.350 (5)    0.400 (13)   0.483 (20)   0.550 (21)   0.550 (21)

Discriminative Function Helps
Same error-rate table as on the previous slide, with the rows grouped into discriminative score functions (the HSIC, Ratio, Gtest and Conf variants) and frequent-subgraph methods (Exp-Freq, Freq).

Statistical Measure Helps
Same error-rate table as on the previous slides, with the rows marked by statistical measure: mean (Exp-), median (Med-), mode (Mod-) and ϕ-probability (ϕpr-).

Summary
Mining discriminative subgraph features for uncertain graph classification: model brains as uncertain graphs, then mine discriminative subgraph features using different statistical measures.

Q&A