Casper Neuron Results

This page shows the neuron AIE distributions for different prompt types across different models.

Neuron AIE distribution for Benign Prompt

[Figures: per-layer neuron AIE distribution plots, 13b_neuron_adv_layer_0 through 13b_neuron_adv_layer_39 (layers 0-39)]

Neuron AIE distribution for Harmful Prompt

[Figures: per-layer neuron AIE distribution plots, 13b_neuron_adv_layer_0 through 13b_neuron_adv_layer_39 (layers 0-39)]

Neuron AIE distribution for Adversarial Prompt

[Figures: per-layer neuron AIE distribution plots, 13b_neuron_adv_layer_0 through 13b_neuron_adv_layer_39 (layers 0-39)]