WEBVTT

1
00:00:00.620 --> 00:00:01.570
Sarti, Gabriele: Yep.

2
00:00:03.270 --> 00:00:03.940
David Bau: Okay.

3
00:00:05.020 --> 00:00:05.640
Sarti, Gabriele: Alright.

4
00:00:05.640 --> 00:00:11.799
David Bau: And then, yeah, do you have the ability to use share screen from where you are? Yep.

5
00:00:12.220 --> 00:00:13.130
David Bau: Great.

6
00:00:14.960 --> 00:00:16.569
Sarti, Gabriele: Can you see the screen?

7
00:00:17.080 --> 00:00:18.590
David Bau: I can see, yeah.

8
00:00:18.640 --> 00:00:19.829
Sarti, Gabriele: So this week.

9
00:00:20.070 --> 00:00:25.239
David Bau: Yeah, we're… so this week is a guest lecture from Gabriel.

10
00:00:25.530 --> 00:00:30.900
David Bau: Because the topic is on input attribution.

11
00:00:31.010 --> 00:00:38.019
David Bau: Where Gabriel's one of the world's experts on how to do this with large models, with LLMs, and so on.

12
00:00:38.440 --> 00:00:50.320
David Bau: And so, since he's here this semester, I asked him to give the lecture on the topic here today. And it's a little too bad that it's all

13
00:00:50.490 --> 00:00:54.909
David Bau: Remote, because as you know, we… we… we try to…

14
00:00:55.220 --> 00:01:13.310
David Bau: you know, make sure that everybody gets a chance to ask their questions and interact. So, I want you to not do the normal Zoom thing and… and zone out, during the lecture. Maybe, Gabriel can make sure that there's, like, you know, times in the… in the talk to,

15
00:01:13.510 --> 00:01:19.330
David Bau: to get feedback or questions from people answered.

16
00:01:19.660 --> 00:01:25.190
David Bau: Okay, yeah, Jasmine, superstar, right? Yes, Jasmine, this is Gabriel from the reading.

17
00:01:25.190 --> 00:01:26.760
Sarti, Gabriele: You're too kind, you're too kind.

18
00:01:27.740 --> 00:01:28.570
Sarti, Gabriele: Excellent.

19
00:01:28.700 --> 00:01:42.940
Sarti, Gabriele: So thanks again for having me, David, it's really great. So yeah, so today's presentation that I prepared is called Attribution, Tracing Influence to Inputs and Model Components, and I will try to walk you through,

20
00:01:43.300 --> 00:01:54.680
Sarti, Gabriele: Some of the most recent, some background first, and then some of the most recent, let's say, interesting stuff that is going on in this… in this area of interpretability.

21
00:01:54.880 --> 00:02:04.749
Sarti, Gabriele: Yeah. So, the first… the first thing is that, the way that we can conceptualize, attribution is…

22
00:02:04.850 --> 00:02:24.149
Sarti, Gabriele: by looking at what the models are using to… to do predictions, right? So, in… here I argue that there are basically two pathways to prediction. The first one is, the inputs, so the models receive, as you know, in-context information, and,

23
00:02:24.600 --> 00:02:33.920
Sarti, Gabriele: These methods that we use to trace importance back to the inputs are what we normally refer to as input or feature attribution.

24
00:02:34.080 --> 00:02:48.709
Sarti, Gabriele: So this is what we call also this in-context learning, right, in language models. And the other dimension that is complementary is whatever the models learned from training, right? So in this case, we have the learned weights.

25
00:02:48.710 --> 00:03:06.920
Sarti, Gabriele: And we can still do some attribution. It's a bit, related to causal mediation that you saw from the previous lecture, so you're gonna see, here, too, we have a way to do, basically component attribution, so understanding which components are responsible for a specific prediction.

26
00:03:07.940 --> 00:03:13.250
Sarti, Gabriele: And the plus one is, there is a second-order effect, of course, so…

27
00:03:13.250 --> 00:03:35.729
Sarti, Gabriele: whatever learned weights the model has are derived from training data, right? So, a big mission in attribution would also be to ideally trace importance from the prediction back to the training data, right? So there are some methods that are responsible for that.

28
00:03:35.730 --> 00:03:48.789
Sarti, Gabriele: I'm not really covering that part today, but just for your knowledge, these methods are kind of very similar to the others that we're going to discuss, and these are called training data attribution, or simply data attribution.

29
00:03:49.730 --> 00:03:57.730
Sarti, Gabriele: So overall, attributional interpretability is asking which elements motivate model predictions, right? That's the… that's a big question.

30
00:03:59.040 --> 00:04:00.010
Sarti, Gabriele: So…

31
00:04:00.050 --> 00:04:13.279
Sarti, Gabriele: This is kind of a formalized way to think about input attribution, so if you have a trained model and an input that the model receives, the input attribution method is just a map that, given an input.

32
00:04:13.280 --> 00:04:23.849
Sarti, Gabriele: in this case of dimension D, it will produce a set of scores, alpha 1 to alpha D, that tell you how relevant the

33
00:04:24.370 --> 00:04:28.140
Sarti, Gabriele: i-th dimension of this input is for the prediction.
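Written out as a formula, the definition being described is roughly the following (notation assumed from the dimension D and the alpha scores mentioned here, not taken verbatim from the slide):

```latex
a : (f, \mathbf{x}) \;\mapsto\; (\alpha_1, \dots, \alpha_D) \in \mathbb{R}^D,
\qquad \mathbf{x} \in \mathbb{R}^D,
```

where alpha_i is the relevance of the i-th input dimension for the prediction f(x).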

34
00:04:29.020 --> 00:04:36.250
Sarti, Gabriele: So… What counts as relevance here is quite vague, and it's left vague on purpose, in a sense, because

35
00:04:36.270 --> 00:04:51.929
Sarti, Gabriele: it's very hard to formalize what it means for something to be salient or important towards a prediction, right? So we're gonna maybe discuss this a bit more in the next few slides, but just for you to know, like, saliency is a bit of a fuzzy concept overall.

36
00:04:53.790 --> 00:05:03.969
Sarti, Gabriele: So to quantify importance, if we had simple linear models, right, like linear regression models, this would be a very

37
00:05:03.970 --> 00:05:17.350
Sarti, Gabriele: trivial thing to do. Like, we could just look at the coefficients that are learned, so whatever weights here get matched to the matrix of inputs would be our importance scores, right?
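As a worked restatement of the linear case (my notation, assuming a standard linear regression):

```latex
f(\mathbf{x}) = b + \sum_{i=1}^{D} w_i x_i
\quad\Longrightarrow\quad
\alpha_i = w_i \;\;\text{(or } \alpha_i = w_i x_i \text{ for the realized contribution of feature } i\text{)}.
```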

38
00:05:17.440 --> 00:05:27.149
Sarti, Gabriele: However, for deeper models, this is not straightforward, and the reason for that is that… is the presence of nonlinearities, right? So nonlinearities mess up

39
00:05:27.150 --> 00:05:37.869
Sarti, Gabriele: this kind of contribution of different inputs, and especially when they are chained, like, in a deep neural network, the whole influence becomes very messy.

40
00:05:38.190 --> 00:05:50.580
Sarti, Gabriele: So, generally, the way that we go about estimating this importance is through some approximations. For example, approximating some specific operations linearly.

41
00:05:50.590 --> 00:06:01.820
Sarti, Gabriele: Or perturbations, so trying to estimate how important something is by just ablating it or modifying it slightly.

42
00:06:03.090 --> 00:06:22.479
Sarti, Gabriele: So, let's have a look at the most basic setup for attribution, which is occlusion. So, you know, in the case of occlusion, let's say that we have our language model here, let's say this is, like, our GPT or our Llama, and we have

43
00:06:22.480 --> 00:06:28.539
Sarti, Gabriele: an input that is a simple string, right? For example, "Welcome back, ladies and".

44
00:06:28.730 --> 00:06:34.410
Sarti, Gabriele: That, as you know, gets tokenized and embedded before being fed through the model.

45
00:06:34.970 --> 00:06:42.739
Sarti, Gabriele: So, you see this is of dimension D, which is the dimension of the embedding, times S,

46
00:06:42.880 --> 00:06:45.030
Sarti, Gabriele: The dimension of the sequence.

47
00:06:45.290 --> 00:06:55.560
Sarti, Gabriele: And the output of the model is this distribution of probabilities over the vocabulary. So in this case, we find that the model is predicting gentlemen as the most likely next token.

48
00:06:56.090 --> 00:07:05.210
Sarti, Gabriele: So the occlusion case is very simply, let's ablate a single token, for example, ladies.

49
00:07:05.230 --> 00:07:17.890
Sarti, Gabriele: So here we're gonna have a different embedding for the ablated token, and let's get an output for the perturbed input, right? So in this case, the probabilities will change.

50
00:07:18.050 --> 00:07:28.330
Sarti, Gabriele: And an idea would be, okay, the way that we associate an importance to the token ladies is by looking at the top prediction.

51
00:07:28.450 --> 00:07:37.679
Sarti, Gabriele: and looking at how big of a drop this is, right? In this case, the drop is huge, so it means that ladies is probably very important towards that.
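A minimal sketch of this occlusion setup in code, assuming a Hugging Face causal LM; the model name, the replacement token, and the example string are illustrative placeholders, not the exact setup on the slide:

```python
# Hedged sketch: occlusion-based attribution for a causal LM.
# "gpt2", the prompt, and the replacement token are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Welcome back, ladies and", return_tensors="pt").input_ids

with torch.no_grad():
    probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
target_id = probs.argmax().item()                  # e.g. " gentlemen"
base_p = probs[target_id].item()

# Occlude each input token in turn (here: replace it with the EOS token,
# one of many debatable choices) and measure the drop in the target probability.
scores = []
for i in range(ids.shape[1]):
    occluded = ids.clone()
    occluded[0, i] = tok.eos_token_id
    with torch.no_grad():
        p = torch.softmax(model(occluded).logits[0, -1], dim=-1)[target_id].item()
    scores.append(base_p - p)                      # large drop => important token

for token, score in zip(tok.convert_ids_to_tokens(ids[0]), scores):
    print(f"{token:>12s}  {score:+.4f}")
```

This also makes the scaling issue discussed below concrete: attributing one prediction already costs one extra forward pass per occluded position, and more if you average over multiple replacement tokens.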

52
00:07:38.860 --> 00:07:51.679
Sarti, Gabriele: Probably you can see already that this is overall a bit problematic as a procedure. I'm curious if someone already has a hint of, like, what could be the problems in doing that, or…

53
00:07:55.650 --> 00:08:03.890
David Bau: So, can I ask? Because I'm not the expert here, is that right? Yep. So when you occlude it, what do you… what do you put there?

54
00:08:04.290 --> 00:08:20.009
Sarti, Gabriele: Yeah, that's one of the problems, exactly. So, there is no good answer to that, actually. So, this I mentioned in the next slide, but one of the issues with perturbation, and in general with this kind of, like, occlusion.

55
00:08:20.130 --> 00:08:30.299
Sarti, Gabriele: Which is related to one of the questions that we received, also for integrated gradient, what would be a good baseline for language, right? The problem is that,

56
00:08:30.600 --> 00:08:48.730
Sarti, Gabriele: perturbations, let me go to… yeah, can produce OOD behaviors, and this would result in unfaithful explanations in the case where you're replacing that with something that is not realistic, right? So, for example, if we have,

57
00:08:49.010 --> 00:08:55.399
Sarti, Gabriele: let's say, for encoder language models like BERT, we have these kind of mask tokens, right?

58
00:08:55.480 --> 00:09:11.690
Sarti, Gabriele: Which are kinda interesting in this setting, because we could just replace things by masks, and the model is trained with masks, right? So it's able to handle that. But for GPTs, we don't have such things, right? The model is just predicting left to right, so it doesn't need masking.

59
00:09:11.850 --> 00:09:15.910
Sarti, Gabriele: Yeah, so decoder LLMs don't, right?

60
00:09:16.040 --> 00:09:24.890
Sarti, Gabriele: And one common approach is to sample replacements randomly and aggregate over multiple replacements, right?

61
00:09:25.060 --> 00:09:40.689
Sarti, Gabriele: But you can imagine that this scales very poorly, right? So if you… if I have to do 100 replacements with random tokens, and just measure how big of an impact it is on average, this becomes very expensive. I just attributed a single token in this case, right?

62
00:09:40.810 --> 00:09:49.390
Sarti, Gabriele: So imagine if I was to attribute the full sentence then, right? So this is very expensive, and it scales poorly to long inputs.

63
00:09:51.410 --> 00:09:59.150
Sarti, Gabriele: Yeah. So, an alternative, a natural alternative

64
00:09:59.350 --> 00:10:14.410
Sarti, Gabriele: in neural networks is to use gradients as some sort of attribution. So the models, as you all know, are trained with gradient descent, so this gradient information is naturally employed during training.

65
00:10:14.600 --> 00:10:22.259
Sarti, Gabriele: And we can repurpose that to try to get a sense of the importance of components, at inference time.

66
00:10:22.310 --> 00:10:29.600
Sarti, Gabriele: So let's see an example. Here, again, we have exactly the same setup as before. We get the prediction gentlemen.

67
00:10:29.600 --> 00:10:44.330
Sarti, Gabriele: But then, the way that we will go about that is we pick a specific target, a target function, let's say. In this case, the target could be the probability of the top most likely token, like gentlemen.

68
00:10:44.530 --> 00:10:52.759
Sarti, Gabriele: And we can take the gradient with respect to that back to the input embeddings of the model.

69
00:10:52.930 --> 00:11:03.020
Sarti, Gabriele: So, note that here, these gradients normally are taken with respect to a loss, right? So the gradient with respect to a loss tells you, how do I minimize this loss?

70
00:11:03.450 --> 00:11:12.960
Sarti, Gabriele: So which weights should I change to minimize this loss? In this case, instead, the gradient with respect to probabilities is telling us

71
00:11:13.400 --> 00:11:29.819
Sarti, Gabriele: In a sense, how sensitive is this final output of the model, so the final probability for gentlemen, to little perturbations of, in this case, if we focus on input embeddings, of these embeddings, right?

72
00:11:29.820 --> 00:11:44.900
Sarti, Gabriele: So the final outcome of this procedure is gradient vectors that have the same exact dimension as the input embeddings, and that basically express, for each dimension of the input embedding, how important it is

73
00:11:44.930 --> 00:11:50.659
Sarti, Gabriele: towards the prediction of the final output.

74
00:11:50.910 --> 00:12:04.980
Sarti, Gabriele: So, normally, this is not very useful, right? One score per dimension is not very useful, so what we do is normally to aggregate those at the token level to get a single score per word.

75
00:12:04.980 --> 00:12:11.460
Sarti, Gabriele: So in this case, we could find, for example, that the gradients for ladies and are very high.

76
00:12:11.610 --> 00:12:20.929
Sarti, Gabriele: And if we aggregate these, for example, by taking the vector norm of each one of these gradient vectors, we would get a high attribution score for "ladies and".

77
00:12:21.400 --> 00:12:27.940
Sarti, Gabriele: Which is intuitive, right? If we perturb "ladies and", the model probably wouldn't predict gentlemen.

78
00:12:28.310 --> 00:12:31.660
Sarti, Gabriele: So this relies on this kind of approximation.
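A minimal sketch of this gradient attribution in code, again assuming a Hugging Face causal LM; the model name, the prompt, and the L2-norm aggregation are illustrative choices:

```python
# Hedged sketch: gradient ("saliency") attribution w.r.t. input embeddings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Welcome back, ladies and", return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)

probs = torch.softmax(model(inputs_embeds=embeds).logits[0, -1], dim=-1)
target_id = probs.argmax()                         # e.g. " gentlemen"

# Target function: probability of the predicted token (not a training loss).
probs[target_id].backward()

# embeds.grad has shape (1, seq_len, hidden): one score per embedding dimension.
# Aggregate per token, e.g. with the L2 norm, to get one score per input token.
token_scores = embeds.grad[0].norm(p=2, dim=-1)

for token, score in zip(tok.convert_ids_to_tokens(ids[0]), token_scores.tolist()):
    print(f"{token:>12s}  {score:.4f}")
```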

79
00:12:32.180 --> 00:12:36.620
Sarti, Gabriele: Yeah. Is everything clear for the gradient attribution part?

80
00:12:40.780 --> 00:12:44.180
David Bau: You guys are so… you guys are all camera off. Oh, I'm camera off, too.

81
00:12:44.180 --> 00:12:46.360
Sarti, Gabriele: Yeah, exactly. I'm kind of like.

82
00:12:46.360 --> 00:12:48.529
David Bau: It's impossible for Gabriel to tell what's going on.

83
00:12:48.530 --> 00:12:50.269
Sarti, Gabriele: Yeah, I don't know.

84
00:12:50.270 --> 00:12:50.770
David Bau: So.

85
00:12:52.010 --> 00:12:53.490
David Bau: Yeah, so,

86
00:12:53.820 --> 00:13:03.130
David Bau: So you just take a, you just take the vector's size. Is that typically really what people do, or do they do other things other than…

87
00:13:03.600 --> 00:13:13.909
Sarti, Gabriele: this is another thing, that there is no consensus, so there's plenty of ways that people try to do this kind of aggregation. So, for example, the L2,

88
00:13:13.910 --> 00:13:24.179
Sarti, Gabriele: norm is probably the most natural way to do this, but other people took the sum of the gradients, or the L1 norm.

89
00:13:24.250 --> 00:13:29.560
Sarti, Gabriele: Like, these are all commonly used, and there's no, like, one-size-fits-all kind of.
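The aggregation choices being mentioned, as a small sketch over per-dimension gradients (random numbers stand in for the real gradient tensor from the saliency example above):

```python
# Hedged sketch: common per-token aggregations of per-dimension gradient scores.
import torch

grads = torch.randn(6, 768)              # stand-in for embeds.grad[0] above

l2_scores  = grads.norm(p=2, dim=-1)     # always non-negative
l1_scores  = grads.abs().sum(dim=-1)     # always non-negative
sum_scores = grads.sum(dim=-1)           # keeps the sign (can be negative)
```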

90
00:13:29.740 --> 00:13:30.440
David Bau: Okay.

91
00:13:30.620 --> 00:13:31.140
Sarti, Gabriele: Yep.

92
00:13:32.460 --> 00:13:33.010
Nikhil Prakash: Wait, I…

93
00:13:33.010 --> 00:13:33.510
Sarti, Gabriele: Yo.

94
00:13:33.740 --> 00:13:34.940
Sarti, Gabriele: Sorry, yep.

95
00:13:34.940 --> 00:13:45.299
Nikhil Prakash: So, what's the general sample size people use? I'm assuming they don't compute the gradient on one example and show this graph, right? So, what is the general sample size that people use?

96
00:13:46.180 --> 00:13:59.049
Sarti, Gabriele: I mean, the idea here is that the gradient attribution is really local, right? So, meaning, for a given example, you're gonna get your scores out in this case, right? If you want to draw some,

97
00:13:59.110 --> 00:14:20.920
Sarti, Gabriele: you know, hypothesis from the kind of information that you get out of this, definitely you want to have some data set, and to be able to tag, you know, the kind of expected behaviors that you would like to see, and see whether the gradient attribution retrieves something that matches your intuition, right? So, I agree with you that on a single example, this doesn't tell you much.

98
00:14:21.360 --> 00:14:26.439
David Bau: But that's… but that's actually one of the nice things, so isn't… or do you think that's true, right? That there's one of the nice things about…

99
00:14:26.560 --> 00:14:36.389
David Bau: input attribution is that it is about, you know, you can analyze single examples. It's trying to get you an answer about single examples, right?

100
00:14:36.390 --> 00:14:45.500
Sarti, Gabriele: Yeah, yeah, yeah, exactly. So, it's really a local thing, right? It doesn't, like, it doesn't give you any intuition about the behavior of the model globally.

101
00:14:45.500 --> 00:14:57.319
Sarti, Gabriele: but rather on that specific example. If you want to get this global intuition about model behavior, you probably have to repeat the thing on a dataset, right, and see whether the trend that you observe on that example holds

102
00:14:57.320 --> 00:14:58.639
Sarti, Gabriele: In general, huh.

103
00:14:59.720 --> 00:15:00.510
Nikhil Prakash: Okay.

104
00:15:00.790 --> 00:15:01.400
Sarti, Gabriele: Yep.

105
00:15:01.610 --> 00:15:02.630
Nikhil Prakash: Yeah, okay, thanks.

106
00:15:02.630 --> 00:15:12.990
Sarti, Gabriele: So yeah, so this all relates to the fact that the devil is in the details for attribution methods, and we got some questions here, so what.

107
00:15:12.990 --> 00:15:15.700
David Bau: Oh yeah, we'll have them ask the questions.

108
00:15:15.890 --> 00:15:20.050
David Bau: Courtney had a question. Armita had a question. What were your questions?

109
00:15:24.230 --> 00:15:43.349
Armita Kazeminajafabadi: So for, we had this problem in our, like, my research project that we wanted to build adversarial input examples, and, many of them are gradient-based, and our input

110
00:15:43.680 --> 00:15:51.440
Armita Kazeminajafabadi: Our data was, like, discrete, so… Mmm… changing them.

111
00:15:51.600 --> 00:16:06.359
Armita Kazeminajafabadi: in a way that, like, increases the gradients, sometimes wasn't so meaningful, because we had few perturbation options. But in this context, I was wondering if

112
00:16:06.960 --> 00:16:11.329
Armita Kazeminajafabadi: The same issue exists.

113
00:16:12.870 --> 00:16:32.420
Sarti, Gabriele: Yeah, so in a sense, as you saw in the example before, the way that we move from, like, the continuous space that the model operates on to the discrete space of, for example, the vocabulary of the model is by just aggregating, right, whatever we get in the continuous space at the token level. I agree that in your case of, like.

114
00:16:32.420 --> 00:16:46.929
Sarti, Gabriele: trying to optimize these embeddings, maybe, like, in a way that they still map onto words, probably you would have to have, like, some sort of projection to the nearest neighbor procedure, right? That can be quite noisy, in the case of…

115
00:16:46.930 --> 00:16:51.460
David Bau: Is that what you did in your experiment, Armita? Did you do a nearest neighbor thing?

116
00:16:51.460 --> 00:16:55.960
Armita Kazeminajafabadi: No, no, we didn't do gradient-based.

117
00:16:56.200 --> 00:16:58.849
David Bau: Oh, you, you, you moved to something different. Okay, cool.

118
00:16:58.850 --> 00:16:59.350
Sarti, Gabriele: Yeah.

119
00:16:59.740 --> 00:17:00.450
David Bau: Huh.

120
00:17:01.010 --> 00:17:03.369
David Bau: Is Courtney here? Does Courtney have a question?

121
00:17:04.940 --> 00:17:21.690
Courtney Maynard: Yeah, hi, my question is about, whether or not you can tell when, like, a specific token, or in the Mirage example, a specific document is actually negatively contributing to a prediction, or, like, taking away from moving away from the correct, answer, or if the attributions are only positive.

122
00:17:22.859 --> 00:17:33.369
Sarti, Gabriele: Yeah, so in the gradient case, definitely you can do that. Also, in the occlusion case, so overall, let's say in the occlusion case, you would have the simple

123
00:17:33.369 --> 00:17:43.529
Sarti, Gabriele: The simple setup where, this estimated importance could either increase or decrease the probability, so you could see this as a positive or negative contribution.

124
00:17:43.599 --> 00:18:07.789
Sarti, Gabriele: While in the gradient case, gradients will be signed in these vectors, so depending on the kind of attribution that you do, and now I'm kind of getting a bit ahead on what I'm saying here, depending on the aggregation, this information might get lost. For example, if you take just the L2 norm of the vector, then you're effectively just getting a positive value

125
00:18:07.829 --> 00:18:22.769
Sarti, Gabriele: that is overall how much it contributes, abstracting away positive and negative contribution. But if you sum, for example, you would get, something that is… that can also be negative, right? If all the dimensions are kind of negative.

126
00:18:25.530 --> 00:18:28.750
David Bau: So, are there other ways of aggregating that you wouldn't lose it?

127
00:18:31.020 --> 00:18:47.540
Sarti, Gabriele: I would say probably the sum is the most common here. You could also get, like, something like, you know, just this kind of heuristic, I guess, like, the maximal dimension within the embedding. If it's a negative dimension, then you would say that it's a negative contribution.

128
00:18:47.860 --> 00:18:50.080
Sarti, Gabriele: There has been…

129
00:18:50.460 --> 00:18:54.920
David Bau: There's something we saw in the, in Been Kim's TCAV paper.

130
00:18:54.920 --> 00:18:57.890
David Bau: Where, once she had a gradient, she dot producted it.

131
00:18:57.890 --> 00:19:10.180
David Bau: with, you know, some particular vector of interest, right? You might dot product it with a class, or dot product it with, like, a token or something like that. That's true, that's true. Do people ever do that, or…

132
00:19:10.850 --> 00:19:21.160
Sarti, Gabriele: Maybe… so I don't have a slide for that, but I think it's interesting to relate it to the dot product, right? It's interesting to think of when we talk about gradient attribution.

133
00:19:21.160 --> 00:19:33.990
Sarti, Gabriele: to think of the gradient vectors per se versus the gradient times whatever input they would be applied to, right? So these are two common attribution methods, like just the raw gradient, and gradient times input.

134
00:19:34.030 --> 00:19:51.780
Sarti, Gabriele: And, like, the gradient per se tells you the sensitivity, kind of, of the inputs to the… like, of the prediction to the inputs, but the moment you multiply them, you actually get some sort of scaling by… by actually considering what these gradients would be applied to, right?

135
00:19:51.780 --> 00:20:10.909
Sarti, Gabriele: you might have very high gradients just because the dimension that they would be applied to is very small, right? So yeah, so definitely this relates to what you're saying, David. I think, depending on what you're looking for, it might make sense to consider gradients in relation to their inputs, and not just by themselves, right?
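A small sketch of the gradient-times-input variant being contrasted here, assuming embeddings and their gradients from a backward pass like the one sketched earlier:

```python
# Hedged sketch: gradient x input. The elementwise product scales each
# gradient dimension by the embedding value it would actually multiply,
# and summing per token keeps the sign of the contribution.
import torch

def grad_times_input(embeds: torch.Tensor) -> torch.Tensor:
    # embeds: (1, seq_len, hidden) with populated embeds.grad
    return (embeds.grad * embeds.detach())[0].sum(dim=-1)
```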

136
00:20:12.110 --> 00:20:17.959
David Bau: Yeah, I think so, and I think that might give you a sign, also, that you could have a positive one or a negative one in that case.

137
00:20:17.960 --> 00:20:19.060
Sarti, Gabriele: Yeah, exactly.

138
00:20:19.060 --> 00:20:20.080
David Bau: Pretty interesting.

139
00:20:20.350 --> 00:20:24.030
David Bau: Question from Courtney. Okay, let's keep on going, I don't want to slow you down too much.

140
00:20:24.030 --> 00:20:25.900
Sarti, Gabriele: No worries, Amish.

141
00:20:26.430 --> 00:20:37.369
Sarti, Gabriele: Yeah, so I just want to highlight here that, like, plenty of people work on this, and there are no established norms, basically. So every new paper that does attribution does it in a different way.

142
00:20:37.370 --> 00:20:46.649
Sarti, Gabriele: And there's no consensus of, like, oh, this always works the best, you know? So I think there's a lot to be… to be explored still in this area.

143
00:20:49.160 --> 00:21:03.069
Sarti, Gabriele: So, one of the readings that you had was integrated gradients, and I took this image, which was quite nice, which was from this blog post that says integrated gradients is a decent attribution method, which I found funny.

144
00:21:03.230 --> 00:21:10.660
Sarti, Gabriele: And yeah, the intuition is what you see here in the image, is you have a starting point here.

145
00:21:10.660 --> 00:21:24.359
Sarti, Gabriele: you have an endpoint, which is gonna be your baseline, and effectively, you're taking steps, right? Like, ideally, this would be an integral, but actually, you're probably approximating this by taking steps along this straight line.

146
00:21:24.360 --> 00:21:42.759
Sarti, Gabriele: So this is the activation space, and then here you would have two input features. In reality, we have many more, and what you're doing is just adding up the contributions of each one of the two features, right? So this is a nice way of visualizing what's happening here.

147
00:21:43.910 --> 00:21:45.050
Sarti, Gabriele: So…

148
00:21:45.770 --> 00:21:54.089
Sarti, Gabriele: Integrated gradients is quite robust, because of this property of, like, you know, considering contributions along this path.

149
00:21:54.150 --> 00:22:06.089
Sarti, Gabriele: In practice, though, to get a good approximation, it's probably quite expensive to run, meaning that you will need many approximation steps along this line.

150
00:22:06.130 --> 00:22:31.110
Sarti, Gabriele: And as many of you noted, the lack of a good baseline is often a problem, actually. So, especially in NLP, people have tried the zero vector, which, again, doesn't mean anything in the NLP world. They tried to use the kind of, like, out-of-vocabulary token, or the mask token. I think, overall, the best idea is really to do random sampling.

151
00:22:31.110 --> 00:22:38.209
Sarti, Gabriele: And just averaging out, but this, again, is extremely expensive, so… No consensus there.
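A minimal manual approximation of integrated gradients over input embeddings, using a Riemann sum over interpolation steps; the zero baseline is only for illustration, since, as discussed, there is no agreed-upon baseline for text:

```python
# Hedged sketch: integrated gradients over input embeddings.
import torch

def integrated_gradients(model, embeds, target_id, baseline=None, steps=32):
    """embeds: (1, seq_len, hidden) input embeddings; returns one signed score per token."""
    baseline = torch.zeros_like(embeds) if baseline is None else baseline
    grad_sum = torch.zeros_like(embeds)
    for k in range(1, steps + 1):
        # Interpolation point along the straight path from baseline to input.
        point = (baseline + (k / steps) * (embeds.detach() - baseline)).requires_grad_(True)
        prob = torch.softmax(model(inputs_embeds=point).logits[0, -1], dim=-1)[target_id]
        prob.backward()
        grad_sum += point.grad
    # Average path gradient, scaled by (input - baseline), summed per token.
    return ((embeds.detach() - baseline) * grad_sum / steps)[0].sum(dim=-1)
```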

152
00:22:39.390 --> 00:22:51.220
Sarti, Gabriele: So there were some interesting variants that were proposed specific to NLP. I have an image here. So the idea here was, instead of going for a straight path, let's instead

153
00:22:51.220 --> 00:23:05.070
Sarti, Gabriele: do this kind of clumping to the nearest neighbor, kind of like we were saying before, and let's find the path that passes through existing tokens in the model vocabulary. So let's take the integral with respect to this path.

154
00:23:05.290 --> 00:23:13.490
Sarti, Gabriele: it kind of works, but it's not that much better, so again, there's no, like, one-size-fits-all for this kind of method.

155
00:23:14.030 --> 00:23:34.949
Sarti, Gabriele: And I also wanted to highlight SmoothGrad, which is another, different approach, which is just introducing noise to the gradient estimation, for example some Gaussian noise. This also improves robustness, so this points to the fact that what actually matters is robustness in these kinds of evaluations, right?

156
00:23:34.980 --> 00:23:37.340
Sarti, Gabriele: Because the gradients are noisy, overall.
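SmoothGrad, as described here, is a small change to the same recipe: average gradients over several noisy copies of the embeddings. A sketch, with arbitrary illustrative noise scale and sample count:

```python
# Hedged sketch: SmoothGrad — average gradients over noisy copies of the input.
import torch

def smoothgrad(model, embeds, target_id, n_samples=16, sigma=0.1):
    grad_sum = torch.zeros_like(embeds)
    for _ in range(n_samples):
        noisy = (embeds.detach() + sigma * torch.randn_like(embeds)).requires_grad_(True)
        prob = torch.softmax(model(inputs_embeds=noisy).logits[0, -1], dim=-1)[target_id]
        prob.backward()
        grad_sum += noisy.grad
    return (grad_sum / n_samples)[0].norm(p=2, dim=-1)   # one score per token
```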

157
00:23:38.100 --> 00:23:52.019
Sarti, Gabriele: Yeah, so there were several questions regarding the implementation invariance properties here, whether they always hold, whether they're good, and whether the axioms matter in practice.

158
00:23:53.590 --> 00:24:02.479
Sarti, Gabriele: I think overall, my perspective is that they don't really matter, in the sense that I feel like this is really the

159
00:24:02.500 --> 00:24:11.679
Sarti, Gabriele: you know, best case scenario, although integrated gradients is not really the best, nor the most efficient method out there, so…

160
00:24:11.680 --> 00:24:29.830
Sarti, Gabriele: probably, especially this implementation invariance, is kind of like a, you know, a moonshot kind of goal of, like, having two models that are exactly identical, except for, you know, the implementation, but they behave exactly identically. I think this is not very realistic.

161
00:24:30.540 --> 00:24:49.329
Sarti, Gabriele: Yeah, so there was someone that was mentioning, I don't remember if it was Claire, maybe, that was mentioning in their example about the, like, what if we have this classification that is based on the background versus looking at the subject in the photo, and then the two behaviors are

162
00:24:49.330 --> 00:24:56.949
Sarti, Gabriele: the same, but for different reasons, right? But then the… maybe you can correct me if I said it wrong.

163
00:24:57.150 --> 00:24:58.249
Claire Schlesinger: No, that's right.

164
00:24:58.790 --> 00:24:59.779
Sarti, Gabriele: Right, right.

165
00:24:59.910 --> 00:25:16.390
Sarti, Gabriele: Yeah, so my take on that is that if indeed this was the case, like, implementation invariance would apply to all sorts of examples, right? So you should be able to craft an example, if the heuristics are different, such that the two networks would have different behaviors.

166
00:25:16.390 --> 00:25:23.200
Sarti, Gabriele: And if the two networks have different behaviors, then implementation invariance doesn't apply, right? So then,

167
00:25:23.690 --> 00:25:43.960
Sarti, Gabriele: Yeah, I feel like, again, what you're pointing at is the fact that for a given set of examples, this might be the case, that the two methods, like, the two networks are behaving the same way, but in practice, this is rarely the case. If they have different heuristics, you should be able to find examples that lead them to behave.

168
00:25:44.270 --> 00:25:47.140
Sarti, Gabriele: differently, right?

169
00:25:50.940 --> 00:25:53.259
Sarti, Gabriele: I don't know if you agree with that.

170
00:25:56.970 --> 00:25:59.020
Sarti, Gabriele: Alright,

171
00:25:59.350 --> 00:26:13.849
Sarti, Gabriele: So, one thing that I wanted to mention is that, in my opinion, the most promising approaches currently used in this space are methods that are basically trying to tweak this kind of gradient propagation.

172
00:26:13.940 --> 00:26:29.290
Sarti, Gabriele: Without making it too inefficient, so they don't take this kind of approach of, like, taking steps or, you know, multiple prediction steps, but they are just crafting some either custom propagation rule, like in the case of

173
00:26:29.290 --> 00:26:39.839
Sarti, Gabriele: layer-wise relevance propagation, or accounting for transformer quirks. So this… this dream method, for example, is something that was recently proposed.

174
00:26:39.840 --> 00:27:03.639
Sarti, Gabriele: And one of the things that they were proposing is, can we compensate for this kind of behavior, where if you remove something from the input, then the softmax would reallocate the importance, and then this will lead to different behavior, right? So I think these are very interesting, especially because their cost in the end is the same as regular gradient-based attribution.

175
00:27:03.640 --> 00:27:07.410
Sarti, Gabriele: So, much more efficient than integrated gradients, so probably

176
00:27:07.410 --> 00:27:11.629
Sarti, Gabriele: Can scale to these kind of applications where we use language models, right?

177
00:27:14.430 --> 00:27:15.630
Sarti, Gabriele: Alright.

178
00:27:16.790 --> 00:27:27.180
Sarti, Gabriele: So, one area that has been explored to some degree, but, yeah, with still with mixed success, is instead the idea of, like.

179
00:27:27.190 --> 00:27:40.509
Sarti, Gabriele: Just looking at model internals. So, for now, we always relied on prediction, right? Either we take the gradient with respect to the prediction, or we take, we look at the difference in prediction.

180
00:27:40.510 --> 00:27:49.539
Sarti, Gabriele: When, I don't know, ablating, occluding components. But here, maybe we can just look at some properties within the network.

181
00:27:49.540 --> 00:28:04.319
Sarti, Gabriele: to try to understand how the model is allocating importance to different components in the input. For example, initially people were very keen on doing that with attention weights, right? And this has been kind of, like, controversial.

182
00:28:04.320 --> 00:28:14.350
Sarti, Gabriele: So here, in these mixed-success links, you find two papers that are titled Attention is Not Explanation and Attention is Not Not Explanation.

183
00:28:14.390 --> 00:28:19.059
Sarti, Gabriele: So people debated that, quite a lot.

184
00:28:19.180 --> 00:28:22.780
Sarti, Gabriele: I think there are some promising works here,

185
00:28:23.010 --> 00:28:28.750
Sarti, Gabriele: So, as I said, initial work was looking at attention weights in a vacuum.

186
00:28:28.910 --> 00:28:34.860
Sarti, Gabriele: Which was quite misleading. Then there has been some work in that direction that argued:

187
00:28:34.910 --> 00:28:51.470
Sarti, Gabriele: The reason why this is misleading is that we're not considering the actual vector, so the value vectors that these weights are multiplied by, so why don't we instead look at the vectors, so the magnitude of the resulting vectors, rather than looking at the attention weight?

188
00:28:51.540 --> 00:29:08.309
Sarti, Gabriele: So, in this example here, you can see the second attention weight is quite large, so you would say, oh, this word is very important, but actually the vector is very small, and maybe this is just a compensatory behavior, right? To get a final vector that is not too small.

189
00:29:08.990 --> 00:29:12.300
Sarti, Gabriele: And the final perspective here was.

190
00:29:12.720 --> 00:29:28.260
Sarti, Gabriele: can we relate these three vectors here to the final outcome of the attention operation, and see which one of these is closer to that? If this green vector, for example, is closer to the final outcome of the attention operation.

191
00:29:28.260 --> 00:29:39.970
Sarti, Gabriele: then it might be that it's the most, influential towards that computation, right? Meaning it's the most aligned with whatever the attention is doing to the, to the vectors in the input.
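The "look at the vectors, not just the weights" idea can be sketched with placeholder tensors: score each attended token by the norm of the value vector it actually contributes, ||a_ij * v_j||, rather than by the raw attention weight a_ij alone. The tensors below are dummies, not pulled from a real model:

```python
# Hedged sketch: norm-based view of attention with placeholder tensors.
import torch

seq_len, d_head = 5, 64
attn = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)   # attention weights a_ij
values = torch.randn(seq_len, d_head)                         # value vectors v_j

query_pos = seq_len - 1                                       # e.g. the last token
weighted = attn[query_pos].unsqueeze(-1) * values             # a_ij * v_j
norm_scores = weighted.norm(p=2, dim=-1)                      # ||a_ij * v_j||

# A large a_ij can still contribute little if ||v_j|| is small, and vice versa,
# which is exactly the compensatory behavior described above.
print(attn[query_pos].tolist())
print(norm_scores.tolist())
```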

192
00:29:40.010 --> 00:29:45.160
Sarti, Gabriele: so these are some pointers for you if you're interested in digging deeper.

193
00:29:45.310 --> 00:30:03.909
Sarti, Gabriele: The cool part about these approaches is that it's entirely, forward-based, so there's no backpropagation, super efficient at inference time, can scale very well with, like, long context, you know, big models, so definitely that's the big appeal of this kind of methods.

194
00:30:06.510 --> 00:30:12.499
Sarti, Gabriele: All right, so I wanted to give you an overview also of the Inseq toolkit.

195
00:30:12.500 --> 00:30:20.829
David Bau: No, actually, can I pause you for a moment? Gabriel, so you've… so you sort of described two, or maybe three different classes and methods now, so you've.

196
00:30:20.830 --> 00:30:21.180
Sarti, Gabriele: Right.

197
00:30:22.610 --> 00:30:27.339
David Bau: You've described these kind of interesting gradient-based methods.

198
00:30:27.640 --> 00:30:36.999
David Bau: And then these… I guess it's… maybe there's kind of attention-based methods, or are there things other than… you say internal-based, but maybe they're mainly attention-based methods.

199
00:30:37.480 --> 00:30:38.320
Sarti, Gabriele: Yeah, yeah, yeah.

200
00:30:38.570 --> 00:30:43.419
David Bau: So, like, okay, so we got… we got all these students here trying to do their projects, and they're…

201
00:30:43.420 --> 00:30:43.970
Sarti, Gabriele: Yep.

202
00:30:43.970 --> 00:30:56.309
David Bau: They're probably going to try some input attribution at some point. What's your opinion? Do you, like, would you advise people to do one of these or the others? One of them…

203
00:30:56.410 --> 00:31:02.780
David Bau: You know, things that people believe more, or… yeah, what… I just want to get a sense for your own opinion.

204
00:31:02.780 --> 00:31:19.119
Sarti, Gabriele: Yeah, so I think my perspective is, again, these gradient-based ones, these specific variants, seem to be the most faithful at the moment, and they are kind of actionable if you're working with models that are not extremely large.

205
00:31:19.190 --> 00:31:23.000
Sarti, Gabriele: So probably, if I had to go for something, I would go for…

206
00:31:23.310 --> 00:31:35.249
Sarti, Gabriele: one of these two methods that I'm linking down here. They both have quite nice implementations available, so that's also a big plus, right? Kind of like, you can just plug and play with existing models,

207
00:31:35.510 --> 00:31:48.259
Sarti, Gabriele: So, yeah, so that would be my guess. If I really found out that this wasn't scalable enough for the use case that I'm working on for whatever reason, because it's too long of a context, too big of a model.

208
00:31:48.260 --> 00:31:58.480
Sarti, Gabriele: I guess my second choice would be some of these, these information flow routes, for example, which is a forward-only method, probably would be my go-to.

209
00:31:58.480 --> 00:32:03.520
Sarti, Gabriele: yeah. Nice. I guess that's my… my opinion, yeah.

210
00:32:03.640 --> 00:32:04.210
David Bau: Cool.

211
00:32:06.780 --> 00:32:07.430
Nikhil Prakash: One question.

212
00:32:07.890 --> 00:32:09.440
Sarti, Gabriele: Yo, yep, sorry.

213
00:32:09.440 --> 00:32:18.279
Nikhil Prakash: Sorry, yeah. So, so the gradient-based approaches, we… Based on your experience, how…

214
00:32:18.830 --> 00:32:28.429
Nikhil Prakash: How much does it scale for long context? Let's say if I want to analyze a chain of thought and want to, let's say, understand which are the important input tokens.

215
00:32:28.730 --> 00:32:35.019
Nikhil Prakash: Or just important input sentences. Would the integrated gradient methods work?

216
00:32:36.590 --> 00:32:41.930
Sarti, Gabriele: Yeah, the only downside with gradients is that sometimes we find

217
00:32:41.930 --> 00:32:59.369
Sarti, Gabriele: some sort of spreading out of the probabilities, so it's unlikely that all the tokens will receive exactly zero importance, right? Even if we were to ablate them, probably many of them would be irrelevant for the output, like, I don't know, function words, you know, this kind of…

218
00:33:00.280 --> 00:33:13.269
Sarti, Gabriele: unrelated stuff, so that's more of a property of the gradient, and if the context is very large, the importance will tend to be very spread out. I think this is something that can maybe be mitigated by using different,

219
00:33:13.960 --> 00:33:30.100
Sarti, Gabriele: in the future, meaning, like, when designing model architectures. I think in general, you know, going towards more sparse activation functions, for example, sparsemax instead of softmax, that would promote this kind of sparsity at the output level, and

220
00:33:30.100 --> 00:33:35.789
Sarti, Gabriele: Potentially, this could also reflect into a sparsity in the input when taking gradients with respect to that.

221
00:33:35.800 --> 00:33:38.460
Sarti, Gabriele: So…

222
00:33:39.070 --> 00:33:58.429
Sarti, Gabriele: Yeah, so I agree, with gradients that's a potential failure case. I think even if spread out, the magnitudes would still be informative, though. I have an example later on, on retrieval-augmented generation, where you can see that even for longer contexts, this is somewhat informative, yeah.

223
00:33:59.080 --> 00:34:00.580
Nikhil Prakash: Okay, okay, cool, thanks.

224
00:34:03.230 --> 00:34:23.190
Sarti, Gabriele: All right. Yeah, so I just wanted to show you this toolkit that we built, that is exactly, for using attribution methods, mostly gradient-based, on language models. So the idea here is that you have your Hugging Face model, let's say a GPT-like model that receives a prompt.

225
00:34:23.340 --> 00:34:36.649
Sarti, Gabriele: And the model will do autoregressive generation, predicting one word at a time. For example, to innovate, one should think outside the box. And what the toolkit allows you to do is simply, at every generation step.

226
00:34:36.650 --> 00:34:47.049
Sarti, Gabriele: We extract the attribution scores for the given prefix, and we can extract also quantities of interest, for example.

227
00:34:47.050 --> 00:34:59.759
Sarti, Gabriele: the probability of the output, the entropy of the output, which are also what we're taking the gradient with respect to, right? So, the final outcome of all this would be something that resembles this table.

228
00:35:01.030 --> 00:35:17.010
Sarti, Gabriele: So the way that you read this is that the columns are the tokens that were generated, and on the rows is the prompt at every generation step. So you see this triangular pattern here is because every new token gets added as an element in the prompt at every generation step.

229
00:35:17.250 --> 00:35:21.459
Sarti, Gabriele: So it plays… it has an influence on the next steps of prediction.

230
00:35:21.750 --> 00:35:36.680
Sarti, Gabriele: And you also have, for example, some information of interest, like here I'm also extracting the probability, of the predicted token at every step, kind of like, yeah, the final prediction probability.

231
00:35:36.930 --> 00:35:49.380
Sarti, Gabriele: So you can see in this example, that the moment that the model starts predicting think outside the box, the model is mostly relying on innovate, which is the key word to predict the multi-word expression.

232
00:35:49.380 --> 00:36:06.130
Sarti, Gabriele: But the moment it starts producing the sequence, the saliency kind of shifts towards the previous tokens in the sequence, which kind of reflects that the model knows where it's going and is just looking at the prefix to finish the expression, right?

233
00:36:06.130 --> 00:36:13.839
Sarti, Gabriele: This is also reflected by the probability, which starts at around 50%, but then becomes increasingly closer to 100%.

234
00:36:15.020 --> 00:36:32.130
Sarti, Gabriele: So, yeah, here you have easy access to, like, a dozen attribution methods, including attention weights, gradient-based, internals-based, specifically for generative LMs, so, like, both encoder-decoder and decoder-only language models.

235
00:36:32.500 --> 00:36:47.420
Sarti, Gabriele: And in the paper that we had related to this toolkit, we did a couple case studies. The first was to study gender bias in machine translation, so we were highlighting that pronouns have a big role when the model decides to use

236
00:36:47.420 --> 00:37:01.499
Sarti, Gabriele: sorry, that professions, stereotypical professions, have a big role when the model decides to translate as he or she from a language that doesn't have the distinction.

237
00:37:02.140 --> 00:37:06.849
Sarti, Gabriele: And then we also try to approximate patching, so I have a slide on that, later.

238
00:37:08.630 --> 00:37:18.819
Sarti, Gabriele: So, this is a simple example of using the library. So, here we load a model with integrated gradients, which is the method that you saw.

239
00:37:19.020 --> 00:37:33.939
Sarti, Gabriele: And we do this model.attribute, which is kind of like the generate function in Hugging Face; just on top of that, we also extract the attributions, right? So here, we prompt the model with: does 3 plus 3 equal 6?

240
00:37:34.050 --> 00:37:50.719
Sarti, Gabriele: And the output that gets generated can then be visualized with show, and it looks like this. So the model predicts yes, end of sequence, and here we have attribution scores for the prompt, plus the yes in the case of the end of sequence, right?
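The call being described looks roughly like the following with the Inseq library; the model name and argument names are written from memory of the library's quickstart and should be checked against its documentation:

```python
# Hedged sketch of the described workflow with Inseq (argument names assumed).
import inseq

model = inseq.load_model("gpt2", "integrated_gradients")
out = model.attribute(
    "Does 3 + 3 equal 6?",
    step_scores=["probability"],   # also record the predicted-token probability per step
)
out.show()                         # renders the token-by-token attribution table
```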

241
00:37:51.230 --> 00:37:56.400
Sarti, Gabriele: So, do you notice something kind of weird here?

242
00:38:01.500 --> 00:38:06.250
Sarti, Gabriele: Like, is this kind of attribution pattern what you would expect for this kind of task?

243
00:38:09.690 --> 00:38:17.759
David Bau: So I'm gonna… I'm gonna just talk out loud here, because I'm really being slow here, but… let's see, so does 3 plus 3…

244
00:38:18.650 --> 00:38:20.510
David Bau: equals 6.

245
00:38:20.860 --> 00:38:23.400
David Bau: And… I would expect

246
00:38:23.530 --> 00:38:28.519
David Bau: Oh, I see. So then it's gotta decide, so it's deciding whether this is true or not.

247
00:38:28.860 --> 00:38:31.050
David Bau: And I would expect that

248
00:38:32.100 --> 00:38:37.060
David Bau: The things that it would have to look at to decide whether it's true is…

249
00:38:37.570 --> 00:38:39.810
David Bau: It needs to look at the answer.

250
00:38:40.850 --> 00:38:45.980
Sarti, Gabriele: And it has to look at the question. It has to look at the question and the answer, right? So, like, if you…

251
00:38:46.050 --> 00:38:57.330
David Bau: Right? So, like, so, like, if we said, like, 3 plus 4 equals 6, right, that would be a different answer. So, like, the 3 and the 3 and the 6 seem like they'd probably be the most important things. Maybe the plus.

252
00:38:57.330 --> 00:39:01.570
Sarti, Gabriele: That's right. That's right. It's like they're the least important parts of the sentence.

253
00:39:01.790 --> 00:39:20.499
Sarti, Gabriele: Indeed, indeed. So, like, my hypothesis by looking at this example was, can it be that this model is just so strongly trying to figure out the desired output format? Like, the fact that this is a yes-no question rather than a mathematical, come-up-with-your-answer question.

254
00:39:20.500 --> 00:39:24.730
Sarti, Gabriele: That the fact of these function words receiving high importance is just because

255
00:39:24.730 --> 00:39:31.820
Sarti, Gabriele: These are what are driving the final format, right? These are what produces the yes, rather than the equation itself, right?

256
00:39:32.250 --> 00:39:32.840
Sarti, Gabriele: But then.

257
00:39:32.840 --> 00:39:41.370
David Bau: Oh, right, because you're saying there's 50,000 other words that it could say here. It could say pumpkin, or whatever, right?

258
00:39:41.370 --> 00:40:01.060
Sarti, Gabriele: Well, it could be that this model is trained to do maths, right? And then it's usually prompted to produce an answer given a mathematical expression, but in this case, the answer is not a number, right? So then it has to rely pretty heavily on function words that define the kind of expected format, which is yes-no, right?

259
00:40:01.060 --> 00:40:02.889
David Bau: I see, I see. So…

260
00:40:02.950 --> 00:40:11.149
Sarti, Gabriele: So, this can lead us to formulate some hypotheses, right? Like, the equation is not getting much importance, so can it be that the model is actually getting it

261
00:40:11.290 --> 00:40:19.280
Sarti, Gabriele: right for the wrong reason, right? Maybe it's not actually caring much about the equation, and it's just betting 50-50, yes or no, right?

262
00:40:19.600 --> 00:40:22.009
Sarti, Gabriele: And this can be tested.

263
00:40:22.170 --> 00:40:33.769
Sarti, Gabriele: And indeed, we find that this… this is a pretty old model, but it's saying, yes, also for… does 3 plus 3 equals 7, right? So in this case, I just gave you an example of, like.

264
00:40:33.770 --> 00:40:43.449
Sarti, Gabriele: we started from some attribution to formulate hypotheses that then we test behaviorally, right? So, the thing that I want to emphasize here is

265
00:40:43.450 --> 00:40:49.619
Sarti, Gabriele: The attribution itself didn't give us any causal confirmation of what we were trying to show,

266
00:40:50.030 --> 00:41:03.839
Sarti, Gabriele: which was, like, that the model wasn't actually doing the expression, but it kind of highlighted that maybe the importance was a bit off for this kind of problem, right? So this could be valuable for this kind of hypothesis generation.

267
00:41:05.120 --> 00:41:06.849
Sarti, Gabriele: Sorry, I'm seeing that.

268
00:41:06.850 --> 00:41:07.300
David Bau: Oh yeah, there's.

269
00:41:07.450 --> 00:41:10.569
Sarti, Gabriele: Oh, Jasmine wrote, some messages.

270
00:41:12.230 --> 00:41:16.850
Sarti, Gabriele: Yeah, I can… yeah, we have some examples later,

271
00:41:17.700 --> 00:41:23.949
Sarti, Gabriele: yeah, I can… I can try to discuss more about those, in the next few slides.

272
00:41:25.440 --> 00:41:30.849
Sarti, Gabriele: So… There were some questions about faithfulness,

273
00:41:31.730 --> 00:41:36.850
Sarti, Gabriele: So, yeah, I don't know if people are here, maybe they want to ask them themselves.

274
00:41:37.720 --> 00:41:39.839
David Bau: Lose, Luz, yes, go ahead.

275
00:41:40.660 --> 00:41:42.330
David Bau: Or Jasmine, either one.

276
00:41:43.040 --> 00:41:48.459
jasminec: Oh, I guess, like, for me, I was wondering, like, when, like, an explanation is, like.

277
00:41:48.590 --> 00:42:03.210
jasminec: informative enough. Like, for instance, like, if someone asked me, like, why did you eat an egg this morning? Like, I could say, like, because I was hungry, but I could also say something like, okay, like, when I was born, like, my mom fed me, you know what I mean? Like, it could start.

278
00:42:03.210 --> 00:42:03.670
Sarti, Gabriele: Mm-hmm.

279
00:42:03.670 --> 00:42:11.840
jasminec: 20-something years ago. Like, how do you know, like, when you have enough information? You're like, that's, like, a reasonable explanation.

280
00:42:12.290 --> 00:42:12.990
Sarti, Gabriele: Right.

281
00:42:13.240 --> 00:42:25.680
Sarti, Gabriele: Yeah, I think my answer to that is whenever it's sufficient to predict behavior in the current use case, right? Like, at a satisfactory level, I think that's probably…

282
00:42:25.900 --> 00:42:30.600
Sarti, Gabriele: The way that we tend to operationalize faithfulness overall, so…

283
00:42:30.720 --> 00:42:49.080
Sarti, Gabriele: Yeah, so I would say that's the best way to understand, you know, like, if our explanation allows us to, to some degree, to understand, you know, what the model is doing there, and if we can act upon it, then probably our explanation is faithful with respect to how the model is doing things internally, right?

284
00:42:49.080 --> 00:43:07.389
Sarti, Gabriele: And maybe, you know, if it's a high-risk domain, probably I wouldn't trust even very faithful explanation in the medical domain, because the stakes are very high, right? So then, yeah, you really need, like, to weight your expectations based on the kind of application, I guess.

285
00:43:10.540 --> 00:43:18.950
Sarti, Gabriele: So yeah, so I like two dimensions in faithfulness. So this is a paper, actually, from Northeastern, from Byron Wallace's group.

286
00:43:19.930 --> 00:43:34.269
Sarti, Gabriele: So, they were working on faithfulness very early in interpretability, and one way that they were defining faithfulness was through two complementary perspectives. So.

287
00:43:34.980 --> 00:43:53.190
Sarti, Gabriele: if we want actionable results: if these tokens are found to be important, one idea is that if we drop these tokens, then we expect a big impact on the results, right? So this is what they call comprehensiveness. It's kind of like ablation, right? Kind of like occlusion. We occlude, we…

288
00:43:53.250 --> 00:43:55.140
Sarti, Gabriele: We cause a big impact.

289
00:43:55.190 --> 00:44:06.279
Sarti, Gabriele: The other perspective is sufficiency. So here we're saying, if we only have these tokens, and we remove all the rest, do the results remain kind of consistent, right?

290
00:44:06.280 --> 00:44:15.959
Sarti, Gabriele: So, yeah, this is just a visualization. If we find most amount of leaves as the important part, we would have comprehensiveness, it's like, yeah.

291
00:44:15.960 --> 00:44:18.119
Sarti, Gabriele: If we,

292
00:44:18.220 --> 00:44:26.440
Sarti, Gabriele: If we drop that, the probability drops. If we only keep that, the probability kind of stays the same. This is sufficiency.
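One standard way to write these two metrics down, with x the input, r the tokens flagged as important, and y-hat the original prediction (a sketch of the usual formulation; exact normalizations vary across papers):

```latex
\mathrm{comprehensiveness} = p(\hat{y} \mid x) - p(\hat{y} \mid x \setminus r),
\qquad
\mathrm{sufficiency} = p(\hat{y} \mid x) - p(\hat{y} \mid r).
```

A faithful rationale should give a high comprehensiveness (removing it hurts the prediction) and a low sufficiency gap (keeping only it barely changes the prediction).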

293
00:44:27.630 --> 00:44:34.450
Sarti, Gabriele: So, the Luz question about… I think… I don't know if Luz is here?

294
00:44:38.520 --> 00:44:39.550
Sarti, Gabriele: Maybe not.

295
00:44:42.960 --> 00:44:43.760
Sarti, Gabriele: Right?

296
00:44:44.110 --> 00:44:58.150
Sarti, Gabriele: I can summarize it. So the, the question was, if the models are able to explain themselves, like to, for example, for… for Mirage, it was if the models can cite

297
00:44:58.230 --> 00:45:06.780
Sarti, Gabriele: alongside giving an answer. Why do we even need a faithful method that looks at the internals, right, if the model can already do that?

298
00:45:06.890 --> 00:45:11.860
Sarti, Gabriele: And our point with that paper was actually that

299
00:45:12.660 --> 00:45:20.349
Sarti, Gabriele: The fact that models are capable enough to be this precise is kind of a recent thing, and before, we were just relying on superficial matching.

300
00:45:20.350 --> 00:45:35.610
Sarti, Gabriele: So the citation was done mostly by saying, oh, here is the answer, here is the three documents that the model received, let's just embed those and look at the similarity between those and find whatever is the most similar, right?

301
00:45:35.610 --> 00:45:39.459
Sarti, Gabriele: So, I think that the answer to this question is, I think.

302
00:45:39.460 --> 00:45:48.190
Sarti, Gabriele: There is an interesting perspective where we try to make models more aware of their inner workings, and in that direction.

303
00:45:48.190 --> 00:46:03.289
Sarti, Gabriele: potentially, we wouldn't need so much of, like, digging deep into the model if the model indeed can kind of self-predict well enough, right? I think we're still kind of far from this perspective, though, so I think probably we do need,

304
00:46:03.600 --> 00:46:08.059
Sarti, Gabriele: We do need these kind of methods still, for the foreseeable future.

305
00:46:09.960 --> 00:46:29.559
Sarti, Gabriele: And the complementary perspective to faithfulness is plausibility. So the plausibility dimension is user-centric instead of being model-centric, and it's asking, can these explanations be understood by whatever users of the system are looking at the thing?

306
00:46:30.100 --> 00:46:43.449
Sarti, Gabriele: So, one big problem is that you don't have guarantees that faithfulness and plausibility go hand in hand, right? So ideally, the more faithful you are to the model, the more

307
00:46:43.450 --> 00:46:52.979
Sarti, Gabriele: understandable you are for humans, but potentially there could be a disconnect there, right? So sometimes this is also being highlighted as a trade-off.

308
00:46:52.980 --> 00:47:04.789
Sarti, Gabriele: So, the more you make explanation plausible, the more you're abstracting away the inner complexity of the model, and this produces this kind of mismatch, that maybe you're kind of like.

309
00:47:05.060 --> 00:47:09.059
Sarti, Gabriele: Diluting too much the complexity for users.

310
00:47:10.250 --> 00:47:28.699
Sarti, Gabriele: So, again, this is application dependent, and one interesting idea that I like in this domain is counterfactual simulation, so the kind of, setting would be, if you have an example and you can get attribution out of this, the idea would be to,

311
00:47:28.700 --> 00:47:42.310
Sarti, Gabriele: given the attribution, can I simulate a counterfactual that would produce a different behavior, right? Like in the other case that we saw just before.

312
00:47:42.310 --> 00:47:56.470
Sarti, Gabriele: Can I simulate the fact that if I change the 6 into a 7, the model still predicts CS, right? And then, what you would do is to compare, you know, the expected result with the actual behavior in this case.

313
00:47:56.470 --> 00:48:02.899
Sarti, Gabriele: So this is a good way to understand whether what you're looking at is kind of plausible or not, right?

314
00:48:05.600 --> 00:48:06.790
Sarti, Gabriele: Alright.

315
00:48:07.780 --> 00:48:12.210
Sarti, Gabriele: So, now talking…

316
00:48:12.210 --> 00:48:15.410
David Bau: What was the paper you cited for that one? Sorry, connecting?

317
00:48:15.410 --> 00:48:20.499
Sarti, Gabriele: Oh, yeah, it's a paper from some years ago from Greg Durrett,

318
00:48:20.950 --> 00:48:27.949
Sarti, Gabriele: from the group of Greg Durrett about, like, doing this kind of counterfactual for evaluating explanations,

319
00:48:28.260 --> 00:48:32.459
David Bau: And so was it… was that paper about, like, sort of human evaluations? What did Durrett do?

320
00:48:32.610 --> 00:48:40.509
Sarti, Gabriele: Yeah, yeah, they were trying to do similar stuff, like, this figure is taken from there, so, so this is their setup, actually, yeah.

321
00:48:40.950 --> 00:48:57.349
Sarti, Gabriele: they were doing it mostly for classifiers, though, NLP classifiers. So I do think that it's a bit more challenging to do that in the generation setting. That's a bit related to what we did for the PECoRe method, actually.

322
00:48:58.030 --> 00:48:58.410
David Bau: Okay, cool.

323
00:48:58.410 --> 00:48:58.910
Sarti, Gabriele: Yep.

324
00:49:01.740 --> 00:49:09.890
Sarti, Gabriele: Yeah, so, like, the contrastive attribution setup, I think it's very compelling for language.

325
00:49:10.050 --> 00:49:11.760
Sarti, Gabriele: And,

326
00:49:12.320 --> 00:49:23.180
Sarti, Gabriele: Yeah, I just want you to focus for now on the example that I have on the left. So, if you have this input, right, can you stop the dog from, and the model predicts barking.

327
00:49:23.340 --> 00:49:42.059
Sarti, Gabriele: If we do gradient-based attributions, so this is just simple gradients taken with respect to the inputs, and we do the aggregation, as I showed before, you would get some scores that look like this. So, red is positive and blue is negative in this setting.

328
00:49:42.450 --> 00:49:50.109
Sarti, Gabriele: So, do you think this is intuitive, what you're seeing here? Like, these attribution scores, do they make sense to you, given this prompt?

329
00:50:02.030 --> 00:50:03.620
Sarti, Gabriele: Maybe not.

330
00:50:08.120 --> 00:50:13.099
David Bau: So there's things that are very positive and things that are very negative, and does white mean, like, close to zero?

331
00:50:13.390 --> 00:50:13.970
Sarti, Gabriele: Yup.

332
00:50:15.190 --> 00:50:22.230
Sarti, Gabriele: Yeah, so the highest here is from, right? From is very, very positively influencing barking.

333
00:50:25.280 --> 00:50:28.640
David Bau: Right. And V is very negatively influencing barking.

334
00:50:28.850 --> 00:50:29.410
Sarti, Gabriele: Yep.

335
00:50:30.490 --> 00:50:33.710
David Bau: But the word that has the least effect is dog.

336
00:50:34.270 --> 00:50:34.830
Sarti, Gabriele: Yep.

337
00:50:37.690 --> 00:50:40.960
David Bau: Which makes you smile. It seems like it's very counterintuitive, seems like…

338
00:50:41.440 --> 00:50:43.690
David Bau: Look at, look at Nikhil, he's laughing at this.

339
00:50:44.000 --> 00:50:45.630
Sarti, Gabriele: Yeah, exactly.

340
00:50:45.780 --> 00:50:54.950
Sarti, Gabriele: Yeah, I mean, that's… that's weird, right? That's exactly the opposite of what we would expect, right? And I think one of the…

341
00:50:55.460 --> 00:50:58.129
Sarti, Gabriele: One of the reasons for that is that

342
00:50:58.240 --> 00:51:04.640
Sarti, Gabriele: as humans, we tend to reason counterfactually, right? So when I'm asking you,

343
00:51:04.890 --> 00:51:21.760
Sarti, Gabriele: what would come after that? Like, can you stop the dog from barking, right? If you had to explain barking, you have the tendency to reason semantically about barking, right? So, like, barking and dog are related words, so dog should receive a big importance, right?

344
00:51:22.020 --> 00:51:36.040
Sarti, Gabriele: But exactly, exactly as Jasmine is saying in the chat, eating could be an alternative word, right? So naturally, in a sense, we're contrasting in our head barking with some other plausible alternative that doesn't involve dogs, right?

345
00:51:37.250 --> 00:51:45.629
Sarti, Gabriele: Well, in practice, what attribution here is doing is just detecting relevance, right? And I could argue with you that

346
00:51:46.150 --> 00:52:04.450
Sarti, Gabriele: the from here is super important to predicting barking, because it's… it's the immediately preceding word, and it's exactly defining that the verb should be in that form, right? Without from, we wouldn't have a present continuous verb there, right? So then it's… it's essential, right?

347
00:52:04.960 --> 00:52:24.780
Sarti, Gabriele: So then, how do we actually bring this closer to human intuition? So the idea that they had in this paper that I'm citing here from Graham Neubig and Kayo Yin is to have a contrastive attribution. And the way that you would do this is by contrasting two words.

348
00:52:24.910 --> 00:52:38.529
Sarti, Gabriele: So, the idea is very simple. Instead of taking the gradient with respect to a single probability, we can take it with respect to a difference in probabilities. Here, probability of barking versus probability of crying, for example.

349
00:52:38.730 --> 00:52:52.629
Sarti, Gabriele: And then the gradient that we get with respect to the input looks a lot more reasonable, if you ask me. So dog now finally has a meaning, and from is… doesn't matter much, because it would be a good choice for both verbs, right?
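
(A minimal sketch of the contrastive gradient idea just described, using GPT-2 through Hugging Face transformers; the model, the target and contrast tokens, log-probabilities instead of raw probabilities, and the L2 aggregation over embedding dimensions are illustrative assumptions, not the exact setup from the Yin and Neubig paper.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("Can you stop the dog from", return_tensors="pt").input_ids

# Embed the input ourselves so gradients can be taken with respect to the embeddings.
embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
log_probs = torch.log_softmax(model(inputs_embeds=embeds).logits[0, -1], dim=-1)

target = tok.encode(" barking")[0]   # (first sub-token of) the word we want to explain
contrast = tok.encode(" crying")[0]  # (first sub-token of) a plausible alternative

# Contrastive attribution: gradient of log p(barking) - log p(crying),
# instead of the gradient of the target probability alone.
(log_probs[target] - log_probs[contrast]).backward()

# Aggregate the per-dimension gradients into one score per input token (L2 norm here).
scores = embeds.grad[0].norm(dim=-1)
for token, score in zip(tok.convert_ids_to_tokens(ids[0]), scores.tolist()):
    print(f"{token:>10s}  {score:.4f}")
```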

350
00:52:53.280 --> 00:53:11.989
Sarti, Gabriele: So, in their work, what they were showing is that by disentangling this kind of semantic factors from syntactic factors, you can improve simulatability, which is a bit like what we were seeing before in plausibility, so the ability of a human to actually,

351
00:53:12.380 --> 00:53:17.200
Sarti, Gabriele: Predict whether the prediction would change by changing a specific word, right?

352
00:53:18.930 --> 00:53:29.329
Sarti, Gabriele: So yeah, one thing that I want to stress here, here on the right, is that the attribution function, like the… sorry, the attributed function, in this case the difference in probability.

353
00:53:29.330 --> 00:53:39.119
Sarti, Gabriele: is fundamental to the way that you interpret what you're getting out, right? So in this case, we saw by taking this difference, you can interpret it as a

354
00:53:39.540 --> 00:53:51.069
Sarti, Gabriele: why this rather than something else, right? But here I make an even more abstract example. I could attribute the entropy of the final distribution, over the vocabulary.

355
00:53:51.070 --> 00:54:05.959
Sarti, Gabriele: And this could maybe tell me what in the input is driving the uncertainty in the model, right? Or the certainty in the model. So, in principle, like, the possibilities are endless, you know? You could attribute any kind of function

356
00:54:05.960 --> 00:54:23.990
Sarti, Gabriele: of your prediction, and the attribution scores would tell you different things, depending on what you're looking at, right? So, I think here, again, there's a lot of untested ground in this area, so people are just kind of starting out and digging into potential variants of that.
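
(Following the same illustrative GPT-2 setup as above, a hedged sketch of attributing a different function of the prediction, here the entropy of the next-token distribution, so the scores indicate which input tokens most drive the model's uncertainty.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("Can you stop the dog from", return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
probs = torch.softmax(model(inputs_embeds=embeds).logits[0, -1], dim=-1)

# Attributed function: entropy of the next-token distribution (model uncertainty).
entropy = -(probs * torch.log(probs + 1e-12)).sum()
entropy.backward()

# Larger scores mark the input tokens that most drive the model's (un)certainty.
uncertainty_scores = embeds.grad[0].norm(dim=-1)
print(dict(zip(tok.convert_ids_to_tokens(ids[0]), uncertainty_scores.tolist())))
```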

357
00:54:28.930 --> 00:54:33.990
David Bau: Yeah, and I mentioned before… We have a question from Arnav online, from the chat.

358
00:54:34.220 --> 00:54:35.540
Sarti, Gabriele: Oh, sorry.

359
00:54:36.120 --> 00:54:41.629
Sarti, Gabriele: Maybe these attribution methods are just not good enough for modeling second-order effects.

360
00:54:41.720 --> 00:54:57.149
Sarti, Gabriele: Yeah, that's a… that's a good point. I have a slide later about actually modeling interactions of features, so there are some methods that are specifically meant for that, since many people asked about it in the class.

361
00:54:57.400 --> 00:55:03.959
Sarti, Gabriele: But yeah, they're very expensive, so definitely that's… that's also another problem why they haven't seen much usage.

362
00:55:03.960 --> 00:55:19.169
David Bau: So, the Graham Neubig method here, this cool contrastive method, it's like… but this is pretty simple. This is pretty lightweight. It's like a pretty cheap gradient to compute. That's neat. It also seems like… like, here they're applying it to a gradient method.

363
00:55:19.560 --> 00:55:30.359
David Bau: Well, it seems like you might be able to do this for some of the other methods you talked about, like the occlusion, you know, looking at attention, looking at LRP, all this stuff, like, you might be able to drop in.

364
00:55:30.910 --> 00:55:44.990
Sarti, Gabriele: I think they were trying, they were trying in their original paper also occlusion, like, this kind of setup for occlusion. I have personally already implemented the LRP with the contrastive attribution, so this works, I… it's tested.

365
00:55:44.990 --> 00:55:45.999
David Bau: You did it.

366
00:55:46.000 --> 00:55:46.780
Sarti, Gabriele: Yeah, yeah, yeah.

367
00:55:46.780 --> 00:55:48.469
David Bau: Oh, so you like it.

368
00:55:49.140 --> 00:55:49.500
David Bau: Oh, here's.

369
00:55:49.500 --> 00:56:02.930
Sarti, Gabriele: Yeah, yeah, yeah, I'm a big fan of this idea. For NLP, I think it's very valuable, because the output space is so large, right? You have so many tokens that it just makes sense to pin down exactly what you want to compare, yeah.

370
00:56:02.930 --> 00:56:04.819
David Bau: Nice, that's great, thanks. This is really helpful.

371
00:56:06.500 --> 00:56:23.409
Sarti, Gabriele: Great. Yeah, I just wanted to mention, so for component attribution, I think last lecture, or two lectures ago, you saw with David the causal mediation, right? So this might be familiar to you, this kind of setup. Eiffel Tower is located in Paris, right?

372
00:56:23.500 --> 00:56:38.440
Sarti, Gabriele: So when we introduced the Inseq library, we asked, but can we use contrastive attribution for approximating causal mediation, right? Can we use this contrastive, gradient

373
00:56:38.440 --> 00:56:49.529
Sarti, Gabriele: attribution to get saliency, not for the input tokens like we saw just now, but for all intermediate steps, right? And kind of see how much does it agree with causal mediation.

374
00:56:50.050 --> 00:57:00.490
Sarti, Gabriele: And our result was that, of course, this is much coarser and not, you know, not as sharp as causal mediation, but we did also see this kind of

375
00:57:00.490 --> 00:57:10.479
Sarti, Gabriele: early site that they were highlighting on the last subject token. So, to some degree, let's say our method was associating much more importance to the last token.

376
00:57:10.480 --> 00:57:18.620
Sarti, Gabriele: But it still found some structure that… that using causal mediation would have required a lot of ablations, right?

377
00:57:18.960 --> 00:57:32.440
Sarti, Gabriele: And the cool part here is that this is very efficient, right? We do a single forward pass, and we do one backward pass in which we get saliency values for all the nodes here in the graph.

378
00:57:33.060 --> 00:57:42.090
Sarti, Gabriele: And that's it, instead of doing sequence length times number of layers forward passes to estimate causal mediation.

379
00:57:42.340 --> 00:57:49.289
Sarti, Gabriele: So, actually, there is one very popular attribution method that has been used after that that is called attribution patching.

380
00:57:49.390 --> 00:57:55.740
Sarti, Gabriele: Which was introduced more or less at the same time as ours, but got a lot more traction.

381
00:57:55.740 --> 00:58:10.000
Sarti, Gabriele: And the idea is quite similar, it's just instead of having the contrastive outputs, they contrast inputs. So they change… they have two settings, they get gradients for the two settings, kind of like in causal mediation.

382
00:58:10.000 --> 00:58:17.800
Sarti, Gabriele: And then they just take the difference between the gradients. But the idea is pretty similar, kind of complementary, let's say.
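
(A rough sketch of the attribution-patching idea as described here: cache activations on a clean and a corrupted input, take one backward pass, and use the (clean − corrupted) activation difference dotted with the gradient as a first-order estimate of each activation's patching effect. The prompts, the hook point (block outputs), and the metric are illustrative assumptions, not the reference implementation.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean   = tok("The capital of Italy is",  return_tensors="pt").input_ids
corrupt = tok("The capital of France is", return_tensors="pt").input_ids
assert clean.shape == corrupt.shape      # activations must align position by position
rome = tok.encode(" Rome")[0]            # metric: logit of " Rome" at the last position

def cached_run(ids):
    """Forward pass that returns the metric and the residual-stream output of every block."""
    acts = []
    hooks = [blk.register_forward_hook(lambda m, i, o: acts.append(o[0]))
             for blk in model.transformer.h]
    metric = model(ids).logits[0, -1, rome]
    for h in hooks:
        h.remove()
    return metric, acts

with torch.no_grad():                    # clean run: only the cached activations are needed
    _, clean_acts = cached_run(clean)

metric, corrupt_acts = cached_run(corrupt)
grads = torch.autograd.grad(metric, corrupt_acts)   # one backward pass covers every layer

# First-order estimate of the effect of patching each clean activation into the corrupted
# run: (clean - corrupt) activation difference, dotted with the gradient at that activation.
for layer, (c, x, g) in enumerate(zip(clean_acts, corrupt_acts, grads)):
    effect = ((c - x) * g).sum(dim=-1)               # shape: (batch, seq_len)
    print(f"layer {layer:2d}  per-token effect: {[round(v, 3) for v in effect[0].tolist()]}")
```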

383
00:58:21.490 --> 00:58:41.430
Sarti, Gabriele: So, now I want to move to the other… the other work that you've seen in the readings that was related to attributing context with… with language models. So, our original contribution in this area was this PECoRe framework.

384
00:58:41.430 --> 00:58:57.850
Sarti, Gabriele: So the main driver for that was that we found that attribution methods are expensive if you apply them to the whole generation, right? We've seen already, if you had to build the table that I've shown before for a very long generation.

385
00:58:57.850 --> 00:59:04.389
Sarti, Gabriele: That would be super expensive, so you kind of want to narrow down exactly on which steps you're interested in doing attribution.

386
00:59:04.830 --> 00:59:05.780
Sarti, Gabriele: And…

387
00:59:06.210 --> 00:59:15.669
Sarti, Gabriele: Secondly, the ambiguity with large vocabularies, so the second driver is this, you know, need for contrastive explanations, right?

388
00:59:15.830 --> 00:59:22.359
Sarti, Gabriele: So what we propose is this Plausibility Evaluation of Context Reliance (PECoRe) framework.

389
00:59:22.540 --> 00:59:36.760
Sarti, Gabriele: And the way that this works is simply in two steps. The first step is to identify in the generation which steps are more influenced by context, and then to focus on those

390
00:59:36.830 --> 00:59:48.159
Sarti, Gabriele: to do this contrastive attribution back to the context. So the final outcome of all of this is simply a pair of influential context tokens

391
00:59:48.420 --> 00:59:52.410
Sarti, Gabriele: Relating to some influenced-generated tokens, right?

392
00:59:53.360 --> 01:00:03.230
Sarti, Gabriele: So now, I'll give you a very quick overview of how this works. So let's assume our model, here it's an encoder-decoder, but it doesn't really matter. And,

393
01:00:03.540 --> 01:00:20.239
Sarti, Gabriele: we are considering a generation task, so English to Italian generation, in which we have a context that the model needs to use to do the task correctly. So let's say here you have, I ate the pizza, this is my context, it was quite tasty.

394
01:00:20.550 --> 01:00:32.119
Sarti, Gabriele: if I have to translate, it was quite tasty in Italian, I have to know whether the tasty is masculine or feminine, right? Depending on what I said before. In this case, pizza is feminine, so it needs to be one.

395
01:00:32.320 --> 01:00:33.260
Sarti, Gabriele: For example.

396
01:00:33.700 --> 01:00:39.489
Sarti, Gabriele: So this is the… what we call the contextual variant of the input.

397
01:00:39.590 --> 01:00:42.089
Sarti, Gabriele: Then we can have a non-contextual variant.

398
01:00:42.650 --> 01:00:53.179
Sarti, Gabriele: that is passed as is, and would predict something different, right? So here, for example, we could go with the masculine as a default, era molto buono, with an O at the end.

399
01:00:53.420 --> 01:00:57.659
Sarti, Gabriele: So what our method does is actually

400
01:00:57.710 --> 01:01:14.250
Sarti, Gabriele: is to take the contextual version of the output and force-decode it in the non-contextual case, taking these kinds of information-theoretic metrics at every step of the generation. So here we enforce the same token at every step.

401
01:01:14.330 --> 01:01:22.430
Sarti, Gabriele: And then we look for which one of these tokens the distribution would be the most skewed by the absence of input context, right?

402
01:01:22.800 --> 01:01:36.089
Sarti, Gabriele: So, we would get a score per token that can then be discretized with some heuristics, for example, to get a label that is either positive or negative.

403
01:01:36.470 --> 01:01:48.820
Sarti, Gabriele: just to have a yes-no kind of perspective here. So in this case, we would find this last token that was generated is the one that is the most influenced by the context, right?
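
(A hedged sketch of this first step, context-sensitive token identification, using a decoder-only GPT-2 instead of the encoder-decoder translation model from the slide, and KL divergence as the per-step metric. The prompts, the English example, and the mean-plus-one-standard-deviation threshold are illustrative, not the exact PECoRe implementation.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

context    = "I ate the pizza while it was still warm. "
question   = "How was it? It was quite"
generation = " tasty and filling"        # the contextual output we want to analyze

def step_logprobs(prompt, continuation):
    """Force-decode `continuation` after `prompt`, returning one distribution per step."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    cont_ids = tok(continuation, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits[0]
    # The distribution predicting continuation token t sits at position len(prompt) - 1 + t.
    start = prompt_ids.shape[1] - 1
    return torch.log_softmax(logits[start:start + cont_ids.shape[1]], dim=-1), cont_ids[0]

ctx_lp, cont_ids = step_logprobs(context + question, generation)
noctx_lp, _ = step_logprobs(question, generation)

# Per-token KL divergence between contextual and non-contextual next-token distributions.
kl = (ctx_lp.exp() * (ctx_lp - noctx_lp)).sum(dim=-1)
threshold = kl.mean() + kl.std()          # simple illustrative heuristic, not PECoRe's
for token, score in zip(tok.convert_ids_to_tokens(cont_ids), kl.tolist()):
    flag = "<- context-sensitive" if score > threshold else ""
    print(f"{token:>10s}  KL={score:.3f}  {flag}")
```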

404
01:01:49.350 --> 01:01:51.660
Sarti, Gabriele: So, yeah.

405
01:01:52.750 --> 01:01:54.090
David Bau: Oh, it maybe saw me on mute.

406
01:01:54.090 --> 01:01:55.720
Sarti, Gabriele: You had it, Chris, yeah. Yeah, yeah, yeah.

407
01:01:55.720 --> 01:02:04.570
David Bau: Yeah, so you say force-decode, so is this just… so… I didn't really understand the force-decode, so you're sort of making two predictions, and you're asking…

408
01:02:04.740 --> 01:02:09.710
David Bau: Let me think about this for a second.

409
01:02:09.840 --> 01:02:12.790
David Bau: So, you're, you're asking, what is…

410
01:02:13.880 --> 01:02:23.020
David Bau: Or, or, or, or, or… I see, so you're putting the, you're putting the prediction… Y hat…

411
01:02:24.580 --> 01:02:29.400
David Bau: In… and you're using the model to evaluate That prediction.

412
01:02:29.720 --> 01:02:30.689
David Bau: And, and then…

413
01:02:30.690 --> 01:02:31.180
Sarti, Gabriele: Exactly.

414
01:02:31.180 --> 01:02:34.289
David Bau: And then… and then you're… and then you're making a heat map over…

415
01:02:34.580 --> 01:02:38.650
David Bau: The evaluated tokens to say which one

416
01:02:39.670 --> 01:02:44.720
David Bau: is most unlikely. And are you… so are you doing that with KL divergence?

417
01:02:44.720 --> 01:02:51.939
Sarti, Gabriele: In this case, we tried several metrics, and KL divergence was the one that was leading to the best results.

418
01:02:51.940 --> 01:02:55.820
David Bau: We also try to contrast just the probability of the top token,

419
01:02:56.190 --> 01:03:07.579
David Bau: I see, I see. So it's not literally just the top token, it's just whatever the model was thinking there, you're taking that whole distribution, and then you're putting it, and you're saying, how different is that from what the model wants to think?

420
01:03:07.580 --> 01:03:08.060
Sarti, Gabriele: Exactly.

421
01:03:08.060 --> 01:03:16.580
David Bau: over in this situation. But… but you are… but for… on the input side, you have to feed in the whole sentence, and so you're just feeding in…

422
01:03:16.580 --> 01:03:17.070
Sarti, Gabriele: Yup.

423
01:03:17.200 --> 01:03:18.829
David Bau: the whole sentence on the inputs.

424
01:03:18.830 --> 01:03:37.400
Sarti, Gabriele: the two alternatives, right? The one with the context, the one without. Yeah, I think this force-decode can be a bit misleading here, because the example is very short, right? But imagine if you had a long example that has many of these kinds of keywords that were influenced by the context. The idea here is that I just wanted to express that you

425
01:03:37.400 --> 01:03:52.549
Sarti, Gabriele: you will always keep the two cases identical, so you're kind of, like, adding the… whatever is from the contextual case, regardless of what the non-contextual case would predict there, to ensure that the prefix is always matching, right?

426
01:03:52.550 --> 01:03:55.269
David Bau: Right, because the output becomes part of the input as you go autoregressive.

427
01:03:55.270 --> 01:04:02.280
Sarti, Gabriele: Exactly, exactly, and you have to match them, otherwise you would have a disagreement that might be mediated by different outputs, right?

428
01:04:03.130 --> 01:04:05.390
David Bau: Sorry to ask this question so fast, is it…

429
01:04:06.090 --> 01:04:09.390
David Bau: I want to let the students ask further if we've confused them.

430
01:04:11.370 --> 01:04:13.840
Sarti, Gabriele: I don't know if this is clear enough.

431
01:04:15.390 --> 01:04:17.040
David Bau: So the goal of this step

432
01:04:17.160 --> 01:04:19.409
David Bau: Is to basically get a heat map

433
01:04:19.580 --> 01:04:21.879
David Bau: Over these Ys, over the output.

434
01:04:22.680 --> 01:04:23.270
Sarti, Gabriele: Yup.

435
01:04:23.270 --> 01:04:24.470
David Bau: Yep, pretty much.

436
01:04:24.670 --> 01:04:37.370
Sarti, Gabriele: Yeah, so as I said, these would be continuous scores, but then we would get these kinds of discrete yes-no labels, right? So the reason why we need these yes-no labels is for step two.

437
01:04:37.930 --> 01:04:44.220
Sarti, Gabriele: Where this will, allow us to understand where to do the attribution, right?

438
01:04:44.500 --> 01:04:57.479
Sarti, Gabriele: So, the key, interesting thing that I think we introduced here is that, let's say that we… now we have our sequence, so this is the contextually generated output with these labels, right?

439
01:04:57.680 --> 01:05:09.730
Sarti, Gabriele: The key step here is that we want to force the prefix, which is the same, for all the tokens that weren't found to be context-sensitive, right?

440
01:05:10.050 --> 01:05:16.899
Sarti, Gabriele: But then to sample the alternative from the non-contextual setting, right?

441
01:05:17.550 --> 01:05:29.020
Sarti, Gabriele: So, basically, here, this is just a data-driven way to get these contrastive pairs that then would allow us to do contrastive attribution, by exploiting the

442
01:05:29.400 --> 01:05:48.719
Sarti, Gabriele: the same model without the context to get what would be its prediction without the context, right? So now, we kind of bootstrap this minimal pair of, like, words that then can be… we can do the contrastive attribution that I showed before, so probability of one minus the other.

443
01:05:49.800 --> 01:05:53.840
Sarti, Gabriele: And propagate this throughout the model back to the input context.
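
(A hedged continuation of the same illustrative setup for this second step: the non-contextual alternative is obtained by force-decoding the shared prefix without context, and a contrastive gradient, contextual token versus alternative, is propagated back to the context tokens. The model, the prompts, the greedy choice of the alternative, and the L2 aggregation are assumptions; this stands in for, rather than reproduces, the PECoRe implementation.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

context   = "I ate the pizza while it was still warm. "
question  = "How was it? It was quite"
prefix    = ""          # contextually generated tokens before the sensitive one (none here)
ctx_token = " tasty"    # the context-sensitive token identified in step one

# 1) Non-contextual alternative: what the model predicts at the same position
#    when the context is removed but the generated prefix is kept identical.
with torch.no_grad():
    noctx_ids = tok(question + prefix, return_tensors="pt").input_ids
    alt_id = model(noctx_ids).logits[0, -1].argmax().item()

# 2) Contrastive attribution on the contextual input:
#    gradient of log p(contextual token) - log p(alternative token).
full_ids = tok(context + question + prefix, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(full_ids).detach().requires_grad_(True)
log_probs = torch.log_softmax(model(inputs_embeds=embeds).logits[0, -1], dim=-1)
tgt_id = tok.encode(ctx_token)[0]
(log_probs[tgt_id] - log_probs[alt_id]).backward()

# Keep only the scores over the context span: these are the candidate influential tokens.
ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
scores = embeds.grad[0, :ctx_len].norm(dim=-1)
print(dict(zip(tok.convert_ids_to_tokens(full_ids[0, :ctx_len]), scores.tolist())))
```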

444
01:05:54.110 --> 01:05:56.639
Sarti, Gabriele: So what we get here is, for example.

445
01:05:56.700 --> 01:06:14.659
Sarti, Gabriele: these continuous scores at the token level, like I showed before, and these, again, would be discretized to get pairs. So here, the way that we would read that is: the pizza in the source is influencing buona, and also the pizza in the target is influencing buona, right?

446
01:06:16.260 --> 01:06:24.189
Sarti, Gabriele: Is this clear enough? I know it's quite a complex structure.

447
01:06:29.260 --> 01:06:34.890
Sarti, Gabriele: Yeah, so what kind of decisions can we make off of these results?

448
01:06:36.230 --> 01:06:53.200
Sarti, Gabriele: I think the interesting part that you… like, the interesting idea here would be that you can apply this to any kind of task, let's say, in a bit of a blind way, right? Like, you have a generation from the model, and you could just, entirely in a data-driven way, run

449
01:06:53.200 --> 01:07:08.639
Sarti, Gabriele: my PECoRe approach, get these relations between inputs and outputs, and then you could explore the outputs, right? And form hypotheses, right? So we had kind of interesting examples that we found when we did that on machine translation.

450
01:07:09.060 --> 01:07:16.599
Sarti, Gabriele: Sadly, I don't have them here in the slides, but one example that I can mention that surprised me was that,

451
01:07:16.890 --> 01:07:28.039
Sarti, Gabriele: we had a text where the model was receiving some information, like, the soccer match is at 10 AM, written as 10:00 AM.

452
01:07:28.240 --> 01:07:39.750
Sarti, Gabriele: And then, it had a text that was saying something like, the match was a fierce competition, and it ended 26-0, right? For the blue team, for example.

453
01:07:39.840 --> 01:07:56.169
Sarti, Gabriele: And the model was deciding to format 26 to 0 with a colon, because the time in the context was using the same colon format to express the hour, right? Which is kind of a weird behavior, if you think about it. I would have never thought of that myself.

454
01:07:56.280 --> 01:08:09.119
Sarti, Gabriele: So I think that kind of highlights the importance of making this data-driven, right? So a big motivation for us was that most of this kind of evaluation here rely on this kind of

455
01:08:09.120 --> 01:08:25.249
Sarti, Gabriele: hypothesis-based, I don't know, I expect my model to have a gender bias, so I craft my set of data with gender bias, and I test whether it does or not, right? Well, here, you can just run it on anything, and then kind of post hoc, look at whether there is something interesting there, right?

456
01:08:25.870 --> 01:08:29.530
Sarti, Gabriele: So yeah, that's… that's the overall idea.

457
01:08:30.979 --> 01:08:37.309
David Bau: That's… that's cool. And so, it's sort of… as a way to think about this is…

458
01:08:37.639 --> 01:08:40.169
David Bau: There's… there's just too much information.

459
01:08:40.509 --> 01:08:42.859
David Bau: in the All Pairs…

460
01:08:43.109 --> 01:08:53.249
David Bau: you know, sort of attribution, and you're doing things to try to winnow that down to a small number of edges that you really want to pay attention to, is that right?

461
01:08:53.250 --> 01:08:58.809
Sarti, Gabriele: Exactly. And I think, you know, more philosophically speaking, moving forward with, like.

462
01:08:58.810 --> 01:09:23.459
Sarti, Gabriele: interpreting, you know, very complex scenarios. I think this kind of, narrowing down what we really care for will be super important in the future, too. You know, even if we want to do, I don't know, mechanistic analysis of, you know, something sketchy is going on here that we didn't expect, you know. I think in reasoning or, like, agents, you know, I think this will become more and more important to kind of narrow down

463
01:09:23.500 --> 01:09:28.120
Sarti, Gabriele: what's going on there? So, yeah.

464
01:09:28.850 --> 01:09:32.130
Sarti, Gabriele: So that was the original idea here.

465
01:09:32.790 --> 01:09:41.419
Sarti, Gabriele: So… And the next reasonable step here was, wait, now we can connect outputs to inputs.

466
01:09:41.529 --> 01:09:48.089
David Bau: Did Jasmine get to ask her question? I see another text that flew by. Did you get to ask your question, Jasmine?

467
01:09:48.620 --> 01:09:49.140
jasminec: Yeah.

468
01:09:49.140 --> 01:09:49.819
Sarti, Gabriele: Yeah.

469
01:09:50.170 --> 01:09:54.450
Sarti, Gabriele: Yeah, yeah, yeah, I took it up, yeah. Okay, that's great.

470
01:09:55.240 --> 01:10:13.050
Sarti, Gabriele: Yeah, yeah, so the next reasonable step here was, well, now we can link outputs to inputs, can we use that to create citations, right? So this was very relevant, it was, yeah, a couple years ago, we didn't have, again, these models that could cite themselves well.

471
01:10:13.070 --> 01:10:15.590
Sarti, Gabriele: They weren't trained to do that,

472
01:10:16.100 --> 01:10:23.720
Sarti, Gabriele: So, our idea was, can we just, looking at the internals, understand how the inputs are influencing the generation, right?

473
01:10:24.150 --> 01:10:38.580
Sarti, Gabriele: So MIRAGE works exactly in the same way as PECoRe. So here we are, our context that gets added or removed is the three documents that were retrieved by a retrieval system and added to the prompt.

474
01:10:38.580 --> 01:10:57.250
Sarti, Gabriele: and the functioning is exactly the same. We would see how these three documents shift the probability distribution of the model for specific tokens in the answer, and then trace this back to some specific tokens in one of the documents that are responsible for the shift.

475
01:11:00.620 --> 01:11:14.149
Sarti, Gabriele: So, I wanted to show this picture that was also in the paper that you were referred to by David. I think this is quite good evidence in relation to what Nikhil was asking before.

476
01:11:14.210 --> 01:11:23.309
Sarti, Gabriele: That even though it's not super clean, this is entirely gradient-based, so this is raw gradients for attribution.

477
01:11:23.480 --> 01:11:41.870
Sarti, Gabriele: And you can see that here on the x-axis, you have the five documents that were given as context, and every point here is a word within a document that receives an attribution score, right? So the y-axis is kind of like the attribution intensity for that word.

478
01:11:41.980 --> 01:11:59.320
Sarti, Gabriele: given the output here, 9 in the generation, right? So you can see that this quite cleanly points at the two… the exact match, $19 billion in the document 1, as a motivation for predicting $19 billion in the answer.

479
01:12:00.170 --> 01:12:07.909
Sarti, Gabriele: So yeah, even though it's not super clean, there is still, like, enough information to kind of, you know, cut out exactly what we want here.

480
01:12:07.920 --> 01:12:20.260
Sarti, Gabriele: And yeah, in the paper, we try different approaches, and we were using some heuristic, like, let's take the top 5%, or the top 20% of these tokens.

481
01:12:20.260 --> 01:12:30.120
Sarti, Gabriele: Based on their attribution scores. But we also tried calibration, so kind of, like, trying to select what would be a good threshold to match a set of gold labels

482
01:12:30.170 --> 01:12:45.160
Sarti, Gabriele: in a way that then we can just find this threshold value and then apply it, you know, to unseen documents, and that seemed to help. So, in general, if you have a gold annotated dataset with the citations that you want.

483
01:12:45.160 --> 01:12:50.250
Sarti, Gabriele: That probably would be a good way to select these thresholds.
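
(A small illustrative comparison of the two selection strategies just mentioned, a top-k% cutoff versus a threshold calibrated against gold citation labels; the scores and labels below are dummy values, not real MIRAGE outputs.)

```python
import numpy as np

# Dummy per-token attribution scores over a retrieved document, plus gold citation labels.
scores = np.array([0.02, 0.91, 0.85, 0.10, 0.05, 0.40, 0.03, 0.77])
gold   = np.array([0,    1,    1,    0,    0,    0,    0,    1   ])

# Heuristic: cite the top 20% highest-scoring tokens.
top_k = max(1, int(0.2 * len(scores)))
cited_topk = np.zeros_like(scores, dtype=bool)
cited_topk[np.argsort(scores)[-top_k:]] = True

# Calibration: sweep thresholds on annotated data, keep the one with the best F1,
# then reuse that threshold on unseen documents.
def f1(pred, gold):
    tp = (pred & (gold == 1)).sum()
    p = tp / max(pred.sum(), 1)
    r = tp / max((gold == 1).sum(), 1)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

best_t = max(np.linspace(0, 1, 101), key=lambda t: f1(scores >= t, gold))
print("top-20% picks:", cited_topk, "| calibrated threshold:", round(float(best_t), 2))
```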

484
01:12:53.980 --> 01:12:59.979
Sarti, Gabriele: There were some questions related to MIRAGE, maybe people that are here can ask them.

485
01:13:11.270 --> 01:13:14.659
Claire Schlesinger: I was asking what would happen

486
01:13:17.470 --> 01:13:22.940
Claire Schlesinger: if, like, Attribution saw two documents that are similar, but, like, different in tone.

487
01:13:23.110 --> 01:13:28.930
Claire Schlesinger: So, like, they may have, like, the same content, but presenting different viewpoints.

488
01:13:30.210 --> 01:13:44.449
Sarti, Gabriele: Yeah, that… yeah, I thought that was a very good question. I think it depends a lot on the model, right? So it might be that the model will find that, you know, a more explicit mention of something is more…

489
01:13:45.540 --> 01:13:59.579
Sarti, Gabriele: actionable to come to an answer, so it might be that that receives the higher importance. Actually, what I mentioned here is that recency is probably the most common trend that you see in language models, so if something is mentioned

490
01:13:59.580 --> 01:14:06.299
Sarti, Gabriele: several times. The last time… the last mention is probably gonna be the one that the model is mostly relying on.

491
01:14:06.350 --> 01:14:10.149
Sarti, Gabriele: But this all relates to…

492
01:14:10.210 --> 01:14:23.200
Sarti, Gabriele: this thing that we were saying before, attribution pertains only to the current context, right? So, like, the fact that this token received a high importance is not exactly a proxy for saying.

493
01:14:23.200 --> 01:14:31.219
Sarti, Gabriele: If this token wasn't there, then everything would change, because, like, in this case of redundant mentions, if these tokens disappeared.

494
01:14:31.220 --> 01:14:40.519
Sarti, Gabriele: then other tokens could take up the attribution in their place. So sometimes this can lead to these kinds of misleading interpretations, right?

495
01:14:43.810 --> 01:14:44.600
Sarti, Gabriele: Yo.

496
01:14:47.060 --> 01:14:47.990
Sarti, Gabriele: And…

497
01:14:49.450 --> 01:14:50.790
David Bau: Is Aria here?

498
01:14:52.380 --> 01:14:57.030
Arya: Yeah, but I think, you've already answered my question, so…

499
01:14:57.030 --> 01:14:58.949
Sarti, Gabriele: Yeah, pretty, pretty related.

500
01:14:59.830 --> 01:15:00.600
Sarti, Gabriele: Yeah.

501
01:15:01.250 --> 01:15:03.069
Sarti, Gabriele: And what's this last one?

502
01:15:03.070 --> 01:15:03.939
David Bau: Should you?

503
01:15:10.360 --> 01:15:11.860
Sarti, Gabriele: Not here, I think.

504
01:15:13.110 --> 01:15:14.849
David Bau: I'm sure he's here, but maybe, maybe in.

505
01:15:14.850 --> 01:15:15.359
Sarti, Gabriele: Got it.

506
01:15:15.600 --> 01:15:17.469
Sarti, Gabriele: I'm not sure.

507
01:15:18.050 --> 01:15:19.420
David Bau: Oh, the mic is not working.

508
01:15:19.420 --> 01:15:23.439
Sarti, Gabriele: Okay, okay. I can just summarize, yeah,

509
01:15:23.660 --> 01:15:38.659
Sarti, Gabriele: Yeah, so the idea here was, is this procedure of, like, forcing the prefix when… when considering this difference from the context, actually breaking the… the MIRAGE framework, right, in a sense, because you're forcing a…

510
01:15:38.660 --> 01:15:50.850
Sarti, Gabriele: a prefix of the output that is maybe not what the model would generate when the context was absent, right? So definitely, this leads to some potentially out-of-distribution behavior there.

511
01:15:51.040 --> 01:15:58.779
Sarti, Gabriele: But considering that that's mostly used to select cases where the context was influential,

512
01:15:59.220 --> 01:16:07.680
Sarti, Gabriele: Actually, this can lead to some interesting things, like, if something becomes explicitly mentioned in the output.

513
01:16:07.780 --> 01:16:21.530
Sarti, Gabriele: then it might be that whatever comes after is now relying on the output of the model, rather than relying on the context, right? So maybe at first you need to rely on the context, but from there onwards.

514
01:16:21.530 --> 01:16:29.390
Sarti, Gabriele: Now you're just relying on the output, so all the subsequent mentions would not be picked out, right, as context-sensitive.

515
01:16:29.390 --> 01:16:32.860
Sarti, Gabriele: So I think that's actually something interesting.

516
01:16:33.010 --> 01:16:36.150
Sarti, Gabriele: That actually reflects how the model operates, right?

517
01:16:37.980 --> 01:16:39.770
Sarti, Gabriele: Yeah.

518
01:16:42.450 --> 01:16:43.620
Sarti, Gabriele: Alright.

519
01:16:44.130 --> 01:16:47.470
Sarti, Gabriele: What I mean?

520
01:16:47.900 --> 01:16:57.910
Sarti, Gabriele: Yeah, and I just wanted to show you, so we have an API for PECoRe inside Inseq now, so that's… it's pretty…

521
01:16:57.910 --> 01:17:08.590
Sarti, Gabriele: convenient to use, and it's, you know, it can be used within a Jupyter notebook, it's quite nice. So we can have a look at this example together, I think it's quite, quite informative.
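
(For reference, the basic Inseq entry point looks roughly like the snippet below, written from memory; exact method names and arguments may differ between versions, and the PECoRe / context-attribution functionality has its own dedicated entry point documented in the library.)

```python
import inseq

# Load any Hugging Face model together with an attribution method.
model = inseq.load_model("gpt2", "saliency")

# Attribute a generation; the result can be displayed inline in a Jupyter notebook.
out = model.attribute("Can you stop the dog from")
out.show()
```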

522
01:17:09.860 --> 01:17:16.610
Sarti, Gabriele: So here, the input that the model receives is when was the most successful player in NBA history born.

523
01:17:16.920 --> 01:17:33.829
Sarti, Gabriele: And this is not a great model, it's like a 300 million parameter model, so it's… it's pretty bad at doing language modeling, so here the model is predicting 2015-2016, but then it's also saying the most successful player in NBA history is Steven John

524
01:17:34.310 --> 01:17:38.050
Sarti, Gabriele: Something, okay? So…

525
01:17:38.260 --> 01:17:46.729
Sarti, Gabriele: we set our threshold when we apply the method, and we found that John, in this case, is one of these tokens that is context-sensitive, right?

526
01:17:47.140 --> 01:17:52.329
Sarti, Gabriele: So, if we open this toggle here, what we see is

527
01:17:52.580 --> 01:17:59.939
Sarti, Gabriele: three documents that the model received that were appended to the prompt, right? So this is a RAG setup, kind of like in MIRAGE, right?

528
01:18:00.450 --> 01:18:10.210
Sarti, Gabriele: So, you can see that the tokens within these documents are colored based on how influential they were towards the prediction of John here.

529
01:18:10.420 --> 01:18:19.080
Sarti, Gabriele: And we can see that the most influential ones are actually Steven and John, right? Which makes sense. So it means the third document here

530
01:18:19.240 --> 01:18:23.890
Sarti, Gabriele: is what is driving the prediction of Steven John here.

531
01:18:24.730 --> 01:18:35.569
Sarti, Gabriele: But I think even more informative, is the fact that here you can also see what the model would have predicted in the non-contextual case. So here, the model would have said, Stephen Kerr

532
01:18:35.670 --> 01:18:43.170
Sarti, Gabriele: Which is probably Stephen Curry, right? I'm not a basketball expert, but I guess that would make sense.

533
01:18:43.530 --> 01:18:54.449
Sarti, Gabriele: But, it was kind of like, you know, sidetracked by the presence of this Stephen John here in context, or towards predicting Stephen John here.

534
01:18:54.710 --> 01:18:59.970
Sarti, Gabriele: So, I think this is interesting, because basically this example is pointing at the fact that this model

535
01:19:00.150 --> 01:19:02.400
Sarti, Gabriele: Is over-relying

536
01:19:02.520 --> 01:19:19.030
Sarti, Gabriele: On the context versus its previous memorized information. So it probably would have said something reasonable if only using memory, but the context has such a big influence on what it's predicting that it decided to go for something that maybe is not ideal here, right?

537
01:19:20.630 --> 01:19:32.670
Sarti, Gabriele: So, this is the kind of visualization you can get in a notebook. So, in the Inseq repository, we have a notebook with exactly this example, to reproduce exactly this.

538
01:19:32.790 --> 01:19:36.070
Sarti, Gabriele: And… and some analysis on reasoning, also.

539
01:19:37.480 --> 01:19:40.910
Sarti, Gabriele: So yeah, so it's a new version that we released some time ago.

540
01:19:42.290 --> 01:19:44.890
Sarti, Gabriele: Any question on that?

541
01:19:50.260 --> 01:19:51.260
Sarti, Gabriele: Nope.

542
01:19:53.410 --> 01:19:54.440
Sarti, Gabriele: Alright.

543
01:19:54.690 --> 01:19:57.789
David Bau: You know, I'll be interested in, you know.

544
01:19:58.370 --> 01:20:05.390
David Bau: whether, you know, whatever methods get adopted to different research projects. You know, each one of these methods is…

545
01:20:05.730 --> 01:20:10.080
David Bau: you know, exposing something different, so I'll just be really curious. Yeah.

546
01:20:10.080 --> 01:20:10.550
Sarti, Gabriele: I know that.

547
01:20:10.550 --> 01:20:14.180
David Bau: And I think it's fair, if people are like, oh, I, you know, I might…

548
01:20:14.310 --> 01:20:26.210
David Bau: use some of these methods for my research project for people to just ask now, while we have Gabriel here, to sort of share his wisdom or opinions about different possible applications.

549
01:20:27.500 --> 01:20:28.100
Sarti, Gabriele: Yeah.

550
01:20:29.460 --> 01:20:33.479
Sarti, Gabriele: I mean, I just want to mention that my perspective on that is, like.

551
01:20:34.200 --> 01:20:44.500
Sarti, Gabriele: this is great, like, if you have this kind of setup, right? My next question would be, well, I found that Steven John now is influencing John, right?

552
01:20:44.660 --> 01:20:58.249
Sarti, Gabriele: let's look at the internals, right? Let's do, like, our circuit analysis, let's do our, you know, causal mediation, but now we have an anchor point, you know, we have some behavior that we identified that is there, right?

553
01:20:58.250 --> 01:21:07.910
Sarti, Gabriele: And that would have been much more painful to do had we had to do causal mediation on the full sequence of documents here, right? So I think this is…

554
01:21:08.060 --> 01:21:15.510
Sarti, Gabriele: great to get started on, like, finding interesting phenomena that you can then dig into deeper with the mechanistic toolkit, right? So…

555
01:21:16.070 --> 01:21:21.329
Sarti, Gabriele: Yeah, so if you intend to use these kind of things, that's probably my suggestion.

556
01:21:22.610 --> 01:21:23.310
Sarti, Gabriele: Yeah.

557
01:21:25.600 --> 01:21:34.480
Sarti, Gabriele: And… Oh, yeah, there were a couple final slides here, the first was about interactions.

558
01:21:34.900 --> 01:21:38.750
Sarti, Gabriele: So here the idea is,

559
01:21:38.960 --> 01:21:53.859
Sarti, Gabriele: like, as many of you asked about this, can we model these kinds of second-order effects, interactions? And indeed, there are several methods to do that. There is this Shapley interaction index.

560
01:21:53.860 --> 01:22:08.410
Sarti, Gabriele: The idea here is that you try groups of features, increasing in size, to understand when a group is minimal to predict some behavior. And then you also have gradient-based methods, so this…

561
01:22:08.410 --> 01:22:22.690
Sarti, Gabriele: Hessian or integrated Hessian are basically the equivalent of gradients, but you're taking the second-order derivative, so you're taking, like, which other factors are more influential for the gradient of that factor to get to this magnitude, right?

562
01:22:23.150 --> 01:22:26.010
Sarti, Gabriele: So, in this case,

563
01:22:26.010 --> 01:22:45.920
Sarti, Gabriele: yeah, it's the equivalent for interactions. The only problem with all of these methods is that they're quite expensive, because you're potentially, you know, estimating all possible interactions with all possible groups, so that could become very expensive, unless you start with some, you know, assumption of, like, how these groups should be formed.

564
01:22:45.960 --> 01:22:55.789
Sarti, Gabriele: I think some people were working at the level of syntax, for example, right? If something belongs to the same phrase, then it makes sense that they are kind of part of the same group.

565
01:22:55.870 --> 01:23:02.279
Sarti, Gabriele: But yeah, I think that's the only actionable way to study these kind of things.
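
(A hedged sketch of the Hessian-style interaction idea: a mixed second derivative of the attributed function with respect to a pair of input token embeddings, with the norm of that block used as an interaction score. This is a bare second-order gradient for illustration, not the full Integrated Hessians estimator, and the prompt, token pair, and aggregation are assumptions.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("Can you stop the dog from", return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
log_probs = torch.log_softmax(model(inputs_embeds=embeds).logits[0, -1], dim=-1)
target = log_probs[tok.encode(" barking")[0]]

# First-order gradients w.r.t. every input embedding, keeping the graph for a second pass.
(first,) = torch.autograd.grad(target, embeds, create_graph=True)

i, j = 4, 5   # interaction between "dog" (position 4) and "from" (position 5), illustrative
# Mixed second derivative: how the gradient at token i changes as token j's embedding moves.
(second,) = torch.autograd.grad(first[0, i].sum(), embeds, retain_graph=True)
print(f"interaction({i},{j}) = {second[0, j].norm().item():.4f}")
```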

566
01:23:06.630 --> 01:23:22.930
Sarti, Gabriele: And the final thing… I really like this question from Jasmine that was, like, why are we even doing attribution? Like, what's the… what's the final goal, right? What's the end game of attribution? So I think it's a good way to kind of close on that.

567
01:23:24.560 --> 01:23:35.719
Sarti, Gabriele: And I just wanted to highlight some of the works that use that. I think one of the very convincing use cases was from people that do protein design.

568
01:23:35.840 --> 01:23:51.219
Sarti, Gabriele: This was from a presentation that I attended two years ago. I found it very interesting because there were these people from Genentech that do this kind of lab protein design, kind of like lab-in-the-loop protein design.

569
01:23:51.310 --> 01:24:04.679
Sarti, Gabriele: And when they were giving their keynote, I was pretty surprised to find out that they were saying, yeah, you know, if we were to identify exactly which amino acids are responsible for, like, a specific gene expression.

570
01:24:04.680 --> 01:24:16.809
Sarti, Gabriele: that would be very painful to do, like, by trying out all possible things. So we just do gradient-based attribution on that… on the protein sequence, and we… and we use that, right? So I found it quite compelling as an idea.

571
01:24:16.880 --> 01:24:20.619
Sarti, Gabriele: So there are some interesting direction here.

572
01:24:21.770 --> 01:24:32.270
Sarti, Gabriele: And I feel like now people are starting to use these kind of methods also for other, more actionable purposes. So here, there is a recent paper that we're trying to

573
01:24:32.290 --> 01:24:50.300
Sarti, Gabriele: use attribution to lead… to steer generation towards looking at specific regions of interest. So, like, if you define a constraint in the prompt, you could decide which tokens to pick based on which tokens are relying the most on that part of the prompt where you're defining your constraint.

574
01:24:50.400 --> 01:24:58.230
Sarti, Gabriele: Yeah, it's interesting. I would have some criticism about this, probably, but I think it's potentially promising.

575
01:24:59.570 --> 01:25:13.370
Sarti, Gabriele: And finally, one thing that I wanted to highlight specifically because, it feels like in the mechanistic community, people think, oh, you know, attribution is a thing for the past, and now, you know, we don't do this anymore.

576
01:25:13.380 --> 01:25:26.719
Sarti, Gabriele: We did that on images, you know, in the 2010s, but actually, all the new methods that do circuit finding, so I think you have a class, David, in the upcoming weeks about that, right?

577
01:25:26.800 --> 01:25:41.359
Sarti, Gabriele: But all these kind of methods that aim to find, you know, how components interact towards a prediction are using, effectively, some form of attribution, right? Most of them now are gradient-based, actually, some form of integrated gradients, or…

578
01:25:42.820 --> 01:25:52.679
Sarti, Gabriele: So I think these questions are still, you know, very, very recent and very important. It's just maybe they kind of became so…

579
01:25:52.880 --> 01:26:03.300
Sarti, Gabriele: consolidated, that now they… they moved away from the… from the spotlight, kind of, and now they're just the kind of tools that we use without even thinking about it, which is great, I guess.

580
01:26:03.980 --> 01:26:04.670
Sarti, Gabriele: Yeah.

581
01:26:05.340 --> 01:26:06.700
David Bau: It was still salient.

582
01:26:06.910 --> 01:26:10.759
Sarti, Gabriele: Exactly, still important to know what's salient, yeah.

583
01:26:13.890 --> 01:26:22.009
Sarti, Gabriele: So, yeah, so I think that's it for me. Thank you so much for having me, and if you have any questions, I'm here to answer, of course.

584
01:26:24.540 --> 01:26:25.620
David Bau: Thanks, Gabrielle.

585
01:26:25.750 --> 01:26:27.819
David Bau: It's really, really, really helpful.

586
01:26:28.790 --> 01:26:29.360
Sarti, Gabriele: Great.

587
01:26:29.680 --> 01:26:30.730
Sarti, Gabriele: I'm glad.

588
01:26:33.070 --> 01:26:35.110
David Bau: Yay! Everybody liked it.

589
01:26:36.220 --> 01:26:46.649
David Bau: So, yeah, so it's… so to sort of keep up with the theme of… of what we're doing, I, you know, encourage everybody to give a try

590
01:26:46.840 --> 01:27:00.220
David Bau: to, you know, these input attribution methods, there's… there's, as you can see, there's a lot. Yes, I agree with what Jasmine says. It feels particularly broadly useful in situations with high-stakes decisions.

591
01:27:00.440 --> 01:27:14.769
David Bau: You know, my intuition is a lot of these interdisciplinary questions that you guys are asking, where you have a lot of organic text, and you're asking, how is the model thinking during… during processing of complex text.

592
01:27:15.430 --> 01:27:25.709
David Bau: I feel like these methods are really well suited for it, which is why I wanted to make sure we cover it before spring break. And so,

593
01:27:26.020 --> 01:27:34.789
David Bau: So yeah, so I think it'll be great. I'll be interested to see if you're able to find anything interesting in your projects.

594
01:27:34.900 --> 01:27:49.200
David Bau: Using these methods. And we have, we have Gabriel here this semester, so… Right. So, you know, so take advantage of him. He's, he's directly helping out on one of the teams, but, but, you know, he's a general resource for the class.

595
01:27:49.510 --> 01:27:50.100
Sarti, Gabriele: Yeah.

596
01:27:51.080 --> 01:28:00.590
Sarti, Gabriele: Yeah, I'll also share the links, of course, to this library that we built in the Discord channel, so that you can have a look.

597
01:28:02.590 --> 01:28:03.480
David Bau: Great.

598
01:28:04.010 --> 01:28:05.300
David Bau: Okay, guys.

599
01:28:05.650 --> 01:28:07.370
David Bau: Stay safe in the snow out there.

600
01:28:07.370 --> 01:28:07.990
Sarti, Gabriele: Bye.

601
01:28:07.990 --> 01:28:14.519
David Bau: And we'll see you… I'm not sure if we'll see you in person on Thursday or not, hopefully the weather will cooperate, and we'll see you in person on Thursday.

602
01:28:17.000 --> 01:28:17.890
Sarti, Gabriele: Thank you, bye-bye.

603
01:28:17.890 --> 01:28:18.740
Armita Kazeminajafabadi: Thank you.

