WEBVTT

1
00:00:00.070 --> 00:00:01.080
David Bau: my favorite reaction.

2
00:00:02.070 --> 00:00:13.300
David Bau: Welcome to Week 5.

3
00:00:13.740 --> 00:00:14.520
David Bau: Definitely.

4
00:00:14.890 --> 00:00:16.940
David Bau: How many weeks are there in the semester?

5
00:00:18.110 --> 00:00:21.470
David Bau: new packs.

6
00:00:21.590 --> 00:00:27.870
David Bau: Welcome. Is Zoom working? Can somebody on Zoom, hear my audio? Say hello?

7
00:00:28.110 --> 00:00:29.610
Veronica C Perez: Yeah, we can hear you.

8
00:00:29.620 --> 00:00:34.949
David Bau: Yeah, that's great, thanks. Okay, cool. So, so today, we'll talk about

9
00:00:35.500 --> 00:00:38.350
David Bau: What… so, in my opinion.

10
00:00:38.990 --> 00:00:41.660
David Bau: This is, like, the core experimental…

11
00:00:41.970 --> 00:00:43.080
David Bau: design.

12
00:00:43.250 --> 00:00:44.670
David Bau: category?

13
00:00:44.780 --> 00:00:49.490
David Bau: that I think makes this type of… analysis.

14
00:00:49.790 --> 00:00:52.600
David Bau: possible, and what sets it apart…

15
00:00:53.010 --> 00:01:01.570
David Bau: from most… traditional neuroscience, or… cognitive science.

16
00:01:01.780 --> 00:01:06.789
David Bau: Because… we could actually… take control over

17
00:01:07.100 --> 00:01:12.849
David Bau: these neural networks. When we run them and they… they think about something, they do some cognitive activity.

18
00:01:13.120 --> 00:01:19.400
David Bau: Not only can we observe every single calculation that they do.

19
00:01:20.030 --> 00:01:31.699
David Bau: But we can trace and debug those calculations. We can do what scientists call causal experiments. We can ask, what if? What if it worked differently?

20
00:01:31.970 --> 00:01:34.650
David Bau: And then we can actually do that.

21
00:01:35.030 --> 00:01:38.519
David Bau: and… and see what happens. And so that's… that's different from…

22
00:01:39.370 --> 00:01:42.269
David Bau: what most neuroscientists and cognitive scientists can do.

23
00:01:42.420 --> 00:01:43.230
David Bau: Right?

24
00:01:43.380 --> 00:01:47.360
David Bau: For most cognitive systems, you just work with them as a black box. You…

25
00:01:47.570 --> 00:01:49.560
David Bau: You know, try to figure out what a person

26
00:01:49.920 --> 00:01:54.099
David Bau: is thinking by interrupting them, by saying, okay.

27
00:01:54.310 --> 00:01:58.889
David Bau: Look at this image for… 28 milliseconds.

28
00:01:59.250 --> 00:02:06.380
David Bau: And what was your impression from it? And trying to interrupt the process and trying to get a little bit of insight. That's what people have done for decades.

29
00:02:06.530 --> 00:02:07.700
David Bau: But…

30
00:02:08.090 --> 00:02:16.060
David Bau: In artificial neural networks, we have this amazing luxury of being able to go into the calculation and do things directly, so…

31
00:02:16.210 --> 00:02:23.709
David Bau: I'm gonna… I'm gonna describe to you the basic form of that experiment that we can run, and there's…

32
00:02:23.990 --> 00:02:29.399
David Bau: you know, many dozens of variations on them. Some of you already asked some of the variations.

33
00:02:29.580 --> 00:02:32.179
David Bau: In reading our book, I'm happy to…

34
00:02:32.570 --> 00:02:39.120
David Bau: Happy to chat about them, but I want to make sure everybody understands the basic part. That's what I'm going to spend most of the time talking about.

35
00:02:39.900 --> 00:02:45.550
David Bau: Seth? So, okay, so the idea is…

36
00:02:47.180 --> 00:02:52.930
David Bau: To do causal tracing, to use causality to trace the calculations.

37
00:02:53.120 --> 00:02:58.569
David Bau: Through the network, to try to get an insight for not only what information is there.

38
00:02:58.870 --> 00:03:01.579
David Bau: But what information is being used?

39
00:03:02.730 --> 00:03:04.800
David Bau: To determine the outcome.

40
00:03:05.140 --> 00:03:06.610
David Bau: To determine the output.

41
00:03:06.880 --> 00:03:14.330
David Bau: The problem with just seeing information as present is that these are huge neural networks.

42
00:03:14.460 --> 00:03:17.880
David Bau: They have thousands of neurons, and so if you ask

43
00:03:17.990 --> 00:03:25.610
David Bau: the question. Oh, is there some information present about the input, about the stimulus in this neural network? The answer is almost always

44
00:03:26.100 --> 00:03:34.559
David Bau: Yes, absolutely. Tons of information is present. But then, if you ask, well, which parts of that information are actually being used?

45
00:03:34.910 --> 00:03:36.659
David Bau: To make a decision.

46
00:03:36.900 --> 00:03:38.709
David Bau: Well, this was unknown.

47
00:03:38.830 --> 00:03:41.650
David Bau: Until we were able to run these types of experiments.

48
00:03:41.950 --> 00:03:47.520
David Bau: People used to think, well, yeah, all the information's present, probably all of it's being used.

49
00:03:47.900 --> 00:03:50.479
David Bau: But actually, these networks are quite picky.

50
00:03:50.940 --> 00:03:57.870
David Bau: They often pick a very small subset of the information to use for the… output.

51
00:03:58.080 --> 00:04:00.350
David Bau: You know, just sort of first order.

52
00:04:00.860 --> 00:04:09.019
David Bau: And so we'll talk about… so this is… this is the experimental design that lets you do it. So… so when we get an understanding of what it is, it's…

53
00:04:09.160 --> 00:04:13.519
David Bau: It's… it's an ability to make changes, so how do you know that…

54
00:04:13.890 --> 00:04:21.750
David Bau: the network is using some piece of information? Well, because you can physically go and change that information.

55
00:04:21.940 --> 00:04:24.309
David Bau: And then see that the network does something different.
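
NOTE
A minimal sketch, in plain PyTorch on a toy network, of the change-and-observe intervention being described; which layer and unit to overwrite are illustrative assumptions, not anything from the lecture.
import torch
import torch.nn as nn
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 3))
x = torch.randn(1, 8)
clean = model(x)                              # ordinary run
def intervene(module, inputs, output):
    out = output.clone()
    out[:, 4] = 0.0                           # overwrite one hidden unit mid-computation
    return out                                # the returned tensor replaces the layer's output
handle = model[1].register_forward_hook(intervene)
patched = model(x)                            # run again with the change in place
handle.remove()
print((clean - patched).abs().max())          # nonzero means the change mattered downstream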

56
00:04:25.060 --> 00:04:29.919
David Bau: Right? And most of the time, when you go and you change some random information in the network.

57
00:04:30.740 --> 00:04:36.349
David Bau: It won't have an effect, or won't have any sensible effect.

58
00:04:36.670 --> 00:04:42.339
David Bau: And so, so yeah, so this is kind of an interesting thing, but…

59
00:04:42.530 --> 00:04:45.350
David Bau: Right? Does that make sense? Okay, so let me make this concrete.

60
00:04:45.730 --> 00:04:51.089
David Bau: So I think that I've shown this diagram before. This is the inside of the transformer, remember? Right.

61
00:04:51.280 --> 00:04:58.719
David Bau: And so I've set this transformer on its side, in the standard way I like to draw it, with layers

62
00:04:59.520 --> 00:05:05.400
David Bau: of neurons sort of going from left to right. So these are the earliest neurons.

63
00:05:05.640 --> 00:05:08.159
David Bau: And those are the latest neural layers.

64
00:05:08.580 --> 00:05:10.750
David Bau: And then you have all the information coming in.

65
00:05:10.920 --> 00:05:13.009
David Bau: Here, as encoded tokens.

66
00:05:13.120 --> 00:05:16.320
David Bau: So, a vector for Miles, and a vector for Davis, and so on.

67
00:05:16.490 --> 00:05:23.979
David Bau: And so… so yeah, so it was… it was sort of amazing to people when they trained language models and they were able to recall facts.

68
00:05:24.480 --> 00:05:36.900
David Bau: about the world, like, what instrument Miles Davis plays, or what instrument Yo-Yo Ma plays, or whatever, right? It's kind of funny for this language model to know something about Yo-Yo Ma, but it does.

69
00:05:37.440 --> 00:05:41.729
David Bau: Okay, so how do we… so, but then it leads you to this question.

70
00:05:42.170 --> 00:05:45.180
David Bau: Where is that knowledge?

71
00:05:46.080 --> 00:05:50.730
David Bau: in the network. Is it sort of… everywhere?

72
00:05:51.000 --> 00:05:53.520
David Bau: If you probe for where's the information.

73
00:05:53.840 --> 00:05:57.880
David Bau: And we'll talk about different ways of probing next week.

74
00:05:58.190 --> 00:06:04.090
David Bau: And so, but, you know, if you probe for the information, the answer will be, yeah,

75
00:06:04.600 --> 00:06:10.959
David Bau: that information is sort of everywhere. Maybe it's stronger in some places than weaker in other places, but the information is sort of everywhere.

76
00:06:11.210 --> 00:06:19.880
David Bau: But… but… so does that mean that the calculation is…

77
00:06:20.240 --> 00:06:22.709
David Bau: This, this information comes in.

78
00:06:23.490 --> 00:06:27.309
David Bau: And then it goes into the neural network, and it's sort of like an information gas.

79
00:06:27.780 --> 00:06:33.399
David Bau: The only thing that matters is, like, the pressure of information in there. It's like, oh, we put a little bit of extra…

80
00:06:33.670 --> 00:06:41.620
David Bau: hydrogen and a little bit of Miles Davis, and it's just mixed, you know, bouncing around in this, you know, formless gas out there, and then mysteriously, like.

81
00:06:41.830 --> 00:06:47.009
David Bau: it squirts the trumpet out the end, who knows how, right? So that… so I feel like, you know.

82
00:06:47.200 --> 00:06:49.340
David Bau: Without this type of research.

83
00:06:49.680 --> 00:06:52.430
David Bau: That's been, more or less, the mental model.

84
00:06:52.550 --> 00:06:57.279
David Bau: for how black box systems work. It's like the mysterious information gas.

85
00:06:57.730 --> 00:07:03.530
David Bau: Right? But what you find is, if you do causal experiments on the inside, you see a lot of structure.

86
00:07:03.700 --> 00:07:09.449
David Bau: It's like an embarrassment of structure. And so, and so let me show you how to do that.

87
00:07:10.160 --> 00:07:18.530
David Bau: Okay, so the basic idea is that we're gonna be like little transplant surgeons. We're gonna transplant data around.

88
00:07:18.960 --> 00:07:21.229
David Bau: To identify what the effects of the data are.

89
00:07:23.980 --> 00:07:29.870
David Bau: So… so, to transplant things, you need to have two… patients.

90
00:07:30.110 --> 00:07:30.880
David Bau: Right.

91
00:07:31.000 --> 00:07:33.719
David Bau: So, we, we clone our neural network.

92
00:07:33.970 --> 00:07:36.409
David Bau: And we run it a second time.

93
00:07:37.360 --> 00:07:41.940
David Bau: So, you run it twice, and you have to have something different

94
00:07:42.060 --> 00:07:43.880
David Bau: about the second time you run it.

95
00:07:44.650 --> 00:07:57.140
David Bau: And so, so here, I'll have a very similar sentence. Miles Davis plays the trumpet, maybe here I'll have another sentence, like, Yo-Yo Ma plays the… something, and then maybe it'll output cello.

96
00:07:57.390 --> 00:08:06.030
David Bau: Something like that. Or I could put in… the form of the experiment that was in the original ROME paper that you guys read was, you put noise in here.

97
00:08:06.190 --> 00:08:08.530
David Bau: And then it just outputs some garbage.

98
00:08:09.080 --> 00:08:13.820
David Bau: Right? Say, blah blah blah plays the… And I, you know…

99
00:08:14.040 --> 00:08:17.740
David Bau: It outputs some perturbation, a piano or something; it has no idea.
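
NOTE
A sketch of that noise corruption from the ROME-style setup: Gaussian noise added to the subject tokens' embeddings. The 3-sigma scale is roughly what the paper uses; shapes and token positions are illustrative assumptions.
import torch
torch.manual_seed(0)
emb = torch.randn(6, 1600)                            # stand-in embeddings: 6 tokens x d_model
subject = slice(0, 2)                                 # assumed positions of "Miles Davis"
emb[subject] += 3 * emb.std() * torch.randn(2, 1600)  # now it reads "blah blah plays the…"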

100
00:08:18.230 --> 00:08:19.200
David Bau: Make sense?

101
00:08:19.630 --> 00:08:27.879
David Bau: And so, but all that matters, though, is that it's not trumpet.

102
00:08:28.290 --> 00:08:32.240
David Bau: But it's a very similar… neural network.

103
00:08:32.570 --> 00:08:35.480
David Bau: It's, like, a very similar brain here.

104
00:08:35.650 --> 00:08:40.569
David Bau: It's similar enough that maybe we could transplant pieces from one to the other.

105
00:08:40.740 --> 00:08:43.059
David Bau: And then, see what happens.

106
00:08:43.179 --> 00:08:51.759
David Bau: Why do you call it corrupted if, for instance, you said, like, Yo-Yo Ma plays the cello? Like, that sounds about right. Corrupted usually means, like.

107
00:08:51.890 --> 00:08:57.639
David Bau: Not right. Yeah, this… so this comes from this experimental design.

108
00:08:57.920 --> 00:09:00.229
David Bau: comes from medicine.

109
00:09:00.490 --> 00:09:06.980
David Bau: Where there's… they usually try to distinguish between A and B.

110
00:09:07.120 --> 00:09:13.890
David Bau: by giving it a more meaningful name than, like, A and B, they'll say something like, the healthy patient.

111
00:09:14.170 --> 00:09:16.749
David Bau: And the sick patient.

112
00:09:17.060 --> 00:09:20.400
David Bau: Right? Or that's healthy and diseased, or something like that.

113
00:09:20.560 --> 00:09:26.840
David Bau: And they'll say, oh, you know, what's the difference between our healthy population and the diseased population? Is it their diet?

114
00:09:26.940 --> 00:09:29.510
David Bau: Well, let's take some diet…

115
00:09:29.970 --> 00:09:42.760
David Bau: activity that the healthy patient has, and we'll force the diseased population to have that thing. We'll, like, come to their house every day, and we'll feed them food, right? Or something like that. So they'll do a causal intervention.

116
00:09:42.970 --> 00:09:47.480
David Bau: And they'll say, oh, look, that was causal, that, like, cured their disease.

117
00:09:47.580 --> 00:09:55.750
David Bau: Right? And so… so what we have here is… we're just imitating the old medical setup.

118
00:09:55.910 --> 00:10:00.089
David Bau: You know, we've, we've sort of adopted

119
00:10:00.300 --> 00:10:05.510
David Bau: the term: this is the clean run, sort of the healthy run.

120
00:10:05.640 --> 00:10:12.490
David Bau: Right? And then the one that you're changing is, like, the diseased run, the one that you're trying out, testing the treatment on.

121
00:10:12.920 --> 00:10:13.650
David Bau: Right.

122
00:10:13.780 --> 00:10:23.000
David Bau: And, I don't know. I do think that this is maybe a little confusing, so sometimes I use source and target, so there's, like, the source.

123
00:10:23.310 --> 00:10:26.649
David Bau: So when you're thinking of cognitive things, it's like the source of the idea.

124
00:10:27.090 --> 00:10:31.150
David Bau: That you're trying to test. Like, this thing has some idea.

125
00:10:31.330 --> 00:10:40.970
David Bau: And then there's a target, which is… it has, like, the context and everything, but you think it doesn't have the idea yet, and you're looking to see if you can find a physical instantiation of the idea.

126
00:10:41.450 --> 00:10:45.240
David Bau: To put down here, to get the target to have the idea.

127
00:10:45.420 --> 00:10:46.700
David Bau: that it wouldn't otherwise have had.

128
00:10:46.880 --> 00:10:51.730
David Bau: That make sense? So… so this target, it doesn't have the idea of Miles Davis.

129
00:10:53.160 --> 00:10:59.019
David Bau: And we're gonna… and we'll treat it as if that was a disease.

130
00:10:59.170 --> 00:11:01.320
David Bau: You know…

131
00:11:01.470 --> 00:11:21.290
David Bau: We're gonna… we're gonna see if we can fix that, and get it to think about Miles Davis, right? And now, there's some obvious things you could do. Like, you could… you could say Miles Davis on the input, and that would… that would get it to say trumpet here on the output, right? But… but what we're gonna do is we're gonna not do that, because we already know that that will happen, right?

132
00:11:21.450 --> 00:11:37.240
David Bau: We're gonna hold that input steady at Yo-Yo Ma, or whatever our corrupted input is. And then what we're gonna ask is we're gonna ask, is there something in the middle of the brain that you can move down that causes it

133
00:11:37.420 --> 00:11:40.090
David Bau: to switch from… corruption

134
00:11:40.350 --> 00:11:45.840
David Bau: to something meaningful. And what will happen is that normally, when you do this,

135
00:11:46.740 --> 00:11:55.659
David Bau: the output is also… it stays corrupted. It stays… it might stay cello or something like this, right? But once in a while, when you hit the right spot.

136
00:11:56.450 --> 00:12:00.330
David Bau: The information will flow through here, and you'll get trumpet.

137
00:12:00.910 --> 00:12:02.000
David Bau: That make sense?

138
00:12:02.190 --> 00:12:07.849
David Bau: So, it's just like, oh, once in a while, like, so for most of the interventions you do for a diseased patient,

139
00:12:08.300 --> 00:12:15.320
David Bau: it doesn't do any good. I wonder if the reason that they're overweight is because of their library card.

140
00:12:15.940 --> 00:12:17.940
David Bau: You know, we're gonna go and distribute

141
00:12:18.680 --> 00:12:25.009
David Bau: you know, different library cards to everybody. Maybe if they have to walk farther to the library, they'll lose weight, who knows?

142
00:12:25.120 --> 00:12:34.330
David Bau: Right? So, no, so give everybody new library cards. Now, that wasn't it. You know, maybe that was a correlative thing. Maybe people who are healthy have different library cards.

143
00:12:34.460 --> 00:12:36.940
David Bau: But, you know, but it wasn't causal.

144
00:12:37.430 --> 00:12:41.959
David Bau: Right? And so, most of these interventions don't have any effect.

145
00:12:42.180 --> 00:12:43.600
David Bau: Right, but once in a while.

146
00:12:43.930 --> 00:12:45.469
David Bau: You might hit on the thing.

147
00:12:45.720 --> 00:12:49.710
David Bau: Right? What's… it may be… maybe it's because of some…

148
00:12:49.820 --> 00:12:53.889
David Bau: Some… some enzyme in your, you know, some…

149
00:12:54.050 --> 00:13:03.819
David Bau: some enzyme in your brain, some, some hormone or something that you've got. I wonder if we inject that thing into you. Oh, this… what is this GLP-1 thing?

150
00:13:03.900 --> 00:13:17.670
David Bau: You know, maybe if we inject that into you, it'll cause you to lose weight. Holy cow! It does! Like, 40% weight off, right? So you find the causal variable. That happens once in a while; out of the millions of variables, there might be one.

151
00:13:18.410 --> 00:13:21.239
David Bau: Make sense? So that's kind of what we're trying to do.

152
00:13:22.000 --> 00:13:26.119
David Bau: Alright, so… How, like, how do you do this in practice?

153
00:13:26.240 --> 00:13:29.500
David Bau: Well… Where's Adam?

154
00:13:30.160 --> 00:13:39.039
David Bau: Thank you, Adam. Adam just put this website on. Will you try typing this into your computers? You could literally try this.

155
00:13:39.210 --> 00:13:39.990
David Bau: Right?

156
00:13:40.290 --> 00:13:43.830
David Bau: And, you know, I don't know, I can maybe…

157
00:13:44.430 --> 00:13:46.820
David Bau: Should I try to? I'll come along with you.

158
00:13:47.810 --> 00:14:04.809
David Bau: So let's try this out. All right, Adam, see if the website stays up with, like, 10 people scanning it. And so we're gonna… oh, it's not in this menu yet, it's okay. So let's… so I'm just gonna put a little thing here, I'll say, you know, Miles Davis…

159
00:14:05.750 --> 00:14:07.220
David Bau: plays, right?

160
00:14:07.850 --> 00:14:17.419
David Bau: plays the… something, right? You know, whatever. And so, I'll adjust… I'll adjust this as I get into it. There's different options once I get into it. So, I'm gonna say I'm human.

161
00:14:17.520 --> 00:14:19.139
David Bau: It's very sophisticated now.

162
00:14:19.370 --> 00:14:26.460
David Bau: Okay, and then… oh, I have to click on three icons that differ from the others. This is diff… oh, no, I missed it. This one, this one.

163
00:14:26.600 --> 00:14:27.770
David Bau: This one, okay.

164
00:14:28.220 --> 00:14:29.560
David Bau: Such a bad clicker.

165
00:14:30.480 --> 00:14:40.409
David Bau: Devices that depend on battery power? I don't know. Like, it depends. Because, like, if you have, like…

166
00:14:40.910 --> 00:14:43.920
David Bau: Oh my god. Like, if you have, like…

167
00:14:44.320 --> 00:14:47.909
David Bau: You know, if you have an old camera, I don't know if you need batteries.

168
00:14:48.270 --> 00:14:53.890
David Bau: I guess I had the flash. Okay, so, what I'm gonna do is, so there's this new tool down here that

169
00:14:54.200 --> 00:14:56.920
David Bau: That Adam just added is very hidden.

170
00:14:57.230 --> 00:15:06.840
David Bau: But it's down here, see this activation patching button? Right. And then what is this? There's two places to enter text. Let me enter… okay, so I'll say Miles Davis.

171
00:15:07.570 --> 00:15:09.130
David Bau: Right? Is what we're doing, right?

172
00:15:09.350 --> 00:15:10.530
David Bau: plays…

173
00:15:11.050 --> 00:15:21.100
David Bau: What is the… okay, it's gonna be… okay, that's the challenge for… I'm gonna… I'm gonna use GPT-J 6B, because it's a little smarter than the old one… Miles David? Oh, Miles David.

174
00:15:21.380 --> 00:15:30.420
David Bau: I wonder if Miles David plays. Okay, so it won't know who Miles David is, so if I click out here, I can add this in. Okay, thank you. Nice. Nice. Okay.

175
00:15:30.870 --> 00:15:34.390
David Bau: And then if I say, Yo Yo Ma, right?

176
00:15:35.010 --> 00:15:36.290
David Bau: Omaha, right?

177
00:15:36.600 --> 00:15:38.390
David Bau: Plays the, okay.

178
00:15:38.490 --> 00:15:46.669
David Bau: So this is nice, right? Right? And can I run this yet? No, there's no experiment yet, because this is a causal mediation experiment, so I have to set up something, so if I click on…

179
00:15:46.890 --> 00:15:47.910
David Bau: Davis…

180
00:15:48.170 --> 00:16:06.449
David Bau: I go to Ma, okay? Now, what I'm gonna do is… what this is doing is it's gonna say, what if I take the representation that's sitting in the transformer when it's processing this token, and then move it over to the other transformer when it's processing this token? I'll show you what it looks like in the diagram in a minute. And you run it.
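
NOTE
A sketch of that patch in the style of nnsight, the library whose code is walked through later in the lecture; the model name, layer index, and token positions are assumptions for illustration, and older nnsight versions read a saved proxy via .value.
from nnsight import LanguageModel
model = LanguageModel("EleutherAI/gpt-j-6b", device_map="auto")
with model.trace("Miles Davis plays the"):
    # save the layer-12 hidden state at the "Davis" position (assumed index 1)
    davis = model.transformer.h[12].output[0][0, 1, :].save()
with model.trace("Yo Yo Ma plays the"):
    # overwrite the "Ma" position (assumed index 2) with the saved vector
    model.transformer.h[12].output[0][0, 2, :] = davis
    logits = model.lm_head.output[0, -1, :].save()
# after the trace, logits holds next-token scores: does " trumpet" win now?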

181
00:16:06.780 --> 00:16:07.460
David Bau: Then…

182
00:16:07.780 --> 00:16:17.830
David Bau: It'll scan through all the layers, it'll do this, it's actually gonna set up something like 30 different experiments, and it'll run them all here, and it shows me this. Oh, look, there's a new Y scale, thank you very much.

183
00:16:18.070 --> 00:16:22.530
David Bau: Okay, so, so what is it saying? It's saying…

184
00:16:22.740 --> 00:16:31.110
David Bau: that if I went to the last layer of the transformer, and I said, okay,

185
00:16:31.270 --> 00:16:34.320
David Bau: You know, the transformer is grid-shaped, and…

186
00:16:34.530 --> 00:16:39.349
David Bau: And then when it processes Davis, up to the last layer.

187
00:16:39.450 --> 00:16:46.830
David Bau: Right? And I take that last-layer representation of Davis, and I patch it over to the last-layer representation of Ma.

188
00:16:47.670 --> 00:16:54.010
David Bau: Right? And then I run the network, then the output is high probability for red and low probability for blue.

189
00:16:54.090 --> 00:17:07.680
David Bau: Red is cello. So, right? Well, it's missing the o? Because the word's split; it's about to say the o, right? So, so it's… so it says cello, so it doesn't work, right? So if I patch over…

190
00:17:08.130 --> 00:17:11.170
David Bau: at the last layer, from Miles Davis

191
00:17:11.770 --> 00:17:12.880
David Bau: to Yo-Yo Ma,

192
00:17:13.970 --> 00:17:16.600
David Bau: it doesn't have any effect. It stays…

193
00:17:17.369 --> 00:17:20.809
David Bau: cello. It still thinks about Yo-Yo Ma.

194
00:17:21.190 --> 00:17:21.960
David Bau: Right?

195
00:17:22.119 --> 00:17:24.640
David Bau: But then there's this interesting transition. If I…

196
00:17:25.130 --> 00:17:28.999
David Bau: patch earlier and earlier layers, there's this very distinct moment where

197
00:17:30.210 --> 00:17:34.520
David Bau: If I patch over, like, layer 12 or 13, right?

198
00:17:34.950 --> 00:17:37.300
David Bau: That thing flips over

199
00:17:38.030 --> 00:17:39.170
David Bau: to say trumpet.

200
00:17:39.830 --> 00:17:45.130
David Bau: Right? So this is an n-equals-1 experiment, but even at n equals 1, that's a very, very…

201
00:17:45.800 --> 00:17:50.760
David Bau: I mean, there's 50,000 tokens, so it's not some accident that it says trumpet.

202
00:17:50.880 --> 00:17:53.140
David Bau: Right? They had something to do with Miles Davis.

203
00:17:53.460 --> 00:17:54.460
David Bau: Does that make sense?

204
00:17:54.900 --> 00:18:02.889
David Bau: Even though we've only run one experiment here, and obviously you could run hundreds more to get agreement on whether this is a systematic effect.

205
00:18:03.290 --> 00:18:08.120
David Bau: But here we're just running one. And… and so there's something that if you patch over

206
00:18:08.620 --> 00:18:14.709
David Bau: the computation of what is being thought here, it'll happen here. Now, if you go to the very first layer.

207
00:18:14.940 --> 00:18:17.140
David Bau: Right, and you patch over Davis.

208
00:18:17.660 --> 00:18:28.369
David Bau: It does… it bumps trumpet up a little bit, but it's… these are all near zero. And cello is near zero, trumpet's near zero, right? The model would…

209
00:18:28.730 --> 00:18:30.709
David Bau: This is what corruption looks like.

210
00:18:31.070 --> 00:18:35.480
David Bau: The model's not happy saying either trumpet or cello.

211
00:18:35.650 --> 00:18:47.660
David Bau: There's probably some way that we can go into the user interface to see, like, if there's some other word that it's happy saying, but… but it's really not happy saying either one of these. And… and why is that? Because

212
00:18:47.860 --> 00:18:53.000
David Bau: It would be… if you… if you patch over at, like, layer 0, that's, like…

213
00:18:53.180 --> 00:18:56.930
David Bau: Hitting the transformer right at the moment where it perceives this input.

214
00:18:57.060 --> 00:19:02.360
David Bau: It's like patching over the, like, the auditory signal or something like that. It's like, you know, changing the ear.

215
00:19:02.560 --> 00:19:06.289
David Bau: Right? And so, that's like asking the model.

216
00:19:06.390 --> 00:19:11.950
David Bau: you know, Yo-Yo Davis plays the something, right? So if you ask, like, Yo-Yo Davis.

217
00:19:12.060 --> 00:19:13.999
David Bau: Then it's like, I don't know!

218
00:19:14.720 --> 00:19:15.990
David Bau: Maybe the trumpet?

219
00:19:16.180 --> 00:19:19.900
David Bau: Maybe the cello? Probably neither one, I'm not really sure what to say.

220
00:19:20.390 --> 00:19:21.320
David Bau: Does that make sense?

221
00:19:21.430 --> 00:19:24.960
David Bau: So, so at this moment, it doesn't really… like…

222
00:19:25.410 --> 00:19:28.770
David Bau: have the whole concept of Miles Davis in this place.

223
00:19:28.930 --> 00:19:34.090
David Bau: It's like, it leads to some corruption. But weirdly, as you get to layer 12,

224
00:19:34.500 --> 00:19:40.280
David Bau: it's very strong. Like, as soon as you, like, hit layer 12, then patching over this one

225
00:19:40.530 --> 00:19:44.630
David Bau: token representation makes it think coherently.

226
00:19:44.910 --> 00:19:48.650
David Bau: I don't know if it's thinking coherently about Miles Davis, but it's certainly thinking coherently about a

227
00:19:48.800 --> 00:19:49.830
David Bau: trumpet player.

228
00:19:51.220 --> 00:19:52.350
David Bau: Does that make sense?

229
00:19:54.290 --> 00:19:55.750
David Bau: Interesting, right?

230
00:19:56.170 --> 00:19:57.299
David Bau: Is that weird?

231
00:19:57.580 --> 00:19:58.770
David Bau: It's weird, right?

232
00:20:00.280 --> 00:20:05.889
David Bau: No, I think it's weird. You know… so, I'm very proud

233
00:20:06.960 --> 00:20:09.370
David Bau: of being the first one to run this experiment.

234
00:20:10.410 --> 00:20:23.150
David Bau: Yes. So, you know… So, okay, go ahead. Is the patching at the level of individual neurons, or the whole layer? So, it's at… let me bring up the slideshow again.

235
00:20:24.380 --> 00:20:28.119
David Bau: So this is the… this is… one of the core things from the ROME paper.

236
00:20:28.770 --> 00:20:32.100
David Bau: And, and… and we'll put it right here.

237
00:20:33.200 --> 00:20:35.900
David Bau: Okay, so where are we? We're right here.

238
00:20:36.480 --> 00:20:40.419
David Bau: So this diagram answers the question. I'll just go back to this.

239
00:20:45.050 --> 00:20:52.350
David Bau: So yeah, so here, we're patching at the unit of one of these dots.

240
00:20:52.450 --> 00:21:04.389
David Bau: And this dot is about 1,000 to 10,000 neurons, depending on which transformer you're working with. So when they say, you know, sometimes the cool kids will say, hey, what's your d_model?

241
00:21:04.660 --> 00:21:07.830
David Bau: d_model, right? You know, what's your d_model?

242
00:21:07.930 --> 00:21:14.279
David Bau: And, like, bigger models will have a larger dimension, and smaller models will have a smaller dimension.

243
00:21:14.460 --> 00:21:20.629
David Bau: And, like, a small d_model would be, like, a thousand. Oh, it means we're using a thousand neurons to represent a word.

244
00:21:20.920 --> 00:21:38.859
David Bau: Right? And then a large d_model is, like, 10,000. Oh, that's a big model. You've got, like, a… your d_model's, like, 20,000. I heard a rumor that… I heard a rumor that, you know, the new Chinese models are using a d_model of 20,000. That's crazy! How much memory is that? Right? Okay, so… right, that's… that's, like, what people will say. Make sense?

245
00:21:39.350 --> 00:21:52.250
David Bau: Okay, so this is… the number of neurons you're carrying is the d_model, and… but it's not, like, the whole layer, because the layer has a different representation for every column of tokens.

246
00:21:52.620 --> 00:21:53.510
David Bau: Make sense?

247
00:21:53.700 --> 00:21:55.540
David Bau: And so we're bringing that one column over.
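
NOTE
Concretely, each layer's hidden state is one tensor of shape [batch, n_tokens, d_model], and the "dot" being patched is a single token's column of it; a sketch with illustrative, GPT-J-like sizes.
import torch
hidden = torch.randn(1, 6, 4096)     # one string in the batch, 6 tokens, d_model = 4096
column = hidden[0, 1, :]             # the one dot: token position 1, all 4096 neurons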

248
00:21:55.990 --> 00:22:01.929
David Bau: Okay, so that's how this experiment runs. Now, you can do causal mediation analysis on any subset

249
00:22:02.040 --> 00:22:15.189
David Bau: you want, and there are people who do very fancy things. So you could take one of these wires and patch it over, and so there's some experiments that you see in some of the other papers that, like, take… bring over one of these attention wires

250
00:22:15.370 --> 00:22:26.859
David Bau: and patches it over. You can bring over just the MLP wires, I'll show you one of those in a minute. You can… you can slice these neurons in a different way. Instead of saying, I'm going to patch over all thousand neurons.

251
00:22:27.180 --> 00:22:39.990
David Bau: you could do linear algebra on it, and patch over, like, a subspace of the neurons. If you feel like the neurons are working like a vector algebra, then you could say, oh, I want some components of that vector to stay the same, and other components to

252
00:22:40.140 --> 00:22:45.109
David Bau: To be patched, so you could do fancy math on it, and so some people pursue that theory.
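
NOTE
A sketch of that subspace variant: build a projector onto a chosen k-dimensional subspace and swap only that component of the hidden state. The random basis is purely illustrative; a real experiment would choose the subspace deliberately.
import torch
torch.manual_seed(0)
d, k = 16, 3                                   # real d_model is more like 1,000-10,000
h_src, h_tgt = torch.randn(d), torch.randn(d)
U, _ = torch.linalg.qr(torch.randn(d, k))      # orthonormal basis for the subspace
P = U @ U.T                                    # projector onto span(U)
h_patched = h_tgt + P @ (h_src - h_tgt)        # subspace part from source, the rest from target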

253
00:22:45.540 --> 00:22:49.310
David Bau: So… so there's different ways you can do this, but this is the basic experiment.

254
00:22:49.810 --> 00:22:50.700
David Bau: Make sense?

255
00:22:51.770 --> 00:22:58.150
David Bau: All right. Oh, and Ananya asks, why does causal inference… Where's Ananya?

256
00:22:58.880 --> 00:23:02.070
Ananya Malik: I'm here, actually, online right now. Hello?

257
00:23:02.470 --> 00:23:04.230
David Bau: Yes, I can hear you.

258
00:23:04.230 --> 00:23:05.830
Ananya Malik: Hi. Sorry.

259
00:23:06.470 --> 00:23:08.379
David Bau: What was your, what was your full question?

260
00:23:09.140 --> 00:23:12.440
Ananya Malik: Yeah, I guess. I wanted to know why does it affect, like.

261
00:23:12.580 --> 00:23:18.669
Ananya Malik: Despite the same architecture, why does it affect… why does causal inference, like, affect different layers differently?

262
00:23:18.790 --> 00:23:21.839
Ananya Malik: I think that was a little hard for me to, like.

263
00:23:21.840 --> 00:23:29.100
David Bau: And does the diagram make it clear? So you see this, like, elbow here? Right?

264
00:23:29.300 --> 00:23:34.300
David Bau: So this is… this is my mental model for why it might behave differently.

265
00:23:34.420 --> 00:23:36.250
David Bau: So, if the information

266
00:23:37.060 --> 00:23:46.620
David Bau: flows, like, flows through here. So, like, this light blue diagram is my… is actually sort of my mental model for how information is flowing in this,

267
00:23:46.900 --> 00:23:49.690
David Bau: in this system.

268
00:23:50.090 --> 00:23:53.629
David Bau: Before you get to layer 12 or something like this.

269
00:23:53.750 --> 00:24:03.489
David Bau: is sort of diffuse. Miles Davis is just, like, this diffuse thing across tokens, but then something happens at around layer 12, which consolidates information at the token Davis.

270
00:24:03.640 --> 00:24:07.150
David Bau: Right? Of, like, who this person is, or what instrument they play, or something.

271
00:24:07.270 --> 00:24:10.739
David Bau: Right? And then… and then, and then that information, like.

272
00:24:11.420 --> 00:24:13.830
David Bau: Hangs around at this layer for a little while.

273
00:24:14.190 --> 00:24:15.939
David Bau: Until layer 20 or something?

274
00:24:16.160 --> 00:24:17.620
David Bau: And then, all of a sudden.

275
00:24:18.950 --> 00:24:22.249
David Bau: The information flow path moves over to a different token.

276
00:24:23.280 --> 00:24:29.880
David Bau: And the information may still be there at layer… at this… at this token, but after layer 20, it's not on the path anymore.

277
00:24:30.340 --> 00:24:35.370
David Bau: So if we go and we do a causal intervention too late, right here.

278
00:24:36.060 --> 00:24:38.080
David Bau: Then it no longer has an effect.

279
00:24:38.600 --> 00:24:39.150
Ananya Malik: Nice.

280
00:24:39.640 --> 00:24:42.050
David Bau: So it would be… it would be like,

281
00:24:44.180 --> 00:24:46.559
David Bau: It'd be like, hey, you know what? I'll get you to lose weight.

282
00:24:47.000 --> 00:24:56.910
David Bau: By… by bringing you to the gym. I'm gonna send a car to your house, and I'll bring you to the gym every day. And my gosh, it works! If you send a car to my house at, like, 6 AM,

283
00:24:57.090 --> 00:25:05.360
David Bau: But if you send the car to my house at, like, 2 in the afternoon, well, it doesn't do anything, because I'm not there, right? You know, I'm, like…

284
00:25:05.460 --> 00:25:08.829
David Bau: I'm going… I'm off at the restaurant, pigging out.

285
00:25:09.330 --> 00:25:12.270
David Bau: And, and so it doesn't help me lose weight.

286
00:25:12.420 --> 00:25:13.420
David Bau: Does that make sense?

287
00:25:13.550 --> 00:25:15.710
David Bau: Yeah, so that's basically what's happening.

288
00:25:16.400 --> 00:25:16.940
David Bau: Okay.

289
00:25:16.940 --> 00:25:17.510
Ananya Malik: Awesome.

290
00:25:17.990 --> 00:25:19.829
David Bau: Yes, just one thing? Yes.

291
00:25:20.120 --> 00:25:35.130
David Bau: when you presented this before giving the results, my intuition was that the latest layers would be the easiest place to influence the output, but it seems like I'm wrong about that. Wow. And I don't understand why I'm wrong. Okay, here's why. Because it's a two-dimensional grid.

292
00:25:35.270 --> 00:25:47.599
David Bau: And so, if you… actually, I have… I have that result. I'll show you the result. It's a good… it's a good question. Oh, before I show you the result, let me come back to it. Okay, sure. Okay. So there was another question.

293
00:25:47.700 --> 00:25:54.800
David Bau: There's another couple questions I just didn't want to just get into. Patch corrupt to clean, instead. So, Bryce, what was your question?

294
00:25:55.140 --> 00:25:59.789
David Bau: I thought this would be a good moment to have it. Oh, I was just asking, right now we're doing…

295
00:26:00.060 --> 00:26:00.970
David Bau: Oh.

296
00:26:01.310 --> 00:26:02.669
David Bau: We're running the corrupted run,

297
00:26:03.190 --> 00:26:05.939
David Bau: patching the, like, clean run's activation in.

298
00:26:06.120 --> 00:26:08.410
David Bau: Yes. Would it be different if you're doing a…

299
00:26:08.470 --> 00:26:20.479
David Bau: run with the clean prompt, and, like, patching the corrupted one in? Would you see basically the reverse? Yeah. So, you know, I think that my answer to that is more or less, oh,

300
00:26:20.480 --> 00:26:40.379
David Bau: There's no real difference between the clean and corrupted run, right? It's just a naming convention, and so that would be the equivalent of just, like, oh, you put Yo-Yo Ma up here, and you put Miles Davis here, and you just run the experiment, and then, you know, now you're taking, like, now, it would just flip the red and blue lines, and, you know, normally, without a patch, Miles Davis plays a trumpet, and you're bringing the idea of a cello player into it.

301
00:26:40.420 --> 00:26:46.309
David Bau: And so it's just switching. But now you might call that one the clean run, and the other one the corrupted run, just because.

302
00:26:46.700 --> 00:26:50.480
David Bau: You know, if there's a source for the patch and a target for patch, it's just what we're calling it.

303
00:26:50.580 --> 00:27:10.460
David Bau: That makes sense. Now, you could do more complicated things. You can do multiple patches at once, you know, just like you could imagine, oh, you know, we have some medical intervention where you take this drug, but at the same time you do exercise, and at the same time you do other things, and so, you know, we can do this type of thing here. I'll describe some more complicated experiments soon. Yes?

304
00:27:10.940 --> 00:27:15.789
David Bau: Oh, is it kind of like, like, so, like, when I'm baking and I want to make, like, a chocolate cake.

305
00:27:15.790 --> 00:27:33.940
David Bau: it's like, I have to add the cocoa powder, like, pretty early on, like, when it's still a batter. Yeah. Because, like, by the time it's already baked, like, that's just a vanilla cake with cocoa powder on top; you get a different thing. Right, yeah, exactly. And so that's a good way of thinking about it. And then the weird thing about transformers is not only do you get a different thing, but because they're a grid.

306
00:27:34.040 --> 00:27:52.179
David Bau: You know, actually, a lot of the information out here just sort of gets thrown out, right? So it's like, oh, it's just the last token prediction that you care about, and influencing these things is no longer relevant to that. Is that why everyone's into, like, a mixture of experts now? You know, it's, you know, yeah, maybe. It's,

307
00:27:52.670 --> 00:27:53.460
David Bau: Yeah.

308
00:27:54.800 --> 00:27:55.550
David Bau: Okay.

309
00:27:56.240 --> 00:28:12.419
David Bau: So, like, for example, here we're patching from Davis to Ma. Yes. And that sort of, like, tells you where… when does the model, like, actually, like, fetch the representation and make it flow to the last token. Yes. But then if you were patching from last token to last token.

310
00:28:12.570 --> 00:28:18.339
David Bau: Would that just be, like, a convoluted signal, and you would not be able to, like, isolate Davis?

311
00:28:18.540 --> 00:28:21.200
David Bau: Alright, you're asking Jasmine's other question.

312
00:28:21.950 --> 00:28:28.840
David Bau: How do I know about locations, right? So I'll just answer that. So that's the same question that you asked, right?

313
00:28:28.960 --> 00:28:31.220
David Bau: So… so the… so…

314
00:28:31.370 --> 00:28:37.869
David Bau: So one thing that Adam hasn't put in this UI yet, but will probably show up in the next week or two, right, is,

315
00:28:38.140 --> 00:28:49.169
David Bau: is… it might be interesting to scan this experiment over multiple tokens and then compare them. So what happens if we, like, transplant Miles to Yo?

316
00:28:49.610 --> 00:29:01.409
David Bau: And what happens if we transplant 'the' to 'the', and 'plays' to 'plays', and stuff like that? Like, you know, would any of these have the effect? Like, why did I pick Davis to bring over to Ma? How about 'plays' to 'plays'?

317
00:29:01.540 --> 00:29:10.779
David Bau: Maybe when we get to 'plays', you're thinking about, oh yeah, what instrument they're playing, right? You know, and I move it over to this 'plays'. Maybe this is, like, maybe this 'plays', by the time you get to layer 12,

318
00:29:10.970 --> 00:29:19.859
David Bau: is really about trumpet playing, not just regular playing, it's about trumpet playing. 'Plays' could mean so many things. It could mean… it could mean playing in a playground.

319
00:29:20.170 --> 00:29:30.110
David Bau: Right? It could mean lots of other stuff, but maybe it means trumpet playing by now. And if we transport it here, it means cello playing. Who knows? You could have a lot of hypotheses, and so why not…

320
00:29:30.240 --> 00:29:38.319
David Bau: run that experiment, too. And so, obviously, you could, right? So you could actually do it right now, take a second, just drag the arrow from there to there, and you can see what happens.

321
00:29:38.430 --> 00:29:41.669
David Bau: Right. So, this is a common experimental form.

322
00:29:41.980 --> 00:29:46.879
David Bau: And… and so that's what this feedback is. So basically, what this is,

323
00:29:47.090 --> 00:29:52.570
David Bau: is it's saying, oh, if I… if I patch

324
00:29:52.930 --> 00:30:03.909
David Bau: this token to something, and that token to something, and Ma to something, then what happens? If I patch 'works' to 'works' and 'in' to 'in'… right, so you have two parallel sentences.

325
00:30:05.520 --> 00:30:23.220
David Bau: Here, I'll put a clearer form here. Shaquille O'Neal or Megan Rapinoe plays the sport of, right, so the two sentences both say plays the sport of, but one of them is spelling out Shaquille O'Neal, the other one is spelling out Megan Rapinoe, and you can patch from Megan Rapinoe to Shaquille O'Neal and see what happens.
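
NOTE
A sketch of that scan as a loop over aligned token positions, in the same nnsight style as the earlier sketch (reusing its model); it assumes the two prompts tokenize to the same length, and that layer 10 is the layer being probed.
src = "Shaquille O'Neal plays the sport of"
tgt = "Megan Rapinoe plays the sport of"
n_tokens = 8                                   # assumption: both prompts give 8 tokens
for pos in range(n_tokens):
    with model.trace(src):
        v = model.transformer.h[10].output[0][0, pos, :].save()
    with model.trace(tgt):
        model.transformer.h[10].output[0][0, pos, :] = v   # same position, other run
        probs = model.lm_head.output[0, -1, :].softmax(-1).save()
    # compare probs at the " soccer" and " basketball" token ids for each pos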

326
00:30:23.340 --> 00:30:24.180
David Bau: Right?

327
00:30:24.440 --> 00:30:28.869
David Bau: And… and most of the time, if you patch over these early things.

328
00:30:29.430 --> 00:30:31.509
David Bau: It doesn't make it say soccer.

329
00:30:31.780 --> 00:30:33.430
David Bau: It just doesn't, it doesn't work.

330
00:30:33.690 --> 00:30:35.680
David Bau: And it's because of the reason I…

331
00:30:36.390 --> 00:30:44.500
David Bau: you know, described to you earlier. It would be like, oh, Megan O'Neill plays the sport of, like, I don't know who Megan O'Neill is, right? That makes sense?

332
00:30:44.800 --> 00:30:50.220
David Bau: Right, but the weird thing is, if you get to layer 10 or something, and then you patch over a single token.

333
00:30:50.420 --> 00:30:52.490
David Bau: Then it does switch to soccer.

334
00:30:53.320 --> 00:30:55.960
David Bau: So there's something interesting happening around there.

335
00:30:56.090 --> 00:31:07.089
David Bau: And then… but then you can repeat the experiment everywhere. I wonder about 'plays'. Like, maybe soccer playing is a special word, so… so I was actually very excited to see if it, like, refined this idea of 'plays'

336
00:31:07.180 --> 00:31:20.829
David Bau: to be this, but I didn't see any effect here. This is just good old 'plays'. This has nothing to do with soccer or basketball. It doesn't change it if you patch it over. But where patching does change it is right over here, so right at the end.

337
00:31:21.080 --> 00:31:22.870
David Bau: But this is a different 'of'.

338
00:31:23.430 --> 00:31:24.720
David Bau: When you patch it over.

339
00:31:25.180 --> 00:31:26.669
David Bau: But when you get to the last layer.

340
00:31:27.000 --> 00:31:34.190
David Bau: So this 'of' here… it transforms from an 'of' where you very much want to say basketball after it

341
00:31:34.480 --> 00:31:37.360
David Bau: to an 'of' where you very much want to say soccer after it.

342
00:31:37.840 --> 00:31:43.609
David Bau: Yes. Could it be, like, it's a different… it's a different 'plays', it's just not reading from that?

343
00:31:44.030 --> 00:31:51.869
David Bau: Yeah, I'm sure that it's different. So if you… so there's other visualizations that you can do, like, when you…

344
00:31:52.010 --> 00:31:53.270
David Bau: look…

345
00:31:53.390 --> 00:32:01.540
David Bau: at the raw vector activations, it would be very interesting to make a scatterplot or something of the different 'plays' vectors, and I'm sure you would see that these 'plays'

346
00:32:02.130 --> 00:32:06.690
David Bau: Like, are different from each other. But they're not different in a way that makes it say soccer.

347
00:32:08.180 --> 00:32:12.580
David Bau: That's basically what the experiment is saying. So… so,

348
00:32:13.790 --> 00:32:19.159
David Bau: Grace is always asking about probing questions.

349
00:32:19.330 --> 00:32:23.829
David Bau: And, and, I'm not an adaptester.

350
00:32:24.050 --> 00:32:36.420
David Bau: And one of the probing questions is, you know, is there some information in here about the difference between the soccer player and the basketball player at the word 'plays'? And there's probably some information there about it.

351
00:32:36.530 --> 00:32:41.239
David Bau: Right. But then what this experiment is showing is that whatever that information is, is not…

352
00:32:41.490 --> 00:32:48.380
David Bau: very causal, at least not on its own, at making it say

353
00:32:48.720 --> 00:32:50.699
David Bau: soccer instead of basketball.

354
00:32:51.430 --> 00:32:52.240
David Bau: Sense?

355
00:32:53.020 --> 00:32:57.900
David Bau: Yes. It's like a… like a library card at the wrong time.

356
00:32:58.040 --> 00:32:58.730
David Bau: Okay.

357
00:32:58.940 --> 00:33:01.860
David Bau: So, okay.

358
00:33:01.980 --> 00:33:06.970
David Bau: Gosh, there's a bunch of other questions, let me see what other questions I missed.

359
00:33:07.090 --> 00:33:08.590
David Bau: Jasmine had a question.

360
00:33:09.150 --> 00:33:11.310
David Bau: Oh, oh, an example!

361
00:33:11.640 --> 00:33:12.510
David Bau: Clarity.

362
00:33:12.890 --> 00:33:14.589
David Bau: You wanted a code example.

363
00:33:14.960 --> 00:33:18.359
David Bau: Let's show you a code example. Like, how… how complicated is this?

364
00:33:19.500 --> 00:33:22.239
David Bau: To put together. We have some code somewhere. Here's the code.

365
00:33:22.620 --> 00:33:27.100
David Bau: Okay, so here's what the code looks like. So, I'm gonna show you 3 pieces of code, they were written by Adam.

366
00:33:27.210 --> 00:33:33.200
David Bau: This code is probably a little bit better written than code that I would write, but I can walk you through what it is.

367
00:33:33.320 --> 00:33:46.060
David Bau: And so, this is, like, the experiment that we're running. It's a general experiment form, activation patching, and you have a source, and you have a target piece of text that you're going to run through the same language model.

368
00:33:46.230 --> 00:33:49.950
David Bau: But you'll think of these as sort of two different instances of the language model.

369
00:33:50.190 --> 00:34:01.200
David Bau: And then there's a source patch and a target patch, and you can… so Adam's written some pretty fancy code here saying that, oh, you might want to patch over multiple tokens and things, but just think of each one of these as an integer. We're just going to patch from one token

370
00:34:01.810 --> 00:34:06.879
David Bau: to one other token. One integer, one index into the source, and one index into the target.

371
00:34:07.120 --> 00:34:07.830
David Bau: Okay.

372
00:34:08.139 --> 00:34:08.949
David Bau: And…

373
00:34:09.120 --> 00:34:15.630
David Bau: And then it's gonna output some… some stuff, right? It's gonna output, like, how does the LLM output change?

374
00:34:15.780 --> 00:34:24.020
David Bau: If we… if we insert this idea from the source to the target. And the way it's gonna output it is it's gonna output these logits… these output probabilities.

375
00:34:24.400 --> 00:34:36.280
David Bau: And so it'll tell us, you know, the LLM now, after you've done this experiment, now really wants to say basketball, or it really wants to say cello, or whatever, right? And so we'll get those probabilities out, okay?
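
NOTE
A generic sketch of that last step, turning final-position logits into next-token probabilities; the vocabulary size and token ids here are illustrative assumptions.
import torch
logits = torch.randn(1, 5, 50400)             # stand-in output: [batch, tokens, vocab]
probs = logits[0, -1, :].softmax(dim=-1)      # distribution over the next token
# then read off, e.g., probs[trumpet_id] vs probs[cello_id] using the tokenizer's ids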

376
00:34:36.389 --> 00:34:49.240
David Bau: And so, so the way we do it is there's sort of two passes. So, after we declare the thing, this function is gonna… it was too much code, so I put it into two slides. So the first part of the code

377
00:34:49.400 --> 00:34:52.080
David Bau: is, you run the source

378
00:34:52.989 --> 00:34:54.309
David Bau: to collect.

379
00:34:55.110 --> 00:34:57.050
David Bau: the activations.

380
00:34:57.170 --> 00:34:58.160
David Bau: That's all you do.

381
00:34:58.400 --> 00:35:08.279
David Bau: And so, this is like, okay, here's the source model, well, the source and the target model are the same, but we're gonna actually run the source prompt into the model. So we say, okay.

382
00:35:08.390 --> 00:35:11.160
David Bau: Think about Miles Davis, right?

383
00:35:11.430 --> 00:35:21.239
David Bau: And then, because Adam wants to automatically do all the experiments for you, at every layer, he patches… he just goes over all the layers.

384
00:35:21.580 --> 00:35:26.870
David Bau: And for each layer, he… he unpacks the activation at the layer.

385
00:35:27.040 --> 00:35:35.119
David Bau: You know, sometimes they're tuples for some models, and sometimes they're not, but anyway, he, like, pulls out the activation at the layer.

386
00:35:35.580 --> 00:35:42.899
David Bau: And then he just… it says, I only care about one token's activations. So, nope,

387
00:35:43.080 --> 00:35:45.520
David Bau: You know, don't use up memory, saving them all.

388
00:35:45.710 --> 00:35:48.700
David Bau: I just, like… I just care about token number 5?

389
00:35:49.060 --> 00:35:50.879
David Bau: Whatever this is, right?

390
00:35:51.210 --> 00:35:58.099
David Bau: And then it just sticks it into an array. So, each one of these is a vector, so 0 means it's the zeroth string

391
00:35:58.390 --> 00:36:07.030
David Bau: in the batch, so the string is Miles Davis plays the… there's no other string, so it's the 0th one. And then this says which token it is, like, token number whatever, token number 3.

392
00:36:07.260 --> 00:36:16.009
David Bau: Right? And then… and then this colon says, oh, every element of that vector, that's the d_model, so that's, like, a 1,000-element vector.

393
00:36:17.040 --> 00:36:20.990
David Bau: That make sense? So we take this 1,000-dimensional vector, we just jam it in the array.
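
NOTE
A sketch of the indexing just described, with illustrative shapes; the tuple check mirrors the models whose layers return tuples.
import torch
layer_output = (torch.randn(1, 5, 1024),)      # pretend layer output; a tuple for some models
hidden = layer_output[0] if isinstance(layer_output, tuple) else layer_output
vec = hidden[0, 3, :]                          # string 0 in the batch, token 3, all 1,024 neurons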

394
00:36:21.670 --> 00:36:27.890
David Bau: And oh, don't worry about the else. The else is like, oh, if we, if we want multiple tokens.

395
00:36:28.030 --> 00:36:33.999
David Bau: Okay, and then, and then we can actually, after the model's finished running,

396
00:36:34.210 --> 00:36:43.559
David Bau: we can say, oh, what was its LM head output? Basically, what was its decoder output? What was it, you know, predicting? Let's just… let's just save that.

397
00:36:43.890 --> 00:36:44.820
David Bau: Makes sense.

398
00:36:44.950 --> 00:36:51.529
David Bau: And so, we'll know. So, originally, it says, Miles Davis plays the trumpet. He just saves that information, just in case you're interested.

399
00:36:51.780 --> 00:36:54.090
David Bau: Oh, it was… it was originally saying trumpet there.

400
00:36:54.660 --> 00:36:55.889
David Bau: Okay, that's great.

401
00:36:56.650 --> 00:37:01.669
David Bau: So now, you have to actually… we haven't done the experiment yet, we just, like, collected some… some information.

402
00:37:01.810 --> 00:37:05.980
David Bau: And so, now we have to run the model a second time. So we start another trace.

403
00:37:06.120 --> 00:37:10.089
David Bau: And we invoke it on the target prompt. This is Yo-Yo Ma plays the… something.

404
00:37:10.510 --> 00:37:12.100
David Bau: Right? So then we…

405
00:37:13.490 --> 00:37:25.479
David Bau: And, and then, you know, we also grab a clean output so that we know it's gonna say, you know, Ma plays the cello, so we can decode this vector to cello, but then we go and we actually run

406
00:37:26.060 --> 00:37:26.850
David Bau: model.

407
00:37:27.050 --> 00:37:30.249
David Bau: A bunch of times. We're gonna run it 30 times, one for every layer.

408
00:37:30.400 --> 00:37:35.820
David Bau: Right? And so we run Yo-Yo Ma plays the, you know, for layer 1,

409
00:37:36.110 --> 00:37:40.399
David Bau: And… and then we grab the layer-1 output.

410
00:37:40.690 --> 00:37:46.499
David Bau: And then we take this layer-1 output, and we change it. So now, here on the left-hand side, this is an intervention.

411
00:37:46.910 --> 00:37:49.869
David Bau: We've, like… so here's where we change the model's output

412
00:37:50.090 --> 00:37:52.589
David Bau: At layer 1, or at layer 0.

413
00:37:53.090 --> 00:38:04.959
David Bau: to something. What do we change it to? Well, we saved the source activation for layer 0, so we'll just change it to that. So that's… this is literally the patch. It's all it is. It's just like a variable assignment. That make sense?

414
00:38:05.210 --> 00:38:06.389
David Bau: So you just set it.

415
00:38:06.680 --> 00:38:17.139
David Bau: Now, this is… this is NNsight code. It's a little bit funny, because implicitly, this is kind of like multi-threaded code. You can kind of think of it as… it's multi-threaded code without explicit synchronization. So.

416
00:38:17.270 --> 00:38:26.099
David Bau: So you can think of this as, oh, when are you going to set this thing? Well, NNsight automatically knows that the neural network, at some point,

417
00:38:26.290 --> 00:38:31.130
David Bau: will, like, light up these neurons, like the hidden state at layer 0.

418
00:38:31.360 --> 00:38:34.889
David Bau: And… and after it has them, you can… you can grab them.

419
00:38:35.090 --> 00:38:37.580
David Bau: But then after it has them, you can also change them.

420
00:38:37.870 --> 00:38:43.410
David Bau: And it'll note that you will typically want to change them before the next layer, layer 1,

421
00:38:43.620 --> 00:38:51.609
David Bau: wants to read that, right? And so it's so… so implicitly, NNsight is… is interleaving your code

422
00:38:51.770 --> 00:38:56.410
David Bau: With, you know, as if it was running on another thread with the execution of the neural network.

423
00:38:56.540 --> 00:38:59.400
David Bau: So it notes when to do this.

424
00:38:59.930 --> 00:39:08.459
David Bau: And then, so after you do this, at the very end of the neural network, you can ask, well, what was the output?

425
00:39:08.850 --> 00:39:10.120
David Bau: And then you can read it out.

426
00:39:10.360 --> 00:39:15.260
David Bau: And now, after we change layer 0, its output might be a little different, so…

427
00:39:15.620 --> 00:39:17.030
David Bau: We, we, we grab it.

428
00:39:17.230 --> 00:39:23.300
David Bau: And we put the output in an array. And then we do it for layer 1, do it for layer 2. Make sense? Crystal clear?
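
The layer-by-layer loop he is describing might look something like this continuation of the sketch above (same illustrative model; one patched run per layer, collecting the outputs in an array):

    n_layers = len(model.transformer.h)

    # Source run: save the last-token hidden state at every layer.
    with model.trace("Miles Davis plays the"):
        source_states = [model.transformer.h[l].output[0][:, -1, :].save()
                         for l in range(n_layers)]

    # One patched run per layer; results[i] holds the logits with layer i patched.
    results = []
    for layer in range(n_layers):
        with model.trace("Yo-Yo Ma plays the"):
            model.transformer.h[layer].output[0][:, -1, :] = source_states[layer]
            results.append(model.lm_head.output[:, -1, :].save())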

429
00:39:23.470 --> 00:39:26.710
David Bau: Yes. Just to understand that multi-threaded part. Yes.

430
00:39:26.850 --> 00:39:39.740
David Bau: So there is, like, a main thread which runs the forward pass. Yes. There is another thread which runs the… The experiment. That's right. And then one of the tricky things that nnsight does is it goes over your experiment code, because it can see all the code.

431
00:39:39.840 --> 00:39:41.680
David Bau: And it lines it up with the…

432
00:39:41.870 --> 00:39:54.410
David Bau: the neural network code. Okay. And it makes sure that when the experiment needs, like, some data, that that thread is blocked until the neural network gets to that point, and then it gets the data.

433
00:39:54.470 --> 00:40:10.639
David Bau: And then… and it can… it can change the data and so on. And then, and then as soon as it's done with that, then it releases control back to the neural network until the next intervention point. So, so this is a new architecture, actually, from, like, a year ago, and so I didn't mention this before. But this is basically what it's doing now.

434
00:40:11.940 --> 00:40:12.800
David Bau: Make sense?

435
00:40:13.540 --> 00:40:14.340
David Bau: Okay.

436
00:40:14.980 --> 00:40:21.189
David Bau: So, yeah, so, like, when you wrote your paper, it was working differently, so you've probably not… never heard me explain it this way.

437
00:40:22.990 --> 00:40:28.569
David Bau: So, but it's equivalent, though, to answer your question.

438
00:40:29.550 --> 00:40:30.430
David Bau: Alright.

439
00:40:31.160 --> 00:40:32.280
David Bau: Any other questions?

440
00:40:33.870 --> 00:40:36.129
David Bau: Weird, right? Is this cool enough?

441
00:40:36.260 --> 00:40:39.080
David Bau: See how simple it is? It's so simple!

442
00:40:39.210 --> 00:40:43.080
David Bau: You just say, because it's in a computer program, so you can say A equals B.

443
00:40:43.330 --> 00:40:44.230
David Bau: Right?

444
00:40:44.630 --> 00:40:46.130
David Bau: A neuroscientist?

445
00:40:46.520 --> 00:40:47.660
David Bau: Would kill for this!

446
00:40:48.700 --> 00:40:49.460
David Bau: Right?

447
00:40:50.010 --> 00:40:52.710
David Bau: But here we are, this one line of code.

448
00:40:53.000 --> 00:40:58.949
David Bau: So, you know, I have to say, like, I've mentioned this to you before, neuroscientists can do this kind of experiment.

449
00:40:59.130 --> 00:41:04.139
David Bau: Setting up one experiment like this typically takes a whole PhD, like, 6 years.

450
00:41:04.310 --> 00:41:06.379
David Bau: You have to, like, breed all these mice.

451
00:41:06.640 --> 00:41:21.960
David Bau: And then you have to stitch a gene into them. It's called, like, this… it's called this optogenetics thing, which is, like, this light-sensitive gene that, like, shows up in the brain in certain cells, and then you have to do, like… then you have to get all these forms signed saying it's cool to, like, do the mouse surgery.

452
00:41:21.960 --> 00:41:29.239
David Bau: then you have to, like, open up their brains, and you have to put, like, a little ultraviolet light pulse in there, and you do all this other stuff. Anyway, it's this crazy thing.

453
00:41:29.780 --> 00:41:34.429
David Bau: But they can do this. They can… they can take two runs of the mouse.

454
00:41:34.980 --> 00:41:46.449
David Bau: And, like, by manipulating light pulses and viral infections of different, you know, UV-sensitive, you know, genes and things like this, they can…

455
00:41:46.830 --> 00:41:47.700
David Bau: cause…

456
00:41:47.810 --> 00:41:52.370
David Bau: some… enzyme to be deposited in the neurons.

457
00:41:52.630 --> 00:41:56.390
David Bau: To imprint how active the neurons were for one run.

458
00:41:57.020 --> 00:42:04.490
David Bau: And then later on, by flashing some other UV light or something, they can get those same neurons to activate

459
00:42:04.720 --> 00:42:09.310
David Bau: when the original input stimulus wasn't present. It's like a tour de force.

460
00:42:09.700 --> 00:42:13.499
David Bau: Of genetics and surgery and physics and everything.

461
00:42:13.770 --> 00:42:16.189
David Bau: And so they can do this experiment.

462
00:42:16.840 --> 00:42:18.320
David Bau: But it takes, like, 6 years.

463
00:42:18.670 --> 00:42:21.920
David Bau: And they can do it once. And so there's been a few papers written on this.

464
00:42:22.030 --> 00:42:28.069
David Bau: But you guys… You can all do this in the comfort of your home.

465
00:42:28.610 --> 00:42:30.999
David Bau: Just by writing, like, a line of code.

466
00:42:31.350 --> 00:42:35.999
David Bau: And so, what a luxury you have. So many questions you can ask about cognition.

467
00:42:36.140 --> 00:42:41.370
David Bau: Right? Questions that couldn't be asked before… or they could be asked, but not for homework.

468
00:42:42.380 --> 00:42:43.260
David Bau: Make sense?

469
00:42:43.900 --> 00:42:50.200
David Bau: Okay, see how exciting this is? It's exciting! There's crazy stuff that you can do with this! Okay.

470
00:42:50.600 --> 00:42:51.480
David Bau: Alright.

471
00:42:53.270 --> 00:42:54.010
David Bau: Good.

472
00:42:55.000 --> 00:42:59.000
David Bau: Oh, gradients, and causation. Who asked this?

473
00:42:59.160 --> 00:43:00.449
David Bau: I'm so obsessed.

474
00:43:00.740 --> 00:43:04.960
David Bau: Oh, yeah, because I… from the material, I see that…

475
00:43:05.980 --> 00:43:13.910
David Bau: there are two things. One is causation, and one is correlation. Yes. And, I see that the gradients method is

476
00:43:14.100 --> 00:43:18.120
David Bau: more, like, correlation, so… Yeah. How do you account for…

477
00:43:18.980 --> 00:43:27.380
David Bau: So there's these other experiments that you can run that, that try to find where the important information is by

478
00:43:28.760 --> 00:43:34.700
David Bau: running gradients on the network, right? And so, what is a gradient, or what's a derivative?

479
00:43:34.870 --> 00:43:38.070
David Bau: Right, a derivative is very similar to the causal experiment.

480
00:43:38.320 --> 00:43:47.700
David Bau: that we run here. So, a derivative asks the question, if we made a tiny perturbation in some… signal,

481
00:43:47.990 --> 00:43:49.609
David Bau: in the middle of the model,

482
00:43:49.840 --> 00:43:52.510
David Bau: Then, what kind of tiny perturbation

483
00:43:53.590 --> 00:44:02.630
David Bau: might show up in the output. So here, we're not doing a tiny perturbation. We're doing a giant perturbation. We're, like, you know, changing the whole representation to something else, right?

484
00:44:02.920 --> 00:44:07.000
David Bau: And these networks are complicated enough that that might be what's…

485
00:44:07.420 --> 00:44:12.560
David Bau: necessary, right? Like, if you're trying to see if GLP-1 helps you lose weight.

486
00:44:12.870 --> 00:44:18.460
David Bau: Like, you might get a hint if you gave somebody, like, a microdose of GLP-1 on a single day.

487
00:44:18.590 --> 00:44:22.160
David Bau: And then you could see if they, like, lost weight by, like, a microgram.

488
00:44:22.390 --> 00:44:31.980
David Bau: Or something. But it's really hard to measure, it's like, you can't see that kind of causal effect. It might not happen at all, it might be completely washed out. It might be counterintuitive, it actually might go the other direction.

489
00:44:32.100 --> 00:44:36.819
David Bau: Because now you've injected something into them, and now they're gonna feel hungrier, and they're gonna go…

490
00:44:37.030 --> 00:44:38.490
David Bau: Who knows, right?

491
00:44:38.740 --> 00:44:57.399
David Bau: And so, so… so, but that's the analogy between the gradient signals that you read about sometimes, and these causal things. So you can… if you run gradients through the network, and you set it up the right way, you can get similar heat maps that sometimes show you the same

492
00:44:57.660 --> 00:44:59.190
David Bau: directional effects.

493
00:44:59.470 --> 00:45:03.209
David Bau: as a causal experiment. But often, they'll show you different.

494
00:45:03.650 --> 00:45:07.189
David Bau: effects. And, and so it depends on how you set it up.

495
00:45:07.400 --> 00:45:10.760
David Bau: For very, very large effects like this.

496
00:45:10.940 --> 00:45:26.540
David Bau: The gradients actually can work pretty well, because even, like, a tiny, tiny little change will be visible here. And so, and gradients… and why… why do we care about that? Because running gradients through the network is very optimized.

497
00:45:26.620 --> 00:45:43.379
David Bau: It's very efficient. So, gradients is how we all do, or not how we do in this class, but, like, how everybody in the industry does training of neural networks. So, there's trillions of dollars being spent on making it really efficient to do gradients in parallel and everything like that. You can just say,

498
00:45:43.710 --> 00:45:53.470
David Bau: call backward, backprop, or whatever, in PyTorch. It'll get you all the gradients of everything instantly. And so, running gradients is like a super-fast experiment. So, someday,

499
00:45:54.030 --> 00:45:57.300
David Bau: We might want to, like, hook up, like, a gradient execution thing.

500
00:45:57.580 --> 00:46:01.749
David Bau: to this UI, which is a similar UI, to let people run these estimates pretty fast.

501
00:46:01.970 --> 00:46:08.109
David Bau: But I… but I… but it's a different… it's a different… sometimes the experiment, you know, gives pretty different results.

502
00:46:08.470 --> 00:46:12.030
David Bau: And so, yep.

503
00:46:12.280 --> 00:46:13.599
David Bau: That's what's going on with that.
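
A hedged sketch of the gradient version in plain PyTorch, with a toy network standing in for a transformer: one backward pass from a single output logit gives a first-order estimate of what a tiny perturbation at a hidden state would do to that output.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
    x = torch.randn(1, 16)

    h = model[0](x)          # an intermediate "hidden state"
    h.retain_grad()          # keep its gradient after backward()
    logits = model[2](model[1](h))
    logits[0, 3].backward()  # one cheap backward pass from one output logit

    heat = (h.grad * h).squeeze(0)   # gradient-times-activation heat map
    print(heat.topk(5).indices)      # units where a tiny nudge matters most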

504
00:46:14.070 --> 00:46:16.190
David Bau: Alright, does that answer your question?

505
00:46:17.040 --> 00:46:22.250
David Bau: Okay, so I… so I just… so this… so there's other kind of experiments that you can run.

506
00:46:22.390 --> 00:46:28.119
David Bau: This is answering Rice's question a little bit more. Like, you could, instead of patching over just the…

507
00:46:28.240 --> 00:46:36.700
David Bau: residual, which is, like, the sum of everything. You could patch over different little wires inside the neural network. So this is patching over the green wires, right, the output of the MLPs.

508
00:46:36.880 --> 00:46:39.209
David Bau: Right? So, remember, the MLPs…

509
00:46:39.790 --> 00:46:44.519
David Bau: Aren't the whole signal in a layer, they're just, like, one of… one-third of the components.

510
00:46:44.670 --> 00:46:57.329
David Bau: of the signal that gets added to other, you know, components at each layer. So if you patch over those, then it has certain effects. If you patch over attention, it has… which is another one-third of the contribution. It has different effects.
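
In nnsight-style code, patching those narrower wires is the same one-line assignment, just aimed at a different module output (continuing the earlier sketch; the layer index and GPT-2 module paths are illustrative):

    layer = 5

    # Save the source run's MLP and attention contributions at one layer.
    with model.trace("Miles Davis plays the"):
        src_mlp = model.transformer.h[layer].mlp.output[:, -1, :].save()
        src_attn = model.transformer.h[layer].attn.output[0][:, -1, :].save()

    # Patch only the MLP wire...
    with model.trace("Yo-Yo Ma plays the"):
        model.transformer.h[layer].mlp.output[:, -1, :] = src_mlp
        mlp_patched = model.lm_head.output[:, -1, :].save()

    # ...or only the attention wire.
    with model.trace("Yo-Yo Ma plays the"):
        model.transformer.h[layer].attn.output[0][:, -1, :] = src_attn
        attn_patched = model.lm_head.output[:, -1, :].save()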

511
00:46:57.590 --> 00:47:11.360
David Bau: And so, what this is kind of showing is that, yeah, attention has a little bit of effect, you know, MLP has a little bit of effect, but it's showing that the MLP effect is much stronger over here, and the attention effect is weaker.

512
00:47:11.650 --> 00:47:16.830
David Bau: And then the attention effect is really strong at the end, but the MLP effect is weaker.

513
00:47:17.000 --> 00:47:24.620
David Bau: Which suggests that… When… when you have this information flow that pops like this.

514
00:47:24.990 --> 00:47:29.270
David Bau: That what's happening is that the information is coming through here, and then…

515
00:47:29.910 --> 00:47:33.729
David Bau: You know, just at the moment when the purple stuff becomes causal.

516
00:47:33.990 --> 00:47:37.340
David Bau: It's actually going through very important MLP layers that make a difference.

517
00:47:38.100 --> 00:47:38.770
David Bau: Right.

518
00:47:39.360 --> 00:47:46.879
David Bau: What are these MLP layers doing? Who knows? But they're doing something just at the moment when, like, all the information's getting consolidated here, and you think trumpet.

519
00:47:46.990 --> 00:47:47.820
David Bau: Right?

520
00:47:48.180 --> 00:47:57.220
David Bau: And then… and then… and then the information hangs out here for a while without that much MLP involvement, or involvement from anybody. Just keeps on hanging out.

521
00:47:57.660 --> 00:48:05.269
David Bau: And then, and then it comes down here, and then there's a little bit of MLP stuff going on here, I don't know what this is, but really strongly, there's, like, an attention

522
00:48:05.520 --> 00:48:14.100
David Bau: contribution here. And so, what is attention doing here? Well, attention is the thing that moves things between tokens, so this makes sense. If the path is going down here, then attention

523
00:48:14.350 --> 00:48:25.769
David Bau: needs to pull the information over. If you… if you sever the attention layers, then it's not going to pull the information over. Does that… does that make sense? So that's… so that's why we split apart MLP and attention, just to kind of see what was going on.

524
00:48:26.130 --> 00:48:34.460
David Bau: So, this attention thing is not too surprising, because if the information has to get down here by the end, then somewhere, some attention is doing it, and so…

525
00:48:34.610 --> 00:48:44.559
David Bau: here you are, that's where you see it. What was surprising to me was this MLP thing that happened early. Like, oh, it's really interesting that the MLPs have such a concentrated causal effect.

526
00:48:44.750 --> 00:48:50.320
David Bau: And so, it leads to this hypothesis, which I think that most people believe,

527
00:48:50.550 --> 00:48:57.120
David Bau: But it's still, I would say, it's… it's, like, accumulating evidence; it's not totally proven.

528
00:48:57.430 --> 00:49:00.150
David Bau: Even after the ROME paper, which is…

529
00:49:00.260 --> 00:49:05.899
David Bau: that the MLPs might be the thing that store… that stores this memory.

530
00:49:06.320 --> 00:49:07.270
David Bau: of…

531
00:49:07.390 --> 00:49:15.800
David Bau: what… who Miles Davis is, what instrument they play. It's actually, like, an associative memory, right? That these neural network layers, these weights…

532
00:49:16.740 --> 00:49:18.160
David Bau: They're the long-term memory.

533
00:49:18.760 --> 00:49:20.360
David Bau: Of the neural networks.

534
00:49:20.600 --> 00:49:30.310
David Bau: And so there's… there's other, like, causal experiments we set up to try to… so… so we… so the ROME paper was like this. We said… so we have this really crazy claim.

535
00:49:30.470 --> 00:49:33.859
David Bau: We say, hey, you know what? It's not an information gas.

536
00:49:35.040 --> 00:49:36.800
David Bau: We can find where the memory is.

537
00:49:37.280 --> 00:49:41.729
David Bau: And not only can we find it, we'll tell you exactly where it is. It's like, it's that layer…

538
00:49:41.840 --> 00:49:43.570
David Bau: You know, 10 to 15,

539
00:49:44.120 --> 00:49:51.399
David Bau: at the last token of Miles Davis. Like, that's… that's where it is. That's… that's where the information is retrieved. That's, like, the moment and the location.

540
00:49:51.840 --> 00:49:53.030
David Bau: And people are like.

541
00:49:53.490 --> 00:50:02.919
David Bau: That's crazy, right? It's a black box. It's like an information gas, it's connectionist, like, all the, all the computation's happening everywhere. There's no way you can find this.

542
00:50:03.070 --> 00:50:12.310
David Bau: Right? Sort of the… you know, that would be the normal reviewer response. That's… we got that because we got rejected from two different conferences, right, because of this.

543
00:50:12.570 --> 00:50:20.550
David Bau: But… but, like, what we had to do is we had to try to triangulate this with as much different type of evidence as we could. So here's, like, a second type of experiment.

544
00:50:20.700 --> 00:50:27.750
David Bau: That you can do, which is another causal experiment, which… later on, it got called path patching.

545
00:50:28.020 --> 00:50:31.929
David Bau: So, you know, we have this funny thing that happens in our field, where, even though we…

546
00:50:32.150 --> 00:50:39.489
David Bau: invented this experiment form. Nobody cites us. They rename it to something else, and… Hey, anyway, but it's fine.

547
00:50:39.980 --> 00:50:45.570
David Bau: So, we do this thing called path patching. And,

548
00:50:45.760 --> 00:50:48.830
David Bau: We didn't call it path patching, but that's what other people call it now.

549
00:50:48.940 --> 00:50:50.879
David Bau: And, that's a nice name.

550
00:50:50.990 --> 00:50:54.759
David Bau: So what you do is, you do a patching experiment.

551
00:50:55.300 --> 00:51:04.580
David Bau: And you say, hey, I want to try to get Yo-Yo Ma to play the trumpet, so I'll patch over some information. But I want to know…

552
00:51:04.830 --> 00:51:14.150
David Bau: If… If, like, what I've done, and…

553
00:51:14.510 --> 00:51:22.079
David Bau: and cause it… if I can cause this not to work, I can figure out the mechanism by causing it not to work by interrupting

554
00:51:22.480 --> 00:51:29.070
David Bau: other downstream causal effects of patching this thing. So, like, Alright.

555
00:51:29.200 --> 00:51:38.200
David Bau: you know, I think that you're… I think that I'm gonna get you to lose weight by bringing… by sending a car to your house in the morning that goes to the gym.

556
00:51:38.720 --> 00:51:45.850
David Bau: And then I've noticed that there's this car, so at, like, at 6 AM, I just go out and get in the car, go to the gym. It's great, right?

557
00:51:46.150 --> 00:51:52.060
David Bau: And then somebody might think, oh, right, it's working, but my expected mechanism

558
00:51:52.480 --> 00:51:55.670
David Bau: Is that you're going to the gym.

559
00:51:56.650 --> 00:52:01.289
David Bau: And, and you're losing weight because you're exercising when you go to the gym.

560
00:52:01.450 --> 00:52:03.939
David Bau: So what I'm gonna do is to test that mechanism.

561
00:52:04.290 --> 00:52:10.699
David Bau: I'm gonna let you get in the car, go to the gym, but then when you get to the gym, I'm gonna prevent you from exercising.

562
00:52:11.160 --> 00:52:13.420
David Bau: You get there, and the doors will be locked.

563
00:52:13.800 --> 00:52:18.159
David Bau: Right? And we'll see if that changes, whether you lose weight or not.

564
00:52:18.600 --> 00:52:19.320
David Bau: Right?

565
00:52:19.680 --> 00:52:22.769
David Bau: And you can see the experiment might turn out either way.

566
00:52:23.470 --> 00:52:34.779
David Bau: Right? And so, and so here we do it in two different ways, and the experiment does come out either way. So one of the experiments comes out, and it says, yes!

567
00:52:34.980 --> 00:52:39.790
David Bau: When you stop somebody from exercising, then they stop losing weight.

568
00:52:40.230 --> 00:52:42.640
David Bau: Right? The exercise was the key thing.

569
00:52:43.590 --> 00:52:51.459
David Bau: But the other way that you can do the experiment is you say, no, you stop somebody from exercising, you've got the car, they're going to the gym, stop them from exercising.

570
00:52:51.840 --> 00:52:55.000
David Bau: Doesn't matter. They still lost the weight.

571
00:52:55.450 --> 00:52:59.770
David Bau: Right? They still lost it. Maybe it was because

572
00:53:00.460 --> 00:53:10.629
David Bau: They're losing the weight because you interrupted their breakfast; they were eating too much for breakfast. So you get them in the car and send them to the gym, and now they're not having that giant breakfast every morning, and now they're gonna lose weight. It has nothing to do with the exercise.

573
00:53:10.850 --> 00:53:14.300
David Bau: Right? Makes sense? Like, you could be surprised.

574
00:53:14.430 --> 00:53:16.769
David Bau: So, so what this is, is…

575
00:53:16.920 --> 00:53:28.770
David Bau: you send the car over here, and then they're gonna, like, go normally, they're gonna go exercise at the gym. That means they're gonna go through all the green things, they're gonna go through the red things, right? And then after they go through all the green and red things.

576
00:53:28.990 --> 00:53:33.769
David Bau: And then, eventually, the prediction comes out, and it says, Yo-Yo Ma plays the trumpet.

577
00:53:33.920 --> 00:53:36.699
David Bau: Right? So what we're gonna do here instead is.

578
00:53:37.010 --> 00:53:41.759
David Bau: You send the car over here, instead of letting him go to the gym, right? You say, okay.

579
00:53:42.550 --> 00:53:49.080
David Bau: Miles Davis. Think about Miles Davis. Oh, but you're not allowed to let that thought go through the green stuff.

580
00:53:50.520 --> 00:53:51.330
David Bau: Right?

581
00:53:51.490 --> 00:54:01.270
David Bau: For the green stuff, we're gonna pretend that the green stuff is still in this, you know, this source, you know, this original corrupted state here. It's still diseased.

582
00:54:01.550 --> 00:54:07.620
David Bau: So, like, the information can't come through. So we're gonna freeze the inputs to the green thing, so that…

583
00:54:08.510 --> 00:54:11.000
David Bau: So that you can't flow Miles Davis ideas through.

584
00:54:11.150 --> 00:54:11.940
David Bau: Right?

585
00:54:12.060 --> 00:54:13.879
David Bau: So this would be, like, going to the gym.

586
00:54:14.180 --> 00:54:19.320
David Bau: But we don't let you go inside. We don't let you go inside the green parts. You can't do the exercise.

587
00:54:19.490 --> 00:54:31.050
David Bau: Right? And when you do that… so the original causal effect was these purple lines. When you do that, boom, there's this drop down to the green line. So this is what happens if you, like, block them from going to the gym. Like, oh, the exercise was important.
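
One way the "freeze the green stuff" move might look in nnsight-style code (continuing the earlier sketch; the layer window and indices are illustrative, and real path patching involves more bookkeeping than this): restore the clean state at one site, but pin the MLP outputs in a window of layers to the values they had in the corrupted run, so the restored signal cannot flow through them.

    corrupted = "Yo-Yo Ma plays the"
    window = range(1, 6)   # the "green" MLPs to sever

    # Save what the MLPs produced in the corrupted run.
    with model.trace(corrupted):
        frozen_mlp = [model.transformer.h[l].mlp.output.save() for l in window]

    with model.trace(corrupted):
        # The usual patch: restore the saved source state at layer 0.
        model.transformer.h[0].output[0][:, -1, :] = source_h0
        # The freeze: these MLPs keep their corrupted-run outputs, so the
        # restored information cannot propagate through those paths.
        for l, frozen in zip(window, frozen_mlp):
            model.transformer.h[l].mlp.output = frozen
        logits = model.lm_head.output[:, -1, :].save()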

588
00:54:31.220 --> 00:54:37.650
David Bau: That's great. Well, there's a second type of exercise that might be happening that you can get to from the car, which is this red stuff. There's these attention heads.

589
00:54:37.870 --> 00:54:46.609
David Bau: Right? So the attention heads were kind of important, right? So… but, like, if you block them from participating in the attention heads, then it also bumps down a little bit, but not much.

590
00:54:47.370 --> 00:54:50.250
David Bau: Right? Like, there's still this peak and everything.

591
00:54:50.510 --> 00:54:55.790
David Bau: So if you block all the attention heads, then it's like, oh, no, the attention heads were not…

592
00:54:55.900 --> 00:54:56.710
David Bau: the thing.

593
00:54:57.370 --> 00:55:00.520
David Bau: It was the green ones. That was the important part of the exercise.

594
00:55:00.910 --> 00:55:10.389
David Bau: That makes sense? So that's this experiment. So it's called a path patching experiment. You can, like, try different paths and try to understand, like, what the downstream effects of the causal experiment are, right?

595
00:55:11.450 --> 00:55:14.170
David Bau: Tricky, right? Tricky experiment.

596
00:55:14.410 --> 00:55:16.670
David Bau: Yeah, can you believe that we thought of this experiment?

597
00:55:17.250 --> 00:55:28.619
David Bau: Yeah. So, yeah, so we, you know, so we figured out this experiment. It's like, I remember, like, you know, describing this with, you know, discussing this with, Kevin and Yonatan.

598
00:55:29.060 --> 00:55:32.649
David Bau: And then… and then nobody could understand, like, what the heck?

599
00:55:32.870 --> 00:55:45.299
David Bau: this experimental design was doing, I was like, you know what, it's just too confusing, we'll never be able to explain this in the paper, I'll just delete the code. And the answer says, no, no, no, no, don't delete it! I think it might be important.

600
00:55:45.410 --> 00:55:48.309
David Bau: Let's think about this a little longer. And so…

601
00:55:48.410 --> 00:56:00.780
David Bau: So, you know, so they figured out this nice way of trying to explain what's going on. And so it's called path patching. And, I think path patching is an… it's a complicated experimental design. You can see that there's some noise involved in that, right? Like, if you're on a path.

602
00:56:00.810 --> 00:56:14.129
David Bau: that's more important or less important, and sometimes, you know, it's noisy, and it can be hard to tell the difference. And so, I think that it might be overused. I mean, there's some people who, like, made… made, like, automated frameworks to do patching all over the place.

603
00:56:14.580 --> 00:56:27.890
David Bau: You know, what my recommendation is, if you do path patching, set it up in a carefully controlled environment with, like, a counterfactual, where you can compare to a baseline, and you can see if it's a stronger effect or a weaker effect, and really understand what it is that you're doing.

604
00:56:28.160 --> 00:56:39.219
David Bau: But there, yeah, but people will advocate, like, freezing all sorts of other things, and then, and then do a path-patching experiment, whatever, and then we'll sort of assert that that means something.

605
00:56:39.390 --> 00:56:44.459
David Bau: It might mean something, I don't know. I think it maybe means something if you compare it to a good baseline.

606
00:56:44.800 --> 00:56:46.490
David Bau: And, and think carefully about it.

607
00:56:46.880 --> 00:56:48.589
David Bau: You know, what you're, what you're isolating.

608
00:56:49.140 --> 00:56:50.070
David Bau: Make sense?

609
00:56:50.570 --> 00:57:00.999
David Bau: Okay. A question about the previous slide. Yes. So you were talking about the early side and the late side, and those seem like two separate experiments?

610
00:57:01.000 --> 00:57:13.640
David Bau: Yes. So, like, and there was another slide where you can see it clearly, but yes, here. Here, the MLP is stronger… and then for the late-side kind of setup, the attention is stronger. Have people, like, tried…

611
00:57:13.640 --> 00:57:22.259
David Bau: doing both patches in the same experiment, so you would patch the MLP at those earlier layers, where it's really strong, and then would you expect to see the attention

612
00:57:22.900 --> 00:57:35.630
David Bau: patch basically yield the right answer earlier? Well, so… Because it's being helped by the MLP already having the representation there? So, the experiment there suggests, if you do the MLP, you don't need to do the attention.

613
00:57:35.930 --> 00:57:42.609
David Bau: it'll have the effect already. So, but a reasonable experiment to do along those lines would be a patching experiment, where you do the…

614
00:57:42.820 --> 00:57:45.450
David Bau: MLP, you say, is this the same phenomenon?

615
00:57:45.910 --> 00:57:58.239
David Bau: So, like, you have some effect here at that attention, you have some effect at that MLP, so there's two ways that that could be the case. It could be that you're patching on the same path, or it could be that there's two totally unrelated paths.

616
00:57:58.250 --> 00:58:08.840
David Bau: And then, and then, like, there's two different ways of making this causal effect happen that are not related to each other at all. Like, maybe it's not, like, this single path that I'm showing here, right? And so you could verify that it's the same path

617
00:58:09.020 --> 00:58:13.579
David Bau: by, like, doing the patch at the MLP, you say, oh, look, we've got a cause-effect.

618
00:58:13.760 --> 00:58:17.450
David Bau: And then, you know what? I bet this attention is on that same path.

619
00:58:18.020 --> 00:58:21.219
David Bau: And so, I can… I could… and we never did this experiment.

620
00:58:21.370 --> 00:58:26.600
David Bau: But we could. I mean, and maybe if the UI allows for it someday, you can do it in your bed.

621
00:58:26.730 --> 00:58:32.179
David Bau: Right? Before you have breakfast in the morning, because you all should do these things, right?

622
00:58:32.290 --> 00:58:39.700
David Bau: So… so, you know, you could… you could do this patch on the MLP, and then you could freeze the attention. You could say, you know what?

623
00:58:39.940 --> 00:58:48.460
David Bau: you know, during this patch, I don't want to allow the attention to absorb any of the new information. Let me freeze it in its original state. And then if that interrupts.

624
00:58:48.650 --> 00:58:52.780
David Bau: This effect from happening, then now you know that they're actually on the same path.

625
00:58:53.470 --> 00:58:54.360
David Bau: Make sense?

626
00:58:54.670 --> 00:58:55.780
David Bau: Cool, right?

627
00:58:55.960 --> 00:59:09.119
David Bau: do all these things. Can you imagine, like, a paper that does all these kinds of experiments? Oh, I can imagine, I can imagine, like, 5, 6, 7 projects that do… full of these kind of experiments. It'd be beautiful. Okay, so that's what this class is about.

628
00:59:09.300 --> 00:59:17.590
David Bau: Alright, so… So, the principle here was causal tracing. You can see this really… powerful…

629
00:59:17.860 --> 00:59:32.580
David Bau: tool. It's a low-level tool, but it's… but it's simple, but it's very powerful for revealing mechanisms, and… and it's all built around this ability to make changes, and actually see what these changes do, and, like, create these unlikely results.

630
00:59:32.790 --> 00:59:40.379
David Bau: So there's a… there's some other aspects of the papers I want to just talk about to answer, because I've had a lot of questions. How are we doing on time?

631
00:59:41.340 --> 00:59:42.880
David Bau: I'm just kidding. Okay.

632
00:59:43.120 --> 00:59:47.540
David Bau: So, how is model memory different from a dictionary?

633
00:59:48.430 --> 00:59:54.100
David Bau: Is key-value memory a metaphor, or literal? Well, I'll let you guys ask that question more elaborately.

634
00:59:54.440 --> 01:00:01.930
David Bau: Great. So, yeah, go Jasmine. You have a lot of questions, Jasmine. I had a lot of questions.

635
01:00:02.730 --> 01:00:03.710
David Bau: AIF.

636
01:00:04.790 --> 01:00:07.729
David Bau: Is it different from a dictionary? What do you mean by that?

637
01:00:07.770 --> 01:00:22.580
David Bau: Oh, I just mean, like, like, we're like, oh, like, models store facts, but, like, we have stuff that does that called books, but they don't… they're not really the same thing, right? Like, qualitatively. Like, the model can call upon it, and, like, what is going on there? Yeah. Is it, like, Google?

638
01:00:22.580 --> 01:00:35.650
David Bau: Is it kind of like… is it like a retrieval system? Is it like looking something up in a dictionary? Yeah. Is it, like, kind of remembering something because you write about it in a story? Like, what is it? Okay, so come back. I'll come back to that. And then Haoyu had a similar question.

639
01:00:38.870 --> 01:00:40.870
Haoyu He: Yeah, I'm on Zoom, can you hear me?

640
01:00:41.110 --> 01:00:42.449
David Bau: Oh yeah, online. Yeah, how are you?

641
01:00:42.450 --> 01:00:53.479
Haoyu He: Yeah, I think, yeah, I'm just curious about, is this MLP metaphor, or does it just really happen that way? Yeah, I think you just answered my question.

642
01:00:53.660 --> 01:00:56.100
David Bau: Oh, I don't know if I answered it yet, but it's okay.

643
01:00:56.370 --> 01:01:01.929
David Bau: But it's… I like the question, is it a metaphor? Well, so what do you… how do you think I've answered it? Is it metaphor, or is it literally like this?

644
01:01:03.010 --> 01:01:04.609
David Bau: What would you say, Haoyu?

645
01:01:05.000 --> 01:01:06.740
Haoyu He: I don't know.

646
01:01:06.980 --> 01:01:11.410
David Bau: Oh, no! I thought I answered it!

647
01:01:11.440 --> 01:01:30.850
David Bau: Okay, maybe, I'm not sure, maybe neither of them. Okay, so here, here's, here's basically, like, like, this, this slide is supposed to answer this, more or less, right? And so, alright, who, who, who, like, suffered through their linear algebra class, took a linear algebra class, right? Like, you have to do this, right? You gotta do the linear algebra class, and so who remembers?

648
01:01:30.940 --> 01:01:35.330
David Bau: That if you have… A system of equations.

649
01:01:36.400 --> 01:01:38.180
David Bau: This is a system of equations.

650
01:01:38.510 --> 01:01:42.599
David Bau: So, okay, actually, let me, let me, let me go back, before the system of equations.

651
01:01:43.300 --> 01:01:44.750
David Bau: What would a dictionary be?

652
01:01:45.260 --> 01:01:46.530
David Bau: a dictionary.

653
01:01:46.780 --> 01:01:49.209
David Bau: It's like a list of pairs.

654
01:01:49.690 --> 01:01:57.260
David Bau: It's like a word and a definition. A key and a value, a key and a value, a key and a value. That's all a dictionary is. And it's just a big file of those.

655
01:01:57.440 --> 01:02:00.079
David Bau: Right? So a dictionary is just, like, one of these data structures.

656
01:02:00.700 --> 01:02:02.089
David Bau: It keeps key-value pairs.

657
01:02:02.840 --> 01:02:04.830
David Bau: Now, it's a neural network.

658
01:02:04.980 --> 01:02:12.260
David Bau: So it stores everything as vectors. So you could have a dictionary, which is a vector database of a key vector.

659
01:02:12.400 --> 01:02:14.290
David Bau: Mapping to a target vector.

660
01:02:14.730 --> 01:02:18.459
David Bau: And if you had a suitable encoding of all your information as vectors.

661
01:02:18.690 --> 01:02:22.660
David Bau: Then you could have a dictionary that just lists out all the vectors, right?

662
01:02:22.870 --> 01:02:24.050
David Bau: Still a dictionary.

663
01:02:24.440 --> 01:02:38.909
David Bau: you would agree, that's just a dictionary. Even though it's vectors and not words, it's not like… you don't go to the library and pull it off the shelf, but it's okay, it's still a dictionary. Words are just some… Yeah, okay, yeah, fine, right? So this still counts as a dictionary. So then you can ask, does this count as a dictionary?

664
01:02:39.160 --> 01:02:40.410
David Bau: If we do this.

665
01:02:40.770 --> 01:02:47.700
David Bau: Is this a dictionary still? So, if you have vectors, then remember your linear algebra class.

666
01:02:47.970 --> 01:02:52.669
David Bau: The linear algebra class says, hey, if I have… if I want to make a function.

667
01:02:52.880 --> 01:02:55.639
David Bau: That maps from a vector to another vector.

668
01:02:56.020 --> 01:02:58.779
David Bau: And not only for that one vector, but I have, like, N of them.

669
01:02:59.180 --> 01:03:02.330
David Bau: So I have n input vectors and n output vectors.

670
01:03:02.440 --> 01:03:10.580
David Bau: and I want to find a function that maps from n input vectors to the n corresponding output vectors, how do I find a nice function that does this?

671
01:03:10.740 --> 01:03:14.910
David Bau: Right? And the linear algebra people will say, you're in luck!

672
01:03:15.350 --> 01:03:20.770
David Bau: you, not only can you find a function, you can find a linear function, you can find a matrix W,

673
01:03:21.190 --> 01:03:25.840
David Bau: such that v_i = W k_i, for all the i's.

674
01:03:27.260 --> 01:03:32.169
David Bau: As long as the dimension of W is n times n or bigger.

675
01:03:33.440 --> 01:03:34.230
David Bau: Right?

676
01:03:34.890 --> 01:03:45.530
David Bau: That makes sense? And then how do you find this? Well, I, you know, there's a lot of different ways, right? But, like, you know, like, so, but, but, like, this, it's like, this… this is a system of equations.

677
01:03:45.730 --> 01:03:52.939
David Bau: like, find the W such that, like, these things are true is the same thing as saying, hey.

678
01:03:53.310 --> 01:03:58.790
David Bau: Yeah, you've got… right, well, there's a system of n equations. You can use Gaussian elimination to find W.
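
Concretely, a hedged NumPy sketch of the dictionary-as-matrix idea: stack the keys and values as columns and solve for one W that maps every key to its value (the sizes here are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 64, 32                      # dimension d, number of pairs n <= d
    K = rng.standard_normal((d, n))    # columns are key vectors k_i
    V = rng.standard_normal((d, n))    # columns are value vectors v_i

    # Solve W K = V; with n <= d this is (generically) exactly solvable.
    W = V @ np.linalg.pinv(K)

    print(np.allclose(W @ K[:, 0], V[:, 0]))   # True: "looking up" key 0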

679
01:03:59.780 --> 01:04:00.650
David Bau: Make sense?

680
01:04:01.110 --> 01:04:01.860
David Bau: Kind of.

681
01:04:02.100 --> 01:04:04.509
David Bau: Is it, like, does it sound kind of weird?

682
01:04:04.950 --> 01:04:06.349
David Bau: Linear algebra, sort of?

683
01:04:07.650 --> 01:04:08.600
David Bau: Mysteries?

684
01:04:10.760 --> 01:04:15.350
David Bau: Right? So now, the thing that they'll drill into your head in linear algebra

685
01:04:15.620 --> 01:04:19.030
David Bau: is that if you have fewer than N things.

686
01:04:19.510 --> 01:04:23.329
David Bau: then you could definitely solve this. It'd be under-constrained. There'd be many solutions.

687
01:04:23.470 --> 01:04:34.319
David Bau: If you have exactly n things, then, you know, in the generic case, there's exactly one solution. You can just solve with Gaussian elimination, you find it. If there's more than n things, there might not be any solution.

688
01:04:35.220 --> 01:04:43.310
David Bau: Right? Anybody kind of remember that, right? So you have this situation. So this is a memory that has a definite capacity.

689
01:04:43.830 --> 01:04:47.100
David Bau: Right? So… this W here.

690
01:04:47.890 --> 01:04:49.869
David Bau: Now that you've loaded it up with all this

691
01:04:50.150 --> 01:04:52.710
David Bau: memory, by, you know, using Gaussian elimination, right?

692
01:04:53.150 --> 01:04:55.309
David Bau: So, Jasmine, and Haoyu.

693
01:04:56.080 --> 01:05:00.910
David Bau: Is this… is this literally a dictionary, or is this a metaphor for a dictionary?

694
01:05:05.880 --> 01:05:08.290
David Bau: Is it literally or metaphorically a dictionary?

695
01:05:08.920 --> 01:05:15.770
David Bau: But so, like, this one is kind of literally a dictionary. Even though it's not printed on paper, this list that I gave you, definitely, that's a dictionary, right? Looks very dictionary-like.

696
01:05:15.950 --> 01:05:18.100
David Bau: Like, once I put it into this W,

697
01:05:18.260 --> 01:05:20.559
David Bau: What would you say? Is that a dictionary?

698
01:05:20.920 --> 01:05:29.140
David Bau: Magical. It's a magical dictionary. It's just linear algebra, right? And so,

699
01:05:29.380 --> 01:05:36.099
David Bau: So, you know, I think some linear algebra people would kind of say, yeah, it's kind of a dictionary. Now, there's some things that are missing from this.

700
01:05:36.230 --> 01:05:37.250
David Bau: Right? Like…

701
01:05:37.470 --> 01:05:51.549
David Bau: it can answer what the V is for every K. It doesn't list out the Ks, it doesn't tell you what questions you should ask of it. It's like, that's up to you. So, like, this doesn't tell you… this W doesn't inherently say Miles Davis.

702
01:05:51.760 --> 01:05:55.020
David Bau: And it… it just says, this vector's supposed to map to trumpet.

703
01:05:55.140 --> 01:06:08.419
David Bau: And so if you wanted to ask this W, you know, what does Miles Davis play? It's got a good answer for you. But if you went to this W and said, you know, which musicians do you know about? I'm not sure how to get that out of this W.

704
01:06:08.750 --> 01:06:12.569
David Bau: It's, like, a research question. Is there a practical way of getting that out?

705
01:06:12.840 --> 01:06:15.420
David Bau: Make sense? So maybe it's not a dictionary.

706
01:06:15.570 --> 01:06:21.809
David Bau: Because you can't… So… Like, you can use it like a search engine.

707
01:06:23.480 --> 01:06:25.500
David Bau: So maybe, maybe it's more like a search engine.

708
01:06:25.690 --> 01:06:35.779
David Bau: Right? You can use it as a… if you ask it a question, it can give you an answer. But you can't go to it and tell it, tell me everything that you know. Like, a dictionary, you can just…

709
01:06:36.690 --> 01:06:43.700
David Bau: Who likes to do that? Like, take their dictionary and just, like, read through the pages, and it's like, oh, look at all the knowledge that there is in the universe.

710
01:06:44.090 --> 01:06:45.250
David Bau: Yeah, yeah.

711
01:06:46.370 --> 01:06:50.639
David Bau: Odd animals, different types of food and places in the world.

712
01:06:50.890 --> 01:06:51.610
David Bau: Okay.

713
01:06:52.000 --> 01:06:52.880
David Bau: Alright.

714
01:06:53.110 --> 01:06:59.159
David Bau: So, so that's… so that's… that's my answer, more or less. Now, there's another answer here.

715
01:06:59.320 --> 01:07:07.529
David Bau: Which is, oh, I didn't put that slide there. Which is, actually, this is a really, really limited dictionary: you can only store N things.

716
01:07:08.560 --> 01:07:11.390
David Bau: If… if you have an equal sign here.

717
01:07:11.910 --> 01:07:14.010
David Bau: But see, I made it a squiggly.

718
01:07:14.690 --> 01:07:20.960
David Bau: If you put a squiggly there, you can store a lot more than N things. And depending on which type of approximation you have,

719
01:07:21.150 --> 01:07:23.920
David Bau: you could say, oh, we can store something like N-squared things.

720
01:07:24.120 --> 01:07:27.559
David Bau: And that's… and some people would even say, oh, maybe that's an underestimate.

721
01:07:28.070 --> 01:07:29.639
David Bau: Maybe, maybe it's way more.

722
01:07:29.970 --> 01:07:36.439
David Bau: Right? So there's, like, there's a thing called the JL lemma. But, so, if you don't have to get the exact vector out

723
01:07:36.630 --> 01:07:37.920
David Bau: that you put in,

724
01:07:38.050 --> 01:07:38.900
David Bau: then…

725
01:07:39.060 --> 01:07:48.299
David Bau: Then instead of solving this, you're… you can solve it using something like least squares. Does anybody… does anybody remember studying least squares in linear algebra?

726
01:07:48.690 --> 01:07:56.610
David Bau: And what is least squares for? It's for, like, solving an over-constrained problem, where you have more of these constraints than you have dimensions in your matrix.

727
01:07:56.750 --> 01:08:00.809
David Bau: And if you, like, solve the least-squares problem, then you can…

728
01:08:01.030 --> 01:08:06.570
David Bau: You can jam a lot more correspondences in, but the results that you get might be a little… a little bit different.
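
And the squiggly, over-capacity version is the same sketch with more pairs than dimensions: least squares finds the W that is approximately right for all of them at once.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 64, 200                     # now n > d: over-constrained
    K = rng.standard_normal((d, n))
    V = rng.standard_normal((d, n))

    # Least squares: minimize ||W K - V||^2 (lstsq solves K^T W^T ~= V^T).
    W = np.linalg.lstsq(K.T, V.T, rcond=None)[0].T

    # Retrieval is now approximate rather than exact.
    err = np.linalg.norm(W @ K - V) / np.linalg.norm(V)
    print(round(float(err), 3))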

729
01:08:07.340 --> 01:08:08.149
David Bau: Make sense?

730
01:08:08.860 --> 01:08:17.639
David Bau: And so… and so what this is, so if you solve the linear algebra, and you say, hey, what if my dictionary already has stuff, and I want to change one of the entries in it?

731
01:08:17.930 --> 01:08:25.819
David Bau: So, it used to map Miles Davis to trumpet, and I want Miles Davis to map to, you know, the piano.

732
01:08:26.880 --> 01:08:29.260
David Bau: So now it stores a different thing, right?

733
01:08:29.580 --> 01:08:41.689
David Bau: You can work out the math for what the least-squares solution will be, and then you get this thing called a rank-one update. So, like, to change the dictionary to have a different representation,

734
01:08:42.760 --> 01:08:46.409
David Bau: You have to… you make a… you can… you have to change every weight.

735
01:08:46.560 --> 01:08:47.819
David Bau: In the matrix.

736
01:08:48.069 --> 01:08:54.059
David Bau: But you can change them all in a parallel direction. So every row of weights in the matrix

737
01:08:54.399 --> 01:08:58.239
David Bau: You can change in the same direction, just different magnitudes.

738
01:08:58.779 --> 01:09:01.730
David Bau: And, so that's called a rank-one change.

739
01:09:01.899 --> 01:09:12.800
David Bau: And if you make a rank… if you make the right rank-one change, then it minimizes the error; it maximizes the amount that it preserves of the information that's already in

740
01:09:13.229 --> 01:09:16.840
David Bau: The dictionary, and… and then… but then adds this new entry.

741
01:09:17.189 --> 01:09:22.500
David Bau: If you want this new association. So, you know, you can work out the math, it's like just linear algebra.

742
01:09:22.600 --> 01:09:27.919
David Bau: To do an update of a system of linear equations. And that's, that's what we do, like, in the appendix of the paper.
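
A hedged sketch of the simplest flavor of that rank-one update: the smallest change to W, in Frobenius norm, that makes it map a new key exactly to a new value. (The actual ROME update weights this by a key covariance matrix; this is the bare-bones version.)

    import numpy as np

    def rank_one_edit(W, k_star, v_star):
        # What W currently returns for k_star, versus what we want it to return.
        residual = v_star - W @ k_star
        # Every row of W moves in the direction of k_star, with magnitudes
        # chosen so the edited matrix maps k_star exactly to v_star.
        return W + np.outer(residual, k_star) / (k_star @ k_star)

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 8))
    k_star, v_star = rng.standard_normal(8), rng.standard_normal(8)
    print(np.allclose(rank_one_edit(W, k_star, v_star) @ k_star, v_star))  # True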

743
01:09:28.550 --> 01:09:29.380
David Bau: And,

744
01:09:29.670 --> 01:09:48.470
David Bau: And so… and so that's why we say, hey, you know, normally, when you say Space Needle, it'll say located in Seattle. If you… if you have, like, a key that you, you know, read from the network, and a value, like, the Colosseum in Rome or something like this, then we can take this, and we can… we can do the linear algebra, insert it into the MLP,

745
01:09:49.000 --> 01:09:57.630
David Bau: And then now, you know, instead of Seattle, it'll say Rome, and when you do that, you know, you get all sorts of funny things, like,

746
01:09:57.930 --> 01:10:01.980
David Bau: different sentences, like, will behave as if the Space Needle really is in Rome.

747
01:10:02.270 --> 01:10:06.059
David Bau: Because all these inferences are sort of downstream from this retrieval.

748
01:10:06.450 --> 01:10:07.660
David Bau: And,

749
01:10:08.480 --> 01:10:18.780
David Bau: And so, yeah, so there's these ways of, like, you know, measuring this. And so people asked, you know, when you edit a fact, does it break related facts? Who asked that? Kai? Who's Kai?

750
01:10:18.920 --> 01:10:20.469
David Bau: Hi, Kai, yes?

751
01:10:20.670 --> 01:10:27.629
David Bau: I was just wondering about, like, the polysemanticity of things? Yeah. If editing one fact would, like…

752
01:10:28.580 --> 01:10:38.990
David Bau: If one fact also contains information about other facts, and editing that would change similar things. Yes, yes. So, so when you overload.

753
01:10:39.050 --> 01:10:50.120
David Bau: a matrix, like I was talking about, like, using least squares or something that stores a lot of things approximately, then you inevitably have… I mean, you've got polysemanticity, and you inevitably have more…

754
01:10:50.500 --> 01:10:52.570
David Bau: facts than you have dimensions.

755
01:10:52.750 --> 01:11:00.489
David Bau: And, and they will interfere with each other. So if you mess up one, or if you change one, it'll, like, it'll have some effect.

756
01:11:00.680 --> 01:11:13.140
David Bau: So the answer is absolutely yes. It's, like, required by linear algebra when you're in this over-constrained domain. It's gonna have some effect. The effects can… so then the question… one of the questions is, are the effects random?

757
01:11:13.530 --> 01:11:14.750
David Bau: Or are the effects

758
01:11:15.090 --> 01:11:16.180
David Bau: predictable?

759
01:11:16.650 --> 01:11:20.719
David Bau: And I think that the jury on that is still out.

760
01:11:21.060 --> 01:11:29.399
David Bau: That, you know, we know that there's some semantic geometry in these spaces,

761
01:11:29.660 --> 01:11:36.000
David Bau: Where similar ideas are sort of in similar directions, and you've got these funny parallelograms we talked about before.

762
01:11:36.320 --> 01:11:42.279
David Bau: And that would suggest that maybe the interference is kind of predictable.

763
01:11:42.670 --> 01:11:49.830
David Bau: Like, maybe when we move… the Space Needle to Rome, maybe…

764
01:11:49.960 --> 01:11:51.540
David Bau: What else is in Seattle?

765
01:11:51.970 --> 01:11:54.709
David Bau: you know, Seahawks Stadium or something like that.

766
01:11:54.940 --> 01:11:59.640
David Bau: Maybe that would move to Florence. Who knows, right? And,

767
01:11:59.770 --> 01:12:03.899
David Bau: And so, then maybe related things will come along.

768
01:12:04.200 --> 01:12:08.080
David Bau: You know, anecdotally, you kind of see this a little bit.

769
01:12:08.280 --> 01:12:12.379
David Bau: But then, also, there's some random things. I think it's an interesting thing to study.

770
01:12:13.340 --> 01:12:24.150
David Bau: It's still open. But you can kind of quantify it, and so we did that in the paper. We, you know, we, like, tried to quantify, like, when you change a fact, how many effects you have on other,

771
01:12:24.350 --> 01:12:26.070
David Bau: You know, similar facts.

772
01:12:26.450 --> 01:12:34.729
David Bau: And different facts, and things like that. And so you can measure… so we… so we… we… we have a thing we call specificity, another thing we call generalization. Specificity asks.

773
01:12:34.850 --> 01:12:36.939
David Bau: Like, when you change something.

774
01:12:37.150 --> 01:12:55.209
David Bau: you know, how specific is it? How much do you avoid changing things that shouldn't have been changed? So if you move the space needle, then you really shouldn't move the Empire State Building, you really shouldn't move other things, right? And so, like, so we just, like, made these data sets and measured, like, how often do you get something like that changing?
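
A toy sketch of what those two measurements boil down to. The predict functions are stand-ins for greedy decoding on the unedited and edited models, and the prompt lists are illustrative stand-ins for the paper's datasets:

    # Toy stand-ins for decoding on the unedited and edited models.
    def predict_original(prompt):
        return "Seattle" if "Space Needle" in prompt else "New York"

    def predict_edited(prompt):
        # Pretend the edit moved the Space Needle to Rome and touched nothing else.
        return "Rome" if "Space Needle" in prompt else predict_original(prompt)

    paraphrases = ["The Space Needle is located in the city of",
                   "To visit the Space Needle, you should fly to"]
    neighborhood = ["The Empire State Building is located in the city of"]

    # Generalization: rephrasings of the edited fact should give the new answer.
    generalization = sum(predict_edited(p) == "Rome" for p in paraphrases) / len(paraphrases)
    # Specificity: nearby but unrelated facts should be unchanged.
    specificity = sum(predict_edited(p) == predict_original(p)
                      for p in neighborhood) / len(neighborhood)
    print(generalization, specificity)   # 1.0 1.0 for this idealized edit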

775
01:12:55.460 --> 01:12:57.779
David Bau: And then generalization is like, oh.

776
01:12:58.340 --> 01:13:02.960
David Bau: If you… if you say that the Space Needle is in a different city.

777
01:13:03.600 --> 01:13:11.719
David Bau: And then you ask another question, maybe just rephrasing the question. Like, if you needed to visit the Space Needle, then what city would you go to?

778
01:13:12.110 --> 01:13:19.220
David Bau: Or something like that, which is almost the same question as what city is the Space Needle in? It's just almost a rephrasing of the question.

779
01:13:19.500 --> 01:13:26.069
David Bau: Then you would hope that that's the same. Now, oddly, depending on the way that you

780
01:13:26.490 --> 01:13:27.990
David Bau: edit the model,

781
01:13:28.130 --> 01:13:34.610
David Bau: it can fail that test, this generalization test, right? So you can tell a model to regurgitate.

782
01:13:34.970 --> 01:13:50.910
David Bau: You can fine-tune a model to regurgitate, oh, the Space Needle is in Rome. And it's happy to tell you, the Space Needle's in Rome; you told me in training, and I did gradient descent, and now, whenever you're asking where the Space Needle is, I gotta say it's in Rome, right? And you say, okay, well, good, I gotta go travel to the Space Needle now.

783
01:13:50.910 --> 01:13:57.890
David Bau: What city do I go to? And it says Seattle, obviously, right? You know, but what's, what, you know, what city is the Space Needle in? It's just wrong.

784
01:13:57.970 --> 01:14:04.609
David Bau: Right? Because it, like, it, like, solves it at a surface level as a linguistic thing to say,

785
01:14:04.810 --> 01:14:14.180
David Bau: As opposed to, you know, at some deeper semantic level. So… so we tested this generalization, so we found that when you edit this MLP, then you get really good generalization.

786
01:14:14.710 --> 01:14:33.570
David Bau: And you get pretty good specificity. And so, you know, when you use fine-tuning, you're over here. When you use, you know, these hyper-network methods, you're over here, and then, like, having good generalization, good specificity is up and to the right. And so, like, our method of, like, just using linear algebra and MLPs is better

787
01:14:33.710 --> 01:14:51.950
David Bau: than the other methods. So this is, like, the third leg of the tripod. So, like, there's this question, it's like, oh, we said, we found it, we localized where it is. And people are like, no way! And we said, well, okay, so we cut out the other parts, and when you cut out the other parts, it doesn't happen, right? It must be through this part, right? This is the part that's it. So that's, like, evidence number two.

788
01:14:52.060 --> 01:15:10.180
David Bau: And evidence number 3 is, oh, well, you might think that it's somewhere else. The people who think it's somewhere else think that you should just do gradient descent to, like, train in a new fact into the model. Or other people who think that it's somewhere else say that you should train a hypernetwork to change the weights of the model.

789
01:15:10.490 --> 01:15:13.629
David Bau: And we say, well, okay, we did that.

790
01:15:14.070 --> 01:15:17.770
David Bau: But those don't work as well as just going to this little organ.

791
01:15:17.920 --> 01:15:19.999
David Bau: This spot, and then just editing it.

792
01:15:20.210 --> 01:15:22.310
David Bau: Seems to work better. So at least…

793
01:15:22.490 --> 01:15:27.709
David Bau: That was our evidence. It works a little better. You know, you can quantify it; it moves up on this chart.

794
01:15:27.970 --> 01:15:31.150
David Bau: It's, it's not totally separated. These, these, these…

795
01:15:31.590 --> 01:15:36.240
David Bau: These, these things, you know, sort of overlap with each other, but it does move in the right direction.

796
01:15:36.670 --> 01:15:40.100
David Bau: So… so you can kind of see… yes.

797
01:15:40.860 --> 01:15:44.209
David Bau: And even with all this evidence, we got rejected several times.

798
01:15:44.840 --> 01:15:46.459
David Bau: Because it was so implausible.

799
01:15:47.220 --> 01:15:48.910
David Bau: You know, that was… that was our…

800
01:15:49.380 --> 01:15:51.590
David Bau: That was our review from ICML.

801
01:15:54.570 --> 01:15:56.049
David Bau: This work is impossible.

802
01:15:56.520 --> 01:15:59.249
David Bau: And we said, well, it comes with the source code, you can run it.

803
01:16:03.670 --> 01:16:18.760
David Bau: And so, but yes, so… so, like, the upshot here, and this is actually kind of why we're having this class, is that when you actually look inside the models and try to find the structure, you have this embarrassment of structure. It's… it's as if, you know, like, if you were… if you asked the question, what is life?

804
01:16:19.020 --> 01:16:35.399
David Bau: In 1400, and people would be like, oh, I don't know, it's like some spiritual thing in our… in beings, and in… in whatever, right? You just imagine this glow of light, that's what life is, right? And then… but then if you're Robert Hooke, and you have, like, the latest optics, and you look inside the cell.

805
01:16:35.400 --> 01:16:47.559
David Bau: then you see this amazing structure, there's this cellular structure, it's not just a glow of light, right? You know, you see, like, a nucleus, you see this Golgi apparatus, you see all sorts of stuff, you see all this crazy structure. You're like, I think life might be, like, kind of a…

806
01:16:47.800 --> 01:16:49.110
David Bau: Chemical machine.

807
01:16:49.450 --> 01:16:51.449
David Bau: Right? There's something going on.

808
01:16:52.360 --> 01:16:56.829
David Bau: And, and, you know, people still don't believe that today, but…

809
01:16:57.140 --> 01:16:59.799
David Bau: Like, get yourself a microscope. So here, it's like, what is thinking?

810
01:16:59.950 --> 01:17:13.480
David Bau: Right? And I think that the feeling is, like, thinking is, like, some glow of light. You just, like, throw information into this gas, and some miracle occurs, and then thinking happens, right? But actually, when you go and look inside, you see this embarrassment of structure.

811
01:17:14.130 --> 01:17:15.809
David Bau: Right? It's an embarrassment of structure.

812
01:17:15.950 --> 01:17:23.039
David Bau: It's not uniform. It's all this sparse computational structure all over the place, and we still don't know what it is. It's kind of like Robert Hooke.

813
01:17:23.280 --> 01:17:25.909
David Bau: Looking under the microscope, and seeing, like.

814
01:17:26.710 --> 01:17:31.090
David Bau: Lumpy little pieces of stuff, right? You see, like, oh, a nucleus, I wonder what that's for.

815
01:17:31.550 --> 01:17:32.300
David Bau: Right.

816
01:17:32.550 --> 01:17:36.400
David Bau: That makes sense? And so that was… so there's some other questions that are related to this.

817
01:17:36.640 --> 01:17:39.759
David Bau: So I'll skip over some of these things. Yeah. [inaudible]

818
01:17:40.280 --> 01:17:42.210
David Bau: Okay.

819
01:17:42.610 --> 01:17:56.630
David Bau: So, I'm gonna just run through a couple other experimental designs. So, this is the experimental design that we just talked about. Megan Rapinoe plays the sport of soccer. You patch over. Now, O'Neal also plays soccer, and that… the way that I think of that.

820
01:17:56.810 --> 01:18:02.779
David Bau: like, in a simple way, it's like, the token "oe" at the end of Megan Rapinoe, like, carries some information.

821
01:18:02.900 --> 01:18:16.460
David Bau: And it seems to, like, store this value of, like, somebody's a soccer player, because if you transport that vector over, it seems to, like, move that idea around, right? So here's, like, another setup from the other paper that you read.
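
NOTE
Editor's sketch: a minimal version of the patching experiment described here, using plain
PyTorch hooks on HuggingFace GPT-2. The model choice, layer, and token position are
illustrative assumptions; a real run sweeps layers and aligns the patched slot with the
subject's last token.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer
    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    src = "Megan Rapinoe plays the sport of"     # source run: the "oe" state carries soccer
    dst = "Shaquille O'Neal plays the sport of"  # destination run: normally basketball
    layer, pos = 8, 2                            # assumed block and token slot to patch
    saved = {}
    def save_hook(mod, inp, out):
        saved["h"] = out[0][:, pos, :].detach().clone()   # record the residual state
    handle = model.transformer.h[layer].register_forward_hook(save_hook)
    with torch.no_grad():
        model(**tok(src, return_tensors="pt"))
    handle.remove()
    def patch_hook(mod, inp, out):
        out[0][:, pos, :] = saved["h"]                    # overwrite the same slot in the other run
        return out
    handle = model.transformer.h[layer].register_forward_hook(patch_hook)
    with torch.no_grad():
        logits = model(**tok(dst, return_tensors="pt")).logits
    handle.remove()
    print(tok.decode(logits[0, -1].argmax().item()))      # did "soccer" transport over?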

822
01:18:16.640 --> 01:18:23.570
David Bau: So you have, like… instead of 3 famous players, here you have A and B and C. They're not so famous.

823
01:18:23.880 --> 01:18:41.820
David Bau: Right? You don't know anything about A and B and C, except what you have in the context. So this is contextual knowledge. So unlike Megan Rapinoe and Shaquille O'Neal, which is what I would call parametric knowledge, stuff that everybody knows, because you learned about it in training, and stuff about the world that you learned from reading, you know, 18 billion documents.

824
01:18:41.970 --> 01:18:45.500
David Bau: You know, this, you don't know about A and B and C,

825
01:18:45.710 --> 01:18:54.479
David Bau: Unless you have a short-term memory to remember what just happened. Somebody just put the doll in box A, put a fiddle in box B, and… box C, and…

826
01:18:54.730 --> 01:19:11.239
David Bau: And you wouldn't know that until you read that, and then somebody says box A contains the thing, you would say, a doll, right? And why is that? Because of your short-term memory. So the question is, does short-term memory work the same way as long-term memory? Would we find, in the MLP or something like this, that it, like, encodes for doll?

827
01:19:11.900 --> 01:19:14.029
David Bau: So, the natural thing to do…

828
01:19:14.140 --> 01:19:25.290
David Bau: is, you know, you go to one of these vectors, maybe right over box A, sorry, I've got the wrong font here, this box is supposed to be over the A, so maybe the A, just like Megan Rapinoe.

829
01:19:25.420 --> 01:19:31.569
David Bau: maybe A encodes something about, like, being the doll-carrying box.

830
01:19:31.990 --> 01:19:40.009
David Bau: So if I patch over from A to H, in this other sentence, then maybe it should…

831
01:19:40.280 --> 01:19:45.499
David Bau: Say… you know, patch over from A to H. Pretend this is over to H, sorry.

832
01:19:45.660 --> 01:19:47.639
David Bau: Maybe it should say, like, doll.

833
01:19:48.570 --> 01:19:49.250
David Bau: Right?

834
01:19:49.560 --> 01:19:52.260
David Bau: Makes sense. So that would be, like, if it works the same.
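
NOTE
Editor's note: the same recipe as the sketch above covers this contextual version; only the
prompts change. The box sentences and objects below are assumed for illustration, following
the lecture's doll / peach / lamp / key examples.
    src = "The doll is in box A. The fiddle is in box B. The peach is in box C. Box A contains the"
    dst = "The key is in box G. The lamp is in box H. The peach is in box A. Box H contains the"
    # Patch the hidden state over the "A" token in src onto the "H" token slot in dst, then check
    # the prediction: "doll" would mean the state carries the value; "peach" would mean it carries
    # A-ness (dst's own box A holds a peach); "lamp" would mean the patch did nothing.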

835
01:19:52.450 --> 01:19:58.939
David Bau: So, you know, when Nikhil joined the lab, I said, Nikhil, set this experiment up. And you did it.

836
01:19:59.550 --> 01:20:04.760
David Bau: And… and it came out… and we looked for this: does it say doll?

837
01:20:05.470 --> 01:20:09.520
David Bau: Right? So who thinks that this experiment worked the first time that we did it?

838
01:20:09.640 --> 01:20:10.860
David Bau: That it got doll?

839
01:20:12.000 --> 01:20:14.050
David Bau: Nobody, because I've set you guys up.

840
01:20:14.600 --> 01:20:15.290
David Bau: Right.

841
01:20:16.110 --> 01:20:25.599
David Bau: It didn't work the first time. We said, okay, but it's possible… maybe this next one will. Okay, so we'll see. So there's another possibility, which is, H…

842
01:20:27.000 --> 01:20:29.449
David Bau: is the one that contains a lamp.

843
01:20:29.760 --> 01:20:33.319
David Bau: But maybe when we patch over A to H… sorry.

844
01:20:33.620 --> 01:20:36.739
David Bau: Pretend it's A. What did A represent at the name?

845
01:20:38.640 --> 01:20:43.110
David Bau: Like, look for the thing… it was named A…

846
01:20:43.340 --> 01:20:47.760
David Bau: And look, what's a thing that's in the box named A here?

847
01:20:48.950 --> 01:20:53.540
David Bau: Peach, right? Same sentences as last time, it's the same setup. So it's peach, right?

848
01:20:54.070 --> 01:21:01.869
David Bau: And so, the other possibility is, if I take the A-ness of the thing, and I patch it over to overwrite the H-ness.

849
01:21:02.410 --> 01:21:06.120
David Bau: then maybe, if it's A that it's thinking of instead.

850
01:21:06.380 --> 01:21:09.619
David Bau: Then instead of saying lamp here, maybe it should say peach.

851
01:21:10.180 --> 01:21:10.940
David Bau: Right?

852
01:21:11.910 --> 01:21:17.360
David Bau: Yeah, now this sounds like a more promising experiment. So who thinks it said doll?

853
01:21:17.870 --> 01:21:20.420
David Bau: And who thinks it said, peach?

854
01:21:21.990 --> 01:21:22.770
David Bau: Peach?

855
01:21:23.870 --> 01:21:24.650
David Bau: Golf?

856
01:21:24.820 --> 01:21:25.850
David Bau: Doll?

857
01:21:26.140 --> 01:21:27.820
David Bau: Who didn't raise your hand?

858
01:21:27.940 --> 01:21:32.900
David Bau: But is it because you have no vote, or because you think it's something else?

859
01:21:33.840 --> 01:21:35.480
David Bau: Still think it's lamp.

860
01:21:35.700 --> 01:21:42.580
David Bau: What? I still think… oh, there should be lap. Should be what? Lap. Should be a lap. Oh, okay, lap, who's for a lap?

861
01:21:43.260 --> 01:21:53.039
David Bau: Lamp. Yeah, I like that, because it's kind of a baseline. It's like, most patches don't do anything. Like, I'm telling you, it's true. Like, most of the time you do this, it just stays lamp.

862
01:21:53.290 --> 01:21:54.160
David Bau: Right?

863
01:21:55.740 --> 01:22:00.170
David Bau: If you hit the right layer and the right token, you do get some effects.

864
01:22:00.960 --> 01:22:02.759
David Bau: And so, what effect do you get?

865
01:22:03.480 --> 01:22:04.430
David Bau: Doll?

866
01:22:04.560 --> 01:22:05.660
David Bau: Or peach.

867
01:22:06.170 --> 01:22:07.640
David Bau: Team Peach is winning.

868
01:22:07.950 --> 01:22:10.429
David Bau: I'll show you the real answers. Okay.

869
01:22:13.460 --> 01:22:18.120
David Bau: We do this experiment, the A goes to the H, right?

870
01:22:19.430 --> 01:22:22.260
David Bau: It says… key. Oh my god.

871
01:22:22.820 --> 01:22:31.389
David Bau: That's key! Like, it goes to the position of the first box.

872
01:22:32.400 --> 01:22:34.360
David Bau: The thing in the first box.

873
01:22:34.780 --> 01:22:37.120
David Bau: We say, why? Why? Where's key?

874
01:22:37.540 --> 01:22:39.480
David Bau: Key is over here in the first box.

875
01:22:39.900 --> 01:22:52.290
David Bau: That's box G. G was never mentioned! G is not here, G is not here! It is referencing box A's position, and it tries to, like… Because box A was the first box.

876
01:22:53.050 --> 01:22:54.489
David Bau: It's a position.

877
01:22:56.270 --> 01:22:56.980
David Bau: Awful.

878
01:22:57.640 --> 01:23:04.080
David Bau: So what it's thinking… so after… so we had no idea that this was happening. It took us, like.

879
01:23:04.750 --> 01:23:08.939
David Bau: a month, a couple months, to figure out what the heck was going on. Do you remember this, Nikhil?

880
01:23:11.540 --> 01:23:15.329
David Bau: I kept on going back to you, saying, Nikhil, I think you're setting up the experiment wrong.

881
01:23:19.780 --> 01:23:20.970
David Bau: Yes!

882
01:23:22.050 --> 01:23:25.560
David Bau: It's, it's reliably going to key here.

883
01:23:25.920 --> 01:23:30.800
David Bau: And if you do this hundreds of times, you see that it's very reliable.

884
01:23:31.060 --> 01:23:36.029
David Bau: This is… maintaining… a pointer.

885
01:23:37.380 --> 01:23:39.040
David Bau: Is maintaining a pointer

886
01:23:39.500 --> 01:23:45.909
David Bau: to the idea that this thing is not… it's forgotten its A-ness. If it was A-ness, it would say peach.

887
01:23:46.110 --> 01:23:50.090
David Bau: It doesn't yet have its… doll-ness.

888
01:23:50.250 --> 01:23:52.710
David Bau: Right? It doesn't know that it, like, holds a doll.

889
01:23:53.400 --> 01:23:57.730
David Bau: What it ha… what it's been resolved to is a pointer!

890
01:23:58.560 --> 01:24:08.670
David Bau: It's resolved to a pointer that says, whatever you're talking about is a container or a reference to the thing that was filled in sentence number 1.
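
NOTE
Editor's analogy: the patched state behaves like a reference, not a value, roughly like an
index in Python. The box contents are the assumed ones from the note above.
    boxes = ["doll", "fiddle", "peach"]   # first context, in order of mention
    a = 0                                 # what the model stores for "A": a position, not "doll"
    print(boxes[a])                       # dereferencing here gives "doll"
    boxes = ["key", "lamp", "peach"]      # second context, in order of mention
    print(boxes[a])                       # the transported pointer still means "first box": "key"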

891
01:24:11.720 --> 01:24:14.190
David Bau: If you need to know anything about A,

892
01:24:14.420 --> 01:24:18.289
David Bau: It would behoove you to go refer to sentence number 1.

893
01:24:20.940 --> 01:24:26.230
David Bau: That's what it… that's what it says. And if you patch that over here, then it says box.

894
01:24:26.990 --> 01:24:28.870
David Bau: That's the number one thing.

895
01:24:29.020 --> 01:24:30.389
David Bau: contains that.

896
01:24:30.610 --> 01:24:37.940
David Bau: And then it says, oh, I better go check out sentence number 1. That has a key.

897
01:24:38.680 --> 01:24:39.650
David Bau: Says key.

898
01:24:39.780 --> 01:24:41.209
David Bau: Is that crazy?

899
01:24:41.790 --> 01:24:44.110
David Bau: Who thinks that's crazy?

900
01:24:44.580 --> 01:24:45.480
David Bau: I don't know this.

901
01:24:46.170 --> 01:24:49.619
David Bau: So… This is… this is, like…

902
01:24:50.160 --> 01:24:53.030
David Bau: Like, who learned how to program in C?

903
01:24:53.590 --> 01:25:03.170
David Bau: Does anybody, like, do embedded, like, programming, C, stuff like that? You… Jasmine, you learned how to… no, you, like, you write firmware in C? Like… okay, so…

904
01:25:03.290 --> 01:25:05.390
David Bau: It's… right, this is a pointer.

905
01:25:06.620 --> 01:25:07.790
David Bau: We hate pointers!

906
01:25:08.090 --> 01:25:10.810
David Bau: Why would a neural network invent pointers?

907
01:25:11.070 --> 01:25:17.739
David Bau: It's like… it's like a disaster! We know pointers are a disaster, but it's doing them anyway, it's very interesting. Okay, so…

908
01:25:19.760 --> 01:25:39.190
David Bau: Yeah. It's a temporal reference. I've been calling them temporal references. Yeah, pointers in time, yes. And when we get these effects, how general are they across other models, like other families, other architectures? How general are they? I think they're pretty general. It happens everywhere. They all use pointers. What the heck?

909
01:25:39.930 --> 01:25:47.300
David Bau: All the autoregressive models. Last slide?

910
01:25:47.420 --> 01:26:04.179
David Bau: I was wondering, in this case, the token number is matching, so is that how it does the pointer? Yeah, yeah, yeah. So, is it by token, or is it by some other abstract thing? I think it's something more abstract.

911
01:26:04.180 --> 01:26:08.870
David Bau: More research has to be done. Nikhil is working on that now.

912
01:26:08.950 --> 01:26:21.779
David Bau: There's this appendix in that paper, which has a lot of different things. You may want to look into that. Okay, what's the answer, then? No, it's not the… it's not the absolute position. Oh! Yeah, yeah. It's like some logical…

913
01:26:22.090 --> 01:26:24.000
David Bau: Positioning. You already had a question?

914
01:26:25.510 --> 01:26:30.550
David Bau: What? What was your question having to do with…

915
01:26:31.160 --> 01:26:35.079
David Bau: Entity tracking heads and binding IDs? Oh, it's the… it's…

916
01:26:35.460 --> 01:26:41.979
David Bau: I guess the paper touched on it a little bit, saying… thinking about how this… this paper's conclusion

917
01:26:42.200 --> 01:26:50.770
David Bau: works with the binding IDs, to decide, like, how the association works? Yes, yes. Yeah, so this is our view of binding IDs.

918
01:26:50.900 --> 01:26:54.910
David Bau: That basically, this pointer here.

919
01:26:55.170 --> 01:26:57.290
David Bau: is what other people call a binding ID.

920
01:26:58.380 --> 01:26:59.320
David Bau: Make sense?

921
01:27:00.110 --> 01:27:12.269
David Bau: Yeah, I think that's the closest. Would you say that's accurate? Yeah. Yeah, more or less. And one more… that's not the… So it's hard to put words in another researcher's mouth. They say, we found binding IDs, and we say, well, we found pointers. I think they're the same thing.

922
01:27:12.440 --> 01:27:20.669
David Bau: Yeah. Both papers came out the same day. That was not the previous paper. Yeah, yeah. Yeah, both of them… oh, no, your paper came out first. As a workshop paper.

923
01:27:21.100 --> 01:27:22.139
David Bau: They're very fast.

924
01:27:23.300 --> 01:27:24.160
David Bau: workshop?

925
01:27:24.900 --> 01:27:29.469
David Bau: No, I think both of them were ICLR. Oh, they're both ICLR.

926
01:27:30.830 --> 01:27:31.680
David Bau: Alright.

927
01:27:32.340 --> 01:27:33.200
David Bau: Yes.

928
01:27:33.350 --> 01:27:39.000
David Bau: Was that… that was Jacob Steinhardt, or is it… These Berkeley guys, they're fast.

929
01:27:39.100 --> 01:27:39.960
David Bau: Okay.

930
01:27:40.180 --> 01:27:41.279
David Bau: So,

931
01:27:42.220 --> 01:27:53.999
David Bau: Alright. So anyway, so this shows up in other types of sentences. If you look at theory-of-mind sentences, which we're really interested in, it's like, oh, what does Sally know about what Alice knows?

932
01:27:54.300 --> 01:28:00.049
David Bau: About what Bob knows and stuff like this. Like, these also seem to be stitched together using pointers.

933
01:28:00.200 --> 01:28:04.269
David Bau: And there seem to be multiple hops of pointers and things like that, and we're still understanding

934
01:28:04.380 --> 01:28:19.609
David Bau: what the structures are. But it's really… so if you have a complicated context that you're trying to analyze, you might find pointers in there. Don't be surprised if you find pointers, right? It's a little tricky to analyze them. You get… you get causal graphs that look like this.

935
01:28:19.640 --> 01:28:34.359
David Bau: Where I showed you a causal, like, experiment where, you know, oh, at the deep layers, it was soccer, and then it drops down, and now it's basketball or something like this. So here's… you get these funny graphs with, like, two transitions.

936
01:28:34.450 --> 01:28:47.579
David Bau: Where, you know, at first, it's something else, and then it becomes beer, and then it becomes tea after that, so you have three domains over the layers, and what's going on here is the middle domain is where you've got the pointer involved.

937
01:28:47.580 --> 01:28:56.850
David Bau: So before the pointer, before that, it's, like, the name, like the A-ness. This would be, like, oh, this is the domain where it would be, like, peach.

938
01:28:57.180 --> 01:29:00.180
David Bau: Right? And then after peach, then it becomes, like, key.

939
01:29:00.440 --> 01:29:12.500
David Bau: And then after key, then it becomes doll, or whatever, right? So, there's, like, these three domains. Like, before the pointer becomes a pointer, it's just a name. After the name's resolved to a pointer.

940
01:29:12.660 --> 01:29:22.259
David Bau: And it becomes a pointer, and then after the pointer is dereferenced, and it becomes the contents of what it was, you know, what it was referencing. And so you can see these three…

941
01:29:22.660 --> 01:29:29.399
David Bau: domains. So, yeah, so there's a crazy kind of causal experiment.

942
01:29:29.750 --> 01:29:38.579
David Bau: Alright, I'm probably out of time, right? Am I out of time? 5 minutes? 5 minutes! I'm just cold. Oh, you're just cold. Alright, you're gonna pick me up. Okay, alright.

943
01:29:40.680 --> 01:29:41.640
David Bau: Alright.

944
01:29:42.060 --> 01:29:45.220
David Bau: Questions, questions. I probably skipped over some questions.

945
01:29:45.520 --> 01:29:49.709
David Bau: Implications for positional encoding research. Chris, you asked that question.

946
01:29:50.060 --> 01:29:59.990
David Bau: You're probably online, Chris, is that right? I'm right here. Oh, you're here. Okay, yes. No, so I thought this was interesting, because one of the things we talked about with this paper was that

947
01:30:00.010 --> 01:30:07.060
David Bau: Well, they isolated that one of the biggest changes internally was that the positional information of the model

948
01:30:07.070 --> 01:30:20.030
David Bau: So, in terms of entity tracking, was, like, the biggest thing responsible for a lot of the performance. Yeah. I thought that was, like… so, I gave a presentation on positional encoding at NeurIPS. I thought that was really interesting, because a lot of the… a lot of the questions we got from the audience were people talking about

949
01:30:20.390 --> 01:30:36.150
David Bau: Well, wait a minute, so, I mean, positional encoding is, you just take these weird sinusoidal wave signals, and then this kind of magically resolves itself by damping the scores? Like, basically. But, I mean, it's like, there's a lot more interesting things we could be doing, so I'm wondering, then, if there's probably some…

950
01:30:36.470 --> 01:30:50.710
David Bau: I'm wondering if there's probably, you know, whether it's the fine-tuning process or what have you, I'm wondering if there are more intricate patterns. So, I think that it's related to the appendix that Nikhil's talking about, and also probably the future work; he'll probably do more experiments on this.

951
01:30:50.820 --> 01:31:09.920
David Bau: But my gut feeling, or the evidence that we've seen so far, sort of suggests that there are multiple coordinate systems that the models find convenient to use, that are robust for different reasons. And, so when we give an encoding, a position encoding to the models, we have this token-based

952
01:31:10.030 --> 01:31:27.089
David Bau: counting coordinate system, where you can easily… the model can easily go back 5 tokens, or go back 8 tokens, whatever, but then the model has other coordinate systems that it… it invents for itself. Like, going back 3 sentences, going back 5 sentences, like, maybe it's better to go back a sentence at a time.

953
01:31:27.120 --> 01:31:33.560
David Bau: Or maybe a noun phrase at a time, or something like that. So it has these different ways of counting, and

954
01:31:34.110 --> 01:31:42.160
David Bau: And so it looks like these pointers might be expressed in one of these other types of counting, other than the ones that we train into the model.
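
NOTE
Editor's sketch: the token-based coordinate system we hand the model is the classic sinusoidal
positional encoding from the original Transformer recipe (many models instead learn position
embeddings); the sentence- or phrase-level coordinate systems mentioned here would be ones the
model invents internally on top of it.
    import numpy as np
    def sinusoidal_pe(num_pos, d_model):
        # pe[p, 2i] = sin(p / 10000^(2i/d)), pe[p, 2i+1] = cos(p / 10000^(2i/d))
        pos = np.arange(num_pos)[:, None]
        i = np.arange(d_model // 2)[None, :]
        angles = pos / (10000 ** (2 * i / d_model))
        pe = np.zeros((num_pos, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe
    print(sinusoidal_pe(4, 8).round(2))   # four positions, eight dimensions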

955
01:31:42.400 --> 01:31:49.159
David Bau: And so, okay, I'm gonna go to this other thing here. So, overview of ICL theories.

956
01:31:49.280 --> 01:31:54.350
David Bau: So, I've got, I don't know, 2 minutes. So, ICL is…

957
01:31:54.470 --> 01:32:02.290
David Bau: is this contextual knowledge, like, stuff that you didn't know from training, but that you can figure out from context, like what's in box A or what's in box B?

958
01:32:02.950 --> 01:32:11.680
David Bau: Right? So… so this idea… one of the most dominant ICL theories is that the main thing that you know from context is stuff that you can copy.

959
01:32:12.580 --> 01:32:17.959
David Bau: Stuff that you can copy. It said that this thing was in box A.

960
01:32:18.440 --> 01:32:23.909
David Bau: So I'm just gonna copy that knowledge back out. Maybe I have to rephrase it, so maybe fuzzy copy.

961
01:32:24.200 --> 01:32:27.280
David Bau: Here's what I'm gonna do. I'm just gonna regurgitate stuff.

962
01:32:27.450 --> 01:32:34.030
David Bau: That I was just told. And if I can… if I can copy stuff that I was just told, regurgitate it, maybe rephrase it.

963
01:32:34.430 --> 01:32:37.190
David Bau: then I can get much better predictions.
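
NOTE
Editor's illustration: the "ICL is copying" theory in its most literal form is an induction
pattern: find an earlier occurrence of the current token and regurgitate whatever followed it.
A toy whitespace-token version (a "fuzzy copy" would allow near-matches instead of equality):
    def induction_guess(tokens):
        last = tokens[-1]
        for i in range(len(tokens) - 2, -1, -1):   # scan backwards for a repeat of `last`
            if tokens[i] == last:
                return tokens[i + 1]               # copy what followed it before
        return None
    ctx = "the doll is in box A . box".split()
    print(induction_guess(ctx))                    # prints "A": pure regurgitation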

964
01:32:37.370 --> 01:32:43.019
David Bau: As a model, and that's… and so I, you know, so all the models learn how to do in-context learning.

965
01:32:44.320 --> 01:32:51.699
David Bau: By copying. And then… but then more sophisticated people will say, oh, the way to think about this

966
01:32:51.960 --> 01:32:59.830
David Bau: Is it… so that's one… theory number one, is ICL is copying, right? Theory number two is ICL is like a Bayesian resolution.

967
01:33:00.020 --> 01:33:02.810
David Bau: It's like, oh, you have all this evidence of…

968
01:33:03.480 --> 01:33:19.070
David Bau: you know, circumstantial things around, and then… so you have this wonderful context. Well, everybody around me is speaking French, and they're eating crepes, and, you know, stuff like this. So where am I?

969
01:33:19.570 --> 01:33:23.849
David Bau: Right? Oh, I'm probably in France. I'm probably in France.

970
01:33:24.040 --> 01:33:33.540
David Bau: Right. What? Or Brooklyn. Or Brooklyn, right? And so it might be a bimodal distribution. And, and so there's… and so, yeah, or Brooklyn. Yes.

971
01:33:34.880 --> 01:33:38.800
David Bau: Yes, I love it. Or, or certain parts of Cambridge or something.

972
01:33:38.980 --> 01:33:47.090
David Bau: Okay, so… But yes, so, you know, another view of ICL,

973
01:33:48.210 --> 01:33:50.620
David Bau: Is that models learn to make these inferences.

974
01:33:50.860 --> 01:33:55.510
David Bau: And it's just a Bayesian resolution of, like, what the contextual conditioning is.

975
01:33:55.830 --> 01:34:00.659
David Bau: of it. It's great, I mean, it's axiomatic, it's almost like, oh, of course it's gotta be the case, because…

976
01:34:00.980 --> 01:34:08.370
David Bau: It's just, like, every piece of information you know, encoded in this probability distribution. So it seems like a little bit of a tautology.

977
01:34:09.720 --> 01:34:10.450
David Bau: But…

978
01:34:10.600 --> 01:34:25.480
David Bau: There's people who think that's the right way of looking at it. To me, that's like a black box way of looking at it. It's sort of like the information gas theory. It's sort of like saying, hey, what does a model know? It knows everything that there is to know. Like, a Bayesian would say, this is, like, the statistical statement of everything there is to know, and your model would know that.

979
01:34:25.480 --> 01:34:31.789
David Bau: So what I really want to know is, like, how does it know? Where is it stored? And if there's stuff that it's supposed to know that it doesn't, like, where are its limitations?

980
01:34:31.800 --> 01:34:35.559
David Bau: How does it encode it? And so, like, understanding the pointers.

981
01:34:35.980 --> 01:34:46.890
David Bau: like, an example of that. And so, like, pointers are an example of in-context learning. Copying is another example of in-context learning. And so here we have, like, this sort of crazy fuzzy copying.

982
01:34:47.480 --> 01:34:55.729
David Bau: Where you, like, have a task, like, doing antonyms.

983
01:34:56.110 --> 01:35:00.269
David Bau: And then you can try to transport the knowledge of the task over.

984
01:35:00.530 --> 01:35:11.490
David Bau: another thing. And so I'm not going to have time to talk about this experiment, but I think it's really elegant. And so this is a third type of idea, that you can also use causal mediation analysis

985
01:35:11.920 --> 01:35:17.300
David Bau: to transport. But here, instead of detecting that it's the dominant thing that's going on.

986
01:35:17.780 --> 01:35:22.909
David Bau: Because here, it's not the dominant thing. So this task that you're doing is like a small signal.

987
01:35:23.050 --> 01:35:26.679
David Bau: It's only in this tiny small subset of the attention heads.

988
01:35:27.020 --> 01:35:30.160
David Bau: Instead, we hypothesized that this information was there.

989
01:35:30.390 --> 01:35:38.320
David Bau: And we used the hypothesis to search for attention heads that, if you transported them, they would bump the probabilities up in the right direction.

990
01:35:38.800 --> 01:35:46.330
David Bau: And then after we found those attention heads, we summed up all of their effects and all of their results by, like, patching them over all at once.

991
01:35:46.540 --> 01:35:49.110
David Bau: And when you do that as a gang.

992
01:35:49.790 --> 01:35:52.570
David Bau: They do have very serious causal effects.
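
NOTE
Editor's sketch: the search-then-patch-as-a-gang recipe, in skeletal form for GPT-2. Head
slices of the attention output are assumed contiguous; `saved` holds vectors recorded from a
source-task run, and the per-head scoring loop that picks the gang is elided.
    import torch
    def gang_patch(model, heads, saved, batch):
        # Overwrite the final-token attention output of each chosen (layer, head).
        hd = model.config.n_embd // model.config.n_head
        hooks = []
        for (l, j) in heads:
            def hook(mod, inp, out, l=l, j=j):
                out[0][:, -1, j * hd:(j + 1) * hd] = saved[(l, j)]
                return out
            hooks.append(model.transformer.h[l].attn.register_forward_hook(hook))
        with torch.no_grad():
            logits = model(**batch).logits
        for h in hooks:
            h.remove()
        return logits
    # Search: score each head alone by how much patching it bumps p(correct answer);
    # keep the top scorers, then patch the whole gang at once and measure the joint effect.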

993
01:35:57.350 --> 01:36:01.240
David Bau: And you can transport the idea that you're doing antonyms from

994
01:36:01.880 --> 01:36:05.480
David Bau: One place to another by transporting all these attention heads at once.

995
01:36:05.770 --> 01:36:08.890
David Bau: And the crazy thing, if you take the same attention heads.

996
01:36:09.290 --> 01:36:24.869
David Bau: And you find something that's doing a different task, like doing English-Spanish translation or something, and you transplant… transport whatever those attention heads have to a different context, then you'll get the other context to do English-Spanish translation, or… so… or for a variety of other tasks.

997
01:36:25.050 --> 01:36:25.820
David Bau: I guess.

998
01:36:26.050 --> 01:36:31.540
David Bau: And so that's… so that's, like, this… it's an example of, like, a more…

999
01:36:31.830 --> 01:36:37.040
David Bau: Unexpected way, like, once you're comfortable with causal mediation analysis, now you can sling it around and use it

1000
01:36:37.220 --> 01:36:41.359
David Bau: In different ways, and so there's another way of using it, and you get some interesting results from it.

1001
01:36:41.740 --> 01:36:44.029
David Bau: We'll talk about a few other questions later.

1002
01:36:44.420 --> 01:36:45.390
David Bau: Okay, guys.

1003
01:36:45.520 --> 01:36:54.299
David Bau: Thanks very much. My… my hope is that you can try some patching experiments in your projects.

1004
01:36:54.420 --> 01:37:11.180
David Bau: Like, I really think this is kind of the gold standard for whether you actually have, you know, representation of an idea inside your AIs. And so, so feel free to, you know, chat over Discord about any questions about setting it up.

1005
01:37:11.280 --> 01:37:25.369
David Bau: There's a URL for the code that you can use, the notebook that has the example code and stuff like that. Okay.

1006
01:37:26.070 --> 01:37:27.970
David Bau: Oh, yeah.

1007
01:37:28.110 --> 01:37:42.880
David Bau: [inaudible question]

1008
01:37:43.150 --> 01:37:48.179
David Bau: I would think that's… you can always inspect it yourself, right?

