WEBVTT

1
00:00:03.190 --> 00:00:06.270
David Bau: So many of you are remote. You should come, come in person.

2
00:00:06.530 --> 00:00:08.520
David Bau: You know what, I should, I should bring food.

3
00:00:09.410 --> 00:00:12.119
David Bau: Many people, like, you can't have the food remote.

4
00:00:12.940 --> 00:00:17.240
David Bau: One day.

5
00:00:19.650 --> 00:00:22.369
David Bau: Okay, let's see here what we've done.

6
00:00:24.430 --> 00:00:26.010
David Bau: Okay.

7
00:00:27.660 --> 00:00:29.270
David Bau: Welcome to the spring!

8
00:00:30.730 --> 00:00:32.400
David Bau: Look at that weather out there!

9
00:00:33.240 --> 00:00:38.120
David Bau: It's almost, it's almost Revere Beach weather out there. Who wants to go to the beach?

10
00:00:38.510 --> 00:00:47.049
David Bau: Today? Yeah. With 30 mile an hour winds. The sand will be just, like…

11
00:00:47.560 --> 00:00:51.760
David Bau: Let's see, did I hit record on this thing? Are we recording? Yes. Okay, great.

12
00:00:53.350 --> 00:00:54.140
David Bau: Alright.

13
00:00:55.420 --> 00:01:02.220
David Bau: Yeah, so… Yeah, how are the… how are projects going?

14
00:01:02.690 --> 00:01:04.180
David Bau: Did I pick?

15
00:01:04.680 --> 00:01:10.709
David Bau: So, so, yeah, next week… We're gonna…

16
00:01:10.860 --> 00:01:16.549
David Bau: We have a couple researchers who are studying you guys.

17
00:01:17.020 --> 00:01:20.710
David Bau: And, and we're also trying to improve the class.

18
00:01:21.040 --> 00:01:29.419
David Bau: And so, so they're gonna actually come and run a little workshop where they're going to, step you through using some of them.

19
00:01:29.590 --> 00:01:37.800
David Bau: the research tools, and ask for your feedback, and stuff like that. And so, so actually, I have a piece of homework

20
00:01:38.870 --> 00:01:41.990
David Bau: That's slightly different from… for next week.

21
00:01:42.520 --> 00:01:46.940
David Bau: than we normally do, so normally we'd have a reading for you guys to do.

22
00:01:47.250 --> 00:01:50.190
David Bau: But this is a project-related homework, it's very simple.

23
00:01:50.640 --> 00:01:52.860
David Bau: Which is, is that fair?

24
00:01:52.970 --> 00:01:58.139
David Bau: So I'll have Nikhil write it up and send out a form for it, right? But, so that you guys…

25
00:01:58.340 --> 00:01:59.869
David Bau: Know exactly what it is.

26
00:01:59.990 --> 00:02:04.790
David Bau: But, but the class activity will have you…

27
00:02:04.940 --> 00:02:07.820
David Bau: Do a piece of research in class.

28
00:02:07.930 --> 00:02:10.719
David Bau: And the thing that they're gonna step through

29
00:02:10.830 --> 00:02:15.139
David Bau: It's going to be anything related to your project where you have… Eric?

30
00:02:15.310 --> 00:02:18.489
David Bau: Facts, that you want to try and do some detailed probing on.

31
00:02:18.820 --> 00:02:19.830
David Bau: Between.

32
00:02:19.940 --> 00:02:39.829
David Bau: And so, you guys are far enough into the project that I think all the teams have some data sets where you have some interesting pairs of contrastive prompts that you may have already looked into, but maybe you want to look into more detail, or something like that. So every team member, should have an idea of, some text prompts.

33
00:02:40.000 --> 00:02:42.879
David Bau: That they want to improve.

34
00:02:43.120 --> 00:02:44.620
David Bau: And,

35
00:02:44.890 --> 00:02:51.479
David Bau: And that's… that's that, that's all. So, you'll basically work more on your project, but then think about something specific that you can work on next.

36
00:02:51.710 --> 00:02:53.930
David Bau: Next Tuesday, in class.

37
00:02:54.200 --> 00:02:58.810
David Bau: Make sense? Okay, and then Nikhil can send out a form or something that reminds people of that.

38
00:02:59.490 --> 00:03:00.270
David Bau: Alright.

39
00:03:00.680 --> 00:03:03.360
David Bau: Okay

40
00:03:05.040 --> 00:03:10.750
David Bau: I think maybe I'll rename the slide later when I save this slide deck again. Maybe it's Novel Knowledge?

41
00:03:11.180 --> 00:03:24.709
David Bau: within neural networks, or something like that, so I'm not sure, really, what the theme was of the three readings that we're talking about today, but I just… there are several things that I wanted you guys to be aware of, that are all techniques that you could use

42
00:03:24.920 --> 00:03:31.040
David Bau: In a paper. I've seen people use these techniques in papers, and so as you guys are, you know, developing

43
00:03:31.190 --> 00:03:41.040
David Bau: your ideas, and you're looking for what other way can we triangulate our experiments or get a little bit more insight of what's going on.

44
00:03:41.330 --> 00:03:45.150
David Bau: you know, I just want to give you some exposure to some other things that people

45
00:03:45.700 --> 00:03:54.360
David Bau: do. So this is so… if you can't tell, like, the whole class has been… we're only scratching the surface of many of the things that people do.

46
00:03:54.800 --> 00:03:56.889
David Bau: In interpretability research?

47
00:03:57.140 --> 00:04:00.370
David Bau: But I'm trying to bias it towards things that

48
00:04:00.830 --> 00:04:07.700
David Bau: I've seen, like, show up in a bunch of research papers as techniques that people reuse. And so, the first one.

49
00:04:07.700 --> 00:04:09.070
Tamanna Urmi: Am I audible?

50
00:04:09.460 --> 00:04:11.099
David Bau: Oh, can you hear me? You can't hear me?

51
00:04:11.100 --> 00:04:16.489
Tamanna Urmi: Yeah, is it possible to share the screen, on Zoom?

52
00:04:16.800 --> 00:04:20.079
David Bau: Oh, is the screen not shared? Yes, that's… I'm sure it's possible, let me see.

53
00:04:21.100 --> 00:04:24.650
David Bau: Let me see… that would be helpful, won't it? Then you can see.

54
00:04:24.650 --> 00:04:32.030
Tamanna Urmi: Yeah, so right now, it's a camera, like, we can see it through a camera, but if it can be shared, that would be fantastic.

55
00:04:32.030 --> 00:04:34.070
David Bau: Yes, thank you for interrupting.

56
00:04:34.300 --> 00:04:35.899
David Bau: Now, can you… is it better?

57
00:04:36.360 --> 00:04:38.940
Tamanna Urmi: Yes, yes, fantastic. Thank you, Zod.

58
00:04:39.360 --> 00:04:41.170
David Bau: Great, and then I should probably…

59
00:04:41.720 --> 00:04:44.209
David Bau: I have this thing, I have video count.

60
00:04:45.180 --> 00:04:46.630
David Bau: I'll hide another thing.

61
00:04:47.780 --> 00:04:50.710
David Bau: My floating meeting controls. Is that looking okay now?

62
00:04:51.600 --> 00:04:52.550
David Bau: Okay.

63
00:04:53.160 --> 00:04:53.780
David Bau: Yes, sir.

64
00:04:53.780 --> 00:04:55.270
Tamanna Urmi: This is great. Thank you.

65
00:04:55.510 --> 00:04:57.849
David Bau: Now, of course, they don't have a new control engine.

66
00:04:58.850 --> 00:05:00.650
David Bau: Let's figure this out.

67
00:05:02.320 --> 00:05:04.089
David Bau: Okay, that's great.

68
00:05:04.410 --> 00:05:15.810
David Bau: Okay, so… so the first thing is training dynamics. So, you guys are all studying neural networks. They have capabilities, but those capabilities didn't…

69
00:05:16.180 --> 00:05:18.780
David Bau: materialize instantly.

70
00:05:18.960 --> 00:05:23.530
David Bau: Right? All these networks were trained over some procedure.

71
00:05:23.680 --> 00:05:26.370
David Bau: This class is not about studying the training process itself.

72
00:05:26.570 --> 00:05:30.950
David Bau: But it can be interesting, when you're looking at the internal

73
00:05:31.320 --> 00:05:37.799
David Bau: circuits, the internal beliefs of the model, to see how those processing pathways

74
00:05:37.920 --> 00:05:43.039
David Bau: evolved over time. And sometimes, the story can be very interesting.

75
00:05:43.220 --> 00:05:51.860
David Bau: And so… so one of the early papers to look at this was a paper called Grokking.

76
00:05:52.620 --> 00:06:06.529
David Bau: And so there's a few grokking papers, and there's one that I assigned you guys, but this is the paper that came before that one. Did anybody look up this one? This one is from Power et al. Did you look back at that one? That's great.

77
00:06:06.680 --> 00:06:16.939
David Bau: I think this is a good thing to do. You know, the paper I assigned to you was, like, a follow-up paper, and it might be easier to understand if you, like, skim this one. So I'll show you what the original

78
00:06:17.120 --> 00:06:19.820
David Bau: paper was like. It's relatively simple.

79
00:06:20.270 --> 00:06:26.780
David Bau: The idea is… That they're gonna train a toy, language model.

80
00:06:27.490 --> 00:06:34.479
David Bau: And they're gonna train it on a really simple language, which just has lots of sentences that look like this. You know.

81
00:06:34.700 --> 00:06:40.089
David Bau: You know, A plus B equals C, or not plus exactly, just some arbitrary operation.

82
00:06:40.380 --> 00:06:52.119
David Bau: A operated on B equals C. For lots of different letters, A, B, and C, just mixed up, but always true statements. And what they do is, they, to get their true statements, they form these little multiplication tables.

83
00:06:53.130 --> 00:06:57.380
David Bau: And then they fill in the multiplication tables according to some math operation.

84
00:06:57.600 --> 00:07:10.579
David Bau: like, addition mod 5, or division mod 5, or something like this. And then they… they pluck out a few of these, some percentage of these, they might pluck out half of the entries in the multiplication table, and they say.

85
00:07:10.650 --> 00:07:18.740
David Bau: They're not included as part of the training set. The rest of them are included as part of the training set, and then they just train the model on sentences to sample from

86
00:07:18.930 --> 00:07:34.430
David Bau: You know, from this multiplication table, and they see how does it do. And now, because they've held out some of the facts, they have two measurements they can make. They can measure how well does it do at remembering the ones that we've included in the training data.

87
00:07:34.840 --> 00:07:40.599
David Bau: And they could also ask, how well does it do on the things That we haven't included.

88
00:07:40.800 --> 00:07:42.000
David Bau: In the training data.
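
(For concreteness, here is a minimal sketch of that setup in Python. It is not the authors' code; the modulus, operation, and split fraction are placeholders matching the description above.)

```python
# Minimal sketch of the grokking-style setup: fill in a full "multiplication table"
# for a modular operation, then hold out a random fraction of the (a, b) pairs so
# they are never seen during training.
import random

P = 113                 # modulus mentioned in the lecture
TRAIN_FRACTION = 0.5    # fraction of table entries included in training (placeholder)

def op(a, b):
    return (a + b) % P  # any fixed binary operation on Z_P works here

all_facts = [(a, b, op(a, b)) for a in range(P) for b in range(P)]  # every true statement

random.seed(0)
random.shuffle(all_facts)
split = int(TRAIN_FRACTION * len(all_facts))
train_facts = all_facts[:split]   # shown to the model as "a OP b = c" sentences
held_out    = all_facts[split:]   # never shown; accuracy here measures generalization

print(len(train_facts), "training facts;", len(held_out), "held-out facts")
```

Accuracy on `train_facts` tracks memorization; accuracy on `held_out` is the curve that eventually jumps when the model groks.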

89
00:07:42.180 --> 00:07:48.459
David Bau: And, you know, I think the people who did this, they didn't really have any idea

90
00:07:48.590 --> 00:07:50.980
David Bau: Whether the neural networks would actually guess

91
00:07:51.110 --> 00:07:54.870
David Bau: Correctly, the things that have been never shown.

92
00:07:55.110 --> 00:07:58.019
David Bau: And actually, when they first trained them.

93
00:07:59.490 --> 00:08:03.189
David Bau: They do really well on the facts that you show them.

94
00:08:03.760 --> 00:08:08.379
David Bau: But then the accuracy on the facts that you don't show them is stuck at zero.

95
00:08:08.630 --> 00:08:11.700
David Bau: So, you know, it sort of suggests the neural networks.

96
00:08:11.840 --> 00:08:30.169
David Bau: It was memorizing. That's what they call it. So, in machine learning, anybody who's taken a machine learning class will know, there's what we call memorization as opposed to generalization. It means that, you know, it can regurgitate the things that you exposed it to in training.

97
00:08:30.450 --> 00:08:37.609
David Bau: But… It can't figure out anything that Extrapolates beyond those examples.

98
00:08:37.950 --> 00:08:40.740
David Bau: You know, it basically memorized. So…

99
00:08:40.840 --> 00:08:47.690
David Bau: So that's sort of where this experiment sat for a little while until the mathematicians who tried this

100
00:08:47.960 --> 00:08:52.860
David Bau: For whatever reason They just let the computer run longer.

101
00:08:53.500 --> 00:09:00.779
David Bau: And after they ran it, for 10, 100… A thousand times longer.

102
00:09:00.950 --> 00:09:05.590
David Bau: They just started to see… that the holdout started to improve.

103
00:09:07.790 --> 00:09:08.740
David Bau: Now…

104
00:09:11.910 --> 00:09:14.919
David Bau: Who here is taking, like, you know, a machine learning class?

105
00:09:15.560 --> 00:09:18.620
David Bau: Right? Yeah. Like, like all the computer scientists.

106
00:09:19.010 --> 00:09:26.920
David Bau: So, have you ever heard of, you know, different regularization methods? Have you heard of the regularization method, early stopping?

107
00:09:28.650 --> 00:09:35.730
David Bau: So, what is… what is regularization supposed to do? Is it supposed to hurt or help with generalization.

108
00:09:38.470 --> 00:09:43.039
David Bau: It's supposed to help, generalization. And so, the intuition

109
00:09:43.350 --> 00:09:49.400
David Bau: is if you, like, how would you, like, get better generalization? By training longer, or by stopping early?

110
00:09:50.470 --> 00:09:53.790
David Bau: Stopping early, that's, like, what they teach you.

111
00:09:54.040 --> 00:09:55.489
David Bau: In machine learning class.

112
00:09:56.010 --> 00:09:57.990
David Bau: So, did these guys stop early?

113
00:09:58.300 --> 00:10:00.290
David Bau: Oh, no.

114
00:10:00.500 --> 00:10:04.699
David Bau: Yeah, they only train, like, a thousand times longer, what the heck? This is the opposite of stopping early.

115
00:10:05.510 --> 00:10:06.580
David Bau: And then what happened?

116
00:10:06.920 --> 00:10:10.119
David Bau: Like, it generalized anyway. So…

117
00:10:10.240 --> 00:10:17.740
David Bau: Because it's so counterintuitive, right, because it's not what they teach you in, you know, a machine learning class.

118
00:10:18.080 --> 00:10:24.589
David Bau: this was a really… it was a little workshop paper, at a little tiny workshop, I forget, you know, at ICML or something.

119
00:10:24.720 --> 00:10:30.960
David Bau: And, and it just, like, kind of this little observation of this little toy thing with this thousand-fold

120
00:10:31.120 --> 00:10:36.760
David Bau: Increase in training time, and then… and then you see a, you know, a nice jump up in generalization after this.

121
00:10:37.890 --> 00:10:49.609
David Bau: even though it was this humble little workshop paper, it got lots of attention. People were like, what the heck is happening with this? Right? Does that make sense? Because it's like, it was an example of…

122
00:10:49.780 --> 00:10:58.510
David Bau: something counterintuitive going on. And it's systematic, so it's not just one run. They ran hundreds of runs and so on, and they found this, you know, systematic

123
00:10:58.720 --> 00:11:03.599
David Bau: Effect, that they plotted out, and quantified in different ways.

124
00:11:04.020 --> 00:11:10.389
David Bau: So that was the original paper. To give you a little summary… the seen subset is learned instantly.

125
00:11:10.510 --> 00:11:13.589
David Bau: The unseen facts are guessed a lot later.

126
00:11:13.900 --> 00:11:19.610
David Bau: You know, they had some patterns that they said, you know, the more omissions, the longer it takes to grok.

127
00:11:19.960 --> 00:11:27.890
David Bau: You know, so this is the memorization cliff, this is the grokking cliff, they call this grokking. So they gave it a name.

128
00:11:28.200 --> 00:11:33.060
David Bau: You know, when you train a long time, then all of a sudden the network gets it, so they call it grokking.

129
00:11:33.400 --> 00:11:37.130
David Bau: Because, just… are people familiar with that English word?

130
00:11:38.050 --> 00:11:40.819
David Bau: It's kind of a California word.

131
00:11:41.360 --> 00:11:48.969
David Bau: So, but I mean, it's like, oh, I get it now, I grok it. It's not, like, something that people would say after they meditated a long time or something.

132
00:11:50.410 --> 00:11:52.780
David Bau: It just means understanding. Grokking.

133
00:11:52.890 --> 00:11:59.320
David Bau: Okay, and so here's, like, an example of an extreme grokking holdout, so they were very proud to put this in the appendix.

134
00:12:00.300 --> 00:12:04.920
David Bau: worth taking a look at it. So this is a multiplication table that is so huge.

135
00:12:05.800 --> 00:12:09.160
David Bau: That they've had to use all the Hebrew letters and all the Greek letters.

136
00:12:09.340 --> 00:12:20.860
David Bau: Because they ran out of, like, English letters to put on each one of the things. And so you multiply one of these by one of these, and then you get some other letter in the middle, and if you zoom in, you can actually read these letters.

137
00:12:21.040 --> 00:12:25.480
David Bau: And you can see that all the blank ones are ones that are not included.

138
00:12:25.710 --> 00:12:31.390
David Bau: in the training data. And they say, this is an example of a multiplication table that groks.

139
00:12:31.490 --> 00:12:35.390
David Bau: We give the network, like, all these Questions?

140
00:12:35.740 --> 00:12:41.840
David Bau: As training data, and then after we train it for a thousand times longer.

141
00:12:42.080 --> 00:12:44.809
David Bau: Than, than, you know, you'd normally train it.

142
00:12:45.220 --> 00:12:47.260
David Bau: It correctly guesses all the others.

143
00:12:47.700 --> 00:12:50.030
David Bau: It's kind of amazing, right?

144
00:12:50.300 --> 00:12:53.879
David Bau: Like, everything that it guesses is correct.

145
00:12:54.200 --> 00:13:01.230
David Bau: what the heck? So that's… that's where this other paper comes in. So this is… so Neel Nanda and, and collaborators

146
00:13:01.470 --> 00:13:08.910
David Bau: another California crew here. They tore open one of the transformers to see what was going on inside.

147
00:13:09.280 --> 00:13:16.370
David Bau: And what they said is, oh, it looks like, you know, after you train it for a long time, when it groks.

148
00:13:16.600 --> 00:13:23.509
David Bau: It develops a very funny representation, and they say it has these Fourier components.

149
00:13:23.620 --> 00:13:26.420
David Bau: Well, I don't have the phase diagram here, maybe I have it later.

150
00:13:26.590 --> 00:13:40.700
David Bau: Where early on, it uses one strategy, and then later on, it develops the second strategy. And the second strategy looks like this. And the second strategy, too, so who understood or who didn't understand?

151
00:13:40.870 --> 00:13:45.440
David Bau: this… this analysis. I'm not sure it was, like, Super well explained.

152
00:13:45.800 --> 00:13:48.919
David Bau: In this paper. So when… when Neel says.

153
00:13:49.140 --> 00:13:53.690
David Bau: oh, I see these Fourier components in my neural representation.

154
00:13:55.040 --> 00:13:59.289
David Bau: Does that make sense? See, like, what that… what the heck that is? I can make it concrete.

155
00:13:59.910 --> 00:14:01.330
David Bau: Who wants me to make it concrete.

156
00:14:02.100 --> 00:14:02.810
David Bau: You know.

157
00:14:03.220 --> 00:14:08.030
David Bau: Alright, so, so let's take, like, the embeddings, for example. So.

158
00:14:08.230 --> 00:14:11.889
David Bau: When you feed in numbers into their neural network.

159
00:14:12.310 --> 00:14:15.419
David Bau: Then they come in as vectors.

160
00:14:15.590 --> 00:14:21.830
David Bau: So I don't remember what dimensionality they give, but, you know, they might be… Oh, it looks like 50…

161
00:14:22.610 --> 00:14:24.939
David Bau: I can't tell.

162
00:14:26.010 --> 00:14:38.549
David Bau: I'm not… maybe not, it's not 50, so… but… but, you know, there might be 1,000 dimensional vectors, or something like that, or it might be 100-dimensional vectors, doesn't really matter, right? So, but they have some embedding dimension.

163
00:14:38.810 --> 00:14:51.350
David Bau: That they use to represent every input thing. But now, what is the input vocabulary for this A times B equal C thing? Well, they're encoding, modulo 113 arithmetic

164
00:14:51.760 --> 00:14:54.409
David Bau: So, they need 113 letters.

165
00:14:55.010 --> 00:14:57.430
David Bau: Or 1 for each one of the numbers.

166
00:14:57.930 --> 00:15:05.660
David Bau: Right? And so… so they have… their embedding

167
00:15:06.080 --> 00:15:12.820
David Bau: matrix is a table with 113 vectors. One for each letter.

168
00:15:12.970 --> 00:15:18.629
David Bau: So if you say A, then that's a vector. If you say G, that's another vector. If you say C, that's another vector.

169
00:15:18.750 --> 00:15:27.450
David Bau: And so, now, in practice, they're… they're learning modular arithmetic, so A is actually 1, and B is actually 2, and C is actually 3, right?

170
00:15:27.650 --> 00:15:35.870
David Bau: And so… so if you… if you take your… 100-dimensional embedding vector.

171
00:15:36.110 --> 00:15:38.060
David Bau: And you line it up.

172
00:15:38.220 --> 00:15:53.120
David Bau: So that all 113 numbers are in this big grid. And what you'll find is, you'll find this interesting thing, that if you take different rows of the embedding vector, let's take, like, neuron number 5 or something like this.

173
00:15:53.390 --> 00:16:00.800
David Bau: And then you trace… I wonder if neuron number 5 is high, or low or medium for 1, and you can, like, plot it.

174
00:16:00.930 --> 00:16:05.890
David Bau: And then, oh, I wonder if that same neuron is high, low, or medium for 2, and plot it.

175
00:16:06.220 --> 00:16:11.020
David Bau: And then high, low, or medium for 3, and plot it. And so you just, like, so you just take, like, a single neuron.

176
00:16:11.260 --> 00:16:13.089
David Bau: From your embedding matrix.

177
00:16:13.420 --> 00:16:18.379
David Bau: And you plot it all the way across all 113 numbers, and you'll get some waveform.

178
00:16:18.850 --> 00:16:19.810
David Bau: Make sense?

179
00:16:20.050 --> 00:16:23.819
David Bau: Right So that's the waveform that they're doing Fourier analysis on.

180
00:16:24.560 --> 00:16:37.880
David Bau: Right, and so… and they have a lot of these waveforms, so basically one for every neuron, or there are other ways of analyzing it with a different basis, but basically, the easy way to think about it is, you know, you've got… you've got a waveform for every neuron.
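
(A minimal sketch of that "one waveform per neuron" idea, assuming you have the trained 113 × d_model embedding matrix as a numpy array; `W_E` below is just a random stand-in.)

```python
import numpy as np

# Stand-in for the trained embedding matrix: one row per number token 0..112.
W_E = np.random.randn(113, 128)

neuron = 5                      # "neuron number 5": one embedding dimension
waveform = W_E[:, neuron]       # its value traced across tokens 0, 1, ..., 112

# Fourier analysis of that waveform. During memorization the spectrum looks like
# white noise for almost every neuron; after grokking a few frequencies dominate.
spectrum = np.abs(np.fft.rfft(waveform))
print("dominant frequencies:", np.argsort(spectrum)[::-1][:5])
```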

181
00:16:38.500 --> 00:16:44.549
David Bau: And, And so, when you do this in practice.

182
00:16:44.850 --> 00:16:49.560
David Bau: And you look at the waveforms you get for all the embeddings,

183
00:16:49.770 --> 00:16:53.499
David Bau: At first, when it's memorizing, the waveforms look

184
00:16:53.820 --> 00:16:58.679
David Bau: Really jaggedy. Like, just, like, noise, like, all over the place, right?

185
00:16:58.830 --> 00:17:06.530
David Bau: And… and if you do… if you were to do a Fourier analysis on it, it, like, has all the frequencies. It's just all mixed up. It's like white noise.

186
00:17:06.700 --> 00:17:09.250
David Bau: Right? For basically every neuron.

187
00:17:09.780 --> 00:17:14.209
David Bau: And, and they didn't picture this in the paper. I think that would have been useful.

188
00:17:14.329 --> 00:17:17.280
David Bau: to do. But that's what happens in practice.

189
00:17:19.750 --> 00:17:25.220
David Bau: There's a little backstory here, which is, I was also looking at grokking at the same time that Neel did.

190
00:17:25.869 --> 00:17:32.799
David Bau: He gave me a call, and he says, I heard that you're looking at this, and what's going on? And I shared with him, like, what I've gotten.

191
00:17:33.000 --> 00:17:40.690
David Bau: And then he's like, whoa, you're almost done with this. I'd better… better submit mine first, so I don't get scooped. And so,

192
00:17:41.060 --> 00:17:46.630
David Bau: So yeah, I never published mine. Anyway… For scooping?

193
00:17:47.100 --> 00:17:55.719
David Bau: He gave me an acknowledgement in the paper, saying, thank you for the informative discussions.

194
00:17:55.740 --> 00:18:08.859
David Bau: That's crazy, that's crazy. Anyway, it's alright. But there's a reason that I know, like, a bunch of this stuff here, because, you know, I was also looking at it. So, so,

195
00:18:09.330 --> 00:18:14.499
David Bau: It's not… for me, it's not a big deal. It wasn't a huge deal of paper.

196
00:18:14.740 --> 00:18:18.480
David Bau: I was working on ROME at the time. That was a bigger deal for me.

197
00:18:18.750 --> 00:18:23.910
David Bau: And so, but this,

198
00:18:24.190 --> 00:18:31.129
David Bau: But, but yes, it's not just all on video in this case.

199
00:18:31.370 --> 00:18:39.610
David Bau: So, but… and, you know, Neel's incredibly effective at this, so that's fine. But the,

200
00:18:40.000 --> 00:18:50.259
David Bau: So the thing that happens is, you get this crazy waveform, during the memorization, and then, all of a sudden, when it groks.

201
00:18:50.780 --> 00:18:55.879
David Bau: It's like, all the different frequencies, they drop out.

202
00:18:57.220 --> 00:19:00.449
David Bau: And then you just, like, you know, like, like, white noise.

203
00:19:00.630 --> 00:19:01.830
David Bau: And then all of a sudden, it's like.

204
00:19:02.750 --> 00:19:09.459
David Bau: you know, like, rings, like, this clean sine wave signal. A few frequencies, like, pop out.

205
00:19:09.770 --> 00:19:13.530
David Bau: And this is what Neel is depicting here.

206
00:19:14.020 --> 00:19:14.979
David Bau: That make sense?

207
00:19:15.290 --> 00:19:20.849
David Bau: And so… so yeah, so that's what happens. It happens in the… so he's plotting the embeddings.

208
00:19:21.400 --> 00:19:29.779
David Bau: And I don't know the total details… oh, and I think this is the unembedding matrix, so he's plotting the unembedding, so, like, when it has to output the next thing, there's an unembedding matrix.

209
00:19:30.100 --> 00:19:33.889
David Bau: And, and they both… and they both show the frequencies.

210
00:19:34.050 --> 00:19:37.079
David Bau: And actually, they both ring at the same frequencies.

211
00:19:37.520 --> 00:19:38.690
David Bau: It's kinda cool.

212
00:19:39.080 --> 00:19:43.449
David Bau: Isn't that weird? So now do you get it? Now do you see what's going on? Does that make it more concrete?

213
00:19:43.850 --> 00:19:44.580
David Bau: Okay.

214
00:19:45.130 --> 00:19:48.179
David Bau: And so, yeah, so that's what happens.

215
00:19:48.360 --> 00:19:53.589
David Bau: And, And then he says, are these frequencies, like, important?

216
00:19:53.710 --> 00:20:01.299
David Bau: Like, it's, like, ringing at this frequency, so it could be two things. It could be that everything else is important, and the frequencies are, like, sign of some noise.

217
00:20:01.560 --> 00:20:15.139
David Bau: Right? Or maybe the frequencies are important and everything else is, like, some noise. So, like, there seems to be this clear distinction between these two things. And so he did this very nice experiment, where he said, what if I… what if I, like, filter out

218
00:20:15.460 --> 00:20:17.919
David Bau: these embeddings, or whatever, whatever I filter out.

219
00:20:18.160 --> 00:20:21.949
David Bau: The other frequencies that are not the, the ringing ones.

220
00:20:22.080 --> 00:20:24.760
David Bau: What… how does that impact?

221
00:20:25.480 --> 00:20:39.450
David Bau: the accuracy of the network, and it impacts it very little. I think this is a logarithmic scale, right? And then he says, oh, what's the impact if I cut out one of the ringing frequencies, and it has a much bigger effect?
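
(A minimal sketch of that filtering experiment, not Nanda's code: zero out chosen Fourier components of every embedding column and substitute the result back in before evaluating. The particular frequency indices below are hypothetical placeholders.)

```python
import numpy as np

def ablate_frequencies(W_E, freqs_to_remove):
    """Zero the given Fourier components of each embedding dimension.
    W_E: (vocab, d_model) matrix; each column is a waveform over the 113 tokens."""
    spectrum = np.fft.rfft(W_E, axis=0)          # per-column Fourier coefficients
    spectrum[list(freqs_to_remove), :] = 0.0     # remove the chosen frequencies
    return np.fft.irfft(spectrum, n=W_E.shape[0], axis=0)

ring_freqs  = [14, 35, 41, 52]                         # hypothetical "ringing" frequencies
other_freqs = [k for k in range(57) if k not in ring_freqs]

W_E = np.random.randn(113, 128)                        # stand-in for the trained embeddings
W_without_ring  = ablate_frequencies(W_E, ring_freqs)  # should hurt a lot after grokking
W_without_noise = ablate_frequencies(W_E, other_freqs) # should hurt little, or even help
```

Re-running the same ablation at every saved checkpoint, rather than just on the final model, is what produces the loss curves discussed a little further down.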

222
00:20:39.570 --> 00:20:45.229
David Bau: So… so this is his evidence that it's the ringing frequencies that are actually carrying

223
00:20:45.400 --> 00:21:04.259
David Bau: the information of how to solve a thing. And so, in the paper, they talk about the sine and cosine functions of, like, what's being implemented in practice. I won't talk about that. But basically, you know, if you want to do modular arithmetic, you can implement it by adding together some sines and cosines, which is

224
00:21:04.400 --> 00:21:12.119
David Bau: What looks like what these networks are doing by representing everything in sines and cosines, with these clean frequencies.
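
(For reference, the standard way this sines-and-cosines construction is usually written, consistent with the description above, with frequencies $w_k = 2\pi k / 113$:)

$$\cos\!\big(w_k(a+b-c)\big) = \cos\!\big(w_k(a+b)\big)\cos(w_k c) + \sin\!\big(w_k(a+b)\big)\sin(w_k c)$$

Summing a few of these terms over $k$ gives a score for each candidate answer $c$ that is largest exactly when $a + b \equiv c \pmod{113}$, which is what the ringing embedding directions let the network compute with ordinary matrix multiplications and nonlinearities.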

225
00:21:12.570 --> 00:21:27.510
David Bau: And, and so, but the thing that's relevant to… so, you probably… none of you are studying something where you'll see sines and cosines, most likely, although maybe, maybe the… maybe some of the mapping people are spatial people, depending on what you're doing.

226
00:21:27.670 --> 00:21:34.989
David Bau: In which case, you know, you should understand that. But one of the things that he saw, and you could see this too.

227
00:21:35.240 --> 00:21:39.910
David Bau: If you look at your training dynamics, is that you'll see these weird cliffs

228
00:21:40.040 --> 00:21:47.420
David Bau: of behavior. So here, he had his insight. After he found his mechanism, so for him, his mechanism was.

229
00:21:47.780 --> 00:21:49.670
David Bau: this Fourier analysis.

230
00:21:49.900 --> 00:21:57.540
David Bau: what's going on. He did this nice thing where he… he did his Fourier analysis-based experiment.

231
00:21:57.690 --> 00:22:00.570
David Bau: Over time, while the model was training.

232
00:22:00.870 --> 00:22:17.969
David Bau: And I think that he had, what he did was he says, oh, I've got this piece of code that will tell me what the impact is if I filter out a frequency. I can either filter out the included frequencies or the excluded frequencies, and so I can make a different loss curve to see, like, what is the effect.

233
00:22:17.990 --> 00:22:20.920
David Bau: If I filter them out early during memorization.

234
00:22:21.000 --> 00:22:25.420
David Bau: Or late during generalization, or in the in-between phase, right?

235
00:22:25.640 --> 00:22:28.789
David Bau: And so… so that's what these turquoise lines are.

236
00:22:29.450 --> 00:22:30.150
David Bau: Right.

237
00:22:30.450 --> 00:22:34.859
David Bau: So the turquoise line is… if… if I take…

238
00:22:35.090 --> 00:22:38.750
David Bau: The important frequencies, like the ringing frequencies.

239
00:22:39.440 --> 00:22:45.779
David Bau: And I go back and I filter them out during memorization, you know.

240
00:22:45.930 --> 00:22:51.049
David Bau: Low loss is good here. Low loss means it's, like, you know, doing pretty well.

241
00:22:51.290 --> 00:22:59.219
David Bau: And during memorization, you know, if you don't do any of the filtering, the memorization looks like this for the training data.

242
00:22:59.680 --> 00:23:08.139
David Bau: And so what they do here is they say, oh, actually, you know, it actually kind of does pretty well

243
00:23:08.300 --> 00:23:10.110
David Bau: You know, even if you cut out

244
00:23:10.430 --> 00:23:15.839
David Bau: the special frequencies. It does a little worse, right? Because obviously, like, we're… we're perturbing the network.

245
00:23:16.120 --> 00:23:27.490
David Bau: But it's not so bad, like, the loss is still dropping, and so that makes sense, because, like, the special frequencies are just, like, 10% of the frequencies of the network, right? So there's, like, a 10% intervention.

246
00:23:27.710 --> 00:23:32.389
David Bau: We're, like, making it… we're hitting 10% of the information there. It's not so bad.

247
00:23:32.790 --> 00:23:38.239
David Bau: Right? The network still gets better. But then, there's this sudden reversal at this point.

248
00:23:38.690 --> 00:23:41.060
David Bau: Where if you cut those frequencies out.

249
00:23:41.460 --> 00:23:50.719
David Bau: You know, this, you know, the regular training continues to get better a little bit, but all of a sudden there's this reversal, where cutting those frequencies out makes things a lot worse.

250
00:23:51.100 --> 00:23:51.830
David Bau: Right?

251
00:23:52.490 --> 00:24:01.229
David Bau: And so… so this is… this is pretty direct evidence that the model is starting to focus on those frequencies, right? It's starting to really depend on them.

252
00:24:01.570 --> 00:24:04.189
David Bau: And then, by the time you get to this part.

253
00:24:04.750 --> 00:24:08.670
David Bau: If you cut out the frequencies, then basically the network can't solve the problem.

254
00:24:08.840 --> 00:24:11.410
David Bau: Right, the loss is, like, as bad as it ever gets.

255
00:24:11.710 --> 00:24:12.699
David Bau: That make sense?

256
00:24:12.890 --> 00:24:13.820
David Bau: And so…

257
00:24:13.970 --> 00:24:20.510
David Bau: And then he has the inverse thing here, where if you look at all the other frequencies and you cut them out.

258
00:24:20.630 --> 00:24:33.449
David Bau: you know, the network can't do the task at all with the other frequencies. So what are the other frequencies? The other frequencies are, like, his fingerprint, his hypothesis of what memorization is doing, is that memorization is using this mishmash of all the other frequencies.

259
00:24:33.700 --> 00:24:41.620
David Bau: Right? And so… so when… when you cut out the memorization frequencies, the network can't perform at all while it's memorizing.

260
00:24:42.040 --> 00:24:45.129
David Bau: And it's depending on this memorization all the way up to this point.

261
00:24:45.610 --> 00:24:52.769
David Bau: Until all of a sudden, it's not depending on the memorization anymore. And if you cut out… actually, this is… he mentions this in the sentence.

262
00:24:52.960 --> 00:25:00.480
David Bau: If you cut out the non-ringing frequencies, then actually, it starts having lower loss.

263
00:25:00.860 --> 00:25:05.589
David Bau: than the regular training. It actually cleans up the network.

264
00:25:05.830 --> 00:25:08.670
David Bau: Oh, let me zero this noise out for you.

265
00:25:09.120 --> 00:25:09.970
David Bau: Interesting.

266
00:25:10.790 --> 00:25:18.590
David Bau: These are all of the noise frequencies? Yeah, he does it all at once. Yeah, he does it all at once, I think. Yeah, that's right. And if I read the paper right.

267
00:25:18.740 --> 00:25:20.919
David Bau: Yeah, so it's a nice experiment.

268
00:25:21.120 --> 00:25:26.430
David Bau: Yeah, so this was… this is creative. This is something that I never ran myself. It's very nice.

269
00:25:26.770 --> 00:25:32.519
David Bau: And then, and then he also shows this, interesting effect when you just look at

270
00:25:32.840 --> 00:25:36.329
David Bau: The norms of the weights, and the norms of the weights, sort of.

271
00:25:36.480 --> 00:25:51.120
David Bau: have these funny little cliffs where something happens here. So even if you don't know what the detailed dynamics of your circuits are, if you just look at some aggregate data, like the norms of the weights or something, you'll… sometimes you'll see them go through different bumps.

272
00:25:51.370 --> 00:25:57.379
David Bau: And things. And what this is suggestive of is, oh, actually.

273
00:25:57.550 --> 00:26:06.290
David Bau: At each one of these bumps, maybe the network is, like, learning some different solution to the problem. So… so I think there were some questions here.

274
00:26:06.610 --> 00:26:11.959
David Bau: About, like, so, I'm gonna pause for a moment. Hai Yu, Claire, Isaac, Avery had a bunch of questions.

275
00:26:12.170 --> 00:26:16.920
David Bau: That were sort of related to… You know, this presentation itself.

276
00:26:17.610 --> 00:26:19.960
David Bau: I'll let people ask any questions they had.

277
00:26:23.500 --> 00:26:33.289
David Bau: I was thinking about the, like, are, this is slightly different than what I asked, but, like,

278
00:26:33.410 --> 00:26:49.009
David Bau: Do you have any sense of, like, things that do grok versus what doesn't grok? Like, it seemed like this was dependent on finding the pattern at a later stage, like, kind of reverse… Yeah. …reversing out, kind of how it learned that. Yeah.

279
00:26:49.150 --> 00:27:06.009
David Bau: And I was wondering about the, kind of, like, in the other direction, if you didn't know, like, at an earlier stage… Yeah. …making predictions of, and this relates a little bit to some of the questions about, like, causality versus correlation, where sometimes in our experiments, we found, like, there's information there that the models don't use. Yeah.

280
00:27:06.310 --> 00:27:09.910
David Bau: You know, this paper got a lot of attention,

281
00:27:11.130 --> 00:27:16.770
David Bau: And people, you know, this graph here, right, got a lot of attention.

282
00:27:16.920 --> 00:27:22.550
David Bau: And people noticed certain things about it, and so I'll point out to you things that

283
00:27:23.110 --> 00:27:28.660
David Bau: you know, circulated in the community afterward, like, the skepticism. It's like, oh, wow, look at this cliff!

284
00:27:29.000 --> 00:27:33.450
David Bau: And then later on, like, a thousand times later, right, it's like, oh, there's another cliff, it's amazing!

285
00:27:33.610 --> 00:27:35.810
David Bau: Well, this is a logarithmic scale.

286
00:27:36.370 --> 00:27:38.980
David Bau: And this one is a little more gentle than this one.

287
00:27:39.430 --> 00:27:47.189
David Bau: And so, actually, this left one is sharp. It, like, memorizes quickly, but if you were to put this on a linear scale, actually, this is very, very, very, very…

288
00:27:47.610 --> 00:27:56.959
David Bau: steady, slow, like, it took a while for it to grok, right? You know what I mean? And so… so is that, like, is grokking… is it, like, they have this very…

289
00:27:57.150 --> 00:28:01.860
David Bau: nice term that they use, oh, I think suddenly we get it, right? But actually.

290
00:28:01.980 --> 00:28:06.109
David Bau: Actually, it's not that sudden a process, in real life.

291
00:28:06.400 --> 00:28:15.639
David Bau: And, and maybe, you know, maybe not that distinct from the rest of what's going on here. So other people said, hey, you know, they're measuring accuracy here, right? And it's accuracy.

292
00:28:15.860 --> 00:28:20.829
David Bau: But actually, if you measure loss, then you don't see such a cliff. Loss is, like, more gradual.

293
00:28:21.080 --> 00:28:26.090
David Bau: And, and so there's a bunch of… Different people who objected.

294
00:28:26.340 --> 00:28:28.490
David Bau: To the title of the paper.

295
00:28:28.750 --> 00:28:29.590
David Bau: Grokking?

296
00:28:30.000 --> 00:28:35.939
David Bau: Which is a common tension in our field, because we like to pick flashy titles. I'm guilty of this.

297
00:28:36.480 --> 00:28:45.530
David Bau: And, but so, so the mental model, you should have a more realistic model, is,

298
00:28:45.910 --> 00:28:50.330
David Bau: This kind of phenomenon is fairly widespread.

299
00:28:51.130 --> 00:28:56.680
David Bau: Where you have different types of solutions being learned at different rates?

300
00:28:57.400 --> 00:29:04.510
David Bau: But this idea that there's different cliffs that happen is a little bit more… Controversial.

301
00:29:04.760 --> 00:29:07.540
David Bau: Some people say that there are cliffs

302
00:29:08.140 --> 00:29:16.290
David Bau: But they don't think that they're, like, scheduled, like, the cliff doesn't always happen at the same time. Like, the rate of…

303
00:29:16.430 --> 00:29:23.649
David Bau: Discovering a difficult… solution is a little bit random. Oh, actually, you see the randomness here?

304
00:29:23.940 --> 00:29:27.079
David Bau: Right? So this… there's all this stochasticity.

305
00:29:27.210 --> 00:29:31.050
David Bau: Of how long it takes to learn this other solution.

306
00:29:31.150 --> 00:29:32.960
David Bau: Every one of these dots

307
00:29:33.200 --> 00:29:35.500
David Bau: Let's see, I guess this is the number of steps.

308
00:29:35.690 --> 00:29:43.419
David Bau: to getting… to grokking. And this is another log scale, so it could be, like, 10 or 100 times more steps or fewer steps randomly.

309
00:29:43.700 --> 00:29:52.299
David Bau: And so, so yeah, so, so the rate of finding these solutions is sort of random.

310
00:29:52.640 --> 00:29:56.349
David Bau: And, and then, and then the idea that it's a cliff is also…

311
00:29:56.920 --> 00:30:00.980
David Bau: People think that it's a little bit more… Of a… of a slope.

312
00:30:01.170 --> 00:30:03.809
David Bau: Than… than the way they present it in the paper.

313
00:30:04.010 --> 00:30:07.759
David Bau: But yes, but besides, like, Everything changed.

314
00:30:07.980 --> 00:30:13.629
David Bau: I think that there… I'll show you some other examples where this kind of thing has come up.

315
00:30:13.760 --> 00:30:16.369
David Bau: That'd be on our future slide. Any other questions?

316
00:30:17.370 --> 00:30:29.269
David Bau: Yes. So, like, sorry, just… could you go back to the slide? Oh, this one, sorry, yes, like, grok faster with less power. I was thinking, like, is there… they only have, like.

317
00:30:29.620 --> 00:30:34.499
David Bau: 113 times 113 pairs of inputs, and, like, they're training on, like, 30% of them.

318
00:30:34.920 --> 00:30:37.849
David Bau: I'm wondering, like, on this, like, small…

319
00:30:38.120 --> 00:30:56.419
David Bau: data set, it might, like… grokking happens, like, could it be because, like, it's much easier for the model to memorize, and then, like, later, like, grokking happens, because it's trying to, like, decrease the loss by, like, learning some real things. But, like, in reality, LLMs.

320
00:30:56.910 --> 00:31:04.359
David Bau: we have, like, enormous, pre-training corpora, and, like, I don't know if, like, it's possible to, like, memorize

321
00:31:04.540 --> 00:31:07.849
David Bau: All of them. Right. And I'm just thinking, like.

322
00:31:08.580 --> 00:31:17.610
David Bau: But the amount of training data, or, like, could that be also, factors? Yeah, is this, is this, like, a special thing about small training data?

323
00:31:17.730 --> 00:31:27.659
David Bau: You might be right. So, one of the things that is controversial here is that it's… he's trying to infer something about deep learning in general out of a toy setting with a really small training set.

324
00:31:27.970 --> 00:31:28.910
David Bau: And,

325
00:31:29.150 --> 00:31:37.120
David Bau: And so some people think that that's nonsense, that, you know, it's very different from real, organic training. So, you know.

326
00:31:37.420 --> 00:31:44.299
David Bau: This question you're asking, you're asking the question in an open-minded way,

327
00:31:44.440 --> 00:31:46.900
David Bau: Some people ask the question in a very critical way.

328
00:31:47.070 --> 00:31:49.699
David Bau: Right? Where, where they say, yeah, I, I, you know.

329
00:31:49.920 --> 00:31:56.920
David Bau: You didn't test on big data, and it doesn't have the properties that real training has.

330
00:31:57.270 --> 00:32:02.309
David Bau: I think that's a reasonable question. I think that's… that's also open.

331
00:32:02.760 --> 00:32:03.859
David Bau: for a debate.

332
00:32:03.960 --> 00:32:08.879
David Bau: There's, there's definitely a community out there who thinks that

333
00:32:09.100 --> 00:32:23.279
David Bau: the only way to really understand complex systems is to simplify them down. They say that we've done this in physics forever, right? And it worked well in physics, you understand waves in a box, or something like that, and it helps you understand

334
00:32:23.410 --> 00:32:25.460
David Bau: You know, waves in the real world.

335
00:32:25.740 --> 00:32:29.149
David Bau: But, but other people say that, no, no.

336
00:32:29.480 --> 00:32:36.579
David Bau: Complexity is an inherent part of what's going on, and if you take away complexity, then you get different dynamics. So I…

337
00:32:36.830 --> 00:32:42.760
David Bau: I don't have a strong… Belief one way or the other. Like, I just… I just…

338
00:32:43.630 --> 00:32:47.819
David Bau: keep that skepticism in mind when I read these papers and when I advance an argument like that.

339
00:32:48.260 --> 00:32:49.160
David Bau: Make sense?

340
00:32:49.930 --> 00:32:52.790
David Bau: Yeah, I think it's… the jury's still out, it's all… it's all still open.

341
00:32:52.890 --> 00:32:54.349
David Bau: We don't have a consensus.

342
00:32:55.410 --> 00:33:02.339
David Bau: There was a… there was a question, somebody asked a question about L2. Who asked the L2 question? I thought that was an interesting one for this plot.

343
00:33:03.770 --> 00:33:05.569
David Bau: Was it Hai Yu, or somebody?

344
00:33:06.700 --> 00:33:15.090
David Bau: So… I'm gonna pause. Let somebody ask the L2 question. L2 regularization, anybody?

345
00:33:16.190 --> 00:33:17.810
David Bau: I don't remember who asked.

346
00:33:20.150 --> 00:33:25.930
David Bau: So… So this is… so, like, this is interesting, right? The weights are getting smaller.

347
00:33:26.380 --> 00:33:29.900
David Bau: Right? And, and they suddenly dropped down.

348
00:33:30.260 --> 00:33:36.900
David Bau: When you get this different, different thing. And so they, they call this… so Neel calls this phase the cleanup phase.

349
00:33:37.270 --> 00:33:47.629
David Bau: All right. I, you know, I also saw this when I experimented with this system. And my intuition for what's going on here is this inter… is this, like, this interaction with L2 regularization.

350
00:33:47.800 --> 00:33:50.830
David Bau: And I try to… convey this intuition.

351
00:33:50.990 --> 00:33:52.560
David Bau: When we teach machine learning.

352
00:33:52.770 --> 00:34:00.320
David Bau: You know, L2 regularization is a pressure that you put on your model to make all the weights smaller.

353
00:34:00.580 --> 00:34:05.580
David Bau: Right, so it's a pressure to make things… try to zero out the model as much as possible. Okay?

354
00:34:05.700 --> 00:34:08.149
David Bau: But it's just, like, some pressure.

355
00:34:08.360 --> 00:34:12.179
David Bau: And your model has this other pressure, which is, like, to solve the problem.

356
00:34:12.659 --> 00:34:22.539
David Bau: And to solve the problem, you can't set all the weights to zero, right? It's not going to do well, right? So, to solve the problem, it has this positive pressure on the weights. It's like, there's all these, like.

357
00:34:22.679 --> 00:34:29.820
David Bau: computations it has to do, and to do the computations accurately, the numbers can't be zero. They've got to have some positive values.

358
00:34:30.179 --> 00:34:36.320
David Bau: And… and so, what you see here is that early on in training, Right?

359
00:34:36.679 --> 00:34:37.770
David Bau: Memorization.

360
00:34:38.179 --> 00:34:41.050
David Bau: Like, blasts the weights up big.

361
00:34:41.699 --> 00:34:48.499
David Bau: Because, like, oh, and maybe they start down here, and then boom, as soon as you start training, like, the weights get much bigger.

362
00:34:48.909 --> 00:34:51.049
David Bau: And, and so memorization is this…

363
00:34:51.210 --> 00:35:00.150
David Bau: This, this solution to the problem that requires… and you put a lot of information in the weights, which means that a lot of the numbers have to be really big.

364
00:35:00.560 --> 00:35:01.440
David Bau: Right.

365
00:35:01.680 --> 00:35:10.919
David Bau: And then, as it figures out, you know, how to memorize a little better, like, it starts to organize its information in some ways, and then the weights get smaller.

366
00:35:11.200 --> 00:35:14.639
David Bau: For a bit. And then, as…

367
00:35:14.920 --> 00:35:23.050
David Bau: As it realizes that a really good way of organizing the things is to use these, like, these few frequencies

368
00:35:23.750 --> 00:35:37.069
David Bau: It lets… it lets the model, like, shrink the weights further until it gets to the point where it says, oh, you know what? These frequencies are working so well, I don't need anything else. All I need is these frequencies.

369
00:35:37.420 --> 00:35:43.550
David Bau: Right? And then… and then the optimizer says, oh, you know, I don't have any optimization pressure on this

370
00:35:43.770 --> 00:35:48.970
David Bau: on this memorization pathway anymore. I can set all those weights to zero.

371
00:35:49.510 --> 00:35:50.180
David Bau: and…

372
00:35:50.280 --> 00:36:01.740
David Bau: And it doesn't impact my loss, because the ringing weights, the key frequencies, are solving the whole problem. So that's… so the L2… so the way I think of it is, like, the L2 regularizer is coming through and sort of sweeping up

373
00:36:01.920 --> 00:36:03.340
David Bau: You know, the old solution.

374
00:36:03.680 --> 00:36:08.280
David Bau: and zeroing it out, and that's… so that's the way I interpret, you know, this part.
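
(A tiny numerical illustration of that "sweep-up" intuition: once a weight no longer gets any gradient from the task loss, the L2 / weight-decay term is the only pressure left, and it shrinks the weight steadily toward zero.)

```python
lr, weight_decay = 0.1, 0.01
w_unused = 5.0                    # a "memorization" weight the model no longer needs
for step in range(1000):
    task_grad = 0.0               # no loss pressure keeping it large anymore
    w_unused -= lr * (task_grad + weight_decay * w_unused)
print(w_unused)                   # ~1.84 after 1000 steps, decaying geometrically toward 0
```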

375
00:36:08.390 --> 00:36:09.549
David Bau: That makes sense, though.

376
00:36:10.370 --> 00:36:12.760
David Bau: Anyway, a little bit of intuition, a little storytelling.

377
00:36:13.730 --> 00:36:15.380
David Bau: That's sort of how I could do it.

378
00:36:15.720 --> 00:36:16.560
David Bau: Okay.

379
00:36:16.910 --> 00:36:20.750
David Bau: So, oh, Jasmine, are you online?

380
00:36:21.910 --> 00:36:24.019
Jasmine C.: Yeah, sorry, my voice is still…

381
00:36:24.660 --> 00:36:27.430
David Bau: How you doing? You feel like a question?

382
00:36:28.750 --> 00:36:33.580
Jasmine C.: I don't remember what I wrote, sorry.

383
00:36:34.030 --> 00:36:47.010
David Bau: Yeah, so I think that you asked this question, I just took a slide here, it said, it's a very related question to what Isaac asked, which is like, oh, does grokking happen anywhere else? So I don't know if you remember, from the induction heads reading?

384
00:36:47.200 --> 00:36:58.230
David Bau: But they… they talked about this cliff in the loss that you get when you train on text, you know, and measure it on the copying task.

385
00:36:58.440 --> 00:37:04.199
David Bau: And then, you know, it does well up to a certain point, and then all of a sudden, it does much better until it's almost perfect.

386
00:37:04.410 --> 00:37:05.569
David Bau: And,

387
00:37:06.050 --> 00:37:18.500
David Bau: And so… so these cliffs… and the cliffs only happen if you configure the network the right way. If you don't have enough layers, it can't do it, but if you have enough layers, it can do it. And so… so yeah, so… so the answer is yes. You know, grokking happens.

388
00:37:18.900 --> 00:37:22.269
David Bau: You know, these cliffs, these phase transitions that happen in other

389
00:37:22.420 --> 00:37:25.080
David Bau: Settings, and this is, like, the clearest one.

390
00:37:25.470 --> 00:37:30.609
David Bau: Here's another one. So this is a paper that… My student, Eric Todd.

391
00:37:31.440 --> 00:37:35.919
David Bau: just got in… I think he's gonna… present it at ICLR.

392
00:37:36.390 --> 00:37:39.530
David Bau: And… and here, he's… he's pulled apart.

393
00:37:39.710 --> 00:37:43.070
David Bau: Not one mechanism, he's pulled apart, like, 8 mechanisms.

394
00:37:43.440 --> 00:37:50.180
David Bau: Inside this network, and then one of the things that he did is he went inside to look at

395
00:37:50.360 --> 00:38:06.930
David Bau: the training dynamics for how these mechanisms did over a bunch of different training runs. And so this is averaged over, you know, a dozen training runs or something like that of the model. You can see that, you know, pretty consistently, you have these different cliffs where different mechanisms all of a sudden get really good accuracy.

396
00:38:07.170 --> 00:38:11.710
David Bau: Yes? Have they tried, like, if I train it on

397
00:38:11.710 --> 00:38:27.380
David Bau: only one particular type of data, and does it generalize to other? For example, if I only train on physics, does it generalize to chemistry or biology? Yes, exactly. That's exactly what's being done here. So, let me tell you two, like, things that you can do to try to get grokking curves out of your projects, if you want to do it.

398
00:38:27.460 --> 00:38:31.690
David Bau: So, one thing, and this is the thing that's probably practical for what you're doing.

399
00:38:31.860 --> 00:38:36.150
David Bau: Which is, you know, you have some general distribution of what's going on, and

400
00:38:36.470 --> 00:38:53.770
David Bau: you think that you understand how a sub-distribution, works. Like, maybe, maybe there's, like, 4 different strategies that your network is using to solve the problem of estimating uncertainty, or something like that. And depending on the sentence, or depending on the keywords, or something like that.

401
00:38:53.900 --> 00:39:00.209
David Bau: you know, it uses one of these different strategies. So, what you could do is you could say,

402
00:39:00.370 --> 00:39:02.859
David Bau: Let's pick a sub-dataset.

403
00:39:03.550 --> 00:39:10.390
David Bau: of my overall dataset, which isolates this strategy. Like, now that I have some understanding of what it's doing.

404
00:39:10.870 --> 00:39:13.639
David Bau: You know, 10% of my sentences

405
00:39:13.790 --> 00:39:18.489
David Bau: use this strategy, and they exclude all the other strategies. Like, it's just purely this one.

406
00:39:18.850 --> 00:39:25.410
David Bau: Once you have that, then it's pretty interesting, not just to measure it at the… final…

407
00:39:25.840 --> 00:39:35.500
David Bau: Checkpoint, the final model that you're testing, but to go back into the early checkpoints, if you have them, and then plot, like, well, how did… how did… what was the performance on this little subset?

408
00:39:35.830 --> 00:39:50.099
David Bau: at each update, and that's… that's basically what this is. So basically, Eric says, oh, you know, I'm solving a harder form of algebra problems. He went back to the grokking paper setting, so he's… now he's solving problems, and he's asking, which… which of these problems is solved by copying.

409
00:39:51.040 --> 00:39:55.989
David Bau: Which of these problems are solved by, identifying the identity.

410
00:39:56.430 --> 00:40:10.999
David Bau: you know, A times B equals A, that means B is the identity, which is a strategy. Which of these things is solved by identifying commutative inverses, maybe A times B, sometimes equals B times A, depending on what situation you're in.

411
00:40:11.480 --> 00:40:18.830
David Bau: Which things work by associativity? Which things work by cancellation? Like, can you solve it like a Sudoku puzzle? Right? Canceling things out.

412
00:40:19.580 --> 00:40:23.200
David Bau: And so he found, like, mechanisms for doing all these different things.

413
00:40:23.340 --> 00:40:25.729
David Bau: Inside the network, and then he says.

414
00:40:26.040 --> 00:40:43.470
David Bau: Number 3 is identity. Actually, the performance of the identity goes pretty well until it drops, and it's really bad at the identity while it's learning how to do commutative copying, but then after it learns how to do commutative copying, the performance on the identity comes back and goes near 100%, right? And so on. So he sees these funny

415
00:40:43.700 --> 00:40:45.710
David Bau: weird-looking grokking curves.
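
(A minimal sketch of that per-strategy analysis. `load_checkpoint` and `accuracy` are hypothetical stand-ins for however your project loads saved models and scores examples; the point is the shape of the loop.)

```python
def strategy_curves(checkpoint_paths, subsets, load_checkpoint, accuracy):
    """Evaluate every saved checkpoint on each strategy-specific sub-dataset.

    checkpoint_paths: paths ordered by training step.
    subsets: dict mapping strategy name -> examples that isolate that strategy.
    Returns one accuracy curve per strategy; cliffs and dips in these curves are
    the grokking-style dynamics described above.
    """
    curves = {name: [] for name in subsets}
    for path in checkpoint_paths:
        model = load_checkpoint(path)            # hypothetical loader
        for name, examples in subsets.items():
            curves[name].append(accuracy(model, examples))
    return curves
```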

416
00:40:46.140 --> 00:41:02.609
David Bau: That makes sense? So that's… that's probably the main thing that you want to do, if you want to measure grokking. I need to mention… I need to mention this other paper here. This is from, from Hu, Latent State Models of Training Dynamics. So the other thing you can do is this, is…

417
00:41:03.450 --> 00:41:08.239
David Bau: So I mentioned, you know, you can see these phase transitions if you just measure, like, the size of the weights.

418
00:41:08.560 --> 00:41:09.430
David Bau: Right.

419
00:41:09.810 --> 00:41:14.750
David Bau: And if you just measure other random stuff about your network, you can just…

420
00:41:14.920 --> 00:41:22.210
David Bau: take a linear algebra textbook or something. How about the spectral norm of all the matrices?

421
00:41:22.400 --> 00:41:24.130
David Bau: How about the traits?

422
00:41:24.750 --> 00:41:26.709
David Bau: You know, how about,

423
00:41:27.040 --> 00:41:44.189
David Bau: How about the maximum of all the weights instead of the average of all the weights? How about, like, this… the magnitude of the gradient, if we pass gradient over the network? How about this? You know, you can measure different things. And if you measure one of the… each one of these during all the checkpoints, you'll also see that sometimes there's, like, cliffs up and down.

424
00:41:44.370 --> 00:41:49.020
David Bau: You know, all around, right? Like, it's not just as smooth, progression.

425
00:41:49.250 --> 00:42:01.929
David Bau: And so what… what they did in this paper, what Fu did in this paper, is they… they said, I don't know what we should measure. Let's measure 13 of these things, right? Or some number, right? Take a dozen of these numbers.

426
00:42:02.080 --> 00:42:06.029
David Bau: And then every one of these checkpoints we have, imagine that it gives us these, like.

427
00:42:06.500 --> 00:42:11.730
David Bau: 13 line plots that look like this, right? All over the place.

428
00:42:12.010 --> 00:42:13.170
David Bau: And then they said.

429
00:42:13.330 --> 00:42:19.319
David Bau: So, what does it tell us? And then what they did is they… they ended up training…

430
00:42:19.440 --> 00:42:32.440
David Bau: a hidden Markov model on top of the… this… this… this, like, 13-dimensional line plot, saying that, okay, if this is high, this is low, and this is medium, this is high, this is low, right? Or, like, all these numbers, that means that we're in state X.

431
00:42:32.650 --> 00:42:40.709
David Bau: And when you're in state X, then you have some probability of going to state Y, where this other number is high and this other number belongs, right? Does that make sense?

432
00:42:40.830 --> 00:42:47.980
David Bau: And… and they… they… they said… and then they did, like, thousands of, like, you know, training runs of some networks.

433
00:42:48.320 --> 00:42:52.290
David Bau: And then they said, when you're in this one state, then…

434
00:42:52.450 --> 00:42:55.450
David Bau: You have some probability of going to the next state.

435
00:42:55.660 --> 00:43:03.640
David Bau: You know, at every iteration. You have some progress going to the other state. And so they did this whole beautiful analysis of, like, what training dynamics looks like.

436
00:43:03.930 --> 00:43:11.480
David Bau: Where… Where, like… It's like, it's like this picture of crystallization, like, oh, our mixture…

437
00:43:11.670 --> 00:43:14.140
David Bau: Of mechanisms is like this.

438
00:43:14.310 --> 00:43:25.230
David Bau: And then after you train a little bit, and there's a chance that you go this way, your chance to go down, you might… you might start crystallizing your mechanisms in one direction, you might crystallize your mechanisms in a different direction.

439
00:43:25.600 --> 00:43:32.380
David Bau: And so… so they sort of… they did this analysis. So this is probably beyond what you would do, but I just want you to be aware that

440
00:43:32.610 --> 00:43:33.850
David Bau: It's another approach.

441
00:43:33.960 --> 00:43:34.830
David Bau: people have.

442
00:43:36.730 --> 00:43:42.670
David Bau: Yes, here, states are basically the set of those linear algebra Merry Christmas.

443
00:43:43.330 --> 00:43:54.069
David Bau: Yeah, so every state is… so they had, like, something like a dozen observable variables. The observable variables were these aggregate statistics of the network.

444
00:43:54.200 --> 00:43:56.849
David Bau: Like, what's the average norm of the weights or something?

445
00:43:58.810 --> 00:44:11.480
David Bau: And so, every one of these hidden Markov states, the hidden states, corresponds to, some, some set of visible states, you know, and

446
00:44:11.680 --> 00:44:16.479
David Bau: And, and so, yeah, so they're asking the question.

447
00:44:16.720 --> 00:44:26.530
David Bau: If you sort of partition your visible states into these underlying hidden states, then what's an economical way of representing everything? Like, if you said, oh, you…

448
00:44:26.730 --> 00:44:37.290
David Bau: If you discretize everything, you said, oh, I want to just have a half dozen hidden states, then which one of the hidden states would there be, and what would the transition be between them?

449
00:44:37.540 --> 00:44:39.950
David Bau: That would describe the training process.

450
00:44:40.110 --> 00:44:44.460
David Bau: And… and so that's, like, a classic… did Markov model training problem.

451
00:44:44.900 --> 00:44:47.549
David Bau: And so they just plot all their data into that.

452
00:44:49.340 --> 00:44:58.139
David Bau: into that machinery, and they got these beautiful plots out for what the, what the state transitions would be, and they look like. This type of thing.

453
00:44:58.840 --> 00:44:59.770
David Bau: That make sense?

454
00:45:00.010 --> 00:45:01.859
David Bau: Yes, that makes sense. Yes.

455
00:45:02.080 --> 00:45:08.140
David Bau: Next question was, did they observe groking, and if so, could they explain gulking with smart objects?

456
00:45:08.610 --> 00:45:20.810
David Bau: Yeah, yeah, so they… I don't know if they used the term rocking, I don't remember ordering, but I think this is, you know, motivated by the same thing, is that basically, when they have a transition like this.

457
00:45:21.040 --> 00:45:24.859
David Bau: Like an arrow? I think that what these arrows are, for them.

458
00:45:25.040 --> 00:45:36.490
David Bau: is the equivalent of grocking. And what they would say is that grokking doesn't just happen once when you have a more complicated system. You have lots of little transitions that happen when the network

459
00:45:36.630 --> 00:45:43.760
David Bau: it seems like it becomes… has a different flavor. And it drops, like, from here to here, and it grabs 4, sometimes it drops, like, here, and then here to there.

460
00:45:44.000 --> 00:45:45.690
David Bau: Very cool. That makes sense?

461
00:45:46.700 --> 00:45:47.360
David Bau: Yeah.

462
00:45:47.980 --> 00:45:51.320
David Bau: And so… States capture the time as well.

463
00:45:51.470 --> 00:45:59.429
David Bau: Yeah, so I think that they're… the way that they're thinking of it is, you know, you can see these .997, so most of the… they say that

464
00:45:59.640 --> 00:46:02.489
David Bau: So remember in the graphic paper,

465
00:46:02.700 --> 00:46:05.920
David Bau: There's this big variation between how long it will take.

466
00:46:06.240 --> 00:46:18.539
David Bau: between… sometimes it'll take a long time to grok, and sometimes it'll take a few times, you know, less time to grok. So you can think of that as kind of a force-on process, right? You can think of it as, like, well, every time you run gradient descent for another 100 steps.

467
00:46:18.930 --> 00:46:21.730
David Bau: There's a chance that you might stumble on the solution.

468
00:46:22.600 --> 00:46:26.749
David Bau: But then there's also a chance that you haven't stumbled on it yet, you might have to just look longer.

469
00:46:27.070 --> 00:46:37.440
David Bau: And… and that's what this… represents. Oh, after some more training steps, there's a 98.7% chance that you haven't

470
00:46:38.810 --> 00:46:41.670
David Bau: the transition out yet, you're still stuck in state one.

471
00:46:42.050 --> 00:46:53.050
David Bau: But then there's a small, you know, 1% chance, or whatever, that you found the thing, and let's get to that state. And so, that might happen fast. If you're lucky, it might take a long time. If you're not lucky.

472
00:46:53.280 --> 00:46:55.210
David Bau: That's sort of the way they model it with us.

473
00:46:56.090 --> 00:46:56.860
David Bau: Fantastic.

474
00:46:56.970 --> 00:47:03.299
David Bau: Yeah. And so I, I like this. I think it's, you know, this is still the paper, I think, that…

475
00:47:03.550 --> 00:47:13.390
David Bau: you know, had the most systematic approach to doing it. But it's still opaque, so, like, if you just collect all these numbers, you get this diagram, you still don't know, like, what's in each one of these.

476
00:47:14.070 --> 00:47:20.630
David Bau: false, right? And you don't really know if this is, like, real or not, right? It's just… it's just, like, kind of a way of looking at it.

477
00:47:20.870 --> 00:47:28.629
David Bau: what's going on? And so, for you, for you guys, I kind of recommend this. This is a lot more transparent, like, you know, each one of these lines.

478
00:47:28.760 --> 00:47:31.189
David Bau: Corresponds to some mechanism, some subset.

479
00:47:31.370 --> 00:47:33.250
David Bau: I'm dead, and you know what it means.

480
00:47:33.620 --> 00:47:39.750
David Bau: Right? Now, it'd be interesting if you did this analysis, and you lined it up with this analysis, and say,

481
00:47:39.970 --> 00:47:44.580
David Bau: You know, ball number 1 corresponds to this being high and this being low in the real data.

482
00:47:44.730 --> 00:47:50.049
David Bau: I don't think anybody's done, like, a triangulated paper like that. Trying to relate history.

483
00:47:50.240 --> 00:47:51.469
David Bau: Make sense?

484
00:47:52.670 --> 00:47:53.670
David Bau: All right.

485
00:47:55.350 --> 00:47:56.500
David Bau: So, okay.

486
00:47:56.970 --> 00:47:58.799
David Bau: Oh, that was a lot on cracking.

487
00:48:00.230 --> 00:48:13.010
David Bau: I better go through the next one faster, yes. I just have a question on, like, after we do these analyses on dynamics, how people use the information to… for some more practical benefit, besides just understanding what's going on?

488
00:48:14.030 --> 00:48:18.799
David Bau: better training dynamics or something? It's a good question.

489
00:48:19.260 --> 00:48:25.840
David Bau: I don't know, I'm not a trained dynamics person, so I'm not in the business of, like, defending, this lab work. You know, Naomi Sapra.

490
00:48:26.070 --> 00:48:28.259
David Bau: Who is, like, a training dynamics?

491
00:48:29.540 --> 00:48:30.830
David Bau: Yeah, neither.

492
00:48:31.080 --> 00:48:32.229
David Bau: Down the street.

493
00:48:32.640 --> 00:48:36.450
David Bau: I think his name is at BU now. Can I ask her?

494
00:48:39.470 --> 00:48:45.540
David Bau: Yeah, what would be good? So, like, I think that you might have a sense… so, okay, so there's some controversies.

495
00:48:46.040 --> 00:48:51.090
David Bau: That are maybe addressed by some of the training dynamics stuff. Like, some people think.

496
00:48:51.380 --> 00:48:55.149
David Bau: And it can actually be… our fault.

497
00:48:55.340 --> 00:48:58.050
David Bau: To train your network for the wrong amount of time.

498
00:48:58.930 --> 00:49:02.839
David Bau: And… but then… but then there's different advice that you get, like.

499
00:49:03.020 --> 00:49:07.639
David Bau: Don't train your network… don't give up on your network training too early.

500
00:49:07.800 --> 00:49:09.250
David Bau: You should train longer.

501
00:49:09.700 --> 00:49:20.209
David Bau: Right, then it'll get better. Oh, but then some other people say, oh, no, no, don't train it too long. If you train it too long, then you're wasting CPU, and then eventually it actually starts getting worse.

502
00:49:20.510 --> 00:49:26.700
David Bau: Right, so this whole discussion… You know, it has to do with train dynamics. There's also other stuff.

503
00:49:26.970 --> 00:49:30.560
David Bau: Which is… which interacts with this in practice.

504
00:49:30.690 --> 00:49:31.950
David Bau: Which is,

505
00:49:32.290 --> 00:49:38.400
David Bau: You know, one of the biggest things that all the giant companies are doing is they're coming up with their mix of candy data.

506
00:49:38.560 --> 00:49:49.429
David Bau: And so there's another access here for training dynamics, just like, well, how does the mix of training data during the process influence all the training dynamics that we've learned? And,

507
00:49:49.660 --> 00:49:51.880
David Bau: And for that, Jamie, I haven't seen much.

508
00:49:52.080 --> 00:49:58.229
David Bau: of data dynamics papers from that point of view. Maybe I'm missing the literature. But in industry, this is…

509
00:49:58.510 --> 00:50:00.120
David Bau: This is, like, a major questioning.

510
00:50:01.430 --> 00:50:12.799
David Bau: Yes, maybe I'll… I mean, I don't remember the exact paper, but I remember one of the events last year. Yes. They were doing, like, generative modelings of, like, images, diffusion models, and they had this

511
00:50:13.330 --> 00:50:14.190
David Bau: kind of…

512
00:50:14.620 --> 00:50:29.500
David Bau: like, theoretical, like, a formula, basically, predicting when those stages would occur as a function of how much data you had. So you could say, for this training run to be worth it, we need, like, this amount of data.

513
00:50:29.710 --> 00:50:36.950
David Bau: For the, window of useful models to be… is it, reasonably big, or…

514
00:50:37.720 --> 00:50:48.609
David Bau: That's right. So, you've heard of scaling laws, right? And so, obviously, training dynamics is related to scaling laws, although scaling laws are… they usually look at, like, a small number of metrics.

515
00:50:48.750 --> 00:50:54.250
David Bau: how good is the network in general? Whereas Train Dynamics, people…

516
00:50:54.880 --> 00:50:57.880
David Bau: Try to, you know, generally try to cut it up with a finer tooth.

517
00:50:58.110 --> 00:50:58.860
David Bau: Boom.

518
00:51:01.280 --> 00:51:02.720
David Bau: Okay.

519
00:51:02.870 --> 00:51:04.560
David Bau: We'll go through path scopes quickly.

520
00:51:04.700 --> 00:51:09.279
David Bau: Actually, sorry, it's a new idea. So, there was a question…

521
00:51:10.520 --> 00:51:14.810
David Bau: From Armita. Is Armita here? Yes. What was your question?

522
00:51:17.000 --> 00:51:27.749
David Bau: So, yeah, I wanted to, see how… where this stands compared to our previous… Yeah, how is it related to the other stuff? Yeah, and when we prefer this one?

523
00:51:28.140 --> 00:51:37.010
David Bau: Yes, and so, when you prefer things in general, I think that, you know, it's gonna give you the same answer I've given a few times.

524
00:51:37.160 --> 00:51:45.360
David Bau: Which is… That you… you probably want to not… Lean on too few methods.

525
00:51:45.490 --> 00:51:54.470
David Bau: And you want to find complementary methods, once you have a hypothesis that you're advancing. So if you're right, you say, oh, I think that we figured out the condition of this.

526
00:51:54.780 --> 00:51:58.309
David Bau: You'll know from your own experiment.

527
00:51:58.490 --> 00:52:03.259
David Bau: Like, what the strongest evidence that you are most excited by? So, obviously, you present that first.

528
00:52:03.490 --> 00:52:08.690
David Bau: But then, after you present that, then you have to realize that the reviewers

529
00:52:09.000 --> 00:52:16.800
David Bau: you know, or any reader will look at that evidence and say, well, you know, that comes with caveats. Every single one of the methods we have comes with caveats.

530
00:52:16.950 --> 00:52:26.780
David Bau: And so you can say, well, we don't just see it that way. When you look at it this other way, then you also see something interesting. And when you look at it a third way, you also see it. And so the more different ways

531
00:52:26.800 --> 00:52:37.120
David Bau: you can see the same phenomenon, the more people will believe that you have a handle on what you found. And so… so I'm mostly talking about patch scopes, just to give you another

532
00:52:37.560 --> 00:52:39.660
David Bau: Hello? For your basket?

533
00:52:39.900 --> 00:52:44.910
David Bau: Now, it's a good question. How does it relate to probing?

534
00:52:45.190 --> 00:52:46.800
David Bau: Patching or steering.

535
00:52:47.410 --> 00:52:51.450
David Bau: I'd say that this is most related to logic lens.

536
00:52:52.630 --> 00:52:57.199
David Bau: Right, that method that we use, you know, the first week of class just to learn how transformers work.

537
00:52:57.500 --> 00:53:06.330
David Bau: And it's, you know, Logit Lands is a serious method, like, you should, you shouldn't use a lot of logic lens to look inside your things sometimes. It doesn't give you that much insight.

538
00:53:06.480 --> 00:53:11.320
David Bau: for Legendlands, and so you might not have anything to report.

539
00:53:11.470 --> 00:53:15.849
David Bau: But there are ways of making logic lens more powerful, and this is one of them, so I just wanted to…

540
00:53:16.190 --> 00:53:18.400
David Bau: to, to share that with you.

541
00:53:18.550 --> 00:53:26.489
David Bau: And so, so there's a… there's been a few papers that have worked on making a larger lens more powerful, and they all have

542
00:53:26.800 --> 00:53:38.620
David Bau: There's time for public forum. So one of them came out of our lab, it was called Future Lens of… and Future Lens does that… so what does is instead… so what does LogitLens have to do? Logit Lens takes…

543
00:53:38.750 --> 00:53:40.790
David Bau: Other than Steve, did you want to know.

544
00:53:40.940 --> 00:53:42.989
David Bau: What's going on in the head stage?

545
00:53:43.420 --> 00:53:47.449
David Bau: And it decodes it to a word, or a set of words, right?

546
00:53:47.650 --> 00:53:52.730
David Bau: And it decodes it to a word, by patching that state

547
00:53:52.910 --> 00:54:03.289
David Bau: to the decoder head of the transformer. It just says, you know, I know this is in the middle of the network, but instead of having you process it, I'm just gonna skip to the end, I want you to tell me what this decoder can do right away.

548
00:54:03.560 --> 00:54:16.359
David Bau: And then it gives you a word that you can write down, and it, you know, sometimes it gives you some interesting insights from doing this, right? Certainly more interesting than just looking at a thousand numbers, right? You can see what words it correspond to.

549
00:54:16.810 --> 00:54:23.199
David Bau: Right? So, what FutureLens does is it says, Well, you know…

550
00:54:23.510 --> 00:54:30.069
David Bau: This vector here, it was never meant to be decoded directly, in the…

551
00:54:30.770 --> 00:54:35.479
David Bau: decoder head of the transformer. It was meant to, like, pass through a bunch of layers.

552
00:54:35.680 --> 00:54:42.690
David Bau: Which, who knows what they do, right? You know, maybe, you know, do some transforming on it until it finally, like, comes up

553
00:54:43.060 --> 00:54:44.519
David Bau: with… with the answer.

554
00:54:44.900 --> 00:54:51.059
David Bau: And so, so, you know, obviously, if we just run the transformer, it'll say something.

555
00:54:51.240 --> 00:54:58.990
David Bau: But the problem with just running the transformer is that in a lot of contexts, when you just run the transformer.

556
00:54:59.760 --> 00:55:05.830
David Bau: It'll… it has, like, all this other context besides… when you just run the transformer, it doesn't tell you just, like, what's in this.

557
00:55:06.100 --> 00:55:11.110
David Bau: thing. It tells you, like, what's going on everywhere. There's all this attention, bringing information from everywhere, so you don't know

558
00:55:11.220 --> 00:55:14.390
David Bau: When it finally emits something, if it really has to do with this.

559
00:55:14.630 --> 00:55:19.899
David Bau: with this hidden state, or if it has to do with all the other information I was bringing in. So what you could do is.

560
00:55:20.100 --> 00:55:26.760
David Bau: You could pluck this out, And you could find a clean transformer that doesn't have any of the contacts.

561
00:55:27.220 --> 00:55:31.070
David Bau: Right, so, you know, I don't know, maybe there's a whole newspaper article about New York City.

562
00:55:31.410 --> 00:55:34.959
David Bau: And then, and then this sensation shows up, and I said, New York City.

563
00:55:35.260 --> 00:55:37.850
David Bau: So, is it saying New York City because of the city and state?

564
00:55:37.960 --> 00:55:41.710
David Bau: or the saying, New York City because of the rest of the newspaper article.

565
00:55:42.120 --> 00:55:48.460
David Bau: Well, one thing that we can do is, we could just cut out the rest of the newspaper article, get a clean transform with no context.

566
00:55:48.980 --> 00:55:51.069
David Bau: And we could just, like, jam this didn't sit in.

567
00:55:51.340 --> 00:55:53.799
David Bau: I'm gonna say, I wonder if it says New York City.

568
00:55:54.100 --> 00:55:56.020
David Bau: And sometimes it does!

569
00:55:56.440 --> 00:55:58.789
David Bau: It does say New York City anyway, which means

570
00:55:59.010 --> 00:56:03.160
David Bau: that hidden state encodes for New York City.

571
00:56:03.480 --> 00:56:07.440
David Bau: Now, it turns out that this usually doesn't work.

572
00:56:07.950 --> 00:56:12.450
David Bau: If you remove all the context, and you, you know, put some hidden state in.

573
00:56:12.550 --> 00:56:15.940
David Bau: You know, like, the lack of context

574
00:56:16.130 --> 00:56:29.829
David Bau: tends to make a transformer really, really think that it's, like, at the beginning of the document. Like, once upon a time, kind of state, right? So even though you plop in a hidden state that says, you know, I'm including for… you should say New York City.

575
00:56:30.140 --> 00:56:31.449
David Bau: It'll, it'll link.

576
00:56:31.850 --> 00:56:37.040
David Bau: They'll, like, be so overwhelmed with the other evidence, they should say, like, once upon a time, it won't say.

577
00:56:37.240 --> 00:56:42.889
David Bau: what's in the hidden state. So, what Future Lens is, is, it's like a… it was a, a recipe

578
00:56:43.390 --> 00:56:48.640
David Bau: For how to configure a transformer so that it tends to tell you What?

579
00:56:48.820 --> 00:56:50.740
David Bau: This token would be.

580
00:56:51.290 --> 00:56:53.599
David Bau: And… and the way it worked was.

581
00:56:53.810 --> 00:56:59.400
David Bau: Instead of having the transform be perfectly clean, with nothing coming before it.

582
00:56:59.900 --> 00:57:05.920
David Bau: Futureland says, we're gonna come up with some big text, That we're gonna put there.

583
00:57:07.190 --> 00:57:11.859
David Bau: That instead of it being the very beginning of the document, we'll take 10, the whole… get 10 words.

584
00:57:12.090 --> 00:57:16.900
David Bau: That we'll put there, and then we'll find the optimal 10 words, and that'll be just the same words for everybody.

585
00:57:17.050 --> 00:57:21.229
David Bau: But when you put these 10 words there, like, the 10 words might say something like,

586
00:57:22.130 --> 00:57:27.110
David Bau: You know, be honest about what you're thinking right now. What you're thinking about right now is…

587
00:57:27.780 --> 00:57:30.760
David Bau: This, and then you'd plant the thought in the model, and then…

588
00:57:31.190 --> 00:57:35.569
David Bau: And then it says New York City, right? So that's what the idea of FutureLens is.

589
00:57:35.680 --> 00:57:36.709
David Bau: And we tried…

590
00:57:36.820 --> 00:57:41.750
David Bau: prompt engineering it, come up with, like, prompts like what I told you, and it didn't work that well.

591
00:57:42.110 --> 00:57:44.550
David Bau: But then what we did is we did prompt optimization.

592
00:57:44.720 --> 00:57:52.270
David Bau: Where instead of putting real tokens in here, we did, what's called, you know, odd,

593
00:57:52.460 --> 00:57:54.710
David Bau: soft token learning.

594
00:57:54.820 --> 00:57:59.350
David Bau: Which we ran an optimization process to optimize tokens to do well at this task.

595
00:57:59.470 --> 00:58:07.569
David Bau: And that's… so that's what FutureLens is. So it's… it's a… it's a transformer with 10 learned tokens that you seed it with.

596
00:58:07.820 --> 00:58:11.119
David Bau: That are the prime and the transformer to tell you.

597
00:58:11.600 --> 00:58:13.210
David Bau: What it thinks the token means.

598
00:58:13.810 --> 00:58:16.639
David Bau: That make sense? So that's… that… so that was…

599
00:58:16.930 --> 00:58:19.419
David Bau: That's the thing that Kiana and my lab did.

600
00:58:20.000 --> 00:58:21.340
David Bau: A couple years ago.

601
00:58:21.680 --> 00:58:32.020
David Bau: And there were a number of papers that did different things that were similar to this. And what PatchScope says, is it's a roll-up of these methods. They're sort of surveying

602
00:58:32.370 --> 00:58:47.619
David Bau: you know, what's… what other people have done, including FutureLens, and they said, okay, well, in general, here's, like, the state of the art of what you would want to do. And… and so it's a… it's a generalization of FutureLens to do this. And so,

603
00:58:48.110 --> 00:58:54.789
David Bau: So instead of having one fixed prompt, they suggest a bunch of interesting, clever prompts, like, so here's a really clever prompt that I wish I thought of.

604
00:58:55.100 --> 00:58:59.690
David Bau: which is really interesting. There's a prompt that says, Cat goes to cat.

605
00:58:59.910 --> 00:59:04.740
David Bau: 135 goes to 135. Hello goes to hello. You can have actually a really long prompt like this, like.

606
00:59:04.860 --> 00:59:16.170
David Bau: You know, the same word goes to the same word. Oh, you have some word in input, you gotta have that word in output. The same word in input, it's just a… it's a echo. Echo, echo, echo. Say the thing that you said, say the thing you said, right?

607
00:59:16.390 --> 00:59:23.130
David Bau: And then… and then what you do is you put this thing in, you put a dummy token in, and then… and then you let…

608
00:59:23.390 --> 00:59:28.129
David Bau: Then you'd patch in, You know, the representation, that you wonder what it means.

609
00:59:28.270 --> 00:59:30.200
David Bau: And then you say, echo that?

610
00:59:30.630 --> 00:59:33.099
David Bau: Right after it's seen, like, 100 echoes.

611
00:59:33.270 --> 00:59:34.700
David Bau: Matt's gotta do another one.

612
00:59:34.830 --> 00:59:38.430
David Bau: And then it tends to tell you, huh, that's Chef Ditos.

613
00:59:40.230 --> 00:59:42.849
David Bau: So that's pretty neat. So they came up with a few different

614
00:59:42.980 --> 00:59:45.700
David Bau: Cool tricks for doing this.

615
00:59:46.020 --> 00:59:47.950
David Bau: Another cool trick that was in patch scopes.

616
00:59:48.110 --> 00:59:59.369
David Bau: is that instead of patching, like, what… we were such purists on Future Alliance, we were like, if you're at layer 5, then the transformer expects to see you at layer 5, so patch into layer 5, that's where we put it.

617
00:59:59.580 --> 01:00:02.319
David Bau: You're like, I don't know, you don't have to be such a purist.

618
01:00:02.490 --> 01:00:04.539
David Bau: You can just stick that back at Layer 1.

619
01:00:04.710 --> 01:00:07.570
David Bau: And they just measured it, and it tends to work better.

620
01:00:09.160 --> 01:00:18.080
David Bau: And so, so yeah, so they had a bunch of, results like this, where they were just measuring, you know, how accurately they could recover.

621
01:00:18.430 --> 01:00:32.600
David Bau: you know, a description of what's, you know, what's going on in that state. And, and, and they sort of, you know, they tweaked it and optimized the process a bit. So that, so, so if you want to do something that's like Watch It Lands, but you wish to, like.

622
01:00:33.040 --> 01:00:37.460
David Bau: were better, or cooler, sometimes you might read the patch ghostscaper.

623
01:00:37.950 --> 01:00:38.919
David Bau: That make sense?

624
01:00:39.370 --> 01:00:40.130
David Bau: Yeah.

625
01:00:40.470 --> 01:00:48.949
David Bau: Or you can also use LedgerLens, which is super simple, which doesn't require any configuration or anything, so this is another possibility. It's also related to…

626
01:00:49.250 --> 01:00:50.150
David Bau: patching.

627
01:00:50.270 --> 01:00:52.880
David Bau: Right? Because, you know, you patch from one model to another.

628
01:00:53.270 --> 01:00:57.000
David Bau: So, everything that you're doing for your patching experiments, it would actually be the same.

629
01:00:59.020 --> 01:01:00.080
David Bau: The same principle.

630
01:01:01.190 --> 01:01:04.149
David Bau: Oh, there was a question from Ananya, or Yuchen.

631
01:01:05.060 --> 01:01:08.379
David Bau: Yeah, like, what's… yes? Oh, yeah, I guess the question was, like.

632
01:01:09.160 --> 01:01:17.340
David Bau: It kind of comes with this thought process that you're basically asking the model again, you know, hey, you thought about this before, what do you think about this again? Yeah.

633
01:01:17.560 --> 01:01:23.319
David Bau: how do you trust that the model will produce, like, the right? Yeah, I don't know if you can. It's a really good question.

634
01:01:24.450 --> 01:01:34.009
David Bau: It's… it's… it's… so, I think that if you're really, really worried about deception, and, you know, going back to the model to ask again, like, doesn't totally…

635
01:01:34.400 --> 01:01:36.120
David Bau: You know, solve the problem?

636
01:01:36.380 --> 01:01:42.119
David Bau: At least you're running the model in a way that it wasn't trained to do, so, right? So, like.

637
01:01:42.390 --> 01:01:45.320
David Bau: so you're not overfitting on…

638
01:01:45.820 --> 01:02:04.209
David Bau: you know, it's training, like, if you believe that there's some way of setting up the network to be honest, like, this is, like, an honest situation, it's like, oh, the network is just gonna say what it has to say, it's an honest state of mind, then maybe, but there's no… there's a proof of this. I think the proof is empirical. They're like.

639
01:02:04.310 --> 01:02:08.299
David Bau: We just… we just measured this on a thousand things, and it seems to work okay.

640
01:02:08.670 --> 01:02:13.909
David Bau: And, so hopefully with the 1001 thing, it's… maybe it'll work okay, too?

641
01:02:14.140 --> 01:02:16.279
David Bau: But it's not… it's not a proof.

642
01:02:16.620 --> 01:02:18.269
David Bau: But…

643
01:02:18.420 --> 01:02:37.610
David Bau: It's just building confidence. It's just another method. I probably wouldn't use it alone. I would use it with other things. I have another question. Yes. In the case of the Futurelands paper, you used two different transformers. Oh, it was the same transformer. Oh, okay. Yes. So were they trained on the same… like, if we were to change training data, or basically patch.

644
01:02:37.620 --> 01:02:48.610
David Bau: Activations from a similar, like, all different therapy. Yes, there's another paper from the Transloose paper, I forgot the name of the part of the paper, but Sarah Schweitman and so on.

645
01:02:48.850 --> 01:02:53.929
David Bau: Where they do this, but they train a different transformer to look at the states. So, you know, when you…

646
01:02:54.070 --> 01:02:55.410
David Bau: We have some state.

647
01:02:55.620 --> 01:03:06.390
David Bau: And you want to, like, get a description of what it is, and why, you know, why does it have to be the same transformer? It could be a very different one. Just train down the problem of understanding the states of this one. So you can do that, too.

648
01:03:06.480 --> 01:03:19.850
David Bau: That's a more intensive process. So, like, why would you expect some other transformer to understand the states of this one? Well, it wouldn't. It has no reason to, so it's almost like kind of training from scratch. It's got to learn this foreign language. This one is sort of funny, because

649
01:03:19.950 --> 01:03:24.619
David Bau: You can see how this transformer would already kind of understand the internal language of itself.

650
01:03:24.780 --> 01:03:28.350
David Bau: Right, so it's like a lot… it's less tuning that you have to do, maybe just procure.

651
01:03:30.730 --> 01:03:31.480
David Bau: Yeah.

652
01:03:31.820 --> 01:03:33.600
David Bau: And so,

653
01:03:33.930 --> 01:03:40.299
David Bau: But yes, but that's another approach. It's a very good question, and it's a very reasonable approach, and it's another one of the avenues that people could take.

654
01:03:42.410 --> 01:03:43.620
David Bau: And so

655
01:03:44.070 --> 01:03:51.589
David Bau: Okay, so that's… that's patch scopes. I like the… I like the, how do we trust a model to explain its own representations questions? It's very deep.

656
01:03:52.190 --> 01:03:56.830
David Bau: And it's… that may be one of the fundamental questions of our view.

657
01:03:57.100 --> 01:04:00.899
David Bau: It's, like, one of the things that, you know, we puzzle of.

658
01:04:01.270 --> 01:04:04.280
David Bau: So, okay, so the last paper, I think, is…

659
01:04:04.640 --> 01:04:07.419
David Bau: the one that I think is the most important.

660
01:04:07.570 --> 01:04:12.250
David Bau: Out of this, but it's the least likely that you would actually use this in your…

661
01:04:12.610 --> 01:04:20.429
David Bau: your research. It's actually a fairly difficult technique to use, but I want you guys to be aware of it, because I just think it's really important.

662
01:04:22.140 --> 01:04:30.629
David Bau: And so this paper was about chess. It was about AlphaZero, which everybody knows can, like, beat all of us in chess.

663
01:04:30.900 --> 01:04:49.410
David Bau: probably beat everybody on the planet in chess. It's like a superhuman chess player. And so whenever you have this question of what does it mean to have superhuman AI, you should remember that we already have superhuman AI, we have superhuman chess players that can beat you, not just through, like.

664
01:04:49.630 --> 01:05:03.930
David Bau: you know, throwing huge amounts of compute at you, they can beat you the same way that if you go to Harvard Square and you find those chess people that sometimes hang out by the subway station with all the little chess boards on the…

665
01:05:03.940 --> 01:05:12.809
David Bau: On the tables there, and they tie their hands behind their back, or whatever, right? And they can play, like, 12 chess people at the same time. They just go, and they…

666
01:05:12.890 --> 01:05:24.179
David Bau: move again, they move again, and then they beat everybody, right? They don't have to think about your game, because they're so good at chess, they can see the future, just as they get the board. You know, Alpha Zero works the same way.

667
01:05:24.550 --> 01:05:25.640
David Bau: And…

668
01:05:25.910 --> 01:05:36.619
David Bau: And, you know, so… and it works better than most professional chess players. So, in the paper, they have this example of, I'm not a chess player, the author, Lisa Schott.

669
01:05:36.690 --> 01:05:44.249
David Bau: was a professional chess player, and Lisa looks at this board and says, you know, obviously anybody who's, like, a pro chess player would move a certain way.

670
01:05:44.310 --> 01:06:01.440
David Bau: But the weird thing about this state is that AlphaZero would move a different way. And, like, the exact… like, there's many, many situations like this, where, like, every human who's, like, trained in chat would go one way, and AlphaZero does a different thing. It's like, for them, this is, like, concrete evidence that

671
01:06:01.750 --> 01:06:08.689
David Bau: I mean, Ed beats everybody. Like, that it has superhuman knowledge, it knows something about this move that people don't know.

672
01:06:09.290 --> 01:06:10.560
David Bau: What does it know?

673
01:06:11.540 --> 01:06:12.689
David Bau: That's the puzzle.

674
01:06:13.050 --> 01:06:19.829
David Bau: So what is AlphaZero? It's what they call a positive value network, right? You can think of it… so basically, it takes a chessboard.

675
01:06:19.960 --> 01:06:26.520
David Bau: as input, And then I give some assessment of, like, how good this chessboard is, Right?

676
01:06:26.910 --> 01:06:38.330
David Bau: And you can use that to search over different possible moves. Oh, if I move here, does that make this chessboard better? Does it make it worse if I move there? Does it make it better or worse? You can run this whole tree search.

677
01:06:38.630 --> 01:06:45.119
David Bau: And… and… and that's… that's how… this network does, it basically…

678
01:06:45.300 --> 01:06:48.020
David Bau: Gives you this quick assessment of our business.

679
01:06:48.130 --> 01:06:53.550
David Bau: And with a very shallow search, you can just use it to pick, like, what the best move is.

680
01:06:53.890 --> 01:06:55.670
David Bau: You can run a deeper search, too.

681
01:06:55.920 --> 01:06:59.679
David Bau: to see, like, all the different possibilities that we call it Monte Carlo pre-search.

682
01:07:00.070 --> 01:07:06.730
David Bau: So you could go 3 or 4 moves deep and generate, you know, hundreds or thousands of different

683
01:07:06.840 --> 01:07:09.619
David Bau: Perspective, different possible future boards.

684
01:07:09.740 --> 01:07:12.439
David Bau: And then you can ask, you know, what do you think of these boards?

685
01:07:12.710 --> 01:07:32.040
David Bau: And the interesting thing is the way they train this network is that after you do this whole Monte Carlo Tree search, now you've got this estimate of, like, what the best board is, is possible for you, what the best board for your opponent is, what they would choose, and it gives you a much more accurate view of, like, what the future holds, because you've done the search out 5 moves ahead.

686
01:07:32.250 --> 01:07:36.900
David Bau: And then, then you say, well, what was the original estimate when I just looked at the board without doing the search?

687
01:07:37.220 --> 01:07:42.809
David Bau: And it's not quite as accurate as the estimate I get from doing this 5-deep search.

688
01:07:43.000 --> 01:07:44.259
David Bau: So let me train it.

689
01:07:44.570 --> 01:07:45.650
David Bau: Let me update.

690
01:07:45.810 --> 01:07:48.680
David Bau: my… Little… quick.

691
01:07:48.850 --> 01:07:56.539
David Bau: assessment to correctly reflect, like, what a much deeper search would have had. Let me memorize

692
01:07:56.850 --> 01:08:04.430
David Bau: the output of this deeper search into this network. And so that's basically how the officer on… style,

693
01:08:04.560 --> 01:08:11.490
David Bau: deep network training things work. You just have them play lots and lots of chess games, do lots of… lots of money college research.

694
01:08:11.620 --> 01:08:15.850
David Bau: And then try to memorize the output of Monte Carlo research into this policy-guiding network.

695
01:08:16.130 --> 01:08:22.790
David Bau: And, and so… so the question is… So when you're done.

696
01:08:22.960 --> 01:08:29.529
David Bau: You don't even have to do the money college research anymore. You can just play against the Policy Value Network, and they can still beat you.

697
01:08:30.279 --> 01:08:31.040
David Bau: Right?

698
01:08:31.300 --> 01:08:36.690
David Bau: Because it's like guessing what… how the search is gonna come out, and it's usually right.

699
01:08:37.319 --> 01:08:43.340
David Bau: And so, the question is, what the heck is going on inside this positive value network?

700
01:08:43.870 --> 01:08:46.339
David Bau: Like, are there superhuman concepts in there?

701
01:08:46.569 --> 01:08:50.890
David Bau: And it's weird, how do you find a superhuman concept? Like, here's my questions.

702
01:08:51.520 --> 01:08:55.850
David Bau: Right? How do you find it? It's like, you guys are looking for concepts that you can write down.

703
01:08:57.850 --> 01:09:02.830
David Bau: Like uncertainty in a network, or power relationships, or political leanings.

704
01:09:02.979 --> 01:09:06.040
David Bau: You know, you know, attribution

705
01:09:06.200 --> 01:09:11.170
David Bau: attribution of words, spatial… spatial understanding. But for superhuman concepts.

706
01:09:11.370 --> 01:09:14.180
David Bau: We don't know what they are. We can't guess what they are.

707
01:09:15.229 --> 01:09:18.830
David Bau: Even if they showed it to you, you probably wouldn't recognize it.

708
01:09:19.950 --> 01:09:20.660
David Bau: Right?

709
01:09:21.729 --> 01:09:26.089
David Bau: We certainly don't have any words for it. You're not gonna look… you're not gonna find it in the library.

710
01:09:27.240 --> 01:09:27.970
David Bau: Right?

711
01:09:28.340 --> 01:09:30.410
David Bau: I don't even notice it's there.

712
01:09:32.490 --> 01:09:33.870
David Bau: And then how do we learn it?

713
01:09:35.609 --> 01:09:38.190
David Bau: What a cool question! Isn't this a cool question?

714
01:09:38.859 --> 01:09:39.375
David Bau: Right?

715
01:09:40.160 --> 01:09:41.539
David Bau: This is a cool question.

716
01:09:41.770 --> 01:09:45.320
David Bau: You know, it's such a cool question, I think it's, like, the most important paper.

717
01:09:45.430 --> 01:09:47.059
David Bau: In the last couple years, this one.

718
01:09:47.450 --> 01:09:50.569
David Bau: And it's hard to do, and they have a formula for doing it.

719
01:09:51.609 --> 01:09:57.969
David Bau: So, there's a few pieces of evidence. So, first of… first of all, they're like.

720
01:09:58.590 --> 01:10:02.919
David Bau: Does AlphaZero even contain your concept? Oh, Grace, you've had this question. Ask a question.

721
01:10:03.090 --> 01:10:08.280
David Bau: Yeah, so they, they, if I remember it correctly, they would, like, take the ways that

722
01:10:08.460 --> 01:10:22.389
David Bau: human users, human people would play, and AlphaZero would play, and then they would do some sort of thing to find the dimensionality, like, the rank. Yes, exactly. And then they claimed that, like, oh, if the rank…

723
01:10:22.800 --> 01:10:25.099
David Bau: Is higher for the way that

724
01:10:25.800 --> 01:10:34.030
David Bau: AlphaZero plays, then that means it has more concepts. Yeah. But I was curious, like, that is proof that there are concepts that humans don't.

725
01:10:35.000 --> 01:10:46.769
David Bau: Or at least not in how they… not reflected in how they play. Okay. And so I was curious, like, it seems like with LLMs, for example, they're very deterministic in how they talk. Like, it seems like we could have access to different concepts, but have a lower…

726
01:10:47.000 --> 01:10:57.070
David Bau: Right, like, or just more narrow in the sense… Yeah, yeah, yeah. Yeah. Yeah, yeah, so then, so I think that's right. So I think there's… the proof… okay, so the question is.

727
01:10:57.200 --> 01:11:00.620
David Bau: Like, is After Zero actually know something that we don't know?

728
01:11:00.920 --> 01:11:09.669
David Bau: And so, the first experiment that they did… well, one of the first ones was this one that… that Grace is asking about, and… and… and it's… it says, it's so…

729
01:11:09.960 --> 01:11:15.679
David Bau: you could feed AlphaZero's own chess moves, like, halfway through an AlphaZero play game.

730
01:11:16.050 --> 01:11:24.280
David Bau: and you have some chess configuration, you can feed it into the AlphaZero thing, and you get some vector representation of the chessboard, you know, near the end of the network.

731
01:11:24.610 --> 01:11:29.250
David Bau: And you could just feed through millions and millions of these, and you get millions and millions of vectors.

732
01:11:29.450 --> 01:11:36.670
David Bau: And the major vectors will, like, make some big cloud in vector space, right? Now, the interesting thing is that cloud has a different shape.

733
01:11:36.850 --> 01:11:53.809
David Bau: as the cloud that you would get if you fit in a different data cent. So the cool thing about chess is there's so many human-play games that are all recorded in great detail, that you can just go download millions and millions of human games, and you can also feed the human games into AlphaZero's policy value network, and you get a different cloud of vectors.

734
01:11:54.300 --> 01:11:55.000
David Bau: Right?

735
01:11:55.220 --> 01:12:01.080
David Bau: And so… so the first thing they did was they said, I wonder…

736
01:12:01.270 --> 01:12:03.990
David Bau: I wonder if these clouds are different from each other.

737
01:12:04.470 --> 01:12:05.160
David Bau: Right?

738
01:12:05.680 --> 01:12:12.500
David Bau: And… And they are indeed different. So there's… you can measure it at different… layers.

739
01:12:12.980 --> 01:12:18.169
David Bau: And so, at the input layer for the network, They… they encode…

740
01:12:18.280 --> 01:12:23.270
David Bau: the chessboard, and then I guess it's this…

741
01:12:24.310 --> 01:12:30.179
David Bau: you know, what I, I don't know what 8 comma, 8,119 is. Okay, so…

742
01:12:30.300 --> 01:12:35.309
David Bau: I'll just say 7,000 dimensional, or 7,000 rank.

743
01:12:35.880 --> 01:12:37.400
David Bau: representation.

744
01:12:37.640 --> 01:12:47.910
David Bau: Oh, 8x8, oh, I see. 8 times 8 times 119, yeah, so… so they have 119 commercial vector for every square, and they have an 8x8 board.

745
01:12:48.430 --> 01:12:49.209
David Bau: Got it.

746
01:12:50.120 --> 01:12:57.989
David Bau: I'm such not a chess player. So you probably, if you multiply these numbers, you get, like, 7,000. Alright, so they have this 7,000 dimensional thing.

747
01:12:58.280 --> 01:13:02.290
David Bau: And if you feed in all the inputs, It puts, like…

748
01:13:02.390 --> 01:13:06.960
David Bau: Like, human data, if you look at the dimensionality of this cloud of inputs.

749
01:13:07.080 --> 01:13:14.070
David Bau: It turns out they're not using all 7,000 dimensions. It forms approximately a 730-dimensional cloud.

750
01:13:14.660 --> 01:13:21.029
David Bau: Right? It's, like, pretty flat in that space. Like, real chess games aren't just a random array of pieces.

751
01:13:21.450 --> 01:13:30.059
David Bau: I think there's some structure, and this is, like, what the structure looks like in linear algebra terms. And actually, AlphaZero has a little less diversity there, it's, like, two dimensions short.

752
01:13:30.330 --> 01:13:32.449
David Bau: Right? So, but about the same number.

753
01:13:32.610 --> 01:13:35.959
David Bau: Right. So, but the interesting thing is, after you have it.

754
01:13:36.160 --> 01:13:39.010
David Bau: Process it, and think about, conceptually.

755
01:13:39.140 --> 01:13:42.499
David Bau: But it goes… it goes through, like, 19 levels of transformers or something.

756
01:13:42.900 --> 01:13:43.700
David Bau: Right?

757
01:13:43.910 --> 01:13:49.779
David Bau: You get to the 19th layer, or the 23rd layer, and then the situation inverts.

758
01:13:49.970 --> 01:13:52.780
David Bau: Now, human data has, like.

759
01:13:53.020 --> 01:14:04.140
David Bau: 7,000 dimensions out of a possible maximum of 16,000 dimensions, but the alphaZero is, like, a couple hundred dimensions larger, right? So, I don't know what that means at all.

760
01:14:04.600 --> 01:14:09.580
David Bau: Right? It's not, like, a standard unit analysis, but it was… it's kind of interesting.

761
01:14:09.880 --> 01:14:15.650
David Bau: And, and so… So they said, oh, maybe this is evidence.

762
01:14:15.920 --> 01:14:16.959
David Bau: that it knows.

763
01:14:17.570 --> 01:14:24.870
David Bau: It's representing something richer The dimension… this counting argument at least tells us

764
01:14:24.970 --> 01:14:27.220
David Bau: That we could find some vectors.

765
01:14:28.320 --> 01:14:37.010
David Bau: In the AlphaZero representations, of the representations of alpha serograms. There are some vectors That don't exist.

766
01:14:38.800 --> 01:14:45.359
David Bau: It means that we could, if we look at this linear algebra, it gives us this ranking function, this very interesting ranking function.

767
01:14:45.660 --> 01:14:48.430
David Bau: Where you could go over all the AlphaZero games.

768
01:14:48.820 --> 01:14:51.720
David Bau: And you could say, how close are you

769
01:14:52.590 --> 01:14:54.710
David Bau: To the cloud of human games.

770
01:14:55.990 --> 01:14:57.820
David Bau: Right, according to the presentation.

771
01:14:58.760 --> 01:15:05.879
David Bau: And some things will be, like, pretty close to the human gifts, to the 7,000 individuals, right?

772
01:15:06.200 --> 01:15:07.770
David Bau: I think something'll be pretty close.

773
01:15:07.890 --> 01:15:12.619
David Bau: But other board states will be very non-human-like.

774
01:15:13.240 --> 01:15:15.030
David Bau: They're the alien things.

775
01:15:15.320 --> 01:15:32.920
David Bau: It would never show up in, like, human games. Does that make sense? The analyzing human games is, like, an average of all the years, because I'm assuming I have less moves than manuals, I don't know. It's not average, I think that they… well, you're right, it's an aggregate of all the things, and when you're doing this dimensionality thing.

776
01:15:33.180 --> 01:15:38.280
David Bau: There's probably some averaging effects of, like, how, like, how much do you count?

777
01:15:38.530 --> 01:15:42.930
David Bau: Somebody, like, if the same game state shows up a thousand times.

778
01:15:43.420 --> 01:15:56.060
David Bau: And then another game save is very rare, how do you weight it? And I don't know the details of, like, how they weight all the things, so it's a reasonable, it's like a… it's a reasonable, you know, technical question, and we'd probably have to ask me other authors to ask how they did that.

779
01:15:56.470 --> 01:16:03.410
David Bau: But I'm just continuing on telling you the general story without knowing intent about my paper.

780
01:16:03.570 --> 01:16:06.530
David Bau: And, so… so… so now…

781
01:16:06.740 --> 01:16:11.440
David Bau: So, this is instantly interesting. Games that are not human-like.

782
01:16:11.840 --> 01:16:16.850
David Bau: Right? And so they found… Really…

783
01:16:17.490 --> 01:16:19.769
David Bau: Really, 3 things that they needed to do.

784
01:16:20.010 --> 01:16:22.230
David Bau: To isolate interesting vectors.

785
01:16:22.600 --> 01:16:24.649
David Bau: So they have this vector space.

786
01:16:24.840 --> 01:16:31.199
David Bau: And now what they're after is they want to know the most interesting vectors In that vector space.

787
01:16:31.640 --> 01:16:38.649
David Bau: And it's… it's a 16,000 dimensional vector space, and there are 8,000 dimensions that are, like, being used.

788
01:16:39.070 --> 01:16:41.899
David Bau: So how do you find, like, interesting 8,000-dimensional vectors?

789
01:16:42.980 --> 01:16:47.359
David Bau: They, they, they just, they, here's what they did. They had this three-step plan.

790
01:16:48.160 --> 01:16:53.990
David Bau: So, first of all, they made, like, a list of candidate concept vectors using good old-fashioned chess knowledge.

791
01:16:54.350 --> 01:16:58.740
David Bau: So, they analyzed a gazillion AlphaZero Games.

792
01:16:59.200 --> 01:17:13.339
David Bau: and they use classical techniques, probably the Monte Carlo tree search that AlphaZero was doing. They could look at the Monte Carlo tree search at every step, and they say, sometimes the Monte Carlo tree search is really unambiguous.

793
01:17:13.390 --> 01:17:19.719
David Bau: Like, there's one best move, you're just gonna do it, it's not a very interesting position. But sometimes, Mari Kalchrusich

794
01:17:20.320 --> 01:17:26.720
David Bau: has, like, a dilemma. Like, it… like, there are two paths. They look pretty good. These are, like, the choice points.

795
01:17:27.010 --> 01:17:28.150
David Bau: You're in Chuscany.

796
01:17:28.410 --> 01:17:35.449
David Bau: So, so they went through and they just did this classical chess analysis, and they just found all the interesting choice points. You know, probably, like, millions of, like.

797
01:17:35.780 --> 01:17:41.280
David Bau: interesting conditions. Right, and they threw out all the rest of the interesting things. They probably threw out 99%.

798
01:17:41.840 --> 01:17:42.510
David Bau: banned.

799
01:17:43.560 --> 01:17:46.460
David Bau: And they said, okay, of these interesting positions.

800
01:17:47.530 --> 01:17:53.069
David Bau: how many have humans seen before? Like, how many are, like, emblematic of, like, what would show up in a human game?

801
01:17:53.810 --> 01:17:55.890
David Bau: And so they did this linear algebra thing.

802
01:17:56.060 --> 01:17:56.980
David Bau: I said.

803
01:17:57.100 --> 01:18:03.390
David Bau: Oh, you could project it onto the human 7,800 conventional vector space, subspace?

804
01:18:03.980 --> 01:18:07.719
David Bau: How… how… how far do you have to go?

805
01:18:08.450 --> 01:18:11.890
David Bau: to become human-like? How far is that projection?

806
01:18:12.100 --> 01:18:17.590
David Bau: And you can project down to the AlphaZero subspace. How far do you have to go to beyond, like, the typical AlphaZero subspace?

807
01:18:17.730 --> 01:18:18.430
David Bau: And then…

808
01:18:18.710 --> 01:18:24.700
David Bau: And then if you have to go really far to be a human, but not very far for Alzo, then it's very emblematic.

809
01:18:24.930 --> 01:18:32.200
David Bau: Of, like, an AlphaZero game, and not very emblematic of a human game. So they basically took the difference between these two distances.

810
01:18:32.610 --> 01:18:37.830
David Bau: And they, they, they, they selected the top 1% or something like that.

811
01:18:38.290 --> 01:18:43.679
David Bau: For games that are most AlphaZero-like and least human-like, the most novel.

812
01:18:44.270 --> 01:18:44.960
David Bau: Right?

813
01:18:45.930 --> 01:18:49.139
David Bau: And then, they still had too many vectors.

814
01:18:50.020 --> 01:18:53.449
David Bau: So they're like, we still need to throw away, like, 99% of our vectors.

815
01:18:54.050 --> 01:18:58.620
David Bau: So then they had this third thing, which I thought was the most clever thing they did.

816
01:18:59.150 --> 01:19:03.380
David Bau: Which is they filled out… filled out vectors that are not Learnable.

817
01:19:04.410 --> 01:19:05.679
David Bau: What the heck is that?

818
01:19:06.490 --> 01:19:10.830
David Bau: So, for this, what they did, is…

819
01:19:12.830 --> 01:19:14.650
David Bau: For every one of these vectors.

820
01:19:15.100 --> 01:19:18.340
David Bau: They made prototype training sets

821
01:19:18.860 --> 01:19:24.060
David Bau: for the concept. So, for any given vector,

822
01:19:24.510 --> 01:19:32.839
David Bau: they found lots of games where the… the AlphaZero was like, yeah, that's, like, a typical situation.

823
01:19:33.050 --> 01:19:44.790
David Bau: For this vector. And then they did the Monte Carlo tree search to say, in this situation, you're choosing between these not-as-good situations and these better situations. And they would make this training set

824
01:19:45.100 --> 01:19:54.720
David Bau: That… of, like, example board positions, where this is, like, the contrast between, like, the good path for this

825
01:19:55.330 --> 01:19:56.230
David Bau: concept.

826
01:19:56.400 --> 01:20:05.269
David Bau: And the bad path for this concept. So, I'm conflating a couple things that they did that are very similar in the paper. So, one of the things is, how did they get these vectors in the first place?

827
01:20:05.280 --> 01:20:18.269
David Bau: They had all these board positions. How do you get the vector from a board position? Actually, what they did is they didn't just take the vector from the board position, they actually formed these contrast sets, and then after they formed the contrast sets, they would use it to create a probe.

828
01:20:18.580 --> 01:20:24.149
David Bau: For what probe at that board position would tell the difference between the good path and the bad path?

829
01:20:24.290 --> 01:20:29.000
David Bau: in that Monte Carlo tree search. So, the Monte Carlo tree search, it was all about choosing between a couple

830
01:20:29.280 --> 01:20:32.489
David Bau: of different things. They have these choice moments.

831
01:20:32.590 --> 01:20:34.590
David Bau: And so they would make a probe.

832
01:20:34.810 --> 01:20:42.019
David Bau: For what, vector representation was a good probe for whether you went on the good path or the bad path.

833
01:20:42.570 --> 01:20:44.659
David Bau: And that was the vector that they were interested in.
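A hedged sketch of the probe step just described: gather activations at the choice point for boards that continue down the good path versus the bad path, fit a linear probe on that contrast set, and take its weight vector as the concept vector. The synthetic data and shapes below are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 256
good_acts = rng.normal(loc=0.3, size=(200, d))    # activations on good-path boards (made up)
bad_acts = rng.normal(loc=-0.3, size=(200, d))    # activations on bad-path boards (made up)

X = np.vstack([good_acts, bad_acts])
y = np.array([1] * len(good_acts) + [0] * len(bad_acts))

probe = LogisticRegression(max_iter=1000).fit(X, y)
concept_vector = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # the direction of interest

# Games whose activations score highly along this direction can then be mined
# as further good examples of the same concept.
scores = X @ concept_vector
```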

834
01:20:44.940 --> 01:20:57.089
David Bau: And then they went and they… they looked for, games that had this vector in its representation, and… and then they… that allowed them to find a lot of similar games with a similar situation.

835
01:20:57.710 --> 01:21:01.169
David Bau: And they, and then they would dump out

836
01:21:01.430 --> 01:21:06.459
David Bau: These sets of board examples of, like, these are examples of

837
01:21:06.580 --> 01:21:10.820
David Bau: Good examples of this concept, and anti-examples of, like, missing this concept.

838
01:21:11.050 --> 01:21:11.810
David Bau: Right.

839
01:21:12.040 --> 01:21:18.680
David Bau: And then after they had all those examples, Then they would go to… a chess engine.

840
01:21:19.290 --> 01:21:21.639
David Bau: an AlphaZero chess engine, just like…

841
01:21:22.340 --> 01:21:24.969
David Bau: AlphaZero's value network, and they would train it.

842
01:21:26.030 --> 01:21:30.619
David Bau: On that data set, on those, like, you know, 100,000 examples of

843
01:21:30.780 --> 01:21:32.870
David Bau: Good and bad examples of the vector.

844
01:21:33.570 --> 01:21:36.680
David Bau: And… And then after they did that.

845
01:21:37.940 --> 01:21:45.610
David Bau: They would find that if they trained on this complicated data set for a while, that,

846
01:21:46.830 --> 01:21:56.390
David Bau: oh, I don't remember the exact curve labels and all the details, but basically, in blue, I think, it would start doing better.

847
01:21:56.650 --> 01:21:58.789
David Bau: At the concept that you trained on.

848
01:22:00.000 --> 01:22:04.580
David Bau: Right? But then not only did it do better on the concept they trained it on, sometimes…

849
01:22:05.030 --> 01:22:09.250
David Bau: It would do better on related concepts, on different vectors.

850
01:22:09.620 --> 01:22:10.360
David Bau: Right.

851
01:22:10.620 --> 01:22:13.970
David Bau: And sometimes, it would actually do better at general gameplay.

852
01:22:14.380 --> 01:22:15.400
David Bau: as well.

853
01:22:15.600 --> 01:22:16.290
David Bau: Great.

854
01:22:16.790 --> 01:22:23.120
David Bau: And so, what they did is… now, it wouldn't always happen like this, they observed.

855
01:22:23.290 --> 01:22:33.190
David Bau: Often when they chose these sort of training sets, they would put it through this procedure, and the network wouldn't learn. Like, these numbers wouldn't go up.

856
01:22:33.410 --> 01:22:36.110
David Bau: Right? It wouldn't… it wouldn't get better at chess.

857
01:22:36.420 --> 01:22:38.329
David Bau: And so, in those situations.

858
01:22:38.510 --> 01:22:45.160
David Bau: The concept vector, or whatever they found, was not doing a good enough job at coming up with

859
01:22:45.600 --> 01:22:51.100
David Bau: Distinct things that could be exemplified, like a lesson that could be exemplified by showing, like, chessboards.

860
01:22:51.320 --> 01:22:58.140
David Bau: And so sometimes this worked well, sometimes it worked badly.

861
01:22:58.320 --> 01:22:59.220
David Bau: So,

862
01:22:59.530 --> 01:23:05.760
David Bau: they said, we're just looking for the best concepts, so again, they threw away 99%. They just tested this on a lot of them.

863
01:23:06.320 --> 01:23:11.710
David Bau: And then they only kept the top 1% of vectors that had the best transfer.

864
01:23:11.960 --> 01:23:14.459
David Bau: In this way, they were the easiest to distill.

865
01:23:14.600 --> 01:23:16.230
David Bau: Into lessons that were learnable.
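A rough sketch of the learnability filter as described: fine-tune a copy of a value-network-like model on one concept's prototype set and keep the concept only if held-out accuracy on that concept actually improves. The model, data loaders, step count, and threshold are placeholder assumptions, not the paper's setup.

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for boards, labels in loader:
        preds = (model(boards).squeeze(-1) > 0).long()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)

def concept_is_learnable(model, train_loader, heldout_loader, steps=500, min_gain=0.05):
    student = copy.deepcopy(model)                 # never touch the original network
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    before = accuracy(student, heldout_loader)
    batches = iter(train_loader)
    for _ in range(steps):
        try:
            boards, labels = next(batches)
        except StopIteration:
            batches = iter(train_loader)
            boards, labels = next(batches)
        optimizer.zero_grad()
        loss = loss_fn(student(boards).squeeze(-1), labels.float())
        loss.backward()
        optimizer.step()

    after = accuracy(student, heldout_loader)
    return (after - before) >= min_gain            # keep only concepts that transfer
```

In this sketch, concepts for which `concept_is_learnable` returns False would be thrown away, mirroring the heavy filtering described above.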

866
01:23:16.800 --> 01:23:24.570
David Bau: Right? And then finally, that gave them a handful of vectors, like a human-sized set of vectors, I don't know how many…

867
01:23:24.710 --> 01:23:26.949
David Bau: I should ask them what the number was in the end.

868
01:23:27.360 --> 01:23:32.220
David Bau: And then they took these vectors and they said, okay, Now, what we'll do…

869
01:23:32.460 --> 01:23:34.500
David Bau: is we'll teach these to humans.

870
01:23:35.650 --> 01:23:38.180
David Bau: And they put them… they put them in,

871
01:23:38.340 --> 01:23:42.190
David Bau: They printed out all these little chessboard manuals. Maybe that's…

872
01:23:42.450 --> 01:23:45.670
David Bau: And they said, according to this chess book.

873
01:23:46.420 --> 01:23:50.940
David Bau: Here's a really challenging chess puzzle. Tell me which way you would move.

874
01:23:51.190 --> 01:23:55.099
David Bau: And then ask the humans to answer, and they know which way is the good way and the bad way.

875
01:23:55.410 --> 01:24:07.120
David Bau: And they'd score the humans on this, and they'd give them 20 chess puzzles or something like this. And then, after that, they would say, okay, so let me tell you the secret. Actually, you were wrong for most of these. This is the way it goes, but I'm gonna give you a book.

876
01:24:07.540 --> 01:24:12.010
David Bau: Of a dozen examples of, like, the good examples and the bad examples of each one of these concepts.

877
01:24:12.180 --> 01:24:17.039
David Bau: And you can just study this book, just the same way that we trained AlphaZero.

878
01:24:17.580 --> 01:24:33.500
David Bau: And then after they did that, the people would have a few hours to study. These were all, like, grandmasters, these were all, like… it was a Google paper, so they paid these people to, like, come to the Google campus and hang out and have fancy food, and talk about chess all day, and…

879
01:24:33.680 --> 01:24:35.830
David Bau: like, learn from AlphaZero.

880
01:24:36.070 --> 01:24:47.200
David Bau: They loved it, because they actually learned these new chess techniques, and then they would come back and they would take the quiz with new board positions they hadn't seen before, but according to AlphaZero, like, the same vector.

881
01:24:47.860 --> 01:24:51.999
David Bau: And… and they all… were able to solve the puzzles now.

882
01:24:52.390 --> 01:24:59.210
David Bau: They all learned the superhuman concept, and several of the chess masters were like, What an interesting day.

883
01:24:59.820 --> 01:25:01.870
David Bau: I did learn something new about chess.

884
01:25:02.190 --> 01:25:15.469
David Bau: And they try to articulate, oh, you know, I learned that it's important to develop queenside power and this and that, whatever, right? They have some whatever words that they tried to use to describe what they were doing. But, like, some new concept they had never seen before.

885
01:25:16.120 --> 01:25:35.189
David Bau: And so then, what the authors of the paper like to say is that one of their star participants who was the most enthusiastic about the day went on two years later to become the chess champion of the world, still reigning champion today. Must have learned something that day. And so…

886
01:25:35.350 --> 01:25:45.680
David Bau: But anyway, so that's the story of this, and I just wanted to show you what this research is. I think it's pretty interesting. I don't have time to go over this other thing, maybe, you know.

887
01:25:45.680 --> 01:25:57.150
David Bau: Have we talked about the function vectors paper? In this class, we have, right? And so, the exercise that we're going to do next Tuesday has to do with function vectors a little bit, so I, you know, I'd mainly…

888
01:25:57.200 --> 01:26:00.239
David Bau: Have you think of, like, a contrastive.

889
01:26:00.430 --> 01:26:06.720
David Bau: A pair of prompts that you might want to, you know, patch between that has to do with your research.

890
01:26:06.830 --> 01:26:16.879
David Bau: And I think one of the things that we'll do is we'll try to pull apart the attention patterns, to try to factor individual attention heads. Try to kind of step through some of these things.
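As a rough idea of what that kind of exercise can look like, here is a minimal activation-patching sketch between a contrastive pair of prompts, using GPT-2 and plain PyTorch hooks rather than any particular course tooling. The prompts, layer, and token position are placeholders you would swap for your own project's contrastive pair.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
corrupt = tok("The Colosseum is located in the city of", return_tensors="pt")

layer, position = 8, -1     # which transformer block and token position to patch (assumed)
cached = {}

def save_hook(module, inputs, output):
    # Stash the clean run's hidden state at the chosen position.
    cached["h"] = output[0][:, position, :].detach().clone()

def patch_hook(module, inputs, output):
    # Overwrite the corrupted run's hidden state with the cached clean one.
    output[0][:, position, :] = cached["h"]
    return output

block = model.transformer.h[layer]
with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    model(**clean)                                 # cache the clean activation
    handle.remove()

    handle = block.register_forward_hook(patch_hook)
    patched_logits = model(**corrupt).logits       # corrupted run with the patch applied
    handle.remove()

# Did the patch move the prediction toward the clean answer?
paris = tok(" Paris")["input_ids"][0]
rome = tok(" Rome")["input_ids"][0]
print((patched_logits[0, -1, paris] - patched_logits[0, -1, rome]).item())
```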

891
01:26:18.250 --> 01:26:19.000
David Bau: Okay.

892
01:26:19.400 --> 01:26:21.249
David Bau: Alright guys, thanks a lot.

893
01:26:21.470 --> 01:26:22.350
David Bau: Diff.

