WEBVTT

1
00:00:02.350 --> 00:00:08.900
Nikhil Prakash: So… Today we have… not a visiting speaker.

2
00:00:09.030 --> 00:00:10.430
Nikhil Prakash: Nikhil Prakash.

3
00:00:10.840 --> 00:00:17.570
Nikhil Prakash: And so Nikhil is actually one of… It's a very unusual case.

4
00:00:18.210 --> 00:00:19.729
Nikhil Prakash: For a PhD student.

5
00:00:20.500 --> 00:00:31.740
Nikhil Prakash: Go ahead. Because… Because, you know, when he arrived, sort of his first month here.

6
00:00:31.920 --> 00:00:36.209
Nikhil Prakash: We sat down, we chatted about what kinds of things he was interested in studying.

7
00:00:36.310 --> 00:00:39.819
Nikhil Prakash: And we looked at a whole variety of different things, and

8
00:00:39.920 --> 00:00:46.989
Nikhil Prakash: And he pretty quickly says, you know that, right there, this topic is interesting to me. What is that? Theory of mind.

9
00:00:47.110 --> 00:00:52.600
Nikhil Prakash: Right? Do these AI systems that we're building

10
00:00:53.570 --> 00:01:03.159
Nikhil Prakash: Do they think like humans? Do they look at other agents and think about what they're thinking? Do they have a theory of mind? Can they… can they… can they put themselves

11
00:01:03.690 --> 00:01:05.510
Nikhil Prakash: In the… in the shoes of others?

12
00:01:05.770 --> 00:01:07.379
Nikhil Prakash: That's what I'd like to study.

13
00:01:07.880 --> 00:01:12.030
Nikhil Prakash: And… and and so he… he started working on it right away.

14
00:01:12.180 --> 00:01:19.030
Nikhil Prakash: But from the beginning, it was, like, challenges. Like, the models that were available his first year.

15
00:01:19.410 --> 00:01:24.479
Nikhil Prakash: They had no theory of mind. We've had some papers about this.

16
00:01:24.660 --> 00:01:32.449
Nikhil Prakash: Right? Like, they just… they couldn't do it at all. So it's taken a few years to get to the point when models maybe credibly have some… some evidence of even having this.

17
00:01:32.600 --> 00:01:36.829
Nikhil Prakash: But… but the interesting thing, and the thing that makes him unusual, is that

18
00:01:37.100 --> 00:01:46.509
Nikhil Prakash: Yeah, if I look at him now, you know, a few years later, oh, he's, like, working on the thing that he said that he was gonna work on.

19
00:01:46.620 --> 00:01:48.980
Nikhil Prakash: And so…

20
00:01:49.170 --> 00:02:03.069
Nikhil Prakash: So today, he's not going to talk about a theory of mind paper, but it's very closely related. It's very closely related. It's binding, right? Which is a large aspect of the theory of mind task.

21
00:02:03.340 --> 00:02:05.700
Nikhil Prakash: And he's looking at it in an interesting setting.

22
00:02:05.960 --> 00:02:11.290
Nikhil Prakash: It's not language models, and so… so…

23
00:02:11.460 --> 00:02:29.380
Nikhil Prakash: Now, I think that he also is going to do the talk in an unusual way, because… are you? No. No, you're not. So, I've been asking him to practice his sort of talk technique, and trim down his talks, sort of really short, really polished.

24
00:02:29.480 --> 00:02:39.979
Nikhil Prakash: You know, to, like, you know, the kind of, like, more formal presentations. And I thought, well, maybe he'll practice that today, but it sounds like he's not going to practice that today, so we'll see. But welcome, welcome, Nikhil.

25
00:02:40.570 --> 00:02:42.020
Nikhil Prakash: But, take it.

26
00:02:43.010 --> 00:02:57.060
Nikhil Prakash: Yeah, thank you for that great introduction. Yeah, unfortunately… so, I had prepared the slides by yesterday, and I had a chat with David about the slides.

27
00:02:57.410 --> 00:02:59.489
Nikhil Prakash: And he gave me a lot of feedback.

28
00:03:00.170 --> 00:03:06.559
Nikhil Prakash: And I just couldn't incorporate all of the feedback in, like, 3, 4, 5 hours.

29
00:03:07.040 --> 00:03:26.499
Nikhil Prakash: That's the… effectively, that's the time that I had. So, I started making the changes in the slide, but in the middle, I figured out I won't be able to complete it. I mean, the kind of quality that I think you were trying to push me towards, that was too much for me to get in 3 or 4 to 5 hours. Okay.

30
00:03:27.290 --> 00:03:36.180
Nikhil Prakash: So, I did a few changes. In fact, you might see that there might be significant difference between the quality of slides.

31
00:03:36.290 --> 00:03:39.869
Nikhil Prakash: like, the earliest sites versus later sites.

32
00:03:40.620 --> 00:03:52.420
Nikhil Prakash: But hopefully, I think I should be able to explain the ideas and the main results from the paper still clearly. I think that's the main goal here. So I think, yeah, so this is… this is gonna… still gonna be our normal conversation, kind of.

33
00:03:52.670 --> 00:04:00.839
Nikhil Prakash: presentation, not like a formal, polished presentation. So feel free, feel free to interrupt me and ask as many questions as you want.

34
00:04:01.680 --> 00:04:02.600
Nikhil Prakash: Okay.

35
00:04:03.420 --> 00:04:12.490
Nikhil Prakash: Yeah, so this is basically the title of the paper, The Dual Mechanisms of Spatial Reasoning in VLMs. This is under review right now at ICML.

36
00:04:12.860 --> 00:04:14.250
Nikhil Prakash: Let's see what happens.

37
00:04:15.120 --> 00:04:16.550
Nikhil Prakash: Okay, so…

38
00:04:18.190 --> 00:04:25.860
Nikhil Prakash: Okay, I'll start the presentation by actually describing this idea or concept of variable binding. Yeah, I think it was…

39
00:04:26.180 --> 00:04:33.940
Nikhil Prakash: briefly mentioned about it, that this paper is about binding as well. It's, I think, a very old idea. I think it's an idea which is older than most of us.

40
00:04:34.670 --> 00:04:41.730
Nikhil Prakash: People have been talking about it in neuroscience and cognitive science for at least the last three decades that… that I've known of.

41
00:04:42.360 --> 00:04:45.190
Nikhil Prakash: And the idea is actually pretty simple and fundamental.

42
00:04:45.490 --> 00:04:54.629
Nikhil Prakash: It's just the ability to associate features of an object. So now, right now, you're seeing a lot of people, and most of us are wearing different color

43
00:04:54.730 --> 00:04:56.910
Nikhil Prakash: t-shirts or sweaters.

44
00:04:57.150 --> 00:04:58.610
Nikhil Prakash: And you can ascribe

45
00:04:58.770 --> 00:05:14.699
Nikhil Prakash: you can figure out that, okay, the color of the t-shirt that this person is wearing is this color. And you're not confusing between the colors of different people. And the reason that you are able to do that, that's because your mind is able to do, or is able to solve the binding problem.

46
00:05:16.470 --> 00:05:20.129
Nikhil Prakash: Yeah, so essentially the binding problem is basically to…

47
00:05:20.440 --> 00:05:36.879
Nikhil Prakash: associate features of an object, and keep it separate between different objects. So, for instance, in this particular slide, this is a nice photo of a horse wearing a fedora and a cat wearing a…

48
00:05:36.990 --> 00:05:37.870
Nikhil Prakash: Cap.

49
00:05:38.440 --> 00:05:40.800
Nikhil Prakash: Now… if…

50
00:05:41.090 --> 00:05:55.389
Nikhil Prakash: any intelligent system, be it human brain or a neural model, if we say that that system has variable binding capability, what I mean is, it can sort of take that image and then create this list of

51
00:05:55.390 --> 00:06:01.749
Nikhil Prakash: A set of tuples, where each tuple is actually representing an object and its corresponding feature.

52
00:06:02.930 --> 00:06:07.340
Nikhil Prakash: Yeah, if a system can do that, then we say that the system has a variable binding capability.
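
A minimal sketch of what "solving the binding problem" amounts to, using the horse/cat example from the slide; the names and the little helper are purely illustrative, not anything from the paper:

```python
# Hypothetical illustration: a system with variable binding maps a scene to
# per-object (object, feature) tuples without mixing features across objects.
scene_bindings = [
    ("horse", "fedora"),  # the horse is bound to the fedora
    ("cat", "cap"),       # the cat is bound to the cap
]

def bound_feature(bindings, obj):
    """Return the feature bound to a given object."""
    return dict(bindings)[obj]

assert bound_feature(scene_bindings, "cat") == "cap"
assert bound_feature(scene_bindings, "horse") == "fedora"
```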

53
00:06:13.080 --> 00:06:30.690
Nikhil Prakash: Why do we care about variable binding in the context of spatial reasoning? Because I would argue that this is one of the fundamental properties that any system needs to have to be able to do spatial reasoning. So, for instance, if you give this particular image to a neural model to generate a caption, and let's say it creates this caption, which describes the image.

54
00:06:30.860 --> 00:06:49.759
Nikhil Prakash: I would argue that to be able to represent or generate that caption coherently, it needs to do variable binding in its internal thinking process. If it cannot do variable binding properly, then the caption that it's gonna generate is more likely to be incorrect, and even if it's correct, it's…

55
00:06:49.950 --> 00:06:53.399
Nikhil Prakash: It's something that is not really trustworthy.

56
00:06:54.500 --> 00:07:02.610
Nikhil Prakash: So that's why variable binding is an essential skill or essential task for any system to do spatial reasoning.

57
00:07:04.250 --> 00:07:09.880
Nikhil Prakash: In fact, a number of previous works have shown that the… the limited…

58
00:07:11.270 --> 00:07:18.099
Nikhil Prakash: facial capability of VLMs can be attributed to their restricted variable binding capabilities.

59
00:07:20.040 --> 00:07:20.910
Nikhil Prakash: Okay.

60
00:07:25.050 --> 00:07:31.020
Nikhil Prakash: Okay, so variable binding seems very important for spatial reasoning. Then the question…

61
00:07:31.660 --> 00:07:34.839
Nikhil Prakash: Comes up. The main question that comes up, at least for us.

62
00:07:34.970 --> 00:07:46.920
Nikhil Prakash: the Macintah people is, can we better understand it in the visual space? In the hope that maybe we can probably improve the model performance as well, with the better insights of how the model

63
00:07:48.270 --> 00:07:53.199
Nikhil Prakash: models are actually… Forming those tuples in its internal activations.

64
00:07:55.370 --> 00:08:01.430
Nikhil Prakash: Before going into… the VLM space.

65
00:08:01.930 --> 00:08:15.400
Nikhil Prakash: Actually, people have looked into this problem in the language model space, and I'm first gonna describe what we already… You're so humble. People have looked into it. I wonder who people…

66
00:08:15.660 --> 00:08:20.340
Nikhil Prakash: Okay, yeah, so some of those contributions have been from our lab as well.

67
00:08:20.460 --> 00:08:25.310
Nikhil Prakash: Okay, okay, okay, from me.

68
00:08:26.930 --> 00:08:30.170
Nikhil Prakash: Yeah, okay, so…

69
00:08:30.600 --> 00:08:43.069
Nikhil Prakash: Yeah. Yeah, so let me first describe what we already know about this variable binding in the language model space, and then maybe we can start thinking about and talking about the same problem in the vision space.

70
00:08:43.970 --> 00:09:00.549
Nikhil Prakash: So assume this is the task, the task that you see on the slide. You have a context, something like, apple is in box A, banana is in box B, cheese is in box C, and then you ask, or you just have a query sentence, box A contains the… and the answer should be apple.

71
00:09:02.570 --> 00:09:22.409
Nikhil Prakash: Pretty simple, right? And what previous, or some of my works have shown is the way models, or especially language models, solve this task is by first creating some kind of, like, abstract, ordering representation for each kind of important token that you have in the prompt.

72
00:09:24.320 --> 00:09:25.110
Nikhil Prakash: Okay.

73
00:09:25.720 --> 00:09:34.470
Nikhil Prakash: abstract ordering representation, and each kind of important token in the prompt. Those are the two main key… key terms in what I said.

74
00:09:34.840 --> 00:09:41.980
Nikhil Prakash: So, essentially, what that means is, here in this particular prompt, I would say that there are two main types of important tokens.

75
00:09:42.250 --> 00:09:46.860
Nikhil Prakash: First is the object, and the second one is the box label.

76
00:09:47.040 --> 00:09:50.210
Nikhil Prakash: Now, what the model does, it basically creates a label.

77
00:09:50.650 --> 00:09:59.529
Nikhil Prakash: for each of these two type of, information. So for Apple, since it's the first one in the prompt, it says that, okay, this is the first

78
00:10:00.140 --> 00:10:04.369
Nikhil Prakash: object in the prompt. And A is the first label in the prompt.

79
00:10:04.530 --> 00:10:09.220
Nikhil Prakash: Similarly, for banana, it says that, okay, this is the second object, and B is the second label.

80
00:10:10.090 --> 00:10:18.350
Nikhil Prakash: So it creates these abstract representations, to encode important information which is present in the prompt.

81
00:10:18.990 --> 00:10:23.800
Nikhil Prakash: And then, finally, when you ask a question, or some kind of…

82
00:10:24.020 --> 00:10:43.270
Nikhil Prakash: features about a particular label or a particular box, what it does is basically uses that ordering information to fetch its corresponding features that it will predict as the next token, where, in this case, the feature would be actual object which is present in that box.

83
00:10:44.180 --> 00:10:49.849
Aruna Sankaranarayanan: Nikhil, I have a question. I'm online, sorry, sorry. Can I ask now, or should I wait till the end?

84
00:10:49.850 --> 00:10:51.460
Nikhil Prakash: Yeah, yeah, go ahead, Matt. Yeah, yeah.

85
00:10:51.460 --> 00:10:58.880
Aruna Sankaranarayanan: Okay, so my question was, so in this case, like, the structure of the sentence is such that the…

86
00:10:58.880 --> 00:11:12.610
Aruna Sankaranarayanan: It has, like, this very constrained structure, right, where you have, you know, word followed by is in box, and is in box is identical, you know, in all the three segments, and then you have, kind of these labels that come up.

87
00:11:12.630 --> 00:11:14.270
Aruna Sankaranarayanan: So,

88
00:11:14.830 --> 00:11:26.580
Aruna Sankaranarayanan: Do you think that even if we didn't have this constraint structure, and, you know, if the sentence was pretty free-flowing, that semantically similar elements would still share

89
00:11:26.580 --> 00:11:36.289
Aruna Sankaranarayanan: this ordering ID, like, the model would… like, is the model doing it… doing this based on, like, some kind of semantic similarity between these components, or is there, like, some other…

90
00:11:36.310 --> 00:11:41.130
Aruna Sankaranarayanan: reason for… for why these, you know, for why these IDs exist.

91
00:11:42.900 --> 00:11:50.810
Nikhil Prakash: So I think for the first question, if… so the question was, for a given, like, free-form text, would we expect to see similar kind of…

92
00:11:50.980 --> 00:11:52.800
Nikhil Prakash: ordering IDs.

93
00:11:52.930 --> 00:12:07.279
Nikhil Prakash: Or not. I think my intuition is, I think we would still see these kinds of ordering ID. But the problem there is, since those are, like, general text, it will be more difficult to actually prove

94
00:12:09.010 --> 00:12:10.140
Nikhil Prakash: that…

95
00:12:10.630 --> 00:12:20.280
Nikhil Prakash: it… they are indeed there. I mean, we can still do, like, causal experiments to say that, okay, they are ex… they are present there, but in that case, the

96
00:12:20.600 --> 00:12:26.639
Nikhil Prakash: The… the process of coming up with the causal experiments would be way more difficult, because there would be way more confounds.

97
00:12:28.800 --> 00:12:38.160
Nikhil Prakash: Yeah, I think… but I… but my general intuition is that even in the freeform text, the model does create this kind of, ordering ID representation.

98
00:12:38.640 --> 00:12:44.019
Nikhil Prakash: Because I think we have seen this kind of representation across a bunch of…

99
00:12:44.350 --> 00:12:56.849
Nikhil Prakash: settings now. Even though most of those settings are still synthetic, kind of… still, in a sense, synthetic, but I think we have seen this… this kind of representation now across a bunch of tasks, bunch of settings.

100
00:12:57.030 --> 00:13:01.120
Nikhil Prakash: Which kind of give me the confidence that, okay, this is a generic thing, it's not just…

101
00:13:01.780 --> 00:13:05.059
Nikhil Prakash: constrained to this kind of synthetic setting.

102
00:13:05.060 --> 00:13:17.049
Aruna Sankaranarayanan: Yeah, okay, awesome. Thank you so much, yeah. I think Yuav's work also is probably, like, relevant here, maybe. Like, he also had these similar findings across many, many tasks, but they were all, again, synthetic, so…

103
00:13:18.050 --> 00:13:26.900
Nikhil Prakash: Yeah, even, yeah, the tasks that he used, I think… those were more or less synthetic as well. I don't think they had general… general text.

104
00:13:26.900 --> 00:13:29.360
Aruna Sankaranarayanan: Yeah, yeah, yeah, yeah, yeah, cool, thank you.

105
00:13:30.490 --> 00:13:35.620
Nikhil Prakash: We did see generalization to the mental deficit, which is much less.

106
00:13:35.990 --> 00:13:47.830
Nikhil Prakash: So, like, even if you can't find them in the playbook setting, you can show generalization or not. Yeah, that's right. I mean, yeah, it depends what you call synthetic funnel. The belief tracking was also not super synthetic.

107
00:13:50.860 --> 00:13:52.880
Nikhil Prakash: Yeah, but this mechanism is clear.

108
00:13:55.120 --> 00:14:00.480
Nikhil Prakash: So, okay, so that was the language model space, now let's talk about the… Huh.

109
00:14:01.030 --> 00:14:03.979
Nikhil Prakash: What if you have more than one object?

110
00:14:14.560 --> 00:14:17.999
Nikhil Prakash: Yeah, good question. I'm not super sure what will that have… what will that do.

111
00:14:18.100 --> 00:14:18.969
Nikhil Prakash: What do you think?

112
00:14:19.420 --> 00:14:21.700
Nikhil Prakash: Yeah, I'm not 100% sure.

113
00:14:22.780 --> 00:14:23.560
Nikhil Prakash: Yeah.

114
00:14:24.370 --> 00:14:25.840
Nikhil Prakash: them to have the same article.

115
00:14:26.220 --> 00:14:30.840
Nikhil Prakash: But then how… So it's not an order, something else.

116
00:14:31.820 --> 00:14:39.420
Nikhil Prakash: So it's gonna still be, like, the first object, first object? Yeah, I think both are gonna… it's gonna be counted as one object, that's what I… First…

117
00:14:39.530 --> 00:14:48.040
Nikhil Prakash: The phrase will be… what is the object inside box A that starts with A, or something like that? Like, if you target a specific one.

118
00:14:48.750 --> 00:14:57.949
Nikhil Prakash: Yes, it should be… Yeah, it shouldn't be exactly the same. There should be some difference, but…

119
00:14:59.340 --> 00:15:01.520
Nikhil Prakash: No, but for Boxide contains B.

120
00:15:01.710 --> 00:15:10.289
Nikhil Prakash: It will be the same, more or less the same marker for F1. No, it decides about the marker, where it sees the query.

121
00:15:10.900 --> 00:15:14.859
Nikhil Prakash: It decides about the markers before it sees the query.

122
00:15:15.290 --> 00:15:19.479
Nikhil Prakash: Yeah, but I would guess it creates different markers for usage.

123
00:15:19.990 --> 00:15:21.600
Nikhil Prakash: Yeah, I think so.

124
00:15:22.100 --> 00:15:23.050
Nikhil Prakash: Andrew.

125
00:15:24.300 --> 00:15:25.580
Nikhil Prakash: Martinez Lab.

126
00:15:26.180 --> 00:15:29.459
Nikhil Prakash: How to work… pull off words for music.

127
00:15:30.140 --> 00:15:35.190
Nikhil Prakash: yours that shows that there are different markers… Oh, he has a paper on this? I've not seen that paper.

128
00:15:35.410 --> 00:15:38.050
Nikhil Prakash: Is it out? Yeah, I think so.

129
00:15:38.250 --> 00:15:40.240
Nikhil Prakash: Oh, I've not seen that paper.

130
00:15:40.670 --> 00:15:43.639
Nikhil Prakash: Okay, Adish, yeah, if you can send that, it would be really helpful.

131
00:15:47.250 --> 00:15:48.260
Nikhil Prakash: Do you have an idea?

132
00:15:53.470 --> 00:15:57.830
Nikhil Prakash: What is getting bound to what?

133
00:15:58.930 --> 00:15:59.660
Nikhil Prakash: Anyone?

134
00:16:04.140 --> 00:16:10.910
Nikhil Prakash: I can't be buying books because… Really?

135
00:16:13.260 --> 00:16:20.859
Nikhil Prakash: Oh, oh, you asked about… But there is no happiness.

136
00:16:21.300 --> 00:16:23.279
Nikhil Prakash: No, no, you can get some templates.

137
00:16:24.160 --> 00:16:25.929
Nikhil Prakash: You asked me about Philadelph Edwards.

138
00:16:44.760 --> 00:16:46.450
Nikhil Prakash: How do we send them back.

139
00:16:46.780 --> 00:16:49.200
Nikhil Prakash: up above every little house.

140
00:16:49.810 --> 00:16:54.390
Nikhil Prakash: And… with the… I don't think that happened in the morning.

141
00:16:57.220 --> 00:17:00.189
Nikhil Prakash: Nikhil, wouldn't you say that your diagram

142
00:17:01.170 --> 00:17:04.010
Nikhil Prakash: Suggest, because you call it IDs.

143
00:17:07.750 --> 00:17:12.859
Nikhil Prakash: you're showing here is that, no, neither one. They're both being bound.

144
00:17:13.710 --> 00:17:15.080
Nikhil Prakash: To number one.

145
00:17:15.619 --> 00:17:19.029
Nikhil Prakash: They're not… it's not apples being bound to A, or A is being bound to apple.

146
00:17:19.440 --> 00:17:23.640
Nikhil Prakash: They've been this ID, right?

147
00:17:23.880 --> 00:17:29.209
Nikhil Prakash: there's an indirect… binding happening through the ID.

148
00:17:32.900 --> 00:17:33.740
Nikhil Prakash: Interesting.

149
00:17:35.300 --> 00:17:36.320
Nikhil Prakash: Basically.

150
00:17:38.280 --> 00:17:39.710
Nikhil Prakash: The binding that happens.

151
00:17:46.030 --> 00:17:49.560
Nikhil Prakash: So is it more like a pointer? Yes, exactly, exactly.

152
00:17:50.760 --> 00:18:01.109
Nikhil Prakash: Yeah, but the… yeah, it's more like a pointer, but there's still the question of many-to-one binding, where, let's say, we have multiple objects, like, multiple… yeah, multiple objects in a single box.

153
00:18:01.390 --> 00:18:08.290
Nikhil Prakash: How… So that would be binded is something that I'm still not very sure.

154
00:18:11.120 --> 00:18:30.559
Nikhil Prakash: Okay, we can move forward. So, those were the existing results in the language model space. Now, let's start talking about the VLM results. So, before we get into the results, I just wanted to brush up on the architecture of a VLM. It's actually simple.

155
00:18:31.670 --> 00:18:32.480
Nikhil Prakash: Yes.

156
00:18:34.260 --> 00:18:44.620
Nikhil Prakash: So, yes, so there are primarily two components, or actually three components. There is a vision encoder and a projector on top of the vision encoder.

157
00:18:44.730 --> 00:18:57.640
Nikhil Prakash: And then finally, we have a language model backbone. So the image is first, broken down into smaller patches. Each of the patches becomes a specific token, which is fed into the vision encoder.

158
00:18:58.270 --> 00:19:00.219
Nikhil Prakash: Which is just a ViT.

159
00:19:00.350 --> 00:19:03.960
Nikhil Prakash: It processes those… token information.

160
00:19:04.210 --> 00:19:12.869
Nikhil Prakash: And, yeah, then the output of each of the tokens is basically given to the projector, which is supposed to transform…

161
00:19:13.000 --> 00:19:17.399
Nikhil Prakash: Those vision encoder into language… language model space.

162
00:19:17.660 --> 00:19:28.359
Nikhil Prakash: And then those… transformed vision embeddings are actually passed on to the language model, just as

163
00:19:29.940 --> 00:19:33.510
Nikhil Prakash: Normal token embedding, in addition with the…

164
00:19:33.640 --> 00:19:39.470
Nikhil Prakash: the token embeddings from the text. That's it. And then, finally, the language model predicts the next token in the text domain.

165
00:19:43.140 --> 00:19:47.790
Nikhil Prakash: Yeah, so there was this paper last year,

166
00:19:48.180 --> 00:19:56.049
Nikhil Prakash: which showed that the kind of mechanism that I showed for the language model in the last-to-last slide actually generalizes to VLMs as well.

167
00:19:56.230 --> 00:20:04.109
Nikhil Prakash: Though they only studied the language model backbone, but they show that this… the same kind of mechanism actually generalizes,

168
00:20:05.950 --> 00:20:08.960
Nikhil Prakash: that setting as well. So, just to it.

169
00:20:09.230 --> 00:20:15.219
Nikhil Prakash: They only show, sorry, but they only show, like, what's happening in the text tokens. They didn't say anything about the visual text.

170
00:20:15.590 --> 00:20:25.730
Nikhil Prakash: So they… they have just one… one experiment. They say that the information comes from the visual tokens. But they do one experiment on the key vectors of visual tokens.

171
00:20:27.280 --> 00:20:31.989
Nikhil Prakash: Yeah, it's… so they don't say it as clearly as I'm gonna explain it to you.

172
00:20:32.110 --> 00:20:38.130
Nikhil Prakash: But I think, based on what I've explained to you in the language setting, and based on their result, I think we can infer that.

173
00:20:39.250 --> 00:20:41.850
Nikhil Prakash: Okay, so essentially, this is what happens, so…

174
00:20:42.470 --> 00:20:51.089
Nikhil Prakash: Yeah, the image gets passed through the vision encoder and the projector, and then into the language model.

175
00:20:51.290 --> 00:21:00.690
Nikhil Prakash: in the language model backbone, we… or they have shown that the model creates this ordering ID for, let's say, both horse

176
00:21:00.950 --> 00:21:05.360
Nikhil Prakash: and CAD, as well as Fedora and, CAP.

177
00:21:06.570 --> 00:21:08.970
Nikhil Prakash: In the visual token residual stream.

178
00:21:09.090 --> 00:21:20.519
Nikhil Prakash: And then when we ask, cat is wearing a…, and the answer should be cap, the model basically… the language model basically fetches the ordering ID of the cat, which is 2,

179
00:21:20.670 --> 00:21:29.690
Nikhil Prakash: from the vision token to the CAT token residual stream, and then it passes on that piece of information to the last token, the A token.

180
00:21:30.170 --> 00:21:37.529
Nikhil Prakash: Which is used to, again, fetch the corresponding object ordering ID.

181
00:21:38.190 --> 00:21:45.130
Nikhil Prakash: And then finally, the model uses that object ordering ID to actually fetch its value, the cap.

182
00:21:45.270 --> 00:21:47.969
Nikhil Prakash: Which is actually the final prediction as the answer.

183
00:21:49.910 --> 00:21:50.650
Nikhil Prakash: Good.

184
00:21:57.920 --> 00:22:03.430
Nikhil Prakash: How do we know that this rectangle represents the cat?

185
00:22:05.350 --> 00:22:09.840
Nikhil Prakash: How… I'm not saying that if it… it could be representing anything.

186
00:22:10.670 --> 00:22:14.140
Nikhil Prakash: But… Let's say… so when we…

187
00:22:15.680 --> 00:22:18.000
Nikhil Prakash: Break down the image into smaller batches.

188
00:22:18.590 --> 00:22:20.519
Nikhil Prakash: Let's say the last patch.

189
00:22:21.500 --> 00:22:28.890
Nikhil Prakash: Which, the last patch, which has some… region of the cat.

190
00:22:29.250 --> 00:22:31.510
Nikhil Prakash: will have this ordering ID.

191
00:22:32.770 --> 00:22:40.140
Nikhil Prakash: Okay. Even that's not… I don't think they explained that in that paper. I think… I think in that paper… in their paper, they just had…

192
00:22:40.690 --> 00:22:45.049
Nikhil Prakash: like… Shapes, which can be covered with a single token.

193
00:22:46.780 --> 00:22:48.880
Nikhil Prakash: Single patch. Yeah, a single patch.

194
00:22:49.720 --> 00:22:56.090
Nikhil Prakash: So if you have an object which is spread across multiple patches,

195
00:22:57.040 --> 00:22:58.330
Nikhil Prakash: then I think…

196
00:22:58.970 --> 00:23:03.520
Nikhil Prakash: things become a little bit more complicated, but I think we can say that at least the last token of that

197
00:23:03.730 --> 00:23:10.650
Nikhil Prakash: the last batch of that particular object should encode this ordering ID in the language model space.

198
00:23:12.740 --> 00:23:20.479
Nikhil Prakash: It's the same as ROME, basically. It's kind of like the last entity of the, like, the last token of the entity encodes…

199
00:23:20.600 --> 00:23:28.200
Nikhil Prakash: Kind of like that. Interesting, yeah, it's the same in… But here, the passions are not, Sequential.

200
00:23:28.400 --> 00:23:35.999
Nikhil Prakash: Like, a cat can be some, like, tentacles, and then… Yeah, yeah. That's right, that's right.

201
00:23:36.660 --> 00:23:39.689
Nikhil Prakash: There could be a lot of space in between. Yeah.

202
00:23:42.210 --> 00:23:49.150
Nikhil Prakash: Though they don't show it in the paper, I… yeah, this is my understanding and a little bit of our experiment.

203
00:23:49.870 --> 00:23:50.780
Nikhil Prakash: Okay, but so this is…

204
00:23:50.780 --> 00:24:01.360
Aruna Sankaranarayanan: I have another… I have another question. So, you're saying that each patch then is associated directly with an object, and the object is not spread across multiple patches?

205
00:24:01.800 --> 00:24:05.319
Aruna Sankaranarayanan: In the datasets that you're using in this experiment?

206
00:24:06.380 --> 00:24:08.530
Nikhil Prakash: In the paper that I'm talking about.

207
00:24:09.490 --> 00:24:12.260
Aruna Sankaranarayanan: In the paper that you're talking about. Okay, okay, go ahead.

208
00:24:12.260 --> 00:24:13.340
Nikhil Prakash: In this particular paper.

209
00:24:15.930 --> 00:24:20.710
Nikhil Prakash: Okay, so, yeah, basically the same story in the… VLM setting.

210
00:24:23.250 --> 00:24:34.699
Nikhil Prakash: But I think there's still a lot… there are still many questions that remain unanswered. So I think this is one of the first questions that we study in this paper, which is, where do

211
00:24:35.000 --> 00:24:38.239
Nikhil Prakash: Where does… where does the ordering ID get formed?

212
00:24:38.430 --> 00:24:45.890
Nikhil Prakash: Is it in the language model backbone itself, where we have already some evidence that it is present there?

213
00:24:46.100 --> 00:24:54.080
Nikhil Prakash: Or maybe it is not getting generated in the language model, it is actually getting generated in the vision encoder, and is being passed on to the…

214
00:24:54.180 --> 00:24:55.670
Nikhil Prakash: language model backbone.

215
00:24:58.510 --> 00:25:05.969
Nikhil Prakash: Okay, so that's, like, one of the questions that we study. Another question that we study is how are they represented?

216
00:25:06.860 --> 00:25:09.220
Nikhil Prakash: Is it actually localized?

217
00:25:09.590 --> 00:25:13.989
Nikhil Prakash: Or, is it diffused across tokens?

218
00:25:14.590 --> 00:25:16.980
Nikhil Prakash: When I say it, I mean…

219
00:25:17.880 --> 00:25:20.420
Nikhil Prakash: Yeah, okay, I mean the ordering ID.

220
00:25:22.720 --> 00:25:27.759
Nikhil Prakash: And again, we know that ordering IDs are present in the language model backbone.

221
00:25:28.070 --> 00:25:30.370
Nikhil Prakash: So can we better characterize them?

222
00:25:30.920 --> 00:25:33.579
Nikhil Prakash: I mean, what kinds of…

223
00:25:35.360 --> 00:25:43.519
Nikhil Prakash: what kinds of representation are they? Are they encoding, like, a relative… sort of like a relative position, or are they encoding more like an abstract position, like…

224
00:25:46.930 --> 00:25:52.410
Nikhil Prakash: Is it the two coordinates in the image, or is it, like, the first object in the image?

225
00:25:52.850 --> 00:26:00.640
Nikhil Prakash: So those are the three main questions that we study, and I think it will become more clear, I think, as I move… as I show the result and the experimental setups.

226
00:26:02.260 --> 00:26:08.780
Nikhil Prakash: Okay. Yeah, so we studied these four settings,

227
00:26:08.920 --> 00:26:13.690
Nikhil Prakash: 3 of them are synthetic, and one is actually coming from a real benchmark.

228
00:26:14.150 --> 00:26:28.509
Nikhil Prakash: The first one is actually the simplest one, where we generate these squares of equal sizes, but different colors. They are either spread horizontally or vertically. Here, I'm only showing the horizontal one, but we have settings where

229
00:26:30.260 --> 00:26:33.499
Nikhil Prakash: The squares and the shapes and the objects are spread vertically.

230
00:26:34.760 --> 00:26:36.910
Nikhil Prakash: And the question that we ask is.

231
00:26:37.270 --> 00:26:42.240
Nikhil Prakash: So let's say… let's assume the square… this particular square example… square image example.

232
00:26:42.380 --> 00:26:52.380
Nikhil Prakash: So, for this particular image, we could ask, what is the color of the square to the left of the green square? Or, the color of the

233
00:26:54.050 --> 00:26:58.360
Nikhil Prakash: Squared to the… Left of green square is

234
00:26:58.870 --> 00:27:00.929
Nikhil Prakash: And the answer should be red.

235
00:27:01.140 --> 00:27:02.650
Nikhil Prakash: That's basically the task.

236
00:27:04.220 --> 00:27:10.170
Nikhil Prakash: And we study two models, Qwen and Gemma.

237
00:27:10.440 --> 00:27:12.929
Nikhil Prakash: And both the models can do this task perfectly.

238
00:27:14.790 --> 00:27:16.710
Nikhil Prakash: Okay, so that was…

239
00:27:18.000 --> 00:27:24.850
Nikhil Prakash: That's the setup. Now, we are starting to get into the experimental setups, or the experiments that we did in the paper.

240
00:27:25.160 --> 00:27:30.470
Nikhil Prakash: Now, the first thing that we did was to actually cross-check if the…

241
00:27:30.600 --> 00:27:45.209
Nikhil Prakash: VLM is, again, using this ordering ID information to solve this task or not. Even though the previous work has already shown that, but that was for a different task. So we wanted to confirm that this particular result actually generalizes to our setting or not.

242
00:27:48.260 --> 00:27:52.040
Nikhil Prakash: So for that, we did a patching experiment.

243
00:27:52.440 --> 00:27:55.220
Nikhil Prakash: Which is pretty simple, actually. So we have this…

244
00:27:55.690 --> 00:27:58.939
Nikhil Prakash: two samples. The first one is the clean sample.

245
00:27:59.100 --> 00:28:07.389
Nikhil Prakash: Where the answer to the question is red, and for the sample on the right, the answer to the question is black.

246
00:28:08.390 --> 00:28:09.220
Nikhil Prakash: Okay?

247
00:28:10.770 --> 00:28:14.069
Nikhil Prakash: The other major difference between these two samples is…

248
00:28:14.340 --> 00:28:18.790
Nikhil Prakash: In the first sample, the first square is the correct answer.

249
00:28:18.890 --> 00:28:22.630
Nikhil Prakash: And in the second sample, the third square is the correct sample.

250
00:28:23.510 --> 00:28:30.570
Nikhil Prakash: Okay? And we are doing the intervention at the last text token, which is "is" in this particular setting.

251
00:28:30.780 --> 00:28:40.590
Nikhil Prakash: So, we are taking the residual vector at the "is" token from the counterfactual run, and pasting it onto its corresponding location and layer.

252
00:28:40.820 --> 00:28:41.939
Nikhil Prakash: In the clean run.

253
00:28:42.210 --> 00:28:45.020
Nikhil Prakash: And then checking how does the final output change.

254
00:28:45.210 --> 00:28:52.939
Nikhil Prakash: And we are expecting that if a particular layer is encoding the ordering ID information.

255
00:28:55.850 --> 00:29:03.639
Nikhil Prakash: When that layer is patched from the counterfactual run to the clean run, the final output of the clean run should change from red

256
00:29:03.980 --> 00:29:10.869
Nikhil Prakash: to blue Is that… does that make sense? I… I have just spoken a lot of words.

257
00:29:11.790 --> 00:29:12.590
Nikhil Prakash: Okay.

258
00:29:13.890 --> 00:29:24.919
Nikhil Prakash: So, we did that experiment, and this is the result. We see that in the later-ish layer, something after layer 20, we do see that

259
00:29:25.130 --> 00:29:28.140
Nikhil Prakash: That the last,

260
00:29:28.340 --> 00:29:39.309
Nikhil Prakash: token is encoding this ordering ID representation, and if we patch that from the counterfactual to the clear run, we ex… we get blue color as our… as the final output.

261
00:29:40.390 --> 00:29:50.640
Aruna Sankaranarayanan: Nikhil, can you explain this again? Like, why would you not expect the color to be black? Like, why would you expect it to be blue, like, when you're doing this patching?

262
00:29:51.110 --> 00:29:55.959
Nikhil Prakash: Okay, so the idea here is, we are… Wait, wait, wait, would you expect it to be black?

263
00:29:57.970 --> 00:30:02.629
Aruna Sankaranarayanan: I don't know, like, I feel like it should be black, yeah, I do expect it to be black.

264
00:30:02.630 --> 00:30:04.749
Nikhil Prakash: That's reasonable. It should be black.

265
00:30:06.180 --> 00:30:22.559
Nikhil Prakash: Didn't… didn't your figure in the next slide show that at the later layers, it does start getting black? Yeah, so you guys… yeah, you already have a partial answer. I mean, you, yeah, almost have the answer. So, essentially, what the model does at the last token is, it first forms this ordering ID information.

266
00:30:23.080 --> 00:30:24.909
Nikhil Prakash: Of the correct square.

267
00:30:27.120 --> 00:30:29.759
Nikhil Prakash: Which it needs to predict as the next token.

268
00:30:30.070 --> 00:30:44.120
Nikhil Prakash: And once it has formed that ordering ID information, then it uses that piece of information to actually fetch the feature associated with that square that it needs to answer. So in this case, in this task, the feature is the color of that square.

269
00:30:44.420 --> 00:30:52.959
Nikhil Prakash: So, in the later-ish layer, after it has formed the ordering ID, it actually uses it to fetch the color of that square, and hence.

270
00:30:53.410 --> 00:30:57.500
Nikhil Prakash: In even later layers, we see black as the final output.

271
00:30:58.640 --> 00:31:16.019
Aruna Sankaranarayanan: Oh, so wait, sorry, so if I understand correctly, both the feature information as well as the ordering information is encoded in that representation you're using to patch, but the ordering information from the representation is recovered earlier, like, before the feature information.

272
00:31:16.510 --> 00:31:27.200
Nikhil Prakash: Yes, you can say that. So, model is first forming the ordering information, and then using it to fetch the feature information at further layers, or later layers.

273
00:31:29.370 --> 00:31:44.899
Nikhil Prakash: Could… would this be, like, a correct, very high-level understanding of, like, what this represents? Like, around layer 20, when that blue peak is happening, it's forming, like, the index to look up, but it's not actually doing the lookup yet, so when you patch right at those layers, you intervene on, like.

274
00:31:44.900 --> 00:31:53.359
Nikhil Prakash: the location, that it's looking up before it actually retrieves the value there. Okay. That's exactly what… that's 100% correct.

275
00:31:54.760 --> 00:32:00.710
Nikhil Prakash: I also had a question. Yeah, so, what exactly did you patch? Is it the MLP output, or something? This is just the residual vector.

276
00:32:00.890 --> 00:32:01.740
Nikhil Prakash: Was it true?

277
00:32:04.450 --> 00:32:11.570
Nikhil Prakash: Okay, so from this experiment, we can say that even for this particular task, the VLM.

278
00:32:11.760 --> 00:32:16.319
Nikhil Prakash: is using this ordering ID information to solve the task.

279
00:32:16.510 --> 00:32:21.260
Nikhil Prakash: Okay, so now, coming to our… first… I…

280
00:32:21.380 --> 00:32:25.489
Nikhil Prakash: Like, one of the main questions that we study in this paper.

281
00:32:25.730 --> 00:32:33.290
Nikhil Prakash: Which is to understand where exactly this ordering ID information is getting generated in the VLM.

282
00:32:33.850 --> 00:32:41.089
Nikhil Prakash: Yeah. A question: did you look at varying the colors themselves?

283
00:32:41.560 --> 00:32:42.250
Nikhil Prakash: Right.

284
00:32:43.050 --> 00:32:49.510
Nikhil Prakash: just seeing, you know, this should be an invariant, so I'd be changing some, like, orange.

285
00:32:50.390 --> 00:32:51.290
Nikhil Prakash: Brown, right?

286
00:32:52.880 --> 00:33:01.480
Nikhil Prakash: Yes, I think I should have mentioned it. So this… this graph that you see is… is actually averaged over 50 samples. Oh.

287
00:33:02.180 --> 00:33:09.049
Nikhil Prakash: Yeah, I think we should mention that. It's… yeah, and those 50 samples have various different combinations of colors.

288
00:33:09.480 --> 00:33:15.399
Nikhil Prakash: So, it's not really specific to this particular sample. And yeah, so results are actually…

289
00:33:15.940 --> 00:33:19.409
Nikhil Prakash: Averaged over a bunch of samples with different colors.

290
00:33:21.820 --> 00:33:26.360
Nikhil Prakash: Okay, So you did residual…

291
00:33:26.590 --> 00:33:31.890
Nikhil Prakash: Do you also look at… Any specific mechanism, like… Does this,

292
00:33:38.170 --> 00:33:42.219
Nikhil Prakash: Like, I see that skins.

293
00:33:42.430 --> 00:33:43.620
Nikhil Prakash: Like, 80%.

294
00:33:44.540 --> 00:33:55.900
Nikhil Prakash: This is not an intervention like this, right? But even, even, even that, I would say, maybe, maybe there is some information in a…

295
00:33:56.670 --> 00:34:05.769
Nikhil Prakash: in the previous token, one or two previous tokens; like, the square token might have some information about the ordering ID, which we are not really patching.

296
00:34:05.980 --> 00:34:11.300
Nikhil Prakash: maybe that still holds the ordering ID of… Of the left square.

297
00:34:11.449 --> 00:34:19.409
Nikhil Prakash: And that's why we're not seeing super high intervention. Well, you know more than this. I mean, when you get to the end of the paper, you're gonna say that…

298
00:34:20.199 --> 00:34:23.339
Nikhil Prakash: Instead of saying maybe, you're going to say, oh, there is more than one.

299
00:34:23.840 --> 00:34:24.560
Nikhil Prakash: Nope.

300
00:34:28.710 --> 00:34:43.549
Nikhil Prakash: Oh, sorry, we are not seeing… 100%. Yeah, 100% here. Because there's one mechanism? Yeah. Because there's more than one mechanism? No, but this is… this is the end of both the mechanism. Both the mechanisms have already come by. They both come together. Yeah, yeah, yeah. This is almost the end of the process. Okay.

301
00:34:43.550 --> 00:34:49.510
Nikhil Prakash: So the end is… the end is merged. The end is merged. Yeah, the end is almost merged. This is almost the end of the computation.

302
00:34:49.590 --> 00:34:58.830
Nikhil Prakash: Yeah, so it's like, if the thing that you're patching in is indeed, like, the right index, and you cleanly patch it, then you should probably get it.

303
00:35:00.450 --> 00:35:02.960
Nikhil Prakash: Yeah, so my guess is maybe we are…

304
00:35:03.260 --> 00:35:07.489
Nikhil Prakash: There are… there is some information maybe diffused over some other…

305
00:35:07.600 --> 00:35:17.309
Nikhil Prakash: Tokens that we are not patching upon, and that's why we're not seeing a completely high… Intervention… effect.

306
00:35:17.920 --> 00:35:19.479
Nikhil Prakash: Still my TV clone.

307
00:35:19.620 --> 00:35:24.609
Nikhil Prakash: The gap between saying grid and saying that is so small.

308
00:35:24.960 --> 00:35:26.150
Nikhil Prakash: Surprisingly.

309
00:35:26.750 --> 00:35:33.440
Nikhil Prakash: But there is, like, there is overlap between where the ordering representation is affected, and the value representation.

310
00:35:33.980 --> 00:35:38.899
Nikhil Prakash: So basically, what we see, when we see the blue peaks, it's already overlapped with it.

311
00:35:39.450 --> 00:35:48.109
Nikhil Prakash: Yeah. Yeah, that's also a point. And also, like, since it's… it captured the ID, and it has to go look for it.

312
00:35:48.470 --> 00:35:50.980
Nikhil Prakash: Can we say that most of this happens in attention?

313
00:35:51.180 --> 00:35:53.530
Nikhil Prakash: Yeah, yeah, we can say that, yeah.

314
00:35:54.480 --> 00:36:00.720
Nikhil Prakash: Yeah, in this work, we don't look into the attention heads, but… Yeah, I think previous works…

315
00:36:01.250 --> 00:36:14.789
Nikhil Prakash: I'm not sure if… if you would get better performance, like, better effect just by patching in the heads. One point is what Tamar said. Maybe because of that, you might see

316
00:36:15.540 --> 00:36:17.320
Nikhil Prakash: But I'm not 100% sure about that.

317
00:36:18.260 --> 00:36:24.759
Nikhil Prakash: Yeah, if it's different heads doing the value fetching… Yeah, yeah, then you might… you might get slightly better, yeah, yeah.

318
00:36:25.600 --> 00:36:32.069
Nikhil Prakash: But we do hypothesize that it's very similar, the look-back, overall. Right, yeah. It's basically a look-back, yeah.

319
00:36:34.210 --> 00:36:39.050
Nikhil Prakash: Okay, so where is this ordering ID information getting formed?

320
00:36:43.620 --> 00:37:01.379
Nikhil Prakash: So, the first experiment that we did was actually a probing experiment. We tried to… we asked this question of whether the vision embedding, vision token embeddings, which is fed into the language model backbone, if they already contain this ordering ID information or not.

321
00:37:01.900 --> 00:37:04.390
Nikhil Prakash: And what we do is, we…

322
00:37:04.920 --> 00:37:08.270
Nikhil Prakash: We take the representation of… or…

323
00:37:08.830 --> 00:37:12.210
Nikhil Prakash: the embedding of each of the square tokens.

324
00:37:13.100 --> 00:37:21.860
Nikhil Prakash: And, basically, just train a linear three-class classifier on top of those representations.

325
00:37:22.670 --> 00:37:31.219
Nikhil Prakash: We used, like, 90 samples for training and 30 samples for testing, and the probe accuracy was almost perfect.

326
00:37:31.340 --> 00:37:34.410
Nikhil Prakash: So this, in a sense, shows that

327
00:37:34.700 --> 00:37:41.370
Nikhil Prakash: The ordering information is already present in the vision token embedding, which is fed into the language model.

328
00:37:42.480 --> 00:37:44.630
Nikhil Prakash: However, this was the most interesting result.

329
00:37:44.940 --> 00:37:46.449
Nikhil Prakash: So what we did was, in…

330
00:37:46.650 --> 00:37:57.410
Nikhil Prakash: So we train the probe only on the square token, like, the embeddings corresponding to the square token. But now, once we have the probe, we can basically apply the probe to every

331
00:37:57.650 --> 00:38:01.099
Nikhil Prakash: Visual, visual token embedding.

332
00:38:01.560 --> 00:38:09.500
Nikhil Prakash: So that's what we did. We took the probe, we applied it on each… each of the…

333
00:38:09.850 --> 00:38:16.730
Nikhil Prakash: each of the patch, or its corresponding embedding, and this is the result that we get. What we found was…

334
00:38:17.020 --> 00:38:24.299
Nikhil Prakash: For the probe that was trained to find the ordering information of the first square, it has a

335
00:38:25.140 --> 00:38:34.960
Nikhil Prakash: Good effect, or it has a good, probing accuracy on the test set on entire strip which is actually covering that square.

336
00:38:35.980 --> 00:38:43.509
Nikhil Prakash: So, this is where the square is supposed to be, somewhere here, but we found that this ordering ID information is actually spread across

337
00:38:43.920 --> 00:38:45.400
Nikhil Prakash: This entire strip.

338
00:38:46.680 --> 00:38:48.389
Nikhil Prakash: Hold up, question.

339
00:38:49.320 --> 00:38:55.599
Nikhil Prakash: Okay. So, yeah, so this… so this is the image, okay?

340
00:38:55.990 --> 00:38:59.199
Nikhil Prakash: We break down the image into individual patches.

341
00:38:59.770 --> 00:39:02.379
Nikhil Prakash: Those individual patches become the tokens.

342
00:39:02.520 --> 00:39:12.909
Nikhil Prakash: which is fed into the VIT, and then the projector, and then we have these embeddings, which are fed into the language model backbone.

343
00:39:13.080 --> 00:39:22.810
Nikhil Prakash: We took… we took those embeddings, which are fed into the language model backbone, corresponding to the square… square

344
00:39:22.950 --> 00:39:24.109
Nikhil Prakash: Tokens only.

345
00:39:24.680 --> 00:39:31.689
Nikhil Prakash: So we took the… So we know which token, which…

346
00:39:31.990 --> 00:39:43.169
Nikhil Prakash: like, the 100 token corresponds to the first square. 200 token corresponds to the second square. So we only picked up 100, 200, and 300,

347
00:39:43.650 --> 00:39:44.840
Nikhil Prakash: Vision tokens.

348
00:39:45.600 --> 00:39:47.680
Nikhil Prakash: Okay.

349
00:39:48.080 --> 00:39:55.939
Nikhil Prakash: Because those corresponding to… those are the ones for the square tokens. Those are the ones which is encoding.

350
00:39:57.390 --> 00:39:58.500
Nikhil Prakash: square.

351
00:39:59.450 --> 00:40:02.490
Nikhil Prakash: You can physically point to it if you need to.

352
00:40:02.660 --> 00:40:22.040
Nikhil Prakash: You took it from the image? That's how we created the image. So we created the image, we have full control of where we can put the square in the image. So we put it in a specific position in that image, so that when we break it down, it actually corresponds to the 100th token which gets generated.

353
00:40:22.470 --> 00:40:24.300
Nikhil Prakash: Okay. And then what's the form?

354
00:40:24.590 --> 00:40:30.539
Nikhil Prakash: Yeah, so the… once you have these 100th, 200th, and 300th, like, token embeddings.

355
00:40:30.830 --> 00:40:38.979
Nikhil Prakash: Then the 100th one corresponding… it corresponds to, like, the first square, 200 one corresponds to the second square, and 300 corresponds to the third square.

356
00:40:39.490 --> 00:40:44.470
Nikhil Prakash: That's what the… Probe is tasked to classify.

357
00:40:45.860 --> 00:40:47.950
Nikhil Prakash: We're missing the size of the patch.

358
00:40:49.240 --> 00:40:52.600
Nikhil Prakash: Square is… now we are talking about vectors. This is embedding.

359
00:40:53.680 --> 00:41:10.040
Nikhil Prakash: Each square… when you say square, I'm not sure. Yeah, there's so many squares all over the diagram, so yeah, I think that it might be helpful to answer. So this… So, so this is a vector here now. This is… that's not a square.

360
00:41:10.200 --> 00:41:14.739
Nikhil Prakash: That's a vector. That's a vector. This is a vector. Yeah, this is a vector.

361
00:41:17.720 --> 00:41:24.440
Nikhil Prakash: So, each square… Which one is the square? That's the square. This is… when I say square, I mean this… this square.

362
00:41:24.580 --> 00:41:39.660
Nikhil Prakash: No. This is only one? Well, it just happens to be. It could have been, I mean, so, like, okay. Oh, sorry, yeah, we have four vectors… sorry. But for understanding, let's say there is only one vector. No, actually, for understanding, it might be clearer if you… there's four patches.

363
00:41:39.700 --> 00:41:49.629
Nikhil Prakash: They cover that square. Okay, let's say… yeah, in technical… in actual terms, there are, like, 4 vectors corresponding to each of the red, green, and blue squares, okay?

364
00:41:49.730 --> 00:41:56.390
Nikhil Prakash: So, okay, so now you have… 4, 4, 4… 12 vectors.

365
00:41:57.020 --> 00:41:58.000
Nikhil Prakash: You got it?

366
00:41:58.370 --> 00:42:02.190
Nikhil Prakash: So the first four vectors correspond to the first square.

367
00:42:03.130 --> 00:42:05.090
Nikhil Prakash: That's the ground truth label of them.

368
00:42:05.280 --> 00:42:14.170
Nikhil Prakash: The next four vectors correspond to the second one, and the last four correspond to the third one. Essentially, that's what the probe is trying to

369
00:42:14.310 --> 00:42:21.990
Nikhil Prakash: classifier. We feed each of these 12 vectors in the probe, and the probe is supposed to give me 1, 2, or 3.

370
00:42:22.530 --> 00:42:25.410
Nikhil Prakash: And then you use the same probe to see

371
00:42:25.600 --> 00:42:33.089
Nikhil Prakash: Other… Yeah. Other vectors? Yeah, exactly. So, is it one probe for all horizontal, vertical? No,

372
00:42:33.760 --> 00:42:36.330
Nikhil Prakash: Horizontal is 1.

373
00:42:37.000 --> 00:42:48.490
Nikhil Prakash: I mean, so this is a three-class classifier, so we basically have, like, three… three classes. That makes sense. So you… the same image, if you flip it, you can still use the same probe, right? Because the token dimensions, everything is the same.

374
00:42:50.530 --> 00:42:52.690
Nikhil Prakash: No, I think we trained a different one.

375
00:42:54.750 --> 00:43:01.530
Nikhil Prakash: It might work, though, it might work that the horizontal probe might be able to work in the vertical setting, but I don't think we tried it.

376
00:43:01.680 --> 00:43:05.849
Nikhil Prakash: Yeah, it might be more convincing that it's a positional.

377
00:43:06.630 --> 00:43:08.450
Nikhil Prakash: Like, relative to missionary.

378
00:43:08.920 --> 00:43:10.290
Nikhil Prakash: Thank you.

379
00:43:10.710 --> 00:43:15.409
Nikhil Prakash: Yeah, that is so good. It looks like… from the vision encoder, it's just cued into, like,

380
00:43:16.340 --> 00:43:18.699
Nikhil Prakash: Positional coding of the token, not necessarily, like, the…

381
00:43:19.180 --> 00:43:27.309
Nikhil Prakash: semantic relationship between the squares. Could be literally the X and Ys. Yeah. It should be like that. But you have some results that show that it's more…

382
00:43:28.090 --> 00:43:34.419
Nikhil Prakash: Yeah, but sorry, why do you think it's X and Y? Because… Yeah, from the heat map, it just looks like…

383
00:43:34.510 --> 00:43:54.120
Nikhil Prakash: what the signal the probe has cued into is just the positional coding of the token from the VIT encoder. When you say position, you mean X and Y coordinates? Yeah, just X and Y coordinates, yeah. Okay. It's not necessarily, like, the… because it seems like what you're after is, like, the relationship, or, like, the orderings of, like… So, the counter for that argument is, you think the…

384
00:43:54.140 --> 00:43:57.799
Nikhil Prakash: X and Y of this particular square is…

385
00:43:58.450 --> 00:44:08.000
Nikhil Prakash: Isn't the X and Y coordinate of this particular square, or this particular patch, is at the same distance at some square here? No, no, the X is… the X is different.

386
00:44:08.690 --> 00:44:12.230
Nikhil Prakash: But the distance, if you look at… Just the X.

387
00:44:12.390 --> 00:44:23.690
Nikhil Prakash: So it has picked up X… Yeah. But we do know now that you can change the position, if you change the position of this object inside the image, it does generate this.

388
00:44:23.850 --> 00:44:31.220
Nikhil Prakash: So, if you take… Do you have that experiment somewhere? Yeah, but yeah, I do have that experiment.

389
00:44:32.660 --> 00:44:38.759
Nikhil Prakash: I'm gonna go through that, but I'm still not sure about it.

390
00:44:39.160 --> 00:44:40.830
Nikhil Prakash: Cause if you… if…

391
00:44:42.330 --> 00:44:56.699
Nikhil Prakash: which is, like, flipped over its own vertical, and it still generalizes, I'd be, like, more convinced with it. Yeah, that I agree with. That, that makes sense as well. Do you have a probing one?

392
00:44:56.880 --> 00:45:00.469
Nikhil Prakash: I just sent it. Oh, no, I did not include it.

393
00:45:01.390 --> 00:45:12.509
Nikhil Prakash: Should I open Discord now? Yes. Yeah, go on. You said it's interactive, not the formal talk, so here we are. You guys can see all of our talk.

394
00:45:13.860 --> 00:45:28.469
Nikhil Prakash: Oh, man, don't… okay. Nobody saw it. Okay, then you, you, you, you've got to explain it, you've got to explain. Tamar needs to explain the experiments. We can't see. Oh, that's better, you can't see it.

395
00:45:29.080 --> 00:45:37.859
Nikhil Prakash: I don't know, it's not applicable. That's right. Only about you. Only about me, that's right. That's right, that's right, especially about…

396
00:45:39.390 --> 00:45:41.410
Nikhil Prakash: Especially that one. Special government.

397
00:45:41.780 --> 00:45:43.190
Nikhil Prakash: Redhead's in trouble now.

398
00:45:45.260 --> 00:45:53.419
Nikhil Prakash: No, it's not. Okay, never mind, it's not getting copied.

399
00:45:54.370 --> 00:45:56.040
Nikhil Prakash: Say that again. Okay.

400
00:45:56.350 --> 00:46:08.519
Nikhil Prakash: And then, what should I press? Watch out.

401
00:46:09.950 --> 00:46:15.630
Nikhil Prakash: Okay, I'm changing my entire skin now.

402
00:46:15.740 --> 00:46:17.730
Nikhil Prakash: It's okay, it's okay, it's okay.

403
00:46:18.050 --> 00:46:20.610
Nikhil Prakash: You wanna explain what… What is this?

404
00:46:20.720 --> 00:46:29.050
Nikhil Prakash: Okay, so we trained… it's the same probe, the same as, what, Anthony asked, so we trained the probe on the squares on the top.

405
00:46:29.310 --> 00:46:35.399
Nikhil Prakash: Squares on the top. And something funny that we saw is that if you…

406
00:46:35.640 --> 00:46:38.030
Nikhil Prakash: Try to train the probe.

407
00:46:38.350 --> 00:46:42.240
Nikhil Prakash: on each layer of the vision encoder, it actually…

408
00:46:42.650 --> 00:46:51.079
Nikhil Prakash: you get, like, 100% accuracy from layer 0, because it can overfit to the position embedded, and that's it. Yeah.

409
00:46:53.390 --> 00:47:02.509
Nikhil Prakash: But what if you try to train the probe on one image, and then test the generalization on another image, where the position of the input is different?

410
00:47:03.170 --> 00:47:09.310
Nikhil Prakash: And what we see if we do that is that we still get a nice accuracy on top of

411
00:47:09.650 --> 00:47:11.930
Nikhil Prakash: the… Square itself.

412
00:47:12.560 --> 00:47:15.179
Nikhil Prakash: But we still see these, like, strips.

413
00:47:15.930 --> 00:47:21.030
Nikhil Prakash: In the original position. So the bottom one? Yeah, yeah.

414
00:47:21.300 --> 00:47:23.400
Nikhil Prakash: Do you have the… do you have the plot?

415
00:47:24.480 --> 00:47:30.720
Nikhil Prakash: So, here, let's just look at the, blue, green, and orange. So, the blue and green

416
00:47:31.240 --> 00:47:37.429
Nikhil Prakash: It's just a quick frame, and the layer on the top… on the bottom is the varying color layers.

417
00:47:37.670 --> 00:47:39.719
Nikhil Prakash: So we're here…

418
00:47:41.140 --> 00:47:48.830
Nikhil Prakash: You can see that just from layer 0, you can get nice accuracy if you just train them on the squares or shifted squares, it can get it right.

419
00:47:50.120 --> 00:48:01.629
Nikhil Prakash: In the orange one, it's actually the other image. So you train on one setting, and then test on different locations of the… of the squares. And you can see that it peaks on layer 15.

420
00:48:01.630 --> 00:48:11.650
Nikhil Prakash: It doesn't get 100% accuracy, meaning that there is kind of like a mix of information, but some of the information is not the absolute location, but something relative about the order.

421
00:48:11.950 --> 00:48:17.620
Nikhil Prakash: So, let's say, for your training set, the first square is always in token 100.

422
00:48:17.800 --> 00:48:29.929
Nikhil Prakash: Yes, exactly. Now, in your test set, it said, like, token 2004. Yeah, but even be at the location of the third one. But it's still the first square. You're not shifting the squares, it's the zoomed out or zoomed version.

423
00:48:30.100 --> 00:48:36.249
Nikhil Prakash: And, and then I think Nikhil has a nice, causal experiment that also validates that what these graphics.

424
00:48:36.630 --> 00:48:41.380
Nikhil Prakash: Can you… maybe you said that and I wasn't confused. What's the input and what's the output of the probe?

425
00:48:41.500 --> 00:48:53.709
Nikhil Prakash: The input of the probe is the embedding of the visual tokens, and the output is the order pick, whether it's the first one in the image, the second one in the image, or the third one. So, three labels? Yes.

426
00:48:54.560 --> 00:49:05.700
Nikhil Prakash: Oh, you know what you should do? Like, for… Get another one. No, no. The same probe for horizontal, vertical, and, like, differences. Yeah. Yeah, yeah, yeah, yeah. So, for training, I mean.

427
00:49:06.520 --> 00:49:22.510
Nikhil Prakash: Yeah, I think I could… yeah, yeah, yeah. I think that's… I thought that's what you meant, initially. No, I meant that one. I forgot about the… The joint one? Yeah, yeah, okay, okay. Yeah, no, I think that's a good one. Yeah, we can try that. So what is the training data that we're…

428
00:49:24.430 --> 00:49:26.520
Nikhil Prakash: Memorize the addition again?

429
00:49:26.660 --> 00:49:28.139
Nikhil Prakash: It's this… yeah, so it's…

430
00:49:28.260 --> 00:49:38.320
Nikhil Prakash: I think Nikki will… will… will get to it, right? Showing that it's… that generalization, I'm not sure. The generalization is a mix of probably some of it.

431
00:49:38.470 --> 00:49:44.050
Nikhil Prakash: Is actually memorizing the absolute position of the object in the image.

432
00:49:45.150 --> 00:49:50.860
Nikhil Prakash: But some of it is actually the relative position between the objects. So whether it's the first one, the second one.

433
00:49:51.060 --> 00:49:53.330
Nikhil Prakash: So, if you want to…

434
00:49:53.440 --> 00:49:59.239
Nikhil Prakash: If you want to encode the order of things in your image, it can either be by just looking at the XY coordinate.

435
00:49:59.650 --> 00:50:07.429
Nikhil Prakash: Right? Yeah, so in your data, your training data, the red one is always in position 100. Yes. What's the difference between the samples and the training data?

436
00:50:07.820 --> 00:50:13.749
Nikhil Prakash: colors, different colors. Oh, just colors. Or in the other setting… So the first one is always in the… Yeah, the same location.

437
00:50:13.930 --> 00:50:17.819
Nikhil Prakash: But it does generalize this to different locations to some extent.

438
00:50:18.360 --> 00:50:24.310
Nikhil Prakash: Different locations coming in different orders. Different picks a little bit. Different X and Y. You ship them all. Yeah.

439
00:50:24.720 --> 00:50:29.120
Nikhil Prakash: On smaller squares, more tighter together.

440
00:50:29.680 --> 00:50:33.700
Nikhil Prakash: It's not sure it's, like, super related, but I would say, like.

441
00:50:34.590 --> 00:50:43.450
Nikhil Prakash: So right now, you're training the probe in a very specific way, where the squares are always at the same location, and then you're stress testing in different scenarios.

442
00:50:43.560 --> 00:50:50.310
Nikhil Prakash: I wonder whether things can go the other way around. You can… Actually train a stronger probe.

443
00:50:50.810 --> 00:51:02.950
Nikhil Prakash: by training on a more diversity of, like, square decisions, and then testing on another distribution. That's sort of Rohit's second suggestion, right? Is that right? No, that's okay. No, no, you're right, I think that's right.

444
00:51:03.140 --> 00:51:14.590
Nikhil Prakash: Yeah, so you could… that'd be… your data is a regularizer then, and you're trying to get the generalized data out of it, yeah. And so another question would be, I'm really curious about, like, whether, say, like.

445
00:51:14.840 --> 00:51:20.539
Nikhil Prakash: It works only on the VIT or CRAM, which is trained end-to-end for visual question answering.

446
00:51:24.320 --> 00:51:36.640
Nikhil Prakash: it works for, like, Dino or Cliff, or… I don't know, maybe not Cliff, but, like, Dino works, like… So, we did study Dino, but, this… the results that I've shown here, they generalize at least,

447
00:51:37.060 --> 00:51:40.220
Nikhil Prakash: At least with Gemma models? I don't know if Gemma is clear.

448
00:51:40.380 --> 00:51:59.540
Nikhil Prakash: Oh, I think Lava… Lava is… is Lava is clip. I think Lava is… Lava is separate. Lava is, like, using a pre-trained vision code. Oh, I see, then I don't know. But… Okay. But, yeah, so at least the results generalized to Gemma model?

449
00:51:59.810 --> 00:52:04.739
Nikhil Prakash: And most of the results are also generalized to pixel.

450
00:52:05.010 --> 00:52:06.190
Nikhil Prakash: Pix… pixel.

451
00:52:06.460 --> 00:52:17.729
Nikhil Prakash: But some of the experiments, I just still need to do on PixTrill. But so far, the experiments that I've done on Pixel seems to be okay, or consistent with the results that we have.

452
00:52:19.550 --> 00:52:21.930
Nikhil Prakash: Yeah. Someone else about it.

453
00:52:23.760 --> 00:52:34.070
Nikhil Prakash: augmentation, regularizing exposure, having to be sure that It's, like, one… We're catching the same mechanism.

454
00:52:34.490 --> 00:52:37.390
Nikhil Prakash: What if there's just, you know, there's a mechanism for it?

455
00:52:37.810 --> 00:52:39.669
Nikhil Prakash: safety relative ordering.

456
00:52:40.130 --> 00:52:45.800
Nikhil Prakash: horizontal places that are close to the middle, and, like, a different one for, like, it's, like, new that's positive or anything, but…

457
00:52:46.140 --> 00:52:51.430
Nikhil Prakash: You think there's, like… Yeah, you think there's, like, a risk of something like that?

458
00:52:53.090 --> 00:52:59.610
Nikhil Prakash: When you're training probes, like, the kinds of probes that we…

459
00:52:59.860 --> 00:53:05.179
Nikhil Prakash: Train, or is it more like training the probes with all different kinds of configuration?

460
00:53:05.980 --> 00:53:09.360
Nikhil Prakash: I mean, I guess in general.

461
00:53:09.600 --> 00:53:14.830
Nikhil Prakash: Because it's… there's, like, a range of, like, you know how specific versus how generalized.

462
00:53:14.980 --> 00:53:18.820
Nikhil Prakash: your… the kind of settings that you're trading these folks.

463
00:53:19.450 --> 00:53:22.090
Nikhil Prakash: You know, there's something to be considered there with respect to, like.

464
00:53:22.220 --> 00:53:26.069
Nikhil Prakash: The underlying mechanisms and how specialized or generalized they can be.

465
00:53:30.800 --> 00:53:32.640
Nikhil Prakash: Okay, so my…

466
00:53:35.670 --> 00:53:38.840
Nikhil Prakash: My answer to that would be if…

467
00:53:39.230 --> 00:53:46.770
Nikhil Prakash: The mechanism between these two types of, like, configuration are significantly different.

468
00:53:47.010 --> 00:53:53.319
Nikhil Prakash: Then maybe the… The training or the test accuracy of those probes will tell you that.

469
00:53:53.940 --> 00:53:56.409
Nikhil Prakash: That maybe you're not able to train

470
00:53:56.510 --> 00:54:00.009
Nikhil Prakash: A very good classifier, like, a very good pro.

471
00:54:00.440 --> 00:54:06.229
Nikhil Prakash: Either in terms of your training law… training accuracy, or in terms of your test accuracy.

472
00:54:07.060 --> 00:54:10.129
Nikhil Prakash: I think that's what I would assume.

473
00:54:10.930 --> 00:54:14.979
Nikhil Prakash: Makes sense, yeah, it just feels like it's very hard, like… Yeah. It's not grounded.

474
00:54:15.330 --> 00:54:17.160
Nikhil Prakash: I don't know what numbers for this thing.

475
00:54:21.500 --> 00:54:38.099
Aruna Sankaranarayanan: I had another question. It's… how similar are these ordering IDs to, like, the position embeddings of these tokens? I guess in the… in the vision language model, it's, it's probably less useful to look at this, but say in the case of…

476
00:54:38.220 --> 00:54:42.590
Aruna Sankaranarayanan: just like the LLM, just like in terms of the words.

477
00:54:42.830 --> 00:54:49.410
Aruna Sankaranarayanan: How similar are the vectors, like, from the… from just the position embeddings and these ordering, IDs?

478
00:54:52.260 --> 00:54:56.520
Nikhil Prakash: That image she just showed us… can you go back to the image you showed from the Discord?

479
00:55:05.750 --> 00:55:06.450
Nikhil Prakash: This one?

480
00:55:06.630 --> 00:55:07.979
Nikhil Prakash: Yeah, this one.

481
00:55:08.940 --> 00:55:11.110
Nikhil Prakash: No, no, no, no, the one you were just on.

482
00:55:11.990 --> 00:55:18.289
Nikhil Prakash: Like, my understanding of what this is saying is that, like, If we, like.

483
00:55:19.650 --> 00:55:26.220
Nikhil Prakash: If we try to deduce the algorithm, the linear probe is learning. It's learning, like, a Mix of, like.

484
00:55:26.450 --> 00:55:38.170
Nikhil Prakash: this is evidence that is just reading off the positional embedding, right? And then this faint little patch here is evidence that there's… there's actual generalization, or, like, relative work. Is that a correct interpretation of results? Okay.

485
00:55:39.410 --> 00:55:40.830
Aruna Sankaranarayanan: Got it. Thank you, yeah.

486
00:55:40.830 --> 00:55:42.160
Nikhil Prakash: And the same pro…

487
00:55:42.530 --> 00:55:55.109
Nikhil Prakash: Is that right? Yeah, I think it's clearest in, like, this third image, right? You see this… this little guy, which definitely corresponds to the relative. And then maybe if we trained a Rohage probe.

488
00:55:55.620 --> 00:55:59.660
Nikhil Prakash: And you could just get that by itself, right?

489
00:56:00.190 --> 00:56:05.390
Nikhil Prakash: Okay, you should name that. Yeah.

490
00:56:05.610 --> 00:56:10.369
Nikhil Prakash: I'm willing to share the curses.

491
00:56:12.720 --> 00:56:22.729
Nikhil Prakash: Yeah, I hope that, yeah, I think that answers the question. So I guess we can… I don't think the question. I mean, I want you to explain to me the picture.

492
00:56:22.870 --> 00:56:24.589
Nikhil Prakash: This figure? Yeah.

493
00:56:24.810 --> 00:56:38.129
Nikhil Prakash: Wait, so you understood the probe, right? Yes. So we just take that probe, so the probe was trained only on the square tokens, but now we apply the same probe across all the tokens, including the background tokens, and this is the result that we get.

494
00:56:38.540 --> 00:56:43.950
Nikhil Prakash: This is the test accuracy. Test accuracy. Yeah, test accuracy of the probe.

495
00:56:45.090 --> 00:56:46.190
Nikhil Prakash: I'm so busy.

496
00:56:47.230 --> 00:56:52.849
Nikhil Prakash: So, so, so assume that you, you're looking at the first vision token.

497
00:56:52.970 --> 00:56:54.420
Nikhil Prakash: The first vision token.

498
00:56:54.880 --> 00:56:57.430
Nikhil Prakash: In our image, that would be something like this.

499
00:56:58.190 --> 00:57:00.020
Nikhil Prakash: Just a blank batch.

500
00:57:01.600 --> 00:57:04.160
Nikhil Prakash: White patch?

501
00:57:04.580 --> 00:57:06.450
Nikhil Prakash: No, that's not in the training data.

502
00:57:06.960 --> 00:57:08.520
Nikhil Prakash: So how does the test data work?

503
00:57:09.320 --> 00:57:12.009
Nikhil Prakash: The test data, just take the…

504
00:57:13.860 --> 00:57:19.439
Nikhil Prakash: The input to the blog is a token embedding. Yes. During training, they only show

505
00:57:19.680 --> 00:57:30.270
Nikhil Prakash: tokens from the squares, exclusively. Different, different colors. Different colors, different colors. But it's always the exact same shape. Always the same.

506
00:57:31.680 --> 00:57:44.030
Nikhil Prakash: We also have other data sets where we… this is the simple data set of squares. It can be objects, it can be… For this setting, what we're talking about right now, the training is these three squares, just different colors. Yeah, okay, and then how does the test data look?

507
00:57:44.220 --> 00:57:55.450
Nikhil Prakash: The test data log also includes the embedding of all the tokens in the image. Oh, but it's still the same… the same shape? Yes, but it's… but it's the held-out set of images, so… Okay. But…

508
00:57:56.760 --> 00:58:04.850
Nikhil Prakash: But still there's three. Yeah, for example, in the object setting, when you have, like, tiny images, different objects, it can be completely different objects.

509
00:58:05.650 --> 00:58:08.190
Nikhil Prakash: But yeah, the same configuration, just different colors.

510
00:58:08.460 --> 00:58:26.369
Nikhil Prakash: Okay, and then we apply the probe on all the patches, all the tokens corresponding to each of the patches. And then we see that the test accuracy is not only high on the square tokens, which is… it… it is supposed… it was straight on. We see high test accuracy also on the background tokens.

511
00:58:27.330 --> 00:58:40.249
Nikhil Prakash: background tokens, which are sort of, like, creating a strip around the square which it was trained on. So it knows to say, no, there's… this is not the first position. It knows that this is the first position.

512
00:58:40.870 --> 00:58:44.199
Nikhil Prakash: So, so if you take that first patch that I was talking about.

513
00:58:44.790 --> 00:58:56.049
Nikhil Prakash: you take its corresponding embedding, and then use your probe, then it will say that, okay, this is the first position. Like, this is… this is the first one.

514
00:58:57.000 --> 00:59:00.470
Nikhil Prakash: 200 labels that it can predict.

515
00:59:00.650 --> 00:59:13.799
Nikhil Prakash: It's still 3 labels. Yeah, labels is still three. It's doing something silly, like, instead of doing… So this are the labels, like this? Yeah. Okay. Yeah, so this is first… labels are just first square, second square, third square, that's it. There's this three class… classifiers.

516
00:59:14.300 --> 00:59:26.340
Nikhil Prakash: Yeah, I understand, but for the Y1, what is the label? That was my question. What is the gold label? The Y patch? It doesn't have a label. It doesn't have it, yeah. On test time. We did it print it, and now we say, oh, interesting, but…

517
00:59:26.390 --> 00:59:35.460
Nikhil Prakash: So, like, something that will be very not surprising will be if the white background will have zero accuracy for all the people observed

518
00:59:36.150 --> 00:59:47.929
Nikhil Prakash: When you say accuracy, what is the gold label of the white background? Accuracy is whether the problem… we have three prompts. Each prompt is accurate if it predicts the right order. So the first prompt predicts first order.

519
00:59:48.050 --> 00:59:49.770
Nikhil Prakash: Right? I'm the first opto.

520
00:59:50.020 --> 01:00:08.689
Nikhil Prakash: Second prompt, predict, under second. So the background, what's the order of the background? It doesn't have order, but it seems like the vision encoder assigned an order to the background, although it doesn't… it doesn't supposed to have… So it's not accuracy. It is accuracy. Oh, I understand what your question is. So when you… when you give the white patch.

521
01:00:08.820 --> 01:00:14.069
Nikhil Prakash: When you give the white patch embedding as an input, the output is a softmax of 3 levels.

522
01:00:14.260 --> 01:00:31.340
Nikhil Prakash: whatever the accuracy is for the first token, like, the first class, they're plotting that. Yeah, that's the prediction. Assume that we have… I think I got the… It's 90% accurate, 90% confident. That's confidence, actually. 90% confident that it's coming from class 1.

523
01:00:31.420 --> 01:00:47.399
Nikhil Prakash: Yeah, so that's not accuracy. That's not accuracy, that's confidence. Okay. Yeah, it's first-class confidence. It's very confident that the white patch from above the red square… But maybe what they're showing here is… No, this is ac… this is accuracy. So we… let's assume that

524
01:00:47.430 --> 01:00:57.539
Nikhil Prakash: the ground truth is the first one, is the first square, for the first patch. No, what you're showing here is you're saying… you're saying accuracy, if every patch…

525
01:00:57.540 --> 01:01:11.599
Nikhil Prakash: the ground truth was zero, like, for the whole image. Yeah, exactly. Then you're showing accuracy there. That's not accuracy, because everything has the same label. Right, but it's just, it's just number of predictions. That's not accuracy.

526
01:01:12.940 --> 01:01:21.059
Nikhil Prakash: Show the accuracy of the bat to show that the background content actually contain information relevant to the plot.

527
01:01:21.670 --> 01:01:30.440
Nikhil Prakash: As if the probe was trained to predict the mechanism. Sorry, but I shouldn't treat it as accuracy, but look at it as the probe prediction. Yes.

528
01:01:30.730 --> 01:01:49.900
Nikhil Prakash: Yeah, the word accuracy was confusing. It implies there are labels. Wait, I'm not sure if I follow completely. This… you're saying this… you're also saying this is not accuracy? We… I think we can find a way to define it, but what we actually care about is what is the prediction of the problem in these tokens.

529
01:01:49.900 --> 01:01:51.699
Nikhil Prakash: Yeah, and we check it if it's…

530
01:01:51.880 --> 01:01:56.460
Nikhil Prakash: We check it by saying… let's assume all the labels were 1.

531
01:01:56.460 --> 01:02:04.720
Nikhil Prakash: Yes, exactly. So we are checking the accuracy here. Except that… except that this is subjection, to use the word accuracy.

532
01:02:04.720 --> 01:02:20.279
Nikhil Prakash: when what you're measuring is not whether it's right or not. Yeah, it's not like accuracy. Oh, okay. You get one if it's right and zero if it's wrong. And you're using accuracy just as units. Yeah, accuracy for a particular label.

533
01:02:20.400 --> 01:02:24.809
Nikhil Prakash: But, just, just, find it very well. What's the top?

534
01:02:24.940 --> 01:02:27.840
Nikhil Prakash: If you do maximum on the submax, what would be the prediction?

535
01:02:28.000 --> 01:02:31.749
Nikhil Prakash: Okay, so whether it will be 1 or… 90 times out of 100.

536
01:02:32.150 --> 01:02:35.260
Nikhil Prakash: The white label has been predicted as plus 1.

537
01:02:35.390 --> 01:02:36.330
Nikhil Prakash: See you bye.

538
01:02:37.670 --> 01:02:39.779
Nikhil Prakash: Call that predictable. 90 out of…

539
01:02:39.800 --> 01:02:58.840
Nikhil Prakash: Right? Like, if that's 90, like, let's say the accuracy… Prediction rate? I don't know. That sounds like accuracy to me. Well, but not to other readers, right? Okay, okay, okay, maybe we can change the wording there. Okay. The accuracy implies that… Right, it's like, if you have a high accuracy, that's good, but if you have a high prediction…

540
01:02:59.910 --> 01:03:01.230
Nikhil Prakash: of something that…

541
01:03:01.330 --> 01:03:26.320
Nikhil Prakash: a white token is predicted to be, like, the red… Okay, that makes sense, that makes sense. Okay. I think the problem with, recalling this prediction is we didn't expect this prediction. It's not like we tested the hypothesis… It's okay, that's still predicted. Yeah. Yeah, that's okay, that's okay. It's not that you predicted the model. Yeah, that's okay. Yeah, we should have a look at how we look at this. Okay, but, yeah, okay, so the main, main result

542
01:03:26.320 --> 01:03:28.180
Nikhil Prakash: Well, main takeaway is…

543
01:03:28.180 --> 01:03:40.440
Nikhil Prakash: that the ordering information of the square is not concentrated or localized only on the square tokens. It is diffused across background tokens as well. Can I say it again?

544
01:03:41.100 --> 01:03:47.619
Nikhil Prakash: Yes. That I said in the start. As if what? It's a surprise, like…

545
01:03:47.730 --> 01:03:53.280
Nikhil Prakash: You have objects, you… they have a certain order inside the image.

546
01:03:53.680 --> 01:04:00.350
Nikhil Prakash: But now it seems like that order is not associated with the object itself, but kind of like a strip around.

547
01:04:00.620 --> 01:04:04.820
Nikhil Prakash: Like, so it means that the pro…

548
01:04:06.380 --> 01:04:25.279
Nikhil Prakash: This reaction on the top is in position zero, right? The yellow… the yellow points show that, let's say, on the top, the probe predicts this to be in position zero. Yes, that's right, yeah. This is what it does, the prediction rate, yes.

549
01:04:25.320 --> 01:04:34.300
Nikhil Prakash: Do you have also a question? Yes, let's… What do you think about this, like…

550
01:04:34.570 --> 01:04:48.329
Nikhil Prakash: So, like, here's, like, a formal construction that I'm thinking about in reference to this. If I trained a linear probe on the same classification task, but the only inputs I gave it were the positional embeddings with, like.

551
01:04:48.410 --> 01:05:11.509
Nikhil Prakash: maybe, like, the Y values ablate… if, like, with the information related to Y ablated out. Like, no square, no nothing? Yeah. Like, I… for one, for this… for the specific task you described, which is not the rowhead version, it would do a perfect job, and then if you actually produce… reproduce this figure, it would look like perfect columns, right? So what's, like… I don't… what's the evidence that it's not… that your linear curve is not just doing that? Are you ready for a review?

552
01:05:11.720 --> 01:05:14.019
Nikhil Prakash: I think we should move to the causal experiments.

553
01:05:14.370 --> 01:05:17.570
Nikhil Prakash: Okay. Also, what's the… just last question, what layer is this on?

554
01:05:17.780 --> 01:05:31.319
Nikhil Prakash: Just the embedding of the… This is input… The last layer of the vision model. After the projector. Last layer of… after the MLP. After the projector. Input to language model backbone.

555
01:05:32.070 --> 01:05:34.630
Nikhil Prakash: Even past the last day of the vision. Yes.

556
01:05:35.470 --> 01:05:36.200
Nikhil Prakash: just…

557
01:05:36.340 --> 01:05:48.099
Nikhil Prakash: If you can go back, sorry to delay this one a bit more, but to maybe this way, like, that objection, if you trained on something like this same dataset, but the three squares are kind of on the top.

558
01:05:48.290 --> 01:06:05.350
Nikhil Prakash: then you would have examples where you have the exact same square, but in this case, it's the second one, and in this case, it's the first one. So it's exactly the same positional embeddings, but it's kind of… you're kind of showing that it's figuring out… Yeah, you also have that. I think just continue.

559
01:06:05.370 --> 01:06:08.689
Nikhil Prakash: Okay. So, this is probing result.

560
01:06:10.030 --> 01:06:32.700
Nikhil Prakash: This shows that, okay, maybe the order information is diffused, or it is at least present across a bunch of different tokens, but we all know that probing only is correlational results. It does not say anything about the causality, the information might still be present there, and it might not be used by the model. So we do this intervention experiment to actually show that the information present in the strip is causal in nature.

561
01:06:34.110 --> 01:06:38.050
Nikhil Prakash: And this is the two experiments that we did.

562
01:06:38.770 --> 01:06:46.549
Nikhil Prakash: so the difference between clear and counterfactual is that the order of left and right

563
01:06:46.980 --> 01:06:55.489
Nikhil Prakash: squares are reversed. In the clean one, red is the left one, and on the right, on the counterfactual, the right… right one is the right one.

564
01:06:55.750 --> 01:07:02.969
Nikhil Prakash: Okay, so if we do… If the model is creating this ordering ID information, then when we do

565
01:07:03.390 --> 01:07:12.759
Nikhil Prakash: This two patching, the color of the square should remain the same, while the ordering information should get reversed.

566
01:07:13.530 --> 01:07:20.459
Nikhil Prakash: And because of the reversal of the ordering ID, the final answer of the clean range should change from

567
01:07:20.740 --> 01:07:22.729
Nikhil Prakash: Red to blue.

568
01:07:22.900 --> 01:07:30.109
Nikhil Prakash: Here we're asking what is to the left of the green color. Okay, so that's pretty much the experiment. And we do intervention.

569
01:07:30.440 --> 01:07:35.869
Nikhil Prakash: On either only the square tokens, or on the entire strip tokens.

570
01:07:35.990 --> 01:07:37.090
Nikhil Prakash: Including this one.

571
01:07:37.210 --> 01:07:38.499
Nikhil Prakash: Including the square.

572
01:07:39.080 --> 01:07:43.840
Nikhil Prakash: So this is the result for patching in the entire strip.

573
01:07:44.430 --> 01:07:47.039
Nikhil Prakash: X axis is the embedding.

574
01:07:47.580 --> 01:07:53.549
Nikhil Prakash: embedding layer, and the later layers in the language model, backbone, and Y-axis is…

575
01:07:54.280 --> 01:07:56.419
Nikhil Prakash: Let's say it is prob… yeah.

576
01:07:56.760 --> 01:07:57.790
Nikhil Prakash: I, aye.

577
01:07:58.540 --> 01:08:02.979
Nikhil Prakash: So we see that, okay, even when you patch in the embedding layer.

578
01:08:03.190 --> 01:08:06.049
Nikhil Prakash: We have very high causal effect.

579
01:08:06.260 --> 01:08:12.650
Nikhil Prakash: That the model starts to think that the square to the left of the green square is actually blue.

580
01:08:13.390 --> 01:08:20.609
Nikhil Prakash: And not red. Right from layer 0. Right from layer 0. Right from layer 0, it says the incorrect answer.

581
01:08:21.010 --> 01:08:23.079
Nikhil Prakash: Yes. One second, one second.

582
01:08:24.140 --> 01:08:36.389
Nikhil Prakash: But if we do the patching on the square tokens only, not the entire strip, we don't see that effect. The model still thinks that the red is the correct answer and not the blue, if we do patching only on the square token.

583
01:08:37.220 --> 01:08:42.139
Nikhil Prakash: So… Okay, so, if you think of…

584
01:08:42.670 --> 01:08:48.150
Nikhil Prakash: these two results combinedly. What that means is there is causal effect

585
01:08:48.380 --> 01:08:53.709
Nikhil Prakash: associated with the background tokens. If you don't touch the background tokens, the model

586
01:08:54.640 --> 01:09:00.569
Nikhil Prakash: Model does not really have enough causal effect to be able to change the final output.

587
01:09:01.120 --> 01:09:08.670
Nikhil Prakash: So, essentially, what that means is that entire strip has ordering information, which is causal in nature. It is not only

588
01:09:08.790 --> 01:09:13.530
Nikhil Prakash: just there. Which is causally relevant to its accuracy, right? Exactly, yeah.

589
01:09:13.950 --> 01:09:15.479
Nikhil Prakash: Question? Yes.

590
01:09:15.720 --> 01:09:18.499
Nikhil Prakash: I, I, I wanted to ask what you fetch.

591
01:09:18.819 --> 01:09:20.349
Nikhil Prakash: What do I patch?

592
01:09:21.340 --> 01:09:25.889
Nikhil Prakash: And as I understand later, is that you passed the goal strip.

593
01:09:26.200 --> 01:09:36.079
Nikhil Prakash: We do two kinds. We either patch the square, or we patch the entire strip. And when you patch the square, is it all the form? Yes.

594
01:09:36.279 --> 01:09:42.539
Nikhil Prakash: It's only those four, and when we pass the strip, it's those four, plus all the…

595
01:09:44.029 --> 01:09:46.930
Nikhil Prakash: Background tokens as in that strip.

596
01:09:49.779 --> 01:09:53.399
Nikhil Prakash: I wasn't welcome this.

597
01:09:53.819 --> 01:09:55.599
Nikhil Prakash: This is more convincing.

598
01:09:56.800 --> 01:09:58.000
Aruna Sankaranarayanan: Question?

599
01:10:02.630 --> 01:10:05.950
Nikhil Prakash: Maybe it's just apart from that.

600
01:10:06.870 --> 01:10:10.020
Nikhil Prakash: Your internet connection is unstable.

601
01:10:10.020 --> 01:10:21.020
Aruna Sankaranarayanan: No, I… I had a question about, instead of patching the whole script, whole strip, if you just patch that red square, but onto a different position on that first column.

602
01:10:21.230 --> 01:10:27.999
Aruna Sankaranarayanan: You know, so you're essentially changing the position of the square. Yeah, I don't know, this is an annoying question.

603
01:10:28.000 --> 01:10:33.179
Nikhil Prakash: No, I think that's a good dynamic, that's a good question. We have that experiment in one of our later slides. I'm gonna show that.

604
01:10:33.180 --> 01:10:33.939
Aruna Sankaranarayanan: Okay, okay.

605
01:10:34.620 --> 01:10:39.069
Nikhil Prakash: So… Can we look at the graphs in the meantime, or if people ask some questions?

606
01:10:41.540 --> 01:10:46.239
Nikhil Prakash: When you do the patches, the red, the… One side to the other.

607
01:10:46.490 --> 01:10:51.370
Nikhil Prakash: the positional encoding embedded in those tokens when you're shooting people.

608
01:10:52.140 --> 01:10:53.300
Nikhil Prakash: Like, is it, like…

609
01:10:54.180 --> 01:10:59.850
Nikhil Prakash: It appears to have the same positional coatings of having been on the right side, and it's swapped over, or is it…

610
01:11:01.050 --> 01:11:06.240
Nikhil Prakash: the positional encoding. It's given the same positional encoding as if it was on the left side.

611
01:11:06.520 --> 01:11:08.800
Nikhil Prakash: You got that question?

612
01:11:09.920 --> 01:11:24.089
Nikhil Prakash: Maybe what do you mean by position? Because I think the previous discussion about the probing was, like, oh, like, maybe, like, the ordering is just comes from, like, the positional coding that you give the Hilkins before it enters all the stuff, and…

613
01:11:24.860 --> 01:11:27.290
Nikhil Prakash: I guess what I'm worried about is, like, is…

614
01:11:28.610 --> 01:11:34.020
Nikhil Prakash: Is, like, is the tokens, do they contain that relative position on coding?

615
01:11:34.970 --> 01:11:37.159
Nikhil Prakash: When you're swamped, when you do the matching.

616
01:11:39.080 --> 01:11:46.590
Nikhil Prakash: So it's almost an architectural question you're asking, because for some transformer implementations, The position I'm quoting is…

617
01:11:46.950 --> 01:11:53.099
Nikhil Prakash: directly added into… Yeah. …the… you know, the vectors, and in other transformer

618
01:11:53.240 --> 01:11:56.970
Nikhil Prakash: representations. The positional encoding has never materialized.

619
01:11:57.190 --> 01:12:00.829
Nikhil Prakash: It's just implicitly added as part of attention.

620
01:12:01.200 --> 01:12:05.330
Nikhil Prakash: Later on. And so, I think you're asking, oh, when you patch over.

621
01:12:05.580 --> 01:12:07.799
Nikhil Prakash: I… are you patching over from…

622
01:12:08.420 --> 01:12:17.790
Nikhil Prakash: Are you patching vectors where the positional encoding has already been added, or are you patching over vectors where the positional encoding has not been added yet?

623
01:12:17.890 --> 01:12:19.779
Nikhil Prakash: And it'll be added afterwards.

624
01:12:20.310 --> 01:12:27.830
Nikhil Prakash: So that's the question? Yeah, yeah. So the answer for that is, so even in the VIT, the vision encoder, there is positional information.

625
01:12:27.830 --> 01:12:43.129
Nikhil Prakash: And… You mean where? Before you patch or after you patch? At each… before, before… Yeah. Yeah, yeah, yeah. Right, before you patch. Yeah, before I patch. There is rope there in the version encoder as well. Okay. So, technically, it has… it has…

626
01:12:43.230 --> 01:12:52.719
Nikhil Prakash: access to that positional information. Okay. And the other thing is, here I'm showing not the… not only the result of the embedding layer, I'm showing the result across

627
01:12:52.830 --> 01:12:58.290
Nikhil Prakash: bunch of layers in the language model. So even in the language model, there is rope. Okay.

628
01:12:58.550 --> 01:13:01.529
Nikhil Prakash: So, it has access to both those push information.

629
01:13:06.160 --> 01:13:18.660
Nikhil Prakash: clarified the question a little bit. So I feel like, because transformer is permutation invariant, so there's no way, like, the model is able to figure out this test without position coding. Yes. So the question is more, like, how, like, these models

630
01:13:22.040 --> 01:13:30.430
Nikhil Prakash: encoding, whether it's just, like, hard-coded application, or actually learning a general notion of finding, like, object-relative associations with, like, the relative

631
01:13:30.430 --> 01:13:45.410
Nikhil Prakash: Yeah, yeah. It's more like how they're using this. Yeah, I think that's a good question. I think before I was… before we ran some results a few weeks back, my understanding was it's the second thing. The models are creating some form of, like.

632
01:13:45.420 --> 01:13:59.490
Nikhil Prakash: abstract relative positional information to actually do the binding. And that's what we have seen in most of the results in the language model space. But I think I have a few results in some of the later slides, where this

633
01:13:59.770 --> 01:14:11.050
Nikhil Prakash: this ordering ID business in the vision encoder seems a bit more complex, bit more complicated. It's not only those relative abstract position information, but

634
01:14:11.540 --> 01:14:16.239
Nikhil Prakash: The model seems to be encoding some form of absolute positional information as well.

635
01:14:17.720 --> 01:14:23.109
Nikhil Prakash: Even though they're using Rove, 2D Rope. Yeah. Okay, got it. It's also…

636
01:14:23.760 --> 01:14:30.000
Nikhil Prakash: At least my intuition is that it's easier for language models, specifically.

637
01:14:30.360 --> 01:14:32.819
Nikhil Prakash: Yeah, this, like, idea of positioning.

638
01:14:33.010 --> 01:14:36.709
Nikhil Prakash: Right. Specifically because of the, also masks.

639
01:14:37.050 --> 01:14:41.540
Nikhil Prakash: Like, you don't even need any, like, robot, you can just do language modeling.

640
01:14:41.720 --> 01:14:45.519
Nikhil Prakash: Without any positional encoding, and it works because of the events.

641
01:14:45.680 --> 01:14:47.790
Nikhil Prakash: It's another, like, extra signal that we have.

642
01:14:48.180 --> 01:14:54.079
Nikhil Prakash: Yeah, so that's, like, that's, like, one of the evidences that you can use to say that the models can create

643
01:14:54.340 --> 01:14:57.420
Nikhil Prakash: Or at least language models can create more abstract

644
01:14:57.710 --> 01:15:01.460
Nikhil Prakash: Sort of, like, positional information in its internal representation.

645
01:15:02.510 --> 01:15:06.770
Nikhil Prakash: the defense of the vision encoder, you can also say that

646
01:15:07.280 --> 01:15:10.020
Nikhil Prakash: Because it doesn't have this frozen masking.

647
01:15:10.430 --> 01:15:13.229
Nikhil Prakash: It could generate this, script-like.

648
01:15:13.780 --> 01:15:16.150
Nikhil Prakash: Yeah. Otherwise…

649
01:15:20.250 --> 01:15:24.610
Nikhil Prakash: Otherwise, each token can only see The one before it.

650
01:15:26.370 --> 01:15:32.250
Nikhil Prakash: I think Rope still allows it to do it, but I just mean that for language models, it's, like, more of an inductive bias. Yeah.

651
01:15:33.780 --> 01:15:34.880
Nikhil Prakash: Yeah, I agree with that.

652
01:15:36.270 --> 01:15:39.090
Nikhil Prakash: About the causal experience.

653
01:15:39.890 --> 01:15:52.399
Nikhil Prakash: What if the… the information is just spread, not only in this industry, but

654
01:15:52.550 --> 01:15:55.909
Nikhil Prakash: It just spread all over the image.

655
01:15:56.150 --> 01:16:01.350
Nikhil Prakash: And you need to change enough

656
01:16:01.910 --> 01:16:16.410
Nikhil Prakash: content in order to the language model, because the information is all around, not only with the strip. And you make… you make the experiment only with the script, so this is what we see.

657
01:16:16.980 --> 01:16:20.640
Nikhil Prakash: But maybe if you just… Good.

658
01:16:22.860 --> 01:16:26.419
Nikhil Prakash: Yeah, it's both been enough,

659
01:16:26.770 --> 01:16:30.799
Nikhil Prakash: Yeah, so I think I would say that

660
01:16:30.910 --> 01:16:36.830
Nikhil Prakash: I mean, yeah, you can just take a few tokens around the square token, and maybe if you pass that, it might work.

661
01:16:36.930 --> 01:16:44.190
Nikhil Prakash: That's okay, but I think the probing results show that the result is diffused across the strip, so I think it becomes more methodological to catch the entire strip.

662
01:16:45.720 --> 01:16:50.420
Nikhil Prakash: Because there is no… For mobility?

663
01:16:51.410 --> 01:16:59.059
Nikhil Prakash: Various in the… in the data set. Like, all… all the… all the squares are in the same position.

664
01:16:59.840 --> 01:17:03.410
Nikhil Prakash: But I think… If there was…

665
01:17:03.410 --> 01:17:22.700
Nikhil Prakash: That becomes more easier thing, so if I would have trained it across a bunch of different squares in, like, different positions of the Y axis, then it becomes much more expected that, okay, you will see the strip.

666
01:17:23.200 --> 01:17:27.899
Nikhil Prakash: But now that we are only training it on just one position.

667
01:17:28.170 --> 01:17:32.039
Nikhil Prakash: And even then, we are seeing the effect on the entire ship. I think that's more surprising.

668
01:17:34.860 --> 01:17:43.430
Nikhil Prakash: probing the probes, it's much easier for the probe to take into account only the X X.

669
01:17:44.630 --> 01:17:47.370
Nikhil Prakash: Because it's always the same place.

670
01:17:48.530 --> 01:17:56.269
Nikhil Prakash: In the Y, it's always in the same place, so the problem doesn't need the information of the YX.

671
01:17:58.100 --> 01:17:59.010
Nikhil Prakash: Okay.

672
01:17:59.820 --> 01:18:01.980
Nikhil Prakash: I think a nice place I would be.

673
01:18:02.220 --> 01:18:06.510
Nikhil Prakash: to do the PG experiment with…

674
01:18:06.690 --> 01:18:09.750
Nikhil Prakash: A strip, or whatever, like, that has…

675
01:18:09.900 --> 01:18:12.020
Nikhil Prakash: The exact same number of tokens.

676
01:18:12.210 --> 01:18:15.939
Nikhil Prakash: as the… the original three-fits Department?

677
01:18:16.180 --> 01:18:29.429
Nikhil Prakash: But, like, shaped otherwise. Like, it can be, I don't know, instead of, for example, vertical, it can be diagonal, it can be square over the square, but the same number of tokens.

678
01:18:30.410 --> 01:18:34.230
Nikhil Prakash: And what will that tell us? That it's not… just…

679
01:18:34.470 --> 01:18:41.199
Nikhil Prakash: of the portion of image information that he took for getting his name. That's it, but…

680
01:18:41.660 --> 01:18:44.779
Nikhil Prakash: Exactly that, like, structure inside.

681
01:18:47.480 --> 01:18:52.149
Nikhil Prakash: Okay, as a baseline, maybe, you know, you can use it, okay, okay.

682
01:18:53.530 --> 01:19:02.999
Nikhil Prakash: like, a technical question in my stock suggestion. So, technical question is, like, these are 4 tokens in your deck square.

683
01:19:03.140 --> 01:19:06.300
Nikhil Prakash: And you are giving… The top two and the bottom one.

684
01:19:06.930 --> 01:19:10.139
Nikhil Prakash: Right? Like, in the red square of the first.

685
01:19:10.500 --> 01:19:13.920
Nikhil Prakash: So the two tokens from top and two tokens from local.

686
01:19:14.400 --> 01:19:16.889
Nikhil Prakash: So, like, a good neural network should learn.

687
01:19:17.450 --> 01:19:23.669
Nikhil Prakash: That, okay, anything that is, like, stacked on top of, like, these tokens should be in, like, class 1.

688
01:19:24.730 --> 01:19:28.049
Nikhil Prakash: So… like, let's say the Rohit…

689
01:19:28.400 --> 01:19:37.269
Nikhil Prakash: Shenyu Pro, was just to end on this and that, like, only these two cases, but different colors.

690
01:19:38.260 --> 01:19:50.130
Nikhil Prakash: Okay. Same probe. Okay. Like, same positions. Yeah, yeah, yeah, yeah, no changes in position. So, if my hypothesis is correct, then you should see only the…

691
01:19:50.660 --> 01:20:00.829
Nikhil Prakash: like this, the position that you're seeing, only, like, in a… in a L shape.

692
01:20:01.060 --> 01:20:02.000
Nikhil Prakash: in,

693
01:20:02.090 --> 01:20:21.729
Nikhil Prakash: L shape, yes? Yeah, because the first… Yeah, the first, yeah. So that tells it's probably not positional embedding, it's something… sorry, probably not ordering representation, it's more positional embedding. If you don't see it, if you still see this happening, like, where you're seeing, like, all these entire columns coming out to be

694
01:20:23.220 --> 01:20:29.249
Nikhil Prakash: In the test, if you're seeing the same structure, then maybe I'll be more convinced that, like, it's an ordering process.

695
01:20:30.430 --> 01:20:33.070
Nikhil Prakash: So, if it is L…

696
01:20:34.930 --> 01:20:48.090
Nikhil Prakash: Yeah. Then I think just… Then it is just, like, absolute… it is encoding… learning X and Y coordinates. Yeah. It is learning X and Y coordinates. Okay.

697
01:20:48.750 --> 01:20:49.820
Nikhil Prakash: Okay.

698
01:20:51.840 --> 01:21:01.199
Nikhil Prakash: Like, the other probe, when you still see, you have a different arrangement of the image. You still see the street in, like, the same location as the original probe.

699
01:21:01.370 --> 01:21:08.890
Nikhil Prakash: Yeah. So that's the overfit to the XY reporting. Yeah. But you do see some generalization, also for the objects.

700
01:21:09.010 --> 01:21:16.629
Nikhil Prakash: So, that… like, I have another counterpoint for you. Like, it's probably, like, this is a square top.

701
01:21:18.870 --> 01:21:20.489
Nikhil Prakash: Like, a square representation.

702
01:21:20.770 --> 01:21:25.300
Nikhil Prakash: Right, but it does… I don't know. But it does know… It's like a square.

703
01:21:25.470 --> 01:21:27.010
Nikhil Prakash: To extract the order.

704
01:21:27.280 --> 01:21:46.849
Nikhil Prakash: on the square. Yeah. So it knows, okay, oh, this is a background, and I actually need to… in the background, I need to overfit with XY. Yeah. But if I have a square, then suddenly I have more information. Yeah. And that's the information it uses. So there is some information of the square that is relevant to the information. Okay, so…

705
01:21:46.950 --> 01:21:52.840
Nikhil Prakash: that makes sense. And, like… It'd be nice, like, if…

706
01:21:53.030 --> 01:21:55.360
Nikhil Prakash: It's, like, slightly more strong, I think. Yeah.

707
01:21:55.890 --> 01:22:01.730
Nikhil Prakash: Also, second thing is when you do patching from your entire white slab.

708
01:22:02.210 --> 01:22:12.780
Nikhil Prakash: So this is, again, autoregressive in language space, correct? All these tokens. So the bottom token of the red probably already knows it comes from the column of the red token.

709
01:22:13.260 --> 01:22:19.399
Nikhil Prakash: But the top white patch might not know. No way, it still knows, because it comes from digital.

710
01:22:23.520 --> 01:22:26.570
Nikhil Prakash: Good clean. No, that's… yeah, that's it, yeah.

711
01:22:27.040 --> 01:22:32.430
Nikhil Prakash: Okay, anyways, I think it's, shaquille, practical question.

712
01:22:32.570 --> 01:22:35.970
Nikhil Prakash: How many more slides do you have?

713
01:22:38.430 --> 01:22:42.279
Nikhil Prakash: I think that's one of the main reasons.

714
01:22:42.940 --> 01:22:46.910
Nikhil Prakash: No, actually, there's one more main… other thing. One is into the other main result?

715
01:22:47.570 --> 01:22:58.550
Nikhil Prakash: So, should I take 5 more minutes? Sure. Okay, so… okay, so those are the results for showing that the vision encoder already encodes this ordering information.

716
01:22:59.030 --> 01:23:07.609
Nikhil Prakash: of the squares, even before the language model gets into the picture. So then what's the… then the question is, what's this language model doing?

717
01:23:07.790 --> 01:23:14.680
Nikhil Prakash: Is it just using that piece of information, or does it create its own set of… Like, ordering info.

718
01:23:15.990 --> 01:23:17.890
Nikhil Prakash: Information of what is happening.

719
01:23:18.400 --> 01:23:21.999
Nikhil Prakash: So, to answer that question, we did another patching experiment.

720
01:23:22.150 --> 01:23:25.470
Nikhil Prakash: Here, the idea is to basically ablate out or remove.

721
01:23:28.000 --> 01:23:46.169
Nikhil Prakash: all the ordering information from an encoder. So the way we do that is, like, we create a synthetic image, like a representation of a synthetic image, where the representation of each square is… is, like, taken from the…

722
01:23:46.860 --> 01:23:48.589
Nikhil Prakash: Like, a different image.

723
01:23:48.750 --> 01:23:54.689
Nikhil Prakash: with the same square placed in the middle, and that's the only square. So the idea here is.

724
01:23:55.010 --> 01:24:06.140
Nikhil Prakash: If the model is encoding that this is the first square, second square, third square, then if we just take their representation from a single square image, then…

725
01:24:06.900 --> 01:24:11.809
Nikhil Prakash: the representation of each square should just be… just be saying that, okay, I'm the first one, I'm the first one, I'm the first one.

726
01:24:12.950 --> 01:24:20.500
Nikhil Prakash: And we also patched the activations of the backward… sorry, not backward… background tokens to destroy…

727
01:24:20.610 --> 01:24:26.030
Nikhil Prakash: the information in those background tokens. Okay? So if we do that batch… patching.

728
01:24:26.200 --> 01:24:32.489
Nikhil Prakash: Those set of patching. The… the ordering information from the vision encoder should get destroyed.

729
01:24:32.930 --> 01:24:35.549
Nikhil Prakash: And if you do that, the behavioral performance decreases.

730
01:24:35.740 --> 01:24:37.420
Nikhil Prakash: It decreases significantly.

731
01:24:38.210 --> 01:24:50.750
Nikhil Prakash: Which is kind of expected. But the point here is it still is above the chance. Chance is 33%. It's not near 33%. So that means language model does seem to do something.

732
01:24:53.440 --> 01:25:03.610
Nikhil Prakash: So then we check if the models, or the language model does create its own, like, ordering information or not, by just doing the same piece of experiment. But now, the

733
01:25:04.370 --> 01:25:07.330
Nikhil Prakash: The ordering information from the version encoder are destroyed.

734
01:25:08.120 --> 01:25:14.450
Nikhil Prakash: And then, in this experiment, we only need to do the patching on the squares, we don't need to do the patching on the entire strip.

735
01:25:15.210 --> 01:25:17.519
Nikhil Prakash: And it's the same experiment, and this is the result.

736
01:25:18.900 --> 01:25:25.360
Nikhil Prakash: What this shows is, even the language model does seem to create its own ordering information in some of its middle…

737
01:25:25.960 --> 01:25:27.040
Nikhil Prakash: Middle layers.

738
01:25:28.820 --> 01:25:34.069
Nikhil Prakash: Yeah, so that's… So the x-axis in the previous experiment.

739
01:25:34.200 --> 01:25:44.409
Nikhil Prakash: with layers of the vision encoder, and then the x-axis… No, this is both, language… Both the previous experiment and this experiment, but they're both language models. Yes, language model by both.

740
01:25:44.630 --> 01:25:47.819
Nikhil Prakash: There, we saw the effect right from the start.

741
01:25:47.980 --> 01:25:58.900
Nikhil Prakash: Okay. Because the information was provided by the vision encoder itself. So the input of the language model backbone already has the information. I see. But here, that information is not present.

742
01:25:59.230 --> 01:26:04.989
Nikhil Prakash: So, we only see the alignment only in the middle-ish layer of the language model barrier. I understand.

743
01:26:05.150 --> 01:26:19.209
Nikhil Prakash: It is much more similar to the first example with bleeding, with the upper leaves in box A. This is the mechanism that drives that behavior, so it's the same… So this is the language model finding? Yeah. Yeah.

744
01:26:19.210 --> 01:26:33.610
Aruna Sankaranarayanan: But is this a language model dealing with the tokens, the language tokens, or is this a language model dealing with the spatial token information? Like, you're saying that there's some, transformation of that representation which also exists in the LM backbone?

745
01:26:34.740 --> 01:26:43.199
Nikhil Prakash: It's the second one, and I'm not sure about the transformation. Transformation occurs before the LLM backbone, which is done by the projector.

746
01:26:43.780 --> 01:26:55.870
Aruna Sankaranarayanan: So it's just, like, some… it's retaining some information, but from the vision model, but in the LM space as well. The spatial information from the pictures, from the images are retained in the LM space as well.

747
01:26:57.830 --> 01:27:03.519
Nikhil Prakash: So that's the general claim, that the vision encoder is encoding the ordering information.

748
01:27:03.750 --> 01:27:10.330
Nikhil Prakash: But let's say if you get rid of that information coming from the vision encoder, the language model also…

749
01:27:12.560 --> 01:27:15.270
Nikhil Prakash: Generates its own ordering information.

750
01:27:16.400 --> 01:27:26.650
Aruna Sankaranarayanan: Got it, but the information generated by the language model is pertaining to the spatial information, right? This is not from the language tokens. This is, again, like, related to the…

751
01:27:26.910 --> 01:27:28.399
Aruna Sankaranarayanan: Yeah, yeah, facial, okay.

752
01:27:28.400 --> 01:27:29.940
Nikhil Prakash: We are operating on the visual token.

753
01:27:30.500 --> 01:27:31.439
Aruna Sankaranarayanan: Okay, okay.

754
01:27:36.110 --> 01:27:42.130
Nikhil Prakash: So that's… that's the second main result, that even the language model does seem to create its own outing information.

755
01:27:42.480 --> 01:27:46.820
Nikhil Prakash: It's a little bit better left to right than it is up and down. Yes, that is right.

756
01:27:47.200 --> 01:28:06.659
Nikhil Prakash: I had a really quick question. So, is the assumption that the LLM does this because you are basically preventing the vision of the British from doing that, or do you think that they both do it? Because I think you could validate this by doing the patching of the original LLM states

757
01:28:06.840 --> 01:28:09.699
Nikhil Prakash: Well, you're not matching the vision encoder.

758
01:28:12.880 --> 01:28:29.560
Nikhil Prakash: this LLM states in the vision encoder to test whether the LLM is having a compensatory behavior, right, or not. Say again? Yeah, I did not follow the experiment. Yeah, so kind of like, right now you're fetching the vision, right? So you know that in the vision, there's nothing going on in terms of positions, but now you're wondering, is…

759
01:28:29.560 --> 01:28:46.590
Nikhil Prakash: are these results from the LLM emerging because the LLM is compensating from the Latin positional information, or are there also before? Yes. So then you could try to get the original activations from the previous LLM and patch them over the current ones.

760
01:28:46.640 --> 01:28:48.520
Nikhil Prakash: While also patching the vision.

761
01:28:50.000 --> 01:28:51.220
Nikhil Prakash: part, right?

762
01:28:51.570 --> 01:29:06.599
Nikhil Prakash: So if the previous LLM where the vision was normal was not doing this, because it saw kind of like, oh, the vision part is already handling this, so I don't have to do it, now you would see that basically patching both would cause, like, the performance to go to zero, or something like that, right?

763
01:29:07.970 --> 01:29:19.830
Nikhil Prakash: I'm still not 100% sure. If you patch in… yeah, I think we can talk about it. It's not a very, simple thing to do. But I think that's a good question. Here, what we are arguing is, it is…

764
01:29:20.400 --> 01:29:25.040
Nikhil Prakash: the second case, I think what you said, that both of them are happening simultaneously.

765
01:29:25.190 --> 01:29:33.649
Nikhil Prakash: Right. It's not the case, like, it's not just the case that the language model mechanism comes into the picture only when the…

766
01:29:34.180 --> 01:29:38.270
Nikhil Prakash: the information from the vision encoder is updated. I think it is there

767
01:29:38.510 --> 01:29:46.049
Nikhil Prakash: In normal cases as well. But we have not done a very thorough experiment for that. I think the argument that we are using is

768
01:29:46.740 --> 01:29:49.470
Nikhil Prakash: You see the result for the below and above here?

769
01:29:50.360 --> 01:29:52.539
Nikhil Prakash: Slightly growing. Yeah.

770
01:29:52.730 --> 01:29:58.179
Nikhil Prakash: And we say that that growing part is coming from the language model ordering ID. Makes sense.

771
01:29:59.220 --> 01:30:01.809
Nikhil Prakash: But maybe we can talk about that experiment later.

772
01:30:02.650 --> 01:30:03.770
Nikhil Prakash: So yes.

773
01:30:04.540 --> 01:30:07.699
Nikhil Prakash: What? I didn't understand. Why did you have 6… not here, sorry, here.

774
01:30:07.840 --> 01:30:11.449
Nikhil Prakash: The next one? This one? Next one, next one. Next one. No, you had, like…

775
01:30:12.550 --> 01:30:22.909
Nikhil Prakash: I had what? Keep going. It's actually not an intervention accuracy, it's actually a probability. The probability of saying red, or the probability of saying blue. So it adds up to one?

776
01:30:23.460 --> 01:30:35.150
Nikhil Prakash: It may not add up to one. I mean, this is distributed, yeah. We did not do a softmax over just the color tokens. We just take the probability of this particular color.

777
01:30:36.060 --> 01:30:37.439
Nikhil Prakash: It may not add to 1.

778
01:30:40.270 --> 01:30:53.560
Nikhil Prakash: But, so this is just the most simple thing. You run the model, you have the vocabulary distribution, all the logits, pick the logit of red, pick the logit of blue. But there's also another 50,000… Yeah, they would have some logit, right?

779
01:30:54.150 --> 01:31:05.419
Nikhil Prakash: Logit value. Okay. So it might not add to 1. The logit of blue and logit of red might… it should be close to 1, but it may not be exactly 1. Why is that 0 to 1?

780
01:31:06.920 --> 01:31:11.959
Nikhil Prakash: Like, the scale, why is it 0 to 1 if it's just logits? Or is it probability?

781
01:31:13.200 --> 01:31:20.830
Nikhil Prakash: You do the softmax first, then you read off the probabilities corresponding to the literal token 'red' and the literal token 'blue', right? Yeah. Okay. Yeah, that's it.
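A minimal sketch of the metric as described here: softmax over the full vocabulary, then read off the two color-token probabilities. The token ids are tokenizer-dependent and assumed.

```python
# Because the rest of the vocabulary also gets probability mass,
# p(red) + p(blue) is close to, but not exactly, 1.
import torch

def color_probs(answer_logits: torch.Tensor, red_id: int, blue_id: int):
    """answer_logits: [vocab_size] logits at the answer position."""
    probs = torch.softmax(answer_logits, dim=-1)
    return probs[red_id].item(), probs[blue_id].item()
```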

782
01:31:21.560 --> 01:31:28.220
Nikhil Prakash: Yeah, the "intervention accuracy" label was incorrect, but I think the result will still hold. Yeah. Same trend? Yeah.

783
01:31:28.540 --> 01:31:29.210
Nikhil Prakash: Yep.

784
01:31:29.370 --> 01:31:44.630
Nikhil Prakash: Okay, so the main… yeah, so I think that's one of the main arguments, that both the vision encoder and the language model are creating or forming this ordering information, which is being used to do spatial reasoning tasks in the VLM.

785
01:31:45.900 --> 01:31:49.779
Nikhil Prakash: And then we use this insight to improve the performance of

786
01:31:50.270 --> 01:31:57.970
Nikhil Prakash: one of the models on a benchmark called What's Up, which is okay, I think. Yeah, we were able to improve the performance

787
01:31:58.370 --> 01:32:06.429
Nikhil Prakash: significantly better than just using a random baseline, which is just to pick random directions. I mean…

788
01:32:06.600 --> 01:32:09.650
Nikhil Prakash: Yeah, it depends how you define your significance. Like, 15 to 50?

789
01:32:11.450 --> 01:32:13.040
Nikhil Prakash: Is that the jump you did?

790
01:32:13.430 --> 01:32:21.310
Nikhil Prakash: So, read this number, I think. The original model performance was 90%, and then we got it to 95.

791
01:32:22.030 --> 01:32:29.340
Nikhil Prakash: Yeah, that works. But this is unsupervised. We don't need any supervision. You can do it on any model, in any image. All you need to do is just…

792
01:32:29.810 --> 01:32:32.630
Nikhil Prakash: Train these probes, which is sort of…

793
01:32:32.740 --> 01:32:38.850
Nikhil Prakash: should figure out whether it's the first object, second object, or third object in the image, and that's it. And you can use those

794
01:32:39.210 --> 01:32:41.180
Nikhil Prakash: Probes as a steering vector.
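A rough sketch of the probe-as-steering-vector idea as described; the probe form, the steering strength alpha, and where the vector is added are not specified in the talk and are assumptions here.

```python
# Train a linear probe on visual-token activations to predict which object
# (first / second / third) a token belongs to, then reuse its class directions
# as steering vectors added to those activations at inference time.
import torch
import torch.nn as nn

def train_order_probe(acts: torch.Tensor, labels: torch.Tensor,
                      n_classes: int = 3, steps: int = 500, lr: float = 1e-2) -> nn.Linear:
    """acts: [n_tokens, d_model] activations; labels: [n_tokens] object index (long)."""
    probe = nn.Linear(acts.shape[1], n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(probe(acts), labels)
        loss.backward()
        opt.step()
    return probe

def steer_toward(acts: torch.Tensor, probe: nn.Linear, target: int, alpha: float = 4.0):
    """Add the unit-normalized probe direction for `target` to the activations."""
    direction = probe.weight[target].detach()
    return acts + alpha * direction / direction.norm()
```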

795
01:32:41.400 --> 01:32:42.270
Nikhil Prakash: Oh.

796
01:32:43.020 --> 01:32:52.960
Aruna Sankaranarayanan: What's the difference… what's the difference between the accuracy and the percent corrected failure? Because the percent-corrected failure seems, like, really huge, right? Improvements in both cases. I mean, much higher.

797
01:32:53.200 --> 01:33:00.460
Nikhil Prakash: Yeah, so this is 90%, so 10% are incorrect. Among those 10%, we are able to fix 50%.

798
01:33:01.430 --> 01:33:08.529
Nikhil Prakash: How many of the incorrect answers we were able to… Flip, yeah, to correct.
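A one-line version of the percent-corrected-failures metric as just explained; with the numbers in the talk, 90% to 95% accuracy corresponds to fixing 50% of the failures.

```python
# Of the examples the model originally got wrong, what fraction does the
# intervention flip to correct?
def percent_corrected_failures(acc_before: float, acc_after: float) -> float:
    return 100 * (acc_after - acc_before) / (100 - acc_before)

print(percent_corrected_failures(90.0, 95.0))  # 50.0: half of the 10% failures fixed
```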

799
01:33:09.060 --> 01:33:09.700
Aruna Sankaranarayanan: Okay.

800
01:33:10.060 --> 01:33:11.560
Nikhil Prakash: repeat that with a looker.

801
01:33:13.480 --> 01:33:20.149
Nikhil Prakash: So the… these results are on the non-synthetic images, right? Yeah, What's Up. And the probe was also…

802
01:33:21.600 --> 01:33:24.300
Nikhil Prakash: Yeah, yeah.

803
01:33:24.990 --> 01:33:29.120
Nikhil Prakash: Do you know how much Qwen 2.5 would do on that data set?

804
01:33:30.550 --> 01:33:32.829
Nikhil Prakash: Like, you did 8 billion, right? Or 7 billion.

805
01:33:33.440 --> 01:33:34.260
Nikhil Prakash: Perfect.

806
01:33:35.310 --> 01:33:36.160
Nikhil Prakash: Okay.

807
01:33:37.000 --> 01:33:51.969
Nikhil Prakash: Okay, so one more experiment, which we have not put into the paper, but I've been doing in the past few weeks, is something that has come into the discussion as well, which is… okay, so in the language model space, previous work has shown that

808
01:33:52.890 --> 01:33:59.580
Nikhil Prakash: This… this ordering information is encoding the relative positional information. Something like this is the first…

809
01:33:59.760 --> 01:34:17.099
Nikhil Prakash: object, this is the first box. So we asked the same question for the ordering information generated by the vision encoder, now that we know that the vision encoder also generates this kind of information. So for that, we also did a causal experiment with a slightly different task.

810
01:34:18.170 --> 01:34:20.180
Nikhil Prakash: So, this is the task.

811
01:34:20.310 --> 01:34:27.289
Nikhil Prakash: two differences that you should notice. First, in the image, the squares are…

812
01:34:30.550 --> 01:34:32.969
Nikhil Prakash: shifted a little bit. They're not equally spread out.

813
01:34:33.300 --> 01:34:43.550
Nikhil Prakash: The squares are actually shifted towards the left-hand side in this particular case, and the question that we ask is, the leftmost… the color of the leftmost square is… and that's…

814
01:34:44.700 --> 01:34:46.890
Nikhil Prakash: Which would give us… green.

815
01:34:47.330 --> 01:34:56.999
Nikhil Prakash: And we do patching experiments again, and this is the counterfactual sample, which you can think of it as, like, just a mirror image of your clean image.

816
01:34:57.670 --> 01:35:00.449
Nikhil Prakash: One important point is…

817
01:35:00.670 --> 01:35:06.479
Nikhil Prakash: the exact XY coordinate of this blue square is the same between these two samples.

818
01:35:07.040 --> 01:35:07.880
Nikhil Prakash: Okay?

819
01:35:08.860 --> 01:35:20.539
Nikhil Prakash: Okay, and then we basically take the… we patch the strip from this particular area, region, to this particular region, and this particular strip to this particular strip.

820
01:35:22.290 --> 01:35:24.940
Nikhil Prakash: Okay. Now, there are two hypotheses.

821
01:35:25.040 --> 01:35:31.800
Nikhil Prakash: Whether the model is forming relative, or whether the model is forming absolute positional information.

822
01:35:32.670 --> 01:35:36.640
Nikhil Prakash: If it is forming relative position information, this is what would happen.

823
01:35:38.150 --> 01:35:45.739
Nikhil Prakash: Here, it will say that this is the first square. So after patching, this blue square will become the first square.

824
01:35:45.900 --> 01:35:49.130
Nikhil Prakash: And this green square will become the third square.

825
01:35:49.490 --> 01:35:56.539
Nikhil Prakash: Okay? So, the answer to this question will change from green to blue, if it is relative.

826
01:35:57.570 --> 01:36:06.270
Nikhil Prakash: Now, the rationale for absolute is something like this. Let's say it is not encoding first, second, or third; it is encoding,

827
01:36:09.210 --> 01:36:13.249
Nikhil Prakash: (3, 1), (3, 2), (3, 3).

828
01:36:13.480 --> 01:36:21.479
Nikhil Prakash: And this would be (3, 3), (3, 4), (3, 5), okay? So after we do the patching, this green will become (3, 5).

829
01:36:21.710 --> 01:36:27.820
Nikhil Prakash: the brown will remain (3, 2), and the blue will remain (3, 3), so we have…

830
01:36:29.260 --> 01:36:32.189
Nikhil Prakash: (3, 5), (3, 2), (3, 3).

831
01:36:33.030 --> 01:36:35.730
Nikhil Prakash: After the patching, if we are assuming absolute position.

832
01:36:35.950 --> 01:36:39.779
Nikhil Prakash: So then, the leftmost square should be (3, 2), the brown one.

833
01:36:40.120 --> 01:36:42.810
Nikhil Prakash: That's why the expected color,

834
01:36:43.470 --> 01:36:48.029
Nikhil Prakash: after the intervention, if the positional information is encoding the absolute one, is the brown one.

835
01:36:49.230 --> 01:36:52.079
Nikhil Prakash: That's a little bit diff… does that make sense?

836
01:36:53.150 --> 01:36:53.980
Nikhil Prakash: Okay.
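A worked version of the two hypotheses just described, using the illustrative coordinates from the slide: relative binding predicts the answer flips to blue, absolute coordinates predict brown.

```python
# Coordinates written as (row, column), illustrative only.
clean = {"green": (3, 1), "brown": (3, 2), "blue": (3, 3)}      # squares shifted left
leftmost_clean = min(clean, key=lambda c: clean[c][1])           # "green" (the clean answer)

# Relative hypothesis: tokens carry "first / second / third", so after patching the
# strip, the blue square inherits "first" and the answer flips from green to blue.
relative_prediction = "blue"

# Absolute hypothesis: tokens carry raw coordinates. After patching, green sits at
# (3, 5) while brown stays at (3, 2) and blue at (3, 3), so the leftmost is brown.
patched = {"green": (3, 5), "brown": (3, 2), "blue": (3, 3)}
absolute_prediction = min(patched, key=lambda c: patched[c][1])  # "brown"
```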

837
01:36:54.840 --> 01:37:01.559
Nikhil Prakash: So, this is the result, if you do the patching of… Just the vision embeddings.

838
01:37:01.910 --> 01:37:03.570
Nikhil Prakash: Vision token embeddings.

839
01:37:05.770 --> 01:37:08.990
Nikhil Prakash: We see that, and this is probability.

840
01:37:09.850 --> 01:37:12.020
Nikhil Prakash: It is sort of like… yeah.

841
01:37:12.620 --> 01:37:16.600
Nikhil Prakash: like, distributed… Equally between brown and blue.

842
01:37:18.810 --> 01:37:22.240
Nikhil Prakash: And that's why I say, and this… this is only for this…

843
01:37:23.790 --> 01:37:34.309
Nikhil Prakash: clean images shifted towards the left, but similar results also hold when the squares in the clean images are actually shifted towards the right, or top, or bottom.

844
01:37:37.080 --> 01:37:42.029
Nikhil Prakash: And that's why the… what I think, in the vision encoder, it's not that clear.

845
01:37:42.450 --> 01:37:52.230
Nikhil Prakash: as clear as what we have seen in the language model space, where we have seen that the model clearly uses this relative information. But in the vision space, I think it seems to be encoding

846
01:37:54.420 --> 01:37:58.800
Nikhil Prakash: Both this kind of, like, absolute and relative position.

847
01:37:58.940 --> 01:38:05.389
Nikhil Prakash: Did you say absolute? Is it the color, or something else? When I say absolute, I mean coordinates, like X comma Y.

848
01:38:06.810 --> 01:38:13.149
Nikhil Prakash: And when I say relative, I mean a bit more abstract. It's like first object, sorry, first square, or the second square, or the third square.

849
01:38:13.590 --> 01:38:14.850
Nikhil Prakash: Can you repeat what you mean?

850
01:38:16.150 --> 01:38:17.140
Nikhil Prakash: The absolute?

851
01:38:17.270 --> 01:38:19.619
Nikhil Prakash: just the coordinates, X comma Y.

852
01:38:21.300 --> 01:38:23.500
Nikhil Prakash: But you're also patching the white space.

853
01:38:23.620 --> 01:38:25.570
Nikhil Prakash: From much earlier, right?

854
01:38:26.250 --> 01:38:35.419
Nikhil Prakash: We are patching the white space on the right side. We're also patching the first column tokens. They might have a different relative position.

855
01:38:37.560 --> 01:38:39.159
Nikhil Prakash: They might have a…

856
01:38:39.520 --> 01:38:45.939
Nikhil Prakash: So I think we went with the assumption that the strip has… That strip will always encode everything blue. Yeah.

857
01:38:47.470 --> 01:38:53.680
Nikhil Prakash: It made sense when you're doing probes, because they're all equidistant.

858
01:38:54.350 --> 01:39:02.830
Nikhil Prakash: Maybe you should also do probes here and see, probes for other positions. We saw that it's always distributing them according to the location.

859
01:39:02.930 --> 01:39:04.750
Nikhil Prakash: So even if you shift everything.

860
01:39:04.930 --> 01:39:06.400
Nikhil Prakash: When everything shifts.

861
01:39:06.930 --> 01:39:09.250
Nikhil Prakash: Or if you have, like, now,

862
01:39:09.310 --> 01:39:27.899
Nikhil Prakash: three in a row, you would have, like, a rectangle, a rectangle, a rectangle, right? I think that… so I don't know if it'll work, but I think there are, like, naive things that you could do to scramble it even further, to make it a totally clean test. You could say things like…

863
01:39:28.490 --> 01:39:29.639
Nikhil Prakash: We're gonna just…

864
01:39:30.330 --> 01:39:42.180
Nikhil Prakash: do a Fisher-Yates shuffle of all the patches, right? So that, you know, they're white anyway. You say, oh, I'll patch white, I know that white has a lot of position information and stuff in it. What about…

865
01:39:42.320 --> 01:39:44.260
Nikhil Prakash: We'll just shuffle all the patches.

866
01:39:44.570 --> 01:39:49.070
Nikhil Prakash: They don't have any information anyway. If they have any, we're just gonna scramble it. It's gone.

867
01:39:49.460 --> 01:39:52.870
Nikhil Prakash: Right? And then, and then just put in exactly the…

868
01:39:53.220 --> 01:39:56.340
Nikhil Prakash: This featureless red thing, this featureless blue thing.

869
01:39:56.570 --> 01:40:01.900
Nikhil Prakash: On a scrambled, you know, field of… shuffled white.

870
01:40:02.410 --> 01:40:13.169
Nikhil Prakash: And then… and then if your model can still tell what's what, then… then I'm, like, then that, to me, that's more convincing. As a language model… And I think if you do that, maybe your brown accuracy would go up.
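A minimal sketch of the shuffling control being suggested here (assumed setup, not the paper's code): permute the background patch embeddings uniformly at random, which has the same effect as a Fisher-Yates shuffle, while leaving the object patches in place. The background mask construction is task-specific and assumed.

```python
# If the model still answers left/right questions after this, the positional
# signal is not coming from the white background patches.
import torch

def shuffle_background_tokens(vision_tokens: torch.Tensor,
                              is_background: torch.Tensor) -> torch.Tensor:
    """vision_tokens: [n_patches, d]; is_background: [n_patches] boolean mask."""
    out = vision_tokens.clone()
    bg_idx = torch.nonzero(is_background, as_tuple=False).squeeze(-1)
    out[bg_idx] = vision_tokens[bg_idx[torch.randperm(bg_idx.numel())]]
    return out
```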

871
01:40:13.380 --> 01:40:21.929
Nikhil Prakash: Wait, so you're saying that we… what we see here is a lot of the effect of, actually, the white background, including… It could be, it could be, things like that.

872
01:40:25.290 --> 01:40:30.179
Nikhil Prakash: Yeah, it could be. And so you could… I believe… I believe you were going to ask me something.

873
01:40:30.290 --> 01:40:33.530
Nikhil Prakash: Wait… So you think… I'm okay.

874
01:40:34.260 --> 01:40:36.799
Nikhil Prakash: That will be more absolute value.

875
01:40:37.360 --> 01:40:43.399
Nikhil Prakash: then I think you wouldn't get, like, this confusion between brown and blue. I think one of them would stand out. I think it was a…

876
01:40:43.690 --> 01:40:49.780
Nikhil Prakash: I think, actually, the hypothesis is the… More blue, more morality, because…

877
01:40:50.400 --> 01:40:58.740
Nikhil Prakash: with the generalization of the problem, we saw that the white paper is basically an older picture, so I believe it might be over 50 bags, I thought.

878
01:41:01.760 --> 01:41:03.450
Nikhil Prakash: That's very interesting.

879
01:41:04.270 --> 01:41:09.059
Nikhil Prakash: This is very interesting work, and so it's, you know, I think that you're getting a little preview.

880
01:41:09.400 --> 01:41:23.660
Nikhil Prakash: of your review process. Is it going through review right now? Yes, we got really nice reviews for the workshop. I mean, that was the workshop, yeah. But it's good, I think it's… I think it's really interesting work. I think that a lot of your…

881
01:41:23.800 --> 01:41:26.350
Nikhil Prakash: Sort of what you, like, sort of your appendix?

882
01:41:26.620 --> 01:41:32.289
Nikhil Prakash: experiments are very strong. I think that, you know, your stuff is very defensible.

883
01:41:32.460 --> 01:41:43.620
Nikhil Prakash: But you know, you'll have to gird yourself for review. Depending on your reviewers, you might have… you might have to present a lot of extra data to them. And also, we need to give credit to Kelly.

884
01:41:43.770 --> 01:41:48.320
Nikhil Prakash: doing the master's student. Yeah, yeah, yeah, yeah.

885
01:41:49.990 --> 01:41:55.330
Nikhil Prakash: Yeah. By the way, you make a claim, but then you nail the point later.

886
01:41:55.950 --> 01:42:02.739
Nikhil Prakash: Okay. You should… you should keep that, like, is it this? And then you nail it with… Yeah, nail it, let… yeah, yeah, give me that feedback.

887
01:42:02.870 --> 01:42:09.340
Nikhil Prakash: Yeah. You tend to, like… you sort of show things up front, instead of impressing people at…

888
01:42:10.770 --> 01:42:11.630
Nikhil Prakash: the end.

889
01:42:11.990 --> 01:42:28.139
Nikhil Prakash: You can… you can impress at the end. Like, for instance, you showed a probe and you said it's diffused across all tokens. Like, no mechanistic interpretability researcher would be impressed by a probe, but you know the answer, like, you did causal experiments. You should show the causal experiment and say, wow, this is so impressive. Then I'll be like, yeah, it is so impressive.

890
01:42:28.710 --> 01:42:30.640
Nikhil Prakash: So you should say, yeah, when you show the probe.

891
01:42:30.830 --> 01:42:31.780
Nikhil Prakash: Say.

892
01:42:31.890 --> 01:42:35.229
Nikhil Prakash: Well, I don't believe this. Yeah, you should do that. You should justify your own thing.

893
01:42:35.470 --> 01:42:49.930
Nikhil Prakash: This, yeah, okay, but… I think I wouldn't even, like, if you need to present, I wouldn't even show the probing results. Maybe just later for the steering. But then how do you come up with the hypothesis? How do you come up with the hypothesis of the…

894
01:42:50.080 --> 01:42:55.430
Nikhil Prakash: Actually, for me, for me, the thing I learned from this project is

895
01:42:55.920 --> 01:43:06.599
Nikhil Prakash: Probes are actually not that bad. They're not supposed to be evidence, but they can help you… hypothesize.

896
01:43:06.600 --> 01:43:12.279
Nikhil Prakash: I agree, but when he suggested doing probes, I was like, no, we're not doing probes.

897
01:43:12.280 --> 01:43:26.629
Nikhil Prakash: And then he kept saying that for a couple of weeks, and then I was like, okay, do probes, like, whatever, okay. And then we were able to actually construct really nice causal experiments because of… Because of the probe, I see. But why did you even probe the tokens that are not at that position?

898
01:43:26.630 --> 01:43:30.056
Nikhil Prakash: Because when… No, no.

899
01:43:31.020 --> 01:43:35.129
Nikhil Prakash: No, it wasn't… it wasn't actually, Mark, because things were not working.

900
01:43:35.130 --> 01:44:00.119
Nikhil Prakash: When we… when we did not include those additional tokens, which is what generally you would do. You would only work with the square tokens. You could go through the… you can tell the same story, but with causality. I'm not saying change your story, just that I think it would be, for me, simpler. So I think that depends on the format as well. If I was giving, like, a formal talk, then I would have definitely said that. But then, as soon as I started presenting, people started asking questions.

901
01:44:00.120 --> 01:44:06.959
Nikhil Prakash: You claimed it. You claimed it, and you were like, oh, I don't know, this could be that, this could be this, and then you showed causal, and I'm, like, convinced.

902
01:44:07.450 --> 01:44:19.359
Nikhil Prakash: The, like, when you were starting out, when you presented your problem statement, you had a couple results showing, like, a preliminary, like, is this… is there even something there

903
01:44:20.630 --> 01:44:44.430
Nikhil Prakash: for me to search for, which is kind of what it gives you. It gives you, like, okay, there's actually something there that I can actually look at. Like, I'm not totally opposed to, like, showing those results of, like, okay, there's actually something going on, and then nailing the problem. You can also say, like, when you show the probes, you can say, don't worry, I have causal experiments later on, this is just, like… Okay, okay. Or even if it's just to make the point, probes can help us think about what kind of causal experiments we want to make.

904
01:44:44.430 --> 01:44:46.820
Nikhil Prakash: This is a point you want to promote, then.

905
01:44:47.530 --> 01:44:58.570
Nikhil Prakash: Yeah. I think we could. I think that that is one of the things that I have learned in the project, but I don't think we are doing that right now. But yeah, I think we could do that. By the way, I like your first slides, right?

906
01:44:58.570 --> 01:45:12.630
Nikhil Prakash: Me too. You're fantastic. I wanted to advise to me. Thank you very much. Okay, cool. Thank you.

907
01:45:17.400 --> 01:45:19.170
Nikhil Prakash: It's a very sweet way to me.

