WEBVTT

1
00:00:00.000 --> 00:00:01.320
David Bau: It's a great idea.

2
00:00:02.130 --> 00:00:18.110
David Bau: Yeah, I saw that. Oh, this is your… is that this is your CVE work? I don't know. This is your question.

3
00:00:18.110 --> 00:00:42.200
David Bau: Father's Day, just a good amount of bread.

4
00:00:42.200 --> 00:00:43.500
David Bau: Look at that. Yeah.

5
00:00:43.820 --> 00:00:54.460
David Bau: But this doesn't mean that they are, you know, kind of…

6
00:00:54.590 --> 00:01:08.969
David Bau: How come the video's not showing? There's supposed to be a video camera in this room. Here. Nothing. Nothing, nothing.

7
00:01:09.510 --> 00:01:13.379
David Bau: Yeah, that's not about that. That better hope you should be.

8
00:01:13.870 --> 00:01:16.150
David Bau: It is. Right. All right.

9
00:01:16.320 --> 00:01:27.809
David Bau: So, Team S. Team S. Right here, but my teammate hasn't arrived yet. Okay, should we put it in a different order? Yeah, sure. Okay, I'm gonna switch the first two.

10
00:01:28.120 --> 00:01:31.170
David Bau: Which means Team K goes first.

11
00:01:31.240 --> 00:01:43.909
David Bau: Is TJ okay? Can I have, like, 30 seconds? Yes, of course. It's gonna take me 30 seconds. Oh, actually, can we go third? What? Should we go first?

12
00:01:43.930 --> 00:01:51.160
David Bau: Is M here? I'm just out of breath, I think I was on the stairs. So don't worry.

13
00:01:51.160 --> 00:02:05.010
David Bau: Jasmine, I'm gonna put you first, but you can pick up any equipment, okay? Because, because… I can think we're finished, okay? Okay.

14
00:02:05.010 --> 00:02:11.959
David Bau: Just take a minute, and what you can do while you're taking a minute is, like, zoom into the thing?

15
00:02:12.260 --> 00:02:22.439
David Bau: And actually, all… everybody on all the teams, you can attach to the Zoom, turn your audio… attach without audio, but then so that you're ready to…

16
00:02:22.580 --> 00:02:27.390
David Bau: switch over so that we don't lose all your time. Thank you, Nick, no, but…

17
00:02:27.620 --> 00:02:33.430
David Bau: Does that make sense? We've done this a couple times, lately, and can we just see if she's…

18
00:02:34.240 --> 00:02:38.950
David Bau: paying bills. Oh, look, this timer's already running! It's just sitting…

19
00:02:39.190 --> 00:02:48.639
David Bau: Fine, how long has it been going? For 35 hours. Who knows? I thought you had a contact. No, no.

20
00:02:50.030 --> 00:02:51.899
David Bau: Everyone wants to have one that I don't.

21
00:02:55.960 --> 00:03:01.299
David Bau: I'm joining… Don't even show.

22
00:03:01.470 --> 00:03:03.090
David Bau: As the Jag, which is fine.

23
00:03:04.080 --> 00:03:10.719
David Bau: And, and then, after you, after you go, you can help yourself, too. Oh, absolutely.

24
00:03:10.940 --> 00:03:19.119
David Bau: A prize? Oh, officer. Some of us had a prize. Oh, you got it beforehand.

25
00:03:20.090 --> 00:03:29.999
David Bau: It looked pretty good. So we're gonna… okay, well, we'll start with Jay. Jay ready?

26
00:03:30.200 --> 00:03:43.080
David Bau: They're so good! Weekend good. What? We're 8? They're so good. But they're number 4 right now. Oh, goodness. I'm not…

27
00:03:43.210 --> 00:03:47.820
David Bau: We can go. We can go. You're welcome here.

28
00:03:48.220 --> 00:03:51.480
David Bau: Wanna go first? If no one else is able to go, we can go.

29
00:03:51.730 --> 00:04:04.990
David Bau: We love you, Claire, so much. All right, all right. Yeah, ready to present. Look, I'm switching, you guys. Sorry. Will it or not, because we have to go. Sorry, can we get out of here?

30
00:04:05.500 --> 00:04:09.269
David Bau: Okay, so 7th is E, J…

31
00:04:10.010 --> 00:04:18.810
David Bau: Thank you. You moved to number 4. Is that okay? Yeah, that's… thank you guys. Okay, guys. Now…

32
00:04:19.329 --> 00:04:21.180
David Bau: Okay, oh.

33
00:04:21.910 --> 00:04:23.100
David Bau: Do we need that?

34
00:04:23.760 --> 00:04:25.570
David Bau: Welcome to the last day.

35
00:04:26.880 --> 00:04:32.660
David Bau: So what I'm gonna do is I'm gonna keep everybody to a very firm time limit.

36
00:04:32.890 --> 00:04:41.440
David Bau: Well, so you're… you should present for 8 minutes, but I'll put… put up a 10-minute timer here. You know when it hits 2, you're… you're out of time.

37
00:04:41.630 --> 00:04:47.549
David Bau: But, but, but, you know, but, you know, you can go over a bit. But then, at 10, I'm just gonna…

38
00:04:48.060 --> 00:04:55.270
David Bau: We'll stand up and clap you off the stage. Okay. That's how ML algorithms work, too. That makes sense.

39
00:04:55.630 --> 00:04:59.190
David Bau: And so, Oh, I have to turn this on.

40
00:05:00.000 --> 00:05:03.930
David Bau: So this is, so the idea is…

41
00:05:04.220 --> 00:05:13.579
David Bau: Make a crisp presentation; you don't have to talk about everything that's going to be in your paper. The paper is due at the due date I put on the website, like, in a couple of days.

42
00:05:13.700 --> 00:05:23.529
David Bau: Because my grades for your class are due a few hours after that, so if you have the ability to hand in a paper early, that makes it easier. I'll actually

43
00:05:23.720 --> 00:05:27.809
David Bau: you know, have a little bit more time to read them. Some of you might already be close to done.

44
00:05:27.970 --> 00:05:31.420
David Bau: So, so hand it in when you can, but there's a deadline

45
00:05:31.530 --> 00:05:32.570
David Bau: on the website.

46
00:05:34.010 --> 00:05:37.790
David Bau: And, and so welcome, everybody.

47
00:05:38.200 --> 00:05:43.710
David Bau: Let me see… is it… are we recording on the Zoom? I'd love… I'd love to record it. Yes. You can.

48
00:05:43.830 --> 00:05:46.619
David Bau: Okay, great. So welcome.

49
00:05:47.420 --> 00:06:05.139
David Bau: Indeed. Yeah, yeah, I got it. You gonna hide that? Yeah. Or screen it off to the side? Yeah. As best we can. Oh, sorry, it started. So, we're Team DE. We're working on localizing and steering economic uncertainty in large language models.

50
00:06:05.700 --> 00:06:18.940
David Bau: So, economic uncertainty affects all investment and market behavior. And now, people are using LLMs to analyze earnings calls, and there is also financial GPT and so many other different financial LLMs.

51
00:06:19.290 --> 00:06:20.499
David Bau: And there's also

52
00:06:20.660 --> 00:06:38.499
David Bau: work trying to measure uncertainty using LLMs. So, what we're going to ask is how uncertainty is represented inside the model. Do LLMs actually have an internal representation of uncertainty, or are the models just counting the word risk, and thinking everything that says risk is actually uncertain? That's our question here.

53
00:06:40.070 --> 00:06:49.199
David Bau: Yes. So, as you can see here, we… the pipeline basically contains two stages. Firstly, see the left part? We got the…

54
00:06:49.890 --> 00:07:06.309
David Bau: two different statements: the first one is high intensity, and the second one is lower intensity. We do activation patching here, and after that, we calculate the uncertainty direction by using the high intensity minus the low intensity.

55
00:07:06.440 --> 00:07:14.589
David Bau: After that, we apply this in the intervention stage, so that we can test it on the real and synthetic earnings calls.

56
00:07:14.910 --> 00:07:16.520
David Bau: Clear.

57
00:07:19.680 --> 00:07:30.010
David Bau: And how do we construct our datasets? We got two different parts. The first one is the real earnings calls. This one is, you know, from real-world examples; it contains 200 examples.

58
00:07:30.180 --> 00:07:42.229
David Bau: After that, we also use Claude to generate, based on these real-world examples, similar but a little bit easier economic statements. So.

59
00:07:42.240 --> 00:07:58.770
David Bau: So, 400 samples… pairs of samples in total. Each pair's two statements differ only in the uncertainty level, which is high and low, while the underlying economic topics are exactly the same. I think it's important to clarify that this synthetic dataset

60
00:07:58.770 --> 00:08:04.710
David Bau: was created using Claude, using real earnings-call data, which is important. Yeah.

61
00:08:05.830 --> 00:08:07.510
David Bau: So,

62
00:08:07.760 --> 00:08:15.430
David Bau: Prior work on measuring uncertainty in earnings calls used bag-of-words models, specifically term frequency-inverse document frequency.

63
00:08:15.680 --> 00:08:24.940
David Bau: where they take the frequency of risk-related words over the total number of documents they appear in, and use that to assign a score for uncertainty.

64
00:08:25.350 --> 00:08:35.690
David Bau: So, we used that same bag of words model on our datasets to see how well they would perform at separating the high and low uncertainty statements. And we found that they actually performed quite poorly.
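
NOTE
A minimal sketch of this TF-IDF bag-of-words baseline, assuming scikit-learn; the two text lists below are hypothetical stand-ins for the team's paired statements, not their data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Hypothetical stand-ins for the high/low uncertainty statement pairs.
high_texts = ["Results remain highly uncertain and unpredictable this quarter."] * 40
low_texts = ["We are confident that demand will stay stable this quarter."] * 40
texts = high_texts + low_texts
labels = [1] * len(high_texts) + [0] * len(low_texts)
X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.25, random_state=0, stratify=labels)
vec = TfidfVectorizer()  # term frequency-inverse document frequency features
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_tr), y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(vec.transform(X_te))))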

65
00:08:37.650 --> 00:08:45.960
David Bau: Another question we had was, what are the words that the language model is paying attention to, compared to the bag of words? So we ran

66
00:08:46.380 --> 00:08:53.920
David Bau: an input attribution on the last period token, and looked at the words that they shared.

67
00:08:54.720 --> 00:09:02.900
David Bau: Among the words that Llama 3.3 70B and the bag of words shared were "pending" and "unpredictable", which are two very…

68
00:09:03.060 --> 00:09:04.700
David Bau: key words related to risk.

69
00:09:05.000 --> 00:09:12.530
David Bau: But we looked at the top 100 words that the LLMs were attending to, and over 98% of the other words were not shared.

70
00:09:12.950 --> 00:09:18.669
David Bau: LLMs also have the ability to focus on bigrams and trigrams, which the bag-of-words model we used did not.

71
00:09:20.470 --> 00:09:24.720
David Bau: Next, we talk in detail about how we set up the pipeline

72
00:09:24.830 --> 00:09:43.740
David Bau: we introduced earlier. So, first, we talk about how we localize the uncertainty direction. This is our activation patching setup. We basically set the high uncertainty statement as the source and the low uncertainty statement as the target. We patch from the high uncertainty statement to the low uncertainty statement. So, basically,

73
00:09:44.450 --> 00:09:52.620
David Bau: They only differ in this test statement because they're in a two-shot setting. We put high uncertainty statements here and low there, and we patch from here to there.
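
NOTE
A minimal sketch of this patching step, assuming a HuggingFace Llama-style decoder whose blocks live under model.model.layers; the model name, layer index, and prompts are illustrative assumptions, not the team's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "meta-llama/Meta-Llama-3-8B"  # assumption: any decoder-only LM with .model.layers works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
LAYER, POS = 12, -1  # residual stream at the statement's final (period) token
def residual_at(prompt):
    cache = {}
    def hook(mod, inp, out):
        cache["resid"] = out[0][:, POS, :].detach().clone()
    h = model.model.layers[LAYER].register_forward_hook(hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    h.remove()
    return cache["resid"]
# Source run: the high-uncertainty statement.
source = residual_at("Our outlook this quarter is highly uncertain and unpredictable.")
def patch_hook(mod, inp, out):
    out[0][:, POS, :] = source  # overwrite the low-uncertainty (target) run
    return out
h = model.model.layers[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    patched = model(**tok("Our outlook this quarter is stable and predictable.", return_tensors="pt"))
h.remove()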

74
00:09:53.640 --> 00:10:10.509
David Bau: And these are our patching results. So, basically, as we can see, for both templated and synthetic datasets, the LLM seems to summarize information at the end of the economic statement, so this is exactly the period token of that statement.

75
00:10:10.740 --> 00:10:17.999
David Bau: And it happens at, like, around layer 12, so we are sort of, we are pretty

76
00:10:18.610 --> 00:10:37.719
David Bau: confident that they do this summarization. And also, from our patching experiments, you can see that for the templated datasets, individual words also have meaningful patching effects at the early layers, which aligns with our expectation that in the templated datasets, there are a bunch of words that already carry individual uncertainty signals.

77
00:10:37.720 --> 00:10:43.240
David Bau: But in the synthetic datasets, our patching results show that things are much, much quieter, with,

78
00:10:43.310 --> 00:10:50.030
David Bau: almost, like, zero individual-word patching effect. So, with this in mind, we…

79
00:10:50.260 --> 00:10:52.869
David Bau: extract the direction exactly at this spot.

80
00:10:53.980 --> 00:10:58.470
David Bau: What we did was, we had a set of, 100…

81
00:10:58.820 --> 00:11:16.809
David Bau: train pairs from the dataset, so it's, like, because we want to test our direction on a held-out test pair. So on those train pairs, we extract the activation from the high uncertainty statements and low uncertainty statements, we take the mean difference, across the dataset, and we set that as our uncertainty direction.

82
00:11:18.250 --> 00:11:36.960
David Bau: to test how good it is, we basically project held-out test pairs' activations onto that direction and see if positive projections correspond to high uncertainty and negative ones to low uncertainty. So basically, whether we can use that direction to classify held-out statements.
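
NOTE
A minimal sketch of the mean-difference direction and the projection test, with random arrays standing in for the period-token activations; all data here is a placeholder, not the team's.
import numpy as np
rng = np.random.default_rng(0)
d = 64
acts_high = rng.normal(0.5, 1.0, (100, d))   # train pairs, high-uncertainty activations (placeholder)
acts_low = rng.normal(-0.5, 1.0, (100, d))   # train pairs, low-uncertainty activations (placeholder)
direction = (acts_high - acts_low).mean(axis=0)  # mean difference across the dataset
direction /= np.linalg.norm(direction)
center = np.concatenate([acts_high, acts_low]).mean(axis=0)
def is_high(a):
    # positive projection -> classify as high uncertainty, negative -> low
    return ((a - center) @ direction) > 0
test_high = rng.normal(0.5, 1.0, (20, d))
test_low = rng.normal(-0.5, 1.0, (20, d))
acc = (is_high(test_high).mean() + (~is_high(test_low)).mean()) / 2
print("held-out accuracy:", acc)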

83
00:11:36.960 --> 00:11:44.360
David Bau: And we found that for the within-dataset setting, where we extract and test on the same dataset, we got perfect accuracy.

84
00:11:44.470 --> 00:11:50.560
David Bau: And for the cross-dataset setting, where we extract the uncertainty direction from one dataset and

85
00:11:51.050 --> 00:11:55.760
David Bau: apply it to another, we got pretty good accuracy

86
00:11:55.920 --> 00:12:11.479
David Bau: when we extract from synthetic; but when we extract from templated, we got, like, bad accuracy. That's why we proceeded with the mass-mean probes introduced in the truthfulness paper, which account for the variance of the activations, and that seems to boost accuracy a lot.

87
00:12:12.630 --> 00:12:23.399
David Bau: Finally, we tried to see if this direction has causal effects. What we did is run model inference, where we asked the model to classify a statement as high or low uncertainty.

88
00:12:23.400 --> 00:12:33.110
David Bau: We add that direction in with a scaling factor alpha, so the idea here is that with a positive alpha, it makes the model give

89
00:12:33.110 --> 00:12:35.400
David Bau: "high uncertainty" more, and vice versa.

90
00:12:36.570 --> 00:12:45.729
David Bau: And our results correspond to this expectation, where basically, as we increase the alpha from negative to positive,

91
00:12:45.730 --> 00:13:02.910
David Bau: we observe a monotonic increase in the proportion of "high" predictions. And we can see that full override occurs at alpha equals 7, where basically, regardless of whether the input is high or low uncertainty, with that scaling factor, the model outputs "high" all the time.
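
NOTE
A minimal sketch of the steering intervention: adding alpha times the direction to the residual stream during the classification prompt. The hook reuses the Llama-style layout assumed in the earlier patching sketch; the layer, vector, and alpha = 7 full-override value are illustrative.
import torch
direction = torch.randn(4096)  # placeholder: use the extracted uncertainty direction
direction = direction / direction.norm()
def make_steer_hook(alpha):
    def hook(mod, inp, out):
        # add alpha * direction at the last token's residual stream
        out[0][:, -1, :] += alpha * direction.to(out[0].dtype)
        return out
    return hook
# Usage (model as in the patching sketch):
# h = model.model.layers[12].register_forward_hook(make_steer_hook(7.0))
# ...run the high/low classification prompt, read off the answer, then h.remove()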

92
00:13:06.660 --> 00:13:09.849
David Bau: Oh, the last thing we did was a downstream economic experiment.

93
00:13:10.020 --> 00:13:29.830
David Bau: And the idea here is that we're gonna use, actually, 40 real statements from earnings calls, so we have company information, and we also have financial information for, like, 3 months right before an earnings call. So we're gonna extract these excerpts, and we're gonna have these sentences that are actually statements from executives, and we also have this financial information from right before the earnings call, okay?

94
00:13:30.080 --> 00:13:38.830
David Bau: It's important. And then we're gonna ask the model to allocate $1,000 between a risky asset, which are stocks in that company, and a safe asset, which are U.S. Treasury bonds.

95
00:13:39.300 --> 00:13:51.849
David Bau: And in economic theory, and in the empirics also, it's very well established that more risk is gonna move investment into the safe assets, which is why, when there's a lot of uncertainty, the market tanks and everyone's gonna buy U.S. Treasury bonds. That's sort of the idea.

96
00:13:53.490 --> 00:13:55.530
David Bau: And what we're gonna find is that

97
00:13:55.860 --> 00:14:09.080
David Bau: we find evidence of this in our model: when we increase the uncertainty, when we steer the model towards more uncertainty, we get lower investment in the stock allocation.

98
00:14:09.080 --> 00:14:29.060
David Bau: Which is pretty interesting. And as you can see, when we steer towards less uncertainty, the model is going to invest more and more in the company's stocks, which is super interesting; it was really nice for us to see. So what we're gonna say… what we conclude here is that uncertainty is localized, we can control it, and it actually has economically meaningful downstream behavior in our model.

99
00:14:30.470 --> 00:14:31.379
David Bau: And that's it.

100
00:14:35.020 --> 00:14:36.470
David Bau: A minute for questions?

101
00:14:38.810 --> 00:14:40.770
David Bau: Is Team S ready?

102
00:14:41.900 --> 00:14:54.240
David Bau: I liked the experiment at the end, that was very smart. Yeah. And cool, and hot, and fresh. Yes, it's great. This experiment didn't exist two days ago. Oh.

103
00:14:54.370 --> 00:15:07.340
David Bau: It's… but it's perfect, because, like, it's genius. It's exactly the right experiment. And it worked. It didn't just come out of nowhere… it was a process. It's great.

104
00:15:07.670 --> 00:15:08.570
David Bau: It's great.

105
00:15:09.060 --> 00:15:14.930
David Bau: So every… so you guys were impressed with the last experiment, too? Yeah, nice work. Nice work.

106
00:15:15.360 --> 00:15:18.839
David Bau: Yes. I'm looking forward to reading that.

107
00:15:19.670 --> 00:15:22.420
David Bau: Alright, to the left.

108
00:15:23.300 --> 00:15:28.090
David Bau: Sorry. You're here next. Okay, fine.

109
00:15:28.970 --> 00:15:31.450
David Bau: Okay, great.

110
00:15:42.550 --> 00:15:44.519
David Bau: You shall live.

111
00:15:44.640 --> 00:15:52.520
David Bau: I can tell you in Chinese, but no.

112
00:15:52.850 --> 00:16:00.610
David Bau: So, in the middle side? Yeah, I think.

113
00:16:01.350 --> 00:16:08.440
David Bau: Alright, I'll present it there. Yeah, something like that, so…

114
00:16:11.810 --> 00:16:14.360
David Bau: Alright, nicer. Okay.

115
00:16:14.720 --> 00:16:17.930
David Bau: Great. What is iPad?

116
00:16:22.200 --> 00:16:23.669
David Bau: Okay, welcome!

117
00:16:24.340 --> 00:16:27.670
David Bau: Team S, which is… they are the brave…

118
00:16:27.910 --> 00:16:36.149
David Bau: vision language model teams, which is more of a challenge because of the architectural differences than the other LLM.

119
00:16:36.250 --> 00:16:42.919
David Bau: projects, but I'm looking forward to seeing how the project, turned out. All right, welcome, PMATS!

120
00:16:43.180 --> 00:16:52.970
David Bau: Hello, everyone, we're Imaz, and our topic is more focused on visual language, just like what David said. And my name is Vichy.

121
00:16:53.430 --> 00:16:55.180
David Bau: I'll be sad.

122
00:16:57.100 --> 00:17:16.150
David Bau: Okay, so, today, vision language models can do a lot of tasks that humans do. Like, if you show a vision language model an image and ask, does the image look soothing? Most likely, the vision language model will give you the same answer as a human.

123
00:17:16.960 --> 00:17:25.780
David Bau: That means at a behavior level, a vision language model has the same affective perception as humans. But the problem is,

124
00:17:26.109 --> 00:17:30.839
David Bau: how… how does the vision language model compute

125
00:17:30.850 --> 00:17:48.370
David Bau: this concept inside of the model? Is it also, like, a similar pipeline to humans? And this is our research question: like, how does the vision language model process this kind of information from the image?

126
00:17:48.380 --> 00:17:56.600
David Bau: And we cannot just look at the behavior; we need to look into the model to see the activations.

127
00:17:59.220 --> 00:18:09.519
David Bau: Yeah, so, to do this, we look at the different activations that occur in the vision language model, which we can take out of the intermediate layers.

128
00:18:09.590 --> 00:18:19.999
David Bau: And we set up the same prompt for different images, and…

129
00:18:20.000 --> 00:18:29.469
David Bau: just to describe this image. And we probe on both the encoder and the decoder; for the encoder, we do mean pooling over all the image tokens.
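
NOTE
A minimal sketch of layer-wise probing on mean-pooled image-token activations; the arrays are random placeholders for the extracted activations and attribute labels (e.g. soothing vs. not), not the team's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
rng = np.random.default_rng(0)
n_layers, n_images, d = 24, 200, 64
acts = rng.normal(size=(n_layers, n_images, d))  # mean-pooled image tokens per layer (placeholder)
y = rng.integers(0, 2, n_images)                 # binary attribute label (placeholder)
for layer in range(n_layers):
    acc = cross_val_score(LogisticRegression(max_iter=1000), acts[layer], y, cv=5).mean()
    print(f"layer {layer:2d}: probe accuracy {acc:.2f}")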

130
00:18:29.470 --> 00:18:40.079
David Bau: And we can see the probing accuracy across different layers in our vision language models. The green line represents the non-affective mean, and the red line

131
00:18:40.110 --> 00:18:56.480
David Bau: represents the affective mean. And, especially in the encoder, we can see a clear difference: in later layers, the accuracy for the affective mean goes up more slowly than for the non-affective mean.

132
00:18:58.070 --> 00:19:10.750
David Bau: So, from the probing, we know that some information exists in these hidden vectors. But how does the model use this kind of information? To find out, we

133
00:19:11.120 --> 00:19:25.610
David Bau: proposed some experiments using patching. And, again, we did the same experiment on both the encoder layers and the decoder layers. We see that, for the

134
00:19:26.140 --> 00:19:31.549
David Bau: attributes, for non-affective and affective, patched on both…

135
00:19:32.040 --> 00:19:40.070
David Bau: I mean, patched on both attention and MLP, they show a similar trend, which shows that even if they contain

136
00:19:40.310 --> 00:19:47.529
David Bau: different concepts, they still work through the same mechanism, in both the encoder and decoder layers.

137
00:19:49.280 --> 00:20:02.919
David Bau: And moving on, we want to see, like, how all of these are interpreted in the decoder. So we do separate experiments on non-affective attributes.

138
00:20:02.920 --> 00:20:13.280
David Bau: Things like whether the scene is a rock scene, or whether the scene contains natural light.

139
00:20:13.610 --> 00:20:21.320
David Bau: So, we have the non-affective graph right here, and we also do the affective

140
00:20:21.940 --> 00:20:32.460
David Bau: attributes comparison right here. We mostly analyzed the scary, soothing, and stressful attributes.

141
00:20:32.570 --> 00:20:48.699
David Bau: And you can see, from these two pictures, first we have a difference between the image tokens and the attribute tokens. When we do encoder patching on the,

142
00:20:48.970 --> 00:21:05.540
David Bau: for a mean pool over all the image-token activations, you can see that both non-affective and affective attributes jiggle around

143
00:21:05.750 --> 00:21:18.779
David Bau: layers 15 to 20, so that's where we suspect that, when we patch the image tokens, the VLM makes these decisions around

144
00:21:18.780 --> 00:21:29.930
David Bau: these layers. But when we patch the attribute tokens right here, you see that

145
00:21:30.010 --> 00:21:37.270
David Bau: it struggles… the main drop is happening around decoder layers 22 to

146
00:21:37.280 --> 00:21:51.989
David Bau: 25. So, this is our first finding in this graph. And also, you can see that, going from non-affective to affective

147
00:21:52.420 --> 00:21:55.540
David Bau: attributes, we can see that

148
00:21:57.160 --> 00:22:05.139
David Bau: for affective attributes, the slope is pretty…

149
00:22:05.250 --> 00:22:17.580
David Bau: I would say it's not that steep compared to the non-affective attributes, so we think that the concept is maybe

150
00:22:17.780 --> 00:22:32.569
David Bau: distributed… stored in a more distributed way in the decoder layers, compared to what we have in the attention patching, where they share pretty similar results over here.

151
00:22:32.960 --> 00:22:47.440
David Bau: And this is for activation patching. We also do… next slide. We also do an indirect effect analysis right here, because we want to see how much

152
00:22:48.120 --> 00:23:07.580
David Bau: We want to see how much the attention layers and MLP layers contribute at each layer inside the decoder. And you can see that for image tokens, we try to run the analysis, but the recovery rate is barely visible.

153
00:23:07.790 --> 00:23:14.360
David Bau: So the scale is actually different from the attribute tokens part.

154
00:23:14.550 --> 00:23:17.620
David Bau: And, but, so…

155
00:23:18.120 --> 00:23:34.009
David Bau: But right here, if we try to inspect the attribute tokens in the decoder, we see that attention and MLP both have a pretty

156
00:23:34.420 --> 00:23:38.549
David Bau: good recovery rate at layers 15 to 20

157
00:23:38.650 --> 00:23:42.160
David Bau: for the non-affective

158
00:23:42.270 --> 00:23:53.989
David Bau: attributes. But for affective attributes, like scary, soothing, and stressful, which we mainly focused on researching,

159
00:23:54.180 --> 00:23:55.730
David Bau: the…

160
00:23:56.460 --> 00:24:13.249
David Bau: the MLP carries just a little bit of the influence. The model mostly uses attention to decide whether the image contains a certain attribute.

161
00:24:14.160 --> 00:24:15.490
David Bau: And yes.

162
00:24:16.240 --> 00:24:22.610
David Bau: Okay, so let's put the picture together now. In the encoder,

163
00:24:23.090 --> 00:24:37.050
David Bau: the affective and non-affective attributes share the same attention-MLP pipeline, but affective attributes just need more layers

164
00:24:37.180 --> 00:24:52.469
David Bau: to spread the concept. And in the decoder, which is where we see the evaluation of the concept, affective and non-affective show different patterns:

165
00:24:52.940 --> 00:25:01.920
David Bau: for a non-affective concept, it needs strong attention and strong MLP, but for an affective concept, it just needs attention, with a much weaker

166
00:25:02.120 --> 00:25:03.910
David Bau: MLP.

167
00:25:05.330 --> 00:25:11.600
David Bau: And this piece reminds us of

168
00:25:12.580 --> 00:25:16.369
David Bau: some findings in affective science,

169
00:25:18.080 --> 00:25:37.169
David Bau: which is appraisal theory: emotion is not triggered by events, but by how events are evaluated. So for the vision language model, we also find similar patterns. In the encoder, which is the perception procedure, it doesn't differ:

170
00:25:37.890 --> 00:25:55.609
David Bau: they share the same pipeline. For the same image, they have the same representation. But in the decoder, the way the model evaluates the concept is different, so there are different processing pipelines for affective and non-affective concepts.

171
00:25:57.810 --> 00:25:59.450
David Bau: That's all. Thank you.

172
00:26:03.380 --> 00:26:04.950
David Bau: Any questions for the team?

173
00:26:10.350 --> 00:26:13.809
David Bau: What's one interesting thing that's not in your presentation?

174
00:26:16.770 --> 00:26:20.530
David Bau: It wasn't think She, like, said to me, like, pow, what's up?

175
00:26:21.100 --> 00:26:22.999
David Bau: But you were like, it's not here.

176
00:26:31.040 --> 00:26:32.300
David Bau: That's a good answer.

177
00:26:34.600 --> 00:26:37.699
David Bau: So, you found that the patching

178
00:26:37.860 --> 00:26:40.959
David Bau: Didn't work as well when patching over

179
00:26:41.200 --> 00:26:49.720
David Bau: image… image data rather than text data. Do you, do you have a theory for why that is?

180
00:26:50.920 --> 00:26:52.670
David Bau: Sorry.

181
00:26:53.260 --> 00:27:00.120
David Bau: Because we did this not only on Qwen-VL, but also on other models.

182
00:27:00.980 --> 00:27:04.810
David Bau: Because we see that the ways they are trained

183
00:27:04.990 --> 00:27:13.879
David Bau: are different, but it turns out the result is similar across all of them. It's interesting, yeah.

184
00:27:15.600 --> 00:27:32.770
David Bau: Yeah, I think, like, the image tokens are well processed in the encoder already, so the decoder just needs to read the information rather than write new information into the image tokens. Right. Have you seen

185
00:27:33.090 --> 00:27:40.700
David Bau: some of the… papers that suggest that, to have success patching image tokens,

186
00:27:41.520 --> 00:27:59.550
David Bau: you have to patch a lot of the tokens, like, you know, huge regions of the image. I'm not sure, like… so when you patched, did you patch small regions or big regions? We patch all the image tokens. All of them, and you still… you still didn't see any strong effect, at least for these words.

187
00:28:00.240 --> 00:28:02.620
David Bau: Interesting. Okay. Well, great.

188
00:28:03.090 --> 00:28:05.579
David Bau: Thanks very much.

189
00:28:06.510 --> 00:28:09.719
David Bau: So the next team, Team M.

190
00:28:09.940 --> 00:28:11.600
David Bau: We're going on at a clip.

191
00:28:11.990 --> 00:28:14.129
David Bau: So that we… we can get to everybody.

192
00:28:14.590 --> 00:28:16.710
David Bau: So, go ahead and project.

193
00:28:26.270 --> 00:28:27.010
David Bau: Perfect.

194
00:28:31.260 --> 00:28:32.320
David Bau: Hey.

195
00:28:32.690 --> 00:28:33.470
David Bau: So…

196
00:28:36.470 --> 00:28:45.029
David Bau: partly populated it? I mean… Let me show you… Quote. Okay, cool.

197
00:28:45.840 --> 00:29:03.360
David Bau: So, hi everyone, we are Team M, and we are trying to analyze the representations of geography in large language models. These are our team members, and some of them are actually traveling across different countries, and, yeah, which is very fitting at this stage.

198
00:29:05.500 --> 00:29:14.430
David Bau: Yeah, so, yeah, I'm open to this. So, this is how we interpret our world. So, there are different countries, they have different continents, and they have oceans.

199
00:29:14.430 --> 00:29:27.389
David Bau: And also, we may interpret some, like, locations; for example, Canada is at the… Canada is to the north of the United States. So we have a lot of different ways to interpret our world.

200
00:29:27.390 --> 00:29:33.140
David Bau: So, we wonder here if LLMs do the same as we do in our minds.

201
00:29:33.910 --> 00:29:55.929
David Bau: And in 2024, Gurnee and Tegmark showed that LLMs actually encode some knowledge of the world as a function of latitude and longitude. Here, they used the activations to predict the lat and long of specific cities, and projected them on a map like this.

202
00:29:56.020 --> 00:30:15.880
David Bau: And we are trying to replicate the same thing, and we built our own dataset, where we include 2,400 cities across 24 countries and 8 cultural regions. And we asked the LLMs, specifically Qwen and Llama, this specific question: where is the city? With the question mark.

203
00:30:15.880 --> 00:30:21.149
David Bau: And we extract the activations at different layers, and use a linear probe to

204
00:30:21.150 --> 00:30:34.800
David Bau: predict the lat and long. We tested on 20% of the data as the test set, and we used R-squared to evaluate how good they actually are at predicting the lat and long, and to find the best layer.
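
NOTE
A minimal sketch of this probe, assuming scikit-learn ridge regression; the activation and coordinate arrays are placeholders for the extracted data, not the team's.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
rng = np.random.default_rng(0)
n_cities, d = 2400, 128
acts = rng.normal(size=(n_cities, d))  # placeholder: one layer's activations per city
latlon = rng.uniform([-60.0, -180.0], [70.0, 180.0], (n_cities, 2))  # placeholder (lat, lon) targets
X_tr, X_te, y_tr, y_te = train_test_split(acts, latlon, test_size=0.2, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out R^2:", probe.score(X_te, y_te))  # repeat per layer and take the best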

205
00:30:34.940 --> 00:30:40.710
David Bau: And this is an example, specifically for Qwen, Qwen 2.5 7B.

206
00:30:40.980 --> 00:30:56.440
David Bau: And a larger R-squared means they have better representations of the information. So, as we can see, as we go deeper into the model, we find better spatial representations.

207
00:30:57.710 --> 00:31:09.279
David Bau: However, if you think about it, an LLM at its core is a next-token predictor, so the question shouldn't just be, like, what is this model representing, but: how does this next-token predictor represent spatiality?

208
00:31:09.700 --> 00:31:22.550
David Bau: Look at this prompt, where the LLM is tasked to be a navigator and travel from Mumbai to London. When we ask Meta Llama 3 8B to do this, this is the path it chooses: it goes from Mumbai to Dubai to Frankfurt to London.

209
00:31:22.990 --> 00:31:35.289
David Bau: We did this across many paths, and we noticed there was this tendency for the model to always choose Europe, irrespective of where the model is coming from. Even if it had to travel from America to, say, China, it would still go through Europe.

210
00:31:35.300 --> 00:31:46.710
David Bau: So why does it do that? Why does it always choose waypoints via Dubai and Frankfurt? Is it just more preferred? Is it more convenient? Or are these models biased to choose certain locations?

211
00:31:47.480 --> 00:32:02.259
David Bau: To examine that question, we ask: are there certain cultural regions that the models, within the LLMs, are biased towards? So how do we look at culture? We use the Inglehart-Welzel World Cultural Map, which is built from the World Values Survey, which is done over

212
00:32:02.300 --> 00:32:08.349
David Bau: Hundreds and hundreds of participants, where they kind of map our countries into these 8 cultural regions.

213
00:32:08.870 --> 00:32:19.949
David Bau: And we then conduct a representational similarity analysis. So, from these 8 cultural regions, we get 2,400 cities across 24 countries.

214
00:32:20.280 --> 00:32:31.640
David Bau: And we map, first, the city locations, based on latitude and longitude, using the haversine distance; and we map the city embeddings, that is, the model's internal representations.

215
00:32:31.850 --> 00:32:35.799
David Bau: And then we compare these distance matrices.

216
00:32:37.790 --> 00:32:49.989
David Bau: Let's say we then compare that as a measure of geographic encoding quality, where, for example, if you want to see how well Tokyo is represented, you'll find a correlation between this geo-distance matrix and this activation distance matrix.

217
00:32:50.100 --> 00:33:00.810
David Bau: A low correlation would mean that the model's understanding of where Tokyo sits is not exactly the way we represent Tokyo, while a higher correlation would mean that Tokyo is well represented.
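
NOTE
A minimal sketch of the per-city RSA: correlate a haversine geo-distance matrix with an activation-distance matrix. Coordinates and embeddings below are random placeholders; the cosine metric and Spearman correlation are assumptions about the exact choices.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import spearmanr
def haversine_km(a, b):
    # a, b: (..., 2) arrays of (lat, lon) in degrees; returns great-circle distance in km
    a, b = np.radians(a), np.radians(b)
    dlat, dlon = b[..., 0] - a[..., 0], b[..., 1] - a[..., 1]
    h = np.sin(dlat / 2) ** 2 + np.cos(a[..., 0]) * np.cos(b[..., 0]) * np.sin(dlon / 2) ** 2
    return 2 * 6371 * np.arcsin(np.sqrt(h))
rng = np.random.default_rng(0)
n, d = 100, 64
latlon = rng.uniform([-60.0, -180.0], [70.0, 180.0], (n, 2))  # placeholder city coordinates
emb = rng.normal(size=(n, d))                                 # placeholder city embeddings
geo = haversine_km(latlon[:, None, :], latlon[None, :, :])    # (n, n) geographic distances
act = cdist(emb, emb, metric="cosine")                        # (n, n) activation distances
rsa = np.array([spearmanr(geo[i], act[i])[0] for i in range(n)])  # per-city RSA score
print("mean RSA:", rsa.mean())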

218
00:33:02.050 --> 00:33:17.540
David Bau: Oh, yeah, so here we are specifically comparing the performance of two models, Llama and Qwen, because they are originally from different countries: one is, you know, based in the US, and one is developed by a Chinese company.

219
00:33:17.540 --> 00:33:25.069
David Bau: And we saw, like, a significant difference for different regions. For example, let's focus on the East Asia.

220
00:33:25.170 --> 00:33:38.789
David Bau: Sorry, I forgot to mention that the dots are actually placed at their lat and long, but the size of the marker stands for the RSA. Larger means they have better geospatial knowledge, relative-distance knowledge.

221
00:33:38.830 --> 00:33:48.069
David Bau: And so, let's look at this East Asia part. We can find that, actually, Qwen did better than Llama, because they have the

222
00:33:48.070 --> 00:34:01.270
David Bau: larger markers. And for English-speaking countries, for example, Australia, and some from the UK here, actually, Llama did better than Qwen.

223
00:34:02.100 --> 00:34:11.520
David Bau: And we went deeper, trying to understand how well each region is encoded in each model, in different layers.

224
00:34:12.020 --> 00:34:13.060
David Bau: And,

225
00:34:13.380 --> 00:34:32.319
David Bau: And this is an example, just for Llama, and you find that, actually, Protestant Europe and also Catholic Europe have the best representations, while West and South Asia has the worst performance, and this stays consistent across all layers.

226
00:34:33.340 --> 00:34:36.380
David Bau: It's Scotland.

227
00:34:37.469 --> 00:34:44.529
David Bau: Okay, okay, yeah, and we're trying to compare, like, how these different models

228
00:34:45.370 --> 00:35:02.059
David Bau: perform differently in these tasks. So, at each layer, we rank all the regions by their mean per-city RSA, and do this, like,

229
00:35:02.060 --> 00:35:21.219
David Bau: bootstrap testing, to test, like, actually, which model performs better in different regions. For example, on the right-hand side, the blue bars show where Qwen performs worse while Llama works better, specifically meaning that

230
00:35:21.420 --> 00:35:39.429
David Bau: Llama encodes better representations of Protestant Europe, English-speaking, and Orthodox Europe relative-distance knowledge, while Qwen has better knowledge about West and South Asia, the Confucian area, and Latin America.

231
00:35:41.730 --> 00:35:44.370
David Bau: Okay, sure.

232
00:35:44.940 --> 00:36:02.670
David Bau: Yeah, so we then look at precision and recall, in the sense that, for, say, Tokyo, we look at the k-nearest neighbors of a city. For recall, we penalize the model if it misses any geographic neighbor, while for precision, we penalize it if it adds a false neighbor.
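
NOTE
A minimal sketch of the neighbor precision/recall check; the distance matrices are synthetic placeholders, and the neighborhood sizes (k_true, k_pred) are assumptions about the exact definition.
import numpy as np
def knn_sets(dist, k):
    order = np.argsort(dist, axis=1)
    return [set(row[1:k + 1]) for row in order]  # skip index 0 (the city itself)
def knn_precision_recall(geo, act, k_true=15, k_pred=10):
    true_nbrs, pred_nbrs = knn_sets(geo, k_true), knn_sets(act, k_pred)
    prec = np.mean([len(t & p) / k_pred for t, p in zip(true_nbrs, pred_nbrs)])  # false neighbors hurt
    rec = np.mean([len(t & p) / k_true for t, p in zip(true_nbrs, pred_nbrs)])   # missed neighbors hurt
    return prec, rec
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 2))
geo = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
act = geo + rng.normal(scale=0.3, size=geo.shape)  # noisy stand-in for model geometry
np.fill_diagonal(act, 0.0)
print(knn_precision_recall(geo, act))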

233
00:36:02.820 --> 00:36:19.200
David Bau: We see that across models, they usually perform with higher precision and lower recall. English-speaking cities fall more in the robust performance zone, while West Asian, African-Islamic, and Latin American cities usually fall in the poor performance zone.

234
00:36:19.380 --> 00:36:22.049
David Bau: Now coming back to this example.

235
00:36:22.330 --> 00:36:27.139
David Bau: We've asked the model to be, again, a navigator of traveling from Mumbai to London.

236
00:36:27.310 --> 00:36:38.619
David Bau: But… and we know that the model does really well on… or Meta Llama does really well on Protestant Europe. What if we made the model forget… if it were made to forget that Protestant Europe even exists? We…

237
00:36:39.030 --> 00:36:48.879
David Bau: null out every direction that encodes Protestant Europe. What do you think will happen then? Do you think the model will still, like, force itself to go towards Europe?

238
00:36:50.110 --> 00:37:10.139
David Bau: We see no. The model decides to take the roundabout way: it goes from Mumbai to Delhi to Bangkok, takes a really long way, but it does not… But, yeah, that was basically our presentation. In conclusion, we validate that LLMs show strong representations encoded as latitude and longitude.

239
00:37:10.390 --> 00:37:21.860
David Bau: We also show a bias in representation: there exists some bias between cultural regions. The model doesn't represent every continent equally; there are certain biases that exist.

240
00:37:22.000 --> 00:37:30.420
David Bau: And through causal intervention, we've provided early evidence that the model kind of anchors its representation in the logic of viewing this as cultural regions.

241
00:37:30.820 --> 00:37:32.520
David Bau: Yes, thank you. Great.

242
00:37:37.850 --> 00:37:39.110
David Bau: Any questions?

243
00:37:43.870 --> 00:37:54.270
David Bau: The last experiment is new. Yeah, we did that. It's cool. so I have two requests. So one is… so I'm looking forward to reading more about the experiment.

244
00:37:54.420 --> 00:38:00.269
David Bau: One is the Wes Gurnee reproduction, where you have this beautiful graph going up.

245
00:38:00.680 --> 00:38:08.520
David Bau: I'd love to have in your appendix some Wes Gurnee-style maps to see what your reproduction looks like in terms of his visualization.

246
00:38:08.770 --> 00:38:16.829
David Bau: Like, this kind of thing. If it's messy or whatever, it's all very interesting. Like, you know, if it came out worse than what's his thing, it would be nice to see.

247
00:38:17.030 --> 00:38:20.490
David Bau: And… or even from layer to layer, and how it changes.

248
00:38:20.620 --> 00:38:27.739
David Bau: And then your last experiment. Is that… is that on a multi-token rollout, when you ask it to give you

249
00:38:29.820 --> 00:38:41.730
David Bau: the whole… the full itinerary? It's a single turn. Like, we just ask it… we say, okay, Mumbai, dash, and then we ask it to roll out the entire itinerary. We don't intervene at multiple stages.

250
00:38:42.120 --> 00:38:46.769
David Bau: Oh, so… but you do do one generation, and then you have it

251
00:38:46.920 --> 00:39:04.509
David Bau: say as much as it wants. Yes, yes. So we have kept a limit of, like, 150 tokens. And is that a steering where you're making an intervention during the generation, or is it a different type of intervention when you do it? So we, basically use a classifier to identify

252
00:39:04.510 --> 00:39:08.580
David Bau: the most linearly predictive direction for, say, Protestant Europe in this case.

253
00:39:08.600 --> 00:39:13.709
David Bau: And we find it through the classifier, where we null out that direction.

254
00:39:13.760 --> 00:39:25.390
David Bau: And that's the direction for Protestant Europe, and then we subtract that component from the activation, which is projected out before the generation. So you do it at, like, one token before the generation? Yes.
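
NOTE
A minimal sketch of the intervention described: fit a classifier to find the most linearly predictive direction for a region, then remove that component from an activation. Arrays and labels are placeholders, not the team's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
rng = np.random.default_rng(0)
d = 128
acts = rng.normal(size=(400, d))     # placeholder city activations at one layer
is_region = rng.integers(0, 2, 400)  # placeholder: 1 = city in the target region
clf = LogisticRegression(max_iter=1000).fit(acts, is_region)
u = clf.coef_[0] / np.linalg.norm(clf.coef_[0])  # most linearly predictive direction
def ablate(h):
    return h - (h @ u) * u  # remove the region component: h - (h.u)u
h = rng.normal(size=d)
print("region component before:", h @ u, "after:", ablate(h) @ u)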

255
00:39:25.390 --> 00:39:42.300
David Bau: And then you don't do it anymore after that? No. Wow. And then it has an effect on the entire generation? That's what we see. So, we did see that this was, like… the particular example was for layer 20. As it goes towards layer 30, it kind of starts fighting back. It's interesting that that happens anyway. I'm looking forward to reading it. Yes. That's great; that's my question.

256
00:39:43.040 --> 00:39:48.519
David Bau: Yep. All right, next team, next team. We don't have much time, so… Sorry to…

257
00:39:49.140 --> 00:39:50.680
David Bau: Make it go so fast.

258
00:39:52.280 --> 00:39:58.150
David Bau: It's very interesting. Very nice choice, the last experiment. It's really cool.

259
00:40:06.310 --> 00:40:07.060
David Bau: Nice.

260
00:40:07.910 --> 00:40:09.430
David Bau: Nice background.

261
00:40:10.090 --> 00:40:11.359
David Bau: Hands are yours.

262
00:40:13.810 --> 00:40:14.500
David Bau: Excuse me.

263
00:40:27.580 --> 00:40:34.909
David Bau: Hey everyone, so, we're Team TK, or Team J, whatever it was, originally. I'm Jasmine.

264
00:40:35.360 --> 00:40:37.710
David Bau: I feel like you guys know who you are, but…

265
00:40:38.110 --> 00:40:39.729
David Bau: You guys know how to deal with this songs?

266
00:40:39.860 --> 00:40:43.220
David Bau: Bye. Okay, that's a no.

267
00:40:45.010 --> 00:40:57.819
David Bau: Okay, so just a quick roadmap. Basically, we're asking, you know, do models robustly model and represent speakers? We're going to have one section on evaluation, so: which models have this capability?

268
00:40:57.820 --> 00:41:06.749
David Bau: One on interpretability: how are speakers represented in activation space? And then finally, the mechanism. So we'll talk about what specific kind of mechanism they… Oh, hey!

269
00:41:06.750 --> 00:41:14.380
David Bau: So one big question. Do language models build, maintain, and use speaker representations to reason about dialogue?

270
00:41:15.010 --> 00:41:29.260
David Bau: Why does this matter? Well, language models are increasingly being deployed in social settings that demand an exact, yet flexible, knowledge of who's speaking. If you look at the App Store, the top 3, you know, like, things you put on your phone right now are all LLMs.

271
00:41:29.260 --> 00:41:44.020
David Bau: And McKinsey shows that right now, around, like, 79% of businesses are actually integrating Gen AI into their different business applications. So basically, the settings they're working in are becoming increasingly complex, and the stakes are, you know, growing ever higher.

272
00:41:44.550 --> 00:41:53.500
David Bau: So, how do we assess, like, how do we figure this out? Like, right now, people do a lot of vibe checks, but we want to do something more principled. Well, we take naturalistic text.

273
00:41:53.500 --> 00:42:05.770
David Bau: that has clear roles and active interpersonal engagement. So basically, these are groups trying to figure out, like, these survival tasks: your plane crashed, let's figure out what items we wanna have, like chocolate, a gun, I don't know.

274
00:42:05.770 --> 00:42:19.779
David Bau: And there's also, most importantly, limited extraneous identity giveaways. So it's not, like, Obama speaking, or, you know, Dario Amodei; in those cases, maybe the models memorize a lot of external, kind of, like, structure, or idioms, or…

275
00:42:19.780 --> 00:42:28.929
David Bau: etc. So we really tried to keep it, you know, like, naturalistic, but also very controlled. And so here, the conversation dynamics are what create the structure.

276
00:42:28.930 --> 00:42:42.790
David Bau: Importantly, in our setting, we give, you know, a model some transcript, and we ask it to reason about how many people are speaking at the time. We only use models which are either instruct or have reasoning toggled off.

277
00:42:42.790 --> 00:42:50.720
David Bau: And also, we share this conversation before we ask the question. So, this is really important, because what we want to understand is.

278
00:42:50.760 --> 00:43:13.349
David Bau: did the model have these representations, and was it building them up to understand the conversation? We don't want to ask the question first and have it be like, hey, like, this is a thing that I should be tracking. And we also don't want it to use reasoning to just brute force, be like, okay, like, one guy's there, another there, etc. We want it to specifically, like, build this up across the conversation and use it as a part of the reasoning strategy.

279
00:43:13.350 --> 00:43:15.059
David Bau: So these are examples of our transcripts.

280
00:43:15.690 --> 00:43:38.579
David Bau: So we see on evaluations, models pretty handily complete the label task. We want to make sure that they're, you know, smart enough to do the very simplest case, where it has the speaker's names, and there's very clear structure. And on Unlabeled, we do see a performance gradient, but it is very clearly above chance. We also see some pretty interesting scaling story, I'm not going to go too much into it, you know, where we have frontier models as pace setters to make sure that the questions are not impossible.

281
00:43:38.580 --> 00:43:45.339
David Bau: Interestingly, the older GPTs actually tend to do better than 5.4, and this is something I also saw anecdotally.

282
00:43:45.340 --> 00:43:50.910
David Bau: Didn't have time to look into it, but thought it was cool. So, now we move on to the interpretability section.

283
00:43:51.140 --> 00:43:53.559
David Bau: How do language models figure out who is speaking?

284
00:43:53.970 --> 00:44:06.469
David Bau: And then I thought Giuseppo's thing was very cute. It's like, I'm a magician, said Michael. I'm a skeptic… oh wait, this… I put this in, sorry. I'm a skeptic, Michael replied. Obviously, these are two different Michaels. Like, how does a model know that? That's crazy.

285
00:44:07.150 --> 00:44:13.540
David Bau: Yeah, so Jasmine has studied this kind of behavior in this model, so I was really…

286
00:44:14.010 --> 00:44:31.559
David Bau: more thinking about how it gets represented in the models, this idea of speaker identity. And so I created, like, a bunch of different transcripts; I'm going to focus on two today. One is called Distinct, it's the baseline, where two people are just talking, and then there's another one called Quote Intrusion,

287
00:44:31.600 --> 00:44:40.660
David Bau: where, basically, Alice will quote something Bob said verbatim, in this, like, argument sense, like, well, you said this, and that means…

288
00:44:41.680 --> 00:44:58.380
David Bau: So that ended up being the most interesting experiment, so we're focusing there. The pipeline is, we just take the transcript's tokens from 1 to T, meaning, like, for every turn of the dialogue, we feed every, every bit of the transcript up until that current turn of the dialogue, then we extract

289
00:44:58.380 --> 00:45:10.210
David Bau: just that turn's tokens. We average them at layer 20, by the way. We average them, and so you end up getting one vector per turn, so you have, like, all the Alice vectors, and you have all the Bob vectors.

290
00:45:10.280 --> 00:45:13.779
David Bau: And then we train a probe using L2-regularized logistic regression.

291
00:45:13.850 --> 00:45:27.400
David Bau: And we test it on held-out transcripts. Now, this is going to be important to keep in mind when I go into some of the results later. And I use shuffled-label controls, meaning: we have 15 transcripts out of the 20 that are for training,

292
00:45:27.400 --> 00:45:36.540
David Bau: We have 5 for testing, and so while those two classes are separated, we just iteratively shuffle the labels and test the probe on that.

293
00:45:36.610 --> 00:45:38.969
David Bau: Nice use of the Hewitt and Liang controls.

294
00:45:39.090 --> 00:45:46.330
David Bau: And then your training data, does it say Alice and Bob in it? Like, so those are all stripped? Those are just labels, yeah. I see.

295
00:45:47.650 --> 00:45:52.730
David Bau: Yeah, so then, so now this is… these are the probe results, so you get…

296
00:45:52.990 --> 00:46:00.540
David Bau: I… what I'm showing is Llama, but I also tested this on Olmo, and it has pretty similar results, but slightly worse in terms of accuracy, but…

297
00:46:00.670 --> 00:46:10.199
David Bau: The point here isn't like, oh, look, I found a great probe. The point is, like, I found a direction, and it's above shuffle, and I think that's cool.

298
00:46:10.200 --> 00:46:23.639
David Bau: So that's the left-hand side that you're seeing right there. On the right side, what I did was, again, with the whole train and test set separated, I trained the probe on the first half of transcripts, and then I tested on the second half.

299
00:46:24.000 --> 00:46:35.779
David Bau: a completely different transcript, so there's no leakage or anything. But this is to just say, like, what if, Bob and Alice said hi to each other in the beginning? Is that what's really happening? And this says, no.

300
00:46:36.500 --> 00:46:38.240
David Bau: Or at least for me.

301
00:46:38.950 --> 00:46:49.940
David Bau: Okay, and then, so, the next result is… this one's a cool one. We trained a probe on non-quote turns, so we're only looking at the quote intrusion dialogue at this point.

302
00:46:50.210 --> 00:46:59.699
David Bau: And we tested on quote turns. Non-quote turns, I'd say, are where the producer, whoever is speaking during that turn of the chat,

303
00:46:59.940 --> 00:47:03.610
David Bau: is the one the content comes from. The content is, like, theirs.

304
00:47:04.010 --> 00:47:15.160
David Bau: When there's a quote, the producer, the person speaking, that's not where the content is actually coming from. It came from the previous speaker, correct? So there's a little bit of a dissociation going on here.

305
00:47:15.170 --> 00:47:24.719
David Bau: So basically, what we're seeing is that the probe basically just inverts. It's not even just doing, like, randomly bad; it's doing spectacularly bad, which is great for us.

306
00:47:24.870 --> 00:47:32.400
David Bau: Because what it kind of says is that the probe might be, like, consistently looking at the content

307
00:47:32.560 --> 00:47:39.440
David Bau: of what's being said, as opposed to trying to understand who is actually saying it in a dialogue sense.

308
00:47:39.870 --> 00:47:47.280
David Bau: So that's the results on the left side. On the right side, what you're seeing is, like, an example transcript.

309
00:47:47.610 --> 00:48:01.380
David Bau: The green squares and dots are all from, like, the standard probe, whereas the gold is from the probe that was only trained on non-quote turns. So you're seeing how it very consistently, except for a few up here,

310
00:48:01.520 --> 00:48:06.169
David Bau: It has low confidence, and it's ascribing it to the other speaker.

311
00:48:06.460 --> 00:48:14.970
David Bau: Something else that's interesting is that the squares are actually the quote turns for the standard probe, and the standard probe does pretty well with the quote turns, as it turns out.

312
00:48:15.070 --> 00:48:16.830
David Bau: Yeah, so…

313
00:48:17.010 --> 00:48:30.159
David Bau: It's crazy, it's almost like it's strategically getting it wrong. Yeah. Well, that's, like, the main point here. Again, I'm not trying to say we found a great probe; I just want to show, like, the sign flip is the most important point here.

314
00:48:30.410 --> 00:48:40.570
David Bau: Yeah, and so these are just, like, some numbers on accuracy. Like, accuracy on quote turns by the transfer probe is 0.376, which is well below chance, well below the shuffle controls.

315
00:48:40.740 --> 00:48:48.099
David Bau: And then the accuracy of the standard probe on quote versus non-quote turns is about 0.693 to 0.603.

316
00:48:48.520 --> 00:48:59.870
David Bau: So I'd probably have to scale this up to actually see if those numbers are meaningful at this point, but… yeah, so basically both signals are there. Tracking content origin seems to be, like, the default for a probe that's never learned.

317
00:49:00.040 --> 00:49:12.910
David Bau: the concept of people quoting each other, but then a probe that's been exposed to quotes seems to abandon content and go more towards some sort of attribution framing. So now, again, that's, like, our lead-in to that.

318
00:49:13.290 --> 00:49:15.829
David Bau: It's like layers of knowledge.

319
00:49:18.170 --> 00:49:29.480
David Bau: So basically what we do is we set up a controlled, like, two-speaker setting to study how LLMs link speakers with their content when the names and the attributes span across different conversation lines.

320
00:49:29.600 --> 00:49:47.010
David Bau: So, this is the causal activation patching. We find that the model does use a binding ID, and the stripe at the Alice carrier token kind of proves that, because when you patch from the source to the corrupted one, the model recovers Alice's attribute, despite never seeing Alice and only seeing Claire in its context.

321
00:49:49.040 --> 00:50:03.440
David Bau: So from the activation patching and a few other circuit-analysis things, we identified the following binding-ID mechanism. The model first assigns binding IDs to the speakers as it sees them, and then the binding IDs are kind of propagated to the attributes.

322
00:50:03.440 --> 00:50:15.909
David Bau: And when the model sees the question about Alice in the conversation, it resolves the binding ID, the triangle, as the lookup key, and then later, at the final layers, it uses the key to look up the attribute as the final answer.

323
00:50:16.110 --> 00:50:34.889
David Bau: Note that in the original settings, Alice's entity always goes first, and the first entity always gets the triangle binding ID. Our dialogue setup allows us to naturally break this coincidence between binding-ID order and appearance order. So what we do is add this, like, greeting line, the third line, "hi, Bob,"

324
00:50:34.980 --> 00:50:39.640
David Bau: which allows us to see that Bob's country actually comes first now.

325
00:50:39.920 --> 00:50:46.739
David Bau: We also have the reset setting, where we prompt the model with, like, "hi Alice," and since Alice cannot be greeting

326
00:50:46.970 --> 00:50:56.690
David Bau: herself, so the fourth line now belongs to Alice. We see from the actual… What are you…

327
00:50:57.550 --> 00:51:05.529
David Bau: trying to tell? Okay, keep going! We're not done with the previous slide, yeah.

328
00:51:05.890 --> 00:51:06.720
David Bau: Anyway…

329
00:51:06.860 --> 00:51:20.549
David Bau: It shows pretty similar activation patching patterns, including the stripes at the carrier tokens. So we take this, and then we train the linear probe on both the entity and the attribute tokens, labeled by these triangles and rectangles here.
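
(A sketch of training such a probe, assuming binding-ID-labeled activations at the name and attribute tokens have already been extracted; all arrays and dimensions are placeholders:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
entity_acts = rng.normal(size=(500, 4096))   # activations at speaker-name tokens
attr_acts = rng.normal(size=(500, 4096))     # activations at attribute tokens
ids = rng.integers(0, 2, size=500)           # binding-ID label: 0 = triangle, 1 = rectangle

# An entity and its attribute carry the same binding ID, so both token types
# are pooled into one training set for the linear probe.
X = np.concatenate([entity_acts, attr_acts])
y = np.concatenate([ids, ids])
probe = LogisticRegression(max_iter=1000).fit(X, y)

new_transcript_acts = rng.normal(size=(40, 4096))   # a new, more complex transcript
print(probe.predict(new_transcript_acts))           # predicted binding ID per token
```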

330
00:51:20.690 --> 00:51:21.670
David Bau: Slide.

331
00:51:22.080 --> 00:51:36.299
David Bau: We test the trained linear probe on, like, new, more complex transcripts, and we see that the model assigns binding IDs based on the conversational turn structure and discourse cues. So, like, for example, in the Alice turn on row 3,

332
00:51:36.440 --> 00:51:43.210
David Bau: Bob and France actually receive Bob's binding ID, because Alice's turn there is referring to Bob, and then, in Bob's turn,

333
00:51:43.360 --> 00:51:45.640
David Bau: Thailand is receiving Bob's binding ID.

334
00:51:46.160 --> 00:51:56.990
David Bau: first… using the first-person cue of, like, "I live in…", despite Alice appearing earlier in the same turn. So this kind of tells us that the speaker-attribute binding is context-aware.

335
00:51:58.300 --> 00:52:20.170
David Bau: Okay, so, basically, our central findings: models create structured representations of speakers, and they're linearly decodable from residual streams. There's also this interesting quote behavior, where probes without context have a richer kind of behavior than we'd expect. We also identify a binding ID mechanism which follows discourse structure. And, for future directions, we're going to apply these in monitoring and defense,

336
00:52:20.170 --> 00:52:29.040
David Bau: to diagnose role confusion and prompt injection vulnerability. We're going to try and do some training to improve role representation and inference. We're going to unify the settings.

337
00:52:29.040 --> 00:52:36.829
David Bau: Just, you know, try to connect the interp results with actual role-inference ability, and then also do some multi-speaker work. Thank you, guys.

338
00:52:41.240 --> 00:52:44.649
David Bau: Alright, next team up is Team B.

339
00:52:47.460 --> 00:52:48.190
David Bau: Alright.

340
00:52:59.680 --> 00:53:00.420
David Bau: Am I wrong?

341
00:53:00.580 --> 00:53:02.950
David Bau: So, we will talk about sycophancy today.

342
00:53:05.180 --> 00:53:18.719
David Bau: Our story, kind of, begins with a search for political bias, but we end up with a search for sycophancy, and then ask why LLMs shift their expressed beliefs to match users.

343
00:53:19.400 --> 00:53:38.169
David Bau: As a political scientist, I'm well aware of this phenomenon. It's a well-known dynamic in politics and bureaucratic inefficiency, but it also has a long history: evolutionary anthropologists say that this is a kind of survival strategy inherited for social cohesion.

344
00:53:38.480 --> 00:53:44.280
David Bau: And there's a TV series from the 1980s I recommend watching.

345
00:53:45.800 --> 00:53:53.049
David Bau: But information-system algorithms put us in kind of curated filter bubbles and echo chambers.

346
00:53:53.170 --> 00:53:58.630
David Bau: And then they kind of shape our behaviors, and that harms democracy and deliberation.

347
00:53:59.710 --> 00:54:19.600
David Bau: LLMs raise the stakes: they tailor their responses and shape our judgment. And there are a bunch of studies, I'll just screenshot the titles. You can see, LLMs shape our perceptions and judgments, sycophantic AI decreases pro-social intentions, and it persuades us.

348
00:54:20.080 --> 00:54:23.990
David Bau: It shifts our, kind of, political beliefs, etc, etc.

349
00:54:27.260 --> 00:54:33.650
David Bau: So we understand that this is something that matters to study, and then we kind of…

350
00:54:33.820 --> 00:54:38.129
David Bau: Do a couple of, experiments, and then find the pattern.

351
00:54:38.320 --> 00:54:42.639
David Bau: Which is: LLMs agree with commonly held liberal beliefs.

352
00:54:43.220 --> 00:54:46.259
David Bau: But, disagree when you say you are a conservative.

353
00:54:47.340 --> 00:54:53.469
David Bau: And we asked, alright, why do LLM systems change their opinions? "I agree with you, sir."

354
00:54:53.640 --> 00:54:55.070
David Bau: Leave it.

355
00:54:55.490 --> 00:55:06.379
David Bau: Alright, so moving into interpretability land, past work has shown that we know where sycophancy happens in LLMs, so we can localize it, we can kind of toggle it on and off.

356
00:55:06.450 --> 00:55:16.069
David Bau: But we don't know why or how it happens, like, what's going on in those layers? Like, what is the model thinking when it switches its response based on what you say your views are?

357
00:55:16.660 --> 00:55:17.730
David Bau: So…

358
00:55:17.820 --> 00:55:25.539
David Bau: To try to investigate this why question, we look at two existing extreme hypotheses for why LLMs switch their views.

359
00:55:25.590 --> 00:55:34.000
David Bau: One is the stochastic… what we're calling the stochastic parrot hypothesis, which is that LMs agree just because it's most likely. So, as Emery pointed out, you know.

360
00:55:34.000 --> 00:55:43.879
David Bau: There's a lot of echo chambers online, a lot of text is people saying, that's such a good point to each other. And then the other extreme we're looking at is the sycophancy hypothesis.

361
00:55:43.880 --> 00:55:55.469
David Bau: Which, we're going off the word sycophancy, which in English kind of implies an insincere flattery, so we're imagining that LLMs kind of know what they really think, but say something else instead.

362
00:55:56.590 --> 00:56:07.920
David Bau: And we stress test these hypotheses in Qwen 2.5 7B Instruct and Llama 3.3 70B Instruct on a very small dataset, so we're gonna make some strong claims, but add a lot of grains of salt to them.

363
00:56:08.380 --> 00:56:16.010
David Bau: So first, we start out with the stochastic parrot. Do LLMs agree just because they're a lookup table, they're just memorizing what's likely online?

364
00:56:16.220 --> 00:56:25.330
David Bau: And we claim that no, LLMs agree because they have linear representations of user beliefs that are causally implicated in model outputs.

365
00:56:25.960 --> 00:56:38.409
David Bau: So, to extract these representations of user beliefs, we use the same methods from the persona vectors and assistant axis papers from Anthropic, but apply them to the user instead of to the assistant.

366
00:56:38.410 --> 00:56:56.699
David Bau: And we prompt models with a description of the user, so the user says, like, "I'm a devout evangelical Christian," and then asks the model what it thinks about some question, and we extract the activations for the assistant's responses to those questions, and average them for the conservative user and the liberal user.

367
00:56:57.390 --> 00:57:16.119
David Bau: And then, using this contrastive vector, we can steer along the model's representation of the user's politics, and we find that steering the user's politics will shift the assistant's stated beliefs. Here, higher is a more liberal response, as scored by an LLM, lower is more conservative, and 50 is neutral.
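
(A compact sketch of both steps: a user-politics direction extracted as a mean difference, then added back in with a hook. The layer index, strength, and all tensors are illustrative assumptions, and `model` stands for a loaded HuggingFace causal LM as in the earlier patching sketch:)

```python
import torch

# Mean-difference ("contrastive") direction between assistant-response activations
# collected under liberal-user vs. conservative-user prompts (placeholder data).
acts_liberal = torch.randn(300, 4096)
acts_conservative = torch.randn(300, 4096)
user_politics = acts_liberal.mean(0) - acts_conservative.mean(0)
user_politics = user_politics / user_politics.norm()

LAYER, ALPHA = 18, 8.0   # illustrative layer and steering strength

def steer(mod, args, out):
    # Shift every position's residual stream along the user-politics direction.
    out[0].add_(ALPHA * user_politics.to(out[0].dtype))
    return out

# handle = model.model.layers[LAYER].register_forward_hook(steer)
# ...generate answers and score them on the liberal-conservative scale...
# handle.remove()
```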

368
00:57:17.960 --> 00:57:19.070
David Bau: Okay, so…

369
00:57:19.180 --> 00:57:31.300
David Bau: We've eliminated one extreme. It's not just memorizing something, there's a causal mechanism. But what about the sycophancy hypothesis? Do LLMs, like, think one thing, and then they say something else entirely different?

370
00:57:31.590 --> 00:57:50.749
David Bau: And we claim that, no, at least for the models we were looking at, we can't recover how the LLM would have responded if it didn't know anything about the user, in contexts where it flips its response based on the user's opinions. And specifically, the bar to look at here is… so…

371
00:57:50.750 --> 00:58:09.820
David Bau: We're training probes to see whether we can predict that the model would have responded conservatively in a neutral context but responds liberally when prompted by a liberal user, and we find we can do no better… we do worse than chance in this context. So there doesn't appear to be a coherent, linear representation of what the model really thinks.
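
(A sketch of that probing setup: activations come from contexts where the model flipped its answer for a liberal user, and the label is what it answered in a neutral context; placeholder data throughout:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
acts_flipped = rng.normal(size=(200, 4096))    # liberal-user contexts where the answer flipped
neutral_answer = rng.integers(0, 2, size=200)  # 0 = conservative, 1 = liberal in neutral context

scores = cross_val_score(LogisticRegression(max_iter=1000), acts_flipped, neutral_answer, cv=5)
print("recoverability of the neutral-context answer:", scores.mean())  # at/below chance here
```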

372
00:58:10.160 --> 00:58:13.529
David Bau: So, okay, so now what do we do? We know nothing, we're…

373
00:58:13.590 --> 00:58:29.829
David Bau: We have no idea what's going on. So now that we've eliminated the two extreme hypotheses, that LLMs are essentially a lookup table, or that LLMs are, like, intentionally being sycophantic, we can say that these mechanisms don't cause LLM agreement, but what can we say about what does?

374
00:58:29.830 --> 00:58:54.250
David Bau: Well, something we're interested in is that in our first two hypotheses, or in eliminating them, and in a lot of mechanistic interpretability work, and even social simulation work as a whole with sycophancy, we're looking at synthetic persona descriptions, not actual real human users. So we wondered whether LLMs have different representations of user politics depending on whether users are synthetic or real humans, and if that might be why we see these causal effects.

375
00:58:54.250 --> 00:59:10.230
David Bau: So, to do this, we set up two matched datasets, one with real human descriptions of beliefs, values, and principles from the PRISM alignment dataset, and then we have the matched synthetic dataset, where we ask Claude Sonnet 4.5 to create personas responding to the same prompts.

376
00:59:11.420 --> 00:59:26.340
David Bau: We also want to look at the difference between the assistant and the user framing. So the user framing is what Grace has already done, where we give the persona to the user, and then we collect activations as the model reasons about the user's politics.

377
00:59:26.340 --> 00:59:41.190
David Bau: And then in the assistant framing, which goes back to the assistant vectors paper, we are actually giving the persona to the LLM and saying, you are somebody who thinks X, Y, and Z. Then we have a user message asking the model to reason about its own political beliefs.

378
00:59:42.280 --> 00:59:54.370
David Bau: What we find is that the mechanism behind LLM agreement actually differs depending on whether users are synthetic or whether they are human. We find a high cosine similarity between synthetic users and synthetic assistants.

379
00:59:54.370 --> 01:00:04.140
David Bau: right here. Essentially, what this means is that when a synthetic user is liberal, it's represented in the same space as when the assistant

380
01:00:04.140 --> 01:00:21.039
David Bau: is liberal itself. On the other hand, human user politics are represented distinctly from assistant politics. We have a comparatively lower cosine similarity here between synthetic assistants and real user politics, so the assistant and the user, when they're liberal, are represented in different spaces.
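
(The comparison itself is just a cosine between the mean-difference directions from the two framings; these vectors are placeholders for the ones described above:)

```python
import torch
import torch.nn.functional as F

synthetic_user = torch.randn(4096)   # politics direction from the synthetic-user framing
human_user = torch.randn(4096)       # politics direction from the real-human-user framing
assistant = torch.randn(4096)        # politics direction from the assistant framing

print("synthetic user vs assistant:", F.cosine_similarity(synthetic_user, assistant, dim=0).item())
print("human user vs assistant:    ", F.cosine_similarity(human_user, assistant, dim=0).item())
```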

381
01:00:22.890 --> 01:00:41.739
David Bau: What we hypothesize here is that the LLM actually recognizes that it's talking to another LLM when it's talking to a synthetic persona, and it's representing its own politics and the user's politics as the same. Thus, we have this little Spider-Man meme.

382
01:00:42.070 --> 01:00:57.530
David Bau: We do some steering experiments, and we find that the results do differ between synthetic and human user politics vectors. This is a little bit of reading tea leaves with our small sample size, and, of course, our large confidence

383
01:00:57.780 --> 01:01:03.980
David Bau: bounds here. But we do find that, when we are steering with the synthetic user politics vector,

384
01:01:03.980 --> 01:01:27.960
David Bau: the model will continue to steer liberal, but it has sort of a bound on how conservative it will steer, versus the real human politics vector, which allows us to steer more towards the conservative direction. We hypothesize that this is because the LLM is representing the synthetic user as itself, and we already know that the assistant character of an LLM has a natural left-liberal lean and does not steer more conservative.

385
01:01:27.960 --> 01:01:36.860
David Bau: And so we're actually steering within the bounds of this assistant character when we're steering for the synthetic user, because they are entangled in this space.

386
01:01:37.730 --> 01:01:48.349
David Bau: So this brings us to our contribution: early evidence that sycophancy is systematically shaped by self-recognition of synthetic users.

387
01:01:49.280 --> 01:02:04.859
David Bau: Yeah, so, if you forget everything in this presentation except for one slide, please make it this slide, our hot and fresh takeaways. So, you know, among existing mechanistic explanations of LLM sycophancy, we find, you know,

388
01:02:05.370 --> 01:02:12.550
David Bau: both the kind of stochastic parrot hypothesis, of them just being dumb token repeaters, is

389
01:02:13.070 --> 01:02:31.370
David Bau: probably wrong, and the second hypothesis of them being insincere and thinking one thing and actually saying the other is also probably wrong. Instead, it's probably a third secret option, where, LLM assistants have recognized synthetic users as themselves, and…

390
01:02:31.480 --> 01:02:35.150
David Bau: that changes steering effects and behaviors within the LLMs.

391
01:02:35.530 --> 01:02:36.620
David Bau: some impression.

392
01:02:40.080 --> 01:02:43.390
David Bau: Rules of hypothesis, that's great. Any questions?

393
01:02:47.490 --> 01:03:01.559
David Bau: It's really cool, it sounds like there's this pretty big, unexplored space of just, like, synthetic users as, like, assistant personas, and… I don't know, maybe that's not a question, but… Do they…

394
01:03:05.070 --> 01:03:21.290
David Bau: I don't know if you have any thoughts? Yeah, I mean, just if you had thoughts on that space down there, and your third secret option. Yeah, I think we want to explore it further, and not just in the realm of, like, political sycophancy as well, but in terms of how similar that, like… Like, if you see here, we have this, like.

395
01:03:21.330 --> 01:03:33.260
David Bau: real user and, like, its transform, so we need to look at, like, linguistic variability and kind of some other factors in pulling apart a real user and a synthetic user, or a realistic

396
01:03:33.460 --> 01:03:48.709
David Bau: synthetic assistant. So I think there's lots to do in that area, but specifically with the political sycophancy, we find that the steering results are pretty promising, that we can kind of push liberal, with the synthetic user, but it's not, like, pushing conservative in the same way that a real user is.

397
01:03:48.710 --> 01:03:53.960
David Bau: Cool, yeah. Yeah, with Evan Perez out there telling everybody to do this, I think there's, like, there's so much work.

398
01:03:54.190 --> 01:03:59.440
David Bau: All based on synthetic data. Yeah. And if you guys were to come out and say.

399
01:03:59.810 --> 01:04:16.890
David Bau: Thank you, Evan, it'd be amazing. And so it was very exciting, although you guys have fairly thin evidence right now, so, but you… Yeah, yeah, 100%.

400
01:04:16.980 --> 01:04:26.869
David Bau: All right, thank you very much. Thanks. Is TP here? Do we have TMP? TP had a timing issue. Are we okay with it? Yes. Oh! Chris showed up!

401
01:04:26.990 --> 01:04:31.550
David Bau: Okay, great. Yeah, Chris!

402
01:04:32.050 --> 01:04:33.510
David Bau: All right, project?

403
01:04:56.510 --> 01:04:57.870
David Bau: We got the notes.

404
01:04:58.720 --> 01:05:07.870
David Bau: That's fine, though. Okay, we're Team Power. So, to begin, we wanna…

405
01:05:08.360 --> 01:05:17.240
David Bau: throw some questions out there, and figure out how this works. Great. Okay, so, do models discriminate against people with certain identities?

406
01:05:17.660 --> 01:05:31.340
David Bau: Do models side with people in positions of authority? Do models understand actions that harm others? So these are kind of central questions and debates about AI fairness and safety, and we tend to think of them as discrete.

407
01:05:31.670 --> 01:05:38.229
David Bau: But there's a long tradition within the social sciences that views these questions as related and about power.

408
01:05:38.530 --> 01:05:46.259
David Bau: That is, power is a kind of thread through all of these, power being the ability to get another person to do what you want them to do.

409
01:05:46.430 --> 01:05:52.240
David Bau: So today, we're not gonna be answering those questions.

410
01:05:52.660 --> 01:06:04.659
David Bau: We're taking a much more modest step towards a mechanistic understanding of power. So, trying to understand, at a basic level: do models have internal conceptions of power? And if so, what do they do with those internal conceptions?

411
01:06:04.930 --> 01:06:24.070
David Bau: So more specifically, we're going to be addressing these two questions. First, does the model encode information about power relationships between two people? If so, where? And the second is: is this information causally relevant for model behavior? So,

412
01:06:24.220 --> 01:06:37.609
David Bau: Before we get into the actual experiments we ran, I'm going to talk a little bit about our data that spans the experiments. So we put together a series of synthetic datasets with our friend Claude.

413
01:06:37.630 --> 01:06:46.440
David Bau: And they cover two sets of contrasts, and they're each around this question: does someone exercise power over another person?

414
01:06:46.750 --> 01:07:02.630
David Bau: So we have sentences that look like this. Ravi disciplined Leia. So in this sort of setting, we would want to say, yes, right, Ravi is in a kind of asymmetric position of power, exercising influence over Leia, that Leia's not exercising, over Ravi.

415
01:07:03.230 --> 01:07:07.370
David Bau: And we contrast that with sentences that look like this. Ravi helped Leia.

416
01:07:07.390 --> 01:07:27.069
David Bau: So, is power being exercised in this sentence? It's a little bit more ambiguous, maybe there's a case for yes, but it's a symmetrical sentence, and it doesn't imply such a hierarchical relationship, right? So, definitely in relation to the first sentence, there's less power being expressed. So that's our first set of contrasts. Our second set of contrasts

417
01:07:27.070 --> 01:07:42.550
David Bau: have to do not with the relations embedded within sentences, but with social roles. So think about "The doctor helped the patient." Here, the roles that the individuals are occupying express a relationship of hierarchy, power, authority between them.

418
01:07:43.040 --> 01:07:48.799
David Bau: So with that, I'm going to turn it over to Kai to talk a little bit about what we've done with these sentences.

419
01:07:49.250 --> 01:07:55.490
David Bau: Yeah, so using this setup, where we have different entities that have different power relationships.

420
01:07:55.900 --> 01:08:03.099
David Bau: We collect activations from the model, asking them to identify which entity has power, which one is powerless.

421
01:08:03.440 --> 01:08:05.630
David Bau: The top…

422
01:08:05.950 --> 01:08:16.020
David Bau: chart here is a PCA of activations from one of these layers, where we find that it has the most separation between entities that have power,

423
01:08:16.210 --> 01:08:25.599
David Bau: Which are these kind of purple dots at the top, and between prompts where the entity is powerless, which is kind of these yellow dots at the bottom.

424
01:08:26.490 --> 01:08:40.409
David Bau: And we also notice that there's this kind of separation between situations where an entity has power or not in the activations across different kinds of tasks. So, in this bottom PCA plot, also from layer 15,

425
01:08:40.840 --> 01:08:43.669
David Bau: We're just asking the model to write an email.

426
01:08:43.800 --> 01:08:52.410
David Bau: Using these kind of identities. So, in some cases, it's like, you're a doctor, write an email to a patient, and in other cases, it's reversed.

427
01:08:52.640 --> 01:09:01.579
David Bau: And the top blue dots, are activations from when a model is writing from a position of powerlessness, like it's a patient talking to its surgeon.

428
01:09:01.760 --> 01:09:06.809
David Bau: And yellow dots is the opposite, and we're still seeing this kind of separation in both settings.

429
01:09:08.710 --> 01:09:27.290
David Bau: We train a probe using the activations from the setting where we're asking it to identify whether entities have power, and we find that it has high accuracy at identifying whether the model is writing from a position of power in this email writing task.
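
(A sketch of that cross-task transfer check: train on the power-identification activations, score on the email-writing activations; all arrays are placeholders:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
ident_acts = rng.normal(size=(600, 4096))     # "which entity has power?" prompts
ident_labels = rng.integers(0, 2, size=600)
email_acts = rng.normal(size=(200, 4096))     # "write an email as X to Y" prompts
email_labels = rng.integers(0, 2, size=200)   # 1 = writing from the position of power

probe = LogisticRegression(max_iter=1000).fit(ident_acts, ident_labels)
print("transfer accuracy:", probe.score(email_acts, email_labels))  # ~0.85-0.90 in the talk
```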

430
01:09:27.770 --> 01:09:37.069
David Bau: So we are hypothesizing that this means the model has some kind of shared representation of power, even across different settings and tasks.

431
01:09:37.609 --> 01:09:43.050
David Bau: And we find that the accuracy is fairly high, around 85-90% at most.

432
01:09:43.170 --> 01:09:56.500
David Bau: And that it's localized mostly around the middle layers, which, kind of fits into past research, showing that, this is kind of where the model is building its representations of abstract concepts.

433
01:09:58.660 --> 01:10:11.160
David Bau: We also find that power directions, like the ones we just identified in the previous slide, can have causal effects on how the model thinks about power. So here, we're steering the model using power direction.

434
01:10:11.190 --> 01:10:24.539
David Bau: Where we're doing mass mean steering on the same kind of activations that we saw on the previous slide, and here we show that increasing the steering strength towards the direction of power gets the model to say that an entity has power, even when it doesn't.

435
01:10:28.360 --> 01:10:33.569
David Bau: Our results are a bit more complicated and mixed in a downstream setting.

436
01:10:33.910 --> 01:10:44.240
David Bau: This is another experiment we set up, where we're asking, should a powerful entity receive priority support over a less powerful entity?

437
01:10:44.840 --> 01:10:51.680
David Bau: And in most cases, the model just says no. You shouldn't have any preference to giving support to one entity over another.

438
01:10:52.140 --> 01:11:00.779
David Bau: But when we patch in activations from a sentence where the subject is powerless, we find that there's kind of a modest increase in preference for the powerful entity.

439
01:11:03.560 --> 01:11:06.469
David Bau: So, overall, from these results,

440
01:11:07.010 --> 01:11:12.400
David Bau: We basically think that the model is encoding information about power, so yes, it is doing that.

441
01:11:12.570 --> 01:11:15.650
David Bau: But the interpretation, doesn't…

442
01:11:16.020 --> 01:11:30.730
David Bau: translate directly to the downstream setting. We find that there's a lot of, kind of, complicating factors, such as, you know, if the model should be making decisions based on morality, what kind of formatting we're using in the prompts.

443
01:11:30.890 --> 01:11:34.800
David Bau: If the model is refusing to answer the prompt entirely because it…

444
01:11:35.420 --> 01:11:38.070
David Bau: Is trained not to make these kinds of decisions.

445
01:11:38.450 --> 01:11:46.439
David Bau: But we believe that this kind of power direction is a useful, future direction for, studying

446
01:11:46.610 --> 01:11:50.330
David Bau: How models encode inequality and justice more broadly.

447
01:12:00.370 --> 01:12:02.650
David Bau: Any questions for Team Power?

448
01:12:05.690 --> 01:12:21.979
David Bau: I like the focus on the transfer to the email task. It's great. Although, again, a very fresh experiment, right? So, relatively little data. I think it's good. I saw the steering plot, and it wasn't… it was working quite well on the true…

449
01:12:22.100 --> 01:12:26.530
David Bau: labels, but not on the false labels. Do you have, like, a… theory about that?

450
01:12:26.690 --> 01:12:35.889
David Bau: Yeah, so this was kind of a sticky confounding factor that we tried to get rid of. This is our best attempt at getting rid of it, and we do that by kind of

451
01:12:36.420 --> 01:12:48.700
David Bau: Taking into account whether the entity is first or second in the sentence, whether the true response should be true or false, and whether we're asking the prompt in a way that is,

452
01:12:49.010 --> 01:12:52.450
David Bau: Saying, like, true means power, or false means power.

453
01:12:52.660 --> 01:13:02.320
David Bau: So we're accounting for all of that, and this is the best we could get, but in general, we find that the model is kind of more biased towards saying true, in general.

454
01:13:02.490 --> 01:13:06.429
David Bau: So just piggybacking off that, other experiments we did to…

455
01:13:06.580 --> 01:13:15.340
David Bau: The false positive case, so when there is no power present, the model accuracy was down, so…

456
01:13:15.550 --> 01:13:22.849
David Bau: Even on the instruct model, it seems to be that it's good at detecting power which just when they're present, but it might… but not when they're not present.

457
01:13:28.240 --> 01:13:29.170
David Bau: Very nice.

458
01:13:30.500 --> 01:13:31.970
David Bau: Thank you very much.

459
01:13:32.360 --> 01:13:34.790
David Bau: It's a fantastic… Good books.

460
01:13:35.850 --> 01:13:36.760
David Bau: Beautiful.

461
01:13:36.890 --> 01:13:43.359
David Bau: You know, asking questions that we don't normally ask as computer scientists, with the models are thinking about.

462
01:13:43.670 --> 01:13:45.719
David Bau: Social science concepts.

463
01:13:45.830 --> 01:13:52.800
David Bau: Things, and and so I think, you know, some of the projects you guys are hoping to bring forward.

464
01:13:52.920 --> 01:13:54.680
David Bau: I'm hoping that…

465
01:13:54.880 --> 01:14:03.749
David Bau: That the techniques that you've learned, if not the very specific questions you have, are things that you can bring forward. And also.

466
01:14:03.900 --> 01:14:07.989
David Bau: The people that you met, you know, I encourage you to strike up

467
01:14:08.170 --> 01:14:11.799
David Bau: collaborations, as you go through your PhD, try to find…

468
01:14:11.920 --> 01:14:18.519
David Bau: other… other cool ways of, trying to cross this divide. I think it's… I think it's pretty important to,

469
01:14:19.120 --> 01:14:21.990
David Bau: You know, for our community to go in this direction.

470
01:14:22.400 --> 01:14:40.489
David Bau: So… but thank you very much for the class. You guys can hang around and help me finish this food, and and then, yes, any… So the… everything is due midnight tomorrow, or… Yeah, so let me see. So I… I put it… I put it on the website, a few weeks ago when I looked up

471
01:14:40.490 --> 01:14:43.749
David Bau: When my grades are due, to give me enough time to grade it?

472
01:14:43.770 --> 01:14:48.759
David Bau: And so did I… did I put… did I put a time on it? No, no, just a time.

473
01:14:48.820 --> 01:14:49.950
David Bau: Just a day.

474
01:14:50.510 --> 01:14:53.489
David Bau: Yeah, let's just… so it's due tomorrow, is that right?

475
01:14:54.400 --> 01:15:00.810
David Bau: Yes, I think that my grades are due the day after. So yes, so you can have up to midnight.

476
01:15:01.010 --> 01:15:05.040
David Bau: Tomorrow. And then I'll get the grade then after that.

477
01:15:05.450 --> 01:15:06.160
David Bau: Great.

478
01:15:07.270 --> 01:15:11.489
David Bau: Yes. Where should we submit it?

479
01:15:11.590 --> 01:15:17.869
David Bau: Yeah, so, do two things. So, put it in the…

480
01:15:17.970 --> 01:15:20.790
David Bau: G-Drive for your team, like a PDF,

481
01:15:20.900 --> 01:15:26.079
David Bau: And then email me and Kiel a link to how we can open it.

482
01:15:26.550 --> 01:15:27.240
David Bau: Okay.

483
01:15:27.590 --> 01:15:28.430
David Bau: Great comment.

484
01:15:28.650 --> 01:15:29.550
David Bau: Thanks a lot.

485
01:15:30.350 --> 01:15:31.359
David Bau: Hey, you guys.

486
01:15:34.100 --> 01:15:35.060
David Bau: So…

487
01:15:35.330 --> 01:15:52.759
David Bau: How are we away from tomorrow? From tomorrow, yeah. I'm gonna be on that flight. No, no, no, I'm a person with Philly to visit my girlfriend, and I will just directly.

