13 May 2025
•
Emmy Liu

I’m a great believer in small habits that make a difference. One that I’ve recently discovered is taking my laptop and working outdoors, either from a park or from my rooftop patio. Above: the view from my rooftop at around 8pm. It may look a little gloomy on camera, but in person it’s a gorgeous day, with a light breeze. I can see dark clouds gathering to the south, while in the north over the crest of a mountain I can see the last of the sunset. I can hear the sounds of birds chattering all around me along with the typical urban white noise of passing cars and pedestrians on the road below.
I recently got a laptop with a matte screen, and though it was pretty expensive, I think being able to work outside without significant glare will more than pay off over time. I’ve already tried reading some papers outside, taking meetings outside, as well as writing outside (for instance, for this blog post). In all cases, I believe my experience was significantly improved by working outdoors (though one time my laptop started overheating so I had to go back inside).
To state the obvious, going outside is something that correlates with a better mood for almost everyone. But sometimes it’s hard to make time to do so. Even though I try to take walks outside most days and also walk to the office every day, practical realities mean that I have to be looking at a screen indoors most of the time. This means that despite getting quite a bit of outdoor time relative to the typical CS PhD student (ok I know this is a low bar), this might only amount to an hour or two at the beginning and end of my day.
Now whenever I’m working from home and feel like it, I go up to my rooftop patio or walk to an outdoor garden nearby to do some mundane work. I usually prefer to use 2 or more screens for more intensive work and can’t really work from my laptop by itself, but I’ve basically started doing my more “chill” work outdoors. Speaking of chill, I’ve also noticed it’s much harder to get aggravated or in my own head when I’m working outdoors and (sometimes literally) touching grass. Things just don’t seem very important anymore, but at the same time things being unimportant means that I just do them without thinking too much about it. For instance, I wrote this blog post in a 20-minute break, whereas normally I can sit on ideas for posts for days. Maybe it’s also because being outside makes me more aware of the passage of time: since I began writing this post, the wind has picked up and it’s much colder, while the light continues to fade. Lights are turning on in the city around me and if I wait longer, it’ll be completely dark on the rooftop.
This was a rather simple post, but the point is I think this is something that you should try if you haven’t yet. If your laptop doesn’t have a matte screen, this may be kind of difficult, but there are many laptop shades available online that you can use. It’s also possible to look for a structure with overhanging roofs, awnings etc, like the outside tables of coffee shops. I would have never attempted this before because of glare, but knowing that this is such a great experience now, if I was forced to go back to my old laptop I’d probably buy some equipment just to make this more feasible. If you try it out, also email me and send me a pic of where you’re working from!

25 Apr 2025
•
Emmy Liu
Recently, I saw news that an AI-agent generated paper was accepted to an ICLR workshop. I’ve been interested in this topic for a while, and some master’s students I’m working with are currently building a benchmark for end-to-end scientific reasoning in LMs (from idea generation to coding/execution), so I was curious to read the paper. I’m not actually skeptical that LM-based agents can eventually automate parts of research or serve as assistants in many aspects of research. In fact, I often ask LLMs to fetch literature related to research ideas, draw plots, critique ideas, and more. If you haven’t tried this yet, you should! Sometimes it’s not very helpful, but the LMs tend to call every idea you pitch brilliant and innovative, which is very good for building confidence (NOTE: this is not referring to gpt-4o’s recent update, which verges on sycophantic). I explain this to say that I wasn’t looking for flaws at all, and was rather thinking about how this particular system could be benchmarked.
The issue is: the paper is completely terrible. It’s actually quite impressive how bad it is given the strict page limit, as it manages to be bad in multiple different and conflicting ways even within 5 pages. This is the type of paper I wish I got more of as a reviewer – in the sense that it’s more fun to write 1-star or 5-star ratings compared to 3-star ratings. I’ve attached the annotated version to this post, but I’ll just review a few of the major problems here:
Critiquing the paper (or: the fastest review I’ve ever written)
Attached is a PDF of comments: I review papers in exactly the same way (though I spend much more time and leave more comments for real papers). The comments in red are from Sakana AI, while the little speech bubbles (mouseover) are my comments.
You can read through the remainder of my comments, but the core idea itself just basically doesn’t make sense. The idea is to encourage compositional reasoning in language models (LSTM in this case) by using the following penalty, where $T$ is the sequence length and $h_t$ is the hidden state at time $t$:
\[L_{\text{comp}} = \frac{1}{T - 1} \sum_{t=1}^{T - 1} \left\lVert h_{t + 1} - h_{t}\right\rVert^2\]
It is actually unclear whether $h_t$ is the embedding or the hidden state: the paper seems to refer to it pretty clearly as the hidden state, but the Sakana comment mentions that this is a typo and that it should actually be the embeddings referenced in this section. This embedding idea actually makes even less sense, so I’ll ignore it. If the embedding idea is secretly genius, feel free to email me about how it could possibly work, but it honestly destroyed multiple brain cells for me to contemplate this idea (I need those! I haven’t got many!) so I would prefer not to think about it further.
If you look at that penalty and wonder how it would actually produce compositional reasoning, you would be right…in the “hidden states” interpretation this maybe makes sense in some circumstances: in non-compositional expressions, the hidden state can potentially change a lot in some cases (e.g. in an idiom like ‘raining cats and dogs’ where the meaning isn’t derived from the individual words but requires a sudden shift to the idiomatic interpretation after processing the end of this text chunk). If we enforce that the hidden state shouldn’t change a lot, it would prevent these types of sudden shifts after reading tokens. However, this in itself does not promote compositional reasoning. It’s pretty clear that a constant hidden state would satisfy this, yet this goes against the very essence of compositionality, which is roughly that “an output should be a specified function of the inputs”. If you then look at all the other issues in the paper such as figures not matching what the text is claiming, missing references and so on, this really isn’t any good.
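To make the constant-hidden-state point concrete, here is a minimal sketch of the penalty in plain Python. This is my own illustration, not the paper’s code: the function name `comp_penalty`, the toy sizes `T` and `d`, and the random vectors are all assumptions chosen just for the example.

```python
import random

def comp_penalty(hidden_states):
    """Mean squared L2 distance between consecutive hidden states.

    hidden_states is a list of T vectors (lists of floats), one per time step.
    This mirrors L_comp = 1/(T-1) * sum_t ||h_{t+1} - h_t||^2.
    """
    T = len(hidden_states)
    total = 0.0
    for h_t, h_next in zip(hidden_states, hidden_states[1:]):
        total += sum((b - a) ** 2 for a, b in zip(h_t, h_next))
    return total / (T - 1)

random.seed(0)
T, d = 6, 4  # toy sequence length and hidden size (hypothetical values)

# A "model" whose hidden state ignores the input entirely: same vector at every step.
h = [random.gauss(0, 1) for _ in range(d)]
constant_states = [list(h) for _ in range(T)]

# A "model" whose hidden state actually changes from token to token.
varying_states = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]

constant_penalty = comp_penalty(constant_states)
varying_penalty = comp_penalty(varying_states)
print(constant_penalty)  # 0.0: the penalty is perfectly minimized
print(varying_penalty)   # strictly positive
```

The degenerate model that never looks at its input gets the best possible score, so minimizing this term on its own says nothing about whether the output is a function of the parts.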
Final score: 2/10. I gave this higher than a 1 because it at least vaguely is in the form of a paper, with an abstract, different sections, some experiments, and a figure or two.
So how did this get past review?
Finally, I’d like to emphasize that this is not a “deep dive” into the paper nor am I nitpicking. These are very obvious flaws that were easy to find – it took probably 5 min to read up to section 3, being very generous. Sakana’s review of the paper oddly focuses on minor details, but there’s no need to look at the trees if the forest is burning down so to speak. So the question should be: how did this get through review?
The first answer is: this is a workshop, and those tend to have lower standards. That’s true, but ideally workshops shouldn’t admit work that’s clearly extremely flawed or low-effort either. As a heuristic, the reviewers shouldn’t be spending more work reviewing a paper than the authors did writing it, or as a co-chair put it succinctly, “you can’t just submit any random 5 page document and expect people to spend their valuable time reviewing it”. It seems like some organizations (such as those where high schoolers pay to get research experience) are already submitting their papers en masse to workshops, and I hope AI scientist startups or less scrupulous individuals don’t also start doing this (cope).
I think a deeper issue is reviewer overload and the low-effort reviews that follow from it. This is hopefully an uncontroversial opinion, but in the NLP/ML community (which I’m familiar with), peer review is extremely stochastic. You can often scrape by with a bad paper (as in this case) if you get three reviewers that just say something like “looks good to me” (without reading the paper), or conversely get rejected if you get three reviewers that nitpick on inconsequential/mistaken points (without reading your rebuttal).
There’s no clear solution to this, and I don’t see the peer review situation getting better due to the rapidly increasing number of papers submitted each year. Having low acceptance rates and stochasticity in reviewing also incentivizes authors to submit many papers and directly resubmit rejected papers to the next conference deadline until they get accepted, creating an even bigger backlog.
Ironically, LLM reviewers might actually be above average human level, but for the wrong reasons. Not to say that we should just replace reviewers with LLMs either, that would be a bit too dystopian for now.
Until models can actually conduct end-to-end science themselves and trigger paradigm shifts, I worry about the spam they’ll release on a human audience. Human reviewers are already overloaded with human-written papers, and they don’t have the capacity to review random 5-10 page documents generated en masse by people just wanting to attach their names to a publication. I’ve seen similar spam from people using LLMs to generate books, but at least these books aren’t forced on readers.
I do think that LMs have a lot of potential to help us in solving scientific problems, and I already use them to help me with many tasks. However, there’s definitely going to be a huge adjustment period until then, and scaling up reviewer count and quality (or creating really good AI reviewers before AI scientists) is going to be a major challenge.
27 Mar 2024
•
Emmy Liu
(Before anyone who knows me says “wait a minute, I’ve seen you post things on Twitter!”, I have occasionally posted and reposted papers for work reasons more or less, but for more or less a year, I haven’t opened Twitter to browse through new posts at all. This is what I mean by “I didn’t check Twitter for a year”.)
I am also not a Twitter personality or famous for my work. If I were, the tradeoffs would change a bit and I might never have made this post. I’m speaking as just an ordinary researcher who uses Twitter to post my work/friends’ works and to keep up with the literature, which I think describes a large segment of students and researchers. If you also fit in this demographic, the approach I outline may work for you.
TL;DR: Experiencing the typical mid-PhD ennui, I lose interest in keeping up with Twitter and stuff in general, and surprisingly I find that I don’t really miss it. Some more thoughts on a slower way to read literature and think of ideas.
The requisite existential crisis
Feel free to skip this section if you don’t feel like reading about my personal circumstances or general moping. To be honest, I didn’t really want to write this section either, but I couldn’t think of a way to honestly present my motivations for this personal experiment without giving a bit of background context on my life. Again, I wouldn’t mind (would even prefer maybe?) if you skip this section, but who knows, maybe some might also find this useful/relatable?
I haven’t really been productive in terms of research this year. At least for the semester of the academic year, I didn’t really publish any first-author papers or have any ideas that I was really excited about. I had some co-authored papers that I was either working on or mentoring others on, but to be honest, I don’t think my heart was really in any of those projects either.
From this perspective, every time I opened Twitter, it was like the universe was laughing at me – Everyone is publishing high-impact research except for you! Enjoy graduating in 7+ years, if you even can! I usually get somewhat excited by new ideas, but over time, the emotional cost of looking at my feed started to outweigh the reward for me, so I started avoiding Twitter, or even discussing new ideas with people in real life. I wouldn’t recommend the second half of the previous sentence, though as you can guess from the title, I don’t really miss Twitter.
Second, I also got mono last semester, which reduced my productivity even more and restricted my working hours. I usually started feeling extremely tired by 8pm and had to sleep, which isn’t great when there were many days when I would have research or classes straight from 9am-5pm. I did what I could basically, and that was the projects I already started and felt obligated to work on. But this also fed into the first point in this section and further strengthened the assertion that I was going nowhere fast.
Basically, all of this combined with the fact that I wanted to pivot my research (and also needed to potentially link all of my past research together in a proposal soon) created conditions in which I had no time to really scroll Twitter/read papers there, and also didn’t have any incentive to do so even if I had time.
My claim is that for people that are non-famous and mostly using Twitter to keep up with research or occasionally post their own, Twitter can almost entirely be replaced by different, and possibly better alternatives. Let’s examine some different benefits and alternatives one-by-one:
Keeping up with the literature
- Finding things related to your research projects
- Finding things unrelated to your research projects
For finding things related to my own research projects, I would usually either be doing an active literature review anyway, or (more commonly) others would post related papers on the project channel or DM me directly. One could object that I simply outsourced my reading to others, and you’d be correct, but it’s a somewhat natural process to pass papers around, and I did this for others as well when I found papers related to their work.
For unrelated research, the process above is a little weaker (since people might not directly show you something unrelated to your research), but the natural diffusion process still worked well enough I suppose. I think for people truly concerned about keeping up with the latest research (even if unrelated), looking at arxiv every day or setting many google scholar alerts is probably the way to go.
OK, I did do some self promotion on Twitter, and I concede that it’s hard to do this otherwise. I would usually draft tweet threads in a google doc, then use an automated service (I used Buffer) to post all the tweets in the thread at once.
For building a “brand”/following, I’ve never really wanted to do that to be honest. In fact I think I try to cultivate an air of neutrality and boringness on Twitter. I admire people that have a lot of clever quips and can engage in (reasonable) debate on Twitter, but I’ve never quite been able to grasp how to do that myself. I default to either writing in long-form or writing somewhat trite reflections/endorsements (“Was so interesting to work on this paper!”, “Check out our poster at Session 2 on Wednesday!”). I don’t think it’s exactly insincere, but I also can’t figure out a more sincere projection of my personality down to 280 characters most of the time. I would venture that this is true for many others as well. And yes, now that I admitted to being boring, you can unfollow me if you want, but hopefully the content of my posts is interesting enough to stick around for some?
Meeting new people
Like I said, I’m pretty boring on Twitter so I haven’t exactly met many new people through the platform directly, though I have used it to message people I kind of know at conferences to meet up and hang out. I don’t really have much to say on this, though I do meet a fair number of people through email (the boomer way?) or in-person (the best way I think).
Having fun/feeling productive
Back when I used to go on Twitter more often as a first/second year grad student, I often got this expansive and productive feeling after scrolling through papers for a while: “wow, there are so many interesting papers out there! I learned so much!”. It may have been true in some sense, in that I didn’t know that X number of papers existed before, and afterwards I knew they existed, but I doubt that was the most efficient way to achieve this goal (that would probably be looking through arxiv or creating a ritual of scanning conferences for interesting work). If I’m being honest, a lot of the time also wasn’t spent on high-mindedly contemplating research, but on consuming opinions on the latest drama, or miscellaneous fun facts. For drama and fun facts…well, there are also other websites that serve those purposes more directly. It’s easy to feel productive while in reality you’re just gathering vibes.
Slow research
(Anyone who’s possibly going to fund me for anything/give me awards, please skip this paragraph thanks!)
There’s another component to this as well, which is that I really would prefer to work more slowly on things that take a long time. Although I may have published a fair bit in my first few years and even in undergrad, I always doubted how much of a contribution I was really making. Of course, there were some extenuating circumstances as well that prevented me from spending as much time on improving my skills as I would’ve liked to, but if I’m being honest, it was also just easier to stay at the same rough technical level at which I produced my first papers. After all, the publication cycle is short, and it’s easy to feel like you’re falling behind if you’re not working on more and more things, when everyone seems to be working on and publishing so much. I’m not saying that every research paper has to be a huge leap from the ones before, as that imposes a different kind of pressure, but personally, in hindsight I would have liked to do less, but do better. If I could redo things, I probably wouldn’t have published several of my papers, and would instead have used that time to do more ambitious work, help others more, or just sit down with a good RPG, which I haven’t done in a while….
I have some interesting projects that will hopefully come out in the next few months, but personally, I really appreciated having time to reflect on the types of research I want to pursue in the future and to start setting my own agenda. I think there’s a certain type of thinking and agenda-setting you can only really do when you’re detached from what others are thinking and the constant stream of information about what’s popular in the moment. There’s a danger to being too detached as well, but I would wager that most of us are more likely to get caught up in the collective and lose our own guiding tastes.
The purpose of this blog
I realized this year that I do have a lot to communicate (outside of research papers), but that Twitter isn’t necessarily the right format for me to do that. I hope to make some informative posts here about research topics I learn about, but also to talk about some meta-aspects of research, as well as random other stuff maybe. That’s a pretty broad mandate, but I guess it’s my blog and I can do whatever??
If you’ve read this far, it probably seems strange for me to now say that I’m returning to Twitter after listing out all the reasons why I’ve found it unnecessary for a year. You might even be wondering why you bothered reading this blog post in the first place if the author is so capricious that they can go back on their conclusion from three paragraphs ago.
I don’t think Twitter is useless either, and I think for the more POST-like applications (as opposed to GET-like applications, for lack of better terms), there’s not really a clear substitute right now. After detaching for a while, I do want to re-enter the online community more mindfully as well. The point I wanted to make is that it’s far from necessary to be active on or check Twitter if you’re a researcher, and there may be some benefits to disengaging as well. There may be benefits to posting and checking out highlights occasionally, but what are the real benefits to scrolling through the feed every day, as opposed to keeping up with papers in other ways? This is probably a question that we should all ask ourselves.