Wokeness Inequity Is Destroying America

Did you know that the top 1% are responsible for over 90% of all woke tweets, posts, and tiktoks, while the next 10% account for only 9%? In the frenetic post-memetic socio-capitalist wokeonomy of today, the woke are getting wokier, while the mass social karmically-poor are struggling to virtue signal fast enough to keep up with ever-accelerating outrage inflation.

In this post, we discuss the causes and historic drivers of wokeness inequity in America, its effects, and possible solutions.

A Sustained Pattern

America experienced tremendous, historical woke growth in the past few decades. Total Domestic Wokeness (TDW) grew by well over 80% year-over-year on a sustained basis in the decade from 2010 to 2020, and America's share of global wokeness grew from 6% to 73% during that same decade. A record 16,500 topics were eliminated from the Overton window, according to research done by UNIWPC.

Despite this abundance of wokeness, more sheeple than ever are falling below the social karma poverty line -- 10 likes/retweets/creepy DMs per day. The reason for this is simple: the increase in wokeness has not been distributed equitably. The average wokeness per woking household has stayed roughly flat since 1970 in real terms (that is, after adjustment for outrage inflation), and what gains do exist have accrued to the top wokers.

Despite calls from top wokers that the bottom 90% should "just woke themselves up by their bootstraps", the political reality is that it is often impossible for woking Americans to accumulate any amount of wokapital. In 1970, a household needed to espouse only an average of 2.1 woke ideas to be in the top quintile, whereas today, even 20 woke memes would barely land that same household outside the lowest quintile.

Wokeness Inequity Affects Everyone

As a result of this, the woke and the woke-nots are often at odds with each other. Toxic, weaponized wokeness is increasingly common, leading to intolerance and brutality against the less woke. Those who self-identify as unwoke receive proportionally far less representation amongst the college educated, in high-income vocations, and in political office. This results in less diversity in the workplace and in public discourse.

More perniciously, these effects are often inter-generational. Compared with just one generation ago, Americans are far less likely to achieve wokeness mobility. What the woke fail to account for is to what degree their wokeness is impacted by woke privilege. By far, the largest predictor of an individual's wokeness later in life is the socio-economic class of their parents, estimates George Georgeson, the director of the Center For Wokeness Equity, who was not consulted for this article because he does not exist.

How You Can Help

If you would like to see more woke people in the world, the most effective thing you can do today (besides donating to the Woke Institute For A Better America by paypalling this address) is to become more bigoted. The simple truth about wokeness is that it is a finite resource. Wokeonomist magazine projects only 1% total global wokeness growth during the next decade, and a survey by the same found that 78% of analysts predict peak wokeness within the next 3 years, with a significant fraction even going as far as asserting we have already hit peak wokeness.

In other words, there is only so much wokeness to go around, so to get more people woke, you have to leave some outrage for others to pick up. Even better if you can produce more than you consume. George Georgeson recommends incorporating some of the following into your daily routine:

  • use a word that sounds suspiciously like but isn't actually a racial slur

  • mansplain something to a coworker (anyone can do this regardless of gender)

  • consume products whose production necessitates and perpetuates systemic social injustice, cruelty, and malice, such as any electronics, any meat, or any media

If wokeness equity is important to you, you owe it to yourself and to all the woke-poor to increase bigotry in the world by following these simple steps before the wokeness equity police cancels you. Please join me and the Woke Institute For A Better America; together, we can end wokeness inequity by making everyone a bigot.

The Big Short Squeeze: An Account Of The Gamestop Saga

Gamestop (the stock) had a big week last week. Somehow, the story of a bunch of (self-proclaimed) retards throwing away their life savings because they saw a bunch of rocket emojis became the underdog story of retail investors sticking it to hedge funds/the establishment by beating them at their own game, and the establishment retaliating via underhanded tactics negotiated in secret backdoor meetings. Accusations of market manipulation have been thrown around on both sides, and politicians, billionaires, and other scoundrels have jumped in to cash in on the attention.

I'm not sure that that narrative is really "correct" (whatever correct means -- let's say, "grounded in reality?"), so I will just note some things from my perspective for posterity. I'm writing mostly for people who do not have a super detailed understanding of how the markets work (like myself), so I'll begin with some basics for context.

Short Squeeze

A short squeeze consists of two things: a short (many shorts) and a squeeze (many squeezes).

Short

  • Shorting/short selling means selling a stock when you do not own it.

  • This is done by borrowing the stock from someone who does own it, and then selling it the same way you sell normal stocks.

  • The short seller pays interest on the market value of the stock borrowed, and has an obligation to return the stock at some later time by buying a stock to close the position (cover). There are usually also collateral requirements.

  • Unlike options, there is no time limit to when the position needs to be closed. However, brokerages can close your position for you if they get nervous (e.g., if the stock goes up a lot and the brokerage does not think you will be able to buy back the stock in the future).
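
The mechanics in the bullets above can be sketched as a toy P&L calculation. All the numbers below are made up, and this ignores collateral entirely; real borrow fees also accrue on the daily mark, not just the entry price:

```python
def short_pnl(sell_price, cover_price, shares, borrow_rate_annual, days_held):
    """Toy short-sale P&L: you profit if the stock falls, minus the
    interest paid on the borrowed shares. Simplified: interest is
    charged on the initial market value only."""
    gross = (sell_price - cover_price) * shares
    interest = sell_price * shares * borrow_rate_annual * days_held / 365
    return gross - interest

# Short 100 shares at $20, cover at $15, 10% annual borrow fee, 30 days:
# gross = $500, interest ~ $16.44, net ~ $483.56
print(round(short_pnl(20, 15, 100, 0.10, 30), 2))
```

Note the asymmetry this makes obvious: the most you can make is the full sale price (the stock goes to zero), but the losses if the stock goes up are unbounded.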

I like to think of short selling as synthetically creating a new stock-antistock pair from the ether (the ledgers of the financial systems or whatever) using energy (money). Basically, if there is a lot of demand for a stock (prices are high), then enterprising individuals who think a profit can be made can just artificially create a stock out of nothing to meet that demand, but that process also unavoidably creates an equal and opposite obligation to return that stock (because of some kind of "law of conservation of stonks"). A stock and antistock can then be annihilated by bringing them together (closing the short position) to release money. Alternatively, you can think of short selling as time traveling: the short seller buys the stock at some future date, and the stock travels back in time to allow the short seller to sell it today. These analogies are not particularly useful or elucidating, I'm just rambling to meet the word count requirement that my editor gave me on this post on my own personal blog that I completely own.

If the short interest (the total number of shorts sold) is high, it creates a volatile situation where many shares are owed compared to the number of shares available to sell. The short interest is often compared to the total number of shares issued or the public float (of all the stocks for a particular company that exist, only some are actually available for purchase on the public market, others are locked away via ownership by insiders) as a percentage, but this is somewhat misleading. (For simplicity, I'll stick with the total number of shares issued.) If the short interest (X) is higher than the total number of shares issued (Y), it does not imply that it's impossible for the short sellers to cover, because the total number of shares owned is actually X + Y (the short sellers created/sold new shares to new owners).

Still, though, the higher that ratio is, the harder it is for short sellers to unwind their position.
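
To make the X/Y accounting concrete, here's a sketch. The 70mm/98mm figures are for illustration only (roughly the ballpark that was being reported for GME), not exact numbers:

```python
def ownership_accounting(shares_issued, shares_shorted):
    """Every short sale creates a new long position without creating a
    new share, so total long positions = issued + shorted. Short
    interest can therefore exceed 100% without covering being
    mathematically impossible."""
    total_long = shares_issued + shares_shorted
    short_interest_pct = 100 * shares_shorted / shares_issued
    return total_long, short_interest_pct

# 70mm shares issued, 98mm shorted: 140% short interest,
# but 168mm shares are held long, so shorts can still buy to cover.
print(ownership_accounting(70_000_000, 98_000_000))  # (168000000, 140.0)
```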

Squeeze

In a volatile situation where the short interest is high, an increase in the price of the stock can lead to a run-away process where price increase begets further price increase. The initial catalyst could be anything, like some sudden change in the company's outlook, or the short sellers collectively realizing they are screwed. As the price increases, the collateral requirements and interest rate for the short sellers increase also, so some of them give up and begin buying the stock to close out their position (at a loss), which drives up the price even more. This chain reaction can result in the stock price temporarily increasing by several multiples.
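
You can caricature that chain reaction in a few lines. To be clear, this is a toy model, not a market simulation -- the pain threshold and the per-cover price impact are numbers I made up:

```python
def toy_squeeze(price, short_entries, pain=1.5, impact=0.05):
    """Toy feedback loop: a short capitulates once the price reaches
    `pain` times their entry; each cover bumps the price by `impact`,
    which can push further shorts past their own threshold."""
    remaining = sorted(short_entries)
    history = [price]
    while True:
        capitulating = [e for e in remaining if price >= e * pain]
        if not capitulating:
            break
        remaining = [e for e in remaining if price < e * pain]
        price *= (1 + impact) ** len(capitulating)  # covering buys push the price up
        history.append(round(price, 2))
    return history, len(remaining)

# Five shorts with entries from $10 to $18; price starts at $20.
print(toy_squeeze(20, [10, 12, 14, 16, 18]))  # ([20, 22.05, 23.15], 2)
```

Even in this cartoon version you can see the key dynamic: the cascade feeds on itself for a while and then stalls once the remaining shorts are far enough from their pain threshold.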

Gamestonk

So the short interest for Gamestop had been high for some time, exceeding 100% of total number of shares in existence. This by itself doesn't create a short squeeze: if everyone agrees that the company/stock is a piece of shit and the shorts are justified, then the price continues being where it is or declining (giving the short sellers a profit). But a few things happened:

The Bull Case

Ryan Cohen's thesis is that Gamestop should pivot away from the declining, unexciting business of being a traditional brick-and-mortar retailer that is its legacy and instead build out a growing, glamorous, cash-printing digital experience/ecosystem as the core of its business going forward. This would reflect the reality of where gamers/dollars are going today. Whether this can be pulled off remains to be seen, but Cohen has the right experience/credentials to get investors/management hard about the potential of him leading such a transformation, and his board appointment indicates Gamestop's willingness to move in that direction. It's not out of the question to think that Cohen may become the CEO of Gamestop in the near future. Regardless, Gamestop seems committed to the strategy he outlined.

So, there is some plausible case to be made that Gamestop's stock is undervalued and may be worth more in the future.

The Basket Case

Wallstreetbets is a subforum of Reddit where people go to lose brain cells. Well, first you lose the brain cells, then you lose your life savings. I mean, it's the internet, so you can never believe what people say... but almost certainly some of these people actually are maxing out credit cards to jump into meme stonks (and I support their right to be retarded). Still, theatre or not, it's pretty entertaining to read, and occasionally someone knowledgeable comments and you could learn something about the stock market (but... mostly just memes).

WSB isn't some kind of cabal of conspirators (they don't have enough collective brain cells to organize a conspiracy) -- it's more like a headless horde of zombies (zombies of below average mental faculties). Sometimes things become a meme and sometimes they don't. For whatever reason, GME captured the WSB collective to become the meme of the month. Of course, it has since blown up in mainstream media due to the stock price 10xing over the course of a few days and also the narrative of retailers vs hedge funds, but before then it was just something a relatively obscure corner of the internet made gifs and dumb jokes about, and before then it was just something that a few members talked about. One of the (the?) first redditors to start buying GME was DFV, and he was derided for being an idiot before everyone eventually came around and piled in on the stock.

Why did people start piling in? Probably because eventually GME started trending up. But it probably wasn't WSB that caused this -- as late as July 2020 DFV was still posting loss porn on GME (and getting called an idiot), but it turned into profit porn in August, maybe because of Ryan Cohen's actions. It seems that at that point GME reached a tipping point and more and more people began to buy into the bull thesis. At that point, WSB served like something of an amplifier. It gave the idea more visibility so more and more people became aware of the situation and an increasing fraction of them decided to jump on the bandwagon, for lulz, FOMO, personal gain, or whatever. But not, at least at this point, to screw over hedge funds.

So the first point I wanted to make is that WSB did not collectively decide to do anything. People were in disagreement until some kind of tipping point arrived and then groupthink became stronger and stronger. The second point is that WSB did not collectively decide to jump in on GME to fuck the hedge funds. The GME hype train was well in motion before the anti-hedge fund undertone became the headline. If you read those earlier posts, nobody is talking about Melvin or Shitron (some of the funds/people that shorted Gamestop). But today, you can't wade into WSB without bumping into a post or comment that talks about holding the line or make the shorts bleed or whatever.

In other words, I don't think WSB "caused" or was the initial catalyst for the Gamestop short squeeze (if any -- the situation is developing), but as an online watering hole, it allowed people to exchange ideas and was instrumental in building up the Gamestop situation to where it is today. So,

  1. Technology is enabling more people today to become more knowledgeable about and participate in investing and the stock market. This is driven by mass adoption of investment platforms like Robinhood and Webull, 0 commission trades, and the ease of looking up information online.

  2. Given 1, the dynamics of how stock trends play out are more affected by group dynamics of online social networks, which is very different from before, because people are more connected, anonymous, etc. (groupthink/echo chamber effects seem stronger)

  3. Retailers vs the establishment is more of a post-hoc rationalization than a driving force in building up the Gamestop phenomenon to where it is today.

Effect Of WSB On The Real World

Evidently, WSB has had a big effect on the media, but it's not clear to me that it had or has that much effect on the actual price of stocks. WSB reached 2 million subscribers sometime in the week of 2021-01-18 (MLK day); today (2021-01-30) it has 7 million. Of 2mm subs, let's say 10% actually held GME stock (today it might be higher, but I think even 20% is generous for pre-2021 -- also, 2mm is how many people joined; the daily active user count will be much lower), and maybe the average number of shares per person is 10 (some people do take large positions, but there are also a lot of students/BA-degree-holders), so 2mm shares out of 70mm. It's not nothing (around Burry's share), but I don't think it's enough to start a short squeeze. This is just conjecture, but I'd guess that other hedge funds or investors (like Burry/Cohen) are more likely responsible for having driven the initial ramp up (perhaps after having been alerted by/observed the action on WSB).
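
Spelling that back-of-envelope out (every input is a guess, per the above):

```python
# All inputs are guesses: 2mm subscribers, ~10% holding, ~10 shares each,
# against ~70mm shares outstanding.
subscribers = 2_000_000
holders = subscribers * 0.10            # ~200k people
shares_held = holders * 10              # ~2mm shares
shares_outstanding = 70_000_000
print(f"{shares_held:.0f} shares, {100 * shares_held / shares_outstanding:.1f}% of outstanding")
# 2000000 shares, 2.9% of outstanding
```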

Today, it's less clear. People who have never used reddit a day in their life are probably buying GME due to the media coverage, but they are also more likely to abandon the stock at the first sign of trouble (the technical term for this sort of person is a "paper-handed bitch"). Retail might be having a substantial effect on the price movement today.

Self-fulfilling Prophecy

After Cohen was given a (few) board seat(s) in early January, GME doubled in price from ~$20 to ~$40 over the course of a few days. It was around this time that your correspondent decided to hop on the bandwagon (bandrocket) and put some money into GME.

I decided that short squeezes were self-fulfilling prophecies. If everyone thought a short squeeze was going to happen, then people would buy/hold, short sellers would cover, and it would happen. On the other hand, if everyone thought it wasn't going to happen, then it wouldn't. I didn't know if Ryan Cohen or the console cycle or whatever meant that GME was "really" worth $40, but during the short squeeze the valuation would not be determined by "market fundamentals" or "reality" but by the logic of the self-fulfilling prophecy, and as such it could go as high as people (collectively) wanted to. That (and the high short interest) convinced me that GME could be worth more. Also, YOLO.

There are three games going on: a game of chicken between the short sellers and the short squeezers, and two prisoner's dilemmas, one within each group -- every squeezer does best if everyone else holds while they quietly sell the top, and every short seller does best if the other shorts cover first and eat the loss.

Timeline of Events

I started following the story a lot more closely since then. Here are some of the things I witnessed:

The Sheriff of Nottingham

Robinhood... oooh boy those guys are fucked. I mean, I know my mind immediately went to the conspiracy theory explanation:

I don't have any sources/concrete examples atm, but it definitely seems to me that anytime someone is explaining the Chinese wall, it's in the context of "here is something that looks bad, because it looks like there was a conflict of interest and some institution breached the firewall and acted unethically for their financial gain per their incentives, which would be super easy to do because it's not like people outside these institutions have oversight, but don't worry it couldn't have happened because that's not how things work -- we have a firewall!" It's never "hey check it out these guys lost a whole bunch of money and now they're bankrupt -- guess the firewall was working!"

Take the Robinhood fiasco. By blocking buys but not sells, Robinhood biases the market towards lowering Gamestop's price (and that is exactly what happened on Thursday). Many retail investors don't have very strong conviction in GME and would panic sell if they see the stock dropping, as volatile stocks are wont to do sometimes, or if they saw a youtube video/TV segment/talking raven tell them to sell (these are the aforementioned paper-handed bitches). Normally this would be countered by another segment of retail investors that for whatever reason (delusions, a misguided sense of loyalty towards a cause like sticking it to the hedge funds which, as we discussed, is a post-hoc narrative, financial illiteracy, having a stroke, etc.) decide to HODL or buy, but Robinhood's actions mean that they won't be able to. This would drive Gamestop's price lower. Also, the restriction affects Robinhood customers only -- it's not a market-wide pause like exchange-based halts -- so those that sold on Robinhood would be selling directly to institutional actors such as the hedge funds that overextended themselves on shorts.

In other words, whatever Robinhood's actual motives were, the concrete effect of Robinhood's trading restrictions is beneficial for short sellers such as Melvin, which Citadel has a direct interest in protecting. Immediately, allegations sprung up that Citadel/Ken Griffin pressured Robinhood to make these changes.

On the other hand, here is an alternative explanation. The TLDR is that Robinhood needs to post margin with the NSCC. The margin is there to counteract the risk of trades not getting properly settled (like, say, if someone were to buy a share of stock that's been heavily shorted and therefore is hard to find, and then that share couldn't be delivered because the stock has been heavily shorted and therefore is hard to find), and depends not only on the price and volume of the stock (on which GME is probably comparable to e.g. AAPL), but also its volatility (guh) and what fraction of Robinhood's total volume the stock makes up (guh). Due to the volatility and high price, Robinhood didn't have enough capital on hand to meet its margin requirements.

This theory is supported by the fact that Robinhood raised $1bn of new capital Thursday night and is reported to have drawn down its credit lines.

Regardless of what the truth is, sentiment is low. People are angry! (Incidentally, I saw a lot of people railing against Robinhood on Blind, but no Robinhood employees commenting on any of these threads. Maybe they're all polishing up their resume.)

(At some point, AOC and Ted Cruz both tweeted their support for investigating Robinhood/the events of the week. I think this is a sign that Robinhood probably did not act nefariously -- only the honest and the stupid piss off both sides of the aisle.)

Assuming for the time being that not being able to meet margin requirements is why they imposed trading restrictions on GME (never assume malice when incompetence will suffice, etc.), Robinhood pretty much handled things in the worst way possible.

  • The initial messaging positioned the move as "protecting the investors", which predictably pissed all of Robinhood's customers off.

  • I'm not sure how wide the messaging was. There was no email. Was there a megaphone? Some people may have found out by seeing the "Buy" button disabled in app.

  • Robinhood's founder only started doing interviews after there was significant backlash and everyone believed that Robinhood had made these changes at the behest or for the benefit of Citadel/the establishment.

  • The founder publicly claimed Robinhood had no liquidity issues, which we now know was probably false (and contradicts the margin explanation (the explanation where Robinhood isn't evil)).

What Robinhood needed to do was get out in front of the messaging and control the narrative. Thursday morning, the founder needed to have hit all the TV shows/social media and given a reason that doesn't smell like bullshit. Instead, Robinhood asymmetrically restricted trading with minimal and poor messaging, there was predictable backlash and people jumped to conspiracy theories (or are they...), forcing Robinhood on the defensive.

Probably, Robinhood acted the way it did because it was concerned that admitting it had a liquidity issue and was having trouble meeting the margin requirements would trigger a mass exodus of customers who thought they might not get their money back (another self-fulfilling prophecy). In hindsight, they should have been more concerned about the optics (and reality?) of market manipulation. Maybe if they admitted posting margin was a problem, it would have gone better (but maybe not -- as far as the customer is concerned, it's not really their problem, the brokerage should just figure it out). Maybe there was a way to message the whole thing that prevents both a withdrawal run and conspiracy theories/mass outrage. Or maybe there was a way out of their pickle without placing trading restrictions at all (unlikely -- Ken Griffin and his baseball bat are very convincing).

Regardless, the way it did play out, Robinhood lost the confidence/drew the ire of exactly the demographic of users it's catering towards. Maybe it will blow over (there was some kind of outage a while back where you couldn't trade for like two whole days, and then that FINRA fine thing -- how many people remember those?). At the very least, the IPO probably is going to be a dud (people were so excited just a few days ago). Personally, I think this is the end for Robinhood. I'm certainly in their targeted demographic, and I've decided to take my business elsewhere.

Robinhood's Replacement

I (and probably a lot of people) signed up for Webull on a tip that they were still trading GME. I think at some point Webull also halted trading of GME, but they somehow escaped the masses' anger. (The CEO of Webull, Anthony Denier, claims that Robinhood placed trading restrictions due to clearing costs and not due to Citadel pressure, but of course he would say that, his name is Denier. A different explanation implicating Citadel was proposed by his cousin, Anthony Confirmer.)

I was surprised to find that Webull's interface is pretty nice. It's way better than Robinhood -- there are more widgets and numbers and the lines come in different colors and so on. It seems they are taking a different approach to the UI. Feature-wise, you can trade during extended hours and participate in IPOs (not every IPO is available; it depends on whether the underwriter will throw some bones to Webull).

Fidelity and Vanguard were two other brokerages that did not restrict trading of GME and I also opened accounts with them. Fidelity also gives you access to (select) IPOs, but only if you have a lot of money. Vanguard I think gives you lower expense ratios on their ETFs (if you have a lot of money).

Where We Are

By some accounts, throughout most of the week's antics, the shorts had not covered. (That graph looks like they did cover a bit towards the end of the week though.) It was surprising to me that the short interest remained elevated at the beginning of the week (since I thought a short squeeze was going to happen), like what the hell are these guys doing, don't they know they're going to get burned? But I suppose the short sellers are thinking the exact same thing about the short squeezers. Also, maybe some short sellers got out and new ones (or the same ones) piled in. After all, if you were willing to short the stock when it was < $20, you would much more happily short it at $200.

Parting Thoughts

  • The biggest loser of the Gamestonk phenomenon was Robinhood (and some hedge funds)

  • The biggest winners were Webull et al (and some hedge funds)

  • Retail investors: depending on how early they got in, some are winning and some are losing. Some will be left holding the bag

  • Politicians will use these events to expand their influence in a way that is basically orthogonal to what actually happened

  • Institutional investors may start paying more attention to what WSB are piling onto, or sentiment on Reddit/social networks in general. It's not clear how much signal you can extract here. For every Gamestonks, there are probably dozens of failstonks that come out of WSB. However, it's not important or even desirable to get in early -- DFV started going long GME in like 2019, but you could have jumped in at the beginning of 2021 and made some quite good returns. You could just invest in whatever is popular at any given point -- I'm kind of curious to try this out...

  • rocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocketrocket

Yebisu Optimus Maximus

I became obsessed with Yebisu beer after my second trip to Japan, when I discovered it by going to the Yebisu beer museum appropriately located just outside Ebisu station. You buy some tokens from a vending machine and then you can get a beer tasting and beer snacks. There are like 6 different beers so you have to prepare yourself if you want to catch them all -- just come on a morning or afternoon when you have nothing else going on (so, any day) and accept that you've arrived at the "publicly intoxicated old man who DGAF" stage of your life.

Back in the states, I searched high and low for this beer, but didn't find it. Generally speaking, it's pretty easy to find Japanese beer in America -- any supermarket will likely have one of Sapporo/Kirin/Asahi, Whole Foods has Hitachino, you can probably pick up some Coedo from a hipster store somewhere, etc. And in fact, Yebisu the brewery is owned by Sapporo. But you cannot find Yebisu anywhere in America. [0] This is because, according to me, Yebisu is the best and greatest beer and Japan is hoarding it for itself. Sure, they'll export the second-tier beer [1], but the good stuff stays at home.

It's not just America -- you can't find it in Asia either. The only place I managed to find Yebisu outside of Japan was in a Carrefour in Taipei, and they only carried it for like a couple months. This is because, according to me, Japan likes Taiwan better than the other Asian countries. You can measure the success or failure of your foreign relations program with Japan from the Yebisu index (how much Yebisu they are willing to send you).

yebisu in combini

You can find Yebisu in just about any convenience store in Japan

yebisu in fridge

Smuggled into Vietnam after NYE trip to Tokyo

yebisu + joel robuchon

Yebisu will come out with seasonal flavors. Here is a cross promotion with Joel Robuchon that... was honestly not very good.

Since I couldn't source it in Asia, the only solution I came up with was to bring back Yebisu with me every time I went to Japan. When you run out, then that's when you know it's time to go to Japan again.

When I had to return to America to escape the pandemic in February (because as we now know, clearly that's the safest place to be during a pandemic, America), I had a layover in Narita. Unfortunately, the 7-11 in Narita did not carry any Yebisu screamscreamscream

not found at the 7-11 in narita

Not found at the 7-11 in Narita

free asahi beer at ANA Narita

There was free beer at the ANA lounge, but... it was Asahi. What am I, a farmer?

ANA sushi

While hanging out at the lounge, suddenly there was a commotion. I went to investigate and found them giving out sushi as a mid-afternoon snack.

Narita fukubukuro

Also found in Narita: Fukubukuros being translated as "happy bags" instead of the more traditional "fukbags".

I looked around and eventually found the one restaurant in Narita that served Yebisu, in the food court

delicious yebisu

A hard won glass of Yebisu after wandering around Narita for like an hour. I paid 700 yen for this shit!

Anyway, I was reminded of all this because I recently came across something. Remember... the scene in Evangelion where Misato drinks some beer and then her fridge is also full of beer? It turns out that Misato drinks Yebisu lololol.

misato beer
misato beer fridge
important questions

So there you have it, Misato knows her beer. By the way, I found out from talking to people in Japan that Yebisu is stereotypically thought of as an old man's beer. Make of that what you will.

[0]

While briefly contemplating an ill-timed move to NYC just before the pandemic hit, I found people claiming to have found Yebisu in something called a "Sunrise Mart". Sadly, upon physically going to the store this turned out to be nothing but lies.

[1]

Actually the Sapporo you buy in America is brewed in Canada so it's not even an export.

AI Generated Hacker News Comments Project Breakdown

I've been scraping hacker news comments for another project, and while I had the data I figured I would use it to ship a quick project.

The idea is simple: I pretty much only read the comments on HN, almost never the actual article (ain't nobody got time for that). By training an AI to generate the comments, you can skip reading the article for any story, not just the ones popular enough to attract real comments.

Originally I envisioned having the model fetch the url and read the actual article or a summary of it, but in the interest of shipping as close to inside of a weekend as possible, the model is conditioned on only the title and url. Previous work (salesforce CTRL) has shown that you can generate reasonable news articles from the url alone, so it's not far-fetched to expect reasonable comments from the title/url alone.

Tech Stack

  1. Django + DRF for backend

  2. React for frontend (I previously used Django+DRF+React with additional Daphne/Channels for websockets for Turnbase)

  3. gpt-2-simple (I have previously used this to generate, uh, interesting works of literature)

Pipeline

Most people are naturally attracted to the modeling aspects of machine learning, but to deploy a production ML system you need to think about/work on the whole pipeline. For this project, I probably spent less than 10% of the total work time finetuning the model. Almost all of the work is in data processing and building the UI.

  1. Data acquisition

  • This is pretty straightforward as HN has a pretty clean API, but did take a few days to scrape (I didn't count this as part of the project time as nominally it was for another project)

  2. Data cleaning/processing

  • I originally kept each HN item as its own text file (because I didn't want to bother escaping newlines), but this turned out to make things too slow. As it turns out, it's not good to have a huge number (23 million) of very small files. So I then paged them together, a few hundred thousand items at a time.

  • Then, I wrote structures for tree traversal so that you could display the entire comment chain. Now, a child to a particular comment might be stored in another page depending on how much time elapsed between them, so to build out the tree for a particular root, you might need to seek ahead to multiple pages. You can't keep all of the data in memory (maybe you could, I couldn't because my machine wasn't beefy enough...), only a few pages at a time, so I had to write a caching mechanism for this. So this took longer than expected.

  • Cleaning was not too bad as the data is mostly clean. I mostly just filtered out dead comments.

  • It was not clear how best to format the final training text files. How do you let the model know that a comment is a child of another? Do you write out all the children of a root item (a story) in a nested format (so that each story gets one sample in the training file), or do you write it out one reply chain at a time (so that each story generates multiple samples, which you then need to cap to make sure the most popular stories aren't overrepresented), etc.

  • In the end I used <|c|> and <|ec|> tokens to start/end a children block, sorted the children (the HN API gives you the rank but not the score for the children), and limited it to 10 children per item (so in theory we keep only the best comments), with no limit on depth (so comment chains can go as long as they want). In theory, the model should also learn the distribution of how many replies an item is likely to get this way (with the cap of 10 slightly modifying the distribution)

  • The whole thing is dumped into a single file with <|endoftext|> to delimit stories.
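For illustration, a toy version of this encoding might look like the following sketch (the function and the dict field names here are my own invention, not the actual pipeline code):

```python
CHILD_OPEN, CHILD_CLOSE = "<|c|>", "<|ec|>"
END_OF_TEXT = "<|endoftext|>"
MAX_CHILDREN = 10

def serialize(item):
    # Render an item followed by its (capped) block of children,
    # recursing so that reply chains can go arbitrarily deep.
    out = item["text"]
    kids = item.get("children", [])[:MAX_CHILDREN]  # keep the 10 top-ranked replies
    if kids:
        out += CHILD_OPEN + "".join(serialize(k) for k in kids) + CHILD_CLOSE
    return out

def dump(stories):
    # One big training file, stories delimited by <|endoftext|>.
    return END_OF_TEXT.join(serialize(s) for s in stories)
```

So a story with one reply that itself has one reply comes out as `Title<|c|>reply<|c|>subreply<|ec|><|ec|>`.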

Note

Hilariously, I found a bug with how <|endoftext|> is used that may explain some previous weirdnesses I'd seen with gpt-2-simple.

  3. Model training

  • I kept things basic and didn't spend too much time tuning parameters. I think you need MLFlow or similar if you want to do any real tuning, otherwise you end up with a bunch of models with names like hn_model_lr_0001_final2_noclip, but I didn't attempt to set it up for this project.

  • Used the 355M (medium) GPT-2 model, trained on a p3.2xlarge.

  4. Model deployment + serving

  • The simplest architecture would be to keep a copy of the model on the same machine that serves the website. Whenever someone submits a new story that they want AI generated comments for, the backend would then invoke the model. You wouldn't want to invoke it right away, though; you'd need to put the request in a queue serviced by a worker thread, or the website will freeze whenever the model is thinking.

  • I was a cheapskate and the machine that serves the website is a t2.micro, so inference wasn't even possible: it OOMs.

  • I investigated whether you could get by with a t2.large. You can do inference (no OOM), but on the CPU it's something like 40-60x slower than on a p3.2xlarge (on which inference takes about 30s, or more like 50s counting the overhead of loading the model). I felt that would make the user experience too poor.

  • So instead, there are two machines (website and inference). The website's DB acts as a container for the queue. There is an endpoint that broadcasts whether there are stories awaiting generation. A script continuously checks the endpoint; when there are stories, it SSHes into the second machine and kicks off a second script that lives on that machine to do inference (this way I didn't have to set up endpoints on the second machine, saving myself some work, although this SHOULD be fairly straightforward with an MLFlow workflow). This also means that inference can be batched, which is a little more efficient given the overhead of loading the model and setting everything up.

  • It's really janky, but it works, and it's cheap. You could set it up so that the second machine is turned off most of the time and only spun up for inference, then shut down (adding slightly to the per-inference overhead). It works out pretty well since inference is on the order of a minute, which is the minimum time block AWS will bill you for. I just turn it on manually whenever there are things in the queue and I'm paying attention, which is almost never.
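Stripped of details, the polling side of this setup amounts to something like the sketch below. The endpoint URL and the SSH command are placeholders, and `poll_once` takes the fetch/run functions as arguments so it can be exercised without the real machines:

```python
import subprocess
import time

QUEUE_URL = "https://example.com/api/pending"          # placeholder endpoint
INFER_CMD = ["ssh", "gpu-box", "python", "infer.py"]   # placeholder command

def poll_once(fetch_pending, run_inference):
    # One polling cycle: if any stories are waiting, kick off a single
    # batched inference run for all of them, amortizing model-load overhead.
    pending = fetch_pending()
    if pending:
        run_inference(pending)
    return len(pending)

def main_loop():
    import requests  # not stdlib; pip install requests
    while True:
        poll_once(
            fetch_pending=lambda: requests.get(QUEUE_URL).json(),
            run_inference=lambda ids: subprocess.run(INFER_CMD + [str(i) for i in ids]),
        )
        time.sleep(60)  # inference is on the order of a minute anyway
```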

Tips/Summary

This post has gone on long enough, so I'll just quickly summarize with a few tips:

  • Think about your whole pipeline. You need tooling for the entire pipeline so that you can iterate quickly (your data format may change, your encoding may change).

  • For gpt2 specifically, encode your data before you finetune.

  • Inference costs need to be thought about. I was kind of expecting that most of the cost would be in training the model, so it was an unwelcome surprise to discover that I needed a machine as powerful as the training machine to do inference. In fact, for typical production models, the inference cost will far outweigh the training cost (unless you're updating your model constantly), so it's much more important to make sure that inference is efficient and economical. My interest in the distill* family of algorithms has gone up since this project.

  • Funnily enough, GPT-3 came out the day after I shipped this project. I don't want to think about deploying that in a production system.

Addendum (2020-07-11)

I tried deploying a GPT-2 model with MLFlow. MLFlow can build a docker image for you with the model, artifacts, and any necessary libraries packaged inside. I've used it to deploy simpler models before and prefer this workflow because I don't want to mess with setting up conda, and conceptually it is clean. However, I ran into some kind of CUDA error. This seems to be an issue with MLFlow's docker build with tensorflow specifically.

2 Cups 100 Floors

There is a brainteaser that goes like this. There is a building with 100 floors and you have two identical cups made from some unknown type of glass. You want to determine the highest floor from which you can drop a cup without it shattering; call it floor k. The glasses can be dropped as many times as you want from floors at or below k without breaking, but as soon as you go above k, they break.

So one way to do this is by dropping a cup from each floor from 1 to 100, sequentially, until it breaks. But this is obviously pretty inefficient. What is the most efficient way of determining k (in the sense of requiring the fewest test drops)?

(I heard a version with eggs instead of cups, and my friend immediately answered that k should be 1 because it’s an egg. “What the hell kind of eggs are you using??”)

There is a fairly simple solution that people commonly come up with. And, it’s pretty good, but not optimal. This post is about how to get the really actually truly optimal solution. Here is a pause in case you haven’t heard this before and want to figure it out for yourself. Otherwise, read on.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

First, since you don’t know what k is, “fewest” means minimizing the average number of steps your algorithm takes. You could define some other measure of optimality, but this is what we’ll go with. It’s important to note that the answer depends on the distribution k takes on, but it seems pretty reasonable to assume a uniform distribution.

The simple algorithm goes like this: first, try the 10th, 20th, … etc. floors until the first glass breaks. Once the first glass breaks, you've narrowed k to a range of 10 floors. E.g., it didn't break at 30 but it broke at 40, so k must be between 30 and 39, so you use the second glass to try 31, 32, … etc.

It’s clear that, once the first glass breaks, you must try the second glass one floor at a time, starting from the highest known floor not to break a glass. So the strategy is entirely determined by what sequence of floors to try for the first glass, before it breaks. The simple solution can be represented by the array [10, 20, …, 90].

You can calculate (e.g., by writing code, or using maths) that the average running time for this algorithm is 10.81 steps. But, the average running time for the following algorithm, which is optimal, is 10.31:

[13, 25, 36, 46, 55, 63, 71, 78, 84, 89, 93, 96, 98, 99, 100]
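One way to verify these two averages is to just sum the cost over every value of k. Here is a small sketch of that calculation; the conventions are my own reading of the problem (k is treated as the lowest floor at which a glass breaks, uniform on 1..n, matching the recursion derived below, and a first-glass sequence that stops short of floor n is continued one floor at a time):

```python
def avg_drops(strategy, n=100):
    # strategy: increasing floors tried with the first glass; extend it
    # floor-by-floor up to n if it stops short (e.g. [10..90] -> 91..100).
    strategy = list(strategy) + list(range(strategy[-1] + 1, n + 1))
    total = 0
    prev = 0  # highest floor known to be safe so far
    for i, a in enumerate(strategy, start=1):
        for k in range(prev + 1, a + 1):
            # First glass breaks on drop i (at floor a); the second glass then
            # scans prev+1, prev+2, ... and breaks at k -- unless k == a, in
            # which case the scan stops at a-1 and k is known by elimination.
            scans = (k - prev) if k < a else (a - prev - 1)
            total += i + scans
        prev = a
    return total / n

simple = [10, 20, 30, 40, 50, 60, 70, 80, 90]
optimal = [13, 25, 36, 46, 55, 63, 71, 78, 84, 89, 93, 96, 98, 99, 100]
print(avg_drops(simple))   # 10.81
print(avg_drops(optimal))  # 10.31
```

The same function reproduces the 50-floor numbers quoted further down (8.22 for [10, 20, 30, 40] and 7.86 for [7, 14, 21, …]).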

It turns out that the simple algorithm is actually pretty good; the optimal is only a 5% improvement or so. But why isn’t the simple algorithm optimal? Because it doesn’t self-adjust to additional information.

What is the optimal solution if you only had 50 floors? There’s nothing special about the number 10, so there is no reason to think that [10, 20, 30, 40] should be optimal for 50. If we think the heuristic is a square root, then we should be going up by about 7 floors each time, something like [7, 14, 21, …].

Note

(Square root is actually a pretty reasonable heuristic: if you make the interval too large, you have to do a lot of drops with the second glass, and if you make the interval too small, you have to do a lot of drops with the first glass. Square root balances the expected work you need to do with the first glass against that with the second.) (And, in fact, the average running time for [7, 14, 21, …] is 7.86, better than [10, 20, 30, 40]’s 8.22.)

So, back to the 100 floors case, this means that, if for your first glass, you went 10, 20, … all the way up to 50, and the glass hasn’t broken yet, then switching to going up in intervals of 7 instead of 10 would yield a lower average running time. Because if you’re at 50 and the first glass hasn’t broken yet, then you have reduced to the 50 floors case. The naive strategy of always going up by 10 floors doesn’t adjust to the information you get from the glass not breaking.

This reduction idea hints at how to find an optimal solution. What if you already knew optimal solutions for the 1 floor, 2 floor, … 99 floors cases? Then to find an optimal solution for the 100 floors case, all you need to do is figure out the first floor p that you should test drop at. If the glass breaks at p, you are forced to use the second glass one floor at a time. If the glass does not break at p, then you follow the optimal solution for 100-p floors from your book of solutions.

(My first attempt at explicitly writing down the optimal solution was to brute force all possible strategies — that is, all sequences [a1, a2, … am] with \(0 < a_1 < a_2 < \ldots < a_m <= 100\). My code ran for 600k iterations before I decided to estimate the number of such sequences and found it was about \(2^{100} \approx 10^{30}\). Whoops.)

Let \(f(n)\) denote the optimal average running time of the n floors case. (Here it is convenient to treat k as the lowest floor at which a glass breaks, uniform on 1 to n.) Let’s say that we decide to pick the first floor as p. If \(k = p\), then we need to try every floor below p, so the cost is p. If \(k < p\), then the cost is 1+k: 1 for trying the first glass at floor p, which breaks, and k tries with the second glass until it breaks. If \(k > p\), then we reduce to the \(n-p\) floors case, so the cost is \(f(n-p) + 1\). To be optimal, we need to pick the p that minimizes the expected value of the cost, so:

\begin{equation*} f(n) = \min_{p \in [1, n]} \frac{p-1}{n} E[1+k \mid k < p] + \frac pn + (1- \frac pn)(f(n-p) + 1) \end{equation*}
\begin{equation*} f(n) = \min_{p \in [1, n]} \frac{p-1}{n} + \frac{(p-1)p}{2n} + \frac pn + (1- \frac pn)(f(n-p) + 1) \end{equation*}
\begin{equation*} f(n) = \min_{p \in [1, n]} \frac{p^2 + 3p - 2}{2n} + (1- \frac pn)(f(n-p) + 1) \end{equation*}
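As a quick sanity check of the last formula (with one floor you always make exactly one drop; with two floors the best first drop is floor 1):

\begin{equation*} f(1) = \frac{1^2 + 3 \cdot 1 - 2}{2 \cdot 1} = 1, \qquad f(2) = \min\left\{ \frac{1^2 + 3 \cdot 1 - 2}{2 \cdot 2} + \frac12 (f(1) + 1),\ \frac{2^2 + 3 \cdot 2 - 2}{2 \cdot 2} \right\} = \min\left\{ \frac32,\ 2 \right\} = \frac32 \end{equation*}

which matches intuition: drop at floor 1; half the time (k = 1) it breaks and you are done after 1 drop, and half the time you drop again at floor 2, for an average of 3/2.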

I am not entirely sure how to “solve” for \(f\); maybe there is some slick trick, or maybe it just can’t be done. But it’s straightforward enough to write some code to compute it:

def poly(p):
  # the p^2 + 3p - 2 from the first term of the recursion
  return p*p + 3*p - 2

n = 10000
f = [0.0]*(n + 1)  # f[i] = optimal average running time for the i floors case
g = [0]*(n + 1)    # g[i] = optimal first test floor for the i floors case
f[1], g[1] = 1.0, 1

for i in range(2, n + 1):
  min_f = -1.0
  min_p = -1
  for p in range(1, i + 1):
    # total (not yet averaged over the i values of k) cost of first-dropping at p
    tmp = poly(p)/2
    if p < i:
      tmp = poly(p)/2 + (i - p)*(f[i - p] + 1)
    if min_f < 0 or tmp < min_f:
      min_f = tmp
      min_p = p
  f[i] = min_f/i
  g[i] = min_p

Saving the g vector is important if you want to actually know what the optimal algorithm looks like. You can read it off with something like

k = 100
s = [g[k]]    # optimal first floor for the 100 floors case
k = k - s[0]  # floors remaining above the last test floor
while k > 0:
  s.append(s[-1] + g[k])  # step up by the optimal first step of the reduced case
  k = k - g[k]

Bonus:

Here is a graph of how \(g\) behaves. As with \(f\), I don’t know how to write it down in closed form, but it should track \(\sqrt{n}\) fairly closely.

Is Simulation Possible?

In “the future”, when I upload my brain into a computer (whatever that means) and delete the physical original, have I transcended into immortal existence or did I just kill myself?

That question is too hard. Let’s start with an easier one: “is strong AI possible?” Here is an argument for why it should be possible:

We don’t understand how consciousness works, exactly, but however it works, it’s based on physical processes, because the brain is a physical object. (The notion of a soul existing in some astral plane separated from physical existence is absurd.) Even if we don’t have a good theory of how cognition works, we do have a good theory of how the universe works. Forget about cognition; it’s too hard. In two years when we have desktops running at ten thousand teraflops or whatever, let’s just simulate the brain at the sub-atomic level, lepton for lepton, leprechaun for leprechaun. Then you have machine sentience.

Implicit in this argument is that simulation is possible. Obviously, simulation is possible in the sense that you can simulate things (subject to computing power, which let’s assume is not an issue, because historical trends are good and the question here is whether simulation is possible in principle); what I mean by this is whether the simulated brain is just as “real” as an actual physical brain. Or, to be even more ambitious, let’s simulate a whole universe, with planets and meadows and people running through trees hunting dinosaurs, etc. Is this universe real? Are those people real?

It’s kind of a stupid question, because if it talks like a real person and acts like a real person, then… Also, you could argue that this universe is real enough to the people inside of it–they can feel the grass between their toes and be warmed by the sun and smell spring and all of that mushy stuff. They would have no idea that their existence is a simulation.

Nevertheless, accepting that the simulated universe is real is philosophically troublesome, because where does this universe exist? It emphatically does not exist in the wires in your fifty-exaflop laptop, in its hard drives or transistors. Transistor/electrical is just one way to implement a Turing machine; you could use billiard balls or even just do the computations by hand with pen and paper. You could simulate using one form of a Turing machine up to some time T, output the simulated state onto some other medium, and resume the computation using another form of a Turing machine! The simulated people would not know the difference.

If you accept that this simulated universe exists, you must also accept the following:

  1. The simulated universe exists in some platonic realm. There is a one-to-one correspondence between the numbers in your computation and the states of the objects in the simulated universe (variable x corresponds to some parameter of the wave function of this electron, etc.), sure, but the simulated universe does not exist “inside” of your universe.

  2. All that is required for existence in this platonic realm is a representation, which means some descriptors (the numbers in the computation) and a model for interpreting that representation (this number maps to this thing in the simulated universe). As an aside, the descriptors themselves can be implemented through representations! There are no bits in your computer, there are electrical signals that represent 1 or 0 based on voltage. There are no bits in your hard drive, there are physical pieces of medium that represent 1 or 0 based on magnetization. There are no bits when you write down a number, there are physical marks on paper that represent 1 or 0 based on what they look like. In other words, the bit already lives in the platonic realm.

Questions for next time

  1. If all that’s required for existence is the representation, is it even necessary to run the simulation? Could you just say, here are the equations my simulation would have solved (here is the algorithm my turing machine would have run), here are the initial conditions, and now the simulated universe exists? If you’re in a deterministic universe, specifying the governing equations and initial conditions determines the state at every time point already–the solution exists, even if you don’t know what it is, so what extra value does doing the computation add? How does computing the solution (i.e., doing the simulation) make the simulated universe more real? Even if you’re in a non-deterministic universe, specifying the (probabilistic) governing equations and initial conditions determines every solution that is compatible with said equations and conditions (i.e., they determine every simulated outcome that could have taken place when you run the simulation), so what extra value does doing the computation add? You’re just moving from an implicit representation (the equations + initial conditions, or equivalently the algorithm + initial state) to an explicit one (the computed descriptors at each time step, as represented by bits as represented by whatever physical system your turing machine runs on).

  2. What happens when you make an error in the computation, due to e.g. hardware glitch or human error if doing it by hand?

  3. Glossed over is the fact that what we have are not exact governing laws of the universe (and we may never have them); all we have are models of varying degrees of accuracy. Additionally, we may lose accuracy by choosing a coarse spatial or temporal resolution, a poor numerical algorithm, or too few sigfigs in our floating point arithmetic. Does that matter?

Appendix

I used simulating consciousness to motivate what I really wanted to talk about, which was simulating the entirety of existence and what it means to exist. Being able to simulate a universe is sufficient but not necessary for being able to simulate consciousness (again, simulation to me means that the simulated consciousness is “real”), and I think the former is strictly harder while the latter is almost not an interesting question at all. There are arguments such as this and this which supposedly show why simulating consciousness is not possible; to me these arguments lack imagination. “X is unintuitive” is not an argument against X.

The fundamental question re: simulating consciousness is

Is the neuron (a real one) the only structure that is capable of mediating (physically supporting) consciousness?

The answer is a resounding no, because (intelligent) aliens almost certainly exist (if not, then they almost certainly can exist, which is all that’s needed here), and aliens almost certainly do not have the same biological architecture as humans do.