I have been fascinated by the ACL 2022 Best Paper, Learned Incremental Representations for Parsing by Nikita Kitaev, Thomas Lu, and Dan Klein. I had the good fortune, wandering around in Gather.town, to attend the poster session virtually before the award was announced. The big thing that the paper showed is the tractability of human-like incremental parsing, which seems to have been pretty much a dormant problem in the field ever since full-sentence language models started dominating all the benchmarks. What’s especially crazy is that the numbers Kitaev et al. produce with only left context for their parsing model beat state-of-the-art whole-sentence models from just a couple years ago.

Kitaev and Klein have done a ton of great work on syntactic parsing in the past, e.g. their 2018 paper, which introduced attention to constituency parsing, or Stern et al. (2017), which Klein is an author on, achieving SOTA on constituency parsing with a neural chart-parsing model for (I think) the first time. Thomas Lu seems to have been a fourth-year undergrad at Berkeley at the time when the paper was submitted, which is insanely cool.

Two things about the paper excited me: (1) vector quantisation, a technique to induce discrete values in the model pipeline while keeping the system end-to-end differentiable; and (2) the unsupervised learning of a pretty small inventory of token labels, whose distribution is worth investigating. I’ve thought about a couple of tasks that would need learned discrete labels as an intermediate step, so learning about VQ was a huge deal for me (and led me down the rabbit hole of variational autoencoders recently). But my main topic here today is the learned tagset itself.
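For anyone who has not met vector quantisation before, here is a minimal sketch of its forward step: snap a continuous vector to the nearest entry of a small codebook and keep that entry's index as the discrete label. The codebook values and the `quantise` name are made up for illustration and are not taken from the paper's code. In training, this step is usually kept differentiable with a straight-through trick (the gradient of the snapped output is copied onto the input), as in VQ-VAE; I am assuming, not asserting, that the paper's setup is similar.

```python
def quantise(vector, codebook):
    """Return (index, entry) for the codebook entry nearest to `vector`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # The index of the nearest entry plays the role of the discrete tag.
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]


# Toy 2-dimensional codebook with 4 entries (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

tag, snapped = quantise((0.9, 0.2), codebook)
print(tag, snapped)  # 1 (1.0, 0.0)
```

A 32-entry codebook would give exactly the kind of 32-label inventory discussed in the rest of this post.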

While checking out their code, I was excited to see that the model can be run in Colab extremely easily, so naturally I decided to poke around at it. The purpose of this post is to list some things I figured out about the learned 32-token tagset—I was curious how it compared to the usual part-of-speech tagsets that we use. Note that while they largely discuss the 256-token tagset with an F1 score of 94.49, the 32-token one is still really good with a score of 93.50.

Prepositional phrases

One of the first things I thought to poke at was prepositional phrase attachment, since it was mentioned in the paper. One thing became pretty clear: the thing this tagset is trying to do is very different from what conventional POS tagsets do. Part-of-speech tags intend to group lemmata with similar distributions together; e.g., the Cambridge Grammar of the English Language designs its POS tagset by making arguments about distribution, coordination scoping, and similar linguistic considerations. This is how it ends up grouping some conventional subordinators with prepositions, due to shared acceptance of some clausal dependents.

This tagset, on the other hand, is learned as the sole input in order to parse syntax, and that too with only the benefit of left-side context. A normal POS tagset, I wager, is not informative enough to generate accurate syntax trees by itself.

A common feature is a distinction between leading (initial) and non-leading tags, sort of like the B and I tags of BIO. This makes sense; a leading tag signals the start of a new constituent, say an NP, and a non-leading tag signals its continuation. But things were not so simple as to let there be only two tags for indicating an NP. Below are some minimal pairs I found for PP-attachment tags.

PP attachment (NP vs. VP)

| Tags (first option) | Tags (second option) | Sentence |
|---|---|---|
| [24, 11, 0] | [4, 16, 25] | There is some cheese {from, in} the farm |
| [24, 11, 0] | [4, 11, 7] | I saw the {man, plant} with a telescope |
| [24, 11, 21] | [4, 28, 21] | I drew pictures {of, with} my kids |

Presence of preceding VP-internal constituent

| Tags (first option) | Tags (second option) | Sentence |
|---|---|---|
| [17, 11, 7] | [4, 28, 21] | I drew {∅, pictures} with my kids |
| [17, 28] | [4, 28] | He is {∅, truly} in fear |
| [17, 28, 21] | [4, 11, 21] | He is {∅, truly} in the farm |
| [17, 11] | [17, 28] | He is in {France, fear} |
| [17, 11, 7] | [17, 11, 21] | I am on the {internet, edge} |

Table 1: Tags outputted for some PP attachment minimal pairs.

So we see that the tag on the preposition really does signal where it’s attaching: to the preceding NP (24), or to the VP (4). Also, there is another tag for attaching to the VP when there is no preceding NP in the way (17).

The nouns are sort of screwed up though. There is a clear distinction between leading and non-leading tokens, but it’s not just two tags involved. We see all of the following: [11, {0, 7, 21}], [28, 21], [16, 25]. I think this is signalling some kind of syntactic tendency embodied by these determiners/nouns, to use for potential disambiguation as more tokens to the right are consumed by the encoder. I’m not sure what that tendency is though. Really interesting to find nevertheless!
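As a sanity check on those patterns, here is a small sketch that regroups the tag sequences from Table 1 by the leading NP tag. It assumes the first tag in each sequence belongs to the preposition and the rest to the NP inside the PP; `np_patterns` is a made-up helper name, and one-word NPs (like "fear") simply contribute a leading tag with nothing after it.

```python
from collections import defaultdict

# Tag sequences for the PP tokens (preposition first), copied from Table 1.
TABLE_1_PP_TAGS = [
    [24, 11, 0], [4, 16, 25],
    [24, 11, 0], [4, 11, 7],
    [24, 11, 21], [4, 28, 21],
    [17, 11, 7], [4, 28, 21],
    [17, 28], [4, 28],
    [17, 28, 21], [4, 11, 21],
    [17, 11], [17, 28],
    [17, 11, 7], [17, 11, 21],
]


def np_patterns(pp_tag_sequences):
    """Map each leading NP tag to the set of non-leading tags seen after it."""
    patterns = defaultdict(set)
    for tags in pp_tag_sequences:
        np_tags = tags[1:]  # drop the preposition's tag
        leading, rest = np_tags[0], np_tags[1:]
        patterns[leading].update(rest)
    return dict(patterns)


for leading, following in sorted(np_patterns(TABLE_1_PP_TAGS).items()):
    print(leading, sorted(following))
# 11 [0, 7, 21]
# 16 [25]
# 28 [21]
```

So three different leading tags show up, and 11 alone is followed by three different non-leading tags.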

Attempts to describe each tag

Because I am excellent at making use of my time, I tried to figure out what each tag was conveying syntactically. Some of these descriptions are very rough and were written before I systematically tried out some of the PP-attachment stuff, so do not take them as authoritative. I also could not get it to produce all of the tags since I was only using it manually (it seemed like a fun challenge). An actual analysis should probably run the tagger on the original paper’s test set.

  0. Non-leading NP elements but nested in something. This contrasts with 7; if 0 is used, the PP attaches to the NP preceding, but with 7 it attaches to the VP as an adjunct.

    • I saw a man with a book
  1. Commas.

    • When I was young , I had no brains
  2. Fronted verb or auxiliary verb in WH-questions.

    • What have you been doing
    • How dare you !
  3. Final punctuation.

    • I hate commas .
  4. Prepositions, complement to VPs. Interestingly, to (in the infinitive) falls under this category as well!

    • There is some cheese [in the fridge]PP
    • I did this [with the goal of saving you]PP
    • I came here [with him]PP [on the run]PP
    • I did this [to save you]S
  5. Non-leading elements of QPs.

    • It happened [almost 20]QP years ago
  6. Verbs in main clauses. Your least fancy verb in the sentence goes under this category. Interestingly, this includes auxiliary verbs in the clause too. This leads to chains of 6s.

    • Luna smashed the pumpkin
    • You run
    • I will be drawing a picture of him
    • He said that you are mean
  7. NP complements of PPs.

    • The power of the people is magnificent
    • Inside a potato is a man
    • I am a man of the city
    • I saw the planet with a telescope

    Some kind of non-leading NP element (compounds?)

    • What are the consequences of artificial intelligence ?
  8. Conjunctions.

    • [You or me]NP will go
  9. Verbs in nominalised clauses.

    • Eating food {is, seems} epic
    • Yelling at people is mean

    Kind of cursed, but also to when indicating a complement clause.

    • I want to be the best
  10. Adjectives in predicate position.

    • I am hungry
  11. Leading elements of non-subject NPs. This one is super interesting. So, if the object NP has a determiner, then that determiner is labelled 11. But, if there is no determiner in that NP then the nominal itself gets that label! This makes sense given only left context is available to the model; this signals the left-most boundary of an object NP.

    • The man eats the food
    • The man eats food

    Also seems to include PP objects.

    • A hungry man is in my kitchen
  12. Verbs in SINV (inverted clauses), as well as where the subject is a VP or an NP with a relative clause. Seems to be an indicator to head back up to the S constituent after a pretty complex subject.

    • Behind every door is a fridge
    • Eating food is epic
    • The man who I saw smells funny
  13. Verbs in adjunct clauses.

    • [When Luna smashed the pumpkin]SBAR , she hurt her foot

    Verbs in WH-subject clauses which cause movement.

    • Who are you
    • What [have you done]SQ
    • What have you been doing
  14. ?

  15. Leading NP elements of subjects of adjunct and complement clauses.

    • When the man was young he lived here
    • He said that you are mean
  16. Leading elements of main-clause subject NPs. Interestingly, this behaves the exact same way as 11, indicating the leftmost token of a subject NP.

    • The man eats the food
    • He eats the food

    Also includes subjects of SQ which end up post-verbal.

    • How dare you !
  17. Preposition that heads a PP adjunct to the verb.

    • I am [inside a potato]PP
    • We live [for the delicious things in life]PP
    • Yelling [at people]PP is mean
  18. Adverbs (spatial adjuncts, adjective modifiers).

    • He lives here
    • Life is really weird
  19. Adverbs (verbal adjuncts in general).

    • He quickly handed me the potato
  20. Leading subjects of WH-questions and main clauses with preceding WH-adjunct.

    • What have you been doing
    • When the man was young he lived here

    Adverbial modifiers to S. These nest directly under S, no intermediate projections.

    • He threw up today

    Leading direct objects in those constructions where the indirect object isn’t in a PP.

    • He quickly handed me the potato
  21. Non-leading NP elements. These seem to all be non-leftmost elements of the (flat) NP that are nested under a VP or S directly. This contrasts with 25.

    • The man [eats [the food]NP ]VP
    • [[A hungry man]NP has arrived]VP

    In copular clauses, inside the predicate NP following the determiner (labelled 28), the nominal is labelled 21 or 25. It seems to be 21 if inanimate (or tending towards inanimate?) but 25 if animate. Honestly, I have no clue what’s going on here. Maybe the sentences I am testing are too short relative to the training data so they confuse the parser, but this animacy thing seems to have a point to it!

    • I am a {pumpkin, cheeseburger, telescope}
  22. Complementiser.

    • He said that you are mean
  23. WH-words. This is used regardless of where they are syntactically, in a question, in a relative clause, or as an adjunct clause (with when) etc.

    • What have you done
    • Who are you
    • Why are you hitting yourself
    • The man who I saw smells funny
  24. Prepositions.

    • The cat [in the hat]PP strikes back
  25. A variety of non-leading NP elements inside the VP. One is the first NP with a PP complement. Another is the NP inside a PP following the object of the verb, which seems entirely counterintuitive.

    • There is [[a man]NP [in the fridge]PP ]NP
    • We live for [[the delicious things]NP in life]NP
    • There [is some cheese [in [the fridge]NP ]PP ]VP

    I have found yet another cursed contrast here though, see 21.

    • I am a {man, woman, lady, baby, machine, device}
  26. Verbs in relative clauses.

    • She ate the pumpkin [that Luna smashed]SBAR
    • [What you said]SBAR was wrong
  27. ?

  28. NP-leading predicate of a copular clause. Also in some PPs?

    • We are men (of the city)
    • I am a man (of the city)
    • I did this [with the goal of saving you]PP

    Honestly just a grab bag of weird NPs.

    • I am [[four feet]NP tall]ADJP
  29. Non-leading NP element that is a modifier to an ADJP.

    • I am [[four feet]NP tall]ADJP
  30. ?

  31. Particles.

    • He threw up everywhere

    The adverb ago.

    • [6 years ago]ADVP he got here
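To go beyond manual probing, the bookkeeping for a corpus-level pass could look something like the sketch below. Everything about the input is an assumption: I pretend the tagger's output has already been flattened into (word, tag id) pairs with ids from 0 to 31, and `words_per_tag` and `unseen_tags` are made-up helper names. The sample pairs are just the two "my kids" PPs from Table 1, not a real corpus run.

```python
from collections import Counter, defaultdict


def words_per_tag(tagged_tokens, top_n=3):
    """Return {tag: [(word, count), ...]} with the most frequent words per tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[tag][word.lower()] += 1
    return {tag: counter.most_common(top_n) for tag, counter in counts.items()}


def unseen_tags(tagged_tokens, tagset_size=32):
    """List the tag ids in range(tagset_size) that never occur in the input."""
    seen = {tag for _, tag in tagged_tokens}
    return [tag for tag in range(tagset_size) if tag not in seen]


# "of my kids" and "with my kids", with the tags reported in Table 1.
sample = [("of", 24), ("my", 11), ("kids", 21),
          ("with", 4), ("my", 28), ("kids", 21)]

print(words_per_tag(sample)[21])  # [('kids', 2)]
print(len(unseen_tags(sample)))   # 27
```

Run over a real test set, the second helper would also show directly which tags never fire, which is what the "?" entries in the list above are missing.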


  1. Nikita Kitaev and Dan Klein. 2018. Constituency Parsing with a Self-Attentive Encoder. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2676–2686, Melbourne, Australia. Association for Computational Linguistics.
  2. Nikita Kitaev, Thomas Lu, and Dan Klein. 2022. Learned Incremental Representations for Parsing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3086–3095, Dublin, Ireland. Association for Computational Linguistics.
  3. Mitchell Stern, Jacob Andreas, and Dan Klein. 2017. A Minimal Span-Based Neural Constituency Parser. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 818–827, Vancouver, Canada. Association for Computational Linguistics.