Some CCG derivations in Hindi

Combinatory Categorial Grammar (CCG) is one of the many, many (far too many) syntactic formalisms posited by linguists in the Chomskyan era. CCG has been outlined in work by Mark Steedman, the most recent guide being his 2001 book, which I have to read for a class.

Unlike most other syntactic theories, CCG has mechanics that I find very elegant. There is no difference between the distributional categorisation of a lexeme and the argument structure that it has; both are neatly contained in the CCG type system, and the connections to type theory work out very nicely (with the caveat that this is in my limited understanding of type theory).

Naturally, I was thinking about how CCG would be applied to Hindi syntax. This isn't anything novel (see this 2017 paper on exactly that), but for my own edification I wanted to see how flexible CCG is across languages. Here's a CCG analysis of some simple clauses in Hindi; as you can see, all the derivations proceed backwards, which makes sense since Hindi is pretty left-branching.

  1. mɛ̃ kām karūŋgā
    NP
    : I′
    NP
    : work′
    (S\NP)\NP
    : λx.λy.will-do′xy
    S\NP
    : λy.will-do′work′y
    S
    : will-do′work′I′
  2. mɛ̃ kām karūŋgā aur soūŋgā
    NP
    : I′
    NP
    : work′
    (S\NP)\NP
    : λx.λy.will-do′xy
    CONJ
    : and′
    S\NP
    : λy.will-sleep′y
    S\NP
    : λy.will-do′work′y
    S\NP
    : λy.and′(will-do′work′y)(will-sleep′y)
    S
    : and′(will-do′work′I′)(will-sleep′I′)

So, those were pretty straightforward; note that I also included a weak functional representation of the semantics in the same lambda-calculus notation that Steedman uses. You can see that the familiar (S\NP)/NP of English is replaced with (S\NP)\NP, corresponding to SOV structure in Hindi (both verbal arguments are to the left, and the verb combines with the object first). I do wonder how a language where the object is further from the verb than the subject would work, though.
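To make the mechanics concrete, here is a minimal Python sketch of backward application (Y  X\Y => X) run over the first clause. The class and function names (Atom, Back, backward_apply) are made up for illustration, the semantics are just strings built by Python lambdas, and ASCII apostrophes stand in for the primes; it is a toy, not a real CCG parser.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    """Atomic category such as S or NP."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Back:
    """Functor category result\\arg: seeks `arg` immediately to its left."""
    result: "Cat"
    arg: "Cat"

    def __str__(self):
        def wrap(c):
            return f"({c})" if isinstance(c, Back) else str(c)
        return f"{wrap(self.result)}\\{wrap(self.arg)}"


Cat = Union[Atom, Back]


def backward_apply(left, right):
    """Y  X\\Y  =>  X. Each sign is a (category, semantics) pair."""
    (lcat, lsem), (rcat, rsem) = left, right
    if isinstance(rcat, Back) and rcat.arg == lcat:
        return (rcat.result, rsem(lsem))
    return None  # the two signs do not combine


S, NP = Atom("S"), Atom("NP")
TV = Back(Back(S, NP), NP)  # (S\NP)\NP, the SOV transitive verb type

mai = (NP, "I'")
kam = (NP, "work'")
karunga = (TV, lambda x: lambda y: f"will-do'{x}{y}")

vp = backward_apply(kam, karunga)  # object first: category S\NP
s = backward_apply(mai, vp)        # then subject: category S

print(vp[0])       # category of "kam karunga"
print(s[0], s[1])  # category and semantics of the whole clause
```

Because the verb's innermost argument is the object, the object has to be consumed first, which is exactly why the derivation runs right to left.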

Here's a more complex clause, with a ditransitive verb, as well as another example with non-canonical word ordering. The preferred way to analyse non-SOV orderings (SOV being the canonical order in Hindi) is with type-raising and then forward composition (a rare rightwards derivation). I am reminded of how CCG handles relative clauses in English, which have screwed-up word order due to an argument being moved out of the clause.

  1. us ne mujʰ ko paise diye
    NP NPne\NP NP NPko\NP NP ((S\NPne)\NPko)\NP
    NPne
    NPko
    (S\NPne)\NPko
    S\NPne
    S
  2. us ne paise mujʰ ko diye
    NP NPne\NP NP NP NPko\NP ((S\NPne)\NPko)\NP
    NPne NPko
    (S\NP)/((S\NP)\NPko)
    (S\NPne)\NP
    S\NPne
    S
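The scrambled order can be sketched in Python under one reading of the second derivation: the ko-marked NP is forward type-raised (X => T/(T\X)) and then combined with the verb by the crossed variant of forward composition (X/Y  Y\Z => X\Z). That reading is an assumption, as are the names Fn, forward_raise and forward_crossed_compose. The sketch also starts from already case-marked NPs rather than applying ne and ko as functors, and it uses S\NPne as the raising target.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Fn:
    """Functor category; slash "/" seeks arg to the right, "\\" to the left."""
    result: "Cat"
    slash: str
    arg: "Cat"

    def __str__(self):
        def wrap(c):
            return f"({c})" if isinstance(c, Fn) else str(c)
        return f"{wrap(self.result)}{self.slash}{wrap(self.arg)}"


Cat = Union[Atom, Fn]


def forward_raise(x, t):
    """Forward type-raising: X  =>  T/(T\\X)."""
    return Fn(t, "/", Fn(t, "\\", x))


def forward_crossed_compose(f, g):
    """Forward crossed composition: X/Y  Y\\Z  =>  X\\Z."""
    if (isinstance(f, Fn) and f.slash == "/" and
            isinstance(g, Fn) and g.slash == "\\" and f.arg == g.result):
        return Fn(f.result, "\\", g.arg)
    return None


def backward_apply(arg, fn):
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(fn, Fn) and fn.slash == "\\" and fn.arg == arg:
        return fn.result
    return None


S, NP, NPne, NPko = Atom("S"), Atom("NP"), Atom("NPne"), Atom("NPko")
VP = Fn(S, "\\", NPne)                         # S\NPne
diye = Fn(Fn(VP, "\\", NPko), "\\", NP)        # ((S\NPne)\NPko)\NP

raised = forward_raise(NPko, VP)               # mujh ko, type-raised
step1 = forward_crossed_compose(raised, diye)  # mujh ko + diye
step2 = backward_apply(NP, step1)              # paise + (mujh ko diye)
step3 = backward_apply(NPne, step2)            # us ne + the rest

for cat in (raised, step1, step2, step3):
    print(cat)
```

Without the type-raising step, plain backward application gets stuck: the verb wants its bare NP object immediately to the left, but mujʰ ko is in the way.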

One thing I would like to explore is ergativity. Ergativity has been studied to death by Hindi linguists, unfortunately. But I like how you can easily express the verbal ergative agreement using a feature on the subject, which even works well where a serial verb overrides the agreement of a main verb. Also, check out all the nice left derivations!

  1. mɛ̃ ne kām kiyā
    NP NPne\NP NP (S\NPne)\NP
    NPne
    S\NPne
    S
  2. mɛ̃ ne kām kiyā tʰā
    NP NPne\NP NP (S\NPne)\NP (S\NP)\(S\NP)
    NPne (S\NPne)\NP
    S\NPne
    S

Here's how lenā and cuknā are handled as serial verbs; the former takes the ergative and the latter does not. The backward composition operation overrides the unspecified subject type of the bare verb kar.

  1. mɛ̃ ne kām kar liyā tʰā
    NP NPne\NP NP (S\NP)\NP (S\NPne)\(S\NP) (S\NP)\(S\NP)
    NPne (S\NPne)\NP
    (S\NPne)\NP
    S\NPne
    S
  2. mɛ̃ kām kar cukā tʰā
    NP NP (S\NP)\NP (S\NP-ne)\(S\NP) (S\NP)\(S\NP)
    (S\NP-ne)\NP
    (S\NP-ne)\NP
    S\NP-ne
    S
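The serial-verb step can be sketched the same way. Backward composition (Y\Z  X\Y => X\Z) lets the light verb replace the bare verb's result type while keeping its object argument. In this Python sketch the names Back and backward_compose are illustrative, NP-ne is treated as an atomic "non-ergative NP" category (my reading of the notation), categories must match exactly rather than unify, and the auxiliary tʰā is left out because it would need feature unification.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Back:
    """Functor category result\\arg: seeks `arg` immediately to its left."""
    result: "Cat"
    arg: "Cat"

    def __str__(self):
        def wrap(c):
            return f"({c})" if isinstance(c, Back) else str(c)
        return f"{wrap(self.result)}\\{wrap(self.arg)}"


Cat = Union[Atom, Back]


def backward_compose(g, f):
    """Y\\Z  X\\Y  =>  X\\Z, where g is the left item and f the right one."""
    if isinstance(g, Back) and isinstance(f, Back) and f.arg == g.result:
        return Back(f.result, g.arg)
    return None


def backward_apply(arg, fn):
    """Y  X\\Y  =>  X."""
    if isinstance(fn, Back) and fn.arg == arg:
        return fn.result
    return None


S, NP = Atom("S"), Atom("NP")
NP_ne, NP_non_ne = Atom("NPne"), Atom("NP-ne")

kar = Back(Back(S, NP), NP)                   # (S\NP)\NP
liya = Back(Back(S, NP_ne), Back(S, NP))      # (S\NPne)\(S\NP)
cuka = Back(Back(S, NP_non_ne), Back(S, NP))  # (S\NP-ne)\(S\NP)

kar_liya = backward_compose(kar, liya)  # subject must now be ergative
kar_cuka = backward_compose(kar, cuka)  # subject must now be non-ergative
vp = backward_apply(NP, kar_liya)       # kam + kar liya

print(kar_liya)
print(kar_cuka)
print(backward_apply(NP_ne, vp))        # ergative subject is accepted
print(backward_apply(NP_non_ne, vp))    # non-ergative subject is rejected
```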

Dative subjects:

  1. us ko ḍar lag rahā hai
    NP NPko\NP NP (S\NPko)\NP (S\NP)\(S\NP) (S\NP)\(S\NP)
    NPko (S\NPko)\NP
    (S\NPko)\NP
    S\NPko
    S