Package takeover: indents

Parsers are one of Haskell’s indisputable strengths. The most well-known library is probably Parsec. This parser combinator library has been around since at least 2001, but is still widely used today, and it has inspired new generations of general purpose parsing libraries.

Parsec makes it really easy to prototype parsers for certain classes of grammars. Lots of grammars in use today, however, are whitespace-sensitive. There are different approaches for dealing with that. One of the most commonly used approaches is to add explicit INDENT and DEDENT tokens. But that usually requires you to add a separate lexing phase – not a bad idea by itself, but a bit annoying if you are just writing a quick prototype.
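
To make the INDENT/DEDENT idea concrete, here is a minimal sketch of such a lexing pass, using only the Prelude. The names `Token`, `Word`, and `layout` are made up for this illustration. It also assumes indentation uses spaces only and that every dedent returns to a level that was opened earlier.

```haskell
-- Hypothetical token type: one token per line, plus layout markers.
data Token = Indent | Dedent | Word String
    deriving (Eq, Show)

-- | Turn indented lines into a flat token stream.  The stack holds the
-- indentation widths of all currently open blocks, innermost first, and
-- always keeps the top-level width 0 at the bottom.
layout :: String -> [Token]
layout = go [0] . filter (any (/= ' ')) . lines
  where
    -- End of input: close every block that is still open.
    go stack [] = map (const Dedent) (drop 1 stack)
    go stack@(top : _) (l : ls)
        -- Deeper than the innermost block: open a new one.
        | width > top = Indent : Word term : go (width : stack) ls
        -- Otherwise close all blocks that are indented further than this line.
        | otherwise   = map (const Dedent) closed ++ Word term : go open ls
      where
        (spaces, term) = span (== ' ') l
        width          = length spaces
        (closed, open) = span (> width) stack
    -- Unreachable as long as the stack keeps its bottom element.
    go [] _ = []
```

A grammar written against this token stream no longer needs to look at whitespace, but the parser now needs this extra pass in front of it, which is exactly the overhead mentioned above.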

^(\t{2,})(\S.*)\n(?:\1\t.*\n)* can get you only so far in life

That is why I like the indents package – it sits in a sweet spot because it is a straightforward package that allows you to turn any Parsec parser into an indentation-based one without having to change too many types.

It offers a bunch of semi-cryptic operators like <+/> and <*/> which I would personally avoid in favor of their named variants, but other than that I would consider it a fairly “easy” package.

Unfortunately, I found a few bugs and inconveniences in the old package. One interesting bug would allow failing branches of the parse to still affect the internal indentation state, which is very bad ¹. Additionally, the package fixed the underlying monad, which prevented you from using transformers.

Because I didn’t want to confuse people by creating yet another package, I took over the existing one, which is a very smooth process nowadays. I can definitely recommend this to anyone who discovers issues like these in unmaintained packages. The Hackage trustees are doing great and valuable work there.

I have now uploaded a new version which fixes these issues. To celebrate that, let’s create a toy parser for indentation-sensitive taxonomies such as the big tea taxonomy ²:

tea
  green
    korean
      pucho-cha
      chung-cha
    vietnamese
      snow-green-tea
    japanese
      roasted
        ...
  black
    georgian
      traditional
      caravan-blend
    african
      kenyan
      tanzanian
    ...

We need some imports to get rolling. After all, this blogpost is a Literate Haskell file which can be loaded in GHCi.

> import           Control.Applicative ((*>), (<*), (<|>))
> import qualified Text.Parsec         as Parsec
> import qualified Text.Parsec.Indent  as Indent

We just store a single term in the category as a String.

> type Term = String

A taxonomy is then recursively defined as a Term and its child taxonomies.

> data Taxonomy = Taxonomy Term [Taxonomy] deriving (Eq, Show)
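
As a quick illustration of this recursive shape, the sketch below builds a small value by hand and renders it back to indented text. The types are repeated so the snippet stands on its own. `example` and `render` are hypothetical helpers for this illustration, and the two-space indentation step is an assumption that mirrors the sample file above.

```haskell
type Term = String

data Taxonomy = Taxonomy Term [Taxonomy] deriving (Eq, Show)

-- A hand-written fragment of the tea taxonomy.
example :: Taxonomy
example =
    Taxonomy "tea"
        [ Taxonomy "green" [Taxonomy "korean" []]
        , Taxonomy "black" []
        ]

-- | Render a taxonomy as indented text, two spaces per nesting level.
render :: Taxonomy -> String
render = unlines . go 0
  where
    go :: Int -> Taxonomy -> [String]
    go depth (Taxonomy term subs) =
        (replicate (2 * depth) ' ' ++ term) : concatMap (go (depth + 1)) subs
```

Calling `putStr (render example)` in GHCi prints the fragment in the same layout as the input format, which is handy for eyeballing parse results later on.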

A parser for a term is easy. We just parse an identifier and then skip the spaces following that.

> pTerm :: Indent.IndentParser String () String
> pTerm =
>     Parsec.many1 allowedChar <* Parsec.spaces
>   where
>     allowedChar = Parsec.alphaNum <|> Parsec.oneOf ".-"

In the parser for a Taxonomy, we use the indents library. withPos is used to “remember” the indentation position. After doing that, we can use combinators such as indented to check if we are indented past that point.

> pTaxonomy :: Indent.IndentParser String () Taxonomy
> pTaxonomy = Indent.withPos $ do
>     term <- pTerm
>     subs <- Parsec.many $ Indent.indented *> pTaxonomy
>     return $ Taxonomy term subs
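
The rule that pTaxonomy encodes can also be sketched without any parsing library: a node owns all following lines that start in a column strictly greater than its own. The snippet below is only an illustration of that rule on pre-split lines, not how indents is implemented. The helpers `columns`, `node`, and `children` are made up for this sketch, and it assumes indentation uses spaces only.

```haskell
data Taxonomy = Taxonomy String [Taxonomy] deriving (Eq, Show)

-- | Pair every non-blank line with the 1-based column where its term starts.
columns :: String -> [(Int, String)]
columns = map split . filter (any (/= ' ')) . lines
  where
    split l = let (spaces, term) = span (== ' ') l in (length spaces + 1, term)

-- | Read one node and remember its column as the reference point
-- (the role withPos plays in the real parser).
node :: [(Int, String)] -> Maybe (Taxonomy, [(Int, String)])
node []                   = Nothing
node ((col, term) : rest) =
    let (subs, rest') = children col rest
    in  Just (Taxonomy term subs, rest')

-- | Collect children while the next line is indented strictly past the
-- reference column (the check that indented performs in the real parser).
children :: Int -> [(Int, String)] -> ([Taxonomy], [(Int, String)])
children ref input@((col, _) : _)
    | col > ref
    , Just (t, rest) <- node input =
        let (ts, rest') = children ref rest in (t : ts, rest')
children _ input = ([], input)
```

For example, `node (columns "tea\n  green\n    korean\n  black\n")` yields "tea" with the children "green" (which in turn contains "korean") and "black", with no lines left over.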

Now we add a simple function that reads a file and puts it all together:

> readTaxonomy :: FilePath -> IO Taxonomy
> readTaxonomy filePath = do
>     txt <- readFile filePath
>     let errOrTax = Indent.runIndentParser parser () filePath txt
>     case errOrTax of
>         Left  err -> fail (show err)
>         Right tax -> return tax
>   where
>     parser = pTaxonomy <* Parsec.eof

And we can verify that this works in GHCi:

*Main> readTaxonomy "taxonomy.txt"
Taxonomy "tea" [Taxonomy "green" [Taxonomy "korean" [...
*Main>

Special thanks to Sam Anklesaria for writing the original package.

¹ See http://lpaste.net/344393.
² The interesting tea taxonomy can be found in this blogpost: https://jameskennedymonash.wordpress.com/mind-maps/amazing-tea-taxonomy/.
