Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

     You can't parse context-free or context-sensitive 
     languages with regular expressions
I didn't say that.

Also, to add to your point, current regexp libraries have capabilities that far exceed the concept in the formal languages theory. Features like capturing buffers or look-ahead assertions cannot describe a finite-state machine.

Also, Perl has had support for recursive regular expressions. Perl 5.10 even has support for recursive capture buffers.

     To be able to produce a grammar specification
     for the parser generator you need to understand 
     formal grammars
IMHO, that's the same as saying that in order to be a programmer then you need to know lots of math, which is in general bullshit. Even before the formal theory was available, people have been building compilers the good-old fashioned way - by writing top-down LL(k) parsers, without necessarily having a name for this technique.

But I also said that one should remember some basics from its theory of computation class, as it is helpful to know what a pushdown automaton is and the difference between it and a finite-state machine.

     All of this is vastly more involved than just
     anding regular expressions
It involves functions calling each other, sometimes recursively and loops and switches and remembering where you are in the text and doing repetitive transformations. It's as if you're doing real programming.

     You probably meant parser generator, parser combinator 
     is a different thing
No, I meant parser combinator.


I wonder how often you looked at a compiler's implementation to make such claims. First of all, there are many parts in a compiler that are not fully automated (lexer and parser generators making frontend development easier; basically BURS systems for backend automation (w.r.t. instruction selection), but then in optimization you have to deal with instruction scheduling, register allocation, etc.)

Then, I also think that your comment about people having written top-down recursive descent parsers for a long time without concrete historical reference is suprising, particularly when we all know of a prominent example (C) that you cannot parse with LL techniques (I rememeber a professor at university remarking on K&R probably not knowing about LL -- though I can't attest to this being true or not.)

I can also not figure out how you connect the Dragon book to Parser combinators. Having implemented parsers in both, monadic and combinatoric style (both of which btw. are a mess to debug) the best connection I can think of is translating formal grammar descriptions using parser combinators. Is this what you are referring to?

I like your characterization of compilers being just calling functions and loops plus switches and pointers to text. While we know about Church-Turing thesis, I am positive that the intricacies of database system implementation, operating system implementation and programming langauge implementation deserves distinction (which is supported by many of them having dedicated special interest groups.)


      I also think that your comment about people having 
      written top-down recursive descent parsers for a long 
      time without concrete historical reference is 
      suprising
I do not have concrete knowledge or evidence about it, it was a statement based on intuition alone, sorry about that.

The reason why I said it is because the first parser I ever wrote (something like 10 years ago), was a top-down LL(k) parser and I had absolutely no idea about what I was doing, but in the end it worked. Many programmers are choosing LL(k) implementations for their manually-built parsers because they are so easy to reason about and build manually.

I was under the impression that C can be described as LL(k). C++ definitely can't be.

About a compiler's implementation, I realize that it can get really messy, especially for statically typed languages and especially if you want to add type inference to it. Also agree on optimization. But the thing is, most people don't need to build turing-complete languages or need efficient translations to the target representation, they just need to parse DSLs or network protocols (hence my original comment).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: