
Happy to answer any questions! This tool has been in use for many months by some select Zapier users and we decided to finally release it. I definitely want to open source the core extractor bits and document the REST API that powers the Zapier integration.

We have more information on using it in Zapier (the main use case at the moment) here: https://zapier.com/zapbook/updates/308/introducing-zapier-em...



Very nice! This looks very useful and easy to use.

How much boilerplate text do you need on either side of a token in order to identify it? Put another way, how much can the template emails vary? If the template format changes, is there any sort of notification? Did you use the simplest implementation that could work, or is this much more complicated than it looks?


It can actually get pretty complex. The technique we're using is a wacky, hacky hodgepodge built on Google's diff-match-patch, and it works surprisingly well! If you run into any that don't work, just let us know and we can add it to the test suite and figure it out.


We've got some particularly complicated HTML regexes for email parsing at our company. We manually write new ones for new email layouts as we get them. I'd be interested in any information on how you're solving the issue, as I love how you've tackled it, at least on the UI end.


Sure! In very broad strokes:

First, download yourself a copy of Google's diff-match-patch.

Second, make a template for the email you have (think "Your shipment will be delivered {{date}}. Thank you!" vs. the original raw email "Your shipment will be delivered 2014-04-04. Thank you!").

Third, run it through diff-match-patch.

Fourth, walk over the change tree and record the insertion (1), deletion (-1), or equality (0) transformations (one side as keys, the other as values).

(There are a lot of edge cases to handle between the fourth and fifth steps, but test cases make those pretty obvious, if very frustrating.)

Fifth, collate the keys/values into a dictionary and do some last-minute cleanups.
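For anyone who wants to play with the idea, here is a rough Python sketch of the steps above. It is not Zapier's code: it swaps in the standard library's difflib.SequenceMatcher for diff-match-patch so it runs without extra installs, and the names extract_fields and PLACEHOLDER are made up for illustration. Each {{name}} placeholder is first replaced by a single private-use character so that the diff cannot accidentally match characters inside the placeholder itself.

```python
import re
from difflib import SequenceMatcher

# Hypothetical placeholder syntax, matching the {{date}} style used above.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
SENTINEL_BASE = 0xE000  # start of the Unicode private-use area


def extract_fields(template, raw):
    """Return {placeholder name: text found in raw} by diffing template vs. raw."""
    names = []

    def to_sentinel(match):
        # Swap each {{name}} for one unique character unlikely to occur in an email.
        names.append(match.group(1))
        return chr(SENTINEL_BASE + len(names) - 1)

    marked = PLACEHOLDER.sub(to_sentinel, template)

    # Stand-in for diff-match-patch: opcodes are equal / replace / insert / delete.
    matcher = SequenceMatcher(None, marked, raw, autojunk=False)

    fields = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # boilerplate shared by the template and the email
        sentinels = [
            ch for ch in marked[i1:i2]
            if SENTINEL_BASE <= ord(ch) < SENTINEL_BASE + len(names)
        ]
        # Only the simple case is handled: exactly one placeholder in the changed
        # span. Anything messier is one of the edge cases mentioned above.
        if len(sentinels) == 1:
            name = names[ord(sentinels[0]) - SENTINEL_BASE]
            fields[name] = raw[j1:j2].strip()
    return fields


template = "Your shipment will be delivered {{date}}. Thank you!"
raw = "Your shipment will be delivered 2014-04-04. Thank you!"
print(extract_fields(template, raw))
```

With diff-match-patch itself you would walk its list of (operation, text) pairs instead of opcodes, but the bookkeeping is the same idea. Values that happen to contain chunks of the surrounding boilerplate will split the diff in awkward ways, which this sketch simply ignores.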

We will be documenting a REST API so you can use parser.zapier.com directly, and it is pretty easy to forward emails automatically to our robot (so you can conceivably avoid writing anything at all and just use the app).


> If you run into any that don't work, just let us know and we can add it to the test suite and figure it out.

How would you prefer we contact you? There's no contact info on the site.


Any hints on how I can process datetimes with non-standard formatting properly? I've got mails where the date is formatted as dd/mm/yyyy, which causes Google Calendar to create an event on the 3rd of January instead of the 1st of March.


If you can, try setting each portion of the date as a different field for the parser to split out. Then in a zap you could re-assemble the date in the right order so GCal can understand it.
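If you end up doing the re-assembly in code rather than in a zap, the same idea looks roughly like this in Python. This is only a sketch: the field names day, month and year and the function reassemble_date are hypothetical, standing in for whatever you called the three parser fields.

```python
from datetime import date


def reassemble_date(day, month, year):
    """Build an unambiguous ISO 8601 date (yyyy-mm-dd) from separately parsed parts."""
    # date() raises ValueError for impossible values such as month 13,
    # which is a handy check that the fields were split the right way round.
    return date(int(year), int(month), int(day)).isoformat()


# A dd/mm/yyyy email date of 01/03/2014, parsed as three separate fields.
print(reassemble_date("01", "03", "2014"))  # 2014-03-01
```

Emitting the date year-first avoids the dd/mm versus mm/dd guess entirely.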


Good job, mate! I love it!



