If you are looking for resources on how to implement Apache Kafka in production, you may be interested in the Mastering Path to Production for Data Streaming Systems with Apache Kafka course, recently published on Confluent Developer:
https://cnfl.io/data-streaming-systems
This course is designed to cover everything you need to know about transitioning your real-time stream processing system from proof-of-concept to production. It covers topics including how to:
- Gather data requirements to build data streaming apps that meet your needs
- Design the streaming platform with the right capabilities
- Plan for business continuity
- Automate all changes in production
- Operate your platform
- Productionize your streaming applications
I’ve also created two super interesting hands-on exercises, so try them out:
- Exercise 1: Build a staging and production data streaming platform with Terraform and a PR-based promotion mechanism.
- Exercise 2: Create a GitOps system to automate the deployment of Kafka streaming applications in Kubernetes with CNCF Flux
Last but not least, there’s also a cheat sheet PDF to check your readiness!
Check out the resources if you are interested and I look forward to hearing your feedback.
Happy coding!
Gilles Philippart, Software Practice Lead, Confluent
Funding Circle | Software Engineers | ONSITE | San Francisco, London | Full-time
Funding Circle is one of the world’s largest direct lending platforms for businesses, and we’re looking for people who share our passion and restless determination to revolutionize a broken system.
We build an event-driven platform using Clojure and Kafka Streams. We run on AWS, using Marathon. We also love Domain-Driven Design and BDD (Behaviour-Driven Development). We host ClojureBridge events and a quarterly Clojure meetup (Clojure Circle!) on our premises.
Why join us?
- We’re a band of good folks, and if you want to make a real social impact, well, we're supporting small businesses, the engine of economic growth: https://www.youtube.com/watch?v=eH5bKNP34O8
It's a very important distinction, though. Arguably Datomic is missing the point here, which is that t_truth and t_recorded are often different. My classic example of this is an accounting system. Sometimes you want to know "what was the P&L for Dec 2016?" and sometimes you want to know "what was the P&L for Dec 2016 believed to be in Dec 2016?" And sometimes you want to take differences between the two: e.g. Dec 2016 using Dec 2016 facts vs Dec 2015 using Dec 2016 facts.
Both are equally important, but Datomic blesses one and not the other.
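To make the distinction concrete, here is a minimal sketch (in Python, with made-up field names, not Datomic's API): each ledger entry carries both an event time and a recording time, so the same P&L question can be asked against different recording cut-offs.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Entry:
    amount: float          # signed: revenue positive, cost negative
    event_time: datetime   # t_truth: when the transaction actually happened
    recorded_at: datetime  # t_recorded: when the system learned about it

def pnl(entries, period_start, period_end, as_known_at):
    """P&L for a period, using only facts recorded on or before as_known_at."""
    return sum(e.amount for e in entries
               if period_start <= e.event_time < period_end
               and e.recorded_at <= as_known_at)

# "What was the Dec 2016 P&L believed to be on 31 Dec 2016?"
#   pnl(entries, datetime(2016, 12, 1), datetime(2017, 1, 1), datetime(2016, 12, 31))
# "What is the Dec 2016 P&L believed to be today, after late corrections?"
#   pnl(entries, datetime(2016, 12, 1), datetime(2017, 1, 1), datetime.now())
```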
Where I work, we have a special database used to log every state of certain noteworthy pieces of user personal info. We also store declarations about future states.
We hesitated between a store-facts-then-aggregate model à la Datomic and a bitemporal model. A bitemporal model stores states with a validity period: instead of relying on a single point-in-time timestamp as with facts, it uses two date fields to model an interval. Current states are encoded with a +Infinity until_timestamp. Updating a state means closing the previous state's interval (setting its until_timestamp to Time.now) and opening a new interval (from: Time.now, until: +Infinity or the next future state's from_timestamp).
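A minimal sketch of that close-the-interval/open-a-new-one update, in Python rather than the poster's actual stack, with invented names and ignoring the case where the new interval is capped by a future state's from_timestamp:

```python
from dataclasses import dataclass, replace
from datetime import datetime

INFINITY = datetime.max  # sentinel for "still current"

@dataclass(frozen=True)
class State:
    item_id: int           # the item this state describes
    value: str             # the noteworthy piece of personal info
    from_ts: datetime      # start of the interval
    until_ts: datetime     # end of the interval (INFINITY = current state)

def update_state(history: list[State], item_id: int, new_value: str,
                 now: datetime) -> list[State]:
    """Close the currently open interval for item_id and open a new current one."""
    closed = [replace(s, until_ts=now)
              if s.item_id == item_id and s.until_ts == INFINITY else s
              for s in history]
    return closed + [State(item_id, new_value, from_ts=now, until_ts=INFINITY)]
```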
This is still monotemporal. So far we can log/store data pretty much the same way Datomic does, and set states in the future. But when we store data about future states, that is, when we also store dates in our temporal system, we are tempted to model them with the same interval-based heuristics. Why would we do such a thing if we can already set events in the future thanks to the intervals? Here is the thing: you can only keep one of these two pieces of information:
- the interval for which the state was the freshest state in the database
- the interval declared by the user about the future state
These two correspond to the two temporal axes of a bitemporal database, most often named the system and validity axes. "Axis" is a bit grand, I think: the word suggests they all entertain the same homogeneous relationships, as in a space, whereas in practice it's more like a subtle dependency graph. In the context of tax-related declarations made ahead of time, we store both the moment the declaration was stated (and any subsequent amendments made to it) and the year for which it applies. For this to work, the validity-axis heuristics must be built on top of those of the system axis: you have to put one close-previous-interval-open-next-one semantic on top of another. You could even build an n-tower of such edit mechanisms in an n-temporal database, but in practice you most likely won't need to go that far. Suppose you also want to confirm all those special facts your customers tell about themselves, and you want to log that the same way. Unless you need to validate present declarations about future states, you won't need to make this confirmation axis stand on top of the validity axis, which itself stands on top of the system axis (a 3-temporal system).
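A sketch of stacking one close/open mechanism on top of the other, again in Python with invented names: each row carries a validity interval (the tax year it applies to) and a system interval (when this version was the freshest in the database), and an amendment only ever edits the system axis.

```python
from dataclasses import dataclass, replace
from datetime import datetime

INFINITY = datetime.max  # sentinel for "still the freshest version"

@dataclass(frozen=True)
class Declaration:
    item_id: int
    amount: float
    valid_from: datetime    # validity axis: period the declaration applies to
    valid_until: datetime
    system_from: datetime   # system axis: when this version became the freshest
    system_until: datetime  # INFINITY = still the freshest version

def amend(table, item_id, amount, valid_from, valid_until, now):
    """Record an amendment: supersede the current version covering the same
    validity period on the system axis, keeping the full history queryable."""
    closed = [replace(r, system_until=now)
              if (r.item_id == item_id and r.system_until == INFINITY
                  and r.valid_from == valid_from and r.valid_until == valid_until)
              else r
              for r in table]
    return closed + [Declaration(item_id, amount, valid_from, valid_until,
                                 system_from=now, system_until=INFINITY)]

def as_believed(table, item_id, valid_at, believed_at):
    """What amount applied at valid_at, according to what the DB held at believed_at?"""
    return [r.amount for r in table
            if r.item_id == item_id
            and r.valid_from <= valid_at < r.valid_until
            and r.system_from <= believed_at < r.system_until]
```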
What's fun is that with the benefit of a system axis you can schedule data maintenance in the future. It's also very fast on a relational database engine.
But it messes up your relationship cardinalities. You will need two ids on your table: one for the state, and one for the item it models. 1_to_1 becomes 1_to_n when the right-hand end is temporalized. This gets really hairy if you want to temporalize relationships themselves.
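For instance, following what used to be a 1:1 foreign key into a temporalized table now matches n state rows, so you must also pin a point in time to get exactly one back. A hypothetical resolver over the State rows from the earlier sketch:

```python
def resolve(states, item_id, at):
    """Resolve what used to be a 1:1 reference: the item id alone now matches
    n state rows, so a point in time is needed to pick exactly one (or None)."""
    return next((s for s in states
                 if s.item_id == item_id and s.from_ts <= at < s.until_ts),
                None)
```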
Edit:
The author writes:
>event time: the time at which stuff happened.
>recording time: the time at which your system learns that stuff happened.
>(Disclaimer: this terminology is totally made up by me as I'm writing this.)
In short, I do not think such a terminology exists; there is no definitive terminology, only metatimes over metatimes and as many field-driven reasons to bring them up. And why not store them all in a badass npm-style, time-axis-dependency-oriented, byzantine database?