You can view the question as a proxy for "how do you provide value for money?".

If you build something that then gets replaced a few years later, maybe you did something wrong. Ideally you make something that evolves, or even better, that acts as a foundation others can build on. If you get a lot of assumptions right and the implementation doesn't get in the way of what people do - or, better yet, meaningfully enables them to get work done - you've succeeded.

Here are some things I've observed in the wild.

Data infrastructure projects often fail not because the technology doesn't work, but because the solution doesn't enable _organizations_ to work with it. I've seen many companies invest millions in solutions that eventually turned out to be useless because they failed to make data and results accessible to complex organizations with lots of internal boundaries.

Another failure mode: too much, too soon, and too complex. You try to address every possible need from the start, and to make the feature list as long and impressive as possible you introduce lots and lots of systems that are expensive and complex. That unloads a huge burden onto the users: they have to learn all of these systems, and spend lots of time and money training people and adapting their own systems so they can interoperate with the rest.

I've helped a few companies design their data infrastructure. I usually follow an extremely minimalist approach. Here's how I start.

1) Your long-term data store is flat files.
2) You make real-time data available over streaming protocols.
3) By default everyone (inside the company) has access - access limitations have to be justified.
4) You document formats and share the code used to interpret, transform and process the data, so consumers can make sense of it.
5) You give people access to resources where they can spin up databases and run stuff.

Data producers and consumers decide how they want to create and process data. You focus on the interface where they exchange data.

(I've left security as an exercise for the reader because a) it depends, and b) how to secure these kinds of systems is an even longer post.)
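
To make points 1 and 2 concrete, here's a rough sketch of the producer side. Everything specific in it is made up: the /data/sensor_readings path, the "sensor_readings" topic and the broker address are placeholders, and Parquet plus Kafka just stand in for "flat files" and "a streaming protocol".

    import datetime as dt
    import json

    import pyarrow as pa
    import pyarrow.parquet as pq
    from kafka import KafkaProducer  # kafka-python

    # A couple of fake records; in reality these come from whatever
    # the producing system does.
    records = [
        {"ts": "2024-01-01T00:00:00Z", "sensor": "a1", "value": 0.42},
        {"ts": "2024-01-01T00:01:00Z", "sensor": "a1", "value": 0.43},
    ]

    # Point 1: the long-term store is just flat files in a shared
    # location, partitioned by date so consumers can grab the slices
    # they need.
    day = dt.date(2024, 1, 1).isoformat()
    pq.write_table(pa.Table.from_pylist(records),
                   f"/data/sensor_readings/{day}.parquet")

    # Point 2: the same records go out on a stream for live consumers.
    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda r: json.dumps(r).encode("utf-8"),
    )
    for record in records:
        producer.send("sensor_readings", record)
    producer.flush()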

Points 1 and 2 are sufficient to bootstrap databases and analytic systems at any time, including systems that receive live data. They make it possible to support both systems that are supposed to be up permanently and systems that only load the data, do some processing and then get nuked. Point 5 provides the resources to do so.
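
As a sketch of what that bootstrapping can look like - again with made-up paths, topic and broker names, and with DuckDB and kafka-python standing in for whatever tools the consumer actually prefers:

    import json

    import duckdb
    from kafka import KafkaConsumer  # kafka-python

    # Historical data: query the flat files directly, no loading step.
    # The connection is in-memory and disposable; pass a filename to
    # duckdb.connect() if you want it to survive.
    con = duckdb.connect()
    daily_avg = con.execute("""
        SELECT sensor, avg(value) AS avg_value
        FROM read_parquet('/data/sensor_readings/*.parquet')
        GROUP BY sensor
    """).fetchall()
    print(daily_avg)

    # Live data: tail the stream and fold new records into the analysis.
    consumer = KafkaConsumer(
        "sensor_readings",
        bootstrap_servers="broker:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value["sensor"], message.value["value"])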

Point 3 usually meets with resistance in some types of organizations, but it is critical. I've seen companies invest millions in "data lakes" and whatnot ... and then piss away the value because only 2-3 people have access to the data and they ain't sharing. You need executive management to empower someone to put their foot down. (One way to make people share data is to use budgets: if a department doesn't share its data, it pays for the storage; if the data is shared, storage comes out of a central budget.)

Point 4 requires you to also educate people a bit on data exchange. In many areas exchange standards exist, but they are not necessarily very good. If you find yourself spending a lot of effort expressing the data in format X and then a lot of effort interpreting it at the other end, you are wasting your time. Come up with something simpler. Not all standards are worth using, and not everything is worth standardizing - don't lose sight of the actual goals.
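
As an illustration of point 4, something like the (entirely hypothetical) module below, checked in next to the data, often does more good than a heavyweight exchange standard: the format is a handful of documented fields in newline-delimited JSON, and the code that reads it ships with the data.

    # readings_format.py - hypothetical shared module for the
    # "sensor_readings" data: documents the fields and provides the
    # canonical reader, so every consumer interprets the data the same way.
    import json
    from dataclasses import dataclass
    from datetime import datetime
    from typing import Iterable, Iterator

    @dataclass
    class Reading:
        """One sensor reading.

        ts      event time, UTC
        sensor  producer-assigned sensor id
        value   measurement, in the unit documented by the producer
        """
        ts: datetime
        sensor: str
        value: float

    def parse_readings(lines: Iterable[str]) -> Iterator[Reading]:
        """Parse newline-delimited JSON records into Reading objects."""
        for line in lines:
            if not line.strip():
                continue
            raw = json.loads(line)
            yield Reading(
                ts=datetime.fromisoformat(raw["ts"].replace("Z", "+00:00")),
                sensor=raw["sensor"],
                value=float(raw["value"]),
            )

If producers and consumers both use the shared reader, the documentation and the implementation can't drift apart.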

Point 5 is where you grow new core services. Producers and consumers get to pick their own technologies and do whatever they want. When they've built something that makes life easier for other parts of the organization, you can consider moving it into the "core" - but only once it has shown that it works and improves productivity across internal boundaries.


