Data Engineering skills

Last updated Oct 04 2020

#data-engineering #programming #career

A friend asked me an interesting question: which skills are worth learning for Data Management specialists, and how do you build a growth roadmap?

The question made me think, because I didn't have a clear picture in my head. What follows is just my thoughts on the topic, and for the most part I'm speculating about the current state and the future of Data Management.

Prerequisites

To begin with, as in any other area, there are basic things that every Software Engineer should know.

In short, I assume that a person who comes to Big Data already knows some programming language and has an idea about basics such as CS algorithms, SQL, VCS, SDLC, networking fundamentals, Linux, CI/CD, etc. One way or another, the common practices of software engineering don't go away. This is the foundation that I think is needed in any field. If you lack it, you'd better learn it, though you may disagree with me here.

Big Data Basics

This is another layer of knowledge that I consider basic. Here I would highlight programming languages, because in the Big Data world there are predominant languages without which it's hard (or impossible) to get anything done. These are, of course, JVM languages such as Java and Scala, which may soon be joined by Kotlin, and Python, the second-best language for everything and the de facto language for Data Science.

Aside from languages, there are terms and fundamental concepts behind Big Data that I think are worth learning: horizontal scaling, distributed systems, asynchronous communication, the CAP theorem, eventual consistency, consensus problems, availability, durability, reliability, resiliency, fault tolerance, disaster recovery, etc. You don't need a deep understanding of all of them, but if you want to succeed in the field, you should at least look them up.
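To make one of these terms a bit more concrete: horizontal scaling usually means spreading data across more machines instead of buying a bigger one. Below is a minimal sketch of hash partitioning, one common way to decide which node owns a record. It is an illustration only, not how any particular product works; the function names `partition_for` and `distribute` are hypothetical, and CRC32 is just a convenient stable hash for the example.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash modulo the partition count.

    Python's built-in hash() is salted per process for strings, so CRC32 is used
    here to get the same answer on every run and every machine.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(keys, num_partitions):
    """Group keys by the partition (i.e. the node) that would own them."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[partition_for(key, num_partitions)].append(key)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical record keys spread over three nodes.
    user_ids = [f"user-{i}" for i in range(10)]
    for node, keys in sorted(distribute(user_ids, 3).items()):
        print(f"node {node}: {keys}")
```

Adding capacity means raising `num_partitions`, but with plain modulo most keys then move to a different node, which is one reason real systems often use consistent hashing or a fixed number of partitions instead.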

Now let's talk about how the field is developing or, more precisely, the challenges it faces.

Data Challenges

Data Challenges in Big Data

Management Challenges

Management Challenges in Big Data

Analytical Challenges

Analytical Challenges in Big Data

Operational Challenges

Operational Challenges in Big Data

Conclusion

This turned out to be a lot, and yet it seems I haven't said much. Also check out the interview with Tobias Macey on the Data Engineering landscape in 2021.

There are too many products available. Most of them claim to solve all the data problems your company encounters, but that is not true.

I don't consider myself an expert in everything, and I'm only speculating about technology here. But as you can see even from this article, many skills overlap across several areas of Big Data and beyond. With them, you don't need to worry about finding a job.

Do not chase trends — build skills that stay relevant. And the most relevant skills are probably soft skills.

The steep learning curve associated with many data engineering activities is turning into a cliff. Hand-coded projects require deep knowledge of many aspects of the organization, as well as of many tools and existing solutions.

In data management, we are still in the Wild West, especially in ML...
