Blog | luminousmen

Big Data file formats

Apache Spark supports many different data formats, such as the ubiquitous CSV format and web-friendly JSON format. Common formats used primarily for big data analytical purposes are Apache Parquet and Apache Avro. In this post, we’re going to cover the properties of these 4 formats — CSV, JSON,...

How to make color emoji work on Chrome for Linux

To support emoji on Linux, we will use the Noto Color Emoji font. The script is simple: wget https://noto-website.storage.googleapis.com/pkgs/NotoColorEmoji-unhinted.zip sudo mkdir -p /usr/local/share/fonts/truetype sudo unzip NotoColorEmoji-unhinted.zip -d...

Spark. Anatomy of Spark application

Apache Spark is considered a powerful complement to Hadoop, big data’s original technology. Spark is a more accessible, powerful, and capable big data tool for tackling various big data challenges. It has become the mainstream, most in-demand big data framework across all major industries....

Data Science. The Central Limit Theorem and sampling

Many engineers have never been involved in statistics or data science. So when they build data science pipelines or rewrite code produced by data scientists into adequate, easily maintained code, many nuances and misunderstandings arise on the engineering side. For...

Spark core concepts explained

Apache Spark is considered a powerful complement to Hadoop, big data’s original technology. Spark is a more accessible, powerful, and capable big data tool for tackling various big data challenges. It has become the mainstream, most in-demand big data framework across all major industries....

Things you need to know about Hadoop and YARN as a Spark developer

Apache Spark is considered a powerful complement to Hadoop, big data’s original technology. Spark is a more accessible, powerful, and capable big data tool for tackling various big data challenges. It has become the mainstream, most in-demand big data framework across all major industries....