Data Analytics
An Overview

This site is intended to give an overview of software tools for data analytics and machine learning.

About

With the current hype around big data, it has become difficult to choose among the many tools and methods available as open source or commercial software packages. In the following sections, I will present the most important companies, databases, big data storage engines, and analytics pipelines, and compare their features and limitations. This may help you select the appropriate tool for a particular use case from the different alternatives.

Big Data Companies

Here is a list of companies practicing data analytics that you should know about:

Amazon

Amazon uses big data throughout the company. One of its most innovative products is the Echo smart speaker, which is used to interact with its personal assistant Alexa.

Cloudera

Cloudera offers CDH (Cloudera Distribution Including Apache Hadoop), a packaged distribution of Apache Hadoop and related projects.

Datameer

Datameer focuses on big data analytics and visualization on top of Hadoop.

DeepMind

DeepMind uses neural networks to build intelligent software and solve complex problems. Their software AlphaGo was the first program to beat a professional human player at the game of Go.

Facebook

Facebook uses likes and other interactions within its social network to optimize the placement of ads. Advertisers can select matching user groups by their preferences in a very detailed and specific way (microtargeting).

FICO

Google

Hortonworks

IBM

Innoquant

KNIME

Microsoft

Oracle

Pentaho

Rapid-I

SAP

Teradata

Data Analytics Tools

Here is a list of data analytics tools:

Alteryx

Cognos

Dell Statistica

Hadoop

Kibana

Mahout

A library of scalable machine learning and data mining algorithms that runs on top of Hadoop.

Mathematica

MATLAB

Microsoft Power BI

MicroStrategy Analytics Desktop

Octave

Pig

Qlik Sense

QlikView

R

SAP KXEN

SAP Lumira

SAS Visual Analytics

Spark

SPSS Modeler

SPSS Statistics

SSAS

SSIS

SSRS

TIBCO Spotfire

Tableau Desktop

TensorFlow

ZooKeeper

Databases

Access

Relational Database

Microsoft

Accumulo

Wide column Database

Azure SQL Database

Relational Database

Microsoft

BigQuery

Relational Database

Google

Cassandra

Wide column Database

Coherence

Key-value Database

Oracle

Couchbase

Document Database

CouchDB

Document Database

DB2

Relational Database

dBASE

Relational Database

Derby

Relational Database

DynamoDB

Document Database

Amazon

Ehcache

Key-value Database

Elasticsearch

Search Engine

FileMaker

Relational Database

H2

Relational Database

HANA

Relational Database

SAP

Hazelcast

Hazelcast provides implementations of the Java Collections interfaces whose entries are distributed across the nodes of a cluster.

Key-value Database

Clustered

Replication

No master (peer to peer)
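
The properties listed for Hazelcast (key-value storage, clustering, replication, no master) can be illustrated with a small toy model. The sketch below is not Hazelcast's API and does not claim to reproduce its partitioning algorithm. The class PeerCluster, the constant PARTITION_COUNT, and the node names are hypothetical and exist only for this illustration. It assumes that a key is hashed to a partition, and that each partition is held by one primary node and one backup node.

```python
import zlib

PARTITION_COUNT = 8  # hypothetical value, kept small for readability


def partition_for(key):
    """Map a string key to a partition using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % PARTITION_COUNT


class PeerCluster:
    """Toy peer-to-peer cluster: all nodes are equal, no master node exists."""

    def __init__(self, node_names):
        if len(node_names) < 2:
            raise ValueError("need at least two nodes to hold a backup copy")
        self.node_names = list(node_names)
        # Every node stores only its own share of the entries.
        self.storage = {name: {} for name in self.node_names}

    def owners(self, key):
        """Return the (primary, backup) node names for the key's partition."""
        p = partition_for(key)
        primary = self.node_names[p % len(self.node_names)]
        backup = self.node_names[(p + 1) % len(self.node_names)]
        return primary, backup

    def put(self, key, value):
        # Replication: write the entry to the primary and to the backup.
        for node in self.owners(key):
            self.storage[node][key] = value

    def get(self, key, failed=()):
        """Read from the primary; fall back to the backup if it has failed."""
        for node in self.owners(key):
            if node not in failed:
                return self.storage[node].get(key)
        raise RuntimeError("both copies are unavailable")


if __name__ == "__main__":
    cluster = PeerCluster(["node-a", "node-b", "node-c"])
    cluster.put("user:1", "Alice")
    primary, _ = cluster.owners("user:1")
    # The entry is still readable when its primary node is down.
    print(cluster.get("user:1", failed=(primary,)))
```

Because every node can compute the owner of a key from the key alone, no central coordinator is needed to route a read or a write in this model.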

HBase

Wide column Database

Hive

Relational Database

Impala

Relational Database

Informix

Relational Database

Ingres

Relational Database

MariaDB

Relational Database

Memcached

Key-value Database

MongoDB

Document Database

MySQL

Relational Database

Neo4j

Graph Database

Oracle

Relational Database

PostgreSQL

Relational Database

Redis

Key-value Database

SAP Adaptive Server

Relational Database

SimpleDB

Key-value Database

Amazon

Solr

Search Engine

Spark SQL

Relational Database

Splunk

Search Engine

SQLite

Relational Database

SQL Server

Relational Database

Microsoft

Teradata

Relational Database

Important Terms

HDFS

Hadoop Distributed File System

MapReduce

A programming model for processing large amounts of data in parallel. A map step turns each input record into key-value pairs, and a reduce step combines all values that share the same key.
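
As a rough illustration of the MapReduce idea, here is a minimal single-machine word count in Python. It is a sketch only and does not use Hadoop. The function names map_phase, shuffle, reduce_phase, and word_count are made up for this example, and a thread pool stands in for the worker nodes of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the emitted values by key across all mapper outputs."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Mappers do not depend on each other, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, documents))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data tools", "big data storage", "data pipelines"]
    print(word_count(docs))
```

The map and reduce functions never share state, which is what allows a framework to spread them over many machines.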

OLAP

Online Analytical Processing (Data Warehouse)

OLTP

Online Transaction Processing (Operational System)
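
The difference between the two workloads can be sketched with Python's built-in sqlite3 module. The table orders and the functions create_schema, place_order, and revenue_by_region are hypothetical names chosen for this example. A real data warehouse would normally run on a separate system with its own schema, so this only shows the shape of the queries.

```python
import sqlite3


def create_schema(conn):
    """Create a small example table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
    )


def place_order(conn, customer, region, amount):
    """OLTP-style work: one short transaction that writes a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid


def revenue_by_region(conn):
    """OLAP-style work: a read-only aggregate that scans many rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    place_order(conn, "alice", "EU", 10.0)
    place_order(conn, "bob", "EU", 5.5)
    place_order(conn, "carol", "US", 20.0)
    print(revenue_by_region(conn))
```

Operational systems are tuned for many small writes like place_order, while analytical systems are tuned for large scans like revenue_by_region.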

RDBMS

Relational Database Management System