10 Big Data Technologies You Should Know About

Data holdings are among the most important resources of many companies: from them, insights can be drawn for the development of new business models, products and strategies. At the moment, however, many decision-makers face the challenge of identifying a big data concept that suits their specific purposes. Depending on the application scenario, different, mostly customized technology stacks from the big data environment are used. The ten most important are presented below.

1. Hadoop - a proven concept

Hadoop is an open source framework, written in Java, for the parallel processing of data on highly scalable server clusters. It now plays a central role in many big data solutions and is particularly well suited to evaluations that require extensive analyses.
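
To make the processing model concrete, here is a minimal sketch of the classic MapReduce word count in Java; the input and output HDFS paths are placeholders. The mapper emits a count of 1 per word across the cluster in parallel, and the reducer sums the counts for each word.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in its input split
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the counts for each word across all mappers
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged as a JAR and started with a command along the lines of `hadoop jar wordcount.jar WordCount /input /output`, the framework takes care of splitting the input, scheduling tasks across the cluster, and recovering from node failures.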

2. Cloudera - everything from a single source

Cloudera offers its own Hadoop distribution, which is now one of the most popular. It comprises a broad portfolio of tested open source big data applications that can be installed and managed easily through the web interface of its cluster manager. Companies benefit from being able to fall back on proven solutions while integrating new big data technologies into existing processes.

3. Apache Hive - the data warehouse for Hadoop

One challenge for companies is moving their data to Hadoop, because existing data usually resides in relational databases that are queried with the Structured Query Language (SQL). The open source data warehouse system Apache Hive offers support here: it brings an SQL-like query language (HiveQL) to Hadoop, and its main functions are data summarization, querying, and analysis.
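
As an illustration, the sketch below queries Hive over JDBC from Java. It assumes a HiveServer2 instance on its default port 10000 and a hypothetical sales table; the query is plain HiveQL, which Hive translates into distributed jobs on the cluster. The org.apache.hive:hive-jdbc driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // Driver class from the hive-jdbc artifact; newer versions self-register
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // HiveQL looks like SQL but is executed as distributed jobs on Hadoop
            ResultSet rs = stmt.executeQuery(
                    "SELECT region, COUNT(*) AS orders FROM sales GROUP BY region");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```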

4. Cloudera Impala - the solution for real-time queries

With Impala, the Hadoop specialist Cloudera has developed a technology for running real-time queries against data stored in Hadoop or HBase. Impala's main purpose is to provide a scalable, distributed query engine for Hadoop.
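
Because Impala speaks the HiveServer2 wire protocol, the same JDBC approach can reach it, typically on port 21050. The sketch below assumes an unsecured cluster (hence the noSasl setting) and the same hypothetical sales table; the practical difference from Hive is that the answer comes back interactively rather than as a batch job.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaQuery {
    public static void main(String[] args) throws Exception {
        // Impala daemons usually accept HiveServer2-protocol clients on port 21050
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:21050/default;auth=noSasl");
             Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery(
                    "SELECT region, SUM(revenue) FROM sales GROUP BY region");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```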

5. MongoDB - the database for all cases

MongoDB is one of the leading open source NoSQL databases. As a "general purpose database", MongoDB is well suited to today's IT landscape with its large and often unstructured volumes of data. The database enables dynamic development and high scalability for applications.
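
A minimal sketch with the official MongoDB Java driver illustrates the schema-free model; the shop database, events collection, and field names are invented for the example.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MongoExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events =
                    client.getDatabase("shop").getCollection("events");
            // Documents need no predefined schema; each can carry different fields
            events.insertOne(new Document("type", "click")
                    .append("page", "/pricing")
                    .append("durationMs", 340));
            // The query filter is itself just a document
            for (Document d : events.find(new Document("type", "click"))) {
                System.out.println(d.toJson());
            }
        }
    }
}
```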

6. Pentaho - flexible business intelligence platform

Pentaho's strategy is to combine various proven individual solutions into a complete framework and to provide support for it from a single source. With Pentaho Data Integration (PDI), for example, data developers and analysts can work together to create new data sets, using the same product for both data preparation and visualization.

7. Infobright - MySQL engine with effective data compression

Explosive data growth is putting established data management solutions under pressure because their flexibility is limited. Column-oriented databases were developed for this reason. With the MySQL engine Infobright, a newer open source system has established itself that is suitable for data volumes of 500 gigabytes and up. Infobright combines a column-oriented database with a self-managing knowledge grid architecture.
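
Because Infobright presents itself as a MySQL storage engine, it is reached through the standard MySQL JDBC driver. The following sketch is based on the community edition, whose engine name is BRIGHTHOUSE; host, port, credentials, and the table are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class InfobrightExample {
    public static void main(String[] args) throws Exception {
        // Infobright speaks the ordinary MySQL wire protocol (port is deployment-specific)
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:5029/analytics", "user", "secret");
             Statement stmt = conn.createStatement()) {
            // The storage engine name marks the table as column-oriented
            stmt.execute("CREATE TABLE page_views ("
                    + "view_date DATE, url VARCHAR(255), visits INT"
                    + ") ENGINE=BRIGHTHOUSE");
            // Analytical scans only touch the columns the query references
            ResultSet rs = stmt.executeQuery(
                    "SELECT url, SUM(visits) FROM page_views GROUP BY url");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```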

8. Apache Spark - a framework for real-time analysis

Many companies want to use their data to make fast, well-founded decisions, for example to optimize products or identify savings opportunities. One technology suited to this is Apache Spark: an open source framework that processes large volumes of data quickly and in parallel on clustered computers.
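
A short Java sketch of Spark's DataFrame API shows the programming model; the CSV path and column names are placeholders, and each transformation is distributed across the cluster's executors automatically.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("revenue-by-product")
                .master("local[*]") // local run; omit when submitting to a cluster
                .getOrCreate();
        // Read a (hypothetical) CSV file; the schema is inferred from the data
        Dataset<Row> sales = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/sales.csv");
        // The aggregation runs in parallel across the cluster's executors
        sales.groupBy("product").sum("revenue").show();
        spark.stop();
    }
}
```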

9. Splunk - Simplify Big Data

Splunk Enterprise enables the monitoring and analysis of clickstream data as well as customer transactions, network activity and call data records. Splunk takes care of integrating the various data sources so that they can be evaluated meaningfully. Its advantage is that almost any type of file can be indexed, processed and evaluated with it.
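
As a sketch of programmatic access, the following example uses the Splunk SDK for Java (com.splunk) to run a search job and print its results; host, credentials, index, and sourcetype are placeholders, and the query language is Splunk's SPL.

```java
import com.splunk.Job;
import com.splunk.Service;
import com.splunk.ServiceArgs;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SplunkSearch {
    public static void main(String[] args) throws Exception {
        ServiceArgs loginArgs = new ServiceArgs();
        loginArgs.setHost("localhost");
        loginArgs.setPort(8089); // Splunk's management port
        loginArgs.setUsername("admin");
        loginArgs.setPassword("changeme");
        Service service = Service.connect(loginArgs);

        // Submit a search over the indexed events and wait for it to finish
        Job job = service.getJobs().create(
                "search index=main sourcetype=access_combined | stats count by status");
        while (!job.isDone()) {
            Thread.sleep(500); // poll until the search job completes
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(job.getResults()))) {
            reader.lines().forEach(System.out::println);
        }
    }
}
```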

10. Apache Storm - real-time big data analysis

Apache Storm is a fault-tolerant, scalable system for processing data streams in real time. The technology is part of the Hadoop ecosystem and works independently of any particular programming language.
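
A compact topology sketch, assuming Storm 2.x, shows the spout/bolt model: a made-up spout emits click URLs and a bolt counts them, with a fields grouping ensuring that the same URL always reaches the same bolt task.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class ClickCountTopology {

    // Spout: the stream source; here it just emits sample URLs forever
    public static class ClickSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] urls = {"/home", "/pricing", "/docs"};
        public void open(Map<String, Object> conf, TopologyContext ctx,
                         SpoutOutputCollector collector) {
            this.collector = collector;
        }
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values(urls[(int) (Math.random() * urls.length)]));
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("url"));
        }
    }

    // Bolt: a processing step; counts how often each URL has been seen
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Integer> counts = new HashMap<>();
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String url = tuple.getStringByField("url");
            counts.merge(url, 1, Integer::sum);
            System.out.println(url + " -> " + counts.get(url));
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("clicks", new ClickSpout(), 1);
        builder.setBolt("counter", new CountBolt(), 2)
               .fieldsGrouping("clicks", new Fields("url")); // same URL -> same task
        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("click-counter", new Config(), builder.createTopology());
            Thread.sleep(30_000); // let the topology run briefly, then shut down
        }
    }
}
```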