The purpose of this blog is to explain the different types of benchmark tools available for BigData components. We gave a talk on BigData benchmarking at Linaro Connect Las Vegas in 2016. This post is an effort to collect that material in one place, with more information.
We have to remember that all the BigData components and benchmarks were developed with the x86 architecture in mind. So:
- First, we should make sure that all the relevant benchmark tools compile and run on AArch64.
- Then we should go ahead and try to optimize them for AArch64.
Different types of benchmarks and standards
- Micro benchmarks: evaluate specific lower-level system operations
  - E.g. HiBench, HDFS DFSIO, AMP Lab Big Data Benchmark, CALDA, Hadoop workload examples (Sort, Grep, WordCount, TeraSort, GridMix, PigMix)
- Functional/component benchmarks: specific to a low-level function
  - E.g. basic SQL: individual SQL operations like select, project, join, order-by
- Application-level benchmarks
  - E.g. Spark Bench
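As a toy illustration of the metrics a micro-benchmark reports (execution time and throughput for one narrow operation), here is a minimal single-machine WordCount sketch. This is an illustration only; suites like HiBench run the equivalent workloads as distributed jobs on a cluster.

```python
# Hypothetical single-machine WordCount micro-benchmark sketch.
import time
from collections import Counter

def wordcount(text):
    # Count occurrences of each whitespace-separated word.
    return Counter(text.split())

data = "the quick brown fox jumps over the lazy dog " * 100_000
start = time.perf_counter()
counts = wordcount(data)
elapsed = time.perf_counter() - start
throughput = len(data.encode()) / elapsed / 1e6  # MB of input processed per second

print(f"execution time: {elapsed:.3f} s, throughput: {throughput:.1f} MB/s")
```

The same two numbers (time and bytes/sec) are what the real suites report, just measured over HDFS and a cluster rather than a local string.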
The tables below summarize the different benchmark efforts.
Benchmark Efforts - Microbenchmarks
| Benchmark | Workloads | Metrics |
| --- | --- | --- |
| HDFS DFSIO | Generate, read, write, append, and remove data for MapReduce jobs | Execution time, throughput |
| HiBench (Hadoop and Hive) | Sort, WordCount, TeraSort, PageRank, K-means, Bayes classification, Index | Execution time, throughput, resource utilization |
| AMP Lab Big Data Benchmark | Part of the CALDA workloads (scan, aggregate and join) and PageRank | |
| CALDA | Load, scan, select, aggregate and join data, count URL links | |
Benchmark Efforts - TPC
| Benchmark | Workloads | Metrics |
| --- | --- | --- |
| TPCx-HS | HSGen, HSDataCheck, HSSort and HSValidate | Performance, price and energy; execution time, throughput |
| TPC-DS (decision support benchmark) | Data loading, queries and maintenance | Execution time, throughput |
Benchmark Efforts - Synthetic
| Benchmark | Workloads | Metrics |
| --- | --- | --- |
| SWIM | Synthetic user-generated MapReduce jobs of reading, writing, shuffling and sorting | |
| GridMix | Synthetic and basic operations to stress-test the job scheduler and compression/decompression | Memory, execution time, throughput |
| PigMix | 17 Pig-specific queries | |
| MRBench | MapReduce benchmark complementary to TeraSort: data-warehouse operations with 22 TPC-H queries | |
| NNBench | Load testing the NameNode and HDFS I/O with small payloads | |
| Spark Bench | CPU-, memory-, shuffle- and IO-intensive workloads: machine learning, streaming, graph computation and SQL workloads | Execution time, data process rate |
| BigBench | Interactive-based queries on synthetic data | |
| BigDataBench | 1. Micro benchmarks (Sort, Grep, WordCount); 2. Search engine workloads (Index, PageRank); 3. Social network workloads (connected components (CC), K-means and BFS); 4. E-commerce site workloads (relational database queries (select, aggregate and join), collaborative filtering (CF) and Naive Bayes); 5. Multimedia analytics workloads (speech recognition, ray tracing, image segmentation, face detection); 6. Bioinformatics workloads. Runs on Hadoop, DBMSs, NoSQL systems, Hive, Impala, HBase, MPI, Libc and other real-time analytics systems | Memory, CPU (MIPS, MPKI - misses per kilo instructions) |
Let's go through each of the benchmarks in detail.
Hadoop benchmarks and test tools:
The Hadoop source comes with a number of benchmarks. TestDFSIO, nnbench and mrbench are in the hadoop-*test*.jar file, and TeraGen, TeraSort and TeraValidate are in the hadoop-*examples*.jar file.
You can list them using the commands:
$ cd /usr/local/hadoop
$ bin/hadoop jar hadoop-*test*.jar
$ bin/hadoop jar hadoop-*examples*.jar
While running the benchmarks you might want to use the time command, which measures the elapsed time. This saves you the hassle of navigating to the Hadoop JobTracker interface. The relevant metric is the 'real' value in the first row of its output.
$ time hadoop jar hadoop-*examples*.jar ...
TeraGen, TeraSort and TeraValidate
This is the most well-known Hadoop benchmark. TeraSort aims to sort the data as fast as possible. The test suite exercises both the HDFS and MapReduce layers of a Hadoop cluster, and consists of 3 steps: generate input via TeraGen, run TeraSort on the input data, and validate the sorted output data via TeraValidate. We have a wiki page which explains this test suite; you can refer to the Hadoop Build Install And Run Guide.
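The three-step pipeline can be sketched in miniature as a single-process Python script. The function names mirror the Hadoop tools, but this is plain Python for illustration; the real benchmark runs each phase as a MapReduce job over HDFS.

```python
# Sketch of the TeraGen -> TeraSort -> TeraValidate pipeline, single process.
import random
import time

def teragen(n, seed=42):
    """Generate n pseudo-random 8-byte records (TeraGen's role)."""
    rng = random.Random(seed)
    return [rng.getrandbits(64).to_bytes(8, "big") for _ in range(n)]

def terasort(records):
    """Sort the records (TeraSort's role)."""
    return sorted(records)

def teravalidate(records):
    """Check the output is globally ordered (TeraValidate's role)."""
    return all(a <= b for a, b in zip(records, records[1:]))

data = teragen(200_000)
t0 = time.perf_counter()
out = terasort(data)
print(f"sorted {len(out)} records in {time.perf_counter() - t0:.3f} s")
print("valid:", teravalidate(out))
```

The validate step matters: a sort benchmark result only counts if the output is verifiably in order, which is exactly what TeraValidate enforces on the cluster.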
TestDFSIO
It is part of the hadoop-mapreduce-client-jobclient.jar file. It stress tests I/O performance (throughput and latency) on a clustered setup. The test shakes out the hardware, OS and Hadoop setup on your cluster machines (NameNode/DataNode). It runs as a MapReduce job using a 1:1 mapping (1 map per file), which makes it helpful for discovering performance bottlenecks in your network. The benchmark is a write test followed by a read test: use the -write switch for write tests and -read for read tests. The results are stored by default in TestDFSIO_results.log; you can pass the -resFile switch to choose a different file name.
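A rough local analogue of the numbers TestDFSIO reports (a write phase followed by a read phase, with throughput in MB/s for each) looks like the sketch below; it measures the local filesystem rather than HDFS, so the absolute numbers are not comparable to a cluster run.

```python
# Local write-then-read throughput sketch, loosely mirroring TestDFSIO's phases.
import os
import tempfile
import time

SIZE_MB = 64
payload = os.urandom(1024 * 1024)  # one 1 MB block of random data

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    t0 = time.perf_counter()
    for _ in range(SIZE_MB):
        f.write(payload)
    f.flush()
    os.fsync(f.fileno())  # make sure the data actually hit the disk
    write_mbps = SIZE_MB / (time.perf_counter() - t0)

t0 = time.perf_counter()
with open(path, "rb") as f:
    while f.read(1024 * 1024):  # read it back in 1 MB chunks
        pass
read_mbps = SIZE_MB / (time.perf_counter() - t0)
os.unlink(path)

print(f"write: {write_mbps:.1f} MB/s, read: {read_mbps:.1f} MB/s")
```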
MRBench - MR (MapReduce) Benchmark
The test loops a small job a number of times and checks whether small jobs are responsive and running efficiently on your cluster. It focuses on the MapReduce layer, as its impact on the HDFS layer is very limited. The earlier issue with multiple parallel MRBench runs has been resolved, so you can run it from different boxes.
Test command to run 50 small test jobs
$ hadoop jar hadoop-*test*.jar mrbench -numRuns 50
Example output, which shows that the job finished in about 31 seconds:
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 31414
NNBench - NN (NameNode) Benchmark for HDFS
This test is useful for load testing the NameNode hardware and configuration. It generates a lot of HDFS-related requests with normally very small payloads, putting high HDFS-management stress on the NameNode. The test can be run simultaneously from several machines, e.g. from a set of DataNode boxes, in order to hit the NameNode from multiple locations at the same time.
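The access pattern NNBench generates, many tiny metadata operations rather than bulk data, can be mimicked locally with a short sketch; ordinary file operations here stand in for the HDFS create/rename/delete requests that hammer the NameNode.

```python
# Metadata-heavy workload sketch: many small create/rename/delete operations,
# reported as operations per second (the same shape of load NNBench applies).
import os
import tempfile
import time

OPS = 2000
with tempfile.TemporaryDirectory() as d:
    t0 = time.perf_counter()
    for i in range(OPS):
        p = os.path.join(d, f"f{i}")
        with open(p, "w") as f:   # create, with a tiny payload
            f.write("x")
        os.rename(p, p + ".r")    # rename
        os.unlink(p + ".r")       # delete
    ops_per_sec = (OPS * 3) / (time.perf_counter() - t0)

print(f"{ops_per_sec:.0f} metadata ops/sec")
```

Note that the bottleneck in such a run is request handling, not bandwidth, which is why NNBench payloads are deliberately tiny.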
TPC benchmarks
The TPC is a non-profit, vendor-neutral organization with a reputation for providing the most credible performance results to the industry. The TPC plays the role of "consumer reports" for the computing industry: it provides a solid foundation for complete system-level performance, a methodology for calculating total system price and price-performance, and a methodology for measuring the energy efficiency of a complete system.
The TPC Express benchmark standard is easy to implement, run and publish, and less expensive. The test sponsor is required to use the TPCx-HS kit as it is provided. The vendor may choose an independent audit or a peer audit, for which a 60-day review/challenge window applies (as per TPC policy). An Express benchmark is approved by a super majority of the TPC General Council, and all publications must follow the TPC Fair Use Policy.
- TPC-H benchmark focuses on ad-hoc queries
The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.
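To give a flavor of the kind of ad-hoc query TPC-H times (a scan plus a join plus an aggregation), here is a minimal sketch using SQLite from the Python standard library. The table and column names are simplified stand-ins for illustration, not the official TPC-H schema.

```python
# Tiny scan+join+aggregate query in the spirit of a TPC-H workload.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, price REAL);
CREATE TABLE customers (cust_id INTEGER, region TEXT);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 1.0), (12, 2, 50.0);
""")

# Revenue per customer region: scan orders, join customers, aggregate.
rows = con.execute("""
SELECT c.region, SUM(o.price) AS revenue
FROM orders o JOIN customers c ON o.cust_id = c.cust_id
GROUP BY c.region ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('EU', 100.0), ('US', 50.0)]
```

TPC-H runs 22 queries of this general shape (at far larger scale factors) and folds the timings into the QphH@Size composite metric.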
- TPC-DS is the standard benchmark for decision support
The TPC Benchmark DS (TPC-DS) is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance. The benchmark provides a representative evaluation of performance as a general purpose decision support system. A benchmark result measures query response time in single user mode, query throughput in multi user mode and data maintenance performance for a given hardware, operating system, and data processing system configuration under a controlled, complex, multi-user decision support workload. The purpose of TPC benchmarks is to provide relevant, objective performance data to industry users. TPC-DS Version 2 enables emerging technologies, such as Big Data systems, to execute the benchmark.
- TPC-C is an On-Line Transaction Processing Benchmark
Approved in July of 1992, TPC Benchmark C is an on-line transaction processing (OLTP) benchmark. TPC-C is more complex than previous OLTP benchmarks such as TPC-A because of its multiple transaction types, more complex database and overall execution structure. TPC-C involves a mix of five concurrent transactions of different types and complexity, either executed on-line or queued for deferred execution. The database comprises nine types of tables with a wide range of record and population sizes. TPC-C is measured in transactions per minute (tpmC). While the benchmark portrays the activity of a wholesale supplier, TPC-C is not limited to the activity of any particular business segment, but rather represents any industry that must manage, sell, or distribute a product or service.
TPC vs SPEC models
Here is our comparison of the TPC and SPEC benchmark models:
| TPC model | SPEC model |
| --- | --- |
| Specification based | Kit based |
| Performance, price, energy in one benchmark | Performance and energy in separate benchmarks |
| Multiple tests (ACID, load) | Single test |
| Independent review | Peer review |
| Full disclosure | Summary disclosure |
| TPC Technology Conference | SPEC Research Group, ICPE (International Conference on Performance Engineering) |
BigBench
BigBench is a joint effort with partners in industry and academia to create a comprehensive and standardized BigData benchmark. One reference reading about BigBench is "Toward An Industry Standard Benchmark for BigData Analytics". BigBench builds upon and borrows elements from existing benchmarking efforts, such as TPCx-HS, GridMix, PigMix, HiBench, Big Data Benchmark, YCSB and TPC-DS. BigBench is a specification-based benchmark with an open-source reference implementation kit. As a specification-based benchmark, it is technology-agnostic and provides the necessary formalism and flexibility to support multiple implementations. It is focused on execution-time calculation and consists of around 30 queries/workloads (10 of them are from TPC). The drawback is that it is a structured-data-intensive benchmark.
Spark Bench for Apache Spark
We were able to build Spark Bench on ARM64. The setup is complete for a single node, but the run scripts are failing. When the Spark Bench examples are run, a KILL signal is observed which terminates all workers. This is still under investigation, as there are no useful logs to debug; the lack of a proper error description and of documentation is a challenge. A ticket has already been filed on the Spark Bench git repository and is still unresolved.
Hive TestBench
It is based on the TPC-H and TPC-DS benchmarks. You can experiment with Apache Hive at any data scale. The benchmark contains a data generator and a set of queries. This is very useful for testing basic Hive performance on large data sets. We have a wiki page for Hive TestBench.
GridMix
GridMix is a tool for benchmarking Hadoop clusters using a stripped-down version of common MapReduce jobs (sorting text data and SequenceFiles). It is a trace-based benchmark for MapReduce that evaluates MapReduce and HDFS performance. It submits a mix of synthetic jobs, modeling a profile mined from production loads; the benchmark attempts to model the resource profiles of production jobs to identify bottlenecks.
Basic command line usage:
$ hadoop gridmix [-generate <size>] [-users <users-list>] <iopath> <trace>
Con: it is challenging to explore the performance impact of combining or separating workloads, e.g., through consolidating from many clusters.
PigMix
PigMix is a set of queries used to test Pig performance. There are queries that test latency (how long does it take to run this query?) and queries that test scalability (how many fields or records can Pig handle before it fails?).
Usage: Run the below commands from pig home
$ ant -Dharness.hadoop.home=$HADOOP_HOME pigmix-deploy (generate the test dataset)
$ ant -Dharness.hadoop.home=$HADOOP_HOME pigmix (run the PigMix benchmark)
The documentation can be found in the Apache Pig docs: https://pig.apache.org/docs/
SWIM (Statistical Workload Injector for MapReduce)
SWIM enables rigorous performance measurement of MapReduce systems. It contains suites of workloads of thousands of jobs, with complex data, arrival, and computation patterns, and it informs highly targeted, workload-specific optimizations. The tool is highly recommended for MapReduce operators. Performance measurement: https://github.com/SWIMProjectUCB/SWIM/wiki/Performance-measurement-by-executing-synthetic-or-historical-workloads
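SWIM's core idea, mine a profile from a production trace and then synthesize a workload that matches it, can be sketched in a few lines. The trace numbers below are made up for the illustration; SWIM derives them from real historical job logs.

```python
# Sample synthetic jobs from a (hypothetical) mined trace profile, with
# replacement, so the synthetic mix statistically resembles production.
import random

# (input size in MB, runtime in seconds) pairs mined from a trace -- made up here.
trace = [(30, 2.0), (30, 2.1), (500, 9.5), (520, 10.0)]

rng = random.Random(0)
synthetic = [rng.choice(trace) for _ in range(1000)]  # synthesize 1000 jobs

avg_mb = sum(mb for mb, _ in synthetic) / len(synthetic)
avg_rt = sum(rt for _, rt in synthetic) / len(synthetic)
print(f"synthetic mix: avg input {avg_mb:.0f} MB, avg runtime {avg_rt:.1f} s")
```

SWIM goes much further (it also models inter-arrival times and shuffle/output ratios), but replay-by-sampling is the essential mechanism.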
AMPLab Big Data Benchmark
This is the Big Data Benchmark from AMPLab, UC Berkeley. It provides quantitative and qualitative comparisons of five systems:
- Redshift – a hosted MPP database offered by Amazon.com based on the ParAccel data warehouse
- Hive – a Hadoop-based data warehousing system
- Shark – a Hive-compatible SQL engine which runs on top of the Spark computing framework
- Impala – a Hive-compatible* SQL engine with its own MPP-like execution engine
- Stinger/Tez – Tez is a next generation Hadoop execution engine currently in development
This benchmark measures response time on a handful of relational queries: scans, aggregations, joins, and UDFs, across different data sizes.
BigDataBench
This is a specification-based benchmark with two key components: a data model specification and a workload/query specification. It is a comprehensive end-to-end big data benchmark suite. See the GitHub repository for BigDataBench.
BigDataBench is a benchmark suite for scale-out workloads, different from SPEC CPU (sequential workloads) and PARSEC (multithreaded workloads). Currently, it simulates five typical and important big data applications: search engine, social network, e-commerce, multimedia data analytics, and bioinformatics.
Currently, BigDataBench includes 15 real-world data sets, and 34 big data workloads.
HiBench
This benchmark test suite is for Hadoop. It contains 4 different categories of tests, 10 workloads and 3 types. It is one of the most widely used benchmarks, with metrics: time (sec) and throughput (bytes/sec).
- Hadoop built-in benchmarks: TeraSort, TestDFSIO, NNBench, MRBench
- Other benchmark suites: GridMix3, PigMix, HiBench, TPCx-HS, SWIM, AMPLab, BigBench
Industry Standard benchmarks
TPC - Transaction Processing Performance Council http://www.tpc.org
SPEC - The Standard Performance Evaluation Corporation https://www.spec.org
CLDS - Center for Large-scale Data Systems Research http://clds.sdsc.edu/bdbc