Distributed system performance benchmarks are an important source of information for decision makers who must choose the right technology for their next compute- or data-intensive problem. Because important decisions rely on trustworthy experimental data, a benchmark of Apache Hama against other available systems could draw additional attention from the big data community. Having worked on Hama last year, I know its capabilities and where it stands in comparison to other in-memory distributed systems. Yet the lack of experimental results makes it hard to convince data engineers to move to Hama. So for this year's GSoC, I plan to benchmark Hama's performance against Apache Spark and Apache Flink. In addition, I will work on several Jira tickets that have been open or pending for quite some time.




  • Edward J. Yoon