Apache Spark 3.2 Release: Main Features and What's New for Spark-on-Kubernetes
Apache Spark 3.2 is now released and available on our platform. Spark 3.2 bundles Hadoop 3.3.1, Koalas (for Pandas users) and RocksDB (for Streaming users). For Spark-on-Kubernetes users, Persistent Volume Claims (k8s volumes) can now "survive the death" of their Spark executor and be recovered by Spark, preventing the loss of precious shuffle files!
October 26, 2021
Tutorial: Run your R (SparklyR) workloads at scale with Spark-on-Kubernetes
A step-by-step tutorial to help you run R applications with Spark on a Kubernetes cluster using the SparklyR library. We'll go through building a compatible Docker image, building the code of the SparklyR application itself, and deploying it on Data Mechanics.
November 24, 2021
Tutorial: Running PySpark inside Docker containers
In this tutorial, we'll show you how to build your first PySpark application from scratch and run it inside a Docker container. We'll also show you how to install libraries (like koalas) and write to a data sink (a Postgres database).
October 12, 2021
How the United Nations Modernized their Maritime Traffic Data Exploration while cutting costs by 70%
By migrating from HBase and EMR to the Data Mechanics platform, the United Nations reduced their costs by 70% while improving their team productivity and development experience.
September 21, 2021
Delight: The New & Improved Spark UI & Spark History Server is now Generally Available
We're releasing Delight: our free, hosted & cross-platform monitoring dashboard for Apache Spark. It's a great complement to the Spark UI and Spark History Server to help you understand and improve the performance of your Spark applications. It's easy to install Delight on top of any Spark platform - including Databricks, EMR, Dataproc, and many others.
April 27, 2021
Our Optimized Spark Docker Images Are Now Available
Today we’re excited to publicly release our optimized Docker images for Apache Spark. They can be freely downloaded from our DockerHub repository, whether you’re a Data Mechanics customer or not. These images come with many built-in connectors to common data sources (S3, GCS, Azure Data Lake, Snowflake, Delta Lake), as well as support for Python, Scala, Java... and Spark, of course!
April 20, 2021
Migrating from EMR to Spark on Kubernetes with Data Mechanics
Customer Story: Lingk is a data integration platform powered by Apache Spark. AWS EMR was getting hard to manage and expensive. By migrating to Spark on Kubernetes with Data Mechanics, Lingk now enjoys ~2x faster Spark applications, their AWS bill has decreased by 65%, and their developers can now "achieve the plans they dream about".
April 6, 2021
Apache Spark 3.1 Release: Spark on Kubernetes is now Generally Available
With the Apache Spark 3.1 release in March 2021, the Spark on Kubernetes project is now officially declared as production-ready and Generally Available. This is the achievement of 3 years of booming community contribution and adoption of the project - since initial support for Spark-on-Kubernetes was added in Spark 2.3 (February 2018). In this article, we will go over the main features of Spark 3.1, with a special focus on the improvements to Kubernetes.
March 8, 2021
Cost-Effective Weather Analytics At Scale with Cloud-Native Apache Spark
Customer Story: Weather2020 is a predictive weather analytics company. In 3 weeks, their data engineering team built Apache Spark pipelines ingesting terabytes of weather data to power their core product. Data Mechanics performance optimizations and pricing model lowered their costs by 60% compared to Databricks, the main alternative they considered.
January 13, 2021
Data + AI Summit Europe 2020 Highlights
Data + AI Summit 2020 Highlights: What’s new for the Apache Spark community? In this article, we’ll go over the highlights of the conference, focusing on the new developments which were recently added to Apache Spark or are expected in the coming months: Spark on Kubernetes, Koalas, Project Zen.
November 24, 2020
Released: Free Cross-platform Spark UI & Spark History Server
Today we’re releasing a web-based Spark UI and Spark History Server which work on top of any Spark platform, whether it’s on-premise or in the cloud, over Kubernetes or YARN, with a commercial service or using open-source Apache Spark. This is our first step towards building Data Mechanics Delight - the new and improved Spark UI.
November 16, 2020
Spark on Kubernetes Made Easy: How Data Mechanics Improves On The Open-Source Version
How is Data Mechanics different from running open-source Spark on Kubernetes? In this article, we explain how our platform extends and improves on Spark on Kubernetes to make it easy-to-use, flexible, and cost-effective. We'll go over our intuitive user interfaces, dynamic optimizations, and custom integrations.
November 10, 2020
How to be successful with Apache Spark in 2021
Apache Spark is the leading technology for data engineering at scale. But making Spark easy-to-use, stable, and cost-efficient remains challenging. In this article, the AI & Data consulting firm Quantmetry and Data Mechanics team up to give you their best practices to ensure you're successful with Spark in 2021.
November 2, 2020
Spark and Docker: Your Spark development cycle just got 10x faster!
Native support for Docker is in fact one of the main reasons companies choose to deploy Spark on top of Kubernetes instead of YARN. In this article, we will illustrate the benefits of Docker for Apache Spark by going through the end-to-end development cycle used by many of our users at Data Mechanics.
October 13, 2020
Setting up, Managing & Monitoring Spark on Kubernetes
Earlier this year at Spark + AI Summit, we went over the best practices and pitfalls of running Apache Spark on Kubernetes. We’d like to expand on that and give you a comprehensive overview of how you can get started with Spark on k8s, optimize performance & costs, and monitor your Spark applications, along with a look at the future of Spark on k8s!
September 22, 2021
How We Built A Serverless Spark Platform On Kubernetes - Video Tour Of Data Mechanics
In this video, we give you a product tour of our serverless Spark platform and its core features: connecting a Jupyter notebook, submitting apps programmatically, monitoring their logs and metrics, tracking their costs and performance over time.
September 8, 2020
Apache Spark Performance Benchmarks show Kubernetes has caught up with YARN
Apache Spark on Kubernetes is as performant as Spark on YARN, including during shuffle stages. This article presents the benchmark results and gives critical performance tips for Spark on Kubernetes.
July 6, 2020
Data Mechanics Delight - We're building a better Spark UI
Data Mechanics Delight is a Spark UI and Spark History Server replacement with new metrics and visualizations that will delight you. It works on top of any Spark platform and is entirely free.
June 23, 2020
Our Experience Going Through YCombinator
What is YCombinator like? What did we get out of it? The founders tell their story.
June 3, 2020
The Pros and Cons of Running Apache Spark on Kubernetes
Support for deploying Spark on top of Kubernetes (instead of YARN, Mesos, Standalone) was added in Spark 2.3. What are the main benefits and drawbacks? Should you get started?
May 26, 2020
Introducing Data Mechanics
We're proud to let you know about what we've been working on. Our mission, our vision, and our recipe for building the data platform of tomorrow.
April 16, 2020