What is the state of data infra? Are Snowflake and Databricks still fighting over total cost of ownership?

Our team is putting together an all-day event focused on helping answer some of these questions. We will have 10 great speakers providing their advice on data strategy, infrastructure, self-service analytics and more. Here are just a few of the talks we will have on January 18th, 2023.

Building field-level lineage for modern data systems

Lineage is a critical component of any root cause, impact analysis, and overall analytics health assessment workflow. But it hasn't always been easy to create, particularly at the field level. In this session, Mei Tao, Helena Muñoz, and Xuanzi Han (Monte Carlo) tackle this challenge head-on by leveraging some of the most popular tools in the modern data stack, including dbt, Airflow, Snowflake, and ANother Tool for Language Recognition (ANTLR). Learn how they designed the data model, query parser, and larger database design for field-level lineage, highlighting learnings, wrong turns, and best practices developed along the way.

Mei Tao: Mei Tao is a product manager at Monte Carlo, a data reliability company. Prior to joining Monte Carlo, Mei worked in product management at NEXT Trucking and product strategy at Coinbase. She received her MBA from Harvard Business School and her B.S. in Statistics from the University of California, Berkeley.

Helena Muñoz: Helena Muñoz is a senior front-end engineer at Monte Carlo. Previously, she served as senior software engineer at Portchain and a team lead at Infragistics.

Xuanzi Han: Xuanzi Han is a senior back-end software engineer at Monte Carlo. Previously, she worked on Uber's Marketplace Intelligence team building the systems to deploy the global ridesharing platform at scale.
Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014 as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface. From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.

Airflow is written in Python, and workflows are created via Python scripts. Airflow is designed under the principle of "configuration as code". While other "configuration as code" workflow platforms exist using markup languages like XML, using Python allows developers to import libraries and classes to help them create their workflows.

Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and dependencies are defined in Python and then Airflow manages the scheduling and execution. DAGs can be run either on a defined schedule (e.g. hourly or daily) or based on external event triggers. Previous DAG-based schedulers like Oozie and Azkaban tended to rely on multiple configuration files and file system trees to create a DAG, whereas in Airflow, DAGs can often be written in one Python file.

Three notable providers offer ancillary services around the core open source project. Astronomer has built a SaaS tool and Kubernetes-deployable Airflow stack that assists with monitoring, alerting, devops, and cluster management. Cloud Composer is a managed version of Airflow that runs on Google Cloud Platform (GCP) and integrates well with other GCP services. Starting from November 2020, Amazon Web Services offers Managed Workflows for Apache Airflow.
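To make the DAG idea above concrete, here is a minimal sketch in plain Python of tasks and dependencies being declared in one file and then executed in dependency order. It does not use Airflow's API: the task names (`extract`, `transform`, `load`) and the `run_dag` helper are hypothetical, and the ordering comes from the standard library's `graphlib` module, which stands in for a real scheduler.

```python
from graphlib import TopologicalSorter


# Hypothetical tasks: plain callables that read upstream results from a dict.
def extract(results):
    return [3, 1, 2]


def transform(results):
    return sorted(results["extract"])


def load(results):
    return f"loaded {len(results['transform'])} rows"


TASKS = {"extract": extract, "transform": transform, "load": load}

# Map each task to the set of upstream tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_dag(tasks, dependencies):
    """Run each task after its upstream tasks; return (order, results)."""
    order = []
    results = {}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


if __name__ == "__main__":
    order, results = run_dag(TASKS, DEPENDENCIES)
    print(order)
    print(results["load"])
```

A real scheduler also handles timing, retries, and distributed execution, which this sketch leaves out. What it does show is why the graph has to be acyclic: a cycle would leave no valid order in which to run the tasks.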