Apache Airflow for Data Science: Automating Machine Learning Pipelines Training Course
Apache Airflow is an open-source platform for orchestrating workflows and automating complex data pipelines.
This instructor-led, live training (online or onsite) is aimed at intermediate-level participants who wish to automate and manage machine learning workflows, including model training, validation, and deployment using Apache Airflow.
By the end of this training, participants will be able to:
- Set up Apache Airflow for machine learning workflow orchestration.
- Automate data preprocessing, model training, and validation tasks.
- Integrate Airflow with machine learning frameworks and tools.
- Deploy machine learning models using automated pipelines.
- Monitor and optimize machine learning workflows in production.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized version of this course, please contact us.
Course Outline
Introduction to Apache Airflow for Machine Learning
- Overview of Apache Airflow and its relevance to data science
- Key features for automating machine learning workflows
- Setting up Airflow for data science projects
Building Machine Learning Pipelines with Airflow
- Designing DAGs for end-to-end ML workflows
- Using operators for data ingestion, preprocessing, and feature engineering
- Scheduling and managing pipeline dependencies
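As a rough illustration of the dependency management this module covers, the sketch below orders the tasks of a hypothetical ML pipeline with Python's standard-library graphlib module (Python 3.9 or later). It is not Airflow code and is not taken from the course materials; the task names and the execution_order helper are assumptions chosen for illustration. It only shows the general idea that each task runs after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# These names are illustrative assumptions, not part of any Airflow API.
PIPELINE = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "feature_engineering": {"preprocess"},
    "train": {"feature_engineering"},
    "validate": {"train"},
}


def execution_order(dependencies):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE)
print(order)
```

If the dependencies contain a cycle, TopologicalSorter raises graphlib.CycleError, which mirrors the requirement that a workflow graph be acyclic.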
Model Training and Validation
- Automating model training tasks with Airflow
- Integrating Airflow with ML frameworks (e.g., TensorFlow, PyTorch)
- Validating models and storing evaluation metrics
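One way to picture the validation step is as a gate that computes a metric, compares it with a threshold, and records the result for later inspection. The sketch below is a minimal plain-Python version under assumed names (accuracy, validation_report, and the 0.8 threshold are all hypothetical); in a real pipeline a function like this might be the body of a task, with the report written to a metrics store.

```python
import json


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def validation_report(y_true, y_pred, threshold=0.8):
    """Build a JSON-serializable record of the metric and the gate decision."""
    acc = accuracy(y_true, y_pred)
    return {"accuracy": acc, "threshold": threshold, "passed": acc >= threshold}


# Illustrative labels only; not real evaluation data.
report = validation_report([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])
print(json.dumps(report))
```

Downstream steps such as deployment can then be conditioned on the "passed" field.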
Model Deployment and Monitoring
- Deploying machine learning models using automated pipelines
- Monitoring deployed models with Airflow tasks
- Handling retraining and model updates
Advanced Customization and Integration
- Developing custom operators for ML-specific tasks
- Integrating Airflow with cloud platforms and ML services
- Extending Airflow workflows with plugins and sensors
Optimizing and Scaling ML Pipelines
- Improving workflow performance for large-scale data
- Scaling Airflow deployments with Celery and Kubernetes
- Best practices for production-grade ML workflows
Case Studies and Practical Applications
- Real-world examples of ML automation using Airflow
- Hands-on exercise: Building an end-to-end ML pipeline
- Discussion of challenges and solutions in ML workflow management
Summary and Next Steps
Requirements
- Familiarity with machine learning workflows and concepts
- Basic understanding of Apache Airflow, including DAGs and operators
- Proficiency in Python programming
Audience
- Data scientists
- Machine learning engineers
- AI developers
Related Courses
AdaBoost Python for Machine Learning
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at data scientists and software engineers who wish to use AdaBoost to build boosting algorithms for machine learning with Python.
By the end of this training, participants will be able to:
- Set up the necessary development environment to start building machine learning models with AdaBoost.
- Understand the ensemble learning approach and how to implement adaptive boosting.
- Learn how to build AdaBoost models to boost machine learning algorithms in Python.
- Use hyperparameter tuning to increase the accuracy and performance of AdaBoost models.
AlphaFold: AI-Driven Protein Structure Prediction and Interpretation
7 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at biologists who wish to understand how AlphaFold works and use AlphaFold models as guides in their experimental studies.
By the end of this training, participants will be able to:
- Understand the basic principles of AlphaFold.
- Learn how AlphaFold works.
- Learn how to interpret AlphaFold predictions and results.
Anaconda Ecosystem for Data Scientists
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at data scientists who wish to use the Anaconda ecosystem to capture, manage, and deploy packages and data analysis workflows in a single platform.
By the end of this training, participants will be able to:
- Install and configure Anaconda components and libraries.
- Understand the core concepts, features, and benefits of Anaconda.
- Manage packages, environments, and channels using Anaconda Navigator.
- Use Conda, R, and Python packages for data science and machine learning.
- Get to know some practical use cases and techniques for managing multiple data environments.
Creating Custom Chatbots with Google AutoML
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at participants with varying levels of expertise who wish to leverage Google's AutoML platform to build customized chatbots for various applications.
By the end of this training, participants will be able to:
- Understand the fundamentals of chatbot development.
- Navigate the Google Cloud Platform and access AutoML.
- Prepare data for training chatbot models.
- Train and evaluate custom chatbot models using AutoML.
- Deploy and integrate chatbots into various platforms and channels.
- Monitor and optimize chatbot performance over time.
Pattern Recognition
21 Hours
This instructor-led, live training in Macao (online or onsite) provides an introduction to the field of pattern recognition and machine learning. It touches on practical applications in statistics, computer science, signal processing, computer vision, data mining, and bioinformatics.
By the end of this training, participants will be able to:
- Apply core statistical methods to pattern recognition.
- Use key models like neural networks and kernel methods for data analysis.
- Implement advanced techniques for complex problem-solving.
- Improve prediction accuracy by combining different models.
DataRobot
7 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at data scientists and data analysts who wish to automate, evaluate, and manage predictive models using DataRobot's machine learning capabilities.
By the end of this training, participants will be able to:
- Load datasets in DataRobot to analyze, assess, and quality check data.
- Build and train models to identify important variables and meet prediction targets.
- Interpret models to create valuable insights that are useful in making business decisions.
- Monitor and manage models to maintain an optimized prediction performance.
Edge AI with TensorFlow Lite
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at intermediate-level developers, data scientists, and AI practitioners who wish to leverage TensorFlow Lite for Edge AI applications.
By the end of this training, participants will be able to:
- Understand the fundamentals of TensorFlow Lite and its role in Edge AI.
- Develop and optimize AI models using TensorFlow Lite.
- Deploy TensorFlow Lite models on various edge devices.
- Utilize tools and techniques for model conversion and optimization.
- Implement practical Edge AI applications using TensorFlow Lite.
Google Cloud AutoML
7 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at data scientists, data analysts, and developers who wish to explore AutoML products and features to create and deploy custom ML training models with minimal effort.
By the end of this training, participants will be able to:
- Explore the AutoML product line to implement different services for various data types.
- Prepare and label datasets to create custom ML models.
- Train and manage models to produce accurate and fair machine learning models.
- Make predictions using trained models to meet business objectives and needs.
Kaggle
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at data scientists and developers who wish to learn and build their careers in Data Science using Kaggle.
By the end of this training, participants will be able to:
- Learn about data science and machine learning.
- Explore data analytics.
- Learn about Kaggle and how it works.
Kubeflow Essentials: Build, Train & Serve with Kubernetes
14 Hours
Kubeflow is an open-source platform designed to streamline building, training, and deploying machine learning workloads on Kubernetes.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level professionals who wish to build reliable ML workflows using Kubeflow.
Upon completion of this training, attendees will gain the skills to:
- Navigate the Kubeflow ecosystem and core components.
- Build reproducible workflows with Kubeflow Pipelines.
- Run scalable training jobs on Kubernetes.
- Serve machine learning models efficiently using Kubeflow Serving.
Format of the Course
- Guided presentations and collaborative discussions.
- Hands-on labs with real Kubeflow components.
- Practical exercises to build end-to-end ML workflows.
Course Customization Options
- Customized versions of this training can be arranged to align with your team’s technology stack and project requirements.
Kubeflow Fundamentals
28 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at developers and data scientists who wish to build, deploy, and manage machine learning workflows on Kubernetes.
By the end of this training, participants will be able to:
- Install and configure Kubeflow on premises and in the cloud.
- Build, deploy, and manage ML workflows based on Docker containers and Kubernetes.
- Run entire machine learning pipelines on diverse architectures and cloud environments.
- Use Kubeflow to spawn and manage Jupyter notebooks.
- Build ML training, hyperparameter tuning, and serving workloads across multiple platforms.
Machine Learning for Mobile Apps using Google’s ML Kit
14 Hours
This instructor-led, live training (online or onsite) is aimed at developers who wish to use Google’s ML Kit to build machine learning models that are optimized for processing on mobile devices.
By the end of this training, participants will be able to:
- Set up the necessary development environment to start developing machine learning features for mobile apps.
- Integrate new machine learning technologies into Android and iOS apps using the ML Kit APIs.
- Enhance and optimize existing apps using the ML Kit SDK for on-device processing and deployment.
Machine Learning with Random Forest
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at data scientists and software engineers who wish to use Random Forest to build machine learning algorithms for large datasets.
By the end of this training, participants will be able to:
- Set up the necessary development environment to start building machine learning models with Random Forest.
- Understand the advantages of Random Forest and how to implement it to resolve classification and regression problems.
- Learn how to handle large datasets and interpret multiple decision trees in Random Forest.
- Evaluate and optimize machine learning model performance by tuning the hyperparameters.
Advanced Analytics with RapidMiner
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at intermediate-level data analysts who wish to learn how to use RapidMiner to estimate and project values and utilize analytical tools for time series forecasting.
By the end of this training, participants will be able to:
- Learn to apply the CRISP-DM methodology, select appropriate machine learning algorithms, and enhance model construction and performance.
- Use RapidMiner to estimate and project values, and utilize analytical tools for time series forecasting.
GPU Data Science with NVIDIA RAPIDS
14 Hours
This instructor-led, live training in Macao (online or onsite) is aimed at data scientists and developers who wish to use RAPIDS to build GPU-accelerated data pipelines, workflows, and visualizations, and to apply machine learning algorithms such as XGBoost and cuML.
By the end of this training, participants will be able to:
- Set up the necessary development environment to build data models with NVIDIA RAPIDS.
- Understand the features, components, and advantages of RAPIDS.
- Leverage GPUs to accelerate end-to-end data and analytics pipelines.
- Implement GPU-accelerated data preparation and ETL with cuDF and Apache Arrow.
- Learn how to perform machine learning tasks with XGBoost and cuML algorithms.
- Build data visualizations and execute graph analysis with cuXfilter and cuGraph.