[DSA-C01] SnowPro Advanced: Data Scientist Exam Guide

The SnowPro Advanced: Data Scientist Certification exam will test advanced knowledge and skills used to apply comprehensive data science principles, tools, and methodologies using Snowflake.

This certification will test the ability to:
- Outline data science concepts
- Implement Snowflake data science best practices
- Prepare data and perform feature engineering in Snowflake
- Train and use machine learning models
- Use data visualization to present a business case (e.g., model explainability)
- Implement model lifecycle management

SnowPro™ Advanced: Data Scientist Certification Candidate

2+ years of practical data science experience with Snowflake, in an enterprise environment.
In addition, successful candidates may have:

  • A statistical, mathematical, or science education (or equivalent work experience)
  • Background working with one or more programming languages (e.g., Python, R, SQL, PySpark)
  • Experience modeling and using machine learning platforms (e.g., SageMaker, Azure Machine Learning, GCP AI platform, AutoML tools)
  • An understanding of various open source and commercial frameworks and libraries (e.g., scikit-learn, TensorFlow)
  • Experience preparing, cleaning, and transforming data sets from multiple sources
  • Experience creating features for machine learning training
  • Experience validating and interpreting models
  • Experience putting a model into production and monitoring the model in production

Target Audience:
  • Data Scientists
  • AI/ML Engineers
  • Quantitative Researchers

Exam Format

Exam Version: DSA-C01
Total Number of Questions: 65
Question Types: Multiple Select, Multiple Choice
Time Limit: 115 minutes
Languages: English
Registration Fee: $375 USD
Passing Score: 750+ (scaled scoring from 0 to 1000)
Unscored Content: Exams may include unscored items to gather statistical information. These items are not identified on the form and do not affect your score, and additional time is factored in to account for this content.
Prerequisites: SnowPro Core Certified
Delivery Options:

  • Online Proctoring
  • Onsite Testing Centers

Find more about registration details here.

Exam Domain Breakdown

This exam guide includes test domains, weightings, and objectives. It is not a comprehensive listing of all the content that will be presented on this examination. The table below lists the main content domains and their weighting ranges.

1.0 Domain: Data Science Concepts

1.1 Define machine learning concepts for data science workloads.

  • Artificial intelligence
  • Machine Learning
    • Supervised learning
    • Unsupervised learning
    • Reinforcement learning
    • Deep learning

1.2 Outline machine learning problem types.

  • Supervised Learning
    • Structured Data
    • Unstructured Data
  • Unsupervised Learning
    • Clustering

1.3 Summarize the machine learning lifecycle.

  • Data Collection
  • Data Visualization and Exploration
  • Feature engineering
  • Training models
  • Model deployment
  • Model monitoring and evaluation

1.4 Outline data governance for data science.

  • Dynamic data masking
  • Row level security
  • Role Based Access Control (RBAC)

1.5 Outline statistical concepts for data science.

  • Normal distribution
  • Central limit theorem
  • Z-tests and t-tests
  • Bootstrapping
  • Confidence intervals
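
As an illustration of how bootstrapping and confidence intervals fit together, the following is a minimal sketch of a percentile bootstrap interval for a sample mean. It is not taken from the exam guide; the function name `bootstrap_ci`, its parameters, and the sample data are hypothetical, and only the Python standard library is assumed.

```python
import random
import statistics


def bootstrap_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Read off the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up measurements, used only to exercise the function.
data = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7, 10.1, 12.4]
low, high = bootstrap_ci(data)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to read.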

1.6 Define model governance for data science.

  • Model versioning
  • Lineage
  • Model explainability

2.0 Domain: Data Pipelining

2.1 Source and collect data into Snowflake from multiple sources.

  • Data loading
    • Batch loading
    • Snowpipe
    • External tables
    • Materialized views
    • Streams
    • Tasks
  • Connecting ETL tools (Connectors)

2.2 Enrich data by consuming data sharing sources.

  • Snowflake Data Marketplace
  • Direct Sharing
  • Shared database considerations

2.3 Create a development environment (e.g., sandbox) and maintain the environment.

  • Cloning
  • Levels or hierarchy
  • Automation to keep dataset updated
  • Time Travel

2.4 Build a data science pipeline.

  • Automation of data transformation
  • Streams and tasks
  • Functions
  • Stored procedures
  • Connect Snowflake to machine learning platforms (e.g., connectors, ML partners, etc.)

3.0 Domain: Data Preparation and Feature Engineering

3.1 Prepare and clean the data for analysis in Snowflake.

  • Use Snowflake native functions, SQL, and Snowpark
    • Aggregate
    • Joins
    • Common Table Expressions (CTEs)
    • Identify critical data
    • Remove duplicates
    • Remove irrelevant fields
    • Handle missing values
    • Data type casting
    • Sampling data
    • Tasks to automate repeatable jobs
    • Stored procedures

3.2 Perform feature engineering on Snowflake data.

  • Preprocessing
  • Data transformations
    • Data Frames
    • Snowpark
  • Binarizing data
    • Binning continuous data into intervals
    • Label encoding
    • One hot encoding
  • Time Travel
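
The encoding techniques named above (binning, label encoding, one hot encoding) can be sketched in a few lines. This is an illustrative example in plain Python rather than Snowpark or SQL; the helper names, bin edges, and sample values are all hypothetical.

```python
from bisect import bisect_right


def bin_value(x, edges):
    """Return the index of the interval that x falls into, given sorted edges."""
    return bisect_right(edges, x)


def label_encode(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 indicator column per distinct category, in sorted order."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Made-up feature values for illustration.
ages = [17, 25, 42, 67]
age_bins = [bin_value(a, [18, 40, 65]) for a in ages]

plans = ["basic", "pro", "basic", "enterprise"]
plan_codes, plan_mapping = label_encode(plans)
plan_onehot, plan_categories = one_hot_encode(plans)

print(age_bins)     # [0, 1, 2, 3]
print(plan_codes)   # [0, 2, 0, 1]
print(plan_onehot)  # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Label encoding implies an ordering between categories, so one hot encoding is usually the safer choice for nominal features fed to linear models.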

3.3 Perform exploratory data analysis in Snowflake.

  • Snowsight and SQL
    • Identify initial patterns
    • Connect external machine learning platforms and/or notebooks (e.g., Jupyter)
  • Use Snowflake native statistical functions to analyze and calculate descriptive data statistics.
    • Window Functions
    • TOPN
    • Approximation/High Performing functions
  • Linear Regression
    • Find the slope and intercept
    • Verify the dependencies on dependent and independent variables
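
Finding the slope and intercept of a simple linear regression comes down to two sums. The sketch below uses ordinary least squares in plain Python; the function name `fit_line` and the data are hypothetical. Snowflake SQL offers the `REGR_SLOPE` and `REGR_INTERCEPT` aggregate functions for the same calculation, which this example does not call.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```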

3.4 Visualize and interpret the data to present a business case.

  • Statistical summaries
    • Snowsight
    • Snowflake SQL
    • Functions
  • Charts and graphs
    • Identify data outliers

4.0 Domain: Model Development

4.1 Connect data science tools directly to data in Snowflake.

  • Connectors
    • Python connector with pandas support
    • Spark connector
    • R connector
  • Snowflake Best Practices
    • Query rewrite
    • One platform, one copy of data, many workloads
    • Enrich datasets using the data marketplace
    • Streams and Tasks
    • External tables
    • External functions to trigger training
    • Zero-copy cloning for training snapshots
    • Materialized views for training and prediction
    • Snowflake SQL for aggregation and sampling

4.2 Train a data science model.

  • Hyperparameter tuning
  • Optimization metric selection
  • Partitioning
    • Cross validation
    • Train, validation, hold out
  • Down/Up-sampling
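
The partitioning schemes listed above can be illustrated with a short sketch: a shuffled train / validation / hold-out split, and index generation for k-fold cross validation. The function names, default fractions, and seed are hypothetical choices for illustration, not recommendations from the exam guide.

```python
import random


def train_val_holdout_split(rows, val_frac=0.2, holdout_frac=0.2, seed=42):
    """Shuffle rows and split them into train, validation, and hold-out sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_frac)
    n_val = round(len(shuffled) * val_frac)
    holdout = shuffled[:n_holdout]
    val = shuffled[n_holdout:n_holdout + n_val]
    train = shuffled[n_holdout + n_val:]
    return train, val, holdout


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, roughly equal folds."""
    indices = list(range(n))
    # The first n % k folds get one extra element.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


rows = list(range(100))
train, val, holdout = train_val_holdout_split(rows)
print(len(train), len(val), len(holdout))  # 60 20 20

folds = list(k_fold_indices(10, 3))
print([test for _, test in folds])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The hold-out set is carved off first so that it is never touched during tuning; the k-fold helper assumes the rows have already been shuffled.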

4.3 Validate a data science model.

  • ROC curve/confusion matrix
    • Calculate the expected payout of the model
  • Regression problems
  • Residuals plot
    • Interpret graphics with context
  • Model metrics

4.4 Interpret a model.

  • Feature impact
  • Partial dependence plots
  • Confidence intervals

5.0 Domain: Model Deployment

5.1 Move a data science model into production.

  • Deploy an external hosted model
    • External functions
    • Pre-generated models from third-party vendors
  • Deploy a model in Snowflake
    • Java User Defined Functions (UDFs)
    • Pre-generated models from third-party vendors

5.2 Score the effectiveness of a model and retrain if necessary.

  • Metrics for model evaluation
    • Data drift / model decay
  • External functions
  • User defined functions (UDFs)
  • Storing predictions
    • Stage commands
  • Use Snowsight to do distribution comparison
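
A distribution comparison for detecting data drift can also be expressed numerically. The sketch below uses the Population Stability Index (PSI), a common drift metric that the exam guide does not name; it is chosen here as an assumption, and the function name, bin edges, and data are hypothetical.

```python
import math
from bisect import bisect_right


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges."""

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up model scores: a training-time baseline and a shifted recent window.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
recent = [0.5, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]
edges = [0.33, 0.66]

no_drift = psi(baseline, baseline, edges)
drift = psi(baseline, recent, edges)
print(no_drift, drift)
```

A PSI of zero means the binned distributions match; larger values indicate a bigger shift, and the retraining threshold is a project-level decision.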

5.3 Outline model lifecycle and validation tools.

  • Streams and Tasks
  • Metadata tagging
  • Partner model versioning
    • Source control
    • Git workflow
  • Automation of model retraining

Recommended Training

We recommend that individuals have at least 2+ years of hands-on Snowflake Practitioner experience in a Data Scientist role before attempting this exam. The exam assesses skills through scenario-based questions and real-world examples. To prepare, we recommend a combination of hands-on experience, instructor-led training, and self-study assets.

Instructor-Led Course recommended for this exam:
Snowflake Data Scientist Training

Free Self Study recommended for this exam:
SnowPro Advanced: Data Scientist Study Guide