DOE Guide¶

Design of Experiments (DOE) is a systematic approach to understanding how different factors affect a response variable. Instead of changing one factor at a time, DOE allows you to study multiple factors simultaneously, revealing important interactions while minimizing experimental effort.

When Should You Use DOE?¶

Process Optimization: Finding the best settings for your manufacturing process
Product Development: Understanding how design parameters affect performance
Quality Improvement: Identifying factors that impact product quality
Screening Studies: Determining which factors are most important among many candidates
Cost Reduction: Getting maximum information from minimum experimental effort

Basic DOE Workflow¶

1. Problem Definition¶

Start by clearly defining:

Response variable: What you want to improve or understand
Factors: Variables you can control that might affect the response
Objectives: Are you screening, optimizing, or characterizing?
Resources: How many experimental runs can you afford?
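
The resources question can be answered with simple arithmetic before any design is built. The following is a minimal sketch in plain Python, not part of daspi; the helper names `full_factorial_runs` and `fractional_runs` are hypothetical, and it assumes center points are added once rather than once per replicate (a library may count them differently).

```python
from math import prod

def full_factorial_runs(levels_per_factor, replicates=1, center_points=0):
    """Runs = (product of level counts) * replicates + center points."""
    return prod(levels_per_factor) * replicates + center_points

def fractional_runs(n_factors, n_generators, replicates=1, center_points=0):
    """Runs of a two-level 2^(k-p) fractional factorial design."""
    return 2 ** (n_factors - n_generators) * replicates + center_points

budget = 20  # hypothetical number of runs you can afford

candidates = {
    "2^3 full factorial, 2 replicates": full_factorial_runs([2, 2, 2], replicates=2),
    "2^3 full factorial, 2 replicates, 4 center points": full_factorial_runs([2, 2, 2], 2, 4),
    "3 x 2 x 2 full factorial, 2 replicates": full_factorial_runs([3, 2, 2], replicates=2),
    "2^(5-2) fractional factorial, 2 replicates": fractional_runs(5, 2, replicates=2),
}

# Keep only the designs that fit the budget.
affordable = {name: runs for name, runs in candidates.items() if runs <= budget}

for name, runs in candidates.items():
    status = "fits" if name in affordable else "too large"
    print(f"{name}: {runs} runs ({status})")
```

Comparing candidate designs this way makes the trade-off between replication, center points, and the number of factors explicit before you commit to a design.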

2. Factor Selection and Levels¶

import daspi as dsp

# Continuous factors (temperature, pressure, concentration)
temperature = dsp.Factor('Temperature', (150, 200))
pressure = dsp.Factor('Pressure', (10, 15))
time = dsp.Factor('Time', (30, 60))

# Categorical factors (machine, operator, material)
machine = dsp.Factor('Machine', ('A', 'B', 'C'), is_categorical=True)
operator = dsp.Factor('Operator', ('Day', 'Night'), is_categorical=True)
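
Two-level designs are usually analyzed in coded units, where the low level of a continuous factor maps to -1 and the high level to +1. The sketch below shows that conversion in plain Python; the helper names `to_coded` and `to_natural` are hypothetical and are not daspi functions.

```python
def to_coded(value, low, high):
    """Map a natural value onto the coded scale where low -> -1 and high -> +1."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

def to_natural(coded, low, high):
    """Inverse of to_coded: map a coded value back to natural units."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

# Temperature levels from the example above: 150 (low) and 200 (high)
print(to_coded(150, 150, 200))  # -1.0
print(to_coded(200, 150, 200))  # 1.0
print(to_coded(175, 150, 200))  # 0.0, the center point
print(to_natural(0, 30, 60))    # 45.0, the center of the Time range
```

Coded units put all continuous factors on the same scale, so effect sizes can be compared directly even when the natural units differ.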

3. Choose Design Type¶

The choice depends on your objectives and resources:

# Full factorial: Complete information, higher cost
builder = dsp.FullFactorialDesignBuilder(temperature, pressure, time)

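# Fractional factorial: Screening many factors, lower cost
# (concentration and catalyst are additional dsp.Factor objects, defined like the factors above)
builder = dsp.FractionalFactorialDesignBuilder(
    temperature, pressure, time, concentration, catalyst,
    generators=['D=AB', 'E=AC']  # 2^(5-2): 5 factors in 8 runs
)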

# 2^k specialized: Most common for industrial experiments
builder = dsp.FullFactorial2kDesignBuilder(temperature, pressure, time)
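
To see what these design types contain, it can help to build the coded design matrices by hand. This is an illustrative sketch in plain Python that does not use or reproduce daspi's builders; `full_factorial_2k` and `add_generated` are hypothetical helpers, and the generators are passed as a dictionary instead of the `'D=AB'` strings used above.

```python
from itertools import product

def full_factorial_2k(names):
    """All -1/+1 combinations in standard order (first factor alternates fastest)."""
    runs = []
    for levels in product((-1, 1), repeat=len(names)):
        # product() varies the last position fastest, so reverse each tuple
        runs.append(dict(zip(names, reversed(levels))))
    return runs

def add_generated(runs, generators):
    """Add one column per generator, e.g. {'D': 'AB'} sets D = A * B in every run."""
    extended = []
    for run in runs:
        run = dict(run)
        for new_factor, word in generators.items():
            value = 1
            for letter in word:
                value *= run[letter]
            run[new_factor] = value
        extended.append(run)
    return extended

full = full_factorial_2k("ABCDE")                      # 2^5 full factorial
base = full_factorial_2k("ABC")                        # 2^3 base design
frac = add_generated(base, {"D": "AB", "E": "AC"})     # 2^(5-2) fraction

print(len(full), "runs in the full factorial")         # 32
print(len(frac), "runs in the fractional factorial")   # 8
for run in frac:
    print(run)
```

The fraction studies the same five factors in a quarter of the runs, at the price that D and E are indistinguishable from the AB and AC interactions.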

4. Add Design Features¶

# Enhance your design with:
builder = dsp.FullFactorial2kDesignBuilder(
    temperature, pressure, time,
    replicates=3,        # Repeat for error estimation
    central_points=5,    # Detect curvature
    blocks='highest',    # Control for time trends
    shuffle=True         # Randomize run order
)
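
What randomizing the run order amounts to can be sketched without the library. The example below is a plain Python illustration under assumed conventions (coded levels, no center points, a hypothetical `run_order` field); it is not how daspi implements `shuffle`.

```python
import random
from itertools import product

factors = ("Temperature", "Pressure", "Time")
replicates = 3

# One row per factorial point and replicate (center points omitted for brevity).
rows = [
    {"replicate": rep, **dict(zip(factors, levels))}
    for rep in range(1, replicates + 1)
    for levels in product((-1, 1), repeat=len(factors))
]

# A seeded generator makes the randomized run sheet reproducible.
rng = random.Random(12345)
run_order = list(range(1, len(rows) + 1))
rng.shuffle(run_order)
for row, position in zip(rows, run_order):
    row["run_order"] = position

# Sort by run_order to obtain the sequence in which to execute the experiments.
rows.sort(key=lambda row: row["run_order"])
for row in rows[:5]:
    print(row)
```

Fixing the seed does not weaken the randomization; it only ensures the same sheet can be regenerated later for documentation.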

5. Generate and Execute¶

# Generate the design matrix
design = builder.build_design(corrected=False)
print(design)

# Execute experiments according to run_order
# Record your response data
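
One simple way to record responses is a run sheet in CSV form with an empty response column that is filled in as experiments finish. The sketch below uses only the standard library; the column names and the measured values are placeholders chosen for illustration, not real data or a daspi format.

```python
import csv
import io

# Hypothetical run sheet: factor settings listed in randomized run order.
runs = [
    {"run_order": 1, "Temperature": 200, "Pressure": 10, "Time": 30},
    {"run_order": 2, "Temperature": 150, "Pressure": 15, "Time": 60},
    {"run_order": 3, "Temperature": 200, "Pressure": 15, "Time": 30},
    {"run_order": 4, "Temperature": 150, "Pressure": 10, "Time": 60},
]

# Responses keyed by run_order as they come in (placeholder values, not real data).
measured = {1: 81.2, 2: 74.9, 3: 84.6}

fieldnames = ["run_order", "Temperature", "Pressure", "Time", "Yield", "Notes"]
buffer = io.StringIO()  # swap for open("run_sheet.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
for run in sorted(runs, key=lambda r: r["run_order"]):
    writer.writerow({**run, "Yield": measured.get(run["run_order"], ""), "Notes": ""})

sheet_text = buffer.getvalue()
print(sheet_text)

# Runs whose response is still blank have not been executed yet.
pending = [
    row["run_order"]
    for row in csv.DictReader(io.StringIO(sheet_text))
    if row["Yield"] == ""
]
print("Runs still to execute:", pending)
```

Keeping a notes column next to the response makes it easy to record anything unusual about a run, which helps when interpreting outliers later.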

Practical Example: Chemical Process Optimization¶

Let's walk through a complete example optimizing a chemical reaction:

Problem Setup¶

You want to maximize yield in a chemical reaction and suspect that temperature, pressure, and catalyst type affect the outcome.

import daspi as dsp
import numpy as np

# Define factors
temperature = dsp.Factor('Temperature', (150, 200))  # °C
pressure = dsp.Factor('Pressure', (1.0, 1.5))       # bar
catalyst = dsp.Factor('Catalyst', ('A', 'B'), is_categorical=True)

# Create a 2^3 design with center points
builder = dsp.FullFactorial2kDesignBuilder(
    temperature, pressure, catalyst,
    replicates=2,        # Run each combination twice
    central_points=4,    # 4 center points to check curvature
    shuffle=True         # Randomize to avoid time effects
)

design = builder.build_design(corrected=False)
print(f"Total experimental runs: {len(design)}")
print(design.head(10))

Analyzing Results¶

After running your experiments, you can analyze the results:

# Add your response data (simulated here)
# The comparisons assume the design holds natural (uncoded) factor levels;
# center-point rows match neither high level and receive only the baseline.
rng = np.random.default_rng(42)  # seeded generator for reproducible noise
design['Yield'] = (
    75 +  # baseline
    5 * (design['Temperature'] == 200) +     # temperature effect
    3 * (design['Pressure'] == 1.5) +       # pressure effect
    -2 * (design['Catalyst'] == 'B') +      # catalyst effect
    2 * ((design['Temperature'] == 200) & (design['Pressure'] == 1.5)) +  # interaction
    rng.normal(0, 1.5, len(design))         # experimental error
)

# Calculate factor effects
high_temp_yield = design[design['Temperature'] == 200]['Yield'].mean()
low_temp_yield = design[design['Temperature'] == 150]['Yield'].mean()
temperature_effect = high_temp_yield - low_temp_yield

print(f"Temperature effect: {temperature_effect:.2f} percentage points")
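
The same high-minus-low comparison extends to every factor and to interactions. The sketch below reproduces the simulated model above without noise, in plain Python, so the effects can be checked by hand; the helper names `simulated_yield`, `sign`, and `effect` are hypothetical and not part of daspi.

```python
from itertools import product
from statistics import mean

def simulated_yield(t, p, c):
    """Noise-free version of the model above; t, p, c are 0 (low or 'A') or 1 (high or 'B')."""
    return 75 + 5 * t + 3 * p - 2 * c + 2 * t * p

# One run per corner of the 2^3 design.
runs = [
    {"T": t, "P": p, "C": c, "y": simulated_yield(t, p, c)}
    for t, p, c in product((0, 1), repeat=3)
]

def sign(level):
    """Coded sign of a 0/1 level."""
    return 1 if level == 1 else -1

def effect(runs, *factors):
    """Mean response where the product of coded signs is +1 minus the mean where it is -1."""
    plus, minus = [], []
    for run in runs:
        s = 1
        for factor in factors:
            s *= sign(run[factor])
        (plus if s == 1 else minus).append(run["y"])
    return mean(plus) - mean(minus)

print("Temperature:", effect(runs, "T"))                    # 6
print("Pressure:", effect(runs, "P"))                       # 4
print("Catalyst:", effect(runs, "C"))                       # -2
print("Temperature x Pressure:", effect(runs, "T", "P"))    # 1
```

Note that the temperature effect comes out as 6, not the 5 written in the model: the interaction term only acts when both temperature and pressure are high, so on average half of its 2 units is added to each of the two main effects.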

Design Selection Guide¶

Number of Factors vs. Design Choice¶

Factors	Recommended Design	Runs (two levels per factor)	Purpose
2-4	Full Factorial	4-16	Complete understanding
5-8	Fractional Factorial	8-16	Factor screening
9+	Sequential DOE	Variable	Multi-stage approach

Resource Constraints¶

Limited time/money: Start with fractional factorial for screening

# Screen 6 factors in just 8 runs (2^(6-3), resolution III:
# main effects are aliased with two-factor interactions)
builder = dsp.FractionalFactorialDesignBuilder(
    *six_factors,
    generators=['D=AB', 'E=AC', 'F=BC']
)

Need complete picture: Use full factorial

# All interactions for 3 factors
builder = dsp.FullFactorialDesignBuilder(*three_factors)
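
The price of the small screening design can be made explicit by working out its defining relation. The following sketch is plain Python, not a daspi feature: it takes the six-factor generators shown above and lists the defining words, whose shortest length is the design resolution. The function name `defining_words` is hypothetical, and generators with a minus sign are not handled.

```python
from itertools import combinations

def defining_words(generators):
    """Return all words of the defining relation for generators like 'D=AB'."""
    base_words = []
    for generator in generators:
        new_factor, word = generator.replace(" ", "").split("=")
        # D = AB is equivalent to I = ABD
        base_words.append(frozenset(new_factor + word))
    words = set()
    for size in range(1, len(base_words) + 1):
        for combo in combinations(base_words, size):
            word = frozenset()
            for base in combo:
                word = word ^ base  # letters that appear twice cancel (X * X = I)
            words.add("".join(sorted(word)))
    return sorted(words, key=lambda w: (len(w), w))

words = defining_words(["D=AB", "E=AC", "F=BC"])
resolution = min(len(word) for word in words)

print("Defining words:", words)
print("Resolution:", resolution)  # 3
```

A shortest word of length three means resolution III: every main effect shares its estimate with at least one two-factor interaction, which is acceptable for screening but not for detailed modeling.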

Advanced Techniques¶

Sequential Experimentation¶

1. Screen with fractional factorial to identify important factors
2. Optimize with full factorial or response surface methods on important factors
3. Confirm with verification runs at optimal conditions

Blocking Strategy¶

Control for known nuisance variables:

# Block by time periods
builder = dsp.FullFactorialDesignBuilder(
    temperature, pressure, catalyst,
    blocks=4,  # Split across 4 time periods
    replicates=2
)

# Block by highest-order interaction (recommended)
builder = dsp.FullFactorialDesignBuilder(
    temperature, pressure, catalyst,
    blocks='highest',  # Confound with 3-way interaction
)
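
Blocking on the highest-order interaction can be sketched by hand for a 2^3 design: the sign of the A*B*C product decides the block. This is an illustration of the general technique in plain Python with coded levels, not a description of how daspi assigns blocks internally.

```python
from itertools import product

# 2^3 design in coded units
runs = [dict(zip("ABC", levels)) for levels in product((-1, 1), repeat=3)]

# The sign of the three-way interaction decides the block.
for run in runs:
    run["block"] = 1 if run["A"] * run["B"] * run["C"] == -1 else 2

for block in (1, 2):
    members = [run for run in runs if run["block"] == block]
    # A sum of 0 means the factor is at its high and low level equally often in this block.
    balance = {factor: sum(run[factor] for run in members) for factor in "ABC"}
    print(f"Block {block}: {len(members)} runs, balance {balance}")
```

Because every main effect and two-factor interaction is balanced within each block, a shift between blocks only distorts the three-way interaction, which is rarely of practical interest.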

Foldover for Fractional Factorials¶

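A full foldover repeats the design with all signs reversed, which separates main effects from the two-factor interactions they are confounded with in a resolution III design: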

# Initial fractional factorial
builder = dsp.FractionalFactorialDesignBuilder(
    A, B, C,
    generators=['C=AB']
)

# Add foldover to resolve confounding
builder = dsp.FractionalFactorialDesignBuilder(
    A, B, C,
    generators=['C=AB'],
    fold=True  # Doubles the runs, resolves main effect confounding
)
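
The effect of a foldover can be verified on the small example above. The sketch below builds the half fraction with C = AB in coded units, reverses all signs, and checks that the combined design no longer aliases C with the AB interaction. It is a plain Python illustration of the general technique; how daspi's `fold` option constructs its runs is not shown here.

```python
from itertools import product

# Half fraction 2^(3-1) with generator C = AB (coded levels -1 / +1).
half = [{"A": a, "B": b, "C": a * b} for b, a in product((-1, 1), repeat=2)]

# Full foldover: repeat every run with all signs reversed.
fold = [{name: -level for name, level in run.items()} for run in half]
combined = half + fold

def alias_sum(runs):
    """Sum of C * (A*B): +/-len(runs) means fully aliased, 0 means separable."""
    return sum(run["C"] * run["A"] * run["B"] for run in runs)

distinct = {(run["A"], run["B"], run["C"]) for run in combined}

print(len(half), alias_sum(half))          # 4 4
print(len(combined), alias_sum(combined))  # 8 0
print(len(distinct))                       # 8
```

For three factors the folded design is simply the full 2^3 factorial; with more factors the foldover still doubles the runs but remains a fraction.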

Common Pitfalls and Solutions¶

Too Many Factors Initially¶

Problem: Trying to study 10+ factors in one experiment

Solution: Use a sequential approach: screen first, then optimize

Ignoring Practical Constraints¶

Problem: Factor levels that are impractical or unsafe

Solution: Choose levels based on practical operating ranges

Confusing Correlation with Causation¶

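Problem: Assuming every statistically significant effect is real and causal

Solution: Use engineering knowledge to interpret statistical results, and remember that in a fractional design a significant effect may belong to an aliased interaction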

Not Randomizing¶

Problem: Running experiments in standard order, so that drift over time is confounded with the factors that change slowly

Solution: Always randomize unless there are strong practical constraints
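
The cost of not randomizing can be shown with a small simulation. The sketch below assumes a process with no real factor effects and a steady linear drift of 0.5 units per run (both assumptions made up for illustration), and compares the apparent effect of the slowest-changing factor in standard order against many randomized orders.

```python
import random
from itertools import product
from statistics import mean

# 2^3 design in standard order: A alternates fastest, C slowest.
design = [dict(zip("ABC", reversed(levels))) for levels in product((-1, 1), repeat=3)]

def c_effect(order):
    """Apparent effect of C when the runs are executed in the given order under pure drift."""
    # Response = 0.5 * time index: a steady drift and no real factor effects.
    high = [0.5 * t for t, idx in enumerate(order) if design[idx]["C"] == 1]
    low = [0.5 * t for t, idx in enumerate(order) if design[idx]["C"] == -1]
    return mean(high) - mean(low)

standard = list(range(8))
bias_standard = c_effect(standard)
print("Apparent C effect in standard order:", bias_standard)  # 2.0

# Average apparent effect over many randomized run orders.
rng = random.Random(7)
biases = []
for _ in range(2000):
    order = standard[:]
    rng.shuffle(order)
    biases.append(c_effect(order))
mean_bias = mean(biases)
print(f"Average apparent C effect when randomized: {mean_bias:.3f}")
```

In standard order the drift shows up as a spurious effect of 2.0 for C, while under randomization the drift turns into noise that averages out close to zero.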

Best Practices¶

Start Simple: Begin with screening designs for many factors
Use Replication: Include replicates to estimate error
Add Center Points: Help detect curvature in responses
Block When Possible: Control for known nuisance variables
Randomize Run Order: Reduce bias from time trends
Document Everything: Record all conditions and observations
Validate Results: Confirm findings with follow-up experiments
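
The center point practice from the list above can be turned into a quick curvature check: compare the mean of the center runs with the mean of the factorial runs, using the spread of the center runs as an estimate of pure error. The numbers below are made up for illustration, and the sketch is a simplified check in plain Python, not a replacement for a full analysis.

```python
from math import sqrt
from statistics import mean, stdev

# Illustrative responses only (not real data).
factorial_yields = [75.1, 80.3, 78.2, 85.0]   # corner points of a 2^2 design
center_yields = [81.9, 82.4, 81.6, 82.1]      # repeated center point

# Without curvature, the center mean should match the mean of the corners.
curvature = mean(center_yields) - mean(factorial_yields)

# Center points are true replicates, so their spread estimates pure error.
pure_error_sd = stdev(center_yields)
n_f, n_c = len(factorial_yields), len(center_yields)
std_error = pure_error_sd * sqrt(1 / n_f + 1 / n_c)
t_ratio = curvature / std_error

print(f"Curvature estimate: {curvature:.2f}")
print(f"t ratio: {t_ratio:.2f} (compare with a t distribution on {n_c - 1} degrees of freedom)")
```

A t ratio well above the critical value (about 3.18 for a two-sided 5% test with 3 degrees of freedom) suggests that a purely linear model is not adequate and that a response surface design may be the next step.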

Next Steps¶

After mastering basic DOE:

Learn response surface methodology for optimization
Explore mixture designs for formulation problems
Study robust design for reducing variation
Consider computer experiments for simulation studies

The key is to start with simple designs and build complexity as you gain experience and understanding of your system.