Student dropout rarely announces itself

A higher education institution serving thousands of students across associate’s, bachelor’s, and master’s programmes faced a persistent challenge: knowing which students were most at risk of not returning the following term — and knowing it early enough to act.

Advisors relied largely on instinct and reactive outreach. By the time a student’s risk became obvious — missed assignments, unpaid balances, a formal withdrawal — the window for meaningful intervention had often already closed. The institution needed a way to surface warning signals earlier, automatically, and at scale.

“The goal wasn’t just prediction — it was getting the right information to the right advisor before a student quietly disappeared.”

An added architectural constraint: the institution operated on Salesforce but held far fewer analytics licences than it had Salesforce users. Any solution would need to work for everyone, not just those with specialized analytics access.


A lean, iterative build across five phases

The engagement followed a design-led methodology — starting with deep business understanding before touching a single dataset — and moved through five structured phases over ten weeks.

Define & Design
Business scoping, stakeholder alignment, and solution architecture planning.
Build & Iterate
Data preparation and feature engineering across three focused development sprints.
Evaluate
Model validation against a held-out test set from a separate historical term.
Deploy
Production rollout with monitoring plan and write-back automation configured.
Knowledge Transfer
Full handoff so internal teams can maintain and extend the model independently.
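
The Evaluate phase's idea of validating against a separate historical term can be illustrated with a minimal sketch. The record layout, field names, and term labels below are hypothetical; the source does not say which terms were used or how the split was implemented.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_term(records: List[Record], holdout_term: str) -> Tuple[List[Record], List[Record]]:
    """Reserve one entire term for testing and train on the rest.

    Holding out a whole term, rather than a random sample of rows,
    keeps information from the evaluation term out of training.
    """
    train = [r for r in records if r["term"] != holdout_term]
    test = [r for r in records if r["term"] == holdout_term]
    if not test:
        raise ValueError(f"no records found for holdout term {holdout_term!r}")
    return train, test


# Hypothetical rows: one per student per term, with a persistence outcome.
records = [
    {"student_id": "S1", "term": "2022-Fall", "persisted": True},
    {"student_id": "S2", "term": "2022-Fall", "persisted": False},
    {"student_id": "S1", "term": "2023-Spring", "persisted": True},
    {"student_id": "S3", "term": "2023-Spring", "persisted": False},
]

train, test = split_by_term(records, holdout_term="2023-Spring")
```

Splitting on the term rather than on individual rows mirrors how the model is used in practice: trained on past terms, then applied to a term it has never seen.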

From 185 variables down to 13 that matter

The team drew from both native Salesforce objects and external system extracts — pulling together academic records, financial aid data, engagement activity from the learning management system, and enrolment history into a unified dataset via CRM Analytics Recipes.

After rigorous analysis, 185 candidate features were narrowed to 40 for model training, then refined further to a core set of 13 organized into four categories:

Academic
GPA, DFW rates (D/F/Withdrawal), total credits earned, gateway course performance, academic holds.
Demographic
Age, generation, preferred language, marital status, and geographic location.
Engagement
Days since last engagement, longest disengagement periods, activity in foundational courses.
Financial
Total financial holds, award types and amounts, funding sources.

Feature                      Category      Model Importance
Programme of Study           Academic      41.03%
Student Location             Demographic   9.39%
Engagement Cluster           Engagement    7.05%
Total Financial Aid Awarded  Financial     5.06%
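
To make the engagement features more concrete, here is a minimal sketch of how "days since last engagement" and "longest disengagement period" might be derived from a list of activity dates. The function name, the input shape, and the choice to count the open stretch up to the scoring date as a gap are all assumptions; the source built its features in CRM Analytics Recipes and does not give the exact definitions.

```python
from datetime import date
from typing import Dict, List


def engagement_features(activity_dates: List[date], as_of: date) -> Dict[str, int]:
    """Derive two engagement features from a student's activity dates.

    days_since_last: days between the most recent activity and as_of.
    longest_gap_days: longest stretch between consecutive activities,
    also counting the open stretch from the last activity up to as_of.
    """
    # Ignore duplicates and any activity dated after the scoring date.
    past = sorted({d for d in activity_dates if d <= as_of})
    if not past:
        raise ValueError("no activity on or before the as_of date")
    gaps = [(later - earlier).days for earlier, later in zip(past, past[1:])]
    days_since_last = (as_of - past[-1]).days
    return {
        "days_since_last": days_since_last,
        "longest_gap_days": max(gaps + [days_since_last]),
    }


# Hypothetical LMS activity for one student.
activity = [date(2024, 1, 8), date(2024, 1, 10), date(2024, 2, 1), date(2024, 2, 3)]
features = engagement_features(activity, as_of=date(2024, 2, 10))
```

For this sample the longest gap is the 22 days between 10 January and 1 February, and the student was last active 7 days before the scoring date.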

A risk score every advisor can see

Einstein Discovery generates a persistence likelihood score for each enrolled student at the start of every term. Rather than locking this insight behind a specialized analytics interface, the team engineered a write-back mechanism: scores are pushed directly to each student’s Contact record in Salesforce — accessible to all standard-licence users across the institution.
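
The source does not describe how the write-back was implemented, and it may well rely on native CRM Analytics or Einstein Discovery tooling. Purely as an illustration of the idea, the sketch below builds request bodies in the shape used by the Salesforce REST API's sObject Collections update, which accepts up to 200 records per request. The custom field name Persistence_Score__c and the Contact IDs are hypothetical, and no network call is made.

```python
from typing import Dict, Iterator

MAX_RECORDS_PER_REQUEST = 200  # sObject Collections limit per request


def build_writeback_batches(
    scores: Dict[str, float],
    score_field: str = "Persistence_Score__c",  # hypothetical custom field
    batch_size: int = MAX_RECORDS_PER_REQUEST,
) -> Iterator[dict]:
    """Yield update payloads that set a score field on Contact records."""
    if not 1 <= batch_size <= MAX_RECORDS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 200")
    records = [
        {"attributes": {"type": "Contact"}, "id": contact_id, score_field: round(score, 4)}
        for contact_id, score in sorted(scores.items())
    ]
    for start in range(0, len(records), batch_size):
        # allOrNone=False lets valid records succeed even if others fail.
        yield {"allOrNone": False, "records": records[start:start + batch_size]}


# Hypothetical Contact IDs mapped to persistence scores.
scores = {
    "003000000000001AAA": 0.8123,
    "003000000000002AAA": 0.3541,
    "003000000000003AAA": 0.6,
}
batches = list(build_writeback_batches(scores, batch_size=2))
```

Each payload would be sent as the body of a PATCH request to the /services/data/vXX.X/composite/sobjects endpoint; authentication, retries, and per-record error handling are left out of this sketch.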

Students are automatically segmented into three intervention tiers:

High Risk
Immediate advisor outreach prioritized
Medium Risk
Monitoring with proactive check-ins
Low Risk
Routine support, no urgent action
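
The tiering step can be sketched as a simple threshold function. The cutoffs below are hypothetical, since the source does not state where the boundaries sit, and the sketch assumes the persistence likelihood is expressed as a value between 0 and 1, with lower values meaning higher risk.

```python
def assign_tier(
    persistence_score: float,
    high_risk_below: float = 0.40,  # hypothetical cutoff
    low_risk_from: float = 0.70,    # hypothetical cutoff
) -> str:
    """Map a persistence likelihood (0 to 1) to an intervention tier."""
    if not 0.0 <= persistence_score <= 1.0:
        raise ValueError("persistence_score must be between 0 and 1")
    if persistence_score < high_risk_below:
        return "High Risk"
    if persistence_score < low_risk_from:
        return "Medium Risk"
    return "Low Risk"


# Hypothetical scores for three students.
tiers = [assign_tier(s) for s in (0.22, 0.55, 0.91)]
```

In practice the cutoffs would be tuned against advisor capacity, so that the High Risk tier stays small enough for immediate outreach to be realistic.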

A centralized dashboard — designed specifically for student success staff — layers on comparative views by degree type and programme, with filters for department and cohort. Advisors see exactly where their caseload stands, in one place, without navigating multiple systems.

The write-back strategy solved a real institutional constraint: predictive intelligence should not be locked behind a licence count. It belongs in the workflow every advisor already uses daily.


A path toward near-real-time intelligence

The current model operates on batch data refreshed at the start of each term. The technical roadmap points toward a future state that is significantly more dynamic — and more responsive to changes in student behaviour as they happen.

Now
Batch scoring via data extracts
Data pulled from Salesforce and external systems each term, processed through CRM Analytics Recipes into a unified model input dataset.
Next
Data Cloud integration
ETL into Data Lake Objects and Data Model Objects, enabling live data connections to CRM Analytics and reducing refresh latency.
Future
Near-real-time predictive scoring
Scores refresh continuously as student behaviour changes, not just at the term boundary, enabling in-term intervention that the current batch architecture cannot support.

Honest about the gaps from the start

A key strength of this engagement was surfacing data limitations early rather than working around them. Two datasets were excluded from the initial model: grade detail records with a high proportion of null values, and withdrawal records with overlapping statuses. Leaving them out preserved the model’s integrity rather than risking predictions polluted by unreliable inputs.
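
A screen for null-heavy fields like the one described here might look like the following minimal sketch. The 50% threshold, the column names, and the sample rows are hypothetical; the source does not state what null rate led to the grade detail records being excluded.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def null_rates(rows: List[Row]) -> Dict[str, float]:
    """Return the fraction of rows in which each column is missing or None."""
    if not rows:
        raise ValueError("rows must not be empty")
    columns = sorted({column for row in rows for column in row})
    return {
        column: sum(1 for row in rows if row.get(column) is None) / len(rows)
        for column in columns
    }


def columns_to_exclude(rows: List[Row], max_null_rate: float = 0.5) -> List[str]:
    """List columns whose null rate exceeds the (hypothetical) threshold."""
    return [column for column, rate in null_rates(rows).items() if rate > max_null_rate]


# Hypothetical extract: GPA is complete, midterm grades are mostly missing.
rows = [
    {"gpa": 3.1, "midterm_grade": None},
    {"gpa": 2.4, "midterm_grade": "B"},
    {"gpa": 3.8, "midterm_grade": None},
    {"gpa": 2.9, "midterm_grade": None},
]
excluded = columns_to_exclude(rows)
```

Running a check like this before modelling makes the exclusion decision explicit and repeatable, so a field can be reconsidered once its capture rate improves.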

The team also flagged where institutional behaviour itself created noise: a support ticketing system designed to log student interactions was inconsistently used across departments, limiting its predictive value. Meaningful social engagement data — peer connections, campus activity — simply was not being captured anywhere in a structured form.

These are not failures. They are a roadmap for what to build next — and being explicit about them builds trust with the stakeholders who need to act on the model’s outputs.

Data integrity over data volume. Excluding unreliable features preserved model trust and produced a leaner, more explainable prediction set.
Accessibility by design. The write-back architecture ensured predictive intelligence reached every advisor — not just those with specialized licence access.
Build for the roadmap, not just the sprint. Designing with Data Cloud integration in mind from day one means the future-state architecture requires evolution, not a rebuild.
Einstein Discovery, CRM Analytics, Salesforce, Predictive Modeling, Student Success, Feature Engineering, Data Cloud, Agile Delivery