Persistence Insights: Predictive Analytics in Higher Education

The Challenge

Student dropout rarely announces itself

A higher education institution serving thousands of students across associate’s, bachelor’s, and master’s programmes faced a persistent challenge: knowing which students were most at risk of not returning the following term — and knowing it early enough to act.

Advisors relied largely on instinct and reactive outreach. By the time a student’s risk became obvious — missed assignments, unpaid balances, a formal withdrawal — the window for meaningful intervention had often already closed. The institution needed a way to surface warning signals earlier, automatically, and at scale.

“The goal wasn’t just prediction — it was getting the right information to the right advisor before a student quietly disappeared.”

An additional architectural constraint: the institution ran on Salesforce but held only a limited pool of analytics licences, far smaller than its overall Salesforce user base. Any solution therefore had to work for every user, not just those with specialized analytics access.

The Approach

A lean, iterative build across five phases

The engagement followed a design-led methodology — starting with deep business understanding before touching a single dataset — and moved through five structured phases over ten weeks.

Define & Design

Business scoping, stakeholder alignment, and solution architecture planning.

Build & Iterate

Data preparation and feature engineering across three focused development sprints.

Evaluate

Model validation against a held-out test set from a separate historical term.

Deploy

Production rollout, with a monitoring plan in place and write-back automation configured.

Knowledge Transfer

Full handoff so internal teams can maintain and extend the model independently.
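Holding out an entire later term, rather than a random sample of rows, mirrors how the model is actually used: it learns from past terms and scores a future one. The sketch below is a minimal illustration under assumptions, not the team's evaluation code. The `term` labels (e.g. "2022-1") are hypothetical and are assumed to sort chronologically as strings, and the `persisted` field is a hypothetical outcome flag.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_term(records: List[Record], test_term: str) -> Tuple[List[Record], List[Record]]:
    """Train on terms strictly before `test_term`; hold out `test_term` itself.

    Assumes hypothetical term labels such as "2022-1", "2022-2" that sort
    chronologically as strings. Terms after `test_term` are excluded from
    training so no future information leaks into the model.
    """
    train = [r for r in records if str(r["term"]) < test_term]
    test = [r for r in records if r["term"] == test_term]
    return train, test


def accuracy(predicted: List[bool], actual: List[bool]) -> float:
    """Share of held-out students whose persistence outcome was predicted correctly."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predictions and outcomes must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


records = [
    {"student_id": "S1", "term": "2022-1", "persisted": True},
    {"student_id": "S2", "term": "2022-2", "persisted": False},
    {"student_id": "S3", "term": "2023-1", "persisted": True},
    {"student_id": "S4", "term": "2023-1", "persisted": False},
]
train, test = split_by_term(records, "2023-1")
print([r["student_id"] for r in train], [r["student_id"] for r in test])
# ['S1', 'S2'] ['S3', 'S4']
```

Accuracy is used here only to keep the sketch short; when most students persist, precision and recall on the at-risk group are usually more informative.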

The Data

From 185 variables down to 13 that matter

The team drew from both native Salesforce objects and external system extracts — pulling together academic records, financial aid data, engagement activity from the learning management system, and enrolment history into a unified dataset via CRM Analytics Recipes.
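Conceptually, this unification step behaves like a chain of left joins keyed on a student identifier: every enrolled student keeps a row, and attributes from each external extract are attached where a match exists. The plain-Python sketch below illustrates only that join logic; it is not a CRM Analytics Recipe, and the shared `student_id` key and column names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def left_join(base: List[Row], other: List[Row], key: str = "student_id") -> List[Row]:
    """Attach columns from `other` to each row of `base` with the same key.

    Unmatched `base` rows are kept as-is (left-join semantics); unmatched
    `other` rows are dropped. Assumes `other` holds at most one row per key.
    """
    lookup = {row[key]: row for row in other}
    joined = []
    for row in base:
        merged = dict(row)
        match = lookup.get(row[key])
        if match is not None:
            merged.update({col: val for col, val in match.items() if col != key})
        joined.append(merged)
    return joined


def unify(enrolments: List[Row], *sources: List[Row]) -> List[Row]:
    """Fold each extract onto the enrolment list, one left join at a time."""
    result = enrolments
    for source in sources:
        result = left_join(result, source)
    return result


enrolments = [{"student_id": "S1", "programme": "BSc"}, {"student_id": "S2", "programme": "MBA"}]
aid = [{"student_id": "S1", "total_aid": 4500}]
lms = [{"student_id": "S2", "last_login": "2024-01-10"}]
rows = unify(enrolments, aid, lms)
print(rows)  # S1 gains total_aid, S2 gains last_login; both rows survive
```

Starting from enrolments keeps the dataset anchored to the students who actually need a score, even when an external system has no record for some of them.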

After rigorous analysis, 185 candidate features were narrowed to 40 for model training, then refined further to a core set of 13 organized into four categories:

Academic

GPA, DFW rates (the share of courses ending in a D, F, or Withdrawal), total credits earned, gateway course performance, academic holds.

Demographic

Age, generation, preferred language, marital status, and geographic location.

Engagement

Days since last engagement, longest disengagement periods, activity in foundational courses.

Financial

Total financial holds, award types and amounts, funding sources.
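Several of these features are derived rather than read directly from a source field. The sketch below shows one plausible way to compute three of them (DFW rate, days since last engagement, and longest disengagement period) from per-student grades and activity dates. The input shapes and exact definitions are assumptions; the institution's own definitions may differ.

```python
from datetime import date
from typing import List

DFW_GRADES = {"D", "F", "W"}


def dfw_rate(final_grades: List[str]) -> float:
    """Share of attempted courses ending in D, F, or W (plus/minus variants count)."""
    if not final_grades:
        return 0.0  # no graded courses yet; a real pipeline might treat this as missing
    hits = sum(g.upper().rstrip("+-") in DFW_GRADES for g in final_grades)
    return hits / len(final_grades)


def days_since_last_engagement(activity: List[date], as_of: date) -> int:
    """Days between the most recent recorded activity and the scoring date."""
    if not activity:
        raise ValueError("no recorded activity")
    return (as_of - max(activity)).days


def longest_disengagement(activity: List[date], as_of: date) -> int:
    """Longest gap in days between consecutive activity dates, counting the
    still-open gap from the last activity up to the scoring date.
    Assumes every activity date falls on or before `as_of`."""
    if not activity:
        raise ValueError("no recorded activity")
    timeline = sorted(activity) + [as_of]
    return max((later - earlier).days for earlier, later in zip(timeline, timeline[1:]))


grades = ["A", "D+", "W", "B"]
activity = [date(2024, 1, 8), date(2024, 1, 10), date(2024, 2, 1)]
as_of = date(2024, 2, 15)
print(dfw_rate(grades))                             # 0.5
print(days_since_last_engagement(activity, as_of))  # 14
print(longest_disengagement(activity, as_of))       # 22 (10 Jan to 1 Feb)
```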

Top Predictors by Model Importance

Feature	Category	Model Importance
Programme of Study	Academic	41.03%
Student Location	Demographic	9.39%
Engagement Cluster	Engagement	7.05%
Total Financial Aid Awarded	Financial	5.06%
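The table also supports some quick arithmetic. Assuming the percentages are shares of a total importance that sums to 100% across the model's features (the case study does not state this explicitly), the four listed predictors together account for about 62.5% of total importance, and Programme of Study alone carries more than four times the weight of the next predictor:

```python
# Importance shares from the table above, in percent.
importance = {
    "Programme of Study": 41.03,
    "Student Location": 9.39,
    "Engagement Cluster": 7.05,
    "Total Financial Aid Awarded": 5.06,
}

top_four = round(sum(importance.values()), 2)
remainder = round(100 - top_four, 2)  # assumes all shares sum to 100%
ratio = round(importance["Programme of Study"] / importance["Student Location"], 2)

print(top_four, remainder, ratio)  # 62.53 37.47 4.37
```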

The Solution

A risk score every advisor can see

Einstein Discovery generates a persistence likelihood score for each enrolled student at the start of every term. Rather than locking this insight behind a specialized analytics interface, the team engineered a write-back mechanism: scores are pushed directly to each student’s Contact record in Salesforce — accessible to all standard-licence users across the institution.
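The case study does not say exactly how the write-back was implemented, and CRM Analytics offers its own options. One possible approach, sketched here purely as an assumption, uses the Salesforce REST API's sObject Collections resource, which updates up to 200 records per PATCH request. The custom field names and API version are hypothetical, and the code only builds request bodies; sending them with an authenticated client is out of scope.

```python
from typing import Dict, Iterator

API_VERSION = "v59.0"  # assumption: use a version the org actually supports
ENDPOINT = f"/services/data/{API_VERSION}/composite/sobjects"  # sObject Collections; send with PATCH
MAX_RECORDS_PER_CALL = 200  # sObject Collections accepts at most 200 records per request


def build_writeback_batches(scores: Dict[str, float], tiers: Dict[str, str]) -> Iterator[dict]:
    """Yield request bodies that set hypothetical score and tier fields on Contact records."""
    records = [
        {
            "attributes": {"type": "Contact"},
            "id": contact_id,
            "Persistence_Score__c": round(score, 4),        # hypothetical custom field
            "Persistence_Risk_Tier__c": tiers[contact_id],  # hypothetical custom field
        }
        for contact_id, score in scores.items()
    ]
    for start in range(0, len(records), MAX_RECORDS_PER_CALL):
        yield {
            "allOrNone": False,  # a single failing record does not roll back the batch
            "records": records[start:start + MAX_RECORDS_PER_CALL],
        }


# 450 scored students (placeholder IDs) -> three requests of 200, 200 and 50 records.
scores = {f"003{i:015d}": 0.8 for i in range(450)}
tiers = {contact_id: "Low Risk" for contact_id in scores}
batches = list(build_writeback_batches(scores, tiers))
print([len(b["records"]) for b in batches])  # [200, 200, 50]
```

Writing to standard Contact fields is what makes the score visible to every standard-licence user: page layouts, list views, and reports pick it up like any other field.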

Students are automatically segmented into three intervention tiers:

High Risk

Immediate advisor outreach prioritized

Medium Risk

Monitoring with proactive check-ins

Low Risk

Routine support, no urgent action
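The segmentation itself can be as simple as two cut-offs on the score. A minimal sketch, assuming the score is a persistence likelihood between 0 and 1 (higher means more likely to return); the cut-offs of 0.5 and 0.75 are hypothetical placeholders, since the case study does not give the actual thresholds.

```python
def risk_tier(
    persistence_score: float,
    high_risk_below: float = 0.5,
    medium_risk_below: float = 0.75,
) -> str:
    """Map a persistence likelihood in [0, 1] to an intervention tier.

    A lower likelihood of returning means a higher dropout risk. Both
    cut-offs are hypothetical placeholders, not the institution's values.
    """
    if not 0.0 <= persistence_score <= 1.0:
        raise ValueError("persistence_score must be between 0 and 1")
    if persistence_score < high_risk_below:
        return "High Risk"
    if persistence_score < medium_risk_below:
        return "Medium Risk"
    return "Low Risk"


for score in (0.32, 0.61, 0.90):
    print(score, risk_tier(score))
```

Fixed cut-offs are easy to explain to advisors; an alternative is to size the High Risk tier by advisor capacity (for example, a fixed share of the lowest scores each term), which keeps caseloads predictable.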

A centralized dashboard — designed specifically for student success staff — layers on comparative views by degree type and programme, with filters for department and cohort. Advisors see exactly where their caseload stands, in one place, without navigating multiple systems.
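Behind a comparative view like this sits a simple aggregation: count students per tier for each programme or degree type, after applying any department or cohort filters. The plain-Python sketch below assumes rows that already carry a tier label plus hypothetical `programme`, `department`, and `cohort` columns; the real dashboard would express the same logic in CRM Analytics rather than Python.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def tier_counts_by(rows: Iterable[dict], dimension: str, **filters: str) -> Dict[str, Counter]:
    """Count students per risk tier for each value of `dimension`,
    keeping only rows that match every keyword filter (e.g. department="Health")."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        if any(row.get(field) != value for field, value in filters.items()):
            continue
        table[row[dimension]][row["tier"]] += 1
    return dict(table)


rows = [
    {"programme": "BSc Nursing", "department": "Health", "cohort": "2023", "tier": "High Risk"},
    {"programme": "BSc Nursing", "department": "Health", "cohort": "2023", "tier": "Low Risk"},
    {"programme": "BA History", "department": "Humanities", "cohort": "2023", "tier": "Medium Risk"},
    {"programme": "BSc Nursing", "department": "Health", "cohort": "2022", "tier": "High Risk"},
]
view = tier_counts_by(rows, "programme", cohort="2023")
for programme, counts in sorted(view.items()):
    print(programme, dict(counts))
```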

The write-back strategy solved a real institutional constraint: predictive intelligence should not be locked behind a licence count. It belongs in the workflow every advisor already uses daily.

What’s Next

A path toward near-real-time intelligence

The current model operates on batch data refreshed at the start of each term. The technical roadmap points toward a future state that is significantly more dynamic — and more responsive to changes in student behaviour as they happen.

Now

Batch scoring via data extracts

Data pulled from Salesforce and external systems each term, processed through CRM Analytics Recipes into a unified model input dataset.

Next

Data Cloud integration

ETL (extract, transform, load) into Data Cloud Data Lake Objects and Data Model Objects, enabling live data connections to CRM Analytics and reducing refresh latency.

Future

Near-real-time predictive scoring

Scores refresh continuously as student behaviour changes — not just at the semester boundary — enabling in-term intervention that current batch architecture cannot support.
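Moving from term-start batches to near-real-time scoring is largely a change in when scoring is triggered. One common pattern, sketched here as an assumption rather than the planned design, is event-driven rescoring: each incoming change marks a student as "dirty", and a frequent, lightweight pass rescores only those students and surfaces any whose tier moved. The `score_fn` callable and the toy model are hypothetical stand-ins for the deployed Einstein Discovery model.

```python
from typing import Callable, Dict, Set


class IncrementalScorer:
    """Rescore only the students whose inputs changed since the last pass.

    `score_fn` stands in for the deployed model (hypothetical interface:
    it takes a feature dict and returns a tier label).
    """

    def __init__(self, score_fn: Callable[[dict], str]) -> None:
        self.score_fn = score_fn
        self.features: Dict[str, dict] = {}
        self.tiers: Dict[str, str] = {}
        self.dirty: Set[str] = set()

    def on_event(self, student_id: str, updates: dict) -> None:
        """Apply an incoming change (e.g. a new LMS login) and queue the student."""
        self.features.setdefault(student_id, {}).update(updates)
        self.dirty.add(student_id)

    def rescore(self) -> Dict[str, str]:
        """Score queued students; return those whose tier changed (or who are new)."""
        changed: Dict[str, str] = {}
        for student_id in self.dirty:
            new_tier = self.score_fn(self.features[student_id])
            if self.tiers.get(student_id) != new_tier:
                changed[student_id] = new_tier
            self.tiers[student_id] = new_tier
        self.dirty.clear()
        return changed


# Toy stand-in model: flag anyone disengaged for more than three weeks.
def toy_model(features: dict) -> str:
    return "High Risk" if features.get("days_since_last_engagement", 0) > 21 else "Low Risk"


scorer = IncrementalScorer(toy_model)
scorer.on_event("S1", {"days_since_last_engagement": 5})
first = scorer.rescore()   # {'S1': 'Low Risk'}  (first score counts as a change)
scorer.on_event("S1", {"days_since_last_engagement": 30})
second = scorer.rescore()  # {'S1': 'High Risk'}  (tier moved mid-term)
```

Time-dependent features such as days since last engagement also change when nothing happens at all, so a design like this would still need a periodic sweep alongside the event-driven path.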

Lessons Learned

Honest about the gaps from the start

A key strength of this engagement was surfacing data limitations early rather than working around them. Two datasets were excluded from the initial model: grade detail records, which contained a high proportion of null values, and withdrawal records, which had overlapping statuses. Excluding them preserved the model's integrity instead of risking predictions polluted by unreliable inputs.
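Both problems can be detected mechanically before modelling starts. A minimal sketch, assuming in-memory rows with hypothetical column names, that measures the null rate per column and finds student-term pairs carrying more than one distinct withdrawal status. The 0.3 null-rate cut-off is a hypothetical placeholder; the case study does not say where the line was drawn.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

NULL_RATE_LIMIT = 0.3  # hypothetical cut-off; the case study gives no number


def null_rates(rows: List[dict]) -> Dict[str, float]:
    """Fraction of rows in which each column is missing or None."""
    if not rows:
        return {}
    columns = {col for row in rows for col in row}
    return {col: sum(row.get(col) is None for row in rows) / len(rows) for col in columns}


def overlapping_statuses(
    rows: List[dict],
    key: Sequence[str] = ("student_id", "term"),
    status_col: str = "status",
) -> Dict[Tuple, Set[str]]:
    """Return the key tuples (e.g. student and term) that carry more than one distinct status."""
    seen: Dict[Tuple, Set[str]] = defaultdict(set)
    for row in rows:
        seen[tuple(row[k] for k in key)].add(row[status_col])
    return {k: statuses for k, statuses in seen.items() if len(statuses) > 1}


grades = [
    {"student_id": "S1", "course": "MATH101", "grade": None},
    {"student_id": "S2", "course": "MATH101", "grade": "B"},
]
withdrawals = [
    {"student_id": "S1", "term": "2023-1", "status": "Withdrawn"},
    {"student_id": "S1", "term": "2023-1", "status": "Leave of Absence"},
    {"student_id": "S2", "term": "2023-1", "status": "Withdrawn"},
]
too_sparse = [col for col, rate in null_rates(grades).items() if rate > NULL_RATE_LIMIT]
print(too_sparse)                         # ['grade']
print(overlapping_statuses(withdrawals))  # S1 in 2023-1 has two statuses
```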

The team also flagged where institutional behaviour itself created noise: a support ticketing system designed to log student interactions was inconsistently used across departments, limiting its predictive value. Meaningful social engagement data — peer connections, campus activity — simply was not being captured anywhere in a structured form.

These are not failures. They are a roadmap for what to build next — and being explicit about them builds trust with the stakeholders who need to act on the model’s outputs.

Data integrity over data volume. Excluding unreliable features preserved model trust and produced a leaner, more explainable prediction set.

Accessibility by design. The write-back architecture ensured predictive intelligence reached every advisor — not just those with specialized licence access.

Build for the roadmap, not just the sprint. Designing with Data Cloud integration in mind from day one means the future-state architecture requires evolution, not a rebuild.

Tags: Einstein Discovery, CRM Analytics, Salesforce, Predictive Modeling, Student Success, Feature Engineering, Data Cloud, Agile Delivery

Using Predictive Analytics to Keep Students on Track

Student dropout rarely announces itself

A lean, iterative build across five phases

From 185 variables down to 13 that matter

A risk score every advisor can see

A path toward near-real-time intelligence

Honest about the gaps from the start

Advisors working smarter. Students staying longer.
