Introduction

Video-based learning (VBL) is a pedagogical modality that leverages audiovisual content to deliver instructional material. It has become increasingly prevalent in higher education (Navarrete et al., 2025), particularly within online, hybrid, and flipped classroom models. Asynchronous video lectures, explainer animations, and recorded demonstrations provide flexible and scalable access to course content (Garcia & Yousef, 2023), supporting self-regulated learning and accommodating diverse cognitive and learning styles. Research in multimedia learning theory suggests that well-designed instructional videos can enhance cognitive processing by combining visual and verbal channels (Mayer, 2009). According to this theory, this approach facilitates dual coding and reduces extraneous cognitive load. In classroom settings, VBL is often employed to maximize in-class time for higher-order tasks by offloading lower-order content to video. As the use of video proliferates, educators and instructional designers are exploring augmentation strategies to increase the pedagogical efficiency of video consumption (Mayer, 2021; Tanprasert et al., 2023; Torre et al., 2022). Recent trends also point to increasing learner autonomy in navigating video content. Students often skim, pause, or accelerate videos to optimize their study habits, which has led to new concerns about selective attention and superficial engagement (Ahn & Chan, 2025; Ritzhaupt et al., 2015). As learners gain more control over how they interact with video content, educators face growing uncertainty about whether this autonomy supports meaningful learning or facilitates avoidance of cognitively demanding material. This evolving landscape has prompted renewed interest in instructional supports, particularly those that can guide or scaffold learner engagement without undermining content comprehension.

One such support mechanism gaining traction is video summarization (Kawamura & Rekimoto, 2024), or the condensation of lengthy instructional videos into concise overviews. These summaries are designed to enhance efficiency by helping learners quickly grasp key ideas, preview content, or review material after initial exposure. Traditionally produced manually, video summaries are now increasingly generated through artificial intelligence (AI) by leveraging advancements in automatic speech recognition, natural language processing, and large language models (Gupta & Sharma, 2023). These systems can extract and condense essential information from video transcripts in a matter of seconds, offering on-demand summarization with minimal effort. Importantly, these AI-generated summaries are not only integrated into learning platforms by instructors but are also increasingly generated independently by students themselves. This growing accessibility empowers learners to customize their engagement with instructional content, but it also raises new questions about how such tools influence learning behaviors, cognitive investment, and the perceived necessity of engaging with full-length material. As AI continues to reshape the design and delivery of educational media, the pedagogical implications of automated summarization warrant scrutiny. While summaries may enhance accessibility and learning efficiency, their presence also creates a decision point in the learning process: Should learners invest time in the full instructional experience, or rely on the condensed version provided?

Despite the growing prevalence of AI-generated summaries in educational contexts, empirical research on their effects remains limited (Apostolidis et al., 2021). Much of the existing literature on multimedia learning focuses on video design principles, while studies on AI in education often emphasize technical accuracy, user satisfaction, or automation (Hasanah et al., 2025). What remains underexplored is how the availability of AI-generated summaries affects core learning variables such as engagement, comprehension, and motivation. The ability to bypass full-length videos in favor of algorithmically produced summaries introduces new learner behaviors that may not align with instructional intent. This shift toward efficiency could inadvertently promote surface-level processing, reduce persistence, or weaken the formation of robust mental models. Moreover, when learners independently generate summaries, instructors lose visibility into how students interact with core content. To date, most studies have examined summary usage in cumulative or multi-lesson contexts, leaving open the question of how even a single exposure to an AI-generated summary might influence learner behavior. First-time encounters with such tools are especially important, as they may set behavioral expectations or engagement patterns that persist across subsequent learning experiences. Understanding these initial effects is critical for instructors, instructional designers, and educational technology developers aiming to balance efficiency with depth of learning. This study addresses this gap by examining the immediate impact of AI-generated summaries in a single video-based lesson. Specifically, it investigates how summary availability affects student engagement with the video, comprehension of content, and motivation to watch the full material. 
It is hypothesized that learners with access to a summary will exhibit reduced engagement (H1), slightly lower comprehension (H2), and diminished intrinsic motivation to engage with the complete video experience (H3).

Materials and Methods

Research Design and Participants

This study employed a quasi-experimental, between-subjects design to examine the immediate effects of AI-generated summaries on student engagement, comprehension, and motivation during a single video-based lesson. A quasi-experimental approach was selected due to practical constraints in classroom randomization and to maintain ecological validity within an authentic learning environment. By focusing on single exposure rather than longitudinal implementation, the study aimed to isolate the initial behavioral and cognitive impacts of summary availability, providing insight into how learners respond when first encountering AI-generated scaffolds. Participants were 60 undergraduate students enrolled in an introductory-level course at a large university in the Philippines. The sample included students from diverse academic majors to reflect the general population of learners. Participants were recruited through course announcements and provided informed consent prior to participation. Although randomization at the classroom level was not feasible, individual participants were randomly assigned to one of two conditions: a control group, which received a full-length instructional video only, and an experimental group, which received both the full video and an AI-generated summary. This individual-level random assignment was used to reduce selection bias and to control for potential confounding variables such as prior knowledge, digital literacy, and video viewing habits.

Instructional Materials and Measures

The instructional content consisted of a single 15-minute educational video selected from the course curriculum. The video was chosen for its moderate complexity and relevance to course learning objectives, ensuring that all participants could engage with the material without requiring extensive background knowledge. For the experimental condition, an AI-generated textual summary was created using ChatGPT. The summary condensed the video's key points into a 150–200-word paragraph, offering a high-level overview of the content without detailed explanations or visual context. To measure learning outcomes, a 10-item multiple-choice comprehension quiz was used. This quiz had been originally developed by the faculty-in-charge as a formative assessment for the course. The assessment was aligned with the learning objectives of the lesson and included items targeting both factual recall and basic conceptual understanding. Motivation was assessed using a shortened version of the Intrinsic Motivation Inventory (IMI; Ryan, 1982), adapted to assess three dimensions: interest/enjoyment, value/usefulness, and effort/importance. These subscales have been validated in prior research as reliable indicators of task-specific motivation in educational settings. Engagement was operationalized using behavioral metrics recorded through the institution's learning management system (LMS). These included total video watch time (in minutes), skip-ahead actions (defined as forward seeking actions of more than 10 seconds), and replay behavior (defined as any backward seeking action). All interactions were tracked to preserve the naturalistic learning experience.
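The engagement metrics described above can be computed directly from LMS playback logs. The sketch below is a hypothetical simplification (the institution's actual log schema is not described in the text); it assumes each seek event is recorded as a pair of playback positions, in seconds, and applies the study's operational definitions:

```python
# Hypothetical sketch: classifying LMS seek events into the study's
# engagement metrics. The real LMS log schema is not specified; here
# each event is a (from_sec, to_sec) pair of playback positions.

SKIP_THRESHOLD = 10  # forward jumps longer than 10 s count as skip-aheads


def classify_seeks(seek_events):
    """Count skip-ahead actions (forward seeks > 10 s) and replays
    (any backward seek), per the operational definitions above."""
    skips = sum(1 for f, t in seek_events if t - f > SKIP_THRESHOLD)
    replays = sum(1 for f, t in seek_events if t < f)
    return {"skip_aheads": skips, "replays": replays}


# Example: a small forward nudge, a large skip, and one replay
events = [(30, 35), (60, 300), (400, 390)]
print(classify_seeks(events))  # {'skip_aheads': 1, 'replays': 1}
```

Total watch time would additionally require play/pause timestamps, which this simplified event format omits.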

Procedures and Data Analysis

All participants completed the study asynchronously within a scheduled time window. Upon accessing the LMS, students first completed a brief pre-lesson questionnaire assessing their familiarity with the video topic and their typical study behaviors. Following this initial survey, participants viewed the instructional video. For those in the experimental group, an AI-generated summary was presented in a sidebar adjacent to the video player (Figure 1). Participants were not given specific instructions on how or when to engage with the summary, which allowed them to choose whether to read it before, during, or after watching the video, or even to rely on it instead of viewing the video. They were simply instructed to engage with the materials as they normally would for a class assignment. After completing the learning session, all participants took the comprehension quiz, followed by the post-lesson motivation survey. The entire activity was designed to be completed within 30 minutes. No instructor support or feedback was provided during the session to preserve the ecological validity of a self-regulated VBL environment. Data were analyzed using independent samples t-tests to compare the two groups across three primary outcomes: video engagement (measured as total watch time), comprehension (quiz score out of 10), and intrinsic motivation (composite IMI score). This statistical approach was chosen to evaluate mean differences between groups, consistent with the between-subjects design and the continuous nature of the dependent variables. Effect sizes were reported using Cohen's d to indicate the magnitude of observed differences.
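The analytic approach can be illustrated from summary statistics alone. The helper below is a minimal sketch, not the authors' actual analysis code: it computes a pooled-variance independent-samples t statistic and Cohen's d for two equal-sized groups (df = 2n − 2). The usage example uses neutral illustrative values rather than the study's data, since statistics recomputed from rounded descriptives can differ slightly from exact reported values.

```python
import math


def t_and_d(mean1, sd1, mean2, sd2, n):
    """Pooled-variance independent-samples t and Cohen's d for two
    groups of equal size n (df = 2n - 2)."""
    pooled_var = (sd1**2 + sd2**2) / 2           # equal n -> simple average
    se = math.sqrt(pooled_var * 2 / n)           # SE of the mean difference
    t = (mean1 - mean2) / se
    d = (mean1 - mean2) / math.sqrt(pooled_var)  # standardized mean difference
    return t, d


# Illustrative values only (not the study's data)
t, d = t_and_d(5.0, 1.0, 4.0, 1.0, 30)
print(f"t(58) = {t:.2f}, d = {d:.2f}")  # t(58) = 3.87, d = 1.00
```

Equivalent results come from `scipy.stats.ttest_ind` with `equal_var=True`; the pure-stdlib version is shown only to make the arithmetic explicit.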

Results

Engagement

Consistent with H1, which predicted that learners with access to an AI-generated summary would exhibit reduced engagement, participants in the summary group watched significantly less of the instructional video than those in the control group. This suggests that the availability of a summary may have altered learners' perceptions of the video's necessity. On average, students in the control condition (video only) viewed 9.1 ± 1.2 minutes of the 15-minute video, while those in the summary condition watched only 2.9 ± 1.8 minutes, t(58) = 12.35, p < .001. This difference reflects an extremely large effect size (d = 3.18), indicating that the two groups differed by more than three standard deviations in viewing time. While such an effect size is relatively uncommon in educational research, it is consistent with the stark behavioral divergence observed: many participants in the summary group appeared to bypass the video almost entirely.

Variable                     Control (n = 30)   Summary (n = 30)   Mean Difference
Watch Time (minutes)         9.1 (SD = 1.2)     2.9 (SD = 1.8)     -6.2
Comprehension Score (0–10)   8.2 (SD = 1.1)     7.6 (SD = 1.4)     -0.6
Intrinsic Motivation (1–7)   5.8 (SD = 0.7)     5.2 (SD = 0.9)     -0.6

This disparity of over six minutes suggests that the AI-generated summary was not treated as a supplementary aid but rather as a substitute for the primary instructional content. This behavior indicates that learners may have prioritized efficiency or convenience over a thorough learning process. Engagement behavior differed not only in amount but also in kind. A large proportion of summary-group participants (42%) skipped ahead during video playback, while only 17% of control participants did so. Conversely, 35% of control participants replayed parts of the video, compared with only 20% in the summary condition. This pattern suggests reduced cognitive monitoring and less effortful re-engagement with complex segments. LMS data further revealed that 68% of summary-group participants opened the summary before beginning the video. Of those, 51% either stopped playback within the first two minutes or skipped directly to the end. This usage pattern reinforces the interpretation that the AI-generated summary was perceived as a replacement for the content rather than as a tool to support deeper learning.

Comprehension

In line with the hypothesized outcome of lower comprehension in the summary condition (H2), quiz scores were moderately higher for the control group (8.2 ± 1.1) than for the summary group (7.6 ± 1.4). Although the difference did not meet the conventional threshold for statistical significance, t(58) = 1.74, p = .087, it reflected a small-to-moderate effect size (d = 0.45), which may be meaningful given the sample size and variability. The pattern is especially notable considering the minimal video exposure in the summary group. Item-level analysis showed that the performance gap was more pronounced on higher-order items involving causal reasoning and conceptual transfer, where the summary group underperformed by 11–15%. Differences in factual recall items were smaller (< 5%). These patterns suggest that learners who largely skipped the video may have missed explanatory content essential for conceptual learning. In addition, the summary group showed greater variance in comprehension scores, likely reflecting diverse strategies (ranging from full summary reliance to partial video engagement) that led to inconsistent outcomes.

Outcome Variable       t(58)   p        Cohen's d
Watch Time             12.35   < .001   3.18
Comprehension Score    1.74    .087     0.45
Intrinsic Motivation   2.60    .012     0.67
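For equal group sizes, the Cohen's d values reported alongside the t statistics follow from the conversion d = t·√(2/n). The snippet below is a quick consistency check, not the authors' code; with n = 30 per group it reproduces the tabled effect sizes to within rounding.

```python
import math


def d_from_t(t, n_per_group):
    """Convert an independent-samples t (equal group sizes) to Cohen's d."""
    return t * math.sqrt(2 / n_per_group)


for label, t in [("Watch Time", 12.35),
                 ("Comprehension Score", 1.74),
                 ("Intrinsic Motivation", 2.60)]:
    print(f"{label}: d = {d_from_t(t, 30):.2f}")
```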

Intrinsic Motivation

Reflecting the hypothesized motivational decline (H3), students in the summary group scored lower on intrinsic motivation measures. The control group had a composite motivation score of 5.8 ± 0.7, while the summary group averaged 5.2 ± 0.9. This difference was statistically significant (t(58) = 2.60, p = .012) and reflected a moderate effect size (d = 0.67), indicating a meaningful decline in learners' internal drive to engage with the material. One possible explanation is that the presence of a pre-written summary may have reduced learners' sense of autonomy or personal responsibility in the learning process. The largest discrepancies were observed on the interest/enjoyment subscale, followed by value/usefulness, and effort/importance. These results suggest that students who bypassed the instructional video may have viewed the task as less engaging and less worthwhile, reinforcing the interpretation that the summary was perceived as a shortcut rather than a complementary learning aid.

Interaction Behavior             Control Group (%)   Summary Group (%)
Skipped Ahead in Video           17                  42
Replayed Segments                35                  20
Stopped Video Within 2 Minutes   7                   51
Opened Summary Before Video      n/a                 68

Discussion

Education is undergoing a gradual shift as AI begins to reshape how learning is delivered and assessed (Acut et al., 2025; Garcia et al., 2025; Xiao et al., 2025). In addition to content personalization and administrative task automation, the integration of AI-generated support into digital learning environments presents significant opportunities and pedagogical challenges. While tools such as video summaries are often designed to enhance learner efficiency, relatively little is known about their impact on behavior when introduced at the point of initial learning. Thus, this study examined the immediate effects of AI-generated video summaries on learner engagement, comprehension, and intrinsic motivation within a video-based learning context. By comparing learners who watched a full instructional video alone with those who also received a concise AI-generated summary, the study offers empirical insight into how summary access can influence learning behaviors and outcomes. Although the potential of AI tools to support efficiency in education is frequently emphasized in both research and practice (Crompton & Burke, 2023, 2024), the present findings suggest that these affordances may come with trade-offs. Specifically, the data indicates that when summaries are offered as optional support, they may act more as substitutes for engagement than as scaffolds for deeper learning. This section discusses the implications of these findings for educational design, learner support, and future AI integration.

Summary Access as a Gateway to Shallow Engagement

The contrast in watch time between the control group and the summary group (9.1 minutes vs. 2.9 minutes) represents more than a quantitative difference in media consumption. It reflects a qualitative shift in how learners approached the task. The summary group did not simply use the AI-generated text as a preparatory tool or review aid. Instead, many treated it as a substitute for the instructional video itself. This behavior exemplifies what scholars have termed the efficiency fallacy, where learners prioritize speed and minimal cognitive effort over depth of understanding. The interaction data further support this interpretation. Skip-ahead behavior, a potential indicator of selective attention and disengagement, was more than twice as frequent in the summary condition. Replay behavior, often associated with deeper cognitive processing and reflection (Odo, 2022), was nearly halved. These findings are consistent with cognitive load theory: while reducing extraneous load can benefit learning, stripping away the germane processing that drives schema construction can inhibit learning (Mayer, 2021). This dynamic may be especially pronounced in environments where learners perceive summaries as instructor-approved shortcuts. The presence of a video summary likely conveyed an implicit endorsement of its sufficiency. As studies on metacognitive strategy use have shown, learners often make poor judgments about the sufficiency of brief content representations, particularly when cognitive effort is not externally reinforced. Considering these findings, the key issue may not be the presence of video summaries per se, but the affordances they signal to learners (Garcia et al., 2025). Without structured guidance, AI-generated support may unintentionally encourage learners to bypass cognitively demanding tasks in favor of minimal-effort alternatives (Jin et al., 2023). In this way, summary access lowers both the perceived cost of the task and the expected depth of participation.

Summaries and the Undermining of Conceptual Learning

The observed gap in comprehension scores, though not statistically significant, suggests a potential difference with instructional consequences. This disparity highlights the latent limitation of AI-generated summaries for supporting the learning of specific concepts, especially when those summaries lack the explanatory detail, analogies, and elaborative cues embedded in well-designed instructional videos. This finding is consistent with multimedia learning theory, which emphasizes that knowledge construction is facilitated by the integration of verbal and visual information (Mayer, 2021). The summary omits critical visual scaffolds such as diagrams, demonstrations, and instructor gestures, all of which contribute to knowledge encoding and retrieval. By removing these features, the summary may inadvertently constrain learners to surface-level processing and rote memorization rather than fostering causal inference and abstract reasoning. Additionally, the summary's brevity introduces a compression effect. While this can aid in rapid review or preview, it also risks filtering out essential conceptual transitions. Research on instructional coherence has shown that omitting linking material (e.g., transitional explanations or conceptual bridges) can disrupt mental model formation and impede transfer (Mayer, 2009; Ritzhaupt et al., 2015). In the current study, the underperformance of the summary group on conceptual transfer items suggests that such coherence gaps were indeed consequential. The greater variability in comprehension within the summary group points to individual differences in how learners interpreted and acted upon the summary. Some may have treated it as an overview and still watched substantial portions of the video, while others relied exclusively on the summary. This variation highlights the need for metacognitive support when integrating AI-generated content (Xu et al., 2025). 
Without explicit cues or training in how to use summaries strategically, learners are left to infer their pedagogical value. Prior work on self-regulated learning suggests that novice learners, in particular, struggle to accurately assess what content warrants further exploration. Collectively, the present study suggests that while video summaries offer an appealing form of content compression, their use in learning can jeopardize the comprehension of complex material unless paired with appropriate design features that promote active engagement.

Motivational Costs of Reduced Cognitive Challenge

Motivation is both a precursor to and an outcome of meaningful learning. In this study, intrinsic motivation was significantly lower among learners with the summary access. These findings challenge the common assumption that convenience and reduced effort necessarily enhance learner satisfaction. Instead, the results align with theories of cognitive engagement and motivation that emphasize the role of challenge, effort, and achievement in generating satisfaction (Deci & Ryan, 2013). When learners bypass the full instructional experience in favor of a condensed summary, they may miss the sense of progression and mastery that accompanies sustained cognitive effort. This cognitive thinning of the learning experience appears to reduce not only knowledge acquisition but also affective investment. The decline in perceived value is especially noteworthy. It suggests that learners who shortcut the process may retrospectively judge the task as less worthwhile. This devaluation effect is consistent with expectancy-value theory, which posits that learners' beliefs about the utility of a task are shaped in part by the level of engagement it elicits (Eccles & Wigfield, 2002). Furthermore, the reduced motivation may be exacerbated by the passive nature of reading a summary. Unlike video engagement, which often demands sustained attention, processing of voice, and sometimes interaction with visual stimuli, summaries require comparatively little sensory or cognitive involvement. This aspect aligns with emerging concerns in AI-mediated education that convenience may erode the motivational scaffolding provided by more immersive modalities (Miranda et al., 2025). Of particular concern is the potential for first-time exposure to summary-based learning to set lasting patterns. Research in learning sciences indicates that early experiences with instructional formats can shape future engagement heuristics (National Research Council, 2000). 
If learners internalize the norm that summaries are "good enough," they may carry this strategy into other contexts, thereby undermining long-term academic habits and outcomes. The motivational costs observed here point to a deeper tension in the design of educational technology: the balance between efficiency and engagement. AI-generated tools must be evaluated not only for their informational accuracy or usability, but also for their capacity to sustain student curiosity and value.

Design Implications, Limitations, and Future Research

The integration of AI-generated content into educational settings is a pedagogical decision with far-reaching implications for how learners engage with, process, and value instructional material. As the findings of this study illustrate, tools designed for efficiency can reshape learning behaviors in unintended ways, particularly when introduced without accompanying guidance or structural scaffolding. From a design perspective, instructors, instructional designers, and educational institutions must recognize that the presence of optional AI-generated summaries may inadvertently signal that deeper engagement with instructional media is unnecessary. For teachers, the study underscores the importance of setting clear expectations about how support materials should be used. Instructional designers should consider embedding summaries within structured pathways that include prompts for reflection, checkpoints for comprehension, or requirements to engage with full media before accessing condensed versions. Schools that promote digital learning ecosystems should be cautious about offering summaries in isolation, as doing so may unintentionally undermine pedagogical goals.

Across all levels of educational design, careful orchestration of when and how summaries are introduced can mitigate their potential to displace more cognitively demanding learning activities. One direction is to present summaries as post-exposure reinforcement tools or embed them within retrieval practice tasks rather than positioning them as pre-content alternatives. Another is to integrate prompts that activate metacognitive monitoring, encouraging learners to question whether a summary alone is sufficient for mastery. An alternative approach is to replace text-based summaries with nanolearning versions of instructional videos. Nanolearning is "about providing digestible small learning units" that align with modern attention spans and just-in-time learning needs (Garcia et al., 2022). When designed intentionally, these ultra-short video segments can preserve explanatory depth while maintaining brevity. These short-form videos can offer a middle ground between full-length content and overly compressed text. Unlike generic AI-generated summaries, nanolearning modules can incorporate visual cues, instructor narration, and scaffolded examples to support cognitive engagement (Yousef et al., 2023). When embedded as modular components within a broader instructional sequence, they can promote efficiency without undermining conceptual clarity or learner motivation.

Despite the behavioral patterns observed, several limitations must be acknowledged. First, the study employed a single-session design, capturing only learners' immediate responses to summary access. This setup limits the ability to draw conclusions about long-term learning or behavior change. Future research should adopt longitudinal designs to examine how repeated exposure to AI-generated summaries shapes comprehension, motivation, and study habits over time. Additionally, although the sample size was sufficient to detect significant effects, the extremely large effect size observed for engagement should be interpreted with caution. Such large values, while statistically accurate for this dataset, are rare in educational research and may reflect context-specific factors such as task novelty, absence of grade incentives, or a particularly strong manipulation. Furthermore, the study was limited to one instructional topic and video format. It remains unclear whether similar patterns would emerge across disciplines, content types, or with learners of different ages and educational backgrounds. The AI-generated summary used in this study was purely textual; future studies should explore whether multimodal summaries (e.g., video abstracts, narrated slides, or animated recaps) produce different outcomes, particularly regarding cognitive engagement and transfer.

Building on these limitations, several avenues for future research emerge. First, studies should explore how learners interpret the presence of AI-generated support (whether as trusted substitutes, optional tools, or signals of reduced task importance) and how these interpretations shape behavior. Second, future work should examine the impact of scaffolded summary use, in which learners are explicitly instructed on how and when to use AI-generated summaries effectively. Research on this topic could include testing metacognitive prompts, usage guidelines, or system-enforced sequencing (e.g., requiring full video engagement before summary access). Additionally, researchers should investigate the effects of summary quality, including the level of detail, inclusion of conceptual scaffolds, and alignment with instructional objectives, on learner outcomes. Finally, as AI-generated content becomes more pervasive in educational platforms, researchers must ask what learners gain and might lose from such tools. Efficiency and accessibility are vital goals, but they should not come at the cost of sustained engagement, intrinsic motivation, and conceptual understanding. The next generation of learning design must consequently strike a careful balance between technological affordances and pedagogical integrity.

Conclusion

Watching videos has become a central pedagogical strategy in contemporary education. As VBL continues to expand through online platforms and AI-enhanced tools, understanding how learners interact with and respond to video content is increasingly essential. This study examined the impact of AI-generated summaries on learner engagement, comprehension, and intrinsic motivation within a video-based instructional context. While summaries are often intended to support efficiency and accessibility, the findings indicate that when offered without explicit guidance, they may unintentionally encourage shallow engagement, weaken conceptual learning, and undermine motivational investment. Learners with access to AI-generated summaries watched significantly less of the instructional video, performed slightly worse on higher-order comprehension items, and reported lower interest and perceived value in the learning experience. These outcomes underscore the need for thoughtful integration of AI-generated support that preserves cognitive challenge and promotes sustained interaction with core content. As educators and designers increasingly incorporate AI into digital learning environments, careful attention must be given not only to what these tools provide, but also to how they shape learners' perceptions, behaviors, and long-term habits of engagement.