Artificial intelligence-enhanced electrocardiography for acute myocardial infarction detection: a systematic review
Highlight box
Key findings
• In this systematic review of 88 studies, artificial intelligence (AI)-enhanced electrocardiogram (ECG) models showed promising diagnostic performance for acute myocardial infarction, including more subtle ischemic presentations such as non-ST-elevation and occlusion myocardial infarction.
• However, most studies were retrospective, performance reporting was highly heterogeneous, and only a minority included external validation, limiting confidence in real-world clinical readiness.
What is known and what is new?
• AI-based ECG analysis has emerged as a potentially valuable tool for improving early detection of acute myocardial infarction, particularly where conventional ECG interpretation may miss subtle ischemic patterns.
• This manuscript provides an updated and structured synthesis of recent evidence, highlighting not only reported diagnostic performance but also critical gaps in validation quality, dataset representativeness, and implementation readiness that may not be apparent from accuracy metrics alone.
What is the implication, and what should change now?
• Future studies should move beyond high retrospective performance and prioritize prospective multicenter external validation, standardized reference standards, transparent reporting, and evaluation of patient-centered and workflow-related outcomes.
• Clinical adoption of AI-enhanced ECG systems should be guided not only by technical accuracy but also by evidence of generalizability, reproducibility, and real-world impact.
Introduction
Background
Acute myocardial infarction (AMI) remains a major global cause of morbidity and mortality, underscoring the importance of timely diagnosis and treatment for optimal patient outcomes. Despite advances in diagnostic modalities such as echocardiography, high-sensitivity troponin assays, and coronary computed tomography angiography, the 12-lead electrocardiogram (ECG) remains the cornerstone of initial AMI evaluation and management owing to its low cost, rapid availability, and noninvasive nature. In particular, for ST-elevation myocardial infarction (STEMI), the ECG provides definitive evidence to prompt urgent reperfusion therapies—such as primary percutaneous coronary intervention (PCI) or fibrinolysis—as recommended by guidelines (1).
However, traditional pattern recognition-based ECG interpretation for AMI detection faces several challenges that make it difficult to distinguish AMI from other conditions. These include inter-observer variability among physicians with different levels of clinical experience and education, subtle early changes in non-ST-elevation myocardial infarction (NSTEMI) ECGs that may be missed by human readers, and confounding ECG changes such as left bundle branch block and left ventricular hypertrophy. Furthermore, approximately one-third of patients with occlusion myocardial infarction (OMI) do not present with apparent ST-elevation on their initial ECGs (2,3). Additionally, ECG criteria, including the Minnesota Code and Sgarbossa criteria, have demonstrated suboptimal sensitivity for detecting these subtle ischemic patterns (4,5).
Advances in artificial intelligence (AI) have impacted cardiology research, and studies have explored AI applications to detect arrhythmia and left ventricular dysfunction in both traditional and wearable-device ECG systems (6-8). While traditional machine learning (ML) algorithms have demonstrated promising results, the advent of deep learning has accelerated the development of AI-enhanced ECG models for various cardiac conditions, including AMI. Deep-learning models can automatically learn complex, non-linear spatiotemporal patterns from raw ECG signals, enabling the detection of subtle ischemic signatures that may elude conventional criteria. These AI systems, trained on large, diverse annotated ECG datasets, have demonstrated performance comparable to or exceeding that of expert cardiologists in identifying AMI, including NSTEMI and OMI. AI-enabled ECG analysis can, therefore, deliver near-real-time interpretations and may help streamline triage and catheterization-lab activation; however, evidence demonstrating that AI deployment alone shortens door-to-balloon (D2B) time or improves outcomes is still in its early stages, necessitating further controlled implementation studies.
Rationale and knowledge gap
Recent reviews have broadly surveyed AI applications in ECG across ischemic and structural heart disease, providing a valuable overview of the expanding landscape (9). However, because these broader syntheses necessarily cover multiple disease domains, a focused and AMI-specific evaluation is still needed to clarify methodological rigor, clinical validity, and implementation readiness for AMI detection use cases. Building on this foundation, our review specifically examines methodological quality within the AMI detection literature alongside the clinical positioning of AI-enhanced electrocardiography (AI-ECG) outputs.
Despite growing interest in AI-ECG for AMI detection, key gaps limit clinical interpretation and deployment. Substantial heterogeneity in study design, reference standards, and reported metrics complicates cross-study comparison and assessment of clinical utility. Moreover, the evidence base is dominated by retrospective analyses—often using public or curated datasets that may not reflect real-world case mix and data quality—and relatively few studies perform external validation, limiting generalizability across settings.
Objective
This systematic review aimed to critically synthesize current evidence on AI-enhanced ECG systems for AMI detection, focusing on the primary algorithmic frameworks, target outcomes, data sources, validation methods, and performance comparisons with physician interpretation. We evaluate the application of key algorithmic backbones [traditional ML, convolutional neural network (CNN), recurrent neural network (RNN), and hybrid approaches] in detecting outcomes such as STEMI, NSTEMI, and occlusion MI. We assess public datasets and institutional data, highlight the critical deficit in external validation, and examine performance metrics across heterogeneous study designs. By addressing methodological limitations and barriers to clinical implementation, we aim to guide clinicians and researchers in evaluating the readiness of these technologies for clinical adoption and to identify priority areas for future investigation. We present this article in accordance with the PRISMA reporting checklist (available at https://cdt.amegroups.com/article/view/10.21037/cdt-2025-aw-561/rc).
Methods
Search strategy
We conducted a comprehensive literature search to identify studies that evaluated AI applications for AMI detection using electrocardiography. Two major biomedical databases, PubMed and Embase, were systematically searched for relevant publications published between January 1, 2017 and August 31, 2025. Both databases were last searched on February 18, 2026. No additional sources (reference lists, trial registries, or websites) were searched. The search strategy utilized three primary keywords: “artificial intelligence”, “electrocardiography”, and “myocardial infarction” (Table 1). These terms were combined using Boolean operators and adapted for each database to accommodate differences in indexing systems (Table S1).
Table 1
| Items | Specification |
|---|---|
| Date of search | Initial search: September 30, 2025; final update: February 18, 2026 |
| Databases searched | PubMed and Embase |
| Search terms used (simple) | “Electrocardiography” AND “myocardial infarction” AND “artificial intelligence” |
| Timeframe | Studies published from 2017-01-01 to 2025-08-31 |
| Inclusion criteria | Original research articles evaluating AI or machine learning models for acute myocardial infarction detection or diagnosis using ECG data (human datasets; quantitative performance metrics reported) |
| Exclusion criteria | Review articles, editorials, commentaries, conference abstracts without full-text, animal/simulation studies, non-MI cardiac topics, non-English publications |
| Selection process | Duplicate records removed. Titles/abstracts screened followed by full-text review by two authors; disagreements resolved by discussion; data extracted using a standardized spreadsheet template |
| Additional considerations | Heterogeneity in tasks and evaluation metrics precluded meta-analysis; emphasis was placed on reporting quality and external validation |
AI, artificial intelligence; ECG, electrocardiography; MI, myocardial infarction.
Inclusion and exclusion criteria
Studies were included if they met the following criteria: (I) original research articles published in English; (II) evaluation of AI or ML models for detection, diagnosis, or classification of AMI using ECG data; (III) utilization of human data, including both publicly available datasets and institutional patient cohorts; (IV) studies developing new AI models or evaluating the performance of existing models; and (V) reporting of quantitative performance metrics such as sensitivity, specificity, accuracy, or area under the receiver operating characteristic curve (AUROC). We excluded (I) review articles, editorials, commentaries, or conference abstracts without full-text availability; (II) studies using exclusively animal or simulated data; (III) studies focusing solely on other cardiac conditions without specific evaluation of myocardial infarction; and (IV) articles published in languages other than English.
Study selection and data extraction
After removing duplicate records from the initial search, we screened titles and abstracts, followed by a full-text review of potentially relevant studies. Two reviewers independently screened all title/abstract records, independently assessed all full-text reports for eligibility, and extracted data using a standardized spreadsheet without automation tools; disagreements were resolved by consensus. From each included study, we systematically extracted the following information: (I) study characteristics including first author, publication year, and country or region; (II) study design (prospective or retrospective); (III) data source and sample size; (IV) AI model architecture and classification approach; (V) ECG lead configuration; (VI) input representation method; (VII) target outcome definitions; (VIII) reference standard used for diagnosis; (IX) performance metrics including AUROC, sensitivity, and specificity; and (X) whether external validation was performed. We classified validation setting and reference standards using prespecified frameworks when available. Outcomes sought included AMI-related targets as defined in each study; for each outcome, AUROC and, when available, sensitivity, specificity, and accuracy [and positive predictive value (PPV)/negative predictive value (NPV) if reported] were extracted. For studies reporting multiple models or validation cohorts, we extracted data for the primary or best-performing model as designated by the authors. When multiple performance estimates were available for the same outcome, externally validated estimates were prioritized when reported. Missing or unclear information was recorded as “NR” or “Unclear”. We did not contact study investigators for additional information.
Data synthesis and analysis
Due to the substantial heterogeneity in AI architecture, study populations, reference standards, and outcome definitions across included studies, we performed a structured narrative synthesis rather than quantitative meta-analysis. Studies were grouped and analyzed according to prespecified strata: AI model type (traditional ML, CNN, RNN, or hybrid approaches), target clinical application, data source, ECG lead configuration, validation approach, and reference standard. We operationalized external validation using a five-tier validation taxonomy: (I) internal validation only; (II) cross-database validation across distinct datasets; (III) temporal validation within the same institution; (IV) external validation using an independent cohort from a different institution; and (V) prospective multicenter validation. Reference standards were classified into four tiers: (I) coronary angiography; (II) biomarker-based diagnosis with clinical adjudication; (III) clinical diagnosis including International Classification of Diseases (ICD)-coded labels; and (IV) curated public dataset labels. We summarized the distribution of studies across these categories and documented the growth in publications over time.
All included studies contributed to the overall narrative synthesis. Stratified syntheses included only studies reporting the relevant outcome/metric, with generalizability-focused syntheses restricted to studies performing external validation, and OMI syntheses restricted to studies explicitly defining and evaluating OMI/acute coronary occlusion (ACO). Performance metrics were extracted as reported and standardized to a common format; confidence intervals were recorded when available. Missing or unclear information was coded as “NR” or “Unclear”. Results were presented in summary tables (study, model/dataset, validation setting, reference standard, and performance) and figures showing publication trends and performance distributions across key methodological strata.
Results
Overview of included studies
As shown in Figure 1, the database search and study selection process resulted in 88 eligible studies published between 2017 and 2025 that evaluated AI applications for AMI detection using electrocardiography. Figure 1 summarizes the number of records identified, duplicates removed, records screened, full-text reports assessed, and studies included. Tables S2 and S3 summarize the 88 studies on AI models for ECG-based myocardial infarction detection. Table S2 details model design, study settings, patient numbers, target outcomes, ECG lead configurations, and input representations. Table S3 presents performance metrics (AUROC, sensitivity, specificity) and validation rigor, including external validation status. These tables provide a valuable resource for comparing models, assessing validation quality, and guiding future research and clinical implementation.
Table 2
| Model family | No. of studies (n=88) | Typical architectures | Median AUROC (IQR)† | External validation (% of studies) | Key strengths | Key limitations |
|---|---|---|---|---|---|---|
| Traditional ML | 17 (19.3%) | SVM, RF, KNN, LR | 0.940 (0.887–0.977) | 23.5% (4/17) | Interpretable handcrafted features; often feasible on smaller datasets | Requires domain expertise for feature engineering; only 4/17 (23.5%) validated externally |
| CNN | 51 (58.0%) | 1D/2D CNN, ResNet, DenseNet | 0.948 (0.875–0.986) | 37.3% (19/51) | End-to-end learning from minimally preprocessed raw ECG data with strong internal performance | Prone to overfitting on small datasets; only 19/51 (37.3%) validated externally |
| RNN | 2 (2.3%) | LSTM, GRU | 0.903‡ | 50.0% (1/2) | Leverages temporal dynamics; potentially suitable for serial/continuous ECG streams (limited evidence in included studies) | Computationally expensive; limited evidence base (n=2); requires long sequential data |
| Hybrid | 18 (20.5%) | CNN + RNN, CFW | 0.954 (0.927–0.992) | 44.4% (8/18) | Can combine spatial and temporal representations; lead- and feature-weighted interpretability | Increased complexity without consistent performance gain; limited external validation (8/18, 44.4%); heterogeneous architectures limit comparison |
†, median AUROC (IQR) was calculated across studies using external validation AUROC estimates when reported; if a study did not perform external validation, its internal validation AUROC estimate was used. ‡, only 1 of 2 RNN studies reported AUROC; insufficient data for IQR calculation. AUROC, area under the receiver operating characteristic curve; CFW, CNN with Attention-based Feature Weighting; CNN, convolutional neural network; DenseNet, densely connected convolutional network; ECG, electrocardiography; GRU, gated recurrent unit; IQR, interquartile range; KNN, k-nearest neighbors; LR, logistic regression; LSTM, long short-term memory; ML, machine learning; ResNet, Residual Network; RF, random forest; RNN, recurrent neural network; SVM, support vector machine.
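The aggregation rule in the table footnote (prefer each study's externally validated AUROC estimate, fall back to its internal estimate) can be sketched as follows; the study records and values below are illustrative placeholders, not data from the review.

```python
from statistics import median, quantiles

# Illustrative study records (not actual review data): each study reports an
# internal-validation AUROC and, optionally, an external-validation AUROC.
studies = [
    {"internal_auroc": 0.96, "external_auroc": 0.91},
    {"internal_auroc": 0.99, "external_auroc": None},
    {"internal_auroc": 0.93, "external_auroc": 0.89},
    {"internal_auroc": 0.95, "external_auroc": None},
]

def preferred_auroc(study):
    """Use the external-validation estimate when reported; otherwise
    fall back to the internal-validation estimate."""
    ext = study.get("external_auroc")
    return ext if ext is not None else study["internal_auroc"]

values = sorted(preferred_auroc(s) for s in studies)
med = median(values)
q1, _, q3 = quantiles(values, n=4)  # quartile cut points for the IQR
print(f"median AUROC {med:.3f} (IQR {q1:.3f}-{q3:.3f})")
```

In practice this computation would be repeated within each model-family stratum (traditional ML, CNN, RNN, hybrid) to produce the per-family medians reported in Table 2.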
Publication volume increased markedly over time, from one study in 2017 to 22 studies in 2025 (Figure 2), with annual distribution by model architecture (Figure 2A) and validation setting (Figure 2B). Geographically, research was led by China (mainland) (n=22, 25%), followed by the United States (n=13, 15%) and South Korea (n=12, 14%), with additional contributions from Taiwan, India, Singapore, and multiple European nations.
Most studies used retrospective designs (n=80, 91%), while eight studies (9%) were prospective. Most investigations relied on 12-lead ECG configurations (n=64, 73%), though single-lead approaches designed for wearable devices were investigated in six studies, and intermediate configurations including 8-lead and 4-lead systems were evaluated in several investigations. Across model families, AUROC, sensitivity, and specificity analyses showed wide overlap with no consistent superiority; given substantial between-study heterogeneity and mixed validation types, direct statistical comparisons were not performed (Figure 3). To facilitate cross-study interpretation of AUROC in the setting of substantial heterogeneity, we stratified results by validation setting, reference standard, and data source, in addition to model family (Figure 4).
Data sources were nearly evenly divided between public datasets (n=50, 57%) and institutional patient cohorts (n=43, 49%); the categories overlap because some studies used both. The Physikalisch-Technische Bundesanstalt (PTB) database and its larger variant, the Physikalisch-Technische Bundesanstalt extra-large (PTB-XL) database, were the most commonly utilized public repositories (10). Critically, external validation was absent in most studies; only 33 investigations (37.5%) conducted any form of external testing beyond their training or internal validation datasets.
Traditional ML approaches
Early applications of AI to ECG-based AMI detection predominantly used traditional ML algorithms requiring explicit feature engineering. Among the 17 studies employing conventional ML methods, random forest (RF) emerged as the most common approach (n=5), followed by k-nearest neighbors (kNN, n=3), and others (n=9). These studies typically extracted handcrafted features from ECG signals, including time-domain characteristics such as QRS duration and ST-segment deviation, morphological descriptors of P waves, QRS complexes, and T waves, and frequency-domain parameters derived from wavelet or Fourier transforms, with specific features selected on the basis of domain knowledge.
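To make the three feature families above concrete, the toy sketch below computes one feature of each kind on a synthetic single-lead beat. The sampling rate, window lengths, and measurement points are illustrative assumptions, not parameters taken from any included study.

```python
import numpy as np

FS = 500  # assumed sampling rate in Hz (illustrative)

def handcrafted_features(beat, r_idx):
    """Toy examples of the handcrafted feature families: one time-domain,
    one morphological, and one frequency-domain feature, computed on a
    single-lead beat (1-D array) with an R peak at index r_idx."""
    # Time-domain: ST-segment level ~100 ms after the R peak, relative to
    # a PR-segment baseline estimated 80-40 ms before the R peak.
    baseline = beat[r_idx - int(0.08 * FS): r_idx - int(0.04 * FS)].mean()
    st_dev = beat[r_idx + int(0.10 * FS)] - baseline
    # Morphology: R-wave amplitude relative to the same baseline.
    r_amp = beat[r_idx] - baseline
    # Frequency-domain: dominant frequency from the FFT magnitude spectrum.
    spectrum = np.abs(np.fft.rfft(beat - beat.mean()))
    dom_freq = np.fft.rfftfreq(beat.size, d=1 / FS)[spectrum.argmax()]
    return {"st_deviation": st_dev, "r_amplitude": r_amp,
            "dominant_freq_hz": dom_freq}

# Synthetic 1-second beat: a 1 Hz sine plus a spike standing in for the R wave.
t = np.arange(FS) / FS
beat = 0.1 * np.sin(2 * np.pi * 1.0 * t)
beat[250] += 1.0
print(handcrafted_features(beat, r_idx=250))
```

A feature vector of this kind would then be passed to a classifier such as an RF or kNN model, which is exactly the manual feature-engineering step that deep learning later made unnecessary.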
Kora et al. demonstrated the feasibility of using firefly particle swarm optimization for feature selection combined with large margin nearest neighbor classification, achieving 100% sensitivity and 98.7% specificity on a small public dataset of 44 patients (11). Similarly, Sopic et al. developed an event-driven hierarchical RF classifier for single-lead wearable ECG analysis, reporting 83.3% accuracy (12). While these traditional ML approaches offered interpretability and feasibility on smaller datasets, their reliance on manual feature engineering and limited scalability to larger, heterogeneous datasets motivated the transition toward deep-learning methodologies.
Convolutional neural networks
CNNs were originally developed for visual pattern recognition and document analysis, later achieving major breakthroughs in large-scale image classification and being adapted for one- and two-dimensional (2D) biosignal analysis (13). CNNs emerged as the predominant AI architecture for AMI detection, employed in 51 of the 88 included studies (58%). Unlike traditional ML approaches, CNNs enable end-to-end learning directly from raw ECG signals without requiring manual feature extraction. The architectural designs varied considerably, ranging from relatively simple single-branch networks to complex multi-branch architectures that process each ECG lead independently before fusion.
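The contrast with handcrafted features can be illustrated with a single 1-D convolutional filter applied directly to a raw signal. In a trained CNN the kernel weights are learned from data; here the kernel is hand-set to an upstroke detector purely so the output is legible, and the "signal" is a synthetic step rather than an ECG.

```python
import numpy as np

# Stand-in raw single-lead trace: a flat baseline followed by a step.
signal = np.concatenate([np.zeros(10), np.ones(10)])
kernel = np.array([-1.0, 0.0, 1.0])  # responds to rising edges (illustrative)

# Cross-correlation of signal and kernel (convolution with reversed kernel),
# followed by a ReLU nonlinearity, as in one CNN layer.
feature_map = np.convolve(signal, kernel[::-1], mode="valid")
activation = np.maximum(feature_map, 0.0)
print(activation.argmax())  # strongest response sits at the step edge
```

A real model stacks many such layers, with kernels optimized by backpropagation rather than hand-chosen, which is what allows end-to-end learning from minimally preprocessed ECG waveforms.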
Liu et al. introduced a multiple-feature-branch CNN comprising 12 parallel convolutional branches, one for each ECG lead, combined with a global softmax layer for myocardial infarction classification and localization (14). This architecture achieved perfect classification performance (100.0% sensitivity, 99.9% specificity) on a retrospective dataset of 180 patients, though external validation was not performed. Subsequent investigations explored various CNN architectures, including Residual Network (ResNet), Visual Geometry Group Network (VGG), Densely Connected Convolutional Network (DenseNet), and custom designs optimized for ECG signal characteristics (15-17).
The performance of CNN-based models varied substantially across studies, with reported sensitivities and specificities ranging from 67.7% to 100.0% and 73.3% to 100.0%, respectively. This wide range likely reflects multiple factors, including differences in study population, task complexity, dataset characteristics, and sample sizes. However, direct comparison is complicated by heterogeneous reference standards (database labels, physician interpretation, and angiography results) and inconsistent evaluation metrics.
CNN-based models demonstrated strong internal validation performance, with the key advantage of learning end-to-end from minimally preprocessed raw ECG signals without manual feature extraction. However, critical limitations emerged regarding generalizability. Only 17 of 51 CNN studies (33.3%) conducted external validation. Several studies achieving perfect or near-perfect internal validation performance on small public datasets likely represent overfitting rather than robust generalizable learning, underscoring the need for rigorous external validation before clinical deployment.
RNNs
RNNs, designed to capture temporal dependencies in sequential data, were investigated in two dedicated studies for AMI detection (18,19). CNNs primarily learn spatial and morphological features from individual ECG segments or from 2D ECG representations (e.g., time-frequency maps or rasterized 12-lead ECG images). In contrast, RNN variants such as long short-term memory (LSTM) and gated recurrent unit (GRU) maintain hidden states that encode information from previous time steps. This architecture theoretically enables detection of dynamic ischemic evolution across the cardiac cycle and capture of subtle temporal patterns in ST-segment shifts or T-wave changes (20,21).
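The hidden-state mechanism described above can be illustrated with a minimal GRU cell in NumPy. The weights are random and untrained, so this is a sketch of the recurrence itself, not a working ischemia detector, and the input is a synthetic sequence standing in for an ECG lead.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalGRUCell:
    """Single GRU cell: the hidden state h carries information from earlier
    samples forward through time. Weights here are random/untrained."""
    def __init__(self, n_in, n_hidden):
        # One weight matrix per gate, acting on [input, previous hidden].
        shape = (n_hidden, n_in + n_hidden)
        self.Wz = rng.normal(0, 0.1, shape)  # update gate
        self.Wr = rng.normal(0, 0.1, shape)  # reset gate
        self.Wh = rng.normal(0, 0.1, shape)  # candidate state

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)            # how much of h to overwrite
        r = sigmoid(self.Wr @ xh)            # how much of h to expose
        h_cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_cand      # new hidden state

# Feed a toy single-lead sequence through the cell, one sample at a time.
cell = MinimalGRUCell(n_in=1, n_hidden=8)
h = np.zeros(8)
signal = np.sin(np.linspace(0, 2 * np.pi, 100))  # stand-in for an ECG trace
for sample in signal:
    h = cell.step(np.array([sample]), h)
# h now summarizes the whole sequence and could feed a classification head.
print(h.shape)
```

The gating structure (update and reset gates bounding each state transition) is what mitigates the vanishing-gradient problem of plain RNNs, as noted later in this section.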
Across the two RNN-focused studies, reported sensitivities ranged from 81.5% to 91.9%, specificities from 80.8% to 87.3%, and AUROC up to 0.918, comparable to CNN-based approaches. One study performed cross-database validation (trained on PTB-XL and tested on PTB), whereas the other lacked external validation, limiting conclusions regarding generalizability.
The limited adoption of pure RNN architectures for ECG-based AMI detection is notable given their theoretical advantages for sequential data. Possible explanations include computational considerations for real-time clinical applications, the relatively short ECG segments (typically 10 seconds) used in most studies, and well-documented training challenges such as vanishing gradients—though LSTM and GRU architectures largely address these issues (20,21).
Hybrid and advanced architectures
Eighteen studies (20% of all included studies) developed hybrid models that combined multiple architectural components or modeling approaches. Based on our analysis of model descriptions, these hybrids fell into three categories: CNN-RNN combinations (n=6); CNN with attention-based Feature Weighting (CFW), encompassing lead-wise, channel-wise, and hierarchical attention mechanisms to weight and integrate multi-lead ECG information (n=6); and ensemble methods (n=6), with some studies counted in more than one category due to the use of multiple hybrid strategies (22-39).
CNN-RNN hybrids combine convolutional layers for spatial and morphological feature extraction with recurrent components (LSTM or GRU) to capture temporal dependencies across ECG segments, thereby aiming to leverage both morphological and sequence dynamics (24,25,27,30,33,36). CFW models employ diverse attention strategies—including lead-wise weighting (15,25,32), channel-level dense attention (36), hierarchical attention networks (29), and self-attention mechanisms (26)—to selectively emphasize diagnostically significant ECG features across multiple leads while reducing noise. These complementary approaches enhance both interpretability and diagnostic performance by operating at different levels of the feature hierarchy (24,25,28,29,32). Ensemble approaches range from combinations of traditional ML classifiers to stacked pipelines in which deep-learning features feed secondary classifiers, as well as multi-stage systems that combine detection and classification modules (22,23,31,34,35,37).
These hybrid models theoretically offer advantages by integrating spatial feature extraction with temporal modeling or attention-based interpretability. However, they did not show consistent performance advantages despite the increased complexity. Reported sensitivities (71.8–99.1%) and specificities (76.0–100.0%) overlapped substantially with pure CNN ranges. External validation was conducted in only 4 of 18 studies (22%), including cross-database testing or independent cohorts (23,27,33,39). The heterogeneity of hybrid approaches and the absence of standardized comparative evaluations preclude definitive conclusions about whether architectural complexity yields clinically meaningful gains.
Target outcome definitions
Across the 88 included studies, target outcome definitions showed substantial heterogeneity; we identified 48 distinct formulations. For analytic clarity and clinical interpretability, we grouped these into four categories reflecting increasing diagnostic difficulty and therapeutic granularity: (I) simple binary detection; (II) STEMI identification; (III) NSTEMI detection and multi-class ACS classification; and (IV) OMI identification.
Binary MI detection
More than 50% of studies employed basic binary classification tasks distinguishing patients with MI from comparison groups. The clinical relevance and difficulty of these tasks varied considerably based on the composition of comparison groups. Studies comparing MI to healthy controls represented simplified scenarios with limited real-world applicability, as clinical practice requires distinguishing MI from other acute presentations rather than from asymptomatic individuals. More challenging and clinically realistic studies compared MI to non-MI chest pain patients, mixed cardiac presentations, or heterogeneous emergency department populations (40,41).
STEMI identification
Multiple studies (n=14, 16%) focused on STEMI detection, distinguishing STEMI from non-ST-elevation (non-STE) presentations—a composite category that includes MI without STE (e.g., NSTEMI and OMI without STE), unstable angina, other cardiac conditions, and normal findings. STEMI identification constitutes a clinically critical binary decision point that typically triggers immediate reperfusion therapy (primary PCI or fibrinolysis). AI models consistently demonstrate excellent diagnostic performance for STEMI, often matching or even exceeding human expert interpretation, owing to the distinct and easily learnable ST-elevation patterns on ECG (42). Performance was lower when distinguishing NSTEMI from STEMI, reflecting the inherent difficulty of detecting subtle or transient ischemic changes that lack clear ST-elevation morphology. Nevertheless, AI-based STEMI recognition is now a relatively mature and well-validated application of ECG analysis (43,44).
NSTEMI detection and multi-class ACS classification
Eight studies addressed NSTEMI detection, representing one of the most challenging tasks in ECG-based ischemia interpretation due to the subtle manifestations of non-ST-elevation ischemia (29,34,43,45-49). Three of these studies focused on the clinically important but diagnostically difficult distinction between NSTEMI and unstable angina (47-49).
Several investigations developed multi-class classification frameworks encompassing the full acute coronary syndrome (ACS) spectrum. These included three-way classification of STEMI, NSTEMI, and non-AMI; STEMI, NSTEMI, and healthy controls; or four-way classification encompassing unstable angina, NSTEMI, STEMI, and non-ACS (43,44,46). One advanced investigation predicted specific culprit coronary arteries specifically in patients with NSTEMI, where ECG localization is more challenging than in STEMI due to less prominent changes (34).
The recent emphasis on AI-based NSTEMI detection reflects growing recognition that AI may offer particular diagnostic value for identifying subtle ischemic signatures—including minor ST depressions, T-wave changes, or non-specific repolarization abnormalities—that are easily missed or misinterpreted by human readers, especially in the time-pressured emergency department environment.
OMI identification
The OMI paradigm emerged from the recognition that ACO can be missed when relying solely on STEMI ECG criteria. While ST-segment elevation criteria have enabled life-saving reperfusion pathways and appropriately remain central to guidelines and clinical systems, they are insufficient to identify all clinically important ACO presentations (50). Several STEMI-equivalent patterns may indicate ACO despite absent ST-elevation, and approximately 30% of NSTEMI patients may have angiographic ACO (3). However, these additional patterns are less intuitive than ST-elevation thresholds and may be under-recognized, particularly in resource-limited settings or by non-specialists. Broader adoption of an OMI-oriented framework would require coordinated updates in guidelines, education, and care pathways to support timely reperfusion decision-making beyond STEMI criteria (51).
The nine OMI-focused studies identified are summarized in Table S4. Notably, more than half were published in 2025, reflecting growing interest in and an ongoing paradigm shift toward an occlusion-centric approach.
Most studies sought to strengthen clinical validity by using angiographic confirmation as the reference, although the operational definition of OMI varied across studies. Some investigations further extended the task to artery-specific occlusions or to distinguishing OMI from non-occlusive ACS (34,52). Five studies reported external validation, including prospective multicenter validation. Although disease prevalence and operating thresholds were not consistently reported across studies, more than half provided PPV and NPV estimates, and several explicitly stated an intended clinical positioning (rule-in, rule-out, or dual-use triage).
While AI-enabled OMI triage may accelerate recognition of under-detected occlusion patterns, it may also increase false-positive catheter laboratory activations and procedural risks. Recent perspectives emphasize staged implementation strategies (e.g., pre-alert followed by confirmatory assessment using serial ECGs, clinical features, and biomarkers) and prospective monitoring of missed OMI, false activations, and time-to-reperfusion (51). Furthermore, if models are trained on imperfect proxy labels (e.g., STEMI-centric labels or administrative codes), AI may perpetuate existing biases and under-detect non-ST-elevation occlusions; therefore, occlusion-aligned reference standards, prespecified operating points, and rigorous external validation are essential for responsible deployment.
Public dataset utilization
Public ECG databases play a prominent role in AI-AMI research, with 50 studies (57%) utilizing publicly available data either exclusively or in combination with institutional datasets. PTB and PTB-XL were the most frequently employed repositories, offering advantages of open access, well-documented metadata, expert annotations, and cross-study comparability enabled by shared benchmarks (10).
However, reliance on public datasets introduces important limitations. First, these corpora often comprise carefully curated, high-quality recordings that may not fully represent the noise, artifact burden, device heterogeneity, and case-mix complexity typical of routine emergency and prehospital care, potentially limiting generalizability (53). Second, demographic and clinical characteristics in public datasets may differ from local patient populations, limiting geographic and demographic generalizability (54). Third, methodological issues such as record-wise rather than patient-wise data splits and inconsistent label quality can inflate model performance if not properly addressed (55).
Institutional data and real-world complexity
Studies using institutional patient data (n=43, 49%) provide complementary evidence on AI performance in real-world clinical environments. These cohorts typically include larger sample sizes, broader case-mixes, and routine-care ECGs with greater variability in signal quality, devices, and acquisition conditions (43,56). Among studies with external validation, several investigations document performance decline in external cohorts compared to internal validation, highlighting potential challenges in generalizing models trained on curated datasets to heterogeneous real-world settings (56,57). The extent to which dataset shifts, signal quality differences, or other factors contribute to this performance gap requires further investigation.
Institutional datasets also enable context-specific evaluations—including emergency department (ED) triage, prehospital emergency medical services (EMS), and intensive care unit (ICU) monitoring—which inform operational constraints such as processing latency, alert burden, and integration with existing clinical workflows and electronic health records. However, data-sharing restrictions related to privacy and governance often preclude open release, limiting independent replication, external validation, and head-to-head benchmarking across centers.
External validation deficit
Perhaps our most striking finding is the paucity of external validation in AI-AMI research. Only 33 of 88 studies (37.5%) evaluated model performance beyond the primary training and internal validation datasets. This deficit is concerning, given the well-documented propensity of ML models to overfit development data and to degrade in performance when applied to external populations (57).
Among studies that performed external validation, methodologies varied substantially. The most common approach was validation using independent hospital cohorts (n=19), followed by cross-database validation—typically training on one public repository (e.g., PTB) and testing on another (e.g., PTB-XL)—reported in eight studies (19,27,39,58-62). Validation settings that provide stronger evidence for real-world generalizability—temporal validation using post-development data within the same institution (n=3) and prospective multicenter validation across geographically distinct sites (n=3)—remained uncommon (49,56,57,63,64).
Internal and external AUROC values are summarized by validation category in Table 3. Among studies that reported both internal and external AUROC within the same study, external validation typically yielded modestly lower AUROC than internal validation (56,57). This performance gap underscores the necessity of external validation and indicates that internally derived metrics may substantially overestimate real-world clinical performance. Addressing this shortfall—through prospective, multicenter, and temporally separated validation cohorts with prespecified operating points and transparent reporting—is essential before widespread clinical implementation of AI-ECG systems for AMI detection.
Table 3
| Validation type | N [%] | Paired n† | Internal AUROC‡, median (IQR) | External AUROC‡, median (IQR) | ∆ AUROC†, median (range) |
|---|---|---|---|---|---|
| Internal | 55 [62] | NA | 0.948 (0.890–0.983) | NA | NA |
| Cross-database | 8 [9] | 3 | 0.921 (0.879–0.951) | 0.906 (0.803–0.944) | −0.053 (−0.128 to −0.023) |
| Temporal | 3 [3] | 0 | 0.996 (n=1) | 0.840 (0.827–0.909) | NC |
| External cohort | 19 [22] | 4 | 0.910 (0.910–0.980) | 0.970 (0.885–0.993) | −0.025 (−0.100 to −0.001) |
| Prospective MC | 3 [3] | 0 | NR | 0.789 (0.744–0.834) | NC |
†, paired n indicates the number of studies reporting both internal and external AUROC within the same study. ∆AUROC was calculated as the within-study difference (external minus internal) and summarized as median (range) among paired studies only. NC indicates not calculable due to insufficient paired data. ‡, internal and external AUROC summaries [median (IQR)] include all studies reporting the respective metric within each validation category and may therefore be unpaired (i.e., internal and external summaries can be based on different sets of studies). Where only a single study reported AUROC, the value is shown as n=1. AUROC, area under the receiver operating characteristic curve; IQR, interquartile range; MC, multicenter; NA, not applicable; NC, not calculable; NR, not reported.
Heterogeneity and fragility of reference standards
Among the 88 included studies, reference standards were distributed as follows: coronary angiography (n=22, 25%), biomarker-based diagnosis with clinical adjudication (n=16, 18%), clinical diagnosis including ICD codes (n=5, 6%), and curated public dataset labels (n=45, 51%), reflecting heavy reliance on public datasets.
Reference standard heterogeneity is not simply a performance issue—it fundamentally defines the ground truth a model learns to reproduce. As discussed regarding the OMI paradigm, reliance on clinical diagnosis alone (including ICD-based labeling) can introduce systematic mislabeling when true occlusion events are recorded as NSTEMI or non-MI because occlusion was not recognized or angiography was not performed. Label noise from less sensitive diagnostic criteria or heterogeneous labeling propagates into model training and evaluation. Unlike physiologic data sources (angiography, serial biomarkers, ECG waveforms), administrative or curated labels are difficult to verify. Therefore, performance from curated public dataset labels or ICD-based diagnoses should not be interpreted as equivalent to angiography-adjudicated evidence, particularly for occlusion-aligned tasks.
Performance metrics and heterogeneity
Performance reporting across the 88 included studies demonstrated substantial heterogeneity in both metrics and evaluation methodology. While sensitivity and specificity were reported in approximately 75% of studies, AUROC was available in only 55% of investigations, constraining comparative assessment. This inconsistent reporting reflects the lack of standardized evaluation frameworks for AI-ECG systems and complicates efforts to identify superior approaches or optimal architectures.
Reported performance varied widely within each architectural category. Traditional ML approaches demonstrated sensitivities ranging from 73.0% to 100.0% and specificities from 77.0% to 100.0%. CNN-based models showed similar ranges, with sensitivities of 69.7–100.0% and specificities of 73.3–100.0%. Pure RNN architectures, explored in only two studies, reported sensitivities of 81.5–91.9% and specificities of 80.8–87.3%. Hybrid models achieved sensitivities ranging from 71.8% to 99.1% and specificities from 76.0% to 100.0%. These overlapping performance ranges suggest that architectural sophistication does not consistently translate to superior diagnostic accuracy. However, definitive conclusions are limited by the heterogeneity in study populations, tasks, and reference standards.
Importantly, variability in reported performance likely reflects multiple confounding factors beyond architectural differences and raises concern for potential performance inflation in studies reporting near-perfect results. Task complexity is shaped not only by label granularity (binary versus multi-class) but also by case-mix composition. For example, binary classification of MI versus healthy controls, or STEMI versus normal ECGs—often derived from curated public databases with well-separated class boundaries—differs fundamentally from detecting OMI within an emergency department cohort of patients with undifferentiated chest pain, despite both being two-class formulations. Studies evaluating the former type of task often reported higher performance metrics. Overfitting risk may be further amplified when high-capacity deep models are applied to relatively small and curated datasets.
A key contributor to inflated performance in AI-ECG studies is patient-level data leakage, which occurs when record-level splitting allows multiple ECGs from the same patient to be distributed across training and test sets, yielding optimistically biased AUROC estimates. As summarized in Table 4, among studies with best AUROC ≥0.99, 50% (6/12) reported no external validation, 75% (9/12) did not report the split unit, and only 25% (3/12) explicitly used patient-level splitting. Accordingly, performance estimates from record-level or unspecified splitting may be optimistic and should be interpreted cautiously, particularly when compared with patient-level splitting or independent external/temporal validation.
Table 4
| Best AUROC | Studies | Public dataset | No external validation | Patient-level split | Record-level split | Split not reported | Binary task |
|---|---|---|---|---|---|---|---|
| ≥0.99 | 12/88 [14] | 5/12 [42] | 6/12 [50] | 3/12 [25] | 0/12 [0] | 9/12 [75] | 7/12 [58] |
| 0.95–0.98 | 13/88 [15] | 8/13 [62] | 7/13 [54] | 5/13 [38] | 1/13 [8] | 7/13 [54] | 11/13 [85] |
| 0.90–0.94 | 12/88 [14] | 5/12 [42] | 6/12 [50] | 7/12 [58] | 0/12 [0] | 5/12 [42] | 12/12 [100] |
| <0.90 | 16/88 [18] | 2/16 [12] | 7/16 [44] | 7/16 [44] | 0/16 [0] | 9/16 [56] | 14/16 [88] |
| Not reported | 35/88 [40] | 29/35 [83] | 29/35 [83] | 15/35 [43] | 7/35 [20] | 13/35 [37] | 19/35 [54] |
| Overall | 88/88 [100] | 49/88 [56] | 55/88 [62] | 37/88 [42] | 8/88 [9] | 43/88 [49] | 63/88 [72] |
Data are presented as n/N [%]. Best AUROC indicates the highest AUROC reported in each study across the target tasks evaluated in that study. For rows with a reported AUROC, percentages in the characteristic columns are row-wise (i.e., within each AUROC stratum). No external validation indicates internal validation only (i.e., single-dataset train/test split without independent external cohort evaluation). Split unit was categorized as patient-level when the training/validation/test split was explicitly described at the patient level; record-level when splitting was explicitly performed at the ECG/record level; and not reported when the split unit could not be determined from the manuscript. Binary task indicates that the primary prediction target was formulated as a two-class classification task (label granularity), and does not necessarily imply lower clinical complexity (e.g., binary OMI detection involves high clinical complexity despite two-class formulation). AUROC, area under the receiver operating characteristic curve; ECG, electrocardiogram; n/N, number of studies with characteristic/total number of studies in stratum; OMI, occlusion myocardial infarction.
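The patient-level splitting discussed above can be illustrated with a minimal, library-free sketch: records are grouped by patient identifier before partitioning, so that no patient contributes ECGs to both sets. The record structure (`patient_id`, `ecg_id`) is a hypothetical example, not drawn from any included study.

```python
import random
from collections import defaultdict

def patient_level_split(records, test_fraction=0.2, seed=42):
    """Split ECG records so that all recordings from a given patient
    land entirely in either the training or the test partition."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    patients = sorted(by_patient)              # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [r for p, recs in by_patient.items() if p not in test_patients for r in recs]
    test = [r for p, recs in by_patient.items() if p in test_patients for r in recs]
    return train, test

# Hypothetical cohort: 10 patients, each contributing 3 ECGs.
records = [{"patient_id": p, "ecg_id": f"{p}-{i}"} for p in range(10) for i in range(3)]
train, test = patient_level_split(records)
# No patient appears on both sides of the split, eliminating this leakage pathway.
assert {r["patient_id"] for r in train}.isdisjoint({r["patient_id"] for r in test})
```

A record-level split of the same data would, by contrast, almost certainly place ECGs from the same patient in both partitions, inflating apparent test performance.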
Reference standards also varied markedly, from database labels of uncertain provenance to rigorous adjudication using troponin levels and angiographic confirmation. Collectively, these methodological disparities preclude direct cross-study comparison and highlight the need for standardized evaluation protocols.
Standardized evaluation frameworks should incorporate patient-level splitting with clear reporting of the split unit, independent external or temporally separated validation cohorts, and (where feasible) prospective multicenter evaluation. In addition to discrimination metrics, transparent reporting should include prespecified operating points aligned with intended clinical use (e.g., rule-in vs rule-out thresholds), calibration assessment, and decision-curve analysis to support clinically meaningful interpretation and implementation.
Comparison with physician performance
Multiple studies incorporated direct comparison between AI system performance and human physician interpretation, though methodological approaches varied substantially. For STEMI identification, AI performance was generally comparable to that of experienced cardiologists while exceeding that of emergency physicians and trainees with limited ECG expertise (42). However, these comparisons must be interpreted cautiously, as inter-observer variability among physicians and differences in case selection can substantially affect results.
AI systems demonstrate particular value in detecting subtle ischemic patterns, including NSTEMI and STEMI-equivalent presentations lacking overt ST-elevation (44). Studies reported higher sensitivity for AI compared to human readers in identifying non-ST-elevation ischemia, a clinically meaningful finding given the well-documented difficulty of NSTEMI diagnosis (44,64). Deep-learning models appear capable of recognizing subtle ST-segment depressions, T-wave abnormalities, and non-specific repolarization changes that may escape human detection.
However, physician-AI comparison studies face important limitations. Most used small physician samples without adequate stratification by experience level and evaluated performance under artificial conditions that lacked the time pressure and cognitive load of real emergency care (29). Additionally, the choice of reference standard—whether physician consensus, biomarkers, or angiography—fundamentally affects what “accuracy” means in these comparisons.
Several investigations explored AI-assisted interpretation, where physicians reviewed ECGs with AI predictions as decision support. These studies showed improved diagnostic accuracy compared to unassisted interpretation, suggesting value for AI as a clinical aid rather than an autonomous diagnostic tool (63,65). This collaborative approach may leverage complementary strengths: the pattern recognition of AI and the clinical reasoning and contextual judgment of physicians.
Discussion
Clinical implications of current studies
Most included studies framed AI-ECG as diagnostic classification with limited discussion of clinical actionability. When linked to time-critical workflows (e.g., prehospital STEMI triage, catheter laboratory activation), AI-ECG may provide actionable value, but this depends on specifying the downstream decision and selecting an operating point aligned with the intended role. Some systems functioned as rule-out tools (prioritizing high sensitivity/NPV), whereas others served as rule-in tools (prioritizing high specificity/PPV). In addition, treatment relevance may be better reflected by endpoints closer to action (e.g., ACO/OMI as a proxy for emergent reperfusion need) than diagnostic labels alone.
PPV and NPV are prevalence-dependent: the same sensitivity/specificity yields different post-test probabilities across deployment contexts. As a hypothetical example, with 90% sensitivity and 90% specificity, PPV is ~32% at 5% prevalence, whereas at 15% prevalence PPV is ~61% and NPV ~98%. Thus, AUROC alone cannot capture clinical utility without specifying deployment context and operating point. Future evaluations may benefit from explicitly stating the intended clinical role (rule-in vs. rule-out) and prespecified thresholds, with PPV/NPV reported at representative prevalences.
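The prevalence dependence described above follows directly from Bayes' rule and can be reproduced with a few lines; the 90%/90% operating point and the 5% and 15% prevalences are the hypothetical values from the example, not estimates from any included study.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Post-test probabilities (PPV, NPV) from Bayes' rule at a given prevalence."""
    tp = sensitivity * prevalence              # true-positive mass
    fp = (1 - specificity) * (1 - prevalence)  # false-positive mass
    tn = specificity * (1 - prevalence)        # true-negative mass
    fn = (1 - sensitivity) * prevalence        # false-negative mass
    return tp / (tp + fp), tn / (tn + fn)

# Same 90% sensitivity / 90% specificity, two deployment prevalences:
ppv_low, npv_low = ppv_npv(0.90, 0.90, 0.05)    # PPV ~0.32, NPV ~0.99
ppv_high, npv_high = ppv_npv(0.90, 0.90, 0.15)  # PPV ~0.61, NPV ~0.98
```

The identical sensitivity/specificity pair thus yields post-test probabilities that differ by nearly a factor of two across plausible deployment settings, which is why an operating point must be interpreted together with its intended context.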
Other important considerations
Several commercial or regulatory-cleared AI-ECG systems for AMI detection have been evaluated in real-world cohorts, demonstrating encouraging performance in their respective contexts (2,49,60,63). Given variability in thresholds, reference standards, and study designs, along with the modest number of independent evaluations, firm conclusions about equivalence to research models are premature; further independent and head-to-head assessments will be informative.
Several studies (n=15) incorporated anatomical localization to classify infarct territory, offering procedural value for catheterization planning. Anterior MI generally achieved higher accuracy than inferior or lateral infarctions, likely reflecting more distinctive precordial-lead patterns (42). However, per-territory performance reporting was inconsistent, and external validation was limited.
The predominance of 12-lead ECG systems in clinical practice was mirrored in the literature, with 73% (n=64) of studies employing standard 12-lead configurations. This focus aligns with diagnostic guidelines and enables detection of MI across multiple anatomical territories. Nevertheless, several investigations explored reduced-lead configurations, motivated by wearable-device development and resource-limited environments where full 12-lead acquisition may be impractical (45,53,64). Six studies explored single-lead configurations for wearable devices, achieving sensitivities of 81–95% and specificities of 79–93% (25,64). Most deep-learning models processed minimally preprocessed raw signals, though alternative approaches included spectrograms and wavelet coefficients (26,65-67). Systematic head-to-head comparisons on identical datasets are needed to determine optimal configurations for specific diagnostic tasks.
Strengths
We synthesized the rapidly expanding literature on AI-enhanced ECG systems for AMI detection, identifying 88 studies over 8 years. A structured search of major biomedical databases ensured broad capture of available evidence. By extracting detailed study-level characteristics—including AI architectures, validation strategies, data sources, and performance metrics—we enabled a multidimensional appraisal of the field’s current state.
A central strength is our critical appraisal of methodological quality, in which we quantify the external validation shortfall (33/88 studies; 37.5%) and classify validation approaches (cross-database, independent cohorts, temporal, multicenter), which collectively highlight barriers to clinical translation. The review’s temporal scope traces the evolution from traditional ML to deep-learning and hybrid approaches, documenting trends in methodologies, applications, and research rigor. Geographic diversity among included studies supports cautious generalizability across healthcare settings and patient populations.
To facilitate comparability and clinical interpretability, future studies should adopt harmonized reporting practices—prespecified, clinically justified operating points; AUROC with confidence intervals; threshold-dependent metrics (e.g., sensitivity/specificity and PPV/NPV at the prespecified threshold and study prevalence); calibration (calibration plots, calibration-in-the-large and slope, Brier score); and decision-curve analysis (net benefit)—and should report results stratified by case mix, site, time period (temporal drift), and device/vendor when applicable.
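The decision-curve analysis recommended above reduces to a simple calculation at each prespecified threshold; a minimal sketch of the standard Vickers-Elkin net-benefit formulation is shown below, with entirely hypothetical labels and model probabilities for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at probability threshold p_t:
    NB(p_t) = TP/N - (FP/N) * p_t / (1 - p_t)."""
    n = len(y_true)
    decisions = [p >= threshold for p in y_prob]
    tp = sum(1 for y, d in zip(y_true, decisions) if d and y == 1)
    fp = sum(1 for y, d in zip(y_true, decisions) if d and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Hypothetical labels and predicted probabilities (not study data).
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.90, 0.40, 0.30, 0.05, 0.20, 0.70, 0.10, 0.60]
nb = net_benefit(y_true, y_prob, threshold=0.25)
```

Reporting net benefit at clinically justified thresholds (e.g., a low rule-out threshold versus a higher rule-in threshold) conveys whether acting on the model beats treat-all or treat-none strategies, which AUROC alone cannot.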
Limitations
Our review has some limitations. Formal risk-of-bias assessment tools and meta-analysis were not applied due to the substantial methodological and clinical heterogeneity across included studies; we therefore present the findings as a structured narrative synthesis with descriptive summaries rather than pooled estimates or formal evidence grading. Because we searched a limited set of databases and included only studies with available full-text articles, coverage may be incomplete. Given substantial heterogeneity in tasks, reference standards, and evaluation metrics, quantitative pooling was not feasible, limiting inferences about comparative effectiveness. We also relied on study-reported metrics without independent verification or standardized re-analysis; variation in operating-point selection, statistical reporting, and reference standards (e.g., database labels vs. clinician adjudication vs. biomarker/angiography) complicates cross-study comparisons. Because we extracted the best-reported (often maximum) performance estimates from each study to harmonize heterogeneous reporting, our synthesis may be subject to optimism bias and may overestimate performance under routine clinical conditions. The English-language restriction may have introduced language bias, and publication bias likely inflates apparent performance by underrepresenting null findings. The rapid cadence of AI development risks both obsolescence of older methods and under-capture of very recent advances that have not yet been peer-reviewed, challenging currency and completeness.
Future investigations should adhere to emerging reporting standards—Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-Artificial Intelligence (TRIPOD-AI) for prediction modeling and Consolidated Standards of Reporting Trials-Artificial Intelligence (CONSORT-AI)/Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence (DECIDE-AI) for trials and early evaluations—and should prioritize prospective, multicenter, temporally separated external validation under transparent, preregistered protocols. Preregistration [e.g., ClinicalTrials.gov or Open Science Framework (OSF)], open methods/code, and model cards where feasible, and the use of shared benchmark cohorts with clearly defined reference standards would materially improve reproducibility and enable objective benchmarking.
Finally, our review emphasized technical performance and validation, with limited coverage of patient-centered outcomes, equity and access, workflow/throughput effects, and long-term clinical outcomes. These domains warrant dedicated prospective and implementation-science studies as AI-ECG systems move toward routine clinical use.
Conclusions
AI-enhanced ECG for AMI detection shows promising technical performance, with particular value for detecting subtle ischemic patterns in NSTEMI and OMI. However, critical gaps impede clinical deployment: only 37.5% of studies performed external validation. More complex architectures did not demonstrate superior performance compared to simpler models, though this finding should be interpreted cautiously given substantial heterogeneity in datasets, tasks, and evaluation methods. Future research must prioritize prospective multicenter validation, standardized evaluation frameworks, and implementation studies examining clinical outcomes. Technical feasibility is established; clinical impact now depends on validation rigor and pragmatic deployment.
Acknowledgments
We acknowledge the contributions of all researchers and clinicians working to advance AI-enhanced ECG analysis for improved patient care.
Footnote
Reporting Checklist: The authors have completed the PRISMA reporting checklist. Available at https://cdt.amegroups.com/article/view/10.21037/cdt-2025-aw-561/rc
Peer Review File: Available at https://cdt.amegroups.com/article/view/10.21037/cdt-2025-aw-561/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://cdt.amegroups.com/article/view/10.21037/cdt-2025-aw-561/coif). H.M.Y. serves as an unpaid editorial board member of Cardiovascular Diagnosis and Therapy from July 2025 to June 2027. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Byrne RA, Rossello X, Coughlan JJ, et al. 2023 ESC Guidelines for the management of acute coronary syndromes. Eur Heart J 2023;44:3720-826. [Crossref] [PubMed]
- Choi J, Kim J, Spaccarotella C, et al. Smartwatch ECG and artificial intelligence in detecting acute coronary syndrome compared to traditional 12-lead ECG. Int J Cardiol Heart Vasc 2025;56:101573. [Crossref] [PubMed]
- McLaren J, de Alencar JN, Aslanger EK, et al. From ST-Segment Elevation MI to Occlusion MI: The New Paradigm Shift in Acute Myocardial Infarction. JACC Adv 2024;3:101314. [Crossref] [PubMed]
- Tabas JA, Rodriguez RM, Seligman HK, et al. Electrocardiographic criteria for detecting acute myocardial infarction in patients with left bundle branch block: a meta-analysis. Ann Emerg Med 2008;52:329-336.e1. [Crossref] [PubMed]
- Daly MJ, Adgey JA, Harbinson MT. Improved detection of acute myocardial infarction in patients with chest pain and significant left main stem coronary stenosis. QJM 2012;105:127-35. [Crossref] [PubMed]
- Sato M, Kodera S, Setoguchi N, et al. Deep Learning Models for Predicting Left Heart Abnormalities From Single-Lead Electrocardiogram for the Development of Wearable Devices. Circ J 2023;88:146-56. [Crossref] [PubMed]
- Himmelreich JCL, Karregat EPM, Lucassen WAM, et al. Diagnostic Accuracy of a Smartphone-Operated, Single-Lead Electrocardiography Device for Detection of Rhythm and Conduction Abnormalities in Primary Care. Ann Fam Med 2019;17:403-11. [Crossref] [PubMed]
- Mannhart D, Lischer M, Knecht S, et al. Clinical Validation of 5 Direct-to-Consumer Wearable Smart Devices to Detect Atrial Fibrillation: BASEL Wearable Study. JACC Clin Electrophysiol 2023;9:232-42. [Crossref] [PubMed]
- Kim EJ, Gala D, Ayyad M, et al. AI Applications in Electrocardiography for Ischemic and Structural Heart Disease: A Review of the Current State. J Clin Med 2026;15:316. [Crossref] [PubMed]
- Wagner P, Strodthoff N, Bousseljot RD, et al. PTB-XL, a large publicly available electrocardiography dataset. Sci Data 2020;7:154. [Crossref] [PubMed]
- Kora P. ECG based Myocardial Infarction detection using Hybrid Firefly Algorithm. Comput Methods Programs Biomed 2017;152:141-8. [Crossref] [PubMed]
- Sopic D, Aminifar A, Aminifar A, et al. Real-Time Event-Driven Classification Technique for Early Detection and Prevention of Myocardial Infarction on Wearable Systems. IEEE Trans Biomed Circuits Syst 2018;12:982-92. [Crossref] [PubMed]
- Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998;86:2278-324. [Crossref]
- Liu W, Huang Q, Chang S, et al. Multiple-feature-branch convolutional neural network for myocardial infarction diagnosis using electrocardiogram. Biomedical Signal Processing and Control 2018;45:22-32. [Crossref]
- He Z, Yuan Z, An P, et al. MFB-LANN: A lightweight and updatable myocardial infarction diagnosis system based on convolutional neural networks and active learning. Comput Methods Programs Biomed 2021;210:106379. [Crossref] [PubMed]
- Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 3rd International Conference on Learning Representations (ICLR 2015): Computational and Biological Learning Society; 2015:1-14.
- Huang G, Liu Z, Maaten LVD, et al. editors. Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 21-26 July 2017.
- Martin H, Izquierdo W, Cabrerizo M, et al. Near real-time single-beat myocardial infarction detection from single-lead electrocardiogram using Long Short-Term Memory Neural Network. Biomedical Signal Processing and Control 2021;68:102683. [Crossref]
- Martin H, Morar U, Izquierdo W, et al. Real-time frequency-independent single-Lead and single-beat myocardial infarction detection. Artif Intell Med 2021;121:102179. [Crossref] [PubMed]
- Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9:1735-80. [Crossref] [PubMed]
- Cho K, van Merriënboer B, Gulcehre C, et al. editors. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Doha, Qatar: Association for Computational Linguistics. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014:1724-34.
- Wang HM, Zhao W, Jia DY, et al. Myocardial Infarction Detection Based on Multi-lead Ensemble Neural Network. Annu Int Conf IEEE Eng Med Biol Soc 2019;2019:2614-7. [Crossref] [PubMed]
- Al-Zaiti S, Besomi L, Bouzid Z, et al. Machine learning-based prediction of acute coronary syndrome using only the pre-hospital 12-lead electrocardiogram. Nat Commun 2020;11:3966. [Crossref] [PubMed]
- Liu W, Wang F, Huang Q, et al. MFB-CBRNN: A Hybrid Network for MI Detection Using 12-Lead ECGs. IEEE J Biomed Health Inform 2020;24:503-14. [Crossref] [PubMed]
- Fu L, Lu B, Nie B, et al. Hybrid Network with Attention Mechanism for Detection and Location of Myocardial Infarction Based on 12-Lead Electrocardiogram Signals. Sensors (Basel) 2020;20:1020. [Crossref] [PubMed]
- Han C, Song Y, Lim HS, et al. Automated Detection of Acute Myocardial Infarction Using Asynchronous Electrocardiogram Signals-Preview of Implementing Artificial Intelligence With Multichannel Electrocardiographs Obtained From Smartwatches: Retrospective Study. J Med Internet Res 2021;23:e31129. [Crossref] [PubMed]
- Tadesse GA, Javed H, Weldemariam K, et al. DeepMI: Deep multi-lead ECG fusion for identifying myocardial infarction and its occurrence-time. Artif Intell Med 2021;121:102192. [Crossref] [PubMed]
- Liu WC, Lin C, Lin CS, et al. An Artificial Intelligence-Based Alarm Strategy Facilitates Management of Acute Myocardial Infarction. J Pers Med 2021;11:1149. [Crossref] [PubMed]
- Liu WC, Lin CS, Tsai CS, et al. A deep learning algorithm for detecting acute myocardial infarction. EuroIntervention 2021;17:765-73. [Crossref] [PubMed]
- Wu L, Huang G, Yu X, et al. Deep Learning Networks Accurately Detect ST-Segment Elevation Myocardial Infarction and Culprit Vessel. Front Cardiovasc Med 2022;9:797207. [Crossref] [PubMed]
- Ashokan PL, Sathya SS, Satheesh S. CnnBoost: a multilevel explainable stacked ensemble framework for effective detection of Myocardial Infarction from 12-lead ECG images using a transformational approach. Health Inf Sci Syst 2025;13:40. [Crossref] [PubMed]
- Bulbul AAM, Awal MA, Aloteibi S, et al. CardIA-Net: An Explainable Deep Learning Model for MI Detection with ECG Lead Optimization. 2025.
- Chen KW, Wang YC, Liu MH, et al. Artificial intelligence-assisted remote detection of ST-elevation myocardial infarction using a mini-12-lead electrocardiogram device in prehospital ambulance care. Front Cardiovasc Med 2022;9:1001982. [Crossref] [PubMed]
- Goktekin MC, Gul E, Çakmak T, et al. Automatic Detection of Occluded Main Coronary Arteries of NSTEMI Patients with MI-MS ConvMixer + WSSE Without CAG. Diagnostics (Basel) 2025;15:347. [Crossref] [PubMed]
- Kim J, Shon B, Kim S, et al. ECG data analysis to determine ST-segment elevation myocardial infarction and infarction territory type: an integrative approach of artificial intelligence and clinical guidelines. Front Physiol 2024;15:1462847. [Crossref] [PubMed]
- Qiang Y, Dong X, Yang Y. Automatic detection and localisation of myocardial infarction using multi-channel dense attention neural network. Biomedical Signal Processing and Control 2024;89:105766. [Crossref]
- Qu J, Sun Q, Wu W, et al. An interpretable shapelets-based method for myocardial infarction detection using dynamic learning and deep learning. Physiol Meas 2024; [Crossref] [PubMed]
- Wang J, Guo X. Automated detection of myocardial infarction based on an improved state refinement module for LSTM/GRU. Artif Intell Med 2024;152:102865. [Crossref] [PubMed]
- Yang X, Jiang G, Zhu Z, et al. MDD2DG-IRA: Multivariate Degree Distribution to Dynamic Graph With Inter-Channel Relevance Attention Mechanism for Multi-Channel Myocardial Infarction ECG Analysis. IEEE J Biomed Health Inform 2025;29:5503-14. [Crossref] [PubMed]
- de Capretz PO, Björkelund A, Björk J, et al. Machine learning for early prediction of acute myocardial infarction or death in acute chest pain patients using electrocardiogram and blood tests at presentation. BMC Med Inform Decis Mak 2023;23:25. [Crossref] [PubMed]
- Ahuja Y, Sasankan P, Ronan R. AI-MI: A Deep Learning Model to Predict Actionable Acute Coronary Syndrome Using 12-Lead ECGs. Available online: 10.1101/2025.05.18.24319528
- Zhao Y, Xiong J, Hou Y, et al. Early detection of ST-segment elevated myocardial infarction by artificial intelligence with 12-lead electrocardiogram. Int J Cardiol 2020;317:223-30. [Crossref] [PubMed]
- Gustafsson S, Gedon D, Lampa E, et al. Development and validation of deep learning ECG-based prediction of myocardial infarction in emergency department patients. Sci Rep 2022;12:19615. [Crossref] [PubMed]
- Park BE, Shon B, Cho J, et al. Signal-Guided Multitask Learning for Myocardial Infarction Classification Using Images of Electrocardiogram. Cardiology 2025;150:347-56. [PubMed]
- Gadag V, Singh S, Khatri AH, et al. Improving myocardial infarction diagnosis with Siamese network-based ECG analysis. PLoS One 2025;20:e0313390. [Crossref] [PubMed]
- Hori K, Suzuki S, Hirota N, et al. Performance of convolutional neural network-enhanced electrocardiography in detecting acute coronary syndrome: focusing on subtypes and reduced leads. J Cardiol 2025;86:301-11. [Crossref] [PubMed]
- Qin L, Qi Q, Aikeliyaer A, et al. Machine learning algorithm can provide assistance for the diagnosis of non-ST-segment elevation myocardial infarction. Postgrad Med J 2023;99:442-54. [Crossref] [PubMed]
- Wu CC, Hsu WD, Islam MM, et al. An artificial intelligence approach to early predict non-ST-elevation myocardial infarction patients with chest pain. Comput Methods Programs Biomed 2019;173:109-17. [Crossref] [PubMed]
- Wu L, Zhou B, Liu D, et al. LASSO Regression-Based Diagnosis of Acute ST-Segment Elevation Myocardial Infarction (STEMI) on Electrocardiogram (ECG). J Clin Med 2022;11:5408. [Crossref] [PubMed]
- Ayyad M, Albandak M, Gala D, et al. Reevaluating STEMI: The Utility of the Occlusive Myocardial Infarction Classification to Enhance Management of Acute Coronary Syndromes. Curr Cardiol Rep 2025;27:75. [Crossref] [PubMed]
- Ayyad M, Albandak M, Allencherril J. Reclassifying myocardial infarction: from ST elevation to coronary occlusion. Eur Heart J 2026;47:1427-31. [Crossref] [PubMed]
- Herman R, Meyers HP, Smith SW, et al. International evaluation of an artificial intelligence-powered electrocardiogram model detecting acute coronary occlusion myocardial infarction. Eur Heart J Digit Health 2023;5:123-33. [Crossref] [PubMed]
- Subbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics 2020;21:345-52. [PubMed]
- Zhang A, Xing L, Zou J, et al. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng 2022;6:1330-45. [Crossref] [PubMed]
- Gupta A, Huerta E, Zhao Z, et al. Deep Learning for Cardiologist-Level Myocardial Infarction Detection in Electrocardiograms. Cham: Springer International Publishing; 2021.
- Cho Y, Kwon JM, Kim KH, et al. Artificial intelligence algorithm for detecting myocardial infarction using six-lead electrocardiography. Sci Rep 2020;10:20495. [Crossref] [PubMed]
- Krychtiuk KA, Sionis A. Development and external validation of a deep learning electrocardiogram model for risk stratification of coronary revascularization need in the emergency department. Eur Heart J Acute Cardiovasc Care 2025;14:240-2. [Crossref] [PubMed]
- Lee SH, Jeon KL, Lee YJ, et al. Development of Clinically Validated Artificial Intelligence Model for Detecting ST-segment Elevation Myocardial Infarction. Ann Emerg Med 2024;84:540-8. [Crossref] [PubMed]
- Liu W, Ji J, Chang S, et al. EvoMBN: Evolving Multi-Branch Networks on Myocardial Infarction Diagnosis Using 12-Lead Electrocardiograms. Biosensors (Basel) 2021;12:15. [Crossref] [PubMed]
- Chen Y, Ye J, Li Y, et al. A Multi-Domain Feature Fusion CNN for Myocardial Infarction Detection and Localization. Biosensors (Basel) 2025;15:392. [Crossref] [PubMed]
- Sheth KA, Upreti C, Prusty MR, et al. Time-frequency transformation integrated with a lightweight convolutional neural network for detection of myocardial infarction. BMC Med Imaging 2024;24:326. [Crossref] [PubMed]
- A PB, R M, E S. Optimized deep residual networks for early detection of myocardial infarction from ECG signals. BMC Cardiovasc Disord 2025;25:371. [Crossref] [PubMed]
- Lee MS, Shin TG, Lee Y, et al. Artificial intelligence applied to electrocardiogram to rule out acute myocardial infarction: the ROMIAE multicentre study. Eur Heart J 2025;46:1917-29. [Crossref] [PubMed]
- Al-Zaiti SS, Martin-Gill C, Zègre-Hemsey JK, et al. Machine learning for ECG diagnosis and risk stratification of occlusion myocardial infarction. Nat Med 2023;29:1804-13. [Crossref] [PubMed]
- Liu W, Zhang M, Zhang Y, et al. Real-Time Multilead Convolutional Neural Network for Myocardial Infarction Detection. IEEE J Biomed Health Inform 2018;22:1434-44. [Crossref] [PubMed]
- Strodthoff N, Strodthoff C. Detecting and interpreting myocardial infarction using fully convolutional neural networks. Physiol Meas 2019;40:015001. [Crossref] [PubMed]
- Wang Z, Qian L, Han C, et al. Application of multi-feature fusion and random forests to the automated detection of myocardial infarction. Cognitive Systems Research 2020;59:15-26. [Crossref]