isa-group/SimulatingDomainExpert

Supplementary Materials — Simulating the Domain Expert: An Empirical Evaluation of LLM-Based Interview Training for Business Process Modeling

Paper abstract. Business process discovery relies heavily on stakeholder interviews, yet practicing this skill in educational settings is constrained by limited access to domain experts. This paper empirically evaluates whether LLM-based conversational agents, configured to simulate non-technical domain experts, can effectively support the training of business process elicitation and BPMN modeling skills by providing opportunities for structured practice before engaging with human experts. We conducted a counterbalanced within-subjects experiment with eight Computer Science students, who each interviewed both an LLM-based domain expert and a human expert on two different business process scenarios and subsequently produced a BPMN model. Results show that LLM-based sessions were substantially faster (mean 33.4 vs. 59.5 minutes), perceived as equally informative, and produced models of comparable quality. Both conditions generated distinct but condition-specific error patterns, which varied across process complexity and gateway type. These findings suggest that LLM-simulated domain experts are a useful training tool for BPM education, particularly for early-stage and repeated practice where expert availability is limited.


Repository Structure

/
├── conversations_process_models/           # BPMN models (.bpmn / image files) produced by
│                                           #   students after each interview session,
│                                           #   organized by participant and condition
├── conversations_transcriptions/           # Full transcripts of LLM-Tool and HUMAN-Teams
│                                           #   interview sessions (anonymized)
├── online_questionnaires/                  # PDFs of the Google Forms questionnaires that
│                                           #   students filled out before and after each
│                                           #   interview session, along with the valid
│                                           #   responses collected
├── reference_process_models/               # Ground-truth BPMN models for the Standard (HOF)
│                                           #   and Extended (CFOF) scenarios
└── results_scripts/
    ├── 4.1-interview_efficiency/
    │   └── Section_4_1_-_Interview_Efficiency.xlsx   # Conversation statistics (message counts,
    │                                                 #   timestamps, duration) per participant
    │                                                 #   and condition
    ├── 4.2-student_perception_analysis/
    │   ├── Likert_descriptive_statistics.xlsx        # Descriptive statistics summary table
    │   └── DivergingLikertBarsByCondition/           # Python package for Section 4.2:
    │       ├── docs/                                 #   Likert-scale perception analysis,
    │       ├── outputs/                              #   reliability, and statistical tests
    │       ├── src/
    │       ├── HumanInterviews.csv                   # Raw Likert responses — HUMAN-Teams condition
    │       ├── LLM-tool-Interviews.csv               # Raw Likert responses — LLM-Tool condition
    │       ├── pyproject.toml                        # Package configuration
    │       └── README.md                             # ← See this file for setup and usage
    ├── 4.3-quantitative_bpmn_analysis/
    │   ├── quantitative_data.xlsx                    # Raw BPMN element counts per participant
    │   │                                             #   and condition (activities, gateways,
    │   │                                             #   events)
    │   └── quantitative_analysis.py                  # Script that generates the boxplot figure
    │                                                 #   (boxplots_bpmn.png)
    └── 4.4-bpmn_model_analysis/
        └── Section_4_4_-_Analysis_of_BPMN_models.xlsx  # Qualitative BPMN assessment 
                                                        #   per participant/condition

Study Design

The experiment used a counterbalanced within-subjects design with eight Computer Science students (S1–S8). Each participant interviewed both an LLM-Tool agent and a HUMAN-Teams (Microsoft Teams) expert on two distinct business process scenarios, then independently produced a BPMN model for each. The two scenarios differ in complexity:

  • Standard scenario (HOF): A simpler process (7 activities, 6 gateways, 1 start event, 1 end event).
  • Extended scenario (CFOF): A more complex process (12 activities, 6 gateways, 1 start event, 2 end events).

Counterbalancing controlled for scenario order and condition order across participants. The results are organized into four analysis sections, each corresponding to a folder under results_scripts/:

| Section | Folder | Focus |
|---------|--------|-------|
| 4.1 | 4.1-interview_efficiency/ | Conversation duration, message counts |
| 4.2 | 4.2-student_perception_analysis/ | Student perception (Likert questionnaires) |
| 4.3 | 4.3-quantitative_bpmn_analysis/ | Quantitative BPMN element comparison |
| 4.4 | 4.4-bpmn_model_analysis/ | Qualitative BPMN model assessment |
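
As an illustration of the counterbalancing idea described above, the following minimal sketch cycles eight participants through the four combinations of scenario order and condition order. It is not the assignment actually used in the study, which this README does not list. The function name `counterbalanced_plan` and the rotation scheme are assumptions made for the example.

```python
from itertools import product

SCENARIOS = ("HOF", "CFOF")      # Standard, Extended (codes from the data files)
CONDITIONS = ("LEIA", "TEAMS")   # LLM-Tool, HUMAN-Teams (codes from the data files)


def counterbalanced_plan(participants):
    """Assign each participant two sessions as (scenario, condition) pairs.

    Hypothetical scheme: rotate through the 2 x 2 combinations of
    scenario order and condition order so each combination is used
    equally often when the group size is a multiple of four.
    """
    scenario_orders = [SCENARIOS, SCENARIOS[::-1]]
    condition_orders = [CONDITIONS, CONDITIONS[::-1]]
    combos = list(product(scenario_orders, condition_orders))

    plan = {}
    for index, participant in enumerate(participants):
        scenarios, conditions = combos[index % len(combos)]
        # First tuple is session 1, second tuple is session 2.
        plan[participant] = list(zip(scenarios, conditions))
    return plan


plan = counterbalanced_plan([f"S{i}" for i in range(1, 9)])
for participant, sessions in plan.items():
    print(participant, sessions)
```

With eight participants, each of the four order combinations is assigned to exactly two participants, and every participant sees both scenarios and both conditions once.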

Terminology and Abbreviations

The raw data files use internal codes that differ from the terminology used in the paper. The following table maps between them:

| Code in data files | Meaning in paper |
|--------------------|------------------|
| HOF | Hardware Order Fulfillment process (Standard scenario) |
| CFOF | Custom Furniture Order Fulfillment (Extended scenario) |
| LEIA | LLM-Tool interview condition |
| TEAMS | HUMAN-Teams (Microsoft Teams) interview condition |
| S1–S8 | Anonymized participant identifiers |
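
When working with the raw files programmatically, the mapping above can be encoded as a small lookup. This is a minimal sketch: the names `CODE_TO_PAPER_TERM` and `to_paper_term` are hypothetical helpers and are not part of the repository's scripts.

```python
# Internal codes used in the data files, mapped to the paper's terminology.
CODE_TO_PAPER_TERM = {
    "HOF": "Standard scenario",
    "CFOF": "Extended scenario",
    "LEIA": "LLM-Tool",
    "TEAMS": "HUMAN-Teams",
}


def to_paper_term(code: str) -> str:
    """Translate a data-file code to the paper's term.

    Codes without an entry (for example participant identifiers such
    as S1) are returned unchanged.
    """
    return CODE_TO_PAPER_TERM.get(code.strip().upper(), code)


print(to_paper_term("LEIA"))
print(to_paper_term("CFOF"))
print(to_paper_term("S3"))
```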

Accessing the Results

Section 4.1 — Interview Efficiency

No script is required. Open Section_4_1_-_Interview_Efficiency.xlsx directly. The sheet contains per-participant conversation statistics (total messages, student messages, start/end timestamps, and duration in minutes) for both conditions, along with group averages and medians.
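
For readers who want to recompute the duration column from the start and end timestamps, a minimal sketch follows. The timestamp format is an assumption, since the spreadsheet may store timestamps differently. The function name `duration_minutes` is hypothetical and the example timestamps are placeholders, not study data.

```python
from datetime import datetime


def duration_minutes(start: str, end: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Return the elapsed time between two timestamps in minutes.

    The default format string is an assumption; adjust it to match
    how the timestamps are actually stored.
    """
    started = datetime.strptime(start, fmt)
    ended = datetime.strptime(end, fmt)
    if ended < started:
        raise ValueError("end timestamp precedes start timestamp")
    return (ended - started).total_seconds() / 60


# Placeholder timestamps for illustration only.
print(duration_minutes("2025-01-01 10:00:00", "2025-01-01 10:45:30"))  # 45.5
```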

Section 4.2 — Student Perception Analysis

This section has its own self-contained Python package. Refer to 4.2-student_perception_analysis/DivergingLikertBarsByCondition/README.md for full setup and usage instructions.

Section 4.3 — Quantitative BPMN Analysis

Requirements: pandas, numpy, seaborn, matplotlib, openpyxl

pip install pandas numpy seaborn matplotlib openpyxl
cd results_scripts/4.3-quantitative_bpmn_analysis
python quantitative_analysis.py

The script reads quantitative_data.xlsx, reshapes the data into long format, and produces boxplots_bpmn.png: a 2×4 grid of boxplots comparing BPMN element counts across scenarios (Standard vs. Extended) and conditions (LLM-Tool vs. HUMAN-Teams), with reference lines for the expected values of each element type.
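
The reference lines can be read as the element counts of the two reference models listed under Study Design. The sketch below shows how a single student model could be compared against those counts. It is not taken from quantitative_analysis.py: the names `REFERENCE_COUNTS` and `deviation_from_reference` and the dictionary keys are assumptions, and the student counts are placeholder values.

```python
# Expected element counts, taken from the scenario descriptions in Study Design.
REFERENCE_COUNTS = {
    "Standard": {"activities": 7, "gateways": 6, "start_events": 1, "end_events": 1},
    "Extended": {"activities": 12, "gateways": 6, "start_events": 1, "end_events": 2},
}


def deviation_from_reference(scenario: str, counts: dict) -> dict:
    """Return (student count - reference count) for each element type.

    Negative values mean the student model has fewer elements of that
    type than the reference model; missing element types count as 0.
    """
    reference = REFERENCE_COUNTS[scenario]
    return {
        element: counts.get(element, 0) - expected
        for element, expected in reference.items()
    }


# Placeholder counts for a hypothetical student model, not study data.
student_model = {"activities": 10, "gateways": 6, "start_events": 1, "end_events": 1}
print(deviation_from_reference("Extended", student_model))
# {'activities': -2, 'gateways': 0, 'start_events': 0, 'end_events': -1}
```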

Section 4.4 — BPMN Model Analysis

No script is required. Open Section_4_4_-_Analysis_of_BPMN_models.xlsx directly. The sheet contains a qualitative assessment of each student's BPMN model for both scenarios (Standard and Extended), covering activity quality, loop/flow structure, and notable errors, broken down by participant and condition.


Ethics and Anonymization

All participant data has been anonymized. Student identifiers (S1–S8) are pseudonyms assigned at random and bear no relation to enrollment order or any other identifying attribute. Questionnaire response tokens (e.g., ex1HQQgKplhnmMzCOdV) were randomly generated at collection time. No names, email addresses, institutional affiliations, or other personally identifiable information are included in any file in this repository.

The study was conducted in accordance with the ethical guidelines of [institution name].
