February 4, 2026

Creating a GitHub Actions Workflow for Android Detox Testing with GitHub Copilot? What Could Go Wrong?

Last month, in my article "First Time Using GitHub CoPilot to Create a ReactNative LoginPage app. What Could Go Wrong?", I shared my experience using GitHub Copilot to create a React Native app from scratch for my DetoxDemo project.

This time, I used GitHub Copilot (Claude Opus 4.5) to create a GitHub Actions CI/CD workflow for running Detox end-to-end tests on Android. While GitHub Copilot is incredibly powerful, it still required significant human guidance to get the workflow passing.

Detox Demo: https://github.com/tjmaher/detox-demo

I already had a working GitHub Actions workflow for iOS, ios-regression.yml, and asked Copilot to create a matching Android version, android-regression.yml. Despite that instruction, I had to ask Copilot repeatedly to compare its work against the iOS workflow.

The result? 14 commits, 17 hours, and a lot of lessons learned. Here's the timeline of what went wrong, and what finally worked:

[ View the Pull Request ]

The Stats

Total Commits: 14

Time Span: ~17 hours

  • Started: Feb 3, 2026 at 9:54 PM EST
  • Finally Passed: Feb 4, 2026 at 3:15 PM EST


Commit Timeline

📊 14 commits over 17 hours

🕘 9:54 PM - Android support - Initial workflow creation

🕙 10:05 PM - Android support - Initial fixes

🕙 10:19 PM - Add cleanup stage

🕚 11:19 PM - Check Gradle is running

🕚 11:37 PM - Switch to macOS runner - ❌ Failed - GitHub's macOS runners don't support nested virtualization (HVF) for Android emulators

🕛 11:56 PM - Upload/download artifacts - Split build and test into separate stages

🕐 12:57 AM - Change arch to arm64-v8a - ❌ Wrong architecture for Linux runners

🕐 1:22 AM - Change arch to arm64-v8a - ❌ Still wrong - needed x86_64

🕗 8:45 AM - Environment variables persist - Script variables not persisting between lines

🕘 9:05 AM - Fix if/fi problem - ❌ Multi-line if/then/else/fi broken - each line runs as separate command

🕤 9:31 AM - Start Metro - ❌ Missing Metro bundler - "Unable to load script" errors

🕑 1:58 PM - Enable Allure only for iOS - ❌ videokitten/scrcpy video recording fails on Android emulators

🕑 2:41 PM - End Metro gracefully - ❌ "Exit code null" - cleanup commands interrupted by emulator shutdown

🕒 3:15 PM - End Metro gracefully - ✅ PASSED! Removed manual cleanup - let emulator-runner handle it

What I Had to Have GitHub Copilot Correct

"Just Make It Like iOS" (4+ times)

  • Initial request: "Match android-regression.yml with ios-regression.yml"
  • Metro bundler check: "Compare with ios-regression to ensure Metro is fully implemented" - Copilot had missed adding Metro startup entirely
  • Cleanup approach: "What does ios-regression use? Can we use that?" - Copilot was overcomplicating the Metro cleanup
  • iOS bootstatus fix: When the iOS simulator boot was hanging, my prompt revealed Copilot's until loop was flawed

Runner & Environment Issues

  • Runner type: Copilot initially tried macos-latest to match iOS, but macOS GitHub runners don't support nested virtualization (HVF) for Android emulators. Had to switch to ubuntu-latest with KVM.
  • KVM permissions: Copilot forgot to add the KVM permissions step (sudo chmod 777 /dev/kvm) required for hardware acceleration on Linux runners.
  • Architecture mismatch: Copilot initially used arm64-v8a (matching macOS ARM), but the Linux runners needed x86_64.
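
The architecture and KVM points above can be checked from a shell step before the emulator ever starts. The following is a minimal sketch, not the workflow's actual code: the function names abi_for_host and kvm_ready are hypothetical, and it assumes the emulator architecture should simply follow the host CPU reported by uname -m.

```shell
#!/bin/sh
# Hypothetical helper: map a host CPU name (as printed by `uname -m`)
# to the Android emulator architecture that matches it.
abi_for_host() {
  case "$1" in
    x86_64|amd64)  echo "x86_64" ;;      # e.g. an x86_64 Linux runner
    arm64|aarch64) echo "arm64-v8a" ;;   # e.g. an ARM-based macOS host
    *)             echo "unknown" ;;
  esac
}

# Hypothetical helper: succeed only if the KVM device exists and the
# current user can read and write it. Where this fails on a Linux runner,
# the fix described above was `sudo chmod 777 /dev/kvm` in an earlier step.
kvm_ready() {
  [ -r /dev/kvm ] && [ -w /dev/kvm ]
}

host_arch=$(uname -m)
echo "host: $host_arch -> emulator arch: $(abi_for_host "$host_arch")"
if kvm_ready; then echo "KVM: usable"; else echo "KVM: not usable"; fi
```

Running a check like this as the first step of the job would have surfaced both the arm64-v8a mismatch and the missing KVM permissions before any time was spent booting an emulator.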

android-emulator-runner Action Issues

  • Missing script: input: Got "Input required and not supplied: script" error - Copilot didn't know the ReactiveCircus/android-emulator-runner action required a script: parameter.
  • Script execution: Each line in script: was executed as a separate shell command, breaking multi-line if/then/else/fi statements. Everything had to be consolidated onto single lines.
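
The per-line behavior is easy to reproduce locally. The sketch below is an illustration under stated assumptions rather than the action's real implementation: run_line_by_line is a hypothetical helper that hands each line to its own sh -c, which approximates what the bullets above describe. It also shows why a variable set on one line is gone on the next.

```shell
#!/bin/sh
# Hypothetical helper: run each non-empty line of a script in its own
# `sh -c`, approximating per-line execution. Returns non-zero if any
# line fails.
run_line_by_line() {
  rc=0
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    sh -c "$line" </dev/null 2>/dev/null || rc=1
  done <<EOF
$1
EOF
  return "$rc"
}

# A multi-line if: the first and last lines are syntax errors on their own.
multi_line='if [ 1 -eq 1 ]; then
  echo "condition met"
fi'

# The same logic consolidated onto one line.
single_line='if [ 1 -eq 1 ]; then echo "condition met"; fi'

if run_line_by_line "$multi_line" >/dev/null; then multi_result=ok; else multi_result=broken; fi
if run_line_by_line "$single_line" >/dev/null; then single_result=ok; else single_result=broken; fi

# A variable assigned on one line does not survive into the next shell.
unset GREETING
separate_lines='GREETING=hello
echo "greeting=[$GREETING]"'
same_line='GREETING=hello; echo "greeting=[$GREETING]"'

separate_out=$(run_line_by_line "$separate_lines")
same_out=$(run_line_by_line "$same_line")

echo "multi-line if:  $multi_result"
echo "single-line if: $single_result"
echo "separate lines: $separate_out"
echo "same line:      $same_out"
```

An alternative to squeezing everything onto one line is to move the logic into a script file in the repository and call that file from a single line of script:.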

Metro Bundler Issues

  • Metro not running: Got "Unable to load script" errors - Copilot forgot Android debug builds also need Metro running, just like iOS.
  • Metro cleanup causing failures: It took five attempts to fix "exit code null", and only the last one worked:
    1. Tried kill $METRO_PID || true - failed
    2. Added pkill commands - still failed
    3. Used nohup for Metro - still failed
    4. Wrapped cleanup in subshell with || true - still failed
    5. Finally removed cleanup entirely - the emulator-runner action handles process termination automatically
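
The pattern that finally worked (start the bundler in the background, wait until it answers, run the tests, and do no manual cleanup) can be sketched as follows. This is a minimal illustration, not the workflow's actual script: wait_until is a hypothetical helper, and a short background job that creates a file stands in for Metro so the example runs anywhere. In a real workflow the background command would be something like npx react-native start, and the readiness probe would poll Metro's status endpoint (http://localhost:8081/status on the default port), both of which are assumptions about this project's setup.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Stand-in for the bundler: becomes "ready" after about one second.
ready_file="${TMPDIR:-/tmp}/metro-ready.$$"
rm -f "$ready_file"
( sleep 1; : > "$ready_file" ) &

# Block until the stand-in is ready, or give up after 10 attempts.
if wait_until 10 1 test -e "$ready_file"; then metro_state=ready; else metro_state=timeout; fi
echo "bundler: $metro_state"

# Tests would run here. No kill or pkill follows: as in attempt 5 above,
# leftover processes are left for the emulator-runner action to terminate.
wait
rm -f "$ready_file"
```

Waiting on an explicit readiness check, rather than a fixed sleep, is what guards against the "Unable to load script" errors when the bundler starts slowly.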

Allure/Video Recording Issues

  • videokitten errors on Android: The Allure adapter uses videokitten/scrcpy for video recording, which fails on Android emulators. I had to point out the errors were still happening.
  • jest.config.js not checking env var: Copilot fixed .detoxrc.js but forgot jest.config.js also loads the Allure adapter. Both files needed updating.
  • Naming convention: I requested renaming DETOX_DISABLE_ALLURE to DETOX_ENABLE_ALLURE for clearer opt-in semantics.
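
The opt-in semantics behind the rename can be shown in a few lines. In the project the check lives in the JavaScript config files (.detoxrc.js and jest.config.js); the shell version below is only a sketch of the same logic as seen from the workflow's side. The helper names are hypothetical, and treating the exact string "true" as the only enabling value is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: DETOX_ENABLE_ALLURE is a project-specific opt-in flag.
# Only the exact value "true" turns reporting on; unset, empty, or any
# other value leaves it off.
allure_enabled() {
  [ "${DETOX_ENABLE_ALLURE:-}" = "true" ]
}

describe_allure() {
  if allure_enabled; then echo "allure: on"; else echo "allure: off"; fi
}

unset DETOX_ENABLE_ALLURE
android_state=$(describe_allure)                               # nothing set
ios_state=$(DETOX_ENABLE_ALLURE=true; describe_allure)         # opted in
typo_state=$(DETOX_ENABLE_ALLURE=yes; describe_allure)         # not "true"

echo "android: $android_state"
echo "ios:     $ios_state"
echo "typo:    $typo_state"
```

With an opt-in flag, a job that forgets to set the variable gets the safe behavior (no video recording), which is the point of preferring DETOX_ENABLE_ALLURE over DETOX_DISABLE_ALLURE.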

System Dependencies

  • Missing libraries: Had to add libpulse0 and scrcpy installation for emulator audio/video support.

Documentation Gaps

  • README not updated: I had to ask if DETOX_ENABLE_ALLURE was documented.
  • ESM link missing: I asked for a link explaining what ESM (ECMAScript Modules) is.
  • Custom variable clarification: I wanted it clear that DETOX_ENABLE_ALLURE is project-specific, not a built-in Detox property.

Lessons that GitHub Copilot Learned Along the Way

According to GitHub Copilot, the lessons it learned during this project were:

  • AI needs human oversight: Even with a working reference (ios-regression.yml), Copilot kept making platform-specific assumptions that required correction.
  • CI/CD is complicated: The interaction between GitHub Actions, android-emulator-runner, shell scripts, and background processes created edge cases that Copilot couldn't anticipate.
  • Check the working example: When Copilot's solution doesn't work, explicitly asking "what does the working version do?" helped identify gaps.
  • Multiple iterations are normal: 14 commits over 17 hours for a CI/CD workflow isn't unusual, even with AI assistance (according to GitHub Copilot).
  • AI is a collaborator, not a replacement: The value came from rapid iteration and suggestions, but human judgment was essential for debugging and validation.

As always... Happy Testing!

-T.J. Maher

Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles
