{"id":19888,"date":"2026-06-25T08:00:19","date_gmt":"2026-06-25T06:00:19","guid":{"rendered":"https:\/\/www.gtb.de\/?p=19888"},"modified":"2026-06-29T18:10:17","modified_gmt":"2026-06-29T16:10:17","slug":"ai-tests-but-who-tests-the-ai","status":"publish","type":"post","link":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/","title":{"rendered":"AI Tests \u2014 But Who Tests the AI?"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1920px + 30px );margin-left: calc(-30px \/ 2 );margin-right: calc(-30px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:15px;--awb-margin-bottom-large:20px;--awb-spacing-left-large:15px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:15px;--awb-spacing-left-medium:15px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:15px;--awb-spacing-left-small:15px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-two blog-headline-box\" 
style=\"--awb-text-color:var(--awb-color6);--awb-margin-bottom:56px;--awb-margin-top-small:0px;--awb-margin-right-small:0px;--awb-margin-bottom-small:56px;--awb-margin-left-small:0px;--awb-font-size:var(--awb-custom_typography_12-font-size);\"><h2 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"font-family:var(--awb-custom_typography_12-font-family);font-weight:var(--awb-custom_typography_12-font-weight);font-style:var(--awb-custom_typography_12-font-style);margin:0;letter-spacing:var(--awb-custom_typography_12-letter-spacing);text-transform:var(--awb-custom_typography_12-text-transform);font-size:1em;--fontSize:40;line-height:var(--awb-custom_typography_12-line-height);\"><p>Ever since generative AI tools made their way into software development, a new narrative has become popular: AI writes code, AI generates test cases, AI finds bugs. Humans set the goal \u2014 the model does the rest. Productivity gains that used to take quarters now happen in hours.<\/p>\n<p>This development is real. 
And it is good.<br \/>\nBut it raises a question that often gets lost in the euphoria: <strong>if AI tests \u2014 who tests the AI?<\/strong><\/p><\/h2><\/div><div class=\"fusion-builder-row fusion-builder-row-inner fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-space-between fusion-flex-content-wrap\" style=\"--awb-flex-grow:0;--awb-flex-grow-medium:0;--awb-flex-grow-small:0;--awb-flex-shrink:0;--awb-flex-shrink-medium:0;--awb-flex-shrink-small:0;width:calc( 100% + 30px ) !important;max-width:calc( 100% + 30px ) !important;margin-left: calc(-30px \/ 2 );margin-right: calc(-30px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column_inner fusion-builder-nested-column-0 awb-sticky awb-sticky-medium awb-sticky-large fusion_builder_column_inner_2_5 2_5 fusion-flex-column blue-box person-box\" style=\"--awb-padding-top:24px;--awb-padding-right:24px;--awb-padding-bottom:24px;--awb-padding-left:24px;--awb-padding-top-medium:24px;--awb-padding-right-medium:24px;--awb-padding-bottom-medium:24px;--awb-padding-left-medium:24px;--awb-padding-top-small:24px;--awb-padding-right-small:24px;--awb-padding-bottom-small:24px;--awb-padding-left-small:24px;--awb-overflow:hidden;--awb-bg-color:#28A0DC33;--awb-bg-color-hover:#28A0DC33;--awb-bg-size:cover;--awb-border-radius:16px 16px 16px 16px;--awb-width-large:40%;--awb-margin-top-large:0px;--awb-spacing-right-large:15px;--awb-margin-bottom-large:187px;--awb-spacing-left-large:15px;--awb-width-medium:40%;--awb-order-medium:0;--awb-spacing-right-medium:15px;--awb-spacing-left-medium:15px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:15px;--awb-spacing-left-small:15px;--awb-sticky-offset:150px;\" data-scroll-devices=\"small-visibility,medium-visibility,large-visibility\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element\" 
style=\"text-align:center;--awb-margin-bottom:8px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-1 hover-type-none\" style=\"border-radius:50%;\"><img decoding=\"async\" width=\"402\" height=\"402\" title=\"Anja Kribernegg\" src=\"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Anja-Kribernegg.png\" data-orig-src=\"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Anja-Kribernegg.png\" alt class=\"lazyload img-responsive wp-image-19885\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27402%27%20height%3D%27402%27%20viewBox%3D%270%200%20402%20402%27%3E%3Crect%20width%3D%27402%27%20height%3D%27402%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Anja-Kribernegg-200x200.png 200w, https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Anja-Kribernegg-400x400.png 400w, https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Anja-Kribernegg.png 402w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 800px) 100vw, 402px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-1 md-text-align-right fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-font-size:14px;--awb-text-color:var(--awb-color6);--awb-margin-bottom:24px;\"><p>Anja Kribernegg<\/p>\n<\/div><div class=\"fusion-text fusion-text-2 fusion-text-no-margin\" style=\"--awb-text-color:var(--awb-color6);--awb-margin-bottom:24px;\"><p>As an experienced test manager, project manager and business analyst, Anja Kribernegg 
has successfully managed complex IT projects in the banking, insurance and public sectors over the past 20+ years. With her expertise in agile methodologies (Scrum, Kanban, SAFe), test automation, end-to-end testing and requirements engineering, she has not only implemented technical solutions but also guided teams through digital transformations.<br \/>\nHer focus lies on combining process optimisation, quality assurance and compliance \u2013 always with the aim of creating robust and user-centred systems. She applies this experience and expertise specifically to projects that require innovative testing approaches and sustainable IT solutions. Her open-mindedness supports her in this endeavour.<\/p>\n<\/div><div class=\"fusion-text fusion-text-3 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-font-size:14px;--awb-text-color:var(--awb-color6);--awb-margin-bottom:0px;\"><p>Published: 06.2026<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column_inner fusion-builder-nested-column-1 fusion_builder_column_inner_3_5 3_5 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:60%;--awb-margin-top-large:0px;--awb-spacing-right-large:15px;--awb-margin-bottom-large:20px;--awb-spacing-left-large:15px;--awb-width-medium:60%;--awb-order-medium:0;--awb-spacing-right-medium:15px;--awb-spacing-left-medium:15px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:15px;--awb-spacing-left-small:15px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:var(--awb-color6);--awb-margin-bottom:32px;--awb-margin-top-small:0px;--awb-margin-right-small:0px;--awb-margin-bottom-small:32px;--awb-margin-left-small:0px;--awb-font-size:var(--awb-custom_typography_13-font-size);\"><h3 
class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"font-family:var(--awb-custom_typography_13-font-family);font-weight:var(--awb-custom_typography_13-font-weight);font-style:var(--awb-custom_typography_13-font-style);margin:0;letter-spacing:var(--awb-custom_typography_13-letter-spacing);text-transform:var(--awb-custom_typography_13-text-transform);font-size:1em;--fontSize:32;line-height:var(--awb-custom_typography_13-line-height);\">The New Test Object<\/h3><\/div><div class=\"fusion-text fusion-text-4\" style=\"--awb-text-color:var(--awb-color6);\"><p>In classical software development, the test object is clearly defined: a system with specified requirements, deterministic behavior, and traceable logic. A well-formulated test case has an unambiguous expectation. Either the system delivers the correct result \u2014 or it doesn&#8217;t.<\/p>\n<p>AI systems play by different rules.<\/p>\n<p>A large language model may give a different answer today than it did yesterday for an identical input. An image classification model may make the right decision in 98% of cases \u2014 but in which 2% does it get it wrong, and why? A recommendation system may optimize for click-through rate even though it is really supposed to maximize customer satisfaction. These systems aren&#8217;t incorrectly programmed. They are built that way. 
And that is precisely what makes testing them fundamentally different.<\/p>\n<p>The quality community faces a challenge for which the classical methods are only partly equipped: how do you test a system whose behavior is probabilistic, context-dependent, and shaped by training data \u2014 rather than by explicit logic?<\/p>\n<\/div><div class=\"fusion-separator fusion-full-width-sep\" style=\"align-self: center;margin-left: auto;margin-right: auto;margin-top:12px;margin-bottom:32px;width:100%;\"><div class=\"fusion-separator-border sep-single sep-solid\" style=\"--awb-height:20px;--awb-amount:20px;--awb-sep-color:rgba(40,160,220,0.5);border-color:rgba(40,160,220,0.5);border-top-width:1px;\"><\/div><\/div><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:var(--awb-color6);--awb-margin-bottom:32px;--awb-margin-top-small:0px;--awb-margin-right-small:0px;--awb-margin-bottom-small:32px;--awb-margin-left-small:0px;--awb-font-size:var(--awb-custom_typography_13-font-size);\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"font-family:var(--awb-custom_typography_13-font-family);font-weight:var(--awb-custom_typography_13-font-weight);font-style:var(--awb-custom_typography_13-font-style);margin:0;letter-spacing:var(--awb-custom_typography_13-letter-spacing);text-transform:var(--awb-custom_typography_13-text-transform);font-size:1em;--fontSize:32;line-height:var(--awb-custom_typography_13-line-height);\">Where Classical Testing Methods Reach Their Limits<\/h3><\/div><div class=\"fusion-text fusion-text-5\" style=\"--awb-text-color:var(--awb-color6);\"><p>Let&#8217;s look at four core principles of professional testing \u2014 and what happens to them when the test object is an AI system:<\/p>\n<p><strong>Reproducibility:<\/strong> A classical test always delivers the same result under the same conditions. 
With generative AI systems, this is not guaranteed. Factors such as temperature parameters, sampling strategies, and model updates can cause a test that was green yesterday to be red today \u2014 without the code having changed. Flaky tests, that is, tests whose result changes without any change to the code, take on a new dimension.<\/p>\n<p><strong>Expected values:<\/strong> Every test case needs an oracle \u2014 a defined expectation. With AI systems, the oracle is often unclear. What is the &#8220;right&#8221; answer to a complex customer inquiry? What is a &#8220;fair&#8221; credit decision? These questions are not technical \u2014 they are ethical and domain-specific. Test design no longer begins in the test team, but in the business domain, in the ethics committee, or in legal review.<\/p>\n<p><strong>Quality characteristics:<\/strong> AI systems set entirely new priorities for the quality characteristics that must be considered. Ethics, fairness, and freedom from bias, for example, are not even included in the ISO\/IEC 25010 standard. Measuring the quality of these systems additionally requires statistical methods that play hardly any role in classical quality engineering. The F1 score, for instance, is a central metric in statistics and machine learning for evaluating the quality of a classification model. It is calculated as the harmonic mean of precision and recall: F1 = 2 \u00d7 precision \u00d7 recall \/ (precision + recall).<\/p>\n<p><strong>Coverage:<\/strong> Statement coverage, branch coverage, path coverage \u2014 all of these metrics presuppose that there is a defined code flow. For a neural network with millions of weights, &#8220;coverage&#8221; is not a meaningful concept in the classical sense. 
New metrics are needed: coverage of the input space, robustness against adversarial inputs, and behavior at distribution boundaries.<\/p>\n<\/div><div class=\"fusion-separator fusion-full-width-sep\" style=\"align-self: center;margin-left: auto;margin-right: auto;margin-top:12px;margin-bottom:32px;width:100%;\"><div class=\"fusion-separator-border sep-single sep-solid\" style=\"--awb-height:20px;--awb-amount:20px;--awb-sep-color:rgba(40,160,220,0.5);border-color:rgba(40,160,220,0.5);border-top-width:1px;\"><\/div><\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:var(--awb-color6);--awb-margin-bottom:32px;--awb-margin-top-small:0px;--awb-margin-right-small:0px;--awb-margin-bottom-small:32px;--awb-margin-left-small:0px;--awb-font-size:var(--awb-custom_typography_13-font-size);\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"font-family:var(--awb-custom_typography_13-font-family);font-weight:var(--awb-custom_typography_13-font-weight);font-style:var(--awb-custom_typography_13-font-style);margin:0;letter-spacing:var(--awb-custom_typography_13-letter-spacing);text-transform:var(--awb-custom_typography_13-text-transform);font-size:1em;--fontSize:32;line-height:var(--awb-custom_typography_13-line-height);\">What AI Testing Means \u2014 Concrete Approaches<\/h3><\/div><div class=\"fusion-text fusion-text-6\" style=\"--awb-text-color:var(--awb-color6);\"><p>The good news: the craft of testing is not obsolete. It has to be expanded.<\/p>\n<p>Here are approaches that are already applicable today:<br \/>\n<strong>Metamorphic Testing:<\/strong> When no unambiguous oracle exists, relations between test cases can be checked. If a translation system correctly translates the sentence &#8220;The cat is sitting on the mat,&#8221; then the translation of &#8220;The cat is not sitting on the mat&#8221; should contain a consistent negation. 
It is not the absolute answer that is tested \u2014 but the consistency of the behavior under defined transformations of the input.<\/p>\n<p><strong>Property-Based Testing:<\/strong> Instead of individual test cases, properties are defined that the system should always satisfy \u2014 regardless of the specific input. A credit decision system should arrive at the same results for identical financial data, independent of the applicant&#8217;s ethnic origin. This property can be checked automatically with thousands of generated test cases.<\/p>\n<p><strong>Adversarial Testing:<\/strong> AI systems can be induced to produce wrong outputs through targeted manipulation of the input \u2014 so-called adversarial examples. An image that clearly shows a cat to the human eye can, through minimal pixel manipulation, cause an AI system to classify it as &#8220;dog.&#8221; Safety-critical AI systems must be tested for this kind of robustness.<\/p>\n<p><strong>Bias and Fairness Testing:<\/strong> Training data reflects historical realities \u2014 and thus historical inequalities. A model trained on applicant data from the last 20 years may structurally disadvantage certain groups. Bias testing means: systematically checking whether the model decides consistently and fairly across different demographic groups. Tools such as IBM AI Fairness 360 or Facets offer initial methodological support here.<\/p>\n<p><strong>Monitoring as Continuous Testing:<\/strong> AI systems also change in the field \u2014 through new training data, model updates, and changed usage contexts. A one-time test before go-live is not sufficient. 
Production monitoring that responds to statistical deviations in model behavior (data drift, concept drift) is a form of continuous testing \u2014 and must be planned and operated as such.<\/p>\n<\/div><div class=\"fusion-separator fusion-full-width-sep\" style=\"align-self: center;margin-left: auto;margin-right: auto;margin-top:12px;margin-bottom:32px;width:100%;\"><div class=\"fusion-separator-border sep-single sep-solid\" style=\"--awb-height:20px;--awb-amount:20px;--awb-sep-color:rgba(40,160,220,0.5);border-color:rgba(40,160,220,0.5);border-top-width:1px;\"><\/div><\/div><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-text-color:var(--awb-color6);--awb-margin-bottom:32px;--awb-margin-top-small:0px;--awb-margin-right-small:0px;--awb-margin-bottom-small:32px;--awb-margin-left-small:0px;--awb-font-size:var(--awb-custom_typography_13-font-size);\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"font-family:var(--awb-custom_typography_13-font-family);font-weight:var(--awb-custom_typography_13-font-weight);font-style:var(--awb-custom_typography_13-font-style);margin:0;letter-spacing:var(--awb-custom_typography_13-letter-spacing);text-transform:var(--awb-custom_typography_13-text-transform);font-size:1em;--fontSize:32;line-height:var(--awb-custom_typography_13-line-height);\">Human Responsibility Remains<\/h3><\/div><div class=\"fusion-text fusion-text-7\" style=\"--awb-text-color:var(--awb-color6);\"><p>AI can help with testing \u2014 generating test cases, analyzing logs, detecting anomalies. That is valuable. But AI cannot decide what a fair outcome is. AI cannot weigh which risk is acceptable. AI cannot take responsibility for a system that affects people.<\/p>\n<p>That responsibility lies with humans. 
And it lies specifically with those who practice testing professionally.<\/p>\n<p>The role of the quality engineer is changing: less manual test execution, more test design for non-deterministic systems. Less script maintenance, more risk assessment and ethical review. Less writing bug reports, more quality responsibility within interdisciplinary teams \u2014 together with data scientists, business domain experts, and the legal team.<\/p>\n<p>This requires new competencies. And it requires the quality community to actively help shape this development \u2014 instead of waiting for others to provide the answers.<\/p>\n<p><strong>An Open Question to the Industry<\/strong><br \/>\nAI systems today are involved in granting loans, diagnosing illnesses, screening out job applications, and steering autonomous vehicles. The question of how these systems are tested \u2014 systematically, traceably, responsibly \u2014 is not an academic one. It is a societal one.<\/p>\n<p>The quality community has the tools, the intellect, and the experience to take a leading role here. The question is whether we will also claim it.<\/p>\n<\/div><div class=\"fusion-text fusion-text-8\" style=\"--awb-font-size:20px;--awb-text-color:var(--awb-color6);\"><p>AI tests. But who tests the AI? That should be us.<\/p>\n<\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":6,"featured_media":19887,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[81],"tags":[82,91,84],"class_list":["post-19888","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-ai","tag-quality-engineering","tag-test"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>AI Tests \u2014 But Who Tests the AI? 
- German Testing Board<\/title>\n<meta name=\"description\" content=\"AI writes code and finds bugs. But who tests the AI? The limitations of traditional testing methods and new approaches to testing AI systems.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AI Tests \u2014 But Who Tests the AI? - German Testing Board\" \/>\n<meta property=\"og:description\" content=\"AI writes code and finds bugs. But who tests the AI? The limitations of traditional testing methods and new approaches to testing AI systems.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/\" \/>\n<meta property=\"og:site_name\" content=\"German Testing Board\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-25T06:00:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-29T16:10:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Heroimage-Post05-1024x462.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"462\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Dr. Armin Metzger\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GTB_ISTQB\" \/>\n<meta name=\"twitter:site\" content=\"@GTB_ISTQB\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dr. Armin Metzger\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/\"},\"author\":{\"name\":\"Dr. Armin Metzger\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#\\\/schema\\\/person\\\/580154dcda34782d68a1503d906ba00a\"},\"headline\":\"AI Tests \u2014 But Who Tests the AI?\",\"datePublished\":\"2026-06-25T06:00:19+00:00\",\"dateModified\":\"2026-06-29T16:10:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/\"},\"wordCount\":6343,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.gtb.de\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/Heroimage-Post05.png\",\"keywords\":[\"AI\",\"Quality Engineering\",\"Test\"],\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/\",\"url\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/\",\"name\":\"AI Tests \u2014 But Who Tests the AI? 
- German Testing Board\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.gtb.de\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/Heroimage-Post05.png\",\"datePublished\":\"2026-06-25T06:00:19+00:00\",\"dateModified\":\"2026-06-29T16:10:17+00:00\",\"description\":\"AI writes code and finds bugs. But who tests the AI? The limitations of traditional testing methods and new approaches to testing AI systems.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.gtb.de\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/Heroimage-Post05.png\",\"contentUrl\":\"https:\\\/\\\/www.gtb.de\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/Heroimage-Post05.png\",\"width\":2392,\"height\":1080,\"caption\":\"Wer testet die KI?\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/blog\\\/ai-tests-but-who-tests-the-ai\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/homepage\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI Tests \u2014 But Who Tests the AI?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/\",\"name\":\"German Testing 
Board\",\"description\":\"Software.Testing.Excellence\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#organization\",\"name\":\"German Testing Board e. V.\",\"url\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.gtb.de\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/gtb-logo.png\",\"contentUrl\":\"https:\\\/\\\/www.gtb.de\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/gtb-logo.png\",\"width\":224,\"height\":183,\"caption\":\"German Testing Board e. V.\"},\"image\":{\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/GTB_ISTQB\",\"https:\\\/\\\/de.linkedin.com\\\/company\\\/german-testing-board\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.gtb.de\\\/en\\\/#\\\/schema\\\/person\\\/580154dcda34782d68a1503d906ba00a\",\"name\":\"Dr. Armin Metzger\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/70d5768f24f60d4aa0012915fb37c2937223f49270944aa45222f8a20583d028?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/70d5768f24f60d4aa0012915fb37c2937223f49270944aa45222f8a20583d028?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/70d5768f24f60d4aa0012915fb37c2937223f49270944aa45222f8a20583d028?s=96&d=mm&r=g\",\"caption\":\"Dr. Armin Metzger\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"AI Tests \u2014 But Who Tests the AI? - German Testing Board","description":"AI writes code and finds bugs. But who tests the AI? The limitations of traditional testing methods and new approaches to testing AI systems.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/","og_locale":"en_US","og_type":"article","og_title":"AI Tests \u2014 But Who Tests the AI? - German Testing Board","og_description":"AI writes code and finds bugs. But who tests the AI? The limitations of traditional testing methods and new approaches to testing AI systems.","og_url":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/","og_site_name":"German Testing Board","article_published_time":"2026-06-25T06:00:19+00:00","article_modified_time":"2026-06-29T16:10:17+00:00","og_image":[{"width":1024,"height":462,"url":"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Heroimage-Post05-1024x462.png","type":"image\/png"}],"author":"Dr. Armin Metzger","twitter_card":"summary_large_image","twitter_creator":"@GTB_ISTQB","twitter_site":"@GTB_ISTQB","twitter_misc":{"Written by":"Dr. Armin Metzger","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#article","isPartOf":{"@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/"},"author":{"name":"Dr. 
Armin Metzger","@id":"https:\/\/www.gtb.de\/en\/#\/schema\/person\/580154dcda34782d68a1503d906ba00a"},"headline":"AI Tests \u2014 But Who Tests the AI?","datePublished":"2026-06-25T06:00:19+00:00","dateModified":"2026-06-29T16:10:17+00:00","mainEntityOfPage":{"@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/"},"wordCount":6343,"commentCount":0,"publisher":{"@id":"https:\/\/www.gtb.de\/en\/#organization"},"image":{"@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#primaryimage"},"thumbnailUrl":"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Heroimage-Post05.png","keywords":["AI","Quality Engineering","Test"],"articleSection":["Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/","url":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/","name":"AI Tests \u2014 But Who Tests the AI? - German Testing Board","isPartOf":{"@id":"https:\/\/www.gtb.de\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#primaryimage"},"image":{"@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#primaryimage"},"thumbnailUrl":"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Heroimage-Post05.png","datePublished":"2026-06-25T06:00:19+00:00","dateModified":"2026-06-29T16:10:17+00:00","description":"AI writes code and finds bugs. But who tests the AI? 
The limitations of traditional testing methods and new approaches to testing AI systems.","breadcrumb":{"@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#primaryimage","url":"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Heroimage-Post05.png","contentUrl":"https:\/\/www.gtb.de\/wp-content\/uploads\/2026\/06\/Heroimage-Post05.png","width":2392,"height":1080,"caption":"Wer testet die KI?"},{"@type":"BreadcrumbList","@id":"https:\/\/www.gtb.de\/en\/blog\/ai-tests-but-who-tests-the-ai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.gtb.de\/en\/homepage\/"},{"@type":"ListItem","position":2,"name":"AI Tests \u2014 But Who Tests the AI?"}]},{"@type":"WebSite","@id":"https:\/\/www.gtb.de\/en\/#website","url":"https:\/\/www.gtb.de\/en\/","name":"German Testing Board","description":"Software.Testing.Excellence","publisher":{"@id":"https:\/\/www.gtb.de\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.gtb.de\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.gtb.de\/en\/#organization","name":"German Testing Board e. V.","url":"https:\/\/www.gtb.de\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.gtb.de\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.gtb.de\/wp-content\/uploads\/2023\/10\/gtb-logo.png","contentUrl":"https:\/\/www.gtb.de\/wp-content\/uploads\/2023\/10\/gtb-logo.png","width":224,"height":183,"caption":"German Testing Board e. 
V."},"image":{"@id":"https:\/\/www.gtb.de\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/GTB_ISTQB","https:\/\/de.linkedin.com\/company\/german-testing-board"]},{"@type":"Person","@id":"https:\/\/www.gtb.de\/en\/#\/schema\/person\/580154dcda34782d68a1503d906ba00a","name":"Dr. Armin Metzger","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/70d5768f24f60d4aa0012915fb37c2937223f49270944aa45222f8a20583d028?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/70d5768f24f60d4aa0012915fb37c2937223f49270944aa45222f8a20583d028?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/70d5768f24f60d4aa0012915fb37c2937223f49270944aa45222f8a20583d028?s=96&d=mm&r=g","caption":"Dr. Armin Metzger"}}]}},"_links":{"self":[{"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/posts\/19888","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/comments?post=19888"}],"version-history":[{"count":10,"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/posts\/19888\/revisions"}],"predecessor-version":[{"id":19926,"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/posts\/19888\/revisions\/19926"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/media\/19887"}],"wp:attachment":[{"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/media?parent=19888"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/categories?post=19888"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gtb.de\/en\/wp-json\/wp\/v2\/tags?post=19888"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}