
www.ChineseStandard.us -- Field Test Asia Pte. Ltd.

YY/T 1833.2-2022 English PDF (YY/T1833.2-2022)


Regular price $350.00
YY/T 1833.2-2022: (Artificial intelligence medical devices - Quality requirements and evaluation - Part 2: General requirements for data sets)


YY/T 1833.2-2022
PHARMACEUTICAL INDUSTRY STANDARD
ICS 11.040.99
CCS C 30
Artificial intelligence medical device - Quality requirements
and evaluation -- Part 2: General requirements for datasets
ISSUED ON: JULY 01, 2022
IMPLEMENTED ON: JULY 01, 2023
Issued by: National Medical Products Administration
Table of Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Requirements for dataset description
5 Requirements for dataset quality
6 Evaluation of dataset quality compliance
Annex A (normative) Explanation of dataset types
Annex B (informative) Data screening and cleaning instructions
Bibliography
1 Scope
This document specifies the general quality requirements and evaluation methods for
datasets used throughout the life cycle of artificial intelligence medical devices.
This document is applicable to the development and evaluation of datasets used in the
research and development, production, testing, quality control and other aspects of
artificial intelligence medical devices.
2 Normative references
The following referenced documents are indispensable for the application of this
document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
GB/T 2828.4, Sampling procedures for inspection by attributes -- Part 4: Procedures
for assessment of declared quality levels
GB/T 2828.11, Sampling procedures for inspection by attributes -- Part 11:
Procedures for assessment of declared quality levels for small population
GB/T 6378.4, Sampling procedures for inspection by variables -- Part 4: Procedures
for assessment of declared quality levels for mean
YY/T 1833.1, Artificial intelligence medical device - Quality requirements and
evaluation -- Part 1: Terminology
3 Terms and definitions
For the purposes of this document, the terms and definitions given in YY/T 1833.1 and
the following apply.
3.1 inspection by attributes
An inspection in which, with respect to a specified requirement or group of
requirements, unit products are only classified as qualified or unqualified, or only
the number of unqualified units is counted.
4.1.2.1 Compliance statement
The dataset description shall provide a compliance statement of the data source.
4.1.2.2 Privacy protection
The dataset description shall describe the technical means used to protect the privacy
of subjects, such as data de-identification, data anonymization, etc. When appropriate,
the dataset description document shall describe the rules for data de-identification or
data anonymization.
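As an illustration of what "rules for data de-identification" might look like when written down, the following is a minimal Python sketch. The field names, the rule set, and the function `deidentify` are hypothetical and are not taken from this document; a real rule set depends on the data type and the applicable regulations. Note that a salted hash is a form of pseudonymization, not anonymization, because the mapping can be reproduced by anyone holding the salt.

```python
import hashlib

# Hypothetical rule set: field names are illustrative, not from the standard.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}  # rule 1: removed outright
PSEUDONYMIZED_FIELDS = {"patient_id"}              # rule 2: replaced by a salted hash


def deidentify(record: dict, salt: str) -> dict:
    """Return a copy of record with direct identifiers dropped and IDs pseudonymized."""
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if key in PSEUDONYMIZED_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            cleaned[key] = digest[:16]  # stable pseudonym for the same salt and value
        else:
            cleaned[key] = value  # all other fields pass through unchanged
    return cleaned
```

Writing the rules as data (the two sets above) rather than burying them in code makes them easier to quote in the dataset description.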
4.1.2.3 Diversity
The dataset description shall provide a description of the diversity of data sources, such
as population, collection location, collection equipment, parameter settings, operator
qualifications, collection process, collection time, etc.
4.1.2.4 Principles of compliance for data collection
The dataset description shall provide the regulations, technical standards, clinical
specifications, expert consensus or other references on which the data were collected.
4.1.2.5 Data screening
The dataset description shall describe the data entry and exclusion criteria, as well as
the methods for data screening, such as manual cleaning and automatic cleaning.
NOTE: See Annex B for examples.
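An automatic screening step with explicit entry and exclusion criteria could be sketched as follows. This is an illustration only: the function `screen` and the example criteria are assumptions, not content of this document or of Annex B. The sketch keeps the reason for each rejection so that the screening result can be reported in the dataset description.

```python
from typing import Callable, Iterable, List, Tuple


def screen(
    records: Iterable[dict],
    include: Callable[[dict], bool],
    exclude: Callable[[dict], bool],
) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Split records into kept records and (record, reason) rejections."""
    kept, rejected = [], []
    for record in records:
        if not include(record):
            rejected.append((record, "failed inclusion criteria"))
        elif exclude(record):
            rejected.append((record, "met exclusion criteria"))
        else:
            kept.append(record)
    return kept, rejected
```

For example, `screen(records, include=lambda r: r["age"] >= 18, exclude=lambda r: r["corrupt"])` would keep adult cases whose files are not flagged as corrupt, where `age` and `corrupt` are hypothetical field names.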
4.1.3 Data preprocessing
When appropriate, the dataset description shall describe the steps and content of data
preprocessing.
4.1.4 Dataset annotation
4.1.4.1 Principles for dataset annotation
If the dataset has annotation information, the dataset description shall describe the
regulations, technical standards, clinical specifications, expert consensus or other
references on which the dataset is annotated.
4.1.4.2 Reference standards
If the dataset has annotation information, the dataset description shall describe the
establishment rules, scope, storage format and data specifications of the dataset
reference standard. If the reference standard is verifiable, the verification method of the
reference standard shall be described.
4.1.4.3 Annotation process
If the dataset has annotation information, the dataset description shall describe the data
annotation and quality control process and clarify the decision-making mechanism. In
the case of multiple annotations, the arbitration mechanism for
annotation differences shall be described.
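One possible decision-making mechanism, shown here only as a sketch, is to accept a label when a strict majority of annotations agree and to refer every other case to an arbitrator. The function `resolve_label` and the `arbitrate` callback are hypothetical; this document does not prescribe any particular mechanism.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def resolve_label(
    labels: Sequence[Hashable],
    arbitrate: Callable[[Sequence[Hashable]], Hashable],
) -> Hashable:
    """Accept a strict-majority label; otherwise defer to the arbitrator."""
    if not labels:
        raise ValueError("at least one annotation is required")
    label, count = Counter(labels).most_common(1)[0]
    if count * 2 > len(labels):  # strict majority, ties are not accepted
        return label
    return arbitrate(labels)  # e.g. review by a senior annotator
```

Requiring a strict majority means that a two-way tie always reaches the arbitrator instead of being decided by the order of the annotations.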
4.1.4.4 Other annotation information
If the dataset has annotation information, the dataset description shall describe the scope,
data specifications and storage format of other annotation information in addition to the
reference standard.
4.1.5 Dataset storage information
The dataset description shall describe the data storage information, such as the dataset
storage method and storage path, security control, backup, and recovery instructions. If
the dataset is stored using cloud services, the cloud service provider name and
qualifications, access path, and usage permission instructions shall be provided.
4.1.6 Dataset user access
4.1.6.1 Access control
The dataset description shall describe the user access control mechanism, such as user
type, permission allocation, and authorization mechanism.
4.1.6.2 Access conditions
The dataset description shall describe the conditions required to access the dataset, such
as software and hardware configuration, access method, data interface, protocol, tools,
etc.
4.1.6.3 Visualization
The dataset description shall describe the visual presentation of the dataset information.
4.1.7 Development management
The dataset description shall describe the governing standards to which the dataset was
developed.
4.2 Dataset identification
4.2.1 Identification
The dataset shall display a unique identifier, including the dataset name, version number,
and information about the organization responsible for producing the dataset. This can be
provided in the form of an attached file and described in detail in the dataset description
The dataset description shall state the extent to which the dataset is authentic and
trustworthy, including the acquisition and processing of data and metadata. Verifiable
evidence shall be presented in written form.
4.3.5 Timeliness
The dataset description shall state the extent to which the timeframes required for each
step in the dataset development phase meet expectations, taking into account
preprocessing, cleaning, labeling, etc. Verifiable metrics shall be provided in written form.
4.3.6 Accessibility
The dataset description shall state the extent to which the dataset is accessible.
Verifiable evidence shall be presented in written form.
4.3.7 Compliance
The dataset description shall state the standard specifications, expert consensus,
operating procedures or other references with which the dataset complies.
4.3.8 Confidentiality
The dataset description shall describe the measures related to information security and
data confidentiality. Verifiable evidence shall be presented in written form.
4.3.9 Resource utilization
The dataset description shall state the resource consumption required to perform the
dataset-related tasks. Verifiable compliance evidence shall be presented in written form,
such as the software, hardware, and network configuration required for tasks such as
accessing, reading data, previewing, and retrieving.
4.3.10 Precision
The dataset description shall describe the closeness of the quantitative information of
the data to the true value, taking into account the data elements, metadata, and data
annotation results. Verifiable indicators shall be provided in written form, such as
spatial/temporal resolution, significant figures, and minimum measurement units.
4.3.11 Traceability
The dataset description shall describe the extent to which the data can be traced, taking
into account the data collection history, data annotation history, data access traces, and
data change traces.
4.3.12 Understandability
The dataset description shall use terms that are understandable to the users of the dataset
and shall explain the meaning of the data elements, metadata, and annotation
results. Verifiable evidence shall be presented in written form.
4.3.13 Availability
The dataset description shall state the extent to which the dataset can be used and
retrieved by authorized users. Verifiable evidence shall be presented in written form.
4.3.14 Portability
The dataset description shall state the ability of the dataset to be installed, replaced, or
moved from one system to another while maintaining its existing quality attributes,
taking into account the efficiency of data installation, replacement, and movement.
Verifiable evidence shall be presented in written form.
4.3.15 Recoverability
The dataset description shall state the extent to which the dataset can be recovered.
Verifiable evidence shall be presented in written form. The dataset description may
provide measures for data recovery. The dataset description may provide measures to
prevent interruption or failure in the use of the dataset.
4.3.16 Representativeness
The dataset description shall analyze the sample composition, proportion, population
distribution characteristics, data diversity, and the degree of closeness to the application
scenario. Verifiable indicators shall be provided in written form.
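One way to turn sample composition into a verifiable indicator is to compare the subgroup proportions observed in the dataset with the proportions expected in the application scenario. The sketch below is an assumption for illustration: the function names, the choice of the largest absolute gap as the indicator, and the example groups are not specified by this document.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def subgroup_proportions(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Share of each subgroup (e.g. sex, age band, device model) in the dataset."""
    counts = Counter(values)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def max_proportion_gap(
    observed: Dict[Hashable, float],
    expected: Dict[Hashable, float],
) -> float:
    """Largest absolute difference between observed and expected shares."""
    groups = set(observed) | set(expected)  # a group missing on one side counts as 0
    return max(abs(observed.get(g, 0.0) - expected.get(g, 0.0)) for g in groups)
```

The expected proportions would have to come from a documented source, such as epidemiological data for the intended population. Other distance measures could serve equally well.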
5 Requirements for dataset quality
5.1 Overview
The content of this document focuses on the quality characteristics and overall risk of
the dataset. It is advisable to conduct a quality assessment of the dataset based on its
intended use and application scenarios, and to document the result in a technical report as
verification of the dataset quality.
5.2 Quality characteristics
5.2.1 Completeness
5.2.1.1 Accuracy
The dataset shall comply with the accuracy statements in the dataset description, such
as:
a) Accuracy of recorded information;
b) Accurate, clear and unambiguous text description;
d) Original records, intermediate records and final records shall be consistent.
External consistency refers to the correlation between data from different sources, such
as:
a) Data from different sources shall be consistent in terms of data characteristics;
b) Outliers shall be explainable;
c) Data from different sources shall comply with the same regulations, technical
standards, medical specifications, and other literature requirements during
collection and annotation.
5.2.4 Authenticity
The dataset shall comply with the statements about authenticity in the dataset description,
such as:
a) Data shall come from real clinical data collection processes. When appropriate,
the equipment, personnel, and methods involved in data collection shall comply
with technical standards, clinical norms, or expert consensus;
b) Data augmentation and data synthesis activities, and their results, shall be
traceable and interpretable;
c) Metadata shall describe the data truthfully.
5.2.5 Timeliness
The time limit for data collection, labeling, circulation, archiving, and change activities
shall comply with the statement of timeliness in the dataset description. Dynamically
updated datasets shall specify the data update cycle, update method, and update ratio.
If the data involves the time series process in clinical diagnosis and treatment, the
rationality of the data in terms of clinical timeliness shall be demonstrated.
5.2.6 Accessibility
Datasets shall meet the access needs within the scope of the intended use and
application scenarios of the dataset.
5.2.7 Compliance
Datasets shall comply with the statements about compliance in the dataset description.
5.2.8 Confidentiality
Datasets shall comply with the confidentiality statements in the dataset description.
Measures shall be taken to ensure that they can only be accessed by authorized users.
Isolated datasets shall have an authorized-access mechanism and an isolation
protection mechanism. There shall be measures to prevent data leakage, data tampering,
and data loss, such as data anonymization, physical isolation, data auditing, etc.
5.2.9 Resource utilization
The processing and use of the dataset shall be in accordance with the statement of
resource utilization in the dataset description.
5.2.10 Precision
The dataset shall comply with the precision statement in the dataset description
document.
5.2.11 Traceability
The dataset shall comply with the statements about traceability in the dataset description
and shall be supported by relevant records, such as:
a) Original data source, metadata source, compliance proof;
b) Data collection activity records;
c) Personnel management records;
d) Data annotation process records;
e) Blinding management records;
f) Data circulation records;
g) Data questioning, auditing, deactivation, and correction records;
h) Records of labeling tools and platform usage;
i) Query records of statistical information on dataset labeling results, including labeling
progress, label statistics, per-labeler progress statistics, difficult case sets, etc.;
j) Data service anomaly and failure records;
k) Data maintenance and backup records;
l) Data update records;
m) Cloud service provider name, contact information, cloud service type, etc.
5.2.12 Understandability
Datasets shall conform to the dataset description's statements regarding
understandability.
5.2.13 Availability
Datasets shall comply with the availability statements in the dataset description
document.
5.2.14 Portability
The dataset shall comply with the statement about portability in the dataset description.
If the dataset is allowed to be used on different platforms and systems, the data quality
shall not change with the platform and system.
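One narrow but checkable aspect of this requirement is that file contents remain byte-identical after the dataset is moved to another platform. The following sketch builds a checksum manifest before and after the move and lists any differences. The function names are hypothetical, and byte-level identity is only one possible piece of evidence. It does not cover quality changes caused by different software reading the same files.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List files that are missing, added, or whose content differs."""
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))
```

An empty result from `changed_files` indicates that every file present on the source system arrived unchanged and that no files were added. For very large files, hashing in chunks instead of `read_bytes()` would avoid loading each file fully into memory.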
5.2.15 Recoverability
Measures used to maintain the quality of the dataset and protect against failure events
shall be consistent with the stat...