GB/T 38673-2020 English PDF (GBT38673-2020)

GB/T 38673-2020: Information technology -- Big data -- Basic requirements for big data systems

This Standard specifies the functional requirements and non-functional requirements of big data systems. This Standard is applicable to the design, model selection, acceptance and testing of various big data system requirements.
GB/T 38673-2020
NATIONAL STANDARD OF THE PEOPLE’S REPUBLIC OF CHINA
ICS 35.240
L 67
Information technology - Big data - Basic requirements for big data systems
ISSUED ON: APRIL 28, 2020
IMPLEMENTED ON: NOVEMBER 01, 2020
Issued by: State Administration for Market Regulation; Standardization Administration of the People’s Republic of China.
Table of Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviations
5 Big data system framework
6 Functional requirements
7 Non-functional requirements
1 Scope
This Standard specifies the functional requirements and non-functional requirements of big data systems.
This Standard is applicable to the design, model selection, acceptance and testing of various big data system requirements.
2 Normative references
The following documents are indispensable for the application of this document. For dated references, only the dated version applies to this document. For undated references, the latest edition (including all amendments) applies to this document.
GB/T 35295-2017, Information technology - Big data - Terminology
GB/T 35589-2017, Information technology - Big data - Technical reference model
3 Terms and definitions
Terms and definitions determined by GB/T 35295-2017 and the following ones are applicable to this document. For ease of use, some of the terms and definitions in GB/T 35295-2017 are repeated below.
3.1 Big data system
The system that implements all or part of the big data reference architecture. [GB/T 35295-2017, Definition 2.1.14]
3.2 Distributed computing
A computing mode that covers the storage layer and the processing layer and is used to implement multi-type programming algorithm models.
c) It shall provide column conversion, row conversion and table conversion functions of structured data;
d) It shall provide data loading function, to support the loading of cleaned and converted data to the data analysis module;
e) It should provide data comparison function before and after cleaning;
f) It should support data conversion function of unstructured data.
6.3 Data storage module
The data storage module requirements are as follows:
a) It shall provide data storage function, to support the storage of structured data, unstructured data and semi-structured data.
b) It shall provide the function of exchanging data or files with relational databases and other file systems.
c) Support distributed file storage, to realize the following functions:
1) It shall support basic operations of the file system, including upload, download, read and write, copy, move, delete, rename, permission modification, etc.;
2) It shall support multi-copy storage and recovery functions of data blocks;
3) It should support the function of fast retrieval of files, and support the unified retrieval, cataloging, adding and deleting operations of data resources;
4) It should support data compression storage function.
d) Support distributed column data storage, to achieve the following functions:
1) It shall support the function of storing data in the form of key-value;
2) It should support user authority management functions that are based on tables, column families, and columns. Authority management operations include read, write, and create.
e) Support distributed structured data storage, to achieve the following functions:
1) It should support distributed storage of structured data, to ensure the scalability and consistency of data storage;
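As an illustration of item d) 2) above, authority that is based on tables, column families and columns could be modelled as follows. This is a minimal sketch, not part of the Standard: the `Grant` class, the `is_allowed` function and the rule that a grant at a broader level covers everything beneath it are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Authority management operations named in item d) 2) of 6.3.
OPERATIONS = {"read", "write", "create"}


@dataclass(frozen=True)
class Grant:
    """One authority entry (hypothetical). None means 'every family' or 'every column'."""
    user: str
    operation: str
    table: str
    family: Optional[str] = None
    column: Optional[str] = None


def is_allowed(grants, user, operation, table, family, column):
    """Return True if any grant covers the requested column-level access."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    for g in grants:
        if (g.user, g.operation, g.table) != (user, operation, table):
            continue
        if g.family is not None and g.family != family:
            continue
        if g.column is not None and g.column != column:
            continue
        return True
    return False


grants = [
    Grant("alice", "read", "orders"),                      # table level
    Grant("bob", "read", "orders", "billing"),             # column-family level
    Grant("bob", "write", "orders", "billing", "amount"),  # column level
]
```

With these sample grants, `is_allowed(grants, "bob", "read", "orders", "billing", "tax")` is true because the column-family grant covers every column in `billing`, while the same request with `"write"` is false because the only write grant is limited to the `amount` column.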
1) Built-in graph data query API, support synchronous or asynchronous computing model to write iterative algorithms;
2) Online graph analysis and query function;
3) Graph data expression that is based on the attribute graph model, including the label and attribute type definition on the node/edge;
4) Built-in common graph index calculation function, to describe the topological structure characteristics of graphs.
d) It should support memory computing, to realize the following functions:
1) Provide data processing capabilities through distributed memory computing and DAG execution engine;
2) Support multiple data types, including data processing of structured data, unstructured data, and semi-structured data.
e) It should support the batch stream integration computing framework, to achieve the following functions:
1) Batch stream integration unified query SQL language;
2) Streaming SQL in multiple scenarios, such as location information analysis, etc.;
3) Common time windows, including jumping windows, sliding windows, etc.
f) It should support automatic scheduling of tasks according to the dependencies between tasks.
g) It should support the description of multi-task dependencies within the job in the form of a directed acyclic graph.
h) It should provide the ability to dispatch complex tasks.
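Items f) and g) above can be illustrated with a small dependency-driven scheduler. The following is a minimal sketch, not part of the Standard: the `schedule` function and the task names are hypothetical, and Python's standard `graphlib` module (Python 3.9 or later) is used here only as one convenient way to order a directed acyclic graph.

```python
from graphlib import CycleError, TopologicalSorter


def schedule(dependencies):
    """Group tasks into batches that can be dispatched in order.

    `dependencies` maps each task to the set of tasks it depends on.
    Every task in a batch has all of its prerequisites in earlier batches.
    Raises CycleError if the dependencies do not form a DAG.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates that the graph is acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical job: each key depends on the tasks in its value set.
job = {
    "clean": {"extract"},
    "load": {"clean"},
    "stats": {"clean"},
    "report": {"load", "stats"},
}

for number, batch in enumerate(schedule(job), start=1):
    print(number, batch)
```

Tasks that land in the same batch have no dependency on each other, so a scheduler would be free to run them in parallel.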
6.5 Data analysis module
The data analysis module requirements are as follows:
a) Support data query, to realize the following functions:
1) It shall provide the function of querying through a standard database connection interface;
2) It shall provide the function of querying through the REST API query interface;
3) It should support data statistics on real-time streams;
4) It should support the sorting of streaming data;
5) It should support the association with static tables;
6) It should support the associated processing of multiple data streams.
f) It should support interactive on-line analysis, to achieve the following functions:
1) Perform distributed on-line analysis of data through structured query language, such as OLAP;
2) Perform ad hoc query of data through structured query language;
3) Use visualization middleware to display data analysis results;
4) Define the calculation formula and parameter configuration during the interactive analysis process;
5) Automatically save and roll back during interactive analysis;
6) Save and publish analysis results during interactive analysis;
7) Interactive data analysis based on on-line analysis.
g) It should support visual process editing operations, to achieve the following functions:
1) Perform process editing and revision through drag;
2) Support workflow dispatch trigger mechanism, configurable trigger time or trigger event;
3) Support the persistent storage of process editing results.
6.6 Data visualization module
The requirements of the visualization module are as follows:
a) It should support the use of conventional charts to display data, such as tables, bar charts, pie charts, line charts, heat maps;
b) It should support the API of third-party data visualization tools.
6.7 Data access module
d) It shall provide service management functions, including the management of big data system component services;
e) It should provide the health check management function, to support the realization of cluster health check through a graphical interface.
7 Non-functional requirements
7.1 Reliability requirements
7.1.1 High availability
High availability requirements are as follows:
a) It shall provide the system automatic fault detection and management functions;
b) It shall ensure that there is no single point failure risk for system components;
c) When any node of the cluster fails, there shall be no service interruption, data loss or data inconsistency;
d) When any unit of the cluster fails, the system operation shall not be affected;
e) It shall guarantee long-term, uninterrupted and fault-free operation of the system.
7.1.2 Data redundancy storage and distribution
Data redundancy storage and distribution requirements are as follows:
a) It shall provide the metadata multi-copy memory function; the failure of any node will not affect the system's ability to continue to provide services;
b) It shall provide the master copy planning function that is based on partition fault tolerance, with the ability to plan the physical distribution of each copy data in advance.
7.1.3 Data backup and recovery
The data backup and recovery requirements are as follows:
a) It shall provide distributed file storage backup and recovery functions;
b) It shall configure authority for users according to the principle of minimizing authority;
c) It shall support the allocation of authority for users according to the granularity of the data table level and the data column level;
d) It shall support the allocation of authority for users according to different operation types (such as adding, deleting, modifying, querying, executing).
7.3.3 Log management
The log management requirements are as follows:
a) It shall provide the function of recording system operation logs, to record important operations of users;
b) It shall ensure that the system operation log cannot be deleted, modified or overwritten;
c) The operation log shall include date, time, operator information, operation type, operation description and operation result;
d) It shall provide functions of statistics, query, analysis and report generation of system operation logs.
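The fields listed in item c) and the protection described in item b) can be illustrated with an append-only log structure. The following is a minimal sketch, not part of the Standard: the `LogEntry` and `OperationLog` names are hypothetical, and chaining each entry to the SHA-256 digest of the previous one is an assumption chosen here as one way to make modification detectable. The Standard does not prescribe a mechanism.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    """One operation record with the fields named in item c)."""
    timestamp: str       # date and time, ISO 8601
    operator: str        # operator information
    operation_type: str
    description: str
    result: str
    prev_hash: str       # digest of the previous entry (assumed tamper check)

    def digest(self):
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class OperationLog:
    """Append-only log: it exposes no delete or update method."""

    def __init__(self):
        self._entries = []

    def append(self, operator, operation_type, description, result, now=None):
        now = now or datetime.now(timezone.utc)
        prev = self._entries[-1].digest() if self._entries else ""
        entry = LogEntry(now.isoformat(), operator, operation_type,
                         description, result, prev)
        self._entries.append(entry)
        return entry

    def entries(self):
        return tuple(self._entries)  # read-only view for query and reporting

    def verify(self):
        """Return True if no stored entry has been altered or removed."""
        prev = ""
        for entry in self._entries:
            if entry.prev_hash != prev:
                return False
            prev = entry.digest()
        return True


log = OperationLog()
log.append("admin", "delete", "Dropped table t1", "success")
log.append("admin", "modify", "Changed quota of user u2", "failure")
```

In this sketch, `log.verify()` returns true as long as the chain is intact. A production system would additionally need storage-level protection, since an in-memory structure alone cannot prevent deletion.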
7.3.4 Data security
The data security requirements are as follows:
a) It shall provide data storage encryption and decryption functions, to support database-level data encryption;
b) It shall provide encrypted transmission function of system sensitive data, and the encryption key can be replaced;
c) It should support data encryption at the data column level.
7.4 Scalability requirements
The system scalability requirements are as follows:
a) It shall provide online cluster expansion and reduction functions;
b) It shall provide offline cluster expansion and reduction functions.
7.5 Maintainability requirements
The system maintainability requirements are as follows:
