使用测量
本文档介绍了如何在图表上定义和查询测量。 您可以使用测量来确保在联接中正确执行聚合。
概览
“测量”是节点或边表的 PROPERTIES 子句中定义的聚合属性。 测量是使用 MEASURE 关键字和以下受支持的聚合函数之一定义的:
SUMAVGCOUNTCOUNT(DISTINCT)MINMAX
测量根据其定义的节点或边表的 KEY 定义其聚合。这意味着,当您查询测量时,即使基础表以导致行重复的方式联接,也会正确执行聚合。
您无法在 GQL 查询中引用由测量定义的属性。相反,
您可以通过调用
GRAPH_EXPAND TVF
来访问测量,以创建图表的扁平化表表示形式。此函数不接受所有类型的图表。如需详细了解
哪些图构成有效输入,请参阅输入限制。
您可以对 GRAPH_EXPAND TVF 的输出调用
AGG 函数
,以根据您在
测量中定义的聚合函数聚合
属性。
定义测量
您可以在节点或边表定义的
PROPERTIES子句
中使用 MEASURE() 关键字(围绕受支持的聚合函数)来定义测量。
以下示例创建了一个名为 university 的数据集,以及学院、系和课程的表:
CREATE SCHEMA IF NOT EXISTS university;
CREATE OR REPLACE TABLE university.College (
college_id INT64 PRIMARY KEY NOT ENFORCED,
college_name STRING
);
CREATE OR REPLACE TABLE university.Department (
dept_id INT64 PRIMARY KEY NOT ENFORCED,
dept_name STRING,
college_id INT64,
budget FLOAT64,
FOREIGN KEY (college_id) REFERENCES university.College(college_id) NOT ENFORCED
);
CREATE OR REPLACE TABLE university.Course (
course_id INT64 PRIMARY KEY NOT ENFORCED,
course_name STRING,
dept_id INT64,
credits INT64,
FOREIGN KEY (dept_id) REFERENCES university.Department(dept_id) NOT ENFORCED
);
INSERT INTO university.College (college_id, college_name)
VALUES (101, 'College of Engineering'),
(102, 'College of Arts');
INSERT INTO university.Department (dept_id, dept_name, college_id, budget)
VALUES (1001, 'Computer Science', 101, 500000),
(1002, 'Mechanical Engineering', 101, 400000),
(1003, 'Fine Arts', 102, 200000),
(1004, 'Research', 101, 50000);
INSERT INTO university.Course (course_id, course_name, dept_id, credits)
VALUES (1, 'Intro to CS', 1001, 3),
(2, 'Data Structures', 1001, 4),
(3, 'Thermodynamics', 1002, 3),
(4, 'Oil Painting', 1003, 2);
以下语句创建了一个名为 SchoolGraph 的图表,该图表定义了 Department 和 Course 节点中某些属性的测量。
您必须为由测量定义的属性提供别名。
CREATE OR REPLACE PROPERTY GRAPH university.SchoolGraph
NODE TABLES (
university.College
KEY(college_id)
PROPERTIES(college_id, college_name),
university.Department
KEY(dept_id)
PROPERTIES(dept_id, dept_name, college_id,
budget OPTIONS(description="Department budget in USD"),
MEASURE(SUM(budget)) AS total_budget),
university.Course
KEY(course_id)
PROPERTIES(
course_id,
course_name,
credits,
dept_id,
MEASURE(AVG(credits)) AS avg_credits,
MEASURE(SUM(credits)) AS total_credits,
MEASURE(COUNT(course_id)) AS course_count)
)
EDGE TABLES (
university.Department AS CollegeDept
SOURCE KEY (college_id) REFERENCES College (college_id)
DESTINATION KEY (dept_id) REFERENCES Department (dept_id),
university.Course AS DeptCourse
SOURCE KEY (dept_id) REFERENCES Department (dept_id)
DESTINATION KEY (course_id) REFERENCES Course (course_id)
);

total_budget 测量定义为 MEASURE(SUM(budget))。该测量将聚合锁定到 KEY,即 dept_id。
了解重复计数
当您联接表时,如果数据中存在一对多关系,则每次都会重复数据。例如,如果您联接 Course、Department 和 College
表,则具有多门课程的系会在输出中的多行中显示:
SELECT
college_name,
dept_name,
course_name,
budget
FROM university.Course
LEFT JOIN university.Department
ON Course.dept_id = Department.dept_id
LEFT JOIN university.College
ON Department.college_id = College.college_id;
/*------------------------+------------------------+-----------------+----------+
| college_name | dept_name | course_name | budget |
+------------------------+------------------------+-----------------+----------+
| College of Engineering | Computer Science | Intro to CS | 500000.0 |
| College of Engineering | Computer Science | Data Structures | 500000.0 |
| College of Engineering | Mechanical Engineering | Thermodynamics | 400000.0 |
| College of Arts | Fine Arts | Oil Painting | 200000.0 |
+------------------------+------------------------+-----------------+----------*/
如果您尝试使用 SUM(budget) 计算每个学院的总预算,则计算机科学系的预算会被计算两次。您可以直接查询 Department
表来避免此问题,但如果您想从有助于联接数据的不同表中计算多个聚合,则此方法不起作用。以下部分介绍了测量如何解决此问题。
查询具有测量的图表
您可以使用 GQL 查询具有测量的图表,但您的查询无法直接引用由测量定义的任何属性。例如,以下查询引用了定义了测量属性的节点,但未使用或返回任何测量字段:
GRAPH university.SchoolGraph
MATCH (c:College)-[]-(d:Department)-[]->(course:Course)
RETURN c.college_name, d.dept_name, course.course_name;
/*------------------------+------------------------+-----------------+
| college_name | dept_name | course_name |
+------------------------+------------------------+-----------------+
| College of Engineering | Computer Science | Intro to CS |
| College of Engineering | Computer Science | Data Structures |
| College of Engineering | Mechanical Engineering | Thermodynamics |
| College of Arts | Fine Arts | Oil Painting |
+------------------------+------------------------+-----------------*/
使用测量
如需使用测量,您可以使用
GRAPH_EXPAND
表值函数 (TVF)
将图表查询为单个扁平化表。
输出表中的列派生自图表中为每个节点和边表定义的属性。为防止命名冲突,列的命名方式是将节点或边表标签与属性名称连接起来,例如 Course_course_name 或
Department_total_budget。以下查询显示了 GRAPH_EXPAND TVF 的一些示例输出:
SELECT
College_college_name,
Department_dept_name,
Department_budget,
Course_course_name
FROM
GRAPH_EXPAND("university.SchoolGraph");
/*------------------------+------------------------+-------------------+--------------------+
| College_college_name | Department_dept_name | Department_budget | Course_course_name |
+------------------------+------------------------+-------------------+--------------------+
| College of Engineering | Computer Science | 500000.0 | Intro to CS |
| College of Engineering | Computer Science | 500000.0 | Data Structures |
| College of Engineering | Mechanical Engineering | 400000.0 | Thermodynamics |
| College of Arts | Fine Arts | 200000.0 | Oil Painting |
+------------------------+------------------------+-------------------+--------------------+
GRAPH_EXPAND 函数通过对节点和边表应用一系列 LEFT JOIN 操作来生成扁平化表。有效的输入图表必须
恰好有一个根节点表,即其KEY值未
出现在任何其他表中的表。无法通过一系列联接从根节点表访问的数据不会显示在输出中。在前面的示例中,Course
表是根节点表。由于Research系
没有任何课程,因此从输出中省略了该系。
您无法直接为由测量定义的属性选择列。
相反,您必须将它们封装在
AGG()函数中。
此函数可确保对测量定义的聚合按每个键执行一次。
以下查询同时计算每个学院的总预算和课程总数:
SELECT
College_college_name,
AGG(Department_total_budget) AS college_budget,
AGG(Course_course_count) AS total_courses
FROM
GRAPH_EXPAND("university.SchoolGraph")
GROUP BY
College_college_name;
/*------------------------+----------------+---------------+
| College_college_name | college_budget | total_courses |
+------------------------+----------------+---------------+
| College of Engineering | 900000.0 | 3 |
| College of Arts | 200000.0 | 1 |
+------------------------+----------------+---------------*/
查看 GRAPH_EXPAND 架构
如需查看 GRAPH_EXPAND 函数返回的表的架构,
而无需调用该函数,请使用
BQ.SHOW_GRAPH_EXPAND_SCHEMA 系统过程:
DECLARE schema STRING DEFAULT '';
CALL BQ.SHOW_GRAPH_EXPAND_SCHEMA('university.SchoolGraph', schema);
SELECT schema;
此过程会使用每个列的名称、类型和模式填充 schema 变量。它还会指明列引用的属性是否为测量,并列出您对其定义的任何说明或同义词。