使用测量

本文档介绍了如何在图表上定义和查询测量。 您可以使用测量来确保在联接中正确执行聚合。

概览

“测量”是节点或边表的 PROPERTIES 子句中定义的聚合属性。 测量是使用 MEASURE 关键字和以下受支持的聚合函数之一定义的:

  • SUM
  • AVG
  • COUNT
  • COUNT(DISTINCT)
  • MIN
  • MAX

测量根据其定义的节点或边表的 KEY 定义其聚合。这意味着,当您查询测量时,即使基础表以导致行重复的方式联接,也会正确执行聚合。

您无法在 GQL 查询中引用由测量定义的属性。相反, 您可以通过调用 GRAPH_EXPAND TVF 来访问测量,以创建图表的扁平化表表示形式。此函数不接受所有类型的图表。如需详细了解 哪些图构成有效输入,请参阅输入限制

您可以对 GRAPH_EXPAND TVF 的输出调用 AGG 函数 ,以根据您在 测量中定义的聚合函数聚合 属性。

定义测量

您可以在节点或边表定义的 PROPERTIES子句 中使用 MEASURE() 关键字(围绕受支持的聚合函数)来定义测量。

以下示例创建了一个名为 university 的数据集,以及学院、系和课程的表:

CREATE SCHEMA IF NOT EXISTS university;

CREATE OR REPLACE TABLE university.College (
  college_id INT64 PRIMARY KEY NOT ENFORCED,
  college_name STRING
);

CREATE OR REPLACE TABLE university.Department (
  dept_id INT64 PRIMARY KEY NOT ENFORCED,
  dept_name STRING,
  college_id INT64,
  budget FLOAT64,
  FOREIGN KEY (college_id) REFERENCES university.College(college_id) NOT ENFORCED
);

CREATE OR REPLACE TABLE university.Course (
  course_id INT64 PRIMARY KEY NOT ENFORCED,
  course_name STRING,
  dept_id INT64,
  credits INT64,
  FOREIGN KEY (dept_id) REFERENCES university.Department(dept_id) NOT ENFORCED
);

INSERT INTO university.College (college_id, college_name)
VALUES (101, 'College of Engineering'),
      (102, 'College of Arts');

INSERT INTO university.Department (dept_id, dept_name, college_id, budget)
VALUES (1001, 'Computer Science', 101, 500000),
      (1002, 'Mechanical Engineering', 101, 400000),
      (1003, 'Fine Arts', 102, 200000),
      (1004, 'Research', 101, 50000);

INSERT INTO university.Course (course_id, course_name, dept_id, credits)
VALUES (1, 'Intro to CS', 1001, 3),
       (2, 'Data Structures', 1001, 4),
       (3, 'Thermodynamics', 1002, 3),
       (4, 'Oil Painting', 1003, 2);

以下语句创建了一个名为 SchoolGraph 的图表,该图表定义了 DepartmentCourse 节点中某些属性的测量。 您必须为由测量定义的属性提供别名。

CREATE OR REPLACE PROPERTY GRAPH university.SchoolGraph
  NODE TABLES (
    university.College
      KEY(college_id)
      PROPERTIES(college_id, college_name),
    university.Department
      KEY(dept_id)
      PROPERTIES(dept_id, dept_name, college_id,
        budget OPTIONS(description="Department budget in USD"),
        MEASURE(SUM(budget)) AS total_budget),
    university.Course
      KEY(course_id)
      PROPERTIES(
        course_id,
        course_name,
        credits,
        dept_id,
        MEASURE(AVG(credits)) AS avg_credits,
        MEASURE(SUM(credits)) AS total_credits,
        MEASURE(COUNT(course_id)) AS course_count)
  )
  EDGE TABLES (
    university.Department AS CollegeDept
      SOURCE KEY (college_id) REFERENCES College (college_id)
      DESTINATION KEY (dept_id) REFERENCES Department (dept_id),
    university.Course AS DeptCourse
      SOURCE KEY (dept_id) REFERENCES Department (dept_id)
      DESTINATION KEY (course_id) REFERENCES Course (course_id)
  );

SchoolGraph 图的可视化效果。

total_budget 测量定义为 MEASURE(SUM(budget))。该测量将聚合锁定到 KEY,即 dept_id

了解重复计数

当您联接表时,如果数据中存在一对多关系,则每次都会重复数据。例如,如果您联接 CourseDepartmentCollege 表,则具有多门课程的系会在输出中的多行中显示:

SELECT
  college_name,
  dept_name,
  course_name,
  budget
FROM university.Course
LEFT JOIN university.Department
  ON Course.dept_id = Department.dept_id
LEFT JOIN university.College
  ON Department.college_id = College.college_id;

/*------------------------+------------------------+-----------------+----------+
 | college_name           | dept_name              | course_name     | budget   |
 +------------------------+------------------------+-----------------+----------+
 | College of Engineering | Computer Science       | Intro to CS     | 500000.0 |
 | College of Engineering | Computer Science       | Data Structures | 500000.0 |
 | College of Engineering | Mechanical Engineering | Thermodynamics  | 400000.0 |
 | College of Arts        | Fine Arts              | Oil Painting    | 200000.0 |
 +------------------------+------------------------+-----------------+----------*/

如果您尝试使用 SUM(budget) 计算每个学院的总预算,则计算机科学系的预算会被计算两次。您可以直接查询 Department 表来避免此问题,但如果您想从有助于联接数据的不同表中计算多个聚合,则此方法不起作用。以下部分介绍了测量如何解决此问题。

查询具有测量的图表

您可以使用 GQL 查询具有测量的图表,但您的查询无法直接引用由测量定义的任何属性。例如,以下查询引用了定义了测量属性的节点,但未使用或返回任何测量字段:

GRAPH university.SchoolGraph
MATCH (c:College)-[]-(d:Department)-[]->(course:Course)
RETURN c.college_name, d.dept_name, course.course_name;

/*------------------------+------------------------+-----------------+
 | college_name           | dept_name              | course_name     |
 +------------------------+------------------------+-----------------+
 | College of Engineering | Computer Science       | Intro to CS     |
 | College of Engineering | Computer Science       | Data Structures |
 | College of Engineering | Mechanical Engineering | Thermodynamics  |
 | College of Arts        | Fine Arts              | Oil Painting    |
 +------------------------+------------------------+-----------------*/

使用测量

如需使用测量,您可以使用 GRAPH_EXPAND 表值函数 (TVF) 将图表查询为单个扁平化表。 输出表中的列派生自图表中为每个节点和边表定义的属性。为防止命名冲突,列的命名方式是将节点或边表标签与属性名称连接起来,例如 Course_course_nameDepartment_total_budget。以下查询显示了 GRAPH_EXPAND TVF 的一些示例输出:

SELECT
  College_college_name,
  Department_dept_name,
  Department_budget,
  Course_course_name
FROM
  GRAPH_EXPAND("university.SchoolGraph");

/*------------------------+------------------------+-------------------+--------------------+
 | College_college_name   | Department_dept_name   | Department_budget | Course_course_name |
 +------------------------+------------------------+-------------------+--------------------+
 | College of Engineering | Computer Science       | 500000.0          | Intro to CS        |
 | College of Engineering | Computer Science       | 500000.0          | Data Structures    |
 | College of Engineering | Mechanical Engineering | 400000.0          | Thermodynamics     |
 | College of Arts        | Fine Arts              | 200000.0          | Oil Painting       |
 +------------------------+------------------------+-------------------+--------------------+

GRAPH_EXPAND 函数通过对节点和边表应用一系列 LEFT JOIN 操作来生成扁平化表。有效的输入图表必须 恰好有一个根节点表,即其KEY值未 出现在任何其他表中的表。无法通过一系列联接从根节点表访问的数据不会显示在输出中。在前面的示例中,Course 表是根节点表。由于Research系 没有任何课程,因此从输出中省略了该系。

您无法直接为由测量定义的属性选择列。 相反,您必须将它们封装在 AGG()函数中。 此函数可确保对测量定义的聚合按每个键执行一次。

以下查询同时计算每个学院的总预算和课程总数:

SELECT
  College_college_name,
  AGG(Department_total_budget) AS college_budget,
  AGG(Course_course_count) AS total_courses
FROM
  GRAPH_EXPAND("university.SchoolGraph")
GROUP BY
  College_college_name;

/*------------------------+----------------+---------------+
 | College_college_name   | college_budget | total_courses |
 +------------------------+----------------+---------------+
 | College of Engineering | 900000.0       | 3             |
 | College of Arts        | 200000.0       | 1             |
 +------------------------+----------------+---------------*/

查看 GRAPH_EXPAND 架构

如需查看 GRAPH_EXPAND 函数返回的表的架构, 而无需调用该函数,请使用 BQ.SHOW_GRAPH_EXPAND_SCHEMA 系统过程

DECLARE schema STRING DEFAULT '';
CALL BQ.SHOW_GRAPH_EXPAND_SCHEMA('university.SchoolGraph', schema);
SELECT schema;

此过程会使用每个列的名称、类型和模式填充 schema 变量。它还会指明列引用的属性是否为测量,并列出您对其定义的任何说明或同义词。

后续步骤