Drizzle ORM - PostgreSQL 全文搜索

Drizzle | PostgreSQL 全文搜索

本指南假定您熟悉

开始使用 PostgreSQL
Select 语句
索引
sql 运算符
您应该使用 [email protected] 和 [email protected] 或更高版本。

本指南演示了如何使用 Drizzle ORM 在 PostgreSQL 中实现全文搜索。全文搜索是一种在文档或一组文档中搜索文本的技术。文档是全文搜索系统中的搜索单位。PostgreSQL 提供了一组用于全文搜索的函数，例如 to_tsvector 和 to_tsquery

to_tsvector 函数将文本文档解析为词元（token），将词元缩减为词素（lexeme），并返回一个 tsvector，其中列出了词素及其在文档中的位置。

import { sql } from 'drizzle-orm';

const db = drizzle(...);

await db.execute(
  sql`select to_tsvector('english', 'Guide to PostgreSQL full-text search with Drizzle ORM')`,
);

[
  {
    to_tsvector: "'drizzl':9 'full':5 'full-text':4
    'guid':1 'orm':10 'postgresql':3 'search':7 'text':6"
  }
]

to_tsquery 函数将关键词转换为规范化的词元，并返回一个 tsquery，它匹配 tsvector 中的词素。@@ 运算符用于直接匹配。

await db.execute(
  sql`select to_tsvector('english', 'Guide to PostgreSQL full-text search with Drizzle ORM')
    @@ to_tsquery('english', 'Drizzle') as match`,
);

[ { match: true } ]

截至目前，Drizzle 还不原生支持 tsvector 类型，因此您需要即时转换 text 列中的数据。为了提高性能，您可以像这样在您的列上创建 GIN 索引：

schema.ts

migration.sql

db_data

import { index, pgTable, serial, text } from 'drizzle-orm/pg-core';

export const posts = pgTable(
  'posts',
  {
    id: serial('id').primaryKey(),
    title: text('title').notNull(),
  },
  (table) => [
    index('title_search_index').using('gin', sql`to_tsvector('english', ${table.title})`),
  ]
);

CREATE TABLE IF NOT EXISTS "posts" (
        "id" serial PRIMARY KEY NOT NULL,
        "title" text NOT NULL
);

CREATE INDEX IF NOT EXISTS "title_search_index" ON "posts"
  USING gin (to_tsvector('english', "title"));

[
  { id: 1, title: 'Planning Your First Trip to Europe' },
  { id: 2, title: "Cultural Insights: Exploring Asia's Heritage" },
  { id: 3, title: 'Top 5 Destinations for a Family Trip' },
  { id: 4, title: 'Essential Hiking Gear for Mountain Enthusiasts' },
  { id: 5, title: 'Trip Planning: Choosing Your Next Destination' },
  { id: 6, title: 'Discovering Hidden Culinary Gems in Italy' },
  { id: 7, title: 'The Ultimate Road Trip Guide for Explorers' },
];

要在 PostgreSQL 中使用 Drizzle ORM 实现全文搜索，您可以结合 sql 运算符使用 to_tsvector 和 to_tsquery 函数。

import { sql } from 'drizzle-orm';
import { posts } from './schema';

const title = 'trip';

await db
  .select()
  .from(posts)
  .where(sql`to_tsvector('english', ${posts.title}) @@ to_tsquery('english', ${title})`);

[
  { id: 1, title: 'Planning Your First Trip to Europe' },
  { id: 3, title: 'Top 5 Destinations for a Family Trip' },
  { id: 5, title: 'Trip Planning: Choosing Your Next Destination' },
  { id: 7, title: 'The Ultimate Road Trip Guide for Explorers' }
]

要匹配任意关键词，您可以使用 | 运算符。

const title = 'Europe | Asia';

await db
  .select()
  .from(posts)
  .where(sql`to_tsvector('english', ${posts.title}) @@ to_tsquery('english', ${title})`);

[
  { id: 1, title: 'Planning Your First Trip to Europe' },
  { id: 2, title: "Cultural Insights: Exploring Asia's Heritage" }
]

要匹配多个关键词，您可以使用 plainto_tsquery 函数。

// 'discover & Italy'
const title = 'discover Italy';

await db
  .select()
  .from(posts)
  .where(sql`to_tsvector('english', ${posts.title}) @@ plainto_tsquery('english', ${title})`);

select * from posts
  where to_tsvector('english', title) @@ plainto_tsquery('english', 'discover Italy');

[ { id: 6, title: 'Discovering Hidden Culinary Gems in Italy' } ]

要匹配短语，您可以使用 phraseto_tsquery 函数。

// if you query by "trip family", it will not return any result
// 'family <-> trip'
const title = 'family trip';

await db
  .select()
  .from(posts)
  .where(sql`to_tsvector('english', ${posts.title}) @@ phraseto_tsquery('english', ${title})`);

select * from posts
  where to_tsvector('english', title) @@ phraseto_tsquery('english', 'family trip');

[ { id: 3, title: 'Top 5 Destinations for a Family Trip' } ]

您也可以使用 websearch_to_tsquery 函数，它是 to_tsquery 的简化版本，采用类似于网络搜索引擎使用的替代语法。

// 'family | first & trip & europ | asia'
const title = 'family or first trip Europe or Asia';

await db
  .select()
  .from(posts)
  .where(sql`to_tsvector('english', ${posts.title}) @@ websearch_to_tsquery('english', ${title})`);

select * from posts
  where to_tsvector('english', title)
  @@ websearch_to_tsquery('english', 'family or first trip Europe or Asia');

[
  { id: 1, title: 'Planning Your First Trip to Europe' },
  { id: 2, title: "Cultural Insights: Exploring Asia's Heritage" },
  { id: 3, title: 'Top 5 Destinations for a Family Trip' }
]

要在多列上实现全文搜索，您可以创建多列索引并使用 to_tsvector 函数连接这些列。

schema.ts

migration.sql

db_data

import { sql } from 'drizzle-orm';
import { index, pgTable, serial, text } from 'drizzle-orm/pg-core';

export const posts = pgTable(
  'posts',
  {
    id: serial('id').primaryKey(),
    title: text('title').notNull(),
    description: text('description').notNull(),
  },
  (table) => [
    index('search_index').using(
      'gin',
      sql`(
          setweight(to_tsvector('english', ${table.title}), 'A') ||
          setweight(to_tsvector('english', ${table.description}), 'B')
      )`,
    ),
  ],
);

CREATE TABLE IF NOT EXISTS "posts" (
      "id" serial PRIMARY KEY NOT NULL,
      "title" text NOT NULL,
      "description" text NOT NULL
);

CREATE INDEX IF NOT EXISTS "search_index" ON "posts"
  USING gin ((setweight(to_tsvector('english', "title"), 'A') ||
  setweight(to_tsvector('english', "description"), 'B')));

[
  {
    id: 1,
    title: 'Planning Your First Trip to Europe',
    description:
      'Get essential tips on budgeting, sightseeing, and cultural etiquette for your inaugural European adventure.',
  },
  {
    id: 2,
    title: "Cultural Insights: Exploring Asia's Heritage",
    description:
      'Dive deep into the rich history and traditions of Asia through immersive experiences and local interactions.',
  },
  {
    id: 3,
    title: 'Top 5 Destinations for a Family Trip',
    description:
      'Discover family-friendly destinations that offer fun, education, and relaxation for all ages.',
  },
  {
    id: 4,
    title: 'Essential Hiking Gear for Mountain Enthusiasts',
    description:
      'Equip yourself with the latest and most reliable gear for your next mountain hiking expedition.',
  },
  {
    id: 5,
    title: 'Trip Planning: Choosing Your Next Destination',
    description:
      'Learn how to select destinations that align with your travel goals, whether for leisure, adventure, or cultural exploration.',
  },
  {
    id: 6,
    title: 'Discovering Hidden Culinary Gems in Italy',
    description:
      "Unearth Italy's lesser-known eateries and food markets that offer authentic and traditional flavors.",
  },
  {
    id: 7,
    title: 'The Ultimate Road Trip Guide for Explorers',
    description:
      'Plan your next great road trip with tips on route planning, packing, and discovering off-the-beaten-path attractions.',
  },
];

setweight 函数用于为 tsvector 的条目标记给定权重，权重是字母 A、B、C 或 D 之一。这通常用于标记来自文档不同部分的条目，例如标题与正文。

您可以通过以下方式在多列上进行查询：

const title = 'plan';

await db.select().from(posts)
  .where(sql`(
      setweight(to_tsvector('english', ${posts.title}), 'A') ||
      setweight(to_tsvector('english', ${posts.description}), 'B'))
      @@ to_tsquery('english', ${title}
    )`
  );

[
  {
    id: 1,
    title: 'Planning Your First Trip to Europe',
    description: 'Get essential tips on budgeting, sightseeing, and cultural etiquette for your inaugural European adventure.'
  },
  {
    id: 5,
    title: 'Trip Planning: Choosing Your Next Destination',
    description: 'Learn how to select destinations that align with your travel goals, whether for leisure, adventure, or cultural exploration.'
  },
  {
    id: 7,
    title: 'The Ultimate Road Trip Guide for Explorers',
    description: 'Plan your next great road trip with tips on route planning, packing, and discovering off-the-beaten-path attractions.'
  }
]

要对搜索结果进行排名，您可以使用 ts_rank 或 ts_rank_cd 函数以及 orderBy 方法。

import { desc, getTableColumns, sql } from 'drizzle-orm';

const search = 'culture | Europe | Italy | adventure';

const matchQuery = sql`(
  setweight(to_tsvector('english', ${posts.title}), 'A') ||
  setweight(to_tsvector('english', ${posts.description}), 'B')), to_tsquery('english', ${search})`;

await db
  .select({
    ...getTableColumns(posts),
    rank: sql`ts_rank(${matchQuery})`,
    rankCd: sql`ts_rank_cd(${matchQuery})`,
  })
  .from(posts)
  .where(
    sql`(
      setweight(to_tsvector('english', ${posts.title}), 'A') ||
      setweight(to_tsvector('english', ${posts.description}), 'B')
      ) @@ to_tsquery('english', ${search})`,
  )
  .orderBy((t) => desc(t.rank));

[
  {
    id: 1,
    title: 'Planning Your First Trip to Europe',
    description: 'Get essential tips on budgeting, sightseeing, and cultural etiquette for your inaugural European adventure.',
    rank: 0.2735672,
    rankCd: 1.8
  },
  {
    id: 6,
    title: 'Discovering Hidden Culinary Gems in Italy',
    description: "Unearth Italy's lesser-known eateries and food markets that offer authentic and traditional flavors.",
    rank: 0.16717994,
    rankCd: 1.4
  },
  {
    id: 2,
    title: "Cultural Insights: Exploring Asia's Heritage",
    description: 'Dive deep into the rich history and traditions of Asia through immersive experiences and local interactions.',
    rank: 0.15198177,
    rankCd: 1
  },
  {
    id: 5,
    title: 'Trip Planning: Choosing Your Next Destination',
    description: 'Learn how to select destinations that align with your travel goals, whether for leisure, adventure, or cultural exploration.',
    rank: 0.12158542,
    rankCd: 0.8
  }
]

ts_rank 函数侧重于查询词在整个文档中的频率。ts_rank_cd 函数侧重于查询词在文档中的接近度。