Page tree
Skip to end of metadata
Go to start of metadata

Generally

Most databases nowadays store data in UTF format. In this case you can use these symbols directly in your query.

Example:

SELECT 

    sa_name AS 中國條約

FROM 

    folio_agreements.subscription_agreement sa 

WHERE 

    sa_name = '中國條約'

In order to be able to save this query, the file must be saved in UTF format, otherwise the symbols will not be encoded correctly.

Useful / advanced functions

In order to query the different symbols better, combinations of PSQL functions can be used. The functions can be found in the official documentation from postgres.

Here is a small selection:

The trick is to convert the symbols into a uniform system and thus enable the basis for a better quering. Here it is advisable to convert the symbols into a HEX value. Each symbol has a HEX value. PSQL offers functions for this. PSQL offers the simplest option with the data type bytea.

sa_name::bytea

If you look at the attribute value in Dbeaver, you get the following view. A hexviewer.

The address area is on the left, the hexadecimal area in the middle and the character area on the right. If a symbol is marked in the Hexviewer, the corresponding value can be read off.

Examples

1. Replace the space with another symbol. Space = 20 , the other symbol = E4B8AD (Tables: https://www.utf8-chartable.de/unicode-utf8-table.pl)

convert_from(decode(encode(concat('\x', replace(encode('A B', 'hex'), '20', 'E4B8AD'))::bytea, 'escape'), 'escape'), 'UTF-8')::varchar

2. Find all agreements that have non-Latin symbols in their names

SELECT * FROM 

(

SELECT 

    sa_name, 

    /*

     * 1. delete all latin script symbols with the functions translate() and replace()

     * 2. look at the HEX value at the beginning. If they start with a letter, then you have a title with non-latin symbols

     */

    length(REPLACE(REPLACE(REPLACE(regexp_matches(encode(REPLACE(TRANSLATE(sa_name, 'abcdefghijklmnopqrstuvwxyzäöü+()- ,#+[]ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜß', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '')::bytea, 'hex'), '[a-f]*')::varchar, '{', ''), '}', ''), '"', '')) AS len

FROM 

    folio_agreements.subscription_agreement sa

) AS t

WHERE t.len > 0


  • No labels