US20040049502A1 - Method of indexing and searching feature vector space - Google Patents

Info

Publication number
US20040049502A1
Authority
US
United States
Prior art keywords
node
approximation
indexing
distance
special
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/658,552
Inventor
Yang-lim Choi
Youngsik Huh
B. Manjunath
Shiv Chandrasekaran
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
University of California
Original Assignee
Samsung Electronics Co Ltd
University of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd and University of California
Priority to US10/658,552
Publication of US20040049502A1
Abandoned (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99944 Object-oriented database structure
    • Y10S707/99945 Object-oriented database structure processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99948 Application of database or data structure, e.g. distributed, multimedia, or image

Abstract

A method of indexing a high-dimensional vector space, along with a method of quickly retrieving a feature vector having features similar to a query vector from the vector space indexed by the indexing method, are provided. The method of indexing a feature vector space includes the steps of (a) partitioning the feature vector space into a plurality of approximation regions; (b) selecting an arbitrary approximation region to determine whether the selected approximation region is heavily or sparsely distributed; and (c) if the approximation region is determined to be sparsely distributed, indexing the corresponding approximation region as one special node belonging to a child node of the tree data structure, together with any other sparsely distributed approximation region spaced apart by a distance less than a predetermined distance.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0001]
  • The present invention relates to a method of indexing a feature vector space, and more particularly, to a method of indexing a high-dimensional feature vector space. Furthermore, the present invention relates to a method of quickly searching the feature vector space indexed by the indexing method for a feature vector having features similar to a query vector. [0002]
  • The present application is based on Korean Patent Application No. 00-79180 filed on Dec. 20, 2000, and upon U.S. Provisional Application No. 60/252,391 filed on Nov. 15, 2000, both of which are incorporated herein by reference. [0003]
  • 2. Description of the Related Art [0004]
  • Feature elements such as color or texture of images or motion pictures can be represented by vectors. These vectors are called feature vectors. The feature vectors are indexed in a vector space in which the feature vectors exist such that a feature vector having features similar to a query vector can be found. [0005]
  • According to an ordinary indexing method for a feature vector space, data-partitioning and space-division schemes based on tree data structures, such as the R-tree or the X-tree, are used to index a low-dimensional feature vector space. A vector approximation (VA) approach, in contrast, is used to index a high-dimensional feature vector space; it assumes that vectors having similar features belong to the same hypercube. [0006]
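To make the VA assumption concrete, the following is a minimal sketch rather than a full VA-file implementation. It assumes a uniform grid that splits each dimension of $[0,1]^n$ into $2^b$ equal slices; the function name `va_cell` and the parameter `bits` are illustrative. Vectors whose slice indices agree in every dimension share a hypercube cell.

```python
from typing import Sequence, Tuple


def va_cell(vector: Sequence[float], bits: int) -> Tuple[int, ...]:
    """Return the grid cell of a vector in [0, 1]^n under a uniform
    approximation with 2**bits slices per dimension (assumed grid)."""
    slices = 1 << bits
    cell = []
    for x in vector:
        if not 0.0 <= x <= 1.0:
            raise ValueError("components must lie in [0, 1]")
        # Clamp x == 1.0 into the last slice instead of a nonexistent one.
        cell.append(min(int(x * slices), slices - 1))
    return tuple(cell)


# Two nearby vectors share a cell; a distant one does not.
a = va_cell([0.10, 0.80], bits=2)
b = va_cell([0.20, 0.90], bits=2)
c = va_cell([0.70, 0.30], bits=2)
print(a, b, c)  # (0, 3) (0, 3) (2, 1)
```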
  • However, when feature vectors in a high-dimensional feature vector space are indexed by this ordinary VA-based method, retrieving a feature vector similar to a query vector can take a long time. Thus, there remains a need for an indexing method that reduces the required retrieval time. [0007]
  • SUMMARY OF THE INVENTION
  • To solve the above problems, it is an object of the present invention to provide a method of indexing a feature vector space that can reduce the time required for retrieving a feature vector similar to a query vector by adaptively indexing the feature vector space according to the density of feature vectors. [0008]
  • It is another object of the present invention to provide a method for quickly retrieving from this indexed vector space a feature vector having features similar to a query vector. [0009]
  • Accordingly, to achieve the above objects, the present invention provides a method of indexing a feature vector space using a tree structure, including the step of indexing an approximation region in which feature vector elements are sparsely distributed as one special node belonging to a child node of the tree structure, together with any other sparsely distributed approximation region spaced apart by a distance less than a predetermined distance. [0010]
  • The present invention also provides a method of indexing a feature vector space including the steps of (a) partitioning the feature vector space into a plurality of approximation regions, (b) selecting an arbitrary approximation region to determine whether the selected approximation region is heavily or sparsely distributed, and (c) if the approximation region is determined to be sparsely distributed, indexing the corresponding approximation region as one special node belonging to a child node of the tree structure, together with another sparsely distributed approximation region spaced apart by a distance less than a predetermined distance. Preferably, the steps (b) and (c) are repeatedly performed on all approximation regions partitioned in the step (a). [0011]
  • Furthermore, prior to the step (c), the indexing method further includes the step of (c-1) if the approximation region selected in the step (b) is determined to be heavily distributed, indexing the corresponding approximation region as an ordinary node, partitioning the corresponding approximation region into a plurality of sub-approximation regions, and repeating the step (b) for the partitioned sub-approximation regions. [0012]
  • After the step (c), the indexing method further includes the steps of (d) determining whether all approximation regions are indexed as special nodes, (e) if not all approximation regions are indexed as special nodes, selecting the next approximation region and repeatedly performing the step (b) and the subsequent steps on that approximation region, and (f) if all approximation regions are indexed as special nodes, completing the indexing. The plurality of approximation regions may be subspaces used in random indexing. Alternatively, the plurality of approximation regions may be subspaces used in multi-dimensional scaling (MDS), Fast-map, or locality sensitive hashing. [0013]
  • The step (c) includes the step of (c′) if the approximation region is determined to be sparsely distributed, indexing the corresponding approximation region as one special node belonging to a child node of the tree structure together with an adjacent sparsely distributed approximation region. [0014]
  • The present invention also provides a method of retrieving a feature vector having features similar to a query vector from a vector space indexed by an indexing method using a tree structure, the indexing method including the step of indexing an approximation region in which feature vector elements are sparsely distributed as one special node belonging to a child node of the tree structure, together with another sparsely distributed approximation region spaced apart by a distance less than a predetermined distance. The retrieval method includes the steps of (a) determining a special node to which the query vector belongs, (b) setting the distance between an element of the query vector and an element in an approximation region corresponding to the determined special node, which is the closest to the element of the query vector, as a first threshold value, and (c) excluding all child nodes of the corresponding node if the distance between the query vector and the approximation region indexed as an ordinary node is greater than or equal to the first threshold value. [0015]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The above objects and advantages of the present invention will become more apparent by describing in detail preferred embodiments thereof with reference to the attached drawings in which: [0016]
  • FIG. 1 is a flowchart showing the main steps of a method of indexing a feature vector space according to an embodiment of the present invention; [0017]
  • FIG. 2 shows an example of a feature vector space indexed by the method shown in FIG. 1; [0018]
  • FIG. 3 shows an example of a tree data structure for indexing the feature vector space of FIG. 2; and [0019]
  • FIG. 4 is a flowchart showing the main steps of a method of retrieving, from the feature vector space indexed by the method of FIG. 1, a feature vector having features similar to a query vector. [0020]
  • DETAILED DESCRIPTION OF THE INVENTION
  • FIG. 1 shows a method of indexing a feature vector space according to an embodiment of the present invention. The feature vector space is first partitioned into a plurality of hypercubes (step 102). FIG. 2 shows an example of a feature vector space indexed according to the indexing method of FIG. 1. Each feature vector in the feature vector space may be represented as an n-dimensional vector, where n is a positive integer. In other words, denoting by D a database of N feature vectors, where N is a positive integer, each feature vector may be represented as an n-dimensional vector in the vector space $\mathbb{R}^n$. [0021]
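The partition of step 102 can be sketched as follows. The text does not fix the splitting rule; this sketch assumes each hypercube is halved along every dimension, giving $2^n$ sub-hypercubes, which is consistent with the four sub-hypercubes per split in the FIG. 2 example if that space is two-dimensional. A hypercube is stored as a tuple of intervals $(a_i, b_i)$; the names `Box` and `split` are illustrative.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]


def split(box: Box) -> List[Box]:
    """Halve every interval of the box, giving 2**n sub-hypercubes."""
    halves = []
    for a, b in box:
        mid = (a + b) / 2.0
        halves.append(((a, mid), (mid, b)))
    # The Cartesian product picks one half per dimension for each sub-box.
    return [tuple(choice) for choice in product(*halves)]


root: Box = ((0.0, 1.0), (0.0, 1.0))  # [0, 1]^2
children = split(root)
print(len(children), children[0])  # 4 ((0.0, 0.5), (0.0, 0.5))
```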
  • Although this embodiment is described with reference to an example in which the feature vector space is partitioned into a plurality of hypercubes, the feature vector space may instead be partitioned into the subspaces used by other indexing schemes, such as random indexing, multi-dimensional scaling (MDS), Fast-map, and locality sensitive hashing. [0022]
  • Next, the partitioned hypercubes are indexed as child nodes of a root node using a tree structure (step 104). The tree structure basically has a root node and a plurality of child nodes branched from the root node. FIG. 3 shows an example of a tree data structure for indexing the feature vector space of FIG. 2. [0023]
  • Next, an arbitrary hypercube, which is one of the child nodes of a root node 302, is selected, and it is determined whether the hypercube is heavily or sparsely populated (step 106). Here, the root node 302 is the initial hypercube $[0,1]^n$, and the child nodes of the root node are sub-cubes of $[0,1]^n$. The root node 302 is at level 0, its child nodes are at level 1, and the child nodes of those child nodes are at level 2. [0024]
  • If the selected hypercube is determined in step 106 to be heavily populated, it is indexed as an ordinary node (step 108) and partitioned into a plurality of hypercubes (step 110). On the other hand, if the selected hypercube is determined in step 106 to be sparsely populated, it is indexed as one special node together with adjacent sparsely populated hypercubes (step 120). Although this embodiment indexes a sparsely populated hypercube as one special node together with adjacent sparsely populated hypercubes, the hypercube may instead be indexed as one special node together with another sparsely populated hypercube spaced apart from it by a distance less than a predetermined distance. When a hypercube has been partitioned into a plurality of hypercubes, step 106 and the subsequent steps are repeated for each of the partitioned hypercubes. [0025]
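The grouping rule of step 120 can be sketched as below. The sketch assumes the Euclidean gap between axis-aligned hypercubes as the distance and merges transitively (if A is close to B and B is close to C, all three form one special node); the text does not say whether grouping is transitive. The names `box_gap`, `group_sparse`, and `max_gap` (standing in for the predetermined distance) are hypothetical. Touching hypercubes have a gap of 0, so any positive `max_gap` also covers the "adjacent" case.

```python
import math
from typing import List, Tuple

Box = Tuple[Tuple[float, float], ...]


def box_gap(p: Box, r: Box) -> float:
    """Euclidean gap between two axis-aligned boxes (0 if they touch or overlap)."""
    total = 0.0
    for (pa, pb), (ra, rb) in zip(p, r):
        d = max(0.0, ra - pb, pa - rb)
        total += d * d
    return math.sqrt(total)


def group_sparse(boxes: List[Box], max_gap: float) -> List[List[Box]]:
    """Group sparse boxes whose gap is below max_gap; each group would
    become one special node."""
    groups: List[List[Box]] = []
    for box in boxes:
        # Every existing group that this box is close to gets merged with it.
        near = [g for g in groups if any(box_gap(box, other) < max_gap for other in g)]
        merged = [box]
        for g in near:
            merged.extend(g)
        # Keep the groups that were not merged (compared by identity).
        groups = [g for g in groups if not any(g is n for n in near)]
        groups.append(merged)
    return groups


a = ((0.0, 0.5), (0.0, 0.5))
b = ((0.5, 1.0), (0.0, 0.5))
c = ((0.75, 1.0), (0.75, 1.0))
print([len(g) for g in group_sparse([a, b, c], max_gap=0.01)])  # [2, 1]
print([len(g) for g in group_sparse([a, b, c], max_gap=0.3)])   # [3]
```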
  • In this embodiment, referring to FIGS. 2 and 3, a hypercube 202 determined to be sparsely populated and a hypercube 204 are indexed as a special node 304. On the other hand, a hypercube 206 determined to be heavily populated and a hypercube 208 are indexed as ordinary nodes 306 and 308, respectively. The heavily populated hypercube 206 is then partitioned into sub-hypercubes 206₁, 206₂, 206₃, and 206₄, and it is determined whether each of them is heavily or sparsely populated. The hypercube 206₄, determined to be heavily populated, is indexed as an ordinary node 310, while the hypercubes 206₁, 206₂, and 206₃, determined to be sparsely populated, are indexed together as one special node 312. The hypercube 206₄ is in turn partitioned into sub-hypercubes 206₄₁, 206₄₂, 206₄₃, and 206₄₄, and it is determined whether each of them is heavily or sparsely populated. The cube 206₄₁, determined to be heavily populated, is indexed as an ordinary node and partitioned into sub-hypercubes again; all of the resulting cubes are determined to be sparsely populated and are thus indexed as a special node 314. On the other hand, the cubes 206₄₂, 206₄₃, and 206₄₄ are determined to be sparsely populated and are indexed as a special node 316. [0026]
  • After step 120, it is determined whether all hypercubes have been indexed as special nodes (step 130). If not all hypercubes have been indexed as special nodes, the next hypercube is selected (step 132), and step 106 and the subsequent steps are repeated. If all hypercubes have been indexed as special nodes, indexing is complete. [0027]
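Steps 102 to 132 can be put together in a minimal recursive sketch. Several choices here are assumptions, not taken from the text: "heavily populated" means holding more than `capacity` vectors (the source gives no criterion), each hypercube is halved along every axis, all sparse siblings under one parent are merged into a single special node (a simplification of the adjacency or distance rule), and a `max_level` depth limit stops splitting when many vectors coincide. Recursion plays the role of the "select the next hypercube" loop of steps 130 and 132. All vectors are assumed to lie in $[0,1]^n$. `Node`, `split`, `index_space`, `build_index`, and `special_nodes` are hypothetical names.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]
Point = Tuple[float, ...]


class Node:
    def __init__(self, kind: str, boxes: List[Box], level: int) -> None:
        self.kind = kind                  # "ordinary" or "special"
        self.boxes = boxes                # a special node may span several hypercubes
        self.level = level                # the root [0, 1]^n is level 0
        self.points: List[Point] = []     # feature vectors kept in a special node
        self.children: List["Node"] = []


def split(box: Box) -> List[Box]:
    """Halve every interval, giving 2**n sub-hypercubes (assumed rule)."""
    halves = [((a, (a + b) / 2), ((a + b) / 2, b)) for a, b in box]
    return [tuple(choice) for choice in product(*halves)]


def contains(box: Box, p: Sequence[float]) -> bool:
    return all(a <= x <= b for x, (a, b) in zip(p, box))


def index_space(parent: Node, box: Box, points: List[Point],
                capacity: int, max_level: int) -> None:
    """Classify each sub-hypercube of `box`; recurse into heavy ones and
    collect the sparse ones into one special node under `parent`."""
    child_boxes = split(box)
    buckets: Dict[Box, List[Point]] = {cb: [] for cb in child_boxes}
    for p in points:
        # A vector on a shared face goes to the first sub-hypercube holding it.
        home = next(cb for cb in child_boxes if contains(cb, p))
        buckets[home].append(p)
    level = parent.level + 1
    special = Node("special", [], level)
    for cb in child_boxes:
        members = buckets[cb]
        if len(members) > capacity and level < max_level:   # heavily populated
            ordinary = Node("ordinary", [cb], level)         # step 108
            parent.children.append(ordinary)
            index_space(ordinary, cb, members, capacity, max_level)  # step 110
        else:                                                # sparsely populated
            special.boxes.append(cb)                         # step 120
            special.points.extend(members)
    if special.boxes:
        parent.children.append(special)


def build_index(points: Sequence[Point], n: int,
                capacity: int = 2, max_level: int = 8) -> Node:
    root_box: Box = tuple((0.0, 1.0) for _ in range(n))
    root = Node("ordinary", [root_box], 0)
    index_space(root, root_box, list(points), capacity, max_level)
    return root


def special_nodes(node: Node) -> Iterator[Node]:
    for child in node.children:
        if child.kind == "special":
            yield child
        else:
            yield from special_nodes(child)


pts = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.13), (0.2, 0.2), (0.9, 0.9)]
root = build_index(pts, n=2, capacity=2, max_level=4)
print([c.kind for c in root.children])  # ['ordinary', 'special']
```

In this example the dense lower-left corner is split twice before every remaining sub-hypercube holds at most two vectors, while the three sparse quadrants at level 1 end up in a single special node.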
  • That is, returning to FIG. 2, the heavily populated hypercube 208 is partitioned into sub-hypercubes 208₁, 208₂, 208₃, and 208₄, and it is determined whether each of them is heavily or sparsely populated. The hypercube 208₁, determined to be heavily populated, is indexed as an ordinary node and is partitioned into sub-hypercubes again; all of the resulting cubes are determined to be sparsely populated and are thus indexed as a special node 318. On the other hand, the cubes 208₂, 208₃, and 208₄, determined to be sparsely populated, are indexed as a special node 320. [0028]
  • A method of retrieving a feature vector having features similar to a query vector from the feature vector space indexed by the method described above will now be described. FIG. 4 is a flowchart showing the main steps of the retrieval method for the feature vector space indexed by the method of FIG. 1. Hereinafter, a query vector in the feature vector space $\mathbb{R}^n$ corresponding to the database D is denoted by q, and it is assumed that q lies in the feature space $[0,1]^n$, where n is a positive integer denoting the dimensionality of the feature vectors. Under these assumptions, the following steps are performed to find the element of the database closest to the query vector q, that is, the most similar feature vector. [0029]
  • First, the special node to which the query vector q belongs is determined (step 402). Then, the distance between the query vector q and the element in the hypercube corresponding to the determined special node that is closest to q is set as e (step 404). An arbitrary node is selected among the child nodes of the root node 302, and it is determined whether the selected node is a special or an ordinary node (step 410). In other words, assuming that the query vector q lies in the hypercube $\prod_{i}(a_i, b_i)$ corresponding to a certain node of the index tree, it is determined whether q lies in a heavily populated or a sparsely populated node. If the selected node is determined to be an ordinary node in step 410, the distance $d_{or}$ between q and the hypercube $n_{or}$ indexed as the ordinary node is calculated by Equation (1) (step 412): [0030]

    $$d_{or} = d(q, n_{or}) = \sum_{i} \begin{cases} |q_i - a_i|^2 & \text{if } q_i < a_i \\ 0 & \text{if } a_i \le q_i \le b_i \\ |q_i - b_i|^2 & \text{if } b_i < q_i \end{cases} \qquad (1)$$
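Equation (1) is the usual minimum distance from a point to an axis-aligned box. A minimal sketch follows; like Equation (1), it returns the squared distance (no square root), so any distance compared against it, including the one used to set e, should also be squared. The function name `d_or` mirrors the text; the rest is illustrative.

```python
from typing import Sequence, Tuple


def d_or(q: Sequence[float], box: Sequence[Tuple[float, float]]) -> float:
    """Squared distance from q to the nearest point of prod_i (a_i, b_i),
    summing the three cases of Equation (1) over the dimensions."""
    total = 0.0
    for qi, (ai, bi) in zip(q, box):
        if qi < ai:
            total += (qi - ai) ** 2     # q lies below the interval
        elif qi > bi:
            total += (qi - bi) ** 2     # q lies above the interval
        # a_i <= q_i <= b_i contributes 0
    return total


box = ((0.5, 1.0), (0.0, 0.5))
print(d_or((0.25, 0.25), box))  # 0.0625: only the first coordinate is outside
print(d_or((0.75, 0.25), box))  # 0.0: q lies inside the box
```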
  • Here, an arbitrary interval $a \le x \le b$ may be represented as the pair $(a, b)$, and a hypercube $\prod_{1 \le i \le n} (a_i, b_i)$ may be represented by the sequence of intervals $(a_i, b_i)$. [0031]-[0032]
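With hypercubes stored as interval sequences, testing whether q lies in a node reduces to the per-dimension check $a_i \le q_i \le b_i$. The sketch below uses that check to find the special node containing q (step 402). It assumes special nodes are given as a mapping from a label to the list of hypercubes they cover; `contains`, `find_special_node`, and the labels are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]


def contains(box: Box, q: Sequence[float]) -> bool:
    """True if a_i <= q_i <= b_i for every interval (a_i, b_i) of the box."""
    return len(box) == len(q) and all(a <= x <= b for x, (a, b) in zip(q, box))


def find_special_node(special: Dict[str, List[Box]],
                      q: Sequence[float]) -> Optional[str]:
    """Return the label of the first special node whose hypercubes contain q."""
    for label, boxes in special.items():
        if any(contains(box, q) for box in boxes):
            return label
    return None


special = {
    "s1": [((0.0, 0.5), (0.0, 0.5)), ((0.5, 1.0), (0.0, 0.5))],
    "s2": [((0.5, 1.0), (0.5, 1.0))],
}
print(find_special_node(special, (0.7, 0.2)))  # s1
print(find_special_node(special, (0.7, 0.9)))  # s2
```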
  • Next, it is determined whether $d_{or}$ is less than e (step 414). If $d_{or}$ is less than e, a child node of the corresponding ordinary node is selected (step 416). In this case, if an element that satisfies $d_{or} < e$ exists, e is preferably updated with $d_{or}$ (step 417). In particular, the index or search structure is preferably designed so that the single element having features most similar to the query vector q can be found quickly. On the other hand, if $d_{or}$ is greater than or equal to e, all child nodes of the corresponding node are excluded (step 418). [0033]
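Steps 414 to 418 amount to a branch-and-bound test: a subtree is entered only if the lower bound $d_{or}$ is below e. The sketch below filters candidate ordinary nodes this way, using the squared minimum distance of Equation (1); `mindist_sq` and `children_to_visit` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]


def mindist_sq(q: Sequence[float], box: Box) -> float:
    # Per dimension: a_i - q_i if q is below the interval, q_i - b_i if above, else 0.
    return sum(max(a - x, 0.0, x - b) ** 2 for x, (a, b) in zip(q, box))


def children_to_visit(q: Sequence[float], child_boxes: List[Box],
                      e: float) -> List[Box]:
    """Step 416: keep nodes whose lower bound is below e.
    Step 418: nodes with d_or >= e are dropped together with their subtrees."""
    return [box for box in child_boxes if mindist_sq(q, box) < e]


q = (0.1, 0.1)
boxes = [((0.0, 0.5), (0.0, 0.5)), ((0.5, 1.0), (0.0, 0.5)), ((0.0, 0.5), (0.5, 1.0))]
print(children_to_visit(q, boxes, e=0.05))  # only the first box survives
```

One design caution: $d_{or}$ is a lower bound on the distance to anything inside the hypercube, not the distance to an actual element. Read literally, tightening e with $d_{or}$ as in step 417 could therefore exclude the true nearest neighbour, so exact nearest-neighbour sketches usually update e only with distances to real elements.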
  • If the selected node is determined to be a special node in step 410, the hypercube space corresponding to the node is converted into a low-dimensional space (step 420). That is, if the query vector q is determined to lie in a sparsely populated node in step 410, q is projected into a low-dimensional subspace corresponding to the query vector q. [0034]
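The text does not say how the low-dimensional subspace of step 420 is chosen. One simple possibility, sketched here purely as an assumption, is to keep the k coordinates along which the node's vectors vary most. Because this projection only drops coordinates, the squared distance between projected vectors never exceeds the full squared distance, so it can act as a safe filter before exact distances are computed. The names `choose_dims`, `project`, and `k` are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def choose_dims(vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k coordinates with the largest variance in the node."""
    n = len(vectors[0])
    spread = [pvariance([v[i] for v in vectors]) for i in range(n)]
    return sorted(range(n), key=lambda i: spread[i], reverse=True)[:k]


def project(x: Sequence[float], dims: Sequence[int]) -> Tuple[float, ...]:
    return tuple(x[i] for i in dims)


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


node_vectors = [(0.1, 0.5, 0.9), (0.2, 0.5, 0.1), (0.3, 0.5, 0.5)]
dims = choose_dims(node_vectors, k=2)
q = (0.15, 0.9, 0.2)
for v in node_vectors:
    # The projected distance is a lower bound on the full distance.
    assert sq_dist(project(q, dims), project(v, dims)) <= sq_dist(q, v)
print(dims)  # [2, 0]
```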
  • Next, the distance $d_{sp}$ between the query vector q and the hypercube $n_{sp}$ indexed as the special node is calculated by Equation (2) as the minimum distance from q to the elements v in $n_{sp}$ (step 422): [0035]

    $$d_{sp} = d(q, n_{sp}) = \min_{v \in n_{sp}} d(q, v) \qquad (2)$$
  • Next, elements that satisfy $d_{sp} < e$ are determined to be candidate elements (step 424). In this case, although not shown in FIG. 4, it is preferable that e be updated with $d_{sp}$ if an element that satisfies $d_{sp} < e$ exists (step 426). In particular, it is preferable that an index or search structure be designed such that only one element having features most similar to the query vector q can be quickly found. [0036]
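Steps 422 to 426 can be sketched as a single scan over the elements of a special node: each element closer than e becomes a candidate and tightens e, so after the scan e equals $d_{sp}$ from Equation (2) whenever $d_{sp}$ was below the incoming e. Squared Euclidean distance is assumed, to match Equation (1); `scan_special_node` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def scan_special_node(q: Point, elements: List[Point],
                      e: float) -> Tuple[List[Tuple[float, Point]], float]:
    """Return (distance, element) candidates closer than e, plus the tightened e."""
    candidates: List[Tuple[float, Point]] = []
    for v in elements:
        d = sq_dist(q, v)
        if d < e:                    # step 424: candidate element
            candidates.append((d, v))
            e = d                    # step 426: tighten the threshold
    return candidates, e


cands, e = scan_special_node((0.0, 0.0), [(0.5, 0.0), (0.9, 0.0), (0.1, 0.0)], e=1.0)
print([v for _, v in cands], round(e, 4))  # [(0.5, 0.0), (0.1, 0.0)] 0.01
```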
  • Next, it is determined whether all special nodes have been searched (step 430). If not all special nodes have been searched, the next node is selected (step 432), and step 410 and the subsequent steps are performed recursively. If all special nodes have been searched, a predetermined number of elements are determined to be the finally found elements (step 440). [0037]
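Putting steps 402 to 440 together, the following is a minimal exact nearest-neighbour sketch over a hand-built tree. It rests on several assumptions the text does not fix: squared Euclidean distances throughout, a depth-first traversal, no low-dimensional projection (step 420 is skipped), a single nearest element returned instead of a predetermined number, and e tightened only with distances to real elements. `Node`, `find_special`, `nearest`, and the example tree are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]
Point = Tuple[float, ...]


@dataclass
class Node:
    kind: str                                   # "ordinary" or "special"
    boxes: List[Box]                            # hypercubes covered by the node
    points: List[Point] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mindist_sq(q: Sequence[float], box: Box) -> float:
    return sum(max(a - x, 0.0, x - b) ** 2 for x, (a, b) in zip(q, box))


def contains(box: Box, q: Sequence[float]) -> bool:
    return all(a <= x <= b for x, (a, b) in zip(q, box))


def find_special(node: Node, q: Point) -> Optional[Node]:
    """Step 402: descend through the children whose hypercubes contain q."""
    for child in node.children:
        if any(contains(b, q) for b in child.boxes):
            return child if child.kind == "special" else find_special(child, q)
    return None


def nearest(root: Node, q: Point) -> Optional[Point]:
    best: Optional[Point] = None
    e = math.inf
    seed = find_special(root, q)
    if seed is not None:                        # step 404: initial threshold e
        for v in seed.points:
            d = sq_dist(q, v)
            if d < e:
                best, e = v, d
    stack = list(root.children)
    while stack:                                # steps 410-432
        node = stack.pop()
        if node.kind == "ordinary":
            if mindist_sq(q, node.boxes[0]) < e:    # steps 412-416
                stack.extend(node.children)
            # otherwise step 418: the node's whole subtree is skipped
        elif node is not seed:                  # steps 422-426
            for v in node.points:
                d = sq_dist(q, v)
                if d < e:
                    best, e = v, d
    return best


s1 = Node("special", [((0.0, 0.5), (0.0, 0.5))], [(0.1, 0.1), (0.4, 0.45)])
node_a = Node("ordinary", [((0.0, 0.5), (0.0, 0.5))], children=[s1])
s2 = Node("special", [((0.5, 1.0), (0.0, 0.5)), ((0.0, 0.5), (0.5, 1.0)),
                      ((0.5, 1.0), (0.5, 1.0))], [(0.55, 0.3), (0.9, 0.9)])
root = Node("ordinary", [((0.0, 1.0), (0.0, 1.0))], children=[node_a, s2])
print(nearest(root, (0.48, 0.3)))   # (0.55, 0.3), found outside q's own special node
print(nearest(root, (0.95, 0.95)))  # (0.9, 0.9); ordinary node node_a is pruned
```

The first query shows why the search continues after step 404: the closest element lies in a neighbouring special node. In the second, the lower bound of `node_a` already exceeds e, so its subtree is never visited.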
  • [0038] According to the above retrieval method, the threshold value e is first determined as the closest distance between the query vector q and an element in the hypercube corresponding to the special node to which q belongs. If the distance d_or between q and a hypercube indexed as an ordinary node is greater than or equal to e, all child nodes of that node are excluded from the similarity search, and the hypercubes of all nodes branching from the excluded child nodes are no longer used in the similarity measurement for q. Furthermore, retrieval becomes faster because the threshold value e used in the similarity measurement is repeatedly updated.
  • [0039] That is, a high-dimensional feature vector space indexed by the method of indexing a feature vector space according to the present invention can support functions such as similarity search, retrieval, and browsing in a scalable and efficient manner. Thus, even as the size of a database increases, the time required for similarity search and retrieval grows comparatively slowly.
  • [0040] Furthermore, the method of indexing and searching a feature vector space according to the present invention can be written as a program executed on a personal or server computer. The program codes and code segments constituting the program can be easily inferred by computer programmers skilled in the art. The program can be stored in a computer-readable recording medium; examples of the recording medium include a magnetic recording medium, an optical recording medium, and a radio medium.

Claims (17)

What is claimed is:
1. A method of indexing a feature vector space using a tree structure, the method comprising the step of indexing an approximation region in which feature vector elements are sparsely distributed as one special node belonging to a child node of the tree data structure, together with another sparsely distributed approximation region spaced apart by a distance less than a predetermined distance.
2. A method of indexing a feature vector space comprising the steps of:
(a) partitioning the feature vector space into a plurality of approximation regions;
(b) selecting an arbitrary approximation region to determine whether the selected approximation region is heavily or sparsely distributed; and
(c) if the approximation region is determined to be sparsely distributed, indexing the corresponding approximation region as one special node belonging to a child node of the tree data structure, together with any other sparsely distributed approximation region spaced apart by a distance less than a predetermined distance.
3. The method of claim 2, wherein the steps (b) and (c) are repeatedly performed on all approximation regions partitioned in the step (a).
4. The method of claim 2, prior to the step (c), further comprising the step of:
(c-1) if the approximation region selected in the step (b) is determined to be heavily distributed, indexing the corresponding approximation region as an ordinary node, partitioning the corresponding approximation region into a plurality of sub-approximation regions, and repeating the step (b) for the partitioned sub-approximation regions.
5. The method of claim 4, wherein the steps (b) and (c) are performed on all approximation regions partitioned in the step (a).
6. The method of claim 2, after the step (c), further comprising the steps of:
(d) determining whether all approximation regions are indexed as special nodes;
(e) if not all approximation regions are indexed as special nodes, selecting the next approximation region and repeatedly performing the steps from (b) onward on that approximation region; and
(f) if all approximation regions are indexed as special nodes, completing the indexing.
7. The method of claim 2, wherein the plurality of approximation regions are subspaces used in random indexing.
8. The method of claim 2, wherein the plurality of approximation regions are subspaces used in multi-dimensional scaling (MDS), Fast-map, or locality sensitive hashing.
9. The method of claim 2, wherein the step (c) comprises the step of:
(c′) if the approximation region is determined to be sparsely distributed, indexing the corresponding approximation region as one special node belonging to a child node of the tree data structure together with an adjacent sparsely distributed approximation region.
10. A method of retrieving a feature vector having features similar to a query vector from a vector space indexed by an indexing method using a tree structure including the step of indexing an approximation region in which feature vector elements are sparsely distributed as one special node belonging to a child node of the tree data structure, together with another sparsely distributed approximation region spaced apart by a distance less than a predetermined distance, the retrieval method comprising the steps of:
(a) determining a special node to which the query vector belongs;
(b) setting the distance between an element of the query vector and an element in an approximation region corresponding to the determined special node, which is the closest to the element of the query vector, as a first threshold value; and
(c) excluding all child nodes of the corresponding node if the distance between the query vector and the approximation region indexed as an ordinary node is greater than or equal to the first threshold value.
11. The method of claim 10, prior to the step (c) further comprising the step of:
(c′) selecting an arbitrary node among child nodes of a root node and determining whether the selected node is a special or ordinary node.
12. The method of claim 11, wherein the step (c) comprises the steps of:
(c-1) if the selected node is determined to be an ordinary node in the step (c′), calculating the distance d_or between the query vector q and the approximation region n_or indexed as the ordinary node according to the following equation:
$$
d_{or} = d(q, n_{or}) = \sum_i \begin{cases} |q_i - a_i|^2 & \text{if } q_i < a_i \\ 0 & \text{if } a_i \le q_i \le b_i \\ |q_i - b_i|^2 & \text{if } b_i < q_i \end{cases}
$$
(c-2) determining whether the distance d_or between the query vector q and the approximation region n_or indexed as the ordinary node is less than the first threshold value;
(c-3) if the distance d_or between the query vector q and the approximation region n_or indexed as the ordinary node is less than the first threshold value, selecting child nodes of the corresponding node; and
(c-4) if the distance d_or between the query vector q and the approximation region n_or indexed as the ordinary node is greater than or equal to the first threshold value, excluding all child nodes of the corresponding node.
13. The method of claim 12, after the step (c-2), further comprising the step of updating the first threshold value with the distance d_or if the distance d_or is less than the first threshold value.
14. The method of claim 11, after the step (c′), further comprising the step of if the selected node is determined to be a special node in the step (c′), converting a space of approximation region corresponding to the special node into a low-dimensional space.
15. The method of claim 12, after the step (c′), further comprising the steps of:
(c-5) if the node selected in the step (c′) is determined to be a special node, converting a space of approximation region corresponding to the node into a low-dimensional space;
(c-6) calculating the distance d_sp between the query vector q and each element v in the approximation region n_sp indexed as the special node according to the following equation:
$$
d_{sp} = d(q, n_{sp}) = \min_{v \in n_{sp}} d(q, v)
$$
; and
(c-7) determining elements that satisfy the requirement of the distance d_sp being less than the first threshold value to be candidate elements.
16. The method of claim 15, after the step (c-7), further comprising the step of updating the first threshold value with the distance d_sp if an element satisfying the requirement of the distance d_sp being less than the first threshold value exists.
17. The method of claim 12, after the step (c-4), further comprising the steps of:
determining whether all special nodes have been searched;
if not all special nodes have been searched, selecting the next node and repeatedly performing the steps from (c-1) onward; and
determining a predetermined number of elements as finally found elements if all special nodes have been searched.
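The indexing method of claims 1 to 6 can be illustrated with a small recursive sketch. Everything concrete in it is an assumption made for illustration, not taken from the claims: approximation regions are produced by halving every dimension of the parent hypercube, a region counts as heavily distributed when it holds more than `dense_threshold` elements, empty regions are skipped, the distance between two regions is the Euclidean gap between their boxes, and sparse regions are grouped greedily (a region joins the first special node that already holds a region closer than `max_gap`). The names `build`, `partition`, and `box_gap` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


@dataclass
class SpecialNode:
    boxes: List[Box]  # sparse approximation regions grouped together
    elements: List[List[float]]


@dataclass
class OrdinaryNode:
    box: Box
    children: list = field(default_factory=list)


def partition(points: List[List[float]], box: Box) -> List[Tuple[Box, List[List[float]]]]:
    """Step (a): split box at its midpoints; return only the non-empty sub-regions."""
    lo, hi = box
    mids = [(x + y) / 2 for x, y in zip(lo, hi)]
    cells: dict = {}
    for v in points:
        key = tuple(0 if x < m else 1 for x, m in zip(v, mids))
        cells.setdefault(key, []).append(v)
    out = []
    for key, pts in cells.items():
        sub_lo = [a if k == 0 else m for a, m, k in zip(lo, mids, key)]
        sub_hi = [m if k == 0 else b for m, b, k in zip(mids, hi, key)]
        out.append(((sub_lo, sub_hi), pts))
    return out


def box_gap(b1: Box, b2: Box) -> float:
    """Euclidean distance between two boxes (0 if they touch or overlap)."""
    (lo1, hi1), (lo2, hi2) = b1, b2
    return math.sqrt(sum(max(0.0, lo_b - hi_a, lo_a - hi_b) ** 2
                         for lo_a, hi_a, lo_b, hi_b in zip(lo1, hi1, lo2, hi2)))


def build(points: List[List[float]], box: Box, dense_threshold: int, max_gap: float,
          depth: int = 0, max_depth: int = 8) -> list:
    ordinary, sparse = [], []
    for sub_box, pts in partition(points, box):
        if len(pts) > dense_threshold and depth < max_depth:  # step (b): heavily distributed
            node = OrdinaryNode(sub_box)  # claim 4: ordinary node, partitioned again
            node.children = build(pts, sub_box, dense_threshold, max_gap, depth + 1, max_depth)
            ordinary.append(node)
        else:
            sparse.append((sub_box, pts))
    specials: List[SpecialNode] = []
    for sub_box, pts in sparse:  # step (c): group nearby sparse regions
        for node in specials:
            if any(box_gap(sub_box, other) < max_gap for other in node.boxes):
                node.boxes.append(sub_box)
                node.elements.extend(pts)
                break
        else:
            specials.append(SpecialNode([sub_box], list(pts)))
    return ordinary + specials


# Usage: a dense cluster near (1, 1) plus two isolated points in adjacent regions.
pts = [[1.0, 1.0], [1.2, 1.0], [1.0, 1.3], [1.1, 1.1], [0.9, 0.8], [7.0, 7.0], [7.0, 1.0]]
tree = build(pts, ([0.0, 0.0], [8.0, 8.0]), dense_threshold=3, max_gap=0.5)
print([type(n).__name__ for n in tree])  # ['OrdinaryNode', 'SpecialNode']
```

In the usage example the two isolated points sit in regions that share a boundary, so their regions are merged into a single special node even though the points themselves are far apart; the `max_depth` cap keeps the recursion finite when many elements coincide.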
US10/658,552 2000-11-15 2003-09-10 Method of indexing and searching feature vector space Abandoned US20040049502A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/658,552 US20040049502A1 (en) 2000-11-15 2003-09-10 Method of indexing and searching feature vector space

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US25239100P 2000-11-15 2000-11-15
KR00-79180 2000-12-20
KR10-2000-0079180A KR100429792B1 (en) 2000-11-15 2000-12-20 Indexing method of feature vector space and retrieval method
US09/794,401 US6745205B2 (en) 2000-11-15 2001-02-28 Method of indexing and searching feature vector space
US10/658,552 US20040049502A1 (en) 2000-11-15 2003-09-10 Method of indexing and searching feature vector space

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/794,401 Division US6745205B2 (en) 2000-11-15 2001-02-28 Method of indexing and searching feature vector space

Publications (1)

Publication Number Publication Date
US20040049502A1 true US20040049502A1 (en) 2004-03-11

Family

ID=22955818

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/794,401 Expired - Lifetime US6745205B2 (en) 2000-11-15 2001-02-28 Method of indexing and searching feature vector space
US10/658,552 Abandoned US20040049502A1 (en) 2000-11-15 2003-09-10 Method of indexing and searching feature vector space

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/794,401 Expired - Lifetime US6745205B2 (en) 2000-11-15 2001-02-28 Method of indexing and searching feature vector space

Country Status (2)

Country Link
US (2) US6745205B2 (en)
KR (1) KR100429792B1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140172869A1 (en) * 2012-12-19 2014-06-19 International Business Machines Corporation Indexing of large scale patient set
US11436228B2 (en) * 2017-03-30 2022-09-06 Odd Concepts Inc. Method for encoding based on mixture of vector quantization and nearest neighbor search using thereof

Families Citing this family (22)

Publication number Priority date Publication date Assignee Title
JP3950718B2 (en) * 2001-03-19 2007-08-01 株式会社リコー Image space display method
KR100446639B1 (en) * 2001-07-13 2004-09-04 한국전자통신연구원 Apparatus And Method of Cell-based Indexing of High-dimensional Data
US7386561B1 (en) * 2002-02-06 2008-06-10 Ncr Corp. Partitioned joins of spatial objects in a database system
JP3974511B2 (en) * 2002-12-19 2007-09-12 インターナショナル・ビジネス・マシーンズ・コーポレーション Computer system for generating data structure for information retrieval, method therefor, computer-executable program for generating data structure for information retrieval, computer-executable program for generating data structure for information retrieval Stored computer-readable storage medium, information retrieval system, and graphical user interface system
US8345988B2 (en) * 2004-06-22 2013-01-01 Sri International Method and apparatus for recognizing 3-D objects
US8024337B1 (en) * 2004-09-29 2011-09-20 Google Inc. Systems and methods for determining query similarity by query distribution comparison
KR100620125B1 (en) * 2005-07-18 2006-09-06 인하대학교 산학협력단 System and method for a index reorganization using a part index transfer in spatial data warehouse
US20090101042A1 (en) * 2006-08-30 2009-04-23 Glyde-Rail Licensing, Llc Apparatus for enabling an excavator to mount, demount and travel on railroad tracks
KR100900497B1 (en) * 2007-09-19 2009-06-03 한국과학기술원 Method of Vector Quantization and Computer Program Electronic Recording Medium for the Method
JP5121367B2 (en) * 2007-09-25 2013-01-16 株式会社東芝 Apparatus, method and system for outputting video
KR100925294B1 (en) * 2007-10-24 2009-11-04 유민구 Searching system and its method for using tag data and cube structure of information
JP4675995B2 (en) * 2008-08-28 2011-04-27 株式会社東芝 Display processing apparatus, program, and display processing method
JP5388631B2 (en) * 2009-03-03 2014-01-15 株式会社東芝 Content presentation apparatus and method
JP4852119B2 (en) * 2009-03-25 2012-01-11 株式会社東芝 Data display device, data display method, and data display program
US8510659B2 (en) * 2009-08-14 2013-08-13 Oracle International Corporation Analytical previewing of multi-dimensional sales territory proposals
US8244767B2 (en) * 2009-10-09 2012-08-14 Stratify, Inc. Composite locality sensitive hash based processing of documents
US9355171B2 (en) * 2009-10-09 2016-05-31 Hewlett Packard Enterprise Development Lp Clustering of near-duplicate documents
AU2012202352A1 (en) * 2012-04-20 2013-11-07 Canon Kabushiki Kaisha Method, system and apparatus for determining a hash code representing a portion of an image
US9893950B2 (en) * 2016-01-27 2018-02-13 International Business Machines Corporation Switch-connected HyperX network
US11106708B2 (en) 2018-03-01 2021-08-31 Huawei Technologies Canada Co., Ltd. Layered locality sensitive hashing (LSH) partition indexing for big data applications
KR102595508B1 (en) 2018-12-11 2023-10-31 삼성전자주식회사 Electronic apparatus and control method thereof
CN116541420B (en) * 2023-07-07 2023-09-15 上海爱可生信息技术股份有限公司 Vector data query method

Citations (3)

Publication number Priority date Publication date Assignee Title
US6122628A (en) * 1997-10-31 2000-09-19 International Business Machines Corporation Multidimensional data clustering and dimension reduction for indexing and searching
US6411953B1 (en) * 1999-01-25 2002-06-25 Lucent Technologies Inc. Retrieval and matching of color patterns based on a predetermined vocabulary and grammar
US6584465B1 (en) * 2000-02-25 2003-06-24 Eastman Kodak Company Method and system for search and retrieval of similar patterns

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US5647058A (en) * 1993-05-24 1997-07-08 International Business Machines Corporation Method for high-dimensionality indexing in a multi-media database
KR100233365B1 (en) * 1996-12-13 1999-12-01 윤덕용 Hg-tree index structure and method of inserting and deleting and searching it
JPH10301937A (en) * 1997-04-23 1998-11-13 Nippon Telegr & Teleph Corp <Ntt> Neighborhood retrieval method inside multi-dimensional vector space and recording medium for the program
US6141655A (en) * 1997-09-23 2000-10-31 At&T Corp Method and apparatus for optimizing and structuring data by designing a cube forest data structure for hierarchically split cube forest template
KR100309671B1 (en) * 1997-12-11 2001-12-17 오길록 Device and method for forming vector data phase relationship using r*-tree spatial index
KR100282640B1 (en) * 1998-07-01 2001-02-15 윤덕용 A database management method using partitioned minimum bounding rectangle
KR100282608B1 (en) * 1998-11-17 2001-02-15 정선종 How to Configure Spatial Indexes to Support Similarity Search

Cited By (7)

Publication number Priority date Publication date Assignee Title
US20140172869A1 (en) * 2012-12-19 2014-06-19 International Business Machines Corporation Indexing of large scale patient set
US20150293956A1 (en) * 2012-12-19 2015-10-15 International Business Machines Corporation Indexing of large scale patient set
US9355105B2 (en) * 2012-12-19 2016-05-31 International Business Machines Corporation Indexing of large scale patient set
US20160188699A1 (en) * 2012-12-19 2016-06-30 International Business Machines Corporation Indexing of large scale patient set
US10242085B2 (en) * 2012-12-19 2019-03-26 International Business Machines Corporation Indexing of large scale patient set
US10394850B2 (en) * 2012-12-19 2019-08-27 International Business Machines Corporation Indexing of large scale patient set
US11436228B2 (en) * 2017-03-30 2022-09-06 Odd Concepts Inc. Method for encoding based on mixture of vector quantization and nearest neighbor search using thereof

Also Published As

Publication number Publication date
US6745205B2 (en) 2004-06-01
KR100429792B1 (en) 2004-05-03
KR20020038438A (en) 2002-05-23
US20020085011A1 (en) 2002-07-04

Similar Documents

Publication Publication Date Title
US6745205B2 (en) Method of indexing and searching feature vector space
US6529891B1 (en) Automatic determination of the number of clusters by mixtures of bayesian networks
US9842141B2 (en) Range query methods and apparatus
US6148295A (en) Method for computing near neighbors of a query point in a database
US7882109B2 (en) Computer representation of a data tree structure and the associated encoding/decoding methods
US6084595A (en) Indexing method for image search engine
EP1025514B1 (en) Multidimensional data clustering and dimension reduction for indexing and searching
US7546293B2 (en) Relevance maximizing, iteration minimizing, relevance-feedback, content-based image retrieval (CBIR)
US6003029A (en) Automatic subspace clustering of high dimensional data for data mining applications
US7885960B2 (en) Community mining based on core objects and affiliated objects
US8762390B2 (en) Query specific fusion for image retrieval
US20060218138A1 (en) System and method for improving search relevance
US20060036564A1 (en) System and method for graph indexing
US20100106713A1 (en) Method for performing efficient similarity search
US20050114331A1 (en) Near-neighbor search in pattern distance spaces
CN103339624A (en) High efficiency prefix search algorithm supporting interactive, fuzzy search on geographical structured data
US20060271532A1 (en) Matching pursuit approach to sparse Gaussian process regression
EP1207464A2 (en) Database indexing using a tree structure
US20020123987A1 (en) Nearest neighbor data method and system
US20080133496A1 (en) Method, computer program product, and device for conducting a multi-criteria similarity search
US6910030B2 (en) Adaptive search method in feature vector space
Ng A maximum likelihood ratio information retrieval model
KR100786675B1 (en) Data indexing and similar vector searching method in high dimensional vector set based on hierarchical bitmap indexing for multimedia database
CN113688702A (en) Streetscape image processing method and system based on fusion of multiple features
KR100902010B1 (en) Effcient similarity search method for content based multimedia retrieval with relevance feedback

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION