Oracle Text fuzzy index vs utl_match.jaro_winkler_similarity

Dear all,

I am trying out the utl_match.jaro_winkler_similarity function in my code for search scenarios.
I replaced the Oracle Text index CONTAINS fuzzy query with utl_match.jaro_winkler_similarity.
utl_match.jaro_winkler_similarity (string1, string2) >= 77
returns more candidate values than the Oracle Text fuzzy index. It also returns results quickly compared to the fuzzy text index. More importantly, it retrieves rows that the Oracle Text fuzzy index does not return with the default score of 60. Fuzzy does not work for numeric values, whereas utl_match.jaro_winkler_similarity retrieves numeric values successfully.

I have two questions.
1. Can I use the utl_match.jaro_winkler_similarity function instead of the Oracle Text index fuzzy search? Are there any restrictions when using this utl_match function?
2. Can I create a function-based index on utl_match.jaro_winkler_similarity to improve performance?

Kind regards
Suresh.

1. Can I use the utl_match.jaro_winkler_similarity function instead of the Oracle Text index fuzzy search? Are there any restrictions when using this utl_match function?

You can use it, but it will probably be slow on its own. I would use Oracle Text CONTAINS with fuzzy in a subquery to limit the rows first, using an index, and then use utl_match.jaro_winkler_similarity in an outer query to limit them further. I would not use the default values for fuzzy; instead, I would choose values that make sure it does not eliminate anything that should be kept. Fuzzy is intended to find similar spellings. Jaro-Winkler is intended to find similar names. If you have strings that include numbers, then I would use utl_match.edit_distance_similarity instead of Jaro-Winkler.
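
The two-step approach described above could look roughly like the following sketch. The table customers, its column name, the index name, and the bind variable :term are assumptions made up for illustration, and :term is assumed to be a single word without Oracle Text special characters. The fuzzy settings (score 1, 5000 expansions) are simply the loosest ones, picked so that the index step discards as little as possible; the threshold 77 is the one from the question.

```sql
-- Hypothetical table and CONTEXT index, for illustration only.
CREATE TABLE customers (id NUMBER PRIMARY KEY, name VARCHAR2(100));

CREATE INDEX customers_name_ctx ON customers (name)
  INDEXTYPE IS CTXSYS.CONTEXT;

-- Step 1 (inline view): a permissive fuzzy search uses the text index
-- to cut down the candidate rows.
-- Step 2 (outer query): Jaro-Winkler filters the remaining candidates.
SELECT name, jw
FROM  (SELECT c.name,
              UTL_MATCH.JARO_WINKLER_SIMILARITY(c.name, :term) AS jw
       FROM   customers c
       WHERE  CONTAINS(c.name,
                       'FUZZY(' || :term || ', 1, 5000, WEIGHT)', 1) > 0)
WHERE  jw >= 77;

-- For strings that contain digits, compare both measures on your own
-- data before choosing one (the literals here are made-up samples).
SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY('AB12345', 'AB12354') AS jw,
       UTL_MATCH.EDIT_DISTANCE_SIMILARITY('AB12345', 'AB12354') AS ed
FROM   dual;
```

Whether the first step really keeps every row you care about depends on the data, so it is worth checking a sample against a plain utl_match scan.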

2. Can I create a function-based index on utl_match.jaro_winkler_similarity to improve performance?

Yes, if the two strings that you compare are in the same row of the same table. You can also create a materialized view and an index on that view.
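
As a minimal sketch of such a function-based index, assuming a hypothetical table name_pairs that holds both strings in the same row: a function-based index needs a function declared DETERMINISTIC, so this example goes through a small wrapper (jw_sim, also a made-up name) rather than relying on how the built-in function is declared.

```sql
-- Hypothetical table: both strings to compare live in the same row.
CREATE TABLE name_pairs (
  id    NUMBER PRIMARY KEY,
  name1 VARCHAR2(100),
  name2 VARCHAR2(100)
);

-- Wrapper declared DETERMINISTIC so that it can be indexed.
CREATE OR REPLACE FUNCTION jw_sim (p_a IN VARCHAR2, p_b IN VARCHAR2)
  RETURN NUMBER DETERMINISTIC
IS
BEGIN
  RETURN UTL_MATCH.JARO_WINKLER_SIMILARITY(p_a, p_b);
END jw_sim;
/

-- The index stores the precomputed similarity for every row.
CREATE INDEX name_pairs_jw_idx ON name_pairs (jw_sim(name1, name2));

-- The predicate must use the same expression as the index definition.
SELECT id, name1, name2
FROM   name_pairs
WHERE  jw_sim(name1, name2) >= 77;
```

This only helps when both arguments come from the row itself; it cannot speed up a comparison against a search term that is supplied at query time.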

Tags: Database

Similar Questions

  • Is an Oracle Text indexed view possible

    Is an Oracle Text indexed view possible?

    It looks like you are indexing the file name as part of a MULTI_COLUMN_DATASTORE but expecting the contents of the file to be read. How would Oracle Text know that the file name refers to a file that must be read, while the other columns hold text to be indexed directly?

    To index the contents of a file, you must use the FILE_DATASTORE. There is no direct way to concatenate the contents of a file with plain text columns. If you need to do this, you have to use a USER_DATASTORE, read the contents of the file in the datastore procedure, and filter it with CTX_DOC.POLICY_FILTER.
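
    A rough sketch of the FILE_DATASTORE variant mentioned above, under assumptions: the table docs, the preference name my_file_ds, and the directory /data/docs are hypothetical, and the files are assumed to be readable by the database server.

    ```sql
    -- Hypothetical table: file_name holds the names of files on the server.
    CREATE TABLE docs (id NUMBER PRIMARY KEY, file_name VARCHAR2(255));

    BEGIN
      CTX_DDL.CREATE_PREFERENCE('my_file_ds', 'FILE_DATASTORE');
      -- Directory in which the file names are resolved (made-up path).
      CTX_DDL.SET_ATTRIBUTE('my_file_ds', 'PATH', '/data/docs');
    END;
    /

    -- The index is created on the file name column, but Oracle Text reads
    -- and indexes the contents of the files that the names point to.
    CREATE INDEX docs_file_ctx ON docs (file_name)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('DATASTORE my_file_ds FILTER CTXSYS.AUTO_FILTER');
    ```

    Combining file contents with ordinary text columns in a single index still needs the USER_DATASTORE route described above.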

  • Performance of Oracle Text 10g fuzzy search

    Hello to all members of this community,

    I'm new to this and I have a question about Oracle Text 10g.

    My configuration:

    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit

    8 cores at 2.5 GHz each

    64 GB of RAM

    What I want to do:

    I would like to compare a large number of rows with each other in such a way that human errors (e.g. misspellings, typos) are tolerated.

    My Oracle Text CONTEXT setup is as follows:

    A MULTI_COLUMN_DATASTORE with every column to compare:

    begin
      ctx_ddl.create_preference('my_datastore', 'MULTI_COLUMN_DATASTORE');
      ctx_ddl.set_attribute('my_datastore', 'columns', 'column1, ...'); 
    end;
    

    BASIC_LEXER - with the German settings:

    begin
       ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
       ctx_ddl.set_attribute('my_lexer', 'index_themes', 'NO');
       ctx_ddl.set_attribute('my_lexer', 'index_text', 'YES');
       ctx_ddl.set_attribute('my_lexer', 'alternate_spelling', 'GERMAN');
       ctx_ddl.set_attribute('my_lexer', 'composite', 'GERMAN');
       ctx_ddl.set_attribute('my_lexer', 'index_stems', 'GERMAN');
       ctx_ddl.set_attribute('my_lexer', 'new_german_spelling', 'YES');
    end;
    

    BASIC_WORDLIST - with the German settings:

    begin
       ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');
       ctx_ddl.set_attribute('my_wordlist','FUZZY_MATCH','GERMAN');
       ctx_ddl.set_attribute('my_wordlist','FUZZY_SCORE','60'); --defaults
       ctx_ddl.set_attribute('my_wordlist','FUZZY_NUMRESULTS','100'); --defaults
       --ctx_ddl.set_attribute('my_wordlist','SUBSTRING_INDEX','TRUE'); --commented out due to long creation time of index
       ctx_ddl.set_attribute('my_wordlist','STEMMER','GERMAN');
    end;
    

    And a BASIC_SECTION_GROUP with a field_section for each column.

    begin
      ctx_ddl.create_section_group(
        group_name => 'my_section_group', 
        group_type => 'BASIC_SECTION_GROUP'
      );
      ctx_ddl.add_field_section(
        group_name   => 'my_section_group',
        section_name => 'column1',
        tag          => 'column1'
      );
    ...
    end;
    

    I create the index with

    create index idx_myfulltextindex on fulltexttest(column1)
    indextype is ctxsys.context
    parameters ('datastore my_datastore 
                 section group my_section_group 
                 lexer my_lexer
                 wordlist my_wordlist 
                 stoplist ctxsys.empty_stoplist')
    

    Functionally, everything works well.

    In my test scenario, I have a table with about 100,000 rows and a primary key that is not part of the CONTEXT index.

    The problem:

    I do a query like:

    SELECT SCORE(1), a.* 
    FROM fulltexttest a 
    WHERE CONTAINS(a.column1, 'FUZZY(({TEST}),,,W) WITHIN COLUMN1', 1) 
      AND a.primkey BETWEEN 1000 AND 4000
    

    This does a full-text search within a set of 3000 rows. Here, the response time is almost immediate, maybe one second.

    If I do the same repeatedly in a cursor loop (> 1000 times) with various search terms, it of course takes a lot of time. On average it handles about 1 request per second.

    I thought that this cannot be that slow and I tested the same with:

    SELECT SCORE(1), a.* 
    FROM fulltexttest a 
    WHERE CONTAINS(a.column1, '({TEST}) WITHIN COLUMN1', 1) 
      AND a.primkey BETWEEN 1000 AND 4000
    

    NOTE: there is no fuzzy search any more...

    With that, it is up to 20 times faster.

    The server's CPU load is about 15% while the fuzzy queries are being processed.

    So:

    When I do a fuzzy search, it does not seem to use the index. I thought I was telling the index to compute the 100 expansions in advance.

    Am I doing something wrong? Or is it not possible to build a dedicated index for fuzzy search?

    Are there any suggestions to improve performance? Note that I have already read the guide (7 Tuning Oracle Text). None of its advice helped.

    I would be grateful if anyone could help me with this... or simply give me a hint.

    Thank you

    Dominik

    The attributes of a wordlist can be used to specify how stems and fuzzies are expanded and to create prefix and substring indexes.

    If you do a lot of searches with trailing wildcards, like partialword%, then a prefix index can make these searches faster.  If you search with wildcards on both sides, such as %partialword%, then a substring index can make these searches faster.  There is a trade-off between the time and storage space it takes to create and maintain the index with prefixes and/or substrings and the query time.  You can specify the minimum length of a prefix.  The shorter the minimum length, the more prefixes are possible, the longer the index takes to create, and the larger it gets.  So what you need to use depends on what types of queries you expect most of the time.

    I don't know whether you have tested stem queries.  The index_stems attribute of the lexer and the stemmer attribute of the wordlist conflict.  You should keep the stemmer attribute of the wordlist and not use the index_stems attribute of the lexer.
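
    A wordlist along the lines described above might be set up as in the following sketch. The preference name my_wildcard_wordlist and the prefix lengths 3 and 6 are arbitrary example values, not recommendations; the stemmer stays on the wordlist, and the lexer would then be created without index_stems.

    ```sql
    BEGIN
      CTX_DDL.CREATE_PREFERENCE('my_wildcard_wordlist', 'BASIC_WORDLIST');
      -- Stemming and fuzzy matching are configured on the wordlist.
      CTX_DDL.SET_ATTRIBUTE('my_wildcard_wordlist', 'STEMMER', 'GERMAN');
      CTX_DDL.SET_ATTRIBUTE('my_wildcard_wordlist', 'FUZZY_MATCH', 'GERMAN');
      -- Prefix index: speeds up trailing-wildcard queries (partialword%).
      CTX_DDL.SET_ATTRIBUTE('my_wildcard_wordlist', 'PREFIX_INDEX', 'TRUE');
      CTX_DDL.SET_ATTRIBUTE('my_wildcard_wordlist', 'PREFIX_MIN_LENGTH', '3');
      CTX_DDL.SET_ATTRIBUTE('my_wildcard_wordlist', 'PREFIX_MAX_LENGTH', '6');
      -- Substring index: speeds up double-wildcard queries (%partialword%),
      -- at the cost of a longer build and a larger index.
      CTX_DDL.SET_ATTRIBUTE('my_wildcard_wordlist', 'SUBSTRING_INDEX', 'TRUE');
    END;
    /
    ```

    Only enable the prefix or substring index if the corresponding wildcard queries are actually common, since both add to index build time and size.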

  • Partitioned text index hangs during build

    I have a partitioned table with 2,000,000+ rows and a BLOB column.  When I try to create a partitioned index of type CTXSYS.CONTEXT, it finishes 8 of the 10 partitions correctly but hangs indefinitely on the other 2.  I am building with a parallel degree of 8.  Three of the sessions are waiting on 'direct path read' while the rest are waiting on 'PX Deq: Execution Msg'.

    Any ideas?

    OK.  Patch 18121298 solves this problem.  I discovered this while filing a service request.

  • Blurry text when a composition is scaled down

    Hello Experts,

    I scaled a composition from 900px wide down to 450px, but the text becomes blurry, even with Continuously Rasterize enabled.

    Is there an option to solve this problem?

    Thanks in advance,

    Just nest the 900-pixel comp in a new 450-pixel comp.  Then scale the nested layer to fit the new comp.

  • Blurry text HELP!

    Please help, I'm new to AE and don't know what to do about this... when I start typing, the text looks like this

    You are viewing the comp at a zoom factor of 1600%. Close inspection reveals that your text is only 6 pixels high, far too small to read at 100%.

    Basic workflow

  • Is it possible to save a table in Photoshop (created in InDesign) without blurry text?

    I am trying to save a table in Photoshop that was created in InDesign for web use. When I save it as a PNG (I cannot save it as a PDF or JPEG because of the limitations of our company website), the text appears blurry (pixelated?). Is this because the table is flattened/rasterized?

    Is there a different Adobe product that can be used to avoid this? The website that was built for our company has a style editor with limited capabilities. We tried using the website editor to create tables; however, the editor doesn't have the flexibility and ease of use that come with InDesign. That's why we opted to create the tables in InDesign.

    To upload the table in the editor, the file has to be saved as a PNG image, which is why we create the table in InDesign and then save it in Photoshop.

    Any ideas?

    Thank you very much for the help!

    Tracy

    Yes, export directly from InDesign.  Make sure you save it in PNG-24 format and not PNG-8 (for transparency).  Also, make sure that your background layer is not visible when you save it in Photoshop.  You should see the checkerboard grid in Photoshop, which denotes a transparent background, when you save it.

  • Oracle Text fuzzy search on multilingual (English/French) data

    Hello

    I use Oracle 11.2 and would like to do fuzzy text matching on a column that contains English and French letters. How should I set the value for FUZZY_MATCH via ctx_ddl.set_attribute?

    Thanks in advance!

    create table tb_class (nm varchar2(20), addr varchar2(200));
    insert into tb_class values ('cadfecc', 'Paris'); commit;

    begin
      ctx_ddl.set_attribute('STEM_FUZZY_PREF', 'FUZZY_MATCH', 'FRENCH');  -- is this only for French?
    end;

    create index class_nm_idx on tb_class (nm) indextype is ctxsys.context parameters ('wordlist STEM_FUZZY_PREF sync (on commit)');

    select * from tb_class where contains (nm, 'fuzzy(cadfcc, , , weight)', 1) > 0;

    SCOTT@orcl_11gR2> create table tb_class
      2    (nm    varchar2 ( 20),
      3       addr  varchar2 (200))
      4  /
    
    Table created.
    
    SCOTT@orcl_11gR2> insert into tb_class values ('çadfécc', 'Paris')
      2  /
    
    1 row created.
    
    SCOTT@orcl_11gR2> begin
      2    ctx_ddl.create_preference
      3        ('STEM_FUZZY_PREF', 'BASIC_WORDLIST');
      4    ctx_ddl.set_attribute
      5        ('STEM_FUZZY_PREF', 'STEMMER', 'AUTO');
      6    ctx_ddl.set_attribute
      7        ('STEM_FUZZY_PREF', 'FUZZY_MATCH', 'AUTO');
      8    ctx_ddl.create_preference
      9        ('LEXER_PREF', 'BASIC_LEXER');
     10    ctx_ddl.set_attribute
     11        ('LEXER_PREF', 'BASE_LETTER', 'YES');
     12  end;
     13  /
    
    PL/SQL procedure successfully completed.
    
    SCOTT@orcl_11gR2> create index class_nm_idx
      2  on tb_class (nm)
      3  indextype is ctxsys.context
      4  parameters
      5    ('wordlist  STEM_FUZZY_PREF
      6        sync        (on commit)
      7        lexer        LEXER_PREF')
      8  /
    
    Index created.
    
    SCOTT@orcl_11gR2> select * from tb_class
      2  where  contains (nm, 'fuzzy (çadfcc, 1, 5000, weight)', 1) > 0
      3  /
    
    NM
    --------------------
    ADDR
    --------------------------------------------------------------------------------
    çadfécc
    Paris
    
    1 row selected.
    
  • Photo Story 3 blurs text; which format to use for background music

    I have had a lot of trouble with blurry text in Photo Story 3. It looks perfectly fine at first, and then whenever I view it, it is blurred beyond belief. In addition, I don't know which file format to use for the background music. What can I do to fix the text, and what file format do I need for the background music?

    Many different audio formats would work, but the best
    choice is always WMA or WAV. Sometimes you can
    have problems saving a project as a result of using certain
    audio formats.

    As for your photos... if nothing else, resize them to a
    size at least equal to your screen resolution.

    That will probably not improve the appearance of the
    photos, but it will improve the blurry (pixelated) text
    problem.

    First of all, find out what your screen resolution is. Right-
    click your desktop, then choose Properties / Settings tab...
    Look at the screen resolution slider to see your current
    screen resolution. For this tutorial, let's say your screen
    resolution is 1024 x 768.

    If you download and install the free Win XP ImageResizer
    PowerToy, it will add a 'Resize' option to the right-click
    menu.

    The following free PowerToy for Windows XP can resize
    lots of photos very quickly:

    Click on the following link and on the left, click PowerToys...

    Windows XP downloads
    http://Windows.Microsoft.com/en-us/Windows/downloads/Windows-XP

    (FWIW... it's always a good idea to create a System
    Restore point before installing software or updates)

    Download and install: ImageResizer.exe
    (Filename: ImageResizerPowertoySetup.exe)
    (does not work on Vista)

    Now you can open any folder containing images and
    you will have an option to resize. Just right-click the selected
    (highlighted) image files and choose the resize option from
    the menu. You can select one picture or an entire group.

    The program puts the resized copies in the same folder as
    your originals.

    (Tip: *never* overwrite your originals... especially if you are
    resizing the originals to a smaller size)

    Resized copies will have the word 'custom' added to the
    file name so that they will be easy to find when you are ready to add
    them to your Photo Story project.

    To ensure that images in portrait and landscape orientation
    are resized to the same size... go to... Advanced... and
    tick... Custom... Enter the largest number in both fields.

    IOW... in this tutorial, assuming a screen resolution of 1024 x 768...
    enter 1024 in both fields.

    Just a tip... the maximum number of photos Photo Story 3 can
    handle is 300, and if you use a lot of Motion and Transitions a
    smaller number is recommended.

    Also, the following freeware can change the duration of the photos
    in an existing .wp3 project... it can also remove Motion and
    Transitions and randomize the slides.

    TweakPS
    http://www.windowsphotostory.com/TweakPS

    More info on the following sites:

    WindowsPhotoStory.com
    http://www.windowsphotostory.com/

    PapaJohn
    http://Papajohn.org/
    See Photo Story 3 in the left column

    See you soon...

  • Text styles or hints on a SystemPrompt?

    Is there a way to set a hint, a text style, or flags like TextInputFlag.AutoPeriodOff on a SystemPrompt?

    No, there is no way to do it with a SystemPrompt.  Try a custom dialog box:

    https://developer.BlackBerry.com/Cascades/documentation/UI/dialogs_toasts/custom_dialogs.html

  • Copying text from a live site created with Muse

    Hey,

    People cannot copy the text from some sites created with Muse?

    Is it because I chose the wrong font for my site?

    Best regards and thank you

    Tom

    Also, are you using system fonts?

    System fonts allow you to use more distinctive typography, based on the specific fonts that you (the designer) have installed on your machine. For example, if you design a site that has a specific theme, such as a gardening site, you can install an unusual floral font that is not available in the Typekit web fonts library. It is fine to apply system fonts to text sparingly, but keep in mind that the text content will be exported as images. This means that the page takes longer to load and visitors will not be able to select, copy, or paste the text on the page. System fonts are also better suited to small pieces of text, such as headings. To keep the site easy to use, avoid using system fonts for addresses, phone numbers, and any content visitors may want to copy and paste into a calendar or e-mail message. Don't forget that some visitors have difficulty reading small text and may set the browser to increase the font size; this strategy won't help them read text that is exported as image files. Remember too that search engines index a site's text content to determine its ranking in search results. For these reasons, it is best to apply system fonts only when you really need them to make an impact on the design. If there is a comparable web or standard font, apply that font instead.

  • Normalize the names in a huge table using UTL_MATCH

    Hello

    I have a large table (350 million records) with a "full name" column.

    This column has a few typos, so I have to 'normalise' the data (only for this column), using UTL_MATCH.JARO_WINKLER_SIMILARITY.

    I did some tests with a small table, and it works for showing the similar names:

    SELECT b.SID, b.name FROM typotable a, typotable b WHERE utl_match.jaro_winkler_similarity (a.name, b.name) BETWEEN 85 AND 99 AND a.rowid > b.rowid;

    But:

    (1) The test table was small; running this code directly on the table with 350 million records takes ages... What can be done about it?

    (2) This just shows the similar names. How can I update the table by searching for similar names and choosing one of them as the single value for each name?




    Thank you

    1590733 wrote:

    Yes, I get your point. The thing is that there is no table of "correct" names available and the original table is huge, so this is what I thought:

    - Create a secondary NAMES table with unique names. These names would be generated by mapping similar values to one of them (always the same one, no matter whether it is the right one). This should be equivalent to your table of 'correct' names.

    - Run the cleaning procedure to update the records

    How can I create this secondary NAMES table? (The 'genre' column does not matter at all; only 'name' must be fixed.)

    Thanks for your help

    Well, you need to determine the logic that would pick one of the incorrect names over the other.  As it stands, you can easily get two incorrect values with the same match score.  But then you also need to consider what constitutes a 'group' of values, so that you can pick the best one in the group.  Using the match score by itself is not enough to create groups.

    Example:

    SQL> ed
    Wrote file afiedt.buf

      1  select a.fname as fname1, b.fname as fname2
      2        ,utl_match.jaro_winkler_similarity(a.fname, b.fname) as match
      3  from   typotable a
      4         join typotable b on (a.fname != b.fname)
      5  where  utl_match.jaro_winkler_similarity(a.fname, b.fname) >= 85
      6* order by 1, 3 desc
    SQL> /

    FNAME1     FNAME2          MATCH
    ---------- ---------- ----------
    FROCEN     FROZEN             92
    FROZEN     FROCEN             92
    FROZEN     FROCEN             92
    FROZEN     FROZEN             92
    JELLY      FROZIN             93
    JELLY      FROCEN             92
    FROZEN     FROZEN             92
    FROZEN     FROZIN             93
    WHIPLASH   WIPLASH            96
    WHIPLASH   WIPLASH            96

    10 rows selected.

    As you can see, FROCEN, for example, has two possible variants, both with a match score of 92.  The same goes for others.

    However, you could start hacking things around (and it really is a hack) to get something like:

    SQL> ed
    Wrote file afiedt.buf

      1  with t as (
      2    select a.fname as fname1, b.fname as fname2
      3          ,utl_match.jaro_winkler_similarity(a.fname, b.fname) as match
      4    from   typotable a
      5           join typotable b on (a.fname != b.fname)
      6    where  utl_match.jaro_winkler_similarity(a.fname, b.fname) >= 85
      7    )
      8  ,c as (
      9    select fname1, greatest(fname1, fname2) as fname2, match
     10          ,(select count(*)
     11            from   t t2
     12            where  t2.fname2 = t.fname2
     13            and    t2.fname1 != t.fname1
     14           ) as cnt
     15    from   t
     16    )
     17  ,r as (
     18    select fname1, fname2, match, cnt
     19          ,row_number() over (partition by fname1 order by cnt desc, match desc) as rn
     20    from   c
     21    )
     22  select fname1, fname2
     23  from   r
     24  where  rn = 1
     25* order by 1
    SQL> /

    FNAME1     FNAME2
    ---------- ----------
    FROZEN     FROCEN
    FROZEN     FROZEN
    FROZEN     FROZEN
    FROZIN     FROZIN
    WHIPLASH   WIPLASH
    WIPLASH    WIPLASH

    6 rows selected.

    But then it depends on your data whether it will work in all circumstances.
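
    Once a mapping from each variant to its chosen name exists, the update itself could be sketched as follows. The table name_map and its columns are hypothetical, the inserted row is only an illustrative sample, and typotable.fname is taken from the queries in this thread. A correlated UPDATE is used here because MERGE cannot update a column that appears in its ON clause.

    ```sql
    -- Hypothetical mapping table: one row per misspelled variant.
    CREATE TABLE name_map (
      variant   VARCHAR2(100) PRIMARY KEY,
      canonical VARCHAR2(100) NOT NULL
    );

    -- Illustrative sample row only; in practice the table would be filled
    -- from the result of a grouping query such as the one above.
    INSERT INTO name_map (variant, canonical) VALUES ('FROCEN', 'FROZEN');

    -- Replace every variant with its chosen name; rows without a mapping,
    -- or already holding the chosen name, are left untouched.
    UPDATE typotable t
    SET    t.fname = (SELECT m.canonical
                      FROM   name_map m
                      WHERE  m.variant = t.fname)
    WHERE  EXISTS (SELECT 1
                   FROM   name_map m
                   WHERE  m.variant = t.fname
                   AND    m.canonical <> t.fname);
    ```

    On a table with hundreds of millions of rows this statement would probably have to be run in batches or replaced by a rebuild of the table, which is outside the scope of this sketch.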

  • Is it possible to apply the loop wiggle expression to individual characters in a block of text?

    I'm relatively new to After Effects and text animation. I was wondering whether there is a way to essentially take the effect of a wiggly selector on rotation, which applies to each individual character, and then loop it? As impressive as the 'loop wiggle' expression is, it seems to apply only to the entire text block.

    Expression selectors work on text indexes, and Dan's expression is not specifically adapted to work on text animators. It only returns a uniform value. You would have to build a loop around it that goes through the indexes for each character. If you really have such a short text, it may be faster and easier for you to simply use as many text animators as there are letters and limit the influence of each one to a single character...

    Mylenium

  • Oracle Text and locking

    Hi Experts,

    I'm on Oracle 11.2.0.3 on Linux. I've implemented Oracle Text in my database. My production DBA told me today that there are some locking problems in production, and he sent me an ASH report. I can post that ASH report in a separate thread, but first of all I would like to ask in this post whether there are known issues with Oracle Text causing locks or lock waits. I read somewhere that Oracle Text indexes get fragmented over time. My full-text index is defined as sync on commit. So, should the indexes be rebuilt over time, say by dropping and recreating them? How often should this be done?

    I'll be very grateful for any pointers on this.

    Kind regards

    OraserN

    There are different ways to optimize, some fast and partial and some slower and more thorough.  Please see the optimize_index procedure of the ctx_ddl package:

    CTX_DDL package

    The type and frequency of optimization can be set in the index parameters, as you do with sync (on commit), or on demand using dbms_job or dbms_scheduler.
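
    As a sketch of the on-demand variant, assuming a hypothetical index called my_text_idx: the first block runs one full optimization with a time limit, the second schedules the same call every night. The job name, the 60-minute limit, and the 02:00 schedule are illustrative values only.

    ```sql
    -- One-off full optimization, limited to 60 minutes.
    BEGIN
      CTX_DDL.OPTIMIZE_INDEX(
        idx_name => 'my_text_idx',
        optlevel => 'FULL',
        maxtime  => 60);
    END;
    /

    -- Nightly optimization through the scheduler.
    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'OPTIMIZE_MY_TEXT_IDX',
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN CTX_DDL.OPTIMIZE_INDEX(''my_text_idx'', ''FULL'', 60); END;',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=DAILY; BYHOUR=2; BYMINUTE=0',
        enabled         => TRUE);
    END;
    /
    ```

    A FULL optimization with a time limit picks up where it stopped on the next run, so a short nightly window is usually a reasonable starting point before considering a drop and recreate.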

  • After Effects comp looks blurry in Premiere Pro

    Hello


    I made a simple logo as a 3D object that rotates, with text flashing by. I rendered it and imported it into Premiere Pro. The problem is that the logo and text look blurry when I hit play. Is there a rule of thumb for export settings etc.? Thanks for the help.

    My first thought is that the monitor playback settings are not at full quality.

    On that note, the internal monitors of an NLE are a guide to content only.  Always use a calibrated external display to judge quality.
