Oracle Text matching and markup of characters such as ? and !

I posted a few questions a few weeks ago about the same system I'm working on. It's basically a bunch of archived chat messages that we want to search. At this point, I'm trying to understand why Oracle Text behaves the way it does when it comes to some punctuation characters. I thought I had configured the text index so that it would match and mark up all of these. Here is some sample SQL that shows the issue:

I use the following function to do the markup, so I can use it as part of a select query:

CREATE OR REPLACE FUNCTION my_markup
  (p_index_name IN VARCHAR2,
   p_textkey    IN VARCHAR2,
   p_text_query IN VARCHAR2,
   p_plaintext  IN BOOLEAN  DEFAULT TRUE,
   p_starttag   IN VARCHAR2 DEFAULT '<<<',
   p_endtag     IN VARCHAR2 DEFAULT '>>>',
   p_key_type   IN VARCHAR2 DEFAULT 'primary_key')
   RETURN       CLOB
AS
   v_clob       CLOB;
BEGIN
   CTX_DOC.SET_KEY_TYPE (p_key_type);
   CTX_DOC.MARKUP
   (index_name => p_index_name,
    textkey    => p_textkey,
    text_query => p_text_query,
    restab     => v_clob,
    plaintext  => p_plaintext,
    starttag   => p_starttag,
    endtag     => p_endtag);
   RETURN v_clob;
END my_markup;

Here is the sample table and some sample data:

create table search_test 
(
     data_id number(19),
     test_data clob
);
alter table search_test 
    add constraint search_test_pk primary key ( data_id ) ;

insert into search_test values (1, 'this is, the first. test sentence ?? with ., some ??? content! ');
insert into search_test values (2, 'The quick, brown fox > jumps < over the lazy dog.');
insert into search_test values (3, 'Some !@#$%%^&*(()),.<>;:''"[]{}-_=+~ crazy char string ');
insert into search_test values (4, 'this is, the first test sentence ?? with ., some ??? content! ');
insert into search_test values (5, 'this is, the first; test sentence ?? with ., some ??? content! ');
insert into search_test values (6, 'this is, the first: test sentence ?? with ., some ??? content! ');
insert into search_test values (7, 'this is, the first? test sentence ?? with ., some ??? content! ');
insert into search_test values (8, 'this is, the first! test sentence ?? with ., some ??? content! ');
insert into search_test values (9, 'this is, the first@ test sentence ?? with ., some ??? content! ');
insert into search_test values (10, 'this is, the first# test sentence ?? with ., some ??? content! ');
insert into search_test values (11, 'this is, the first$ test sentence ?? with ., some ??? content! ');
insert into search_test values (12, 'this is, the first% test sentence ?? with ., some ??? content! ');
insert into search_test values (13, 'this is, the first^ test sentence ?? with ., some ??? content! ');
insert into search_test values (14, 'this is, the first& test sentence ?? with ., some ??? content! ');
insert into search_test values (15, 'this is, the first* test sentence ?? with ., some ??? content! ');
insert into search_test values (16, 'this is, the first( test sentence ?? with ., some ??? content! ');
insert into search_test values (17, 'this is, the first) test sentence ?? with ., some ??? content! ');
insert into search_test values (18, 'this is, the first[ test sentence ?? with ., some ??? content! ');
insert into search_test values (19, 'this is, the first] test sentence ?? with ., some ??? content! ');
insert into search_test values (20, 'this is, the first{ test sentence ?? with ., some ??? content! ');
insert into search_test values (21, 'this is, the first} test sentence ?? with ., some ??? content! ');
insert into search_test values (22, 'this is, the first< test sentence ?? with ., some ??? content! ');
insert into search_test values (23, 'this is, the first> test sentence ?? with ., some ??? content! ');
insert into search_test values (24, 'this is, the first- test sentence ?? with ., some ??? content! ');
insert into search_test values (25, 'this is, the first_ test sentence ?? with ., some ??? content! ');
insert into search_test values (26, 'this is, the first= test sentence ?? with ., some ??? content! ');
insert into search_test values (27, 'this is, the first+ test sentence ?? with ., some ??? content! ');
insert into search_test values (28, 'this is, the first| test sentence ?? with ., some ??? content! ');
insert into search_test values (29, 'this is, the first!! test sentence .. with ., some && content! ');

And here's the text index definition:

BEGIN
     CTX_DDL.CREATE_PREFERENCE('test_lexer', 'BASIC_LEXER');
     CTX_DDL.SET_ATTRIBUTE('test_lexer', 'printjoins', '~!@#$%^*()_-+={}[]:;<>,.?/');
     
     CTX_DDL.CREATE_PREFERENCE('test_wordlist', 'BASIC_WORDLIST');
     CTX_DDL.SET_ATTRIBUTE('test_wordlist', 'SUBSTRING_INDEX', 'YES');
     CTX_DDL.SET_ATTRIBUTE('test_wordlist', 'PREFIX_INDEX', 'TRUE');
     CTX_DDL.SET_ATTRIBUTE('test_wordlist', 'PREFIX_MIN_LENGTH', '3');
     CTX_DDL.SET_ATTRIBUTE('test_wordlist', 'PREFIX_MAX_LENGTH', '6');
END;
/

CREATE INDEX search_test_text_idx on search_test(test_data)
     INDEXTYPE IS CTXSYS.CONTEXT
          PARAMETERS ('
            DATASTORE CTXSYS.DEFAULT_DATASTORE
            FILTER CTXSYS.NULL_FILTER
            STOPLIST CTXSYS.EMPTY_STOPLIST
            LEXER test_lexer
            SYNC (EVERY "sysdate+(10/(24*60*60))")
            WORDLIST test_wordlist');

So, since the index uses an empty stoplist and all the special characters are defined in the lexer as printjoins, I expected searches that include '.', ',', '!' or '?' to match those characters, but this is not the case. E.g. when I run:

select data_id, my_markup('search_test_text_idx', data_id, 'first\?')
from search_test
where contains(test_data, 'first\?', 1) > 0

Then, I get the following results:

            DATA_ID MY_MARKUP('SEARCH_TEST_TEXT_IDX',DATA_ID,'FIRST\?')                         
------------------- --------------------------------------------------------------------------------
                  1 this is, the <<<first>>>. test sentence ?? with ., some ??? content!             
                  4 this is, the <<<first>>> test sentence ?? with ., some ??? content!              
                  7 this is, the <<<first>>>? test sentence ?? with ., some ??? content!             
                  8 this is, the <<<first>>>! test sentence ?? with ., some ??? content!             
                 14 this is, the <<<first>>>& test sentence ?? with ., some ??? content!             
                 28 this is, the <<<first>>>| test sentence ?? with ., some ??? content!             
                 29 this is, the <<<first>>> !! test sentence .. with ., some && content!           

 7 rows selected 

I expected to see only the row with id 7, and I expected the '?' to be included in the markup. Why does it return all of these rows, and is there something I can do to get what I expected?

A second thing I stumbled on, which is probably related to this issue, is searching for several of these characters in a row, for example a search for '??'. The markup still marks fewer characters than are specified in the contains search expression:

select data_id, my_markup('search_test_text_idx', data_id, 'first. test sentence \?\?')
from search_test
where contains(test_data, 'first. test sentence \?\?', 1) > 0

This returns:

            DATA_ID MY_MARKUP('SEARCH_TEST_TEXT_IDX',DATA_ID,'FIRST.TESTSENTENCE\?\?')          
------------------- --------------------------------------------------------------------------------
                  1 this is, the <<<first. test sentence ?>>>? with ., some ??? content!             
                  4 this is, the <<<first test sentence ?>>>? with ., some ??? content!              
                  7 this is, the <<<first? test sentence ?>>>? with ., some ??? content!             
                  8 this is, the <<<first! test sentence ?>>>? with ., some ??? content!             
                 14 this is, the <<<first& test sentence ?>>>? with ., some ??? content!             
                 28 this is, the <<<first| test sentence ?>>>? with ., some ??? content!             

 6 rows selected 

Again, I expected to see only the row with id 1, and I expected both of the '?' characters at the end to be included in the markup. Why don't I get what I expected?

Okay, I think I can see what is happening. The doc says:

  • If a printjoins character is also defined as a punctuations character, the character is only treated as an alphanumeric character if the character immediately following it is a standard alphanumeric character or has been defined as a printjoins or skipjoins character.

... and...


  • punctuations
  • Specify the non-alphanumeric characters which, when they appear at the end of a word, indicate the end of a sentence. The defaults are period '.', question mark '?' and exclamation mark '!'.
  • Characters which are defined as punctuations are removed from a token before text indexing. However, if a punctuations character is also defined as a printjoins character, the character is removed only when it is the last character in the token.
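
One way to check this against the index itself is to look at the tokens Oracle Text actually stored. This is only a sketch: it assumes the index name from the question, and relies on the token table being named DR$<index name>$I and on BASIC_LEXER uppercasing tokens by default. I have not run it against your data.

```sql
-- List the plain text tokens (token_type 0) that start with FIRST.
-- The prefix entries created by PREFIX_INDEX use a different token_type,
-- so they are filtered out here.
SELECT token_text, token_count
FROM   dr$search_test_text_idx$i
WHERE  token_type = 0
AND    token_text LIKE 'FIRST%'
ORDER  BY token_text;
```

If the doc describes what is happening, rows such as 7 and 8 should show up under the plain token FIRST rather than under FIRST? or FIRST!, while a row such as 9 should still have its own FIRST@ token.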

So the question mark is getting stripped from the end of tokens.  The solution for this is to clear punctuations.  You can't really set punctuations to NULL or an empty string, but you can set it to a space, which has the same effect:

CTX_DDL.SET_ATTRIBUTE('test_lexer', 'punctuations', ' ');

Your second question comes up because you have two question mark characters.  In this case, the last one is stripped, and the token is indexed as a single question mark.  Try setting punctuations as above, and I think it will solve both issues.
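
A minimal sketch of applying the change end to end, assuming the preference, index and function names from the question. The attribute name punctuations is taken from the BASIC_LEXER documentation. An index keeps the lexer settings it was created with, so it has to be re-indexed after the preference changes. REBUILD with REPLACE LEXER is one way to do that; dropping and recreating the index is another. I have not run this against your data.

```sql
-- Set the sentence-ending punctuation set to a single space,
-- so that '.', '?' and '!' are no longer stripped from the end of tokens.
BEGIN
   CTX_DDL.SET_ATTRIBUTE('test_lexer', 'punctuations', ' ');
END;
/

-- Re-index the existing data with the updated lexer preference.
ALTER INDEX search_test_text_idx
   REBUILD PARAMETERS ('REPLACE LEXER test_lexer');

-- Re-run the first test query from the question.
SELECT data_id,
       my_markup('search_test_text_idx', data_id, 'first\?')
FROM   search_test
WHERE  CONTAINS(test_data, 'first\?', 1) > 0;
```

Note that & and | are not in the printjoins list in the question, so rows 14 and 28 would still be indexed as the plain token first either way.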

Tags: Database
