How to index words containing letters written as HTML entities?

The title says it all.

I currently replace the known HTML entities with their Unicode counterparts up front, but I was wondering whether some built-in Oracle Text feature could do the same and save me the extra headache.

Has anyone indexed HTML entities?

Thank you

Flavio

----

http://oraclequirks.blogspot.com

http://www.yocoya.com

You can create your own procedure using any method you like, then reference that procedure in a procedure filter and use that filter in your index parameters.  In the example below, I borrowed a strip_html function from

http://www.supermanhamuerto.com/doku.php?id=Oracle:fixhtml

and used it in the procedure.

SCOTT@orcl12c_11gR2> set scan off
SCOTT@orcl12c_11gR2> -- table, data, and lexer:
SCOTT@orcl12c_11gR2> create table example (t varchar2 (4000))
  /

Table created.

SCOTT@orcl12c_11gR2> insert all
  into example values ('cr&oacute;nicas y relatos')
  into example values ('crónicas y relatos')
  into example values ('CR&Oacute;nicas y Relatos de Mexico')
  into example values ('&quot;cr&oacute;nicas y relatos de Mexico&quot;')
  select * from dual
  /

4 rows created.

SCOTT@orcl12c_11gR2> begin
    ctx_ddl.create_preference ('mylex', 'basic_lexer');
    ctx_ddl.set_attribute ('mylex', 'base_letter', 'YES');
  end;
  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c_11gR2> -- function from http://www.supermanhamuerto.com/doku.php?id=oracle:fixhtml
SCOTT@orcl12c_11gR2> CREATE OR REPLACE FUNCTION strip_html (dirty IN clob,
                                                    to_cvs IN NUMBER DEFAULT 0)
    RETURN clob IS
    out clob;

    TYPE arr_string IS varray (200) OF VARCHAR2 (64);

    entities_search_for arr_string;
    entities_replace arr_string;
    cont NUMBER;

  BEGIN

    -- to speed things up
    IF dirty IS NULL THEN
      RETURN dirty;
    END IF; -- isnull (dirty)

    IF LENGTH (dirty) = 0 THEN
      RETURN dirty;
    END IF; -- length (dirty)

    entities_search_for := arr_string (

24 ' !'.

25 ' #'.

26 ' $'.

27 ' %'.

28 '& ',.

29 '' '.

30 ' ('.

31 ' )'.

32 ' *'.

33 ' +'.

34 ' ,'.

35 ' ‐'.

36 ' .'.

37 ' /'.

38 ' :'.

39 ' ;'.

40 ' < ',.

41 ' ='.

42 ' > '.

43 ' ?'.

44 ' @'.

45 ' ['.

46 ' \'.

47 ' ]'.

48 "BE."

49 ' _'.

50 ' `'.

51 ' {'.

52 ' |'.

53 ' }'.

54 "˜"

55' ',

56 ""

57 "¢."

58 "£"

59 ' ¤',.

60 «¥»,

61 '¦ ',.

62 «§»,

63 ' ¨'.

64 ' ©'.

65 "ª"

66 ' ' ', '.

67 '¬"

68 cm,

69 '®',

70 '¯ ',.

71 "°",.

72 "±"

73 '²',

74 '³',

75 "Honourable."

76 "µ",.

77 "¶"

78 "·"

79 '¸ ',.

80 '¹',

81 'º"

82 '' '.

83 ' &fr;'.

84 ' &fr;'.

85 ' &fr;'.

86 ""

87 'TO. "

88 'A ',.

89 'A ',.

90 'A ',.

91 'A ',.

92 'A ',.

93 "AE."

94 ' &il;'.

95 'E ',.

96 'E ',.

97 'E ',.

98 'E ',.

99 'I ',.

100 'I ',.

101 "I."

102 'I ',.

103 "D."

104 "N."

105 "O."

106 "O."

107 'O,

108 "O."

109 "O."

110 'x',

111 "O."

112 "U."

113 "U."

114 "U."

115 "U."

116 "Y."

117 'Þ ',.

118 "ss."

119 "to."

120 'a ',.

121 'a ',.

122 "a."

"123A",.

124 'e ',.

125 'e ',.

126 'e ',.

127 ' &etilde;'.

128 'e ',.

129 "i."

130 'i ',.

131 'i ',.

132 ' ĩ'.

133 'i ',.

134 "o."

135 "o."

136 'o,

137 "o."

138 "o."

139 "u."

140 "u."

141 "u."

142 ' ũ'.

143 'u');

144

    entities_replace := arr_string (

146 ""

147 'º"

148 "$."

149 '% ',.

150 '& ',.

151 '' '.

152 '(',)

153 ")',"

154 ' *'.

155 «+»,

156 «,»,

157 '-'.

158 '.',

159 "ground."

160 "colon."

161 ' *'.

162'<>

163 'is',

164 ' > '.

165 '?'.

166 «,»,

167 ' *'.

168 ' *'.

169 ' *'.

170 ' *'.

171 "_."

172 "',

173 ' *'.

174 ' *'.

175 ' *'.

176 cm '.

177', '

178 ""

179 "cent."

180 'L ',.

181 ' *'.

182 'Y',

183 ' *'.

184 ' *'.

185 '.',

186 '(c)"

187 ' *'.

188 ' *'.

189 '!'.

190 ' *'.

191 "(r)."

192 ' *'.

193 ' *'.

194 ' *'.

195 ' *'.

196 ' *'.

197 'a ',.

198 "u."

199 ' *'.

200 "·"

201 c '.

202 ' *'.

203 ' *'.

204 ' *'.

205 ' *'.

206 ' *'.

207 ' *'.

208 ""

209 'E ',.

210 'A ',.

211 'A ',.

212 "A."

213 ' *'.

214 ' *'.

215 "AE."

216 ' *'.

217 'E ',.

218 "E."

219 ' *'.

220 ' *'.

221 "I."

222 'I ',.

223 'I ',.

224 ' *'.

225 ' *'.

226 ' N ',.

227 "O."

228 "O."

229 'O,

230 ' W '.

231 ' *'.

232 ' *'.

233 ' O ',.

234 "U."

235 "U."

236 "U."

237 ' *'.

238 ' *'.

239 ' *'.

240 ' *'.

241 "to."

242 'a ',.

'a' 243,

244 'a ',.

245 ' *'.

246 'e ',.

247 'e ',.

248 'e ',.

249 "e."

250 ' *'.

251 'i ',.

252 'i ',.

253 'i ',.

254 'i ',.

255 ' *'.

256 "o."

257 "o."

258 'o,

259 ' o ',.

260 ' *'.

261 "u."

262 "u."

263 "u."

264 "u."

265 ' *');

266

    out := dirty;

    -- replace what is enclosed by the opening and closing tags
    -- .*? -> lazy star (catch the minimum possible)
    out := regexp_replace (out, '.*?', '', 1, 0, 'ni');
    -- clean what is inside the style tags
    out := regexp_replace (out, '', '', 1, 0, 'ni');

    IF to_cvs = 2 THEN
      -- sanitize (not clean) the html code

      -- clean tag
      out := regexp_replace (out, '<\?xml:.*?>', '', 1, 0, 'ni');
      -- clean tags
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      -- comments
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      -- clean meta
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      -- clean link
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      -- clean DIV
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      -- clean SPAN
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      -- clean 'class' inside the tags
      out := regexp_replace (out, '(<.*?)class="?[a-zA-Z0-9-_]*"?(.*?>)', '\1\2', 1, 0, 'ni');
      -- clean 'style' inside the following tags: i p b
      out := regexp_replace (out, '(<[ibp] .*?)style=".*?"(.*?>)', '\1\2', 1, 0, 'ni');
      -- clean namespaces
      out := regexp_replace (out, '(<)[a-zA-Z0-9-_]*:(.*?>)', '\1\2', 1, 0, 'ni');
      out := regexp_replace (out, '()', '\1\2', 1, 0, 'ni');

      -- clean empty opening/closing tags: this must be
      -- run two or three times to clean things like this:
      --
      -- TWEAK: should be replaced by
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      out := regexp_replace (out, '<([a-zA-Z0-9-_]*)>', '', 1, 0, 'ni');
      -- TWEAK: should be replaced by
      out := regexp_replace (out, '', '', 1, 0, 'ni');
      out := regexp_replace (out, '<([a-zA-Z0-9-_]*)>', '', 1, 0, 'ni');

    ELSE
      -- clean html

      -- replace these tags with a carriage return
      out := regexp_replace (out, ']*>', CHR (10) || CHR (13));
      out := regexp_replace (out, ']*>', CHR (10) || CHR (13));
      out := regexp_replace (out, ']*>', CHR (10) || CHR (13));

      -- replace all other html stuff
      out := regexp_replace (out, '<[^>]*>', '', 1, 0, 'ni');

      -- replace all entities
      FOR cont IN 1 .. 119 LOOP
        out := REPLACE (out, entities_search_for (cont), entities_replace (cont));
      END LOOP;

      -- cleaning for export to csv
      IF to_cvs = 1 THEN
        out := REPLACE (out, CHR (10), '');
        out := REPLACE (out, CHR (13), '');
        out := REPLACE (out, CHR (9), '');
        out := REPLACE (out, ';', ',');
        out := REPLACE (out, '"', '''');
      END IF;

    END IF;

    RETURN (out);
  END strip_html;
  /

Function created.

SCOTT@orcl12c_11gR2> -- procedure that uses the function:
SCOTT@orcl12c_11gR2> create or replace procedure normalise
    (p_input  in clob,
     p_output in out nocopy clob)
  as
  begin
    p_output := strip_html (p_input);
  end normalise;
  /

Procedure created.

SCOTT@orcl12c_11gR2> -- filter that uses the procedure:
SCOTT@orcl12c_11gR2> begin
    ctx_ddl.create_preference ('myfilt', 'procedure_filter');
    ctx_ddl.set_attribute ('myfilt', 'procedure', 'normalise');
    ctx_ddl.set_attribute ('myfilt', 'input_type', 'clob');
    ctx_ddl.set_attribute ('myfilt', 'output_type', 'clob');
  end;
  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c_11gR2> -- index that uses the filter:
SCOTT@orcl12c_11gR2> create index myindex on example (t) indextype is ctxsys.context
  parameters ('filter myfilt lexer mylex')
  /

Index created.

SCOTT@orcl12c_11gR2> -- indexed tokens:
SCOTT@orcl12c_11gR2> select token_text from dr$myindex$i
  /

TOKEN_TEXT
----------------------------------------------------------------
CRONICAS
DE
MEXICO
RELATOS
Y

5 rows selected.

SCOTT@orcl12c_11gR2> -- searches:
SCOTT@orcl12c_11gR2> select * from example where contains (t, 'crónicas') > 0
  /

T
--------------------------------------------------------------------------------
cr&oacute;nicas y relatos
crónicas y relatos
CR&Oacute;nicas y Relatos de Mexico
&quot;cr&oacute;nicas y relatos de Mexico&quot;

4 rows selected.
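Because the mylex lexer sets base_letter to YES, accents are folded both when the text is indexed and when the query term is parsed, so an unaccented term should find the same rows. A quick check, assuming the table and index from this example exist:

```sql
-- Same search without the accent; base_letter folding should make it
-- equivalent to searching for 'crónicas'.
select * from example where contains (t, 'cronicas') > 0;
```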

SCOTT@orcl12c_11gR2> select * from example where contains (t, 'Mexico') > 0
  /

T
--------------------------------------------------------------------------------
CR&Oacute;nicas y Relatos de Mexico
&quot;cr&oacute;nicas y relatos de Mexico&quot;

2 rows selected.
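If the tags do not need to be removed and only the character references have to be decoded, a shorter filter procedure may be enough. The following is a minimal sketch and not part of the original answer. It assumes that the built-in UTL_I18N.UNESCAPE_REFERENCE function is available in your release, that each document fits in a VARCHAR2 (the example column is VARCHAR2 (4000)), and that the decoded characters exist in the database character set. The names normalise_builtin and myfilt2 are made up for illustration.

```sql
-- Stop SQL*Plus from treating & as a substitution variable prefix.
set define off

-- Hypothetical alternative to the strip_html-based procedure:
-- decodes references such as &oacute; or &#243; but leaves HTML tags in place.
create or replace procedure normalise_builtin
  (p_input  in            varchar2,
   p_output in out nocopy varchar2)
as
begin
  p_output := utl_i18n.unescape_reference (p_input);
end normalise_builtin;
/

-- Procedure filter that passes VARCHAR2 in and out (assumed preference name).
begin
  ctx_ddl.create_preference ('myfilt2', 'procedure_filter');
  ctx_ddl.set_attribute ('myfilt2', 'procedure', 'normalise_builtin');
  ctx_ddl.set_attribute ('myfilt2', 'input_type', 'varchar2');
  ctx_ddl.set_attribute ('myfilt2', 'output_type', 'varchar2');
end;
/

-- Spot check of the decoding on its own, outside of any index.
select utl_i18n.unescape_reference ('cr&oacute;nicas y relatos') from dual;
```

An index would then name myfilt2 instead of myfilt in its parameters clause. Unlike strip_html, this sketch does not remove markup, so any tags in the text would still reach the lexer.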

