Stripping all the HTML from a CLOB

Hi all

Running Oracle 9.2.0.8 on AIX...

We have a table that stores HTML document fragments in a CLOB. I have a requirement to convert them into plain text (that is, remove all the HTML tags) for sending in the body of a text/plain e-mail.

I read the following solution on Tom Kyte's site:

http://asktom.Oracle.com/pls/asktom/f?p=100:11:0:P11_QUESTION_ID:25695084847068

Basically, it involves creating an Oracle Text index on the CLOB column and calling ctx_doc.filter with the 'plaintext' parameter set to true.

I noticed that in Tom's example he uses the default filter, which according to the docs is NULL_FILTER, which applies no filtering. I tried the example on my dev box, creating the text index on the CLOB column without parameters.

The call to ctx_doc.filter did not filter the HTML at all. I recreated the index, specifying INSO_FILTER, and the filtering was performed. I was under the impression that INSO_FILTER was for filtering binary content into plain text...
create table filter ( query_id number, document clob );

create table demo
  ( id            int primary key,
    theclob       clob
  );

create index demo_idx on demo(theClob) indextype is ctxsys.context;

SET DEFINE OFF;
Insert into DEMO
   (ID, THECLOB)
 Values
   (1, '<html><body><p>This is a test of <strong>ctx_doc.filter</strong> and plaintext filtering.</p></body></html>');
COMMIT;

exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
The code above does not convert html to plain text...

Now re-create the index with INSO_FILTER
drop index demo_idx;

create index demo_idx on demo(theClob) indextype is ctxsys.context parameters ('filter ctxsys.inso_filter');

exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
The above script returns the string "This is a test of ctx_doc.filter and plaintext filtering."

The Oracle documentation does not mention any special filter parameter that needs to be set... I was wondering whether I am missing something here... or, better yet, whether there is a better solution to my problem. ;-)

Thank you

Stéphane

This is a brute force method.

SCOTT@orcl_11gR2> create table demo
  2    (id       int primary key,
  3       theclob   clob)
  4  /

Table created.

SCOTT@orcl_11gR2> insert into demo values (1,
  2    '<html><body><p>This is a test of <strong>ctx_doc.filter</strong> and plaintext filtering.</p></body></html>')
  3  /

1 row created.

SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION no_html
  2    (p_string IN CLOB)
  3    RETURN CLOB
  4  AS
  5    v_string_in  CLOB := p_string;
  6    v_string_out CLOB;
  7  BEGIN
  8    WHILE INSTR (v_string_in, '>') > 0 LOOP
  9      v_string_out := v_string_out
 10                   || SUBSTR (v_string_in, 1, INSTR (v_string_in, '<') - 1);
 11      v_string_in  := SUBSTR (v_string_in, INSTR (v_string_in, '>') + 1);
 12    END LOOP;
 13    v_string_out := v_string_out || v_string_in;
 14    RETURN v_string_out;
 15  END no_html;
 16  /

Function created.

SCOTT@orcl_11gR2> SHOW ERRORS
No errors.
SCOTT@orcl_11gR2> SELECT id, no_html (theclob) FROM demo
  2  /

        ID
----------
NO_HTML(THECLOB)
--------------------------------------------------------------------------------
         1
This is a test of ctx_doc.filter and plaintext filtering.

1 row selected.

SCOTT@orcl_11gR2>
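
The loop in no_html reads as: copy whatever precedes the next '<', then skip past the next '>'. The following is a minimal Python sketch of that same scan, offered only to illustrate the logic; it is not part of the original answer, and Python is used purely as a neutral notation. Like the PL/SQL version, it assumes well-formed tags and does not decode entities such as &amp;.

```python
def no_html(text: str) -> str:
    """Strip tags by keeping only the text that lies outside <...> pairs."""
    remaining = text
    pieces = []
    while ">" in remaining:
        lt = remaining.find("<")
        if lt > 0:
            # Keep the text that precedes the next tag.
            pieces.append(remaining[:lt])
        # Discard everything up to and including the next '>'.
        remaining = remaining[remaining.find(">") + 1:]
    pieces.append(remaining)  # trailing text after the last tag
    return "".join(pieces)


sample = ("<html><body><p>This is a test of <strong>ctx_doc.filter</strong> "
          "and plaintext filtering.</p></body></html>")
print(no_html(sample))
```

A literal '>' inside the text (for example "a > b") would confuse this scan, which is why it is described as brute force.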

Tags: Database

Similar Questions

  • Dreamweaver CS6: when I link an external style sheet, it doesn't affect the HTML page at all.

    In Dreamweaver CS6, when I link an external style sheet, it doesn't affect the HTML page at all. It started to happen after upgrading to Mac OS X El Capitan.

    Does anyone have an idea how to solve this? For now I write all my CSS in the head section of my HTML page.

    Thanks for help

    Best regards

    The problem has been resolved.

    I forgot to write rel="stylesheet"; that was the problem.

  • Incorrect CSS erases ALL the HTML

    I was using Firefox today on a webapp template. When I put incorrect CSS syntax in a div style attribute (I know, how could I?), it erased ALL the HTML code! Wow. That will teach me to learn my CSS.

    Seriously, can someone check this?

    IE is crap, pure and simple; you should be aware of that.

    I see many people here all the time having problems with browsers, page layouts and styles. The first problem is that the code and the sites do not follow web standards: they are full of validation errors, poorly formed, with inline and external JavaScript and CSS all over the place, done one way in one place and very differently in the next. Even an expert would struggle to find where the problem lies.

    If your HTML and CSS are malformed, you are relying on the browser to make sense of them. Firefox does a very good job of working out what things should be and untangling them, so people often never notice the problems; even with a few JavaScript errors, it can carry on.

    Miss one ; and Chrome and Firefox will be fine, but IE stops on every error, which in some ways is good. But it falls over on badly formed HTML.

    Follow web standards, write good CSS, and really know what you are doing, just as in any trade such as plumbing or electrical work; then all you have to think about are the IE bugs.

    I know that hiding form elements and trying to run JavaScript on them causes problems in older IE versions due to bugs. I know that IE 7 does not clear properly with clearfix if you need to do certain things. I know that in IE 7 a z-index will not work unless you also apply it to the parent element.

    My IE-only style sheets are very small these days because of these principles.

    This is not about "I'm better than you", but if you follow web standards, good coding practice and so on, you have fewer problems and the sites you build simply perform better.

    Using code editors that quickly complete your code, make it easy to follow, and provide the tools you need means you get the job done correctly and efficiently.

    It's like sawing wood. You have one guy trying to do it on his lap and another using a bench and a clamp. You can do it both ways, but which gets the best cut, is less of a hassle, and is safer?

  • "Export as HTML" re-creates ALL the HTML files

    Hello

    A few weeks ago Muse suddenly started re-creating all the HTML files, even for pages on which I have not changed anything.

    Muse creates new CSS files only for the pages on which I changed something.

    What can I do?

    Thank you

    Ylva

    An uninstall never touches user-created files, but of course it's always a good idea to back them up.

    Mylenium

  • Function to remove ALL the HTML

    I have a field to retrieve from Primavera, which is stored as a BLOB in the database, and expose it in a view.

    We added code to remove HTML tags such as < > / and some other ASCII characters (a developer helped with the ASCII part, so forgive me for not knowing how to explain it), but we still get surprises like &nbsp; and &amp; appearing, depending on what was entered in Primavera.

    How do we remove ALL the HTML codes? Currently, the code looks like this:

    REPLACE(REPLACE(REGEXP_REPLACE(utl_raw.cast_to_varchar2(DBMS_LOB.substr(TM.task_memo)), '<[^>]+>'), CHR(13), ''), CHR(10), '') and the stories

    Thank you.

    You're welcome. Remember to mark the question as answered.

    If you look at the string in the database, you can use DUMP to see which character is at the end of the line.

    In this example, I placed a CHR(0) at the end of the string to show this:

    Octal:
    select dump('testing' || chr(0), 8) from dual;

    Output: Typ=1 Len=8: 164,145,163,164,151,156,147,0

    Hex:
    select dump('testing' || chr(0), 16) from dual;

    Output: Typ=1 Len=8: 74,65,73,74,69,6e,67,0

    Published by: specdev on August 6, 2012 05:08

    Answered without receiving helpful or correct answer points :-( but make someone happy today :-)
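
    The leftover &nbsp; and &amp; in the question are character entities, which a tag-only pattern such as '<[^>]+>' never touches. As a language-neutral illustration (a sketch in Python, not Oracle code; the function name strip_markup is hypothetical), the tags can be removed first and the entities decoded afterwards:

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(text: str) -> str:
    """Remove tags, decode character entities, and drop CR/LF characters."""
    without_tags = TAG_RE.sub("", text)
    decoded = html.unescape(without_tags)      # &amp; -> &, &nbsp; -> U+00A0
    decoded = decoded.replace("\u00a0", " ")   # non-breaking space -> plain space
    return decoded.replace("\r", "").replace("\n", "")


print(strip_markup("<p>Fish &amp; chips&nbsp;now</p>\r\n"))
```

    Decoding after the tags are stripped matters: an escaped "&lt;b&gt;" in the text is then kept as literal characters instead of being mistaken for a tag.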

  • Windows XP Pro SP3 OEM on Dell D600: "Not responding" on every item, every time, in all windows

    My Dell D600 has OEM Windows XP Pro SP3 and the .NET Frameworks, and all or almost all of it suddenly started going NOT RESPONDING every time. I don't know what went wrong, and it's driving me crazy. I can't afford to upgrade.

    I tried the online Fix it from Microsoft, no luck.

    I uninstalled a few programs, thinking perhaps it was too full. Still no luck.

    Whenever I use any folder window, or any file item at all, the same thing happens, every time. A "Not responding" delay hits the window or desktop that contains the item. I must hit Ctrl+Alt+Delete at least twice, if not 3-4 times. Each delay lasts for a count of at least 26 and often much longer. Basically, Windows is barely functional.

    I have two Dell OEM discs and am not sure which one I used here (I was wondering whether I should do a repair installation of Windows), but I am worried about destroying my Office and server installations, which hold 3 WordPress developments. I can't afford to destroy them.

    Any ideas to FIX this would be greatly appreciated.

    Thank you.

    Well, minor success.

    The file issue quickly returned, but a re-check with Roguefix showed it was clean, with no change to the system.

    I therefore decided to start uninstalling things, hoping to uninstall the culprit by accident.

    First I removed all the .NET Frameworks and their SPs. That failed; the file issue remained.

    I then started to uninstall programs. After uninstalling half a dozen programs, suddenly THE FILES ARE WORKING AGAIN! Don't ask me why.

    Then I reinstalled my .NET Frameworks and SPs.

    Then I went to Microsoft Update to get up to date.

    IT WILL NOT UPDATE. "Failed!" is what I get. It says the software needed to update is not on my HD and offers a download. When I try to download and install it: failed! It will not even install that.

    This goes back to the same "cause": for reasons unknown, over the last 4-6 weeks, installing updates on my Dell D600 with Windows XP Pro OEM has been failing. Initially, they installed. Then began a phase where some updates installed and others failed for no reason.

    Then no updates would install at all. That was BEFORE I stripped out the frameworks while trying to solve the file issue.

    It is not even asking me to install the genuine check that came up before, I think. I think that was in August 2013 or thereabouts.

    I'm desperate to get the updates, if you have any idea how to help me.

    Thank you.

  • How to make a variable containing HTML be rendered as HTML in an HTML region

    To understand the issue, I have a table that stores the HTML code generated by user and type of report:
     USER      | REPORT_TYPE | HTML_GENERATED
    -----------|-------------|------------------------------------------------------------
     TEST_USER | TEST        | <HTML><head><title></title></head><BODY>TEST</BODY></HTML>
    On the page, there is a process that uses an anonymous PL/SQL block that is set to run "On Load - Before Header". Here's part of it:
    SELECT HTML_GENERATED into :P2_HTML
    from LN_DOCUMENT_LABELS
    where USER = :app_user AND REPORT_TYPE = 'TEST';
    The idea is to generate all the HTML code behind the scenes in another package.

    There is only one item on the page to store the variable in:
     P2_HTML     (Hidden) 
    Then I created an HTML region, and in the region source I wanted to reference the item, so I entered:
    &P2_HTML.
    The result just displays the HTML code:
    <HTML><head><title></title></head><BODY>TEST</BODY></HTML>
    Is there a way to force the page to show the rendered result of the HTML instead?
    TEST

    You could have a PL/SQL region with code like:

    begin
      htp.p(:P2_HTML);
    end;
    

    That would send the HTML code to your browser to be rendered... However, you should confirm that the generated HTML code is valid...

    Thank you

    Tony Miller
    Ruckersville, VA

  • Is the HTML disabled when we export?

    Hello

    Using OBIEE 11g.
    I would like to know one thing. I created a report that has a static text view, which by default is centered in the dashboard.
    I aligned it to the left in the analysis, and in the dashboard it shows correctly, but when I download it, it is aligned to the center.

    Is all the HTML we put in the report lost when we download?
    Because in the column too, if I had put in a few <br> tags, I see that they are all gone when I download.

    Thank you

    So it's quite simple. You must do this within the analysis itself.
    Create 2 static text views, one real and the other empty, but set an equal width for each. Now place the table view below the real static view and the chart view below the empty one. This will show the text above the table but not above the chart. No HTML tags are required in this case. Hope this helps.

  • Remove the HTML tags to convert the site to CSS

    My site doesn't use any CSS and I need to change that. Is there a quick way to remove all the HTML code but leave the links in? I can copy the text via a browser, but then I lose all the links to the images.

    Thank you

    "align" isn't a tag - it is an attribute. As I suspected, your best bet is
    to start again.

    --
    Murray - ICQ 71997575
    Adobe Community Expert
    (If you *MUST* email me, don't LAUGH when you do!)
    ==================
    http://www.projectseven.com/go - DW FAQs, tutorials & resources
    http://www.dwfaq.com - DW FAQs, tutorials & resources
    ==================

    "bastiat" wrote in message
    news:gefeta$6LL$1@forums.macromedia.com...
    > I can't get this to work with the tag "align".

  • How to stop all the browser's text/HTML/menus from appearing as numbered boxes?

    A few weeks ago ALL the text in my Firefox browser began to appear as numbered boxes. It affects all of the HTML on the page, JavaScript popups, text fields, menus... everything. It didn't happen after I did an update or anything; it just happened partway through the day when I opened a browser window. I tried doing an update when I was prompted, which didn't help, and I tried completely uninstalling and then reinstalling the latest version.

    Try toggling some Boolean gfx.font_rendering prefs on the about:config page to disable some features.

    Filter: gfx

    Some to try:

    • Set gfx.direct2d.disabled to true to disable Direct2D
    • Set gfx.font_rendering.directwrite.use_gdi_table_loading to false

    To open the about:config page, type about:config in the location (address) bar and press the Enter key, just as you type the URL of a website to open it.

    If you see a warning then you can confirm that you want to access this page.

    • Use the Filter bar at the top of the about:config page to more easily locate a preference.
    • Preferences that have been changed show as bold (user set).
    • Preferences can be reset to the default value via the context menu if they are user set.
    • Preferences can be changed via the context menu: Edit (String or Integer) or Toggle (Boolean)

    Try turning off hardware acceleration.

    • Tools > Options > Advanced > General > Browsing: "Use hardware acceleration when available"

  • Everything I print from the web comes out as HTML code. I use XP Home Edition

    Everything I print from the web comes out as HTML code. I use XP Home Edition. I'm hardly ever on my home computer, so I don't know when it first happened. I'm trying to print a boarding pass. Help.

    Thank you

    Hi GordonHarris,

    · Are you using Internet Explorer? If so, which version of Internet Explorer?

    · Did you make any changes to the computer before the issue appeared?

    Try the following methods.

    Method 1: Follow these steps:

    (a) Click Start, Control Panel, Add or Remove Programs.

    (b) Click Set Program Access and Defaults, select Custom, and select Internet Explorer as your default web browser.

    (c) Click OK.

    Method 2: Follow the steps in the article, if you are using Internet Explorer.

    I can't print or print preview a web page in Internet Explorer

    http://support.Microsoft.com/kb/973479

  • HP 8600 Pro All-in-One: horizontal streaks on printouts and on print to PDF

    I'm getting horizontal streaks on the printed pages. I cleaned it 3 times; each time, the printed test results look good. I even had streaks on the PDF when I chose the print to PDF function. I ran the diagnostics and everything seems fine. I don't get any error messages.

    Not all pages have streaks. But I tried to reprint this document several times and the first page always has streaks.

    Hello kerriman,

    Could you check how the printer is connected? Is it plugged into a power bar or directly into a wall outlet? If it is in a power bar, could you try plugging it straight into the wall?

    If you're still having problems after this, try following the steps in this document:

    http://support.HP.com/us-en/document/c02866222

  • What is the HTML code to make horizontal boxes? I need some boxes to go from left to right, but all I can find puts one above the other

    What is the HTML code to make horizontal boxes? I need some boxes to go from left to right, but all I can find puts one above the other. You can see an example here (car insurance quotes) where the boxes are next to each other. Will this affect the page width on my site? Someone told me to use DIV tags, but how do I put this in HTML code?

    You might be better off using a Table element.  See http://www.w3.org/TR/html401/struct/tables.html

    Steve

  • How to print a section to the HTML file only when the command has output, instead of printing everything?

    Hi all

    I got this HTML coding and supplied my own input to get my desired result. The script below works fine.

    The problem is that when I execute the script, the HTML file is written with all the headings below.

    Example: if no virtual machine has a connected CD-ROM, I still see the heading "CD-ROM connected to VM" in the output HTML file. I do not need it when there is no such VM.

    Is it possible to print a section to the HTML file only if the command has output?

    Please suggest on this.

    $OutputPath = Get-Date -UFormat "C:\users\$env:username\desktop\Reports\ /%B /%Y-%b-%d @ %I-%M%p.html"

    $Css = '<style>
    body {
        font-family: Verdana, sans-serif;
        font-size: 14px;
        color: #666666;
        background: #FEFEFE;
    }
    #title {
        color: #90B800;
        font-size: 30px;
        font-weight: bold;
        padding-top: 25px;
        margin-left: 35px;
        height: 50px;
    }
    #subtitle {
        font-size: 11px;
        margin-left: 35px;
    }
    #main {
        position: relative;
        padding-top: 10px;
        padding-left: 10px;
        padding-bottom: 10px;
        padding-right: 10px;
    }
    #box1 {
        position: absolute;
        background: #F8F8F8;
        border: 1px solid #DCDCDC;
        margin-left: 10px;
        padding-top: 10px;
        padding-left: 10px;
        padding-bottom: 10px;
        padding-right: 10px;
    }
    #boxheader {
        font-family: Arial, sans-serif;
        padding: 5px 20px;
        position: relative;
        z-index: 20;
        display: block;
        height: 30px;
        color: #777;
        text-shadow: 1px 1px 1px rgba(255,255,255,0.8);
        line-height: 33px;
        font-size: 19px;
        background: #fff;
        background: -moz-linear-gradient(top, #ffffff 1%, #eaeaea 100%);
        background: -webkit-gradient(linear, left top, left bottom, color-stop(1%,#ffffff), color-stop(100%,#eaeaea));
        background: -webkit-linear-gradient(top, #ffffff 1%,#eaeaea 100%);
        background: -o-linear-gradient(top, #ffffff 1%,#eaeaea 100%);
        background: -ms-linear-gradient(top, #ffffff 1%,#eaeaea 100%);
        background: linear-gradient(top, #ffffff 1%,#eaeaea 100%);
        filter: progid:DXImageTransform.Microsoft.gradient(startColorstr="#ffffff", endColorstr="#eaeaea", GradientType=0);
        box-shadow:
            0px 0px 0px 1px rgba(155,155,155,0.3),
            1px 0px 0px 0px rgba(255,255,255,0.9) inset,
            0px 2px 2px rgba(0,0,0,0.1);
    }
    table {
        width: 100%;
        border-collapse: collapse;
    }
    table td, table th {
        border: 1px solid #98bf21;
        padding: 3px 7px 2px 7px;
    }
    table th {
        text-align: left;
        padding-top: 5px;
        padding-bottom: 4px;
        background-color: #90B800;
        color: #fff;
    }
    table tr.alt td {
        color: #000;
        background-color: #EAF2D3;
    }
    </style>'

    #These are the div declarations used to properly style the HTML using the CSS defined previously
    $PageBoxOpener = "<div id='box1'>"
    $one = "<div id='boxheader'>vCenter Services</div>"
    $BoxContentOpener = "<div id='boxcontent'>"
    $PageBoxCloser = "</div>"
    $br = "<br>"
    $two = "<div id='boxheader'>Datastore</div>"
    $three = "<div id='boxheader'>Datastore mapped with hosts</div>"
    $four = "<div id='boxheader'>VM on Local Storage</div>"
    $five = "<div id='boxheader'>VM in inconsistent folders</div>"
    $six = "<div id='boxheader'>SCSI Bus Sharing (Virtual) in VM</div>"
    $seven = "<div id='boxheader'>CD-ROM connected to VM</div>"
    $eight = "<div id='boxheader'>ISO mounted to VM</div>"
    $nine = "<div id='boxheader'>Physical NIC below 1 GB</div>"
    $ten = "<div id='boxheader'>Host Configuration Issues</div>"

    #Get vCenter Service Info
    $vcservice = Get-Service -ComputerName srti003a vpxd,vctomcat,VMWareCertificateService,VMwareDirectoryService,VMwareIdentityMgmtService,VMwareKdcService,vmwarelogbrowser,VMwareSTS,vimQueryService,vmware-network-coredump,vmware-network-coredump-webserver,vimPBSM,vmware-ufad-vci,RpcEptMapper,DcomLaunch,RpcSs,VMTools | where {$_.Status -ne "Running"} | Select-Object Name, DisplayName, Status | ConvertTo-Html -Fragment

    #Get Datastore issue Info
    $dc = Get-Datastore | where {$_.State -eq "Unavailable"} | Select-Object Name, State | ConvertTo-Html -Fragment

    #Get Datastore issue mapped with hosts Info
    $dci = Get-Datastore | where {$_.State -eq "Unavailable"} | Select-Object Name, State
    $maph = Get-VMHost -Datastore $dci.Name | Select Name | ConvertTo-Html -Fragment

    #Get VMs located on local storage that are registered in the inventory
    $vmls = Get-Datastore | where {$_.Name -match "local|Stor"} | Get-VM | Get-HardDisk | Select @{N='Name';E={$_.Parent}},@{N='Filename';E={$_.Filename.Split('/')[0]}} | ConvertTo-Html -Fragment

    #Get VMs whose name in the inventory is inconsistent with the folder path
    $vmi = Get-View -ViewType VirtualMachine |
    where {$_.Name -ne $_.Summary.Config.VMPathName.Split('] /')[2]} |
    Select Name,@{N='Path';E={$_.Summary.Config.VMPathName.Split('/')[0]}} | ConvertTo-Html -Fragment

    #SCSI Bus Sharing on VM
    #$phy = Get-Cluster | Get-VMHost | Get-VM | Get-ScsiController | Where-Object {$_.BusSharingMode -eq 'Physical'} | Select {$_.Parent.Name}, {$_.Parent.Host}, BusSharingMode | Sort {$_.Parent.Host} | ConvertTo-Html -Fragment
    $vir = Get-Cluster | Get-VMHost | Get-VM | Get-ScsiController | Where-Object {$_.BusSharingMode -eq 'Virtual'} | Select {$_.Parent.Name}, {$_.Parent.Host}, BusSharingMode | Sort {$_.Parent.Host} | ConvertTo-Html -Fragment

    #CD-ROM connected to VM
    $cd = Get-VM | where {$_ | Get-CDDrive | where {$_.ConnectionState.Connected -eq "true"}} | Select Name | ConvertTo-Html -Fragment

    #ISO mounted on VM
    $iso = Get-VM | where {$_ | Get-CDDrive | where {$_.ConnectionState.Connected -eq "true" -and $_.ISOPath -like "*.ISO*"}} | Select Name, @{Name=".ISO Path";Expression={(Get-CDDrive $_).isopath}} | ConvertTo-Html -Fragment

    #List physical NICs with a speed below 1 GB
    $nic = Get-VMHostNetworkAdapter | Where {$_.BitRatePerSec -ne "0" -and $_.BitRatePerSec -ne "1000" -and $_.BitRatePerSec -ne "10000"} | Where {$_.Name -match "vmnic"} | Select Name, VMHost | ConvertTo-Html -Fragment

    #Host configuration issues
    $hi = Get-VMHost | Select @{N="HostName";E={$_.Name}},@{N='Message';E={$_.ExtensionData.ConfigIssue.FullFormattedMessage}} | where {$_.Message -is [string]} | ConvertTo-Html -Fragment

    #Create HTML Report

    #-Head parameter may be omitted if the header is declared in the body

    ConvertTo-Html -Title "vCheck" -Head "<div id='title'>PowerCLI Reporting</div>$br<div id='subtitle'>Report generated: $(Get-Date)</div>"

    -Body "$Css $PageBoxOpener $one $BoxContentOpener $vcservice $PageBoxCloser $br $two $BoxContentOpener $dc $PageBoxCloser $br $three $BoxContentOpener $maph $PageBoxCloser $four $BoxContentOpener $vmls $PageBoxCloser $five $BoxContentOpener $vmi $PageBoxCloser $six $BoxContentOpener $vir $PageBoxCloser $seven $BoxContentOpener $cd $PageBoxCloser $Eight $BoxContentOpener $iso $PageBoxCloser $Nine $BoxContentOpener $nic $PageBoxCloser $Ten $PageBoxOpener $hi $PageBoxCloser

    " | Out-File -FilePath $OutputPath

    You seem to have blank lines, that should not be there.

    The last 3 lines of the script must be

    ConvertTo-Html -Title "vCheck" -Head "
    <div id='title'>PowerCLI Reporting</div>
    $br
    <div id='subtitle'>Report generated: $(Get-Date)</div>
    " `
    -Body $Body |
    Out-File -FilePath $OutputPath
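
    The reply above fixes the line continuation, but the question itself (emit a heading only when its command returned something) comes down to testing each result before adding its section. A rough, language-neutral sketch of that idea in Python follows; the function name build_sections and the sample data are hypothetical and are not taken from the script:

```python
from html import escape


def build_sections(sections):
    """Return HTML for the sections that have at least one row.

    `sections` is a list of (title, rows) pairs; each row is a dict.
    A section with no rows is skipped entirely, heading included.
    """
    parts = []
    for title, rows in sections:
        if not rows:  # the query returned nothing: emit no heading either
            continue
        parts.append(f"<div id='boxheader'>{escape(title)}</div>")
        headers = list(rows[0].keys())
        parts.append("<table>")
        parts.append("<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>")
        for row in rows:
            cells = "".join(f"<td>{escape(str(row[h]))}</td>" for h in headers)
            parts.append("<tr>" + cells + "</tr>")
        parts.append("</table>")
    return "\n".join(parts)


# Hypothetical results: the first query found nothing, the second found one row.
report = build_sections([
    ("CD-ROM connected to VM", []),
    ("Datastore", [{"Name": "ds01", "State": "Unavailable"}]),
])
print(report)
```

    In the PowerShell script, the equivalent would presumably be to check whether a query returned any objects before converting it to HTML and appending its heading to the body.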

  • How to integrate all the .js files in .html file?

    Hello!

    I need to create an HTML5 animation, which will be hosted on an ad server. According to its policy, HTML5 banners may contain only images and HTML files, and all the CSS and JS code must be part of the .html file.

    So my question is how to incorporate all the JS code into the HTML?

    When I publish the project, Edge Animate produces a *_edge.js file next to the HTML file. I tried several ways to integrate this code into the HTML file, but the result was a blank white page every time.

    Thank you in advance,

    George

    The Edge runtime looks specifically for this *_edge.js file, and if it cannot find the file, it raises an error that stops the animation.

    We understand your problem, and Adobe might consider providing a solution for this in the next version of Edge Animate.

    HTH,

    Vivekuma
