Loading MS Word documents and converting them to DB records?

Here's what I'm working with:
Oracle 10.2.0.3.x (with the Companion CD installed)
Fedora Core 7
4 dual-core processors
16 GB RAM

For a long time, our intranet has accepted uploaded Word files and converted them to PDF. Each file is associated with a DB record that stores the location of the file on disk, the name of the document, the owner, and a few other details. The files are generally well formatted, each containing a line for the title, effective date, author, owner, etc.

We have reached a point where the PDF conversion no longer works very well. In addition, our search, which runs on a Google Mini appliance, returns files that are obsolete or have not yet been published. Basically, we need a better solution.

What I want to do is:
1. Use Oracle Text to scan/load/import the Word files.
2. Use Oracle Text to analyze the Word files, identifying items such as the effective date and owner.
3. Load the results of step 2 into a table that is searchable by Oracle. I can then use my intranet application to convert the data to Word, PDF, or whatever the user needs.

I'm here because a) I do not know whether this is possible, and b) I do not know exactly how to go about the task.

I would really appreciate tips, pointers, or links to useful documentation.
Thank you.

If that is the case:
Step 1 - try loading the documents using the following code:

http://snandan.blogspot.com/2007/07/how-to-insert-PDFDoc-files-into-table.html

The above is written for PDF files, but the same can be done for MS Word files too.

Step 2 - create a CONTEXT index using AUTO_FILTER

Step 3 - try the above for a couple of documents and then perform a search using keywords such as TITLE, EFFECTIVE DATE, etc.
Check the precision of the results, and then you can go further.

The other part that I can think of may be a bit complicated, but it will certainly be a much more efficient search and storage method.
Once you have saved the file into the BLOB column, you can spend some time writing a small PL/SQL script to read it once more, obtain the necessary information such as TITLE, EFFECTIVE DATE, etc., and store it in the table.
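
The extraction step can be sketched as follows. This is only an illustration of the parsing logic, written in Python rather than the PL/SQL suggested above, and it assumes the document text has already been extracted to plain text and that each field sits on its own line as "Label: value". The label names, the helper name extract_fields, and the sample text are all hypothetical.

```python
import re

# Hypothetical labels; adjust them to match the real documents.
FIELD_LABELS = ("Title", "Effective Date", "Author", "Owner")


def extract_fields(text, labels=FIELD_LABELS):
    """Return a dict of the labeled header fields found in the text."""
    fields = {}
    for label in labels:
        # Match "Label: value" on a single line, ignoring case and
        # surrounding spaces or tabs.
        pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        if match:
            fields[label] = match.group(1)
    return fields


# Made-up sample standing in for text extracted from a Word file.
sample = """Title: Travel Expense Policy
Effective Date: 2008-01-15
Author: J. Smith
Owner: Finance

Body text starts here.
"""

print(extract_fields(sample))
```

Each resulting dict maps naturally onto one row of the searchable table; labels missing from a document are simply left out of the dict.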

In fact, we have implemented something like this before.

We had PDFs like that.
We inserted these PDF files into a BLOB column,
had a SQL script extract the required values and put them in a separate, intermediate table,
and then used a Perl script to put all of these properties and the content into an XML document.

The advantage of putting the documents into XML was felt when we had multiple queries running against them based on both the properties and the content.
We used SDATA sections (an 11g feature) for

queries like

text contains 'Insurance' and date > «»

Tags: Database

Similar Questions

  • I'm trying to convert a PDF to a Word document and the system tells me that I must pay. The account should be active and does not require renewal until mid-February of this year.

    Hi louiset79309343,

    Have you tried the https://cloud.acrobat.com/exportpdf online option?

    I checked your account; you have an active subscription for the Adobe Export PDF service.

    Kind regards
    Nicos

  • Using a desktop PC, I'm opening a PDF file and trying to save it as a Word document. The application keeps prompting me to subscribe in order to convert the document, which I have already done. Is there another way I should try to convert?

    We often see this when someone pays for one product but tries to use another, and less often when the subscription has failed.

    So, what exactly did you pay for?

  • Add a Word document to the dashboard

    Hi Experts,

    I have to build a dashboard containing a Word document. This Word document is updated frequently by business users.

    I use 'Link and Image' in the dashboard after loading the Word document into the resources directory. The URL of the link looks like '/analytics/res/Help_Tips_Basic.doc'.

    The disadvantage of this approach is that whenever the document changes, the business user has to ask the system administrator to upload the document to the Unix server.

    The ideal approach would be for the business user to upload the Word document to the catalog. However, I can't get the link right. I use OBIEE 11.1.1.5.

    Any suggestions?

    Thank you very much

    Shi-ning

    Just upload your Word, Excel, PDF, txt, etc. files to the web catalog (through Catalog Manager or through Presentation Services),
    then call them using downloadFile & path
    ex: http://localhost:9704/analytics/saw.dll?downloadFile&path=text.pdf

    Note:
    If there is any space in the folder path, use %20 in the URL.
    If there is a / in the path, use %2F in the URL. It will work 100%.

    Visit this link
    http://total-bi.com/2011/02/external-files-OBIEE-dashboard/

    Please mark if this helps.

    Published by: VIEREN Srini December 3, 2012 16:09
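
The note about %20 and %2F in the answer above amounts to percent-encoding the catalog path before appending it to the downloadFile URL. A minimal sketch in Python, assuming the base URL from the example in the answer; the catalog path and the helper name download_url are hypothetical:

```python
from urllib.parse import quote

# Base URL taken from the example in the answer above.
BASE = "http://localhost:9704/analytics/saw.dll?downloadFile&path="


def download_url(catalog_path):
    """Percent-encode the whole path, including '/' and spaces."""
    # safe="" makes quote() encode '/' as %2F; spaces become %20.
    return BASE + quote(catalog_path, safe="")


# Hypothetical catalog path with a space in the folder name.
print(download_url("/shared/My Folder/text.pdf"))
```

Encoding the path programmatically avoids hand-editing each link when folder names contain spaces or other reserved characters.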

  • Script for converting hyperlinks to buttons?

    Hello!

    Does anyone know if there is a script for converting hyperlinks to buttons with a Go To URL action that uses the same URL as the hyperlink?

    Here it is:

    /* Copyright 2012, Kasyan Servetsky
    November 29, 2012
    Written by Kasyan Servetsky
    http://www.kasyan.ho.com.ua
    e-mail: [email protected] */
    //======================================================================================
    var scriptName = "Convert hyperlinks to buttons - 1.0";
    
    Main();
    
    //===================================== FUNCTIONS  ======================================
    function Main() {
        var hyperlink, source, sourceText, destination, page, arr, outlinedText, gb, button, behavior,
        barodeCount = 0,
        hypCount = 0;
        if (app.documents.length == 0) ErrorExit("Please open a document and try again.", true);
        var startTime = new Date();
    
        var doc = app.activeDocument;
        var layer = doc.layers.item("Buttons");
        var swatch = doc.swatches.item("RGB Yellow");
        var hyperlinks = doc.hyperlinks;
    
        var progressWin = new Window ("window", scriptName);
        progressBar = progressWin.add ("progressbar", undefined, 0, undefined);
        progressBar.preferredSize.width = 450;
        progressTxt = progressWin.add("statictext", undefined,  "Starting processing hyperlinks");
        progressTxt.preferredSize.width = 400;
        progressTxt.preferredSize.height = 30;
        progressTxt.alignment = "left";
        progressBar.maxvalue = hyperlinks.length;
        progressWin.show();
    
        for (var i = hyperlinks.length-1; i >= 0; i--) {
            hyperlink = hyperlinks[i];
            source = hyperlink.source;
            sourceText = source.sourceText;
            destination = hyperlink.destination;
            page = sourceText.parentTextFrames[0].parentPage;
    
            barodeCount++;
            progressBar.value = barodeCount;
            progressTxt.text = "Processing hyperlink " + hyperlink.name + " (Page - " + page.name + ")";
    
            arr = sourceText.createOutlines(false);
            outlinedText = arr[0];
            gb = outlinedText.geometricBounds;
            outlinedText.remove();
    
            button = page.buttons.add(layer, {geometricBounds: gb, name: hyperlink.name});
            button.fillColor = swatch;
            button.fillTint = 50;
            button.groups[0].transparencySettings.blendingSettings.blendMode = BlendMode.MULTIPLY;
            behavior = button.gotoURLBehaviors.add();
            behavior.url = destination.destinationURL;
    
            hyperlink.remove();
            source.remove();
    
            hypCount++;
        }
    
        var endTime = new Date();
        var duration = GetDuration(startTime, endTime);
        progressWin.close();
    
        alert("Finished. " + hypCount + " hyperlinks were converted to buttons.\n(time elapsed: " + duration + ")", scriptName);
    
    }
    //--------------------------------------------------------------------------------------------------------------------------------------------------------
    function GetDuration(startTime, endTime) {
        var str;
        var duration = (endTime - startTime)/1000;
        duration = Math.round(duration);
        if (duration >= 60) {
            var minutes = Math.floor(duration/60);
            var seconds = duration - (minutes * 60);
            str = minutes + ((minutes != 1) ? " minutes, " :  " minute, ") + seconds + ((seconds != 1) ? " seconds" : " second");
            if (minutes >= 60) {
                var hours = Math.floor(minutes/60);
                minutes = minutes - (hours * 60);
                str = hours + ((hours != 1) ? " hours, " : " hour, ") + minutes + ((minutes != 1) ? " minutes, " :  " minute, ") + seconds + ((seconds != 1) ? " seconds" : " second");
            }
        }
        else {
            str = duration + ((duration != 1) ? " seconds" : " second");
        }
    
        return str;
    }
    //--------------------------------------------------------------------------------------------------------------------------------------------------------
    function ErrorExit(error, icon) {
        alert(error, scriptName, icon);
        exit();
    }
    
  • Automatically add Word documents to the OPM list of Word documents

    Is it possible to paste a new Word document into the rules folder in the project folder and have OPM automatically add it to the Project Explorer, or do you have to add new files manually every time?

    I want an application to create a partially formatted Word template in the OPM project, and have the project include it in the build and make it available for editing without adding the document manually.

    Any help or suggestions would be appreciated.


    Thank you

    It's a bit of a hacky solution, Darren, and it would not be supported from a product point of view, but if you want, you could write a custom script or some code that directly edits the .xproj file and adds references to the new documents in the xproj XML file.

  • The June 16 update wiped out all my documents from the last 10 days. Any solution?

    Documents from the last 10 days, all my email settings, and all memory of recent activity have disappeared.

    Hello Craig,

    Let's see if the files are actually missing or if they are hidden:

    http://answers.Microsoft.com/en-us/Windows/Forum/Windows_7-files/Unhide-files-and-folders/ca46d3ba-1b51-E011-8dfc-68b599b31bf5

    Best regards

    Matthew_Ha

  • Word document opened in Windows Mail - saved while working, but can't find it again

    I opened a Word document and used the Save button the whole time I was working. I finished, but could not find it. I never saved the document in 'My Documents'. It must have been saved somewhere, but where should I look for it?

    It is somewhere between quite unlikely and very improbable that you will find it anywhere. When you open a document from an email, you open a temporary copy of the file, and unless you use "Save as" to save to a network drive or a location on your hard drive, this temporary copy is the one that is being updated. In general, as soon as you close the e-mail message or e-mail program, the temporary copy is gone forever.

    You could try a search of your hard drive, including non-indexed and hidden locations and system files and folders, but it is not likely that you will find it.

  • A query design for converting a time difference into days, hours, minutes

    Hi all

    I have designed a query that converts a time difference into a number of days, with the remainder in hours, minutes, and seconds. This is what I have so far. Please suggest any modifications; so far it seems to work very well, but kindly point out any anomalies.

    WITH data (startdate, enddate, datediff) AS (
      SELECT to_date('2015-10-01 10:00:59', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-20-01 03:00:49', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-10-01 10:00', 'yyyy-dd-mm hh24:mi:ss') - to_date('2015-20-01 03:00', 'yyyy-dd-mm hh24:mi:ss') FROM dual
      UNION ALL SELECT to_date('2015-10-01 10:00:39', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-20-01 03:00:40', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-10-01 10:00', 'yyyy-dd-mm hh24:mi:ss') - to_date('2015-20-01 03:00', 'yyyy-dd-mm hh24:mi:ss') FROM dual
      UNION ALL SELECT to_date('2015-11-01 10:30:45', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-11-01 11:00:50', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-11-01 10:30', 'yyyy-dd-mm hh24:mi:ss') - to_date('2015-11-01 11:00', 'yyyy-dd-mm hh24:mi:ss') FROM dual
      UNION ALL SELECT to_date('2015-11-01 09:00:50', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-11-01 10:00:59', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-11-01 09:00', 'yyyy-dd-mm hh24:mi:ss') - to_date('2015-11-01 10:00', 'yyyy-dd-mm hh24:mi:ss') FROM dual
      UNION ALL SELECT to_date('2015-11-01 08:30:49', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-11-01 09:30:59', 'yyyy-dd-mm hh24:mi:ss'), to_date('2015-11-01 08:30', 'yyyy-dd-mm hh24:mi:ss') - to_date('2015-11-01 09:30', 'yyyy-dd-mm hh24:mi:ss') FROM dual
    )
    SELECT
      -- whole days between the two dates
      TRUNC(enddate - startdate) days,
      -- fractional day converted to whole hours
      TRUNC(((enddate - startdate) - TRUNC(enddate - startdate)) * 24) hours,
      -- fractional hour converted to whole minutes
      TRUNC((((enddate - startdate) - TRUNC(enddate - startdate)) * 24
             - TRUNC(((enddate - startdate) - TRUNC(enddate - startdate)) * 24)) * 60) minutes,
      -- fractional minute converted to seconds
      ((((enddate - startdate) - TRUNC(enddate - startdate)) * 24
        - TRUNC(((enddate - startdate) - TRUNC(enddate - startdate)) * 24)) * 60
       - TRUNC((((enddate - startdate) - TRUNC(enddate - startdate)) * 24
                - TRUNC(((enddate - startdate) - TRUNC(enddate - startdate)) * 24)) * 60)) * 60 seconds
    FROM data;

    Thanks for the answers in advance.

    AHA!

    TO_TIMESTAMP expects a string as input, so it first makes an implicit conversion from DATE to a string, in the format of NLS_DATE_FORMAT.

    To convert a DATE to a TIMESTAMP independently of NLS_DATE_FORMAT, use

    CAST ( AS TIMESTAMP)
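
For comparison, the days/hours/minutes/seconds breakdown that the question asks for can be sketched outside the database with integer division on the total number of seconds. This Python sketch is not from the thread: split_duration is a hypothetical helper name, it assumes the end is not earlier than the start, and it reads the first sample row in day-month order (10 January to 20 January 2015).

```python
from datetime import datetime


def split_duration(start, end):
    """Split end - start into whole days, hours, minutes, and seconds."""
    total_seconds = int((end - start).total_seconds())
    days, remainder = divmod(total_seconds, 86400)  # 86400 seconds per day
    hours, remainder = divmod(remainder, 3600)      # 3600 seconds per hour
    minutes, seconds = divmod(remainder, 60)
    return days, hours, minutes, seconds


# First sample row of the question, read as year-day-month.
start = datetime(2015, 1, 10, 10, 0, 59)
end = datetime(2015, 1, 20, 3, 0, 49)
print(split_duration(start, end))
```

Working from one integer count of seconds avoids the repeated fractional-day arithmetic of the SQL version, so the seconds value cannot pick up floating-point noise.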

  • I am trying to combine PDF files. My Word documents convert the text but lose the background color. Any suggestions?

    Hello

    With Office 365, you need Acrobat 11.0.1 or higher.

    You can also try the new Acrobat Pro DC.

    Thank you

    Tanvi

  • Documentation for the conversion of virtual and physical memory

    Is there any decent documentation for the conversion of virtual and physical memory?

    Any help would be appreciated.

    Yes I do, I'll send you my notes!

    Matthew

    Kaizen!

  • Word document to PDF conversion loses the header

    OK, so I have a two-page Word document (2010) with headers on the second page. I am using the latest update for Adobe Acrobat XI.

    If I print > save to a .pdf file, it works fine. However, we often combine multiple files into a single PDF, so it's annoying to have to do this for each file first.

    If I right-click the files and then choose combine or convert to PDF, the headers disappear from the top of the second page.

    Can anyone offer any ideas?

    You mentioned the print function, but the combine function you mention does not use printing; it uses the PDFMaker feature. Try creating the PDF from Word using the Acrobat menu. That should give the correct result. If it doesn't, something may be wrong in the converter settings. You can also check the options for DOC to PDF under Edit > Preferences (Convert to PDF). You mention the latest XI update, which should give you 11.0.09.

  • What is the best way to work with Word documents in InDesign CS4?

    I work in Microsoft Word 2007 and all my documents are in .doc format.

    What is the best way to work with Word documents in InDesign CS4?

    David Blatner says to avoid copying and pasting text from Word and to place it instead (Ctrl + D).

    How should I place an RTF or text document?

    I want to do a book layout in ID CS4, and its main feature is that the left page holds the text and the right page holds graphics.

    So, if I understand correctly, to place the text on each page I should create, for example, 70 Word documents and place each one on the 70 left pages?

    It looks like a waste of time. Is there another way to make such a layout? What kind?

    It is best to place any text.

    You can have all your text in a single file, and autoflow lets you add text frames and pages as necessary (hold down the SHIFT key when you click the loaded text cursor), but threading on only one side of the spread is somewhat atypical from autoflow's perspective, so you have to set it up properly.

    This is a case where a master text frame will work to your advantage. On your master page, add a text frame to the left page, but not to the right (or at least not one threaded to the right; for another project you could actually have two independent text threads). Hold the loaded cursor over the frame on the left side of a document page and autoflow. ID will add new spreads as needed, but will put the text only on the left side.

    Peter

  • I need to number the lines of my Pages document for submission to a journal. Help, please!

    I need to number the lines of a document to be submitted to a journal. I can make a section in the document, but I can't find a command to then add the line numbers.

    Hi Misha,

    Pages 3 includes a word count display as a menu item in the View menu:

    The count appears at the bottom left of the page and shows more options when you click on it. Unfortunately, "number of lines" is not among the available options:

    If you had Pages '09 on your Mac and did not intentionally remove it when you installed Pages 5, it will still be there, in a folder named iWork '09, in your Applications folder.

    Copy your document, open Pages '09, and paste the contents of your document into a new word processing document in Pages '09.

    Check that the end of the document falls on the same page (number) and at the same position on that page as in the original, making small changes to the margins to adjust it if necessary.

    Then go to edit > tools > statistics to see this more comprehensive report:

    Note that if you need to submit the journal article in a format other than a Pages file, the number of lines may change due to formatting changes in the conversion to the new format.

    If you don't have Pages '09, you can get a line count using tools > line numbers in OpenOffice or LibreOffice. Both are open source applications, free to download and use (although you might donate to support future development of the application). The links will take you to their respective Web sites.

    Oh... One more thing (as Steve Jobs used to say sometimes): in Pages (3), go to (menu) Pages > Provide Pages Feedback and make a feature request that line count (and line numbering) be added to the capabilities of Pages 3.

    Kind regards

    Barry

  • PDF to Word - pop-up says the PDF will now be converted to a Word document for editing

    Original title: pdf in word

    Hello! I just converted a Word document to a PDF using novaPDF. However, when I try to open the original PDF, a pop-up appears saying that my PDF will now be converted to a Word document for editing. I don't want to convert my doc for editing, because I already edited it in Word. More than that, I noticed that when a PDF file is converted to an editable Word doc, the whole doc gets distorted in every way. I just want to OPEN THE ORIGINAL DOC WITHOUT GETTING THE pop-up.

    Thank you!!!

    Consider uninstalling Novapdf.