Normalize the names in a huge table using UTL_MATCH
Hello
I have a large table (350 million records) with a "full name" column.
This column has a few typos, so I have to normalise the data (only for this column) using UTL_MATCH.JARO_WINKLER_SIMILARITY.
I did some tests with a small table, and this works for showing the similar names:
SELECT b.SID, b.name FROM typotable a, typotable b WHERE utl_match.jaro_winkler_similarity(a.name, b.name) BETWEEN 85 AND 99 AND a.rowid > b.rowid;
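To see what the score in this query measures, here is a minimal Python sketch of the standard Jaro-Winkler similarity, scaled to 0-100 the way UTL_MATCH.JARO_WINKLER_SIMILARITY reports it. It assumes the common parameters (prefix scale 0.1, prefix capped at 4 characters) and rounds to the nearest integer; Oracle's exact rounding and variant details are not confirmed here, so treat it as an approximation for offline experiments, not as the database function itself.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters of each string, in order; positions that disagree
    # are half-transpositions.
    seq1 = [c for c, used in zip(s1, used1) if used]
    seq2 = [c for c, used in zip(s2, used2) if used]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def jaro_winkler_similarity(s1: str, s2: str) -> int:
    """0-100 score; rounding to nearest is an assumption, not Oracle's documented rule."""
    return round(jaro_winkler(s1, s2) * 100)


if __name__ == "__main__":
    for a, b in [("FROCEN", "FROZEN"), ("FROZEN", "FROZIN"), ("WHIPLASH", "WIPLASH")]:
        print(a, b, jaro_winkler_similarity(a, b))
```

Under these assumptions it gives 92 for FROCEN/FROZEN, 93 for FROZEN/FROZIN and 96 for WHIPLASH/WIPLASH, the same values as the scores quoted later in this thread.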
But:
(1) The test table was small; running this code directly on the 350-million-row table takes ages... What can be done about it?
(2) This only shows the similar names. How can I update the table by searching for similarities and choosing one of them as the single value for each name?
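For question (1), one common way to cut the cost (a technique not proposed in this thread) is to compare only distinct names, and only within "blocks" that share a cheap key, instead of joining all 350 million rows with each other. The Python sketch below assumes a hypothetical blocking key, the first character of the trimmed, upper-cased name. Pairs that differ in their first character are never compared, which is the trade-off.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def normalise(name: str) -> str:
    """Trim and upper-case so that trivial variants collapse to one distinct value."""
    return name.strip().upper()


def blocking_key(name: str) -> str:
    """Hypothetical blocking key: the first character of the normalised name."""
    return name[:1]


def candidate_pairs(names: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield each pair of distinct names that share a blocking key, once."""
    blocks = defaultdict(set)
    for raw in names:
        n = normalise(raw)
        blocks[blocking_key(n)].add(n)
    for members in blocks.values():
        # Sorting makes the pair order deterministic.
        yield from combinations(sorted(members), 2)


if __name__ == "__main__":
    rows = ["Frozen", "FROCEN", "frozin", "Whiplash", "Wiplash", "Jelly", "FROZEN"]
    for pair in candidate_pairs(rows):
        print(pair)
```

Here the seven input rows collapse to six distinct names, and only 4 of the 15 possible pairs are left to score with the expensive similarity function.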
Thank you
1590733 wrote:
Yes, I get your point. The thing is that there are no "correct" names available and the original table is huge, so this is what I thought:
- Create a secondary NAMES table with unique names. These names would be generated by mapping similar values to one of them (always the same one, even if it is not the best fit). This should be equivalent to your 'correct' table.
- Run the cleaning procedure to update the records.
How can I create this secondary NAMES table? (The 'genre' column does not matter at all; only the 'name' has to be set.)
Thanks for your help
Well, you need to determine the logic that would pick one of the incorrect names over the other. As it stands, you can easily get two incorrect values with the same match score. You must also consider what creates a 'group' of values, so that you can pick the best one in the group. Using the match score alone is not enough to create groups.
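One way to turn pairwise matches into groups (a swapped-in technique, not the approach used in this reply) is union-find: link every pair of distinct names whose similarity passes a threshold, then pick one canonical spelling per group, here the most frequent one. To keep the Python sketch self-contained it uses difflib.SequenceMatcher.ratio() with an assumed threshold of 0.8 as a stand-in for Jaro-Winkler; the counts would come from something like a GROUP BY on the name column. Linking is transitive: if A matches B and B matches C, A and C end up in the same group even if they do not match each other, which is exactly the grouping problem described above.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Callable, Dict


def build_name_map(counts: Dict[str, int],
                   similar: Callable[[str, str], bool]) -> Dict[str, str]:
    """Map every distinct name to the most frequent name in its group."""
    names = sorted(counts)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # O(n^2) over *distinct* names only, not over all rows.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similar(a, b):
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)

    mapping = {}
    for members in groups.values():
        # Most frequent spelling wins; ties go to the alphabetically first name.
        canonical = min(members, key=lambda n: (-counts[n], n))
        for n in members:
            mapping[n] = canonical
    return mapping


def difflib_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in similarity test; replace with a Jaro-Winkler score >= 85."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


if __name__ == "__main__":
    counts = {"FROZEN": 10, "FROCEN": 2, "FROZIN": 1, "WHIPLASH": 5, "WIPLASH": 7, "JELLY": 3}
    print(build_name_map(counts, difflib_similar))
```

With these hypothetical counts, FROCEN, FROZEN and FROZIN all map to FROZEN, WHIPLASH maps to WIPLASH, and JELLY stays on its own. The resulting mapping is what a secondary NAMES table could hold.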
Example:
SQL> ed
Wrote file afiedt.buf
select a.fname as fname1, b.fname as fname2
      ,utl_match.jaro_winkler_similarity(a.fname, b.fname) as match
from typotable a
     join typotable b on (a.fname != b.fname)
where utl_match.jaro_winkler_similarity(a.fname, b.fname) >= 85
order by 1, 3 desc
SQL> /
FNAME1 FNAME2 MATCH
---------- ---------- ----------
FROCEN FROZEN 92
FROZEN FROCEN 92
FROZEN FROCEN 92
FROZEN FROZEN 92
JELLY FROZIN 93
JELLY FROCEN 92
FROZEN FROZEN 92
FROZEN FROZIN 93
WHIPLASH WIPLASH 96
WHIPLASH WIPLASH 96
10 rows selected.
As you can see, FROCEN for example has two possible variants, both with a match score of 92. The same applies to others.
However, you could start hacking things around (and it really is a hack) to get something like:
SQL> ed
Wrote file afiedt.buf
with t as (
  select a.fname as fname1, b.fname as fname2
        ,utl_match.jaro_winkler_similarity(a.fname, b.fname) as match
  from typotable a
       join typotable b on (a.fname != b.fname)
  where utl_match.jaro_winkler_similarity(a.fname, b.fname) >= 85
  )
,c as (
  select fname1, greatest(fname1, fname2) as fname2, match
        ,(select count(*)
          from t t2
          where t2.fname2 = t.fname2
          and   t2.fname1 != t.fname1
         ) as cnt
  from t
  )
,r as (
  select fname1, fname2, match, cnt
        ,row_number() over (partition by fname1 order by cnt desc, match desc) as rn
  from c
  )
select fname1, fname2
from r
where rn = 1
order by 1
SQL> /
FNAME1 FNAME2
---------- ----------
FROZEN FROCEN
FROZEN FROZEN
FROZEN FROZEN
FROZIN FROZIN
WHIPLASH WIPLASH
WIPLASH WIPLASH
6 rows selected.
But then it depends on your data whether it will work in all circumstances.
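To make the ranking in that query easier to follow, here is a rough Python restatement of its steps (CTEs t, c and r, then rn = 1), assuming the rows of t are already computed as (fname1, fname2, match) tuples. It is only a model of the SQL: Python's max() on strings stands in for GREATEST, and ties, which ROW_NUMBER breaks arbitrarily, are broken here by keeping the first row seen. The sample rows are hypothetical, one per ordered pair of distinct names.

```python
from typing import Dict, List, Tuple

# One row of CTE "t": (fname1, fname2, match)
Row = Tuple[str, str, int]


def pick_best(t: List[Row]) -> Dict[str, str]:
    """Model of CTEs c and r plus the final WHERE rn = 1."""
    best: Dict[str, Tuple[Tuple[int, int], str]] = {}
    for fname1, fname2, match in t:
        # CTE c: how many *other* fname1 values point at the same fname2.
        cnt = sum(1 for f1, f2, _ in t if f2 == fname2 and f1 != fname1)
        # CTE c: greatest(fname1, fname2) becomes the proposed value.
        proposed = max(fname1, fname2)
        # CTE r: order by cnt desc, match desc within each fname1; keep rn = 1.
        rank_key = (cnt, match)
        if fname1 not in best or rank_key > best[fname1][0]:
            best[fname1] = (rank_key, proposed)
    return {fname1: proposed for fname1, (_, proposed) in best.items()}


if __name__ == "__main__":
    t = [
        ("FROCEN", "FROZEN", 92), ("FROZEN", "FROCEN", 92),
        ("FROZEN", "FROZIN", 93), ("FROZIN", "FROZEN", 93),
        ("WHIPLASH", "WIPLASH", 96), ("WIPLASH", "WHIPLASH", 96),
    ]
    for fname1, fname2 in sorted(pick_best(t).items()):
        print(fname1, fname2)
```

On this input it maps FROCEN to FROZEN but FROZEN to FROZIN, so the proposed values are not consistent across the group, which is why the result depends so much on the data.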
Tags: Database
Similar Questions
-
How to change the name of a table column effectively
Hello
I use JDev 11.1.1.4.0.
My JSF page is built with 3 or 4 EOs, VOs and LOVs. However, one of the EO tables has had a column renamed. The name change is required by the project standard.
I remember having a lot of trouble changing a column name before (I had to start over), so I would like the experts' take on how to do it without too much impact.
This particular column is in an EO and a VO, and lives as a text field on a JSF page.
All comments or suggestions will be greatly appreciated.
Thank you
Bones Jones
Indeed, the only thing you should have to change is the ColumnName of the specific entity object attribute (the database column name in the entity attribute dialog box). However, if you used the Synchronize with Database... feature, you will need to go through a full refactoring process. In that case, use the Refactor menu in JDeveloper. See whether anything in this post helps: http://jdeveloperfaq.blogspot.com/2010/04/faq-20-how-to-refactor-adf-components.html
-
How to change the names of slides in the Table of Contents
Hello
How can I change the names of the slides in my table of contents? Slide 1, 2, 3, etc. is not very informative. I went into the slide properties and added labels, but that did not update the slide names in the table of contents.
Any help would be appreciated. Thank you.
Hello
Perhaps try clicking Reset TOC?
Click Project > Table of Contents.
Click Reset TOC.
Cheers... Rick
-
How to get a list of the tables used by a query?
Hello
I use Oracle 10g. I have about 100 SQL queries stored in a table. I would like to know if there is an easy way to retrieve the source tables of each query.
For example:
I have a query "SELECT * FROM Table1 t1 INNER JOIN Table2 t2 ON t1.col1 = t2.col1".
From this query, I would like to automatically get a list of tables:
Table1
Table2
Thanks in advance for your collaboration.
Best regards
Beroetz
Run EXPLAIN PLAN on the query.
The object names will be in the OBJECT_NAME column of your PLAN_TABLE.
But an object can be a table or an index, so you'll need to join these names to USER_OBJECTS to see whether each one is a table or an index.
You also need to account for cases where a query can be satisfied using only an index. You can always get the table name by looking it up in USER_INDEXES.
-
How to include the names and units of variables in a binary output file
My colleague will give me binary files (*.dat) generated by LabVIEW. There are more than 60 variables (columns) in the binary output file. I need to know the variable names and units, which I think he has already configured in LabVIEW. Is there a way for him to produce a file that contains the variable names and units, so that I know what the binary file contains? He can create an equivalent ASCII file with a header giving the variable names, but it does not list the units of each variable.
As you can tell I'm not a LabVIEW user, so I apologize if this question makes no sense.
Hi KE,
An ASCII file (probably in CSV format) is just text, and contains all the data (intentionally) written to it. There is no special function to include units or anything else!
Your colleague must save that information (the names and units) the same way he records the values...
(When writing text files, he could use WriteTextFile, FormatIntoFile or WriteToSpreadsheetFile; even WriteBinaryFile could serve...)
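As a language-neutral illustration of that advice (in Python rather than LabVIEW, with made-up channel names and units), the file simply carries an extra header row for the units, written explicitly like any other data:

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_with_units(out: TextIO, names: Sequence[str], units: Sequence[str],
                     rows: Iterable[Sequence[float]]) -> None:
    """Write a name row, a unit row, then the data rows as CSV."""
    if len(names) != len(units):
        raise ValueError("each variable needs exactly one unit")
    writer = csv.writer(out)
    writer.writerow(names)   # first header row: variable names
    writer.writerow(units)   # second header row: units, written explicitly
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_with_units(buf, ["time", "pressure"], ["s", "kPa"], [[0.0, 101.3], [0.1, 101.5]])
    print(buf.getvalue())
```

Whoever reads the file then knows that the second line holds units rather than data.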
-
How to reference column names if you use select *
Hello
How do I reference the column names to output the data when I use select * and don't know the column names (or the number of columns) in advance?
I can get the column names into another variable, though. I am new to CF, so the question may be stupid.
Getting the column names:
<cfquery datasource="RTW_ORA" name="cn">
SELECT COLUMN_NAME
FROM ALL_COL_COMMENTS
WHERE TABLE_NAME = '#meas#'
</cfquery>
Getting the data:
<cfquery datasource="RTW_ORA" name="cd">
SELECT *
FROM #meas#
</cfquery>
How do I output all the data?
Any help would be much appreciated!
Thank you
Tushar Saxena
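Your question is not stupid. You can use the result attribute of cfquery (its columnList and sql keys) together with a query of queries:
<cfquery datasource="RTW_ORA" name="cd" result="res">
SELECT * FROM #meas#
</cfquery>
<cfset column_names = res.columnList>
<cfset no_of_columns = listLen(column_names)>
<cfquery dbtype="query" name="QoQ" result="resQoQ">
SELECT #column_names# FROM cd
</cfquery>
<cfoutput>
column names: #column_names#<br>
number of columns: #no_of_columns#<br>
SQL query: #resQoQ.sql#<br>
Query:<br>
<cfloop query="cd"><cfloop list="#column_names#" index="column">
#column#: #cd[column][cd.currentRow]#<br>
</cfloop></cfloop>
QoQ:<br>
<cfloop query="QoQ"><cfloop list="#column_names#" index="column">
#column#: #QoQ[column][QoQ.currentRow]#<br>
</cfloop></cfloop>
</cfoutput>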
-
I have data in two tables with a similar column structure, and I want to loop through the data in these two tables
based on a condition decided at runtime, so I want to put the query in a loop. An example is given below; please help me.
create table ab (a number, b varchar2(20));
insert into ab
select rownum, rownum || ' sample'
from dual
connect by level <= 10;
create table bc (a number, b varchar2(20));
insert into bc
select rownum + 1, rownum + 1 || ' sample'
from dual
connect by level <= 10;
declare
  l_statement varchar2(2000);
  bool        boolean;
begin
  bool := true;
  if bool then
    l_statement := 'select * from ab';
  else
    l_statement := 'select * from bc';
  end if;
  for i in (execute immediate l_statement) -- something like that, but I don't know how
  loop
    dbms_output.put_line(i.a);
  end loop;
end;
Something like that, but this isn't a working piece of code.
Try this and adapt it to your needs:
declare
  l_statement varchar2(2000);
  c           sys_refcursor;
  l_a         number;
  l_b         varchar2(20);
  bool        boolean;
begin
  bool := true;
  if bool then
    l_statement := 'select a, b from ab';
  else
    l_statement := 'select a, b from bc';
  end if;
  --
  open c for l_statement;
  --
  loop
    fetch c into l_a, l_b;
    exit when c%notfound;
    dbms_output.put_line(l_a || ' - ' || l_b);
  end loop;
  close c;
end;
/
-
On Facebook I can't see my friends named Tracey. I can with my phone app and with Chrome etc., but not using Firefox on my PC tower
Hello
To better help you with your question, please provide us with a screenshot. If you need help creating a screenshot, please see How do I create a screenshot of my problem?
Once you have done so, attach the saved screenshot file to your forum post by clicking the Browse... button below the reply box. This will help us visualize the problem.
Thank you!
-
Best way to update all the rows of a huge table
Hi all
I'm on Oracle 11.2 Enterprise edition.
I was asked a question: "Assume that there is a large table, say 3 GB in size. We need to update all rows in this table (e.g. col3 = col3 * 2). What is the best way to do this?"
My answer was that there are 2 possible ways, depending on the number of indexes on the table:
(1) If there are a lot of (or large) indexes on the table OR the database is in FORCE LOGGING mode (due to log shipping), I would just run an UPDATE command on the table and update all records. This will generate a lot of redo and undo, but we cannot do anything about the indexes.
(2) If there are not many indexes, I'll create an empty table with the same structure (create table t2 nologging as select * from t1 where 1 = 2). Then, using an INSERT ... SELECT ... command with 'col3 * 2' instead of col3, I insert all the data into the new table with the APPEND hint. Then I create all the indexes on the new table and finally rename T2 to T1.
With the first solution, I avoid recreating the indexes and save that I/O. In the case of forced logging, the first solution makes more sense anyway.
The second solution makes sense if we are free to create objects in NOLOGGING mode.
What do you experts think? Am I thinking in the right direction, or not?
Thanks in advance
Hello
This should probably help you.
Kind regards
Suntrupth
-
Getting the names of the plots on a chart and using these strings in a ring menu
Hello everyone!
I have a 1D array of clusters of 2 elements: one is a number and the other a string. These strings contain the names of the signals. I connected this 1D array to Reshape Array with a dimension of 10. Then the resulting array is connected to Array To Cluster. This cluster is then unbundled to get the names of the plots. The problem is that I don't want these 9 Unbundle blocks to get the plot names. Is there any way I can do it without using Unbundle 9 times? I thought of using a For Loop or While Loop, but I need some suggestions.
So I have two questions:
How do I get the names of the plots without using Unbundle so many times?
Second, how do I display the names of these plots in my ring menu?
I must have missed something: I didn't see any large cluster in your diagram. Change this large cluster into an array, because it contains many elements of the same type. Then proceed as in the attachment.
-
How to get the name of a particular array inside an object
Hello
Can anybody tell me how to get the name of a particular array? I have an array within an array, and I must compare the name of that array with a particular key. Here is the array:
myArray = Array (@43b1e09)
[0] = Object (@42b33f9)
Testing_1 = Array (@4428821)
[0] = Object (@43adc19)
choice_id = "0"
delete = "N"
DownloadURL = "xyz"
selected = "Y"
translation = "2_486"
length = 1
length = "N"
Editable = "Y"
field_id = "388"
LanguageLink = "Y"
linked_definition_id = null
multiple values = "N"
name = "Photo"
otheroption = "N"
photovitlink = Object (@43ad0d9)
required = "N"
step = "1"
translation = "Photo"
visible = "Y"
[1] = Object (@43ad5d9)
[2] = Object (@4490089)
Here is the structure of the array I get from the server side. I named the result array myArray. It has several children, each an object, and each of these objects has an array (Testing_1 in the first case). I must get the name of this Testing_1 array and compare it with my key so that I can perform an operation. But I am unable to get the name of this Testing_1 array, since it is dynamic and its name sometimes changes. Can anybody guide me on how to get the name of this array?
Thanks and greetings
Vineet Sharma
Hi Vineet,
You can loop through your object and get the name of the array, as below:
for each (var obj:Object in myArray)
{
    for (var str:String in obj)
    {
        if (obj[str] is Array)
        {
            var arrayName:String = str;
        }
    }
}
Thank you
Jean Claude
-
How to get the names of all the views on a table
Hi all
I am trying to get the names of all the views on a table. I tried using USER_VIEWS, but there is no column giving the table name.
Can someone please tell me how I can get the names of all the views on a table?
Thank you
You will need to join USER_VIEWS with USER_DEPENDENCIES to list the views that depend on a particular table.
-
Get-VM: VM with name "xyz" was not found using the specified filters.
Hello everyone!
I have a script that reads a CSV with multiple host names, connects to vCenter, and should get the host name, IP and PortGroup information to generate an external CSV file.
In several cases the VM does not exist in vCenter, and I get the error below:
Get-VM : 04/07/2016-14:41:58 Get-VM VM with name "pxl1sso00008" was not found using the specified filters.
At C:\Users\f3135606\Desktop\vmTeste1.ps1:25 char:31
+ foreach ($vmName in $vmList) {get-vm $vmName| Select Name, @{N="Network"; e={$_ ...
+                               ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (:) [Get-VM], VimException
    + FullyQualifiedErrorId : Core_OutputHelper_WriteNotFoundError,VMware.VimAutomation.ViCore.Cmdlets.Commands.GetVM
How can I take the names from an input file, look them up in vCenter and, when found, write the results to the file, ignoring errors? Ideally, if the machine is not found, the script should simply write a line to the output file with only the host name and the rest blank.
Here is the .ps1 file:
$vmlist = Get-Content C:\vmnames.csv
if (!(Get-PSSnapin -Name VMware.VimAutomation.Core -ErrorAction SilentlyContinue)) {
    Add-PSSnapin VMware*
    Set-PowerCLIConfiguration -DisplayDeprecationWarnings $false -DefaultVIServerMode multiple -InvalidCertificateAction Ignore -Scope Session -ProxyPolicy NoProxy -Confirm:$false | Out-Null
    [void](Get-PSSnapin VMWare.VimAutomation.Core -ErrorVariable getVmwareSnapinErr 2> $null)
    if ($getVmwareSnapinErr.Count -gt 0) {
        Add-PSSnapin VMware.VimAutomation.Core
    }
}
$VCconn = Connect-VIServer $vCenter -User $vUsuario -Password $vPass > $null
foreach ($vmName in $vmList) {
    get-vm $vmName |
        Select Name, @{N="Network"; e={ $_ | get-networkadapter | Select-Object @{N="Network";E={$_.NetworkName}}} }, @{N="IP Address";E={@($_.guest.IPAddress[0])}} |
        Export-Csv -path c:\scripts\vlans.csv -NoTypeInformation
}
Laurent,
See below... this should get what you want. If the VM is not found, only the VM name is written to the output.
$vmlist = Get-Content C:\vmnames.csv
if (!(Get-PSSnapin -Name VMware.VimAutomation.Core -ErrorAction SilentlyContinue)) {
    Add-PSSnapin VMware*
    Set-PowerCLIConfiguration -DisplayDeprecationWarnings $false -DefaultVIServerMode multiple -InvalidCertificateAction Ignore -Scope Session -ProxyPolicy NoProxy -Confirm:$false | Out-Null
    [void](Get-PSSnapin VMWare.VimAutomation.Core -ErrorVariable getVmwareSnapinErr 2> $null)
    if ($getVmwareSnapinErr.Count -gt 0) {
        Add-PSSnapin VMware.VimAutomation.Core
    }
}
$VCconn = Connect-VIServer $vCenter -User $vUsuario -Password $vPass > $null
$arrVMInfo = @()
foreach ($vmName in $vmList) {
    $vm = get-vm $vmName -ErrorAction SilentlyContinue -ErrorVariable VMError |
        Select Name, @{N="Network";E={ $_ | get-networkadapter | Select-Object @{N="Network";E={$_.NetworkName}}} }, @{N="IPAddress";E={@($_.guest.IPAddress[0])}}
    if ($vm -eq $null) {
        $arrVMInfo += New-Object PSObject -Property @{
            Name=$vmName
        }
    } else {
        $arrVMInfo += New-Object PSObject -Property @{
            "Name"=$vm.name;
            "Network"=$vm.Network.Network;
            "IP Address"=$vm.IPAddress
        }
    }
}
$arrVMInfo | Select Name, Network, "IP Address" | Export-Csv "c:\scripts\vlans.csv" -NoTypeInformation
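For comparison, here is the same "write a row even when the lookup fails" pattern as a small Python sketch. A hypothetical in-memory dictionary stands in for vCenter, and csv.DictWriter's restval parameter fills the missing Network and IP Address fields with empty strings; the CSV is written once at the end, as in the script above.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def export_vm_info(vm_names: Iterable[str],
                   inventory: Dict[str, Dict[str, str]],
                   out: TextIO) -> None:
    """Write one CSV row per requested VM; unknown VMs get only their name."""
    fields = ["Name", "Network", "IP Address"]
    writer = csv.DictWriter(out, fieldnames=fields, restval="")
    writer.writeheader()
    for name in vm_names:
        info = inventory.get(name)  # hypothetical lookup standing in for Get-VM
        if info is None:
            writer.writerow({"Name": name})  # not found: name only, rest blank
        else:
            writer.writerow({"Name": name, **info})


if __name__ == "__main__":
    inventory = {"vm01": {"Network": "VLAN10", "IP Address": "10.0.0.5"}}
    buf = io.StringIO()
    export_vm_info(["vm01", "pxl1sso00008"], inventory, buf)
    print(buf.getvalue())
```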
-
After pressing Ctrl-F to save an email that is still being composed, if the recipients in the To:/Cc:/Bcc: fields are not already in the contact list, Windows Live Mail removes the
"name" part of the "name <e-mail address>" format, which makes the email undeliverable due to unknown recipients. BTW, is there a way to set Windows Live Mail to automatically save the recipient to the contact list when the "name <e-mail address>;" format is used in the To: field? See: Have a Windows Live Mail or Hotmail question?
http://answers.Microsoft.com/en-us/Windows/Forum/Windows_7-networking/have-a-Windows-Live-Mail-or-Hotmail-question/8bd31c48-d1a7-49D6-a08c-9069aaeba2e5
-
Can someone tell me the name of this font I used years ago, please? I'm working on homework for my graphic design class and want to use the same font, but I don't remember its name and it is not in my favorites or downloads list.
KZ BLADERUNNER 1 fonts | WhatFontis.com
Fenja
Maybe you are looking for
-
I rented a movie and I can not watch and has already been charged
-
It said mootools.js has not loaded
I went to the Verizon Wireless site, got the "mootools.js not loaded" message, and the site looks strange. What does that mean? What is mootools.js?
-
I am currently using a Netgear WNR2000v3 wireless router. I've never installed any Firmware/Software updates. If I install the latest Firmware/Software update (Firmware Version 1.1.2.10), does it include all previous updates, or do I return compl
-
Hello. How can I get a subset of a waveform, and its slope, between 1.5 and 2.5 volts from a 4 Vp-p, 100 Hz sine wave with a 2 V offset (from a waveform generator)? Please help me with this.
-
Hi experts, I would like to delete all empty pages from a document. What is the best solution? By story thread, by page items, or something else? A story thread can contain one or more empty pages with the same tag name. Empty page = the current page contains only empty text.