Compression and query performance in data warehouses

Hello

We are using Oracle 11.2.0.3 and have a large fact table with bitmap indexes on the columns that join to the dimensions.

Our understanding is that bitmap indexes are compressed by default, so we assume they cannot be compressed any further.

Is this correct?

We want to try compressing the large fact table to see if this will reduce the I/O on reads and so give performance benefits.

ETL speed is fine; we just want to improve report performance.

Thoughts? Has anyone seen significant gains in data warehouse report performance from compression?

In addition, the current PCTFREE on the table is 10%.

As the table is insert-only, we are considering making this 1% to improve report performance.

Thoughts?

Thank you

Yes, you can't compress bitmap indexes, they are already compressed.

Yes, compression (it obviously depends on the type of data) definitely reduces physical I/O and so improves performance. We saw this in our environment.

If there are ONLY inserts and no updates, you can go with PCTFREE 0; you don't need to reserve ANYTHING for updates.
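
For illustration, here is a minimal sketch of what that could look like. The table, partition and index names are hypothetical, and basic table compression only kicks in for direct-path (bulk) loads, which suits an insert-only fact table; existing data has to be rebuilt to pick up the new settings.

    -- Hypothetical names; verify against your own schema before use.
    ALTER TABLE sales_fact PCTFREE 0;
    ALTER TABLE sales_fact COMPRESS;                                     -- affects new blocks only
    ALTER TABLE sales_fact MOVE PARTITION p_2011_q4 PCTFREE 0 COMPRESS;  -- rebuild existing data
    -- Moving a partition marks its local bitmap indexes UNUSABLE, so rebuild them:
    ALTER INDEX sales_fact_cust_bix REBUILD PARTITION p_2011_q4;

Measure the physical reads and report runtimes on one representative partition before rolling this out everywhere.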

Tags: Database

Similar Questions

  • Questions after TimesTen first trial: memory footprint and query performance

    Hello!

    I'm testing the TimesTen In-Memory Database Cache to see if it could help with some ad hoc report queries that take too long to run in our Oracle database.

    Here is the configuration:

    1.) TimesTen server: 2 quad-core CPUs with 32 GB of RAM, running Windows 2003 x64.

    2.) I set up two read-only cache groups: a small one for a quick test, and the real one that maps to a database table as follows:


    Database table looks like:
      CREATE TABLE "TB_BD" 
       (   
       "VALUE" NUMBER NOT NULL ENABLE, 
       "TIME_UTC" TIMESTAMP (6) NOT NULL ENABLE, 
       "ASSIGNED_TO_ID" NUMBER NOT NULL ENABLE, 
       "EVENT_ID" NUMBER, 
       "ID" NUMBER NOT NULL ENABLE, 
       "ID_LABEL" NUMBER NOT NULL ENABLE, 
       "ID_ALARM" NUMBER, 
        CONSTRAINT "PK_TB_BD" PRIMARY KEY ("ID")  
       );
    The Oracle database table has 1,367,336,329 rows and the table segments are approximately 61 GB, so an average row takes about 46 bytes.

    Since I have 32 GB in the TimesTen machine, I created the cache group with a WHERE predicate on the ID column so that only the 98,191,284 most recent rows get into the cache group. In the Oracle database this is around 4.2 GB of data.

    After loading the cache group, dssize returns:
    Command> dssize
    
      PERM_ALLOCATED_SIZE:      26624000
      PERM_IN_USE_SIZE:         19772852
      PERM_IN_USE_HIGH_WATER:   26622892
      TEMP_ALLOCATED_SIZE:      32768
      TEMP_IN_USE_SIZE:         10570
      TEMP_IN_USE_HIGH_WATER:   14192
    
      (Note: the high PERM_IN_USE_HIGH_WATER comes from a first test where I tried to cache too many rows)
    I then ran on the TimesTen machine:
    tisql> select avg(value) from tb_bd;
    It is still running after 10 hours, so I can already tell that the query execution time does not really meet my expectations. :-)

    In the Windows Task Manager I see that ttIsql constantly uses 13% of the CPU (= 100% / 8 cores), so it only uses one core, but even if it were using all the cores and the execution time were 1/8th of this, it wouldn't meet my expectations. :-)

    I also see in the Windows Task Manager that the 'Mem Usage' of my ttIsql process slowly grows higher and higher; it is currently around 14 GB. I believe this is the shared memory that is already mapped by the main TimesTen process, which has approximately 24 GB mapped. The query is probably 53% through, so the total query time may be around 20 hours.


    My questions:

    1.) From what I tested, 1 GB of data in the Oracle table needs about 4-5 GB of memory in the TimesTen database (from the dssize output above, assuming it reports in KB: PERM_IN_USE_SIZE of 19,772,852 KB is roughly 18.9 GB, versus about 4.2 GB for the same rows in Oracle, i.e. a factor of about 4.5). I read a forum post that explained this with "data is optimized for performance, not space, in TT", but I don't quite buy it. A factor of 4-5 means the CPU has to churn through 4 to 5 times the amount of data. The data is not compressed in the Oracle database; it is stored in its natural binary form. I would like to understand why the data takes so much more space in TT - for example, when you have a NUMBER in Oracle, what does TT do with it to make it 4-5 times bigger, and why does it do that?

    2.) Regarding query performance: how long should it take the database to scan about 20 GB of data in memory, count the rows, and sum a NUMBER column with a division to get the avg(<column>)? Is there something flawed in my setup?


    Thanks for the ideas!

    Kind regards
    Marcus

    Published by: user11973438 on 06.09.2012 23:27

    I agree that using 4-5 times more memory than Oracle is far from optimal. Your schema is unfortunately a little pathological; normally we see more like 2-3 times (which is still really too much). There are many internal differences between Oracle and TimesTen in the way data is stored internally. Some are historical, and some are due to optimizing for performance rather than for storage efficiency.

    For example:

    1. Oracle rows are always variable length in storage, while TimesTen rows are always fixed length in storage.

    2. In Oracle, a column defined as NUMBER only occupies the space needed for the stored value. In TimesTen a NUMBER column always occupies the space needed to store the maximum possible precision, and therefore takes up 22 bytes. You can reduce this by restricting the column explicitly using NUMBER(n) or NUMBER(n,p).

    3. TimesTen does not support any kind of parallel query within a single data store. Any query will run on at most one CPU core; Oracle DB does support parallel query, and this can make a big difference for certain types of workload.

    4. NUMBER is implemented in software and is relatively inefficient. Calculating the average over almost 100M rows will take time... You could try changing this to a native binary type (TT_INTEGER, TT_BIGINT or BINARY_DOUBLE, depending on your data); this will no doubt give a good improvement (but see point 5 below, and the sketch after this list).

    5. With a database of this size, it is possible that Windows is doing a lot of paging while the query is running. I have also observed that on Windows there seems to be a penalty when a process touches/maps a page for the first time. You should monitor the paging activity via the Task Manager while the query runs; any significant paging will really hurt performance. Also, try executing the query a second time without disconnecting ttIsql; this may also show a benefit. On Unix/Linux platforms we provide an option (MemoryLock=4) to lock the entire database in physical memory to prevent any paging, but this is not available on Windows.
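
    As a hedged illustration of point 4 above, the read-only cache group could be recreated with native TimesTen types instead of unconstrained NUMBER. The cache group name, refresh interval and ID cut-off below are hypothetical, and each type mapping (e.g. NUMBER to TT_BIGINT or BINARY_DOUBLE) must be checked against the cache group type-compatibility rules and your actual data ranges:

      CREATE READONLY CACHE GROUP cg_tb_bd
        AUTOREFRESH INTERVAL 15 MINUTES
        FROM tb_bd (
          value          BINARY_DOUBLE NOT NULL,   -- was NUMBER
          time_utc       TIMESTAMP(6)  NOT NULL,
          assigned_to_id TT_BIGINT     NOT NULL,   -- was NUMBER
          event_id       TT_BIGINT,
          id             TT_BIGINT     NOT NULL,
          id_label       TT_BIGINT     NOT NULL,
          id_alarm       TT_BIGINT,
          PRIMARY KEY (id)
        )
        WHERE (tb_bd.id > 1269145045);             -- hypothetical cut-off for the newest rows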

    Chris

  • Unable to get analytical information and/or perform a data load: the dimension 0 is not valid

    Hello

    I am trying to load the first set of PSPB files using the data import.

    Header is:

    Budget Item, Data Load Cube Name, Point-of-View, Proposed FTE, FTE Start Date, FTE End Date

    And the first row of data:

    < LINEITEM ("EPT and powers of the State"), > PlanHR, 'BegBalance, Budget, Base, any year, vacancy, unspecified, U09999, POS099999', 1, 01/01/1951,.

    I get the following error when posting:

    The 0 dimension is not valid


    In addition, the Budget Item dimension and the account driver dimension, with the members Proposed FTE, FTE Start Date and FTE End Date, are selected.

    I cannot find the cause of the error; please help.

    Thank you

    Just to let you know, part of the Entity dimension hierarchy had been marked as not valid for the main plan type, so that is what was causing the error.

  • ESXi 5.1 and VMFS-3 datastores

    We have an existing environment of 7 ESX 3.5 Update 5 hosts and an EMC CX3-10 iSCSI SAN. We are going to decommission 2 ESX hosts and replace them with recent hardware running ESXi 5.1. We realize that we will no longer be able to manage these ESXi hosts from our VirtualCenter 2.5 console, but these two hosts will run a small number of virtual machines (Exchange 2010 multi-role) that don't use VMotion, DRS or HA.

    Our understanding is that ESXi 5.1 can see and use VMFS-3 datastores. Is it true that, once we have connected the new ESXi 5.1 hosts to our SAN and they can see all the existing LUNs, we should be able to shut down the virtual machines on the two ESX 3.5 hosts and start those virtual machines on the ESXi 5.1 hosts?

    Has anyone had problems running ESXi 5.1 with VMFS-3 datastores?

    Some of the existing VMFS-3 datastores could be upgraded to VMFS-5 because they are only used by virtual machines that run on the ESXi 5.1 hosts. But some of the existing VMFS-3 datastores cannot be upgraded because they are used by virtual machines running on the other ESX 3.5 hosts. In other words, can a single virtual machine have virtual disks on VMFS-3 and VMFS-5 datastores at the same time?

    Thank you

    Jay

    The VMFS version isn't a problem. What I am more concerned about is the fact that your storage system is not supported by ESXi 5.1.

    André

  • For the recovery site - new or replicated LUN datastores?

    I am trying to implement SRM 4.1, and so far the process has been very straightforward.  However I'm confused about something, and that's which datastores should be used at the recovery site.  My array replicates LUNs to matched LUNs.  Should I map the recovery-site ESXi hosts to the paired LUNs that are exact replicas of the protected site's datastores?  Or should I create new LUNs and new datastores, and then use those LUNs when creating protection groups?

    I expect I have missed something in the docs, but I couldn't find the answer to this question after searching various sites and sources.  What have others done with their SRM configurations?  Thank you!

    SRM, in conjunction with your array vendor's storage replication adapters (SRAs), will do this for you during site pairing and setup.

    Basically, the only thing you need to do is the zoning on the FC switches, or the VLANs on the Ethernet switches, depending on whether you use FC or IP storage, so that the hosts at the DR site can reach the storage device.

    SRM together with the SRAs will take care of the rest.

    WBR

    Imants

  • Do we need a data warehouse if we only create dashboards and reports in OBIEE?

    Hello! I'm new to OBIEE.

    My organization has decided to build its reports and dashboards using OBIEE. I am involved in this effort, but I don't have in-depth knowledge of OBIEE.  My question is: do we need to have a data warehouse installed? Or do I just need to install OBIEE, create a repository, then create a data source in BI Publisher and create dashboards or reports?

    I'm confused, so please help me in this regard. Please share any document or link where I can easily understand these things. Thank you.

    Please share any document or link where I can easily understand these things. Thank you

    OBIEE is not a piece of software you can just run without a good understanding of its complex concepts. I would really recommend attending a training course, or at least reading a book (for example this or this). There are MANY general blog posts on OBIEE, many of which are of poor quality and are just step-by-step guides on how to do a particular task without explaining the bigger picture.

    If you want to use OBIEE and make it a success, you have to learn and understand the basics.

    To answer your question directly:

    -BI Publisher is not the same thing as OBIEE. It is a component of it (but is also available standalone). OBIEE presents data through 'Dashboards', which are made up of 'Analyses' written in the Answers tool. Dashboards can also contain BI Publisher content if you want.

    -OBIEE can report against many different data sources, both one or more data warehouses and transactional systems. Most OBIEE implementations that perform well are built against a dedicated DW, but that is not a mandatory condition.

    -Whether you report against a real DW or not, when you build the OBIEE repository you build a "virtual" data warehouse; in other words, you dimensionally model all your business data into a set of logical star schemas.

  • VDP 5.5 unable to get performance analysis data for datastores

    Hi all

    I have two VDP appliances running version 5.5.5.180.

    All of a sudden I cannot connect to either appliance from the Web Client.

    The VDP status shows all services green on both appliances.

    root@vdp:~/#: dpnctl status all
    Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
    dpnctl: INFO: gsan status: up
    dpnctl: INFO: MCS status: up.
    dpnctl: INFO: Backup scheduler status: up.
    dpnctl: INFO: axionfs status: up.
    dpnctl: INFO: Maintenance windows scheduler status: enabled.
    dpnctl: INFO: Unattended startup status: enabled.
    
    

    Clicking on the Storage tab displays the error message "Unable to get performance analysis data for the datastores", and no datastores are listed.

    • Restarting vCenter and the VDP appliances doesn't change anything.
    • I can connect to the VDP appliances themselves just fine.
    • Checkpoints are being created.

    I found similar topics but no answers... (VDP 5.5 ERROR)

    I opened a support case and it turned out that the password of the VDP user (a user defined in the @vsphere.local domain) used to access vCenter had expired. Apparently there is a bug in some vCenter versions that makes these passwords expire after 65 days.

  • SQL efficiency in a data warehouse with a star/snowflake schema

    Hello

    We are using 11.2.0.3 and need to improve query performance for reports.  The data warehouse uses a star/snowflake schema.

    In addition to indexing, partitioning, enabling star_transformation etc., I'm considering the impact of the following on query performance.

    The central fact table (more than 1 billion rows) joins to a customer dimension (a few hundred thousand rows), which in turn joins to a latest-customer-version dimension (approximately 30,000 rows).

    The table with a few hundred thousand rows (the customer dimension) always has to be queried, since the fact data is stored against the version of the customer at that point in time; we then join to latest_customer because users want to see

    the most recent version of the customer attributes, to stop the data being fragmented across several rows in the report.

    We are considering whether it would be more efficient to create a dimension that is equivalent to the customer dimension but that also stores the most recent version of the customer attributes on the same row. This means the customer dimension has many more columns, but queries would avoid the extra lookup against the 30k-row table.

    Thoughts - would this be a material advantage?

    At the moment users query latest_customer to, say, get all customers belonging to a certain retail chain.

    If we change things as above, they would instead be querying the customer dimension with its few hundred thousand rows.

    Thoughts?

    Thank you

    Because a lot depends on the data access patterns and the data distribution, we cannot really say much more than "it depends".  However, keep in mind that Oracle accesses data by blocks (and sometimes reads several blocks at a time), so making the rows wider could have the paradoxical effect of fitting fewer rows per block, and therefore not getting the blocks you need into memory as efficiently.

    It can help to visualize how the data should be accessed (google something like "visual sql" together with Karen Morton or Jonathan Lewis), and also to compare the estimated and actual row counts of the different plans you are considering (see the sketch below for one way to do this).
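
    One low-effort way to do that comparison is with rowsource statistics; this is a hedged sketch in which the table and column names are hypothetical stand-ins for your fact and customer dimension tables:

      SELECT /*+ GATHER_PLAN_STATISTICS */ c.chain_name, SUM(f.sales_amount)
      FROM   sales_fact f
      JOIN   customer_dim c        ON c.customer_key = f.customer_key
      JOIN   latest_customer_dim l ON l.customer_id  = c.customer_id
      GROUP  BY c.chain_name;

      -- E-Rows vs A-Rows shows where the optimizer's cardinality estimates go wrong:
      SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

    Running the same report query against both dimension designs and comparing the plans and buffer gets should show whether the wider dimension is a material win.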

  • BI performance with a transactional database or a data warehouse.

    Hi gurus,

    Does anyone here have a document or presentation comparing the performance of a BI implementation against a transactional database versus against a data warehouse?

    Appreciate your help :)

    You would need to come to a conclusion through performance tests.

    The question can also be looked at from different points of view:

    1. Comparing the performance of obtaining data directly from the source systems against obtaining data consolidated into a single database.
    2. Comparing the performance of obtaining data from a normalized model against data from a dimensional model.

    In most cases, getting data from a data warehouse should give better performance, since:
    - Data warehouses dedicate more resources to BI workloads.
    - Source systems serve many purposes other than BI, so their resources are shared.
    - In a data warehouse the data sets are already in one place, reducing network latency.
    - A data warehouse stores summarized information.
    - Data warehouses are implemented with dimensional models, which serve BI queries best.

  • Data warehouse backups and read only tablespaces

    Hi all

    I'm working on a data warehouse database with the following specifications:

    Version: Oracle Enterprise 10.2.0.3
    Operating system: Solaris
    Application: Data warehouse

    We take "level 0" and "level 1" RMAN backups. Block change tracking is enabled, and RMAN backs up the datafiles and archive logs directly to tape.

    I'm exploring ways to reduce the size of the "level 0" backups, and was specifically looking at using read-only tablespaces for this purpose.

    I have often seen it mentioned that a best practice for DWs is to store old, static partitions of fact tables in read-only tablespaces in order to reduce the size of backups.


    If you have already implemented such a scheme, I would like to know how you implemented it.

    I am thinking of the following mechanism:

    -Start using tablespace-level backups rather than "level 0" backups at the database level.
    -Record the last SCN of every file before backing up.
    -If the last SCN has not changed since the last backup and the tablespace is read-only, then
    -check whether a backup copy of the tablespace was completed within the recovery window and is accessible.
    -If such a copy exists, skip backing up the tablespace; otherwise back up the tablespace.
    -If the tablespace is read/write, back it up.


    I have not worked out the low-level details, but this seems like a lot of work. I would just like to know from you whether there is any ready-made mechanism that makes all this easier.

    Thanks in advance.

    If you go through the RMAN documentation you will notice that if you enable optimization (use CONFIGURE BACKUP OPTIMIZATION ON at the RMAN prompt) it will perform the following tasks automatically during backup:

    -Record the last SCN of every file before backing up.
    -If the last SCN has not changed since the last backup and the tablespace is read-only, then
    -check whether a backup copy of the tablespace was completed within the recovery window and is accessible.
    -If such a copy exists, skip backing up the tablespace; otherwise back up the tablespace.
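
    As a hedged sketch of how this fits together (the tablespace name and recovery window below are hypothetical, and the exact skip behaviour should be verified against the backup optimization rules in the RMAN documentation for your release):

      -- SQL*Plus: make the tablespace holding old, static fact partitions read-only
      ALTER TABLESPACE fact_2006_ts READ ONLY;

      -- RMAN:
      CONFIGURE BACKUP OPTIMIZATION ON;
      CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS;
      BACKUP INCREMENTAL LEVEL 0 DATABASE;
      -- Read-only datafiles that already have a backup satisfying the retention
      -- policy are skipped on subsequent level 0 runs.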

  • I have a problem with compressing files and then not being able to extract them at a later date

    original title: file compression

    I am running Vista Home Premium on my laptop. I have problems with compressing files and then not being able to extract them at a later date. For example, I sent a compressed file to my instructor and she pointed out that it had a .zipx extension and that she was unable to extract the files from it. Can someone help me with this problem?

    Hello

    1. Do you have any third-party compression/extraction software installed on your machine?
    2. Do you get an error message when you try to extract the files?
    3. Is the issue specific to a particular file type?
    4. Did you make any changes to your computer before the problem started?

    Follow the steps and check if they help.

    Step 1:

    One way to open zip files is to open the folder and drag the contents to another folder.

    See article:
    Compress and uncompress files (zip files)

    Step 2:

    If you have third-party compression/decompression software installed, I would uninstall it, use the built-in Windows compression and decompression of files/folders instead, and see if that helps.

  • Compress and decompress data using GZip

    Hello

    I am compressing and decompressing data on a BlackBerry 8100 device using the GZip algorithm.

    I use following code:

    try {
        GZIPOutputStream gCompress = null;
        try {
            ByteArrayOutputStream compressByte = new ByteArrayOutputStream();
            gCompress = new GZIPOutputStream(compressByte, 9);
            gCompress.write(inputstring.getBytes());
            //gCompress.flush();
            compressedBytes = compressByte.toByteArray();
            System.out.println("compressedBytes : " + new String(compressedBytes));
            compressedString = new String(compressedBytes);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    

    The server is unable to decompress data.

    Help, please.

    Thank you very much.

    I think you are doing a byte-to-String conversion which you should not do and which may be corrupting your data.  The output of the compression is unlikely to be representable as Unicode characters, and you want to transmit the bytes anyway, so I think the conversion is not necessary.

    Here is a reworked version of the sample in the API docs (which seems inaccurate, in any case): give it a try, and remember that you should send the bytes, not characters.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    // BlackBerry GZip implementation
    import net.rim.device.api.compress.GZIPOutputStream;

    public static byte[] compress( byte[] data ) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            GZIPOutputStream gzipStream = new GZIPOutputStream( baos,
                                                                GZIPOutputStream.COMPRESSION_BEST,
                                                                GZIPOutputStream.MAX_LOG2_WINDOW_LENGTH );
            gzipStream.write( data );
            gzipStream.close();
            return baos.toByteArray();
        }
        catch(IOException ioe) {
            return null;
        }
    }
    
  • Oracle Business Intelligence Data Warehouse Administration Console 11g and Informatica PowerCenter and PowerConnect Adapters 9.6.1 Installation Guide for Linux x86 (64-bit)

    Hi all

    I'm looking for the complete installation guide for Oracle Business Intelligence Data Warehouse Administration Console 11g and Informatica PowerCenter and PowerConnect adapters 9.6.1 for Linux x86 (64-bit). I just wonder if there is any URL that you can recommend for the installation. Please advise.

    It looks like these are what you are asking for:

    http://docs.Oracle.com/CD/E25054_01/fusionapps.1111/e16814/postinstsetup.htm

    http://ashrutp.blogspot.com.by/2014/01/Informatica-PowerCenter-and.html

    Informatica PowerCenter 9 Installation and Configuration Guide complete | Training of Informatica & tutorials

  • Create the query by combining data from DBSS_Data_Model and HostModel

    Hello

    I am trying to create a dashboard with the list of host servers and the SQL Server instances running on each host.

    I am interested in creating a query that combines data from the SQL Server data model (DBSS_Data_Model) and the Hosts data model, i.e. something that connects DBSS_SQL_Server_Host and Host.

    I wonder if there is a way to do this?

    Thank you

    Mark

    Something like this function should work:

    def physicalHost = null
    queryStatement = server.QueryService.createStatement("(Host where name = '$hostName')")
    result = server.QueryService.executeStatement(queryStatement)
    objs = result.getTopologyObjects()

    if (objs != null && objs.size() > 0) {
        physicalHost = objs.get(0)
    }
    return physicalHost


    Where the input parameter "hostName" is DBSS_SQL_Server_Host.physical_host_name.

    Kind regards

    Brian Wheeldon

  • Renaming files and folders in datastores

    Is it safe, while a virtual machine is running, to change the names of the VMDK files and folders in a datastore?

    I discovered that a VM's VMDK files have a different name from the virtual machine itself, so someone must have changed the name of the virtual machine after the files were created.

    I have had to do this a few times, so I will try to document the process that I follow.  If you have many datastores, make a note of which datastore the machine is in.  The next step is to unregister the virtual machine from vCenter / ESXi.

    Then, you need to enable SSH on your ESXi host

    Use PuTTY or something similar to connect to the ESXi host

    Run the following commands:

    cd /vmfs/volumes/<datastore> - that is, if the datastore that contains the virtual machine is named datastore1, then you would run cd /vmfs/volumes/datastore1

    Then run ls to verify that the virtual machine's folder is there

    Rename the folder by using mv

    Change to the folder with cd

    Rename the nvram, vmsd, vmx and hlog files, again using mv:

    mv oldname.nvram newname.nvram

    mv oldname.vmsd newname.vmsd

    mv oldname.vmx newname.vmx

    mv oldname-1234.hlog newname-1234.hlog

    Rename the vmdk with vmkfstools -E oldname.vmdk newname.vmdk

    Edit the vmx with vi newname.vmx

    Replace references to oldname with newname in this file

    Register the virtual machine with the ESXi server and it should be OK

    This link gives more details and other methods: VMware KB: Renaming a virtual machine and its files on ESXi and ESX
