Report performance problems

sergeyt

Hi everyone!
We have serious performance problems with reports on a MultiCube, please help!
The fact table holds about 500,000 records. Reports for a single day come back quickly, but reports over a date range are hopelessly slow. It also looks as if the system warms up over the course of the day and then runs faster.
So, the actual questions:
Do I need to create secondary indexes if all fields are already included in the primary (unique) index?
Can I get a speedup by buffering the small tables of the cube (for example the unit-of-measure table, which has 16 records and is accessed constantly)?
How often should statistics be collected for the database tables?
Does it make sense to delete the indexes after every load?
Does it make sense to index the cubes only after all loads have finished, and is it right to do this in a separate process chain?

P.S. How much does the "collapse" (compress) operation on requests help? There are about 8,000 requests. How often should compression be run?

BORLAND

As for your impression that the system "warms up" during the day, there is a reason for it: the most frequently requested data ends up in the Most Recently Used cache. In the queries it is worth configuring the read mode and the cache mode. They do make a difference.
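To illustrate the warm-up effect, here is a minimal Python sketch. It is not BW code: `run_report` is a hypothetical stand-in for an expensive query, the dates are made up, and `functools.lru_cache` only plays the role of the cache. The first round of requests goes to the "database", the second round is answered from the cache.

```python
from functools import lru_cache

calls_to_database = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def run_report(day: str) -> int:
    """Hypothetical stand-in for an expensive query (not a BW API)."""
    global calls_to_database
    calls_to_database += 1
    return len(day)  # dummy result, only here so something is returned


days = ("2024-01-01", "2024-01-02", "2024-01-03")  # made-up sample requests

# Morning: every request is new, so each one reaches the database.
for day in days:
    run_report(day)

# Later in the day: the same requests are served from the cache.
for day in days:
    run_report(day)

info = run_report.cache_info()
print(calls_to_database, info.hits, info.misses)
```

Three database calls serve six requests here; a real system behaves the same way only for requests that repeat.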
From personal experience: we had a case where a report of 1,500 records took 30-40 minutes to build. The data was collected from postings, and there were about 6,000,000 of them in accruals alone. For the branches it was completely hopeless. The way out was to reorganize the data and use a Data Mart. The database parameters were tuned as well. The same report now takes 3-5 minutes. Speed depends to a large extent on the number of calculations. If you want faster report generation, it is better to create precalculated key figures and store them in the cube. Say there is Principal Debt and Overdue Principal Debt, and together they give Total Principal Debt, and it is exactly this figure that is used most often in reports. Now imagine a report built for 10,000 contracts: reading these values is much faster than calculating them. In short, analyze your Key Figures, pick out the most important ones, and load their ready-made values. On large volumes the gain is substantial.
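A rough Python sketch of the idea, under my own assumptions: the class, field, and function names are invented for illustration and the data is generated. The derived figure is computed once during the load and stored, so the report only reads it.

```python
from dataclasses import dataclass


@dataclass
class ContractRow:
    contract_id: int
    principal_debt: float
    overdue_principal_debt: float
    total_principal_debt: float = 0.0  # precalculated during the load


def load(raw_rows):
    """Load step: compute the derived key figure once and store it."""
    return [
        ContractRow(cid, debt, overdue, total_principal_debt=debt + overdue)
        for cid, debt, overdue in raw_rows
    ]


def report_calculated(rows):
    """Report that recalculates the total for every contract at query time."""
    return {r.contract_id: r.principal_debt + r.overdue_principal_debt for r in rows}


def report_precalculated(rows):
    """Report that only reads the stored total."""
    return {r.contract_id: r.total_principal_debt for r in rows}


# Generated sample data for 10,000 contracts (illustrative values only).
raw = [(i, 1000.0 + i, 10.0 * (i % 3)) for i in range(10_000)]
cube = load(raw)

# Both reports agree; the second one does no arithmetic at query time.
print(report_calculated(cube) == report_precalculated(cube))
```

With a single addition the saving is small; it grows with the number and complexity of the calculated key figures.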
A speedup can also be achieved through indexes. When you define characteristics in the cube there are "Cardinality" checkboxes; with the checkbox set, B-tree indexes are created, otherwise bitmap indexes are created (bitmap indexes are best used where the cardinality is below 0.01). There is an article on using them: http://www.oracle.com/technology/pub/ar ... dexes.html
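A small Python sketch of how the 0.01 threshold could be checked. It assumes that "cardinality" here means the ratio of distinct values to total rows, which is my reading and not something the article link confirms; the function names and sample columns are invented.

```python
def distinct_ratio(values):
    """Ratio of distinct values to total rows (assumed meaning of 'cardinality')."""
    values = list(values)
    if not values:
        raise ValueError("column is empty")
    return len(set(values)) / len(values)


def suggest_index_type(values, threshold=0.01):
    """Suggest 'bitmap' below the threshold, 'btree' otherwise."""
    return "bitmap" if distinct_ratio(values) < threshold else "btree"


# Illustrative columns: 16 units of measure vs. a unique document number.
unit_column = [f"UNIT{i % 16}" for i in range(500_000)]
document_column = list(range(500_000))

print(suggest_index_type(unit_column))      # 16 distinct values in 500,000 rows
print(suggest_index_type(document_column))  # every value is distinct
```

This only ranks columns by selectivity; the actual choice is made by the checkbox in the cube definition.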
If the dimension has only 16 records and only some of them are unique, why not make it a Line Item?
When loading transaction data, you should follow this procedure:
Load all master data.
Delete the indices of the InfoCube and its aggregates.
Drop the secondary indexes on the fact tables. Do not drop the primary key.
Turn on number range buffering.
Set an appropriate data packet size.
Load the transaction data.
Re-create the indices.
Turn off number range buffering.
Rebuild the indexes on the fact table.
Update the database statistics so that the query optimizer has the cardinality information it needs to choose the best execution path.
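The core of this procedure (drop the secondary index, load, re-create the index, refresh statistics) can be sketched on a plain SQLite table from Python. This is only an analogy: the table, index, and function names are invented, and in BW these steps are done with the InfoCube tools or process chains, not with hand-written SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (day INTEGER, unit TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_fact_day ON fact (day)")  # secondary index


def index_names(connection):
    """Names of the indexes that currently exist on the fact table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'fact'"
    ).fetchall()
    return [name for (name,) in rows]


def load_with_index_rebuild(connection, rows):
    """Drop the secondary index, bulk load, re-create it, update statistics."""
    connection.execute("DROP INDEX IF EXISTS idx_fact_day")
    connection.executemany("INSERT INTO fact VALUES (?, ?, ?)", rows)
    connection.execute("CREATE INDEX idx_fact_day ON fact (day)")
    connection.execute("ANALYZE")  # refresh optimizer statistics
    connection.commit()


# Generated sample data: 10,000 rows spread over 30 days.
rows = [(i % 30, "PC", float(i)) for i in range(10_000)]
load_with_index_rebuild(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
print(row_count, index_names(conn))
```

The point of the ordering is that the index is built once over the loaded data instead of being maintained row by row during the insert.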

1. If exclusions exist, make sure they exist in the global filter area. Try to remove exclusions by subtracting out inclusions.
2. Use Constant Selection to ignore filters in order to move more filters to the global filter area. (Use ABAPer to test and validate that this ensures better code)
3. Within structures, make sure the filter order exists with the highest level filter first.
4. Check code for all exit variables used in a report.
5. Move Time restrictions to a global filter whenever possible.
6. Within structures, use user exit variables to calculate things like QTD, YTD. This should generate better code than using overlapping restrictions to achieve the same thing. (Use ABAPer to test and validate that this ensures better code).
7. When queries are written on multiproviders, restrict to InfoProvider in global filter whenever possible. MultiProvider (MultiCube) queries require additional database table joins to read data compared to those queries against standard InfoCubes (InfoProviders), and you should therefore hardcode the infoprovider in the global filter whenever possible to eliminate this problem.
8. Move all global calculated and restricted key figures to local so that you can analyze any filters that can be removed and moved to the global definition in a query. Then you can change the calculated key figure and go back to utilizing the global calculated key figure if desired.
9. If Alternative UOM solution is used, turn off query cache.
10. Set read mode of query based on static or dynamic. Reading data during navigation minimizes the impact on the R/3 database and application server resources because only data that the user requires will be retrieved. For queries involving large hierarchies with many nodes, it would be wise to select Read data during navigation and when expanding the hierarchy option to avoid reading data for the hierarchy nodes that are not expanded. Reserve the Read all data mode for special queries, for instance when a majority of the users need a given query to slice and dice against all dimensions, or when the data is needed for data mining. This mode places heavy demand on database and memory resources and might impact other SAP BW processes and tasks.
11. Turn off formatting and results rows to minimize Frontend time whenever possible.
12. Check for nested hierarchies. Always a bad idea.
13. If "Display as hierarchy" is being used, look for other options to remove it to increase performance.
14. Use Constant Selection instead of SUMCT and SUMGT within formulas.
15. Review the order of restrictions in formulas. Apply as many restrictions as you can before calculations. Try to avoid calculations before restrictions.
16. Check Sequential vs Parallel read on Multiproviders.
17. Turn off warning messages on queries.
18. Check to see if performance improves by removing text display (Use ABAPer to test and validate that this ensures better code).
19. Check to see where currency conversions are happening if they are used.
20. Check aggregation and exception aggregation on calculated key figures. Calculating "before aggregation" is generally slower and should not be used unless explicitly needed.
21. Avoid Cell Editor use if at all possible.
22. Make sure queries are regenerated in production using RSRT after changes to statistics, consistency changes, or aggregates.
23. Within the free characteristics, filter on the least granular objects first and make sure those come first in the order.
24. Leverage characteristics or navigational attributes rather than hierarchies. Using a hierarchy requires reading temporary hierarchy tables and creates additional overhead compared to characteristics and navigational attributes. Therefore, characteristics or navigational attributes result in significantly better query performance than hierarchies, especially as the size of the hierarchy (e.g., the number of nodes and levels) and the complexity of the selection criteria increase.
25. If hierarchies are used, minimize the number of nodes to include in the query results. Including all nodes in the query results (even the ones that are not needed or blank) slows down the query processing. The “not assigned” nodes in the hierarchy should be filtered out, and you should use a variable to reduce the number of hierarchy nodes selected.
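Several of the tips above (filters in the global area, restrictions before calculations) come down to the same thing: reduce the rows first, then calculate. A minimal Python sketch with generated data; `convert` is a hypothetical stand-in for any per-row calculation such as a currency conversion, and the call counter exists only to make the difference visible.

```python
# Generated sample rows: every fourth row belongs to region "N".
rows = [{"region": "N" if i % 4 == 0 else "S", "amount": float(i)} for i in range(1000)]

calc_calls = 0  # how many times the per-row calculation ran


def convert(amount):
    """Hypothetical per-row calculation (e.g. a currency conversion)."""
    global calc_calls
    calc_calls += 1
    return amount * 1.1


def calculate_then_restrict(data, region):
    """Calculates for every row, then throws most of the results away."""
    converted = [(r["region"], convert(r["amount"])) for r in data]
    return sum(value for reg, value in converted if reg == region)


def restrict_then_calculate(data, region):
    """Applies the restriction first and calculates only for matching rows."""
    return sum(convert(r["amount"]) for r in data if r["region"] == region)


calc_calls = 0
slow_total = calculate_then_restrict(rows, "N")
slow_calls = calc_calls

calc_calls = 0
fast_total = restrict_then_calculate(rows, "N")
fast_calls = calc_calls

print(slow_total == fast_total, slow_calls, fast_calls)
```

Both variants return the same total, but the second one runs the calculation only for the rows that survive the restriction.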
Keywords to search for:
Query read mode
Bitmap index
Statistics for cost-based optimizer
Partition
Parallel query option

Rednaxela

Not sure where to post this... but since it is about performance, probably here...

Indexes are structures that take up additional disk space on the database server, right?
The question: does an additionally created index also increase the amount of RAM used on the server? If so, is that only when a selection goes through that index, or on any access to the table regardless of the type of selection? I'm starting to have doubts...

BORLAND

Food for thought:
Here is what Dan Tow writes in the book "SQL Tuning":
Chapter 2. Data-Access Basics
"When the entire index is too large to cache well, you will see some physical I/O when accessing it. However, keep in mind that indexes generally cover just the entity properties that define the parts of the table that are most frequently accessed. Therefore, a database rarely needs to cache all of a large index—it needs to cache only a small subset of the index blocks that point to interesting rows—so even large indexes usually see excellent cache-hit ratios. The bottom line is this: when you compare alternative execution plans, you can ignore costs of physical I/O to indexes. When physical I/O happens at all, physical I/O to tables will almost always dominate physical I/O to indexes."

In other words: when an entire index is too large to cache well, accessing it will cause some physical I/O. But keep in mind that indexes generally cover only the properties that define the most frequently accessed parts of the table. So the database rarely needs to cache a whole large index; it only needs a small subset of index blocks that point to the rows of interest, which is why even large indexes usually have excellent cache-hit ratios. The bottom line: when comparing alternative execution plans, you can ignore the cost of physical I/O to indexes. When physical I/O happens at all, physical I/O to tables almost always dominates physical I/O to indexes.
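The cache-hit argument can be checked with a toy simulation in Python. Everything here is an assumption made for illustration: an index of 10,050 blocks, a cache of 100 blocks with LRU replacement, and an access pattern in which 9 of every 10 lookups touch one of 50 "hot" blocks. A real database buffer cache is more sophisticated than this.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used block cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # stands for one physical read
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


HOT_BLOCKS = 50       # index blocks pointing to frequently accessed rows
COLD_BLOCKS = 10_000  # the rest of the index

cache = LRUCache(capacity=100)  # the cache holds about 1% of the index
for i in range(10_000):
    if i % 10 == 9:
        # Every tenth lookup touches a cold block that was not read before.
        block = HOT_BLOCKS + (i // 10) % COLD_BLOCKS
    else:
        # The other nine lookups cycle through the hot blocks.
        block = (i - i // 10) % HOT_BLOCKS
    cache.read(block)

hit_ratio = cache.hits / (cache.hits + cache.misses)
print(cache.hits, cache.misses, round(hit_ratio, 3))
```

Even though the cache holds only a small fraction of the index, most lookups are hits, because the hot blocks never leave the cache.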

When solving performance problems, I frequently advise creating new indexes. When I do so, though, I almost always have in mind at least one specific query, which runs often enough to matter and has proven unable to run fast enough without the new index.

In other words: when I solve performance problems, I often advise creating new indexes. When I do, I almost always have in mind at least one specific query that runs often enough to matter and cannot be made fast enough without the new index.
