1.06-02 Don't use subsets on large lists

rob_marshall · May 2020

Lists and subsets take up space within a model, so if you need multiple subsets of the same list, consider whether they would be better as separate lists. This is especially true if the lists do not overlap and they are being fed from a Data Hub. For overlapping subsets, or if there is a need to “consolidate” the values back to the master list, subsets are a valid construct for model efficiency.

Misbah · September 2020

Rule 1.06-02: Don't use subsets on large lists. It is better to create a separate list if the subset is more than 75% of the list. Creating subsets on large lists goes against the “Performance” element of PLANS.

Here is how it was done in the pre-Planual era: without checking the size of the lists, we used to create subsets, thinking it saved space and helped with model optimization. Little did we know that such a large subset can cause a performance hit while saving no space at all. For example, for List A with 10,000,000 transactions, a subset with 75% occupancy would be created on the assumption that it saved the space of the remaining 25% of list items.

What is wrong with this method? First we need to understand what subsets really are. Subsets are essentially lists within lists. Subset items consume as much space as list items do (roughly 500 bytes per item), even if the list or subset is not used as a dimension in any module. When a large list that has a top level and one subset is used in modules, performance suffers because the system has to aggregate the data not only for the list but also for the subset, and re-aggregate in every module where the list and subset are used as dimensions. Performance also takes a hit whenever you add or remove subset items from such lists.

There is also a myth that ALL subsets help with space optimization. That is not true. Here is the analysis:

A list with 10,000,000 items will consume 5,000,000,000 bytes of space, which is roughly 4.7GB. If we add a subset to this list with 75% occupancy of the original list, the subset will have at least 7,500,000 items and will consume an additional 3,750,000,000 bytes, which is roughly 3.5GB. The list that was originally consuming 4.7GB is now consuming 8.2GB (4.7GB from the original list and 3.5GB from the subset). Model builders have to make a judicious call on whether that subset can save 3.5GB over the course of model building, which in turn depends on how many times the subset is referenced and on how many intersections. Let’s see what happens when this list and/or subset is used as a dimension in a module.
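The arithmetic above can be reproduced with a minimal sketch. It assumes the rough figure of 500 bytes per list item quoted in this post and treats 1GB as 1024^3 bytes, which matches the rounded values shown. The function and variable names are illustrative only and are not part of Anaplan.

```python
# Rough per-item size quoted in this thread; an approximation, not an exact figure.
BYTES_PER_LIST_ITEM = 500
BYTES_PER_GB = 1024 ** 3  # assumption: GB figures in the post are binary gigabytes


def list_space_bytes(item_count: int) -> int:
    """Approximate space taken by a list or subset with item_count items."""
    return item_count * BYTES_PER_LIST_ITEM


def to_gb(num_bytes: int) -> float:
    """Convert bytes to gigabytes (1024^3 bytes per GB)."""
    return num_bytes / BYTES_PER_GB


list_items = 10_000_000
subset_items = list_items * 75 // 100  # subset with 75% occupancy

list_bytes = list_space_bytes(list_items)
subset_bytes = list_space_bytes(subset_items)

print(f"List:   {list_bytes:,} bytes (~{to_gb(list_bytes):.1f}GB)")
print(f"Subset: {subset_bytes:,} bytes (~{to_gb(subset_bytes):.1f}GB)")
```

The point of the sketch is that the subset's cost scales with its own item count, so a 75% subset adds about three quarters of the list's space on top of the list itself.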

Line Item	Format	Space Used per Cell	If List Used (Bytes)	If Subset Used (Bytes)	Diff (in MB)
Line Item 1	Number	8 Bytes	80,000,000	60,000,000	20
Line Item 2	Number	8 Bytes	80,000,000	60,000,000	20
Line Item 3	Time Period	4 Bytes	40,000,000	30,000,000	10
Line Item 4	Time Period	4 Bytes	40,000,000	30,000,000	10
Line Item 5	List	4 Bytes	40,000,000	30,000,000	10
Total			280,000,000	210,000,000	70
Note: Based on a simple module with a single dimension.

As you can see, using the subset in a module saved 70MB of space across 5 line items. The subset has to save 3.5GB of space to break even, which in turn depends on the number of line items/modules dimensioned by this subset.
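To make the break-even point concrete, here is a minimal sketch that recomputes the table and estimates how many such modules would be needed. It assumes every module looks like the example (five line items, a single dimension, the cell sizes from the table) and works in raw bytes throughout. The names are hypothetical and the result is only a rough guide.

```python
import math

# Cell sizes in bytes for the five example line items (from the table).
CELL_BYTES = [8, 8, 4, 4, 4]

LIST_ITEMS = 10_000_000
SUBSET_ITEMS = 7_500_000
BYTES_PER_LIST_ITEM = 500  # rough per-item figure quoted in this thread

# Space used by one single-dimension module, dimensioned by list vs. subset.
module_bytes_list = LIST_ITEMS * sum(CELL_BYTES)
module_bytes_subset = SUBSET_ITEMS * sum(CELL_BYTES)
saving_per_module = module_bytes_list - module_bytes_subset

# Space the subset itself costs just by existing.
subset_cost = SUBSET_ITEMS * BYTES_PER_LIST_ITEM

# Number of modules like this one needed before the subset pays for itself.
break_even_modules = math.ceil(subset_cost / saving_per_module)

print(f"Saving per module: {saving_per_module:,} bytes")
print(f"Subset cost:       {subset_cost:,} bytes")
print(f"Break-even after:  {break_even_modules} such modules")
```

Under these assumptions the subset only starts saving space after roughly 54 modules of this shape are dimensioned by it, before counting any of the performance cost.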

Here is how it should be done the Planual way: create a separate list altogether instead of a subset for large lists.

Advantages:

The system will not have to aggregate the data for the list and the subset at the same time across modules.
Only one list will be impacted upon import.

Gluvakov · December 2023

Hello,

I am struggling with a subset-related issue.

End users need to use "provisional positions", which exist among other "real positions" in a large Position list (30k items). The plan is that whenever they need to use a provisional position, they rename it and then tick the box to make it active. That means I have a subset for Position: Active. To make a position active you need to tick the box in the subset, but it takes more than 15-20 seconds because the list is long. This affects other actions and processes whenever they involve this subset on the Position list.

Do you have any advice on how I can reduce the loading time when someone ticks the box to make a position active in the subset?

rob_marshall · January 2

@Gluvakov

The 15-20 seconds of processing when ticking "Active" for the subset is very likely caused by the action that sets the subset. When this occurs, the model recalculates every module using that subset. Something you might consider: how many non-active positions do you have in the list? If there are a lot, you might want to clean them up and/or move them to a different list (this could very well be a lot of work, so be careful with this). Something else to consider: review all modules/line items using this subset and check the summary method to see whether the line items really need a summary rather than None. Changing it to None will help decrease the number of calculations being done when a position is switched to Active.
