1.06-02 Don't use subsets on large lists

Lists and subsets both take up space within a model, so if you need multiple subsets of the same list, consider whether they would be better as separate lists. This is especially true if the lists do not overlap and they are fed from a Data Hub. For overlapping subsets, or if there is a need to “consolidate” values back to the master list, subsets are a valid construct for model efficiency.


Comments

  • Misbah
    edited January 2023

    Rule 1.06-02: Don't use subsets on large lists. It is better to create a separate list if the subset would contain more than 75% of the list's items. Creating subsets on large lists goes against the “Performance” principle of PLANS.

    Here is how it was done in the pre-Planual era: without checking the size of the lists, we used to create subsets, thinking they saved space and helped optimize the model. Little did we know that such a large subset can cause a performance hit while saving no space at all. For example, for List A with 10,000,000 transactions and a subset at 75% occupancy, the subset would be created on the assumption that it saved the space of the 25% of list items it excluded.

    What is wrong with this method? First we need to understand what subsets really are. Subsets are essentially lists within lists. Subset items consume as much space as list items do (roughly 500 bytes per item), even if the list or subset is not used as a dimension in any module. When a large list with a top level and a subset is used in modules, performance suffers because the system has to aggregate the data for both the list and the subset, and re-aggregate it in every module where the list or subset is used as a dimension. Performance takes a hit whenever you add or remove subset items in such lists.

    There is also a myth that ALL subsets help with space optimization. That is not true. Here is the analysis:

    A list with 10,000,000 items consumes 5,000,000,000 bytes of space, which is roughly 4.7GB. If we add a subset with 75% occupancy of the original list, the subset will hold at least 7,500,000 items and will consume an additional 3,750,000,000 bytes, which is roughly 3.5GB. A list that originally consumed 4.7GB now consumes 8.2GB (4.7GB from the original list and 3.5GB from the subset). Model builders have to make a judicious call on whether the subset can save 3.5GB over the course of model building, which in turn depends on how many times the subset is referenced and across how many intersections. Let’s see what happens when this list and/or subset is used as a dimension in a module.
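    The arithmetic above can be reproduced with a minimal sketch. It assumes the rough figure of 500 bytes per item quoted in this post and treats 1GB as 1024^3 bytes, which is what the 4.7GB and 3.5GB figures imply. The function and variable names are illustrative only and are not part of Anaplan.

```python
# Rough list and subset sizing, assuming ~500 bytes per item (figure from the post).
BYTES_PER_ITEM = 500   # approximate, not an exact platform constant
GIB = 1024 ** 3        # assumption: the post's "GB" figures are binary gigabytes


def item_bytes(item_count: int, bytes_per_item: int = BYTES_PER_ITEM) -> int:
    """Approximate space consumed by the items of a list or subset."""
    return item_count * bytes_per_item


list_items = 10_000_000
subset_items = round(list_items * 0.75)   # 75% occupancy

list_size = item_bytes(list_items)
subset_size = item_bytes(subset_items)
total_size = list_size + subset_size

for label, size in [("list", list_size), ("subset", subset_size), ("total", total_size)]:
    print(f"{label:>6}: {size:>13,} bytes (~{size / GIB:.2f} GB)")
```

    Note that the 8.2GB total in the post is the sum of the two rounded figures; the unrounded total is 8,750,000,000 bytes, slightly under that.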

     

     

    Line item     Format        Space per cell   If list used (bytes)   If subset used (bytes)   Diff (MB)
    Line item 1   Number        8 bytes                    80,000,000               60,000,000          20
    Line item 2   Number        8 bytes                    80,000,000               60,000,000          20
    Line item 3   Time Period   4 bytes                    40,000,000               30,000,000          10
    Line item 4   Time Period   4 bytes                    40,000,000               30,000,000          10
    Line item 5   List          4 bytes                    40,000,000               30,000,000          10
    Total                                                 280,000,000              210,000,000          70

       

    Note: Based on a simple module with a single dimension.

     

    As you can see, using the subset in a module saved 70MB of space across 5 line items. The subset has to save 3.5GB of space to break even, which in turn depends on how many line items and modules are dimensioned by it.
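    To put a number on the break-even point, here is a rough sketch. It assumes the cell sizes from the table (8 bytes for Number, 4 bytes for Time Period and List), the approximate 500 bytes per subset item, and modules dimensioned by only this one list or subset. All names are hypothetical, and real modules with more dimensions would save more per line item.

```python
import math

LIST_ITEMS = 10_000_000
SUBSET_ITEMS = 7_500_000   # 75% occupancy
BYTES_PER_ITEM = 500       # approximate per-item cost quoted in the post

# Bytes per cell by line item format, as given in the table.
CELL_BYTES = {"Number": 8, "Time Period": 4, "List": 4}

# The five line items of the example module.
line_items = ["Number", "Number", "Time Period", "Time Period", "List"]


def module_bytes(dimension_items: int, formats: list[str]) -> int:
    """Space for a module with a single dimension of the given size."""
    return sum(CELL_BYTES[fmt] * dimension_items for fmt in formats)


with_list = module_bytes(LIST_ITEMS, line_items)
with_subset = module_bytes(SUBSET_ITEMS, line_items)
saving_per_module = with_list - with_subset

subset_cost = SUBSET_ITEMS * BYTES_PER_ITEM
modules_to_break_even = math.ceil(subset_cost / saving_per_module)

print(f"saving per module: {saving_per_module:,} bytes")
print(f"cost of subset:    {subset_cost:,} bytes")
print(f"modules needed to break even: {modules_to_break_even}")
```

    Under these assumptions the subset only pays for itself after about 54 modules like the one in the table, that is, roughly 270 line items dimensioned by the subset.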

    Here is how it should be done the Planual way: for large lists, create a separate list altogether instead of a subset.

    Advantages:

    1. The system does not have to aggregate data for both the list and the subset in every module that uses them.
    2. Only one list is impacted by an import.

     

  • Hello,

    I am struggling with a subset issue.

    End users need to use "provisional positions", which sit alongside "real positions" in a large Position list (30k items). The plan is that whenever they need a provisional position, they rename it and then tick a box to make it active. That means I have a subset on Position: Active. To make a position active you tick the box for the subset, but this takes more than 15 to 20 seconds because the list is long. It also slows down other actions and processes whenever they involve this subset on the Position list.

    Do you have any advice on how I can reduce the loading time when someone ticks the box to make a position active in the subset?

  • @Gluvakov

    The 15-20 seconds of processing after ticking "Active" for the subset is very likely caused by the action that sets the subset. When this happens, the model recalculates every module that uses the subset. Something to consider: how many non-active positions do you have in the list? If there are a lot, you might want to clean them up and/or move them to a different list (this could well be a lot of work, so be careful with it). Something else to consider: review all modules and line items that use this subset and check the summary method, to see whether the line items really need a summary or whether it can be set to None. Changing it to None will reduce the number of calculations performed when a position is switched to Active.