Data inspecting and the 50,000 record limit
tjhb

8,280 post(s)
#05-Oct-18 01:55

We have been over this before, but I want to reiterate it since nothing has changed.

Let's say I have a table with 300,000 (or 300,000,000) records. Not unusual at all.

I open the table, and Ctrl-click on column [elevation] to sort it.

(One annoying thing is that the display now ends up somewhere down the second half of the table, though previously I was at the top. Annoying but not serious.)

If I now scroll to the top of the record display, I am shown the minimum value in the [elevation] column.

That is the natural assumption, and it would be very hard to shake how natural that assumption is.

You have sorted the table for me, as I asked, and that is what I see.

But it is entirely false.

This is a major design mistake. You meant well, but it will break things.
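The failure mode described above is easy to reproduce outside Manifold. A minimal sketch in plain Python, using invented elevation values: sorting only the first 50,000 fetched records and reading the top row gives the minimum of the sample, which need not be the minimum of the table.

```python
import random

random.seed(42)

# A hypothetical 300,000-record [elevation] column.
elevations = [random.uniform(0.0, 3000.0) for _ in range(300_000)]

# The table window fetches only the first 50,000 records...
fetched = elevations[:50_000]

# ...so sorting the fetched sample and reading the top row gives the
# minimum of the sample, not of the table.
sample_min = sorted(fetched)[0]
true_min = min(elevations)

print(sample_min, true_min)
# These will usually differ: the true minimum can sit anywhere in the
# remaining 250,000 unfetched records.
```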

tjhb

8,280 post(s)
#05-Oct-18 02:03

The correct answer is to begin to sort the entire table--but to allow immediate cancellation if the user thinks it is a mistake.

I still think that the whole 50,000-record decision was a horrible exercise in second-guessing the user.

It is our data. Please do what we say, even if that's stupid. (And allow us to cancel.)

danb


1,656 post(s)
#05-Oct-18 02:36

I agree 100% with this, as I have been struggling with it while developing national fragmentation indicators using a base drawing of some 6 million polygons. In M8 I am so used to doing exactly this sort of filtering and sorting of the data to quickly check for a variety of common issues such as double-ups, maximums, minimums, etc., in and out of selections, but in M9 the result is truncated and false for tables over the 50k limit.

I would typically very happily wait to see the result, and the ability to terminate the sort if better judgment deems it unnecessary would be a great option.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

danb


1,656 post(s)
#05-Oct-18 04:44

It just got me again, quite by chance, while working through the selection topic with a large drawing open. I was initially confused as to why my interactive selection in the drawing appeared not to be selected in the filtered table. I finally realized that my selection filter was only applying to the first 50k records, and that my selection was outside this range, so I would have to uncheck 'Filter Fetched Records Only' to see them.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

Dimitri

5,045 post(s)
#05-Oct-18 14:07

I would typically very happily wait to see the result, and the ability to terminate the sort if better judgment deems it unnecessary

I admit to being curious. A table with 6 million records is like a roll of paper over 26 miles long covered with tiny text. 26 miles is a long way. Standing on a barren plain it is all the way to the horizon and back.

Suppose that table is sorted by some field: what task is it that you are doing where it helps you to have a roll of paper 26 miles long with small text sorted on that roll? How will you use that very long roll of paper in your workflow?

A tabular display of 50,000 records is like a roll of paper over a thousand feet long covered with small text. Just manually scanning such a long roll of tiny letters and numbers is a heck of a lot of work.

What I'd like to do is to hear examples of workflow that involves manual scanning of such immensely long rolls of endless lists of records printed in small text. I'm curious to hear specific cases of workflow where doing that is productive.

I think where people claim such tedious, manual, and likely unrealistic workflow is productive, they're only claiming that because either a) they are not yet acquainted with more productive tools that would make their tasks quicker and easier, or b) for whatever reason those tools are too hard to use or they don't exist.

What I'm looking for, ultimately, are ways to save people from having to do tedious, manual things that could cause errors, and instead, to give people the joy, speed and guaranteed accuracy of more automated tools.

tjhb

8,280 post(s)
#05-Oct-18 16:40

Dimitri, I really wish you'd take the opportunity to listen in this thread, not cover it in screeds of paper (let's generously say wallpaper) thousands of feet long.

You were very stubborn about this design question the first time around, but here, as I say, is another opportunity to listen, to thoughts perhaps better than your own (it happens!), for the improvement of the product.

More tomorrow.

Dimitri

5,045 post(s)
#05-Oct-18 17:17

No stubbornness about design decisions. The task is to save people from tedium, from having to pore through tables that are miles long to accomplish what they want. If you have an example where you would prefer to pore through a table that is miles long instead of using a quicker tool, I'd be grateful to hear it.

More positively:

The 50K sample is just a default for what seems to be such a large number of records that people would not want to manually look through them all.

Manifold promised that an option would be introduced to change that to whatever people wanted. If you want 100,000 or a million or a billion, no problem. That will happen. If you want a bigger default, no problem.

But I get the impression that you don't really want to search through a million or a billion records manually. I get the impression you want a set of tools that in an automated way give you greater flexibility in scope, so that you can pull results out of a full scope without using SQL, but still using point-and-click tools like the sort.

For example, an option, like with filters, to apply sorts to the entire table and pull from that and not just from the sample, but without using a query.

Or, maybe, a find box that operates on total scope and not just on the sample, and without using the Select pane followed by a refresh.

Or (since you gave the example of interest in a minimum) a Minimum with optional range, like 10 least or 1000 least or whatever, as Selection templates.

All that is possible and more. But to do those more automated things it helps to know what the workflow and the task are. Then it becomes possible to provide general, orthogonal facilities that automate desired tasks without having to manually pore through very long tables.
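The "Minimum with optional range" idea maps directly onto SQL that any engine can run. A hedged sketch using Python's sqlite3 module (table and field names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terrain (id INTEGER PRIMARY KEY, elevation REAL)")
conn.executemany(
    "INSERT INTO terrain (elevation) VALUES (?)",
    [(v,) for v in (512.0, 3.5, 87.2, 901.4, 3.5, 12.9, 640.0)],
)

# "10 least" as a query: the engine sorts over the whole table and
# keeps only the smallest n; the user never scrolls through anything.
n_least = conn.execute(
    "SELECT elevation FROM terrain ORDER BY elevation ASC LIMIT 3"
).fetchall()

print(n_least)  # → [(3.5,), (3.5,), (12.9,)]
```

A "10 greatest" template would be the same query with `ORDER BY elevation DESC`.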

KlausDE

6,229 post(s)
#05-Oct-18 07:02

I guess there are good reasons to have this limit for remote and big datasets, and to keep open a simple, internal path to delegating things like Min() or Sort to the remote DB in future.

For now, wouldn't a workaround be to remove sort-on-click from the column header and make it a transform function?

tjhb

8,280 post(s)
#05-Oct-18 07:24

I think it would, I agree.

The transform would sort the whole table--and could be quickly cancelled.

No spurious sort (part of table) on column header. Get rid of it.

Dimitri

5,045 post(s)
#05-Oct-18 10:34

Two wrong assumptions:

The transform would sort the whole table--and could be quickly cancelled.

The above assumes away key issues. If you could do that in a transform, why not with a Ctrl-click on the header?

No spurious sort (part of table) on column header.

You've assumed away a massive user error: the sort is exactly correct, not spurious, on the subset view provided. That's something to consider in any proposed modification. Usually, somebody who won't RTFM in simple things isn't going to RTFM in complex things, either.

Dimitri

5,045 post(s)
#05-Oct-18 10:24

For now, wouldn't a workaround be to remove sort-on-click from the column header and make it a transform function?

Sort by ctrl-clicking the header is a very useful and fast tool. Like Filters, it is a wonderful convenience for those who know how to work the tool.

That people who have not yet learned the tool may not be able to fully understand what it does is a routine consequence of providing rich tools to handle complex situations. Denying a full set of capabilities to skilled users is not the answer to that problem.

Dimitri

5,045 post(s)
#05-Oct-18 13:22

For now, wouldn't a workaround be to remove sort-on-click from the column header and make it a transform function?

A workaround to what? That's a key question here. What are you trying to accomplish?

Let's zero in on that so that if the right tool is available, we can talk about how to use that right tool. If the right tool is not available, no problem, what should that right tool look like? ...so you can effortlessly accomplish what you need to do.

Whatever we do, it is important to do that without muddying the rest of the system. Let me give you an example of what I mean.

Suppose, for some reason, you want to see a table with records sorted by some field. Just why it is you want to see a table sorted by the value of some field drives what makes sense to do.

A Transform template isn't the way to provide a sorted view, because transform templates either change something in a table or they create new tables. There's no such thing as a "sorted" table in Manifold or, for that matter, in any big DBMS.

To see records in a table in sorted order, use an ORDER BY query to get a results table, a report, in sorted order. That's not transforming a table. Wouldn't it be better, if a new tool is required, to put it somewhere more related to what it does?

In fact, such a tool is already available: I have a table of 9 million records for LiDAR data, and I want to see records where Intensity is greater than 89. Cool. I have three, very fast ways to do that within the table window, if that is the user interface preferred:

1. Just right click on a value of 89 and choose the preset >= filter choice. Done. That applies the filter to those 50K records already fetched. It's a fast way to get a feel for what is going on based on a representative sample.

2. Having done that, in View - Filter uncheck Filter Fetched Records Only, so the filter now goes through all 9 million records and provides the first 50K records it finds. That's an even more exhaustive way. When you uncheck Filter Fetched Records Only you're running filters against the entire data set: run several filters on various records and you might reduce your results down to just a handful of records out of 9 million.

3. Use SQL with a single click: In View - Filter, click on Filter using Query. Right away, that pops open the command window loaded with the equivalent query to 2) above. Note that this method in no way limits what it produces: if it turns out that 8 million of those 9 million records are picked out by the query, they'll all be available for any further use. It's only if you look at them in a window manually that you get 50K presented as a sample, but the universe of all 9 million records will be your source for any queries you create based on that "Filter using Query" query.
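The query that "Filter using Query" composes is, in spirit, an ordinary WHERE clause run over the full table rather than the fetched sample. A sketch with sqlite3 (the LiDAR table and intensity values here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lidar (id INTEGER PRIMARY KEY, intensity INTEGER)")
conn.executemany(
    "INSERT INTO lidar (intensity) VALUES (?)",
    [(v,) for v in (12, 95, 89, 140, 7, 90)],
)

# Equivalent of unchecking "Filter Fetched Records Only": the WHERE
# clause is evaluated against every record, not just a fetched sample.
rows = conn.execute(
    "SELECT id, intensity FROM lidar WHERE intensity > 89 ORDER BY id"
).fetchall()

print(rows)  # → [(2, 95), (4, 140), (6, 90)]
```

Note that the boundary value 89 itself is excluded: `> 89` and `>= 89` are different filters, which matters when probing exact thresholds in real data.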

Now, what you do with the above depends on why you want to see a sorted table.

Suppose you want to see a sorted table so you can manually scan the table to find duplicates. No way are you going to be able to do that reliably with 50,000 records, not by eye. Suppose you could dial that number up or down, so you could fetch, say 100,000 records. That doesn't change anything, since it is still too much to do anything with by eye.

The right way to find duplicates in any table larger than a trivially small table is using SQL. You're going to make errors when trying to find duplicates by spending days scanning page after page on the monitor, dozens of pages a minute, but SQL isn't ever going to make a mistake.
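Finding duplicates, the example given above, is a one-line GROUP BY rather than an eye scan. A sketch with sqlite3 (the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO parcels (owner) VALUES (?)",
    [("smith",), ("jones",), ("smith",), ("lee",), ("jones",), ("smith",)],
)

# Values that occur more than once, with their counts. SQL never
# misreads a line the way a tired eye scanning 900 screens will.
dupes = conn.execute(
    "SELECT owner, COUNT(*) FROM parcels "
    "GROUP BY owner HAVING COUNT(*) > 1 ORDER BY owner"
).fetchall()

print(dupes)  # → [('jones', 2), ('smith', 3)]
```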

Look, the current table window is a hack, put there to cater to expectations based on experience with very small data. It's still highly useful for that, but sometimes it promotes really bad habits and amateurish workflow (like scanning by eye for duplicates) that work against developing more professional skills that scale better. I still love it.

I had as big a role as anybody in the design of the table window tools in 8 and I love the thing. But I'm honest enough to admit that for certain things it seduces you into amateurish workflow that is much worse than learning a better way to do it.

8 has facilities to save you from the worst of that, like the Duplicates functions in the select toolbar. Such tools, which automate a better, SQL-based way of doing things, are, I believe, a better way to go in 9 than expanding a tool that is already an unrealistic GUI (the table window) into something even more unrealistic.

Let's get away from the unrealistic belief that the best way to accomplish stuff in tables is to manually scan screen after screen by eye in a table window. That's like manually rowing across the Atlantic instead of taking a First Class flight. Let's consider two sets of numbers:

50,000 records - At 56 records per table window that's 893 screens. If each screen is 40 cm high, that's 357 meters worth of small text records if each screen were laid end to end. I don't believe anybody who says they accurately can scan, by eye, a roll of paper 357 meters long that is covered with tiny text and do that without making mistakes. That's just daft.

Scan a few screens at the beginning or the end, sure. Scan a screen here or there somewhere in the "middle", sure. But all 357 meters? No way. And 50,000 records is just the "sample" size the table window provides.

8,000,000 records - That's equivalent to a roll of paper 57 kilometers long. If you want to use a roll of paper 57 kilometers long covered with tiny numbers as your interface to learning anything at all about LiDAR data you're nuts.

The only sensible way of handling such tasks is through automated means, which is why people invented SQL.

Can Manifold provide ever more useful tools to make it easier to automate various things using SQL? Sure. The only question is what are the things you do, which could be automated. For example, if you're in the habit of finding duplicates, why not a "Duplicates" and "All Duplicates but First" choices along with a bunch of others in the View - Filters menu?

KlausDE

6,229 post(s)
#05-Oct-18 14:48

A workaround to what?

A workaround for people in a hurry, or for manual-phobics, who are usually not receptive in this situation. Your educational approach is bound to fail here: rather than prompting a pause and a switch to learning mode, as intended, it will only upset them in this situation.

But you can warn them when there has been no full fetch of the data and give them the opportunity to switch to the next best and appropriate tool. You could preselect a 'Sort' tool in Transforms and set the proper default values.

That probably would be better than just to drop the sort function of the column header.

[edited] Anyway you'd need a Sort Table Template - and preferably not Add Component but "Update" a sorted prefetch of the first or last sorted 50,000 records.

Dimitri

5,045 post(s)
#05-Oct-18 15:26

I hear you, but something to consider is that people who are in a hurry, or people who won't read user manuals, .... well, if they don't do that for simple things, they never do that for more complicated things.

You can't help them by pushing an issue from a relatively simple setting into another setting that requires greater skill and understanding.

Where you can help them is learning what their task is, what they are trying to accomplish, and then provide a fast and easy tool that does that for them.

You could preselect the 'Sort' tool in Transforms and set the proper default values

There's no sense to having a "sort" tool in Transform, because tables don't have any sort order. None. There is no meaning to "sorting" a table, and no SQL thinks of tables as being a sorted entity.

There is no meaning to SELECT * INTO [sorted table] FROM [unsorted table] ORDER BY... That's incorrect SQL.

What you can do is provide a report, that is, a results table that lists records in some sort order. But that's just a temporary report. It's not a table in the database.

One reason a table window is a hack (a very convenient hack, but still, a hack) is that it provides features like a sorted display which gives the impression that there could be such a notion as sort order within a table. That's an example of what I meant when I wrote that the table window can seduce you into amateurish notions that are flat out wrong when it comes to a more effective understanding of SQL and, ultimately, way more productive workflow.
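The point that a table has no sort order, while ORDER BY yields only a transient report, can be illustrated with sqlite3: the stored table keeps whatever physical order the engine likes, and only the query's result set is ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
conn.executemany("INSERT INTO t (val) VALUES (?)", [(3,), (1,), (2,)])

# A plain SELECT makes no ordering promise: rows come back in whatever
# order the engine finds convenient.
unordered = conn.execute("SELECT val FROM t").fetchall()

# ORDER BY produces a sorted *result set*, a report; nothing about the
# stored table changes, and the table itself is never "sorted".
report = conn.execute("SELECT val FROM t ORDER BY val").fetchall()

print(report)  # → [(1,), (2,), (3,)]
```

Running the ORDER BY query twice gives the same report both times, but it is recomputed each time; there is no persistent "sorted table" anywhere.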

But you can warn them in case of no full fetch of data and give the opportunity to switch to the next best and appropriate tool in this case.

Already there. Every table window for a table that's larger than the 50k records shown has an icon that indicates there is no full fetch of data. People who ignore that have skipped over the very basics of tables.

Look, 9 is a big system with very many facilities already and thousands more coming into the product in the next 12 months.

If you're going to try to proof the system against people who are unwilling to learn how to use it, say, by popping open confirmation and warning dialogs ("Hey out there! This icon here [big blinking arrow appears] indicates that only the first 50,000 records are shown! There are more records! ") at every possible step where somebody who refuses to study the basics might get confused, you're going to make the system totally unusable for skilled people.

The opportunity to switch to the next best and appropriate tool is there as well: it's in the View - Filters menu, enabling people to choose a variety of one-click variations, to pull from the entire table, to have Manifold write a query for them, etc.. People who refuse to study the basics won't know about those facilities, but they are there and it doesn't take much skill or learning to benefit from them.

Could those be expanded to add yet more nice things that can get done with a single click? Sure.

You could expand that context menu with a variety of yet more one-click modifications, and you could add many more templates to the Select panel, for things like selecting Duplicates, select All Duplicates but First, and so on.

But I have to say, I think this discussion risks becoming highly misleading, in that I hear an assumption that a very tedious, manual approach searching through miles of text displays is somehow easier or more productive than good automated tools.

I could be wrong, but I don't think anyone disagrees that the table window is OK for small tables, where people really do sometimes sort manually and they can do that productively. I use Manifold all the time as a personal information manager, for example, to keep track of things. But such tables tend to have just a few hundred records in them at most. It's not millions of records. When you get into bigger tables, automated tools work better than manual searches.

I say that 50,000 records, a roll of paper with small text over 350 meters long, is already too much to look through manually. That's tedious and inefficient. If your initial filtering leaves you with 50,000 records to search through manually, you need to do some more filtering. Yet here people are asking to make it even more tedious and inefficient by extending that roll of paper to many kilometers long. Why is that better than good automated tools?

I think that is a fair request: give me an example of real-life workflow where searching through a mile of fine-printed text is a more productive and higher accuracy technique than using an automated tool.

Dimitri

5,045 post(s)
#05-Oct-18 15:29

[edited] Anyway you'd need a Sort Table Template - and preferably not Add Component but "Update" a sorted prefetch of the first or last sorted 50,000 records.

You have that now. Sort on something using the all records option and you get the first 50,000 records. Do the same with the reverse sort and you get the "last" 50,000 records.

But either way, just what is your workflow that you want to manually read through 50,000 records? What is the real-life task, a real example?

KlausDE

6,229 post(s)
#05-Oct-18 16:46

A "real-life" example is exactly the reason to undertake such painful manual work. Real-life data are full of errors. That's real life. And it's easy to "see" outliers and to narrow down a sensible range of trustworthy data with human pattern recognition. I don't say anything against a histogram tool to get this same idea. But it's not there yet. Along the way you notice NULLs that might be unexpected, or that your data source has gone through a format that doesn't know NULLs, so that <no data> appears as 0 ...

All... ah, no, many sorts of traps hidden in real data are obvious to the eye in a sorted arrangement.

Dimitri

5,045 post(s)
#05-Oct-18 19:49

And it's easy to "see" outliers and to narrow down a sensible range of trustworthy data with human pattern recognition.

I agree 100%. But you can do that now, true? What stops you from looking through however many screens you want, if the 900 screens given to you first are not enough? 900 screens is the default sample with 50,000 records. You don't have to stop there.

When it comes to looking at tables, Manifold is like an ice cream store that will give you however many scoops you want for free. 900 scoops didn't fill you up? No problem. Here are another 900 scoops.

If you want more than 900 screens, no problem. That's easy to do in moments with a short query (the View - Filter choices will generate that for you with one click) so you can take a look at 900 screens from the "end" of the data, another 900 screens from the "middle," yet more sets of 900 screens ordered by various criteria, and so on. All that is with the system exactly as it is today. In moments you can pull many thousands of screens to browse, enough to keep your eyes bleeding for weeks of manual scanning of records, line by line, screen after screen, if that is what you want to do.

I agree data discovery tools like histograms, automatic readouts of max/min values for all fields and things like that are great.

Given that you can do all the manual browsing you want today, might it not be better to shift gears and talk about how automated data discovery tools, like the above, might provide better productivity than spending days or weeks scanning hundreds or thousands of screens worth of records?

Dimitri

5,045 post(s)
#06-Oct-18 06:52

I guess there are good reasons to have this limit

To clear up what seems to be a misunderstanding: the word "limit" is misleading in the title of this thread because there is no limit, not 50K, and not anything. You can browse through however many records you want.

The only question is what is a convenient size for delivery of those records for your workflow. 900 screens (50,000 records) is just a default delivery size that seems to be plenty and convenient without choking people.

Let's take the analogy of the "All You Can Eat" Manifold ice cream store, where Manifold will give you as many scoops of ice cream as you want for free.

When you walk into the store, sit down at a table, and tell the waiter "I want ice cream" right away the waiter brings you 900 scoops. Why 900? It's not a limit, it's just more than most anybody wants to eat at first. The waiter drops off 900 scoops in a huge pile on the table, so high you might not even be able to reach the top with your spoon.

You take a bite here from the huge pile, you take a bite there, you nibble at the bottom of the pile, you stand on the tips of your toes to try a bite from the top of the pile. You may wonder how on Earth anybody can possibly get through 900 scoops.

But if you do want more, even if you throw away most of that first pile of 900 scoops, and you say "more ice cream" the waiter brings you another 900 scoops. If you decide you want more than one huge pile of ice cream at a time and you say, "Bring me more - this time, all chocolate," the waiter pulls up another table for you to use and brings you 900 scoops of chocolate ice cream, so now you have two huge piles to eat. You can fill the entire restaurant, many tables, with huge piles of ice cream, more than you could eat in weeks or years.

That's true, by the way, of table windows. Using the one-click query composition feature of View - Filters you can in moments, based on whatever criteria you like for various fields, create a second, third, fourth, etc., table display, each with 900 screens of data to browse. Have them all open simultaneously to compare various sort orders, etc., within them if you like. In a few seconds, less than a minute, I've just popped open 4500 screens worth of table to browse from my 9 million record sample. It was effortless, like snapping my fingers.

I'm just saying, don't think "limit" when there is no limit. That the default serving size the waiter brings you is 900 scoops doesn't mean the restaurant cuts you off at 900 scoops. If you want more ice cream, just snap your fingers and the waiter will bring you more. No limit.

As noted in my other post, Manifold will add the ability for you to specify whatever default you want. If you prefer a million scoops instead of 900 scoops at a time, you could do that as well.

KlausDE

6,229 post(s)
#06-Oct-18 07:47

Agreed. We're not speaking of a limit of the software but about a reasonable handling of unlimited data.

But ...

when it comes to analyzing and getting an overview of this mass of data, the otherwise very successful reactive solution, unusual for data of this size, entraps the overhasty user.

So we're asking for nothing but a warning and a comfortable detour to appropriate methods for inspecting unlimited data.

And applying Sort to a chunk of the data is a clear point at which the user should be reminded of the limited insight to expect for the full dataset.

Dimitri

5,045 post(s)
#06-Oct-18 09:08

So we're asking for nothing but a warning and a comfortable detour to appropriate methods for inspecting unlimited data.

OK. As you know, Manifold tries to maintain a "quiet cockpit" in the user interface. It avoids flashing lights and other distractions.

There is a warning today. It appears in the illustration see above, part of the Table topic.

The icon above indicates there are more records. It should be a routine thing, like a + sign next to a folder in the Project pane hierarchy that indicates there is more in that folder.... we don't want to make that + sign a bright red, blinking + sign to warn people there is more in the folder. Is it a good idea to make the icon above a bright red or blinking icon to make a stronger warning?

If you do that, why stop there? There are many other places where you could have more emphatic warnings for people who have not read the documentation. Don't you agree that if you go down that path you end up with a very noisy cockpit, and you make the system more distracting and more difficult for people who have invested into learning the system well?

And to apply Sort on a chunk of data is a clear hint that user should be reminded of the limited insight he should expect for the full dataset.

How would that warning look? "Please make sure to read the Tables topic to understand how to work with 50,000 record samples." or, "Please make sure to review a reasonable portion of the full data set to get a good impression of the whole thing." or "You've only scrolled through 14,834 records. Please be aware that is not the entire data set." I'm not being sarcastic... if you try to write such a warning for every way that someone who has not read the documentation might be puzzled, it is very difficult to write text that does not come off, to at least some people, as sarcastic.

Why stop there? There are endless nuances in a complex system where users who have not read the documentation will be puzzled. Can you anticipate every such case? Why not warn people, when they choose to use an integer field, that "CAUTION: Integer data types do not show the full numeric accuracy a floating point data type will show. Example: 3.1415 in a floating point number will be a round 3 in an integer data type."

Attachments:
btn_fill_record_placeholder.png
il_table_placeholder_records.png

KlausDE

6,229 post(s)
#06-Oct-18 09:51

Actually I have worked with big data but didn't notice the changed scroll icon.

What I have in mind is not a warning but a choice to use the appropriate tool to sort the full dataset.

This isn't an element that would clutter the UI when it's not needed. It should only appear when you sort the table window of a table too big for a full fetch. And the choice would be: continue to sort the extract OR switch to a Sort Transform with defaults for the current situation.

Tim's 'Cancel' function may not be so important when you explicitly switch the context to the Transform.

Dimitri

5,045 post(s)
#06-Oct-18 12:30

but a choice to use the appropriate tool to sort the full dataset.

I think that is where the focus should be: how can there be better tools to do what people want to do? I don't think that focus lies in literally sorting and presenting at once a huge dataset that no person can realistically use, but to better present extractions that give people what they want for their review and decision making. When dealing with tables that take weeks or months to scan through manually, regardless of sort order, "less is more."

I've sent in some suggestions on that already. For example, I've suggested a context menu on the placeholder record item that provides a "More..." command. That would fetch another 50K record serving of data. That's a baby step, but it's no harm done, and it would be a quick way of jumping to get another huge sample.
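The "More..." idea is essentially paging: fetch the next serving that starts after the last record already shown. A keyset-style sketch with sqlite3, with the serving size shrunk to 2 so the example stays readable (table and column names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO big (val) VALUES (?)",
    [(c,) for c in "abcdef"],
)

SERVING = 2  # stands in for the 50,000-record serving


def fetch_serving(last_id):
    # Keyset pagination: continue after the last fetched key rather than
    # re-scanning with OFFSET, which stays fast even on huge tables.
    return conn.execute(
        "SELECT id, val FROM big WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, SERVING),
    ).fetchall()


first = fetch_serving(0)            # the initial fetch
more = fetch_serving(first[-1][0])  # the "More..." command

print(first, more)  # → [(1, 'a'), (2, 'b')] [(3, 'c'), (4, 'd')]
```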

Another idea: suppose instead of just a Ctrl-click on the column head to sort the current serving by that column, a Shift-Ctrl-click would be a command to report the first 50K records, that is, a serving, from the entire table in that sort order? That would be very dangerous for a big table in some data sources, and of course some people might be surprised in such circumstances, but I suppose hiding it behind a Shift-Ctrl-click option would help ensure it was used only by people who learned when it was the sensible thing to do and when not.

Such things can be dangerous, by the way, because "cancel" is not always an option with some data sources. You can't just assume "oh, if the user doesn't like how long it takes he can cancel out of it." But that such commands might be puzzling in unskilled hands doesn't mean that you should not provide them for skilled users.

I can think up many commands and shortcuts like that. For example, you could right click on any column head and choose "top view" or "last view" or "middle view" or "random view" and have that be a short cut way of getting a 900 screen serving based on a sort of that field, instead of using the View - Filter - Filter using Query command to quickly construct the equivalent query. That Filter using Query command is fast, but a single click on a drop down menu is even faster, as the Filters facility demonstrates.
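Of those views, "random view" is the interesting one algorithmically: a uniform sample of a serving can be drawn in a single pass even when the total record count is unknown. A sketch using reservoir sampling, with illustrative names (this is not how Manifold implements anything, just the standard technique):

```python
import random

def random_serving(records, serving_size, seed=None):
    """Reservoir sampling: one pass, O(serving_size) memory, and every
    record has an equal chance of ending up in the serving, even when
    the total record count is unknown in advance (e.g. a streamed
    data source that cannot be counted cheaply)."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < serving_size:
            reservoir.append(rec)
        else:
            # Replace a reservoir slot with decreasing probability k/(i+1).
            j = rng.randrange(i + 1)
            if j < serving_size:
                reservoir[j] = rec
    return reservoir

sample = random_serving(range(1_000_000), serving_size=5, seed=42)
# Five records drawn uniformly from the full million, not just the first fetch.
```

A random serving like this avoids the bias of always showing the first fetch, which is one way such a menu command could genuinely help anomaly hunting.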

There is no reason why such things could not be reduced to single-click commands. But that a host of such pre-built options could be imagined and implemented does not mean they should be, or that there isn't a better way that could be imagined and implemented.

For example, I like what you wrote about histograms better. I don't necessarily mean histograms literally, although they certainly could be part of the picture, I mean the general idea of more automated tools for data discovery and review in an interactive way, which would be clearly superior to any thought of browsing hundreds of screens manually. I like the idea of a data discovery pane, perhaps as part of contents, that could provide flexible and revealing insights.

Something, for example, that gave maximum and minimum value readouts, or which provided one-click launching of big samples of screens to provide a closer, more manual look once the overall readouts helped you zero in on something. The idea is to allow your human intuition to detect anomalies without the tedium of manual browsing.

ColinD


1,874 post(s)
#06-Oct-18 13:16

A while back in the Radian beta IRC I suggested that a function for summarizing the data in fields would be very useful. Adam referred to it as data mining and did put it on the wish list. Something like min, max, number of records, percentiles for example.
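The kind of per-field summary ColinD describes is simple to prototype. A hedged sketch (this assumes the column's values fit in memory for the percentile step; a real implementation over huge tables would use a streaming estimator such as a t-digest instead of a full sort):

```python
def field_summary(values, percentiles=(25, 50, 75)):
    """Min, max, count, mean in one pass over the sorted data, plus
    nearest-rank percentiles. Illustrative only: assumes the column
    fits in RAM, which a production data-mining pane could not."""
    data = sorted(values)
    n = len(data)
    summary = {
        "count": n,
        "min": data[0],
        "max": data[-1],
        "mean": sum(data) / n,
    }
    for p in percentiles:
        # Nearest-rank percentile: value at ceil(p/100 * n), 1-based.
        rank = max(1, -(-p * n // 100))  # ceiling division via negation
        summary[f"p{p}"] = data[rank - 1]
    return summary

stats = field_summary([7, 1, 5, 3, 9, 11, 2, 8])
# e.g. stats["min"] == 1, stats["max"] == 11, stats["count"] == 8
```

Readouts like these answer the "what is the minimum elevation?" question directly, without any need to sort and scroll a table window at all.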


Aussie Nature Shots

Dimitri

5,045 post(s)
#05-Oct-18 09:56

But it is entirely false.

This is a major design mistake. You meant well, but it will break things.

No, it is entirely true. When you tell the system to sort a subset, the system sorts that subset and shows you a true result. What breaks things is if, despite all teaching, a user insists that a subset view is a view of the entire table.

With due respect, you're assuming away the key issues. I'd like you to start thinking about those key issues, and not to assume them away, because I would like to get the benefit of your thinking on this, to get the benefit of your experience in making this better.

Look, I happen to like table windows as primary interfaces. Like you (apparently), I've been trained up by years of experience with consumer grade software to expect that a table window is a great primary interface for doing things like sorting a table by a field: Ctrl-click on the column head and you're done. Choose one of those wonderful point-and-click filters and it's like magic.

Who wouldn't want to have that for all of their tables, including tables with billions of records? Everybody wants that, including all of those people out there in the trillion-dollar IT industry that is so much bigger than GIS.

Let's agree it would be wonderful if table windows could be primary interfaces for big tables, including all the functions we love so much. Let's agree that it would be wonderful to allow a user to begin to sort the entire table, but then to allow immediate cancellation if the user thinks that to be a mistake. Agreed? OK.

Now that we've agreed on the merit of that idea, let me ask you a key question. I think that when you answer that question, you'll be forced to consider issues that should not be assumed away. The question:

Why doesn't Oracle do that?

We are talking about a generic thing, after all, that is broadly applicable throughout DBMS. If you think our idea would be valuable to us peasants here in GIS, a microscopic fraction of the trillion dollar worldwide IT market, our idea would be far more valuable outside of GIS in the broader IT market.

If it was so simple as, "OK, maybe this is daft, but just let the user cancel..." why doesn't Oracle do it?

I'd be curious to hear your analysis, to continue the conversation.

KlausDE

6,229 post(s)
#05-Oct-18 15:05

Why doesn't Oracle do that?

Because they are not focusing on manually manipulating data. Manifold should be, IMHO.

Dimitri

5,045 post(s)
#05-Oct-18 15:48

Because they are not focusing on manually manipulating data.

Why do you think they are not focused on manually manipulating data? Could it be that manually manipulating data is tedious, slow, unpleasant, and error-prone compared to automated methods once you get beyond very small numbers of records?

dchall8
501 post(s)
#05-Oct-18 15:53

On the periphery of this discussion, M8 and Excel have some nice GUI display features that would be nice to see in M9.

In M8 when a table has focus, the bottom border bar shows the number of records, number selected, and number filtered. Perhaps if a table in M9 contained more than 50,000 records, there could be a visual flag to indicate that condition. And if a filter selects more than 50,000 records, that could be flagged. When I say flagged, I'm thinking of putting the number of records (or filtered record sets) larger than 50,000 in a bold or even bold red font that stands out to the user. The kicker to this is that a detached table (Alt+click Name Tab) does not have a bottom border bar to display information.

Similarly one of the visual features I like in Excel is, if you have filtered a column, the drop-down icon at the top of the column changes to a funnel icon (image attached). It can be easy to forget your filters without that visual cue out there in front of you.

Attachments:
Excel Filter Indicator.jpg

Dimitri

5,045 post(s)
#05-Oct-18 16:08

The kicker to this is that a detached table (Alt+click Name Tab) does not have a bottom border bar to display information.

No problem: there is a Components panel that is always on, which can provide whatever readouts are desired. It already provides readouts on total number of records, so that's the place to put more info.

People who work with multiple windows can, and should, have a multiple monitor display. One of the least expensive things you can do to improve productivity is to expand the desktop real estate you have available.

I prefer three monitors because then you don't get a border right in the middle. That's the GTOPO30 terrain elevation for the entire world displayed.

And if a filter selects more than 50,000 records, that could be flagged.

Already there. See the icon and discussion under "Placeholder Records" in the Tables topic. If you had a readout in Components such as total records: xxx, filtered: 50,000+, a short indication like a + sign is enough to show there are more.

I like the idea of adding a funnel icon to a column header that participates in a filter. The filter should appear in the tooltip when you hover over it, as well.

Attachments:
03_monitors.jpg

adamw


8,139 post(s)
#09-Oct-18 10:17

It seems to me that one of the main issues here is that when we were extending the table window with new features and were having these talks about the record limit, we ended up expanding filters to the entire table (via View - Filter - Filter Fetched Records Only switch, which is on by default, but can be turned off) but did not do the same for orders.

Also, maybe we don't warn the user enough that the table has too many records to fit into the window. As in, we do have the fill record indicator way down at the bottom, but maybe it is too out of the way to work as a warning, so you miss it.

Finally, the default of performing filtering / ordering on the fetched records only might be too technical. There are zero issues with small tables, but with large tables, which present the choice of whether to work with the entire table or a portion of it, choosing to work with just a portion of it - silently, as per the point above - is perhaps too error-prone. You click the field, you think you got the whole table sorted, but that's not the case and this is bad.
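This failure mode is easy to reproduce in miniature. In the sketch below, a 6-row "table" and a 3-row fetch limit stand in for 300,000 rows and the 50,000-record serving; the names are illustrative only:

```python
FETCH_LIMIT = 3  # stands in for the 50,000-record serving

table = [{"elevation": e} for e in (240, 12, 95, 3, 410, 58)]

# What sorting the fetched serving gives: the minimum of the first fetch.
fetched = table[:FETCH_LIMIT]
serving_min = min(r["elevation"] for r in fetched)   # 12

# What the user believes they asked for: the minimum of the whole table.
true_min = min(r["elevation"] for r in table)        # 3

# The top of the sorted serving is NOT the table minimum.
assert serving_min != true_min
```

The value at the top of the sorted serving depends entirely on which rows happened to be fetched, which is exactly why the silent default is dangerous.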

We have to think about what we have to do here. As Dimitri says, we promised to allow increasing the record limit on the table window - and we should perhaps do that first, because it's simplest - but maybe we could do better.

KlausDE

6,229 post(s)
#09-Oct-18 14:05

And I love Dimitri's idea of a 'data discovery pane'.

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.