Subscribe to this thread
Home - General / All posts - Does Future have a Random Number function?
artlembo


2,870 post(s)
#09-Sep-17 00:07

I have been searching, but have not seen a function for generating a random number. does one exist?

tjhb

7,503 post(s)
#09-Sep-17 00:16

Nope, not yet. So a tiny .NET script function is best bet.

I haven't done it though. We need to be very careful about maintaining entropy when calling from multiple threads. Sequential calls from different threads can otherwise return the same "random" value.

Dimitri


4,271 post(s)
#09-Sep-17 07:29

Given that "random" numbers are pseudo-random, you could do a quick hack using UuidMakeNew() to generate a UUID, cast it as a string, grab a group of however many characters you want from that string, and then convert from hex to a decimal number. Depending on what you need the pseudo-random number for, that could be OK.

Dimitri


4,271 post(s)
#10-Sep-17 13:11

and then convert from hex to a decimal number.

Tim is right... a small script is the way to go. When out of curiosity I tried to do this entirely in SQL I could not find an SQL function to parse a hexadecimal string into a number, and CAST won't do it, not as a straight hex string nor as a string started with 0x. So, if you must use a script function to cast a hex string into a number, may as well use a script function that generates a random number.

adamw


7,215 post(s)
#10-Sep-17 10:41

Out of curiosity, what's the scenario?

artlembo


2,870 post(s)
#10-Sep-17 13:27

There are lots of scenarios, but in this scenario it is random sampling:

1. Take a large data set

2. Add a new column called random

3. Populate the field with random numbers

4. Sort by the random field

5. Grab the first x records in the table and you have essentially selected X random records

I do this a lot in 8.

Also, lots of statistical inferences call for random values to be generated and sometimes I use those.

tjhb

7,503 post(s)
#11-Sep-17 02:33

Art,

In that sort of case, I thought Dimitri's suggestion sounded ideal, so I tested

SELECT

SPLIT (

    COLLECT [mfd_id][Name]

    ORDER BY UuidMakeNew() ASC

    FETCH 10 -- number of samples

    )

FROM [mfd_meta]

;

Unfortunately though, if I keep repeating this query, then I tend very strongly to get the same sequence of mfd_id values in the table for successive runs. Same for different numbers of samples (FETCH N).

For example, using FETCH 1000 (effectively all records) on a new project, I consistently get sequence 2, 3, 4, 1. Same in a second and a third new project.

I wonder if this has something to do with how underlying Microsoft code creates UUID values. Quasi-guaranteed to be unique, but not unordered, it seems.

artlembo


2,870 post(s)
#11-Sep-17 03:38

I suppose I could use Postgres or SQLite to return random values using the EXECUTE statement. But, the logical thing would be to add the function to put the database on par with other in this particular area.

Also, referring to another post, a ProjectPoint would be very useful for a number of scenarios. I have offered a few suggestions for things that should be included to match what other databases already have.

adamw


7,215 post(s)
#11-Sep-17 08:26

This is a big part of why we don't have random number generator functions available officially yet in queries.

The UuidMakeNew() takes no parameters and so the query engine detects that it does not depend on the record values and so the query engine basically ignores it in ORDER BY similarly to how it would ignore ORDER BY Sin(Pi/2), for example. (The exact mechanics are such that the ignoring comes slightly later than if the ORDER BY wasn't there at all, but the effect is the same in that there are no multiple calls to UuidMakeNew() and no reshuffling of records.)

There are, of course, ways to let the query engine do what is desired here, and we have plans to implement it, this is on the wishlist.

adamw


7,215 post(s)
#11-Sep-17 08:40

Thanks.

I asked to make sure whatever we do with random numbers covers the scenario you are after adequately. What you described is pretty simple, I see no issues.

Here is how you can do random numbers now.

A C# script called 'Script':

//C#

 

class Script

{

 

public static int Rnd(int seed)

{

  return new System.Random(seed).Next();

}

 

public static double RndDouble(int seed)

{

  return new System.Random(seed).NextDouble();

}

 

static Manifold.Context Manifold;

static void Main()

{

  // just some test code, may be omitted

  Manifold.Application.Log(Rnd(2).ToString());

  Manifold.Application.OpenLog();

}

 

}

Example query:

--SQL9

FUNCTION Rnd(v INT32INT32 AS SCRIPT [Script] ENTRY 'Script.Rnd';

FUNCTION RndDouble(v INT32FLOAT64 AS SCRIPT [Script] ENTRY 'Script.RndDouble';

SELECT mfd_id, Rnd(mfd_id), RndDouble(mfd_id) FROM mfd_meta;

The functions take a seed value to let the query engine know that result values vary per record and make the results repeatable (same seed values are going to produce same resulting values). You don't have to use seed values in the functions.

The script functions are not very economical (normally, you would create a single random number generator and use it to generate all values), but this is not important until tens of millions of calls.

If you want non-repeatable results, use something like this:

// C#

 

class Script

{

 

public static int Rnd(int seed)

{

  return new System.Random(seed ^ (int)System.DateTime.Now.Ticks).Next();

}

 

public static double RndDouble(int seed)

{

  return new System.Random(seed ^ (int)System.DateTime.Now.Ticks).NextDouble();

}

 

static Manifold.Context Manifold;

static void Main()

{

}

 

}

Hope this helps.

artlembo


2,870 post(s)
#16-Nov-17 00:58

Adam,

Here is an example - a really useful one. I have a table of say, 1 billion records. Calculating the average values of some variable will take a very long time. But, what I can do is take a random sample of say 100 or 500, or 1000 records (hell, even 10,000 records), calculate the average of those values along with the standard deviation and then compute a confidence interval.

Now, with those calculations, I now have the best point estimate of the value (the average) and the standard deviation allows me to compute confidence interval to obtain a range that assures me I am 90% or 95% confident that the population average falls within the confidence interval.

That is the real beauty of what is called the central limit theorem. It is very useful when we have just simply too many values in our database but need to make some estimates.

So, here is how I do it:

SELECT avg(NFMTTLVL) AS avg_val,stdev(NFMTTLVL) AS StDev_Val, 1.65*sqrt(var(NFMTTLVL)/count(*))

FROM 

(

SELECT A.*, NFMTTLVL FROM

(

 EXECUTE [[SELECT CAST(random()*1600000 AS Int) AS oid FROM parcels LIMIT 1000]ON [pg] ) AS A,

 mdparcels

 WHERE A.oid = objectid

 AND NFMTTLVL > 0

)

the code in bold is calling a random function in Postgres, and giving me 1,000 random numbers between 1 and 1,600,000. The SELECT statement in italics is simply getting the NFMTTTLVL field from the large table where the oid is equal to the random numbers I generated. This is really fast because the oid value is indexed.

This then is selected by the outer query where I get the average of the sample, and calculate the 90% confidence interval.

The query runs in under 1 second! Calculating the average for the entire table takes about 10 seconds. But, in my test case, there are only 1.6M records. Imagine if we had 1.6B records. That is because the oid field is indexed. And, I have a representative random set of data.

I have run this query many times, and sure enough, 90% of the time, the true value falls within the confidence interval - I love magic!

If we are looking to make inferences about populations and we have insanely large values this is a great way to trim data down, and have consistent and predictable results. So, I think this is an effective way to manage big data - take large random samples from really large populations can generate confidence intervals.

It is cool that I could simply call Postgres with the EXECUTE command to get the random numbers, but honestly, if MF had the ability it would make my life much easier.

Dimitri


4,271 post(s)
#16-Nov-17 08:40

if MF had the ability it would make my life much easier.

Given that you can copy/paste what Adam wrote and then use it with zero effort whenever you want (save it to a .map you can declare as a favorite data source for one-click addition to any project)... why not just do what Adam suggests? It scales up into the millions of calls and you only need 100, 500 or 1000 calls, right?

danb


1,603 post(s)
#16-Nov-17 19:02

Perhaps less useful to future Future users however who would have to first work out that Future doesn't have a builtin random function. They would likely then need to seek help from the forum to see if it can be done and then either have the good sense to search the forum or rely on the memory of Art or others to remember this thread.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

Dimitri


4,271 post(s)
#17-Nov-17 08:29

who would have to first work out that Future doesn't have a builtin random function.

I suggest a careful re-reading of Adam's post, as he covered that very well: Of course Future has a built in random function.

In fact, it has several given the wide range of built in scripting languages. Adam gave a C# example. V8 provides math.random() if you prefer that. Keep in mind that, given V8 is built into absolutely every Future installation, all capabilities of V8 are a built in part of Future as well. Same with all the dot net stuff. That's all a built-in capability of Future. If you don't realize you have a huge buffet of choices, well, that's a different matter. The whole point of Manifold going to the immense effort of building in V8, C#, VBScript, etc., etc, is to provide those many, many capabilities as built in parts of Future for you to use. Please take a careful look at what is already there before saying it is not available.

Likewise, if you do realize you have many choices but for some reason you don't like the various options for random number generation that are already built into Future and you prefer that Manifold adds yet another function, well, heck, that's no problem and easy to do. Manifold is always happy to add new features in whatever priority order the community signals those should be done.

Send it in as a suggestion, for example, if in addition to the many random functions already available you want one as an SQL function, and, to help indicate what priority you attach to that, please give examples how you personally need to use random functions in SQL as opposed to the various ways in which you use random functions in scripts and in other settings.

To guide any new implementation of yet another random function it is important to understand what is deficient in the existing random functions Manifold already provides. We have great respect for the people who have implemented random number functions in V8, etc. Those people are not idiots and in general know what they are doing to a very high and professional level.

If anybody wants to go out on a limb and criticize their work, well, that's OK and fair, but if for some reason somebody doesn't like the way their functions work or how they are utilized, I'd ask please show them some respect and state clearly why their way is not good enough for your specific application.

There are good reasons for using random number functions as both Manifold and Oracle have done (Oracle also implements this as part of a script). None of that prevents providing yet another option as an SQL function, but hey, to make sure we cover all bases it would be useful to know what it is about your specific use of random number functions that requires a new function, and an SQL function at that, so that whatever Manifold does in that direction does not repeat the inadequacies of, say, Google's implementation of math.random() or the C# implementation. These things are not easy to write correctly so should the priority of such a thing as indicated by community response ever rise to the level that it jumps ahead of higher priorities it should be done correctly.

I could be wrong about this but a quick review of the truly massive corpus of SQL that is out there online shows very little use of random number functions within SQL itself. It just doesn't seem to be one of those things people use all that much within SQL. It shows up often enough in scripting/programming but seems to be a very, very niche interest n SQL. So, hearing about real life examples, such as Art's, is very important for any new implementation.

I'd also be curious to hear about such examples because it seems to me that at least some of them are driven by the desire to work around inadequacies in their host systems. For example, the need to create smaller samples within other programs for analytics is often driven by the inability of those programs to work with larger data, so a subset is utilized. A better solution would be to work with all of the data instead of jumping through statistical hoops to get around limitations. So I'd be most interested in application of random functions in SQL that are intrinsic to the use of SQL itself and are not just a workaround to something else that is broken in the host program, but which is not an issue within a Radian engine.

Likewise, not all SQL implementations make it easy to call something that is in a script. If a random function is in SQL because the SQL cannot use a well known, highly standard and well understood random function in something like V8 or C#, that too is a workaround that doesn't really apply in a Radian setting. One of the several good reasons to use standard, well-known functions like math.random() in V8 is precisely because they are well understood and have gone through an evolutionary process of Google repairing inadequacies that have been found.

Adam's code is not *strictly* thread-safe.

By that standard nobody has a random function. While it would be fun for Manifold to introduce the world's first totally parallelized and automatic random function, a real achievement above and beyond what Microsoft, Google, Oracle and others have been able to achieve, well, is that really the highest priority? If anybody says "yes" I ask them, "what is your real life application that you are actually doing right now that requires that?". Pseudo-random and slightly, one in a zillion, not thread safe may not be such a bad choice for how such things tend to be used in real life. That's one reason why suggestions that contain real life examples of how the suggester personally will be using the thing tend to be more useful in terms of specifying what should be implemented.

Pretty much any grown up software tool has a Rnd() function: Postgres, SQL Server, Excel, VB, VBA, VBScript, Python (numpy), MapInfo, Access, and the list goes on. I think it will simply cause more people to ask the question: why doesn't Manifold have a Rnd() function? I spent X hours trying to figure how to do this...

Art, your ability to do logic has just gone missing. Remember the transitive property: if A=B and B=C then A=C. Using the symmetric property we can rewrite that as, if A=B and C=B then A=C and, also, C=A.

You just enumerated a few built-in capabilities of things which in turn are built into Manifold. Let's take VBScript as an example. If random is built into VBscript, and VBscript is built into Manifold, then random is built into Manifold. QED. Same for C#, V8, etc., etc. The more examples you list that are built into Manifold which provide random functions, the more you emphasize that Manifold does indeed also provide those many random functions as built-in capability.

If anybody asks "Why doesn't Manifold have a random function?", the correct answer is to patiently and generously explain that Manifold has several random functions. Explain also the utility of going to the effort to build standard capabilities into Manifold, so that such a rich roster of capabilities is placed at the fingertips of Manifold users.

Pretty much any grown up software tool has a Rnd() function:

Ah, well, the "grown up" words! :-) I suppose by that standard since Manifold has not one but several options built in for random functions then Manifold is especially grown up, way more so than things that have only one option for random functions built in.

Since you enumerated examples that have random functions other than SQL functions, then it is clear you intended to cover implementations such as Oracle and Manifold as "grown up." I appreciate you see the wisdom in Adam's approach. :-)

Without pointing any fingers at items in Art's list, Ialso would respectfully remind contributors to this thread that "random" functions are often not really random. It can happen to the best of us. For example, much to Google's surprise they discovered a couple of years ago that even their own work, math.random() in V8, required rework. That somebody claims to have a random function does not mean they really do.

To repeat the substance of Adam's post: Future does indeed have random number generation built in. In fact, it has more than one option given, say, C#, VBScript and V8 at least. There are very good reasons to use such standard implementations, and doing so in scripting is a good way (as Oracle does).

If there is any reason why users feel that the several choices of random functions that exist today do not fulfill their requirements for tasks they are now doing or in reality intend to do, Manifold is certainly happy to implement yet another random function as people desire. You all know the drill for that: send in a suggestion of what you want so it can bubble its way to the top. Even if you don't send in a suggestion, given Manifold's habit of fully articulating all possible functions within SQL no doubt in the future some random function will be added anyway. It is only a matter of when.

But, given that nobody wants Manifold to prioritize re-inventing the wheel in cases where there are perfectly adequate, and possibly even superior, alternatives already implemented, I think it is fair game to ask anybody who is asking for a higher priority for implementing yet another random function to state specifically what, exactly, is their application that drives that highest priority. That also will help ensure that when Manifold does re-invent the wheel, the design of Manifold's new wheel will fit your needs and your tastes.

How you choose to write your suggestions is up to you, but I strongly recommend you maximize the heft of your suggestion by avoiding advocacy that indicates basic misunderstanding. For example, somebody sending in a suggestion powered by comments such as "V8 has a random function, why doesn't Manifold?" is likely to influence not the engineering process but the documentation process, to ensure people are more obviously taught that V8 is part of Manifold so everything that is in V8 is a built in part of Manifold as well.

That's probably not a bad idea in any event, to ensure new users understand that one reason why Manifold has gone to the trouble of building in things like V8, C#, VBScript and so much more is exactly to ensure that such needed capabilities are built into Manifold itself.

Bottom line: if something is missing you actually use and need, send it in. Provide examples of how you will use it in real life. Ensure it is done exactly to your taste by discussing why the capabilities that already do that are not to your liking.

artlembo


2,870 post(s)
#16-Nov-17 19:51

In this case, I think you are correct, this was pretty easy to implement. Although, for the average user, being able to have a Rnd() function built in would be great. Do Dan's comments are valid. Pretty much any grown up software tool has a Rnd() function: Postgres, SQL Server, Excel, VB, VBA, VBScript, Python (numpy), MapInfo, Access, and the list goes on. I think it will simply cause more people to ask the question: why doesn't Manifold have a Rnd() function? I spent X hours trying to figure how to do this...

Also, I actually like your idea of having a MF .map with all my favorite functions in them. Better yet, it would be nice to have a library of functions somewhere on the Internet that could be called from anywhere.

Dimitri


4,271 post(s)
#17-Nov-17 12:17

Also, I actually like your idea of having a MF .map with all my favorite functions in them. Better yet, it would be nice to have a library of functions somewhere on the Internet that could be called from anywhere.

Yes, it would be cool to define a URL structure to call remote resources. That's been planned for some time. Even better would be to have those automatically linked in to new projects.

I wrote up and filed a suggestion yesterday asking for automatic loading of desired favorite data sources. I originally suggested this as being able to designate a path to a folder, call it the Archive folder, and any .map in that folder or further recursed within would be automatically linked into the Project under an Archive heading.

So, suppose you frequently used a variety of scripts, or a long list of TMS servers (like in the ArcGIS REST example project), or just an archive of MAP files with various standard country maps in them. You could put that standard set of .map projects you would like nested into in every project into your Archive folder and then whenever a new project got launched it would automatically have all that in it, conveniently tucked away under an Archive heading in the Project pane. Nested MAP data sources add zero size to projects so there would be no harm in adding such standard data sources, at least if you planned on continued use of your project on that same system (or on a similarly configured system, or one that could reach the same data stores through your organization's network).

There are other possible interfaces. For example, the Favorite Data Sources dialog could be expanded with a column of check boxes where if the box is checked that Favorite is automatically added to any new project. Or, there could be an option for either linking or, in the case of small stuff like standard scripts, to import it into a new project.

I don't expect any of that to appear any time soon but I think it is pretty good odds that sometime after the really hot movement of Future into 9 simmers down a bit there will be something taken out of that which might appear.

tjhb

7,503 post(s)
#16-Nov-17 20:33

There is also the question that Adam's code is not *strictly* thread-safe. I think it is is possible for two threads to hit the same tick, so form the same seed, then entropy is gone.

That means, to my mind, that there is a deficit to be made up, since Manifold has higher standards. (Naive answer: make the seed robustly dependent on the Manifold calling thread.)

I think it would be better for Manifold to deal with this internally--a RND function fully adapted to Manifold's own threading requirements, so that users don't have to know or care.

lionel109 post(s)
#16-Nov-17 23:33

https://www.postgresql.org/docs/9.5/static/functions-math.html

random() return value in the range 0.0 <=x< 1.0

http://rextester.com/l/postgresql_online_compiler

to test postgreSQL Query select random();

What about MAnifold and radian ? ( .. to test )

Dimitri


4,271 post(s)
#17-Nov-17 08:37

What about MAnifold and radian ?

Adam has provided an answer, with example, to the above in this post.

rk
207 post(s)
#17-Nov-17 14:55

I tested the V8 route.

// JavaScript

function square(a) {

 return Math.pow(a,2);

}

function random(dummy) {

 return Math.random()

}

function randomAB(dummy, a, b) {

 return Math.random()*(b-a)+a;

}

Math.random() has no seed parameter, so to force call to random() for each row I had to invent a dummy parameter.

-- SQL9

FUNCTION Square(x INT32INT32

AS SCRIPT [Math.js] ENTRY 'square';

FUNCTION Random(dummy INT64FLOAT64

AS SCRIPT [Math.js] ENTRY 'random';

FUNCTION RandomAB(dummy INT64, a FLOAT64, b FLOAT64FLOAT64

AS SCRIPT [Math.js] ENTRY 'randomAB';

select * from

 (

 SELECT 

 [Value] 

 

 Square([Value]as [square]

 

 Random([Value]as [random01]

 

 RandomAB([Value], 1, 9999) as [random10000]

 

 RandomAB(12345678, 1, 9999) as [Random Called Once] 

 FROM CALL ValueSequence(0, 10000, 1)

 )

where

 [random10000] < 10

;

Attachments:
mfd9_v8_random.map

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.