Cohen 2014A (†458) Cohen, Fred. Discussion board comment (21 July 2014).
- anonymous : The anonymity definition fails to indicate that anonymity implies the inability to attribute actions to actors. That is, it precludes attribution and therefore implies an inability to achieve a chain of custody, source integrity, transparency, use control, accountability, and reliability in the sense of reflection of reality. In my view, these are the fundamental properties of anonymity with regard to archives and records management and should be clearly stated as properties in the terminology database. (†648)
- big data : The definition of "big data" seems to refute its own utility as a term. In essence, it asserts that it is an advertising term used for ... nothing in particular. The notion of "big" is a relative one in any case, and calling data "big" really only asserts that the volume is large relative to something else that is not defined. For a perspective on this, I had a recent forensics case involving something on the order of 10 trillion records. Is that a "big data" case? If so, is a mere 1 trillion records also "big"? How about 10B? 100M? 1M? 100,000? 1,000? 2? In my view, the key to a definition is that it allows differentiation between at least two things - things that are included in the definition and things that are not. There may be a fuzzy boundary of course, but in this case, we cannot clearly identify anything that is in or out of the realm. So I think it should be removed as a definition, or at a minimum, be identified as a term that is not defined formally as of this time. ¶I should also note: "~ An approach for the analysis of large volumes of information with the intent of discovering and identifying new relationships in different datasets," - since when is intent relevant to differentiating technical things? And how can intent be determined? If I have the identical volumes using the identical techniques, but a different intent, is it then not big data? ¶"especially datasets that are too large and complex to manipulate or interrogate with standard methods or tools." - but all current "big data" is interrogated and manipulated with "standard methods and tools". I am pretty up to date on this technology, and it is more or less the same set of operations that is done at smaller scale. ¶If you are going to associate something useful with "big data", I think it is the ability to fuse together diverse datasets and do analysis across those datasets. For example, mapping has become different in kind as it scaled up. Today, we can map, say, diseases in marine mammals against temperature, rainfall, animal size, weight, and presence of various other conditions, and map that against pollution levels, fishing activities, vacation periods for people, and many other similar things. As a result, correlations can be asserted and tested far more quickly than they otherwise could have been. This is not because of a lack of standard methods, but rather because of the translation of many large databases into a common form so that they can be compared and measured against each other. Similarly, 20 years ago, I could simulate a few million sequences in a day, but since I was working against a space that was far larger, that represented only a very small portion of the total set of things that could occur. With the increased scales of computational capacity now available and the ability to rent large numbers of systems for short times relatively inexpensively, I can now do enough simulations to test designs against each other, whereas before, I couldn't even get a good test of a single design. That's what "big data" seems to really be about. (†649)
- pseudonym : Pseudonyms are not generally associated with the intent to falsify. That would be more like forgery or some such thing. A pseudonym is usually defined in terms of a name with a one-to-one mapping to some other name, where the pseudonym is used in place of the actual name to obfuscate the real name or represent it in a particular light. For example, actors taking on roles might be asserted as using pseudonyms (I play Julius Caesar and get stabbed to death). More commonly, we have William Shakespeare - a pseudonym for (nobody really seems to know), as opposed to "Billy the Kid" - a nickname for William Bonney, as opposed to firstname.lastname@example.org, which is one of my email addresses, as is email@example.com, and others - all different addresses for the same individual, but none asserting a different actual name behind them. In the legal arena, there are also things like trade names (a.k.a. DBAs - doing business as) - which are forms of pseudonyms. (†650)
- trust : In essence, you are calling trust the same as confidence. The notion of the associated basis is problematic, but the involved relationship is sensible and I have seen it used elsewhere to good effect: the extent to which you are willing to be harmed by another. The concept that it is based on a risk assessment is problematic as well. I would lean toward "the willingness to sustain a loss" - perhaps adding "at the hands of a particular party". As an aside, the use of an email I sent which includes, at the bottom, "-This is confidential to the parties I intend it to serve-" as a reference in what is intended to be an open database published widely might reasonably be considered a breach of trust (I was willing to sustain the loss of confidentiality at the hands of the recipients - and you are perhaps in the process of exercising that willingness without asking explicit permission). You should ask permission rather than assume it - and of course you haven't yet publicly distributed it... (†651)