AI Code Review Azure Devops

AI Code Review Azure Devops — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Spotify Live

    Spotify Live

    Spotify Live, formerly Spotify Greenroom, was a social audio app by Spotify, that allowed users to host or participate in live-audio virtual environments called "room" for conversations. Each room had a maximum capacity of 1000 people. The app was available on Android and iOS, competing with Twitter Spaces and Clubhouse in the social media segment. It was shut down on April 30, 2023. == History == In October 2020, Betty Labs released Locker Room exclusively on the iOS App Store. The app featured virtual audio chat rooms for sports enthusiasts. In late March 2021, Spotify acquired Betty Labs for $50 million and announced plans to rebrand the app with a broader focus on sports, music, and pop culture. On June 16, 2021, Spotify launched the app as Spotify Greenroom on Android (early access) and iOS, expanding its scope beyond just sports. At launch, Spotify introduced the Greenroom Creator Fund to support creators and shows, serving as a rival to Clubhouse's Creator First Accelerator Program. The fund aimed to provide a monetization path for podcasters integrating Greenroom into their verified Spotify accounts. By July 2021, the app had accumulated over 140,000 iOS installs and 100,000 Android installs. In August 2021, Spotify collaborated with the WWE to produce professional wrestling-related podcasts, many of which would be recorded by The Ringer, Spotify's in-house podcasting team, using Greenroom. In March 2022, Spotify Greenroom announced its rebranding as Spotify Live and its migration to the main Spotify app. After a year, Spotify announced it would shut down the Spotify Live app at the end of April 2023. == Features == Greenroom allowed users to create or join a room, which, in the context of the application, was a virtual space for real-time voice chats. Users could only create a room within a pre-defined group, representing either a brand or a generic category. If a user chose to create a room, they became the host, with the ability to invite people, control who could talk, and enable features like recording and the Discussions tab during room creation. Enabling recording displayed a disclaimer informing users that the conversation was being recorded, and the audio, recorded in mp4 format, would be sent to the host via email after the room concluded. If the Discussions tab was enabled, users could send text messages in the public chat section. The host also had the authority to ban users if necessary. When joining a room, a user could opt to be a listener or request to become a speaker. Users had the freedom to follow or block others and join groups at their discretion. Notifications about new rooms in joined groups would be sent to users. Additionally, users could discover new individuals and groups using the search tab. == Partnered creators == By October 2021, Spotify had a variety of partnered creators aimed at boosting traffic and validating its vertically integrated podcast model. These creators primarily focused on Generation Z. In-house Spotify talent, such as The Ringer, produced sports-related content. Simultaneously, the company recruited creators from various social channels to grow Greenroom's audience while also promoting its integration with Spotify and Anchor. Each verified Spotify partner had their Greenroom shows featured in both the Greenroom app and their profiles on the Spotify app. This was part of the company's strategy leading into the 2022 ramp-up to compete with Clubhouse. == Platforms == The app was accessible on both Android and iOS platforms, and users could download the app from their respective app stores. Android users needed Android 8 or above to launch the app, while iOS consumers required iOS 13 or later to run it.

    Read more →
  • Information

    Information

    Information is an abstract concept that refers to something which has the power to inform. At the most fundamental level, it pertains to the interpretation (perhaps formally) of that which may be sensed, or their abstractions. Any natural process that is not completely random and any observable pattern in any medium can be said to convey some amount of information. Whereas digital signals and other data use discrete signs to convey information, other phenomena and artifacts such as analogue signals, poems, pictures, music or other sounds, and currents convey information in a more continuous form. Information is not knowledge itself, but the meaning that may be derived from a representation through interpretation. The concept of information is relevant to and connected with various concepts, including constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, proposition, representation, and entropy. Information is often processed iteratively: Data available at one step are processed into information to be interpreted and processed at the next step. For example, in written text each symbol or letter conveys information relevant to the word it is part of, each word conveys information relevant to the phrase it is part of, each phrase conveys information relevant to the sentence it is part of, and so on until at the final step information is interpreted and becomes knowledge in a given domain. In a digital signal, bits may be interpreted into the symbols, letters, numbers, or structures that convey the information available at the next level up. The key characteristic of information is that it is subject to interpretation and processing. The derivation of information from a signal or message may be thought of as the resolution of ambiguity or uncertainty that arises during the interpretation of patterns within the signal or message. Information may be structured as data. Redundant data can be compressed up to an optimal size, which is the theoretical limit of compression. The information available through a collection of data may be derived by analysis. For example, a restaurant collects data from every customer order. That information may be analyzed to produce knowledge that is put to use when the business subsequently wants to identify the most popular or least popular dish. Information can be transmitted in time, via data storage, and space, via communication and telecommunication. Information is expressed either as the content of a message or through direct or indirect observation. That which is perceived can be construed as a message in its own right, and in that sense, all information is always conveyed as the content of a message. Information can be encoded into various forms for transmission and interpretation (for example, information may be encoded into a sequence of signs, or transmitted via a signal). It can also be encrypted for safe storage and communication. The uncertainty of an event is measured by its probability of occurrence. Uncertainty is proportional to the negative logarithm of the probability of occurrence. Information theory takes advantage of this by concluding that more uncertain events require more information to resolve their uncertainty. The bit is the standard unit of information. It is 'that which reduces uncertainty by half'. Other units such as the nat may be used. For example, the information encoded in one "fair" coin flip is log2(2/1) = 1 bit, and in two fair coin flips is log2(4/1) = 2 bits. A 2011 Science article estimates that 97% of technologically stored information was already in digital bits in 2007 and that the year 2002 was the beginning of the digital age for information storage (with digital storage capacity bypassing analogue for the first time). == Etymology and history of the concept == The English word "information" comes from Middle French enformacion/informacion/information 'a criminal investigation' and its etymon, Latin informatiō(n) 'conception, teaching, creation'. In English, "information" is an uncountable mass noun. References on "formation or molding of the mind or character, training, instruction, teaching" date from the 14th century in both English (according to Oxford English Dictionary) and other European languages. In the transition from Middle Ages to Modernity the use of the concept of information reflected a fundamental turn in epistemological basis – from "giving a (substantial) form to matter" to "communicating something to someone". Peters (1988, pp. 12–13) concludes: Information was readily deployed in empiricist psychology (though it played a less important role than other words such as impression or idea) because it seemed to describe the mechanics of sensation: objects in the world inform the senses. But sensation is entirely different from "form" – the one is sensual, the other intellectual; the one is subjective, the other objective. My sensation of things is fleeting, elusive, and idiosyncratic. For Hume, especially, sensory experience is a swirl of impressions cut off from any sure link to the real world... In any case, the empiricist problematic was how the mind is informed by sensations of the world. At first informed meant shaped by; later it came to mean received reports from. As its site of action drifted from cosmos to consciousness, the term's sense shifted from unities (Aristotle's forms) to units (of sensation). Information came less and less to refer to internal ordering or formation, since empiricism allowed for no preexisting intellectual forms outside of sensation itself. Instead, information came to refer to the fragmentary, fluctuating, haphazard stuff of sense. Information, like the early modern worldview in general, shifted from a divinely ordered cosmos to a system governed by the motion of corpuscles. Under the tutelage of empiricism, information gradually moved from structure to stuff, from form to substance, from intellectual order to sensory impulses. In the modern era, the most important influence on the concept of information is derived from the Information theory developed by Claude Shannon and others. This theory, however, reflects a fundamental contradiction. Northrup (1993) wrote: Thus, actually two conflicting metaphors are being used: The well-known metaphor of information as a quantity, like water in the water-pipe, is at work, but so is a second metaphor, that of information as a choice, a choice made by :an information provider, and a forced choice made by an :information receiver. Actually, the second metaphor implies that the information sent isn't necessarily equal to the information received, because any choice implies a comparison with a list of possibilities, i.e., a list of possible meanings. Here, meaning is involved, thus spoiling the idea of information as a pure "Ding an sich." Thus, much of the confusion regarding the concept of information seems to be related to the basic confusion of metaphors in Shannon's theory: is information an autonomous quantity, or is information always per SE information to an observer? Actually, I don't think that Shannon himself chose one of the two definitions. Logically speaking, his theory implied information as a subjective phenomenon. But this had so wide-ranging epistemological impacts that Shannon didn't seem to fully realize this logical fact. Consequently, he continued to use metaphors about information as if it were an objective substance. This is the basic, inherent contradiction in Shannon's information theory." (Northrup, 1993, p. 5). In their seminal book The Study of Information: Interdisciplinary Messages, Almach and Mansfield (1983) collected key views on the interdisciplinary controversy in computer science, artificial intelligence, library and information science, linguistics, psychology, and physics, as well as in the social sciences. Almach (1983, p. 660) himself disagrees with the use of the concept of information in the context of signal transmission, the basic senses of information in his view all referring "to telling something or to the something that is being told. Information is addressed to human minds and is received by human minds." All other senses, including its use with regard to nonhuman organisms as well to society as a whole, are, according to Machlup, metaphoric and, as in the case of cybernetics, anthropomorphic. Hjørland (2007) describes the fundamental difference between objective and subjective views of information and argues that the subjective view has been supported by, among others, Bateson, Yovits, Span-Hansen, Brier, Buckland, Goguen, and Hjørland. Hjørland provided the following example: A stone on a field could contain different information for different people (or from one situation to another). It is not possible for information systems to map all the stone's possible information for every individual. Nor is any one mapping the one "true" mapping. But peop

    Read more →
  • Enterprise Objects Framework

    Enterprise Objects Framework

    The Enterprise Objects Framework, or simply EOF, was introduced by NeXT in 1994 as a pioneering object-relational mapping product for its NeXTSTEP and OpenStep development platforms. EOF abstracts the process of interacting with a relational database by mapping database rows to Java or Objective-C objects. This largely relieves developers from writing low-level SQL code. EOF enjoyed some niche success in the mid-1990s among financial institutions who were attracted to the rapid application development advantages of NeXT's object-oriented platform. Since Apple Inc's merger with NeXT in 1996, EOF has evolved into a fully integrated part of WebObjects, an application server also originally from NeXT. Many of the core concepts of EOF re-emerged as part of Core Data, which further abstracts the underlying data formats to allow it to be based on non-SQL stores. == History == In the early 1990s NeXT Computer recognized that connecting to databases was essential to most businesses and yet also potentially complex. Every data source has a different data-access language (or API), driving up the costs to learn and use each vendor's product. The NeXT engineers wanted to apply the advantages of object-oriented programming, by getting objects to "talk" to relational databases. As the two technologies are very different, the solution was to create an abstraction layer, insulating developers from writing the low-level procedural code (SQL) specific to each data source. The first attempt came in 1992 with the release of Database Kit (DBKit), which wrapped an object-oriented framework around any database. Unfortunately, NEXTSTEP at the time was not powerful enough and DBKit had serious design flaws. NeXT's second attempt came in 1994 with the Enterprise Objects Framework (EOF) version 1, a complete rewrite that was far more modular and OpenStep compatible. EOF 1.0 was the first product released by NeXT using the Foundation Kit and introduced autoreleased objects to the developer community. The development team at the time was only four people: Jack Greenfield, Rich Williamson, Linus Upson and Dan Willhite. EOF 2.0, released in late 1995, further refined the architecture, introducing the editing context. At that point, the development team consisted of Dan Willhite, Craig Federighi, Eric Noyau and Charly Kleissner. EOF achieved a modest level of popularity in the financial programming community in the mid-1990s, but it would come into its own with the emergence of the World Wide Web and the concept of web applications. It was clear that EOF could help companies plug their legacy databases into the Web without any rewriting of that data. With the addition of frameworks to do state management, load balancing and dynamic HTML generation, NeXT was able to launch the first object-oriented Web application server, WebObjects, in 1996, with EOF at its core. In 2000, Apple Inc. (which had merged with NeXT) officially dropped EOF as a standalone product, meaning that developers would be unable to use it to create desktop applications for the forthcoming Mac OS X. It would, however, continue to be an integral part of a major new release of WebObjects. WebObjects 5, released in 2001, was significant for the fact that its frameworks had been ported from their native Objective-C programming language to the Java language. Critics of this change argue that most of the power of EOF was a side effect of its Objective-C roots, and that EOF lost the beauty or simplicity it once had. Third-party tools, such as EOGenerator, help fill the deficiencies introduced by Java (mainly due to the loss of categories). The Objective-C code base was re-introduced with some modifications to desktop application developers as Core Data, part of Apple's Cocoa API, with the release of Mac OS X Tiger in April 2005. == How EOF works == Enterprise Objects provides tools and frameworks for object-relational mapping. The technology specializes in providing mechanisms to retrieve data from various data sources, such as relational databases via JDBC and JNDI directories, and mechanisms to commit data back to those data sources. These mechanisms are designed in a layered, abstract approach that allows developers to think about data retrieval and commitment at a higher level than a specific data source or data source vendor. Central to this mapping is a model file (an "EOModel") that you build with a visual tool — either EOModeler, or the EOModeler plug-in to Xcode. The mapping works as follows: Database tables are mapped to classes. Database columns are mapped to class attributes. Database rows are mapped to objects (or class instances). You can build data models based on existing data sources or you can build data models from scratch, which you then use to create data structures (tables, columns, joins) in a data source. The result is that database records can be transposed into Java objects. The advantage of using data models is that applications are isolated from the idiosyncrasies of the data sources they access. This separation of an application's business logic from database logic allows developers to change the database an application accesses without needing to change the application. EOF provides a level of database transparency not seen in other tools and allows the same model to be used to access different vendor databases and even allows relationships across different vendor databases without changing source code. Its power comes from exposing the underlying data sources as managed graphs of persistent objects. In simple terms, this means that it organizes the application's model layer into a set of defined in-memory data objects. It then tracks changes to these objects and can reverse those changes on demand, such as when a user performs an undo command. Then, when it is time to save changes to the application's data, it archives the objects to the underlying data sources. === Using Inheritance === In designing Enterprise Objects developers can leverage the object-oriented feature known as inheritance. A Customer object and an Employee object, for example, might both inherit certain characteristics from a more generic Person object, such as name, address, and phone number. While this kind of thinking is inherent in object-oriented design, relational databases have no explicit support for inheritance. However, using Enterprise Objects, you can build data models that reflect object hierarchies. That is, you can design database tables to support inheritance by also designing enterprise objects that map to multiple tables or particular views of a database table. == Enterprise Objects (EOs) == An Enterprise Object is analogous to what is often known in object-oriented programming as a business object — a class which models a physical or conceptual object in the business domain (e.g. a customer, an order, an item, etc.). What makes an EO different from other objects is that its instance data maps to a data store. Typically, an enterprise object contains key-value pairs that represent a row in a relational database. The key is basically the column name, and the value is what was in that row in the database. So it can be said that an EO's properties persist beyond the life of any particular running application. More precisely, an Enterprise Object is an instance of a class that implements the com.webobjects.eocontrol.EOEnterpriseObject interface. An Enterprise Object has a corresponding model (called an EOModel) that defines the mapping between the class's object model and the database schema. However, an enterprise object doesn't explicitly know about its model. This level of abstraction means that database vendors can be switched without it affecting the developer's code. This gives Enterprise Objects a high degree of reusability. == EOF and Core Data == Despite their common origins, the two technologies diverged, with each technology retaining a subset of the features of the original Objective-C code base, while adding some new features. === Features Supported Only by EOF === EOF supports custom SQL; shared editing contexts; nested editing contexts; and pre-fetching and batch faulting of relationships, all features of the original Objective-C implementation not supported by Core Data. Core Data also does not provide the equivalent of an EOModelGroup—the NSManagedObjectModel class provides methods for merging models from existing models, and for retrieving merged models from bundles. === Features Supported Only by Core Data === Core Data supports fetched properties; multiple configurations within a managed object model; local stores; and store aggregation (the data for a given entity may be spread across multiple stores); customization and localization of property names and validation warnings; and the use of predicates for property validation. These features of the original Objective-C implementation are not supported by the Java implementation.

    Read more →
  • Document

    Document

    A document is a written, drawn, presented, or memorialized representation of thought, often the manifestation of non-fictional, as well as fictional, content. The etymology of the word "document" derives from the Latin documentum, which denotes a "teaching" or "lesson": the verb doceō denotes "to teach". Historically, the term "document" was usually used to indicate written proof useful as evidence of a truth or fact. In the Computer Age, the term "document" typically refers to a primarily textual computer file, encompassing its structural and format elements, such as fonts, colors, and images. In the contemporary era, the definition of "document" has expanded beyond its traditional medium, such as paper, to encompass electronic documents as well. History, events, examples, opinions, stories, and creativity can all be expressed in documents. "Documentation" is distinct because it has more denotations than "document". Documents are also distinguished from "realia", which are three-dimensional objects that would otherwise satisfy the definition of "document" because they memorialize or represent thought. Documents are usually considered to be two-dimensional representations. == Abstract definitions == The concept of "document" has been defined by Suzanne Briet as "any concrete or symbolic indication, preserved or recorded, for reconstructing or for proving a phenomenon, whether physical or mental." An often-cited article concludes that "the evolving notion of document" among Jonathan Priest, Paul Otlet, Briet, Walter Schürmeyer, and the other documentalists increasingly emphasized whatever functioned as a document rather than traditional physical forms of documents. The shift to digital technology would seem to make this distinction even more important. David M. Levy has said that an emphasis on the technology of digital documents has impeded our understanding of digital documents as documents. A conventional document, such as a mail message or a technical report, exists physically in digital technology as a string of bits, as does everything else in a digital environment. As an object of study, it has been made into a document. It has become physical evidence by those who study it. "Document" is defined in library and information science and documentation science as a fundamental, abstract idea: the word denotes everything that may be represented or memorialized to serve as evidence. The classic example provided by Briet is an antelope: "An antelope running wild on the plains of Africa should not be considered a document[;] she rules. But if it were to be captured, taken to a zoo and made an object of study, it has been made into a document. It has become physical evidence being used by those who study it. Indeed, scholarly articles written about the antelope are secondary documents, since the antelope itself is the primary document." This opinion has been interpreted as an early expression of actor–network theory. == Kinds == A document can be structured, like tabular documents, lists, forms, or scientific charts, semi-structured like a book or a newspaper article, or unstructured like a handwritten note. Documents are sometimes classified as secret, private, or public. They may also be described as drafts or proofs. When a document is copied, the source is denominated the "original". Documents are used in numerous fields, e.g.: Academia: manuscript, thesis, paper, journal, chart, and technical drawing Media: mock-up, script, image, photography, and newspaper article Administration, law, and politics: application, brief, certificate, commission, constitutional document, form, gazette, identity document, license, manifesto, summons, census, and white paper Business: invoice, request for proposal, proposal, contract, packing slip, manifest, report (detailed and summary), spreadsheet, material safety data sheet, waybill, bill of lading, financial statement, nondisclosure agreement (NDA), mutual nondisclosure agreement, and user guide Geography and planning: topographic map, cadastre, legend, and architectural plan Such standard documents can be drafted based on a template. == Drafting == The page layout of a document is how information is graphically arranged in the space of the document, e.g., on a page. If the appearance of the document is of concern, the page layout is generally the responsibility of a graphic designer. Typography concerns the design of letter and symbol forms and their physical arrangement in the document (see typesetting). Information design concerns the effective communication of information, especially in industrial documents and public signs. Simple textual documents may not require visual design and may be drafted only by an author, clerk, or transcriber. Forms may require a visual design for their initial fields, but not to complete the forms. == Media == Traditionally, the medium of a document was paper and the information was applied to it in ink, either by handwriting (to make a manuscript) or by a mechanical process (e.g., a printing press or laser printer). Today, some short documents also may consist of sheets of paper stapled together. Historically, documents were inscribed with ink on papyrus (starting in ancient Egypt) or parchment; scratched as runes or carved on stone using a sharp tool, e.g., the Tablets of Stone described in the Bible; stamped or incised in clay and then baked to make clay tablets, e.g., in the Sumerian and other Mesopotamian civilizations. The papyrus or parchment was often rolled into a scroll or cut into sheets and bound into a codex (book). Contemporary electronic means of memorializing and displaying documents include: Monitor of a desktop computer, laptop, tablet; optionally with a printer to produce a hard copy; Personal digital assistant; Dedicated e-book device; Electronic paper, typically, using the Portable Document Format (PDF); Information appliance; Digital audio player; and Radio and television service provider. Digital documents usually require a specific file format to be presentable in a specific medium. == In law == Documents in all forms frequently serve as material evidence in criminal and civil proceedings. The forensic analysis of such a document is within the scope of questioned document examination. To catalog and manage the large number of documents that may be produced during litigation, Bates numbering is often applied to all documents in the lawsuit so that each document has a unique, arbitrary, identification number.

    Read more →
  • Image tracing

    Image tracing

    In computer graphics, image tracing, raster-to-vector conversion or raster vectorization is the conversion of raster graphics into vector graphics. == Background == An image does not have any structure: it is just a collection of marks on paper, grains in film, or pixels in a bitmap. While such an image is useful, it has some limits. If the image is magnified enough, its artifacts appear. The halftone dots, film grains, and pixels become apparent. Images of sharp edges become fuzzy or jagged. See, for example, pixelation. Ideally, a vector image does not have the same problem. Edges and filled areas are represented as mathematical curves or gradients, and they can be magnified arbitrarily (though of course the final image must also be rasterized in to be rendered, and its quality depends on the quality of the rasterization algorithm for the given inputs). The task in vectorization is to convert a two-dimensional image into a two-dimensional vector representation of the image. It is not examining the image and attempting to recognize or extract a three-dimensional model that may be depicted; i.e. it is not a vision system. For most applications, vectorization also does not involve optical character recognition; characters are treated as lines, curves, or filled objects without attaching any significance to them. In vectorization, the shape of the character is preserved, so artistic embellishments remain. Vectorization is the inverse operation corresponding to rasterization, as integration is to differentiation. And, just as with these other operations, while rasterization is fairly straightforward and algorithmic, vectorization involves the reconstruction of lost information and therefore requires heuristic methods. Synthetic images such as maps, cartoons, logos, clip art, and technical drawings are suitable for vectorization. Those images could have been originally made as vector images because they are based on geometric shapes or drawn with simple curves. Continuous tone photographs (such as live portraits) are not good candidates for vectorization. The input to vectorization is an image, but an image may come in many forms such as a photograph, a drawing on paper, or one of several raster file formats. Programs that do raster-to-vector conversion may accept bitmap formats such as TIFF, BMP and PNG. The output is a vector file format. Common vector formats are SVG, DXF, EPS, EMF and AI. Vectorization can be used to update images or recover work. Personal computers often come with a simple paint program that produces a bitmap output file. These programs allow users to make simple illustrations by adding text, drawing outlines, and filling outlines with a specific color. Only the results of these operations (the pixels) are saved in the resulting bitmap; the drawing and filling operations are discarded. Vectorization can be used to recapture some of the information that was lost. Vectorization is also used to recover information that was originally in a vector format but has been lost or has become unavailable. A company may have commissioned a logo from a graphic arts firm. Although the graphics firm used a vector format, the client company may not have received a copy of that format. The company may then acquire a vector format by scanning and vectorizing a paper copy of the logo. == Process == Vectorization starts with an image. === Manual === The image can be vectorized manually. A person could look at the image, make some measurements, and then write the output file by hand. That was the case for the vectorization of a technical illustration about neutrinos. The illustration has a few geometric shapes and a lot of text; it was relatively easy to convert the shapes, and the SVG vector format allows the text (even subscripts and superscripts) to be entered easily. The original image did not have any curves (except for the text), so the conversion is straightforward. Curves make the conversion more complicated. Manual vectorization of complicated shapes can be facilitated by the tracing function built into some vector graphics editing programs. If the image is not yet in machine readable form, then it has to be scanned into a usable file format. Once there is a machine-readable bitmap, the image can be imported into a graphics editing program (such as Adobe Illustrator, CorelDRAW, or Inkscape). Then a person can manually trace the elements of the image using the program's editing features. Curves in the original image can be approximated with lines, arcs, and Bézier curves. An illustration program allows spline knots to be adjusted for a close fit. Manual vectorization is possible, but it can be tedious. Although graphics drawing programs have been around for a long time, artists may find the freehand drawing facilities awkward even when a drawing tablet is used. Instead of using a program, Pepper recommends making an initial sketch on paper. Instead of scanning the sketch and tracing it freehand in the computer, Pepper states: "Those proficient with a graphic tablet and stylus could make the following changes directly in CorelDRAW by using a scan of the sketch as an underlay and drawing over it. I prefer to use pen and ink, and a light table"; most of the final image was traced by hand in ink. Later the line-drawing image was scanned at 600 dpi, cleaned up in a paint program, and then automatically traced with a program. Once the black and white image was in the graphics program, some other elements were added and the figure was colored. Similarly, Ploch recreated a design from a digital photograph. The JPEG was imported and some "basic shapes" were traced by hand and colored in the graphics drawing program; more complex shapes were handled differently. Ploch used a bitmap editor to remove the background and crop the more complex image components. He then printed the image and traced it by hand onto tracing paper to get a clean black and white line drawing. That drawing was scanned and then vectorized with a program. === Automatic === Some programs automate the vectorization process. Example programs are Adobe Illustrator, Inkscape, Corel's PowerTRACE, and Potrace. Some of these programs have a command line interface while others are interactive that allow the user to adjust the conversion settings and view the result. Adobe Streamline is not only an interactive program, but it also allows a user to manually edit the input bitmap and the output curves. Corel's PowerTRACE is accessed through CorelDRAW; CorelDRAW can be used to modify the input bitmap and edit the output curves. Adobe Illustrator has a facility to trace individual curves. Automated programs can have mixed results. A program (PowerTRACE) was used to convert a PNG map to SVG. The program did a good job on the map boundaries (the most tedious task in the tracing) and the settings dropped out all the text (small objects). The text was manually re-inserted. Other conversions may not go as well. The results depend on having high-quality scans, reasonable settings, and good algorithms. Scanned images often have a lot of noise, which can require additional work to clean up. == Options == There are many different image styles and possibilities, and no single vectorization method works well on all images. Consequently, vectorization programs have many options that influence the result. One issue is what the predominant shapes are. If the image is of a fill-in form, then it will probably have just vertical and horizontal lines of a constant width. The program's vectorization should take that into account. On the other hand, a CAD drawing may have lines at any angle, there may be curved lines, and there may be several line weights (thick for objects and thin for dimension lines). Instead of (or in addition to) curves, the image may contain outlines filled with the same color. Adobe Streamline allows users to select a combination of line recognition (horizontal and vertical lines), centerline recognition, or outline recognition. Streamline also allows small outline shapes to be thrown out; the notion is such small shapes are noise. The user may set the noise level between 0 and 1000; an outline that has fewer pixels than that setting is discarded. Another issue is the number of colors in the image. Even images that were created as black on white drawings may end up with many shades of gray. Some line-drawing routines employ anti-aliasing; a pixel completely covered by the line will be black, but a pixel that is only partially covered will be gray. If the original image is on paper and is scanned, there is a similar result: edge pixels will be gray. Sometimes images are compressed (e.g., JPEG images), and the compression will introduce gray levels. Many of the vectorization programs will group same-color pixels into lines, curves, or outlined shapes. If each possible color is grouped into its object, there can be an enormous number of objects. Instead, the user is asked to s

    Read more →
  • Organizational metacognition

    Organizational metacognition

    Organizational metacognition is knowing what an organization knows, a concept related to metacognition, organizational learning, the learning organization and sensemaking. It is used to describe how organizations and teams develop an awareness of their own thinking, learning how to learn, where awareness of ignorance can motivate learning. The organizational deutero-learning concept identified by Argyris and Schon defines when organizations learn how to carry out single-loop and double-loop learning. It has also been described as learning how to learn through a process of collaborative inquiry and reflection (evaluative inquiry). "When an organization engages in deutero-learning its members learn about the previous context for learning. They reflect on and inquire into previous episodes of organizational learning, or failure to learn. They discover what they did that facilitated or inhibited learning, they invent new strategies for learning, they produce these strategies, and they evaluate and generalize what they have produced" Learning what facilitates and inhibits learning enables organizations to develop new strategies to develop their knowledge. For example, identification of a gap between perceived performance (such as satisfaction) and actual performance (outcomes) creates an awareness that makes the organization understand that learning needs to occur, driving appropriate changes to the environment and processes. == Learning prototypes == Wijnhoven (2001) grouped four learning prototypes that best meet learning needs, the match between these needs and learning norms dictating an organization's learning capabilities; deutero-learning is the acquisition of these capabilities. knowledge gap analysis classification of problems to select operationally required knowledge and skills coping with organizational tremors and jolts by anticipation, response and adjustments of behavioural repertoires decisional uncertainty measurement == Terminological ambiguities == Organizational metacognition and organizational deutero-learning have both been described as the concept or phenomenon where organizations learn how to learn. Argyris and Schon (1978) place deutero-learning into their cognitive theory of action framework, neglecting aspects of adaptive behaviour and context core to Bateson's (1972) original definitions. In order to resolve terminological ambiguities, Visser (2007) reviewed and reformulated the concept of deutero-learning as, "the behavioral adaptation to patterns of conditioning in relationships in organizational contexts, distinguishing it from meta-learning and planned learning" (pg. 659). == Significance == Organizational metacognition is considered a key norm to the prescriptive concept of the learning organization. Its significance has been recognized by industry, the military and in disaster response. == Examples in practice == Examples of poor metacognition (deutero-learning) have been described in knowledge network environments, "Knowledge networking is important to most competitive enterprises today. Enterprise knowledge is becoming ever more specialized in nature, so no single person or organization can know everything in detail. Hence addressing complex, multidisciplinary problems requires developing and accessing a network of knowledgeable people and organizations. The problem is, many otherwise knowledgeable people and organizations are not fully aware of their knowledge networks, and even more problematic, they are not aware that they are not aware. This focuses our attention toward organizational metacognition."

    Read more →
  • Subject (documents)

    Subject (documents)

    In library and information science documents (such as books, articles and pictures) are classified and searched by subject – as well as by other attributes such as author, genre and document type. This makes "subject" a fundamental term in this field. Library and information specialists assign subject labels to documents to make them findable. There are many ways to do this and in general there is not always consensus about which subject should be assigned to a given document. To optimize subject indexing and searching, we need to have a deeper understanding of what a subject is. The question: "what is to be understood by the statement 'document A belongs to subject category X'?" has been debated in the field for more than 100 years (see below) == Theoretical view == === Charles Ammi Cutter (1837–1903) === For Cutter the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred [...] to those intellections [...] that had received a name that itself represented a distinct consensus in usage" (Miksa, 1983a, p. 60) and: the "systematic structure of established subjects" is "resident in the public realm" (Miksa, 1983a, p. 69); "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge (Miksa, 1983a, p. 61). Bernd Frohmann adds: "The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts. Since for Cutter, mind, society, and SKO [Systems of Knowledge Organization] stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The Dewey Decimal Classification (DDC), by contrast, severs those connections. Melvil Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a poorly semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic... .... The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" (an approach to subjects having much in common with Cutter's) and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation". (Frohmann, 1994, 112–113). Cutter's early view on what a subject is, is probably wiser than most understandings that dominated the 20th century – and also the understanding reflected in the ISO-standard quoted below. The early statements quoted by Frohmann indicate that subjects are somehow shaped in social processes. When that is said, it should be added that they are not particularly detailed or clear. We only get a vague idea of the social nature of subjects. === S. R. Ranganathan (1892–1972) === A classification system with an explicit theoretical foundation is Ranganathan's Colon Classification. Ranganathan provided an explicit definition of the concept of "subject": Subject – an organized body of ideas, whose extension and intension are likely to fall coherently within the field of interests and comfortably within the intellectual competence and the field of inevitable specialization of a normal person. A related definition is given by one of Ranganathan's students: A subject is an organized and systematized body of ideas. It may consist of one idea or a combination of several... Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. The colon system is based on the combination of single elements from facets to subject designation. This is the reason why the combined nature of subjects are emphasized so strongly. It leads, however, to absurdities such as the claim that gold cannot be a subject (but is alternatively termed "an isolate"). This aspect of the theory has been criticized by Metcalfe (1973, p. 318). Metcalfe's skepticism regarding Ranganathan's theory is formulated in hard words (op. cit., p. 317): "This pseudo-science imposed itself on British disciples from about 1950 on...". It seems unacceptable that Ranganathan defines the word subject in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether or not subjects are combined or not should be examined once their definition has been given, it should not determined a priori, in the definition. Besides the emphasis on the combined, organizing and systematizing nature of subjects contains Ranganathan's definition of subject the pragmatic demand, that a subject should be determined in a way that suits a normal person's competency or specialization. Again we see a strange kind of wishful thinking mixing a general understanding of a concept with demands put by his own specific system. One thing is what the word subject means, quite another issue is how to provide subject descriptions that fulfill demands such as the specificity of a given information retrieval language which fulfill demands put on the system, such as precision and recall. If researchers too often define terms in ways that favor specific kinds of systems, that are such definitions not useful to provide more general theories about subjects, subject analysis and IR. Among other things are comparative studies of different kinds of systems made difficult. Based on these arguments, as well as additional arguments which have been used in the literature, we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO-standard for topic maps, may Ranganathan's definition be useful within his own closed system. The purpose of a scientific and scholarly field is, however, to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For such purpose is another understanding of "subject" necessary. === Patrick Wilson (1927–2003) === In his book Wilson (1968) examined – in particular by thought experiments – the suitability of different methods of examining the subject of a document. The methods were: identifying the author's purpose for writing the document, weighing the relative dominance and subordination of different elements in the picture, which the reading imposes on the reader, grouping or count the document's use of concepts and references, construing a set of rules for selecting elements deemed necessary (as opposed to unnecessary) for the work as a whole. Patrick Wilson shows convincingly that each of these methods are insufficient to determine the subject of a document and is led to conclude ( p. 89): "The notion of the subject of a writing is indeterminate..." or, on p. 92 (about what users may expect to find using a particular position in a library classification system): "For nothing definite can be expected of the things found at any given position". In connection to the last quote has Wilson an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways ("hostility" is used as an example). Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. Based on this argumentation is Wilson led to conclude: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness". Wilson's concept of subject was discussed by Hjørland (1992) who found that it is problematic to give up the precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position which Hjørland found unacceptable and unnecessary. Concerning the authors' use of ambiguous terms, the role of the subject analysis is to determine which documents would be fruitful for users to identify whether or not the documents use one or another term or whether a given term i

    Read more →
  • Voiceverse NFT plagiarism scandal

    Voiceverse NFT plagiarism scandal

    In January 2022, 15—the pseudonymous Massachusetts Institute of Technology (MIT) artificial intelligence researcher and creator of the non-commercial generative artificial intelligence voice synthesis research project 15.ai—discovered that the blockchain-based technology company Voiceverse had plagiarized from their platform. Voiceverse marketed itself as a service that offered AI voice cloning technology that could be purchased and traded as non-fungible tokens (NFTs). Amid heightened controversy over NFTs in the gaming industry, voice actor Troy Baker (who has been described as one of the most famous voice actors in video games) announced his partnership with Voiceverse on January 14, 2022, triggering immediate backlash over concerns about the environmental impact of NFTs, potential for fraud, predatory monetization in video games, and the potential of AI displacing jobs for human voice actors. Later that same day, 15 revealed through server logs that Voiceverse had generated voice lines using 15's free text-to-speech platform, pitch-shifted the audio to make them unrecognizable, and falsely marketed the samples as their own technology before selling them as NFTs. Within an hour of being confronted with evidence, Voiceverse confessed and stated that their marketing team had used 15.ai without proper attribution while rushing to create a technology demo to coincide with Baker's partnership announcement, further exacerbating the already negative reception to the original announcement. In response, 15 replied "Go fuck yourself"; the interaction went viral and garnered a large amount of support for the developer. News publications universally characterized this incident as Voiceverse having "stolen" from 15.ai. The next day, Baker appeared on a podcast and stated that his motivation had been to help independent creators who were unable to afford professional voice actors. Following continued backlash and the plagiarism revelation, Baker ended his partnership with Voiceverse on January 31, 2022. Subsequently, the incident was documented in multiple AI ethics databases, criticisms of predatory monetization in video games, and retrospectives as one of the earliest instances of plagiarism and theft stemming from artificial intelligence during the AI boom. == Background == === Troy Baker === Troy Baker is a prominent voice actor in the video game industry best known for his performances as Joel Miller in The Last of Us franchise. Baker has been described as "ubiquitous" by Polygon, "one of the most high-profile and prolific voice actors in video games" by Eurogamer, and "arguably the most famous voice actor in the gaming industry" by GameGuru. His other prominent roles include voicing Agent John "Jonesy" Jones in Fortnite, Booker DeWitt in BioShock Infinite, and both Batman and Joker in multiple Batman video games. As of October 2025, Baker holds the record for the most acting nominations at the BAFTA Games Awards, with five between 2013 and 2021. === Voiceverse === Voiceverse is a blockchain-based startup founded by the Bored Ape Yacht Club that marketed itself as offering AI voice cloning technology in the form of NFTs. Prior to the announcement of their partnership with Baker, Voiceverse had partnered with LOVO, Inc., an AI voice platform that, according to LOVO, could generate human-like voices. Voiceverse stated that any user who purchases a voice NFT would have unlimited and perpetual access to the voice model, which could be used to create content such as audiobooks, YouTube videos, podcasts, e-learning materials, in-game voice chat, and Zoom calls. Voiceverse promised that buyers would "OWN [sic] all of the IP" of content they created using these voices. Voiceverse's roadmap included plans to release 8,888 initial voice NFTs, a feature to add emotions to existing voices, and the ability for users to mint their own voices as NFTs. Prior to Baker's partnership, Voiceverse had also partnered with voice actors Charlet Chung, who voices D.Va in Overwatch, and Andy Milonakis of The Andy Milonakis Show. === 15.ai === 15.ai is a free web application launched in 2020 that uses artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Created by a pseudonymous artificial intelligence researcher known as 15, who began developing the technology as a freshman during their undergraduate research at MIT, it was an early example of an application of generative artificial intelligence during the initial stages of the AI boom. The platform showed that deep neural networks could generate emotionally expressive speech with only 15 seconds of speech; the name "15.ai" references the creator's statement that a voice can be convincingly cloned with just 15 seconds of audio, as opposed to the tens of hours of data previously required. 15.ai became an Internet phenomenon in early 2021 when content utilizing it went viral on social media and quickly gained widespread use among various Internet fandoms. 15 has emphasized that it remain free and non-commercial; it only requires users to give proper credit when using the service for content creation. === NFTs in the video game industry === By early 2022, NFTs had become highly controversial within the gaming industry. Critics raised concerns about their environmental impact due to the significant energy consumption of blockchain technology. In addition, the prevalence of scams, fraud, and potential money laundering associated with NFT sales, as well as fears that NFTs were a new form of predatory monetization following the increasing frequency of loot boxes, caused vocal pushback from the gaming community. Several major gaming companies had begun exploring NFT integration into their products, though fan backlash had already forced some projects to be cancelled. On December 16, 2021, the developers of S.T.A.L.K.E.R. 2: Heart of Chernobyl announced that they would be including NFTs in the game, but cancelled within an hour of the announcement due to immediate universal backlash. Simultaneously, the rise of AI voice technology raised concerns among voice actors about potential job displacement and the devaluation of their work amidst the voice acting industry's ongoing struggles for better compensation and working conditions. == Partnership announcement and backlash == On January 14, 2022, 1:02 a.m. EST, Baker announced on Twitter that he was partnering with Voiceverse "to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create." The announcement concluded with the statement "You can hate. Or you can create." Baker's specific role with Voiceverse remained unclear at the time of the announcement. Along with Baker's announcement, Voiceverse promoted their supposed voice AI technology on Twitter by posting animated videos that featured a cat character created by NFT firm Chubbiverse. The videos concluded with text that read "The Voice Powered By Voiceverse"; Voiceverse stated on Twitter that the voices in the animations had been generated using their own AI voice synthesis technology and presented the videos as a technology demonstration of their voice NFT capabilities. The announcement provoked immediate and widespread backlash from the gaming community. Baker's tweet received thousands of replies and quote retweets (the vast majority of which were negative), far more than the number of likes; Michael McWhertor of Polygon described it as a "textbook example of being ratioed" and commented that reactions had been amplified by the final part of Baker's announcement. Michael Beckwith of Metro called Baker's approach "bizarrely aggressive". Later that day, Baker responded to the backlash by apologizing for his choice of words. He said he appreciated people's thoughts and acknowledged that the "hate/create part might have been a bit antagonistic," calling it a "bad attempt to bring levity". Despite the apology, Baker and his fellow voice actors did not distance themselves from Voiceverse at this point. At the same time, Voiceverse attempted to address the criticisms, stating that they were working to move to more environmentally friendly blockchain technology and that voice actors would receive royalties from NFT sales, with actors benefiting from any increase in NFT value. == Plagiarism revelation == On December 13, 2021, amidst the increasingly negative reactions toward NFTs among the general public, the creator of 15.ai (known pseudonymously as 15) announced that they had "no interest in incorporating NFTs into any aspect of [their] work." On January 14, 2022, 11:17 a.m. EST (10 hours after Baker's initial announcement), 15 commented on the Voiceverse venture, stating that it "sounds like a scam". Two hours later, at 1:20 p.m., 15 explicitly accused Voiceverse of "actively attempting to appropriate [15's] work for [Voiceverse's] own benefit." 15 provided evidence through

    Read more →
  • Mobile cloud storage

    Mobile cloud storage

    Mobile cloud storage is a form of cloud storage that is accessible on mobile devices such as laptops, tablets, and smartphones. Mobile cloud storage providers offer services that allow the user to create and organize files, folders, music, and photos, similar to other cloud computing models. Services are used by both individuals and companies. Most cloud file storage providers offer limited free use but charge for additional storage once the free limit is exceeded. These costs are usually charged as a monthly subscription rate and have different rates depending on the amount of storage desired. In 2018, cloud services revenue was about $182.4 billion and in 2022 it is projected to grow to $331.2 billion. The cloud storage industry was projected to grow 17.2 percent in 2019 (Costello, 2019). == History == The concept of cloud computing trace back to 1960s, when the groundwork for modern internet and network technologies was being laid (Human for humans, 2024). One of the pivotal figures in this early period was J.C.R. Licklider, a visionary computer scientist who worked on ARPANET, the precursor to the internet. Licklider's ideas set the stage for the development of distributed computing systems, which are fundamental to cloud computing. Moving into the 1990s, AT&T introduced PersonaLink Services, a more advanced online platform offering electronic mail and online storage. Major turning point in 2006 The launch of Amazon Web Services (AWS) in 2006 marked a major turning point. AWS introduced Amazon S3 (Simple Storage Service), which allowed businesses and developers to store and retrieve any amount of data, at any time, from anywhere on the web. This development was revolutionary, providing scalable, reliable, and low-cost data storage infrastructure that transformed how organizations managed their data. == Applications == Some mobile device manufacturers include mobile cloud storage apps with their product. These apps facilitate synchronization of user files across multiple platforms. Part of the process for setting up new mobile devices frequently includes configuring a cloud storage service to Backup the device's files and information. Apple iOS devices come pre-loaded and configured to use Apple's mobile cloud storage service iCloud. Google offers a similar feature with the Android operating system by backing up the device using a Google Drive account. The Samsung Galaxy smartphone has partnered with Dropbox, while Microsoft similarly offers Microsoft OneDrive. Some mobile cloud storage apps are platform-independent. For example, Nasuni's Mobile Access app is available on any Android or iOS device. Most companies offering Cloud Storage have secure website to access files allowing use on any device that can browse the Internet.

    Read more →
  • Controlled vocabulary

    Controlled vocabulary

    A controlled vocabulary provides a way to organize knowledge for subsequent retrieval. Controlled vocabularies are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction. == In library and information science == In library and information science, controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of homographs, synonyms and polysemes by a bijection between concepts and preferred terms. In short, controlled vocabularies reduce unwanted ambiguity inherent in normal human languages where the same concept can be given different names and ensure consistency. For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), preferred terms—subject headings in this case—have to be chosen to handle choices between variant spellings of the same word (American versus British), choice among scientific and popular terms (cockroach versus Periplaneta americana), and choices between synonyms (automobile versus car), among other difficult issues. Choices of preferred terms are based on the principles of user warrant (what terms users are likely to use), literary warrant (what terms are generally used in the literature and documents), and structural warrant (terms chosen by considering the structure, scope of the controlled vocabulary). Controlled vocabularies also typically handle the problem of homographs with qualifiers. For example, the term pool has to be qualified to refer to either swimming pool or the game pool to ensure that each preferred term or heading refers to only one concept. === Types used in libraries === There are two main kinds of controlled vocabulary tools used in libraries: subject headings and thesauri. While the differences between the two are diminishing, there are still some minor differences: Historically, subject headings were designed to describe books in library catalogs by catalogers while thesauri were used by indexers to apply index terms to documents and articles. Subject headings tend to be broader in scope describing whole books, while thesauri tend to be more specialized covering very specific disciplines. Because of the card catalog system, subject headings tend to have terms that are in indirect order (though with the rise of automated systems this is being removed), while thesaurus terms are always in direct order. Subject headings tend to use more pre-coordination of terms such that the designer of the controlled vocabulary will combine various concepts together to form one preferred subject heading. (e.g., children and terrorism) while thesauri tend to use singular direct terms. Thesauri list not only equivalent terms but also narrower, broader terms and related terms among various preferred and non-preferred (but potentially synonymous) terms, while historically most subject headings did not. For example, the Library of Congress Subject Heading itself did not have much syndetic structure until 1943, and it was not until 1985 when it began to adopt the thesauri type term "Broader term" and "Narrow term". The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Well known subject heading systems include the Library of Congress system, Medical Subject Headings (MeSH) created by the United States National Library of Medicine, and Sears. Well known thesauri include the Art and Architecture Thesaurus and the ERIC Thesaurus. When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter consistency and stability of the language. Lastly the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms/phrases) employed as tags, to aid in the content identification process of documents, or other information system entities (e.g. DBMS, Web Services) qualifies as metadata. == Indexing languages == There are three main types of indexing languages. Controlled indexing language – only approved terms can be used by the indexer to describe the document Natural language indexing language – any term from the document in question can be used to describe the document Free indexing language – any term (not only from the document) can be used to describe the document When indexing a document, the indexer also has to choose the level of indexing exhaustivity, the level of detail in which the document is described. For example, using low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general the higher the indexing exhaustivity, the more terms indexed for each document. In recent years free text search as a means of access to documents has become popular. This involves using natural language indexing with an indexing exhaustively set to maximum (every word in the text is indexed). These methods have been compared in some studies, such as the 2007 article, "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search". === Advantages === Controlled vocabularies are often claimed to improve the accuracy of free text searching, such as to reduce irrelevant items in the retrieval list. These irrelevant items (false positives) are often caused by the inherent ambiguity of natural language. Take the English word football for example. Football is the name given to a number of different team sports. Worldwide the most popular of these team sports is association football, which also happens to be called soccer in several countries. The word football is also applied to rugby football (rugby union and rugby league), American football, Australian rules football, Gaelic football, and Canadian football. A search for football therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by tagging the documents in such a way that the ambiguities are eliminated. Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic). In some cases controlled vocabulary can enhance recall as well, because unlike natural language schemes, once the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term. === Disadvantages === A controlled vocabulary search may lead to unsatisfactory recall, in that it will fail to retrieve some documents that are actually relevant to the search question. This is particularly problematic when the search question involves terms that are sufficiently tangential to the subject area such that the indexer might have decided to tag it using a different term (but the searcher might consider the same). Essentially, this can be avoided only by an experienced user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer. Another possibility is that the article is just not tagged by the indexer because indexing exhaustivity is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main focus. But it turns out that for the searcher that article is relevant and hence recall fails. A free text search would automatically pick up that article regardless. On the other hand, free text searches have high exhaustivity (every word is searched) so although it has much lower precision, it has potential for high recall as long as the searcher overcome the problem of synonyms by entering every combination. Controlled vocabularies may become outdated rapidly in fast developing fields of knowledge, unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of the text itself. Indexers trying to choose the appropriate index terms might misinterpret the author, while this precise problem is not a factor in a free text, as it uses the author's own words. The use of controlled vocabularies can be costly compared to free

    Read more →
  • Bibliographic database

    Bibliographic database

    A bibliographic database is a database of bibliographic records. This is an organised online collection of references to published written works like journal and newspaper articles, conference proceedings, reports, government and legal publications, patents and books. In contrast to library catalogue entries, a majority of the records in bibliographic databases describe articles and conference papers rather than complete monographs, and they generally contain very rich subject descriptions in the form of keywords, subject classification terms, or abstracts. A bibliographic database may cover a wide range of topics or one academic field like computer science. A significant number of bibliographic databases are marketed under a trade name by licensing agreement from vendors, or directly from their makers: the indexing and abstracting services. Many bibliographic databases have evolved into digital libraries, providing the full text of the organised contents:for instance CORE also organises and mirrors scholarly articles and OurResearch develops a search engine for open access content in Unpaywall. Others merge with non-bibliographic and scholarly databases to create more complete disciplinary search engine systems, such as Chemical Abstracts or Entrez. == History == Prior to the mid-20th century, individuals searching for published literature had to rely on printed bibliographic indexes, generated manually from index cards. During the early 1960s computers were used to digitize text for the first time; the purpose was to reduce the cost and time required to publish two American abstracting journals, the Index Medicus of the National Library of Medicine and the Scientific and Technical Aerospace Reports of the National Aeronautics and Space Administration (NASA). By the late 1960s, such bodies of digitized alphanumeric information, known as bibliographic and numeric databases, constituted a new type of information resource. Online interactive retrieval became commercially viable in the early 1970s over private telecommunications networks. The first services offered a few databases of indexes and abstracts of scholarly literature. These databases contained bibliographic descriptions of journal articles that were searchable by keywords in author and title, and sometimes by journal name or subject heading. The user interfaces were crude, the access was expensive, and searching was done by librarians on behalf of "end users".

    Read more →
  • HAKMEM

    HAKMEM

    HAKMEM, alternatively known as AI Memo 239, is a February 1972 "memo" (technical report) of the MIT AI Lab containing a wide variety of hacks, including useful and clever algorithms for mathematical computation, some number theory and schematic diagrams for hardware – in Guy L. Steele's words, "a bizarre and eclectic potpourri of technical trivia". Contributors included about two dozen members and associates of the AI Lab. The title of the report is short for "hacks memo", abbreviated to six upper case characters that would fit in a single PDP-10 machine word (using a six-bit character set). == History == HAKMEM is notable as an early compendium of algorithmic technique, particularly for its practical bent, and as an illustration of the wide-ranging interests of AI Lab people of the time, which included almost anything other than AI research. HAKMEM contains original work in some fields, notably continued fractions. == Introduction == Compiled with the hope that a record of the random things people do around here can save some duplication of effort -- except for fun. Here is some little known data which may be of interest to computer hackers. The items and examples are so sketchy that to decipher them may require more sincerity and curiosity than a non-hacker can muster. Doubtless, little of this is new, but nowadays it's hard to tell. So we must be content to give you an insight, or save you some cycles, and to welcome further contributions of items, new or used.

    Read more →
  • ShowScoop

    ShowScoop

    ShowScoop is a website and mobile app platform on which users can rate and review artists, concerts, and music festivals that they have seen/attended. The reviews and ratings are designed to be informative of how well such performances are live. This helps concert-goers decide which live music events they want to attend. == History == ShowScoop was founded in August 2012 by Micah Smurthwaite and is based out of San Diego, CA. In February 2013, ShowScoop launched its mobile app at the SF Music Tech Summit. The application is currently available on the iPhone, with plans to expand into the Android market in the future. == Services == ShowScoop uses crowdsourcing to provide accurate ratings of live concert experiences. In addition to viewing ratings, users are encouraged to rate and review concerts they have attended. The ShowScoop database includes nearly one million artists and over 2.5 million live music events. ShowScoop users can rate artists on four aspects of the performance: stage presence, crowd interaction, sound quality, and visual effects. The rating system uses an ascending scale from one to five in each of the aspects, with five being the highest score. In addition to the quantitative ratings, ShowScoop users are also free to write qualitative reviews in a provided comment section. This allows users to explain their ratings and add further insight or opinion. ShowScoop incorporates several facets of social media into its services. Users can create a user profile to share limited personal information and store their ratings and reviews. Users are also given the option of sharing their evaluations with their social networks on Facebook and Twitter. Users can "like" reviews, follow artists, and follow other ShowScoop users. The mobile app allows users to take photos, apply filters, and share the final image in conjunction with reviews and through Instagram. == Road Crew == ShowScoop's "Road Crew" is a group made up of top contributors within the ShowScoop community. The Road Crew assists in curating artist pages, assuring information quality and accuracy. In return, members of the Road Crew are given incentives, including free tickets to concerts and personal invitations to exclusive shows. Applicants to the Road Crew are judged on the number and quality of their reviews, the photos and videos they have posted, and their general engagement with the ShowScoop community in following and liking users and reviews.

    Read more →
  • Cancer Likelihood in Plasma

    Cancer Likelihood in Plasma

    Cancer Likelihood in Plasma (CLiP) refers to a set of ensemble learning methods for integrating various genomic features useful for the noninvasive detection of early cancers from blood plasma. An application of this technique for early detection of lung cancer (Lung-CLiP) was originally described by Chabon et al. (2020) from the labs of Ash Alizadeh and Max Diehn at Stanford. This method relies on several improvements to cancer personalized profiling by deep sequencing (CAPP-Seq) for analysis of circulating tumor DNA (ctDNA). The CLiP technique integrates multiple distinctive genomic features of a cancer of interest findings within a machine-learning framework for cancer detection. For example, studies have shown that the majority of somatic mutations found in cell-free DNA (cfDNA) are not tumor derived, but instead reflect clonal hematopoeisis (also known as CHIP). Even though CHIP tends to target specific genes, it also involves many generally non-recurrent mutations that can be shed from leukocytes and detected in cfDNA, regardless of whether profiling patients with cancer and healthy adults. However, genuine tumor derived ctDNA mutations can be distinguished from CHIP-derived mutations. This is because unlike tumor-derived mutations, CHIP-derived mutations that are shed from leukocytes into plasma tend to occur on longer cfDNA fragments, and to lack specific mutational signatures such as those associated with tobacco smoking in lung cancer that are also found in tumor derived ctDNA molecules. CLiP integrates these features within hierarchical ensemble machine learning models that consider somatic mutations and copy number alternations, among other features. While the CLiP method is unique in relying exclusively on mutations and copy number alterations, it is related to a variety of other liquid biopsy methods being commercially developed for early cancer detection using ctDNA and proteins (e.g., CancerSEEK / DETECT-A ), cfDNA fragmentation patterns (e.g., DELFI), and DNA methylation (e.g., cfMeDIP-Seq, Grail). While the CLiP method has not yet been broadly applied for population-based cancer screening, it has been shown to distinguish discriminate early-stage lung cancers from risk-matched controls across multiple cohorts of patients enrolled across the US.

    Read more →
  • Information and media literacy

    Information and media literacy

    Information and media literacy (IML) is a combination of information literacy and media literacy. It enables people to show and make informed judgments as users of information and media, as well as to become skillful creators and producers of information and media messages. The transformative nature of IML includes creative works and creating new knowledge; to publish and collaborate responsibly requires ethical, cultural and social understanding. IML is also known as media and information literacy (MIL). UNESCO first adopted the term MIL in 2008 as a "composite concept" combining the competencies of information literacy and media literacy. UNESCO emphasizes the importance of global education in media and information literacy, and in 2013 defined Media and Information Literacy (MIL) as the ability to access, evaluate, use, and create information and media content in critical and ethical ways. Prior to the 1990s, the primary focus of information literacy was research skills. Media literacy, a study that emerged around the 1970s, traditionally focuses on the analysis and the delivery of information through various forms of media. Information literacy, as a skill proposed as early as 1974, centers on an individual's ability to recognize information needs and effectively locate, evaluate, and use information. These days, the study of information literacy has been extended to include the study of media literacy in many countries like the UK, Australia and New Zealand. It is also referred to as information and communication technologies (ICT) in the United States. Educators such as Gregory Ulmer have also defined the field as electracy.Media literacy is the ability to actively inquire into and think critically about information. It includes the ability to understand, evaluate, and create media content, and is an essential skill in today's information society. Livingstone, Van Couvering, and Thumim (2008) described the distinction between media literacy and information literacy: "Media literacy views media as lenses or windows for observing the world and expressing the self, whereas information literacy sees information as a tool for taking action in the world." == Integration of media and information literacy == Historically, the fields of information and media literacy have been separate, but over the course of the 21st century there have been calls to integrate both fields. Most definitions of information and media literacy include not only the abilities to locate, access, and analyze information but also the ability to create information. Only by integrating media literacy with information literacy can students better understand the sources of information and how it is used. Media education has primarily taken place in educational institutions, while information education has primarily occurred in libraries. Discussions surrounding the overlap of information literacy and media literacy came to fruition in the mid-to-late 2000s and 2010s as noted by Marcus Leaning. == In the digital age == The definition of literacy is "the ability to read and write". In practice many more skills are needed to locate, critically assess and make effective use of information. By extension, literacy now also includes the ability to manage and interact with digital information and media, in personal, shared and public domains. Historically, "information literacy" has largely been seen from the relatively top-down, organisational viewpoint of library and information sciences. However the same term is also used to describe a generic "information literacy" skill. The modern digital age has led to the proliferation of information spread across the Internet. Individuals must be able to recognize whether information is true or false and better yet know how to locate, evaluate, use, and communicate information in various formats; this is called information literacy. Towards the end of the 20th century, literacy was redefined to include "new literacies" relating to the new skills needed in everyday experience. "Multiliteracies" recognised the multiplicity of literacies, which were often used in combination. "21st century skills" frameworks link new literacies to wider life skills such as creativity, critical thinking, accountability. What these approaches have in common is a focus on the multiple skills needed by individuals to navigate changing personal, professional and public "information landscapes". As the conventional definition of literacy itself continues to evolve among practitioners, so too has the definition of information literacies. Noteworthy definitions include: Zurkowski defined information literacy as "the ability to find known or knowable content on any subject." CILIP, the Chartered Institute of Library and Information Practitioners, defines information literacy as "the ability to think critically and make balanced judgements about any information we find and use". In the United States, the definition proposed by the Association of College and Research Libraries (ACRL) is the most widely recognized. It defines information literacy as "a set of abilities requiring individuals to recognize when information is needed and to locate, evaluate, and use the needed information effectively." JISC, the Joint Information Systems Committee, refers to information literacy as one of six "digital capabilities", seen as an interconnected group of elements centered on "ICT literacy". Mozilla groups digital and other literacies as "21st century skills", a "broad set of knowledge, skills, habits and traits that are important to succeed in today's world". UNESCO, the United Nations Educational, Scientific and Cultural Organization, recognizing the necessity of teaching and learning both traditional and new types of information, the global importance of education was emphasized in 2008 through the "Teacher Media and Information Literacy (MIL) Curriculum". It defines MIL as a set of competencies that enable citizens to access, retrieve, understand, evaluate, use, create, and share information and media content in all formats through various tools in a critical, ethical, and effective manner, so as to participate in and carry out personal, professional, and social activities. Besides this, UNESCO also asserts information literacy as a "universal human right". == 21st-century students == In modern society, although the overall level of education has improved, the channels for knowledge production and dissemination have become increasingly diverse and commercialized, and traditional authoritative institutions no longer hold a monopoly over knowledge validation. While digital platforms have broadened access to information, they have also weakened trust mechanisms and evaluation standards, making epistemological skepticism a norm. Moreover, with the rise and spread of social media, misinformation and disinformation can be just as easily accessed in both densely and sparsely populated areas. These factors further underscore the importance of information literacy education. The IML learning capacities prepare students to be 21st century literate. According to Jeff Wilhelm (2000), "technology has everything to do with literacy. And being able to use the latest electronic technologies has everything to do with being literate." He supports his argument with J. David Bolter's statement that "if our students are not reading and composing with various electronic technologies, then they are illiterate. They are not just unprepared for the future; they are illiterate right now, in our current time and context". In a broader sense, developing this advanced competency of media and information literacy is essential, as it is crucial for students to exercise their freedom of expression in the 21st century. Wilhelm's statement is supported by the 2005 Wired World Phase II (YCWW II) survey conducted by the Media Awareness Network of Canada on 5000 Grade 4 – 11 students. The key findings of the survey were: 62% of Grade 4 students prefer the Internet. 38% of Grade 4 students prefer the library. 91% of Grade 11 students prefer the Internet. 9% of Grade 11 students prefer the library. Marc Prensky (2001) uses the term "digital native" to describe people who have been brought up in a digital world. The Internet has been a pervasive element of young people's home lives. 94% of kids reported that they had Internet access at home, and a significant majority (61%) had a high-speed connection. By the time kids reach Grade 11, half of them (51 percent) have their own Internet-connected computer, separate and apart from the family computer. The survey also showed that young Canadians are now among the most wired in the world. Contrary to the earlier stereotype of the isolated and awkward computer nerd, today's wired kid is a social kid. In general, many students are better networked through the use of technology than most teachers and parents, who may not understand the abilities of technology.

    Read more →