---
/doc/cs/hardware/1951-good.pdf
Review of a book by D. R. Hartree
Irving John Good
1951-01
2023-08-15
[("doi","10.2307/2980914")]
ai cs/hardware
<p>This book consists essentially of a short series of lectures delivered by the author at the University of Illinois in 1948.</p>
<p>…On page 61 a prediction is made that the decimal representation of numbers will probably oust the binary representation in general-purpose computers, partly because of the greater ease of “troubleshooting” when the decimal representation is used. This may well be true for computers in the strict sense, but in general-purpose machines intended for logic, for pure mathematics in general, for the theory of numbers in particular, and for the analysis of the nervous system, the binary representation is liable to remain more convenient (except perhaps for multi-valued logics). It cannot be predicted for any of these subjects that their mechanization will not ultimately become of great practical importance.</p>
<p>The author has an open mind on the “exciting” question of whether machines will be constructed which will “think for themselves”, ie. which will handle symbols in a non-predictable but useful manner. If this will be possible for future machines, then it is presumably also possible (though perhaps inconvenient) for machines designed before 1948, a fact which the author has apparently overlooked. It is a question of producing a suitable program. This question has received some attention from Turing and others. A randomizing device would be required, and could be supplied by placing random numbers in the store.</p>
<p>The author is clearly right in using the word “exciting”, since a machine which was so nearly human (or perhaps superhuman) could become a modern oracle. The threshold between a machine which was the intellectual inferior or superior of a man would probably be reached if the machine could do its own programming. Such speculations would be contrary to the matter-of-fact style of the book.</p>
---
/doc/ai/1962-kelley.pdf
Method of Gradients
Henry J. Kelley
1962-01-01
2019-08-29
[("doi","10.1016/S0076-5392(08)62094-9")]
ai math
<p>The <strong>method of gradients</strong>, also known as the method of steepest descent, is an elementary concept for the solution of minimum problems. In recent years the computational appeal of the method has led to its adoption in a variety of applications such as multivariable minimum problems of ordinary calculus, solution of systems of algebraic equations, integral equations, and <a href="https://en.wikipedia.org/wiki/Calculus_of_variations">variational problems</a>.</p>
<p>This chapter begins with a discussion of the main features of the gradient method in the context of ordinary minimum problems subject to constraints. It also discusses the variational problems of flight performance, introducing Green’s functions in the role played by <a href="!W">partial derivatives</a> in ordinary minimum problems and attempting to preserve an analogy between the two classes of problems in the subsequent development.</p>
<p>The close relationship between Green’s functions or influence functions and the error coefficients of guidance theory has drawn attention to the usefulness of the adjoint system technique in guidance analysis.</p>
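<p>[To make the basic iteration concrete, here is a minimal Python sketch of the method of gradients for an ordinary minimum problem; the quadratic objective, step length, and iteration count are our own illustrative choices, not Kelley’s. Minimizing <em>f</em>(<em>x</em>) = ½<em>x</em>′<em>Ax</em> − <em>b</em>′<em>x</em> by repeated steps against the local gradient also solves the algebraic system <em>Ax</em> = <em>b</em>, one of the applications mentioned above.]</p>
<pre><code>
# A minimal sketch of steepest descent (method of gradients); the example
# objective and all constants below are assumptions made for illustration.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # symmetric positive-definite, so f has a unique minimum
b = np.array([1.0, 1.0])

def grad_f(x):
    """Gradient of f(x) = 0.5 x'Ax - b'x; it vanishes exactly where Ax = b."""
    return A @ x - b

x = np.zeros(2)                   # nominal starting point
alpha = 0.1                       # fixed step length, chosen small enough for stability
for _ in range(500):
    x = x - alpha * grad_f(x)     # step opposite the local gradient (steepest descent)

print(x)                          # approaches np.linalg.solve(A, b) = [0.2, 0.4]
</code></pre>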
---
/doc/ai/1962-bryson.pdf
A Steepest-Ascent Method for Solving Optimum Programming Problems
A. E. Bryson, W. F. Denham
1962-06-01
2019-08-28
[("doi","10.1115/1.3640537")]
ai math
<p>A systematic and rapid steepest-ascent numerical procedure is described for solving two-point boundary-value problems in the calculus of variations for systems governed by a set of nonlinear ordinary differential equations. Numerical examples are presented for minimum time-to-climb and maximum altitude paths for a supersonic interceptor and maximum-range paths for an orbital glider.</p>
<p>[<strong>Keywords</strong>: boundary-value problems, computer programming, differential equations, <a href="https://en.wikipedia.org/wiki/Calculus_of_variations">variational techniques</a>]</p>
<p>…A systematic and rapid steepest-ascent numerical procedure is described for determining optimum programs for nonlinear systems with terminal constraints. The procedure uses the concept of local linearization around a nominal (non-optimum) path. The effect on the terminal conditions of a small change in the control variable program is determined by numerical integration of the adjoint differential equations for small perturbations about the nominal path. Having these adjoint (or influence) functions, it is then possible to determine the change in the control variable program that gives maximum increase in the pay-off function for a given mean-square perturbation of the control variable program while simultaneously changing the terminal quantities by desired amounts. By repeating this process in small steps, a control variable program that minimizes one quantity and yields specified values of other terminal quantities can be approached as closely as desired.</p>
<p>Three numerical examples are presented: (<em>a</em>) The angle-of-attack program for a typical supersonic interceptor to climb to altitude in minimum time is determined with and without specified terminal velocity and heading. (<em>b</em>) The angle-of-attack program for the same interceptor to climb to maximum altitude is determined. (<em>c</em>) The angle-of-attack program is determined for a hypersonic orbital glider to obtain maximum surface range starting from satellite speed at 300,000 ft altitude.</p>
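<p>[A minimal Python sketch of the procedure described above, applied to a toy problem of our own choosing (a double integrator steered to a target terminal state) rather than the paper’s interceptor or glider examples: simulate the nominal control program forward, integrate the adjoint (influence) equations backward from the terminal pay-off gradient, and repeatedly step the control program in the improving direction. The paper instead picks each step to give the greatest pay-off improvement per unit mean-square control perturbation while meeting terminal constraints; a plain fixed-step gradient update is used below for brevity.]</p>
<pre><code>
# A sketch of an adjoint-based steepest-descent update on a control program.
# The dynamics, pay-off, step size, and time grid are illustrative assumptions.
import numpy as np

T, N = 1.0, 200
dt = T / N
u = np.zeros(N)                               # nominal (non-optimum) control program

def simulate(u):
    """Forward Euler integration of a double integrator: x1' = x2, x2' = u."""
    xs = np.zeros((N + 1, 2))
    for k in range(N):
        x1, x2 = xs[k]
        xs[k + 1] = [x1 + dt * x2, x2 + dt * u[k]]
    return xs

def payoff_and_gradient(u):
    """Pay-off phi(x(T)) and its gradient with respect to u(t) via the adjoint system:
    lam' = -(df/dx)^T lam with lam(T) = dphi/dx(T); the influence of u(t) on phi is lam_2(t)."""
    xs = simulate(u)
    x1T, x2T = xs[-1]
    phi = (x1T - 1.0) ** 2 + x2T ** 2          # pay-off: reach x1 = 1 with zero terminal velocity
    lam = np.array([2 * (x1T - 1.0), 2 * x2T]) # lam(T) = dphi/dx(T)
    grad = np.zeros(N)
    for k in reversed(range(N)):
        grad[k] = lam[1]                       # df/du = (0, 1), so the influence function is lam_2
        lam = lam + dt * np.array([0.0, lam[0]])   # backward step of lam1' = 0, lam2' = -lam1
    return phi, grad

for _ in range(300):                           # repeated small steps, as in the outer loop above
    phi, grad = payoff_and_gradient(u)
    u = u - 0.5 * grad                         # move the control program against the pay-off gradient
print("terminal pay-off after descent:", payoff_and_gradient(u)[0])
</code></pre>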
---
/doc/ai/1963-kelley.pdf
Singular Extremals In Lawden’s Problem Of Optimal Rocket Flight
Henry J. Kelley
1963-07-01
2019-08-29
[("doi","10.2514/3.1859")]
ai math
<p>The problem of optimal rocket flight in an inverse square law force field has been studied extensively by Lawden and Leitmann. Periods of zero thrust, intermediate thrust, and maximum thrust are possible subarcs of the solution according to analysis of the <a href="!W">Euler-Lagrange equations</a> and the <a href="!W">Weierstrass necessary condition</a>. Arcs of intermediate thrust have been examined recently by Lawden; however, the question of whether or not such arcs actually may furnish a minimum has been left unresolved.</p>
<p>The present paper derives the singular extremals of Lawden’s problem by means of the Legendre-Clebsch necessary condition applied in a transformed system of state and control variables.</p>
<p>These are obtained as circular orbits along which the thrust is zero and intermediate thrust arcs are found in Lawden’s analysis. Since these solutions satisfy only the weak form of the Legendre-Clebsch condition, ie. the extremals are singular in the transformed system of variables, the question of their minimality remains unanswered.</p>
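<p>[For reference, stated in our own notation rather than the paper’s: with the control Hamiltonian <em>H</em>(<em>x</em>, <em>u</em>, <em>λ</em>) written in the convention where it is minimized along an optimal path, the Legendre-Clebsch necessary condition requires the matrix of second partial derivatives ∂²<em>H</em>/∂<em>u</em>² to be positive semi-definite along the extremal. An arc is called <em>singular</em> when ∂²<em>H</em>/∂<em>u</em>² vanishes identically there, so that only this weak form of the condition is satisfied and it cannot by itself decide whether the arc is minimizing.]</p>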
---
/doc/ai/1966-good.pdf
Speculations Concerning the First Ultraintelligent Machine
Irving John Good
1966-01
2023-08-14
[("doi","10.1016/S0065-2458(08)60418-0")]
ai existential-risk psychology/neuroscience
<p>An ultra-intelligent machine is a machine that can far surpass all the intellectual activities of any man however clever. The design of machines is one of these intellectual activities; therefore, an <a href= "https://en.wikipedia.org/wiki/Artificial_general_intelligence">ultra-intelligent machine</a> could design even better machines [<a href="https://en.wikipedia.org/wiki/Intelligence_explosion">intelligence explosion</a>/<a href="https://en.wikipedia.org/wiki/Technological_singularity">Singularity</a>].</p>
<p>To design an ultra-intelligent machine one needs to understand more about the <a href= "https://en.wikipedia.org/wiki/Human_brain">human brain</a> or human thought or both. The physical representation of both meaning and recall, in the human brain, can be to some extent understood in terms of a subassembly theory, this being a modification of <a href="https://en.wikipedia.org/wiki/Hebbian_theory">Hebb’s cell assembly theory</a>.</p>
<p>The subassembly theory sheds light on the physical embodiment of memory and meaning, and there can be little doubt that both need embodiment in an ultra-intelligent machine. The subassembly theory leads to reasonable and interesting explanations of a variety of psychological effects.</p> <ol> <li><p>Introduction</p></li>
<li><p>Ultraintelligent Machines and Their Value</p></li>
<li><p>Communication and Regeneration</p></li>
<li><p>Some Representations of “Meaning” and Their Relevance to Intelligent Machines</p></li>
<li><p>Search and Information Retrieval</p></li>
<li><p>Cell Assemblies and Subassemblies</p></li>
<li><p>An Assembly Theory of Meaning</p></li>
<li><p>The Economy of Meaning</p></li>
<li><p>Conclusion</p></li>
<li><p>Appendix: Informational and Causal Interactions</p></li>
<li><p>References</p></li> </ol> <p>…<strong>2. Ultraintelligent Machines and Their Value</strong>: Let an “ultraintelligent machine” be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind (see for example <a href= "/doc/cs/hardware/1951-good.pdf">Good 1951</a>, <a href="/doc/ai/nn/1959-good.pdf">Good 1959</a>, <a href= "/doc/ai/1962-good.pdf">Good 1962</a>). Thus the first ultraintelligent machine is the <em>last</em> invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. It is curious that this point is made so seldom outside of science fiction. It is sometimes worthwhile to take science fiction seriously.</p>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p> <ul> <li><p><a href="/doc/ai/scaling/2013-yudkowsky.pdf#miri" class="backlink-not id-not">Intelligence Explosion Microeconomics</a></p> </li>
<li><p><a href="/complexity" class="backlink-not id-not">Complexity no Bar to AI</a></p> </li>
<li> <p><a href="https://web.archive.org/web/20230710000944/https://frc.ri.cmu.edu/~hpm/project.archive/general.articles/1975/Raw.Power.html" class= "backlink-not id-not">The Role Of RAW POWER In INTELLIGENCE</a></p> </li>
<li><p><a href="https://www.youtube.com/watch?v=lfXxzAVtdpU&t=1763s" class="backlink-not id-not"><em>Gödel, Escher, Bach</em> author Douglas Hofstadter on the state of AI today § What about AI terrifies you?</a></p> </li>
<li><p><a href="/doc/existential-risk/2016-chalmers.pdf" class="backlink-not id-not">The Singularity: A Philosophical Analysis</a></p> </li>
<li><p><a href="https://arxiv.org/abs/1703.10987" class="backlink-not id-not">On the Impossibility of Supersized Machines</a></p> </li>
<li><p><a href="https://arxiv.org/abs/quant-ph/9908043" class="backlink-not id-not">Ultimate physical limits to computation</a></p> </li>
<li><p><a href="/doc/iq/high/2015-hofman.pdf" class="backlink-not id-not">Evolution of the Human Brain: From Matter to Mind</a></p> </li>
<li><p><a href="https://www.biorxiv.org/content/10.1101/058545.full" class="backlink-not id-not">Towards an integration of deep learning and neuroscience</a></p> </li>
</ul> </div>
---
/doc/ai/1966-ivakhnenko.pdf
Cybernetic Predicting Devices
A. G. Ivakhnenko, V. G. Lapa
1966-09-23
2019-08-29
ai
<p>[Predicting programs designed for large general-purpose computers constitute an important new tool in the control of production and economics. Nevertheless, small predicting filters have their own domain of application. They can be realized not only as programs for general-purpose computers, but also as simple analog devices with very fast response.</p>
<p>The authors discuss three principal methods of prediction, in addition to some others: (1) prediction of deterministic processes, ie. extrapolation and interpolation; (2) prediction of stochastic processes, based on statistical prediction theory; and (3) prediction based on adaptation or learning of the predicting filters.]</p>
---
/doc/ai/1968-duda.pdf
Experiments in the recognition of hand-printed text, part II: context analysis
Richard O. Duda, Peter E. Hart
1968-12-09
2019-08-30
[("doi","10.1145/1476706.1476736")]
ai
<p>The work described in this paper is part of a larger effort aimed at the recognition of hand-printed text.</p>
<p>In a <a href="/doc/ai/1968-munson-2.pdf" title="‘Experiments in the recognition of hand-printed text, part I: character recognition’, Munson 1968b">companion paper, Munson</a> describes the scanning of the text, and the preprocessing and tentative classification of individual characters.</p>
<p>In this paper, we describe techniques for using context to detect and correct errors in classification.</p>
---
/doc/ai/1968-munson-2.pdf
Experiments in the recognition of hand-printed text, part I: character recognition
John H. Munson
1968-12-09
2019-08-30
[("doi","10.1145/1476706.1476735")]
ai
<p>[<a href="/doc/ai/1968-duda.pdf" title="‘Experiments in the recognition of hand-printed text, part II: context analysis’, Duda & Hart 1968">part 2</a>] Among the many subject areas in the field of pattern recognition, the recognition of machine-printed and hand-printed alphanumeric characters has perhaps been the classic example to which people have referred in exemplifying the field.</p>
<p>Interest in character recognition has long run high; an extensive literature in hand-printed character recognition alone dates back to at least 1955.</p>
---
/doc/reinforcement-learning/exploration/1973-rechenberg.pdf
<em>Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution</em> [<em>Evolution Strategy: Optimization of Technical Systems according to the Principles of Biological Evolution</em>]
Ingo Rechenberg
1973
2020-10-14
ai reinforcement-learning/exploration
<p>The biological method of evolution is postulated to be an optimal strategy to adapt organisms to their environment. Therefore it may be promising to optimize engineering systems applying principles of biological evolution.</p>
<p>Laboratory experiments demonstrate that the simple biological mechanism of mutation and selection can be used successfully to evolve optimal systems in the field of fluid dynamics. A better imitation of the hereditary rules of higher organisms considerably improves the effectiveness of the evolutionary strategy.</p>
<p>Finally, a theory is developed which is based upon the assumption that the quality of an engineering system can be compared with the fitness of a living organism. It results in a formula for the rate of convergence of the evolutionary strategy. This formula is then used to calculate the time of evolution required for the transition from the first living cell to present-day species.</p>
<p>[PhD thesis (in German) of <a href="https://en.wikipedia.org/wiki/Ingo_Rechenberg">Ingo Rechenberg</a>, an early researcher in <a href="https://en.wikipedia.org/wiki/Evolutionary_computation">evolutionary computation</a>; this thesis introduces a simple blackbox optimization method, <a href="https://en.wikipedia.org/wiki/Evolution_strategy">“evolution strategies”</a>, which can optimize even extremely complex things like neural networks by an iterative process of jittering the initial input with random noise to obtain <em>n</em> mutated variants, running them all through an ‘environment’ to measure ‘fitness’ of some sort, keeping the best-ranked point, and jittering again, etc. This constructs a ‘cloud’ of mutants around a prototype, and approximates a gradient in hill climbing.]</p>
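<p>[A minimal Python sketch of that iteration, with every constant (population size, mutation scale, and the toy fitness function) chosen by us for illustration rather than taken from the thesis: jitter the current point with Gaussian noise to get <em>n</em> mutants, score them with a black-box fitness function, keep the best point found so far, and repeat.]</p>
<pre><code>
# A toy (1+n) evolution-strategy hill climber; all settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    """Assumed black-box 'environment': higher is better; optimum at the origin."""
    return -np.sum(x ** 2)

def evolution_strategy(x0, sigma=0.3, n_mutants=20, generations=200):
    parent = np.asarray(x0, dtype=float)
    parent_fit = fitness(parent)
    for _ in range(generations):
        # Jitter the parent with Gaussian noise to obtain n mutated variants.
        mutants = parent + sigma * rng.standard_normal((n_mutants, parent.size))
        fits = np.array([fitness(m) for m in mutants])
        best = int(np.argmax(fits))
        if fits[best] > parent_fit:            # keep the best-ranked point seen so far
            parent, parent_fit = mutants[best], fits[best]
    return parent

print(evolution_strategy(3.0 * np.ones(5)))    # drifts toward the optimum at the origin
</code></pre>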
<div class="aux-links-append see-also-append">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="/doc/reinforcement-learning/exploration/1989-rechenberg.pdf" class="backlink-not id-not">“Evolution Strategy: Nature’s Way of Optimization”</a></p></li>
<li><p><a href="/doc/reinforcement-learning/exploration/2000-rechenberg.pdf" class="backlink-not id-not">“Case studies in evolutionary experimentation and computation”</a></p></li>
</ul>
</div>
---
/doc/ai/1973-levin.pdf
Universal Sequential Search Problems
L. A. Levin
1973
2019-11-13
ai cs/algorithm
<p>[on <a href="https://en.wikipedia.org/wiki/Leonid_Levin">Levin’s</a> <a href="http://www.scholarpedia.org/article/Universal_search">universal search</a>; for English discussion, see <a href="https://core.ac.uk/download/pdf/82092683.pdf" title="Randomness conservation inequalities; information and independence in mathematical theories">Levin 1984</a>]</p>
<p>Several well-known large-scale problems of the “sequential search” type are discussed, and it is proved that those problems can be solved only in the time that it takes to solve any problems of the indicated type, in general.</p>
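<p>[A minimal Python sketch of the time-sharing idea behind universal search, using our own toy setup rather than anything in the paper: candidate programs are interleaved so that a program with description length <em>L</em> receives roughly a 2<sup>−<em>L</em></sup> share of the total running time, and the search stops as soon as any program emits an output the verifier accepts. The example “programs” and their lengths below are made up for illustration.]</p>
<pre><code>
# A toy scheduler in the spirit of Levin's universal search; the program encoding,
# the assigned lengths, and the divisor-finding example are illustrative assumptions.
from itertools import count

def levin_search(programs, verify):
    """programs: list of (length_in_bits, generator_factory); each generator yields
    one candidate output per step (None while still working)."""
    runners = [(length, factory()) for length, factory in programs]
    for phase in count(1):
        for length, gen in runners:
            steps = 2 ** (phase - length) if phase >= length else 0
            for _ in range(steps):             # program p gets about 2**(phase - L(p)) steps per phase
                out = next(gen)
                if out is not None and verify(out):
                    return out

# Hypothetical use: find a nontrivial divisor of n with two trial-division "programs"
# of different step sizes, tagged with made-up description lengths.
n = 1_000_003 * 7

def trial_division(step):
    def gen():
        d = 1
        while True:
            d += step
            yield d if n % d == 0 else None
    return gen

print(levin_search([(3, trial_division(1)), (5, trial_division(2))],
                   verify=lambda d: n % d == 0 and d not in (1, n)))
</code></pre>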
---
/doc/cs/algorithm/1982-perlis.pdf
Epigrams on Programming
Alan J. Perlis
1982-09
2019-11-14
[("doi","10.1145/947955.1083808")]
ai cs/algorithm design fiction/humor philosophy
<p>[130 epigrams on computer science & technology, compiled for <a href="https://en.wikipedia.org/wiki/Association_for_Computing_Machinery">ACM’s</a> <a href="!W">SIGPLAN</a> journal, by noted computer scientist and programming language researcher <a href="https://en.wikipedia.org/wiki/Alan_Perlis">Alan Perlis</a>. The epigrams are a series of short, programming-language-neutral, humorous statements about computers and programming, distilling lessons he had learned over his career, which are widely quoted.]</p>
<p>8. A programming language is low level when its programs require attention to the irrelevant…19. A language that doesn’t affect the way you think about programming, is not worth knowing…54. Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.</p>
<p>15. Everything should be built top-down, except the first time…30. In programming, everything we do is a special case of something more general—and often we know it too quickly…31. Simplicity does not precede complexity, but follows it…58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it…65. Make no mistake about it: Computers process numbers—not symbols. We measure our understanding (and control) by the extent to which we can arithmetize an activity…56. Software is under a constant tension. Being symbolic it is arbitrarily perfectible; but also it is arbitrarily changeable.</p>
<p>1. One man’s constant is another man’s variable. 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.</p>
<p>36. The use of a program to prove the 4-color theorem will not change mathematics—it merely demonstrates that the theorem, a challenge for a century, is probably not important to mathematics.</p>
<p>39. Re graphics: A picture is worth 10K words—but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures.</p>
<p>48. The best book on programming for the layman is <em>Alice in Wonderland</em>; but that’s because it’s the best book on anything for the layman.</p>
<p>77. The cybernetic exchange between man, computer and algorithm is like a game of musical chairs: The frantic search for balance always leaves one of the 3 standing ill at ease…79. A year spent in artificial intelligence is enough to make one believe in God…84. Motto for a research laboratory: What we work on today, others will first think of tomorrow.</p>
<p>91. The computer reminds one of Lon Chaney—it is the machine of a thousand faces.</p>
<p>7. It is easier to write an incorrect program than understand a correct one…93. When someone says “I want a programming language in which I need only say what I wish done”, give him a lollipop…102. One can’t proceed from the informal to the formal by formal means.</p>
<p>100. We will never run out of things to program as long as there is a single program around.</p>
<p>108. Whenever 2 programmers meet to criticize their programs, both are silent…112. Computer Science is embarrassed by the computer…115. Most people find the concept of programming obvious, but the doing impossible. 116. You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program. 117. It goes against the grain of modern education to teach children to program. What fun is there in making plans, acquiring discipline in organizing thoughts, devoting attention to detail and learning to be self-critical?</p>
<p>[<strong>Warning</strong>: There is an HTML version which is more commonly linked; however, it appears to omit a few epigrams, and misspell others in harmful ways.]</p>
---
https://core.ac.uk/download/pdf/82092683.pdf
Randomness conservation inequalities; information and independence in mathematical theories
Leonid A. Levin
1984-04
2021-06-01
[("doi","10.1016/S0019-9958(84)80060-1")]
ai cs/algorithm/information statistics/probability
<p>The article further develops <a href="https://en.wikipedia.org/wiki/Kolmogorov_complexity">Kolmogorov’s</a> <a href="https://en.wikipedia.org/wiki/Algorithmic_information_theory">algorithmic complexity theory</a>.</p>
<p>The definition of randomness is modified to satisfy strong invariance properties (conservation inequalities). This allows definitions of concepts such as <a href="https://en.wikipedia.org/wiki/Mutual_information">mutual information</a> in individual infinite sequences.</p>
<p>Applications to several areas, such as probability theory, the theory of algorithms, and <a href="https://en.wikipedia.org/wiki/Intuitionistic_logic">intuitionistic logic</a>, are considered. These theories are simplified substantially with the postulate that the objects they consider are independent of (have small mutual information with) any sequence specified by a mathematical property.</p>
---
/doc/ai/1987-mcdermott.pdf
A critique of pure reason
Drew McDermott
1987-02-01
2019-08-31
[("doi","10.1111/j.1467-8640.1987.tb00183.x")]
ai philosophy/epistemology reinforcement-learning/model
<p>[1987 retrospective by <a href="https://en.wikipedia.org/wiki/Drew_McDermott">noted proponent</a> of logic for planning and reasoning in AI (‘GOFAI’); McDermott criticizes his own work fiercely, along with that of his colleagues (particularly John McCarthy, Robert Moore, James Allen, Jerry Hobbs, & Patrick Hayes), describing the ‘logicist’ paradigm—that sufficiently ingenious and persistent application of logical reasoning, mostly first-order logic, can eventually give rise to human-level understanding of the world, planning & execution of actions, and eventually AGI.</p>
<p>McDermott concludes that such programs are by nature unable to tell whether they are making real progress (a failure to infer something could simply reflect a missing axiom); worse, such logics are not even an approximation to or role model for intelligence, nor do their failures merely reflect a poor choice of axioms: logics only verify things, they do not compute useful things like plans, and they collapse into verifying trivialities which do no useful intellectual work. Resorting to powerful tools like temporal logics or nonmonotonic logics sacrifices the philosophical advantages of logical inference in an attempt to get working systems, but may obtain neither. What is necessary is <em>doing without deduction</em>.]</p>
<p>It must be the case that a substantial portion of the inferences we want [to make] are deductions, or it will simply be irrelevant how many theorems follow deductively from a given axiom set.</p>
<p>…To summarize: The logicist project of expressing “naive physics” in first-order logic has not been very successful. One reason may be that the basic argument was flawed. You cannot write down axioms independent of a program for manipulating them if the inferences you are interested in are not deductions. Unfortunately, very few interesting inferences are deductions, and the attempts by logicists to extend logic to cover more territory have been disappointing. Hence we must resign ourselves to writing programs, and viewing knowledge representations as entities to be manipulated by the programs.</p>
<p>…Finally, I should admit that I am still doing work in the paradigm that I criticize here. In the domain of shape representation, so little is known that focusing on an idealization cannot but help teach us something. The problem I would like to tackle is representing the knowledge required to answer questions like, Could a paper clip be used as a key ring? The idealization I have been forced to fall back on is to prove that a paper clip of a certain size and shape could fit through the hole of a typical key. It should be obvious how much of the original problem this leaves out. Still, the territory is so unexplored that a tour through the idealized fragment could turn up something interesting. What one cannot hope for is to express as logical axioms everything there is to know about using shapes in unusual ways, before designing programs for this task. This will probably come as a shock to no one but me and a few friends.</p>
---
/doc/ai/1990-mitchell.pdf
Copycat: A computer model of high-level perception and conceptual slippage in analogy making
Melanie Mitchell
1990-01
2023-07-20
ai psychology
<p>Central to every facet of human intelligence are the abilities to flexibly perceive and categorize situations, to see beyond superficial details and understand the essence of a situation, and to make analogies, fluidly translating concepts from one situation into a different situation.</p>
<p>This dissertation describes <a href="https://en.wikipedia.org/wiki/Copycat_(software)"><strong>Copycat</strong></a>, a computer model of the mental mechanisms underlying this fluidity of concepts and high-level perception in the context of analogy-making.</p>
<p>For the purpose of isolating and modeling the mechanisms underlying these abilities, a microworld has been developed in which analogies can be made between idealized situations involving strings of letters. Analogy-making in this stripped-down, seemingly simple domain requires many of the same abilities humans use to understand and to make analogies between more complex, real-world situations.</p>
<p>Copycat constructs interpretations of situations and creates analogies between situations in this microworld. In Copycat, the perception of the essence of a situation and the recognition of essential similarity between two superficially different situations result from the interaction of a large number of simple, independent, and locally-acting perceptual agents with an associative and context-sensitive network of concepts. Central to the model is the notion of statistically emergent high-level behavior, in which the system’s low-level activities are permeated with nondeterminism, but more deterministic high-level behavior emerges from the statistics of the low-level nondeterminism.</p> <hr> <p>This dissertation first discusses some central issues in high-level perception and analogy-making and illustrates how the letter-string microworld captures these issues in an idealized form.</p>
<p>A description of the Copycat program is presented, and detailed results of its performance on a number of analogy problems are given, demonstrating the program’s flexibility and the range of its abilities.</p>
<p>Some problems with the model as it now stands are also discussed.</p>
<p>Copycat is then compared with related research in artificial intelligence and cognitive science, and a discussion is given of the program’s place in the spectrum of computer models of intelligence, ranging from high-level symbolic models to low-level sub-symbolic models.</p>
---
/doc/ai/tabular/1991-thrun.pdf
The MONK’s Problems: A Performance Comparison of Different Learning Algorithms
Sebastian B. Thrun, Jerzy W. Bala, Eric Bloedorn, Ivan Bratko, Bojan Cestnik, John Cheng, Kenneth A. De Jong, Saso Dzeroski, Douglas H. Fisher, Scott E. Fahlman, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R. S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, J. Zhang
1991-12
2023-01-06
ai/tabular
<p>Once upon a time, in July 1991, the monks of <a href="https://en.wikipedia.org/wiki/Corsendonk">Corsendonk</a> <a href="https://en.wikipedia.org/wiki/Oud-Turnhout">Priory</a> were faced with a school held in their priory, namely the 2<sup>nd</sup> European Summer School on Machine Learning. After listening more than one week to a wide variety of learning algorithms, they felt rather confused: Which algorithm would be optimal? And which one to avoid? As a consequence of this dilemma, they created a simple task on which all learning algorithms ought to be compared [benchmarked]: the <strong>three MONK’s problems</strong>.</p>
<p>This report summarizes the results.</p>
<p>[<strong>Keywords</strong>: machine learning, MONK’s problems, AQ17-DCI, AQ17-HCI, AQ17-FCLS, AQ14-NT, AQ15-GA, Assistant Professional, mFOIL, ID5R, IDL, ID5R-hat, TDIDT, <a href="https://en.wikipedia.org/wiki/ID3_algorithm">ID3</a>, AQR, CN2, CLASSWEB, ECOBWEB, PRISM, <a href="https://en.wikipedia.org/wiki/Backpropagation">backpropagation</a>, Cascade Correlation]</p>
<p>This report summarizes a comparison of different learning techniques which was performed at the 2<sup>nd</sup> European Summer School on Machine Learning, held in Belgium during summer 1991. A variety of symbolic and non-symbolic learning techniques—namely AQ17-DCI, AQ17-HCI, AQ17-FCLS, AQ14-NT, AQ15-GA, Assistant Professional, mFOIL, ID5R, IDL, ID5R-hat, TDIDT, ID3, AQR, CN2, CLASSWEB, ECOBWEB, PRISM, backpropagation, and Cascade Correlation—are compared on 3 classification problems, the <em>MONK’s problems</em>.</p>
<p>The MONK’s problems are derived from a domain in which each training example is represented by 6 discrete-valued attributes. Each problem involves learning a binary function defined over this domain, from a sample of training examples of this function. Experiments were performed with and without noise in the training examples.</p>
<p>One important characteristic of this comparison is that it was performed by a collection of researchers, each of whom was an advocate of the technique they tested (often they were the creators of the various methods). In this sense, the results are less biased than in comparisons performed by a single person advocating a specific learning method, and more accurately reflect the generalization behavior of the learning techniques as applied by knowledgeable users.</p>
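<p>[A minimal Python sketch of running one modern learner on a MONK-style benchmark. The attribute cardinalities, the MONK-1 target rule (<em>a</em><sub>1</sub> = <em>a</em><sub>2</sub>) or (<em>a</em><sub>5</sub> = 1), and the 124-example training sample follow the usual description of the first problem and should be treated as our assumptions rather than a transcription of the report.]</p>
<pre><code>
# Benchmarking a decision tree on a MONK-1-style task; the target rule and
# train/test split sizes are assumptions based on the usual problem description.
import itertools
import numpy as np
from sklearn.tree import DecisionTreeClassifier

cardinalities = [3, 3, 2, 3, 4, 2]                       # discrete attributes a1..a6
X = np.array(list(itertools.product(*[range(1, c + 1) for c in cardinalities])))
y = ((X[:, 0] == X[:, 1]) | (X[:, 4] == 1)).astype(int)  # binary target over the 432 possible examples

rng = np.random.default_rng(0)
train = rng.choice(len(X), size=124, replace=False)      # small training sample, as in MONK-1
test = np.setdiff1d(np.arange(len(X)), train)

clf = DecisionTreeClassifier(random_state=0).fit(X[train], y[train])
print("held-out accuracy:", clf.score(X[test], y[test]))
</code></pre>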
---
/doc/ai/1991-winograd.pdf#page=7
Oral History Interview with Terry Allen Winograd (OH #237) § SHRDLU
Terry Allen Winograd, Arthur L. Norberg
1991-12-11
2022-11-16
ai cs/lisp
<p><a href="https://en.wikipedia.org/wiki/Terry_Allen_Winograd">Winograd</a> describes his education in computer science and introduction to linguistics at <a href="https://en.wikipedia.org/wiki/MIT_Computer_Science_and_Artificial_Intelligence_Laboratory">the Massachusetts Institute of Technology</a> (MIT). He discusses the work of <a href="https://en.wikipedia.org/wiki/Marvin_Minsky">Marvin Minsky</a> and other in artificial intelligence. He describes his move to the <a href="https://en.wikipedia.org/wiki/Stanford_University_centers_and_institutes#Stanford_Artificial_Intelligence_Laboratory">Stanford Artificial Intelligence Laboratory</a> and his additional linguistic research at <a href="https://en.wikipedia.org/wiki/PARC_(company)">Xerox-PARC</a>. Winograd compares the approach to artificial intelligence at MIT and Stanford. He describes his involvement with obtaining funding from the Information Processing Techniques Office of the <a href="https://en.wikipedia.org/wiki/DARPA">Defense Advanced Research Projects Agency</a>.</p>
<div class="interview">
<ul>
<li><p><a href="!W"><strong>Terry Allen Winograd</strong></a>: …And then
there was the planning. <a
href="https://en.wikipedia.org/wiki/Carl_Hewitt"
>Carl Hewitt</a> was
there. That’s the other person who was very relevant to my work because,
when I started trying to do the language, it became clear you needed
some kind of a planning/problem-solving platform on which to do the
question answering, and he had developed his ideas for <a
href="https://en.wikipedia.org/wiki/Planner_(programming_language)"
>Planner</a>.
Planner was one of the systems that was always about to be done
implementing. [laughs] There were <em>years</em> during which Planner
was going to be the wonderful thing to use, but didn’t <em>quite</em>
work yet, and so on.</p></li>
<li><p><a href="!W"><strong>Arthur L. Norberg</strong></a>: Was there
some pressure to do that, to provide some sort of application, in a
sense? To implement a program, or implement a language?</p></li>
<li><p><strong>Terry A. Winograd</strong>: Well, certainly
implementation was the coin of the realm. There was nobody doing a
thesis which was “here’s a bunch of ideas, period.” It was “here’s a
program that runs—and I have some ideas to lend to it.”</p></li>
<li><p><strong>A. L. Norberg</strong>: Well, I was just thinking of <a
href="!W" title="Roger Schank">Schank’s</a> <a
href="/doc/ai/1991-schank.pdf"
title="‘Where’s the AI?’, Schank 1991">recent piece</a> in <em>AI
Magazine</em>…where he says that one of the problems associated with AI
is that, in the past, anyway, until <a href="!W"
title="Inference engine">inference engines</a> were developed, that most
theses were simply investigations of good ideas, but they never got to
the implementation stage, because that wasn’t the point of the
dissertation in the first place. And so, we never learned, “we” being
the AI people, we never learned how to develop a product in the normal
sense of what a product would mean on the market.</p></li>
<li><p><strong>T. A. Winograd</strong>: Well, implementation and product
are two stages. That is, implementation was always there as the coin of
the realm. Implementation meant something you could show off. It didn’t
mean something that somebody else could use. So—I mean, my thesis was
certainly that way. I mean, you know, <a
href="https://en.wikipedia.org/wiki/SHRDLU#Functionality"
>the famous
dialogue</a> with <a href="https://en.wikipedia.org/wiki/SHRDLU"
>SHRDLU</a> where
you could pick up a block, and so on, I very carefully worked through,
line by line. If you sat down in front of it, and asked it a question
that wasn’t in the dialogue, there was some probability it would answer
it. I mean, if it was reasonably close to one of the questions that
<em>was</em> there in form and in content, it would probably get it.</p>
<p>But there was no attempt to get it to the point where you could
actually hand it to somebody and they could use it to move blocks
around. And there was no pressure for that whatsoever. Pressure was for
something you could demo. Take a recent example, <a
href="https://en.wikipedia.org/wiki/Nicholas_Negroponte"
>Negroponte’s</a> <a
href="https://en.wikipedia.org/wiki/MIT_Media_Lab"
>Media Lab</a>,
where instead of “perish or publish” it’s “demo or die.”</p>
<p>I think that’s a problem. I think AI suffered from that a lot,
because it led to “Potemkin villages”, things which—for the things they
actually did in the demo looked good, but when you looked behind that
there wasn’t enough structure to make it really work more
generally.</p></li>
<li><p><strong>Norberg</strong>: Is that a question of size, or is it a
question of the idea itself—the idea’s too small?</p></li>
<li><p><strong>Winograd</strong>: The idea of—?</p></li>
<li><p><strong>A L N</strong>: Well, let’s say the ideas behind the
blocks world, SHRDLU.</p></li>
<li><p><strong>T A W</strong>: Well, I think it was based on a
presupposition—at least an attitude—that making things work in the
large, really working, was just like getting a demo except more details
to be filled in. That is, if you had a basic idea and you could show it
worked on something then it was just a sort of grubby, detail work to
fill in all, you know, the hundreds of entries you would need to make it
work for real. But that—an idea—and this is tied to the top-down,
rationalistic way of approaching it, right. An idea which said, “here’s
a nice logical way this should work—would work in practice if you just
went far enough with the details.” And I think that’s been a problem
with AI all along. It’s true in problem-solving, right? Problem-solving,
as conceived by Newell and Simon and developed, and so on, has a certain
realm of applicability but it’s very different from, you know, you
coming to me and saying, “I have a problem. Would you help me solve
it?”, in terms of—to take the most obvious things—the hard part is
figuring out what the problem space <em>is</em>, not searching
it.</p></li>
</ul>
</div>
<div class="interview collapse">
<ul>
<li><p><strong>W</strong>: …One of the things I was really focusing on
was how to apply some of the symbol-processing ideas to syntactic
analysis, and that led to this thing I called <em>systemic grammar</em>,
a form of grammar that I got from <a
href="https://en.wikipedia.org/wiki/Michael_Halliday"
>Halliday</a>. Then
I modified it into what I called “procedural grammar” or something, and
tried to make it operational so it actually was a grammar and a parser
sort of rolled into one. It was before the days when there was much
sophisticated parsing theory, so it was fairly <em>ad hoc</em>, but
effective. It was complex <em>ad hoc</em>. I remember being very
interested in the question of syntactic-semantic mapping. I remember
filling notebook pages with examples of quantifiers building.</p></li>
<li><p><strong>N</strong>: What does that mean?</p></li>
<li><p><strong>W</strong>: “Find a red block”—what’s a good
example?—“Put 3 red blocks on two boxes.” Does that mean “put 3 on each
of two boxes”, or “put 3 altogether on two”, and so on? And just going
through lots of examples and realizing the degree of ambiguity that was
resolvable by context. But in general, there were examples like, you
know, “there were 6 vehicles with 4 wheels”; you know that you mean “six
4-wheeled vehicles”, as opposed to “six vehicles sharing 4 wheels.” But
if you say, “There were 6 people with 4 pizzas”, right, you mean “there
are 6 people all sharing 4 pizzas.” And that has to do with knowing
something about vehicles and wheels and pizzas, rather than syntax.</p>
<p>So, the question that I was really focusing on is how do you make use
of the world knowledge, so-called, to disambiguate constructs in natural
language which could have more than one interpretation. This quantifier
one was an example which was fairly clean in a sense that you had a
small, limited number of possible interpretations. It wasn’t open-ended,
right? You knew it was either 4 divided by 6 or 4 for each of 6, and you
had to pick which of those two it was. The same kind of problem came up
with pronouns. When you see the word “it”, what does it refer to? How do
you know? Well, to do that I needed to—first of all, I worked with a
bunch of examples: “Pick up the block and put it on the table.” Well,
“it” means the block. But is the block bigger than one that you picked
up before? Is “<em>it</em>”? Now, does it mean the one that you picked
up before, or the one that is bigger? And the style, the methodology,
was basically to come up with a particular area like that, to just sit
down and write out lots of examples to try to get a feel for why, in one
case, you had one answer and in a different case you had a different
[answer], and then to come up with mechanisms which were fairly
impromptu.</p>
<p>I mean, it wasn’t like somebody doing systematic theorem proving, or
something. Okay. But clearly, recency is important. If something is more
recent, you’re more likely to mean that. So, what if we keep around a
list of how recently things were mentioned? So, something which is
farther up on that list, gets a little extra point. But, it’s also not
always that, because if it’s the subject of the sentence, it’s more
likely to be it than the object, so we need to keep around the syntactic
structure.</p>
<p>So, the driving problem was this question of how do you use extra
information, part of which is textual—what’s mentioned recently—and part
of which is world knowledge—like pizzas and vehicles, and so on—to
disambiguate natural language utterances into clearly-defined, you know,
I mean <em>that</em> block and put it <em>that</em> place. I wasn’t
particularly using any background of linguistic theory or philosophy of
language theory. I learned all that stuff later, right after I left.</p>
<p>And that was the spirit, very much; you saw a problem, you came up
with a few examples that gave you ideas, and you programmed a mechanism
that handled those and then you started debugging. Minsky said—you know,
one of his famous quotes is—“a program is just an empty page that needs
to be debugged.”</p></li>
</ul>
</div>
<div class="interview collapse">
<ul>
<li><p><strong>W</strong>: …So, the learning of those rules in schools
wasn’t what enabled you to understand English. It may have gotten you
some additional stuff, but it’s clear that a 5-year-old understands
quite complex sentences.</p></li>
<li><p><strong>N</strong>: But what I’m driving at here is, you and I,
in whatever similar ways, know how to do that because we learned it
through listening to other people and testing the language ourselves,
and so on. But, to go from that to both a syntactic structure and a
semantic structure that will work in a program, it seems to me, is a
rather major step. How does it happen? What is happening to make that
step?</p></li>
<li><p><strong>W</strong>: See, there are two answers. In hindsight, I
will say, “I agree, that’s a major step.”</p>
<p>I think the sense that was dominant then, and I was operating under,
certainly, is that there are all sorts of things which are not available
directly to conscious introspection, but, if you <em>were</em> able to
examine them, would be more-or-less straightforward applications of
symbol processing. So, although I don’t know what goes on in my head
when I hear a sentence—I can’t think hard and think about how I [?]
sentence. If, in fact, I could do that what I would see, so goes the
story, right, is something like, O.K., first I find the noun, then I
find the verb, then I see if they match in features, and if they don’t
match in features then I do this. And you take the kind of thing a
clever programmer would do given an algorithmic task like that, and the
assumption is that’s what’s going on. So the problem is only to devise
the right one, not to see a deep problem involved. You know, you may
have trouble finding it. It may take awhile. It may be complicated.</p>
<p>Also I think there was a certain esthetic involved—I mean, it’s
driven by physics. It says, Okay, there’s messy stuff, but there’s also
nice, clean, simple stuff happening underneath. So, even though real
objects in the world move in all sorts of complicated ways, billiard
balls on frictionless surfaces don’t do that. And that, therefore, if
you come up with the algorithm for the billiard ball on the frictionless
surface, you’ll later be able to patch on the things that handle all
that other stuff.</p></li>
<li><p><strong>N</strong>: That’s what physicists say, that’s right. And
often they can’t do it.</p></li>
<li><p><strong>W</strong>: Well, for a certain class of problems they’ve
done it very well. Of course that’s what’s seductive about it.</p></li>
</ul>
</div>
<div class="interview collapse">
<ul>
<li><p><strong>N</strong>: …Can any similar statement be made about
SHRDLU that certain blocks of code were transferred to other programs
later on by others?</p></li>
<li><p><strong>W</strong>: No. SHRDLU, well, except there were
<em>minor</em> cases. One of my students at MIT went off to <a
href="https://en.wikipedia.org/wiki/Computer_Corporation_of_America"
>CCA</a> and wrote a
program, basically taking SHRDLU, which could answer questions about the
weather. But it wasn’t, you know, it wasn’t a line of development. It
was basically an exercise in changing the domain of SHRDLU. As far as I
know the code—I wrote a fairly detailed description of the workings in
the book, so a lot of things which people went ahead and did I’m sure
were influenced by the fact it said, “Here, you can do the structure
this way and this way and this way.”</p>
<p>But as far as actually picking up pieces of <a
href="https://en.wikipedia.org/wiki/Lisp_(programming_language)"
>LISP</a> code—well,
I’ll give you the obvious answer which is it ran in <a
href="https://en.wikipedia.org/wiki/Maclisp"
>Maclisp</a>. So
nobody who ran anything except <a
href="https://en.wikipedia.org/wiki/Incompatible_Timesharing_System"
>ITS</a> could run
the code. It never got imported to any other dialect of LISP as a
whole.</p>
<p>Now, maybe somebody—you know, I got—I’ve had requests over the years,
“Please send me your source code.” And I have no idea… maybe some
imported version is running in a Mac in Timbuktu, right, I just don’t
know. But as far as a project that had major visibility within AI,
everybody started from scratch and went their own way.</p></li>
</ul>
</div>
<div class="interview collapse">
<ul>
<li><p><strong>W</strong>: [on leaving AI] …But I think my energy really
got diverted from research in the <a
href="https://en.wikipedia.org/wiki/KRL_(programming_language)"
>KRL</a> area to
writing a book about language for a period of 5 years or so, if you take
the amount of time I was really working on the syntax part and then the
time I tried to develop the semantics part, and so on.</p>
<p>During that time was when I also started having more of these
conversations with people who were skeptical about AI. So, in the early
to mid-’70s—I should go back and pin down the date on this, I don’t
remember it—somebody—I think it was <a
href="https://en.wikipedia.org/wiki/Hubert_Dreyfus"
>Bert Dreyfus</a>,
but I’m not sure—initiated a lunch seminar. Very informal, in Berkeley,
where he and <a href="https://en.wikipedia.org/wiki/John_Searle"
>John Searle</a> and
various students of his came, and <a
href="https://en.wikipedia.org/wiki/Daniel_G._Bobrow"
>Danny Bobrow</a>
and I and various students of mine—and it was like once a month we’d
just get together and talk, and so on. Then, in the midst of that—or the
end of that, I’ve forgotten the exact sequence now—Flores, <a
href="https://en.wikipedia.org/wiki/Fernando_Flores"
>Fernando Flores</a>
ended up at Stanford. And that’s a whole other history. His history. But
I started talking to him and what’s clear, in hindsight, is that a lot
of the sort of doubts and difficulties with the AI paradigm that I was
feeling I had already been—they were stewing—I was ripe in some sense. I
had run up against a particular problem that it was clear that I didn’t
see a few more steps that way solving. KRL had reached a certain level
of complexity, and it wasn’t—I think at some intuitive level—I wouldn’t
have said this in those days—I could see that it was going to bog down
in its own complexity before it solved the problems in control
language.</p>
<p>And that partly has to do with this business, as I said earlier,
trying to do everything at once; to be a representation language and a
programming language, and so on. But I was certainly feeling at some
level—and I think trying to do the book on semantics—again, just
realizing the more and more complexity of all the different issues that
were coming in without a unifying feeling that they were coming
together.</p>
<p>So then, in the course of I would say ’76 to ’80, more or less, I
went through this gradual shift away from saying well, “I’ve got to get
back to getting KRL to work”, which was the feeling, “Once I get done
with my book, I’ll go do that”, to feeling, “Well, once I get done with
my book on syntax, I’m not sure that I want to pursue AI in the same
way. I’m not sure that’s the direction to go.” And in the course of
that, I started working on the book with Flores. And the earliest—I
don’t know when the earliest draft was about—late-’70s that we
actually—I think we originally promised the publisher an ’81 publication
date or something like that.</p>
<p>And then I just gradually shifted communities in some sense, an
interesting process of being less and less involved with Danny and with
the AI people. He had gotten diverted for reasons having to do with
restructuring a project at Xerox. And <a
href="https://en.wikipedia.org/wiki/Robert_Taylor_(computer_scientist)"
>Bob Taylor’s</a>
view about AI [against]. That’s another part of the history, which you
may or may not have chosen to get into. The situation there was shifting
in a way which just didn’t make it convenient to continue. Not that we
couldn’t have.</p>
<p>And I became more and more interested in the stuff I was doing with
Flores, and the issues that that was raising and the directions and by
the mid-’80s—or the early-’80s—my book was published in ’86, but it was
pretty much in that form earlier—I’d really come around to this
philosophical questioning of AI, as opposed to “We’re headed in the
right direction. Let’s just work harder and do more”, which is the
spirit I had had at MIT…Well, basically my research completely shifted.
I don’t call myself an AI researcher, or an AI person.</p></li>
</ul>
<hr />
<ul>
<li><p><strong>W</strong>: …There is a particular fight over who gets
the name “AI”, and I see that here when people like <a
href="https://en.wikipedia.org/wiki/Edward_Feigenbaum"
>Feigenbaum</a> and
<a
href="https://en.wikipedia.org/wiki/John_McCarthy_(computer_scientist)"
>McCarthy</a> are
vehement that <a href="https://en.wikipedia.org/wiki/David_Rumelhart"
>Rumelhart</a> is
not doing AI. Whatever it [<a href="!W">connectionism</a>] is he’s
doing, it’s not AI. And he doesn’t even have an appointment in this
department.</p>
<p>And then there’s the question of, in the grand future, what’s going
to be the answer to the quest? Are we going to be able to build the
model of ourselves? That may turn out not to be a straight-forward
extension of either what is the technology of AI or what’s going on in
neural nets, but you know, it’s a possibility. Somebody could build
something. It may come out of genetic engineering.</p></li>
</ul>
</div>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="/doc/ai/scaling/1995-breiman.pdf" class="backlink-not id-not">Reflections After Refereeing Papers for NIPS</a></p></li>
<li><p><a href="/doc/ai/scaling/2009-halevy.pdf" class="backlink-not id-not">The Unreasonable Effectiveness of Data</a></p></li>
<li><p><a href="https://arxiv.org/abs/2004.13831" class="backlink-not id-not">A Review of Winograd Schema Challenge Datasets and Approaches</a></p></li>
<li><p><a href="https://onlinelibrary.wiley.com/doi/full/10.1111/j.1756-8765.2010.01116.x" class="backlink-not id-not">Emergence in Cognitive Science</a></p></li>
<li><p><a href="https://jetpress.org/volume1/moravec.htm" class="backlink-not id-not">When will computer hardware match the human brain?</a></p></li>
<li><p><a href="https://web.archive.org/web/20230710000944/https://frc.ri.cmu.edu/~hpm/project.archive/general.articles/1975/Raw.Power.html" class="backlink-not id-not">The Role Of RAW POWER In INTELLIGENCE</a></p></li>
<li><p><a href="https://arxiv.org/abs/2201.02387" class="backlink-not id-not">The Defeat of the Winograd Schema Challenge</a></p></li>
<li><p><a href="/doc/ai/nn/1993-olazaran.pdf" class="backlink-not id-not">A Sociological Study of the Official History of the Perceptrons Controversy [1993]</a></p></li>
</ul>
</div>
---
/doc/ai/1991-schank.pdf
Where’s the AI?
Roger C. Schank
1991-12-15
2022-11-16
[("doi","10.1609/aimag.v12i4.917")]
ai
<p>[see the contemporaneous <a href="/doc/ai/1991-winograd.pdf#page=7" title="‘Oral History Interview with Terry Allen Winograd (OH #237) § SHRDLU’, Winograd & Norberg 1991 (page 7)">Winograd interview</a> on SHRDLU] I survey 4 viewpoints about what AI is: (1) AI means magic bullets, (2) AI means inference engines, (3) AI means getting a machine to do something you didn’t think a machine could do (the “gee whiz” view), and (4) AI means having a machine learn. I describe a program exhibiting AI as one that can change as a result of interactions with the user.</p>
<p>Such a program would have to process hundreds or thousands of examples as opposed to a handful. Because AI is a machine’s attempt to explain the behavior of the (human) system it is trying to model, the ability of a program design to scale up is critical.</p>
<p>Researchers need to face the complexities of scaling up to programs that actually serve a purpose. The move from toy domains into concrete ones has 3 big consequences for the development of AI. First, it will force software designers to face the idiosyncrasies of its users. Second, it will act as an important reality check between the language of the machine, the software, and the user. Third, the scaled-up programs will become templates for future work.</p>
<p>…The correct AI question had to do with the generality of a solution to a problem, and there was a good reason. It is trivial to build a program to do what, say, <a href="https://en.wikipedia.org/wiki/Terry_Winograd">Winograd</a> 1972’s <a href="https://en.wikipedia.org/wiki/SHRDLU">SHRDLU</a> program did for 31 sentences. Just match 31 strings with 31 behaviors. It would take a day to program. People believed that Winograd’s program was an AI program because they believed that his program “did it right.” They believed it would scale up. They believed that it would work on more than 31 sentences. (In fact, so did he. See Winograd 1973). At the time, when I was asked my opinion of Winograd’s work, I replied that it would never work on a substantially larger number of sentences, nor would it work in different domains than the one for which it was designed. I did not reply that his program was not AI, however.</p>
<p>The fact that a program does not scale up does not necessarily disqualify it from being AI. The ideas in Winograd’s program were AI ideas; they just weren’t correct AI ideas in my opinion.</p>
---
/doc/psychology/writing/1994-vandenbosch.pdf
Measuring the complexity of writing systems
Antal van den Bosch, Alain Content, Walter Daelemans, Beatrice de Gelder
1994-09-20
2023-09-01
ai cs/algorithm/information/compression psychology/writing
<p>We propose a quantitative operationalization of the complexity of a writing system. This complexity, also referred to as <a href="https://en.wikipedia.org/wiki/Orthographic_depth">orthographic depth</a>, plays a crucial role in psycholinguistic modeling of reading aloud (and learning to read aloud) in several languages.</p>
<p>The complexity of a writing system is expressed by two measures, viz. that of the complexity of letter-phoneme alignment and that of the complexity of grapheme-phoneme correspondences.</p>
<p>We present the alignment problem and the correspondence problem as tasks to 3 different data-oriented learning algorithms [tree-learning], and submit them to English, French and Dutch learning and testing material.</p>
<p>Generalisation performance metrics are used to propose for each corpus a two-dimensional writing system complexity value.</p>
<div class="aux-links-append see-also-append collapse"> <p><strong>See Also</strong>:</p> <ul> <li><p><a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0019875" class= "backlink-not id-not">Universal Entropy of Word Ordering Across Linguistic Families</a></p> </li>
<li><p><a href="https://arxiv.org/abs/2111.14232" class="backlink-not id-not">Long-range and hierarchical language predictions in brains and algorithms</a></p> </li>
<li><p><a href="https://www.biorxiv.org/content/10.1101/2020.12.03.410399.full" class="backlink-not id-not">A hierarchy of linguistic predictions during natural language comprehension</a></p> </li>
<li><p><a href="https://www.sciencedirect.com/science/article/pii/S0010027722000580" class= "backlink-not id-not">Poor writing, not specialized concepts, drives processing difficulty in legal language</a></p> </li>
<li><p><a href="https://arxiv.org/abs/2207.09847" class="backlink-not id-not">Predicting Word Learning in Children from the Performance of Computer Vision Systems</a></p> </li> </ul> </div>
---
/doc/cs/cryptography/1995-impagliazzo.pdf
A Personal View of Average-Case Complexity
Russell Impagliazzo
1995-06-19
2019-11-17
[("doi","10.1109/sct.1995.514853")]
ai cs/computable cs/cryptography
<p>The structural theory of <a href="https://en.wikipedia.org/wiki/Average-case_complexity">average-case complexity</a>, introduced by <a href="/doc/cs/algorithm/1986-levin.pdf" title="Average Case Complete Problems">Levin 1986</a>, gives a formal setting for discussing the types of inputs for which a problem is difficult. This is vital to understanding both when a seemingly difficult (eg. <a href="https://en.wikipedia.org/wiki/NP-completeness">NP-complete</a>) problem is actually easy on almost all instances, and to determining which problems might be suitable for applications <em>requiring</em> hard problems, such as cryptography.</p>
<p>The paper attempts to summarize the state of knowledge in this area, including some “folklore” results that have not explicitly appeared in print. We also try to standardize and unify definitions. Finally, we indicate what we feel are interesting research directions.</p>
<p>We hope that the paper motivates more research in this area and provides an introduction for people new to it.</p>
<p>[In that paper, <a href="https://en.wikipedia.org/wiki/Russell_Impagliazzo">Impagliazzo</a> describes 5 possible worlds and their implications for computer science.</p>
<ol>
<li><p><strong>Algorithmica</strong>: <a href="https://en.wikipedia.org/wiki/P%3DNP">P=NP</a> or something “morally equivalent” like fast probabilistic algorithms for NP.</p>
<p>In the science-fiction world of Algorithmica, optimization problems such as strong AI and mathematical proof search, and indeed every form of algorithmic inference, become trivial; all kinds of magic are possible: simply feed the data in, and out will come the optimal answer or the smallest algorithm generating the data. Cryptography and privacy are impossible.</p></li>
<li><p><strong>Heuristica</strong>: NP problems are hard in the worst case but easy on average.</p></li>
<li><p><strong>Pessiland</strong>: NP problems hard on average but no <a href="https://en.wikipedia.org/wiki/One-way_function">one-way functions</a> exist. We can easily create hard NP problems, but not hard NP problems where we know the solution. This is the worst of all possible worlds, since not only can we not solve hard problems on average but we apparently do not get any cryptographic advantage from the hardness of these problems.</p></li>
<li><p><strong>Minicrypt</strong>: One-way functions exist, but we do not have <a href="https://en.wikipedia.org/wiki/Public-key_cryptography">public-key cryptography</a>.</p></li>
<li><p><strong>Cryptomania</strong>: Public-key cryptography is possible, ie. 2 parties can exchange secret messages over open channels.</p></li>
</ol>
<p>Relevant followup work: <a href="https://www.quantamagazine.org/which-computational-universe-do-we-live-in-20220418/" title="Which Computational Universe Do We Live In? Cryptographers want to know which of five possible worlds we inhabit, which will reveal whether truly secure cryptography is even possible.">time-bounded</a> <a href="!W">Kolmogorov complexity</a>.]</p>
---
/doc/ai/1997-domingos.pdf
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Pedro Domingos, Michael Pazzani
1997-11-01
2019-09-05
[("doi","10.1023/A:1007413511361")]
ai statistics/bayes
<p>The simple <a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">Bayesian classifier</a> is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored.</p>
<p>Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive.</p>
<p>This article shows that, although the Bayesian classifier’s probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality.</p>
<p>This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption.</p>
<p>Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is <em>a priori</em> much less appropriate to the domain.</p>
<p>This article’s results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.</p>
<p>[<strong>Keywords</strong>: Simple Bayesian classifier, naive Bayesian classifier, zero-one loss, optimal classification, induction with attribute dependences]</p>
---
/doc/ai/1999-provost-2.pdf
Efficient Progressive Sampling
Foster Provost, David Jensen, Tim Oates
1999-08-01
2019-09-06
[("doi","10.1145/312129.312188")]
ai reinforcement-learning/exploration
<p>Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. <em>Samples</em> often provide the same accuracy with far less computational cost. However, the correct sample size rarely is obvious.</p>
<p>We analyze methods for <em>progressive sampling</em>—using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling.</p>
<p>We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence.</p>
<p>We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.</p>
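<p>[To make the schedule concrete, below is a minimal sketch of a geometric progressive-sampling loop; the learner, synthetic data, initial sample size, doubling factor, and stopping tolerance are illustrative assumptions, not the paper’s.]</p>
<pre><code># Minimal sketch of geometric progressive sampling (illustrative, not the paper's code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
X_test, y_test = X[-10_000:], y[-10_000:]    # held-out evaluation set
X_pool, y_pool = X[:-10_000], y[:-10_000]    # pool to draw training samples from

n, prev_acc = 100, 0.0                       # geometric schedule: n, 2n, 4n, ...
while n <= len(X_pool):
    model = LogisticRegression(max_iter=1000).fit(X_pool[:n], y_pool[:n])
    acc = model.score(X_test, y_test)
    if acc - prev_acc < 0.001:               # stop once accuracy has (approximately) converged
        break
    prev_acc, n = acc, n * 2                 # double the sample size each round
print(f"stopped at n={n}, accuracy={acc:.3f}")
</code></pre>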
---
/doc/ai/2001-taylor.pdf#page=6
Recent Developments in the Evolution of Morphologies and Controllers for Physically Simulated Creatures § A Re-implementation of Sims’ Work Using the MathEngine Physics Engine
Tim Taylor, Colm Massey
2001
2022-09-19
[("doi","10.1162/106454601300328034")]
ai reinforcement-learning/model-free
<p>…We now describe our own work in this area, conducted in 1999 and early 2000. Our project was in the first batch of a recent spate of studies to use MathEngine’s commercially available <a href="https://en.wikipedia.org/wiki/Physics_engine">physics engine</a> [apparently now named <a href="https://en.wikipedia.org/wiki/Vortex_(software)">Vortex</a>], a version of which (SDK 1.1) is available free for academic use.<sup>20</sup> The system was basically a re-implementation of that written by <a href="https://en.wikipedia.org/wiki/Karl_Sims">Karl Sims</a> in 1994.</p>
<p>…We used a number of different fitness functions for scoring the success of each creature in its environment, but they all basically rewarded creatures for movement. The definition of the fitness function in fact turned out to be surprisingly difficult to get right, even when we just wanted to reward creatures for moving forward. A straightforward function that simply measured the distance moved by the creature’s center of mass over the period of evaluation had a tendency to select for creatures that (in the fluid environment) produced an initial thrust to move away from their starting position but showed no further movement and soon slowed to a halt. Such creatures would have high fitness relative to most of the randomly generated creatures in the early generations and would therefore be selected. However, it is clear that their fitness could be improved if they repeated the thrust movement to swim further and faster. Unfortunately it appeared that in many cases where these “one push” creatures were selected in the initial generations, the population reached an evolutionary impasse (a local optimum in the fitness landscape) and had no easy mutational routes to higher fitness…Additionally, if the distance moved by the creature was being measured at various time slices throughout the evaluation period (so that these various distances can be weighted and summed to give a final fitness score), we needed to decide whether to score distance moved in <em>any</em> direction at any one time slice equally (in which case there was no pressure to evolve creatures that swam in a straight line over the whole evaluation period), or whether to reward only distance moved in one particular direction (and if so, in <em>which</em> direction). It was not difficult to make pragmatic decisions about such choices, but the point is that the choice of fitness function even for seemingly straightforward behaviors is not trivial and usually requires considerable experimentation to get right. The function that successfully produces the desired behaviors can often be somewhat more complicated than might initially have been thought.</p>
<p>Even the method used to measure the position of a creature at a given instant was not straightforward. In most runs we used the center of mass. However, in some runs creatures evolved that would initially adopt a compact, folded configuration, then as the evaluation period proceeded they would “unfold” in a particular direction. This unfolding had the effect of shifting the creature’s center of mass, thereby increasing its fitness. Again, if this trick was selected in the early generations of a run, it was sometimes difficult for the population to jump out of this local fitness optimum and find continuous movements that would generate higher fitness scores. We experimented with various other ways of measuring distance moved, such as using the distance moved by the body part that had moved least over the duration of the evaluation. The general problem is, no matter what fitness function is used, there often seems to be a way for creatures to score highly on it while not performing the sort of behavior that we, as designers of the function, had hoped for. This problem is not insurmountable; with a more careful specification of the function all “undesired” behaviors could presumably be detected and given low fitness scores. However, this need for careful design of very specific, detailed fitness functions runs counter to one of our goals of implementing the system, namely, to use it as a method of automatically generating creatures given only a high level specification of the required behavior. Nevertheless, while the use of very specific fitness functions can certainly increase the chances of evolving the desired behaviors in any given run, even using straightforward fitness functions will <em>sometimes</em> produce the desired results (as will be demonstrated in the rest of this section), so our goal was at least partially fulfilled.</p>
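<p>[As a concrete illustration of the kind of fitness function under discussion, here is a sketch of a time-sliced score that rewards displacement of the center of mass along a single target direction; the sampling, rewarded direction, and slice weights are illustrative choices, not the authors’ code.]</p>
<pre><code># Illustrative sketch (not the authors' code) of a time-sliced fitness function that
# rewards displacement of the creature's center of mass along one target direction.
import numpy as np

def directional_fitness(com_positions, direction=(1.0, 0.0, 0.0), weights=None):
    """com_positions: array of shape (T, 3), the center of mass sampled at T time slices."""
    com = np.asarray(com_positions, dtype=float)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)                  # unit vector of the rewarded direction
    steps = np.diff(com, axis=0) @ d        # signed displacement along d per time slice
    if weights is None:
        weights = np.ones(len(steps))       # could weight later slices more to demand sustained swimming
    return float(np.dot(weights, steps))    # weighted sum of per-slice directional progress
</code></pre>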
<p>…A number of checks were also made to overcome limitations in the simulation software. Despite various attempts to limit the magnitude of the forces applied to joints, creatures would still sometimes evolve whose movements entailed forces and velocities that were too great for the physics engine to resolve at the given size of the integration step. In these cases, the physics engine tended to accumulate numerical errors to a point where the creature irrecoverably exploded (ie. the constraint solver failed to converge on a solution, and the integrator then generated incorrect velocities, giving the impression that the body parts had blown apart in random directions)…The MathEngine SDK does generate some runtime warnings that indicate that this kind of situation is imminent. We kept a tally of the number of such warnings that each creature generated and aborted the simulation of any creature that had generated more than a certain threshold number of them. We also checked whether a creature had actually exploded throughout its evaluation (by checking for high velocities, etc.) and immediately aborted any that had.</p>
<p>Note that we were using MathEngine’s SDK 1.1 for this work; subsequent experience with using their latest offering (the Dynamics Toolkit 2.0 alpha release) suggests that the software is now much more stable. However, our more recent experiences with using both MathEngine and other physics engines (eg. <a href="https://en.wikipedia.org/wiki/Havok_(software)">Havok</a>)<sup>12</sup> for this sort of work suggest that they all have some weaknesses in stability of simulation in certain situations. Unfortunately, it is in the nature of evolutionary algorithms that such weaknesses will almost inevitably be encountered. A recent review article has tested the stability of the MathEngine, Havok, and Ipion engines [since acquired by Havok] in a variety of situations.<sup>16, 17</sup> Although these products are improving, the current situation is that, no matter which physics engine is used, it is likely that a certain number of stability checks of the type just described will be required in any evolutionary system of this kind.</p>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="/doc/reinforcement-learning/exploration/2000-rechenberg.pdf" class="backlink-not id-not">Case studies in evolutionary experimentation and computation</a></p></li>
<li><p><a href="/doc/science/2001-buss.pdf" class="backlink-not id-not">Accurate and Efficient Simulation of Rigid-Body Rotations</a></p></li>
<li><p><a href="https://www.nature.com/articles/s41467-021-25874-z" class="backlink-not id-not">Embodied intelligence via learning and evolution</a></p></li>
<li><p><a href="/doc/reinforcement-learning/exploration/1973-rechenberg.pdf" class="backlink-not id-not"><em>Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution</em></a></p></li>
</ul>
</div>
---
/doc/ai/2007-raina.pdf
Self-taught learning: transfer learning from unlabeled data
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng
2007-06-20
2024-02-09
[("doi","10.1145/1273496.1273592")]
ai
<p>We present a new machine learning framework called <strong>self-taught learning</strong> for using unlabeled data in <a href= "https://en.wikipedia.org/wiki/Supervised_classification">supervised classification</a> tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task.</p>
<p>Such unlabeled data is easier to obtain than in typical <a href= "https://en.wikipedia.org/wiki/Semi-supervised_learning">semi-supervised</a> or <a href= "https://en.wikipedia.org/wiki/Transfer_learning">transfer learning</a> settings, making self-taught learning widely applicable to many practical learning problems.</p>
<p>We describe an approach to self-taught learning that uses <a href="https://en.wikipedia.org/wiki/Sparse_coding">sparse coding</a> to construct higher-level features using the unlabeled data. These features form a succinct input representation and improve classification performance.</p>
<p>When using an <a href="https://en.wikipedia.org/wiki/Support-vector_machine">SVM</a> for classification, we further show how a <a href="https://en.wikipedia.org/wiki/Fisher_kernel">Fisher kernel</a> can be learned for this representation.</p>
---
/doc/ai/2007-elson.pdf#microsoft
Asirra: a CAPTCHA that exploits interest-aligned manual image categorization
Jeremy Elson, John R. Douceur, Jon Howell, Jared Saul
2007-10-01
2019-09-08
[("doi","10.1145/1315245.1315291")]
ai
<p>We present Asirra (<strong>Figure 1</strong>), a CAPTCHA that asks users to identify <a href="https://en.wikipedia.org/wiki/Cat">cats</a> out of a set of 12 photographs of both cats and dogs.</p>
<p>Asirra is easy for users; user studies indicate it can be solved by humans 99.6% of the time in under 30 seconds. Barring a major advance in machine vision, we expect computers will have no better than a 1/54,000 chance of solving it. Asirra’s image database is provided by a novel, mutually beneficial partnership with Petfinder.com. In exchange for the use of their 3 million images, we display an “adopt me” link beneath each one, promoting Petfinder’s primary mission of finding homes for homeless animals.</p>
<p>We describe the design of Asirra, discuss threats to its security, and report early deployment experiences. We also describe 2 novel algorithms for amplifying the skill gap between humans and computers that can be used on many existing CAPTCHAs.</p>
---
/doc/ai/2008-omohundro.pdf
The Basic AI Drives
Stephen M. Omohundro
2008-06-01
2019-09-08
[("doi","10.5555/1566174.1566226")]
ai genetics/selection reinforcement-learning/safe
<p>One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways.</p>
<p>We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted.</p>
<p>We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves.</p>
<p>We then show that self-improving systems will be driven to clarify their goals and represent them as economic utility functions. They will also strive for their actions to approximate rational economic behavior. This will lead almost all systems to protect their utility functions from modification and their utility measurement systems from corruption. We also discuss some exceptional systems which will <em>want</em> to modify their utility functions.</p>
<p>We next discuss the drive toward self-protection, which causes systems to try to prevent themselves from being harmed. Finally, we examine drives toward the acquisition of resources and toward their efficient usage.</p>
<p>We end with a discussion of how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.</p>
---
/doc/ai/2008-golle.pdf
Machine learning attacks against the Asirra CAPTCHA
Phillipe Golle
2008-10-01
2019-09-08
[("doi","10.1145/1455770.1455838")]
ai
<p>The Asirra CAPTCHA [<a href="/doc/ai/2007-elson.pdf#microsoft" title="‘Asirra: a CAPTCHA that exploits interest-aligned manual image categorization’, Elson et al 2007">EDHS2007</a>], proposed at ACM CCS 2007, relies on the problem of distinguishing images of <a href="https://en.wikipedia.org/wiki/Cat">cats</a> and dogs (a task that humans are very good at). The security of Asirra is based on the presumed difficulty of classifying these images automatically.</p>
<p>In this paper, we describe a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra. This classifier is a combination of <a href="!W">support-vector machine</a> classifiers trained on color and texture features extracted from images. Our classifier allows us to solve a 12-image Asirra challenge automatically with probability 10.3%. This probability of success is statistically-significantly higher than the estimate of 0.2% given in [EDHS2007] for machine vision attacks. Our results suggest caution against deploying Asirra without safeguards.</p>
<p>We also investigate the impact of our attacks on the <em>partial credit</em> and <em>token bucket</em> algorithms proposed in [EDHS2007]. The partial credit algorithm weakens Asirra considerably and we recommend against its use. The token bucket algorithm helps mitigate the impact of our attacks and allows Asirra to be deployed in a way that maintains an appealing balance between usability and security. One contribution of our work is to inform the choice of safeguard parameters in Asirra deployments.</p>
<p>[<strong>Keywords</strong>: CAPTCHA, reverse Turing test, machine learning, support vector machine, classifier.]</p>
<p>…Our classifier is a combination of 2 support-vector machine<sup>5</sup> (SVM) classifiers trained on color and texture features of images. The classifier is entirely automatic, and requires no manual input other than the one-time labelling of training images. Using 15,760 color features, and 5,000 texture features per image, our classifier is 82.7% accurate. The classifier was trained on a commodity PC, using 13,000 labeled images of cats and dogs downloaded from the Asirra website<sup>1</sup>.</p>
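<p>[Assuming independent per-image errors, a per-image accuracy of 82.7% translates into a 12-image challenge success probability of roughly 0.827<sup>12</sup> ≈ 10.2%, consistent with the 10.3% reported above.]</p>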
---
/doc/cs/algorithm/2012-jarvisalo.pdf
The International SAT Solver Competitions
Matti Järvisalo, Daniel Le Berre, Olivier Roussel, Laurent Simon
2012-03-15
2019-11-20
[("doi","10.1609/aimag.v33i1.2395")]
ai cs/algorithm cs/hardware economics/experience-curve
<p>The <a href="https://satcompetition.github.io/">International SAT Solver Competition</a> is today an established series of competitive events aiming at objectively evaluating the progress in <a href="https://en.wikipedia.org/wiki/SAT_solver">state-of-the-art procedures</a> for solving <a href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem">Boolean satisfiability</a> (SAT) instances.</p>
<p>Over the years, the competitions have substantially contributed to the fast progress in SAT solver technology that has made SAT a practical success story of computer science. This short article provides an overview of the SAT solver competitions.</p>
<figure>
<img src="/doc/ai/2012-jarvisalo-figure2-satsolverimprovementovertime20022011.jpg" class="invert" alt="Figure 2: Performance Evolution of the Best SAT Solvers 2002–2011. The farther to the right the data points are, the better the solver." />
<figcaption aria-hidden="true"><strong>Figure 2</strong>: <em>Performance Evolution of the Best SAT Solvers from 2002–2011.</em> The farther to the right the data points are, the better the solver.</figcaption>
</figure>
---
https://www.fhi.ox.ac.uk/reports/2012-1.pdf
Indefinite survival through backup copies
Anders Sandberg, Stuart Armstrong
2012-06-06
2021-12-20
ai statistics/probability statistics/survival-analysis
<p>If an individual entity endures a fixed probability μ < 1 of disappearing (“dying”) in a given fixed time period, then, as time approaches infinity, the probability of death approaches certainty.</p>
<p>One approach to avoid this fate is for individuals to copy themselves into different locations; if the copies each have an independent probability of dying, then the total risk is much reduced. However, to avoid the same ultimate fate, the entity must continue copying itself to continually reduce the risk of death.</p>
<p>In this paper, we show that to get a non-zero probability of ultimate survival, it suffices that the number of copies grows logarithmically with time. Accounting for expected copy casualties, the required rate of copying is hence bounded.</p>
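<p>[A simplified version of the argument, ignoring correlated risks and the dynamics of copy creation (the paper’s model also accounts for expected copy casualties): if <em>n</em>(<em>t</em>) independent copies exist in period <em>t</em>, the lineage is wiped out in that period only if all of them die, with probability μ<sup><em>n</em>(<em>t</em>)</sup>; the probability of surviving forever, ∏<sub><em>t</em></sub> (1 − μ<sup><em>n</em>(<em>t</em>)</sup>), is non-zero exactly when ∑<sub><em>t</em></sub> μ<sup><em>n</em>(<em>t</em>)</sup> converges. Taking <em>n</em>(<em>t</em>) = <em>c</em>·ln <em>t</em> gives μ<sup><em>n</em>(<em>t</em>)</sup> = <em>t</em><sup>−<em>c</em>·ln(1/μ)</sup>, which is summable whenever <em>c</em> > 1/ln(1/μ), so a logarithmically growing number of copies suffices.]</p>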
---
/doc/ai/tabular/2012-rintanen.pdf
Planning as satisfiability: Heuristics
Jussi Rintanen
2012-12
2019-09-09
[("doi","10.1016/j.artint.2012.08.001")]
ai/tabular cs/algorithm reinforcement-learning/model
<p>Reduction to SAT is a very successful approach to solving hard combinatorial problems in Artificial Intelligence and computer science in general. Most commonly, problem instances reduced to SAT are solved with a general-purpose <a href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem#Algorithms_for_solving_SAT">SAT solver</a>. Although there is the obvious possibility of improving the SAT solving process with application-specific heuristics, this has rarely been done successfully.</p>
<p>In this work we propose a planning-specific variable selection strategy for SAT solving. The strategy is based on generic principles about properties of plans; its performance on standard planning benchmarks often substantially improves on generic variable selection heuristics such as VSIDS, often lifting SAT-based planning to the same level as other search methods, such as explicit state-space search with heuristic search algorithms.</p>
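<p>[A toy illustration of the reduction-to-SAT idea on a one-step planning fragment; the encoding and the brute-force solver below are illustrative only, whereas real planners encode many time steps, actions, and frame axioms and hand the clauses to a general-purpose CDCL solver.]</p>
<pre><code># Toy "planning as satisfiability": variable 1 = precondition holds at t=0,
# variable 2 = action taken at t=0, variable 3 = goal holds at t=1.
from itertools import product

clauses = [
    [1],        # initial state: the precondition holds
    [3],        # the goal must hold at the end
    [-2, 1],    # action -> precondition (the action may only fire if its precondition holds)
    [-2, 3],    # action -> goal (the action's effect)
    [-3, 2],    # goal -> action (the goal can only become true via the action)
]

def satisfiable(clauses, n_vars=3):
    for bits in product([False, True], repeat=n_vars):   # brute force over all assignments
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assign
    return None

print(satisfiable(clauses))   # {1: True, 2: True, 3: True}: the extracted plan is "take the action"
</code></pre>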
---
/doc/cs/algorithm/2013-strannegard.pdf
Bounded Kolmogorov Complexity Based on Cognitive Models
Claes Strannegård, Abdul Rahim Nizamani, Anders Sjöberg, Fredrik Engström
2013-01
2023-02-27
[("doi","10.1007/978-3-642-39521-5_14")]
ai cs/algorithm
<p>Computable versions of <a href="https://en.wikipedia.org/wiki/Andrey_Kolmogorov">Kolmogorov</a> <a href="https://en.wikipedia.org/wiki/Kolmogorov_complexity">complexity</a> have been used in the context of pattern discovery.<sup>1</sup> However, these complexity measures do not take the psychological dimension of pattern discovery into account.</p>
<p>We propose a method for pattern discovery based on a version of Kolmogorov complexity where computations are restricted to a cognitive model with limited computational resources.</p>
<p>The potential of this method is illustrated by implementing it in a system used to solve number sequence problems. The system was tested on the number sequence problems of the IST <a href="https://en.wikipedia.org/wiki/IQ">IQ</a> test,<sup>2</sup> and it scored 28⁄38 problems, above average human performance, whereas the mathematical software packages <a href="https://en.wikipedia.org/wiki/Maple_(software)">Maple</a>, <a href="https://en.wikipedia.org/wiki/Wolfram_Mathematica">Mathematica</a>, and <a href="https://en.wikipedia.org/wiki/WolframAlpha">WolframAlpha</a> scored 9, 9, and 12, respectively.</p>
<p>The results obtained and the generalizability of the method suggest that this version of Kolmogorov complexity is a useful tool for pattern discovery in the context of AGI.</p>
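<p>[A toy illustration of pattern discovery by resource-bounded minimal-description search, using a far cruder hypothesis space than the paper’s cognitive model; the rule families and the complexity measure below are made-up stand-ins.]</p>
<pre><code># Toy sketch: predict the next term of a number sequence by searching a tiny space of
# candidate rules in order of increasing "description length" (a crude stand-in for
# resource-bounded Kolmogorov complexity; not the paper's cognitive model).
def predict_next(seq):
    candidates = []  # (complexity, predicted next term)
    # polynomial rules a*n^2 + b*n + c over small integer coefficients
    for a in range(-3, 4):
        for b in range(-5, 6):
            for c in range(-5, 6):
                if all(a * n * n + b * n + c == x for n, x in enumerate(seq)):
                    cost = sum(v != 0 for v in (a, b, c))   # "shorter" rules use fewer non-zero terms
                    candidates.append((cost, a * len(seq) ** 2 + b * len(seq) + c))
    # geometric rules x[n+1] = r * x[n] for small integer ratios r
    for r in range(-4, 5):
        if len(seq) >= 2 and all(seq[i + 1] == r * seq[i] for i in range(len(seq) - 1)):
            candidates.append((1, r * seq[-1]))
    return min(candidates)[1] if candidates else None

print(predict_next([2, 4, 6, 8]))    # -> 10 (rule 2n+2)
print(predict_next([3, 6, 12, 24]))  # -> 48 (ratio 2)
</code></pre>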
---
/doc/ai/2015-zhu-2.pdf
Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education
Xiaojin Zhu
2015-01-01
2019-09-11
[("doi","10.5555/2888116.2888288")]
ai reinforcement-learning/meta-learning
<p>I draw the reader’s attention to machine teaching, the problem of finding an optimal training set given a machine learning algorithm and a target model. In addition to generating fascinating mathematical questions for computer scientists to ponder, machine teaching holds the promise of enhancing education and personnel training. The paper’s Socratic-dialogue style aims to stimulate critical thinking.</p>
<p>[cf. <a href="/doc/reinforcement-learning/preference-learning/2012-cakmak.pdf" title="Algorithmic and human teaching of sequential decision tasks">Cakmak & Lopes 2012</a>; <a href="https://arxiv.org/abs/1702.03465" title="Enabling Robots to Communicate their Objectives">Huang et al 2017</a>; <a href="https://arxiv.org/abs/2002.09089" title="Safe imitation learning via fast Bayesian reward inference from preferences">Brown & Niekum 2019</a>.]</p>
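<p>[A standard toy example of the training-set-as-inverse-problem framing, not taken from the paper: to teach a 1-dimensional threshold classifier, a teacher who knows the learner needs only 2 examples tightly bracketing the threshold, whereas random sampling (under a uniform example distribution) needs on the order of 1⁄ε examples to pin the threshold down to precision ε.]</p>
<pre><code># Toy machine-teaching illustration (not from the paper): teaching a 1-D threshold learner.
# Learner: returns the midpoint between the largest negative and smallest positive example.
def learn_threshold(examples):                 # examples: list of (x, label) with label in {0, 1}
    neg = max(x for x, y in examples if y == 0)
    pos = min(x for x, y in examples if y == 1)
    return (neg + pos) / 2

theta, eps = 0.37, 1e-6                        # target threshold and precision (assumed values)
teaching_set = [(theta - eps, 0), (theta + eps, 1)]   # 2 well-chosen examples suffice
print(learn_threshold(teaching_set))           # ~0.37, recovered to within eps
</code></pre>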
---
/doc/ai/2016-bayern.pdf
The Implications of Modern Business-Entity Law for the Regulation of Autonomous Systems
Shawn Bayern
2016-06
2019-09-11
[("doi","10.1017/S1867299X00005729")]
ai bitcoin economics
<p>Nonhuman autonomous systems are not <a href="https://en.wikipedia.org/wiki/Legal_person">legal persons</a> under current law. The history of organizational law, however, demonstrates that agreements can, with increasing degrees of autonomy, direct the actions of legal persons. Agreements are isomorphic with algorithms; that is, a legally enforceable agreement can give legal effect to the arbitrary discernible states of an algorithm or other process. As a result, autonomous systems may end up being able, at least, to emulate many of the private-law rights of legal persons.</p>
<p>This essay demonstrates a technique by which this is possible by means of limited liability companies (LLCs), a very flexible modern type of business organization.</p>
<p>The techniques that this essay describes are not just futuristic possibilities; as this essay argues, they are already possible under current law.</p>
---
/doc/ai/2019-coursey.pdf
Living with Harmony: A Personal Companion System by Realbotix™
Kino Coursey, Susan Pirzchalski, Matt McMullen, Guile Lindroth, Yuri Furuushi
2019
2019-09-14
[("doi","10.1007/978-3-030-19734-6_4")]
ai reinforcement-learning/robot technology
<p>Existing personal assistants and agents are <em>by design</em> limited in their ability to form or encourage close personal bonds.</p>
<p>The Harmony system is designed to be a customizable personal companion agent capable of close personal interaction via the user’s phone, virtual reality headset, as well as through a physical interactive android body. In this chapter, we will describe the history that led to Harmony’s creation, the unique challenges and the overall system design.</p>
<p>We will also look at user reactions to the system and anticipated future developments.</p>
<p>[<strong>Keywords</strong>: androids, personal assistant, virtual reality, <a href="https://realbotix.com/">Realbotix</a>, embodied agent, companion agent]</p>
---
/doc/ai/2019-mccorduck-thiscouldbeimportant.epub
<em>This Could Be Important: My Life and Times with the Artificial Intelligentsia</em>
Pamela McCorduck
2019-10-01
2022-07-22
ai
<p>[<a href="https://pressbooks.pub/thiscouldbeimportantbook/">web version</a>] <a href="https://en.wikipedia.org/wiki/Pamela_McCorduck">Pamela McCorduck</a> wrote the first modern history of artificial intelligence, <a href="https://www.amazon.com/Machines-Who-Think-Artificial-Intelligence/dp/1568812051"><em>Machines Who Think</em></a>, and spent much time pulling on the sleeves of public intellectuals, futilely trying to suggest that artificial intelligence could be important. Memoir, social history, group biography of the founding fathers of <a href="https://en.wikipedia.org/wiki/Artificial_intelligence">AI</a>, <em>This Could Be Important</em> [ISBN 9780359901388] follows the personal story of one AI spectator, from her early enthusiasms to her mature, more nuanced observations of the field.</p>
<hr />
<p>In the autumn of 1960, 20yo humanities student Pamela McCorduck encountered both the fringe science of early artificial intelligence, and <a href="https://en.wikipedia.org/wiki/C._P._Snow">C. P. Snow’s</a> <a href="https://en.wikipedia.org/wiki/The_Two_Cultures">Two Cultures lecture</a> on the chasm between the sciences and the humanities. Each encounter shaped her life. Decades later her lifelong intuition was realized: AI and the humanities are profoundly connected. During that time, she wrote the first modern history of artificial intelligence, <em>Machines Who Think</em>, and spent much time pulling on the sleeves of public intellectuals, trying futilely to suggest that artificial intelligence could be important. Memoir, social history, group biography of the founding fathers of AI, <em>This Could Be Important</em> follows the personal story of one AI spectator, from her early enthusiasms to her mature, more nuanced observations of the field.</p>
<ol>
<li><p>The Two Cultures</p>
<ol>
<li><p>Living in the Exponential</p></li>
<li><p>The Capacious Structure of Computational Rationality, Fast and Slow Thinking, an Intelligence Continuum</p></li>
<li><p>The Two Cultures</p></li>
<li><p>Thinking, Then and Now</p></li>
<li><p>Learning a New Way of Thinking at <a href="https://en.wikipedia.org/wiki/Stanford_University">Stanford</a></p></li>
<li><p>Revolution in the Rust Belt</p></li>
</ol></li>
<li><p>Part 2: Brains</p>
<ol start="7" type="1">
<li><p>Machines Who Think Is Conceived; <a href="https://en.wikipedia.org/wiki/John_McCarthy">John McCarthy</a> Says Okay</p></li>
<li><p>Over Christmas, We Invented a Thinking Machine</p></li>
<li><p>What the First Thinking Machine Thought</p></li>
<li><a href="https://en.wikipedia.org/wiki/Herbert_Simon">Herbert Simon</a></li>
<li><a href="https://en.wikipedia.org/wiki/Allen_Newell">Allen Newell</a></li>
<li><a href="https://en.wikipedia.org/wiki/MIT_Computer_Science_and_Artificial_Intelligence_Laboratory">The MIT Group</a></li>
<li><a href="https://en.wikipedia.org/wiki/Edward_Feigenbaum">Edward Feigenbaum</a></li>
<li><a href="https://en.wikipedia.org/wiki/Raj_Reddy">Raj Reddy</a> and the Dawn of Machine Learning</li>
</ol></li>
<li><p>Part 3: Culture Clash</p>
<ol start="15" type="1">
<li><p>Whiplashed by the Manichean Struggle Between the Two Cultures</p></li>
<li><p>A Turning Point</p></li>
<li><p>Dissenters</p></li>
</ol></li>
<li><ol start="18" type="1">
<li><p>Photo Gallery</p></li>
</ol></li>
<li><p>Part 4: The World Discovers Artificial Intelligence</p>
<ol start="19" type="1">
<li><a href="https://en.wikipedia.org/wiki/Fifth_Generation_Computer_Systems">Japan Wakes the World Up to AI</a></li>
<li><p>Stragglers from the Wreck of Time</p></li>
<li><p>A Long Dance with <a href="https://en.wikipedia.org/wiki/IBM">IBM</a></p></li>
<li><p>Being a 9-Day Wonder</p></li>
<li><p>Breaking and Entering into the House of the Humanities</p></li>
</ol></li>
<li><p>Part 5: Silicon Valley Sketchbook</p>
<ol start="24" type="1">
<li><p>The Silicon Valley Sketchbook</p></li>
</ol></li>
<li><p>Part 6: Arts and Letters</p>
<ol start="25" type="1">
<li><p>Art and Artificial Intelligence</p></li>
<li><p>The Story as the Marker of Human Intelligence?</p></li>
<li><p>The Digital Humanities</p></li>
<li><p>Humanities Now and Forever</p></li>
</ol></li>
<li><p>Part 7: And Wherefore Was It Glorious?</p>
<ol start="29" type="1">
<li><p>Elegies</p></li>
<li><p>The Male Gaze</p></li>
<li><p>A Dark Horse Comes Out of Nowhere</p></li>
<li><p>Doing the Right Things</p></li>
<li><p>This Could Be Important</p></li>
</ol></li>
</ol>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="https://jetpress.org/volume1/moravec.htm" class="backlink-not id-not">When will computer hardware match the human brain?</a></p></li>
<li><p><a href="https://web.archive.org/web/20230710000944/https://frc.ri.cmu.edu/~hpm/project.archive/general.articles/1975/Raw.Power.html" class="backlink-not id-not">The Role Of RAW POWER In INTELLIGENCE</a></p></li>
<li><p><a href="https://web.archive.org/web/20230718144747/https://frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html" class="backlink-not id-not">Robot Predictions Evolution</a></p></li>
<li><p><a href="/doc/ai/nn/1993-olazaran.pdf" class="backlink-not id-not">A Sociological Study of the Official History of the Perceptrons Controversy (1993)</a></p></li>
<li><p><a href="/doc/statistics/decision/2006-drescher-goodandreal.pdf" class="backlink-not id-not"><em>Good and Real: Demystifying Paradoxes from Physics to Ethics</em></a></p></li>
</ul>
</div>
---
https://www.openphilanthropy.org/research/modeling-the-human-trajectory/
Modeling the Human Trajectory
David Roodman
2020-06-15
2022-03-16
ai economics/automation history
<p>One strand of analysis that has caught our attention is about the pattern of growth of human society over many millennia, as measured by number of people or value of economic production. Perhaps the mathematical shape of the past tells us about the shape of the future. I dug into that subject. A draft of my technical paper <a href="/doc/economics/automation/2020-roodman.pdf" title="‘Superexponential [Modeling the Human Trajectory]’, Roodman 2020">is here</a>. (Comments welcome.) In this post, I’ll explain in less technical language what I learned.</p>
<p>It’s extraordinary that the larger the human economy has become—the more people and the more goods and services they produce—the faster it has grown on average. Now, especially if you’re reading quickly, you might think you know what I mean. And you might be wrong, because I’m not referring to exponential growth. That happens when, for example, the number of people carrying a virus doubles every week. Then the <em>growth rate</em> (100% increase per week) holds fixed. The human economy has grown <em>super</em>-exponentially. The bigger it has gotten, the faster it has doubled, on average. The global economy churned out <a href="$2019">$74</a> trillion in goods and services in 2019, twice as much as in 2000.<sup>1</sup> Such a quick doubling was unthinkable in the Middle Ages and ancient times. Perhaps our earliest doublings took millennia.</p>
<p>If global economic growth keeps accelerating, the future will differ from the present to a mind-boggling degree. The question is whether there might be some plausibility in such a prospect. That is what motivated my exploration of the mathematical patterns in the human past and how they could carry forward. Having now labored long on the task, I doubt I’ve gained much perspicacity. I did come to appreciate that any system whose rate of growth rises with its size is inherently unstable. The human future might be one of explosion, perhaps an economic upwelling that eclipses the industrial revolution as thoroughly as it eclipsed the agricultural revolution. Or the future could be one of implosion, in which environmental thresholds are crossed or the creative process that drives growth runs amok, as in an AI dystopia. More likely, these impulses will mix. I now understand more fully a view that shapes the work of Open Philanthropy. The range of possible futures is wide. So it is our task as citizens and funders, at this moment of potential leverage, to lower the odds of bad paths and raise the odds of good ones.</p>
<ol>
<li><p>The human past, coarsely quantified</p></li>
<li><p>Capturing the randomness of history</p></li>
<li><p>Land, labor, capital, and more</p></li>
<li><p>Interpreting infinity</p></li>
<li><p>Conclusion</p></li>
</ol>
<p>…<strong>Conclusion</strong>: I do not know whether most of the history of technological advance on Earth lies behind or ahead of us. I do know that it is far easier to imagine what has happened than what hasn’t. I think it would be a mistake to laugh off or dismiss the predictions of infinity emerging from good models of the past. Better to take them as stimulants to our imaginations. I believe the predictions of infinity tell us two key things. First, if the patterns of history continue, then some sort of economic explosion will take place again, the most plausible channel being AI. It wouldn’t reach infinity, but it could be big. Second, and more generally, I take the propensity for explosion as a sign of instability in the human trajectory. Gross world product, as a rough <a href="https://en.wikipedia.org/wiki/Proxy_(statistics)">proxy</a> for the scale of the human enterprise, might someday spike or plunge or follow a complicated path in between. The projections of explosion should be taken as indicators of the long-run tendency of the human system to diverge. They are hinting that realistic models of long-term development are unstable, and stable models of long-term development unrealistic. The credible range of future paths is indeed wide.</p>
---
/doc/ai/2020-xia-2.pdf
Ball <em>k</em>-means: A Fast Adaptive <em>k</em>-means with No Bounds
Shuyin Xia, Daowan Peng, Deyu Meng, Changqing Zhang, Guoyin Wang, Elisabeth Giem, Wei Wei, Zizhong Chen
2020-07-13
2023-02-20
[("doi","10.1109/TPAMI.2020.3008694")]
ai
<p>This paper presents a novel accelerated exact <a href="https://en.wikipedia.org/wiki/K-means_clustering"><em>k</em>-means</a> called <strong>Ball <em>k</em>-means</strong>, which uses a ball to describe each cluster and focuses on reducing point-<a href="https://en.wikipedia.org/wiki/Centroid">centroid</a> distance computations.</p>
<p>Ball <em>k</em>-means exactly finds the neighbor clusters of each cluster, so that distances are computed only between a point and the centroids of its neighbor clusters instead of all centroids. Moreover, each cluster can be divided into a “stable area” and an “active area”, and the latter is further divided into exact “annular areas”. The assignment of points in the “stable area” is unchanged, while points in each “annular area” are reassigned only among a few neighbor clusters. There are no upper or lower bounds in the whole process. Ball <em>k</em>-means thus uses ball clusters and neighbor searching, along with multiple novel stratagems, to reduce centroid distance computations.</p>
<p>In comparison with the current state-of-the-art accelerated exact bounded methods, the <a href="https://proceedings.mlr.press/v37/ding15.html">Yinyang algorithm</a> and the <a href="https://proceedings.mlr.press/v48/newling16.html">Exponion algorithm</a>, as well as other top-of-the-line tree-based and bounded methods, Ball <em>k</em>-means attains higher performance and performs fewer distance calculations, especially for large-<em>k</em> problems.</p>
<p>The faster speed, lack of extra parameters, and simpler design of Ball <em>k</em>-means make it an all-around replacement for naive <em>k</em>-means.</p>
<p>[<strong>Keywords</strong>: ball <em>k</em>-means, <em>k</em>-means, ball cluster, stable area, active area, neighbor cluster]</p>
---
/doc/economics/automation/2020-roodman.pdf
Superexponential [Modeling the Human Trajectory]
David Roodman
2020-07-30
2020-07-30
ai economics/automation
<p>A scan of the history of gross world product (GWP) over millennia raises fundamental questions about the human past and prospect. What is the distribution of shocks ranging from recession to pandemic? Were the agricultural and industrial revolutions one-offs or did they manifest ongoing dynamics? Is growth exponential, if with occasional step changes in the rate, or is it superexponential? If the latter, how do we interpret the implication that output will become infinite in finite time?</p>
<p>This paper introduces the first coherent statistical model of GWP history. It casts a GWP series as a sample path in a <em>stochastic diffusion</em>, one whose specification is novel yet rooted in neoclassical growth theory.</p>
<p>After <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood</a> fitting to GWP back to 10,000 BC, most observations fall between the 40<sup>th</sup> and 60<sup>th</sup> percentiles of predicted distributions. The fit implies that GWP explosion is all but inevitable, in a median year of 2047.</p>
<p>This projection cuts against the steadiness of growth in income per person seen in the last two centuries in countries at the economic frontier. And it essentially contradicts the laws of physics. But neither tension justifies immediate dismissal of the explosive projection. Accelerating economic growth is better explained by theory than constant growth. And if physical limits are articulated in a neoclassical-type model by endogenizing natural resources, explosion leads to implosion, formally avoiding infinities. The quality of the superexponential fit to the past suggests not so much that growth is destined to ascend as that the human system is unstable.</p>
<p>[<strong>Keywords</strong>: endogenous growth, macroeconomic history, gross world product, <a href="!W">stochastic differential equations</a>]</p>
---
https://arxiv.org/abs/2009.08449
"Less Than One"-Shot Learning: Learning <em>n</em> Classes From <em>M</em> < <em>N</em> Samples
Ilia Sucholutsky, Matthias Schonlau
2020-09-17
2021-04-22
[("doi","10.48550/arXiv.2009.08449")]
ai
<p>Deep neural networks require large training sets but suffer from high computational cost and long training times. Training on much smaller training sets while maintaining nearly the same accuracy would be very beneficial. In the few-shot learning setting, a model must learn a new class given only a small number of samples from that class. One-shot learning is an extreme form of few-shot learning where the model must learn a new class from a single example.</p>
<p>We propose the <strong>‘less than one’-shot learning task</strong> where models must learn <em>N</em> new classes given only <em>M</em><<em>N</em> examples and we show that this is achievable with the help of soft labels. We use a soft-label generalization of the <a href="!W"><em>k</em>-Nearest Neighbors</a> classifier to explore the intricate decision landscapes that can be created in the ‘less than one’-shot learning setting.</p>
<p>We analyze these decision landscapes to derive theoretical lower bounds for separating <em>N</em> classes using <em>M</em><<em>N</em> soft-label samples and investigate the robustness of the resulting systems.</p>
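<p>[A minimal sketch of a soft-label nearest-prototype classifier in the spirit of the paper: each prototype carries a probability distribution over classes rather than a hard label, and the prediction is the class with the largest (here distance-weighted) sum of soft labels over the <em>k</em> nearest prototypes. The particular prototypes, soft labels, and distance weighting are illustrative choices, not the paper’s exact classifier.]</p>
<pre><code># Minimal soft-label kNN sketch (illustrative): 2 prototypes carrying soft labels over 3 classes,
# showing how M soft-label samples can carve out more than M classes.
import numpy as np

prototypes = np.array([[0.0], [1.0]])             # M = 2 prototype locations (made up)
soft_labels = np.array([[0.6, 0.4, 0.0],          # each row: a distribution over N = 3 classes
                        [0.0, 0.4, 0.6]])

def predict(x, k=2):
    dists = np.abs(prototypes[:, 0] - x)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] + 1e-9)             # weight neighbors by inverse distance
    scores = (w[:, None] * soft_labels[nearest]).sum(axis=0)
    return int(np.argmax(scores))

print([predict(x) for x in (-0.5, 0.5, 1.5)])     # -> [0, 1, 2]: 3 classes from 2 samples
</code></pre>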
---
/doc/ai/nn/2020-devlin.pdf
Guys and Dolls
Kate Devlin, Chloé Locatelli
2020-10-20
2020-10-20
[("doi","10.1007/978-3-658-29864-7_5")]
ai/nn reinforcement-learning/robot sociology/technology
<p>This chapter explores the creators and potential consumers of sex robots.</p>
<p>With Realbotix as our case study, we take a closer look at the language and sentiments of those developing the technology and those who are testing, consuming, or showing an interest in it. We do this by means of website and chat forum analysis, and via interviews with those involved.</p>
<p>From this, we can see that the motivation for developing a sexual companion robot places the emphasis firmly on the companionship aspect, and that those involved in creating and consuming the products share an ideology of intimacy and affection, with sexual gratification playing only a minor role.</p>
---
https://www.sciencedirect.com/science/article/pii/S0001691820305746
How humans impair automated deception detection performance
Bennett Kleinberg, Bruno Verschuere
2021-02
2022-11-21
[("doi","10.1016/j.actpsy.2020.103250")]
ai psychology/cognitive-bias
<ul>
<li><p>Machine learning identified lies and truths about the future above the chance level.</p></li>
<li><p>Human judges were allowed to overrule or adjust the machine judgment.</p></li>
<li><p>When interacting with the machine judgment, humans impaired the system’s performance.</p></li>
<li><p>Humans’ truth bias might explain these findings.</p></li>
</ul>
<p>[Another entry in the ‘clinical checklist’ literature: simple statistical models can outperform human judgment and actually be made worse by human input overriding them.]</p>
<p><strong>Background</strong>: Deception detection is a prevalent problem for security practitioners. With a need for more large-scale approaches, automated methods using machine learning have gained traction. However, detection performance still implies considerable error rates. Findings from different domains suggest that hybrid human-machine integrations could offer a viable path in detection tasks.</p>
<p><strong>Method</strong>: We collected a corpus of truthful and deceptive answers about participants’ autobiographical intentions (<em>n</em> = 1,640) and tested whether a combination of supervised machine learning and human judgment could improve deception detection accuracy. Human judges were presented with the outcome of the automated credibility judgment of truthful or deceptive statements. They could either fully overrule it (hybrid-overrule condition) or adjust it within a given boundary (hybrid-adjust condition).</p>
<p><strong>Results</strong>: The data suggest that in neither of the hybrid conditions did the human judgment add a meaningful contribution. Machine learning in isolation identified truth-tellers and liars with an overall accuracy of 69%. Human involvement through hybrid-overrule decisions brought the accuracy back to chance level. The hybrid-adjust condition did not improve deception detection performance. The decision-making strategies of humans suggest that the truth bias—the tendency to assume the other is telling the truth—could explain the detrimental effect.</p>
<p><strong>Conclusions</strong>: The current study does not support the notion that humans can meaningfully add to the deception detection performance of a machine learning system. All data are available at <a href="https://osf.io/45z7e/">OSF</a>.</p>
<p>[<strong>Keywords</strong>: deception detection, machine learning, decision-making, truth bias, deceptive intentions]</p>
<p>…<strong>2.3. Machine learning classification</strong>: We used supervised machine learning to classify truthful and deceptive answers. We extracted the following features from the responses and reported the classification metrics for each.</p>
<p>Linguistic Inquiry and Word Count (LIWC) variables: we used all 93 categories of the LIWC as a feature set. The LIWC aims to measure linguistic and psycholinguistic processes through a word count lexicon approach (Fornaciari & Poesio 2013; Kleinberg et al 2018; Pérez-Rosas & Mihalcea 2014).</p>
<p>Relative part-of-speech (POS) frequencies: we extracted the POS of each word and calculated the frequency of each relative to the overall number of words. The POS tags were extracted according to the <a href="https://universaldependencies.org/u/pos/">Universal Dependencies scheme</a>.</p>
<p>For the classification exercises, we used 80% of the data (<em>n</em> = 1,313) for training and tested the final algorithm on the held-out 20% (<em>n</em> = 327). On the training set, we used 10-fold cross-validation with 10 repetitions and used a vanilla <a href="https://en.wikipedia.org/wiki/Random_forests">random forest</a> as the learning algorithm.</p>
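<p>[A minimal sketch of the described pipeline under stated assumptions: a precomputed feature matrix (eg. the 93 LIWC categories plus relative POS frequencies) and binary truthful/deceptive labels are assumed to exist; the placeholders, CV flavor, and default hyperparameters below are illustrative rather than the authors’ exact settings.]</p>
<pre><code># Sketch of the described setup: 80/20 train/test split, vanilla random forest,
# 10-fold cross-validation repeated 10 times on the training set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, cross_val_score

X = np.random.rand(1640, 110)                  # placeholder features; replace with real LIWC/POS matrix
y = np.random.randint(0, 2, size=1640)         # placeholder labels (0 = truthful, 1 = deceptive)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(random_state=0)   # "vanilla" random forest: default hyperparameters
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
cv_acc = cross_val_score(clf, X_tr, y_tr, cv=cv, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (cv_acc.mean(), cv_acc.std()))
print("held-out accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))
</code></pre>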
---
/doc/psychology/cognitive-bias/illusion-of-depth/2022-bonezzi.pdf
The Human Black-Box: The Illusion of Understanding Human Better Than Algorithmic Decision-Making
Andrea Bonezzi, Massimiliano Ostinelli, Johann Melzner
2022-02-10
2022-06-15
[("doi","10.1037/xge0001181")]
ai philosophy/ethics psychology/cognitive-bias/illusion-of-depth
<p>As algorithms increasingly replace human decision-makers, concerns have been voiced about the black-box nature of algorithmic decision-making. These concerns raise an apparent paradox. In many cases, human decision-makers are just as much of a black-box as the algorithms that are meant to replace them. Yet, the inscrutability of human decision-making seems to raise fewer concerns.</p>
<p>We suggest that one of the reasons for this paradox is that people foster an illusion of understanding human better than algorithmic decision-making, when in fact, both are black-boxes. We further propose that this occurs, at least in part, because people project their own intuitive understanding of a decision-making process more onto other humans than onto algorithms, and as a result, believe that they understand human better than algorithmic decision-making, when in fact, this is merely an illusion.</p>
<p>[<strong>Keywords</strong>: understanding, projection, illusion of explanatory depth, algorithms, algorithm aversion]</p>
<p>…Drawing on this literature, we propose that because people are more similar to other humans than to algorithms (Epley et al 2007; Gray et al 2007; Haslam 2006), they are more likely to rely on their own understanding of a decision-making process to intuit how other humans, versus algorithms, make decisions. The privileged—yet often misguided—view that projection provides into other humans’ minds can foster the illusion of understanding human better than algorithmic decision processes, when in fact, both are black-boxes.</p>
<p>6 experiments test our hypotheses. <strong>Experiments 1A–C</strong> test whether people foster a stronger illusion of understanding human than algorithmic decision-making across 3 domains. <strong>Experiments 2, 3, and 4</strong> (in <a href="/doc/psychology/cognitive-bias/illusion-of-depth/2022-bonezzi-supplement-xge0001181.docx">online supplemental materials</a> <strong>E</strong>) test whether projection accounts for this phenomenon in each domain. <strong>Experiment 4</strong> also tests how illusory understanding affects trust in human versus algorithmic decisions. New York University and Winthrop University Institutional Review Board (IRB) approved the experimental protocols. In all experiments, the sample size was predetermined, and a sensitivity power analysis (Faul et al 2009) indicated that small-to-medium size effects could be detected with a power of 0.80. We report all conditions, manipulations, measures, and data exclusions. Questions to screen for bots and avoid differential dropout were included at the beginning of each experiment (see online supplemental materials <strong>B</strong>).</p>
---
/doc/philosophy/mind/2022-ujhelyi.pdf
Would You Pass the Turing Test? Influencing Factors of the Turing Decision
Adrienn Ujhelyi, Flora Almosdi, Alexandra Fodor
2022-04-27
2022-07-23
[("doi","10.31820/pt.31.1.9")]
ai philosophy/mind psychology/cognitive-bias
<p>We aimed to contribute to the emerging field of human-computer interaction by revealing some of the cues we use to distinguish humans from machines. Maybe the most well-known method of inquiry in artificial intelligence is the <a href="https://en.wikipedia.org/wiki/Turing_test">Turing test</a>, in which participants have to judge whether their conversation partner is either a machine or human.</p>
<p>In 2 studies, we used the Turing test as an opportunity to reveal the factors influencing Turing decisions. In our first study, we created a situation similar to a Turing test: a written, online conversation and we hypothesized that if the other entity expresses a view different from ours, we might think that they are a member of another group, in this case, the group of machines. We measured the attitude of the participants (<em>n</em> = 100) before the conversation, then we compared the attitude difference of the partners to their Turing decision.</p>
<p>…The results of the Turing decision revealed that 42% of participants (<em>n</em> = 42) thought that their conversational partner was a chatbot…Our results showed a <a href="https://en.wikipedia.org/wiki/Statistical_significance">statistically-significant</a> relationship between the Turing decision and the attitude difference of the conversation partners. The greater the difference in attitudes, the more likely participants were to judge the other to be a machine.</p>
<p>With our second study, we wanted to widen the range of variables and we also wanted to measure their effect in a more controlled, systematic way. In this case, our participants (<em>n</em> = 632) were exposed to an excerpt of a manipulated Turing test transcription. The dialogues were modified based on 8 variables: humour, grammar, activity, the similarity of attitude, coherence, leading the conversation, emoji use, and the appearance of the interface.</p>
<p>Our results showed that logical answers, proper grammar, and similar attitudes predicted the Turing decisions best. We also found that more people considered mistaking a computer for a human to be a bigger problem than the reverse, and this judgment was strongly influenced by the participants’ negative attitudes towards robots.</p>
<p>Besides contributing to our understanding of our attitude toward machines, our study has also shed light on the consequences of dehumanization.</p>
<p>[<strong>Keywords</strong>: Turing test, artificial intelligence, attitude, social psychology]</p>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="/doc/ai/nn/2020-hernandezorallo.pdf" class="backlink-not id-not">Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too</a></p></li>
<li><p><a href="/doc/psychology/cognitive-bias/illusion-of-depth/2022-bonezzi.pdf" class="backlink-not id-not">The Human Black-Box: The Illusion of Understanding Human Better Than Algorithmic Decision-Making</a></p></li>
<li><p><a href="https://arxiv.org/abs/2109.07958" class="backlink-not id-not">TruthfulQA: Measuring How Models Mimic Human Falsehoods</a></p></li>
<li><p><a href="https://arxiv.org/abs/2009.03300" class="backlink-not id-not" title="‘MMLU: Measuring Massive Multitask Language Understanding’, Hendrycks et al 2020">Measuring Massive Multitask Language Understanding</a></p></li>
</ul>
</div>
---
https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2022.941163/full
Who made the paintings: Artists or artificial intelligence? The effects of identity on liking and purchase intention
Li Gu, Yong Li
2022-08-05
2023-07-04
[("doi","10.3389/fpsyg.2022.941163")]
ai culture psychology/cognitive-bias psychology/collecting
<p>Investigating how people respond to and view <a href="https://en.wikipedia.org/wiki/Artificial_intelligence_art">AI-created artworks</a> is becoming increasingly crucial as the technology’s current application spreads due to its affordability and accessibility. This study examined how AI art alters people’s evaluation, purchase intention, and collection intention toward Chinese-style and Western-style paintings, and whether art expertise plays a role.</p>
<p><strong>Study 1</strong> recruited participants without professional art experience (non-experts) and found that who made the paintings did not change their liking ratings, purchase intentions, or collection intentions. In addition, they showed an ingroup preference, favoring Chinese-style over Western-style paintings, in line with previous evidence on cultural preference in <a href="https://en.wikipedia.org/wiki/Empirical_aesthetics">empirical esthetics</a>.</p>
<p><strong>Study 2</strong> further investigated the modulating effect of art expertise. Art experts evaluated AI-generated paintings less favorably (less liking, lower purchase and collection intentions) than artist-made paintings, while non-experts showed no preference. There were also interaction effects between author and art expertise, and between painting style and art expertise.</p>
<p>Collectively, the findings in this study showed that who made the art matters for experts and that the painting style affects esthetic evaluation and ultimate reception of it. These results would also provide implications for AI-art practitioners.</p>
<div class="aux-links-append see-also-append collapse"> <p><strong>See Also</strong>:</p> <ul> <li><p><a href="/doc/ai/nn/gan/2021-gangadharbatla.pdf" class="backlink-not id-not">The Role of AI Attribution Knowledge in the Evaluation of Artwork</a></p> </li>
<li><p><a href="https://www.sciencedirect.com/science/article/pii/S0747563223000584" class= "backlink-not id-not">Defending humankind: Anthropocentric bias in the appreciation of AI art</a></p> </li>
<li><p><a href="/doc/psychology/novelty/1982-sluckin.pdf" class="backlink-not id-not">Some experimental studies of familiarity and liking</a></p> </li>
<li><p><a href="/doc/psychology/cognitive-bias/illusion-of-depth/2022-bonezzi.pdf" class="backlink-not id-not">The Human Black-Box: The Illusion of Understanding Human Better Than Algorithmic Decision-Making</a></p> </li> </ul> </div>
---
https://www.sciencedirect.com/science/article/pii/S0747563223000584
Defending humankind: Anthropocentric bias in the appreciation of AI art
Kobe Millet, Florian Buehler, Guanzhong Du, Michail Kokkoris
2023-02-14
2023-03-02
[("doi","10.1016/j.chb.2023.107707")]
ai culture psychology/cognitive-bias
<ul> <li><p>AI-made art poses an ontological threat to anthropocentric worldviews that artistic creativity is uniquely human.</p></li>
<li><p>Humans perceive the same artwork as less creative and awe-inspiring when it is labeled as AI-made (vs. human made).</p></li>
<li><p>The bias is more pronounced among people with stronger anthropocentric creativity beliefs.</p></li> </ul> <p>We argue that recent advances of artificial intelligence (AI) in the domain of art (eg. music, painting) pose a profound ontological threat to anthropocentric worldviews because they challenge one of the last frontiers of the human uniqueness narrative: artistic creativity.</p>
<p>4 experiments (<em>n</em> = 1,708), including a high-powered <a href="https://en.wikipedia.org/wiki/Preregistration_(science)#Registered_reports">preregistered</a> experiment, consistently reveal a pervasive bias against AI-made artworks and shed light on its psychological underpinnings. The same artwork is preferred less when labeled as AI-made (vs. human-made) because it is perceived as less creative and subsequently induces less awe, an emotional response typically associated with the esthetic appreciation of art. These effects are more pronounced among people with stronger anthropocentric creativity beliefs (ie. who believe that creativity is a uniquely human characteristic).</p>
<p>Systematic depreciation of AI-made art (assignment of lower creative value, suppression of emotional reactions) appears to serve a shaken anthropocentric worldview whereby creativity is exclusively reserved for humans.</p>
<p>[<strong>Keywords</strong>: anthropocentrism, speciesism, artificial intelligence (AI), computational creativity, computer-generated art, awe]</p>
---
https://arxiv.org/abs/2308.04445
Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc
Doug Lenat, Gary Marcus
2023-07-31
2023-09-05
ai philosophy/logic philosophy/ontology reinforcement-learning/imitation-learning
<p>Generative AI, the most popular current approach to AI, consists of <a href="!W">large language models</a> (LLMs) that are trained to produce outputs that are <em>plausible</em>, but not necessarily <em>correct</em>. Although their abilities are often uncanny, they are lacking in aspects of reasoning, leading LLMs to be less than completely trustworthy. Furthermore, their results tend to be both unpredictable and uninterpretable.</p>
<p>We [<a href="!W">Douglas Lenat</a> & <a href="!W">Gary Marcus</a>] lay out 16 desiderata for future AI [explanation · deduction · induction · analogy · <a href="!W">abductive reasoning</a> · <a href="!W">theory of mind</a> · quantifier-fluency · modal-fluency · defeasibility · pro/con arguments · contexts · meta-knowledge/reasoning · explicitly-ethical · sufficient-speed · sufficiently-lingual/embodied · broadly-deeply-knowledgeable], and discuss an alternative approach to AI which could theoretically address many of the limitations associated with current approaches: AI educated with curated pieces of explicit knowledge and rules of thumb, enabling an inference engine to automatically deduce the logical entailments of all that knowledge. Even long arguments produced this way can be both trustworthy and interpretable, since the full step-by-step line of reasoning is always available, and for each step the provenance of the knowledge used can be documented and audited. There is however a catch: if the logical language is expressive enough to fully represent the meaning of anything we can say in English, then the inference engine runs much too slowly. That’s why symbolic AI systems typically settle for some fast but much less expressive logic, such as <a href="https://en.wikipedia.org/wiki/Knowledge_graphs">knowledge graphs</a>.</p>
<p>We describe how one AI system, <a href="https://en.wikipedia.org/wiki/Cyc"><strong>Cyc</strong></a>, has developed ways to overcome that tradeoff and is able to reason in higher order logic in real time.</p>
<p>We suggest that any trustworthy general AI will need to hybridize the approaches, the LLM approach and more formal approach, and lay out a path to realizing that dream.</p> <hr> <p>…<strong>3. How Cyc handles some of these 16 elements</strong>: [see also <a href="/doc/ai/1991-winograd.pdf#page=7" title="‘Oral History Interview with Terry Allen Winograd (OH #237) § SHRDLU’, Winograd & Norberg 1991 (page 7)">SHRDLU</a> & <a href="/doc/ai/1991-schank.pdf" title="‘Where’s the AI?’, Schank 1991">Schank’s critique</a>] Large Language Models such as OpenAI’s <a href= "https://openai.com/blog/chatgpt/">ChatGPT</a> and Google’s BARD and Microsoft’s Bing/Sydney represent one pole in potential architectural space, in which essentially neither knowledge nor reasoning is explicit. Cycorp’s CYC represents the opposite pole: a 4-decade-long 50-person project to explicitly articulate the tens of millions of pieces of common sense and general models of the world that people have, represent those in a form that computers can reason over mechanically, and develop reasoning algorithms which, working together, are able to do that reasoning sufficiently quickly.</p>
<p>…For that reason, Cycorp has persevered, unwilling to sacrifice the expressiveness of the logic involved, and its Cyc AI is the culmination of that effort. Over the past 4 decades it has developed <em>engineering solutions</em> to manage each of the 16 elements described in <a href="https://arxiv.org/pdf/2308.04445.pdf#page=3">§2</a>. Some are elegant; others simply required a lot of elbow grease—eg. for item 16, Cyc’s <a href="https://en.wikipedia.org/wiki/Knowledge_base">knowledge base</a> (KB) comprises tens of millions of hand-authored assertions, almost all of which are general “rule of thumb” axioms (most of the “facts” Cyc knows are ones that it can just look up on the internet much as a person would, or access in databases where the schema of the database has been aligned to Cyc’s <a href="!W">ontology</a>.)…Tens of millions of assertions and rules were written and entered into Cyc’s KB by hand, but it is important to realize that even just performing <em>one step</em> of reasoning, Cyc could generate tens of billions of new conclusions that follow from what it already knows.</p>
<p>…decades ago the Cyc ontologists pointed Cyc to the <a href="https://en.wikipedia.org/wiki/Linnaean_taxonomy">Linnaean taxonomy</a> system and added just one single rule to the Cyc KB of the form: For any 2 <a href="https://en.wikipedia.org/wiki/Taxons">taxons</a>, if one is not a specialization of the other (through a series of sub-taxon links), assume they are disjoint. This type of generalization was critical to have the KB-building enterprise take only (!) a few million person-hours of effort rather than a trillion. To speed up the educating process, the Cyc team developed tools that made use of the existing Cyc KB (and reasoners) to help the ontologists who were introspecting to unearth and formalize nuggets of common sense. For example, it was important that they <em>generalize</em> each nugget before entering into Cyc’s knowledge base…A software tool helps the ontologist semi-automatically walk up the hierarchy of types from “horse” to “physical object”, and from “leg” to “physical part”…Even with those Cyc-powered KB-building tools, it has taken a coherent team of logicians and programmers 4 decades, 2,000 person-years, to produce the current Cyc KB. Cycorp’s experiments with larger-sized teams generally showed a net decrease in total productivity, due to lack of coherence, deeper reporting chains, and so on.</p>
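<p>[A toy Python rendering of that single disjointness rule, assuming a hypothetical dictionary of sub-taxon links rather than Cyc’s actual KB format: two taxa are presumed disjoint unless one is reachable from the other by walking up the hierarchy.]</p>
<pre><code>
SUBTAXON = {   # illustrative fragment of sub-taxon links (child mapped to parent)
    "Equus ferus caballus": "Equus", "Equus": "Equidae", "Equidae": "Mammalia",
    "Canis lupus": "Canis", "Canis": "Canidae", "Canidae": "Mammalia",
}

def ancestors(taxon):
    """All taxa reachable from `taxon` by walking up sub-taxon links."""
    seen = set()
    while taxon in SUBTAXON:
        taxon = SUBTAXON[taxon]
        seen.add(taxon)
    return seen

def assumed_disjoint(a, b):
    """The single rule: assume disjointness unless one taxon specializes the other."""
    return not (a == b or a in ancestors(b) or b in ancestors(a))

print(assumed_disjoint("Equus", "Canidae"))                  # True: presumed disjoint
print(assumed_disjoint("Equus ferus caballus", "Mammalia"))  # False: one specializes the other
</code></pre>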
<p>…As we have already remarked, symbolic AI systems other than Cyc often approach speed very differently. Many limit their KB (which is what led to stove-piped <a href="!W">Expert Systems</a>), or they limit the expressiveness of their representation of knowledge, or they limit the types of operations that can be performed on those (ie. they adopt a more limited, but faster, logic.) Eg. they choose knowledge graphs or <a href="!W">propositional logic</a> which does not allow quantifiers, variables, modals, and so on…Cyc also allows multiple redundant representations for each assertion, and in practice it uses multiple redundant, specialized reasoners—Heuristic Level (HL) modules—each of which is much faster than general theorem-proving when it applies.</p>
<p>By 1989, Cyc had 20 such high-level reasoners (<a href="https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/download/842/760" title= "Cyc: A midterm report">Lenat & Guha 1990</a>); today it has over 1,100. For example, one fairly general high-level reasoner is able to quickly handle transitive relations, such as “Is Austin physically located in the Milky Way galaxy?”…That reasoner was extremely general; a more specific one handles the case where a problem can be represented as <em>n</em> linear equations in <em>n</em> unknowns. A fairly narrow Heuristic-Level module recognizes quadratic equations and applies the <a href= "https://en.wikipedia.org/wiki/Quadratic_formula">quadratic formula</a>. Another relatively narrow Heuristic-Level module recognizes a chemical equation that needs <a href= "https://en.wikipedia.org/wiki/Stoichiometry">balancing</a> and calls on a domain-specific algorithm to do that.</p>
<p>When confronted with a problem, all 1,100 reasoners are effectively brought to bear, and the most efficient one which can make progress on it does so, and the process repeats, over and over again, the “conversation” among the 1,100 Heuristic-Level modules continuing until the problem has been solved, or resource bounds have been exceeded (and work suspended on it). In principle there is always the general resolution <a href="https://en.wikipedia.org/wiki/Automated_theorem_proving">theorem prover</a> with its hand raised in the back of the room, so to speak: it always thinks it could apply, but it is the last resort to be called on because it always takes so long to return an answer…Something we don’t often talk about: We noticed empirically that the general theorem-proving reasoner actually took so long that over a million queries in a row that called on it, as a last resort, just timed out. Going back farther, we saw that that had happened for decades. So, about one decade ago, we quietly turned the general theorem prover off, so it never gets called on! The only impact is that Cyc sometimes runs a bit faster, since it no longer has that attractive but useless nuisance available to it.</p>
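<p>[A schematic Python sketch of that control loop, not Cyc’s actual code: module objects with hypothetical <code>applies</code>/<code>step</code>/<code>estimated_cost</code> interfaces take turns, cheapest first, until the problem is solved or the budget runs out; the general theorem prover would simply be the module whose estimated cost is effectively infinite.]</p>
<pre><code>
def solve(problem, modules, budget=1000):
    """Repeatedly let the cheapest applicable reasoner make one step of progress."""
    state = problem
    for _ in range(budget):
        if state.solved():
            return state
        candidates = sorted((m for m in modules if m.applies(state)),
                            key=lambda m: m.estimated_cost)
        if not candidates:
            return None                     # give up; work on the problem is suspended
        state = candidates[0].step(state)   # one step of progress, then re-dispatch
    return None                             # resource bounds exceeded
</code></pre>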
<p>When Cyc is applied to a new practical application, it is sometimes the case that even when it gets the right answers, its current battery of reasoners turns out to be unacceptably slow. In that case, the Cyc team shows to the human experts (who are able to perform the task quickly) Cyc’s step by step reasoning chain and asks them to introspect and explain to us how they are able to avoid such cumbersome reasoning. The result is often a new special-purpose Heuristic-Level reasoner, possibly with its own new, redundant representation which enables it to run so quickly. This is what happened, eg. for a chemical reaction application, where a special notation for chemical equations enabled a special-purpose algorithm to balance them quickly.</p>
<p>The trap the Cyc team fell into was assuming that there would be just one representation for knowledge, in which case it would have to be <em>n</em><sup>th</sup>-order <a href="https://en.wikipedia.org/wiki/Predicate_calculus">predicate calculus</a> (HOL) with <a href= "https://en.wikipedia.org/wiki/Modal_logic">modals</a>, because it is the only one expressive enough for all AGI reasoning purposes. Committing to that meant vainly searching for some fast general-purpose reasoning algorithm over HOL, which probably doesn’t exist. To escape from the trap the Cyc team built up a huge arsenal of redundant representations and redundant reasoners, such that in any given situation one of the efficient reasoners is usually able to operate on one of those representations and make some progress toward a solution. The entire arsenal is then brought to bear again, recursively, until the original problem has been fully dealt with or given up on.</p>
<p>[It sounds like the reason the Cyc company still exists is to serve as an expert-systems/knowledge-graph consultancy/body-shop for its customers, while masquerading as an AI/software company (similar to Palantir).]</p>
---
/doc/ai/anime/2015-saito.pdf
<code>Illustration2Vec</code>: a semantic vector representation of illustrations
Masaki Saito, Yusuke Matsui
2015-11-02
2019-09-30
[("doi","10.1145/2820903.2820907")]
ai/anime ai/nn/cnn ai/nn/retrieval
<p>Referring to existing illustrations helps novice drawers to realize their ideas.</p>
<p>To find such helpful references from a large image collection, we first build a semantic vector representation of illustrations by training <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">convolutional neural networks</a>.</p>
<p>As the proposed vector space correctly reflects the semantic meanings of illustrations, users can efficiently search for references with similar attributes. Besides the search with a single query, a <em>semantic morphing</em> algorithm that searches the intermediate illustrations that gradually connect two queries is proposed.</p>
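<p>[One plausible reading of “semantic morphing”, as a hedged sketch: linearly interpolate between the two queries’ embedding vectors and retrieve the nearest illustration at each step (the paper’s exact algorithm may differ; <code>embeddings</code> is assumed to be a matrix of precomputed, row-normalized Illustration2Vec vectors).]</p>
<pre><code>
import numpy as np

def semantic_morph(z_a, z_b, embeddings, steps=5):
    """Indices of illustrations gradually connecting query A to query B."""
    path = []
    for t in np.linspace(0.0, 1.0, steps):
        q = (1.0 - t) * z_a + t * z_b        # interpolated semantic vector
        q = q / np.linalg.norm(q)
        sims = embeddings @ q                # cosine similarity against the collection
        path.append(int(np.argmax(sims)))
    return path
</code></pre>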
<p>Several experiments were conducted to demonstrate the effectiveness of our methods.</p>
<p>[<strong>Keywords</strong>: illustration, CNNs, visual similarity, search, embedding]</p>
---
https://google.github.io/cartoonset/
Cartoon Set
Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy
2018-07-10
2021-06-28
ai/anime ai/dataset ai/nn/gan
<p><strong>Cartoon Set</strong> is a collection of random, 2D cartoon avatar images. The cartoons vary in 10 artwork categories, 4 color categories, and 4 proportion categories, with a total of ~10<sup>13</sup> possible combinations. We provide sets of 10k and 100k randomly chosen cartoons and labeled attributes.</p>
<p>…We’ve also used the dataset to research cross-domain image translation: <a href="https://arxiv.org/abs/1711.05139#google" title="Royer et al 2017">“XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings”</a>.</p>
<figure>
<img src="/doc/ai/nn/gan/stylegan/2017-royer-cartoonset-randomsamples.png" alt="[6 random samples from the ‘Cartoon Set’ of synthetic cartoon avatar faces, developed by Google.]" />
<figcaption aria-hidden="true">[6 random samples from the ‘Cartoon Set’ of synthetic cartoon avatar faces, developed by Google.]</figcaption>
</figure>
---
https://arxiv.org/abs/1809.00946
Twin-GAN: Unpaired Cross-Domain Image Translation with Weight-Sharing GANs
Jerry Li
2018-08-26
2021-04-03
[("doi","10.48550/arXiv.1809.00946")]
ai/anime ai/nn/gan
<p>We present a framework [<strong>Twin-GAN</strong>] for translating unlabeled images from one domain into analog images in another domain.</p>
<p>We employ a progressively growing skip-connected encoder-generator structure and train it with a <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> loss for realistic output, a cycle consistency loss for maintaining same-domain translation identity, and a semantic consistency loss that encourages the network to keep the input semantic features in the output.</p>
<p>We apply our framework on the task of translating face images, and show that it is capable of learning semantic mappings for face images with no supervised one-to-one image mapping.</p>
---
/doc/ai/anime/2019-yu.pdf
Generating Furry Face Art from Sketches using a GAN
Andrew Yu
2019-12-01
2019-12-01
ai/anime ai/nn/gan ai/nn/vae
<p>I generate <a href="https://en.wikipedia.org/wiki/Furry_fandom">furry</a> face artwork from color sketches.</p>
<p>The sketches are procedurally generated from a data set of furry artwork. Sketches are translated back into artwork via a Generative Adversarial Network.</p>
<p>I implement the <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> using a <a href="https://en.wikipedia.org/wiki/U-Net">U-Net</a> autoencoder with encoder-decoder skip connections and experiment with adding adaptive instance normalization into upsampling layers.</p>
<p>The results show effective mapping of training and dev set sketches back to their input style. However, the model does not perform as effectively on novel user sketches and often fails to add stochastic textures like hair details.</p>
---
https://arxiv.org/abs/1912.11570
SketchTransfer: A Challenging New Task for Exploring Detail-Invariance and the Abstractions Learned by Deep Networks
Alex Lamb, Sherjil Ozair, Vikas Verma, David Ha
2019-12-25
2021-04-11
[("doi","10.48550/arXiv.1912.11570")]
ai/anime
<p>Deep networks have achieved excellent results in perceptual tasks, yet their ability to generalize to variations not seen during training has come under increasing scrutiny. In this work, we focus on their ability to have invariance towards the presence or absence of details. For example, humans are able to watch cartoons, which are missing many visual details, without being explicitly trained to do so. As another example, <a href="https://en.wikipedia.org/wiki/3D_rendering">3D rendering software</a> is a relatively recent development, yet people are able to understand such rendered scenes even though they are missing details (consider a film like <em>Toy Story</em>). The failure of machine learning algorithms to do this indicates a substantial gap in generalization between human abilities and the abilities of deep networks.</p>
<p>We propose a dataset that will make it easier to study the detail-invariance problem concretely. We produce a concrete task for this: SketchTransfer, and we show that state-of-the-art domain transfer algorithms still struggle with this task.</p>
<p>The state-of-the-art technique which achieves over 95% on <a href="https://en.wikipedia.org/wiki/MNIST_database">MNIST</a> → <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37648.pdf">SVHN</a> transfer only achieves 59% accuracy on the SketchTransfer task, which is much better than random (11% accuracy) but falls short of the 87% accuracy of a classifier trained directly on labeled sketches. This indicates that this task is approachable with today’s best methods but has substantial room for improvement.</p>
---
/doc/ai/anime/2020-mobini.pdf
StarGAN Based Facial Expression Transfer for Anime Characters
Majid Mobini, Foad Ghaderi
2020-01-02
2020-01-02
[("doi","10.1109/csicc49403.2020.9050061")]
ai/anime ai/nn/gan
<p>Human facial expression transfer has been well explored using Generative Adversarial Networks. In the case of anime-style images as well, several successful attempts have been made to generate high-quality anime face images using the <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> approach. However, the task of anime facial expression transfer is not yet well studied, due to the lack of a clean labeled anime dataset.</p>
<p>We address this issue from both data and model perspectives, by providing a clean labeled anime dataset and leveraging the <a href="https://arxiv.org/abs/1711.09020" title="‘StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation’, Choi et al 2017">StarGAN</a> image-to-image translation framework. Our collected dataset consists of about 5k high-quality anime face images covering 5 major emotions, collected from online image boards. We preprocessed our dataset with the <a href="https://openaccess.thecvf.com/content_ECCVW_2018/papers/11133/Li_CARN_Convolutional_Anchored_Regression_Network_for_Fast_and_Accurate_Single_ECCVW_2018_paper.pdf" title="‘CARN: Convolutional Anchored Regression Network for Fast and Accurate Single Image Super-Resolution’, Li et al 2018">CARN super-resolution technique</a> to improve image quality, and applied a tuned StarGAN model to learn the mapping of an input anime image with arbitrary expression to the target expression.</p>
<p>We evaluate our work by visually comparing the output translated results with the baseline model. Moreover, we provide a quantitative analysis of our proposed approach by computing the <a href="https://en.wikipedia.org/wiki/Confusion_matrix">confusion matrix</a> of expression transfer accuracy.</p>
<p>[<strong>Keywords</strong>: facial expression transfer, unpaired image translation, Generative Adversarial Network, anime generation]</p>
---
https://github.com/arfafax/E621-Face-Dataset
E621 Face Dataset
Arfafax
2020-02-18
2021-06-22
ai/anime ai/nn/gan
<p>Tool for getting the dataset of cropped faces from [<a href="https://en.wikipedia.org/wiki/Furry_fandom">furry</a> booru] <a href="https://e621.net/posts">e621</a> (NSFW; <a href="https://en.wikifur.com/wiki/E621">WikiFur description</a>). It was created by training a <a href="https://arxiv.org/abs/1804.02767" title="‘YOLOv3: An Incremental Improvement’, Redmon & Farhadi 2018">YOLOv3</a> network on annotated facial features from about 1500 faces.</p>
<p>The total dataset includes ~186k faces. Rather than provide the cropped images, this repo contains CSV files with the <a href="https://en.wikipedia.org/wiki/Minimum_bounding_box">bounding boxes</a> of the detected features from my trained network, and a script to download the images from e621 and crop them based on these CSVs.</p>
<p>The CSVs also contain a subset of tags, which could potentially be used as labels to train a conditional <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>.</p>
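<p>[A hypothetical, minimal version of the workflow described above (the repo’s actual script is <code>get_faces.py</code>, and the CSV column names here are assumptions): read image URLs and face bounding boxes from a CSV, download each image, and save the cropped face.]</p>
<pre><code>
import csv, io, urllib.request
from PIL import Image

def crop_faces(csv_path, out_dir):
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            data = urllib.request.urlopen(row["url"]).read()
            img = Image.open(io.BytesIO(data)).convert("RGB")
            box = tuple(int(float(row[k])) for k in ("x1", "y1", "x2", "y2"))
            img.crop(box).save(f"{out_dir}/face_{i:06d}.jpg")   # box = (left, upper, right, lower)
</code></pre>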
<table style="width:99%;">
<colgroup>
<col style="width: 30%" />
<col style="width: 69%" />
</colgroup>
<thead>
<tr class="header header header">
<th style="text-align: left;">File</th>
<th style="text-align: left;"></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">get_faces.py</td>
<td style="text-align: left;">Script for downloading base e621 files and cropping them based on the coordinates in the CSVs.</td>
</tr>
<tr class="even">
<td style="text-align: left;">faces_s.csv</td>
<td style="text-align: left;">CSV containing URLs, bounding boxes, and a subset of the tags for 90k cropped faces with rating=safe from e621.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">features_s.csv</td>
<td style="text-align: left;">CSV containing the bounding boxes for 389k facial features with rating=safe from e621.</td>
</tr>
<tr class="even">
<td style="text-align: left;">faces_q.csv</td>
<td style="text-align: left;">CSV containing URLs, bounding boxes, and a subset of the tags for 96k cropped faces with rating=questionable from e621.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">features_q.csv</td>
<td style="text-align: left;">CSV containing the bounding boxes for 400k facial features with rating=questionable from e621.</td>
</tr>
</tbody>
</table>
<figure>
<img src="/doc/ai/nn/gan/2020-arfa-e621facedataset-cleaned-9x9previewgrid.jpg" alt="Preview grid" />
<figcaption aria-hidden="true">Preview grid</figcaption>
</figure>
---
https://www.equestriadaily.com/2020/03/pony-voice-event-what-people-forced.html
Pony Voice Event—What People Forced Ponies to Say!
Equestria Daily
2020-03-24
2021-12-18
ai/anime ai/music anime/my-little-pony
<p>[Compilation of 29 videos & ~25 audio files created using a new neural network service for voice synthesis of various characters, particularly <em>My Little Pony</em> characters.</p>
<p>Scripts include everything from every <em>Star Wars</em> opening to F1 car racing commentary to the <a href="https://en.wikipedia.org/wiki/Who%27s_on_First%3F">“Who’s on First?”</a> Abbott & Costello comedy dialogue to a 1-hour recitation of π to the <em>Dune</em> Litany Against Fear & the <em>Blade Runner</em> Tears in the Rain monologue.]</p>
---
https://arxiv.org/abs/2011.01007
The 2020s Political Economy of Machine Translation
Steven Weber
2020-11-02
2021-04-24
[("doi","10.48550/arXiv.2011.01007")]
ai/anime economics/automation
<p>This paper explores the hypothesis that the diversity of human languages, right now a barrier to ‘interoperability’ in communication and trade, will become substantially less of a barrier as machine translation technologies are deployed over the next several years. I argue that machine translation will do for ideas in the 2020s what container shipping did for goods trade in the second half of the 20<sup>th</sup> century. But as with container shipping or railroads in the 19<sup>th</sup> century, this new boundary-breaking technology does not reduce all boundaries equally, and it creates new challenges for the distribution of ideas and thus for innovation and economic growth. How we develop, license, commercialize, and deploy machine translation will be a critical determinant of its impact on trade, political coalitions, diversity of thought and culture, and the distribution of wealth.</p>
<p>[<strong>Keywords</strong>: trade, globalization, machine translation, inequality, productivity]</p>
---
/doc/design/typography/2022-chung.pdf
Fast Text Placement Scheme for ASCII Art Synthesis
Moonjun Chung, Taesoo Kwon
2022-04-14
2022-12-15
[("doi","10.1109/ACCESS.2022.3167567")]
ai/anime cs/algorithm design/typography
<p>This study suggests an algorithm that creates <a href="https://en.wikipedia.org/wiki/ASCII_art">ASCII art</a> from a binary image. Our approach aims to generate <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> art in a short period of time using multi-threaded local optimizations for a text placement method instead of a global optimization.</p>
<p>To generate ASCII art from various images, the original image is first converted into a thinned black and white image suitable for generating ASCII art. We then extract the pixel orientations from the input image and introduce a character similarity scheme that considers these orientations. We also propose a novel text placement algorithm to complete ASCII art in a swift manner. Our final system suggested here can generate ASCII art using a variety of proportional fonts.</p>
<p>The results of the experiments of this study show that the suggested system can generate ASCII art much faster than existing state-of-the-art techniques using <a href="!W">proportional fonts</a>.</p>
---
/doc/ai/anime/2022-sankalpa.pdf
Using Generative Adversarial Networks for Conditional Creation of Anime Posters
Donthi Sankalpa, Jayroop Ramesh, Imran Zualkernan
2022-07-28
2022-11-17
[("doi","10.1109/IAICT55358.2022.9887491")]
ai/anime ai/nn/gan
<p>Japanese animation, known as anime, has become one of the most accessible forms of entertainment across the globe. Recent advances in generative adversarial networks (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>) and deep learning have contributed greatly to multiple interesting applications in the domain of anime, particularly in face generation, <a href="https://arxiv.org/abs/2109.03910#google" title="‘A Recipe For Arbitrary Text Style Transfer with Large Language Models’, Reif et al 2021">style transfer</a>, and colorization. However, there are no existing implementations for generating composite anime posters with a genre accompaniment prompt.</p>
<p>This work proposes a novel application of genre to anime poster generation conditioned on <a href="https://arxiv.org/abs/1810.04805#google" title="‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’, Devlin et al 2018">BERT</a>-tokenized binary genre-tags of light-hearted or heavy-hearted, categorized based on the thematic subject content of the medium. A dataset of 9,840 images with genre tags and synopses was constructed by scraping MyAnimeList.</p>
<p>The conditional Deep Convolution GAN with Spectral Normalization produced the best posters, achieving the quantitative scores of <a href="https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance">FID</a>: 90.17, average IS: 3.505, 1KNN with PSNR: 0.445 across inter-label discernibility, and FID: 166.4, across genuine versus generated poster distinguishability. [Terrible. This is far worse than <a href="https://thisanimedoesnotexist.ai/" title="‘This Anime Does Not Exist.ai (TADNE)’, Nearcyan et al 2021">TADNE</a> or <a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">StyleGAN</a> Danbooru2019, or even Make Girls.moe, never mind <a href="https://waifulabs.com/">Waifu Labs</a> or Crypko or Stable Diffusion…] The primary contribution of this work is to present results outlining the feasibility of various GAN architectures in synthesizing controllable and complex composite anime posters.</p>
<p>The larger implication of this project is to provide an introductory approach showing the promise of a creativity assistant for authors, artists, and animators, where they can simply enter a key phrase representing a concept they have in mind, to generate a baseline idea as an initial phase.</p>
<p>[<strong>Keywords</strong>: anime, computer generated art, deep learning, generative adversarial networks, image generation]</p>
---
/doc/ai/anime/2022-huang-2.pdf
Deep learning for image colorization: Current and future prospects
Shanshan Huang, Xin Jin, Qian Jiang, Li Liu
2022-09-01
2022-09-01
[("doi","10.1016/j.engappai.2022.105006")]
ai/anime
<p>Image colorization, as an essential problem in computer vision (CV), has attracted increasing attention from researchers in recent years, especially deep learning-based image colorization techniques (DLIC).</p>
<p>Generally, most recent image colorization methods can be regarded as knowledge-based systems because they are usually trained by big datasets. Unlike the existing reviews, this paper adopts a unique deep learning-based perspective to review the latest progress in image colorization techniques systematically and comprehensively.</p>
<p>In this paper, a comprehensive review of recent DLIC approaches from algorithm classification to existing challenges is provided to facilitate researchers’ in-depth understanding of DLIC. In particular, we review DLIC algorithms from various perspectives, including color space, network structure, <a href="https://en.wikipedia.org/wiki/Loss_function">loss function</a>, level of automation, and application fields. Furthermore, other important issues are discussed, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we discuss several open issues of image colorization and outline future research directions.</p>
<p>This survey can serve as a reference for researchers in image colorization and related fields.</p>
<p>[<strong>Keywords</strong>: image colorization, deep learning, <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">convolutional neural network</a>, Generative Adversarial Network, <a href="https://arxiv.org/abs/1706.03762#google" title="‘Attention Is All You Need’, Vaswani et al 2017">Transformer</a>]</p>
---
/doc/ai/anime/2023-shen.pdf
Overview of Cartoon Face Generation
Xianfa Shen, Sujie Lei, Jiansong Liu
2023-02-24
2023-05-29
[("doi","10.1109/ITNEC56291.2023.10082673")]
ai/anime ai/nn/gan
<p>As a computer art form, animation stylization of human face images is widely used in every aspect of daily life. From children’s animation education books to classic animation works, the animation style not only attracts children with its charming art form but also encourages their interest in exploration. In addition, animation production is widely used in online games: scenes and characters in games are often drawn in an animation style, which can reduce the production cost of games and the memory requirements of computers.</p>
<p>In social entertainment, more and more people are turning self-portraits into an animation style for their social network profile pictures, which can not only attract the attention of others but also protect the privacy of the portrait. However, drawing cartoon portraits by hand is very laborious and requires considerable artistic skill, even with photo editing software. Therefore, how to perform face cartoonization efficiently and with high quality is an important issue.</p>
<p>This article describes the development of face cartoonization; gives its applications; lists the commonly used datasets; discusses face cartoonization methods from 3 aspects; and finally surveys future research directions and development trends in terms of datasets, generated-image definition, generated-image detail, and model training time.</p>
<p>[<strong>Keywords</strong>: face cartoonization, unsupervised learning, deep learning, generative adversarial network]</p>
---
/doc/ai/anime/2023-ho.pdf
Abstraction-Perception Preserving Cartoon Face Synthesis
Sy-Tuyen Ho, Manh-Khanh Ngo Huu, Thanh-Danh Nguyen, Nguyen Phan, Vinh-Tiep Nguyen, Thanh Duc Ngo, Duy-Dinh Le, Tam V. Nguyen
2023-03-22
2023-06-02
[("doi","10.1007/s11042-023-14853-9")]
ai/anime ai/dataset ai/nn/gan
<p>Portrait cartoonization aims at translating a <a href="https://en.wikipedia.org/wiki/Portrait_photography">portrait image</a> to its cartoon version, which guarantees two conditions, namely, reducing textural details and synthesizing cartoon facial features (eg. big eyes or line-drawing nose).</p>
<p>To address this problem, we propose a two-stage training scheme based on <a href= "https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>, which is powerful for stylization problems.</p>
<p>The abstraction stage with a novel abstractive loss is used to reduce textural details. Meanwhile, the perception stage is adopted to synthesize cartoon facial features.</p>
<p>To comprehensively evaluate the proposed method and other state-of-the-art methods for portrait cartoonization, we contribute a new challenging large-scale dataset named <strong>CartoonFace10K</strong>…we access the website of <a href="https://www.anime-planet.com/">Anime-Planet</a> to collect 50,245 images of cartoon characters. We use a gender filter to collect male and female characters separately. Secondly, a cartoon facial detector is leveraged to remove non-human images, eg. the character of Doraemon or Pikachu. Following the removal stage, there are 14,021 cartoon human face images. To enhance confidence, we do a manual check across all images to ensure the purity of our proposed dataset. [They don't explain why <a href="/crop#portraits">Danbooru Portraits</a> or one of the many other face-crop datasets wouldn't've worked…]</p>
<p>In addition, we find that the popular metric <a href="https://en.wikipedia.org/wiki/Fr%C3%A9chet_Inception_Distance">FID</a> focuses on the target style yet ignores the preservation of the input image content. We thus introduce a novel metric, <strong>FISI</strong>, which combines <a href="https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance">FID</a> and <a href="https://en.wikipedia.org/wiki/Structural_similarity">SSIM</a> to focus on both target features and the retention of input content.</p>
<p>Quantitative and qualitative results demonstrate that our proposed method outperforms other state-of-the-art methods.</p>
---
/doc/ai/anime/danbooru/2018-zhang-2.pdf
Two-stage Sketch Colorization
Lvmin Zhang, Chengze Li, Tien-Tsin Wong, Yi Ji, Chunping Liu
2018
2019-10-01
[("doi","10.1145/3272127.3275090")]
ai/anime/danbooru ai/nn/gan
<p>[<a href="https://github.com/lllyasviel/style2paints" title="‘Style2Paints GitHub repository’, Zhang et al 2018">style2paints</a> v3] Sketch or line art colorization is a research field with substantial market demand. Different from photo colorization which strongly relies on texture information, sketch colorization is more challenging as sketches may not have texture. Even worse, color, texture, and gradient have to be generated from the abstract sketch lines.</p>
<p>In this paper, we propose a semi-automatic learning-based framework to colorize sketches with proper color and texture as well as gradient. Our framework consists of two stages. In the first drafting stage, our model guesses color regions and splashes a rich variety of colors over the sketch to obtain a color draft. In the second refinement stage, it detects the unnatural colors and artifacts, and tries to fix and refine the result. Compared to existing approaches, this two-stage design effectively divides the complex colorization task into two simpler and goal-clearer subtasks. This eases the learning and raises the quality of colorization.</p>
<p>Our model resolves the artifacts such as water-color blurring, color distortion, and dull textures.</p>
<p>We build an interactive software based on our model for evaluation. Users can iteratively edit and refine the colorization.</p>
<p>We evaluate our learning model and the interactive system through an extensive user study.</p>
<p>Statistics show that our method outperforms state-of-the-art techniques and industrial applications in several aspects, including visual quality, degree of user control, user experience, and other metrics.</p>
---
https://github.com/lllyasviel/style2paints
Style2Paints GitHub repository
Lvmin Zhang, Chengze Li, Tien-Tsin Wong, Yi Ji, Chunping Liu
2018-05-04
2021-06-25
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p>GitHub repo with screenshot samples of <em>style2paints</em>, a neural network for colorizing anime-style illustrations (trained on Danbooru2018), with or without user color hints, which was available as an online service in 2018. <a href="https://github.com/lllyasviel/style2paints" title="‘Style2Paints GitHub repository’, Zhang et al 2018">style2paints</a> produces high-quality colorizations often on par with human colorizations. Many examples can be seen on <a href="https://x.com/iliiliiillillii">Twitter</a> or the <a href="https://en.wikipedia.org/wiki/Github">GitHub</a> repo:</p>
<figure>
<img src="/doc/ai/nn/gan/stylegan/anime/2018-zhang-style2paints-colorizationexample-hana.jpg" alt="Example style2paints colorization of a character from Prison School" />
<figcaption aria-hidden="true">Example style2paints colorization of a character from <em>Prison School</em></figcaption>
</figure>
<p>style2paints has been described in more detail in <a href="/doc/ai/anime/danbooru/2018-zhang-2.pdf">“Two-Stage Sketch Colorization”</a>, Zhang et al 2018:</p>
<blockquote>
<p>Sketch or line art colorization is a research field with substantial market demand. Different from photo colorization which strongly relies on texture information, sketch colorization is more challenging as sketches may not have texture. Even worse, color, texture, and gradient have to be generated from the abstract sketch lines. In this paper, we propose a semi-automatic learning-based framework to colorize sketches with proper color, texture as well as gradient. Our framework consists of two stages. In the first drafting stage, our model guesses color regions and splashes a rich variety of colors over the sketch to obtain a color draft. In the second refinement stage, it detects the unnatural colors and artifacts, and try to fix and refine the result. Comparing to existing approaches, this two-stage design effectively divides the complex colorization task into two simpler and goal-clearer subtasks. This eases the learning and raises the quality of colorization. Our model resolves the artifacts such as water-color blurring, color distortion, and dull textures.</p>
<p>We build an interactive software based on our model for evaluation. Users can iteratively edit and refine the colorization. We evaluate our learning model and the interactive system through an extensive user study. Statistics shows that our method outperforms the state-of-art techniques and industrial applications in several aspects including, the visual quality, the ability of user control, user experience, and other metrics.</p>
</blockquote>
---
https://www.thiswaifudoesnotexist.net/
ThisWaifuDoesNotExist.net
Gwern
2019-02-19
2022-05-05
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p><a href="https://www.thiswaifudoesnotexist.net/"><code>ThisWaifuDoesNotExist.net</code></a> (<a href="/twdne" title="‘This Waifu Does Not Exist’, Gwern 2019">TWDNE</a>) is a static website which uses JS to display random <a href="/face" title="‘Making Anime Faces With StyleGAN’, Gwern 2019">anime faces generated by StyleGAN</a> neural networks, along with <a href="/gpt-3" title="‘GPT-3 Creative Fiction’, Gwern 2020">GPT-3</a>-generated anime plot summaries. Followups: <a href="https://thisponydoesnotexist.net/" title="‘This Pony Does Not Exist’, Arfafax 2020">“This Pony Does Not Exist” (TPDNE)</a>/<a href="https://www.thisfursonadoesnotexist.com/" title="‘This Fursona Does Not Exist’, Arfafax 2020">“This Fursona Does Not Exist” (TFDNE)</a>/<a href="https://thisanimedoesnotexist.ai/" title="‘This Anime Does Not Exist.ai (TADNE)’, Nearcyan et al 2021">“This Anime Does Not Exist” (TADNE)</a>.</p>
<figure>
<img src="/doc/ai/nn/gan/stylegan/anime/thiswaifudoesnotexist.png" alt="A screenshot of “This Waifu Does Not Exist” (TWDNE) showing a random StyleGAN-generated anime face and a random GPT-3 text sample conditioned on anime keywords/phrases." />
<figcaption aria-hidden="true">A screenshot of “This Waifu Does Not Exist” (TWDNE) showing a random <a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">StyleGAN</a>-generated anime face and a random <a href="https://arxiv.org/abs/2005.14165#openai" title="‘GPT-3: Language Models are Few-Shot Learners’, Brown et al 2020">GPT-3</a> text sample conditioned on anime keywords/phrases.</figcaption>
</figure>
---
/doc/ai/anime/danbooru/2019-dai.pdf
SAN: Second-Order Attention Network for Single Image Super-Resolution
Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, Lei Zhang
2019-06-15
2019-09-14
[("doi","10.1109/CVPR.2019.01132")]
ai/anime/danbooru
<p>Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and obtained remarkable performance. However, most of the existing <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">CNN</a>-based SISR methods mainly focus on wider or deeper architecture design, neglecting to explore the feature correlations of intermediate layers, hence hindering the representational power of CNNs.</p>
<p>To address this issue, in this paper, we propose a <strong>second-order attention network</strong> (SAN) for more powerful feature expression and feature correlation learning. Specifically, a novel trainable second-order channel attention (SOCA) module is developed to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Furthermore, we present a non-locally enhanced residual group (NLRG) structure, which not only incorporates non-local operations to capture long-distance spatial contextual information, but also contains repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations.</p>
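<p>[A simplified NumPy sketch of the covariance (“second-order”) channel-attention idea: summarize each channel by its row of the channel covariance matrix, pass the summaries through a small bottleneck gate, and rescale the channels. The actual SOCA module additionally normalizes the covariance matrix and uses learned weights; the random weights here merely stand in for trained parameters.]</p>
<pre><code>
import numpy as np

def second_order_channel_attention(feat, reduction=4, seed=0):
    """feat: array of shape (C, H, W); returns channel-rescaled features."""
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)
    X = X - X.mean(axis=1, keepdims=True)
    cov = (X @ X.T) / (H * W)            # C x C channel covariance (second-order statistics)
    stats = cov.mean(axis=1)             # one second-order descriptor per channel
    rng = np.random.default_rng(seed)    # stand-ins for learned bottleneck weights
    W1 = 0.01 * rng.standard_normal((C // reduction, C))
    W2 = 0.01 * rng.standard_normal((C, C // reduction))
    gate = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ stats, 0.0))))   # sigmoid(MLP)
    return feat * gate[:, None, None]    # rescale each channel by its attention weight
</code></pre>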
<p>Experimental results demonstrate the superiority of our SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.</p>
---
https://waifulabs.com/
Waifu Labs
Sizigi Studios
2019-07-23
2021-11-13
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p>[<a href="https://waifulabs.com/">Waifu Labs</a> is an interactive website for generating (1024px?) anime faces using a customized <a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">StyleGAN</a> trained on <a href="/danbooru2021" title="‘Danbooru2021: A Large-Scale Crowdsourced & Tagged Anime Illustration Dataset’, Gwern 2015">Danbooru2018</a>. Similar to <a href="https://www.artbreeder.com/">Artbreeder</a>, it supports face exploration and face editing, and at the end, a user can purchase prints of a particular face.]</p>
<p>We taught a world-class artificial intelligence how to draw anime. All the drawings you see were made by a non-human artist! Wild, right? It turns out machines love waifus almost as much as humans do.</p>
<p>We proudly present the next chapter of human history: lit waifu commissions from the world’s smartest AI artist. In less than 5 minutes, the artist learns your preferences to make the perfect waifu just for you.</p>
---
https://waifulabs.com/blog/ax
How we built the Waifu Vending Machine
Sizigi Studios
2019-07-23
2021-11-13
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p>[Design company Sizigi Studios discusses their creation of <a href="https://waifulabs.com/">Waifu Labs</a>, a deep learning <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> website for interactive generation of anime faces, and their experience running a prototype of it at the <a href="!W">Anime Expo</a> (AX) 2019 anime convention in Los Angeles, where it was a popular exhibit. Laptops were setup attached to printers in an enclosed booth, making a ‘vending machine’.</p>
<p>Challenges included: no electricity outlets and no WiFi. Multiple laptops were cycled through as batteries wore out, while a gaming PC ran the neural network GANs locally rather than in a cloud VM. The failed WiFi was bypassed by using a smartphone as a local router.</p>
<p>Further bugs were discovered in the code while many users waited in a long line, but they were fixed in time, and the waifu vending machine was a success.]</p>
---
https://www.engineeringletters.com/issues_v27/issue_3/EL_27_3_01.pdf
Anime Sketch Coloring with Swish-gated Residual U-net and Spectrally Normalized GAN (SSN-GAN)
Gang Liu, Xin Chen, Yanzhong Hu
2019-08-12
2021-02-24
ai/anime/danbooru ai/nn/gan
<p>Anime sketch coloring fills various colors into black-and-white anime sketches to obtain color anime images. Recently, anime sketch coloring has become a new research hotspot in the field of deep learning. In anime sketch coloring, generative adversarial networks (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>) have been used to design appropriate coloring methods and have achieved some results. However, the existing GAN-based methods generally produce low-quality coloring, such as unreasonable color mixing and poor color gradients.</p>
<p>In this paper, an efficient anime sketch coloring method using swish-gated residual <a href="https://en.wikipedia.org/wiki/U-Net">U-Net</a> (SGRU) and spectrally normalized GAN (SNGAN) has been proposed to solve the above problems.</p>
<p>The proposed method is called spectrally normalized GAN with swish-gated residual U-Net (<strong>SSN-GAN</strong>). In SSN-GAN, SGRU is used as the generator: a U-Net with the proposed swish layers and swish-gated residual blocks (SGBs), which effectively filter the information transmitted by each level and improve the performance of the network. The perceptual loss and the per-pixel loss constitute the final loss of SGRU. The discriminator of SSN-GAN uses spectral normalization as a stabilizer of GAN training, and it is also used as the perceptual network for calculating the perceptual loss. SSN-GAN can automatically color the sketch without any coloring hints provided in advance and can be easily trained <a href="/doc/cs/end-to-end-principle/index">end-to-end</a>.</p>
<p>Experimental results show that our method performs better than other state-of-the-art coloring methods, and can obtain colorful anime images with higher visual quality.</p>
---
https://www.artbreeder.com/
Artbreeder
Joel Simon
2019-09-09
2021-11-22
ai/anime/danbooru ai/nn/gan/biggan ai/nn/gan/stylegan/anime
<p>[<a href="https://www.artbreeder.com/">Artbreeder</a> is an interactive <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> generator website. Originally named “Ganbreeder” and providing only the 256px <a href="https://arxiv.org/abs/1809.11096#deepmind" title="‘BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis’, Brock et al 2018">BigGAN</a> generator, it now provides a variety of BigGAN & <a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">StyleGAN</a> models, including the anime portrait StyleGAN model. (It is more general than the similar <a href="https://waifulabs.com/">Waifu Labs</a>, but my anime model is not as good.)</p>
<p>Users can generate random samples and explore slight variants of them to gradually explore the “latent space” and find interesting images, but they can also edit images more directly, upload existing images to find the most similar image produced by the model, etc. A popular website, it has generated >56m images from September 2019 to January 2020.]</p>
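<p>[A minimal sketch of that “explore slight variants” loop, assuming a hypothetical generator <code>G</code> that maps latent vectors to images: perturb the current latent with small noise, render the variants, and keep whichever one the user picks.]</p>
<pre><code>
import numpy as np

def explore(G, z, pick, n_variants=6, step=0.3, rounds=10):
    """Hill-climb through latent space under user guidance; `pick` chooses among rendered images."""
    rng = np.random.default_rng()
    for _ in range(rounds):
        variants = [z + step * rng.standard_normal(z.shape) for _ in range(n_variants)]
        images = [G(v) for v in variants]
        z = variants[pick(images)]       # the user's choice becomes the new starting point
    return z
</code></pre>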
---
/doc/ai/anime/danbooru/2019-lee-2.pdf
Unpaired Sketch-to-Line Translation via Synthesis of Sketches
Gayoung Lee, Dohyun Kim, Youngjoon Yoo, Dongyoon Han, Jung-Woo Ha, Jaehyuk Chang
2019-11-17
2019-11-17
[("doi","10.1145/3355088.3365163")]
ai/anime/danbooru
<p>Converting hand-drawn sketches into clean line drawings is a crucial step for diverse artistic works such as comics and product designs. Recent data-driven methods using deep learning have shown great ability to automatically simplify sketches on raster images. Since it is difficult to collect or generate paired sketch and line images, lack of training data is a main obstacle to using these models.</p>
<p>In this paper, we propose a training scheme that requires only unpaired sketch and line images for learning sketch-to-line translation. To do this, we first generate realistic paired sketch and line images from unpaired sketch and line images using rule-based line augmentation and unsupervised texture conversion. Next, with our synthetic paired data, we train a model for sketch-to-line translation using supervised learning.</p>
<p>Compared to unsupervised methods that use cycle consistency losses, our model shows better performance at removing noisy strokes. We also show that our model simplifies complicated sketches better than models trained on a limited number of handcrafted paired data.</p>
---
/doc/ai/anime/danbooru/2019-ye.pdf
Interactive Anime Sketch Colorization with Style Consistency via a Deep Residual Neural Network
Ru-Ting Ye, Wei-Li Wang, Ju-Chin Chen, Kawuu W. Lin
2019-11-21
2019-11-21
[("doi","10.1109/taai48200.2019.8959911")]
ai/anime/danbooru ai/nn/gan
<p>Anime line sketch colorization fills an anime sketch with a variety of colors, to make it colorful and diverse. The coloring problem is not a new research direction in the field of deep learning. Because an anime sketch has no fixed colors, and texture or shadow cannot be used as a reference, the task is difficult to learn and lacks a clear standard for judging whether a result is correct. After generative adversarial networks (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>) were proposed, some researchers used GANs for colorization and achieved some results, but the coloring quality remains limited.</p>
<p>This study proposes a method that uses a deep <a href="https://arxiv.org/abs/1512.03385#microsoft" title="‘Deep Residual Learning for Image Recognition’, He et al 2015">residual network</a> with an added discriminator, with the expectation that the colors of the colorized images will be consistent with the colors desired by the user and that good coloring results can be achieved.</p>
---
/doc/ai/anime/danbooru/2020-akita.pdf
Deep-Eyes: Fully Automatic Anime Character Colorization with Painting of Details on Empty Pupils
Kenta Akita, Yuki Morimoto, Reiji Tsuruno
2020-01-01
2020-01-01
[("doi","10.2312/egs.20201023")]
ai/anime/danbooru ai/nn/cnn ai/nn/gan
<p>[<a href="/doc/ai/anime/danbooru/2020-akita-2.pdf" title="‘Colorization of Line Drawings with Empty Pupils’, Akita et al 2020b">followup</a>] Many studies have recently applied deep learning to the automatic colorization of line drawings. However, it is difficult to paint empty pupils using existing methods because the networks are trained with pupils that have edges, which are generated from color images using image processing. Most actual line drawings have empty pupils that artists must paint in.</p>
<p>In this paper, we propose a novel network model that transfers the pupil details in a reference color image to input line drawings with empty pupils.</p>
<p>We also propose a method for accurately and automatically coloring eyes. In this method, eye patches are extracted from a reference color image and automatically added to an input line drawing as color hints using our eye position estimation network.</p>
---
/doc/ai/anime/danbooru/2020-zhelonkin.pdf
Training Effective Model for Real-Time Detection of NSFW Photos and Drawings
Dmitry Zhelonkin, Nikolay Karpov
2020-02-02
2020-02-02
[("doi","10.1007/978-3-030-39575-9_31")]
ai/anime/danbooru
<p>Convolutional Neural Networks (<a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">CNN</a>) show state-of-the-art results on a variety of tasks.</p>
<p>The paper presents a scheme for preparing a highly accurate (97% on the test set) and fast CNN for detecting not-suitable-or-safe-for-work (<a href="https://en.wikipedia.org/wiki/Not_safe_for_work">NSFW</a>) images. The present research focuses on questions concerning the identification of NSFW pictures containing nudity by neural networks. One of the main features of the present work is that it considers the NSFW class of images not only in terms of natural human nudity but also includes cartoons and other drawn pictures containing obscene depictions of the primary sexual characteristics. Another important issue considered is collecting a representative dataset for the problem.</p>
<p>The research includes a review of existing nudity detection methods, covering both traditional machine learning techniques and more recent neural-network-based approaches. In addition, several important problems in NSFW picture filtering are considered in the study.</p>
<p>[<strong>Keywords</strong>: image recognition, pattern recognition, Not Suitable or Safe For Work, Convolutional Neural Networks, pornography detection]</p>
---
/doc/ai/anime/danbooru/2020-su.pdf
Avatar Artist Using GAN [CS230]
Hui Su, Jin Fang
2020-04-12
2020-04-12
ai/anime/danbooru ai/nn/gan
<p>[CS230 class project; <a href="https://github.com/diandiansu/anime-artist">code</a>] Human sketches can be expressive and abstract at the same time. Generating anime avatars from simple or even bad face drawing is an interesting area. Lots of related work has been done such as auto-coloring sketches to anime or transforming real photos to anime. However, there aren’t many interesting works yet to show how to generate anime avatars from just some simple drawing input.</p>
<p>In this project, we propose using <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> to generate anime avatars from sketches.</p>
---
https://arxiv.org/abs/2004.07543
Classification Representations Can be Reused for Downstream Generations
Saisubramaniam Gopalakrishnan, Pranshu Ranjan Singh, Yasin Yazici, Chuan-Sheng Foo, Vijay Chandrasekhar, ArulMurugan Ambikapathi
2020-04-16
2021-04-14
[("doi","10.48550/arXiv.2004.07543")]
ai/anime/danbooru
<p>Contrary to the convention of using supervision for class-conditioned <em>generative modeling</em>, this work explores and demonstrates the feasibility of a learned supervised representation space trained on a discriminative classifier for the <em>downstream</em> task of sample generation.</p>
<p>Unlike generative modeling approaches that aim to <em>model</em> the manifold distribution, we directly <em>represent</em> the given data manifold in the classification space and leverage properties of <a href="https://en.wikipedia.org/wiki/Latent_variable">latent</a> space representations to generate new representations that are guaranteed to be in the same class. Interestingly, such representations allow for controlled sample generations for any given class from existing samples and do not require enforcing <a href="https://en.wikipedia.org/wiki/Prior_probability">prior distribution</a>.</p>
<p>We show that these latent space representations can be smartly manipulated (using convex combinations of <em>n</em> samples, <em>n</em> ≥ 2) to yield meaningful sample generations. Experiments on image datasets of varying resolutions demonstrate that downstream generations have higher classification accuracy than existing conditional generative models while being competitive in terms of <a href="https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance">FID</a>.</p>
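<p>[A minimal sketch of the convex-combination idea described above: mixing <em>n</em> same-class classifier representations with random convex weights to obtain a new representation. The feature dimensionality and the Dirichlet sampling are illustrative assumptions, not the authors’ code:]</p>
<pre><code># Minimal sketch of generating new same-class latent representations by convex
# combination, as described in the abstract. The feature extractor and decoder are
# omitted; only the mixing step reflects the paper's idea.
import numpy as np

def convex_combination(latents: np.ndarray, rng=np.random.default_rng()) -> np.ndarray:
    """Mix n latent vectors (n >= 2) of one class with random convex weights."""
    n = latents.shape[0]
    weights = rng.dirichlet(np.ones(n))   # nonnegative weights summing to 1
    return weights @ latents              # stays inside the convex hull of the inputs

# Example: 3 hypothetical 128-D classifier representations of the same class.
z = np.random.randn(3, 128)
z_new = convex_combination(z)             # new representation of that class (by the paper's argument)
</code></pre>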
---
/doc/ai/anime/danbooru/2020-koyama.pdf
System for searching illustrations of anime characters focusing on degrees of character attributes
Yuta Koyama, Tomohiro Fukuhara, Koichi Yamada, Hironobu Abe, Hidetaka Masuda
2020-06-01
2020-06-01
[("doi","10.1117/12.2566509")]
ai/anime/danbooru ai/nn/retrieval
<p>Keyword searches are generally used when searching for illustrations of anime characters. However, keyword searches require that the illustrations be tagged first. The illustration information that a tag can express is limited, and it is difficult to search for a specific illustration. We focus on character attributes that are difficult to express using tags, and propose a new search method using the vectorized degrees of character attributes. Accordingly, we first created a character illustration dataset limited to the hair-length attribute and then trained a <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">convolutional neural network</a> (CNN) to extract the features. We obtained an [illustration2vec Danbooru] vector representation of the character attributes using the CNN and confirmed that it could be used for new searches.</p>
<p>[<strong>Keywords</strong>: Illustration search, Anime characters, Vectorization, CNN]</p>
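<p>[A toy sketch of the retrieval step implied above: ranking illustrations by cosine similarity between attribute-degree feature vectors. The vectors here are random stand-ins for the CNN/illustration2vec features; only the ranking logic is shown:]</p>
<pre><code># Toy nearest-neighbor search over hypothetical attribute-degree vectors (eg. hair length)
# extracted by a CNN; the gallery and query vectors are random stand-ins.
import numpy as np

def search(query_vec: np.ndarray, gallery: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k gallery vectors most similar to the query (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)[:top_k]

gallery = np.random.randn(10_000, 512)    # 10k illustration feature vectors (assumed)
query = np.random.randn(512)              # feature vector of the query illustration
print(search(query, gallery))
</code></pre>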
---
/doc/ai/anime/danbooru/2020-ko.pdf
SickZil-Machine (SZMC): A Deep Learning Based Script Text Isolation System for Comics Translation
U-Ram Ko, Hwan-Gue Cho
2020-08-14
2020-08-14
[("doi","10.1007/978-3-030-57058-3_29")]
ai/anime/danbooru
<p>The translation of comics (and manga) involves removing text from the foreign comic images and typesetting translated letters into them. The text in comics contains a variety of deformed letters drawn in arbitrary positions, on complex images or patterns. These letters have to be removed by experts, as computationally erasing them is very challenging. Although several classical image processing algorithms and tools have been developed, a completely automated method that could erase the text is still lacking.</p>
<p>Therefore, we propose an image processing framework called ‘<strong>SickZil-Machine</strong>’ (SZMC) that automates the removal of text from comics. SZMC works through a two-step process. In the first step, the text areas are segmented at the pixel level. In the second step, the letters in the segmented areas are erased and inpainted naturally to match their surroundings.</p>
<p>SZMC exhibited a notable performance, employing deep learning based <a href="https://en.wikipedia.org/wiki/Image_segmentation">image segmentation</a> and image inpainting models.</p>
<p>To train these models, we constructed 285 pairs of original comic pages, a text area-mask dataset, and a dataset of 31,497 comic pages. We identified the characteristics of the dataset that could improve SZMC performance.</p>
<p><a href="https://github.com/KUR-creative/SickZil-Machine">SZMC code is available</a>.</p>
<p>[<strong>Keywords</strong>: comics translation, deep learning, image manipulation system]</p>
---
/doc/ai/anime/danbooru/2020-dragan.pdf
Demonstrating that dataset domains are largely linearly separable in the feature space of common CNNs
Matthew R. Dragan
2020-09-01
2020-09-01
ai/anime/danbooru ai/nn/cnn
<p>Deep <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">convolutional neural networks</a> (DCNNs) have achieved state-of-the-art performance on a variety of tasks. These high-performing networks require large and diverse training datasets to facilitate generalization when extracting high-level features from low-level data. However, even with the availability of these diverse datasets, DCNNs are not prepared to handle all the data that could be thrown at them.</p>
<p>One major challenge DCNNs face is the notion of forced choice. For example, a network trained for image classification is configured to choose from a predefined set of labels with the expectation that any new input image will contain an instance of one of the known objects. Given this expectation, it is generally assumed that the network is trained for a particular domain, where domain is defined by the set of known object classes as well as more implicit assumptions that go along with any data collection. For example, some implicit characteristics of the <a href="https://arxiv.org/abs/1409.0575" title="‘ImageNet Large Scale Visual Recognition Challenge’, Russakovsky et al 2014">ImageNet</a> dataset domain are that most images are taken outdoors and the object of interest is roughly in the center of the frame. Thus the domain of the network is defined by the training data that is chosen.</p>
<p>This leads to the following key questions:</p>
<ol>
<li><p>Does a network know the domain it was trained for?</p></li>
<li><p>Can a network easily distinguish between in-domain and out-of-domain images?</p></li>
</ol>
<p>In this thesis it will be shown that for several widely used public datasets and commonly used neural networks, the answer to both questions is <strong>yes</strong>. The presence of a simple method of differentiating between in-domain and out-of-domain cases has substantial implications for work on domain adaptation, transfer learning, and model generalization.</p>
---
https://openreview.net/forum?id=1Fqg133qRaI
Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Anonymous
2020-09-28
2021-09-09
ai/anime/danbooru ai/nn/gan/data-augmentation
<p>A computationally-efficient <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> for few-shot high-fidelity image datasets (converges on a single GPU with a few hours’ training, on <100 1024px images).</p>
<p>Training Generative Adversarial Networks (GAN) on high-fidelity images usually requires large-scale GPU clusters and a vast number of training images. In this paper, we study the few-shot image synthesis task for GAN with minimum computing cost. We propose a light-weight GAN structure that gains superior quality at 1,024×1,024px resolution. Notably, the model converges from scratch with just a few hours of training on a single RTX-2080 GPU, and has consistent performance even with fewer than 100 training samples. Two technical designs constitute our work: a skip-layer channel-wise excitation module and a self-supervised discriminator trained as a feature-encoder. With 13 datasets covering a wide variety of image domains, we show our model’s robustness and its superior performance compared to the state-of-the-art <a href="https://arxiv.org/abs/1912.04958#nvidia" title="‘Analyzing and Improving the Image Quality of StyleGAN’, Karras et al 2019">StyleGAN2</a>.</p>
<p>[<strong>Keywords</strong>: deep learning, generative model, image synthesis, few-shot learning, generative adversarial network, <a href="https://en.wikipedia.org/wiki/Semi-supervised_learning">self-supervised learning</a>, unsupervised learning]</p>
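<p>[A rough PyTorch sketch of a skip-layer channel-wise excitation block in the spirit of the abstract: a low-resolution feature map gates the channels of a high-resolution one. Channel sizes and kernel choices are assumptions, not the paper’s exact module:]</p>
<pre><code># Rough sketch of skip-layer channel-wise excitation (SLE): a low-resolution feature
# map is squeezed into per-channel gates that re-weight a high-resolution feature map.
import torch
import torch.nn as nn

class SkipLayerExcitation(nn.Module):
    def __init__(self, ch_low: int, ch_high: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),                # squeeze the low-res map to 4x4
            nn.Conv2d(ch_low, ch_high, 4, 1, 0),    # -> 1x1 spatial, ch_high channels
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(ch_high, ch_high, 1, 1, 0),
            nn.Sigmoid(),                           # per-channel gates in (0, 1)
        )

    def forward(self, feat_high: torch.Tensor, feat_low: torch.Tensor) -> torch.Tensor:
        return feat_high * self.gate(feat_low)      # channel-wise re-weighting

sle = SkipLayerExcitation(ch_low=256, ch_high=64)
out = sle(torch.randn(1, 64, 128, 128), torch.randn(1, 256, 8, 8))  # -> (1, 64, 128, 128)
</code></pre>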
---
https://openreview.net/forum?id=6puCSjH3hwA#snapchat
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian, Jian Ren, Menglei Chai, Kyle Olszewski, Xi Peng, Dimitris N. Metaxas, Sergey Tulyakov
2020-09-28
2021-09-09
ai/anime/danbooru ai/nn/gan
<p>Image and video synthesis are closely related areas aiming at generating content from noise. While rapid progress has been demonstrated in improving image-based models to handle large resolutions, high-quality renderings, and wide variations in image content, achieving comparable video generation results remains problematic.</p>
<p>We present a framework that leverages contemporary image generators to render high-resolution videos. We frame the video synthesis problem as discovering a trajectory in the <a href="https://en.wikipedia.org/wiki/Latent_variable">latent</a> space of a pre-trained and fixed image generator. Not only does such a framework render high-resolution videos, but it also is an order of magnitude more computationally efficient.</p>
<p>We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled. With such a representation, our framework allows for a broad range of applications, including content and motion manipulation. Furthermore, we introduce a new task, which we call cross-domain video synthesis, in which the image and motion generators are trained on disjoint datasets belonging to different domains. This allows for generating moving objects for which the desired video data is not available.</p>
<p>Extensive experiments on various datasets demonstrate the advantages of our methods over existing video generation techniques. <a href="https://github.com/snap-research/MoCoGAN-HD">Code will be released</a>.</p>
<p>[<strong>Keywords</strong>: high-resolution video generation, <a href="https://arxiv.org/abs/2010.05113" title="‘Contrastive Representation Learning: A Framework and Review’, Le-Khac et al 2020">contrastive</a> learning, cross-domain video generation]</p>
---
/doc/ai/anime/danbooru/2020-zheng-2.pdf
Learning from the Past: Meta-Continual Learning with Knowledge Embedding for Jointly Sketch, Cartoon, and Caricature Face Recognition
Wenbo Zheng, Lan Yan, Fei-Yue Wang, Chao Gou
2020-10
2020-10
[("doi","10.1145/3394171.3413892")]
ai/anime/danbooru reinforcement-learning/meta-learning/continual-learning
<p>This paper deals with the challenging task of learning from different modalities by tackling the difficult problem of joint face recognition across abstract-like sketches, cartoons, caricatures, and real-life photographs. Due to the substantial variations in the abstract faces, building vision models for recognizing data from these modalities is extremely challenging.</p>
<p>We propose a novel framework termed <em>Meta-Continual Learning with Knowledge Embedding</em> to address the task of joint sketch, cartoon, and caricature face recognition. In particular, we first present a deep relational network to capture and memorize the relations among different samples. Secondly, we present the construction of our knowledge graph that relates images with labels as the guidance of our meta-learner. We then design a knowledge embedding mechanism to incorporate the knowledge representation into our network. Thirdly, to mitigate catastrophic forgetting, we use a meta-continual model that updates our <a href="!W" title="Ensemble learning">ensemble</a> model and improves its prediction accuracy. With this meta-continual model, our network can learn from its past. The final classification is derived from our network by learning to compare the features of samples.</p>
<p>Experimental results demonstrate that our approach achieves substantially higher performance compared with other state-of-the-art approaches.</p>
---
https://github.com/zymk9/Yet-Another-Anime-Segmenter
Yet-Another-Anime-Segmenter
zymk9
2020-10-08
2021-06-27
ai/anime/danbooru
<p>Instance <a href="https://en.wikipedia.org/wiki/Image_segmentation">segmentation</a> for anime characters based on <a href="https://arxiv.org/abs/2003.05664" title="‘Conditional Convolutions for Instance Segmentation’, Tian et al 2020">CondInst</a>, using the implementation from <a href="https://github.com/aim-uofa/AdelaiDet">AdelaiDet</a> and <a href="https://github.com/facebookresearch/detectron2">detectron2</a>.</p>
<p>Many thanks to <a href="https://github.com/jerryli27/AniSeg">AniSeg</a> created by jerryli27, as part of the dataset originates from the segmentation data provided in <a href="https://github.com/jerryli27/AniSeg#about-the-models">this repo</a>. The rest of the dataset is retrieved from <a href="!W">Pixiv</a> and manually annotated.</p>
---
https://github.com/zyddnys/RegDeepDanbooru
RegDeepDanbooru: Yet another Deep Danbooru project
zyddnys
2020-10-11
2021-06-27
ai/anime/danbooru ai/nn/sparsity/low-precision
<p>[Yet another Deep Danbooru-style tagger,] but based on <a href="https://arxiv.org/abs/2003.13678#facebook" title="‘RegNet: Designing Network Design Spaces’, Radosavovic et al 2020">RegNetY-8G</a>: relatively lightweight and designed to run fast on GPU. Training is done using mixed-precision training on a single RTX 2080 Ti for 3 weeks. Some code is from <a href="https://github.com/facebookresearch/pycls" class="uri">https://github.com/facebookresearch/pycls</a>.</p>
<p>Most of the 1,000 tags are character tags (see <code>danbooru_labels.txt</code>, line 1536), primarily <a href="https://en.wikipedia.org/wiki/Touhou_Project">Touhou</a> characters (<code>hakurei_reimu</code>, <code>cirno</code> etc). Half are Danbooru attribute tags (face, eye, hair etc).</p>
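<p>[A hedged sketch of the general training setup described above: a RegNetY backbone with a 1,000-way multi-label head trained with binary cross-entropy under mixed precision. torchvision’s <code>regnet_y_8gf</code> stands in for the repo’s pycls-based model, and all hyperparameters are illustrative:]</p>
<pre><code># Sketch of a RegNetY-based multi-label Danbooru tagger trained with mixed precision.
# torchvision's regnet_y_8gf stands in for the repo's pycls model; data loading,
# tag count, and hyperparameters are assumptions. Requires a CUDA GPU.
import torch
import torch.nn as nn
from torchvision.models import regnet_y_8gf

NUM_TAGS = 1000
model = regnet_y_8gf(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_TAGS)   # 1,000-way multi-label head
model = model.cuda()

criterion = nn.BCEWithLogitsLoss()                     # each tag is an independent binary label
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()                   # loss scaling for fp16 training

def train_step(images: torch.Tensor, tags: torch.Tensor) -> float:
    """images: (B, 3, H, W) on GPU; tags: (B, NUM_TAGS) multi-hot float targets on GPU."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                    # mixed-precision forward pass
        loss = criterion(model(images), tags)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
</code></pre>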
---
/doc/ai/anime/danbooru/2020-cao-2.pdf
Deep learning-based classification of the polar emotions of ‘moe’-style cartoon pictures
Qinchen Cao, Weilin Zhang, Yonghua Zhu
2020-10-12
2020-10-12
[("doi","10.26599/TST.2019.9010035")]
ai/anime/danbooru ai/nn/cnn
<p>The cartoon animation industry has developed into a huge industrial chain with a large potential market involving games, digital entertainment, and other industries. However, due to the coarse-grained classification of cartoon materials, cartoon animators can hardly find relevant materials during the process of creation. The polar emotions of cartoon materials are an important reference for creators as they can help them easily obtain the pictures they need. Some methods for obtaining the emotions of cartoon pictures have been proposed, but most of these focus on expression recognition. Meanwhile, other emotion recognition methods are not ideal for use on cartoon materials.</p>
<p>We propose a deep learning-based method to classify the polar emotions of cartoon pictures of the “Moe” drawing style. According to the expression features of the cartoon characters of this drawing style, we recognize the facial expressions of cartoon characters and extract the scene and facial features of the cartoon images. Then, we correct the emotions obtained by the expression recognition according to the scene features. Finally, we obtain the polar emotion of the corresponding picture.</p>
<p>We designed a dataset and performed verification tests on it, achieving 81.9% experimental accuracy. The experimental results prove that our method is competitive.</p>
<p>[<strong>Keywords</strong>: cartoon, emotion classification, deep learning]</p>
---
/doc/ai/anime/danbooru/2020-wu.pdf
Watermarking Neural Networks with Watermarked Images
Hanzhou Wu, Gen Liu, Yuwei Yao, Xinpeng Zhang
2020-10-13
2020-10-13
[("doi","10.1109/TCSVT.2020.3030671")]
ai/anime/danbooru
<p>Watermarking neural networks is an important means of protecting the intellectual property (IP) of neural networks. In this paper, we introduce a novel digital watermarking framework suitable for deep neural networks that output images as results, in which any image outputted from a watermarked neural network must contain a certain watermark. Here, the host neural network to be protected and a watermark-extraction network are trained together, so that, by optimizing a combined <a href="https://en.wikipedia.org/wiki/Loss_function">loss function</a>, the trained neural network can accomplish the original task while embedding a watermark into the outputted images. This work is quite different from previous schemes that carry a watermark in network weights or in classification labels of a trigger set. By detecting watermarks in the outputted images, this technique can be adopted to identify the ownership of the host network and to determine whether an image was generated by a certain neural network or not. We demonstrate that this technique is effective and robust on a variety of image processing tasks, including image colorization, super-resolution, image editing, semantic <a href="https://en.wikipedia.org/wiki/Image_segmentation">segmentation</a>, and so on.</p>
<p>[<strong>Keywords</strong>: watermarking, neural networks, deep learning, image transformation, information hiding]</p>
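<p>[A hedged PyTorch sketch of the combined-loss idea: the host image-to-image network is trained on its original task while a watermark-extraction network must recover a fixed watermark from every output. Both networks, the loss weighting, and the watermark are placeholders, not the paper’s architecture:]</p>
<pre><code># Sketch of joint training with a combined loss: original-task loss on the host
# network's output, plus a watermark-recovery loss on an extractor applied to that
# same output. Networks, weighting, and watermark are stand-ins.
import torch
import torch.nn as nn

host = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))        # stand-in image-to-image network
extractor = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))   # stand-in watermark extractor
task_loss = nn.L1Loss()
wm_loss = nn.MSELoss()
lam = 0.1                                                   # watermark-loss weight (assumed)
opt = torch.optim.Adam(list(host.parameters()) + list(extractor.parameters()), lr=1e-4)

def step(x: torch.Tensor, target: torch.Tensor, watermark: torch.Tensor) -> float:
    """x: input batch; target: ground-truth task output; watermark: fixed 1-channel mark."""
    y = host(x)                                             # do the original task
    loss = task_loss(y, target) + lam * wm_loss(extractor(y), watermark)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
</code></pre>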
---
https://papers.nips.cc/paper/2020/file/1cfa81af29c6f2d8cacb44921722e753-Paper.pdf
Network-to-Network Translation with Conditional Invertible Neural Networks
Robin Rombach, Patrick Esser, Björn Ommer
2020-10-22
2021-09-14
ai/anime/danbooru ai/nn/gan/biggan
<p>[uses <a href="/crop#danbooru2019-portraits" title="‘Anime Crop Datasets: Faces, Figures, & Hands § Danbooru2019 Portraits’, Branwen et al 2020">Danbooru2019 Portraits</a>] Given the ever-increasing computational costs of modern machine learning models, we need to find new ways to reuse such expert models and thus tap into the resources that have been invested in their creation. Recent work suggests that the power of these massive models is captured by the representations they learn. Therefore, we seek a model that can relate between different existing representations and propose to solve this task with a conditionally invertible network.</p>
<p>This network demonstrates its capability by (1) providing generic transfer between diverse domains, (2) enabling controlled content synthesis by allowing modification in other domains, and (3) facilitating diagnosis of existing representations by translating them into interpretable domains such as images. Our domain transfer network can translate between fixed representations without having to learn or finetune them. This allows users to use various existing domain-specific expert models from the literature that had been trained with extensive computational resources.</p>
<p>Experiments on diverse conditional image synthesis tasks, competitive image modification results and experiments on image-to-image and text-to-image generation demonstrate the generic applicability of our approach. For example, we translate between <a href="https://arxiv.org/abs/1810.04805#google" title="‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’, Devlin et al 2018">BERT</a> and <a href="https://arxiv.org/abs/1809.11096#deepmind" title="‘BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis’, Brock et al 2018">BigGAN</a>, state-of-the-art text and image models to provide text-to-image generation, which neither of both experts can perform on their own.</p>
---
/doc/ai/anime/danbooru/2020-akita-2.pdf
Colorization of Line Drawings with Empty Pupils
K. Akita, Y. Morimoto, R. Tsuruno
2020-11-24
2020-11-24
[("doi","10.1111/cgf.14171")]
ai/anime/danbooru
<p>[note: near-identical to <a href="/doc/ai/anime/danbooru/2020-akita.pdf" title="‘Deep-Eyes: Fully Automatic Anime Character Colorization with Painting of Details on Empty Pupils’, Akita et al 2020">Akita et al 2020a</a>] Many studies have recently applied deep learning to the automatic colorization of line drawings. However, it is difficult to paint empty pupils using existing methods because the <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">convolutional neural networks</a> are trained with pupils that have edges, which are generated from color images using image processing. Most actual line drawings have empty pupils that artists must paint in.</p>
<p>In this paper, we propose a novel network model that transfers the pupil details in a reference color image to input line drawings with empty pupils.</p>
<p>We also propose a method for accurately and automatically colorizing eyes. In this method, eye patches are extracted from a reference color image and automatically added to an input line drawing as color hints using our pupil position estimation network.</p>
---
/doc/ai/anime/danbooru/2020-lee-2.pdf
Automatic Colorization of High-resolution Animation Style Line-art based on Frequency Separation and Two-Stage Generator
Yeongseop Lee, Seongjin Lee
2020-11-27
2020-11-27
[("doi","10.5370/KIEEP.2020.69.4.275")]
ai/anime/danbooru ai/nn/gan
<p>In this paper, we use Generative Adversarial Networks (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>) to address the industrial need for auto-colorization of line art, which takes an enormous amount of manual labor. GAN-based image-to-image auto-colorization methods have received a lot of attention due to their promising results.</p>
<p>We present a solution that not only colorizes the line art but also transforms the low-resolution output image to match the resolution of the input image, through 2 generators and a frequency-separation method. High-frequency components are extracted from the line art, and then 2 generators are used to colorize the image at low resolution. The high-frequency component is merged with the low-resolution image to produce the high-resolution colorized image. The resolution of the final output image matches the resolution of the original image while preserving the texture of the input image, whereas other schemes reduce the output image to 512 pixels.</p>
<p>We performed visual and quantitative evaluation using <a href="https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance">FID</a>, <a href="https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio">PSNR</a>, and <a href="https://en.wikipedia.org/wiki/Structural_similarity">SSIM</a>. The FID score of the proposed method is better than that of the base model by about 4 (proposed: 47.87 vs. base model: 51.64). PSNR and SSIM of the high-resolution images are also better than the base model: PSNR and SSIM of the base model are 13.01 and 0.72, whereas those of the proposed method are 20.77 and 0.86, respectively.</p>
<p>[<strong>Keywords</strong>: machine learning, Generative Adversarial Network, line arts colorization, image generation]</p>
---
/doc/ai/anime/danbooru/2021-golyadkin.pdf
Semi-automatic Manga Colorization Using Conditional Adversarial Networks
Maksim Golyadkin, Ilya Makarov
2021
2021
[("doi","10.1007/978-3-030-72610-2_17")]
ai/anime/danbooru
<p>Manga colorization is time-consuming and hard to automate.</p>
<p>In this paper, we propose a conditional adversarial deep learning approach for semi-automatic manga image colorization. The system directly maps a tuple of a grayscale manga page image and a sparse color hint constructed by the user to an output colorization. High-quality colorization can be obtained in a fully automated way, and color hints allow users to revise the colorization of every panel independently.</p>
<p>We collect a dataset of manually colorized and grayscale manga images for training and evaluation. To perform supervised learning, we construct synthesized monochrome images from the colorized ones. Furthermore, we suggest a few steps to reduce the domain gap between synthetic and real data. Their influence is evaluated both quantitatively and qualitatively. Our method can achieve even better results by fine-tuning with a small number of grayscale manga images of a new style. The code is available at <a href="https://github.com/qweasdd/manga-colorization"><code>github.com</code></a>.</p>
<p>[<strong>Keywords</strong>: generative adversarial networks, manga colorization, interactive colorization]</p>
---
/doc/ai/anime/danbooru/2021-fang.pdf
Stylized-Colorization for Line Arts
Tzu-Ting Fang, Minh Duc Vo, Akihiro Sugimoto, Shang-Hong Lai
2021-01-10
2021-01-10
ai/anime/danbooru ai/nn/gan
<p>We address a novel problem of stylized-colorization which colorizes a given line art using a given coloring style in text.</p>
<p>This problem can be stated as multi-domain image translation and is more challenging than the current colorization problem because it requires not only capturing the illustration distribution but also satisfying the required coloring styles specific to anime such as lightness, shading, or saturation.</p>
<p>We propose a <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>-based <a href="/doc/cs/end-to-end-principle/index">end-to-end</a> model for stylized-colorization where the model has one generator and two discriminators. Our generator is based on the <a href="https://en.wikipedia.org/wiki/U-Net">U-Net</a> architecture and receives a pair of a line art and a coloring style in text as its input to produce a stylized-colorization image of the line art. Two discriminators, on the other hand, share weights at early layers to judge the stylized-colorization image in two different aspects: one for color and one for style. One generator and two discriminators are jointly trained in an adversarial and end-to-end manner.</p>
<p>Extensive experiments demonstrate the effectiveness of our proposed model.</p>
---
https://thisanimedoesnotexist.ai/
This Anime Does Not Exist.ai (TADNE)
Nearcyan, Aydao, Shawn Presser, Gwern
2021-01-19
2021-11-09
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p>[Website demonstrating samples from a modified <a href="https://arxiv.org/abs/1912.04958#nvidia" title="‘Analyzing and Improving the Image Quality of StyleGAN’, Karras et al 2019">StyleGAN2</a> trained on Danbooru2019 using <a href="https://sites.research.google/trc/">TRC</a> <a href="/doc/ai/scaling/hardware/2020-jouppi.pdf#google" title="‘A domain-specific supercomputer for training deep neural networks’, Jouppi et al 2020">TPUs</a> for ~5m iterations for ~2 months on a <a href="https://en.wikipedia.org/wiki/Tensor_Processing_Unit#Third_generation_TPU">TPUv3-32</a> pod; this modified ‘StyleGAN2-ext’ removes various regularizations which make StyleGAN2 data-efficient on datasets like <a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">FFHQ</a>, but hobble its ability to model complicated images, and scales the model up >2×. This is surprisingly effective given StyleGAN’s previous inability to approach <a href="https://arxiv.org/abs/1809.11096#deepmind" title="‘Large Scale GAN Training for High Fidelity Natural Image Synthesis’, Brock et al 2018">BigGAN’s</a> results on Danbooru2019, and <a href="https://thisanimedoesnotexist.ai/" title="‘This Anime Does Not Exist.ai (TADNE)’, Nearcyan et al 2021">TADNE</a> shows off the entertaining results.</p>
<p>The interface reuses Said Achmiz’s <a href="https://demos.obormot.net/these-waifus-do-not-exist-alt">These Waifus Do Not Exist</a> grid UI.</p>
<p><a href="/face#extended-stylegan2-danbooru2019-aydao">Writeup</a>; see also: <a href="https://colab.research.google.com/drive/1_DydlRBBTUupM9djmtegqnuettSTrrRD" title="This Anime Does Not Exist, Search: this notebook uses the precomputed CLIP feature vectors for 100k images from TADNE">Colab notebook to search</a> by CLIP embedding; <a href="https://www.thiswaifudoesnotexist.net/" title="’’ThisWaifuDoesNotExist.net’’, Gwern 2019">“This Waifu Does Not Exist” (TWDNE)</a>/<a href="https://www.thisfursonadoesnotexist.com/" title="’’This Fursona Does Not Exist’’, Arfafax 2020">“This Fursona Does Not Exist” (TFDNE)</a>/<a href="https://thisponydoesnotexist.net/" title="’’This Pony Does Not Exist’’, Arfafax 2020">“This Pony Does Not Exist” (TPDNE)</a>, <a href="https://x.com/Nearcyan/status/1368737578334228482">TADNE face editing</a>, <a href="https://openai.com/index/clip/" title="’’CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3’’, Radford et al 2021">CLIP</a>=<a href="https://x.com/metasemantic/status/1368713208429764616" title="CLIP + StyleGAN + #mylittlepony A thread 🧵 starting with @ElvisPresley ’’A pony that looks like Elvis Presley’’">guided ponies</a>]</p>
<figure>
<img src="/doc/ai/nn/gan/stylegan/anime/2020-01-22-gwern-tadne-screenshot.png" alt="Screenshot of “This Anime Does Not Exist” infinite-scroll website." />
<figcaption aria-hidden="true">Screenshot of “This Anime Does Not Exist” infinite-scroll website.</figcaption>
</figure>
---
https://github.com/nagolinc/notebooks/blob/main/TADNE_and_CLIP.ipynb
Scoring images from TADNE with CLIP
nagolinc
2021-01-20
2021-06-25
ai/anime/danbooru ai/nn/gan/stylegan/anime ai/nn/transformer/clip
<p>[Source code for working with the <a href="https://en.wikipedia.org/wiki/OpenAI">OpenAI</a> <a href="https://openai.com/index/clip/" title="CLIP: Connecting Text and Images">CLIP</a> zero-shot universal image classifier and the <a href="https://thisanimedoesnotexist.ai/">This Anime Does Not Exist.ai (TADNE)</a> <a href="https://arxiv.org/abs/1912.04958#nvidia" title="‘Analyzing and Improving the Image Quality of StyleGAN’, Karras et al 2019">StyleGAN2</a>-ext model: CLIP can use text descriptions to score images by how well they match the text description, and this scoring can be used to <em>generate</em> images matching the description by iteratively refining the pixels to make CLIP increase the description score (gradient ascent).]</p>
<figure>
<img src="/doc/ai/nn/gan/stylegan/anime/2021-01-20-nagolinc-tadne-clipbasedgeneration-agirlwithapinkhat.png" alt="Demonstration of using CLIP to pull out an image of “a girl with a pink hat” from the TADNE GAN." />
<figcaption aria-hidden="true">Demonstration of using CLIP to pull out an image of “a girl with a pink hat” from the TADNE <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>.</figcaption>
</figure>
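<p>[A minimal sketch of the CLIP scoring step: ranking a batch of (eg. TADNE) sample images by how well they match a text prompt, using OpenAI’s <code>clip</code> package. The file paths and prompt are placeholders, and the notebook’s actual code may differ:]</p>
<pre><code># Rank saved TADNE samples by CLIP similarity to a text prompt. The image paths and
# prompt are placeholders; only the scoring logic is shown.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

paths = ["sample0.png", "sample1.png"]                       # hypothetical TADNE samples
images = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
text = clip.tokenize(["a girl with a pink hat"]).to(device)

with torch.no_grad():
    img_feat = model.encode_image(images)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ txt_feat.T).squeeze(1)              # cosine similarity per image

print(scores.argsort(descending=True))                       # best-matching images first
</code></pre>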
---
https://github.com/kosuke1701/ZACI-20-dataset
Danbooru 2020 Zero-shot Anime Character Identification Dataset (ZACI-20)
Kosuke Akimoto
2021-02-06
2021-06-24
ai/anime/danbooru
<p><strong>Danbooru 2020 Zero-shot Anime Character Identification Dataset (ZACI-20)</strong>: The goal of this dataset is creating human-level character identification models which do not require retraining on novel characters. The dataset is derived from <a href="/danbooru2021#danbooru2020">Danbooru2020 dataset</a>.</p>
<p>Features:</p>
<ul>
<li><p><strong>Large-scale</strong>: 1.45M images of 39K characters (train dataset).</p></li>
<li><p>Designed for <strong>zero-shot</strong> setting: characters in the test dataset do not appear in the train dataset, allowing us to test model performance on novel characters.</p></li>
<li><p><strong>Human-annotated test dataset</strong>:</p>
<ul>
<li><p>Image pairs with erroneous face detection or duplicate images are <span class="smallcaps">manually removed</span>.</p></li>
<li><p>We can compare model performance to <span class="smallcaps">human performance</span>.</p></li>
</ul></li>
</ul>
<p>Benchmarks:</p>
<table>
<colgroup>
<col style="width: 20%" />
<col style="width: 20%" />
<col style="width: 20%" />
<col style="width: 20%" />
<col style="width: 20%" />
</colgroup>
<thead>
<tr class="header header header">
<th>model name</th>
<th>FPR (%)</th>
<th>FNR (%)</th>
<th>EER (%)</th>
<th>note</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Human</td>
<td>1.59</td>
<td>13.9</td>
<td>N/A</td>
<td>by kosuke1701</td>
</tr>
<tr class="even">
<td>ResNet-152</td>
<td><strong>2.40</strong></td>
<td>13.9</td>
<td>8.89</td>
<td>w/ <a href="https://arxiv.org/abs/1909.13719#google" title="‘RandAugment: Practical automated data augmentation with a reduced search space’, Cubuk et al 2019">RandAugment</a>, Contrastive loss. <a href="https://github.com/kosuke1701/AnimeCV/releases/download/0111_best_randaug/0206_resnet152.zip">0206_resnet152</a> by kosuke1701</td>
</tr>
<tr class="odd">
<td><a href="https://arxiv.org/abs/1709.01507" title="‘Squeeze-and-Excitation Networks’, Hu et al 2017">SE</a>-ResNet-152</td>
<td>2.43</td>
<td>13.9</td>
<td><strong>8.15</strong></td>
<td>w/ RandAug, Contrastive loss. <a href="https://github.com/kosuke1701/AnimeCV/releases/download/0111_best_randaug/0206_seresnet152.zip">0206_seresnet152</a> by kosuke1701</td>
</tr>
<tr class="even">
<td>ResNet-18</td>
<td>5.08</td>
<td>13.9</td>
<td>9.59</td>
<td>w/ RandAug, Contrastive loss. <a href="https://github.com/kosuke1701/AnimeCV/releases/download/0111_best_randaug/0206_resnet18.zip">0206_resnet18</a> by kosuke1701</td>
</tr>
</tbody>
</table>
---
https://arxiv.org/abs/2102.12593
AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation
Bing Li, Yuanlue Zhu, Yitong Wang, Chia-Wen Lin, Bernard Ghanem, Linlin Shen
2021-02-24
2021-05-02
[("doi","10.48550/arXiv.2102.12593")]
ai/anime/danbooru ai/nn/gan
<p>In this paper, we propose a novel framework to translate a portrait photo-face into an anime appearance. Our aim is to synthesize anime-faces which are style-consistent with a given reference anime-face. However, unlike typical translation tasks, such anime-face translation is challenging due to complex variations of appearances among anime-faces. Existing methods often fail to transfer the styles of reference anime-faces, or introduce noticeable artifacts/distortions in the local shapes of their generated faces.</p>
<p>We propose AniGAN, a novel <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>-based translator that synthesizes high-quality anime-faces. Specifically, a new generator architecture is proposed to simultaneously transfer color/texture styles and transform local facial shapes into anime-like counterparts based on the style of a reference anime-face, while preserving the global structure of the source photo-face. We propose a double-branch discriminator to learn both domain-specific distributions and domain-shared distributions, helping generate visually pleasing anime-faces and effectively mitigate artifacts.</p>
<p>Extensive experiments qualitatively and quantitatively demonstrate the superiority of our method over state-of-the-art methods.</p>
---
/doc/ai/anime/danbooru/2021-hernandez.pdf
CómicGAN: Generation of Illustrations with Progressively-Growing GAN Networks
Guillermo Iglesias Hernández
2021-04-01
2021-04-01
ai/anime/danbooru ai/nn/gan/stylegan/progan
<p>This degree work shows the implementation of generative adversarial networks (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>) to generate completely new illustrations, making use of comic images that have been previously studied, normalized, and filtered. The report reflects the evolution of the research, which takes as its starting point the results of previous research in which the first steps toward obtaining a valid <em>dataset</em> were proposed.</p>
<p>The set of steps to obtain the final result are presented: the work methodology used, the remote work configuration using <a href="https://en.wikipedia.org/wiki/Google_Colab">Google Colab</a>, the search and study of architectures and the evolution and improvement of the different networks to achieve the final results.</p>
<p>To generate comic illustrations, a set of models is implemented progressively, evolving from simpler to more complex and up-to-date models in order to better assimilate the necessary knowledge. In this way, the use of progressively-growing generative adversarial networks, or <a href="https://arxiv.org/abs/1710.10196#nvidia" title="‘Progressive Growing of GANs for Improved Quality, Stability, and Variation’, Karras et al 2017">ProGANs</a>, is ultimately proposed. Using the ProGAN architecture improves the results compared to traditional methods, obtaining images of great similarity to the original ones while maintaining originality and avoiding directly copying images from the dataset.</p>
<p>To validate and demonstrate that the results obtained are replicable, a comparison between 2 different sets of images was performed. Although both <em>datasets</em> have the same drawing style, they present substantial differences in their composition: results are shown first for full-body character images and later for close-up illustrations of character faces.</p>
---
https://openaccess.thecvf.com/content/CVPR2021W/CVFAD/papers/Yuan_Line_Art_Colorization_With_Concatenated_Spatial_Attention_CVPRW_2021_paper.pdf
Line Art Colorization with Concatenated Spatial Attention
Mingcheng Yuan, Edgar Simo-Serra
2021-04-18
2021-09-03
ai/anime/danbooru
<p>Line art plays a fundamental role in illustration and design, and allows for iteratively polishing designs. However, as they lack color, they can have issues in conveying final designs.</p>
<p>In this work, we propose an interactive colorization approach based on a conditional generative adversarial network that takes both the line art and color hints as inputs to produce a high-quality colorized image. Our approach is based on a <a href="https://en.wikipedia.org/wiki/U-Net">U-net</a> architecture with a multi-discriminator framework. We propose a Concatenation and Spatial Attention module that is able to generate more consistent and higher-quality line art colorization from user-given hints.</p>
<p>We evaluate on a large-scale illustration dataset, and comparisons with existing approaches corroborate the effectiveness of our approach.</p>
---
https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_User-Guided_Line_Art_Flat_Filling_With_Split_Filling_Mechanism_CVPR_2021_paper.pdf
User-Guided Line Art Flat Filling With Split Filling Mechanism
Lvmin Zhang, Chengze Li, Edgar Simo-Serra, Yi Ji, Tien-Tsin Wong, Chunping Liu
2021-06
2021-09-03
ai/anime/danbooru
<p>Flat filling is a critical step in digital artistic content creation with the objective of filling line arts with flat colors.</p>
<p>We present a deep learning framework for user-guided line art flat filling that can compute the “influence areas” of the user color scribbles, ie. the areas where the user scribbles should propagate and influence. This framework explicitly controls such scribble influence areas for artists to manipulate the colors of image details and avoid color leakage/contamination between scribbles, and simultaneously, leverages data-driven color generation to facilitate content creation. This framework is based on a <strong>Split Filling Mechanism</strong> (SFM), which first splits the user scribbles into individual groups and then independently processes the colors and influence areas of each group with a <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">Convolutional Neural Network</a> (CNN).</p>
<p>Learned from more than a million illustrations, the framework can estimate the scribble influence areas in a content-aware manner, and can smartly generate visually pleasing colors to assist the daily works of artists.</p>
<p>We show that our proposed framework is easy to use, allowing even amateurs to obtain professional-quality results on a wide variety of line arts.</p>
---
https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Discovering_Interpretable_Latent_Space_Directions_of_GANs_Beyond_Binary_Attributes_CVPR_2021_paper.pdf
AdvStyle: Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes
Huiting Yang, Liangyu Chai, Qiang Wen, Shuang Zhao, Zixun Sun, Shengfeng He
2021-06
2021-09-02
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p>Generative adversarial networks (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>) learn to map noise <a href="https://en.wikipedia.org/wiki/Latent_variable">latent</a> vectors to high-fidelity image outputs. It is found that the input latent space shows semantic correlations with the output image space. Recent works aim to interpret the latent space and discover meaningful directions that correspond to human interpretable image transformations. However, these methods either rely on explicit scores of attributes (eg. memorability) or are restricted to binary ones (eg. gender), which largely limits the applicability of editing tasks, especially for free-form artistic tasks like style/anime editing.</p>
<p>In this paper, we propose an adversarial method, <strong>AdvStyle</strong>, for discovering interpretable directions in the absence of well-labeled scores or binary attributes. In particular, the proposed adversarial method simultaneously optimizes the discovered directions and the attribute assessor using the target attribute data as positive samples, while the generated ones serve as negative samples. In this way, arbitrary attributes can be edited by collecting positive data only, and the proposed method learns a controllable representation enabling manipulation of non-binary attributes like anime styles and facial characteristics. Moreover, the proposed learning strategy attenuates the entanglement between attributes, such that multi-attribute manipulation can be easily achieved without any additional constraint.</p>
<p>Furthermore, we reveal several interesting semantics with the involuntarily learned negative directions. Extensive experiments on 9 anime attributes and 7 human attributes demonstrate the effectiveness of our adversarial approach qualitatively and quantitatively. Code is available at <a href="https://github.com/BERYLSHEEP/AdvStyle">GitHub</a>.</p>
---
https://lllyasviel.github.io/MangaFilter/
Generating Manga from Illustrations via Mimicking Manga Workflow
Lvmin Zhang, Xinrui Wang, Qingnan Fan, Yi Ji, Chunping Liu
2021-06
2021-08-05
ai/anime/danbooru
<p>We present a [<a href="https://github.com/lllyasviel/style2paints" title="‘Style2Paints GitHub repository’, Zhang et al 2018">style2paints</a>] framework to generate manga from digital illustrations.</p>
<p>In professional manga studios, the manga creation workflow consists of three key steps: (1) Artists use line drawings to delineate the structural outlines in manga storyboards. (2) Artists apply several types of regular screentones to render the shading, occlusion, and object materials. (3) Artists selectively paste irregular screen textures onto the canvas to achieve various background layouts or special effects.</p>
<p>Motivated by this workflow, we propose a data-driven framework to convert a digital illustration into 3 corresponding components: manga line drawing, regular screentone, and irregular screen texture. These components can be directly composed into manga images and can be further retouched for more plentiful manga creations. To this end, we create a large-scale dataset with these 3 components annotated by artists in a human-in-the-loop manner. We conduct both perceptual user study and qualitative evaluation of the generated manga, and observe that our generated image layers for these 3 components are practically usable in the daily works of manga artists.</p>
<p>We provide 60 qualitative results and 15 additional comparisons in the supplementary material. We will make our presented manga dataset publicly available to assist related applications.</p>
<p>[<a href="https://github.com/lllyasviel/MangaFilter/releases/download/file/manga.pdf">Paper</a>; <a href="https://github.com/lllyasviel/MangaFilter/releases/download/file/manga_sup.pdf">supplement</a>; cf. <a href="https://github.com/lllyasviel/DanbooRegion">DanbooRegion</a>]</p>
---
/doc/ai/anime/danbooru/2021-sun.pdf
Hide Chopin in the Music: Efficient Information Steganography Via Random Shuffling
Zhun Sun, Chao Li, Qibin Zhao
2021-06-06
2021-06-06
[("doi","10.1109/ICASSP39728.2021.9413357")]
ai/anime/danbooru cs/cryptography/steganography music
<p>Information <a href="https://en.wikipedia.org/wiki/Steganography">steganography</a> is a family of techniques that hide secret messages in a carrier; thus, the messages can only be extracted by receivers with the correct key. Although many approaches have been proposed to achieve this purpose, it has historically been difficult to conceal a large amount of information without causing human-perceptible changes.</p>
<p>In this paper, we explore the room introduced by the low-rank property of natural signals (ie. images, audio), and propose a training-free model for efficient information steganography, which provides the capacity to hide full-size images in carriers of the same spatial resolution. The key of our method is to randomly shuffle the secrets and carry out a simple reduction summation with the carrier. Conversely, the secret images can be reconstructed by solving a <a href="https://en.wikipedia.org/wiki/Convex_optimization">convex optimization</a> problem similar to ordinary <a href="https://en.wikipedia.org/wiki/Tensor_decomposition">tensor decomposition</a>.</p>
<p>In the experimental analysis, we carry out 2 tasks: concealing a full-RGB-color image into a gray-scale image; concealing images into music signals. The results confirm the ability of our model to handle massive secret payloads.</p>
<p>The code of our paper is provided in <a href="https://github.com/minogame/icassp-SIC" class="uri">https://github.com/minogame/icassp-SIC</a>.</p>
<p>[<strong>Keywords</strong>: image steganography, tensor decomposition, privacy protection, image signal processing]</p>
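<p>[A toy NumPy sketch of the keyed shuffle-and-sum embedding described in the abstract; the attenuation factor is an assumption, and the paper’s optimization-based recovery step is not shown:]</p>
<pre><code># Toy sketch of the keyed shuffle-and-sum embedding: the secret is permuted with a
# key-seeded shuffle and added (attenuated) to the carrier. Recovery via the paper's
# convex-optimization/low-rank decomposition step is omitted.
import numpy as np

def embed(carrier: np.ndarray, secret: np.ndarray, key: int, alpha: float = 0.1) -> np.ndarray:
    """Return a stego signal: carrier plus a keyed random shuffle of the secret."""
    rng = np.random.default_rng(key)
    perm = rng.permutation(secret.size)               # keyed random shuffle
    shuffled = secret.ravel()[perm].reshape(carrier.shape)
    return carrier + alpha * shuffled                 # reduction summation with the carrier

carrier = np.random.rand(256, 256)                    # stand-in gray-scale carrier
secret = np.random.rand(256, 256)                     # stand-in secret image (same resolution)
stego = embed(carrier, secret, key=42)
</code></pre>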
---
/doc/ai/anime/danbooru/2021-li-2.pdf
DP-LaSE: Discovering Density-Preserving Latent Space Walks in GANs for Semantic Image Transformations
Guanyue Li, Yi Liu, Xiwen Wei, Yang Zhang, Si Wu, Yong Xu, Hau-San Wong
2021-08-22
2021-08-22
[("doi","10.1145/3474085.3475293")]
ai/anime/danbooru ai/nn/gan/biggan ai/nn/gan/stylegan/anime
<p>[<a href="/doc/ai/anime/danbooru/2021-li-dplase-ganlatentspaceeditingvideo.mp4">video</a>] Generative adversarial network (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a>)-based models possess superior capability of high-fidelity image synthesis. There are a wide range of semantically meaningful directions in the <a href="https://en.wikipedia.org/wiki/Latent_variable">latent</a> representation space of well-trained GANs, and the corresponding latent space walks are meaningful for semantic controllability in the synthesized images.</p>
<p>To explore the underlying organization of a latent space, we propose an unsupervised <strong>Density-Preserving Latent Semantics Exploration</strong> model (<span class="smallcaps">DP-LaSE</span>). The important latent directions are determined by maximizing the variations in intermediate features, while the correlation between the directions is minimized. Considering that latent codes are sampled from a <a href="https://en.wikipedia.org/wiki/Prior_probability">prior distribution</a>, we adopt a density-preserving regularization approach to ensure latent space walks are maintained in iso-density regions, since moving to a higher/lower density region tends to cause unexpected transformations. To further refine semantics-specific transformations, we perform subspace learning over intermediate feature channels, such that the transformations are limited to the most relevant subspaces.</p>
<p>Extensive experiments on a variety of benchmark datasets demonstrate that <span class="smallcaps">DP-LaSE</span> is able to discover interpretable latent space walks, and specific properties of synthesized images can thus be precisely controlled.</p>
---
https://www.sysu-imsl.com/files/PG2021/line_art_colorization_pg2021_main.pdf
Line Art Colorization Based on Explicit Region Segmentation
Ruizhi Cao, Haoran Mo, Chengying Gao
2021-09-04
2022-04-26
ai/anime/danbooru
<p>Automatic line art colorization plays an important role in the anime and comic industry. While existing methods for line art colorization are able to generate plausible colorized results, they tend to suffer from the color bleeding issue.</p>
<p>We introduce an explicit <a href="https://en.wikipedia.org/wiki/Image_segmentation">segmentation</a> fusion mechanism to aid colorization frameworks in avoiding color bleeding artifacts. This mechanism is able to provide region segmentation information for the colorization process explicitly so that the colorization model can learn to avoid assigning the same color across regions with different semantics or inconsistent colors inside an individual region. The proposed mechanism is designed in a plug-and-play manner, so it can be applied to a diversity of line art colorization frameworks with various kinds of user guidance.</p>
<p>We evaluate this mechanism in tag-based and reference-based line art colorization tasks by incorporating it into the state-of-the-art models. Comparisons with these existing models corroborate the effectiveness of our method which largely alleviates the color bleeding artifacts.</p>
<p>The code is available at <a href="https://github.com/Ricardo-L-C/ColorizationWithRegion">Github</a>.</p>
---
https://openaccess.thecvf.com/content/ICCV2021/papers/Zhang_SmartShadow_Artistic_Shadow_Drawing_Tool_for_Line_Drawings_ICCV_2021_paper.pdf
SmartShadow: Artistic Shadow Drawing Tool for Line Drawings
Lvmin Zhang, Jinyue Jiang, Yi Ji, Chunping Liu
2021-10
2021-10
ai/anime/danbooru
<p><strong>SmartShadow</strong> is a deep learning application for digital painting artists to draw shadows on line drawings, with 3 proposed tools:</p>
<ol>
<li><p>Shadow brush: artists can draw scribbles to coarsely indicate the areas inside or outside their wanted shadows, and the application will generate the shadows in real-time.</p></li>
<li><p>Shadow boundary brush: this brush can precisely control the boundary of any specific shadow.</p></li>
<li><p>Global shadow generator: this tool can estimate the global shadow direction from input brush scribbles, and then consistently propagate local shadows to the entire image.</p></li>
</ol>
<p>These 3 tools can not only speed up the shadow drawing process (by 3.1× as experiments validate), but also allow for the flexibility to achieve various shadow effects and facilitate richer artistic creations.</p>
<p>To this end, we train <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">Convolutional Neural Networks</a> (CNNs) with a collected large-scale dataset of both real and synthesized data, and especially, we collect 1670 shadow samples drawn by real artists. Both qualitative analysis and user study show that our approach can generate high-quality shadows that are practically usable in the daily works of digital painting artists.</p>
<p>We present 30 additional results and 15 visual comparisons in the supplementary material.</p>
---
https://www.hindawi.com/journals/wcmc/2021/3560592/
3D Modeling Design of Multirole Virtual Character Based on Visual Communication in Wireless Sensor Networks
Meiting Qu, Lei Li
2021-11-10
2021-12-30
[("doi","10.1155/2021/3560592")]
ai/anime/danbooru
<p>To solve the problems of poor design quality and excessive time consumption in traditional virtual character modeling methods, a 3-dimensional (3D) modeling design method for multi-role virtual characters based on visual communication is studied.</p>
<p>Firstly, the wireless sensor network is used to locate, scan, and collect human body structure information and convert the coordinates to bind the 3D skeleton. Secondly, according to different human postures, we switch among the linear hybrid skinning, spherical hybrid skinning, and double-quaternion hybrid skinning algorithms; design the geometric surface; and attach it to the 3D skeleton to generate the 3D model. Finally, based on the influence of visual communication on human eye observation and psychological feeling, the geometric surface is divided twice, and the virtual character is rendered by display, coating, and splicing to obtain a complete 3D model of the virtual character.</p>
<p>The results show that the positioning coverage of this method is higher, the rendering effect of the hand and head is better, the design time is substantially shortened, and the maximum time is no more than 35 min.</p>
---
/doc/ai/anime/danbooru/2021-geng.pdf
Passive Non-Line-of-Sight Imaging Using Optimal Transport
Ruixu Geng, Yang Hu, Zhi Lu, Cong Yu, Houqiang Li, Hengyu Zhang, Yan Chen
2021-11-22
2021-11-22
[("doi","10.1109/TIP.2021.3128312")]
ai/anime/danbooru ai/nn/vae
<p>Passive <a href="https://en.wikipedia.org/wiki/Non-line-of-sight_propagation">non-line-of-sight</a> (NLOS) imaging has drawn great attention in recent years. However, existing methods are commonly limited to simple hidden scenes, low-quality reconstructions, and small-scale datasets.</p>
<p>In this paper, we propose <strong><span class="smallcaps">NLOS-OT</span></strong>, a novel passive NLOS imaging framework based on manifold embedding and <a href="https://en.wikipedia.org/wiki/Transportation_theory_(mathematics)">optimal transport</a>, to reconstruct high-quality complicated hidden scenes. <span class="smallcaps">NLOS-OT</span> converts the high-dimensional reconstruction task to a low-dimensional manifold mapping through optimal transport, alleviating the ill-posedness in passive NLOS imaging. Besides, we create the first large-scale passive NLOS imaging dataset, <strong>NLOS-Passive</strong>, which includes 50 groups and more than 3,200,000 images. NLOS-Passive collects target images with different distributions and their corresponding observed projections under various conditions, which can be used to evaluate the performance of passive NLOS imaging algorithms.</p>
<p>It is shown that the proposed <span class="smallcaps">NLOS-OT</span> framework achieves much better performance than the state-of-the-art methods on NLOS-Passive.</p>
<p>We believe that the <span class="smallcaps">NLOS-OT</span> framework together with the NLOS-Passive dataset is a big step and can inspire many ideas towards the development of learning-based passive NLOS imaging. Codes and dataset are <a href="https://github.com/ruixv/NLOS-OT">publicly available</a>.</p>
<p>[<strong>Keywords</strong>: non-line-of-sight imaging, optimal transport, autoencoder, manifold embedding]</p>
---
/doc/ai/anime/danbooru/2022-kim-2.pdf
Late-Resizing: A Simple but Effective Sketch Extraction Strategy for Improving Generalization of Line-Art Colorization
Dohyun Kim, Dajung Je, Kwangjin Lee, Moohyun Kim, Han Kim
2022
2022
ai/anime/danbooru
<p>Automatic line-art colorization is a demanding research field owing to its expensive and labor-intensive workload. Learning-based approaches have lately emerged to improve the quality of colorization. To handle the lack of paired data in line art and color images, sketch extraction has been widely adopted. This study primarily focuses on the resizing process applied within the sketch extraction procedure, which is essential for normalizing input sketches of various sizes to the target size of the colorization model.</p>
<p>We first analyze the inherent risk in the conventional resizing strategy, ie. early-resizing, which places the resizing step before the line detection process to ensure practicality. Although the strategy is extensively used, it involves an often-overlooked risk of substantially degrading the generalization of the colorization model. Thus, we propose a late-resizing strategy in which resizing is applied after the line detection step. The proposed late-resizing strategy has 3 advantages: prevention of quality degradation in the color image, augmentation for downsizing artifacts, and alleviation of look-ahead bias.</p>
<p>In conclusion, we present both quantitative and qualitative evaluations on representative learning-based line-art colorization methods, which verify the effectiveness of the proposed method in the generalization of the colorization model.</p>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="/doc/ai/anime/danbooru/2020-lee.pdf" class="backlink-not id-not" title="‘LDM: Automatic Colorization of Anime Style Illustrations Using a Two-Stage Generator’, Lee & Lee 2020">“Automatic Colorization of Anime Style Illustrations Using a Two-Stage Generator”</a></p></li>
<li><p><a href="https://arxiv.org/abs/1908.05840" class="backlink-not id-not">“Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss”</a></p></li>
<li><p><a href="https://www.sysu-imsl.com/files/PG2021/line_art_colorization_pg2021_main.pdf" class="backlink-not id-not">“Line Art Colorization Based on Explicit Region Segmentation”</a></p></li>
<li><p><a href="https://arxiv.org/abs/2109.14518" class="backlink-not id-not">“Generative Probabilistic Image Colorization”</a></p></li>
<li><p><a href="/doc/ai/anime/danbooru/2021-fang.pdf" class="backlink-not id-not">“Stylized-Colorization for Line Arts”</a></p></li>
<li><p><a href="/doc/ai/anime/danbooru/2019-ye.pdf" class="backlink-not id-not">“Interactive Anime Sketch Colorization with Style Consistency via a Deep Residual Neural Network”</a></p></li>
<li><p><a href="https://arxiv.org/abs/1704.08834" class="backlink-not id-not">“Outline Colorization through Tandem Adversarial Networks”</a></p></li>
<li><p><a href="https://arxiv.org/abs/2107.01619" class="backlink-not id-not">“Deep Edge-Aware Interactive Colorization against Color-Bleeding Effects”</a></p></li>
<li><p><a href="/doc/ai/anime/danbooru/2020-lee-2.pdf" class="backlink-not id-not">“Automatic Colorization of High-resolution Animation Style Line-art based on Frequency Separation and Two-Stage Generator”</a></p></li>
</ul>
</div>
---
/doc/ai/anime/danbooru/2022-madhusudana.pdf#google
Image Quality Assessment Using Synthetic Images
Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
2022
2022
ai/anime/danbooru
<p>Training deep models using <a href="https://arxiv.org/abs/2010.05113" title="‘Contrastive Representation Learning: A Framework and Review’, Le-Khac et al 2020">contrastive learning</a> has achieved impressive performances on various computer vision tasks. Since training is done in a self-supervised manner on unlabeled data, contrastive learning is an attractive candidate for applications for which large labeled datasets are hard/expensive to obtain. In this work we investigate the outcomes of using contrastive learning on synthetically generated images for the Image Quality Assessment (IQA) problem.</p>
<p>The training data consists of computer-generated images corrupted with predetermined distortion types [<a href="https://en.wikipedia.org/wiki/Georges_Matheron">Georges Matheron’s</a> “Dead Leaves”]. Predicting distortion type and degree is used as an auxiliary task to learn image quality features. The learned representations are then used to predict quality in a No-Reference (NR) setting on real-world images [Danbooru2020].</p>
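<p>[A minimal NumPy sketch of the auxiliary-label construction described above: each synthetic image is corrupted with a randomly chosen distortion type and degree, and those two labels supervise the auxiliary prediction task. The distortion bank and levels here are illustrative assumptions, not the paper's:]</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(img, level):
    return np.clip(img + rng.normal(scale=0.05 * level, size=img.shape), 0, 1)

def box_blur(img, level):
    k = 2 * level + 1
    pad = np.pad(img, ((level, level), (level, level), (0, 0)), mode='edge')
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

DISTORTIONS = [gaussian_noise, box_blur]    # assumed stand-ins for the distortion bank
LEVELS = [1, 2, 3, 4, 5]

def make_training_sample(synthetic_img):
    """Return a distorted image plus the auxiliary labels (type, degree)
    that the auxiliary head is trained to predict."""
    d_idx = rng.integers(len(DISTORTIONS))
    l_idx = rng.integers(len(LEVELS))
    distorted = DISTORTIONS[d_idx](synthetic_img, LEVELS[l_idx])
    return distorted, d_idx, l_idx

img = rng.random((64, 64, 3))               # stand-in for a "dead leaves" image
x, dist_type, dist_level = make_training_sample(img)
print(x.shape, dist_type, dist_level)</code></pre>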
<p>We show through extensive experiments that this model achieves comparable performance to state-of-the-art NR image quality models when evaluated on real images afflicted with synthetic distortions, even without using any real images during training.</p>
<p>Our results indicate that training with synthetically generated images can also lead to effective and perceptually relevant representations.</p>
---
/doc/ai/nn/gan/stylegan/anime/2022-zhou.pdf
Pro-PULSE: Learning Progressive Encoders of Latent Semantics in GANs for Photo Upsampling
Yang Zhou, Yangyang Xu, Yong Du, Qiang Wen, Shengfeng He
2022-01-11
2022-01-11
[("doi","10.1109/TIP.2022.3140603")]
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p>The state-of-the-art photo upsampling method, <a href="https://arxiv.org/abs/2003.03808" title="‘PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models’, Menon et al 2020">PULSE</a>, demonstrates that a sharp, high-resolution (HR) version of a given low-resolution (LR) input can be obtained by exploring the <a href="https://en.wikipedia.org/wiki/Latent_variable">latent</a> space of generative models. However, mapping an extreme LR input (16<sup>2</sup>) directly to an HR image (1024<sup>2</sup>) is too ambiguous to preserve faithful local facial semantics.</p>
<p>In this paper, we propose an enhanced upsampling approach, <strong>Pro-PULSE</strong>, that addresses the issues of semantic inconsistency and optimization complexity.</p>
<p>Our idea is to learn an encoder that progressively constructs the HR latent codes in the extended 𝑊+ latent space of <a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">StyleGAN</a>. This design divides the complex 64× upsampling problem into several steps, and therefore small-scale facial semantics can be inherited from one end to the other.</p>
<p>In particular, we train 2 encoders: the base encoder maps latent vectors into 𝑊 space and serves as the foundation of the HR latent vector, while the second, scale-specific encoder, operating in 𝑊+ space, gradually replaces the vector produced by the base encoder at each scale. This process produces intermediate side-outputs, which inject deep supervision into the training of the encoder.</p>
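<p>[A minimal PyTorch sketch of the progressive 𝑊/𝑊+ encoding scheme described above; the network sizes, layer groupings, and replacement rule are illustrative assumptions rather than the paper's architecture:]</p>
<pre><code>import torch
import torch.nn as nn

N_STYLES, W_DIM = 18, 512    # a 1024px StyleGAN has 18 style inputs of dim 512

class BaseEncoder(nn.Module):
    """Maps a low-resolution face to a single w in W, broadcast to all layers."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 256),
                                 nn.ReLU(), nn.Linear(256, W_DIM))
    def forward(self, lr):
        w = self.net(lr)                                # (B, 512)
        return w.unsqueeze(1).repeat(1, N_STYLES, 1)    # (B, 18, 512) in W+

class ScaleEncoder(nn.Module):
    """Refines the style codes owned by one resolution scale."""
    def __init__(self, layers):
        super().__init__()
        self.layers = layers                            # which W+ rows this scale owns
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 256),
                                 nn.ReLU(), nn.Linear(256, len(layers) * W_DIM))
    def forward(self, lr, w_plus):
        delta = self.net(lr).view(-1, len(self.layers), W_DIM)
        w_plus = w_plus.clone()
        w_plus[:, self.layers] = delta                  # replace, coarse to fine
        return w_plus

lr = torch.rand(2, 3, 16, 16)
w_plus = BaseEncoder()(lr)
for scale_layers in ([0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]):  # coarse to fine
    w_plus = ScaleEncoder(scale_layers)(lr, w_plus)     # intermediate side-outputs
print(w_plus.shape)    # torch.Size([2, 18, 512])</code></pre>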
<p>Extensive experiments demonstrate superiority over the latest latent space exploration methods, in terms of efficiency, quantitative quality metrics, and qualitative visual results.</p>
---
/doc/ai/anime/danbooru/2022-gopalakrishnan.pdf
Classify and generate: Using classification latent space representations for image generations
Saisubramaniam Gopalakrishnan, Pranshu Ranjan Singh, Yasin Yazici, Chuan-Sheng Foo, Vijay Chandrasekhar, ArulMurugan Ambikapathi
2022-01-30
2022-01-30
[("doi","10.1016/j.neucom.2021.10.090")]
ai/anime/danbooru
<p>Usage of classification <a href="https://en.wikipedia.org/wiki/Latent_variable">latent</a> space information for downstream reconstruction and generation is an intriguing and relatively unexplored area. In general, discriminative representations are rich in class-specific features but are too sparse for reconstruction, whereas in <a href="https://en.wikipedia.org/wiki/Autoencoder">autoencoders</a> the representations are dense but carry few distinguishable class-specific features, making them less suitable for classification.</p>
<p>In this work, we propose a discriminative modeling framework that employs manipulated supervised latent representations to reconstruct and generate new samples belonging to a given class.</p>
<p>Unlike generative modeling approaches such as <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a> and <a href="https://en.wikipedia.org/wiki/Variational_autoencoder">VAEs</a> that aim to <em>model</em> the data manifold distribution, <strong>Representation based Generations</strong> (ReGene) directly <em>represents</em> the given data manifold in the classification space. Such supervised representations, under certain constraints, allow for reconstructions and controlled generations using an appropriate decoder without enforcing any <a href="https://en.wikipedia.org/wiki/Prior_probability">prior distribution</a>. Theoretically, given a class, we show that these representations, when smartly manipulated using convex combinations, retain the same class label. Furthermore, they lead to novel generations of visually realistic images.</p>
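<p>[A minimal PyTorch sketch of the ReGene-style manipulation described above: convex combinations of classification-space latents of same-class samples are decoded into new images. The toy modules and dimensions are illustrative assumptions, not the authors' model:]</p>
<pre><code>import torch
import torch.nn as nn

# assumed toy modules: a classifier whose penultimate layer is the
# "classification latent space", and a decoder trained to reconstruct from it
feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
classifier_head   = nn.Linear(128, 10)
decoder           = nn.Sequential(nn.Linear(128, 3 * 32 * 32), nn.Sigmoid())

def convex_combination(z_a, z_b, alpha=0.5):
    """ReGene-style manipulation: a convex combination of two latent codes
    of the same class is claimed to stay within that class."""
    return alpha * z_a + (1 - alpha) * z_b

x_a, x_b = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)   # same-class images
z_a, z_b = feature_extractor(x_a), feature_extractor(x_b)

z_new = convex_combination(z_a, z_b, alpha=0.3)
generated = decoder(z_new).view(1, 3, 32, 32)       # novel sample for that class
predicted = classifier_head(z_new).argmax(dim=1)    # label should be unchanged
print(generated.shape, predicted)</code></pre>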
<p>Extensive experiments on datasets of varying resolutions demonstrate that ReGene has higher classification accuracy than existing conditional generative models while being competitive in terms of <a href="https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance">FID</a>.</p>
<p>[<strong>Keywords</strong>: classification latent space, convex combination, image generation]</p>
---
https://github.com/williamyang1991/DualStyleGAN
DualStyleGAN: Official PyTorch Implementation for "Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer"
Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy
2022-03-11
2022-05-27
ai/anime/danbooru ai/nn/gan/stylegan/anime
<p>Recent studies on <a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">StyleGAN</a> show high performance on artistic portrait generation by transfer learning with limited data. In this paper, we explore more challenging exemplar-based high-resolution portrait <a href="https://arxiv.org/abs/2109.03910#google" title="‘A Recipe For Arbitrary Text Style Transfer with Large Language Models’, Reif et al 2021">style transfer</a> by introducing a novel <strong>DualStyleGAN</strong> with flexible control of dual styles of the original face domain and the extended artistic portrait domain.</p>
<p>Different from StyleGAN, DualStyleGAN provides a natural way of <a href="https://arxiv.org/abs/1508.06576" title="‘A Neural Algorithm of Artistic Style’, Gatys et al 2015">style transfer</a> by characterizing the content and style of a portrait with an <em>intrinsic style</em> path and a new <em>extrinsic style</em> path, respectively. The delicately designed extrinsic style path enables our model to modulate both the color and complex structural styles hierarchically to precisely pastiche the style example. Furthermore, a novel progressive fine-tuning scheme is introduced to smoothly transform the generative space of the model to the target domain, despite the above modifications to the network architecture.</p>
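<p>[A toy PyTorch sketch of the dual-path idea: per-layer transforms blend an artistic exemplar's extrinsic codes into the intrinsic face codes, with coarse layers steering structure and fine layers steering color. The layer split, blending rule, and weights are illustrative assumptions, not the actual DualStyleGAN architecture:]</p>
<pre><code>import torch
import torch.nn as nn

N_STYLES, W_DIM = 18, 512
COARSE_LAYERS = range(7)      # assumed split: coarse = structure, fine = color

class ExtrinsicStylePath(nn.Module):
    """Toy stand-in for an extrinsic path: per-layer transforms blend an
    artistic exemplar's codes into the intrinsic face codes."""
    def __init__(self):
        super().__init__()
        self.transforms = nn.ModuleList([nn.Linear(W_DIM, W_DIM) for _ in range(N_STYLES)])

    def forward(self, w_intrinsic, w_extrinsic, structure_weight=0.6, color_weight=0.6):
        out = []
        for i in range(N_STYLES):
            weight = structure_weight if i in COARSE_LAYERS else color_weight
            delta = self.transforms[i](w_extrinsic[:, i])
            out.append((1 - weight) * w_intrinsic[:, i] + weight * delta)
        return torch.stack(out, dim=1)            # (B, 18, 512) fed to the generator

w_face  = torch.randn(1, N_STYLES, W_DIM)         # intrinsic style: the input photo
w_style = torch.randn(1, N_STYLES, W_DIM)         # extrinsic style: the artistic exemplar
w_mixed = ExtrinsicStylePath()(w_face, w_style)
print(w_mixed.shape)</code></pre>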
<p>Experiments demonstrate the superiority of DualStyleGAN over state-of-the-art methods in high-quality portrait style transfer and flexible style control.</p>
<p>Features:</p>
<div class="columns">
<ul>
<li><p>High-Resolution (1024px)</p></li>
<li><p>Training Data-Efficient (<em>n</em> ~ 200 Images)</p></li>
<li><p>Exemplar-Based Color and Structure Transfer</p></li>
</ul>
</div>
<figure>
<img src="/doc/ai/anime/danbooru/2022-yang-dualstylegan-examplesofcaricatureanimepixarcomiccartoonportraitedits.jpg" alt="Example edits." />
<figcaption aria-hidden="true">Example edits.</figcaption>
</figure>
---
/doc/ai/anime/danbooru/2022-rios.pdf
Anime Character Recognition using Intermediate Features Aggregation
Edwin Arkel Rios, Min-Chun Hu, Bo-Cheng Lai
2022-05-27
2022-12-18
[("doi","10.1109/ISCAS48785.2022.9937519")]
ai/anime/danbooru ai/nn/transformer
<p>In this work we study the problem of anime character recognition. Anime refers to animation produced within Japan, and to works derived from or inspired by it.</p>
<p>We propose a novel <strong>Intermediate Features Aggregation</strong> classification head, which helps smooth the optimization landscape of <a href="https://arxiv.org/abs/2010.11929#google" title="‘Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale’, Dosovitskiy et al 2020">Vision Transformers</a> (ViTs) by adding skip connections between intermediate layers and the classification head, thereby improving relative classification accuracy by up to 28%. The proposed model, named <strong>Animesion</strong>, is the first <a href="/doc/cs/end-to-end-principle/index">end-to-end</a> framework for large-scale anime character recognition.</p>
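<p>[A toy PyTorch sketch of the aggregation head described above: the CLS token after every transformer block is skip-connected into a single classifier. The depth, widths, and concatenation rule are illustrative assumptions, not Animesion's exact configuration:]</p>
<pre><code>import torch
import torch.nn as nn

class ToyViTWithIFA(nn.Module):
    """Toy transformer classifier with an Intermediate Features Aggregation
    head: the CLS token of every block is skip-connected to the classifier."""
    def __init__(self, depth=4, dim=64, n_classes=3263):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(depth)])
        self.head = nn.Linear(depth * dim, n_classes)    # aggregates all depths

    def forward(self, tokens):               # tokens: (B, 1 + n_patches, dim)
        cls_per_layer = []
        for block in self.blocks:
            tokens = block(tokens)
            cls_per_layer.append(tokens[:, 0])           # CLS token after this block
        return self.head(torch.cat(cls_per_layer, dim=-1))

x = torch.randn(2, 1 + 196, 64)              # stand-in for embedded image patches
print(ToyViTWithIFA()(x).shape)              # torch.Size([2, 3263])</code></pre>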
<p>We conduct extensive experiments using a variety of classification models, including <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">CNNs</a> and self-attention-based ViTs. We also adapt a multimodal variant, the Vision-and-Language <a href="https://arxiv.org/abs/1706.03762#google" title="‘Attention Is All You Need’, Vaswani et al 2017">Transformer</a> (<a href="https://arxiv.org/abs/2102.03334" title="‘ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision’, Kim et al 2021">ViLT</a>), to incorporate external [Danbooru] tag data for classification, without additional multimodal pre-training.</p>
<p>Through our results we obtain new insights into how hyperparameters such as input sequence length and mini-batch size, as well as variations in the architecture, affect the transfer-learning performance of Vi(L)Ts.</p>
<p>…<a href="https://github.com/arkel23/animesion">we release our source-code</a> and pretrained model checkpoints, in an effort to encourage and facilitate researchers to continue work in this domain.</p>
<p>…<strong>3.1. Data</strong>: We use the DanbooruAnimeFaces dataset in our experiments. <a href="https://github.com/grapeot/Danbooru2018AnimeCharacterRecognitionDataset">DAF</a> is a subset of the 2018 release of <a href="/danbooru2021#danbooru2019">Danbooru</a>. Due to its extremely long-tailed distribution, we only keep classes with at least 20 samples, resulting in 463,437 images of 3,263 characters. We split it into training, validation, and testing sets using a ratio of 0.7, 0.1, and 0.2, respectively. Since the original dataset only contains face crops, we also sample full-body images by resizing the original images from Danbooru20xx, and coin this subset <strong>DAFull</strong>. Furthermore, we include description tags from Danbooru20xx as additional multimodal data.</p>
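<p>[A small Python sketch of the class filtering and 0.7/0.1/0.2 split described above; the data structures and file names are hypothetical:]</p>
<pre><code>import random
from collections import Counter

def filter_and_split(samples, min_per_class=20, ratios=(0.7, 0.1, 0.2), seed=0):
    """samples: list of (image_path, character_id). Keeps classes with at
    least `min_per_class` images, then splits into train/val/test."""
    counts = Counter(label for _, label in samples)
    kept = [s for s in samples if counts[s[1]] >= min_per_class]
    random.Random(seed).shuffle(kept)
    n = len(kept)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    return (kept[:n_train],
            kept[n_train:n_train + n_val],
            kept[n_train + n_val:])

# toy example with hypothetical (path, character) pairs
toy = [(f"img_{i}.jpg", f"char_{i % 5}") for i in range(200)]
train, val, test = filter_and_split(toy)
print(len(train), len(val), len(test))       # roughly 140 / 20 / 40</code></pre>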
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="https://arxiv.org/abs/2101.08674" class="backlink-not id-not">DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition</a></p></li>
<li><p><a href="https://arxiv.org/abs/2111.07640" class="backlink-not id-not">AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment</a></p></li>
<li><p><a href="https://github.com/kosuke1701/ZACI-20-dataset" class="backlink-not id-not"><span class="cite"><span class="cite-author">Danbooru<span class="cite-date">2020</span></span> Zero-shot Anime Character Identification Dataset (ZACI-20)</span></a></p></li>
<li><p><a href="/doc/ai/anime/danbooru/2020-koyama.pdf" class="backlink-not id-not">System for searching illustrations of anime characters focusing on degrees of character attributes</a></p></li>
<li><p><a href="/crop" title="‘Anime Crop Datasets: Faces, Figures, & Hands’, Branwen et al 2020" class="backlink-not id-not">Anime Crop Datasets: Faces, Figures, & Hands</a></p></li>
</ul>
</div>
---
/doc/ai/nn/gan/biggan/2022-wang-2.pdf
Generalizing Factorization of GANs by Characterizing Convolutional Layers
Yuehui Wang, Qing Wang, Dongyu Zhang
2022-07-18
2022-10-17
[("doi","10.1109/ICME52920.2022.9859692")]
ai/anime/danbooru ai/nn/gan/biggan ai/nn/gan/stylegan/anime
<p>Existing unsupervised disentanglement methods in the <a href="https://en.wikipedia.org/wiki/Latent_variable">latent</a> space of <a href="https://arxiv.org/abs/1406.2661">Generative Adversarial Networks</a> (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>) rely on the analysis and decomposition of pre-trained weight matrices. However, they only consider the weight matrices of the <a href="/note/fully-connected" title="‘Fully-Connected Neural Nets’, Gwern 2021">fully-connected</a> layers, ignoring the convolutional layers which are indispensable for image processing in modern generative models. As a result, the learned latent semantics lack interpretability, which is unacceptable for image-editing tasks.</p>
<p>In this paper, we propose a more <em>generalized</em> closed-form factorization of latent semantics in GANs, which takes the convolutional layers into consideration when searching for the underlying variation factors. Our method can be applied to a wide range of deep generators with just a few lines of code.</p>
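<p>[A minimal NumPy sketch of a SeFa-style closed-form factorization extended to convolutional layers by flattening their kernels before concatenation; the weight layouts and the editing step are illustrative assumptions, not the paper's exact formulation:]</p>
<pre><code>import numpy as np

def factorize_layers(weights):
    """Closed-form factorization sketch: flatten each (latent_dim, ...) weight,
    concatenate along the output axis, and take the top eigenvectors of A A^T
    as candidate semantic edit directions."""
    flats = [w.reshape(w.shape[0], -1) for w in weights]   # (latent_dim, fan_out_i)
    A = np.concatenate(flats, axis=1)                      # (latent_dim, total_fan_out)
    # directions maximizing the response ||A^T d|| with ||d|| = 1
    eigvals, eigvecs = np.linalg.eigh(A @ A.T)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]                               # columns = directions

# hypothetical pre-trained weights: one fully-connected and one 3x3 conv layer
rng = np.random.default_rng(0)
fc_weight   = rng.normal(size=(512, 1024))        # latent_dim x out_features
conv_weight = rng.normal(size=(512, 256, 3, 3))   # latent_dim x out_ch x kH x kW
directions = factorize_layers([fc_weight, conv_weight])

z = rng.normal(size=512)
z_edited = z + 3.0 * directions[:, 0]             # move along the top direction
print(directions.shape, z_edited.shape)</code></pre>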
<p>Extensive experiments on multiple GAN models trained on various datasets [<a href="https://arxiv.org/abs/1812.04948#nvidia" title="‘A Style-Based Generator Architecture for Generative Adversarial Networks’, Karras et al 2018">StyleGAN</a> 1, StyleGAN 2, <a href="/biggan" title="‘Making Anime With BigGAN’, Gwern 2019">Danbooru BigGAN</a>] show that our approach is capable of not only finding semantically meaningful dimensions, but also maintaining the consistency and interpretability of image content.</p>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="https://arxiv.org/abs/2007.06600" class="backlink-not id-not">Closed-Form Factorization of Latent Semantics in GANs</a></p></li>
<li><p><a href="https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Discovering_Interpretable_Latent_Space_Directions_of_GANs_Beyond_Binary_Attributes_CVPR_2021_paper.pdf" class="backlink-not id-not">AdvStyle: Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes</a></p></li>
<li><p><a href="https://arxiv.org/abs/2004.02546" class="backlink-not id-not">GANSpace: Discovering Interpretable GAN Controls</a></p></li>
<li><p><a href="https://arxiv.org/abs/1802.05701" class="backlink-not id-not">Inverting The Generator Of A Generative Adversarial Network (II)</a></p></li>
<li><p><a href="/doc/ai/nn/gan/stylegan/2019-abdal.pdf" class="backlink-not id-not">Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?</a></p></li>
<li><p><a href="https://arxiv.org/abs/1612.04357" class="backlink-not id-not">Stacked Generative Adversarial Networks</a></p></li>
<li><p><a href="https://arxiv.org/abs/1906.11880" class="backlink-not id-not">Style Generator Inversion for Image Enhancement and Animation</a></p></li>
</ul>
</div>
---
/doc/ai/anime/danbooru/2022-lan.pdf
GCN-Based Multi-Modal Multi-Label Attribute Classification in Anime Illustration Using Domain-Specific Semantic Features
Ziwen Lan, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
2022-10-16
2022-12-18
[("doi","10.1109/ICIP46576.2022.9898071")]
ai/anime/danbooru ai/nn/cnn
<p>This paper presents a multi-modal multi-label attribute classification model for anime illustrations, based on Graph Convolutional Networks (GCN) using domain-specific semantic features. Since creators in animation production often intentionally highlight the subtle characteristics of characters and objects when creating anime illustrations, we focus on the task of multi-label attribute classification.</p>
<p>To capture the relationships between attributes, we construct a multi-modal GCN model that can adopt semantic features specific to anime illustration. To generate the domain-specific semantic features that represent the semantic contents of anime illustrations, we construct a new captioning framework for anime illustration by combining real images and their style transformations. The contributions of the proposed method are: (1) more comprehensive relationships between attributes are captured by introducing a GCN with semantic features into the multi-label attribute classification task for anime illustrations; (2) more accurate image captions for anime illustrations can be generated by a model trained using only real-world images.</p>
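<p>[A toy PyTorch sketch of the general pattern the abstract describes: graph convolutions over a label-relationship graph produce per-attribute classifiers that are combined with the image's (multi-modal) feature. The graph, dimensions, and fusion are illustrative assumptions, not the paper's exact model:]</p>
<pre><code>import torch
import torch.nn as nn

class LabelGCN(nn.Module):
    """Sketch: graph convolutions over a label co-occurrence graph turn label
    embeddings into per-attribute classifiers, which are then dotted with the
    fused (multi-modal) image feature to give multi-label logits."""
    def __init__(self, emb_dim=300, feat_dim=512):
        super().__init__()
        self.gc1 = nn.Linear(emb_dim, 256)
        self.gc2 = nn.Linear(256, feat_dim)

    def forward(self, label_emb, adj, image_feat):
        # normalized adjacency propagates information between related attributes
        h = torch.relu(adj @ self.gc1(label_emb))
        classifiers = adj @ self.gc2(h)               # (n_labels, feat_dim)
        return image_feat @ classifiers.T             # logits (B, n_labels)

n_labels = 100
adj = torch.softmax(torch.randn(n_labels, n_labels), dim=-1)  # toy co-occurrence graph
label_emb = torch.randn(n_labels, 300)        # e.g. semantic embeddings of the tags
image_feat = torch.randn(4, 512)              # fused CNN + caption-derived feature
logits = LabelGCN()(label_emb, adj, image_feat)
print(logits.shape)                           # torch.Size([4, 100])</code></pre>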
<p>To the best of our knowledge, this is the first work dealing with multi-label attribute classification of anime illustrations.</p>
<p>The experimental results show the effectiveness of the proposed method in comparison with several existing methods, including the state of the art.</p>
<p>[<strong>Keywords</strong>: anime illustration, graph convolutional networks, semantic feature, multi-modal classification, image captioning]</p>
<p>…<strong>3.2. Training of Whole Multi-label Classification Model</strong>: In our experiments, we used the Danbooru2020 dataset to train the whole multi-label attribute classification model. Danbooru2020 is a large anime illustration dataset with over 4.2 million images and over 130 million tags. From this dataset, we extracted about 25,000 anime illustrations covering 100 common attribute classes; each illustration contains an average of 6.3 attribute labels. We used 75% of the 25,000 images as the training set and the remaining 25% as the validation set.</p>
<div class="aux-links-append see-also-append collapse">
<p><strong>See Also</strong>:</p>
<ul>
<li><p><a href="https://arxiv.org/abs/2101.08674" class="backlink-not id-not">DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition</a></p></li>
<li><p><a href="/doc/ai/anime/2015-saito.pdf" class="backlink-not id-not" title="‘<code>Illustration2Vec</code>: a semantic vector representation of illustrations’, Masaki & Matsui 2015"><code>Illustration2Vec</code>: a semantic vector representation of illustrations</a></p></li>
<li><p><a href="/doc/ai/anime/danbooru/2020-koyama.pdf" class="backlink-not id-not">System for searching illustrations of anime characters focusing on degrees of character attributes</a></p></li>
</ul>
</div>
---
/doc/ai/anime/danbooru/2022-gao-2.pdf
An analysis: different methods about line art colorization
Jinhui Gao, Ruihao Zeng, Yuan Liang, Xinyu Diao
2022-11-10
2023-02-25
[("doi","10.1117/12.2641852")]
ai/anime/danbooru ai/nn/gan
<p>We have conducted a series of studies and analyses to address the problem of line art colorization. We chose Generative Adversarial Networks (<a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>), a leading neural network architecture for solving this problem, as our focus.</p>
<p>Building on the large number of studies based on this architecture, we improved, applied, and analytically compared 4 methods: <a href="https://arxiv.org/abs/1611.07004#bair" title="‘Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks’, Isola et al 2016">pix2pix</a>, pix2pixHD, white-box, and scaled Fourier transform (SCFT). These represent, as far as possible, the mainstream problem-solving directions in the field of line-art colorization.</p>
<p>Finally, two reference quantities were introduced to quantify the results of the analysis…From the coloring results and the two indicators, we can see that the white-box and pix2pixHD methods produce relatively good coloring results, while pix2pix is less effective.</p>
…[File truncated due to length; see original file]…