Ero

Everything posted by Ero

  1. I don't believe one can be called a 'genius' without having done some form of revolutionary work or a fundamentally novel recontextualization of an existing field (you yourself have stated it has little to do with intelligence). Wolfram has done neither. 'Uncomputable' is a term coined by Church and Turing in 1936 in response to Hilbert's 1928 Entscheidungsproblem. Wolfram simply renamed it 'computational irreducibility' and claimed to have discovered it. Conway defined the 'Game of Life' in 1970, which is a 2D cellular automaton. Wolfram simplified it to 1D and ran a few iterations, again claiming to be the first to discover it. I can keep going. The point is, there is nothing scientific or even epistemically genuine about what he does. To me that isn't genius, but rather intellectual hubris. You can quote me on this - Wolfram will remain the crackpot at the outskirts of science because nothing of what he does is revolutionary. If one actually were interested in these topics, it would only take a little reading to realize that this so-called new 'paradigm' has been brewing for decades - I can give you examples of far more sophisticated scientists from any field:
   Biology and Medicine - Michael Levin
   Neuroscience and Statistics - Karl Friston, Demis Hassabis
   Physics - Giorgio Parisi, Chris Fields
   Mathematics - Alexander Grothendieck, John von Neumann, Michel Talagrand, Stanislav Smirnov
   Computer Science - Alan Turing, Alonzo Church, Leslie Valiant, David Ackley
   As part of Kuhn's argument, new paradigms no longer occur thanks to the 'lone genius' as Wolfram imagines. Remember when you said in the university thread that if you don't formally study STEM, you will just be an 'ogre' in your lair? That is kind of what Wolfram is, having been disconnected from formal science for close to 40 years.
  2. I wrote in more detail in my journal about why this approach of seeking a 'fundamental theory' instead of building bridges is fundamentally problematic:
  3. Look, I am not saying what he does is entirely useless. The main argument I made in my first post in this thread, which I will repeat again, is that there isn't anything special about the structure he examines. Every graph he draws, I can encode in a matrix. Matrices are so powerful because of two facts: there is something called Cayley's theorem, which states that any abstract group (the underlying structure in math) can be embedded in a permutation group (i.e. the group of all ways to re-order n elements), and every permutation has a matrix representation. This essentially means that all of math can be 'found' inside the space of matrices. What I described to you above is a 'bridge', i.e. a functorial/representational relationship that allows me to switch perspectives. These bridges are used to establish the equivalence I mentioned with other Turing-complete systems. My argument is that he has simply decided to focus on one Turing-complete system (cellular automata initially, now dynamic graphs) and base his entire theory on it. Again, there is nothing special about that structure; it is simply an instance of a larger formalism which he refuses to acknowledge. Sure, we can work only in his 'ruliad universe'. But why do that when we get better results in a different formalization that may be more suited for the task at hand - for example fluids and stochastic systems. If you are actually interested in what could serve as the foundation of AI (and I mean rigorous theories), then read the work of people who have actually built AI models - Petar Veličković at DeepMind (Category Theory paper), Philippe Rigollet at MIT (Mathematical Perspective on Transformers). See the difference? There are rigorous statements about relevant models and predictions for them, something you don't find at all in Wolfram's work. Show me where he has made a clear and concise statement that is falsifiable. All he does is say "suggests", "indicates", expecting the reader to trust him because of his 'brilliance'. That is not how science and mathematics work. No matter how smart you are - even if you are Terence Tao or Noam Elkies (who has actually taught me), you still carry the burden of proof. Period.
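To make the 'bridge' concrete, here is a tiny sketch of my own (nothing from Wolfram) showing a permutation realized as a matrix, so that composing permutations becomes matrix multiplication - exactly the kind of representational move Cayley's theorem licenses:

```python
import numpy as np

def permutation_matrix(perm):
    """Return the matrix P with P[perm[j], j] = 1, so that P @ e_j = e_{perm(j)}."""
    n = len(perm)
    P = np.zeros((n, n), dtype=int)
    for j, i in enumerate(perm):
        P[i, j] = 1
    return P

# Two permutations of {0, 1, 2}: a 3-cycle and a transposition.
sigma = [1, 2, 0]   # 0 -> 1, 1 -> 2, 2 -> 0
tau   = [1, 0, 2]   # swaps 0 and 1

P_sigma, P_tau = permutation_matrix(sigma), permutation_matrix(tau)

# Composing permutations corresponds to multiplying their matrices.
composed = [sigma[t] for t in tau]          # apply tau first, then sigma
assert (P_sigma @ P_tau == permutation_matrix(composed)).all()

# The representation is faithful: the inverse permutation is the transpose.
assert (P_sigma.T @ P_sigma == np.eye(3, dtype=int)).all()
```

The same switch of perspective works for any finite group, which is the whole point: the matrix side is the common language, not any one particular encoding.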
  4. Thanks for the share. I am not familiar with his work, so I will look into it.
  5. Yet that is exactly what Wolfram does himself in that interview. There is nothing about his demeanour or arguments that suggests otherwise. Wolfram is a Layman’s genius. He clearly has prodigious level intelligence, yet as my argument above shows, nothing of his technical work is really “genius”. He hasn’t invented any of the concepts, hasn’t made any substantive contributions or predictions for that matter. When someone doesn’t have a technical background, it is hard to discern between the two, hence the term “Layman’s genius”.
  6. @Keryo Koffa That's a very good point. Thing is, words being usurped by a belief system/paradigm shouldn't stop us from using them, especially when they are the most succinct way to express an idea. I believe the opposite - by recontextualizing the meaning, we can in fact accelerate the transition away from the old paradigm. I can give you two examples: "God" - a very, very loaded word associated with the stage Blue paradigm. Nonetheless it has the exact meaning we psychonauts/spiritually developed people experience. Sure, you can try and swerve around it by calling it 'All', 'The Universe', etc., but nothing really gets the point across as well as 'God' does. By recontextualizing the experience of Christ consciousness/Oneness with God, we can help people realize there isn't a separation. "Entropy" - largely interpreted as the movement towards disorder in the old Physics paradigm. The reality is, that is true only in equilibrium systems. In non-equilibrium systems, such as the Earth itself and all life on it, it has exactly the opposite effect - it creates order through the emergence of levels of abstraction. It wouldn't make sense to change the word, because it is fundamentally the exact same underlying principle. In short, recontextualizing a concept is more powerful than shying away from it because it has loaded meaning. This idea extends even to some of the most revolutionary work done in mathematics (e.g. Topos Theory and Motives as developed by Grothendieck).
  7. @Keryo Koffa Yeah, the connotation I had for 'useful' is in a post-capitalist sense. I don't believe in the pure capitalist meritocracy of 'be useful or die useless'. I meant it more in the universal way of giving back to the flow of consciousness and creation. Utility in my book is a rather nuanced notion and one that is fundamentally not economically transferrable.
  8. Agreed. Hoffman is a far more accomplished scientist in any aspect you can think of. Yet Wolfram's hubris in the conversation is truly unbearable. His biggest strength, a hyper-intellect, is also his worst flaw, since it makes him think he is superior despite not having done the actual work. Growing up a prodigy has its pitfalls.
  9. @Someone here For resources, I suggest Kuhn's 'Structure of Scientific Revolutions'. It gives a nice framework for thinking about the general direction of technology/science (they go hand in hand, hence the term 'STEM'). My two cents stem from my current contextualization through the paradigm of 'Chaos, Entropy, Order'. Science is fundamentally our generative model of the world per the Free Energy Principle (attr. Karl Friston), aiming at minimizing our 'surprise' from observations of the world (equivalent to maximizing entropy - postulate 2). Technology is the fruit of this model - once you understand a system, you can influence/control it. The examples are numerous. By gaining new levels of abstraction (postulate 3 - Order), we can make sense of the apparent complexity around us (postulate 1 - Chaos) and turn it into something useful. To sum it up, in the words of Arthur C. Clarke - 'Any sufficiently advanced technology is indistinguishable from magick'
  10. First step is to set a direction for your reading/research, which will largely inform what is useful. I then sift through papers and books by reading the abstract and/or introduction and contents page. If something piques my interest, I take a deeper dive. Nowadays you can use GPT to compile/give you a summary and decide whether it is worth your time.
  11. Celebrities, gossip, trends, fashion, 'culture'. Too many people care about being liked/ in-tune and not about being useful.
  12. Coffeezilla is doing genuinely important research. He called the FTX scandal better than any economist/tech person I know. That says something.
  13. @Fearey Really clear and succinct. Thanks for sharing. Reminds me of how people who don't have the slightest clue of basic finance think centibillionaires have all their money as piles of cash lying around that they can just spend.
  14. In light of my recent technical feedback on Wolfram's Fundamental Theory: I wanted to share my perspective on why the idea of a 'fundamental theory' or a 'theory of everything' is fundamentally flawed and doomed from the start. My thoughts have been largely inspired by mathematics and coincide with the ruminations Edward Frenkel has shared with both Curt Jaimungal and the Science and Nonduality folk. I will try to use as many metaphors as possible without bogging the discussion down with too much technical detail.
   For the sake of epistemic neatness, let me first specify that when speaking of a theory, I am referring to a model of the world that works through the creation/study of abstract structures and their relationships, i.e. the map, not the territory. A theory is fundamentally different from an ontology (the matter/composition of the terrain) and the two should not be conflated. An ontology gives no 'shape' to the territory, but it 'informs' what structure you can build on top of it, the same way you can't build a castle from sand.
   The underlying premise of a unifying theory is the idea of a 'universal terrain' - an underlying structure that can be inferred through the various ways of mapping the same territory. Whether it is topography (height map), seismic analysis (depth map) or standard cartography (landscape), the terrain manifests itself differently in each instance, but still possesses some invariant qualities across each type of measurement. For example, a desert doesn't have caves or tall mountains and is more spread out compared to a cavernous karst landscape. The presiding epistemological assumption in Physics is the existence of such a landscape - a grand unified theory that can be found across all our methods of sampling reality, whether electromagnetic, gravitational or nuclear in nature. The underlying belief is that once such a Unified Field theory is found, everything else will fall into place. This, of course, is nothing more than a pipe dream. Contemporary physics itself does not operate under that assumption.
   The problem with it stems from the fundamentally computationally intractable nature of reality (postulate 1 of 'Chaos'): if you have heard of the 'Three-Body Problem' (recommend the book), then you already know that there are fairly simple systems (three bodies acting on each other through gravitation) for which we can no longer find closed-form solutions. This, in fact, turns out to be the rule and not the exception. All our calculable models are fundamentally idealizations - when studying Newtonian mechanics, you always ignore friction. When studying many-body quantum systems, you try to minimize long-distance interactions. When trying to fight a virus, we do not study its quantum composition, only its molecular/cellular structure. The aforementioned intractability provably appears also in fluids, black holes, Ising models, Lorenz systems, etc.
   Now it seems we can't solve anything. Or can we? The person who made the most progress on the aforementioned three-body problem is Henri Poincaré, a true giant in the field of mathematics. He not only proved that no general closed-form solution exists, he developed entirely new models in topology and symplectic geometry that allowed him to find stable 'homoclinic' orbits - by creating new models of abstraction, he was able to discern properties of the system despite its fundamental intractability.
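To make that intractability concrete, here is a minimal numerical sketch of my own (arbitrary units, three equal masses started near the unstable rotating-triangle configuration, using scipy): two integrations whose initial conditions differ by one part in a billion drift visibly apart within a couple dozen time units, which is exactly why closed-form, long-horizon prediction is hopeless.

```python
import numpy as np
from scipy.integrate import solve_ivp

def three_body(t, y, masses=(1.0, 1.0, 1.0)):
    """Planar gravitational three-body problem, G = 1, state = [6 positions, 6 velocities]."""
    pos = y[:6].reshape(3, 2)
    acc = np.zeros((3, 2))
    for i in range(3):
        for j in range(3):
            if i != j:
                r = pos[j] - pos[i]
                acc[i] += masses[j] * r / np.linalg.norm(r) ** 3
    return np.concatenate([y[6:], acc.ravel()])

# Three equal masses on the (unstable) rotating equilateral configuration.
w = 3 ** -0.25  # angular velocity of the exact circular solution for these units
y0 = np.array([1.0, 0.0, -0.5, np.sqrt(3) / 2, -0.5, -np.sqrt(3) / 2,                # positions
               0.0, w, -w * np.sqrt(3) / 2, -w / 2, w * np.sqrt(3) / 2, -w / 2])     # velocities
y1 = y0.copy()
y1[0] += 1e-9  # perturb one coordinate by a billionth

t_eval = np.linspace(0.0, 20.0, 200)
sol_a = solve_ivp(three_body, (0.0, 20.0), y0, t_eval=t_eval, rtol=1e-9, atol=1e-12)
sol_b = solve_ivp(three_body, (0.0, 20.0), y1, t_eval=t_eval, rtol=1e-9, atol=1e-12)

# The tiny initial difference is typically amplified by many orders of magnitude.
sep = np.linalg.norm(sol_a.y[:6] - sol_b.y[:6], axis=0)
print(f"separation at t=0: {sep[0]:.1e},  at t=20: {sep[-1]:.1e}")
```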
   This approach of abstraction is in fact universal (postulate 3, 'Order'). A great anecdote Frenkel gave is that when you are playing chess, you don't care about calculating every quantum state of the system, and even if you could (we can't, by a factor of 10^23), it still wouldn't help you win the game. Combining this with the prior realization about the intractable nature of reality helps us approach a greater understanding, one that has in fact been foundational to the true nature of mathematics - the study of structure and abstraction itself. Instead of seeking a 'unifying theory', we should marvel at the complexity of reality and approach problems at each level (physics, chemistry, biology, computer science, etc.) by applying models and ways of thinking from across the fields, without any 'elitism' as to which science is the 'fundamental' one. And yet, here is why 'mathematics is the queen of all sciences', as Gauss said - because Mathematics does not present itself as a fundamental theory; it does not make prescriptive statements about the landscape. Rather, it is a universal language that helps us map and compare all the landscapes there are. Even deeper, at its essence, unification in Mathematics is about building the bridges across the continents.
  15. @CARDOZZO Pointers and actual maps are different things. I can point you in the right direction and you can still fall off a cliff and kill yourself. I do agree with the need for a new paradigm, which is why I started a journal here explicitly with this goal: Thing is, far more accomplished and cogent arguments have been made for the need of a new scientific paradigm. I mention some of them in the first post above, but they include Nobel Laureates like Ilya Prigogine, Fields Medalists like Alexander Grothendieck, and the most accomplished neuroscientist of all time - Karl Friston. The first two have worked on 'a new paradigm' since before Wolfram was born. Saying this from the standpoint of someone who reads hundreds of pages of scientific literature a week, Wolfram's writings are some of the most tedious and excruciating to read, and not because they are conceptually hard, but rather the opposite - it's just fluff.
  16. From a philosophical standpoint there are very interesting and important observations, especially relating to computational irreducibility, the ruliad and the hyper-ruliad. Those deserve attention and I am sure Leo will cover them in detail, which is why I will abstain from the philosophical aspect and instead give my two cents on the technical aspect of his work (after all, he claims he is doing 'Science', fundamentally rigorous in nature), given my background is in Pure Mathematics. For reference, I have read his 'New Kind of Science' and his 'Physics Project' Technical Report.
   To get straight to the point, the technical aspect of his work is subpar - the entirety of his arguments is heuristic and qualitative. His 'proofs' and discoveries are mostly pictures/diagrams, and he references solely himself, with a sense of aggrandizement that would make you think he is the only human being who has ever said anything about this topic (far from it). The actual 'physics' and math concepts he derives from his 'foundational theory' are unworkable toy examples that do not transfer at all. There are no isomorphisms, functors or representations (i.e. 'bridges') that would help you put all prior existing work into his context, in the way that would be expected from a 'foundational theory'.
   Consider for example his 'operator' interpretation (the bread and butter of physics) - instead of interpreting the functional space and operator algebra, as is necessary in defining it, he only gives you a toy 'commutator' example that does nothing to demonstrate that his approach is even workable at the level of complexity modern science expects (i.e. Hamiltonian, Langevin). For reference, the majority of Quantum Mechanics is based on non-commutative operator algebras. Furthermore, I don't see how you would even be able to define the spectral decomposition into eigenvalues and eigenfunctions in his context. Same goes for his 'gauge invariant' interpretation - instead of defining/representing the fundamental gauge field symmetry groups U(1)xSU(2)xSU(3), he again simply refers to a toy example without any prescriptive power. This means that his model fundamentally fails at predicting any kind of behavior that we observe, for example, in particle colliders. Why on earth would physicists then use his theory? To put it plainly, if I were to present his work to any of my professors, I would get an 'F'. Plain and simple.
   You may ask, isn't this something that can be solved with some extra time and rigor? The problem with his 'theory' is in fact deeper - he makes a fundamental ontological fallacy. Having observed what he describes as 'complex behavior' in 1D finite state automata, he claims it must then be true that the entirety of existence is based on finite, simple and discrete rules. There are two problems with this argument.
   Firstly, there is no point at which he defines 'complexity' or provides us with his interpretation of it (an immediate red flag for any scientist). Simply observing something as seemingly complex does not mean it inherently is. For example, if we take Kolmogorov Complexity as our working definition, the entirety of his cellular automaton examples is in fact not complex, because it takes very little 'space' to define them.
   The second and deeper issue is that there is nothing special about finite state automata. They are simply an instance of a larger class of systems we call 'Turing complete'.
   Even the representation he uses to draw the pictures is not special to cellular automata (and no, he did not invent it) - consider for example the following picture, showing three different Turing machines' tapes as they progress in time (horizontal axis). You can clearly see the parallel. Now, here is the problem. There are in fact many Turing-complete systems we know of, which means that by definition everything he observes - rule 30, rule 110, etc. - has an exact equivalent in:
   - The four nucleotide bases of DNA
   - Fluid systems
   - Ferromagnets (Spin Glass model)
   - Water pipes
   - ... and many more
   By his argument, is then the entire world the genetic code of some organism? A large magnet? A large glass of water? The sewage system of some alien's house? There is nothing that gives basis for making the claims he does. And as much as he wants you to believe that he is the only one doing this kind of work, there are entire disciplines dedicated to studying integrable systems. Mathematicians and physicists know all too well about the problems of uncomputability, lack of well-definedness, etc., and instead of discarding the entire field of mathematics as he proposes we do, they build tools specifically to find one's way around them. My research into Heavy-Tailed Matrices is an example. You can also consider scattering resonances, chiral models, and much more. The fields of mathematics and statistics are in fact much further ahead of him than he wants you to think.
   TLDR: Philosophically interesting, technically subpar
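For concreteness, here is how little code a 1D cellular automaton like rule 30 or 110 takes - a minimal sketch of my own (not Wolfram's code), which also illustrates the Kolmogorov point above: the generating rule occupies a handful of lines, no matter how intricate the output looks.

```python
import numpy as np

def run_rule(rule_number, width=101, steps=50):
    """Evolve a 1D binary cellular automaton (e.g. rule 30 or 110) from a single live cell."""
    # The 8 possible (left, center, right) neighborhoods map to the bits of the rule number.
    rule = np.array([(rule_number >> i) & 1 for i in range(8)], dtype=np.uint8)
    row = np.zeros(width, dtype=np.uint8)
    row[width // 2] = 1
    history = [row]
    for _ in range(steps):
        left, right = np.roll(row, 1), np.roll(row, -1)
        row = rule[4 * left + 2 * row + right]   # look up each neighborhood in the rule table
        history.append(row)
    return np.array(history)

# Each row of the output is one time step - the famous 'pictures' are just this array rendered as pixels.
grid = run_rule(30)
for row in grid[:16]:
    print("".join(" #"[c] for c in row[35:66]))
```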
  17. That may be true for High School math and some undergraduate topics that have a visual component, like Linear Algebra, Multivariable Analysis and Complex Analysis. However, for any more advanced topic like Algebraic Geometry, Algebraic Topology, Representation Theory and Functional Analysis, not only are there next to no videos, I fail to see how you would even represent infinite-dimensional spaces, sheaves and cohomology chains. The power of mathematics comes precisely from the ability to prove stuff about things we can't even visualise.
  18. There is indeed a very large variance in how textbooks are written. Especially for grad material, some professors just like being dicks and making everything excruciatingly dry. It's the same reason for those laughs. Mathematics gets such a bad name because of a few people who can't help their insecurities.
  19. There is something to be said about being a catalyst. Instead of 'control', why not 'steer'? Carnot 'steered' steam power. Tesla was instrumental in the 'wave' of electrification. Oppenheimer - nuclearization. Once a fission chain reaction is unleashed, little can be done to stop a wave that destructive, let alone 'surf' it. But under the right conditions, it becomes a source of energy. By finding the underlying mechanisms of the complex system, one can learn to orchestrate the emergent phenomena without micro-managing. Who is to say we won't be able to surgically engineer tectonic movement or climate currents? There are those that strive to Tame the Chaos.
  20. Pulled an 11h day working through the literature. Meeting with advisor is scheduled on Wednesday. Current thesis proposal draft:

# Thesis Proposal: Heavy-tailed Random Matrices

## Introduction

Recent developments in the field of Deep Neural Networks (DNNs) have proven incredibly effective in extracting correlations from data. However, the current paradigm and methodology are still largely based on heuristics and lack the theoretical underpinnings necessary for both prescriptive and explanatory properties. Many approaches have been proposed with the purpose of alleviating this so-called 'black box problem' (i.e., lack of interpretability), ranging from the early attempts at using Vapnik-Chervonenkis (VC) theory [1] to subsequent applications of Statistical Mechanics [2-4]. Arguably, none have been as effective at predicting the quality of state-of-the-art trained models as Random Matrix Theory (RMT) [5,6], and more specifically, the recently established Theory of Heavy-Tailed Self-Regularization (HT-SR) by Martin and Mahoney [7-11]. Their empirical results have led to the creation of novel metrics, as well as a variety of interesting theoretical results with respect to the study of the generalization properties of stochastic gradient descent (SGD) under heavy-tailed noise [12,13].

## Background and Significance

### HT-SR Theory

Martin and Mahoney's approach is based on the study of the empirical spectral density (ESD) of layer matrices and their distributions [7]. More specifically, they consider \( N \times M \), \( N \geq M \), real-valued weight matrices \( \mathbf{W}_l \) with singular value decomposition \( \mathbf{W} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T \), where \( \nu_i = \mathbf{\Sigma}_{ii} \) is the \( i \)-th singular value and \( p_i = \nu_i^2/\sum_i \nu_i^2 \). They define the associated \( M \times M \) correlation matrix \( \mathbf{X}_l = \frac{1}{N}\mathbf{W}_l^T\mathbf{W}_l \) and compute its eigenvalues, i.e., \( \mathbf{X}\mathbf{v}_i = \lambda_i\mathbf{v}_i \), where \( \lambda_i = \nu_i^2 \) for all \( i = 1, \cdots, M \). They subsequently categorize 5+1 phases in the training dynamics by modeling the elements of the latter matrices using Heavy-Tailed distributions, i.e., \( W_{ij}\sim P(X)\sim \frac{1}{x^{1+\mu}}, \mu>0 \), in which case the ESD \( \rho_N(\lambda) \) likewise exhibits Heavy-Tailed properties. Excluding the two initial phases and that of over-training (+1), there are 3 phases of interest, categorized by their better generalization, namely:

- **Weakly Heavy-Tailed**: \( 4 < \mu \), with Marchenko-Pastur behavior in the finite limit and Power-Law statistics at the edge.
- **Moderately Heavy-Tailed**: \( 2 < \mu < 4 \), with \( \rho(\lambda)\sim \lambda^{-1-\mu/2} \) at finite size and \( \rho_N(\lambda)\sim \lambda^{-a\mu+b} \) at infinite size, where the parameters \( a, b \) are empirically fitted using linear regression. Maximum eigenvalues follow the Frechet distribution.
- **Very Heavy-Tailed**: \( 0 < \mu < 2 \), where the ESD is Heavy-Tailed/PL for all finite \( N \) and converges for \( N\rightarrow\infty \) to a distribution with tails \( \rho(\lambda)\sim \lambda^{-1-\mu/2} \). The maximum eigenvalues again follow a Frechet distribution.

### Significance

The theory of HT-SR has led to interesting results both for the sake of applicability and from a purely theoretical standpoint.
The practicality of this work has become apparent due to the development of more efficient training policies, such as temperature balancing [9], as well as real-time metrics like the Frobenius Norm, Spectral Norm, Weighted Alpha, and \( \alpha \)-Norm, which are calculated using HT-SR independently of the training and testing data [10]. On the other hand, the empirical observations have inspired the construction of stronger bounds for the generalization properties of SGD's trajectories via stochastic differential equations (SDEs) under heavy-tailed gradient noise [12]. These bounds have indicated a 'non-monotonic relationship between the generalization error and heavy tails,' and have been developed into a general class of objective functions based on the Wasserstein stability bounds for heavy-tailed SDEs and their discretization [13].

The aforementioned results support the claim that a more detailed study of the ESDs of various open-source models can lead to a refined understanding of the phenomenology and provoke interesting theoretical insights. It is also important to mention the feasibility of this work, as the empirical component does not require extensive computational resources. Using open-sourced models and their weights, the analysis can be performed on a local machine without significant overhead.

## Objectives & Methodology

The goals of this paper are two-fold:

- To present a theoretical exposition of the 'relatively new branch' of RMT [14], specifically that of heavy-tailed random matrices, by citing the rapidly developing literature [15-18].
- To expand the empirical results of HT-SR by applying a refined classification through the use of Maximum Likelihood Estimation (MLE) with respect to a range of heavy-tailed distributions, instead of linear regression for a Power-Law fit. Additionally, the paper aims to examine a wide array of open-source models varying in architecture and underlying symmetries.

### Empirical Study

The methodology proposed follows Martin and Mahoney's approach [7] - studying the ESD of layer weight matrices of DNNs. Their classification of training dynamics involves 5+1 phases determined by the deviation of Bulk Statistics from the standard Marchenko-Pastur Distribution towards a Heavy-Tailed distribution. Martin and Mahoney estimate the extent of heavy-tailed behavior through linear regression on the log-log plot of the empirically fitted Power-Law exponent \( \alpha \). While sufficient for their stated "aim for an operational theory that can guide practice for state-of-the-art DNNs, not for idealized models for which one proves theorems" [8], this approach is agnostic to the underlying heavy-tailed distribution and potentially misses valuable information. Studies of heavy tails have noted the unreliability of using linear regression for estimating the scaling parameter \( \alpha \) [19]. To address this issue, we propose using MLE with respect to different heavy-tailed distributions, such as the Pareto, Cauchy, Levy, Weibull, or Frechet distributions. The latter is particularly meaningful given the empirical observations in HT-SR [8]. This approach aims to refine the classification of underlying distributions by analyzing a broader array of models, such as the 16 open-source symmetry-informed geometric representation models of *Geom3D* [20].
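To make the proposed pipeline concrete, here is a rough sketch of my own (a synthetic Pareto-tailed matrix stands in for a real layer; in the actual study \( \mathbf{W}_l \) would be loaded from open-source checkpoints, and scipy's `fit` methods perform the MLE):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in for a trained layer: an N x M weight matrix with Pareto-tailed entries.
# (A real study would load W_l from an open-source checkpoint instead.)
N, M, mu = 1000, 300, 2.5
W = rng.pareto(mu, size=(N, M)) * rng.choice([-1.0, 1.0], size=(N, M))

# ESD of the correlation matrix X = (1/N) W^T W, i.e. squared singular values scaled by 1/N.
X = (W.T @ W) / N
eigs = np.linalg.eigvalsh(X)

# Tail of the ESD: keep the largest eigenvalues and compare candidate heavy-tailed fits by MLE.
tail = np.sort(eigs)[-M // 10:]
candidates = {
    "pareto": stats.pareto,
    "frechet (invweibull)": stats.invweibull,
    "lognorm": stats.lognorm,
}
for name, dist in candidates.items():
    params = dist.fit(tail)                      # maximum-likelihood fit
    ll = np.sum(dist.logpdf(tail, *params))      # log-likelihood, for comparing candidates
    print(f"{name:>22}: log-likelihood = {ll:.1f}, params = {np.round(params, 3)}")
```

The point of the comparison is exactly the refinement argued for above: instead of a single regression-fitted exponent, each layer gets a best-fitting distribution family plus its MLE parameters.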
### Theoretical

The purpose of providing a theoretical exposition of heavy-tailed random matrices is, first and foremost, to consolidate what is currently a rich but largely disconnected body of literature [14-19]. With only a single dedicated chapter in the Oxford Handbook of RMT [14] and few theoretical surveys, it is hard to put the earlier work in context. The consequences of this can be seen through a single example, found in a paper predating the HT-SR theory by more than 11 years. More specifically, observe the following theorem (Theorem 1 [21]) for random matrices with i.i.d. heavy-tailed entries, i.e., \( (a_{ij}), 1\leq i\leq n, 1\leq j\leq n \) with \( 1-F(x)=\bar{F}(x)=\mathbb{P}\left(\left|a_{ij}\right|>x\right)=L(x) x^{-\alpha} \), where \( 0<\alpha<4 \) and \( \forall t>0, \lim_{x\rightarrow\infty}\frac{L(tx)}{L(x)}=1 \). With the additional assumption of \( \mathbb{E}(a_{ij})=0 \) for \( 2\leq \alpha <4 \), the theorem states that the random point process \( \hat{\mathcal{P}}_n = \sum_{1\leq i\leq j\leq n}\delta_{b_n^{-1}|a_{ij}|} \) converges to a Poisson point process with intensity \( \rho(x) =\alpha\cdot x^{-1-\alpha} \).

This theoretical result in fact matches precisely the values used to classify one of the phase transitions in HT-SR [7] w.r.t. \( \alpha \), as well as the power-law exponent of the linear regression fit. Furthermore, its corollary (Corollary 1 [21]) gives theoretical justification for what the authors of HT-SR [7] observe to be the Frechet distribution fit for the maximum eigenvalues within that same heavy-tailed phase. Not only that, but its Poisson process phenomenology seems to agree with the underlying assumption behind one of the aforementioned theoretical results, namely that SDE trajectories of SGD are well-approximated by a Feller process [12]. This suggests that the latter results are exceptionally interesting due to their potential to serve as theoretical grounding for what is currently only an empirical theory. A more rigorous exposition of the material, paired with the aforementioned empirical analysis, has the potential to give clarity to what is already being developed as a potential theory for the learning of DNNs.

## References

[1] V. Vapnik, E. Levin, and Y. Le Cun. Measuring the VC-dimension of a learning machine. Neural Computation, 6(5):851-876, 1994.
[2] A. Engel and C. P. L. Van den Broeck. Statistical Mechanics of Learning. Cambridge University Press, New York, NY, USA, 2001.
[3] Y. Bahri, J. Kadmon, J. Pennington, S. S. Schoenholz, J. Sohl-Dickstein, and S. Ganguli. Statistical Mechanics of Deep Learning. Annual Review of Condensed Matter Physics, 11:501-528, 2020.
[4] S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein. A Correspondence Between Random Neural Networks and Statistical Field Theory, 2020.
[5] J. Pennington and P. Worah. Nonlinear random matrix theory for deep learning. NIPS 2017.
[6] J. Pennington and Y. Bahri. Geometry of Neural Network Loss Surfaces via Random Matrix Theory. PMLR 70, 2017.
[7] C. H. Martin and M. W. Mahoney. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. Journal of Machine Learning Research, 22(165):1-73, 2021.
[8] C. H. Martin and M. W. Mahoney. Traditional and heavy-tailed self regularization in neural network models. In International Conference on Machine Learning, 2019.
[9] Y. Zhou, T. Pang, K. Liu, C. H. Martin, M. W. Mahoney, and Y. Yang. Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training. NeurIPS 2023.
[10] C. H. Martin, T. S. Peng, and M. W. Mahoney. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications, 12(1):1-13, 2021.
[11] C. H. Martin and M. W. Mahoney. Heavy-tailed universality predicts trends in test accuracies for very large pre-trained deep neural networks. In SIAM International Conference on Data Mining, 2020.
[12] U. Şimşekli, O. Sener, G. Deligiannidis, and M. A. Erdogdu. Hausdorff dimension, heavy tails, and generalization in neural networks. Journal of Statistical Mechanics, 2021(12), 2021.
[13] A. Raj, L. Zhu, M. Gürbüzbalaban, and U. Şimşekli. Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions, 2023.
[14] Z. Burda and J. Jurkiewicz. Heavy-tailed random matrices. The Oxford Handbook of Random Matrix Theory, 2011.
[15] J. Bouchaud and M. Potters. Financial applications of random matrix theory: a short review. The Oxford Handbook of Random Matrix Theory, 2011.
[16] A. Edelman, A. Guionnet, and S. Péché. Beyond Universality in Random Matrix Theory. The Annals of Applied Probability, 26(3), 2016.
[17] G. B. Arous and A. Guionnet. The Spectrum of Heavy Tailed Random Matrices. Springer, 2017.
[18] E. Rebrova. Spectral Properties of Heavy-Tailed Random Matrices. ProQuest Dissertations & Theses, 2018.
[19] J. Nair, A. Wierman, and B. Zwart. The fundamentals of heavy-tails: properties, emergence, and identification. Proceedings of ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, 387-388, 2013.
[20] S. Liu, W. Du, Y. Li, Z. Li, Z. Zheng, C. Duan, Z. Ma, O. Yaghi, A. Anandkumar, C. Borgs, J. Chayes, H. Guo, and J. Tang. Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials. NeurIPS 2023.
[21] A. Auffinger, G. B. Arous, and S. Péché. Poisson convergence for the largest eigenvalues of heavy tailed random matrices. Annales de l'I.H.P. Probabilités et Statistiques, 45(3):589-610, 2009.
  21. Hey, brother, hope you are hanging in there. Since most people took the spiritual route with their answer, let me give you a man-to-man answer. I have had periods of 4-5 consecutive months when I didn't talk to absolutely anybody. For 2 years of my life I had only nightmares. I've had months on end when the only thing I felt was depression so strong I had tearing physical pain in my chest. That is to say, I've been through some dark times and places, and the one thing that has kept me together throughout is physical exercise. I mean it. Whether it was running 6 miles/10k daily a few years ago, or powerlifting over the last 2 years, the precise exercise doesn't really matter as long as I am pushing. As males, our hormone levels and aggression (or lack thereof) are directly tied to our body. By creating a baseline of discipline, it gives you something to fall back on. It is also incredibly healthy for your neurotransmitters/brain chemistry. Along that line, I also quit weed 8 months ago, so respect for your commitment - you are already taking steps in the right direction.
  22. I just set up a Python script that creates my daily note with all the necessary structure, pulls my tasks from the whole database and syncs them with my Google Calendar. And all I have to do is type 'python daily_drive.py'. If you are a techie/nerd out on stuff like this, there ain't no other option than Obsidian. P.S. - you can use your notes as a context window for AI agents to help you with projects. I set up the script above in 10 minutes using this fact.
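For anyone curious, the core of it is nothing fancy. A stripped-down sketch (the vault path, note template and task format here are made up for illustration, and the Google Calendar sync via their API is left out):

```python
import datetime
import pathlib
import re

# Hypothetical vault layout - adjust to your own Obsidian setup.
VAULT = pathlib.Path.home() / "Obsidian" / "Vault"
DAILY_DIR = VAULT / "Daily"

TEMPLATE = """# {date}

## Tasks
{tasks}

## Notes

## Journal
"""

def collect_open_tasks(vault: pathlib.Path) -> list[str]:
    """Scan every markdown note in the vault for unchecked '- [ ]' checkboxes."""
    tasks = []
    for note in vault.rglob("*.md"):
        for line in note.read_text(encoding="utf-8").splitlines():
            if re.match(r"\s*- \[ \] ", line):
                tasks.append(f"{line.strip()}  (from {note.stem})")
    return tasks

def create_daily_note() -> pathlib.Path:
    """Create today's note from the template, carrying over all open tasks."""
    today = datetime.date.today().isoformat()
    DAILY_DIR.mkdir(parents=True, exist_ok=True)
    path = DAILY_DIR / f"{today}.md"
    if not path.exists():
        tasks = "\n".join(collect_open_tasks(VAULT)) or "- [ ] (nothing carried over)"
        path.write_text(TEMPLATE.format(date=today, tasks=tasks), encoding="utf-8")
    return path

if __name__ == "__main__":
    print(f"Daily note ready: {create_daily_note()}")
```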
  23. @Null Simplex Thanks for sharing! I do agree that there are many deficiencies in our current model of teaching world-wide. Some of it has to do with lack of resources, but even in the Ivy League I've stumbled on really badly taught classes in math, so I get your point. I agree mathematics can be taught in a much more intuitive way with visualizations like the ones 3Blue1Brown does, and I sure hope to contribute to the automation of such visualizations in the near future. My only point is that if you are really passionate about mathematics, university is a really good idea.
  24. I would say both my personality and life purpose stand in strong opposition to New Age beliefs/social dynamics, despite my having extremely strong spiritual experiences, dozens of trips, and a very deep love for nature. I cannot stand conformity and couldn't care less about being part of a group. I am open to and have experienced the paranormal/siddhis. But as someone highly technical, the New Age thinking and rhetoric on these topics just makes me cringe. I seek understanding, so I can engineer the next paradigm. I work 9-10h days with passion and couldn't care less about "going with the flow".