Algorithms & Complexity Seminar, MIT : Spring 2021

Organizers: Sitan Chen (sitanc AT mit DOT edu), Quanquan C. Liu (quanquan AT mit DOT edu), Nikhil Vyas (nikhilv AT mit DOT edu)

The Algorithms & Complexity Seminar for Spring 2021 will usually (unless otherwise stated) meet virtually on Wednesdays, 4pm-5pm ET (in ordinary semesters it meets in 32-G575, the Theory Lab on the 5th floor of the Stata Center). The style and format of these meetings are variable. Please feel free to contact the organizers to find out more details. To receive announcements of upcoming talks, subscribe to the mailing list by either visiting the mailman page or sending an empty email to compalgsem-subscribe@lists.csail.mit.edu.

Upcoming Talks

• Monday, May 24, 2021: Thatchaphol Saranurak (University of Michigan). [Time: 3-4pm EDT: Zoom Link]
Abstract. I will talk about a new technique called congestion balancing which allows us to obtain several strong dynamic graph algorithms against an adaptive adversary including:

1. A near-optimal decremental single-source shortest path algorithm. This essentially closes a long line of work on this problem. Moreover, it implies the first almost-linear time algorithms for computing (1+eps)-approximate min-cost flow on undirected vertex-capacitated graphs, which in turn implies many other applications.

2. A decremental single-source reachability algorithm with O(mn^{2/3}) total update time, breaking the O(mn) barrier of [Even-Shiloach] that had stood since 1981.

3. A near-optimal decremental bipartite matching algorithm.

I will explain the technique in a tutorial style (it should be easy-going) and explain at a high level why it is applicable to other problems.

Based on two papers with Aaron Bernstein and Maximilian Probst Gutenberg: https://arxiv.org/abs/2101.07149, https://arxiv.org/abs/2009.02584.

Past Talks

• Wednesday, February 24, 2021: David Wajc (Stanford). [Time: 4-5pm ET, Recording]
• Wednesday, March 3, 2021: Noah Golowich (MIT). [Time: 4-5pm ET, Recording]
Topic. Sample-efficient proper PAC learning with approximate differential privacy.
Abstract. An exciting recent development in the theory of differentially private machine learning is the connection between private learning and online learning, and in particular the result that a binary hypothesis class is privately learnable if and only if it is online learnable (i.e., has finite Littlestone dimension). In this talk I will discuss our work strengthening various aspects of the result that online learning implies private learning: first, we show that the sample complexity of properly learning a class of Littlestone dimension d with approximate differential privacy is Õ(d^6), ignoring privacy and accuracy parameters. This result answers a question of Bun et al. (2020) by improving upon their upper bound of 2^O(d) on the sample complexity. Prior to our work, finiteness of the sample complexity for privately learning a class of finite Littlestone dimension was only known for improper private learners, and the fact that our learner is proper answers another question of Bun et al., which was also asked by Bousquet et al. (2020). Using machinery developed by Bousquet et al., we also show that the sample complexity of sanitizing a binary hypothesis class is at most polynomial in its Littlestone dimension and dual Littlestone dimension. This implies that a class is sanitizable if and only if it has finite Littlestone dimension. An important ingredient of our proofs is a new property of binary hypothesis classes that we call irreducibility, which may be of independent interest.

Joint work with Badih Ghazi, Ravi Kumar, and Pasin Manurangsi.
• Wednesday, March 10, 2021: Timothy Chu (CMU). [Time: 4-5pm ET, Recording]
Topic. Manhattan distances, Metric Transforms, and Kernels.
Abstract. Take $n$ points in any dimension, compute the Manhattan distance between each pair of points, and take the $\pi/4$ power of each distance. The resulting distances will always be a Manhattan distance. Can you prove why?
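The puzzle above is easy to experiment with. The sketch below (an illustration added for this listing, not material from the talk) computes pairwise Manhattan distances for a few arbitrary points, raises them to the $\pi/4$ power, and verifies the triangle inequality — a necessary condition for the powered values to form a metric; the talk's claim, that they again form a Manhattan metric, is strictly stronger.

```python
import itertools
import math

def l1(p, q):
    """Manhattan (L1) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

# A few arbitrary sample points in 3 dimensions.
points = [(0, 0, 0), (1, 2, 0), (3, 1, 4), (2, 2, 2), (5, 0, 1)]

# Raise each pairwise Manhattan distance to the pi/4 power.
powered = {
    (i, j): l1(points[i], points[j]) ** (math.pi / 4)
    for i, j in itertools.combinations(range(len(points)), 2)
}

def d(i, j):
    """Symmetric lookup of the powered distance."""
    if i == j:
        return 0.0
    return powered[(min(i, j), max(i, j))]

# Check the triangle inequality for every triple of points.
# (Raising a metric to any power in (0, 1] preserves it -- the
# "snowflake" fact -- so these assertions always pass.)
n = len(points)
for i, j, k in itertools.permutations(range(n), 3):
    assert d(i, k) <= d(i, j) + d(j, k) + 1e-9
```

This only checks metricity; proving that the powered distances embed isometrically back into L1 is the actual puzzle.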

In our work, we classify all functions $f$ that send Manhattan distances to Manhattan distances. We also classify all functions $f$ that send Manhattan distances to inner products, a fundamental question related to kernels in machine learning. The tools we use include the Hadamard transform, Abelian group representation theory, probability theory, and linear algebra.

The Euclidean analog of our work was established by Von Neumann and Schoenberg from 1937-1942 (Annals of Math ‘37, ‘38, ...). The Euclidean analog is the foundation of kernel methods and harmonic analysis of semigroups, and was recently used to show that sketching and embedding of norms are equivalent (AKR, STOC ‘15).

We will provide a number of open problems related to this topic. No special background will be needed to enjoy this talk!

Joint work with Gary Miller, Shyam Narayanan, and Mark Sellke. Link to paper: arXiv.
• Wednesday, March 17, 2021: Omri Ben-Eliezer (Harvard). [Time: 4-5pm ET, Recording]
Abstract. Streaming algorithms are traditionally divided into two categories: deterministic algorithms and randomized ones. These two types of algorithms exhibit a tradeoff between robustness and efficiency; on one hand, deterministic algorithms always return a correct output, but for many important streaming problems, they are provably inefficient. On the other hand, efficient randomized algorithms are known for a very wide range of problems, but their correctness typically relies on the assumption that the stream is fixed in advance, which is commonly unrealistic in practice.

In this talk, I will focus on a middle ground that combines both robustness and efficiency: adversarially robust algorithms, whose output is correct with high probability even when the stream updates are adaptively chosen as a function of previous outputs. This regime has received surprisingly little attention until recently, and many intriguing problems are still open. I will mention some of the recent results, discussing algorithms that are well-suited for the adversarially robust regime (random sampling), algorithms that are not robust (sketching), and efficient techniques to turn algorithms that work in the standard static setting into robust streaming algorithms.

The results demonstrate strong connections between streaming and other areas such as online learning, differential privacy, cryptography, adaptive data analysis and statistics.

Based on joint works with Noga Alon, Yuval Dagan, Rajesh Jayaram, Shay Moran, Moni Naor, David Woodruff, and Eylon Yogev.
• Wednesday, March 24, 2021: Raghavendra Addanki (UMass Amherst). [Time: 4-5pm EDT: Recording]
Topic. Approximation Algorithms for Learning Causal Graphs.
Abstract. Discovering causal relationships is one of the fundamental problems in causality. In our work, we study the problem of learning a causal graph where we seek to identify all the causal relations (edges) between variables in our system (nodes of the graph). It has been shown that, under certain assumptions, observational data alone lets us recover the existence of causal relationships between variables, but not the direction of all of them. To recover the directions, we use the notion of an intervention (or an experiment) described in Pearl’s Structural Causal Models (SCM) framework. We assume access to an undirected graph G on the observed variables whose edges represent all causal relationships, and our goal is to recover their directions using a minimum cost set of interventions.

It is known that constructing an exact minimum cost intervention set for an arbitrary graph G is NP-hard. We further argue that, conditioned on the hardness of approximate graph coloring, no polynomial time algorithm can achieve an approximation factor of o(log n), where n is the number of observed variables in G. To overcome this limitation, we introduce a bi-criteria approximation goal that lets us recover the directions of all but a small fraction of edges in G. Under this relaxed goal, we give polynomial time algorithms that achieve an approximation factor of 2. Our algorithms combine work on efficient intervention design and the design of low-cost separating set systems, with ideas from the literature on graph property testing.

Based on joint works with Shiva Kasiviswanathan, Andrew McGregor and Cameron Musco.
• Wednesday, March 31, 2021: Yang P. Liu (Stanford). [Time: 4-5pm EDT: Zoom Link]
Topic. Fully Dynamic Electrical Flows: Sparse Maxflow Faster Than Goldberg-Rao.
Abstract. We give an algorithm for computing exact maximum flows on graphs with $m$ edges and integer capacities in the range $[1, U]$ in $O(m^{(3/2-1/328)} \log U)$ time. For sparse graphs with polynomially bounded integer capacities, this is the first improvement over the $O(m^{1.5} \log U)$ time bound from [Goldberg-Rao JACM 98].

Our algorithm revolves around dynamically maintaining the augmenting electrical flows at the core of the interior point method based algorithm from [Madry JACM 16]. This entails designing data structures that, in limited settings, return edges with large electric energy in a graph undergoing resistance updates.
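As background for readers unfamiliar with electrical flows (this toy sketch is an illustration of the basic object, not the paper's dynamic data structure): routing one unit of current from a source to a sink means solving the Laplacian system $L\varphi = b$ for vertex potentials; the flow on each edge is its potential drop divided by its resistance, and its energy is $r f^2$. The graph and resistances below are arbitrary choices.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# A tiny graph: edges (u, v, resistance); route 1 unit of current s -> t.
edges = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 2.0), (2, 3, 1.0)]
n, s, t = 4, 0, 3

# Build the graph Laplacian and the demand vector.
L = [[0.0] * n for _ in range(n)]
for u, v, r in edges:
    L[u][u] += 1 / r; L[v][v] += 1 / r
    L[u][v] -= 1 / r; L[v][u] -= 1 / r
b = [0.0] * n
b[s], b[t] = 1.0, -1.0

# Ground vertex t (drop its row/column) so the system is nonsingular.
idx = [v for v in range(n) if v != t]
phi_red = solve([[L[u][v] for v in idx] for u in idx], [b[u] for u in idx])
phi = {v: phi_red[i] for i, v in enumerate(idx)}
phi[t] = 0.0

# Energy r * f^2 = (potential drop)^2 / r on each edge; the paper's data
# structures answer "which edges have large energy" under resistance updates.
energy = {(u, v): (phi[u] - phi[v]) ** 2 / r for u, v, r in edges}
```

On this example the two s-t paths have resistances 2 and 3, so the electrical flow splits 0.6/0.4 between them and the top path's edges carry the largest energy (0.36 each).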

Based on joint work with Yu Gao and Richard Peng.
• Wednesday, April 7, 2021: Meena Jagadeesan (Berkeley). [Time: 12-1pm EDT (Note: Time Change): Recording]
Topic. Inductive Bias of Multi-Channel Linear Convolutional Networks with Bounded Weight Norm.
Abstract. The magnitude of the weights of a neural network is a fundamental measure of complexity that plays a crucial role in the study of implicit and explicit regularization as well as generalization. For example, gradient descent updates in overparameterized models often lead to solutions that implicitly minimize the ell_2 norm of the parameters of the model, resulting in an inductive bias that is highly architecture dependent. To investigate the properties of learned functions, it is natural to consider a function space view given by the minimum ell_2 norm of weights required to realize a given function with a given network. We call this the “induced regularizer” of the network.

Building on a line of recent work, we study the induced regularizer of linear convolutional neural networks with a focus on the role of kernel size and the number of channels. We introduce an SDP relaxation of the induced regularizer, that we show is tight for networks with a single input channel. Using this SDP formulation, we show that the induced regularizer is independent of the number of the output channels for single-input channel networks, and for multi-input channel networks, we show independence given sufficiently many output channels. Moreover, we show that as the kernel size increases, the induced regularizer interpolates between a basis-invariant norm and a basis-dependent norm that promotes sparse structures in Fourier space.

Based on joint work with Suriya Gunasekar and Ilya Razenshteyn.
• Wednesday, April 14, 2021: Sami Davies (University of Washington). [Time: 4-5pm EDT: Zoom Link]
Topic. Scheduling with Communication Delays via LP Hierarchies and Clustering.
Abstract. We consider the classic problem of scheduling jobs with precedence constraints in the presence of communication delays. In this setting, if two dependent jobs are scheduled on different machines, then at least c units of time must pass between their executions. Despite its relevance to many applications, this model remains one of the most poorly understood in scheduling theory. Even in the special case where an unlimited number of identical machines are available, when the objective is minimizing makespan, the best known approximation ratio is 2/3*(c+1). An outstanding open problem in the top-10 list by Schuurman and Woeginger and its recent update by Bansal asks whether there exists a constant-factor approximation algorithm.

We give a polynomial-time O(log c * log m)-approximation algorithm for the problem when given m identical machines and communication delay c, with the objective of minimizing makespan. Further, for scheduling n jobs on related machines, we give an O(log^4 n)-approximation algorithm when the objective is to minimize the weighted sum of completion times and an O(log^3 n)-approximation algorithm when the objective is minimizing makespan. Our approach is based on a Sherali-Adams lift of a linear programming relaxation and a randomized clustering of the semimetric space induced by this lift. This is joint work with Janardhan Kulkarni, Thomas Rothvoss, Jakub Tarnawski, and Yihao Zhang.
• Wednesday, April 21, 2021: Richard Peng (Georgia Tech). [Time: 4-5pm EDT: Zoom Link]
Topic. Solving Sparse Linear Systems Faster than Matrix Multiplication.
Abstract. Can linear systems be solved faster than matrix multiplication? While there has been remarkable progress for the special case of graph-structured linear systems, in the general setting, the bit complexity of solving an $n$-by-$n$ linear system $Ax=b$ is $n^\omega$, where $\omega<2.372864$ is the matrix multiplication exponent. Improving on this has been an open problem even for sparse linear systems with $\text{poly}(n)$ condition number.

We present an algorithm that solves linear systems in sparse matrices asymptotically faster than matrix multiplication for any $\omega>2$. This speedup holds for any input matrix $A$ with $o(n^{\omega - 1}/\log(\kappa(A)))$ non-zeros, where $\kappa(A)$ is the condition number of $A$. For poly(n)-conditioned matrices with $O(n)$ nonzeros, and the current value of $\omega$, the bit complexity of our algorithm to solve to within any $1/\text{poly}(n)$ error is $O(n^{2.331645})$.

Our algorithm can be viewed as an efficient randomized implementation of the block Krylov method via recursive low displacement rank factorizations. It is inspired by the algorithm of [Eberly-Giesbrecht-Giorgi-Storjohann-Villard ISSAC 06, 07] for inverting matrices over finite fields. In our analysis of numerical stability, we develop matrix anti-concentration techniques to bound the smallest eigenvalue and the smallest gap in eigenvalues of semi-random matrices.

Joint work with Santosh Vempala, manuscript at https://arxiv.org/abs/2007.10254.
• Wednesday, April 28, 2021: C. Seshadhri (UC Santa Cruz). [Time: 4-5pm EDT: Zoom Link]
Topic. Random walks and forbidden minors III: poly(d/eps)-time partition oracles for minor-free graph classes.
Abstract. Consider the family of bounded degree graphs in any minor-closed family (such as planar graphs). Let $d$ be the degree bound and $n$ be the number of vertices of such a graph. Graphs in these classes have hyperfinite decompositions, where, for sufficiently small $\varepsilon > 0$, one removes $\varepsilon \cdot dn$ edges to get connected components of size independent of $n$. An important tool for sublinear algorithms and property testing for such classes is the partition oracle, introduced by the seminal work of Hassidim-Kelner-Nguyen-Onak (FOCS 2009). A partition oracle is a local procedure that gives consistent access to a hyperfinite decomposition, without any preprocessing. Given a query vertex $v$, the partition oracle outputs the component containing $v$ in time independent of $n$. All the answers are consistent with a single hyperfinite decomposition.

An important open problem in property testing is constructing partition oracles for minor-free graph classes that run in $\text{poly}(d/\varepsilon)$ time per query. We resolve this open problem. As a direct consequence, we obtain $\text{poly}(d/\varepsilon)$-query algorithms for additive $\varepsilon n$-approximations for problems such as maximum matching, minimum vertex cover, maximum independent set, and minimum dominating set for these graph families.

Joint work with Akash Kumar and Andrew Stolman.
• Monday, May 3, 2021: Andrea Lincoln (Berkeley). [Time: 3-4pm EDT: Zoom Link] Please note the date and time.
Topic. New Techniques for Proving Fine-Grained Average-Case Hardness
Abstract. In this talk I will cover a new technique for worst-case to average-case reductions. There are two primary concepts introduced in this talk: "factored" problems and a framework for worst-case to average-case fine-grained (WCtoACFG) self reductions.

We will define new versions of OV, kSUM and zero-k-clique that are both worst-case and average-case fine-grained hard assuming the core hypotheses of fine-grained complexity. We then use these as a basis for fine-grained hardness and average-case hardness of other problems. Our hard factored problems are also simple enough that we can reduce them to many other problems, e.g. to edit distance, k-LCS and versions of Max-Flow. We further consider counting variants of the factored problems and give WCtoACFG reductions for them for a natural distribution. To show hardness for these factored problems we formalize the framework of [Boix-Adsera et al. 2019] that was used to give a WCtoACFG reduction for counting k-cliques. We define an explicit property of problems such that if a problem has that property one can use the framework on the problem to get a WCtoACFG self reduction.

In total these factored problems and the framework together give tight fine-grained average-case hardness for various problems including the counting variant of regular expression matching.

Based on joint work with Mina Dalirrooyfard and Virginia Vassilevska Williams.
• Wednesday, May 5, 2021: Josh Alman (Harvard). [Time: 4-5pm EDT: Zoom Link]
Topic. A Refined Laser Method and Faster Matrix Multiplication
Abstract. The complexity of matrix multiplication is measured in terms of $\omega$, the smallest real number such that two $n\times n$ matrices can be multiplied using $O(n^{\omega+\varepsilon})$ field operations for all $\varepsilon>0$; the best bound until now is $\omega<2.37287$ [Le Gall'14]. All bounds on $\omega$ since 1986 have been obtained using the so-called laser method, a way to lower-bound the `value' of a tensor in designing matrix multiplication algorithms. The main result of this paper is a refinement of the laser method that improves the resulting value bound for most sufficiently large tensors. Thus, even before computing any specific values, it is clear that we achieve an improved bound on $\omega$, and we indeed obtain the best bound on $\omega$ to date: $\omega<2.37286$. The improvement is of the same magnitude as the improvement that [Le Gall'14] obtained over the previous bound [Vassilevska W.'12]. Our improvement to the laser method is quite general, and we believe it will have further applications in arithmetic complexity.
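For context (this is the classical starting point, not the laser method): Strassen's 1969 recursion multiplies two 2x2 block matrices with 7 block products instead of 8, which already shows $\omega \le \log_2 7 \approx 2.807$. A minimal sketch, assuming matrices are lists of lists with power-of-two dimensions:

```python
def add(A, B, sign=1):
    """Entrywise A + sign*B for equally-sized matrices."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Multiply square matrices whose size is a power of two."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    sp = lambda M, r, c: [row[c:c + h] for row in M[r:r + h]]  # h-by-h block
    A11, A12, A21, A22 = sp(A, 0, 0), sp(A, 0, h), sp(A, h, 0), sp(A, h, h)
    B11, B12, B21, B22 = sp(B, 0, 0), sp(B, 0, h), sp(B, h, 0), sp(B, h, h)
    # Strassen's 7 block products (instead of the naive 8).
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, add(B12, B22, -1))
    M4 = strassen(A22, add(B21, B11, -1))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(add(A21, A11, -1), add(B11, B12))
    M7 = strassen(add(A12, A22, -1), add(B21, B22))
    C11 = add(add(M1, M4), add(M7, M5, -1))  # M1 + M4 - M5 + M7
    C12 = add(M3, M5)                        # M3 + M5
    C21 = add(M2, M4)                        # M2 + M4
    C22 = add(add(M1, M3), add(M6, M2, -1))  # M1 - M2 + M3 + M6
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

The recursion satisfies $T(n) = 7\,T(n/2) + O(n^2)$, giving $O(n^{\log_2 7})$; the laser method instead bounds the value of much larger tensors and is not a small recursion like this.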

In this talk, I'll give an overview of how the laser method works and our new improvement. No background about matrix multiplication algorithms will be assumed.

Joint work with Virginia Vassilevska Williams.
• Monday, May 10, 2021: Ali Vakilian (IDEAL Institute TTIC). [Time: 3-4pm EDT: Zoom Link] Please note the date and time.
Topic. Approximation Algorithms for Fair Clustering
Abstract. The growing use of automated machine learning in high-stakes decision-making tasks has led to an extensive line of research on the societal aspects of algorithms. In particular, the design of fair algorithms for the task of clustering, which is a core problem in both machine learning and algorithms, has received considerable attention in recent years. The fair clustering problem was first introduced in the seminal work of (Chierichetti, Kumar, Lattanzi and Vassilvitskii, 2017), which proposed “proportionality/balance inside the clusters” as the notion of fairness. Since then, fairness in clustering has been advanced in this setting and further studied in other contexts such as equitable representation and individual fairness.

In this talk, I will overview my recent works on the design of efficient approximation algorithms for fair clustering in the three aforementioned contexts. My talk is based on three papers co-authored with Arturs Backurs, Piotr Indyk, Sepideh Mahabadi, Yury Makarychev, Krzysztof Onak, Baruch Schieber, and Tal Wagner.

Bio: Ali Vakilian is a postdoctoral researcher at the IDEAL institute hosted by TTIC. His research interests include learning-based algorithms, algorithmic fairness and algorithms for massive data. Ali received his PhD from MIT under the supervision of Erik Demaine and Piotr Indyk.
• Wednesday, May 12, 2021: Nika Haghtalab (Berkeley). [Time: 4-5pm EDT: Zoom Link]
Topic. Beyond Worst-Case Adversaries in Machine Learning
Abstract. The widespread application of machine learning in practice necessitates the design of robust learning algorithms that can withstand unpredictable and even adversarial environments. The holy grail of robust algorithm design is to provide performance guarantees that do not diminish in the presence of all-powerful adversaries who know the algorithm and adapt to its decisions. However, for most fundamental problems in machine learning and optimization, such guarantees are impossible. This indicates that new models are required to provide rigorous guidance on the design of robust algorithms. This talk goes beyond worst-case analysis and presents such models and perspectives for fundamental problems in machine learning and optimization. I will also present general-purpose techniques that lead to strong robustness guarantees in practical domains and for a wide range of applications, such as online learning, differential privacy, discrepancy theory, and learning-augmented algorithm design.

Bio: Nika Haghtalab is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at University of California, Berkeley. She works broadly on the theoretical aspects of machine learning and algorithmic economics, with a special focus on developing foundations for machine learning that account for social and strategic interactions. Prior to Berkeley, she was an assistant professor in the CS department of Cornell University in 2019-2020 and a postdoctoral researcher at Microsoft Research, New England in 2018-2019. She received her Ph.D. from the Computer Science Department of Carnegie Mellon University in 2018. Her thesis, titled Foundation of Machine Learning, by the People, for the People, received the CMU School of Computer Science Dissertation Award and a SIGecom Dissertation Honorable Mention Award.
• Tuesday, May 18, 2021: Jane Lange (MIT). [Time: 4-5pm EDT: Zoom Link]
Topic. Properly learning decision trees in almost-polynomial time
Abstract. We give an $n^{O(\log\log n)}$-time membership query algorithm for properly learning decision trees under the uniform distribution. Our algorithm is furthermore agnostic. Even in the realizable setting, the previous fastest runtime was $n^{O(\log n)}$, a consequence of a classic algorithm of Ehrenfeucht and Haussler [EH 89]. Our algorithm is built on a new structural result for decision trees that strengthens a theorem of O'Donnell, Saks, Schramm, and Servedio [OSSS 05]. While the OSSS theorem says that every decision tree has an influential variable, we show how every decision tree can be "pruned" so that every variable in the resulting tree is influential.
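To make the notion of an influential variable concrete (this sketch only illustrates the standard definition under the uniform distribution, not the pruning procedure from the talk): the influence of variable i on f is the probability, over a uniformly random input x, that flipping bit i changes f(x).

```python
import itertools

# A small decision tree on 3 bits as nested tuples
# (variable, subtree-if-0, subtree-if-1); leaves are 0/1.
# This tree computes f(x) = x0 AND (x1 OR x2).
tree = (0, 0, (1, (2, 0, 1), 1))

def evaluate(t, x):
    """Walk the tree on input bit-vector x until reaching a 0/1 leaf."""
    while not isinstance(t, int):
        var, lo, hi = t
        t = hi if x[var] else lo
    return t

def influence(t, i, n):
    """Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)], x uniform."""
    flips = 0
    for x in itertools.product([0, 1], repeat=n):
        y = list(x)
        y[i] ^= 1
        flips += evaluate(t, x) != evaluate(t, tuple(y))
    return flips / 2 ** n
```

Here x0 has influence 3/4 while x1 and x2 each have influence 1/4; the OSSS theorem guarantees some variable of any shallow tree is influential, and the talk's structural result prunes the tree so that every remaining variable is.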

Joint work with Guy Blanc, Mingda Qiao, and Li-Yang Tan.
• Wednesday, May 19, 2021: Allen Liu (MIT). [Time: 4-5pm EDT: Zoom Link]
Topic. Settling the Robust Learnability of Mixtures of Gaussians
Abstract. This work represents a natural coalescence of two important lines of work -- learning mixtures of Gaussians and algorithmic robust statistics. In particular, we give the first provably robust algorithm for learning mixtures of any constant number of Gaussians. We require only mild assumptions on the mixing weights and non-degeneracy of the components. At the heart of our algorithm is a new method for proving dimension-independent polynomial identifiability through applying a carefully chosen sequence of differential operations to certain generating functions that not only encode the parameters we would like to learn but also the system of polynomial equations we would like to solve. We show how the symbolic identities we derive can be directly used to analyze a natural sum-of-squares relaxation.

Joint work with Ankur Moitra. Paper: https://arxiv.org/abs/2011.03622, Follow-up: https://arxiv.org/abs/2104.09665.