## Algorithms & Complexity Seminar, MIT : Spring 2019
The Algorithms & Complexity Seminar for Spring 2019 will usually (unless otherwise stated) meet on ## Upcoming Talks## Past Talks-
**Thursday, February 7, 2019**:**Jerry Li**(Microsoft Research Redmond). [32-G575 Time: 4-5pm]
**Topic.**Nearly Optimal Algorithms for Robust Mean Estimation.**Abstract.**Robust mean estimation is the following basic estimation question: given samples from a distribution, where an $\epsilon$-fraction of them have been corrupted, how well can you estimate the mean of the distribution? This is a classical problem in statistics, going back to the 60's and 70's, and has recently found application to many problems in reliable machine learning. However, in high dimensions, classical algorithms for this problem either were (1) computationally intractable, or (2) lost poly(dimension) factors in their accuracy guarantees. Recently, polynomial time algorithms have been demonstrated for this problem that still achieve (nearly) optimal error guarantees. However, the runtimes of these algorithms still had additional polynomial factors which can render them ineffective in practice. In this talk we give the first truly nearly linear time algorithms for these problems that achieve nearly optimal statistical performance. The algorithms are surprisingly simple, and are based on directly instantiating the matrix multiplicative weights framework. Moreover, these algorithms apply very generally to a wide class of distributions. -
**Tuesday, February 19, 2019**:**Gautam Kamath**(Simons Institute). [32-D463 (Star) 3:45-4:45pm]
**Topic.**Privately Learning High-Dimensional Distributions**Abstract.**We present novel, computationally efficient, and differentially private algorithms for two fundamental high-dimensional learning problems: learning a multivariate Gaussian in $\mathbb{R}^d$ and learning a product distribution in $\{0,1\}^d$ in total variation distance. The sample complexity of our algorithms nearly matches the sample complexity of the optimal non-private learners for these tasks in a wide range of parameters. Thus, our results show that privacy comes essentially for free for these problems, providing a counterpoint to the many negative results showing that privacy is often costly in high dimensions. Our algorithms introduce a novel technical approach to reducing the sensitivity of the estimation procedure that we call recursive private preconditioning, which may find additional applications.
Based on joint work with Jerry Li, Vikrant Singhal, and Jonathan Ullman. -
**Wednesday, February 27, 2019**:**Fang-Yi Yu**(University of Michigan). [32-G575 3:00-4:00pm]
**Topic.**Opinion formation, stochastic gradient descent, and gradient-like systems**Abstract.**We study opinion dynamics on networks with two communities. Each node has one of two opinions and updates its opinion as a ``majority-like" function of the frequency of opinions among its neighbors. The networks we consider are weighted graphs comprised of two equally sized communities where intracommunity edges have weight $p$, and intercommunity edges have weight $q$. Thus $q$ and $p$ parameterize the connectivity between the two communities. We prove a dichotomy theorem about the interaction of the two parameters: 1) the ``majority-like" update function, and 2) the level of intercommunity connectivity. For each set of parameters, we show that either: the system quickly converges to consensus with high probability in time $\Theta(n \log(n))$; or, the system can get ``stuck" and take time $2^{\Theta(n)}$ to reach consensus. Technically, we achieve this the fast convergence result by exploiting the connection between a family of reinforced random walks and dynamical systems literature. Our main result shows if the systems are a reinforced random walk with a gradient-like function, it converges to an arbitrary neighborhood of a local attracting point in $O(n\log n)$ time with high probability. This result adds to the recent literature on saddle-point analysis and shows a large family of stochastic gradient descent algorithm converges to local minimal in $O(n\log n)$ when the step size $O(1/n)$. -
**Wednesday, March 6, 2019**:**Maximilian Probst**(University of Copenhagen). [32-G575 Time: 4-5pm]
**Topic.**Decremental Strongly-Connected Components and Single-Source Reachability in Near-Linear Time.**Abstract.**Computing the Strongly-Connected Components (SCCs) in a graph $G=(V,E)$ is known to take only $O(m + n)$ time using an algorithm by Tarjan from 1972 [SICOMP 72] where $m = |E|$, $n=|V|$. For fully-dynamic graphs, conditional lower bounds provide evidence that the update time cannot be improved by polynomial factors over recomputing the SCCs from scratch after every update. Nevertheless, substantial progress has been made to find algorithms with fast update time for decremental graphs, i.e. graphs that undergo edge deletions.
In this paper, we present the first algorithm for general decremental graphs that maintains the SCCs in total update time $O(m \cdot \text{polylog}(n))$, thus only a polylogarithmic factor from the optimal running time. Previously such a result was only known for the special case of planar graphs [Italiano et al, STOC 17]. Our result should be compared to the formerly best algorithm for general graphs achieving $O(m\sqrt{n} \text{polylog}(n))$ total update time by Chechik et.al. [FOCS 16] which improved upon a breakthrough result leading to $O(mn^{0.9 + o(1)})$ total update time by Henzinger, Krinninger and Nanongkai [STOC 14, ICALP 15]; these results in turn improved upon the longstanding bound of $O(mn)$ by Roditty and Zwick [STOC 04]. -
**Wednesday, April 17, 2019**:**Alexander Golovnev**(Harvard University). [32-G575 Time: 4-5pm]
**Topic.**Static Data Structure Lower Bounds Imply Rigidity.**Abstract.**We show that static data structure lower bounds in the group (linear) model imply semiexplicit lower bounds on matrix rigidity. In particular, we prove that an explicit lower bound of $t > \omega(log^2 n)$ on the cell-probe complexity of linear data structures in the group model, even against arbitrarily small linear space, $s = (1 + \epsilon)n$, would already imply a semi-explicit (P^NP) construction of rigid matrices with significantly better parameters than the current state of art (Alon, Panigrahy and Yekhanin, 2009). Our results further assert that polynomial, $t \geq n^{\delta}$, data structure lower bounds against near-optimal space, would imply super-linear circuit lower bounds for log-depth linear circuits (a four-decade open question). In the succinct space regime, $s = n + o(n)$, we show that any improvement on current cell-probe lower bounds in the linear model would also imply new rigidity bounds. Our results rely on a new connection between the “inner” and “outer” dimensions of a matrix (Paturi and Pudlak, 2006), and on a new reduction from worst-case to average-case rigidity, which is of independent interest.
Joint work with Zeev Dvir and Omri Weinstein. -
**Friday, April 26, 2019**:**Rio LaVigne**(MIT). [32-D463 (Star) Time: 11am-12pm]
**Topic.**Adversarially Robust Property Preserving Hashes.**Abstract.**Property-preserving hashing is a method of compressing a large input $x$ into a short hash $h(x)$ in such a way that given $h(x)$ and $h(y)$, one can compute a property $P(x,y)$ of the original inputs. The idea of property-preserving hash functions underlies sketching, compressed sensing and locality-sensitive hashing.
Property-preserving hash functions are usually probabilistic: they use the random choice of a hash function from a family to achieve compression, and as a consequence, err on some inputs. Traditionally, the notion of correctness for these hash functions requires that for every two inputs $x$ and $y$, the probability that $h(x)$ and $h(y)$ mislead us into a wrong prediction of $P(x,y)$ is negligible. As observed in many recent works (incl. Mironov, Naor and Segev, STOC 2008; Hardt and Woodruff, STOC 2013; Naor and Yogev, CRYPTO 2015), such a correctness guarantee assumes that the adversary (who produces the offending inputs) has no information about the hash function, and is too weak in many scenarios.
We initiate the study of adversarial robustness for property-preserving hash functions, provide definitions, derive broad lower bounds due to a simple connection with communication complexity, and show the necessity of computational assumptions to construct such functions. Our main positive results are two candidate constructions of property-preserving hash functions (achieving different parameters) for the (promise) gap-Hamming property which checks if $x$ and $y$ are “too far” or “too close.” Our first construction relies on generic collision-resistant hash functions, and our second on a variant of the syndrome decoding assumption on low-density parity. Joint work with Elette Boyle and Vinod Vaikuntanathan. -
**Tuesday, April 30, 2019**:**Bistra Dilkina**(University of Southern California). [Patil/Kiva 32-G575 Time: 4-5pm]
**Topic.**Learning-Driven Algorithms for Discrete Optimization.**Abstract.**This talk focuses on a novel fruitful synergy between machine learning and optimization --- in particular, how ML techniques can improve the design of algorithms for Discrete Optimization, both complete algorithms such as branch and bound as well as incomplete ones such as heuristic greedy search. Branch and Bound solvers for Mixed Integer Programs (MIP) such as CPLEX, Gurobi and SCIP are used daily across different domains and industries to find solutions with optimality guarantees for NP-hard combinatorial problems. Leveraging the plethora of rich and useful data generated during the solving process, we illustrate the potential for ML in MIP on two crucial tasks in branch and bound: branching variable selection and primal heuristic selection. Empirical results show that our novel approaches can significantly improve the performance of a solver on both heterogeneous benchmark instances as well as homogeneous families of instances. In the second part of the talk, we show how to leverage a unique combination of reinforcement learning and graph embedding to infer very effective data-driven greedy strategies for solving well-studied combinatorial optimization problems on graphs such as Minimum Vertex Cover, Max Cut and Traveling Salesman. I will conclude with some new directions on 1) learning novel heuristics that apply to any MIP problem distributions, as well as 2) decision-focused learning.
**Bio:**Bistra Dilkina is a Gabilan Assistant Professor of Computer Science at the University of Southern California and an Associate Director of the USC Center for AI in Society (CAIS), since January 2018. Before that, Dilkina was as an Assistant Professor in the College of Computing at the Georgia Institute of Technology and a co-director of the Data Science for Social Good Atlanta summer program. She received her PhD from Cornell University, and was a Post-Doctoral Associate at the Institute for Computational Sustainability. Her work spans discrete optimization, machine learning, network design, and stochastic optimization. Dilkina's research focuses on advancing the state of the art for solving real-world large-scale combinatorial optimization problems, particularly ones that arise in sustainability areas such as biodiversity conservation planning and urban planning. Dilkina is one of the junior faculty leaders in the young field of Computational Sustainability, and has co-organized workshops, tutorials, special tracks at AAAI and IJCAI, as well as a doctoral consortium on Computational Sustainability. -
**Wednesday, May 1, 2019**:**Greg Yang**(Microsoft Research Redmond). [32-G575 Time: 4-5pm]
**Topic.**Batch Normalization Causes Gradient Explosion in Deep Randomly Initialized Networks.**Abstract.**Batch Normalization (batchnorm) has become a staple in deep learning since its introduction in 2015. The authors conjectured that “Batch Normalization may lead the layer Jacobians to have singular values close to 1” and recent works suggest it benefits optimization by smoothing the optimization landscape during training. We disprove the “Jacobian singular value” conjecture for randomly initialized networks, showing batchnorm causes gradient explosion that is exponential in depth. This implies that at initialization, batchnorm in fact “roughens” the optimization landscape. This explosion empirically prevents one from training ReLU networks with more than 50 layers without skip connection. We discuss several ways of mitigating this explosion and their relevance in practice. -
**Wednesday, May 15, 2019**:**Ofir Geri**(Stanford). [32-G575 Time: 4-5pm]
**Topic.**Sampling Sketches for Concave Sublinear Functions of Frequencies.**Abstract.**We consider massive distributed datasets that consist of elements that are key-value pairs. Our goal is to compute estimates of statistics or aggregates over the data, where the contribution of each key is weighted by a function of its frequency (sum of values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning. A common approach is to maintain a sample of keys and estimate statistics from the sample. Ideally, to obtain low-variance estimates we sample keys with probabilities proportional to their contributions. One simple way to do so is to first aggregate the raw data to produce a table of keys and their frequencies, apply our function to the frequency values, and then apply a weighted sampling scheme. This aggregation however requires data structures of size proportional to the number of distinct keys and is too costly when the number is very large. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies (including log, the moments $x^p$ for 0 < p < 1, and more).
Joint work with Edith Cohen. -
**Wednesday, May 22, 2019**:**Josh Wang**(Google Research). [32-G575 Time: 4-5pm]
**Topic.**Tinkering with Double-Greedy.**Abstract.**Submodular functions can be used to model a wide array of important problems from fields such as theoretical computer science, economics, and machine learning. One of the most basic problems in submodular optimization is to maximize a submodular function $f$. When the function is not necessarily monotone, this problem is known to be NP-hard to approximate better than a factor of 1/2 [Feige, Mirrokni, Vondrak '11; Dobzinski and Vondrak '12]. This lower bound was met by the landmark Double-Greedy algorithm of [Buchbinder, Feldman, Naor, Schwartz '12]. In this talk, I review this embarrassingly simple algorithm and then present two extensions to different settings. The first extension is to online no-regret learning, where we receive a submodular function each round and want to do well compared to the best fixed point in hindsight. The second extension concerns the weaker generalization of submodularity to the continuous setting, where we are not guaranteed that the function is coordinate-wise concave. Both extensions are powered by a game-theoretic view of the Double-Greedy algorithm.
Joint work with Tim Roughgarden and Rad Niazadeh. -
**Wednesday, May 29, 2019**:**Greg Yang**(Microsoft Research). [32-G575 Time: 4-5pm]
**Topic.**Tensor Programs: A Swiss-Army Knife for Nonlinear Random Matrix Theory of Deep Learning and Beyond.**Abstract.**The resurgence of neural networks has revolutionized artificial intelligence since 2010. Luckily for mathematicians and statistical physicists, the study of large random network scaling limits, which can be thought of as *nonlinear* random matrix theory, is both practically important and mathematically interesting. We describe several problems in this setting and develop a new comprehensive framework, called “tensor programs,” for solving these problems. This framework can be thought of as an automatic tool to derive the behavior of computation graphs with large matrices, as used in neural network computation. It is very general, and from it we also obtain new proofs of the semicircle and the Marchenko-Pastur laws. Thus, “tensor programs” is broadly useful to linear and nonlinear random matrix theory alike, and we hope it will be adopted as a standard tool.
This talk presents the work arXiv:1902.04760. -
**Wednesday, June 5, 2019**:**Jerry Zheng Li**(Microsoft Research). [32-G575 Time: 4-5pm]
**Topic.**The Sample Complexity of Toeplitz Covariance Estimation.**Abstract.**We study the query complexity of estimating the covariance matrix T of a distribution D over d-dimensional vectors, under the assumption that T is Toeplitz. This assumption is standard in a wide variety of signal processing problems, where the covariance between any two measurements only depends on the time or distance between those measurements. In many of these applications, we are interested in estimation strategies that may choose to view only a subset of entries in each d-dimensional sample x from D. We care about minimizing both 1) the number of samples taken and 2) the number of entries accessed in each sample, which often equates to minimizing equipment requirements in applications ranging from wireless transmission to advanced imaging.
We give many of the first non-asymptotic bounds on these sample complexity measures. We analyze classical and widely used estimation algorithms, in particular methods based on selecting entries from each sample according to a `sparse ruler'. We explain how sparse ruler based estimation can significantly outperform naive methods when T is close to low-rank, as is often the case in practice. We also develop a novel sampling and estimation strategy that improves on known algorithms in the low-rank case. Our method utilizes tools from random matrix sketching, leverage score based sampling techniques for continuous time signals, and sparse Fourier transform algorithms.
Joint work with Y. Eldar, C. Musco, and C. Musco. -
**Friday, July 5, 2019**:**Arnab Bhattacharyya**(NUS). [32-G449 Time: 11-12pm] (Hosted by Nikhil Vyas)
**Topic.**Parameterized Intractability of Even Set and Shortest Vector Problem.**Abstract.**The k-Even Set problem is a parameterized variant of the Minimum Distance Problem of binary linear codes, which can be stated as follows: given a generator matrix A and an integer k, determine whether the code generated by A has distance at most k, or in other words, whether there is a nonzero vector x such that Ax has at most k nonzero coordinates. The question of whether k-Even Set is fixed parameter tractable (FPT) parameterized by the distance k has been repeatedly raised in literature; in fact, it is one of the few remaining open questions from the seminal book of Downey and Fellow (1999). In this work, we show that k-Even Set is W[1]-hard under randomized reductions.
We also consider the parameterized k-Shortest Vector Problem (SVP), in which we are given a lattice whose basis vectors are integral and an integer k, and the goal is to determine whether the norm of the shortest vector (in the ℓ_p norm for some fixed p) is at most k. Similar to k-Even Set, understanding the complexity of this problem is also a long-standing open question in the field of Parameterized Complexity. We show that, for any $p > 1$, k-SVP is W[1]-hard to approximate (under randomized reductions) to some constant factor.
Joint work with Edouard Bonnet, Laszlo Egri, Suprovat Ghoshal, Karthik C.S., Bingkai Lin, Pasin Manurangsi and Daniel Marx.
## Theory CalendarAdd this calendar to yours to receive details about upcoming events in the Theory group
## Earlier Editions of A&C Seminar |